Test set for measuring PDF viewer performance, streaming, and linearization-limit behaviour.
Each PDF contains a recognizable rainbow-grid label image (top half), a unique high-entropy noise image (bottom half, forces real file size), and a per-page "Page N" header. Sizes were chosen to straddle the 2 GB and 4 GB byte boundaries (signed/unsigned 32-bit) and to test viewers' ability to stream linearized PDFs across them.
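Here is a small Python sketch of the two boundaries. It reports which 32-bit offset types can no longer address the last byte of a file of a given size. It assumes decimal gigabytes (10⁹ bytes). The real files are only "~N GB", so their exact byte counts may differ.

```python
# Sketch: which 32-bit offset types overflow for a given file size.
# Assumption: sizes are decimal GB; the published files are only "~N GB".
INT32_MAX = 2**31 - 1   # largest offset a signed 32-bit int can hold
UINT32_MAX = 2**32 - 1  # largest offset an unsigned 32-bit int can hold

def overflowed_types(size_bytes: int) -> list[str]:
    """Return the 32-bit offset types that cannot address the file's last byte."""
    last_offset = size_bytes - 1
    types = []
    if last_offset > INT32_MAX:
        types.append("int32_t")
    if last_offset > UINT32_MAX:
        types.append("uint32_t")
    return types

if __name__ == "__main__":
    GB = 10**9  # decimal gigabyte (assumption)
    for n in (1, 3, 5, 9, 10):
        print(f"{n} GB -> {overflowed_types(n * GB) or 'fits in int32_t'}")
```

The 3 GB files only trip the signed limit. The 5 GB and larger files trip both limits, which is why the table tags them "> 2 GB" and "> 4 GB".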
These are large files. The 9 GB and 10 GB versions in particular will saturate slow connections. Use a download manager that supports HTTP range requests if your browser stalls.
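If you would rather script the download than use a download manager, the following stdlib-only Python sketch resumes a partial file with an HTTP `Range` request. The page lists no URLs, so `url` is a placeholder you supply. The function names are hypothetical.

```python
# Sketch: resume an interrupted download with an HTTP Range request (stdlib only).
import os
import shutil
import urllib.request

def range_header(start: int) -> dict[str, str]:
    """Ask the server for every byte from `start` to the end of the file."""
    return {"Range": f"bytes={start}-"}

def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Append the missing tail of `url` to `dest`.

    If `dest` is already complete, the server typically answers
    416 Range Not Satisfiable, which urllib raises as HTTPError.
    """
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the range was honoured, so append to what we have.
        # 200 OK: the server ignored Range and sent the whole file, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(resp, out, chunk)
```

Call it again after each stall, for example `resume_download(url, "sample_9gb.pdf")`. Each call picks up from the current size of the local file.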
| File | Variant | Size | Pages |
|---|---|---|---|
| sample_1gb.pdf | Non-linearized | ~1 GB | 576 |
| sample_1gb_linearized.pdf | Linearized | ~1 GB | 576 |
| sample_3gb.pdf | Non-linearized, > 2 GB | ~3 GB | 1728 |
| sample_3gb_linearized.pdf | Linearized, > 2 GB | ~3 GB | 1728 |
| sample_5gb.pdf | Non-linearized, > 4 GB | ~5 GB | 2880 |
| sample_5gb_linearized.pdf | Linearized, > 4 GB | ~5 GB | 2880 |
| sample_9gb.pdf | Non-linearized | ~9 GB | 5184 |
| sample_9gb_linearized.pdf | Linearized, max spec-linearizable | ~9 GB | 5184 |
| sample_10gb.pdf | Non-linearized | ~10 GB | 5760 |
There is no linearized 10 GB variant because the PDF spec does not allow it. Linearized PDFs must use classic cross-reference tables, whose byte offsets are fixed at 10 decimal digits. That caps the addressable file size at 10¹⁰ bytes, about 9.31 GiB. Files larger than that cannot be cleanly linearized within the standard.
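The ceiling works out as follows. The largest 10-digit offset is one less than 10¹⁰. Converting bytes to binary gibibytes gives the 9.31 figure:

$$
\underbrace{9\,999\,999\,999}_{\text{10 digits}} = 10^{10} - 1\ \text{bytes},
\qquad
\frac{10^{10}\ \text{bytes}}{2^{30}\ \text{bytes/GiB}} \approx 9.31\ \text{GiB}
$$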
Linearization (also called "Fast Web View") is defined in the PDF 1.4 reference and in Annex F of the ISO 32000 spec. It rearranges the document so the first page and its dependencies appear at the start of the file. It also adds a hint table that lets a viewer render page 1 before the rest of the file has streamed. Without it, a viewer reading the file front to back must download the entire PDF before it can render anything, because the cross-reference table sits at the end of the file.
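A viewer can tell up front whether a file is linearized. Annex F requires the linearization parameter dictionary to fit entirely within the first 1024 bytes. Its `/L` entry must equal the actual file length; a mismatch means the file was modified after linearization. The Python sketch below applies both rules. It uses a regex heuristic rather than a real PDF parser, and the function names are hypothetical.

```python
# Sketch: detect a linearization parameter dictionary the way a streaming viewer would.
# Heuristic only: a regex over the first 1024 bytes, not a full PDF parser.
import os
import re
from typing import Dict, Optional

HEAD_BYTES = 1024  # Annex F: the dictionary must fit in the first 1024 bytes
_LIN_DICT = re.compile(rb"<<[^>]*/Linearized\s+[0-9.]+[^>]*>>")
_INT_ENTRY = re.compile(rb"/([A-Z])\s+(\d+)")  # single-letter keys with integer values

def linearization_params(head: bytes) -> Optional[Dict[str, int]]:
    """Return the integer entries (/L, /N, /O, /E, /T) or None if not linearized."""
    match = _LIN_DICT.search(head[:HEAD_BYTES])
    if match is None:
        return None
    return {k.decode("ascii"): int(v) for k, v in _INT_ENTRY.findall(match.group(0))}

def check_file(path: str) -> str:
    with open(path, "rb") as f:
        params = linearization_params(f.read(HEAD_BYTES))
    if params is None:
        return "not linearized"
    # /L must equal the real file length; otherwise the hints are stale.
    if params.get("L") != os.path.getsize(path):
        return "linearization invalid: /L does not match file size"
    return f"linearized, {params.get('N')} pages"
```

Only the head of the file is read, and that is the point: a streaming viewer makes this decision before the rest of the file has arrived.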
Linearization predates cross-reference streams (added in PDF 1.5), which encode object offsets in raw binary and support arbitrary file sizes. The two formats are mutually exclusive. Per the Foxit PDF SDK header documentation for `e_SaveFlagLinearized`, "this should be used alone and cannot be used with other saving flags except e_SaveFlagNoUpdatingMetadataDateTime", which rules out the xref-stream flag too. A linearized PDF must therefore use the older classic xref table format, whose entries look like `0000123456 00000 n`. That fixed 10-digit offset field tops out at 9,999,999,999 bytes.
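Here is a minimal Python sketch of the 20-byte classic entry. It formats one xref entry and refuses any offset that would need an 11th digit. The function name is hypothetical; the layout follows the cross-reference table section (7.5.4) of ISO 32000-1.

```python
# Sketch: format one classic cross-reference table entry.
# Layout: 10-digit offset, space, 5-digit generation, space, n/f, 2-byte EOL = 20 bytes.
MAX_CLASSIC_OFFSET = 10**10 - 1  # 9,999,999,999: the largest 10-digit value

def xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    if not 0 <= offset <= MAX_CLASSIC_OFFSET:
        raise ValueError(f"offset {offset} does not fit in 10 digits")
    if not 0 <= generation <= 65535:  # the spec's maximum generation number
        raise ValueError("generation number out of range")
    kind = "n" if in_use else "f"  # n = in use, f = free
    entry = f"{offset:010d} {generation:05d} {kind}\r\n".encode("ascii")
    assert len(entry) == 20  # fixed width is what lets readers seek by index
    return entry
```

Because every entry has the same width, no offset can exceed 9,999,999,999. A writer either fails, as above, or silently truncates the offset.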
We built a small custom binary, `foxit_linearize.exe`, against the Foxit PDF SDK Core (Windows C++) on our GPU box. It calls `PDFDoc::SaveAs(output, e_SaveFlagLinearized)`, the same code path the desktop product uses for "Save As Optimized for Fast Web View."
For each input we record the JSON returned by the wrapper, including `IsLinearized()` on the freshly written output file. The behaviour we found is more conservative than the spec ceiling alone would predict:
- 1 GB, 3 GB, 5 GB: `IsLinearized() == true`. Output sizes match the input within a few KB. The 1, 3, and 5 GB linearized variants in the table above were produced by Foxit.
- 9 GB: `SaveAs` returns success and writes an output file, but reopening it returns `IsLinearized() == false`. Foxit silently degrades to a non-linearized save, even though the file is comfortably under the 10¹⁰-byte spec ceiling. We fell back to `qpdf --linearize` for this variant, which produces a clean linearized file at this size.
- 10 GB: here `qpdf --linearize` writes a malformed file with truncated offsets that most viewers reject. Both tools confirm the spec's hard ceiling, so we don't ship a 10 GB linearized variant.

So Foxit's empirical threshold sits somewhere between 5 GB and 9 GB, not at 10 GB. The probable cause is an internal sanity check or signed 32-bit offset arithmetic in Foxit's hint-table generator, but we haven't binary-searched it. Net result: Foxit's behaviour is safer than qpdf's, since it never wrote malformed output in our runs. Its useful linearization range is narrower, though.
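The verify-then-fallback procedure above can be sketched as follows. This is an illustration, not our actual harness. `primary`, `fallback`, and `is_linearized` are hypothetical callables that stand in for the Foxit wrapper, `qpdf --linearize`, and a reopen-and-check step such as `IsLinearized()`.

```python
# Sketch of the "linearize, then verify by reopening" policy.
# primary/fallback/is_linearized are hypothetical stand-ins, injected as callables.
from typing import Callable, Optional

SPEC_CEILING = 10**10  # classic xref offsets top out at 9,999,999,999

def linearize_with_fallback(
    src: str,
    dst: str,
    size_bytes: int,
    primary: Callable[[str, str], bool],
    fallback: Callable[[str, str], bool],
    is_linearized: Callable[[str], bool],
) -> Optional[str]:
    """Return the name of the tool whose output verified as linearized, or None."""
    if size_bytes >= SPEC_CEILING:
        return None  # conservatively skip: output would be degraded or malformed
    for name, tool in (("primary", primary), ("fallback", fallback)):
        # A success return is not enough; the SDK reported success at 9 GB while
        # writing a non-linearized file, so always re-check the written output.
        if tool(src, dst) and is_linearized(dst):
            return name
    return None
```

Injecting the tools as callables keeps the policy testable without either SDK installed.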
The > 2 GB variants overflow a signed 32-bit offset (`int32_t`). The > 4 GB variants also overflow an unsigned one (`uint32_t`).

Linearization is a deployment-time optimization for one specific scenario: web-embedded viewers loading large documents over the public internet. Everything else loses more than it gains: archives, edit pipelines, fast networks, and the increasingly common multi-GB technical PDFs. There the spec ceiling and edit fragility outweigh the streaming benefit, because any incremental update invalidates the linearization. Treat it as a publish-step toggle, not a default. And if your file is already over 5 GB, don't bother trying.
Every page of every file shows its target size in the rainbow label (e.g. "1 GB", "10 GB") plus a "Page N" header. If you can read those, the viewer is rendering content streams correctly. The bottom-half noise image is unique per page so viewer-side image deduplication never short-circuits the test.
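The page generator isn't described here, so the following Python sketch is only an illustration under an assumed method: a PRNG seeded with the page number. It shows the two properties the noise image exists for. Each page's bytes hash differently, so a viewer can't deduplicate them. And zlib, the Flate algorithm PDFs use for `FlateDecode` streams, can't shrink them, which is what forces the real file size.

```python
# Sketch: why per-page noise defeats both compression and deduplication.
# Assumption: a PRNG seeded by page number; the real generator is not documented.
import hashlib
import random
import zlib

def page_noise(page: int, n_bytes: int = 64 * 1024) -> bytes:
    """Deterministic, page-specific pseudo-random bytes (hypothetical generator)."""
    return random.Random(page).randbytes(n_bytes)

def compression_ratio(data: bytes) -> float:
    """Compressed size over raw size; near 1.0 means Flate gains nothing."""
    return len(zlib.compress(data, 9)) / len(data)

if __name__ == "__main__":
    pages = [page_noise(p) for p in range(1, 6)]
    digests = {hashlib.sha256(b).hexdigest() for b in pages}
    print("distinct page images:", len(digests), "of", len(pages))
    print("noise ratio:", round(compression_ratio(pages[0]), 3))
    print("flat ratio:", round(compression_ratio(bytes(len(pages[0]))), 4))
```

A flat image of the same size compresses to a tiny fraction of its raw size. The noise stays at essentially full size.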