Sample PDFs

Test set for measuring PDF viewer performance, streaming, and linearization-limit behaviour.

Purpose

Each PDF contains a recognizable rainbow-grid label image (top half), a unique high-entropy noise image (bottom half, which forces real file size), and a per-page "Page N" header. Sizes were chosen to straddle the 2 GB and 4 GB byte boundaries (2^31 and 2^32 bytes, the signed and unsigned 32-bit limits) and to test viewers' ability to stream linearized PDFs across them.
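As a rough illustration of why those boundaries matter, the sketch below classifies a file size by the integer width needed to hold its largest byte offset. It assumes the nominal sizes are decimal gigabytes (the real files are only approximately that size), and the function name is made up for this example.

```python
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit offset can hold
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit offset can hold


def offset_width_needed(size_bytes: int) -> str:
    """Say which integer width can address every byte of a file this large."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    last_offset = size_bytes - 1  # offsets are zero-based
    if last_offset <= INT32_MAX:
        return "fits signed 32-bit"
    if last_offset <= UINT32_MAX:
        return "needs unsigned 32-bit"
    return "needs 64-bit"


# Nominal sizes only (assumption: 1 GB = 10**9 bytes).
for label, size in [("1 GB", 1 * 10**9), ("3 GB", 3 * 10**9), ("5 GB", 5 * 10**9)]:
    print(label, "->", offset_width_needed(size))
```

A viewer that stores offsets in a signed 32-bit integer would be expected to fail somewhere in the 3 GB files, and one that uses unsigned 32-bit offsets somewhere in the 5 GB files.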

Heads up

These are large files. The 9 GB and 10 GB versions in particular will saturate slow connections. Use a download manager that supports HTTP range requests if your browser stalls.
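If no download manager is at hand, a chunked fetch can be sketched with the Python standard library. This is a minimal outline, not a robust downloader: the helper names are invented for this example, the URL is whatever the download link resolves to, and it assumes the server answers range requests with status 206.

```python
import urllib.request
from typing import Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte ranges that cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def fetch_range(url: str, start: int, end: int, timeout: float = 60.0) -> bytes:
    """Fetch one inclusive byte range; raise if the server ignores the Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:  # 206 Partial Content
            raise RuntimeError(f"expected 206, got {response.status}")
        return response.read()
```

Looping over `byte_ranges(total_size, chunk_size)` and appending each `fetch_range` result to a file lets a stalled transfer resume from the last completed chunk instead of restarting.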

Files

File                          Type            Size    Pages  Notes
sample_1gb.pdf                Non-linearized  ~1 GB   576
sample_1gb_linearized.pdf     Linearized      ~1 GB   576
sample_3gb.pdf                Non-linearized  ~3 GB   1728   > 2 GB
sample_3gb_linearized.pdf     Linearized      ~3 GB   1728   > 2 GB
sample_5gb.pdf                Non-linearized  ~5 GB   2880   > 4 GB
sample_5gb_linearized.pdf     Linearized      ~5 GB   2880   > 4 GB
sample_9gb.pdf                Non-linearized  ~9 GB   5184
sample_9gb_linearized.pdf     Linearized      ~9 GB   5184   Max spec-linearizable
sample_10gb.pdf               Non-linearized  ~10 GB  5760

Why no 10 GB linearized variant?

Quick answer

The PDF spec does not allow it. Linearized PDFs must use classic cross-reference tables, whose byte offsets are fixed at 10 decimal digits. That caps the addressable file size at 10^10 bytes, about 9.31 GiB. Files larger than that cannot be cleanly linearized using the standard.
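The arithmetic behind that ceiling is short enough to check directly. The snippet below is only a worked calculation from the ten-digit offset field; the variable names are illustrative.

```python
OFFSET_DIGITS = 10                     # width of the offset field in a classic xref entry
max_offset = 10**OFFSET_DIGITS - 1     # largest offset the field can express
addressable_bytes = 10**OFFSET_DIGITS  # offsets 0 through max_offset
GIB = 2**30

print(f"largest offset:   {max_offset:,} bytes")
print(f"addressable size: {addressable_bytes / GIB:.2f} GiB")
```

The largest expressible offset is 9,999,999,999, and 10^10 bytes is roughly 9.31 GiB, which is why a nominal 10 GB file (10^10 bytes or more) falls just outside the range.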

The longer version

Linearization (also called "Fast Web View") was introduced in PDF 1.2 and is specified in Annex F of ISO 32000. It rearranges the document so the first page and its dependencies appear at the start of the file, and adds a hint table that lets a viewer render page 1 before the rest of the file has streamed. Without it, a viewer has to reach the end of the file, where the cross-reference table sits, before it can render anything; absent range requests, that means downloading the entire PDF.
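A quick way to see whether a file claims to be linearized is to look for the linearization parameter dictionary near the start of the file. The sketch below is a heuristic only: it assumes the dictionary sits within the first 1024 bytes, as Annex F expects, and it does not validate the hint tables or that the recorded length matches the file. The function name is invented for this example.

```python
import re

# Matches the /Linearized key followed by its numeric version value.
_LINEARIZED_KEY = re.compile(rb"/Linearized\s+\d")


def looks_linearized(head: bytes) -> bool:
    """Heuristic: True if a /Linearized key appears in the first 1024 bytes."""
    return _LINEARIZED_KEY.search(head[:1024]) is not None


# Usage: read only the start of the file, which is cheap even for multi-GB PDFs.
# with open("sample_1gb_linearized.pdf", "rb") as f:
#     print(looks_linearized(f.read(1024)))
```

A positive result means only that the file declares itself linearized; a file that was later modified by an incremental update may still carry the key while no longer being usable as a linearized file.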

Linearization predates cross-reference streams (added in PDF 1.5), which encode object offsets in raw binary and support arbitrary file sizes. The two formats are mutually exclusive: per the Foxit PDF SDK header documentation for e_SaveFlagLinearized, "this should be used alone and cannot be used with other saving flags except e_SaveFlagNoUpdatingMetadataDateTime" — including the xref-stream flag. A linearized PDF must use the older classic xref table format, whose entries look like 0000123456 00000 n; that fixed 10-digit offset field tops out at 9,999,999,999 bytes.
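To make the fixed-width limit concrete, here is a small sketch that formats a classic xref entry and refuses offsets that do not fit in ten digits. The function name is invented for this example; the 20-byte layout (10-digit offset, 5-digit generation, a type flag, and a two-byte end-of-line) follows the classic table format.

```python
MAX_XREF_OFFSET = 9_999_999_999  # largest value a 10-digit decimal field can hold
MAX_GENERATION = 65_535          # maximum generation number in the PDF spec


def format_xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Return one 20-byte classic xref entry, or raise if a field would overflow."""
    if not 0 <= offset <= MAX_XREF_OFFSET:
        raise ValueError(f"offset {offset} does not fit in 10 decimal digits")
    if not 0 <= generation <= MAX_GENERATION:
        raise ValueError(f"generation {generation} is out of range")
    flag = "n" if in_use else "f"
    # Space + LF is one of the permitted two-byte end-of-line forms.
    return f"{offset:010d} {generation:05d} {flag} \n".encode("ascii")


print(format_xref_entry(123456))
```

Calling `format_xref_entry(10**10)` raises `ValueError`, which is the situation a writer faces for any object that starts at or beyond the 10^10-byte mark.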

How we tested the ceiling

We built a small custom binary, foxit_linearize.exe, against the Foxit PDF SDK Core (Windows C++) on our GPU box. It calls PDFDoc::SaveAs(output, e_SaveFlagLinearized) — the same code path the desktop product uses for "Save As Optimized for Fast Web View."

For each input we record the JSON returned by the wrapper, including IsLinearized() on the freshly-written output file. The behaviour we found is more conservative than the spec ceiling alone would predict:

So Foxit's empirical threshold sits somewhere between 5 GB and 9 GB — not at 10 GB. The probable cause is an internal sanity check or 32-bit-signed offset usage in Foxit's hint-table generator, but we haven't binary-searched it. Net result: Foxit's behaviour is safer than qpdf's (no malformed output ever), but its useful linearization range is narrower.
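For anyone who wants to narrow that window, a bisection over input sizes would look roughly like the sketch below. Everything here is hypothetical: `linearizes` stands in for a routine that generates a test PDF of the given size, runs it through the tool, and reports whether the output is linearized, and the approach assumes success is monotonic in file size.

```python
from typing import Callable, Tuple


def find_threshold(
    lo: int,
    hi: int,
    linearizes: Callable[[int], bool],
    tolerance: int,
) -> Tuple[int, int]:
    """Narrow (lo, hi) so that linearizes(lo) is True and linearizes(hi) is False."""
    if tolerance < 1:
        raise ValueError("tolerance must be at least 1 byte")
    if not linearizes(lo) or linearizes(hi):
        raise ValueError("need a passing lower bound and a failing upper bound")
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        if linearizes(mid):
            lo = mid  # still linearizes, so the threshold is higher
        else:
            hi = mid  # fails, so the threshold is at or below mid
    return lo, hi


# Made-up predicate for demonstration only; it is not a measured result.
fake_limit = 7_000_000_000
print(find_threshold(5 * 10**9, 9 * 10**9, lambda size: size <= fake_limit, 1))
```

With a 4 GB starting window, a tolerance of about 1 MB needs around a dozen runs, though each run means writing and processing a multi-GB file.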

What this test set covers

When (and when not) to linearize, in 2026

Linearize when…

Skip it when…

The short version

Linearization is a deployment-time optimization for one specific scenario: web-embedded viewers loading large documents over the public internet. For everything else — archives, edit pipelines, fast networks, the increasingly common multi-GB technical PDFs — the spec ceiling and edit fragility outweigh the streaming benefit. Treat it as a publish-step toggle, not a default. And if your file is already over 5 GB, don't bother trying.

Visual verification

Every page of every file shows its target size in the rainbow label (e.g. "1 GB", "10 GB") plus a "Page N" header. If you can read those, the viewer is rendering content streams correctly. The bottom-half noise image is unique per page so viewer-side image deduplication never short-circuits the test.