Many of the issues you see in PDF files come from the interaction of different parts. Here is a really good example I came across while debugging a file…
The CropBox is usually the visible page area within a larger MediaBox. It is often used to hide things like printer's crop marks and is generally the part you see in a PDF viewer. So what happens if the CropBox is larger than the MediaBox? Does the PDF even work?
Here is the raw Page object from an example PDF file I have been looking at.
3 0 obj<</CropBox[0 0 595.22 842] /Parent 2 0 R
/B[347 0 R] /Contents 4 0 R /Rotate 0 /BleedBox[0 0 595.22 842] /ArtBox[0 0 595.22 842] /MediaBox[56.6929 56.6929 476.22 651.969] /TrimBox[0 0 595.22 842] /Resources<< /Font<</TT2 496 0 R/TT4 497 0 R/TT6 498 0 R/TT8 511 0 R>> /ProcSet[/PDF/Text/ImageB/ImageC/ImageI] /Properties<</MC2 503 0 R>> /ExtGState<</GS2 499 0 R>>>>/Type/Page>>
endobj
As you can see, the MediaBox is actually inside the CropBox. Do we:
1. Use the CropBox value and display a ‘margin’ around the actual page data.
2. Use the smaller MediaBox as the CropBox value.
3. Throw an error.
As usual, our guide is how Acrobat behaves – the correct answer is 2. Did you guess correctly?
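To make option 2 concrete, here is a minimal Python sketch that clips a box to its intersection with the MediaBox, using the values from the Page object above. The function names `normalize` and `clip_to_media_box` are made up for illustration and do not come from any library. Returning `None` when the boxes do not overlap at all is my own assumption, not something taken from the example file.

```python
def normalize(box):
    """Return (llx, lly, urx, ury) with the lower-left corner first.

    A PDF rectangle may list any two diagonally opposite corners,
    so sort the coordinates before comparing boxes.
    """
    x0, y0, x1, y1 = box
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def clip_to_media_box(box, media_box):
    """Reduce a crop/bleed/trim/art box to its intersection with the MediaBox."""
    bx0, by0, bx1, by1 = normalize(box)
    mx0, my0, mx1, my1 = normalize(media_box)
    llx, lly = max(bx0, mx0), max(by0, my0)
    urx, ury = min(bx1, mx1), min(by1, my1)
    if llx >= urx or lly >= ury:
        # No overlap at all. How a viewer should react here is an
        # assumption in this sketch, so the caller gets None to handle.
        return None
    return (llx, lly, urx, ury)


# Values taken from the Page object shown above.
media_box = (56.6929, 56.6929, 476.22, 651.969)
crop_box = (0, 0, 595.22, 842)

effective_crop = clip_to_media_box(crop_box, media_box)
print(effective_crop)  # (56.6929, 56.6929, 476.22, 651.969)
```

Because the MediaBox lies entirely inside the CropBox in this file, the intersection is simply the MediaBox, which matches what Acrobat displays.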


Mark,
There is actually no reason to guess – or to mimic Acrobat’s behavior. The desired behavior in this case is described in the PDF spec (from section 14.11.2.1): “The crop, bleed, trim, and art boxes shall not ordinarily extend beyond the boundaries of the media box. If they do, they are effectively reduced to their intersection with the media box.”
As you probably know, the PDF spec is not always easy to understand, and sometimes it takes some experimentation to get to the bottom of things, but in this case it is pretty clear.
May I ask what application created the PDF file in question? It’s interesting that all the boxes are set to A4, but the media box is smaller. It looks like somebody was trying to crop out a smaller portion of the page, but picked the wrong box to modify.
Thanks for digging up all these strange PDF files.
Karl Heinz
Thanks for the exact reference. One of the ‘fun’ things about the PDF spec is that you quite often find that the answer is indeed somewhere in the 1000+ pages of terse documentation but is not always easy to find (often there is a subnote in an appendix somewhere).
There is no Producer/Creator listed in the example file, so I am not sure what it was created with.