Six-year-old PDF loop bug affects most major implementations

Posted by Hanno Böck on Sunday, September 3. 2017

I recently did some testing of the qpdf library with afl and libFuzzer. I discovered an input sample that caused a spike in CPU load and, after several minutes, an out-of-memory error. It looked like the parser was caught in some kind of endless loop.

This reminded me of something. In 2011 at the Chaos Communication Camp Andreas Bogk gave a talk about creating a formally verified PDF parser with OCaml and Coq. In that talk he mentioned that one could create a PDF file whose xref tables reference each other. A naive parser following those references would get caught in an endless loop. He showed that the evince thumbnailer process was affected by this.
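To illustrate the idea, here is a minimal sketch of how a parser could guard against such a loop. It assumes the cycle arises from xref sections that point at each other through their "previous section" offsets, and it models the file as a plain dictionary from offset to previous offset. The names walk_xref_chain, prev_of and XrefLoopError are made up for this example and do not come from any of the libraries mentioned here.

```python
from typing import Dict, List, Optional


class XrefLoopError(Exception):
    """Raised when the chain of xref sections revisits an offset."""


def walk_xref_chain(prev_of: Dict[int, Optional[int]], start: int) -> List[int]:
    """Follow 'previous xref section' pointers starting at `start`.

    `prev_of` is a hypothetical stand-in for a parsed file: it maps the
    byte offset of each xref section to the offset of the previous one
    (or None if there is none).
    """
    seen = set()
    order = []
    offset: Optional[int] = start
    while offset is not None:
        if offset in seen:
            # A naive parser would keep going here forever.
            raise XrefLoopError(f"xref section at offset {offset} visited twice")
        seen.add(offset)
        order.append(offset)
        # A missing entry ends the chain just like an explicit None.
        offset = prev_of.get(offset)
    return order
```

With a well-formed chain such as {300: 100, 100: None} the function returns the offsets in visiting order. With {300: 100, 100: 300} it raises XrefLoopError instead of spinning. Remembering visited offsets is only one possible guard; a cap on the number of sections followed would also stop the loop.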

This was eventually fixed in evince's underlying rendering library poppler. But it seems this issue hasn't reached many other PDF parser libraries. The qpdf issue I had seen was exactly the same bug. The sample file can be found here. If you follow that link you'll immediately notice that GitHub's JavaScript PDF viewer is also affected.

How about Mozilla Firefox? Affected. (They use pdf.js, which is the same code GitHub uses.) What about Chrome / Chromium, which uses a library called PDFium? Affected. (Notably they already had a test PDF for this, but the function that causes the loop wasn't tested.) Microsoft Edge? Affected. Ghostscript, which is the other major PDF parser used by many free and open source tools? Also affected. For completeness, Adobe Reader and Apple's OS X internal PDF viewer were unaffected.

To make this clear: This isn't a major security issue; the impact is limited to a denial of service. But it is certainly something undesirable that should be fixed.

It is remarkable that a bug that was discovered six years ago affected the majority of widely used PDF implementations. But it fits a common pattern in IT security: very often, discovering security issues means rediscovering old issues. In general this is a difficult problem to solve, as it touches complex questions about knowledge transfer.

However, in this specific case - an input file that causes a bug in one implementation also causes the same bug in other implementations - there are things that can be done.

I report fuzzing-related bugs on a very regular basis and I always share a sample file that triggers the bug. In the best cases the maintainers of the affected software take the bug triggering sample and use it in their test suite. I think this should be a standard practice.

However, that only prevents regressions in the same software. Maintainers of parsers for common file formats could also take a look at their competitors and check their test suites. While the PDF standard probably isn't well defined enough to allow a visual comparison of rendering output, it's surely obvious that no input file should cause a crash, an invalid memory access or an endless loop.
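A check like that needs very little machinery. The following is a rough sketch, not how any of the projects mentioned here actually run their tests: it feeds every sample file to a command line tool and flags the ones that hang or die from a signal. The function name check_samples and the command passed to it are placeholders, and the crash detection via a negative return code only works on POSIX systems.

```python
import subprocess
from typing import Dict, List


def check_samples(command: List[str], samples: List[str],
                  timeout_s: float = 30.0) -> Dict[str, str]:
    """Run `command + [sample]` for each sample and classify the outcome.

    Returns a mapping from sample path to "ok", "timeout" or "crash".
    A non-zero exit code still counts as "ok": rejecting a broken file
    with an error message is acceptable behaviour for a parser.
    """
    results = {}
    for sample in samples:
        try:
            proc = subprocess.run(
                command + [sample],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            results[sample] = "timeout"  # likely an endless loop
            continue
        # On POSIX a negative return code means "terminated by signal",
        # e.g. SIGSEGV. Assumption: the tool does not catch such signals.
        results[sample] = "crash" if proc.returncode < 0 else "ok"
    return results
```

The sample list could simply be the test PDFs collected from other projects' repositories and bug trackers. The timeout has to be chosen generously enough that large but legitimate files are not reported as hangs.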

Looking at the common PDF libraries, the situation with test cases was quite mixed. Mozilla's pdf.js had the most extensive collection of files, many of them examples from previous bugs. However, one should know that about a third of them are not part of their code repository. They're referenced as links (all of them pointing to their own bug tracker or to the Internet Archive, so I guess they're reasonably stable).

PDFium, used by Google Chrome, has a far less extensive test suite with only 96 PDF files. I have reported a bunch of PDFium bugs myself in the past (Examples: [1], [2], [3]) and the test cases I provided never got added to the test suite.

QPDF is doing very well: They ship 278 test PDFs and for all the bugs I reported lately they added the provided sample files.

Ghostscript has only three PDF example files in its code (other PDFs in the code seem to be documentation, not test cases). Poppler's code has no PDFs bundled at all. They have a separate test repository for that with 35 files, but it hasn't been updated since 2009. There's definitely lots of room for improvement in that area.

Comments

anon on Monday, September 4. 2017:

Shouldn't quite a few of those 'affected' be 'effected' instead?
Since Firefox/chrome/edge aren't the ones causing the issue but are on the receiving side?

Craig Young on Monday, September 4. 2017:

FWIW, I believe the use of the word affected is correct.

timendum on Monday, September 4. 2017:

Firefox 57 should not be affected by the issue any more, see the bugzilla link for the patch.

ralf on Monday, September 4. 2017:

No. He effected a denial-of-service attack. Many PDF viewers were affected.

anonymous on Monday, September 4. 2017:

Do you happen to know if MuPDF is affected?

Tilman Hausherr on Monday, September 4. 2017:

Heh heh... I remembered you were up to something with PDF so I looked again on twitter and saw it, and yes, PDFBox is affected. Thank you for digging this up. I've fixed it for Apache PDFBox in https://issues.apache.org/jira/browse/PDFBOX-3919 .

In case you're planning to write something, our test file repository isn't big, but I keep over 1000 PDF files locally (from previous issues and from other projects, mostly PDF.js, ghostscript and poppler) and do parse/rendering tests near every commit. The loop_edited.pdf file will be added to it. We can't host this in the repository and distribute it with the source code ZIP because 1) copyrights 2) size over 800MB.

CaveJ on Monday, September 4. 2017:

SumatraPDF (v3.2.10710 prerelease), which uses muPDF, doesn't seem to be affected (renders 'bar' text with normal memory+cpu usage.)

Andreas Bogk on Tuesday, September 5. 2017:

Thanks for remembering.

And a shoutout for not only finding this again with fuzzing, but also for taking the time to report this to all the vendors.

Also, I'd like to mention that this was joint work with Marco Schoepl, not just me alone.

F on Wednesday, September 6. 2017:

Hi,
I'm getting an "invalid bug" page with "Bug #69840 does not exist." for the ghostscript bug at https://bugs.ghostscript.com/show_bug.cgi?id=69840
