When your Memory Allocator hides Security Bugs

Recently, together with Craig Young, I shared some information about potential memory safety bugs in the Apache web server. One issue that came up in that context is the so-called pool allocator that Apache uses.

What is this pool allocator? Apache’s APR library has a feature where you can allocate a pool, which is a larger area of memory, and then do memory allocations within that pool. It’s essentially a separate memory allocator provided by the library. Similar concepts exist in other software.

Why would anyone do that? It might bring performance benefits to have memory allocation that’s optimized for a specific application. It can also make programming more convenient: you can allocate many small buffers in a pool and then, instead of freeing each one of them, just free the whole pool with all allocations within.
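To make the idea concrete, here is a minimal sketch of a pool in plain C. It is not APR's implementation. The names toy_pool, toy_pool_create, toy_pool_alloc and toy_pool_destroy are made up for illustration, and the 8-byte rounding is an assumption. It only shows the principle: one large malloc, many small allocations carved out of it, one free for everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy pool; this is NOT how APR implements its pools. */
typedef struct {
    unsigned char *base; /* one large block obtained from malloc */
    size_t size;         /* total capacity in bytes */
    size_t used;         /* bytes handed out so far */
} toy_pool;

/* Reserve one big block up front. Returns 0 on success, -1 on failure. */
int toy_pool_create(toy_pool *p, size_t size) {
    assert(p != NULL);
    p->base = malloc(size);
    p->size = (p->base != NULL) ? size : 0;
    p->used = 0;
    return (p->base != NULL) ? 0 : -1;
}

/* Hand out the next n bytes of the block, rounded up to a multiple of 8. */
void *toy_pool_alloc(toy_pool *p, size_t n) {
    assert(p != NULL);
    size_t rounded = (n + 7u) & ~(size_t)7u;
    if (rounded < n || rounded > p->size - p->used)
        return NULL; /* size overflow or pool exhausted */
    void *out = p->base + p->used;
    p->used += rounded;
    return out;
}

/* A single free() releases every allocation made from the pool. */
void toy_pool_destroy(toy_pool *p) {
    assert(p != NULL);
    free(p->base);
    p->base = NULL;
    p->size = p->used = 0;
}
```

Note that two consecutive allocations from such a pool sit right next to each other inside the same malloc'ed block, with nothing in between that a tool could watch.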

There’s a disadvantage with the pool allocator, and that is that it may hide bugs. Let’s look at a simple code example:

#include <apr_pools.h>
#include <stdio.h>
#include <string.h>

int main() {
    apr_pool_t *p;
    char *b1, *b2;

    apr_initialize();
    apr_pool_create(&p, NULL);
    b1 = apr_palloc(p, 6);
    b2 = apr_palloc(p, 6);

    strcpy(b1, "This is too long"); /* 17 bytes including the terminator, into 6 */
    strcpy(b2, "Short");
    printf("%s %s\n", b1, b2);
}


We can compile this with:
gcc input.c $(pkg-config --cflags --libs apr-1)

What we’re doing here is creating a pool p and two buffers (b1, b2) within that pool, each six bytes long. Now we fill those buffers with strings. However, the string we copy into b1 is larger than the buffer. This is a classic buffer overflow. The printf at the end, which outputs both strings, will show garbled output because the two buffers interfere.

Now the question is how we find such bugs. Of course we can carefully analyze the code, and in the simple example above this is easy to do. But in complex software such bugs are hard to find manually, so there are tools that detect unsafe memory access (e.g. buffer overflows, use after free) during execution. The state of the art tool is Address Sanitizer (ASAN). If you write C code and don’t use it for testing yet, you should start doing so now.

Address Sanitizer is part of both the gcc and clang compilers and can be enabled by passing -fsanitize=address on the command line. We’ll also add -g, which adds debugging information and will give us better error messages. So let’s try:
gcc -g -fsanitize=address input.c $(pkg-config --cflags --libs apr-1)

If you try this you will find out that nothing has changed. We still see the garbled string and Address Sanitizer has not detected the buffer overflow.

Let’s try rewriting the above code in plain C without the pool allocator:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    char *b1, *b2;
    b1 = malloc(6);
    b2 = malloc(6);

    strcpy(b1, "This is too long");
    strcpy(b2, "Short");
    printf("%s %s\n", b1, b2);
}


If we compile and run this with ASAN it will give us a nice error message that tells us what’s going on:

==9319==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000016 at pc 0x7f81fdd08c9d bp 0x7ffe82881930 sp 0x7ffe828810d8
WRITE of size 17 at 0x602000000016 thread T0
#0 0x7f81fdd08c9c in __interceptor_memcpy /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737
#1 0x5636994851e0 in main /tmp/input.c:10
#2 0x7f81fdb204ea in __libc_start_main (/lib64/libc.so.6+0x244ea)
#3 0x5636994850e9 in _start (/tmp/a.out+0x10e9)

0x602000000016 is located 0 bytes to the right of 6-byte region [0x602000000010,0x602000000016)
allocated by thread T0 here:
#0 0x7f81fddb6b10 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x5636994851b6 in main /tmp/input.c:7
#2 0x7f81fdb204ea in __libc_start_main (/lib64/libc.so.6+0x244ea)


So why didn’t the error show up when we used the pool allocator? The reason is that ASAN is built on top of the normal C memory allocation functions like malloc/free. It does not know anything about APR’s pools. From ASAN’s point of view the pool is just one large block of memory, and what’s happening inside is not relevant.
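The difference in viewpoint can be modelled with a few lines of C. This is a simplified picture and not how ASAN works internally. The region type and the function write_is_inside are hypothetical names, and addresses are reduced to plain offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A byte range [start, start + size); addresses are modelled as offsets. */
typedef struct {
    size_t start;
    size_t size;
} region;

/* True if a write of len bytes beginning at addr stays inside region r. */
bool write_is_inside(region r, size_t addr, size_t len) {
    assert(r.size <= SIZE_MAX - r.start); /* region must not wrap around */
    return addr >= r.start && len <= r.size && addr - r.start <= r.size - len;
}
```

Assume a pool block of 8192 bytes starting at offset 0 (the size is an assumption for illustration) with b1 occupying its first six bytes. A checker that only knows the big block evaluates write_is_inside with the 8192-byte region, and the 17-byte write passes. Only a checker that knows about the six-byte region of b1 rejects it.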

Thus we have a buffer overflow, but the state of the art tool to detect buffer overflows is unable to detect it. This is obviously not good: the pool allocator takes away one of the most effective ways to discover an important class of security bugs.

If you’re looking for solutions to that problem you may find old documentation about "Debugging Memory Allocation in APR". It relies on flags that have been removed from the APR library, so it’s not helpful. However there’s a not very well documented option of the APR library that allows us to gain memory safety checks back: passing --enable-pool-debug=yes to the configure script effectively disables the pool allocator and creates a separate memory allocation for each call to the pool allocator.

If we compile our first example again, this time with the pool debugger and ASAN, we’ll see the error:

==20228==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000016 at pc 0x7fe2e625dc9d bp 0x7ffe8419a180 sp 0x7ffe84199928
WRITE of size 17 at 0x602000000016 thread T0
#0 0x7fe2e625dc9c in __interceptor_memcpy /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:737
#1 0x55fe439d132c in main /tmp/input.c:15
#2 0x7fe2e5fc34ea in __libc_start_main (/lib64/libc.so.6+0x244ea)
#3 0x55fe439d1129 in _start (/tmp/a.out+0x1129)

0x602000000016 is located 0 bytes to the right of 6-byte region [0x602000000010,0x602000000016)
allocated by thread T0 here:
#0 0x7fe2e630bb10 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x7fe2e6203157 (/usr/lib64/libapr-1.so.0+0x1f157)
#2 0x7fe2e5fc34ea in __libc_start_main (/lib64/libc.so.6+0x244ea)


Apache is not alone in having a custom memory allocator that can hide bugs. Mozilla’s NSPR and NSS libraries have something called an Arena Pool, Glib has memory slices and PHP has the Zend allocator. All of them can hide memory safety bugs from ASAN, yet luckily all have an option to be turned off for testing. I maintain a collection of information about such custom allocators and how to disable them.

But back to Apache. When we started reporting use after free bugs that we saw with the debugging option for the pool allocator enabled, we learned from the Apache developers that there are incompatibilities between the http2 module and the pool debugger. This has led to replies after our disclosure claiming that these are non-issues, because nobody should run the pool debugger in production.

It should be noted that we were also able to reproduce some bugs without the pool debugger in the latest Apache version (we have shared this information with Apache and will share it publicly later), and that indeed it seems some people did run the pool debugger in production (OpenBSD).

But I have another problem with this. If we consider that parts of the current Apache code are incompatible with the APR pool debugger then we end up with an unfortunate situation: If a tool like ASAN reports memory safety bugs with the pool debugger we don’t know if they are real issues or just incompatibilities. If we turn off the pool debugger we won’t see most of the memory safety bugs.

That’s a situation where testing Apache for memory safety bugs becomes practically very difficult. In my opinion that by itself is a worrying and severe security issue.

Image source: Dreamstime, CC0

Stack buffer overflow in WolfSSL before 3.13.0

During some tests of TLS libraries I found a stack buffer overflow vulnerability in the WolfSSL library.
Finding this one was surprisingly simple: I had a WolfSSL server that was compiled with Address Sanitizer and ran the SSL Labs test against it.

The bug happens in the parsing of the signature hash algorithm list that is sent in a ClientHello and is basically a textbook stack buffer overflow. WolfSSL simply tries to store that in an array with 32 elements. If one sends more than 32 hash algorithms it overflows.
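The pattern, and the check that prevents it, can be sketched like this. This is not WolfSSL's code. The names MAX_SIG_ALGOS, sig_algo_list and store_sig_algos are hypothetical, and the one-byte element type is an assumption. Only the capacity of 32 is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIG_ALGOS 32 /* capacity from the description; the name is made up */

typedef struct {
    unsigned char algos[MAX_SIG_ALGOS]; /* fixed-size storage for the list */
    size_t count;
} sig_algo_list;

/*
 * Store a peer-supplied list of n entries.
 * Returns 0 on success, or -1 if the list does not fit.
 */
int store_sig_algos(sig_algo_list *dst, const unsigned char *src, size_t n) {
    assert(dst != NULL && (src != NULL || n == 0));
    if (n > MAX_SIG_ALGOS)
        return -1; /* without this check the memcpy overflows the array */
    if (n > 0)
        memcpy(dst->algos, src, n);
    dst->count = n;
    return 0;
}
```

The length n comes straight from the network, so it has to be compared against the capacity of the destination before anything is copied.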

With the SSL Labs scan the bug only causes WolfSSL to terminate if it's compiled with Address Sanitizer, but if one sends a very large list of hash algorithms it also crashes in a normal build. In situations where WolfSSL is used without ASLR this bug is probably trivially exploitable.

I have created a simple bash proof of concept (using netcat and xxd) that crashes a WolfSSL server.

The bug was fixed in this commit and in version 3.13.0 of WolfSSL.

FLIMP! The GIMP has some security problems

I recently got an email from Tobias Stöckmann, who reminded me of some bugs I had almost forgotten. When I started the Fuzzing Project I reported two bugs in import parsers of the GIMP. Tobias managed to write an exploit for one of them.

See FLIMP! for more info.

How Optionsbleed wasn't found in 2014

Shortly after I published details about the Optionsbleed bug I learned about something quite surprising: Others had already discovered this bug before, but had neither pinned it down to Apache nor recognized that it is a security vulnerability.

A paper published in 2014 on Arxiv titled "Support for Various HTTP Methods on the Web" mentions servers sending malformed Allow headers. It has examples listed that very clearly look like the output you get from a server vulnerable to Optionsbleed.

This alone would be noteworthy enough, but there's something that makes this even more surprising. This paper was published in May 2014, about a month after the Heartbleed bug was found. Heartbleed gained a lot of attention, not just in the tech scene, it was widely covered in the mainstream media. It can be assumed that almost everyone working in IT had heard of it.

So we have a situation where a major bug hit the news - and several people must have had evidence of a very similar bug in front of their eyes shortly afterwards. Yet nobody recognized it as such. One of the authors mentioned in a comment that they hadn't looked at it from a security perspective, but still you'd think that someone should have noticed.

While it's always problematic to read too much into single anecdotes, it still makes me wonder. Are we just terribly bad at explaining security issues? My personal impression is that Heartbleed is actually an issue that is relatively simple to grasp (of course best explained by XKCD). Going from there to the idea that seeing random garbage in HTTP headers indicates a very similar bug doesn't seem so far-fetched to me. But the facts seem to disprove that.

Optionsbleed - HTTP OPTIONS method can leak Apache's server memory

If you're using the HTTP protocol in everyday Internet use you are usually only using two of its methods: GET and POST. However HTTP has a number of other methods, so I wondered what you can do with them and if there are any vulnerabilities.

One HTTP method is called OPTIONS. It simply allows asking a server which other HTTP methods it supports. The server answers with the "Allow" header and gives us a comma separated list of supported methods.

A scan of the Alexa Top 1 Million revealed something strange: Plenty of servers sent out an "Allow" header with what looked like corrupted data. Some examples:
Allow: ,GET,,,POST,OPTIONS,HEAD,,
Allow: POST,OPTIONS,,HEAD,:09:44 GMT
Allow: GET,HEAD,OPTIONS,,HEAD,,HEAD,,HEAD,, HEAD,,HEAD,,HEAD,,HEAD,POST,,HEAD,, HEAD,!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
Allow: GET,HEAD,OPTIONS,=write HTTP/1.0,HEAD,,HEAD,POST,,HEAD,TRACE

That clearly looked interesting - and dangerous. It suspiciously looked like a "bleed"-style bug, which has become a name for bugs where arbitrary pieces of memory are leaked to a potential attacker. However these were random servers on the Internet, so at first I didn't know what software was causing this.
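A check for such headers could look roughly like the following sketch. It is a heuristic and not the code used for the scan. The function names are made up, and the assumption that a plausible method name consists only of uppercase letters and hyphens is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Heuristic: plausible if non-empty and made only of A-Z or '-'. */
static bool plausible_method(const char *s, size_t len) {
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (!((c >= 'A' && c <= 'Z') || c == '-'))
            return false;
    }
    return true;
}

/* True if any comma-separated element of an Allow value looks wrong. */
bool allow_value_looks_corrupted(const char *value) {
    assert(value != NULL);
    if (*value == '\0')
        return false; /* an empty list is unusual but not garbage */
    const char *p = value;
    for (;;) {
        size_t len = strcspn(p, ","); /* length up to next comma or end */
        const char *start = p;
        const char *end = p + len;
        /* Trim optional whitespace around the element. */
        while (start < end && (*start == ' ' || *start == '\t'))
            start++;
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;
        if (!plausible_method(start, (size_t)(end - start)))
            return true;
        if (p[len] == '\0')
            return false; /* reached the end without a suspicious element */
        p += len + 1;     /* continue after the comma */
    }
}
```

All four example headers above are flagged by this check because they contain empty elements or elements with unexpected characters. A flagged header still needs a manual look, since harmless malformations trip the heuristic as well.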

Sometimes HTTP servers send a "Server" header identifying the software. However one needs to be aware that the "Server" header can lie. It's quite common to have one HTTP server proxying another. I got all kinds of different "Server" headers back, but I very much suspected that these were all from the same bug.

I tried to contact the affected server operators, but only one of them answered, and he was extremely reluctant to tell me anything about his setup, so that wasn't very helpful either.

However I got one clue: Some of the corrupted headers contained strings that were clearly configuration options from Apache. It seemed quite unlikely that those would show up in the memory of other server software. But I was unable to reproduce anything similar on my own Apache servers. I also tried reading the code that puts together the Allow header to see if I could find any clues, but with no success. So without knowing any details I contacted the Apache security team.

Fortunately Apache developer Jacob Champion dug into it and figured out what was going on: Apache supports a configuration directive Limit that allows restricting access to certain HTTP methods to a specific user. If one sets the Limit directive in an .htaccess file for an HTTP method that's not globally registered in the server, the corruption happens. After that I was able to reproduce it myself. Setting a Limit directive for any invalid HTTP method in an .htaccess file caused a use after free error in the construction of the Allow header, which was also detectable with Address Sanitizer. (However ASAN doesn't work reliably due to the memory allocation abstraction done by APR.)

FAQ

What's Optionsbleed?

Optionsbleed is a use after free error in the Apache HTTP server that causes a corrupted Allow header to be constructed in response to HTTP OPTIONS requests. This can leak pieces of arbitrary memory from the server process that may contain secrets. The memory pieces change after multiple requests, so for a vulnerable host an arbitrary number of memory chunks can be leaked.

The bug appears if a webmaster tries to use the "Limit" directive with an invalid HTTP method.

Example .htaccess:

<Limit abcxyz>
</Limit>

How prevalent is it?

Scanning the Alexa Top 1 Million revealed 466 hosts with corrupted Allow headers. In theory it's possible that other server software has similar bugs. On the other hand this bug is nondeterministic, so not all vulnerable hosts may have been caught.

So it only happens if you set a quite unusual configuration option?

There's an additional risk in shared hosting environments. The corruption is not limited to a single virtual host. One customer of a shared hosting provider could deliberately create an .htaccess file causing this corruption hoping to be able to extract secret data from other hosts on the same system.

I can't reproduce it!

Due to its nature the bug doesn't appear deterministically. It only seems to appear on busy servers. Sometimes it only appears after multiple requests.

Does it have a CVE?

CVE-2017-9798.

I'm seeing Allow headers containing HEAD multiple times!

This is actually a different Apache bug (#61207) that I found during this investigation. It causes HEAD to appear three times instead of once. However it's harmless and not a security bug.

Launchpad also has a harmless bug that produces a malformed Allow header, using a space-separated list instead of a comma-separated one.

How can I test it?

A simple way is to use Curl in a loop and send OPTIONS requests:

for i in {1..100}; do curl -sI -X OPTIONS https://www.google.com/|grep -i "allow:"; done

Depending on the server configuration it may not answer OPTIONS requests on some URLs. Trying different paths, HTTP versus HTTPS hosts, non-www versus www etc. may lead to different results.

Please note that this bug does not show up with the "*" OPTIONS target, you need a specific path.

Here's a python proof of concept script.

What shall I do?

If you run an Apache web server you should update. Most distributions should have updated packages by now or very soon. A patch can be found here. A patch for Apache 2.2 is available here (thanks to Thomas Deutschmann for backporting it).

Unfortunately the communication with the Apache security team wasn't ideal. They were unable to provide a timeline for a coordinated release with a fix, so I decided to define a disclosure date on my own without an upstream fix.

If you run an Apache web server in a shared hosting environment that allows users to create .htaccess files you should drop everything you are doing right now, update immediately and make sure you restart the server afterwards.

Is this as bad as Heartbleed?

No. Although similar in nature, this bug leaks only small chunks of memory and more importantly only affects a small number of hosts by default.

It's still a pretty bad bug, particularly for shared hosting environments.

Updates:

Analysis by Apache developer William A. Rowe Jr.

Distribution updates:
Gentoo: Commit (2.2.34 / 2.4.27-r1 fixed), Bug
NetBSD/pkgsrc: Commit
Guix: Commit
Arch Linux: Commit (2.4.27-2 fixed)
Slackware: Advisory
NixOS: Commit
Debian: Security Tracker, Advisory (2.4.10-10+deb8u11, 2.4.25-3+deb9u3)
Ubuntu: Advisory (2.4.25-3ubuntu2.3, 2.4.18-2ubuntu3.5, 2.4.7-1ubuntu4.18)

Media:
Apache-Webserver blutet (Golem.de)
Apache Webserver: "Optionsbleed"-Bug legt Speicherinhalte offen (heise online)
Risks Limited With Latest Apache Bug, Optionsbleed (Threatpost)
Apache “Optionsbleed” vulnerability – what you need to know (Naked Security)
Apache bug leaks contents of server memory for all to see—Patch now (Ars Technica)

Six year old PDF loop bug affects most major implementations

I recently did some testing of the qpdf library with afl and libfuzzer. I discovered an input sample that would generate a high CPU load spike and eventually, after several minutes, cause an out of memory error. It looked like the parser was caught in some kind of endless loop.

This reminded me of something. In 2011 at the Chaos Communication Camp Andreas Bogk gave a talk about creating a formally verified PDF parser with OCaml and Coq. In that talk he mentioned that one could create a PDF file with xref tables that reference each other. A naive parser would get caught in an endless loop. He showed that the evince thumbnailer process was affected by this.
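A parser can protect itself by remembering which tables it has already visited. The following sketch is a simplified model and not taken from any of the PDF libraries mentioned here. It assumes the tables are chained through back-references (as with the Prev entry in a PDF trailer) and models each table as an index into an array. The names walk_xref_chain and NO_PREV are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_PREV ((size_t)-1) /* marks a table without a back-reference */

/*
 * prev[i] is the index of the table that table i points back to,
 * or NO_PREV. Returns the number of tables visited starting from
 * `start`, or -1 if the chain loops, leaves the array, or memory
 * for the bookkeeping cannot be allocated.
 */
long walk_xref_chain(const size_t *prev, size_t n, size_t start) {
    assert(prev != NULL || n == 0);
    bool *seen = calloc(n ? n : 1, sizeof *seen);
    if (seen == NULL)
        return -1;
    long visited = 0;
    size_t cur = start;
    while (cur != NO_PREV) {
        if (cur >= n || seen[cur]) { /* invalid reference or loop */
            visited = -1;
            break;
        }
        seen[cur] = true;
        visited++;
        cur = prev[cur];
    }
    free(seen);
    return visited;
}
```

A naive parser is the same loop without the seen array. With two tables that point at each other it never terminates.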

This was eventually fixed in evince's underlying rendering library poppler. But it seems this issue hasn't reached many other PDF parser libraries. The qpdf issue I had seen was exactly the same bug. The sample file can be found here. If you follow that link you'll immediately notice that Github's Javascript PDF viewer is also affected.

How about Mozilla Firefox? Affected. (They use pdf.js, which is the same code Github uses.) What about Chrome / Chromium, which uses a library called PDFium? Affected. (Notably they already had a test PDF for this, but the function that causes the loop wasn't tested.) Microsoft Edge? Affected. Ghostscript, which is the other major PDF parser used by many free and open source tools? Also affected. For completeness, Adobe Reader and Apple's OS X internal PDF viewer were unaffected.

To make this clear: This isn't a major security issue. The impact is a denial of service. But it is certainly something undesirable that should be fixed.

It is remarkable that a bug that was discovered six years ago affected the majority of widely used PDF implementations. But it falls into a pattern of IT security: Very often discovering security issues means rediscovering old issues. In general this is a difficult problem to solve, as it touches complex questions about knowledge transfer.

However in this specific case - an input file that causes a bug in one implementation also causes the same bug in other implementations - there are things that can be done.

I report fuzzing-related bugs on a very regular basis and I always share a sample file that triggers the bug. In the best cases the maintainers of the affected software take the bug triggering sample and use it in their test suite. I think this should be a standard practice.

However that only prevents regressions in the same software. But maintainers of parsers for common file formats could also take a look at their competitors and check their test suites. While the PDF standard probably isn't well defined enough to allow a visual comparison of rendering output, it's surely obvious that no input file should cause a crash, an invalid memory access or an endless loop.

Looking at the common PDF libraries the situation with test cases was quite mixed. Mozilla's pdf.js had the most extensive collection of files, many of them examples from previous bugs. However one should know that about a third of them are not part of their code repository. They're referenced as links (all of them pointing to their own bug tracker or to the Internet archive, so I guess they're reasonably stable).

PDFium, used by Google Chrome, has a far less extensive test suite with only 96 PDF files. I have reported a bunch of PDFium bugs myself in the past (Examples: [1], [2], [3]) and the test cases I provided never got added to the test suite.

QPDF is doing very well: They ship 278 test PDFs and for all the bugs I reported lately they added the provided sample files.

Ghostscript has only three PDF example files in its code (other PDFs in the code seem to be documentation, not test cases). Poppler's code has no PDFs bundled at all. They have a separate test repository for that with 35 files, but it hasn't been updated since 2009. There's definitely lots of room for improvement in that area.


exiv2: multiple memory safety issues

This post first appeared on oss-security.

I'm reporting three issues here in exiv2, a parser library for image metadata. These are only examples; exiv2 is full of memory safety bugs that can trivially be found by running afl with asan for a few hours.

I have not reported those issues upstream. When I previously tried to report bugs in exiv2 found via fuzzing, the upstream author made it clear to me that he has little interest in fixing those issues and doesn't consider his software suitable for parsing defective files (which basically means it's unsuitable for untrusted input). The discussion can be read here. (The page is sometimes not available; searching for it in the Google cache usually works though.)

exiv2 is to my knowledge used by the major Linux desktops GNOME and KDE. I'll also inform their security teams. I leave it up to Linux distros how to handle this, but it certainly is problematic that the maintainer of a crucial parser used by major desktop applications is not interested in fixing potential security issues.


Heap overflow (write) in tiff parser

A malformed tiff file can cause a one byte heap overflow in exiv2.

Stack trace:

==22873==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000df1 at pc 0x000000842091 bp 0x7fff51b3ee70 sp 0x7fff51b3ee68
WRITE of size 1 at 0x602000000df1 thread T0
#0 0x842090 in Exiv2::ul2Data(unsigned char*, unsigned int, Exiv2::ByteOrder) /f/exiv2-trunk/src/types.cpp:362:20
#1 0x68beac in long Exiv2::toData<unsigned int>(unsigned char*, unsigned int, Exiv2::ByteOrder) /f/exiv2-trunk/src/../include/exiv2/value.hpp:1459:16
#2 0x68beac in Exiv2::ValueType<unsigned int>::copy(unsigned char*, Exiv2::ByteOrder) const /f/exiv2-trunk/src/../include/exiv2/value.hpp:1612
#3 0x6742b2 in Exiv2::Exifdatum::copy(unsigned char*, Exiv2::ByteOrder) const /f/exiv2-trunk/src/exif.cpp:362:48
#4 0x7f794d in Exiv2::TiffImage::readMetadata() /f/exiv2-trunk/src/tiffimage.cpp:204:18
#5 0x59786a in Action::Print::printSummary() /f/exiv2-trunk/src/actions.cpp:289:16
#6 0x596ef8 in Action::Print::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /f/exiv2-trunk/src/actions.cpp:244:44
#7 0x55fb3f in main /f/exiv2-trunk/src/exiv2.cpp:170:25
#8 0x7f91c1e571d0 in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.24-r2/work/glibc-2.24/csu/../csu/libc-start.c:289
#9 0x468979 in _start (/r/exiv2/exiv2+0x468979)

0x602000000df1 is located 0 bytes to the right of 1-byte region [0x602000000df0,0x602000000df1)
allocated by thread T0 here:
#0 0x55af00 in operator new[](unsigned long) (/r/exiv2/exiv2+0x55af00)
#1 0x83fadf in Exiv2::DataBuf::alloc(long) /f/exiv2-trunk/src/types.cpp:158:22



Heap out of bounds read in jp2 / JPEG2000 parser

A malformed jpeg2000 file causes a (large) out of bounds read.

Stack trace:

==32038==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f5e099e6838 at pc 0x00000050f22c bp 0x7ffdf7f3dcd0 sp 0x7ffdf7f3d480
READ of size 808464432 at 0x7f5e099e6838 thread T0
#0 0x50f22b in __asan_memcpy (/r/exiv2/exiv2+0x50f22b)
#1 0x6e82bc in Exiv2::Jp2Image::readMetadata() /f/exiv2-trunk/src/jp2image.cpp:277:29
#2 0x59786a in Action::Print::printSummary() /f/exiv2-trunk/src/actions.cpp:289:16
#3 0x596ef8 in Action::Print::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /f/exiv2-trunk/src/actions.cpp:244:44
#4 0x55fb3f in main /f/exiv2-trunk/src/exiv2.cpp:170:25
#5 0x7f5e130a71d0 in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.24-r2/work/glibc-2.24/csu/../csu/libc-start.c:289
#6 0x468979 in _start (/r/exiv2/exiv2+0x468979)

0x7f5e099e6838 is located 0 bytes to the right of 808452152-byte region [0x7f5dd96e6800,0x7f5e099e6838)
allocated by thread T0 here:
#0 0x55af00 in operator new[](unsigned long) (/r/exiv2/exiv2+0x55af00)
#1 0x6e8176 in Exiv2::DataBuf::DataBuf(long) /f/exiv2-trunk/src/../include/exiv2/types.hpp:204:46
#2 0x6e8176 in Exiv2::Jp2Image::readMetadata() /f/exiv2-trunk/src/jp2image.cpp:273
#3 0x59786a in Action::Print::printSummary() /f/exiv2-trunk/src/actions.cpp:289:16
#4 0x596ef8 in Action::Print::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /f/exiv2-trunk/src/actions.cpp:244:44
#5 0x7f5e130a71d0 in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.24-r2/work/glibc-2.24/csu/../csu/libc-start.c:289



Stack out of bounds read in webp parser

A malformed webp file causes a six-byte stack out of bounds read.

Stack trace:

==598==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcc12aa054 at pc 0x0000004fe311 bp 0x7ffcc12a9f90 sp 0x7ffcc12a9740
READ of size 6 at 0x7ffcc12aa054 thread T0
#0 0x4fe310 in __interceptor_memcmp.part.76 (/r/exiv2/exiv2+0x4fe310)
#1 0x8889d0 in Exiv2::WebPImage::getHeaderOffset(unsigned char*, long, unsigned char*, long) /f/exiv2-trunk/src/webpimage.cpp:798:17
#2 0x8889d0 in Exiv2::WebPImage::decodeChunks(unsigned long) /f/exiv2-trunk/src/webpimage.cpp:601
#3 0x884ff2 in Exiv2::WebPImage::readMetadata() /f/exiv2-trunk/src/webpimage.cpp:496:20
#4 0x59786a in Action::Print::printSummary() /f/exiv2-trunk/src/actions.cpp:289:16
#5 0x596ef8 in Action::Print::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /f/exiv2-trunk/src/actions.cpp:244:44
#6 0x55fb3f in main /f/exiv2-trunk/src/exiv2.cpp:170:25
#7 0x7f7f9cc9f1d0 in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.24-r2/work/glibc-2.24/csu/../csu/libc-start.c:289
#8 0x468979 in _start (/r/exiv2/exiv2+0x468979)

Address 0x7ffcc12aa054 is located in stack of thread T0 at offset 180 in frame
#0 0x885a0f in Exiv2::WebPImage::decodeChunks(unsigned long) /f/exiv2-trunk/src/webpimage.cpp:501

This frame has 13 object(s):
[32, 36) 'size_buff' (line 503)
[48, 64) 'payload' (line 516)
[80, 84) 'size_buf' (line 520)
[96, 100) 'size_buf48' (line 536)
[112, 114) 'size_buf_w' (line 551)
[128, 131) 'size_buf_h' (line 552)
[144, 148) 'size_buf112' (line 568)
[160, 162) 'size_buff152' (line 587)
[176, 180) 'exifLongHeader295' (line 588) <== Memory access at offset 180 overflows this variable
[192, 196) 'exifTiffBEHeader297' (line 591)
[208, 272) 'xmpData' (line 650)
[304, 688) 'temp.lvalue'
[752, 1136) 'temp.lvalue232'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions are supported)

Diving into Control Flow Integrity

To improve security, modern systems contain many mitigation strategies that try to make it harder to exploit security vulnerabilities. Commonly used strategies include stack canaries, address space layout randomization (ASLR) and nonexecutable memory pages. Unfortunately the Linux distributions have been slow in adopting ASLR, but this is finally changing.

A new set of mitigation technologies has been discussed for a while under the umbrella term "Control Flow Integrity" (CFI). I won't get into technical details, but the very general idea is to add checks to the code that prohibit jumps to code locations that should never be reached that way during the normal operation of the software.

LLVM's Clang compiler has supported a form of CFI since version 3.7. Other forms of CFI are available under Windows in Visual Studio (CFGuard) and with Grsecurity (RAP).

Recently I experimented a bit with the Clang version of CFI. It's been one of those situations where you dig into a topic and find out that it's really hard to Google for advice. The information out there is relatively limited: There's the official LLVM documentation, a talk by Kostya Serebryany that briefly mentions CFI in the second half and two blog posts by the company Trail of Bits. Also Chrome is using a subset of the CFI functionality and there's a list of bugs found with it.

Given the relatively scarce information, when starting to use it you will experience situations where things fail and you won't find any help via Google.

So why would you want to use CFI? One possibility would be to create a super-hardened Linux system where, beyond using the "old" exploit mitigations like ASLR, one would also enable CFI. The computational costs of doing so are relatively small (Kostya Serebryany mentions in the talk above that they were unable to measure the CPU cost in Chrome). The executables grow in size and likely use more memory, but not in extraordinary amounts (below 10 percent). So from a performance side this is doable.

I started by compiling some small applications with CFI to see what happens. In some cases they "just work". In most cases I ended up having strange linker errors that were nontrivial to debug. Interesting for Linux distributions: There seems to be no dependency chain that needs to be considered (which is different from many of Clang's Sanitizer features). It's possible to have an executable built with CFI that depends on libraries not using CFI and it's also possible to have libraries using CFI that are used by executables not using CFI. This is good: If we'd strive for our super-hardened Linux we can easily exclude packages that create too many problems or start by including packages where we think they'd profit most from such extra protection.

CFI itself is enabled with the flag -fsanitize=cfi. It needs two other compiler and linker flags to work: -fvisibility=hidden for the compiler, which hides unnecessary symbols, and -flto for the linker to enable Link Time Optimization. The latter needs the Gold linker; depending on your system you may need to install the LLVM gold plugin. If Gold isn't your default linker you can pass -fuse-ld=gold. Furthermore, if you want to debug things, add -fno-sanitize-trap=all and enable extended debugging with -g. This will give you useful error messages in case CFI stops your application. (However you probably don't want to do that in production systems, as the error reporting can introduce additional bugs.)

In theory some of these flags only need to be passed to the compiler and others to the linker, but compilation systems aren't always strict in separating those, so we just add all of them to both.

So we can start compiling the current version of curl (7.53.1):
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto"
make

However we'll end up getting an error when it tries to link to the shared library. I guess it's a libtool problem, but I haven't dug into it. For now we can just disable the shared library by adding --disable-shared:
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" --disable-shared
make

We end up getting a curl executable in src/curl. It runs, but as soon as we try to download something (e.g. src/curl google.com) it will just show "Illegal instruction" and quit. What's going on? Let's recompile, but this time with CFI error messages and some more debugging:
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all -g" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all" --disable-shared --enable-debug
make

Now the output gets more verbose:
sendf.c:578:22: runtime error: control flow integrity check for type 'unsigned long (char *, unsigned long, unsigned long, void *)' failed during indirect function call
/mnt/ram/curl-7.53.1/src/tool_cb_hdr.c:44: note: tool_header_cb defined here

In tool_cb_hdr.c we find this function definition:
size_t tool_header_cb(void *ptr, size_t size, size_t nmemb, void *userdata)

The code in sendf.c it complains about is this:
size_t wrote = writeheader(ptr, 1, chunklen, data->set.writeheader);

The writeheader function is a function pointer variable of type curl_write_callback. This is defined in curl.h:
typedef size_t (*curl_write_callback)(char *buffer, size_t size, size_t nitems, void *outstream);

So we have a function pointer of type curl_write_callback pointing to the function tool_header_cb. If you look closely at both function definitions you'll spot that they don't match. The first parameter is a void* for tool_header_cb and a char* for curl_write_callback. CFI doesn't allow such indirect function calls if the function types don't match exactly. The fix is to align them; in this case I proposed to change tool_header_cb to also use char*. There's a second function, tool_write_cb, which has exactly the same problem. The patch is already applied to curl's git repository.
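
The mismatch described above can be reduced to a few lines. The following is only a sketch, not curl's code: write_callback mirrors the curl_write_callback typedef quoted above, while header_cb_mismatched, header_cb_matching and deliver are hypothetical names made up for illustration. As far as I understand the C standard, calling a function through a pointer of an incompatible type is undefined behavior even when the pointers look identical at the assembly level, which is presumably why CFI refuses it.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer type mirroring the curl_write_callback typedef from curl.h. */
typedef size_t (*write_callback)(char *buffer, size_t size, size_t nitems,
                                 void *outstream);

/* Hypothetical callback with a void* first parameter, the shape
 * tool_header_cb had before the fix. Its type differs from write_callback,
 * so it could only be passed to deliver() with a cast, and a CFI build
 * would reject the indirect call at runtime. */
size_t header_cb_mismatched(void *ptr, size_t size, size_t nmemb,
                            void *userdata)
{
    (void)ptr;
    (void)userdata;
    return size * nmemb;
}

/* Hypothetical callback after the fix: the signature matches the pointer
 * type exactly. It adds the number of bytes seen to *outstream. */
size_t header_cb_matching(char *buffer, size_t size, size_t nitems,
                          void *outstream)
{
    size_t *total = outstream;
    assert(buffer != NULL);
    if (total != NULL)
        *total += size * nitems;
    return size * nitems;
}

/* Indirect call of the kind CFI checks; the call in sendf.c has this shape. */
size_t deliver(write_callback cb, char *data, size_t len, void *userdata)
{
    return cb(data, 1, len, userdata);
}
```

Calling deliver(header_cb_matching, buf, len, &total) passes the check. Passing (write_callback)header_cb_mismatched instead compiles because of the cast, but when built with the CFI flags from above it should abort with a message like the one quoted earlier.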

With this patch applied curl runs again. There was a second issue that showed up when running curl's test suite. This was a difference of a signed versus unsigned char pointer. It was a bit trickier, because further down the function it expected unsigned values, so it needed an explicit cast. That fix has been applied as well.

A good question is how relevant these issues are. On the assembly level pointers are just pointers and there's no difference between a char* and a void*. But C sometimes fails in interesting and unexpected ways, so it's certainly better to fix those bugs. I don't know whether there's a realistic scenario in which such a bug ends up being a security bug, if you are aware of one please post a comment.

Notable here is that CFI is designed to be a runtime hardening feature, but given it interrupts program execution when it hits certain bug classes it is also a bug finding tool.

From my layman's understanding CFI is more likely to catch and prevent C++ bugs, because indirect function calls are much more common within object oriented programming. But given that most C++ applications are more complex, it's more likely you run into problems when trying to compile them with CFI.

For now I think it's a good idea that C/C++-based projects test their code with CFI. This lays the groundwork for future projects where people might want to create hardened systems with CFI enabled. But it can also help to find bugs. One project that has already extensively uncovered bugs with CFI is Chrome. If you follow Chrome's release notes you may have noticed that they often attribute security fixes to CFI.

htpasswDoS: Local Denial of Service via Apache httpd password hashes

The way the Apache httpd web server handles password hashes can be abused by a malicious user on a server to cause resource exhaustion and denial of service of the web server. I reported this a while ago to the Apache security team - which led to a lengthy discussion where I was a bit appalled by some of the statements I got from the Apache developers. They can be summed up as follows: major use cases of Apache - especially in web hosting environments - are not recommended by its developers.

Apache supports HTTP basic authentication, a simple login mechanism with username and password that is part of the HTTP protocol. This can be configured via the .htaccess file on a web server. A very simple htaccess file looks like this:

AuthType Basic
AuthName "privat"
AuthUserFile /home/user/pass
require valid-user


The file "/home/user/pass" is a file containing usernames and password hashes. It can be created with the htpasswd tool. It supports several different password hashing schemes. An entry looks like this:

guest:$2y$05$sCcklxS8F1mvB/B2R090IOjqC0/i2FdhlAOJ0ujy.yfXswXIjQwGe

bcrypt hash with insane running time

By fuzzing htpasswd I noticed that some inputs would cause a very long running time of the application. This was caused by bcrypt hashes with a very long computing time.

The hash above uses the bcrypt hash function, indicated by the $2y. The number after that - the $05 - indicates the computing time of the hash function. It can have values between 04 and 31. The purpose of a password hashing function is to make brute force attacks slow in case of a breach. Therefore it is desirable for password hashing functions to be slow (which is a very different requirement from many other use cases, where hash functions should be fast). However they also can't be too slow, because they still have to be calculated once for every login attempt.

A user who wants to cause disturbance on a server can now choose to set the computing time of the hash to an insanely large value. The hash value doesn't have to make any sense for this. A user can simply create a password file and change the 05 to 31:

guest:$2y$31$sCcklxS8F1mvB/B2R090IOjqC0/i2FdhlAOJ0ujy.yfXswXIjQwGe

For every login attempt with the right username the server will calculate the hash. The running time of bcrypt doubles with every increase of the computing time value. On my system calculating a hash with the maximum value 31 takes around 30 hours. Therefore with this a user can cause a server process to consume lots of resources for more than a day.

Two things are notable about the Apache behavior here:
  • The hash calculation is limited neither by a connection timeout nor by a termination of the connection. Once a connection tries to log in the hashing starts and won't stop even if the user closes his browser.
  • The calculation happens within the server process. In common configurations this means it is not owned by the user, instead it's running under a system-wide user for the httpd server. Apache has functionalities to make sure that user scripts can only be executed under their own user account (SuExec), but these don't matter for the password hashing. This means any resource limit the server administrator has applied to the user account is irrelevant here.

So in summary, a user who has the ability to host content on a server is able to severely slow down the whole server for more than a day with a single HTTP request. This can only be stopped by restarting the server. For an administrator it will be nontrivial to figure out what's going on: he'll only see a server process running amok and won't have any easy way to trace that back to a user account.

Obviously the malicious user can also open multiple connections, but in my tests at least this didn't cause more resource exhaustion, so it seems all password hashing jobs were processed by the same Apache process. It will probably run longer, but I haven't investigated that in detail. This behavior may differ depending on the server configuration.

As an easy fix I proposed to the Apache developers to limit the computing time of the bcrypt hash. While a slower password hash can be a security improvement, there is no reason to have a hash function that runs longer than a few seconds at best. Here's a patch against apr-util - the package that contains Apache's bcrypt function - that will reject computing time values larger than 17. On my system that runs for 8 seconds, which some may argue is already too much. But it will prevent very simple DoS scenarios.
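
The idea behind such a limit can be sketched in a few lines of C. This is not the actual apr-util patch: the function names and the MAX_BCRYPT_COST constant are made up for illustration, and only the "$2y$NN$" prefix is parsed, the salt and digest are not validated.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Limit proposed in the text; larger computing time values are rejected. */
#define MAX_BCRYPT_COST 17

/* Returns the two-digit cost from a "$2y$NN$..." string, or -1 if the
 * prefix is malformed. */
int bcrypt_cost(const char *hash)
{
    assert(hash != NULL);
    if (strlen(hash) < 7 || hash[0] != '$' || hash[1] != '2' ||
        hash[3] != '$' || hash[6] != '$')
        return -1;
    if (!isdigit((unsigned char)hash[4]) || !isdigit((unsigned char)hash[5]))
        return -1;
    return (hash[4] - '0') * 10 + (hash[5] - '0');
}

/* Returns 1 if the cost lies between 4 and MAX_BCRYPT_COST, else 0. */
int bcrypt_cost_allowed(const char *hash)
{
    int cost = bcrypt_cost(hash);
    return cost >= 4 && cost <= MAX_BCRYPT_COST;
}

/* Number of key expansion rounds: doubles with every cost step. */
unsigned long long bcrypt_rounds(int cost)
{
    assert(cost >= 4 && cost <= 31);
    return 1ULL << cost;
}
```

With these helpers the $05 entry from above is accepted and the $31 entry is rejected. bcrypt_rounds(31) / bcrypt_rounds(17) is 16384, so the 8 seconds measured for a value of 17 extrapolate to roughly 36 hours for 31, which is the same order of magnitude as the 30 hours mentioned above.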

Is Apache inherently unable to protect against malicious users?

The Apache developers mostly rejected the notion that this is a security problem or any problem at all. I got statements that argue that Apache is inherently unable to defend against a user DoS'ing the server if a user is allowed to use .htaccess and that server operators shouldn't give untrusted users access to configuration files.

This is notable, because the ability to allow users a certain kind of configurability via .htaccess is actually a very distinctive feature of Apache and is widely used in shared web hosting environments. Apache has lost a lot of market share to Nginx in the past years, yet one of the major reasons some people don't want to switch to Nginx is that it has no comparable feature. In essence I get the feeling that the Apache developers consider the one feature that distinguishes them from many competitors as being inherently dangerous and discouraged.

One concern that was raised by one of the Apache developers was that even if the bcrypt password hash is capped in its execution time a user can do the same thing via password hashes supported by the C standard library. Apart from its own implementations Apache supports all password hashes provided by the crypt() function. However the concern here is much smaller. The maximum one can achieve is a running time of several minutes with the SHA512-based password hash and 999,999,999 iterations. I submitted a patch to Glibc that limits the execution time to a sane value (no reaction in the bug report yet).

To my surprise the musl libc already capped the running time of the SHA256 and SHA512 password hashing functions - and the code comment by Rich Felker explicitly mentions that this was done to avoid DoS.

Another issue that was raised and that I haven't investigated further is that a user could cause a similar denial of service by creating a crazy regular expression in the .htaccess file (the mod_rewrite functionality supports regular expressions, see also Regular expression Denial of Service - ReDoS).

Separating from other issues

I want to point out that there are other security issues that shouldn't be confused with this one. There's an issue with hash table implementations named HashDoS (see this 29C3 talk) that can cause very slow running times of hash tables. It sounds similar, but it's a very different issue.

Another issue is the run time of password hashes for long inputs. Especially the SHA512-based password hash supported by glibc is exposed to this, because its running time grows quadratically with the size of the input. OpenSSH recently has restricted the password length to 1024 characters. There are probably other applications affected by this. Yet again this is a different issue.

Conclusion and comment

While the Apache developers were unwilling to accept that this is a potential security issue, they eventually patched it (the patch is incomplete, but we'll sort that out). So it will be fixed eventually.

The problem highlights several issues. It shows that the user permission concept of Apache is quite questionable. A user has a lot of control over the operations of a server - this also leads to other security problems in multi-user settings that are very hard to avoid. Ideally all user-controlled actions should run under the user account. This can be achieved with the nonstandard mpm-itk, but it can't properly handle use cases where different hosts are accessed over the same connection. It also seems non-ideal that server processes can execute code that can continue to run for hours after a connection has been terminated.

Apart from the specific issue in Apache, the idea of abusing password hashes for Denial of Service might work in other settings. It can be applied wherever a user is in control of the hash of his password. Commonly this is not the case, but it would be worthwhile to investigate whether there are similar situations.

I've published some proof of concept examples here.

Logo source, Copyright: The Apache Software Foundation, License: Apache License, Version 2.0

Fuzzing Irssi with Perl Scripts

When using fuzzing tools like afl a common challenge is how you can pass input to the interesting parts of the application you want to fuzz. In easy situations we have a tool that will accept our input as a file or via stdin. However sometimes this is not easily possible.

Let's have a look at Irssi, an IRC chat client. The only input you can pass on the command line is a config file. Fuzzing Irssi 0.8.10 easily led to a segfault caused by a null pointer access. However while bugs in config file parsers probably still should be fixed, usually they are not very interesting. (There can be exceptions.)

So what else might be interesting? Irssi does some parsing on all output, e.g. due to color codes. However we can't just print text that is passed via the command line as an input file. We have to abuse Irssi's perl scripting capability for that.

We can place a simple perl script that will read a file (fuzzp.txt) and print it into Irssi's autorun directory (default location is ~/.irssi/scripts/autorun/). We can then place some examples of Irssi color codes into the directory "in/". I have installed an afl/asan-instrumented Irssi into my system to /usr/local/, because for running perl scripts it needs more than just the executable. So we can run afl like this:

afl-fuzz -i in -o out -m none -f fuzzp.txt irssi

afl will put the fuzzed output into fuzzp.txt and our autorun script will read it from there. Doing this lets us learn that the sequence "%[" causes Irssi to read an invalid memory byte. For reasons unclear to me this only happens if a script outputs this sequence, not if a user types it in. (This issue got CVE-2017-5196 assigned.)

We can go further and do a similar script that executes input as a command. Commands are things like "/QUIT" that control the application and the channel behavior. I named the input file fuzzc.txt, so we can place some simple Irssi commands into in/ and run:

afl-fuzz -i in -o out -m none -f fuzzc.txt irssi

Thus we will now fuzz Irssi's command processing.

As we have seen, scripting functionality can be used to fuzz an application. So if you want to fuzz something and don't know how to pass input: See if there's a scripting functionality.

Irssi has issued a security advisory for several security vulnerabilities, including the out of bounds read mentioned above. All vulnerabilities and the config file parser segfault are fixed in 0.8.21 and 1.0.0.

Update on MatrixSSL miscalculation (CVE-2016-8671, incomplete fix for CVE-2016-6887)

I recently reported how I found various bugs in the bignum implementation of MatrixSSL, some of them leading to remotely exploitable vulnerabilities.

One of the bugs was that the modular exponentiation function - pstm_exptmod() - produced wrong results for some inputs. This wasn't really fixed, but only worked around by restricting the allowed size of the modulus. Not surprisingly it is still possible to find inputs that cause miscalculations (code). I reported this to MatrixSSL on August 1st.

Recently MatrixSSL released another update (3.8.6) fixing several vulnerabilities reported by Craig Young from Tripwire. However the pstm_exptmod() bug is still there. (The incomplete fix got assigned CVE-2016-8671.)

It is unclear how exploitable such bugs are, but given that it's used in the context of cryptographic functions handling secret key material this is clearly a reason for concern.

MatrixSSL has long advertised itself as a safer alternative to OpenSSL, because it didn't suffer from the same kind of high severity bugs. I think it has been sufficiently shown that this was due to the fact that nobody was looking. But what's more worrying is that bugs they knew about for several months now don't get fixed properly.

Out of bounds heap bugs in glib, heap buffer overflow in gnome-session

By testing GNOME-related packages with Address Sanitizer I recently discovered several trivial to find bugs.

Two out of bounds bugs in the glib library were uncovered by running the test suite with Address Sanitizer enabled. One heap buffer overflow in the parameter parsing of gnome-session was uncovered by trying to start GNOME. That these bugs weren't discovered earlier means that most likely nobody ever used Address Sanitizer to test GNOME components.

I strongly recommend that GNOME and other software communities use Address Sanitizer testing in order to improve the quality of their software.

Out of bounds read in g_unichar_iswide_bsearch() / glib
Upstream bug report (again reported here)
Commit / fix
Fixed in 2.48.2.

Out of bounds read in token_stream_prepare() / glib
Upstream bug report
Commit / fix
Fixed in 2.48.0.

Heap buffer overflow in gnome-session
Upstream bug report
Commit / fix
Fixed in 3.20.2.

Multiple vulnerabilities in RPM – and a rant

Last year in November I decided that it might be a good idea to fuzz the parsers of package management tools in Linux distributions. I quickly found a couple of issues in DPKG and RPM. For DPKG the process went very smoothly. I reported them to Debian's security team, and eight days later fixes and security advisories were published by both Debian and Ubuntu, the main distributions using DPKG. For RPM the process was a bit more difficult.

If you want to report a bug to RPM you first may wonder where to report it. The RPM webpage [1] is a trac installation which has its own bug tracker. However if you try to register an account there you'll get forwarded to an HTTPS site with an expired certificate that doesn't match the domain name. In case you are brave and tell your browser to ignore all warnings you'll be greeted by a broken-looking trac without any CSS. Should you proceed and create an account you will learn that this doesn't help you, because in order to be allowed to report a bug you first have to ask on the mailing list or in the IRC channel for permission [2]. That's probably the point where many well-meaning bug reporters give up.

Okay, but RPM originally stood for “Red Hat package manager” (I've been told that today it stands for RPM Package Manager), so maybe Red Hat feels responsible. So I reported three bugs with sample files triggering them to the Red Hat security team on November 20th. The answer was – to put it mildly – a bit unsatisfying. I'll just fully quote it: “Thanks for the report. We also received about 30+ crash reports in RPM from a different reporter recently so processing all of them (yours and the others) will take quite a bit of time. We simply don't have the resources to spend hours upon hours analyzing all crash reports.”

Okay, so I wasn't the only one fuzzing RPM and maybe the bugs will be fixed some day. I waited. In the meantime I got contacted by another person who also had tried to report fuzzing bugs in RPM and who had similar experiences (maybe the same person who reported the 30+ crashers, I don't know).

In February I decided to ask what the state of things is. I also gave them a 30 day period until I'd publish the bugs (I know that it's now long past that, I shouldn't have let this issue wait so long). I ended up having a call with a Red Hat security team member and exchanged a couple of mails. I learned that RPM has a GitHub repository [3], which contains fixes for some (but not all) of the issues I reported. However, that repository is not referenced anywhere on its webpage. I then fuzzed the current RPM git code again and found two more issues, which I also reported to the Red Hat security team.

Status today is that the latest release of RPM on its webpage – 4.12.0.1 – is from July 2015, so all of the bugs still affect this release. However it seems there is an unofficial 4.13 release that's nowhere to be found on the RPM webpage, but Red Hat is using it together with some fixes [4]. And the GitHub repository says the latest release is 4.12.0, so according to three different sources three different versions are the current one (4.12.0, 4.12.0.1, 4.13).

One of the bugs – a stack overflow (write) – is still present in the latest code on GitHub.

Comment and Conclusion

This blog post probably reads like a big rant about how unprofessional Red Hat is in handling potential security issues. But this is contrary to my usual experience. I often see Red Hat developers being very active in the free software security community and often contributing in a positive way. Quite simply I expect better from Red Hat. This is not some dubious Enterprise vendor where I wouldn't be the least bit surprised by such a reaction.

The development process of RPM seems to be totally chaotic. It's neither clear where one reports bugs nor where one gets the latest code, and security bugs don't get fixed within a reasonable time.

There have been some recent events that make me feel especially worried about this: An unknown person has created an entry in the Libarchive issue tracker [5] that points to an anonymous document [6] with a very detailed description of various security weaknesses in the FreeBSD update process (most of them are still unfixed). The most worrying thing about this is however that the anonymous post mentions the existence of similar documents affecting multiple Linux distributions. These documents haven't shown up publicly yet and given the unclear nature of this incident it's hard to know whether they ever will become public or exist at all. But this should still be reason enough to have a closer look at potential security vulnerabilities in all pieces of Linux package management systems.

I haven't analyzed the RPM installation process in detail, so I can't say how likely it is that the RPM tool ever sees a malformed input file. It seems downloads happen over HTTP, but the first thing that happens is a signature check. As the signature is part of the RPM file it already needs to be parsed for this. The exact impact of these bugs would require further analysis. But independent of how likely this is I think the parser in such a crucial piece of software should be robust. It should be safe to use the rpm tool to show info about a file on the command line.


[1] http://rpm.org/
[2] http://rpm.org/wiki/ReportingBugs
[3] https://github.com/rpm-software-management/rpm
[4] http://pkgs.fedoraproject.org/cgit/rpms/rpm.git/diff/rpm-4.13.0-rpmtd-out-of-bounds.patch?h=f22&id=165614f3dd42caa188f78b55e7723dad2900b2f4
[5] https://github.com/libarchive/libarchive/issues/743
[6] https://gist.github.com/anonymous/e48209b03f1dd9625a992717e7b89c4f

All bugs were found with the help of american fuzzy lop. Here are the bugs:

Stack Overflow in glob() / rpmglob.c.
Sample file (test with rpm -i [input]):
https://crashes.fuzzing-project.org/rpm-stackoverflow-glob.rpm
Unfixed in the current Git code.

Heap out of bounds read in headerVerifyInfo() / header.c.
Sample file (test with “rpm -i [input]”):
https://crashes.fuzzing-project.org/rpm-heap-oob-read-headerVerifyInfo.rpm
Git commit:
https://github.com/rpm-software-management/rpm/commit/8e847d52c811e9a57239e18672d40f781e0ec48e

Null pointer access / segfault in stringFormat() / formats.c
Sample file (test with “rpm -i [input]”):
https://crashes.fuzzing-project.org/rpm-nullptr-rpmtdFormat.rpm
Git commit:
https://github.com/rpm-software-management/rpm/commit/cddf43a56f19711866371f02f378dc4095b0fadd

Out of bounds read in rpmtdGetNumber() / rpmtd.c
Sample file (test with “rpm -qi -p -- [input]”)
https://crashes.fuzzing-project.org/rpm-heap-oob-read-rpmtdGetNumber.rpm
Git commit:
https://github.com/rpm-software-management/rpm/commit/b722cf86200505b3e3fcbb2095c4ff61f1f5a2ab

Finally one annoying thing to admit: In my original report I included another segfault in headerVerifyInfo() with unclear reasons. However I am now unable to reproduce this one. It may be due to compiler options, different command line parameters or dependencies on my system that have changed. For completeness I'm still providing the sample file:
https://crashes.fuzzing-project.org/rpm-segfault-headerVerifyInfo.rpm
(Ideally the RPM developers should include all those sample files in a test suite they regularly run against an address sanitizer build of RPM.)

Please also note that I expect this list to be incomplete and there are likely more issues that could be uncovered with further fuzzing. I'll test that once all the existing issues are fixed.

Fun with Bignums: Crashing MatrixSSL and more

If you've been following my fuzzing work you will be aware that I've fuzzed various bignum libraries and found several bugs by comparing implementations against each other.

I recently had a look at the MatrixSSL's modular exponentiation function, for reasons I'll explain later. I wrote a wrapper, similar to previous experiments, comparing its result to OpenSSL.
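
The principle of such a comparison can be shown without any library dependency. The sketch below is not my actual wrapper: instead of MatrixSSL and OpenSSL it compares two small self-written modular exponentiation functions (modexp_naive and modexp_fast, both hypothetical names) on machine-sized integers, and it assumes moduli small enough that the intermediate products fit into 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation: repeated multiplication, slow but simple. */
uint64_t modexp_naive(uint64_t base, uint64_t exp, uint64_t mod)
{
    assert(mod != 0);
    uint64_t result = 1 % mod;
    for (uint64_t i = 0; i < exp; i++)
        result = (result * (base % mod)) % mod;
    return result;
}

/* Implementation under test: square-and-multiply. */
uint64_t modexp_fast(uint64_t base, uint64_t exp, uint64_t mod)
{
    assert(mod != 0);
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

/* Runs both implementations on all moduli and exponents below limit and
 * all bases from 0 up to and including the modulus, so the corner cases
 * "base zero" and "base equal to modulus" are covered. Returns the number
 * of inputs on which the results differ. */
unsigned count_mismatches(uint64_t limit)
{
    unsigned mismatches = 0;
    for (uint64_t mod = 1; mod < limit; mod++)
        for (uint64_t base = 0; base <= mod; base++)
            for (uint64_t exp = 0; exp < limit; exp++)
                if (modexp_naive(base, exp, mod) != modexp_fast(base, exp, mod))
                    mismatches++;
    return mismatches;
}
```

A fuzzer replaces the exhaustive loops with generated inputs and the two toy functions with the real libraries, but the check stays the same: any input for which the results differ points to a bug in at least one of the implementations.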

I immediately noted that the pstm_exptmod() function of MatrixSSL has certain limitations that weren't documented. If one tries to calculate a modular exponentiation with the base equal to the modulus (a^b mod a, code) it would return an error. If one tries to calculate a modular exponentiation with the base zero (0^b mod a, code, CVE-2016-6885) it would crash with an invalid free operation, potentially leading to memory corruption.

In normal cryptographic operations these values should never appear. But these values are in many situations attacker controlled. One situation is during an RSA key exchange. What happens here is that a client encrypts a random secret with the server's key. However a malicious client could simply send a zero or the key's modulus here. I created a patch against OpenSSL that allows testing this. Both values crash the MatrixSSL server. However the crash does not seem to happen in pstm_exptmod(), it hits another bug earlier (CVE-2016-6886). In both cases the crash happens due to an invalid memory read in the function pstm_reverse(), which is not prepared for zero-sized inputs and will underflow the len variable.
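
To illustrate this kind of underflow, here is a minimal sketch of an in-place byte reversal. It is not MatrixSSL's code; reverse_bytes is a hypothetical helper and the unsigned length type is an assumption. It only shows why a zero length needs explicit handling when the end index is computed as len - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses len bytes in place. Hypothetical helper for illustration. */
void reverse_bytes(unsigned char *s, size_t len)
{
    /* Without this check, len - 1 below would wrap around to SIZE_MAX
     * for len == 0 and the loop would read and write far outside the
     * buffer. */
    if (len == 0)
        return;
    assert(s != NULL);

    size_t lo = 0;
    size_t hi = len - 1;
    while (lo < hi) {
        unsigned char tmp = s[lo];
        s[lo] = s[hi];
        s[hi] = tmp;
        lo++;
        hi--;
    }
}
```

Built with Address Sanitizer, a version without the early return should report an out of bounds access as soon as it is called with a zero length.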

The crashes have been fixed in 3.8.4, but the pstm_exptmod() function still doesn't accept these inputs. However it no longer crashes with a zero base. It may be possible that these issues can be triggered through other code paths. I haven't tested Diffie Hellman key exchanges, which also allow putting attacker-controlled values into a modular exponentiation.

This is an interesting class of bugs. Bignum functions often aren't designed to handle all inputs and only consider values that make sense in the context of the cryptographic operations. However if they are attacker-controlled this may lead to problems. I just discovered a somewhat similar issue in Nettle. They switched their RSA implementation from GMP's mpz_powm() function to mpz_powm_sec(), which is supposed to be sidechannel resistant. However mpz_powm_sec() is no drop-in replacement. Unlike mpz_powm() it doesn't accept even moduli and crashes with a floating point error. Therefore trying to use a specially crafted RSA key with an even modulus will crash. Fortunately this was discovered before the change made it into a release.

But back to MatrixSSL: Independent of these corner case values that lead to failures I was able to identify an input value that caused a wrong calculation result (code, CVE-2016-6887).

There's a particularly severe risk with calculation errors in the modular exponentiation when it comes to the RSA algorithm. A common way to speed up the calculation of RSA signatures is an algorithm based on the Chinese remainder theorem (CRT) that splits it up into two smaller calculations. However if one of these calculations goes wrong an attacker can learn the private key. Last year Florian Weimer observed that various devices had this error and he could extract their keys. He recently mentioned on the oss-security mailing list that he also observed this in devices using MatrixSSL.
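
Why a single wrong half is enough can be shown with a toy example. The sketch below uses textbook-sized numbers (p = 61, q = 53, e = 17, d = 2753) and made-up function names, it has nothing to do with MatrixSSL's code, and the fault is simulated by adding 1 to the half computed modulo q. The faulty signature is still correct modulo p, so raising it to the public exponent and subtracting the message gives a multiple of p but not of q, and a gcd with the modulus reveals p.

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply; only meant for the toy sizes used here. */
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    assert(mod != 0 && mod < (1ULL << 32));
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* RSA-CRT signature of m with primes p and q and private exponent d.
 * If inject_fault is nonzero, the half computed modulo q is corrupted
 * to simulate a miscalculation. */
uint64_t crt_sign(uint64_t m, uint64_t p, uint64_t q, uint64_t d,
                  int inject_fault)
{
    uint64_t sp = modpow(m, d % (p - 1), p);
    uint64_t sq = modpow(m, d % (q - 1), q);
    if (inject_fault)
        sq = (sq + 1) % q;
    /* Recombination: s = sq + q * ((sp - sq) * q^-1 mod p). The inverse
     * of q modulo the prime p is computed as q^(p-2) mod p. */
    uint64_t q_inv = modpow(q, p - 2, p);
    uint64_t h = (q_inv * ((sp + p - sq % p) % p)) % p;
    return sq + q * h;
}

/* The attacker's side: needs only the message, the faulty signature and
 * the public key (e, n). Returns gcd(s^e - m, n). */
uint64_t recover_factor(uint64_t m, uint64_t s_faulty, uint64_t e, uint64_t n)
{
    uint64_t diff = (modpow(s_faulty, e, n) + n - m % n) % n;
    return gcd_u64(diff, n);
}
```

For the message 65, recover_factor(65, crt_sign(65, 61, 53, 2753, 1), 17, 3233) yields the prime 61. With a correct signature the difference is zero and the gcd is just the modulus 3233, which tells the attacker nothing.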

The way the MatrixSSL team "fixed" the miscalculation issue is not really satisfying: They now restrict the input to the pstm_exptmod() function to a set of bit sizes (512, 1024, 1536, 2048, 3072, 4096). My test input had a different bit size, therefore I cannot reproduce the miscalculation any more, but the underlying bug is most likely still there. I've tried to find inputs matching these restrictions and still causing wrong results, but without success yet. Independent of that the restriction means that connections to sites with unusual key sizes or Diffie Hellman moduli will no longer work. While they are not common, there is no rule that RSA keys or Diffie Hellman moduli need to have certain sizes.

Despite the fact that the bug may be still there the CRT attack will probably no longer work. A protection mechanism against that was implemented in version 3.8.3.

I got told by the MatrixSSL developers that their bignum code is based on libtommath. Therefore I also checked if the same bugs appeared there. That wasn't the case. The test input causing wrong results in MatrixSSL was correctly calculated by libtommath and it was also capable of correctly using a zero base or a base equal to the modulus.