Diving into Control Flow Integrity
To improve security, modern systems contain many mitigation strategies that try to make it harder to exploit security vulnerabilities. Commonly used strategies include stack canaries, address space layout randomization (ASLR) and nonexecutable memory pages. Unfortunately, Linux distributions have been slow to adopt ASLR, but this is finally changing.
A new set of mitigation technologies has been discussed for a while under the umbrella term "Control Flow Integrity" (CFI). I won't get into technical details, but the very general idea is to add checks to the code that prohibit jumps to parts of the code that are not supposed to be reached in the normal operation of a program.
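To make the general idea a bit more concrete, here is a rough C sketch of a hand-written check in front of an indirect call. It is only an illustration of "check the target before you jump": Clang's CFI does not work with an allow-list like this (its indirect call check looks at function types instead), and all names here (handler_fn, checked_call, not_registered and so on) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type used only in this sketch. */
typedef int (*handler_fn)(int);

int double_it(int x) { return 2 * x; }
int negate(int x) { return -x; }

/* Same type as the handlers above, but deliberately not on the allow-list. */
int not_registered(int x) { return x; }

/* The only targets this call site is supposed to reach. */
static const handler_fn allowed_handlers[] = { double_it, negate };

static int is_allowed(handler_fn fn)
{
    size_t n = sizeof(allowed_handlers) / sizeof(allowed_handlers[0]);
    for (size_t i = 0; i < n; i++) {
        if (allowed_handlers[i] == fn)
            return 1;
    }
    return 0;
}

/* Calls fn(arg) only if fn is an expected target.
   Returns 0 and stores the result on success, -1 if the target is rejected. */
int checked_call(handler_fn fn, int arg, int *result)
{
    assert(result != NULL);
    if (fn == NULL || !is_allowed(fn))
        return -1;
    *result = fn(arg);
    return 0;
}
```

Calling checked_call(double_it, 21, &r) succeeds, while checked_call(not_registered, 1, &r) is refused. A compiler-based scheme generates checks of this kind automatically and aborts the program rather than returning an error code.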
LLVM's Clang compiler has supported a form of CFI since version 3.7. Other forms of CFI are available on Windows in Visual Studio (CFGuard) and with Grsecurity (RAP).
Recently I experimented a bit with the Clang version of CFI. It's been one of those situations where you dig into a topic and find out that it's really hard to Google for advice. The information out there is relatively limited: There's the official LLVM documentation, a talk by Kostya Serebryany that briefly mentions CFI in the second half and two blog posts by the company Trail of Bits. Also Chrome is using a subset of the CFI functionality and there's a list of bugs found with it.
Because information is so scarce, you will run into situations where things fail and you won't find any help via Google.
So why would you want to use CFI? One possibility would be to create a super-hardened Linux system where, beyond using the "old" exploit mitigations like ASLR, one would also enable CFI. The computational costs of doing so are relatively small (Kostya Serebryany mentions in the talk above that they were unable to measure the CPU cost in Chrome). The executables grow in size and likely use more memory, but not in extraordinary amounts (below 10 percent). So from a performance side this is doable.
I started by compiling some small applications with CFI to see what happens. In some cases they "just work". In most cases I ended up with strange linker errors that were nontrivial to debug. Interesting for Linux distributions: There seems to be no dependency chain that needs to be considered (which is different from many of Clang's Sanitizer features). It's possible to have an executable built with CFI that depends on libraries not using CFI, and it's also possible to have libraries using CFI that are used by executables not using CFI. This is good: If we strive for our super-hardened Linux we can easily exclude packages that create too many problems, or start with the packages we think would profit most from such extra protection.
CFI itself is enabled with the flag -fsanitize=cfi. It needs two other flags to work: -fvisibility=hidden, which hides unnecessary symbols, and -flto, which enables Link Time Optimization. The latter needs the Gold linker; depending on your system you may need to install the LLVM gold plugin. If Gold isn't your default linker you can pass -fuse-ld=gold. Furthermore, if you want to debug things, add -fno-sanitize-trap=all and enable extended debugging with -g. This will give you useful error messages in case CFI stops your application. (However, you probably don't want to do that in production systems, as the error reporting can introduce additional bugs.)
In theory some of these flags only need to be passed to the compiler and others to the linker, but build systems aren't always strict in separating those, so we just add all of them to both.
So we can start compiling the current version of curl (7.53.1):
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto"
make
However, we'll end up getting an error when it tries to link the shared library. I guess it's a libtool problem, but I haven't dug into it. For now we can just disable the shared library by adding --disable-shared:
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto" --disable-shared
make
We end up getting a curl executable in src/curl. It runs, but as soon as we try to download something (e.g. src/curl google.com) it will just show "Illegal instruction" and quit. What's going on? Let's recompile, but this time with CFI error messages and some more debugging:
./configure CC=clang CXX=clang++ LD=clang CFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all -g" CXXFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all" LDFLAGS="-fsanitize=cfi -fvisibility=hidden -fuse-ld=gold -flto -fno-sanitize-trap=all" --disable-shared --enable-debug
make
Now the output gets more verbose:
sendf.c:578:22: runtime error: control flow integrity check for type 'unsigned long (char *, unsigned long, unsigned long, void *)' failed during indirect function call
/mnt/ram/curl-7.53.1/src/tool_cb_hdr.c:44: note: tool_header_cb defined here
In tool_cb_hdr.c we find this function definition:
size_t tool_header_cb(void *ptr, size_t size, size_t nmemb, void *userdata)
The code in sendf.c it complains about is this:
size_t wrote = writeheader(ptr, 1, chunklen, data->set.writeheader);
The writeheader function is a function pointer variable of type curl_write_callback. This is defined in curl.h:
typedef size_t (*curl_write_callback)(char *buffer, size_t size, size_t nitems, void *outstream);
So we have a function pointer of type curl_write_callback pointing to the function tool_header_cb. If you look closely at both definitions you'll spot that they don't match: The first parameter is a void* for tool_header_cb and a char* for curl_write_callback. CFI doesn't allow such indirect function calls if the function types don't match exactly. The fix is to align them. In this case I proposed to change tool_header_cb to also use char*. There's a second function, tool_write_cb, which has exactly the same problem. The patch is already applied to curl's git repository.
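The shape of that fix can be sketched in a few lines of C. This is not curl's actual code: write_callback is a stand-in for curl_write_callback, and header_cb and deliver are hypothetical names I made up for illustration. The point is only that the callback's parameter list matches the function pointer type exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for curl_write_callback from curl.h (hypothetical name). */
typedef size_t (*write_callback)(char *buffer, size_t size,
                                 size_t nitems, void *userdata);

/* A callback whose first parameter is `void *` would have the type
   size_t (void *, size_t, size_t, void *), which is not the type the
   pointer above promises. Declaring it with `char *` makes both agree. */
size_t header_cb(char *buffer, size_t size, size_t nitems, void *userdata)
{
    size_t *total = userdata;   /* running byte count kept by the caller */
    (void)buffer;               /* contents are not inspected in this sketch */
    *total += size * nitems;
    return size * nitems;
}

/* Hypothetical caller that makes the indirect call, roughly in the way
   the line from sendf.c above does. */
size_t deliver(write_callback cb, char *data, size_t len, void *userdata)
{
    assert(cb != NULL);
    return cb(data, 1, len, userdata);
}
```

With matching types, the assignment write_callback cb = header_cb; needs no cast. A cast at that point is usually a hint that the two types disagree, and it is exactly the kind of call CFI will stop.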
With this patch applied curl runs again. There was a second issue that showed up when running curl's test suite. This one was a mismatch between a signed and an unsigned char pointer. It was a bit trickier, because further down the function expected unsigned values, so it needed an explicit cast. That fix has been applied as well.
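A fix of this kind typically looks like the following sketch, again with made-up names (count_high_bytes is not a curl function): the parameter keeps the char* type that the callback pointer requires, and the conversion to unsigned char happens inside the function with an explicit cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: the signature uses `char *` so it matches the
   function pointer type, but the body wants to compare byte values as
   unsigned numbers. */
size_t count_high_bytes(char *buffer, size_t size, size_t nitems,
                        void *userdata)
{
    /* Explicit cast: from here on the data is treated as unsigned bytes. */
    const unsigned char *bytes = (const unsigned char *)buffer;
    size_t len = size * nitems;
    size_t *high = userdata;

    assert(high != NULL);
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] >= 0x80)   /* never true for a plain signed char */
            (*high)++;
    }
    return len;
}
```

Without the cast, a comparison like bytes[i] >= 0x80 would silently never match on platforms where char is signed, which is why the signature could not simply be swapped without touching the body.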
A good question is how relevant these issues are. On the assembly level pointers are just pointers and there's no difference between a char* and a void*. But C sometimes fails in interesting and unexpected ways, so it's certainly better to fix those bugs. I don't know whether there's a realistic scenario in which such a bug ends up being a security bug. If you are aware of one, please post a comment.
Notable here is that CFI is designed to be a runtime hardening feature, but given that it interrupts program execution when it hits certain bug classes, it is also a bug-finding tool.
From my layman's understanding, CFI is more likely to catch and prevent C++ bugs, because indirect function calls are much more common in object-oriented programming. But given that most C++ applications are more complex, you are also more likely to run into problems when trying to compile them with CFI.
For now I think it's a good idea for C/C++-based projects to test their code with CFI. This lays the groundwork for future projects where people might want to create hardened systems with CFI enabled. But it can also help to find bugs. One project that has already uncovered many bugs with CFI is Chrome. If you follow Chrome's release notes you may have noticed that they often attribute security fixes to CFI.