Ubuntu Security Podcast 199


Ubuntu Security Podcast Episode

Paper

While many real-world programs are shipped with configurations to enable/disable functionalities, fuzzers have mostly been applied to test single configurations of these programs. In this work, we first conduct an empirical study to understand how program configurations affect fuzzing performance. We find that limiting a campaign to a single configuration can result in failing to cover a significant amount of code. We also observe that different program configurations contribute differing amounts of code coverage, challenging the idea that each one can be efficiently fuzzed individually. Motivated by these two observations, we propose ConfigFuzz, which can fuzz configurations along with normal inputs. ConfigFuzz transforms the target program to encode its program options within part of the fuzzable input, so existing fuzzers’ mutation operators can be reused to fuzz program configurations. We instantiate ConfigFuzz on six configurable, common fuzzing targets, and integrate their executions in FuzzBench. In our evaluation, ConfigFuzz outperforms two baseline fuzzers in four targets, while the results are mixed in the other targets due to program size and configuration space. We also analyze the options fuzzed by ConfigFuzz and how they affect the performance.

Transcript

Introduction

Hi there!

My name is Andrei, a member of the Ubuntu Security Team. Today we will continue the section started in Episode 194 last month, where we bring cybersecurity academic research closer to the industry.

In this second episode, we will check out a recent paper, published only three months ago in the journal "Transactions on Software Engineering and Methodology", issued by the Association for Computing Machinery. Written by five researchers from various American education institutions, "Fuzzing Configurations of Program Options" approaches the subject of configuration-aware fuzzing and introduces ConfigFuzz, which is a generator of wrappers for programs to be fuzzed. The goal here is to split the random input generated by a standard fuzzer into two parts: one that dictates the configuration of the program, and another that is piped as input data to a process instantiated based on the provided executable.

Key Topics

But before exploring the approach proposed by this paper in depth, let’s take some time to better understand the key concepts it builds on.

Firstly, what is in fact a program? We can consider it a sequence of instructions that are understood by the processor of a computer. More than that, it interacts with the environment in which it runs, namely the operating system, which can be a proxy for the user’s interaction. What makes a program useful is that it can take input data and translate it into other forms. For example, consider a static analysis tool that scans code for security issues; it takes the source files as input and applies a series of analytical heuristics to them in order to report possible security concerns that need to be tackled by the developers.

Besides standard input and files to process, a special form of input data for a program is its configuration. It dictates how the program should behave at runtime, regardless of the input data. The configuration can take different forms, for example command-line options and configuration files. We can think about apt, the standard package management system present in Debian-based distros. It has a configuration file named apt.conf which sets its behavior. For example, the Architecture option from the APT group may be used to override the default architecture for which apt was compiled with an arbitrary, user-controlled value.

Another relevant observation made by the paper we are looking at concerns the possible relationships between various options. The first type of relation is dependence. To continue the apt-related examples, --install-suggests is only meaningful when used in conjunction with the install command. On the other hand, we can have options that contradict each other. Here, we can think about two options that are frequently implemented in CLI programs, namely --silent and --verbose.

The last concept we will introduce here is fuzzing, which leads us to a small story. During a lightning storm in 1988, Professor Barton Miller from the University of Wisconsin observed that garbage characters were introduced in his terminal as a result of electrical interference between the modem and the lightning. He came up with a project for his students, namely The Fuzz Generator, in which random input was thrown at Unix utilities to discover bugs.

Despite 35 years having passed since that moment, this core definition of fuzzing has remained the same. What makes fuzzing work is that these random inputs make the program enter diverse states. Programmers tend to optimize and test a program for normal conditions, so the unusual paths that are executed and the unusual input data may trigger issues that would otherwise go unnoticed. The root causes of the crashes detected by a fuzzer vary from NULL pointer dereferences to running code from a tainted, invalid address stored on the program’s stack, and many others.
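To make the idea concrete, here is a minimal sketch of random-input fuzzing in Python. It is not The Fuzz Generator or any real fuzzer: parse_record is a made-up target with a deliberately planted bug, and fuzz is a hypothetical helper that simply throws random bytes at it and records the inputs that raise an exception.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: a toy parser with a planted bug."""
    if len(data) < 2:
        return 0
    length = data[0]
    # Bug: the length byte is trusted without checking the buffer size,
    # so a large value indexes past the end of the data.
    return data[1 + length]


def fuzz(target, iterations: int, seed: int = 0) -> list:
    """Feed random byte strings to target and collect the ones that crash it."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    crashes = []
    for _ in range(iterations):
        size = rng.randint(0, 16)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception:
            # In this sketch, any uncaught exception stands in for a crash.
            crashes.append(data)
    return crashes


crashes = fuzz(parse_record, iterations=1000)
print(f"{len(crashes)} crashing inputs found")
```

Real fuzzers such as AFL go further by using coverage feedback to decide which inputs to mutate next, but the loop above captures the original idea of random input plus crash detection.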

A metric that is frequently used to assess the performance of a fuzzer is coverage. It describes the percentage of code paths executed during a fuzzing session. A fuzzer with greater coverage executes a higher percentage of the various code paths and hence is more likely to uncover latent issues in the code.

Observations. Preliminary Study

Coming back to the paper, the authors start with two simple observations.

Firstly, fuzzing sessions usually assess the security of the default configuration of a program. The developers or researchers who are running the test will not change it, even though other configurations may yield greater coverage.

Secondly, the authors stated that a configuration takes values from a discrete set.

Based on these two pieces of information, they proposed a fuzzer which generates different configurations in addition to the random data that is fed as input in generic fuzzing sessions.

But before building anything too complex, the researchers started with a preliminary study to validate their idea. The applications nm, gif2png, and ffmpeg were selected as software under test. For each one, they manually generated multiple configurations and attached AFL (american fuzzy lop) as a fuzzer. After the fuzzing sessions finished, the coverage was measured with llvm-cov.

They observed that when the configurations vary, the execution proceeds on distinct branches in the control flow graph as compared to fuzzing sessions with only a single, default configuration. To use FFmpeg as an example, employing various configurations resulted in 13% extra lines being covered.
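As a rough illustration of how such a gain could be computed, the sketch below merges the sets of covered lines from several runs and compares the result with the default run. The line sets are invented for the example and are not data from the paper; coverage_gain is a hypothetical helper, and the paper's exact measurement method may differ.

```python
def coverage_gain(default_lines: set, other_runs: list) -> float:
    """Extra lines covered by the other runs, as a percentage of the default run."""
    combined = set(default_lines).union(*other_runs)
    extra = combined - default_lines
    return 100.0 * len(extra) / len(default_lines)


# Invented data: line numbers reached under three configurations.
default = set(range(1, 41))                                 # lines 1..40
with_option_a = set(range(1, 31)) | {50, 51, 52}
with_option_b = set(range(10, 41)) | {52, 53, 54, 55, 56, 57}

gain = coverage_gain(default, [with_option_a, with_option_b])
print(f"extra lines covered: {gain:.1f}%")
```

Note that the two extra configurations overlap on line 52, which is counted only once because the runs are merged as sets.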

ConfigFuzz

With these promising results, they implemented a configuration-aware fuzzer, named ConfigFuzz, whose purpose was to fuzz the program configuration alongside the usual input.

The system took as input the program to be fuzzed and a grammar. The grammar permitted the definition of all command-line options, with types like Boolean, finite sets of possible values (namely choices), integers, real numbers, and strings. In addition to this, the grammar modeled special cases that could be encountered, like an argument used for indicating the input file, two arguments that depend on each other, two arguments that contradict each other, the maximum length of a generated string, and the maximum number of arguments the program may have.
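The transcript does not show the actual grammar format, so the following is only a sketch of how such a description could be modeled in Python. The Option and Grammar classes, their field names, and the flags --format, --indent, --silent, and --verbose are all assumptions made for illustration; they are not ConfigFuzz's real syntax.

```python
from dataclasses import dataclass


@dataclass
class Option:
    flag: str
    kind: str              # "bool", "choice", "int", "real", or "string"
    choices: tuple = ()    # allowed values when kind == "choice"
    requires: tuple = ()   # flags that must also be present (dependence)
    conflicts: tuple = ()  # flags that must not be present (contradiction)


@dataclass
class Grammar:
    options: dict            # maps a flag to its Option
    max_options: int         # maximum number of arguments per run
    max_string_length: int   # cap for generated string values

    def is_valid(self, selected: list) -> bool:
        """Check a set of selected flags against the modeled constraints."""
        if len(selected) > self.max_options:
            return False
        chosen = set(selected)
        for flag in selected:
            option = self.options.get(flag)
            if option is None:
                return False
            if any(dep not in chosen for dep in option.requires):
                return False
            if any(other in chosen for other in option.conflicts):
                return False
        return True


# Hypothetical option set for an imaginary command-line tool.
grammar = Grammar(
    options={
        "--format": Option("--format", "choice", choices=("xml", "json")),
        "--indent": Option("--indent", "int", requires=("--format",)),
        "--silent": Option("--silent", "bool", conflicts=("--verbose",)),
        "--verbose": Option("--verbose", "bool", conflicts=("--silent",)),
    },
    max_options=3,
    max_string_length=10,
)

print(grammar.is_valid(["--format", "--indent"]))   # dependence satisfied
print(grammar.is_valid(["--silent", "--verbose"]))  # contradiction
```

Encoding dependence and contradiction explicitly lets a generator skip option combinations the program would reject before doing any real work.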

The internals of ConfigFuzz can be summarized as a code generator. It produces, as an output, a C wrapper that translates the input coming from a fuzzer into two parts: one that encodes the options to be used, and another that is directly fed, without further modification, as input data.

Let’s consider a simple program with 2 arguments, an integer and a string with a maximum length of 10 characters. Given an initial input of 100 bytes, the first 4 are converted to an integer, and the next 10 to a string. The remaining 86 bytes are passed to the program as regular input.
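The splitting in this example can be sketched as follows. The real wrapper generated by ConfigFuzz is written in C; Python is used here only for brevity. The byte order (little-endian), the NUL-terminated string handling, and the flag names --count and --name are assumptions, since the transcript does not specify them.

```python
import struct

INT_SIZE = 4            # bytes consumed by the integer option
MAX_STRING_LENGTH = 10  # bytes reserved for the string option


def split_fuzz_input(data: bytes):
    """Split raw fuzzer input into (integer, string, remaining payload)."""
    header = INT_SIZE + MAX_STRING_LENGTH
    if len(data) < header:
        raise ValueError("input too short to hold both options")
    # Assumption: the integer is a little-endian signed 32-bit value.
    (number,) = struct.unpack("<i", data[:INT_SIZE])
    # Assumption: the string ends at the first NUL byte, if there is one.
    raw = data[INT_SIZE:header]
    text = raw.split(b"\x00", 1)[0].decode("latin-1")
    return number, text, data[header:]


# A 100-byte input: 4 bytes for the integer, 10 for the string, 86 of payload.
data = struct.pack("<i", 42) + b"hello" + bytes(5) + bytes(86)
number, text, payload = split_fuzz_input(data)

# Hypothetical flags; a real wrapper would use the names from the grammar.
argv = ["--count", str(number), "--name", text]
print(argv, len(payload))
```

Because the options occupy a fixed-size prefix, the fuzzer's ordinary byte mutations change the configuration and the payload alike without the fuzzer needing to know about either.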

With this wrapping mechanism in place, ConfigFuzz was integrated into Google’s FuzzBench, which was running two well-known fuzzers, AFL and afl++. In this way, the wrapped program could receive random input that was processed with the aforementioned splitting mechanism.

Evaluation

For evaluating ConfigFuzz, the researchers chose 6 programs with command-line options, namely the three mentioned in the preliminary study (nm, gif2png, and ffmpeg), plus objdump, cxxfilt, and xmllint.

Having these programs, they set different configurations for ConfigFuzz. The first three had the maximum number of generated options set to one, two or the program’s maximum number of accepted options. They were instructed to not generate string options in order to ensure consistency in the generated data.

The last two ConfigFuzz configurations had the string options enabled, but limited to one or two occurrences, respectively.

The experimental setup consisted of Intel-based CPUs with a total of 48 cores, 192 GB of RAM, and Ubuntu 16.04 as an operating system.

To cut to the results, let’s look at three observations from xmllint:

As you can see, these results are very specific to xmllint - to see the results obtained for the other programs, I invite you to take a look at the paper, which can be freely accessed in ACM’s Digital Library. Please see the show notes, we’ve included a link there.

Related Work

Before wrapping this episode, let me tell you about an alternative to fuzzing programs’ arguments.

american fuzzy lop has an experimental feature named argv_fuzzing. Compared to ConfigFuzz, it does not require a grammar, thus being capable of detecting security issues in the code parsing the arguments.

However, since development on AFL was discontinued a few years ago, I want to mention that this feature was ported to afl++ too, so you can download the official afl++ Docker image, spin up a container and test this feature by yourself.

Conclusion

Huh, as we reached the end of this episode, let’s see what we were talking about! We started with some core concepts: what is a program, how it interacts with its environment, the configuration and CLI arguments, and our little story with fuzzing and lightning storms.

After that, we've seen the architecture of ConfigFuzz, the solution proposed by the authors, but not before examining the results of the preliminary study. Lastly, we reviewed the results and some related work on arguments’ fuzzing.

This being said, if you have any recommendations for improving this section, please contact us at [email protected]. Until next time!