AFL internals

AFL (American Fuzzy Lop) is one of the most widely used coverage-guided fuzzers, and it has many variants, each suited to a different scenario. This time I wanted to understand a bit more about the AFL fuzzer – this note contains my scribblings on its internals, along with some examples that can help you understand how it works.

Coverage

As I said earlier, AFL is a coverage-guided fuzzer, which means it tries to record which paths are executed in the target. One way of doing this is to simply record the value of the PC register at every basic block (which is what KCOV does) – but AFL has a more effective way of doing this: a bitmap (more like a byte-map)!

Byte-map

AFL uses compile-time instrumentation to add a routine at the end of each basic block. When AFL initialises the target program, it also passes a shared buffer to the program which acts as the bitmap. Every byte of this shared buffer represents a tuple of [previous branch, current branch] (meaning the program passed through the previous branch and reached the current branch).

AFL also assigns each block/branch a random number during the compilation process. This branch identifier distinguishes the blocks from one another and is typically 2 bytes in size.

Now let’s take a look at how the branch identifiers are used to access the bitmap.

Algorithm

Assume that the function AFL instruments into each block is coverage(). Note that the code below is just my interpretation of the actual code.

def coverage():
    global prev
    curr = block_identifier()  # returns identifier of current block
    shared_buf[curr ^ prev] += 1
    prev = curr >> 1

Assume here that prev was set the previous time the code was called, so it already holds (a shifted copy of) the previous block's identifier. The index into the array is curr ^ prev, which gives an (almost) unique value per branch pair.

Such an approach gives more insight into the execution path, since each byte now represents a transition rather than a block. So we can distinguish between executions that hit the same blocks but in a different order.

Note that the reason for the shift operation in the last line of the pseudocode is to preserve the directionality of tuples (without this, A ^ B would be indistinguishable from B ^ A) and to retain the identity of tight loops (otherwise, A ^ A would be trivially equal to B ^ B).
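This is easy to check directly. A small sketch, using hypothetical 1-byte identifiers for two blocks A and B:

```python
# Demonstrates why AFL shifts prev: without it, the tuple index loses
# directionality (A -> B collides with B -> A) and loop identity
# (A -> A and B -> B both map to index 0).
A, B = 0x24, 0xab  # hypothetical 1-byte block identifiers

# Without the shift: index = curr ^ prev
assert (B ^ A) == (A ^ B)        # A -> B indistinguishable from B -> A
assert (A ^ A) == 0 == (B ^ B)   # every tight loop hits index 0

# With the shift: index = curr ^ (prev >> 1)
assert (B ^ (A >> 1)) != (A ^ (B >> 1))  # direction is preserved
assert (A ^ (A >> 1)) != (B ^ (B >> 1))  # loops on A and B differ
```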

Example

Let’s take a small example to show how the above algorithm works. Assume the following scenario: there are 4 basic blocks – A, B, C, D – and AFL assigns them the following identifiers (I have used 1-byte identifiers for simplicity) –

Block A = 0x24
Block B = 0xab
Block C = 0xde
Block D = 0xad

Now imagine an execution in which the blocks run in the order A -> B -> C -> D. Let’s see which locations in the bitmap get accessed.

prev = 0x24 >> 1 = 0x12
 exec  | curr ^ prev | addr | prev
A -> B | 0x12 ^ 0xab | 0xb9 | prev = 0x55
B -> C | 0x55 ^ 0xde | 0x8b | prev = 0x6f
C -> D | 0x6f ^ 0xad | 0xc2 | prev = 0x56

You can see from the above that the bitmap indices hit are 0xb9, 0x8b and 0xc2, representing the tuples AB, BC and CD respectively.

Now let’s try to find the execution trace of D -> C -> B -> A. (With the same block values as above).

prev = 0xad >> 1 = 0x56
 exec  | curr ^ prev | addr | prev
D -> C | 0x56 ^ 0xde | 0x88 | prev = 0x6f
C -> B | 0x6f ^ 0xab | 0xc4 | prev = 0x55
B -> A | 0x55 ^ 0x24 | 0x71 | prev = 0x12

Notice that the indices used this time are 0x88, 0xc4 and 0x71 – totally different from the ones triggered earlier.
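The two tables above can be replayed mechanically. A small sketch that runs the coverage() pseudocode over both traces, using the same hypothetical 1-byte identifiers:

```python
# Replays the coverage() pseudocode over both traces to confirm the
# bitmap indices worked out by hand above.
ids = {"A": 0x24, "B": 0xab, "C": 0xde, "D": 0xad}

def trace_indices(blocks):
    """Return the bitmap indices touched by a sequence of blocks."""
    prev = ids[blocks[0]] >> 1   # seed prev with the first block
    indices = []
    for name in blocks[1:]:
        curr = ids[name]
        indices.append(curr ^ prev)
        prev = curr >> 1
    return indices

assert trace_indices(["A", "B", "C", "D"]) == [0xb9, 0x8b, 0xc2]
assert trace_indices(["D", "C", "B", "A"]) == [0x88, 0xc4, 0x71]
```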

Collisions

An interesting question here is that of collisions. The size of the shared buffer plays a role in minimising them: we can tune the buffer size to find an optimal trade-off (buffer size proportional to the number of identifiers). Take a look at this data, which I picked from lcamtuf’s notes on the subject.

   Branch cnt | Colliding tuples | Example targets
  ------------+------------------+-----------------
        1,000 | 0.75%            | giflib, lzo
        2,000 | 1.5%             | zlib, tar, xz
        5,000 | 3.5%             | libpng, libwebp
       10,000 | 7%               | libxml
       20,000 | 14%              | sqlite
       50,000 | 30%              | -
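The trend in the table is the birthday problem at work. A rough simulation of my own (not lcamtuf's methodology): assign each branch a random 16-bit identifier falling into a 64 KB map and count how many land on an already-taken slot. Real collisions happen on tuple indices rather than raw identifiers, so this is a simplification, but it shows the same shape – more branches, disproportionately more collisions.

```python
import random

def colliding_fraction(branch_count, map_size=1 << 16, seed=0):
    """Fraction of branches whose random ID collides with another's."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    ids = [rng.randrange(map_size) for _ in range(branch_count)]
    unique = len(set(ids))
    return (branch_count - unique) / branch_count

# More branches in the same map -> a larger colliding fraction.
assert colliding_fraction(1_000) < colliding_fraction(20_000)
```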

Using Coverage

So, now that we know the fuzzer maintains a global buffer containing the execution data shown previously, let’s take a look at how the coverage is used to detect new behaviours.

The generated map is compared with existing maps to check whether there is a difference in the hits for each byte. This is much easier than comparing actual PC values. Inputs that do not trigger any change in execution state are discarded.

bits not bytes

Why are the counts necessary? Well, if we only looked at the tuples, then multiple distinct execution states would produce the same result. Let’s look at an example.

Execution trace #1 | A -> B -> C -> D -> E -> A
Execution trace #2 | A -> B -> C -> D -> E -> A -> B -> C -> D -> E

If there were no hit counts, both of these would look identical.
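This is quick to demonstrate: reduced to a set of tuples the two traces are indistinguishable, while their hit counts differ. Block names stand in for tuple indices here.

```python
from collections import Counter

def tuples(trace):
    """All [prev, curr] transitions in a trace."""
    return list(zip(trace, trace[1:]))

t1 = list("ABCDEA")
t2 = list("ABCDEABCDE")

# The two traces touch exactly the same *set* of tuples...
assert set(tuples(t1)) == set(tuples(t2))
# ...but the hit counts tell them apart.
assert Counter(tuples(t1)) != Counter(tuples(t2))
```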

Similarity matrix

AFL tries to find unique execution traces, which means it tries to avoid traces that exhibit the same behaviour. Let’s use lcamtuf’s own example to understand this.

Execution trace #1 | A -> B -> C -> D -> E
Execution trace #2 | A -> B -> C -> A -> E

AFL would consider both of these execution traces unique – since CA and AE are tuples that were not encountered before. However, suppose that after these were processed we added a third execution trace –

Execution trace #3 | A -> B -> C -> A -> B -> C -> A -> B -> C -> D -> E

At this point, AFL would discard #3, since no new tuple was executed. In effect no new branch was discovered, so the execution is not useful to us.

Hit counters

Okay! Now the question of why hit counters are needed comes up again. (Note that all examples are given only to illustrate the point.)

In addition to detecting new tuples, AFL also keeps track of the hit count. However, rather than tracking exact counts, AFL divides the values into several buckets. According to lcamtuf’s write-up these are – 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+. Only if a byte in the map transitions from one bucket to another is the input flagged as new.
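A minimal sketch of that bucketing, using the boundaries listed above (the function name and return labels are my own, not AFL's):

```python
def bucket(count):
    """Map a raw hit count (>= 1) into one of AFL's coarse buckets."""
    if count <= 3:
        return count          # 1, 2 and 3 each get their own bucket
    for limit, label in [(7, 4), (15, 8), (31, 16), (127, 32)]:
        if count <= limit:
            return label
    return 128                # everything above 127 is one bucket

assert bucket(1) != bucket(2)      # 1 -> 2 hits counts as new behaviour
assert bucket(5) == bucket(7)      # 5 -> 7 hits does not (same 4-7 bucket)
assert bucket(40) == bucket(100)   # both fall in the 32-127 bucket
assert bucket(200) == bucket(300)  # 128+ is a single bucket
```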

Mutation

AFL mutates inputs based on coverage so that it can explore as many paths as possible. Now that we have looked at how a new path is recognised, let’s take a look at how the mutation procedure works.

Input queue

AFL has a queue, known as the input queue, which contains inputs that are known to have generated new paths. AFL takes an input off the queue, mutates it randomly and passes it to the program. It then checks whether the mutated input actually generated new coverage – if it did, it is added to the input queue.

This technique produces something called a corpus – essentially a database containing a set of inputs that are known to produce previously unseen executions. These are later used to seed the next set of inputs.
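The loop described above can be sketched in a few lines. Here run_target is a stand-in for executing the instrumented binary – it just derives a fake coverage set from the input bytes so the sketch is runnable; the real feedback would come from the shared bitmap.

```python
import random

def run_target(data):
    """Hypothetical stand-in for running the target and reading coverage."""
    return {b % 16 for b in data}

def fuzz(seed, rounds=200):
    rng = random.Random(0)           # fixed seed for reproducibility
    queue = [seed]                   # the input queue
    seen = set(run_target(seed))     # coverage observed so far
    for _ in range(rounds):
        parent = rng.choice(queue)
        child = bytearray(parent)
        child[rng.randrange(len(child))] = rng.randrange(256)  # mutate
        cov = run_target(bytes(child))
        if cov - seen:               # new coverage -> keep the input
            seen |= cov
            queue.append(bytes(child))
    return queue

corpus = fuzz(b"seed")
assert corpus[0] == b"seed"          # the seed stays in the queue
```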

lcamtuf’s write-up shows statistics demonstrating that this technique achieves more path coverage than one where we simply mutate a single seed input over and over.

Corpus reduction

This exploration technique means we end up with a lot of new test cases whose coverage is a direct superset of existing ones. Hence, AFL periodically reduces the queue, selecting a smaller set of test cases that still cover all the tuples seen so far – making these the most relevant ones to the tool.
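A greedy sketch of the idea: repeatedly pick the input that covers the most not-yet-covered tuples, until every tuple seen so far is covered. AFL's real culling works differently (it favours fast, small inputs per tuple), but this shows the "smaller set, same tuples" goal.

```python
def minimise(corpus):
    """corpus: dict mapping input name -> set of tuples it covers."""
    target = set().union(*corpus.values())   # every tuple seen so far
    chosen, covered = [], set()
    while covered != target:
        # Greedily take the input adding the most uncovered tuples.
        name = max(corpus, key=lambda n: len(corpus[n] - covered))
        chosen.append(name)
        covered |= corpus[name]
    return chosen

corpus = {
    "a": {1, 2, 3},
    "b": {2, 3},        # strict subset of "a": redundant
    "c": {3, 4},
}
kept = minimise(corpus)
assert "b" not in kept   # the redundant input is dropped
assert set().union(*(corpus[n] for n in kept)) == {1, 2, 3, 4}
```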

I am not explaining the implementation here – you can look at it on AFL’s GitHub.

Trimming Input Files

Another important step is cutting down the input file size, because larger inputs mean slower I/O and slower, less effective mutations.

Interestingly, AFL comes with its own built-in trimmer.
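The core idea of trimming can be sketched as: try removing chunks of the input, and keep a removal whenever the observed coverage stays the same. (AFL's actual trimmer varies chunk sizes and compares a checksum of the trace map; coverage() below is a stand-in for running the instrumented target.)

```python
def trim(data, coverage, chunk=4):
    """Shrink data while coverage(data) stays equal to the baseline."""
    baseline = coverage(data)
    while chunk >= 1:
        pos = 0
        while pos < len(data):
            candidate = data[:pos] + data[pos + chunk:]
            if candidate and coverage(candidate) == baseline:
                data = candidate   # removal kept: behaviour unchanged
            else:
                pos += chunk       # removal changed coverage: skip ahead
        chunk //= 2                # retry with finer granularity
    return data

# Hypothetical target whose behaviour depends only on '!' bytes:
cov = lambda d: d.count(b"!"[0])
original = b"aaaa!aaaa!aaaa"
trimmed = trim(original, cov)
assert cov(trimmed) == cov(original)   # same behaviour...
assert len(trimmed) < len(original)    # ...much smaller input
```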

Fuzzing Strategies

<TODO>

Links
