A Quantitative Comparison of Coverage-Based Greybox Fuzzers

Natsuki Tsuzuki, Nagoya University, Japan
Norihiro Yoshida, Nagoya University, Japan
Koji Toda, Fukuoka Institute of Technology, Japan
Kenji Fujiwara, National Institute of Technology, Toyota College, Japan
Ryota Yamamoto, Nagoya University, Japan
Hiroaki Takada, Nagoya University, Japan

Many Coverage-Based Greybox Fuzzers
• AFL (originally developed by Zalewski in '13)
• AFLFast (Böhme et al., CCS '16, TSE '19)
• AFLGo (Böhme et al., CCS '17)
• FairFuzz (Lemieux & Sen, ASE '18)

Questions
- Is the newest fuzzer always better than the others?
- How much does each fuzzer improve on using only the default test suite?

Fuzzers are evaluated by different criteria
AFLFast
  Fuzzing targets:
  - Binutils 2.26 (c++filt, nm, objdump, readelf, size, strings)
  - Coreutils 8.25
  Criteria:
  - # unique crashes
  - ground truth
  - line coverage
AFLGo
  Fuzzing targets:
  - Binutils (detailed information is not found)
  - Diffutils
  - libPNG
  Criteria:
  - basic block coverage
  - ground truth
FairFuzz
  Fuzzing targets:
  - Binutils 2.28 (c++filt, nm, objdump, readelf)
  - tcpdump
  - xmllint
  - mutool draw
  - djpeg
  - readpng
  Criteria:
  - basic block transitions covered
  - # occurrences of specific sequences
It is difficult to compare the experimental results in these papers.

Research Overview
We prepared a unified collection of fuzzing targets
and then compared the existing fuzzers.
Evaluation measures:
- The number of executed paths
- Branch coverage
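As a rough illustration of the two measures, the sketch below reads a path count from an AFL `fuzzer_stats` file and computes branch coverage as a percentage. It assumes that the `paths_total` field corresponds to the number of executed paths, which the slides do not state. The sample text and the helper names are hypothetical.

```python
def parse_fuzzer_stats(text):
    """Parse the 'key : value' lines of an AFL fuzzer_stats file into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split at the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def branch_coverage(covered, total):
    """Return branch coverage in percent (0.0 when there are no branches)."""
    if total == 0:
        return 0.0
    return 100.0 * covered / total


# Hypothetical excerpt of a fuzzer_stats file, for illustration only.
sample = "execs_done        : 100000\npaths_total       : 1234\n"
paths = int(parse_fuzzer_stats(sample)["paths_total"])  # assumed path measure
```

The branch counts passed to `branch_coverage` would come from a separate coverage tool run on the generated inputs. The slides do not say which tool was used.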

Research Questions
RQ1 Is a newer AFL-based fuzzer able to execute a significantly
larger number of paths?
RQ2 Does an AFL-based fuzzer improve branch coverage?
RQ3 Does a newer AFL-based fuzzer always achieve higher
branch coverage?

Fuzzers and Fuzzing Targets
Fuzzers:
- AFL 1.94b
- AFL 2.40b
- AFL 2.49b
- AFL 2.51b
- AFL 2.52b
- AFLFast
- AFLGo
- FairFuzz
Fuzzing targets:
- Binutils 2.26
  (c++filt, nm, objdump, readelf)
- Binutils 2.28
- Binutils 2.32
Each execution of a fuzzer is terminated after 6 hours.
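One way to enforce the 6-hour budget is sketched below: launch the fuzzer as a child process and stop it when the time limit expires. The slides do not describe how the runs were driven, so the paths, directory names, and helper names are assumptions. Only the `-i`, `-o`, and `@@` conventions of `afl-fuzz` are taken as given.

```python
import subprocess

SIX_HOURS = 6 * 60 * 60  # time budget per run, in seconds


def afl_command(afl_fuzz, seeds_dir, out_dir, target_argv):
    """Build an afl-fuzz command line; '@@' in target_argv marks the input file."""
    return [afl_fuzz, "-i", seeds_dir, "-o", out_dir, "--", *target_argv]


def run_campaign(cmd, budget_seconds=SIX_HOURS):
    """Run cmd; return True if it had to be stopped at the time limit."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=budget_seconds)
        return False  # the process exited on its own
    except subprocess.TimeoutExpired:
        proc.terminate()  # send SIGTERM, then wait for the process to exit
        proc.wait()
        return True


# Hypothetical usage (paths are placeholders, not from the slides):
# cmd = afl_command("./afl-2.52b/afl-fuzz", "seeds", "out/readelf",
#                   ["./binutils-2.26/binutils/readelf", "-a", "@@"])
# run_campaign(cmd)
```

Running each fuzzer and target pair through the same wrapper keeps the time budget identical across all configurations.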

Significance test (# paths)
We used the Steel-Dwass test to judge significance.
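(✓ = significant difference, . = no significant difference, - = same fuzzer)
          AFL 1.94b AFL 2.40b AFL 2.49b AFL 2.51b AFL 2.52b AFLFast   AFLGo     FairFuzz
AFL 1.94b     -         .         .         .         .         .         .         ✓
AFL 2.40b     .         -         .         .         .         .         .         ✓
AFL 2.49b     .         .         -         .         .         .         .         ✓
AFL 2.51b     .         .         .         -         .         .         .         ✓
AFL 2.52b     .         .         .         .         -         .         .         ✓
AFLFast       .         .         .         .         .         -         .         .
AFLGo         .         .         .         .         .         .         -         ✓
FairFuzz      ✓         ✓         ✓         ✓         ✓         .         ✓         -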
Answer to RQ1: In most cases, the newest fuzzer, FairFuzz, executes
a significantly larger number of paths.
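For readers unfamiliar with the Steel-Dwass test, the sketch below computes its pairwise statistic using only the standard library. For each pair of groups it ranks the pooled observations (midranks for ties) and standardizes the rank sum of the first group. A pair is usually declared significant when |t| exceeds the studentized range critical value q(α; k, ∞) divided by √2, where k is the number of groups. That lookup is omitted here. The function names and sample data are hypothetical, and this is not the authors' analysis script.

```python
from itertools import combinations
from math import sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def steel_dwass_statistics(groups):
    """Map each pair of group names (a, b) to its standardized rank-sum statistic.

    groups: dict of name -> list of observations (e.g. path counts per run).
    Raises ZeroDivisionError if all observations in a pair are identical.
    """
    result = {}
    for a, b in combinations(groups, 2):
        xa, xb = list(groups[a]), list(groups[b])
        na, nb = len(xa), len(xb)
        n = na + nb
        ranks = midranks(xa + xb)
        rank_sum_a = sum(ranks[:na])
        expected = na * (n + 1) / 2
        # Tie-corrected variance of the rank sum under the null hypothesis.
        variance = na * nb / (n * (n - 1)) * (
            sum(r * r for r in ranks) - n * (n + 1) ** 2 / 4
        )
        result[(a, b)] = (rank_sum_a - expected) / sqrt(variance)
    return result


# Hypothetical path counts from three runs of each of three fuzzers.
example = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [1, 2, 3]}
stats = steel_dwass_statistics(example)
```

A negative statistic for a pair (a, b) means that group a tends to have smaller values than group b.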

Branch coverage
with and without fuzzers
Answer to RQ2: The fuzzers can improve branch coverage.

Significance test (branch coverage)
We used the Steel-Dwass test to judge significance.
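(✓ = significant difference, . = no significant difference, - = same fuzzer)
          AFL 1.94b AFL 2.40b AFL 2.49b AFL 2.51b AFL 2.52b AFLFast   AFLGo     FairFuzz
AFL 1.94b     -         .         .         .         .         .         .         .
AFL 2.40b     .         -         .         .         .         .         .         .
AFL 2.49b     .         .         -         .         .         .         .         .
AFL 2.51b     .         .         .         -         .         .         .         .
AFL 2.52b     .         .         .         .         -         .         .         .
AFLFast       .         .         .         .         .         -         .         .
AFLGo         .         .         .         .         .         .         -         .
FairFuzz      .         .         .         .         .         .         .         -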
Answer to RQ3: The newer fuzzer does not always achieve
higher branch coverage.

Discussion
The results differ between the number of paths and
branch coverage.
Newer fuzzers are not optimized for quality assurance
processes based on branch coverage.
The use of fuzzers can improve branch coverage.

Thank you for listening!
E-mail: yoshida AT ertl.jp
