9. Fast Fuzzing
Producer
Generator
Fuzzer
Program under test
Millions of test inputs
Millions of test cases / sec
"World's fastest fuzzer"
Producer
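produce_member_0: {
    /* GNU C "labels as values": &&label takes a label's address, goto *p jumps to it. */
    /* Reserve a return-stack slot for the resume addresses of this rule's callees. */
    ++returnp;
    *returnp = &&return__0__0__member;
    val = map(2);                      /* choose one of the 2 alternatives of <ws> */
    goto *produce_ws[val];
return__0__0__member:;
    *returnp = &&return__1__0__member;
    val = map(1);
    goto *produce_string[val];
return__1__0__member:;
    *returnp = &&return__2__0__member;
    val = map(2);
    goto *produce_ws[val];
return__2__0__member:;
    *out_region++ = ':';               /* terminals are written straight to the output */
    *returnp = &&return__4__0__member;
    val = map(1);
    goto *produce_element[val];
return__4__0__member:;
    /* Release the slot and resume the caller at the address it stored. */
    --returnp;
    goto **returnp;
}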
gen_member_0:
    val = map(2)
    call *gen_ws[val]
    val = map(1)
    call *gen_string[val]
    val = map(2)
    call *gen_ws[val]
    *out_region = ':'
    incr out_region
    val = map(1)
    call *gen_element[val]
    ret
Figure 9: A fragment of the grammar VM that generates …
9.2 Context Threaded VM
https://github.com/vrthra/F1
Gopinath, Zeller: "Building Fast Fuzzers", arXiv:1911.07707 (2019)
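The speed comes from compiling the grammar into straight-line code. Each choice point becomes a `map(n)` call, and each terminal is written directly to the output, so no grammar data structure is interpreted at run time. The C sketch below shows the same idea with plain functions instead of GNU computed gotos. The toy grammar, the function names, and the xorshift random source are assumptions for illustration. `map(n)` is assumed here to return an alternative index in [0, n), which may differ from F1's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy grammar, compiled by hand in the style of gen_member_0:
 *   <member>  ::= <ws> <string> <ws> ':' <element>
 *   <ws>      ::= "" | " "
 *   <string>  ::= '"' <letter> '"'
 *   <letter>  ::= 'a' | 'b' | 'c'
 *   <element> ::= <digit> | <string>
 *   <digit>   ::= '0' | ... | '9'
 */

static uint32_t rng_state = 2463534242u;  /* xorshift32 state; must be nonzero */
static char *out_region;                  /* write cursor into the output buffer */

static void seed(uint32_t s) { rng_state = s ? s : 1u; }

/* Assumed semantics of map(n): a pseudo-random alternative index in [0, n). */
static unsigned map(unsigned n) {
    assert(n > 0);
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state % n;
}

static void gen_ws(void) {
    if (map(2) == 1) *out_region++ = ' ';  /* alternative 0 is the empty string */
}

static void gen_letter(void) { *out_region++ = "abc"[map(3)]; }

static void gen_string(void) {
    *out_region++ = '"';
    gen_letter();
    *out_region++ = '"';
}

static void gen_digit(void) { *out_region++ = (char)('0' + map(10)); }

static void gen_element(void) {
    if (map(2) == 0) gen_digit(); else gen_string();
}

/* Mirrors gen_member_0: one call per symbol, no grammar lookup at run time. */
static void gen_member(void) {
    gen_ws();
    gen_string();
    gen_ws();
    *out_region++ = ':';
    gen_element();
}

/* Writes one <member> into buf (at least 10 bytes) and returns its length. */
size_t produce_member(char *buf, uint32_t s) {
    seed(s);
    out_region = buf;
    gen_member();
    *out_region = '\0';
    return (size_t)(out_region - buf);
}
```

A real fuzzer would keep writing into one large output region and never reset the generator between inputs. The per-call seeding here only makes the sketch easy to test.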
15. Mining Grammars
Grammar Miner
Sample Inputs
Grammar
Inferring Input Grammars from Dynamic Control Flow
Anonymous Author(s)
ABSTRACT
A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately, input models for typical programs are often unavailable or out of date. While there exist algorithms that can mine the syntactical structure of program inputs, they either produce unwieldy and incomprehensible grammars, or require heuristics that target specific parsing patterns.
In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing comparisons of input characters at different locations of the input parser. This works on all program-stack-based recursive descent input parsers, including PEG and parser combinators, and can do entirely without program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including expr, URLparse, and microJSON.
CCS CONCEPTS
• Software and its engineering → Dynamic analysis; • The-
<START> ::= <json_raw>
<json_raw> ::= '"' <json_string0> | '[' <json_list0> | '{' <json_dict0>
    | <json_number0> | 'true' | 'false' | 'null'
<json_number0> ::= <json_number>+
    | <json_number>+ 'e' <json_number>+
<json_number> ::= '+' | '-' | '.' | [0-9] | 'E' | 'e'
<json_string0> ::= <json_string>* '"'
<json_list0> ::= ']'
    | <json_raw> (',' <json_raw>)* ']'
    | (',' <json_raw>)+ (',' <json_raw>)* ']'
<json_dict0> ::= '}'
    | ('"' <json_string0> ':' <json_raw> ',')*
      '"' <json_string0> ':' <json_raw> '}'
<json_string> ::= ' ' | '!' | '#' | '$' | '%' | '&' | "'"
    | '*' | '+' | '-' | ',' | '.' | '/' | ':' | ';'
    | '<' | '=' | '>' | '?' | '@' | '[' | ']' | '^' | '_' | '`'
    | '{' | '|' | '}' | '~'
    | '[A-Za-z0-9]'
    | '\' <decode_escape>
<decode_escape> ::= '"' | '/' | 'b' | 'f' | 'n' | 'r' | 't'
Figure 1: JSON grammar extracted from microjson.py.
Parser-Directed Test Generator
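The key observation is that the dynamic behavior of a recursive-descent parser reveals the syntactic structure of its input. This can be sketched in C, with one deliberate simplification. Mimid observes character comparisons at parser locations, whereas this sketch only records the input span each parsing function consumed. That yields a derivation tree for a single input, but none of Mimid's generalization across inputs. The subject grammar `<list> ::= <number> (',' <number>)*`, the parser, and the trace format are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One trace entry: a parsing function (a future nonterminal) and the input span it consumed. */
struct span { const char *fn; int depth; size_t start, end; };

#define MAX_SPANS 64
static struct span trace[MAX_SPANS];
static size_t n_spans;
static const char *input;
static size_t pos;     /* parser cursor into input */
static int depth;      /* current call depth of the parser */

/* Called on entry to a parsing function: opens a trace entry at the current position. */
static size_t enter(const char *fn) {
    assert(n_spans < MAX_SPANS);
    trace[n_spans] = (struct span){ fn, depth++, pos, pos };
    return n_spans++;
}

/* Called on exit: closes the entry at the position the function advanced to. */
static void leave(size_t i) {
    trace[i].end = pos;
    depth--;
}

/* Hypothetical subject parser for  <list> ::= <number> (',' <number>)*  */
static int parse_number(void) {
    size_t t = enter("number");
    int ok = isdigit((unsigned char)input[pos]) != 0;
    while (isdigit((unsigned char)input[pos])) pos++;
    leave(t);
    return ok;
}

static int parse_list(void) {
    size_t t = enter("list");
    int ok = parse_number();
    while (ok && input[pos] == ',') {
        pos++;
        ok = parse_number();
    }
    leave(t);
    return ok && input[pos] == '\0';
}

/* Parses s while recording the dynamic call tree; returns 1 if s is accepted. */
int mine(const char *s) {
    input = s;
    pos = 0;
    depth = 0;
    n_spans = 0;
    return parse_list();
}

/* Prints the recorded derivation tree, one node per line, indented by call depth. */
void print_tree(FILE *f) {
    for (size_t i = 0; i < n_spans; i++)
        fprintf(f, "%*s<%s> \"%.*s\"\n", 2 * trace[i].depth, "", trace[i].fn,
                (int)(trace[i].end - trace[i].start), input + trace[i].start);
}
```

For `mine("12,3")`, the trace holds a `<list>` node spanning `12,3` with two `<number>` children spanning `12` and `3`. A grammar miner would generalize many such trees into rules.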
16. Mining Grammars
[Slide image: the paper excerpt and the JSON grammar (Figure 1) from slide 15]
Testers can control what to test and how to test:
▪ Assign probabilities to productions and elements
▪ Add special inputs for logins, passwords, and security testing
▪ Complete the grammar with hard-to-infer features
▪ Testers can do this, or use the fully automatic mode
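Completing a mined grammar by hand can be as simple as appending alternatives to a rule. The sketch below assumes a hypothetical rule representation: a fixed-size array of terminal alternatives. The password values are also hypothetical. The injection string echoes the kind of special input a security tester might add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALTS 16

/* A rule as a list of terminal alternatives (hypothetical representation). */
struct rule {
    const char *name;
    const char *alts[MAX_ALTS];
    size_t n;
};

/* Appends a tester-written alternative to a (possibly mined) rule.
   Returns 0 if the rule has no room left, 1 otherwise. */
int add_special(struct rule *r, const char *alt) {
    if (r->n == MAX_ALTS) return 0;
    r->alts[r->n++] = alt;
    return 1;
}

/* Picks one alternative uniformly, given a caller-supplied random value. */
const char *pick(const struct rule *r, uint32_t rnd) {
    assert(r->n > 0);
    return r->alts[rnd % r->n];
}
```

Appending keeps the mined alternatives intact, so the automatic mode and the tester's additions coexist in one rule.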
17. Assigning Probabilities
[Slide image: the JSON grammar from slide 15, annotated with probabilities such as 80% and 0%]
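Assigning probabilities changes how an alternative is chosen during expansion. Below is a minimal sketch that assumes integer percentage weights per alternative, like the 80% and 0% annotations, plus a caller-supplied random value. An alternative with weight 0 is never produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chooses an alternative index from per-alternative weights (e.g. percentages).
   Weights need not sum to 100; weight 0 disables an alternative.
   Returns n if all weights are zero. `r` is a caller-supplied random value. */
size_t weighted_choice(const unsigned *weights, size_t n, uint32_t r) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) total += weights[i];
    if (total == 0) return n;
    unsigned long x = r % total;  /* position within [0, total) */
    for (size_t i = 0; i < n; i++) {
        if (x < weights[i]) return i;
        x -= weights[i];
    }
    assert(0 && "unreachable: x < total");
    return n;
}
```

The modulo reduction is slightly biased when the total does not divide 2^32, which is acceptable for a sketch.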
18. Inputs from Hell
[Slide image: the JSON grammar from slide 15, annotated with probabilities (80%, 0%)]
Sample Inputs
▪ Learn probabilities from sample inputs
▪ Use same probabilities as past bugs
▪ Invert probabilities from common inputs
▪ Obtain unlikely, yet valid inputs
Soremekun, Pavese, Havrikov, Grunske, Zeller: “Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation”, 2019
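Learning and inverting probabilities can be sketched per rule. First, count how often each alternative occurs when parsing the sample inputs and normalize the counts. Then invert, so that rarely seen alternatives become likely. The add-one smoothing and the inversion q_i ∝ 1/p_i are assumptions chosen for simplicity; they may differ from the exact scheme in the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Learns p from counts of each alternative of one rule in the parsed samples.
   Add-one smoothing (an assumption) keeps unseen alternatives above 0. */
void learn(const unsigned *counts, size_t n, double *p) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += counts[i] + 1.0;
    for (size_t i = 0; i < n; i++) p[i] = (counts[i] + 1.0) / total;
}

/* Inverts p: q_i proportional to 1/p_i, so common alternatives become rare
   and rare ones become common. The result sums to 1. */
void invert(const double *p, size_t n, double *q) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(p[i] > 0.0);
        total += 1.0 / p[i];
    }
    for (size_t i = 0; i < n; i++) q[i] = (1.0 / p[i]) / total;
}
```

Generating with `p` reproduces inputs similar to the samples, such as past bugs. Generating with `q` yields the unlikely, yet valid, inputs.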
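21. Modeling GUI Interaction
[Figure: finite state model]
start → Order Form
Order Form → Terms and Conditions: click('Terms and conditions')
Order Form → Thank You: fill(...), submit('submit')
Terms and Conditions → Order Form: click('order form')
Thank You → Order Form: click('order form')
How to model both textual and GUI input?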
22. Embedding Finite State Models
[Figure: the finite state model from slide 21]
start ::= order form
order form ::=
click('terms and conditions') terms and conditions
|
fill('name', Walter White)
fill('email', white@jpwynne.edu)
fill('city', Albuquerque)
fill('zip', 87101)
check('terms', True)
submit('submit') thank you
terms and conditions ::=
click('order form') order form
thank you ::=
click('order form') order form
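Each state of the finite state model is a nonterminal and each transition is an alternative, so generating a GUI test is an ordinary grammar expansion. The C sketch below hard-codes the grammar above. The state names, the choice-sequence interface, and the step bound are assumptions. The bound is necessary because this grammar has no terminating alternative: every expansion of a state leads to another state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define LEN(a) (sizeof (a) / sizeof (a)[0])

/* Each nonterminal of the grammar is one state of the finite state model. */
enum state { ORDER_FORM, TERMS, THANK_YOU, N_STATES };

/* Each alternative emits some actions and continues in a target state. */
struct transition { const char *actions; enum state target; };

static const struct transition order_form[] = {
    { "click('terms and conditions')\n", TERMS },
    { "fill('name', Walter White)\n"
      "fill('email', white@jpwynne.edu)\n"
      "fill('city', Albuquerque)\n"
      "fill('zip', 87101)\n"
      "check('terms', True)\n"
      "submit('submit')\n", THANK_YOU },
};
static const struct transition terms_and_conditions[] = {
    { "click('order form')\n", ORDER_FORM },
};
static const struct transition thank_you[] = {
    { "click('order form')\n", ORDER_FORM },
};

static const struct { const struct transition *alts; size_t n; } rules[N_STATES] = {
    [ORDER_FORM] = { order_form, LEN(order_form) },
    [TERMS]      = { terms_and_conditions, LEN(terms_and_conditions) },
    [THANK_YOU]  = { thank_you, LEN(thank_you) },
};

/* Expands from state s for exactly `steps` transitions. choices[i] selects the
   alternative at step i (reduced modulo the rule's size). Actions are written
   to f unless f is NULL. Returns the state reached. */
enum state walk(enum state s, const unsigned *choices, size_t steps, FILE *f) {
    for (size_t i = 0; i < steps; i++) {
        assert(s < N_STATES);
        const struct transition *t = &rules[s].alts[choices[i] % rules[s].n];
        if (f != NULL) fputs(t->actions, f);
        s = t->target;
    }
    return s;
}
```

With the choices 0, 0, 1, 0 and `stdout` as output, `walk(ORDER_FORM, choices, 4, stdout)` prints the same action sequence as the start of the derivation on slide 23.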
23. Embedding Finite State Models
click('terms and conditions')
click('order form')
fill('name', Walter White)
fill('email', white@jpwynne.edu)
fill('city', Albuquerque)
fill('zip', 87101)
check('terms', True)
submit('submit')
click('order form')
fill('name', Walter White)
...
24. Embedding Finite State Models
start ::= order form
order form ::=
click('terms and conditions') terms and conditions
|
fill('name', name)
fill('email', email)
fill('city', Albuquerque)
fill('zip', zip)
check('terms', boolean)
submit('submit') thank you
name ::= Walter White | Jesse Pinkman | ...
email ::= localpart@domain | ...
zip ::= [0-9][0-9][0-9][0-9][0-9] | -1 | '1 OR 1=1 | ϵ | 😀 | ...
boolean ::= True | False

click('terms and conditions')
click('order form')
fill('name', Jesse Pinkman)
fill('email', abc@some.network)
fill('city', '1 OR 1=1)
fill('zip', -1)
check('terms', True)
submit('submit')
click('order form')
fill('name', Duke of Orléans)
...
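Replacing fixed values with nonterminals turns each `fill` into a further expansion, which is where special values such as `-1` or an injection string enter the test. In this sketch, the value lists hold only the examples shown on the slides; each rule's `...` stays open. `localpart@domain` is approximated by the two example addresses, ϵ is the empty string, and the xorshift random source is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LEN(a) (sizeof (a) / sizeof (a)[0])

/* Only the example values from the slides; each rule's "..." stays open. */
static const char *const names[]    = { "Walter White", "Jesse Pinkman", "Duke of Orléans" };
static const char *const emails[]   = { "white@jpwynne.edu", "abc@some.network" };
static const char *const booleans[] = { "True", "False" };
/* Non-digit alternatives of <zip>; ϵ is the empty string. */
static const char *const zip_specials[] = { "-1", "'1 OR 1=1", "", "😀" };

static uint32_t rng = 1;  /* xorshift32 state; must never be 0 */

static uint32_t next(void) {
    rng ^= rng << 13;
    rng ^= rng >> 17;
    rng ^= rng << 5;
    return rng;
}

/* zip ::= [0-9][0-9][0-9][0-9][0-9] | -1 | '1 OR 1=1 | ϵ | 😀
   Returns either buf (filled with five digits) or one of the special strings. */
const char *gen_zip(char buf[6]) {
    uint32_t alt = next() % (uint32_t)(1 + LEN(zip_specials));
    if (alt > 0) return zip_specials[alt - 1];
    for (int i = 0; i < 5; i++) buf[i] = (char)('0' + next() % 10);
    buf[5] = '\0';
    return buf;
}

/* Expands the fill/submit alternative of <order form>, drawing each value from its rule. */
void gen_order(FILE *f, uint32_t seed) {
    char zip[6];
    assert(f != NULL);
    rng = seed != 0 ? seed : 1;
    fprintf(f, "fill('name', %s)\n", names[next() % LEN(names)]);
    fprintf(f, "fill('email', %s)\n", emails[next() % LEN(emails)]);
    fputs("fill('city', Albuquerque)\n", f);
    fprintf(f, "fill('zip', %s)\n", gen_zip(zip));
    fprintf(f, "check('terms', %s)\n", booleans[next() % LEN(booleans)]);
    fputs("submit('submit')\n", f);
}
```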
25. Embedding Finite State Models
[Slide image: the grammar from slide 24]
26. Achieving Grammar Coverage
[Slide image: the finite state model from slide 21 and the grammar from slide 24, with ✅ marks on the covered transitions and grammar elements]
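Coverage-guided generation prefers alternatives that have not been expanded yet. The sketch below applies this idea to the three states of the order-form model from slide 21. Its greedy strategy is an assumption that works for this small model: take the first uncovered alternative of the current state, else alternative 0. A general tool would instead search for a path to the nearest uncovered alternative, and would also track value alternatives such as those of `zip`.

```c
#include <assert.h>
#include <stdbool.h>

#define N_STATES 3
#define MAX_ALTS 2

enum { ORDER_FORM, TERMS, THANK_YOU };

/* target[s][a]: state reached by alternative a of state s; -1 marks an unused slot. */
static const int target[N_STATES][MAX_ALTS] = {
    [ORDER_FORM] = { TERMS, THANK_YOU },  /* click('terms and conditions') | fill(...) submit('submit') */
    [TERMS]      = { ORDER_FORM, -1 },    /* click('order form') */
    [THANK_YOU]  = { ORDER_FORM, -1 },    /* click('order form') */
};

static bool covered[N_STATES][MAX_ALTS];

/* Greedy coverage choice: the first uncovered alternative of state s, else alternative 0. */
static int choose(int s) {
    assert(s >= 0 && s < N_STATES);
    for (int a = 0; a < MAX_ALTS; a++)
        if (target[s][a] >= 0 && !covered[s][a]) return a;
    return 0;
}

/* Number of alternatives not yet expanded. */
int uncovered(void) {
    int n = 0;
    for (int s = 0; s < N_STATES; s++)
        for (int a = 0; a < MAX_ALTS; a++)
            if (target[s][a] >= 0 && !covered[s][a]) n++;
    return n;
}

/* Walks from ORDER_FORM until every alternative is covered or max_steps is
   reached; returns the number of steps taken. */
int cover(int max_steps) {
    int s = ORDER_FORM, steps = 0;
    while (uncovered() > 0 && steps < max_steps) {
        int a = choose(s);
        covered[s][a] = true;
        s = target[s][a];
        steps++;
    }
    return steps;
}
```

Here `cover(100)` needs four steps: to the terms page, back, to the thank-you page, and back. After that, `uncovered()` is 0.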
38. Unique academic career opportunities
CISPA is a research center in Germany
• Solving the grand challenges in information security
• Rich base funding
• Rapidly expanding in all sub-fields
• Extensive potential for collaborations
• High quality of living
Talk to me!
www.cispa.saarland
And if you're not already at Facebook…