LEARNING INPUT TOKENS FOR EFFECTIVE FUZZING
BJÖRN MATHIS, RAHUL GOPINATH, ANDREAS ZELLER
FUZZING - THE ART OF AUTOMATIC BUG FINDING
FUZZER
7245
C4tscs
X + 0
PROGRAM UNDER TEST
COMPLEX INPUT STRUCTURES NEED SYNTACTIC FUZZING
PFUZZER - SURVIVING THE PARSING STAGE
# parse an operand: a single digit or letter
def parse_exp(i):
    c = input[i]
    if isDigit(c):
        parse_op(i + 1)
    elif isAlpha(c):
        parse_op(i + 1)

# parse an operator ('-' or '+'), followed by another operand
def parse_op(i):
    c = input[i]
    if c == '-':
        parse_exp(i + 1)
    elif c == '+':
        parse_exp(i + 1)
    else:
        raise InvalidSyntax
INPUTS TRIED BY PFUZZER
&
X
X @
X +
X + 0
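
The progression of inputs on this slide can be sketched as a small loop. This is a minimal sketch under stated assumptions, not pFuzzer's actual implementation: the parser is a simplified variant of the slide code that also handles the end of the input, inputs are written without spaces, and `run_parser`, `check`, and `fuzz` are hypothetical names. The parser reports the position where it rejected the input and the characters it compared against; the fuzzer keeps the accepted prefix and substitutes one of those characters.

```python
import random
import string

OPERANDS = string.digits + string.ascii_letters
OPERATORS = "-+"


class InvalidSyntax(Exception):
    pass


def run_parser(text):
    """Parse text; return (accepted, fail_at, compared)."""
    state = {"fail_at": None, "compared": ""}

    def check(i, allowed):
        # Record what position i was compared against.
        state["compared"] = allowed
        if i >= len(text) or text[i] not in allowed:
            state["fail_at"] = i
            raise InvalidSyntax

    def parse_exp(i):
        check(i, OPERANDS)
        parse_op(i + 1)

    def parse_op(i):
        if i == len(text):
            return  # input ends after a complete expression
        check(i, OPERATORS)
        parse_exp(i + 1)

    try:
        parse_exp(0)
        return True, None, ""
    except InvalidSyntax:
        return False, state["fail_at"], state["compared"]


def fuzz(rng, min_len=3, max_steps=100):
    """Grow an input until the parser accepts at least min_len characters."""
    text = rng.choice(string.printable)
    for _ in range(max_steps):
        accepted, fail_at, compared = run_parser(text)
        if accepted and len(text) >= min_len:
            return text
        if accepted:
            # Valid but short: try to extend with a random character.
            text += rng.choice(string.printable)
        else:
            # Keep the accepted prefix, then use a character the parser asked for.
            text = text[:fail_at] + rng.choice(compared)
    return text


print(fuzz(random.Random(0)))
```

Each rejection is repaired in one step because the replacement comes from the comparisons the parser itself made, which is why such a loop needs few executions for this toy grammar.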
TOKENIZATION - COMPLEX PARSERS
X + 0
TOKENIZER
T_ALPHA T_PLUS T_DIGIT
PARSER
# parse an operand: the parser now compares tokens, not characters
def parse_exp(i):
    c = input[i]
    token = tokenize(c)
    if token == T_DIGIT:
        parse_op(i + 1)
    elif token == T_ALPHA:
        parse_op(i + 1)

# map a single input character to its token
def tokenize(c):
    if isDigit(c):
        return T_DIGIT
    elif isAlpha(c):
        return T_ALPHA
    elif c == '-':
        return T_MINUS
    elif c == '+':
        return T_PLUS
    else:
        raise InvalidToken

# parse an operator token, followed by another operand
def parse_op(i):
    c = input[i]
    token = tokenize(c)
    if token == T_MINUS:
        parse_exp(i + 1)
    elif token == T_PLUS:
        parse_exp(i + 1)
    else:
        raise InvalidSyntax
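
A runnable variant of the tokenizer and parser above may make the two stages easier to follow. This is a sketch under assumptions: tokens are single characters, the input contains no spaces, the `Token` enum and the `parse` helper are hypothetical names, and the end-of-input handling is added here because the slide code leaves it out.

```python
from enum import Enum, auto


class Token(Enum):
    T_DIGIT = auto()
    T_ALPHA = auto()
    T_MINUS = auto()
    T_PLUS = auto()


class InvalidToken(Exception):
    pass


class InvalidSyntax(Exception):
    pass


def tokenize(c):
    """Map one character to its token, or reject it."""
    if c.isdigit():
        return Token.T_DIGIT
    elif c.isalpha():
        return Token.T_ALPHA
    elif c == '-':
        return Token.T_MINUS
    elif c == '+':
        return Token.T_PLUS
    raise InvalidToken(c)


def parse_exp(text, i, seen):
    # An operand is required here, so running out of input is an error.
    token = tokenize(text[i]) if i < len(text) else None
    if token in (Token.T_DIGIT, Token.T_ALPHA):
        seen.append(token)
        parse_op(text, i + 1, seen)
    else:
        raise InvalidSyntax(f"operand expected at position {i}")


def parse_op(text, i, seen):
    if i == len(text):
        return  # a complete expression may end here
    token = tokenize(text[i])
    if token in (Token.T_MINUS, Token.T_PLUS):
        seen.append(token)
        parse_exp(text, i + 1, seen)
    else:
        raise InvalidSyntax(f"operator expected at position {i}")


def parse(text):
    """Return the token sequence accepted by the parser."""
    seen = []
    parse_exp(text, 0, seen)
    return seen


print([t.name for t in parse("X+0")])
```

An unknown character such as `&` fails in the tokenizer with `InvalidToken`, before the parser ever compares it, which is the situation the tokenization slides point at.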
DYNAMIC TAINTING - LOOKING INTO A PROGRAM
INPUT
X + 0
TOKEN COMPARISONS OBSERVED
T_DIGIT
T_ALPHA
T_MINUS
T_PLUS
T_DIGIT
def parse_exp(i):
    c = input[i]
    token = tokenize(c)
    if token == T_DIGIT:
        parse_op(i + 1)
    elif token == T_ALPHA:
        parse_op(i + 1)

def parse_op(i):
    c = input[i]
    token = tokenize(c)
    if token == T_MINUS:
        parse_exp(i + 1)
    elif token == T_PLUS:
        parse_exp(i + 1)
    else:
        raise InvalidSyntax

def tokenize(c):
    if isDigit(c):
        return T_DIGIT
    elif isAlpha(c):
        return T_ALPHA
    elif c == '-':
        return T_MINUS
    elif c == '+':
        return T_PLUS
    else:
        raise InvalidToken
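
The list of token comparisons on this slide can be reproduced with a small experiment. This is a sketch under assumptions, not the tainting engine used by the authors: instead of instrumenting the program, the tokenizer returns a proxy object (`Tracked`, a hypothetical name) that logs every constant it is compared against. The input is written without spaces, and the end-of-input checks are additions to the slide code.

```python
T_DIGIT, T_ALPHA, T_MINUS, T_PLUS = "T_DIGIT", "T_ALPHA", "T_MINUS", "T_PLUS"


class InvalidToken(Exception):
    pass


class InvalidSyntax(Exception):
    pass


class Tracked:
    """A token value that logs every constant it is compared against."""

    def __init__(self, value, log):
        self.value = value
        self.log = log

    def __eq__(self, other):
        self.log.append(other)
        return self.value == other


def tokenize(c, log):
    if c.isdigit():
        return Tracked(T_DIGIT, log)
    elif c.isalpha():
        return Tracked(T_ALPHA, log)
    elif c == '-':
        return Tracked(T_MINUS, log)
    elif c == '+':
        return Tracked(T_PLUS, log)
    raise InvalidToken(c)


def parse_exp(text, i, log):
    if i == len(text):
        raise InvalidSyntax("operand expected")
    token = tokenize(text[i], log)
    if token == T_DIGIT:
        parse_op(text, i + 1, log)
    elif token == T_ALPHA:
        parse_op(text, i + 1, log)
    else:
        raise InvalidSyntax("operand expected")


def parse_op(text, i, log):
    if i == len(text):
        return  # a complete expression may end here
    token = tokenize(text[i], log)
    if token == T_MINUS:
        parse_exp(text, i + 1, log)
    elif token == T_PLUS:
        parse_exp(text, i + 1, log)
    else:
        raise InvalidSyntax("operator expected")


def trace(text):
    """Run the parser and return the constants its tokens were compared to."""
    log = []
    parse_exp(text, 0, log)
    return log


print(trace("X+0"))
```

For `X+0` the log contains T_DIGIT, T_ALPHA, T_MINUS, T_PLUS, T_DIGIT, in that order: each entry is a token the parser was prepared to accept at that point.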
LFUZZER - SURVIVING THE TOKENIZATION AND PARSING STAGE
def parse_exp(i):
    c = input[i]
    token = tokenize(c)
    if token == T_DIGIT:
        parse_op(i + 1)
    elif token == T_ALPHA:
        parse_op(i + 1)

def parse_op(i):
    c = input[i]
    token = tokenize(c)
    if token == T_MINUS:
        parse_exp(i + 1)
    elif token == T_PLUS:
        parse_exp(i + 1)
    else:
        raise InvalidSyntax

def tokenize(c):
    if isDigit(c):
        return T_DIGIT
    elif isAlpha(c):
        return T_ALPHA
    elif c == '-':
        return T_MINUS
    elif c == '+':
        return T_PLUS
    else:
        raise InvalidToken
INPUTS TRIED BY LFUZZER
&
X
X 3
X +
X + 0
TOKEN MAPPING
String            Token
A .. Z, a .. z    T_ALPHA
0 .. 9            T_DIGIT
-                 T_MINUS
+                 T_PLUS
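
A token mapping like the one on this slide can be illustrated with a short experiment. This is only a sketch under assumptions: it probes a stand-alone tokenizer with every printable character and groups the characters by the token they produce, whereas the slides suggest lFuzzer learns the mapping while fuzzing the program. `learn_token_mapping` is a hypothetical name.

```python
import string
from collections import defaultdict


class InvalidToken(Exception):
    pass


def tokenize(c):
    """Tokenizer from the slides, with tokens represented as strings."""
    if c.isdigit():
        return "T_DIGIT"
    elif c.isalpha():
        return "T_ALPHA"
    elif c == '-':
        return "T_MINUS"
    elif c == '+':
        return "T_PLUS"
    raise InvalidToken(c)


def learn_token_mapping(candidates):
    """Group candidate strings by the token the tokenizer assigns to them."""
    mapping = defaultdict(list)
    for c in candidates:
        try:
            mapping[tokenize(c)].append(c)
        except InvalidToken:
            pass  # not a token of this language
    return dict(mapping)


mapping = learn_token_mapping(string.printable)
for token, strings in mapping.items():
    print(token, "".join(strings))
```

With such a mapping, a fuzzer that knows the parser expects T_MINUS or T_PLUS can pick `-` or `+` directly instead of guessing among all characters.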
LFUZZER - BOOSTING FUZZERS
TOKENS
0 .. 9
A .. Z
a .. z
+
-
SAMPLE INPUTS
0 + 5
a + 6
FUZZER
AFL
MIMID*
LIBFUZZER
…
YOURFAVORITEFUZZER
INPUTS
A - K
8 - I + P - q
R + y - 6 + u
…
PROGRAM UNDER TEST
* In: "Mining Input Grammars from Dynamic Control Flow" at FSE 2020
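
The pipeline on this slide (TOKENS and SAMPLE INPUTS feed a FUZZER, which emits INPUTS) can be sketched with a toy token-level mutator. This is an illustration under assumptions, not how AFL, libFuzzer, or lFuzzer work internally: tokens are single characters, samples are written without spaces, and `mutate` and `generate` are hypothetical names.

```python
import random
import string

TOKENS = list(string.digits + string.ascii_uppercase + string.ascii_lowercase) + ["+", "-"]
SAMPLES = ["0+5", "a+6"]


def mutate(sample, tokens, rng):
    """Apply one token-level mutation: replace, insert, or delete a token."""
    chars = list(sample)
    pos = rng.randrange(len(chars))
    op = rng.choice(["replace", "insert", "delete"])
    if op == "replace":
        chars[pos] = rng.choice(tokens)
    elif op == "insert":
        chars.insert(pos, rng.choice(tokens))
    elif len(chars) > 1:
        del chars[pos]  # never delete the last remaining token
    return "".join(chars)


def generate(samples, tokens, rng, count=5, rounds=3):
    """Derive new inputs from the samples using only known tokens."""
    inputs = []
    for _ in range(count):
        text = rng.choice(samples)
        for _ in range(rounds):
            text = mutate(text, tokens, rng)
        inputs.append(text)
    return inputs


for text in generate(SAMPLES, TOKENS, random.Random(1)):
    print(text)
```

Because every mutation draws from the token list, no generated input contains a character the tokenizer would reject outright; whether the parser accepts it is left to the program under test.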
EVALUATION - TOKENS AND COVERAGE
[Figure: NUMBER OF VALID TOKENS EXTRACTED. Subjects: csv, ini, cjson, lisp, tinyc, mjs. Y-axis: tokens extracted (0 to 80). Series: String Extraction, lFuzzer.]
[Figure: NUMBER OF INVALID TOKENS EXTRACTED. Subjects: csv, ini, cjson, lisp, tinyc, mjs. Y-axis: tokens extracted (0 to 200). Series: String Extraction, lFuzzer.]
[Figure: COVERAGE OVER TIME FOR MJS. X-axis: time (h), 0 to 24. Y-axis: coverage (%), 0 to 35. Series: AFL, AFL_Dict, pFuzzer, pFuzzer + AFL, lFuzzer + AFL.]
GITHUB.COM/UDS-SE/LFUZZER

lFuzzer - Learning Input Tokens for Effective Fuzzing

  • 1.
    LEARNING INPUT TOKENSFOR EFFECTIVE FUZZING BJÖRN MATHIS, RAHUL GOPINATH, ANDREAS ZELLER
  • 2.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TESTFUZZER
  • 3.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST 7245 FUZZER
  • 4.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST 7245 FUZZER
  • 5.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST 7245 FUZZER
  • 6.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TESTFUZZER
  • 7.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST C4tscs FUZZER
  • 8.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST C4tscs FUZZER
  • 9.
    FUZZING - THEART OF AUTOMATIC BUG FINDING 2 PROGRAM UNDER TEST C4tscs FUZZER
  • 10.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER
  • 11.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER
  • 12.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER
  • 13.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER C4tscs
  • 14.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER C4tscs
  • 15.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER
  • 16.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER X + 0
  • 17.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER X + 0
  • 18.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER X + 0
  • 19.
    PROGRAM UNDER TEST FUZZING- THE ART OF AUTOMATIC BUG FINDING 3 FUZZER X + 0 COMPLEX INPUT STRUCTURES NEED SYNTACTIC FUZZING
  • 20.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 21.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER & def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 22.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER & def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 23.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER & def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 24.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 25.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 26.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 27.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X @ def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 28.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X @ def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 29.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X @ def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 30.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X + def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 31.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X + def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 32.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X + 0 def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 33.
    PFUZZER - SURVIVINGTHE PARSING STAGE 4 PFUZZER X + 0 def parse_exp(i): c = input[i] if isDigit(c): parse_op(i + 1) elif isAlpha(c): parse_op(i + 1) def parse_op(i): c = input[i] if c == '-': parse_exp(i + 1) elif c == '+': parse_exp(i + 1) else: raise InvalidSyntax
  • 34.
  • 35.
    TOKENIZATION - COMPLEXPARSERS 5 X + 0
  • 36.
    TOKENIZATION - COMPLEXPARSERS 5 X + 0 TOKENIZER
  • 37.
    TOKENIZATION - COMPLEXPARSERS 5 X + 0 TOKENIZER T_ALPHA T_PLUS T_DIGIT
  • 38.
    TOKENIZATION - COMPLEXPARSERS 5 X + 0 TOKENIZER T_ALPHA T_PLUS T_DIGIT PARSER
  • 39.
    6 TOKENIZATION - COMPLEXPARSERS X + 0 TOKENIZER T_ALPHA T_PLUS T_DIGIT PARSER
  • 40.
    6 TOKENIZATION - COMPLEXPARSERS def parse_exp(i): c = input[i] token = tokenize(c) if token == T_DIGIT: parse_op(i + 1) elif token == T_ALPHA: parse_op(i + 1) def tokenize(c): if isDigit(c): return T_DIGIT elif isAlpha(c): return T_ALPHA elif c == '-': return T_MINUS elif c == '+': return T_PLUS else: raise InvalidToken def parse_op(i): c = input[i] token = tokenize(c) if token == T_MINUS: parse_exp(i + 1) elif token == T_PLUS: parse_exp(i + 1) else: raise InvalidSyntax X + 0 TOKENIZER T_ALPHA T_PLUS T_DIGIT PARSER
  • 41.
    DYNAMIC TAINTING -LOOKING INTO A PROGRAM 7 def parse_exp(i): c = input[i] token = tokenize(c) if token == T_DIGIT: parse_op(i + 1) elif token == T_ALPHA: parse_op(i + 1) def parse_op(i): c = input[i] token = tokenize(c) if token == T_MINUS: parse_exp(i + 1) elif token == T_PLUS: parse_exp(i + 1) else: raise InvalidSyntax def tokenize(c): if isDigit(c): return T_DIGIT elif isAlpha(c): return T_ALPHA elif c == '-': return T_MINUS elif c == '+': return T_PLUS else: raise InvalidToken
  • 53.
    DYNAMIC TAINTING - LOOKING INTO A PROGRAM
    Input:            X + 0
    Tokens compared:  T_DIGIT T_ALPHA (for X), T_MINUS T_PLUS (for +), T_DIGIT (for 0)
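The token trace on this slide can be reproduced with a small runnable sketch. It is not real dynamic tainting, which follows data flow from the input automatically; it only logs each token comparison by hand. The names `TracingParser` and `trace_comparisons` are hypothetical, the end-of-input check and the `else` branch in `parse_exp` are additions the slide code does not have, and the input is written as `X+0` because `tokenize` as shown would reject spaces.

```python
class InvalidToken(Exception):
    pass


class InvalidSyntax(Exception):
    pass


T_DIGIT, T_ALPHA, T_MINUS, T_PLUS = "T_DIGIT", "T_ALPHA", "T_MINUS", "T_PLUS"


def tokenize(c):
    """Map one character to a token, as on the slide."""
    if c.isdigit():
        return T_DIGIT
    elif c.isalpha():
        return T_ALPHA
    elif c == "-":
        return T_MINUS
    elif c == "+":
        return T_PLUS
    raise InvalidToken(c)


class TracingParser:
    """The slide's parser, with every token comparison logged (hypothetical helper)."""

    def __init__(self, text):
        self.text = text
        self.trace = []  # entries: (input index, character, token compared against)

    def _is(self, i, token, expected):
        # Log the comparison before performing it.
        self.trace.append((i, self.text[i], expected))
        return token == expected

    def parse_exp(self, i):
        if i >= len(self.text):
            raise InvalidSyntax("operand expected")  # addition: not on the slide
        token = tokenize(self.text[i])
        if self._is(i, token, T_DIGIT) or self._is(i, token, T_ALPHA):
            self.parse_op(i + 1)
        else:
            raise InvalidSyntax(self.text[i])  # addition: not on the slide

    def parse_op(self, i):
        if i >= len(self.text):
            return  # assumption: the input may end after an operand
        token = tokenize(self.text[i])
        if self._is(i, token, T_MINUS) or self._is(i, token, T_PLUS):
            self.parse_exp(i + 1)
        else:
            raise InvalidSyntax(self.text[i])


def trace_comparisons(text):
    """Parse text and return the tokens it was compared against, in order."""
    parser = TracingParser(text)
    parser.parse_exp(0)
    return [expected for _, _, expected in parser.trace]


if __name__ == "__main__":
    print(trace_comparisons("X+0"))
```

Each input character is compared only against the tokens its parsing function expects, which is the information a token-aware fuzzer needs to choose the next character.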
  • 54.
    LFUZZER - SURVIVING THE TOKENIZATION AND PARSING STAGE
    Inputs tried by LFUZZER, in order: &, X, X 3, X +, X + 0
    Token mapping
    String            Token
    A .. Z, a .. z    T_ALPHA
    0 .. 9            T_DIGIT
    -                 T_MINUS
    +                 T_PLUS
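A token mapping like the one on this slide can be approximated by probing the tokenizer one character at a time. This is a simplified sketch: it assumes the tokenizer can be called directly, whereas the slides describe learning the mapping by observing the program through dynamic tainting. The function name `learn_token_mapping` is hypothetical.

```python
import string


class InvalidToken(Exception):
    pass


def tokenize(c):
    """Tokenizer from the slides, using Python's str methods."""
    if c.isdigit():
        return "T_DIGIT"
    elif c.isalpha():
        return "T_ALPHA"
    elif c == "-":
        return "T_MINUS"
    elif c == "+":
        return "T_PLUS"
    raise InvalidToken(c)


def learn_token_mapping(alphabet=string.printable):
    """Group every accepted character of the alphabet by the token it yields."""
    mapping = {}
    for c in alphabet:
        try:
            token = tokenize(c)
        except InvalidToken:
            continue  # characters such as '&' belong to no token
        mapping.setdefault(token, []).append(c)
    return mapping


if __name__ == "__main__":
    for token, chars in learn_token_mapping().items():
        print(token, "".join(chars))
```

Characters that raise `InvalidToken`, such as `&`, never enter the mapping, so later inputs built from it get past the tokenization stage.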
  • 71.
    LFUZZER - BOOSTING FUZZERS
    TOKENS: 0 .. 9, A .. Z, a .. z, +, -
    SAMPLE INPUTS: 0 + 5, a + 6
    FUZZER: AFL, MIMID*, LIBFUZZER, …, YOUR FAVORITE FUZZER
    INPUTS: A - K, 8 - I + P - q, R + y - 6 + u, …
    PROGRAM UNDER TEST
    * In: "Mining Input Grammars from Dynamic Control Flow" at FSE 2020
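To illustrate why tokens and sample inputs help a downstream fuzzer, here is a toy mutator that replaces one token of a sample input with another string from the same token class. It is only a stand-in under stated assumptions: real fuzzers such as AFL or libFuzzer mutate inputs using coverage feedback, and the names `TOKENS`, `class_of`, and `mutate` are hypothetical. Spaces are treated as token separators, following the slide's notation.

```python
import random
import string

# Token classes as listed on the slide.
TOKENS = {
    "T_DIGIT": string.digits,
    "T_ALPHA": string.ascii_letters,
    "T_MINUS": "-",
    "T_PLUS": "+",
}

SAMPLE_INPUTS = ["0 + 5", "a + 6"]


def class_of(part):
    """Return the name of the token class that contains this string."""
    for name, members in TOKENS.items():
        if part in members:
            return name
    raise ValueError(f"unknown token: {part!r}")


def mutate(sample, rng):
    """Replace one randomly chosen token with another from the same class."""
    parts = sample.split()
    i = rng.randrange(len(parts))
    parts[i] = rng.choice(TOKENS[class_of(parts[i])])
    return " ".join(parts)


if __name__ == "__main__":
    rng = random.Random(0)
    for sample in SAMPLE_INPUTS:
        print(sample, "->", mutate(sample, rng))
```

Because every replacement stays within its token class, each mutated input has the same token sequence as its sample and therefore remains acceptable to the tokenizer and parser.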
  • 77.
    EVALUATION - TOKENS AND COVERAGE
    [Figure: NUMBER OF VALID TOKENS EXTRACTED. Bar chart; x-axis: Subject (csv, ini, cjson, lisp, tinyc, mjs); y-axis: Tokens Extracted; series: String Extraction, lFuzzer]
    [Figure: NUMBER OF INVALID TOKENS EXTRACTED. Bar chart; x-axis: Subject (csv, ini, cjson, lisp, tinyc, mjs); y-axis: Tokens Extracted; series: String Extraction, lFuzzer]
    [Figure: COVERAGE OVER TIME FOR MJS. Line chart; x-axis: Time (h), 0 to 24; y-axis: Coverage (%); series: AFL, AFL_Dict, pFuzzer, pFuzzer + AFL, lFuzzer + AFL]