Fuzz Testing
Abhik Roychoudhury
National University of Singapore
Fuzz Testing
Springfield Project - Fuzzing as a service
OSS-Fuzz - Continuous fuzzing for open-source projects
Proposed by Barton Miller at the University of Wisconsin in 1988
(Biased) random search over inputs to find crashes / hangs; tools already existed.
And then, since 2016, research began looking inside the tools.
Who cares?
A team of hackers won $2 million by building a machine that could hack better than they could
DARPA Cyber Grand Challenge
Automation of Security [detecting and fixing vulnerabilities in binaries automatically]
Dagstuhl, 2021
Presented by Thuan Pham
Black-box Fuzzing
Lack of specifications
• Many programs take in structured inputs
  - PDF readers, libraries for manipulating TIFF and PNG images
  - Compilers, which take in programs as input
  - Web browsers, ...
• Generating a completely random input will likely crash the application, with little insight gained about the underlying vulnerability.
• Instead, take a legal, well-formed PDF file and mutate it!
  - Does not depend on the program at all [nature of BB fuzzing]
  - Does not even depend on a model of the input structure (the Peach fuzzer, by contrast, provides Peach Pit files to describe it)
  - Yet can leverage complex input structure by starting with a well-formed seed and minimally modifying it.
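The idea of mutating a well-formed seed can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `mutate`, `blackbox_fuzz`, and `toy_parser` are hypothetical names, the seed is a made-up byte string rather than a real PDF, and a Python exception stands in for a crash.

```python
import random

def mutate(seed, rng, max_flips=3):
    """Return a copy of `seed` with a few randomly chosen bytes overwritten."""
    data = bytearray(seed)
    if not data:
        return bytes(data)
    for _ in range(rng.randint(1, max_flips)):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
    return bytes(data)

def blackbox_fuzz(target, seed, trials, rng):
    """Run `target` on mutants of `seed`; collect inputs that raise (our stand-in for a crash)."""
    crashes = []
    for _ in range(trials):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes

def toy_parser(data):
    """Hypothetical target: checks a magic prefix, then fails on a zero length byte."""
    if data[:4] != b"%PDF":
        return  # malformed header: rejected early, nothing deeper is exercised
    if data[4] == 0:
        raise ValueError("zero-length object")

seed = b"%PDF\x05hello"  # made-up well-formed seed
crashes = blackbox_fuzz(toy_parser, seed, trials=20000, rng=random.Random(0))
```

Because each mutant differs from the seed in only a few bytes, most mutants keep the magic prefix and get past the header check, which a fully random input would almost never do.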
White-box Fuzzing
Grey-box Fuzzing
[Figure: fuzzing loop with Test suite, Input Queue (Enqueue / Dequeue), Mutators, and Mutated files]
Compile-time instrumentation
- Basic block transitions, with hit counts
- Does not disambiguate every path
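A minimal Python sketch of this kind of feedback, assuming an execution is given as a list of basic-block names. The function `edge_coverage` and the `"entry"` marker are hypothetical; a real fuzzer would record the counts in a compact map filled in by the instrumented binary.

```python
from collections import Counter

def edge_coverage(trace):
    """Count basic-block transitions (edges) along one execution trace."""
    coverage = Counter()
    prev = "entry"  # hypothetical marker for the point before the first block
    for block in trace:
        coverage[(prev, block)] += 1
        prev = block
    return coverage

# Two different paths through the same loop...
path_1 = ["A", "B", "A", "C", "A"]
path_2 = ["A", "C", "A", "B", "A"]

# ...take the same edges the same number of times, so the feedback is identical.
same_feedback = edge_coverage(path_1) == edge_coverage(path_2)
```

The two traces visit the branches in a different order, yet they produce the same edge counts. That is the sense in which transition hit counts do not disambiguate every path.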
Grey-box Fuzzing Algorithm
Input: Seed Inputs S
T✗ = ∅
T = S
if T = ∅ then
    add empty file to T
end if
repeat
    t = chooseNext(T)
    p = assignEnergy(t)
    for i from 1 to p do
        t' = mutate_input(t)
        if t' crashes then
            add t' to T✗
        else if isInteresting(t') then
            add t' to T
        end if
    end for
until timeout reached or abort-signal
Output: Crashing Inputs T✗
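The loop above can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not taken from the slides: `chooseNext` picks uniformly at random, `assignEnergy` returns a constant, `isInteresting` means "covers a branch id not seen before", a fixed number of rounds replaces the timeout, and `toy_target` is a hypothetical program that reports its own coverage and raises to signal a crash.

```python
import random

def mutate_input(t, rng):
    """Insert, overwrite, or delete one random byte."""
    data = bytearray(t)
    choice = rng.randrange(3)
    if choice == 0 or not data:
        data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
    elif choice == 1:
        data[rng.randrange(len(data))] = rng.randrange(256)
    else:
        del data[rng.randrange(len(data))]
    return bytes(data)

def greybox_fuzz(run, seeds, rounds, rng, energy=32):
    """`run(data)` returns the set of covered branch ids, or raises on a crash."""
    queue = list(seeds) or [b""]          # T = S, or the empty file if S is empty
    crashing = []                          # the set of crashing inputs
    seen_coverage = set()
    for _ in range(rounds):                # stands in for "until timeout"
        t = queue[rng.randrange(len(queue))]   # chooseNext: uniform (assumption)
        for _ in range(energy):                # assignEnergy: constant (assumption)
            t_new = mutate_input(t, rng)
            try:
                coverage = run(t_new)
            except Exception:
                crashing.append(t_new)
                continue
            if not coverage <= seen_coverage:  # isInteresting: new coverage
                seen_coverage |= coverage
                queue.append(t_new)
    return crashing, queue

def toy_target(s):
    """Hypothetical instrumented target: nested byte checks guarding a crash."""
    covered = {0}
    for depth, want in enumerate(b"bad!"):
        if len(s) <= depth or s[depth] != want:
            return covered
        if depth == 3:
            raise RuntimeError("abort")
        covered.add(depth + 1)
    return covered

crashing, queue = greybox_fuzz(toy_target, [], rounds=2000, rng=random.Random(0))
```

Retaining inputs that reach a deeper check lets later mutations build on partial progress, which is what separates this loop from pure black-box mutation.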
Space of Problems in SW Reliability
• Fuzz Testing
  - Feed semi-random inputs to find hangs and crashes
• Continuous fuzzing
  - Incrementally find new “problems” in software
• Crash reproduction
  - Reconstruct a reported crash when the crashing input is not included for privacy reasons
• Reaching nooks and corners
• Localizing reported observable errors
• Patching reported errors from input-output examples
Space of Techniques
Search (Fuzzing)
• Random
• Biased-random …
• … with mutations (AFL Fuzzer)
• …
• Low set-up overhead
• Fast, less accurate
• Use objective function to steer
Symbolic Execution
• Dynamic Symbolic execution
• Concolic Execution
• Cluster paths based on symbolic expressions of variables
• ....
• High set-up overhead
• Slow, more accurate
• Use logical formula to steer
In this talk …
Search
• Enhance the effectiveness of search techniques, with symbolic execution and model checking as inspiration
• Think: Symbolic (Formal)
• Act: Fuzzing (Efficient)
Programming by experienced people
Schematic
if (condition1)
    return // short path, frequented by many inputs
else if (condition2)
    exit // short path, frequented by many inputs
else ….
Prioritize low probability paths
• Use a grey-box fuzzer which keeps track of the “path id” for a test.
• Find the probability p that fuzzing a test t which exercises path π leads to an input which exercises path π'.
• Give higher weightage to the low-probability paths discovered, to gravitate towards those and discover new paths with minimal effort.
[Figure: transition from path π to path π' with probability p]
void crashme (char* s) {
    if (s[0] == 'b')
        if (s[1] == 'a')
            if (s[2] == 'd')
                if (s[3] == '!')
                    abort ();
}
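To make the transition probability concrete, here is a small Python sketch for the nested-check example. It assumes a deliberately simple mutation model (overwrite one uniformly chosen byte of a 4-byte input with a uniformly random value) and uses "number of matched leading characters" as the path id; both are illustrative choices, and `path_id` and `transition_estimate` are hypothetical names.

```python
import random

def path_id(s):
    """Number of leading characters of b"bad!" matched; 4 means abort() is reached."""
    depth = 0
    for got, want in zip(s, b"bad!"):
        if got != want:
            break
        depth += 1
    return depth

def transition_estimate(seed, trials, rng):
    """Estimate P(mutant exercises path j) when one random byte of `seed` is overwritten."""
    counts = {}
    for _ in range(trials):
        data = bytearray(seed)
        data[rng.randrange(len(data))] = rng.randrange(256)
        j = path_id(bytes(data))
        counts[j] = counts.get(j, 0) + 1
    return {j: n / trials for j, n in counts.items()}

# Exact value for moving one level deeper from b"b___":
# the mutation must pick position 1 (1 in 4) and write 'a' (1 in 256).
p_deeper = (1 / 4) * (1 / 256)

estimate = transition_estimate(b"b___", trials=100_000, rng=random.Random(1))
```

Under this model most mutants stay on the path of the seed, and only about one in a thousand moves one level deeper. Paths with such low incoming probability are the ones the schedule is meant to favour.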
Power-Schedules
Constant:
$p(i) = \alpha(i)$
AFL uses this schedule (fuzzing ~1 minute)
• $\alpha(i)$: how AFL judges the fuzzing time for the test exercising path $i$
Cut-off Exponential:
$$p(i) = \begin{cases} 0 & \text{if } f(i) > \mu \\ \min\left(\dfrac{\alpha(i)}{\beta} \cdot 2^{s(i)},\ M\right) & \text{otherwise} \end{cases}$$
$\beta$: a constant
$s(i)$: number of times the input exercising path $i$ has been chosen from the queue
$f(i)$: number of generated inputs exercising path $i$ (path-frequency)
$\mu$: mean number of fuzz exercising a discovered path (avg. path-frequency)
$M$: maximum energy expendable on a state
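The cut-off exponential schedule translates almost directly into code. The sketch below follows the definitions on this slide; the concrete values for `beta`, `max_energy`, `alpha`, and the path frequencies are made up for illustration.

```python
def cutoff_exponential(alpha, s, f, mu, beta=2.0, max_energy=1024.0):
    """Energy for the input exercising path i.

    alpha: energy AFL would assign, s: times chosen from the queue,
    f: path frequency, mu: mean path frequency.
    beta and max_energy are illustrative constants (assumptions).
    """
    if f > mu:
        return 0.0                      # high-frequency path: skip for now
    return min((alpha / beta) * 2 ** s, max_energy)

# Hypothetical path frequencies observed so far.
path_frequency = {"p1": 900, "p2": 40, "p3": 5}
mu = sum(path_frequency.values()) / len(path_frequency)   # 315.0

energies = {
    path: cutoff_exponential(alpha=64, s=1, f=freq, mu=mu)
    for path, freq in path_frequency.items()
}
# energies == {"p1": 0.0, "p2": 64.0, "p3": 64.0}
```

The frequently exercised path `p1` receives no energy, while the rare paths receive an amount that doubles each time they are picked again, up to the cap.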
Independent Evaluation and Deployment
• An independent evaluation by team Codejitsu found that AFLFast exposes errors in the benchmark binaries of the DARPA Cyber Grand Challenge 19x faster than AFL.
• Considered by Zalewski (AFL), with the following observations, paraphrased:
  - AFLFast assigns substantially less energy at the beginning of the fuzzing campaign.
  - Most of the cycles that AFLFast carries out are in fact very short. This causes the queue to be cycled very rapidly, which in turn causes newly retained inputs to be fuzzed almost immediately. In other words, because AFLFast assigns less energy, it can process the complete queue substantially faster. We say it starts by exploration rather than by exploitation.
• Implemented inside AFL, and distributed within approximately one year of publication
In this talk …
• Greybox fuzzing is used frequently, daily in corporations
  - State-of-the-art in automated vulnerability detection
  - Extremely efficient coverage-based input generation
  - All program analysis before/at instrumentation time.
  - Start with a seed corpus, choose a seed file, fuzz it.
  - Add to corpus only if new input increases coverage.
  - Cannot be directed, unlike symbolic execution!
• Enhance the effectiveness of search techniques, with symbolic execution & model checking as inspiration
  - Enhance coverage, how to make it directed?
Patch testing
Crash reproduction
Reaching a suspicious location / module
(Earlier) View-point
• Directed Fuzzing: classical constraint satisfaction problem
  - Program analysis to identify program paths that reach given program locations.
  - Symbolic execution to derive path conditions for any of the identified paths.
  - Constraint solving to find an input that satisfies the path condition and thus reaches a program location that was given.
φ1 = (x>y)∧(x+y>10)
φ2 = ¬(x>y)∧(x+y>10)
[Figure: control-flow graph. Branch on x > y (a = x or a = y), then branch on x+y>10 leading to b = a, then return b]
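The two path conditions can be checked with a tiny Python sketch. A real tool would hand φ1 and φ2 to a constraint solver; here an exhaustive search over a small integer domain stands in for the solver, purely as an illustration. The function names, the search domain, and the treatment of `b` when the target is not reached are assumptions.

```python
from itertools import product

def program(x, y):
    """The example program; returns (b, reached), where the target is the statement b = a."""
    a = x if x > y else y
    b, reached = None, False
    if x + y > 10:
        b, reached = a, True
    return b, reached

def phi1(x, y):
    return (x > y) and (x + y > 10)

def phi2(x, y):
    return (not x > y) and (x + y > 10)

def solve(phi, domain=range(-20, 21)):
    """Brute-force stand-in for a constraint solver: first (x, y) satisfying phi."""
    for x, y in product(domain, repeat=2):
        if phi(x, y):
            return x, y
    return None

witness1 = solve(phi1)   # an input driving the "a = x" path to the target
witness2 = solve(phi2)   # an input driving the "a = y" path to the target
```

Running `program` on either witness confirms that the target statement is reached, which is the end goal of the three-step pipeline on this slide.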
(Later) View-point
• Directed Fuzzing as optimization problem!
1. Instrumentation time:
  • Instrument the program to aggregate distance values.
2. Runtime, for each input:
  • Decide how long it should be fuzzed based on its distance.
  • If the input is closer to the targets, it is fuzzed for longer.
  • If the input is further away from the targets, it is fuzzed for a shorter time.
Power Schedules - Recap
Input: Seed Inputs S
T✗ = ∅
T = S
if T = ∅ then
    add empty file to T
end if
repeat
    t = chooseNext(T)
    p = assignEnergy(t)
    for i from 1 to p do
        t' = mutate_input(t)
        if t' crashes then
            add t' to T✗
        else if isInteresting(t') then
            add t' to T
        end if
    end for
until timeout reached or abort-signal
Output: Crashing Inputs T✗
Instrumentation
• Function-level target distance (FLTD) using call graph (CG)
• BB-level target distance using control-flow graph (CFG)
1. Identify target BBs and assign distance 0
2. Identify BBs that call functions and assign 10*FLTD
3. For each BB, compute harmonic mean of (length of shortest path to any function-calling BB + 10*FLTD).
[Figure: CFG for function b, with BB-level distance annotations 8.7, 11, 10, 30, 13, 12, N/A]
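The three steps can be sketched in Python for a single function. This is an illustration under assumptions, not the tool's implementation: the CFG is a plain adjacency dictionary, the function-level distances of the callees are given as input, and the "harmonic mean" is taken to be the reciprocal of the sum of reciprocals (without dividing by the number of terms). All names and the example graph are hypothetical.

```python
from collections import deque

def shortest_paths(cfg, source):
    """BFS hop counts from `source` to every reachable basic block."""
    dist = {source: 0}
    work = deque([source])
    while work:
        node = work.popleft()
        for succ in cfg.get(node, ()):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                work.append(succ)
    return dist

def bb_distances(cfg, targets, call_fltd, c=10.0):
    """BB-level distance; `call_fltd` maps call-site BBs to their callee's FLTD."""
    anchors = {bb: c * fltd for bb, fltd in call_fltd.items()}
    result = {}
    for bb in cfg:
        if bb in targets:
            result[bb] = 0.0                      # step 1
        elif bb in anchors:
            result[bb] = anchors[bb]              # step 2
        else:                                     # step 3
            hops = shortest_paths(cfg, bb)
            terms = [hops[a] + anchors[a] for a in anchors if a in hops]
            result[bb] = 1.0 / sum(1.0 / t for t in terms) if terms else None
    return result

# Hypothetical CFG: two call sites whose callees have FLTD 1 and 3.
cfg = {
    "entry": ["l", "r"],
    "l": ["call_f"], "r": ["call_g"],
    "call_f": ["exit"], "call_g": ["exit"],
    "exit": [],
}
distances = bb_distances(cfg, targets=set(), call_fltd={"call_f": 1.0, "call_g": 3.0})
```

In this example `entry` is two hops from both call sites, so its distance is 1 / (1/12 + 1/32), roughly 8.7, while `exit` cannot reach any call site and gets no distance (`None`).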
Directed fuzzing as optimization
• Integrating Simulated Annealing as power schedule
  - In the beginning (t = 0min), assign the same energy to all seeds.
  - Later (t = 10min), assign a bit more energy to seeds that are closer.
  - At exploitation (t = 80min), assign maximal energy to seeds that are closest.
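One way to obtain this behaviour is an exponentially cooling temperature that blends a constant energy with a distance-based one. The Python sketch below is illustrative only: the cooling base `20`, the time `t_exploit`, the weight `0.5`, and the function name are assumptions, and the result is a relative energy in [0, 1] rather than a number of mutations.

```python
def annealed_energy(norm_distance, t_minutes, t_exploit=40.0):
    """Relative energy for a seed.

    norm_distance: seed distance normalised to [0, 1] (0 = closest to the targets).
    t_exploit: time scale of the cooling schedule (illustrative constant).
    """
    temperature = 20.0 ** (-t_minutes / t_exploit)   # 1.0 at t = 0, decays towards 0
    return (1.0 - norm_distance) * (1.0 - temperature) + 0.5 * temperature

near, far = 0.0, 1.0
start = (annealed_energy(near, 0), annealed_energy(far, 0))     # same energy for all seeds
later = (annealed_energy(near, 10), annealed_energy(far, 10))   # closer seed gets more
late = (annealed_energy(near, 80), annealed_energy(far, 80))    # closest seed near maximum
```

At t = 0 the temperature is 1 and every seed gets 0.5 regardless of distance (exploration). As the temperature drops, the distance term dominates and the closest seeds approach the maximum energy (exploitation).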
Outcomes
• Directed greybox fuzzer (AFLGo) outperforms symbolic execution-based directed fuzzers (KATCH & BugRedux)
  • in terms of reaching more target locations and
  • in terms of detecting more vulnerabilities,
  • on their own, original benchmark sets.
• Integrated as OSS-Fuzz fork (AFLGo for Continuous Fuzzing)
• Tool AFLGo publicly available, follow-up works, survey by community.
Beyond Symb. Exec. … Verification?
Search
• Enhance the effectiveness of search techniques, with symbolic execution as inspiration
  - Enhance coverage
  - Achieve directed search
Synergy
[Figure: AFLGo vs. KLEE, with counts 84, 139, 59]
Finally: Structured Data?
Make Greybox Fuzzing input-structure aware by
1. Changing the input representation (structured files)
  - Use tree-like representation instead of bit string
2. Adding new mutation operators
  - working at chunk level (e.g., chunk deletion, insertion and splicing)
3. Prioritizing more valid seed inputs
  - More valid seeds are assigned higher fuzzing “energy”
4. Applying optimizations to retain fuzzing efficiency
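A minimal Python sketch of the tree-like representation and the three chunk-level operators. The `Chunk` class, the operator signatures, and the two-chunk example files are hypothetical; a real implementation would obtain the tree by parsing a file against an input model.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    kind: str                      # chunk type, e.g. "fmt" or "data"
    payload: bytes = b""
    children: list = field(default_factory=list)

    def serialize(self):
        """Flatten the tree back into the byte string handed to the program."""
        return self.payload + b"".join(c.serialize() for c in self.children)

def delete_chunk(root, index):
    """Chunk deletion: drop child `index`."""
    kids = root.children[:index] + root.children[index + 1:]
    return Chunk(root.kind, root.payload, kids)

def insert_chunk(root, index, chunk):
    """Chunk insertion: add `chunk` before position `index`."""
    kids = root.children[:index] + [chunk] + root.children[index:]
    return Chunk(root.kind, root.payload, kids)

def splice_chunk(root, index, donor):
    """Chunk splicing: replace child `index` by a same-kind chunk from another seed."""
    wanted = root.children[index].kind
    for candidate in donor.children:
        if candidate.kind == wanted:
            kids = root.children[:index] + [candidate] + root.children[index + 1:]
            return Chunk(root.kind, root.payload, kids)
    return root  # no compatible chunk in the donor: leave the seed unchanged

seed_a = Chunk("file", b"HDR", [Chunk("fmt", b"F1"), Chunk("data", b"D1")])
seed_b = Chunk("file", b"HDR", [Chunk("fmt", b"F2"), Chunk("data", b"D2")])

deleted = delete_chunk(seed_a, 0).serialize()                        # b"HDRD1"
inserted = insert_chunk(seed_a, 1, Chunk("data", b"XX")).serialize() # b"HDRF1XXD1"
spliced = splice_chunk(seed_a, 1, seed_b).serialize()                # b"HDRF1D2"
```

Because the operators move whole chunks, the mutants tend to remain structurally plausible, which random bit flips on the serialized bytes rarely achieve.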
Recap: Black-box fuzzing
[Figure: fuzzing loop with Test suite, Input Queue (Enqueue / Dequeue), Mutators, and Mutated files]
AFLSmart
[Figure: File Cracker turning a seed input into a tree of chunks (root, chunk 1, chunk 2, …)]
• Validity score (0->100); 100: the whole seed is fully cracked/parsed
• XML-based input model. One input model for each file format (e.g., Peach pits).
• Google FuzzBench
Pointers
Directed Greybox Fuzzing
24th ACM Conference on Computer and Communications Security (CCS) 2017.
Coverage-based Greybox Fuzzing as Markov Chain
23rd ACM Conference on Computer and Communications Security (CCS) 2016.
Also in IEEE Transactions on Software Engineering (TSE) 2019.
NUS Fuzzing page: https://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
“All” fuzzing papers: https://wcventure.github.io/FuzzingPaper/
Fuzzing Book by Zeller et al: https://www.fuzzingbook.org/
Smart Greybox Fuzzing
IEEE Transactions on Software Engineering, September 2021.
Fuzzing: Challenges and Reflections
IEEE Software, 38(3), pages 79-86, 2021. Outcome from a 2019 Shonan Meeting.
ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore

More Related Content

What's hot

Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Sangmin Park
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareSangmin Park
 
Fcv rep darrell
Fcv rep darrellFcv rep darrell
Fcv rep darrellzukun
 
LSRepair: Live Search of Fix Ingredients for Automated Program Repair
LSRepair: Live Search of Fix Ingredients for Automated Program RepairLSRepair: Live Search of Fix Ingredients for Automated Program Repair
LSRepair: Live Search of Fix Ingredients for Automated Program RepairDongsun Kim
 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Gael Varoquaux
 
Open Source Verification under a Cloud (OpenCert 2010)
Open Source Verification under a Cloud (OpenCert 2010)Open Source Verification under a Cloud (OpenCert 2010)
Open Source Verification under a Cloud (OpenCert 2010)Peter Breuer
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionGeorge Awad
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationGeorge Awad
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search George Awad
 
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...Mateus S. H. Cruz
 
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...Mateus S. H. Cruz
 
Static analysis: looking for errors ... and vulnerabilities?
Static analysis: looking for errors ... and vulnerabilities? Static analysis: looking for errors ... and vulnerabilities?
Static analysis: looking for errors ... and vulnerabilities? Andrey Karpov
 
Ieee2013_upgrading_knowledge_matlab_pt2
Ieee2013_upgrading_knowledge_matlab_pt2Ieee2013_upgrading_knowledge_matlab_pt2
Ieee2013_upgrading_knowledge_matlab_pt2Georgios Drakopoulos
 
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudPrivacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudMateus S. H. Cruz
 
Breaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic ExecutionBreaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic ExecutionSebastian Banescu
 

What's hot (20)

Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
 
DETR ECCV20
DETR ECCV20DETR ECCV20
DETR ECCV20
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent Software
 
Fcv rep darrell
Fcv rep darrellFcv rep darrell
Fcv rep darrell
 
LSRepair: Live Search of Fix Ingredients for Automated Program Repair
LSRepair: Live Search of Fix Ingredients for Automated Program RepairLSRepair: Live Search of Fix Ingredients for Automated Program Repair
LSRepair: Live Search of Fix Ingredients for Automated Program Repair
 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016
 
Open Source Verification under a Cloud (OpenCert 2010)
Open Source Verification under a Cloud (OpenCert 2010)Open Source Verification under a Cloud (OpenCert 2010)
Open Source Verification under a Cloud (OpenCert 2010)
 
STAMP
STAMPSTAMP
STAMP
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text Description
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept Localization
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search
 
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation...
 
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro...
 
Static analysis: looking for errors ... and vulnerabilities?
Static analysis: looking for errors ... and vulnerabilities? Static analysis: looking for errors ... and vulnerabilities?
Static analysis: looking for errors ... and vulnerabilities?
 
Ieee2013_upgrading_knowledge_matlab_pt2
Ieee2013_upgrading_knowledge_matlab_pt2Ieee2013_upgrading_knowledge_matlab_pt2
Ieee2013_upgrading_knowledge_matlab_pt2
 
Presentation-Umar
Presentation-UmarPresentation-Umar
Presentation-Umar
 
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudPrivacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
 
7-DIG_FINAL_paper
7-DIG_FINAL_paper7-DIG_FINAL_paper
7-DIG_FINAL_paper
 
Breaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic ExecutionBreaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic Execution
 

Similar to Dagstuhl2021

Fuzzing: The New Unit Testing
Fuzzing: The New Unit TestingFuzzing: The New Unit Testing
Fuzzing: The New Unit TestingDmitry Vyukov
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeRizwan Habib
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Intel Software Brasil
 
Who pulls the strings?
Who pulls the strings?Who pulls the strings?
Who pulls the strings?Ronny
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit
 
Lecture 25
Lecture 25Lecture 25
Lecture 25Shani729
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15MLconf
 
Scientific visualization with_gr
Scientific visualization with_grScientific visualization with_gr
Scientific visualization with_grJosef Heinen
 
Multiprocessing with python
Multiprocessing with pythonMultiprocessing with python
Multiprocessing with pythonPatrick Vergain
 
BRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkBRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkDoug Chang
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017StampedeCon
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up Craig Schumann
 

Similar to Dagstuhl2021 (20)

Fuzzing.pptx
Fuzzing.pptxFuzzing.pptx
Fuzzing.pptx
 
Fuzzing: The New Unit Testing
Fuzzing: The New Unit TestingFuzzing: The New Unit Testing
Fuzzing: The New Unit Testing
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
Computação Paralela: Benefícios e Desafios - Intel Software Conference 2013
 
Who pulls the strings?
Who pulls the strings?Who pulls the strings?
Who pulls the strings?
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 
Lecture 25
Lecture 25Lecture 25
Lecture 25
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 
TAU on Power 9
TAU on Power 9TAU on Power 9
TAU on Power 9
 
Scientific visualization with_gr
Scientific visualization with_grScientific visualization with_gr
Scientific visualization with_gr
 
Multiprocessing with python
Multiprocessing with pythonMultiprocessing with python
Multiprocessing with python
 
BRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkBRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning Talk
 
Symbolic Execution And KLEE
Symbolic Execution And KLEESymbolic Execution And KLEE
Symbolic Execution And KLEE
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
 

More from Abhik Roychoudhury

More from Abhik Roychoudhury (10)

16May_ICSE_MIP_APR_2023.pptx
16May_ICSE_MIP_APR_2023.pptx16May_ICSE_MIP_APR_2023.pptx
16May_ICSE_MIP_APR_2023.pptx
 
IFIP2023-Abhik.pptx
IFIP2023-Abhik.pptxIFIP2023-Abhik.pptx
IFIP2023-Abhik.pptx
 
Art of Computer Science Research Planning
Art of Computer Science Research PlanningArt of Computer Science Research Planning
Art of Computer Science Research Planning
 
Automated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer SchoolAutomated Repair - ISSTA Summer School
Automated Repair - ISSTA Summer School
 
Repair dagstuhl jan2017
Repair dagstuhl jan2017Repair dagstuhl jan2017
Repair dagstuhl jan2017
 
Abhik-Satish-dagstuhl
Abhik-Satish-dagstuhlAbhik-Satish-dagstuhl
Abhik-Satish-dagstuhl
 
Issta13 workshop on debugging
Issta13 workshop on debuggingIssta13 workshop on debugging
Issta13 workshop on debugging
 
Repair dagstuhl
Repair dagstuhlRepair dagstuhl
Repair dagstuhl
 
PAS 2012
PAS 2012PAS 2012
PAS 2012
 
Pas oct12
Pas oct12Pas oct12
Pas oct12
 

Recently uploaded

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Recently uploaded (20)

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors

Dagstuhl2021

  • 2. Fuzz Testing
      • Springfield Project - Fuzzing as a service
      • OSS-Fuzz - Continuous fuzzing for open-source projects
      • Proposed by Barton Miller at Univ. of Wisconsin in 1988
      • (Biased) random search over inputs to find crashes / hangs; tools existed.
      • And then, since 2016 … research in looking inside the tools
  • 3. Who cares?
      • A team of hackers won $2 million by building a machine that could hack better than they could
      • DARPA Cyber Grand Challenge
      • Automation of Security [detecting and fixing vulnerabilities in binaries automatically]
      • Dagstuhl, 2021
  • 4. Black-box Fuzzing
      • Presented by Thuan Pham
  • 5. Lack of specifications
      • Many programs take in structured inputs
          - PDF readers, libraries for manipulating TIFF and PNG images
          - Compilers, which take in programs as input
          - Web browsers, ...
      • Generating a completely random input will likely crash the application with little insight gained about the underlying vulnerability.
      • Instead, take a legal, well-formed PDF file and mutate it!
          - Does not depend on the program at all [nature of black-box (BB) fuzzing]
          - Does not even depend on the input structure (Peach fuzzer provides Peach PIT files)
          - Yet can leverage complex input structure by starting with a well-formed seed and minimally modifying it.
  • 7. Grey-box Fuzzing
      • [Diagram labels: Mutators, Test suite, Mutated files, Input Queue, Enqueue, Dequeue]
      • Compile-time instrumentation
          - Basic block transitions, hit counts
          - Does not disambiguate every path
  • 8. Grey-box Fuzzing Algorithm
      Input: Seed Inputs S
          T✗ = ∅
          T = S
          if T = ∅ then
              add empty file to T
          end if
          repeat
              t = chooseNext(T)
              p = assignEnergy(t)
              for i from 1 to p do
                  t′ = mutate_input(t)
                  if t′ crashes then
                      add t′ to T✗
                  else if isInteresting(t′) then
                      add t′ to T
                  end if
              end for
          until timeout reached or abort-signal
      Output: Crashing Inputs T✗
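The loop above can be sketched in Python as follows. This is a minimal illustration, not AFL's implementation: `run_target` is a hypothetical stand-in for an instrumented program (it "crashes" on inputs starting with `bad!` and reports which checks were passed as coverage), the mutator only overwrites or appends a single byte, `chooseNext` is a uniform random pick, and `assignEnergy` is a constant. All of these choices are assumptions made for the example.

```python
import random

def run_target(data: bytes):
    """Hypothetical instrumented target: returns (crashed, coverage)."""
    coverage = set()
    for i, expected in enumerate(b"bad!"):
        if len(data) <= i or data[i] != expected:
            return False, frozenset(coverage)
        coverage.add(i)                      # check i was passed
    return True, frozenset(coverage)         # all four checks passed: crash

def mutate_input(data: bytes, rng: random.Random) -> bytes:
    """Append one random byte, or overwrite one byte at a random position."""
    buf = bytearray(data)
    if not buf or rng.random() < 0.5:
        buf.append(rng.randrange(256))
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def greybox_fuzz(seeds, max_execs=500_000, rng_seed=0):
    rng = random.Random(rng_seed)
    crashing = []                            # T_x: crashing inputs
    queue = list(seeds) or [b""]             # T: seeds, or the empty file
    seen = {run_target(t)[1] for t in queue}
    execs = 0
    while execs < max_execs and not crashing:    # "timeout" = execution budget
        t = rng.choice(queue)                # chooseNext (assumed: uniform)
        energy = 64                          # assignEnergy (assumed: constant)
        for _ in range(energy):
            t_new = mutate_input(t, rng)
            crashed, cov = run_target(t_new)
            execs += 1
            if crashed:
                crashing.append(t_new)
            elif cov not in seen:            # isInteresting: new coverage
                seen.add(cov)
                queue.append(t_new)
    return crashing, queue

if __name__ == "__main__":
    crashes, queue = greybox_fuzz([b"AAAA"])
    print("crashing inputs:", crashes[:1], "queue size:", len(queue))
```

Because inputs that pass one more check are retained in the queue, the fuzzer can discover the four-byte prefix one byte at a time instead of having to guess it all at once.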
  • 9. Space of Problems in SW Reliability
      • Fuzz testing
          - Feed semi-random inputs to find hangs and crashes
      • Continuous fuzzing
          - Incrementally find new “problems” in software
      • Crash reproduction
          - Reconstruct a reported crash when the crashing input is not included due to privacy
      • Reaching nooks and corners
      • Localizing reported observable errors
      • Patching reported errors from input-output examples
  • 10. Space of Techniques
      • Search (Fuzzing)
          - Random
          - Biased-random …
          - … with mutations (AFL Fuzzer)
          - …
          - Low set-up overhead
          - Fast, less accurate
          - Use objective function to steer
      • Symbolic Execution
          - Dynamic symbolic execution
          - Concolic execution
          - Cluster paths based on symbolic expressions of variables
          - ....
          - High set-up overhead
          - Slow, more accurate
          - Use logical formula to steer
  • 11. In this talk …
      • Search: enhance the effectiveness of search techniques, with symbolic execution and model checking as inspiration
      • Think: Symbolic (Formal)
      • Act: Fuzzing (Efficient)
  • 12. Grey-box Fuzzing Algorithm (same algorithm as on slide 8)
  • 13. Programming by experienced people
      Schematic:
          if (condition1)
              return  // short path, frequented by many inputs
          else if (condition2)
              exit    // short path, frequented by many inputs
          else
              ….
  • 14. Prioritize low probability paths
      • Use a grey-box fuzzer which keeps track of the “path id” for a test.
      • Find the probability p that fuzzing a test t which exercises path π leads to an input which exercises path π′.
      • Give higher weightage to the low-probability paths discovered, to gravitate towards those -> discover new paths with minimal effort.
      • [Diagram: transition from π to π′, labelled with probability p]
      • Example:
          void crashme(char* s) {
            if (s[0] == 'b')
              if (s[1] == 'a')
                if (s[2] == 'd')
                  if (s[3] == '!')
                    abort();
          }
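A small worked calculation shows why retaining intermediate paths matters for `crashme`. The sketch below compares the expected number of attempts under two simplifying assumptions that are not stated on the slide: (a) blind generation draws all four bytes uniformly at random each time, and (b) a stepwise fuzzer always mutates the most advanced retained input by overwriting one uniformly chosen position (out of 4) with a uniformly random byte.

```python
from fractions import Fraction

P_BYTE = Fraction(1, 256)   # chance that one uniformly random byte matches

def expected_blind_attempts(depth: int = 4) -> Fraction:
    """All `depth` bytes must be right in the same attempt."""
    return 1 / (P_BYTE ** depth)

def expected_stepwise_attempts(depth: int = 4, length: int = 4) -> Fraction:
    """One byte is fixed per stage; each stage is a geometric trial."""
    p_step = Fraction(1, length) * P_BYTE   # right position and right value
    return depth * (1 / p_step)

if __name__ == "__main__":
    print(expected_blind_attempts())     # 4294967296, i.e. 2**32
    print(expected_stepwise_attempts())  # 4096, i.e. 4 * 2**10
```

Under these assumptions the deepest path is rare from the seed but cheap to reach from its neighbour, which is the intuition behind spending more energy on inputs that exercise low-probability paths.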
  • 15. Power-Schedules 15 Constant: AFL uses this schedule (fuzzing ~1 minute)  (i) .. how AFL judges fuzzing time for the test exercising path i Cut-off Exponential: p(i) = (i) p(i) = 0, if f(i) > µ min( ((i)/β)*2s(i), M) otherwise β is a constant s(i) #times the input exercising path i has been chosen from queue f(i) # generated inputs exercising path i (path-frequency) µ mean #fuzz exercising a discovered path (avg. path-frequency) M maximum energy expendable on a state Dagstuhl, 2021
  • 16. Independent Evaluation and Deployment
      • An independent evaluation by team Codejitsu found that AFLFast exposes errors in the benchmark binaries of the DARPA Cyber Grand Challenge 19x faster than AFL.
      • Considered by Zalewski (AFL), with the following observations, paraphrased:
          - AFLFast assigns substantially less energy at the beginning of the fuzzing campaign.
          - Most of the cycles that AFLFast carries out are in fact very short. This causes the queue to be cycled very rapidly, which in turn causes newly retained inputs to be fuzzed almost immediately. In other words, because AFLFast assigns less energy, it can process the complete queue substantially faster. We say it starts by exploration rather than by exploitation.
      • Implemented inside AFL, and distributed within approximately one year of publication
  • 17. In this talk …
      • Greybox fuzzing is frequently used, daily in corporations
          - State of the art in automated vulnerability detection
          - Extremely efficient coverage-based input generation
          - All program analysis happens before/at instrumentation time.
          - Start with a seed corpus, choose a seed file, fuzz it.
          - Add to the corpus only if the new input increases coverage.
          - Cannot be directed, unlike symbolic execution!
      • Enhance the effectiveness of search techniques, with symbolic execution & model checking as inspiration
          - Enhance coverage; how to make it directed?
      • Uses of directed search: patch testing, crash reproduction, reaching a suspicious location / module
  • 18. (Earlier) View-point
      • Directed fuzzing: a classical constraint satisfaction problem.
          - Program analysis to identify program paths that reach given program locations.
          - Symbolic execution to derive path conditions for any of the identified paths.
          - Constraint solving to find an input that satisfies the path condition and thus reaches a program location that was given.
      • Example path conditions:
          - φ1 = (x>y)∧(x+y>10)
          - φ2 = ¬(x>y)∧(x+y>10)
      • [Control-flow diagram: branch on x > y selects a = x or a = y; branch on x+y>10 leads to b = a; then return b]
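To make the constraint-satisfaction view concrete, the sketch below encodes φ1 and φ2 as Python predicates and finds satisfying inputs by exhaustive search over a small integer domain. The brute-force search is only a stand-in for a real constraint solver, and the `program` function is an assumed reading of the control-flow diagram (including the initial value of `b`), not code from the slides.

```python
from itertools import product

def phi1(x: int, y: int) -> bool:
    """Path condition: take the x > y branch, then the x + y > 10 branch."""
    return x > y and x + y > 10

def phi2(x: int, y: int) -> bool:
    """Path condition: skip the x > y branch, then take the x + y > 10 branch."""
    return (not x > y) and x + y > 10

def program(x: int, y: int):
    """Assumed reading of the diagram; b's initial value is a guess."""
    a = x if x > y else y
    b = None
    if x + y > 10:
        b = a                    # the program location we want to reach
    return b

def solve(path_condition, domain=range(-5, 16)):
    """Toy 'solver': return the first (x, y) in the domain satisfying the condition."""
    for x, y in product(domain, repeat=2):
        if path_condition(x, y):
            return x, y
    return None

if __name__ == "__main__":
    for name, phi in (("phi1", phi1), ("phi2", phi2)):
        x, y = solve(phi)
        print(name, "satisfied by", (x, y), "-> program returns", program(x, y))
```

Each solution is an input that drives execution down one identified path to the target assignment, which is what a symbolic-execution-based directed fuzzer obtains from its solver.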
  • 19. (Later) View-point
      • Directed fuzzing as an optimization problem!
      1. Instrumentation time:
          - Instrument the program to aggregate distance values.
      2. Runtime, for each input:
          - Decide how long it is fuzzed based on distance.
          - If the input is closer to the targets, it is fuzzed for longer.
          - If the input is further away from the targets, it is fuzzed for a shorter time.
  • 20. Power Schedules - Recap (same grey-box fuzzing algorithm as on slide 8; the power schedule is the assignEnergy step)
  • 21. Instrumentation
      • Function-level target distance (FLTD) using the call graph (CG)
      • Basic-block-level (BB-level) target distance using the control-flow graph (CFG)
          1. Identify target BBs and assign distance 0
          2. Identify BBs that call functions and assign 10*FLTD
          3. For each BB, compute the harmonic mean of (length of shortest path to any function-calling BB + 10*FLTD).
      • [Figure: CFG for function b, with node distances 8.7, 11, 10, 30, 13, 12, N/A]
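The three steps can be sketched on a tiny, made-up control-flow graph. This is a simplified reading of the slide rather than AFLGo's actual implementation: the graph, the node names, and the FLTD value are hypothetical, the constant 10 is taken from the slide, and the sketch assumes that both target BBs and function-calling BBs serve as anchors for step 3.

```python
from collections import deque

def hops_from(cfg, src):
    """Breadth-first search: shortest path length from src to every reachable BB."""
    dist = {src: 0}
    todo = deque([src])
    while todo:
        node = todo.popleft()
        for succ in cfg[node]:
            if succ not in dist:
                dist[succ] = dist[node] + 1
                todo.append(succ)
    return dist

def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)

def bb_distances(cfg, targets, call_fltd, c=10):
    """cfg: BB -> successors; call_fltd: calling BB -> FLTD of the function it calls."""
    anchors = {bb: 0 for bb in targets}              # step 1: targets get 0
    for bb, fltd in call_fltd.items():
        anchors.setdefault(bb, c * fltd)             # step 2: callers get c * FLTD
    result = {}
    for bb in cfg:
        if bb in anchors:
            result[bb] = anchors[bb]
            continue
        hops = hops_from(cfg, bb)
        terms = [hops[a] + d for a, d in anchors.items() if a in hops]
        result[bb] = harmonic_mean(terms) if terms else None   # None = "N/A"
    return result

if __name__ == "__main__":
    cfg = {"entry": ["a", "b"], "a": ["call_f"], "b": ["target"],
           "call_f": ["exit"], "target": ["exit"], "exit": []}
    print(bb_distances(cfg, targets={"target"}, call_fltd={"call_f": 2}))
```

Blocks from which no anchor is reachable get no distance, which corresponds to the "N/A" node in the figure; the harmonic mean keeps a block that is close to at least one target from being penalised heavily for also being far from others.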
  • 22. Directed fuzzing as optimization
      • Integrating Simulated Annealing as the power schedule
          - In the beginning (t = 0 min), assign the same energy to all seeds.
          - Later (t = 10 min), assign a bit more energy to seeds that are closer.
          - At exploitation (t = 80 min), assign maximal energy to seeds that are closest.
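A minimal sketch of such an annealing-based schedule is given below. It uses an exponential cooling schedule in which the "temperature" falls from 1 to 0.05 by the time exploitation begins; this particular formula and its constants are assumptions chosen to reproduce the behaviour described on the slide, and should not be read as the exact AFLGo implementation. The seed distance is assumed to be normalised to [0, 1], with 0 meaning closest to the targets.

```python
def temperature(t_min: float, t_exploit_min: float) -> float:
    """Exponential cooling: 1.0 at t = 0, 0.05 at t = t_exploit (assumed constants)."""
    return 20.0 ** (-t_min / t_exploit_min)

def annealing_weight(norm_distance: float, t_min: float,
                     t_exploit_min: float = 80.0) -> float:
    """Energy weight in [0, 1] for a seed; higher means it is fuzzed for longer."""
    temp = temperature(t_min, t_exploit_min)
    return (1.0 - norm_distance) * (1.0 - temp) + 0.5 * temp

if __name__ == "__main__":
    for t in (0, 10, 80):
        near = annealing_weight(0.0, t)
        far = annealing_weight(1.0, t)
        print(f"t={t:>2} min  closest seed: {near:.3f}  farthest seed: {far:.3f}")
```

At t = 0 every seed gets the same weight (exploration); as the temperature drops, the weight depends more and more on distance, so the closest seeds end up receiving nearly all of the energy (exploitation).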
  • 23. Outcomes
      • The directed greybox fuzzer (AFLGo) outperforms symbolic execution-based directed fuzzers (KATCH & BugRedux)
          - in terms of reaching more target locations and
          - in terms of detecting more vulnerabilities,
          - on their own, original benchmark sets.
      • Integrated as an OSS-Fuzz fork (AFLGo for Continuous Fuzzing)
      • The tool AFLGo is publicly available; follow-up works and a survey by the community.
  • 24. Beyond Symb. Exec. … Verification?
      • Search: enhance the effectiveness of search techniques, with symbolic execution as inspiration
          - Enhance coverage
          - Achieve directed search
      • Synergy
      • [Venn diagram: AFLGo and KLEE, with region counts 84, 139, 59]
  • 25. Finally: Structured Data?
      • Make greybox fuzzing input-structure aware by
          1. Changing the input representation (structured files)
              - Use a tree-like representation instead of a bit string
          2. Adding new mutation operators
              - Working at chunk level (e.g., chunk deletion, insertion and splicing)
          3. Prioritizing more valid seed inputs
              - More valid seeds are assigned higher fuzzing “energy”
          4. Applying optimizations to retain fuzzing efficiency
      • Recap: Black-box fuzzing
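The chunk-level operators can be illustrated on a toy tree representation. The `Chunk` class, the chunk kinds, and the restriction to top-level children are all simplifications invented for this sketch; it is not AFLSmart's data model, only an example of deletion, insertion, and same-kind splicing on a tree instead of a flat byte string.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chunk:
    kind: str                                   # chunk type from the input model
    data: bytes = b""                           # bytes owned by this chunk itself
    children: List["Chunk"] = field(default_factory=list)

    def serialize(self) -> bytes:
        return self.data + b"".join(c.serialize() for c in self.children)

def delete_chunk(root: Chunk, rng: random.Random) -> Chunk:
    """Remove one randomly chosen child chunk."""
    out = copy.deepcopy(root)
    if out.children:
        del out.children[rng.randrange(len(out.children))]
    return out

def insert_chunk(root: Chunk, donor: Chunk, rng: random.Random) -> Chunk:
    """Insert a copy of a random donor chunk at a random position."""
    out = copy.deepcopy(root)
    if donor.children:
        new = copy.deepcopy(rng.choice(donor.children))
        out.children.insert(rng.randrange(len(out.children) + 1), new)
    return out

def splice_chunk(root: Chunk, donor: Chunk, rng: random.Random) -> Chunk:
    """Replace a chunk with a donor chunk of the same kind."""
    out = copy.deepcopy(root)
    pairs = [(i, d) for i, c in enumerate(out.children)
             for d in donor.children if d.kind == c.kind]
    if pairs:
        i, d = rng.choice(pairs)
        out.children[i] = copy.deepcopy(d)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    seed = Chunk("root", b"HDR", [Chunk("fmt", b"fmt1"), Chunk("data", b"AAAA")])
    donor = Chunk("root", b"HDR", [Chunk("data", b"BBBBBBBB")])
    print(splice_chunk(seed, donor, rng).serialize())   # b'HDRfmt1BBBBBBBB'
```

Because whole chunks are moved or replaced, the mutated file is more likely to stay well-formed than one produced by flipping arbitrary bits.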
  • 26. AFLSmart
      • [Diagram labels: Mutators, Test suite, Mutated files, Input Queue, Enqueue, Dequeue, File Cracker; chunk tree: root, chunk 1, chunk 2, …]
      • Seed input validity score (0 -> 100); 100: the whole seed is fully cracked/parsed
      • XML-based input model. One input model for each file format (e.g., Peach pits).
  • 28. Pointers
      • Directed Greybox Fuzzing. 24th ACM Conference on Computer and Communications Security (CCS), 2017.
      • Coverage-based Greybox Fuzzing as Markov Chain. 23rd ACM Conference on Computer and Communications Security (CCS), 2016. Also in IEEE Transactions on Software Engineering (TSE), 2019.
      • Smart Greybox Fuzzing. IEEE Transactions on Software Engineering, September 2021.
      • Fuzzing: Challenges and Reflections. IEEE Software, 38(3), pages 79-86, 2021. Outcome from a 2019 Shonan Meeting.
      • NUS Fuzzing page: https://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
      • “All” fuzzing papers: https://wcventure.github.io/FuzzingPaper/
      • Fuzzing Book by Zeller et al: https://www.fuzzingbook.org/
      • ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore