SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits

CCS 2017
Wei You, Kai Chen, XiaoFeng Wang, et al.
Indiana University, UCAS, etc

Abstract
● Exploits for input-validation flaws can be generated automatically, though doing so is hard and successes are rare
● Less understood are the implications of other bug-related information (CVE descriptions, etc.), and such information can facilitate exploit generation
● The authors present SemFuzz, a tool that leverages vulnerability-related text to guide the automatic generation of PoC exploits
○ Target: Linux kernel vulnerabilities with CVE reports and git logs
○ Including use-after-free (UAF), memory corruption, information leaks, etc.
● Succeeded on 18 of 112 CVEs; also found 1 0-day and 1 undisclosed vulnerability

Background
● Implications of “other” information
● Challenges in automatic exploit generation

Vulnerability Life Cycle
● System updates are often slow
● Miscreants are often given a large time frame (30
days on average), during which they can leverage
the information exposed by public patches to
recover hidden bugs
● Less understood, however, are the implications of
other information
○ CVE reports, git logs, bug descriptions posted on forums and blogs
● Can such information also be leveraged to automatically construct complicated exploits?

Challenges in AEG
● Attacks on input-validation flaws
○ Symbolic execution
○ Constraint solving is known to be difficult
■ Non-linear, incomplete constraints
● Other types of vulnerabilities are more complicated and cannot be fixed by a simple patch
○ Sometimes a whole chunk of code needs to be replaced

SemFuzz
● Design
● Semantic Information Retrieving
● Semantic Guided Fuzzing

Design
● Semantic Information Retrieving
○ NLP
○ Affected version, vulnerability type, vulnerable functions, critical variables, system calls
● Semantics-based Fuzzing
○ Generate seeds
○ Mutate
■ Coarse-level
■ Fine-grained
○ Event Listener

Semantic Information Retrieving

● Natural Language Processing
○ Part-of-Speech(POS) Tagging, Phrase Parsing and Syntactic Parsing
● Generate a parse tree
○ Represents the syntactic structure of a sentence according to a Context-Free Grammar (CoFG)
(Figure: parse tree of the example sentence “the whole skb len is dangerous”; S: sentence, NP: noun phrase, VP: verb phrase, JJ: adjective, NN: noun)
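To make the parse-tree idea concrete, here is a minimal sketch that walks a hand-built tree for the example sentence and pulls out its noun phrase and nouns. The tree itself is an assumption (a real pipeline would get it from a constituency parser), and the extra Penn Treebank-style tags DT, VBZ and ADJP do not appear on the slide. The helper names are hypothetical, not SemFuzz code.

```python
# Hand-built parse tree for "the whole skb len is dangerous" (an assumption:
# a real pipeline would obtain it from a constituency parser). Inner nodes are
# (label, child, ...) tuples; leaves are (POS tag, word) pairs.
TREE = ("S",
        ("NP", ("DT", "the"), ("JJ", "whole"), ("NN", "skb"), ("NN", "len")),
        ("VP", ("VBZ", "is"), ("ADJP", ("JJ", "dangerous"))))


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def tagged_words(node):
    """Return the (tag, word) leaves under node, left to right."""
    if is_leaf(node):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(tagged_words(child))
    return words


def noun_phrases(node):
    """Return the text of every NP subtree, outermost first."""
    if is_leaf(node):
        return []
    found = []
    if node[0] == "NP":
        found.append(" ".join(word for _, word in tagged_words(node)))
    for child in node[1:]:
        found.extend(noun_phrases(child))
    return found


def nouns(node):
    """Return words tagged NN* (candidates for variable or object names)."""
    return [word for tag, word in tagged_words(node) if tag.startswith("NN")]


print(noun_phrases(TREE))  # ['the whole skb len']
print(nouns(TREE))         # ['skb', 'len']
```

The extracted nouns are the kind of tokens that could later be matched against kernel symbols when looking for critical variables.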

● Affected version: matched with regular expressions
● Vulnerability type: matched against a list of candidate types
● Vulnerable functions: extracted from the code diff
● Critical variables: matched against the symbol table
● System calls:
○ Two kinds: those that prepare the environment and those that trigger the bug
○ Sometimes the bug description mentions no system call
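The first four extraction rules above can be sketched with plain regular expressions and set lookups. Everything below is illustrative: the CVE-style description and patch hunk are made up, and the candidate type list, symbol table and function names are hypothetical rather than SemFuzz's actual rules.

```python
import re

# Made-up, CVE-style description and patch hunk (illustration only).
DESCRIPTION = (
    "Use-after-free vulnerability in the foo_release function in "
    "net/foo/foo.c in the Linux kernel before 4.5.3 allows local users "
    "to gain privileges via a crafted sendto system call that leaves "
    "sk_refcnt unbalanced."
)
PATCH = """\
@@ -120,7 +120,8 @@ static int foo_release(struct socket *sock)
-\tsock_put(sk);
+\tif (sk)
+\t\tsock_put(sk);
"""
# Hypothetical candidate type list and symbol table.
CANDIDATE_TYPES = ["use-after-free", "double free", "out-of-bounds",
                   "information leak", "integer overflow",
                   "null pointer dereference"]
SYMBOL_TABLE = {"sk_refcnt", "sk_sndbuf", "len", "foo_release"}


def affected_version(text):
    """Regular expression for phrases like 'Linux kernel before 4.5.3'."""
    m = re.search(r"Linux kernel (?:before|through) (\d+(?:\.\d+)+)", text)
    return m.group(1) if m else None


def vulnerability_types(text):
    """Match the description against the candidate type list."""
    lower = text.lower()
    return [t for t in CANDIDATE_TYPES if t in lower]


def vulnerable_functions(diff):
    """git diff hunk headers usually end with the enclosing function's signature."""
    funcs = []
    for header in re.findall(r"^@@ [^@]* @@ (.*)$", diff, re.MULTILINE):
        name = re.search(r"(\w+)\s*\(", header)
        if name and name.group(1) not in funcs:
            funcs.append(name.group(1))
    return funcs


def critical_variables(text, functions):
    """Identifiers in the text that are in the symbol table but are not functions."""
    words = set(re.findall(r"[A-Za-z_]\w*", text))
    return sorted((words & SYMBOL_TABLE) - set(functions))


funcs = vulnerable_functions(PATCH)
print(affected_version(DESCRIPTION))           # 4.5.3
print(vulnerability_types(DESCRIPTION))        # ['use-after-free']
print(funcs)                                   # ['foo_release']
print(critical_variables(DESCRIPTION, funcs))  # ['sk_refcnt']
```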

Syscall
● Build a knowledge base
○ From the Linux Programmer's Manual (LPM)
● Correlate the keywords to domain-specific concepts
○ e.g. link MSG_MORE to the flags parameter of the sendto system call
● Select the system call that covers the most keywords
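The "cover the most keywords" rule can be sketched as a simple overlap count against a keyword-annotated knowledge base. The knowledge-base entries below are invented for illustration (SemFuzz builds its own from the LPM), and the alphabetical tie-break is an assumption.

```python
# Hypothetical knowledge base: system call -> keywords associated with it.
# These entries are made up; they are not SemFuzz's LPM-derived data.
KNOWLEDGE_BASE = {
    "socket": {"socket", "af_inet", "sock_dgram", "protocol"},
    "bind": {"bind", "address", "port"},
    "sendto": {"sendto", "send", "msg_more", "flags", "udp"},
    "setsockopt": {"setsockopt", "option", "level"},
}


def select_syscall(keywords, kb=KNOWLEDGE_BASE):
    """Return the system call whose keyword set covers the most keywords.

    Returns None if no system call matches any keyword. Candidates are
    visited in alphabetical order, so ties go to the first name.
    """
    wanted = {k.lower() for k in keywords}
    best, best_cover = None, 0
    for name in sorted(kb):
        cover = len(kb[name] & wanted)
        if cover > best_cover:
            best, best_cover = name, cover
    return best


print(select_syscall(["MSG_MORE", "UDP", "flags", "socket"]))  # sendto
```

Here "socket" matches one keyword but sendto matches three, so sendto wins, mirroring the slide's MSG_MORE example.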

Semantics-Guided Fuzzing
● Environment setup
○ Syzkaller-based framework
● Generate the seed input
● Coarse-level mutation
○ Find a system call sequence
● Fine-grained mutation
○ Mutate variable values
○ Monitor “critical variables”
● Trigger the vulnerability
KCOV: kernel code coverage API
Parameter Monitor: observes the parameters of kernel functions (kf) that the critical variables depend on, found with control-/data-flow analysis (CFA/DFA), instead of the critical variables themselves
Out-Box Loader: captures abnormal events (KASAN, UBSAN, etc.)
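The slide does not say how abnormal events are captured. As a rough sketch, assuming the listener simply scans kernel log lines, sanitizer reports can be recognized by their headers. The KASAN pattern follows the "BUG: KASAN: <type> in <function>+0x.../0x..." header; the UBSAN pattern matches the "UBSAN: Undefined behaviour in <file>:<line>" form printed by 4.x-era kernels. The sample log lines and the function name are hypothetical.

```python
import re

# KASAN report header, e.g. "BUG: KASAN: use-after-free in foo_release+0x3c/0x90".
KASAN_RE = re.compile(r"BUG: KASAN: ([\w-]+) in ([\w.]+)")
# UBSAN report header as printed by 4.x-era kernels.
UBSAN_RE = re.compile(r"UBSAN: (.+)")


def abnormal_events(log_lines):
    """Return (sanitizer, detail, function) tuples for sanitizer reports found."""
    events = []
    for line in log_lines:
        m = KASAN_RE.search(line)
        if m:
            # Bug type and the faulting function (offset/size suffix dropped).
            events.append(("KASAN", m.group(1), m.group(2)))
            continue
        m = UBSAN_RE.search(line)
        if m:
            events.append(("UBSAN", m.group(1).strip(), None))
    return events


# Made-up log excerpt for illustration.
SAMPLE_LOG = [
    "[   12.301] BUG: KASAN: use-after-free in foo_release+0x3c/0x90",
    "[   12.302] Read of size 8 at addr ffff880012345678",
    "[   30.117] UBSAN: Undefined behaviour in net/foo/foo.c:42:9",
]
for event in abnormal_events(SAMPLE_LOG):
    print(event)
```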

Seed Input
● First, put all retrieved system calls together
○ This yields an incomplete seed input
○ Fill in all parameters, including structures (learned from the LPM)
○ e.g. socket and sendto also need a bind call
● Second, correlate other system calls with the retrieved ones
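One way to picture the completion step is as resource-dependency resolution: every call's required resources must be produced by an earlier call. The resource names and the CONSUMES/PRODUCES tables below are hypothetical and only encode the slide's example (socket and sendto need a bind); this is a sketch, not SemFuzz's correlation algorithm.

```python
# Hypothetical resource model: what each system call consumes and produces.
CONSUMES = {
    "socket": set(),
    "bind": {"sock_fd"},
    "sendto": {"sock_fd", "bound_sock"},
    "close": {"sock_fd"},
}
PRODUCES = {
    "socket": {"sock_fd"},
    "bind": {"bound_sock"},
    "sendto": set(),
    "close": set(),
}


def complete_seed(retrieved):
    """Insert missing producer calls so each call's inputs exist beforehand."""
    seq, available = [], set()

    def add(call):
        for res in sorted(CONSUMES[call]):
            if res not in available:
                # Pick the first (alphabetically) call that produces res.
                producer = next(c for c in sorted(PRODUCES) if res in PRODUCES[c])
                add(producer)
        seq.append(call)
        available.update(PRODUCES[call])

    for call in retrieved:
        add(call)
    return seq


print(complete_seed(["socket", "sendto"]))  # ['socket', 'bind', 'sendto']
```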

Coarse-level Mutation
● Mutate the input and measure the distance between the execution trace and the vulnerable function
○ Distance: shortest path in the call graph
○ Inputs that get closer become new seed inputs
● Construct a reverse call graph
○ Backward reachability analysis
○ Modify GCC to collect call information
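The reverse-call-graph distance can be sketched as a breadth-first search that starts at the vulnerable function and walks caller edges backward; a trace's distance is then that of its closest function. The call graph below is simplified and hypothetical (real kernel call chains have more intermediate functions), and the "minimum over the trace" scoring is an assumption.

```python
from collections import deque

# Simplified, hypothetical call graph (caller -> callees), as might be
# collected by an instrumented compiler.
CALL_GRAPH = {
    "sys_sendto": ["sock_sendmsg"],
    "sock_sendmsg": ["udp_sendmsg"],
    "udp_sendmsg": ["ip_append_data"],
    "sys_bind": ["inet_bind"],
    "inet_bind": [],
    "ip_append_data": [],
}


def distances_to(target, call_graph):
    """BFS over the reversed call graph: call depth from each function to target."""
    reverse = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in reverse.get(fn, []):
            if caller not in dist:  # backward-reachable, first visit is shortest
                dist[caller] = dist[fn] + 1
                queue.append(caller)
    return dist


def trace_distance(trace, dist):
    """Distance of a trace: its closest function (inf if none reaches target)."""
    return min((dist[f] for f in trace if f in dist), default=float("inf"))


dist = distances_to("ip_append_data", CALL_GRAPH)
print(trace_distance(["sys_sendto", "sock_sendmsg"], dist))  # 2
print(trace_distance(["sys_bind", "inet_bind"], dist))       # inf
```

Functions that never appear in the BFS result cannot reach the vulnerable function, which is what makes the backward reachability check cheap to reuse across many mutated inputs.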

Fine-grained Mutation
● Mutate the values of system call parameters
● Only observe the function parameters that the critical variables depend on
○ Found via data-flow and control-flow analysis (DFA, CFA)
● Measure input quality by the distance between basic blocks (BBLs)
○ e: entry BBL
○ p: patched BBL
○ b: current BBL
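The slide's quality formula was an image and is not reproduced here. As a hedged stand-in, the sketch below measures shortest CFG distances from b to p and from e to p, then combines them with a hypothetical normalization (1 at the patched block, 0 at or beyond the entry's distance). The CFG and the scoring function are assumptions for illustration only.

```python
from collections import deque

# Hypothetical intra-procedural CFG: basic block -> successor blocks.
CFG = {
    "e": ["b1", "b2"],
    "b1": ["b3"],
    "b2": ["exit"],
    "b3": ["p", "exit"],
    "p": ["exit"],
    "exit": [],
}


def bbl_distance(cfg, src, dst):
    """Shortest number of CFG edges from src to dst (None if unreachable)."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in cfg.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None


def input_quality(cfg, entry, patch, current):
    """Hypothetical score in [0, 1]: 1 at the patched block, 0 if unreachable."""
    total = bbl_distance(cfg, entry, patch)
    left = bbl_distance(cfg, current, patch)
    if total is None or left is None:
        return 0.0
    if total == 0:
        return 1.0
    return max(0.0, 1.0 - left / total)


print(input_quality(CFG, "e", "p", "b3"))  # about 0.67: one edge from p
print(input_quality(CFG, "e", "p", "b2"))  # 0.0: p is unreachable from b2
```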

Evaluation
● Effectiveness
● Performance
● Findings
● Cases

Effectiveness
● Environment
○ x86/x86_64 Linux kernels from 4.0 to 4.11
○ KCOV backported to versions before 4.6
○ KASAN & UBSAN enabled
○ Vulnerabilities that require specific devices are filtered out
○ Time limit: 48 hours
● Generated PoC exploits for 18 (16%) CVEs
○ Only 5 of the 18 had been studied before; the others had no known trigger
● For the remaining 94
○ 49% reached the vulnerable function
○ 20% reached the patched block

Performance
● Faster than Syzkaller
○ 13.2 h vs. 33.9 h
○ 18 vs. 7 vulnerabilities triggered
● Corner cases
○ Specific conditions
○ Race conditions

Findings
● More vulnerable functions decrease the chance of triggering the vulnerability
○ So do more critical variables
● More precise information works better
● Unknown vulnerabilities
○ 0-day: CVE-2017-6347
○ Undisclosed vulnerability

Cases
● 0-day: CVE-2017-6347
○ Found while fuzzing for CVE-2016-4794
■ a UAF vulnerability in the Berkeley Packet Filter (bpf) subsystem
○ Same system call sequence with different parameters
● Undisclosed vulnerability
○ Found while fuzzing for CVE-2016-3841
■ a UAF vulnerability in the networking subsystem
○ 18 vulnerable functions/patches
○ Triggered in another protocol
