[cb22] SMARTIAN: Enhancing Smart Contract Fuzzing with Static and Dynamic Data-Flow Analyses by Doyeon Kim

SMARTIAN: Enhancing Smart Contract Fuzzing with Static and Dynamic Data-Flow Analyses
Jaeseung Choi (KAIST), Doyeon Kim (LINE Plus), Soomin Kim (KAIST), Gustavo Grieco (Trail of Bits), Alex Groce (Northern Arizona University), Sang Kil Cha (KAIST)
CODE BLUE 2022

Ethereum Smart Contract
• Ethereum: the most popular blockchain-based smart contract platform.
• Smart contract = (code + data) on the blockchain.
[Figure: ether (digital cash) and contract code stored on the blockchain; contract code runs on the EVM (Ethereum Virtual Machine)]

Smart Contract is Stateful
• Smart contract defines functions that a user can call.
• Each function can read or write state variables.
    f(uint x) {
      state_v = ...;
      ...
    }
    g(uint y) {
      ... = state_v + 1;
      ...
    }
[Figure: a user calls f() and g() of the smart contract; f() writes and g() reads state_v, a persistent state variable]

Smart Contract Security
• Bugs in smart contracts can cause a catastrophic loss of digital assets.
− Reentrancy attacks on the DAO [1] ($70M)
− Integer overflow attacks on ERC20 tokens
• We need testing!
[1] P. Daian, “Analysis of the DAO exploit,” https://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/

Static Program Analysis
• Approximates program behaviors without actual execution.
• Can investigate various semantic properties.
− Ex) Does a buffer overflow bug occur?
[Figure: a static analyzer inspects the program code to answer such questions]

Fuzz Testing (Fuzzing)
• Repeatedly executes the target program with random inputs.
• A simple but effective technique for finding vulnerabilities.
• Employed by major software companies (e.g., Google and Microsoft), as in Google's OSS-Fuzz [1,2].
[Figure: inputs are mutated and run on the program until a crash is observed]
[1] https://github.com/google/oss-fuzz
[2] https://github.com/google/clusterfuzz
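The fuzzing loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Smartian's fuzzer. `fuzz` and `toy_program` are hypothetical names, and the only mutation is overwriting one random byte. A "crash" is modeled as a raised exception.

```python
import random
from typing import Callable, List, Optional


def fuzz(target: Callable[[bytes], None], seeds: List[bytes],
         iterations: int = 100_000, rng_seed: int = 0) -> Optional[bytes]:
    """Minimal mutation-based fuzzing loop: mutate a seed, run the target,
    and return the first input that makes the target crash (raise)."""
    rng = random.Random(rng_seed)
    for _ in range(iterations):
        data = bytearray(rng.choice(seeds))
        # Mutate: overwrite one randomly chosen byte with a random value.
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
        try:
            target(bytes(data))
        except Exception:
            return bytes(data)  # crash found
    return None


def toy_program(data: bytes) -> None:
    # Hypothetical target: "crashes" only when the first byte is 'F'.
    if data[0] == ord("F"):
        raise RuntimeError("crash")
```

For example, `fuzz(toy_program, [b"AAAA"])` returns an input whose first byte is `F`.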

Challenge in Fuzzing
• For smart contracts, a test case (seed) is a sequence of function calls.
• Deciding the order of function calls is important in fuzzing.
    f(uint x) {
      state_v = x;
    }
    g() {
      if (state_v == 31337) {
        bug();
      }
    }
• f(0) --> g(): can trigger the bug with mutation, since f() writes state_v before g() reads it.
• g() --> f(0): can't trigger the bug with mutation.
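The following Python sketch makes the ordering problem concrete. It models the contract above with a persistent `state_v` field. As a simplifying assumption, an exhaustive search over a small range of values for f's argument stands in for mutation. The names `Contract`, `triggers_bug`, and `search` are hypothetical.

```python
from typing import List, Optional, Tuple

Call = Tuple[str, List[int]]  # (function name, arguments)


class Contract:
    """Toy model of the slide's contract: state_v persists across calls."""

    def __init__(self) -> None:
        self.state_v = 0  # persistent state variable

    def f(self, x: int) -> None:
        self.state_v = x

    def g(self) -> None:
        if self.state_v == 31337:
            raise AssertionError("bug()")


def triggers_bug(seq: List[Call]) -> bool:
    c = Contract()  # each test case starts from a freshly deployed contract
    try:
        for name, args in seq:
            getattr(c, name)(*args)
    except AssertionError:
        return True
    return False


def search(order: List[str], max_arg: int = 40_000) -> Optional[List[Call]]:
    """Vary f's argument (a stand-in for mutation) for a fixed call order."""
    for x in range(max_arg):
        seq = [(name, [x] if name == "f" else []) for name in order]
        if triggers_bug(seq):
            return seq
    return None
```

`search(["f", "g"])` finds f(31337) --> g(). `search(["g", "f"])` returns `None` because g() reads `state_v` before any call writes it.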

Existing Approach
• Traditional coverage-based fuzzing cannot discern the two sequences: for the f/g contract above, f(0) --> g() and g() --> f(0) achieve the same code coverage.
• Previous work relies on machine learning [1] or runtime heuristics [2].
[1] J. He et al., “Learning to fuzz from symbolic execution with application to smart contracts,” CCS 2019
[2] V. Wüstholz and M. Christakis, “Harvey: A greybox fuzzer for smart contracts,” FSE 2020

Why is Line Coverage Not Enough?
• Traditional code coverage (e.g., line coverage) may miss a critical seed.
    1 f(uint x, uint y) {
    2   if (x == 41)
    3     state_v = y;
    4 }
    5 g() {
    6   if (state_v == 61)
    7     bug();
    8 }
    9 h() { ... }
• S1: f(0,0) --> g()
• S2: f(0,0) --> h()
• S2': f(41,0) --> h(): covers Line 3.
• S1': f(41,0) --> g(): also covers Line 3, but only S1' realizes the data flow Line 3 -> state_v -> Line 6.
• S_bug: f(41,61) --> g()
• Once S2' has covered Line 3, S1' adds no new line coverage, so we can miss this critical intermediate seed.
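The sketch below replays the four seeds on a hand-written Python model of the contract above, with line numbers matching the listing, and compares two feedback policies. It is an illustration under assumptions, not Smartian's implementation. `run` hard-codes the contract's behavior, and a data flow is recorded as a (defining line, variable, using line) triple.

```python
from typing import List, Set, Tuple

Call = Tuple[str, Tuple[int, ...]]
DataFlow = Tuple[int, str, int]  # (defining line, variable, using line)


def run(seq: List[Call]) -> Tuple[Set[int], Set[DataFlow]]:
    """Execute a call sequence on a toy model of the contract and return
    (covered lines, def-use data flows observed on state_v)."""
    state_v = 0
    last_def = None  # line that last wrote state_v in this sequence
    lines: Set[int] = set()
    flows: Set[DataFlow] = set()
    for name, args in seq:
        if name == "f":
            x, y = args
            lines.add(2)
            if x == 41:
                lines.add(3)
                state_v, last_def = y, 3
        elif name == "g":
            lines.add(6)
            if last_def is not None:
                flows.add((last_def, "state_v", 6))
            if state_v == 61:
                lines.add(7)  # bug()
        elif name == "h":
            lines.add(9)
    return lines, flows


def interesting_seeds(seeds: List[Tuple[str, List[Call]]], use_dataflow: bool) -> List[str]:
    """Keep a seed only if it adds new line coverage (or, optionally, new data flows)."""
    seen_lines: Set[int] = set()
    seen_flows: Set[DataFlow] = set()
    kept: List[str] = []
    for label, seq in seeds:
        lines, flows = run(seq)
        is_new = not lines <= seen_lines
        if use_dataflow:
            is_new = is_new or not flows <= seen_flows
        if is_new:
            kept.append(label)
        seen_lines |= lines
        seen_flows |= flows
    return kept


SEEDS = [
    ("S1", [("f", (0, 0)), ("g", ())]),
    ("S2", [("f", (0, 0)), ("h", ())]),
    ("S2'", [("f", (41, 0)), ("h", ())]),
    ("S1'", [("f", (41, 0)), ("g", ())]),
]
```

With line-coverage feedback alone, `interesting_seeds(SEEDS, use_dataflow=False)` keeps S1, S2, and S2' but drops S1'. Adding data-flow feedback keeps S1' as well.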

Our Approach: Static Analysis
• Statically analyze data flows between functions.
• Initialize fuzzing seeds to have promising function call orders.
• For the f/g contract above, static analysis finds that f() writes state_v and g() reads it, so f(0) --> g() is a promising sequence, while g() --> f(0) is not.

Our Work
• Integrate static analysis with fuzzing.
• Collect program knowledge that can improve fuzzing performance.
[Figure: static analysis of the program code combined (+) with the fuzzing loop, in which inputs are mutated and run on the program until it crashes]

Our System: Smartian
[Figure: contract code -> static analyzer -> initial seed pool (e.g., f() then g()) -> fuzzer with dynamic analysis -> bugs]

• Smartian runs on bytecode: the contract source (Src) is compiled, and Smartian analyzes the resulting bytecode.

Binary-Only Smart Contract Fuzzing
• Smart contracts are deployed to the blockchain in bytecode form.
• For some contracts on the blockchain, source code may be unavailable.
• Binary-only fuzzing broadens the range of testing targets.

ABI Specification
• During compilation, ABI files are generated along with the bytecode.
• The ABI contains various information, e.g., the types of function parameters.
• Only the bytecode is uploaded to the blockchain.
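The Python sketch below shows why the fuzzer needs parameter types from the ABI. It builds calldata for a call with static arguments: a 4-byte function selector followed by one 32-byte big-endian word per argument. The selector is normally the first 4 bytes of the Keccak-256 hash of the function signature. It is passed in directly here because Python's standard library has no Keccak-256 (`hashlib.sha3_256` implements the standardized SHA-3, which pads differently). Only `uint256` and `address` are handled, and `encode_call` is a hypothetical helper.

```python
from typing import Sequence

_BITS = {"uint256": 256, "address": 160}  # static types handled by this sketch


def encode_call(selector: bytes, types: Sequence[str], args: Sequence[int]) -> bytes:
    """ABI-encode a call with static arguments: selector + one 32-byte word each."""
    if len(selector) != 4:
        raise ValueError("selector must be exactly 4 bytes")
    if len(types) != len(args):
        raise ValueError("one argument is needed per parameter type")
    out = bytearray(selector)
    for ty, val in zip(types, args):
        bits = _BITS[ty]
        if not 0 <= val < 2**bits:
            raise ValueError(f"{val} is out of range for {ty}")
        out += val.to_bytes(32, "big")  # big-endian, left-padded with zeros
    return bytes(out)
```

For instance, `0xa9059cbb` is the well-known selector of `transfer(address,uint256)`. `encode_call(bytes.fromhex("a9059cbb"), ["address", "uint256"], [0x1234, 100])` yields 68 bytes of calldata.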

[Figure: the Smartian pipeline again, highlighting the static analyzer, which takes contract bytecode as input]

Analyzing State Variable Access
• Contract bytecode runs on a stack-based machine called the EVM.
• We must figure out the operands of storage access instructions.
    PUSH 20
    ADD
    ...
    SLOAD   // Storage load
[Figure: with 100 and 200 on the EVM stack, PUSH 20 and ADD compute 20 + 100 = 120; SLOAD then reads storage slot 120, which holds state_v, and pushes its value onto the stack]
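The following Python sketch resolves storage operands by tracking stack values as constants along straight-line code. It rests on several simplifying assumptions. Only a handful of opcodes are modeled, and `None` stands for a statically unknown value. Branches and joins, which a real flow-sensitive analysis must handle, are omitted. The tuple-based instruction encoding and the name `storage_accesses` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

UINT256 = 2**256


def storage_accesses(code: List[tuple]) -> Tuple[Set[Optional[int]], Set[Optional[int]]]:
    """Abstractly execute straight-line EVM-like code, tracking stack values as
    constants (None = unknown), and collect the storage slots read by SLOAD
    and written by SSTORE. A None slot means "may be any slot"."""
    stack: List[Optional[int]] = []
    loads: Set[Optional[int]] = set()
    stores: Set[Optional[int]] = set()
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append(None if a is None or b is None else (a + b) % UINT256)
        elif op == "CALLDATALOAD":
            stack.pop()
            stack.append(None)  # user input: value unknown statically
        elif op == "SLOAD":
            slot = stack.pop()
            loads.add(slot)
            stack.append(None)  # storage contents are unknown statically
        elif op == "SSTORE":
            slot, _value = stack.pop(), stack.pop()  # key is on top of the stack
            stores.add(slot)
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
    return loads, stores
```

In `storage_accesses([("PUSH", 100), ("PUSH", 20), ("ADD",), ("SLOAD",)])`, the initial `PUSH 100` plays the role of the value already on the stack in the slide, and the result is `({120}, set())`.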

High Level Design
• We run a flow-sensitive analysis for each function.
− It approximates the state of the EVM along the execution.
• Using the SLOAD and SSTORE instructions, we identify which state variables each function loads and stores.
[Figure: analyzing the bytecode yields f(…): Store var_x, var_y; g(…): Load var_x; h(…): Load var_y]

Generating Initial Seeds for Fuzzing
• Identify function call orders that may produce data flows across functions.
• Ensure that each identified order is included in at least one seed of the initial seed pool.
[Figure: since f(…) stores var_x and var_y, g(…) loads var_x, and h(…) loads var_y, the data-flow orders f() -> g() and f() -> h() are used to generate the initial seeds]
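The sketch below shows one straightforward way to derive call orders from per-function Load/Store results, in Python. It is an illustration, not Smartian's seed generator. `data_flow_orders` and `initial_seed_pool` are hypothetical names, and function arguments are omitted from the seeds.

```python
from typing import Dict, List, Set, Tuple


def data_flow_orders(funcs: Set[str], defs: Dict[str, Set[str]],
                     uses: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Call orders (f1, f2) where f1 stores a state variable that f2 loads."""
    return sorted((f1, f2) for f1 in funcs for f2 in funcs
                  if defs.get(f1, set()) & uses.get(f2, set()))


def initial_seed_pool(funcs: Set[str], defs: Dict[str, Set[str]],
                      uses: Dict[str, Set[str]]) -> List[List[str]]:
    # One two-call seed per identified order, so every order appears in some seed.
    return [[f1, f2] for f1, f2 in data_flow_orders(funcs, defs, uses)]


# The slide's example: f stores var_x and var_y, g loads var_x, h loads var_y.
DEFS = {"f": {"var_x", "var_y"}}
USES = {"g": {"var_x"}, "h": {"var_y"}}
```

For the slide's example, `initial_seed_pool({"f", "g", "h"}, DEFS, USES)` gives the two seeds f --> g and f --> h.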

Seed Initialization Algorithm
• Funcs: the set of identified functions.
• Defs: a map from each identified function to the state variables it defines.
• Uses: a map from each identified function to the state variables it uses.
• DataFlowGain: the function-level data flows of a given sequence, as triples <f1, v, f2>, where (1) f1 and f2 are functions that appear in the sequence, (2) f1 defines v, and (3) f2 uses that v.
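The definitions above can be turned into code as follows. This Python sketch makes three assumptions. A triple <f1, v, f2> counts only when f1 is called before f2. Candidate sequences are limited to permutations of distinct functions up to a fixed length. Seeds are selected greedily by DataFlowGain. The greedy loop is an assumption for illustration and is not necessarily the algorithm used by Smartian.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # <f1, v, f2>


def data_flow_gain(seq: List[str], defs: Dict[str, Set[str]],
                   uses: Dict[str, Set[str]]) -> Set[Triple]:
    """Triples <f1, v, f2> in seq where f1 defines v and a later f2 uses v."""
    gain: Set[Triple] = set()
    for i, f1 in enumerate(seq):
        for f2 in seq[i + 1:]:  # assumption: f1 must precede f2
            for v in defs.get(f1, set()) & uses.get(f2, set()):
                gain.add((f1, v, f2))
    return gain


def init_seeds(funcs: Set[str], defs: Dict[str, Set[str]],
               uses: Dict[str, Set[str]], max_len: int = 3) -> List[List[str]]:
    """Hypothetical greedy scheme: repeatedly add the candidate sequence that
    covers the most not-yet-covered triples, until no candidate adds any."""
    candidates = [list(p) for n in range(1, max_len + 1)
                  for p in permutations(sorted(funcs), n)]
    covered: Set[Triple] = set()
    seeds: List[List[str]] = []
    while True:
        best = max(candidates, key=lambda s: len(data_flow_gain(s, defs, uses) - covered))
        new = data_flow_gain(best, defs, uses) - covered
        if not new:
            return seeds
        seeds.append(best)
        covered |= new


DEFS = {"f": {"var_x", "var_y"}}
USES = {"g": {"var_x"}, "h": {"var_y"}}
```

Here a single seed f --> g --> h covers both triples <f, var_x, g> and <f, var_y, h>.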

Dynamic Data-Flow Analysis
• We should mutate function arguments to realize the expected data flows.
• For this, we dynamically analyze concrete data flows and use them as feedback.
• Example, using the f/g/h contract from "Why is Line Coverage Not Enough?":
− Initial seed S1: f(0,0) --> g()
− Mutate -> intermediate seed S1': f(41,0) --> g(), which realizes the data flow Line 3 -> state_v -> Line 6
− Mutate -> S_bug: f(41,61) --> g()
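A small Python fuzzer on the same toy contract illustrates the feedback loop. The sketch makes these assumptions: only f's arguments are mutated, one argument at a time. Values come from a fixed pool of constants (0, 41, 61) plus a random uint256, as a stand-in for smarter input generation. A seed is kept when it realizes a def-use flow not seen before. All names are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Seq = List[Tuple[str, Tuple[int, ...]]]
Flow = Tuple[int, str, int]  # (defining line, variable, using line)


def execute(seq: Seq) -> Tuple[Set[Flow], bool]:
    """Run the toy contract; return observed def-use flows and whether bug() ran."""
    state_v, last_def = 0, None
    flows: Set[Flow] = set()
    for name, args in seq:
        if name == "f":
            if args[0] == 41:                      # line 2
                state_v, last_def = args[1], 3     # line 3 defines state_v
        elif name == "g":
            if last_def is not None:               # line 6 uses state_v
                flows.add((last_def, "state_v", 6))
            if state_v == 61:
                return flows, True                 # line 7: bug()
    return flows, False


def data_flow_fuzz(use_flow_feedback: bool, iterations: int = 2000,
                   rng_seed: int = 1) -> Optional[Seq]:
    rng = random.Random(rng_seed)
    queue: List[Seq] = [[("f", (0, 0)), ("g", ())]]  # initial seed S1
    seen: Set[Flow] = set()
    for _ in range(iterations):
        seed = rng.choice(queue)
        x, y = seed[0][1]
        # Mutate one argument of f; the constant pool is an assumption of this sketch.
        value = rng.choice([0, 41, 61, rng.randrange(2**256)])
        args = (value, y) if rng.random() < 0.5 else (x, value)
        mutant = [("f", args)] + seed[1:]
        flows, bug = execute(mutant)
        if bug:
            return mutant
        if use_flow_feedback and not flows <= seen:
            seen |= flows
            queue.append(mutant)  # keep seeds that realize new data flows
    return None
```

With data-flow feedback, the queue keeps S1' = f(41,0) --> g(), from which a single mutation reaches S_bug. Without it, every mutant is derived from S1 and changes only one argument, so f(41,61) is never produced.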

Bug Oracles for Fuzzing
• Smart contract bugs (mostly) do not cause a crash.
− We must implement bug oracles that monitor the execution.
• Smartian implements bug oracles for 13 classes of bugs.
− Based on an investigation of previous work on finding bugs in smart contracts.

Bug Oracles
• Assertion Failure (AF): The condition of an assert statement is not satisfied.
− Check if an INVALID instruction is executed.
• Arbitrary Write (AW): An attacker can overwrite arbitrary storage data by accessing a mismanaged array object.
− Check if someone accesses storage data at a location larger than the length of the storage.
− Same bug oracle as Harvey [1].
• Requirement Violation (RV): The condition of a require statement is not satisfied.
− Check if a REVERT instruction is executed.
[1] V. Wüstholz and M. Christakis, “Harvey: A greybox fuzzer for smart contracts,” in Proceedings of the International Symposium on Foundations of Software Engineering: Industry Papers, 2020.
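The AF and RV oracles need only the executed instructions, so they can be sketched as a scan over an execution trace. This Python sketch assumes a trace of (program counter, opcode) pairs. The AW oracle needs storage-layout information and is left out. `trace_oracles` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, str]  # (program counter, opcode) from an execution trace


def trace_oracles(trace: List[Step]) -> Dict[str, int]:
    """Map each detected bug class to the pc of the first offending instruction.
    AF: an INVALID instruction ran (failed assert).
    RV: a REVERT instruction ran (failed require)."""
    oracle = {"INVALID": "AF", "REVERT": "RV"}
    found: Dict[str, int] = {}
    for pc, op in trace:
        bug = oracle.get(op)
        if bug is not None and bug not in found:
            found[bug] = pc
    return found
```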

Bug Oracles
• Block State Dependency (BD): Block states decide an ether transfer of a contract.
− Check if a block state (e.g., TIMESTAMP, NUMBER) can affect an ether transfer, tracing both direct and indirect taint flows.
• Control-Flow Hijack (CH): An attacker can arbitrarily control the destination of a JUMP or DELEGATECALL instruction.
− Raise an alarm if someone can set the destination contract of a DELEGATECALL to an arbitrary user contract.
− Raise an alarm if the destination of a JUMP instruction can be manipulated.
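A minimal Python sketch of the taint tracking behind the BD and CH checks follows, under several assumptions. It runs over an already executed trace of simplified opcodes without operands, so only taint labels are tracked. An ether transfer is modeled as a hypothetical TRANSFER op that pops only the amount. Every branch taken before a transfer is treated as an indirect influence. DELEGATECALL targets are not modeled.

```python
from typing import FrozenSet, List, Set

Taint = FrozenSet[str]
CLEAN: Taint = frozenset()


def taint_oracles(trace: List[str]) -> Set[str]:
    """Propagate taint labels along an executed trace of simplified ops.
    BD: block state affects an ether transfer directly (the amount) or
        indirectly (a branch condition taken earlier in the trace).
    CH: a JUMP destination is derived from user input (CALLDATALOAD)."""
    stack: List[Taint] = []
    control: Taint = CLEAN  # taint of branch conditions already taken
    found: Set[str] = set()
    for op in trace:
        if op == "PUSH":
            stack.append(CLEAN)
        elif op in ("TIMESTAMP", "NUMBER"):
            stack.append(frozenset({"block"}))
        elif op == "CALLDATALOAD":
            stack.pop()
            stack.append(frozenset({"input"}))
        elif op in ("ADD", "GT", "EQ"):
            stack.append(stack.pop() | stack.pop())
        elif op == "JUMPI":
            _dest, cond = stack.pop(), stack.pop()  # destination is on top
            control = control | cond                # indirect (control) taint
        elif op == "JUMP":
            if "input" in stack.pop():
                found.add("CH")
        elif op == "TRANSFER":  # hypothetical simplified CALL: pops the amount
            if "block" in (stack.pop() | control):
                found.add("BD")
        else:
            raise ValueError(f"unsupported op in this sketch: {op}")
    return found
```

A trace like TIMESTAMP, PUSH, GT, PUSH, JUMPI, PUSH, TRANSFER is reported as BD even though the amount itself is untainted, because the transfer happens after a branch on the timestamp.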

Bug Oracles
• Ether Leak (EL): A contract allows an arbitrary user to freely retrieve ether from the contract.
− Check if a normal user can gain ether by sending transactions to the contract; report this only when the transaction sequence has no preceding transaction from the deployer.
• Freezing Ether (FE): A contract can receive ether but has no means to send ether out.
− Check if there is no way to transfer ether to anyone during the execution while the contract balance is greater than zero.
− Same bug oracle as ContractFuzzer [1].
[1] B. Jiang, Y. Liu, and W. K. Chan, “ContractFuzzer: Fuzzing smart contracts for vulnerability detection,” in Proceedings of the International Conference on Automated Software Engineering, 2018.
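The EL and FE checks are properties of whole transaction sequences rather than single instructions. The following Python sketch assumes each transaction has been summarized by a hypothetical `Tx` record with the ether amounts involved. For FE, `txs` should cover every execution observed during fuzzing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tx:
    sender: str
    value: int           # ether the sender attached to the transaction
    paid_to_sender: int  # ether the contract sent back to the sender
    sent_out: int        # total ether the contract sent to anyone


def ether_leak(txs: List[Tx], deployer: str) -> bool:
    """EL: a normal user gains ether, with no deployer transaction before it."""
    for tx in txs:
        if tx.sender == deployer:
            return False  # a preceding deployer tx: the sequence is not reported
        if tx.paid_to_sender > tx.value:
            return True
    return False


def freezing_ether(final_balance: int, txs: List[Tx]) -> bool:
    """FE: the contract holds ether, yet no observed execution sent any out."""
    return final_balance > 0 and all(tx.sent_out == 0 for tx in txs)
```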

Bug Oracles
• Mishandled Exception (ME): A contract does not check for an exception when calling external functions or sending ether.
− Taint the return value of a CALL instruction and check whether it flows into the predicate of a JUMPI instruction.
− If a return value is never used by a JUMPI, report an alarm.
• Multiple Send (MS): A contract sends out ether multiple times within one transaction. This is a specific case of DoS.
− Detect multiple ether transfers taking place in a single transaction.
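Below is a Python sketch of the ME and MS checks over one transaction's trace, with these assumptions. CALL is simplified to pop only the ether amount and push a success flag tainted with the CALL's position. Only a few other opcodes are modeled, and each list index serves as the program counter. `call_oracles` is a hypothetical name.

```python
from typing import FrozenSet, List, Set, Tuple

Entry = Tuple[int, FrozenSet[int]]  # (concrete value, indices of CALLs it depends on)


def call_oracles(trace: List[Tuple[str, int]]) -> Set[str]:
    """trace: executed (opcode, immediate) pairs; the immediate matters only for PUSH."""
    stack: List[Entry] = []
    calls: Set[int] = set()
    checked: Set[int] = set()
    transfers = 0
    for pc, (op, imm) in enumerate(trace):
        if op == "PUSH":
            stack.append((imm, frozenset()))
        elif op == "CALL":  # simplified: pops the amount, pushes a success flag
            amount, _ = stack.pop()
            if amount > 0:
                transfers += 1
            calls.add(pc)
            stack.append((1, frozenset({pc})))  # flag tainted by this CALL
        elif op == "ISZERO":
            value, taint = stack.pop()
            stack.append((int(value == 0), taint))
        elif op == "POP":
            stack.pop()
        elif op == "JUMPI":
            _dest, (_cond, taint) = stack.pop(), stack.pop()
            checked |= taint  # these CALL results reached a branch predicate
        else:
            raise ValueError(f"unsupported op in this sketch: {op}")
    found: Set[str] = set()
    if calls - checked:
        found.add("ME")
    if transfers > 1:
        found.add("MS")
    return found
```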

Bug Oracles
• Integer Bug (IB): An integer overflow or underflow occurs, and the result becomes an unexpected value.
− Check if the overflowed or underflowed value flows into critical variables.
• Reentrancy (RE): A function in a victim contract is re-entered, leading to a race condition on state variables.
− First, monitor whether there is a cyclic call chain during an ether transfer.
− Then, use taint analysis to identify the state variables that affect this ether transfer.
− Finally, report if such variables are updated after the transfer takes place.
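EVM arithmetic wraps modulo 2**256, so an overflow or underflow can be detected by comparing the wrapped result with the exact one. This Python sketch shows that check. The `reaches_critical_sink` flag is a hypothetical stand-in for the taint analysis that decides whether the value reaches a critical variable.

```python
from typing import Tuple

UINT256_MOD = 2**256


def evm_arith(op: str, a: int, b: int) -> Tuple[int, bool]:
    """Compute an EVM uint256 ADD/SUB/MUL and report whether it wrapped around."""
    if op == "ADD":
        exact = a + b
    elif op == "SUB":
        exact = a - b
    elif op == "MUL":
        exact = a * b
    else:
        raise ValueError(f"unsupported op: {op}")
    result = exact % UINT256_MOD  # EVM arithmetic is modulo 2**256
    return result, result != exact


def integer_bug(op: str, a: int, b: int, reaches_critical_sink: bool) -> bool:
    """Hypothetical oracle: report IB only if the wrapped value is used critically."""
    _, wrapped = evm_arith(op, a, b)
    return wrapped and reaches_critical_sink
```

For example, multiplying 2 by 2**255, the well-known batch-transfer overflow pattern, wraps to 0.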

Bug Oracles
• Suicidal Contract (SC): An arbitrary user can destroy a victim contract by running a SELFDESTRUCT instruction.
− Check if a normal user can execute a SELFDESTRUCT instruction and destroy the contract.
− Filter out sequences that have any preceding transaction from the deployer.
• Transaction Origin Use (TO): A contract relies on the origin of a transaction (i.e., tx.origin) for user authorization.
− Taint the return value of the ORIGIN instruction, and check if it flows into the predicate of a JUMPI instruction.
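The Python code below sketches the TO check as taint propagation from ORIGIN to a JUMPI predicate. It assumes an operand-free trace of a few opcodes. It does not track taint through storage: SLOAD simply yields an untainted value. `origin_used_for_auth` is a hypothetical name.

```python
from typing import List, Set


def origin_used_for_auth(trace: List[str]) -> bool:
    """Report TO if a value derived from ORIGIN reaches a JUMPI condition."""
    stack: List[Set[str]] = []
    for op in trace:
        if op == "ORIGIN":
            stack.append({"origin"})
        elif op in ("PUSH", "CALLER"):
            stack.append(set())
        elif op == "SLOAD":
            stack.pop()
            stack.append(set())  # storage taint is not tracked in this sketch
        elif op in ("EQ", "AND"):
            stack.append(stack.pop() | stack.pop())
        elif op == "ISZERO":
            stack.append(stack.pop())
        elif op == "JUMPI":
            _dest, cond = stack.pop(), stack.pop()  # destination is on top
            if "origin" in cond:
                return True
        else:
            raise ValueError(f"unsupported op in this sketch: {op}")
    return False
```

The trace ORIGIN, PUSH, SLOAD, EQ, PUSH, JUMPI roughly corresponds to a check like require(tx.origin == owner). Replacing ORIGIN with CALLER (msg.sender) produces no report.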

Implementation
• Static analysis module
− Used B2R2 [1] as a front-end for EVM bytecode.
− Wrote the main analysis logic in 1K lines of F# code.
• Fuzzing module
− Extended Eclipser [2] to support EVM bytecode.
− Used Nethermind [3] to emulate the bytecode.
[1] M. Jung et al., “B2R2: Building an efficient front-end for binary analysis,” NDSS BAR 2019
[2] J. Choi et al., “Grey-box concolic testing on binary code,” ICSE 2019
[3] “Nethermind,” https://github.com/NethermindEth/nethermind

Evaluation
• Q1. Can static and dynamic data-flow analyses improve fuzzing?
• Q2. Can Smartian outperform other testing tools for smart contracts?
• Q3. How does Smartian perform on a large-scale benchmark?

Experimental Setup
• Benchmarks
− Used the datasets from VeriSmart [1] and SmartBugs [2].
• Comparison targets
− Two fuzzers (sFuzz, ILF) and two symbolic executors (Mythril, Manticore).
• Environment
− Used a Docker container to run each tool on a single contract.
[1] S. So et al., “VeriSmart: A highly precise safety verifier for Ethereum smart contracts,” S&P 2020
[2] T. Durieux et al., “Empirical review of automated analysis tools on 47,587 Ethereum smart contracts,” ICSE 2020

Impact of Data-Flow Analyses
• VeriSmart [1] benchmark: 58 real-world contracts with integer overflow CVEs.
• Compare three different modes of Smartian.

What about Dynamic Analysis Only?
• VeriSmart [1] benchmark: 58 real-world contracts with integer overflow CVEs.
• Compare four different modes of Smartian.

Comparison against Other Tools - 1
• Used a subset of the previous benchmark.
• Compared against tools that support integer overflow detection (ILF: no support).

Comparison against Other Tools - 2
• SmartBugs [1] benchmark: contracts with labeled bugs.
− Selected 3 bug classes: block state dependency, mishandled exception, and reentrancy.

More in the Paper
• More experimental results
− Coverage measurement
− Consideration of different bug oracles
− Large-scale experiment

Future Work
• Improving the precision of static analysis
• Automatically inferring the ABI specification of a contract
• Applying our idea to other domains

Open Science
• Smartian is available at https://github.com/SoftSec-KAIST/Smartian
• We also release the artifacts for our evaluation.
