SlideShare a Scribd company logo
How data flow analysis
operates in a
static code analyzer
Pavel Belikov
C++ Developer, PVS-Studio
belikov@viva64.com
/ 50
PVS-Studio
• Static analyzer for C, C++, C# code
• It works on Windows, Linux, macOS
• Plugin for Visual Studio
• Integrates into SonarQube and Jenkins
• Quick start (Standalone, pvs-studio-analyzer)
2
/ 50
Contents:
• Types and objectives of Data Flow Analysis
• Analysis of conditions
• Analysis of loops
• Symbolic execution
• Examples of errors found in real projects
3
/ 50
What is data flow analysis
•Calculate a set of values for expression or its properties
•Numbers
•Null/non-null pointer
•Strings
•The size and contents of containers/optional
• Determine state of variables
4
/ 50
The main objectives
• Set of values must be a superset of real values
• Time is limited
• Number of false positives must be minimized
5
/ 50
Why do we
need it?
static const int kDaysInMonth[13] = {
0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
bool ValidateDateTime(const DateTime& time) {
if (time.year < 1 || time.year > 9999 ||
time.month < 1 || time.month > 12 ||
time.day < 1 || time.day > 31 ||
time.hour < 0 || time.hour > 23 ||
time.minute < 0 || time.minute > 59 ||
time.second < 0 || time.second > 59) {
return false;
}
if (time.month == 2 && IsLeapYear(time.year)) {
return time.month <= kDaysInMonth[time.month] + 1;
} else {
return time.month <= kDaysInMonth[time.month];
}
}
6
/ 50
Why do we
need it?
static const int kDaysInMonth[13] = {
0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
bool ValidateDateTime(const DateTime& time) {
if (time.year < 1 || time.year > 9999 ||
time.month < 1 || time.month > 12 ||
time.day < 1 || time.day > 31 ||
time.hour < 0 || time.hour > 23 ||
time.minute < 0 || time.minute > 59 ||
time.second < 0 || time.second > 59) {
return false;
}
if (time.month == 2 && IsLeapYear(time.year)) {
return time.month <= kDaysInMonth[time.month] + 1;
} else {
return time.month <= kDaysInMonth[time.month];
}
}
7
Protobuf
• V547 / CWE-571 Expression 'time.month <=
kDaysInMonth[time.month] + 1' is always true. time.cc 83
• V547 / CWE-571 Expression 'time.month <=
kDaysInMonth[time.month]' is always true. time.cc 85
/ 50
The basic equation
• b – a code block
• in/out - a state of variables when entering and exiting the block
• trans - a function that transforms the state of variables in the block
• join - a function that merges the state of variables in different paths of execution
8
/ 50
Example
int a = 3;
if (something)
{
a = 4;
}
std::cout << a;
9
/ 50
Example
int a = 3; in = {}, out = {a=3}
if (something)
a = 4; in = {a=3}, out = {a=4}
std::cout << a; in = {a=3}∪{a=4}={a=[3;4]}
10
/ 50
Flow sensitivity
• Flow-sensitive analysis depends on the order of expressions in code
• An example of a flow-insensitive analysis: searching for modified variables in a block
• A way for code traversal is needed
11
/ 50
Flow sensitivity
• Data Flow works with Control Flow Graph
• In practice you can use AST (abstract syntax tree)
• AST is simpler and more understandable to most developers
• There are more tools for AST, parsers can generate AST
• CFG can be simulated on top of the AST
12
/ 50
Flow sensitivity
• Forward analysis
• Pass the information to the block B from the preceding blocks
• It suits well for calculating the values of variables and determining reaching
definitions
• Backward analysis
• Pass the information from the block B to the preceding blocks
• It suits well for live variable analysis
13
/ 50
Example of backward analysis
__private_extern__ void
YSHA1Transform(u_int32_t state[5],
const unsigned char buffer[64])
{
u_int32_t a, b, c, d, e;
....
state[0] += a;
state[1] += b;
state[2] += c;
state[3] += d;
state[4] += e;
/* Wipe variables */
a = b = c = d = e = 0;
}
XNU kernel
V1001 CWE-563 The 'a' variable is assigned but is not used until the end of the function. sha1mod.c 120
14
/ 50
Example of forward analysis
• Reaching definitions
• REACH - a set of variable definitions that can be read in the expression S
• GEN - new definitions
• KILL - "killed" definitions
15
/ 50
Example of forward analysis
ParseResult ParseOption (string option, ref string[] args , CompilerSettings settings) {
AssemblyResource res = null; GEN={res0}
switch (s.Length) {
case 1:
res = new AssemblyResource (s[0], Path.GetFileName (s[0])); GEN={res1}, KILL={res0}
break;
case 2:
res = new AssemblyResource (s[0], s[1]); GEN={res2}, KILL={res0}
break;
default:
report.Error (-2005, "Wrong number of arguments for option '{0}'", option);
return ParseResult.Error;
}
if (res != null) { ... } REACH={res1, res2}
}
ILSpy
V3022 Expression 'res != null' is always true. settings.cs 827
16
/ 50
Must vs may
• Must
• Data flow fact must be true for all paths
• It’s expressed through the intersection of sets
• May
• Fact should be correct at least for one path
• It is expressed through the union of sets
17
/ 50
Must vs may
• Static analysis often works with may
• No one writes
int *p = nullptr;
if (something) p = nullptr;
else if (something_else) p = nullptr;
else p = nullptr;
*p = 42;
18
/ 50
Must vs may
STDMETHODIMP sdnAccessible::get_computedStyle(
BSTR __RPC_FAR* aStyleProperties,
BSTR __RPC_FAR* aStyleValues,
unsigned short __RPC_FAR* aNumStyleProperties)
{
if (!aStyleProperties || aStyleValues || !aNumStyleProperties)
return E_INVALIDARG;
....
aStyleValues[realIndex] = ::SysAllocString(value.get());
....
}
Mozilla Thunderbird
V522 Dereferencing of the null pointer ‘aStyleValues’ might take place. sdnaccessible.cpp 252
19
/ 50
Path-sensitive analysis
• May in one of the paths is not enough
• What if the path is impossible?
• We need to analyze the conditions!
20
/ 50
Path-sensitive analysis
enum {
Runesync = 0x80,
Runeself = 0x80,
};
char* utfrune(const char *s, int c) {
....
if (c < Runesync) return strchr(s, c); // c: then [INT_MIN; 0x79] else [0x80; INT_MAX]
for(;;) {
c1 = *(unsigned char*)s;
if (c1 < Runeself) { // c1: then [0; 0x79]
if (c1 == 0) return 0; // c1: then 0 else [1; 0x79]
if (c1 == c) return (char*)s; // if ([1; 0x79] == [0x80; INT_MAX])
....
}
....
}
return 0;
}
RE2 V547 CWE-570 Expression 'c1 == c' is always false. rune.cc 247
21
/ 50
Short circuit
if ( x >= 0 && x <= 10 ) {
} else {
}
22
/ 50
Short circuit
23
x = [0; INT_MAX]
x = [INT_MIN; -1]
if ( x >= 0 && x <= 10 ) {
} else {
}
/ 50
Short circuit
24
x = [0; INT_MAX]
x = [INT_MIN; -1]
x = [0; 10]
x = [11; INT_MAX]
then: x = [0; 10]
else: x = [INT_MIN; -1] ∪ [11; INT_MAX]
x = [0; INT_MAX]
x = [INT_MIN; -1]
if ( x >= 0 && x <= 10 ) {
} else {
}
/ 50
Short circuit
internal bool SafeForExport()
{
return DisplayEntry.SafeForExport() &&
ItemSelectionCondition == null
|| ItemSelectionCondition.SafeForExport();
}
PowerShell
V3080 Possible null dereference. Consider inspecting ‘ItemSelectionCondition’. System.Management.Automation
displayDescriptionData_List.cs 352
25
/ 50
Join problem
int *p;
if (condition) {
p = new int;
} else {
p = nullptr;
}
// p - nullable
if (condition) {
*p = 42; // null dereference?
}
26
/ 50
Join problem
• We lose the information when we unite the paths
• It is better to postpone the merging of states for as long as possible
• But there is a problem with path explosion
27
/ 50
Join problem
int *p;
if (condition) {
p = new int; // p = non null if condition
} else {
p = nullptr; // p = null if !condition
}
// p = non null if condition
// ∪ null if !condition
if (condition) {
// p = non null
*p = 42;
}
28
/ 50
Join problem
int arr[4];
int a, b;
if (condition) {
a = 1;
b = 2;
} else {
a = 2;
b = 1;
}
return arr[a + b]; // a = 1 if condition ∪ 2 if !condition
// b = 2 if condition ∪ 1 if !condition
// a + b = 3 if condition ∪ 3 if !condition
// a + b = 3
29
/ 50
Try-catch
30
try {
SomeClass c(someFunction(), 42);
c.foo();
return c + “abc”;
} catch (...) {
}
/ 50
Try-catch
try {
SomeClass c(someFunction(), 42);
c.foo();
return c + “abc”;
} catch (...) {
}
• call of someFunction()
• constructor of c variable
• call of foo() method
• constructors of temporary objects
• operator +
• constructor for returned object
• destructors for temporary objects
• destructor of c variable
31
/ 50
Loop analysis
•In general case, it is difficult and slow to analyze
•Analyze the first iteration separately
•"Kill" all new definitions of variables after a loop
32
Are you stuck in
an infinite loop?
YesNo
/ 50
Loop invariants
public final R getSomeBuildWithWorkspace() {
int cnt=0; // <= variable definition outside of the loop
for (R b = getLastBuild(); cnt<5 && b!=null; b=b.getPreviousBuild()) {
FilePath ws = b.getWorkspace();
if (ws!=null) return b;
}
return null;
}
Jenkins
V6022 Expression 'cnt < 5' is always true AbstractProject.java 557
33
/ 50
The first iteration
void Measure::read(XmlReader& e, int staffIdx) {
Segment* segment = 0;
....
while (e.readNextStartElement()) {
const QStringRef& tag(e.name());
if (tag == "move")
e.initTick(e.readFraction().ticks() + tick());
....
else if (tag == "sysInitBarLineType") {
....
segment = getSegmentR(SegmentType::BeginBarLine, 0); // !!!
segment->add(barLine); // <= OK
}
....
else if (tag == "Segment")
segment->read(e); // <= ERROR
....
}
}
MuseScore V522 Dereferencing of the null pointer 'segment' might take place. measure.cpp 2220
34
/ 50
Loop control flow
SkOpSpan* SkOpContour::undoneSpan() {
SkOpSegment* testSegment = &fHead;
bool allDone = true;
do {
if (testSegment->done()) {
continue;
}
allDone = false;
return testSegment->undoneSpan();
} while ((testSegment = testSegment->next()));
if (allDone) {
fDone = true;
}
return nullptr;
}
Skia Graphics Engine
V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43
35
/ 50
Loop control flow
SkOpSpan* SkOpContour::undoneSpan() {
SkOpSegment* testSegment = &fHead;
bool allDone = true;
do {
if (testSegment->done()) {
continue;
}
allDone = false; // <= we don’t take into account this path
return testSegment->undoneSpan();
} while ((testSegment = testSegment->next()));
if (allDone) {
fDone = true;
}
return nullptr;
}
Skia Graphics Engine
V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43
36
/ 50
Loop counter analysis
for (int i = 0; i < 10; ++i)
{
// i = [INT_MIN; 9] ?
// i = [0; 9] !!!
}
37
/ 50
Loop counter analysis
#define AE_IDLE_TIMEOUT 100
static void
ae_stop_rxmac(ae_softc_t *sc)
{
int i;
....
/*
* Wait for IDLE state.
*/
for (i = 0; i < AE_IDLE_TIMEOUT; i--) { // <=
val = AE_READ_4(sc, AE_IDLE_REG);
if ((val & (AE_IDLE_RXMAC | AE_IDLE_DMAWRITE)) == 0)
break;
DELAY(100);
}
....
}
FreeBSD Kernel
V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. if_ae.c 1663
38
/ 50
There is a problem
for (int i = 0; i < n; ++i) {
for (int j = i + 1; j < n; ++j) {
// j - i
}
}
39
/ 50
There is a problem
int i = /* [0; 42] */;
int j = i + 1; // [1; 43]
int r = j - i; // [-43; 41]???
40
/ 50
Symbolic execution
int i = /* [0; 42] */;
int j = i + 1; // [1; 43]
int r = j - i; // i + 1 - i = 1
41
/ 50
Symbolic execution
• Calculate everything in symbolic expressions
• Create a system of equations
• Upload it into SMT solver
• ???
• PROFIT
42
/ 50
Symbolic execution
public static MMMethodKind valueOf(....) {
MMMethodKind result = OTHER;
for (MMMethodKind k : values()) {
if (k.detector.test(method) && result.level < k.level) {
if (result.level == k.level) {
throw new SpoonException(....);
}
result = k;
}
}
return result;
}
Spoon
MMMethodKind.java:129: V6007 Expression 'result.level == k.level' is always false.
43
/ 50
Context sensitive
•foo();
•We can reset all of the accumulated information
•Annotate popular libraries
•Enjoy the 10 ways to pass a variable to a function
44
/ 50
Context sensitive
•analysis of a function considering the context of the caller
•scales poorly
•useful for analyzing small functions (getters/setters, for example)
45
/ 50
Context sensitive
void foo(int *p) { // analyze two times
*p = 42;
}
void bar() {
int *p = something ? new int : nullptr;
foo(p); // repeatedly analyze foo and find a bug
}
46
/ 50
Context insensitive
void foo(int *p) { // p != nullptr
*p = 42;
}
void bar() {
int *p = something ? new int : nullptr;
foo(p); // p != nullptr contract is violated, found a bug
}
47
/ 50
Context insensitive
• Analyze the body of a function, compose an annotation for it
• Contract for arguments
• Presence of a global state
• Returned value
• And much more
• contracts proposal
void foo(const std::vector<int> &indices)
[[expects: !indices.empty()]];
48
/ 50
Conclusions
• Data flow analysis is a useful technique for finding errors
• To find bugs one has to operate large and sometimes strange set of properties
• The combination of different techniques allows to increase the reliability of analysis
results
• Various heuristics and assumptions allow finding more bugs
• Every significant static analyzer must use data flow analysis
49
/ 50
Answering your questions
PVS-Studio: http://www.viva64.com/ru/pvs-studio/
50

More Related Content

What's hot

What's hot (20)

External service interaction
External service interactionExternal service interaction
External service interaction
 
Denial of Service Attacks (DoS/DDoS)
Denial of Service Attacks (DoS/DDoS)Denial of Service Attacks (DoS/DDoS)
Denial of Service Attacks (DoS/DDoS)
 
Botnet
Botnet Botnet
Botnet
 
Introduction to Deep Learning (Dmytro Fishman Technology Stream)
Introduction to Deep Learning (Dmytro Fishman Technology Stream) Introduction to Deep Learning (Dmytro Fishman Technology Stream)
Introduction to Deep Learning (Dmytro Fishman Technology Stream)
 
Text summarization using deep learning
Text summarization using deep learningText summarization using deep learning
Text summarization using deep learning
 
Về kỹ thuật Attention trong mô hình sequence-to-sequence tại hội nghị ACL 2017
Về kỹ thuật Attention trong mô hình sequence-to-sequence  tại hội nghị ACL 2017Về kỹ thuật Attention trong mô hình sequence-to-sequence  tại hội nghị ACL 2017
Về kỹ thuật Attention trong mô hình sequence-to-sequence tại hội nghị ACL 2017
 
LFI to RCE
LFI to RCELFI to RCE
LFI to RCE
 
KHNOG 3: DDoS Attack Prevention
KHNOG 3: DDoS Attack PreventionKHNOG 3: DDoS Attack Prevention
KHNOG 3: DDoS Attack Prevention
 
Ethical Hacking
Ethical HackingEthical Hacking
Ethical Hacking
 
Sécurité des applications web: attaque et défense
Sécurité des applications web: attaque et défenseSécurité des applications web: attaque et défense
Sécurité des applications web: attaque et défense
 
[NeuralIPS 2020]filter in filter pruning
[NeuralIPS 2020]filter in filter pruning[NeuralIPS 2020]filter in filter pruning
[NeuralIPS 2020]filter in filter pruning
 
Social Engineering,social engeineering techniques,social engineering protecti...
Social Engineering,social engeineering techniques,social engineering protecti...Social Engineering,social engeineering techniques,social engineering protecti...
Social Engineering,social engeineering techniques,social engineering protecti...
 
Bài 2: Phần mềm độc hại và các dạng tấn công sử dụng kỹ nghệ xã hội - Giáo tr...
Bài 2: Phần mềm độc hại và các dạng tấn công sử dụng kỹ nghệ xã hội - Giáo tr...Bài 2: Phần mềm độc hại và các dạng tấn công sử dụng kỹ nghệ xã hội - Giáo tr...
Bài 2: Phần mềm độc hại và các dạng tấn công sử dụng kỹ nghệ xã hội - Giáo tr...
 
MW_Arch Fastest_way_to_hunt_on_Windows_v1.01
MW_Arch Fastest_way_to_hunt_on_Windows_v1.01MW_Arch Fastest_way_to_hunt_on_Windows_v1.01
MW_Arch Fastest_way_to_hunt_on_Windows_v1.01
 
Shellshock - A Software Bug
Shellshock - A Software BugShellshock - A Software Bug
Shellshock - A Software Bug
 
Process injection - Malware style
Process injection - Malware styleProcess injection - Malware style
Process injection - Malware style
 
Breaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic ExecutionBreaking Obfuscated Programs with Symbolic Execution
Breaking Obfuscated Programs with Symbolic Execution
 
Notes de cours et tp - Administation Systèmes
Notes de cours et tp  - Administation Systèmes Notes de cours et tp  - Administation Systèmes
Notes de cours et tp - Administation Systèmes
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & Python
 
Cloud Privacy & Security compliance
Cloud Privacy & Security complianceCloud Privacy & Security compliance
Cloud Privacy & Security compliance
 

Similar to How Data Flow analysis works in a static code analyzer

Day2 Verilog HDL Basic
Day2 Verilog HDL BasicDay2 Verilog HDL Basic
Day2 Verilog HDL Basic
Ron Liu
 

Similar to How Data Flow analysis works in a static code analyzer (20)

Story of static code analyzer development
Story of static code analyzer developmentStory of static code analyzer development
Story of static code analyzer development
 
The Unicorn Getting Interested in KDE
The Unicorn Getting Interested in KDEThe Unicorn Getting Interested in KDE
The Unicorn Getting Interested in KDE
 
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
 
The operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerThe operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzer
 
How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
 
Static analysis and writing C/C++ of high quality code for embedded systems
Static analysis and writing C/C++ of high quality code for embedded systemsStatic analysis and writing C/C++ of high quality code for embedded systems
Static analysis and writing C/C++ of high quality code for embedded systems
 
Checking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzerChecking the code of Valgrind dynamic analyzer by a static analyzer
Checking the code of Valgrind dynamic analyzer by a static analyzer
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 
Node.js System: The Landing
Node.js System: The LandingNode.js System: The Landing
Node.js System: The Landing
 
Checking the Cross-Platform Framework Cocos2d-x
Checking the Cross-Platform Framework Cocos2d-xChecking the Cross-Platform Framework Cocos2d-x
Checking the Cross-Platform Framework Cocos2d-x
 
How to fake_properly
How to fake_properlyHow to fake_properly
How to fake_properly
 
Detection of errors and potential vulnerabilities in C and C++ code using the...
Detection of errors and potential vulnerabilities in C and C++ code using the...Detection of errors and potential vulnerabilities in C and C++ code using the...
Detection of errors and potential vulnerabilities in C and C++ code using the...
 
EVERYTHING ABOUT STATIC CODE ANALYSIS FOR A JAVA PROGRAMMER
EVERYTHING ABOUT STATIC CODE ANALYSIS FOR A JAVA PROGRAMMEREVERYTHING ABOUT STATIC CODE ANALYSIS FOR A JAVA PROGRAMMER
EVERYTHING ABOUT STATIC CODE ANALYSIS FOR A JAVA PROGRAMMER
 
Introduction to web programming for java and c# programmers by @drpicox
Introduction to web programming for java and c# programmers by @drpicoxIntroduction to web programming for java and c# programmers by @drpicox
Introduction to web programming for java and c# programmers by @drpicox
 
Static analysis: Around Java in 60 minutes
Static analysis: Around Java in 60 minutesStatic analysis: Around Java in 60 minutes
Static analysis: Around Java in 60 minutes
 
Algorithm analysis.pptx
Algorithm analysis.pptxAlgorithm analysis.pptx
Algorithm analysis.pptx
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?
 
Day2 Verilog HDL Basic
Day2 Verilog HDL BasicDay2 Verilog HDL Basic
Day2 Verilog HDL Basic
 
Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spri...
Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spri...Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spri...
Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spri...
 
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1
 

More from Andrey Karpov

More from Andrey Karpov (20)

60 антипаттернов для С++ программиста
60 антипаттернов для С++ программиста60 антипаттернов для С++ программиста
60 антипаттернов для С++ программиста
 
60 terrible tips for a C++ developer
60 terrible tips for a C++ developer60 terrible tips for a C++ developer
60 terrible tips for a C++ developer
 
Ошибки, которые сложно заметить на code review, но которые находятся статичес...
Ошибки, которые сложно заметить на code review, но которые находятся статичес...Ошибки, которые сложно заметить на code review, но которые находятся статичес...
Ошибки, которые сложно заметить на code review, но которые находятся статичес...
 
PVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error ExamplesPVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error Examples
 
PVS-Studio in 2021 - Feature Overview
PVS-Studio in 2021 - Feature OverviewPVS-Studio in 2021 - Feature Overview
PVS-Studio in 2021 - Feature Overview
 
PVS-Studio в 2021 - Примеры ошибок
PVS-Studio в 2021 - Примеры ошибокPVS-Studio в 2021 - Примеры ошибок
PVS-Studio в 2021 - Примеры ошибок
 
PVS-Studio в 2021
PVS-Studio в 2021PVS-Studio в 2021
PVS-Studio в 2021
 
Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...
Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...
Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...
 
Best Bugs from Games: Fellow Programmers' Mistakes
Best Bugs from Games: Fellow Programmers' MistakesBest Bugs from Games: Fellow Programmers' Mistakes
Best Bugs from Games: Fellow Programmers' Mistakes
 
Does static analysis need machine learning?
Does static analysis need machine learning?Does static analysis need machine learning?
Does static analysis need machine learning?
 
Typical errors in code on the example of C++, C#, and Java
Typical errors in code on the example of C++, C#, and JavaTypical errors in code on the example of C++, C#, and Java
Typical errors in code on the example of C++, C#, and Java
 
How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)
How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)
How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)
 
Game Engine Code Quality: Is Everything Really That Bad?
Game Engine Code Quality: Is Everything Really That Bad?Game Engine Code Quality: Is Everything Really That Bad?
Game Engine Code Quality: Is Everything Really That Bad?
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical Reviewer
 
The Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source SoftwareThe Use of Static Code Analysis When Teaching or Developing Open-Source Software
The Use of Static Code Analysis When Teaching or Developing Open-Source Software
 
Static Code Analysis for Projects, Built on Unreal Engine
Static Code Analysis for Projects, Built on Unreal EngineStatic Code Analysis for Projects, Built on Unreal Engine
Static Code Analysis for Projects, Built on Unreal Engine
 
Safety on the Max: How to Write Reliable C/C++ Code for Embedded Systems
Safety on the Max: How to Write Reliable C/C++ Code for Embedded SystemsSafety on the Max: How to Write Reliable C/C++ Code for Embedded Systems
Safety on the Max: How to Write Reliable C/C++ Code for Embedded Systems
 
The Great and Mighty C++
The Great and Mighty C++The Great and Mighty C++
The Great and Mighty C++
 
Zero, one, two, Freddy's coming for you
Zero, one, two, Freddy's coming for youZero, one, two, Freddy's coming for you
Zero, one, two, Freddy's coming for you
 
PVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOps
PVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOpsPVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOps
PVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOps
 

Recently uploaded

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 

Recently uploaded (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with StrimziStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
A Guideline to Gorgias to to Re:amaze Data Migration
A Guideline to Gorgias to to Re:amaze Data MigrationA Guideline to Gorgias to to Re:amaze Data Migration
A Guideline to Gorgias to to Re:amaze Data Migration
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data Migration
 
iGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by SkilrockiGaming Platform & Lottery Solutions by Skilrock
iGaming Platform & Lottery Solutions by Skilrock
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 

How Data Flow analysis works in a static code analyzer

  • 1. How data flow analysis operates in a static code analyzer Pavel Belikov C++ Developer, PVS-Studio belikov@viva64.com
  • 2. / 50 PVS-Studio • Static analyzer for C, C++, C# code • It works on Windows, Linux, macOS • Plugin for Visual Studio • Integrates into SonarQube and Jenkins • Quick start (Standalone, pvs-studio-analyzer) 2
  • 3. / 50 Contents: • Types and objectives of Data Flow Analysis • Analysis of conditions • Analysis of loops • Symbolic execution • Examples of errors found in real projects 3
  • 4. / 50 What is data flow analysis •Calculate a set of values for expression or its properties •Numbers •Null/non-null pointer •Strings •The size and contents of containers/optional • Determine state of variables 4
  • 5. / 50 The main objectives • Set of values must be a superset of real values • Time is limited • Number of false positives must be minimized 5
  • 6. / 50 Why do we need it? static const int kDaysInMonth[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }; bool ValidateDateTime(const DateTime& time) { if (time.year < 1 || time.year > 9999 || time.month < 1 || time.month > 12 || time.day < 1 || time.day > 31 || time.hour < 0 || time.hour > 23 || time.minute < 0 || time.minute > 59 || time.second < 0 || time.second > 59) { return false; } if (time.month == 2 && IsLeapYear(time.year)) { return time.month <= kDaysInMonth[time.month] + 1; } else { return time.month <= kDaysInMonth[time.month]; } } 6
  • 7. / 50 Why do we need it? static const int kDaysInMonth[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }; bool ValidateDateTime(const DateTime& time) { if (time.year < 1 || time.year > 9999 || time.month < 1 || time.month > 12 || time.day < 1 || time.day > 31 || time.hour < 0 || time.hour > 23 || time.minute < 0 || time.minute > 59 || time.second < 0 || time.second > 59) { return false; } if (time.month == 2 && IsLeapYear(time.year)) { return time.month <= kDaysInMonth[time.month] + 1; } else { return time.month <= kDaysInMonth[time.month]; } } 7 Protobuf • V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month] + 1' is always true. time.cc 83 • V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month]' is always true. time.cc 85
  • 8. / 50 The basic equation • b – a code block • in/out - a state of variables when entering and exiting the block • trans - a function that transforms the state of variables in the block • join - a function that merges the state of variables in different paths of execution 8
  • 9. / 50 Example int a = 3; if (something) { a = 4; } std::cout << a; 9
  • 10. / 50 Example int a = 3; in = {}, out = {a=3} if (something) a = 4; in = {a=3}, out = {a=4} std::cout << a; in = {a=3}∪{a=4}={a=[3;4]} 10
  • 11. / 50 Flow sensitivity • Flow-sensitive analysis depends on the order of expressions in code • An example of a flow-insensitive analysis: searching for modified variables in a block • A way for code traversal is needed 11
  • 12. / 50 Flow sensitivity • Data Flow works with Control Flow Graph • In practice you can use AST (abstract syntax tree) • AST is simpler and more understandable to most developers • There are more tools for AST, parsers can generate AST • CFG can be simulated on top of the AST 12
  • 13. / 50 Flow sensitivity • Forward analysis • Pass the information to the block B from the preceding blocks • It suits well for calculating the values of variables and determining reaching definitions • Backward analysis • Pass the information from the block B to the preceding blocks • It suits well for live variable analysis 13
  • 14. / 50 Example of backward analysis __private_extern__ void YSHA1Transform(u_int32_t state[5], const unsigned char buffer[64]) { u_int32_t a, b, c, d, e; .... state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e; /* Wipe variables */ a = b = c = d = e = 0; } XNU kernel V1001 CWE-563 The 'a' variable is assigned but is not used until the end of the function. sha1mod.c 120 14
  • 15. / 50 Example of forward analysis • Reaching definitions • REACH - a set of variable definitions that can be read in the expression S • GEN - new definitions • KILL - "killed" definitions 15
  • 16. / 50 Example of forward analysis ParseResult ParseOption (string option, ref string[] args , CompilerSettings settings) { AssemblyResource res = null; GEN={res0} switch (s.Length) { case 1: res = new AssemblyResource (s[0], Path.GetFileName (s[0])); GEN={res1}, KILL={res0} break; case 2: res = new AssemblyResource (s[0], s[1]); GEN={res2}, KILL={res0} break; default: report.Error (-2005, "Wrong number of arguments for option '{0}'", option); return ParseResult.Error; } if (res != null) { ... } REACH={res1, res2} } ILSpy V3022 Expression 'res != null' is always true. settings.cs 827 16
  • 17. / 50 Must vs may • Must • Data flow fact must be true for all paths • It’s expressed through the intersection of sets • May • Fact should be correct at least for one path • It is expressed through the union of sets 17
  • 18. / 50 Must vs may • Static analysis often works with may • No one writes int *p = nullptr; if (something) p = nullptr; else if (something_else) p = nullptr; else p = nullptr; *p = 42; 18
  • 19. / 50 Must vs may STDMETHODIMP sdnAccessible::get_computedStyle( BSTR __RPC_FAR* aStyleProperties, BSTR __RPC_FAR* aStyleValues, unsigned short __RPC_FAR* aNumStyleProperties) { if (!aStyleProperties || aStyleValues || !aNumStyleProperties) return E_INVALIDARG; .... aStyleValues[realIndex] = ::SysAllocString(value.get()); .... } Mozilla Thunderbird V522 Dereferencing of the null pointer ‘aStyleValues’ might take place. sdnaccessible.cpp 252 19
  • 20. / 50 Path-sensitive analysis • May in one of the paths is not enough • What if the path is impossible? • We need to analyze the conditions! 20
  • 21. / 50 Path-sensitive analysis enum { Runesync = 0x80, Runeself = 0x80, }; char* utfrune(const char *s, int c) { .... if (c < Runesync) return strchr(s, c); // c: then [INT_MIN; 0x79] else [0x80; INT_MAX] for(;;) { c1 = *(unsigned char*)s; if (c1 < Runeself) { // c1: then [0; 0x79] if (c1 == 0) return 0; // c1: then 0 else [1; 0x79] if (c1 == c) return (char*)s; // if ([1; 0x79] == [0x80; INT_MAX]) .... } .... } return 0; } RE2 V547 CWE-570 Expression 'c1 == c' is always false. rune.cc 247 21
  • 22. / 50 Short circuit if ( x >= 0 && x <= 10 ) { } else { } 22
  • 23. / 50 Short circuit 23 x = [0; INT_MAX] x = [INT_MIN; -1] if ( x >= 0 && x <= 10 ) { } else { }
  • 24. / 50 Short circuit 24 x = [0; INT_MAX] x = [INT_MIN; -1] x = [0; 10] x = [11; INT_MAX] then: x = [0; 10] else: x = [INT_MIN; -1] ∪ [11; INT_MAX] x = [0; INT_MAX] x = [INT_MIN; -1] if ( x >= 0 && x <= 10 ) { } else { }
  • 25. / 50 Short circuit internal bool SafeForExport() { return DisplayEntry.SafeForExport() && ItemSelectionCondition == null || ItemSelectionCondition.SafeForExport(); } PowerShell V3080 Possible null dereference. Consider inspecting ‘ItemSelectionCondition’. System.Management.Automation displayDescriptionData_List.cs 352 25
  • 26. / 50 Join problem int *p; if (condition) { p = new int; } else { p = nullptr; } // p - nullable if (condition) { *p = 42; // null dereference? } 26
  • 27. / 50 Join problem • We lose the information when we unite the paths • It is better to postpone the merging of states for as long as possible • But there is a problem with path explosion 27
  • 28. / 50 Join problem int *p; if (condition) { p = new int; // p = non null if condition } else { p = nullptr; // p = null if !condition } // p = non null if condition // ∪ null if !condition if (condition) { // p = non null *p = 42; } 28
  • 29. / 50 Join problem int arr[4]; int a, b; if (condition) { a = 1; b = 2; } else { a = 2; b = 1; } return arr[a + b]; // a = 1 if condition ∪ 2 if !condition // b = 2 if condition ∪ 1 if !condition // a + b = 3 if condition ∪ 3 if !condition // a + b = 3 29
  • 30. / 50 Try-catch 30 try { SomeClass c(someFunction(), 42); c.foo(); return c + “abc”; } catch (...) { }
  • 31. / 50 Try-catch try { SomeClass c(someFunction(), 42); c.foo(); return c + “abc”; } catch (...) { } • call of someFunction() • constructor of c variable • call of foo() method • constructors of temporary objects • operator + • constructor for returned object • destructors for temporary objects • destructor of c variable 31
  • 32. / 50 Loop analysis •In general case, it is difficult and slow to analyze •Analyze the first iteration separately •"Kill" all new definitions of variables after a loop 32 Are you stuck in an infinite loop? YesNo
  • 33. / 50 Loop invariants public final R getSomeBuildWithWorkspace() { int cnt=0; // <= variable definition outside of the loop for (R b = getLastBuild(); cnt<5 && b!=null; b=b.getPreviousBuild()) { FilePath ws = b.getWorkspace(); if (ws!=null) return b; } return null; } Jenkins V6022 Expression 'cnt < 5' is always true AbstractProject.java 557 33
  • 34. / 50 The first iteration void Measure::read(XmlReader& e, int staffIdx) { Segment* segment = 0; .... while (e.readNextStartElement()) { const QStringRef& tag(e.name()); if (tag == "move") e.initTick(e.readFraction().ticks() + tick()); .... else if (tag == "sysInitBarLineType") { .... segment = getSegmentR(SegmentType::BeginBarLine, 0); // !!! segment->add(barLine); // <= OK } .... else if (tag == "Segment") segment->read(e); // <= ERROR .... } } MuseScore V522 Dereferencing of the null pointer 'segment' might take place. measure.cpp 2220 34
  • 35. / 50 Loop control flow SkOpSpan* SkOpContour::undoneSpan() { SkOpSegment* testSegment = &fHead; bool allDone = true; do { if (testSegment->done()) { continue; } allDone = false; return testSegment->undoneSpan(); } while ((testSegment = testSegment->next())); if (allDone) { fDone = true; } return nullptr; } Skia Graphics Engine V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43 35
  • 36. / 50 Loop control flow SkOpSpan* SkOpContour::undoneSpan() { SkOpSegment* testSegment = &fHead; bool allDone = true; do { if (testSegment->done()) { continue; } allDone = false; // <= we don’t take into account this path return testSegment->undoneSpan(); } while ((testSegment = testSegment->next())); if (allDone) { fDone = true; } return nullptr; } Skia Graphics Engine V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43 36
  • 37. / 50 Loop counter analysis for (int i = 0; i < 10; ++i) { // i = [INT_MIN; 9] ? // i = [0; 9] !!! } 37
  • 38. / 50 Loop counter analysis #define AE_IDLE_TIMEOUT 100 static void ae_stop_rxmac(ae_softc_t *sc) { int i; .... /* * Wait for IDLE state. */ for (i = 0; i < AE_IDLE_TIMEOUT; i--) { // <= val = AE_READ_4(sc, AE_IDLE_REG); if ((val & (AE_IDLE_RXMAC | AE_IDLE_DMAWRITE)) == 0) break; DELAY(100); } .... } FreeBSD Kernel V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. if_ae.c 1663 38
  • 39. / 50 There is a problem for (int i = 0; i < n; ++i) { for (int j = i + 1; j < n; ++j) { // j - i } } 39
  • 40. / 50 There is a problem int i = /* [0; 42] */; int j = i + 1; // [1; 43] int r = j - i; // [-43; 41]??? 40
  • 41. / 50 Symbolic execution int i = /* [0; 42] */; int j = i + 1; // [1; 43] int r = j - i; // i + 1 - i = 1 41
  • 42. / 50 Symbolic execution • Calculate everything in symbolic expressions • Create a system of equations • Upload it into SMT solver • ??? • PROFIT 42
  • 43. / 50 Symbolic execution public static MMMethodKind valueOf(....) { MMMethodKind result = OTHER; for (MMMethodKind k : values()) { if (k.detector.test(method) && result.level < k.level) { if (result.level == k.level) { throw new SpoonException(....); } result = k; } } return result; } Spoon MMMethodKind.java:129: V6007 Expression 'result.level == k.level' is always false. 43
  • 44. / 50 Context sensitive •foo(); •We can reset all of the accumulated information •Annotate popular libraries •Enjoy the 10 ways to pass a variable to a function 44
  • 45. / 50 Context sensitive •analysis of a function considering the context of the caller •scales poorly •useful for analyzing small functions (getters/setters, for example) 45
  • 46. / 50 Context sensitive void foo(int *p) { // analyze two times *p = 42; } void bar() { int *p = something ? new int : nullptr; foo(p); // repeatedly analyze foo and find a bug } 46
  • 47. / 50 Context insensitive void foo(int *p) { // p != nullptr *p = 42; } void bar() { int *p = something ? new int : nullptr; foo(p); // p != nullptr contract is violated, found a bug } 47
  • 48. / 50 Context insensitive • Analyze the body of a function, compose an annotation for it • Contract for arguments • Presence of a global state • Returned value • And much more • contracts proposal void foo(const std::vector<int> &indices) [[expects: !indices.empty()]]; 48
  • 49. / 50 Conclusions • Data flow analysis is a useful technique for finding errors • To find bugs one has to operate large and sometimes strange set of properties • The combination of different techniques allows to increase the reliability of analysis results • Various heuristics and assumptions allow finding more bugs • Every significant static analyzer must use data flow analysis 49
  • 50. / 50 Answering your questions PVS-Studio: http://www.viva64.com/ru/pvs-studio/ 50