Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Author: Andrey Karpov
Date: 21.11.2018
Tags: Cpp, StaticAnalysis, Knowledge, Security
This article briefly describes the technologies used in the PVS-Studio tool that let us effectively detect
a large number of error patterns and potential vulnerabilities. It describes the implementation of the
analyzer for C and C++ code, but the information also applies to the modules responsible for the analysis
of C# and Java code.
Introduction
There is a misconception that static code analyzers are simple programs that search for code patterns
using regular expressions. This is far from the truth. Moreover, it is simply impossible to detect the vast
majority of errors using regular expressions.
This wrong belief arose from developers' experience with some tools that existed 10-20 years ago.
Back then, the functionality of those tools often came down to searching for dangerous code patterns
and functions such as strcpy, strcat and so on. RATS is a representative example of this kind of tool.
Although such tools could be of some benefit, they were generally ineffective. Since then, many
developers have remembered static analyzers as rather useless tools that get in the way of work
rather than help with it.
Time has passed, and static analyzers have become complex solutions that perform deep code
analysis and find bugs that remain in code even after a careful code review. Unfortunately, due to
past negative experiences, many programmers still consider static analysis useless and
are reluctant to introduce it into the development process.

In this article, I will try to remedy the situation. I'd like to ask readers to give me 15 minutes and get
acquainted with the technologies the PVS-Studio static code analyzer uses to find bugs. Perhaps after that
you will look at static analysis tools in a new way and might want to apply them in your work.
Data-Flow Analysis
Data flow analysis enables you to find various errors. Here are some of them: array index out of bounds,
memory leaks, always true/false conditions, null pointer dereference, and so on.
Data flow analysis can also be used to search for situations where unchecked data coming from the outside is
used. An attacker can prepare a set of input data that makes the program operate the way they need. In
other words, they can exploit insufficient control of input data as a vulnerability. A specialized V1010
diagnostic that detects unchecked data usage is implemented in PVS-Studio and is constantly being improved.
Data flow analysis is the calculation of the possible values of variables at various points in a
computer program. For example, if a pointer is dereferenced, and it is known that at this moment it can
be null, then this is a bug, and a static analyzer will warn about it.
Let's take a practical example of data flow analysis usage for finding bugs. Here we have a function from
the Protocol Buffers (protobuf) project meant for data validation.
static const int kDaysInMonth[13] = {
  0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
bool ValidateDateTime(const DateTime& time) {
  if (time.year < 1 || time.year > 9999 ||
      time.month < 1 || time.month > 12 ||
      time.day < 1 || time.day > 31 ||
      time.hour < 0 || time.hour > 23 ||
      time.minute < 0 || time.minute > 59 ||
      time.second < 0 || time.second > 59) {
    return false;
  }
  if (time.month == 2 && IsLeapYear(time.year)) {
    return time.month <= kDaysInMonth[time.month] + 1;
  } else {
    return time.month <= kDaysInMonth[time.month];
  }
}
In the function, the PVS-Studio analyzer found two logical errors and issued the following messages:
• V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month] + 1' is always true. time.cc 83
• V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month]' is always true. time.cc 85
Let's pay attention to the subexpression "time.month < 1 || time.month > 12". If the month value is
outside the range [1..12], the function returns immediately. The analyzer takes this into account and knows
that if execution reaches the second if statement, the month value certainly falls within the range [1..12].
Similarly, it knows the ranges of the other variables (year, day, etc.), but they are not of interest to us
now.
Now let's take a look at the two identical accesses to the array elements:
kDaysInMonth[time.month].
The array is defined statically, and the analyzer knows the values of all of its elements:
static const int kDaysInMonth[13] = {
  0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
As the months are numbered starting with 1, the analyzer ignores the 0 at the beginning of the array. It
turns out that a value in the range [28..31] can be taken from the array.
Depending on whether the year is a leap year, 1 may be added to the number of days. However, this is
also not of interest to us now. The comparisons themselves are what matter:
time.month <= kDaysInMonth[time.month] + 1;
time.month <= kDaysInMonth[time.month];
The range [1..12] (the number of the month) is compared with the number of days in the month.
Since the month is always February in the first case (time.month == 2), the following ranges are
compared:
• 2 <= 29
• [1..12] <= [28..31]
As you can see, the result of the comparison is always true, and this is what the PVS-Studio analyzer is
warning us about. Indeed, the code contains two identical typos. The day class member should have been
used on the left side of the expression instead of month.
The correct code should be as follows:
if (time.month == 2 && IsLeapYear(time.year)) {
  return time.day <= kDaysInMonth[time.month] + 1;
} else {
  return time.day <= kDaysInMonth[time.month];
}
The error considered here has already been described in the article "February 31".
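The range reasoning used in this section can be sketched in a few lines of C++. This is only a minimal illustration under simplifying assumptions, not the way PVS-Studio is actually implemented: the Range type and the functions RangeOfElements and AlwaysLessOrEqual are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>

// Closed integer interval [lo, hi]: a toy stand-in for the set of values
// a variable may hold at some point in the program (hypothetical type).
struct Range {
    int lo;
    int hi;
};

// Smallest range that covers the array elements arr[first..last].
template <std::size_t N>
Range RangeOfElements(const int (&arr)[N], std::size_t first, std::size_t last) {
    assert(first <= last && last < N);
    Range r = {arr[first], arr[first]};
    for (std::size_t i = first + 1; i <= last; ++i) {
        if (arr[i] < r.lo) r.lo = arr[i];
        if (arr[i] > r.hi) r.hi = arr[i];
    }
    return r;
}

// 'a <= b' holds for every possible pair of values
// when the largest 'a' does not exceed the smallest 'b'.
bool AlwaysLessOrEqual(Range a, Range b) {
    return a.hi <= b.lo;
}
```

With the month range [1..12] and the range [28..31] computed from elements 1 to 12 of kDaysInMonth, AlwaysLessOrEqual returns true, which corresponds to the "always true" warning. For the day range [1..31] it returns false, so the corrected comparison would not be reported.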
Symbolic Execution
The previous section described a method in which the analyzer evaluates the possible values of
variables. However, to find some errors, it is not necessary to know the values of variables. Symbolic
execution involves solving equations in symbolic form.

I have not found a suitable demo example in our error database, so let's consider a synthetic code
example.
int Foo(int A, int B)
{
  if (A == B)
    return 10 / (A - B);
  return 1;
}
The PVS-Studio analyzer issues a warning: V609 / CWE-369 Divide by zero. Denominator 'A - B' == 0.
test.cpp 12
The values of the variables A and B are not known to the analyzer. However, the analyzer knows that, when
the 10 / (A - B) expression is evaluated, the variables A and B are equal. Therefore, division by 0 will
occur.
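The idea can be sketched as follows. This is a deliberately tiny model, assuming that the only facts tracked are equalities between named variables; the PathFacts class and its methods are hypothetical and are not taken from PVS-Studio.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Facts of the form "x == y" known to hold on the current execution path.
// Hypothetical toy model: no transitivity, no arithmetic beyond 'x - y'.
class PathFacts {
public:
    // Called when the path enters the true branch of 'if (x == y)'.
    void AssumeEqual(const std::string& x, const std::string& y) {
        assert(!x.empty() && !y.empty());
        equal_.insert(Ordered(x, y));
    }

    bool KnownEqual(const std::string& x, const std::string& y) const {
        return x == y || equal_.count(Ordered(x, y)) > 0;
    }

    // 'x - y' is provably zero when x and y are known to be equal,
    // even though neither value is known.
    bool DifferenceIsZero(const std::string& x, const std::string& y) const {
        return KnownEqual(x, y);
    }

private:
    // Store each pair in a canonical order so that (A, B) matches (B, A).
    static std::pair<std::string, std::string> Ordered(const std::string& x,
                                                       const std::string& y) {
        return x < y ? std::make_pair(x, y) : std::make_pair(y, x);
    }

    std::set<std::pair<std::string, std::string> > equal_;
};
```

After AssumeEqual("A", "B"), the call DifferenceIsZero("A", "B") returns true, which is the situation behind the divide-by-zero warning in the Foo example.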
I said that the values of A and B are unknown. In the general case, this is true. However, if the analyzer
sees a function call with specific values of the actual arguments, it will take them into account. Let's
consider an example:
int Div(int X)
{
  return 10 / X;
}
void Foo()
{
  for (int i = 0; i < 5; ++i)
    Div(i);
}
The PVS-Studio analyzer warns about dividing by zero: V609 CWE-628 Divide by zero. Denominator 'X'
== 0. The 'Div' function processes value '[0..4]'. Inspect the first argument. Check lines: 106, 110.
consoleapplication2017.cpp 106
Here a mixture of technologies is at work: data flow analysis, symbolic execution, and automatic
method annotation (we will cover this technology in the next section). The analyzer sees that the X variable
is used in the Div function as a divisor. On this basis, a special annotation is built for the Div function.
Then the analyzer takes into account that values in the range [0..4] are passed to the function as the X
argument. The analyzer concludes that a division by 0 is bound to occur.
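The combination of an automatically built annotation and a range of argument values can be pictured with the following sketch. The FunctionAnnotation and ArgRange types and the MayDivideByZero function are hypothetical names used only for illustration; the real annotations in PVS-Studio are certainly richer.

```cpp
#include <cassert>

// Range of values passed as the actual argument at a call site (hypothetical).
struct ArgRange {
    int lo;
    int hi;
};

// Summary derived from a function body: here a single flag saying that
// the first parameter is used as a divisor (hypothetical).
struct FunctionAnnotation {
    bool param_used_as_divisor;
};

// A call is suspicious if the callee divides by its parameter
// and the range of the actual argument contains zero.
bool MayDivideByZero(const FunctionAnnotation& callee, ArgRange arg) {
    assert(arg.lo <= arg.hi);
    return callee.param_used_as_divisor && arg.lo <= 0 && 0 <= arg.hi;
}
```

For the Div function, the flag is set, and the loop passes values in [0..4], so MayDivideByZero returns true. If the loop started from 1, the range [1..4] would not contain zero and the call would not be reported.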
Method Annotations
Our team has annotated thousands of functions and classes provided in:
• WinAPI
• standard C library
• standard template library (STL)
• glibc (GNU C Library)
• Qt
• MFC
• zlib
• libpng
• OpenSSL
• and so on
All functions are manually annotated, which allows us to specify many characteristics that are important
in terms of finding errors. For example, it is specified that the size of the buffer passed to the fread function
must not be less than the number of bytes to be read from the file. The relationship between the 2nd
and 3rd arguments and the function's return value is also specified.
[Figure: annotation of the fread function]
Thanks to this annotation, two errors are revealed in the following code, which uses the fread function.
void Foo(FILE *f)
{
  char buf[100];
  size_t i = fread(buf, sizeof(char), 1000, f);
  buf[i] = 1;
  ....
}
PVS-Studio warnings:
• V512 CWE-119 A call of the 'fread' function will lead to overflow of the buffer 'buf'. test.cpp 116
• V557 CWE-787 Array overrun is possible. The value of 'i' index could reach 1000. test.cpp 117
Firstly, the analyzer multiplied the 2nd and the 3rd actual arguments and figured out that this function
can read up to 1000 bytes of data. In this case, the buffer size is only 100 bytes, and an overflow can
occur.
Secondly, since the function can read up to 1000 elements, the range of possible values of the variable i is
[0..1000]. Accordingly, the array can be accessed by an incorrect index.
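Both checks follow from simple arithmetic on the call's arguments, which the sketch below spells out. It is an illustrative model only: the FreadCall structure and the function names are hypothetical and do not reflect how annotations are stored inside PVS-Studio.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// What is known at a call site 'fread(buf, size, count, f)' (hypothetical model).
struct FreadCall {
    std::size_t buffer_bytes;   // size of the destination buffer in bytes
    std::size_t element_size;   // 2nd argument
    std::size_t element_count;  // 3rd argument
};

// fread may write up to size * count bytes into the buffer.
std::size_t MaxBytesWritten(const FreadCall& c) {
    // Precondition of this sketch: the product does not overflow size_t.
    assert(c.element_size == 0 ||
           c.element_count <=
               std::numeric_limits<std::size_t>::max() / c.element_size);
    return c.element_size * c.element_count;
}

// First check: the buffer is smaller than the amount of data that may arrive.
bool BufferMayOverflow(const FreadCall& c) {
    return MaxBytesWritten(c) > c.buffer_bytes;
}

// Second check: the return value lies in [0..count], so using it as an index
// into an array of 'array_len' elements is unsafe when count >= array_len.
bool ReturnValueMayOverrunIndex(const FreadCall& c, std::size_t array_len) {
    return c.element_count >= array_len;
}
```

For the call from the example (a 100-byte buffer, element size 1, count 1000), both functions return true. Note that with a count of exactly 100 the first check passes, but the second still fires, because the index could reach 100.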
Let's take a look at another simple error example, whose detection became possible thanks to the
annotation of the memset function. Here we have a code fragment from the CryEngine V project.
void EnableFloatExceptions(....)
{
  ....
  CONTEXT ctx;
  memset(&ctx, sizeof(ctx), 0);
  ....
}
The PVS-Studio analyzer has found a typo: V575 The 'memset' function processes '0' elements. Inspect
the third argument. crythreadutil_win32.h 294
The 2nd and the 3rd arguments of the function are mixed up. As a result, the function processes 0 bytes and
does nothing. The analyzer notices this anomaly and warns developers about it. We have previously
described this error in the article "Long-Awaited Check of CryEngine V".
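For reference, here is the intended argument order in a self-contained form. The Context structure below is a hypothetical stand-in for the Windows CONTEXT type, and ResetContext is a name invented for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the platform CONTEXT structure.
struct Context {
    int flags;
    double regs[4];
};

void ResetContext(Context& ctx) {
    // memset(destination, fill value, number of bytes):
    // the fill value comes second, the byte count comes third.
    std::memset(&ctx, 0, sizeof(ctx));
    assert(ctx.flags == 0);  // postcondition of this sketch
}
```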
The PVS-Studio analyzer is not limited to the annotations we specify manually. In addition, it tries to
create annotations itself by studying the bodies of functions. This makes it possible to find errors of
incorrect function usage. For example, the analyzer remembers that a function can return nullptr. If the
pointer returned by this function is used without prior verification, the analyzer will warn you about it. Example:
int GlobalInt;
int *Get()
{
  return (rand() % 2) ? nullptr : &GlobalInt;
}
void Use()
{
  *Get() = 1;
}
Warning: V522 CWE-690 There might be dereferencing of a potential null pointer 'Get()'. test.cpp 129
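One possible way to write the caller so that the pointer is verified before use is sketched below. Returning a bool from Use is a choice made for this illustration, not something taken from the original example.

```cpp
#include <cassert>
#include <cstdlib>

int GlobalInt = 0;

// Same function as in the example: may return a null pointer.
int *Get()
{
    return (std::rand() % 2) ? nullptr : &GlobalInt;
}

// Checks the result before dereferencing it.
// Returns true if the value was stored, false if Get() returned nullptr.
bool Use()
{
    int *p = Get();
    if (p == nullptr)
        return false;
    assert(p == &GlobalInt);  // in this sketch, the only possible non-null result
    *p = 1;
    return true;
}
```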
Note. You can approach the search for the error that we have just considered from the opposite
direction. Instead of remembering anything about the return value, you can analyze the body of the Get
function, using knowledge of its actual arguments, every time you encounter a call to it. Such an algorithm
theoretically lets you find more errors, but it has exponential complexity. The analysis time increases by a
factor of hundreds to thousands, and we believe this approach is pointless from a practical point of view.
In PVS-Studio, we are developing the direction of automatic function annotation.

Pattern-Based Matching Analysis
At first glance, pattern-matching technology might seem the same as search using regular expressions.
Actually, this is not the case, and everything is much more complicated.
Firstly, as I have already said, regular expressions in general are no good. Secondly, analyzers work not
with text strings but with syntax trees, which allows them to recognize more complex and higher-level
patterns of errors.
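To give a feel for what matching on a syntax tree means, here is a toy sketch. The Node structure, the Leaf and Binary helpers, and the MatchesAssignOfComparison function are all hypothetical; a real analyzer's tree carries types, source positions, and much more.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal expression tree node (hypothetical): either a leaf with some text
// or a binary operator with two children.
struct Node {
    std::string text;           // operator ("=", "!=", "==") or leaf text
    std::unique_ptr<Node> lhs;  // null for leaves
    std::unique_ptr<Node> rhs;  // null for leaves
};

std::unique_ptr<Node> Leaf(std::string text) {
    std::unique_ptr<Node> n(new Node);
    n->text = std::move(text);
    return n;
}

std::unique_ptr<Node> Binary(std::string op, std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
    assert(lhs && rhs);
    std::unique_ptr<Node> n(new Node);
    n->text = std::move(op);
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

bool IsComparison(const Node& n) {
    return n.lhs && n.rhs && (n.text == "!=" || n.text == "==");
}

// Matches a condition whose root is an assignment with a comparison
// on the right-hand side, that is, a tree of the shape A = (B != C).
bool MatchesAssignOfComparison(const Node& condition) {
    return condition.text == "=" && condition.lhs && condition.rhs &&
           IsComparison(*condition.rhs);
}
```

The point of the sketch is that the check looks at the shape of the tree produced by the parser, so it is unaffected by spacing, line breaks, or the names of the variables involved.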
Let's look at two examples: one is simpler and the other is more complicated. I found the first error when
checking the Android source code.
void TagMonitor::parseTagsToMonitor(String8 tagNames) {
  std::lock_guard<std::mutex> lock(mMonitorMutex);
  if (ssize_t idx = tagNames.find("3a") != -1) {
    ssize_t end = tagNames.find(",", idx);
    char* start = tagNames.lockBuffer(tagNames.size());
    start[idx] = '0';
    ....
  }
  ....
}
The PVS-Studio analyzer detects a classic error pattern related to a programmer's misunderstanding
of operator precedence in C++: V593 / CWE-783 Consider reviewing the expression of the 'A =
B != C' kind. The expression is calculated as following: 'A = (B != C)'. TagMonitor.cpp 50
Look closely at this line:
if (ssize_t idx = tagNames.find("3a") != -1) {
The programmer assumes that the assignment is executed first and then the comparison with -1.
In fact, the comparison is performed first. Classic. This error is covered in detail in the article on
the Android check (see the section "Other errors").
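The effect of the precedence rule is easy to reproduce in isolation. In the sketch below, FindIndex is a hypothetical helper that imitates a find function returning an index or -1; it is built on std::string and is not the String8 class from Android.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: index of the first occurrence of 'what' in 's', or -1.
long FindIndex(const std::string& s, const std::string& what) {
    assert(!what.empty());
    const std::string::size_type pos = s.find(what);
    return pos == std::string::npos ? -1 : static_cast<long>(pos);
}

// Same shape as in the fragment: parsed as idx = (FindIndex(...) != -1),
// so idx receives the result of the comparison, 0 or 1.
long IndexAsWritten(const std::string& s) {
    long idx = FindIndex(s, "3a") != -1;
    return idx;
}

// What was intended: store the index first, compare it afterwards.
long IndexAsIntended(const std::string& s) {
    long idx = FindIndex(s, "3a");
    return idx;
}
```

For the string "tags:3a,x", IndexAsWritten returns 1 while IndexAsIntended returns 5. One way to keep the check inside the condition is to declare idx on a separate line and write if (idx != -1).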
Now let's take a closer look at a high-level pattern matching variant.
static inline void sha1ProcessChunk(....)
{
  ....
  quint8 chunkBuffer[64];
  ....
#ifdef SHA1_WIPE_VARIABLES
  ....
  memset(chunkBuffer, 0, 64);
#endif
}

PVS-Studio warning: V597 CWE-14 The compiler could delete the 'memset' function call, which is used
to flush 'chunkBuffer' buffer. The RtlSecureZeroMemory() function should be used to erase the private
data. sha1.cpp 189
The crux of the problem lies in the fact that after the buffer is filled with zeros using memset, it is not
used anywhere else. When building the code with optimization flags, a compiler will decide that this
function call is redundant and will remove it. It has the right to do so, because in terms of the C++ language,
the function call has no observable effect on the program's behavior. Immediately after filling the
chunkBuffer buffer, the sha1ProcessChunk function finishes its work. As the buffer is created on the stack, it will
become unavailable after the function exits. Therefore, from the viewpoint of the compiler, it makes no
sense to fill it with zeros.
As a result, private data will remain somewhere on the stack, which can lead to trouble. This topic is
covered in detail in the article "Safe Clearing of Private Data".
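A commonly used portable workaround is to clear the memory through a pointer to volatile, since accesses through volatile glvalues count as observable behavior in C++. The SecureZero function below is a sketch of that technique with a name chosen for this example. Where available, a dedicated platform function such as RtlSecureZeroMemory (named in the warning), memset_s, or explicit_bzero is usually the better choice.

```cpp
#include <cassert>
#include <cstddef>

// Zero 'size' bytes starting at 'data'. The writes go through a volatile
// pointer, so the compiler is expected not to remove them even if the
// buffer is never read again. A sketch, not a vetted security primitive.
void SecureZero(void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    while (size--) {
        *p++ = 0;
    }
}
```

In the sha1ProcessChunk fragment, this would correspond to replacing the memset call with a call such as SecureZero(chunkBuffer, sizeof(chunkBuffer)).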
This is an example of high-level pattern matching. Firstly, the analyzer must be aware of the existence
of this security defect, classified according to the Common Weakness Enumeration as CWE-14: Compiler
Removal of Code to Clear Buffers.
Secondly, it must find all the places in the code where a buffer is created on the stack, cleared using memset,
and not used anywhere afterwards.
Conclusion
As you can see, static analysis is a very interesting and useful methodology. It allows you to fix a large
number of bugs and potential vulnerabilities at the earliest stages (see SAST). If you still don't fully
appreciate static analysis, I invite you to read our blog, where we regularly investigate errors found by
PVS-Studio in various projects. You will not be able to remain indifferent.
We will be glad to see your company among our clients and will help to make your applications
high-quality, reliable and safe.
