The operation principles of PVS-Studio static code analyzer
1. The operation principles of PVS-Studio static code analyzer
Authors:
Candidate of Engineering Sciences
Evgeniy Ryzhkov, evg@viva64.com
Candidate of Physico-Mathematical Sciences
Andrey Karpov, karpov@viva64.com
2. OOO "Program Verification Systems"
(www.viva64.com)
• Development, marketing and sales of a software product.
• Office: Tula, 200 km away from Moscow.
• Staff: 24 people.
3. PVS-Studio
• More than 320 diagnostics for C, C++
• More than 120 diagnostics for C#
• Windows
• Linux
• Plugin for Visual Studio
• Quick Start (compilation monitoring)
• SonarQube
4. Our achievements
• To let the world know about our product, we check open-source projects. So far
we have checked about 270 projects.
• A “side” effect: we found more than 10 000 bugs in open-source projects, without
setting it as a goal.
• On average, that is about 40 errors per project, which is not that many.
• It is important to emphasize once more that this was a “side” effect. We don’t have a
goal to find as many errors as possible. Quite often, we stop when we find enough
errors for an article.
• Conclusion: it’s rather easy to check even unfamiliar projects and find errors in them.
6. We do not use formal grammar for analysis
• The analyzer works on a higher level
• We analyze the derivation tree
• To build the tree we rely on existing components:
• External preprocessor
• OpenC++ library, which we kept improving as C++ evolved (actually
there is almost nothing left of OpenC++)
• When working with C# code we take Roslyn as the basis
7. We do not use program proof methods
• PVS-Studio has nothing to do with the Prototype Verification System
(PVS) http://pvs.csl.sri.com/
• PVS-Studio is a contraction of "Program Verification Systems" (OOO
"Program Verification Systems")
8. We do not use substring search (string matching)
and regular expressions
• A dead-end way
• It is of no use even in the simplest situations
• Example: if (A+B == A+B)
• A+B == B+A
• A+(B) == (A)+B
• ((A+B)) == A+B
• Even worse: text search has no access to types, object sizes, inheritance, variable values and so on.
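Why a tree helps where string matching fails can be sketched as follows. This is a minimal illustration, not the PVS-Studio implementation: the `Expr` type and the helpers `leaf`, `bin`, `isCommutative` and `sameExpr` are hypothetical names. Parentheses never reach the tree, so ((A+B)) and A+B are the same node, and operand order is checked both ways for commutative operators. Side effects of operands are ignored here.

```cpp
#include <memory>
#include <string>

// Hypothetical, minimal expression tree: a leaf holds a name,
// an inner node holds a binary operator and two children.
struct Expr {
    std::string op;                  // empty for a leaf
    std::string name;                // used only by leaves
    std::shared_ptr<Expr> lhs, rhs;  // used only by inner nodes
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& n) {
    return std::make_shared<Expr>(Expr{"", n, nullptr, nullptr});
}

ExprPtr bin(const std::string& op, ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{op, "", l, r});
}

bool isCommutative(const std::string& op) {
    return op == "+" || op == "*";
}

// Structural comparison: same operator and same operands,
// with swapped operands accepted for commutative operators.
bool sameExpr(const ExprPtr& a, const ExprPtr& b) {
    if (a->op != b->op) return false;
    if (a->op.empty()) return a->name == b->name;
    if (sameExpr(a->lhs, b->lhs) && sameExpr(a->rhs, b->rhs)) return true;
    return isCommutative(a->op) &&
           sameExpr(a->lhs, b->rhs) && sameExpr(a->rhs, b->lhs);
}
```

With this sketch, `sameExpr` treats A+B and B+A as equal, while A-B and B-A stay different.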
9. What we USE
The details of C++ and C# analysis differ; we are not going to cover them
here
10. Pattern-based analysis
• Pattern matching based on the derivation tree
• It is used to search for fragments in the source code that are similar
to the known code patterns with an error
• The complexity of the diagnostics varies greatly
• In some cases these are empirical algorithms
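A pattern over the tree can be as simple as "an if statement whose two branches are the same subtree". The sketch below is only an illustration under assumptions: the `Node` layout (condition, then-branch, else-branch as children of an "if" node) and the function names are hypothetical, and real diagnostics are considerably more involved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical statement node: a kind ("if", "block", "call", ...),
// optional text, and child nodes.
struct Node {
    std::string kind;
    std::string text;
    std::vector<Node> children;
};

// Two subtrees match when kind, text and all children match.
bool sameTree(const Node& a, const Node& b) {
    if (a.kind != b.kind || a.text != b.text ||
        a.children.size() != b.children.size())
        return false;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        if (!sameTree(a.children[i], b.children[i])) return false;
    return true;
}

// Assumed layout of an "if" node: [0] condition, [1] then, [2] else.
bool thenEqualsElse(const Node& n) {
    return n.kind == "if" && n.children.size() == 3 &&
           sameTree(n.children[1], n.children[2]);
}
```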
11. A simple case: copy-paste
if ((*path)[0]->e->dest->loop_father != path->last()->e->....)
{
  delete_jump_thread_path (path);
  e->aux = NULL;
  ei_next (&ei);
}
else
{
  delete_jump_thread_path (path);
  e->aux = NULL;
  ei_next (&ei);
}
The GCC Project
V523 The 'then' statement is equivalent to the 'else' statement.
tree-ssa-threadupdate.c 2596
12. A more complicated case: checking the wrong variable
public override Predicate JoinWith(Predicate other)
{
var right = other as PredicateNullness;
if (other != null)
{
if (this.value == right.value)
{
The CodeContracts Project
V3019 Possibly an incorrect variable is compared to null after type conversion
using 'as' keyword. Check variables 'other', 'right'. CallerInvariant.cs 189
13. Quite a complicated case: a badly written macro
#define ICB2400_VPINFO_PORT_OFF(chan) \
  (ICB2400_VPINFO_OFF + \
   sizeof (isp_icb_2400_vpinfo_t) + \
   (chan * ICB2400_VPOPT_WRITE_SIZE))
off += ICB2400_VPINFO_PORT_OFF(chan - 1);
V733 It is possible that macro expansion resulted in incorrect evaluation
order. Check expression: chan - 1 * 20. isp.c 2301
The FreeBSD Project
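The effect of the missing parentheses can be reproduced with a reduced macro. This is a sketch with stand-in names: `BASE_OFF` and its value are assumptions, and `WRITE_SIZE` is set to 20 only because the warning text shows "chan - 1 * 20". The usual fix is to parenthesize the macro parameter.

```cpp
// Stand-ins for the real constants; BASE_OFF is an assumed value.
#define BASE_OFF 100
#define WRITE_SIZE 20

// Parameter is not parenthesized: `chan - 1` expands to `chan - 1 * 20`.
#define PORT_OFF_BAD(chan)  (BASE_OFF + (chan * WRITE_SIZE))

// Parameter is parenthesized: `chan - 1` expands to `(chan - 1) * 20`.
#define PORT_OFF_GOOD(chan) (BASE_OFF + ((chan) * WRITE_SIZE))

int offsetBad(int chan)  { return PORT_OFF_BAD(chan - 1); }
int offsetGood(int chan) { return PORT_OFF_GOOD(chan - 1); }
```

For chan equal to 3, `offsetBad` computes 100 + (3 - 20) = 83, while `offsetGood` computes 100 + 2 * 20 = 140.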
14. Type inference
• The type inference based on the semantic model of the program
allows the analyzer to have full information about all variables and
statements in the code.
• It is important for detecting errors
• It is important for exceptions to the diagnostic rules (cases where no warning should be issued)
• The information about classes is especially important
15. Types are also important for bug detection
The Cocos2d-x project
WCHAR *gai_strerrorW(int ecode);
#define gai_strerror gai_strerrorW
fprintf(stderr, "net_listen error for %s: %s",
serv, gai_strerror(n));
V576 Incorrect format. Consider checking the fourth actual argument of
the 'fprintf' function. The pointer to string of char type symbols is
expected. ccconsole.cpp 341
16. Types are important for exceptions
// A volatile variable is assigned to itself
volatile int *ptr;
....
*ptr = *ptr; // No V570 warning is issued
17. The information about classes is especially
important: inheritance hierarchy, for instance
class sg_throwable : public std::exception { .... };
class sg_exception : public sg_throwable { .... };
if (!aInstall) {
sg_exception("missing argument to scheduleToUpdate");
}
V596 The object was created but it is not being used. The 'throw' keyword
could be missing: throw sg_exception(FOO); root.cxx 239
The FlightGear project
18. Symbolic execution
• Symbolic execution lets the analyzer evaluate variable values that can lead
to errors and perform range checking of values.
• It is one of the most important mechanisms for detecting:
• Overflows
• Memory Leaks
• Array index out of bounds
• Null pointers/references
• Meaningless conditions
• Division by zero
• and so on…
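Range checking can be pictured as tracking an interval of possible values for each variable and narrowing it at every condition. The sketch below is a simplified assumption of how such tracking could look; `Range`, `narrow` and `mayOverrun` are hypothetical names, not PVS-Studio internals.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>

// Hypothetical closed interval [lo, hi] of values a variable may hold.
struct Range {
    long lo;
    long hi;
};

// Values that survive a condition of the form `lo <= x && x <= hi`.
Range narrow(Range r, long lo, long hi) {
    return Range{std::max(r.lo, lo), std::min(r.hi, hi)};
}

// True if some value in the interval lies outside [0, size - 1].
bool mayOverrun(Range index, std::size_t size) {
    return index.lo < 0 || index.hi >= static_cast<long>(size);
}
```

For example, an index known only to satisfy `0 <= idx && idx <= 3` narrows to [0, 3], and `mayOverrun` reports a possible overrun for an array of 3 elements.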
19. The values of variables: the size of the array,
indices
Handle<YieldTermStructure> md0Yts() {
double q6mh[] = {
0.0001,0.0001,0.0001,0.0003,0.00055,0.0009,0.0014,0.0019,
0.0025,0.0031,0.00325,0.00313,0.0031,0.00307,0.00309,
........................................................
0.02336,0.02407,0.0245 }; // 60 elements
....
for(int i=0;i<10+18+37;i++) { // i < 65
q6m.push_back(
boost::shared_ptr<Quote>(new SimpleQuote(q6mh[i])));
The QuantLib project
V557 Array overrun is possible. The value of 'i' index could reach 64.
markovfunctional.cpp 176
20. The values of variables: using conditions to
determine the range
std::string rangeTypeLabel(int idx)
{
const char* rangeTypeLabels[] = {"Self", "Touch", "Target"};
if (idx >= 0 && idx <= 3)
return rangeTypeLabels[idx];
else
return "Invalid";
}
V557 Array overrun is possible. The value of 'idx' index could reach 3.
esmtool labels.cpp 502
The OpenMW project
21. The values of functions
static inline size_t UnboxedTypeSize(JSValueType type)
{
switch (type) {
.......
default: return 0;
}
}
MInstruction *loadUnboxedProperty(size_t offset, ....)
{
size_t index = offset / UnboxedTypeSize(unboxedType);
The Thunderbird project
V609 Divide by zero. Denominator range [0..8]. ionbuilder.cpp 10922
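One way to make such code safe is to check the returned size before dividing. The sketch below is a reduced illustration, not the Thunderbird code: the `ValueType` enum, the sizes and the function names are assumptions chosen so that the return values fall in the range [0..8].

```cpp
#include <cstddef>

// Hypothetical reduced version of the type enumeration.
enum class ValueType { Boolean, Int32, Double, Unknown };

// Returns 0 for unknown types, so the result range is [0..8].
std::size_t typeSize(ValueType t) {
    switch (t) {
        case ValueType::Boolean: return 1;
        case ValueType::Int32:   return 4;
        case ValueType::Double:  return 8;
        default:                 return 0;
    }
}

// Writes offset / size into `index` and returns true,
// or returns false without dividing when the size is 0.
bool indexFor(std::size_t offset, ValueType t, std::size_t& index) {
    const std::size_t size = typeSize(t);
    if (size == 0) return false;
    index = offset / size;
    return true;
}
```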
22. The values of variables: pointers/references
if (providerName == null)
{
ProviderNotFoundException e =
new ProviderNotFoundException(
providerName.ToString(),
SessionStateCategory.CmdletProvider,
"ProviderNotFound",
SessionStateStrings.ProviderNotFound);
throw e;
V3080 Possible null dereference. Consider inspecting 'providerName'.
System.Management.Automation SessionStateProviderAPIs.cs 1004
The PowerShell Project
23. Method annotations
• Method annotations provide more information about the methods in use
than can be obtained by analyzing their signatures alone.
• C/C++. So far we have annotated 6570 functions (standard
C and C++ libraries, POSIX, MFC, Qt, ZLib and so on).
• C#. So far we have annotated 920 functions.
24. An example of annotating the memcmp
function
C_"int memcmp(const void *buf1, const void *buf2, size_t count);"
ADD(REENTERABLE | RET_USE | F_MEMCMP | STRCMP | HARD_TEST | INT_STATUS,
nullptr, nullptr, "memcmp", POINTER_1, POINTER_2, BYTE_COUNT);
• C_ - an auxiliary control mechanism for annotations (unit tests)
• REENTERABLE - repeated calls with the same arguments give the same result
• RET_USE - the result should be used
• F_MEMCMP - enables certain checks for buffer overruns
• STRCMP - the function returns 0 in case of equality
• HARD_TEST - a special function. Some programmers define their own versions in
their own namespaces, so the namespace is ignored.
• INT_STATUS - the result must not be explicitly compared with 1 or -1.
• POINTER_1, POINTER_2 - the pointers must be non-null and different.
• BYTE_COUNT - this parameter specifies the number of bytes and must be > 0.
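Conceptually, an annotation is a record of flags attached to a function name that diagnostics can query. The sketch below shows one possible shape of such a table. It is an assumption for illustration only: the `Flag` enum, the `Annotation` struct and the lookup helpers are hypothetical and do not reproduce the real `ADD(...)` mechanism shown above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical subset of flags, named after the ones listed above.
enum Flag : std::uint32_t {
    RET_USE    = 1u << 0,  // the result should be used
    INT_STATUS = 1u << 1,  // the result is a status value (see INT_STATUS above)
};

// Hypothetical annotation record: only a bit mask of flags.
struct Annotation {
    std::uint32_t flags;
};

// A tiny annotation table keyed by function name.
const std::map<std::string, Annotation>& annotations() {
    static const std::map<std::string, Annotation> table = {
        {"memcmp", {RET_USE | INT_STATUS}},
    };
    return table;
}

// A diagnostic asks whether a called function carries a given flag.
bool hasFlag(const std::string& function, Flag f) {
    const auto it = annotations().find(function);
    return it != annotations().end() && (it->second.flags & f) != 0;
}
```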
25. Annotation of memcmp: checking the result
bool operator()(const GUID& _Key1, const GUID& _Key2) const
{
return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1;
}
The CoreCLR project
V698 Expression 'memcmp(....) == -1' is incorrect. This function can
return not only the value '-1', but any negative value. Consider using
'memcmp(....) < 0' instead. sos util.cpp 142
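Following the suggestion in the warning, the comparison can be written with `< 0`. The sketch uses a hypothetical `Key` struct as a stand-in for GUID so that it compiles on any platform.

```cpp
#include <cstring>

// Hypothetical stand-in for GUID: 16 raw bytes.
struct Key {
    unsigned char bytes[16];
};

// memcmp may return any negative value, so test the sign, not -1.
bool lessThan(const Key& a, const Key& b) {
    return std::memcmp(&a, &b, sizeof(Key)) < 0;
}
```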
26. Annotation of memcmp: storing the result
The Firebird project
V642 Saving the 'memcmp' function result inside the 'short' type variable is
inappropriate. The significant bits could be lost breaking the program's logic.
texttype.cpp 3
SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
ULONG len2, const UCHAR* str2)
{
....
SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));
if (cmp == 0)
cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
return cmp;
}
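A possible fix is to keep the `memcmp` result in an `int` and reduce it to -1, 0 or 1 before it is narrowed. This is a sketch with plain standard types instead of the Firebird typedefs; the function name `compareBytes` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Compares two byte strings; returns -1, 0 or 1.
short compareBytes(const unsigned char* s1, std::size_t len1,
                   const unsigned char* s2, std::size_t len2) {
    // Keep the full int result so no significant bits are lost.
    const int cmp = std::memcmp(s1, s2, std::min(len1, len2));
    if (cmp != 0) return cmp < 0 ? -1 : 1;
    // Common prefix is equal: the shorter string sorts first.
    return len1 < len2 ? -1 : (len1 > len2 ? 1 : 0);
}
```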
27. Annotation of memcmp: wrong argument
The GLG3D project
V575 The 'memcmp' function processes '0' elements. Inspect
the 'third' argument. graphics3D matrix4.cpp 269
bool Matrix4::operator==(const Matrix4& other) const {
if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
return true;
}
...
}
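The intended check most likely compares the `memcmp` result with 0, with the closing parenthesis of the call placed before `== 0`. The sketch below uses a hypothetical `Mat4` struct as a stand-in for Matrix4. Note that a bytewise comparison of floats is an assumption carried over from the original code: it treats NaN and signed zero differently from `operator==` on floats.

```cpp
#include <cstring>

// Hypothetical stand-in for Matrix4: 16 floats, no padding assumed.
struct Mat4 {
    float m[16];

    bool operator==(const Mat4& other) const {
        // The size is passed to memcmp; the result is compared with 0.
        return std::memcmp(this, &other, sizeof(Mat4)) == 0;
    }
};
```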
28. Annotation of memcmp: different arguments
static int
psymbol_compare (const void *addr1, const void *addr2,
                 int length)
{
  struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
  struct partial_symbol *sym2 = (struct partial_symbol *) addr2;
  return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,
                  sizeof (sym1->ginfo.value)) == 0
          && .......
The GDB Project
V549 The first argument of 'memcmp' function is equal to the second
argument. psymtab.c 1580
30. Annotation of memcmp: no status
The PHP project
V501 There are identical sub-expressions '!memcmp("auto", charset_hint,
4)' to the left and to the right of the '||' operator. html.c 396
if ((len == 4) /* sizeof (none|auto|pass) */ &&
(!memcmp("pass", charset_hint, 4) ||
!memcmp("auto", charset_hint, 4) ||
!memcmp("auto", charset_hint, 4)))
31. Annotation of custom functions
• Almost no support (except certain elements, as for example our own
printf function)
• There is no point in developing this mechanism
• No one will spend months doing the markup of large projects
• The analyzer must work immediately
32. Testing the analyzer
• Testing the analyzer is the most important part of the development
process
• The hardest part of static analysis: not complaining when there is no error
• A large test base:
• C++ Windows (Visual C++): 120 projects
• C++ Linux (GCC): 34 more projects
• C# Windows: 54 projects
33. We can send a more detailed version of the
presentation
• Write to us: support@viva64.com
• Follow on Twitter: @Code_Analysis
• Download PVS-Studio for Windows:
http://www.viva64.com/en/pvs-studio/
• Download PVS-Studio for Linux:
http://www.viva64.com/en/pvs-studio-download-linux/