The operation principles of PVS-Studio static code analyzer
Authors:
Candidate of Engineering Sciences
Evgeniy Ryzhkov, evg@viva64.com
Candidate of Physico-Mathematical Sciences
Andrey Karpov, karpov@viva64.com
OOO "Program Verification Systems"
(www.viva64.com)
• Development, marketing and sales of a software product.
• Office: Tula, 200 km away from Moscow.
• Staff: 24 people.
PVS-Studio
• More than 320 diagnostics for C, C++
• More than 120 diagnostics for C#
• Windows
• Linux
• Plugin for Visual Studio
• Quick Start (compilation monitoring)
• SonarQube
Our achievements
• To let the world know about our product, we check open-source projects. So far
we have checked about 270 projects.
• A “side” effect: we found more than 10 000 bugs in open source projects, without
setting it as a goal.
• On average that is about 40 errors per project, which is not that many.
• It is important to emphasize once more that this was a “side” effect. We don’t have a
goal to find as many errors as possible. Quite often, we stop when we find enough
errors for an article.
• Conclusion: it’s rather easy to check even unfamiliar projects and find errors in them.
In the beginning: what we DO NOT USE
We do not use formal grammar for analysis
• The analyzer works on a higher level
• We analyze the derivation tree
• To build the tree we rely on existing components:
• External preprocessor
• OpenC++ library, which we have been improving as C++ evolved (actually
there is almost nothing left of OpenC++)
• When working with C# code we take Roslyn as the basis
We do not use program proof methods.
• PVS-Studio has nothing to do with the Prototype Verification System
(PVS) http://pvs.csl.sri.com/
• PVS-Studio is a contraction of "Program Verification Systems" (OOO
"Program Verification Systems")
We do not use substring search (string matching)
and regular expressions
• A dead end
• It is of no use even in the simplest situations
• Example: if (A+B == A+B)
• A+B == B+A
• A+(B) == (A)+B
• ((A+B)) == A+B
• Worse still: text matching knows nothing about types, object sizes, inheritance, variable values and so on.
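The difference between text matching and tree matching can be sketched in a few lines. This is a minimal illustration, not the analyzer's code: `Expr`, `var`, `add` and `equivalent` are hypothetical names, the tree only knows variables and `+`, and treating `+` as commutative is itself an assumption that holds for integers but would need type information in general.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified expression node: either a variable name or
// a binary '+' with two operands. Parentheses never appear in the tree.
struct Expr {
    std::string name;                // non-empty for a variable
    std::shared_ptr<Expr> lhs, rhs;  // both set for a '+' node
};

using ExprPtr = std::shared_ptr<Expr>;

// Build a leaf node for a variable.
ExprPtr var(const std::string& n) {
    return std::make_shared<Expr>(Expr{n, nullptr, nullptr});
}

// Build a '+' node from two operands.
ExprPtr add(ExprPtr a, ExprPtr b) {
    return std::make_shared<Expr>(Expr{"", std::move(a), std::move(b)});
}

// Structural comparison that also treats '+' as commutative.
bool equivalent(const ExprPtr& a, const ExprPtr& b) {
    const bool aLeaf = !a->lhs;
    const bool bLeaf = !b->lhs;
    if (aLeaf || bLeaf)
        return aLeaf && bLeaf && a->name == b->name;
    return (equivalent(a->lhs, b->lhs) && equivalent(a->rhs, b->rhs)) ||
           (equivalent(a->lhs, b->rhs) && equivalent(a->rhs, b->lhs));
}
```

With such a tree, `A+B`, `B+A`, `A+(B)` and `((A+B))` all compare as equivalent, because parentheses disappear during parsing and operand order is handled by the comparison itself.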
What we USE
The details of C++ and C# analysis differ; we are not going to cover them
here
Pattern-based analysis
• Pattern matching based on the derivation tree
• It is used to search for fragments in the source code that are similar
to the known code patterns with an error
• The complexity of the diagnostics varies greatly
• In some cases these are empirical algorithms
if ((*path)[0]->e->dest->loop_father != path->last()->e->....)
{
delete_jump_thread_path (path);
e->aux = NULL;
ei_next (&ei);
}
else
{
delete_jump_thread_path (path);
e->aux = NULL;
ei_next (&ei);
}
A simple case: copy-paste
The GCC Project
V523 The 'then' statement is equivalent to the 'else' statement.
tree-ssa-threadupdate.c 2596
A more complicated case: checking the wrong
variable
public override Predicate JoinWith(Predicate other)
{
var right = other as PredicateNullness;
if (other != null)
{
if (this.value == right.value)
{
The CodeContracts Project
V3019 Possibly an incorrect variable is compared to null after type conversion
using 'as' keyword. Check variables 'other', 'right'. CallerInvariant.cs 189
Quite a complicated case: a badly written
macro
#define ICB2400_VPINFO_PORT_OFF(chan) \
(ICB2400_VPINFO_OFF + \
sizeof (isp_icb_2400_vpinfo_t) + \
(chan * ICB2400_VPOPT_WRITE_SIZE))
off += ICB2400_VPINFO_PORT_OFF(chan - 1);
V733 It is possible that macro expansion resulted in incorrect evaluation
order. Check expression: chan - 1 * 20. isp.c 2301
The FreeBSD Project
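The evaluation-order problem can be reproduced with a reduced macro. This is a sketch under stated assumptions: `VPINFO_OFF` and `WRITE_SIZE` are placeholder constants (the value 20 is taken from the warning text, the value 100 is made up), the `sizeof` term is dropped for brevity, and `bad_offset` / `good_offset` are hypothetical helper names.

```cpp
// Placeholder constants for illustration only, not the FreeBSD definitions.
constexpr int VPINFO_OFF = 100;
constexpr int WRITE_SIZE = 20;

// Same shape as the macro on the slide: the parameter is not parenthesized.
#define PORT_OFF_BAD(chan)  (VPINFO_OFF + (chan * WRITE_SIZE))

// Corrected variant: the parameter is wrapped in parentheses.
#define PORT_OFF_GOOD(chan) (VPINFO_OFF + ((chan) * WRITE_SIZE))

// Expands to VPINFO_OFF + (chan - 1 * WRITE_SIZE).
int bad_offset(int chan) { return PORT_OFF_BAD(chan - 1); }

// Expands to VPINFO_OFF + ((chan - 1) * WRITE_SIZE).
int good_offset(int chan) { return PORT_OFF_GOOD(chan - 1); }
```

For `chan == 3` the unparenthesized macro computes `100 + (3 - 20)`, while the corrected one computes `100 + 2 * 20`. A purely textual check cannot see this; the analyzer has to look at the expression tree after macro expansion.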
Type inference
• Type inference based on the semantic model of the program
gives the analyzer full information about all variables and
statements in the code.
• Types are important for detecting errors
• Types are important for exceptions (cases where a diagnostic must stay silent)
• Information about classes is especially important
Types are also important for bug detection
The Cocos2d-x project
WCHAR *gai_strerrorW(int ecode);
#define gai_strerror gai_strerrorW
fprintf(stderr, "net_listen error for %s: %s",
serv, gai_strerror(n));
V576 Incorrect format. Consider checking the fourth actual argument of
the 'fprintf' function. The pointer to string of char type symbols is
expected. ccconsole.cpp 341
Types are important for exceptions
// a volatile variable is assigned to itself
volatile int *ptr;
....
*ptr = *ptr; // No V570 warning here
The information about classes is especially
important: inheritance hierarchy, for instance
class sg_throwable : public std::exception { .... };
class sg_exception : public sg_throwable { .... };
if (!aInstall) {
sg_exception("missing argument to scheduleToUpdate");
}
V596 The object was created but it is not being used. The 'throw' keyword
could be missing: throw sg_exception(FOO); root.cxx 239
The FlightGear project
Symbolic execution
• Symbolic execution lets the analyzer evaluate the variable values that can lead
to errors and perform range checking of values.
• It is one of the most important mechanisms. It helps to find:
• Overflows
• Memory Leaks
• Array index out of bounds
• Null pointers/references
• Meaningless conditions
• Division by zero
• and so on…
The values of variables: the size of the array,
indices
Handle<YieldTermStructure> md0Yts() {
double q6mh[] = {
0.0001,0.0001,0.0001,0.0003,0.00055,0.0009,0.0014,0.0019,
0.0025,0.0031,0.00325,0.00313,0.0031,0.00307,0.00309,
........................................................
0.02336,0.02407,0.0245 }; // 60 elements
....
for(int i=0;i<10+18+37;i++) { // i < 65
q6m.push_back(
boost::shared_ptr<Quote>(new SimpleQuote(q6mh[i])));
The QuantLib project
V557 Array overrun is possible. The value of 'i' index could reach 64.
markovfunctional.cpp 176
The values of variables: using conditions to
determine the range
std::string rangeTypeLabel(int idx)
{
const char* rangeTypeLabels[] = {"Self", "Touch", "Target"};
if (idx >= 0 && idx <= 3)
return rangeTypeLabels[idx];
else
return "Invalid";
}
V557 Array overrun is possible. The value of 'idx' index could reach 3.
esmtool labels.cpp 502
The OpenMW project
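The range reasoning behind this kind of warning can be sketched with a simple interval type. This is a toy model, not the analyzer's implementation: `Range`, `assumeGreaterEqual`, `assumeLessEqual` and `mayOverrun` are hypothetical names, and a real analyzer tracks far more than a single closed interval.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical closed interval [lo, hi] of values an 'int' variable may hold.
struct Range {
    long long lo = std::numeric_limits<int>::min();
    long long hi = std::numeric_limits<int>::max();
};

// Narrow the range under the assumption that 'x >= bound' holds.
Range assumeGreaterEqual(Range r, long long bound) {
    r.lo = std::max(r.lo, bound);
    return r;
}

// Narrow the range under the assumption that 'x <= bound' holds.
Range assumeLessEqual(Range r, long long bound) {
    r.hi = std::min(r.hi, bound);
    return r;
}

// True if some value in the range is not a valid index
// for an array of 'size' elements.
bool mayOverrun(const Range& r, std::size_t size) {
    return r.lo < 0 || r.hi >= static_cast<long long>(size);
}
```

Inside the `then` branch of `if (idx >= 0 && idx <= 3)` the range of `idx` is narrowed to [0, 3]; checking it against an array of 3 elements reports a possible overrun, whereas the condition `idx <= 2` would not.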
The values of functions
static inline size_t UnboxedTypeSize(JSValueType type)
{
switch (type) {
.......
default: return 0;
}
}
MInstruction *loadUnboxedProperty(size_t offset, ....)
{
size_t index = offset / UnboxedTypeSize(unboxedType);
The Thunderbird project
V609 Divide by zero. Denominator range [0..8]. ionbuilder.cpp 10922
The values of variables: pointers/references
if (providerName == null)
{
ProviderNotFoundException e =
new ProviderNotFoundException(
providerName.ToString(),
SessionStateCategory.CmdletProvider,
"ProviderNotFound",
SessionStateStrings.ProviderNotFound);
throw e;
V3080 Possible null dereference. Consider inspecting 'providerName'.
System.Management.Automation SessionStateProviderAPIs.cs 1004
The PowerShell Project
Method annotations
• Method annotations provide more information about the methods
in use than can be obtained by analyzing only their signatures.
• C/C++. So far we have annotated 6570 functions (standard
C and C++ libraries, POSIX, MFC, Qt, ZLib and so on).
• C#. So far we have annotated 920 functions.
An example of annotating the memcmp
function
C_"int memcmp(const void *buf1, const void *buf2, size_t count);"
ADD(REENTERABLE | RET_USE | F_MEMCMP | STRCMP | HARD_TEST | INT_STATUS,
nullptr, nullptr, "memcmp", POINTER_1, POINTER_2, BYTE_COUNT);
• C_ - an auxiliary control mechanism of annotations (unit tests)
• REENTERABLE - a repeated call with the same arguments gives the same result
• RET_USE - the result should be used
• F_MEMCMP - launch of certain checks for buffer out of bounds
• STRCMP - the function returns 0 in case of equality
• HARD_TEST - a special function. Some programmers define their own functions in
their own namespace. Ignore namespace.
• INT_STATUS - the result must not be explicitly compared with 1 or -1.
• POINTER_1, POINTER_2 - the pointers must be non-null and different.
• BYTE_COUNT - this parameter specifies the number of bytes and must be > 0.
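How a diagnostic might consult such annotations can be sketched as a lookup table of bit flags. Everything here is hypothetical: the flag names echo the slide, but the values, the `Annotation` struct, and the `annotations` and `hasFlag` functions are invented for illustration and say nothing about how PVS-Studio actually stores this data.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical flag set; names echo the slide, values are arbitrary.
enum Flag : std::uint32_t {
    RET_USE     = 1u << 0,  // the result should be used
    INT_STATUS  = 1u << 1,  // the result must not be compared with 1 or -1
    REENTERABLE = 1u << 2,  // same arguments give the same result
};

// Hypothetical per-function record.
struct Annotation {
    std::uint32_t flags;
};

// A tiny annotation table keyed by function name.
const std::unordered_map<std::string, Annotation>& annotations() {
    static const std::unordered_map<std::string, Annotation> table = {
        {"memcmp", Annotation{RET_USE | INT_STATUS | REENTERABLE}},
    };
    return table;
}

// True if the named function is annotated with the given flag.
bool hasFlag(const std::string& function, Flag flag) {
    const auto it = annotations().find(function);
    return it != annotations().end() && (it->second.flags & flag) != 0;
}
```

A diagnostic that sees `memcmp(....) == -1` could then ask `hasFlag("memcmp", INT_STATUS)` before issuing a warning, without knowing anything else about the function.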
Annotation of memcmp: checking the result
bool operator()(const GUID& _Key1, const GUID& _Key2) const
{
return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1;
}
The CoreCLR project
V698 Expression 'memcmp(....) == -1' is incorrect. This function can
return not only the value '-1', but any negative value. Consider using
'memcmp(....) < 0' instead. sos util.cpp 142
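A comparator that follows the advice in the warning might look like the sketch below. `Key` is a hypothetical stand-in for `GUID` and `lessThan` is an illustrative name; the point is only that the result of `memcmp` is tested with `< 0`, because the C standard guarantees the sign of the result and not its exact value.

```cpp
#include <cstring>

// Hypothetical 16-byte key, standing in for GUID.
struct Key {
    unsigned char bytes[16];
};

// Orders keys by their raw bytes. Only the sign of the memcmp result
// is meaningful, so it is compared with 0 rather than with -1.
bool lessThan(const Key& a, const Key& b) {
    return std::memcmp(&a, &b, sizeof(Key)) < 0;
}
```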
Annotation of memcmp: storing the result
The Firebird project
V642 Saving the 'memcmp' function result inside the 'short' type variable is
inappropriate. The significant bits could be lost breaking the program's logic.
texttype.cpp 3
SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
ULONG len2, const UCHAR* str2)
{
....
SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));
if (cmp == 0)
cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
return cmp;
}
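One way to avoid the truncation is to reduce the `int` result to its sign before narrowing it. This is a sketch, not the Firebird fix: `sign` and `compareBytes` are hypothetical names. The risk it avoids is that an `int` such as 0x10000 becomes 0 when converted to a 16-bit `short`, so unequal buffers could be reported as equal on an implementation whose `memcmp` returns such values.

```cpp
#include <cstddef>
#include <cstring>

// Collapse any int to -1, 0 or 1, which always fits in a short.
short sign(int value) {
    return static_cast<short>((value > 0) - (value < 0));
}

// Compare two byte buffers and return only the sign of the result.
short compareBytes(const unsigned char* a, const unsigned char* b,
                   std::size_t n) {
    return sign(std::memcmp(a, b, n));
}
```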
Annotation of memcmp: wrong argument
The GLG3D project
V575 The 'memcmp' function processes '0' elements. Inspect
the 'third' argument. graphics3D matrix4.cpp 269
bool Matrix4::operator==(const Matrix4& other) const {
if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
return true;
}
...
}
static int
psymbol_compare (const void *addr1, const void *addr2,
int length)
{
struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
struct partial_symbol *sym2 = (struct partial_symbol *) addr2;
return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,
sizeof (sym1->ginfo.value)) == 0
&& .......
Annotation of memcmp: different arguments
The GDB Project
V549 The first argument of 'memcmp' function is equal to the second
argument. psymtab.c 1580
dst_s_read_private_key_file(....)
{
....
if (memcmp(in_buff, "Private-key-format: v", 20) != 0) // the literal is 21 characters long
goto fail;
....
}
Annotation of memcmp: buffer underrun
The Haiku project
V512 A call of the 'memcmp' function will lead to underflow of the
buffer '"Private-key-format: v"'. dst_api.c 858
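Deriving the length from the literal itself removes the hand-counted constant. The sketch below is an assumption about how one could write such a check, not the code from the project; `kPrefix` and `hasPrefix` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// The expected prefix; sizeof includes the terminating '\0'.
constexpr char kPrefix[] = "Private-key-format: v";

// True if the buffer of 'len' bytes starts with the whole prefix.
bool hasPrefix(const char* buf, std::size_t len) {
    constexpr std::size_t n = sizeof(kPrefix) - 1;  // length without '\0'
    return len >= n && std::memcmp(buf, kPrefix, n) == 0;
}
```

With the length computed by the compiler, the comparison covers all 21 characters, including the final `v` that the hard-coded 20 leaves out.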
Annotation of memcmp: no status
The PHP project
V501 There are identical sub-expressions '!memcmp("auto", charset_hint,
4)' to the left and to the right of the '||' operator. html.c 396
if ((len == 4) /* sizeof (none|auto|pass) */ &&
(!memcmp("pass", charset_hint, 4) ||
!memcmp("auto", charset_hint, 4) ||
!memcmp("auto", charset_hint, 4)))
Annotation of custom functions
• Almost no support (except for certain elements, for example marking your own
printf-like function)
• There is little point in developing this mechanism
• No one will spend months annotating a large project
• The analyzer must work immediately
Testing the analyzer
• Testing the analyzer is the most important part of the development
process
• The hardest part of static analysis is not complaining when there is no error
• A large test base:
• C++ Windows (Visual C++): 120 projects
• C++ Linux (GCC): 34 more projects
• C# Windows: 54 projects
We can send a more detailed version of the
presentation
• Write to us: support@viva64.com
• Follow on Twitter: @Code_Analysis
• Download PVS-Studio for Windows:
http://www.viva64.com/en/pvs-studio/
• Download PVS-Studio for Linux:
http://www.viva64.com/en/pvs-studio-download-linux/
