How to add a new diagnostic rule into PVS-Studio?


Days from developers' life...

Author: Evgeniy Ryzhkov
Date: 14.09.2011

We are often asked how one can add one's own diagnostic rule to our static analyzer PVS-Studio. We always answer that it is very simple: "You just need to write us a letter with your request and we will add this rule into the analyzer". This interface for adding new rules is convenient for users; in fact, it is the best and most convenient one. Doing it on your own is not as easy as it may seem. In this post, I will show you the underwater part of the iceberg hidden behind the words "we have added this simple rule".

OK, let's be honest. PVS-Studio doesn't have a mechanism that lets users write their own rules. This is, first of all, an architectural restriction. Some users may be upset by it. But such users don't really understand what they are asking for, because adding a new rule is rather complicated in practice.

For instance, there is rule V536: "Be advised that the utilized constant value is represented by an octal form". It is intended for detecting errors like the following one found in Miranda IM:

    static const struct _tag_cpltbl
    {
      unsigned cp;
      const char* mimecp;
    } cptbl[] =
    {
      { 037, "IBM037" },    // IBM EBCDIC US-Canada
      { 437, "IBM437" },    // OEM United States
      { 500, "IBM500" },    // IBM EBCDIC International
      { 708, "ASMO-708" },  // Arabic (ASMO 708)
      ...
    };

The value 037 is written here instead of 37 just for the sake of alignment, although it actually equals 31 in decimal, because 037 is an octal number according to the C++ rules. The error looks rather simple to diagnose, doesn't it? Indeed: find a constant, check whether it begins with 0 and generate a diagnostic message. This diagnostic rule can be implemented in an hour. And a user of a static analyzer (a skilled programmer, but a novice in developing such tools) is theoretically ready to write this rule himself/herself.

But he/she will fail. Or rather, he/she will succeed, but the result won't be satisfying because of too many false reports.

We (PVS-Studio's developers) do it in the following way. Before starting to implement a rule, we first write unit-tests for it, where we put special marks on the lines that should trigger the rule and leave unmarked the lines that shouldn't. Then we write the first version of the diagnostic code. Once the new rule passes all the unit-tests written specifically for it, we run the full set of unit-tests to make sure that the new rule hasn't broken anything. The full unit-tests take several tens of minutes. Of course, you rarely manage "not to break anything" on the first try, so the iterations are repeated again and again until all the unit-tests pass.

The unit-tests are followed by the next step: running the rule on real projects. We have about 70 projects in our test base (as of 2011); these are famous and not so famous open source projects we test our analyzer on. We have a self-made launcher that opens projects in Visual Studio, launches the analyzer, saves the log, compares it to the master copy and shows the differences (messages that disappeared, messages that were added and messages that changed).
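The comparison step performed by the launcher can be illustrated with a minimal sketch. This is not the launcher's real code, which is not shown in this post; the names `Log`, `LogDiff` and `compareLogs`, and the idea of keying each message by a "file:line:rule" string, are assumptions made purely for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical log representation: key is "file:line:rule", value is the message text.
using Log = std::map<std::string, std::string>;

struct LogDiff {
    std::vector<std::string> disappeared;  // in the master copy, missing from the new log
    std::vector<std::string> added;        // in the new log, missing from the master copy
    std::vector<std::string> changed;      // present in both logs, but the text differs
};

// Compare a freshly saved log against the master copy.
LogDiff compareLogs(const Log& master, const Log& current) {
    LogDiff diff;
    for (const auto& entry : master) {
        auto it = current.find(entry.first);
        if (it == current.end()) {
            diff.disappeared.push_back(entry.first);
        } else if (it->second != entry.second) {
            diff.changed.push_back(entry.first);
        }
    }
    for (const auto& entry : current) {
        if (master.find(entry.first) == master.end()) {
            diff.added.push_back(entry.first);
        }
    }
    return diff;
}
```

With a representation like this, a new rule shows up mostly in `added`, while an accidentally broken old rule shows up in `disappeared` or `changed`.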
One cycle of running all the projects for Visual Studio 2005 takes 4 hours on a computer with four cores and 8 Gbytes of memory, with the analysis process parallelized. Judging by the analysis results, we can see how well the new rule is integrated. Usually we start studying the differences without waiting for the analysis to complete, because sometimes it is clear right away that something has gone wrong. But we may stop the tests even if everything works well and nothing is broken. It happens when there are too many false reports. If we see that a diagnostic rule is triggered too often, we add exceptions to the rule.

Let's return to the search for incorrect octal numbers, the V536 rule. It has quite a number of exceptions. These are the situations when the warning is not generated:

  1. The constant's value is below 8.
  2. The number is defined through #define.
  3. There are more than two octal numbers in one block (the rule was triggered more than twice), except for 0 and char type.
  4. The number is used as an argument in functions: _open, mknod, open, wxMkdir, wxFileName::Mkdir, wxFileName::AppendDir, chmod.
  5. This is a nested block of numbers. We check only the top one. For example: { {090}, {091}, {092}, {093} }.
  6. The number is <= 0777 and is part of a statement that contains the words: file, File.
  7. The number is an argument of a function one parameter of which contains the string "%o".

Sometimes you cannot predict what exceptions there will be before running the rule on real projects. That's why the iteration "adding an exception - running tests - analyzing results" may be repeated many times.

Then, when all the exceptions are implemented and the number of false reports is tolerable, the test results (saved log-files of detected errors) are defined as the master copy. After that we run the same tests for the other versions of Visual Studio. At present (in 2011), we support three versions: Visual Studio 2005/2008/2010. Tests in them run for 4/4.5/5 hours respectively, which makes the total time of 14-15 hours. It may turn out that one and the same code causes a different number of diagnostic messages depending on the Visual Studio version. Usually this is caused by differences in the header files of different SDKs. We try to eliminate the differences so that the tests give identical results in all the versions of Visual Studio.

Why is it so important to run all those tiresome tests? Is it only because of false reports? No, not only because of them. The point is that you can't take account of all the possible types of constructs while writing a rule, so that it works correctly and, moreover, doesn't cause the analyzer to crash. This is what we need all those numerous tests for!

What strange constructs do we mean?
Here are a couple of examples.

Do you know that you may define a variable, for instance, like this:

    int const unsigned static a22 = 0;

Do you know that __int64 can be a variable's name?

    int __identifier(__int64);

Do you know that braces are not necessary after switch?

    switch (0) if (X) Foo();

Even if you do not use such constructs in your own code, there can be some in the libraries you use.

Finally, having passed all the tests, we may start working on documentation. For each diagnostic rule we write a help entry in Russian and English. Translation into English also takes some time. Now that the code is debugged, the tests are passed and the documentation is written, the rule appears in the next release of PVS-Studio.

So, the development cycle of a new diagnostic rule can be briefly represented as follows:

  1. An idea of a new rule: either we invent it ourselves, or our users tell us, or we manage to spot it somewhere.
  2. Formulating the rule.
  3. Implementation of the rule.
  4. Testing of every kind.
  5. Analyzing the results. Variants are possible:
     a. Improving the rule's formulation, adding exceptions, going to step 2.
     b. The rule's implementation is considered established; we may write and translate the documentation.

The total time of implementing one rule is several days, usually from 2 to 5. Naturally, we have enough experience now and can predict what exceptions a rule may need and implement them right away. We also manage to reduce the number of test runs. But anyway, the key idea is that only by running tests can you formulate the exceptions and eliminate most of the false reports.

Now let's return to the question of how users can add rules themselves. As you can see, there are two reasons why it is difficult: users have neither enough experience to work out good exceptions nor a good test base to run the rule on.

A question arises: "Then why do some famous and respectable static analyzers provide means of creating user-defined rules?" Perhaps we simply can't do it? Partly, yes. But the mechanism of creating user-defined rules serves only one purpose: composing "rival-comparison matrices".

Well, users will ask, won't we be able to create our own rules in PVS-Studio some day? Some day we will implement this mechanism so as not to lose in comparison matrices :-). But for now, everyone who wants to create a new rule will receive a link to this post with the words: "We have the best interface for adding user rules - just write us a letter!"
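For readers who want to see how quickly the "one hour" version of V536 grows, here is a minimal sketch of the naive check plus three of the exceptions listed earlier (value below 8, #define, permission-style function calls). It is only an illustration under stated assumptions: it works on the literal's spelling instead of a parsed syntax tree, it ignores literal suffixes such as u or L, and the names `LiteralContext`, `looksOctal` and `shouldWarn` are hypothetical and have nothing to do with PVS-Studio's actual internals.

```cpp
#include <set>
#include <string>

// Hypothetical description of one numeric literal found in the code.
struct LiteralContext {
    std::string text;           // spelling of the literal, e.g. "037"
    std::string enclosingCall;  // function it is an argument of, or empty
    bool fromDefine;            // true if the literal comes from a #define
};

// The naive "one hour" check: a leading 0 followed only by octal digits.
bool looksOctal(const std::string& text) {
    if (text.size() < 2 || text[0] != '0') return false;
    for (std::string::size_type i = 1; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '7') return false;  // rejects "0x1F", "0.5"
    }
    return true;
}

// The naive check with some of the exceptions from the list applied.
bool shouldWarn(const LiteralContext& lit) {
    static const std::set<std::string> permissionCalls = {
        "_open", "mknod", "open", "wxMkdir",
        "wxFileName::Mkdir", "wxFileName::AppendDir", "chmod"};

    if (!looksOctal(lit.text)) return false;
    if (std::stoul(lit.text, nullptr, 8) < 8) return false;      // exception 1
    if (lit.fromDefine) return false;                            // exception 2
    if (permissionCalls.count(lit.enclosingCall)) return false;  // exception 4
    return true;
}
```

Even this toy version needs context (the enclosing call, the macro origin) that a plain "search for a constant beginning with 0" does not have, and the remaining exceptions require still more of it.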