Difficulties of comparing code analyzers, or don't forget about usability




Authors: Evgeniy Ryzhkov, Andrey Karpov
Date: 31.03.2011

Abstract

Users' desire to compare different code analyzers is natural and understandable. However, it is not as easy to fulfill this desire as it may seem at first sight. The point is that you don't know which particular factors must be compared.

Introduction

If we eliminate such rather ridiculous ideas as "we should compare the number of diagnosable errors" or "we should compare the number of tool-generated messages", then even the reasonable parameter "signal-to-noise ratio" doesn't seem to be an ideal criterion for estimating code analyzers. Do you doubt that it is unreasonable to compare the mentioned parameters? Here are some examples.

What parameters are just unreasonable to compare

Let's take a simple (at first sight) characteristic like the number of diagnostics. It seems that the more diagnostics, the better. But the total number of rules doesn't matter to the end user who works with a particular set of operating systems and compilers. Diagnostic rules relevant to systems, libraries and compilers he doesn't use won't give him anything useful. They even disturb him by overloading the settings system and the documentation, and they complicate use and integration of the tool.

Here is an analogy: say, a man comes into a store to buy a heater. He is interested in the domestic appliances department, and it's good if this department has a wide range of goods. But the customer doesn't need the other departments. It's fine if he can buy an inflatable boat, a cell phone or a chair in this store. But the inflatable boats department doesn't enlarge the range of heaters anyway.

Take, for instance, the Klocwork tool that supports a lot of various systems, including exotic ones.
One of them has a compiler that easily "swallows" this code:

inline int x;

The Klocwork analyzer has a special diagnostic message to detect this anomaly in code: "The inline keyword is applied to something other than a function or method". Well, it seems good to have such a diagnostic. But developers using the Microsoft Visual C++ compiler or any other adequate compiler won't benefit from this diagnostic in any way. Visual C++ simply doesn't compile this code: "error C2433: 'x': 'inline' not permitted on data declarations".

Another example. Some compilers provide poor support for the bool type. So Klocwork may warn you when a class member is given the bool type: "PORTING.STRUCT.BOOL: This checker detects situations in which a struct/class has a bool member".
"They wrote bool in a class! How awful..." It's clear that only a few developers will benefit from having this diagnostic message.

There are plenty of such examples. So it turns out that the number of diagnostic rules is in no way related to the number of errors an analyzer can detect in a particular project. An analyzer implementing 100 diagnostics and intended for Windows applications can find many more errors in a project built with Microsoft Visual Studio than a cross-platform analyzer implementing 1000 diagnostics.

The conclusion is that the number of diagnostic rules is not relevant when comparing analyzers by usability.

You may say: "OK, let's compare the number of diagnostics relevant for a particular system then. For instance, let's single out all the rules for finding errors in Windows applications". But this approach doesn't work either. There are two reasons for that:

First, it may be that some diagnostic is implemented as one diagnostic rule in one analyzer and as several rules in another. If you compare them by the number of diagnostics, the latter analyzer seems better although they both have the same functionality for detecting a certain type of errors.

Second, implementations of certain diagnostics may be of different quality. For instance, nearly all the analyzers search for "magic numbers". But, say, one analyzer can detect only the magic numbers dangerous from the viewpoint of code migration to 64-bit systems (4, 8, 32, etc.) while another simply detects all the magic numbers (1, 2, 3, etc.). So it won't do to just write a plus mark for each analyzer in the comparison table.

They also like to cite a tool's speed, or the number of code lines processed per second. But this is unreasonable from a practical viewpoint too. There is no relation between the speed of a code analyzer and the speed of the analysis performed by a human! First, code analysis is often launched automatically during night builds. You just must "be in time" for the morning.
And second, they often forget about the usability parameter when comparing analyzers. Well, let's study this issue in detail.

A tool's usability is very important for adequate comparison

The point is that the usability of a tool greatly influences how code analyzers are actually used in practice.

We recently checked the eMule project with two code analyzers, estimating the convenience of the operation in each case. One of the tools was a static analyzer integrated into some Visual Studio editions. The second analyzer was our PVS-Studio. We at once encountered several issues when handling the code analyzer integrated into Visual Studio. And those issues did not relate to the analysis quality itself or its speed.

The first issue is that you cannot save the list of analyzer-generated messages for further examination. For instance, while checking eMule with the integrated analyzer, I got two thousand messages. Nobody can thoroughly investigate them all at once, so you have to examine them over several days. But the impossibility of saving the analysis results forces me to re-analyze the project each time, which is very tiring. PVS-Studio allows you to save analysis results so that you can continue examining them later.

The second issue concerns the way processing of duplicate analyzer messages is implemented. I mean diagnosis of problems in header files (.h files). Say the analyzer has detected an issue in an .h file included into ten .cpp files. While analyzing each of these ten .cpp files, the Visual Studio-integrated analyzer produces the same message about the issue in the .h file ten times! Here is a real sample. The following message was generated more than ten times while checking eMule:

  c:\users\evg\documents\emuleplus\dialogmintraybtn.hpp(450):
  warning C6054: String 'szwThemeColor' might not be zero-terminated:
  Lines: 434, 437, 438, 443, 445, 448, 450

Because of this, the analysis results get messy and you have to review almost the same messages again and again. I should say, PVS-Studio has been filtering duplicate messages instead of showing them to the user since the very beginning.

The third issue is generation of messages about issues in included third-party files (from folders like C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include). The analyzer built into Visual Studio is not ashamed to blame system header files although there is little sense in it. Again, here is an example. We got one and the same message about system files several times while checking eMule:

  1>c:\program files (x86)\microsoft sdks\windows\v7.0a\include\ws2tcpip.h(729):
  warning C6386: Buffer overrun: accessing argument 1,
  the writable size is 1*4 bytes,
  but 4294967272 bytes might be written:
  Lines: 703, 704, 705, 707, 713, 714, 715, 720,
  721, 722, 724, 727, 728, 729

Nobody will ever edit system files. So why "curse" them? PVS-Studio has never done that.

Into the same category we can place the impossibility of telling the analyzer not to check certain files by mask, for instance, all the files "*_generated.cpp" or everything in "c:\libs". You may specify such exception files in PVS-Studio.

The fourth issue relates to the very process of handling the list of analyzer-generated messages. Of course, you may disable any diagnostic messages by code in any code analyzer. But it can be done at different levels of convenience. To be more exact, the question is: does the analysis have to be relaunched to hide unnecessary messages by code or not?
In the Visual Studio-integrated analyzer, you must enter the codes of the messages to be disabled into the project's settings and relaunch the analysis. Sure, you can hardly specify all the "unnecessary" diagnostics at once, so you will have to relaunch the analysis several times. In PVS-Studio, you can easily hide and reveal messages by code without relaunching the analysis, which is much more convenient.

The fifth issue is filtering of messages not only by code but by text as well. For instance, it might be useful to hide all the messages containing "printf". The analyzer integrated into Visual Studio doesn't have this feature while PVS-Studio does.

Finally, the sixth issue is the convenience of marking false alarms for the tool. The #pragma warning disable mechanism employed in Visual Studio lets you hide a message only after relaunching the analysis. The mechanism in PVS-Studio lets you mark messages as "False Alarm" and hide them without relaunching the analysis.

All six of the above-mentioned issues don't relate to code analysis itself, yet they are very important, since the usability of a tool is that very integral index which determines whether it will even come to estimating the analysis quality at all.

Let's see what we've got. The static analyzer integrated into Visual Studio checks the eMule project several times quicker than PVS-Studio. But it took us 3 days to complete the work with Visual Studio's analyzer (actually it was less, but we had to switch to other tasks to have a rest). PVS-Studio took us only 4 hours to complete the work.

Note. As far as the quantity of errors found is concerned, both analyzers showed almost the same results and found the same errors.

Summary

Comparing two static analyzers is a very difficult and complex task. And there is no answer to the question of which tool is the best IN GENERAL. You can only speak of which tool is better for a particular project and user.