1) The author conducted experiments to determine if PVS-Studio could analyze code without being tied to Visual Studio. Experiments showed that properly handling include paths and duplicate file names is difficult without a project file, and preprocessing is essential for quality static analysis.
2) Compilation switches had little impact on analysis, except include paths. Preprocessing provides necessary type and symbol information.
3) Checking all files in a folder led to analyzing unnecessary files and preprocessing errors. Project structure is important for proper static analysis.
R&D on PVS-Studio without Visual Studio
Author: Evgeniy Ryzhkov
Date: 06.11.2012
We have a large list of tasks and wishes we try to stick to while developing PVS-Studio. But occasionally
we find some time to spend on unusual experiments that may open up new directions and
capabilities. If the research results are successful, they may be included in the main product. They can, on
the contrary, prove to be meaningless and useless, in which case we have simply carried out a few
experiments to find out one more thing that doesn't work. It is this kind of experiment we're going to
speak about today.
Introduction
PVS-Studio is at present a plugin for Visual Studio and cannot work without this environment. This is
actually not quite so, but most users are not aware of it. Here is what makes them think that the tool
depends on Visual Studio:
• PVS-Studio employs an external preprocessor for its work. Earlier this used to be the
preprocessor from Visual C++. Now it's Clang in most cases, though we sometimes have to use
Visual C++ as well.
• PVS-Studio uses the project file .vcproj/.vcxproj to get information about the project settings:
for instance, #define/#include being used, compilation switches that may affect the code
analysis process, and so on.
• Besides, PVS-Studio also needs the project file .vcproj/.vcxproj to know which files should be
checked.
However, sometimes both our users and we ourselves would like to use PVS-Studio without being bound to
Visual Studio. In this article, we will tell you about some of our experiments related to this.
What do we already have?
First of all, I will tell you about the features that we made long ago and which are successfully used in
PVS-Studio.
First, PVS-Studio has for some time been able to analyze projects by integrating into build systems (both
classic makefiles and more complex systems). It is described in the documentation, and some of our customers
use this feature actively. In this case, the Visual Studio environment as such is left almost entirely aside.
Second, we have long been able to use Clang, which is integrated into the PVS-Studio distribution pack,
instead of the cl.exe preprocessor. And it is Clang which is switched on by default. This is done
because Clang runs faster as a preprocessor than cl.exe. Besides, Clang is free from certain errors that
can be found in cl.exe. It does have errors of its own, but all of this is quite transparent to users.
What experiments did we carry out?
Here is the list of the questions we wanted to find answers to in our experiments:
1. Do we need the project structure defined in makefile or .vcxproj files for correct static code
analysis? Are the individual compilation parameters of each particular file important to this
task? Can't we just make do with a command like "Check all the files in this folder with the
same build switches"?
2. Do we need to take into account the compilation parameters settings (we mean the compiler
switches) for static analysis?
3. Does static analysis require file preprocessing at all or can errors be well found without it?
To answer these questions, we have written a utility that recursively traverses the specified folder and
runs PVS-Studio.exe on all the files with the source code (*.c, *.cpp, *.cxx, etc.). We wanted to compare
the analysis results thus obtained with the results produced by the traditional project analysis within the
Visual Studio environment.
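The traversal part of such a utility can be sketched roughly as follows. This is only an illustration
in Python, not the utility we actually wrote: the extension set (including .cc) and the names
find_source_files, check_folder and analyze are assumptions made for the example, and the analyze
callback merely stands in for launching PVS-Studio.exe, whose command line is not reproduced here.

```python
import os

# Extensions treated as C/C++ source files (an assumption); headers are excluded.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx"}

def find_source_files(root):
    """Return a sorted list of all source files under root, searched recursively."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SOURCE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def check_folder(root, analyze):
    """Call analyze(path) for every source file and collect the results per file.

    analyze is a hypothetical callback standing in for the analyzer launch.
    """
    return {path: analyze(path) for path in find_source_files(root)}
```

Note that such a traversal knows nothing about which files actually belong to the project, which is
exactly what the first experiment ran into.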
Experiment one. Do we need the project structure?
We started our experiments on the WinMerge project that we already checked long ago. If you check it
under Visual Studio using the project file, PVS-Studio will analyze about 270 files included into .vcproj.
What's interesting, there are about 500 files with the source code (without .h-files) in the WinMerge
folder. Though it's obvious now, we didn't expect it at that time. It appears that if you tell the
analyzer: "I want you to check the files in this folder", it will check unnecessary files too! And in the case of
WinMerge, there are about twice as many files as the project actually uses.
So, the first problem we encountered in this experiment was this: if you check just "all the files in the
folder", the analyzer will check more files than necessary, including those which are known to be
incorrect and uncompilable.
But this wasn't the main problem. When we launched the analysis for all the files, we started to get
preprocessor errors at once: "This or that #include-file cannot be found" with a reference to the names
of the project .h-files located in other folders. We understood that we needed to tell the
preprocessor which folders contain the include-files. How can this be done automatically, without having
to specify these folders manually? We added all the subfolders into the #include-directories list for each file.
Running a bit ahead, I want to tell you that it's no easy task for some projects. If your project contains
thousands of subfolders, automatically adding them into the #include-files search list will make the
preprocessor command line swell. While you can use the response file for cl.exe, there's no solution to
this problem in case of Clang yet.
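To get a feeling for how quickly the command line swells, here is a rough Python sketch of the
"add every subfolder" approach. The function names are made up for this example; the only thing taken
from real tools is the -I option used to pass an include directory to the preprocessor. Quoting of
paths with spaces is ignored, so the real length would be somewhat larger.

```python
import os

def all_subfolders(root):
    """Return root and every folder below it, in sorted order."""
    return sorted(dirpath for dirpath, _dirs, _files in os.walk(root))

def include_flags(root):
    """Build one -I flag per folder, as in the 'add every subfolder' approach."""
    flags = []
    for folder in all_subfolders(root):
        flags.extend(["-I", folder])
    return flags

def command_line_length(args):
    """Length of the arguments joined by single spaces (quoting is ignored)."""
    return len(" ".join(args))
```

Calling command_line_length(include_flags(root)) on a large source tree gives an estimate that can be
compared against the command-line length limit of the operating system.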
So, you face another problem after automatically specifying all the subfolders to be searched through
for #include-files.
The problem is this: projects sometimes have files with identical names. In this case, you cannot
automatically determine which folder's file, and for which project, should be used in each particular search
of #include-files. You may say: "Well, yeah, there are some projects that have files with the same names,
but they are rare and can be ignored". No, they cannot. For instance, almost all the Visual Studio
projects contain files with identical names. Don't you believe me? Do you think your projects don't
contain such files? Then run a search for stdafx.h in your projects... Since stdafx.h must be included into
all the files, choosing a wrong version of stdafx.h leads to a preprocessing error for ALL the project files.
Although we found many other files with identical names besides stdafx.h, the very presence of this
problem makes it impossible to preprocess files for further handling in automated mode.
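If you want to check your own source tree, a small script along these lines will do. It is a sketch
under a few assumptions: only .h and .hpp files are treated as headers, names are compared
case-insensitively (as the Windows file system does), and the function name find_duplicate_headers is
invented for the example.

```python
import os
from collections import defaultdict

def find_duplicate_headers(root):
    """Map each header file name found in more than one folder to those folders."""
    folders_by_name = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".h", ".hpp")):
                # Compare names case-insensitively (an assumption, see above).
                folders_by_name[name.lower()].append(dirpath)
    return {name: sorted(dirs)
            for name, dirs in folders_by_name.items()
            if len(dirs) > 1}
```

Every key in the returned dictionary is a header for which a flat list of #include-directories cannot
tell the preprocessor which copy is the right one for a given source file.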
We have drawn the following conclusions from the first experiment's results. Checking "all the files in
the folder" without any project file (whether it is makefile or vcproj) is difficult due to the following
two reasons:
1. The folder in most cases contains additional "unnecessary" files which are usually quite
numerous. At the same time, they may be uncompilable, incorrect or simply add "trash" to the
analyzer's output.
2. The task of individually specifying the #include switches for each particular file can be solved
only manually. It cannot be solved in automated mode, by simply passing an identical switch
set to each file, because projects usually contain files with identical names.
Experiment two. The usefulness of compilation switches
Generally speaking, there are a lot of compilation switches that at first sight seem to affect the way a file
is interpreted from the viewpoint of static analysis. But there are many other parameters influencing the
inclusion of certain code branches besides the obvious paths to #include-files and #define-directives we
have just discussed. How strong is their influence?
For instance, there is the "/J" switch in the cl.exe compiler:
• /J (Default char Type Is unsigned)
The parameter seems to be important, but how much does it influence the static analysis? Or, for
example, take some other parameters referring to language extensions:
• /Za, /Ze (Disable Language Extensions)
To estimate the influence of such parameters, we compared the results of project analysis in a common
(traditional) mode and the same check without accounting for such compilation parameters.
The experiment has shown that the latter results differ only insignificantly from the
former results. Moreover, even the absence of #define-parameters being passed has almost no influence on
the analysis quality in general. Of course, the analyzer made wrong choices of code branches in
#ifdef-constructs, but this had been expected and is quite logical. The only parameter which is 100% necessary is
still the paths to #include-files.
The conclusions drawn from the second experiment's results are as follows: it is desirable to account
for compilation parameters to get more accurate static analysis results. But if you cannot do that for
some reason (for example, you have a complex build system), you may try to do without them. The only
exception is the parameter defining paths to #include-files: that one is necessary.
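For reference, here is why a switch such as /J looks important at first sight. The Python sketch below
only imitates how one and the same stored byte is read as a signed or an unsigned C char; the function
name char_value is made up for the illustration, and no compiler or analyzer is involved.

```python
def char_value(byte, char_is_unsigned):
    """Interpret one byte the way a C 'char' would be read under each setting."""
    return int.from_bytes(bytes([byte]), "little", signed=not char_is_unsigned)

# The same byte 0xE9 is negative as a signed char and positive with /J.
signed_reading = char_value(0xE9, char_is_unsigned=False)
unsigned_reading = char_value(0xE9, char_is_unsigned=True)
```

A check like "c < 0" therefore behaves differently in the two modes, yet in our experiment ignoring
such switches changed the overall analysis results very little.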
Experiment three. Do we need preprocessing?
Finally, we wanted to check how necessary preprocessing is for getting high-quality static analysis
results. After all, you can detect a lot of errors through "local" analysis, i.e. inside one function.
To understand it, we carried out the following experiment. We disabled preprocessing completely in the
analyzer and started feeding PVS-Studio.exe with source .cpp-files "as is" without any preprocessing.
Then we compared the results to our reference results.
It turned out that abandoning the preprocessing is highly destructive to the analysis quality. The reason is
that when you stop using preprocessing (either as a separate step or "on the fly"), you miss information
about data types, classes and functions defined in the .h-files. Because of this, quite a few of our
diagnostics simply stopped working properly. Yes, they still were able to find something. But, first, it was much less than
before. And second, too many trash messages were generated because the analyzer "had failed to find
out" the data type and assumed that it might cause troubles.
Did we get real errors among the results acquired through the analysis without preprocessing? Yes, we
did. But they were very hard to filter out among the huge amount of false reports.
The conclusions from the third experiment's results are the following: absence of preprocessing is very
destructive to the static analysis quality. Though you can still detect some errors even without the
preprocessor, the absence of information about data types, function declarations and other similar
details distorts the analysis results and causes a lot of trash messages. It means that static analysis is not
reasonable without preprocessing.
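To see how much a single file relies on information from outside itself, you can list the headers it
pulls in. The following Python sketch does this with a simple regular expression; it is an illustration
only, the name list_includes is invented, and the pattern does not handle commented-out directives or
#include-directives built from macros.

```python
import re

# Matches #include "name" and #include <name>, allowing spaces after '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)', re.MULTILINE)

def list_includes(source_text):
    """Return (header name, is_quoted) pairs in order of appearance."""
    return [(quoted or angled, bool(quoted))
            for quoted, angled in INCLUDE_RE.findall(source_text)]
```

Everything declared in the listed headers (types, classes, function declarations) is invisible to an
analyzer that reads the .cpp-file "as is".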
Conclusion
So, these are the conclusions drawn from the results of our experiments (or, rather, a confirmation of what
we had already supposed before):
• The project structure is very important. You can't just check "all the files in the folder". First,
unnecessary files will be checked. Second, confusion occurs when including header files with
identical names because of the impossibility to automatically generate individual build
parameters for each file without having any project file.
• Compilation switches should be accounted for, but their influence is not that crucial, except for
the paths to #include-files and, perhaps, #define-parameters.
• Preprocessing is a necessary step for static analysis. Without it, a significant part of information
about the code structure gets lost, which leads to a poor quality of analysis results.
That's why we are unlikely to abandon the established scheme of project checking in the near future.