What do static analysis and search engines have in common? A good "top"!
Author: Evgeniy Ryzhkov
Date: 18.04.2012
Developers of search engines like Google/Yandex and developers of static code analysis tools to some
extent solve the same task. Both have to provide users with a certain selection of resources that meet
users' wishes. Well, of course search engines' developers would like to confine themselves just to the
button "I'm Feeling Lucky!", while developers of static code analysis tools want to generate a list of real
errors only. But reality imposes constraints, as usual. Do you want to know how we fight the cruel reality
while developing PVS-Studio?
Introduction
So, what is the task of search systems in the conditions of existing restrictions? Without pretending to
fully cover this issue, I'll tell you that a search system should give several answers to a user's query
(stated explicitly). That is, it should show several websites that might be of interest to a user. At the
same time, it could show some advertisement as well.
From the viewpoint of static code analyzers, the task is almost the same. In response to the user's
implicit query ("You, a smart program, please show me where I have errors in my code"), the tool
should point at the code fragments that are most likely to be of interest to the user.
Those who have dealt with static code analyzers (for any language) understand that any tool
produces false positives. A false positive is a case where the tool "formally" sees an error in the
code, but a human sees that there is none. This is where human perception comes into play. So,
imagine the following situation.
Someone downloads a trial version of the code analyzer and launches it. It doesn't even crash (a
miracle!) and manages to work for some time. It shows the user a list of tens, hundreds or thousands
of messages. If there are just a few dozen messages, the user will review them all. If he/she
finds anything interesting, that is a reason to think about using the tool regularly and buying it.
If he/she doesn't find anything interesting, he/she will soon forget about it. But if there are
hundreds or thousands of messages in the list, the user will review just a few of them and draw a
conclusion from what he/she has seen. That's why it is very important that relevant messages
"catch" the user's eye at once. This need for the "right top" is what the approaches of search
engine developers and static code analyzer developers have in common.
So how do you provide the "right top" for static analysis?
We have several tricks to let PVS-Studio users see the most interesting messages first.
First, all the messages are categorized into levels similar to Compiler Warning Levels. Only first-level and
second-level messages are shown at the first launch by default, while the third level is disabled.
Second, our diagnostics are divided into the classes "General Analysis", "64-bit diagnostics" and
"OpenMP diagnostics". The OpenMP and 64-bit diagnostics are also disabled by default, so users don't
see them at first. It doesn't mean that they are bad, meaningless or buggy. It's just that you are
much more likely to find the most interesting errors in the "General Analysis" category. And if a
user does find anything interesting there, he/she can turn on the other diagnostics and work through
them when needed.
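The default view described above can be sketched as a simple filter. This is only an illustration, not PVS-Studio code: the `Message` record, the `visible` function, the group abbreviations used as strings, and the sample codes `X64-1` and `XMP-1` are all hypothetical, and the levels assigned to the sample messages are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    code: str   # diagnostic code, e.g. "V547"
    level: int  # warning level: 1, 2 or 3
    group: str  # "GA" (General Analysis), "64" (64-bit) or "MP" (OpenMP)
    file: str


# Assumed defaults, following the text: levels 1 and 2, General Analysis only.
DEFAULT_LEVELS = frozenset({1, 2})
DEFAULT_GROUPS = frozenset({"GA"})


def visible(messages, levels=DEFAULT_LEVELS, groups=DEFAULT_GROUPS):
    """Keep only the messages whose level and group are currently enabled."""
    return [m for m in messages if m.level in levels and m.group in groups]


# Invented sample data; the levels are not taken from a real report.
messages = [
    Message("V547", 1, "GA", "a.cpp"),
    Message("V595", 2, "GA", "b.cpp"),
    Message("V560", 3, "GA", "b.cpp"),   # hidden: third level
    Message("X64-1", 1, "64", "c.cpp"),  # hidden: 64-bit group
    Message("XMP-1", 2, "MP", "c.cpp"),  # hidden: OpenMP group
]

shown = visible(messages)
print([m.code for m in shown])
```

A user who wants the other groups would pass a wider `groups` set, for example `visible(messages, groups={"GA", "64"})`.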
Third, we are constantly fighting against false positives.
So how do you do all this?
We have an internal tool that lets us perform statistical (not to be confused with "static"!)
analysis of our code analyzer's output. It allows us to estimate the following three parameters:
• Share of an error in the project — how prevalent errors are (by their codes) at the project level
(Project Level Share).
• Average density of an error — the ratio of the number of errors of one type to the number of
files where errors of this type occur (Average Density (project level)).
• Distribution of errors of one type throughout the project files compared to their average density
(Errors count on file).
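The three parameters above can be computed from a flat list of messages. The sketch below is not the internal tool itself; it assumes a report reduced to hypothetical (diagnostic code, file name) pairs, and the function names and sample data are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical report: one (diagnostic code, file name) pair per message.
report = [
    ("V547", "a.cpp"), ("V547", "a.cpp"), ("V547", "b.cpp"),
    ("V595", "a.cpp"), ("V595", "c.cpp"),
    ("V560", "b.cpp"),
]


def project_level_share(report):
    """Fraction of all messages in the project that belongs to each code."""
    totals = Counter(code for code, _ in report)
    return {code: count / len(report) for code, count in totals.items()}


def errors_count_on_file(report):
    """For each code, how many times it was reported in each file."""
    per_file = defaultdict(Counter)
    for code, file in report:
        per_file[code][file] += 1
    return per_file


def average_density(report):
    """Messages of one code divided by the number of files where it occurs."""
    return {
        code: sum(files.values()) / len(files)
        for code, files in errors_count_on_file(report).items()
    }


print(project_level_share(report))
print(average_density(report))
print(dict(errors_count_on_file(report)["V547"]))
```

Note that the density is taken only over files where the diagnostic actually fires, as in the definition above, so it can never drop below 1.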
Let's see how we use this internal tool, taking the Miranda IM project as an example.
Note that this post is not about errors found in Miranda IM. If you want to see them, please refer to this
post.
So, we open the analysis report (plog-file) in our internal tool, turn off the third error level and leave
only the GA-analyzer (General Analysis). The error distribution is shown in Figure 1.
Figure 1 - Distribution of errors in the Miranda IM project.
Each colored sector represents a diagnostic that accounts for more than 2.5% of all detected
issues. The black sectors represent diagnostics whose share is less than 2.5%. You can see
that errors with codes V547, V595 and V560 are the most frequent. Let's keep them in mind.
In Figure 2, you can see the average number of errors of each type per file (i.e. their average density for
the project).
Figure 2 - Average density of errors in the Miranda IM project.
As you can see from this graph, the errors with codes V547, V595 and V560 are reported 1.5 to 2.5
times per file. This is a normal value, and we see no reason to "fight" these diagnostics over
false positives. But the final conclusion is drawn from the third kind of graph, shown for these
errors in Figure 3, Figure 4 and Figure 5.
Figure 3 - Distribution of V547 errors in the Miranda IM project compared to their average density.
Figure 4 - Distribution of V595 errors in the Miranda IM project compared to their average density.
Figure 5 - Distribution of V560 errors in the Miranda IM project compared to their average density.
In Figures 3-5, the horizontal axis lists individual files, and the vertical axis shows how many
times a certain error was reported for each file. The red columns are files where the error was
reported more often than the average (blue dots) for this error type.
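The "red column" rule can be expressed in a few lines. This is a minimal sketch under the assumption that the average is taken over the files where the diagnostic occurs; the `red_files` function and the per-file counts are hypothetical.

```python
def red_files(file_counts):
    """Return the average count and the files reported more often than it.

    file_counts maps a file name to the number of times one diagnostic
    was reported in that file (only files with at least one report).
    """
    if not file_counts:
        return 0.0, []
    average = sum(file_counts.values()) / len(file_counts)
    above = sorted(name for name, n in file_counts.items() if n > average)
    return average, above


# Invented per-file counts for a single diagnostic.
counts = {"a.cpp": 1, "b.cpp": 1, "c.cpp": 6, "d.cpp": 2}
average, red = red_files(counts)
print(average, red)
```

Here only `c.cpp` stands out, so it would be the first file to inspect for either a recurring false positive or a copy-pasted real error.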
So what do you do with these graphs?
We study these "red" files and make a decision: if it is a false positive that also occurs quite
frequently in other projects, we eliminate it. And if it is a real error that was, in addition,
swiftly cloned by copy-paste, there's nothing to "improve".
In this post, I'm consciously omitting code samples the analyzer swore at in order not to overload the
text.
In other words, after drawing a whole lot of such graphs and analyzing them, we can easily see where
our analyzer misfires and fix those places. It confirms the old truth that a visual representation
of "boring" data gives you a better view of the issue being investigated.
What is that OP button in the pictures?
Attentive readers have noticed one more button (OP) in the pictures besides the three standard
analyzer buttons (GA, 64, MP). OP stands for "optimization". In PVS-Studio 4.60, we
introduced a new group of diagnostic messages related to micro-optimizations. Diagnosing
possible micro-optimizations is a rather controversial feature of our analyzer. Some will be glad
to find a place where a large object is passed to a function by value instead of by reference
(V801). Some will save a significant amount of memory by decreasing structure sizes for large
object arrays (V802). And some think it is all rubbish and premature optimization. Everything
depends on the project type.
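To give a feel for why structure size matters for large arrays, here is a small sketch. It rests on an assumption the article does not spell out: that one common way to shrink a structure is to reorder its fields so that less alignment padding is needed. It uses Python's `ctypes` to mimic native C layout; the structures `Wasteful` and `Compact` and the helper `array_bytes` are made up for illustration.

```python
import ctypes


class Wasteful(ctypes.Structure):
    # small, large, small: padding is typically added around the double
    _fields_ = [
        ("flag", ctypes.c_char),
        ("value", ctypes.c_double),
        ("kind", ctypes.c_char),
    ]


class Compact(ctypes.Structure):
    # largest field first: the two one-byte fields sit next to each other
    _fields_ = [
        ("value", ctypes.c_double),
        ("flag", ctypes.c_char),
        ("kind", ctypes.c_char),
    ]


def array_bytes(struct_type, count):
    """Memory taken by an array of `count` structures of the given type."""
    return ctypes.sizeof(struct_type) * count


count = 1_000_000
saved = array_bytes(Wasteful, count) - array_bytes(Compact, count)
print(ctypes.sizeof(Wasteful), ctypes.sizeof(Compact), saved)
```

The exact sizes depend on the platform's alignment rules, but on typical platforms the reordered structure is smaller, and the saving is multiplied by the array length. Whether that saving is worth a code change is exactly the project-dependent question raised above.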
Anyway, after analyzing our tool's output, we concluded that we needed to:
• arrange optimization diagnostics into a separate group so that they can be easily hidden or
shown;
• turn them off by default, as they can clutter the error list with diagnostics that not
everyone likes.
That's how the new OP button appeared in the PVS-Studio Output Window (Figure 6):
Figure 6 - OP button (optimization) has appeared in PVS-Studio 4.60.
By the way, we have also significantly reduced the number of false positives for 64-bit issues analysis in
the same version.
I invite you to download the new PVS-Studio version and check how relevant its recommendations
on optimizing your code are.
Conclusion
Developers of static code analyzers, as well as search engine developers, are interested in making
the output as relevant as possible. Both employ many methods to achieve that, including statistical
analysis. In this post I have shown you how we do it when developing PVS-Studio.
A question to the audience
I have a small question for those who have dealt with (or at least played around with) PVS-Studio or
any other code analyzer. Do you think end users of a code analyzer need graphs like the ones shown
in this article? In other words, do you think you could learn anything useful from such diagrams if
your code analyzer contained them? Or is it a tool "for internal use" only? Please share your
opinion by writing to us.

Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

What do static analysis and search engines have in common? A good "top"!

Author: Evgeniy Ryzhkov
Date: 18.04.2012

Developers of search engines like Google/Yandex and developers of static code analysis tools solve, to some extent, the same task. Both have to provide users with a selection of resources that meets the users' wishes. Of course, search engine developers would like to confine themselves to the "I'm Feeling Lucky!" button, while developers of static code analysis tools would like to generate a list of real errors only. But reality imposes constraints, as usual. Do you want to know how we fight the cruel reality while developing PVS-Studio?

Introduction

So, what is the task of search systems under the existing restrictions? Without pretending to fully cover this issue, I'll tell you that a search system should give several answers to a user's query (stated explicitly). That is, it should show several websites that might be of interest to the user. At the same time, it could show some advertisements as well.

From the viewpoint of static code analyzers, the task is almost the same. In answer to the user's implicit query ("You, a smart program, please show me where I have errors in my code"), the tool should point at the code fragments in the program that are most likely to be of interest to the user.

Those who have dealt with static code analyzers (regardless of the language) understand that any tool produces false positives. A false positive is a situation where the code "formally" contains an error from the tool's viewpoint, but a human sees that there is no error. This is where human perception comes into play. So, imagine the following situation.

Someone downloads a trial version of the code analyzer and launches it. It doesn't even crash (a miracle!) and manages to work for some time. It shows the user a list of tens, hundreds or thousands of messages.
If there are only a few dozen messages, the user will review them all. If he/she finds anything interesting, that is a reason to consider using the tool regularly and buying it. If he/she doesn't find anything interesting, he/she will soon forget about it. But if there are hundreds or thousands of messages in the list, the user will review just a few of them and draw a conclusion from what he/she has seen. That's why it is very important that relevant messages immediately "catch" the user's eye. This is where the approaches of search engine developers and static analyzer developers to the "right top" are similar.

So how to provide the "right top" for static analysis?

To let PVS-Studio users see the most interesting messages first, we have several tricks.

First, all messages are categorized into levels similar to compiler warning levels. Only first-level and second-level messages are shown by default at the first launch, while the third level is disabled.
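A minimal sketch of the level-based default view described above. The `Message` record, its field names, the sample messages and the sorting by level are assumptions made for illustration; they are not PVS-Studio's actual data model or behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    code: str   # diagnostic code, e.g. "V547"
    level: int  # assumption: 1 is the most important level, 3 the least
    file: str


# Levels 1 and 2 are visible at first launch; level 3 is hidden.
DEFAULT_VISIBLE_LEVELS = frozenset({1, 2})


def default_view(messages, visible_levels=DEFAULT_VISIBLE_LEVELS):
    """Return the visible messages, lowest level number first."""
    shown = [m for m in messages if m.level in visible_levels]
    return sorted(shown, key=lambda m: m.level)


# Made-up sample messages.
messages = [
    Message("V560", 3, "a.cpp"),
    Message("V547", 2, "b.cpp"),
    Message("V595", 1, "c.cpp"),
]
top = default_view(messages)
```

Here `top` contains only the level 1 and level 2 messages; passing `visible_levels={1, 2, 3}` would correspond to the user turning the third level back on.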
Second, our diagnostics are divided into the classes "General Analysis", "64-bit diagnostics" and "OpenMP diagnostics". The OpenMP and 64-bit diagnostics are also disabled by default, so users don't see them. That doesn't mean they are bad, meaningless or buggy. It's just that you are much more likely to find the most interesting errors in the "General Analysis" category. And if a user does find something interesting there, he/she will turn on the other diagnostics and handle them as needed.

Third, we are constantly fighting false positives.

So how do you do all this?

We have an internal tool that performs statistical (not to be confused with "static"!) analysis of our code analyzer's output. It lets us estimate the following three parameters:

• Share of an error in the project: how prevalent errors are (by their codes) at the project level (Project Level Share).
• Average density of an error: the ratio of the number of errors of one type to the number of files where errors of this type occur (Average Density (project level)).
• Distribution of errors of one type across the project files compared to their average density (Errors count on file).

Let's see how we use this internal tool, taking the Miranda IM project as an example. Note that this post is not about errors found in Miranda IM. If you want to see them, please refer to this post.

So, we open the analysis report (plog file) in our internal tool, turn off the third error level and leave only the GA analyzer (General Analysis). The error distribution is shown in Figure 1.
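The three parameters listed above can be sketched in a few lines. This is an illustration, not the internal tool itself: the row layout `(file, code, level, group)` and all sample data are made up, and the real plog format is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical report rows: (file, diagnostic code, level, group).
report = [
    ("a.cpp", "V547", 1, "GA"),
    ("a.cpp", "V547", 2, "GA"),
    ("a.cpp", "V547", 1, "GA"),
    ("b.cpp", "V547", 1, "GA"),
    ("b.cpp", "V595", 2, "GA"),
    ("c.cpp", "V595", 1, "GA"),
    ("c.cpp", "V560", 3, "GA"),  # third level: filtered out below
    ("c.cpp", "V112", 1, "64"),  # 64-bit group: filtered out below
]

# Turn off the third level and leave only General Analysis.
rows = [(f, code) for f, code, level, group in report
        if group == "GA" and level <= 2]

# Errors count on file: code -> {file -> number of reports}.
per_file = defaultdict(Counter)
for f, code in rows:
    per_file[code][f] += 1

totals = {code: sum(files.values()) for code, files in per_file.items()}
grand_total = sum(totals.values())

# Project Level Share: fraction of all reports that carry this code.
share = {code: n / grand_total for code, n in totals.items()}

# Average Density: reports of one type divided by the number of files
# in which that type occurs at least once.
density = {code: totals[code] / len(per_file[code]) for code in totals}
```

With this sample, `V547` is reported four times in two files, so its average density is 2.0, while `V595` is reported twice in two files and has a density of 1.0.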
Figure 1 - Distribution of errors in the Miranda IM project. The colored sectors correspond to diagnostics whose share of all detected issues exceeds 2.5%. The black sectors correspond to shares below 2.5%.

You can see that errors with codes V547, V595 and V560 are the most frequent. Let's keep them in mind.

Figure 2 shows the average number of errors of each type per file (i.e. their average density for the project).
Figure 2 - Average density of errors in the Miranda IM project.

As you can see from this graph, the errors with codes V547, V595 and V560 are reported 1.5 to 2.5 times per file. This is a normal value and, in our opinion, there is no reason to "fight" these diagnostics over false positives. But the final conclusion is drawn from the third kind of graph, shown for these errors in Figures 3, 4 and 5.

Figure 3 - Distribution of V547 errors in the Miranda IM project compared to their average density.
Figure 4 - Distribution of V595 errors in the Miranda IM project compared to their average density.
Figure 5 - Distribution of V560 errors in the Miranda IM project compared to their average density.

In Figures 3-5, file names run along the horizontal axis, and the number of times the error was reported for each file runs along the vertical axis. The red columns are files where the error was reported more times than the average for this error type (blue dots).
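The "red column" rule can be sketched as follows: for one diagnostic, compare each file's report count with the average and keep the files above it. The function name and the sample counts are hypothetical, and sorting the result by count is an added convenience, not something the figures are known to do.

```python
def files_above_average(counts):
    """counts: {file name: number of reports of ONE diagnostic in that file}.

    Returns (average density, files reported more often than average),
    with the most frequently reported files first.
    """
    if not counts:
        return 0.0, []
    average = sum(counts.values()) / len(counts)
    red = sorted(
        (name for name, n in counts.items() if n > average),
        key=lambda name: counts[name],
        reverse=True,
    )
    return average, red


# Made-up per-file counts for a single diagnostic.
v547_counts = {"a.cpp": 7, "b.cpp": 1, "c.cpp": 2, "d.cpp": 2}
average, red_files = files_above_average(v547_counts)
```

In this sample the average is 3.0 reports per file, and only `a.cpp` exceeds it, so it is the one file worth reviewing first.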
So what do you do with these graphs?

We study these "red" files and make a decision: if there is a false positive and it occurs frequently in other projects too, we eliminate it. And if there is a real error that was, in addition, swiftly cloned with copy-paste, there's nothing to "improve". In this post, I'm deliberately omitting the code samples the analyzer complained about so as not to overload the text.

In other words, after drawing a whole lot of such graphs and analyzing them, we can easily see where our analyzer misfires and fix those places. It confirms the old truth that a visual representation of "boring" data gives you a better view of the issue being investigated.

What is that OP button in the pictures?

Attentive readers have noticed one more button (OP) in the pictures besides the three standard analyzer buttons (GA, 64, MP). OP stands for "optimization". In PVS-Studio 4.60, we introduced a new group of diagnostic messages related to micro-optimizations. Diagnosing possible micro-optimizations is quite a controversial feature of our analyzer. Some will be glad to find a place where a large object is passed to a function by value instead of by reference (V801). Some will save a significant amount of memory by decreasing structure sizes for large arrays of objects (V802). And some think it is all rubbish and premature optimization. Everything depends on the project type.

Anyway, analyzing our tool's output led us to the need to:

• arrange optimization diagnostics into a separate group so that they can be easily hidden or shown;
• turn them off by default, as they can "jam" the error list with diagnostics that not everyone likes.

That's how the new OP button appeared in the PVS-Studio Output Window (Figure 6):
Figure 6 - The OP (optimization) button appeared in PVS-Studio 4.60.

By the way, we have also significantly reduced the number of false positives in 64-bit issue analysis in the same version. I invite you to download the new PVS-Studio version and check how adequate its recommendations on optimizing your code are.

Conclusion

Developers of static code analyzers, like search engine developers, are interested in making the output as relevant as possible. Both employ many methods to achieve that, including statistical analysis. In this post I have shown how we achieve that when developing PVS-Studio.

A question to the audience

I have a small question for those who have dealt with (or at least played around with) PVS-Studio or any other code analyzer. Do you think a code analyzer's end user needs the graphs shown in this article as an end-user tool? In other words, do you think you could learn anything useful from such diagrams if your code analyzer contained them? Or is it a tool "for internal use" only?

Please share your opinion by writing to us.