Static analysis of source code by the example of WinMerge

Today's post is devoted to the question of why static source code analysis tools are helpful regardless of a programmer's knowledge and skill. I will demonstrate the benefit of static analysis using a tool known to every programmer: WinMerge.

Author: Andrey Karpov
Date: 30.10.2010

The earlier a developer finds an error in application code, the cheaper it is to fix. It follows that it is cheapest and easiest to eliminate an error while the code is being written. The best way, of course, is to write without errors at all: imagine that you are just about to make an error, but you slap your own hand and go on writing correct code. Still, we don't manage to do that, do we? So the approach "you should write without errors" doesn't work anyway.

Even a highly skilled programmer who takes his time makes errors, from common misprints to errors in algorithms. It is the law of large numbers at work. Does it seem to you that one can't make a mistake in any particular "if" operator? I carried out such an experiment: I wrote 200 comparisons and did make an error once. Andrey Urazov discussed this in his interesting lecture "Quality-oriented programming" at the CodeFest 2010 conference (the video of this lecture (RU)). I would like to cite his thought that however skilled developers are, errors will appear in code all the same. You just can't stop making them. But you can successfully fight many of them at much earlier stages of the development process than usual.

Usually the first level of defense against errors is writing unit tests for newly written code. Sometimes tests are written before the code they are intended to check. However, unit tests have some disadvantages that I'm not going to discuss in detail here, because all programmers are aware of them.
It is not always easy to create a unit test for a function that requires a complicated procedure to prepare its data beforehand. Unit tests become a burden if the project requirements change rapidly; tests take a lot of time to write and maintain; it is not always easy to cover all the program branches with tests, etc. Moreover, you may get a solid project "as a present" that simply has no unit tests and was never intended to have any. Without denying the great benefit of unit tests, I still think that although they are a good level of defense, we can and must improve on it greatly.

Programmers usually neglect an even earlier level of defense: static code analysis. Many developers use static code analysis only within the scope of the diagnostic warnings generated by compilers. However, there is a wide range of tools that can detect a significant share of logical errors and common misprints already at the coding stage. These tools perform a higher-level code check that relies on knowledge of coding patterns, use heuristic algorithms, and provide a flexible system of settings.
Of course, static analysis has its own disadvantages: it simply cannot detect many types of errors, and analyzers produce false alarms and make you modify code until they like it and consider it safe.

But there are huge advantages as well. Static analysis covers all the program branches regardless of how often they are executed. It doesn't depend on execution stages. You can check even incomplete code, or a large amount of code you inherited from another developer. Static analysis is quick and scales well, unlike dynamic analysis tools.

So you have read a lot of words about static analysis of source code. Now it's time for practice. I want to take one application written in C++ and try to find errors in it.

I wanted to choose something small and widely known. Since I don't use too many tools, I just looked through the "Programs" list in the "Start" menu and decided to take WinMerge. The WinMerge application is open-source and small (about 186000 lines). Its quality is rather high. I'm saying this based on my experience: I have no complaints about it, and I like that comments occupy 25% of its source code (a good sign). So it is a good choice.

I downloaded the latest available version, 2.13.20 (from 20.10.2010). I used the prototype of a general-purpose analyzer we are developing now. Let me tell you a bit more about it.

Currently, the PVS-Studio static analyzer includes two rule sets. One is intended to detect 64-bit defects and the other to check OpenMP programs. Now we are developing a general-purpose set of rules. We haven't got even a beta version yet, but some code already works and I'm very eager to wage a real war against errors. We intend to make the new rule set free, so please don't write that we are indulging in self-advertisement.
The new tool will be presented to the community in 1-2 months as a part of PVS-Studio 4.00.

So, here are some interesting issues I detected in the WinMerge 2.13.20 code in half an hour (15 minutes for analysis, 15 minutes to review the results). There are also some other suspicious fragments, but it takes some effort to figure out whether they are really errors. My current task is not to find as many defects in one project as possible; I just want to give a nice demonstration of the benefits static analysis provides and show how to quickly detect some errors through even a superficial examination.

The first sample. The analyzer pointed to several errors "V530 - The return value of function 'Foo' is required to be utilized". These warnings are usually generated for incorrectly used functions. Study this code fragment:

    /**
     * @brief Get the file names on both sides for specified item.
     * @note Return empty strings if item is special item.
     */
    void CDirView::GetItemFileNames(int sel, String& strLeft, String& strRight) const
    {
        UINT_PTR diffpos = GetItemKey(sel);
        if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
        {
            strLeft.empty();
            strRight.empty();
        }
        else
        {
            ...
        }
    }

The function must return two empty strings in a particular case. But due to the programmer's inattention, it is std::string::empty() that is called instead of std::string::clear(). The empty() function only reports whether the string is empty and does not modify it, so the strings keep their old contents. By the way, this error is not as rare as it may seem: I have encountered it in many other projects. The same error is present in another WinMerge function:

    /**
     * @brief Clear variants value (reset to defaults).
     */
    void VariantValue::Clear()
    {
        m_vtype = VT_NULL;
        m_bvalue = false;
        m_ivalue = 0;
        m_fvalue = 0;
        m_svalue.empty();
        m_tvalue = 0;
    }

Again, we don't get the expected clearing of the string.

And here we have the warning "V501 - There are identical sub-expressions to the left and to the right of the '||' operator":

    BUFFERTYPE m_nBufferType[2];
    ...
    // Handle unnamed buffers
    if ((m_nBufferType[nBuffer] == BUFFER_UNNAMED) ||
        (m_nBufferType[nBuffer] == BUFFER_UNNAMED))
        nSaveErrorCode = SAVE_NO_FILENAME;

If we review the code nearby, we conclude by analogy that the fragment should contain the following lines:

    (m_nBufferType[0] == BUFFER_UNNAMED) ||
    (m_nBufferType[1] == BUFFER_UNNAMED)

Even if that is not so, there is still some error here.

When various crashes occur, WinMerge tries to report the errors but fails in most cases. By the way, this is a good example of how a code analyzer can detect errors in rarely used program fragments. There are several errors in the code that PVS-Studio reports with the following warning: "V510 - The 'Format' function is not expected to receive class-type variable as 'N' actual argument". Study this code sample:

    String GetSysError(int nerr);
    ...
    CString msg;
    msg.Format(_T("Failed to open registry key HKCU/%s:\n\t%d : %s"),
        f_RegDir, retVal, GetSysError(retVal));

Everything seems fine at first. But the "String" type is actually "std::wstring", and therefore we will get some rubbish printed at best, or an access violation at worst. It is an object of the "std::wstring" type that is put onto the stack instead of a pointer to a string. Read the post "Big Brother helps you", where I described this error in detail. The correct code must call c_str():

    msg.Format(_T("Failed to open registry key HKCU/%s:\n\t%d : %s"),
        f_RegDir, retVal, GetSysError(retVal).c_str());

Let's go further. Here we have a suspicious code fragment. I don't know if there is really an error, but it is strange that the two branches of the "if" operator contain exactly the same code. The analyzer warns about it with the diagnostic message "V532 - The 'then' statement is equivalent to the 'else' statement". Here is the suspicious code:

    if (max < INT_MAX)
    {
        for (i = min; i < max; i++)
        {
            if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
            eptr++;
            while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
        }
    }
    else
    {
        for (i = min; i < max; i++)
        {
            if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
            eptr++;
            while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
        }
    }
    }

I have a feeling that "this humming is no accident".

OK, let's study one more sample and finish the post. The analyzer found a suspicious loop: "V534 - It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i'." This is the source code:

    // Get length of translated array of bytes from text.
    int Text2BinTranslator::iLengthOfTransToBin(
        char* src, int srclen )
    {
        ...
        for (k=i; i<srclen; k++)
        {
            if (src[k]=='>')
                break;
        }
        ...
    }

This code is prone to an access violation. The loop must continue until the '>' character is found or the string of srclen characters ends. But the programmer accidentally used the i variable instead of k in the comparison. Since i never changes inside the loop, the condition cannot stop it, and if the '>' character is not found, k runs past the end of the buffer and the consequences are likely to be bad.

Summary

Don't forget about static analysis. It can often help you find peculiar issues even in good code. I also invite you to visit our site some time later to try our free general-purpose analyzer when it is ready.
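
As a closing illustration, the first and the last samples can be reproduced outside WinMerge. The sketch below is not WinMerge code: ResetNames and FindClosingBracket are hypothetical helper names invented for this example, and plain std::string and char buffers stand in for the project's own types. It only shows what the corrected calls might look like under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the "special item" branch of the first sample:
// both output strings must come back empty.
void ResetNames(std::string& left, std::string& right)
{
    // empty() only reports whether the string is empty and changes nothing;
    // clear() is the call that actually erases the contents.
    left.clear();
    right.clear();
}

// Hypothetical stand-in for the loop in the last sample: returns the index
// of the first '>' at or after `start`, or `len` if there is none.
std::size_t FindClosingBracket(const char* src, std::size_t len, std::size_t start)
{
    assert(src != nullptr);
    std::size_t k = start;
    // The bound is checked on k, the variable that is actually incremented,
    // so the loop cannot read past the end of the buffer.
    for (; k < len; k++)
    {
        if (src[k] == '>')
            break;
    }
    return k < len ? k : len;
}
```

With these definitions, FindClosingBracket("ab>cd", 5, 0) yields 2, and for a buffer without a '>' character the function returns the buffer length instead of reading beyond it.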