Comparison of analyzers' diagnostic possibilities at checking 64-bit code



Author: Andrey Karpov
Date: 30.05.2008

Abstract
The article compares the specialized static analyzer Viva64 with the general-purpose static analyzers Parasoft C++Test and Gimpel Software PC-Lint. The comparison is made in the context of porting 32-bit C/C++ code to 64-bit systems, or of developing new code that takes the peculiarities of the 64-bit architecture into account.

Introduction
The purpose of this article is to show the advantages of the Viva64 analyzer over other products with similar functionality. Viva64 is a specialized static analyzer for verifying 64-bit C/C++ code [1]. It is intended for developing new 64-bit code or porting old code to 64-bit systems. At present the analyzer is implemented for the Windows operating system as a plug-in module for the Visual Studio 2005/2008 development environment.

This article is topical because there is no systematized information about the capabilities of modern static analyzers that are advertised as tools for diagnosing 64-bit errors. Within the framework of this article we will compare the three most popular analyzers that implement checking of 64-bit code: Viva64, Parasoft C++Test and Gimpel Software PC-Lint.

The comparison is presented in a table, and then we will briefly discuss each of the evaluation criteria. But first, let's explain some notions used in this article.

1. Terms and definitions

1.1. Data model
A data model is the set of relations between the sizes of the fundamental types adopted within a particular development environment. There can be several development environments with different data models for one operating system, but usually there is only one model that best corresponds to the hardware and software environment. An example is 64-bit Windows, for which the LLP64 data model is native. But for the purpose of compatibility, 64-bit Windows supports 32-bit programs, which work in the ILP32LL data model.

Table 1 shows the most popular data models. We are interested first of all in the LP64 and LLP64 data models.
Table 1. Most popular data models.

The LP64 and LLP64 data models differ only in the size of the "long" type. But this small difference leads to a great difference in the recommended methodologies for developing programs for 64-bit operating systems of the Unix and Windows families. For example, in Unix programs it is recommended to use the long or unsigned long type for storing pointers and for loops that process a large number of elements. But these types are unsuitable for Windows programs, where you should use ptrdiff_t and size_t instead. To learn more about the peculiarities of different data models, you may read the article "Forgotten problems of developing 64-bit programs" [2].

In this article we speak about data models because static analyzers are not always equally well adapted to both the LP64 and the LLP64 data model. Looking ahead, we may say that the Parasoft C++Test and Gimpel Software PC-Lint analyzers are better adapted for Unix systems than for Windows ones.

1.2. Memsize-types
To make the article easier to understand, we will use the term "memsize-type". This term appeared as an attempt to name briefly all the types capable of storing the size of pointers and the indexes of the largest arrays. A memsize-type can store the maximum size of an array that can theoretically be allocated on the given architecture.

By memsize-types we understand all the simple C/C++ data types that are 32 bits wide on the 32-bit architecture and 64 bits wide on the 64-bit one. Keep in mind that the long type is not a memsize-type in Windows, while in Unix it is. To make this clearer, the main memsize-types are shown in Table 2.
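To make the relation between data models and memsize-types concrete, here is a minimal C++ sketch that classifies the current compiler's data model from sizeof(int), sizeof(long) and sizeof(void *), and checks at compile time that the usual memsize-types are pointer-sized. The names DataModel, CurrentDataModel and LongIsMemsize are hypothetical, introduced only for illustration. The pointer-size checks hold on mainstream 32-bit and 64-bit compilers but are not required by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical enum naming the data models discussed in the article.
enum class DataModel { ILP32, LLP64, LP64, ILP64, Other };

// Classifies the data model of the current compiler by type sizes alone.
constexpr DataModel CurrentDataModel() {
  return (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4) ? DataModel::ILP32
       : (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8) ? DataModel::LLP64
       : (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8) ? DataModel::LP64
       : (sizeof(int) == 8 && sizeof(long) == 8 && sizeof(void *) == 8) ? DataModel::ILP64
       : DataModel::Other;
}

// Memsize-types: as wide as a pointer on both 32-bit and 64-bit targets
// (true on mainstream compilers, though not guaranteed by the standard).
static_assert(sizeof(std::size_t) == sizeof(void *), "size_t is not pointer-sized");
static_assert(sizeof(std::ptrdiff_t) == sizeof(void *), "ptrdiff_t is not pointer-sized");
static_assert(sizeof(std::intptr_t) == sizeof(void *), "intptr_t is not pointer-sized");
static_assert(sizeof(std::uintptr_t) == sizeof(void *), "uintptr_t is not pointer-sized");

// long behaves as a memsize-type under ILP32 and LP64 (Unix), but not under LLP64 (Windows).
constexpr bool LongIsMemsize() { return sizeof(long) == sizeof(void *); }
```

With 64-bit Visual C++ on Windows, CurrentDataModel() should yield DataModel::LLP64 and LongIsMemsize() should be false; with GCC or Clang on 64-bit Linux, the results should be DataModel::LP64 and true.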
Table 2. Examples of memsize-types.

2. Comparison table
Let's proceed to the comparison of the static analyzers itself. The comparative information is given in Table 3. The list of evaluation criteria was compiled on the basis of the static analyzers' documentation, articles and other additional sources. You may consult the original sources:
 • Article: Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform
 • Parasoft C++Test: C++Test User's Guide (User Items: 3264bit_xxxxxxx.rule)
 • Gimpel Software PC-Lint: 64-bit Test (C) Checking programs against the LP64 model
 • Program Verification Systems Viva64: On-line help
Table 3. Comparison of static analyzers from the viewpoint of searching for errors specific to 64-bit code.

3. Evaluation criteria
The names of the evaluation criteria listed in the table don't reveal much by themselves. That's why let's briefly discuss each of them. Paragraph 3.1 corresponds to the first criterion, paragraph 3.2 to the second one, and so on.

To learn more about typical errors that occur when porting applications to 64-bit systems, see the following articles: 20 issues of porting C++ code on the 64-bit platform [3], Problems of testing 64-bit applications [4], Development of resource-intensive applications in Visual C++ environment [5].

3.1. Use of memsize-types as actual arguments in functions with a variable number of arguments
A typical example is the incorrect use of the printf and scanf functions and their variants:

1)
const char *invalidFormat = "%u";
size_t value = SIZE_MAX;
printf(invalidFormat, value);

2)
char buf[9];
sprintf(buf, "%p", pointer);

In the first case it is not taken into account that the size_t type is not equivalent to the unsigned type on a 64-bit platform. This will cause an incorrect result to be printed if value > UINT_MAX.

In the second case it is not taken into account that the size of a pointer may exceed 32 bits. As a result, this code will cause a buffer overflow on a 64-bit architecture.

3.2. Use of magic constants
In low-quality code you may often see magic constants, which are dangerous in themselves. During migration of the code to a 64-bit platform these constants may make it invalid if they take part in calculating addresses or object sizes, or in bit operations. The main magic constants are: 4, 32, 0x7fffffff, 0x80000000, 0xffffffff. For example:

size_t ArraySize = N * 4;
intptr_t *Array = (intptr_t *)malloc(ArraySize);

Here 4 stands for the size of intptr_t, which becomes 8 on a 64-bit system, so the allocated buffer is only half as large as needed.

3.3. Storing integer values of memsize type in double
As a rule, the double type is 64 bits wide and complies with the IEEE-754 standard on both 32-bit and 64-bit systems.
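The precision limit behind this section can be checked directly. The following minimal C++ sketch (the helper name RoundTripsThroughDouble is hypothetical) converts an unsigned 64-bit integer to double and back. It assumes an IEEE-754 double with a 53-bit significand and round-to-nearest conversion, so every 32-bit value survives the round trip, while 2^53 + 1 is rounded to 2^53.

```cpp
#include <cstdint>

// Returns true if v survives the conversion uint64_t -> double -> uint64_t unchanged.
// Assumes IEEE-754 double (53-bit significand) and round-to-nearest conversion.
constexpr bool RoundTripsThroughDouble(std::uint64_t v) {
  // Values near UINT64_MAX round up to 2^64, which does not fit in uint64_t;
  // converting that back would be undefined behaviour, so report failure instead.
  return static_cast<double>(v) >= 18446744073709551616.0
             ? false
             : static_cast<std::uint64_t>(static_cast<double>(v)) == v;
}

// Every 32-bit unsigned value fits exactly into the 53-bit significand.
static_assert(RoundTripsThroughDouble(UINT32_MAX), "32-bit values are exact");
// 2^53 + 1 is the smallest positive integer that double cannot represent.
static_assert(!RoundTripsThroughDouble((std::uint64_t{1} << 53) + 1), "2^53 + 1 is rounded");
// size_t(-1) on a 64-bit system rounds up to 2^64.
static_assert(!RoundTripsThroughDouble(UINT64_MAX), "UINT64_MAX is rounded");
```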
Sometimes the double type is used in the code to store and work with integer values:

size_t a = size_t(-1);
double b = a;
--a;
--b;
size_t c = b;
// x86: a == c
// x64: a != c

Such code can be justified on a 32-bit system, where double can store a 32-bit integer value without loss because its significand has 53 bits (52 stored bits plus an implicit leading bit). But when you try to store a 64-bit integer in double, the exact value can be lost. On x64, b is even rounded to 2^64, which does not fit in size_t at all, so the conversion back to size_t is formally undefined behaviour.

3.4. Incorrect work with shift operations
Shift operations can cause a lot of trouble when used carelessly while porting code from a 32-bit to a 64-bit system. Let's consider a function that sets the specified bit to "1" in a variable of memsize type:

ptrdiff_t SetBitN(ptrdiff_t value, unsigned bitNum) {
  ptrdiff_t mask = 1 << bitNum;
  return value | mask;
}

This code is valid on a 32-bit architecture and allows you to set bits with numbers from 0 to 31. After porting the program to a 64-bit platform you need to set bits from 0 to 63. But the call SetBitN(0, 32) will not return the expected value: the constant "1" has the int type, and shifting a 32-bit int by 32 positions is undefined behaviour, so the result will be incorrect.

3.5. Storing pointers in non-memsize types
A lot of errors related to migration to 64-bit systems are caused by the change of the pointer size relative to the size of simple integers. Many programmers stored pointers in types such as int and unsigned in their 32-bit programs. This is, of course, incorrect from the viewpoint of 64-bit data models. For example:

char *p;
p = (char *) ((unsigned int)p & PAGEOFFSET);

Keep in mind that only memsize types should be used for storing pointers in integer form. Fortunately, such errors are easily detected not only by static analyzers but also by compilers when the corresponding options are enabled.

3.6. Use of memsize types in unions
A peculiarity of a union in C/C++ is that one and the same memory area is allocated for storing all of its members.
Although this memory area can be accessed through any of the members, the member used for access should be chosen so that the result is meaningful.

You should be careful with unions that contain pointers and other members of memsize type. Developers often mistakenly assume that the size of a memsize type will always equal the size of a particular group of other objects on all architectures. Here is an example of an incorrect function implementing a table algorithm for counting the number of zero bits in the variable "value":

union SizetToBytesUnion {
  size_t value;
  struct {
    unsigned char b0, b1, b2, b3;
  } bytes;
};

SizetToBytesUnion u;
u.value = value;
size_t zeroBitsN = TranslateTable[u.bytes.b0] +
                   TranslateTable[u.bytes.b1] +
                   TranslateTable[u.bytes.b2] +
                   TranslateTable[u.bytes.b3];

On a 64-bit system size_t occupies eight bytes, but the structure covers only the first four of them, so the zero bits in the remaining half of value are never counted.

3.7. Changing the type of an array
Sometimes it is necessary (or simply convenient) to treat the items of an array as items of a different type. Unsafe and safe type conversions are shown in the following code:

int array[4] = { 1, 2, 3, 4 };
enum ENumbers { ZERO, ONE, TWO, THREE, FOUR };
//safe cast (for MSVC2005/2008)
ENumbers *enumPtr = (ENumbers *)(array);
cout << enumPtr[1] << " ";
//unsafe cast
size_t *sizetPtr = (size_t *)(array);
cout << sizetPtr[1] << endl;

//Output on 32-bit system: 2 2
//Output on 64-bit system: 2 17179869187

3.8. Errors occurring when using virtual functions with arguments of memsize type
If your program has large class inheritance hierarchies with virtual functions, you may carelessly use arguments of different types that nearly coincide on a 32-bit system. For example, you use the size_t type as an argument of a virtual function in a base class, while in the descendant it is the unsigned type. Consequently, this code will be incorrect on a 64-bit system.

Such errors don't always relate to complex inheritance hierarchies, for example:
class CWinApp {
  ...
  virtual void WinHelp(DWORD_PTR dwData, UINT nCmd);
};
class CSampleApp : public CWinApp {
  ...
  virtual void WinHelp(DWORD dwData, UINT nCmd);
};

Such errors may occur not only because of the programmer's inattention. The error shown in the example occurs if you developed your code for earlier versions of the MFC library, where the prototype of the WinHelp function in the CWinApp class looked like this:

virtual void WinHelp(DWORD dwData, UINT nCmd = HELP_CONTEXT);

Naturally, you used the DWORD type in your code. In Microsoft Visual C++ 2005/2008 the function's prototype was changed. On a 32-bit system the program will continue to work correctly, as the DWORD and DWORD_PTR types coincide there. But there will be trouble in the 64-bit program: you will have two functions with the same name but different parameters, so the derived function no longer overrides the base one and your code won't be executed.

3.9. Incorrect pointer arithmetic
Let's consider the following example:

unsigned short a16, b16, c16;
char *pointer;
...
pointer += a16 * b16 * c16;

This code works correctly with pointers if the value of the expression "a16 * b16 * c16" doesn't exceed UINT_MAX (4 GB). Such code could always work correctly on a 32-bit platform, as a program could never allocate an array of a bigger size. On a 64-bit architecture the size of an array may exceed UINT_MAX items. Assume that we want to shift the pointer's value by 6,000,000,000 bytes, and that's why the variables a16, b16 and c16 have the values 3000, 2000 and 1000 respectively. While calculating the expression "a16 * b16 * c16", all the variables are converted to the int type according to C++ rules, and only then are they multiplied. During the multiplication an overflow will occur. The incorrect result of the expression will be extended to the ptrdiff_t type and the pointer will be calculated incorrectly.

Here is another example of code that is valid in a 32-bit version and invalid in a 64-bit one:

int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Invalid pointer value on 64-bit platform
printf("%i\n", *ptr); //Access violation on 64-bit platform

Let's trace how the expression "ptr + (A + B)" is calculated:
 • According to C++ rules, the variable A of int type is converted to the unsigned type.
 • A and B are summed up. As a result we get the value 0xFFFFFFFF of unsigned type.

Then the expression "ptr + 0xFFFFFFFFu" is calculated, but its result depends on the size of a pointer on the given architecture. If the addition is performed in a 32-bit program, the expression will be equivalent to "ptr - 1" and the number 3 will be printed.

In a 64-bit program the value 0xFFFFFFFFu will be added to the pointer, and as a result the pointer will be far beyond the array's limits.

3.10. Incorrect indexing of large arrays
In C and later in C++ programming, the practice developed of using variables of int and unsigned types as indexes for working with arrays. But time passes and everything changes. And now it's high time to say: "Stop doing it! Use only memsize types for indexing large arrays." An example of incorrect code using the unsigned type:

unsigned Index = 0;
while (MyBigNumberField[Index] != id)
  Index++;

This code cannot process an array containing more than UINT_MAX items in a 64-bit program. After the item with index UINT_MAX is accessed, the Index variable will wrap around to 0 and we will get an infinite loop.

We would like to remind Windows developers once more that the long type remains 32-bit in 64-bit Windows. That's why the advice given to Unix developers to use the long type for long loops does not apply there.

3.11. Mixed use of simple integer types and memsize types
Mixed use of memsize and non-memsize types in expressions can cause incorrect results on 64-bit systems, because the range of input values changes. Let's consider some examples:

size_t Count = BigValue;
for (unsigned Index = 0; Index != Count; ++Index)
{ ... }

This is an example of an infinite loop if Count > UINT_MAX. Assume that on 32-bit systems this code worked with fewer than UINT_MAX iterations. But a 64-bit version of the program can process more data and may need more iterations. Since the values of the Index variable lie in the range [0..UINT_MAX], the condition "Index != Count" will never become false, which causes an infinite loop.
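A minimal sketch of the corrected pattern for sections 3.10 and 3.11, using a hypothetical FindIndex helper: both the index and the bound use the memsize type std::size_t, so the index can reach any element count that a 64-bit program can hold, and the condition index != count always eventually becomes false. The sketch assumes C++14 (a loop inside a constexpr function).

```cpp
#include <cstddef>

// Returns the position of the first element equal to id, or count if there is none.
// index has the same memsize type as count, so it never wraps around before
// reaching count, even when count exceeds UINT_MAX on a 64-bit system.
constexpr std::size_t FindIndex(const int *data, std::size_t count, int id) {
  for (std::size_t index = 0; index != count; ++index) {
    if (data[index] == id) {
      return index;
    }
  }
  return count;
}
```

The same choice fixes the while loop in 3.10: declaring Index as size_t removes the wrap-around at UINT_MAX.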
Here is a short piece of code showing that inaccurate expressions with mixed types can be dangerous (the results were obtained with Microsoft Visual C++ 2005 in 64-bit compilation mode):

int x = 100000;
int y = 100000;
int z = 100000;
intptr_t size = 1;                  // Result:
intptr_t v1 = x * y * z;            // -1530494976
intptr_t v2 = intptr_t(x) * y * z;  // 1000000000000000
intptr_t v3 = x * y * intptr_t(z);  // 141006540800000
intptr_t v4 = size * x * y * z;     // 1000000000000000
intptr_t v5 = x * y * z * size;     // -1530494976
intptr_t v6 = size * (x * y * z);   // -1530494976
intptr_t v7 = size * (x * y) * z;   // 141006540800000
intptr_t v8 = ((size * x) * y) * z; // 1000000000000000
intptr_t v9 = size * (x * (y * z)); // -1530494976

All the operands in such expressions must be converted to a type of larger size beforehand. Remember that an expression like

intptr_t v2 = intptr_t(x) * y * z;

doesn't guarantee a correct result at all. It guarantees only that the expression "intptr_t(x) * y * z" will have the intptr_t type. The correct result shown by this expression in the example is nothing more than good luck.

3.12. Unsafe implicit type conversions at function calls
The danger of mixing memsize and non-memsize types is not limited to expressions. An example:

void foo(ptrdiff_t delta);
int i = -2;
unsigned k = 1;
foo(i + k);

We discussed such a situation above (see Incorrect pointer arithmetic). The incorrect result here occurs because of the implicit extension of a 32-bit actual argument to 64 bits at the moment of the function call.

3.13. Dangerous implicit type conversions when returning a value from a function
An unsafe implicit type conversion may also occur in a return statement. An example:

extern int Width, Height, Depth;
size_t GetIndex(int x, int y, int z) {
  return x + y * Width + z * Width * Height;
}
...
MyArray[GetIndex(x, y, z)] = 0.0f;

Although we return a value of the size_t type, the expression "x + y * Width + z * Width * Height" is calculated using the int type. When working with large arrays (more than INT_MAX items), this code will behave incorrectly and we will address items of MyArray other than those we intended.

3.14. Exceptions
Throwing and catching exceptions of integer types is not good programming practice in C++. You should use more informative types for this purpose, for example classes derived from std::exception. But sometimes you have to work with lower-quality code, as in this example:

char *ptr1;
char *ptr2;
try {
  try {
    throw ptr2 - ptr1;
  }
  catch (int) {
    std::cout << "catch 1: on x86" << std::endl;
  }
}
catch (ptrdiff_t) {
  std::cout << "catch 2: on x64" << std::endl;
}

You should be very careful and avoid throwing and catching exceptions of memsize types, as this can change the program's logic.

3.15. Explicit type conversions
Be careful with explicit type conversions. They may change the program's execution logic when type sizes change, or cause the loss of significant bits. It is difficult to illustrate errors related to explicit type conversion with examples, as they vary greatly and are specific to different programs. You have already seen some of these errors above. But on the whole it is useful to review all explicit type conversions in which memsize types are used.
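When reviewing explicit conversions that involve memsize types, one option is to replace silently truncating casts with a checked conversion. The sketch below is one possible approach, not a feature of any of the analyzers compared here; CheckedToUnsigned is a hypothetical helper that throws std::out_of_range instead of silently dropping the high-order bits of a size_t.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Converts a memsize value to unsigned only if it fits; otherwise throws
// std::out_of_range instead of silently discarding the high-order bits.
constexpr unsigned CheckedToUnsigned(std::size_t value) {
  return value <= std::numeric_limits<unsigned>::max()
             ? static_cast<unsigned>(value)
             : throw std::out_of_range("size_t value does not fit in unsigned");
}
```

For example, on a 64-bit target CheckedToUnsigned(std::size_t{1} << 32) throws, whereas (unsigned)(std::size_t{1} << 32) silently yields 0.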
3.16. Overloaded functions
When 32-bit programs are ported to a 64-bit platform, their logic may change because of the use of overloaded functions. If a function is overloaded for 32-bit and 64-bit values, a call with an argument of memsize type will be resolved to different overloads on different systems. Such a change in the program's logic may be dangerous. An example of this is saving data to and reading it from a file by means of a set of functions like:

class CMyFile {
  ...
  void Write(__int32 &value);
  void Write(__int64 &value);
};
CMyFile object;
SSIZE_T value;
object.Write(value);

Depending on the compilation mode (32- or 64-bit), this code will write a different number of bytes to the file, which may break the compatibility of file formats.

3.17. Bit fields
If you use bit fields, you should take into account that using memsize types will change the sizes and alignment of structures. But that's not all. Let's consider a peculiar example:

struct BitFieldStruct {
  unsigned short a:15;
  unsigned short b:13;
};
BitFieldStruct obj;
obj.a = 0x4000;
size_t addr = obj.a << 17; //Sign Extension
printf("addr 0x%Ix\n", addr);
//Output on 32-bit system: 0x80000000
//Output on 64-bit system: 0xffffffff80000000

Note that if you compile this code for a 64-bit system, you will get sign extension in the expression "addr = obj.a << 17;" even though both addr and obj.a are unsigned. This sign extension is determined by the type conversion rules, which work in the following way:
1) The obj.a member of the structure is converted from a bit field of unsigned short type to the int type. We get int rather than unsigned int because the 15-bit field fits into a 32-bit signed integer.
2) The expression "obj.a << 17" has the int type, but it is converted to ptrdiff_t and then to size_t before it is assigned to the addr variable. Sign extension occurs at the moment of the conversion from int to ptrdiff_t.

3.18. Use of explicitly specified values when calculating offsets inside structures
It can be very dangerous to calculate the addresses of fields inside structures manually. Such actions often result in incorrect code. Diagnosis of this type of error is present in the C++Test analyzer, but unfortunately it is poorly described.

3.19. Use of the long type
The use of the long type in cross-platform code is theoretically always dangerous when porting code from a 32-bit to a 64-bit system. This is because long has different sizes in the two most popular data models, LP64 and LLP64. This kind of check searches for all uses of long in the application's code.

3.20. Use of macros preventing the compiler from checking types
This check is implemented in C++Test but not in Viva64 and PC-Lint; however, those analyzers expand all macros, so the full check is carried out anyway. That's why let's consider that this type of check is implemented in Viva64 and PC-Lint too.

3.21. Overflow of arrays with explicitly defined size
Sometimes you may find an array overflow that will occur when porting to a 64-bit architecture. For example:

struct A { long n, m; };
void foo(const struct A *p) {
  static char buf[ 8 ]; // should have used sizeof
  memcpy(buf, p, sizeof( struct A )); //Overflow
  ...
}

Under the LP64 data model struct A grows from 8 to 16 bytes, so memcpy writes past the end of buf.

4. Efficiency of static analyzers
It is difficult to speak about the efficiency of static analyzers.
Certainly, static analysis is a very useful methodology: it allows you to detect more errors already at the stage of writing the code, which significantly reduces the time spent on debugging and testing.

But you should remember that static code analysis will never help you detect all errors, even in the specific area of 64-bit code analysis. Let's list the main reasons:
1. Some elements of the C++ language are difficult to analyze. First of all, this refers to the code of template (generic) classes, as they work with different data types using the same constructions.
2. Errors occurring when porting a 32-bit program to a 64-bit system may be not only in the code itself but may also appear indirectly. A good example is the stack size, which by default doesn't change and equals 1 MB in Visual Studio 2005/2008 when building a 64-bit version of a project. At run time, 64-bit code may use much more stack than 32-bit code. This is related to the growth of the sizes of pointers and other objects and to different alignment. As a result, the 64-bit version of a program may suddenly run out of stack.
3. There are algorithmic errors caused by assumptions about type sizes, which change on a 64-bit system.
4. External libraries may also contain errors.

This list is not complete, but it allows us to state that some errors can be detected only when the program is running. In other words, we need load testing of applications, dynamic analysis systems (for example, Compuware BoundsChecker), unit testing, manual testing, etc.

Thus, only a comprehensive approach using different strategies and tools can guarantee good quality of a 64-bit program.

You should also understand that the limitations listed above by no means reduce the value of static analysis. Static analysis is the most efficient method of detecting errors when porting 32-bit code to 64-bit systems. It allows you to detect most errors in a rather short time. The advantages of static analysis are as follows:
1. The possibility to check all code branches, regardless of how often they are executed in real conditions.
2. The possibility to perform the check already at the stage of migration or development of the code. It allows you to correct a lot of errors before testing and debugging, which saves a lot of resources and time. It is commonly known that the earlier an error is detected, the cheaper it is to correct.
3. A static analyzer can detect unsafe constructions that a programmer considers correct because they are valid on 32-bit systems.
4. Static analysis allows you to evaluate the quality of the code from the point of view of its correctness for 64-bit systems, and thus to plan the work in the best way.

5. Conclusion
The specialized analyzer Viva64 is the leader in diagnosing 64-bit code for Windows. First of all, this is because of its orientation toward the LLP64 data model, and also because it implements new specific diagnostic rules [1].

For diagnosing 64-bit code on operating systems of the Unix family, preference should be given to the general-purpose analyzer PC-Lint. Its leadership cannot be judged from Table 3, but it implements more important rules than C++Test.

References
1. Evgeniy Ryzhkov. Viva64: working up of 64-bit applications.
2. Andrey Karpov. Forgotten problems of developing 64-bit programs.
3. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform.
4. Andrey Karpov. Problems of testing 64-bit applications.
5. Andrey Karpov, Evgeniy Ryzhkov. Development of resource-intensive applications in Visual C++ environment.