Optimization of 64-bit programs



This article considers some ways of increasing the performance of 64-bit Windows applications.


Author: Andrey Karpov
Date: 12.10.2008

Introduction

People often have questions about the performance of 64-bit solutions and about ways to increase it. This article considers some questionable points and then gives some recommendations on program code optimization.

1. The result of porting to 64-bit systems

In a 64-bit environment, old 32-bit applications run by means of the Wow64 subsystem. This subsystem emulates a 32-bit environment through an additional layer between a 32-bit application and the 64-bit Windows API. In some places this layer is thin, in others it is thicker. For an average program the performance loss caused by this layer is about 2%. For some programs this value may be larger. 2% is certainly not much, but we still have to take into account that 32-bit applications run a bit slower under a 64-bit operating system than under a 32-bit one.

Compiling 64-bit code not only eliminates Wow64 but also increases performance. This is related to architectural changes in microprocessors, such as the increased number of general-purpose registers. For an average program the expected performance gain from a plain recompilation is 5-15%. But here everything depends on the application and data types. For instance, Adobe claims that the new 64-bit "Photoshop CS4" is 12% faster than its 32-bit version.

Some programs dealing with large data arrays may greatly increase their performance when the address space is expanded. The ability to store all the necessary data in RAM eliminates slow data swapping operations. In this case the performance increase can be measured in multiples rather than in percent.

Here we can consider the following example: Alfa Bank integrated an Itanium 2-based platform into its IT infrastructure.
The growth of the bank's investment business meant that the existing system could no longer cope with the increasing workload: delays in serving users reached a critical level. Analysis showed that the system's bottleneck was not processor performance but the limitation of the 32-bit architecture in the memory subsystem, which does not allow efficient use of more than 4 GB of the server's address space. The database itself was larger than 9 GB. Its intensive usage resulted in a critical load on the input-output subsystem. Alfa Bank decided to purchase a cluster of two four-processor Itanium 2-based servers with 12 GB of RAM. This decision made it possible to ensure the necessary level of system performance and fault tolerance. According to company representatives, the introduction of the Itanium 2-based servers made it possible to eliminate the problems and cut costs.

2. Program code optimization

We can consider optimization at three levels: optimization of microprocessor instructions, code optimization at the level of high-level languages, and algorithmic optimization (which takes into account the peculiarities of 64-bit systems). The first is available when we use development tools such as an assembler and is too specific to interest a wide audience. For those interested in this topic we can recommend "Software Optimization Guide for AMD64 Processors" [2], an AMD guide to application optimization for the 64-bit architecture. Algorithmic optimization is unique to every task, and its consideration is beyond the scope of this article.

From the point of view of high-level languages such as C++, optimization for the 64-bit architecture depends on the choice of optimal data types. Using homogeneous 64-bit data types allows the optimizing compiler to build simpler and more efficient code, as there is no need to convert between 32-bit and 64-bit data so often. Primarily this applies to variables used as loop counters, array indexes, and variables storing various sizes. Traditionally we use types such as int, unsigned and long for these purposes. On 64-bit Windows systems, which use the LLP64 [3] data model, these types remain 32-bit. In a number of cases this results in less efficient code because of additional conversions. For instance, to compute the address of an array element in 64-bit code, the 32-bit index must first be converted to a 64-bit one.

Types such as ptrdiff_t and size_t are more effective, as they have the optimal size for representing indexes and counters. On 32-bit systems they are 32 bits wide, on 64-bit systems 64 bits wide (see Table 1).

Table 1. Data type sizes in 32-bit and 64-bit versions of the Windows operating system.
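
The type widths discussed above can be checked directly on a given compiler and platform. The following is only a minimal sketch: the function names SumWithSizeT and PrintTypeSizes are hypothetical and introduced here for illustration, and the printed values depend on the data model in use (LLP64 on 64-bit Windows).

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sums the elements using a std::size_t index. The index already has
// pointer width, so 64-bit code needs no widening step before the index
// is used to compute an element address.
long long SumWithSizeT(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i != values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Prints the width in bits of the types mentioned in the text.
// The results are platform-dependent; nothing here is Windows-specific.
void PrintTypeSizes() {
    std::printf("int       : %zu bits\n", sizeof(int) * CHAR_BIT);
    std::printf("unsigned  : %zu bits\n", sizeof(unsigned) * CHAR_BIT);
    std::printf("long      : %zu bits\n", sizeof(long) * CHAR_BIT);
    std::printf("size_t    : %zu bits\n", sizeof(std::size_t) * CHAR_BIT);
    std::printf("ptrdiff_t : %zu bits\n", sizeof(std::ptrdiff_t) * CHAR_BIT);
    std::printf("void*     : %zu bits\n", sizeof(void*) * CHAR_BIT);
}
```

Calling PrintTypeSizes() from a 32-bit and from a 64-bit build of the same project is a quick way to see which types change width between the two.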

Using ptrdiff_t, size_t and types derived from them can speed up program code by up to 30%. You can study an example of such optimization in the article "Development of resource-intensive applications in Visual C++ environment" [4]. An additional advantage is more reliable code. Using 64-bit variables as indexes avoids overflows when dealing with large arrays of several billion elements.

Changing data types is not an easy task, even more so when the change is really necessary. We bring forward the Viva64 static code analyzer as a tool meant to simplify this process. Although it specializes in finding errors in 64-bit code, following its recommendations on changing data types can considerably increase code performance.

3. Memory usage decrease

After a program is compiled in 64-bit mode, it starts consuming more memory than its 32-bit variant. Often this increase is almost imperceptible, but sometimes memory consumption doubles. This is due to the following reasons:

  • Larger memory size needed to store certain objects, for instance pointers;
  • Changes in the rules of data alignment in structures;
  • Increased stack memory consumption.

One can often put up with the increase in RAM consumption. The advantage of 64-bit systems is precisely that the amount of this memory is rather large. There is nothing bad in the fact that a program took 300 MB on a 32-bit system with 2 GB of memory but takes 400 MB on a 64-bit system with 8 GB of memory. In relative terms, on the 64-bit system the program occupies a share of the available physical memory that is three times smaller. There is no sense in fighting this growth in memory consumption. It is easier to add some memory.

But the increase in consumed memory has one disadvantage.
This increase causes a loss of performance. Although 64-bit program code runs faster, fetching large amounts of data from memory can cancel out all the advantages and even decrease performance. Data transfer between memory and the microprocessor (cache) is not a cheap operation.

Let us assume we have a program which processes a large amount of text data (up to 400 MB). It creates an array of pointers, each pointing to the next word in the processed text. Let the average word length be 5 characters. Then the program will require about 80 million pointers. So the 32-bit variant of the program will require 400 MB + (80 million * 4 bytes) = 720 MB of memory. The 64-bit version will require 400 MB + (80 million * 8 bytes) = 1040 MB of memory. This is a considerable increase which may adversely affect program performance. And if there is no need to process gigabyte-sized texts, the chosen data structure is wasteful. Using unsigned-type indexes instead of pointers may be seen as a simple and effective solution to the problem. In this case the memory consumed is again 720 MB.

A considerable amount of memory can be wasted because of the changed data alignment rules. Let us consider an example:

struct MyStruct1
{
  char m_c;
  void *m_p;
  int m_i;
};

The structure size is 12 bytes in a 32-bit program and 24 bytes in a 64-bit one, which is not thrifty. But we can improve the situation by changing the order of the elements in the following way:

struct MyStruct2
{
  void *m_p;
  int m_i;
  char m_c;
};

The size of MyStruct2 is still 12 bytes in a 32-bit program, while in a 64-bit program it is only 16 bytes. At the same time, in terms of data access efficiency the MyStruct1 and MyStruct2 structures are equivalent. Picture 1 is a visual representation of how the structure elements are distributed in memory.
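
The sizes quoted for the two structures can be verified with sizeof and offsetof. This is a minimal sketch assuming the compiler's default alignment settings; the helper PrintLayouts is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <cstdio>

// Original member order: padding follows m_c (to align the pointer)
// and follows m_i (to round the size up to the pointer alignment).
struct MyStruct1 {
    char m_c;
    void* m_p;
    int m_i;
};

// Members ordered by decreasing size: only tail padding remains.
struct MyStruct2 {
    void* m_p;
    int m_i;
    char m_c;
};

// Prints the total size and member offsets of both structures.
void PrintLayouts() {
    std::printf("MyStruct1: size=%zu m_c@%zu m_p@%zu m_i@%zu\n",
                sizeof(MyStruct1), offsetof(MyStruct1, m_c),
                offsetof(MyStruct1, m_p), offsetof(MyStruct1, m_i));
    std::printf("MyStruct2: size=%zu m_p@%zu m_i@%zu m_c@%zu\n",
                sizeof(MyStruct2), offsetof(MyStruct2, m_p),
                offsetof(MyStruct2, m_i), offsetof(MyStruct2, m_c));
}
```

With default alignment on a typical 64-bit build, the reported sizes should match the 24 and 16 bytes given in the text. Packing pragmas or unusual ABIs may give different numbers.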

Picture 1.

It is not easy to give clear instructions on the order of elements in structures. But the common recommendation is this: objects should be arranged in order of decreasing size.

The last point is the growth of stack memory consumption. Storing larger return addresses and data alignment increase the stack size. Optimizing them makes no sense. A sensible developer would never create megabyte-sized objects on the stack. Remember that when porting a 32-bit program to a 64-bit system you should change the stack size in the project settings. For instance, you can double it. By default, a 32-bit application as well as a 64-bit one is assigned a 1 MB stack. This may turn out to be insufficient, so a safety margin makes sense.

Conclusion

The author hopes that this article will help in the efficient development of 64-bit solutions and invites you to visit www.viva64.com to learn more about 64-bit technologies. There you can find many articles devoted to the development, testing and optimization of 64-bit applications. We wish you the best of luck in developing your 64-bit projects.

References

  1. Valentin Sedykh. Russian 64 bit: let's dot all the "i"s. http://www.viva64.com/go.php?url=151
  2. Software Optimization Guide for AMD64 Processors. http://www.viva64.com/go.php?url=59
  3. Blog "The Old New Thing": "Why did the Win64 team choose the LLP64 model?" http://www.viva64.com/go.php?url=25
  4. Andrey Karpov, Evgeniy Ryzhkov. Development of Resource-intensive Applications in Visual C++. http://www.viva64.com/art-1-2-2014169752.html