The reasons why 64-bit programs require more stack memory

Author: Andrey Karpov
Date: 07.06.2010

In forums, people often say that 64-bit versions of programs consume more memory and more stack. The usual argument is that data sizes have doubled. This claim is unfounded, because most C/C++ types (char, short, int, float) keep the same size on 64-bit systems. The size of a pointer has indeed grown, but pointers make up only a small part of the data in a typical program. The reasons why programs consume more memory are more complex, so I decided to investigate the issue in detail.

In this post I will talk about the stack; in future posts I plan to discuss memory allocation and binary code size. I should also note right away that the article covers the C/C++ language and the Microsoft Visual Studio development environment.

Until recently, I believed that 64-bit code could not consume the stack more than twice as fast as 32-bit code. Relying on this assumption, I recommended in my articles doubling the program stack just in case. But now I have discovered an unpleasant fact: stack consumption can grow by much more than a factor of two. I was astonished, since I had considered doubling to be the worst case. The reason for my unfounded hopes will become clear a bit later. But first, let's see how parameters are passed when functions are called in a 64-bit program.

When designing the calling conventions for the x86-64 architecture, the developers decided to put an end to the variety of function call conventions. Win32 had a wide range of them: stdcall, cdecl, fastcall, thiscall, and so on. Win64 has only one "native" calling convention, and modifiers like __cdecl are ignored by the compiler.
Everybody will probably agree that pruning the calling conventions like this is a good thing.

The calling convention on the x86-64 platform resembles the fastcall convention of x86. In the x64 convention, the first four integer arguments (left to right) are passed in 64-bit registers chosen specifically for this purpose:

RCX: the 1st integer argument
RDX: the 2nd integer argument
R8: the 3rd integer argument
R9: the 4th integer argument

The remaining integer arguments are passed on the stack. The "this" pointer is treated as an integer argument, so it is always placed in RCX. Floating-point values among the first four arguments are passed in the registers XMM0-XMM3, and all subsequent ones are passed on the stack. Strictly speaking, the register is selected by the argument's position rather than by its number among arguments of the same type: in f(int a, double b, int c), for example, a goes to RCX, b to XMM1 and c to R8.

Relying on this information, I concluded that in many cases a 64-bit program could use less stack memory than a 32-bit one. If the parameters are passed in registers, the function body is short and there is no need to save the arguments in memory (on the stack), then stack consumption should be smaller. But it is not so.

Although arguments can be passed in registers, the compiler still reserves space for them on the stack by decreasing the value of the RSP register (the stack pointer). For every call, at least 32 bytes must be reserved on the stack (four 64-bit slots corresponding to RCX, RDX, R8 and R9), even if the called function takes fewer parameters. This space lets the called function easily save the contents of the registers it received. The called function is not required to store its register parameters there, but the reserved space allows it to do so if necessary. If more than four parameters are passed, additional stack space must be reserved for them.

Let's consider an example. A function passes two integer parameters to a child function. The compiler places the argument values in RCX and RDX and also subtracts 32 bytes from RSP. The called function can access the parameters through RCX and RDX. If its code needs these registers for some other purpose, it can copy their contents into the reserved 32-byte area on the stack.

This makes the stack run out significantly faster. Even if a function has no parameters, 32 bytes are "bitten off" the stack anyway and never used. I failed to find the reason for such a wasteful mechanism. I came across some explanations about uniformity and easier debugging, but they were too vague.

Note one more thing: the stack pointer RSP must be aligned on a 16-byte boundary before each function call.
Taking the 16-byte alignment into account, the total stack space used when calling a function without parameters in 64-bit code is 8 (return address) + 8 (alignment padding) + 32 (reserved space for arguments) = 48 bytes!

Let's see what this can lead to in practice. Here and below, I use Visual Studio 2010 for my experiments. Let's write a recursive function like this:

    #include <cstddef>
    #include <iostream>
    using namespace std;  // the snippets use unqualified cout and endl

    void StackUse(size_t *depth)
    {
      volatile size_t *ptr = 0;
      if (depth != NULL)
        ptr = depth;
      cout << *ptr << endl;
      (*ptr)++;
      StackUse(depth);
      (*ptr)--;
    }

The function is deliberately a bit convoluted so that the optimizer cannot turn it into "nothing". The key point is that it has one pointer-type argument and one local variable, also of pointer type. Let's see how much stack the function consumes in the 32-bit and 64-bit versions, and how many times it can be called recursively when the stack size is 1 MB (the default).
Release 32-bit: the last displayed number (stack depth) is 51331. The compiler uses 20 bytes per call of this function.
Release 64-bit: the last displayed number is 21288. The compiler uses 48 bytes per call of this function.

Thus, the 64-bit version of the StackUse function is more than twice as stack-hungry as the 32-bit one.

Note that changes in data alignment rules can also affect stack consumption. Suppose the function takes the following structure as an argument:

    struct S
    {
      char a;
      size_t b;
      char c;
    };
    void StackUse(S s) { ... }

When recompiled for 64 bits, the S structure grows from 12 to 24 bytes. The b member grows from 4 to 8 bytes, and the alignment padding after a and after c grows from 3 to 7 bytes each. The structure is passed into the function by value, so it will also take twice as much memory on the stack.

Is it all that bad? No. Don't forget that the 64-bit compiler has more registers at its disposal than the 32-bit one. Let's make the experimental function more complex:

    void StackUse(size_t *depth, char a, int b)
    {
      volatile size_t *ptr = 0;
      int c = 1;
      int d = -1;
      for (int i = 0; i < b; i++)
        for (char j = 0; j < a; j++)
          for (char k = 0; k < 5; k++)
            if (*depth > 10 && k > 2)
            {
              c += j * k - i;
              d -= (i - j) * c;
            }
      if (depth != NULL)
        ptr = depth;
      cout << c << " " << d << " " << *ptr << endl;
      (*ptr)++;
      StackUse(depth, a, b);
      (*ptr)--;
    }

Here are the results of its execution:

Release 32-bit: the last displayed number is 16060. This time the compiler uses 64 bytes per call of this function.
Release 64-bit: the last displayed number is 21310. The compiler still uses 48 bytes per call of this function.

The 64-bit compiler managed to use the additional registers in this sample and generate more efficient code, which reduced the amount of stack memory consumed!

Conclusions

1. One cannot predict how much stack memory the 64-bit version of a program will consume compared to the 32-bit one. It may be less (unlikely) or much more.
2. For a 64-bit program, you should increase the reserved stack size 2-3 times; 3 times is better, just to be on the safe side. To do this, see the Stack Reserve Size parameter (the /STACK:reserve switch) in the project settings. By default, the stack size is 1 MB.
3. You should not worry if your 64-bit program consumes more stack memory. 64-bit systems have much more physical memory. A 2 MB stack on a 64-bit system with 8 GB of memory takes a smaller share of memory than a 1 MB stack on a 32-bit system with 2 GB.
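The comparison in the last conclusion is easy to verify, taking 1 GB = 1024 MB:

```latex
\frac{2\,\text{MB}}{8\,\text{GB}} = \frac{2}{8192} \approx 0.024\,\%,
\qquad
\frac{1\,\text{MB}}{2\,\text{GB}} = \frac{1}{2048} \approx 0.049\,\%
```

So the doubled stack on the 8 GB machine takes half the share of memory that the 1 MB stack takes on the 2 GB one.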
