AMD64 (EM64T) architecture

Authors: Evgeniy Ryzhkov, Andrey Karpov

Date: 02.10.2008

Abstract

The article briefly describes the AMD64 architecture developed by AMD and its implementation by Intel, EM64T. The architecture's peculiarities, advantages and disadvantages are described.

Introduction

As the tasks solved on computers grow more complex, they place ever greater demands on the hardware they run on. The requirements for personal-class computer systems have been growing year after year for 20 years already. This happens because people want to solve increasingly complex tasks on their personal computers, tasks that earlier could be solved only on high-performance mainframes.

What are these requirements? Of course, they concern main-memory size and processor performance (not to be confused with clock frequency!). The IA-32 architecture (Intel Architecture 32), which dominated during the last decade, offers 4 GB (2^32 bytes) of addressable memory, of which usually only 2 GB is available to an application. It also offers various register blocks and a set of tricks, such as the branch prediction unit, intended to increase system performance without increasing such an abstract parameter as processor frequency. The memory needs of modern personal-computer tasks are approaching 2 GB, while further increases in processor frequency no longer deliver a matching gain in performance.

The newly developed 64-bit architectures SPARC64 and Intel Itanium can, to some extent, solve the problem of the limitations of modern 32-bit computers. But they are intended for high-end systems and are not available as cheap solutions. It is the AMD64 architecture by AMD and its implementation EM64T by Intel that are set to become truly popular. These architectures are twins, and programs compiled for one of them can also be run on the other. But it is AMD's solution that appeared first historically; EM64T is essentially Intel's implementation of AMD64.
The AMD64 architecture is now implemented in processors of all classes: mobile, workstation and server. Despite the evident advantages of the AMD64 platform (described in detail in this article), it does not introduce anything revolutionary into computing. The transition from 32 bits to 64 bits has not brought qualitative improvements, whereas the earlier transition from 16 bits to 32 bits significantly increased system safety and performance.

1. AMD64 architecture

The AMD64 architecture is fully described in five volumes of documentation provided by AMD. This chapter gives a brief description based on the first volume [2]. Note that the official documentation calls this architecture AMD x86-64, which underlines its backward compatibility.
1.1. The architecture's description

AMD x86-64 is a simple but powerful backward-compatible extension of the legacy industry-standard x86 architecture. It adds a 64-bit address space and extends the register resources to deliver higher performance for recompiled 64-bit programs. It still supports legacy 16-bit and 32-bit application and operating-system code without modification or recompilation.

The need for a 64-bit x86 architecture comes from applications that require a large address space: high-performance servers, database management systems, CAD systems and, of course, games. Such applications benefit from both the 64-bit address space and the larger number of registers. The few registers available in the legacy x86 architecture limit performance in computational tasks. The additional registers provide sufficient performance for most applications.

The x86-64 architecture introduces two new features:
1. Extended registers (Picture 1):
 • 8 new general-purpose registers;
 • all 16 general-purpose registers are 64-bit;
 • 8 new 128-bit XMM registers;
 • a new instruction prefix (REX) for accessing the extended registers.
2. A special mode, "Long Mode" (see Table 1):
 • up to 64-bit virtual addresses;
 • a 64-bit instruction pointer (RIP);
 • a flat address space.
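To make the role of the REX prefix more concrete, here is a minimal C sketch. It is not taken from the AMD manuals, and the function name encode_mov_r64_r64 and the register enum are illustrative assumptions. The sketch encodes the register-to-register form of MOV r/m64, r64 (opcode 0x89). The REX prefix has the bit layout 0100WRXB. W selects the 64-bit operand size. R and B supply the fourth bit of the register numbers in the ModRM reg and r/m fields, and that extra bit is what makes R8-R15 reachable.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in instruction encoding; R8-R15 need a 4th bit. */
enum reg64 {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15
};

/* Hypothetical helper: encode "mov dst, src" for two 64-bit registers
   using the MOV r/m64, r64 form (opcode 0x89). Writes 3 bytes to out. */
static void encode_mov_r64_r64(enum reg64 dst, enum reg64 src, uint8_t out[3]) {
    assert((unsigned)dst < 16u && (unsigned)src < 16u);

    uint8_t rex = 0x40;                       /* fixed 0100 high nibble */
    rex |= 0x08;                              /* REX.W: 64-bit operand size */
    rex |= (uint8_t)(((src >> 3) & 1) << 2);  /* REX.R: 4th bit of ModRM.reg (src) */
    rex |= (uint8_t)((dst >> 3) & 1);         /* REX.B: 4th bit of ModRM.r/m (dst) */

    out[0] = rex;
    out[1] = 0x89;                            /* MOV r/m64, r64 */
    /* ModRM: mod = 11 (register operand), reg = src, r/m = dst (low 3 bits). */
    out[2] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
}
```

With this sketch, mov rax, rbx encodes as 48 89 D8. The instruction mov r8, r9 needs both REX.R and REX.B and encodes as 4D 89 C8.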
Table 1. Processor operating modes.

Table 2 compares the register and stack resources available to an application in different modes. The left columns show the resources of the legacy x86 architecture, which are the only resources available in compatibility mode. The right columns show the resources available in 64-bit mode. The differences between the modes are shaded grey.
Table 2. Registers and stack available in different modes.

As shown in Table 2, the legacy x86 architecture (in x86-64 this mode is called legacy mode) supports 8 general-purpose registers. In practice, however, only 4 of them are used freely: EAX, EBX, ECX and EDX. The registers EBP, ESI, EDI and ESP have special purposes. The x86-64 architecture adds 8 more general-purpose registers and widens the registers from 32 to 64 bits. This allows compilers to produce faster code. A 64-bit compiler can keep more variables in registers and perform operations directly on general-purpose registers, which minimizes memory accesses.

The x86-64 architecture supports the whole x86 instruction set and adds some new instructions to support long mode. The instructions are divided into several subsets:
 • General-purpose instructions. These are the main x86 integer instructions used in all programs. Most of them load, store and process data located in general-purpose registers or in memory. Some of them control the flow of execution, transferring control from one part of a program to another.
 • 128-bit media instructions. These are the SSE and SSE2 (Streaming SIMD Extensions) instructions, which load, store and process data located in the 128-bit XMM registers. They perform integer and floating-point operations on vector (packed) and scalar data types. A vector instruction performs the same operation on several data elements independently, so these are called single-instruction, multiple-data (SIMD) instructions. They are used in media and scientific applications to process blocks of data.
 • 64-bit media instructions. These are the multimedia extension (MMX) and 3DNow! instructions. They load, store and process data located in the 64-bit MMX registers. Like the 128-bit instructions described above, they perform integer and floating-point operations on vector (packed) and scalar data.
 • x87 instructions. These are intended for floating-point computations in legacy x87 applications. They process data in the x87 registers.

Some instructions bridge two or more of the subsets described above. Examples are the instructions that transfer data between general-purpose registers and XMM or MMX registers.

Let's consider in detail the x86-64 operating modes shown in Table 1. In most cases the default address and operand sizes can be overridden by an instruction prefix.

Let's start with long mode. It is an extension of the legacy protected mode and consists of two submodes: 64-bit mode and compatibility mode. 64-bit mode supports all the new features and register extensions introduced in x86-64. Compatibility mode provides binary compatibility with existing 16-bit and 32-bit code.
Long mode supports neither the legacy real mode nor the legacy virtual-8086 mode, and it does not support hardware task switching.

Since 64-bit mode supports a 64-bit address space, it requires a new 64-bit operating system. At the same time, existing applications can run without recompilation in compatibility mode under an OS working in 64-bit mode. Instruction addressing in 64-bit mode uses a 64-bit instruction pointer (RIP). It also uses a new addressing model with a single flat address space for code, stack and data.

64-bit mode supports the extended registers through a new group of instruction prefixes called REX. In 64-bit mode the default address size is 64 bits, although implementations of x86-64 may support fewer virtual-address bits. The default operand size is 32 bits; for most instructions it can be overridden with a REX prefix.

64-bit mode provides data addressing relative to the 64-bit RIP register. The x86 architecture allowed addressing relative to the IP register only in control transfer instructions. RIP-relative addressing makes position-independent code, and code that accesses global data, more efficient.

Some opcodes were redefined to support the extended registers and 64-bit addressing.

Compatibility mode is intended for running existing 16-bit and 32-bit programs under a 64-bit OS. In compatibility mode, applications run with a 32-bit or 16-bit address space and can
access up to 4 GB of virtual address space. Instruction prefixes can switch between 16-bit and 32-bit address and operand sizes.

From the application's viewpoint, compatibility mode looks like the legacy x86 protected mode. From the OS viewpoint (address translation, interrupt and exception handling), however, the 64-bit mechanisms are used.

Legacy mode provides binary compatibility not only with 16-bit and 32-bit applications but also with 16-bit and 32-bit operating systems. It includes three modes:
 • Protected mode. Supports 16-bit and 32-bit programs with segmented memory, privilege levels and virtual memory. The address space is 4 GB.
 • Virtual-8086 mode. Supports 16-bit applications running as tasks under protected mode. The address space is 1 MB.
 • Real mode. Supports 16-bit programs with simple register-based addressing of segmented memory. Virtual memory and privilege levels are not supported. 1 MB of memory is available.

Legacy mode is used only when a 16-bit or 32-bit OS is running.

1.2. The architecture's advantages

Let's outline the main advantages of the AMD x86-64 architecture:
 • 64-bit address space;
 • extended register set;
 • an instruction set familiar to developers;
 • the ability to run legacy 32-bit applications under a 64-bit OS;
 • the ability to use a 32-bit OS.

1.3. The architecture's disadvantages

The new AMD x86-64 architecture has not introduced any crucial disadvantages compared with the 32-bit architecture. The only one worth noting is that programs need somewhat more memory, because addresses and operands are larger. However, this does not significantly affect code size or the amount of main memory required.

On the other hand, AMD x86-64 has not introduced anything significantly new, and it brings no dramatic performance gain. On average, you can expect a 5-15% performance gain after recompiling a program.

2. AMD64 program model

Nearly all modern operating systems now have versions for the AMD64 architecture. For example, Microsoft offers Windows XP 64-bit, Windows Server 2003 64-bit and Windows Vista 64-bit.
The leading UNIX system developers also provide 64-bit versions, for example Debian Linux 3.1 x86-64. But this does not mean that all the code of such a system is 64-bit. Some OS code and many applications can remain 32-bit, since AMD64 provides backward compatibility.

The 64-bit version of Windows, for example, uses a special subsystem, WoW64 (Windows-on-Windows 64), which translates the calls of 32-bit applications to the resources of the 64-bit OS. Let's consider in detail the AMD64 program model available to a programmer in 64-bit Windows [3, 4], called Win64 for short.
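Before turning to the details, note one property of the Win64 program model that matters to every C and C++ programmer: its data model. Win64 uses the LLP64 model. In LLP64, int and long stay 32-bit, while pointers and long long are 64-bit. 64-bit Unix-like systems such as Linux use LP64, where long is also 64-bit. The minimal sketch below does two things. It detects which model a build uses, and it shows why pointer-heavy structures grow. All names in it are illustrative assumptions, not part of any API.

```c
#include <stdint.h>

/* A typical pointer-heavy structure: one pointer plus one int. */
struct list_node {
    struct list_node *next;
    int value;
};

enum data_model { MODEL_OTHER, MODEL_ILP32, MODEL_LLP64, MODEL_LP64 };

/* Classify the current build by the sizes of int, long and pointers. */
static enum data_model detect_data_model(void) {
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return MODEL_ILP32;   /* classic 32-bit Windows and Unix */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return MODEL_LLP64;   /* Win64 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return MODEL_LP64;    /* 64-bit Linux and most Unix systems */
    return MODEL_OTHER;
}
```

On a Win64 build, detect_data_model() returns MODEL_LLP64. The same list_node takes 8 bytes in a 32-bit build but 16 bytes in a 64-bit build, because of the 8-byte pointer plus 4 bytes of alignment padding. This is the kind of growth behind the somewhat higher memory requirements mentioned among the architecture's disadvantages.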
Let's begin with the address space. A 64-bit processor can theoretically address 16 exabytes (2^64 bytes), but Win64 currently supports 16 terabytes (2^44 bytes). There are several reasons for this. Existing processors can address only 1 terabyte (2^40 bytes) of physical memory. The architecture (but not current hardware) allows this to be extended to 4 petabytes. In any case, mapping such a space would require a great deal of memory for the page tables that describe it (see Table 3).

Parameter | 32-bit mode | 64-bit mode
Total process address space | 4 GB | 16 TB
Address space available to a 32-bit process | 2 GB (3 GB if the system is booted with the /3GB switch) | 4 GB if the application is linked with the /LARGEADDRESSAWARE switch (2 GB otherwise)
Address space available to a 64-bit process | Impossible | 8 TB
Paged pool | 470 MB | 128 GB
Non-paged pool | 256 MB | 128 GB
System Page Table Entries (PTE) | 660 MB - 900 MB | 128 GB

Table 3. Main memory limitations in Windows

As in Win32, the addressable range is divided into user and system addresses. Each process receives 8 TB, and 8 TB remain for the system (compared with 2 GB and 2 GB in Win32). Different Windows versions have different limitations, shown in Table 4.

Windows version | 32-bit: physical memory, CPUs | 64-bit: physical memory, CPUs
Windows XP Home | 4 GB, 1 CPU | Not available
Windows XP Professional | 4 GB, 1-2 CPUs | 128 GB, 1-2 CPUs
Windows Server 2003, Standard | 4 GB, 1-4 CPUs | 32 GB, 1-4 CPUs
Windows Server 2003, Enterprise | 64 GB, 1-8 CPUs | 1 TB, 1-8 CPUs
Windows Server 2003, Datacenter | 64 GB, 8-32 CPUs | 1 TB, 8-64 CPUs
Windows Server 2008, Datacenter | 64 GB, 2-64 CPUs | 2 TB, 2-64 CPUs
Windows Server 2008, Enterprise | 64 GB, 1-8 CPUs | 2 TB, 1-8 CPUs
Windows Server 2008, Standard | 4 GB, 1-4 CPUs | 32 GB, 1-4 CPUs
Windows Server 2008, Web Server | 4 GB, 1-4 CPUs | 32 GB, 1-4 CPUs
Vista Home Basic | 4 GB, 1 CPU | 8 GB, 1 CPU
Vista Home Premium | 4 GB, 1-2 CPUs | 16 GB, 1-2 CPUs
Vista Business | 4 GB, 1-2 CPUs | 128 GB, 1-2 CPUs
Vista Enterprise | 4 GB, 1-2 CPUs | 128 GB, 1-2 CPUs
Vista Ultimate | 4 GB, 1-2 CPUs | 128 GB, 1-2 CPUs

Table 4. Limitations of different Windows versions

As in Win32, the page size is 4 KB.
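The arithmetic behind these figures can be checked with a minimal C sketch. The helper names and constants below are illustrative assumptions, not a Windows API. The sketch expresses the sizes as powers of two. It also estimates how much memory the last-level page-table entries alone would need to map a range, assuming 4 KB pages and 8-byte entries as on x86-64. Upper page-table levels and large pages are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define KB (UINT64_C(1) << 10)
#define GB (UINT64_C(1) << 30)
#define TB (UINT64_C(1) << 40)

/* 2^44 bytes: the whole range (user + system) that Win64 supports. */
static const uint64_t WIN64_ADDRESS_SPACE = UINT64_C(1) << 44;

/* Size of each half (user and system) of the Win64 range. */
static uint64_t win64_half(void) {
    return WIN64_ADDRESS_SPACE / 2;
}

/* Bytes of last-level page-table entries needed to map `range` bytes
   with pages of `page_size` bytes and entries of `pte_size` bytes. */
static uint64_t pte_bytes(uint64_t range, uint64_t page_size, uint64_t pte_size) {
    assert(page_size != 0 && range % page_size == 0);
    return range / page_size * pte_size;
}
```

For the 16 TB Win64 range this gives 2^44 / 2^12 = 2^32 pages. At 8 bytes per entry, that is 32 GB of entries, which illustrates why page-table overhead matters when choosing the supported range. For the 1 TB that current processors can address physically, the same calculation gives 2 GB.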
The first 64 KB of the address space is never mapped, so the lowest valid address is 0x10000. Unlike in Win32, system DLLs are loaded above 4 GB.

Processors implementing AMD64 support the No-Execute (NX) bit. Windows uses this bit to implement the hardware-based Data Execution Prevention (DEP) technology, which prohibits executing data as code. This increases program safety by neutralizing errors, such as buffer overflows, that would otherwise let a buffer of data be executed as code.

Because AMD64 has many registers, compilers can pass function parameters in registers instead of on the stack. This allowed the Win64 developers to get rid
of the variety of calling conventions. In Win32 you can use different conventions (ways of passing parameters): __stdcall, __cdecl, __fastcall, etc. In Win64 there is only one calling convention. Here is how the first four integer arguments are passed in registers:
 • RCX: first argument
 • RDX: second argument
 • R8: third argument
 • R9: fourth argument
Arguments beyond the first four are passed on the stack. Floating-point arguments in the first four positions are passed in XMM0-XMM3 instead, chosen by position. For example, for f(int a, double b, int c), a is passed in RCX, b in XMM1 and c in R8. In addition, the caller reserves 32 bytes of stack space for the four register arguments.

Because of the difference in calling conventions, 32-bit and 64-bit code cannot be mixed in one process. In other words, if an application is compiled for 64-bit mode, all the DLLs it uses must be 64-bit too.

When writing 64-bit code, you can obtain additional performance through special optimizations. This topic is considered in detail in the optimization guide [5].

3. Porting applications to AMD64

One of the purposes of high-level languages is to minimize the binding of program code to a particular architecture and provide the greatest possible portability between hardware platforms. For example, correctly written C++ programs are theoretically independent of the hardware platform. Ideally, to build the corresponding 32-bit applications for the AMD64 platform, it is enough to change the compiler [6] and simply recompile the program. In practice, however, everything is more complicated.

Software that uses assembler code for 32-bit processors still exists, and many programs written in high-level languages contain assembler blocks. That is why it is often impossible simply to recompile a large project. There are two obvious solutions to this problem. The first is to refuse to port the application to the new platform. This can be quite reasonable because, for example, Windows-family operating systems provide good backward compatibility thanks to the WoW64 technology. The second is to rewrite the program code, preferably in a high-level language.
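As an illustration of such a rewrite, consider a byte-order conversion that older 32-bit code often implemented as an inline assembler block around the bswap instruction. The function names below are hypothetical. Written in portable C, the routine compiles for x86, x64 and other targets alike, and optimizing compilers commonly recognize the pattern and emit a single bswap instruction themselves.

```c
#include <stdint.h>

/* Portable replacement for an inline-assembler "bswap" block:
   reverses the byte order of a 32-bit value. */
static uint32_t byte_swap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* 64-bit variant built from two 32-bit swaps: the low half becomes
   the high half and vice versa, each with its bytes reversed. */
static uint64_t byte_swap64(uint64_t x) {
    return ((uint64_t)byte_swap32((uint32_t)x) << 32) |
           byte_swap32((uint32_t)(x >> 32));
}
```

For example, byte_swap32(0x12345678) yields 0x78563412. When only Visual C++ matters, its intrinsics such as _byteswap_ulong are an alternative to hand-written C.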
Note also that the Visual C++ compiler no longer supports inline assembler blocks when compiling for 64-bit targets.

Assembler code is not the only obstacle when moving to 64-bit systems. Porting programs to 64-bit systems also produces various errors caused by the change in the data model (the sizes of fundamental types). Moreover, some errors become apparent only when a program works with large amounts of memory, which were unavailable in 32-bit systems. Such errors are described in detail in the article "20 issues of porting C++ code on the 64-bit platform" [8].

Everything said above relates mostly to C/C++ applications. Things are better with managed code (C#), although some small problems can occur there as well. Unfortunately, large software systems are often built using libraries written in C/C++. That is why a large C# project most likely contains C/C++ modules or libraries, which can be unsafe and contain vulnerabilities.

To test and check program code ported to a 64-bit platform, you can use various special methods and tools [9]. For example, static analyzers such as Viva64 (for Windows systems) and PC-Lint (for Unix systems) can provide good results. To learn more about these tools, read the article "Comparison of analyzers' diagnostic abilities while testing 64-bit code" [10].
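A typical example of the data-model errors mentioned above is storing a size or a pointer in a 32-bit variable. The minimal sketch below, whose helper names are hypothetical, shows what happens to a 5 GB size kept in a 32-bit unsigned variable: it silently becomes 1 GB. Such a bug cannot show up in a 32-bit build, because sizes that large never occur there. Types such as size_t and uintptr_t scale with the platform and avoid the problem.

```c
#include <stdint.h>

/* What a 32-bit variable (e.g. unsigned int) retains of a 64-bit value:
   only the low 32 bits; the high half is silently lost. */
static uint32_t store_in_32_bits(uint64_t value) {
    return (uint32_t)value;
}

/* Safe pointer <-> integer round trip: uintptr_t is as wide as a pointer
   on both 32-bit and 64-bit builds, unlike unsigned int (or long on Win64). */
static int pointer_survives_uintptr(void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;
}
```

Here store_in_32_bits(UINT64_C(5) << 30) returns 1 << 30, that is, 1 GB instead of 5 GB. The uintptr_t round trip preserves the pointer on any platform that provides that type.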
Conclusion

Undoubtedly, the AMD64 architecture offered by AMD has proved to be what the market needed. Its advantage is that it allows a smooth transition to 64-bit programs without losing compatibility with legacy 32-bit applications. But there is nothing revolutionary in AMD64.

As experiments demonstrate, migrating 32-bit programs to AMD64 brings two benefits. First, it allows you to solve much more memory-demanding tasks. Second, it gives about a 10% performance gain "for free", without changing the code, thanks to the compiler optimizing the application for the new architecture.

We may conclude that the AMD64 architecture has postponed the problem of limited main memory for many years but has not solved the problem of increasing the performance of modern personal computers. The future still belongs to multi-core and multi-processor systems.

References

1. Intel Software Developer's Manual. Volume 1: Basic Architecture. http://www.viva64.com/go.php?url=212
2. AMD x86-64 Architecture Programmer's Manual. Volume 1: Application Programming. http://www.viva64.com/go.php?url=213
3. Mike Wall. Tricks for Porting Applications to 64-Bit Windows on AMD64 Architecture. http://www.viva64.com/go.php?url=214
4. Matt Pietrek. Everything You Need To Know To Start Programming 64-Bit Windows Systems. http://www.viva64.com/go.php?url=215
5. Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors. http://www.viva64.com/go.php?url=59
6. Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 Platforms. http://www.viva64.com/go.php?url=216
7. Daniel Pistelli. Moving to Windows Vista x64. http://www.viva64.com/go.php?url=217
8. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform. http://www.viva64.com/art-1-2-599168895.html
9. Andrey Karpov. Problems of testing 64-bit applications. http://www.viva64.com/art-1-2-1289354852.html
10. Andrey Karpov. Comparison of analyzers' diagnostic abilities while testing 64-bit code. http://www.viva64.com/art-1-2-914146540.html