Virtual Memory In Contemporary Microprocessors And 64-Bit Microprocessors Architecture

Scope

Virtual Memory Scope
Virtual memory is a technique for executing processes that are not entirely in memory, providing the illusion of a large memory. It uses a combination of RAM and disk, swapping parts of the program (pages) in and out of memory as needed; page tables keep track of the pages. On 32-bit platforms each user process sees a separate 32-bit address space, allowing 4 Gbytes of virtual memory per process; by default, half is reserved for the OS. Large, memory-intensive applications run more effectively using 64-bit Windows. Most modern PCs use the AMD64 processor architecture, which is capable of running as either a 32-bit or 64-bit system. All address references are logical references that are translated at run time to real addresses, and a process can be broken up into pieces; two approaches are paging and segmentation. Either management scheme requires both hardware and software support.

A 64-Bit Processor Scope
A 64-bit processor can theoretically address 16 Ebytes of memory (2^64), but Win64 currently supports only 16 Tbytes (2^44). There are reasons for that. Contemporary processors can provide access to only one Tbyte (2^40) of physical memory. The architecture (but not the hardware part) can extend this space up to 4 Pbytes (2^52), but in that case you need an immense amount of memory for the page tables representing it. Besides the limitations described above, the amount of memory available in every particular version of 64-bit Windows also depends upon the commercial decisions of Microsoft; different Windows versions have different limits. A specific feature of compilers for Intel 64 is that they can most effectively use registers, instead of the stack, to pass parameters into functions. This allowed the developers of the Win64 architecture to get rid of multiple calling conventions. In Win32, you may use different conventions: __stdcall, __cdecl, __fastcall, etc. In Win64, there is only one calling convention.
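The difference between the 32-bit and 64-bit address spaces above is easy to make concrete. The following is a minimal C sketch (not from the original material) that prints the pointer width and the resulting theoretical address-space size on whatever machine it is compiled for:

    #include <stdio.h>

    int main(void)
    {
        /* Pointer width determines the theoretical virtual address space:
         * 32 bits -> 4 Gbytes, 64 bits -> 16 Ebytes (2^64 bytes). */
        unsigned bits = 8 * (unsigned)sizeof(void *);
        printf("pointer width: %u bits\n", bits);
        printf("theoretical address space: 2^%u bytes\n", bits);

        /* As described above, implementations expose far less than the
         * theoretical maximum: e.g. Win64 supports 16 Tbytes (2^44). */
        return 0;
    }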
Table of contents

1. Introduction
2. Case Study: Virtual Memory in Contemporary Microprocessors
   1) Memory management
   2) Address space
   3) Segment registers and pointers
3. Case Study: 64-bit microprocessor architectures
4. Pipeline architectures
5. Bus architecture
6. H/W support to memory management
7. References
Introduction to virtual memory in contemporary microprocessors
Virtual memory is a technique for managing the resource of physical memory. It gives an application the illusion of a very large amount of memory, typically much larger than what is actually available. It protects the code and data of user-level applications from the actions of other programs, but also allows programs to share portions of their address spaces if desired. It supports the execution of processes partially resident in memory: only the most recently used portions of a process's address space actually occupy physical memory; the rest of the address space is stored on disk until needed. The various microarchitectures define the virtual-memory interface differently, and, as explained in the next section, this is becoming a significant problem. Here, we consider the memory management designs of a sampling of six recent processors, focusing primarily on their architectural differences, and hint at optimizations that someone designing or porting system software might want to consider.

Introduction to 64-bit microprocessor architecture
A microprocessor incorporates the functions of a computer's central processing unit (CPU) on a single integrated circuit, or at most a few integrated circuits. It is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and provides results as output. It is an example of sequential digital logic, as it has internal memory. Microprocessors operate on numbers and symbols represented in the binary numeral system. The advent of low-cost computers on integrated circuits has transformed modern society. General-purpose microprocessors in personal computers are used for computation, text editing, multimedia display, and communication over the Internet. Many more microprocessors are part of embedded systems, providing digital control of a myriad of objects from appliances to automobiles to cellular phones and industrial process control.

64-bit microprocessor designs have been in use in several markets since the early 1990s; the early 2000s saw the introduction of 64-bit microprocessors targeted at the PC market. The move to 64 bits by PowerPC processors had been intended since the processors' design in the early 1990s and was not a major cause of incompatibility. Existing integer registers were extended, as were all related data pathways, but, as was the case with IA-32, both floating-point and vector units had been operating at or above 64 bits for several years. Unlike what happened when IA-32 was extended to x86-64, no new general-purpose registers were added in 64-bit PowerPC, so any performance gained when using 64-bit mode for applications making no use of the larger address space is minimal.
Module Description

Case Study: Virtual Memory in Contemporary Microprocessors
Virtual memory is a technique for managing the resource of physical memory. It gives an application the illusion of a very large amount of memory, typically much larger than what is actually available. It protects the code and data of user-level applications from the actions of other programs, but also allows programs to share portions of their address spaces if desired. It supports the execution of processes partially resident in memory: only the most recently used portions of a process's address space actually occupy physical memory; the rest of the address space is stored on disk until needed. For a primer on virtual memory, see the companion article in Computer magazine.

Most contemporary general-purpose processors support virtual memory through a hardware memory management unit (MMU) that translates virtual addresses to physical addresses. Unfortunately, the various microarchitectures define the virtual-memory interface differently, and, as explained in the next section, this is becoming a significant problem. Here, we consider the memory management designs of a sampling of six recent processors, focusing primarily on their architectural differences, and hint at optimizations that someone designing or porting system software might want to consider. We selected examples from the most popular commercial microarchitectures: the MIPS R10000, Alpha 21164, PowerPC 604, PA-8000, UltraSPARC-I, and Pentium II. Table 1 points out a few of their similarities by comparing their support for some core virtual-memory functions.

Memory management
The classic MMU, as in the DEC VAX, GE 645, and Intel Pentium architectures [2-4], includes a translation look-aside buffer (TLB) that translates addresses and a finite-state machine that walks the page table. The TLB is an on-chip memory structure that caches only page table entries (PTEs). If the necessary translation information is in the TLB, the system can translate a virtual address to a physical address without accessing the page table. If the translation information is not found in the TLB (called a TLB miss), one must search the page table for the mapping and insert it into the TLB before processing can continue. In early designs a hardware state machine performed this activity; on a TLB miss, the state machine walked the page table, loaded the mapping, refilled the TLB, and restarted the computation. TLBs usually have on the order of 100 entries, are often fully associative, and are typically accessed every clock cycle. They translate both instruction and data stream addresses. They can constrain the chip's clock cycle, as they tend to be fairly slow, and they are also power-hungry; both are a consequence of the TLB's high degree of associativity. Today's systems require both high clock speeds and low power; in response, two-way and four-way set-associative TLB designs are popular, as lower degrees of associativity have far less impact on clock speed and power consumption than fully associative designs. To provide increased translation bandwidth, designers often use split TLB designs. The state machine is an efficient design, as it disturbs the processor pipeline only slightly.
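To make the hit/miss flow concrete, here is a minimal C sketch of the translate-or-walk sequence described above. All names (tlb_entry_t, walk_page_table, the toy one-level page table, the direct-mapped refill) are illustrative assumptions for the sketch, not any particular processor's interface; a real TLB is a hardware structure and its replacement policy varies by design.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SHIFT 12              /* 4-Kbyte pages */
    #define TLB_SIZE   64              /* TLBs have on the order of 100 entries */
    #define NUM_PAGES  1024            /* size of the toy page table below */

    typedef struct {                   /* one cached page table entry (PTE) */
        bool     valid;
        uint32_t vpn;                  /* virtual page number (the tag) */
        uint32_t pfn;                  /* page frame number */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_SIZE];
    static uint32_t    page_table[NUM_PAGES];  /* toy mapping: vpn -> pfn */

    /* Stand-in for the hardware state machine (or OS handler) that
     * searches the page table on a TLB miss. */
    static uint32_t walk_page_table(uint32_t vpn)
    {
        return page_table[vpn % NUM_PAGES];
    }

    /* Translate a virtual address: hit in the TLB if possible; otherwise
     * walk the page table, refill the TLB, and finish the translation. */
    uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;
        uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

        /* Fully associative lookup: compare the VPN against every entry. */
        for (int i = 0; i < TLB_SIZE; i++)
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].pfn << PAGE_SHIFT) | offset;  /* TLB hit */

        /* TLB miss: fetch the mapping and refill one slot (direct-mapped
         * here purely for brevity; real designs choose a victim slot). */
        uint32_t pfn = walk_page_table(vpn);
        tlb[vpn % TLB_SIZE] = (tlb_entry_t){ true, vpn, pfn };
        return (pfn << PAGE_SHIFT) | offset;
    }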
During a TLB miss, the instruction pipeline effectively freezes: in contrast to taking an exception, the pipeline is not disturbed, and the reorder buffer need not be flushed. The instruction cache is not used, and the data cache is used only if the page table is located in cacheable space. At worst, the execution of the state machine will replace a few lines in the data cache. Many designs do not even freeze the pipeline; for instance, the Intel Pentium Pro allows instructions that are independent of the faulting instruction to continue processing while the TLB miss is serviced. The primary disadvantage of the state machine is that the page table organization is effectively etched in stone; the operating system (OS) has little flexibility in tailoring the design. In response, recent memory management designs have used a software-managed TLB, in which the OS handles TLB misses. MIPS was one of the earliest commercial architectures to offer a software-managed TLB [5], though the Astronautics Corporation of America holds a patent for a software-managed design [6]. In a software-managed TLB miss, the hardware interrupts the OS and vectors to a software routine that walks the page table. The OS thus defines the page table organization, since hardware never directly manages the table.

The flexibility of the software-managed mechanism comes at a performance cost. The TLB miss handler that walks the page table is an OS primitive, usually 10 to 100 instructions long. If the handler code is not in the instruction cache at the time of the TLB miss, the time to handle the miss can be much longer than in the hardware-walked scheme. In addition, the use of the interrupt mechanism adds a number of cycles to the cost by flushing the pipeline, possibly flushing a large number of instructions from the reorder buffer. This can add hundreds of cycles to the overhead of walking the page table. Nonetheless, the flexibility afforded by the software-managed scheme can outweigh the potentially higher per-miss cost of the design [7]. Given the few details presented so far, one can easily see that the use of different virtual-memory interface definitions in every microarchitecture is becoming a significant problem. More often than not, the OS running on a microprocessor was not initially designed for that processor: OSs often outlast the hardware on which they were designed and built, and the more popular OSs are ported to many different architectures. Hardware abstraction layers (for example, see Rashid et al. [8] and Custer [9]) hide hardware particulars from most of the OS, and they can prevent system designers from fully optimizing their software. These types of mismatches between OSs and microarchitectures cause significant performance problems [10]; an OS not tuned to the hardware on which it operates is unlikely to live up to its potential performance. The following sections describe the different commercial virtual-memory interfaces. First is the MIPS organization, which has the most in common with the others. Then we concentrate on those mechanisms that are unique to each architecture.

MIPS
MIPS defines one of the simplest memory management architectures among recent microprocessors. The OS handles TLB misses entirely in software: the software fills in the TLB, and the OS defines the TLB replacement policy.

Address space
The R2000/R3000 virtual address is 32 bits wide; the R10000 virtual address is 64 bits wide, though not all 64 bits are translated in the R10000. The top "region" bits divide the virtual space into areas of different behavior.
The top two bits distinguish between user, supervisor, and kernel spaces (the R10000 offers three levels of execution access privileges). Further bits divide the kernel and supervisor regions into areas of different memory behavior (that is, cached/uncached, mapped/unmapped). In the R2000/R3000, the top bit divides the 4-Gbyte address space into user and kernel regions, and the next two bits further divide the kernel's space into cached/uncached and mapped/unmapped regions. In both architectures, virtual addresses are extended with an address space identifier (ASID) to distinguish between contexts. The 6-bit-wide ASID on the R2000/R3000 uniquely identifies 64 processes; the 8-bit-wide ASID on the R10000 uniquely identifies 256. Since it is simpler, we describe the 32-bit address space of the R2000/R3000. User space, called kuseg, occupies the bottom 2 Gbytes of the address space. All kuseg references are mapped through the TLB and considered cacheable by the hardware unless otherwise noted in a TLB entry. The top half of the virtual address space belongs to the kernel: an address generated with the top bit set while in user mode causes an exception. Kernel space is divided into three regions: the 1-Gbyte kseg2 region is cacheable and mapped through the TLB like kuseg. The other two 512-Mbyte regions (kseg0 and kseg1) are mapped directly onto physical memory; the hardware zeroes out the top three bits to generate the physical address directly. The hardware caches references to kseg0, but not references to kseg1.

TLB
The MIPS TLB is a unified 64-entry, fully associative cache. The OS loads page table entries (PTEs) into the TLB, using either random replacement (the hardware chooses a TLB slot randomly) or specified placement (the OS tells the hardware which slot to choose). The TLB's 64 entries are partitioned between "wired" and "random" entries. While the R2000/R3000 has eight wired entries, the partition between wired and random entries is set by software on the R10000. The hardware provides a mechanism to choose one of the random slots. On request, it produces a random number between index values of 8 and 63, inclusive (the R10000 produces values between N and 63, inclusive, where N is set by software). This random number references only the TLB's random entries; by not returning values corresponding to wired entries, it effectively protects those entries. The TLBWR (TLB write random) instruction uses this mechanism to insert a mapping randomly into the TLB, and the TLBWI (TLB write indexed) instruction inserts mappings at any specified location. Most OSs use the wired partition to store root-level PTEs and kernel mappings in the protected slots, keeping user mappings in the random slots and using a low-cost random replacement policy to manage them.

The OS interacts with the TLB through the EntryHi and EntryLo registers. EntryHi contains a virtual page number and an ASID; EntryLo corresponds to a PTE and contains a page frame number and status bits. A TLB entry is equivalent to the concatenation of these structures. The R10000 structure is similar but larger. It also has two separate EntryLo registers, one for each of two paired virtual pages. This allows the R10000 to effectively double the reach of the TLB without adding more entries. A single TLB entry maps every two contiguous even-odd virtual pages, though each receives its own page frame number (PFN) and status bits. The design saves die area and power. It is nearly as flexible as a 128-entry TLB but requires half the tag area (because two mappings share each virtual page number, VPN) and half the comparators.

In the MIPS R2000/R3000, the status fields in EntryLo are:
• N, noncacheable. If this bit is set for a TLB entry, the page it maps is not cached; the processor sends the address out to main memory without accessing the cache.
• D, dirty. If this bit is set, the page is writable. The bit can be set by software, so it is effectively a write-enable bit. A store to a page with the dirty bit cleared causes a protection violation.
• V, valid. If this bit is set, the entry contains a valid mapping.
• G, global. If this bit is set, the TLB ignores the ASID match requirement for a TLB hit on this page. This feature supports shared memory and allows the kernel to access its own code and data while operating on behalf of a user process without having to save or restore ASIDs.

The R10000 inherits this organization and adds more powerful control-status bits that support features such as more complex caching behavior and specification of coherency protocols. Also, if the G bit is set for either page in a paired entry, the ASID check is disabled for both pages. When the OS reads an entry from the TLB, the hardware places the information into EntryHi and EntryLo. When the OS inserts a mapping into the TLB, it first loads the desired values into these registers. It then executes a TLBWR instruction, or it loads a slot number into the index register and executes a TLBWI instruction. Thus, the OS has the tools to implement a wide range of replacement policies. Periodic TLB flushes are unavoidable in these MIPS processors, as there are 64 unique context identifiers in the R2000/R3000 and 256 in the R10000. Many systems have more active processes than this, requiring ASID sharing and periodic remapping. When an ASID is temporarily reassigned from one process to another, it is necessary to first flush TLB entries with that ASID. It is possible to avoid flushing the cache by flushing the TLB; since the caches are physically tagged, the new process cannot overwrite the old process's data.

Address translation and TLB-miss handling
MIPS supports a simple bottom-up hierarchical page table organization [1], though an OS is free to choose a different page table organization. We describe the R2000/R3000 translation mechanism here; the R10000 mechanism is similar. The VPN of any page in a user's address space is also an index into the user page table. On a user-level TLB miss, one can use the faulting VPN to create a virtual address for the mapping PTE. Frequently, the OS will successfully load the PTE with this address, requiring only one memory reference to handle a TLB miss. In the worst case, a PTE lookup will require an additional memory reference to look up the root-level PTE as well. MIPS offers a hardware assist for the software TLB miss handler: the TLB context register. At the time of a user-level TLB miss, the context register contains the virtual address of the PTE that maps the faulting page. The system software loads the top bits of the TLB context register, called PTEBase. PTEBase represents the virtual base address of the current process's user page table. When a user address misses the TLB, hardware fills in the next bits of the context register with the VPN of the faulting address. The bottom bits of the context register are defined to be zero (the R2000/R3000 PTE is 4 bytes, the R10000 PTE is 8 bytes), so the faulting VPN is an index into the linear user page table structure and identifies the user PTE that maps the faulting address.
The TLB miss handler can use the context register immediately, and the handler looks much like the following (those who know the MIPS instruction set will notice that a few NOPs have been omitted for clarity):

    mfc0 k0,tlbcxt    # move the contents of TLB context register into k0
    mfc0 k1,epc       # move PC of faulting load instruction into k1
    lw   k0,0(k0)     # load thru address that was in TLB context register
    mtc0 k0,entry_lo  # move the loaded value into the EntryLo register
    tlbwr             # write entry into the TLB at a random slot number
    j    k1           # jump to PC of faulting load instruction to retry
    rfe               # restore from exception

The handler first moves the address out of the TLB context register into general-purpose register k0, through which it can use the address as a load target. The program counter of the instruction that caused the TLB miss is found in exception register epc and is moved into general-purpose register k1. The handler loads the PTE into k0, then moves it directly into EntryLo. EntryHi is already filled in by the hardware; it contains the faulting VPN and the ASID of the currently executing process (the one that caused the TLB miss).
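The address arithmetic the context register performs in hardware is simple enough to state in C. A small sketch, using the R2000/R3000 parameters above (4-byte PTEs, so the faulting VPN is shifted left by 2); the names are illustrative, not MIPS register names:

    #include <stdint.h>

    /* Virtual address of the PTE that maps a faulting page: the OS-set
     * PTEBase supplies the top bits, the faulting VPN supplies the index,
     * and the low 2 bits are zero because each PTE is 4 bytes. */
    uint32_t pte_vaddr(uint32_t pte_base, uint32_t faulting_vpn)
    {
        return pte_base | (faulting_vpn << 2);
    }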
The TLBWR instruction writes the PTE into the TLB at a randomly selected location. At this point, the mapping for the faulting address is in the TLB, and execution can resume.

Pentium II
The Pentium II memory management design features a segmented 32-bit address space, split TLBs, and a hardware-managed TLB miss mechanism. Processes generate 32-bit near pointers and 48-bit far pointers that are mapped by the segmentation mechanism onto a 32-bit linear address space. The linear address space is mapped onto the physical space through the TLBs and a hardware-defined page table. The canonical two-tiered hierarchical page table uses a 4-Kbyte root-level table, 4-Kbyte PTE pages, and 4-byte PTEs. The processor supports 4-Kbyte and 4-Mbyte page sizes, as well as 2-Mbyte superpages in some modes. Physical addresses in the IA-32 are 32 bits wide, though the Pentium II supports an even larger physical address through its physical address extension or 36-bit page-size extension modes. In either of these modes, physical addresses produced by the TLB are 36 bits wide.

Protection and shared memory
The Pentium II is another segmented architecture with no explicit ASIDs. For performance reasons, its segmentation mechanism is often unused by today's operating systems, which typically flush the TLBs on context switch to provide protection. The caches are physically indexed and tagged and therefore need no flushing on context switch, provided the TLBs are flushed. The location of the root page table is loaded into one of a set of control registers, CR3, and on a TLB miss the hardware walks the table to refill the TLB. If every process has its own page table, the TLBs are guaranteed to contain only entries belonging to the current process (those from the current page table) if the TLBs are flushed and the value in CR3 changes on context switch. Shared memory is often implemented by aliasing: duplicating mapping information across page tables [1]. Writing to the CR3 control register flushes the entire TLB; during a context switch, the hardware writes to CR3, so flushing the TLB is part of the hardware-defined context-switch protocol. Globally shared pages (protected kernel pages or library pages) can have the global bit of their PTE set. This bit keeps the entries from being flushed from the TLB; an entry so marked remains indefinitely in the TLB until it is intentionally removed.

Segmented addressing
The IA-32 segmentation mechanism supports variable-size (1-byte to 4-Gbyte) segments; the size is set by software and can differ for every segment. Unlike other segmented schemes, in which the global space is much larger than each process's virtual space, the IA-32 global virtual space (the linear address space) is the same size or, from one viewpoint, smaller than an individual user-level address space. Processes generate 32-bit addresses that are extended by 16-bit segment selectors. Two bits of the 16-bit selector contain protection information, 13 bits select an entry within a software descriptor table (similar to the PowerPC segment registers or the PA-RISC space registers), and the last bit chooses between two different descriptor tables. Conceptually, an application can access several thousand segments, each of which can range from 1 byte to 4 Gbytes. This may seem to imply an enormous virtual space, but a process's address space is actually 4 Gbytes. During address generation, the segment's base address is added to the 32-bit address the process generates.
A process actually has access to several thousand segments, each of which ultimately lies completely within the 4-Gbyte linear address space. The processor cannot distinguish more than four unique Gbytes of data at a time; it is limited by the linear address space. The segment selector indexes the global and local descriptor tables. The entries in these tables are called segment descriptors and contain information including the segment's length, protection parameters, and location in the linear address space. A 20-bit limit field specifies the segment's maximum legal length, from 1 byte to 4 Gbytes. A granularity bit determines how the 20-bit limit field is to be interpreted. If the granularity bit is clear, the limit field specifies a maximum length from 1 byte to 1 Mbyte in increments of 1 byte. If the granularity bit is set, the limit field specifies a maximum length from 4 Kbytes to 4 Gbytes in increments of 4 Kbytes. The segment descriptor also contains a 2-bit field specifying one of four privilege levels (the highest is usually reserved for the OS kernel, the lowest for user-level processes, and the intermediate levels are for OS services). Other bits indicate fine-grained protection, whether the segment is allowed to grow (for example, a stack segment), and whether the descriptor allows interprivilege-level transfers. These transfers support special functions such as task switching, calling exception handlers, calling interrupt handlers, and accessing shared libraries or code within the OS from the user level.

Segment registers and pointers
For improved performance, the hardware caches six of a process's segment selectors and descriptors in six segment registers. The IA-32 divides each segment register into "visible" and "hidden" parts. Software can modify the visible part, the segment selector; it cannot modify the hidden part, the corresponding segment descriptor. Hardware loads the corresponding segment descriptor from the local or global descriptor table into the hidden part of the segment register whenever a new selector is placed into the visible part of the segment register. The segment registers are similar to the segment registers of the PowerPC architecture in that they hold the IDs of segments that comprise a process's address space. They differ in that they are referenced by context rather than by a field in the virtual address. Instruction fetches implicitly reference CS, the register holding the code segment selector. Any stack references (push or pop instructions) use SS, the register holding the stack segment selector. Destinations of string instructions like MOVS, CMPS, LODS, or STOS implicitly use ES, one of the four registers holding data segment selectors. Other data references use DS by default, but applications can override this explicitly if desired, making available the remaining data-segment registers FS and GS. A near pointer references a location within one of the currently active segments, that is, the segments whose selectors are in the six segment registers. On the other hand, the application may reference a location outside the currently active segments by using a 48-bit far pointer. This loads a selector and corresponding descriptor into a segment register; then the segment is referenced. Near pointers are less flexible but incur less overhead than far pointers (one fewer memory reference), and so they tend to be more popular. If used properly, the IA-32 segmentation mechanism could provide address-space protection, obviating the need to flush the TLBs on context switch; protection is one of the original intents of segmentation [3]. The segments guarantee protection if the 4-Gbyte linear address space is divided among all processes, in the way that the PowerPC divides its 52-bit virtual space among all processes. However, 4 Gbytes is an admittedly small amount of space in which to work.
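The limit-field interpretation just described reduces to a small computation. A hedged C sketch of it (the function name and the byte-count convention are assumptions for illustration; consult the Intel manuals for the architecturally exact semantics of limit checking):

    #include <stdint.h>

    /* Effective maximum segment length in bytes, from the 20-bit limit
     * field and the granularity bit of an IA-32 segment descriptor:
     *   g == 0: limit counts 1-byte units  (1 byte .. 1 Mbyte)
     *   g == 1: limit counts 4-Kbyte units (4 Kbytes .. 4 Gbytes) */
    uint64_t segment_max_length(uint32_t limit20, int g)
    {
        uint64_t limit = limit20 & 0xFFFFFu;  /* keep only 20 bits */
        if (g)
            return (limit + 1) << 12;         /* 4-Kbyte granularity */
        return limit + 1;                     /* byte granularity */
    }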
Case Study: 64-bit microprocessor architectures, discussing the following features briefly:
a. Bus architectures
b. H/W support to memory management
c. Pipeline architectures

A 64-bit processor is a microprocessor with a word size of 64 bits, a requirement for memory- and data-intensive applications such as computer-aided design (CAD) applications, database management systems, technical and scientific applications, and high-performance servers. 64-bit computer architecture provides higher performance than 32-bit architecture by handling twice as many bits of information in the same clock cycle. "64-bit" describes certain classes of computer architecture, buses, memory, and CPUs, and by extension the software that runs on them. In computer architecture, 64-bit integers, memory addresses, or other data units are those that are at most 64 bits (8 octets) wide. Also, 64-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. "64-bit" is also a term given to a generation of computers in which 64-bit processors are the norm. A 64-bit register can store 2^64 = 18 446 744 073 709 551 616 different values. Without further qualification, a 64-bit computer architecture generally has integer and addressing registers that are 64 bits wide, allowing direct support for 64-bit data types and addresses. However, a CPU might have external data buses or address buses with sizes different from the registers, even larger (the 32-bit Pentium had a 64-bit data bus, for instance). The term may also refer to the size of low-level data types, such as 64-bit floating-point numbers. A 64-bit processor is backward compatible with older applications and operating systems; it detects whether an application or operating system is 16-bit, 32-bit, or 64-bit and computes accordingly. This is essential for enterprise situations where purchasing new software is not feasible. Intel, IBM, Sun Microsystems, Hewlett-Packard, and AMD currently develop or offer 64-bit processors.

Sixty-four-bit processors have been with us since 1992, and in the 21st century they have started to become mainstream. Both Intel and AMD have introduced 64-bit chips, and the Mac G5 sports a 64-bit processor. Sixty-four-bit processors have 64-bit ALUs, 64-bit registers, 64-bit buses, and so on. One reason the world needs 64-bit processors is their enlarged address spaces. Thirty-two-bit chips are often constrained to a maximum of 2 GB or 4 GB of RAM access. That sounds like a lot, given that most home computers currently use only 256 MB to 512 MB of RAM. However, a 4-GB limit can be a severe problem for server machines and machines running large databases. And even home machines will start bumping up against the 2-GB or 4-GB limit pretty soon if current trends continue. A 64-bit chip has none of these constraints, because a 64-bit RAM address space is essentially infinite for the foreseeable future: 2^64 bytes of RAM is something on the order of a billion gigabytes. With a 64-bit address bus and wide, high-speed data buses on the motherboard, 64-bit machines also offer faster I/O (input/output) speeds to things like hard disk drives and video cards. These features can greatly increase system performance.
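A quick, hedged illustration of the address-space arithmetic above in C (pure integer math, no allocation; the numbers come straight from the powers of two discussed in the text):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* 32-bit addressing tops out at 4 Gbytes ... */
        uint64_t space32 = (uint64_t)1 << 32;
        printf("32-bit address space: %llu bytes (4 Gbytes)\n",
               (unsigned long long)space32);

        /* ... while 64-bit addressing reaches 2^64 bytes (16 Ebytes);
         * UINT64_MAX is 2^64 - 1, the largest representable value. */
        printf("64-bit address space: %llu bytes (+1)\n",
               (unsigned long long)UINT64_MAX);

        /* A byte offset beyond 4 Gbytes is unrepresentable in 32 bits
         * but is an ordinary value in 64 bits: */
        uint64_t offset = 6ull * 1024 * 1024 * 1024;  /* 6 Gbytes */
        printf("6-Gbyte offset: %llu\n", (unsigned long long)offset);
        return 0;
    }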
Process Logic

Bus architecture

Computer bus architecture
Computers comprise many internal components, and in order for these components to communicate with each other, a "bus" is used. A bus is a common pathway through which information is connected from one component to another. This pathway is used for communication and can be established between two or more computer components. We are going to review the computer bus architectures that are used in computers.

Functions of buses
1. Data sharing - The expansion bus must be able to transfer data between the computer and the peripherals connected to it. The data is transferred in parallel, which allows the exchange of 1, 2, 4, or even 8 bytes of data at a time. (A byte is a group of 8 bits.) Buses are classified depending on how many bits they can move at the same time, which means that we have 8-bit, 16-bit, or even 32-bit buses.
2. Addressing - A bus has address lines, which match those of the processor. This allows data to be sent to or from specific memory locations.
3. Power - A bus supplies power to various peripherals that are connected to it.
4. Timing - The bus provides a system clock signal to synchronize the peripherals attached to it with the rest of the system.

The expansion bus facilitates the easy connection of additional components and devices to a computer, for example the addition of a TV card or sound card.

Bus terminologies
Computers can be viewed as having just two types of buses:
1. System bus: the bus that connects the CPU to main memory on the motherboard. The system bus is also called the frontside bus, memory bus, local bus, or host bus.
2. A number of I/O buses ("I/O" is an acronym for input/output) connecting various peripheral devices to the CPU; these are connected to the system bus via a "bridge" implemented in the processor's chipset. Other names for the I/O bus include "expansion bus", "external bus", or "host bus".

Expansion bus types
These are some of the common expansion bus types that have been used in computers:
ISA - Industry Standard Architecture
EISA - Extended Industry Standard Architecture
MCA - Micro Channel Architecture
VESA - Video Electronics Standards Association
PCI - Peripheral Component Interconnect
PCMCIA - Personal Computer Memory Card Industry Association (also called PC bus)
AGP - Accelerated Graphics Port
SCSI - Small Computer Systems Interface
H/W support to memory management
Database applications and database management systems typically use large amounts of memory. Maintaining application working sets in memory helps reduce storage I/O. In addition, data access patterns can cause large changes in the contents of the working set. These two factors make efficient memory management a determinant factor in database performance. Memory management in virtual machines differs from physical machines in one key aspect: virtual memory address translation. Guest virtual memory addresses are translated to guest physical addresses using the guest operating system's page tables. ESX then translates the guest physical addresses into machine physical addresses on the host. ESX maintains mappings from the guest virtual addresses to the machine physical addresses in shadow page tables. The shadow page tables allow ESX to avoid doing two levels of translation for every memory access and are cached in the hardware's TLB. However, creating and maintaining the shadow page tables causes some overhead. Hardware support for memory management unit virtualization is available in current processors; the offerings from Intel and AMD are called EPT and RVI, respectively. This support consists of an additional level of page tables implemented in hardware. These page tables contain guest physical to machine physical memory address translations.

A memory management unit (MMU), sometimes called a paged memory management unit (PMMU), is a computer hardware component responsible for handling accesses to memory requested by the CPU. Its functions include translation of virtual addresses to physical addresses (i.e., virtual memory management), memory protection, cache control, bus arbitration, and, in simpler computer architectures (especially 8-bit systems), bank switching. Modern MMUs typically divide the virtual address space (the range of addresses used by the processor) into pages, each having a size which is a power of 2, usually a few kilobytes, but they may be much larger. The bottom n bits of the address (the offset within a page) are left unchanged. The upper address bits are the (virtual) page number. The MMU normally translates virtual page numbers to physical page numbers via an associative cache called a translation look-aside buffer (TLB). When the TLB lacks a translation, a slower mechanism involving hardware-specific data structures or software assistance is used. The data found in such data structures are typically called page table entries (PTEs), and the data structure itself is typically called a page table. The physical page number is combined with the page offset to give the complete physical address.

A PTE or TLB entry may also include information about whether the page has been written to (the dirty bit), when it was last used (the accessed bit, for a least-recently-used page replacement algorithm), what kind of processes (user mode, supervisor mode) may read and write it, and whether it should be cached. Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical random access memory has been allocated to that virtual page. In this case the MMU signals a page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to find a spare frame of RAM and setting up a new PTE to map it to the requested virtual address. If no RAM is free, it may be necessary to choose an existing page (known as a victim), using some replacement algorithm, and save it to disk (this is called "paging"). With some MMUs, there can also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new mapping. In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory protection: an OS can use it to protect against errant programs by disallowing access to memory that a particular program should not have access to. Typically, an OS assigns each program its own virtual address space. An MMU also reduces the problem of fragmentation of memory. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontinuous), so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory. While this article concentrates on modern MMUs, commonly based on pages, early systems used a similar concept of base-limit addressing that further developed into segmentation. Those are occasionally also present on modern architectures. The x86 architecture provided segmentation rather than paging in the 80286, and provides both paging and segmentation in the 80386 and later processors (although the use of segmentation is not available in 64-bit operation).
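The fault-handling path described above can be sketched in a few lines of C. Everything here (find_free_frame, choose_victim, page_out, page_in, the pte_t layout) is a hypothetical OS interface invented for illustration; real kernels differ considerably:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool     valid;   /* mapping present in RAM? */
        bool     dirty;   /* page written since it was loaded? */
        uint32_t frame;   /* physical frame number */
    } pte_t;

    /* Hypothetical helpers a real OS would provide. */
    extern int   find_free_frame(void);                   /* -1 if RAM is full */
    extern pte_t *choose_victim(void);                    /* replacement algorithm */
    extern void  page_out(pte_t *victim);                 /* save victim to disk */
    extern void  page_in(uint32_t vpn, uint32_t frame);   /* load page from disk */

    /* Handle a page fault on virtual page 'vpn': find (or free up) a
     * frame, bring the page in, and make the PTE valid again. */
    void handle_page_fault(pte_t *page_table, uint32_t vpn)
    {
        int frame = find_free_frame();
        if (frame < 0) {
            /* No RAM free: evict a victim page, writing it to disk
             * first only if it is dirty ("paging"). */
            pte_t *victim = choose_victim();
            if (victim->dirty)
                page_out(victim);
            victim->valid = false;
            frame = (int)victim->frame;
        }
        page_in(vpn, (uint32_t)frame);
        page_table[vpn] = (pte_t){ true, false, (uint32_t)frame };
    }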
Pipeline architecture
A pipe is a message queue. A message can be anything. A filter is a process, thread, or other component that perpetually reads messages from an input pipe, one at a time, processes each message, then writes the result to an output pipe. Thus, it is possible to form pipelines of filters connected by pipes. The inspiration for pipeline architectures probably comes from signal processing. In this context a pipe is a communication channel carrying a signal (message), and filters are signal-processing components such as amplifiers, noise filters, receivers, and transmitters. Pipeline architectures appear in many software contexts. (They appear in hardware contexts, too; for example, many processors use pipeline architectures.) UNIX and DOS command shell users create pipelines by connecting the standard output of one program (i.e., cout) to the standard input of another:

    % cat inFile | grep pattern | sort > outFile

In this case pipes (i.e., "|") are interprocess communication channels provided by the operating system, and filters are any programs that read messages from standard input and write their results to standard output. LISP programmers can represent pipes by lists and filters by list-processing procedures. Pipelines are built using procedural composition. In each case the nums parameter represents a list of integers:

    ;; = list got by removing even numbers from nums
    (define (filterEvens nums) ... )

    ;; = list got by squaring each n in nums
    (define (mapSquare nums) ... )

    ;; = sum of all n in nums
    (define (sum nums) ... )

We can use these procedures to build a pipeline that sums the squares of odd integers. Here is the corresponding LISP definition:

    ;; = sum of squares of odd n in nums
    (define (sumOddSquares nums)
      (sum (mapSquare (filterEvens nums))))
Pipelines have also been used to implement compilers, where each stage of compilation is a filter: the scanner reads a stream of characters from a source code file and produces a stream of tokens; a parser reads a stream of tokens and produces a stream of parse trees; a translator reads a stream of parse trees and produces a stream of assembly language instructions. We can insert new filters into the pipeline, such as optimizers and type checkers, or we can replace existing filters with improved versions.

Filter classification
There are four types of filters: producers, consumers, transformers, and testers. A producer is a producer of messages. It has no input pipe; it generates a message into its output pipe. A consumer is a consumer of messages. It has no output pipe; it eats messages taken from its input pipe. A transformer reads a message from its input pipe, modulates it, then writes the result to its output pipe. (This is what DOS and UNIX programmers call filters.) A tester reads a message from its input pipe and then tests it. If the message passes the test, it is written, unaltered, to the output pipe; otherwise, it is discarded. (This is what signal-processing engineers call filters.) Filters can also be classified as active or passive. An active filter has a control loop that runs in its own process or thread. It perpetually reads messages from its input pipe, processes them, then writes the results to its output pipe. An active filter needs to be derived from a thread class provided by the operating system. There are two types of passive filters. A data-driven filter is activated when another filter writes a message into its input pipe. A demand-driven filter is activated when another filter attempts to read a message from its empty output pipe.
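To tie the classification back to the earlier sumOddSquares pipeline, here is a hedged C sketch of the same computation with the four roles made explicit: a producer (the fixed input array), a tester (keep odd values), a transformer (square), and a consumer (sum). Staging the data through one loop is an illustrative simplification; real pipes-and-filters systems stream messages through queues:

    #include <stdio.h>

    /* Tester: pass a message through only if it survives the test. */
    static int is_odd(int n) { return n % 2 != 0; }

    /* Transformer: read a message, modulate it, write the result. */
    static int square(int n) { return n * n; }

    int main(void)
    {
        /* Producer: a fixed stream of messages for the sketch. */
        int nums[] = { 1, 2, 3, 4, 5 };
        int count  = sizeof nums / sizeof nums[0];

        /* Consumer: fold the surviving, transformed messages into a sum.
         * Processing one message at a time mimics the pipeline flow. */
        int sum = 0;
        for (int i = 0; i < count; i++)
            if (is_odd(nums[i]))          /* tester stage       */
                sum += square(nums[i]);   /* transformer + consumer */

        printf("sum of squares of odd numbers: %d\n", sum);  /* 35 */
        return 0;
    }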
References
1. citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.352
2. ieeexplore.ieee.org › Browse › Journals › Micro, IEEE
3. www.ece.cmu.edu/~jhoe/course/ece447/handouts/L20.pdf
4. drum.lib.umd.edu/handle/1903/7465
5. en.wikipedia.org/wiki/64-bit
6. en.wikibooks.org/wiki/Microprocessor.../Computer_Architecture
7. www.csbdu.in/econtent/...%20Microprocessors/UNIT%205.pdf
8. Memory Systems and Pipelined Processors, by Harvey G. Cragon
