Introduction

To Enhance Performance
• Increase in clock rate
  o Involves reducing the clock cycle time
  o Can increase performance by increasing the number of instructions finishing per second
  o Hardware limitations constrain this approach
• Cache hierarchies
  o Keeping frequently used data in the processor caches reduces average access time
• Pipelining
  o An implementation technique whereby multiple instructions are overlapped in execution
  o Limited by the dependencies between instructions
  o Affected by stalls, so the effective CPI is greater than 1
• Instruction Level Parallelism (ILP)
  o Refers to techniques that increase the number of instructions executed in each clock cycle
  o Exists whenever the machine instructions that make up a program are insensitive to the order in which they are executed; if no dependencies exist between instructions, they may be executed in parallel
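The effect of stalls on pipeline throughput can be sketched with the standard performance equations. A minimal sketch; the clock rate and stall figures below are illustrative examples, not numbers from the source.

```python
# Illustrative sketch: how pipeline stalls raise the effective CPI and
# lower instruction throughput. All numbers are made-up examples.

def effective_cpi(base_cpi, stalls_per_instruction):
    """Effective CPI = ideal CPI + average stall cycles per instruction."""
    return base_cpi + stalls_per_instruction

def instructions_per_second(clock_hz, cpi):
    """Throughput = clock rate / cycles per instruction."""
    return clock_hz / cpi

# An ideal pipeline finishes one instruction per cycle (CPI = 1).
ideal = instructions_per_second(3e9, effective_cpi(1.0, 0.0))
# With 0.5 stall cycles per instruction, effective CPI rises to 1.5.
stalled = instructions_per_second(3e9, effective_cpi(1.0, 0.5))
print(ideal, stalled)  # stalled throughput is 2/3 of the ideal
```

The same arithmetic shows why reducing stalls (or hiding them with another thread's instructions, as the rest of the deck describes) matters more as clock rates rise.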
Thread Level Parallelism

• Chip Multi-Processing
  o Two processors, each with a full set of execution and architectural resources, reside on a single die
• Time-Slice Multi-Threading
  o A single processor executes multiple threads by switching between them
• Switch-on-Event Multi-Threading
  o Switches threads on long-latency events such as cache misses
Thread Level Parallelism (cont..)

• Simultaneous Multi-Threading
  o Multiple threads can execute on a single processor without switching
  o The threads execute simultaneously and make much better use of the resources
  o It maximizes performance versus transistor count and power consumption
Hyper-Threading Technology

• Hyper-Threading Technology brings the simultaneous multi-threading approach to the Intel architecture
• It makes a single physical processor appear as two or more logical processors
• Hyper-Threading Technology was first introduced by Intel Corp.
• It provides thread-level parallelism (TLP) on each processor, resulting in increased utilization of the processor's execution resources
• Each logical processor maintains its own copy of the architecture state
Hyper-Threading Technology Architecture

[Figure: a processor without Hyper-Threading Technology pairs one architecture state with its execution resources; a processor with Hyper-Threading Technology has two architecture states sharing a single set of execution resources.]

Ref: Intel Technology Journal, Volume 06, Issue 01, February 14, 2002
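Because each logical processor has its own architecture state, the operating system sees each one as a schedulable CPU. A minimal sketch of how this is visible from software; note that `os.cpu_count()` reports logical processors, which on a Hyper-Threading system is typically twice the number of physical cores.

```python
# Sketch: the OS schedules onto logical processors. os.cpu_count()
# reports logical CPUs; with Hyper-Threading enabled this is usually
# a multiple (typically 2x) of the physical core count.
import os

logical = os.cpu_count()
print(f"logical processors visible to the OS: {logical}")
```

Querying the physical core count requires platform-specific interfaces (e.g., parsing CPU topology information), which is why many tools report the two numbers separately.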
The following resources are duplicated to support Hyper-Threading Technology:
• Register Alias Tables
• Next-Instruction Pointer
• Instruction Streaming Buffers and Trace Cache Fill Buffers
• Instruction Translation Look-aside Buffer
Sharing of Resources

• Major sharing schemes:
  o Partition
  o Threshold
  o Full Sharing

Partition
• Each logical processor uses half the resources
• Simple and low in complexity
• Ensures fairness and forward progress
• Good for major pipeline queues
Partitioned Queue Example

• Yellow thread – the faster thread
• Green thread – the slower thread
Partitioned Queue Example

• Partitioning a resource ensures fairness and forward progress for both logical processors.
Threshold

• Puts a threshold on the number of resource entries a logical processor can use
• Limits maximum resource usage
• Suited to small structures where resource utilization is bursty and the time of utilization is short, uniform, and predictable
• E.g., the processor scheduler
Full Sharing

• The most flexible mechanism for resource sharing; does not limit the maximum resource usage for a logical processor
• Good for large structures in which working-set sizes are variable and there is no fear of starvation
• E.g., all processor caches are shared
  o Some applications benefit from a shared cache because they share code and data, minimizing redundant data in the caches
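The three sharing schemes above can be modeled as admission policies on a queue with a fixed number of entries. The sketch below is a simplified software model for illustration, not Intel's actual hardware logic.

```python
# Simplified model of the three resource-sharing schemes as admission
# policies on a fixed-capacity queue. Illustrative only.

class SharedQueue:
    def __init__(self, capacity, scheme, threshold=0):
        self.capacity = capacity
        self.scheme = scheme          # "partition", "threshold", or "full"
        self.threshold = threshold
        self.used = {0: 0, 1: 0}      # entries held by each logical processor

    def can_allocate(self, lp):
        if self.used[0] + self.used[1] >= self.capacity:
            return False                               # queue physically full
        if self.scheme == "partition":
            return self.used[lp] < self.capacity // 2  # half per LP: fairness
        if self.scheme == "threshold":
            return self.used[lp] < self.threshold      # per-LP cap
        return True                                    # full sharing: no limit

    def allocate(self, lp):
        if self.can_allocate(lp):
            self.used[lp] += 1
            return True
        return False

# Partition: a fast thread cannot starve the slow one.
q = SharedQueue(8, "partition")
while q.allocate(0):        # logical processor 0 grabs all it can
    pass
print(q.used[0])            # stops at 4, half of the 8 entries
```

Under "partition" the greedy logical processor stops at half the entries, leaving the other half guaranteed for its sibling; "threshold" caps it at a smaller fixed number; "full" lets it take everything, which is why full sharing is reserved for large structures where starvation is not a concern.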
SINGLE-TASK AND MULTI-TASK MODES

• Two modes of operation:
  – single-task (ST)
  – multi-task (MT)
• In MT-mode, there are two active logical processors and some of the resources are partitioned.
• There are two flavors of ST-mode: single-task logical processor 0 (ST0) and single-task logical processor 1 (ST1).
• In ST0- or ST1-mode, only one logical processor is active, and resources that were partitioned in MT-mode are recombined to give the single active logical processor use of all of the resources.
• The HALT instruction stops processor execution.
• On a processor with Hyper-Threading Technology, executing HALT transitions the processor from MT-mode to ST0- or ST1-mode, depending on which logical processor executed the HALT.
• In ST0- or ST1-mode, an interrupt sent to the halted logical processor causes a transition back to MT-mode.
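The mode transitions described above can be sketched as a tiny state machine over the states "MT", "ST0", and "ST1". This is a simplified illustration of the deck's description, not processor microcode.

```python
# Sketch of the ST/MT mode transitions as a state machine.
# States: "MT" (both LPs active), "ST0" (only LP0 active),
# "ST1" (only LP1 active). Simplified illustration only.

def on_halt(mode, lp):
    """HALT executed on logical processor `lp` drops MT-mode into the
    single-task mode of the *other* logical processor."""
    if mode == "MT":
        return "ST1" if lp == 0 else "ST0"
    return mode  # already in a single-task mode

def on_interrupt(mode, lp):
    """An interrupt sent to the halted logical processor restores MT-mode."""
    if mode == "ST0" and lp == 1:
        return "MT"
    if mode == "ST1" and lp == 0:
        return "MT"
    return mode

mode = on_halt("MT", 1)       # LP1 executes HALT -> ST0
mode = on_interrupt(mode, 1)  # interrupt wakes LP1 -> back to MT
print(mode)
```

The key point the model captures is that halting one logical processor frees the partitioned resources for its sibling, and only an interrupt to the halted side re-partitions them.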
OPERATING SYSTEM

• For best performance, the operating system should implement two optimizations.
  – The first is to use the HALT instruction when one logical processor is active and the other is not. HALT allows the processor to transition from MT-mode to either ST0- or ST1-mode.
  – The second optimization is in scheduling software threads to logical processors. The operating system should schedule threads to logical processors on different physical processors before scheduling two threads to the same physical processor.
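The second optimization amounts to a placement preference in the scheduler: pick a logical processor whose physical processor is idle before doubling up on a busy one. A minimal sketch; the 2-physical × 2-logical topology below is a made-up example, not a real system's layout.

```python
# Sketch of the scheduling optimization: prefer a logical CPU on an
# idle physical processor over the sibling of a busy one.
# Hypothetical topology: logical CPU id -> physical processor id.
topology = {0: 0, 1: 0, 2: 1, 3: 1}

def pick_cpu(busy):
    """Return a free logical CPU id, preferring idle physical processors,
    or None if every logical CPU is busy."""
    busy_phys = {topology[c] for c in busy}
    free = [c for c in topology if c not in busy]
    # First choice: a logical CPU whose physical processor is fully idle.
    for c in free:
        if topology[c] not in busy_phys:
            return c
    # Otherwise: share a physical processor with a busy sibling.
    return free[0] if free else None

print(pick_cpu({0}))  # -> 2: spread onto the other physical processor
```

With logical CPU 0 busy, the sketch picks logical CPU 2 (the other physical processor) rather than CPU 0's sibling, matching the slide's guidance.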
Business Benefits of Hyper-Threading Technology

• Higher transaction rates for e-businesses
• Improved reaction and response times for end users and customers
• Increased number of users that a server system can support
• Ability to handle increased server workloads
• Compatibility with existing server applications and operating systems
[Figure: performance increases from Hyper-Threading Technology on an OLTP workload]
[Figure: web server benchmark performance]
Conclusion

• Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture.
• It will become increasingly important going forward, as it adds a new technique for obtaining additional performance at lower transistor and power cost.
• The goal was to implement the technology at minimum cost while ensuring forward progress on each logical processor, even if the other is stalled, and to deliver full performance even when there is only one active logical processor.
References

• Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton, "Hyper-Threading Technology Architecture and Microarchitecture," Intel Technology Journal, Volume 06, Issue 01, February 14, 2002, pp. 4–15.
• David Koufaty, Deborah T. Marr, "Hyperthreading Technology in the NetBurst Microarchitecture," IEEE Micro, Vol. 23, Issue 2, March–April 2003, pp. 56–65.
• http://cache-www.intel.com/cd/00/00/22/09/220943_220943.pdf
• http://www.cs.washington.edu/research/smt/papers/tlp2ilp.final.pdf
• http://mos.stanford.edu/papers/mj_thesis.pdf