Features of modern intel microprocessors
Transcript

  • 1. Software & Services Group, Developer Products Division. Copyright © 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
    Essential Performance · Advanced Performance · Distributed Performance · Efficient Performance
    Features of Modern Intel Microprocessors
    Prepared by: Krunal P Siddhapathak (10BEC097)
  • 2. Core and Multi-Core Processor
    What is a core?
    A standard processor has one core (single-core). Single-core processors process only one instruction stream at a time (they do use pipelines internally, which allow several instructions to be in progress together; however, the instructions still come from one stream).
    What is a multi-core processor?
    A multi-core processor is composed of two or more independent cores, each capable of processing its own instructions. A dual-core processor contains two cores, a quad-core processor contains four cores, and a hexa-core processor contains six cores.
  • 3. Need of Multi-Core Processors
    Multiple cores can be used to run two programs side by side. When an intensive program is running (AV scan, video conversion, CD ripping, etc.), another core can run your browser or email client.
    Multiple cores really shine when you are using a program that can utilize more than one core (called parallelization) to improve its efficiency. Programs such as graphics software and games can run multiple instructions at the same time and deliver faster, smoother results.
    If you use CPU-intensive software, multiple cores will likely provide a better computing experience. If you use your PC to check email and watch the occasional video, you really don't need a multi-core processor.
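As a rough illustration of the parallelization idea on this slide, the sketch below splits one job into independent chunks and hands them to a pool of workers. The function names (`chunked`, `sum_of_squares`, `parallel_sum_of_squares`) and the workload are hypothetical and not from the slides. In CPython, threads do not speed up CPU-bound work like this, so treat the thread pool as a stand-in: `ProcessPoolExecutor` offers the same `map` interface and would be the usual choice for real multi-core speedups.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The per-worker task: independent of every other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=None):
    """Fan the chunks out to a worker pool and combine the partial sums."""
    workers = workers or os.cpu_count() or 1
    chunks = chunked(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000)), workers=4))
```

The key design point is that each chunk can be processed without talking to the others, which is what lets extra cores help at all.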
  • 4. Core 2 Duo vs. Core i3 vs. Core i5

    Feature             Core 2 Duo        Core i3      Core i5
    Number of threads   Two               Four         Four
    Socket              775 (45/65 nm)    1156 (nm)    1156 (nm)
    Compatible RAM      DDR2              DDR3         DDR3
    Turbo Boost         No                No           Yes
    Overclocking        No                Yes          No
  • 5. Do I need an i3, i5, or i7?
    As with all computer hardware, the type of processor you need depends on your needs, on how long you want your computer to stay current, and on your budget. If you:
    • Browse the internet, check email, and play the occasional Flash game (like Farmville): get a single-core netbook or desktop.
    • Do word processing, spreadsheets, etc., listen to music often, and watch movies: get an i3 processor (or any dual-core processor, e.g. Core 2 Duo).
    • Play the occasional game and are happy with lower resolution and lower quality graphics, watch HD movies, etc.: get an i5. (This suggestion assumes the graphics processor in the pre-built PC is well matched to the suggested processor.)
    • Do graphic publishing, music creation, programming (and compiling), watch HD movies, or like to play visually appealing games: get a quad-core i5 or i7.
    • Like to have the very best hardware and play the most graphically intense games: get a quad-core or hexa-core i7 Extreme.
  • 6. Intel Sandy Bridge Microarchitecture
    Many of the bottlenecks of previous designs have been dealt with in the Sandy Bridge.
    Instruction fetch and predecoding have been a serious bottleneck in Intel designs for many years. In the NetBurst architecture they tried to fix this problem by caching decoded µops, without much success. In the Sandy Bridge design, they are caching instructions both before and after decoding. The limited size of the µop cache is therefore less problematic, and the µop cache appears to be very efficient.
    The limited number of register read ports has been a serious, and often neglected, bottleneck since the old Pentium Pro. This bottleneck has now finally been removed in the Sandy Bridge.
    Previous Intel processors have only one memory read port where AMD processors have two. This was a bottleneck in many math applications. The Sandy Bridge has two read ports, whereby this bottleneck is removed.
    The branch prediction has been improved by having bigger buffers and a shorter misprediction penalty, but it has no loop predictor, and mispredictions are still quite common.
    The new AVX instruction set is an important improvement. The throughput of floating point addition and multiplication is doubled when the new 256-bit YMM registers are used. The new non-destructive three-operand instructions are quite convenient for reducing register pressure and avoiding register move instructions. There is, however, a serious performance penalty for mixing vector instructions with and without the VEX prefix. This penalty is easily avoided if the programming guidelines are followed, but I suspect that it will be a very common programming error in the future to inadvertently mix VEX and non-VEX instructions, and such errors will be difficult to detect.
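The "doubled throughput" claim for 256-bit YMM registers follows from simple lane arithmetic, sketched below. The helper name `lanes` is hypothetical. The 128-bit figure for the older XMM registers is not stated on the slide and is added here as the baseline for comparison.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


XMM_BITS = 128  # SSE register width (baseline, not from the slide)
YMM_BITS = 256  # AVX register width named on the slide

for label, element_bits in (("single precision", 32), ("double precision", 64)):
    old = lanes(XMM_BITS, element_bits)
    new = lanes(YMM_BITS, element_bits)
    print(f"{label}: {old} lanes per XMM, {new} lanes per YMM")
```

With the same number of vector instructions issued per cycle, twice the lanes per instruction means twice the floating point operations per cycle.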
  • 7. Intel Sandy Bridge Microarchitecture (Contd.)
    Whenever the narrowest bottleneck is removed from a system, the next less narrow bottleneck will become the limiting factor. The new bottlenecks that require attention in the Sandy Bridge are the following:
    • The µop cache: This cache can ideally hold up to 1536 µops. The effective utilization will be much less in most cases. The programmer should pay attention to make sure the most critical inner loops fit into the µop cache.
    • Instruction fetch and decoding: The fetch/decode rate has not been improved over previous processors and is still a potential bottleneck for code that doesn't fit into the µop cache.
    • Data cache bank conflicts: The increased memory read bandwidth means that the frequency of cache conflicts will increase. Cache bank conflicts are almost unavoidable in programs that utilize the memory ports to their maximum capacity.
    • Branch prediction: While the branch history buffer and branch target buffers are probably bigger than in previous designs, mispredictions are still quite common.
    • Sharing of resources between threads: Many of the critical resources are shared between the two threads of a core when hyperthreading is on. It may be wise to turn off hyperthreading when multiple threads depend on the same execution resources.
  • 8. Intel Ivy Bridge Microarchitecture
    Ivy Bridge is the codename for an Intel microprocessor using the Sandy Bridge microarchitecture. The name is also applied more broadly to the 22 nm die shrink of the microarchitecture based on tri-gate ("3D") transistors, which is also used in the future Ivy Bridge-EX and Ivy Bridge-EP microprocessors.
    Ivy Bridge processors are backwards-compatible with the Sandy Bridge platform, but might require a firmware update (vendor specific). Intel has released new 7-series Panther Point chipsets with integrated USB 3.0 to complement Ivy Bridge.
    Volume production of Ivy Bridge chips began in the third quarter of 2011. Quad-core and dual-core mobile models launched on April 29, 2012 and May 31, 2012 respectively. Core i3 desktop processors, as well as the first 22 nm Pentium, were launched and available in the first week of September 2012.
  • 9. Intel Ivy Bridge Microarchitecture (Contd.)
    How much faster are the Ivy Bridge processors?
    The base clock frequency of these processors ranges from 2.8 GHz (for the Core i5-3450S) to 3.5 GHz (for the Core i7-3770K).
    What different types of Ivy Bridge processors are available?
    There are many types of processors in the Ivy Bridge family. The type is indicated by a suffix on the CPU model name. The following list explains these suffixes:
    • K: Unlocked, ready to be overclocked.
    • S: Performance optimized. Low power consumption.
    • T: Power optimized. Ultra low power consumption.
    • M: Mobile processors for mobile devices.
    • Q: Quad-core processors.
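The suffix list above can be turned into a small lookup. This is only a sketch: the table `SUFFIX_MEANINGS` copies the slide's wording, while the function name `describe_suffixes` and the assumption that suffix letters always trail the model number are illustrative choices, not an official Intel naming parser.

```python
import re

# Meanings as given on the slide.
SUFFIX_MEANINGS = {
    "K": "Unlocked, ready to be overclocked",
    "S": "Performance optimized, low power consumption",
    "T": "Power optimized, ultra low power consumption",
    "M": "Mobile processor",
    "Q": "Quad-core processor",
}


def describe_suffixes(model):
    """Return the meaning of each trailing suffix letter in a model name."""
    match = re.search(r"\d+([A-Z]*)$", model.strip().upper())
    if not match:
        return []
    return [SUFFIX_MEANINGS.get(ch, "unknown suffix") for ch in match.group(1)]


if __name__ == "__main__":
    for name in ("Core i7-3770K", "Core i5-3450S", "Core i7-3770"):
        print(name, "->", describe_suffixes(name))
```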
  • 10. Intel Ivy Bridge Microarchitecture (Contd.)
    Features present in Ivy Bridge:
    • HD graphics: Ivy Bridge processors have a built-in GPU. The GPU supports DirectX 11 (Sandy Bridge supports version 10.1) and OpenGL 3.1 (Sandy Bridge supports version 3.0). Ivy Bridge processors have the Intel HD 4000 or HD 2500 GPU. This means that you do not need an add-on graphics card.
    • Quick Sync Video: This feature, first introduced with Sandy Bridge, is carried forward in Intel's 3rd generation processors. It uses dedicated media processing to make video creation and conversion faster and easier, whether you want to create DVDs, create, convert and edit 2D/3D videos, or upload to your favorite social networking sites.
    • WiDi 3.0: Wireless Display technology allows you to stream media content to your Wi-Fi connected display devices. You can share 1080p video at 60 FPS using WiDi.
    • Turbo Boost Technology 2.0: Using Turbo Boost, Ivy Bridge processors can run faster than their base frequency. For example, a 3.5 GHz Core i7 can run at 3.9 GHz for some time.
  • 11. Core 2 Duo vs. Core i3
    The Core 2 Duo is Intel's veteran, covering a wide range of price and performance sweet spots. It is now being replaced, however, by Intel's rookie, the Core i3. So, is the Core i3 actually better than the Core 2 Duo, or can you hold off upgrading for a while longer?
    The Core 2 Duo has been the processor of choice in laptops for about three years. Over those three years the average speeds of Core 2 Duo processors have advanced significantly, and many of today's Core 2 Duo laptops have speeds of around 2.2 GHz or faster. Core 2 Duo processors have also been the go-to for many less expensive desktop systems, with speeds reaching over 3 GHz.
    The Core i3 is very similar to the Core 2 Duo in many ways. Both are dual-core processors, and most Core 2 Duo and Core i3 processors have similar clock speeds. However, the processors are based on different architectures. So, which one is better?
  • 12. Core 2 Duo vs. Core i3 (Contd.)
    Architecture
    The Core 2 Duo processors are based on the Core 2 architecture. The Core and Core 2 architectures were arguably Intel's most successful architectures, as they replaced the Pentium 4 processors in desktop systems and made Intel competitive in that space once again.
    The Core i3 is based on a new architecture called Nehalem. The Nehalem architecture has numerous advantages over the Core 2 architecture. Nehalem is better constructed for quad-core processors, has hyper-threading available, and can use a feature called Turbo Boost which maximizes processor speed. However, because the Core i3 is the low-end Nehalem variant, most of these features are disabled or not relevant: the Core i3 is a dual-core processor and Turbo Boost is disabled, but hyper-threading is enabled.
  • 13. Core 2 Duo vs. Core i3 (Contd.)
    Processor Performance
    The Core i3 is the slowest variant of the Nehalem-based processors. The Core 2 Duo processors, however, don't have the same differentiation between versions of the same architecture. The fastest Core 2 Duo desktop processor has a speed of 3.33 GHz, while the fastest Core i3 desktop processor is clocked at 3.06 GHz.
    You might therefore expect that the Core 2 Duo would have the edge, particularly when you consider that the Core 2 Duo costs almost three times as much if you buy it individually, but in fact the Core i3 is faster, and often by no small margin. The Core i3 is faster even in single-threaded applications, but the performance gap really widens in multi-threaded applications. This is because the Core i3 has hyper-threading, which turns the two real cores into four virtual cores. Windows works with the Core i3 as if it were a quad-core processor.
    These results remain true in the mobile space as well. Core i3 processors punch at least 500 MHz above their weight in single-threaded applications, and are virtually always faster in multi-threaded applications, no matter the clock speeds of the Core 2 Duo and Core i3 processors you are comparing.
  • 14. Core 2 Duo vs. Core i3 (Contd.)
    Power Usage and Heat
    A look at the technical specifications of the Core i3 processors automatically puts them in a negative light when it comes to power consumption. The desktop Core i3 parts are listed as having a 73 Watt TDP, while most Core 2 Duo desktop parts have a 65 Watt TDP. In laptops the Core i3 has a 35 Watt TDP, while Core 2 Duo mobile processors usually have a 25 Watt TDP.
    These differences pan out about how you'd expect them to when it comes to absolute power consumption. The Core i3 processors do consume slightly more power than Core 2 Duo processors at load and at idle. We're talking a difference of around 10 Watts on desktops and a few on laptops: nothing huge, but a difference nonetheless.
    However, when it comes to power efficiency the answer becomes less clear. In order for a processor to be power efficient, it needs not only low power consumption but also the ability to complete tasks quickly. This lowers the overall "task energy" because a faster processor will be done with a task before a slower processor, and once done it will slip back into an idle state.
    When viewed from this perspective, the Core i3 is much more efficient than the Core 2 Duo on both the desktop and the laptop. This means that the Core i3 will probably not use any more power than a Core 2 Duo, and may actually use less, unless your usage patterns place a constant load on your processor.
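The "task energy" argument can be made concrete with a little arithmetic: energy is power multiplied by time, summed over the busy and idle phases. In the sketch below only the 73 W and 65 W load figures echo the TDP numbers on the slide (and TDP is not the same as measured draw). The idle power, the task durations, and the function name `task_energy_joules` are hypothetical values chosen purely for illustration.

```python
def task_energy_joules(load_watts, idle_watts, task_seconds, window_seconds):
    """Energy used over a fixed window: busy for the task, idle afterwards."""
    if task_seconds > window_seconds:
        raise ValueError("the task must finish inside the window")
    busy = load_watts * task_seconds
    idle = idle_watts * (window_seconds - task_seconds)
    return busy + idle


WINDOW_S = 100  # hypothetical observation window

# Hypothetical: the slower chip needs the whole window, the faster one 70 s.
slower = task_energy_joules(load_watts=65, idle_watts=10,
                            task_seconds=100, window_seconds=WINDOW_S)
faster = task_energy_joules(load_watts=73, idle_watts=10,
                            task_seconds=70, window_seconds=WINDOW_S)

print(f"slower chip: {slower} J, faster chip: {faster} J")
```

Under these assumed numbers the chip with the higher load power still uses less energy over the window, because it spends more of the window idle.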
  • 15. Various Core Processors of Intel
    Core i3 Series
    Intel's Core i3 processor line has always been a budget option. These processors remain dual-core, unlike the rest of the Core line, which is made up of quad-core processors. Intel's Core i3 processors also have many features restricted.
    The main feature that is kept from the Core i3 processors is Turbo Boost, the dynamic overclocking available on most Intel processors. This, along with the dual-core design, accounts for most of the performance difference between Core i3 processors and the i5 and i7 options.
    One feature that the Core i3 has, and the i5 doesn't, is hyper-threading. This is Intel's logical-core duplication technology, which allows each physical core to be used as two logical cores. The result is that Windows will display a dual-core Core i3 processor as if it were a quad-core.
    Finally, Core i3 processors have their integrated graphics processor restricted to a maximum clock speed of 1100 MHz, and all Core i3 processors have the 2000 series IGP, which is restricted to 6 execution units. This will result in slightly lower IGP performance overall, but the difference is frankly inconsequential in many situations.
  • 16. Various Core Processors of Intel (Contd.)
    Core i5 Series
    Intel used to split the Core i5 processor brand into two different lines, one of which was dual-core and one of which was quad-core. All Sandy Bridge Core i5 processors are quad-core processors, they all have Turbo Boost, and they all lack Hyper-Threading. Most of the Core i5 processors, besides the K series (explained later), use the same 2000 series IGP with a maximum clock speed of 1100 MHz and six execution units.
    In the i3 vs. i5 vs. i7 battle, the Core i5 processor is now obviously the mainstream option no matter which product you buy. The only substantial difference between the Core i5 options is the clock speed, which ranges from 2.8 GHz to 3.3 GHz. The products with a higher clock speed are more expensive than those that are slower.
  • 17. Various Core Processors of Intel (Contd.)
    Core i7 Series
    These processors are virtually identical to the Core i5. They have a 100 MHz higher base clock speed, which is inconsequential in most situations. The real feature difference is the addition of hyper-threading on the Core i7, which means that the processor will appear as an 8-core processor in Windows. This improves threaded performance and can result in a substantial boost if you're using a program that is able to take advantage of 8 threads.
    Of course, most programs can't take advantage of 8 threads. Those that can are usually meant for enterprise or advanced video editing applications: 3D rendering programs, photo editing programs, and scientific programs are categories of software frequently designed to use 8 threads. The average user is unlikely to see the full benefit of the hyper-threading feature. In the Core i3 vs. i5 vs. i7 battle, the i7 has limited appeal.
    The IGP on Core i7 processors can also reach a higher maximum clock speed of 1350 MHz; however, this difference is largely inconsequential when measuring real-world performance.
  • 18. Various Core Processors of Intel (Contd.)
    The K series processor
    Late in the lifespan of Intel's previous Core i branded products, Intel introduced the "K" series. These processors had unlocked multipliers, making them easier to overclock.
    Intel has kept this line of products alive with the new Sandy Bridge architecture by introducing a K series Core i5 and i7 processor. As before, these processors have unlocked multipliers. However, they also have a new feature: better integrated graphics processors.
    This comes in the form of the 3000 series IGP, which has 12 execution units instead of 6. The maximum clock speed remains limited by the processor brand: the Core i5 K is limited to 1100 MHz, while the Core i7 K can reach 1350 MHz. The additional execution units can result in better performance in games, although, to be honest, the IGP isn't remotely cut out for desktop gaming.
  • 19. Various Core Processors of Intel (Contd.)
    The IGP Features: Sandy Bridge
    The most important new feature added to Intel's Sandy Bridge processors is the inclusion of an IGP on the processor. Intel did this before with Core i3 and some Core i5 processors, but the IGP was still separate from the processor itself: the IGP and CPU were placed in the same package, but didn't physically work together.
    Now Intel has taken the IGP integration a step further and worked the IGP into the CPU architecture. It even shares cache with the processor. What this means, in practical terms, is that the on-board graphics of Intel's new processors are superior to anything they've offered before. It also enables Quick Sync, a video transcoding feature that provides very fast performance when converting videos to a different format.
    Intel is offering two different types of IGP on its processors. The 2000 has 6 execution units, while the 3000 has 12 execution units, so the latter is quicker. Intel hasn't tied the IGP that you receive to the brand of processor you choose, however. Instead, it has tied the 3000 series IGP to the "K" series processors. If you see a "K" at the end of the processor's name, it has the 3000 series IGP. So far, Intel doesn't offer a Core i3 K series processor, but that could change in the future.
  • 20. Various Core Processors of Intel (Contd.)
    Laying Out the Chipset
    The staggered release of Intel's previous Core i3/i5/i7 products also resulted in a staggered release of processor sockets and their related chipsets. First came the LGA 1366 processor socket, which was tied to some Core i7 processors. Then Intel confused things by releasing the LGA 1156 socket, which was made available on several different chipsets and processor types. Choosing the right socket and chipset for a processor wasn't easy.
    Intel has now clarified matters by releasing a single processor socket and two processor chipsets alongside Sandy Bridge. The new socket is LGA 1155, and it isn't backwards compatible with anything Intel has previously offered. The new chipsets are P67 and H67, with the P variant being performance-oriented and the H variant targeted at general use. The main difference is that P67 allows for processor overclocking, while H67 does not. P67 also offers 16 additional PCIe lanes. Both Core i3 and i5 processors are compatible with either chipset.
  • 21. Core i5 vs. Core i7
    Core i5: The New Middle Class
    While the hardware has changed, Intel's branding scheme remains the same, and Core i5 remains Intel's primary mid-range processor. It is targeted at the heart of the market, with pricing that is not at budget levels but still affordable, and performance that is extremely quick but not the fastest Intel offers.
    Intel's high-end processor line is the Core i7. Many users who are looking for a high-performance part end up considering both i5 and i7 products.
    A Unified Socket and Chipset
    Perhaps the best news to come out of Intel's new line of i5 and i7 processors is the introduction of a single socket for all Sandy Bridge Core i3/i5/i7 processors. For now, all Sandy Bridge processors use the LGA 1155 socket. In case you're wondering, this socket is not backwards compatible with previous LGA 1156 processors.
  • 22. Core i5 vs. Core i7 (Contd.)
    Intel Turbo Boost
    Intel has made Turbo Boost a standard feature on all Core i5 and i7 processors, from the least to the most expensive. Intel has also reduced the gap between the maximum Turbo Boost frequencies on different processors. Previously, some of the older Core i7 processors actually had a much less effective Turbo Boost feature than some newer Core i5s.
    All of Intel's current Core i5 and i7 processors offer a boost of between 300 and 400 MHz. The least expensive i5s offer the 300 MHz boost: for example, the Core i5 2300 has a base clock speed of 2.8 GHz and a maximum Turbo Boost speed of 3.1 GHz. The Intel Core i7 2600, on the other hand, offers a base clock speed of 3.4 GHz and a maximum Turbo Boost of 3.8 GHz.
    Besides the clock speed difference, Turbo Boost is essentially the same on the i5 and i7 processors.
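The two examples on this slide can be checked with a short calculation. The base and turbo clocks come from the slide; the helper name `turbo_headroom` and the choice to express the boost as a percentage of the base clock are illustrative additions.

```python
def turbo_headroom(base_mhz, turbo_mhz):
    """Return (extra MHz, percent above base clock) for one processor."""
    if base_mhz <= 0 or turbo_mhz < base_mhz:
        raise ValueError("expected 0 < base_mhz <= turbo_mhz")
    extra = turbo_mhz - base_mhz
    return extra, round(100.0 * extra / base_mhz, 1)


# (base MHz, maximum Turbo Boost MHz) as quoted on the slide.
SLIDE_EXAMPLES = {
    "Core i5 2300": (2800, 3100),
    "Core i7 2600": (3400, 3800),
}

for name, (base, turbo) in SLIDE_EXAMPLES.items():
    extra, percent = turbo_headroom(base, turbo)
    print(f"{name}: +{extra} MHz ({percent}% above base)")
```

Both examples land inside the 300 to 400 MHz range the slide describes.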
  • 23. Core i5 vs. Core i7 (Contd.)
    Difference in Hyper-Threading
    Another significant performance difference is how the Core i7 and Core i5 products handle hyper-threading. Hyper-threading is a technology used by Intel to simulate more cores than actually exist on the processor. While Core i7 products have all been quad-cores, they appear in Windows as having eight cores. This further improves performance when using programs that make good use of multi-threading.
    All Sandy Bridge Core i5 processors have hyper-threading disabled, and all Sandy Bridge Core i7 processors have hyper-threading enabled. This is a major feature difference between Core i5 and Core i7 processors, and it will give the Core i7 products an advantage over Core i5 processors in some heavily multi-threaded applications.
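The core counts that Windows reports follow directly from the number of physical cores and whether hyper-threading is on. The sketch below encodes the Sandy Bridge lineup as the slides describe it (dual-core i3 with hyper-threading, quad-core i5 without, quad-core i7 with). The `Cpu` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    physical_cores: int
    hyper_threading: bool

    @property
    def logical_processors(self):
        """Each physical core shows up as two logical processors with HT on."""
        return self.physical_cores * (2 if self.hyper_threading else 1)


# Configuration as described in the slides for Sandy Bridge parts.
LINEUP = [
    Cpu("Core i3", physical_cores=2, hyper_threading=True),
    Cpu("Core i5", physical_cores=4, hyper_threading=False),
    Cpu("Core i7", physical_cores=4, hyper_threading=True),
]

for cpu in LINEUP:
    print(f"{cpu.name}: {cpu.physical_cores} cores, "
          f"{cpu.logical_processors} logical processors")
```

On a real machine, Python's `os.cpu_count()` reports the logical processor count, not the physical core count.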
  • 24. Core i5 vs. Core i7 (Contd.)
    The New IGP
    All of Intel's Sandy Bridge processors make use of a new IGP that is part of the processor architecture. While far from a gaming-grade video solution, the IGP offers reasonable performance without consuming much power. It also enables features like Quick Sync, which can transcode video extremely quickly.
    There are two versions of this IGP: the 2000 and the 3000. The only difference between the two is the number of execution units. The 2000 has 6, while the 3000 has 12. This doesn't mean the 3000 is twice as quick, but it does mean the 3000 is about 50% quicker in most benchmarks.
  • 25. Core i5 vs. Core i7 (Contd.)
    i5 vs. i7: What it Means to Consumers and Power Users
    Currently, the Core i5 processor brand makes up most of Intel's Sandy Bridge processor line. The prices of these processors range from $177 to $216, with base clock speeds between 2.8 GHz and 3.3 GHz. Intel only offers two Core i7 products, the Core i7-2600 and Core i7-2600K, both of which have a 3.4 GHz base clock speed. The i7-2600 has a price tag of $294.
    As you may have guessed, paying about $80 more for the 100 MHz clock speed increase between the fastest i5 and the i7 isn't a great deal. The main reason to pay this additional cash for an i7 is hyper-threading, but this advantage will only be evident if you frequently use programs that can actually make use of 8 threads.
    For most users, the i5 is clearly the better deal. The i5-2500 makes the most sense in my opinion, as it offers an extremely quick base clock speed of 3.3 GHz for about $200. Of course, the value of this is subject to change in the future as Intel fleshes out its product line with new models.
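The "about $80 for 100 MHz" comparison can be worked through as follows. The prices and clocks are the ones quoted on the slide. Pairing the top of the i5 price range ($216) with the top i5 clock (3.3 GHz) is an assumption made here so that the gap matches the slide's "about $80" figure, and the helper name `percent_increase` is hypothetical.

```python
def percent_increase(lower, higher):
    """Percentage by which `higher` exceeds `lower`, to one decimal place."""
    return round(100.0 * (higher - lower) / lower, 1)


# Figures quoted on the slide (pairing of $216 with 3.3 GHz is assumed).
I5_PRICE_USD, I5_CLOCK_MHZ = 216, 3300
I7_PRICE_USD, I7_CLOCK_MHZ = 294, 3400

price_gap = I7_PRICE_USD - I5_PRICE_USD
clock_gap = I7_CLOCK_MHZ - I5_CLOCK_MHZ

print(f"extra cost:  ${price_gap} "
      f"({percent_increase(I5_PRICE_USD, I7_PRICE_USD)}% more)")
print(f"extra clock: {clock_gap} MHz "
      f"({percent_increase(I5_CLOCK_MHZ, I7_CLOCK_MHZ)}% more)")
```

The price rises by roughly a third while the base clock rises by a few percent, which is why the slide points to hyper-threading, not clock speed, as the reason to buy the i7.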
  • 26. Hyper-Threading
    Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. Hyper-Threading Technology makes a single physical processor appear as two logical processors. The physical execution resources are shared and the architecture state is duplicated for the two logical processors. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources.
    The amazing growth of the Internet and telecommunications is powered by ever-faster systems demanding increasingly higher levels of processor performance. To keep up with this demand we cannot rely entirely on traditional approaches to processor design. Microarchitecture techniques used to achieve past processor performance improvement (super-pipelining, branch prediction, super-scalar execution, out-of-order execution, caches) have made microprocessors increasingly more complex, have more transistors, and consume more power. In fact, transistor counts and power are increasing at rates greater than processor performance. Processor architects are therefore looking for ways to improve performance at a greater rate than transistor counts and power dissipation. Intel's Hyper-Threading Technology is one solution.
  • 27. Hyper-Threading (Contd.)
    A look at today's software trends reveals that server applications consist of multiple threads or processes that can be executed in parallel. On-line transaction processing and Web services have an abundance of software threads that can be executed simultaneously for faster performance. Even desktop applications are becoming increasingly parallel. Intel architects have been trying to leverage this so-called thread-level parallelism (TLP) to gain a better performance vs. transistor count and power ratio.
    In both the high-end and mid-range server markets, multiprocessors have been commonly used to get more performance from the system. By adding more processors, applications potentially get substantial performance improvement by executing multiple threads on multiple processors at the same time. These threads might be from the same application, from different applications running simultaneously, from operating system services, or from operating system threads doing background maintenance. Multiprocessor systems have been used for many years, and high-end programmers are familiar with the techniques to exploit multiprocessors for higher performance levels.
  • 28. Hyper-Threading (Contd.)
    In recent years a number of other techniques to further exploit TLP have been discussed and some products have been announced. One of these techniques is chip multiprocessing (CMP), where two processors are put on a single die. The two processors each have a full set of execution and architectural resources. The processors may or may not share a large on-chip cache. CMP is largely orthogonal to conventional multiprocessor systems, as you can have multiple CMP processors in a multiprocessor configuration. Recently announced processors incorporate two processors on each die. However, a CMP chip is significantly larger than a single-core chip and therefore more expensive to manufacture; moreover, it does not begin to address the die size and power considerations.
    Another approach is to allow a single processor to execute multiple threads by switching between them. Time-slice multithreading is where the processor switches between software threads after a fixed time period. Time-slice multithreading can result in wasted execution slots but can effectively minimize the effects of long latencies to memory. Switch-on-event multithreading would switch threads on long latency events such as cache misses. This approach can work well for server applications that have large numbers of cache misses and where the two threads are executing similar tasks. However, neither the time-slice nor the switch-on-event multithreading technique achieves optimal overlap of many sources of inefficient resource usage, such as branch mispredictions, instruction dependencies, etc.
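To make the difference between time-slice and switch-on-event multithreading more tangible, here is a toy simulation. It is a deliberately simplified model and not a description of any real processor: it assumes a single-issue core, represents each thread as a string of one-cycle compute ops (`c`) and cache-missing loads (`m`), and uses hypothetical values for the quantum and the miss latency. The function name `simulate` is also made up for this sketch.

```python
def simulate(traces, policy, quantum=4, miss_latency=6):
    """Return (total_cycles, issued_ops) for a toy single-issue core.

    traces: one string per thread; 'c' = one-cycle compute op,
            'm' = load that misses and stalls its own thread.
    policy: "time-slice" or "switch-on-event".
    """
    if policy not in ("time-slice", "switch-on-event"):
        raise ValueError(f"unknown policy: {policy}")
    n = len(traces)
    pc = [0] * n        # next op index per thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cycle = issued = current = 0
    slice_left = quantum

    def unfinished(i):
        return pc[i] < len(traces[i])

    while any(unfinished(i) for i in range(n)):
        if policy == "time-slice":
            # Switch only when the quantum expires or the thread is done.
            if slice_left == 0 or not unfinished(current):
                for k in range(1, n + 1):
                    cand = (current + k) % n
                    if unfinished(cand):
                        current = cand
                        break
                slice_left = quantum
        elif not unfinished(current) or ready_at[current] > cycle:
            # Switch as soon as the current thread stalls or finishes.
            for k in range(1, n + 1):
                cand = (current + k) % n
                if unfinished(cand) and ready_at[cand] <= cycle:
                    current = cand
                    break
        if unfinished(current) and ready_at[current] <= cycle:
            op = traces[current][pc[current]]
            pc[current] += 1
            issued += 1
            if op == "m":
                ready_at[current] = cycle + 1 + miss_latency
        cycle += 1  # an unissued cycle is a wasted execution slot
        slice_left -= 1
    return cycle, issued


if __name__ == "__main__":
    threads = ["mccc", "mccc"]
    for policy in ("time-slice", "switch-on-event"):
        cycles, ops = simulate(threads, policy)
        print(f"{policy}: {ops} ops in {cycles} cycles")
```

In this model, cycles in which nothing issues correspond to the "wasted execution slots" mentioned on the slide. Neither policy can fill a slot while both threads are stalled, which is the gap simultaneous multi-threading aims at.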
  • 29. Hyper Threading (Contd.): Finally, there is simultaneous multi-threading, where multiple threads can execute on a single processor without switching. The threads execute simultaneously and make much better use of the resources. This approach makes the most effective use of processor resources: it maximizes the performance vs. transistor count and power consumption. Hyper-Threading Technology brings the simultaneous multi-threading approach to the Intel architecture. In this paper we discuss the architecture and the first implementation of Hyper-Threading Technology on the Intel Xeon processor family. Hyper-Threading Technology makes a single physical processor appear as multiple logical processors. To do this, there is one copy of the architecture state for each logical processor, and the logical processors share a single set of physical execution resources. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on conventional physical processors in a multiprocessor system. From a microarchitecture perspective, this means that instructions from logical processors will persist and execute simultaneously on shared execution resources.
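Because software sees logical processors, the count reported by the operating system is the logical count, not the number of physical cores. A minimal sketch in Python: `os.cpu_count()` is the standard-library call that returns the number of logical CPUs, while `estimated_physical_cores` and its `threads_per_core=2` default are hypothetical, added here only to illustrate the relationship on a machine where every core has Hyper-Threading enabled.

```python
import os


def logical_processors():
    """Number of logical CPUs the operating system exposes (at least 1)."""
    count = os.cpu_count()          # may return None if undeterminable
    return count if count is not None else 1


def estimated_physical_cores(logical, threads_per_core=2):
    """Rough estimate only: assumes every core exposes threads_per_core logical CPUs."""
    return max(1, logical // threads_per_core)


logical = logical_processors()
print("logical processors:", logical)
print("estimated physical cores:", estimated_physical_cores(logical))
```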
  • 30. Hyper Threading (Contd.): There are a few elements of a CPU that need to be understood in order to understand Hyper-Threading Technology. Registers: Registers are basically circuits that hold a single 64-bit value and are the fastest form of storage available on a computer. The x86 architecture provides a number of General Purpose Registers that are used by an executing program. In a multicore chip, registers are unique to each core, so a quad-core processor has 4 sets of general purpose registers. Cache: Cache is essentially a form of storage that falls between registers and RAM in terms of speed. In modern processors there are generally three levels, and in the case of the i7, Levels 1 and 2 are private to each core and Level 3 is shared by all the cores on a chip. The most important thing to know is that accessing the cache is slower than accessing registers but still faster than accessing RAM. Execution Unit: This is the section of the CPU responsible for actually executing the instructions. If you tell the computer to add 2 + 3, this is the part in which that operation is performed. Front-End: This is a unit of the processor that is also known as Instruction Fetch/Decode. Essentially this unit grabs instructions from either cache or RAM and decodes them into a form that the execution unit can understand. Branch Predictor: This unit attempts to predict branches in program code. If there is an “if-then” statement in a program, it guesses which statements will be executed and prefetches them for the front-end.
  • 31. Hyper Threading (Contd.): In a core with HT, the registers are all duplicated. This means that one core will have 2 sets of registers, and each set is what the operating system will see as a “logical core,” since the sum of the registers represents the processor’s state. We’ll call these sets A and B. Even though it appears as two cores, they will still be sharing the same cache, branch predictor, front-end, and most importantly, execution unit. Because they still share so many resources, only one thread will technically execute at once. The advantage of adding the HT logic is that if a thread is executing and stalls for any reason, the other thread can be switched in very fast while the cause of the stall in the first thread is addressed. To better illustrate how this works, consider the following: (1) Set A is considered the current state of the processor. (2) Thread A starts executing. (3) Thread A needs a value from memory that isn’t in the cache. (4) Memory access is very time consuming in CPU terms, so thread A is considered stalled. (5) Instead of wasting cycles waiting for the memory operation to complete, set B is considered the current state. (6) Thread B now executes until it stalls or until thread A can execute again (its memory operation finishes).
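The sequence of steps on this slide can be mimicked with a very small model. The sketch below is only an illustration of the register-set switch: the function name `run_ht_core`, the operation names, and the convention that the string `"miss"` marks a stalling load are assumptions for this example, and the model ignores how long a miss actually takes.

```python
def run_ht_core(thread_a, thread_b):
    """Interleave two operation lists following the slide's steps.

    Each list holds operation names; "miss" marks a load that misses the
    cache and stalls its thread. Returns the issue order as
    (register set, operation) pairs.
    """
    pending = {"A": list(thread_a), "B": list(thread_b)}
    active = "A"                    # set A starts as the current state
    issued = []
    while pending["A"] or pending["B"]:
        other = "B" if active == "A" else "A"
        if not pending[active]:
            active = other          # nothing left here, let the other thread run
            continue
        op = pending[active].pop(0)
        issued.append((active, op))
        if op == "miss" and pending[other]:
            active = other          # stalled: the other register set becomes current
    return issued


for reg_set, op in run_ht_core(["add", "miss", "add"], ["mul", "mul"]):
    print(reg_set, op)
```

Thread A issues until its cache miss, thread B then runs on the same execution unit, and thread A resumes afterwards. No register contents are saved or restored at the switch, because each thread keeps its own register set.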
  • 32. Hyper Threading (Contd.): This process basically just continues on constantly. Now, there should be an obvious question: What can cause a thread to stall? There are a few things; the simplest one to understand is a cache miss. This is when the thread goes to access a value that isn’t currently in the cache or any of the registers. A branch misprediction can also cause a stall; it occurs when the branch predictor prefetches the wrong instructions into the cache. There is another time Hyper-Threading kicks in, and that is if one thread is using floating-point resources while the second is only using integer resources. HT will allow them both to execute simultaneously as long as they don’t conflict.
  • 33. Does Hyper-Threading actually help? Hyper-Threading has some interesting performance characteristics as a result of its nature. HT will provide close to zero advantage if instruction decoding or execution is the limiting factor in performance. In the Nehalem architecture this is rarely the case. It performs ideally when there are a lot of cache misses or branch mispredictions, since the execution unit would otherwise be idle waiting for these issues to be resolved. Basically, certain applications will benefit more than others. Running a more parallel workload such as rendering or encoding video will see a nice benefit from HT, since it’s likely both threads will be accessing the same data, so they aren’t really competing for cache. Additionally, the relatively small amount of local L2 cache in the i7 (256 KB) means there will be a decent amount of memory access, giving the second thread time to execute. HT can also result in a more responsive machine when not much is going on, since threads will have very low execution time and it’s much faster for the CPU to switch the active register set than to grab another thread from RAM and load it into the registers.
  • 34. Are there drawbacks? As with most engineering decisions, there are drawbacks to HT. One of the more obvious ones is that since HT keeps the execution unit fed more efficiently, the unit spends less time idle, which can result in higher operating temperatures. More idle time would mean the CPU got a chance to cool down before the next execution burst and would result in a lower maximum temperature. There are also programs that will either not see any benefit from HT or see decreased performance. Typically, something whose performance is limited by cache, instruction decode, the execution unit, or memory access will see little to negative improvement from HT (one of the reasons the i7 has so much memory bandwidth).
  • 35. Are there drawbacks? (Contd.): Running more than one multithreaded, computationally intensive task at a time can also be a situation where HT doesn’t help performance. If a processor core is running threads that come from different programs or that operate on different data, all of the shared resources are effectively halved (data cache, branch prediction, instruction cache). This means branch mispredictions and cache misses become even more common, possibly to the point where both threads are stalled. Depending on the specific program, this can mean either lower performance (compared to HT being disabled) or worse scaling than expected. The last drawback is probably the most important one: the benefit of HT is inconsistent and dependent upon the specific operating environment and programs being run. Because of the way it works, code that is heavily optimized is likely to show less benefit, as it would be designed to reduce branch mispredictions and cache misses. The inconsistency of HT while multitasking won’t show up in benchmarks, since they’re designed to test only a single task at a time.
  • 36. Is it worth using Hyper-Threading Technology? If one does a lot of 3D rendering or video transcoding, then it probably is, since this is the workload HT is best suited for. If you find that you generally run multiple intensive tasks simultaneously (like playing a game while encoding a video or recompiling the Linux kernel in a VM), then HT could have a negative impact on overall performance (though not necessarily). One thing that is for sure is that its impact is exaggerated in synthetic benchmarks, almost to the point where it becomes misleading.
  • 37. Virtualization: Server virtualization: Huge data centers contain a large number of servers. Workload, user activity, and other factors decide which server is used and when. Even for servers that are not used to their full capacity, companies still spend money, energy, and resources keeping them updated and protecting them from crashes and overheating. Server virtualization consolidates those physical servers onto fewer, more powerful, and more energy-efficient servers; each such server runs virtual machines (VMs) and so imitates multiple servers on the network. The virtual server environment is transparent on the network, so each user can interact with the virtual servers as if they were still multiple physical servers. The main advantage is that only a few energy-efficient servers need to be maintained instead of many, which saves resources, energy, and money.
  • 38. Virtualization (Contd.): As shown in the figure, in the traditional architecture the hardware runs a single operating system, and different applications run on that operating system. Because this arrangement is not energy efficient, a virtual environment was developed that lets us run different operating systems on a single machine.
  • 39. Virtual Machine: A virtual machine monitor (VMM) is a host program that allows a single computer to support multiple, identical execution environments. All the users see their systems as self-contained computers isolated from other users, even though every user is served by the same machine. In this context, a virtual machine is an operating system (OS) that is managed by an underlying control program. For example, IBM’s VM/ESA can control multiple virtual machines on an IBM S/390 system. Server virtualization is done to reduce energy cost and to simplify manageability and disaster management. In server virtualization, VMM software is added to allow the hardware to run more than one OS. Major components of the server: processor, chipset, network interface.
  • 40. Virtual Machine (Contd.): The individual technologies that make up Intel VT are built into these components to boost performance, reliability, and flexibility. Intel VT supports virtual machine architectures comprised of two principal classes of software. Virtual-Machine Monitor (VMM): A VMM acts as a host and has full control of the processor(s) and other platform hardware. The VMM presents guest software (see below) with an abstraction of a virtual processor and allows it to execute directly on a logical processor. A VMM is able to retain selective control of processor resources, physical memory, interrupt management, and I/O. Guest Software: Each virtual machine is a guest software environment that supports a stack consisting of an operating system (OS) and application software. Each operates independently of other virtual machines and uses the same interface to processor(s), memory, storage, graphics, and I/O provided by a physical platform. The software stack acts as if it were running on a platform with no VMM. Software executing in a virtual machine must operate with reduced privilege so that the VMM can retain control of platform resources.
  • 41. Intel Virtualization Technology Flex Migration (Intel VT-x): Obviously, as IT adds new systems, it would be much more convenient and efficient if an IT manager could simply add new resources to existing pools without having to worry about differences in processor generation. For this reason, Intel has developed Intel VT Flex Migration. When combined with support from virtualization software, it ensures that the hypervisor can expose a consistent set of instructions across all servers in the pool. Intel VT Flex Migration support starts with Intel® Core™ microarchitecture and will be available in future generations of the Intel Xeon processor family. With Intel VT Flex Migration, IT managers can easily add current and future Intel Xeon processor-based systems to the same resource pool when using supporting hypervisor software. This gives IT the power to choose the right server platform when it is needed to optimize performance, cost, power, and reliability, without having to worry about forward and backward compatibility across generations of Intel Xeon processor-based servers, starting with Intel Core microarchitecture and extending into future generations of Intel Xeon processors. IT managers can pool server resources using multiple generations of Intel Xeon processors, whether they are single-, dual-, or multi-processor based. This creates a dynamic virtual server infrastructure that enables the use of live VM migration to improve usage models such as failover, load balancing, disaster recovery, and server maintenance.
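One way to picture a "consistent set of instructions across all servers in the pool" is as the intersection of what each host supports. The sketch below is an assumption about how supporting hypervisor software might compute such a baseline; it is not Intel's or any hypervisor's actual algorithm, and the host names and feature lists are made up for illustration.

```python
def common_feature_set(pool):
    """Return the instruction-set features that every host in the pool supports.

    pool -- mapping of host name to an iterable of feature names
    """
    feature_sets = [set(features) for features in pool.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


# Hypothetical pool mixing an older and a newer processor generation.
pool = {
    "host_old": ["sse2", "sse3", "ssse3"],
    "host_new": ["sse2", "sse3", "ssse3", "sse4_1"],
}
print(sorted(common_feature_set(pool)))  # ['sse2', 'sse3', 'ssse3']
```

A VM that only ever sees the common set can be live-migrated between any two hosts in the pool without encountering an instruction its new host lacks. The cost is that guests cannot use features found only on the newest hosts.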
  • 42. Intel VT-x (Contd.): The current Intel® Xeon® 5400 and 5200 processor series and 3300 and 3100 processor series, as well as future Intel Xeon processors, support Intel VT Flex Migration. Using virtualization software that is enabled to take advantage of this feature, Intel servers based on these processors can be pooled with earlier generations of Intel Core microarchitecture processors. These include the Intel® Xeon® 7300, 5300, 5100, 3200, and 3000 series processors. A major Intel VT-x component is Intel VT Flex Migration. By using this technology, we are able to migrate an application from one server to another and recover from disaster. With Intel VT Flex Migration one can migrate between processor generations, so one can react quickly to changing conditions, making it much easier to keep servers up and running.
  • 43. Flex Priority: Intel VT Flex Priority optimizes and accelerates interrupt virtualization by improving virtual machine access to the Task Priority Register, thereby enabling efficient Symmetric Multi-Processing (SMP) configurations of 32-bit guest operating systems. For users, this translates into more efficient performance in virtual environments for their critical enterprise applications. Intel VT Flex Priority was designed to accelerate virtualization interrupt handling, thereby improving virtualization performance. It accelerates interrupt handling by preventing unnecessary VM exits on accesses to the Advanced Programmable Interrupt Controller. Intel VT Flex Priority improves virtualization performance by 35%. A processor is constantly bombarded with interrupts, and not every interrupt is critical enough to be handled at the moment it occurs. Intel VT Flex Priority acts like a receptionist who alerts the processor only when an interrupt is critical, so the processor is interrupted less often and can work more efficiently.
  • 44. Virtualization for directed I/O: A VMM must support virtualization of I/O requests from guest software. I/O virtualization may be supported by a VMM through any of the following models. Emulation: A VMM may expose a virtual device to guest software by emulating an existing (legacy) I/O device. The VMM emulates the functionality of the I/O device in software over whatever physical devices are available on the physical platform. I/O virtualization through emulation provides good compatibility (by allowing existing device drivers to run within a guest), but poses limitations with performance and functionality. New Software Interfaces: This model is similar to I/O emulation, but instead of emulating legacy devices, VMM software exposes a synthetic device interface to guest software. The synthetic device interface is defined to be virtualization-friendly to enable efficient virtualization compared to the overhead associated with I/O emulation. This model provides improved performance over emulation, but has reduced compatibility (due to the need for specialized guest software or drivers utilizing the new software interfaces). Assignment: A VMM may directly assign the physical I/O devices to VMs. In this model, the driver for an assigned I/O device runs in the VM to which it is assigned and is allowed to interact directly with the device hardware with minimal or no VMM involvement. Robust I/O assignment requires additional hardware support to ensure the assigned device accesses are isolated and restricted to resources owned by the assigned partition. The I/O assignment model may also be used to create one or more I/O container partitions that support emulation or software interfaces for virtualizing I/O requests from other guests. The I/O-container-based approach removes the need for running the physical device drivers as part of VMM privileged software.
  • 45. Virtualization for directed I/O (Contd.): Models contd.: I/O Device Sharing: In this model, which is an extension to the I/O assignment model, an I/O device supports multiple functional interfaces, each of which may be independently assigned to a VM. The device hardware itself is capable of accepting multiple I/O requests through any of these functional interfaces and processing them utilizing the device’s hardware resources. Depending on the usage requirements, a VMM may support any of the above models for I/O virtualization. For example, I/O emulation may be best suited for virtualizing legacy devices. I/O assignment may provide the best performance when hosting I/O-intensive workloads in a guest. Using new software interfaces makes a trade-off between compatibility and performance, and I/O device sharing provides more virtual devices than the number of physical devices in the platform.
  • 46. Overview Of Intel Virtualization: A general requirement for all of the above I/O virtualization models is the ability to isolate and restrict device accesses to the resources owned by the partition managing the device. Intel VT for Directed I/O provides VMM software with the following capabilities. I/O device assignment: For flexibly assigning I/O devices to VMs and extending the protection and isolation properties of VMs for I/O operations. DMA remapping: For supporting independent address translations for Direct Memory Accesses (DMA) from devices. Interrupt remapping: For supporting isolation and routing of interrupts from devices and external interrupt controllers to appropriate VMs. Reliability: For recording and reporting to system software DMA and interrupt errors that may otherwise corrupt memory or impact VM isolation.
  • 47. DMA Remapping: DMA remapping facilities have been implemented in a variety of contexts in the past to facilitate different usages. In workstations and server platforms, traditional I/O memory management units (IOMMUs) have been implemented in PCI root bridges to efficiently support scatter/gather operations or I/O devices with limited DMA addressability. Other well-known examples of DMA remapping facilities include the AGP Graphics Aperture Remapping Table (GART) and the Translation and Protection Table (TPT) defined in the Virtual Interface Architecture, which subsequently influenced a similar capability in the InfiniBand Architecture and Remote DMA (RDMA) over TCP/IP specifications. DMA remapping facilities have also been explored in the context of NICs designed for low latency cluster interconnects. Traditional IOMMUs typically support an aperture-based architecture. All DMA requests that target a programmed aperture address range in the system physical address space are translated irrespective of the source of the request. While this is useful for handling legacy device limitations (such as limited DMA addressability or scatter/gather capabilities), it is not adequate for I/O virtualization usages that require full DMA isolation.
  • 48. DMA Remapping (Contd.): The VT-d architecture is a generalized IOMMU architecture that enables system software to create multiple DMA protection domains. A protection domain is abstractly defined as an isolated environment to which a subset of the host physical memory is allocated. Depending on the software usage model, a DMA protection domain may represent memory allocated to a VM, or the DMA memory allocated by a guest-OS driver running in a VM or as part of the VMM itself. The VT-d architecture enables system software to assign one or more I/O devices to a protection domain. DMA isolation is achieved by restricting access to a protection domain’s physical memory from I/O devices not assigned to it, through address-translation tables. The I/O devices assigned to a protection domain can be provided a view of memory that may be different than the host view of physical memory. VT-d hardware treats the address specified in a DMA request as a DMA virtual address (DVA). Depending on the software usage model, a DVA may be the Guest Physical Address (GPA) of the VM to which the I/O device is assigned, or some software-abstracted virtual I/O address (similar to CPU linear addresses). VT-d hardware transforms the address in a DMA request issued by an I/O device to its corresponding Host Physical Address (HPA).
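The DVA-to-HPA translation and the isolation it provides can be sketched with one translation table per protection domain. This is a simplified model under stated assumptions: real VT-d hardware uses multi-level tables located through root and context entries, whereas the `DmaRemapper` class, its method names, the flat dictionary tables, and the 4 KB `PAGE_SIZE` below are hypothetical choices made for illustration.

```python
PAGE_SIZE = 4096  # assumed translation granularity for this sketch


class DmaRemapper:
    def __init__(self):
        self.device_domain = {}  # device id -> protection domain id
        self.page_tables = {}    # domain id -> {DVA page number: HPA page number}

    def assign(self, device, domain):
        """Assign an I/O device to a protection domain."""
        self.device_domain[device] = domain
        self.page_tables.setdefault(domain, {})

    def map_page(self, domain, dva_page, hpa_page):
        """Give the domain access to one page of host physical memory."""
        self.page_tables.setdefault(domain, {})[dva_page] = hpa_page

    def translate(self, device, dva):
        """Translate a DMA virtual address to a host physical address."""
        domain = self.device_domain.get(device)
        if domain is None:
            raise PermissionError(f"device {device!r} is not assigned to a domain")
        page, offset = divmod(dva, PAGE_SIZE)
        table = self.page_tables[domain]
        if page not in table:
            # Memory outside the device's own domain is simply not reachable.
            raise PermissionError(f"DVA {dva:#x} is not mapped in domain {domain}")
        return table[page] * PAGE_SIZE + offset


remapper = DmaRemapper()
remapper.assign("device1", 1)
remapper.assign("device2", 2)
remapper.map_page(1, 0x10, 0x200)
remapper.map_page(2, 0x10, 0x300)
print(hex(remapper.translate("device1", 0x10010)))  # 0x200010
print(hex(remapper.translate("device2", 0x10010)))  # 0x300010
```

The same DVA resolves to different host physical addresses depending on which device issued the request, which is what gives each domain its own view of memory.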
  • 49. DMA Remapping (Contd.): Figure 5 illustrates DMA address translation in a multi-domain usage. I/O devices 1 and 2 are assigned to protection domains 1 and 2, respectively, each with its own view of the DMA address space. Figure 6 illustrates a PC platform configuration with VT-d hardware implemented in the north-bridge component. Figure 5: DMA remapping. Figure 6: Platform configuration with VT-d.
  • 50. Intel Smart Memory Access: Intel Smart Memory Access improves system performance by optimizing the use of the available data bandwidth from the memory subsystem and hiding the latency of memory accesses. The goal is to ensure that data can be used as quickly as possible and is located as close as possible to where it’s needed, to minimize latency and thus improve efficiency and speed. Intel Smart Memory Access includes a new capability called memory disambiguation, which increases the efficiency of out-of-order processing by providing the execution cores with the built-in intelligence to speculatively load data for instructions that are about to execute before all previous store instructions are executed. Intel Smart Memory Access also includes an instruction pointer-based prefetcher that “prefetches” memory contents before they are requested so they can be placed in cache and readily accessed when needed. Increasing the number of loads that occur from cache versus main memory reduces memory latency and improves performance.
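The slide does not say how the instruction pointer-based prefetcher decides what to fetch. A common textbook scheme, assumed here purely for illustration, tracks each load instruction by its address and prefetches ahead once that load shows a repeating stride. The class name `IpStridePrefetcher` and the two-observation confirmation rule are hypothetical and not a description of Intel's implementation.

```python
class IpStridePrefetcher:
    def __init__(self):
        self.table = {}  # load instruction pointer -> (last address, last stride)

    def observe(self, ip, address):
        """Record one executed load; return an address to prefetch, or None."""
        last = self.table.get(ip)
        if last is None:
            self.table[ip] = (address, None)   # first time this load is seen
            return None
        last_address, last_stride = last
        stride = address - last_address
        self.table[ip] = (address, stride)
        # Prefetch only once the same non-zero stride has been seen twice in a row.
        if stride != 0 and stride == last_stride:
            return address + stride
        return None


prefetcher = IpStridePrefetcher()
for addr in (1000, 1064, 1128):
    print(addr, "->", prefetcher.observe(0x400, addr))
```

A load walking an array in 64-byte steps triggers a prefetch of the next element on its third execution, so that element is likely to be in cache by the time the load asks for it.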
  • 51. Intel Smart Memory Access (Contd.): How does Intel Smart Memory Access improve execution throughput? The Intel Core microarchitecture memory cluster (level 1 data memory subsystem) is highly out-of-order, non-blocking, and speculative. It has a variety of methods of caching and buffering to help achieve its performance. Included among these are Intel Smart Memory Access and its two key features: memory disambiguation and the instruction pointer-based (IP-based) prefetcher to the level 1 data cache.
  • 52. Memory Disambiguation: Since the Intel Pentium Pro, all Intel processors have featured a sophisticated out-of-order memory engine allowing the CPU to execute non-dependent instructions in any order. However, they had a significant shortcoming: these processors were built around a conservative set of assumptions concerning which memory accesses could proceed out of order. They would not move a load in the execution order above a store having an unknown address (cases where a prior store has not been executed yet). This was because if the store and load end up sharing the same address, moving the load results in incorrect instruction execution. Yet many loads are to locations unrelated to recently executed stores. Prior hardware implementations created false dependencies if they blocked such loads based on unknown store addresses. All these false dependencies resulted in many lost opportunities for out-of-order execution. In designing Intel Core microarchitecture, Intel sought a way to eliminate false dependencies using a technique known as memory disambiguation. (“Disambiguation” is defined as the clarification that follows the removal of an ambiguity.) Through memory disambiguation, Intel Core microarchitecture is able to resolve many of the cases where the ambiguity of whether a particular load and store share the same address thwarts out-of-order execution.
  • 53. Memory Disambiguation (Contd.): Memory disambiguation uses a predictor and accompanying algorithms to eliminate these false dependencies that block a load from being moved up and completed as soon as possible. The basic objective is to be able to ignore unknown store-address blocking conditions whenever a load operation dispatched from the processor’s reservation station (RS) is predicted to not collide with a store. This prediction is eventually verified by checking all RS-dispatched store addresses for an address match against newer loads that were predicted non-conflicting and already executed. If there is an offending load already executed, the pipe is flushed and execution restarted from that load. The memory disambiguation predictor is based on a hash table that is indexed with a hashed version of the load’s EIP address bits. (“EIP” is used here to represent the instruction pointer in all x86 modes.) Each predictor entry behaves as a saturating counter, with reset.
  • 54. Memory Disambiguation (Contd.): The predictor has two write operations, both done during the load’s retirement. Increment the entry if the load “behaved well,” that is, if it met unknown store addresses but none of them collided. Reset the entry to zero if the load “misbehaved,” that is, if it collided with at least one older store that was dispatched by the RS after the load. The reset is done regardless of whether the load was actually disambiguated. The predictor takes a conservative approach. In order to allow memory disambiguation, it requires that a number of consecutive iterations of a load having the same EIP behave well. This isn’t necessarily a guarantee of success, though. If two loads with different EIPs clash in the same predictor entry, their predictions will interact.
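The two write operations and the "saturating counter, with reset" behavior can be sketched as follows. The table size, the saturation threshold, and the use of a simple modulo in place of the real hash are assumptions chosen for illustration; the source does not give these values, and the class and method names are hypothetical.

```python
class DisambiguationPredictor:
    def __init__(self, entries=256, saturation=15):
        self.counters = [0] * entries   # one saturating counter per entry
        self.saturation = saturation

    def _index(self, eip):
        # Stand-in for the hardware hash of the load's EIP bits (assumption).
        return eip % len(self.counters)

    def allows_disambiguation(self, eip):
        """A load may pass unknown store addresses only if its counter is saturated."""
        return self.counters[self._index(eip)] >= self.saturation

    def update_at_retirement(self, eip, met_unknown_store, collided):
        i = self._index(eip)
        if collided:
            self.counters[i] = 0        # "misbehaved": reset to zero
        elif met_unknown_store:
            # "behaved well": increment, but never beyond the saturation value
            self.counters[i] = min(self.counters[i] + 1, self.saturation)


predictor = DisambiguationPredictor(entries=16, saturation=3)
for _ in range(3):
    predictor.update_at_retirement(0x5, met_unknown_store=True, collided=False)
print(predictor.allows_disambiguation(0x5))    # True
print(predictor.allows_disambiguation(0x15))   # True: 0x15 aliases to the same entry
predictor.update_at_retirement(0x15, met_unknown_store=True, collided=True)
print(predictor.allows_disambiguation(0x5))    # False: the aliasing load reset it
```

The last three lines show the interaction mentioned on the slide: two loads with different EIPs that fall into the same entry share one counter, so a collision by one of them withdraws permission from the other.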
  • 55. Memory Disambiguation (Contd.): Predictor lookup: The predictor is looked up when a load instruction is dispatched from the RS to the memory pipe. If the respective counter is saturated, the load is assumed to be safe and the result is written to the “disambiguation allowed” bit in the load buffer. This means that if the load finds a relevant unknown store address, the load is still allowed to go on. If the predictor is not saturated, the load will behave as in prior implementations. In other words, if there is a relevant unknown store address, the load will get blocked. Load dispatch: In case the load meets an older unknown store address, it sets the “update” bit, indicating the load should update the predictor. If the prediction was “go,” the load will be dispatched and set the “done” bit, indicating that disambiguation was done. If the prediction was “no go,” the load will be conservatively blocked until all older store addresses are resolved.
  • 56. Memory Disambiguation (Contd.): Prediction verification: To recover in case of a misprediction by the disambiguation predictor, the addresses of all the store operations dispatched from the RS to the Memory Order Buffer must be compared with the addresses of all the loads that are younger than the store. If such a match is found, the respective “reset” bit is set. When a load retires that was disambiguated and has its reset bit set, the pipe is restarted from that load to re-execute it and all its dependent instructions correctly. Watchdog mechanism: Because disambiguation is based on prediction and mispredictions can cause an execution pipe flush, it’s important to build in safeguards to avoid rare cases of performance loss. Consequently, Intel Core microarchitecture includes a mechanism to temporarily disable memory disambiguation to prevent cases of performance loss. This mechanism constantly monitors the success rate of the disambiguation predictor.
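The verification step can be sketched as a scan of the load buffer each time a store's address becomes known. This is a simplification under stated assumptions: addresses are compared for exact equality rather than overlapping byte ranges, program order is modeled with a sequence number, and the `Load` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Load:
    seq: int                     # program order; a larger value means younger
    address: int
    executed: bool = False
    disambiguated: bool = False  # "done" bit: the load passed an unknown store
    reset_bit: bool = False


def verify_store(store_seq, store_address, load_buffer):
    """Set the reset bit on every younger, already executed load that matches."""
    for load in load_buffer:
        if (load.seq > store_seq and load.executed
                and load.address == store_address):
            load.reset_bit = True


def must_restart_at_retirement(load):
    """The pipe restarts from a load that was disambiguated and then caught."""
    return load.disambiguated and load.reset_bit


loads = [
    Load(seq=5, address=0x1000, executed=True, disambiguated=True),
    Load(seq=6, address=0x2000, executed=True, disambiguated=True),
]
verify_store(store_seq=4, store_address=0x1000, load_buffer=loads)
print([must_restart_at_retirement(load) for load in loads])  # [True, False]
```

Only the load that actually read the address written by the older store is re-executed; the other speculatively moved load keeps its result.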
  • 57. Advanced Smart Cache: Intel Advanced Smart Cache is a multi-core optimized cache that improves performance and efficiency by increasing the probability that each execution core of a dual-core processor can access data from a higher-performance, more efficient cache subsystem. To accomplish this, Intel Core microarchitecture shares the Level 2 (L2) cache between the cores, storing data in one place that each core can access. Because the L2 cache is shared, each core can dynamically use up to 100 percent of the available L2 cache, and threads can dynamically use the cache capacity they require. As an extreme example, if one of the cores is inactive, the other core has access to the full cache. Intel Advanced Smart Cache also enables very efficient sharing of data between threads running on different cores and delivers data from cache at higher throughput rates for better performance, with a peak transfer rate of 96 GB/sec (at 3 GHz frequency).
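As a quick check on the peak transfer rate quoted above, the figure can be converted into bytes moved per clock cycle. The sketch below assumes 1 GB = 10^9 bytes and exactly one transfer per clock; the helper name `bytes_per_clock` is hypothetical.

```python
def bytes_per_clock(peak_bytes_per_second, clock_hz):
    """Bytes that must move each clock cycle to sustain the given peak rate."""
    return peak_bytes_per_second // clock_hz


PEAK_RATE = 96 * 10**9   # 96 GB/sec, taking 1 GB as 10^9 bytes (assumption)
CLOCK = 3 * 10**9        # 3 GHz

width_bytes = bytes_per_clock(PEAK_RATE, CLOCK)   # 32 bytes per clock
width_bits = width_bytes * 8                      # 256 bits per clock
```

Under these assumptions the quoted peak corresponds to 32 bytes, or 256 bits, delivered from the cache every clock cycle.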
  • 58. Wide Dynamic Execution: Intel Wide Dynamic Execution significantly enhances dynamic execution, enabling delivery of more instructions per clock cycle to improve execution time and energy efficiency. Every execution core is 33 percent wider than previous generations, allowing each core to fetch, decode, and retire up to four full instructions simultaneously. Intel Wide Dynamic Execution also includes a new capability called Macrofusion. Macrofusion combines certain common x86 instructions into a single instruction that is executed as a single entity, increasing the peak throughput of the engine to five instructions per clock. When Macrofusion comes into play, the wide execution engine is then capable of up to six instructions per cycle of throughput for even greater energy-efficient performance. Intel Core microarchitecture also uses extended microfusion, a technique that "fuses" micro-ops derived from the same macro-op to reduce the number of micro-ops that need to be executed. Studies have shown that micro-op fusion can reduce the number of micro-ops handled by the out-of-order logic by more than 10 percent. Intel Core microarchitecture "extends" the number of micro-ops that can be fused internally within the processor.
  • 59. Wide Dynamic Execution (Contd.): Intel Core microarchitecture also incorporates an updated ESP (Extended Stack Pointer) Tracker. Stack tracking allows safe early resolution of stack references by keeping track of the value of the ESP register. About 25 percent of all loads are stack loads, and 95 percent of these may be resolved in the front end, again contributing to greater energy efficiency [Bekerman]. Micro-op reduction resulting from micro-op fusion, Macrofusion, the ESP Tracker, and other techniques makes various resources in the engine appear virtually deeper than their actual size and lets a given amount of work execute with less toggling of signals, two factors that provide more performance for the same or less power. Intel Core microarchitecture also provides deep out-of-order buffers that allow more instructions in flight, enabling more out-of-order execution to better exploit instruction-level parallelism.
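To make the Macrofusion idea concrete, here is a minimal sketch that fuses adjacent instruction pairs in a list of mnemonics. The slides only say that "certain common x86 instructions" are combined; the choice of a compare (`cmp` or `test`) followed by a conditional jump as the fusible pair is an assumption for illustration, and the names `FUSIBLE_FIRST`, `is_conditional_jump`, and `macrofuse` are hypothetical.

```python
# Assumed set of instructions that may fuse with a following conditional jump.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(mnemonic):
    """Treat any j* mnemonic other than the unconditional jmp as a conditional jump."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macrofuse(instructions):
    """Merge each fusible pair into one entry; leave every other instruction alone."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if current in FUSIBLE_FIRST and nxt is not None and is_conditional_jump(nxt):
            fused.append(current + "+" + nxt)  # the pair travels as a single entity
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


stream = ["mov", "cmp", "jne", "add", "test", "jz", "jmp"]
fused_stream = macrofuse(stream)
```

In this toy stream seven instructions become five entries, which is the kind of reduction that lets a four-wide engine handle more than four original instructions in a cycle.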
  • 60. Advanced Digital Media Boost: Intel Advanced Digital Media Boost helps achieve similarly dramatic gains in throughput for programs that use SSE instructions with 128-bit operands. (SSE instructions enhance Intel architecture by enabling programmers to develop algorithms that mix packed single-precision and double-precision floating-point and integer data.) These throughput gains come from combining a 128-bit-wide internal data path with Intel Wide Dynamic Execution and matching widths and throughputs in the relevant caches. Intel Advanced Digital Media Boost enables most 128-bit instructions to be dispatched at a throughput rate of one per clock cycle, effectively doubling the speed of execution and resulting in peak floating-point performance of 24 GFlops (on each core, single precision, at 3 GHz frequency). It is particularly useful for multimedia operations involving graphics, video, and audio, and for processing other rich data sets that use SSE, SSE2, and SSE3 instructions.
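The 24 GFlops figure above can be broken down with simple arithmetic. The sketch below only restates the quoted numbers; reading the result as two packed 128-bit operations per clock (for example a packed add and a packed multiply issuing together) is one plausible interpretation, not something the slides spell out. The helper name `flops_per_clock` is hypothetical.

```python
def flops_per_clock(peak_flops, clock_hz):
    """Floating-point operations completed per clock at the quoted peak."""
    return peak_flops // clock_hz


PEAK_FLOPS = 24 * 10**9   # 24 GFlops per core, single precision
CLOCK = 3 * 10**9         # 3 GHz
SP_LANES = 128 // 32      # four 32-bit single-precision values per 128-bit operand

per_clock = flops_per_clock(PEAK_FLOPS, CLOCK)   # 8 flops per clock
packed_ops_per_clock = per_clock // SP_LANES     # 2 packed operations per clock
```

Eight single-precision results per clock is twice what a single four-lane operation per cycle would give, which fits the slide's description of a full 128-bit data path.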
  • 61. Intelligent Power Capability: Intel Intelligent Power Capability is a set of capabilities for reducing power consumption and device design requirements. It manages the runtime power consumption of all the processor's execution cores and includes an advanced power-gating capability that provides ultra fine-grained logic control, turning on individual processor logic subsystems only if and when they are needed. Additionally, many buses and arrays are split so that data required only in some modes of operation can be put in a low-power state when not needed. In the past, implementing such power gating was challenging because of the power consumed in powering down and ramping back up, and because of the need to maintain system responsiveness when returning to full power [Wechsler]. Through Intel Intelligent Power Capability, Intel has been able to address these concerns, ensuring significant power savings without sacrificing responsiveness.
  • 62. References: http://www.brighthub.com http://mintywhite.com http://www.flyertalk.com http://www.overclock.net/a/hyperthreading-explained http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf http://software.intel.com/sites/default/files/m/3/4/d/6/3/18374-sma.pdf http://www.youtube.com/watch?v=gqZrarZiHp8 http://www.youtube.com/watch?v=3fcI6G7Scqk http://www.youtube.com/watch?v=V9AiN7oJaIM http://www.youtube.com/watch?v=kkrqyEpINSQ http://www.youtube.com/watch?v=y0Q40pBoIwA
  • 63. Thank You