01 intel processor architecture core

  1. Intel® Core™ Microarchitecture
     Intel® Software College
  2. Objectives
     After completion of this module you will be able to describe:
     • Components of an IA processor
     • Working flow of the instruction pipeline
     • Notable features of the architecture
     Intel® Processor Micro-architecture - Core® microarchitecture
     Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
  3. Agenda
     • Introduction
     • Knowledge preparation
     • Notable features
     • Micro-architecture tour
     • Coding considerations
  4. Agenda
     • Introduction
     • Knowledge preparation
     • Notable features
     • Micro-architecture tour
     • Coding considerations
  5. Industrial Recognition
     • PC Format, May 2006: “Intel Strikes Back! Conroe is the name. Pistol-whipping Athlon64s into burger meat is the game.”
     • “Intel's Next Generation Microarchitecture Unveiled”, Real World Tech: “Just as important as the technical innovations in Core MPUs, this microarchitecture will have a profound impact on the industry.”
     • “Intel Dishes the Knockout Punch to AMD with Conroe”, GD Hardware.com: “…the results were far more than we could hope for and it'll be amusing to see AMD's response to this beat-down session”
     • “Intel Regains Performance Crown”, Anandtech: “… At 2.8 or 3.0GHz, a Conroe EE would offer even stronger performance than what we’ve seen here.”
     • “Intel Reveals Conroe Architecture”, Extremetech: “… And not only was the Intel system running at 2.66GHz— a slower clock rate than the top Pentium 4—it was outpacing an overclocked Athlon 64 FX-60. Wrap your brain around that idea for a bit…”
     • “Conroe Benchmarks - Intel Showing Big Strength”, Hot Hardware.com: “… Intel is poised to change the face of the desktop computing landscape…”
  6. Performance Summary
     Intel® Core™ Microarchitecture dramatically boosts Intel platform performance:
     • Conroe and Woodcrest drive clear desktop/server performance leadership
     • Merom extends Intel mobile performance leadership
     Intel® Core™ Microarchitecture-based platforms set the bar in performance and energy efficiency for the multi-core era:
     • Intel's 3rd-generation dual-core (while the competition is stuck on its 1st generation)
     • New Intel high-performance ‘engine’: wider, smarter, faster, more efficient
     Best processor on the planet: energy-efficient performance.
     The “Core™ Effect”: the Intel® Core™ Microarchitecture ramp fuels broad roadmap accelerations.
     Energy-efficient performance boosts¹: 20% (Merom), 40% (Conroe), 80% (Woodcrest)
     ¹ Based on SPECint*_rate_base2000
  7. Agenda
     • Introduction
     • Knowledge preparation
       • Architecture vs. Microarchitecture
       • CISC vs. RISC
       • Performance Measurements
       • Pipeline Design
       • Power and Energy
       • Chip Multi-Processing
     • Notable features
     • Micro-architecture tour
     • Coding considerations
  8. Architecture and Micro-architecture
     What is computer architecture?
     • Architecture is the set of features that are externally visible:
       • Instruction set
       • Registers
       • Addressing modes
       • Bus protocols
     Intel Architectures (IA):
     • IA-32/x86 (8-bit, 16-bit, and 32-bit integer architecture)
       • x87 (floating-point extension)
       • MMX (multimedia extension)
       • SSE, SSE2, SSE3 (Streaming SIMD Extensions)
     • Intel® 64/EM64T (64-bit integer extension of IA-32)
     • IA-64 (Intel's new 64-bit architecture)
       • Itanium/Itanium 2 processor family
  9. Architecture and Micro-architecture (cont.)
     What is micro-architecture?
     • Also written µ-architecture or u-architecture
     • The “invisible” features that provide meaningful value to the end user (whatever makes you buy a new, compatible PC):
       • Improved performance: programs run faster
       • Reduced power consumption: extended battery life
       • Hardware fits into a smaller form factor
  10. Intel® Architecture History
     • Architecture: instruction set definition and compatibility. Examples: IA-32, EPIC* (Itanium®), IXA* (XScale)
     • Microarchitecture: hardware implementation maintaining instruction set compatibility with the high-level architecture. Examples: P5, P6, Intel NetBurst®, Banias
     • Processors: productized implementations of a microarchitecture. Examples: Pentium®, Pentium® Pro, Pentium® II/III, Pentium® 4, Pentium® D, Xeon®, Pentium® M
     * IXA: Intel Internet Exchange Architecture; EPIC: Explicitly Parallel Instruction Computing
  11. Intel® Core™ Microarchitecture Processors
     Intel® NetBurst® microarchitecture + mobile microarchitecture + new innovations = Intel® Core™ microarchitecture, used in the Intel® Core™ 2 Duo/Quad/Extreme processors.
  12. RISC Approach to CPU Design (RISC = Reduced Instruction Set Computers)
     Optimize the H/W for common basic operations:
     • Fixed instruction length
       • Shorter execution pipeline
       • Ease of instruction-level parallelism
     • Large number of registers
       • Fewer memory accesses
     • ‘Load/Store’ architecture
       • Shorter execution pipeline
       • Ease of advancing loads
     • Branch hints
       • Reduce pipeline flush events
     • ‘Exotic’ stuff implemented in S/W with minimal H/W support
       • No ‘complex’ H/W instructions
       • Exceptional conditions handled in S/W
     Examples: MIPS, IBM POWER and PowerPC, Sun SPARC
     Achieve maximum performance through the right partitioning between H/W and S/W.
  13. CISC Approach to CPU Design (CISC = Complex Instruction Set Computers)
     Rich architecture:
     • Variable-length instructions
     • Complex addressing modes
     On-chip H/W / S/W partitioning required:
     • H/W keeps executing the ‘simple’ stuff
     • Complex instructions are ‘emulated’ using u-code routines from ROM
     • More instructions are treated as ‘simple’ as more H/W becomes available
     COMPATIBILITY has some major advantages:
     • Large (and ever-increasing) software base
     • Code development tools
     • Expertise
     • H/W-S/W spiral
     Examples: Intel IA-32, Motorola 680x0
     Maximize the information passed to the H/W.
  14. Performance Measurement
     Performance is the reciprocal of the time of execution:
     $$\text{Performance} \approx \frac{1}{\text{Time of Execution}} = \frac{1}{L \cdot CPI \cdot T_c}$$
     where:
     • L = code length (number of machine instructions)
     • CPI = clock cycles per instruction
     • $T_c$ = clock period (ns)
     Substituting IPC = 1/CPI (instructions per cycle) and F = 1/$T_c$ (frequency):
     $$\text{Performance} \approx \frac{IPC \cdot F}{L}$$
     • Improve ILP: higher IPC
     • Improve timing: higher F
     • Architecture enhancements: shorter L
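To make the relation concrete, here is a small Python sketch that evaluates Performance ≈ IPC·F/L for two hypothetical designs running the same program. The function names and every number (1e9 instructions, IPC 1.0 at 3.8 GHz versus IPC 2.0 at 2.66 GHz) are illustrative assumptions, not measurements of any real processor.

```python
def execution_time(code_length: float, ipc: float, freq_hz: float) -> float:
    """Time of execution = L * CPI * Tc, in seconds."""
    cpi = 1.0 / ipc      # clock cycles per instruction
    tc = 1.0 / freq_hz   # clock period in seconds
    return code_length * cpi * tc


def performance(code_length: float, ipc: float, freq_hz: float) -> float:
    """Performance is the reciprocal of execution time, i.e. IPC * F / L."""
    return 1.0 / execution_time(code_length, ipc, freq_hz)


# Hypothetical designs running the same 1e9-instruction program (same L).
L = 1e9
high_clock = performance(L, ipc=1.0, freq_hz=3.8e9)   # higher F, lower IPC
wide_core = performance(L, ipc=2.0, freq_hz=2.66e9)   # lower F, higher IPC

speedup = wide_core / high_clock
print(f"speedup of the wider core: {speedup:.2f}x")  # 5.32 / 3.8 = 1.40
```

In this made-up comparison the lower-clocked but wider core wins, because performance depends on the product IPC·F rather than on frequency alone.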
  15. Performance Measurement (cont.)
     Performance considerations:
     • Which code/application to run?
     • Which OS?
     • Which other components in the platform?
     • Under which thermal conditions?
     • Multithreading? Multiprocessing?
     Benchmark examples:
     • Industry standard: SPEC (ISPEC, FSPEC), TPC
     • Commercial: SysMark, MobileMark, PCMark, Sandra, ScienceMark
     • Applications:
       • Video (Windows Media Encoder, DivX)
       • Audio (LAME MP3)
       • Compression (RAR)
       • Content creation (3DSM, Photoshop, Premiere)
       • Latest games (Doom III, FarCry, but these change fast)
     • Specific industries use specific benchmarks: Linux compilation, POVRay, LinPack, lmbench
  16. Design Considerations for Different Market Segments
     Constraints:
     • Desktop: thermally and area constrained
     • Extreme: unconstrained
     • Value: very area constrained
     • Mobile: thermally, energy, and area constrained
     • Servers: thermally and energy constrained
     Micro-architecture is the art of trade-offs between:
     • Schedule
     • Requirements/standards
     • Performance
     • Features
     • Power/energy
     • Area/cost
  17. Design Metrics
     IPC = Instructions per Cycle
     • The more the better
     Latency (same as response time)
     • The time interval between when a request for data is made and when the data transfer completes
     • The less the better
     Throughput
     • The amount of work completed by the system per unit of time (e.g., ops/sec)
     • The more the better
  18. CPU Pipeline
     Break the work into smaller pieces:
     • Four basic stages of an instruction's life:
       • Fetch: bring the instruction into the core
       • Decode: decode the instruction and read its operands from registers
       • Execute: perform the operation
       • Writeback: save the result to a register
     • Execution stages of simple instructions (legend: “op src1, src2 → dst”):
       • add eax, ebx → eax: F D E W
       • sub ecx, edx → ecx: F D E W
     Increased throughput:
     • Increased number of completed instructions per cycle
  19. Pipeline Design: Exploiting Parallelism
     A new instruction does not always depend on the previous one:
     • It can start before the previous one has finished
     • ...if different stages use different H/W resources
     Run instructions in parallel (pipelining):
           add eax, ebx → eax   F D E W
           sub ecx, edx → ecx     F D E W
           or  edi, esi → edi       F D E W
     Pipe stages need to be balanced:
     • Each stage should take the same time for the best throughput and utilization
     • The clock cycle is determined by the longest path!
     [Diagram: four-stage pipeline, Fetch → Decode → Exec → WB]
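The gain from overlapping instructions, and the cost of an unbalanced stage, can be counted directly. The Python sketch below assumes an ideal four-stage pipeline with no stalls; the stage delays are hypothetical numbers chosen only to show that the slowest stage sets the clock period.

```python
from typing import Sequence


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline without stalls: the first instruction takes n_stages
    cycles (its latency), then one more instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def clock_period(stage_delays_ns: Sequence[float]) -> float:
    """The clock period must fit the slowest (longest-path) stage."""
    return max(stage_delays_ns)


STAGES = ("Fetch", "Decode", "Execute", "Writeback")

# The three-instruction example (add, sub, or).
print(unpipelined_cycles(3, len(STAGES)))  # 12
print(pipelined_cycles(3, len(STAGES)))    # 6

# Throughput approaches one instruction per cycle for long instruction streams.
n = 1000
print(n / pipelined_cycles(n, len(STAGES)))  # about 0.997

# Hypothetical, unbalanced stage delays: Execute sets the cycle time.
print(clock_period([0.20, 0.25, 0.40, 0.15]))  # 0.4
```

Note that pipelining does not shorten the latency of a single instruction (still four cycles here); it raises throughput, which is why balancing the stages matters.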
  20. Pipeline Design: Fighting Stalls
     • Data-flow dependencies (one instruction's output is another's input): solved by bypasses, register renaming, etc.
     • Control-flow dependencies: solved by branch prediction
     • Others (cache misses, long-latency instructions): solved by other dynamic scheduling techniques
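Branch prediction is named above only as a category. As an illustration of the general idea, and not of the Core microarchitecture's actual predictor (whose details this slide does not give), here is a Python sketch of a classic textbook scheme: a table of 2-bit saturating counters indexed by branch address. The table size, the initial counter value, and the branch address 0x400 are assumptions.

```python
from typing import Iterable, Tuple


class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        self.counters = [1] * table_size  # start in "weakly not taken"

    def _index(self, branch_addr: int) -> int:
        return branch_addr % self.table_size

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor: TwoBitPredictor,
                         trace: Iterable[Tuple[int, bool]]) -> int:
    misses = 0
    for addr, taken in trace:
        if predictor.predict(addr) != taken:
            misses += 1  # a real pipeline would flush the wrong-path work here
        predictor.update(addr, taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through; the loop runs 3 times.
trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 3
misses = count_mispredictions(TwoBitPredictor(), trace)
print(f"{misses} mispredictions out of {len(trace)} branches")
```

The two-bit hysteresis means a single loop exit does not flip the prediction, so each later pass through the loop costs only one misprediction.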
  21. Race of CISC vs. RISC
     In modern CPUs, advanced µ-architecture techniques minimize the advantages of RISC over CISC:
     • Branch prediction: reduces the effect of extra pipeline stages
     • Register renaming: effectively increases the number of registers
     • Out-of-order execution: reduces the number of stalls caused by a shortage of registers
     • Speculative execution: further reduces the number of stalls
     • Power-saving features: reduce the overhead when it is not needed
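Register renaming is what lets a register-starved ISA such as IA-32 keep many instructions in flight. Below is a minimal Python sketch of the idea, assuming an unbounded pool of physical registers (real hardware recycles them at retirement, which is omitted here) and a hypothetical three-operand instruction format.

```python
from typing import Dict, List, Tuple

# (op, dst, src1, src2) using architectural register names.
Instr = Tuple[str, str, str, str]


def rename(program: List[Instr], arch_regs: List[str]) -> List[Instr]:
    """Map architectural registers to an unbounded pool of physical registers."""
    # Register alias table: each architectural register starts in its own physical register.
    rat: Dict[str, str] = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed: List[Instr] = []
    for op, dst, src1, src2 in program:
        # Sources read the current mapping, so true (read-after-write) dependencies survive.
        phys_src1, phys_src2 = rat[src1], rat[src2]
        # Every result gets a fresh physical register, which removes
        # write-after-read and write-after-write (false) dependencies.
        phys_dst = f"p{next_free}"
        next_free += 1
        rat[dst] = phys_dst
        renamed.append((op, phys_dst, phys_src1, phys_src2))
    return renamed


program = [
    ("add", "eax", "eax", "ebx"),  # I1: eax = eax + ebx
    ("sub", "ebx", "ecx", "edx"),  # I2: overwrites ebx, which I1 reads (WAR)
    ("or", "eax", "ecx", "edx"),   # I3: overwrites eax, which I1 writes (WAW)
]
for uop in rename(program, ["eax", "ebx", "ecx", "edx"]):
    print(uop)
# ('add', 'p4', 'p0', 'p1')
# ('sub', 'p5', 'p2', 'p3')
# ('or', 'p6', 'p2', 'p3')
```

After renaming, I2 and I3 no longer share any destination or source with I1, so an out-of-order core is free to execute them before or alongside I1.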
  22. µop: Intel's Take on the CISC/RISC Race
     (CISC) instructions are translated into one or more (RISC-like) uops (micro-operations):
     • Fixed format
     • Wide and simple
     • Temporary registers
     Usually there is one uop per instruction; a complex instruction can be thousands of uops.
     Stores are divided into two uops: STA (store address) and STD (store data).
     Fusion (macro-fusion and micro-fusion) changes these uop counts.
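A toy Python translator makes the one-to-many mapping visible. The uop spellings (LOAD, STA, STD, temporaries tmp0/tmp1) and the two-operand input format are hypothetical; they mirror the slide's points (memory operands become separate load uops, stores split into STA and STD) but are not Intel's actual uop encoding.

```python
from typing import List


def is_mem(operand: str) -> bool:
    """Memory operands are written in brackets, e.g. "[esi]"."""
    return operand.startswith("[") and operand.endswith("]")


def decode(mnemonic: str, dst: str, src: str) -> List[str]:
    """Translate one two-operand instruction into a list of toy uops."""
    op = mnemonic.upper()
    if mnemonic == "mov":
        if is_mem(dst):
            return [f"STA {dst}", f"STD {src}"]  # store = store-address + store-data
        if is_mem(src):
            return [f"LOAD {dst}, {src}"]          # plain load
        return [f"MOV {dst}, {src}"]
    uops: List[str] = []
    if is_mem(src):                                # load-op: separate load uop
        uops.append(f"LOAD tmp1, {src}")
        src = "tmp1"
    if is_mem(dst):                                # read-modify-write
        uops.append(f"LOAD tmp0, {dst}")
        uops.append(f"{op} tmp0, tmp0, {src}")
        uops += [f"STA {dst}", "STD tmp0"]
    else:
        uops.append(f"{op} {dst}, {dst}, {src}")
    return uops


for insn in [("add", "eax", "ebx"), ("add", "eax", "[esi]"),
             ("mov", "[edi]", "eax"), ("add", "[edi]", "eax")]:
    print(insn, "->", decode(*insn))
```

The load-op case (LOAD followed by ADD) is the pattern that micro-fusion targets: the two uops can travel through much of the pipeline as one fused uop.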
  23. Power and Energy
     Maximum power (TDP) drives:
     • Cooling requirements
     • Cooling solution
     • Computer form factor and acoustic noise
     Average power drives:
     • Battery life
     • Electricity bill
     General calculation:
     $$P = f \cdot V^2 \cdot \alpha \cdot C + P_{\text{leakage}}$$
     where f is the frequency, V the voltage, α the activity factor, and C the capacitance.
     Reducing TDP:
     • Fewer transistors and wires
     • Smaller transistors and wires
     • Power features that reduce activity
     • Low-leakage transistors
     Reducing average power:
     • Energy efficiency
     • Power states
     • Lower leakage
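Because voltage enters squared and is usually lowered together with frequency, dynamic power falls roughly with the cube of a combined voltage/frequency reduction. The Python sketch below evaluates the formula above; the operating point (2 GHz, 1.2 V, activity factor 0.2, 10 nF effective capacitance) is a made-up example, and leakage is modeled as a fixed number of watts, which is a simplifying assumption (real leakage also depends on voltage and temperature).

```python
def dynamic_power(freq_hz: float, voltage_v: float, activity: float,
                  capacitance_f: float) -> float:
    """Switching power: f * V^2 * activity factor * capacitance, in watts."""
    return freq_hz * voltage_v ** 2 * activity * capacitance_f


def total_power(freq_hz: float, voltage_v: float, activity: float,
                capacitance_f: float, leakage_w: float) -> float:
    """P = f * V^2 * activity * C + leakage (leakage held constant here)."""
    return dynamic_power(freq_hz, voltage_v, activity, capacitance_f) + leakage_w


# Hypothetical operating point: about 5.76 W of dynamic power.
base = dynamic_power(2.0e9, 1.2, 0.2, 1.0e-8)
# Lower both frequency and voltage to 85% of their original values.
scaled = dynamic_power(0.85 * 2.0e9, 0.85 * 1.2, 0.2, 1.0e-8)
print(f"{base:.2f} W -> {scaled:.2f} W ({scaled / base:.1%} of the original)")
```

A 15% cut in both frequency and voltage leaves about 61% of the dynamic power (0.85³), while performance drops by at most about 15%, which is the basic argument for energy-efficient designs.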
  24. Dual/Multi-Core and SMT
     Put more than one core per package.
     Architectural change:
     • Software must be multi-threaded or multi-process
     • ...but the change is backward compatible with multiprocessor (MP) systems
     Several ways of implementing it (all of them are in use):
     [Diagram: alternative arrangements of cores, last-level caches (LLC), and I/O]
     SMT: run two (or more) threads on the same core, simultaneously.
  25. Intel Approach
     Threads per processor over time:
     • Q4 2000: Intel® Pentium® 4 processor, 1 thread
     • Q2 2003: Intel® Pentium® 4 processor with HT, 2 threads
     • Q2 2005: Intel® Pentium® D processor, 2 threads
     • Q3 2006: Intel® Core™ 2 Duo processor, 2 threads
     • Q4 2006: Intel® Core™ 2 Extreme QX6700*, 4 threads
     • Future (?): 80 threads
     [Diagram labels: State, Execution Units, Cache, Bus]
     While single-core performance has increased due to clock speed, larger caches, and improved ILP, the biggest performance increases have come from thread-level parallelism.
  26. An “Acronym Cheat Sheet” of Parallel Computing
     • CMP: Chip Multi-Processor (two or more cores per package)
       • Dual core: two cores in the same package
       • Quad core: four cores in the same package
     • DP: Dual Processor (two packages)
     • MP: Multi-Processor (four or more packages)
     • SMT: Simultaneous Multi-Threading (virtual multi-core, e.g. Hyper-Threading)
  27. Agenda
     • Introduction
     • Knowledge preparation
     • Notable features
       • Wide Dynamic Execution
       • Smart Memory Access
       • Advanced Smart Cache
       • Advanced Digital Media Boost
       • Intelligent Power Capability
     • Micro-architecture tour
     • Coding considerations
28. Intel® Core™ Microarchitecture Notable Features
Intel® Wide Dynamic Execution
• 14-stage efficient pipeline
• Wider execution path
• Advanced branch prediction
• Macro-fusion
  • Roughly 15% of all instructions are conditional branches
  • Macro-fusion fuses a compare and a jump, reducing the number of micro-ops running down the pipeline
• Micro-fusion
  • Merges the load and operation micro-ops into one fused micro-op
• 64-bit support
  • Merom, Conroe, and Woodcrest support Intel® EM64T
(Block diagram: Instruction Fetch and PreDecode → Instruction Queue → (5) → Decode, with uCode ROM → (4) → Rename/Alloc → Schedulers → three ALU stacks (ALU + Branch, ALU + FAdd, ALU + FMul, each with MMX/SSE and FPmove) plus Load and Store units → L1 D-Cache and D-TLB; Retirement Unit (ReOrder Buffer) (4); 2M/4M shared L2 cache; FSB up to 10.4 GB/s.)
29. Intel® Core™ Microarchitecture Notable Features (cont.)
Intel® Advanced Memory Access
• Improved prefetching
• Memory disambiguation
  • Loads can be advanced ahead of earlier stores despite a possible data dependency (pointer conflict)
  • Earlier loads hide memory latency
30. Intel® Core™ Microarchitecture Notable Features (cont.)
Intel® Advanced Smart Cache
• Multi-core optimization
  • Shared between the two cores
  • Advanced Transfer Cache architecture
• Reduced bus traffic
• Both cores have full access to the entire cache
• Dynamic cache sizing
31. Intel® Core™ Microarchitecture Notable Features (cont.)
Advantages of Shared Cache
(Figure: CPU1 and CPU2, each with its own cache, connected through the Front Side Bus (FSB) to memory. Shipping a cache line from one CPU's L2 cache to the other over the FSB costs roughly half a memory access.)
32. Intel® Core™ Microarchitecture Notable Features (cont.)
Advantages of Shared Cache (cont.)
(Figure: CPU1 and CPU2 share one L2 cache, connected through the Front Side Bus (FSB) to memory. Because the L2 is shared, there is no need to ship the cache line.)
33. Intel® Core™ Microarchitecture Notable Features (cont.)
Intel® Advanced Digital Media Boost
SIMD operation (SSE/SSE2/SSE3/SSSE3)
• Single-cycle SIMD operation
  • 8 single-precision flops/cycle
  • 4 double-precision flops/cycle
• Wide operations
  • 128-bit packed add
  • 128-bit packed multiply
  • 128-bit packed load
  • 128-bit packed store
• Support for Intel® EM64T instructions
(Figure: a 128-bit SSE/2/3 operation combines SOURCE elements X4..X1 with DEST elements Y4..Y1, bits 127..0. Core™ microarchitecture: X4 op Y4, X3 op Y3, X2 op Y2, and X1 op Y1 all complete in clock cycle 1. Previous microarchitectures: X2 op Y2 and X1 op Y1 in cycle 1, then X4 op Y4 and X3 op Y3 in cycle 2.)
34. Intel® Core™ Microarchitecture Notable Features
Intel® Advanced Digital Media Boost
• Additional media instructions: Supplemental Streaming SIMD Extensions 3 (SSSE3)
  • 16 new packed integer instructions
  • Targeting video encode/decode
• Significantly improved string operations
  • REP MOVS and REP STOS
  • ~8 bytes/cycle throughput (mileage may vary)
35. Intel® Core™ Microarchitecture Notable Features
Intel® Advanced Digital Media Boost: Supplemental SSE3 (SSSE3)
• Horizontal addition/subtraction: PHADDW, PHADDSW, PHADDD, PHSUBW, PHSUBSW, PHSUBD
• Packed absolute values: PABSB, PABSW, PABSD
• Multiply and add packed signed/unsigned bytes: PMADDUBSW
• Packed multiply high with round and scale: PMULHRSW
• Packed shuffle bytes: PSHUFB
• Packed sign: PSIGNB/W/D
• Packed align right: PALIGNR
36. Intel® Core™ Microarchitecture Notable Features (cont.)
Intelligent Power Capability
• Advanced power gating and dynamic power coordination
  • Multi-point demand-based switching
  • Voltage-frequency switching separation
  • Supports transitions to deeper sleep modes
  • Event blocking
  • Clock partitioning and recovery
  • Dynamic bus parking
• During periods of high-performance execution, many parts of the core can be shut off
37. Agenda
• Introduction
• Knowledge preparation
• Notable features
• Micro-architecture tour
  • Front End
  • Out-Of-Order Execution Core
  • Memory Sub-system
• Coding considerations
38. Intel® Core™ Microarchitecture Drill-down
(Block diagram. Front end: icache, branch prediction unit, predecode unit, instruction queue, instruction decode, MS (microcode sequencer). Out-of-order core: register alias table, ALLOC, Re-Order Buffer, Reservation Station, integer/FP/SIMD execution units (3x), load, store address, store data. Memory sub-system: memory order buffer, data cache unit, page miss handler.)
39. Agenda: Micro-architecture tour (Front End)
40. Intel® Core™ Microarchitecture Front End
Prepares instructions before they are executed
• Instruction Fetch Unit
• Instruction Queue
• Instruction Decode Unit
• Branch Prediction Unit
(Diagram: icache, branch prediction unit, predecode unit, instruction queue, instruction decode, MS.)
41. Intel® Core™ Microarchitecture – Front End
Instruction Queue
Buffer between the instruction predecode unit and the decoders
• Up to six predecoded instructions written per cycle
• Holds 18 instructions
• Up to five instructions read per cycle
Potential loop cache: Loop Stream Detector (LSD) support
• Re-use of predecoded instructions
• Potential power saving
42. Intel® Core™ Microarchitecture – Front End
Macro-Fusion
• Roughly 15% of all instructions are conditional branches.
• Macro-fusion merges two instructions into a single micro-op, as if the two instructions were a single long instruction (for example, cmp eax, [mem] followed by jae label becomes cmpjae eax, [mem], label).
• Enhanced Arithmetic Logic Unit (ALU) for macro-fusion: each macro-fused instruction executes with a single dispatch.
• Not supported in EM64T long mode.
(Figure: the fused micro-op goes from the scheduler to execution, where the branch is evaluated and the flags and target are written back.)
43. Intel® Core™ Microarchitecture – Front End
Macro-Fusion Absent
• Read four instructions from the Instruction Queue
• Each instruction gets decoded into separate micro-ops
Enabling example: for (int i=0; i<100000; i++) { … }
Instruction Queue: addps xmm0, [EAX+16] / mulps xmm0, xmm0 / movps [EAX+240], xmm0 / cmp eax, 100000 / jge label
Cycle 1: dec0 addps xmm0, [EAX+16]; dec1 mulps xmm0, xmm0; dec2 movps [EAX+240], xmm0; dec3 cmp eax, 100000
Cycle 2: dec0 jge label
44. Intel® Core™ Microarchitecture – Front End
Macro-Fusion Present
• Read five instructions from the Instruction Queue
• Send the fusable pair to a single decoder
• A single micro-op represents two instructions
Enabling example: for (unsigned int i=0; i<100000; i++) { … }
Instruction Queue: addps xmm0, [EAX+16] / mulps xmm0, xmm0 / movps [EAX+240], xmm0 / cmp eax, 100000 / jae label
Cycle 1: dec0 addps xmm0, [EAX+16]; dec1 mulps xmm0, xmm0; dec2 movps [EAX+240], xmm0; dec3 cmpjae eax, 100000, label
With the unsigned loop counter the compiler emits jae (an unsigned condition), which can fuse with cmp; the signed counter in the previous example produces jge, which cannot.
45. Intel® Core™ Microarchitecture – Front End
Instruction Decode / Micro-Op Fusion
• Frequent pairs of micro-operations derived from the same macro-instruction can be fused into a single micro-operation
• Micro-op fusion effectively widens the pipeline
46. Intel® Core™ Microarchitecture – Front End
Instruction Decode / Micro-Fusion (cont.)
Micro-ops of the store "movps [EAX+240], xmm0": the store-address micro-op (sta eax+240) and the store-data micro-op (std xmm0, [eax+240]) are fused into a single micro-op (st xmm0, [eax+240]).
47. Intel® Core™ Microarchitecture – Front End
Branch Prediction Improvements
Intel® Pentium® 4 processor branch prediction, plus the following two improvements:
• Indirect branch predictor
• Loop detector
Branch mispredictions reduced by more than 20%
48. Agenda: Micro-architecture tour (Out-Of-Order Execution Core)
49. Intel® Core™ Microarchitecture Execution Core
Accepts decoded micro-ops, assigns resources, and executes and retires micro-ops
• Renamer
• Reservation Station (RS)
• Issue ports
• Execution units
(Diagram: register alias table, ALLOC, Re-Order Buffer, Reservation Station; integer/FP/SIMD execution units (3x), load, store address, store data.)
50. Intel® Core™ Microarchitecture – Execution Core
Execution Core Building Blocks
(Diagram: the Renamer and ROB feed the RS, which dispatches through numbered ports. Ports 0, 1, 5: integer (including integer MUL), SIMD integer, and SIMD floating-point execution units. Port 2: load. Ports 3, 4: store. Load and store go to the memory sub-system.)
51. Intel® Core™ Microarchitecture – Execution Core
Issue Ports and Execution Units
6 dispatch ports from the RS:
• 3 execution ports (shared for integer / FP / SIMD)
• 1 load port
• 1 store-address port
• 1 store-data port
128-bit SSE implementation
• Port 0 has the packed multiply (4 cycles SP, 5 cycles DP; pipelined)
• Port 1 has the packed add (3 cycles, all precisions)
52. Intel® Core™ Microarchitecture – Execution Core
Retirement Unit
ReOrder Buffer (ROB)
• Holds micro-ops in various stages of completion
• Buffers completed micro-ops
• Updates the architectural state in order
• Manages ordering of exceptions
(Diagram: register alias table, ALLOC, Re-Order Buffer, Reservation Station.)
53. Agenda: Micro-architecture tour (Memory Sub-system)
54. Intel® Core™ Microarchitecture Memory Sub-System
Memory Ordering Buffer
• Store Address Buffer (SAB)
  • Stores the address of each store not yet performed
  • Each load compares its address with every store older than itself
  • If it finds a hole…
• Store Data Buffer (SDB)
  • Stores the data of each store not yet performed
  • If a load hits in the SAB, the data is forwarded from here
• Load Buffer
  • Stores the addresses of non-retired loads
  • Used for snoops and re-dispatch
• One 128-bit load and one 128-bit store per cycle, to different memory locations
• Out-of-order memory operations
55. Intel® Core™ Microarchitecture – Memory Sub-system
Memory Sub-System (cont.)
• 32 KB L1 D-cache (8-way, 64-byte line size)
• Shared second-level (L2) instruction and data cache: 2 MB 8-way or 4 MB 16-way
• Cache-to-cache transfer
  • Improves producer/consumer-style MP
• Wider interface to L2
  • Reduced interference
  • Processor line fill takes 2 cycles
• Higher bandwidth from the L2 cache to the core
  • ~14-clock latency and 2-clock throughput
Load and store access order:
1. L1 cache of the immediate core
2. L1 cache of the other core
3. L2 cache
4. Memory
(Figure: Core1 and Core2 sharing a 2 MB L2 cache, connected to the bus.)
56. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Enhanced Data Prefetch Logic
Speculates which data will be needed next and loads it into the cache, by hardware and/or software
(Analogy figure: Door = L1 cache; Valet Parking Area = L2 cache; Main Parking Lot = external memory.)
57. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Enhanced Data Prefetch Logic (cont.)
• L1D cache prefetching
  • Data Cache Unit (DCU) prefetcher
    • Known as the streaming prefetcher
    • Recognizes ascending access patterns in recently loaded data
    • Prefetches the next line into the processor's cache
  • Instruction-based stride prefetcher
    • Prefetches based on a load having a regular stride
    • Can prefetch forward or backward up to 2 KB (1/2 the default page size)
• L2 cache prefetching: Data Prefetch Logic (DPL)
  • Prefetches data into the L2 cache before the DCU requests it
  • Maintains 2 tables for tracking loads
    • Upstream: 16 entries
    • Downstream: 4 entries
  • Every load is either found in the DPL or generates a new entry
  • On recognizing the 2nd load of a "stream", the DPL prefetches the next load
58. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation
Memory disambiguation predictor
• Loads that are predicted NOT to forward from a preceding store are allowed to schedule as early as possible
  • Increasing the performance of out-of-order (OOO) memory pipelines
Disambiguated loads are checked at retirement
• Extension to the existing coherency mechanism
• Invisible to software and system
59. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation Absent
Load4 must WAIT until the previous stores complete.
(Figure: in program order Store1 Y, Load2 Y, Store3 W, Load4 X; memory holds data W, X, Y, Z.)
60. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Memory Disambiguation Present
Loads can decouple from stores: Load4 can get its data WITHOUT waiting for the stores.
(Figure: Load4 X now executes ahead of Store1 Y, Load2 Y, and Store3 W; memory holds data W, X, Y, Z.)
61. Intel® Core™ Microarchitecture – Memory Sub-system
Advanced Memory Access / Store Forwarding
If a load follows a store and reloads the data that the store writes to memory, the microarchitecture can forward the data directly from the store to the load through internal buffers, without going through memory.
(Figure: Store1 Y followed by Load2 Y; data Y passes through the internal buffers.)
62. Advanced Memory Access / Store Forwarding: Aligned Store Cases
(Figure: for naturally aligned stores of 16, 32, 64, and 128 bits, data is forwarded to every load that lies inside the store and is aligned on its own size. A 16-bit store forwards to a 16-bit load or either 8-bit load; a 32-bit store to a 32-bit load, either aligned 16-bit load, or any 8-bit load; a 64-bit store to a 64-bit load, either aligned 32-bit load, any aligned 16-bit load, or any 8-bit load; a 128-bit store to a 128-bit load, either aligned 64-bit load, any aligned 32-bit or 16-bit load, or any 8-bit load.)
63. Advanced Memory Access / Store Forwarding: Unaligned Cases
Note that unaligned store forwarding does not occur when the load crosses a cache-line boundary.
(Figure: unaligned 16-, 32-, and 64-bit stores and the overlapping 8-, 16-, 32-, and 64-bit loads, each marked "store forwarded to load" or "no forwarding".)
• Unaligned 128-bit stores are issued as two 64-bit stores.
• ‡: This provides two alignments for store forwarding.
• No forwarding if the load crosses a cache-line boundary.
64. Agenda: Coding considerations
65. Optimizing for Instruction Fetch and PreDecode
Avoid length-changing prefixes (LCPs)
• Affects instructions with immediate data or an offset
• Operand-size override (66H)
• Address-size override (67H) [obsolete]
• LCPs change the length-decoding algorithm, increasing the processing time from one cycle to six cycles (or eleven cycles when the instruction spans a 16-byte boundary)
• The REX (EM64T) prefix (4xH) is not an LCP
  • The REX prefix does lengthen the instruction by one byte, so using the first eight general-purpose registers in EM64T is preferred
66. Optimizing for Instruction Queue
Includes a Loop Stream Detector (LSD)
• Potentially very high-bandwidth instruction streaming
• A number of requirements to make use of the LSD
  • Maximum of 18 instructions in up to four 16-byte packets
  • No RET instructions (hence, little practical use for CALLs)
  • Up to four taken branches allowed
  • Most effective at 70+ iterations
• The LSD is after PreDecode, so there is no added cost for LCPs
• Trade off the LSD against conventional loop unrolling