Architecture of high end processors

This presentation describes the architecture of high-end processors.

Published in: Engineering

  1. Topic: Architecture of High End Processors, Core 2 Duo to Core i5
  2. Group Members • Ahsan Zafar 12063122-087 • Naeem Raza 12063122-084 • Zain-ul-Hassan 12063122-049 • Fahad Ali Amjad 12063122-027 • Mohsin Raza 12063122-019 • University of Gujrat (Pakistan)
  3. What is a High End Microprocessor? Microprocessors that are: • Usually expensive compared to other types of microprocessors • Superior in quality • More sophisticated
  4. Micro-Architecture? • Microarchitecture (sometimes abbreviated to µarch or uarch) is a description of the electrical circuitry of a computer, central processing unit, or digital signal processor that is sufficient for completely describing the operation of the hardware.
  5. RoadMap
  6. Core Micro-Architecture • The Intel Core Microarchitecture is a new foundation for Intel architecture-based desktop, mobile, and mainstream server multi-core processors • Designed for efficiency and optimized performance across a range of market segments and power envelopes
  7. Core Micro-Architecture • Intel Core Microarchitecture based processors: • DP Server: dual-core Intel® Xeon® 51xx processors; quad-core codenamed Clovertown • Desktop: dual-core Intel® Core™ 2 Duo processors; quad-core codenamed Kentsfield • Mobile: dual-core Intel® Core™ 2 Duo processors
  8. Overview
  9. Overview
  10. Architectural Features of Core 2 • Intel Wide Dynamic Execution • Intel Intelligent Power Capability • Intel Advanced Smart Cache • Intel Smart Memory Access • Intel Advanced Digital Media Boost
  11. Intel® Wide Dynamic Execution • Advantage: wider execution • Comprehensive advancements • Enabled in each core • Each core fetches, dispatches, executes and returns up to four full instructions simultaneously • Performance increases while energy consumption decreases (figure label: L2 cache)
  12. What is L1 and L2? • Level-1 and Level-2 caches • The cache memories in a computer • Much faster than RAM • L1 is built on the microprocessor chip itself • L2 was historically a separate chip; on Core 2 it sits on the processor die and is shared between the cores • L2 cache is much larger than L1 cache
  13. Intel® Advanced Smart Cache (figure labels: Decreased traffic, Increased traffic) • Higher cache hit rate • Reduced bus traffic • Lower latency to data • Advantage: L2 cache is shared between the cores • Data stored in one place • Optimizes cache resources • Up to 100% utilization of L2 cache by one core when the other is idle
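
The "up to 100% utilization" point on slide 13 can be illustrated with a toy capacity model. This is only a sketch: the 4 MB cache size and the function name `usable_cache_kb` are assumptions chosen for illustration, not figures from the slides.

```python
def usable_cache_kb(total_kb: int, cores: int, active_cores: int, shared: bool) -> int:
    """Cache capacity (in KB) that the active cores can actually use."""
    if shared:
        # Shared design: any active core may allocate anywhere in the cache.
        return total_kb if active_cores > 0 else 0
    # Split design: each core owns a fixed slice, so idle slices go unused.
    return (total_kb // cores) * active_cores


TOTAL_KB = 4096  # assumed 4 MB L2, for illustration only

split = usable_cache_kb(TOTAL_KB, cores=2, active_cores=1, shared=False)
shared = usable_cache_kb(TOTAL_KB, cores=2, active_cores=1, shared=True)
print(f"split: {split} KB usable, shared: {shared} KB usable")
```

With one core idle, the split design leaves half of the cache unused, while the shared design lets the busy core use all of it.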
  14-26. Intel® Smart Memory Access (slides 14 through 26 carry only this title in the transcript)
  27. Intel® Smart Memory Access • Why? – Making every load wait for earlier stores loses opportunities for out-of-order execution • What is the idea? – Speculatively ignore the store-load dependencies – If there is a dependency, flush the load instruction • How is it checked? – Verify by checking all dispatched store addresses in the memory order buffer – There is a watchdog
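
The speculate, verify, flush sequence on slide 27 can be sketched in a few lines. This is a simplified software model, not the hardware mechanism: the `Store` class, the `speculative_load` function, and the dictionary standing in for memory are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr: int
    value: int


def speculative_load(memory: dict, load_addr: int, pending_stores: list):
    """Return (value, flushed) for a load hoisted above older pending stores."""
    # Speculate: read memory before the older stores have been applied.
    speculative_value = memory.get(load_addr, 0)
    # Verify: compare the load address against every older store address,
    # mirroring the memory order buffer check described on the slide.
    conflict = any(s.addr == load_addr for s in pending_stores)
    # The older stores now complete in program order.
    for s in pending_stores:
        memory[s.addr] = s.value
    if conflict:
        # Mis-speculation: flush the load and re-execute it after the stores.
        return memory[load_addr], True
    return speculative_value, False


mem = {0x10: 1, 0x20: 2}
no_conflict = speculative_load(mem, 0x20, [Store(0x10, 7)])
conflict = speculative_load(mem, 0x10, [Store(0x10, 9)])
print(no_conflict, conflict)
```

The first load touches a different address than the pending store, so the speculative value stands. The second load aliases the store, so it is flushed and picks up the stored value instead.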
  28. Intel® Advanced Digital Media Boost • Previously: lower 64 bits in one cycle, upper 64 bits in the next
  29. Intel® Advanced Digital Media Boost • Now: 128-bit instruction completed in one cycle
  30. Intel® Advanced Digital Media Boost • Accelerates a broad range of applications – Video, speech, image processing – Encryption – Financial – Engineering and scientific
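
The difference between slides 28 and 29 comes down to simple arithmetic, sketched below. The model assumes one pass through the datapath per cycle and ignores latency, dependencies, and everything else a real core does; `issue_cycles` is a hypothetical helper name.

```python
import math


def issue_cycles(num_ops: int, op_bits: int = 128, datapath_bits: int = 64) -> int:
    """Cycles to push num_ops vector operations through a datapath, one pass per cycle."""
    # A 128-bit operation on a 64-bit datapath needs two passes (lower half, then upper half).
    passes_per_op = math.ceil(op_bits / datapath_bits)
    return num_ops * passes_per_op


older = issue_cycles(1000, datapath_bits=64)   # two halves per operation
wider = issue_cycles(1000, datapath_bits=128)  # whole operation in one pass
print(older, wider)
```

Under these assumptions the 128-bit datapath halves the cycle count for a stream of 128-bit operations.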
  31. Nehalem Micro-Architecture • Nehalem is the codename for an Intel processor microarchitecture, successor to the Core microarchitecture. • Nehalem processors use the 45 nm process. • The first processor released with the Nehalem architecture was the desktop Core i7
  32. Nehalem Micro-Architecture • Nehalem-based processors are: Core i3, Core i5, Core i7 • The Nehalem micro-architecture was replaced by the Sandy Bridge micro-architecture.
  33. Nehalem System Example
  34. Building Blocks
  35. Overview of Nehalem Processor Chip • Four identical compute cores • UIU: Un-core interface unit • L3 cache memory and data block memory
  36. Overview of Nehalem Processor Chip (cont.) • IMC: Integrated Memory Controller with 3 DDR3 memory channels • QPI: Quick Path Interconnect ports • Auxiliary circuitry for cache coherence, power control, system management, performance monitoring
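
To give a feel for what three DDR3 channels mean, here is a rough peak-bandwidth calculation. The slides do not state a memory speed, so DDR3-1066 is an assumption made for this example; the 64-bit (8-byte) channel width is the standard DDR3 channel width, and `peak_bandwidth_gb_s` is a hypothetical helper.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    # Each transfer moves one 64-bit word (8 bytes) per channel.
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Assumed DDR3-1066 (1066 MT/s) on all three channels of the IMC.
peak = peak_bandwidth_gb_s(channels=3, mega_transfers_per_s=1066)
print(f"{peak:.1f} GB/s theoretical peak")
```

This is an upper bound only; sustained bandwidth on real hardware is lower.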
  37. Overview of Nehalem Processor Chip (cont.) • Chip is divided into two domains: “Un-core” and “core” • “Core” components operate at the same clock frequency as the cores themselves • “Un-core” components operate at a different frequency.
  38. Nehalem Memory Hierarchy Overview
  39. Cache Hierarchy Latencies • L1 32 KB, 8-way, latency 4 cycles • L2 256 KB, 8-way, latency < 12 cycles • L3 8 MB shared, 16-way, latency 30-40 cycles (4 core system) • L3 24 MB shared, 24-way, latency 30-60 cycles (8 core system) • DRAM, latency ~180-200 cycles
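
The latencies on slide 39 can be combined into an average access time once hit rates are chosen. The sketch below takes the slide's latencies (12 cycles as the L2 upper bound, and midpoints of 35 and 190 cycles for L3 and DRAM); the hit rates are invented purely for illustration, and each latency is treated as the total cost of an access served at that level.

```python
def average_latency(levels, dram_latency):
    """Average access latency in cycles.

    levels: list of (latency_cycles, hit_rate) from L1 outward, where hit_rate
    is the fraction of accesses reaching that level which hit there.
    """
    remaining = 1.0  # fraction of accesses not yet served
    total = 0.0
    for latency, hit_rate in levels:
        total += remaining * hit_rate * latency
        remaining *= 1.0 - hit_rate
    # Whatever misses every cache level is served by DRAM.
    return total + remaining * dram_latency


# Latencies from the slide; hit rates are assumptions for this example.
nehalem_4core = [(4, 0.90), (12, 0.80), (35, 0.75)]
amat = average_latency(nehalem_4core, dram_latency=190)
print(f"average access latency: {amat:.3f} cycles")
```

Even with only 0.5% of accesses reaching DRAM under these assumed hit rates, DRAM contributes almost a full cycle to the average, which is why the deeper cache levels matter.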
  40. Nehalem Microarchitecture
  41. Instruction Execution
  42. Instruction Execution (1/5) 1. Instructions fetched from L2 cache
  43. Instruction Execution (2/5) 1. Instructions fetched from L2 cache 2. Instructions decoded, prefetched and queued
  44. Instruction Execution (3/5) 1. Instructions fetched from L2 cache 2. Instructions decoded, prefetched and queued 3. Instructions optimized and combined
  45. Instruction Execution (4/5) 1. Instructions fetched from L2 cache 2. Instructions decoded, prefetched and queued 3. Instructions optimized and combined 4. Instructions executed
  46. Instruction Execution (5/5) 1. Instructions fetched from L2 cache 2. Instructions decoded, prefetched and queued 3. Instructions optimized and combined 4. Instructions executed 5. Results written
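
The five steps above overlap in time: while one instruction is being executed, the next is being optimized, and so on. The sketch below models that overlap as an idealized in-order pipeline with one instruction entering per cycle. It is a teaching simplification of the slide's five steps, not a model of Nehalem itself, which is out-of-order, issues several instructions per cycle, and has many more stages. `STAGES` and `pipeline_schedule` are names introduced here.

```python
STAGES = ["fetch", "decode", "optimize", "execute", "write-back"]


def pipeline_schedule(num_instructions: int):
    """Map cycle -> list of (instruction index, stage) for an ideal in-order pipeline."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i enters the pipeline at cycle i and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


sched = pipeline_schedule(4)
total_cycles = max(sched) + 1
for cycle in sorted(sched):
    print(cycle, sched[cycle])
```

Four instructions finish in 8 cycles rather than the 20 a strictly sequential design would need, since each stage works on a different instruction in the same cycle.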
  47. Caches and Memory
  48. Caches and Memory (1/5) 1. 4-way set associative instruction cache
  49. Caches and Memory (2/5) 1. 4-way set associative instruction cache 2. 8-way set associative L1 data cache (32 KB)
  50. Caches and Memory (3/5) 1. 4-way set associative instruction cache 2. 8-way set associative L1 data cache (32 KB) 3. 8-way set associative L2 data cache (256 KB)
  51. Caches and Memory (4/5) 1. 4-way set associative instruction cache 2. 8-way set associative L1 data cache (32 KB) 3. 8-way set associative L2 data cache (256 KB) 4. 16-way shared L3 cache (8 MB)
  52. Caches and Memory (5/5) 1. 4-way set associative instruction cache 2. 8-way set associative L1 data cache (32 KB) 3. 8-way set associative L2 data cache (256 KB) 4. 16-way shared L3 cache (8 MB) 5. 3 DDR3 memory connections
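
Size and associativity together determine how many sets a cache has: sets = size / (ways × line size). The sketch below applies that formula to the caches listed above. The 64-byte line size is not stated on the slides and is assumed here; `num_sets` is a hypothetical helper.

```python
LINE_BYTES = 64  # assumed cache line size; not given on the slides


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


l1d_sets = num_sets(32 * 1024, ways=8)        # L1 data cache, 32 KB, 8-way
l2_sets = num_sets(256 * 1024, ways=8)        # L2 cache, 256 KB, 8-way
l3_sets = num_sets(8 * 1024 * 1024, ways=16)  # shared L3, 8 MB, 16-way
print(l1d_sets, l2_sets, l3_sets)
```

An address selects one set, and the line may then live in any of that set's ways, so higher associativity reduces conflict misses at the cost of a wider lookup.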
  53. Components 1. Instructions fetched from L2 cache 2. Instructions decoded, prefetched and queued 3. Instructions optimized and combined 4. Instructions executed 5. Results written
  54. Components: Fetch 1. Instructions fetched from L2 cache – 32 KB instruction cache – 2-level TLB • L1 – Instructions: 7-128 entries – Data: 32-64 entries • L2 – 512 data or instruction entries – Shared between SMT threads
  55. Components: Decode 2. Instructions decoded, prefetched and queued – 16-byte prefetch buffer – 18-op instruction queue – MacroOp fusion • Combines small instructions into larger ones – Enhanced branch prediction
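
MacroOp fusion, as described on slide 55, merges adjacent instructions so they travel through the pipeline as one operation. The sketch below pairs a compare or test with an immediately following conditional jump. It works on instruction strings, and the pairing rule is a deliberate simplification: the real hardware rules are more restrictive and are not spelled out in the slides. `fuse_macro_ops` is a hypothetical name.

```python
def fuse_macro_ops(instructions):
    """Fuse each cmp/test with an immediately following conditional jump."""
    fused = []
    i = 0
    while i < len(instructions):
        op = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        is_compare = op.split()[0] in ("cmp", "test")
        # Conditional jumps start with "j" (jne, jz, ...); plain "jmp" is unconditional.
        is_cond_jump = (
            nxt is not None
            and nxt.split()[0].startswith("j")
            and nxt.split()[0] != "jmp"
        )
        if is_compare and is_cond_jump:
            fused.append(f"{op} + {nxt}")  # one fused operation instead of two
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


program = ["mov eax, 1", "cmp eax, ebx", "jne loop", "add eax, 2"]
result = fuse_macro_ops(program)
print(result)
```

Four instructions become three operations, which frees decode and queue slots for other work.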
  56. Components: Optimization 3. Instructions optimized and combined – 4 op decoders • Enables multiple instructions per cycle – 28-entry MicroOp queue • Pre-fusion buffer – MicroOp fusion • Creates 1 “instruction” from MicroOps – Reorder buffer • Post-fusion buffer
  57. Components: Execution 4. Instructions executed – 4 FPUs • MUL, DIV, STOR, LD – 3 ALUs – 2 AGUs • Address generation – 3 SSE units • Supports SSE4 – 6 ports connecting the units
  58. Components: Write-Back 5. Results written – Private L1/L2 cache
  59. Components: Write-Back 5. Results written – Private L1/L2 cache – Shared L3 cache – QuickPath • Dedicated channel to another CPU, chip, or device • Replaces FSB
