SlideShare a Scribd company logo
1 of 44
Download to read offline
Intel Distinguished Lecture 2003 
Billion Transistor Processor Chips in 
Mainstream Enterprise Platforms of the Future 
Dileep Bhandarkar 
Architect at Large 
Enterprise Platforms Group 
Intel Corporation 
February 10th, 2003 
Ninth International Symposium on High Performance Computer Architecture 
Anaheim, California 
Copyright © 2002 Intel Corporation.
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Birth of the Revolution -- 
The Intel 4004 
Introduced November 15, 1971 
108 KHz, 50 KIPs , 2300 10m transistors
Intel Distinguished Lecture 2003 
2001 –– Pentium® 4 Processor 
Introduced November 20, 2000 
@1.5 GHz core, 400 MT/s bus 
42 Million 0.18μ transistors 
August 27, 2001 
@2 GHz, 400 MT/s bus 
640 SPECint_base2000* 
704 SPECfp_base2000* 
Source: http://www.specbench.org/cpu2000/results/
Intel Distinguished Lecture 2003 
30 Years of Progress 
Ÿ 4004 to Pentium® 4 processor 
–– Transistor count: 20,000x increase 
–– Frequency: 20,000x increase 
–– 39% Compound Annual Growth rate
Intel Distinguished Lecture 2003 
2002 –– Pentium® 4 Processor 
November 14, 2002 
@3.06 GHz, 533 MT/s bus 
1099 SPECint_base2000* 
1077 SPECfp_base2000* 
55 Million 130 nm process 
Source: http://www.specbench.org/cpu2000/results/
Intel Distinguished Lecture 2003 
Branch Unit Floating Point Unit 
Pipeline Control 
Bus Logic 
Int 
RF 
CLK 
L1D 
cache 
Integer 
Datapath 
DTLB 
ALAT 
L1I 
cache 
HPW 
IA32 
L2D Array and Control L3 Tag 
Multi- 
Medi 
a 
unit 
L3 Cache 
Itanium® 2 Processor Overview 
Ÿ .18mm bulk, 6 layer Al 
process 
Ÿ 8 stage, fully stalled in-order 
pipeline 
Ÿ Symmetric six integer-issue 
design 
Ÿ IA32 execution engine 
integrated 
Ÿ 3 levels of cache on-die 
totaling 3.3MB 
Ÿ 221 Million transistors 
Ÿ 130W @1GHz, 1.5V 
Ÿ 421 mm2 die 
Ÿ 142 mm2 CPU core 
19.5mm 
21.6 mm
Intel Distinguished Lecture 2003 
Madison Processor 
Branch Unit Floating Point Unit 
HPW CLK 
IA32 L1I 
Cache 
Pipeline Control 
ALAT L1D 
Cache 
Integer 
DataPath 
Int 
RF 
Multi- 
Media 
Unit 
L2D Cache L3 Tag/LRU 
Bus Logic 
L3 Cache 
PADS 
DTLB 
Ÿ ~1.5 GHz 
Ÿ 6 MB L3 Cache 
Ÿ 24 way set 
associative 
Ÿ ~130W 
Ÿ 410 M transistors 
Ÿ 374 square mm
Intel Distinguished Lecture 2003 
Continuing at this Rate 
by End of the Decade 
8085 8086 
4004 
8008 
8080 
286 
386 
Pentium® 4 
Pentium® Pro 
486 Pentium proc 
Itanium ® 2 
• 221M in 2002 
• 410M in 2003 
10,000 
1,000 
100 
10 
1 
0.1 
0.01 
0.001 
’70 ’80 ’90 ’00 ’10 
Million 
Transistors 
Billion Transistors possible within 4 years
Intel Distinguished Lecture 2003 
““If the automobile industry advanced 
as rapidly as the semiconductor 
industry, a Rolls Royce would get 1/2 
million miles per gallon and it would 
be cheaper to throw it away than to 
park it.”” 
Gordon Moore, 
Intel Corporation
Intel Distinguished Lecture 2003 
Nanotechnology Advancements 
65nm Node 
Lgate = 30nm 
Production - 2005 
Source: Intel 
45nm Node 
Lgate = 20nm 
Production - 2007 30nm Node 
Lgate = 15nm 
Production - 2009 
50nm 
90nm Node 
Lgate = 50nm 
Production - 2003 
Influenza virus 
Process Advancements Fulfill Moore’s Law
Intel Distinguished Lecture 2003 
Semiconductor Manufacturing Process Evolution 
Actual Forecast 
Process name P852 P854 P856 P858 Px60 P1262 P1264 
Production 1993 1995 1997 1999 2001 2003 2005 
Generation 0.50 0.35 0.25 0.18μm 130 nm 90 nm 65 nm 
Gate Length 0.50 0.35 0.20 0.13 <70 nm <50 nm <35 nm 
Wafer Size (mm) 200 200 200 200 200/300 300 300 
New generation every 2 years
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Moore’’s Law
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Parallelism at Multiple Levels 
ŸWithin a processor 
–– multiple issue processors with lots of 
execution units 
–– wider superscalar 
–– explicit parallelism 
ŸMultiple processors on a chip 
–– Hardware Multi Threading 
–– Multiple cores 
Ÿ System Level Multiprocessors
EPIC Architecture FeatuInrteel Dsistinguished Lecture 2003 
Explicitly Parallel Instruction Computing 
Ÿ Enable wide execution by providing processor 
implementations that compiler can take advantage of 
Ÿ Performance through parallelism 
–– Multiple execution units and issue ports in parallel 
–– 2 bundles (up to 6 Instructions) dispatched every cycle 
Ÿ Massive on-chip resources 
–– 128 general registers, 128 floating point registers 
–– 64 predicate registers, 8 branch registers 
–– Exploit parallelism 
–– Efficient management engines (register stack engine) 
Ÿ Provide features that enable compiler to reschedule 
programs using advanced features (predication, 
speculation) 
Ÿ Enable, enhance, express, and exploit parallelism
Intel Distinguished Lecture 2003 
Instruction Formats: Bundles 
127 87 86 46 45 5 4 0 
Instruction Slot 0 
(41 bits) 
128 bits 
Ÿ Instruction types 
–– M: Memory 
–– I: Shifts and multimedia 
–– A: ALU 
–– B: Branch 
–– F: Floating point 
–– L+X: Long 
ŸTemplate encodes types 
–– MII, MLX, MMI, MFI, MMF, MI_I, 
M_MI 
–– Branch: MIB, MMB, MFB, MBB, 
BBB 
ŸTemplate encodes 
parallelism 
–– All come in two flavors: with and 
without stop at end 
Instruction Slot 1 
(41 bits) 
Instruction Slot 2 
(41 bits) 
Template 
(5 bits) 
Ÿ Template identifies types of instructions in bundle 
and delineates independent operations (through 
““stops””)
Intel Distinguished Lecture 2003 
Itanium® 2 Processor Architecture 
Itanium 2 processor 
6.4 GB/s 
128 bits wide 
400 MHz 
System bus System bus 
3 MB L3, 256k L2, 32k L1 all on-die 
8 
6 Integer, 
3 Branch 
2 FP, 
1 SIMD 
10 11 
2 Load & 
2 Store 
1 2 3 4 5 6 7 8 9 
328 on-board Registers 
1 GHz 
6 Instructions / Cycle 
High speed 
Large on-die cache, 
reduced latency 
8-stage pipeline 
11 
Issue ports 
Large number of 
Execution units 
Itanium 2 processor 
221 million transistors total 
25 million in CPU core logic 
Lots of 
parallelism 
within a 
single core
Intel Distinguished Lecture 2003 
Itanium® 2 Processor Block Diagram 
L1 Instruction Cache and 
Fetch/Pre-fetch Engine 
ITLB 
B B B M M M M F F 
128 Integer Registers 128 FP Registers 
Processor Structure 
L2 Cache – Quad Port 
Quad-Port 
L1 
Data 
Cache 
and 
DTLB 
Branch & Predicate 
Registers 
Branch 
Units 
Scoreboard, Predicate 
,NaTs, Exceptions 
ALAT 
IA-32 
Decode 
and 
Control 
Instruction 
Queue 
Floating 
Point 
Units 
8 bundles 
Register Stack Engine / Re-Mapping 
11 Issue Ports 
ECC 
L3 Cache Bus Controller 
ECC 
ECC 
Integer 
and 
MM Units 
I I 
Branch 
Prediction 
ECC 
ECC 
ECC 
ECC
Intel Distinguished Lecture 2003 
Integer & FP Performance 
1400 
1200 
1000 
800 
600 
400 
200 
0 
SPECint2000_base SPECfp2000_base 
UltraSPARC III Cu 1.056 GHz 
Itanium 2 Processor 1 GHz 
Alpha 21264C 1.25 GHz 
Power 4 1.3 GHz 
Pentium 4 Processor 2.8 GHz 
Source: www.specbench.org, 9/10/2002 *Other names and brands may be claimed as the property of others.
Intel Distinguished Lecture 2003 
Long Latency DRAM Accesses: 
Needs Memory Level Parallelism (MLP) 
1400 
1200 
1000 
800 
600 
400 
200 
0 
Pentium® 
Processor 
66 MHz 
Pentium-Pro 
Processor 
200 MHz 
Pentium III 
Processor 
1100 MHz 
Pentium 4 
Processor 
2 GHz 
Future CPUs 
Peak Instructions during DRAM Access
Intel Distinguished Lecture 2003 
Multithreading 
Ÿ Introduced on Intel® Xeon™ Processor MP 
Ÿ Two logical processors for < 5% additional 
die area 
Ÿ Executes two tasks simultaneously 
– Two different applications 
– Two threads of same application 
Ÿ CPU maintains architecture state for two 
processors 
– Two logical processors per physical processor 
Ÿ Power efficient performance gain 
Ÿ 20-30% performance improvement on many 
throughput oriented workloads
Intel Distinguished Lecture 2003 
HyperThreading Technology: 
What was added? 
Instruction Streaming 
Buffers 
Next Instruction Pointer 
Return Stack 
Predictor 
Trace Cache 
Next IP 
Trace Cache 
Fill Buffers 
Instruction TLB 
Register Alias 
Tables 
<5% die size (& max power), up to 30% performance increase
Intel Distinguished Lecture 2003 
IBM Power4 Dual Processor on a Chip 
Large Shared L2: 
Multi-ported: 3 
independent 
L3 & Mem slices 
Controller: 
L3 tags 
on-die for 
full-speed 
coherency 
checks 
Two cores (~30M transistors each) 
Chip-to-Chip & 
MCM-to-MCM 
Fabric: 
Glueless SMP 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
HP PA-8800 Dual Processor on a Chip 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Large Multiprocessor Systems 
Ÿ 32 and 64 processor systems available today 
Ÿ 300K to 400K transactions per minute 
Ÿ >100 Linpack Gigaflops
Intel Distinguished Lecture 2003 
IBM eServer pSeries 690 
Ÿ 1.3 GHz Power4 
Ÿ 8 to 32 CPU 
Ÿ Starting at 
$450,000 
Ÿ 8-way MCM @ 
$275,000** 
Ÿ 403,255 tpmc@ 
$17.80 per 
tpmC*** 
Ÿ 95 Linpack Gflops 
http://www-132.ibm.com/content/home/store_IBMPublicUSA/en_US/eServer/pSeries/high_end/pSeries_highend.html 
***Source: http://www.tpc.org/results/individual_results/IBM/IBMp690es_08142002.pdf 
Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
NEC Express5800/1320Xc SMP Server 
§ Up to 32 Itanium® 2 processors 
§ Up to 512GB memory (with 2GB DIMMs) 
§ Up to 112 PCI-X I/O slots 
§ Low latency and high bandwidth 
cross-bar interconnect 
§ Inter-cell memory interleaving 
§ ECC protected data transfer 
§ 342,746 tpmC @ $12.86 per tpmC** 
§ 101 Linpack GigaFlops 
§ 32 Processors + 256GB @ $1,396,490** 
MMeemmoorryy CCoonnttrroolllleerr 
MMeemmoorryy CCoonnttrroolllleerr 
DDR 
DIMMs 
Cell 
Up to 8 Cells 
PCI-X 
Up to 112slots 
CCrroossss--bbaarr iinntteerrccoonnnneecctt 
PPCCII--XX bbrriiddggee PPCCII--XX bbrriiddggee 
PCI-X bridge PCI-X bridge 
PCI-X bridge PCI-X bridge 
PCI-X bridge PCI-X bridge 
14 PCI-X slots 
PCI-14 X bridge X slots 
PCI-X bridge 
14 PCI-X slots 
PCI-X bridge PCI-X bridge 
14 PCI-X slots 
PCI-X bridge PCI-X bridge 
14 PCI-X slots 
PCI-X bridge PCI-X bridge 
14 PCI-X slots 
14 PCI-X slots 
14 PCI-X slots 
Cell 
Controller 
IIttaanniuiumm22 
IIttaanniuiumm22 
IIttaanniuiumm22 
IIttaanniuiumm22 
**http://www.tpc.org/results/individual_results/NEC/nec.express5800.1320xc.c5.021212.es.pdf 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
HP Superdome Super Dome is a cell-based hierarchical 
cross-bar system. A cell consists of 
è 4 CPUs 
è 2 to 16GBs of Memory 
è A link to 12 PCI I/O Slots 
è Cell Board with 4 PA-8700 875MHz 
Processors @ $10.080** (2 chassis @ $424,275**) 
64P Performance 
875 MHz PA-RISC 8700 
•• 423,414 tpmC @ 
$15.64 per tpmC 
•• 134 Linpack 
Gigaflops 
http://www.tpc.org/results/individual_results/HP/hp_tpcc_sd_es.pd *Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
HPC Clusters 
ŸCommercial Off 
The Shelf (COTS) 
components 
–– Processors 
–– Packaging 
–– Interconnects 
–– Operating systems 
15 node dual 2.4GHz Pentium® Xeon® 
cluster 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Parallelism Design Space 
HW Multi-Threading 
Instruction 
Level 
Parallelism 
Single-Stream Perf 
Faster frequency 
Wider Superscalar 
More Out-of-order speculation 
Thread 
Level 
Parallelism 
Chip Multiprocessing 
Multi-Core 
With Each Process Generation 
Ÿ Frequency increases by about 1.5X 
Ÿ Vcc will scale by only ~0.8 
Ÿ Active power will scale by ~0.9 
Ÿ Active power density will increase by ~30-80% 
Ÿ Leakage power will make it even worse 
Doubling performance requires more than 4 times the transistors
Intel Distinguished Lecture 2003 
Single-stream Performance vs Costs 
Relative power/die size/perf vs 486 
25 
20 
15 
10 
5 
0 
486 Pentium® 
Processor 
Pentium® III 
Processor 
Pentium® 4 
Processor 
SPECInt 
SPECFP 
Die Size 
Power
Intel Distinguished Lecture 2003 
A Simple Extrapolation 
4 Processor system on a chip, Integrating: 
• 4 Itanium® 2 processor Cores ~120 M transistors 
• Shared Cache 16 MB ~900 M transistors 
• Leaf interconnect 
1B Transistors Possible in 65 nm process 
With < 500 sq mm die size
Intel Distinguished Lecture 2003 
Power & Performance Tradeoffs 
Ÿ Throughput Perf a SQRT(Frequency) 
Ÿ But Power a Capacitance * Voltage2 * Frequency 
–– Frequency a Voltage 
Ÿ Power increases non-linearly with Frequency 
Single Core Dual Core Quad Core 
Capacitance 1 2 4 
Voltage 1 0.8 .63* 
Frequency 1 0.8 .63 
Power 1 1 1 
Performance 1 0.9 * 1.8 0.8 * 3.6 
CMP Alternative: Use smaller, lower power CPU cores 
*may not be feasible
Intel Distinguished Lecture 2003 
1999 Mainstream Microprocessor 
ŸPentium® III Processor 
Ÿ Integrated 256 KB L2 
cache 
Ÿ 106 mm² die size 
Ÿ 0.18μμ process 
Ÿ 6 metal layer process 
Ÿ 28 million transistors
Intel Distinguished Lecture 2003 
Technology Projection 
1999 2001 2003 2005 2007 
Process 180 nm 130 nm 90 nm 65 nm 50 nm 
Core+256K L2 100 50 25 12 6 
Sq mm 
1 MB cache 120 60 30 15 8 
Sq mm 
# of cores in ~ 2 4 8 16 32 
200 sq mm 
MB of cache in ~2 ~4 ~8 ~16 ~32 
~240 sq mm
Intel Distinguished Lecture 2003 
Art of the Possible 
Ÿ Billion Transistors possible in 65 nm process 
Ÿ Large die sizes can be built 
–– 400 to 600 square millimeters 
Ÿ What can fit on a single die? 
–– 12.5 mm2 per processor 
–– 15 mm2 per MB 
16 
cores 
8 
cores 
4 
cores 
Die size (core 
+ cache only) 
in mm2 
16 MB cache 290 340 440 
32 MB cache 530 580 680
Intel Distinguished Lecture 2003 
CMP Challenges 
Ÿ How much Thread Level Parallelism is 
there in non-embarassingly parallel 
workloads? 
Ÿ Ability to generate code with lots of 
threads & performance scaling 
Ÿ Thread synchronization 
Ÿ Operating systems for parallel machines 
Ÿ Single thread performance 
Ÿ Power limitations 
Ÿ On-chip interconnect infrastructure 
Ÿ Memory and I/O bandwidth required
Intel Distinguished Lecture 2003 
Design Challenges 
ŸDesign Complexity 
––Productivity Tools and Methods Advance 
–– ……But at slower rate than Moore’’s Law 
–– Replicating cores improves productivity 
ŸVisibility for Test & Debug 
––Pin Bandwidth/Transistor continues to decline 
––Shrinking dimensions, increasing speeds, …… 
ŸPower 
––Power Delivery –– di/dt of Amps/nano-second 
––Thermals: Overall power and thermal density
Intel Distinguished Lecture 2003 
Outline 
Ÿ Semiconductor Technology Evolution 
Ÿ Moore’’s Law Video 
Ÿ Parallelism in Microprocessors Today 
Ÿ Multiprocessor Systems 
Ÿ The Billion Transistor Chip 
Ÿ Summary 
©2002, Intel Corporation 
Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries 
*Other names and brands may be claimed as the property of others
Intel Distinguished Lecture 2003 
Summary 
ŸOne billion transistors feasible within 5 
years 
Ÿ Chip Level Multiprocessing and large 
caches will get us there 
Ÿ Plenty of opportunities for ““parallel 
programming”” in Commercial Off The 
Shelf Server platforms 
Ÿ Amount of parallelism in future 
microprocessors will increase 
Ÿ Need applications and tools that can 
exploit parallelism at all levels 
ŸDesign challenges remain

More Related Content

What's hot

Presentation on - Processors
Presentation on - Processors Presentation on - Processors
Presentation on - Processors The Avi Sharma
 
Intel Atom X Mobile Processors Announcement Slides MWC 2015
Intel Atom X Mobile Processors Announcement Slides MWC 2015Intel Atom X Mobile Processors Announcement Slides MWC 2015
Intel Atom X Mobile Processors Announcement Slides MWC 2015Ronen Mendezitsky
 
Evolution of computer generation.
Evolution of computer generation. Evolution of computer generation.
Evolution of computer generation. Mauryasuraj98
 
intel Presentation
intel Presentationintel Presentation
intel Presentationfinance6
 
Intel presentation ugttw 2015
Intel presentation ugttw 2015Intel presentation ugttw 2015
Intel presentation ugttw 2015James LeDoux
 
6th gen processor
6th gen processor6th gen processor
6th gen processorAmit Sinha
 
6th gen intel processor
6th gen intel processor 6th gen intel processor
6th gen intel processor Amit Sinha
 
Core i 7 processor
Core i 7 processorCore i 7 processor
Core i 7 processorSumit Biswas
 
Amd vs intel processors
Amd  vs  intel processorsAmd  vs  intel processors
Amd vs intel processorsBeth Schoren
 
Intel 14nm aug11
Intel 14nm aug11Intel 14nm aug11
Intel 14nm aug11lopatto
 
Intel(R)Core(Tm)I7 Desktop Processor Product Brief
Intel(R)Core(Tm)I7 Desktop Processor Product BriefIntel(R)Core(Tm)I7 Desktop Processor Product Brief
Intel(R)Core(Tm)I7 Desktop Processor Product BriefOscar del Rio
 
Intel's "Ivy Bridge" Overview
Intel's "Ivy Bridge" OverviewIntel's "Ivy Bridge" Overview
Intel's "Ivy Bridge" OverviewRemus Sinorchian
 
Generation of computer processors
Generation of computer processorsGeneration of computer processors
Generation of computer processorsArshad Qureshi
 

What's hot (20)

Presentation on - Processors
Presentation on - Processors Presentation on - Processors
Presentation on - Processors
 
Intel Atom X Mobile Processors Announcement Slides MWC 2015
Intel Atom X Mobile Processors Announcement Slides MWC 2015Intel Atom X Mobile Processors Announcement Slides MWC 2015
Intel Atom X Mobile Processors Announcement Slides MWC 2015
 
Evolution of computer generation.
Evolution of computer generation. Evolution of computer generation.
Evolution of computer generation.
 
intel Presentation
intel Presentationintel Presentation
intel Presentation
 
Intel presentation ugttw 2015
Intel presentation ugttw 2015Intel presentation ugttw 2015
Intel presentation ugttw 2015
 
6th gen processor
6th gen processor6th gen processor
6th gen processor
 
6th gen intel processor
6th gen intel processor 6th gen intel processor
6th gen intel processor
 
Basics of vlsi
Basics of vlsiBasics of vlsi
Basics of vlsi
 
Core i 7 processor
Core i 7 processorCore i 7 processor
Core i 7 processor
 
L15 micro evlutn
L15 micro evlutnL15 micro evlutn
L15 micro evlutn
 
Amd vs intel processors
Amd  vs  intel processorsAmd  vs  intel processors
Amd vs intel processors
 
xc_sp2e45
xc_sp2e45xc_sp2e45
xc_sp2e45
 
Intel i7
Intel i7Intel i7
Intel i7
 
Intel 14nm aug11
Intel 14nm aug11Intel 14nm aug11
Intel 14nm aug11
 
Intel Core i7
Intel Core i7Intel Core i7
Intel Core i7
 
Computer project
Computer projectComputer project
Computer project
 
Intel(R)Core(Tm)I7 Desktop Processor Product Brief
Intel(R)Core(Tm)I7 Desktop Processor Product BriefIntel(R)Core(Tm)I7 Desktop Processor Product Brief
Intel(R)Core(Tm)I7 Desktop Processor Product Brief
 
intel core i7
intel core i7 intel core i7
intel core i7
 
Intel's "Ivy Bridge" Overview
Intel's "Ivy Bridge" OverviewIntel's "Ivy Bridge" Overview
Intel's "Ivy Bridge" Overview
 
Generation of computer processors
Generation of computer processorsGeneration of computer processors
Generation of computer processors
 

Similar to Billion Transistor Processor Chips Future Enterprise Platforms

Cyclone II FPGA Overview
Cyclone II FPGA OverviewCyclone II FPGA Overview
Cyclone II FPGA OverviewPremier Farnell
 
Intel new processors
Intel new processorsIntel new processors
Intel new processorszaid_b
 
8085 manual NCIT SAROZ BISTA SIR
8085 manual NCIT SAROZ BISTA SIR8085 manual NCIT SAROZ BISTA SIR
8085 manual NCIT SAROZ BISTA SIRTHEE CAVE
 
Microprocessors and controllers
Microprocessors and controllersMicroprocessors and controllers
Microprocessors and controllersWendy Hemo
 
Microprocessors and controllers
Microprocessors and controllersMicroprocessors and controllers
Microprocessors and controllersWendy Hemo
 
Core 2 processors
Core 2 processorsCore 2 processors
Core 2 processorsArun Kumar
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its TypesNimrah Shahbaz
 
Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1SSGMCE SHEGAON
 
Introduction to Microprocessors
Introduction to MicroprocessorsIntroduction to Microprocessors
Introduction to MicroprocessorsSeble Nigussie
 
Lec 03 ia32 architecture
Lec 03  ia32 architectureLec 03  ia32 architecture
Lec 03 ia32 architectureAbdul Khan
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 
Introduction to microprocessor
Introduction to microprocessorIntroduction to microprocessor
Introduction to microprocessorKashyap Shah
 

Similar to Billion Transistor Processor Chips Future Enterprise Platforms (20)

Evolution of processors
Evolution of processorsEvolution of processors
Evolution of processors
 
Cyclone II FPGA Overview
Cyclone II FPGA OverviewCyclone II FPGA Overview
Cyclone II FPGA Overview
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Processor2
Processor2Processor2
Processor2
 
Introduction to intel 8086 part1
Introduction to intel 8086 part1Introduction to intel 8086 part1
Introduction to intel 8086 part1
 
Micro processor
Micro processorMicro processor
Micro processor
 
Introduction
Introduction Introduction
Introduction
 
Intel new processors
Intel new processorsIntel new processors
Intel new processors
 
8085 manual NCIT SAROZ BISTA SIR
8085 manual NCIT SAROZ BISTA SIR8085 manual NCIT SAROZ BISTA SIR
8085 manual NCIT SAROZ BISTA SIR
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 
Microprocessors and controllers
Microprocessors and controllersMicroprocessors and controllers
Microprocessors and controllers
 
Microprocessors and controllers
Microprocessors and controllersMicroprocessors and controllers
Microprocessors and controllers
 
Core 2 processors
Core 2 processorsCore 2 processors
Core 2 processors
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
 
Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1
 
Introduction to Microprocessors
Introduction to MicroprocessorsIntroduction to Microprocessors
Introduction to Microprocessors
 
Lec 03 ia32 architecture
Lec 03  ia32 architectureLec 03  ia32 architecture
Lec 03 ia32 architecture
 
Mpmc
MpmcMpmc
Mpmc
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
Introduction to microprocessor
Introduction to microprocessorIntroduction to microprocessor
Introduction to microprocessor
 

More from Dileep Bhandarkar

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Dileep Bhandarkar
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Dileep Bhandarkar
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersDileep Bhandarkar
 
Data center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperData center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperDileep Bhandarkar
 
Moscow conference keynote in 2012
Moscow conference keynote in 2012Moscow conference keynote in 2012
Moscow conference keynote in 2012Dileep Bhandarkar
 
New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11Dileep Bhandarkar
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorDileep Bhandarkar
 
Innovation lecture for hong kong
Innovation lecture for hong kongInnovation lecture for hong kong
Innovation lecture for hong kongDileep Bhandarkar
 
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Dileep Bhandarkar
 
Qualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission correctedQualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission correctedDileep Bhandarkar
 
Innovation lecture for shanghai final
Innovation lecture for shanghai finalInnovation lecture for shanghai final
Innovation lecture for shanghai finalDileep Bhandarkar
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedDileep Bhandarkar
 
Server design summit keynote handout
Server design summit keynote handoutServer design summit keynote handout
Server design summit keynote handoutDileep Bhandarkar
 

More from Dileep Bhandarkar (20)

Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011Open Compute Summit Keynote 17 June 2011
Open Compute Summit Keynote 17 June 2011
 
Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010Datacenter Dynamics Chicago 30 sept 2010
Datacenter Dynamics Chicago 30 sept 2010
 
Energy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large DatacentersEnergy Efficiency Considerations in Large Datacenters
Energy Efficiency Considerations in Large Datacenters
 
Samsung cio-forum-2012
Samsung cio-forum-2012 Samsung cio-forum-2012
Samsung cio-forum-2012
 
Data center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paperData center-server-cooling-power-management-paper
Data center-server-cooling-power-management-paper
 
Moscow conference keynote in 2012
Moscow conference keynote in 2012Moscow conference keynote in 2012
Moscow conference keynote in 2012
 
New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11New Delhi Cloud Summit 05 26-11
New Delhi Cloud Summit 05 26-11
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro Processor
 
Innovation lecture for hong kong
Innovation lecture for hong kongInnovation lecture for hong kong
Innovation lecture for hong kong
 
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
Performance from Architecture: Comparing a RISC and a CISC with Similar Hardw...
 
Qualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission correctedQualcomm centriq 2400 hot chips final submission corrected
Qualcomm centriq 2400 hot chips final submission corrected
 
Innovation lecture for shanghai final
Innovation lecture for shanghai finalInnovation lecture for shanghai final
Innovation lecture for shanghai final
 
Semicon2018 dileepb
Semicon2018 dileepbSemicon2018 dileepb
Semicon2018 dileepb
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updated
 
Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
 
Alpha memo july 1992
Alpha memo july 1992Alpha memo july 1992
Alpha memo july 1992
 
Risc vs cisc
Risc vs ciscRisc vs cisc
Risc vs cisc
 
Server design summit keynote handout
Server design summit keynote handoutServer design summit keynote handout
Server design summit keynote handout
 
Intel microprocessors
Intel microprocessorsIntel microprocessors
Intel microprocessors
 
Future of server design
Future of server designFuture of server design
Future of server design
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Billion Transistor Processor Chips Future Enterprise Platforms

  • 1. Intel Distinguished Lecture 2003 Billion Transistor Processor Chips in Mainstream Enterprise Platforms of the Future Dileep Bhandarkar Architect at Large Enterprise Platforms Group Intel Corporation February 10th, 2003 Ninth International Symposium on High Performance Computer Architecture Anaheim, California Copyright © 2002 Intel Corporation.
  • 2. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 3. Intel Distinguished Lecture 2003 Birth of the Revolution -- The Intel 4004 Introduced November 15, 1971 108 KHz, 50 KIPs , 2300 10m transistors
  • 4. Intel Distinguished Lecture 2003 2001 –– Pentium® 4 Processor Introduced November 20, 2000 @1.5 GHz core, 400 MT/s bus 42 Million 0.18μ transistors August 27, 2001 @2 GHz, 400 MT/s bus 640 SPECint_base2000* 704 SPECfp_base2000* Source: http://www.specbench.org/cpu2000/results/
  • 5. Intel Distinguished Lecture 2003 30 Years of Progress Ÿ 4004 to Pentium® 4 processor –– Transistor count: 20,000x increase –– Frequency: 20,000x increase –– 39% Compound Annual Growth rate
  • 6. Intel Distinguished Lecture 2003 2002 –– Pentium® 4 Processor November 14, 2002 @3.06 GHz, 533 MT/s bus 1099 SPECint_base2000* 1077 SPECfp_base2000* 55 Million 130 nm process Source: http://www.specbench.org/cpu2000/results/
  • 7. Intel Distinguished Lecture 2003 Branch Unit Floating Point Unit Pipeline Control Bus Logic Int RF CLK L1D cache Integer Datapath DTLB ALAT L1I cache HPW IA32 L2D Array and Control L3 Tag Multi- Medi a unit L3 Cache Itanium® 2 Processor Overview Ÿ .18mm bulk, 6 layer Al process Ÿ 8 stage, fully stalled in-order pipeline Ÿ Symmetric six integer-issue design Ÿ IA32 execution engine integrated Ÿ 3 levels of cache on-die totaling 3.3MB Ÿ 221 Million transistors Ÿ 130W @1GHz, 1.5V Ÿ 421 mm2 die Ÿ 142 mm2 CPU core 19.5mm 21.6 mm
  • 8. Intel Distinguished Lecture 2003 Madison Processor Branch Unit Floating Point Unit HPW CLK IA32 L1I Cache Pipeline Control ALAT L1D Cache Integer DataPath Int RF Multi- Media Unit L2D Cache L3 Tag/LRU Bus Logic L3 Cache PADS DTLB Ÿ ~1.5 GHz Ÿ 6 MB L3 Cache Ÿ 24 way set associative Ÿ ~130W Ÿ 410 M transistors Ÿ 374 square mm
  • 9. Intel Distinguished Lecture 2003 Continuing at this Rate by End of the Decade 8085 8086 4004 8008 8080 286 386 Pentium® 4 Pentium® Pro 486 Pentium proc Itanium ® 2 • 221M in 2002 • 410M in 2003 10,000 1,000 100 10 1 0.1 0.01 0.001 ’70 ’80 ’90 ’00 ’10 Million Transistors Billion Transistors possible within 4 years
  • 10. Intel Distinguished Lecture 2003 ““If the automobile industry advanced as rapidly as the semiconductor industry, a Rolls Royce would get 1/2 million miles per gallon and it would be cheaper to throw it away than to park it.”” Gordon Moore, Intel Corporation
  • 11. Intel Distinguished Lecture 2003 Nanotechnology Advancements 65nm Node Lgate = 30nm Production - 2005 Source: Intel 45nm Node Lgate = 20nm Production - 2007 30nm Node Lgate = 15nm Production - 2009 50nm 90nm Node Lgate = 50nm Production - 2003 Influenza virus Process Advancements Fulfill Moore’s Law
  • 12. Intel Distinguished Lecture 2003 Semiconductor Manufacturing Process Evolution Actual Forecast Process name P852 P854 P856 P858 Px60 P1262 P1264 Production 1993 1995 1997 1999 2001 2003 2005 Generation 0.50 0.35 0.25 0.18μm 130 nm 90 nm 65 nm Gate Length 0.50 0.35 0.20 0.13 <70 nm <50 nm <35 nm Wafer Size (mm) 200 200 200 200 200/300 300 300 New generation every 2 years
  • 13. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 14. Intel Distinguished Lecture 2003 Moore’’s Law
  • 15. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 16. Intel Distinguished Lecture 2003 Parallelism at Multiple Levels ŸWithin a processor –– multiple issue processors with lots of execution units –– wider superscalar –– explicit parallelism ŸMultiple processors on a chip –– Hardware Multi Threading –– Multiple cores Ÿ System Level Multiprocessors
  • 17. EPIC Architecture FeatuInrteel Dsistinguished Lecture 2003 Explicitly Parallel Instruction Computing Ÿ Enable wide execution by providing processor implementations that compiler can take advantage of Ÿ Performance through parallelism –– Multiple execution units and issue ports in parallel –– 2 bundles (up to 6 Instructions) dispatched every cycle Ÿ Massive on-chip resources –– 128 general registers, 128 floating point registers –– 64 predicate registers, 8 branch registers –– Exploit parallelism –– Efficient management engines (register stack engine) Ÿ Provide features that enable compiler to reschedule programs using advanced features (predication, speculation) Ÿ Enable, enhance, express, and exploit parallelism
  • 18. Intel Distinguished Lecture 2003 Instruction Formats: Bundles 127 87 86 46 45 5 4 0 Instruction Slot 0 (41 bits) 128 bits Ÿ Instruction types –– M: Memory –– I: Shifts and multimedia –– A: ALU –– B: Branch –– F: Floating point –– L+X: Long ŸTemplate encodes types –– MII, MLX, MMI, MFI, MMF, MI_I, M_MI –– Branch: MIB, MMB, MFB, MBB, BBB ŸTemplate encodes parallelism –– All come in two flavors: with and without stop at end Instruction Slot 1 (41 bits) Instruction Slot 2 (41 bits) Template (5 bits) Ÿ Template identifies types of instructions in bundle and delineates independent operations (through ““stops””)
  • 19. Intel Distinguished Lecture 2003 Itanium® 2 Processor Architecture Itanium 2 processor 6.4 GB/s 128 bits wide 400 MHz System bus System bus 3 MB L3, 256k L2, 32k L1 all on-die 8 6 Integer, 3 Branch 2 FP, 1 SIMD 10 11 2 Load & 2 Store 1 2 3 4 5 6 7 8 9 328 on-board Registers 1 GHz 6 Instructions / Cycle High speed Large on-die cache, reduced latency 8-stage pipeline 11 Issue ports Large number of Execution units Itanium 2 processor 221 million transistors total 25 million in CPU core logic Lots of parallelism within a single core
  • 20. Intel Distinguished Lecture 2003 Itanium® 2 Processor Block Diagram L1 Instruction Cache and Fetch/Pre-fetch Engine ITLB B B B M M M M F F 128 Integer Registers 128 FP Registers Processor Structure L2 Cache – Quad Port Quad-Port L1 Data Cache and DTLB Branch & Predicate Registers Branch Units Scoreboard, Predicate ,NaTs, Exceptions ALAT IA-32 Decode and Control Instruction Queue Floating Point Units 8 bundles Register Stack Engine / Re-Mapping 11 Issue Ports ECC L3 Cache Bus Controller ECC ECC Integer and MM Units I I Branch Prediction ECC ECC ECC ECC
  • 21. Intel Distinguished Lecture 2003 Integer & FP Performance 1400 1200 1000 800 600 400 200 0 SPECint2000_base SPECfp2000_base UltraSPARC III Cu 1.056 GHz Itanium 2 Processor 1 GHz Alpha 21264C 1.25 GHz Power 4 1.3 GHz Pentium 4 Processor 2.8 GHz Source: www.specbench.org, 9/10/2002 *Other names and brands may be claimed as the property of others.
  • 22. Intel Distinguished Lecture 2003 Long Latency DRAM Accesses: Needs Memory Level Parallelism (MLP) 1400 1200 1000 800 600 400 200 0 Pentium® Processor 66 MHz Pentium-Pro Processor 200 MHz Pentium III Processor 1100 MHz Pentium 4 Processor 2 GHz Future CPUs Peak Instructions during DRAM Access
  • 23. Intel Distinguished Lecture 2003 Multithreading Ÿ Introduced on Intel® Xeon™ Processor MP Ÿ Two logical processors for < 5% additional die area Ÿ Executes two tasks simultaneously – Two different applications – Two threads of same application Ÿ CPU maintains architecture state for two processors – Two logical processors per physical processor Ÿ Power efficient performance gain Ÿ 20-30% performance improvement on many throughput oriented workloads
  • 24. Intel Distinguished Lecture 2003 HyperThreading Technology: What was added? Instruction Streaming Buffers Next Instruction Pointer Return Stack Predictor Trace Cache Next IP Trace Cache Fill Buffers Instruction TLB Register Alias Tables <5% die size (& max power), up to 30% performance increase
  • 25. Intel Distinguished Lecture 2003 IBM Power4 Dual Processor on a Chip Large Shared L2: Multi-ported: 3 independent L3 & Mem slices Controller: L3 tags on-die for full-speed coherency checks Two cores (~30M transistors each) Chip-to-Chip & MCM-to-MCM Fabric: Glueless SMP *Other names and brands may be claimed as the property of others
  • 26. Intel Distinguished Lecture 2003 HP PA-8800 Dual Processor on a Chip *Other names and brands may be claimed as the property of others
  • 27. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 28. Intel Distinguished Lecture 2003 Large Multiprocessor Systems Ÿ 32 and 64 processor systems available today Ÿ 300K to 400K transactions per minute Ÿ >100 Linpack Gigaflops
  • 29. Intel Distinguished Lecture 2003 IBM eServer pSeries 690 Ÿ 1.3 GHz Power4 Ÿ 8 to 32 CPU Ÿ Starting at $450,000 Ÿ 8-way MCM @ $275,000** Ÿ 403,255 tpmc@ $17.80 per tpmC*** Ÿ 95 Linpack Gflops http://www-132.ibm.com/content/home/store_IBMPublicUSA/en_US/eServer/pSeries/high_end/pSeries_highend.html ***Source: http://www.tpc.org/results/individual_results/IBM/IBMp690es_08142002.pdf Other names and brands may be claimed as the property of others
  • 30. Intel Distinguished Lecture 2003 NEC Express5800/1320Xc SMP Server § Up to 32 Itanium® 2 processors § Up to 512GB memory (with 2GB DIMMs) § Up to 112 PCI-X I/O slots § Low latency and high bandwidth cross-bar interconnect § Inter-cell memory interleaving § ECC protected data transfer § 342,746 tpmC @ $12.86 per tpmC** § 101 Linpack GigaFlops § 32 Processors + 256GB @ $1,396,490** MMeemmoorryy CCoonnttrroolllleerr MMeemmoorryy CCoonnttrroolllleerr DDR DIMMs Cell Up to 8 Cells PCI-X Up to 112slots CCrroossss--bbaarr iinntteerrccoonnnneecctt PPCCII--XX bbrriiddggee PPCCII--XX bbrriiddggee PCI-X bridge PCI-X bridge PCI-X bridge PCI-X bridge PCI-X bridge PCI-X bridge 14 PCI-X slots PCI-14 X bridge X slots PCI-X bridge 14 PCI-X slots PCI-X bridge PCI-X bridge 14 PCI-X slots PCI-X bridge PCI-X bridge 14 PCI-X slots PCI-X bridge PCI-X bridge 14 PCI-X slots 14 PCI-X slots 14 PCI-X slots Cell Controller IIttaanniuiumm22 IIttaanniuiumm22 IIttaanniuiumm22 IIttaanniuiumm22 **http://www.tpc.org/results/individual_results/NEC/nec.express5800.1320xc.c5.021212.es.pdf *Other names and brands may be claimed as the property of others
  • 31. Intel Distinguished Lecture 2003 HP Superdome Super Dome is a cell-based hierarchical cross-bar system. A cell consists of è 4 CPUs è 2 to 16GBs of Memory è A link to 12 PCI I/O Slots è Cell Board with 4 PA-8700 875MHz Processors @ $10.080** (2 chassis @ $424,275**) 64P Performance 875 MHz PA-RISC 8700 •• 423,414 tpmC @ $15.64 per tpmC •• 134 Linpack Gigaflops http://www.tpc.org/results/individual_results/HP/hp_tpcc_sd_es.pd *Other names and brands may be claimed as the property of others
  • 32. Intel Distinguished Lecture 2003 HPC Clusters ŸCommercial Off The Shelf (COTS) components –– Processors –– Packaging –– Interconnects –– Operating systems 15 node dual 2.4GHz Pentium® Xeon® cluster *Other names and brands may be claimed as the property of others
  • 33. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 34. Intel Distinguished Lecture 2003 Parallelism Design Space HW Multi-Threading Instruction Level Parallelism Single-Stream Perf Faster frequency Wider Superscalar More Out-of-order speculation Thread Level Parallelism Chip Multiprocessing Multi-Core With Each Process Generation Ÿ Frequency increases by about 1.5X Ÿ Vcc will scale by only ~0.8 Ÿ Active power will scale by ~0.9 Ÿ Active power density will increase by ~30-80% Ÿ Leakage power will make it even worse Doubling performance requires more than 4 times the transistors
  • 35. Intel Distinguished Lecture 2003 Single-stream Performance vs Costs Relative power/die size/perf vs 486 25 20 15 10 5 0 486 Pentium® Processor Pentium® III Processor Pentium® 4 Processor SPECInt SPECFP Die Size Power
  • 36. Intel Distinguished Lecture 2003 A Simple Extrapolation 4 Processor system on a chip, Integrating: • 4 Itanium® 2 processor Cores ~120 M transistors • Shared Cache 16 MB ~900 M transistors • Leaf interconnect 1B Transistors Possible in 65 nm process With < 500 sq mm die size
  • 37. Intel Distinguished Lecture 2003 Power & Performance Tradeoffs Ÿ Throughput Perf a SQRT(Frequency) Ÿ But Power a Capacitance * Voltage2 * Frequency –– Frequency a Voltage Ÿ Power increases non-linearly with Frequency Single Core Dual Core Quad Core Capacitance 1 2 4 Voltage 1 0.8 .63* Frequency 1 0.8 .63 Power 1 1 1 Performance 1 0.9 * 1.8 0.8 * 3.6 CMP Alternative: Use smaller, lower power CPU cores *may not be feasible
  • 38. Intel Distinguished Lecture 2003 1999 Mainstream Microprocessor ŸPentium® III Processor Ÿ Integrated 256 KB L2 cache Ÿ 106 mm² die size Ÿ 0.18μμ process Ÿ 6 metal layer process Ÿ 28 million transistors
  • 39. Intel Distinguished Lecture 2003 Technology Projection 1999 2001 2003 2005 2007 Process 180 nm 130 nm 90 nm 65 nm 50 nm Core+256K L2 100 50 25 12 6 Sq mm 1 MB cache 120 60 30 15 8 Sq mm # of cores in ~ 2 4 8 16 32 200 sq mm MB of cache in ~2 ~4 ~8 ~16 ~32 ~240 sq mm
  • 40. Intel Distinguished Lecture 2003 Art of the Possible Ÿ Billion Transistors possible in 65 nm process Ÿ Large die sizes can be built –– 400 to 600 square millimeters Ÿ What can fit on a single die? –– 12.5 mm2 per processor –– 15 mm2 per MB 16 cores 8 cores 4 cores Die size (core + cache only) in mm2 16 MB cache 290 340 440 32 MB cache 530 580 680
  • 41. Intel Distinguished Lecture 2003 CMP Challenges Ÿ How much Thread Level Parallelism is there in non-embarassingly parallel workloads? Ÿ Ability to generate code with lots of threads & performance scaling Ÿ Thread synchronization Ÿ Operating systems for parallel machines Ÿ Single thread performance Ÿ Power limitations Ÿ On-chip interconnect infrastructure Ÿ Memory and I/O bandwidth required
  • 42. Intel Distinguished Lecture 2003 Design Challenges ŸDesign Complexity ––Productivity Tools and Methods Advance –– ……But at slower rate than Moore’’s Law –– Replicating cores improves productivity ŸVisibility for Test & Debug ––Pin Bandwidth/Transistor continues to decline ––Shrinking dimensions, increasing speeds, …… ŸPower ––Power Delivery –– di/dt of Amps/nano-second ––Thermals: Overall power and thermal density
  • 43. Intel Distinguished Lecture 2003 Outline Ÿ Semiconductor Technology Evolution Ÿ Moore’’s Law Video Ÿ Parallelism in Microprocessors Today Ÿ Multiprocessor Systems Ÿ The Billion Transistor Chip Ÿ Summary ©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
  • 44. Intel Distinguished Lecture 2003 Summary ŸOne billion transistors feasible within 5 years Ÿ Chip Level Multiprocessing and large caches will get us there Ÿ Plenty of opportunities for ““parallel programming”” in Commercial Off The Shelf Server platforms Ÿ Amount of parallelism in future microprocessors will increase Ÿ Need applications and tools that can exploit parallelism at all levels ŸDesign challenges remain