Microprocessor Futures
1
University of California
Future of Microprocessors
David Patterson
University of California,
Berkeley
June 2001
Outline
• A 30 year history of microprocessors
– Four generations of innovation
• High performance microprocessor drivers:
– Memory hierarchies
– instruction level parallelism (ILP)
• Where are we and where are we going?
• Focus on desktop/server microprocessors vs.
embedded/DSP microprocessors
Microprocessor Generations
• First Generation: 1971-78
– Behind the power curve
(16-bit, <50k transistors)
• Second Generation: 1979-85
– Becoming “real” computers
(32-bit, >50k transistors)
• Third Generation: 1985-89
– Challenging the “establishment”
(Reduced Instruction Set Computer/RISC,
>100k transistors)
• Fourth Generation: 1990-
– Architectural and performance leadership
(64-bit, > 1M transistors,
Intel/AMD translate into RISC internally)
In the beginning (8-bit) Intel 4004
• First general-purpose, single-chip microprocessor
• Shipped in 1971
• 8-bit architecture, 4-bit
implementation
• 2,300 transistors
• Performance < 0.1 MIPS
(Million Instructions Per Sec)
• 8008: 8-bit implementation in
1972
– 3,500 transistors
– First microprocessor-based
computer (Micral)
• Targeted at laboratory
instrumentation
• Mostly sold in Europe
All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
1st Generation (16-bit) Intel 8086
• Introduced in 1978
– Performance < 0.5 MIPS
• New 16-bit architecture
– “Assembly language”
compatible with 8080
– 29,000 transistors
– Includes segmented memory addressing
(no hardware memory protection until the 80286),
support for floating-point coprocessor
• In 1981, IBM introduces PC
– Based on 8088, an 8-bit bus
version of 8086
2nd Generation (32-bit) Motorola 68000
• Major architectural step in
microprocessors:
– First 32-bit architecture
• initial 16-bit implementation
– First flat 32-bit address
• Support for paging
– General-purpose register
architecture
• Loosely based on PDP-11
minicomputer
• First implementation in 1979
– 68,000 transistors
– < 1 MIPS (Million Instructions
Per Second)
• Used in
– Apple Mac
– Sun, Silicon Graphics, & Apollo
workstations
3rd Generation: MIPS R2000
• Several firsts:
– First (commercial) RISC
microprocessor
– First microprocessor to
provide integrated support for
instruction & data cache
– First pipelined microprocessor
(sustains 1 instruction/clock)
• Implemented in 1985
– 125,000 transistors
– 5-8 MIPS (Million
Instructions per Second)
4th Generation (64-bit) MIPS R4000
• First 64-bit architecture
• Integrated caches
– On-chip
– Support for off-chip,
secondary cache
• Integrated floating point
• Implemented in 1991:
– Deep pipeline
– 1.4M transistors
– Initially 100MHz
– > 50 MIPS
• Intel translates 80x86/
Pentium X instructions into
RISC internally
Key Architectural Trends
• Performance increases at 1.6X per year (2X every 1.5 years)
– True from 1985-present
• Combination of technology and architectural
enhancements
– Technology provides faster transistors
(∝ 1/lithographic feature size) and more of them
– Faster transistors lead to higher clock rates
– More transistors (“Moore’s Law”):
• Architectural ideas turn transistors into performance
– Responsible for about half the yearly performance growth
• Two key architectural directions
– Sophisticated memory hierarchies
– Exploiting instruction level parallelism
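As a quick check of the growth figures on this slide, the sketch below converts "2X every 1.5 years" into a yearly rate and compounds it over the 15-year path mentioned later in the talk. The function names are illustrative and not from the talk.

```python
def annual_rate_from_doubling(doubling_years: float) -> float:
    """Yearly growth factor implied by doubling every `doubling_years` years."""
    return 2.0 ** (1.0 / doubling_years)


def total_speedup(annual_rate: float, years: float) -> float:
    """Cumulative speedup after compounding `annual_rate` for `years` years."""
    return annual_rate ** years


rate = annual_rate_from_doubling(1.5)   # about 1.59, which the slide rounds to 1.6
speedup_15y = total_speedup(rate, 15)   # 2**10, i.e. roughly 1000X

print(f"annual growth: {rate:.2f}x")
print(f"15-year speedup: {speedup_15y:.0f}x")
```

Doubling every 1.5 years means ten doublings in 15 years, which is where the roughly 1000X figure comes from.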
Memory Hierarchies
• Caches: hide latency of DRAM and increase BW
– CPU-DRAM access gap has grown by a factor of 30-50!
• Trend 1: Increasingly large caches
– On-chip: from 128 bytes (1984) to 100,000+ bytes
– Multilevel caches: add another level of caching
• First multilevel cache: 1986
• Secondary cache sizes today: 128,000 B to 16,000,000 B
• Third level caches: 1998
• Trend 2: Advances in caching techniques:
– Reduce or hide cache miss latencies
• early restart after cache miss (1992)
• nonblocking caches: continue during a cache miss (1994)
– Cache-aware combinations: computers, compilers, code writers
• prefetching: instruction to bring data into cache early
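One common way to quantify how caches hide DRAM latency is the standard average memory access time (AMAT) formula: hit time plus miss rate times miss penalty. The sketch below applies it to one and two cache levels. All latencies and miss rates are hypothetical values chosen for illustration, not measurements from the talk.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical parameters, in clock cycles (assumptions, not data from the slides).
DRAM_LATENCY = 100.0
L1_HIT, L1_MISS_RATE = 1.0, 0.05
L2_HIT, L2_MISS_RATE = 10.0, 0.20   # local miss rate: fraction of L2 accesses that miss

# One cache level: every L1 miss goes straight to DRAM.
one_level = amat(L1_HIT, L1_MISS_RATE, DRAM_LATENCY)

# Two cache levels: an L1 miss pays the average cost of an L2 access instead.
l2_access = amat(L2_HIT, L2_MISS_RATE, DRAM_LATENCY)
two_level = amat(L1_HIT, L1_MISS_RATE, l2_access)

print(f"no cache:        {DRAM_LATENCY:.1f} cycles per access")
print(f"L1 only:         {one_level:.1f} cycles per access")
print(f"L1 + L2:         {two_level:.1f} cycles per access")
```

Under these assumed numbers, adding a second level cuts the average access time further because most L1 misses no longer pay the full DRAM latency.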
Exploiting Instruction Level Parallelism (ILP)
• ILP is the implicit parallelism among instructions (programmer
not aware)
• Exploited by
– Overlapping execution in a pipeline
– Issuing multiple instructions per clock
• superscalar: uses dynamic issue decision (HW driven)
• VLIW: uses static issue decision (SW driven)
• 1985: simple microprocessor pipeline (1 instr/clock)
• 1990: first static multiple issue microprocessors
• 1995: sophisticated dynamic schemes
– determine parallelism dynamically
– execute instructions out-of-order
– speculative execution depending on branch prediction
• “Off-the-shelf” ILP techniques yielded 15 year path of 2X
performance every 1.5 years => 1000X faster!
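To make the idea of multiple issue concrete, here is a toy model of in-order issue with a configurable width. It is a simplified sketch, not a model of any real processor: it assumes unit latency, register-only dependences, and no out-of-order execution. The instruction encoding and register names are hypothetical.

```python
from typing import List, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def cycles_in_order(program: List[Instr], width: int) -> int:
    """Cycles needed to issue `program` in order, at most `width` per cycle.

    Assumes unit latency: a result can be used the cycle after it issues.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            # Stop issuing if this instruction reads or rewrites a register
            # produced by an instruction already issued in this cycle.
            if dest in written_this_cycle or any(s in written_this_cycle for s in srcs):
                break
            written_this_cycle.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


program = [
    ("r1", ("a", "b")),    # independent
    ("r2", ("c", "d")),    # independent of r1
    ("r3", ("r1", "r2")),  # needs both earlier results
    ("r4", ("e", "f")),    # independent
]
chain = [("r1", ("a",)), ("r2", ("r1",)), ("r3", ("r2",))]  # fully serial

for width in (1, 2, 4):
    print(f"width {width}: {cycles_in_order(program, width)} cycles, "
          f"serial chain: {cycles_in_order(chain, width)} cycles")
```

In this toy model, going from width 1 to width 2 halves the cycle count for `program`, but width 4 gives no further gain, and the serial chain never benefits. The dynamic, out-of-order schemes listed above exist to find more independent work than a simple in-order issue rule can.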
Where have all the transistors gone?
• Superscalar
(multiple instructions per clock
cycle)
[Die photo: Intel Pentium III (10M transistors), with labeled regions: Execution, Icache, D cache, branch, TLB, 2 Bus Intf, Out-Of-Order, SS]
• Branch prediction
(predict outcome of decisions)
• 3 levels of cache
• Out-of-order execution
(executing instructions in
different order than programmer
wrote them)
Diminishing Return on Investment
• Until recently:
– Microprocessor effective work per clock cycle (instructions per
clock) goes up by ~ square root of number of transistors
– Microprocessor clock rate goes up as lithographic feature size
shrinks
• With >4 instructions per clock, microprocessor
performance increases even less efficiently
• Chip-wide wires no longer scale with technology
– They get relatively slower than gates: ∝ (1/scale)^3
– More complicated processors have longer wires
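The square-root rule of thumb on this slide can be tabulated directly. The sketch below takes the slide's approximation at face value; the function name is illustrative, and the rule itself is an empirical trend rather than an exact law.

```python
import math


def relative_ipc(transistor_ratio: float) -> float:
    """Instructions-per-clock gain predicted by the ~square-root rule
    when the transistor budget grows by `transistor_ratio`."""
    return math.sqrt(transistor_ratio)


for ratio in (2, 4, 16, 100):
    print(f"{ratio:>4}x transistors -> ~{relative_ipc(ratio):.1f}x instructions per clock")
```

Under this rule, each doubling of the transistor budget buys only about 1.4X more work per clock, which is the diminishing return the slide title refers to.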
Moore’s Law vs. Common Sense?
[Chart: die size (mm², axis ticks 0 to 1,000) vs. year (1980 to 2000) for the RISC II die and the Intel MPU die, which differ by ~1000X]
• Scaled 32-bit, 5-stage RISC II is 1/1000th of a current MPU in die
size or transistors (1/4 mm²)
New view: Cluster on a Chip (CoC)
• Use several simple processors on a single chip:
– Performance goes up linearly in number of transistors
– Simpler processors can run at faster clocks
– Lower design cost/time, less time-to-market risk (reuse)
• Inspiration: Google
– Search engine for the world: 100M searches/day
– Economical, scalable building block:
PC cluster, today 8,000 PCs and 16,000 disks
– Advantages in fault tolerance, scalability, cost/performance
• 32-bit MPU as the new “Transistor”
– “Cluster on a chip” with 1000s of processors enables amazing MIPS/$,
MIPS/watt for cluster applications
– MPUs combined with dense memory + system on a chip CAD
• 30 years ago the Intel 4004 used 2,300 transistors:
when will 2,300 32-bit RISC processors fit on a single chip?
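The claim that performance grows linearly with processor count holds when the workload is fully parallel, as cluster applications such as search largely are. As a hedged illustration, the sketch below uses Amdahl's law (an assumption brought in here, not something the slides state) to show how the speedup of 2,300 simple processors depends on the parallel fraction of the work. The parallel fractions are hypothetical.

```python
def cluster_speedup(processors: int, parallel_fraction: float) -> float:
    """Amdahl's-law speedup over a single simple processor.

    `parallel_fraction` is the share of work that can be spread across processors.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


PROCESSORS = 2300  # the count posed in the slide's closing question

for fraction in (1.0, 0.999, 0.99):
    speedup = cluster_speedup(PROCESSORS, fraction)
    print(f"parallel fraction {fraction}: ~{speedup:.0f}x over one processor")
```

With a fully parallel workload the speedup equals the processor count, but even a 1% serial share keeps it below 100X in this model, which is why the approach is pitched at cluster applications.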
VIRAM-1 Integrated Processor/Memory
• Microprocessor
– 256-bit media processor (vector)
– 14 MBytes DRAM
– 2.5-3.2 billion operations per second
– 2W at 170-200 MHz
– Industrial strength compiler
• 280 mm² die area
– 18.72 x 15 mm
– ~200 mm² for memory/logic
– DRAM: ~140 mm²
– Vector lanes: ~50 mm²
• Technology: IBM SA-27E
– 0.18 µm CMOS
– 6 metal layers (copper)
• Transistor count: >100M
• Implemented by 6 Berkeley graduate
students
Thanks to DARPA: funding
IBM: donated masks, fab
Avanti: donated CAD tools
MIPS: donated MIPS core
Cray: compilers; MIT: FPU
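The figures on this slide allow a rough performance-per-watt estimate, a metric the talk returns to in its concluding remarks. The sketch below only divides the numbers quoted above; the variable and function names are illustrative.

```python
def gops_per_watt(giga_ops_per_second: float, watts: float) -> float:
    """Billions of operations per second delivered per watt."""
    return giga_ops_per_second / watts


POWER_WATTS = 2.0                  # "2W at 170-200 MHz"
LOW_GOPS, HIGH_GOPS = 2.5, 3.2     # "2.5-3.2 billion operations per second"

low_efficiency = gops_per_watt(LOW_GOPS, POWER_WATTS)
high_efficiency = gops_per_watt(HIGH_GOPS, POWER_WATTS)
die_area_mm2 = 18.72 * 15.0        # die dimensions from the slide

print(f"efficiency: {low_efficiency:.2f} to {high_efficiency:.2f} billion ops/s per watt")
print(f"die area:   {die_area_mm2:.1f} mm^2")
```

The computed die area agrees with the roughly 280 mm² stated on the slide.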
Concluding Remarks
• A great 30 year history and a challenge for the next 30!
– Not a wall in performance growth, but a slowing down
• Diminishing returns on silicon investment
• But need to use right metrics.
Not just raw (peak) performance, but:
– Performance per transistor
– Performance per Watt
• Possible New Direction?
– Consider true multiprocessing?
– Key question: Could multiprocessors on a single piece of silicon be
much easier to use efficiently than today’s multiprocessors?
(Thanks to John Hennessy@Stanford,
Norm Jouppi@Compaq for most of these slides)

More Related Content

Similar to Nae

02_Computer-Evolution(1).ppt
02_Computer-Evolution(1).ppt02_Computer-Evolution(1).ppt
02_Computer-Evolution(1).ppt
ShaistaRiaz4
 
Computer Evolution.ppt
Computer Evolution.pptComputer Evolution.ppt
Computer Evolution.ppt
VivekTrial
 
arquitectura_de_las_pc.pdf
arquitectura_de_las_pc.pdfarquitectura_de_las_pc.pdf
arquitectura_de_las_pc.pdf
brydyl
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
Facultad de Informática UCM
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
dilip kumar
 
02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]
bogi007
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer Architecture
Haris456
 
VLSI Design-Lecture2 introduction to ic technology
VLSI Design-Lecture2 introduction to ic technologyVLSI Design-Lecture2 introduction to ic technology
VLSI Design-Lecture2 introduction to ic technology
sritulasiadigopula
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
Sher Shah Merkhel
 
Asynchronous processors Poster
Asynchronous processors PosterAsynchronous processors Poster
Asynchronous processors Poster
Akshit Arora
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2
Shah Zaib
 
aca mod1.pptx
aca mod1.pptxaca mod1.pptx
aca mod1.pptx
Shiva Kumar V
 
Presentation spd (1).pptx
Presentation spd (1).pptxPresentation spd (1).pptx
Presentation spd (1).pptx
allyn alax
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Hsien-Hsin Sean Lee, Ph.D.
 
The Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half OverThe Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half Over
inside-BigData.com
 
VLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.pptVLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.ppt
indrajeetPatel22
 
Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1
SSGMCE SHEGAON
 
ECESLU Microprocessors lecture
ECESLU Microprocessors lecture ECESLU Microprocessors lecture
ECESLU Microprocessors lecture
Jeffrey Des Binwag
 
KSpeculative aspects of high-speed processor design
KSpeculative aspects of high-speed processor designKSpeculative aspects of high-speed processor design
KSpeculative aspects of high-speed processor design
ssuser7dcef0
 
My ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
My ISCA 2013 - 40th International Symposium on Computer Architecture KeynoteMy ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
My ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
Dileep Bhandarkar
 

Similar to Nae (20)

02_Computer-Evolution(1).ppt
02_Computer-Evolution(1).ppt02_Computer-Evolution(1).ppt
02_Computer-Evolution(1).ppt
 
Computer Evolution.ppt
Computer Evolution.pptComputer Evolution.ppt
Computer Evolution.ppt
 
arquitectura_de_las_pc.pdf
arquitectura_de_las_pc.pdfarquitectura_de_las_pc.pdf
arquitectura_de_las_pc.pdf
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
 
02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]02 computer evolution and performance.ppt [compatibility mode]
02 computer evolution and performance.ppt [compatibility mode]
 
Computer Architecture
Computer ArchitectureComputer Architecture
Computer Architecture
 
VLSI Design-Lecture2 introduction to ic technology
VLSI Design-Lecture2 introduction to ic technologyVLSI Design-Lecture2 introduction to ic technology
VLSI Design-Lecture2 introduction to ic technology
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
 
Asynchronous processors Poster
Asynchronous processors PosterAsynchronous processors Poster
Asynchronous processors Poster
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2
 
aca mod1.pptx
aca mod1.pptxaca mod1.pptx
aca mod1.pptx
 
Presentation spd (1).pptx
Presentation spd (1).pptxPresentation spd (1).pptx
Presentation spd (1).pptx
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
 
The Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half OverThe Parallel Computing Revolution Is Only Half Over
The Parallel Computing Revolution Is Only Half Over
 
VLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.pptVLSI unit 1 Technology - S.ppt
VLSI unit 1 Technology - S.ppt
 
Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1Microcontrollers and intro to real time programming 1
Microcontrollers and intro to real time programming 1
 
ECESLU Microprocessors lecture
ECESLU Microprocessors lecture ECESLU Microprocessors lecture
ECESLU Microprocessors lecture
 
KSpeculative aspects of high-speed processor design
KSpeculative aspects of high-speed processor designKSpeculative aspects of high-speed processor design
KSpeculative aspects of high-speed processor design
 
My ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
My ISCA 2013 - 40th International Symposium on Computer Architecture KeynoteMy ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
My ISCA 2013 - 40th International Symposium on Computer Architecture Keynote
 

More from HarshitParkar6677

Wi fi hacking
Wi fi hackingWi fi hacking
Wi fi hacking
HarshitParkar6677
 
D dos attack
D dos attackD dos attack
D dos attack
HarshitParkar6677
 
Notes chapter 6
Notes chapter  6Notes chapter  6
Notes chapter 6
HarshitParkar6677
 
Interface notes
Interface notesInterface notes
Interface notes
HarshitParkar6677
 
Chapter6 2
Chapter6 2Chapter6 2
Chapter6 2
HarshitParkar6677
 
Chapter6
Chapter6Chapter6
8086 cpu 1
8086 cpu 18086 cpu 1
8086 cpu 1
HarshitParkar6677
 
Chapter 6 notes
Chapter 6 notesChapter 6 notes
Chapter 6 notes
HarshitParkar6677
 
Chapter 5 notes
Chapter 5 notesChapter 5 notes
Chapter 5 notes
HarshitParkar6677
 
Chap6 procedures &amp; macros
Chap6 procedures &amp; macrosChap6 procedures &amp; macros
Chap6 procedures &amp; macros
HarshitParkar6677
 
Chapter 5 notes new
Chapter 5 notes newChapter 5 notes new
Chapter 5 notes new
HarshitParkar6677
 
Notes arithmetic instructions
Notes arithmetic instructionsNotes arithmetic instructions
Notes arithmetic instructions
HarshitParkar6677
 
Notes all instructions
Notes all instructionsNotes all instructions
Notes all instructions
HarshitParkar6677
 
Notes aaa aa
Notes aaa aaNotes aaa aa
Notes aaa aa
HarshitParkar6677
 
Notes 8086 instruction format
Notes 8086 instruction formatNotes 8086 instruction format
Notes 8086 instruction format
HarshitParkar6677
 
Misc
MiscMisc
Copy of 8086inst logical
Copy of 8086inst logicalCopy of 8086inst logical
Copy of 8086inst logical
HarshitParkar6677
 
Copy of 8086inst logical
Copy of 8086inst logicalCopy of 8086inst logical
Copy of 8086inst logical
HarshitParkar6677
 
Chapter3 program flow control instructions
Chapter3 program flow control instructionsChapter3 program flow control instructions
Chapter3 program flow control instructions
HarshitParkar6677
 
Chapter3 8086inst stringsl
Chapter3 8086inst stringslChapter3 8086inst stringsl
Chapter3 8086inst stringsl
HarshitParkar6677
 

More from HarshitParkar6677 (20)

Wi fi hacking
Wi fi hackingWi fi hacking
Wi fi hacking
 
D dos attack
D dos attackD dos attack
D dos attack
 
Notes chapter 6
Notes chapter  6Notes chapter  6
Notes chapter 6
 
Interface notes
Interface notesInterface notes
Interface notes
 
Chapter6 2
Chapter6 2Chapter6 2
Chapter6 2
 
Chapter6
Chapter6Chapter6
Chapter6
 
8086 cpu 1
8086 cpu 18086 cpu 1
8086 cpu 1
 
Chapter 6 notes
Chapter 6 notesChapter 6 notes
Chapter 6 notes
 
Chapter 5 notes
Chapter 5 notesChapter 5 notes
Chapter 5 notes
 
Chap6 procedures &amp; macros
Chap6 procedures &amp; macrosChap6 procedures &amp; macros
Chap6 procedures &amp; macros
 
Chapter 5 notes new
Chapter 5 notes newChapter 5 notes new
Chapter 5 notes new
 
Notes arithmetic instructions
Notes arithmetic instructionsNotes arithmetic instructions
Notes arithmetic instructions
 
Notes all instructions
Notes all instructionsNotes all instructions
Notes all instructions
 
Notes aaa aa
Notes aaa aaNotes aaa aa
Notes aaa aa
 
Notes 8086 instruction format
Notes 8086 instruction formatNotes 8086 instruction format
Notes 8086 instruction format
 
Misc
MiscMisc
Misc
 
Copy of 8086inst logical
Copy of 8086inst logicalCopy of 8086inst logical
Copy of 8086inst logical
 
Copy of 8086inst logical
Copy of 8086inst logicalCopy of 8086inst logical
Copy of 8086inst logical
 
Chapter3 program flow control instructions
Chapter3 program flow control instructionsChapter3 program flow control instructions
Chapter3 program flow control instructions
 
Chapter3 8086inst stringsl
Chapter3 8086inst stringslChapter3 8086inst stringsl
Chapter3 8086inst stringsl
 

Recently uploaded

openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
snaprevwdev
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
Seetal Daas
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
b0754201
 
Impartiality as per ISO /IEC 17025:2017 Standard
Impartiality as per ISO /IEC 17025:2017 StandardImpartiality as per ISO /IEC 17025:2017 Standard
Impartiality as per ISO /IEC 17025:2017 Standard
MuhammadJazib15
 
Unit -II Spectroscopy - EC I B.Tech.pdf
Unit -II Spectroscopy - EC  I B.Tech.pdfUnit -II Spectroscopy - EC  I B.Tech.pdf
Unit -II Spectroscopy - EC I B.Tech.pdf
TeluguBadi
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
Shiny Christobel
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
pvpriya2
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
felixwold
 
Power Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptxPower Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptx
Poornima D
 
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
DharmaBanothu
 

Recently uploaded (20)

openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
 
Impartiality as per ISO /IEC 17025:2017 Standard
Impartiality as per ISO /IEC 17025:2017 StandardImpartiality as per ISO /IEC 17025:2017 Standard
Impartiality as per ISO /IEC 17025:2017 Standard
 
Unit -II Spectroscopy - EC I B.Tech.pdf
Unit -II Spectroscopy - EC  I B.Tech.pdfUnit -II Spectroscopy - EC  I B.Tech.pdf
Unit -II Spectroscopy - EC I B.Tech.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Zener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and ApplicationsZener Diode and its V-I Characteristics and Applications
Zener Diode and its V-I Characteristics and Applications
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
 
Power Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptxPower Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptx
 
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...
 

Nae

  • 1. Microprocessor Futures 1 University of California Future of Microprocessors David Patterson University of California, Berkeley June 2001
  • 2. Microprocessor Futures 2 University of California Outline • A 30 year history of microprocessors – Four generation of innovation • High performance microprocessor drivers: – Memory hierarchies – instruction level parallelism (ILP) • Where are we and where are we going? • Focus on desktop/server microprocessors vs. embedded/DSP microprocessor
  • 3. Microprocessor Futures 3 University of California Microprocessor Generations • First generation: 1971-78 – Behind the power curve (16-bit, <50k transistors) • Second Generation: 1979-85 – Becoming “real” computers (32-bit , >50k transistors) • Third Generation: 1985-89 – Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors) • Fourth Generation: 1990- – Architectural and performance leadership (64-bit, > 1M transistors, Intel/AMD translate into RISC internally)
  • 4. Microprocessor Futures 4 University of California In the beginning (8-bit) Intel 4004 • First general-purpose, single- chip microprocessor • Shipped in 1971 • 8-bit architecture, 4-bit implementation • 2,300 transistors • Performance < 0.1 MIPS (Million Instructions Per Sec) • 8008: 8-bit implementation in 1972 – 3,500 transistors – First microprocessor-based computer (Micral) • Targeted at laboratory instrumentation • Mostly sold in Europe All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
  • 5. Microprocessor Futures 5 University of California 1st Generation (16-bit) Intel 8086 • Introduced in 1978 – Performance < 0.5 MIPS • New 16-bit architecture – “Assembly language” compatible with 8080 – 29,000 transistors – Includes memory protection, support for Floating Point coprocessor • In 1981, IBM introduces PC – Based on 8088--8-bit bus version of 8086
  • 6. Microprocessor Futures 6 University of California 2nd Generation (32-bit) Motorola 68000 • Major architectural step in microprocessors: – First 32-bit architecture • initial 16-bit implementation – First flat 32-bit address • Support for paging – General-purpose register architecture • Loosely based on PDP-11 minicomputer • First implementation in 1979 – 68,000 transistors – < 1 MIPS (Million Instructions Per Second) • Used in – Apple Mac – Sun , Silicon Graphics, & Apollo workstations
  • 7. Microprocessor Futures 7 University of California 3rd Generation: MIPS R2000 • Several firsts: – First (commercial) RISC microprocessor – First microprocessor to provide integrated support for instruction & data cache – First pipelined microprocessor (sustains 1 instruction/clock) • Implemented in 1985 – 125,000 transistors – 5-8 MIPS (Million Instructions per Second)
  • 8. Microprocessor Futures 8 University of California 4th Generation (64 bit) MIPS R4000 • First 64-bit architecture • Integrated caches – On-chip – Support for off-chip, secondary cache • Integrated floating point • Implemented in 1991: – Deep pipeline – 1.4M transistors – Initially 100MHz – > 50 MIPS • Intel translates 80x86/ Pentium X instructions into RISC internally
  • 9. Microprocessor Futures 9 University of California Key Architectural Trends • Increase performance at 1.6x per year (2X/1.5yr) – True from 1985-present • Combination of technology and architectural enhancements – Technology provides faster transistors ( 1/lithographic feature size) and more of them – Faster transistors leads to high clock rates – More transistors (“Moore’s Law”): • Architectural ideas turn transistors into performance – Responsible for about half the yearly performance growth • Two key architectural directions – Sophisticated memory hierarchies – Exploiting instruction level parallelism
  • 10. Microprocessor Futures 10 University of California Memory Hierarchies • Caches: hide latency of DRAM and increase BW – CPU-DRAM access gap has grown by a factor of 30-50! • Trend 1: Increasingly large caches – On-chip: from 128 bytes (1984) to 100,000+ bytes – Multilevel caches: add another level of caching • First multilevel cache:1986 • Secondary cache sizes today: 128,000 B to 16,000,000 B • Third level caches: 1998 • Trend 2: Advances in caching techniques: – Reduce or hide cache miss latencies • early restart after cache miss (1992) • nonblocking caches: continue during a cache miss (1994) – Cache aware combos: computers, compilers, code writers • prefetching: instruction to bring data into cache early
  • 11. Exploiting Instruction Level Parallelism (ILP) • ILP is the implicit parallelism among instructions (programmer not aware) • Exploited by – Overlapping execution in a pipeline – Issuing multiple instructions per clock • superscalar: uses dynamic issue decision (HW driven) • VLIW: uses static issue decision (SW driven) • 1985: simple microprocessor pipeline (1 instr/clock) • 1990: first static multiple issue microprocessors • 1995: sophisticated dynamic schemes – determine parallelism dynamically – execute instructions out-of-order – speculative execution depending on branch prediction • “Off-the-shelf” ILP techniques yielded 15 year path of 2X performance every 1.5 years => 1000X faster!
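The "1000X faster" figure on the slide above follows from compounding: 15 years at one doubling every 1.5 years is 10 doublings. A short sketch of the arithmetic (variable names are illustrative):

```python
# Compound speedup from doubling every 1.5 years for 15 years.
years = 15
doubling_period_years = 1.5

doublings = years / doubling_period_years  # number of 2X steps
speedup = 2 ** doublings                   # cumulative speedup

print(f"{doublings:.0f} doublings -> {speedup:.0f}X")
```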
  • 12. Where have all the transistors gone? • Superscalar (multiple instructions per clock cycle) • Branch prediction (predict outcome of decisions) • 3 levels of cache • Out-of-order execution (executing instructions in different order than programmer wrote them) [Die photo: Intel Pentium III (10M transistors); region labels: Execution, Icache, D cache, branch, TLB, 2, Bus Intf, Out-Of-Order, SS]
  • 13. Diminishing Return On Investment • Until recently: – Microprocessor effective work per clock cycle (instructions per clock) goes up by ~ square root of number of transistors – Microprocessor clock rate goes up as lithographic feature size shrinks • With >4 instructions per clock, microprocessor performance increases even less efficiently • Chip-wide wires no longer scale with technology – They get relatively slower than gates ∝ (1/scale)^3 – More complicated processors have longer wires
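A rough sketch of the two scaling rules on the slide above. The square-root rule is taken directly from the slide. The wire rule assumes the slide's "(1/scale)3" means the wire-to-gate delay ratio grows as the cube of 1/scale, where `scale` is the feature-size shrink factor. That reading, and both function names, are assumptions.

```python
import math

def relative_ipc(transistor_ratio):
    """Instructions per clock grow roughly as sqrt(transistor count)."""
    return math.sqrt(transistor_ratio)

def relative_wire_penalty(scale):
    """Assumed reading of the slide: chip-wide wires get slower relative
    to gates in proportion to (1/scale)**3, for a shrink factor `scale`."""
    return (1 / scale) ** 3

# Quadrupling the transistor budget only doubles instructions per clock.
print(relative_ipc(4))

# Halving the feature size makes chip-wide wires 8x slower relative to gates.
print(relative_wire_penalty(0.5))
```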
  • 14. Moore’s Law vs. Common Sense? [Figure: die size (mm², log axis up to 1,000) vs. year (1980-2000) for the RISC II die and the Intel MPU die; the gap is marked ~1000X] • Scaled 32-bit, 5-stage RISC II is 1/1000th of a current MPU in die size or transistors (1/4 mm²)
  • 15. New view: ClusterOnaChip (CoC) • Use several simple processors on a single chip: – Performance goes up linearly in number of transistors – Simpler processors can run at faster clocks – Less design cost/time, less time-to-market risk (reuse) • Inspiration: Google – Search engine for world: 100M/day – Economical, scalable building block: PC cluster; today 8000 PCs, 16000 disks – Advantages in fault tolerance, scalability, cost/performance • 32-bit MPU as the new “Transistor” – “Cluster on a chip” with 1000s of processors enables amazing MIPS/$, MIPS/watt for cluster applications – MPUs combined with dense memory + system on a chip CAD • 30 years ago Intel 4004 used 2300 transistors: when 2300 32-bit RISC processors on a single chip?
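The argument for a cluster on a chip can be sketched by comparing the two scaling rules in the talk: one complex processor whose throughput grows as the square root of its transistor budget, versus many simple processors whose aggregate throughput grows linearly. This is an idealized model. It assumes a perfectly parallel workload and ignores clock rate, memory, and communication costs, and the function names are illustrative.

```python
import math

def single_complex_core(budget):
    """Throughput of one big core, relative to one simple core, when the
    budget is measured in simple-core equivalents (sqrt rule)."""
    return math.sqrt(budget)

def cluster_on_chip(budget):
    """Ideal aggregate throughput of `budget` simple cores (linear rule),
    assuming a perfectly parallel workload."""
    return float(budget)

budget = 2300  # the slide's "2300 32-bit RISC processors" thought experiment
big = single_complex_core(budget)
cluster = cluster_on_chip(budget)

print(f"one complex core: {big:.1f}x, cluster: {cluster:.0f}x, "
      f"advantage: {cluster / big:.1f}x")
```

Under this model the cluster's advantage is itself sqrt(budget), roughly 48x for 2300 processors, but only for workloads that parallelize as well as the cluster applications the slide has in mind.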
  • 16. VIRAM-1 Integrated Processor/Memory • Microprocessor – 256-bit media processor (vector) – 14 MBytes DRAM – 2.5-3.2 billion operations per second – 2W at 170-200 MHz – Industrial strength compiler • 280 mm² die area – 18.72 x 15 mm – ~200 mm² for memory/logic – DRAM: ~140 mm² – Vector lanes: ~50 mm² • Technology: IBM SA-27E – 0.18 µm CMOS – 6 metal layers (copper) • Transistor count: >100M • Implemented by 6 Berkeley graduate students • Thanks to DARPA: funding; IBM: donated masks, fab; Avanti: donated CAD tools; MIPS: donated MIPS core; Cray: compilers; MIT: FPU
  • 17. Concluding Remarks • A great 30 year history and a challenge for the next 30! – Not a wall in performance growth, but a slowing down • Diminishing returns on silicon investment • But need to use right metrics. Not just raw (peak) performance, but: – Performance per transistor – Performance per Watt • Possible New Direction? – Consider true multiprocessing? – Key question: Could multiprocessors on a single piece of silicon be much easier to use efficiently than today’s multiprocessors? (Thanks to John Hennessy@Stanford, Norm Jouppi@Compaq for most of these slides)

Editor's Notes

  1. This figure presents the floorplan of Vector IRAM. It occupies nearly 300 square mm and contains 150 million transistors in a 0.18um CMOS process by IBM. Blue blocks on the floorplan indicate DRAM macros or compiled SRAM blocks. Golden blocks are those designed at Berkeley. They include synthesized logic for control and the FP datapaths, and full custom logic for register files, integer datapaths and DRAM. Vector IRAM operates at 200MHz. The power supply is 1.2V for logic and 1.8V for DRAM. The peak performance for the vector unit is 1.6 giga ops for 64-bit integer operations. Performance doubles or quadruples for 32-bit and 16-bit operations respectively. Peak floating point performance is 1.6 Gflops. There are several interesting things to notice on the floorplan. The first is the overall design modularity and scalability: it mostly consists of replicated DRAM macros and vector lanes connected through a crossbar. Another very interesting feature is the percentage of this design directly visible to software. Compilers can control any part of the design that is registers, datapaths or main memory. They do that by scheduling the proper arithmetic or load/store instructions. The majority of our design is used for main memory, vector registers and datapaths. On the other hand, if you take a look at a processor like the Pentium 3, you will see that less than 20% of its area is used for datapaths and registers. The rest is caches and dynamic issue logic. While these usually work to the benefit of applications, they cannot be controlled by the compiler and they cannot be turned off when not necessary.
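The peak figures in the note above can be cross-checked with simple arithmetic: peak rate equals operations per cycle times clock rate. The sketch below derives the implied 64-bit operations per cycle from the note's 1.6 giga ops at 200MHz, then applies the stated doubling and quadrupling for narrower operands. The function and variable names are illustrative, and the derived per-cycle count is an inference from the note, not a stated specification.

```python
CLOCK_HZ = 200e6        # 200 MHz, from the note
PEAK_64BIT_OPS = 1.6e9  # 1.6 giga ops for 64-bit integer operations

# Implied number of 64-bit operations completed per clock cycle.
ops_per_cycle_64 = PEAK_64BIT_OPS / CLOCK_HZ

def peak_ops(element_bits):
    """Peak ops/second, assuming throughput scales with 64 / element width
    (the note's 'doubles or quadruples' for 32-bit and 16-bit operations)."""
    return ops_per_cycle_64 * (64 // element_bits) * CLOCK_HZ

for bits in (64, 32, 16):
    print(f"{bits}-bit: {peak_ops(bits) / 1e9:.1f} giga ops")
```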