Carnegie Mellon
Design of Digital Circuits 2014
Srdjan Capkun
Frank K. Gürkaynak
Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier
http://www.syssec.ethz.ch/education/Digitaltechnik_14
Advanced Microprocessors
What Will We Learn?
 Tricks invented over the years
 Deep Pipelining
 Branch Prediction
 Superscalar Processors
 Out of Order Processors
 Register Renaming
 SIMD
 Multithreading
 Multiprocessors
 A short history of interesting processors
Deep Pipelining
 Idea: Pipelining is good, so let us pipeline the processor as
much as possible
 MHz wars (until mid 2000s): 10–20 stages became typical
 Number of stages limited by:
 Pipeline hazards (penalty of branch misprediction increases)
 Sequencing overhead (setup and propagation delays of flip-flops)
 Power (faster clock rate, more activity)
 Cost (larger area)
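The limit set by sequencing overhead can be sketched numerically. The figures below (1000 ps of total logic delay, 50 ps of flip-flop overhead per stage) are illustrative assumptions, not values from the slides, and the model assumes the logic divides evenly across stages.

```python
def cycle_time_ps(n_stages, logic_ps=1000.0, overhead_ps=50.0):
    """Clock period of an N-stage pipeline under an idealized model.

    Assumes the combinational logic splits evenly across the stages and
    every stage pays the same flip-flop overhead (setup + propagation).
    Both default values are hypothetical.
    """
    return logic_ps / n_stages + overhead_ps


def speedup(n_stages, **kw):
    """Clock-rate gain over a single-stage design (hazards and CPI ignored)."""
    return cycle_time_ps(1, **kw) / cycle_time_ps(n_stages, **kw)


for n in (1, 5, 10, 20, 40):
    print(f"{n:2d} stages: Tc = {cycle_time_ps(n):6.1f} ps, "
          f"speedup = {speedup(n):.2f}x")
```

In this model, going from 20 to 40 stages shortens the cycle only from 100 ps to 75 ps, because the fixed overhead starts to dominate, while the branch misprediction penalty keeps growing with depth.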
Branch Prediction
 Ideal pipelined processor: CPI = 1
 Branch misprediction increases CPI
 Static branch prediction:
 Check direction of branch (forward or backward)
 If backward, predict taken
 Otherwise, predict not taken
 Dynamic branch prediction:
 Keep history of last (several hundred) branches in a branch target
buffer which holds:
 Branch destination
 Whether branch was taken
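The static rule above needs only the branch address and its target. A minimal sketch, with hypothetical addresses chosen for illustration:

```python
def predict_static(branch_pc, target_pc):
    """Backward taken, forward not taken. True means 'predict taken'."""
    return target_pc < branch_pc


# Hypothetical addresses: a loop-closing branch at 0x40 that jumps back to
# 0x20, and a forward skip from 0x50 to 0x70.
print(predict_static(0x40, 0x20))  # backward branch
print(predict_static(0x50, 0x70))  # forward branch
```

Backward branches usually close loops and are taken on most iterations, which is why the rule favors them.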
Branch Prediction Example
add $s1, $0, $0 # sum = 0
add $s0, $0, $0 # i = 0
addi $t0, $0, 10 # $t0 = 10
for:
beq $s0, $t0, done # if i == 10, branch
add $s1, $s1, $s0 # sum = sum + i
addi $s0, $s0, 1 # increment i
j for
done:
1-Bit Branch Predictor
 Remembers whether branch was taken the last time and
does the same thing
 Mispredicts first and last branch of loop
add $s1, $0, $0 # sum = 0
add $s0, $0, $0 # i = 0
addi $t0, $0, 10 # $t0 = 10
for:
beq $s0, $t0, done # if i == 10, branch
add $s1, $s1, $s0 # sum = sum + i
addi $s0, $s0, 1 # increment i
j for
done:
2-Bit Branch Predictor
 Only mispredicts last branch of loop
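[Figure: state diagram of the 2-bit predictor. Four states in a chain: strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), strongly not taken (predict not taken). A taken branch moves one state toward strongly taken; a not-taken branch moves one state toward strongly not taken.]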
add $s1, $0, $0 # sum = 0
add $s0, $0, $0 # i = 0
addi $t0, $0, 10 # $t0 = 10
for:
beq $s0, $t0, done # if i == 10, branch
add $s1, $s1, $s0 # sum = sum + i
addi $s0, $s0, 1 # increment i
j for
done:
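The behavior of the 1-bit and 2-bit predictors on this loop can be checked with a small simulation. This is a sketch under stated assumptions: the loop is executed three times in a row, both predictors start in a not-taken state, and only the `beq` at `for:` is modeled. The function names are illustrative.

```python
def loop_outcomes(iterations=10, runs=3):
    """Outcomes of the beq at 'for:': not taken while i < 10, taken at i == 10."""
    return ([False] * iterations + [True]) * runs


def mispredicts_1bit(outcomes, last_taken=False):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken
    return misses


def mispredicts_2bit(outcomes, state=0):
    """2-bit saturating counter.

    States: 0 = strongly not taken, 1 = weakly not taken,
            2 = weakly taken,       3 = strongly taken.
    """
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


outcomes = loop_outcomes()
print("1-bit mispredictions:", mispredicts_1bit(outcomes))
print("2-bit mispredictions:", mispredicts_2bit(outcomes))
```

With these assumptions the 1-bit predictor misses 5 times (the last branch of the first run, then the first and last branch of each later run), while the 2-bit predictor misses 3 times (only the last branch of each run).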
Superscalar
 Multiple copies of datapath: can issue multiple
instructions per cycle
 Dependencies make it tricky to issue multiple instructions
at once
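[Figure: two-way superscalar datapath. PC, instruction memory (A, RD), register file with four read ports (A1/RD1, A2/RD2, A4/RD4, A5/RD5) and two write ports (A3/WD3, A6/WD6), two ALUs, and a data memory with two ports (A1, A2, WD1, WD2, RD1, RD2).]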
Here: Ideal IPC = 2
Superscalar Example
lw $t0, 40($s0)
add $t1, $s1, $s2
sub $t2, $s1, $s3
and $t3, $s3, $s4
or $t4, $s1, $s5
sw $s5, 80($s0)
[Figure: pipeline diagram, time in cycles 1 to 8. Instructions are fetched and issued in pairs: lw and add, then sub and and, then or and sw.]
Ideal IPC = 2
Actual IPC = 2 (6 instructions issued in 3 cycles)
Superscalar Example with Dependencies
lw $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or $t3, $s5, $s6
sw $s7, 80($t3)
[Figure: pipeline diagram, time in cycles 1 to 9, with a stall marked where add waits for $t0 from lw.]
Ideal IPC = 2
Actual IPC = 1.2 (6 instructions issued in 5 cycles)
Out of Order Processor
 Looks ahead across multiple instructions to issue as many as
possible at once
 Issues instructions out of order as long as no dependencies are violated
 Dependencies:
 RAW (read after write): one instruction writes, and a later instruction
reads a register
 WAR (write after read): one instruction reads, and a later instruction
writes a register (also called an antidependence)
 WAW (write after write): one instruction writes, and a later instruction
writes a register (also called an output dependence)
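The three dependence types can be found mechanically from each instruction's destination and source registers. The sketch below applies this to the six-instruction program from the superscalar example with dependencies. The `Instr` tuple and the helper names are illustrative, and `sw` is modeled as reading two registers and writing none.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "text dst srcs")

PROGRAM = [
    Instr("lw  $t0, 40($s0)",  "$t0", ("$s0",)),
    Instr("add $t1, $t0, $s1", "$t1", ("$t0", "$s1")),
    Instr("sub $t0, $s2, $s3", "$t0", ("$s2", "$s3")),
    Instr("and $t2, $s4, $t0", "$t2", ("$s4", "$t0")),
    Instr("or  $t3, $s5, $s6", "$t3", ("$s5", "$s6")),
    Instr("sw  $s7, 80($t3)",  None,  ("$s7", "$t3")),
]


def written_between(program, i, j, reg):
    """True if some instruction strictly between i and j writes reg."""
    return any(p.dst == reg for p in program[i + 1:j])


def hazards(program):
    """List (kind, earlier index, later index, register) for each dependence."""
    found = []
    for j, later in enumerate(program):
        for i, earlier in enumerate(program[:j]):
            d = earlier.dst
            if d is not None and not written_between(program, i, j, d):
                if d in later.srcs:
                    found.append(("RAW", i, j, d))
                if d == later.dst:
                    found.append(("WAW", i, j, d))
            w = later.dst
            if (w is not None and w in earlier.srcs
                    and not written_between(program, i, j, w)):
                found.append(("WAR", i, j, w))
    return found


for kind, i, j, reg in hazards(PROGRAM):
    print(f"{kind} on {reg}: [{PROGRAM[i].text}] -> [{PROGRAM[j].text}]")
```

Skipping pairs with an intervening write keeps the list to nearest dependences only; for example, `and` depends on the `$t0` written by `sub`, not on the one written by `lw`.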
 Instruction level parallelism: the number of instructions that
can be issued simultaneously
 Reorder buffer: stores instructions until they are executed
 Scoreboard: table that keeps track of:
 Instructions waiting to issue
 Available functional units
 Dependencies
Out of Order Processor Example
# program
lw $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or $t3, $s5, $s6
sw $s7, 80($t3)
# execution order
lw $t0, 40($s0) #1
or $t3, $s5, $s6 #1
sw $s7, 80($t3) #2
add $t1, $t0, $s1 #3
sub $t0, $s2, $s3 #3
and $t2, $s4, $t0 #4
[Figure: out-of-order pipeline diagram, time in cycles 1 to 8, for the execution order above. Annotations: two-cycle latency between load and use of $t0 (RAW, lw to add); WAR on $t0 (add to sub); RAW on $t0 (sub to and); RAW on $t3 (or to sw).]
Actual IPC = 1.5 (6 instructions issued in 4 cycles)
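Both IPC figures for this program can be reproduced with a toy two-wide issue model. This is a sketch, not the processor's actual issue logic. It assumes a load result is usable two cycles after the load issues, any other result one cycle later, that a WAR pair may issue in the same cycle, and that a WAW pair must issue in separate cycles. The dependence check is pairwise and therefore conservative. All names are illustrative.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "op dst srcs")

PROGRAM = [
    Instr("lw",  "$t0", ("$s0",)),         # lw  $t0, 40($s0)
    Instr("add", "$t1", ("$t0", "$s1")),   # add $t1, $t0, $s1
    Instr("sub", "$t0", ("$s2", "$s3")),   # sub $t0, $s2, $s3
    Instr("and", "$t2", ("$s4", "$t0")),   # and $t2, $s4, $t0
    Instr("or",  "$t3", ("$s5", "$s6")),   # or  $t3, $s5, $s6
    Instr("sw",  None,  ("$s7", "$t3")),   # sw  $s7, 80($t3)
]


def latency(instr):
    """Cycles before a dependent instruction may issue (assumed values)."""
    return 2 if instr.op == "lw" else 1


def can_issue(j, cycle, program, issued):
    """Check instruction j against every earlier instruction."""
    later = program[j]
    for i, earlier in enumerate(program[:j]):
        raw = earlier.dst is not None and earlier.dst in later.srcs
        waw = earlier.dst is not None and earlier.dst == later.dst
        war = later.dst is not None and later.dst in earlier.srcs
        if not (raw or waw or war):
            continue
        if i not in issued:
            return False                  # the earlier instruction must issue first
        if raw and cycle < issued[i] + latency(earlier):
            return False                  # result not available yet
        if waw and cycle <= issued[i]:
            return False                  # keep the two writes in order
        # WAR: issuing in the same cycle as the earlier reader is allowed
    return True


def schedule(program, width=2, in_order=True):
    """Return the issue cycle of each instruction, in program order."""
    issued = {}
    cycle = 0
    while len(issued) < len(program):
        cycle += 1
        slots = width
        for j in range(len(program)):
            if j in issued:
                continue
            if slots == 0:
                break
            if can_issue(j, cycle, program, issued):
                issued[j] = cycle
                slots -= 1
            elif in_order:
                break                     # in-order issue cannot skip a stalled instruction
    return [issued[j] for j in range(len(program))]


for name, in_order in (("in-order", True), ("out-of-order", False)):
    cycles = schedule(PROGRAM, in_order=in_order)
    print(name, cycles, "IPC =", len(PROGRAM) / max(cycles))
```

Under these assumptions the in-order schedule issues in cycles 1, 3, 3, 4, 4, 5 (IPC 1.2) and the out-of-order schedule in cycles 1, 3, 3, 4, 1, 2 (IPC 1.5), matching the two examples.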
SIMD
 Single Instruction Multiple Data (SIMD)
 Single instruction acts on multiple pieces of data at once
 Common application: graphics
 Perform short arithmetic operations (also called packed arithmetic)
 For example: add four 8-bit numbers
 Must modify ALU to eliminate carries between 8-bit values
padd8 $s2, $s0, $s1
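What "eliminating carries between 8-bit values" means can be shown in software. The sketch below emulates a packed add on a 32-bit word. It assumes each lane wraps around modulo 256 (rather than saturating), and the operand values are arbitrary examples.

```python
def padd8(a, b):
    """Add four packed 8-bit lanes of two 32-bit words.

    Each lane wraps modulo 256 (assumed); carries never cross into the
    neighboring lane.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # drop the carry out of this lane
    return result


a = 0x01FF7F10
b = 0x01010101
print(hex(padd8(a, b)))           # packed add, lanes independent
print(hex((a + b) & 0xFFFFFFFF))  # ordinary 32-bit add, carry ripples upward
```

The lane holding 0xFF overflows in both cases, but only the ordinary add lets that carry change the lane above it (0x03008011 versus 0x02008011 for the packed add).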
[Figure: padd8 operands by bit position. $s0 holds a3 (bits 31:24), a2 (23:16), a1 (15:8), a0 (7:0); $s1 holds b3, b2, b1, b0 in the same positions; $s2 receives a3 + b3, a2 + b2, a1 + b1, a0 + b0.]
Advanced Architecture Techniques
 Multithreading
 Word processor: threads for typing, spell checking, printing
 Multiprocessors
 Multiple processors (cores) on a single chip
Multithreading: First Some Definitions
 Process: program running on a computer
 Multiple processes can run at once: e.g., surfing Web, playing
music, writing a paper
 Thread: part of a program
 A process can have multiple threads: e.g., a word processor may have
threads for typing, spell checking, printing
Threads in Conventional Processor
 Only one thread runs at a time
 When one thread stalls (for example, waiting for memory):
 Architectural state of that thread is stored
 Architectural state of waiting thread is loaded into processor and it
runs
 Called context switching
 Appears to user like all threads running simultaneously
Multithreading
 Multiple copies of architectural state
 Multiple threads active at once:
 When one thread stalls, another runs immediately (no need to
store or restore architectural state)
 If one thread can’t keep all execution units busy, another thread
can use them
 Does not increase instruction-level parallelism (ILP) of
single thread, but does increase throughput
Multiprocessors
 Multiple processors (cores) with a method of
communication between them
 Types of multiprocessing:
 Symmetric multiprocessing (SMP): multiple cores with a shared
memory
 Asymmetric multiprocessing: separate cores for different tasks (for
example, DSP and CPU in cell phone)
 Clusters: each core has its own memory system
Some Historical Processors
 The following is an excerpt of processors from history
http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum_contents/Microprocessor_Evolution_Poster.jpg
Sun Ultrasparc (1995)
500nm, 4 million transistors
 Early 64-bit architecture
 Four issue superscalar
 Thirty two 64-bit registers
 7 read – 3 write ports
 Nine stage integer pipeline
 Cache
 16 kByte data (direct-mapped)
 16 kByte instruction (2-way)
 External L2 cache
http://www.cs.cmu.edu/afs/cs/academic/class/15740-f97/public/platform/ultrasparc.pdf
DEC Alpha 21264 (1996)
350nm, 15 million transistors
 Early high frequency/power
(600 MHz): 80–100W
 Architecture
 Out-of-order execution
 Peak IPC = 6
 Seven stage pipeline
 Up to 80 instructions active
 All instructions 32-bit (MIPS like)
 Cache
 64 kByte L1 Data & 64kByte L1 Instruction
 1-16 Mbyte L2 Cache external
http://www.ralph.timmermann.org/controller/ev6/chip.gif
Intel Pentium 4 (2000)
180nm, 42 million transistors
 Extreme pipelining
 NetBurst architecture
 20-stage instruction pipeline
(P6 has 10 stages, P5 has 5 stages)
 2 ALUs, working at twice the clock
rate to increase IPC
 12 k Execution Trace Cache
(stores micro operations)
 8 kByte L1 data cache
 256 kByte L2 Cache
http://www.tayloredge.com/museum/processor/2000_Pentium4.jpg
IBM Cell (2006)
90nm, 250 million transistors
 Early heterogeneous multicore
 Heart of Playstation 3
 8 Synergistic Processing Elements
 256 kByte local storage
 128 registers of 128 bits each
 SIMD operation
(16x 8-bit, 8x 16-bit, 4x 32-bit)
 1 Power PC processor
 64 kByte L1 Cache +
512 kByte L2 Cache
http://www.ps3news.com/images/img_19889.jpg
AMD Bulldozer (2011)
32nm technology, 1.2 billion transistors
 Up to 4 modules
 2 INT + 1 FP core each
 Each INT core 2 ALUs
 Each FP core: 4 ADD + 4 MAC
 3 Levels of Cache on chip
 8 MByte L3
 2 MByte L2 per module
 64 kByte two-way
L1 instruction per module
 16 kByte four-way
L1 data cache per core
http://en.wikipedia.org/wiki/File:AMD_Bulldozer_block_diagram_(8_core_CPU).PNG
http://info.nuje.de/OrochiDieWithModule.jpg
Other Resources
 Hennessy & Patterson:
Computer Architecture: A Quantitative Approach
 Conferences:
 www.cs.wisc.edu/~arch/www/
 ISCA (International Symposium on Computer Architecture)
 HPCA (International Symposium on High Performance Computer
Architecture)
More Related Content

Similar to 23_Advanced_Processors controller system

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems PerformanceBrendan Gregg
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
NUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioNUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioHajime Tazaki
 
Automating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsAutomating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsScyllaDB
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin敬倫 林
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.pptrveiga100
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchYutaka Yasuda
 
import rdma: zero-copy networking with RDMA and Python
import rdma: zero-copy networking with RDMA and Pythonimport rdma: zero-copy networking with RDMA and Python
import rdma: zero-copy networking with RDMA and Pythongroveronline
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitDon Marti
 

Similar to 23_Advanced_Processors controller system (20)

Data race
Data raceData race
Data race
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems Performance
 
DSP Processor.pptx
DSP Processor.pptxDSP Processor.pptx
DSP Processor.pptx
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
NUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioNUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osio
 
Automating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency SpreadsAutomating the Hunt for Non-Obvious Sources of Latency Spreads
Automating the Hunt for Non-Obvious Sources of Latency Spreads
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin
 
Lec02
Lec02Lec02
Lec02
 
strata_spark_streaming.ppt
strata_spark_streaming.pptstrata_spark_streaming.ppt
strata_spark_streaming.ppt
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
The Spectre of Meltdowns
The Spectre of MeltdownsThe Spectre of Meltdowns
The Spectre of Meltdowns
 
import rdma: zero-copy networking with RDMA and Python
import rdma: zero-copy networking with RDMA and Pythonimport rdma: zero-copy networking with RDMA and Python
import rdma: zero-copy networking with RDMA and Python
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration Summit
 

Recently uploaded

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 

Recently uploaded (20)

DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 

23_Advanced_Processors controller system

  • 1. Carnegie Mellon 1 Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier http://www.syssec.ethz.ch/education/Digitaltechnik_14 Advanced Microprocessors
  • 2. Carnegie Mellon 2 What Will We Learn?  Tricks invented over the years  Deep Pipelining  Branch Prediction  Superscalar Processors  Out of Order Processors  Register Renaming  SIMD  Multithreading  Multiprocessors  A short history of interesting processors
  • 3. Carnegie Mellon 3 Deep Pipelining  Idea: Pipelining is good, so let us pipeline the processor as much as possible  MHz wars (until mid 2000s): 10–20 stages became typical  Number of stages limited by:  Pipeline hazards (penalty of branch misprediction increases)  Sequencing overhead (setup and propagation delays of flip-flops)  Power (faster clock rate, more activity)  Cost (larger area)
  • 4. Carnegie Mellon 4 Branch Prediction  Ideal pipelined processor: CPI = 1  Branch misprediction increases CPI  Static branch prediction:  Check direction of branch (forward or backward)  If backward, predict taken  Otherwise, predict not taken  Dynamic branch prediction:  Keep history of last (several hundred) branches in a branch target buffer which holds:  Branch destination  Whether branch was taken
  • 5. Carnegie Mellon 5 Branch Prediction Example add $s1, $0, $0 # sum = 0 add $s0, $0, $0 # i = 0 addi $t0, $0, 10 # $t0 = 10 for: beq $s0, $t0, done # if i == 10, branch add $s1, $s1, $s0 # sum = sum + i addi $s0, $s0, 1 # increment i j for done:
  • 6. Carnegie Mellon 6 1-Bit Branch Predictor  Remembers whether branch was taken the last time and does the same thing  Mispredicts first and last branch of loop add $s1, $0, $0 # sum = 0 add $s0, $0, $0 # i = 0 addi $t0, $0, 10 # $t0 = 10 for: beq $s0, $t0, done # if i == 10, branch add $s1, $s1, $s0 # sum = sum + i addi $s0, $s0, 1 # increment i j for done:
  • 7. Carnegie Mellon 7 2-Bit Branch Predictor  Only mispredicts last branch of loop strongly taken predict taken weakly taken predict taken weakly not taken predict not taken strongly not taken predict not taken taken taken taken taken taken taken taken taken add $s1, $0, $0 # sum = 0 add $s0, $0, $0 # i = 0 addi $t0, $0, 10 # $t0 = 10 for: beq $s0, $t0, done # if i == 10, branch add $s1, $s1, $s0 # sum = sum + i addi $s0, $s0, 1 # increment i j for done:
  • 8. Carnegie Mellon 8 Superscalar  Multiple copies of datapath: Can issue multiple instructions at per cycle  Dependencies make it tricky to issue multiple instructions at once CLK CLK CLK CLK A RD A1 A2 RD1 A3 WD3 WD6 A4 A5 A6 RD4 RD2 RD5 Instruction Memory Register File Data Memory ALUs PC CLK A1 A2 WD1 WD2 RD1 RD2 Here: Ideal IPC = 2
  • 9. Carnegie Mellon 9 Superscalar Example lw $t0, 40($s0) add $t1, $s1, $s2 sub $t2, $s1, $s3 and $t3, $s3, $s4 or $t4, $s1, $s5 sw $s5, 80($s0) Time (cycles) 1 2 3 4 5 6 7 8 RF 40 $s0 RF $t0 + DM IM lw add lw $t0, 40($s0) add $t1, $s1, $s2 sub $t2, $s1, $s3 and $t3, $s3, $s4 or $t4, $s1, $s5 sw $s5, 80($s0) $t1 $s2 $s1 + RF $s3 $s1 RF $t2 - DM IM sub and $t3 $s4 $s3 & RF $s5 $s1 RF $t4 | DM IM or sw 80 $s0 + $s5 Ideal IPC = 2 Actual IPC = 2 (6 instructions issued in 3 cycles)
  • 10. Carnegie Mellon 10 Superscalar Example with Dependencies lw $t0, 40($s0) add $t1, $t0, $s1 sub $t0, $s2, $s3 and $t2, $s4, $t0 or $t3, $s5, $s6 sw $s7, 80($t3) Stall Time (cycles) 1 2 3 4 5 6 7 8 RF 40 $s0 RF $t0 + DM IM lw lw $t0, 40($s0) add $t1, $t0, $s1 sub $t0, $s2, $s3 and $t2, $s4, $t0 sw $s7, 80($t3) RF $s1 $t0 add RF $s1 $t0 RF $t1 + DM RF $t0 $s4 RF $t2 & DM IM and IM or and sub | $s6 $s5 $t3 RF 80 $t3 RF + DM sw IM $s7 9 $s3 $s2 $s3 $s2 - $t0 or or $t3, $s5, $s6 IM Ideal IPC = 2 Actual IPC = 1.2 (6 instructions issued in 5 cycles)
  • 11. Out of Order Processor
      - Looks ahead across multiple instructions to issue as many as possible at once
      - Issues instructions out of order as long as there are no dependencies
      - Dependencies:
          - RAW (read after write): one instruction writes a register, and a later instruction reads it
          - WAR (write after read): one instruction reads a register, and a later instruction writes it (also called an antidependence)
          - WAW (write after write): one instruction writes a register, and a later instruction writes it again (also called an output dependence)
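The three dependence types can be found mechanically by tracking, for each register, its most recent writer and the readers since that write. The sketch below applies this to the six-instruction example program used in this deck; the tuple encoding of instructions and the function name `find_dependencies` are assumptions made for illustration.

```python
# (mnemonic, destination register or None, source registers)
PROGRAM = [
    ("lw",  "$t0", ["$s0"]),
    ("add", "$t1", ["$t0", "$s1"]),
    ("sub", "$t0", ["$s2", "$s3"]),
    ("and", "$t2", ["$s4", "$t0"]),
    ("or",  "$t3", ["$s5", "$s6"]),
    ("sw",  None,  ["$s7", "$t3"]),   # a store writes memory, not a register
]

def find_dependencies(program):
    """Return (kind, earlier index, later index, register) tuples."""
    last_writer = {}  # register -> index of the most recent writer
    readers = {}      # register -> indices that read it since its last write
    deps = []
    for i, (_, dest, srcs) in enumerate(program):
        for reg in srcs:                      # RAW: read a value written earlier
            if reg in last_writer:
                deps.append(("RAW", last_writer[reg], i, reg))
        if dest is not None:
            for j in readers.get(dest, []):   # WAR: overwrite a value read earlier
                deps.append(("WAR", j, i, dest))
            if dest in last_writer:           # WAW: overwrite a value written earlier
                deps.append(("WAW", last_writer[dest], i, dest))
        for reg in srcs:
            readers.setdefault(reg, []).append(i)
        if dest is not None:
            last_writer[dest] = i
            readers[dest] = []
    return deps

deps = find_dependencies(PROGRAM)
for kind, earlier, later, reg in deps:
    print(f"{kind} on {reg}: {PROGRAM[earlier][0]} -> {PROGRAM[later][0]}")
```

For this program the sketch reports RAW on `$t0` (lw to add), WAR on `$t0` (add to sub), WAW on `$t0` (lw to sub), RAW on `$t0` (sub to and), and RAW on `$t3` (or to sw).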
  • 12. Out of Order Processor
      - Instruction level parallelism: the number of instructions that can be issued simultaneously
      - Reorder buffer: stores instructions until they are executed
      - Scoreboard: table that keeps track of:
          - Instructions waiting to issue
          - Available functional units
          - Dependencies
  • 13. Out of Order Processor Example
      # program
          lw  $t0, 40($s0)
          add $t1, $t0, $s1
          sub $t0, $s2, $s3
          and $t2, $s4, $t0
          or  $t3, $s5, $s6
          sw  $s7, 80($t3)
  • 14. Out of Order Processor Example
      # program
          lw  $t0, 40($s0)
          add $t1, $t0, $s1
          sub $t0, $s2, $s3
          and $t2, $s4, $t0
          or  $t3, $s5, $s6
          sw  $s7, 80($t3)
      # execution order (trailing #n is the issue cycle)
          lw  $t0, 40($s0)    #1
          or  $t3, $s5, $s6   #1
          sw  $s7, 80($t3)    #2
          add $t1, $t0, $s1   #3
          sub $t0, $s2, $s3   #3
          and $t2, $s4, $t0   #4
  • 15. [Pipeline diagram for the out-of-order example, time in cycles 1 to 8. Annotations: two cycle latency between load and use of $t0; dependencies marked RAW, WAR, RAW, RAW]
      # execution order (trailing #n is the issue cycle)
          lw  $t0, 40($s0)    #1
          or  $t3, $s5, $s6   #1
          sw  $s7, 80($t3)    #2
          add $t1, $t0, $s1   #3
          sub $t0, $s2, $s3   #3
          and $t2, $s4, $t0   #4
      - Actual IPC = 1.5 (6 instructions issued in 4 cycles)
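The two IPC figures for this program (1.2 for in-order dual issue, 1.5 for out-of-order issue) can be reproduced with a toy greedy scheduler. This is a sketch under stated assumptions, not a model of any real processor: the dependence list is written out by hand, the minimum issue distances (two cycles after a load, one cycle after an ALU result, same cycle allowed for WAR) are assumptions chosen to match the slides, and `schedule` is a hypothetical helper.

```python
# Assumed minimum distance, in cycles, between issue of producer and consumer
LOAD_USE, ALU_USE, SAME_CYCLE = 2, 1, 0

PROGRAM = ["lw", "add", "sub", "and", "or", "sw"]
# (earlier instruction index, later instruction index, minimum distance)
DEPS = [
    (0, 1, LOAD_USE),    # RAW on $t0: lw  -> add
    (1, 2, SAME_CYCLE),  # WAR on $t0: add -> sub
    (0, 2, ALU_USE),     # WAW on $t0: lw  -> sub
    (2, 3, ALU_USE),     # RAW on $t0: sub -> and
    (4, 5, ALU_USE),     # RAW on $t3: or  -> sw
]

def schedule(n, deps, width=2, in_order=True):
    """Greedily assign an issue cycle to each of n instructions."""
    issue = {}  # instruction index -> issue cycle
    cycle = 0
    while len(issue) < n:
        cycle += 1
        slots = width
        for i in range(n):
            if i in issue:
                continue
            ready = all(p in issue and issue[p] + d <= cycle
                        for p, c, d in deps if c == i)
            if ready and slots > 0:
                issue[i] = cycle
                slots -= 1
            elif in_order:
                break  # in-order issue may not skip past a stalled instruction
    return issue

for mode in (True, False):
    issue = schedule(len(PROGRAM), DEPS, in_order=mode)
    cycles = max(issue.values())
    label = "in-order" if mode else "out-of-order"
    print(f"{label}: {cycles} cycles, IPC = {len(PROGRAM) / cycles}")
```

With these assumptions the in-order schedule takes 5 cycles and the out-of-order schedule takes 4, issuing lw and or in cycle 1, sw in cycle 2, add and sub in cycle 3, and and in cycle 4.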
  • 16. SIMD
      - Single Instruction Multiple Data (SIMD)
          - Single instruction acts on multiple pieces of data at once
          - Common application: graphics
          - Perform short arithmetic operations (also called packed arithmetic)
      - For example: add four 8-bit numbers
      - Must modify ALU to eliminate carries between 8-bit values
          padd8 $s2, $s0, $s1

          Bit position   31..24     23..16     15..8      7..0
          $s0            a3         a2         a1         a0
          $s1            b3         b2         b1         b0
          $s2            a3 + b3    a2 + b2    a1 + b1    a0 + b0
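The effect of `padd8` can be emulated in software by adding each 8-bit lane separately and discarding its carry. The sketch below assumes wraparound (modulo 256) arithmetic per lane, which the slide does not specify; the operand values are made up to show a lane that overflows.

```python
def padd8(a, b):
    """Add four packed 8-bit lanes of two 32-bit words.

    Assumption: each lane wraps modulo 256; carries never cross lanes.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift  # drop the carry out of the lane
    return result

s0 = 0x01FF7F10  # lanes a3..a0 = 0x01, 0xFF, 0x7F, 0x10
s1 = 0x01010101  # lanes b3..b0 = 0x01, 0x01, 0x01, 0x01

print(hex(padd8(s0, s1)))           # 0x2008011
print(hex((s0 + s1) & 0xFFFFFFFF))  # 0x3008011 (ordinary 32-bit add)
```

The ordinary add lets the carry out of lane 2 (0xFF + 0x01) spill into lane 3, while the packed add keeps the lanes independent.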
  • 17. Advanced Architecture Techniques
      - Multithreading
          - Word processor: threads for typing, spell checking, printing
      - Multiprocessors
          - Multiple processors (cores) on a single chip
  • 18. Multithreading: First Some Definitions
      - Process: program running on a computer
          - Multiple processes can run at once: e.g., surfing the Web, playing music, writing a paper
      - Thread: part of a program
          - Each process has one or more threads: e.g., a word processor may have threads for typing, spell checking, printing
  • 19. Threads in Conventional Processor
      - One thread runs at a time
      - When one thread stalls (for example, waiting for memory):
          - Architectural state of that thread is stored
          - Architectural state of a waiting thread is loaded into the processor and it runs
          - Called context switching
      - Appears to the user as if all threads were running simultaneously
  • 20. Multithreading
      - Multiple copies of architectural state
      - Multiple threads active at once:
          - When one thread stalls, another runs immediately (no need to store or restore architectural state)
          - If one thread can’t keep all execution units busy, another thread can use them
      - Does not increase instruction-level parallelism (ILP) of single thread, but does increase throughput
  • 21. Multiprocessors
      - Multiple processors (cores) with a method of communication between them
      - Types of multiprocessing:
          - Symmetric multiprocessing (SMP): multiple cores with a shared memory
          - Asymmetric multiprocessing: separate cores for different tasks (for example, DSP and CPU in cell phone)
          - Clusters: each core has its own memory system
  • 22. Some Historical Processors
      - The following slides present a selection of processors from history
  • 24. Sun UltraSPARC (1995)
      500nm, 4 million transistors
      - Early 64-bit architecture
      - Four-issue superscalar
      - Thirty-two 64-bit registers
          - 7 read and 3 write ports
      - Nine-stage integer pipeline
      - Cache
          - 16 kByte data (direct-mapped)
          - 16 kByte instruction (2-way)
          - External L2 cache
      http://www.cs.cmu.edu/afs/cs/academic/class/15740-f97/public/platform/ultrasparc.pdf
  • 25. DEC Alpha 21264 (1996)
      350nm, 15 million transistors
      - Early high frequency/power (600 MHz): 80 to 100 W
      - Architecture
          - Out-of-order execution
          - Peak IPC = 6
          - Seven-stage pipeline
          - Up to 80 instructions active
          - All instructions 32-bit (MIPS-like)
      - Cache
          - 64 kByte L1 data and 64 kByte L1 instruction
          - 1 to 16 MByte external L2 cache
      http://www.ralph.timmermann.org/controller/ev6/chip.gif
  • 26. Intel Pentium 4 (2000)
      180nm, 42 million transistors
      - Extreme pipelining
      - NetBurst architecture
          - 20-stage instruction pipeline (P6 has 10 stages, P5 has 5 stages)
          - 2 ALUs, working at twice the clock rate to increase IPC
      - 12 k Execution Trace Cache (stores micro operations)
      - 8 kByte L1 data cache
      - 256 kByte L2 cache
      http://www.tayloredge.com/museum/processor/2000_Pentium4.jpg
  • 27. IBM Cell (2006)
      90nm, 250 million transistors
      - Early heterogeneous multicore
      - Heart of PlayStation 3
      - 8 Synergistic Processing Elements
          - 256 kByte local storage
          - 128 registers, each 128 bits wide
          - SIMD operation (16x 8-bit, 8x 16-bit, 4x 32-bit)
      - 1 PowerPC processor
          - 64 kByte L1 cache + 512 kByte L2 cache
      http://www.ps3news.com/images/img_19889.jpg
  • 28. AMD Bulldozer (2011)
      32nm technology, 1.2 billion transistors
      - Up to 4 modules
          - 2 INT + 1 FP core each
          - Each INT core: 2 ALUs
          - Each FP core: 4 ADD + 4 MAC
      - 3 levels of cache on chip
          - 8 MByte L3
          - 2 MByte L2 per module
          - 64 kByte two-way L1 instruction cache per module
          - 16 kByte four-way L1 data cache per core
      http://en.wikipedia.org/wiki/File:AMD_Bulldozer_block_diagram_(8_core_CPU).PNG
      http://info.nuje.de/OrochiDieWithModule.jpg
  • 29. Other Resources
      - Hennessy & Patterson: Computer Architecture: A Quantitative Approach
      - Conferences:
          - www.cs.wisc.edu/~arch/www/
          - ISCA (International Symposium on Computer Architecture)
          - HPCA (International Symposium on High Performance Computer Architecture)