SlideShare a Scribd company logo
1 of 20
Spring 2003 CSE P548 1
Reorder Buffer Implementation (Pentium Pro)
Hardware data structures
• retirement register file (RRF)
(~ IBM 360/91 physical registers)
• physical register file that is the same size as the architectural
registers
• one-to-one relationship
• reorder buffer (ROB)
(~ R10K active list)
• provides in-order instruction commit
• circular queue with head & tail pointers
• holds 40 “executing” instructions in program order
(dispatched but not yet committed)
• field for either integer or FP result after it has been computed
• a result value is put in its register after its producing instruction
reaches the head of the buffer & is removed (i.e., commits)
Spring 2003 CSE P548 2
Reorder Buffer Implementation (Pentium Pro)
• register alias table (RAT)
(~ R10K map table)
• provides register renaming
• important because very few GPRs in the x86 architecture
• indicates whether a source operand of a new instruction points
to the reorder buffer or the physical register file
• do an associative search of ROB destination registers for the
new source operands
• if found, consumer instruction points to the producer
instruction in the ROB
• the data hazard check before instruction dispatch
• reservation station
(~ IBM 360/91 reservation stations, R10000 instruction buffer)
• holds instructions waiting to execute
• provides forwarding to reduce RAW hazards
• result values go back to the reservation station (as well as
ROB) so dependent instructions have source operand
values
• provides out-of-order execution
Spring 2003 CSE P548 3
Pentium Pro Execution
In-order issue
• decode instructions
• enter uops into reorder buffer for in-order completion
• rename registers via register alias table
• detect structural hazards for reservation station
Out-of-order execution
• one reservation station, multiple entries
• check source operands for RAW hazards
• check structural hazards for separate integer, FP, memory units
• result goes to reservation station & reorder buffer
In-order completion
• this & previous uops have completed
• write “G”PR registers
• rollback on interrupts
Spring 2003 CSE P548 4
Pentium Pro
fetch & decode pipeline
BTB access (1 stage)
instruction fetch & align for decoding (2.5 stages)
decode & uop generation (2.5 stages)
register renaming & instruction issue to reservation
stations (3 stages minimum)
integer pipeline
execute, resolve branch
write registers & commit
load pipeline
address calculation & to memory reorder buffer
integrated L1 & L2 data cache access
pipelined FP add & multiply
Spring 2003 CSE P548 5
Pentium Pro
Spring 2003 CSE P548 6
Pentium Pro
Spring 2003 CSE P548 7
Spring 2003 CSE P548 8
Pentium Pro
Some bandwidth constraints: maximum for one cycle
• 16 bytes fetched
• 3 instructions decoded
• 6 µops to the reorder buffer
• 5 µops dispatched to reservation station & functional units
• 1 load & 1 store access to the L1 data cache
• 1 cache result returned
• 3 µops committed
if
• good instruction mix
• right instruction order
• operands available
• functional units available
• load & store to different cache banks
• all previous instructions already committed
Spring 2003 CSE P548 9
Making Tomasulo Precise
Reorder buffer
• contains:
• instruction type (register operation, store, branch)
• destination field (register, memory, none)
• result values
• ready bit to indicate instruction has executed
• replaces load & store buffers
• replaces renaming function of the reservation stations
• operand fields in reservation stations point to reorder buffer
location
• results tagged with reorder buffer entry number
• provides hardware speculation (speculative execution, not just fetch
& issue) plus precise interrupts
Spring 2003 CSE P548 10
Precise Implementation for Tomasulo
Spring 2003 CSE P548 11
Tomasulo’s Algorithm: Execution Steps
issue & read (assume the instruction has been fetched)
• structural hazard detection for entry in reservation station &
reorder buffer
• issue if no hazard
• stall if hazard
• read registers for source operands
• can come from either registers or reorder buffer
• RAT indicates which
• allocate entry in reorder buffer
execute
• RAW hazard detection for source operands
• snoop on common data bus for missing operands
• reservation station tag is entry number in reorder
buffer
• dispatch to functional unit when obtain both operand values
Spring 2003 CSE P548 12
Tomasulo’s Algorithm: Execution Steps
write result
• broadcast result & reorder buffer entry (tag) on the common
data bus to reservation stations & reorder buffer
commit
• retire the instruction at the head of the reorder buffer
• update register with result in reorder buffer or do a store
• remove instruction from reorder buffer
• if a mispredicted branch, restart with right-path
instructions
Spring 2003 CSE P548 13
Example in the Book 1
Cycle after first load has committed
ROB 6
Spring 2003 CSE P548 14
Example in the Book 2
Cycle after second load
committed
ROB 6
Spring 2003 CSE P548 15
Example in the Book 3
Cycle after subd wrote its result in the reorder buffer but still can’t
commit
yes
ROB 4ROB 6
Spring 2003 CSE P548 16
Example in the Book 4
Cycle after addd wrote its result in the reorder buffer but can’t
commit
ROB 6 ROB 4
Spring 2003 CSE P548 17
Example in the Book 5
Cycle after multd
committed
ROB 4ROB 6 ROB 5
Spring 2003 CSE P548 18
Example in the Book 6
Cycle after subd committed; Next: complete divd, commit divd, commit
addd
ROB 6 ROB 5
Spring 2003 CSE P548 19
Pool of Physical Regisers vs. Reorder Buffer
Think about the advantages and disadvantages of these implementations
• clean-up all done at instruction commit (physical register pool)
• record that value no longer speculative in register busy table
• unmap previous mapping for the architectural register
• instruction issue simpler (physical register pool)
• only look in one place for the source operands (the physical
register file)
• book claims that deallocating register is more complicated with a
physical register pool
• have to search for outstanding uses in the active list
• not done in practice: wait until the instruction that redefines the
architectural register commits
Spring 2003 CSE P548 20
Limits
Limits on out-of-order execution
• amount of ILP in the code
• scheduling window size
• need to do associative searches & its effect on cycle time
• number & types of functional units

More Related Content

What's hot

JS introduction
JS introductionJS introduction
JS introductionYi Tseng
 
(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memoryNico Ludwig
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISAHsien-Hsin Sean Lee, Ph.D.
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWHsien-Hsin Sean Lee, Ph.D.
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Hsien-Hsin Sean Lee, Ph.D.
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3Motaz Saad
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1Motaz Saad
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingParis Carbone
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Universität Rostock
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleMarina Kolpakova
 
Pipelining and co processor.
Pipelining and co processor.Pipelining and co processor.
Pipelining and co processor.Piyush Rochwani
 
Unit 2 assembly language programming
Unit 2   assembly language programmingUnit 2   assembly language programming
Unit 2 assembly language programmingKartik Sharma
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisJonas Traub
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
 

What's hot (20)

Run time
Run timeRun time
Run time
 
JS introduction
JS introductionJS introduction
JS introduction
 
(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory(8) cpp stack automatic_memory_and_static_memory
(8) cpp stack automatic_memory_and_static_memory
 
Run time administration
Run time administrationRun time administration
Run time administration
 
Protocol Independence
Protocol IndependenceProtocol Independence
Protocol Independence
 
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISALec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
Lec4 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- ISA
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
 
ARM inst set part 2
ARM inst set part 2ARM inst set part 2
ARM inst set part 2
 
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
Lec11 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part3
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data Streaming
 
Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...Inside LoLA - Experiences from building a state space tool for place transiti...
Inside LoLA - Experiences from building a state space tool for place transiti...
 
Embedded system -Introduction to hardware designing
Embedded system  -Introduction to hardware designingEmbedded system  -Introduction to hardware designing
Embedded system -Introduction to hardware designing
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principle
 
Pipelining and co processor.
Pipelining and co processor.Pipelining and co processor.
Pipelining and co processor.
 
Unit 2 assembly language programming
Unit 2   assembly language programmingUnit 2   assembly language programming
Unit 2 assembly language programming
 
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream AnalysisLWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis
 
FIFOPt
FIFOPtFIFOPt
FIFOPt
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 

Viewers also liked

Register transfer language
Register  transfer languageRegister  transfer language
Register transfer languagehamza munir
 
Extreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachExtreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachDavid Tzemach
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer languageSanjeev Patel
 
extreme Programming
extreme Programmingextreme Programming
extreme ProgrammingBilal Shah
 
Agile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopAgile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopPaul Boos
 

Viewers also liked (8)

XP (Extreme programming) SE1
XP (Extreme programming)  SE1XP (Extreme programming)  SE1
XP (Extreme programming) SE1
 
Register transfer language
Register  transfer languageRegister  transfer language
Register transfer language
 
Ch12 system administration
Ch12 system administration Ch12 system administration
Ch12 system administration
 
Extreme programming (xp) | David Tzemach
Extreme programming (xp) | David TzemachExtreme programming (xp) | David Tzemach
Extreme programming (xp) | David Tzemach
 
Extreme programming (xp)
Extreme programming (xp)Extreme programming (xp)
Extreme programming (xp)
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer language
 
extreme Programming
extreme Programmingextreme Programming
extreme Programming
 
Agile Engineering for Managers Workshop
Agile Engineering for Managers WorkshopAgile Engineering for Managers Workshop
Agile Engineering for Managers Workshop
 

Similar to Reorder buf

12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and functionSher Shah Merkhel
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
RuntimeenvironmentAnusuya123
 
Buffer overflow attacks
Buffer overflow attacksBuffer overflow attacks
Buffer overflow attacksJapneet Singh
 
Embedded computing platform design
Embedded computing platform designEmbedded computing platform design
Embedded computing platform designRAMPRAKASHT1
 
Replication
ReplicationReplication
ReplicationPerforce
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorDileep Bhandarkar
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on LinuxSam Bowne
 
80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdfrupalidalvi10
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technologyAmirali Sharifian
 

Similar to Reorder buf (20)

test
testtest
test
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
 
15 ia64
15 ia6415 ia64
15 ia64
 
Buffer overflow attacks
Buffer overflow attacksBuffer overflow attacks
Buffer overflow attacks
 
Embedded computing platform design
Embedded computing platform designEmbedded computing platform design
Embedded computing platform design
 
ch2.pptx
ch2.pptxch2.pptx
ch2.pptx
 
Replication
ReplicationReplication
Replication
 
class-Stacks.pptx
class-Stacks.pptxclass-Stacks.pptx
class-Stacks.pptx
 
Intel IA 64
Intel IA 64Intel IA 64
Intel IA 64
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
MICROPROCESSOR
MICROPROCESSORMICROPROCESSOR
MICROPROCESSOR
 
Lec05
Lec05Lec05
Lec05
 
Performance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro ProcessorPerformance Characterization of the Pentium Pro Processor
Performance Characterization of the Pentium Pro Processor
 
127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux127 Ch 2: Stack overflows on Linux
127 Ch 2: Stack overflows on Linux
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
 
13 superscalar
13 superscalar13 superscalar
13 superscalar
 
80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf80386_ Bus Cycles & System Architecture.pdf
80386_ Bus Cycles & System Architecture.pdf
 
13_Superscalar.ppt
13_Superscalar.ppt13_Superscalar.ppt
13_Superscalar.ppt
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Reorder buf

  • 1. Spring 2003 CSE P548 1 Reorder Buffer Implementation (Pentium Pro) Hardware data structures • retirement register file (RRF) (~ IBM 360/91 physical registers) • physical register file that is the same size as the architectural registers • one-to-one relationship • reorder buffer (ROB) (~ R10K active list) • provides in-order instruction commit • circular queue with head & tail pointers • holds 40 “executing” instructions in program order (dispatched but not yet committed) • field for either integer or FP result after it has been computed • a result value is put in its register after its producing instruction reaches the head of the buffer & is removed (i.e., commits)
  • 2. Spring 2003 CSE P548 2 Reorder Buffer Implementation (Pentium Pro) • register alias table (RAT) (~ R10K map table) • provides register renaming • important because very few GPRs in the x86 architecture • indicates whether a source operand of a new instruction points to the reorder buffer or the physical register file • do an associative search of ROB destination registers for the new source operands • if found, consumer instruction points to the producer instruction in the ROB • the data hazard check before instruction dispatch • reservation station (~ IBM 360/91 reservation stations, R10000 instruction buffer) • holds instructions waiting to execute • provides forwarding to reduce RAW hazards • result values go back to the reservation station (as well as ROB) so dependent instructions have source operand values • provides out-of-order execution
  • 3. Spring 2003 CSE P548 3 Pentium Pro Execution In-order issue • decode instructions • enter uops into reorder buffer for in-order completion • rename registers via register alias table • detect structural hazards for reservation station Out-of-order execution • one reservation station, multiple entries • check source operands for RAW hazards • check structural hazards for separate integer, FP, memory units • result goes to reservation station & reorder buffer In-order completion • this & previous uops have completed • write “G”PR registers • rollback on interrupts
  • 4. Spring 2003 CSE P548 4 Pentium Pro fetch & decode pipeline BTB access (1 stage) instruction fetch & align for decoding (2.5 stages) decode & uop generation (2.5 stages) register renaming & instruction issue to reservation stations (3 stages minimum) integer pipeline execute, resolve branch write registers & commit load pipeline address calculation & to memory reorder buffer integrated L1 & L2 data cache access pipelined FP add & multiply
  • 5. Spring 2003 CSE P548 5 Pentium Pro
  • 6. Spring 2003 CSE P548 6 Pentium Pro
  • 8. Spring 2003 CSE P548 8 Pentium Pro Some bandwidth constraints: maximum for one cycle • 16 bytes fetched • 3 instructions decoded • 6 µops to the reorder buffer • 5 µops dispatched to reservation station & functional units • 1 load & 1 store access to the L1 data cache • 1 cache result returned • 3 µops committed if • good instruction mix • right instruction order • operands available • functional units available • load & store to different cache banks • all previous instructions already committed
  • 9. Spring 2003 CSE P548 9 Making Tomasulo Precise Reorder buffer • contains: • instruction type (register operation, store, branch) • destination field (register, memory, none) • result values • ready bit to indicate instruction has executed • replaces load & store buffers • replaces renaming function of the reservation stations • operand fields in reservation stations point to reorder buffer location • results tagged with reorder buffer entry number • provides hardware speculation (speculative execution, not just fetch & issue) plus precise interrupts
  • 10. Spring 2003 CSE P548 10 Precise Implementation for Tomasulo
  • 11. Spring 2003 CSE P548 11 Tomasulo’s Algorithm: Execution Steps issue & read (assume the instruction has been fetched) • structural hazard detection for entry in reservation station & reorder buffer • issue if no hazard • stall if hazard • read registers for source operands • can come from either registers or reorder buffer • RAT indicates which • allocate entry in reorder buffer execute • RAW hazard detection for source operands • snoop on common data bus for missing operands • reservation station tag is entry number in reorder buffer • dispatch to functional unit when obtain both operand values
  • 12. Spring 2003 CSE P548 12 Tomasulo’s Algorithm: Execution Steps write result • broadcast result & reorder buffer entry (tag) on the common data bus to reservation stations & reorder buffer commit • retire the instruction at the head of the reorder buffer • update register with result in reorder buffer or do a store • remove instruction from reorder buffer • if a mispredicted branch, restart with right-path instructions
  • 13. Spring 2003 CSE P548 13 Example in the Book 1 Cycle after first load has committed ROB 6
  • 14. Spring 2003 CSE P548 14 Example in the Book 2 Cycle after second load committed ROB 6
  • 15. Spring 2003 CSE P548 15 Example in the Book 3 Cycle after subd wrote its result in the reorder buffer but still can’t commit yes ROB 4ROB 6
  • 16. Spring 2003 CSE P548 16 Example in the Book 4 Cycle after addd wrote its result in the reorder buffer but can’t commit ROB 6 ROB 4
  • 17. Spring 2003 CSE P548 17 Example in the Book 5 Cycle after multd committed ROB 4ROB 6 ROB 5
  • 18. Spring 2003 CSE P548 18 Example in the Book 6 Cycle after subd committed; Next: complete divd, commit divd, commit addd ROB 6 ROB 5
  • 19. Spring 2003 CSE P548 19 Pool of Physical Regisers vs. Reorder Buffer Think about the advantages and disadvantages of these implementations • clean-up all done at instruction commit (physical register pool) • record that value no longer speculative in register busy table • unmap previous mapping for the architectural register • instruction issue simpler (physical register pool) • only look in one place for the source operands (the physical register file) • book claims that deallocating register is more complicated with a physical register pool • have to search for outstanding uses in the active list • not done in practice: wait until the instruction that redefines the architectural register commits
  • 20. Spring 2003 CSE P548 20 Limits Limits on out-of-order execution • amount of ILP in the code • scheduling window size • need to do associative searches & its effect on cycle time • number & types of functional units