Pipelining Cache
By Riman Mandal
Contents
▪ What is Pipelining?
▪ Cache optimization
▪ Why Pipelining cache?
▪ Cache Hit and Cache Access
▪ How can we implement pipelining in a cache?
▪ Cache Pipelining effects
▪ References
What is Pipelining?
▪ Un-pipelined: start and finish a job before moving to the next job.
▪ Example: building one car takes 24 hrs, so three cars take 3 × 24 hrs.
▪ Throughput: 1 car / 24 hrs. Parallelism: 1.
What is Pipelining? (cont.)
▪ Pipelined: break the job into small stages — Engine, Body, Paint — each taking 8 hrs.
▪ While car 1 is in the Paint stage, car 2 is in the Body stage and car 3 is in the Engine stage, so the three stages work on three cars at once.
▪ Throughput: 1 car / 8 hrs (a 3× improvement). Parallelism: 3.
What is Pipelining? (cont.)
▪ Un-pipelined: start and finish an instruction's execution before moving to the next instruction.
▪ Each instruction passes through Fetch (FET), Decode (DEC), and Execute (EXE) within one 3 ns cycle, so one instruction completes per 3 ns cycle (Cyc 1, Cyc 2, Cyc 3, ...).
What is Pipelining? (cont.)
▪ Pipelined: break the instruction execution into small stages.
▪ In each 1 ns cycle the stages overlap: while IR1 executes (EXC), IR2 is decoded (DEC) and IR3 is fetched (FET), so one instruction completes per 1 ns cycle once the pipeline is full.
▪ Un-pipelined clock speed = 1 / 3 ns ≈ 333 MHz.
▪ Pipelined clock speed = 1 / 1 ns = 1 GHz.
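The clock-speed arithmetic above can be checked with a short script (a sketch using the slide's 3 ns and 1 ns cycle times):

```python
def clock_mhz(cycle_ns: float) -> float:
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1.0 / (cycle_ns * 1e-9) / 1e6

# Un-pipelined: the whole 3 ns of work sets the cycle time.
print(clock_mhz(3.0))   # ~333 MHz
# Pipelined: the 1 ns slowest stage sets the cycle time.
print(clock_mhz(1.0))   # 1000 MHz = 1 GHz
# Ideal speedup from pipelining into three 1 ns stages:
print(clock_mhz(1.0) / clock_mhz(3.0))   # 3.0
```

Note that the speedup equals the number of stages only in the ideal case where the work divides evenly and latch overheads are ignored.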
Cache optimization
▪ Average memory access time (AMAT) = Hit time + Miss rate × Miss penalty
▪ Five metrics: hit time, miss rate, miss penalty, bandwidth, power consumption.
▪ Optimizing cache access time:
– Reducing the hit time (first-level cache, way prediction)
– Increasing cache bandwidth (pipelined cache, non-blocking cache, multibanked cache)
– Reducing the miss penalty (critical word first, merging write buffers)
– Reducing the miss rate (compiler optimizations)
– Reducing the miss penalty or miss rate via parallelism (prefetching)
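The AMAT formula is simple enough to sketch in code; the hit time, miss rate, and miss penalty below are made-up illustrative values, not figures from the slides:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time: AMAT = hit time + miss rate * miss penalty.
    All times in the same unit (e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 1-cycle hit, 5% miss rate, 20-cycle miss penalty.
print(amat(1.0, 0.05, 20.0))   # 2.0 cycles on average
# Halving the miss rate (e.g. via compiler optimizations) helps:
print(amat(1.0, 0.025, 20.0))  # 1.5 cycles
```

This also shows why pipelining the cache is a bandwidth technique rather than an AMAT technique: it may slightly lengthen the hit time while letting more accesses be in flight.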
Why Pipelining Cache?
▪ Primarily used for the L1 cache.
▪ A cache access takes multiple cycles:
– An access arriving in cycle N starts immediately (hit).
– An access arriving in cycle N+1 must wait for the first access to finish (hit, but delayed).
▪ Effective hit time = actual hit time + wait time.
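The wait-time effect can be shown with a toy model of an un-pipelined cache (the 3-cycle access time is an illustrative assumption, not a figure from the slides):

```python
def effective_hit_times(arrival_cycles, access_cycles=3):
    """Effective hit time for each access to an un-pipelined cache that is
    busy for `access_cycles` per hit: actual hit time plus cycles spent
    waiting for the cache to become free. Toy model, not a real cache."""
    free_at = 0          # first cycle the cache is available again
    times = []
    for t in arrival_cycles:
        start = max(t, free_at)     # stall if the cache is still busy
        wait = start - t
        free_at = start + access_cycles
        times.append(access_cycles + wait)
    return times

# Two hits arriving in consecutive cycles N and N+1:
print(effective_hit_times([0, 1]))   # [3, 5] -- the second access waits 2 cycles
```

A pipelined cache removes the wait by accepting a new access every cycle, so each effective hit time stays at the actual hit time.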
Cache Hit and Cache Access
▪ The address is split into a Tag, a Set index, and a block Offset.
▪ The set index selects a set; each entry in the set stores a valid bit, a tag, and a data block.
▪ Comparing the stored tags (qualified by the valid bits) against the address tag determines whether the access is a hit ("Hit?") and, if so, where the data is ("Where?").
[Diagram: set-associative cache with Set 0–Set 2, per-entry valid bit, tag, and data; tag comparators feed the "Hit?"/"Where?" signals and the data read.]
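The Tag / Set / Offset split in the diagram can be sketched as bit-field extraction; the field widths here are illustrative assumptions, not values from the slides:

```python
OFFSET_BITS = 6   # assume 64-byte blocks
SET_BITS = 2      # assume 4 sets (the diagram shows only a few)

def split_address(addr: int):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_idx = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_idx, offset

print(split_address(0x1234))   # (18, 0, 52)
```

The set index drives the set selection in Stage 1, the tag feeds the comparators, and the offset picks the requested word out of the data block.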
Designing a 3 Stage pipeline Cache
▪ Stage 1: read the tag and valid bit.
▪ Stage 2: combine the results to determine whether the access is an actual hit, and start the data read.
▪ Stage 3: finish the data read and transfer the data to the CPU.
Retrieve tag and valid bit → Is Hit? Start data read → Serve CPU request
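The three-stage flow above can be sketched as a toy throughput model, assuming one new access may enter the pipeline each cycle:

```python
def completion_cycles(n_accesses: int, stages: int = 3):
    """Cycle in which each of n back-to-back pipelined cache accesses
    completes: access i enters at cycle i and takes `stages` cycles."""
    return [i + stages for i in range(n_accesses)]

# Four back-to-back hits through a 3-stage pipelined cache:
print(completion_cycles(4))   # [3, 4, 5, 6]
```

After the initial 3-cycle fill, one hit completes every cycle, which is exactly the bandwidth gain the pipeline buys at the cost of a longer per-access latency.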
Stage 1: Read tag and valid bit
[Diagram: the set index selects a set and the tag and valid bit of each entry are read out.]
Stage 2: If Hit, start reading
[Diagram: tag comparison, qualified by the valid bit, resolves "Hit?" and "Where?"; the data read begins.]
Stage 3: Supply data to CPU
[Diagram: the selected data block is read out and the requested word is returned to the CPU.]
Designing a 2 Stage pipeline Cache
▪ Stage 1: check the tag and valid bit, combine them to determine an actual hit, and locate the data.
▪ Stage 2: read the data and serve the CPU request.
Retrieve tag and valid bit. Is Hit? → Serve CPU request
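One way to compare the 2-stage and 3-stage designs is a back-of-the-envelope model, assuming the total access work (3 ns here, an illustrative figure) splits evenly across stages:

```python
TOTAL_WORK_NS = 3.0   # assumed total cache access work

def pipeline_stats(stages: int):
    """(cycle time ns, access latency ns, accesses per ns) for an ideal
    even split of the work across `stages` pipeline stages."""
    cycle_ns = TOTAL_WORK_NS / stages
    latency_ns = stages * cycle_ns      # equals TOTAL_WORK_NS in this ideal model
    throughput = 1.0 / cycle_ns         # one access completes per cycle when full
    return cycle_ns, latency_ns, throughput

print(pipeline_stats(2))   # (1.5, 3.0, 0.666...)
print(pipeline_stats(3))   # (1.0, 3.0, 1.0)
```

Under these ideal assumptions both designs have the same access latency, but the 3-stage version sustains more accesses per nanosecond; real designs also pay per-stage latch overhead, which this sketch ignores.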
Example
▪ Instruction-cache pipeline stages:
– Pentium: 1 stage
– Pentium Pro through Pentium III: 2 stages
– Pentium 4: 4 stages
Pipeline Cache Efficiency
▪ Increases the bandwidth.
▪ Increasing the number of pipeline stages leads to:
– a greater penalty on mispredicted branches
– more clock cycles between issuing the load and using the data

Effect of the technique on each metric:
Technique: Pipelining Cache
– Hit time: − (worse)
– Bandwidth: + (better)
– Miss penalty, miss rate, power consumption: no effect
References
▪ https://www.udacity.com/course/high-performance-computer-architecture--ud007
▪ https://www.youtube.com/watch?v=r9AxfQB_qlc
▪ "Computer Architecture: A Quantitative Approach, Fifth Edition", by Hennessy & Patterson