2. Contents
▪ What is Pipelining?
▪ Cache optimization
▪ Why Pipelining cache?
▪ Cache Hit and Cache Access
▪ How to implement pipelining in a cache
▪ Cache Pipelining effects
▪ References
3. What is Pipelining?
[Diagram: car-assembly timeline; three jobs run one after another, 24 hrs each.]
▪ Un-pipelined: start and finish a job before moving to the next job.
▪ Throughput = 1 car / 24 hrs; parallelism = 1.
4. What is Pipelining? (cont.)
[Diagram: car-assembly timeline; jobs 1–4 overlap across three 8-hr stages (Engine, Body, Paint), finishing one car every 8 hrs.]
▪ Pipelined: break the job into small stages.
▪ Throughput = 1 car / 8 hrs; parallelism = 3.
5. What is Pipelining? (cont.)
[Diagram: instruction timeline; each instruction runs FET → DEC → EXE as one 3 ns cycle, one instruction per cycle.]
▪ Un-pipelined: start and finish an instruction's execution before moving to the next instruction.
6. What is Pipelining? (cont.)
[Diagram: instruction timeline; the FET, DEC, and EXE stages of IR1–IR4 overlap, one 1 ns stage completing per cycle.]
▪ Pipelined: break the instruction execution into small stages.
▪ Un-pipelined: clock speed = 1 / 3 ns ≈ 333 MHz
▪ Pipelined: clock speed = 1 / 1 ns = 1 GHz
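The clock-speed figures above follow directly from the cycle time; a quick check of the slide's arithmetic:

```python
def clock_speed_hz(cycle_time_ns: float) -> float:
    """Clock speed is the reciprocal of the cycle time."""
    return 1.0 / (cycle_time_ns * 1e-9)

# Un-pipelined: one 3 ns cycle per instruction.
unpipelined = clock_speed_hz(3.0)
# Pipelined: three 1 ns stages, one instruction finishing per cycle.
pipelined = clock_speed_hz(1.0)

print(f"{unpipelined / 1e6:.0f} MHz")  # ~333 MHz
print(f"{pipelined / 1e9:.0f} GHz")    # 1 GHz
```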
7. Cache optimization
▪ Average memory access time (AMAT) = Hit time + Miss rate × Miss penalty
▪ 5 metrics: hit time, miss rate, miss penalty, bandwidth, power consumption
▪ Optimizing cache access time
– Reducing the hit time (first-level cache, way-prediction)
– Increasing cache bandwidth (pipelined cache, non-blocking cache, multibanked cache)
– Reducing the miss penalty (critical word first, merging write buffers)
– Reducing the miss rate (compiler optimizations)
– Reducing the miss penalty or miss rate via parallelism (prefetching)
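The AMAT formula above can be checked with a small sketch; the cache parameters here are made up for the example:

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical L1 cache: 1 ns hit time, 5% miss rate, 20 ns miss penalty.
print(amat_ns(1.0, 0.05, 20.0))  # 2.0 ns on average
```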
8. Why Pipelining Cache?
▪ Basically used for the L1 cache.
▪ It takes multiple cycles to access the cache:
– An access arrives in cycle N (hit).
– An access arriving in cycle N+1 (hit) has to wait for the first to finish.
▪ Hit time = actual hit time + wait time
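A minimal sketch of that wait-time effect, with illustrative timings: an un-pipelined cache that takes 3 cycles per access makes back-to-back accesses queue up, while a pipelined cache accepts a new access every cycle.

```python
def observed_hit_times(arrivals, access_cycles, pipelined):
    """Return each access's observed hit time (wait time + actual hit time).

    arrivals: cycle at which each access arrives.
    An un-pipelined cache is busy for the whole access; a pipelined one
    can accept a new access every cycle.
    """
    free_at = 0  # first cycle the cache can accept a new access
    times = []
    for t in arrivals:
        start = max(t, free_at)
        times.append(start - t + access_cycles)  # wait + actual hit time
        free_at = start + (1 if pipelined else access_cycles)
    return times

print(observed_hit_times([0, 1, 2], 3, pipelined=False))  # [3, 5, 7]
print(observed_hit_times([0, 1, 2], 3, pipelined=True))   # [3, 3, 3]
```

The latency of a single access is unchanged; pipelining removes only the queueing delay, which is exactly the bandwidth gain claimed later in the deck.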
9. Cache Hit and Cache Access
[Diagram: set-associative cache lookup. The address is split into Tag | Set | Offset; the Set field indexes a set, and each way's stored tag and valid bit are compared with the address tag to answer "Hit?" and "Where?". On a hit the data is read out and the access is done.]
10. Designing a 3-Stage Pipelined Cache
▪ Stage 1: read the tags and valid bits.
▪ Stage 2: combine the results to determine the actual hit and start the data read.
▪ Stage 3: finish the data read and transfer the data to the CPU.
Retrieve tag and valid bit → Is hit? Start data read → Serve CPU request
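The three stages above can be sketched as a toy model of the stage split; this is a simplified direct-mapped cache, not real hardware, and the class and field names are my own, not from the slides:

```python
class PipelinedCache:
    """Toy 3-stage cache: tag/valid read, hit check + data-read start, data return."""

    def __init__(self, num_sets=4):
        self.valid = [False] * num_sets
        self.tags = [0] * num_sets
        self.data = [None] * num_sets

    def fill(self, set_idx, tag, value):
        """Install a line, as a miss refill would."""
        self.valid[set_idx], self.tags[set_idx], self.data[set_idx] = True, tag, value

    def access(self, set_idx, tag):
        # Stage 1: read the tag and valid bit for the indexed set.
        stored_tag, valid = self.tags[set_idx], self.valid[set_idx]
        # Stage 2: combine them to decide whether this is an actual hit.
        hit = valid and stored_tag == tag
        # Stage 3: on a hit, finish the data read and return the data to the CPU.
        return self.data[set_idx] if hit else None

cache = PipelinedCache()
cache.fill(set_idx=1, tag=0x2A, value="hello")
print(cache.access(1, 0x2A))  # hit  -> "hello"
print(cache.access(1, 0x2B))  # miss -> None
```

In hardware each stage handles a different access in the same cycle; here the stages run sequentially, but the split shows which work each pipeline register would carry.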
11. Stage 1: Read tag and valid bit
[Same lookup diagram as slide 9, highlighting the tag and valid-bit reads for the indexed set.]
12. Stage 2: If hit, start reading
[Same lookup diagram, highlighting the hit check and the start of the data read.]
13. Stage 3: Supply data to CPU
[Same lookup diagram, highlighting the completed data read delivered to the CPU.]
14. Designing a 2-Stage Pipelined Cache
▪ Stage 1: check the tag and valid bit, combine them to find the actual hit, and locate the data.
▪ Stage 2: read the data and serve the CPU request.
Retrieve tag and valid bit. Is hit? → Serve CPU request
16. Pipeline Cache Efficiency
▪ Increases the bandwidth.
▪ Increasing the number of pipeline stages leads to:
– a greater penalty on mispredicted branches
– more clock cycles between issuing the load and using the data

Technique        | Hit time | Bandwidth | Miss penalty | Miss rate | Power consumption
Pipelining cache | –        | +         |              |           |