SPARC-T1 Cache & Virtual Memory Architecture
by: Kaushik Patra (kpatra@gmail.com)
Agenda
SPARC-T1 overview
SPARC core overview
L1 Caches and TLBs
L1 I-cache
IFQ & MIL
I-cache fill path
I-cache miss path
L1 D-Cache
Data flow through LSU
Memory Management Unit
Data flow in MMU
TLB structure
TLB entry replacement algorithm
L2-Cache Overview
L2-Cache structure
L2-Cache line replacement algorithm
SPARC T1 overview
8 SPARC V9 cores
4 threads per core
16 KB L1 instruction cache (I-Cache) per core
8 KB L1 data cache (D-Cache) per core
SPARC T1 overview
3 MB L2 cache shared by all cores
4-way banked
12-way associative
132 GB/sec crossbar interconnect for on-chip communication
SPARC T1 overview
4 DDR-II DRAM controllers
144-bit interface per channel
25 GB/sec total peak bandwidth
IEEE 754 compliant floating point unit (FPU), shared by all cores
SPARC T1 overview
External interfaces
J-Bus interface for I/O
2.56 GB/sec peak bandwidth
128-bit multiplexed address and data bus
Serial System Interface (SSI) for boot PROM
SPARC core overview
Instruction Fetch Unit (IFU)
Load Store Unit (LSU)
Memory Management Unit (MMU).
Execution Unit (EXU)
Multiplier Unit
Trap Logic Unit
Floating Point Front end Unit
Stream Processing Unit
SPARC core overview
SPARC core data path
Separate instruction cache (I-Cache) and data cache (D-Cache).
SPARC core overview
We’ll limit our discussion to the I-Cache and D-Cache.
We’ll also include the associated TLB architecture that supports memory virtualization.
L1 Cache and TLBs
IFU contains the I-cache and I-TLB.
LSU contains the D-cache and D-TLB.
[Block diagram: MMU spanning the IFU (I-Cache, I-TLB) and LSU (D-Cache, D-TLB)]
L1 Cache and TLBs
IFU controls the I-Cache content.
LSU controls the D-Cache content.
MMU controls both the I-TLB and D-TLB.
L1 I-Cache
Physically indexed and tagged: the address is translated using the I-TLB before the cache hit/miss is determined.
4-way set associative.
16 KB data storage with a 32-byte line size.
Single-ported data and tag arrays.
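As a quick worked check (not taken from the slides), these figures imply 128 sets; a minimal C sketch of the arithmetic:

/* Sketch: deriving the L1 I-cache geometry from the figures above
 * (16 KB of data, 4-way set associative, 32-byte lines). */
#include <stdio.h>

int main(void)
{
    const unsigned size = 16 * 1024;    /* total data storage in bytes */
    const unsigned ways = 4;            /* associativity               */
    const unsigned line = 32;           /* line size in bytes          */

    const unsigned sets = size / (ways * line);   /* = 128 sets */

    /* 32-byte lines -> 5 offset bits; 128 sets -> 7 index bits. */
    printf("sets: %u\n", sets);
    return 0;
}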
L1 I-Cache
I-Cache fill size: 16 bytes per access.
Each cached entry contains:
a 32-bit instruction
1 parity bit
1 pre-decode bit
The valid-bit (V-bit) array has 1 read and 1 write port.
Cache invalidation accesses only the V-bit array.
L1 I-Cache
Cache line replacement is pseudo-random.
Read access has higher priority than write access to the I-cache.
The maximum wait time for a write access is 25 SPARC core clock cycles.
Any write request waiting more than 25 clock cycles causes a pipeline stall in order to let the pending write operation complete.
IFQ & MIL
The Instruction Fill Queue (IFQ) feeds into the I-Cache.
The Missed Instruction List (MIL) stores the addresses that missed in the I-Cache or I-TLB.
The MIL feeds into the LSU for further processing.
[Block diagram: IFU containing the I-Cache, I-TLB, IFQ, and MIL; the MIL sends addresses to the LSU, and the IFQ receives fill data from the LSU]
Instruction fetch
Two instructions are fetched per SPARC core clock cycle, although only one instruction is issued per cycle.
This strategy reduces I-Cache read accesses, leaving free cycles for opportunistic I-Cache line fills.
Each thread is allowed one outstanding I-cache miss, i.e. a total of 4 I-cache misses per core.
Duplicate I-cache misses do not induce redundant fill requests to the L2-Cache.
I-Cache fill path
Fill packets come from the L2-cache via the LSU (on the CPX interface).
Parity and pre-decode bits are computed before the I-cache is filled.
CPX packets also include:
invalidations
test access port (TAP) reads & writes
error notifications
[Block diagram: IFQ arbitrating BIST > ASI > CPX fill sources, writing to the V-bit array and the I-Cache, with a bypass path to the TIR]
I-Cache fill path
Invalidation CPX packets are handled through the INV block, which accesses the V-bit array.
The IFQ has a bypass circuit to deliver the current CPX packet directly to the Thread Instruction Register (TIR), avoiding an extra stall in instruction processing.
I-Cache fill path
Each I-cache fill takes 2 CPX packets, 16 bytes each.
The I-cache line size is 32 bytes.
The I-cache line is invalidated after the first packet is written and becomes valid again after the 2nd packet is written, as sketched below.
[State diagram: line Valid -> Invalid on the CPX-1 write -> Valid again on the CPX-2 write (V = Valid, I = Invalid)]
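A minimal sketch, not from the source, of how the valid bit could track the two-packet fill; the type and function names are hypothetical:

/* Hypothetical model of the two-packet I-cache fill sequence: the line
 * is invalidated when the first 16-byte CPX packet is written and
 * re-validated after the second packet. */
typedef struct {
    unsigned char valid;
    unsigned char data[32];          /* 32-byte line */
} icache_line_t;

static void fill_packet(icache_line_t *line, int pkt_idx,
                        const unsigned char *pkt16)
{
    if (pkt_idx == 0)
        line->valid = 0;                         /* invalidate on CPX-1  */
    for (int i = 0; i < 16; i++)
        line->data[pkt_idx * 16 + i] = pkt16[i]; /* write 16-byte half   */
    if (pkt_idx == 1)
        line->valid = 1;                         /* re-validate on CPX-2 */
}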
I-Cache miss path
The Missed Instruction List (MIL) sends I-Cache miss requests to the L2-Cache via the LSU.
One miss entry per thread, i.e. a total of 4 miss entries per SPARC core.
Each entry in the MIL contains:
the physical address (PA)
the replacement way information
the MIL state information
the cacheability
the error information
[Block diagram: MIL holding per-thread physical addresses, with a comparator and a round-robin arbiter generating PCX packets to the LSU]
I-Cache miss path
The PA keeps track of I-cache fetch progress from the I-cache miss until the I-cache fill.
A round-robin algorithm dispatches I-cache fill requests from the different threads.
The MIL uses a linked list of size 4 to keep track of duplicate I-cache misses.
Duplicate requests are marked as children.
A child request is serviced as soon as its parent request gets a response.
I-Cache miss path
[State diagram: MIL cycles through four states S1–S4 — new I-Cache miss, make fill request, CPX-1 not done (send speculative notification), CPX-2 not done (send notification)]
The MIL alternates between the 4 states:
It starts in S1 upon a new I-cache miss.
It makes the fill request.
It waits until the I-cache fill is done.
Upon completing the CPX-1 fill, it sends a speculative completion notification to the thread scheduler.
I-Cache miss path
An I-Cache fill request may be cancelled upon a trap or exception.
However, the MIL still goes through filling the cache line, but the bypass to the TIR is blocked.
Why? Because the pending child request should be serviced even if the parent request is cancelled.
A child I-cache miss request needs to wait until the parent’s I-cache miss request is serviced. The child instruction fetch is then rolled back to the fetch stage to allow it to access the I-cache. This is referred to as ‘miss-fill crossover’.
L1 D-Cache
4-way set associative.
8 KB data storage with a 16-byte line size.
Single read-write port for the data and tag arrays.
Dual-ported valid-bit (V-bit) array; cache invalidation accesses only this V-bit array.
L1 D-Cache
The cache line replacement policy is pseudo-random, using a linear-feedback shift register; load misses allocate a line, store misses do not.
A cacheable load miss will allocate a line; the write-through policy is applied before the line is loaded.
Stores do not allocate. Hence, a store causes line invalidation if the target address is already in the D-cache, as determined by the L2-cache directory.
L1 D-Cache
The L1 D-cache is always inclusive of the L2 cache.
The L1 D-cache is always exclusive of the L1 I-cache.
The L1 D-cache is parity protected.
A parity error causes a D-cache miss, so the data is corrected by re-fetching it.
In addition to pipeline reads, the L1 D-cache may be accessed by ASI, BIST, and RAM test through the test access port (TAP).
Data flow through LSU
One store buffer (STB) per thread.
Load misses are kept in the Load Miss Queue (LMQ).
One outstanding load miss per thread.
A load miss with a duplicate physical address (PA) is not sent to the L2-cache.
Fully associative D-TLB; all CAM/RAM accesses are single-cycle operations.
[Block diagram: LSU data path — load/store addresses and IRF/FRF data feed the STB and LMQ; the PCX generator arbitrates these with packets from the IFU (I-Cache/I-TLB) and sends PCX packets to the crossbar; returning CPX packets enter the DFQ]
Data flow through LSU
The STB consists of a store buffer CAM (SCM) and a store data array (STBDATA).
The SCM has 1 CAM port and 1 read-write port.
STBDATA has 1 read and 1 write port.
Each thread is allocated 8 fixed entries in the shared structure.
Data flow through LSU
A load instruction speculates on a D-cache miss to reduce the CCX access latency.
If the speculation fails, the load instruction is taken out of the LMQ.
The arbiter (PCX generator) takes 13 different inputs to generate the packet to the PCX (processor-to-crossbar interface).
Data flow through LSU
The arbiter inputs consist of:
4 load-type instructions
4 store-type instructions
one I-cache fill
one FPU access
one SPU access
one interrupt
one forward packet
Data flow through LSU
The arbitration inputs are grouped into priority levels:
I-cache miss
Load miss
Stores
{FPU operations, SPU operations, Interrupts}
A two-level history mechanism implements fair scheduling among the different priority levels (see the sketch below).
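The slides do not detail the history mechanism itself. Purely as an illustration of the idea, here is a generic two-level round-robin arbiter sketch in C; the class/requester counts, names, and the scheme itself are assumptions, not the actual T1 design:

/* Purely illustrative two-level round-robin arbiter; a generic stand-in
 * for "two-level history", not the actual T1 mechanism. Level 1 rotates
 * among requesters within a class, level 2 rotates among the classes. */
#define NUM_CLASSES    4
#define REQS_PER_CLASS 4

typedef struct {
    int class_ptr;                 /* level-2 history: next class to favor      */
    int req_ptr[NUM_CLASSES];      /* level-1 history: next requester per class */
} arbiter_t;

/* req[c][r] != 0 means requester r of class c wants the PCX this cycle.
 * Returns 0 and fills the winner on a grant, -1 if nothing is pending. */
static int arbitrate(arbiter_t *a, int req[NUM_CLASSES][REQS_PER_CLASS],
                     int *win_class, int *win_req)
{
    for (int i = 0; i < NUM_CLASSES; i++) {
        int c = (a->class_ptr + i) % NUM_CLASSES;
        for (int j = 0; j < REQS_PER_CLASS; j++) {
            int r = (a->req_ptr[c] + j) % REQS_PER_CLASS;
            if (req[c][r]) {
                a->req_ptr[c] = (r + 1) % REQS_PER_CLASS;  /* update level-1 history */
                a->class_ptr  = (c + 1) % NUM_CLASSES;     /* update level-2 history */
                *win_class = c;
                *win_req   = r;
                return 0;
            }
        }
    }
    return -1;
}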
Data flow through LSU
Incoming packets are stored in the data fill queue (DFQ).
Packets can be:
acknowledgments
data
The targets for the DFQ are:
Instruction Fetch Unit (IFU)
Load Store Unit (LSU)
Trap Logic Unit (TLU)
Stream Processing Unit (SPU)
Memory Management Unit
Maintains the content of the I-TLB and D-TLB.
The MMU helps SPARC-T1 provide support for virtualization.
Multiple OSes co-exist on top of the CMT processor.
A hypervisor layer virtualizes the underlying CPU.
A virtual address (VA) from the application is translated into a real address (RA) and then into a physical address (PA) using the TLB & MMU.
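The two-stage translation can be pictured with a minimal C sketch; the mappings and helper names below are hypothetical placeholders, not the hardware or hypervisor interfaces:

#include <stdint.h>

typedef uint64_t va_t, ra_t, pa_t;

/* Placeholder single-offset mappings; real translation walks page tables. */
static ra_t guest_translate(va_t va) { return va + 0x10000000ULL; }  /* VA -> RA (guest OS)   */
static pa_t hyper_translate(ra_t ra) { return ra + 0x40000000ULL; }  /* RA -> PA (hypervisor) */

/* The hardware TLB caches the combined VA -> PA result of both stages. */
static pa_t translate(va_t va)
{
    ra_t ra = guest_translate(va);   /* first stage  */
    return hyper_translate(ra);      /* second stage */
}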
Data Flow in MMU
The system software maintains the content of the TLBs by sending instructions to the MMU.
The instructions are read, write, and de-map.
TLB entries are shared among threads.
Consistency among TLB entries is maintained through auto-de-map.
The MMU is responsible for:
generating the pointers to the software Translation Storage Buffer (TSB)
maintaining fault status for various traps
Access to the MMU is through hypervisor-managed ASI (Alternate Space Identifier) operations, e.g. ldxa, stxa.
TLB structure
The TLB consists of a Content Addressable Memory (CAM) and a Random Access Memory (RAM).
The CAM has 1 compare port and 1 read-write port.
The RAM has 1 read-write port.
The TLB supports the mutually exclusive operations: CAM (lookup), read, write, bypass, de-map, soft reset, and hard reset.
TLB structure
The RAM contains the following fields:
Physical Address (PA)
Attributes
The CAM contains the following fields:
Partition ID (PID)
Real bit (indicates VA-to-PA or RA-to-PA translation)
Virtual address (VA), divided into page-size-based fields (V0–V3)
Context ID (CTXT)
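A rough C model of a single TLB entry built from the fields listed above; the field widths and names are illustrative assumptions:

#include <stdint.h>

/* Illustrative model of one TLB entry; field widths are assumptions. */
typedef struct {
    /* CAM side: matched during a lookup */
    uint8_t  pid;       /* Partition ID (PID)                          */
    uint8_t  real;      /* 1 = RA-to-PA entry, 0 = VA-to-PA entry      */
    uint64_t va[4];     /* VA split into page-size-based fields V0..V3 */
    uint16_t ctxt;      /* Context ID (CTXT)                           */
    /* RAM side: read out on a hit */
    uint64_t pa;        /* Physical Address (PA)                       */
    uint32_t attrs;     /* attribute bits                              */
    /* replacement state (see the next slide) */
    uint8_t  used;
    uint8_t  locked;
    uint8_t  valid;
} tlb_entry_t;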
TLB entry replacement algorithm
Each entry has a used bit.
The replacement victim is picked by the least significant unused bit among all 64 entries.
A used bit is set on a write, a CAM hit, or a lock.
A locked page always has its used bit set.
Entry invalidation clears the used bit.
All used bits, except those of locked entries, are cleared if the TLB reaches saturation.
If the TLB is saturated with all entries locked, default location 63 is chosen and an error is reported.
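A hedged C sketch of the used-bit victim selection described above, assuming a 64-entry TLB and simplifying the error-reporting path:

#include <stdint.h>

#define TLB_ENTRIES 64

/* Per-entry replacement state packed as one bit per entry. */
typedef struct {
    uint64_t used;     /* set on write, CAM hit, or lock            */
    uint64_t locked;   /* locked entries always keep their used bit */
} tlb_repl_t;

/* Pick the entry to replace; *error flags the all-locked case. */
static int tlb_pick_victim(tlb_repl_t *s, int *error)
{
    *error = 0;

    if (s->used == ~0ULL) {            /* saturation: every used bit is set  */
        if (s->locked == ~0ULL) {      /* all entries locked: no real victim */
            *error = 1;
            return 63;                 /* default location, error reported   */
        }
        s->used = s->locked;           /* clear used bits except locked ones */
    }

    /* Least significant clear bit of 'used' selects the victim
     * (GCC/Clang builtin counts trailing zero bits). */
    return (int)__builtin_ctzll(~s->used);
}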
L2-cache overview
3 MB in total size with four symmetrical data banks.
Each bank operates independently.
Each bank is 12-way set associative and 768 KB in size.
The line size is 64 bytes.
The number of sets is 1024.
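As a consistency check on these figures: 768 KB per bank / (12 ways × 64 bytes per line) = 1024 sets per bank.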
L2-cache overview
Accepts requests from the processor-to-crossbar (PCX) interface, a part of the CCX.
Puts responses on the crossbar-to-processor (CPX) interface, a part of the CCX.
Responsible for maintaining on-chip coherency across all L1 caches.
Keeps a copy of all L1 tags in a directory structure.
L2-cache overview
128-bit fill interface.
64-bit write interface with the DRAM controller.
Each bank has a dedicated DRAM controller.
8-stage pipelined cache controller.
L2-cache overview
Each 32-bit word is protected by a 7-bit single-error-correction, double-error-detection (SEC/DED) ECC code.
The J-Bus interface (JBI) connects through a snoop input queue and an RDMA write buffer.
L2-Cache structure
3 main components:
SCTAG (Secondary Cache TAG): contains the TAG array, VUAD array, L2-tag directory, and the cache controller.
SCBUF (Secondary Cache BUF): contains the write-back buffer (WBB), fill buffer (FB), and DMA buffer.
SCDATA (Secondary Cache DATA): contains the L2-cache data.
L2-cache : Arbiter
Manages L2-cache pipeline access among the various request sources.
The arbiter gets input from:
instructions from the CCX and the bypass path of the input queue (IQ)
DMA instructions from the snoop input queue
instructions recycled from the miss buffer (MB) and fill buffer (FB)
a stall signal from the pipeline
L2-cache : TAG
22-bit tag with 6 bits of SEC ECC protection.
No double-bit error detection.
Single-ported array.
Four state bits are maintained per tag line in the VUAD array:
Valid (V)
Used (U)
Allocated (A)
Dirty (D)
L2-cache : VUAD
Dual-ported array structure.
The V, A, and D bits are parity protected, since an error in them would be fatal.
The Used bit is not protected, since an error there is not fatal.
The VUAD array is accessed when deciding on line replacement.
L2-cache : DATA
Single-ported SRAM structure.
768 KB in size with a 64-byte logical line size.
Allows read accesses of 16 bytes and 64 bytes.
A 16-byte write enable allows writing in 4-byte parts.
A line fill updates all 64 bytes at a time.
L2-cache : DATA
The data array is subdivided into 4 columns with six 32 KB sub-arrays in each column.
A data array access needs 2 cycles to complete.
No column can be accessed in consecutive cycles.
All accesses are pipelined, so accesses have a throughput of one per cycle.
Each 32-bit word is protected by 7 bits of SEC/DED ECC.
L2-cache : Input Queue (IQ)
A 16-entry FIFO queue takes incoming PCX packets.
Each entry is 130 bits wide.
The FIFO is implemented with a dual-ported array.
The IQ asserts a stall when 11 entries are filled, to leave room for packets already in flight (see the sketch below).
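A small illustrative model of the high-water-mark behavior (the 16-entry depth and 11-entry threshold come from the slide; the queue mechanics are assumed):

/* Illustrative input-queue model: a stall is asserted at 11 of the 16
 * entries so that packets already in flight still have room to land. */
#define IQ_DEPTH      16
#define IQ_HIGH_WATER 11

typedef struct {
    int count;                          /* entries currently held */
} iq_t;

static int iq_stall(const iq_t *q)
{
    return q->count >= IQ_HIGH_WATER;   /* tell the PCX to stop sending */
}

static int iq_push(iq_t *q)
{
    if (q->count >= IQ_DEPTH)
        return -1;                      /* would overflow; should not happen */
    q->count++;
    return 0;
}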
L2-cache : Output Queue (OQ)
A 16-entry FIFO for packets waiting to get access to the CPX.
Each entry is 146 bits wide.
The FIFO is implemented with a dual-ported array.
When the OQ reaches its high-water mark, the L2-cache stops accepting PCX packets and input from the miss buffer.
Fills can still happen since they do not generate CPX packets.
L2-cache : Miss Buffer (MB)
The 16-entry miss buffer stores instructions that cannot be processed as a simple cache hit:
a true L2-cache miss
an access to the same cache line address as an earlier miss
an access matching an entry in the write-back buffer
instructions needing multiple passes through the L2-cache pipeline
unallocated L2-cache misses
accesses causing a tag ECC error
The non-tag part holds data: a RAM with 1 read and 1 write port.
The tag part holds the address: a CAM with 1 read, 1 write, and 1 CAM port.
L2-cache : Fill Buffer (FB)
An 8-entry buffer.
Contains cache-line-wide entries to stage data from DRAM before it fills the cache; a RAM structure is used for this part.
The address is also stored, to maintain age ordering and satisfy data coherence; a CAM structure is used for this part.
Data arrives from DRAM in four 16-byte blocks, starting with the critical quad-word (see the sketch below).
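A short sketch of the critical-word-first ordering implied above: the four 16-byte sub-blocks of a 64-byte line arrive starting at the sub-block holding the requested address and wrap around (illustrative only):

#include <stdint.h>

#define SUBBLOCKS      4      /* 64-byte line = 4 x 16-byte sub-blocks */
#define SUBBLOCK_BYTES 16

/* Fill 'order' with sub-block indices in arrival order, starting from
 * the sub-block that holds the requested (critical) address. */
static void critical_block_order(uint64_t req_addr, int order[SUBBLOCKS])
{
    int critical = (int)((req_addr / SUBBLOCK_BYTES) % SUBBLOCKS);
    for (int i = 0; i < SUBBLOCKS; i++)
        order[i] = (critical + i) % SUBBLOCKS;
}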
L2-cache : Write Back Buffer (WBB)
An 8-entry buffer, used to store 64-byte dirty lines upon eviction.
The evicted lines are streamed to DRAM opportunistically.
An instruction whose line address matches an entry in the WBB is pushed back into the MB.
The WBB also has RAM and CAM parts to hold data and addresses, respectively.
64-byte read interface with the data array and a 64-bit write interface to the DRAM controller.
L2-cache : Directory
2048 entries, with one entry per L1 tag.
It provides the L1-tag-to-L2-bank mapping.
Half of the entries are for the L1 I-cache and the other half for the L1 D-cache (the I-cache directory and the D-cache directory).
Participates in coherency management.
Also ensures the same line is not present in both the I-Cache and the D-Cache.
L2-cache : Line Replacement Algorithm
Uses pseudo-LRU for line replacement.
The ‘U’ bit (12 in total, 1 per way) is set upon a cache hit.
All 12 ‘U’ bits are cleared when no way is both unused and unallocated.
The ‘A’ bit means the line is allocated for a miss; it is analogous to a ‘lock’ bit.
The ‘A’ bit is cleared when the line fill happens.
The ‘D’ bit indicates the line is valid only inside the cache and must be written back.
It is set when data is written to the L2-cache.
It is cleared when the line is invalidated.
The LRU examines all the ways starting from a point chosen in round-robin fashion.
The first unused and unallocated line is allocated for the miss.
If there is no unused line, the first unallocated line is allocated for the miss (see the sketch below).
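A hedged C sketch of the way-selection rules stated on these two slides (12 ways per set; the U/A bookkeeping around hits and fills is reduced to exactly the rules above):

#include <stdint.h>

#define WAYS 12

typedef struct {
    uint16_t used;    /* 'U' bit per way, set on a cache hit              */
    uint16_t alloc;   /* 'A' bit per way, set while a miss is outstanding */
} l2_set_t;

/* Pick a victim way, scanning all ways from a round-robin start point. */
static int l2_pick_way(l2_set_t *s, int rr_start)
{
    const uint16_t all = (1u << WAYS) - 1;

    /* If no way is both unused and unallocated, clear all the U bits. */
    if (((s->used | s->alloc) & all) == all)
        s->used = 0;

    /* First choice: a way that is neither used nor allocated. */
    for (int i = 0; i < WAYS; i++) {
        int w = (rr_start + i) % WAYS;
        if (!((s->used >> w) & 1) && !((s->alloc >> w) & 1))
            return w;
    }
    /* Otherwise: the first unallocated way. */
    for (int i = 0; i < WAYS; i++) {
        int w = (rr_start + i) % WAYS;
        if (!((s->alloc >> w) & 1))
            return w;
    }
    return -1;   /* every way already has a miss outstanding */
}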
Scope of future study
Cache crossbar (CCX) data transactions.
L2-cache pipelined data flow control.
Cache memory consistency and instruction ordering.
Reference
http://opensparc-t1.sunsource.net/specs/OpenSPARCT1_Micro_Arch.pdf