SlideShare a Scribd company logo
1 of 30
Download to read offline
Transaction-Level Design

6.884 – Spring 2005 Krste, 3/14/05 L14-UTL1
Hardware Design Abstraction Levels

Algorithm
Circuits
Application
Guarded Atomic Actions (Bluespec)
Devices
Unit-Transaction Level (UTL) Model
Gates
Physics
Register-Transfer Level (Verilog RTL)
Today’s 

Lecture

6.884 – Spring 2005 Krste, 3/14/05 L14-UTL2
Application to RTL in One Step?

Modern hardware systems have complex functionality
(graphics chips, video encoders, wireless communication
channels), but sometimes designers try to map directly to
an RTL cycle-level microarchitecture in one step
ƒ	 Requires detailed cycle-level design of each sub-unit
–	 Significant design effort required before clear if design will
meet goals
ƒ	 Interactions between units becomes unclear if arbitrary
circuit connections allowed between units, with possible
cycle-level timing dependencies
– Increases complexity of unit specifications
ƒ Removes degrees of freedom for unit designers
–	 Reduces possible space for architecture exploration
ƒ	 Difficult to document intended operation, therefore
difficult to verify
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL3
Transaction-Level Design

Arch. 

Arch. State State

Arch. State
Unit 1
Unit 2 Unit
3
Shared Memory Unit
ƒ	 Model design as messages flowing through FIFO buffers between
units containing architectural state
ƒ	 Each unit can independently perform an operation, or

transaction, that may consume messages, update local state, 

and send further messages 

ƒ	 Transaction and/or communication might take many cycles (i.e.,
not necessarily a single Bluespec rule)
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL4
6.884 UTL Discipline

ƒ	 Various forms of transaction-level model are becoming
increasingly used in commercial designs
ƒ	 UTL (Unit-Transaction Level) models are the variant we’ll use
in 6.884
ƒ	 UTL forces a discipline on top-level design structure that will
result in clean hardware designs that are easier to document
and verify, and which should lead to better physical designs
–	 A discipline restricts hardware designs, with the goal of
avoiding bad choices
ƒ	 UTL specs are not directly executable (yet), but could be
easily implemented in C/C++/Java/SystemC to give a golden
model for design verification
–	 Bluespec will often, but not always, be sufficient for UTL model
ƒ	 You’re required to give an initial UTL description (in English
text) of your project design by April 1 project milestone
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL5
UTL Overview

Unit comprises:
Transactions
Scheduler
Input
queues
Output
queues
Arch.
State
Unit
ƒ Architectural state (registers + RAMs)

ƒ Input queues and output queues connected to other units

ƒ Transactions (atomic operations on state and queues)

ƒ Scheduler (combinational function to pick next transaction to run)

6.884 – Spring 2005 Krste, 3/14/05 L14-UTL6
Unit Architectural State

Arch. 

State

ƒ	 Architectural state is any state that is visible to an
external agent
–	 i.e, architectural state can be observed by sending strings
of packets into input queues and looking at values returned
at outputs.
ƒ	 High-level specification of a unit only refers to
architectural state
ƒ Detailed implementation of a unit may have additional
microarchitectural state that is not visible externally
–	 Intra-transaction sequencing logic
–	 Pipeline registers
–	 Caches/buffers
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL7
Queues

ƒ Queues expose communication latency and decouple units’ execution
ƒ Queues are point-to-point channels only
–	 No fanout, a unit must replicate messages on multiple queues
–	 No buses in a UTL design (though implementation may use them)
ƒ	 Transactions can only pop head of input queues and push at most
one element onto each output queue
–	 Avoids exposing size of buffers in queues
–	 Also avoids synchronization inherent in waiting for multiple elements
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL8
Transactions

ƒ	 Transaction is a guarded atomic action on local state and
input and output queues
–	 Similar to Bluespec rule except a transaction might take a
variable number of cycles
ƒ	 Guard is a predicate that specifies when transaction can
execute
–	 Predicate is over architectural state and heads of input
queues
–	 Implicit conditions on input queues (data available) and
output queues (space available) that transaction accesses
ƒ	 Transaction can only pop up to one record from an input
queue and push up to one record on each output queue
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL9
Scheduler

Transactions
Scheduler
Input
queues
Output
queues
Arch.
State
Unit
ƒ	 Scheduling function decides on transaction priority based on local
state and state of input queues
– Simplest scheduler picks arbitrarily among ready transactions
ƒ	 Transactions may have additional predicates which indicate when
they can fire
– E.g., implicit condition on all necessary output queues being ready
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL10
UTL Example: IP Lookup

Packet
Input
Packet
Output
Queues
Lookup
Table
(Based on Lab 3 example)
Table
Access
Table
Replies
Transactions in decreasing scheduler priority
ƒ Table_Write (request on table access queue)
– Writes a given 12-bit value to a given 12-bit address
ƒ Table_Read (request on table access queue)
– Reads a 12-bit value given a 12-bit address, puts response on reply queue
ƒ Packet_Process (request on packet input queue)
– Looks up header in table and places routed packet on correct output queue
This level of detail is all the information we really need to understand what
the unit is supposed to do! Everything else is implementation.
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL11
UTL & Architectural-Level Verification

ƒ	 Can easily develop a sequential golden model of a UTL
description (pick a unit with a ready transaction and
execute that sequentially)
ƒ	 This is not straightforward if design does not obey UTL
discipline
–	 Much more difficult if units not decoupled by point-to-point
queues, or semantics of multiple operations depends on which other
operations run concurrently
ƒ	 Golden model is important component in verification
strategy
–	 e.g., can generate random tests and compare candidate design’s
output against architectural golden model’s output
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL12
UTL Helps Physical Design

ƒ	 Restricting inter-unit communication to point-
to-point queues simplifies physical layout of
units
–	 Can add latency on link to accommodate wire delay
without changing control logic
ƒ	 Queues also decouple control logic

–	 No interaction between schedulers in different units
except via queue full/empty status
–	 Bluespec methods can cause arbitrarily deep chain of
control logic if units not decoupled correctly
ƒ	 Units can run at different rates

– E.g., use more time-multiplexing in unit with lower
throughput requirements or use different clock
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL13
Refining IP Lookup to RTL

Packet
Output
Queues
Lookup
RAM
(See also Lab 3 handout)
Table
Replies
Completion
Buffer
Recirculation
Pipeline
Packet
Input
Table
Access
ƒ	 The recirculation pipeline registers and the completion buffer
are microarchitectural state that should be invisible to
external units.
ƒ	 Implementation must ensure atomicity of transactions:
–	 Completion buffer ensures packets flow through unit in order
–	 Must also ensure table write doesn’t appear to happen in middle
of packet lookup, e.g., wait for pipeline to drain before
performing write
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL14
Non-Blocking Cache Example

CPU
Memory
Memory unit transactions:
Load<address, tag>
returns Reply<tag, data>
Store<address,data> modifies memory
Load replies can be out-of-order
–	 Spec should strictly split load transaction
in two and include additional architectural
state in memory unit as otherwise no way
for loads to get reordered. Omitted
here for clarity.
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL15
Refining UTL Design

DRAM
CPU
Cache
State
ƒ	 Memory unit implemented as
two communicating units,
Cache and DRAM
ƒ	 CPU’s view of Memory unit
unchanged
– i.e., the cache state should
not be visible to the CPU
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL16
A DRAM Unit

DRAM
DRAM Unit reads and writes whole
cache lines (four words) in order
Transactions:
ƒ LoadLine<addr> returns
RepLine<dataline> from DRAM
ƒ StoreLine<addr,dataline>
updates DRAM
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL17
Non-Blocking Cache Unit

Data
Tags
Miss
Tags
Replay
Queues
Replay
State
Victim
Buffer
Victim Buffer holds evicted dirty
line awaiting writeback to
DRAM (writeback cache)
Miss Tags hold address of all
cache miss requests pending in
DRAM unit
Replay Queues hold secondary
misses for each miss tag
already requested from DRAM
Replay State holds state of any
active replay of a returned
cache line
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL18
CPU Load Transaction

Load<addr,tag> (if miss tag and replay queue free)
if (cache hit on addr) then

update replacement policy state bits

return Reply<tag,data> to CPU

else
if (hit in miss tags) then
append request <R,tag,addr[1:0]> to associated Replay Queue
else
allocate new miss tag and append <R,tag,addr[1:0]> to Replay
Queue

send LoadLine<addr> to DRAM unit

select victim line according to replacement policy

if victim dirty then copy to victim buffer

invalidate victim’s in-cache tag

Replay Queue holds entries with tag and offset of requested word within
cache line (addr<1:0>)
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL19
CPU Store Transaction

Store<addr,data> (if miss tag and replay queue free)
if (cache hit on addr) then

update replacement policy state bits

update cache data and set dirty bit on line

else
if (hit in miss tags) then
append request <W,addr[1:0],data> to associated Replay Queue
else
allocate new miss tag and append <W,addr[1:0],data> to Replay
Queue

send LoadLine<addr> to DRAM unit

select victim line according to replacement policy

if victim dirty then copy to victim buffer

invalidate victim’s in-cache tag

6.884 – Spring 2005 Krste, 3/14/05 L14-UTL20
Victim Writeback Transaction

(if buffered victim)
send StoreLine<victim.addr,victim.dataline> to DRAM unit
clear victim buffer
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL21
DRAM Response Transactions

RepLine <dataline> /* Receive DRAM Response Transaction */
locate associated miss tag (allocated in circular order)
locate invalid line in destination cache set
overwrite victim tag and data with new line
initialize replay state with new line and replay queue
(if replay state valid) /* Replay Transaction */
read next replay queue entry
if <R,addr,tag>, read from line and send Reply<tag,data> to
CPU

if <W,addr,data> write data to line and set its dirty bit

if no more reply queue entries then

clear replay state
deallocate miss tags and replay queue (circular buffer)
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL22
Cache Scheduler

Descending Priority
ƒ Replay
ƒ DRAM Response
ƒ Victim Writeback
ƒ CPU Load or Store
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL23
Design Template for Pipelined Unit
scheduler
Arch.
State 1
Arch.
State 2
ƒ	 Scheduler only fires transaction when it can complete without stalls
–	 Avoids driving heavily loaded stall signals
ƒ	 Architectural state (and outputs) only written in one stage of pipeline,
only read in same or earlier stages
–	 Simplifies hazard detection/prevention
ƒ	 Have different transaction types access expensive units (RAM read
ports, shifters, multiply units) in same pipeline stage to reduce area
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL24
Skid Buffering

Sched. Tags Data
Sched. Tags Data
Sched. Tags Data
Stop further
loads/stores
Primary Miss #1
Primary Miss #2
ƒ	 Consider non-blocking cache implemented as a three stage
pipeline: (scheduler, tag access, data access)
ƒ	 CPU Load/Store not admitted into pipeline unless miss tag, reply
queue,and victim buffer available in case of miss
ƒ	 If hit/miss determined at end of Tags stage, then second miss
could enter pipeline
ƒ	 Solutions?
–	 Could only allow one load/store every two cycles => low throughput
–	 Skid buffering: Add additional victim buffer, miss tags, and replay
queues to complete following transaction if miss. Stall scheduler
whenever there is not enough space for two misses.
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL25
Implementing Communication Queues

ƒ	 Queue can be implemented as centralized FIFO with single
control FSM if both ends are close to each other and directly
connected:
Cntl.
ƒ	 In large designs, there may be several cycles of communication
latency from one end to other. This introduces delay both in
forward data propagation and in reverse flow control
Recv.
Send
ƒ	 Control split into send and receive portions. A credit-based
flow control scheme is often used to tell sender how many units
of data it can send before overflowing receivers buffer.
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL26
End-End Credit-Based Flow Control

Recv.
Send
ƒ	 For one-way latency of N cycles, need 2*N buffers at
receiver
–	 Will take at least 2N cycles before sender can be informed
that first unit sent was consumed (or not) by receiver
ƒ	 If receive buffer fills up and stalls communication, will
take N cycles before first credit flows back to sender to
restart flow
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL27
Distributed Flow Control

Cntl. Cntl. Cntl.
ƒ	 An alternative to end-end control is distributed
flow control (chain of FIFOs)
ƒ	 Lower restart latency after stalls
ƒ	 Can require more circuitry and can increase
end-end latency
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL28
Buses

Bus
Cntl.
Bus
Wires
ƒ Buses were popular board-level option for implementing
communication as they saved pins and wires
ƒ Less attractive on-chip as wires are plentiful and buses are
slow and cumbersome with central control
ƒ Often used on-chip when shrinking existing legacy system design
onto single chip
ƒ Newer designs moving to either dedicated point-point unit
communications or an on-chip network
6.884 – Spring 2005 Krste, 3/14/05 L14-UTL29
On-Chip Network

Router
Router Router
Router
ƒ On-chip network
multiplexes long range
wires to reduce cost
ƒ	 Routers use distributed
flow control to transmit
packets
ƒ	 Units usually need end-
end credit flow control
in addition because
intermediate buffering
in network is shared by
all units
6.884 – Spring 2005 Krste, 3/14/05	 L14-UTL30

More Related Content

Similar to l14_utl.pdf

Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
ijceronline
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
DVClub
 

Similar to l14_utl.pdf (20)

Lab6 rtos
Lab6 rtosLab6 rtos
Lab6 rtos
 
Remote core locking (rcl)
Remote core locking (rcl)Remote core locking (rcl)
Remote core locking (rcl)
 
Implementation and evaluation of novel scheduler of UC/OS (RTOS)
Implementation and evaluation of novel scheduler of UC/OS (RTOS)Implementation and evaluation of novel scheduler of UC/OS (RTOS)
Implementation and evaluation of novel scheduler of UC/OS (RTOS)
 
Sqlnet
SqlnetSqlnet
Sqlnet
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptx
 
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
 
Ijecet 06 08_004
Ijecet 06 08_004Ijecet 06 08_004
Ijecet 06 08_004
 
Fpga & VHDL
Fpga & VHDLFpga & VHDL
Fpga & VHDL
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
EMEA Airheads- Switch stacking_ ArubaOS Switch
EMEA Airheads- Switch stacking_ ArubaOS SwitchEMEA Airheads- Switch stacking_ ArubaOS Switch
EMEA Airheads- Switch stacking_ ArubaOS Switch
 
Guidelines for-early-power-analysis
Guidelines for-early-power-analysisGuidelines for-early-power-analysis
Guidelines for-early-power-analysis
 
Presentation on Behavioral Synthesis & SystemC
Presentation on Behavioral Synthesis & SystemCPresentation on Behavioral Synthesis & SystemC
Presentation on Behavioral Synthesis & SystemC
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
 
cs-procstruc.ppt
cs-procstruc.pptcs-procstruc.ppt
cs-procstruc.ppt
 
200923 01en
200923 01en200923 01en
200923 01en
 
Presentation systemc
Presentation systemcPresentation systemc
Presentation systemc
 
Xilinx timing closure
Xilinx timing closureXilinx timing closure
Xilinx timing closure
 
Rtos
RtosRtos
Rtos
 
Orbit GSM UMTS LTE parser platform - ETL tool
Orbit GSM UMTS LTE parser platform - ETL toolOrbit GSM UMTS LTE parser platform - ETL tool
Orbit GSM UMTS LTE parser platform - ETL tool
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and Optimization
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

l14_utl.pdf

  • 1. Transaction-Level Design 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL1
  • 2. Hardware Design Abstraction Levels Algorithm Circuits Application Guarded Atomic Actions (Bluespec) Devices Unit-Transaction Level (UTL) Model Gates Physics Register-Transfer Level (Verilog RTL) Today’s Lecture 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL2
  • 3. Application to RTL in One Step? Modern hardware systems have complex functionality (graphics chips, video encoders, wireless communication channels), but sometimes designers try to map directly to an RTL cycle-level microarchitecture in one step ƒ Requires detailed cycle-level design of each sub-unit – Significant design effort required before clear if design will meet goals ƒ Interactions between units becomes unclear if arbitrary circuit connections allowed between units, with possible cycle-level timing dependencies – Increases complexity of unit specifications ƒ Removes degrees of freedom for unit designers – Reduces possible space for architecture exploration ƒ Difficult to document intended operation, therefore difficult to verify 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL3
  • 4. Transaction-Level Design Arch. Arch. State State Arch. State Unit 1 Unit 2 Unit 3 Shared Memory Unit ƒ Model design as messages flowing through FIFO buffers between units containing architectural state ƒ Each unit can independently perform an operation, or transaction, that may consume messages, update local state, and send further messages ƒ Transaction and/or communication might take many cycles (i.e., not necessarily a single Bluespec rule) 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL4
  • 5. 6.884 UTL Discipline ƒ Various forms of transaction-level model are becoming increasingly used in commercial designs ƒ UTL (Unit-Transaction Level) models are the variant we’ll use in 6.884 ƒ UTL forces a discipline on top-level design structure that will result in clean hardware designs that are easier to document and verify, and which should lead to better physical designs – A discipline restricts hardware designs, with the goal of avoiding bad choices ƒ UTL specs are not directly executable (yet), but could be easily implemented in C/C++/Java/SystemC to give a golden model for design verification – Bluespec will often, but not always, be sufficient for UTL model ƒ You’re required to give an initial UTL description (in English text) of your project design by April 1 project milestone 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL5
  • 6. UTL Overview Unit comprises: Transactions Scheduler Input queues Output queues Arch. State Unit ƒ Architectural state (registers + RAMs) ƒ Input queues and output queues connected to other units ƒ Transactions (atomic operations on state and queues) ƒ Scheduler (combinational function to pick next transaction to run) 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL6
  • 7. Unit Architectural State Arch. State ƒ Architectural state is any state that is visible to an external agent – i.e, architectural state can be observed by sending strings of packets into input queues and looking at values returned at outputs. ƒ High-level specification of a unit only refers to architectural state ƒ Detailed implementation of a unit may have additional microarchitectural state that is not visible externally – Intra-transaction sequencing logic – Pipeline registers – Caches/buffers 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL7
  • 8. Queues ƒ Queues expose communication latency and decouple units’ execution ƒ Queues are point-to-point channels only – No fanout, a unit must replicate messages on multiple queues – No buses in a UTL design (though implementation may use them) ƒ Transactions can only pop head of input queues and push at most one element onto each output queue – Avoids exposing size of buffers in queues – Also avoids synchronization inherent in waiting for multiple elements 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL8
  • 9. Transactions ƒ Transaction is a guarded atomic action on local state and input and output queues – Similar to Bluespec rule except a transaction might take a variable number of cycles ƒ Guard is a predicate that specifies when transaction can execute – Predicate is over architectural state and heads of input queues – Implicit conditions on input queues (data available) and output queues (space available) that transaction accesses ƒ Transaction can only pop up to one record from an input queue and push up to one record on each output queue 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL9
  • 10. Scheduler Transactions Scheduler Input queues Output queues Arch. State Unit ƒ Scheduling function decides on transaction priority based on local state and state of input queues – Simplest scheduler picks arbitrarily among ready transactions ƒ Transactions may have additional predicates which indicate when they can fire – E.g., implicit condition on all necessary output queues being ready 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL10
  • 11. UTL Example: IP Lookup Packet Input Packet Output Queues Lookup Table (Based on Lab 3 example) Table Access Table Replies Transactions in decreasing scheduler priority ƒ Table_Write (request on table access queue) – Writes a given 12-bit value to a given 12-bit address ƒ Table_Read (request on table access queue) – Reads a 12-bit value given a 12-bit address, puts response on reply queue ƒ Packet_Process (request on packet input queue) – Looks up header in table and places routed packet on correct output queue This level of detail is all the information we really need to understand what the unit is supposed to do! Everything else is implementation. 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL11
  • 12. UTL & Architectural-Level Verification ƒ Can easily develop a sequential golden model of a UTL description (pick a unit with a ready transaction and execute that sequentially) ƒ This is not straightforward if design does not obey UTL discipline – Much more difficult if units not decoupled by point-to-point queues, or semantics of multiple operations depends on which other operations run concurrently ƒ Golden model is important component in verification strategy – e.g., can generate random tests and compare candidate design’s output against architectural golden model’s output 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL12
  • 13. UTL Helps Physical Design ƒ Restricting inter-unit communication to point- to-point queues simplifies physical layout of units – Can add latency on link to accommodate wire delay without changing control logic ƒ Queues also decouple control logic – No interaction between schedulers in different units except via queue full/empty status – Bluespec methods can cause arbitrarily deep chain of control logic if units not decoupled correctly ƒ Units can run at different rates – E.g., use more time-multiplexing in unit with lower throughput requirements or use different clock 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL13
  • 14. Refining IP Lookup to RTL Packet Output Queues Lookup RAM (See also Lab 3 handout) Table Replies Completion Buffer Recirculation Pipeline Packet Input Table Access ƒ The recirculation pipeline registers and the completion buffer are microarchitectural state that should be invisible to external units. ƒ Implementation must ensure atomicity of transactions: – Completion buffer ensures packets flow through unit in order – Must also ensure table write doesn’t appear to happen in middle of packet lookup, e.g., wait for pipeline to drain before performing write 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL14
  • 15. Non-Blocking Cache Example CPU Memory Memory unit transactions: Load<address, tag> returns Reply<tag, data> Store<address,data> modifies memory Load replies can be out-of-order – Spec should strictly split load transaction in two and include additional architectural state in memory unit as otherwise no way for loads to get reordered. Omitted here for clarity. 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL15
  • 16. Refining UTL Design DRAM CPU Cache State ƒ Memory unit implemented as two communicating units, Cache and DRAM ƒ CPU’s view of Memory unit unchanged – i.e., the cache state should not be visible to the CPU 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL16
  • 17. A DRAM Unit DRAM DRAM Unit reads and writes whole cache lines (four words) in order Transactions: ƒ LoadLine<addr> returns RepLine<dataline> from DRAM ƒ StoreLine<addr,dataline> updates DRAM 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL17
  • 18. Non-Blocking Cache Unit Data Tags Miss Tags Replay Queues Replay State Victim Buffer Victim Buffer holds evicted dirty line awaiting writeback to DRAM (writeback cache) Miss Tags hold address of all cache miss requests pending in DRAM unit Replay Queues hold secondary misses for each miss tag already requested from DRAM Replay State holds state of any active replay of a returned cache line 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL18
  • 19. CPU Load Transaction Load<addr,tag> (if miss tag and replay queue free) if (cache hit on addr) then update replacement policy state bits return Reply<tag,data> to CPU else if (hit in miss tags) then append request <R,tag,addr[1:0]> to associated Replay Queue else allocate new miss tag and append <R,tag,addr[1:0]> to Replay Queue send LoadLine<addr> to DRAM unit select victim line according to replacement policy if victim dirty then copy to victim buffer invalidate victim’s in-cache tag Replay Queue holds entries with tag and offset of requested word within cache line (addr<1:0>) 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL19
  • 20. CPU Store Transaction Store<addr,data> (if miss tag and replay queue free) if (cache hit on addr) then update replacement policy state bits update cache data and set dirty bit on line else if (hit in miss tags) then append request <W,addr[1:0],data> to associated Replay Queue else allocate new miss tag and append <W,addr[1:0],data> to Replay Queue send LoadLine<addr> to DRAM unit select victim line according to replacement policy if victim dirty then copy to victim buffer invalidate victim’s in-cache tag 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL20
  • 21. Victim Writeback Transaction (if buffered victim) send StoreLine<victim.addr,victim.dataline> to DRAM unit clear victim buffer 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL21
  • 22. DRAM Response Transactions RepLine <dataline> /* Receive DRAM Response Transaction */ locate associated miss tag (allocated in circular order) locate invalid line in destination cache set overwrite victim tag and data with new line initialize replay state with new line and replay queue (if replay state valid) /* Replay Transaction */ read next replay queue entry if <R,addr,tag>, read from line and send Reply<tag,data> to CPU if <W,addr,data> write data to line and set its dirty bit if no more reply queue entries then clear replay state deallocate miss tags and replay queue (circular buffer) 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL22
  • 23. Cache Scheduler Descending Priority ƒ Replay ƒ DRAM Response ƒ Victim Writeback ƒ CPU Load or Store 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL23
  • 24. Design Template for Pipelined Unit scheduler Arch. State 1 Arch. State 2 ƒ Scheduler only fires transaction when it can complete without stalls – Avoids driving heavily loaded stall signals ƒ Architectural state (and outputs) only written in one stage of pipeline, only read in same or earlier stages – Simplifies hazard detection/prevention ƒ Have different transaction types access expensive units (RAM read ports, shifters, multiply units) in same pipeline stage to reduce area 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL24
  • 25. Skid Buffering Sched. Tags Data Sched. Tags Data Sched. Tags Data Stop further loads/stores Primary Miss #1 Primary Miss #2 ƒ Consider non-blocking cache implemented as a three stage pipeline: (scheduler, tag access, data access) ƒ CPU Load/Store not admitted into pipeline unless miss tag, reply queue,and victim buffer available in case of miss ƒ If hit/miss determined at end of Tags stage, then second miss could enter pipeline ƒ Solutions? – Could only allow one load/store every two cycles => low throughput – Skid buffering: Add additional victim buffer, miss tags, and replay queues to complete following transaction if miss. Stall scheduler whenever there is not enough space for two misses. 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL25
  • 26. Implementing Communication Queues ƒ Queue can be implemented as centralized FIFO with single control FSM if both ends are close to each other and directly connected: Cntl. ƒ In large designs, there may be several cycles of communication latency from one end to other. This introduces delay both in forward data propagation and in reverse flow control Recv. Send ƒ Control split into send and receive portions. A credit-based flow control scheme is often used to tell sender how many units of data it can send before overflowing receivers buffer. 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL26
  • 27. End-End Credit-Based Flow Control Recv. Send ƒ For one-way latency of N cycles, need 2*N buffers at receiver – Will take at least 2N cycles before sender can be informed that first unit sent was consumed (or not) by receiver ƒ If receive buffer fills up and stalls communication, will take N cycles before first credit flows back to sender to restart flow 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL27
  • 28. Distributed Flow Control Cntl. Cntl. Cntl. ƒ An alternative to end-end control is distributed flow control (chain of FIFOs) ƒ Lower restart latency after stalls ƒ Can require more circuitry and can increase end-end latency 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL28
  • 29. Buses Bus Cntl. Bus Wires ƒ Buses were popular board-level option for implementing communication as they saved pins and wires ƒ Less attractive on-chip as wires are plentiful and buses are slow and cumbersome with central control ƒ Often used on-chip when shrinking existing legacy system design onto single chip ƒ Newer designs moving to either dedicated point-point unit communications or an on-chip network 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL29
  • 30. On-Chip Network Router Router Router Router ƒ On-chip network multiplexes long range wires to reduce cost ƒ Routers use distributed flow control to transmit packets ƒ Units usually need end- end credit flow control in addition because intermediate buffering in network is shared by all units 6.884 – Spring 2005 Krste, 3/14/05 L14-UTL30