In the physical design flow, placement follows floorplanning and assigns exact locations to modules such as gates, standard cells and macros. The goal is to minimize area and interconnect cost while meeting timing constraints, so that the design is easier to route. Placement tools take the netlist, floorplan, libraries, and constraints as input and perform global placement, detailed placement, and placement optimization. The quality of placement significantly affects whether the design can be routed successfully.
Timing and Design Closure in Physical Design Flows Olivier Coudert
A physical design flow consists of producing a production-worthy layout from a gate-level netlist subject to a set of constraints. We focus on the problems imposed by shrinking process technologies. It exposes the problems of timing closure, signal integrity, design variable dependencies, clock and power/ground routing, and design signoff. It also surveys some physical design flows, and outlines a refinement-based flow.
Define Width and Height of Core and Die (http://www.vlsisystemdesign.com/PD-F...VLSI SYSTEM Design
https://www.udemy.com/vlsi-academy
The very first step in chip design is floorplanning, in which the width and height of the chip, basically the area of the chip, is defined. A chip consists of two parts, 'core' and 'die'.
This is the presentation that was shared by Nilesh Ranpura and Vineeth Mathramkote at CDNLIVE 2015. The session briefs about the implementation challenges and covers the solution approach and how to achieve results
This is a custom GUI, which eases fixing violations either by adding buffer, cloning or sizing. Drop down menu item is created in ICC2 layout window. Desired terminals can be selected by dragging or adding points in rectilinear fashion and desired locations can be selected for adding new buffer.
Formal equivalence checking process is a part of electronic design automation (EDA), commonly used during the development of digital integrated circuits, to formally prove that two representations of a circuit design exhibit exactly the same behavior.
Clock Tree Synthesis (CTS) distributes the clock to all sequential elements of a VLSI design by inserting buffers and inverters along the clock routes, with the aim of balancing skew and reducing insertion latency. CTS takes the placement data and the clock tree constraints as input. Before CTS, all clock pins are driven by a single clock source. CTS includes both clock tree construction and clock tree balancing. Clock tree inverters may be used to build a tree that maintains the correct transition (duty cycle), and clock tree buffers (CTB) can balance the tree to meet the skew and latency requirements. To meet the area and power limits, as few clock tree inverters and buffers as possible should be used.
https://www.udemy.com/vlsi-academy
Usually, while drawing any circuit on paper, we have only one 'vdd' at the top and one 'vss' at the bottom. But on a chip, it becomes necessary to have a grid structure of power, with more than one 'vdd' and 'vss'. The concept of power grid structure would be uploaded soon. It is actually the scaling trend that drives chip designers for power grid structure.
VLSI Physical Design Flow(http://www.vlsisystemdesign.com)VLSI SYSTEM Design
A typical design flow follows the structure below and can be broken down into multiple steps. Some of these phases happen in parallel and some sequentially.
Requirements
A customer of a semiconductor firm is typically some other company who plans to use the chip in its systems or end products. So, the customer's requirements also play an important role in deciding how the chip should be designed.
The first step is to collect the requirements, estimate the end product's market value, and evaluate the number of resources required to do the project.
Specifications
The next step is to collect specifications that abstractly describe the functionality, interfaces, and overall architecture of the chip to be designed. This can be something along the lines of:
Requires computational power to run imaging algorithms to support virtual reality.
Requires two ARM A53 processors with coherent interconnect and should run at 600 MHz.
Requires USB 3.0, Bluetooth, and PCIe 2nd gen interfaces.
It should support 1920x1080 pixel displays with an appropriate controller.
Digital Design
Because of the complex nature of modern chips, it's impossible to build something from scratch, and in many cases, many components will be reused.
For example, company A requires a FlexCAN module to interact with other modules in an automobile. They can either buy the FlexCAN design from another company to save time and effort or spend resources to build one.
It's not practical to design such a system from basic building blocks such as flip-flops and CMOS transistors.
Instead, a behavioral description is developed to analyze the design in terms of functionality, performance, and other high-level issues using a Hardware Description Language such as Verilog or VHDL.
This is usually done by a digital designer and is similar to a high-level computer programmer equipped with digital electronics skills.
Verification
Once the RTL design is ready, it needs to be verified for functional correctness.
For example, a DSP processor is expected to issue bus transactions to fetch instructions from memory, and verification must confirm that this happens as expected.
Functional verification is required at this point. It is done with EDA simulators that can model the design and apply different stimuli to it. This is the job of a pre-silicon verification engineer.
Logic Synthesis
Now we will convert this design into hardware schematic with real elements such as combinational gates and flip-flops. This step is called synthesis.
Logic synthesis tools enable the conversion of RTL description in HDL to a gate-level netlist. This netlist is a description of the circuit in terms of gates and connections between them.
Logic synthesis tools ensure that the netlist meets timing, area, and power specifications. Typically, they have access to different technology node
ASIC DESIGN OF MINI-STEREO DIGITAL AUDIO PROCESSOR UNDER SMIC 180NM TECHNOLOGYIlango Jeyasubramanian
- Designed and analyzed a complete MSDAP with optimized convolution computation by only shifts and adds using power-of-2 coefficients. Synthesized the chip through high level architecture design (C Program), Logic synthesis (Synopsys Design Compiler) and Physical Synthesis (Synopsys IC compiler).
- Achieved a low power consumption of 3.1438mW at 29.186Mhz clock frequency, with core utilization of 70% and chip area of 1.29mm2.
System on Chip is an IC that integrates all the components of an electronic system. This presentation is based on the current trends and challenges in IP-based SoC design.
Visit https://www.vlsiuniverse.com/
https://www.vlsiuniverse.com/2020/05/complete-asic-design-flow.html
This is the standard VLSI design flow that every semiconductor company follows. The complete ASIC design flow is explained by considering each and every stage.
2. Agenda
Introduction to design flow and Backend
Introduction to design planning
Floorplanning / Hierarchical design
Power planning
Summary
12. The Physical Design Task
[Figure: the Front End supplies the Verilog netlist and SDC constraints; the Back End Physical Design Flow produces GDSII]
13. Example Physical Design Flow
Design/Constraints Import
Floorplanning
Placement
Clock Tree Synthesis
Routing
Post Route Optimization
Layout Verification / Finishing
14. Fullchip Design Overview
The location of the core, I/O areas, P/G pads and the P/G grid
[Figure labels: Core placement area, RAM, ROM, IP, P/G rings, P/G grid, straps, periphery (I/O) area]
15. Where Do We Start? - Design Planning
[Figure: Verilog netlist and SDC constraints enter the Physical Design Flow]
How do we handle?
Die size
IO / Hard-IP placement
Global clock distribution
Power planning
Flat versus hierarchical design
16. Design Planning
Floorplanning
Determine die size
Shape and arrange hierarchical blocks
Integrate hard-IP efficiently
Predict and prevent congestion hotspots and critical timing paths
Power planning
Create power distribution grid
Consider IR drop and Electromigration
Implement power saving techniques
Power gating
Multi-Voltage design / Voltage islands
17. Agenda
Introduction to design planning
Floorplanning
Setup/configuration
Die size, utilization, metallization scheme
IO-ring and macro placement
Flat versus hierarchical design
Hierarchical design planning issues
Power planning
Summary
18. Setup/configuration
Read netlist
Check netlist: high fanout, unique, unconnected inputs
Read SDC
Read .lib files
Standard cell area
Check timing without wire load
Read footprint for P&R
LEF : SOC Encounter
FRAM : Synopsys tools
Read technology file
Metal width … (DRC rules)
19. Floorplanning – Die Size, Utilization & Metal Stack-up
Choosing the die size, initial standard cell utilization and metallization scheme involves several design tradeoffs (Schedule, Cost, Performance)
Larger die
Easier to route, less congestion, lower cap (decrease signal/power integrity related problems), faster design cycle
Higher cost, higher power
More dense power grid
Reduce risk of power related failures
Increase number of metal layer masks, reduce signal route tracks
21. Floorplanning – Utilization
Utilization refers to the percentage of core area that is taken up by standard cells.
A typical starting utilization might be 70%
This can vary a lot depending on the design
High utilization can make it difficult to close a design
Routing congestion,
Negative impact during optimization legalization stages.
Utilization changes should be examined after each stage of the flow
Avoid having large increases after placement optimization
Feedback should be given to front-end designers
Topographical synthesis is now possible
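The utilization definition above can be turned into a small check that is rerun after each stage of the flow. This is only a sketch: the function name, the stage names and all area figures below are made up for illustration and do not come from any tool report.

```python
def utilization(std_cell_area_um2, core_area_um2):
    """Return standard-cell utilization as a fraction of the core area."""
    if core_area_um2 <= 0:
        raise ValueError("core area must be positive")
    return std_cell_area_um2 / core_area_um2


# Hypothetical standard-cell area after each stage, for a 1,000,000 um^2 core.
core_area = 1_000_000.0
stage_cell_area = {
    "floorplan": 700_000.0,
    "placement": 745_000.0,
    "cts": 760_000.0,
}

# Utilization per stage, so that large jumps between stages can be spotted.
report = {stage: utilization(area, core_area)
          for stage, area in stage_cell_area.items()}

for stage, value in report.items():
    print(f"{stage}: {value:.1%}")
```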
22. Initialize Floorplan
Define globals (VDD1, VDD2, GND1, …)
Define core area : (cells + utilization factor)
Shape can be implied by a macro
Place IO (fixed, equidistant, …)
Take macros and power domains into account already
[Figure: IO ring around the core, with an [analog] macro]
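As a rough illustration of deriving the core area from the cell area and a utilization factor, the sketch below sizes a rectangular core. The function name, the aspect-ratio parameter (height divided by width) and the numbers are assumptions for the example; real floorplans also have to account for macros, power domains and the IO ring.

```python
import math


def core_dimensions(cell_area_um2, target_utilization, aspect_ratio=1.0):
    """Return (width, height) in um of a rectangular core.

    aspect_ratio is height / width (an assumption of this sketch).
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    if aspect_ratio <= 0:
        raise ValueError("aspect ratio must be positive")
    core_area = cell_area_um2 / target_utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = width * aspect_ratio
    return width, height


# Hypothetical design: 700,000 um^2 of standard cells at 70% utilization.
w, h = core_dimensions(700_000.0, 0.70)
print(f"core: {w:.1f} um x {h:.1f} um")
```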
23. IO Ring and Large Macro Placement
IO Ring is often decided by front-end designers, with input from physical design and packaging engineers.
When placing large macros we must consider impacts on routing, timing and power.
For wire-bond place power hungry macros away from the chip center.
[Figure label: Possible routing congestion hotspots]
24. Flat Versus Hierarchical Design
What happens if the design is too big to be handled by the EDA tools?
Hierarchical Design
[Figure: the fullchip design (I/O pads, IP macros) is split into blocks / tiles Blk 1, Blk 2 and Blk 3, each with its own P&R flow, followed by fullchip timing & verification]
25. Flat Versus Hierarchical Design
Hierarchical Design
Advantages
Faster runtime, less memory needed for EDA tools
Faster eco turn-around time
Ability to do design re-use
Disadvantages
Much more difficult for fullchip timing closure (ILMs)
More intensive design planning needed, feedthrough generation, repeater insertion, timing constraint budgeting.
26. Hierarchical Design : Specify Partitions / Plan Groups
Netlist must have partitions as top level modules.
Partitions generally sized according to a target initial utilization
~70% utilization, ~300k-700k instances
Channels or abutment
Rectilinear block shapes are possible
[Figure labels: Abutment, Channels, Rectilinear Blocks]
27. Hierarchical Design : Pin Assignment
Pin constraints include parameters such as,
Layers, spacing, size, overlap
Net groups, pin guides
Pins can be assigned placement-based (flightlines) or route-based (trial route, boundary crossings).
Pin guides can be used to influence automatic pin placement of particular net groups
Pins at partition corners can make routing difficult
[Figure labels: Partition, Pin guide 1, Pin guide 2]
28. Hierarchical Design : Timing Budgeting
Chip level constraints must be mapped correctly to block level constraints
The design must be placed, trial routed and have pins assigned before running budgeting
Block level constraints will be assigned input or output delays on I/O ports based off of the estimated timing slack.
set_input_delay 1.5 [get_ports IN1]
[Figure: port IN1 at the block boundary with a 1.5ns input delay]
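One simple way to picture budgeting is to split the clock period across the segments of a chip-level path in proportion to their estimated delays, and then read off the input delay seen at a block port. The proportional rule, the segment names and the numbers below are assumptions chosen for illustration; commercial budgeting tools use their own, more elaborate allocation schemes.

```python
def budget_path(segment_delays_ns, clock_period_ns):
    """Scale estimated segment delays so that they sum to the clock period."""
    total = sum(segment_delays_ns.values())
    if total <= 0:
        raise ValueError("total estimated delay must be positive")
    scale = clock_period_ns / total
    return {name: delay * scale for name, delay in segment_delays_ns.items()}


# Hypothetical path: launched in blk1, crosses top-level wiring, captured in blk2.
segments = {"blk1": 2.0, "top": 1.0, "blk2": 3.0}
budgets = budget_path(segments, clock_period_ns=10.0)

# The input delay on the blk2 port is the time budgeted before the signal arrives.
blk2_input_delay = budgets["blk1"] + budgets["top"]
print(f"input delay for blk2 port: {blk2_input_delay:.2f} ns")
```

The resulting value is what would be passed to a block-level constraint such as set_input_delay for that port.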
29. Hierarchical Design : Timing Budgeting & Fullchip Timing Closure
Fullchip timing closure is typically a bottleneck for design cycles.
Block-level P&R flow does not emphasize io-to-flop, flop-to-io, io-to-io timing paths, because budgeted constraints are only estimates
Interface logic models (ILMs) can be used
To speed-up timing analysis runs when fullchip design is too large.
Required clock and datapaths are preserved, net/cell names are identical
[Figure: Original Netlist versus Interface Logic Model (ILM), each with ports A, B, Clk, X and Y]
30. Agenda
Introduction to design planning
Floorplanning
Power planning
Intro to power issues in IC design
Basic power grid creation
Multi-voltage design & power gating
Automated power grid design flows
Summary
31. Power Consumption and Reliability
[Figure: dynamic power (average power), static power (leakage power) and power density, together with the floorplan and the design of the grid, lead to an IR-drop / voltage drop problem (fail) and an electromigration (EM) problem in the long run]
1 out of 5 chips fail due to excessive power consumption
32. Power Consumption and Reliability : IR-Drop
The drop in supply voltage over the length of the supply line
A resistance matrix of the power grid is constructed
The average current of each gate is considered
The matrix is solved for the current at each node, to determine the IR-drop.
[Figure labels: VDD Pad, VDD]
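For intuition, the sketch below handles the simplest possible case: a single rail fed from one VDD pad at its left end. In that special case the nodal equations reduce to running sums, so no general matrix solve is needed; a real power grid is a two-dimensional mesh and does require solving the full resistance matrix. The function name and all values are illustrative assumptions.

```python
def rail_ir_drop(segment_res_ohm, node_current_a):
    """IR drop (in volts) at each node of a rail fed from a single pad.

    segment_res_ohm[k] is the resistance between node k-1 (or the pad, for
    k = 0) and node k. node_current_a[k] is the average current drawn at node k.
    """
    if len(segment_res_ohm) != len(node_current_a):
        raise ValueError("need one segment resistance per node")
    drops = []
    drop = 0.0
    downstream = sum(node_current_a)  # current flowing through segment 0
    for res, current in zip(segment_res_ohm, node_current_a):
        drop += res * downstream      # V = I * R across this segment
        drops.append(drop)
        downstream -= current         # this node's current leaves the rail
    return drops


# Hypothetical rail: three 0.1 ohm segments, nodes drawing 10, 20 and 30 mA.
drops = rail_ir_drop([0.1, 0.1, 0.1], [0.01, 0.02, 0.03])
print([f"{v * 1000:.1f} mV" for v in drops])
```

The node farthest from the pad sees the largest drop, which is one reason power-hungry blocks are placed close to the power pads.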
33. Where does all the power go to?
Total Power
Core + I/O
•Separate supply ring
•Often higher voltage
•Fixed, no optimization
Standard Cells + Macros
•Clock network
34. Agenda
Introduction to design planning
Floorplanning
Power planning
Intro to power issues in IC design
Basic power grid creation
Multi-voltage design & power gating
Automated power grid design flows
Summary
35. Power Grid Creation : Macro Placement
Blocks with the highest performance and highest power consumption
Close to border power pads (IR drop)
Away from each other (EM)
38. Automated Power Grid Design: PNS & PNA
Power grid creation has usually been done by hand using rules of thumb for widths and number of straps
Analysis often done late in the design flow
Grid is typically over-designed to prevent time-intensive power grid changes.
When incorporating advanced low-power strategies, there are too many variables to achieve an optimal result manually.
For more complex designs an automated strategy is preferred.
e.g. Power Network Synthesis (PNS) and Power Network Analysis (PNA) from Synopsys
Allows designers to anticipate effects of floorplanning
39. Power Network Analysis (PNA)
There are EDA tools that allow early power network analysis for designs in the early floorplanning stage.
Not signoff quality, but good enough for initial design.
e.g. Synopsys Power Network Analysis (PNA)
[Figure labels: VDD Pad, VDD]
40. Power Network Synthesis: PNS – What?
Goal is to QUICKLY find minimum routing resource required to meet specified IR drop target
More power routing => easier to reach IR-drop target, but harder to route clock and signals with remaining tracks
[Figure labels: Power straps (in Red), Power pads, Power trunks, Power rings]
42. PNS : Create Power Routing
After running trials, an optimal power grid can be chosen and the actual rails can be laid out.
Virtual rails => actual rails
Outside main PNS : memory footprint + cpu time
Many options : e.g. % via penetration, order of routing …
Check legal cell/pin placement (grid aligned ?)
Depending on the design phase
What cells, nets and layers
e.g. First macros and pads, then high voltage areas, …
Secondary PG ports on level shifters, isol. cells, ret. regs
Later after placement during routing : same as the follow pins for the normal vdd and gnd of the std cells.
44. Summary
The goal of design planning is to arrange the chip so that the “Place and Route” flow can converge quickly and easily.
Design experience is needed
Floorplan is driven by :
Power
Timing
Congestion
Minimum area
There is no 1 way to create a floorplan
Flat – hierarchical
Regions, position of the macros
Order of placement IO versus macros versus core
This phase can take a significant portion of the complete backend design time.
Early analysis of power grid is essential for avoiding major problems near the end of the design cycle.
Automated power grid tools may help reduce necessary safety margins.
46. Placement in the Flow
[Figure: the Front-End covers Design Specification, Logic Design and Verification, and Logic Synthesis. The Back-End (Physical Design Stage) covers Floorplanning, Placement and Routing, and takes the Netlist, Physical Libraries and Physical Design Constraints as inputs.]
47. Definition of Placement
Placement : Exact placement of the modules (modules can be gates, standard cells, macros…). The general goal is to minimize the total area and interconnect cost.
The quality of the attainable routing is highly determined by the placement.
Circuit placement becomes very critical in 90nm and below technologies.
48. Cost Function for Placement
Cost components : Methods of consideration
Area, Wire length, Overlap : Traditional methods of Placement
Timing : Timing-driven Placement
Congestion : Congestion-driven Placement
Clock : Clock Gating
Power : Multivoltage and Multisupply Placement
49. Placement Steps
Input information:
Netlist
Mapped and floorplanned design
Logical and physical libraries
Design constraints
Reading Gate-level netlists from synthesis
Global placement
Detailed placement
Placement optimization
Output information:
Physical layout information
Cell placement locations
Physical layout, timing, and technology information of reference libraries
50. Inputs for the Placement Tool
[Figure: the placement tool takes the gate-level netlist, design constraints, the floorplanned design, design libraries (logical: target; physical: macro cell and standard cell reference libraries) and the technology file]
51. Inside A Physical Library
Example .lef
MACRO AN2D0
  CLASS CORE ;
  FOREIGN AN2D0 0.000 0.000 ;
  ORIGIN 0.000 0.000 ;
  SIZE 1.400 BY 2.520 ;
  SYMMETRY x y ;
  SITE core ;
  PIN Z
    ANTENNADIFFAREA 0.1680 ;
    DIRECTION OUTPUT ;
    PORT
      LAYER M1 ;
      RECT 1.300 0.640 1.330 1.675 ;
      RECT 1.190 0.640 1.300 1.780 ;
      RECT 1.140 0.640 1.190 0.900 ;
      RECT 1.140 1.520 1.190 1.780 ;
    END
  END Z
  PIN A2
    ANTENNAGATEAREA 0.0704 ;
    DIRECTION INPUT ;
    PORT
      LAYER M1 ;
      RECT 0.610 0.975 0.770 1.545 ;
    END
  …
Annotations: Dimension (“bounding box”), Symmetry (X, Y, or 90º), Pins (direction, layer and shape), Blockage, reference point (typically 0,0)
[Figure: Abstract View of cell NAND_1 with pins A, B and F (and the label Y) between the VDD and GND rails]
52. Technology Information
For each tool, a specific set of files are required to provide details about the metal layers for the chosen process technology…
Number and name designations for each layer/via
Physical and electrical characteristics for each layer
Dielectric constant
Design rules for each layer (min spacing, min width, etc…)
Units and precision for numerical values
Example filetypes
.lefhdr, .tf -> contain layer and design rule information
Also, there are files that enable improved RC estimation that can be read by the placement engines.
.captable, .tluplus -> store RC coefficients.
53. Physical Technology Data
The technology files contain design rule information that can be read by the tools
For example, the spacing table constrains the parallel runlength of adjacent wires on the same layer.
Wire width and pitch are also described, as well as any more complex design rules for routing.
Example .lefhdr
LAYER M1
  TYPE ROUTING ;
  DIRECTION HORIZONTAL ;
  OFFSET 0 ;
  PITCH 0.280 ;
  WIDTH 0.120 ;
  MAXWIDTH 12.000 ;
  AREA 0.058 ;
  MINENCLOSEDAREA 0.200 ;
  THICKNESS 0.240 ;
  HEIGHT 0.765 ;
  SPACINGTABLE
    PARALLELRUNLENGTH 0.00 0.52 1.50 4.50
    WIDTH 0.00 0.12 0.12 0.12 0.12
    WIDTH 0.30 0.12 0.17 0.17 0.17
    WIDTH 1.50 0.12 0.17 0.50 0.50
    WIDTH 4.50 0.12 0.17 0.50 1.50 ;
  MINIMUMCUT 2 WIDTH 0.42 ;
  MINIMUMCUT 4 WIDTH 0.98 FROMABOVE ;
  MINIMUMCUT 2 WIDTH 0.70 LENGTH 0.70 WITHIN 1.001 ;
  MINIMUMCUT 2 WIDTH 2.00 LENGTH 2.00 WITHIN 2.001 ;
  MINIMUMCUT 2 WIDTH 3.00 LENGTH 10.0 WITHIN 5.001 ;
  MINIMUMDENSITY 15 ;
  MAXIMUMDENSITY 70 ;
  DENSITYCHECKWINDOW 50 50 ;
  DENSITYCHECKSTEP 50 ;
  FILLACTIVESPACING 0.60 ;
54. Global and Detail Placement
Reading Gate-Level Netlist from synthesis
Global Placement
Detailed Placement
Placement optimization
55. Global Placement
Standard cells are placed into groups such that the number of connections between groups is minimized.
This is solved through circuit partitioning.
[Figure: Bad Placement versus Good Placement]
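The partitioning objective can be made concrete by counting how many nets cross between groups (the cut size). The sketch below only evaluates two given groupings; it does not search for the best one, which is what an actual partitioning algorithm would do. The four-cell netlist and both groupings are made up for illustration.

```python
def cut_size(nets, group_of):
    """Count the nets whose cells fall into more than one group."""
    return sum(1 for cells in nets
               if len({group_of[cell] for cell in cells}) > 1)


# Hypothetical netlist: four cells connected in a ring by two-pin nets.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

# Two candidate assignments of the cells to groups 0 and 1.
bad_grouping = {"a": 0, "c": 0, "b": 1, "d": 1}
good_grouping = {"a": 0, "b": 0, "c": 1, "d": 1}

print("bad grouping cut size: ", cut_size(nets, bad_grouping))
print("good grouping cut size:", cut_size(nets, good_grouping))
```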
56. Detail Placement : Coarse Placement
All the cells are placed in the approximate locations but they are not legally placed
No logic optimization is done
57. Detail Placement : Legalization
Legalization: Ensures that the final placement is legal before saving the design.
Legal placement of cells is not required for analyzing routing congestion at an early stage
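To show what "legal" means in practice, the sketch below legalizes one row with a simple greedy pass: cells are sorted by their coarse x position, snapped to the site grid, and pushed right until they no longer overlap. This greedy scheme is an assumption for illustration and not the algorithm of any particular tool. Coordinates are integers (for example nanometres), and cell widths are assumed to be multiples of the site width.

```python
def legalize_row(cells, site_width, row_width):
    """Greedy single-row legalization.

    cells: list of (name, coarse_x, width) tuples in integer units.
    Returns {name: legal_x}, with every cell on the site grid and no overlaps.
    """
    placed = {}
    next_free = 0
    for name, x, width in sorted(cells, key=lambda cell: cell[1]):
        snapped = round(x / site_width) * site_width  # nearest site
        legal_x = max(snapped, next_free)             # push right past overlaps
        if legal_x + width > row_width:
            raise ValueError(f"row is full, cannot place {name}")
        placed[name] = legal_x
        next_free = legal_x + width
    return placed


# Hypothetical coarse placement: u1 and u2 overlap and are off the site grid.
coarse = [("u1", 130, 400), ("u2", 350, 400), ("u3", 1210, 600)]
legal = legalize_row(coarse, site_width=200, row_width=4000)
print(legal)
```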
58. Hard Macro Placement
Hard macros are placed during the floorplanning stage and then marked as FIXED for placement.
Typically, hard macros are placed near the sides of the core area.
59. Some Guidelines for Placement (2)
Avoid constrictive channels
Avoid many pins in the narrow channel. Rotate for pin accessibility
Use blockage to improve pin accessibility
[Figure: arrangement of macros RAM 1 to RAM 8]
60. Review of Placement Cost Function
Cost components : Methods of consideration
Area, Wire length, Overlap : Traditional methods of Placement
Timing : Timing-driven Placement
Congestion : Congestion-driven Placement
Clock : Clock Gating
Power : Multivoltage and Multisupply Placement
61. Timing Driven Placement
Critical paths are determined using static timing analysis (STA).
Tool attempts to minimize wire length of critical paths to meet setup timing.
Net RCs are based on Virtual Routing (VR) estimates
62. Virtual Route / Trial Route
Manhattan geometry Virtual Route
Horizontal – Vertical
NO diagonal routing
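With horizontal and vertical segments only, the length of a two-pin route is the Manhattan distance between the pins. For multi-pin nets, a commonly used estimate is the half-perimeter wire length (HPWL) of the pins' bounding box; HPWL is not named in these slides and is used here as an assumed stand-in for whatever estimate a given tool applies.

```python
def manhattan(p, q):
    """Length of a horizontal/vertical route between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def hpwl(pins):
    """Half-perimeter of the bounding box around all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical pin coordinates in um.
two_pin_length = manhattan((0, 0), (3, 4))
three_pin_estimate = hpwl([(0, 0), (3, 4), (1, 6)])
print(two_pin_length, three_pin_estimate)
```

For a two-pin net both measures agree, since the bounding box of two points has a half-perimeter equal to their Manhattan distance.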
63. Congestion Driven Placement: Detouring Routes
Issues with Congestion
If congestion is not too severe, the actual route can be detoured around the congested area
The detoured nets will have worse RC delay compared to the VR estimates
In highly congested areas, delay estimates during placement will be optimistic.
[Figure: Congestion Map showing a congestion hot spot and a detour; legend ≥2 ≥3 ≥4 ≥5 ≥6 ≥7]
64. Congestion Map
No need to use -congestion unnecessarily
By default, physical synthesis tools perform some congestion optimization which has a reasonable chance of providing acceptable congestion
Congestion driven placement increases the effort of algorithm to fix congestion
On average -congestion option increases runtime by 20%
For better correlation to post-route, congestion-driven placement is enabled based on GR congestion map
[Figure labels: Causes high local utilization; Gives uniform density]
65. Congestion Driven Placement: Options
Some Congestion: using medium effort congestion-driven
Max routing congestion > 90%
Large hot spots
Bad Congestion: using high effort congestion-driven
Max routing congestion >> 90%
Very large hot spots
Congestion-driven might affect timing negatively but
Post-route numbers will not create surprises
Lower congestion will speed up the detailed router
66. Modifying Physical Constraints: Cell Density
Cell density can be up to 95% by default
Density level can also be applied to a specific region
Lower cell density in congested areas using the -coordinate option
[Figure: region defined by corners (x1, y1) and (x2, y2)]
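As an illustration of what a regional density limit measures, the sketch below computes the fraction of a rectangle (x1, y1) to (x2, y2) covered by cells. It assumes cells are legally placed (no overlaps) and given as hypothetical `(x, y, w, h)` tuples; only the 95% default comes from the slide.

```python
DEFAULT_MAX_DENSITY = 0.95  # "up to 95% by default" per the slide


def region_density(cells, x1, y1, x2, y2):
    """Fraction of the region (x1, y1)-(x2, y2) covered by cells.

    cells: iterable of (x, y, w, h) with (x, y) the lower-left corner.
    Assumes cells do not overlap each other; cells are clipped to the region.
    """
    region_area = (x2 - x1) * (y2 - y1)
    if region_area <= 0:
        raise ValueError("region must have positive area")
    covered = 0.0
    for x, y, w, h in cells:
        dx = min(x + w, x2) - max(x, x1)
        dy = min(y + h, y2) - max(y, y1)
        if dx > 0 and dy > 0:
            covered += dx * dy
    return covered / region_area


def exceeds_density(cells, x1, y1, x2, y2, max_density=DEFAULT_MAX_DENSITY):
    """True if the region is denser than the allowed limit."""
    return region_density(cells, x1, y1, x2, y2) > max_density
```

Lowering `max_density` for a congested region corresponds to asking the placer to leave more free area, and therefore more routing room, in that region.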
67. Modifying the Floorplan
Top-level ports
Changing to a different metal layer
Spreading them out, re-ordering or moving to other sides
Macro location or orientation
Alignment of bus signal pins
Increase of spacing between macros
Core aspect ratio and size
Making block taller to add more horizontal routing resource
Increase of the block size to reduce overall congestion
Power grid: Fixing any routed or non-preferred layers
68. Congestion Driven vs. Timing Driven Placement
In general there is a direct trade-off between congestion and timing
Timing-driven placement tries to shorten nets, whereas congestion-driven placement tries to spread cells, thus lengthening nets.
Iterative placement trials should be performed to find a balance between the different tool options/settings.
69. Timing and Congestion Optimization
Some things that can be done for timing optimization…
Adding / deleting buffers
Resizing gates
Restructuring the netlist
Swapping pins
Moving instances
Area recovery
Congestion optimization tries to reduce local congestion hotspots.
Generally, if congestion exists after placement, little more can be done if area recovery is not significant.
It is essential that sufficient area is available for any optimizations that are required
71. General Concept of Clock Tree Synthesis
[Figure: unbuffered clock tree vs. buffered/balanced clock tree]
Skew, Area (#buffers), Power, Slew rates
+ Minimize total insertion delay (latency)
72. Sources of skew
Not perfectly balanced clock tree
Different levels of buffering
Different cells
Different load due to routing
Different RC delays
Setting a skew constraint = 0 ps
Makes no sense
Insertion delay (latency) will increase
Power consumption will increase
Area will increase
Rule of thumb: skew values: 100 – 150 ps for 90 nm
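To make the terms concrete, the sketch below treats skew as the difference between the largest and smallest clock arrival time (insertion delay) at the sinks. The sink names and latency values in the usage example are made up for illustration.

```python
def insertion_delay(latencies_ps):
    """Worst-case insertion delay (latency) over all clock sinks, in ps."""
    return max(latencies_ps.values())


def clock_skew(latencies_ps):
    """Skew: difference between the latest and earliest sink arrival, in ps."""
    values = list(latencies_ps.values())
    return max(values) - min(values)


def meets_skew_target(latencies_ps, max_skew_ps):
    """True if the measured skew is within the target."""
    return clock_skew(latencies_ps) <= max_skew_ps


# Hypothetical sink latencies in picoseconds.
example = {"ff1/CK": 410, "ff2/CK": 530, "ff3/CK": 455}
```

With these example numbers the skew is 120 ps, inside the 100 – 150 ps rule of thumb quoted for 90 nm. Forcing it toward 0 ps would mean padding the faster branches with buffers, which is why latency, power, and area go up.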
73. Extra sources of clock skew: variability
Unwanted Skew Variations
Process variations in clock buffers
L effective
Gate length
Gate width
tox
Power supply noise
Temperature variations
Part of the OCV (lecture 15)
[Figure: wire cross-section over a ground plane, labeled T, W, S, H]
75. Prepare the netlist for CTS
Analyze the clock trees
Check the clocks
Remove unwanted buffering
76. Remove unwanted buffering
Unnecessary pre-existing clock buffers/inverters
remove_clock_tree
77. CTS: Goals
Meeting the clock tree design rule constraints
Maximum transition delay
Maximum load capacitance
Maximum fanout
[Maximum buffer levels]
Constraints are upper bound goals. If constraints are not met, violations will be reported.
Slide callout: defaults
Meeting the clock tree targets
Maximum skew (highest priority)
Min/Max insertion delay (latency)
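Since the design rule constraints are upper bounds, checking them amounts to comparing each clock net against its limits and reporting whatever exceeds them. The sketch below is a hypothetical stand-in for such a report: the net records, field names, and limit values are assumptions, not output from any tool.

```python
def check_clock_design_rules(nets, max_transition, max_capacitance, max_fanout):
    """Return a list of (net_name, rule) for every violated upper bound.

    nets: dict mapping net name to a dict with the keys
    "transition", "capacitance", and "fanout" (hypothetical field names).
    """
    limits = {
        "transition": max_transition,
        "capacitance": max_capacitance,
        "fanout": max_fanout,
    }
    violations = []
    for name, measured in nets.items():
        for rule, limit in limits.items():
            if measured[rule] > limit:
                violations.append((name, "max_" + rule))
    return violations


# Hypothetical clock nets (transition in ns, capacitance in pF).
clock_nets = {
    "clk_root": {"transition": 0.12, "capacitance": 0.30, "fanout": 4},
    "clk_leaf7": {"transition": 0.35, "capacitance": 0.10, "fanout": 40},
}
```

An empty list means all design rule constraints are met. A non-empty list corresponds to the violations that would be reported.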
78. Effect of Clock Tree Synthesis on placement
Clock buffers added
Congestion may increase
Non-clock cells may have been moved to less ideal locations
Inserting clock trees can introduce new timing and max tran/cap violations
“Real” skew taken into account
79. Summary
Clock tree synthesis is one of the most important steps of IC design and can have a significant impact on timing, power, area, etc.
The clocking strategy has to be discussed with the frontend people before CTS is started
Clocks identification
Clock dependencies
Clock balancing
81. Overview
Routing fundamentals / Advanced issues intro
The routing flow
Special topics for 90nm and below
Additional routing considerations
Summary
82. Physical Design Flow
Design/Constraints Import
Floorplanning
Placement
Clock Tree Synthesis
Routing
Post Route Optimization
Finishing
83. Routing Fundamentals
Goal is to realize the metal/copper connections between the pins of standard cells and macros
Input:
placed design
fixed number of metal/copper layers
Goal:
routed design that is DRC clean and meets setup/hold timing
Consists of two phases
1. Global route
2. Detail route
[Figure: standard cell pin, horizontal routing tracks, vertical routing tracks]
84. Routing Fundamentals: Advanced Issues
Timing driven routing
Timing budget for each net
Minimize critical paths
Signal integrity aware : 90nm and below !!!!
Minimize crosstalk
DFM / DFY
DRC clean
Rule based versus Model based
85. General Flow for Routing
Placement and CTS
Route Clock Nets
Global Route Signal Nets
Detail Route Signal Nets
Design for Manufacturing (DFM)
Geert Vanwijnsberghe - Affiliation 85
86. Global Route
[Figure: global routing cell with X and Y axes]
Vertical routing capacity = 9 tracks
Horizontal routing capacity = 9 tracks
87. Global Route
Input:
Cell and macro placement
Routing channel capacity per layer / per direction
Goal:
Perform fast, coarse grid routing through global routing cells (GCells) while considering the following:
Wire length
Congestion
Timing
Noise / SI
Often used by placement engines to predict congestion in the form of a “trial route” or “virtual route”
88. Global Route
Assigns nets to specific metal layers and global routing cells (GCells)
Tries to avoid congested GCells while minimizing detours
Congestion exists when more tracks are needed than available
Detours increase wire length (delay)
Also avoids P/G (rings/straps/rails) and routing blockages
[Figure: global route detouring around a congested area, compared with the virtual route; X and Y axes]
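The definition of congestion above (more tracks needed than available) can be sketched as a per-GCell overflow map. The grid of track demands is made-up data, and a single uniform capacity per GCell is a simplifying assumption. The value 9 echoes the 9-track capacity shown on the earlier Global Route slide.

```python
def overflow_map(demand, capacity):
    """Per-GCell overflow: tracks needed beyond the available capacity.

    demand: 2D list of tracks required in each GCell (rows of columns).
    capacity: tracks available per GCell (assumed uniform here).
    """
    return [[max(d - capacity, 0) for d in row] for row in demand]


def congested_gcells(demand, capacity):
    """(row, column) indices of every GCell whose demand exceeds capacity."""
    return [
        (r, c)
        for r, row in enumerate(demand)
        for c, d in enumerate(row)
        if d > capacity
    ]


# Hypothetical track demand for a 2 x 3 grid of GCells.
demand = [
    [3, 9, 12],
    [10, 4, 9],
]
```

A global router would try to move nets out of the cells returned by `congested_gcells`, accepting the extra wire length of a detour.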
90. Detail Route
Using global route plan, within each global route cell
Assign nets to tracks
Lay down wires
Connect pins to corresponding nets
Solve DRC violations
Reduce cross couple cap
Apply special routing rules
91. Detail Route: Track Assignment
For nets that traverse multiple GCells
Assigns each net to a specific track and lays down the actual metal traces
Makes long, straight traces and reduces the number of vias
[Figure: preroute vs. TA metal traces; a jog reduces via count]
94. Timing Driven Routing
At 90 nm net delay becomes significant
Quality of route can affect timing
Optimize critical paths
Route some nets first
Most routing freedom at start
Use shortest paths possible
Net weights
Order of routing (priorities: e.g. default: clocks 50, others 2)
Wire widening
Reduce resistance
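A minimal sketch of routing order by net weight: nets with higher weight are routed first, while routing freedom is greatest. The default weight of 2 and the clock weight of 50 come from the slide; the net names and the extra weight for a critical net are hypothetical.

```python
DEFAULT_WEIGHT = 2   # "others 2" per the slide
CLOCK_WEIGHT = 50    # "Clocks 50" per the slide


def routing_order(nets, weights, default_weight=DEFAULT_WEIGHT):
    """Return nets sorted so that the highest-weight nets are routed first.

    nets: list of net names; weights: dict of explicit per-net weights.
    Python's sort is stable, so equal-weight nets keep their input order.
    """
    return sorted(nets, key=lambda net: -weights.get(net, default_weight))


# Hypothetical nets: one clock, one timing-critical net, two ordinary nets.
nets = ["data0", "clk", "data1", "crit_path"]
weights = {"clk": CLOCK_WEIGHT, "crit_path": 10}
```

Here `routing_order(nets, weights)` puts `clk` first, then `crit_path`, then the two data nets in their original order.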
95. What is Signal Integrity or SI? (1)
Signal delay caused by crosstalk noise
Possible in 2 directions: push-out, pull-down
[Figure: net 1 (aggressor) and net 2 (victim), showing speed-up and delay of the victim transition]
96. What is SI? (2)
Glitch caused by crosstalk noise
Extra clock cycle!
Functional Failure
[Figure: aggressor and victim nets around a flip-flop (D, Q, Clk, Vdd)]
97. Crosstalk Prevention: Design Optimization
Noise depends on
Coupling capacitance
Total net capacitance
Strength of the driver (Rd of the victim net)
Design optimization
Increase drive strength, often easier (only local effect)
Buffer long nets
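The dependence on coupling and total capacitance can be illustrated with a simple capacitive-divider bound. This is an assumed first-order model, not one given in the slides: it treats the victim as undriven, so it gives an upper bound on the glitch. A stronger victim driver (lower Rd) pulls the real glitch below this value.

```python
def glitch_upper_bound(vdd, c_coupling, c_total):
    """Upper bound on the victim glitch from a full-swing aggressor.

    Capacitive divider: Vdd * Cc / Ctotal, where c_total is the victim's
    total net capacitance (coupling plus everything else). Assumes an
    undriven victim; units only need to be consistent.
    """
    if c_total <= 0 or not 0 <= c_coupling <= c_total:
        raise ValueError("need 0 <= c_coupling <= c_total and c_total > 0")
    return vdd * c_coupling / c_total
```

For example, with Vdd = 1.0 V, 20 fF of coupling, and 80 fF total, the bound is 0.25 V. Halving the coupling capacitance (for instance by extra spacing) to 10 fF, with the total dropping to 70 fF, lowers it to about 0.14 V.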
99. Crosstalk Prevention: Reduce Cross Coupling Cap
Critical Nets
Spacing: extra space
Shielding: grounded shields
Same layer (H)
Adjacent layers (V)
Net Ordering
100. Effect of Floorplanning on Routing Congestion
For hierarchical designs, good pin placement is essential to preventing routing congestion.
Can use pin guides during partitioning
101. Routing around blockages and over macros
By default routing tool will:
Route over macros
Not route where there is a routing blockage
Not route through a narrow channel in the non-preferred routing direction
M4 has a horizontal routing channel but its preferred routing direction is vertical
The preferred routing direction needs to be changed
[Figure: macro with M1-M4 and M1-M3 routing blockages]
102. Clock Tree Routing
For SI prevention we generally want to route our clocks with extra spacing.
Global H-trees are often routed manually before placement
H-tree nets may be routed with wide-metal and shielding.
[Figure: wide-metal H-tree net with grounded shields]
103. Post Route Clock Tree Optimization (CTO)
Improve the skew on clock nets
Flow: detail routed design, then "Skew OK?" (Yes / No); on No, postroute CTO followed by ECO route
[Figure: before CTO, short path; after CTO, increased delay]
104. Options for CPU effort
# processors
Routing in parallel on # processors
Superthreading, multithreading
Some routers are better at threading than others
# iterations for detail route
# of iteration steps done to get a DRC free design
105. Summary
Starting from 90 nm technologies
Timing Driven Route
net delay is becoming more of a factor
SI Aware Route
Small geometries make SI timing closure much
more difficult
DFM / DFY
Now a crucial part of the routing flow
DRC
Number and complexity of DRC rules has
increased dramatically