A physical design flow produces a production-worthy layout from a gate-level netlist subject to a set of constraints. This talk focuses on the problems imposed by shrinking process technologies: timing closure, signal integrity, design variable dependencies, clock and power/ground routing, and design signoff. It also surveys some physical design flows and outlines a refinement-based flow.
Timing and Design Closure in Physical Design Flows
1. ISQED 2002 (C) Monterey
ISQED 2002
Olivier Coudert
Monterey Design Systems
Timing and Design Closure in Physical Design Flows
2. ISQED 2002 (C) Monterey
Summary
Why a Need for Physical Flows?
Some Physical Flows
A refinement based Physical Flow
Conclusion
3. ISQED 2002 (C) Monterey
Design Flow
[Figure: Behavioral spec. -> Behavioral synthesis -> RTL -> Logic synthesis -> Gate level netlist -> Physical Flow -> Layout]
Behavioral spec. example:
while (x<a) do
  x1 := x + dx;
  u1 := u - (3*x*u*dx) - (3*y*dx);
  y1 := y + (u*dx);
  x := x1; u := u1; y := y1;
endwhile
RTL example:
RC := ALU1(RX, a, comp);
wait until clock AND RC;
RX1 := ALU1(RX, RDX, ADD);
RT1 := MULT1(RU, RX);
RT2 := MULT2(3, RDX);
wait until clock;
RT3 := MULT1(RT1, RT2);
RT4 := MULT2(RT2, RY);
4. ISQED 2002 (C) Monterey
Pre DSM Physical Flow
[Figure: flow from gate level netlist to layout; stages shown: global place, detailed place, clock, global route, detailed route]
5. ISQED 2002 (C) Monterey
Timing & Interconnect
Wireload models were ALWAYS inaccurate
Good average but large variance
Post-synthesis signoff was possible when interconnect contributed ~20% of the total capacitance
But now the interconnect capacitance is dominating the total capacitance with each new process generation
Elmore delay model becomes inaccurate as resistance increases
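As a rough illustration of the Elmore model named above, the sketch below computes the Elmore delay of a driver followed by a wire modeled as an RC ladder. The function name and all resistance and capacitance values are hypothetical, chosen only to show how the wire term enters the estimate. It does not address the inaccuracy the slide mentions.

```python
def elmore_delay(r_driver, c_load, r_wire, c_wire, segments=1000):
    """Elmore delay (seconds) at the far end of a driver + RC-ladder wire.

    r_driver: driver output resistance (ohms), hypothetical value
    c_load:   receiver input capacitance (farads), hypothetical value
    r_wire, c_wire: total wire resistance and capacitance
    """
    r_seg = r_wire / segments
    c_seg = c_wire / segments
    delay = 0.0
    upstream_r = r_driver
    for _ in range(segments):
        upstream_r += r_seg
        # Each capacitor contributes (resistance from source to it) * C.
        delay += upstream_r * c_seg
    # The receiver sees the full driver + wire resistance.
    delay += upstream_r * c_load
    return delay


# Hypothetical numbers: 1 kohm driver, 10 fF load, 500 ohm / 200 fF wire.
d = elmore_delay(1000.0, 10e-15, 500.0, 200e-15)
print(f"Elmore delay: {d * 1e12:.1f} ps")
```

With many segments the result approaches the closed form Rd*(Cw + CL) + Rw*(Cw/2 + CL), so the pure wire term grows with the product of wire resistance and wire capacitance.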
6. ISQED 2002 (C) Monterey
Gate vs. Net in Optimal Delay
[Plot: gate delay / total delay (vertical axis, 0 to 1.2) versus relative driver size (0.5x to 9.0x), with curves for 0.25 um and 0.18 um and the optimal delay point marked]
7. ISQED 2002 (C) Monterey
Noise and Delay Coupling Effects
Dominant coupling capacitance can produce a noise problem
Or a delay problem
[Figure: a switching net coupled through CC to a noise-sensitive net with load CL; the same coupling CC against load CL also produces increased delay]
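The two coupling effects named on this slide can be approximated with common first-order textbook estimates, sketched below. These formulas are not taken from the slides: the noise peak uses a simple capacitive divider (a pessimistic bound for a weakly held victim), and the delay effect uses a Miller factor on CC. Function names and values are hypothetical.

```python
def coupled_noise_peak(vdd, cc, cl):
    """Upper-bound noise peak on a quiet victim: capacitive divider CC/(CC+CL)."""
    return vdd * cc / (cc + cl)


def effective_load(cc, cl, aggressor="quiet"):
    """Effective victim load for delay, using a Miller factor on CC.

    aggressor: "same" (switches with the victim), "quiet", or "opposite".
    """
    miller = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}[aggressor]
    return cl + miller * cc


# Hypothetical values: 1.8 V supply, CC = 30 fF, CL = 20 fF.
print(coupled_noise_peak(1.8, 30e-15, 20e-15))
print(effective_load(30e-15, 20e-15, "opposite"))
```

When CC dominates CL, the divider approaches the full supply swing and the worst-case effective load approaches twice CC, which is why a dominant coupling capacitance shows up as both a noise and a delay problem.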
8. ISQED 2002 (C) Monterey
IR drop
Decrease in supply voltage at the gates
Due to current flow through the power resistive network
Effects of IR drop on circuit performance
IR drop    delay
0 V        0.114 ns
0.15 V     0.126 ns (+10%)
0.3 V      0.143 ns (+25%)
0.5 V      0.184 ns (+61%)
[Plot: input waveform, vertical axis from 0.0 to 1.4]
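The percentages in the IR drop table can be checked directly from the delay values. The sketch below recomputes the relative delay increase against the 0 V row; the data comes from the slide, while the function and variable names are only illustrative.

```python
# IR drop (V) -> gate delay (ns), as listed on the slide.
ir_drop_delay_ns = {0.0: 0.114, 0.15: 0.126, 0.3: 0.143, 0.5: 0.184}


def delay_increase_percent(table):
    """Relative delay increase (%) of each row against the zero-drop row."""
    base = table[0.0]
    return {drop: 100.0 * (delay - base) / base for drop, delay in table.items()}


increase = delay_increase_percent(ir_drop_delay_ns)
for drop, pct in increase.items():
    print(f"{drop:.2f} V drop: +{pct:.1f}% delay")
```

The computed increases are about 10.5%, 25.4%, and 61.4%, which match the slide's figures once truncated to whole percentages.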
9. ISQED 2002 (C) Monterey
Electromigration & Self Heating
Metal interconnect disintegration due to high current density
Can occur for power network and also signal nets
Important DSM effect
Higher current densities due to increased currents and finer wire widths/thicknesses
Faster switching is increasing the di/dt’s
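The link between finer wires and higher current density follows from the average density J = I / (width x thickness). The sketch below is a minimal illustration with hypothetical dimensions and current; it is not a model of electromigration lifetime.

```python
def current_density(current_a, width_um, thickness_um):
    """Average current density (A/um^2) through a rectangular wire cross-section."""
    return current_a / (width_um * thickness_um)


# Hypothetical example: the same 1 mA through a wire whose width and
# thickness are both halved by scaling.
before = current_density(1e-3, 1.0, 1.0)
after = current_density(1e-3, 0.5, 0.5)
print(f"density ratio after scaling: {after / before:.1f}x")
```

Halving both dimensions quadruples the density even before any increase in current is taken into account.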
10. ISQED 2002 (C) Monterey
Signal Integrity
Xtalk
Can produce last minute timing problems at DR
IR-drop
Can invalidate P/G routing
Design rules, electromigration
Make DR more difficult
Inductance
Need new analysis tools and avoidance techniques
11. ISQED 2002 (C) Monterey
Physical Flow
Take a gate-level netlist and a library
Take constraints (place, route, timing, power, design rules, etc.)
Produce production worthy layout
Meet timing
P/G and clock
Satisfy design rules
Signal integrity aware (xtalk, IR-drop, EM)
Predictable
Fast TAT
14. ISQED 2002 (C) Monterey
Block Based Flow
Procedure:
Partition the design into small blocks (~50k gates)
Implement each block
Assemble the blocks
Assumptions:
Shield timing from the interconnect because:
small blocks
strong drivers
Interconnect becomes a local property of a block
Budgeting can be done on every block
Benefit:
15. ISQED 2002 (C) Monterey
Block Based Flow
Problem:
Strong drivers lead to suboptimal solutions
Interconnect is NOT a local property of a block because of congestion
Does not capture large nets interconnecting several blocks
Budgeting is non-trivial, and can lead to suboptimal solutions
Assembly is complex if conditions at the boundaries of the blocks (capacitance & driver strength) are not fixed
16. ISQED 2002 (C) Monterey
Constant Delay Based Flow
Procedure:
Allocate a delay to each logic stage
Translate the delays into gains (Co/Ci)
Keep the gains constant as the gates are placed
Assumptions:
Delay is a linear function of the gain
Convex libraries
Benefit:
Fix timing upfront
Fast
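The procedure above can be sketched for a simple chain of gates, assuming the linear model delay = parasitic + slope * gain, with gain = Co/Ci. The parasitic and slope parameters, the function names, and the numbers are all hypothetical, and the sketch ignores wire capacitance and branching.

```python
def gains_from_delays(delays, parasitics, slopes):
    """Invert delay = parasitic + slope * gain for each stage."""
    return [(d - p) / k for d, p, k in zip(delays, parasitics, slopes)]


def input_caps(gains, c_out):
    """Size a chain from the last stage back to the first: Ci = Co / gain."""
    caps = []
    load = c_out
    for gain in reversed(gains):
        load = load / gain  # this stage's input cap is the previous stage's load
        caps.append(load)
    return list(reversed(caps))


# Hypothetical two-stage chain driving a load of 64 units.
gains = gains_from_delays([5.0, 5.0], [1.0, 1.0], [1.0, 1.0])
print(gains, input_caps(gains, 64.0))
```

Once the gains are fixed, each stage delay is fixed as long as every gate can be resized to keep Co/Ci constant when its load changes during placement.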
17. ISQED 2002 (C) Monterey
Constant Delay Based Flow
Problem:
Gain cannot be preserved; buffer insertion is needed
Consequently, the allocation needs to be revisited
Non-convex libraries
Mapping onto discrete libraries
Still needs DR information, e.g., for Xtalk effects
19. ISQED 2002 (C) Monterey
Principle
One cannot optimize what one cannot measure accurately enough
Data is measured with a distribution (x, σ)
Need to know σ (noise)
Need to know how the optimization affects the distribution (correlation)
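One way to read the noise and correlation points is through the standard variance of a difference: the uncertainty of a measured improvement depends on both σ values and on how correlated the before and after estimates are. The sketch below is an interpretation under that assumption, not the method of the talk; the threshold factor k and all names are hypothetical.

```python
import math


def difference_sigma(sigma_before, sigma_after, correlation):
    """Standard deviation of (before - after) for two correlated estimates."""
    var = (sigma_before ** 2 + sigma_after ** 2
           - 2.0 * correlation * sigma_before * sigma_after)
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


def is_measurable(gain, sigma_before, sigma_after, correlation, k=2.0):
    """True if the predicted gain exceeds k times the noise on the difference."""
    return gain > k * difference_sigma(sigma_before, sigma_after, correlation)


# Hypothetical: a 0.1 ns improvement with 0.1 ns noise on each estimate.
print(is_measurable(0.1, 0.1, 0.1, correlation=0.0))
print(is_measurable(0.1, 0.1, 0.1, correlation=0.9))
```

With uncorrelated estimates the improvement is lost in the noise, while strongly correlated estimates make the same improvement distinguishable.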
23. ISQED 2002 (C) Monterey
Physical prototype
Earliest stage of the design when interconnect is predictable
Physical logic optimization can start at this level only
Timing signoff can be done at this level only
25. ISQED 2002 (C) Monterey
How Different Is Phy. Logic Opt.?
Need to work with accurate models
timing, power, design rules aware, etc
mostly non-convex
often CPU time costly
Need to place gates
tight communication with placer
Need to generate routes
New techniques
size & buffer & route & place
resynthesize & remap & place
logic optimization for congestion relief
26. ISQED 2002 (C) Monterey
Placement/Synthesis/Routing
The flexibility of the placement and the continuous refinement allow logic optimization to continue throughout the flow
Continual monitoring of “what is critical”
From extensive to local logic optimization
27. ISQED 2002 (C) Monterey
Clock Distribution
Clock tree is created at the physical prototyping level
Distribution of latches and flip-flops is known
A complete buffered/gated clock tree is automatically synthesized
Congestion and skew accounted for
28. ISQED 2002 (C) Monterey
Power/Ground Distribution
P/G network built at the physical prototype level
Built from user-provided power stripe/ring rules
P/G network can have a huge impact on congestion
Can judge the quality and integrity of the power/ground network (IR drop)
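To give a feel for how the IR drop of a power network can be judged, the sketch below computes the drop along a single stripe fed from one end, with cells drawing current at each tap. This single-stripe ladder is a deliberate simplification of a real P/G grid, and the function name and values are hypothetical.

```python
def stripe_ir_drop(segment_res, tap_currents):
    """Voltage drop (V) at each tap of a stripe fed from one end.

    segment_res[i]  : resistance (ohms) of the segment leading to tap i
    tap_currents[i] : current (A) drawn at tap i
    """
    if len(segment_res) != len(tap_currents):
        raise ValueError("need one segment resistance per tap")
    drops = []
    downstream = sum(tap_currents)  # current still to be delivered
    drop = 0.0
    for res, current in zip(segment_res, tap_currents):
        drop += res * downstream  # segment carries all downstream current
        drops.append(drop)
        downstream -= current
    return drops


# Hypothetical stripe: three 1-ohm segments, 1 A drawn at each tap.
print(stripe_ir_drop([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

The farthest tap sees the largest drop, and lowering the segment resistances, for example by widening the stripe, scales every drop down proportionally.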
30. ISQED 2002 (C) Monterey
Conclusion
Physical flows must consider logic, place, and route simultaneously
Physical flows need new solutions:
Logic synthesis & placement interaction
Synthesize logic & route at the same time
Early estimation of xtalk so that GR can allocate routing resources to DR
Logic optimization for congestion relief, for SI
…
31. ISQED 2002 (C) Monterey
The future
Possible flow:
Fast behavioral synthesis together with floorplanning
Evaluate area/performance tradeoff
Timing driven block & port placement
Evaluate top level routing of P/G integrity
Budgeting
Clock methodology
Fast RTL to gate synthesis of blocks
Physical synthesis of block:
Logic optimization + placement + routing
Block assembly & chip verification
Editor's Notes
As the driver strength increases, interconnect has a smaller impact on delay, and the ratio of gate delay to total delay approaches 1. The plot also shows the optimal driver choice for this stage (minimum delay with a reasonable slope). This point has a small value of r, which means that it is important to account for interconnect here.
The interesting design point here is 3x (the knee of the curve).
If we move up the levels of abstraction to the behavioral level, we need to change our design methodology.
In the current RTL/logic synthesis based methodology, we specify the design in RTL and use logic synthesis to get to a structural or gate level netlist. We iterate on this loop until we are satisfied, then use place and route tools to get to the layout and iterate again.
With the behavioral synthesis methodology, you write a behavioral specification and use behavioral synthesis to get to RTL code. We iterate at this high level using different sets of constraints until satisfied before we go to logic synthesis.
1) Electromigration is the breaking of interconnect due to high current density flow. (This can also happen to vias.)
2) Typically, the EM issue applies to the power network and self-heating applies to signal nets. (The solutions to EM and self-heating are similar.)
3) This is an important DSM effect as technology goes to finer geometries. More current and less width => higher current density => more EM/SH effect.
4) The traditional approach is to over-design the power network (to solve both the EM and IR drop problems). There are few solutions for self-heating on signal nets, and the effects have been ignored. Verification at post-layout can delay design tape-out and is unacceptable.
The clusters are sized and placed within partitions and among megacells
Long wires are modeled among partitions, and congestion is approximated within partitions
Initially, congestion is dominated by local wires
Early wireplanning for long wires will not work
“Long” wires are not “planned”, but are “placed” probabilistically in terms of where the router is likely to want to route them
The placement should provide enough information to know the distribution of latches, but should be abstract enough to avoid being trapped by congestion caused by the clock wiring. The contribution of the clock tree to the congestion is taken into account as early as it is meaningful
The latch and flip-flop distribution will not change dramatically after the physical prototype level
The clock tree leaves will be refined and the top clock tree adjusted as the placement and optimization processes continue. Accurate timing projections enable useful skew methods to be applied at this level
Placement is still coarse enough so that objects with common-skew targets can be grouped
Eventually the automation process will also have to consider more detailed analyses:
Inductance of chip and packaging
Resonance frequencies via AC analyses
On-chip decoupling
Power rail currents will not change much as the placement is refined
Yet there is enough space to add/widen stripes
API driven adjustment using incremental IR-drop analyses
Ultimately this optimization process can be automated