Understanding
Clock Tree Synthesis
Log Messages
© Synopsys 2012 1
Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock
Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages
Prerequisite 1:
Run the check_clock_tree Command
• Run the check_clock_tree command prior to clock tree
synthesis, and fix the issues reported
• This command checks the following, and reports issues that can
lead to bad QoR:
– Clock Tree Structure
– Constraints
– Clock Tree Exceptions
Prerequisite 2:
Ensure Placement Legality
• For clock tree synthesis to proceed without any errors, it is necessary to
have a legally placed design.
• Use the check_legality command to check whether the design is
properly placed and legalized, prior to CTS.
• In case of legality issues, use the legalize_placement command to
resolve these issues.
Note:
• Clock tree synthesis will abort in case of placement legality issues.
• In some cases, like overlapping standard cells, it may still proceed and
issue a warning during placement legality checking, but continuing with
placement legality issues may lead to bad QoR.
Warning: Some cells in the design are not legal. (CTS-242)
Default Constraints
• The default constraints that clock tree synthesis uses are as follows:
Maximum transition time 0.5ns
Maximum capacitance 0.6pF
Maximum fanout 2000
Design Rule Constraints
• In addition to the clock tree design rule constraint values specified using
set_clock_tree_options, IC Compiler also considers the design rule constraint values
from the logic library and the design.
• The following table summarizes how IC Compiler determines the design rule constraint
values used during the design rule fixing stage of clock tree synthesis and optimization.
Case 1: Default behavior
cts_use_lib_max_fanout=false
cts_use_sdc_max_fanout=false
cts_force_user_constraints=false
• Maximum capacitance: the minimum value from:
– The set_clock_tree_options
– The CTS default value (0.6pF)
– The logic library
– The SDC constraints
• Maximum transition time: the minimum value from:
– The set_clock_tree_options
– The CTS default value (0.5ns)
– The logic library
– The SDC constraints
• Maximum fanout: the value set using set_clock_tree_options
Case 2: Use library and SDC settings for maximum fanout
cts_use_lib_max_fanout=true
cts_use_sdc_max_fanout=true
cts_force_user_constraints=false
• Maximum capacitance: the minimum value from:
– The set_clock_tree_options
– The CTS default value (0.6pF)
– The logic library
– The SDC constraints
• Maximum transition time: the minimum value from:
– The set_clock_tree_options
– The CTS default value (0.5ns)
– The logic library
– The SDC constraints
• Maximum fanout: the minimum value from:
– The logic library
– The SDC constraints
– The set_clock_tree_options
Case 3: Use only user set settings for clock tree synthesis and clock tree optimization
cts_force_user_constraints=true
• Maximum capacitance: the value set using set_clock_tree_options
• Maximum transition time: the value set using set_clock_tree_options
• Maximum fanout: the value set using set_clock_tree_options
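The three cases can be modelled in a few lines. The following Python sketch is only an illustration of the table, not IC Compiler code: the function and parameter names are hypothetical, `None` stands for "not specified", and it assumes that the CTS default stands in when no set_clock_tree_options value is given.

```python
# Illustrative model of Cases 1-3; all names are hypothetical, not tool APIs.
CTS_DEFAULTS = {"max_capacitance": 0.6, "max_transition": 0.5, "max_fanout": 2000}


def min_defined(*values):
    """Return the smallest value that is set; None marks an unset source."""
    return min(v for v in values if v is not None)


def effective_constraint(name, user=None, library=None, sdc=None,
                         use_lib_sdc_fanout=False, force_user=False):
    """Resolve one design rule constraint the way the table describes."""
    # Assumption: with no set_clock_tree_options value, the CTS default applies.
    user_value = CTS_DEFAULTS[name] if user is None else user
    if force_user:
        return user_value  # Case 3: only the user setting counts
    if name == "max_fanout":
        if use_lib_sdc_fanout:
            return min_defined(user_value, library, sdc)  # Case 2
        return user_value  # Case 1
    # Capacitance and transition: minimum over all four sources (Cases 1 and 2).
    return min_defined(user_value, CTS_DEFAULTS[name], library, sdc)
```

For instance, `effective_constraint("max_transition", user=0.1, sdc=0.05)` resolves to the SDC value, because it is the smallest of the available sources in the default case.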
Constraints Specified Using the
set_clock_tree_options Command
• Library units are used for time and capacitance values specified by using
the set_clock_tree_options command
• The smallest values accepted for the -max_capacitance and
-max_transition options of the set_clock_tree_options
command are 1fF and 1ps respectively.
• For example, if the library units are pF and ps, and you specify the following
command, IC Compiler will issue an error:
icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)
– IC Compiler will not accept these small values, and will use the previously
specified values or the default values for maximum capacitance and maximum
transition, during clock tree synthesis.
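A quick way to sanity-check values before passing them to set_clock_tree_options is to convert them from library units to fF and ps. The Python sketch below does only that arithmetic; the helper name and the unit tables are assumptions made for illustration, and it does not reproduce the tool's own checking.

```python
# Conversion factors from library units to fF and ps (assumed unit names).
TO_FF = {"pF": 1000.0, "fF": 1.0}
TO_PS = {"ns": 1000.0, "ps": 1.0}
TOLERANCE = 1e-9  # guards against floating-point noise at the 1fF / 1ps limit


def too_small(max_cap, max_tran, cap_unit, time_unit):
    """Return the option names whose values fall below 1fF or 1ps."""
    rejected = []
    if max_cap * TO_FF[cap_unit] < 1.0 - TOLERANCE:
        rejected.append("-max_capacitance")
    if max_tran * TO_PS[time_unit] < 1.0 - TOLERANCE:
        rejected.append("-max_transition")
    return rejected


# The values from the example: 0.0009 pF is 0.9 fF, and 0.300 ps is below 1 ps.
print(too_small(0.0009, 0.300, "pF", "ps"))
```

With library units of pF and ns, the CTS defaults (0.6 and 0.5) pass the same check, since they correspond to 600 fF and 500 ps.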
Enabling Debug Messages
• To enable clock tree synthesis debug messages in IC Compiler, use:
set cts_use_debug_mode true
• Many of the messages discussed in this presentation are available only
when you enable the debug mode.
Messages in the compile_clock_tree
Command Log
• Before clock tree synthesis:
– Design update
– Buffer and Inverter information
– Clock tree constraints
– Clock structure before clock tree synthesis
• During clock tree synthesis:
– Clustering
– Meeting target early delay
– Gate level clock tree synthesis results
• After clock tree synthesis:
– Summary report
– Embedded clock tree optimization
– DRC fixing beyond exceptions
– Placement legalization
Overview of the compile_clock_tree Command Log
START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011
(PSYN-508)
CTS: CTS Operating Condition(s): MAX(Worst)
Prelude:
START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011
(PSYN-508)
Loading design 'ORCA_TOP'
…
Information: Design Library and main library capacitance units are matched - 1.000 pf.
END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011
(PSYN-508)
…
Extraction related messages:
****************************************************************
Information: TLUPlus based RC computation is enabled. (RCEX-141)
****************************************************************
Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
Information: The RC model used is TLU+. (RCEX-015)
…
CTS: Blockage Aware Algorithm
CTS: Marking Ignore Pins....
…
Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
Buffer characterization:
CTS: buffer estimated skew target delay driving res input cap
CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
...
CTS: Prepare sources for clock domain SD_DDR_CLK
CTS: Prepare sources for clock domain SDRAM_CLK
CTS: Prepare sources for clock domain SYS_2x_CLK
…
CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.
…
clean drc fixing cell first...
In all, 0 drc fixing cell(s) are cleaned
In all, 0 drc fixing cell(s) beyond exception pins are cleaned
…
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore
…
Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)
CTS: Info: will use target transition value for initial CTS stages
Pruning of buffers and inverters:
Pruning library cells (r/f, pwr)
Min drive = 0.000372606.
…
Final pruned buffer set (7 buffers):
bufbd1
…
CTDN lib estimation: buffers should result in better clock power.
CTS: BA: Net 'sdram_clk'
CTS: Starting clock tree synthesis ...
CTS: Conditions = worst(1)
Reporting global clock tree constraints:
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored
…
Information: Removing clock transition on clock PCI_CLK ... (CTS-103)
Clock tree synthesis:
CTS: gate level 1 clock tree synthesis
CTS: clock net = sdram_clk
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
…
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 18 seconds
Reporting the results of clock tree synthesis:
CTS: Reporting clock tree violations ...
…
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
…
Embedded clock tree optimization:
CTS: Starting block level clock tree optimization
…
CTS: gate level 1 clock tree optimization
CTS: clock net = pclk
Gate Upsizing During Clock Tree
Synthesis
• The compile_clock_tree command will upsize all the
preexisting cells in the clock tree before building the clock tree.
Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to
bufbdf. (CTS-152)
• In the previous example the preexisting gate, sys_ctl/sunburst_clk_mux_div1/clk_buf, is upsized from a
bufbd4 to a bufbdf.
• This upsizing helps in reducing the number of buffer levels needed
to build the clock tree, thereby reducing the buffer count.
Maximum Capacitance and Transition Related
Warnings
• Even if the set_clock_tree_options command does not issue
any errors when you set the maximum capacitance and transition
constraints, the compile_clock_tree command can issue
warnings if the values are too small.
Warning: too small maximum transition (=0.050000) defined at
pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum capacitance (=0.050000) defined at
pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum transition (=0.050000) defined at
library cell bufbdk. (CTS-619)
– Max trans = 50ps is too tight for the pin instCLK1GC1/Q
– Max cap = 50fF is too tight for the pin instCLK1GC1/Q
• Tight constraints can cause clock tree synthesis to use an excessive
number of buffers to build the clock trees
Buffers and Inverters Used During Clock Tree
Synthesis
• Before synthesizing the clock tree, IC Compiler characterizes each buffer
and inverter
– To see the characterization details, set the following variable to true:
set cts_do_characterization true
– After characterization is done, characterized values for each buffer and
inverter are reported
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
– bufbdf, bufbd7, and bufbd4 are buffers; inv0da is an inverter
– In the target delay column, the first value is the rise delay and the second is the fall delay
• Driving resistance determines the drive strength of the buffer or inverter.
• The smaller the driving resistance, the greater the drive strength.
• In the previous example, bufbdf is the buffer with the highest drive strength.
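When the characterization table is long, the strongest driver can be picked out with a small script. This Python sketch parses rows in the format shown in the log and selects the cell with the smallest driving resistance; the function names are made up for illustration, and ranking by the larger of the two resistance values is an assumption of this sketch.

```python
import re

# One characterization row: cell name followed by four bracketed value pairs.
ROW = re.compile(r"CTS:\s+(\S+)\s+\[(.*?)\]\s+\[(.*?)\]\s+\[(.*?)\]\s+\[(.*?)\]")
COLUMNS = ("estimated_skew", "target_delay", "driving_res", "input_cap")

SAMPLE = [
    "CTS: buffer estimated skew target delay driving res input cap",
    "CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]",
    "CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]",
    "CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]",
    "CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]",
]


def parse_characterization(lines):
    """Map each cell name to its characterized value pairs."""
    cells = {}
    for line in lines:
        match = ROW.match(line)
        if match:  # the header line has no brackets and is skipped
            pairs = [tuple(float(x) for x in group.split())
                     for group in match.groups()[1:]]
            cells[match.group(1)] = dict(zip(COLUMNS, pairs))
    return cells


def strongest_driver(cells):
    """Cell with the smallest driving resistance (worst of the two values)."""
    return min(cells, key=lambda name: max(cells[name]["driving_res"]))


print(strongest_driver(parse_characterization(SAMPLE)))
```

On the sample rows this selects bufbdf, which agrees with the statement above.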
Unbalanced Buffers
• Buffers and inverters that have a big difference between their rise
and fall delays, which is referred to as the rise/fall delay skew, are
reported.
CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)
• Remove unbalanced buffers from the buffer list specified for
clock tree synthesis, as they might cause bad skew.
• Use the set_clock_tree_references command to specify the
buffers and inverters that should be used for clock tree synthesis
Pruning of Buffers and Inverters
• Pruning is a process by which IC Compiler selects the buffers and
inverters which are best suited for clock tree synthesis, based on the
buffer and inverter characterization, and prevents the remaining ones
from being used.
• IC Compiler prunes the buffers and inverters based on drive strength
and power:
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
• IC Compiler calculates a minimum drive value based on heuristics.
Buffers and inverters whose drive strength is less than the minimum
drive value are considered as weak drivers and are pruned by IC
Compiler.
• It is not possible to override the default pruning process
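To see at a glance why each cell was pruned, the two message forms above can be sorted by reason. The Python sketch below is only a log reader written for illustration (the function name is hypothetical); it does not reproduce the pruning heuristics themselves.

```python
import re

WEAK = re.compile(r"Pruning (\S+) because drive of ([\d.]+) is less than ([\d.]+)\.")
INFERIOR = re.compile(r"Pruning (\S+) because it is \(w/ power-considered\) inferior to (\S+)\.")

SAMPLE = [
    "Pruning library cells (r/f, pwr)",
    "Min drive = 0.264263.",
    "Pruning inv0d0 because drive of 0.149845 is less than 0.264263.",
    "Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.",
]


def pruning_reasons(lines):
    """Map each pruned cell to ('weak', drive, min_drive) or ('inferior', better_cell)."""
    reasons = {}
    for line in lines:
        weak = WEAK.search(line)
        if weak:
            reasons[weak.group(1)] = ("weak", float(weak.group(2)), float(weak.group(3)))
            continue
        inferior = INFERIOR.search(line)
        if inferior:
            reasons[inferior.group(1)] = ("inferior", inferior.group(2))
    return reasons


for cell, reason in pruning_reasons(SAMPLE).items():
    print(cell, reason)
```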
Maximum Transition, Maximum
Capacitance and Timing Constraints
Before clock tree synthesis begins, all the global clock tree constraints are
reported in the log, in the format shown below:
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.100]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
– The first value on each design rule line is the value used by CTS
– GUI: the default value or the value set using set_clock_tree_options
– SDC: the value from SDC
– Undefined means no value is specified in SDC
– Ignored means the value from SDC is ignored as the cts_force_user_constraints
variable is set to true
– Skew/insertion delay targets: values set using the set_clock_tree_options command
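The design rule lines have a regular shape, so they are easy to pull apart in a script. The following Python sketch splits one such line into the value used by CTS, the GUI value, and the SDC value; the function names are hypothetical, and `None` is used here to represent undefined/ignored.

```python
import re

LINE = re.compile(r"CTS:\s+max (\w+)\s+=\s+(.+?)\s+GUI\s+=\s+(.+?)\s+SDC\s+=\s+(.+)$")
PAIR = re.compile(r"\w+\[([\d.]+) ([\d.]+)\]")  # e.g. worst[0.050 0.050]


def parse_value(text):
    """Turn one field into a (rise, fall) tuple, a number, or None."""
    text = text.strip()
    if text == "undefined/ignored":
        return None
    pair = PAIR.fullmatch(text)
    if pair:
        return (float(pair.group(1)), float(pair.group(2)))
    return float(text)


def parse_constraint_line(line):
    """Return (name, used_by_cts, gui, sdc) or None if the line does not match."""
    match = LINE.match(line.strip())
    if not match:
        return None
    name, used, gui, sdc = match.groups()
    return (name, parse_value(used), parse_value(gui), parse_value(sdc))


print(parse_constraint_line(
    "CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]"
))
```

In the transition line above, the value used by CTS equals the SDC value, which is the smaller of the two sources.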
Clock Tree Synthesis Target Specifications
• Target specifications are the internal targets for clock tree synthesis,
but are not guaranteed. Only target constraints are guaranteed to be
achieved
CTS: Global target spec [rise fall]
CTS: transition = worst[0.250 0.250]
CTS: capacitance = worst[0.300 0.300]
CTS: fanout= 32 (This target fanout value is not considered by CTS)
• Target specifications:
– maxTransSpec: Min(0.25, 80% of max_transition constraints)
– maxCapSpec: Min(0.30, 80% of max_capacitance constraints)
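The two formulas are simple enough to check by hand or with a short script. This Python sketch evaluates them as written; the function name is hypothetical, and the units are assumed to be ns and pF as in the log examples.

```python
def target_spec(max_transition, max_capacitance):
    """Return (maxTransSpec, maxCapSpec) from the two design rule constraints."""
    max_trans_spec = min(0.25, 0.8 * max_transition)
    max_cap_spec = min(0.30, 0.8 * max_capacitance)
    return max_trans_spec, max_cap_spec


# CTS defaults of 0.5ns and 0.6pF give the global target spec shown above.
print(target_spec(0.5, 0.6))
# Constraints of 0.3 and 0.3 give targets of about 0.24 and 0.24.
print(target_spec(0.3, 0.3))
```

With the default constraints the fixed caps (0.25 and 0.30) win; with tighter constraints the 80% term takes over.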
Preexisting Clock Tree Information in the Log File
Before starting to build the clock tree, the preexisting clock tree structure is
printed in the log file
CTS: Design infomation
CTS: total gate levels = 8
CTS: Root clock net CLK2
CTS: clock gate levels = 2
CTS: clock sink pins = 4
CTS: level 2: gates = 1
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK2:
CTS: invbdk
CTS: bufbdk
...
CTS: Root clock net CLK1
CTS: clock gate levels = 8
CTS: clock sink pins = 8431
CTS: level 8: gates = 2
CTS: level 7: gates = 3
CTS: level 6: gates = 4
CTS: level 5: gates = 3
CTS: level 4: gates = 1
CTS: level 3: gates = 5
CTS: level 2: gates = 4
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK1:
CTS: invbdk
CTS: bufbdk
...
– total gate levels: maximum number of gate levels available
– clock gate levels: number of gate levels for the clock (2 for clock CLK2)
– clock sink pins: number of sinks
– level lines: existing gate levels and number of gates at each level
– Gate levels are listed from the flip-flops towards the clock source
Real Gates and Guide Buffers
• You may see the term real gates in the preexisting clock tree structure
information section:
CTS: Root clock net CLK1
CTS: clock gate levels = 16
CTS: clock sink pins = 70644
...
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
• Real gates are preexisting gates in the clock tree, and are not gates added by
the tool
• Guide buffers are buffers or inverters that are inserted by the tool, before it
begins to build the tree. They are intended to help clock tree synthesis build a
better clock tree
• The number of guide buffers inserted at each level can be determined from the
difference between gates and real gates.
– In the above example, the tool has added 10 guide buffers at each of the clock tree
levels shown
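The guide buffer count per level is just gates minus real gates, which a short script can tabulate for a whole log. This Python sketch is an illustration with a hypothetical function name; it assumes that a level line without a "real gates" annotation contains no guide buffers.

```python
import re

LEVEL = re.compile(r"level (\d+): gates = (\d+)(?: \(real gates = (\d+)\))?")

SAMPLE = [
    "CTS: level 13: gates = 14 (real gates = 4)",
    "CTS: level 12: gates = 111 (real gates = 101)",
    "CTS: level 11: gates = 146 (real gates = 136)",
    "CTS: level 10: gates = 2488 (real gates = 2478)",
]


def guide_buffers_per_level(lines):
    """Map gate level -> number of guide buffers (gates minus real gates)."""
    counts = {}
    for line in lines:
        match = LEVEL.search(line)
        if match:
            gates = int(match.group(2))
            # Assumption: no annotation means every gate at the level is real.
            real = int(match.group(3)) if match.group(3) else gates
            counts[int(match.group(1))] = gates - real
    return counts


print(guide_buffers_per_level(SAMPLE))
```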
Buffers and Inverters Used
• Before it begins to build the clock tree, the tool will list all the buffers and inverters it will
use to build the tree
CTS: Buffer/Inverter list for CTS for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKBUFX12
CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
– List for CTS: CTS uses this list
– LEQ cell list for Boundary Cell: CTS uses this list for inserting boundary cells
– LEQ cell list for CTO: CTO uses this list for sizing
– List for DelayInsertion: CTO uses this list for delay insertion
• You can change the buffer and inverter list by using the following command:
set_clock_tree_references
Clock Tree Synthesis Removes User-Specified
Ideal Attributes on Clocks
• Synthesized clocks are set to be propagated, and clock transition, which
is an attribute of an ideal clock, is removed
CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)
• Latency, another attribute of an ideal clock, is also removed
CTS: Information: Removing clock latency on pin
Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/ I_dma@ ... (CTS-098)
• Source Latency is removed for generated clocks
Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)
• These messages are informational only, and no action is required
Overlap or Reconvergent Paths
• Overlap or reconvergent paths occur when multiple clocks can drive a
node
• IC Compiler issues warnings about such paths
Warning: Either the driven net has been synthesized previously or
clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)
• Such messages should be treated as informational, rather than as
warnings
– IC Compiler has no problems handling such situations
Gate Level-by-Level Clock Tree Synthesis
• Clock tree building is done gate level by gate level, starting from the
sinks to the clock root
• For each gate level, just before the synthesis starts, the following
information will be printed in the log:
CTS: gate level 2 clock tree synthesis
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
CTS: clock skew = worst[0.000]
CTS: levels per net = 200
– The clock net and driving pin lines give the net and driver at this gate level
Clustering During Clock Tree Synthesis
• The clock tree building starts with clustering. Clustering is the process of
dividing a set of sink pins (fanouts) into groups. Each group is driven by a
buffer
– The instances of a cluster are all close to each other
• The following message says that 423 sink pins are divided into 27 clusters,
each with approximately 423/27 sink pins
CTS: gate level 2 clock tree synthesis
...
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
CTS: -----------------------------------------------
– One buffer level is added with each clustering
– skew (Before clustering, After clustering)
– viol represents DRCs (cap, trans); y : violation present, n : no violation
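The clustering messages can be condensed into one record per clustering step. The Python sketch below reads the "Completed N to M clustering" lines and the following "BA: lp" line, using the callouts above for the meaning of skew and viol; the function name is hypothetical, and the lp and c fields are deliberately left uninterpreted.

```python
import re

DONE = re.compile(r"CTS: Completed (\d+) to (\d+) clustering")
BA = re.compile(r"skew \(([\d.]+), ([\d.]+)\).*viol\(([yn]) ([yn])\)")

SAMPLE = [
    "CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]",
    "CTS: Completed 423 to 27 clustering",
    "CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)",
    "CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]",
    "CTS: Completed 27 to 4 clustering",
    "CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)",
]


def summarize_clustering(lines):
    """Return one dict per clustering step, in log order."""
    steps = []
    for line in lines:
        done = DONE.search(line)
        if done:
            pins, clusters = int(done.group(1)), int(done.group(2))
            steps.append({"pins": pins, "clusters": clusters,
                          "pins_per_cluster": pins / clusters})
            continue
        ba = BA.search(line)
        if ba and steps:  # attach the BA line to the most recent step
            steps[-1]["skew_before"] = float(ba.group(1))
            steps[-1]["skew_after"] = float(ba.group(2))
            steps[-1]["cap_violation"] = ba.group(3) == "y"
            steps[-1]["trans_violation"] = ba.group(4) == "y"
    return steps


for step in summarize_clustering(SAMPLE):
    print(step)
```

For the first step, 423 pins in 27 clusters works out to roughly 15.7 sink pins per cluster, and the record shows a transition violation but no capacitance violation.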
Clustering With Hookup Pins
• Hookup pins are input pins of gates or macros
• Unlike clock pins of flip-flops and latches (sink pins), hookup pins
have a nonzero phase delay that must be balanced with the sink
pins
Clustering With Hookup Pins
• Initially, the tool makes attempts to cluster hookup pins along with the normal sinks (trial
clustering)
• In this example, there are 479 sinks and 1 hookup pin
CTS: gate level 1 clock tree synthesis
...
CTS: gate level 1 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 1 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.150 0.150]
CTS: fanout = 32
CTS: gate level 1 timing constraints
...
CTS: -----------------------------------------------
Trial clustering:
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 480 to 34 clustering
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 34 to 6 clustering
CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
CTS: BA: target cap = 0.070 pf
Actual clustering:
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
CTS: Completed 479 to 39 clustering
CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
CTS: -----------------------------------------------
• At the trial clustering stage, the hookup pin is considered along with the other sink pins and
(479+1) to 34 to 6 clustering is obtained
• At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup
pin
Clustering With Hookup Pins:
Hookup Pin Clustered With Sinks
• If the trial clustering gives good QoR results, a message that the hookup pin
is selected for next level is displayed, as in the following log:
CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 2 clustering
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 1 clustering
CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
CTS: BA: target cap = 0.048 pf
CTS: Pin 1: periph/U5659/A is selected for next level
CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: Completed 2 to 2 clustering
CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
CTS: -----------------------------------------------
• When the phase delay of the hookup pin periph/U5659/A matches the
delay of the already built tree at that gate level, it will be clustered at that buffer
level.
Meeting Target Early Delay
• After the synthesis of the root clock net (gate level 1 synthesis), the tool checks if the delay
constraint set by the user is being met or not.
• If it is not met, the tool inserts some buffers at the root clock net to achieve the target delay
specified by the user.
• In the following message, 16 buffers are inserted at the root clock net to increase the delay from
0.569ns to 2ns, which is the user specified target.
CTS: gate level 1 clock tree synthesis
CTS: clock net = sys_clk
CTS: driving pin = sys_clk
CTS: gate level 1 design rule constraints [rise fall]
...
CTS: gate level 1 target spec [rise fall]
...
CTS: gate level 1 timing constraints
CTS: clock skew = worst[0.000]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
...
CTS: -----------------------------------------------
CTS:
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
CTS: Completed 19 to 2 clustering
CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
CTS: -----------------------------------------------
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
– The insertion delay value under the timing constraints is the constraint set by the user
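The amount of delay that the inserted cells must make up is the difference between the constraint and the current delay. This Python sketch reads the first worst[...] value on each of the two lines and subtracts them; the helper names are hypothetical, and the second bracket on each line is not interpreted here.

```python
import re

WORST = re.compile(r"worst\[([\d.]+)\]")


def first_worst(line):
    """First worst[...] value on a log line, as a float."""
    return float(WORST.search(line).group(1))


def delay_shortfall(current_line, constraint_line):
    """Delay still missing before the insertion delay target is met."""
    return first_worst(constraint_line) - first_worst(current_line)


missing = delay_shortfall(
    "CTS: current delay = worst[0.569] worst[0.457]",
    "CTS: constraint = worst[2.000] worst[0.000]",
)
print(round(missing, 3))
```

In this example the shortfall is 2.000 - 0.569 = 1.431, which the tool covers with the 16 inserted delay cells.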
Synthesis Results of One Gate Level
After the synthesis of a gate level, the results are printed in the log
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
CTS: driving pin: sdram_clk
CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: bufbd7 (1)
CTS: buffer level 2: bufbd7 (1)
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0
[Figure: clock tree with driving pin A, buffer levels 1 and 2, and points B and C]
– worst is the operating condition
– The clock tree skew and path delay lines give the skew and insertion delay at the
driving pin A (here sdram_clk)
– The load capacitance values of the buffer levels are added and reported as the total
capacitance of the subtree
– In the drc violations line, the first number is the number of cap violations and the
second is the number of trans violations
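Two relationships in this report can be checked numerically: the clock tree skew is the longest path delay minus the shortest path delay, and the total capacitance is the sum of the per-level load capacitances. The Python sketch below extracts those values from an abbreviated copy of the log; the function names are hypothetical, and because the log rounds to three decimals a comparison should allow a small tolerance.

```python
import re


def worst_values(line):
    """All numbers inside worst[...] brackets on a line, in order."""
    return [float(x) for group in re.findall(r"worst\[([^\]]*)\]", line)
            for x in group.split()]


def summarize_results(lines):
    """Collect skew, path delays, total cap, and per-level load caps."""
    info = {"level_caps": []}
    in_caps = False
    for line in lines:
        if "clock tree skew" in line:
            info["skew"] = worst_values(line)[0]
        elif "longest path delay" in line:
            info["longest"] = worst_values(line)[0]
        elif "shortest path delay" in line:
            info["shortest"] = worst_values(line)[0]
        elif "total capacitance" in line:
            info["total_cap"] = worst_values(line)[0]
        elif "buffer level total load capacitance" in line:
            in_caps = True  # the level lines that follow are capacitances
        elif in_caps and re.search(r"level \d+:", line):
            info["level_caps"].append(worst_values(line)[0])
    return info


SAMPLE = [
    "CTS: clock tree skew = worst[0.036]",
    "CTS: longest path delay = worst[0.327](rise)",
    "CTS: shortest path delay = worst[0.291](rise)",
    "CTS: total capacitance = worst[0.389 0.389]",
    "CTS: buffer level output transition delays [rise fall]",
    "CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]",
    "CTS: buffer level total load capacitance",
    "CTS: level 0: worst[0.045 0.045]",
    "CTS: level 1: worst[0.093 0.093]",
    "CTS: level 2: worst[0.251 0.251]",
    "CTS: drc violations: 0 0",
]

info = summarize_results(SAMPLE)
print(round(info["longest"] - info["shortest"], 3), info["skew"])
print(round(sum(info["level_caps"]), 3), info["total_cap"])
```

Here 0.327 - 0.291 matches the reported skew of 0.036, and 0.045 + 0.093 + 0.251 matches the reported total capacitance of 0.389.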
Maximum Transition and Capacitance
Violations
• After each gate level is synthesized, the maximum capacitance and
maximum transition violations at that gate level are reported
CTS: gate level 3 clock tree synthesis results
...
CTS: buffer level total load capacitance
...
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: capacitance = worst[0.051 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: drc violations: 18 5
– In the drc violations line, the first number is the number of cap violations and the
second is the number of trans violations
A More Complex Synthesis Result
CTS: gate level 1 clock tree synthesis results
CTS: clock net : clk
CTS: driving pin: clk
CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: CLKBUFX20 (1)
CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
CTS: clock tree skew = worst[0.001]
CTS: longest path delay = worst[0.248](rise)
CTS: shortest path delay = worst[0.246](rise)
CTS: total capacitance = worst[0.549 0.549]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.038 0.038]
CTS: level 1: worst[0.108 0.108]
CTS: level 2: worst[0.403 0.403]
CTS: drc violations: 0 0
Gate Level and Buffer Level Nomenclature
[Figure: clock tree from the clock source pin (gate level 1) through gate level 2, labeling the buffer levels of each gate level]
Red: Preexisting gates
Black: CTS introduced gates
At each gate level, the clock tree is built bottom-up, but the buffer names are changed
to appear top-down
DRC Violation Report After Synthesis
• After the complete clock tree is built, all the remaining DRC violations in the entire clock tree are reported in the log file:
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 50 seconds
CTS: Reporting clock tree violations ...
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1
• The "Global design rules" lines and the "constraint" lines show the constraints
• Only transition and capacitance violations are reported
• The summary gives the total number of transition and capacitance violations
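The per-violation lines can be cross-checked against the summary counts with a small script. The following is a minimal Python sketch, assuming the log has been saved as text in the format shown above; the function name `parse_violations` and the dictionary keys are illustrative and not part of IC Compiler.

```python
import re
from collections import Counter

# Sample lines copied from the violation report in this section.
LOG = """\
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
"""

HEADER = re.compile(r"CTS: (transition delay|capacitance) violation (?:at|on) (\S+)")
VALUES = re.compile(r"worst\[([\d.]+) ([\d.]+)\]")

def parse_violations(text):
    """Collect one record per violation with its worst value and its limit."""
    violations = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"kind": header.group(1), "object": header.group(2)}
            violations.append(current)
            continue
        numbers = VALUES.search(line)
        if current is None or numbers is None:
            continue
        # Take the larger of the two bracketed values (rise and fall for transition).
        worst = max(float(v) for v in numbers.groups())
        key = "limit" if "constraint" in line else "value"
        current[key] = worst
    return violations

violations = parse_violations(LOG)
counts = Counter(v["kind"] for v in violations)
for v in violations:
    print(v["kind"], v["object"], "exceeds limit by", round(v["value"] - v["limit"], 3))
print(dict(counts))
```

For this sample, the counts per kind agree with the "Summary of clock tree violations" lines (2 transition, 1 capacitance).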
Summary Report After Clock Tree Synthesis
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
CTS: 5 clock domain synthesized
CTS: 30 gated clock nets synthesized
CTS: 26 buffer trees inserted
CTS: 722 buffers used (total size = 45974.2)
CTS: 752 clock nets total capacitance = worst[76.868 76.868]
• Each gate level can have multiple nets
Clock-by-Clock Summary
• A summary is reported for each clock:
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
...
• A buffer tree is inserted only if necessary
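In the sample summaries in this deck, the clock net count equals the buffer count plus the gated clock net count (722 + 30 = 752 overall, 2 + 3 = 5 for pclk). This is an observation from the sample numbers, not a documented guarantee. A minimal Python sketch of that sanity check, with the illustrative helper name `summary_counts`:

```python
import re

# Sample lines copied from the pclk entry of the clock-by-clock summary.
SUMMARY = """\
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
"""

def summary_counts(text):
    """Return the gated-net, buffer, and clock-net counts found in a summary block."""
    patterns = {
        "gated_nets": r"CTS:\s+(\d+) gated clock nets synthesized",
        "buffers": r"CTS:\s+(\d+) buffers used",
        "clock_nets": r"CTS:\s+(\d+) clock nets total capacitance",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        counts[key] = int(match.group(1)) if match else None
    return counts

counts = summary_counts(SUMMARY)
print(counts)
# Observed relation in the samples: each inserted buffer drives one new net.
print(counts["buffers"] + counts["gated_nets"] == counts["clock_nets"])
```

If the relation does not hold on a real log, that alone does not indicate an error; it only means this assumed relation does not apply to that design.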
Embedded Clock Tree Optimization
• After clock tree synthesis, embedded clock tree optimization begins
• The characteristics of the buffers and inverters used are reported again
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
...
• The global constraints for clock tree are also reported again
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
...
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.000]
...
CTS: Global target spec [rise fall]
CTS: transition = worst[0.040 0.040]
...
Note:
Embedded clock tree optimization is called only when the compile_clock_tree command is used. It is not called when the clock_opt command is used.
More Messages on Real Gates and Guide Buffers
• At the beginning of optimization, you might get the following messages:
CTS: Root clock net chip_sclk_src
CTS: clock gate levels = 75
CTS: clock sink pins = 125896
...
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
• All the gates are guide buffers and inverters inserted during clock tree synthesis.
• This information is similar to that printed before clock tree synthesis.
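On a deep tree (75 gate levels in this sample) it can help to tabulate the per-level lines. The following Python sketch assumes that the gates not counted as real gates at a level are guide buffers, which is how the "guide buffers only" wording reads; the function name `gate_levels` is illustrative.

```python
import re

# Sample lines copied from the messages in this section.
LOG = """\
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
"""

LEVEL = re.compile(
    r"CTS: level (\d+): gates = (\d+) "
    r"\((?:real gates = (\d+)|no real gates, guide buffers only)\)"
)

def gate_levels(text):
    """Map each gate level to its total, real, and (assumed) guide gate counts."""
    levels = {}
    for level, gates, real in LEVEL.findall(text):
        real_count = int(real) if real else 0
        levels[int(level)] = {
            "gates": int(gates),
            "real": real_count,
            "guide": int(gates) - real_count,  # assumption: the remainder are guide buffers
        }
    return levels

levels = gate_levels(LOG)
for level in sorted(levels, reverse=True):
    print(level, levels[level])
```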
Gate Level Optimization
• Clock tree optimization is also done one gate level at a time, just as when the clock tree is built
• Before a gate level is optimized, the current skew, longest path delay, and shortest path delay from the driving pin of that gate level are reported.
CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)
• The gate level is then optimized
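The reported skew can be checked against the two path delays, since skew is the difference between the longest and shortest path delays. A minimal Python sketch with the illustrative helper `worst_value` follows. The tolerance is an assumption that allows for rounding in the log: in the pclk summary, for example, 5.959 - 5.619 gives 0.340 while the log prints 0.341.

```python
import re

# Sample lines copied from the gate level 2 report in this section.
BLOCK = """\
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)
"""

def worst_value(text, label):
    """Return the number inside worst[...] on the line that starts with the given label."""
    match = re.search(re.escape(label) + r"\s*=\s*worst\[([\d.]+)\]", text)
    if match is None:
        raise ValueError("label not found: " + label)
    return float(match.group(1))

skew = worst_value(BLOCK, "clock tree skew")
longest = worst_value(BLOCK, "longest path delay")
shortest = worst_value(BLOCK, "shortest path delay")

TOLERANCE = 0.0015  # assumed allowance for three-decimal rounding in the log
print(round(longest - shortest, 3), skew)
print(abs((longest - shortest) - skew) < TOLERANCE)
```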
Buffer Sizing
• The following message indicates that buffer sizing was successful
CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing
• Clock tree optimization will try to resize buffers, and improve skew and
insertion delay. If it does not find it beneficial, then the original cell
master will be restored.
CTO-BS: Starting buffer sizing ...
CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing
Gate Sizing
CTO-GS: Starting gate sizing ...
Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
...
Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
CTO-GS: CPU time = 2413 seconds for gate sizing
• The first group of CTS-152 messages corresponds to the 14 cells sized in the first round
• The summary of the first round of sizing shows the number of gates sized (here 14 out of 40 gates) and the improvement in skew
• "Summary of cell sizing" is the overall summary of gate sizing done at this gate level: a total of 14 + 6 = 20 gates sized, giving a 0.106% improvement in skew at this gate level
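In the gate sizing log, some instances are resized in more than one round (I3342873 and I1387108 each appear twice). A minimal Python sketch that follows each instance through its CTS-152 messages; the function name `sizing_history` is illustrative.

```python
import re

# Sample CTS-152 lines copied from the gate sizing log in this section.
LOG = """\
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
"""

CTS_152 = re.compile(
    r"Replaced the library cell of (\S+) from (\S+) to (\S+)\. \(CTS-152\)"
)

def sizing_history(text):
    """Map each instance to the sequence of library cells it went through."""
    history = {}
    for instance, old_cell, new_cell in CTS_152.findall(text):
        # Seed the list with the first "from" cell, then append each "to" cell.
        history.setdefault(instance, [old_cell]).append(new_cell)
    return history

history = sizing_history(LOG)
for instance, cells in history.items():
    print(instance, " -> ".join(cells))
```

Listing the final cell per instance this way avoids counting the same instance twice when comparing against the "Sized N/M cell instances" lines.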
Gate Relocation
• Gate relocation works on preexisting gates.
• If you have no preexisting gates, you might see the following message:
CTO-GR: gate relocation is skipped since there are no hookup pins
A Successful Gate Relocation
CTO-GR: Starting gate relocation ...
CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: improvement = worst[0.000%]
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: CPU time = 2 seconds for gate relocation
• In the first round, 2 cells were tried at 47 new locations, and 1 was moved
• In each round, "delay (from)" shows the initial skew, "delay (to)" the final skew, and "improvement" the improvement in skew
• "Summary of cell relocation" is the overall summary of gate relocation at this gate level
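The per-round "Relocated" lines add up to the summary line (1 + 2 = 3 cells relocated, 2 + 5 = 7 cells tested, 47 + 83 = 130 points). A minimal Python sketch of that consistency check; the names `ROUND`, `SUMMARY`, and `totals` are illustrative.

```python
import re

# Sample lines copied from the successful gate relocation log in this section.
LOG = """\
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
"""

COUNTS = r"Relocated (\d+)/\d+ cell instances \(tested (\d+) cell instances at (\d+) points\)"
ROUND = re.compile(r"CTO-GR: \d+: " + COUNTS)   # numbered per-round lines
SUMMARY = re.compile(r"CTO-GR: " + COUNTS)      # unnumbered summary line

def totals(pattern, text):
    """Sum (relocated, tested, points) over every line that matches the pattern."""
    rows = [tuple(map(int, groups)) for groups in pattern.findall(text)]
    return tuple(sum(column) for column in zip(*rows))

per_round = totals(ROUND, LOG)
summary = totals(SUMMARY, LOG)
print(per_round, summary, per_round == summary)
```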
Gate Relocation: Failed Attempts
CTO-GR: Starting gate relocation ...
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: improvement = worst[0.000%]
CTO-GR: CPU time = 0 seconds for gate relocation
• In this example, clock tree optimization tried to move one gate
instance to 24 different locations. Since the attempts did not improve
the QoR, the gate relocation was abandoned
Buffer Relocation
• Buffer relocation is done on all buffers inserted by clock tree synthesis
CTO-BR: Buffer relocation ...
CTO-BR: Optimization level: net
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[0.000%]
CTO-BR: Summary of cell relocation
CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.099] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: CPU time = 0 seconds for buffer relocation
• The information is similar to gate relocation
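The "improvement" percentage in the buffer relocation sample is consistent with the relative reduction in skew: (0.596 - 0.584) / 0.596 is about 2.013%. That formula is inferred from the sample, not taken from tool documentation. The log's own value is presumably computed from unrounded delays, so small differences are expected; the gate relocation sample prints 1.118% where the rounded skews 0.460 and 0.455 give about 1.087%. A minimal Python sketch:

```python
def skew_improvement_percent(skew_from, skew_to):
    """Relative skew reduction in percent (assumed meaning of 'improvement')."""
    if skew_from == 0:
        return 0.0  # nothing to improve; avoids division by zero
    return (skew_from - skew_to) / skew_from * 100.0

# Skew values copied from the buffer relocation summary in this section.
buffer_relocation = skew_improvement_percent(0.596, 0.584)
print(round(buffer_relocation, 3))
```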
Post Embedded Clock Tree Synthesis
• After the embedded clock tree optimization, the tool prints the summary.
• It has the same format as the summary printed after clock tree synthesis.
CTS: ------------------------------------------------
CTS: Clock Tree Optimization Summary
CTS: ------------------------------------------------
CTS: 4 clock domain synthesized
CTS: 5 gated clock nets synthesized
CTS: 5 buffer trees inserted
CTS: 1000 buffers used (total size = 16570.8)
CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net sdram_clk
CTS: 1 gated clock nets synthesized
CTS: 1 buffer trees inserted
CTS: 302 buffers used (total size = 5039.47)
CTS: 303 clock nets total capacitance = worst[4.170 4.170]
CTS: clock tree skew = worst[0.035]
CTS: longest path delay = worst[2.041](rise)
CTS: shortest path delay = worst[2.006](fall)
CTS: Root clock net sys_2x_clk
...
• After the summary, all the trans and cap violations on the clock tree are also reported.
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at buffd2_G1B1I1/Z
...
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 3994
CTS: Total number of capacitance violations = 1
DRC Fixing Beyond Exceptions
• After embedded clock tree optimization, the tool will start fixing the
DRC violations beyond exceptions.
• The messages are similar to clustering:
CTS: fixing DRC beyond exception pins under clock CLK1
CTS: gate level 2 DRC fixing (exception level 1)
CTS: clock net = CLK1_G1IP
CTS: driving pin = bufbd2_G1IP_1/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.100 0.100]
CTS: max capacitance = worst[0.600 0.600]
CTS: max fanout = 2000
CTS: -----------------------------------------------
CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
CTS: Completed 4 to 1 clustering
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
CTS: Completed 1 to 1 clustering
CTS: ------------------------------------------------
• After fixing the DRC violations, the overall summary and the clock-by-clock summary of DRC fixing beyond exceptions are reported.
Placement Legalization is Called After Clock Tree Synthesis
• When clock tree synthesis places a clock tree buffer or inverter, it places it at a legal location, but the location might be occupied
  - This causes overlaps that need to be resolved
• The tool calls the placement legalizer, which moves the cells to resolve the overlaps.
• After legalization, the cells with large displacement are reported in the log
Largest displacement cells:
Cell: periph/U122 (AND3X)
Input location: (906.380 1597.520)
Legal location: (897.140 1582.400)
Displacement: 17.720 um, e.g. 3.52 row height.
Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)
• The cell shown is 1 of the 6 cells that were displaced
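The numbers in the displacement report can be reproduced by hand. In this sample, the reported displacement matches the straight-line distance between the input and legal locations, and the row height follows from the threshold line (15.120 um for 3 rows, so 5.04 um per row). Both are inferences from the sample values, not documented tool behavior. A minimal Python sketch:

```python
import math

# Values copied from the "Largest displacement cells" report in this section.
input_location = (906.380, 1597.520)
legal_location = (897.140, 1582.400)

# Row height inferred from "> 15.120 um or 3 row height" (an assumption).
row_height_um = 15.120 / 3

dx = input_location[0] - legal_location[0]
dy = input_location[1] - legal_location[1]

displacement_um = math.hypot(dx, dy)            # straight-line distance
displacement_rows = displacement_um / row_height_um

print(round(displacement_um, 3), round(displacement_rows, 2))
```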
Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock
Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages
The optimize_clock_tree Command Log File Messages
• Optimization options
• Report before optimization
• Optimization
• Report after optimization
Standalone Optimization Using the optimize_clock_tree Command
• Standalone optimization differs from embedded optimization in the algorithms used
• Some of the log messages are similar to those printed when you use the compile_clock_tree command:
  - Design update information
  - Buffer characterization
  - Pruning of cells
  - List of cells used for clock tree optimization
CTS-352 Warning
• The default delay calculation engine is Elmore. Elmore delay
calculation might lead to inferior accuracy in skew and latency
estimation.
• Enable the Arnoldi delay calculation engine for more accurate delay calculation during optimization, by using the following command:
set_delay_calculation -clock_arnoldi
• Otherwise, the optimize_clock_tree command will issue the following warning:
Warning: set_delay_calculation is currently set to 'elmore'.
'clock_arnoldi' is suggested. (CTS-352)
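Messages such as CTS-352 are easy to miss in a long log. The following Python sketch counts the message IDs that appear in parentheses at the end of a line; the helper name `count_message_ids` is illustrative, and the sample lines are copied from this deck.

```python
import re
from collections import Counter

# Sample lines copied from the messages shown in this deck.
LOG = """\
Warning: Some cells in the design are not legal. (CTS-242)
Warning: set_delay_calculation is currently set to 'elmore'.
'clock_arnoldi' is suggested. (CTS-352)
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
"""

# A message ID in parentheses at the end of a line, for example "(CTS-352)".
MESSAGE_ID = re.compile(r"\((CTS-\d+)\)\s*$")

def count_message_ids(lines):
    """Count how often each CTS message ID ends a log line."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

counts = count_message_ids(LOG.splitlines())
print(dict(counts))
if counts["CTS-352"]:
    print("Delay calculation is still set to Elmore for this run")
```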
Optimization Options
• Before starting optimization, the optimize_clock_tree command reports the root pin and the optimization options for each clock.
• These are the options that you specified by using the set_clock_tree_optimization_options command:
Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off
Preoptimization Report
• Before the tool begins to optimize the clock tree, it reports some of
the current characteristics of the clock tree:
*****************************************
* Preoptimization report (clock 'CLK3') *
*****************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.005 0.000 0.005)
Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6
• The banner shows the clock name
• Each "Corner" line names a CTS corner
• The estimated skew and insertion delay are the starting skew and ID for the clock as seen by CTO
• "Max transition" is the maximum transition value present in the clock tree
• "Buffer Types" gives information about the buffers and inverters present in the clock tree
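In the sample report, the per-type counts under "Buffer Types" add up to the "Buffers" line (1 + 8 + 5 + 3 + 6 = 23). A minimal Python sketch that parses this part of the report and checks the total; the function names are illustrative.

```python
import re

# Sample lines copied from the preoptimization report in this section.
REPORT = """\
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6
"""

def buffer_type_counts(text):
    """Return {library cell: count} from the lines after the 'Buffer Types' header."""
    in_types = False
    counts = {}
    for line in text.splitlines():
        if line.startswith("Buffer Types"):
            in_types = True
            continue
        match = re.fullmatch(r"\s*(\w+):\s*(\d+)\s*", line)
        if in_types and match:
            counts[match.group(1)] = int(match.group(2))
    return counts

def reported_buffers(text):
    """Return the number on the 'Buffers = N' line."""
    return int(re.search(r"^Buffers\s*=\s*(\d+)", text, re.MULTILINE).group(1))

types = buffer_type_counts(REPORT)
print(types)
print(sum(types.values()) == reported_buffers(REPORT))
```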
Optimization Messages
• During optimization, the tool prints out messages for sizing, insertion
and removal, and switching of metal layers:
Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
iteration 1: (0.314104, 3.328620)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
....
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
....
iteration 6: (0.305181, 3.275623)
Total 1 delay buffers added on clock sck_in12 (LP)
Start (2.975, 3.283), End (2.970, 3.276)
....
Switch to low metal layer for clock 'CLK3':
Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap change 0.00 percent
• The four groups of messages report buffer removal, cell sizing, buffer insertion, and metal layer switching
• iteration N: (skew, ID)
• Start (sp, lp): initial delays
• End (sp, lp): final delays
• sp: shortest path delay
• lp: longest path delay
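Because the Start and End pairs are (shortest path, longest path) delays, the skew before and after each step is the difference lp - sp. The End skew also matches the first value of the preceding "iteration" line. A minimal Python sketch with the illustrative function name `skew_changes`:

```python
import re

# Sample lines copied from the optimization messages in this section.
LOG = """\
iteration 1: (0.314104, 3.328620)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
"""

START_END = re.compile(
    r"Start \(([\d.]+), ([\d.]+)\), End \(([\d.]+), ([\d.]+)\)"
)

def skew_changes(text):
    """Return (skew before, skew after) for every Start/End line, as lp - sp."""
    changes = []
    for match in START_END.finditer(text):
        sp_start, lp_start, sp_end, lp_end = map(float, match.groups())
        changes.append((round(lp_start - sp_start, 3), round(lp_end - sp_end, 3)))
    return changes

changes = skew_changes(LOG)
for before, after in changes:
    print("skew", before, "->", after)
```

In this sample the buffer removal step shortens the longest path (3.527 to 3.329) while the skew grows, and the cell sizing step then reduces the skew.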
Optimization Messages
• If the area recovery option is enabled, the tool does area recovery after optimizing each clock, and reports the changes made to that clock:
Area recovery optimization for clock 'CLK3':
15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
Deleting cell cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.
Total 1 buffers removed (all paths) for clock 'CLK3'
Post Optimization Report
• After completing the optimization of a clock, the tool reports the new characteristics of the clock tree.
• This is similar to the information printed before optimization:
**************************************************
* Multicorner optimization report (clock 'CLK3') *
**************************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.007 0.000 0.007)
Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
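Comparing the preoptimization and post optimization reports for CLK3 shows what the optimization achieved in the 'max' corner. The following Python sketch computes the percentage reductions; the dictionary keys are illustrative names, and the values are copied by hand from the two sample reports.

```python
# 'max' corner values copied from the CLK3 sample reports in this deck.
PRE = {
    "skew": 0.073,
    "insertion_delay": 1.903,
    "max_transition": 0.448,
    "buffer_area": 67.5,
}
POST = {
    "skew": 0.041,
    "insertion_delay": 1.725,
    "max_transition": 0.356,
    "buffer_area": 59.0,
}

def percent_reduction(before, after):
    """Reduction relative to the starting value, in percent."""
    return (before - after) / before * 100.0

reductions = {key: percent_reduction(PRE[key], POST[key]) for key in PRE}
for key, value in reductions.items():
    print(f"{key}: {PRE[key]} -> {POST[key]} ({value:.1f}% lower)")
```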
Reporting the Longest and Shortest Paths
• The longest and shortest paths corresponding to all corners are reported,
soon after the post optimization report:
++ Longest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1)
167 4 289 r ( 521 520)
++ Shortest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1)
217 4 247 r ( 687 656)
• Placement legalization related messages are located at the end of the optimize_clock_tree command log
Thank you
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 

Understanding_CTS_Log_Messages.pdf

  • 1. Understanding Clock Tree Synthesis Log Messages © Synopsys 2012
  • 2. Agenda • Prerequisites for Clock Tree Synthesis • Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis • Clock Tree Synthesis Log Messages • Clock Tree Optimization Log Messages
  • 3. Agenda • Prerequisites for Clock Tree Synthesis • Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis • Clock Tree Synthesis Log Messages • Clock Tree Optimization Log Messages
  • 4. Prerequisite 1: Run the check_clock_tree Command
      • Run the check_clock_tree command prior to clock tree synthesis, and fix the issues reported.
      • This command checks the following, and reports issues that can lead to bad QoR:
        – Clock tree structure
        – Constraints
        – Clock tree exceptions
  • 5. Prerequisite 2: Ensure Placement Legality
      • For clock tree synthesis to proceed without errors, the design must be legally placed.
      • Use the check_legality command to check whether the design is properly placed and legalized prior to CTS.
      • In case of legality issues, use the legalize_placement command to resolve them.
      Note:
      • Clock tree synthesis will abort in case of placement legality issues.
      • In some cases, such as overlapping standard cells, it may still proceed and issue a warning during placement legality checking, but continuing with placement legality issues may lead to bad QoR.
        Warning: Some cells in the design are not legal. (CTS-242)
  • 6. Default Constraints
      • The default constraints that clock tree synthesis uses are as follows:
        Maximum transition time: 0.5 ns
        Maximum capacitance: 0.6 pF
        Maximum fanout: 2000
  • 7. Design Rule Constraints
      • In addition to the clock tree design rule constraint values specified using set_clock_tree_options, IC Compiler also considers the design rule constraint values from the logic library and the design.
      • The following table summarizes how IC Compiler determines the design rule constraint values used during the design rule fixing stage of clock tree synthesis and optimization.
      Case 1: Default behavior (cts_use_lib_max_fanout=false, cts_use_sdc_max_fanout=false, cts_force_user_constraints=false)
        Maximum capacitance: the minimum value from the set_clock_tree_options value, the CTS default value (0.6 pF), the logic library, and the SDC constraints
        Maximum transition time: the minimum value from the set_clock_tree_options value, the CTS default value (0.5 ns), the logic library, and the SDC constraints
        Maximum fanout: the value set using set_clock_tree_options
      Case 2: Use library and SDC settings for maximum fanout (cts_use_lib_max_fanout=true, cts_use_sdc_max_fanout=true, cts_force_user_constraints=false)
        Maximum capacitance: same as Case 1
        Maximum transition time: same as Case 1
        Maximum fanout: the minimum value from the logic library, the SDC constraints, and the set_clock_tree_options value
      Case 3: Use only user-set settings for clock tree synthesis and clock tree optimization (cts_force_user_constraints=true)
        Maximum capacitance: the value set using set_clock_tree_options
        Maximum transition time: the value set using set_clock_tree_options
        Maximum fanout: the value set using set_clock_tree_options
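The three cases in the table on slide 7 can be sketched as a small lookup. This is only an illustration of the table, not IC Compiler code: the function name and arguments are hypothetical, and it assumes that an unset set_clock_tree_options value falls back to the CTS default and that each fanout variable independently enables its own source.

```python
# CTS defaults from slide 6 (pF, ns, count).
CTS_DEFAULTS = {"max_capacitance": 0.6, "max_transition": 0.5, "max_fanout": 2000}


def effective_constraint(kind, user=None, lib=None, sdc=None, *,
                         use_lib_max_fanout=False,
                         use_sdc_max_fanout=False,
                         force_user_constraints=False):
    """Return the constraint value the table on slide 7 would select.

    kind is one of the CTS_DEFAULTS keys; user, lib and sdc are the values
    from set_clock_tree_options, the logic library and SDC (None = not set).
    """
    default = CTS_DEFAULTS[kind]
    # Assumption: with no user value, the CTS default stands in for it.
    user_or_default = default if user is None else user

    if force_user_constraints:          # Case 3: user settings only
        return user_or_default

    if kind == "max_fanout":            # Cases 1 and 2 for fanout
        candidates = [user_or_default]
        if use_lib_max_fanout:
            candidates.append(lib)
        if use_sdc_max_fanout:
            candidates.append(sdc)
    else:                               # capacitance and transition
        candidates = [user, default, lib, sdc]

    return min(v for v in candidates if v is not None)


# Values from the log on slide 20: user (GUI) 0.100, SDC 0.050.
print(effective_constraint("max_transition", user=0.1, sdc=0.05))
print(effective_constraint("max_transition", user=0.1, sdc=0.05,
                           force_user_constraints=True))
```

With the slide 20 values, the first call selects the SDC value and the second, with cts_force_user_constraints modeled as true, keeps the user value.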
  • 8. Constraints Specified Using the set_clock_tree_options Command
      • Library units are used for time and capacitance values specified by using the set_clock_tree_options command.
      • The smallest values accepted for the -max_capacitance and -max_transition options of the set_clock_tree_options command are 1 fF and 1 ps, respectively.
      • For example, if the library units are pF and ps, and you specify the following command, IC Compiler will issue an error:
        icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
        Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
        Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)
        – IC Compiler will not accept these small values, and will use the previously specified values or the default values for maximum capacitance and maximum transition during clock tree synthesis.
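The unit conversion behind the two errors on slide 8 can be checked before running the tool. The sketch below is a hypothetical pre-check, not part of IC Compiler: the function name, the unit tables, and the message wording are assumptions; only the 1 fF and 1 ps limits come from the slide.

```python
# Conversion factors from common library units to fF and ps.
CAP_UNIT_TO_FF = {"fF": 1.0, "pF": 1e3, "nF": 1e6}
TIME_UNIT_TO_PS = {"ps": 1.0, "ns": 1e3}

MIN_CAP_FF = 1.0    # smallest accepted -max_capacitance (slide 8)
MIN_TRAN_PS = 1.0   # smallest accepted -max_transition (slide 8)


def check_user_constraints(max_cap, max_tran, cap_unit="pF", time_unit="ps"):
    """Return a list of problems for values given in library units."""
    problems = []
    cap_ff = max_cap * CAP_UNIT_TO_FF[cap_unit]
    tran_ps = max_tran * TIME_UNIT_TO_PS[time_unit]
    if cap_ff < MIN_CAP_FF:
        problems.append(f"max_cap {cap_ff:.6f} fF is below {MIN_CAP_FF} fF")
    if tran_ps < MIN_TRAN_PS:
        problems.append(f"max_tran {tran_ps:.6f} ps is below {MIN_TRAN_PS} ps")
    return problems


# The slide's example: library units are pF and ps.
for problem in check_user_constraints(0.0009, 0.300):
    print(problem)
```

The same 0.300 would pass if the library time unit were ns, since it would then mean 300 ps.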
  • 9. Agenda • Prerequisites for Clock Tree Synthesis • Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis • Clock Tree Synthesis Log Messages • Clock Tree Optimization Log Messages
  • 10. Enabling Debug Messages
      • To enable clock tree synthesis debug messages in IC Compiler, use:
        set cts_use_debug_mode true
      • Many of the messages discussed in this presentation are available only when you enable the debug mode.
  • 11. Agenda • Prerequisites for Clock Tree Synthesis • Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis • Clock Tree Synthesis Log Messages • Clock Tree Optimization Log Messages
  • 12. Messages in the compile_clock_tree Command Log
      • Before clock tree synthesis:
        – Design update
        – Buffer and inverter information
        – Clock tree constraints
        – Clock structure before clock tree synthesis
      • During clock tree synthesis:
        – Clustering
        – Meeting target early delay
        – Gate level clock tree synthesis results
      • After clock tree synthesis:
        – Summary report
        – Embedded clock tree optimization
        – DRC fixing beyond exceptions
        – Placement legalization
  • 13. Overview of the compile_clock_tree Command Log
        START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
        CTS: CTS Operating Condition(s): MAX(Worst)
      Prelude:
        START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
        Loading design 'ORCA_TOP'
        …
        Information: Design Library and main library capacitance units are matched - 1.000 pf.
        END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
        …
      Extraction related messages:
        ****************************************************************
        Information: TLUPlus based RC computation is enabled. (RCEX-141)
        ****************************************************************
        Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
        Information: The RC model used is TLU+. (RCEX-015)
        …
        CTS: Blockage Aware Algorithm
        CTS: Marking Ignore Pins....
        …
      Buffer characterization:
        Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
        CTS: buffer estimated skew target delay driving res input cap
        CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
        ...
        CTS: Prepare sources for clock domain SD_DDR_CLK
        CTS: Prepare sources for clock domain SDRAM_CLK
        CTS: Prepare sources for clock domain SYS_2x_CLK
        …
        CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
        CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.
        …
        clean drc fixing cell first...
        In all, 0 drc fixing cell(s) are cleaned
        In all, 0 drc fixing cell(s) beyond exception pins are cleaned
        …
        CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
        CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore
        …
  • 14. Overview of the compile_clock_tree command log, continued:
        CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
        CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore
        …
        Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)
        CTS: Info: will use target transition value for initial CTS stages
      Pruning of buffers and inverters:
        Pruning library cells (r/f, pwr)
        Min drive = 0.000372606.
        …
        Final pruned buffer set (7 buffers):
        bufbd1
        …
        CTDN lib estimation: buffers should result in better clock power.
        CTS: BA: Net 'sdram_clk'
      Reporting global clock tree constraints:
        CTS: Starting clock tree synthesis ...
        CTS: Conditions = worst(1)
        CTS: Global design rule constraints [rise fall]
        CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored
        …
        Information: Removing clock transition on clock PCI_CLK ... (CTS-103)
      Clock tree synthesis:
        CTS: gate level 1 clock tree synthesis
        CTS: clock net = sdram_clk
        CTS: gate level 1 clock tree synthesis results
        CTS: clock net : sdram_clk
        …
        CTS: Clock tree synthesis completed successfully
        CTS: CPU time: 18 seconds
      Reporting the results of clock tree synthesis:
        CTS: Reporting clock tree violations ...
        …
        CTS: ------------------------------------------------
        CTS: Clock Tree Synthesis Summary
        CTS: ------------------------------------------------
        …
      Embedded clock tree optimization:
        CTS: Starting block level clock tree optimization
        …
        CTS: gate level 1 clock tree optimization
        CTS: clock net = pclk
  • 15. Gate Upsizing During Clock Tree Synthesis
      • The compile_clock_tree command upsizes all the preexisting cells in the clock tree before building the clock tree.
        Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to bufbdf. (CTS-152)
      • In the previous example, the preexisting gate is upsized from a bufbd4 to a bufbdf.
      • This upsizing helps reduce the number of buffer levels needed to build the clock tree, thereby reducing the buffer count.
  • 16. Maximum Capacitance and Transition Related Warnings
      • Even if the set_clock_tree_options command does not issue any errors when you set the maximum capacitance and transition constraints, the compile_clock_tree command can issue warnings if the values are too small.
        Warning: too small maximum transition (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
          (Max trans = 50 ps is too tight for the pin instCLK1GC1/Q)
        Warning: too small maximum capacitance (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
          (Max cap = 50 fF is too tight for the pin instCLK1GC1/Q)
        Warning: too small maximum transition (=0.050000) defined at library cell bufbdk. (CTS-619)
      • Tight constraints can cause clock tree synthesis to use an excessive number of buffers to build the clock trees.
  • 17. Buffers and Inverters Used During Clock Tree Synthesis
      • Before synthesizing the clock tree, IC Compiler characterizes each buffer and inverter.
        – To see the characterization details, set the following variable to true:
          set cts_do_characterization true
        – After characterization is done, the characterized values for each buffer and inverter are reported:
        CTS: buffer estimated skew target delay driving res input cap
        CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
        CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
        CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
        CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
        (Slide annotations: Buffer and Inverter mark the cell types; Rise delay and Fall delay mark the two values in a bracketed pair.)
      • Driving resistance determines the drive strength of the buffer or inverter.
      • The smaller the driving resistance, the greater the drive strength.
      • In the previous example, bufbdf is the buffer with the highest drive strength.
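The characterization table on slide 17 is regular enough to read with a short script. The following is a rough sketch for ranking cells by drive strength; the function names are hypothetical, and taking the larger of the two driving-resistance values as the ranking key is an assumption made here, not something the slides specify.

```python
import re

# One cell name followed by four "[a b]" pairs, as printed on slide 17.
PAIR = r"\[([\d.]+) ([\d.]+)\]"
ROW = re.compile(r"CTS:\s+(\S+)\s+" + r"\s+".join([PAIR] * 4))
COLUMNS = ("estimated_skew", "target_delay", "driving_res", "input_cap")


def parse_characterization(log):
    """Map each cell name to its four characterized value pairs."""
    cells = {}
    for match in ROW.finditer(log):
        values = [float(v) for v in match.groups()[1:]]
        pairs = [(values[i], values[i + 1]) for i in range(0, 8, 2)]
        cells[match.group(1)] = dict(zip(COLUMNS, pairs))
    return cells


def strongest_driver(cells):
    """Smallest driving resistance means highest drive strength."""
    return min(cells, key=lambda name: max(cells[name]["driving_res"]))


SAMPLE = """\
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
"""

cells = parse_characterization(SAMPLE)
print(strongest_driver(cells))
```

The header row is skipped automatically because it contains no bracketed pairs.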
  • 18. Unbalanced Buffers
      • Buffers and inverters that have a large difference between their rise and fall delays, which is referred to as the rise/fall delay skew, are reported.
        CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)
      • Remove unbalanced buffers from the buffer list specified for clock tree synthesis, as they can cause bad skew.
      • Use the set_clock_tree_references command to specify the buffers and inverters that should be used for clock tree synthesis.
  • 19. Pruning of Buffers and Inverters
      • Pruning is a process by which IC Compiler selects the buffers and inverters that are best suited for clock tree synthesis, based on the buffer and inverter characterization, and prevents the remaining ones from being used.
      • IC Compiler prunes the buffers and inverters based on drive strength and power:
        Pruning library cells (r/f, pwr)
        Min drive = 0.264263.
        Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
        Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
      • IC Compiler calculates a minimum drive value based on heuristics. Buffers and inverters whose drive strength is less than the minimum drive value are considered weak drivers and are pruned by IC Compiler.
      • It is not possible to override the default pruning process.
  • 20. Maximum Transition, Maximum Capacitance and Timing Constraints
      • Before clock tree synthesis begins, all the global clock tree constraints are reported in the log, in the format shown below:
        CTS: Global design rule constraints [rise fall]
        CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
        CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
        CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored
        CTS: Global timing/clock tree constraints
        CTS: clock skew = worst[0.100]
        CTS: insertion delay = worst[2.000]
        CTS: levels per net = 200
      • Slide annotations:
        – The first value on each line is the value used by CTS.
        – GUI is the default value or the value set using set_clock_tree_options.
        – SDC is the value from SDC. Undefined means no value is specified in SDC. Ignored means the value from SDC is ignored because the cts_force_user_constraints variable is set to true.
        – The clock skew and insertion delay lines are the skew and insertion delay targets, which are values set using the set_clock_tree_options command.
  • 21. Clock Tree Synthesis Target Specifications
      • Target specifications are the internal targets for clock tree synthesis, but are not guaranteed. Only target constraints are guaranteed to be achieved.
        CTS: Global target spec [rise fall]
        CTS: transition = worst[0.250 0.250]
        CTS: capacitance = worst[0.300 0.300]
        CTS: fanout = 32 (This target fanout value is not considered by CTS)
      • Target specifications:
        – maxTransSpec: Min(0.25, 80% of max_transition constraints)
        – maxCapSpec: Min(0.30, 80% of max_capacitance constraints)
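The two target-specification formulas on slide 21 are simple enough to evaluate by hand, and a short sketch makes it easy to check a log against them. The function name is hypothetical; the constants 0.25, 0.30 and 80% are taken from the slide, and the inputs are assumed to be in the same units as the log (ns and pF in these examples).

```python
def target_spec(max_transition, max_capacitance):
    """Internal CTS targets per slide 21: Min(cap, 80% of the constraint)."""
    return {
        "transition": min(0.25, 0.8 * max_transition),
        "capacitance": min(0.30, 0.8 * max_capacitance),
    }


# CTS defaults (0.5 ns, 0.6 pF) give the 0.250 / 0.300 targets of slide 21.
print(target_spec(0.5, 0.6))

# Constraints of 0.300 / 0.300, as in the per-gate-level logs on slide 27,
# give targets of about 0.240 / 0.240.
print({k: round(v, 3) for k, v in target_spec(0.3, 0.3).items()})
```

This matches the gate-level logs later in the deck, where constraints of worst[0.300 0.300] are paired with a target spec of worst[0.240 0.240].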
  • 22. Preexisting Clock Tree Information in the Log File
      • Before starting to build the clock tree, the preexisting clock tree structure is printed in the log file.
        CTS: Design infomation
        CTS: total gate levels = 8
        CTS: Root clock net CLK2
        CTS: clock gate levels = 2
        CTS: clock sink pins = 4
        CTS: level 2: gates = 1
        CTS: level 1: gates = 1
        CTS: Buffer/Inverter list for CTS for clock net CLK2:
        CTS: invbdk
        CTS: bufbdk
        ...
        CTS: Root clock net CLK1
        CTS: clock gate levels = 8
        CTS: clock sink pins = 8431
        CTS: level 8: gates = 2
        CTS: level 7: gates = 3
        CTS: level 6: gates = 4
        CTS: level 5: gates = 3
        CTS: level 4: gates = 1
        CTS: level 3: gates = 5
        CTS: level 2: gates = 4
        CTS: level 1: gates = 1
        CTS: Buffer/Inverter list for CTS for clock net CLK1:
        CTS: invbdk
        CTS: bufbdk
        ...
      • Slide annotations:
        – total gate levels: maximum number of gate levels available
        – clock gate levels: number of gate levels for the clock (2 for CLK2)
        – clock sink pins: number of sinks
        – level N: gates = M: existing gate levels and the number of gates at each level
        – Gate levels are listed from the flip-flops towards the clock source.
  • 23. Real Gates and Guide Buffers
      • You may see the term real gates in the preexisting clock tree structure information section:
        CTS: Root clock net CLK1
        CTS: clock gate levels = 16
        CTS: clock sink pins = 70644
        ...
        CTS: level 13: gates = 14 (real gates = 4)
        CTS: level 12: gates = 111 (real gates = 101)
        CTS: level 11: gates = 146 (real gates = 136)
        CTS: level 10: gates = 2488 (real gates = 2478)
      • Real gates are preexisting gates in the clock tree, and are not gates added by the tool.
      • Guide buffers are buffers or inverters that are inserted by the tool before it begins to build the tree. They are intended to help clock tree synthesis build a better clock tree.
      • The number of guide buffers inserted at each level can be determined from the difference between gates and real gates.
        – In the above example, the tool has added 10 guide buffers at each of the clock tree levels shown.
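The gates minus real gates subtraction from slide 23 can be automated over a whole log. This is a minimal sketch with a hypothetical function name; it assumes that a level line printed without a "(real gates = ...)" suffix, as on slide 22, has no guide buffers.

```python
import re

LEVEL = re.compile(
    r"CTS: level (\d+): gates = (\d+)(?: \(real gates = (\d+)\))?"
)


def guide_buffers_per_level(log):
    """Return {level: gates - real gates} for every level line in the log."""
    counts = {}
    for match in LEVEL.finditer(log):
        level = int(match.group(1))
        gates = int(match.group(2))
        # Assumption: no "real gates" suffix means every gate is a real gate.
        real = int(match.group(3)) if match.group(3) else gates
        counts[level] = gates - real
    return counts


SAMPLE = """\
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
"""

print(guide_buffers_per_level(SAMPLE))
```

For the slide's example this reports 10 guide buffers at each of levels 13 through 10.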
  • 24. Buffers and Inverters Used
      • Before it begins to build the clock tree, the tool lists all the buffers and inverters it will use to build the tree.
        CTS: Buffer/Inverter list for CTS for clock net sdram_clk:
        CTS: CLKBUFX20
        CTS: CLKBUFX16
        CTS: CLKBUFX12
          (CTS uses this list)
        CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:
        CTS: CLKBUFX20
        CTS: CLKBUFX16
        CTS: CLKINVX8
          (CTS uses this list for inserting boundary cells)
        CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:
        CTS: CLKBUFX20
        CTS: CLKBUFX16
        CTS: CLKINVX8
          (CTO uses this list for sizing)
        CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:
        CTS: CLKBUFX20
        CTS: CLKBUFX16
        CTS: CLKINVX8
          (CTO uses this list for delay insertion)
      • You can change the buffer and inverter list by using the following command:
        set_clock_tree_references
  • 25. Clock Tree Synthesis Removes User-Specified Ideal Attributes on Clocks
      • Synthesized clocks are set to be propagated, and clock transition, which is an attribute of an ideal clock, is removed.
        CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
        CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)
      • Latency, another attribute of an ideal clock, is also removed.
        CTS: Information: Removing clock latency on pin Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/ I_dma@ ... (CTS-098)
      • Source latency is removed for generated clocks.
        Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)
      • These messages are informational only, and no action is required.
  • 26. Overlap or Reconvergent Paths
      • Overlap or reconvergent paths occur when multiple clocks can drive a node.
      • IC Compiler issues warnings about such paths:
        Warning: Either the driven net has been synthesized previously or clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)
      • Such messages should be treated as informational, rather than as warnings.
        – IC Compiler has no problems handling such situations.
  • 27. Gate Level-by-Level Clock Tree Synthesis
      • Clock tree building is done gate level by gate level, starting from the sinks to the clock root.
      • For each gate level, just before the synthesis starts, the following information is printed in the log:
        CTS: gate level 2 clock tree synthesis
        CTS: clock net = I_BLENDER_1/gclk
        CTS: driving pin = I_BLENDER_1/U483/Z
          (Net and driver at this gate level)
        CTS: gate level 2 design rule constraints [rise fall]
        CTS: max transition = worst[0.300 0.300]
        CTS: max capacitance = worst[0.300 0.300]
        CTS: max fanout = 2000
        CTS: gate level 2 target spec [rise fall]
        CTS: transition = worst[0.240 0.240]
        CTS: capacitance = worst[0.240 0.240]
        CTS: driver cap. = worst[0.088 0.088]
        CTS: fanout = 32
        CTS: gate level 2 timing constraints
        CTS: clock skew = worst[0.000]
        CTS: levels per net = 200
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
  • 28. Clustering During Clock Tree Synthesis
      • The clock tree building starts with clustering. Clustering is the process of dividing a set of sink pins (fanouts) into groups. Each group is driven by a buffer.
        – The instances of a cluster are all close to each other.
      • The following message says that 423 sink pins are divided into 27 clusters, each with approximately 423/27 sink pins.
        CTS: gate level 2 clock tree synthesis
        ...
        CTS: gate level 2 design rule constraints [rise fall]
        CTS: max transition = worst[0.300 0.300]
        CTS: max capacitance = worst[0.300 0.300]
        CTS: max fanout = 2000
        CTS: gate level 2 target spec [rise fall]
        CTS: transition = worst[0.240 0.240]
        CTS: capacitance = worst[0.240 0.240]
        CTS: driver cap. = worst[0.088 0.088]
        CTS: fanout = 32
        CTS: gate level 2 timing constraints
        ...
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
        CTS: Completed 423 to 27 clustering
        CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
        CTS: Completed 27 to 4 clustering
        CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
        CTS: -----------------------------------------------
      • Slide annotations:
        – One buffer level is added with each clustering.
        – skew (a, b): skew before clustering and after clustering.
        – viol(x y): represents DRCs (cap, trans); y means a violation is present, n means no violation.
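The clustering messages on slide 28 can be summarized per step with a small parser. This sketch uses hypothetical names and interprets only the fields the slide annotates: the skew pair as (before, after) and the viol pair as (cap, trans). The lp and c fields are deliberately ignored because the slide does not explain them.

```python
import re

COMPLETED = re.compile(r"CTS: Completed (\d+) to (\d+) clustering")
BA = re.compile(
    r"CTS: BA: lp \([\d.]+, [\d.]+\): skew \(([\d.]+), ([\d.]+)\) "
    r"c\([\d.]+, [\d.]+\) viol\(([yn]) ([yn])\)"
)


def parse_clustering(log):
    """Return one dict per 'Completed N to M clustering' message."""
    steps = []
    for line in log.splitlines():
        done = COMPLETED.search(line)
        if done:
            steps.append({"sinks": int(done.group(1)),
                          "clusters": int(done.group(2))})
            continue
        ba = BA.search(line)
        if ba and steps:
            # Attach the summary line to the most recent clustering step.
            steps[-1].update(
                skew_before=float(ba.group(1)),
                skew_after=float(ba.group(2)),
                cap_violation=ba.group(3) == "y",
                trans_violation=ba.group(4) == "y",
            )
    return steps


SAMPLE = """\
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
"""

steps = parse_clustering(SAMPLE)
for step in steps:
    print(step["sinks"], "->", step["clusters"],
          "avg", round(step["sinks"] / step["clusters"], 2),
          "trans violation:", step["trans_violation"])
```

For the first step, 423/27 works out to roughly 15.7 sink pins per cluster, and the trailing y in viol(n y) flags a transition violation that is gone after the second clustering step.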
  • 29. Clustering With Hookup Pins • Hookup pins are input pins of gates or macros • Unlike clock pins of flip-flops and latches (sink pins), hookup pins have a nonzero phase delay that must be balanced with the sink pins
  • 30. Clustering With Hookup Pins
      • Initially, the tool attempts to cluster hookup pins along with the normal sinks (trial clustering).
      • In this example, there are 479 sinks and 1 hookup pin.
        CTS: gate level 1 clock tree synthesis
        ...
        CTS: gate level 1 design rule constraints [rise fall]
        CTS: max transition = worst[0.300 0.300]
        CTS: max capacitance = worst[0.300 0.300]
        CTS: max fanout = 2000
        CTS: gate level 1 target spec [rise fall]
        CTS: transition = worst[0.240 0.240]
        CTS: capacitance = worst[0.240 0.240]
        CTS: driver cap. = worst[0.150 0.150]
        CTS: fanout = 32
        CTS: gate level 1 timing constraints
        ...
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
        CTS: Completed 480 to 34 clustering
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
        CTS: Completed 34 to 6 clustering
        CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
        CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
        CTS: BA: target cap = 0.070 pf
        CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
        CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
        CTS: Completed 479 to 39 clustering
        CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
        CTS: -----------------------------------------------
      • At the trial clustering stage, the hookup pin is considered along with the other sink pins, and (479+1) to 34 to 6 clustering is obtained.
      • At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup pin (479 to 39 clustering).
  • 31. Clustering With Hookup Pins: Hookup Pin Clustered With Sinks
      • If the trial clustering gives good QoR results, the following message (highlighted in blue on the original slide) is displayed:
        CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
        CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
        CTS: Completed 2 to 2 clustering
        CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
        CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
        CTS: Completed 2 to 1 clustering
        CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
        CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
        CTS: BA: target cap = 0.048 pf
        CTS: Pin 1: periph/U5659/A is selected for next level
        CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
        CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
        CTS: Completed 2 to 2 clustering
        CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
        CTS: -----------------------------------------------
      • When the phase delay of the hookup pin periph/U5659/A matches the delay of the already built tree at that gate level, it is clustered at that buffer level.
  • 32. Meeting Target Early Delay
    • After the synthesis of the root clock net (gate level 1 synthesis), the tool checks whether the delay constraint set by the user is met.
    • If it is not met, the tool inserts buffers at the root clock net to achieve the target delay specified by the user.
    • In the following messages, 16 buffers are inserted at the root clock net to increase the delay from 0.569 ns to 2 ns, which is the user-specified target (the "insertion delay" line under the timing constraints is the constraint set by the user).
        CTS: gate level 1 clock tree synthesis
        CTS: clock net = sys_clk
        CTS: driving pin = sys_clk
        CTS: gate level 1 design rule constraints [rise fall]
        ...
        CTS: gate level 1 target spec [rise fall]
        ...
        CTS: gate level 1 timing constraints
        CTS: clock skew = worst[0.000]
        CTS: insertion delay = worst[2.000]
        CTS: levels per net = 200
        CTS: -----------------------------------------------
        CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
        ...
        CTS: -----------------------------------------------
        CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
        CTS: Completed 19 to 2 clustering
        CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
        CTS: -----------------------------------------------
        CTS: Inserting delay cells for clock tree sys_clk ...
        CTS: current delay = worst[0.569] worst[0.457]
        CTS: constraint = worst[2.000] worst[0.000]
        CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
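The gap that the delay cells have to cover can be read straight from the last three log lines of this slide. The following is a minimal Python sketch, assuming the log lines have exactly the format shown above; the function name `delay_gap` and the regular expressions are illustrative and are not part of IC Compiler. The per-cell figure is only an average, since the real delay of each cell depends on its load and transition.

```python
import re

# Excerpt copied from the slide above.
LOG = """\
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
"""

def delay_gap(log_text):
    """Return (gap_ns, cell_count, average_ns_per_cell) for a delay-insertion excerpt."""
    # The first worst[...] value on each line is the longest-path figure.
    current = float(re.search(r"current delay = worst\[([\d.]+)\]", log_text).group(1))
    target = float(re.search(r"constraint = worst\[([\d.]+)\]", log_text).group(1))
    count = int(re.search(r"inserted (\d+) \(", log_text).group(1))
    gap = target - current
    return gap, count, gap / count

gap, count, per_cell = delay_gap(LOG)
print(f"gap = {gap:.3f} ns over {count} cells, about {per_cell:.3f} ns per cell")
```

For the excerpt above, the gap is 2.000 - 0.569 = 1.431 ns spread over 16 cells.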
  • 33. Synthesis Results of One Gate Level
    • After the synthesis of a gate level, the results are printed in the log:
        CTS: gate level 1 clock tree synthesis results
        CTS: clock net : sdram_clk
        CTS: driving pin: sdram_clk
        CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
        CTS: buffer level 1: bufbd7 (1)
        CTS: buffer level 2: bufbd7 (1)
        CTS: clock tree skew = worst[0.036]
        CTS: longest path delay = worst[0.327](rise)
        CTS: shortest path delay = worst[0.291](rise)
        CTS: total capacitance = worst[0.389 0.389]
        CTS: buffer level phase delay
        CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
        CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
        CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
        CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
        CTS: buffer level output transition delays [rise fall]
        CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
        CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
        CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
        CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
        CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
        CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
        CTS: buffer level total load capacitance
        CTS: level 0: worst[0.045 0.045]
        CTS: level 1: worst[0.093 0.093]
        CTS: level 2: worst[0.251 0.251]
        CTS: drc violations: 0 0
    • "clock tree skew", "longest path delay", and "shortest path delay": skew and insertion delay at the driving pin (here sdram_clk).
    • "worst": the operating condition.
    • "total capacitance": the load capacitance values are added and reported as the total capacitance of the subtree.
    • "drc violations": number of cap violations, followed by number of trans violations.
    [Figure: clock tree diagram with buffer levels 1 and 2 and points A, B, and C marked]
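One quick sanity check on a results block is that the reported skew equals the longest path delay minus the shortest path delay, up to rounding. The following is a minimal Python sketch, assuming the three summary lines have the format shown on this slide; the helper `read_value` and the tolerance are illustrative choices and are not taken from the tool.

```python
import re

# Three summary lines copied from the slide above.
LOG = """\
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
"""

def read_value(label, text):
    """Return the number inside worst[...] on the line that starts with `label`."""
    match = re.search(re.escape(label) + r"\s*=\s*worst\[([\d.]+)\]", text)
    return float(match.group(1))

skew = read_value("clock tree skew", LOG)
longest = read_value("longest path delay", LOG)
shortest = read_value("shortest path delay", LOG)

# Each value is printed with three decimals, so allow for accumulated rounding.
consistent = abs((longest - shortest) - skew) <= 0.0015
print(f"skew={skew} longest-shortest={longest - shortest:.3f} consistent={consistent}")
```

The tolerance matters in practice: on the clock-by-clock summary later in the deck the printed values differ by 0.001 ns from the printed skew, which is within rounding.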
  • 34. Maximum Transition and Capacitance Violations
    • After each gate level is synthesized, the maximum capacitance and maximum transition violations at that gate level are reported:
        CTS: gate level 3 clock tree synthesis results
        ...
        CTS: buffer level total load capacitance
        ...
        CTS: capacitance violation on periph/CTS_755
        CTS: capacitance = worst[0.052 0.052]
        CTS: constraint = worst[0.050 0.050]
        CTS: capacitance violation on periph/CTS_757
        CTS: capacitance = worst[0.051 0.051]
        CTS: constraint = worst[0.050 0.050]
        ...
        CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
        CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
        CTS: constraint = worst[0.050 0.050]
        CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
        CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
        CTS: constraint = worst[0.050 0.050]
        ...
        CTS: drc violations: 18 5
    • "drc violations": number of cap violations, followed by number of trans violations.
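When a gate level reports many violations, it can help to list them with the amount by which each exceeds its constraint. The following is a minimal Python sketch, assuming each violation occupies three consecutive lines as on this slide (header, measured value, constraint); the function `parse_violations` and its output tuple layout are illustrative assumptions.

```python
import re

# Two violations copied from the slide above.
LOG = """\
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
"""

HEADER = re.compile(r"CTS: (capacitance|transition delay) violation (?:on|at) (\S+)")
PAIR = re.compile(r"worst\[([\d.]+) ([\d.]+)\]")

def parse_violations(log_text):
    """Return a list of (kind, object_name, overshoot) tuples."""
    lines = [line.strip() for line in log_text.splitlines()]
    found = []
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if not header or i + 2 >= len(lines):
            continue
        value = PAIR.search(lines[i + 1])   # first [rise fall] pair on the value line
        limit = PAIR.search(lines[i + 2])   # [rise fall] constraint
        if not value or not limit:
            continue
        # Compare rise with rise and fall with fall, then keep the larger overshoot.
        rise_over = float(value.group(1)) - float(limit.group(1))
        fall_over = float(value.group(2)) - float(limit.group(2))
        found.append((header.group(1), header.group(2), round(max(rise_over, fall_over), 3)))
    return found

violations = parse_violations(LOG)
for kind, name, overshoot in violations:
    print(f"{kind:17s} {name:35s} +{overshoot:.3f}")
```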
  • 35. More Complex Synthesis Results
        CTS: gate level 1 clock tree synthesis results
        CTS: clock net : clk
        CTS: driving pin: clk
        CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
        CTS: buffer level 1: CLKBUFX20 (1)
        CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
        CTS: clock tree skew = worst[0.001]
        CTS: longest path delay = worst[0.248](rise)
        CTS: shortest path delay = worst[0.246](rise)
        CTS: total capacitance = worst[0.549 0.549]
        CTS: buffer level phase delay
        CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
        CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
        CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
        CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
        CTS: buffer level output transition delays [rise fall]
        CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
        CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
        CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
        CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
        CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
        CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
        CTS: buffer level total load capacitance
        CTS: level 0: worst[0.038 0.038]
        CTS: level 1: worst[0.108 0.108]
        CTS: level 2: worst[0.403 0.403]
        CTS: drc violations: 0 0
  • 36. Gate Level and Buffer Level Nomenclature
    [Figure: clock tree diagram labeling gate level 1 (clock source pin), gate level 2, and the buffer levels of each gate level. Red: preexisting gates. Black: CTS-introduced gates.]
    • At each gate level, the clock tree is built bottom-up, but the buffer names are changed to appear top-down.
  • 37. DRC Violation Report After Synthesis
    • After the complete clock tree is built, all remaining DRC violations in the entire clock tree are reported in the log file:
        CTS: Clock tree synthesis completed successfully
        CTS: CPU time: 50 seconds
        CTS: Reporting clock tree violations ...
        CTS: Global design rules:
        CTS: maximum transition delay [rise,fall] = [0.05,0.05]
        CTS: maximum capacitance = 0.05
        CTS: maximum fanout = 2000
        CTS: maximum buffer levels per net = 200
        CTS: transition delay violation at sdram_clk
        CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
        CTS: constraint = worst[0.050 0.050]
        CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
        CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
        CTS: constraint = worst[0.050 0.050]
        CTS: capacitance violation on CTS_6557
        CTS: capacitance = worst[0.074 0.074]
        CTS: constraint = worst[0.050 0.050]
        CTS: Summary of clock tree violations:
        CTS: Total number of transition violations = 2
        CTS: Total number of capacitance violations = 1
    • "Global design rules": the constraints.
    • The report lists only transition and capacitance violations.
    • "Summary of clock tree violations": the total transition and capacitance violations.
  • 38. Summary Report After Clock Tree Synthesis
        CTS: ------------------------------------------------
        CTS: Clock Tree Synthesis Summary
        CTS: ------------------------------------------------
        CTS: 5 clock domain synthesized
        CTS: 30 gated clock nets synthesized
        CTS: 26 buffer trees inserted
        CTS: 722 buffers used (total size = 45974.2)
        CTS: 752 clock nets total capacitance = worst[76.868 76.868]
    • Each gate level can have multiple nets.
  • 39. Clock-by-Clock Summary
    • A summary is reported for each clock:
        CTS: ------------------------------------------------
        CTS: Clock-by-Clock Summary
        CTS: ------------------------------------------------
        CTS: Root clock net pclk
        CTS: 3 gated clock nets synthesized
        CTS: 2 buffer trees inserted
        CTS: 2 buffers used (total size = 159.667)
        CTS: 5 clock nets total capacitance = worst[0.514 0.514]
        CTS: clock tree skew = worst[0.341]
        CTS: longest path delay = worst[5.959](rise)
        CTS: shortest path delay = worst[5.619](rise)
        CTS: Root clock net sys_clk
        ...
    • A buffer tree is inserted only if necessary.
  • 40. Embedded Clock Tree Optimization
    • After clock tree synthesis, embedded clock tree optimization begins.
    • The characteristics of the buffers and inverters used are reported again:
        CTS: buffer   estimated skew   target delay    driving res     input cap
        CTS: bufbdf   [0.013 0.015]    [0.217 0.200]   [0.210 0.248]   [0.007 0.007]
        CTS: inv0da   [0.018 0.021]    [0.097 0.119]   [0.294 0.347]   [0.036 0.036]
        ...
    • The global constraints for the clock tree are also reported again:
        CTS: Global design rule constraints [rise fall]
        CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
        ...
        CTS: Global timing/clock tree constraints
        CTS: clock skew = worst[0.000]
        ...
        CTS: Global target spec [rise fall]
        CTS: transition = worst[0.040 0.040]
        ...
    Note: Embedded clock tree optimization is called only when the compile_clock_tree command is used. It is not called when the clock_opt command is used.
  • 41. More Messages on Real Gates and Guide Buffers
    • At the beginning of optimization, you might get the following messages:
        CTS: Root clock net chip_sclk_src
        CTS: clock gate levels = 75
        CTS: clock sink pins = 125896
        ...
        CTS: level 73: gates = 3 (real gates = 1)
        CTS: level 72: gates = 2 (no real gates, guide buffers only)
    • "no real gates, guide buffers only": all the gates at that level are guide buffers and inverters inserted during clock tree synthesis.
    • This information is similar to that printed prior to clock tree synthesis.
  • 42. Gate Level Optimization
    • Clock tree optimization is also done one gate level at a time, in the same way the clock tree is built.
    • Before a gate level is optimized, the current skew, longest path delay, and shortest path delay from the driving pin of that gate level are reported:
        CTS: gate level 2 clock tree optimization
        CTS: clock net = I_BLENDER_1/gclk
        CTS: driving pin = I_BLENDER_1/U483/Z
        CTS: clock tree skew = worst[0.517]
        CTS: longest path delay = worst[5.339](rise)
        CTS: shortest path delay = worst[4.822](fall)
    • That gate level is then optimized.
  • 43. Buffer Sizing
    • The following messages indicate that buffer sizing was successful:
        CTO-BS: Starting buffer sizing ...
        Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
        CTO-BS: CPU time = 0 seconds for buffer sizing
    • Clock tree optimization tries to resize buffers to improve skew and insertion delay. If a resize is not beneficial, the original cell master is restored:
        CTO-BS: Starting buffer sizing ...
        CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
        CTO-BS: CPU time = 1 seconds for buffer sizing
  • 44. Gate Sizing
        CTO-GS: Starting gate sizing ...
        Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
        Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
        Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
        Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
        ...
        Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
        Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
        Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
        Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
        CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
        CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: improvement = worst[0.106%]
        Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
        Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
        Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
        Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
        Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
        Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
        CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
        CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: improvement = worst[0.000%]
        CTO-GS: Summary of cell sizing
        CTO-GS: Sized 20/40 cell instances (tested 80X247)
        CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
        CTO-GS: improvement = worst[0.106%]
        CTO-GS: CPU time = 2413 seconds for gate sizing
    • "1: Sized 14/40 cell instances": summary of the first round of sizing. It gives the number of gates sized (here 14 out of 40 gates) and shows the improvement in skew.
    • "Summary of cell sizing": overall summary of the gate sizing done at this gate level. A total of 14 + 6 = 20 gates are sized, giving a 0.106% improvement in skew at this gate level.
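The per-round counts and the final summary can be cross-checked mechanically. The following is a minimal Python sketch, assuming the "Sized N/M cell instances" lines have the format shown on this slide; the regular expressions and variable names are illustrative and are not part of the tool.

```python
import re

# Round and summary lines copied from the slide above.
LOG = """\
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
"""

# Per-round lines carry a round number before "Sized"; the summary line does not.
ROUND = re.compile(r"CTO-GS: (\d+): Sized (\d+)/(\d+) cell instances")
SUMMARY = re.compile(r"CTO-GS: Sized (\d+)/(\d+) cell instances")

rounds = [(int(number), int(sized)) for number, sized, _total in ROUND.findall(LOG)]
summary_sized = int(SUMMARY.search(LOG).group(1))
total_from_rounds = sum(sized for _number, sized in rounds)

print(f"rounds={rounds} total={total_from_rounds} summary={summary_sized}")
```

Here the two rounds add up to 14 + 6 = 20, which matches the summary line.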
  • 45. Gate Relocation
    • Gate relocation works on preexisting gates.
    • If you have no preexisting gates, you might see the following message:
        CTO-GR: gate relocation is skipped since there are no hookup pins
  • 46. A Successful Gate Relocation
        CTO-GR: Starting gate relocation ...
        CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
        CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
        CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
        CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
        CTO-GR: improvement = worst[0.000%]
        CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
        CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
        CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
        CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
        CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
        CTO-GR: improvement = worst[1.118%]
        CTO-GR: Summary of cell relocation
        CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
        CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
        CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
        CTO-GR: improvement = worst[1.118%]
        CTO-GR: CPU time = 2 seconds for gate relocation
    • "1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)": 2 cells were tried at 47 new locations, and 1 was moved.
    • "delay (from)" gives the initial skew, "delay (to)" gives the final skew, and "improvement" gives the improvement in skew.
    • "Summary of cell relocation": overall summary of gate relocation at this gate level.
  • 47. Gate Relocation: Failed Attempts
        CTO-GR: Starting gate relocation ...
        CTO-GR: Summary of cell relocation
        CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
        CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
        CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
        CTO-GR: improvement = worst[0.000%]
        CTO-GR: CPU time = 0 seconds for gate relocation
    • In this example, clock tree optimization tried to move one gate instance to 24 different locations. Because none of the attempts improved the QoR, the gate relocation was abandoned.
  • 48. Buffer Relocation
    • Buffer relocation is done on all buffers inserted by clock tree synthesis:
        CTO-BR: Buffer relocation ...
        CTO-BR: Optimization level: net
        CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
        CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
        CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
        CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
        CTO-BR: improvement = worst[2.013%]
        CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
        CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
        CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
        CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
        CTO-BR: improvement = worst[0.000%]
        CTO-BR: Summary of cell relocation
        CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
        CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
        CTO-BR: delay (to) = worst[9.099] worst[8.503]; skew = worst[0.584]
        CTO-BR: improvement = worst[2.013%]
        CTO-BR: CPU time = 0 seconds for buffer relocation
    • The information is similar to that reported for gate relocation.
  • 49. Post Embedded Clock Tree Synthesis
    • After the embedded clock tree optimization, the tool prints the summary.
    • It has the same format as the summary printed after clock tree synthesis:
        CTS: ------------------------------------------------
        CTS: Clock Tree Optimization Summary
        CTS: ------------------------------------------------
        CTS: 4 clock domain synthesized
        CTS: 5 gated clock nets synthesized
        CTS: 5 buffer trees inserted
        CTS: 1000 buffers used (total size = 16570.8)
        CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
        CTS: ------------------------------------------------
        CTS: Clock-by-Clock Summary
        CTS: ------------------------------------------------
        CTS: Root clock net sdram_clk
        CTS: 1 gated clock nets synthesized
        CTS: 1 buffer trees inserted
        CTS: 302 buffers used (total size = 5039.47)
        CTS: 303 clock nets total capacitance = worst[4.170 4.170]
        CTS: clock tree skew = worst[0.035]
        CTS: longest path delay = worst[2.041](rise)
        CTS: shortest path delay = worst[2.006](fall)
        CTS: Root clock net sys_2x_clk
        ...
    • After the summary, all the transition and capacitance violations on the clock tree are also reported:
        CTS: Global design rules:
        CTS: maximum transition delay [rise,fall] = [0.05,0.05]
        CTS: maximum capacitance = 0.05
        CTS: maximum fanout = 2000
        CTS: maximum buffer levels per net = 200
        CTS: transition delay violation at sdram_clk
        CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
        CTS: constraint = worst[0.050 0.050]
        CTS: transition delay violation at buffd2_G1B1I1/Z
        ...
        CTS: Summary of clock tree violations:
        CTS: Total number of transition violations = 3994
        CTS: Total number of capacitance violations = 1
  • 50. DRC Fixing Beyond Exceptions
    • After embedded clock tree optimization, the tool starts fixing the DRC violations beyond exceptions.
    • The messages are similar to those for clustering:
        CTS: fixing DRC beyond exception pins under clock CLK1
        CTS: gate level 2 DRC fixing (exception level 1)
        CTS: clock net = CLK1_G1IP
        CTS: driving pin = bufbd2_G1IP_1/Z
        CTS: gate level 2 design rule constraints [rise fall]
        CTS: max transition = worst[0.100 0.100]
        CTS: max capacitance = worst[0.600 0.600]
        CTS: max fanout = 2000
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
        CTS: Completed 4 to 1 clustering
        CTS: -----------------------------------------------
        CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
        CTS: Completed 1 to 1 clustering
        CTS: ------------------------------------------------
    • After the DRC violations are fixed, the overall summary and the clock-by-clock summary of DRC fixing beyond exceptions are reported.
  • 51. Placement Legalization is Called After Clock Tree Synthesis
    • When clock tree synthesis places a clock tree buffer or inverter, it places it at a legal location, but the location might be occupied.
       This causes overlaps, which need to be resolved.
    • The tool calls the placement legalizer, which moves the cells to resolve the overlaps.
    • After legalization, the cells with large displacement are reported in the log:
        Largest displacement cells:
        Cell: periph/U122 (AND3X)
        Input location: (906.380 1597.520)
        Legal location: (897.140 1582.400)
        Displacement: 17.720 um, e.g. 3.52 row height.
        Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)
    • The cell shown is 1 of the 6 cells that were displaced.
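The reported displacement can be reproduced from the two locations in the log. The following is a minimal Python sketch, under two assumptions that are inferred from the numbers rather than stated on the slide: the displacement is the straight-line (Euclidean) distance between the input and legal locations, and the row height is 15.120 / 3 = 5.04 um, taken from the "> 15.120 um or 3 row height" threshold. The function name `displacement` is illustrative.

```python
import math

# Assumption: row height inferred from the "> 15.120 um or 3 row height" line.
ROW_HEIGHT_UM = 15.120 / 3

def displacement(input_xy, legal_xy, row_height=ROW_HEIGHT_UM):
    """Return (distance_um, distance_in_row_heights) between two locations."""
    dx = legal_xy[0] - input_xy[0]
    dy = legal_xy[1] - input_xy[1]
    distance = math.hypot(dx, dy)  # assumed Euclidean distance
    return distance, distance / row_height

# Locations of periph/U122 copied from the log above.
dist_um, rows = displacement((906.380, 1597.520), (897.140, 1582.400))
print(f"{dist_um:.3f} um, {rows:.2f} row heights")  # prints "17.720 um, 3.52 row heights"
```

Under these assumptions the result agrees with the "Displacement: 17.720 um, e.g. 3.52 row height" line in the log.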
  • 52. Agenda
    • Prerequisites for Clock Tree Synthesis
    • Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis
    • Clock Tree Synthesis Log Messages
    • Clock Tree Optimization Log Messages
  • 53. The optimize_clock_tree Command Log File Messages
    • Optimization options
    • Report before optimization
    • Optimization
    • Report after optimization
  • 54. Standalone Optimization Using the optimize_clock_tree Command
    • Standalone optimization differs from embedded optimization in the algorithms used.
    • Some of the log messages are similar to those printed when you use the compile_clock_tree command:
       Design update information
       Buffer characterization
       Pruning of cells
       List of cells used for clock tree optimization
  • 55. CTS-352 Warning
    • The default delay calculation engine is Elmore. Elmore delay calculation might lead to inferior accuracy in skew and latency estimation.
    • Enable the Arnoldi delay calculation engine for more accurate delay calculation during optimization by using the following command:
        set_delay_calculation -clock_arnoldi
    • Otherwise, the optimize_clock_tree command issues the following warning:
        Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)
  • 56. Optimization Options
    • Before starting optimization, the optimize_clock_tree command reports the root pin and the optimization options for each clock.
    • These are the options that you specified by using the set_clock_tree_optimization_options command:
        Initializing parameters for clock CLK2GC:
        Root pin: instCLK2GC/Q
        Using the following optimization options:
        gate sizing : on
        gate relocation : on
        preserve levels : off
        area recovery : on
        relax insertion delay : off
        balance rc : off
  • 57. Preoptimization Report
    • Before the tool begins to optimize the clock tree, it reports some of the current characteristics of the clock tree:
        *****************************************
        * Preoptimization report (clock 'CLK3') *
        *****************************************
        Corner 'max'
        Estimated Skew (r/f/b) = (0.073 0.000 0.073)
        Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
        Corner 'RC-ONLY'
        Estimated Skew (r/f/b) = (0.005 0.000 0.005)
        Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
        Wire capacitance = 0.8 pf
        Total capacitance = 2.3 pf
        Max transition = 0.448 ns
        Cells = 24 (area=67.500000)
        Buffers = 23 (area=67.500000)
        Buffer Types
        ============
        bufbd2: 1
        bufbdf: 8
        bufbd7: 5
        bufbd4: 3
        bufbd1: 6
    • "clock 'CLK3'": the clock name. "Corner": the CTS corner.
    • "Estimated Skew" and "Estimated Insertion Delay": the starting skew and ID for the clock as seen by CTO.
    • "Max transition": the maximum transition value present in the clock tree.
    • "Cells", "Buffers", and "Buffer Types": information about the buffers and inverters present in the clock tree.
  • 58. Optimization Messages
    • During optimization, the tool prints messages for sizing, insertion and removal, and switching of metal layers.
    • Buffer removal:
        Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
        iteration 1: (0.314104, 3.328620)
        Total 1 buffers removed on clock CLK3
        Start (3.256, 3.527), End (3.015, 3.329)
        ....
    • Cell sizing:
        iteration 2: (0.313991, 3.314841)
        iteration 3: (0.308073, 3.295621)
        Total 2 cells sized on clock CLK3
        Start (3.015, 3.329), End (2.988, 3.296)
        ....
    • Buffer insertion:
        iteration 6: (0.305181, 3.275623)
        Total 1 delay buffers added on clock sck_in12 (LP)
        Start (2.975, 3.283), End (2.970, 3.276)
        ....
    • Metal layer switching:
        Switch to low metal layer for clock 'CLK3':
        Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap change 0.00 percent
    • Each "iteration" line reports (skew, ID).
    • Start (sp, lp): initial delays. End (sp, lp): final delays. sp: shortest path delay. lp: longest path delay.
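Because the Start and End lines report (sp, lp), the skew before and after each optimization step can be derived as lp minus sp and compared with the skew shown on the corresponding "iteration" line. The following is a minimal Python sketch, assuming the Start/End format shown on this slide; the function `skew_changes` and its return layout are illustrative.

```python
import re

# Start/End lines copied from the buffer removal and cell sizing messages above.
LOG = """\
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
"""

STEP = re.compile(r"Start \(([\d.]+), ([\d.]+)\), End \(([\d.]+), ([\d.]+)\)")

def skew_changes(log_text):
    """Return a list of (start_skew, end_skew), where skew = lp - sp."""
    changes = []
    for sp_start, lp_start, sp_end, lp_end in STEP.findall(log_text):
        start_skew = float(lp_start) - float(sp_start)
        end_skew = float(lp_end) - float(sp_end)
        changes.append((round(start_skew, 3), round(end_skew, 3)))
    return changes

changes = skew_changes(LOG)
for start_skew, end_skew in changes:
    print(f"skew {start_skew:.3f} -> {end_skew:.3f}")
```

For the buffer removal step, the derived end skew of 0.314 agrees with the skew on the "iteration 1: (0.314104, 3.328620)" line; for the cell sizing step, 0.308 agrees with "iteration 3: (0.308073, 3.295621)".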
  • 59. Optimization Messages
    • If the area recovery option is enabled, the tool performs area recovery after optimizing each clock and reports the changes made to that clock:
        Area recovery optimization for clock 'CLK3':
        15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
        Deleting cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.
        Total 1 buffers removed (all paths) for clock 'CLK3'
  • 60. Post Optimization Report
    • After completing the optimization of a clock, the tool reports the new characteristics of the clock tree.
    • This is similar to the information printed before optimization:
        **************************************************
        * Multicorner optimization report (clock 'CLK3') *
        **************************************************
        Corner 'max'
        Estimated Skew (r/f/b) = (0.041 0.000 0.041)
        Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
        Corner 'RC-ONLY'
        Estimated Skew (r/f/b) = (0.007 0.000 0.007)
        Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
        Wire capacitance = 0.8 pf
        Total capacitance = 2.3 pf
        Max transition = 0.356 ns
        Cells = 24 (area=59.000000)
        Buffers = 23 (area=59.000000)
        Buffer Types
        ============
        bufbd7: 4
        bufbdf: 6
        bufbd4: 5
        bufbd1: 7
        bufbd2: 1
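Comparing this report with the preoptimization report for the same clock shows what the optimization achieved. The following is a minimal Python sketch, assuming both reports use the format shown in this deck and that the third value of each (r/f/b) triple is the combined figure (an interpretation, not stated on the slides). The excerpts are the 'max' corner lines from the two CLK3 reports; the function names `metrics` and `percent_change` are illustrative.

```python
import re

# 'max' corner lines from the preoptimization report for CLK3.
PRE = """\
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
"""

# The same lines from the post optimization report for CLK3.
POST = """\
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
"""

PATTERNS = {
    # Third value of the (r/f/b) triple; assumed here to be the combined figure.
    "skew": r"Estimated Skew \(r/f/b\) = \(\S+ \S+ ([\d.]+)\)",
    "insertion_delay": r"Estimated Insertion Delay \(r/f/b\) = \(\S+ \S+ ([\d.]+)\)",
    "max_transition": r"Max transition = ([\d.]+) ns",
    "area": r"Cells = \d+ \(area=([\d.]+)\)",
}

def metrics(report):
    """Extract the first occurrence of each metric from a report excerpt."""
    return {name: float(re.search(pattern, report).group(1))
            for name, pattern in PATTERNS.items()}

def percent_change(before, after):
    """Return the percent change per metric (negative means a reduction)."""
    return {name: round(100.0 * (after[name] - before[name]) / before[name], 1)
            for name in before}

change = percent_change(metrics(PRE), metrics(POST))
for name, value in change.items():
    print(f"{name:16s} {value:+.1f}%")
```

Only the first corner in each excerpt is read; a multicorner comparison would need to split the report per "Corner" line first.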
  • 61. Reporting the Longest and Shortest Paths
    • The longest and shortest paths for all corners are reported soon after the post optimization report:
        ++ Longest path for clock CLK3 in corner 'max':
        object fan cap trn inc arr r location
        clk3 (port) 32 0 0 r ( 440 748)
        clk3 (net) 13 97
        …
        I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1) 167 4 289 r ( 521 520)
        ++ Shortest path for clock CLK3 in corner 'max':
        object fan cap trn inc arr r location
        clk3 (port) 32 0 0 r ( 440 748)
        clk3 (net) 13 97
        …
        I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1) 217 4 247 r ( 687 656)
    • Placement legalization messages are located at the end of the optimize_clock_tree command log.