1. Asynchronous Clock Domain Crossings Aware Physical Implementation of ASICs
Ramesh Rajagopalan (rameraja@cisco.com), Cisco Systems Inc, San Jose, CA
Ajay Bhandari (ajayb@cisco.com), Cisco Systems Inc, San Jose, CA
Namit Gupta (namit@atrenta.com), Atrenta Inc, San Jose, CA
2. Introduction: Asynchronous Clock Domain Crossings (CDC)
[Figure: flop D0 launches data through combinational logic (output C0) to flop D1; clock waveforms Ck1 and Ck2]
Input to the combinational logic changes on the clock edge. The combinational logic computation produces transient glitches. The computation settles before the next clock edge, so D1 captures stable data while honoring setup/hold time. But what if D1 is clocked by another, asynchronous clock?
27. Bringing cross-subchip CDC logic into one subchip
Flowchart steps: Synthesis; Create abstract views of partitions to perform top-level CDC rule checks; Top-level floorplan; CDC checks report domain crossings at the top level between partitions; Top-level timing reports net delays for domain-crossing logic between partitions; Decision: does CDC logic spread between two partitions need to be brought into one partition? (YES / NO); Output: netlist and floorplan with finalized physical partitions.
In a synchronous design, when data is launched from source flop D0 as shown, transient glitches appear on C0 while the combinational logic computes, but the computation settles before the next clock edge arrives and D1 captures safe data. What happens if D1 is clocked by another, asynchronous clock? Because an asynchronous edge can arrive at any time, possibly before the computation has settled, D1 may well sample one of the transient glitches, which can cause chip failure.
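The contrast between synchronous and asynchronous capture can be illustrated with a toy timing model. The sketch below is a hypothetical simplification, not part of the original flow: it assumes C0 may glitch for a fixed settle time after launch, and that an asynchronous capture edge is equally likely at any phase of its period. Under those assumptions, the chance of sampling an unsettled C0 is roughly the settle time divided by the capture period, whereas a shared clock always samples after settling.

```python
def unsafe_capture_fraction(settle_ns: float, period_ns: float, steps: int = 1000) -> float:
    """Toy model: fraction of capture-edge phases that land while C0 may still glitch.

    Hypothetical assumptions, for illustration only:
    - D0 launches new data at t = 0 and C0 may glitch until settle_ns.
    - The asynchronous capture edge of D1 is equally likely at any phase in
      [0, period_ns), approximated here by `steps` evenly spaced phases.
    """
    unsafe = sum(1 for i in range(steps) if i * period_ns / steps < settle_ns)
    return unsafe / steps


def synchronous_capture_is_safe(settle_ns: float, period_ns: float) -> bool:
    """With a shared clock, D1 samples exactly one period after launch."""
    return settle_ns < period_ns


# Hypothetical numbers: C0 settles 0.4 ns after launch, 1.6 ns capture period.
print(synchronous_capture_is_safe(0.4, 1.6))  # True: the edge always comes after settling
print(unsafe_capture_fraction(0.4, 1.6))      # about settle / period in this model
```

Real failures also depend on the capture flop's setup/hold window and metastability behavior, which this toy model deliberately ignores.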
1) Improper clock domain crossings cause expensive chip failures.
2) Proper domain crossing logic, such as multi-flop synchronizers, must be designed into the RTL.
3) Simulation and STA do not report invalid domain crossings in the design.
4) Hence, logic designers use CDC tools to check for valid synchronization, data integrity and data coherency.
5) However, physical implementation of CDC logic has gained importance due to: a) increased design complexity (several asynchronous clock domains, higher clock frequencies, larger gate counts and die sizes, and more IPs); b) normalized RC values of the interconnects increasing significantly (in thin metal layers) as technology nodes shrink.
6) Now let us see how the physical implementation of CDC logic influences how we create physical partitions and hierarchy and the top-level floorplan, and how we perform cell placement and attain timing closure.
On the left-hand side is the logical view of one data bus bit: source register D in the clk1 (900 MHz) domain in subchip A is received in the clk2 (625 MHz) domain in subchip B. This logic has a proper synchronizer (called mux recirculation), with the qualifier signal A synchronized in clk2. Data transfer from D to E is controlled by the qualifier. To meet setup/hold at register E, the max delay between registers D and E must be less than one cycle of clk2 (625 MHz, a 1.6 ns period) with a 25% margin, which is 1.2 ns. The top-level floorplan was created with the proximity between subchips based on how strongly the subchips are connected and on the timing constraints in the synchronous domain. As a result, the domain-crossing signals between subchips A and B had to traverse a 12 mm distance, which in 45 nm technology accounts for more than 1.6 ns of delay, violating the 1.2 ns max delay requirement for the CDC logic. During design planning and partitioning, information on the physical distances in the floorplan that the CDC logic has to traverse while crossing domains was lacking.
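The timing budget in this example works out as follows (the wire delay bound is the figure quoted above for the 12 mm route in 45 nm):

$$
T_{\mathrm{clk2}} = \frac{1}{625\,\mathrm{MHz}} = 1.6\,\mathrm{ns}, \qquad
t_{\max} = (1 - 0.25)\,T_{\mathrm{clk2}} = 0.75 \times 1.6\,\mathrm{ns} = 1.2\,\mathrm{ns}
$$

$$
t_{\mathrm{wire}}(12\,\mathrm{mm}) > 1.6\,\mathrm{ns} > t_{\max} = 1.2\,\mathrm{ns}
$$

The route alone therefore consumes more than a full clk2 period, so no placement inside the subchips can recover the violation.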
Once the physical partitions are decided, create an abstract view of them and perform CDC checks to report the signals between subchips that cross domains at the top level. Create a top-level floorplan and bring in the estimated net delays for the domain-crossing signals between subchips at the top level. Then decide whether to bring the CDC logic spread between two subchips into one subchip, avoiding domain crossings at the top level as much as possible.
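The decision step in this flow can be sketched as a simple filter over the top-level crossings. This is a minimal illustration under stated assumptions: the TopLevelCrossing record, its field names, and the example delays are hypothetical, and the estimated net delays are assumed to come from the top-level floorplan and timing step. Subchip pairs with over-budget crossings are the candidates for moving their CDC logic into one subchip.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TopLevelCrossing:
    """One domain-crossing signal between two subchips (hypothetical record)."""
    signal: str
    src_subchip: str
    dst_subchip: str
    est_net_delay_ns: float  # assumed to come from top-level floorplan timing
    max_delay_ns: float      # budget for this CDC path


def subchip_pairs_to_review(crossings):
    """Group over-budget top-level crossings by (source, destination) subchip pair.

    A pair with violations is a candidate for moving its CDC logic into one
    subchip so that the domain crossing no longer happens at the top level.
    """
    candidates = defaultdict(list)
    for c in crossings:
        if c.est_net_delay_ns > c.max_delay_ns:
            candidates[(c.src_subchip, c.dst_subchip)].append(c.signal)
    return dict(candidates)


# Hypothetical example values.
crossings = [
    TopLevelCrossing("data[0]", "A", "B", 1.65, 1.2),  # long top-level route
    TopLevelCrossing("data[1]", "A", "B", 1.70, 1.2),
    TopLevelCrossing("ack", "B", "C", 0.40, 1.2),      # within budget
]
print(subchip_pairs_to_review(crossings))
# {('A', 'B'): ['data[0]', 'data[1]']}
```

Grouping by subchip pair, rather than reporting individual signals, matches the decision the flow has to make: which partition boundary to move.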
The logical view of a multi-flop synchronizer within a subchip is on the left side. If a transition on signal A occurs near the active edge of clk_B, setup or hold violations can occur at register F2, making F2 enter a metastable state and causing signal B to oscillate for an indefinite time. The multi-flop synchronizer allows sufficient time for the oscillation to settle and generates a stable output C at register F3. On the right side is the implementation of a subchip that uses this synchronization scheme. The synchronizer flops (F2 and F3) must be placed as close to each other as possible so that the synchronizer output C does not go metastable because of the delay between the synchronizer flops. However, we found that during standard cell placement of a subchip, the flops of a multi-flop synchronizer get placed far away from each other (in different placement clusters), because their path may not be timing-critical for the placement engine. This can increase the probability of the synchronizer output register going metastable.
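The sensitivity described above can be quantified with the widely used synchronizer MTBF model, MTBF = exp(t_res / τ) / (T0 · f_clk · f_data), where t_res is the resolution time left for F2 before F3 samples it, τ is the flop's metastability resolution time constant, and T0 is its metastability window parameter. The sketch below is not from the original work and all parameter values are hypothetical. It shows that wire delay between F2 and F3 subtracts directly from t_res, so MTBF falls exponentially as the flops are placed farther apart.

```python
import math


def resolution_time_ns(period_ns, clk_to_q_ns, setup_ns, wire_delay_ns):
    """Time F2 has to resolve before F3 samples it (clock skew ignored)."""
    return period_ns - clk_to_q_ns - setup_ns - wire_delay_ns


def synchronizer_mtbf_s(t_res_ns, tau_ns, t0_ns, f_clk_hz, f_data_hz):
    """Textbook synchronizer MTBF model: exp(t_res / tau) / (T0 * f_clk * f_data)."""
    return math.exp(t_res_ns / tau_ns) / (t0_ns * 1e-9 * f_clk_hz * f_data_hz)


# Hypothetical parameters: 625 MHz clk_B, tau = 20 ps, T0 = 50 ps, 100 MHz data rate.
period = 1.6
for wire in (0.05, 0.60):  # F2 -> F3 placed close together vs. in different clusters
    t_res = resolution_time_ns(period, clk_to_q_ns=0.1, setup_ns=0.05, wire_delay_ns=wire)
    mtbf = synchronizer_mtbf_s(t_res, tau_ns=0.02, t0_ns=0.05,
                               f_clk_hz=625e6, f_data_hz=100e6)
    print(f"wire={wire:.2f} ns  t_res={t_res:.2f} ns  MTBF={mtbf:.3e} s")
```

Because the wire delay sits in the exponent, even a few hundred picoseconds of extra separation between F2 and F3 can cost many orders of magnitude in MTBF, which is why the placement of synchronizer flops matters.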
1) Since a hard-core integrated synchronizer from a vendor library guarantees the least distance and delay between synchronizer flops, it remains the preferred synchronizer for physical implementation.
2) Using SpyGlass, the RTL netlist of each subchip goes through CDC rule checks that report all synchronizers in the design that do not use the integrated (usually double-flop) synchronizers from the vendor library. Using this report, the reported synchronizers in each subchip are swapped with integrated synchronizers from the vendor library.
3) For synchronizers that cannot be swapped with integrated synchronizers, a max delay constraint equal to a small fraction (10 to 20%) of one destination clock cycle is set for the placement of the synchronizer flops in the subchip.
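Steps 2 and 3 can be sketched as a small planning script. This is a hypothetical illustration: the Synchronizer record and the instance names are invented, the input is assumed to have been extracted from the CDC report beforehand, and only the generated set_max_delay lines follow standard SDC syntax. The default fraction of 15% is an arbitrary choice within the 10 to 20% range given above.

```python
from dataclasses import dataclass


@dataclass
class Synchronizer:
    """A synchronizer found by CDC checks (hypothetical record, not a SpyGlass format)."""
    name: str
    uses_integrated_cell: bool  # already a vendor integrated synchronizer
    swappable: bool             # can be replaced with the vendor cell
    first_flop: str             # e.g. F2
    second_flop: str            # e.g. F3
    dest_clock_mhz: float


def plan_synchronizers(syncs, fraction=0.15):
    """Split synchronizers into swap candidates and SDC max-delay constraints.

    fraction is the share of one destination clock period allowed between
    the synchronizer flops (10 to 20% per the flow described above).
    """
    if not 0.10 <= fraction <= 0.20:
        raise ValueError("fraction outside the 10-20% range")
    swaps, constraints = [], []
    for s in syncs:
        if s.uses_integrated_cell:
            continue  # nothing to do: already the preferred implementation
        if s.swappable:
            swaps.append(s.name)
        else:
            limit_ns = (1000.0 / s.dest_clock_mhz) * fraction
            constraints.append(
                f"set_max_delay {limit_ns:.3f} "
                f"-from [get_cells {s.first_flop}] -to [get_cells {s.second_flop}]"
            )
    return swaps, constraints


# Hypothetical instances, all synchronizing into a 625 MHz clock.
syncs = [
    Synchronizer("u_sync_a", True, False, "u_sync_a/F2", "u_sync_a/F3", 625.0),
    Synchronizer("u_sync_b", False, True, "u_sync_b/F2", "u_sync_b/F3", 625.0),
    Synchronizer("u_sync_c", False, False, "u_sync_c/F2", "u_sync_c/F3", 625.0),
]
swaps, sdc = plan_synchronizers(syncs)
print(swaps)   # ['u_sync_b']
print(sdc[0])  # set_max_delay 0.240 -from [get_cells u_sync_c/F2] -to [get_cells u_sync_c/F3]
```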
The third issue is timing closure of data bus signals across clock domains. The left side shows the logical view of the "mux recirculation synchronizers" used for data bus domain crossings. There is a requirement that the maximum delay between registers D and E be less than one clock cycle of the destination clock with a 25% margin. When unconstrained, placement of the data bus registers D and E resulted in either a scenic detour of the interconnect between register pairs or a very large distance between them.
To solve this issue, a flow:
1) first runs CDC synchronization checks;
2) uses the "CrossingInfo report" generated by SpyGlass CDC to find all source and destination pairs of CDC logic;
3) runs a script that creates a "set_max_delay" constraint between each register pair.
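A minimal sketch of the constraint-generation script in step 3, assuming the source/destination pairs have already been reduced to a plain text list with one "<source> <destination> <destination clock MHz>" entry per line. That format and the register names are hypothetical stand-ins, not the actual SpyGlass CrossingInfo report layout. Each pair receives a set_max_delay equal to one destination clock period minus the 25% margin described above, so a 625 MHz destination yields 1.2 ns.

```python
def parse_crossings(text):
    """Parse a simplified crossing list: '<source_reg> <dest_reg> <dest_clock_mhz>' per line.

    Hypothetical, pre-reduced format; not the real CrossingInfo report layout.
    Blank lines and lines starting with '#' are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst, mhz = line.split()
        pairs.append((src, dst, float(mhz)))
    return pairs


def max_delay_constraints(pairs, margin=0.25):
    """One SDC set_max_delay per register pair: (1 - margin) of a destination period."""
    lines = []
    for src, dst, mhz in pairs:
        limit_ns = (1.0 - margin) * 1000.0 / mhz
        lines.append(
            f"set_max_delay {limit_ns:.3f} -from [get_cells {src}] -to [get_cells {dst}]"
        )
    return lines


# Hypothetical reduced report for two bits of the D -> E data bus.
report = """
# source        destination     dest_clock_mhz
subA/reg_D_0    subB/reg_E_0    625
subA/reg_D_1    subB/reg_E_1    625
"""
for constraint in max_delay_constraints(parse_crossings(report)):
    print(constraint)
```

Keeping parsing and constraint generation separate means only parse_crossings would need to change to read the real report format.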
(a) Identifying and resolving CDC across physical partitions on the chip, arising from the integration of multiple IPs running at different clock frequencies.
(b) If an integrated synchronizer is not used, minimizing the delay between the synchronizing flops to reduce the probability of them going into metastable states.
(c) Adding timing constraints in the form of max delays on the CDC paths to eliminate the possibility of a wiring detour or large placement separation. These CDC paths are otherwise not timed in timing sign-off, because timing paths across different clock domains are treated as asynchronous.