RTL Coding Guidelines
Chethan Kumar H B
Chapter 1
Coding Style For Better Synthesis
Reference:
• http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA.pdf
Before giving further explanation and examples of both blocking and nonblocking assignments, it is useful to outline eight guidelines that help to accurately simulate hardware modeled using Verilog. Adherence to these guidelines will also remove 90-100% of the Verilog race conditions encountered by most Verilog designers.
• When modeling sequential logic, use nonblocking assignments.
• When modeling latches, use nonblocking assignments.
• When modeling combinational logic with an always block, use blocking
assignments.
• When modeling both sequential and combinational logic within the same
always block, use nonblocking assignments.
• Do not mix blocking and nonblocking assignments in the same always
block.
• Do not make assignments to the same variable from more than one
always block.
• Use $strobe to display values that have been assigned using nonblocking
assignments.
• Do not make assignments using #0 delays.
1.1 Blocking assignments
The blocking assignment operator is an equal sign ("="). A blocking assignment gets its name because it must evaluate the RHS arguments and complete the assignment without interruption from any other Verilog statement. The assignment is said to "block" other assignments until the current assignment has completed. The one exception is a blocking assignment with timing delays on the RHS of the blocking operator, which is considered to be a poor coding style. A problem with blocking assignments occurs when the RHS variable of one assignment in one procedural block is also the LHS variable of another assignment in another procedural block, and both equations are scheduled to execute in the same simulation time step. In the feedback oscillator below (y1 reset to 0, y2 preset to 1), if the first always block executes first on a clock edge, both y1 and y2 become 1; if the second always block executes first, both become 0. The result depends on the order in which the simulator runs the two blocks, which is a race condition.
module fbosc1 (y1, y2, clk, rst);
  output y1, y2;
  input  clk, rst;
  reg    y1, y2;

  always @(posedge clk or posedge rst)
    if (rst) y1 = 0;  // reset
    else     y1 = y2;

  always @(posedge clk or posedge rst)
    if (rst) y2 = 1;  // preset
    else     y2 = y1;
endmodule
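The race disappears when both always blocks use nonblocking assignments. In the sketch below (the name fbosc2 is used only for illustration), each block samples the old value of the other register before either LHS is updated, so y1 and y2 swap on every clock edge no matter which block the simulator runs first.

```verilog
// Feedback oscillator coded with nonblocking assignments (illustrative).
// Both RHS values are sampled before either register updates, so the
// result no longer depends on the execution order of the two blocks.
module fbosc2 (output reg y1, output reg y2, input clk, input rst);
  always @(posedge clk or posedge rst)
    if (rst) y1 <= 1'b0;   // reset
    else     y1 <= y2;

  always @(posedge clk or posedge rst)
    if (rst) y2 <= 1'b1;   // preset
    else     y2 <= y1;
endmodule
```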
1.2 Nonblocking assignments
The nonblocking assignment operator is the same as the less-than-or-equal-to operator ("<="). A nonblocking assignment gets its name because it evaluates the RHS expression of a nonblocking statement at the beginning of a time step and schedules the LHS update to take place at the end of the time step, without blocking the execution of the statements that follow it.
Execution of nonblocking assignments can be viewed as a two-step process:
• Evaluate the RHS of nonblocking statements at the beginning of the time step.
• Update the LHS of nonblocking statements at the end of the time step.
Nonblocking assignments are only made to register data types and are therefore only permitted inside of procedural blocks, such as initial blocks and always blocks. Nonblocking assignments are not permitted in continuous assignments.
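A minimal sketch of the two-step behavior, using a hypothetical module swap_nba: because both RHS expressions are sampled before either LHS is updated, two nonblocking assignments swap the contents of two registers without a temporary variable.

```verilog
// Hypothetical example: swapping two registers with nonblocking assignments.
// Step 1: evaluate b and a using their current values.
// Step 2: update a and b with the sampled values at the end of the time step.
module swap_nba (input clk, input load, input [7:0] a_in, input [7:0] b_in,
                 output reg [7:0] a, output reg [7:0] b);
  always @(posedge clk)
    if (load) begin
      a <= a_in;
      b <= b_in;
    end else begin
      a <= b;   // uses the old value of b
      b <= a;   // uses the old value of a, not the value scheduled above
    end
endmodule
```

With blocking assignments (a = b; b = a;) both registers would end up holding the old value of b.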
1.3 The Verilog "stratified event queue"
The "stratified event queue" is logically partitioned into four distinct queues for the current simulation time (the active, inactive, nonblocking assign update, and monitor events queues) and additional queues for future simulation times.
Figure 1.1: Verilog "stratified event queue".
The active events queue is where most Verilog events are scheduled, including blocking assignments, continuous assignments, $display commands, evaluation of instance and primitive inputs followed by updates of primitive and instance outputs, and the evaluation of nonblocking RHS expressions. The LHS values of nonblocking assignments are not updated in the active events queue.
Events are added to any of the event queues but are only removed from the active events queue. Events that are scheduled on the other event queues will eventually become "activated", or promoted into the active events queue.
Making #0-delay assignments is generally a flawed practice, employed by designers who try to make assignments to the same variable from two separate procedural blocks and attempt to beat Verilog race conditions by scheduling one of the assignments to take place slightly later in the same simulation time step (#0 assignments are placed in the inactive events queue). Adding #0-delay assignments to Verilog models needlessly complicates the analysis of scheduled events.
1.4 Self-triggering always blocks
In general, a Verilog always block cannot trigger itself. Consider the oscillator in Listing 1.1. This oscillator uses blocking assignments. Blocking assignments evaluate their RHS expression and update their LHS value without interruption. The blocking assignment must complete before the @(clk) edge-trigger event can be scheduled. By the time the trigger event has been scheduled, the blocking clk assignment has completed; therefore, there is no trigger event from within the always block to trigger the @(clk) trigger. As a result, clk goes to 0 at time 10, changes to 1 at time 20, and then never changes again.
Listing 1.1: Non-self-triggering oscillator using blocking assignments
module osc1 (clk);
  output clk;
  reg    clk;

  initial #10 clk = 0;
  always @(clk) #10 clk = !clk;
endmodule
In contrast, the oscillator in Listing 1.2 uses nonblocking assignments. After the first @(clk) trigger, the RHS expression of the nonblocking assignment is evaluated and the LHS value is scheduled into the nonblocking assign update events queue. Before the nonblocking assign update events queue is "activated", the @(clk) trigger statement is encountered and the always block again becomes sensitive to changes on the clk signal. When the nonblocking LHS value is updated later in the same time step, the @(clk) is triggered again. The osc2 example is therefore self-triggering (which is not necessarily a recommended coding style).
Listing 1.2: Self-triggering oscillator using nonblocking assignments
module osc2 (clk);
  output clk;
  reg    clk;

  initial #10 clk = 0;
  always @(clk) #10 clk <= !clk;
endmodule
1.5 Combinational logic - use blocking assignments
The code shown in Listing 1.3 builds the y-output from three sequentially executed statements. Since nonblocking assignments evaluate all RHS expressions before updating any LHS variables, the values of tmp1 and tmp2 used to compute y are the original values of these two variables upon entry to the always block, not the values that will be updated at the end of the simulation time step. The y-output therefore reflects the old values of tmp1 and tmp2, not the values calculated in the current pass of the always block.
Listing 1.3: Bad combinational logic coding style using nonblocking assignments
module ao4 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;

  always @(a or b or c or d) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;
  end
endmodule
The code shown in Listing 1.4 is identical to the code shown in Listing 1.3, except that tmp1 and tmp2 have been added to the sensitivity list. As described in section 1.4, when the nonblocking assignments update the LHS variables in the nonblocking assign update events queue, the always block self-triggers and updates the y-output with the newly calculated tmp1 and tmp2 values. The y-output value is now correct, but only after two passes through the always block. Multiple passes through an always block equate to degraded simulation performance and should be avoided if a reasonable alternative exists (use blocking assignments for combinational modeling).
Listing 1.4: Inefficient multi-pass combinational logic coding style using nonblocking assignments
module ao5 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;

  always @(a or b or c or d or tmp1 or tmp2) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;
  end
endmodule
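For comparison, here is a sketch of the same AND-OR logic coded with blocking assignments, which is the recommended style. The module name ao2 is illustrative. Each statement completes before the next one runs, so y sees the tmp1 and tmp2 values computed in the same pass, and one pass through the always block is enough.

```verilog
// AND-OR logic coded with blocking assignments (illustrative name ao2).
// tmp1/tmp2 are updated immediately, so y is correct after one pass.
module ao2 (output reg y, input a, b, c, d);
  reg tmp1, tmp2;
  always @(a or b or c or d) begin
    tmp1 = a & b;
    tmp2 = c & d;
    y    = tmp1 | tmp2;
  end
endmodule
```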
NOTE
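• Using the $display command to show the result of a nonblocking assignment does not work as expected: $display executes in the active events queue, before the nonblocking LHS update, so it shows the old value. Use $strobe instead.
• Making multiple nonblocking assignments to the same variable in the same always block is defined by the Verilog Standard: the last nonblocking assignment to the same variable wins.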
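A minimal sketch of both notes, using a hypothetical module nba_notes_demo. $display runs in the active events queue, before the nonblocking update, so it prints the old value. $strobe runs in the monitor events queue at the end of the time step and prints the new one. The two nonblocking assignments to q also show that the last one wins.

```verilog
// Hypothetical demo module (no ports).
module nba_notes_demo;
  reg [3:0] q;
  reg [3:0] q_before_update;

  initial begin
    q = 4'd0;
    #1;
    q <= 4'd5;
    q <= 4'd9;                        // last nonblocking assignment to q wins
    $display("display: q = %0d", q);  // active region: prints 0
    $strobe("strobe : q = %0d", q);   // end of time step: prints 9
    q_before_update = q;              // a blocking read also sees the old value (0)
  end
endmodule
```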
1.5.1 Driving same signal inside two different if condition blocks
In the code shown below, the second if statement always takes precedence. If both enables are '1', the output is in2.
Listing 1.5: One signal two if
module onesignaltwoif (
  input      clk,
  input      en1,
  input      en2,
  input      in1,
  input      in2,
  output reg out
);
  always @(posedge clk)
  begin
    if (en1)
      out <= in1;
    if (en2)
      out <= in2;
  end
endmodule
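The same priority can be written explicitly as an if / else if chain that tests en2 first. This sketch uses a hypothetical module name, onesignaltwoif_explicit, and is intended to be functionally equivalent to Listing 1.5. It makes the priority visible to the reader instead of leaving it implied by statement order.

```verilog
// Hypothetical explicit-priority version of Listing 1.5.
module onesignaltwoif_explicit (
  input      clk,
  input      en1,
  input      en2,
  input      in1,
  input      in2,
  output reg out
);
  // en2 is tested first, so it wins when both enables are 1.
  always @(posedge clk)
    if (en2)      out <= in2;
    else if (en1) out <= in1;
    // neither enable: out holds its previous value
endmodule
```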
1.6 RTL Coding Styles That Yield Simulation and Synthesis Mismatches
Reference:
• http://www.sunburst-design.com/papers/CummingsSNUG1999SJ_SynthMismatch.pdf
1.6.1 SENSITIVITY LIST
Synthesis tools infer combinational or latching logic from an always block with a sensitivity list that does not contain the Verilog keywords posedge or negedge. For a combinational always block, the logic inferred is derived from the equations in the block and has nothing to do with the sensitivity list. The synthesis tool reads the sensitivity list and compares it against the equations in the always block only to report coding omissions that might cause a mismatch between pre- and post-synthesis simulations.
The presence of signals in a sensitivity list that are not used in the always block does not make any functional difference to either pre- or post-synthesis simulations. The only effect of extraneous signals is that pre-synthesis simulations run more slowly, because the always block is entered and evaluated more often than necessary.
1.6.2 Incomplete sensitivity list
The synthesized logic described by the equations in an always block is always implemented as if the sensitivity list were complete. However, the pre-synthesis simulation behavior of the same always block can be quite different. In module code1a, the sensitivity list is complete; therefore, the pre- and post-synthesis simulations both model a 2-input AND gate. In module code1b, the sensitivity list contains only the variable a. The post-synthesis simulation still models a 2-input AND gate, but in pre-synthesis simulation the always block executes only when a changes. Any change on b that does not coincide with a change on a is not observed on the output, so this behavior does not match the 2-input AND gate of the post-synthesis model. Finally, module code1c has no sensitivity list at all. During pre-synthesis simulation, this always block locks the simulator into an infinite zero-delay loop, yet the post-synthesis model is again a 2-input AND gate.
Listing 1.6: Incomplete sensitivity lists
module code1a (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always @(a or b)
    o = a & b;
endmodule
// ****************************
module code1b (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always @(a)
    o = a & b;
endmodule
// ****************************
module code1c (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always
    o = a & b;
endmodule
Note: All three modules infer a 2-input AND gate.
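Verilog-2001 added the implicit sensitivity list @* (also written @(*)), which makes the always block sensitive to every signal it reads and so avoids the code1b-style omission. A sketch, using the hypothetical module name code1d:

```verilog
// Hypothetical module: same 2-input AND gate as code1a, written with the
// Verilog-2001 implicit sensitivity list so both a and b are included.
module code1d (output reg o, input a, b);
  always @*
    o = a & b;
endmodule
```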
1.6.3 CASE STATEMENTS
Full Case
Using the synthesis tool directive // synopsys full_case gives more information about the design to the synthesis tool than is provided to the simulation tool. This particular directive is used to inform the synthesis tool that the case statement is fully defined, and that the output assignments for all unused cases are don't cares. The functionality of the pre- and post-synthesis designs may or may not remain the same when using this directive. Additionally, although this directive tells the synthesis tool to treat the unused states as don't cares, it will sometimes make designs larger and slower than designs that omit the full_case directive.
In module code4a, a case statement is coded without any synthesis directives, and the pre- and post-synthesis simulations match. Module code4b uses a case statement with the synthesis directive full_case. Because of the directive, the en input is optimized away during synthesis and left as a dangling input. The pre-synthesis simulation results of modules code4a and code4b match the post-synthesis simulation results of module code4a, but do not match the post-synthesis simulation results of module code4b.
Listing 1.7: Full Case
// no full_case
// Decoder built from four 3-input AND gates
// and two inverters
module code4a (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;

  always @(a or en) begin
    y = 4'h0;
    case ({en, a})
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
// **********************************
// full_case example
// Decoder built from four 2-input NOR gates
// and two inverters
// The enable input is dangling (has been optimized away)
module code4b (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;

  always @(a or en) begin
    y = 4'h0;
    case ({en, a}) // synopsys full_case
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
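If the intent is simply a 2-to-4 decoder with enable, it can be described without any synthesis directive, so the simulator and the synthesis tool see the same description. A sketch under that assumption, using the hypothetical module name code4c:

```verilog
// Hypothetical module: 2-to-4 decoder with enable, no synthesis directives.
// en = 0: all outputs 0. en = 1: only y[a] is set.
module code4c (output reg [3:0] y, input [1:0] a, input en);
  always @(a or en) begin
    y = 4'h0;          // default: all outputs low
    if (en)
      y[a] = 1'b1;     // decode a only when enabled
  end
endmodule
```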
Parallel Case
Using the synthesis tool directive // synopsys parallel_case gives more information about the design to the synthesis tool than is provided to the simulation tool. This particular directive is used to inform the synthesis tool that all cases should be tested in parallel, even if there are overlapping cases that would normally cause a priority encoder to be inferred. When a design does have overlapping cases, the functionality of the pre- and post-synthesis designs will be different.
The pre-synthesis simulations of modules code5a and code5b below, as well as the post-synthesis structure of module code5a, exhibit priority encoder functionality. However, the post-synthesis structure of module code5b is two AND gates. The use of the synthesis tool directive // synopsys parallel_case causes priority encoder case statements to be implemented as parallel logic, causing pre- and post-synthesis simulation mismatches.
Listing 1.8: Parallel Case
// no parallel_case
// Priority encoder - 2-input NAND gate driving an
// inverter (z-output) and also driving a
// 3-input AND gate (y-output)
module code5a (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;

  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d})
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
// **********************************
// parallel_case
// two parallel 2-input AND gates
module code5b (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;

  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d}) // synopsys parallel_case
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
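When the intended hardware is known, it is safer to write it directly than to rely on // synopsys parallel_case. The sketch below uses hypothetical module names. code5c spells out the priority behavior of code5a: the 4'b11?? item wins, so y is set only when a & b is false. code5d is the pair of independent AND gates that code5b synthesizes to, written so that simulation and synthesis agree.

```verilog
// Hypothetical module: explicit priority logic equivalent to code5a.
module code5c (output y, output z, input a, b, c, d);
  assign z = a & b;
  assign y = c & d & ~(a & b);   // lower-priority item blocked when a & b
endmodule

// Hypothetical module: explicit parallel logic matching code5b's netlist.
module code5d (output y, output z, input a, b, c, d);
  assign z = a & b;
  assign y = c & d;
endmodule
```

The two versions differ only for inputs where both case items match (a, b, c, and d all 1).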
casex
The use of casex statements can cause design problems. A casex statement treats Xs as "don't cares" if they appear in either the case expression or the case items. The problem with casex occurs when an input tested by a casex expression is initialized to an unknown state. The pre-synthesis simulation treats the unknown input as a "don't care" when it is evaluated in the casex statement, whereas the equivalent post-synthesis simulation propagates Xs through the gate-level model if that condition is tested.
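A minimal sketch of the problem, using a hypothetical module casex_x_demo. With sel = 2'bx0, the casex statement matches the first item because the unknown bit is treated as a don't care, so the X is silently hidden in pre-synthesis simulation. A plain case statement, which compares all four states exactly, falls through to the default instead.

```verilog
// Hypothetical demo: the same decode written with casex and with case.
module casex_x_demo (input [1:0] sel,
                     output reg [1:0] y_casex, output reg [1:0] y_case);
  always @* begin
    casex (sel)            // X in sel or in an item acts as a don't care
      2'b00:   y_casex = 2'd0;
      2'b01:   y_casex = 2'd1;
      default: y_casex = 2'd3;
    endcase
  end

  always @* begin
    case (sel)             // exact 4-state match: 2'bx0 matches no item
      2'b00:   y_case = 2'd0;
      2'b01:   y_case = 2'd1;
      default: y_case = 2'd3;
    endcase
  end
endmodule
```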
NOTE
• casez is similar to casex, except that only Z (or ?) values are treated as don't cares; X values are not.
• Be aware that synthesis directives are not recognized by simulators. When using any synthesis directive, make sure that it does not lead to a pre- and post-synthesis mismatch.
1.7 FSM
Reference:
• http://www.sunburst-design.com/papers/CummingsICU2002_FSMFundamentals.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2000Boston_FSM.pdf
A common classification used to describe the type of an FSM is Mealy and Moore state
machines[2] [3]. A Moore FSM is a state machine where the outputs are only a function
of the present state. A Mealy FSM is a state machine where one or more of the outputs
is a function of the present state and one or more of the inputs.
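A small sketch to make the distinction concrete. The module and signal names are hypothetical. One state register drives two outputs: moore_out depends only on the present state, while mealy_out also depends on the input go, so it can change between clock edges.

```verilog
// Hypothetical two-state machine illustrating Moore vs. Mealy outputs.
module moore_mealy_demo (
  input  clk,
  input  rst_n,
  input  go,
  output moore_out,   // function of the present state only
  output mealy_out    // function of the present state and input go
);
  localparam IDLE = 1'b0, RUN = 1'b1;
  reg state;

  always @(posedge clk or negedge rst_n)
    if (!rst_n)  state <= IDLE;
    else if (go) state <= RUN;
    else         state <= IDLE;

  assign moore_out = (state == RUN);
  assign mealy_out = (state == RUN) & go;
endmodule
```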
Figure 1.2: Finite State Machine (FSM) block diagram.
1.7.1 Binary Encoded or Onehot Encoded?
Common classifications used to describe the state encoding of an FSM are Binary (or
highly encoded) and Onehot.
A binary-encoded FSM design only requires as many flip-flops as are needed to uniquely
encode the number of states in the state machine. The actual number of flip-flops required
is equal to the ceiling of the log-base-2 of the number of states in the FSM.
A onehot FSM design requires a flip-flop for each state in the design, and only one flip-flop (the flip-flop representing the current or "hot" state) is set at a time. For a state machine with 9-16 states, a binary FSM requires only 4 flip-flops, while a onehot FSM requires a flip-flop for each state in the design (9-16 flip-flops).
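The flip-flop counts above follow directly from the two encoding rules. Here is a short worked calculation for N states, with N = 17 added to show where the binary width grows:

```latex
\begin{aligned}
\text{binary flip-flops} &= \lceil \log_2 N \rceil, \qquad \text{onehot flip-flops} = N \\
N = 9:&\quad \lceil \log_2 9 \rceil = \lceil 3.17 \rceil = 4 \quad (\text{onehot: } 9) \\
N = 16:&\quad \lceil \log_2 16 \rceil = 4 \quad (\text{onehot: } 16) \\
N = 17:&\quad \lceil \log_2 17 \rceil = \lceil 4.09 \rceil = 5 \quad (\text{onehot: } 17)
\end{aligned}
```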
FPGA vendors frequently recommend using a onehot state encoding style because
flip-flops are plentiful in an FPGA and the combinational logic required to implement a
onehot FSM design is typically smaller than most binary encoding styles. Since FPGA
performance is typically related to the combinational logic size of the FPGA design,
onehot FSMs typically run faster than a binary encoded FSM with larger combinational
logic blocks[4].
Figure 1.3: FSM encoding.
Note: When a onehot FSM is coded with an inverse case statement without the // synopsys parallel_case directive, synthesis tools infer a priority encoder. This happens because the tool cannot assume that only one bit of the state variable is set at a time. If two bits were set, the first matching case item would be given higher priority.
Listing 1.9: one hot
// This logic infers a priority encoder
module fsm_onehot1
  (output reg y, z,
   input wire [1:0] state);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;

  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1)
      state[IDLE] : z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
// ****************************************
// This logic infers parallel case logic
module fsm_cc4_fp
  (output reg y, z,
   input wire [1:0] state);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;

  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1) // synopsys parallel_case
      state[IDLE] : z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
1.7.2 One Always Block FSM Style (Not Recommended)
One of the most common FSM coding styles in use today is the one sequential always
block FSM coding style. For most FSM designs, the one always block FSM coding style
is more verbose, more confusing and more error prone than a comparable two always
block coding style.
1.7.3 Two Always Block FSM Style
One of the best Verilog coding styles is to code the FSM design using two always blocks: one for the sequential state register and one for the combinational next-state and combinational output logic.
Listing 1.10: fsm design - two always block style
module fsm_cc4_2
  (output reg gnt,
   input dly, done, req, clk, rst_n);
  parameter [1:0] IDLE  = 2'b00,
                  BBUSY = 2'b01,
                  BWAIT = 2'b10,
                  BFREE = 2'b11;
  reg [1:0] state, next;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= IDLE;
    else        state <= next;

  always @(state or dly or done or req) begin
    next = 2'bx;
    gnt  = 1'b0;
    case (state)
      IDLE : if (req) next = BBUSY;
             else     next = IDLE;
      BBUSY: begin
               gnt = 1'b1;
               if (!done)    next = BBUSY;
               else if (dly) next = BWAIT;
               else          next = BFREE;
             end
      BWAIT: begin
               gnt = 1'b1;
               if (!dly) next = BFREE;
               else      next = BWAIT;
             end
      BFREE: if (req) next = BBUSY;
             else     next = IDLE;
    endcase
  end
endmodule
FSM Coding Notes
• Parameters (constants that are local to a module) are used to define state encodings instead of the Verilog `define macro definition construct. After the parameter definitions are created, the parameter names, not the raw state encodings, are used throughout the rest of the design.
• The sequential always block is coded using nonblocking assignments.
• The combinational always block sensitivity list is sensitive to changes on the state
variable and all of the inputs referenced in the combinational always block.
• Assignments within the combinational always block are made using Verilog blocking
assignments.
• Default output and next-state assignments are made before coding the case statement, as shown in Listing 1.10. This eliminates latches, reduces the amount of code required to code the rest of the outputs in the case statement, and highlights exactly in which states the individual outputs change.
1.7.4 Onehot FSM Coding Style
Efficient (small and fast) onehot state machines can be coded using an inverse case
statement; a case statement where each case item is an expression that evaluates to true
or false.
Listing 1.11: fsm design -onehot style
module fsm_cc4_fp
  (output reg gnt,
   input dly, done, req, clk, rst_n);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;

  always @(state or dly or done or req) begin
    next = 4'b0;
    gnt  = 1'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: begin
                      gnt = 1'b1;
                      if (!done)    next[BBUSY] = 1'b1;
                      else if (dly) next[BWAIT] = 1'b1;
                      else          next[BFREE] = 1'b1;
                    end
      state[BWAIT]: begin
                      gnt = 1'b1;
                      if (!dly) next[BFREE] = 1'b1;
                      else      next[BWAIT] = 1'b1;
                    end
      state[BFREE]: begin
                      if (req) next[BBUSY] = 1'b1;
                      else     next[IDLE]  = 1'b1;
                    end
    endcase
  end
endmodule
1.7.5 Registered FSM Outputs
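Registering the outputs of an FSM design ensures that the outputs are glitch-free and frequently improves synthesis results by standardizing the output and input delay constraints of synthesized modules [5].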
FSM outputs are easily registered by adding a third, sequential always block to an FSM module, where output assignments are generated in a case statement with case items corresponding to the next state that will be active when the output is clocked.
Listing 1.12: fsm design -three always blocks w/registered outputs
module fsm_cc4_fp
  (output reg gnt,
   input dly, done, req, clk, rst_n);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;

  // sequential state register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;

  // combinational next-state logic (outputs are no longer assigned here)
  always @(state or dly or done or req) begin
    next = 4'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: if (!done)    next[BBUSY] = 1'b1;
                    else if (dly) next[BWAIT] = 1'b1;
                    else          next[BFREE] = 1'b1;
      state[BWAIT]: if (!dly) next[BFREE] = 1'b1;
                    else      next[BWAIT] = 1'b1;
      state[BFREE]: if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
    endcase
  end

  // registered output logic: decode the next state, so gnt is valid
  // in the same clock cycle in which that state becomes active
  always @(posedge clk or negedge rst_n)
    if (!rst_n) gnt <= 1'b0;
    else begin
      gnt <= 1'b0;
      case (1'b1) // ambit synthesis case = full, parallel
        next[BBUSY],
        next[BWAIT]: gnt <= 1'b1;
      endcase
    end
endmodule
One, two, or three always blocks for an FSM?
• Use a two always block coding style to code FSM designs with combinational outputs. This style is efficient and easy to code, and it can also easily handle Mealy FSM designs.
• Use a three always block coding style to code FSM designs with registered outputs.
This style is efficient and easy to code.
1.8 Clock Domain Crossing
Reference:
• http://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2001SJ_AsyncClk.pdf
1.8.1 Metastability
Metastability refers to signals that do not have stable 0 or 1 states for some duration of time at some point during normal operation of a design. In a multi-clock design, metastability cannot be avoided, but the detrimental effects of metastability can be neutralized.
Figure 1.4 shows a synchronization failure that occurs when a signal generated in one
clock domain is sampled too close to the rising edge of a clock signal from a second
clock domain. Synchronization failure is caused by an output going metastable and not
converging to a legal stable state by the time the output must be sampled again.
Figure 1.4: Asynchronous clocks and synchronization failure
1.8.2 Why is metastability a problem?
A metastable output that traverses additional logic in the receiving clock domain can cause illegal signal values to be propagated throughout the rest of the design. Since the CDC signal can fluctuate for some period of time, the input logic in the receiving clock domain might interpret the logic level of the fluctuating signal as different values and hence propagate erroneous signals into the receiving clock domain.
1.8.3 Synchronizers
There are two scenarios that are possible when passing signals across CDC boundaries,
and it is important to determine which scenario applies to your design:
• It is permitted to miss samples that are passed between clock domains.
• Every signal passed between clock domains must be sampled.
1.8.4 Two flip-flop synchronizer
"A synchronizer is a device that samples an asynchronous signal and outputs a version of the signal that has transitions synchronized to a local or sample clock."
The simplest and most common synchronizer used by digital designers is a two-flip-flop
synchronizer as shown in Figure 1.5.
Figure 1.5: Two flip-flop synchronizer
The first flip-flop samples the asynchronous input signal into the new clock domain and waits for a full clock cycle to permit any metastability on the stage-1 output signal to decay. The stage-1 signal is then sampled by the same clock into a second-stage flip-flop, with the intended goal that the stage-2 signal is now a stable and valid signal, synchronized and ready for distribution within the new clock domain.
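A minimal RTL sketch of the two flip-flop synchronizer. The module and signal names are hypothetical, and the input is assumed to be a single-bit level that is already registered in the sending domain. RTL simulation cannot model metastability itself; the sketch only shows the structure and its two-cycle latency.

```verilog
// Hypothetical two flip-flop synchronizer for a single-bit level signal.
// sync_ff1 may go metastable; it gets one full bclk cycle to resolve
// before q_sync samples it. Only q_sync should be used in the bclk domain.
module sync_2ff (
  input      bclk,
  input      brst_n,   // active-low reset, bclk domain
  input      d_async,  // signal launched from another clock domain
  output reg q_sync    // stage-2 output, synchronized to bclk
);
  reg sync_ff1;        // stage-1 flip-flop

  always @(posedge bclk or negedge brst_n)
    if (!brst_n) begin
      sync_ff1 <= 1'b0;
      q_sync   <= 1'b0;
    end else begin
      sync_ff1 <= d_async;
      q_sync   <= sync_ff1;
    end
endmodule
```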
It is theoretically possible for the stage-1 signal to still be sufficiently metastable by the time it is clocked into the second stage to cause the stage-2 output signal to also go metastable. The likelihood of this is expressed as the mean time between synchronization failures (MTBF), which is a function of multiple variables, including the clock frequencies used to generate the input signal and to clock the synchronizing flip-flops.
For most synchronization applications, the two flip-flop synchronizer is sufficient to remove all likely metastability.
1.8.5 MTBF - mean time between failures
When calculating MTBF numbers, larger numbers are preferred over smaller numbers. Larger MTBF numbers indicate longer periods of time between potential failures, while smaller MTBF numbers indicate that metastability could happen frequently, similarly causing failures within the design.
Figure 1.6: MTBF
Two of the most important factors that directly impact the MTBF of a synchronizer circuit are the sample clock frequency (how fast signals are being sampled into the receiving clock domain) and the data change frequency (how fast the data crossing the CDC boundary is changing). From the MTBF equation in Figure 1.6, it can be seen that failures occur more frequently (shorter MTBF) in higher-speed designs, or when the sampled data changes more frequently.
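The equation in Figure 1.6 is not reproduced in this text. The commonly cited form, as given in the Cummings CDC paper referenced above, is shown below. Here t_MET is the settling time allowed for a metastable output, C1 and C2 are device-dependent constants, and f_clk and f_data are the sample-clock and data-change frequencies. Treat the constants as technology-specific.

```latex
\mathrm{MTBF} = \frac{e^{\,t_{MET}/C_2}}{C_1 \cdot f_{clk} \cdot f_{data}}
```

Because f_clk and f_data appear in the denominator, raising either one shortens the MTBF. Adding synchronizer stages lengthens the settling time t_MET, which grows the exponential term.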
1.8.6 Three flip-flop synchronizer
For some very high speed designs, the MTBF of a two-flop synchronizer is too short and
a third flop is added to increase the MTBF.
Figure 1.7: Three flip-flop synchronizer used in higher speed designs
1.8.7 Registering signals from the sending clock domain to avoid glitches
Consider an example where the signals in the sending clock domain are not registered
before being passed into the receiving clock domain, as shown in Figure 1.8.
Figure 1.8: Unregistered signals sent across a CDC boundary
In this example, the combinational output from the sending clock domain could experience combinational glitches at the CDC boundary. These combinational glitches effectively increase the data-change frequency, potentially creating small bursts of oscillating data and thereby increasing the potential for sampling changing data and generating metastable signals at the CDC boundary.
Signals in the sending clock domain should be synchronized (registered) before being passed across a CDC boundary. Registering the signals in the sending clock domain reduces the number of edges that can be sampled in the receiving clock domain, effectively reducing the data-change frequency in the MTBF equation and hence increasing the time between calculated failures.
Figure 1.9: Registered signals sent across a CDC boundary
In Figure 1.9, the adat flip-flop filters out the combinational glitches on the flip-flop input (a) and passes a clean signal to the bclk logic.
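A sketch of Figure 1.9 in RTL, under these assumptions: the combinational source is modeled as a hypothetical 2-input AND of a1 and a2, adat is the aclk-domain register that filters its glitches, and the bclk side is a two flip-flop synchronizer. Only the names a, adat, aclk, and bclk come from the figure; the module name and the other signals are hypothetical.

```verilog
// Hypothetical sketch of Figure 1.9: register in aclk, then synchronize in bclk.
module reg_then_sync (
  input      aclk, arst_n,
  input      a1, a2,        // aclk-domain inputs (hypothetical)
  input      bclk, brst_n,
  output reg b_sync         // synchronized copy of adat in the bclk domain
);
  wire a = a1 & a2;         // combinational logic: may glitch
  reg  adat;                // sending-domain register filters the glitches
  reg  b_meta;              // first synchronizer stage (may go metastable)

  always @(posedge aclk or negedge arst_n)
    if (!arst_n) adat <= 1'b0;
    else         adat <= a;

  always @(posedge bclk or negedge brst_n)
    if (!brst_n) begin
      b_meta <= 1'b0;
      b_sync <= 1'b0;
    end else begin
      b_meta <= adat;       // only the registered, glitch-free signal crosses
      b_sync <= b_meta;
    end
endmodule
```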
1.8.8 Synchronizing fast signals into slow clock domains
One issue associated with synchronizers is the possibility that a signal from a sending
clock domain might change values twice before it can be sampled, or might be too close
to the sampling edges of a slower clock domain. This possibility must be considered any
time signals are sent from one clock domain to another and a determination must be
made whether missed signals are or are not a problem for the design in question. When
missed samples are not allowed, there are two general approaches to the problem:
• An open-loop solution to ensure that signals are captured without acknowledgment.
• A closed-loop solution that requires acknowledgement of receipt of the signal that
crosses a CDC boundary.
1.8.9 Requirement for reliable signal passing between clock domains
When passing one CDC signal between clock domains through a two flip-flop synchronizer, the CDC signal must be wider than 1.5 times the period of the receiving domain clock. Put another way, the "input data values must be stable for three destination clock edges".
The "three edge" requirement actually applies to both open-loop and closed-loop solutions, but implementations of the closed-loop solution automatically ensure that at least three edges are detected for all CDC signals.
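Here is a worked example of the 1.5x rule. It assumes a 100 MHz receiving clock (T_r = 10 ns) and a 250 MHz sending clock (T_s = 4 ns); these frequencies are illustrative only. The pulse must be held for N sending-clock cycles so that it is wider than 1.5 T_r:

```latex
\begin{aligned}
1.5\,T_r &= 1.5 \times 10\ \text{ns} = 15\ \text{ns} \\
N &= \left\lfloor \frac{1.5\,T_r}{T_s} \right\rfloor + 1
   = \left\lfloor \frac{15}{4} \right\rfloor + 1 = 3 + 1 = 4 \\
N\,T_s &= 4 \times 4\ \text{ns} = 16\ \text{ns} > 15\ \text{ns}
\end{aligned}
```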
Problem - passing a fast CDC pulse
Consider the severely flawed condition where the sending clock domain has a higher
frequency than the receiving clock domain and that a CDC pulse is only one cycle wide
in the sending clock domain. If the CDC signal is only pulsed for one fast-clock cycle,
the CDC signal could go high and low between the rising edges of a slower clock and not
be captured into the slower clock domain as shown in Figure 1.10.
Figure 1.10: Short CDC signal pulse missed during synchronization
Problem - sampling a long CDC pulse - but not long enough!
Consider the somewhat non-intuitive and flawed condition where the sending clock domain sends a pulse to the receiving clock domain that is slightly wider than the period of the receiving clock. Under most conditions, the signal will be sampled and passed, but there is a small but real chance that the CDC pulse will change too close to the two rising clock edges of the receiving clock domain. In that case it violates the setup time on the first clock edge and the hold time on the second clock edge, and the anticipated pulse is never formed. This possible failure is shown in Figure 1.11.
Figure 1.11: Marginal CDC pulse that violates the destination setup and hold times
1.8.10 Open-loop solution - sampling signals with synchronizers
One potential solution to this problem is to assert CDC signals for a period of time that
exceeds the cycle time of the sampling clock as shown in Figure 1.12. As discussed in
section 1.8.9, the minimum pulse width is 1.5X the period of the receiving clock frequency.
The assumption is that the CDC signal will be sampled at least once and possibly twice
by the receiver clock.
Open-loop sampling can be used when relative clock frequencies are fixed
and properly analyzed.
Advantage: the open-loop solution is the fastest way to pass signals across CDC boundaries, because the received signal does not have to be acknowledged.
Figure 1.12: Lengthened pulse to guarantee that the control signal will be sampled
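The open-loop scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not a design taken from the reference. The module and signal names are hypothetical. `STRETCH` is an assumed parameter that must be chosen from the analyzed, fixed clock ratio so that the stretched level stays high for at least 1.5 receiving-clock periods. The sending domain turns a one-cycle pulse into a registered level, and the receiving domain samples that level with a two-flip-flop synchronizer.

```verilog
// Open-loop CDC sketch (hypothetical names). STRETCH is an assumption:
// choose it from the analyzed clock ratio so the level lasts >= 1.5 b_clk periods.
module open_loop_cdc #(
  parameter STRETCH = 4              // sending-clock cycles to hold the level
) (
  input  wire a_clk, a_rst_n,
  input  wire a_pulse,               // one-cycle pulse in the sending domain
  input  wire b_clk, b_rst_n,
  output wire b_level                // synchronized copy in the receiving domain
);
  reg [7:0] a_cnt;                   // assumes STRETCH <= 256
  reg       a_sig;                   // registered, glitch-free CDC signal

  always @(posedge a_clk or negedge a_rst_n)
    if (!a_rst_n) begin
      a_cnt <= 8'd0;
      a_sig <= 1'b0;
    end else if (a_pulse) begin      // start (or restart) the stretched level
      a_cnt <= STRETCH - 1;
      a_sig <= 1'b1;
    end else if (a_cnt != 8'd0) begin
      a_cnt <= a_cnt - 1'b1;
    end else begin
      a_sig <= 1'b0;                 // level has lasted STRETCH a_clk cycles
    end

  // Two-flip-flop synchronizer in the receiving domain.
  reg b_meta, b_sync;
  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n) {b_sync, b_meta} <= 2'b00;
    else          {b_sync, b_meta} <= {b_meta, a_sig};

  assign b_level = b_sync;
endmodule
```

Nothing is acknowledged in this scheme. The design is therefore only as safe as the clock-ratio analysis behind `STRETCH`. If either frequency can change, the closed-loop approach of section 1.8.11 is the safer choice.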
1.8.11 Closed-loop solution - sampling signals with synchronizers
A second potential solution to this problem is to send an enabling control signal, synchronize it into the new clock domain, and then pass the synchronized signal back through another synchronizer to the sending clock domain as an acknowledge signal.
Advantage: synchronizing a feedback signal is a very safe technique to acknowledge
that the first control signal was recognized and sampled into the new clock domain.
Disadvantage: there is potentially considerable delay associated with synchronizing
control signals in both directions before allowing the control signal to change.
Figure 1.13: Signal with feedback to acknowledge receipt
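A minimal sketch of the closed-loop scheme follows. It assumes a level request that the sender holds until the acknowledge returns. All module and signal names are hypothetical. The synchronized request (`b_req_sync[1]`) is used in the receiving domain, and it is also sent back through a second two-flip-flop synchronizer as the acknowledge.

```verilog
// Closed-loop (request/acknowledge) CDC sketch; names are hypothetical.
module closed_loop_cdc (
  input  wire a_clk, a_rst_n,
  input  wire a_send,       // request a transfer; ignored while a_busy is high
  output wire a_busy,       // high until the acknowledge has gone high and low again
  input  wire b_clk, b_rst_n,
  output wire b_pulse       // one b_clk-cycle pulse per accepted request
);
  reg       a_req;          // level request, held until acknowledged
  reg [1:0] a_ack_sync;     // acknowledge synchronizer (receiving -> sending)
  reg [2:0] b_req_sync;     // [1:0] request synchronizer, [2] edge history

  always @(posedge a_clk or negedge a_rst_n)
    if (!a_rst_n) begin
      a_req      <= 1'b0;
      a_ack_sync <= 2'b00;
    end else begin
      a_ack_sync <= {a_ack_sync[0], b_req_sync[1]};  // feedback path
      if (a_req && a_ack_sync[1])
        a_req <= 1'b0;                               // acknowledged: release
      else if (!a_req && !a_ack_sync[1] && a_send)
        a_req <= 1'b1;                               // idle: start a new request
    end

  assign a_busy = a_req | a_ack_sync[1];

  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n) b_req_sync <= 3'b000;
    else          b_req_sync <= {b_req_sync[1:0], a_req};

  assign b_pulse = b_req_sync[1] & ~b_req_sync[2];   // rising edge of synced request
endmodule
```

A full request/acknowledge round trip takes several clock cycles in each domain. That round trip is the delay disadvantage noted above.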
1.8.12 Passing multiple signals between clock domains
When passing multiple signals between clock domains, simple synchronizers do not guarantee safe delivery of the data.
A frequent mistake made by engineers when working on multi-clock designs is passing multiple CDC bits required in the same transaction from one clock domain to another and overlooking the importance of the synchronized sampling of the CDC bits.
The problem is that multiple signals synchronized to one clock will change with small skews relative to each other. These skews can cause the bits to be sampled on different rising clock edges in the second clock domain.
Multi-bit CDC strategies
To avoid multi-bit CDC skewed-sampling scenarios, the following multi-bit CDC strategies can be applied:
• Multi-bit signal consolidation. Where possible, consolidate multiple CDC bits into 1-bit CDC signals.
• Multi-cycle path formulations. Use a synchronized load signal to safely pass multiple CDC bits.
• Pass multiple CDC bits using Gray codes.
1.8.13 Multi-bit signal consolidation
Where possible, consolidate multiple CDC signals into a 1-bit CDC signal. Check whether you really need multiple bits to control logic across a CDC boundary. Simply using synchronizers on all of the CDC bits is not always good enough.
Problem - Two simultaneously required control signals.
In the simple example shown in Figure 1.14, a register in the receiving clock domain
requires both a load signal and an enable signal in order to load a data value into the
register. If both the load and enable signals are driven on the same sending clock edge,
there is a chance that a small skew between the control signals could cause the two signals
to be synchronized into different clock cycles within the receiving clock domain. Under
these conditions, the data would not be loaded into the register.
Figure 1.14: Problem - Passing multiple control signals between clock domains
Solution - Consolidation
The solution to the problem in section 1.8.13 is simple: consolidate the control signals. As shown in Figure 1.15, drive both the load and enable register input signals in the receiving clock domain from just one load-enable signal. Consolidation removes the potential for two control signals to arrive shifted in time.
Figure 1.15: Consolidating control signals before passing between clock domains
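The consolidation idea can be sketched as below. The sketch assumes the data value `b_d` is already in the receiving domain; `b_d` is a hypothetical port, as are the other names. Only one bit, `a_lden`, crosses the boundary, so the derived load and enable can never land in different receiving-clock cycles. The sender is assumed to hold `a_lden` long enough to be sampled, as in section 1.8.10.

```verilog
// Consolidated load-enable sketch; all names are hypothetical.
module consolidated_lden (
  input  wire       a_clk, a_rst_n,
  input  wire       a_lden,       // single consolidated load-enable (sending domain)
  input  wire       b_clk, b_rst_n,
  input  wire [7:0] b_d,          // hypothetical data already in the receiving domain
  output reg  [7:0] b_q
);
  reg a_lden_q;                   // registered before crossing (glitch-free)
  always @(posedge a_clk or negedge a_rst_n)
    if (!a_rst_n) a_lden_q <= 1'b0;
    else          a_lden_q <= a_lden;

  reg [1:0] b_sync;               // two-flip-flop synchronizer
  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n) b_sync <= 2'b00;
    else          b_sync <= {b_sync[0], a_lden_q};

  // "Load" and "enable" are both derived from the same synchronized bit.
  wire b_load = b_sync[1];
  wire b_en   = b_sync[1];
  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n)           b_q <= 8'h00;
    else if (b_load & b_en) b_q <= b_d;
endmodule
```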
1.8.14 Problem - Multiple Data bits
The diagram in Figure 1.16 shows two encoded control signals being passed between clock
domains. If the two encoded signals are slightly skewed when sampled, an erroneous
decoded output could be generated for one clock period in the receiving clock domain.
Figure 1.16: Encoded control signals passed between clock domains
1.8.15 Solutions for passing multiple Data bits
Multi-Cycle Path (MCP) formulations and FIFO techniques can be used to address
problems related to passing multiple data bits between different clock domains.
Multi-Cycle Path (MCP) formulation
Using an MCP formulation is a common technique for safely passing multiple CDC data bits. An MCP formulation refers to sending unsynchronized data to a receiving clock domain paired with a synchronized control signal. The data and control signals are sent simultaneously. This allows the data to set up at the inputs of the destination register while the control signal is synchronized for two receiving clock cycles before it arrives at the load input of the destination register.
Advantages:
• The sending clock domain is not required to calculate the appropriate pulse width
to send between clock domains.
• The sending clock domain is only required to toggle an enable into the receiving
clock domain to indicate that data has been passed and is ready to be loaded. The
enable signal is not required to return to its initial logic level.
This strategy passes multiple CDC data bits without synchronization, and simultaneously
passes a synchronized enable signal to the receiving clock domain. The receiving clock
domain is not allowed to sample the multi-bit CDC signals until the synchronized enable
passes through synchronization and arrives at the receiving register.
This strategy is called a Multi-Cycle Path Formulation because the unsynchronized data word is passed directly to the receiving clock domain and held for multiple receiving clock cycles. This allows an enable signal to be synchronized and recognized into the receiving clock domain before the unsynchronized data word is permitted to change.
Figure 1.17: Logic to pass a synchronized enable pulse between clock domains
Synchronized pulse generation logic
The most common method to pass a synchronized enable signal between clock domains
is to employ a toggling enable signal that is passed to a synchronized pulse generator
to indicate that the unsynchronized multi-cycle data word can be captured on the next
receiving clock edge as shown in Figure 1.18.
Figure 1.18: Synchronized pulse generation logic
Figure 1.19: Synchronized enable pulse generation logic and equivalent symbol
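The MCP formulation with a toggling enable and a synchronized pulse generator might be written as follows. This is a sketch: the names and the parameter `W` are hypothetical. It assumes the sender does not issue another `a_load` until the previous word has been captured, for example by waiting for a feedback acknowledge or by relying on an analyzed clock ratio. The data word crosses unsynchronized and only the toggle is synchronized. The XOR of the last two synchronizer stages produces one receiving-clock pulse per toggle, in either direction.

```verilog
// MCP formulation sketch with a toggle-based synchronized pulse generator.
module mcp_cdc #(parameter W = 8) (
  input  wire         a_clk, a_rst_n,
  input  wire         a_load,        // one-cycle request to send a_din
  input  wire [W-1:0] a_din,
  input  wire         b_clk, b_rst_n,
  output reg  [W-1:0] b_dout,
  output wire         b_pulse        // one-cycle "data loaded" pulse
);
  reg [W-1:0] a_data;  // held stable while the enable is synchronized
  reg         a_tgl;   // toggling enable: one transition per transfer
  always @(posedge a_clk or negedge a_rst_n)
    if (!a_rst_n) begin
      a_data <= {W{1'b0}};
      a_tgl  <= 1'b0;
    end else if (a_load) begin
      a_data <= a_din;
      a_tgl  <= ~a_tgl;
    end

  // Two synchronizer flops plus one history flop; XOR detects either toggle direction.
  reg [2:0] b_sync;
  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n) b_sync <= 3'b000;
    else          b_sync <= {b_sync[1:0], a_tgl};
  assign b_pulse = b_sync[2] ^ b_sync[1];

  // a_data is NOT synchronized; it is sampled only when b_pulse says it is stable.
  always @(posedge b_clk or negedge b_rst_n)
    if (!b_rst_n)     b_dout <= {W{1'b0}};
    else if (b_pulse) b_dout <= a_data;
endmodule
```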
1.8.16 Synchronizing counters
When passing multiple signals between clock domains, an important question to ask is,
do I need to sample every value of a signal that is passed from one clock domain to
another? With counters, the answer is frequently, no!
Reference [7] details FIFO design techniques where gray code counters are sampled
between clock domains and intermediate gray count values are often missed. For this
FIFO design, the greater consideration is to make sure that the counters cannot overrun
their boundaries, which could cause missed full and empty flag detection. Even though
the sampled gray count values between clock domains are often missed, the design is
robust and all important gray count values are appropriately sampled. See [7] for details.
Since a valid design might be allowed to skip some count value samples,
can any counter be used to pass count values across a CDC boundary? The
answer is no.
Binary counters
One characteristic of binary counters is that half of all sequential binary incrementing
operations require that two or more counter bits must change. Trying to synchronize
a binary counter across a CDC boundary is the same as trying to synchronize multiple
CDC signals into a new clock domain. If a simple 4-bit binary counter changes from
address 7 (binary 0111) to address 8 (binary 1000), all four counter bits will change at
the same time. If a synchronizing clock edge comes in the middle of this transition, it is
possible that any 4-bit binary pattern could be sampled and synchronized into the new
clock domain.
In a FIFO design, the new synchronized binary value might trigger a false full or empty
flag, or even worse, it might not trigger a real full or empty flag causing data to be lost
due to FIFO overflow or causing invalid data to be read from the FIFO due to an attempt
to read data when the FIFO is really empty.
Gray codes
Gray-code counters are the safest counters to use in multi-clock designs. A Gray code allows only one bit to change on each count transition. This eliminates the problem associated with trying to synchronize multiple changing CDC bits across a clock domain.
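A Gray-code counter can be sketched as below. It assumes a parameterized width `N`, and the port names are hypothetical. The count is kept internally in binary and converted with `(bin >> 1) ^ bin`. The Gray value is registered, so only one output bit changes per increment and no combinational glitches reach a downstream synchronizer.

```verilog
// Gray-code counter sketch: binary count internally, registered Gray output.
module gray_counter #(parameter N = 4) (
  input  wire         clk, rst_n,
  input  wire         inc,            // count enable
  output reg  [N-1:0] gray            // one bit changes per increment
);
  reg  [N-1:0] bin;                                      // internal binary count
  wire [N-1:0] bin_next  = bin + inc;                    // wraps at 2**N
  wire [N-1:0] gray_next = (bin_next >> 1) ^ bin_next;   // binary-to-Gray

  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      bin  <= {N{1'b0}};
      gray <= {N{1'b0}};
    end else begin
      bin  <= bin_next;
      gray <= gray_next;
    end
endmodule
```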
1.9 Synchronous Resets? Asynchronous Resets?
Reference : http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf
1.9.1 General flip-flop coding style notes
Synchronous reset flip-flops with non-reset follower flip-flops
Each Verilog procedural block or VHDL process should model only one type of flip-flop. In other words, a designer should not mix resettable flip-flops with follower flip-flops (flops with no reset). Follower flip-flops are flip-flops that act as simple data shift registers.
In the Verilog code of Listing 1.13, a flip-flop is used to capture data, and its output is then passed through a follower flip-flop. The first stage of this design is reset with a synchronous reset. The second stage is a follower flip-flop and is not reset. However, because the two flip-flops were inferred in the same procedural block/process, the reset signal rst_n will be used as a data enable for the second flop. This coding style generates extraneous logic, as shown in Figure 1.20.
Listing 1.13: Bad Verilog coding style to model dissimilar flip-flops
module badFFstyle (q2, d, clk, rst_n);
  output q2;
  input  d, clk, rst_n;
  reg    q2, q1;

  // q1 is reset; q2 is a follower. Because both share one block,
  // rst_n becomes a load enable on q2 (q2 holds its value during reset).
  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else begin
      q1 <= d;
      q2 <= q1;
    end
endmodule
Figure 1.20: Bad coding style yields a design with an unnecessary loadable flip-flop
The correct way to model a follower flip-flop is with two Verilog procedural blocks, as shown in Listing 1.14. This coding style generates the logic shown in Figure 1.21.
Listing 1.14: Good Verilog coding style to model dissimilar flip-flops
module goodFFstyle (q2, d, clk, rst_n);
  output q2;
  input  d, clk, rst_n;
  reg    q2, q1;

  // Resettable flip-flop in its own block.
  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else        q1 <= d;

  // Follower flip-flop (no reset) in a separate block.
  always @(posedge clk)
    q2 <= q1;
endmodule
Figure 1.21: Two different types of flip-flops, one with synchronous reset and one
without
Note that the extraneous logic generated by the code in Listing 1.13 is only a result of using a synchronous reset. If an asynchronous reset approach had been used, both coding styles would synthesize to the same design without any extra combinational logic. The generation of different flip-flop styles is largely a function of the sensitivity lists and if-else statements used in the HDL code.
1.9.2 Synchronous Resets
Synchronous resets are based on the premise that the reset signal will only affect or reset
the state of the flip-flop on the active edge of a clock. The reset can be applied to the
flip-flop as part of the combinational logic generating the d-input to the flip-flop.
Listing 1.15: Correct way to model a flip-flop with synchronous reset using Verilog
module sync_resetFFstyle (q, d, clk, rst_n);
  output q;
  input  d, clk, rst_n;
  reg    q;

  // rst_n is sampled only on the rising clock edge.
  always @(posedge clk)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule
Advantages of synchronous resets
Synchronous resets will synthesize to smaller flip-flops, particularly if the reset is gated with the logic generating the d-input. In that case, however, the combinational gate count grows, so the overall gate-count savings may not be significant.
• Synchronous resets generally ensure that the circuit is 100% synchronous.
• Synchronous resets ensure that reset can only occur at an active clock edge. The clock acts as a filter for small reset glitches; however, if these glitches occur near the active clock edge, the flip-flop could go metastable.
• In some designs, the reset must be generated by a set of internal conditions. A synchronous reset is recommended for these types of designs because it will filter the logic-equation glitches between clocks.
Disadvantages of synchronous resets
• Synchronous resets may need a pulse stretcher to guarantee that the reset pulse is wide enough to be present during an active edge of the clock.
• When working with a gated clock (for example, an SPI interface clock), the logic cannot be reset through a synchronous reset while the clock is not running.
1.9.3 Asynchronous resets
Asynchronous resets alone can be very dangerous. The biggest problem with asyn-
chronous resets is the reset release, also called reset removal. Asynchronous reset flip-flops
incorporate a reset pin into the flip-flop design. The reset pin is typically active low (the
flip-flop goes into the reset state when the signal attached to the flip-flop reset pin goes
to a logic low level.)
Listing 1.16: Correct way to model a flip-flop with asynchronous reset using Verilog
module async_resetFFstyle (q, d, clk, rst_n);
  output q;
  input  d, clk, rst_n;
  reg    q;

  // Verilog-2001 also permits comma separation:
  // always @(posedge clk, negedge rst_n)
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule
Advantages of asynchronous resets
• The biggest advantage of asynchronous resets is that, as long as the vendor library has asynchronously resettable flip-flops, the data path is guaranteed to be clean. Designs that are pushing the limit for data-path timing cannot afford added gates and additional net delays in the data path due to logic inserted to handle synchronous resets. Of course, this argument does not hold if the vendor library has flip-flops with synchronous reset inputs and the designer can get Synopsys to actually use those pins.
• Asynchronous resets do not require a free-running clock to reset the logic.
Disadvantages of asynchronous resets
• The biggest problem with asynchronous resets is that they are asynchronous, both at assertion and at de-assertion. Assertion is a non-issue; de-assertion is the issue. If the asynchronous reset is released at or near the active clock edge of a flip-flop, the output of the flip-flop could go metastable.
• Another problem that an asynchronous reset can have, depending on its source, is spurious resets due to noise or glitches on the board or system reset.
1.9.4 Asynchronous reset problem
As shown in Figure 1.22, an asynchronous reset signal will be de-asserted asynchronously with respect to the clock signal. There are two potential problems with this scenario: (1) violation of the reset recovery time, and (2) reset removal happening in different clock cycles for different sequential elements.
Figure 1.22: Asynchronous reset removal recovery time problem
Reset recovery time
Reset recovery time refers to the time between when reset is de-asserted and when the clock signal next goes high. Recovery time is also referred to as a tsu setup time of the form "PRE or CLR inactive setup time before CLK edge". Violating the recovery time can cause signal-integrity or metastability problems on the registered data outputs.
Reset removal traversing different clock cycles
When reset removal is asynchronous to the rising clock edge, slight differences in propagation delays in either or both the reset signal and the clock signal can cause some registers or flip-flops to exit the reset state before others.
1.9.5 Reset synchronizer
Guideline: EVERY ASIC/FPGA USING AN ASYNCHRONOUS RESET SHOULD
INCLUDE A RESET SYNCHRONIZER CIRCUIT!!
Without a reset synchronizer, the usefulness of the asynchronous reset in the final
system is void even if the reset works during simulation.
The reset synchronizer logic of Figure 1.23 is designed to take advantage of the best of both asynchronous and synchronous reset styles. An external reset signal asynchronously resets a pair of master reset flip-flops. These in turn drive the master reset signal asynchronously through the reset buffer tree to the rest of the flip-flops in the design. The entire design will be asynchronously reset.
Figure 1.23: Reset Synchronizer block diagram
Reset removal is accomplished by de-asserting the reset signal, which then permits the
d-input of the first master reset flip-flop (which is tied high) to be clocked through a reset
synchronizer. It typically takes two rising clock edges after reset removal to synchronize
removal of the master reset.
The first flip-flop samples the reset removal on the clock. The second flip-flop guards against any metastability that the first flip-flop may develop when the reset is removed asynchronously and too close to the rising clock edge.
A closer examination of the timing now shows that reset distribution timing is the sum of a clk-to-q propagation delay and the total delay through the reset distribution tree. That sum must still meet the reset recovery time of the destination registers and flip-flops, as shown in Figure 1.24.
Figure 1.24: Predictable reset removal to satisfy reset recovery time
Listing 1.17: The code for the reset synchronizer circuit
module async_resetFFstyle2 (rst_n, clk, asyncrst_n);
  output rst_n;
  input  clk, asyncrst_n;
  reg    rst_n, rff1;

  // Assertion is asynchronous. On removal, a constant 1 ripples through
  // rff1 and then rst_n, so rst_n rises on the second rising clock edge.
  always @(posedge clk or negedge asyncrst_n)
    if (!asyncrst_n) {rst_n, rff1} <= 2'b0;
    else             {rst_n, rff1} <= {rff1, 1'b1};
endmodule
1.9.6 Reset-glitch filtering
One of the biggest issues with asynchronous resets is that they are asynchronous. They therefore carry characteristics that must be dealt with, depending on the source of the reset. With asynchronous resets, any input wide enough to meet the minimum reset pulse width for a flip-flop will cause the flip-flop to reset. If the reset line is subject to glitching, this can be a real problem. Presented here is one approach that will filter out the glitches, but it is ugly! This solution requires a digital delay to filter out small glitches, and that delay will vary with temperature, voltage and process. The reset input pad should also be a Schmitt-triggered pad to help with glitch filtering. Figure 1.25 shows the implementation of this approach.
Figure 1.25: Reset glitch filtering
Chapter 2
Xilinx RTL guidelines
Reference :
• http://classes.engineering.wustl.edu/cse460t/images/e/eb/Xst_v6s6.pdf
Advantages of VHDL
• Enforces stricter rules: it is strongly typed, less permissive and less error-prone
• Initialization of RAM components in the HDL source code is easier (Verilog initial
blocks are less convenient)
• Package support
• Custom types
• Enumerated types
• No reg versus wire confusion
Advantages of Verilog
• Extends to SystemVerilog
• C-like syntax
• Results in more compact code
• Block commenting
• No heavy component instantiation as in VHDL
2.1 Macro Inference Flow Overview
Macros are inferred during three stages of the XST synthesis flow.
• Basic macros are inferred during HDL Synthesis.
• Complex macros are inferred during Advanced HDL Synthesis.
• Other macros are inferred during Low-Level Optimizations, when timing information is available to make more fully informed decisions.
• Macros inferred during Advanced HDL Synthesis are usually the result of an aggregation of several basic macros previously inferred during HDL Synthesis. In most cases, the XST inference engine can perform this grouping regardless of hierarchical boundaries, unless Keep Hierarchy has been set to yes to prevent it.
Example: a block RAM is inferred by combining RAM core functionality described in one user-defined hierarchical block with a Register described in a different user-defined hierarchy. This allows you to structure the HDL project in a modular way while ensuring that XST can recognize relationships among design elements described in different VHDL entities and Verilog modules.
2.2 Coding Guidelines for Virtex-6, Spartan-6, and 7 Series Devices
These coding guidelines: 1) minimize slice logic utilization, 2) maximize circuit performance, and 3) utilize device resources such as block RAM components and DSP blocks.
• Do not set or reset Registers asynchronously.
– Control set remapping becomes impossible.
– Sequential functionality in device resources such as block RAM components and DSP blocks can be set or reset synchronously only.
– You will be unable to leverage device resources, or they will be configured sub-optimally.
– Use synchronous initialization instead.
• Use the Asynchronous to Synchronous option if your own coding guidelines require Registers to be set or reset asynchronously. This allows you to assess the benefits of using synchronous set/reset.
• Do not describe Flip-Flops with both a set and a reset.
– No Flip-Flop primitives feature both a set and a reset, whether synchronous or asynchronous.
– If not rejected by the software, Flip-Flops described with both a set and a reset may adversely affect area and performance.
• Always describe the clock enable, set, and reset control inputs of Flip-Flop primitives as active-High. If they are described as active-Low, the resulting inverter logic will penalize circuit performance.
• Suggestions for faster and smaller designs:
– Use synchronous Set/Reset whenever possible.
– Use active-High CE and Set/Reset (no local inverter for secondary control signals).
– Build your design with as few control signals (set, reset, and clock enable) as possible.
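A register written to these suggestions might look like the sketch below; the names are hypothetical. It uses an active-High synchronous reset and an active-High clock enable, with no asynchronous set or reset. No inverters are needed on the control inputs, and the description should map onto a flip-flop primitive with synchronous reset and clock-enable pins.

```verilog
// Register with active-High synchronous reset and clock enable (hypothetical names).
module sync_ce_ff (
  input  wire       clk,
  input  wire       rst,     // active-High, synchronous
  input  wire       ce,      // active-High clock enable
  input  wire [7:0] d,
  output reg  [7:0] q
);
  always @(posedge clk)
    if (rst)     q <= 8'h00;  // reset takes effect only on the clock edge
    else if (ce) q <= d;
endmodule
```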
2.2.1 Resource Sharing
XST implements high-level optimizations known as Resource Sharing.
• Resource Sharing minimizes the number of arithmetic operators, resulting in re-
duced device utilization.
• Resource Sharing is based on the principle that two similar arithmetic operators
can be implemented with common resources on the device, provided their respective
outputs are never used simultaneously.
• Resource Sharing usually involves creating additional multiplexing logic to select
between factorized inputs. Factorization is performed in a way that minimizes this
logic.
• Resource Sharing is enabled by default, no matter which overall optimization strategy you have selected.
XST supports Resource Sharing for:
• Adders
• Subtractors
• Adders/Subtractors
• Multipliers
Xilinx recommends that you disable Resource Sharing:
• If circuit performance is your primary optimization goal, and
• You are unable to meet timing goals.
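The module below illustrates the kind of code that Resource Sharing targets. Its two additions are never needed at the same time, so a tool may implement them with one adder preceded by input multiplexers. Whether XST actually shares them depends on its Resource Sharing setting and on timing. The module and port names are hypothetical.

```verilog
// Two mutually exclusive additions: a candidate for Resource Sharing.
module shared_adder (
  input  wire       sel,
  input  wire [7:0] a, b, c, d,
  output wire [8:0] y            // 9 bits keep the carry
);
  // Only one sum is ever used, so one adder with muxed inputs can serve both.
  assign y = sel ? (a + b) : (c + d);
endmodule
```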
2.2.2 Implementing FSM Components on Block RAM Resources
• By default Finite State Machine (FSM) components are implemented on slice logic.
• To save slice logic resources, instruct XST to implement FSM components in block
RAM.
• Implementing FSM components in block RAM can enhance the performance of
large FSM components
• XST cannot implement an FSM in block RAM when the FSM has an asynchronous
reset
2.2.3 Mapping Logic to Block RAM
If you cannot fit the design onto the device, place some of the logic into unused block
RAM. XST does not automatically decide which logic can be placed into block RAM.
You must instruct XST to do so.
• Isolate the part of the Register Transfer Level (RTL) description to be placed into
block RAM in a separate hierarchical block.
• Apply Map Logic on BRAM to the separate hierarchical block, either directly in
the HDL source code, or in the XST Constraint File (XCF).
Block Ram Criteria
The logic implemented in block RAM must satisfy the following criteria:
• All outputs are registered.
• The block contains only one level of Registers, which are Output Registers.
• All Output Registers have the same control signals.
• The Output Registers have a synchronous reset signal.
• The block does not contain multi-source situations or tristate buffers.
Rules for Clock Signals
• Use one clock signal and one edge.
• Do not generate internal clock signals because of glitching and clock-skew related
problems
Rules for the Hierarchical Registering of Signals
• Register the outputs of leaf-level blocks (sub-blocks).
• Register the inputs to the chip's top level.
2.2.4 Important Notes
• CASE statements result in LUTs connected in parallel, whereas IF-ELSE statements result in LUTs connected in series.
• If nested IF statements are necessary, put critical input signals on the first IF statement.
– The critical signal then ends up in the last logic stage.
Figure 2.1: Nested IF
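The nested-IF guideline can be sketched as follows, where `critical` stands for a hypothetical late-arriving signal. Because it is tested first, it has the highest priority and is decoded in the last multiplexer stage, closest to the output. It therefore passes through the fewest levels of logic.

```verilog
// Priority IF chain with the critical signal tested first (hypothetical names).
module nested_if (
  input  wire       critical,        // late-arriving signal
  input  wire       c1, c2,
  input  wire [3:0] d_crit, d1, d2, d_def,
  output reg  [3:0] y
);
  always @(*)
    if (critical) y = d_crit;        // first IF: implemented in the last mux stage
    else if (c1)  y = d1;
    else if (c2)  y = d2;
    else          y = d_def;
endmodule
```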
• CASE statements in a combinatorial process (VHDL) or always statement (Verilog):
– Latches are inferred if outputs are not defined in all branches.
– Use default assignments before the CASE statement to prevent latches.
• CASE statements in a sequential process (VHDL) or always statement (Verilog):
– Clock enables are inferred if outputs are not defined in all branches.
– This is not wrong, but it might generate a long clock enable equation.
– Use default assignments before the CASE statement to prevent clock enables.
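Both CASE guidelines are shown in the sketch below; the names are hypothetical. In each block, the default assignment before the CASE statement gives the output a value on every path. As a result, the combinational block infers no latch. The sequential block loads 0 for the unlisted `sel` value instead of holding, so no clock enable is inferred. Loading 0 for `sel == 3` is a design decision, not a free optimization.

```verilog
// Default assignments before CASE (hypothetical names).
module case_defaults (
  input  wire       clk,
  input  wire [1:0] sel,
  input  wire [3:0] a, b, c,
  output reg  [3:0] y_comb,
  output reg  [3:0] y_seq
);
  // Combinational: every path assigns y_comb, so no latch is inferred.
  always @(*) begin
    y_comb = 4'd0;
    case (sel)
      2'd0: y_comb = a;
      2'd1: y_comb = b;
      2'd2: y_comb = c;
    endcase
  end

  // Sequential: every path assigns y_seq, so no clock enable is inferred.
  always @(posedge clk) begin
    y_seq <= 4'd0;
    case (sel)
      2'd0: y_seq <= a;
      2'd1: y_seq <= b;
      2'd2: y_seq <= c;
    endcase
  end
endmodule
```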
• Consider using one-hot select inputs.
– Eliminating the select decoding can improve performance (only one bit is used for each state, with a different select line for each state).
• The advantage of using a don't care for the default is that the synthesizer has more flexibility to create a smaller, faster circuit.
Figure 2.2: FSM encoding
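The one-hot and don't-care ideas can be combined in a small state machine sketch; the states and signals are hypothetical. Each state owns one bit, so an output such as `busy` is a single state bit with no decoding. The unreachable `default` branch assigns `x`, which lets synthesis treat those encodings as don't cares.

```verilog
// One-hot FSM sketch with a don't-care default (hypothetical states and signals).
module onehot_fsm (
  input  wire clk, rst, go, done,
  output wire busy
);
  localparam [2:0] IDLE = 3'b001, RUN = 3'b010, FIN = 3'b100;
  reg [2:0] state, next;

  always @(posedge clk)
    if (rst) state <= IDLE;            // active-High synchronous reset
    else     state <= next;

  always @(*)
    case (state)
      IDLE:    next = go   ? RUN : IDLE;
      RUN:     next = done ? FIN : RUN;
      FIN:     next = IDLE;
      default: next = 3'bxxx;          // unreachable encodings: don't care
    endcase

  assign busy = state[1];              // one-hot: the RUN bit, no decoding
endmodule
```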
• Registering the control signals (adding pipeline registers) eliminates the net delay between two registers.
Figure 2.3: Pipeline Registers
• High fanout: solutions
Figure 2.4: Registering High Fanout Signals
– The most likely solution is to duplicate the source of the high-fanout net.
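Duplicating the source of a high-fanout net can be sketched as below. The sketch assumes an enable that drives 16 loads; all names are hypothetical. Each copy of the registered enable drives only half of the loads. Synthesis tools may merge equivalent registers back into one unless told not to. The mechanism for preventing that merge is tool-specific and is not shown.

```verilog
// Duplicated enable register to split a high-fanout net (hypothetical names).
module dup_enable (
  input  wire        clk,
  input  wire        en_in,
  input  wire [15:0] d,
  output reg  [15:0] q
);
  reg en_lo, en_hi;                  // two copies of the same registered enable
  always @(posedge clk) begin
    en_lo <= en_in;
    en_hi <= en_in;
  end

  always @(posedge clk) begin
    if (en_lo) q[7:0]  <= d[7:0];    // each copy drives half of the loads
    if (en_hi) q[15:8] <= d[15:8];
  end
endmodule
```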
2.2.5 FPGA Power Management Design Techniques
• Static and dynamic power are minimized by using hard IP.
• Static power is reduced because fewer transistors are used, whereas dynamic power is reduced because of shorter trace lengths.
• Move functions to dedicated hardware resources
– State machines to BRAMs
– Counters to DSP48s
– Registers to SRLs
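For the "Registers to SRLs" item, the usual requirement is a plain shift register without a reset, as in the sketch below. The names are hypothetical, and `DEPTH` is an assumed parameter that must be at least 2. With no reset and at most a clock enable, the chain is eligible to be mapped to shift-register LUTs instead of `DEPTH` separate flip-flops.

```verilog
// Reset-free shift register, eligible for SRL mapping (hypothetical names).
module srl_delay #(parameter DEPTH = 16) (  // assumes DEPTH >= 2
  input  wire clk, ce, d,
  output wire q
);
  reg [DEPTH-1:0] sr;                        // no reset on purpose
  always @(posedge clk)
    if (ce) sr <= {sr[DEPTH-2:0], d};
  assign q = sr[DEPTH-1];                    // d delayed by DEPTH enabled cycles
endmodule
```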
References
[1] http://www.sunburst-design.com/papers/
[2] William I. Fletcher, An Engineering Approach To Digital Design, New Jersey, Prentice-Hall, 1980.
[3] Zvi Kohavi, Switching And Finite Automata Theory, Second Edition, New York, McGraw-Hill Book Company, 1978.
[4] The Programmable Logic Data Book, Xilinx, 1994, pg. 8-171.
[5] Clifford E. Cummings, "Coding And Scripting Techniques For FSM Designs With Synthesis-Optimized, Glitch-Free Outputs," SNUG 2000 Boston (Synopsys Users Group Boston, MA, 2000) Proceedings, September 2000.
[6] Real Intent, Inc. (white paper), Clock Domain Crossing Demystified: The Second Generation Solution for CDC Verification, February 2008, www.realintent.com
[7] Clifford E. Cummings, Simulation and Synthesis Techniques for Asynchronous FIFO Design, SNUG 2002, www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf

C language (Part 2)C language (Part 2)
C language (Part 2)
SURBHI SAROHA
 
A Verilog HDL Test Bench Primer
A Verilog HDL Test Bench PrimerA Verilog HDL Test Bench Primer
A Verilog HDL Test Bench Primer
Nicole Heredia
 
Digital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural ModelingDigital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural Modeling
Indira Priyadarshini
 
Coding style for good synthesis
Coding style for good synthesisCoding style for good synthesis
Coding style for good synthesis
Vinchipsytm Vlsitraining
 
Vlsiexpt 11 12
Vlsiexpt 11 12Vlsiexpt 11 12
Vlsiexpt 11 12JINCY Soju
 
Ver1-iitkgp.ppt
Ver1-iitkgp.pptVer1-iitkgp.ppt
Ver1-iitkgp.ppt
SouvikSaha842368
 

Similar to FPGA Coding Guidelines (20)

TLA+ and PlusCal / An engineer's perspective
TLA+ and PlusCal / An engineer's perspectiveTLA+ and PlusCal / An engineer's perspective
TLA+ and PlusCal / An engineer's perspective
 
Lect 7: Verilog Behavioral model for Absolute Beginners
Lect 7: Verilog Behavioral model for Absolute BeginnersLect 7: Verilog Behavioral model for Absolute Beginners
Lect 7: Verilog Behavioral model for Absolute Beginners
 
Behavioral modeling
Behavioral modelingBehavioral modeling
Behavioral modeling
 
RTL Coding Basics in verilog hardware language
RTL Coding Basics in verilog hardware languageRTL Coding Basics in verilog hardware language
RTL Coding Basics in verilog hardware language
 
Verilog_Cheat_sheet_1672542963.pdf
Verilog_Cheat_sheet_1672542963.pdfVerilog_Cheat_sheet_1672542963.pdf
Verilog_Cheat_sheet_1672542963.pdf
 
Verilog_Cheat_sheet_1672542963.pdf
Verilog_Cheat_sheet_1672542963.pdfVerilog_Cheat_sheet_1672542963.pdf
Verilog_Cheat_sheet_1672542963.pdf
 
Verilog Cheat sheet-2 (1).pdf
Verilog Cheat sheet-2 (1).pdfVerilog Cheat sheet-2 (1).pdf
Verilog Cheat sheet-2 (1).pdf
 
Notes: Verilog Part 4- Behavioural Modelling
Notes: Verilog Part 4- Behavioural ModellingNotes: Verilog Part 4- Behavioural Modelling
Notes: Verilog Part 4- Behavioural Modelling
 
Kroening et al, v2c a verilog to c translator
Kroening et al, v2c   a verilog to c translatorKroening et al, v2c   a verilog to c translator
Kroening et al, v2c a verilog to c translator
 
verilog ppt .pdf
verilog ppt .pdfverilog ppt .pdf
verilog ppt .pdf
 
dokumen.tips_verilog-basic-ppt.pdf
dokumen.tips_verilog-basic-ppt.pdfdokumen.tips_verilog-basic-ppt.pdf
dokumen.tips_verilog-basic-ppt.pdf
 
Fpga 08-behavioral-modeling-mealy-machine
Fpga 08-behavioral-modeling-mealy-machineFpga 08-behavioral-modeling-mealy-machine
Fpga 08-behavioral-modeling-mealy-machine
 
Test pattern Generation for 4:1 MUX
Test pattern Generation for 4:1 MUXTest pattern Generation for 4:1 MUX
Test pattern Generation for 4:1 MUX
 
Ssc06 e
Ssc06 eSsc06 e
Ssc06 e
 
C language (Part 2)
C language (Part 2)C language (Part 2)
C language (Part 2)
 
A Verilog HDL Test Bench Primer
A Verilog HDL Test Bench PrimerA Verilog HDL Test Bench Primer
A Verilog HDL Test Bench Primer
 
Digital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural ModelingDigital System Design-Switchlevel and Behavioural Modeling
Digital System Design-Switchlevel and Behavioural Modeling
 
Coding style for good synthesis
Coding style for good synthesisCoding style for good synthesis
Coding style for good synthesis
 
Vlsiexpt 11 12
Vlsiexpt 11 12Vlsiexpt 11 12
Vlsiexpt 11 12
 
Ver1-iitkgp.ppt
Ver1-iitkgp.pptVer1-iitkgp.ppt
Ver1-iitkgp.ppt
 

Recently uploaded

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 

Recently uploaded (20)

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 

FPGA Coding Guidelines

Chapter 1
Coding Style For Better Synthesis
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA.pdf
Before giving further explanation and examples of both blocking and nonblocking assignments, it would be useful to outline eight guidelines that help to accurately simulate hardware modeled using Verilog. Adherence to these guidelines will also remove 90-100% of the Verilog race conditions encountered by most Verilog designers.
• When modeling sequential logic, use nonblocking assignments.
• When modeling latches, use nonblocking assignments.
• When modeling combinational logic with an always block, use blocking assignments.
• When modeling both sequential and combinational logic within the same always block, use nonblocking assignments.
• Do not mix blocking and nonblocking assignments in the same always block.
• Do not make assignments to the same variable from more than one always block.
• Use $strobe to display values that have been assigned using nonblocking assignments.
• Do not make assignments using #0 delays.
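As an illustration of how the first, third, fifth and sixth guidelines fit together, here is a sketch of a small counter. The module guideline_counter and its ports are hypothetical. The combinational next-value logic and the sequential register live in separate always blocks, each block uses only one kind of assignment, and every variable is assigned from exactly one always block.

```verilog
// Hypothetical 4-bit counter written to follow the guidelines:
// - combinational logic: its own always block, blocking (=) assignments
// - sequential logic: its own always block, nonblocking (<=) assignments
// - each variable (count_next, count) is assigned from exactly one block
module guideline_counter (
  input  wire       clk,
  input  wire       rst_n,   // active-low asynchronous reset
  input  wire       en,
  output reg  [3:0] count
);
  reg [3:0] count_next;

  // Combinational next-value logic: blocking assignments only.
  always @(count or en) begin
    count_next = count;              // default: hold
    if (en) count_next = count + 4'd1;
  end

  // Sequential register: nonblocking assignments only.
  always @(posedge clk or negedge rst_n)
    if (!rst_n) count <= 4'd0;
    else        count <= count_next;
endmodule
```

Keeping the two blocks apart means a reader can tell from the assignment operator alone whether a line describes a flip-flop input or combinational logic.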
1.1 Blocking assignments
The blocking assignment operator is an equal sign ("="). A blocking assignment gets its name because it must evaluate the RHS arguments and complete the assignment without interruption from any other Verilog statement. The assignment is said to "block" other assignments until the current assignment has completed. The one exception is a blocking assignment with timing delays on the RHS of the blocking operator, which is considered to be a poor coding style.
A problem with blocking assignments occurs when the RHS variable of one assignment in one procedural block is also the LHS variable of another assignment in another procedural block, and both assignments are scheduled to execute in the same simulation time step. In module fbosc1 below, after reset y1 = 0 and y2 = 1. On the next clock edge, if the first always block executes first, both y1 and y2 become 1; if the second always block executes first, both become 0. The result depends on execution order, which is a Verilog race condition.
module fbosc1 (y1, y2, clk, rst);
  output y1, y2;
  input  clk, rst;
  reg    y1, y2;

  always @(posedge clk or posedge rst)
    if (rst) y1 = 0;  // reset
    else     y1 = y2;

  always @(posedge clk or posedge rst)
    if (rst) y2 = 1;  // preset
    else     y2 = y1;
endmodule
1.2 Nonblocking assignments
The nonblocking assignment operator is the same as the less-than-or-equal-to operator ("<="). A nonblocking assignment gets its name because it evaluates the RHS expression at the beginning of a time step and schedules the LHS update to take place at the end of the time step, without blocking the statements that follow it. Execution of nonblocking assignments can be viewed as a two-step process:
• Evaluate the RHS of nonblocking statements at the beginning of the time step.
• Update the LHS of nonblocking statements at the end of the time step.
Nonblocking assignments are only made to register data types and are therefore only permitted inside procedural blocks, such as initial blocks and always blocks. Nonblocking assignments are not permitted in continuous assignments.
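For comparison, here is a sketch of the fbosc1 loop rewritten with nonblocking assignments, following the first guideline. The name fbosc2 is an assumption. Because both RHS values are sampled before either LHS is updated, y1 and y2 simply swap on every clock edge, no matter which always block the simulator runs first.

```verilog
// Race-free version of fbosc1: nonblocking assignments for sequential logic.
module fbosc2 (y1, y2, clk, rst);
  output y1, y2;
  input  clk, rst;
  reg    y1, y2;

  // Both RHS values (y2 and y1) are read before either register is updated,
  // so the outcome does not depend on the order of the two always blocks.
  always @(posedge clk or posedge rst)
    if (rst) y1 <= 1'b0;  // reset
    else     y1 <= y2;

  always @(posedge clk or posedge rst)
    if (rst) y2 <= 1'b1;  // preset
    else     y2 <= y1;
endmodule
```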
1.3 The Verilog "stratified event queue"
The "stratified event queue" is logically partitioned into four distinct queues for the current simulation time, plus additional queues for future simulation times. In IEEE 1364 terms, the four current-time queues hold active events, inactive events (#0 delays), nonblocking assign update events, and monitor events ($monitor and $strobe).
Figure 1.1: Verilog "stratified event queue".
The active events queue is where most Verilog events are scheduled, including blocking assignments, continuous assignments, $display commands, evaluation of instance and primitive inputs followed by updates of primitive and instance outputs, and the evaluation of nonblocking RHS expressions. The LHS values of nonblocking assignments are not updated in the active events queue.
Events can be added to any of the event queues but are only removed from the active events queue. Events scheduled on the other event queues eventually become "activated," or promoted into the active events queue.
Making #0-delay assignments is generally a flawed practice, employed by designers who make assignments to the same variable from two separate procedural blocks and then try to beat the resulting race condition by scheduling one of the assignments to take place slightly later in the same simulation time step. Adding #0-delay assignments to Verilog models needlessly complicates the analysis of scheduled events.
1.4 Self-triggering always blocks
In general, a Verilog always block cannot trigger itself. Consider the oscillator in Listing 1.1, which uses blocking assignments. A blocking assignment evaluates its RHS expression and updates its LHS value without interruption, so the clk assignment must complete before the always block loops back and schedules the @(clk) edge-trigger event. By the time the trigger event is scheduled, the clk change has already happened; therefore, there is no event from within the always block to trigger @(clk).
Listing 1.1: Non-self-triggering oscillator using blocking assignments
module osc1 (clk);
  output clk;
  reg    clk;

  initial #10 clk = 0;
  always @(clk) #10 clk = !clk;
endmodule
In contrast, the oscillator in Listing 1.2 uses nonblocking assignments. After the first @(clk) trigger, the RHS expression of the nonblocking assignment is evaluated and the LHS update is scheduled into the nonblocking assign update events queue. Before that queue is "activated," the always block loops back to the @(clk) statement and becomes sensitive to changes on clk again. When the nonblocking LHS value is updated later in the same time step, @(clk) is triggered again. osc2 is therefore self-triggering (which is not necessarily a recommended coding style).
Listing 1.2: Self-triggering oscillator using nonblocking assignments
module osc2 (clk);
  output clk;
  reg    clk;

  initial #10 clk = 0;
  always @(clk) #10 clk <= !clk;
endmodule
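Neither oscillator is the usual way to write a testbench clock. A common idiom, shown here as a sketch (the module name clk_gen and the 10-time-unit half-period are assumptions), uses an always block with only a delay control and no event control, so it repeats on its own and never depends on self-triggering.

```verilog
// Hypothetical testbench clock source that does not rely on self-triggering:
// the always block has no @(...) event control, only a delay, so it loops by itself.
module clk_gen (clk);
  output clk;
  reg    clk;

  initial clk = 1'b0;       // defined start value, so the clock is never X after time 0
  always  #10 clk = ~clk;   // toggle every 10 time units (period of 20)
endmodule
```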
1.5 Combinational logic - use blocking assignments
The code shown in Listing 1.3 builds the y-output from three sequentially executed statements. Since nonblocking assignments evaluate all RHS expressions before updating any LHS variables, the values of tmp1 and tmp2 used to compute y are the values these two variables had upon entry to this always block, not the values that will be updated at the end of the simulation time step. The y-output therefore reflects the old values of tmp1 and tmp2, not the values calculated in the current pass of the always block.
Listing 1.3: Bad combinational logic coding style using nonblocking assignments
module ao4 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;

  always @(a or b or c or d) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;   // uses the OLD tmp1 and tmp2
  end
endmodule
The code shown in Listing 1.4 is identical to the code in Listing 1.3, except that tmp1 and tmp2 have been added to the sensitivity list. As described in section 1.4, when the nonblocking assignments update the LHS variables in the nonblocking assign update events queue, the always block self-triggers and updates the y-output with the newly calculated tmp1 and tmp2 values. The y-output value is now correct, but only after two passes through the always block. Multiple passes through an always block degrade simulation performance and should be avoided if a reasonable alternative exists (use blocking assignments for combinational modeling).
Listing 1.4: Correct but inefficient combinational logic coding style using nonblocking assignments
module ao5 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;

  always @(a or b or c or d or tmp1 or tmp2) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;
  end
endmodule
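For comparison, here is a sketch of the recommended style for the same and-or function, following the blocking-assignment guideline for combinational logic. The module name ao2 is an assumption that follows the naming of the listings above. Each blocking assignment completes before the next statement runs, so y is computed from the tmp1 and tmp2 values of the current pass, and a single pass through the always block is enough.

```verilog
// Combinational and-or logic with blocking assignments: one pass per input change.
module ao2 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;

  always @(a or b or c or d) begin
    tmp1 = a & b;          // updated immediately
    tmp2 = c & d;          // updated immediately
    y    = tmp1 | tmp2;    // sees the values computed just above
  end
endmodule
```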
NOTE
• Using the $display command to show values assigned with nonblocking assignments does not work as expected: $display executes in the active events queue, before the nonblocking LHS updates, so it prints the old values. Use $strobe instead.
• "Making multiple nonblocking assignments to the same variable in the same always block is defined by the Verilog Standard. The last nonblocking assignment to the same variable wins."
1.5.1 Driving the same signal inside two different if blocks
In the code shown below, the second if statement always takes precedence. If both enables are '1', the output is 'in2'.
Listing 1.5: One signal, two ifs
module onesignaltwoif (
  input      clk,
  input      en1,
  input      en2,
  input      in1,
  input      in2,
  output reg out
);
  always @(posedge clk) begin
    if (en1) out <= in1;
    if (en2) out <= in2;   // last nonblocking assignment wins when both enables are 1
  end
endmodule
1.6 RTL Coding Styles That Yield Simulation and Synthesis Mismatches
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG1999SJ_SynthMismatch.pdf
1.6.1 SENSITIVITY LIST
Synthesis tools infer combinational or latching logic from an always block whose sensitivity list does not contain the Verilog keywords posedge or negedge. For a combinational always block, the inferred logic is derived from the equations in the block and has nothing to do with the sensitivity list. The synthesis tool reads the sensitivity list and compares it against the equations in the always block only to report coding omissions that might cause a mismatch between pre- and post-synthesis simulations.
Signals in a sensitivity list that are not used in the always block make no functional difference to either pre- or post-synthesis simulations. The only effect of extraneous signals is that pre-synthesis simulations run more slowly, because the always block is entered and evaluated more often than necessary.
1.6.2 Incomplete sensitivity list
The synthesized logic described by the equations in an always block is always implemented as if the sensitivity list were complete. However, the pre-synthesis simulation behavior of the same always block can be quite different.
In module code1a, the sensitivity list is complete; therefore, the pre- and post-synthesis simulations both model a 2-input AND gate. In module code1b, the sensitivity list contains only the variable a. The post-synthesis simulation models a 2-input AND gate, but in pre-synthesis simulation the always block executes only when a changes, so any change on b that does not coincide with a change on a is not observed on the output. This does not match the 2-input AND gate of the post-synthesis model.
Finally, module code1c has no sensitivity list at all. In pre-synthesis simulation, this always block locks the simulator into an infinite zero-delay loop. Yet the post-synthesis model is again a 2-input AND gate.
Listing 1.6: Incomplete sensitivity lists
module code1a (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always @(a or b)
    o = a & b;
endmodule
//****************************
module code1b (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always @(a)
    o = a & b;
endmodule
//****************************
module code1c (o, a, b);
  output o;
  input  a, b;
  reg    o;

  always
    o = a & b;
endmodule
Note: All three modules infer a 2-input AND gate.
1.6.3 CASE STATEMENTS
Full Case
Using the synthesis tool directive //synopsys full_case gives more information about the design to the synthesis tool than is provided to the simulation tool. This directive informs the synthesis tool that the case statement is fully defined and that the output assignments for all unused cases are "don't cares." The functionality of the pre- and post-synthesis designs may or may not remain the same when using this directive. Additionally, although the directive tells the synthesis tool to treat the unused states as don't cares, it will sometimes make designs larger and slower than designs that omit the full_case directive.
In module code4a, a case statement is coded without any synthesis directives, and the pre- and post-synthesis simulations match. Module code4b uses the same case statement with the full_case directive. Because of the directive, the en input is optimized away during synthesis and left as a dangling input. The pre-synthesis simulation results of modules code4a and code4b match the post-synthesis simulation results of module code4a, but not the post-synthesis simulation results of module code4b.
Listing 1.7: Full case
// no full_case
// Decoder built from four 3-input AND gates
// and two inverters
module code4a (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;

  always @(a or en) begin
    y = 4'h0;
    case ({en, a})
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
//**********************************
// full_case example
// Decoder built from four 2-input NOR gates
// and two inverters
// The enable input is dangling (has been optimized away)
module code4b (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;

  always @(a or en) begin
    y = 4'h0;
    case ({en, a}) // synopsys full_case
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
Parallel Case
Using the synthesis tool directive //synopsys parallel_case gives more information about the design to the synthesis tool than is provided to the simulation tool. This directive informs the synthesis tool that all cases should be tested in parallel, even if there are overlapping cases that would normally cause a priority encoder to be inferred. When a design does have overlapping cases, the functionality of the pre- and post-synthesis designs will differ.
The pre-synthesis simulations of modules code5a and code5b below, as well as the post-synthesis structure of module code5a, have priority-encoder functionality. However, the post-synthesis structure of module code5b is two AND gates. The //synopsys parallel_case directive causes priority-encoder case statements to be implemented as parallel logic, leading to pre- and post-synthesis simulation mismatches.
Listing 1.8: Parallel case
// no parallel_case
// Priority encoder - 2-input NAND gate driving an
// inverter (z-output) and also driving a
// 3-input AND gate (y-output)
module code5a (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;

  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d})
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
//**********************************
// parallel_case
// two parallel 2-input AND gates
module code5b (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;

  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d}) // synopsys parallel_case
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
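Both mismatches above come from directives that tell synthesis something the simulator never sees. Here is a minimal sketch of the directive-free alternative, assuming a Verilog-2001 tool (for always @* and ANSI port lists). The module names decoder2to4 and code5_explicit are hypothetical. decoder2to4 behaves like code4a, using a default assignment in place of full_case. code5_explicit spells out both the priority behavior that code5a describes and the parallel behavior that parallel_case builds for code5b, so whichever one is intended is what both tools see.

```verilog
// Hypothetical directive-free versions of the case-statement examples above.

// Same function as code4a. The default assignment covers every unlisted case,
// so no full_case directive is needed; always @* keeps the sensitivity list complete.
module decoder2to4 (
  output reg [3:0] y,
  input      [1:0] a,
  input            en
);
  always @* begin
    y = 4'h0;              // default: all outputs low (covers en == 0)
    if (en) y[a] = 1'b1;   // exactly one output high when enabled
  end
endmodule

// The two possible intents behind code5a/code5b, written as explicit logic.
module code5_explicit (
  output y_pri, z_pri,   // priority behavior (what code5a describes)
  output y_par, z_par,   // parallel behavior (what parallel_case builds for code5b)
  input  a, b, c, d
);
  // Priority: when a & b is true, the first casez item wins and blocks y.
  assign z_pri = a & b;
  assign y_pri = c & d & ~(a & b);

  // Parallel: each item is decoded independently, as two AND gates.
  assign z_par = a & b;
  assign y_par = c & d;
endmodule
```

The two versions of code5 differ only when a, b, c and d are all 1, which is exactly the overlap that parallel_case hides.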
casex
The use of casex statements can cause design problems. A casex treats Xs as "don't cares" if they appear in either the case expression or the case items. The problem with casex occurs when an input tested by a casex expression is initialized to an unknown state: the pre-synthesis simulation treats the unknown input as a "don't care" when evaluating the casex statement, while the equivalent post-synthesis simulation propagates Xs through the gate-level model if that condition is tested.
NOTE
• casez is the same as casex, except that only 'Z' bits (written z or ?) are treated as don't cares.
• Be aware that synthesis directives are not recognized by simulators. When using any synthesis directive, make sure it does not lead to a pre- and post-synthesis mismatch.
1.7 FSM
Reference :
• http://www.sunburst-design.com/papers/CummingsICU2002_FSMFundamentals.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2000Boston_FSM.pdf
A common classification used to describe the type of an FSM is Mealy or Moore state machine [2] [3]. A Moore FSM is a state machine whose outputs are a function of the present state only. A Mealy FSM is a state machine where one or more of the outputs is a function of the present state and one or more of the inputs.
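To make the Moore/Mealy distinction concrete, here is a sketch of a rising-edge detector coded both ways. The module, state and signal names are hypothetical. pulse_moore is decoded from the state register alone, so it goes high for one full clock cycle after the edge has been registered. pulse_mealy also depends on the input din, so it responds as soon as din rises, without waiting for a clock edge.

```verilog
// Hypothetical rising-edge detector, coded as a Moore and as a Mealy machine.
module edge_detect (
  input  wire clk,
  input  wire rst_n,        // active-low asynchronous reset
  input  wire din,
  output wire pulse_moore,  // function of the present state only
  output wire pulse_mealy   // function of the present state and the input din
);
  // Moore machine: three states; the output is decoded from the state.
  localparam [1:0] LOW = 2'd0, ROSE = 2'd1, HIGH = 2'd2;
  reg [1:0] state, next;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= LOW;
    else        state <= next;

  always @(state or din) begin
    next = state;
    case (state)
      LOW:     if (din)  next = ROSE;
      ROSE:    next = din ? HIGH : LOW;
      HIGH:    if (!din) next = LOW;
      default: next = LOW;
    endcase
  end

  assign pulse_moore = (state == ROSE);

  // Mealy machine: one state bit (the previous din); the output also uses din.
  reg prev;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) prev <= 1'b0;
    else        prev <= din;

  assign pulse_mealy = din & ~prev;
endmodule
```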
Figure 1.2: Finite State Machine (FSM) block diagram.
1.7.1 Binary Encoded or Onehot Encoded?
Common classifications used to describe the state encoding of an FSM are binary (or highly encoded) and onehot.
A binary-encoded FSM design requires only as many flip-flops as are needed to uniquely encode the number of states in the state machine. The actual number of flip-flops required is the ceiling of the log-base-2 of the number of states in the FSM.
A onehot FSM design requires a flip-flop for each state in the design, and only one flip-flop (the flip-flop representing the current or "hot" state) is set at a time.
For a state machine with 9-16 states, a binary FSM requires only 4 flip-flops, while a onehot FSM requires a flip-flop for each state in the design (9-16 flip-flops).
FPGA vendors frequently recommend a onehot state encoding style because flip-flops are plentiful in an FPGA and the combinational logic required to implement a onehot FSM design is typically smaller than for most binary encoding styles. Since FPGA performance is typically related to the combinational logic size of the design, onehot FSMs typically run faster than binary-encoded FSMs with larger combinational logic blocks [4].
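The flip-flop counts above can be computed at elaboration time. This is a minimal sketch, assuming a Verilog-2005 tool for the $clog2 system function and at least two states. The module fsm_width and its ports are hypothetical and exist only so the widths can be observed.

```verilog
// Hypothetical state-register sizing for an FSM with NUM_STATES states (>= 2).
module fsm_width #(parameter NUM_STATES = 9) (
  output [31:0] binary_ffs,   // flip-flops needed for binary (highly encoded) state
  output [31:0] onehot_ffs    // flip-flops needed for onehot state
);
  localparam BINARY_W = $clog2(NUM_STATES);  // ceiling of log2(NUM_STATES)
  localparam ONEHOT_W = NUM_STATES;          // one flip-flop per state

  // State registers sized from the parameters (declared only to show the sizing).
  reg [BINARY_W-1:0] state_bin;
  reg [ONEHOT_W-1:0] state_1hot;

  assign binary_ffs = BINARY_W;
  assign onehot_ffs = ONEHOT_W;
endmodule
```

For 9 to 16 states this gives 4 binary flip-flops against 9 to 16 onehot flip-flops, matching the figures in the text.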
Figure 1.3: FSM encoding.
Note: When the onehot style is used to code an FSM without the //synopsys parallel_case directive, synthesis tools infer a priority encoder. This happens because, as far as the tool can tell, two bits of the state variable could be set at the same time, in which case the first matching state is given higher priority.
Listing 1.9: Onehot
// This logic infers a priority encoder
module fsm_onehot1 (
  output reg        y, z,
  input  wire [3:0] state   // onehot: one bit per state
);
  parameter [3:0] IDLE = 0, BBUSY = 1, BWAIT = 2, BFREE = 3;  // state bit indices

  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1)
      state[IDLE]:  z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
//**********************************************
// This logic infers a parallel case (no priority encoder)
module fsm_cc4_fp (output reg y, z,
                   input wire [3:0] state);
  parameter [3:0] IDLE = 0, BBUSY = 1, BWAIT = 2, BFREE = 3;

  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1) // synopsys parallel_case
      state[IDLE] : z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
1.7.2 One Always Block FSM Style (Not Recommended)
One of the most common FSM coding styles in use today is the one sequential always block FSM coding style. For most FSM designs, the one always block FSM coding style is more verbose, more confusing and more error prone than a comparable two always block coding style.
1.7.3 Two Always Block FSM Style
One of the best Verilog coding styles is to code the FSM design using two always blocks: one for the sequential state register and one for the combinational next-state and combinational output logic.
Listing 1.10: fsm design - two always block style
module fsm_cc4_2 (output reg gnt,
                  input dly, done, req, clk, rst_n);
  parameter [1:0] IDLE  = 2'b00,
                  BBUSY = 2'b01,
                  BWAIT = 2'b10,
                  BFREE = 2'b11;
  reg [1:0] state, next;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= IDLE;
    else        state <= next;
  always @(state or dly or done or req) begin
    next = 2'bx;
    gnt  = 1'b0;
    case (state)
      IDLE : if (req) next = BBUSY;
             else     next = IDLE;
      BBUSY: begin
               gnt = 1'b1;
               if (!done)    next = BBUSY;
               else if (dly) next = BWAIT;
               else          next = BFREE;
             end
      BWAIT: begin
               gnt = 1'b1;
               if (!dly) next = BFREE;
               else      next = BWAIT;
             end
      BFREE: if (req) next = BBUSY;
             else     next = IDLE;
    endcase
  end
endmodule
FSM Coding Notes
• Parameters (constants that are local to a module) are used to define the state encodings instead of the Verilog `define macro definition construct. After the parameter definitions are created, the parameter names, not the state encodings, are used throughout the rest of the design.
• The sequential always block is coded using nonblocking assignments.
• The combinational always block sensitivity list is sensitive to changes on the state variable and all of the inputs referenced in the combinational always block.
• Assignments within the combinational always block are made using Verilog blocking assignments.
• Default output and next-state assignments are made before coding the case statement, as shown in Listing 1.10. This eliminates latches, reduces the amount of code required to code the rest of the outputs in the case statement, and highlights exactly in which states the individual output(s) change.
1.7.4 Onehot FSM Coding Style
Efficient (small and fast) onehot state machines can be coded using an inverse case statement: a case statement in which each case item is an expression that evaluates to true or false.
Listing 1.11: fsm design - onehot style
module fsm_cc4_fp (output reg gnt,
                   input dly, done, req, clk, rst_n);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;

  always @(state or dly or done or req) begin
    next = 4'b0;
    gnt  = 1'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: begin
                      gnt = 1'b1;
                      if (!done)    next[BBUSY] = 1'b1;
                      else if (dly) next[BWAIT] = 1'b1;
                      else          next[BFREE] = 1'b1;
                    end
      state[BWAIT]: begin
                      gnt = 1'b1;
                      if (!dly) next[BFREE] = 1'b1;
                      else      next[BWAIT] = 1'b1;
                    end
      state[BFREE]: begin
                      if (req) next[BBUSY] = 1'b1;
                      else     next[IDLE]  = 1'b1;
                    end
    endcase
  end
endmodule
1.7.5 Registered FSM Outputs
Registering the outputs of an FSM design ensures that the outputs are glitch-free and frequently improves synthesis results by standardizing the output and input delay constraints of synthesized modules [5].
FSM outputs are easily registered by adding a third, sequential always block to an FSM module, in which the output assignments are generated in a case statement whose case items correspond to the next state that will be active when the output is clocked.
Listing 1.12: fsm design - three always blocks w/registered outputs
module fsm_cc4_fp (output reg gnt,
                   input dly, done, req, clk, rst_n);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;

  // Always block 1: state register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;

  // Always block 2: combinational next-state logic only (no outputs here)
  always @(state or dly or done or req) begin
    next = 4'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: if (!done)    next[BBUSY] = 1'b1;
                    else if (dly) next[BWAIT] = 1'b1;
                    else          next[BFREE] = 1'b1;
      state[BWAIT]: if (!dly) next[BFREE] = 1'b1;
                    else      next[BWAIT] = 1'b1;
      state[BFREE]: if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
    endcase
  end

  // Always block 3: registered outputs, decoded from the next state
  always @(posedge clk or negedge rst_n)
    if (!rst_n) gnt <= 1'b0;
    else begin
      gnt <= 1'b0;                  // default output value
      case (1'b1) // ambit synthesis case = full, parallel
        next[BBUSY],
        next[BWAIT]: gnt <= 1'b1;
      endcase
    end
endmodule
One, Two or Three always blocks for an FSM?
• Use a two always block coding style to code FSM designs with combinational outputs. This style is efficient and easy to code and can also easily handle Mealy FSM designs.
• Use a three always block coding style to code FSM designs with registered outputs. This style is efficient and easy to code.
1.8 Clock Domain Crossing
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2001SJ_AsyncClk.pdf
1.8.1 Metastability
Metastability refers to signals that do not have stable 0 or 1 states for some duration of time at some point during normal operation of a design. In a multi-clock design, metastability cannot be avoided, but the detrimental effects of metastability can be neutralized.
Figure 1.4 shows a synchronization failure that occurs when a signal generated in one clock domain is sampled too close to the rising edge of a clock signal from a second clock domain. Synchronization failure is caused by an output going metastable and not converging to a legal stable state by the time the output must be sampled again.
Figure 1.4: Asynchronous clocks and synchronization failure
1.8.2 Why is metastability a problem?
A metastable output that traverses additional logic in the receiving clock domain can cause illegal signal values to be propagated throughout the rest of the design. Since the CDC signal can fluctuate for some period of time, the input logic in the receiving clock domain might recognize the logic level of the fluctuating signal as different values and hence propagate erroneous signals into the receiving clock domain.
1.8.3 Synchronizers
There are two scenarios that are possible when passing signals across CDC boundaries, and it is important to determine which scenario applies to your design:
• It is permitted to miss samples that are passed between clock domains.
• Every signal passed between clock domains must be sampled.
1.8.4 Two flip-flop synchronizer
"A synchronizer is a device that samples an asynchronous signal and outputs a version of the signal that has transitions synchronized to a local or sample clock."
The simplest and most common synchronizer used by digital designers is a two-flip-flop synchronizer, as shown in Figure 1.5.
Figure 1.5: Two flip-flop synchronizer
The first flip-flop samples the asynchronous input signal into the new clock domain and waits for a full clock cycle to permit any metastability on the stage-1 output signal to decay. The stage-1 signal is then sampled by the same clock into a second-stage flip-flop, with the intended goal that the stage-2 signal is now a stable and valid signal, synchronized and ready for distribution within the new clock domain.
It is theoretically possible for the stage-1 signal to still be sufficiently metastable by the time it is clocked into the second stage to cause the stage-2 output signal to also go metastable. The mean time between synchronization failures (MTBF) is a function of multiple variables, including the clock frequencies used to generate the input signal and to clock the synchronizing flip-flops. For most synchronization applications, the two flip-flop synchronizer is sufficient to remove all likely metastability.
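A two flip-flop synchronizer can be written as below. This is a minimal sketch; the port names (bclk, brst_n, adat, bq2) are assumptions chosen to match the adat/bclk naming used in the figures of this section, and the asynchronous receiving-domain reset is also an assumption. Only the stage-2 output bq2 is visible to the receiving domain; stage 1 (bq1) stays internal because it may be metastable.

```verilog
// Two flip-flop synchronizer for a single-bit signal crossing into bclk.
module sync_2ff (
  input  wire bclk,     // receiving (sampling) clock
  input  wire brst_n,   // receiving-domain reset, active low
  input  wire adat,     // signal from the sending clock domain
  output reg  bq2       // synchronized version of adat
);
  reg bq1;              // stage 1: may go metastable, never used directly

  always @(posedge bclk or negedge brst_n)
    if (!brst_n) {bq2, bq1} <= 2'b00;
    else         {bq2, bq1} <= {bq1, adat};   // shift adat through two stages
endmodule
```

A change on adat therefore reaches bq2 after two bclk edges; the three flip-flop variant used in very high-speed designs simply extends this shift chain by one more stage.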
1.8.5 MTBF - mean time between failures
When calculating MTBF numbers, larger numbers are preferred over smaller numbers. Larger MTBF numbers indicate longer periods of time between potential failures, while smaller MTBF numbers indicate that metastability could happen frequently, similarly causing failures within the design.
Figure 1.6: MTBF
Two of the most important factors that directly impact the MTBF of a synchronizer circuit are the sample clock frequency (how fast signals are being sampled into the receiving clock domain) and the data change frequency (how fast the data crossing the CDC boundary is changing). From the equation in Figure 1.6, it can be seen that failures occur more frequently (shorter MTBF) in higher speed designs, or when the sampled data changes more frequently.
1.8.6 Three flip-flop synchronizer
For some very high speed designs, the MTBF of a two-flop synchronizer is too short and a third flop is added to increase the MTBF.
Figure 1.7: Three flip-flop synchronizer used in higher speed designs
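The equation of Figure 1.6 is not reproduced in this text. A commonly used form of the synchronizer MTBF relationship is shown below; the symbol names are assumptions and may differ from those in the figure. Here t_r is the resolution time available before the next stage samples, tau and T_0 are metastability constants of the flip-flop, f_clk is the sample clock frequency and f_data is the data change frequency.

$$
\mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_0 \, f_{clk} \, f_{data}}
$$

Because f_clk and f_data sit in the denominator, MTBF falls as either frequency rises, which is the behavior described in Section 1.8.5. A third flip-flop (Section 1.8.6) gives a metastable signal roughly one more sample clock period to resolve, so, under this model, it multiplies the MTBF by approximately

$$
\frac{\mathrm{MTBF}_{3\text{-}FF}}{\mathrm{MTBF}_{2\text{-}FF}} \approx e^{\Delta t_r/\tau}
$$

where Delta t_r is the extra resolution time contributed by the added stage (a little less than one period of the sampling clock).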
1.8.7 Registering signals from the sending clock domain to avoid glitches
Consider an example where the signals in the sending clock domain are not registered before being passed into the receiving clock domain, as shown in Figure 1.8.
Figure 1.8: Unregistered signals sent across a CDC boundary
In this example, the combinational output from the sending clock domain could experience combinational glitches at the CDC boundary. These combinational glitches effectively increase the data-change frequency, potentially creating small bursts of oscillating data and thereby increasing the potential for sampling changing data and generating metastable signals at the CDC boundary.
Signals in the sending clock domain should be registered (synchronized to the sending clock) before being passed to a CDC boundary. Registering the signals in the sending clock domain reduces the number of edges that can be sampled in the receiving clock domain, effectively reducing the data-change frequency in the MTBF equation and hence increasing the time between calculated failures.
Figure 1.9: Registered signals sent across a CDC boundary
In Figure 1.9, the adat flip-flop filters out the combinational glitches on the flip-flop input (a) and passes a clean signal to the bclk logic.
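The arrangement of Figure 1.9 can be sketched in Verilog as follows. This is a minimal sketch under assumptions: the combinational function (an AND of two hypothetical inputs a_in1 and a_in2), the sending clock name aclk, the resets and the names bq1_dat/bq2_dat are illustrative, and only adat and bclk follow the figure. The combinational result is registered in adat by aclk, so its glitches never reach the CDC boundary, and adat is then passed through a two flip-flop synchronizer clocked by bclk.

```verilog
module cdc_registered_src (
  input  wire aclk,         // sending clock (assumed name)
  input  wire arst_n,
  input  wire a_in1,        // hypothetical sending-domain inputs
  input  wire a_in2,
  input  wire bclk,         // receiving clock
  input  wire brst_n,
  output reg  bq2_dat       // synchronized copy of adat in the bclk domain
);
  wire a_comb = a_in1 & a_in2;  // combinational logic: may glitch
  reg  adat;                    // launch flop: glitch-free copy of a_comb
  reg  bq1_dat;                 // synchronizer stage 1

  // Sending domain: register the combinational result before it crosses.
  always @(posedge aclk or negedge arst_n)
    if (!arst_n) adat <= 1'b0;
    else         adat <= a_comb;

  // Receiving domain: two flip-flop synchronizer.
  always @(posedge bclk or negedge brst_n)
    if (!brst_n) {bq2_dat, bq1_dat} <= 2'b00;
    else         {bq2_dat, bq1_dat} <= {bq1_dat, adat};
endmodule
```

The only signal that crosses the boundary is the flop output adat, which changes at most once per aclk cycle; this is what lowers the effective data-change frequency in the MTBF relationship.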
1.8.8 Synchronizing fast signals into slow clock domains
One issue associated with synchronizers is the possibility that a signal from a sending clock domain might change values twice before it can be sampled, or might be too close to the sampling edges of a slower clock domain. This possibility must be considered any time signals are sent from one clock domain to another, and a determination must be made whether missed signals are or are not a problem for the design in question.
When missed samples are not allowed, there are two general approaches to the problem:
• An open-loop solution that ensures signals are captured without acknowledgment.
• A closed-loop solution that requires acknowledgment of receipt of the signal that crosses a CDC boundary.
1.8.9 Requirement for reliable signal passing between clock domains
When passing one CDC signal between clock domains through a two-flip-flop synchronizer, the CDC signal must be wider than 1.5 times the period of the receiving clock. This is often stated as: "input data values must be stable for three destination clock edges". The "three edge" requirement actually applies to both open-loop and closed-loop solutions, but implementations of the closed-loop solution automatically ensure that at least three edges are detected for all CDC signals.
Problem - passing a fast CDC pulse
Consider the severely flawed condition where the sending clock domain has a higher frequency than the receiving clock domain and a CDC pulse is only one cycle wide in the sending clock domain. If the CDC signal is only pulsed for one fast-clock cycle, the CDC signal could go high and low between the rising edges of the slower clock and not be captured into the slower clock domain, as shown in Figure 1.10.
Figure 1.10: Short CDC signal pulse missed during synchronization
Problem - sampling a long CDC pulse - but not long enough!
Consider the somewhat non-intuitive and flawed condition where the sending clock domain sends a pulse to the receiving clock domain that is slightly wider than the period of the receiving clock. Under most conditions, the signal will be sampled and passed, but there is a small but real chance that the CDC pulse will change too close to two rising clock edges of the receiving clock domain, thereby violating the setup time on the first clock edge and the hold time on the second clock edge, and not forming the anticipated pulse. This possible failure is shown in Figure 1.11.
Figure 1.11: Marginal CDC pulse that violates the destination setup and hold times
1.8.10 Open-loop solution - sampling signals with synchronizers
One potential solution to this problem is to assert CDC signals for a period of time that exceeds the cycle time of the sampling clock, as shown in Figure 1.12. As discussed in Section 1.8.9, the minimum pulse width is 1.5 times the period of the receiving clock. The assumption is that the CDC signal will be sampled at least once and possibly twice by the receiving clock.
Open-loop sampling can be used when the relative clock frequencies are fixed and properly analyzed.
Advantage: the open-loop solution is the fastest way to pass signals across CDC boundaries, because it does not wait for an acknowledgment of the received signal.
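A minimal sketch of the open-loop approach for the fast-to-slow case above, assuming the two clock frequencies are fixed and known: a one-cycle pulse in the sending domain (aclk) is stretched into a level that stays high for STRETCH sending-clock cycles. The module name, the parameters STRETCH and CW and the signal names are hypothetical. STRETCH has to be chosen so that STRETCH * T(aclk) is at least 1.5 * T(bclk), per Section 1.8.9; the registered output a_long is then passed through a two flip-flop synchronizer in the receiving domain (not repeated here).

```verilog
// Open-loop pulse stretcher in the (fast) sending clock domain.
module pulse_stretch #(
  parameter STRETCH = 4,         // cycles high; STRETCH*T(aclk) >= 1.5*T(bclk)
  parameter CW      = 3          // counter width, must hold STRETCH-1
) (
  input  wire aclk,
  input  wire arst_n,
  input  wire a_pulse,           // one aclk cycle wide
  output reg  a_long             // registered, glitch-free stretched pulse
);
  reg [CW-1:0] cnt;              // remaining cycles to hold a_long high

  always @(posedge aclk or negedge arst_n)
    if (!arst_n) begin
      cnt    <= {CW{1'b0}};
      a_long <= 1'b0;
    end
    else if (a_pulse) begin
      cnt    <= STRETCH - 1;     // (re)load: a_long stays high STRETCH cycles
      a_long <= 1'b1;
    end
    else if (cnt != 0) begin
      cnt    <= cnt - 1'b1;      // keep a_long high while counting down
    end
    else begin
      a_long <= 1'b0;            // count expired: release the level
    end
endmodule
```

Because a new a_pulse reloads the counter, two pulses closer together than STRETCH cycles merge into one long level; the sender is assumed not to do that.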
Figure 1.12: Lengthened pulse to guarantee that the control signal will be sampled
1.8.11 Closed-loop solution - sampling signals with synchronizers
A second potential solution to this problem is to send an enabling control signal, synchronize it into the new clock domain, and then pass the synchronized signal back through another synchronizer to the sending clock domain as an acknowledge signal.
Advantage: synchronizing a feedback signal is a very safe technique to acknowledge that the first control signal was recognized and sampled into the new clock domain.
Disadvantage: there is potentially considerable delay associated with synchronizing control signals in both directions before allowing the control signal to change.
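A minimal sketch of the closed-loop scheme, assuming a four-phase (return-to-zero) request/acknowledge protocol, which is one common way to build it. The module cdc_handshake and all its signal names are hypothetical. The sender raises a_req and holds it until the acknowledge, synchronized back into aclk as a_ack2, returns; it then drops a_req and stays busy until the acknowledge has also returned low. In the receiving domain the synchronized request b_req2 serves as the acknowledge, and a third flop detects its rising edge to produce a one-cycle b_pulse.

```verilog
// Closed-loop transfer of single events from aclk to bclk (four-phase).
module cdc_handshake (
  input  wire aclk,
  input  wire arst_n,
  input  wire a_send,      // request an event; ignored while a_busy is high
  output wire a_busy,      // high until the handshake has fully completed
  input  wire bclk,
  input  wire brst_n,
  output wire b_pulse      // one bclk-cycle pulse per accepted event
);
  reg a_req;               // request level, held until acknowledged
  reg a_ack1, a_ack2;      // acknowledge synchronizer (bclk -> aclk)
  reg b_req1, b_req2;      // request synchronizer (aclk -> bclk)
  reg b_req3;              // delayed copy of b_req2 for edge detection

  // ---------------- sending (aclk) domain ----------------
  assign a_busy = a_req | a_ack2;

  always @(posedge aclk or negedge arst_n)
    if (!arst_n)                a_req <= 1'b0;
    else if (!a_busy && a_send) a_req <= 1'b1;  // raise request
    else if (a_req && a_ack2)   a_req <= 1'b0;  // acknowledged: drop request

  always @(posedge aclk or negedge arst_n)
    if (!arst_n) {a_ack2, a_ack1} <= 2'b00;
    else         {a_ack2, a_ack1} <= {a_ack1, b_req2};  // b_req2 is the ack

  // ---------------- receiving (bclk) domain ----------------
  always @(posedge bclk or negedge brst_n)
    if (!brst_n) {b_req3, b_req2, b_req1} <= 3'b000;
    else         {b_req3, b_req2, b_req1} <= {b_req2, b_req1, a_req};

  assign b_pulse = b_req2 & ~b_req3;  // rising edge of synchronized request
endmodule
```

Each event costs two synchronizer round trips before a_busy falls, which is the delay penalty described above; in exchange, no assumption about the clock frequency ratio is needed.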
Figure 1.13: Signal with feedback to acknowledge receipt
1.8.12 Passing multiple signals between clock domains
When passing multiple signals between clock domains, simple synchronizers do not guarantee safe delivery of the data. A frequent mistake made by engineers working on multi-clock designs is passing multiple CDC bits required in the same transaction from one clock domain to another while overlooking the importance of the synchronized sampling of those bits.
The problem is that multiple signals that are synchronized to one clock will experience small data-changing skews that can cause them to be sampled on different rising clock edges in a second clock domain.
Multi-bit CDC strategies
To avoid multi-bit CDC skewed sampling scenarios, the following multi-bit CDC strategies can be applied:
• Multi-bit signal consolidation. Where possible, consolidate multiple CDC bits into 1-bit CDC signals.
• Multi-cycle path formulations. Use a synchronized load signal to safely pass multiple CDC bits.
• Pass multiple CDC bits using gray codes.
1.8.13 Multi-bit signal consolidation
Where possible, consolidate multiple CDC signals into a 1-bit CDC signal. Check whether you really need multiple bits to control logic across a CDC boundary. Simply using synchronizers on all of the CDC bits is not always good enough.
Problem - Two simultaneously required control signals
In the simple example shown in Figure 1.14, a register in the receiving clock domain requires both a load signal and an enable signal in order to load a data value into the register. If both the load and enable signals are driven on the same sending clock edge, there is a chance that a small skew between the control signals could cause the two signals to be synchronized into different clock cycles within the receiving clock domain. Under these conditions, the data would not be loaded into the register.
Figure 1.14: Problem - Passing multiple control signals between clock domains
Solution - Consolidation
The solution to the problem in Section 1.8.13 is simple: consolidate the control signals. As shown in Figure 1.15, drive both the load and enable register input signals in the receiving clock domain from just one load-enable signal. Consolidation removes the possibility of two control signals arriving shifted in time.
Figure 1.15: Consolidating control signals before passing between clock domains
1.8.14 Problem - Multiple Data bits
The diagram in Figure 1.16 shows two encoded control signals being passed between clock domains. If the two encoded signals are slightly skewed when sampled, an erroneous decoded output could be generated for one clock period in the receiving clock domain.
Figure 1.16: Encoded control signals passed between clock domains
1.8.15 Solutions for passing multiple Data bits
Multi-Cycle Path (MCP) formulations and FIFO techniques can be used to address problems related to passing multiple data bits between different clock domains.
Multi-Cycle Path (MCP) formulation
Using an MCP formulation is a common technique for safely passing multiple CDC data bits. An MCP formulation refers to sending unsynchronized data to a receiving clock domain paired with a synchronized control signal. The data and control signals are sent simultaneously, allowing the data to set up on the inputs of the destination register while the control signal is synchronized for two receiving clock cycles before it arrives at the load input of the destination register.
Advantages:
• The sending clock domain is not required to calculate the appropriate pulse width to send between clock domains.
• The sending clock domain is only required to toggle an enable into the receiving clock domain to indicate that data has been passed and is ready to be loaded. The enable signal is not required to return to its initial logic level.
This strategy passes multiple CDC data bits without synchronization and simultaneously passes a synchronized enable signal to the receiving clock domain. The receiving clock domain is not allowed to sample the multi-bit CDC signals until the enable has passed through synchronization and arrived at the receiving register.
This strategy is called a Multi-Cycle Path formulation because the unsynchronized data word is passed directly to the receiving clock domain and held for multiple receiving clock cycles, allowing an enable signal to be synchronized and recognized in the receiving clock domain before the unsynchronized data word is permitted to change.
Figure 1.17: Logic to pass a synchronized enable pulse between clock domains
Synchronized pulse generation logic
The most common method to pass a synchronized enable signal between clock domains is to employ a toggling enable signal that is passed to a synchronized pulse generator to indicate that the unsynchronized multi-cycle data word can be captured on the next receiving clock edge, as shown in Figure 1.18.
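The MCP formulation with a toggling enable and a synchronized pulse generator can be sketched as follows. This is a minimal sketch; the module mcp_toggle and all signal names are hypothetical, and it assumes the sender does not issue a new a_send until the receiver has loaded the previous word (in a complete design an acknowledge, as in the closed-loop scheme of Section 1.8.11, would enforce this). The data word a_data and the toggle a_en change on the same aclk edge; a_en passes through a two flip-flop synchronizer, a third flop and an XOR turn each toggle into a one-cycle b_load pulse, and b_load captures the unsynchronized word.

```verilog
// Multi-cycle path (MCP) transfer of a DW-bit word from aclk to bclk.
// Only the toggle enable a_en is synchronized; a_data crosses unsynchronized.
module mcp_toggle #(
  parameter DW = 8
) (
  input  wire          aclk,
  input  wire          arst_n,
  input  wire          a_send,    // one-cycle request to pass a_din
  input  wire [DW-1:0] a_din,
  input  wire          bclk,
  input  wire          brst_n,
  output reg  [DW-1:0] b_data,    // word captured in the bclk domain
  output wire          b_load     // one-cycle synchronized enable pulse
);
  // ---------------- sending (aclk) domain ----------------
  reg [DW-1:0] a_data;            // held stable while it crosses
  reg          a_en;              // toggles once per transfer

  always @(posedge aclk or negedge arst_n)
    if (!arst_n) begin
      a_data <= {DW{1'b0}};
      a_en   <= 1'b0;
    end
    else if (a_send) begin        // data and enable change on the same edge
      a_data <= a_din;
      a_en   <= ~a_en;
    end

  // ---------------- receiving (bclk) domain ----------------
  reg b_en1, b_en2, b_en3;        // 2-FF synchronizer plus one delay stage

  always @(posedge bclk or negedge brst_n)
    if (!brst_n) {b_en3, b_en2, b_en1} <= 3'b000;
    else         {b_en3, b_en2, b_en1} <= {b_en2, b_en1, a_en};

  assign b_load = b_en2 ^ b_en3;  // one pulse per toggle of a_en

  always @(posedge bclk or negedge brst_n)
    if (!brst_n)     b_data <= {DW{1'b0}};
    else if (b_load) b_data <= a_data;  // a_data stable for >= 2 bclk periods
endmodule
```

Because the enable toggles rather than pulsing, the sender never has to size a pulse for the receiving clock, which is the first advantage listed for the MCP formulation.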
Figure 1.18: Synchronized pulse generation logic
Figure 1.19: Synchronized enable pulse generation logic and equivalent symbol
1.8.16 Synchronizing counters
When passing multiple signals between clock domains, an important question to ask is: do I need to sample every value of a signal that is passed from one clock domain to another? With counters, the answer is frequently no!
Reference [7] details FIFO design techniques where gray code counters are sampled between clock domains and intermediate gray count values are often missed. For this FIFO design, the greater consideration is to make sure that the counters cannot overrun their boundaries, which could cause missed full and empty flag detection. Even though sampled gray count values are often missed between clock domains, the design is robust and all important gray count values are appropriately sampled. See [7] for details.
Since a valid design might be allowed to skip some count value samples, can any counter be used to pass count values across a CDC boundary? The answer is no.
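The one-bit-per-step property of gray codes, and the multi-bit steps of binary counters described next, can be checked with a small conversion sketch. The modules bin2gray and gray2bin are hypothetical helpers using the standard reflected-binary conversion: each gray bit is the XOR of the corresponding binary bit and the next more-significant bit, and converting back takes the XOR of all gray bits from the MSB down to the bit in question.

```verilog
// Binary -> gray: gray[i] = bin[i] ^ bin[i+1] (MSB passes through).
module bin2gray #(parameter W = 4) (
  input  wire [W-1:0] bin,
  output wire [W-1:0] gray
);
  assign gray = bin ^ (bin >> 1);
endmodule

// Gray -> binary: bin[i] = XOR of gray[W-1:i].
module gray2bin #(parameter W = 4) (
  input  wire [W-1:0] gray,
  output wire [W-1:0] bin
);
  genvar i;
  generate
    for (i = 0; i < W; i = i + 1) begin : g_bit
      assign bin[i] = ^(gray >> i);   // reduction XOR of the upper gray bits
    end
  endgenerate
endmodule
```

The output of bin2gray is combinational; following Section 1.8.7, a gray count should be registered in the sending clock domain before it crosses, so that only its single-bit transitions reach the synchronizer.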
Binary counters
One characteristic of binary counters is that half of all sequential binary incrementing operations require two or more counter bits to change. Trying to synchronize a binary counter across a CDC boundary is the same as trying to synchronize multiple CDC signals into a new clock domain. If a simple 4-bit binary counter changes from address 7 (binary 0111) to address 8 (binary 1000), all four counter bits change at the same time. If a synchronizing clock edge arrives in the middle of this transition, it is possible that any 4-bit binary pattern could be sampled and synchronized into the new clock domain. In a FIFO design, the new synchronized binary value might trigger a false full or empty flag, or, even worse, it might fail to trigger a real full or empty flag, causing data to be lost due to FIFO overflow or invalid data to be read from the FIFO due to an attempt to read data when the FIFO is really empty.
Gray codes
Gray codes are the safest counters that can be used in multi-clock designs. Gray codes allow only one bit to change for each clock transition, eliminating the problem associated with trying to synchronize multiple changing CDC bits across a clock domain.
1.9 Synchronous Resets? Asynchronous Resets?
Reference : http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf
1.9.1 General flip-flop coding style notes
Synchronous reset flip-flops with non-reset follower flip-flops
Each Verilog procedural block or VHDL process should model only one type of flip-flop. In other words, a designer should not mix resettable flip-flops with follower flip-flops (flops with no resets). Follower flip-flops are flip-flops that are simple data shift registers.
In the Verilog code of Listing 1.13, a flip-flop is used to capture data and then its output is passed through a follower flip-flop. The first stage of this design is reset with a synchronous reset.
The second stage is a follower flip-flop and is not reset, but because the two flip-flops were inferred in the same procedural block/process, the reset signal rst_n will be used as a data enable for the second flop. This coding style generates extraneous logic, as shown in Figure 1.20.
Listing 1.13: Bad Verilog coding style to model dissimilar flip-flops
module badFFstyle (q2, d, clk, rst_n);
  output q2;
  input  d, clk, rst_n;
  reg    q2, q1;

  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else begin
      q1 <= d;
      q2 <= q1;
    end
endmodule
Figure 1.20: Bad coding style yields a design with an unnecessary loadable flip-flop
The correct way to model a follower flip-flop is with two Verilog procedural blocks, as shown in Listing 1.14. This coding style generates the logic shown in Figure 1.21.
Listing 1.14: Good Verilog coding style to model dissimilar flip-flops
module goodFFstyle (q2, d, clk, rst_n);
  output q2;
  input  d, clk, rst_n;
  reg    q2, q1;

  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else        q1 <= d;
  always @(posedge clk)
    q2 <= q1;
endmodule
Figure 1.21: Two different types of flip-flops, one with synchronous reset and one without
It should be noted that the extraneous logic generated by the code in Listing 1.13 is only a result of using a synchronous reset. If an asynchronous reset approach had been used, then both coding styles would synthesize to the same design without any extra combinational logic. The generation of different flip-flop styles is largely a function of the sensitivity lists and if-else statements that are used in the HDL code.
1.9.2 Synchronous Resets
Synchronous resets are based on the premise that the reset signal will only affect or reset the state of the flip-flop on the active edge of a clock. The reset can be applied to the flip-flop as part of the combinational logic generating the d-input to the flip-flop.
Listing 1.15: Correct way to model a flip-flop with synchronous reset using Verilog
module sync_resetFFstyle (q, d, clk, rst_n);
  output q;
  input  d, clk, rst_n;
  reg    q;
  always @(posedge clk)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule
Advantages of synchronous resets
Synchronous resets can synthesize to smaller flip-flops, particularly if the reset is gated with the logic generating the d-input. But in such a case, the combinational logic gate count grows, so the overall gate count savings may not be that significant.
• Synchronous resets generally ensure that the circuit is 100% synchronous.
• Synchronous resets ensure that reset can only occur at an active clock edge. The clock works as a filter for small reset glitches; however, if these glitches occur near the active clock edge, the flip-flop could go metastable.
• In some designs, the reset must be generated by a set of internal conditions. A synchronous reset is recommended for these types of designs because it will filter the logic equation glitches between clocks.
Disadvantages of synchronous resets
• Synchronous resets may need a pulse stretcher to guarantee a reset pulse width wide enough to ensure reset is present during an active edge of the clock.
• When working with a gated clock (for example, an SPI interface), it may not be possible to reset the logic through a synchronous reset, because the clock may not be running while reset is asserted.
1.9.3 Asynchronous resets
Asynchronous resets alone can be very dangerous. The biggest problem with asynchronous resets is the reset release, also called reset removal.
Asynchronous reset flip-flops incorporate a reset pin into the flip-flop design. The reset pin is typically active low (the flip-flop goes into the reset state when the signal attached to the flip-flop reset pin goes to a logic low level).
Listing 1.16: Correct way to model a flip-flop with asynchronous reset using Verilog
module async_resetFFstyle (q, d, clk, rst_n);
  output q;
  input  d, clk, rst_n;
  reg    q;

  // Verilog-2001: permits comma-separation
  // @(posedge clk, negedge rst_n)
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule
Advantages of asynchronous resets
• The biggest advantage to using asynchronous resets is that, as long as the vendor library has asynchronously resettable flip-flops, the data path is guaranteed to be clean. Designs that are pushing the limit for data path timing cannot afford to have added gates and additional net delays in the data path due to logic inserted to handle synchronous resets. Of course, this argument does not hold if the vendor library has flip-flops with synchronous reset inputs and the designer can get Synopsys to actually use those pins.
• Asynchronous resets do not require a free-running clock to reset the logic.
Disadvantages of asynchronous resets
• The biggest problem with asynchronous resets is that they are asynchronous, both at the assertion and at the de-assertion of the reset. The assertion is a non-issue; the de-assertion is the issue. If the asynchronous reset is released at or near the active clock edge of a flip-flop, the output of the flip-flop could go metastable.
• Another problem that an asynchronous reset can have, depending on its source, is spurious resets due to noise or glitches on the board or system reset.
1.9.4 Asynchronous reset problem
As shown in Figure 1.22, an asynchronous reset signal will be de-asserted asynchronously to the clock signal. There are two potential problems with this scenario: (1) violation of the reset recovery time, and (2) reset removal happening in different clock cycles for different sequential elements.
Figure 1.22: Asynchronous reset removal recovery time problem
Reset recovery time
Reset recovery time refers to the time between when reset is de-asserted and the time that the clock signal goes high again. Recovery time is also referred to as a tsu setup time of the form "PRE or CLR inactive setup time before CLK edge". Missing a recovery time can cause signal integrity or metastability problems with the registered data outputs.
Reset removal traversing different clock cycles
When reset removal is asynchronous to the rising clock edge, slight differences in propagation delays in either or both the reset signal and the clock signal can cause some registers or flip-flops to exit the reset state before others.
1.9.5 Reset synchronizer
Guideline: EVERY ASIC/FPGA USING AN ASYNCHRONOUS RESET SHOULD INCLUDE A RESET SYNCHRONIZER CIRCUIT!!
Without a reset synchronizer, the usefulness of the asynchronous reset in the final system is void, even if the reset works during simulation.
The reset synchronizer logic of Figure 1.23 is designed to take advantage of the best of both asynchronous and synchronous reset styles.
Figure 1.23: Reset Synchronizer block diagram
An external reset signal asynchronously resets a pair of master reset flip-flops, which in turn drive the master reset signal asynchronously through the reset buffer tree to the rest of the flip-flops in the design. The entire design will be asynchronously reset.
Reset removal is accomplished by de-asserting the reset signal, which then permits the d-input of the first master reset flip-flop (which is tied high) to be clocked through a reset synchronizer. It typically takes two rising clock edges after reset removal to synchronize removal of the master reset. The first flip-flop is required to synchronize the reset signal to the clock, while the second flip-flop is used to remove any metastability that might be caused by the reset signal being removed asynchronously and too close to the rising clock edge.
A closer examination of the timing now shows that reset distribution timing is the sum of a clk-to-q propagation delay, the total delay through the reset distribution tree, and the reset recovery time of the destination registers and flip-flops, as shown in Figure 1.24.
Figure 1.24: Predictable reset removal to satisfy reset recovery time
Listing 1.17: The code for the reset synchronizer circuit
module async_resetFFstyle2 (rst_n, clk, asyncrst_n);
  output rst_n;
  input  clk, asyncrst_n;
  reg    rst_n, rff1;

  always @(posedge clk or negedge asyncrst_n)
    if (!asyncrst_n) {rst_n, rff1} <= 2'b0;
    else             {rst_n, rff1} <= {rff1, 1'b1};
endmodule
1.9.6 Reset-glitch filtering
One of the biggest issues with asynchronous resets is that they are asynchronous and therefore carry with them some characteristics that must be dealt with depending on the source of the reset. With asynchronous resets, any input wide enough to meet the minimum reset pulse width for a flip-flop will cause the flip-flop to reset. If the reset line is subject to glitching, this can be a real problem.
Presented here is one approach that will work to filter out the glitches, but it is ugly! This solution requires a digital delay (meaning the delay will vary with temperature, voltage and process) to filter out small glitches. The reset input pad should also be a Schmitt-triggered pad to help with glitch filtering. Figure 1.25 shows the implementation of this approach.
Figure 1.25: Reset glitch filtering
Chapter 2
Xilinx RTL guidelines
Reference :
• http://classes.engineering.wustl.edu/cse460t/images/e/eb/Xst_v6s6.pdf
Advantages of VHDL
• Enforces stricter rules: it is strongly typed, less permissive and less error-prone
• Initialization of RAM components in the HDL source code is easier (Verilog initial blocks are less convenient)
• Package support
• Custom types
• Enumerated types
• No reg versus wire confusion
Advantages of Verilog
• Extension to SystemVerilog
• C-like syntax
• Results in more compact code
• Block commenting
• No heavy component instantiation as in VHDL
2.1 Macro Inference Flow Overview
Macros are inferred during three stages of the XST synthesis flow.
• Basic macros are inferred during HDL Synthesis.
• Complex macros are inferred during Advanced HDL Synthesis.
• Other macros are inferred during Low-Level Optimizations, when timing information is available to make more fully-informed decisions.
• Macros inferred during Advanced HDL Synthesis are usually the result of an aggregation of several basic macros previously inferred during HDL Synthesis. In most cases, the XST inference engine can perform this grouping regardless of hierarchical boundaries, unless Keep Hierarchy has been set to yes in order to prevent it.
For example, a block RAM can be inferred by combining RAM core functionality described in one user-defined hierarchical block with a Register described in a different user-defined hierarchical block. This allows you to structure the HDL project in a modular way, while XST still recognizes relationships among design elements described in different VHDL entities and Verilog modules.
2.2 Coding Guidelines for Virtex-6, Spartan-6, and 7 Series Devices
These coding guidelines:
1) Minimize slice logic utilization.
2) Maximize circuit performance.
3) Utilize device resources such as block RAM components and DSP blocks.
• Do not set or reset Registers asynchronously.
  – Control set remapping becomes impossible.
  – Sequential functionality in device resources such as block RAM components and DSP blocks can be set or reset synchronously only. You will be unable to leverage these device resources, or they will be configured sub-optimally.
  – Use synchronous initialization instead.
• Use the Asynchronous to Synchronous option if your own coding guidelines require Registers to be set or reset asynchronously. This allows you to assess the benefits of using synchronous set/reset.
• Do not describe Flip-Flops with both a set and a reset. No Flip-Flop primitives feature both a set and a reset, whether synchronous or asynchronous.
If not rejected by the software, Flip-Flop primitives featuring both a set and a reset may adversely affect area and performance.
• Always describe the clock enable, set, and reset control inputs of Flip-Flop primitives as active-High. If they are described as active-Low, the resulting inverter logic will penalize circuit performance.
• Suggestions for faster and smaller designs:
  – Use synchronous Set/Reset whenever possible.
  – Use active-High CE and Set/Reset (no local inverter for secondary control signals).
  – Build your design with as few control signals (set, reset and clock enable) as possible.
2.2.1 Resource Sharing
XST implements high-level optimizations known as Resource Sharing.
• Resource Sharing minimizes the number of arithmetic operators, resulting in reduced device utilization.
• Resource Sharing is based on the principle that two similar arithmetic operators can be implemented with common resources on the device, provided their respective outputs are never used simultaneously.
• Resource Sharing usually involves creating additional multiplexing logic to select between factorized inputs. Factorization is performed in a way that minimizes this logic.
• Resource Sharing is enabled by default, no matter which overall optimization strategy you have selected.
XST supports Resource Sharing for:
• Adders
• Subtractors
• Adders/Subtractors
• Multipliers
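To make the principle concrete, the two modules below (hypothetical names, 8-bit operands assumed) show by hand the transformation that Resource Sharing performs automatically. The first describes two adders whose results are never used at the same time; the second factorizes the inputs through multiplexers so that a single adder suffices. Both describe the same function, and XST applies this kind of sharing on its own when Resource Sharing is enabled.

```verilog
// Two adders whose results are never needed at the same time.
module add_unshared (
  input  wire       sel,
  input  wire [7:0] a, b, c, d,
  output wire [8:0] y
);
  // The 9-bit context keeps the carry out of each 8-bit addition.
  assign y = sel ? (a + b) : (c + d);
endmodule

// The same function after sharing: the inputs are multiplexed
// ("factorized") in front of a single adder.
module add_shared (
  input  wire       sel,
  input  wire [7:0] a, b, c, d,
  output wire [8:0] y
);
  assign y = (sel ? a : c) + (sel ? b : d);
endmodule
```

The shared form trades one adder for two input multiplexers in front of it, which is why sharing can lengthen the critical path and why disabling it can help when timing is the primary goal.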
Xilinx recommends that you disable Resource Sharing:
• If circuit performance is your primary optimization goal, and
• You are unable to meet timing goals.
2.2.2 Implementing FSM Components on Block RAM Resources
• By default, Finite State Machine (FSM) components are implemented on slice logic.
• To save slice logic resources, instruct XST to implement FSM components in block RAM.
• Implementing FSM components in block RAM can enhance the performance of large FSM components.
• XST cannot implement an FSM in block RAM when the FSM has an asynchronous reset.
2.2.3 Mapping Logic to Block RAM
If you cannot fit the design onto the device, place some of the logic into unused block RAM. XST does not automatically decide which logic can be placed into block RAM; you must instruct XST to do so.
• Isolate the part of the Register Transfer Level (RTL) description to be placed into block RAM in a separate hierarchical block.
• Apply Map Logic on BRAM to the separate hierarchical block, either directly in the HDL source code or in the XST Constraint File (XCF).
Block RAM Criteria
The logic implemented in block RAM must satisfy the following criteria:
• All outputs are registered.
• The block contains only one level of Registers, which are Output Registers.
• All Output Registers have the same control signals.
• The Output Registers have a synchronous reset signal.
• The block does not contain multi-source situations or tristate buffers.
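A small module that follows the control-signal guidelines of Section 2.2 and also meets the block RAM criteria above might look like the following sketch. The names and the combinational function (here simply addr * addr) are hypothetical; what matters is that the only registers are the output registers, which share one clock edge, one active-High clock enable and one active-High synchronous reset. Whether XST actually places this logic in block RAM still depends on applying Map Logic on BRAM to the block, as described above.

```verilog
// Hypothetical block written for block RAM mapping: combinational logic
// followed by a single level of output registers.
module sq_lookup (
  input  wire       clk,
  input  wire       rst,   // active-High, synchronous
  input  wire       ce,    // active-High clock enable
  input  wire [3:0] addr,
  output reg  [7:0] dout
);
  // Combinational part: any function of addr would do; an 8-bit context
  // holds the largest result, 15 * 15 = 225.
  wire [7:0] data = addr * addr;

  // One clock, one edge, no set, no asynchronous reset.
  // The synchronous reset takes priority over the clock enable.
  always @(posedge clk)
    if (rst)
      dout <= 8'h00;
    else if (ce)
      dout <= data;
endmodule
```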
Rules for Clock Signals
• Use one clock signal and one edge.
• Do not generate internal clock signals, because of glitching and clock-skew related problems.
Rules for the Hierarchical Registering of Signals
• Register the outputs of leaf-level blocks (sub-blocks).
• Register the inputs to the chip's top level.
2.2.4 Important Notes
• CASE statements result in LUTs connected in parallel, whereas if-else statements result in LUTs connected in series.
• If nested IF statements are necessary, put critical input signals on the first IF statement. The condition tested first has the highest priority, so its signal ends up in the last logic stage, closest to the output.
Figure 2.1: Nested IF
• CASE statements in a combinatorial process (VHDL) or always statement (Verilog)
  – Latches are inferred if outputs are not defined in all branches.
  – Use default assignments before the CASE statement to prevent latches.
• CASE statements in a sequential process (VHDL) or always statement (Verilog)
  – Clock enables are inferred if outputs are not defined in all branches.
  – This is not wrong, but might generate a long clock enable equation.
  – Use default assignments before the CASE statement to prevent clock enables.
• Consider using one-hot select inputs.
  – Eliminating the select decoding can improve performance (only one bit is active in each state, so each state has its own select line).
• The advantage of using a don't care (x) for the default is that the synthesizer has more flexibility to create a smaller, faster circuit.
Figure 2.2: FSM encoding
• Registering the control signals (pipeline registers) breaks up the long net delay between two registers.
Figure 2.3: Pipeline Registers
• High fanout: solutions
  – The most likely solution is to duplicate the source of the high-fanout net.
Figure 2.4: Registering High Fanout Signals
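The default-assignment advice for CASE statements can be sketched as follows; the module and signal names are hypothetical. In the combinational block every output is assigned before the CASE, so the uncovered value sel == 2'd3 falls back to the defaults instead of inferring a latch. In the sequential block the default assignment means y_reg never has to hold its old value, so no clock enable is inferred.

```verilog
module case_defaults (
  input  wire       clk,
  input  wire [1:0] sel,
  input  wire [7:0] a, b, c,
  output reg  [7:0] y_comb,  // combinational mux output
  output reg        valid,   // combinational flag
  output reg  [7:0] y_reg    // registered mux output
);
  // Combinational: defaults first, so every path assigns every output
  // and no latch is inferred. If sel == 2'd3 can never occur, y_comb could
  // default to 8'bx instead, giving the synthesizer more freedom.
  always @(*) begin
    y_comb = 8'h00;
    valid  = 1'b0;
    case (sel)
      2'd0: begin y_comb = a; valid = 1'b1; end
      2'd1: begin y_comb = b; valid = 1'b1; end
      2'd2: begin y_comb = c; valid = 1'b1; end
      // sel == 2'd3 keeps the defaults
    endcase
  end

  // Sequential: without the default, y_reg would hold its value when
  // sel == 2'd3, which synthesizes as a clock enable. The later
  // nonblocking assignment inside the CASE overrides the default.
  always @(posedge clk) begin
    y_reg <= 8'h00;
    case (sel)
      2'd0: y_reg <= a;
      2'd1: y_reg <= b;
      2'd2: y_reg <= c;
    endcase
  end
endmodule
```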
2.2.5 FPGA Power Management Design Techniques
• Static and dynamic power are minimized by using hard IP.
• Static power is reduced because fewer transistors are used, whereas dynamic power is reduced because of shorter trace lengths.
• Move functions to dedicated hardware resources:
  – State machines to BRAMs
  – Counters to DSP48s
  – Registers to SRLs
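As an illustration of moving registers to SRLs, the sketch below describes a delay line in the form Xilinx synthesis tools generally map to SRL primitives: a plain shift register with an optional clock enable and no reset, read only at its last stage. The module name and the DEPTH parameter are hypothetical, and the exact inference rules (maximum depth per primitive, and so on) depend on the tool version and device.

```verilog
// Hypothetical delay line written so that it can be mapped to SRLs.
// DEPTH must be at least 2 for the concatenation below.
module srl_delay #(parameter DEPTH = 16) (
  input  wire clk,
  input  wire ce,    // active-High clock enable
  input  wire din,
  output wire dout
);
  // No reset on purpose: SRL primitives have no reset input, so a reset
  // here would typically keep the chain in ordinary flip-flops.
  reg [DEPTH-1:0] sr;

  always @(posedge clk)
    if (ce)
      sr <= {sr[DEPTH-2:0], din};

  // Only the last stage is used: dout is din delayed by DEPTH enabled clocks.
  assign dout = sr[DEPTH-1];
endmodule
```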