FPGA Coding Guidelines

RTL Coding Guidelines
Chethan Kumar H B

Chapter 1
Coding Style For Better Synthesis
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG2000SJ_NBA.pdf
Before giving further explanation and examples of both blocking and nonblocking as-
signments, it would be useful to outline eight guidelines that help to accurately simulate
hardware, modeled using Verilog. Adherence to these guidelines will also remove 90-100%
of the Verilog race conditions encountered by most Verilog designers.
• When modeling sequential logic, use nonblocking assignments.
• When modeling latches, use nonblocking assignments.
• When modeling combinational logic with an always block, use blocking
assignments.
• When modeling both sequential and combinational logic within the same
always block, use nonblocking assignments.
• Do not mix blocking and nonblocking assignments in the same always
block.
• Do not make assignments to the same variable from more than one
always block.
• Use $strobe to display values that have been assigned using nonblocking
assignments.
• Do not make assignments using #0 delays.
1.1 Blocking assignments
The blocking assignment operator is an equal sign ("="). A blocking assignment gets
its name because it must evaluate the RHS arguments and complete the assignment
without interruption from any other Verilog statement. The assignment is said to
"block" other assignments until the current assignment has completed. The one
exception is a blocking assignment with timing delays on the RHS of the blocking
operator, which is considered to be a poor coding style. A problem with blocking
assignments occurs when the RHS variable of one assignment in one procedural
block is also the LHS variable of another assignment in another procedural
block and both equations are scheduled to execute in the same simulation
time step. In the feedback oscillator below, both always blocks are triggered by the
same clock edge, so the final values of y1 and y2 depend on which block the simulator
happens to execute first.
// Race condition: the result depends on which always block executes first
module fbosc1 (y1, y2, clk, rst);
  output y1, y2;
  input  clk, rst;
  reg    y1, y2;
  always @(posedge clk or posedge rst)
    if (rst) y1 = 0;  // reset
    else     y1 = y2;
  always @(posedge clk or posedge rst)
    if (rst) y2 = 1;  // preset
    else     y2 = y1;
endmodule
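Following the guideline to use nonblocking assignments for sequential logic, the race in fbosc1 can be avoided as sketched below. The module name fbosc2 is chosen here for illustration; the ports are those of fbosc1.

```verilog
// Sketch: same feedback oscillator, but with nonblocking assignments.
// Both RHS values are sampled before either LHS is updated, so the
// result no longer depends on the order in which the blocks execute.
module fbosc2 (y1, y2, clk, rst);
  output y1, y2;
  input  clk, rst;
  reg    y1, y2;
  always @(posedge clk or posedge rst)
    if (rst) y1 <= 0;  // reset
    else     y1 <= y2;
  always @(posedge clk or posedge rst)
    if (rst) y2 <= 1;  // preset
    else     y2 <= y1;
endmodule
```

After reset, y1 and y2 exchange values on every rising clock edge, regardless of simulator scheduling order.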
1.2 Nonblocking assignments
The nonblocking assignment operator is the same as the less-than-or-equal-to operator
("<="). A nonblocking assignment gets its name because the assignment evaluates the
RHS expression of a nonblocking statement at the beginning of a time step and schedules
the LHS update to take place at the end of the time step, without blocking the
evaluation of other statements in between.
Execution of nonblocking assignments can be viewed as a two-step process:
• Evaluate the RHS of nonblocking statements at the beginning of the time step.
• Update the LHS of nonblocking statements at the end of the time step.
Nonblocking assignments are only made to register data types and are therefore only
permitted inside of procedural blocks, such as initial blocks and always blocks.
Nonblocking assignments are not permitted in continuous assignments.
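The two-step evaluation described above can be seen in a small register swap. This is a minimal sketch; the module name swap_regs and its ports are hypothetical and chosen only for illustration.

```verilog
// Sketch: swapping two registers with nonblocking assignments.
module swap_regs (
  input      clk,
  input      load,
  input      a_in, b_in,
  output reg a, b
);
  always @(posedge clk)
    if (load) begin
      a <= a_in;
      b <= b_in;
    end
    else begin
      a <= b;  // step 1: RHS b is sampled now
      b <= a;  // step 1: RHS a is still the old value of a
    end        // step 2: both LHS updates happen at the end of the time step
endmodule
```

With blocking assignments in the else branch, b would receive the value just written to a and the swap would be lost.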
1.3 The Verilog "stratified event queue"
The "stratified event queue" is logically partitioned into four distinct queues for the
current simulation time and additional queues for future simulation times.
Figure 1.1: Verilog ”stratified event queue”.
The active events queue is where most Verilog events are scheduled, including blocking
assignments, continuous assignments, $display commands, evaluation of instance and
primitive inputs followed by updates of primitive and instance outputs, and the
evaluation of nonblocking RHS expressions. The LHS of a nonblocking assignment is not
updated in the active events queue.
Events can be added to any of the event queues but are only removed from the active
events queue. Events that are scheduled on the other event queues will eventually become
"activated," or promoted into the active events queue.
The practice of making #0-delay assignments is generally a flawed practice employed by
designers who try to make assignments to the same variable from two separate procedural
blocks, attempting to beat Verilog race conditions by scheduling one of the assignments
to take place slightly later in the same simulation time step. Adding #0-delay
assignments to Verilog models needlessly complicates the analysis of scheduled
events.
1.4 Self-triggering always blocks
In general, a Verilog always block cannot trigger itself. Consider the oscillator example in
Listing 1.1. This oscillator uses blocking assignments. Blocking assignments evaluate
their RHS expression and update their LHS value without interruption. The blocking
assignment must complete before the @(clk) edge-trigger event can be scheduled. By the
time the trigger event has been scheduled, the blocking clk assignment has completed;
therefore, there is no trigger event from within the always block to trigger the @(clk)
trigger.
Listing 1.1: Non-self-triggering oscillator using blocking assignments
module osc1 (clk);
  output clk;
  reg    clk;
  initial #10 clk = 0;
  always @(clk) #10 clk = !clk;
endmodule
In contrast, the oscillator in Listing 1.2 uses nonblocking assignments. After the
first @(clk) trigger, the RHS expression of the nonblocking assignment is evaluated and
the LHS value is scheduled into the nonblocking assign updates event queue. Before the
nonblocking assign updates event queue is "activated," the @(clk) trigger statement is
encountered and the always block again becomes sensitive to changes on the clk signal.
When the nonblocking LHS value is updated later in the same time step, the @(clk)
is again triggered. The osc2 example is self-triggering (which is not necessarily a
recommended coding style).
Listing 1.2: Self-triggering oscillator using nonblocking assignments
module osc2 (clk);
  output clk;
  reg    clk;
  initial #10 clk = 0;
  always @(clk) #10 clk <= !clk;
endmodule
1.5 Combinational logic - use blocking assignments
The code shown in Listing 1.3 builds the y-output from three sequentially executed
statements. Since nonblocking assignments evaluate the RHS expressions before updating
the LHS variables, the values of tmp1 and tmp2 used to compute y are the values these
two variables had upon entry to the always block, not the values that will be updated
at the end of the simulation time step. The y-output will reflect the old values of
tmp1 and tmp2, not the values calculated in the current pass of the always block.
Listing 1.3: Bad combinational logic coding style using nonblocking assignments
module ao4 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;
  always @(a or b or c or d) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;
  end
endmodule
The code shown in Listing 1.4 is identical to the code shown in Listing 1.3, except
that tmp1 and tmp2 have been added to the sensitivity list. As described in Section
1.4, when the nonblocking assignments update the LHS variables in the nonblocking
assign update events queue, the always block will self-trigger and update the y-output
with the newly calculated tmp1 and tmp2 values. The y-output value will now be
correct after taking two passes through the always block. Multiple passes
through an always block equates to degraded simulation performance and
should be avoided if a reasonable alternative exists (use blocking assignments
for combinational modeling).
Listing 1.4: combinational logic coding style using nonblocking assignments
module ao5 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;
  always @(a or b or c or d or tmp1 or tmp2) begin
    tmp1 <= a & b;
    tmp2 <= c & d;
    y    <= tmp1 | tmp2;
  end
endmodule
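The recommended alternative, blocking assignments for combinational logic, can be sketched as follows. The module name ao2 is chosen here for illustration; the logic is the same AND-OR function as in the listings above.

```verilog
// Sketch: the same AND-OR logic coded with blocking assignments.
// tmp1 and tmp2 are updated immediately, so y is correct after a
// single pass and tmp1/tmp2 are not needed in the sensitivity list.
module ao2 (y, a, b, c, d);
  output y;
  input  a, b, c, d;
  reg    y, tmp1, tmp2;
  always @(a or b or c or d) begin
    tmp1 = a & b;
    tmp2 = c & d;
    y    = tmp1 | tmp2;
  end
endmodule
```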
NOTE
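• Using the $display command to show values assigned with nonblocking assignments
does not work as expected: $display executes in the active events queue, before the
nonblocking LHS updates, so it prints the old value. Use $strobe instead.
• Making multiple nonblocking assignments to the same variable in the same always
block is defined by the Verilog Standard: the last nonblocking assignment to the
same variable wins.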
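The difference between $display and $strobe can be observed with a few lines of simulation-only code. This is a minimal sketch; the module name display_vs_strobe is hypothetical.

```verilog
// Sketch: $display runs before the nonblocking update, $strobe after it.
module display_vs_strobe;
  reg a;
  initial begin
    a = 0;
    a <= 1;                            // update is scheduled, not yet applied
    $display("$display: a = %b", a);   // shows the old value (0)
    $strobe ("$strobe : a = %b", a);   // shows the updated value (1)
  end
endmodule
```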
1.5.1 Driving the same signal inside two different if blocks
In the code shown below, the second if statement always takes precedence: if both
enables are '1', the output takes the value of in2. This follows from the rule that the
last nonblocking assignment to a variable in an always block wins.
Listing 1.5: One signal two if
module onesignaltwoif (
  input      clk,
  input      en1,
  input      en2,
  input      in1,
  input      in2,
  output reg out
);
  always @(posedge clk)
  begin
    if (en1)
      out <= in1;
    if (en2)
      out <= in2;
  end
endmodule
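The same behavior can be written with the priority made explicit, which may be easier to review. This is a sketch only; the module name onesignal_priority is hypothetical and the ports mirror the listing above.

```verilog
// Sketch: equivalent logic with the priority of en2 over en1 made explicit.
module onesignal_priority (
  input      clk,
  input      en1,
  input      en2,
  input      in1,
  input      in2,
  output reg out
);
  always @(posedge clk)
    if (en2)      out <= in2;  // en2 wins when both enables are set
    else if (en1) out <= in1;
endmodule
```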
1.6 RTL Coding Styles That Yield Simulation and Synthesis Mismatches
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG1999SJ_SynthMismatch.pdf
1.6.1 SENSITIVITY LIST
Synthesis tools infer combinational or latching logic from an always block with a sensi-
tivity list that does not contain the Verilog keywords posedge or negedge. For a combi-
national always block, the logic inferred is derived from the equations in the block and
has nothing to do with the sensitivity list. The synthesis tool will read the sensitivity list
and compare it against the equations in the always block, only to report coding omissions
that might cause a mismatch between pre- and post-synthesis simulations.
The presence of signals in a sensitivity list that are not used in the always block will
not make any functional difference to either pre- or post-synthesis simulations. The only
effect of extraneous signals is that the pre-synthesis simulations will run more slowly.
This is due to the fact that the always block will be entered and evaluated more often
than is necessary.
1.6.2 Incomplete sensitivity list
The synthesized logic described by the equations in an always block will always be im-
plemented as if the sensitivity list were complete. However, the pre-synthesis simulation
functionality of this same always block will be quite different. In module code1a, the
sensitivity list is complete; therefore, the pre- and post-synthesis simulations will both
simulate a 2-input and gate. In module code1b, the sensitivity list only contains the
variable a. The post-synthesis simulations will simulate a 2-input and gate. However, for
pre-synthesis simulation, the always block will only be executed when there are changes
on variable a. Any changes on variable b that do not coincide with changes on a will not
be observed on the output. This functionality will not match that of the 2-input and
gate of the post-synthesis model. Finally, module code1c does not contain any sensitivity
list. During pre-synthesis simulations, this always block will lock up the simulator into
an infinite loop. Yet, the post-synthesis model will again be a 2-input and gate.
Listing 1.6: Incomplete sensitivity lists
module code1a (o, a, b);
  output o;
  input  a, b;
  reg    o;
  always @(a or b)
    o = a & b;
endmodule
// ****************************
module code1b (o, a, b);
  output o;
  input  a, b;
  reg    o;
  always @(a)
    o = a & b;
endmodule
// ****************************
module code1c (o, a, b);
  output o;
  input  a, b;
  reg    o;
  always
    o = a & b;
endmodule
Note: All three modules synthesize to a 2-input and gate; only code1a also simulates as
one before synthesis.
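One way to avoid incomplete sensitivity lists, assuming the tools in use support Verilog-2001, is the implicit sensitivity list @*. The sketch below uses the hypothetical module name code1d.

```verilog
// Sketch: Verilog-2001 implicit sensitivity list.
// @* makes the block sensitive to every signal read inside it (a and b),
// so pre- and post-synthesis simulations both behave as a 2-input and gate.
module code1d (o, a, b);
  output o;
  input  a, b;
  reg    o;
  always @*
    o = a & b;
endmodule
```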
1.6.3 CASE STATEMENTS
Full Case
Using the synthesis tool directive // synopsys full_case gives more information about
the design to the synthesis tool than is provided to the simulation tool. This particular
directive is used to inform the synthesis tool that the case statement is fully defined,
and that the output assignments for all unused cases are don't cares. The functionality
between pre- and post-synthesis designs may or may not remain the same when using
this directive. Additionally, although this directive tells the synthesis tool to treat the
unused states as don't cares, it will sometimes make designs larger and slower
than designs that omit the full_case directive.
In module code4a, a case statement is coded without using any synthesis directives.
The pre- and post-synthesis simulations will match. Module code4b uses a case statement
with the synthesis directive full_case. Because of the synthesis directive, the en input is
optimized away during synthesis and left as a dangling input. The pre-synthesis simulation
results of modules code4a and code4b will match the post-synthesis simulation results
of module code4a, but will not match the post-synthesis simulation results of module
code4b.
Listing 1.7: Full Case
// no full_case
// Decoder built from four 3-input and gates
// and two inverters
module code4a (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;
  always @(a or en) begin
    y = 4'h0;
    case ({en, a})
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
// **********************************
// full_case example
// Decoder built from four 2-input nor gates
// and two inverters
// The enable input is dangling (has been optimized away)
module code4b (y, a, en);
  output [3:0] y;
  input  [1:0] a;
  input        en;
  reg    [3:0] y;
  always @(a or en) begin
    y = 4'h0;
    case ({en, a}) // synopsys full_case
      3'b1_00: y[a] = 1'b1;
      3'b1_01: y[a] = 1'b1;
      3'b1_10: y[a] = 1'b1;
      3'b1_11: y[a] = 1'b1;
    endcase
  end
endmodule
Parallel Case
Using the synthesis tool directive // synopsys parallel_case gives more information about
the design to the synthesis tool than is provided to the simulation tool. This particular
directive is used to inform the synthesis tool that all cases should be tested in parallel,
even if there are overlapping cases which would normally cause a priority encoder to be
inferred. When a design does have overlapping cases, the functionality between pre- and
post-synthesis designs will be different.
The pre-synthesis simulations for modules code5a and code5b below, as well as the
post-synthesis structure of module code5a, will show priority encoder functionality.
However, the post-synthesis structure for module code5b will be two and gates. The use
of the synthesis tool directive // synopsys parallel_case will cause priority encoder case
statements to be implemented as parallel logic, causing pre- and post-synthesis simulation
mismatches.
Listing 1.8: Parallel Case
// no parallel_case
// Priority encoder - 2-input nand gate driving an
// inverter (z-output) and also driving a
// 3-input and gate (y-output)
module code5a (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;
  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d})
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
// **********************************
// parallel_case
// two parallel 2-input and gates
module code5b (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;
  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    casez ({a, b, c, d}) // synopsys parallel_case
      4'b11??: z = 1;
      4'b??11: y = 1;
    endcase
  end
endmodule
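If the parallel (non-priority) behavior is what the design actually needs, it can be coded without any synthesis directive, so that simulation and synthesis agree. The sketch below is one possible way to do this; the module name code5c is hypothetical.

```verilog
// Sketch: two independent if statements give two parallel and gates
// in both simulation and synthesis, with no parallel_case directive.
module code5c (y, z, a, b, c, d);
  output y, z;
  input  a, b, c, d;
  reg    y, z;
  always @(a or b or c or d) begin
    {y, z} = 2'b0;
    if (a & b) z = 1;
    if (c & d) y = 1;
  end
endmodule
```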
casex
The use of casex statements can cause design problems. A casex treats Xs as "don't
cares" if they are in either the case expression or the case items. The problem with casex
occurs when an input tested by a casex expression is initialized to an unknown state. The
pre-synthesis simulation will treat the unknown input as a "don't care" when evaluated in
the casex statement. The equivalent post-synthesis simulation will propagate Xs through
the gate-level model, if that condition is tested.
NOTE
• casez is similar to casex, except that only Z (and ?) values are treated as don't
cares; X values are not.
• Be aware that synthesis directives are not recognized by simulators. When using
any synthesis directive, make sure that it does not lead to a pre- and post-synthesis
mismatch.
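The way casex hides an unknown input in pre-synthesis simulation can be illustrated with a small simulation-only sketch. The module and signal names (casex_demo, sel, y_casex, y_case) are hypothetical.

```verilog
// Sketch: an all-X select value matches a casex item but not a case item.
module casex_demo;
  reg [1:0] sel;
  reg       y_casex, y_case;
  initial begin
    sel = 2'bxx;                    // input left in an unknown state
    casex (sel)
      2'b1?:   y_casex = 1'b1;      // X bits in sel are treated as don't cares
      default: y_casex = 1'b0;
    endcase
    case (sel)
      2'b10, 2'b11: y_case = 1'b1;  // exact 4-state match required
      default:      y_case = 1'b0;
    endcase
    $display("casex: %b  case: %b", y_casex, y_case);
  end
endmodule
```

The casex branch is taken even though sel is unknown, which is the masking effect described above.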
1.7 FSM
Reference :
• http://www.sunburst-design.com/papers/CummingsICU2002_FSMFundamentals.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2000Boston_FSM.pdf
A common classification used to describe the type of an FSM is Mealy versus Moore state
machines [2] [3]. A Moore FSM is a state machine where the outputs are only a function
of the present state. A Mealy FSM is a state machine where one or more of the outputs
is a function of the present state and one or more of the inputs.
Figure 1.2: Finite State Machine (FSM) block diagram.
1.7.1 Binary Encoded or Onehot Encoded?
Common classifications used to describe the state encoding of an FSM are Binary (or
highly encoded) and Onehot.
A binary-encoded FSM design only requires as many flip-flops as are needed to uniquely
encode the number of states in the state machine. The actual number of flip-flops required
is equal to the ceiling of the log-base-2 of the number of states in the FSM.
A onehot FSM design requires a flip-flop for each state in the design and only one
flip-flop (the flip-flop representing the current or "hot" state) is set at a time in a onehot
FSM design. For a state machine with 9-16 states, a binary FSM only requires 4
flip-flops while a onehot FSM requires a flip-flop for each state in the design (9-16 flip-flops).
FPGA vendors frequently recommend using a onehot state encoding style because
flip-flops are plentiful in an FPGA and the combinational logic required to implement a
onehot FSM design is typically smaller than most binary encoding styles. Since FPGA
performance is typically related to the combinational logic size of the FPGA design,
onehot FSMs typically run faster than a binary encoded FSM with larger combinational
logic blocks[4].
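The flip-flop counts quoted above can be computed at elaboration time. The sketch below assumes a tool that supports the Verilog-2005 $clog2 system function; the module and parameter names are hypothetical.

```verilog
// Sketch: flip-flop count for binary versus onehot state encoding.
module fsm_ff_count;
  parameter  NUM_STATES = 9;
  localparam BINARY_FFS = $clog2(NUM_STATES);  // ceiling of log2(NUM_STATES)
  localparam ONEHOT_FFS = NUM_STATES;          // one flip-flop per state
  initial
    $display("%0d states: binary = %0d FFs, onehot = %0d FFs",
             NUM_STATES, BINARY_FFS, ONEHOT_FFS);
endmodule
```

For the default of 9 states this gives 4 flip-flops for binary encoding and 9 for onehot, matching the 9-16 state example in the text.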
Figure 1.3: FSM encoding.
Note: When the onehot style is used to code an FSM without the // synopsys
parallel_case directive, synthesis tools infer a priority encoder. This happens because
the tool must allow for the possibility that two bits of the state variable are set at the
same time, in which case the first matching case item is given higher priority.
Listing 1.9: one hot
// This logic infers a priority encoder
module fsm_onehot1
  (output reg        y, z,
   input  wire [1:0] state);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1)
      state[IDLE] : z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
// ******************************************************************
// This logic infers a parallel case
module fsm_cc4_fp
  (output reg        y, z,
   input  wire [1:0] state);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  always @(state) begin
    {y, z} = 2'b0;
    casez (1'b1) // synopsys parallel_case
      state[IDLE] : z = 1;
      state[BBUSY]: y = 1;
    endcase
  end
endmodule
1.7.2 One Always Block FSM Style (Not Recommended)
One of the most common FSM coding styles in use today is the one sequential always
block FSM coding style. For most FSM designs, the one always block FSM coding style
is more verbose, more confusing and more error prone than a comparable two always
block coding style.
1.7.3 Two Always Block FSM Style
One of the best Verilog coding styles is to code the FSM design using two always blocks,
one for the sequential state register and one for the combinational next-state and com-
binational output logic.
Listing 1.10: fsm design - two always block style
module fsm_cc4_2
  (output reg gnt,
   input      dly, done, req, clk, rst_n);
  parameter [1:0] IDLE  = 2'b00,
                  BBUSY = 2'b01,
                  BWAIT = 2'b10,
                  BFREE = 2'b11;
  reg [1:0] state, next;
  // Sequential state register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= IDLE;
    else        state <= next;
  // Combinational next-state and output logic
  always @(state or dly or done or req) begin
    next = 2'bx;
    gnt  = 1'b0;
    case (state)
      IDLE : if (req) next = BBUSY;
             else     next = IDLE;
      BBUSY: begin
               gnt = 1'b1;
               if (!done)    next = BBUSY;
               else if (dly) next = BWAIT;
               else          next = BFREE;
             end
      BWAIT: begin
               gnt = 1'b1;
               if (!dly) next = BFREE;
               else      next = BWAIT;
             end
      BFREE: if (req) next = BBUSY;
             else     next = IDLE;
    endcase
  end
endmodule
FSM Coding Notes
• Parameters (constants that are local to a module) are used to define state
encodings instead of the Verilog `define macro definition construct. After the
parameter definitions are created, the parameter names, not the raw state encodings,
are used throughout the rest of the design.
• The sequential always block is coded using nonblocking assignments.
• The combinational always block sensitivity list is sensitive to changes on the state
variable and all of the inputs referenced in the combinational always block.
• Assignments within the combinational always block are made using Verilog blocking
assignments.
• Default output and next state assignments are made before coding the case
statement, as shown in Listing 1.10. This eliminates latches, reduces the amount of code
required to code the rest of the outputs in the case statement, and highlights in the
case statement exactly in which states the individual output(s) change.
1.7.4 Onehot FSM Coding Style
Efficient (small and fast) onehot state machines can be coded using an inverse case
statement; a case statement where each case item is an expression that evaluates to true
or false.
Listing 1.11: fsm design -onehot style
module fsm_cc4_fp
  (output reg gnt,
   input      dly, done, req, clk, rst_n);
  // Parameters are bit indexes into the onehot state vector
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;
  always @(state or dly or done or req) begin
    next = 4'b0;
    gnt  = 1'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: begin
                      gnt = 1'b1;
                      if (!done)    next[BBUSY] = 1'b1;
                      else if (dly) next[BWAIT] = 1'b1;
                      else          next[BFREE] = 1'b1;
                    end
      state[BWAIT]: begin
                      gnt = 1'b1;
                      if (!dly) next[BFREE] = 1'b1;
                      else      next[BWAIT] = 1'b1;
                    end
      state[BFREE]: begin
                      if (req) next[BBUSY] = 1'b1;
                      else     next[IDLE]  = 1'b1;
                    end
    endcase
  end
endmodule
1.7.5 Registered FSM Outputs
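Registering the outputs of an FSM design ensures that the outputs are glitch-free and
frequently improves synthesis results by standardizing the output and input delay
constraints of synthesized modules [5].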
FSM outputs are easily registered by adding a third always sequential block to an
FSM module where output assignments are generated in a case statement with case
items corresponding to the next state that will be active when the output is clocked.
Listing 1.12: fsm design -three always blocks w/registered outputs
module fsm_cc4_fp
  (output reg gnt,
   input      dly, done, req, clk, rst_n);
  parameter [3:0] IDLE  = 0,
                  BBUSY = 1,
                  BWAIT = 2,
                  BFREE = 3;
  reg [3:0] state, next;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      state       <= 4'b0;
      state[IDLE] <= 1'b1;
    end
    else state <= next;
  // gnt is assigned in the combinational block in this listing
  always @(state or dly or done or req) begin
    next = 4'b0;
    gnt  = 1'b0;
    case (1'b1) // ambit synthesis case = full, parallel
      state[IDLE] : if (req) next[BBUSY] = 1'b1;
                    else     next[IDLE]  = 1'b1;
      state[BBUSY]: begin
                      gnt = 1'b1;
                      if (!done)    next[BBUSY] = 1'b1;
                      else if (dly) next[BWAIT] = 1'b1;
                      else          next[BFREE] = 1'b1;
                    end
      state[BWAIT]: begin
                      gnt = 1'b1;
                      if (!dly) next[BFREE] = 1'b1;
                      else      next[BWAIT] = 1'b1;
                    end
      state[BFREE]: begin
                      if (req) next[BBUSY] = 1'b1;
                      else     next[IDLE]  = 1'b1;
                    end
    endcase
  end
endmodule
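The third, sequential always block described in this section, with output assignments selected by the next state, might look like the following sketch. It reuses the binary state encoding and signal names of Listing 1.10; the module name fsm_3always_sketch is hypothetical and the code is an illustration of the idea rather than a reference implementation.

```verilog
// Sketch: three always block FSM with a registered gnt output.
module fsm_3always_sketch
  (output reg gnt,
   input      dly, done, req, clk, rst_n);
  parameter [1:0] IDLE  = 2'b00,
                  BBUSY = 2'b01,
                  BWAIT = 2'b10,
                  BFREE = 2'b11;
  reg [1:0] state, next;
  // Block 1: sequential state register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= IDLE;
    else        state <= next;
  // Block 2: combinational next-state logic (no outputs here)
  always @(state or dly or done or req) begin
    next = 2'bx;
    case (state)
      IDLE : if (req) next = BBUSY;
             else     next = IDLE;
      BBUSY: if (!done)    next = BBUSY;
             else if (dly) next = BWAIT;
             else          next = BFREE;
      BWAIT: if (!dly) next = BFREE;
             else      next = BWAIT;
      BFREE: if (req) next = BBUSY;
             else     next = IDLE;
    endcase
  end
  // Block 3: registered output, decoded from the next state so that
  // gnt changes on the same clock edge as the state register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) gnt <= 1'b0;
    else begin
      gnt <= 1'b0;                 // default output value
      case (next)
        BBUSY, BWAIT: gnt <= 1'b1;
        default: ;                 // IDLE, BFREE keep the default
      endcase
    end
endmodule
```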
One, two, or three always blocks for an FSM?
• Use a two always block coding style to code FSM designs with combinational out-
puts. This style is efficient and easy to code and can also easily handle Mealy FSM
designs.
• Use a three always block coding style to code FSM designs with registered outputs.
This style is efficient and easy to code.
1.8 Clock Domain Crossing
Reference :
• http://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf
• http://www.sunburst-design.com/papers/CummingsSNUG2001SJ_AsyncClk.pdf
1.8.1 Metastability
Metastability refers to signals that do not have stable 0 or 1 states for some duration
of time at some point during normal operation of a design. In a multi-clock design,
metastability cannot be avoided but the detrimental effects of metastability
can be neutralized.
Figure 1.4 shows a synchronization failure that occurs when a signal generated in one
clock domain is sampled too close to the rising edge of a clock signal from a second
clock domain. Synchronization failure is caused by an output going metastable and not
converging to a legal stable state by the time the output must be sampled again.
Figure 1.4: Asynchronous clocks and synchronization failure
1.8.2 Why is metastability a problem?
A metastable output that traverses additional logic in the receiving clock domain can cause
illegal signal values to be propagated throughout the rest of the design. Since the CDC
signal can fluctuate for some period of time, the input logic in the receiving clock domain
might recognize the logic level of the fluctuating signal to be different values and hence
propagate erroneous signals into the receiving clock domain.
1.8.3 Synchronizers
There are two scenarios that are possible when passing signals across CDC boundaries,
and it is important to determine which scenario applies to your design:
• It is permitted to miss samples that are passed between clock domains.
• Every signal passed between clock domains must be sampled.
1.8.4 Two flip-flop synchronizer
"A synchronizer is a device that samples an asynchronous signal and outputs a version
of the signal that has transitions synchronized to a local or sample clock."
The simplest and most common synchronizer used by digital designers is a two-flip-flop
synchronizer as shown in Figure 1.5.
Figure 1.5: Two flip-flop synchronizer
The first flip-flop samples the asynchronous input signal into the new clock domain
and waits for a full clock cycle to permit any metastability on the stage-1 output signal
to decay. The stage-1 signal is then sampled by the same clock into a second-stage
flip-flop, with the intended goal that the stage-2 signal is now a stable and valid signal,
synchronized and ready for distribution within the new clock domain.
It is theoretically possible for the stage-1 signal to still be sufficiently metastable by the
time the signal is clocked into the second stage to cause the stage-2 output signal to also
go metastable. The calculated mean time before failure (MTBF) of a synchronizer
is a function of multiple variables, including the clock frequencies used
to generate the input signal and to clock the synchronizing flip-flops.
For most synchronization applications, the two flip-flop synchronizer is suf-
ficient to remove all likely metastability.
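A two flip-flop synchronizer as described above can be sketched in a few lines. The names bclk and adat follow the signal names used later in this chapter; bdat1, bdat2, rst_n and the module name sync2 are assumptions made for this illustration.

```verilog
// Sketch: two flip-flop synchronizer clocked by the receiving domain.
module sync2 (
  input      bclk,   // receiving (destination) clock
  input      rst_n,  // asynchronous active-low reset (assumed)
  input      adat,   // signal arriving from the sending clock domain
  output reg bdat2   // stage-2 output, safe to use in the bclk domain
);
  reg bdat1;         // stage-1 output, may go metastable
  always @(posedge bclk or negedge rst_n)
    if (!rst_n) {bdat2, bdat1} <= 2'b00;
    else        {bdat2, bdat1} <= {bdat1, adat};
endmodule
```

No logic should be placed between the two stages, so that stage 1 has the full clock period to settle.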
1.8.5 MTBF - mean time before failure
When calculating MTBF numbers, larger numbers are preferred over smaller numbers.
Larger MTBF numbers indicate longer periods of time between potential failures, while
smaller MTBF numbers indicate that metastability could happen frequently, similarly
causing failures within the design.
Figure 1.6: MTBF
Two of the most important factors that directly impact the MTBF of a synchronizer
circuit are the sample clock frequency (how fast are signals being sampled into the
receiving clock domain) and the data change frequency (how fast is the data changing
that crosses the CDC boundary). From the above equation, it can be seen that
failures occur more frequently (shorter MTBF) in higher speed designs, or
when the sampled data changes more frequently.
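As a rough aid to the statement above, a commonly quoted abbreviated form of the synchronizer MTBF relation is sketched below. The symbols are assumptions made here and may not match the notation of Figure 1.6: f_clk is the sample clock frequency, f_data is the data change frequency, and X lumps together all other device-dependent factors.

```latex
\[
  \mathrm{MTBF} = \frac{1}{f_{clk} \cdot f_{data} \cdot X}
\]
```

Under this form, doubling either f_clk or f_data halves the MTBF, which is the inverse relationship the paragraph describes.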
1.8.6 Three flip-flop synchronizer
For some very high speed designs, the MTBF of a two-flop synchronizer is too short and
a third flop is added to increase the MTBF.
Figure 1.7: Three flip-flop synchronizer used in higher speed designs
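A synchronizer with a configurable number of stages can cover both the two-flop and three-flop cases. The following is a minimal sketch; the module name sync_n, the STAGES parameter and the port names are hypothetical, and STAGES is assumed to be 2 or more.

```verilog
// Sketch: N-stage synchronizer (STAGES = 3 gives the three flip-flop version).
module sync_n #(parameter STAGES = 3) (
  input  bclk,    // receiving clock
  input  rst_n,   // asynchronous active-low reset (assumed)
  input  adat,    // signal from the sending clock domain
  output bdat     // synchronized output
);
  reg [STAGES-1:0] sync_ff;
  always @(posedge bclk or negedge rst_n)
    if (!rst_n) sync_ff <= {STAGES{1'b0}};
    else        sync_ff <= {sync_ff[STAGES-2:0], adat};  // shift toward the MSB
  assign bdat = sync_ff[STAGES-1];
endmodule
```

Each extra stage adds one receiving-clock cycle of latency in exchange for a longer MTBF.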
1.8.7 Registering signals from the sending clock domain to avoid glitches
Consider an example where the signals in the sending clock domain are not registered
before being passed into the receiving clock domain, as shown in Figure 1.8.
Figure 1.8: Unregistered signals sent across a CDC boundary
In this example, the combinational output from the sending clock domain could
experience combinational glitches at the CDC boundary. These combinational glitches
effectively increase the data-change frequency, potentially creating small bursts of
oscillating data and thereby increasing the potential for sampling changing data and
generating metastable signals at the CDC boundary.
Signals in the sending clock domain should be synchronized (registered)
before being passed to a CDC boundary. The synchronization of signals from
the sending clock domain reduces the number of edges that can be sampled
in the receiving clock domain, effectively reducing the data-change frequency
in the MTBF equation and hence increasing the time between calculated
failures.
Figure 1.9: Registered signals sent across a CDC boundary
In Figure 1.9, the adat flip-flop filters out the combinational glitches on the flip-flop
input (a) and passes a clean signal to the bclk logic.
1.8.8 Synchronizing fast signals into slow clock domains
One issue associated with synchronizers is the possibility that a signal from a sending
clock domain might change values twice before it can be sampled, or might be too close
to the sampling edges of a slower clock domain. This possibility must be considered any
time signals are sent from one clock domain to another and a determination must be
made whether missed signals are or are not a problem for the design in question. When
missed samples are not allowed, there are two general approaches to the problem:
• An open-loop solution to ensure that signals are captured without acknowledgment.
• A closed-loop solution that requires acknowledgement of receipt of the signal that
crosses a CDC boundary.
1.8.9 Requirement for reliable signal passing between clock domains
When passing one CDC signal between clock domains through a two-flip-flop
synchronizer, the CDC signal must be wider than 1.5 times the period of the
receiving-domain clock; in other words, "input data values must be stable for three
destination clock edges".
The "three edge" requirement actually applies to both open-loop and closed-loop
solutions, but implementations of the closed-loop solution automatically ensure that at
least three edges are detected for all CDC signals.
Problem - passing a fast CDC pulse
Consider the severely flawed condition where the sending clock domain has a higher
frequency than the receiving clock domain and that a CDC pulse is only one cycle wide
in the sending clock domain. If the CDC signal is only pulsed for one fast-clock cycle,
the CDC signal could go high and low between the rising edges of a slower clock and not
be captured into the slower clock domain as shown in Figure 1.10.
Figure 1.10: Short CDC signal pulse missed during synchronization
Problem - sampling a long CDC pulse - but not long enough!
Consider the somewhat non-intuitive and flawed condition where the sending clock
domain sends a pulse to the receiving clock domain that is slightly wider than the period
of the receiving clock. Under most conditions, the signal will be sampled and
passed, but there is the small but real chance that the CDC pulse will change too close
to the two rising clock edges of the receiving clock domain and thereby violate the setup
time on the first clock edge and violate the hold time of the second clock edge and not
form the anticipated pulse. This possible failure is shown in Figure 1.11.
Figure 1.11: Marginal CDC pulse that violates the destination setup and hold times
1.8.10 Open-loop solution - sampling signals with synchronizers
One potential solution to this problem is to assert CDC signals for a period of time that
exceeds the cycle time of the sampling clock, as shown in Figure 1.12. As discussed in
Section 1.8.9, the minimum pulse width is 1.5 times the period of the receiving clock.
The assumption is that the CDC signal will be sampled at least once and possibly twice
by the receiver clock.
Open-loop sampling can be used when relative clock frequencies are fixed
and properly analyzed.
Advantage: the open-loop solution is the fastest way to pass signals across CDC
boundaries, because it does not require acknowledgment of the received signal.
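One way to lengthen a short pulse in the sending domain is a small counter-based stretcher, sketched below. The module name pulse_stretch, the HOLD_CYCLES parameter and the port names are hypothetical; HOLD_CYCLES must be chosen from the actual clock ratio so that the stretched pulse exceeds 1.5 receiving-clock periods, and it is assumed to lie between 1 and 256.

```verilog
// Sketch: stretch a one-cycle pulse to HOLD_CYCLES sending-clock cycles.
module pulse_stretch #(parameter HOLD_CYCLES = 4) (
  input      aclk,      // sending clock
  input      arst_n,    // asynchronous active-low reset (assumed)
  input      pulse_in,  // one-cycle pulse in the aclk domain
  output reg adat       // registered, stretched signal sent across the CDC
);
  reg [7:0] cnt;
  always @(posedge aclk or negedge arst_n)
    if (!arst_n) begin
      adat <= 1'b0;
      cnt  <= 8'd0;
    end
    else if (pulse_in) begin
      adat <= 1'b1;                 // start (or restart) the stretched pulse
      cnt  <= HOLD_CYCLES - 1;
    end
    else if (cnt != 8'd0)
      cnt  <= cnt - 8'd1;           // keep adat high while counting down
    else
      adat <= 1'b0;
endmodule
```

Because adat is a flip-flop output, it also satisfies the earlier recommendation to register signals before they cross the CDC boundary.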
Figure 1.12: Lengthened pulse to guarantee that the control signal will be sampled
1.8.11 Closed-loop solution - sampling signals with synchronizers
A second potential solution to this problem is to send an enabling control signal, synchronize
it into the new clock domain and then pass the synchronized signal back through
another synchronizer to the sending clock domain as an acknowledge signal.
Advantage: synchronizing a feedback signal is a very safe technique to acknowledge
that the first control signal was recognized and sampled into the new clock domain.
Disadvantage: there is potentially considerable delay associated with synchronizing
control signals in both directions before allowing the control signal to change.
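The closed-loop idea can be sketched as a request/acknowledge handshake. This is only one possible arrangement under stated assumptions: the module name cdc_handshake, the send and busy signals, and the four-phase protocol (the request is released only after the acknowledge is seen) are illustrative choices, not taken from the referenced paper.

```verilog
// Hypothetical closed-loop (request/acknowledge) sketch; names are illustrative.
module cdc_handshake (busy, r_ctl, send, sclk, srst_n, rclk, rrst_n);
  output busy;    // sending domain: high until the handshake has completed
  output r_ctl;   // receiving domain: synchronized control signal
  input  send;    // sending domain: request to assert the control signal
  input  sclk, srst_n, rclk, rrst_n;

  reg       req;        // control signal, held until acknowledged
  reg [1:0] req_sync;   // synchronizer in the rclk domain
  reg [1:0] ack_sync;   // synchronizer in the sclk domain

  // Receiving domain: synchronize the control signal
  always @(posedge rclk or negedge rrst_n)
    if (!rrst_n) req_sync <= 2'b00;
    else         req_sync <= {req_sync[0], req};

  assign r_ctl = req_sync[1];

  // Sending domain: the synchronized signal is fed back as the acknowledge
  always @(posedge sclk or negedge srst_n)
    if (!srst_n) ack_sync <= 2'b00;
    else         ack_sync <= {ack_sync[0], req_sync[1]};

  // Sending domain: assert req on send, release it only after the acknowledge
  always @(posedge sclk or negedge srst_n)
    if (!srst_n)            req <= 1'b0;
    else if (send && !busy) req <= 1'b1;
    else if (ack_sync[1])   req <= 1'b0;

  // A new transfer is blocked until both req and the acknowledge are low again
  assign busy = req | ack_sync[1];
endmodule
```

The round trip through two synchronizers in each direction is the delay cost named in the disadvantage above.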
Figure 1.13: Signal with feedback to acknowledge receipt
1.8.12 Passing multiple signals between clock domains
When passing multiple signals between clock domains, simple synchronizers do not guarantee
safe delivery of the data.
A frequent mistake made by engineers when working on multi-clock designs
is passing multiple CDC bits required in the same transaction from one
clock domain to another and overlooking the importance of the synchronized
sampling of the CDC bits.
The problem is that multiple signals that are synchronized to one clock will experience
small data changing skews that can eventually be sampled on different rising clock edges
in a second clock domain.
Multi-bit CDC strategies
To avoid multi-bit CDC skewed sampling scenarios, the following multi-bit CDC strategies
can be applied:
• Multi-bit signal consolidation. Where possible, consolidate multiple CDC bits into
1-bit CDC signals.
• Multi-cycle path formulations. Use a synchronized load signal to safely pass multiple
CDC bits.
• Pass multiple CDC bits using gray codes.
1.8.13 Multi-bit signal consolidation
Where possible, consolidate multiple CDC signals into a 1-bit CDC signal. Check whether
you really need multiple bits to control logic across a CDC boundary. Simply using
synchronizers on all of the CDC bits is not always good enough.
Problem - Two simultaneously required control signals.
In the simple example shown in Figure 1.14, a register in the receiving clock domain
requires both a load signal and an enable signal in order to load a data value into the
register. If both the load and enable signals are driven on the same sending clock edge,
there is a chance that a small skew between the control signals could cause the two signals
to be synchronized into different clock cycles within the receiving clock domain. Under
these conditions, the data would not be loaded into the register.
Figure 1.14: Problem - Passing multiple control signals between clock domains
Solution - Consolidation
The solution to the problem in section 1.8.13 is simple: consolidate the control signals.
As shown in Figure 1.15, drive both the load and enable register input signals in the
receiving clock domain from just one load-enable signal. Consolidation will remove the
potential of two control signals arriving shifted in time.
Figure 1.15: Consolidating control signals before passing between clock domains
1.8.14 Problem - Multiple Data bits
The diagram in Figure 1.16 shows two encoded control signals being passed between clock
domains. If the two encoded signals are slightly skewed when sampled, an erroneous
decoded output could be generated for one clock period in the receiving clock domain.
Figure 1.16: Encoded control signals passed between clock domains
1.8.15 Solutions for passing multiple Data bits
Multi-Cycle Path (MCP) formulations and FIFO techniques can be used to address
problems related to passing multiple data bits between different clock domains.
Multi-Cycle Path (MCP) formulation
Using an MCP formulation is a common technique for safely passing multiple
CDC data bits. An MCP formulation refers to sending unsynchronized data to a
receiving clock domain paired with a synchronized control signal. The data and control
signals are sent simultaneously, allowing the data to set up on the inputs of the destination
register while the control signal is synchronized for two receiving clock cycles before it
arrives at the load input of the destination register.
Advantages:
• The sending clock domain is not required to calculate the appropriate pulse width
to send between clock domains.
• The sending clock domain is only required to toggle an enable into the receiving
clock domain to indicate that data has been passed and is ready to be loaded. The
enable signal is not required to return to its initial logic level.
This strategy passes multiple CDC data bits without synchronization, and simultaneously
passes a synchronized enable signal to the receiving clock domain. The receiving clock
domain is not allowed to sample the multi-bit CDC signals until the synchronized enable
passes through synchronization and arrives at the receiving register.
This strategy is called a Multi-Cycle Path Formulation due to the fact that the unsynchronized
data word is passed directly to the receiving clock domain and held for multiple
receiving clock cycles, allowing an enable signal to be synchronized and recognized into
the receiving clock domain before permitting the unsynchronized data word to change.
Figure 1.17: Logic to pass a synchronized enable pulse between clock domains
Synchronized pulse generation logic
The most common method to pass a synchronized enable signal between clock domains
is to employ a toggling enable signal that is passed to a synchronized pulse generator
to indicate that the unsynchronized multi-cycle data word can be captured on the next
receiving clock edge as shown in Figure 1.18.
Figure 1.18: Synchronized pulse generation logic
Figure 1.19: Synchronized enable pulse generation logic and equivalent symbol
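The MCP formulation and the synchronized pulse generator described above can be sketched together in the receiving domain. The module name mcp_rx, the parameter WIDTH and all signal names are hypothetical; the sketch assumes the sender toggles s_en_toggle once per transfer and holds sdata stable until the transfer has been captured.

```verilog
// Hypothetical MCP receiver; names and widths are illustrative.
// Assumes the sender holds sdata stable and toggles s_en_toggle once per word.
module mcp_rx (rdata, rvalid, sdata, s_en_toggle, rclk, rrst_n);
  parameter WIDTH = 8;
  output [WIDTH-1:0] rdata;
  output             rvalid;       // one-cycle pulse when rdata is updated
  input  [WIDTH-1:0] sdata;        // unsynchronized multi-bit data
  input              s_en_toggle;  // toggling enable from the sending domain
  input              rclk, rrst_n;

  reg [WIDTH-1:0] rdata;
  reg             rvalid;
  reg             q1, q2, q3;

  // Two-flop synchronizer (q1, q2) plus one more flop (q3) for edge detection
  always @(posedge rclk or negedge rrst_n)
    if (!rrst_n) {q3, q2, q1} <= 3'b000;
    else         {q3, q2, q1} <= {q2, q1, s_en_toggle};

  // Any change of the synchronized toggle produces a one-cycle load pulse
  wire load = q2 ^ q3;

  // Only the load pulse is synchronized; sdata is sampled while it is stable
  always @(posedge rclk or negedge rrst_n)
    if (!rrst_n) begin
      rdata  <= {WIDTH{1'b0}};
      rvalid <= 1'b0;
    end else begin
      rvalid <= load;
      if (load) rdata <= sdata;
    end
endmodule
```

Because the enable toggles rather than pulses, it does not have to return to its initial level, which matches the second advantage listed for the MCP formulation.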
1.8.16 Synchronizing counters
When passing multiple signals between clock domains, an important question to ask is,
do I need to sample every value of a signal that is passed from one clock domain to
another? With counters, the answer is frequently, no!
Reference [7] details FIFO design techniques where gray code counters are sampled
between clock domains and intermediate gray count values are often missed. For this
FIFO design, the greater consideration is to make sure that the counters cannot overrun
their boundaries, which could cause missed full and empty flag detection. Even though
the sampled gray count values between clock domains are often missed, the design is
robust and all important gray count values are appropriately sampled. See [7] for details.
Since a valid design might be allowed to skip some count value samples,
can any counter be used to pass count values across a CDC boundary? The
answer is no.
Binary counters
One characteristic of binary counters is that half of all sequential binary incrementing
operations require that two or more counter bits must change. Trying to synchronize
a binary counter across a CDC boundary is the same as trying to synchronize multiple
CDC signals into a new clock domain. If a simple 4-bit binary counter changes from
address 7 (binary 0111) to address 8 (binary 1000), all four counter bits will change at
the same time. If a synchronizing clock edge comes in the middle of this transition, it is
possible that any 4-bit binary pattern could be sampled and synchronized into the new
clock domain.
In a FIFO design, the new synchronized binary value might trigger a false full or empty
flag, or even worse, it might not trigger a real full or empty flag causing data to be lost
due to FIFO overflow or causing invalid data to be read from the FIFO due to an attempt
to read data when the FIFO is really empty.
Gray codes
Gray codes are the safest counters that can be used in multi-clock designs. Gray codes
only allow one bit to change for each clock transition, eliminating the problem associated
with trying to synchronize multiple changing CDC bits across a clock domain.
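A minimal sketch of a gray code counter follows, assuming a binary counter whose next value is converted to gray code and then registered. The module name gray_counter, the parameter N and the inc input are illustrative; only the registered gray output would be passed to a synchronizer in the other clock domain.

```verilog
// Hypothetical N-bit gray code counter; names are illustrative.
module gray_counter (gray, clk, rst_n, inc);
  parameter N = 4;
  output [N-1:0] gray;
  input          clk, rst_n, inc;

  reg  [N-1:0] bin, gray;
  wire [N-1:0] bin_next, gray_next;

  assign bin_next  = bin + inc;                     // binary increment
  assign gray_next = (bin_next >> 1) ^ bin_next;    // binary-to-gray conversion

  // The gray value is registered so that no combinational glitches cross the CDC
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      bin  <= {N{1'b0}};
      gray <= {N{1'b0}};
    end else begin
      bin  <= bin_next;
      gray <= gray_next;
    end
endmodule
```

For the 7-to-8 transition discussed under binary counters, the binary value changes from 0111 to 1000 (four bits), while the gray value changes from 0100 to 1100 (one bit).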
1.9 Synchronous Resets? Asynchronous Resets?
Reference : http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf
1.9.1 General flip-flop coding style notes
Synchronous reset flip-flops with non reset follower flip-flops
Each Verilog procedural block or VHDL process should model only one type of flip-flop.
In other words, a designer should not mix resettable flip-flops with follower flip-flops
(flops with no resets). Follower flip-flops are flip-flops that are simple data shift registers.
In the Verilog code of Listing 1.13, a flip-flop is used to capture data and then its
output is passed through a follower flip-flop. The first stage of this design is reset with a
synchronous reset. The second stage is a follower flip-flop and is not reset, but because
the two flip-flops were inferred in the same procedural block/process, the reset signal
rst_n will be used as a data enable for the second flop. This coding style will generate
extraneous logic as shown in Figure 1.20.
Listing 1.13: Bad Verilog coding style to model dissimilar flip-flops
module badFFstyle (q2, d, clk, rst_n);
  output q2;
  input d, clk, rst_n;
  reg q2, q1;
  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else begin
      q1 <= d;
      q2 <= q1;
    end
endmodule
Figure 1.20: Bad coding style yields a design with an unnecessary loadable flip-flop
The correct way to model a follower flip-flop is with two Verilog procedural blocks as
shown in Listing 1.14. This coding style will generate the logic shown in Figure 1.21.
Listing 1.14: Good Verilog coding style to model dissimilar flip-flops
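module goodFFstyle (q2, d, clk, rst_n);
  output q2;
  input d, clk, rst_n;
  reg q2, q1;
  // First stage: flip-flop with synchronous reset
  always @(posedge clk)
    if (!rst_n) q1 <= 1'b0;
    else q1 <= d;
  // Second stage: follower flip-flop, no reset
  always @(posedge clk)
    q2 <= q1;
endmodule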
Figure 1.21: Two different types of flip-flops, one with synchronous reset and one
without
It should be noted that the extraneous logic generated by the code in Listing 1.13 is
only a result of using a synchronous reset. If an asynchronous reset approach
had been used, then both coding styles would synthesize to the same design
without any extra combinational logic. The generation of different flip-flop styles is
largely a function of the sensitivity lists and if-else statements that are used in the HDL
code.
1.9.2 Synchronous Resets
Synchronous resets are based on the premise that the reset signal will only affect or reset
the state of the flip-flop on the active edge of a clock. The reset can be applied to the
flip-flop as part of the combinational logic generating the d-input to the flip-flop.
Listing 1.15: Correct way to model a flip-flop with synchronous reset using Verilog
module sync_resetFFstyle (q, d, clk, rst_n);
  output q;
  input d, clk, rst_n;
  reg q;
  // Reset is sampled only on the active clock edge
  always @(posedge clk)
    if (!rst_n) q <= 1'b0;
    else q <= d;
endmodule
Advantages of synchronous resets
Synchronous reset will synthesize to smaller flip-flops, particularly if the reset is gated
with the logic generating the d-input. But in such a case, the combinational logic gate
count grows, so the overall gate count savings may not be that significant.
• Synchronous resets generally ensure that the circuit is 100% synchronous.
• Synchronous resets ensure that reset can only occur at an active clock edge. The
clock works as a filter for small reset glitches; however, if these glitches occur near
the active clock edge, the flip-flop could go metastable.
• In some designs, the reset must be generated by a set of internal conditions. A
synchronous reset is recommended for these types of designs because it will filter
the logic equation glitches between clocks.
Disadvantages of synchronous resets
• Synchronous resets may need a pulse stretcher to guarantee a reset pulse width
wide enough to ensure reset is present during an active edge of the clock.
• When working with a gated clock (for example, an SPI interface), the logic cannot
be reset through a synchronous reset while the clock is not running.
1.9.3 Asynchronous resets
Asynchronous resets alone can be very dangerous. The biggest problem with asyn-
chronous resets is the reset release, also called reset removal. Asynchronous reset flip-flops
incorporate a reset pin into the flip-flop design. The reset pin is typically active low (the
flip-flop goes into the reset state when the signal attached to the flip-flop reset pin goes
to a logic low level.)
Listing 1.16: Correct way to model a flip-flop with Asynchronous reset using Verilog
module async_resetFFstyle (q, d, clk, rst_n);
  output q;
  input d, clk, rst_n;
  reg q;
  // Verilog-2001: permits comma-separation
  // @(posedge clk, negedge rst_n)
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else q <= d;
endmodule
Advantages of asynchronous resets
• The biggest advantage to using asynchronous resets is that, as long as the vendor
library has asynchronously resettable flip-flops, the data path is guaranteed to be
clean. Designs that are pushing the limit for data path timing cannot afford to
have added gates and additional net delays in the data path due to logic inserted
to handle synchronous resets. Of course, this argument does not hold if the vendor
library has flip-flops with synchronous reset inputs and the designer can get
Synopsys to actually use those pins.
• Asynchronous resets do not require a free-running clock to reset the logic.
Disadvantages of asynchronous resets
• The biggest problem with asynchronous resets is that they are asynchronous, both
at the assertion and at the de-assertion of the reset. The assertion is a non-issue;
the de-assertion is the issue. If the asynchronous reset is released at or near the
active clock edge of a flip-flop, the output of the flip-flop could go metastable.
• Another problem that an asynchronous reset can have, depending on its source, is
spurious resets due to noise or glitches on the board or system reset.
1.9.4 Asynchronous reset problem
As shown in Figure 1.22, an asynchronous reset signal will be de-asserted asynchronous
to the clock signal. There are two potential problems with this scenario: (1) violation of
reset recovery time and, (2) reset removal happening in different clock cycles for different
sequential elements.
Figure 1.22: Asynchronous reset removal recovery time problem
Reset recovery time
Reset recovery time refers to the time between when reset is de-asserted and the time
that the clock signal goes high again. Recovery time is also referred to as a tsu setup time
of the form, PRE or CLR inactive setup time before CLK edge. Missing a recovery time
can cause signal integrity or metastability problems with the registered data outputs.
Reset removal traversing different clock cycles
When reset removal is asynchronous to the rising clock edge, slight differences in propagation
delays in either or both the reset signal and the clock signal can cause some
registers or flip-flops to exit the reset state before others.
1.9.5 Reset synchronizer
Guideline: EVERY ASIC/FPGA USING AN ASYNCHRONOUS RESET SHOULD
INCLUDE A RESET SYNCHRONIZER CIRCUIT!!
Without a reset synchronizer, the usefulness of the asynchronous reset in the final
system is void even if the reset works during simulation.
The reset synchronizer logic of Figure 1.23 is designed to take advantage of the best of
both asynchronous and synchronous reset styles. An external reset signal asynchronously
resets a pair of master reset flip-flops, which in turn drive the master reset signal
asynchronously through the reset buffer tree to the rest of the flip-flops in the design. The
entire design will be asynchronously reset.
Figure 1.23: Reset Synchronizer block diagram
Reset removal is accomplished by de-asserting the reset signal, which then permits the
d-input of the first master reset flip-flop (which is tied high) to be clocked through a reset
synchronizer. It typically takes two rising clock edges after reset removal to synchronize
removal of the master reset.
The first flip-flop is required to synchronize the reset signal to the clock, while the
second flip-flop is used to remove any metastability that might be caused by the reset
signal being removed asynchronously and too close to the rising clock edge.
A closer examination of the timing now shows that reset distribution timing is the sum
of a clk-to-q propagation delay, the total delay through the reset distribution tree, and
meeting the reset recovery time of the destination registers and flip-flops, as shown in
Figure 1.24.
Figure 1.24: Predictable reset removal to satisfy reset recovery time
Listing 1.17: The code for the reset synchronizer circuit
module async_resetFFstyle2 (rst_n, clk, asyncrst_n);
  output rst_n;
  input clk, asyncrst_n;
  reg rst_n, rff1;
  always @(posedge clk or negedge asyncrst_n)
    if (!asyncrst_n) {rst_n, rff1} <= 2'b0;
    else {rst_n, rff1} <= {rff1, 1'b1};
endmodule
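To show how the synchronized reset would be consumed, the following self-contained sketch places the same reset synchronizer next to one ordinary design flip-flop. The module name rst_sync_demo and the port ext_rst_n are hypothetical; a real design would route rst_n through a reset buffer tree to many flip-flops.

```verilog
// Hypothetical usage sketch: reset synchronizer driving a design flip-flop.
module rst_sync_demo (q, d, clk, ext_rst_n);
  output q;
  input  d, clk, ext_rst_n;
  reg    q, rst_n, rff1;

  // Reset synchronizer: asserted asynchronously, released two clk edges later
  always @(posedge clk or negedge ext_rst_n)
    if (!ext_rst_n) {rst_n, rff1} <= 2'b00;
    else            {rst_n, rff1} <= {rff1, 1'b1};

  // Design flip-flop: still uses an asynchronous reset, but its removal is
  // now timed relative to clk and can be checked against the recovery time
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule
```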
1.9.6 Reset-glitch filtering
One of the biggest issues with asynchronous resets is that they are asynchronous and
therefore carry with them some characteristics that must be dealt with depending on
the source of the reset. With asynchronous resets, any input wide enough to meet the
minimum reset pulse width for a flip-flop will cause the flip-flop to reset. If the reset line
is subject to glitching, this can be a real problem. Presented here is one approach that
will work to filter out the glitches, but it is ugly! This solution requires a digital
delay (meaning the delay will vary with temperature, voltage and process) to filter out
small glitches. The reset input pad should also be a Schmitt-triggered pad to help with
glitch filtering. Figure 1.25 shows the implementation of this approach.
Figure 1.25: Reset glitch filtering
Chapter 2
Xilinx RTL guidelines
Reference :
• http://classes.engineering.wustl.edu/cse460t/images/e/eb/Xst_v6s6.pdf
Advantages of VHDL
• Enforces stricter rules: in particular, it is strongly typed, less permissive, and less error-prone
• Initialization of RAM components in the HDL source code is easier (Verilog initial
blocks are less convenient)
• Package support
• Custom types
• Enumerated types
• No reg versus wire confusion
Advantages of Verilog
• Extension to SystemVerilog
• C-like syntax
• Results in more compact code
• Block commenting
• No heavy component instantiation as in VHDL
2.1 Macro Inference Flow Overview
Macros are inferred during three stages of the XST synthesis flow.
• Basic macros are inferred during HDL Synthesis.
• Complex macros are inferred during Advanced HDL Synthesis.
• Other macros are inferred during Low-Level Optimizations, when timing information
is available to make more fully-informed decisions.
• Macros inferred during Advanced HDL Synthesis are usually the result of an aggregation
of several basic macros previously inferred during HDL Synthesis. In most
cases, the XST inference engine can perform this grouping regardless of hierarchical
boundaries, unless Keep Hierarchy has been set to yes in order to prevent it.
Example: A block RAM is inferred by combining RAM core functionality described
in one user-defined hierarchical block, with a Register described in a different
user-defined hierarchy. This allows you to structure the HDL project in a modular way,
ensuring that XST can recognize relationships among design elements described in
different VHDL entities and Verilog modules.
2.2 Coding Guidelines for Virtex-6, Spartan-6, and
7 Series Devices
These coding guidelines: 1) Minimize slice logic utilization. 2) Maximize circuit performance.
3) Utilize device resources such as block RAM components and DSP blocks.
• Do not set or reset Registers asynchronously.
– Control set remapping becomes impossible.
– Sequential functionality in device resources such as block RAM components
and DSP blocks can be set or reset synchronously only.
– You will be unable to leverage device resources, or they will be
configured sub-optimally.
– Use synchronous initialization instead.
• Use Asynchronous to Synchronous if your own coding guidelines require Registers
to be set or reset asynchronously. This allows you to assess the benefits of using
synchronous set/reset.
• Do not describe Flip-Flops with both a set and a reset.
– No Flip-Flop primitives feature both a set and a reset, whether synchronous
or asynchronous.
– If not rejected by the software, Flip-Flop primitives featuring both a set and
a reset may adversely affect area and performance.
• Always describe the clock enable, set, and reset control inputs of Flip-Flop primitives
as active-High. If they are described as active-Low, the resulting inverter logic
will penalize circuit performance.
• Suggestions for faster and smaller designs
– Use synchronous Set/Reset whenever possible
– Use active-high CE and Set/Reset (no local inverter for secondary control
signals)
– Try to build your design with as few control signals (set, reset and clock enable)
as possible
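The register style these guidelines point toward can be sketched as follows: a flip-flop with an active-high synchronous reset and an active-high clock enable, and no set. The module and signal names are hypothetical.

```verilog
// Hypothetical register following the guidelines above:
// synchronous, active-high reset and active-high clock enable, no set.
module ff_sr_ce (q, d, clk, rst, ce);
  output q;
  input  d, clk, rst, ce;
  reg    q;

  always @(posedge clk)       // clock only: the reset is synchronous
    if (rst)     q <= 1'b0;   // active-high reset has priority
    else if (ce) q <= d;      // active-high clock enable
endmodule
```

Registers that share the same clk, rst and ce belong to the same control set, which is in line with the suggestion to keep the number of distinct control signals low.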
2.2.1 Resource Sharing
XST implements high-level optimizations known as Resource Sharing.
• Resource Sharing minimizes the number of arithmetic operators, resulting in reduced
device utilization.
• Resource Sharing is based on the principle that two similar arithmetic operators
can be implemented with common resources on the device, provided their respective
outputs are never used simultaneously.
• Resource Sharing usually involves creating additional multiplexing logic to select
between factorized inputs. Factorization is performed in a way that minimizes this
logic.
• Resource Sharing is enabled by default, no matter which overall optimization strategy
you have selected.
XST supports Resource Sharing for:
• Adders
• Subtractors
• Adders/Subtractors
• Multipliers
Xilinx recommends that you disable Resource Sharing:
• If circuit performance is your primary optimization goal, and
• You are unable to meet timing goals.
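As an illustration of a description that is a candidate for Resource Sharing, the sketch below contains two adders whose results are never used at the same time. The module name, port names and widths are hypothetical, and whether the tool actually shares the adder depends on its settings and optimization goals.

```verilog
// Hypothetical candidate for Resource Sharing: two additions, mutually exclusive.
module share_add (y, a, b, c, d, sel);
  output [8:0] y;
  input  [7:0] a, b, c, d;
  input        sel;
  reg    [8:0] y;

  // As written: two adders followed by a 2:1 multiplexer on the results.
  // With sharing, a tool may instead build one adder fed by multiplexed
  // inputs, equivalent to: y = (sel ? a : c) + (sel ? b : d);
  always @(a or b or c or d or sel)
    if (sel) y = a + b;
    else     y = c + d;
endmodule
```

The shared form trades one adder for input multiplexers in front of the remaining adder, which lengthens the path through it; this is why disabling the option can help when timing is the primary goal.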
2.2.2 Implementing FSM Components on Block RAM Resources
• By default Finite State Machine (FSM) components are implemented on slice logic.
• To save slice logic resources, instruct XST to implement FSM components in block
RAM.
• Implementing FSM components in block RAM can enhance the performance of
large FSM components
• XST cannot implement an FSM in block RAM when the FSM has an asynchronous
reset
2.2.3 Mapping Logic to Block RAM
If you cannot fit the design onto the device, place some of the logic into unused block
RAM. XST does not automatically decide which logic can be placed into block RAM.
You must instruct XST to do so.
• Isolate the part of the Register Transfer Level (RTL) description to be placed into
block RAM in a separate hierarchical block.
• Apply Map Logic on BRAM to the separate hierarchical block, either directly in
the HDL source code, or in the XST Constraint File (XCF).
Block Ram Criteria
The logic implemented in block RAM must satisfy the following criteria:
• All outputs are registered.
• The block contains only one level of Registers, which are Output Registers.
• All Output Registers have the same control signals.
• The Output Registers have a synchronous reset signal.
• The block does not contain multi-source situations or tristate buffers.
Rules for Clock Signals
• Use one clock signal and one edge.
• Do not generate internal clock signals, because of glitching and clock-skew related
problems.
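A common alternative to an internally generated clock is a clock enable on the single system clock. The sketch below is an assumption about how that could look rather than something taken from the XST guide; the module name, the divide-by-4 ratio and the signal names are hypothetical.

```verilog
// Hypothetical sketch: update a register every 4th clk cycle using a
// clock enable instead of a divided (internally generated) clock.
module ce_divider (q, d, clk, rst);
  output q;
  input  d, clk, rst;
  reg    q;
  reg [1:0] cnt;

  wire tick = (cnt == 2'b11);   // high for one clk cycle out of four

  // Free-running counter on the one system clock
  always @(posedge clk)
    if (rst) cnt <= 2'b00;
    else     cnt <= cnt + 2'b01;

  // Slow-rate register: same clock and same edge, gated by the enable
  always @(posedge clk)
    if (rst)       q <= 1'b0;
    else if (tick) q <= d;
endmodule
```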
Rules for the Hierarchical Registering of Signals
• Register the outputs of leaf-level blocks (sub-blocks).
• Register the inputs to the chip's top level.
2.2.4 Important Notes
• Case statements result in LUTs connected in parallel, whereas if-else statements
result in LUTs connected in series.
• If nested IF statements are necessary, put critical input signals on the first IF
statement.
The critical signal ends up in the last logic stage.
Figure 2.1: Nested IF
• CASE statements in a combinatorial process (VHDL) or always statement (Verilog)
– Latches are inferred if outputs are not defined in all branches
– Use default assignments before the CASE statement to prevent latches
• CASE statements in a sequential process (VHDL) or always statement (Verilog)
– Clock enables are inferred if outputs are not defined in all branches
– This is not wrong, but might generate a long clock enable equation
– Use default assignments before CASE statement to prevent clock enables
• Consider using one-hot select inputs
– Eliminating the select decoding can improve performance (Only one bit used
at each state. Different select line for different state)
• The advantage of using don't-care values for the default is that the synthesizer will
have more flexibility to create a smaller, faster circuit.
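The CASE guideline on default assignments can be illustrated with a small combinational block. The module name, the encoding and the output names are hypothetical; the point is only that every output receives a value on every path, so no latch is inferred.

```verilog
// Hypothetical combinational decoder using default assignments before the case.
module dec_case (y, valid, sel);
  output [3:0] y;
  output       valid;
  input  [1:0] sel;
  reg    [3:0] y;
  reg          valid;

  always @(sel) begin
    // Defaults: every output is assigned before the case statement
    y     = 4'b0000;
    valid = 1'b0;
    case (sel)
      2'b00: begin y = 4'b0001; valid = 1'b1; end
      2'b01: y = 4'b0010;   // valid keeps its default
      2'b10: y = 4'b0100;   // valid keeps its default
      default: ;            // 2'b11: both outputs keep their defaults
    endcase
  end
endmodule
```

In a clocked always block the same default assignments would avoid inferred clock enables instead of latches.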
Figure 2.2: FSM encoding
• Registering the control signals eliminates the net delay between two registers
• High Fanout: Solutions
Figure 2.3: Pipeline Registers
Figure 2.4: Registering High Fanout Signals
– The most likely solution is to duplicate the source of the high-fanout net
2.2.5 FPGA Power Management Design Techniques
• Static and dynamic power are minimized by using hard IP
• Static power is reduced because fewer transistors are used, whereas dynamic power
is reduced because of shorter trace lengths
• Move functions to dedicated hardware resources
– State machines to BRAMs
– Counters to DSP48s
– Registers to SRLs
References
[1] http://www.sunburst-design.com/papers/
[2] William I. Fletcher, An Engineering Approach To Digital Design, New Jersey,
Prentice-Hall, 1980.
[3] Zvi Kohavi, Switching And Finite Automata Theory, Second Edition, New York,
McGraw-Hill Book Company, 1978.
[4] The Programmable Logic Data Book, Xilinx, 1994, pg. 8-171.
[5] Clifford E. Cummings, "Coding And Scripting Techniques For FSM Designs With
Synthesis-Optimized, Glitch-Free Outputs," SNUG'2000 Boston (Synopsys Users
Group Boston, MA, 2000) Proceedings, September 2000.
[6] Real Intent, Inc. (white paper), Clock Domain Crossing Demystified: The Second
Generation Solution for CDC Verification, February 2008 - www.realintent.com
[7] Clifford E. Cummings, Simulation and Synthesis Techniques for Asynchronous FIFO
Design, SNUG 2002 - www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf