Synthesis process, synthesis model, why perform logic synthesis, resource sharing, example of resource sharing, pipelining, power analysis of FPGA-based systems
Unit V: HDL Synthesis Process
1. Dr. Sudhir N. Shelke
Ph.D
Principal, Guru Nanak Institute of
Technology, Nagpur.
Dr Sudhir Shelke Page 1 of 51
2. What is Synthesis?
Synthesis is the process of constructing a gate-level netlist from a model of a circuit described in VHDL.
A synthesis program first generates an RTL netlist (flip-flops, ALUs, multiplexers, interconnected by wires). An RTL module builder is therefore necessary; its purpose is to build each of the required RTL blocks from a library of predefined components (the user-specified target technology).
After a gate-level netlist has been produced, a logic optimizer reads in this netlist and optimizes the circuit for the user-specified area and timing constraints.
4. Why Perform Logic Synthesis?
1. Automatically manages many details of the design process:
• Fewer bugs
• Improved productivity
2. Abstracts the design data (HDL description) from any particular implementation technology
• Designs can be re-synthesized targeting different chip technologies; e.g., first implement in an FPGA, then later in an ASIC
3. In some cases, leads to a more optimal design than could be achieved by manual means (e.g., logic optimization)
Why Not Logic Synthesis?
1. May lead to less than optimal designs in some cases
5. A variety of general and ad hoc (special-case) methods:
1. Instantiation: maintains a library of primitive modules (AND, OR, etc.) and user-defined modules.
2. “Macro expansion”/substitution: a large set of language operators (+, -, Boolean operators, etc.) and constructs (if-else, case) expand into special circuits.
3. Inference: special patterns are detected in the language description and treated specially (e.g., inferring memory blocks from variable declarations and read/write statements, FSM detection).
4. Logic optimization: Boolean operations are grouped and optimized with logic minimization techniques.
5. Structural reorganization: advanced techniques including sharing of operators, retiming of circuits (moving FFs), and others.
7. 1. A circuit can be described in many different ways, not all of which may be synthesizable. This is because HDLs were designed primarily as simulation languages, not for synthesis.
2. There is no standardized subset of VHDL for synthesis.
3. There is no direct object in VHDL that means a latch or a flip-flop; therefore, each synthesis system provides a different mechanism to model a flip-flop or a latch.
8. In VHDL, a signal, or a variable declared in a process, retains its value through the entire simulation run, which can cause memory to be inferred.
Example:
signal A, B, C, Z : bit;
…….
No_memory : process (A, B, C)
  variable temp : bit;
begin
  temp := A and B;
  Z <= temp or C;
end process;
VHDL semantics say that the variable temp retains its value through the entire simulation run. Here, however, temp is always assigned before it is read, so no storage is needed and a synthesis tool builds purely combinational logic (Z = (A and B) or C).
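For contrast with the No_memory process, here is a minimal sketch of a process in which a variable is read before it is written, so its retained value is actually used. The entity name var_memory and its ports are hypothetical and chosen only for illustration; a synthesis tool would typically infer a storage element for temp in addition to the register for q.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration; not part of the original slides.
entity var_memory is
  port (clk, d : in  std_logic;
        q      : out std_logic);
end entity;

architecture rtl of var_memory is
begin
  with_memory : process (clk)
    variable temp : std_logic := '0';
  begin
    if rising_edge(clk) then
      q    <= temp;  -- temp is read first: this is the value stored at the previous clock edge
      temp := d;     -- temp is written afterwards, so it must be held until the next edge
    end if;
  end process;
end architecture;
```

Swapping the two statements inside the if (write temp, then read it) would remove the need to store temp, as in the No_memory example.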
9. Resource sharing is an optimization technique that uses a single functional block (such as an adder or comparator) to implement several operators in the HDL code. Use resource sharing to improve design performance by reducing the gate count and the routing congestion.
The following operators can be shared either with instances of the same operator or with the other operators on the same line:
*
+ -
> >= < <=
For example, a + operator can be shared with instances of other + operators or with - operators.
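As an illustration of code that is a candidate for resource sharing, the sketch below contains two + operators that are never needed at the same time, because only one branch of the if is active. The entity name share_candidate, the 8-bit width, and the use of numeric_std are assumptions for this example; whether a tool actually shares the adder depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration; not part of the original slides.
entity share_candidate is
  port (a, b, c, d : in  unsigned(7 downto 0);
        sel        : in  std_logic;
        z          : out unsigned(7 downto 0));
end entity;

architecture rtl of share_candidate is
begin
  process (a, b, c, d, sel)
  begin
    if sel = '1' then
      z <= a + b;   -- first + operator
    else
      z <= c + d;   -- second + operator, mutually exclusive with the first
    end if;
  end process;
end architecture;
```

Without sharing, this describes two adders followed by a multiplexer on the results; with sharing, a single adder is preceded by multiplexers on its inputs.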
10. You can implement arithmetic functions (+, -, magnitude comparators) with gates, Synopsys DesignWare functions, or Xilinx DesignWare functions.
Resource sharing adds logic levels to multiplex the inputs so that one block can implement more than one function.
Since resource sharing allows you to reduce the number of design resources, the device area required for your design is also decreased. The area that is used for a shared resource depends on the type and bit width of the shared operation.
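The extra multiplexing levels mentioned above can be made explicit by writing the shared form by hand. The following is a sketch under assumed names (shared_adder, op1, op2) and an assumed 8-bit width; it selects the operands first and then uses one adder for both cases.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration; not part of the original slides.
entity shared_adder is
  port (a, b, c, d : in  unsigned(7 downto 0);
        sel        : in  std_logic;
        z          : out unsigned(7 downto 0));
end entity;

architecture rtl of shared_adder is
  signal op1, op2 : unsigned(7 downto 0);
begin
  -- Input multiplexers: the additional logic level in front of the adder
  op1 <= a when sel = '1' else c;
  op2 <= b when sel = '1' else d;
  -- One adder serves both a + b and c + d
  z <= op1 + op2;
end architecture;
```

The trade-off is the one the slide describes: one adder instead of two saves area, while the operand multiplexers sit in the path in front of the adder.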
13. Introduction
VHDL is a hardware modeling language used to model digital circuits.
Digital circuits can be either combinational or sequential.
Combinational logic circuits: implement Boolean functions whose output depends only on the present inputs.
Sequential logic circuits: implement circuits whose output depends on the present inputs and the history of the inputs, i.e., circuits having storage elements.
14. Introduction (cont.)
Synthesis tools translate the VHDL code to a gate-level netlist representing the actual hardware gates (AND, OR, NOT, flip-flops, etc.).
Only a subset of the language is synthesizable.
A model can be either
Synthesizable: used for both simulation and synthesis
Non-synthesizable: used for simulation only
[Figure: synthesizable VHDL shown as a subset of the VHDL standard]
15. Combinational Logic
Example 1: 4x1 Multiplexer
library IEEE;
use IEEE.std_logic_1164.all;

entity mux_case is
  port (a, b, c, d : in  std_logic;
        sel        : in  std_logic_vector(1 downto 0);
        f          : out std_logic);
end entity;

architecture rtl of mux_case is
begin
  process (a, b, c, d, sel) is
  begin
    case sel is
      when "00"   => f <= a;
      when "01"   => f <= b;
      when "10"   => f <= c;
      when "11"   => f <= d;
      when others => f <= a;  -- covers non-binary std_logic values such as 'X' or 'U'
    end case;
  end process;
end architecture;
16. • What is the impact of removing some signals from the sensitivity list, as shown in Example 2?
Example 2: 4x1 Multiplexer (same entity as Example 1; the sensitivity list is reduced from (a, b, c, d, sel) to (a, sel))
architecture rtl of mux_case is
begin
  process (a, sel) is
  begin
    case sel is
      when "00"   => f <= a;
      when "01"   => f <= b;
      when "10"   => f <= c;
      when "11"   => f <= d;
      when others => f <= a;
    end case;
  end process;
end architecture;
17. Example 1 (full sensitivity list) versus Example 2 (sensitivity list reduced to a and sel):
• No impact on the synthesis results; however, the simulation results differ. In simulation, Example 2 does not update f when only b, c, or d changes.
• Synthesis tools don’t use the sensitivity list to determine the logic, but simulation tools depend on the sensitivity list to execute the process.
• Example 2 suffers from a problem called “simulation-synthesis mismatch”.
18. Combinational Logic
• VHDL 2008 introduced the keyword "all", which implicitly adds all read signals to the sensitivity list to avoid “simulation-synthesis mismatch”.
Example 3:
architecture rtl of mux_case is
begin
  process (all) is
  begin
    case sel is
      when "00"   => f <= a;
      when "01"   => f <= b;
      when "10"   => f <= c;
      when "11"   => f <= d;
      when others => f <= a;
    end case;
  end process;
end architecture;
Golden rule of thumb
• To avoid “simulation-synthesis mismatch” problems when modeling combinational logic, add all read signals to the sensitivity list.
19. Combinational Logic
Example 4: Adder-Subtractor
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;  -- non-standard Synopsys package; not required for the integer ports used here

entity add_sub is
  port (a, b      : in  integer;
        result    : out integer;
        operation : in  std_logic);
end entity;

architecture behave of add_sub is
begin
  process (a, b, operation)
  begin
    if (operation = '1') then
      result <= a + b;
    else
      result <= a - b;
    end if;
  end process;
end architecture;
20. Combinational Logic
Consider that someone tries to re-use that code to implement an adder with an enable. He modifies the add_sub example: removes the else branch and renames the operation port to enable, as shown below. How would these changes affect the logic?
Example 5:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;

entity adder is
  port (a, b   : in  integer;
        result : out integer;
        enable : in  std_logic);
end entity adder;

architecture behave of adder is
begin
  process (a, b, enable)
  begin
    if (enable = '1') then
      result <= a + b;
    end if;
  end process;
end architecture;
21. Combinational Logic
This will infer a latch in Example 5, because we didn’t specify what should happen to result when enable isn’t equal to '1'.
Simulation and synthesis tools will just keep the value as is, i.e., result latches its last value.
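One common way to avoid the latch in Example 5 is to give result a value on every path through the process, for instance with a default assignment before the if. The sketch below assumes that driving 0 when enable is not '1' is acceptable, which the slides do not specify; the entity name adder_no_latch and the integer ranges are also assumptions added for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant of Example 5; the value driven when enable /= '1' is an assumption.
entity adder_no_latch is
  port (a, b   : in  integer range 0 to 255;
        enable : in  std_logic;
        result : out integer range 0 to 511);
end entity;

architecture behave of adder_no_latch is
begin
  process (a, b, enable)
  begin
    result <= 0;            -- default assignment: result is defined on every path
    if enable = '1' then
      result <= a + b;      -- overrides the default when enabled
    end if;
  end process;
end architecture;
```

If the intent really is to hold the last sum while enable is low, the usual alternative is a clocked process, so that the storage becomes a flip-flop with a clock enable rather than a latch.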
22. Combinational Logic
In the example below, the "11" value of the "sel" signal is not listed as a case choice, hence signal "F" is not assigned a value in this case.
A latch is inferred in this example. Probably that wasn’t needed.
Example 6:
library ieee;
use ieee.std_logic_1164.all;

entity incomplete_case is
  port (sel  : in  std_logic_vector(1 downto 0);
        A, B : in  std_logic;
        F    : out std_logic);
end entity;

architecture rtl of incomplete_case is
begin
  process (sel, A, B)
  begin
    case (sel) is
      when "00" =>
        F <= A;
      when "01" =>
        F <= B;
      when "10" =>
        F <= A xor B;
      when others => null;  -- F keeps its previous value, so a latch is inferred
    end case;
  end process;
end architecture;
23. Skills Check
Do you think a latch would be inferred in the example below?
Example 7:
library ieee;
use ieee.std_logic_1164.all;

entity incomplete_assignment is
  port (sel    : in  std_logic_vector(1 downto 0);
        A, B   : in  std_logic;
        O1, O2 : out std_logic);
end entity;

architecture rtl of incomplete_assignment is
begin
  process (sel, A, B)
  begin
    case (sel) is
      when "00" =>
        O1 <= A;
        O2 <= A and B;
      when "01" =>
        O1 <= B;
        O2 <= A xor B;
      when "10" =>
        O1 <= A xor B;
      when "11" =>
        O2 <= A or B;
      when others =>
        O1 <= '0';
        O2 <= '0';
    end case;
  end process;
end architecture;
24. Skills Check (Soln.)
Do you think a latch would be inferred in Example 7?
• Latches are inferred for both signals "O1" and "O2".
• Though the case is complete and there is no "null" statement, "O1" and "O2" are not assigned in all of the case branches ("O2" is missing in the "10" branch and "O1" in the "11" branch). This is called “incomplete signal assignment”.
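A minimal sketch of one way to remove the latches from Example 7 is to assign defaults to both outputs at the top of the process, so that every branch leaves O1 and O2 defined. The default value '0' for the branches that previously left an output unassigned is an assumption, as is the entity name complete_assignment; the slides do not say what those outputs should be.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical latch-free variant of Example 7; the '0' defaults are an assumption.
entity complete_assignment is
  port (sel    : in  std_logic_vector(1 downto 0);
        A, B   : in  std_logic;
        O1, O2 : out std_logic);
end entity;

architecture rtl of complete_assignment is
begin
  process (sel, A, B)
  begin
    -- Default assignments: both outputs are driven on every path
    O1 <= '0';
    O2 <= '0';
    case sel is
      when "00" =>
        O1 <= A;
        O2 <= A and B;
      when "01" =>
        O1 <= B;
        O2 <= A xor B;
      when "10" =>
        O1 <= A xor B;   -- O2 keeps its default
      when "11" =>
        O2 <= A or B;    -- O1 keeps its default
      when others =>
        null;            -- defaults apply
    end case;
  end process;
end architecture;
```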
26. • Let's model the well-known D flip-flop with outputs Q and nQ and see the synthesis results.
Example 8:
library ieee;
use ieee.std_logic_1164.all;

entity d_ff is
  port (d, clk, rst : in  std_logic;
        Q, nQ       : out std_logic);
end entity;

architecture behav of d_ff is
begin
  process (clk)
  begin
    if (rising_edge(clk)) then
      if (rst = '1') then     -- synchronous reset
        Q  <= '0';
        nQ <= '1';            -- complement of Q during reset
      else
        Q  <= d;
        nQ <= not (d);
      end if;
    end if;
  end process;
end behav;
27. Synthesis result for Example 8: two flip-flops?!
Because Q and nQ are both assigned inside the clocked process, a separate flip-flop is inferred for each of them.
Change the code to have only one flip-flop.
28. • Let's model the well-known D flip-flop with outputs Q and nQ and see the synthesis results.
Example 9:
Yep…That's what we want!
library ieee;
use ieee.std_logic_1164.all;

entity d_ff is
  port (d, clk, rst : in  std_logic;
        Q, nQ       : out std_logic);
end entity;

architecture behav of d_ff is
  signal Q_int : std_logic;
begin
  process (clk)
  begin
    if (rising_edge(clk)) then
      if (rst = '1') then
        Q_int <= '0';
      else
        Q_int <= d;
      end if;
    end if;
  end process;

  -- Both outputs are derived from the single registered signal
  Q  <= Q_int;
  nQ <= not (Q_int);
end behav;
32. Deduce what the code below models.
Use a synthesis tool to validate your answer.
library ieee;
use ieee.std_logic_1164.all;

entity unknown is
  port (x : out std_logic;
        y : in  std_logic_vector(3 downto 0);
        c : in  integer);
end entity;

architecture behave of unknown is
begin
  x <= y(c);
end behave;
33. Power Estimation and Analysis of FPGA-Based Systems
34. As devices get larger and faster, power consumption goes up
First-generation FPGAs had
  • Lower performance
  • Lower power requirements
  • No package power concerns
Today’s FPGAs have
  • Much higher performance
  • Higher power requirements
  • Package power limit concerns
  • A System Monitor that provides active monitoring of the die temperature (refer to the Virtex-6 User Guide for more information)
[Figure: real-world design power consumption versus performance (MHz) for low-density and high-density designs, with the package power limit PMAX marked]
35. High-speed and high-density designs require more power, leading to higher junction temperatures
Package thermal limits exist
  • 125 °C for plastic
  • 150 °C for ceramic
Power directly limits
  • System performance
  • Design density
  • Package options
  • Device reliability
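The link between power and junction temperature can be made concrete with the usual first-order thermal model. This relation and the numbers below are not taken from the slides: the symbols T_A (ambient temperature), θ_JA (junction-to-ambient thermal resistance), and P (dissipated power) are the standard ones, and the values are hypothetical, chosen only to show the arithmetic.

```latex
T_J = T_A + \theta_{JA} \cdot P

% Hypothetical values: T_A = 50\,^{\circ}\mathrm{C}, \theta_{JA} = 10\,^{\circ}\mathrm{C/W}
P = 6\,\mathrm{W}: \quad T_J = 50 + 10 \cdot 6 = 110\,^{\circ}\mathrm{C} \quad (\text{below the } 125\,^{\circ}\mathrm{C} \text{ plastic-package limit})
P = 8\,\mathrm{W}: \quad T_J = 50 + 10 \cdot 8 = 130\,^{\circ}\mathrm{C} \quad (\text{above that limit})
```

Under these assumed values, an extra 2 W is enough to cross the plastic-package limit, which is why power constrains performance, density, and package choice.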
36. Estimating power consumption is a complex calculation
Power consumption of an FPGA is almost exclusively dynamic
Power consumption is dependent on design and is affected by
  • Output loading
  • System performance (switching frequency)
  • Design density (number of interconnects)
  • Design activity (percent of interconnects switching)
  • Logic block and interconnect structure
  • Supply voltage
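As a rough guide to how the factors listed above combine, the dynamic power of CMOS logic is commonly approximated by the textbook relation below. It is not a formula from these slides, and some references include an additional factor of 1/2 depending on how the activity factor α is defined. Here α stands for design activity, C for the switched capacitance (loading, density, and interconnect structure), V_DD for the supply voltage, and f for the switching frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f

% Scaling examples, with all other factors held constant:
f: 100\,\mathrm{MHz} \to 200\,\mathrm{MHz} \;\Rightarrow\; P_{\text{dyn}} \text{ doubles}
V_{DD}: 1.0\,\mathrm{V} \to 0.9\,\mathrm{V} \;\Rightarrow\; P_{\text{dyn}} \text{ scales by } 0.9^{2} = 0.81
```

The quadratic dependence on supply voltage is why voltage has a larger effect on dynamic power than a proportional change in frequency or activity.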
37. Power calculations can be performed at three distinct phases of the design cycle
Concept phase: A rough estimate of power can be calculated based on estimates of logic capacity and activity rates
  • Use the Xilinx Power Estimator spreadsheet
Design phase: Power can be calculated more accurately based on detailed information about how the design is implemented in the FPGA
  • Use the XPower Analyzer
System Integration phase: Power is calculated in a lab environment
  • Use actual instrumentation
Accurate power calculation at an early stage in the design cycle will result in fewer problems later
38. Accurate activity rates (also known as toggle rates) are required for meaningful power calculations
Clocks and input signals have an absolute frequency
Synchronous logic nets use a percentage activity rate
  • 100% indicates that a net is expected to change state on every clock cycle
  • Allows you to adjust the primary clock frequency and see the effect on power consumption
  • Can be set globally to an average activity rate, or on groups or individual nets
Logic elements also use a percentage activity rate
  • Based on the activity rate of the output signals of the logic element
  • Logic elements have capacitance
39. Excel spreadsheets with power estimation formulas built in
  • Enter design data in white boxes
  • Power estimates are shown in gray boxes
Sheets
  • Summary (device totals)
  • Clock, Logic, I/O, Block RAMs, DSP, MMCM
  • GTX, TEMAC, PCIE
To download, go to http://www.support.xilinx.com -> Technology Solutions -> Power
  • Download the XPE spreadsheet for your device family
  • XPE is not installed with the ISE software
  • The Power Solutions page has numerous resources
40. Summary and Quiescent power
White boxes allow you to enter design data
Gray boxes show you the power estimates
Tabs at the bottom allow you to enter power information per device resource (not shown)
Settings reviews device, system, and environment information
On-Chip Power breaks the estimated power consumption down by device resource
41. Summary and Quiescent power
Power Supply reviews what power sources will be necessary
Summary describes your system's total power and estimated junction temperature
45. A utility for estimating the power consumption and junction temperature of FPGA and CPLD devices
Reads an implemented design (NCD file) and timing constraint data
You supply activity rates
  • Clock frequencies
  • Activity rates for nets, logic elements, and output pins
  • Capacitive loading on output pins
  • Power supply data and ambient temperature
  • Detailed design activity data from simulation (VCD file)
The XPower Analyzer calculates the total average power consumption and generates a report
46. Expand Implement Design, then Place & Route
Double-click XPower Analyzer to launch the XPower utility in interactive mode
Use the Generate Power Data process to create reports using VCD files or TCL scripts
47. Estimated junction temperature
Reporting, settings, and thermal information are all placed in one utility
As you manipulate system characteristics, you update the generated report
Report Navigator allows quick navigation to the various reports and functions of the utility
48. Produced as a simple text file
The file is given a .pwr extension
The report is more detailed and stored in one text file
Some what-if analysis information is included
Includes a Power Improvement Guide
49. Pipelining and Retiming
Adding registers along a path
  • splits combinational logic into multiple cycles
  • increases clock rate
  • increases throughput
  • increases latency
50. Pipelining and Retiming
Delay, d, of the slowest combinational stage determines performance
  • clock period = d
Throughput = 1/d : rate at which outputs are produced
Latency = N·d : number of stages (N) × clock period
Pipelining increases circuit utilization
Registers slow down data, synchronize data paths
Wave-pipelining
  • no pipeline registers: waves of data flow through the circuit
  • relies on equal-delay circuit paths: no short paths
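A short worked example of the three relations above, using hypothetical numbers (a slowest stage delay of d = 5 ns and N = 4 pipeline stages) and ignoring register overhead:

```latex
\text{clock period} = d = 5\,\mathrm{ns}
\text{throughput} = \frac{1}{d} = \frac{1}{5\,\mathrm{ns}} = 200 \times 10^{6}\ \text{results per second}
\text{latency} = N \cdot d = 4 \cdot 5\,\mathrm{ns} = 20\,\mathrm{ns}
```

With these assumed values, each individual result takes 20 ns to emerge, but a new result is produced every 5 ns once the pipeline is full.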
51. Pipelining and Retiming
Where is the best place to add registers?
  • splitting combinational logic
  • overhead of registers (propagation delay and setup time requirements)
What about cycles in the data path?
Example: 16-bit adder, add 8 bits in each of two cycles
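The 16-bit adder example can be sketched as a two-stage pipeline: the first stage adds the low bytes and registers the carry together with the high-byte operands, and the second stage adds the high bytes plus that carry. The entity name pipelined_add16, the signal names, and the use of numeric_std are assumptions for illustration, and the carry out of bit 15 is simply discarded.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration; not part of the original slides.
entity pipelined_add16 is
  port (clk  : in  std_logic;
        a, b : in  unsigned(15 downto 0);
        sum  : out unsigned(15 downto 0));
end entity;

architecture rtl of pipelined_add16 is
  signal lo_sum         : unsigned(8 downto 0);  -- low-byte sum, bit 8 is the carry
  signal a_hi_r, b_hi_r : unsigned(7 downto 0);  -- high bytes delayed by one cycle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: add the low 8 bits and keep the carry; delay the high bytes
      lo_sum <= resize(a(7 downto 0), 9) + resize(b(7 downto 0), 9);
      a_hi_r <= a(15 downto 8);
      b_hi_r <= b(15 downto 8);

      -- Stage 2: add the high 8 bits plus the registered carry from stage 1
      sum(15 downto 8) <= a_hi_r + b_hi_r + lo_sum(8 downto 8);
      sum(7 downto 0)  <= lo_sum(7 downto 0);
    end if;
  end process;
end architecture;
```

Each stage now contains only an 8-bit carry chain, so the clock period can be shorter than for a single 16-bit adder, at the cost of a two-cycle latency and the extra registers noted on the slide.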