Introduction and Background
Multiplier Architectures
Results
Conclusion
Implementation and Comparison of Softcore
Multiplier Architectures for FPGAs
Shahid Abbas
Project work (Master of Science)
Fachgebiet Digitaltechnik
Universität Kassel
August 22, 2014
Outline
1 Introduction and Background
Motivation
Fundamentals of Binary Multiplication
Xilinx Virtex-6 Slice
FloPoCo Library and Bit Heaps
2 Multiplier Architectures
Target Specific Implementation
LUT-Based Multipliers
3 Results
Simulation
Synthesis Results
4 Conclusion
Motivation
Fast multiplication for signal processing
Limited number of DSP blocks in FPGAs [1]
Fixed word size
A large multiplier is used even for small word sizes
Fixed allocation
Place-and-route issues
Use of FPGA logic blocks for multipliers of any word size
Softcore multipliers that work in conjunction with DSP multipliers
Fundamentals of Binary Multiplication
1 Calculation of the partial products
2 Addition of the partial products with proper shifting
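$A \times B = 2^{0} B_0 A + 2^{1} B_1 A + 2^{2} B_2 A + 2^{3} B_3 A$, with $A = A_3 \ldots A_0$ and $B = B_3 \ldots B_0$
Figure: Binary 4×4-bit Multiplication (Step 1: partial products $2^{i} B_i A$; Step 2: addition)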
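The two steps can be sketched in a few lines of Python. This is only an illustrative software model of the arithmetic, not the hardware described in the slides; the function names `partial_products` and `multiply` are hypothetical.

```python
def partial_products(a: int, b: int, width: int = 4) -> list[int]:
    """Step 1: one shifted partial product 2^i * B_i * A per bit of B."""
    products = []
    for i in range(width):
        b_i = (b >> i) & 1              # bit i of the multiplier B
        products.append((b_i * a) << i)  # B_i * A, weighted by 2^i
    return products


def multiply(a: int, b: int, width: int = 4) -> int:
    """Step 2: add all shifted partial products."""
    return sum(partial_products(a, b, width))
```

For example, `partial_products(0b1011, 0b0101)` yields one entry per bit of B, and only the entries for the set bits B0 and B2 are non-zero.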
Xilinx Virtex-6 Slice [2]
Each Configurable Logic Block (CLB) contains two slices
Each slice contains four Look-Up Tables (LUTs), eight flip-flops, multiplexers, and carry-propagation logic
Each LUT provides one or two outputs
Figure: Xilinx Virtex-6 Slice (four LUTs, each driving a multiplexer of the carry chain from c_in to c_out)
FloPoCo Library and Bit Heaps
FloPoCo (Floating-Point Cores) is a C++ framework that generates synthesizable VHDL code [3] [4].
A bit heap is a data structure that holds the unevaluated sum of any number of bits, each weighted by a power of two [5].
Equally weighted bits are aligned in one column, since their order is irrelevant to the sum
Figure: Bit-Heap Structure for 16×16-Bit Multiplier (stages shown: before first compression, before 3-bit height additions, before final addition; timing labels 0.530 ns and 1.061 ns)
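A toy model may help to make the bit-heap idea concrete. The sketch below is written in Python for brevity and is not FloPoCo's actual C++ BitHeap class; the class `BitHeap`, its methods, and the helper `multiplier_heap` are hypothetical names. It assumes that compression uses only full adders (3:2 compressors), which is a simplification of what a real generator does.

```python
from collections import defaultdict


class BitHeap:
    """Unevaluated sum of bits; columns[w] holds the bits of weight 2^w."""

    def __init__(self):
        self.columns = defaultdict(list)

    def add_bit(self, weight: int, bit: int) -> None:
        self.columns[weight].append(bit & 1)

    def value(self) -> int:
        """Evaluate the sum the heap represents."""
        return sum(bit << w for w, bits in self.columns.items() for bit in bits)

    def height(self) -> int:
        """Number of bits in the tallest column."""
        return max((len(bits) for bits in self.columns.values()), default=0)

    def compress_once(self) -> None:
        """One compression stage: every group of three equally weighted
        bits is replaced by a sum bit (same column) and a carry bit
        (next column), as a full adder would do."""
        nxt = defaultdict(list)
        for w, bits in self.columns.items():
            groups = len(bits) // 3
            for k in range(groups):
                total = sum(bits[3 * k:3 * k + 3])
                nxt[w].append(total & 1)
                nxt[w + 1].append(total >> 1)
            nxt[w].extend(bits[3 * groups:])  # leftover bits pass through
        self.columns = nxt


def multiplier_heap(a: int, b: int, width: int) -> BitHeap:
    """Throw all partial-product bits a_i AND b_j into column i + j."""
    heap = BitHeap()
    for i in range(width):
        for j in range(width):
            heap.add_bit(i + j, (a >> i) & (b >> j) & 1)
    return heap
```

Repeating `compress_once()` until `height()` is at most 2 leaves two rows that a single final adder can sum, which corresponds to the "before final addition" stage in the figure.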
Target Specific Implementation [6]
A design that fits the logic blocks best gives better performance
Figure: Full Adder Implementation with Multiplexer and XOR-Gates (inputs a, b, c_in; outputs sum, c_out)
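The full adder of the figure can be modelled as follows. This is a minimal sketch assuming the usual carry-chain arrangement: one XOR forms the propagate signal, a second XOR forms the sum, and the multiplexer selects the carry. The function name `full_adder` is hypothetical.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Full adder built from two XOR gates and one 2:1 multiplexer."""
    p = a ^ b                   # propagate signal (first XOR)
    s = p ^ c_in                # sum output (second XOR)
    c_out = c_in if p else a    # multiplexer: pass c_in when p = 1, else a
    return s, c_out
```

When p = 0 the inputs a and b are equal, so either of them gives the correct carry; the sketch picks a.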
Figure: Slice configuration of 4-LUTs for Partial Product and Addition (LUT inputs a0 b0 … a7 b7, sum outputs S0 to S3, carry chain from c_in to c_out)
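One plausible reading of this figure is that each LUT ANDs two operand-bit pairs and XORs the results, while the carry chain completes the addition of two adjacent partial-product rows. The Python sketch below models that reading only; it is an assumption, not a transcription of the slide, and the function name `add_two_rows` and its parameters are hypothetical.

```python
def add_two_rows(a: int, b_lo: int, b_hi: int, width: int) -> int:
    """Compute b_lo*A + 2*b_hi*A bit by bit, the way a LUT plus
    carry-chain column would: the LUT forms both AND terms and their
    XOR, the chain supplies the sum XOR and the carry multiplexer.
    Assumes 0 <= a < 2**width and b_lo, b_hi in {0, 1}."""
    carry = 0                                   # c_in = 0
    result = 0
    for k in range(width + 1):
        x = ((a >> k) & 1) & b_lo               # bit k of row b_lo * A
        y = (((a >> (k - 1)) & 1) & b_hi) if k >= 1 else 0  # bit k of 2 * b_hi * A
        p = x ^ y                               # LUT output (propagate)
        result |= (p ^ carry) << k              # sum bit S_k
        carry = carry if p else x               # carry multiplexer
    result |= carry << (width + 1)              # final c_out
    return result
```

Chaining such row pairs and summing their outputs (for example on a bit heap) would give the full product.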
Target Specific Implementation (Automated)
Figure: 8×8-bit Multiplier Implementation in FloPoCo (the partial products $2^{0} B_0 A$ … $2^{7} B_7 A$ of the 8-bit operands are computed, re-arranged, and mapped to 3-LUT and 4-LUT slices with c_in = 0; c_out goes to the bit heap; the arrangement is held in a vector<vector<pair<int, int>>>)
Target Specific Implementation (For 8×8-bit Multiplier)
Automated Implementation
vector<vector<pair<int, int>>>
Manual interconnection of Slices
Addition using Bit Heaps
Addition using Arithmetic Expressions
Target Specific Implementation (Manual)
Figure: 8×8-bit Multiplier with Manual Interconnection of Slices (labels: re-arrangement, slice outputs to bit heap, AND-gate)
LUT-Based Multipliers [7] [5]
The product of two numbers can be obtained by adding the bit-shifted results of smaller multipliers:
$$A = 2^{n} A_1 + A_0 \tag{1}$$
$$B = 2^{n} B_1 + B_0 \tag{2}$$
$$A \times B = 2^{2n} A_1 B_1 + 2^{n} (A_1 B_0 + A_0 B_1) + A_0 B_0 \tag{3}$$
A basic n × m-bit multiplier can be instantiated multiple times
The results of the instances are added after proper shifting
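Equations (1) to (3) can be checked with a short Python model. This is only a sketch of the arithmetic; the names `split`, `multiply_split`, and the `small_mul` parameter (standing in for one instance of the basic multiplier) are hypothetical.

```python
def split(x: int, n: int) -> tuple[int, int]:
    """Return (x1, x0) such that x = 2^n * x1 + x0, as in Eq. (1) and (2)."""
    return x >> n, x & ((1 << n) - 1)


def multiply_split(a: int, b: int, n: int, small_mul=lambda x, y: x * y) -> int:
    """Eq. (3): four small products, shifted and added."""
    a1, a0 = split(a, n)
    b1, b0 = split(b, n)
    high = small_mul(a1, b1) << (2 * n)
    middle = (small_mul(a1, b0) + small_mul(a0, b1)) << n
    low = small_mul(a0, b0)
    return high + middle + low
```

With n = 4, an 8×8-bit product is assembled from four 4×4-bit products.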
LUT-Based Multipliers [7] [5]
3×3-LUT-based multipliers (need six LUTs for the 6 output bits)
3×2-LUT-based multipliers (need three LUTs for the 5 output bits)
1×4-LUT-based multipliers
Figure: 3×3-LUT Multiplier (3-bit A, 3-bit B, 6-bit Y)
Figure: 3×2-LUT Multiplier (3-bit A, 2-bit B, 5-bit Y)
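The contents of such LUTs are simply the truth tables of the product bits. The Python sketch below builds one table per output bit; it is an illustration under the assumption that the LUT address is the concatenation of A and B, and the names `build_lut_tables` and `lut_multiply` are hypothetical.

```python
def build_lut_tables(wa: int, wb: int) -> list[list[int]]:
    """One truth table per output bit of a wa x wb-bit multiplier.
    The table address is the concatenation {A, B}."""
    n_out = wa + wb
    tables = [[0] * (1 << (wa + wb)) for _ in range(n_out)]
    for a in range(1 << wa):
        for b in range(1 << wb):
            addr = (a << wb) | b
            product = a * b
            for k in range(n_out):
                tables[k][addr] = (product >> k) & 1
    return tables


def lut_multiply(tables: list[list[int]], a: int, b: int, wb: int) -> int:
    """Look up every output bit and reassemble the product Y."""
    addr = (a << wb) | b
    return sum(table[addr] << k for k, table in enumerate(tables))
```

For the 3×3 case this gives six tables with 64 entries each; for the 3×2 case, five tables with 32 entries each.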
LUT-Based Multipliers (3×3-LUT based 8×7 Multiplier)
Figure: 8×7-bit LUT-Multiplier Implementation in FloPoCo (A bits 0 to 7 and B bits 0 to 6 are covered by nine tiles i to ix: four 3×3, two 2×3, two 3×1, and one 2×1)
$$AB = A_{0..2}B_{0..2} + 2^{3}\,(A_{3..5}B_{0..2} + A_{0..2}B_{3..5}) + 2^{6}\,(A_{6..7}B_{0..2} + A_{3..5}B_{3..5} + A_{0..2}B_{6}) + 2^{9}\,(A_{6..7}B_{3..5} + A_{3..5}B_{6}) + 2^{12} A_{6..7}B_{6} \tag{4}$$
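Equation (4) can be verified exhaustively with a small Python model of the tiling. The chunk boundaries are taken from the equation; the names `field`, `A_CHUNKS`, `B_CHUNKS`, and `multiply_tiled` are hypothetical, and each tile product stands in for one small LUT multiplier.

```python
A_CHUNKS = [(0, 2), (3, 5), (6, 7)]   # bit ranges of the 8-bit operand A
B_CHUNKS = [(0, 2), (3, 5), (6, 6)]   # bit ranges of the 7-bit operand B


def field(x: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive) of x."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def multiply_tiled(a: int, b: int) -> int:
    """Sum of all nine tile products, each shifted by the weight
    2^(a_lo + b_lo) of its position, as in Eq. (4)."""
    total = 0
    for a_lo, a_hi in A_CHUNKS:
        for b_lo, b_hi in B_CHUNKS:
            tile = field(a, a_lo, a_hi) * field(b, b_lo, b_hi)
            total += tile << (a_lo + b_lo)
    return total
```

The shift amounts 0, 3, 6, 9, and 12 that arise from `a_lo + b_lo` match the powers of two in Eq. (4).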
Simulation
1 Word sizes are even and equal
2 Word sizes are even and unequal
3 Width of the larger word is even and the other is odd
4 Width of the larger word is odd and the other is even
5 Word sizes are odd and unequal
6 Word sizes are odd and equal
Eight designs for each of the above specifications
48 designs for each architecture
Self-checking testbenches were generated using the FloPoCo function emulate(TestCase *tc)
The TestBench 10000 option was used to generate 10000 random test cases during core generation
Simulation in ModelSim
Synthesis Results
Figure: Comparison of Architectures on the basis of Speed (for N×M-bit). Plot "Speed vs. Complexity (N×M)": frequency (MHz) over complexity (N×M) for the Target Specific, 3×3 LUT, 1×4 LUT, and 3×2 LUT multipliers; annotation: f_max = 906.62 MHz for the Target Specific Multiplier
Figure: Comparison of Architectures on the basis of Speed (for N×N-bit). Plot "Speed vs. Complexity (N×N)": frequency (MHz) over complexity (N) for the Target Specific, 3×3 LUT, 1×4 LUT, and 3×2 LUT multipliers
Figure: Comparison of Architectures on the basis of Slice usage (for N×M-bit). Plot "Slice Usage vs. Complexity (N×M)": number of slices over complexity (N×M) for the Target Specific, 3×3 LUT, 1×4 LUT, and 3×2 LUT multipliers
Figure: Comparison of Architectures on the basis of Slice usage (for N×N-bit). Plot "Slice Usage vs. Complexity (N×N)": number of slices over complexity (N) for the Target Specific, 3×3 LUT, 1×4 LUT, and 3×2 LUT multipliers
Average Performance
Table: Average values of parameters for different architectures
Architecture      No. of Flip-Flops   No. of LUTs   No. of Slices   Frequency (MHz)
Target Specific   1144                1615          419             346.36
3×3-LUT           1422                1893          491             301.03
3×2-LUT           1730                1962          513             264.95
1×4-LUT           2019                2340          610             259.98
Automatic Vs Manual Interconnection of Slices (8×8-bit)
Table: Automatic vs. Manual Routing between Slices
Interconnection             No. of FFs   No. of LUTs   No. of Slices   Frequency (MHz)
Automatic                   56           74            21              686.81
Manual (Bit Heap)           22           74            20              256.61
Manual (Without Bit Heap)   40           59            16              414.08
Figure: Bit Heap Structure for Automatic Interconnection of Slices (stages shown: before first compression, before 3-bit height additions, before final addition)
Figure: Bit Heap Structure for Manual Interconnection of Slices (stages shown: before first compression, before 3-bit height additions, before final addition; timing label 0.530 ns)
Conclusion
Fast multipliers with minimal resource usage can be implemented by choosing an appropriate architecture.
The Target Specific Implementation showed the best results, with the highest average speed and the lowest resource consumption.
The automated generation of this approach can be modified by introducing an AND-gate for corner elements.
Slice usage can be reduced by interconnecting the slices manually, at the cost of speed.
References
[1] Ian Kuon and J. Rose.
Measuring the Gap Between FPGAs and ASICs.
Computer-Aided Design of Integrated Circuits and Systems, 26:203–215, February 2007.
[2] Xilinx.
Virtex-6 FPGA, Configurable Logic Block User Guide, UG364 (v1.2).
http://www.xilinx.com/support/documentation/user_guides/ug364.pdf, 2012.
[3] F. de Dinechin and B. Pasca.
Designing Custom Arithmetic Data Paths with FloPoCo.
Design and Test of Computers, 28:18–27, 2011.
[4] Florent de Dinechin.
Tutorial held at HiPEAC’2013 “Building Custom Arithmetic Operators with the FloPoCo Generator”.
http://perso.citi-lab.fr/fdedinec/recherche/2013-HiPEAC-Tutorial-FloPoCo/flopoco-tutorial.pdf,
2013.
[5] Brunie N., de Dinechin F., Istoan M., Sergent G., Illyes K., and Popa B.
Arithmetic core generation using bit heaps.
In Proc. IEEE FPL ’2013, pages 1–8, Porto, Portugal, 2–4, 2013.
[6] H. ParandehAfshar and P. Ienne.
Measuring and Reducing the Performance Gap between Embedded and Soft Multipliers on FPGAs.
In Proc. IEEE FPL ’2011, pages 225–231, Chania, Greece, 5–7, 2011.
[7] F. de Dinechin and B. Pasca.
Large multipliers with fewer DSP blocks.
In Proc. IEEE FPL ’2009, pages 225–231, Chania, Greece, Aug 31-Sept 2 2011.
Thanks for your attention!
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
 
[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI Ecosystem[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI Ecosystem
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)
 
Implementation of Soft-core processor on FPGA (Final Presentation)
Implementation of Soft-core processor on FPGA (Final Presentation)Implementation of Soft-core processor on FPGA (Final Presentation)
Implementation of Soft-core processor on FPGA (Final Presentation)
 
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
Demosaic RTL for ISP workflow
Demosaic RTL for ISP workflowDemosaic RTL for ISP workflow
Demosaic RTL for ISP workflow
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
Differences of Deep Learning Frameworks
Differences of Deep Learning FrameworksDifferences of Deep Learning Frameworks
Differences of Deep Learning Frameworks
 
f37-book-intarch-pres-pt2.ppt
f37-book-intarch-pres-pt2.pptf37-book-intarch-pres-pt2.ppt
f37-book-intarch-pres-pt2.ppt
 
f37-book-intarch-pres-pt2.ppt
f37-book-intarch-pres-pt2.pptf37-book-intarch-pres-pt2.ppt
f37-book-intarch-pres-pt2.ppt
 

Recently uploaded

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 

Recently uploaded (20)

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 

Implementation and Comparison of Softcore Multiplier Architectures for FPGAs

  • 1. Implementation and Comparison of Softcore Multiplier Architectures for FPGAs. Shahid Abbas, project work (Projektarbeit, Master of Science), Fachgebiet Digitaltechnik, Universität Kassel, August 22, 2014.
  • 5. Outline: (1) Introduction and Background: Motivation; Fundamentals of Binary Multiplication; Xilinx Virtex-6 Slice; FloPoCo Library and Bit Heaps. (2) Multiplier Architectures: Target Specific Implementation; LUT-Based Multipliers. (3) Results: Simulation; Synthesis Results. (4) Conclusion.
  • 13. Motivations: fast multiplication for signal processing; limited number of DSP blocks in an FPGA [1]; fixed word size; a big multiplier is used even for a small word size; fixed allocation; place-and-route issues; use of FPGA logic blocks for multipliers of any word size; softcore multipliers that work in conjunction with DSP multipliers.
  • 15. Fundamentals of Binary Multiplication: (1) calculation of the partial products; (2) addition of the partial products with proper shifting. Figure: Binary 4×4-bit Multiplication (operands $A = A_3 \ldots A_0$ and $B = B_3 \ldots B_0$; step 1 forms the partial products $2^0 \cdot B_0 \cdot A$, $2^1 \cdot B_1 \cdot A$, $2^2 \cdot B_2 \cdot A$, $2^3 \cdot B_3 \cdot A$; step 2 adds them).
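The two steps on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the presentation; the function names `partial_products` and `multiply` are made up for the example.

```python
def partial_products(a: int, b: int, width: int = 4) -> list[int]:
    """Step 1: one shifted partial product 2^i * B_i * A per bit of B."""
    return [((b >> i) & 1) * (a << i) for i in range(width)]


def multiply(a: int, b: int, width: int = 4) -> int:
    """Step 2: add the shifted partial products."""
    return sum(partial_products(a, b, width))
```

For A = 13 and B = 11 (binary 1011), the third partial product is zero because B_2 = 0, and the remaining three add up to the product.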
  • 18. Xilinx Virtex-6 Slice [2]: A Configurable Logic Block (CLB) contains two slices. Each slice contains four Look-Up Tables (LUTs), eight flip-flops, multiplexers, and carry-propagation logic. Each LUT provides one or two outputs. Figure: Xilinx Virtex-6 Slice (four LUTs feeding a carry chain from c_in to c_out).
  • 21. FloPoCo Library and Bit Heaps: Floating-Point Cores (FloPoCo) is a C++ framework for generating synthesizable VHDL code [3] [4]. A bit heap is a data structure that holds an unevaluated sum of any number of bits, each weighted by a power of two [5]. Equally weighted bits are aligned in a column, since their order is irrelevant to the sum. Figure: Bit-Heap Structure for 16×16-Bit Multiplier (stages: before first compression, before 3-bit height additions, before final addition; timing labels 0.530 ns and 1.061 ns).
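To make the bit-heap idea concrete, here is a minimal Python sketch. It is not FloPoCo's implementation: the class `BitHeap` and its methods are hypothetical names, and the compression step uses plain full adders (3:2 compressors) only, whereas a real generator may choose other compressors for the target device.

```python
from collections import defaultdict


class BitHeap:
    """Unevaluated sum of bits; column w holds the bits of weight 2**w."""

    def __init__(self):
        self.columns = defaultdict(list)

    def add_bit(self, weight: int, bit: int) -> None:
        # Order inside a column does not matter for the sum.
        self.columns[weight].append(bit & 1)

    def add_product(self, a: int, b: int, a_width: int, b_width: int) -> None:
        """Throw every AND term a_i * b_j into the column of weight i + j."""
        for i in range(a_width):
            for j in range(b_width):
                self.add_bit(i + j, (a >> i) & (b >> j) & 1)

    def height(self) -> int:
        return max((len(bits) for bits in self.columns.values()), default=0)

    def compress_once(self) -> None:
        """One compression stage: each group of three bits in a column
        becomes one sum bit (same column) and one carry bit (next column)."""
        new_columns = defaultdict(list)
        for weight, bits in sorted(self.columns.items()):
            i = 0
            while len(bits) - i >= 3:
                x, y, z = bits[i:i + 3]
                new_columns[weight].append(x ^ y ^ z)
                new_columns[weight + 1].append((x & y) | (x & z) | (y & z))
                i += 3
            new_columns[weight].extend(bits[i:])  # leftover bits pass through
        self.columns = new_columns

    def value(self) -> int:
        """Evaluate the sum that the heap represents."""
        return sum(sum(bits) << weight for weight, bits in self.columns.items())
```

Calling `compress_once()` until `height()` is at most 2 leaves two rows that a single ordinary adder can sum, which corresponds to the "before final addition" stage named in the figure.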
  • 22. Target Specific Implementation [6]: The better a design fits into the logic blocks, the better its performance. Figure: Full Adder Implementation with Multiplexer and XOR-Gates (inputs a, b, c_in; outputs sum, c_out).
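A full adder built from a multiplexer and XOR gates can be modelled as below. This is a sketch of the general carry-chain technique, assuming the usual arrangement in which the propagate signal a XOR b selects the carry; which multiplexer input carries a or b in the slide's figure is not recoverable from the transcript, so that detail is an assumption.

```python
def full_adder_mux(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Full adder from two XOR gates and one 2:1 multiplexer."""
    propagate = a ^ b                  # first XOR (would sit in the LUT)
    total = propagate ^ c_in           # second XOR produces the sum bit
    c_out = c_in if propagate else a   # multiplexer: pass carry or generate it
    return total, c_out


def ripple_add(x: int, y: int, width: int) -> int:
    """Chain `width` full adders, c_out of one stage feeding c_in of the next."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder_mux((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result | (carry << width)
```

When the propagate signal is 0, both inputs are equal, so either of them is the correct carry-out; that is why a single multiplexer suffices.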
  • 23. Target Specific Implementation [6]. Figure: Slice configuration of 4-LUTs for Partial Product and Addition (four LUTs with inputs a0, b0 through a7, b7; sum outputs S0 to S3; carry chain from c_in to c_out).
  • 24. Target Specific Implementation (Automated): vector<vector<pair<int, int>>>. Figure: 8×8-bit Multiplier Implementation in FloPoCo (8-bit operands A and B before multiplication; partial-product calculation of $2^0 \cdot B_0 \cdot A$ through $2^7 \cdot B_7 \cdot A$; re-arrangement; 3-LUT and 4-LUT slices with c_in = 0 and c_out passed to the bit heap).
  • 29. Target Specific Implementation (for 8×8-bit multiplier): automated implementation; vector<vector<pair<int, int>>>; manual interconnection of slices; addition using bit heaps; addition using arithmetic expressions.
  • 30. Target Specific Implementation (Manual). Figure: 8×8-bit Multiplier with Manual Interconnection of Slices (re-arrangement stage, an AND gate, and five outputs to the bit heap).
  • 33. LUT-Based Multipliers [7] [5]: The product of two numbers can be obtained by adding the bit-shifted results of smaller multipliers: $A = 2^n A_1 + A_0$ (1), $B = 2^n B_1 + B_0$ (2), $A \times B = 2^{2n} A_1 B_1 + 2^n (A_1 B_0 + A_0 B_1) + A_0 B_0$ (3). A basic n×m-bit multiplier can be instantiated multiple times, and the results of the instances are added after proper shifting.
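Equations (1) to (3) can be checked numerically with a short Python sketch. The helper names `split` and `multiply_split` are illustrative only; the four small products stand in for four instances of a basic multiplier.

```python
def split(x: int, n: int) -> tuple[int, int]:
    """Return (x1, x0) such that x = 2**n * x1 + x0, as in Eq. (1) and (2)."""
    return x >> n, x & ((1 << n) - 1)


def multiply_split(a: int, b: int, n: int) -> int:
    """Eq. (3): four small products, shifted and added."""
    a1, a0 = split(a, n)
    b1, b0 = split(b, n)
    return ((a1 * b1) << (2 * n)) + ((a1 * b0 + a0 * b1) << n) + a0 * b0
```

With n = 4, an 8×8-bit product is assembled from four 4×4-bit products; applying the same split again to each half yields smaller basic multipliers.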
  • 36. LUT-Based Multipliers [7] [5]: 3×3-LUT based multipliers (need 6 LUTs for 6 output bits); 3×2-LUT based multipliers (need 3 LUTs for 5 output bits); 1×4-LUT based multipliers. Figure: 3×3-LUT Multiplier (3-bit A, 3-bit B, 6-bit Y). Figure: 3×2-LUT Multiplier (3-bit A, 2-bit B, 5-bit Y).
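The LUT counts quoted on this slide can be reproduced with a small Python sketch. It assumes that every product bit is stored as one truth table over the concatenated inputs, and that a 6-input LUT can deliver two outputs when the function needs at most five inputs (in line with the "one or two outputs per LUT" remark on the slice slide). The function names are hypothetical.

```python
def multiplier_truth_tables(a_bits: int, b_bits: int) -> list[list[int]]:
    """One truth table per product bit, indexed by the word (b << a_bits) | a."""
    out_bits = a_bits + b_bits
    tables = [[] for _ in range(out_bits)]
    for word in range(1 << (a_bits + b_bits)):
        a = word & ((1 << a_bits) - 1)
        b = word >> a_bits
        product = a * b
        for k in range(out_bits):
            tables[k].append((product >> k) & 1)
    return tables


def lut6_count(a_bits: int, b_bits: int) -> int:
    """LUTs needed under the assumption stated above (two outputs per LUT
    if the multiplier has at most five input bits, otherwise one)."""
    inputs = a_bits + b_bits
    outputs = a_bits + b_bits
    if inputs > 6:
        raise ValueError("does not fit into a single 6-input LUT level")
    outputs_per_lut = 2 if inputs <= 5 else 1
    return -(-outputs // outputs_per_lut)  # ceiling division
```

Under these assumptions a 3×3 multiplier takes six LUTs and a 3×2 multiplier takes three, matching the figures on the slide.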
  • 38. LUT-Based Multipliers (3×3-LUT based 8×7 multiplier). Figure: 8×7-bit LUT-Multiplier Implementation in FloPoCo (nine tiles i to ix over A bits 0 to 7 and B bits 0 to 6: four 3×3, two 2×3, two 3×1, and one 2×1). $AB = A_{0..2}B_{0..2} + 2^3 (A_{3..5}B_{0..2} + A_{0..2}B_{3..5}) + 2^6 (A_{6..7}B_{0..2} + A_{3..5}B_{3..5} + A_{0..2}B_{6}) + 2^9 (A_{6..7}B_{3..5} + A_{3..5}B_{6}) + 2^{12} A_{6..7}B_{6}$ (4)
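Equation (4) can be verified exhaustively with the Python sketch below, which evaluates the nine tile products term by term. The helper `bits` and the function name `multiply_8x7` are illustrative, not taken from the presentation.

```python
def bits(x: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of x as an integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def multiply_8x7(a: int, b: int) -> int:
    """Sum of the nine tile products of Eq. (4), grouped by shift amount."""
    return (
        bits(a, 0, 2) * bits(b, 0, 2)
        + ((bits(a, 3, 5) * bits(b, 0, 2) + bits(a, 0, 2) * bits(b, 3, 5)) << 3)
        + ((bits(a, 6, 7) * bits(b, 0, 2)
            + bits(a, 3, 5) * bits(b, 3, 5)
            + bits(a, 0, 2) * bits(b, 6, 6)) << 6)
        + ((bits(a, 6, 7) * bits(b, 3, 5) + bits(a, 3, 5) * bits(b, 6, 6)) << 9)
        + ((bits(a, 6, 7) * bits(b, 6, 6)) << 12)
    )
```

Each shift equals the sum of the starting bit positions of the two operand slices, which is why tiles on the same anti-diagonal of the figure share a factor.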
  • 49. Simulation
      Test designs were grouped by operand word size:
      1. Word sizes are even and equal
      2. Word sizes are even and unequal
      3. Width of the larger word is even and the other is odd
      4. Width of the larger word is odd and the other is even
      5. Word sizes are odd and unequal
      6. Word sizes are odd and equal
      Eight designs were generated for each of the above specifications, giving 48 designs per architecture.
      Self-checking testbenches were generated using the FloPoCo function emulate(TestCase *tc).
      The TestBench 10000 option was used to generate 10000 random test cases during core generation.
      Simulation was run in ModelSim.
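The six word-size cases and the idea of a self-checking random test can be sketched in a few lines of Python. This is only an illustration, not the FloPoCo flow: `classify`, `shift_add_multiply`, and `self_check` are hypothetical names, the operands are assumed to be unsigned, and a simple shift-and-add routine stands in for the generated multiplier core.

```python
import random

def classify(n, m):
    """Return the case number (1-6) from the slide for an n x m-bit multiplier."""
    large, small = max(n, m), min(n, m)
    if n == m:
        return 1 if n % 2 == 0 else 6      # equal widths: even -> 1, odd -> 6
    if large % 2 == 0 and small % 2 == 0:
        return 2                           # even and unequal
    if large % 2 == 0:
        return 3                           # larger word even, other odd
    if small % 2 == 0:
        return 4                           # larger word odd, other even
    return 5                               # odd and unequal

def shift_add_multiply(x, y, m):
    """Hypothetical design under test: unsigned shift-and-add over the m bits of y."""
    acc = 0
    for i in range(m):
        if (y >> i) & 1:
            acc += x << i
    return acc

def self_check(n, m, cases=10000, seed=0):
    """Compare the design under test against the reference product on random inputs."""
    rng = random.Random(seed)
    for _ in range(cases):
        x = rng.getrandbits(n)
        y = rng.getrandbits(m)
        if shift_add_multiply(x, y, m) != x * y:
            return False
    return True
```

With eight designs in each of the six cases, the count of 48 designs per architecture follows directly (6 × 8 = 48).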
  • 50. Synthesis Results
      [Figure: Speed vs. Complexity (N×M). Comparison of architectures on the basis of speed (for N×M-bit). x-axis: Complexity (N×M); y-axis: Frequency (MHz). Series: Target Specific Multiplier, 3×3 LUT Multiplier, 1×4 LUT Multiplier, 3×2 LUT Multiplier. Annotation: f_max = 906.62 MHz in Target Specific Multiplier.]
  • 51. Synthesis Results
      [Figure: Speed vs. Complexity (N×N). Comparison of architectures on the basis of speed (for N×N-bit). x-axis: Complexity (N); y-axis: Frequency (MHz). Series: Target Specific Multiplier, 3×3 LUT Multiplier, 1×4 LUT Multiplier, 3×2 LUT Multiplier.]
  • 52. Synthesis Results
      [Figure: Slice Usage vs. Complexity (N×M). Comparison of architectures on the basis of slice usage (for N×M-bit). x-axis: Complexity (N×M); y-axis: Number of Slices. Series: Target Specific Multiplier, 3×3 LUT Multiplier, 1×4 LUT Multiplier, 3×2 LUT Multiplier.]
  • 53. Synthesis Results
      [Figure: Slice Usage vs. Complexity (N×N). Comparison of architectures on the basis of slice usage (for N×N-bit). x-axis: Complexity (N); y-axis: Number of Slices. Series: Target Specific Multiplier, 3×3 LUT Multiplier, 1×4 LUT Multiplier, 3×2 LUT Multiplier.]
  • 54. Synthesis Results: Average Performance
      Table: Average values of parameters for different architectures
      Architecture      No. of Flip-Flops   No. of LUTs   No. of Slices   Frequency (MHz)
      Target Specific   1144                1615          419             346.36
      3×3-LUT           1422                1893          491             301.03
      3×2-LUT           1730                1962          513             264.95
      1×4-LUT           2019                2340          610             259.98
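To make the average-performance table easier to compare, the following Python sketch ranks the architectures and expresses each metric relative to the Target Specific design. The numbers are copied from the table; the dictionary layout and the helper names `percent_change` and `best` are assumptions made for illustration.

```python
# Average values copied from the table (flip-flops, LUTs, slices, fmax in MHz).
averages = {
    "Target Specific": {"ffs": 1144, "luts": 1615, "slices": 419, "fmax_mhz": 346.36},
    "3x3-LUT":         {"ffs": 1422, "luts": 1893, "slices": 491, "fmax_mhz": 301.03},
    "3x2-LUT":         {"ffs": 1730, "luts": 1962, "slices": 513, "fmax_mhz": 264.95},
    "1x4-LUT":         {"ffs": 2019, "luts": 2340, "slices": 610, "fmax_mhz": 259.98},
}

def percent_change(metric, variant, baseline="Target Specific"):
    """Percentage difference of `variant` from `baseline` for one metric."""
    base = averages[baseline][metric]
    return 100.0 * (averages[variant][metric] - base) / base

def best(metric, smaller_is_better=True):
    """Name of the architecture with the best average value for `metric`."""
    pick = min if smaller_is_better else max
    return pick(averages, key=lambda name: averages[name][metric])

if __name__ == "__main__":
    for name in averages:
        print(name,
              f"slices {percent_change('slices', name):+.1f}%",
              f"fmax {percent_change('fmax_mhz', name):+.1f}%")
```

On these averages the Target Specific design has both the fewest slices and the highest frequency, which is the basis for the ranking stated in the conclusion.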
  • 55. Synthesis Results: Automatic vs. Manual Interconnection of Slices (8×8-bit)
      Table: Automatic vs. manual routing between slices
      Interconnection             No. of FFs   No. of LUTs   No. of Slices   Frequency (MHz)
      Automatic                   56           74            21              686.81
      Manual (Bit Heap)           22           74            20              256.61
      Manual (Without Bit Heap)   40           59            16              414.08
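The trade-off in the 8×8-bit routing table can be quantified with a short Python sketch. The values are taken from the table; the `routing` dictionary and the `tradeoff` helper are hypothetical names used only for this illustration.

```python
# (FFs, LUTs, slices, fmax in MHz) copied from the 8x8-bit routing table.
routing = {
    "Automatic":                 (56, 74, 21, 686.81),
    "Manual (Bit Heap)":         (22, 74, 20, 256.61),
    "Manual (Without Bit Heap)": (40, 59, 16, 414.08),
}

def tradeoff(variant, reference="Automatic"):
    """Slices saved and fraction of fmax retained, relative to `reference`."""
    _, _, slices_ref, fmax_ref = routing[reference]
    _, _, slices_var, fmax_var = routing[variant]
    return {
        "slices_saved": slices_ref - slices_var,
        "fmax_ratio": fmax_var / fmax_ref,
    }

if __name__ == "__main__":
    for name in routing:
        print(name, tradeoff(name))
```

Manual interconnection without a bit heap saves 5 of the 21 slices but keeps only about 60% of the automatic design's maximum frequency, so the area gain comes at a clear cost in speed.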
  • 56. Synthesis Results: Automatic vs. Manual Interconnection of Slices (8×8-bit)
      [Figure: Bit Heap Structure for Automatic Interconnection of Slices. Stages shown: before first compression, before 3-bit height additions, before final addition.]
      [Figure: Bit Heap Structure for Manual Interconnection of Slices. Stages shown: before first compression (annotation: 0.530 ns), before 3-bit height additions, before final addition.]
  • 60. Conclusion
      - Fast multipliers with minimal resource usage can be implemented by choosing an appropriate architecture.
      - The Target Specific implementation showed the best results, with the highest average speed and the lowest resource consumption.
      - Automated generation of this approach can be modified by introducing an AND gate for the corner elements.
      - Slice usage can be improved by manual interconnection of slices, at the cost of speed.
  • 61. References
      [1] Ian Kuon and J. Rose. Measuring the Gap Between FPGAs and ASICs. Computer-Aided Design of Integrated Circuits and Systems, 26:203–215, February 2007.
      [2] Xilinx. Virtex-6 FPGA, Configurable Logic Block User Guide, UG364 (v1.2). http://www.xilinx.com/support/documentation/user_guides/ug364.pdf, 2012.
      [3] F. de Dinechin and B. Pasca. Designing Custom Arithmetic Data Paths with FloPoCo. Design and Test of Computers, 28:18–27, 2011.
      [4] Florent de Dinechin. Tutorial held at HiPEAC’2013 “Building Custom Arithmetic Operators with the FloPoCo Generator”. http://perso.citi-lab.fr/fdedinec/recherche/2013-HiPEAC-Tutorial-FloPoCo/flopoco-tutorial.pdf, 2013.
      [5] Brunie N., de Dinechin F., Istoan M., Sergent G., Illyes K., and Popa B. Arithmetic core generation using bit heaps. In Proc. IEEE FPL ’2013, pages 1–8, Porto, Portugal, 2–4, 2013.
      [6] H. ParandehAfshar and P. Ienne. Measuring and Reducing the Performance Gap between Embedded and Soft Multipliers on FPGAs. In Proc. IEEE FPL ’2011, pages 225–231, Chania, Greece, 5–7, 2011.
      [7] F. de Dinechin and B. Pasca. Large multipliers with fewer DSP blocks. In Proc. IEEE FPL ’2009, pages 225–231, Chania, Greece, Aug 31-Sept 2 2011.
  • 62. Thanks for your attention!