SlideShare a Scribd company logo
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
OUTLINE
• Introduction:
– Problem statement
– Background
– Goals
• Co-processing units generation:
– Approach and baseline Multi-Dataflow Composer
– Template Interface Layer: hardware and automatic composition
• Performance assessment
– Use-case scenario
– Results
• Final remarks and future directions
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
OUTLINE: INTRODUCTION
• Introduction:
– Problem statement
– Background
– Goals
• Co-processing units generation:
– Approach and baseline Multi-Dataflow Composer
– Template Interface Layer: hardware and automatic composition
• Performance assessment
– Use-case scenario
– Results
• Final remarks and future directions
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
PROBLEM STATEMENT
• HIGH PERFORMANCES real time applications:
– Media players, video calling...
• UP-TO-DATE SOLUTIONS
– Support for the last audio/video codecs, file formats...
• MORE INTEGRATED FEATURES in mobile devices:
– MP3, Camera, Video, GPS...
• PORTABILITY
• LONG BATTERY LIFE
– Convenient form factor, affordable price...
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
PROBLEM STATEMENT
• HIGH PERFORMANCES real time applications:
– Media players, video calling...
• UP-TO-DATE SOLUTIONS
– Support for the last audio/video codecs, file formats...
• MORE INTEGRATED FEATURES in mobile devices:
– MP3, Camera, Video, GPS...
• PORTABILITY
• LONG BATTERY LIFE
– Convenient form factor, affordable price...
• DATAFLOW MODEL OF COMPUTATION
– Modularity and parallelism  EASIER INTEGRATION AND FAVOURED RE-USABILITY
• COARSE-GRAINED RECONFIGURABILITY
– Flexibility and resource sharing  MULTI-APPLICATION PORTABLE DEVICES
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
PROBLEM STATEMENT
• HIGH PERFORMANCES real time applications:
– Media players, video calling...
• UP-TO-DATE SOLUTIONS
– Support for the last audio/video codecs, file formats...
• MORE INTEGRATED FEATURES in mobile devices:
– MP3, Camera, Video, GPS...
• PORTABILITY
• LONG BATTERY LIFE
– Convenient form factor, affordable price...
• DATAFLOW MODEL OF COMPUTATION
– Modularity and parallelism  EASIER INTEGRATION AND FAVOURED RE-USABILITY
• COARSE-GRAINED RECONFIGURABILITY
– Flexibility and resource sharing  MULTI-APPLICATION PORTABLE DEVICES
Automated are fundamental to guarantee
. Dealing with
systems, in particular for ,
state of the art still lacks in providing a broadly accepted solution.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: CG RECONFIGURABILITY
• High flexibility bit-level reconfiguration
• Slow and memory expensive configuration phase
• Medium flexibility word-level reconfiguration
• Fast configuration phase
FG CG
Bit-level Word-level
Flexibility  
Reconf.
Speed  
Config.
Storage  
ASIC
GPP
DSP
RP
Performance
Flexibility
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: CG RECONFIGURABILITY
• High flexibility bit-level reconfiguration
• Slow and memory expensive configuration phase
• Medium flexibility word-level reconfiguration
• Fast configuration phase
FG CG
Bit-level Word-level
Flexibility  
Reconf.
Speed  
Config.
Storage  
ASIC
GPP
DSP
RP
Performance
Flexibility
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: CG RECONFIGURABILITY
• High flexibility bit-level reconfiguration
• Slow and memory expensive configuration phase
• Medium flexibility word-level reconfiguration
• Fast configuration phase
FG CG
Bit-level Word-level
Flexibility  
Reconf.
Speed  
Config.
Storage  
ASIC
GPP
DSP
RP
Performance
Flexibility
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
BACKGROUND: THE DATAFLOW MOC
• Directed graph of actors (functional units)
• Actors exchange tokens (data packets) through dedicated
channels
• Explicit the intrinsic application parallelism.
• Modularity favours model re-usability/adaptivity.
• I/O ports number
• I/O ports depth
• I/O ports burst of tokens
actions
state
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
• High-level dataflow combination tool, front-end of the
Multi-Dataflow Composer tool.
• Concrete definition of the hardware template and of the
dataflow-based mapping strategy.
• Integration of the complete synthesis flow.
GOALS AND WORK EVOLUTION
• Implementation of a coarse-grained multi-standard
decoder.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
GOAL: deployment of
, contemporarily tackling and
.
• High-level dataflow combination tool, front-end of the
Multi-Dataflow Composer tool.
• Concrete definition of the hardware template and of the
dataflow-based mapping strategy.
• Integration of the complete synthesis flow.
GOALS AND WORK EVOLUTION
• Implementation of a coarse-grained multi-standard
decoder.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
OUTLINE: CO-PROCESSOR GENERATION
• Introduction:
– Problem statement
– Background
– Goals
• Co-processing units generation:
– Approach and baseline Multi-Dataflow Composer
– Template Interface Layer: hardware and automatic composition
• Performance assessment
– Use-case scenario
– Results
• Final remarks and future directions
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: BASIC APPROACH
1:1
2:1
Dataflow Description
Coarse Grained Reconfigurable
Hardware Platform
Dataflow Descriptions
Coarse Grained Hardware Platform
parser
crc32
inflate
input
output
parser
crc32
inflate
input
output
parser
crc32_x
inflate
input
output
parser
crc32
inflate
input
output
parser
crc32
inflate
input output
crc32_x
SBox
SBox
SBox
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
P
C
I
input
output
P
CX
I
input
output
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
P
C
I
input output
CX
S1
S2
S0
SBox D1 D2
S0 0 1
S1 0 1
S2 0 1
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: THE TOOL
parser
crc32
inflate
input output
crc32_x
SBox
SBox
SBox
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
MDC: AUTOMATIC GENERATION
P
C
I
input output
CX
S1
S2
S0
[1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly
reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012.
[2] N. Carta, C. Sau, D. Pani, F. Palumbo and L. Raffo, A Coarse-Grained Reconfigurable Approach for Low-Power
Spike Sorting Architectures, IEEE/EMBS Conference on Neural Engineering, 2013.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
THE TEMPLATE INTERFACE LAYER (TIL)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL FRONT-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL BACK-END
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL:AUTOMATIC GENERATION
XDF
MDCTOOL
XDFPARSER
XDF TXT
TILCOMPOSITION
HDL
HDL
CG-DP
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL:AUTOMATIC GENERATION
XDF
MDCTOOL
XDF
in1
out2
out1
in1
in2
in3
out2
in1
in2
in3
out2
out1
HDL
CG-DP
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
TIL:AUTOMATIC GENERATION
XDF
MDCTOOL
XDFPARSER
XDF TXT
in1
in2
in3
out2
out1
I ports number = 3
O ports number = 2
I/O ports depth = X
I/O ports burst of tokens = N
HDL
CG-DP
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
HDL
CG-DP
TIL:AUTOMATIC GENERATION
XDF
MDCTOOL
XDFPARSER
XDF TXT
TILCOMPOSITION
HDL
I ports number = 3
I ports number = 2
I/O ports depth = X
I/O ports burst of tokens = N
MDC-BASED
COARSE-GRAINED
RECONFIGURABLE
DATAPATH
LOCAL
MEM.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
HDL
CG-DP
TIL:AUTOMATIC GENERATION
XDF
MDCTOOL
XDFPARSER
XDF TXT
TILCOMPOSITION
HDL
resource
number of resource instances
type
3 inputs
2 outputs
N inputs
M outputs
register 12 2+N*2+M*2 port-dependent
counter 6 1+N+M port-dependent
mux 2x1 4 4 extendable
mux Nx1 2 2 extendable
mux Mx1 3 3 extendable
demux 1xN 3 3 extendable
demux 1xM 3 3 extendable
FIFO 2 M port-dependent
port selector 2 2 extendable
addr generator 2 2 extendable
FSM 2 2 fixed
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
OUTLINE: PERFORMANCE ASSESSMENT
• Introduction:
– Problem statement
– Background
– Goals
• Co-processing units generation:
– Approach and baseline Multi-Dataflow Composer
– Template Interface Layer: hardware and automatic composition
• Performance assessment
– Use-case scenario
– Results
• Final remarks and future directions
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: TIL ADAPTIVITY
number of I/O ports
(value=M=N)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
1 153 (0,1) 277 (0,1) 1 (3,1) 65 (22,6) 243,8
2 261 (0,1) 430 (0,2) 1 (3,1) 65 (22,6) 243,8
4 475 (0,2) 751 (0,4) 1 (3,1) 65 (22,6) 239,0
8 901 (0,4) 1558 (0,8) 1 (3,1) 65 (22,6) 208,9
16 1757 (0,9) 2760 (1,3) 1 (3,1) 65 (22,6) 197,6
32 4353 (1,7) 5339 (2,6) 2 (6,3) 65 (22,6) 158,4
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: TIL ADAPTIVITY
number of I/O ports
(value=M=N)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
1 153 (0,1) 277 (0,1) 1 (3,1) 65 (22,6) 243,8
2 261 (0,1) 430 (0,2) 1 (3,1) 65 (22,6) 243,8
4 475 (0,2) 751 (0,4) 1 (3,1) 65 (22,6) 239,0
8 901 (0,4) 1558 (0,8) 1 (3,1) 65 (22,6) 208,9
16 1757 (0,9) 2760 (1,3) 1 (3,1) 65 (22,6) 197,6
32 4353 (1,7) 5339 (2,6) 2 (6,3) 65 (22,6) 158,4
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: USE CASE
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: USE CASE
kernel
(ID)
sorter(1)
max_min(2)
rgb2ycc(3)
ycc2rgb(4)
abs(5)
corr(6)
sbwlabel(7)
chgb(8)
cubic_conv(9)
median(10)
cubic(11)
filter(12)
clip_f(13)
inner(14)
mdiv(15)
nbit(16)
sign(17)
smear(18)
rgb2yuv(19)
yuv2rgb(20)
antialiasing x x x x x x
zoom x x x x x x x
motion
estimation
x x
deblocking
deringing
x x x x x x x x
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: USE CASE
kernel
(ID)
sorter(1)
max_min(2)
rgb2ycc(3)
ycc2rgb(4)
abs(5)
corr(6)
sbwlabel(7)
chgb(8)
cubic_conv(9)
median(10)
cubic(11)
filter(12)
clip_f(13)
inner(14)
mdiv(15)
nbit(16)
sign(17)
smear(18)
rgb2yuv(19)
yuv2rgb(20)
antialiasing x x x x x x
zoom x x x x x x x
motion
estimation
x x
deblocking
deringing
x x x x x x x x
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: USE CASE
kernel
(ID)
sorter(1)
max_min(2)
rgb2ycc(3)
ycc2rgb(4)
abs(5)
corr(6)
sbwlabel(7)
chgb(8)
cubic_conv(9)
median(10)
cubic(11)
filter(12)
clip_f(13)
inner(14)
mdiv(15)
nbit(16)
sign(17)
smear(18)
rgb2yuv(19)
yuv2rgb(20)
antialiasing x x x x x x
zoom x x x x x x x
motion
estimation
x x
deblocking
deringing
x x x x x x x x
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: PERFOMANCES (1)
1,E+00
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+07
qsort
min_max
rgb2ycc
ycc2rgb
abs
corr
sbwlabel
chgb
cubic_conv
median
cubic
filter
clip_f
inner
mdiv
nbit
sign
smear
rgb2yuv
yuv2rgb
ExecutionLatency[clockticks]
proc +
copr
proc
Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency
of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: PERFOMANCES (1)
1,E+00
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+07
qsort
min_max
rgb2ycc
ycc2rgb
abs
corr
sbwlabel
chgb
cubic_conv
median
cubic
filter
clip_f
inner
mdiv
nbit
sign
smear
rgb2yuv
yuv2rgb
ExecutionLatency[clockticks]
proc +
copr
proc
Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency
of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: PERFOMANCES (1)
1,E+00
1,E+01
1,E+02
1,E+03
1,E+04
1,E+05
1,E+06
1,E+07
qsort
min_max
rgb2ycc
ycc2rgb
abs
corr
sbwlabel
chgb
cubic_conv
median
cubic
filter
clip_f
inner
mdiv
nbit
sign
smear
rgb2yuv
yuv2rgb
ExecutionLatency[clockticks]
proc +
copr
proc
Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency
of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
1,E+07
1,E+08
1,E+09
1,E+10
antialiasing zoom motion estimation deblocking/deringing
ExecutionLatency[μs]
proc +
copr
proc
ACHIEVED RESULTS: PERFOMANCES (2)
Results have been retrieved running the applications in the targeted Virtex 5 330 FPGA board at an operating
frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-
processor.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
1,E+07
1,E+08
1,E+09
1,E+10
antialiasing zoom motion estimation deblocking/deringing
ExecutionLatency[μs]
proc +
copr
proc
ACHIEVED RESULTS: PERFOMANCES (2)
Results have been retrieved running the applications in the targeted Virtex 5 330 FPGA board at an operating
frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-
processor.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (1)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (1)
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (1)
[1]
[1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly
reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (1)
[1]
[1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly
reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (2)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1
auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7
auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (2)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1
auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7
auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (2)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1
auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7
auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
custom_copr auto_copr
packet size 1 4 16 1 4 16
loading [# cycles] 3 6 18 3 9 33
loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46
storing [# cycles] 1 - - 1 - -
storing [μs] @maxf 0.06 - - 0.04 - -
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (2)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1
auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7
auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
custom_copr auto_copr
packet size 1 4 16 1 4 16
loading [# cycles] 3 6 18 3 9 33
loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46
storing [# cycles] 1 - - 1 - -
storing [μs] @maxf 0.06 - - 0.04 - -
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
ACHIEVED RESULTS: COMPARISON (2)
resource (% on available)
Slice Regs
(207360)
Slice LUTs
(207360)
BUFGs
(32)
BRAMs
(288)
frequency
[MHz]
custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1
auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7
auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2
Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only
the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
custom_copr auto_copr
packet size 1 4 16 1 4 16
loading [# cycles] 3 6 18 3 9 33
loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46
storing [# cycles] 1 - - 1 - -
storing [μs] @maxf 0.06 - - 0.04 - -
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
OUTLINE: FINAL REMARKS
• Introduction:
– Problem statement
– Background
– Goals
• Co-processing units generation:
– Approach and baseline Multi-Dataflow Composer
– Template Interface Layer: hardware and automatic composition
• Performance assessment
– Use-case scenario
– Results
• Final remarks and future directions
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
FINAL REMARKS
• Automatic generation tools are needed to cut down time
to market of CG reconfigurable co-processors
• MDC has been developed within the RVC domain to:
– Implement coarse-grained reconfigurable datapaths but
lacks of an interface able to exploit the datapath as co-
processing unit in a complete system
• In this work we have:
– defined an adaptable interface for the MDC generated
datapaths
– integrated the generation and adaptation of this interface in the
MDC framework
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
FINAL REMARKS
• Automatic generation tools are needed to cut down time
to market of CG reconfigurable co-processors
• MDC has been developed within the RVC domain to:
– Implement coarse-grained reconfigurable datapaths but
lacks of an interface able to exploit the datapath as co-
processing unit in a complete system
• In this work we have:
– defined an adaptable interface for the MDC generated
datapaths
– integrated the generation and adaptation of this interface in the
MDC framework
• Future developments:
– Optimize loading
– Support to other kinds of communication schemes
– Automatic APIs library
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
FINAL REMARKS
• IEEE Xplore Digital Library
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb
er=7115605
• RPCT Project Website:
http://sites.unica.it/rpct/references/
DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari
Reference: BibTex
@INPROCEEDINGS{SAU_Dasip2014,
author={C. Sau and F. Palumbo},
booktitle={Design and Architectures for Signal and Image Processing
(DASIP), 2014 Conference on},
title={Automatic generation of dataflow-based reconfigurable co-
processing units},
year={2014},
pages={1-8},
doi={10.1109/DASIP.2014.7115605},}

More Related Content

Similar to Automatic Generation of Dataflow-Based Reconfigurable Co-processing Units

The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform Composer
The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform ComposerThe Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform Composer
The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform ComposerMDC_UNICA
 
Reconfigurable Platform Composer Tool Project
Reconfigurable Platform Composer Tool ProjectReconfigurable Platform Composer Tool Project
Reconfigurable Platform Composer Tool ProjectMDC_UNICA
 
Towards an e-infrastructure in agriculture?
Towards an e-infrastructure in agriculture?Towards an e-infrastructure in agriculture?
Towards an e-infrastructure in agriculture?Blue BRIDGE
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...🧑‍💻 Manuel Coppotelli
 
Cahill_Ben_Value_Through_LDT.pptx
Cahill_Ben_Value_Through_LDT.pptxCahill_Ben_Value_Through_LDT.pptx
Cahill_Ben_Value_Through_LDT.pptxFIWARE
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and CeremonyArchiver
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorialmarpierc
 
Presto4U Overview @BAAC2014 in Riga
Presto4U Overview @BAAC2014 in RigaPresto4U Overview @BAAC2014 in Riga
Presto4U Overview @BAAC2014 in RigaMarco Rendina
 
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...Nandana Mihindukulasooriya
 
Design and Experiment Platform for Industrial Wireless Systems
Design and Experiment Platform for Industrial Wireless SystemsDesign and Experiment Platform for Industrial Wireless Systems
Design and Experiment Platform for Industrial Wireless SystemsRyan
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VOJose Enrique Ruiz
 
Eurostars MODELS Project, System modeling and design exploration of applicati...
Eurostars MODELS Project, System modeling and design exploration of applicati...Eurostars MODELS Project, System modeling and design exploration of applicati...
Eurostars MODELS Project, System modeling and design exploration of applicati...Alessandra Bagnato
 
MODELS, a unified environment for the design of system applications on parall...
MODELS, a unified environment for the design of system applications on parall...MODELS, a unified environment for the design of system applications on parall...
MODELS, a unified environment for the design of system applications on parall...OW2
 

Similar to Automatic Generation of Dataflow-Based Reconfigurable Co-processing Units (20)

The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform Composer
The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform ComposerThe Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform Composer
The Multi-Dataflow Composer Tool: a Runtime Reconfigurable HDL Platform Composer
 
Towards a Common Approach for Access to Digital Archival Records in Europe. A...
Towards a Common Approach for Access to Digital Archival Records in Europe. A...Towards a Common Approach for Access to Digital Archival Records in Europe. A...
Towards a Common Approach for Access to Digital Archival Records in Europe. A...
 
Reconfigurable Platform Composer Tool Project
Reconfigurable Platform Composer Tool ProjectReconfigurable Platform Composer Tool Project
Reconfigurable Platform Composer Tool Project
 
New Goals of PARES: Spanish Archives Web Portal
New Goals of PARES: Spanish Archives Web PortalNew Goals of PARES: Spanish Archives Web Portal
New Goals of PARES: Spanish Archives Web Portal
 
Towards an e-infrastructure in agriculture?
Towards an e-infrastructure in agriculture?Towards an e-infrastructure in agriculture?
Towards an e-infrastructure in agriculture?
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
2017 dagstuhl-nfv-rothenberg
2017 dagstuhl-nfv-rothenberg2017 dagstuhl-nfv-rothenberg
2017 dagstuhl-nfv-rothenberg
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...
Advanced Topics in OpenAPI: Added Value Services and Protection in the OpenTr...
 
Cahill_Ben_Value_Through_LDT.pptx
Cahill_Ben_Value_Through_LDT.pptxCahill_Ben_Value_Through_LDT.pptx
Cahill_Ben_Value_Through_LDT.pptx
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorial
 
Presto4U Overview @BAAC2014 in Riga
Presto4U Overview @BAAC2014 in RigaPresto4U Overview @BAAC2014 in Riga
Presto4U Overview @BAAC2014 in Riga
 
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt...
 
Design and Experiment Platform for Industrial Wireless Systems
Design and Experiment Platform for Industrial Wireless SystemsDesign and Experiment Platform for Industrial Wireless Systems
Design and Experiment Platform for Industrial Wireless Systems
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VO
 
Eurostars MODELS Project, System modeling and design exploration of applicati...
Eurostars MODELS Project, System modeling and design exploration of applicati...Eurostars MODELS Project, System modeling and design exploration of applicati...
Eurostars MODELS Project, System modeling and design exploration of applicati...
 
MODELS, a unified environment for the design of system applications on parall...
MODELS, a unified environment for the design of system applications on parall...MODELS, a unified environment for the design of system applications on parall...
MODELS, a unified environment for the design of system applications on parall...
 
EuroHPC AI in DAPHNE
EuroHPC AI in DAPHNEEuroHPC AI in DAPHNE
EuroHPC AI in DAPHNE
 

More from MDC_UNICA

RVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolRVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolMDC_UNICA
 
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...MDC_UNICA
 
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...MDC_UNICA
 
Automated Power Gating Methodology for Dataflow-Based Reconfigurable Systems
Automated Power Gating Methodology for Dataflow-Based Reconfigurable SystemsAutomated Power Gating Methodology for Dataflow-Based Reconfigurable Systems
Automated Power Gating Methodology for Dataflow-Based Reconfigurable SystemsMDC_UNICA
 
Reconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC DomainReconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC DomainMDC_UNICA
 
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable Systems
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable SystemsPower Modelling for Saving Strategies in Coarse Grained Reconfigurable Systems
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable SystemsMDC_UNICA
 
Adaptable AES Implementation with Power-Gating Support
Adaptable AES Implementation with Power-Gating SupportAdaptable AES Implementation with Power-Gating Support
Adaptable AES Implementation with Power-Gating SupportMDC_UNICA
 
Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems
Power and Clock Gating Modelling in Coarse Grained Reconfigurable SystemsPower and Clock Gating Modelling in Coarse Grained Reconfigurable Systems
Power and Clock Gating Modelling in Coarse Grained Reconfigurable SystemsMDC_UNICA
 

More from MDC_UNICA (8)

RVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolRVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer Tool
 
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...
A Coarse-Grained Reconfigurable Wavelet Denoiser Exploiting the Multi-Dataflo...
 
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Str...
 
Automated Power Gating Methodology for Dataflow-Based Reconfigurable Systems
Automated Power Gating Methodology for Dataflow-Based Reconfigurable SystemsAutomated Power Gating Methodology for Dataflow-Based Reconfigurable Systems
Automated Power Gating Methodology for Dataflow-Based Reconfigurable Systems
 
Reconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC DomainReconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
 
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable Systems
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable SystemsPower Modelling for Saving Strategies in Coarse Grained Reconfigurable Systems
Power Modelling for Saving Strategies in Coarse Grained Reconfigurable Systems
 
Adaptable AES Implementation with Power-Gating Support
Adaptable AES Implementation with Power-Gating SupportAdaptable AES Implementation with Power-Gating Support
Adaptable AES Implementation with Power-Gating Support
 
Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems
Power and Clock Gating Modelling in Coarse Grained Reconfigurable SystemsPower and Clock Gating Modelling in Coarse Grained Reconfigurable Systems
Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems
 

Recently uploaded

NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...
NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...
NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...Amil baba
 
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理eemet
 
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理kywwoyk
 
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理kywwoyk
 
F5 LTM TROUBLESHOOTING Guide latest.pptx
F5 LTM TROUBLESHOOTING Guide latest.pptxF5 LTM TROUBLESHOOTING Guide latest.pptx
F5 LTM TROUBLESHOOTING Guide latest.pptxArjunJain44
 
Memory compiler tutorial – TSMC 40nm technology
Memory compiler tutorial – TSMC 40nm technologyMemory compiler tutorial – TSMC 40nm technology
Memory compiler tutorial – TSMC 40nm technologyAhmed Abdelazeem
 
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...PinkySharma900491
 

Recently uploaded (7)

NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...
NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...
NO1 Uk Amil Baba In Lahore Kala Jadu In Lahore Best Amil In Lahore Amil In La...
 
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
 
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
一比一原版SDSU毕业证圣地亚哥州立大学毕业证成绩单如何办理
 
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理
一比一原版UVM毕业证佛蒙特大学毕业证成绩单如何办理
 
F5 LTM TROUBLESHOOTING Guide latest.pptx
F5 LTM TROUBLESHOOTING Guide latest.pptxF5 LTM TROUBLESHOOTING Guide latest.pptx
F5 LTM TROUBLESHOOTING Guide latest.pptx
 
Memory compiler tutorial – TSMC 40nm technology
Memory compiler tutorial – TSMC 40nm technologyMemory compiler tutorial – TSMC 40nm technology
Memory compiler tutorial – TSMC 40nm technology
 
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...
MATHEMATICS BRIDGE COURSE (TEN DAYS PLANNER) (FOR CLASS XI STUDENTS GOING TO ...
 

Automatic Generation of Dataflow-Based Reconfigurable Co-processing Units

  • 1.
  • 2. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari OUTLINE • Introduction: – Problem statement – Background – Goals • Co-processing units generation: – Approach and baseline Multi-Dataflow Composer – Template Interface Layer: hardware and automatic composition • Performance assessment – Use-case scenario – Results • Final remarks and future directions
  • 3. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari OUTLINE: INTRODUCTION • Introduction: – Problem statement – Background – Goals • Co-processing units generation: – Approach and baseline Multi-Dataflow Composer – Template Interface Layer: hardware and automatic composition • Performance assessment – Use-case scenario – Results • Final remarks and future directions
  • 4. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari PROBLEM STATEMENT • HIGH PERFORMANCES real time applications: – Media players, video calling... • UP-TO-DATE SOLUTIONS – Support for the last audio/video codecs, file formats... • MORE INTEGRATED FEATURES in mobile devices: – MP3, Camera, Video, GPS... • PORTABILITY • LONG BATTERY LIFE – Convenient form factor, affordable price...
  • 5. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari PROBLEM STATEMENT • HIGH PERFORMANCES real time applications: – Media players, video calling... • UP-TO-DATE SOLUTIONS – Support for the last audio/video codecs, file formats... • MORE INTEGRATED FEATURES in mobile devices: – MP3, Camera, Video, GPS... • PORTABILITY • LONG BATTERY LIFE – Convenient form factor, affordable price... • DATAFLOW MODEL OF COMPUTATION – Modularity and parallelism  EASIER INTEGRATION AND FAVOURED RE-USABILITY • COARSE-GRAINED RECONFIGURABILITY – Flexibility and resource sharing  MULTI-APPLICATION PORTABLE DEVICES
  • 6. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari PROBLEM STATEMENT • HIGH PERFORMANCES real time applications: – Media players, video calling... • UP-TO-DATE SOLUTIONS – Support for the last audio/video codecs, file formats... • MORE INTEGRATED FEATURES in mobile devices: – MP3, Camera, Video, GPS... • PORTABILITY • LONG BATTERY LIFE – Convenient form factor, affordable price... • DATAFLOW MODEL OF COMPUTATION – Modularity and parallelism  EASIER INTEGRATION AND FAVOURED RE-USABILITY • COARSE-GRAINED RECONFIGURABILITY – Flexibility and resource sharing  MULTI-APPLICATION PORTABLE DEVICES Automated are fundamental to guarantee . Dealing with systems, in particular for , state of the art still lacks in providing a broadly accepted solution.
  • 7. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: CG RECONFIGURABILITY • High flexibility bit-level reconfiguration • Slow and memory expensive configuration phase • Medium flexibility word-level reconfiguration • Fast configuration phase FG CG Bit-level Word-level Flexibility   Reconf. Speed   Config. Storage   ASIC GPP DSP RP Performance Flexibility
  • 8. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: CG RECONFIGURABILITY • High flexibility bit-level reconfiguration • Slow and memory expensive configuration phase • Medium flexibility word-level reconfiguration • Fast configuration phase FG CG Bit-level Word-level Flexibility   Reconf. Speed   Config. Storage   ASIC GPP DSP RP Performance Flexibility
  • 9. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: CG RECONFIGURABILITY • High flexibility bit-level reconfiguration • Slow and memory expensive configuration phase • Medium flexibility word-level reconfiguration • Fast configuration phase FG CG Bit-level Word-level Flexibility   Reconf. Speed   Config. Storage   ASIC GPP DSP RP Performance Flexibility
  • 10. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 11. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 12. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 13. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 14. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 15. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari BACKGROUND: THE DATAFLOW MOC • Directed graph of actors (functional units) • Actors exchange tokens (data packets) through dedicated channels • Explicit the intrinsic application parallelism. • Modularity favours model re-usability/adaptivity. • I/O ports number • I/O ports depth • I/O ports burst of tokens actions state
  • 16. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari • High-level dataflow combination tool, front-end of the Multi-Dataflow Composer tool. • Concrete definition of the hardware template and of the dataflow-based mapping strategy. • Integration of the complete synthesis flow. GOALS AND WORK EVOLUTION • Implementation of a coarse-grained multi-standard decoder.
  • 17. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari GOAL: deployment of , contemporarily tackling and . • High-level dataflow combination tool, front-end of the Multi-Dataflow Composer tool. • Concrete definition of the hardware template and of the dataflow-based mapping strategy. • Integration of the complete synthesis flow. GOALS AND WORK EVOLUTION • Implementation of a coarse-grained multi-standard decoder.
  • 18. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari OUTLINE: CO-PROCESSOR GENERATION • Introduction: – Problem statement – Background – Goals • Co-processing units generation: – Approach and baseline Multi-Dataflow Composer – Template Interface Layer: hardware and automatic composition • Performance assessment – Use-case scenario – Results • Final remarks and future directions
  • 19. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: BASIC APPROACH 1:1 2:1 Dataflow Description Coarse Grained Reconfigurable Hardware Platform Dataflow Descriptions Coarse Grained Hardware Platform parser crc32 inflate input output parser crc32 inflate input output parser crc32_x inflate input output parser crc32 inflate input output parser crc32 inflate input output crc32_x SBox SBox SBox
  • 20. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL
  • 21. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL P C I input output P CX I input output
  • 22. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL
  • 23. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL P C I input output CX S1 S2 S0 SBox D1 D2 S0 0 1 S1 0 1 S2 0 1
  • 24. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL
  • 25. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: THE TOOL parser crc32 inflate input output crc32_x SBox SBox SBox
  • 26. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 27. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 28. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 29. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 30. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 31. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0
  • 32. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari MDC: AUTOMATIC GENERATION P C I input output CX S1 S2 S0 [1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012. [2] N. Carta, C. Sau, D. Pani, F. Palumbo and L. Raffo, A Coarse-Grained Reconfigurable Approach for Low-Power Spike Sorting Architectures, IEEE/EMBS Conference on Neural Engineering, 2013.
  • 33. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 34. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 35. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 36. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 37. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 38. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 39. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari THE TEMPLATE INTERFACE LAYER (TIL)
  • 40. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 41. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 42. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 43. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 44. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 45. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 46. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL FRONT-END
  • 47. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 48. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 49. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 50. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 51. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 52. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 53. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL BACK-END
  • 54. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL:AUTOMATIC GENERATION XDF MDCTOOL XDFPARSER XDF TXT TILCOMPOSITION HDL HDL CG-DP
  • 55. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL:AUTOMATIC GENERATION XDF MDCTOOL XDF in1 out2 out1 in1 in2 in3 out2 in1 in2 in3 out2 out1 HDL CG-DP
  • 56. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari TIL:AUTOMATIC GENERATION XDF MDCTOOL XDFPARSER XDF TXT in1 in2 in3 out2 out1 I ports number = 3 O ports number = 2 I/O ports depth = X I/O ports burst of tokens = N HDL CG-DP
  • 57. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari HDL CG-DP TIL:AUTOMATIC GENERATION XDF MDCTOOL XDFPARSER XDF TXT TILCOMPOSITION HDL I ports number = 3 I ports number = 2 I/O ports depth = X I/O ports burst of tokens = N MDC-BASED COARSE-GRAINED RECONFIGURABLE DATAPATH LOCAL MEM.
  • 58. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari HDL CG-DP TIL:AUTOMATIC GENERATION XDF MDCTOOL XDFPARSER XDF TXT TILCOMPOSITION HDL resource number of resource instances type 3 inputs 2 outputs N inputs M outputs register 12 2+N*2+M*2 port-dependent counter 6 1+N+M port-dependent mux 2x1 4 4 extendable mux Nx1 2 2 extendable mux Mx1 3 3 extendable demux 1xN 3 3 extendable demux 1xM 3 3 extendable FIFO 2 M port-dependent port selector 2 2 extendable addr generator 2 2 extendable FSM 2 2 fixed
  • 59. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari OUTLINE: PERFORMANCE ASSESSMENT • Introduction: – Problem statement – Background – Goals • Co-processing units generation: – Approach and baseline Multi-Dataflow Composer – Template Interface Layer: hardware and automatic composition • Performance assessment – Use-case scenario – Results • Final remarks and future directions
  • 60. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: TIL ADAPTIVITY number of I/O ports (value=M=N) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] 1 153 (0,1) 277 (0,1) 1 (3,1) 65 (22,6) 243,8 2 261 (0,1) 430 (0,2) 1 (3,1) 65 (22,6) 243,8 4 475 (0,2) 751 (0,4) 1 (3,1) 65 (22,6) 239,0 8 901 (0,4) 1558 (0,8) 1 (3,1) 65 (22,6) 208,9 16 1757 (0,9) 2760 (1,3) 1 (3,1) 65 (22,6) 197,6 32 4353 (1,7) 5339 (2,6) 2 (6,3) 65 (22,6) 158,4 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
  • 61. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: TIL ADAPTIVITY number of I/O ports (value=M=N) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] 1 153 (0,1) 277 (0,1) 1 (3,1) 65 (22,6) 243,8 2 261 (0,1) 430 (0,2) 1 (3,1) 65 (22,6) 243,8 4 475 (0,2) 751 (0,4) 1 (3,1) 65 (22,6) 239,0 8 901 (0,4) 1558 (0,8) 1 (3,1) 65 (22,6) 208,9 16 1757 (0,9) 2760 (1,3) 1 (3,1) 65 (22,6) 197,6 32 4353 (1,7) 5339 (2,6) 2 (6,3) 65 (22,6) 158,4 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
  • 62. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: USE CASE
  • 63. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: USE CASE kernel (ID) sorter(1) max_min(2) rgb2ycc(3) ycc2rgb(4) abs(5) corr(6) sbwlabel(7) chgb(8) cubic_conv(9) median(10) cubic(11) filter(12) clip_f(13) inner(14) mdiv(15) nbit(16) sign(17) smear(18) rgb2yuv(19) yuv2rgb(20) antialiasing x x x x x x zoom x x x x x x x motion estimation x x deblocking deringing x x x x x x x x
  • 64. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: USE CASE kernel (ID) sorter(1) max_min(2) rgb2ycc(3) ycc2rgb(4) abs(5) corr(6) sbwlabel(7) chgb(8) cubic_conv(9) median(10) cubic(11) filter(12) clip_f(13) inner(14) mdiv(15) nbit(16) sign(17) smear(18) rgb2yuv(19) yuv2rgb(20) antialiasing x x x x x x zoom x x x x x x x motion estimation x x deblocking deringing x x x x x x x x
  • 65. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: USE CASE kernel (ID) sorter(1) max_min(2) rgb2ycc(3) ycc2rgb(4) abs(5) corr(6) sbwlabel(7) chgb(8) cubic_conv(9) median(10) cubic(11) filter(12) clip_f(13) inner(14) mdiv(15) nbit(16) sign(17) smear(18) rgb2yuv(19) yuv2rgb(20) antialiasing x x x x x x zoom x x x x x x x motion estimation x x deblocking deringing x x x x x x x x
  • 66. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: PERFOMANCES (1) 1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05 1,E+06 1,E+07 qsort min_max rgb2ycc ycc2rgb abs corr sbwlabel chgb cubic_conv median cubic filter clip_f inner mdiv nbit sign smear rgb2yuv yuv2rgb ExecutionLatency[clockticks] proc + copr proc Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
  • 67. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: PERFOMANCES (1) 1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05 1,E+06 1,E+07 qsort min_max rgb2ycc ycc2rgb abs corr sbwlabel chgb cubic_conv median cubic filter clip_f inner mdiv nbit sign smear rgb2yuv yuv2rgb ExecutionLatency[clockticks] proc + copr proc Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
  • 68. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: PERFOMANCES (1) 1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05 1,E+06 1,E+07 qsort min_max rgb2ycc ycc2rgb abs corr sbwlabel chgb cubic_conv median cubic filter clip_f inner mdiv nbit sign smear rgb2yuv yuv2rgb ExecutionLatency[clockticks] proc + copr proc Results have been retrieved running the kernels in the targeted Virtex 5 330 FPGA board at an operating frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co-processor.
  • 69. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari 1,E+07 1,E+08 1,E+09 1,E+10 antialiasing zoom motion estimation deblocking/deringing ExecutionLatency[μs] proc + copr proc ACHIEVED RESULTS: PERFOMANCES (2) Results have been retrieved running the applications in the targeted Virtex 5 330 FPGA board at an operating frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co- processor.
  • 70. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari 1,E+07 1,E+08 1,E+09 1,E+10 antialiasing zoom motion estimation deblocking/deringing ExecutionLatency[μs] proc + copr proc ACHIEVED RESULTS: PERFOMANCES (2) Results have been retrieved running the applications in the targeted Virtex 5 330 FPGA board at an operating frequency of 125 MHz for the host processor and of 65 MHz (fixed by the CG reconfigurable datapath) for the co- processor.
  • 71. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (1)
  • 72. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (1)
  • 73. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (1) [1] [1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012.
  • 74. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (1) [1] [1] F. Palumbo, N. Carta, D. Pani, P. Meloni and L. Raffo, The multi-dataflow composer tool: generation of on-the-fly reconfigurable platforms, Journal of Real-Time Image Processing, vol. 9, no. 1, pp 233-249, 2012.
  • 75. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (2) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1 auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7 auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
  • 76. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (2) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1 auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7 auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered.
  • 77. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (2) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1 auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7 auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered. custom_copr auto_copr packet size 1 4 16 1 4 16 loading [# cycles] 3 6 18 3 9 33 loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46 storing [# cycles] 1 - - 1 - - storing [μs] @maxf 0.06 - - 0.04 - -
  • 78. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (2) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1 auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7 auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered. custom_copr auto_copr packet size 1 4 16 1 4 16 loading [# cycles] 3 6 18 3 9 33 loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46 storing [# cycles] 1 - - 1 - - storing [μs] @maxf 0.06 - - 0.04 - -
  • 79. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari ACHIEVED RESULTS: COMPARISON (2) resource (% on available) Slice Regs (207360) Slice LUTs (207360) BUFGs (32) BRAMs (288) frequency [MHz] custom_copr 352 (0.2) 1041 (0.5) 1 (3.1) 8 (2.8) 155.1 auto_copr 163 (0.1) 372 (0.2) 1 (3.1) 4 (1.4) 226.7 auto vs. custom -53.7 -68.6 0.0 -50.0 +46.2 Results have been retrieved through the Xilinx Synthesis Technology tool targeting a Virtex 5 330 FPGA board. Only the co-processing unit without any coarse-grained reconfigurable datapath has been considered. custom_copr auto_copr packet size 1 4 16 1 4 16 loading [# cycles] 3 6 18 3 9 33 loading [μs] @maxf 0.19 0.39 1.16 0.13 0.40 1.46 storing [# cycles] 1 - - 1 - - storing [μs] @maxf 0.06 - - 0.04 - -
  • 80. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari OUTLINE: FINAL REMARKS • Introduction: – Problem statement – Background – Goals • Co-processing units generation: – Approach and baseline Multi-Dataflow Composer – Template Interface Layer: hardware and automatic composition • Performance assessment – Use-case scenario – Results • Final remarks and future directions
  • 81. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari FINAL REMARKS • Automatic generation tools are needed to cut down time to market of CG reconfigurable co-processors • MDC has been developed within the RVC domain to: – Implement coarse-grained reconfigurable datapaths but lacks of an interface able to exploit the datapath as co- processing unit in a complete system • In this work we have: – defined an adaptable interface for the MDC generated datapaths – integrated the generation and adaptation of this interface in the MDC framework
  • 82. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari FINAL REMARKS • Automatic generation tools are needed to cut down time to market of CG reconfigurable co-processors • MDC has been developed within the RVC domain to: – Implement coarse-grained reconfigurable datapaths but lacks of an interface able to exploit the datapath as co- processing unit in a complete system • In this work we have: – defined an adaptable interface for the MDC generated datapaths – integrated the generation and adaptation of this interface in the MDC framework • Future developments: – Optimize loading – Support to other kinds of communication schemes – Automatic APIs library
  • 83.
  • 84. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari FINAL REMARKS • IEEE Xplore Digital Library http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb er=7115605 • RPCT Project Website: http://sites.unica.it/rpct/references/
  • 85. DASIP 2014, October 8-10 , Madrid (ES) Palumbo Francesca, University of Sassari Reference: BibTex @INPROCEEDINGS{SAU_Dasip2014, author={C. Sau and F. Palumbo}, booktitle={Design and Architectures for Signal and Image Processing (DASIP), 2014 Conference on}, title={Automatic generation of dataflow-based reconfigurable co- processing units}, year={2014}, pages={1-8}, doi={10.1109/DASIP.2014.7115605},}