Image Processing Application related with Vivado
Dr. S. Shiyamala
Professor / ECE
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology
Chennai, Tamil Nadu.
FPGA
FPGA Architecture
FPGA Silicon View
FPGA – BRAM FEATURES
BRAM
Memory Types
• By resource: Distributed (MLUT-based) or Block RAM-based (BRAM-based)
• By design entry: Inferred or Instantiated
– Instantiated memory is created manually or using Core Generator
FPGA Distributed Memory
[Figure: CLB slice. Two 4-input look-up tables (inputs G1-G4 and F1-F4), each followed by carry and control logic and a flip-flop; slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y and YB.]
Xilinx Multipurpose LUT (MLUT)
[Figure: a 4-input LUT can serve as a 16 x 1 ROM (logic), a 16 x 1 RAM, or a 16-bit shift register (SR).]
Source: The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Distributed RAM
[Figure: distributed RAM primitives. One LUT = RAM16X1S (ports D, WE, WCLK, A0-A3, O). Two LUTs = RAM32X1S (adds address bit A4), or RAM16X2S (data D0, D1 and outputs O0, O1), or RAM16X1D (outputs SPO and DPO, dual-port read address DPRA0-DPRA3).]
• A CLB LUT can be configured as distributed RAM
– One LUT equals a 16 x 1 RAM
– Cascade LUTs to increase RAM size
• Synchronous write
• Asynchronous read
– Distributed RAM read is naturally asynchronous
– A synchronous read can be created by using extra flip-flops
• Two LUTs can make
– 32 x 1 single-port RAM
– 16 x 2 single-port RAM
– 16 x 1 dual-port RAM
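The two-LUT combinations listed above follow from simple counting. The sketch below is a rough estimate only. It assumes 16-bit LUTs as stated on this slide and ignores any multiplexer logic the tools add when cascading. The function name `distributed_ram_luts` is hypothetical.

```python
import math

LUT_BITS = 16  # one 4-input LUT stores 16 x 1 bits (from the slide)

def distributed_ram_luts(depth, width, dual_port=False):
    """Estimate the LUTs needed for a depth x width distributed RAM.

    Assumption: each data bit needs ceil(depth / 16) LUTs, and a dual-port
    RAM doubles that because the second read port uses its own LUT copy.
    Address-decoding multiplexers are not counted.
    """
    luts_per_bit = math.ceil(depth / LUT_BITS)
    ports = 2 if dual_port else 1
    return luts_per_bit * width * ports

print(distributed_ram_luts(32, 1))                  # 32 x 1 single-port
print(distributed_ram_luts(16, 2))                  # 16 x 2 single-port
print(distributed_ram_luts(16, 1, dual_port=True))  # 16 x 1 dual-port
```

All three calls give 2, matching the "two LUTs can make" list.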
FPGA Block RAM
[Figure: Spartan-3 dual-port block RAM with Port A and Port B.]
Block RAM
• Most efficient memory implementation
– Dedicated blocks of memory
• Ideal for most memory requirements
– 4 to 104 memory blocks
• 18 kbits = 18,432 bits per block (16 kbits = 16,384 bits without parity bits)
– Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
• Synchronous write and read (different from distributed
RAM)
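The per-block capacity above gives a quick lower bound on how many blocks a buffer needs. This sketch assumes only the 16 kbits of data per block are used and ignores the port-width configurations that can raise the real count. The function name and the image size are hypothetical.

```python
import math

BLOCK_DATA_BITS = 16 * 1024  # 16 kbits of data per block (parity bits unused)

def bram_blocks_needed(num_words, bits_per_word):
    """Lower bound on 18-kbit block RAMs needed for num_words x bits_per_word."""
    total_bits = num_words * bits_per_word
    return math.ceil(total_bits / BLOCK_DATA_BITS)

# Hypothetical example: a 256 x 256 grayscale image with 8 bits per pixel.
blocks = bram_blocks_needed(256 * 256, 8)
print(blocks)  # 32
```

A result of 32 blocks fits inside the 4 to 104 block range quoted above, but only on the larger devices.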
BRAM
• BRAM stands for Block Random Access Memory.
• Block RAMs are used for storing large amounts
of data inside an FPGA.
• A Block RAM (sometimes called embedded
memory, or Embedded Block RAM (EBR)) is a
discrete part of an FPGA, meaning only a fixed
number of them are available on the chip.
FIFO BRAM CONFIGURATION
DATA WIDTH OF BRAM
• Generate the bitstream (write_bitstream), and open the implemented design
• Run the script to generate the MMI (Memory Mapped Info) file
• Run updatemem to initialize the BRAM with MEM data
• Test on hardware
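The MEM data used in the flow above is a plain text file. The sketch below shows one way such text might be produced. The layout (an `@` hexadecimal address line followed by whitespace-separated hexadecimal words) is an assumption about the MEM format and should be checked against the updatemem documentation. The function name `to_mem_text` and the sample values are hypothetical.

```python
def to_mem_text(words, bits_per_word=8, start_address=0, words_per_line=8):
    """Format integer words as MEM-style text: '@<hex address>' then hex data."""
    digits = (bits_per_word + 3) // 4          # hex digits per word
    limit = 1 << bits_per_word
    lines = [f"@{start_address:08X}"]
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        for w in chunk:
            if not 0 <= w < limit:
                raise ValueError(f"word {w} does not fit in {bits_per_word} bits")
        lines.append(" ".join(f"{w:0{digits}X}" for w in chunk))
    return "\n".join(lines) + "\n"

# Hypothetical data: nine 8-bit words starting at address 0.
mem_text = to_mem_text([0, 1, 2, 255, 16, 32, 64, 128, 200])
print(mem_text)
```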
CORE Generator
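BRAM Program sample
// Reads one operand from each of two block RAMs, adds them, and writes
// the 9-bit sum to a third block RAM at the same address.
module BMDEMO(
    input clk,
    input en,
    input rst            // active-low asynchronous reset
);
    wire [7:0] a, b;     // operands read from b1 and b2
    wire [8:0] c;        // sum a + b, including the carry bit
    reg  [5:0] addr;     // shared address, 64 locations
    wire [8:0] bout;     // data read back from the result memory

    // Block Memory Generator IP instances, connected by position.
    // Assumed port order: clock, write enable, address, data in, data out.
    blk_mem_gen_0 b1 (clk, 1'b0, addr, 8'b1, a);    // read only (write enable low)
    blk_mem_gen_0 b2 (clk, 1'b0, addr, 8'b1, b);    // read only (write enable low)
    adder a1 (a, b, c);
    blk_mem_gen_1 b11 (clk, 1'b1, addr, c, bout);   // write enable always high

    // Address counter
    always @(posedge clk or negedge rst)
    begin
        if (!rst)
            addr <= 6'b1;          // reset to address 1
        else if (en)
            addr <= addr + 1;      // wraps from 63 to 0; holds when en is low
    end
endmodule

// 8-bit adder with a 9-bit result
module adder(a, b, c);
    input  [7:0] a, b;
    output [8:0] c;
    assign c = a + b;
endmodule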
BRAM USAGE
• Designers are encouraged to examine their
Virtex and ZYBO FPGA designs for surplus
block RAM and to use these functions to
unburden the FPGA logic.
• For example, using block RAM as state
machines simplifies the design effort,
significantly reduces routing overhead and
power consumption, and achieves higher
performance.
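One common reading of "block RAM as a state machine" is a ROM lookup. The memory address is formed from the current state and the input, and the stored word holds the next state and the output. The Python model below sketches that idea for a hypothetical "101" sequence detector. The state encoding, the table contents and the function names are illustrative assumptions, not taken from the slides.

```python
STATE_BITS = 2  # three states used: 0 = idle, 1 = seen "1", 2 = seen "10"

def build_rom():
    """ROM contents: address = {state, input bit}, data = {next state, output}."""
    transitions = {
        # (state, input): (next_state, output)
        (0, 0): (0, 0), (0, 1): (1, 0),
        (1, 0): (2, 0), (1, 1): (1, 0),
        (2, 0): (0, 0), (2, 1): (1, 1),   # "101" completed
    }
    rom = [0] * (1 << (STATE_BITS + 1))   # unused addresses return to idle
    for (state, bit), (nxt, out) in transitions.items():
        rom[(state << 1) | bit] = (nxt << 1) | out
    return rom

def run_fsm(rom, bits):
    """Clock the ROM-based state machine once per input bit."""
    state, outputs = 0, []
    for bit in bits:
        word = rom[(state << 1) | bit]    # one memory read per clock
        state, out = word >> 1, word & 1
        outputs.append(out)
    return outputs

rom = build_rom()
print(run_fsm(rom, [1, 0, 1, 0, 1]))  # [0, 0, 1, 0, 1]
```

In hardware the table would be the block RAM's initial contents, so changing the state machine means changing memory data, not logic.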
Real Time Image Application - Example
• FPGA implementation of high accuracy, low
latency breast cancer diagnosis using YOLO
algorithm
• FPGA implementation of CCSDS standard
DWT based hyperspectral image
decompression
Example 1
• FPGA implementation of high accuracy, low
latency breast cancer diagnosis using YOLO
algorithm
General schematic diagram of
mammographic data analysis using
FPGA
Databases
• Some of the publicly available mammogram
databases are as follows:
• MIAS Mini Mammographic Database (mini-MIAS)
• Digital Database for Screening Mammography
(DDSM)
• Mammographic Image Database for Automated
Analysis (MIDAS)
• Breast Cancer Digital Repository (BCDR)
Schematic representation of overall
methodology
Example of breast cancer image grid
and bounding box prediction for YOLO
Xilinx Zynq UltraScale+ MPSoC ZCU104 evaluation
FPGA board
(Equipment Details)
• The Xilinx Zynq UltraScale+
MPSoC ZCU104 evaluation
FPGA board is well suited
to this application.
• The hyperspectral image is fed to and
stored in the FPGA BRAM
(Block RAM) using the .coe file
format.
• Xilinx Vivado has built-in
IP (Intellectual Property)
blocks.
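A .coe file is plain text with a radix line and an initialization vector. The sketch below shows how pixel values might be converted into that form before being loaded into the Block Memory Generator IP. It assumes one pixel per memory word. The function name `pixels_to_coe` and the sample pixels are hypothetical.

```python
def pixels_to_coe(pixels, bits_per_pixel=8):
    """Return COE text with one hexadecimal pixel value per memory word."""
    digits = (bits_per_pixel + 3) // 4     # hex digits per word
    limit = 1 << bits_per_pixel
    for p in pixels:
        if not 0 <= p < limit:
            raise ValueError(f"pixel {p} does not fit in {bits_per_pixel} bits")
    values = ",\n".join(f"{p:0{digits}x}" for p in pixels)
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n"
        f"{values};\n"
    )

# Hypothetical 2 x 2 grayscale image, flattened row by row.
coe_text = pixels_to_coe([0, 255, 16, 128])
print(coe_text)
```

The memory depth and width configured in the IP have to match the number of values and the bits per value in the file.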
Xilinx Vivado based YOLO architecture
IP Integrator
Example 2
• FPGA implementation of CCSDS standard DWT
based hyperspectral image decompression
Objectives
• To enable smooth interoperability and adoption of compression for hyperspectral images.
• To develop low-complexity, high-throughput algorithms for onboard encoding and ground-station decoding.
• To design for easy, efficient implementation on space-qualified hardware using FPGAs.
General schematic diagram of hyperspectral
image compression using FPGA
Schematic Representation of overall
Methodology
Methodology
• The key constraints on hyperspectral image decompression are the high
volume of remote sensing data, limited storage resources, limited
downlink bandwidth and dynamic adaptability.
• A high-density, high-performance reconfigurable FPGA is well suited to
overcoming these problems. Lossless hyperspectral image compression
follows the recommended standard CCSDS 123.0-B-2 (CCSDS: Consultative
Committee for Space Data Systems), while CCSDS 121 is for normal
images only.
• A multiplexer makes it possible to choose the required CCSDS standard
by command. Moreover, the CCSDS 123 standard supports different scan
orders for prediction and encoding: Band-Interleaved-by-Pixel (BIP),
Band-Interleaved-by-Line (BIL) and Band-SeQuential (BSQ).
• JPEG 2000 (JP2) is an image compression standard and coding system which
is followed in CCSDS 123. To compress the image, the Haar Discrete
Wavelet Transform (DWT) is applied to it.
• Compressed images are encoded using a MAP encoder and the digital data
are transmitted. The image is decompressed at the ground station using the
inverse Haar wavelet transform, to reconstruct the image without error.
• To enhance the efficiency of the inverse Haar DWT, the idea is to replace
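The three scan orders named above differ only in how a sample's band, row and column indices map to a position in the stored data. The sketch below makes that mapping concrete. The function name `sample_offset` and the tiny cube dimensions are hypothetical.

```python
def sample_offset(order, band, row, col, bands, rows, cols):
    """Linear offset of sample (band, row, col) in a cube stored in the given order."""
    if order == "BSQ":   # all of band 0, then all of band 1, ...
        return (band * rows + row) * cols + col
    if order == "BIL":   # for each row: that row of band 0, then of band 1, ...
        return (row * bands + band) * cols + col
    if order == "BIP":   # for each pixel: all of its band values together
        return (row * cols + col) * bands + band
    raise ValueError(f"unknown scan order: {order}")

# Hypothetical tiny cube: 3 bands, 2 rows, 4 columns.
dims = dict(bands=3, rows=2, cols=4)
for order in ("BSQ", "BIL", "BIP"):
    print(order, sample_offset(order, band=1, row=1, col=2, **dims))
```

For this sample the offsets are 14 (BSQ), 18 (BIL) and 19 (BIP). A predictor that needs neighbouring bands of one pixel finds them adjacent in memory only in BIP order.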
Specifications
OCI™-FHR Hyperspectral Camera
• Spectral range: 400-1000 nm
• Number of spectral bands: up to 220
• Spectral resolution: 3 nm
• Spectral pixels: 800 px x scan length
• Standard lens: 16 mm (200 FOV)
• Frame rate: up to 50 frames/sec
• Weight: ~570 g (including standard lens)
• Dimensions: 14 cm x 7 cm x 7 cm
• Data format: Hyperspectral cube (ENVI-BSQ), Color image (BMP), Band image (BMP), ROI spectra (CSV format)
Compression Technique
Haar Wavelet Transform
• Good approximation properties: enables
wavelet methods to be applied to digital
images for compression and progressive
transmission.
• An efficient way to compress smooth data,
except in localized regions.
• Wavelet properties are easy to control (for example,
smoothness and better accuracy near sharp
gradients).
• Allows information to be encoded according to
levels of detail.
• This layering facilitates approximations at
various intermediate stages, requiring less space.
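The "levels of detail" idea can be seen in a single Haar step. Each pair of samples is replaced by one approximation value and one detail value. The sketch below uses an integer variant (pair average rounded down, plus pair difference) so that the inverse recovers the input exactly. This variant is an assumption chosen for illustration, not necessarily the form used in the design described here, and the function names are hypothetical.

```python
def haar_forward(samples):
    """One level of an integer Haar transform on an even-length sequence."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    approx, detail = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        approx.append((a + b) >> 1)   # floor of the pair average
        detail.append(a - b)          # pair difference
    return approx, detail

def haar_inverse(approx, detail):
    """Exactly undo haar_forward."""
    samples = []
    for s, d in zip(approx, detail):
        a = s + ((d + 1) >> 1)        # restores the bit lost by the floor
        b = a - d
        samples.extend([a, b])
    return samples

row = [5, 2, 2, 5, 4, 2, 3, 4]        # hypothetical pixel row
approx, detail = haar_forward(row)
print(approx, detail)                 # [3, 3, 3, 3] [3, -3, 2, -1]
print(haar_inverse(approx, detail) == row)  # True
```

Applying `haar_forward` again to `approx` gives the next, coarser level. On smooth data the detail values stay small, which is what makes them cheap to encode.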
MAP (Maximum A Posteriori) decoder
General diagram for Convolutional encoder with
K = 5 and code rate ½
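With constraint length K = 5 and rate ½, each input bit produces two output bits, and each output bit is a parity over the current bit and the four bits before it. The slide does not give the generator polynomials. The sketch below assumes the pair (23, 35) in octal, a commonly tabulated choice for K = 5. The function name and the bit ordering (newest bit in the least significant position) are likewise illustrative assumptions.

```python
def conv_encode(bits, generators=(0o23, 0o35), k=5):
    """Rate 1/len(generators) convolutional encoder with constraint length k.

    Assumed generators: (23, 35) octal. The shift register starts at zero and
    holds the current bit plus the previous k - 1 bits.
    """
    mask = (1 << k) - 1
    state = 0
    out = []
    for bit in bits:
        state = ((state << 1) | bit) & mask           # shift in the new bit
        for g in generators:
            out.append(bin(state & g).count("1") & 1)  # parity of tapped bits
    return out

print(conv_encode([1, 0, 0]))  # [1, 1, 1, 0, 0, 1]
```

Three input bits give six output bits, which is the rate ½ relationship. A decoder such as the MAP decoder works on the noisy version of this output stream.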
Thank you