INDIAN INSTITUTE OF INFORMATION TECHNOLOGY RANCHI
Major Project Presentation
on
Custom Hardware Design on Xilinx PYNQ for Image Processing with AXI Integration
By
Rohan Tiwari (2021UG2027)
Under the supervision of
Dr. Shivang Tripathi
Department of Electronics and Communication Engineering
Indian Institute of Information Technology Ranchi
Ranchi - 835217
CONTENT
● PROBLEM STATEMENT
● INTRODUCTION
● LITERATURE SURVEY
● CHALLENGES & LIMITATIONS
● PROPOSED METHODOLOGY
● EXPERIMENTS
● RESULTS
● DISCUSSION
● CONCLUSION
● REFERENCES
PROBLEM STATEMENT
1. Real-time image processing requires high-throughput, low-latency solutions, which CPUs and GPUs fail to deliver
efficiently under embedded constraints.
2. Software-based methods do not scale to high-resolution or high-frame-rate applications due to power and latency
limitations.
3. Scalable hardware accelerators are needed for edge detection and blurring.
4. The AXI protocol enables parallel control and data transfer in SoC systems.
1
INTRODUCTION
1. FPGAs offer parallelism and flexibility essential for real-time vision tasks (e.g.,
medical imaging).
2. The Xilinx PYNQ platform bridges the gap between low-level hardware design and
high-level software control using Python.
3. AXI4-Stream protocols facilitate efficient command and data exchange between
Zynq PS and PL.
Design Goal:
The custom IP core performs convolution using Gaussian and Sobel
kernels. Hardware-software co-design accelerates development while
achieving performance goals.
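Since the slides do not list the kernel coefficients, the following is a minimal pure-Python reference model of the 3×3 convolution the IP performs. The standard Gaussian and Sobel-X kernels below are assumptions; the Verilog IP would use fixed-point equivalents with a right-shift in place of division.

```python
# Software reference model of the IP's 3x3 convolution. The exact coefficients
# in the Verilog design are not given in the slides; these standard kernels
# are assumptions.

GAUSSIAN_3X3 = [[1, 2, 1],
                [2, 4, 2],
                [1, 2, 1]]          # sum = 16; normalise with a >>4 shift

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def convolve3x3(image, kernel, shift=0):
    """Apply a 3x3 kernel to a grayscale image (list of pixel rows).

    `shift` emulates the hardware right-shift used instead of division
    (shift=4 divides by 16 for the Gaussian kernel). Border pixels are
    skipped, as a minimal line-buffer design would do.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[i][j] * image[y - 1 + i][x - 1 + j]
                      for i in range(3) for j in range(3))
            # Clamp to the 8-bit pixel range, as the hardware output would.
            out[y][x] = max(0, min(255, (acc >> shift) if shift else acc))
    return out
```

Such a model also serves as the "trusted software model" used later for pixel-accurate verification of the hardware output.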
2
ZYNQ PS AND PL
LITERATURE SURVEY
3
Work | Contribution | Relevance
Real Digital (CAPH Conv) | Real-time FPGA 2D convolution, 103 FPS | Basis for efficient line-buffer designs
PipeCNN (2016) | CNN acceleration using OpenCL on FPGA | Shows FPGA advantage in parallel convolution
ZynqNet | CNN on Zynq SoC with high resource usage | Validates co-design on PYNQ-like platforms
IPPro | FPGA-based image processor outperforms ARM cores | Reinforces FPGA's real-time advantages
PYNQ Framework | Python–Jupyter FPGA control and dataflow prototyping | Directly aligns with the project's methodology
CHALLENGES & LIMITATIONS
1. The design currently supports only grayscale (black-and-white) images; handling multi-channel (RGB) image streams
through the AXI-Stream interface adds significant complexity.
2. Scaling the line buffer and convolution engine to larger kernel sizes increases resource pressure.
3. Vivado and IP integration carry a steep initial learning curve and toolchain overhead.
4. Limited on-chip memory (BRAM) can restrict large image/frame buffering.
5. While programming the FPGA, transfers to base address 0x100000 (and others) sometimes fail to receive data, preventing
image generation.
6. AXI protocol timing, handshakes (TVALID/TREADY), and alignment with DMA operations must be maintained.
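The handshake constraint in point 6 can be sketched as a small cycle-level software model: a beat transfers only on a cycle where TVALID and TREADY are both high, and the master must hold its data stable until it is accepted. This is illustrative Python, not the project's Verilog:

```python
# Cycle-level model of the AXI4-Stream handshake rule: a data beat transfers
# only when TVALID and TREADY are both high in the same cycle; until then the
# master holds TVALID/TDATA stable and retries.

def stream_transfer(data, ready_pattern):
    """Drive `data` beats against a sink whose TREADY follows
    `ready_pattern` (an iterable of booleans, one per clock cycle).
    Returns the accepted beats and the number of cycles consumed."""
    accepted, cycles, idx = [], 0, 0
    ready = iter(ready_pattern)
    while idx < len(data):
        tready = next(ready, True)   # sink's TREADY; ready once pattern ends
        cycles += 1
        # TVALID is high whenever the master still has a beat to send.
        if tready:                   # TVALID & TREADY -> beat transfers
            accepted.append(data[idx])
            idx += 1                 # master may present the next beat
        # else: master holds TVALID/TDATA and retries on the next cycle
    return accepted, cycles
```

Back-pressure from the sink (TREADY low) therefore stretches the transfer in time but never loses or reorders data, which is the property the DMA alignment work has to preserve.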
4
PROPOSED METHODOLOGY
Modular RTL Design Approach:
• Custom Verilog IP designed to support 3×3 convolution with switchable Gaussian and Sobel filters.
• AXI4-Stream interface used for high-throughput data streaming between external memory and the IP.
• Python/Jupyter interface via PYNQ allows high-level runtime control of hardware.
• Vivado IP packager used to wrap Verilog logic into reusable AXI-compliant IP.
5
Structured Design Flow
PROPOSED METHODOLOGY
Top-Level Wrapper Integration:
● This architecture shows the interaction between the
Zynq Processing System (PS) and Programmable
Logic (PL), with AXI4-Stream for data. It highlights
the modular and scalable nature of the proposed
system.
6
ZYNQ PS with IP PL
PROPOSED METHODOLOGY
IP Design and Integration:
• Block diagram of the Verilog-designed IP core packaged using Vivado. It includes the Line Buffer and Convolution Engine,
interfaced with AXI for integration with the rest of the system.
• The packaged IP is interfaced with the DMA controller over an AXI4-Stream interface for efficient data transfer.
7
PROPOSED METHODOLOGY
8
Software Overlay:
• DMA control and data path configured via Xilinx Vitis, ensuring efficient memory-mapped access to image buffers.
• Software design follows a modular approach: control logic (Python) and DMA management (Vitis-generated drivers).
• Pixel intensity values are extracted from the image by a Python script and stored in memory.
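A hedged sketch of the PS-side control flow these bullets describe, assuming a DMA instance named `axi_dma_0` and a bitstream `filter.bit` (both placeholder names; use the actual names from the Vivado block design). The DMA object is passed in as a parameter so the same flow can be exercised off-board with a stand-in:

```python
# Hypothetical PYNQ control flow for the overlay. "filter.bit" and
# "axi_dma_0" are assumed names; `pynq` is only importable on the board.

def load_dma(overlay_path="filter.bit"):
    from pynq import Overlay             # programs the PL with the bitstream
    return Overlay(overlay_path).axi_dma_0

def run_filter(dma, in_buf, out_buf):
    """Push one frame through the filter IP via AXI-DMA.

    On the board, `in_buf`/`out_buf` would be pynq.allocate() buffers;
    any object exposing the same channel interface works for host-side
    testing of the call sequence.
    """
    dma.sendchannel.transfer(in_buf)     # PS DDR -> PL (AXI4-Stream in)
    dma.recvchannel.transfer(out_buf)    # PL -> PS DDR (AXI4-Stream out)
    dma.sendchannel.wait()               # block until the frame is consumed
    dma.recvchannel.wait()               # block until the result is written
    return out_buf
```

The transfer/wait pairing mirrors the memory-mapped DMA driver sequence that Vitis generates on the bare-metal side.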
EXPERIMENTS
Simulation Setup
Functional Simulation in Vivado:
The validation was conducted using the Vivado (XSim) environment to test the custom Verilog-based hardware modules.
Vitis-Based DMA Setup:
Xilinx Vitis was used to program and initialize the AXI-DMA controller for efficient data transfer between PS memory and PL.
Control logic was auto-generated for memory-mapped communication.
Hardware Deployment on PYNQ-Z2 Board:
The packaged IP and block design were synthesized, and the bitstream was loaded onto the FPGA. The board ran a Linux
environment supporting the PYNQ framework for high-level control.
Verification Pipeline:
Input images were streamed into the FPGA fabric, and the processed output was collected in DDR memory. Real-time edge
detection and blurring were validated live on the hardware.
9
EXPERIMENTS
The Line Buffer forms a sliding 3×3 window over the incoming pixel stream using shift registers, enabling hardware-efficient
convolution.
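A behavioural sketch of that sliding-window mechanism in Python (a software model, not the shift-register RTL): two buffered lines plus the current one yield a full 3×3 window once three rows and three columns of the raster-order stream have been seen.

```python
# Behavioural model of the line buffer: the hardware keeps two previous
# lines in BRAM plus the incoming line, exposing one 3x3 window per pixel
# once the pipeline is primed. This Python model consumes pixels strictly
# in raster order, as the AXI-Stream interface delivers them.

def sliding_windows(pixels, width):
    """Yield (window, x, y) for each pixel with a full 3x3 neighbourhood,
    where (x, y) are the coordinates of the window's centre pixel."""
    rows = []                     # the two line buffers + the current row
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        if x == 0:
            rows.append([])
            if len(rows) > 3:
                rows.pop(0)       # oldest line is overwritten, as in hardware
        rows[-1].append(p)
        # A window is complete once 3 lines and 3 columns are buffered.
        if len(rows) == 3 and x >= 2:
            window = [row[x - 2:x + 1] for row in rows]
            yield window, x - 1, y - 1
```

Only two full image lines need to be stored at any time, which is why the design's BRAM usage stays low even as the frame size grows.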
10
EXPERIMENTS
11
ACTUAL HARDWARE SETUP (PYNQ Z2 BOARD EXECUTING THE IP)
RESULTS
12
• 300 FPS achieved on 512×512 grayscale images (hardware).
• Software (OpenCV) achieves ~20–30 FPS for the same operations.
• Hardware latency: 3–5 ms/frame vs software: 30–50 ms/frame.
• Resource Utilization: 6% LUTs, 1.5% FFs, 4.3% BRAM.
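As a quick sanity check on these figures (taking the midpoint of the quoted OpenCV range):

```python
# Back-of-the-envelope check on the throughput figures quoted above.
hw_fps = 300
sw_fps = (20 + 30) / 2          # midpoint of the quoted OpenCV range
speedup = hw_fps / sw_fps       # ~12x throughput gain over software
hw_frame_ms = 1000 / hw_fps     # ~3.3 ms/frame, consistent with 3-5 ms quoted
```

The per-frame latency implied by 300 FPS sits inside the measured 3–5 ms/frame range, so the throughput and latency numbers are mutually consistent.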
RESULTS
13
ORIGINAL IMAGE EDGE DETECTED IMAGE BLURRED IMAGE
DISCUSSION
● Real-time throughput with low resource footprint proves IP scalability.
● AXI4-Stream enables high-speed streaming with minimal software load.
● Pixel-accurate performance is validated against trusted software models.
● FPGA design uses minimal DSP slices due to optimized convolution.
14
MINIMAL ENERGY CONSUMPTION BY FPGA BOARD
CONCLUSION AND FUTURE WORK
CONCLUSION:
● Developed a complete FPGA-based image filter using a
custom IP integrated via AXI interfaces.
● Verified on the board using real images, with large
performance gains.
● Low BRAM/DSP usage validates future expandability.
FUTURE SCOPE:
● Add RGB support and larger kernels (e.g., 5×5, 7×7) using
extended line buffers.
● Integrate remote image acquisition/processing over
Ethernet (LwIP stack).
15
IP CONFIDENCE LEVELS
REFERENCES
[1] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 3rd Edition, Pearson Education, ISBN: 978-0131687288.
[2] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, 2nd Edition, Pearson Education, ISBN: 978-0130449115.
[3] Xilinx Documentation, "Building Custom AXI IP", [Online]. Available:
https://www.xilinx.com/support/documentation-navigation/design-hubs/dh0002-building-axi-ip-hub.html
[4] Xilinx PYNQ Documentation, "Getting Started with PYNQ", [Online]. Available:
https://pynq.readthedocs.io/en/latest/getting_started.html
[5] BLT Inc., "AXI Protocol Applications and Functionality", [Online]. Available:
https://bltinc.com/2025/02/06/axi-protocol-applications-and-functionality/
[6] Vipin K Menon, "FPGA, Digital Design, and Verilog Tutorials", YouTube channel, [Online].
Available: https://www.youtube.com/@Vipinkmenon
16
THANK YOU
