SoC Design Flow for Nonlinear CMOS ISP (39

System-on-Chip Design
Flow for the Image Signal
Processor of a Nonlinear
CMOS Imaging System
Maikon Nascimento and Dileepan Joseph
Electrical and Computer Engineering, University of Alberta
16/01/2019

Introduction
● What do these pictures have in common?
● Autonomous devices require high
data-processing capabilities but have
restrictions in power, weight, and cost;
● Latency is also crucial for these fast-moving
applications, which means the use
of cloud computing is not ideal;
● The very high bandwidth needed by thousands of
cameras streaming HD video also makes
the use of cloud computing infeasible;
● All of these cases require a solution --
edge computing -- where computing is
realized locally, at the edge of the cloud;
● Recently, system-on-chip (SoC) platforms
have been developed to address these
requirements.
https://www.kqed.org
https://www.procemex.com
https://theumlaut.com

Introduction
● As a case study, our edge computing
device is a high-dynamic-range (HDR) video camera;
● HDR is crucial to the future of digital
imaging, especially outdoors;
● High-performance CMOS imaging systems
require an image signal processor (ISP),
especially for HDR imaging;
● A promising approach for HDR imaging is
to use nonlinear CMOS image sensors,
mimicking the human eye, and the ISP
must be tailored for the nonlinearity.
*High-Dynamic-Range (HDR) Vision - Bernd Hoefflinger

Apparatus
● The SoC (chip):
○ Is manufactured by Xilinx and is called
Zynq-7000;
○ Integrates a dual-core ARM microprocessor (uP),
which supports Linux, with a 7 Series FPGA;
○ Features a high throughput internal
interface, enabling the data rates required
for HD video processing;
● The SoC platform (board) is manufactured
by MYIR with essential peripherals such
as: DDR RAM, SD card, JTAG, ethernet,
and HDMI.
https://www.myirtech.com

Application
● High-level schematic of the SoC, where blocks in red
are in the FPGA and blocks in blue are in the uP;
● The ISP is composed of fixed pattern noise (FPN)
correction, salt-and-pepper noise (SPN) filtering, and a tone
mapping operator (TMO), explained in the next slide
using a MATLAB simulation;
● The controller is responsible for:
○ Handling the AXI4-Stream protocol of the
direct memory access (DMA) module that interfaces the
FPGA and uP;
○ Providing FPN correction coefficients obtained from the
uP;
○ Generating a few control signals needed by the SPN
and TMO circuits;
● Currently, in place of the CMOS image sensor (CIS), an HDR video
from [1] simulates the ISP input; it is loaded from
disk and transferred from the uP to the FPGA;
* [1] http://www.hdrv.org/Resources.php

Application
● ISP operations are illustrated in
this panel;
● On the top row, “All ISP” shows
the complete system (FPN, SPN,
and TMO);
● “No TMO” demonstrates the
importance of tone mapping for
HDR video;
● “No SPN” shows the effect of
salt-and-pepper noise;
● “No FPN” presents the fixed
pattern noise inherent to a
nonlinear CIS.
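
The three ISP stages, and the "No ..." variants shown in the panel, can be sketched in plain Python. This is only an illustration, not the authors' MATLAB simulation or FPGA design. The per-pixel offset-and-gain FPN model, the 3x3 median filter for SPN, the min-max stretch standing in for the TMO, and all names (`fpn_correct`, `spn_filter`, `tone_map`, `isp`, `use_fpn`, `use_spn`, `use_tmo`) are assumptions made for this example.

```python
from statistics import median


def fpn_correct(frame, offset, gain):
    """Assumed FPN model: corrected = (raw - offset) * gain, per pixel."""
    return [[(p - o) * g for p, o, g in zip(row_p, row_o, row_g)]
            for row_p, row_o, row_g in zip(frame, offset, gain)]


def spn_filter(frame):
    """Assumed SPN filter: 3x3 median with replicated borders."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [frame[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(median(window))
        out.append(row)
    return out


def tone_map(frame, out_max=255):
    """Stand-in TMO: global min-max stretch to the range 0..out_max."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = (hi - lo) or 1  # avoid division by zero on flat frames
    return [[round((p - lo) * out_max / span) for p in row] for row in frame]


def isp(frame, offset, gain, use_fpn=True, use_spn=True, use_tmo=True):
    """Run the stages in order; each flag mirrors a 'No ...' panel."""
    out = frame
    if use_fpn:
        out = fpn_correct(out, offset, gain)
    if use_spn:
        out = spn_filter(out)
    if use_tmo:
        out = tone_map(out)
    return out
```

Calling `isp(frame, offset, gain, use_spn=False)` would correspond to the "No SPN" panel, and likewise for the other two flags.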

Method
● The block diagram is marked with red frames to illustrate
which modules are being utilized;
● We use the FPGA to implement a low power ISP that
exploits parallel processing for high speed computation;
● The μP, a dual-core ARM running Linux, is used for networking
and will, in future, support open-source computer vision
frameworks;
● The interfaces used between the FPGA and uP are:
○ High-performance (HP) ports for data transfer;
○ General-purpose (GP) ports for configuration and control;
○ Interrupts from the FPGA to the uP;
● The protocol adopted is AXI4 from AMBA, in three variants:
AXI4 (full) for multiple devices on the HP ports, AXI4-Lite for
control, and AXI4-Stream for point-to-point communication;
● The DMA is responsible for data transfer between the uP and
the FPGA.
*https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
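
As a rough illustration of the point-to-point AXI4-Stream link, the toy model below counts clock cycles for one packet. It relies only on the basic handshake rule of the protocol: a word moves on a cycle where TVALID and TREADY are both high, and TLAST marks the final word of a packet. The function name `axi_stream_transfer`, the always-valid source, and the repeating `ready_pattern` are assumptions for this sketch and do not describe the actual controller.

```python
from itertools import cycle


def axi_stream_transfer(words, ready_pattern):
    """Toy cycle-level model of one AXI4-Stream packet.

    words         -- list of data words the source wants to send
    ready_pattern -- list of booleans: the sink's TREADY per clock cycle,
                     repeated for as long as the transfer lasts
    Returns (received, cycles), where received is a list of (word, tlast).
    """
    if words and not any(ready_pattern):
        raise ValueError("sink is never ready; transfer would not finish")

    received = []
    index = 0
    cycles = 0
    for tready in cycle(ready_pattern):
        if index >= len(words):
            break
        cycles += 1
        tvalid = True                      # assumed: source always has data
        tlast = index == len(words) - 1    # flag the last word of the packet
        if tvalid and tready:              # handshake: both high on this cycle
            received.append((words[index], tlast))
            index += 1
    return received, cycles


if __name__ == "__main__":
    data, n_cycles = axi_stream_transfer([1, 2, 3], [True, False])
    print(data, n_cycles)
```

With a sink that is ready on every other cycle, the three words take five cycles, which shows how back-pressure from the sink stretches a transfer.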

Method
● This system, which makes the FPGA the master,
differs from hardware acceleration approaches
reported in the literature;
● Three different approaches are shown, where
arrows indicate the direction of data transfer:
○ (1) typifies a hardware acceleration architecture
where the uP is the master;
○ (2) represents our system where a CIS is simulated,
using data loaded by the uP, and where the FPGA is
the master;
○ (3) presents our ideal architecture (future work),
including an actual CIS, where the FPGA remains the
master.

Results and Discussion
Validation
● Initial functional validation of the FPGA design was carried out with small images, to validate the protocol
and the delay matching in the data path;
● Automated validation with large images showed zero bit errors against the MATLAB simulation;

Resource Occupancy
● This picture shows the occupancy of the FPGA
by the implemented ISP;
● The vertical blue rectangles on the
cells are blocks of RAM (BRAM);
● Other blocks are lookup tables
(LUTs) for logic and small memories,
DSP blocks for multipliers and
accumulators, and flip-flops (FFs);
● This project has not been constrained
yet;

● These pie charts show the FPGA resources consumed by our ISP on the Zynq XC7Z020;
● On the left is the number of LUTs used (53200 LUTs in total, including logic and LUTRAM);
● On the right is the number of BRAMs consumed by each major component, out of a total of 140 BRAMs;

● On the left is the number of FFs used (106400 available);
● On the right is the number of DSP blocks used (220 available);
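
The utilization percentages reported in the conclusion (14.5% of LUTs, 9.0% of flip-flops, 31.4% of memory) can be turned into approximate resource counts using the totals listed above. The short calculation below is a sketch: it assumes that "memory" refers to the 140 BRAMs, and because the percentages are rounded, the counts are only approximate. The names `AVAILABLE`, `UTILIZATION_PCT`, and `approx_used` are introduced here for illustration.

```python
# Totals for the Zynq XC7Z020 as listed on the slides.
AVAILABLE = {"LUT": 53200, "FF": 106400, "BRAM": 140}

# Utilization from the conclusion; "BRAM" assumes memory means block RAM.
UTILIZATION_PCT = {"LUT": 14.5, "FF": 9.0, "BRAM": 31.4}


def approx_used(available, pct):
    """Approximate number of resources used, from a rounded percentage."""
    return round(available * pct / 100)


used = {name: approx_used(AVAILABLE[name], UTILIZATION_PCT[name])
        for name in AVAILABLE}

if __name__ == "__main__":
    for name, count in used.items():
        print(f"{name}: about {count} of {AVAILABLE[name]}")
    # LUT: about 7714 of 53200
    # FF: about 9576 of 106400
    # BRAM: about 44 of 140
```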

This experiment simulates an overload
situation by programming the μP to be busy
from time to time, as shown by the red CPU
line in the image on the left. Even with the μP at close
to 50% CPU usage, the FPGA keeps its
processing steady, as shown in the
oscilloscope capture of the FPGA IRQ in the next
image;

● Even with a busy μP, the FPGA
generates interrupts at a constant 30
frames per second, reading data from
the μP, processing it, and sending
it back;
● On the left is the oscilloscope
screenshot showing the interrupt
signal managed by the FPGA.
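
One way to quantify "constant 30 frames per second" from a capture like this is to compute the mean frame rate and the spread of the interrupt periods from edge timestamps. The sketch below uses hypothetical, ideal timestamps rather than measured data, and the function name `frame_stats` is an assumption.

```python
def frame_stats(timestamps_s):
    """Return (mean frames per second, peak-to-peak period jitter in seconds).

    timestamps_s -- interrupt edge times in seconds, in increasing order
    """
    if len(timestamps_s) < 2:
        raise ValueError("need at least two interrupt timestamps")
    periods = [later - earlier
               for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    mean_period = sum(periods) / len(periods)
    jitter = max(periods) - min(periods)
    return 1.0 / mean_period, jitter


if __name__ == "__main__":
    # Hypothetical ideal case: one interrupt every 1/30 s for one second.
    ideal = [i / 30 for i in range(31)]
    fps, jitter = frame_stats(ideal)
    print(f"{fps:.2f} FPS, jitter {jitter * 1e6:.3f} us")
```

A steady FPGA would show a mean close to 30 FPS and a jitter that stays small whether or not the μP is loaded.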

● This is a screenshot of a browser on a
cellphone connected to the network via
Wi-Fi, where our SoC system is serving
web content;
● This bitmap picture is the output of the ISP
received through the DMA; a Linux application converts
the binary data to a
bitmap;
● The user can refresh the picture at any time,
although the frame rate depends on the
conversion and may not reach 30 FPS;
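
The conversion from raw binary data to a bitmap can be sketched with the Python standard library alone. The slides do not state the pixel format, so this example assumes an 8-bit grayscale frame stored row by row with the top row first. The function name `gray8_to_bmp` is hypothetical, and this is not the Linux application used in the project.

```python
import struct


def gray8_to_bmp(pixels, width, height):
    """Wrap raw 8-bit grayscale pixels in an uncompressed BMP container.

    pixels -- bytes of length width * height, row-major, top row first
    """
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")

    row_pad = (-width) % 4                 # BMP rows are padded to 4 bytes
    image_size = (width + row_pad) * height
    # 256-entry grayscale palette, stored as blue, green, red, reserved.
    palette = b"".join(bytes((i, i, i, 0)) for i in range(256))
    data_offset = 14 + 40 + len(palette)   # file header + info header + palette

    file_header = struct.pack("<2sIHHI", b"BM",
                              data_offset + image_size, 0, 0, data_offset)
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height,  # header size, dimensions
                              1, 8,               # planes, bits per pixel
                              0, image_size,      # no compression, data size
                              2835, 2835,         # about 72 DPI, in pixels/metre
                              256, 0)             # palette entries, important
    # A positive height means the rows are stored bottom-up.
    rows = [pixels[y * width:(y + 1) * width] + b"\x00" * row_pad
            for y in range(height - 1, -1, -1)]
    return file_header + info_header + palette + b"".join(rows)


if __name__ == "__main__":
    bmp = gray8_to_bmp(bytes([0, 255, 128, 64, 32, 16]), 3, 2)
    with open("frame.bmp", "wb") as f:
        f.write(bmp)
```

Doing this per frame in software is the kind of step that could limit the refresh rate seen in the browser.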

Conclusion
● This work has achieved a SoC design flow
for hard real-time image signal processing
of a nonlinear HDR imaging system;
● Our ISP design used 14.5% of LUTs, 9.0%
of flip-flops, and 31.4% of memory, of a
Xilinx Zynq SoC, and consumed XXX mW,
to process HD video (00000 MB/s);
● One novelty of the design flow is that the
FPGA is the master of the SoC platform,
which includes a μP running Linux and
will, in future, include a nonlinear CIS;
● This approach is especially suited for
future edge computing applications
involving HDR and computer vision.
Acknowledgements:
