1570514051.pptx

A Software/Hardware Co-Design Framework
for the ‘Internet of Eyes’
Cathal Garry, Derek Molloy
Entwine Centre for IoT, Dublin City University

Introduction
o The main challenge examined in this paper was to bring ‘eyes’ to the
Internet of Things in real time
o Background research indicates that the current technologies able to
facilitate this are:
Cloud computing
GPUs
FPGAs
Neuromorphic chipsets
SDSoCs

What are SDSoCs?
o An SDSoC is an integrated circuit that contains a processor, a number of
peripherals and some programmable logic
o SDSoCs such as the Xilinx Zynq chipset consist of two main components on the same
SoC:
The processing system (PS)
The programmable logic (PL)
o The PL is used to create custom IP (intellectual property), which is linked to the
processing system using standard AMBA AXI interfaces
o The processing system is used to run a software stack, which can access this
custom IP in the programmable logic

What are SDSoCs?
o The PL can be updated by software
either before run time or
dynamically during operation
o This effectively allows software to
redefine the hardware
o This simplifies the software
development flow
[Xilinx SDSoC Overview]

Advantages and Disadvantages
o Cloud computing can offer real-time image processing while saving on local
power consumption, but in areas with restricted network access, latency can be a
problem
o GPUs offer real-time image processing at the edge but have high power
requirements
o FPGAs can offer real-time image processing with relatively low power
consumption, but they require a developer to have a high level of expertise
o Neuromorphic chipsets such as the Movidius compute stick are relatively new to the
market and require a high level of expertise in order to implement a solution
o Of the options considered, SDSoC is the only one that offers low power consumption
and real-time image processing while keeping development complexity relatively low

Architecture
o The aim of this architecture
was to provide low power
consumption and real-time
image processing using an
SDSoC
o The proposed architecture is
made up of three components:
The producer
The handler
The consumer
o The architecture was applied to
a chosen application: a
motorway controlled by
variable speed limits

The Producer
o The SDSoC chosen for this
research was the Xilinx Zynq chipset
o The processing system on the Xilinx
Zynq chipset contains an ARM Cortex-A9
processor along with a number of
standard peripherals such as UART and I2C
o The programmable logic contains a
number of system gates, DSP blocks and RAM
o Many Zynq platforms are available
on the market; the one chosen for
this research was the PYNQ
platform
[Xilinx Zybo]

PYNQ Platform
o PYNQ, or Python Productivity for Zynq, is a Xilinx platform that provides a software
stack that allows developers to access the benefits of an FPGA without learning
advanced hardware design skills
o The PYNQ platform provides this support through Python libraries for accessing the PL
o The PYNQ platform can be accessed over UART or through Jupyter notebooks
[Xilinx PYNQ]

PYNQ Platform
o The PYNQ platform runs an Ubuntu-based Linux, which is optimized for
developer productivity and provides support for many standard drivers and
libraries
o The framework also provides a feature called overlays, which allows the
hardware in the PL to be reprogrammed at run time
o The PYNQ framework can also be ported to other Zynq-based platforms
o The variable speed limit (VSL) controlled motorway application was
implemented by splitting it between the PS and PL

The Producer – Processing System
o The PS was used to monitor the vehicle count and send data to the
handler using MQTT
o The PS read the result from a register in the custom IP over an AXI Lite
interface, once for each frame in the input video stream
o The PS also stored a history of the count values provided by the custom
IP in the PL. This history was then used to derive a congestion level for
the motorway
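
The history-to-congestion step described above can be sketched as follows. This is a minimal illustration only: the class name, the window length and the thresholds are assumptions and are not taken from the original implementation.

```python
from collections import deque


class CongestionMonitor:
    """Rolling history of per-frame vehicle counts (hypothetical helper)."""

    def __init__(self, window=30, thresholds=(5, 15)):
        # window: number of recent frames kept in the history (assumed value)
        # thresholds: mean counts separating low/medium/high (assumed values)
        self.history = deque(maxlen=window)
        self.thresholds = thresholds

    def add_count(self, count):
        # Oldest entries are dropped automatically once the window is full.
        self.history.append(count)

    def level(self):
        if not self.history:
            return "unknown"
        mean = sum(self.history) / len(self.history)
        low, high = self.thresholds
        if mean < low:
            return "low"
        if mean < high:
            return "medium"
        return "high"


monitor = CongestionMonitor(window=4)
for count in [2, 3, 20, 25, 30, 28]:
    monitor.add_count(count)
print(monitor.level())  # high (mean of the last four counts is 25.75)
```

Averaging over a window rather than reacting to a single frame keeps one noisy count from changing the reported level.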

Producer – Programmable Logic
o The PL was used to implement a
custom IP for counting the
number of vehicles in a given
image frame
o This was performed by
implementing a number of image
processing techniques in Vivado HLS
o The result of this image
processing was stored in a
register which could be
accessed over an AXI Lite
interface
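
The register hand-off between PL and PS can be sketched with an in-memory stand-in. The register offset and all names below are hypothetical; on a real board the stand-in would be replaced by a memory-mapped accessor that reads and writes by byte offset (PYNQ's MMIO class works this way), which is not shown here.

```python
class FakeAxiLiteRegisters:
    """In-memory stand-in for a memory-mapped AXI Lite register block."""

    def __init__(self):
        self._regs = {}

    def write(self, offset, value):
        # Registers are modelled as 32 bits wide.
        self._regs[offset] = value & 0xFFFFFFFF

    def read(self, offset):
        # Unwritten registers read back as zero in this model.
        return self._regs.get(offset, 0)


COUNT_REG_OFFSET = 0x10  # hypothetical offset of the vehicle-count register


def read_vehicle_count(regs):
    """Read the per-frame result the custom IP left in its register."""
    return regs.read(COUNT_REG_OFFSET)


regs = FakeAxiLiteRegisters()
regs.write(COUNT_REG_OFFSET, 7)  # stands in for the PL storing its result
print(read_vehicle_count(regs))  # 7
```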

The Handler and Consumer
o The remaining parts of the
architecture are the
handler and consumer
o The handler acts as an
intermediate agent
between the producers
and consumers in the
network
o The consumer acts as an
endpoint for the
producers' data
It can receive data from a
single or multiple
producers in order to make
a decision
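
The three roles can be illustrated with a small in-memory sketch. It does not use MQTT; the handler here is a plain publish/subscribe object, and the topic name, message fields and speed-limit policy are all assumptions made for illustration.

```python
from collections import defaultdict


class Handler:
    """In-memory stand-in for the handler between producers and consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Forward the message to every consumer subscribed to the topic.
        for callback in self._subscribers[topic]:
            callback(message)


class Consumer:
    """Keeps the latest level per producer and derives one decision."""

    LIMITS_KMH = {"low": 120, "medium": 100, "high": 80}  # hypothetical policy

    def __init__(self):
        self.latest = {}

    def on_message(self, message):
        self.latest[message["producer"]] = message["level"]

    def speed_limit(self):
        if not self.latest:
            return self.LIMITS_KMH["low"]
        # The most congested producer sets the limit.
        return min(self.LIMITS_KMH[level] for level in self.latest.values())


handler = Handler()
consumer = Consumer()
handler.subscribe("motorway/congestion", consumer.on_message)
handler.publish("motorway/congestion", {"producer": "cam1", "level": "low"})
handler.publish("motorway/congestion", {"producer": "cam2", "level": "high"})
print(consumer.speed_limit())  # 80
```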

Power Consumption
o The power consumption
was measured using a
number of different
profiles including:
Different types and amounts
of programmable logic in
the PL
Different processor states
in the PS
o The worst-case power
consumption when
performing image
processing in both the PL
and the PS was 2.5 W

Response Time
o The response time was
measured using a number of
different platforms and
processors
o The tests were also varied
using a number of different
image processing tasks
across different image
resolutions
o The worst-case response time
for the PL when processing a
1080p image at 30 fps was
40 ms
o This increases to 50 ms when
testing a 1080p image at
60 fps
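
The response-time figures above can be related to the frame period with simple arithmetic. The sketch below assumes the reported values are per-frame latencies; the function names are illustrative only.

```python
import math


def frame_period_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frame_periods_spanned(response_ms, fps):
    """Number of frame periods covered by one response time, rounded up."""
    return math.ceil(response_ms * fps / 1000)


print(round(frame_period_ms(30), 2))  # 33.33
print(round(frame_period_ms(60), 2))  # 16.67
print(frame_periods_spanned(40, 30))  # 2
print(frame_periods_spanned(50, 60))  # 3
```

Under that assumption, a result becomes available two to three frame periods after its frame arrives, so keeping up with the stream relies on frames being processed in a pipelined fashion.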

MQTT Latency
o MQTT was used as the
transfer protocol, so it was
important to determine
the latency across the
network
o The MQTT latency was
measured by varying the
number of messages
published per second
o The latency stays in the
millisecond range as long
as fewer than 100 messages
are published per second
o Above this rate the latency
increases by 1000x
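
One way to stay under the 100 messages per second threshold is to publish only every Nth frame. The sketch below is an assumption about how the limit might be applied (as an aggregate across all producers); the function name and parameters are hypothetical.

```python
def publish_every_n_frames(fps, producers, max_rate=100):
    """Smallest N such that producers * fps / N stays below max_rate."""
    total_rate = fps * producers
    # N must be strictly greater than total_rate / max_rate.
    return total_rate // max_rate + 1


print(publish_every_n_frames(60, 1))  # 1 (60 messages/s is already below 100)
print(publish_every_n_frames(60, 4))  # 3 (240 / 3 = 80 messages/s)
```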

Register Access Times
o Since the result is
produced for each frame
in a video stream, it was
important to determine
the register access time
from the PL
o The register access time
was measured over a
varying number of reads
per second across a
number of iterations
o The worst-case latency
was ~100 µs, which is
well below the ~16.7 ms
frame period of a
60 fps video
Overlay Switching
o Overlay switching allows
the user to change the
logic in the PL at run time,
e.g., to change from a daytime
to a night-time image
processing algorithm
o This test measures how
long it takes for the
programmable logic to
change and for the image
processing to restart
o The worst-case overlay
switching time found in
this test was 30 seconds
Other Analysis
o The implementation in this research found that some types of
image processing are better suited to SDSoC than others:
Image processing techniques that are very spatially localized in
nature perform better in the programmable logic
The further away a required pixel is (spatially), the more memory is
required to buffer it
An alternative approach is to split the image processing
between the PS and PL (e.g., for higher-level reasoning)
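
The link between spatial distance and memory can be made concrete with a line-buffer estimate. This assumes a conventional streaming design in which pixels arrive in raster order and a window of K rows needs K - 1 full image lines buffered; that design and the function name are assumptions, not details from the original work.

```python
def line_buffer_pixels(image_width, window_height):
    """Pixels to buffer so that window_height rows are available at once."""
    return (window_height - 1) * image_width


# 1080p frames are 1920 pixels wide.
print(line_buffer_pixels(1920, 3))   # 3840 pixels for a 3x3 window
print(line_buffer_pixels(1920, 31))  # 57600 pixels for a 31x31 window
```

Under these assumptions a ten-fold taller window needs fifteen times the buffering, which is why small local operators map well onto the PL.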

Conclusion
o The proposed architecture provides a scalable IoT solution using
software/hardware co-design for real-time IoE applications
o The research provides an implementation and evaluation of this
architecture through the development of a full-stack IoE application

Questions?
ACKNOWLEDGEMENTS
This research was supported by Xilinx Inc., who provided the PYNQ platform used in this project.
In particular, we would like to thank Cathal McCabe and Peter Ogden from Xilinx Inc., who provided
technical support during the research. We would also like to thank the Intel Corporation for their
support throughout the DCU master’s program and the development of this research.

References
[1] Xilinx SDSoC Overview, “SDSoC Overview”, [Online]. Available: www.xilinx.com/support.html
[2] Xilinx Zybo, “Zybo Reference Manual”, [Online]. Available: www.xilinx.com/support/documentation/university/XUP%20Boards/XUPZYBO/documentation/ZYBO_RM_B_V6.pdf
[3] Xilinx PYNQ, “PYNQ Python Productivity for Zynq”, Xilinx Documentation
