A Software/Hardware Co-Design Framework
for the ‘Internet of Eyes’
Cathal Garry, Derek Molloy
Entwine Centre for IoT, Dublin City University
Introduction
o The main challenge examined in this paper was to bring ‘eyes’ to the
Internet of Things in real time
o Background research indicates current technologies that can facilitate this
are:
Cloud Computing
GPUs
FPGAs
Neuromorphic chipsets
SDSoCs
What are SDSoCs?
o An SDSoC is an integrated circuit that contains a processor, a number of
peripherals and some programmable logic
o An SDSoC such as the Xilinx Zynq chipset consists of two main components on the same
SoC:
The processing system (PS)
The programmable logic (PL)
o The PL is used to create custom IP (intellectual property), which is linked to the
processing system using standard AMBA AXI interfaces
o The processing system is used to run a software stack, which can access this
custom IP in the programmable logic
What are SDSoCs?
o The PL can be configured either
before run time or dynamically
during run-time operation by software
o This effectively allows software to
redefine the hardware
o This simplifies the
software development flow
[Xilinx SDSoC Overview]
Advantages and Disadvantages
o Cloud computing can offer real-time image processing while saving on local
power consumption, but in areas with restricted network access latency can be a
problem
o GPUs offer real-time image processing at the edge but have high power
requirements
o FPGAs can offer real-time image processing with relatively low power
consumption, but they require a developer to have a high level of expertise
o Neuromorphic chipsets like the Movidius compute stick are relatively new to the
market and require a high level of expertise in order to implement a solution
o Of the options considered, SDSoC is the only one that offers low power consumption and real-time
image processing while keeping development complexity relatively low
Architecture
Proposed Architecture
o The aim of this architecture
was to develop a solution that
could provide low power
consumption and real time
image processing using an
SDSoC
o The proposed architecture is
made up of three components:
The producer
The handler
The consumer
o The architecture was applied to
a chosen application: a
motorway with variable speed
limit control
The Producer
o The SDSoC that was chosen for this
research was the Xilinx Zynq chipset
o The processing system on the Xilinx
Zynq chipset contains an ARM Cortex-A9
processor along with a number of
standard peripherals like UART and I2C
o The programmable logic contains a
number of system gates, DSP slices and RAM
o There are many Zynq platforms
available on the market; the one
chosen for this research was the
PYNQ platform
[Xilinx Zybo]
PYNQ Platform
o PYNQ (Python Productivity for Zynq) is a Xilinx platform that provides a software
stack that allows developers to access the benefits of an FPGA without learning
advanced hardware design skills
o The PYNQ platform provides this support through Python libraries for accessing the PL
o The PYNQ platform can be accessed over UART or through Jupyter notebooks
[Xilinx PYNQ]
PYNQ Platform
o The PYNQ platform runs an Ubuntu-based Linux distribution, which is optimized for
developer productivity and provides support for many standard drivers and
libraries
o The framework also provides a feature called overlays, which allows the
hardware in the PL to be reprogrammed at run time
o The PYNQ framework can be ported to other Zynq based platforms as well
o The application of a variable speed limit (VSL) controlled motorway was
implemented by splitting the application between the PS and PL
The Producer – Processing System
o The PS was used to monitor the vehicle count and send data to the
handler using MQTT
o The PS read the result from a register in the custom IP over an AXI Lite
interface, once for each frame in the input video stream
o The PS also stored a history of the count values provided by the custom
IP in the PL. This history was then used to derive a congestion level for
the motorway
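The per-frame flow on the PS can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the window size, the thresholds and the level names are hypothetical, and `read_count` is an assumed stand-in for the AXI Lite register read (on a PYNQ board it might wrap a call such as `pynq.MMIO(base, length).read(offset)`, with the address and offset taken from the actual design).

```python
from collections import deque
from statistics import mean


class CongestionMonitor:
    """Keeps a rolling history of per-frame vehicle counts."""

    def __init__(self, read_count, window=30, thresholds=(5, 15)):
        # read_count: callable returning the latest count (assumed register read).
        # window and thresholds are illustrative values, not from the talk.
        self.read_count = read_count
        self.history = deque(maxlen=window)
        self.low, self.high = thresholds

    def on_frame(self):
        """Read one count, store it, and return the current congestion level."""
        self.history.append(self.read_count())
        average = mean(self.history)
        if average < self.low:
            return "free"
        if average < self.high:
            return "moderate"
        return "congested"


# Simulated register values standing in for the custom IP output.
counts = iter([2, 3, 4, 20, 30, 40])
monitor = CongestionMonitor(lambda: next(counts), window=3)
levels = [monitor.on_frame() for _ in range(6)]
print(levels)
```

Averaging over a window rather than reacting to a single frame keeps one noisy count from changing the reported level; the level string is the small piece of information that would then be published to the handler.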
The Producer – Programmable Logic
o The PL was used to implement a
custom IP for counting the
number of vehicles in a given
image frame
o This was done by implementing a
number of image processing
techniques in Vivado HLS
o The result of this image
processing was stored in a
register which could be
accessed over an AXI Lite
interface
The Handler and Consumer
o The remaining parts of the
architecture are the
handler and consumer
o The handler acts as an
intermediate agent
between the producers
and consumers in the
network
o The consumer acts as an
endpoint for the
producers' data
It can receive data from a
single or multiple
producers in order to make
a decision
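The three roles can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: MQTT is replaced by direct function calls, the topic name and producer identifiers are made up, and the mapping from congestion level to speed limit is hypothetical.

```python
from collections import defaultdict


class Handler:
    """In-memory stand-in for the handler: stores and forwards producer data."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.latest = {}  # last value stored per (topic, producer)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, producer_id, level):
        self.latest[(topic, producer_id)] = level
        for callback in self.subscribers[topic]:
            callback(producer_id, level)


class Consumer:
    """Chooses a speed limit from the worst level reported by any producer."""

    # Hypothetical mapping from congestion level to speed limit in km/h.
    LIMITS = {"free": 120, "moderate": 100, "congested": 60}

    def __init__(self):
        self.levels = {}

    def on_message(self, producer_id, level):
        self.levels[producer_id] = level

    def speed_limit(self):
        limits = (self.LIMITS[level] for level in self.levels.values())
        return min(limits, default=self.LIMITS["free"])


handler = Handler()
consumer = Consumer()
handler.subscribe("motorway/congestion", consumer.on_message)
handler.publish("motorway/congestion", "camera-1", "free")
handler.publish("motorway/congestion", "camera-2", "congested")
print(consumer.speed_limit())
```

Because the consumer only sees small messages keyed by producer, adding more producers along the motorway does not change its logic.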
Working Demo
Results Obtained
Power Consumption
o The power consumption
was measured using a
number of different
profiles including:
Different types and amounts
of programmable logic in
the PL
Different processor states
in the PS
o The worst-case power
consumption when
performing image
processing in both the PL
and the PS was 2.5 W
Response Time
o The response time was
measured using a number of
different platforms and
processors
o The tests were also varied
using a number of different
image processing tasks
across different image
resolutions
o The worst-case response time
for the PL when processing
1080p video at 30 fps was
40 ms
o This increased to 50 ms when
processing 1080p video at
60 fps
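To put these figures in context, the response time can be compared with the frame period. The short calculation below is a sketch that assumes the reported response time is an end-to-end latency per frame; the slides do not state whether successive frames are pipelined, so this says nothing about throughput.

```python
def frames_of_latency(response_ms, fps):
    """Return the response time expressed in frame periods."""
    frame_period_ms = 1000.0 / fps
    return response_ms / frame_period_ms


lat_30 = frames_of_latency(40, 30)  # 40 ms at 30 fps
lat_60 = frames_of_latency(50, 60)  # 50 ms at 60 fps
print(f"30 fps: {lat_30:.1f} frame periods")  # 30 fps: 1.2 frame periods
print(f"60 fps: {lat_60:.1f} frame periods")  # 60 fps: 3.0 frame periods
```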
MQTT Latency
o MQTT was used as the
transfer protocol so it was
important to determine
the latency across the
network
o The MQTT latency was
measured by varying the
number of messages
published per second
o The latency is in the
millisecond range as long as
fewer than 100 messages
are published per
second
o Above this rate the latency
increases by 1000x
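One way to measure one-way message latency is to carry the send timestamp in the payload and subtract it on receipt, which only works if the sender and receiver clocks are synchronized. The sketch below assumes a JSON payload with hypothetical field names and uses injected clocks so it can run without a broker; it is not the authors' measurement code.

```python
import json
import time


def make_payload(count, clock=time.time):
    """Build a message carrying the value and the send timestamp."""
    return json.dumps({"count": count, "sent_at": clock()})


def latency_ms(payload, clock=time.time):
    """Return the one-way latency in milliseconds for a received payload."""
    message = json.loads(payload)
    return (clock() - message["sent_at"]) * 1000.0


# Simulated clocks: the message is sent at t=10.000 s and received at t=10.004 s.
payload = make_payload(7, clock=lambda: 10.000)
measured = latency_ms(payload, clock=lambda: 10.004)
print(f"{measured:.1f} ms")  # 4.0 ms
```

In a real deployment `make_payload` would run on the producer just before publishing and `latency_ms` inside the subscriber's message callback.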
Register Access Times
o Since the result is
produced for each frame
in a video stream it was
important to determine
the register access time
from the PL
o The register access time
was measured over a
varying number of reads
per second across a
number of iterations
o The worst-case latency
was ~100 µs, well within
the ~16.7 ms frame period
of 60 fps video
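A timing loop of the following shape could produce such a measurement, and the second helper shows how small ~100 µs is relative to a 60 fps frame period. This is a sketch under assumptions: `read_register` is a hypothetical callable standing in for the actual AXI Lite read, and the iteration count is arbitrary.

```python
import time


def worst_case_read_us(read_register, iterations=1000):
    """Time repeated register reads and return the slowest one in microseconds."""
    worst = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        read_register()
        worst = max(worst, time.perf_counter() - start)
    return worst * 1e6


def share_of_frame_period(latency_us, fps):
    """Fraction of one frame period consumed by a single read."""
    return latency_us / (1e6 / fps)


share = share_of_frame_period(100, 60)
print(f"{share:.2%} of a 60 fps frame period")  # 0.60% of a 60 fps frame period

# Dummy reader used here only so the sketch runs without hardware.
measured = worst_case_read_us(lambda: 0, iterations=100)
```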
Overlay Switching
o Overlay switching allows
the user to change the
logic in the PL at run time,
e.g., switching from a daytime
to a night-time image
processing algorithm
o This test measured how
long it takes for the
programmable logic to
change and for the image
processing to restart
o The worst-case overlay
switching time found in
this test was 30 seconds
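Since a switch can take tens of seconds, it makes sense to reload the PL only when the required overlay actually changes. The sketch below illustrates that guard; the bitstream names and the day/night rule are hypothetical, and `load_overlay` is an assumed callable (on PYNQ it could wrap `pynq.Overlay(name)`, which downloads the bitstream when constructed).

```python
class OverlaySwitcher:
    """Reloads the PL only when the required overlay actually changes."""

    def __init__(self, load_overlay):
        # load_overlay: callable taking a bitstream name (assumed interface).
        self.load_overlay = load_overlay
        self.current = None
        self.switches = 0

    def select(self, hour):
        # Hypothetical rule: daytime algorithm between 07:00 and 19:00.
        wanted = "day.bit" if 7 <= hour < 19 else "night.bit"
        if wanted != self.current:
            self.load_overlay(wanted)
            self.current = wanted
            self.switches += 1
        return self.current


# A list stands in for the hardware so the sketch runs anywhere.
loaded = []
switcher = OverlaySwitcher(loaded.append)
for hour in [6, 8, 12, 18, 19, 23]:
    switcher.select(hour)
print(loaded)  # ['night.bit', 'day.bit', 'night.bit']
```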
Other Analysis
o The implementation in this research found that some types of
image processing are better suited to SDSoC than others:
Image processing techniques that are very spatially localized in
nature perform better in the programmable logic
The further away a required pixel is (spatially), the more memory is
required to store it
An alternative approach is to split the image processing
between the PS and PL (e.g., using the PS for higher-level reasoning)
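The memory cost of spatial reach can be made concrete with a line-buffer estimate. The sketch assumes pixels arrive in raster-scan order with no full frame buffer, so a k x k window needs the previous k-1 rows plus k pixels of the current row held on chip; this is a common streaming arrangement, not necessarily the one used in this work.

```python
def buffered_pixels(image_width, window_size):
    """Pixels held on chip to form a square window over a raster-scan stream."""
    return (window_size - 1) * image_width + window_size


for k in (3, 7, 15):
    print(f"{k}x{k} window at 1920 px wide: {buffered_pixels(1920, k)} pixels")
# 3x3 window at 1920 px wide: 3843 pixels
# 7x7 window at 1920 px wide: 11527 pixels
# 15x15 window at 1920 px wide: 26895 pixels
```

The buffer grows linearly with the window height, which is why operations that only need nearby pixels map more comfortably onto the PL's limited RAM.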
Conclusion
o The proposed architecture provides a scalable IoT framework using
software/hardware co-design for real-time IoE applications
o The research provides an implementation and evaluation of this
architecture through the development of a full stack IoE application
Questions?
ACKNOWLEDGEMENTS
This research was supported by Xilinx Inc., which provided the PYNQ platform used in this project.
In particular, we would like to thank Cathal McCabe and Peter Ogden from Xilinx Inc., who provided
technical support during the research. We would also like to thank the Intel Corporation for its
support throughout the DCU master’s program and the development of this research.
References
[1] Xilinx SDSoC Overview, “SDSoC Overview”, [Online]. Available: www.xilinx.com/support.html
[2] Xilinx Zybo, “Zybo Reference Manual”, [Online]. Available: www.xilinx.com/support/documentation/university/XUP%20Boards/XUPZYBO/documentation/ZYBO_RM_B_V6.pdf
[3] Xilinx PYNQ, “PYNQ Python Productivity for Zynq”, Xilinx Documentation

Editor's Notes

  • #3 Cloud computing, GPUs and FPGAs have been around for a while, but SDSoCs and neuromorphic chipsets are relatively new.
  • #6 When I looked at the Movidius compute stick, the level of complexity involved in using it in a full network stack was quite high and there was limited support available.
  • #8 The producer's role is to process the large amounts of image data at the edge and reduce it to a small piece of information, which is then sent to the handler. The handler acts as an intermediate agent between the producer and consumer. It is responsible for collecting and storing the small pieces of information provided by the producers. The consumer is responsible for making a decision based on the data received from the producers. This decision can be based on a single producer or on multiple producers.
  • #12 Explain what AXI Lite is here.
  • #15 The video stream of the motorway is output by the laptop. The handler is implemented on a virtual machine running Ubuntu Linux. The image on the monitor is the post-processed image.
  • #19 An NTP server was used to synchronize the consumer and producer clocks.
  • #24 The architecture outlined provides a scalable IoT architecture in which the number of producers or consumers could easily be increased. For the chosen application, the structure could easily be changed to have a large number of producers along the motorway monitoring traffic and feeding this information to a single consumer.