“FOMO: Real-Time Object Detection on Microcontrollers,” a Presentation from Edge Impulse

•

0 likes•85 views

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/fomo-real-time-object-detection-on-microcontrollers-a-presentation-from-edge-impulse/ Jan Jongboom, Co-founder and CTO of Edge Impulse, delivers the “FOMO: Real-Time Object Detection on Microcontrollers” tutorial at the May 2022 Embedded Vision Summit. Object detection models are vital for many computer vision applications. They can show where an object is in a video stream, or how many objects are present. But they’re also very resource-intensive—models like MobileNet SSD can analyze a few frames per second on a Raspberry Pi 4, using a significant amount of RAM. This has put object detection out of reach for the most interesting devices: microcontrollers. Microcontrollers are cheap, small, ubiquitous and energy efficient—and are thus attractive for adding computer vision to everyday devices. But microcontrollers are also very resource-constrained, with clock speeds of up to 200 MHz and less than 256 Kbytes of RAM—far too little to run complex object detection models. But… that’s about to change! In this talk, Jongboom outlines his company’s work on FOMO (“faster objects, more objects”), a novel DNN architecture for object detection, designed from the ground up to run on microcontrollers.

Technology

FOMO: Real-Time
Object Detection on
Microcontrollers
Jan Jongboom
CTO
Edge Impulse

Leading development platform for machine
learning on edge devices
Launched two years ago... Now 240 new
projects daily (!)
40% of these are vision projects
Wait... What? Edge Impulse?
2
Two years ago
(1,163 projects)
Today
(82,069 projects)
Edge Impulse project count
© 2022 Edge Impulse Inc.

Object Detection
3
© 2022 Edge Impulse Inc.

Significantly Different Performance (Rpi4)
4
60 fps 2 fps
© 2022 Edge Impulse Inc.

But Why?
5
"The tiny version of YOLO only uses 516 MB of GPU memory"
© 2022 Edge Impulse Inc.

Object Detection (MobileNet SSD, YOLO)
6
© 2022 Edge Impulse Inc.

FOMO
7
Replace with single per-region class probability map
© 2022 Edge Impulse Inc.

Single Per-Region Class
Probability What?

Convolutional Image Classification Model
9
64x64 input
16x16 after conv
4x4 after conv
👨
1x1 output
High locality Zero locality
© 2022 Edge Impulse Inc.

FOMO
10
64x64 input
16x16 after conv
4x4 after conv
Loss function that forces
the top-left tile here to map to
16x16 top-left tile in the input
© 2022 Edge Impulse Inc.

Output = Heatmap
11
© 2022 Edge Impulse Inc.

Postprocessing in Software
12
Not constrained to just bounding boxes!
© 2022 Edge Impulse Inc.

Heatmap Ratio is Configurable
13
© 2022 Edge Impulse Inc.

Flexibility in Input / Output Resolution
14
320 x 320 input => 80 x 80 output
© 2022 Edge Impulse Inc.

Flexibility in Input / Output Resolution
15
96 x 96 input => 12 x 12 output
© 2022 Edge Impulse Inc.

Bounding boxes are an implementation detail
Most of the time you just want to know where and
how many objects there are
FOMO trains on the centroid of an object, w/ loss
function allowing some error
Convolutional network, so will still look around the
object, but lot less error prone against background
Centroids
16
© 2022 Edge Impulse Inc.

"Equal to an object detector, but bounding boxes are fixed
and always square"
Constrained object detection for constrained problems
Works best with similar sized objects, at similar distances
One activation per heat map cell, so objects overlapping
each other might be fused together
Few classes? Fine. 100s of objects across 60 classes? Use
YOLOv5.
Caveats
17
© 2022 Edge Impulse Inc.

Fixed and Always Square
19
Potential issue,
merge cells?
But... merging leads
to issue here

Centroids Should Have Meaning
20
© 2022 Edge Impulse Inc.

Demo (Rpi4 on CPU, 160x160 input, 60 fps)
21
© 2022 Edge Impulse Inc.

Demo (Cortex-M7, 96x96 input, 28 fps)
22
© 2022 Edge Impulse Inc.

Demo (M1, 160x160, 1000 fps)
23
© 2022 Edge Impulse Inc.

Add-on to any convolutional image network
(incl. transfer learning models), so highly
configurable
Fully convolutional, just the ratio is set
Can count objects
Can be a segmentation model
Performs much better on small objects than
YOLOv5 or MobileNet SSD
Other Cool Features of FOMO
24
© 2022 Edge Impulse Inc.

Cortex-M7 @ 480MHz: 30 fps (96x96 MobileNetV2 a=0.1)
Raspberry Pi 4: 60 fps (160x160 MobileNetV2 a=0.35)
Himax DSP @ 400MHz: 14 fps (96x96 MobileNetV2 a=0.35)
Cortex-M4F @ 156MHz: 5 fps (96x96 MobileNetV2 a=0.05)
My Macbook: 1000 fps :-)
(Can be bolted on other CNNs, e.g. MobileNetV1)
Performance
25
© 2022 Edge Impulse Inc.

• FOMO is available today for free:
https://edgeimpulse.com/fomo
• Very wide range of dev boards, from Cortex-M4F to Jetson Nano
• Deploy to any device that has a C++ compiler
• Or use your phone!
• Be one of today's 194 new projects!
Getting Started
26
© 2022 Edge Impulse Inc.

Vision is such a cool sense
Classification is cool, locality and count matters
Want a 30x increase in performance? Use FOMO!
edgeimpulse.com
Recap
27
© 2022 Edge Impulse Inc.

Full docs:
https://docs.edgeimpulse.com
FOMO
https://docs.edgeimpulse.com/docs/fomo-object-detection-for-constrained-devices
We're hiring!
https://edgeimpulse.com/careers
More questions:
forum.edgeimpulse.com / jan@edgeimpulse.com
Questions?
28
© 2022 Edge Impulse Inc.

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/06/visual-anomaly-detection-with-fomo-ad-a-presentation-from-edge-impulse/ Jan Jongboom, Co-Founder and CTO of Edge Impulse, presents the “Visual Anomaly Detection with FOMO-AD” tutorial at the May 2023 Embedded Vision Summit. Virtually all computer vision machine learning models involve classification—for example, “how many humans are in the frame?” To train such a model, you need examples of every class of object the model is intended to detect. But what happens when a new type of object appears? Ask a model trained only on elephants and zebras to classify a giraffe and it will happily misclassify the image. Similarly, when you train a model to detect flaws (such as a broken bottle) on a production line, you need training data for every possible fault, which is impractical. The solution to this dilemma, as Jongboom explains, is FOMO-AD, an ML model for visual anomaly detection. FOMO-AD can be used along with classification models, in which case it flags instances where you shouldn’t trust the classifier (e.g., this animal is unlike anything I’ve ever seen), or it can be used standalone to detect anomalies without requiring the developer to define any explicit fault states.

Данило Ульянич “C89 OpenGL for ARM microcontrollers on Cortex-M. Basic functi...

Lviv Startup Club

Deep Learning Hardware: Past, Present, & Future

Rouyun Pan

How To Make Multi-Robots Formation Control System

Keisuke Uto

Realtime 3D Visualization without GPU

Tobias G

Tutorial on Point Cloud Compression and standardisation

Rufael Mekuria

Tutorial on Point Cloud Compression and standardisation given at IEEE VCIP 2017 in december. I provide the techniques for point cloud compression and the designed quality metrics and codecs in my PhD at CWI. I detail the standardisation activity on point cloud compression that I started in 2014 and that started in 2017 involving all mobile device makers like Apple, Huawei, Sony, Samsung and Nokia.

“High-Efficiency Edge Vision Processing Based on Dynamically Reconfigurable T...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/high-efficiency-edge-vision-processing-based-on-dynamically-reconfigurable-tpu-technology-a-presentation-from-flex-logix/ Cheng Wang, Senior Vice President and Co-founder of Flex Logix, presents the “High-Efficiency Edge Vision Processing Based on Dynamically Reconfigurable TPU Technology” tutorial at the May 2022 Embedded Vision Summit. To achieve high accuracy, edge computer vision requires teraops of processing to be executed in fractions of a second. Additionally, edge systems are constrained in terms of power and cost. This talk presents and demonstrates the novel dynamic TPU array architecture of Flex Logix’s InferX X1 accelerators and contrasts it to current GPU, TPU and other approaches to delivering the teraops computing required by edge vision inferencing. Wang compares latency, throughput, memory utilization, power dissipation and overall solution cost. He also shows how existing trained models can be easily ported to run on the InferX X1 accelerator.

“Once-for-All DNNs: Simplifying Design of Efficient Models for Diverse Hardwa...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2020/08/once-for-all-dnns-simplifying-design-of-efficient-models-for-diverse-hardware-a-presentation-from-mit/ For more information about edge AI and vision, please visit: http://www.edge-ai-vision.com Song Han, Associate Professor in the Department of Electrical Engineering and Computer Science at MIT, delivers the presentation “Once-for-All DNNs: Simplifying Design of Efficient Models for Diverse Hardware” at the Edge AI and Vision Alliance’s July 2020 Edge AI and Vision Innovation Forum. Han shares his group’s latest research on the automated generation of deep neural network architectures that execute efficiently across diverse hardware targets.

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/understanding-dnn-based-object-detectors-a-presentation-from-au-zone-technologies/ Azhar Quddus, Senior Computer Vision Engineer at Au-Zone Technologies, presents the “Understanding DNN-Based Object Detectors” tutorial at the May 2022 Embedded Vision Summit. Unlike image classifiers, which merely report on the most important objects within or attributes of an image, object detectors determine where objects of interest are located within an image. Consequently, object detectors are central to many computer vision applications including (but not limited to) autonomous vehicles and virtual reality. In this presentation, Quddus provides a technical introduction to deep-neural-network-based object detectors. He explains how these algorithms work, and how they have evolved in recent years, utilizing examples of popular object detectors. Quddus examines some of the trade-offs to consider when selecting an object detector for an application, and touches on accuracy measurement. He also discusses performance comparison among the models discussed in this presentation.

Real-time DeepLearning on IoT Sensor Data

Romeo Kienzler

Introduction of Mobile CNN

Ryosuke Tanno

"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-mangen For more information about embedded vision, please visit: http://www.embedded-vision.com Michael Mangen, Product Manager for Camera and Computer Vision at Qualcomm, presents the "High-resolution 3D Reconstruction on a Mobile Processor" tutorial at the May 2016 Embedded Vision Summit. Computer vision has come a long way. Use cases that were previously not possible in mass-market devices are now more accessible thanks to advances in depth sensors and mobile processors. In this presentation, Mangen provides an overview of how we are able to implement high-resolution 3D reconstruction – a capability typically requiring cloud/server processing – on a mobile processor. This is an exciting example of how new sensor technology and advanced mobile processors are bringing computer vision capabilities to broader markets.

Intelligent Thumbnail Selection

Kamil Sindi

IoT consideration selection

Yoss Cohen

Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Jenny Midwinter

SDVIs and In-Situ Visualization on TACC's Stampede

Intel® Software

Speaker: Paul Navrátil, Texas Advanced Computing Center (TACC) The design emphasis for supercomputing systems has moved from raw performance to performance-per-watt, and as a result, supercomputing architectures are converging on processors with wide vector units and many processing cores per chip. Such processors are capable of performant image rendering purely in software. This improved capability is fortuitous, since the prevailing homogeneous system designs lack dedicated, hardware-accelerated rendering subsystems for use in data visualization. Reliance on this “software-defined” rendering capability will grow in importance since, due to growing data sizes, visualizations must be performed on the same machine where the data is produced. Further, as data sizes outgrow disk I/O capacity, visualization will be increasingly incorporated into the simulation code itself (in situ visualization). This talk presents recent work in high-fidelity visualization using the OSPRay ray tracing framework on TACC’s local and remote visualization systems. We present work using OSPRay within ParaView Catalyst in situ framework from Kitware, including capitalizing on opportunities to reduce data costs migrating through VTK filters for visualization. We highlight the performance opportunities and advantages of Intel® Advanced Vector Extensions 512, the memory system improvements possible with Intel® Xeon Phi™ processor multi-channel DRAM (MCDRAM) and the Intel® Omni-Path Architecture interconnect.

Overview Of Parallel Development - Ericnel

ukdpe

Hacking for salone: drone races

Emanuele Di Saverio

“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2020/12/trends-in-neural-network-topologies-for-vision-at-the-edge-a-presentation-from-synopsys/ For more information about edge AI and computer vision, please visit: https://www.edge-ai-vision.com Pierre Paulin, Director of R&D for Embedded Vision at Synopsys, presents the “Trends in Neural Network Topologies for Vision at the Edge” tutorial at the September 2020 Embedded Vision Summit. The widespread adoption of deep neural networks (DNNs) in embedded vision applications has increased the importance of creating DNN topologies that maximize accuracy while minimizing computation and memory requirements. This has led to accelerated innovation in DNN topologies. In this talk, Paulin summarizes the key trends in neural network topologies for embedded vision applications, highlighting techniques employed by widely used networks such as EfficientNet and MobileNet to boost both accuracy and efficiency. He also touches on other optimization methods—such as pruning, compression and layer fusion—that developers can use to further reduce the memory and computation demands of modern DNNs.

Crossing the Resolution Divide

Cyrene Domogalla

AAA 3D GRAPHICS ON THE WEB WITH REACTJS + BABYLONJS + UNITY3D by Denis Radin ...

DevClub_lv

JS Fest 2019. Денис Радин. AAA 3D графика в Web с ReactJS, BabylonJS и Unity3D

JSFestUA

Создать фотореалистичное 3D приложение для Web не просто. Сделать это с React еще сложнее, но окупается с лихвой если вы все таки справились. Этот доклад о том как Evolution Gaming использует WebGL и ReactJS для создания самого сложного и дорогого WebGL приложения из когда либо разработанных.

Electrivia an Solution

Ankitesh Kumar Singh

“Jumpstart Your Edge AI Vision Application with New Development Kits from Avn...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/jumpstart-your-edge-ai-vision-application-with-new-development-kits-from-avnet-a-presentation-from-avnet/ Monica Houston, Technical Solutions Manager at Avnet, presents the “Jumpstart Your Edge AI Vision Application with New Development Kits from Avnet” tutorial at the May 2022 Embedded Vision Summit. Choosing the right processing solution for your embedded vision application can make or break your next development effort. This presentation introduces three next-generation embedded vision platforms from Avnet that enable camera-based AI at the edge, featuring the latest edge AI technical advances in processors from NXP, Renesas and Xilinx. Houston discusses the strengths and distinctive features of each solution, highlighting the applications each solution is best optimized for. She also explores the new family of production-ready camera modules featured with these kits and provides guidance on selecting the appropriate camera features for your embedded application.

Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...

EmbeddedFest

Majority of IoT solutions use data analysis at the Cloud level, collecting a huge amount of raw data from many thousands of peripherals. What if I told you that you can move from raw data collection to knowledge aggregation by implementing Artificial Intelligence into IoT systems? During the talk, I will show the benefits of introducing AI at the earliest possible stages, applying the concept of moving from Cloud computing to Fog computing. The basic principle of constructing AIoT systems is the use of the node logic, where a node of the system has to process the provided information in a form of abstract concepts, but not in a form of raw information. Further, the experience of one device learning and the history of its life cycle can be applied to new models, automatically programming their production cycles for the most efficient use. Actually, IoT solutions should apply AI components at each level of data transfer. Following this approach, the whole system becomes self-optimizing. Also, during the talk, I will present related case studies and demonstrate a working stand.

“Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection ...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/01/challenges-and-approaches-for-cascaded-dnns-a-case-study-of-face-detection-for-face-verification-a-presentation-from-imagination-technologies/ Ana Salazar, Senior Research Manager at Imagination Technologies, presents the “Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection for Face Verification” tutorial at the September 2020 Embedded Vision Summit. This talk explores the challenges of deploying serial computer vision tasks implemented with DNNs. Neural network accelerators have demonstrated significant gains in performance for DNN inference, especially when the net has been quantized. Quantization often brings a loss in accuracy. This loss in accuracy may be considered acceptable in itself but may cause problems if the output of the DNN is used as input for a second DNN which itself has been quantized. Salazar presents her company’s research into this challenge in the context of a face verification CNN which consumes the output of a face detection CNN, discussing approaches for reducing the impact of quantization in such scenarios.

Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019

Schaffhausen Institute of Technology

“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/scaling-vision-based-edge-ai-solutions-from-prototype-to-global-deployment-a-presentation-from-network-optix/ Maurits Kaptein, Chief Data Scientist at Network Optix and Professor at the University of Eindhoven, presents the “Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment” tutorial at the May 2024 Embedded Vision Summit. The Embedded Vision Summit brings together innovators in silicon, devices, software and applications and empowers them to bring computer vision and perceptual AI into reliable and scalable products. However, integrating recent hardware, software and algorithm innovations into prime-time-ready products is quite challenging. Scaling from a proof of concept—for example, a novel neural network architecture performing a valuable task efficiently on a new piece of silicon—to an AI vision system installed in hundreds of sites requires surmounting myriad hurdles. First, building on Network Optix’s 14 years of experience, Professor Kaptein details how to overcome the networking, fleet management, visualization and monetization challenges that come with scaling a global vision solution. Second, Kaptein discusses the complexities of making vision AI solutions device-agnostic and remotely manageable, proposing an open standard for AI model deployment to edge devices. The proposed standard aims to simplify market entry for silicon manufacturers and enhance scalability for solution developers. Kaptein outlines the standard’s core components and invites collaborative contributions to drive market expansion.

“What’s Next in On-device Generative AI,” a Presentation from Qualcomm

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/whats-next-in-on-device-generative-ai-a-presentation-from-qualcomm/ Jilei Hou, Vice President of Engineering and Head of AI Research at Qualcomm Technologies, presents the “What’s Next in On-device Generative AI” tutorial at the May 2024 Embedded Vision Summit. The generative AI era has begun! Large multimodal models are bringing the power of language understanding to machine perception, and transformer models are expanding to allow machines to understand using multiple types of sensors. This new wave of approaches is poised to revolutionize user experiences, disrupt industries and enable powerful new capabilities. For generative AI to reach its full potential, however, we must deploy it on edge devices, providing improved latency, pervasive interaction and enhanced privacy. In this talk, Hou shares Qualcomm’s vision of the compelling opportunities enabled by efficient generative AI at the edge. He also identifies the key challenges that the industry must overcome to realize the massive potential of these technologies. And he highlights research and product development work that Qualcomm is doing to lead the way via an end-to-end system approach—including techniques for efficient on-device execution of LLMs, LVMs and LMMs, methods for orchestration of large models at the edge and approaches for adaptation and personalization.

Similar to “FOMO: Real-Time Object Detection on Microcontrollers,” a Presentation from Edge Impulse

Artificial intelligence at the edge

Jameson Toole

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...

Edge AI and Vision Alliance

Real-time DeepLearning on IoT Sensor Data

Romeo Kienzler

Introduction of Mobile CNN

Ryosuke Tanno

"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...

Edge AI and Vision Alliance

Intelligent Thumbnail Selection

Kamil Sindi

IoT consideration selection

Yoss Cohen

Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Jenny Midwinter

SDVIs and In-Situ Visualization on TACC's Stampede

Intel® Software

Overview Of Parallel Development - Ericnel

ukdpe

Hacking for salone: drone races

Emanuele Di Saverio

“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...

Edge AI and Vision Alliance

Crossing the Resolution Divide

Cyrene Domogalla

AAA 3D GRAPHICS ON THE WEB WITH REACTJS + BABYLONJS + UNITY3D by Denis Radin ...

DevClub_lv

JS Fest 2019. Денис Радин. AAA 3D графика в Web с ReactJS, BabylonJS и Unity3D

JSFestUA

Electrivia an Solution

Ankitesh Kumar Singh

“Jumpstart Your Edge AI Vision Application with New Development Kits from Avn...

Edge AI and Vision Alliance

Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...

EmbeddedFest

“Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection ...

Edge AI and Vision Alliance

Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019

Schaffhausen Institute of Technology

Similar to “FOMO: Real-Time Object Detection on Microcontrollers,” a Presentation from Edge Impulse (20)

Artificial intelligence at the edge

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...

Real-time DeepLearning on IoT Sensor Data

Introduction of Mobile CNN

"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...

Intelligent Thumbnail Selection

IoT consideration selection

Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

SDVIs and In-Situ Visualization on TACC's Stampede

Overview Of Parallel Development - Ericnel

Hacking for salone: drone races

“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...

Crossing the Resolution Divide

AAA 3D GRAPHICS ON THE WEB WITH REACTJS + BABYLONJS + UNITY3D by Denis Radin ...

JS Fest 2019. Денис Радин. AAA 3D графика в Web с ReactJS, BabylonJS и Unity3D

Electrivia an Solution

“Jumpstart Your Edge AI Vision Application with New Development Kits from Avn...

Embedded Fest 2019. Dov Nimratz. Artificial Intelligence in Small Embedded Sy...

“Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection ...

Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019

More from Edge AI and Vision Alliance

“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...

Edge AI and Vision Alliance

“What’s Next in On-device Generative AI,” a Presentation from Qualcomm

Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/learning-compact-dnn-models-for-embedded-vision-a-presentation-from-the-university-of-maryland-at-college-park/ Shuvra Bhattacharyya, Professor at the University of Maryland at College Park, presents the “Learning Compact DNN Models for Embedded Vision” tutorial at the May 2023 Embedded Vision Summit. In this talk, Bhattacharyya explores methods to transform large deep neural network (DNN) models into effective compact models. The transformation process that he focuses on—from large to compact DNN form—is referred to as pruning. Pruning involves the removal of neurons or parameters from a neural network. When performed strategically, pruning can lead to significant reductions in computational complexity without significant degradation in accuracy. It is sometimes even possible to increase accuracy through pruning. Pruning provides a general approach for facilitating real-time inference in resource-constrained embedded computer vision systems. Bhattacharyya provides an overview of important aspects to consider when applying or developing a DNN pruning method and presents details on a recently introduced pruning method called NeuroGRS. NeuroGRS considers structures and trained weights jointly throughout the pruning process and can result in significantly more compact models compared to other pruning methods.

“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/introduction-to-computer-vision-with-cnns-a-presentation-from-mohammad-haghighat/ Independent consultant Mohammad Haghighat presents the “Introduction to Computer Vision with Convolutional Neural Networks” tutorial at the May 2023 Embedded Vision Summit. This presentation covers the basics of computer vision using convolutional neural networks. Haghighat begins by introducing some important conventional computer vision techniques and then transition to explaining the basics of machine learning and convolutional neural networks (CNNs) and showing how CNNs are used in visual perception. Haghighat illustrates the building blocks and computational elements of neural networks through examples. This session provides an overview of how modern computer vision algorithms are designed, trained and used in real-world applications.

“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/selecting-tools-for-developing-monitoring-and-maintaining-ml-models-a-presentation-from-yummly/ Parshad Patel, Data Scientist at Yummly, presents the “Selecting Tools for Developing, Monitoring and Maintaining ML Models” tutorial at the May 2023 Embedded Vision Summit. With the boom in tools for developing, monitoring and maintaining ML models, data science teams have many options to choose from. Proprietary tools provided by cloud service providers are enticing, but teams may fear being locked in—and may worry that these tools are too costly or missing important features when compared with alternatives from specialized providers. Fortunately, most proprietary, fee-based tools have an open-source component that can be integrated into a home-grown solution at low cost. This can be a good starting point, enabling teams to get started with modern tools without making big investments and leaving the door open to evolve tool selection over time. In this talk, Patel presents a step-by-step process for creating an MLOps tool set that enables you to deliver maximum value as a data scientist. He shares how Yummly built pipelines for model development and put them into production using open-source projects.

“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/building-accelerated-gstreamer-applications-for-video-and-audio-ai-a-presentation-from-wave-spectrum/ Abdo Babukr, Accelerated Computing Consultant at Wave Spectrum, presents the “Building Accelerated GStreamer Applications for Video and Audio AI,” tutorial at the May 2023 Embedded Vision Summit. GStreamer is a popular open-source framework for creating streaming media applications. Developers often use GStreamer to streamline the development of computer vision and audio perception applications. Since perceptual algorithms are often quite demanding in terms of processing performance, in many cases developers need to find ways to accelerate key GStreamer building blocks, taking advantage of specialized features of their target processor or co-processor. In this talk, Babukr introduces GStreamer and shows how to use it to build computer vision and audio perception applications. He also shows how to create efficient, high-performance GStreamer applications that utilize specialized hardware features.

“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/understanding-selecting-and-optimizing-object-detectors-for-edge-applications-a-presentation-from-walmart-global-tech/ Md Nasir Uddin Laskar, Staff Machine Learning Engineer at Walmart Global Tech, presents the “Understanding, Selecting and Optimizing Object Detectors for Edge Applications” tutorial at the May 2023 Embedded Vision Summit. Object detectors count objects in a scene and determine their precise locations, while also labeling them. Object detection plays a crucial role in many vision applications, from autonomous driving to smart appliances. In many of these applications, it’s necessary or desirable to implement object detection at the edge. In this presentation, Laskar explores the evolution of object detection algorithms, from traditional approaches to deep learning-based methods and transformer-based architectures. He delves into widely used approaches for object detection, such as two-stage R-CNNs and one-stage YOLO algorithms, and examines their strengths and weaknesses. And he provides guidance on how to evaluate and select an object detector for an edge application.

“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/introduction-to-modern-lidar-for-machine-perception-a-presentation-from-the-university-of-ottawa/ Robert Laganière, Professor at the University of Ottawa and CEO of Sensor Cortek, presents the “Introduction to Modern LiDAR for Machine Perception” tutorial at the May 2023 Embedded Vision Summit. In this presentation, Laganière provides an introduction to light detection and ranging (LiDAR) technology. He explains how LiDAR sensors work and their main advantages and disadvantages. He also introduces different approaches to LiDAR, including scanning and flash LiDAR. Laganière explores the types of data produced by LiDAR sensors and explains how this data can be processed using deep neural networks. He also examines the synergy between LiDAR and cameras, and the concept of pseudo-LiDAR for detection.

“Vision-language Representations for Robotics,” a Presentation from the Unive...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/11/vision-language-representations-for-robotics-a-presentation-from-the-university-of-pennsylvania/ Dinesh Jayaraman, Assistant Professor at the University of Pennsylvania, presents the “Vision-language Representations for Robotics” tutorial at the May 2023 Embedded Vision Summit. In what format can an AI system best present what it “sees” in a visual scene to help robots accomplish tasks? This question has been a long-standing challenge for computer scientists and robotics engineers. In this presentation, Jayaraman provides insights into cutting-edge techniques being used to help robots better understand their surroundings, learn new skills with minimal guidance and become more capable of performing complex tasks. Jayaraman discusses recent advances in unsupervised representation learning and explains how these approaches can be used to build visual representations that are appropriate for a controller that decides how the robot should act. In particular, he presents insights from his research group’s recent work on how to represent the constituent objects and entities in a visual scene, and how to combine vision and language in a way that permits effectively translating language-based task descriptions into images depicting the robot’s goals.

“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/adas-and-av-sensors-whats-winning-and-why-a-presentation-from-techinsights/ Ian Riches, Vice President of the Global Automotive Practice at TechInsights, presents the “ADAS and AV Sensors: What’s Winning and Why?” tutorial at the May 2023 Embedded Vision Summit. It’s clear that the number of sensors per vehicle—and the sophistication of these sensors—is growing rapidly, largely thanks to increased adoption of advanced safety and driver assistance features. In this presentation, Riches explores likely future demand for automotive radars, cameras and LiDARs. Riches examines which vehicle features will drive demand out to 2030, how vehicle architecture change is impacting the market and what sorts of compute platforms these sensors will be connected to. Finally, he shares his firm’s vision of what the landscape could look like far beyond 2030, considering scenarios out to 2050 for automated driving and the resulting sensor demand.

“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/computer-vision-in-sports-scalable-solutions-for-downmarkets-a-presentation-from-sportlogiq/ Mehrsan Javan, Co-founder and CTO of Sportlogiq, presents the “Computer Vision in Sports: Scalable Solutions for Downmarket Leagues” tutorial at the May 2023 Embedded Vision Summit. Sports analytics is about observing, understanding and describing the game in an intelligent manner. In practice, this requires a fully automated, robust end-to-end pipeline, spanning from visual input, to player and group activities, to player and team evaluation to planning. Despite major advancements in computer vision and machine learning, today sports analytics solutions are limited to top leagues and are not widely available for downmarket leagues and youth sports. In this talk, Javan explains how his company has developed scalable and robust computer vision solutions to democratize sport analytics and offer pro-league-level insights to leagues with modest resources, including youth leagues. He highlights key challenges—such as the requirement for low-cost, low-latency processing and the need for robustness despite variations in venues. He discusses the approaches Sportlogiq tried and how it ultimately overcame these challenges, including the use of transformers and fusion of multiple type of data streams to maximize accuracy.

“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/detecting-data-drift-in-image-classification-neural-networks-a-presentation-from-southern-illinois-university/ Spyros Tragoudas, Professor and School Director at Southern Illinois University Carbondale, presents the “Detecting Data Drift in Image Classification Neural Networks” tutorial at the May 2023 Embedded Vision Summit. An unforeseen change in the input data is called “drift,” and may impact the accuracy of machine learning models. In this talk, Tragoudas presents a novel scheme for diagnosing data drift in the input streams of image classification neural networks. His proposed method for drift detection and quantification uses a threshold dictionary for the prediction probabilities of each class in the neural network model. The method is applicable to any drift type in images such as noise and weather effects, among others. Tragoudas shares experimental results on various data sets, drift types and neural network models to show that his proposed method estimates the drift magnitude with high accuracy, especially when the level of drift significantly impacts the model’s performance.

“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/deep-neural-network-training-diagnosing-problems-and-implementing-solutions-a-presentation-from-sensor-cortek/ Fahed Hassanat, Chief Operating Officer and Head of Engineering at Sensor Cortek, presents the “Deep Neural Network Training: Diagnosing Problems and Implementing Solutions” tutorial at the May 2023 Embedded Vision Summit. In this presentation, Hassanat delves into some of the most common problems that arise when training deep neural networks. He provides a brief overview of essential training metrics, including accuracy, precision, false positives, false negatives and F1 score. Hassanat then explores training challenges that arise from problems with hyperparameters, inappropriately sized models, inadequate models, poor-quality datasets, imbalances within training datasets and mismatches between training and testing datasets. To help detect and diagnose training problems, he also covers techniques such as understanding performance curves, recognizing overfitting and underfitting, analyzing confusion matrices and identifying class interaction issues.

“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/ai-start-ups-the-perils-of-fishing-for-whales-war-stories-from-the-entrepreneurial-front-lines-a-presentation-from-seechange-technologies/ Tim Hartley, Vice President of Product for SeeChange Technologies, presents the “AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepreneurial Front Lines)” tutorial at the May 2023 Embedded Vision Summit. You have a killer idea that will change the world. You’ve thought through product-market fit and differentiation. You have seed funding and a world-beating team. Best of all, you’ve caught the attention of major players in your industry. You’ve reached peak “start-up”—that point of limitless possibility—when you go to bed with the same level of energy and enthusiasm you had when you woke. And then the first proof of concept starts… In this talk, Hartley lays out some of the pitfalls that await those building the next big thing. Using real examples, he shares some of the dos and don’ts, particularly when dealing with that big potential first customer. Hartley discusses the importance of end-to-end design, ensuring your product solves real-world problems. He explores how far the big companies will tell you to jump—and then jump again—for free. And, most importantly, how to build long-term partnerships with major corporations without relying on over-promising sales pitches.

“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/a-computer-vision-system-for-autonomous-satellite-maneuvering-a-presentation-from-scout-space/ Andrew Harris, Spacecraft Systems Engineer at SCOUT Space, presents the “Developing a Computer Vision System for Autonomous Satellite Maneuvering” tutorial at the May 2023 Embedded Vision Summit. Computer vision systems for mobile autonomous machines experience a wide variety of real-world conditions and inputs that can be challenging to capture accurately in training datasets. Few autonomous systems experience more challenging conditions than those in orbit. In this talk, Harris describes how SCOUT Space has designed and trained satellite vision systems using dynamic and physically informed synthetic image datasets. Harris describes how his company generates synthetic data for this challenging environment and how it leverages new real-world data to improve our datasets. In particular, he explains how these synthetic datasets account for and can replicate real sources of noise and error in the orbital environment, and how his company supplements them with in-space data from the first SCOUT-Vision system, which has been in orbit since 2021.

“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/bias-in-computer-vision-its-bigger-than-facial-recognition-a-presentation-from-santa-clara-university/ Susan Kennedy, Assistant Professor of Philosophy at Santa Clara University, presents the “Bias in Computer Vision—It’s Bigger Than Facial Recognition!” tutorial at the May 2023 Embedded Vision Summit. As AI is increasingly integrated into various industries, concerns about its potential to reproduce or exacerbate bias have become widespread. While the use of AI holds the promise of reducing bias, it can also have unintended consequences, particularly in high-stakes computer vision applications such as facial recognition. However, even seemingly low-stakes computer vision applications such as identifying potholes and damaged roads can also present ethical challenges related to bias. This talk explores how bias in computer vision often poses an ethical challenge, regardless of the stakes involved. Kennedy discusses the limitations of technical solutions aimed at mitigating bias, and why “bias-free” AI may not be achievable. Instead, she focuses on the importance of adopting a “bias-aware” approach to responsible AI design and explores strategies that can be employed to achieve this.

“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/sensor-fusion-techniques-for-accurate-perception-of-objects-in-the-environment-a-presentation-from-sanborn-map-company/ Baharak Soltanian, Vice President of Research and Development for the Sanborn Map Company, presents the “Sensor Fusion Techniques for Accurate Perception of Objects in the Environment” tutorial at the May 2023 Embedded Vision Summit. Increasingly, perceptual AI is being used to enable devices and systems to obtain accurate estimates of object locations, speeds and trajectories. In demanding applications, this is often best done using a heterogeneous combination of sensors (e.g., vision, radar, LiDAR). In this talk, Soltanian introduces techniques for combining data from multiple sensors to obtain accurate information about objects in the environment. Soltanian briefly introduces the roles played by Kalman filters, particle filters, Bayesian networks and neural networks in this type of fusion. She then examines alternative fusion architectures, such as centralized and decentralized approaches, to better understand the trade-offs associated with different approaches to sensor fusion as used to enhance the ability of machines to understand their environment.

“Updating the Edge ML Development Process,” a Presentation from Samsara

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/updating-the-edge-ml-development-process-a-presentation-from-samsara/ Jim Steele, Vice President of Embedded Software at Samsara, presents the “Updating the Edge ML Development Process” tutorial at the May 2023 Embedded Vision Summit. Samsara (NYSE:IOT) is focused on digitizing the world of operations. The company helps customers across many industries—including food and beverage, utilities and energy, field services and government—get information about their physical operations into the cloud, so they can operate more safely, efficiently and sustainably. Samsara’s sensors collect billions of data points per day and on-device processing is instrumental to its success. The company is constantly developing, improving and deploying ML models at the edge. Samsara has found that the traditional development process—where ML scientists create models and hand them off to firmware engineers for embedded implementation—is slow and often produces difficult-to-resolve differences between the original model and the embedded implementation. In this talk, Steele presents an alternative development process that his company has adopted with good results. In this process, firmware engineers develop a general framework that ML scientists use to design, develop and deploy their models. This enables quick iterations and fewer confounding bugs.

“Combating Bias in Production Computer Vision Systems,” a Presentation from R...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/combating-bias-in-production-computer-vision-systems-a-presentation-from-red-cell-partners/ Alex Thaman, Chief Architect at Red Cell Partners, presents the “Combating Bias in Production Computer Vision Systems” tutorial at the May 2023 Embedded Vision Summit. Bias is a critical challenge in predictive and generative AI that involves images of humans. People have a variety of body shapes, skin tones and other features that can be challenging to represent completely in training data. Without attention to bias risks, ML systems have the potential to treat people unfairly, and even to make humans more likely to do so. In this talk, Thaman examines the ways in which bias can arise in visual AI systems. He shares techniques for detecting bias and strategies for minimizing it in production AI systems.

“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...

Edge AI and Vision Alliance

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/developing-an-embedded-vision-ai-powered-fitness-system-a-presentation-from-peloton-interactive/ Sanjay Nichani, Vice President for Artificial Intelligence and Computer Vision at Peloton Interactive, presents the “Developing an Embedded Vision AI-powered Fitness System” tutorial at the May 2023 Embedded Vision Summit. The Guide is Peloton’s first strength-training product that runs on a physical device and also the first that uses AI technology. It turns any TV into an interactive personal training studio. Members have access to Peloton’s instructors, who lead classes that use dumbbells and body weight. The Guide uses an innovative combination of hardware, software and user interface; it is one of only a handful of consumer products in this market that use real-time embedded computer vision, and Peloton’s members love it! In this talk, Nichani shares insights into how the computer vision technology behind the Guide was built. He focuses on data, models and processes. The AI and Computer Vision team at Peloton worked through numerous obstacles by using synthetic data, advanced algorithms and field-testing iteration to create an affordable AI-powered device that enables at-home fitness for everyone.

More from Edge AI and Vision Alliance (20)