SlideShare a Scribd company logo
1 of 38
Download to read offline
Efficient
Video
Perception
through AI
Qualcomm Technologies, Inc.
January 19, 2021 @QCOMResearch
2
• The role of video in our lives
• What is video perception & what makes it challenging
• Our research toward efficient video perception
• Forward looking video perception research
Agenda
3
3
A picture is worth
a thousand words
Out of all the five senses, vision
is arguably the most important
3
4
4
A minute of
video has
more than
4
1,000
pictures
5
How video came to be
From performing arts and still photos to high-quality video
Performing arts
Japanese
Kabuki
Roman
amphitheater
Chinese
shadow
puppetry Television
8K HDR
video
360-degree
video
Chrono-
photography
Sanskrit
plays
Greek
theater
Motion
pictures
Color
analog TV
1900s
1880s
1650s
Early ages 1890s 1920s 1938
Polynesian
hula
Magic
lantern
Motion
camera
projectors Digital TVs
1990s 1995
Internet video
broadcasting
Still photos Motion photos Low-quality video High-quality video
6
6
The scale of video being created
and consumed is massive
1M
Minutes of video
crossing the internet
per second
300
Hours of video are
uploaded every
minute to YouTube
82%
Of all consumer
internet traffic is
online video
76
Minutes per day watching
video on digital devices
by US adults
8B
Average daily
video views
on Facebook
Cisco Visual Networking Index: Forecast and Trends, 2017–2022
7
7
Video monitoring
Extended reality Smart cities
Smart factories
Increasingly, video is all around us — providing entertainment,
enhancing collaboration, and transforming industries
Autonomous vehicles
Video conferencing
Sports
Smartphone
8
Understand
Recognizing patterns,
identities, objects,
scenes, context, relations,
compositions, changes,
motions, actions, activities,
events, 3D structures,
surfaces, lightings, text,
emotions, sentiments,
sounds, and more
Systems
Any compute
platform, including
SoCs, CPUs,
GPUs, TPUs,
NPUs, and DSPs
Making
Developing
mathematical
representations,
models,
algorithms,
rules, and
frameworks
Video
perception
Making systems
understand
video content
9
9
Platform
limitations
Task
diversity
Volume of video data
(training/testing)
Diversity in
visual data
Quality of data
acquisition
Availability of
annotated datasets
What makes video perception challenging?
Implementation
challenges
Data
challenges
Video
perception
challenges
10
10
Power and thermal
efficiency are essential
for on-device
video perception
The challenge of
AI workloads
Constrained mobile
environment
Very compute
intensive
Large,
complicated neural
network models
Complex
concurrencies
Always-on
Real-time
Must be thermally
efficient for sleek,
ultra-light designs
Storage/memory
bandwidth limitations
Requires long battery
life for all-day use
11
Robustness
Robust to data variations
Adaptability
Adaptable to different domains
Scalability
Scaling up and down,
from IoT to the data center
Making
video
perception
ubiquitous
Solving additional
key challenges to
take video perception
from the research lab
to broad commercial
deployment
12
12
Efficiently running on-device video perception without sacrificing accuracy
Leverage
Temporal
redundancy
By reusing what
is computed before
• Learning to skip regions
• Recycling features
Make
Early
decisions
By dynamically changing
the network architecture
per input frame
• Early exiting
• Frame exiting
12
Key concepts
for efficient
video perception
13
Data driven learning
• Supervised, unsupervised,
semi/self/weak supervised, adversarial
Neural network
• Convolutional neural networks
• Graph neural networks
Regression and
classification tasks
• Minimizing a loss function
• Back-propagation over differentiable layers
Convolutional layers (deep)
Non-linearity, pooling
Kernels, neurons
Deep learning
basics
Computable AI
2
Training data Image
Convolutional
layer
Feature
map
Pooling
Fully-connected
layer of neurons
output
Learnable coefficients
Feed-forward CNN
Vectorized
features
Inspired by the workings of the brain,
drawing from data
13
14
14
Learning to skip redundant computations
Video frames are heavily correlated
frame t
“Skip-convolutions for efficient video processing” (submitted 2021)
Limit the computation only to the regions
where there are significant changes
frame t+10 residual
The residual frame, the difference between two
consecutive frames, contains little information
in most regions
15
“Skip-convolutions for efficient video processing”
(submitted 2021)
Skip-
convolution
A convolution at a frame
can be written as the previous
frame’s convolution plus the
convolution of the residual
Computation is limited only
to the regions where there
are strong residuals
Reinforce residual’s sparsity
by removing negligible residuals
Can replace convolutional layers
in any CNN with skip convolutions
A convolutional layer
with a skip gate
that masks out
negligible residuals
Conv
G
Conv
– +
Skip
gate
Previous frame
output features
Output
features
Input
features
Convolutional layer
Previous frame
input features
Current frame
input features
Output
features
Previous frame
output features
Small
residual
Large
residual
No computation
15
16
Determining the gate for a skip convolution
Can we learn it?
Non-trainable
Based on thresholding
the norm of the
convolved residual
Approximate the norm of output
without computing it
Trainable
Based on a tiny gating
network that predicts
gate probabilities
Implement as a convolution with
a single output channel
Apply a sigmoid
function to gating
network probabilities
Apply a sigmoid function on the
norm of the residual and the weight
of the layer kernel
Joint training to minimize
the classification loss and
average active gates
17
17
Learned gate requires very low computational overhead
Gate network is an additional output channel
Cout
Feature map
Gating scores Gate mask
Skip
convolutions
Win
Wout
No computation Computation
“Skip-convolutions for efficient video processing” (submitted 2021)
Hardware friendly implementation enforced by
imposing structured sparsity into compute masks
Block-wise structure by down/up sampling
18
“Skip-convolutions for efficient video processing” (submitted 2021)
Predicted gating masks reduce computations
Example masks show where computations can be skipped
Gating masks become more selective at deeper
layers, concentrating on task specific regions
Layer 3 Layer 30
Gating masks for video object detection
No computation
Computation
Larger block-wise structured gates are more
selective when training with different sizes
1x1 2x2 4x4 8x8
Gating masks for pose estimation
19
0.5
0.55
0.6
0.65
0.1 1 10 100 1000
Average
precision
GMAC
Object detection
SpotNet
Deep Feature Flow
EfficientDet
Ours-EfficientDet
“Skip-convolutions for efficient video processing”
(submitted 2021)
Learning to
skip reduces
compute and
maintains
accuracy
Results for object detection on
video object detection dataset
3x-
speed-up over
state-of-the-art
5x
20
“Skip-convolutions for efficient video processing” (submitted 2021)
Learning
to skip
reduces
compute for
human pose
estimation
Results for human
pose estimation
GMACs without
skip convolutions
GMACs with
skip-convolutions
10.2
10.2
1.1
10.2
10.2
10.2
0.4
10.2 0.4
10.2 0.4
10.2 0.5
10.2
0.7
10.2 0.8
10.2 0.7
10.2
21
0.9
0.91
0.92
0.93
0.94
0.95
0.96
0 2 4 6 8 10
Percentage
of
correct
keypoints
GMAC
Pose estimation
HRNet-w32
L0
W-SVD
S-SVD
Ours-Skip
Ours-Skip+S-SVD
“Skip-convolutions for efficient video processing”
(submitted 2021)
Learning
to skip is
complementary
to model
compression
Results for human
pose estimation on
video human action dataset
speed-up over
HRNet
2.5x-8x
22
22
Instead of computing deep features repetitively, compute once and recycle
“Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
Recycling features saves compute
Compute deep features once and
recycle — reuse from past frame
Compute shallow features
for all frames
Deep features remain relatively stationary
over time — they have lower spatial resolution
Shallow features are more responsive to
smooth changes, encoding the temporally
varying information
Applicable to any video neural network architectures including
segmentation, optical flow, classification, and more
23
“Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
Example of a feature recycling network
Making the best use of the deep features at hierarchical scales
Frame t
Frame t+1
Compute
Data Flow
Stride=2, Convolution
Bilinear Up-sampling
Features
Transition
Frame t
(Key
frame)
Transition
C Ψ
Multi-scale
Convolution
Feature Sharing + Addition C Concat
Convolution Based
Decoder Block
Ψ
Convolution
Block
H
H Classifier
Time
Next key frame
Transition
Transition
24
C Ψ
Frame
t+1
+ +
Feature
Transform
Feature
Transform
Feature
Transform
H
Branch
Decision
“Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
Example of a feature recycling network
Most frames require a light branch; some frames require a heavy branch
Frame t
Frame t+1
Compute
Data Flow
Stride=2, Convolution
Bilinear Up-sampling
Features
Transition
Frame t
(Key
frame)
Transition
C Ψ
Multi-scale
Convolution
Feature Sharing + Addition C Concat
Convolution Based
Decoder Block
Ψ
Convolution
Block
H
H Classifier
Time
Next key frame
Transition
Transition
25
25
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
“Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
Recycling features
saves compute
Instead of computing deep features
repetitively, compute once and recycle
25
26
74.0
26.0
17.0
77.9
Qualcomm Snapdragon is a product of Qualcomm
Technologies, Inc. and/or its subsidiaries.
Model efficiency
78%
GMACs On-device latency (ms/frame)
GMAC
reduction
Feature recycling
reduces compute
and latency
Semantic segmentation example
Input:
2048x1024 RGB video
Output:
2048x1024,
19 object classes
Runs on:
Qualcomm® Snapdragon™ 888
Mobile Platform
65% latency
reduction
Memory traffic
321.3
373.6
89%
MB read MB write
read
reduction 93% write
reduction
41.3 22.6
Enhanced Net
HRNet w18 v2
“Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
27
27
Exploit the fact that not all input examples require models of the same complexity
“FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021)
Early exiting a neural network saves compute
Very large, computationally intensive
models are needed to correctly classify
Very small and compact models can achieve very
high accuracies, but they fail for complex examples
Complex
examples
Simple
examples
Ideally, our system should be composed of a
cascade of classifiers throughout the network
28
Early exiting at the earliest possible NN layer for video
Conv
G1
Accept previous
frame layer output
Feature
from t-1
Feature
from t
G1:
Gating based
on temporal
agreement
Continue
Similar
Not
similar
FC Classifier
G2
G2:
Gating based
on frame
complexity
Exit
Continue
Confident
Not
confident
Frame t-1
Exit NN layer
“FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021)
Frame t
Exit NN layer
29
29
0.91
0.915
0.92
0.925
0.93
0.935
0.94
0.945
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Top-1
accuracy
MAC (x108)
Classification on image dataset
RANets1
RANets2
RANets3
MSDNet
Ours-MSDNet
DenseNet
Ours-DenseNet
ResNet
Ours-ResNet
Early exiting reduces compute while maintaining accuracy
“FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021)
Early exiting
applies to most
neural network
backbones
30
0.77
0.775
0.78
0.785
0.79
0.795
0.8
4 6 8 10 12 14 16 18 20
Top-1
accuracy
GMAC
Object classification on video street scene dataset
ResNet34
ResNet34-Uniform2
Ours-ResNet34-G1
Ours-ResNet34-G1/G2
“FrameExit: Conditional early exiting for efficient video
recognition” (submitted 2021)
Early exiting
for object
classification
2.5x
less MACs while
maintaining
accuracy
31
Gating module
2-layer MLP
2-layer MLP
Shared
weights
“FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021)
Frame exiting also applies to action recognition tasks
Policy
Function
Frame feature
G1
Frame feature
Classifier 1
G2
Frame feature
Classifier 2
Accumulated
Feature Pooling
G3
Frame feature
Classifier 3
Accumulated
Feature Pooling
G4
Frame feature
Classifier 4
Accumulated
Feature Pooling
Frame feature
Classifier 5
Accumulated
Feature Pooling
G5
32
32
Frame exiting improves accuracy and reduces compute
Policy
Function
Policy
Function
Policy
Function
Methods
Video activity dataset Video action dataset
mAP (%) GFLOPS Top-1 (%) GFLOPS
Resnet
AdaFrame 71.5 79.0 — —
LiteEval 72.7 95.1 61.0 99.0
ListenToLook 72.3 81.4 — —
SCSampler 72.9 41.9 70.8 41.9
AR-Net 73.8 33.5 71.7 32.0
FrameExit 76.1 25.2 72.8 19.7
EfficientNet
AR-Net 79.7 15.3 74.8 16.3
FrameExit 80.0 11.4 75.3 7.8
mAP (%) GFLOPS
2d/3d Resnet Video human action dataset
Uniform-10 44.7 41.2
Random-10 43.6 41.2
3D-ResNet18 35.4 38.6*
HATNet 39.6 41.8*
FrameExit (Efficient) 45.7 8.6
FrameExit (Accurate) 49.2 18.7
By adding gates to the NN architecture, deeper layers concentrate on
the difficult decisions while earlier layers solve all the easy issues
33
60
65
70
75
80
5 25 45 65 85 105
mean
Average
Precision
(mAP)
GFLOPs
Video classification on video activity dataset
AdaFrame5
ARNet-ResNet
ARNet-EfficientNet
L2L-Mnet
LiteEval
Ours-ResNet
Ours-EfficientNet
“FrameExit: Conditional early exiting for efficient video
recognition” (submitted 2021)
Frame exiting
for video
classification
less GFLOPs while
maintaining
accuracy
1.3x-5x
34
Efficient sparse
convolutions
Personalization
Multi-task networks
Quantization aware
training
Platform optimizations
Unsupervised / semi-
supervised learning
Advance
existing conditional
compute techniques
Develop efficient
video neural network
solutions
Learning to skip regions
Recycling features
Early exiting
Frame exiting
Future work
in video
perception
35
Machine
learning
core
Computer vision
3D Sensing
RF Sensing
Personalization
Biometrics
Data driven models
XR
Camera
Autonomous
vehicles
Fingerprint ASP
IOT
Cloud
Wi-Fi
Mobile
Research Areas Applications
Leading high-impact
research efforts in
perception
Inventing technology
enablers for important
applications
35
Our perception research is much broader than video
36
Video perception is crucial for
understanding the world and making
devices smarter
We are conducting leading research
and development in video perception
We are making power efficient video
perception possible without
sacrificing accuracy
37
www.qualcomm.com/ai
@QCOMResearch
www.qualcomm.com/news/onq
https://www.youtube.com/qualcomm?
http://www.slideshare.net/qualcommwirelessevolution
Connect with Us
Questions?
Follow us on:
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Thank you
Nothing in these materials is an offer to sell any of the
components or devices referenced herein.
©2018-2021 Qualcomm Technologies, Inc. and/or its
affiliated companies. All Rights Reserved.
Qualcomm and Snapdragon are trademarks or registered
trademarks of Qualcomm Incorporated. Other products
and brand names may be trademarks or registered
trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm
Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries
or business units within the Qualcomm corporate structure, as
applicable. Qualcomm Incorporated includes our licensing business,
QTL, and the vast majority of our patent portfolio. Qualcomm
Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates,
along with its subsidiaries, substantially all of our engineering,
research and development functions, and substantially all of our
products and services businesses, including our QCT semiconductor
business.

More Related Content

What's hot

“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from XailientEdge AI and Vision Alliance
 
Challenges & issues in way to 6g wireless communication
Challenges & issues in way to 6g wireless communicationChallenges & issues in way to 6g wireless communication
Challenges & issues in way to 6g wireless communicationNikhil Soni
 
What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? Qualcomm Research
 
Zero Trust for Private 5G and Edge
Zero Trust for Private 5G and EdgeZero Trust for Private 5G and Edge
Zero Trust for Private 5G and EdgeRebekah Rodriguez
 
Wireless Sensor Networks
Wireless Sensor NetworksWireless Sensor Networks
Wireless Sensor Networksrajatmal4
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
5G Technology Tutorial
5G Technology Tutorial5G Technology Tutorial
5G Technology TutorialAPNIC
 
Bluetooth protocol
Bluetooth protocolBluetooth protocol
Bluetooth protocolRajan Shah
 
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 20193G4G
 
zigbee technology
zigbee technologyzigbee technology
zigbee technologyDeep Hundal
 
InfiniBand Presentation
InfiniBand PresentationInfiniBand Presentation
InfiniBand PresentationShekhar Kumar
 
Near field communication(NFC)
Near field communication(NFC)Near field communication(NFC)
Near field communication(NFC)ronak1207
 
5G Physical Channel and Signal
5G Physical Channel and Signal5G Physical Channel and Signal
5G Physical Channel and SignalTaiz Telecom
 
Beginners: 5G: A simple explanation
Beginners: 5G: A simple explanationBeginners: 5G: A simple explanation
Beginners: 5G: A simple explanation3G4G
 
Efficient Deep Learning in Communications
Efficient Deep Learning in CommunicationsEfficient Deep Learning in Communications
Efficient Deep Learning in CommunicationsITU
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 

What's hot (20)

EfficientNet
EfficientNetEfficientNet
EfficientNet
 
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
“Introduction to DNN Model Compression Techniques,” a Presentation from Xailient
 
Challenges & issues in way to 6g wireless communication
Challenges & issues in way to 6g wireless communicationChallenges & issues in way to 6g wireless communication
Challenges & issues in way to 6g wireless communication
 
What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave?
 
Zero Trust for Private 5G and Edge
Zero Trust for Private 5G and EdgeZero Trust for Private 5G and Edge
Zero Trust for Private 5G and Edge
 
Wireless Sensor Networks
Wireless Sensor NetworksWireless Sensor Networks
Wireless Sensor Networks
 
Introduction to 5G NR
Introduction to 5G NRIntroduction to 5G NR
Introduction to 5G NR
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
5G Technology Tutorial
5G Technology Tutorial5G Technology Tutorial
5G Technology Tutorial
 
Bluetooth protocol
Bluetooth protocolBluetooth protocol
Bluetooth protocol
 
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019
Prof. Andy Sutton: 5G RAN Architecture Evolution - Jan 2019
 
Lec04 gpu architecture
Lec04 gpu architectureLec04 gpu architecture
Lec04 gpu architecture
 
zigbee technology
zigbee technologyzigbee technology
zigbee technology
 
InfiniBand Presentation
InfiniBand PresentationInfiniBand Presentation
InfiniBand Presentation
 
Near field communication(NFC)
Near field communication(NFC)Near field communication(NFC)
Near field communication(NFC)
 
5G Physical Channel and Signal
5G Physical Channel and Signal5G Physical Channel and Signal
5G Physical Channel and Signal
 
Beginners: 5G: A simple explanation
Beginners: 5G: A simple explanationBeginners: 5G: A simple explanation
Beginners: 5G: A simple explanation
 
Efficient Deep Learning in Communications
Efficient Deep Learning in CommunicationsEfficient Deep Learning in Communications
Efficient Deep Learning in Communications
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 

Similar to Efficient video perception through AI

“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from QualcommEdge AI and Vision Alliance
 
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemHai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemAI Frontiers
 
Networking Challenges for the Next Decade
Networking Challenges for the Next DecadeNetworking Challenges for the Next Decade
Networking Challenges for the Next DecadeOpen Networking Summit
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsPetteriTeikariPhD
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...Edge AI and Vision Alliance
 
Going Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsGoing Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsAmazon Web Services
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...Edge AI and Vision Alliance
 
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...Edge AI and Vision Alliance
 
The evolution of data center network fabrics
The evolution of data center network fabricsThe evolution of data center network fabrics
The evolution of data center network fabricsCisco Canada
 
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...Edge AI and Vision Alliance
 
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...Edge AI and Vision Alliance
 
Ten reasons to buy a network camera
Ten reasons to buy a network cameraTen reasons to buy a network camera
Ten reasons to buy a network cameraAxis Communications
 
10 reasons why you shouldn't buy an anologue camera
10 reasons why you shouldn't buy an anologue camera10 reasons why you shouldn't buy an anologue camera
10 reasons why you shouldn't buy an anologue cameracnssources
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeShuquan Huang
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Kieran Kunhya
 
Future Internet: Managing Innovation and Testbed
Future Internet: Managing Innovation and TestbedFuture Internet: Managing Innovation and Testbed
Future Internet: Managing Innovation and TestbedShinji Shimojo
 
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games Cloud Gaming - A Green Solution to Massive Multiplayer Online Games
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games Suhas Urs
 
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...Amazon Web Services
 

Similar to Efficient video perception through AI (20)

“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
 
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemHai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
 
Networking Challenges for the Next Decade
Networking Challenges for the Next DecadeNetworking Challenges for the Next Decade
Networking Challenges for the Next Decade
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
 
Going Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsGoing Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual Workstations
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...
“AI-ISP: Adding Real-time AI Functionality to Image Signal Processing with Re...
 
The evolution of data center network fabrics
The evolution of data center network fabricsThe evolution of data center network fabrics
The evolution of data center network fabrics
 
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...
“Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio f...
 
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
 
Ten reasons to buy a network camera
Ten reasons to buy a network cameraTen reasons to buy a network camera
Ten reasons to buy a network camera
 
10 reasons why you shouldn't buy an anologue camera
10 reasons why you shouldn't buy an anologue camera10 reasons why you shouldn't buy an anologue camera
10 reasons why you shouldn't buy an anologue camera
 
Edge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-timeEdge optimized architecture for fabric defect detection in real-time
Edge optimized architecture for fabric defect detection in real-time
 
Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...Moving to software-based production workflows and containerisation of media a...
Moving to software-based production workflows and containerisation of media a...
 
Future Internet: Managing Innovation and Testbed
Future Internet: Managing Innovation and TestbedFuture Internet: Managing Innovation and Testbed
Future Internet: Managing Innovation and Testbed
 
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games Cloud Gaming - A Green Solution to Massive Multiplayer Online Games
Cloud Gaming - A Green Solution to Massive Multiplayer Online Games
 
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
 
Sem vaibhav belkhude
Sem vaibhav belkhudeSem vaibhav belkhude
Sem vaibhav belkhude
 

More from Qualcomm Research

Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Qualcomm Research
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfQualcomm Research
 
Enabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfEnabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfQualcomm Research
 
Presentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIPresentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIQualcomm Research
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingQualcomm Research
 
How will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfHow will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfQualcomm Research
 
Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Qualcomm Research
 
Realizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GRealizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GQualcomm Research
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolutionQualcomm Research
 
AI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptAI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptQualcomm Research
 
Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Qualcomm Research
 
5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edgeQualcomm Research
 
Enabling on-device learning at scale
Enabling on-device learning at scaleEnabling on-device learning at scale
Enabling on-device learning at scaleQualcomm Research
 
The essential role of AI in the 5G future
The essential role of AI in the 5G futureThe essential role of AI in the 5G future
The essential role of AI in the 5G futureQualcomm Research
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsHow AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsQualcomm Research
 
Role of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingRole of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingQualcomm Research
 

More from Qualcomm Research (20)

Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
The future of AI is hybrid
The future of AI is hybridThe future of AI is hybrid
The future of AI is hybrid
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdf
 
Enabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfEnabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdf
 
Presentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIPresentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AI
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensing
 
How will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfHow will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdf
 
Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)
 
Realizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GRealizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5G
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution
 
AI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptAI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-concept
 
Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18
 
5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge
 
Enabling on-device learning at scale
Enabling on-device learning at scaleEnabling on-device learning at scale
Enabling on-device learning at scale
 
The essential role of AI in the 5G future
The essential role of AI in the 5G futureThe essential role of AI in the 5G future
The essential role of AI in the 5G future
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsHow AI research is enabling next-gen codecs
How AI research is enabling next-gen codecs
 
Role of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingRole of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous driving
 
Pioneering 5G broadcast
Pioneering 5G broadcastPioneering 5G broadcast
Pioneering 5G broadcast
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Efficient video perception through AI

  • 2. 2 • The role of video in our lives • What is video perception & what makes it challenging • Our research toward efficient video perception • Forward looking video perception research Agenda
  • 3. 3 3 A picture is worth a thousand words Out of all the five senses, vision is arguably the most important 3
  • 4. 4 4 A minute of video has more than 4 1,000 pictures
  • 5. 5 How video came to be From performing arts and still photos to high-quality video Performing arts Japanese Kabuki Roman amphitheater Chinese shadow puppetry Television 8K HDR video 360-degree video Chrono- photography Sanskrit plays Greek theater Motion pictures Color analog TV 1900s 1880s 1650s Early ages 1890s 1920s 1938 Polynesian hula Magic lantern Motion camera projectors Digital TVs 1990s 1995 Internet video broadcasting Still photos Motion photos Low-quality video High-quality video
  • 6. 6 6 The scale of video being created and consumed is massive 1M Minutes of video crossing the internet per second 300 Hours of video are uploaded every minute to YouTube 82% Of all consumer internet traffic is online video 76 Minutes per day watching video on digital devices by US adults 8B Average daily video views on Facebook Cisco Visual Networking Index: Forecast and Trends, 2017–2022
  • 7. 7 7 Video monitoring Extended reality Smart cities Smart factories Increasingly, video is all around us — providing entertainment, enhancing collaboration, and transforming industries Autonomous vehicles Video conferencing Sports Smartphone
  • 8. 8 Understand Recognizing patterns, identities, objects, scenes, context, relations, compositions, changes, motions, actions, activities, events, 3D structures, surfaces, lightings, text, emotions, sentiments, sounds, and more Systems Any compute platform, including SoCs, CPUs, GPUs, TPUs, NPUs, and DSPs Making Developing mathematical representations, models, algorithms, rules, and frameworks Video perception Making systems understand video content
  • 9. 9 9 Platform limitations Task diversity Volume of video data (training/testing) Diversity in visual data Quality of data acquisition Availability of annotated datasets What makes video perception challenging? Implementation challenges Data challenges Video perception challenges
  • 10. 10 10 Power and thermal efficiency are essential for on-device video perception The challenge of AI workloads Constrained mobile environment Very compute intensive Large, complicated neural network models Complex concurrencies Always-on Real-time Must be thermally efficient for sleek, ultra-light designs Storage/memory bandwidth limitations Requires long battery life for all-day use
  • 11. 11 Robustness Robust to data variations Adaptability Adaptable to different domains Scalability Scaling up and down, from IoT to the data center Making video perception ubiquitous Solving additional key challenges to take video perception from the research lab to broad commercial deployment
  • 12. 12 12 Efficiently running on-device video perception without sacrificing accuracy Leverage Temporal redundancy By reusing what is computed before • Learning to skip regions • Recycling features Make Early decisions By dynamically changing the network architecture per input frame • Early exiting • Frame exiting 12 Key concepts for efficient video perception
  • 13. 13 Data driven learning • Supervised, unsupervised, semi/self/weak supervised, adversarial Neural network • Convolutional neural networks • Graph neural networks Regression and classification tasks • Minimizing a loss function • Back-propagation over differentiable layers Convolutional layers (deep) Non-linearity, pooling Kernels, neurons Deep learning basics Computable AI 2 Training data Image Convolutional layer Feature map Pooling Fully-connected layer of neurons output Learnable coefficients Feed-forward CNN Vectorized features Inspired by the workings of the brain, drawing from data 13
  • 14. 14 14 Learning to skip redundant computations Video frames are heavily correlated frame t “Skip-convolutions for efficient video processing” (submitted 2021) Limit the computation only to the regions where there are significant changes frame t+10 residual The residual frame, the difference between two consecutive frames, contains little information in most regions
  • 15. 15 “Skip-convolutions for efficient video processing” (submitted 2021) Skip- convolution A convolution at a frame can be written as the previous frame’s convolution plus the convolution of the residual Computation is limited only to the regions where there are strong residuals Reinforce residual’s sparsity by removing negligible residuals Can replace convolutional layers in any CNN with skip convolutions A convolutional layer with a skip gate that masks out negligible residuals Conv G Conv – + Skip gate Previous frame output features Output features Input features Convolutional layer Previous frame input features Current frame input features Output features Previous frame output features Small residual Large residual No computation 15
  • 16. 16 Determining the gate for a skip convolution Can we learn it? Non-trainable Based on thresholding the norm of the convolved residual Approximate the norm of output without computing it Trainable Based on a tiny gating network that predicts gate probabilities Implement as a convolution with a single output channel Apply a sigmoid function to gating network probabilities Apply a sigmoid function on the norm of the residual and the weight of the layer kernel Joint training to minimize the classification loss and average active gates
  • 17. 17 17 Learned gate requires very low computational overhead Gate network is an additional output channel Cout Feature map Gating scores Gate mask Skip convolutions Win Wout No computation Computation “Skip-convolutions for efficient video processing” (submitted 2021) Hardware friendly implementation enforced by imposing structured sparsity into compute masks Block-wise structure by down/up sampling
  • 18. 18 “Skip-convolutions for efficient video processing” (submitted 2021) Predicted gating masks reduce computations Example masks show where computations can be skipped Gating masks become more selective at deeper layers, concentrating on task specific regions Layer 3 Layer 30 Gating masks for video object detection No computation Computation Larger block-wise structured gates are more selective when training with different sizes 1x1 2x2 4x4 8x8 Gating masks for pose estimation
  • 19. 19 0.5 0.55 0.6 0.65 0.1 1 10 100 1000 Average precision GMAC Object detection SpotNet Deep Feature Flow EfficientDet Ours-EfficientDet “Skip-convolutions for efficient video processing” (submitted 2021) Learning to skip reduces compute and maintains accuracy Results for object detection on video object detection dataset 3x- speed-up over state-of-the-art 5x
  • 20. 20 “Skip-convolutions for efficient video processing” (submitted 2021) Learning to skip reduces compute for human pose estimation Results for human pose estimation GMACs without skip convolutions GMACs with skip-convolutions 10.2 10.2 1.1 10.2 10.2 10.2 0.4 10.2 0.4 10.2 0.4 10.2 0.5 10.2 0.7 10.2 0.8 10.2 0.7 10.2
  • 21. 21 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0 2 4 6 8 10 Percentage of correct keypoints GMAC Pose estimation HRNet-w32 L0 W-SVD S-SVD Ours-Skip Ours-Skip+S-SVD “Skip-convolutions for efficient video processing” (submitted 2021) Learning to skip is complementary to model compression Results for human pose estimation on video human action dataset speed-up over HRNet 2.5x-8x
  • 22. 22 22 Instead of computing deep features repetitively, compute once and recycle “Time-sharing networks for efficient semantic video segmentation” (submitted 2021) Recycling features saves compute Compute deep features once and recycle — reuse from past frame Compute shallow features for all frames Deep features remain relatively stationary over time — they have lower spatial resolution Shallow features are more responsive to smooth changes, encoding the temporally varying information Applicable to any video neural network architectures including segmentation, optical flow, classification, and more
  • 23. 23 “Time-sharing networks for efficient semantic video segmentation” (submitted 2021) Example of a feature recycling network Making the best use of the deep features at hierarchical scales Frame t Frame t+1 Compute Data Flow Stride=2, Convolution Bilinear Up-sampling Features Transition Frame t (Key frame) Transition C Ψ Multi-scale Convolution Feature Sharing + Addition C Concat Convolution Based Decoder Block Ψ Convolution Block H H Classifier Time Next key frame Transition Transition
  • 24. 24 C Ψ Frame t+1 + + Feature Transform Feature Transform Feature Transform H Branch Decision “Time-sharing networks for efficient semantic video segmentation” (submitted 2021) Example of a feature recycling network Most frames require a light branch; some frames require a heavy branch Frame t Frame t+1 Compute Data Flow Stride=2, Convolution Bilinear Up-sampling Features Transition Frame t (Key frame) Transition C Ψ Multi-scale Convolution Feature Sharing + Addition C Concat Convolution Based Decoder Block Ψ Convolution Block H H Classifier Time Next key frame Transition Transition
  • 25. 25 25 Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/ “Time-sharing networks for efficient semantic video segmentation” (submitted 2021) Recycling features saves compute Instead of computing deep features repetitively, compute once and recycle 25
  • 26. 26 74.0 26.0 17.0 77.9 Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries. Model efficiency 78% GMACs On-device latency (ms/frame) GMAC reduction Feature recycling reduces compute and latency Semantic segmentation example Input: 2048x1024 RGB video Output: 2048x1024, 19 object classes Runs on: Qualcomm® Snapdragon™ 888 Mobile Platform 65% latency reduction Memory traffic 321.3 373.6 89% MB read MB write read reduction 93% write reduction 41.3 22.6 Enhanced Net HRNet w18 v2 “Time-sharing networks for efficient semantic video segmentation” (submitted 2021)
  • 27. 27 27 Exploit the fact that not all input examples require models of the same complexity “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Early exiting a neural network saves compute Very large, computationally intensive models are needed to correctly classify Very small and compact models can achieve very high accuracies, but they fail for complex examples Complex examples Simple examples Ideally, our system should be composed of a cascade of classifiers throughout the network
  • 28. 28 Early exiting at the earliest possible NN layer for video Conv G1 Accept previous frame layer output Feature from t-1 Feature from t G1: Gating based on temporal agreement Continue Similar Not similar FC Classifier G2 G2: Gating based on frame complexity Exit Continue Confident Not confident Frame t-1 Exit NN layer “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Frame t Exit NN layer
  • 29. 29 29 0.91 0.915 0.92 0.925 0.93 0.935 0.94 0.945 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Top-1 accuracy MAC (x108) Classification on image dataset RANets1 RANets2 RANets3 MSDNet Ours-MSDNet DenseNet Ours-DenseNet ResNet Ours-ResNet Early exiting reduces compute while maintaining accuracy “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Early exiting applies to most neural network backbones
  • 30. 30 0.77 0.775 0.78 0.785 0.79 0.795 0.8 4 6 8 10 12 14 16 18 20 Top-1 accuracy GMAC Object classification on video street scene dataset ResNet34 ResNet34-Uniform2 Ours-ResNet34-G1 Ours-ResNet34-G1/G2 “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Early exiting for object classification 2.5x less MACs while maintaining accuracy
  • 31. 31 Gating module 2-layer MLP 2-layer MLP Shared weights “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Frame exiting also applies to action recognition tasks Policy Function Frame feature G1 Frame feature Classifier 1 G2 Frame feature Classifier 2 Accumulated Feature Pooling G3 Frame feature Classifier 3 Accumulated Feature Pooling G4 Frame feature Classifier 4 Accumulated Feature Pooling Frame feature Classifier 5 Accumulated Feature Pooling G5
  • 32. 32 32 Frame exiting improves accuracy and reduces compute Policy Function Policy Function Policy Function Methods Video activity dataset Video action dataset mAP (%) GFLOPS Top-1 (%) GFLOPS Resnet AdaFrame 71.5 79.0 — — LiteEval 72.7 95.1 61.0 99.0 ListenToLook 72.3 81.4 — — SCSampler 72.9 41.9 70.8 41.9 AR-Net 73.8 33.5 71.7 32.0 FrameExit 76.1 25.2 72.8 19.7 EfficientNet AR-Net 79.7 15.3 74.8 16.3 FrameExit 80.0 11.4 75.3 7.8 mAP (%) GFLOPS 2d/3d Resnet Video human action dataset Uniform-10 44.7 41.2 Random-10 43.6 41.2 3D-ResNet18 35.4 38.6* HATNet 39.6 41.8* FrameExit (Efficient) 45.7 8.6 FrameExit (Accurate) 49.2 18.7 By adding gates to the NN architecture, deeper layers concentrate on the difficult decisions while earlier layers solve all the easy issues
  • 33. 33 60 65 70 75 80 5 25 45 65 85 105 mean Average Precision (mAP) GFLOPs Video classification on video activity dataset AdaFrame5 ARNet-ResNet ARNet-EfficientNet L2L-Mnet LiteEval Ours-ResNet Ours-EfficientNet “FrameExit: Conditional early exiting for efficient video recognition” (submitted 2021) Frame exiting for video classification less GFLOPs while maintaining accuracy 1.3x-5x
  • 34. 34 Efficient sparse convolutions Personalization Multi-task networks Quantization aware training Platform optimizations Unsupervised / semi- supervised learning Advance existing conditional compute techniques Develop efficient video neural network solutions Learning to skip regions Recycling features Early exiting Frame exiting Future work in video perception
  • 35. 35 Machine learning core Computer vision 3D Sensing RF Sensing Personalization Biometrics Data driven models XR Camera Autonomous vehicles Fingerprint ASP IOT Cloud Wi-Fi Mobile Research Areas Applications Leading high-impact research efforts in perception Inventing technology enablers for important applications 35 Our perception research is much broader than video
  • 36. 36 Video perception is crucial for understanding the world and making devices smarter We are conducting leading research and development in video perception We are making power efficient video perception possible without sacrificing accuracy
  • 38. Follow us on: For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog Thank you Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2018-2021 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm and Snapdragon are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.