SlideShare a Scribd company logo
1 of 29
Download to read offline
How Transformers are
Changing the Direction of
Deep Learning
Architectures
Tom Michiels
System Architect
Synopsys
• The Surprising Rise of Transformers in Vision
• The Structure of Attention and Transformer
• Transformers applied to Vision and Other Application Domains
• Why Transformers are Here to Stay for Vision
Outline
© 2022 Synopsys 2
CNNs Have Dominated Many Vision Tasks Since 2012
2012 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022
AlexNet VGG ResNet MobileNet V1 MobileNet V2
EfficientNet
MobileNet v3
Image Classification
© 2022 Synopsys 3
2012 ... 2014
AlexNet VGG ResNet MobileNet V1 MobileNet V2
EfficientNet
MobileNet v3
RCNN
FRCNN
SSD
YOLOV2 Mask RCNN YoloV3
YoloV4/V5
EfficientDet
Object Detection
© 2022 Synopsys 4
2015 2016 2017 2018 2019 2020 2021 2022
CNNs Have Dominated Many Vision Tasks Since 2012
2012 ... 2014
AlexNet VGG ResNet MobileNet V1 MobileNet V2
EfficientNet
MobileNet v3
RCNN
FRCNN
SSD
YOLOV2 Mask RCNN YoloV3
YoloV4/V5
EfficientDet
FCN
DeepLab
SegNet DSNet
Semantic Segmentation
© 2022 Synopsys 5
2015 2016 2017 2018 2019 2020 2021 2022
CNNs Have Dominated Many Vision Tasks Since 2012
2012 ... 2014
AlexNet VGG ResNet MobileNet V1 MobileNet V2
EfficientNet
MobileNet v3
RCNN
FRCNN
SSD
YOLOV2 Mask RCNN YoloV3
YoloV4/V5
EfficientDet
FCN
DeepLab
SegNet DSNet
DeeperLab
PFPN
EfficientPS
YoloP
Panoptic Vision
© 2022 Synopsys 6
2015 2016 2017 2018 2019 2020 2021 2022
CNNs Have Dominated Many Vision Tasks Since 2012
A Decade of CNN Development…
Residual Connection
© 2022 Synopsys 7
Inverted Residual Blocks
Beaten in Accuracy by Transformers
© 2022 Synopsys 8
Is this the real life ? Is this just fantasy ?
Transformer, a model designed for natural
language processing
… without any modifications applied to image patches,
beats the highly specialized CNNs in accuracy
The Structure of Attention and Transformer
• Attention is all you need!(*)
• Bidirectional Encoder Representations from
Transformers
• A Transformer is a deep learning model that uses
attention mechanism
• Transformers were primarily used for Natural
Language Processing
• Translation
• Question Answering
• Conversational AI
• Successful training of huge transformers
• MTM, GPT-3, T5, ALBERT, RoBERTa, T5, Switch
• Transformers are successfully applied in other
application domains with promising results for
embedded use
Bert and Transformers
(*) https://arxiv.org/abs/1706.03762
© 2022 Synopsys 10
Convolutions, Feed Forward, and Multi-Head Attention
11
© 2022 Synopsys
1x1 Convolution
Add
3x3 Convolution
Add
CNN
Transformer
• The Feed Forward layer of the Transformer
is identical to a 1x1 Convolution
• In this part of the model, no information is
flowing between tokens/pixels
• Multi-Head Attention and 3x3 Convolution
layers are the layers responsible for mixing
information between tokens/pixels
Convolutions as Hard-Coded Attention
Both Convolution and Attention Networks mix in features of other tokens/pixels
Convolution Attention
Convolutions mix in features from tokens based on fixed spatial location
Attention mix in features from tokens based on learned attention
© 2022 Synopsys 12
The Structure of a Transformer: Attention
Multi-Head Attention
Attention: Mix in Features of Other Tokens
Is this the real life ?
Is
this
the
real
life
?
© 2022 Synopsys 13
The Structure of a Transformer: Attention
Multi-Head Attention
Attention: Mix in Features of Other Tokens
© 2022 Synopsys 14
The Structure of a Transformer: Attention
NxN
matrix
N: number of tokens
Multi-Head Attention
© 2022 Synopsys 15
+
The Structure of a Transformer: Embedding
Embedding of input tokens and the positional encoding
Is this the real life ? Is this just fantasy ?
embedding vectors
position encoding
elementwise addition
© 2022 Synopsys 17
Other Application Domains:
Vision, Action Recognition, Speech Recognition
Vision Transformers (ViT/L16 or ViT-G/14)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale(*)
Image is split into tiles
(*) https://arxiv.org/abs/2010.11929
Linear Projection
Multi-Head Attention
Feed Forward
Add & Norm
Add & Norm
+
Pixels in a tile are
flattened into tokens (vectors) that feed in
the transformer
N x
Vision Transformers are at the time of
publication best-known method for image
classification
They are beating convolutional neural
networks in accuracy and training time, but
not in inference time.
Position encoding
© 2022 Synopsys 20
Vision Transformer  Increasing Resolution
Linear Projection
Multi-Head Attention
Feed Forward
Add & Norm
Add & Norm
+
N x
Position encoding
Attention matrix scales quadratically with the number
of patches
N x N matrix
Where N = the number of
tokens/patches
© 2022 Synopsys 21
Swin Transformers
Hierarchical Vision Transformer using Shifted Windows (*)
Adaptation makes Transformers scale for
larger images:
1. Shifted Window Attention
2. Patch-Merging
State of the Art for
• Object Detection (COCO)
• Semantic Segmentation (ADE20K)
(*) https://arxiv.org/abs/2103.14030
Shifted Window Attention Patch-Merging
22
© 2022 Synopsys
Action Classification with Transformers
Video Swin Transformer
https://arxiv.org/abs/2106.13230
Video Swin Transformers extend the (shifted) window to three
dimensions (2D spatial + time)
Today’s state of the art on Kinetics-400 and Kinetics-600
© 2022 Synopsys 23
© 2022 Synopsys
Action Classification with Transformers
Is Space-Time Attention All You Need for Video Understanding?
https://arxiv.org/abs/2102.05095
• Transformers can directly be applied to video
• Like for ViT, the video frames are split-up in tiles that feed directly in the Transformer
• Applying attention separately on time and on space “Divided Attention” gives (at time of publication) state of the art
results on Kinetics-400 and Kinetics-600 benchmarks
24
• Conformers are Transformer with and additional Convolution Module
• The convolution module contains a pointwise and a depthwise (1D, size=31) convolution:
• Compared to RNN, LSTM, DW-Conv and Transformers, Conformers give excellent accuracy / size ratio:
• Best known methods for speech recognition (LibriSpeech)
are based on Conformers
Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition (*)
WER
(%)
5.0
1.0
2.0
3.0
4.0
6.0
7.0
8.0
9.0
10.0 Not Released
Hybrid Model
Transformer
# of parameters (M)
25 275 300
50 75 100 125 150 175 200 225 250 325
https://arxiv.org/abs/2005.08100
© 2022 Synopsys 25
Why Attention and Transformers
are Here to Stay for Vision
Visual Perception beyond Segmentation & Object
Detection
Future applications like security cameras, personal assistants, storage retrieval,…. require a deeper
understanding of the world
 Merging NLP and Vision using the same knowledge representation backend
Today
Panoptic Segmentation
2022-…
What is happening in this scene?
© 2022 Synopsys 27
Tesla AI Day: Using Transformers Make Predictions in
Vector Space
• Convolutional neural network extract features
for every camera
• A transformer is used to:
• Fuse multiple cameras
• Make predictions directly in bird-eye-
view vector space
© 2022 Synopsys 28
• Attention based networks outperform CNN-only networks on accuracy
• Highest accuracy required for high-end applications
• Models that combine Vision Transformers with Convolutions are more efficient at
inference
• Examples: MobileViT(*), CoAtNet(**)
• Full visual perception requires knowledge that may not easily be acquired by
vision only
• Multi-modal learning required for a deeper understanding of visual information
• Application integrating multiple sensors benefit from attention-based networks
Why Transformers are Here to Stay in Vision
(*) https://arxiv.org/abs/2110.02178
(**) https://arxiv.org/abs/2106.04803v2
30
© 2022 Synopsys
• Transformers are deep learning models primarily used in the field of NLP
• Transformers lead to state-of-the-art results in other application domains of deep
learning like vision and speech
• They can be applied to other domains with surprisingly little modifications
• Models that combine attention and convolutions outperform convolutional
neural networks on vision tasks, even for small models
• Transformers and attention for vision applications are here to stay
• Real world applications require knowledge that is not easily captured with
convolutions
Summary
© 2022 Synopsys 32
Resources
Join the Synopsys Deep Dive
Optimize AI Performance & Power for Tomorrow’s Neural
Network Applications (Thursday, 12-3 PM)
Synopsys Demos in Booth 719
• Executing Transformer Neural Networks in ARC NPX6 NPU
IP
• Driver Management System on ARC EV Processor IP with
Visidon
• Neural Network-Enhanced Radar Processing on ARC VPX5
DSP with SensorCortek
Resources
ARVIX.org
https://arxiv.org/abs/1706.03762
ARC NPX6 NPU IP
www.synopsys.com/arc
© 2022 Synopsys 33

More Related Content

What's hot

Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxDeep Learning Italia
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and TransformerArvind Devaraj
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingMinh Pham
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachFerdin Joe John Joseph PhD
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaAlexey Grigorev
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...NILESH VERMA
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...changedaeoh
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language modelJiWenKim
 
Semantic Segmentation AIML Project
Semantic Segmentation AIML ProjectSemantic Segmentation AIML Project
Semantic Segmentation AIML ProjectHitesh
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 

What's hot (20)

Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
Semantic Segmentation AIML Project
Semantic Segmentation AIML ProjectSemantic Segmentation AIML Project
Semantic Segmentation AIML Project
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
ViT.pptx
ViT.pptxViT.pptx
ViT.pptx
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Bert
BertBert
Bert
 

Similar to Transformers Changing Vision Deep Learning

“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...
“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...
“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...Edge AI and Vision Alliance
 
Transformer models for FER
Transformer models for FERTransformer models for FER
Transformer models for FERIRJET Journal
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...Edge AI and Vision Alliance
 
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from QualcommEdge AI and Vision Alliance
 
Efficient video perception through AI
Efficient video perception through AIEfficient video perception through AI
Efficient video perception through AIQualcomm Research
 
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual RealityMPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Realitymcslgachon
 
The evolution of data center network fabrics
The evolution of data center network fabricsThe evolution of data center network fabrics
The evolution of data center network fabricsCisco Canada
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfRakuten Group, Inc.
 
CICD Pipelines for Microservices Best Practices
CICD Pipelines for Microservices Best Practices CICD Pipelines for Microservices Best Practices
CICD Pipelines for Microservices Best Practices Codefresh
 
Multicloud as the Next Generation of Cloud Infrastructure
Multicloud as the Next Generation of Cloud Infrastructure Multicloud as the Next Generation of Cloud Infrastructure
Multicloud as the Next Generation of Cloud Infrastructure Brad Eckert
 
Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15 mixARConference
 
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...“Challenges in Architecting Vision Inference Systems for Transformer Models,”...
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...Edge AI and Vision Alliance
 
vCPE 2.0 – the business case for an open vCPE framework
vCPE 2.0 – the business case for an open vCPE frameworkvCPE 2.0 – the business case for an open vCPE framework
vCPE 2.0 – the business case for an open vCPE frameworkCloudify Community
 
Edge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesEdge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesCloudify Community
 
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify Community
 
CICD Pipelines for Microservices: Lessons from the Trenches
CICD Pipelines for Microservices: Lessons from the TrenchesCICD Pipelines for Microservices: Lessons from the Trenches
CICD Pipelines for Microservices: Lessons from the TrenchesCodefresh
 
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdf
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdfleewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdf
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdfrobertsamuel23
 
OpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenOpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenTrinath Somanchi
 

Similar to Transformers Changing Vision Deep Learning (20)

“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...
“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...
“How Transformers Are Changing the Nature of Deep Learning Models,” a Present...
 
Transformer models for FER
Transformer models for FERTransformer models for FER
Transformer models for FER
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
 
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
 
Efficient video perception through AI
Efficient video perception through AIEfficient video perception through AI
Efficient video perception through AI
 
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual RealityMPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
 
The evolution of data center network fabrics
The evolution of data center network fabricsThe evolution of data center network fabrics
The evolution of data center network fabrics
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
CICD Pipelines for Microservices Best Practices
CICD Pipelines for Microservices Best Practices CICD Pipelines for Microservices Best Practices
CICD Pipelines for Microservices Best Practices
 
Multicloud as the Next Generation of Cloud Infrastructure
Multicloud as the Next Generation of Cloud Infrastructure Multicloud as the Next Generation of Cloud Infrastructure
Multicloud as the Next Generation of Cloud Infrastructure
 
Whats new in eCognition 8.8
Whats new in eCognition 8.8Whats new in eCognition 8.8
Whats new in eCognition 8.8
 
Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15
 
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...“Challenges in Architecting Vision Inference Systems for Transformer Models,”...
“Challenges in Architecting Vision Inference Systems for Transformer Models,”...
 
vCPE 2.0 – the business case for an open vCPE framework
vCPE 2.0 – the business case for an open vCPE frameworkvCPE 2.0 – the business case for an open vCPE framework
vCPE 2.0 – the business case for an open vCPE framework
 
Edge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different PiecesEdge Computing: A Unified Infrastructure for all the Different Pieces
Edge Computing: A Unified Infrastructure for all the Different Pieces
 
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud OrchestrationCloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
Cloudify: Open vCPE Design Concepts and Multi-Cloud Orchestration
 
CICD Pipelines for Microservices: Lessons from the Trenches
CICD Pipelines for Microservices: Lessons from the TrenchesCICD Pipelines for Microservices: Lessons from the Trenches
CICD Pipelines for Microservices: Lessons from the Trenches
 
Cloud to Edge
Cloud to EdgeCloud to Edge
Cloud to Edge
 
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdf
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdfleewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdf
leewayhertz.com-HOW IS A VISION TRANSFORMER MODEL ViT BUILT AND IMPLEMENTED.pdf
 
OpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco HeavenOpenStack and Kubernetes - A match made for Telco Heaven
OpenStack and Kubernetes - A match made for Telco Heaven
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Transformers Changing Vision Deep Learning

  • 1. How Transformers are Changing the Direction of Deep Learning Architectures Tom Michiels System Architect Synopsys
  • 2. • The Surprising Rise of Transformers in Vision • The Structure of Attention and Transformer • Transformers applied to Vision and Other Application Domains • Why Transformers are Here to Stay for Vision Outline © 2022 Synopsys 2
  • 3. CNNs Have Dominated Many Vision Tasks Since 2012 2012 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022 AlexNet VGG ResNet MobileNet V1 MobileNet V2 EfficientNet MobileNet v3 Image Classification © 2022 Synopsys 3
  • 4. 2012 ... 2014 AlexNet VGG ResNet MobileNet V1 MobileNet V2 EfficientNet MobileNet v3 RCNN FRCNN SSD YOLOV2 Mask RCNN YoloV3 YoloV4/V5 EfficientDet Object Detection © 2022 Synopsys 4 2015 2016 2017 2018 2019 2020 2021 2022 CNNs Have Dominated Many Vision Tasks Since 2012
  • 5. 2012 ... 2014 AlexNet VGG ResNet MobileNet V1 MobileNet V2 EfficientNet MobileNet v3 RCNN FRCNN SSD YOLOV2 Mask RCNN YoloV3 YoloV4/V5 EfficientDet FCN DeepLab SegNet DSNet Semantic Segmentation © 2022 Synopsys 5 2015 2016 2017 2018 2019 2020 2021 2022 CNNs Have Dominated Many Vision Tasks Since 2012
  • 6. 2012 ... 2014 AlexNet VGG ResNet MobileNet V1 MobileNet V2 EfficientNet MobileNet v3 RCNN FRCNN SSD YOLOV2 Mask RCNN YoloV3 YoloV4/V5 EfficientDet FCN DeepLab SegNet DSNet DeeperLab PFPN EfficientPS YoloP Panoptic Vision © 2022 Synopsys 6 2015 2016 2017 2018 2019 2020 2021 2022 CNNs Have Dominated Many Vision Tasks Since 2012
  • 7. A Decade of CNN Development… Residual Connection © 2022 Synopsys 7 Inverted Residual Blocks
  • 8. Beaten in Accuracy by Transformers © 2022 Synopsys 8 Is this the real life ? Is this just fantasy ? Transformer, a model designed for natural language processing … without any modifications applied to image patches, beats the highly specialized CNNs in accuracy
  • 9. The Structure of Attention and Transformer
  • 10. • Attention is all you need!(*) • Bidirectional Encoder Representations from Transformers • A Transformer is a deep learning model that uses attention mechanism • Transformers were primarily used for Natural Language Processing • Translation • Question Answering • Conversational AI • Successful training of huge transformers • MTM, GPT-3, T5, ALBERT, RoBERTa, T5, Switch • Transformers are successfully applied in other application domains with promising results for embedded use Bert and Transformers (*) https://arxiv.org/abs/1706.03762 © 2022 Synopsys 10
  • 11. Convolutions, Feed Forward, and Multi-Head Attention 11 © 2022 Synopsys 1x1 Convolution Add 3x3 Convolution Add CNN Transformer • The Feed Forward layer of the Transformer is identical to a 1x1 Convolution • In this part of the model, no information is flowing between tokens/pixels • Multi-Head Attention and 3x3 Convolution layers are the layers responsible for mixing information between tokens/pixels
  • 12. Convolutions as Hard-Coded Attention Both Convolution and Attention Networks mix in features of other tokens/pixels Convolution Attention Convolutions mix in features from tokens based on fixed spatial location Attention mix in features from tokens based on learned attention © 2022 Synopsys 12
  • 13. The Structure of a Transformer: Attention Multi-Head Attention Attention: Mix in Features of Other Tokens Is this the real life ? Is this the real life ? © 2022 Synopsys 13
  • 14. The Structure of a Transformer: Attention Multi-Head Attention Attention: Mix in Features of Other Tokens © 2022 Synopsys 14
  • 15. The Structure of a Transformer: Attention NxN matrix N: number of tokens Multi-Head Attention © 2022 Synopsys 15
  • 16. + The Structure of a Transformer: Embedding Embedding of input tokens and the positional encoding Is this the real life ? Is this just fantasy ? embedding vectors position encoding elementwise addition © 2022 Synopsys 17
  • 17. Other Application Domains: Vision, Action Recognition, Speech Recognition
  • 18. Vision Transformers (ViT/L16 or ViT-G/14) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale(*) Image is split into tiles (*) https://arxiv.org/abs/2010.11929 Linear Projection Multi-Head Attention Feed Forward Add & Norm Add & Norm + Pixels in a tile are flattened into tokens (vectors) that feed in the transformer N x Vision Transformers are at the time of publication best-known method for image classification They are beating convolutional neural networks in accuracy and training time, but not in inference time. Position encoding © 2022 Synopsys 20
  • 19. Vision Transformer  Increasing Resolution Linear Projection Multi-Head Attention Feed Forward Add & Norm Add & Norm + N x Position encoding Attention matrix scales quadratically with the number of patches N x N matrix Where N = the number of tokens/patches © 2022 Synopsys 21
  • 20. Swin Transformers Hierarchical Vision Transformer using Shifted Windows (*) Adaptation makes Transformers scale for larger images: 1. Shifted Window Attention 2. Patch-Merging State of the Art for • Object Detection (COCO) • Semantic Segmentation (ADE20K) (*) https://arxiv.org/abs/2103.14030 Shifted Window Attention Patch-Merging 22 © 2022 Synopsys
  • 21. Action Classification with Transformers Video Swin Transformer https://arxiv.org/abs/2106.13230 Video Swin Transformers extend the (shifted) window to three dimensions (2D spatial + time) Today’s state of the art on Kinetics-400 and Kinetics-600 © 2022 Synopsys 23
  • 22. © 2022 Synopsys Action Classification with Transformers Is Space-Time Attention All You Need for Video Understanding? https://arxiv.org/abs/2102.05095 • Transformers can directly be applied to video • Like for ViT, the video frames are split-up in tiles that feed directly in the Transformer • Applying attention separately on time and on space “Divided Attention” gives (at time of publication) state of the art results on Kinetics-400 and Kinetics-600 benchmarks 24
  • 23. • Conformers are Transformer with and additional Convolution Module • The convolution module contains a pointwise and a depthwise (1D, size=31) convolution: • Compared to RNN, LSTM, DW-Conv and Transformers, Conformers give excellent accuracy / size ratio: • Best known methods for speech recognition (LibriSpeech) are based on Conformers Speech Recognition Conformer: Convolution-augmented Transformer for Speech Recognition (*) WER (%) 5.0 1.0 2.0 3.0 4.0 6.0 7.0 8.0 9.0 10.0 Not Released Hybrid Model Transformer # of parameters (M) 25 275 300 50 75 100 125 150 175 200 225 250 325 https://arxiv.org/abs/2005.08100 © 2022 Synopsys 25
  • 24. Why Attention and Transformers are Here to Stay for Vision
  • 25. Visual Perception beyond Segmentation & Object Detection Future applications like security cameras, personal assistants, storage retrieval,…. require a deeper understanding of the world  Merging NLP and Vision using the same knowledge representation backend Today Panoptic Segmentation 2022-… What is happening in this scene? © 2022 Synopsys 27
  • 26. Tesla AI Day: Using Transformers Make Predictions in Vector Space • Convolutional neural network extract features for every camera • A transformer is used to: • Fuse multiple cameras • Make predictions directly in bird-eye- view vector space © 2022 Synopsys 28
  • 27. • Attention based networks outperform CNN-only networks on accuracy • Highest accuracy required for high-end applications • Models that combine Vision Transformers with Convolutions are more efficient at inference • Examples: MobileViT(*), CoAtNet(**) • Full visual perception requires knowledge that may not easily be acquired by vision only • Multi-modal learning required for a deeper understanding of visual information • Application integrating multiple sensors benefit from attention-based networks Why Transformers are Here to Stay in Vision (*) https://arxiv.org/abs/2110.02178 (**) https://arxiv.org/abs/2106.04803v2 30 © 2022 Synopsys
  • 28. • Transformers are deep learning models primarily used in the field of NLP • Transformers lead to state-of-the-art results in other application domains of deep learning like vision and speech • They can be applied to other domains with surprisingly little modifications • Models that combine attention and convolutions outperform convolutional neural networks on vision tasks, even for small models • Transformers and attention for vision applications are here to stay • Real world applications require knowledge that is not easily captured with convolutions Summary © 2022 Synopsys 32
  • 29. Resources Join the Synopsys Deep Dive Optimize AI Performance & Power for Tomorrow’s Neural Network Applications (Thursday, 12-3 PM) Synopsys Demos in Booth 719 • Executing Transformer Neural Networks in ARC NPX6 NPU IP • Driver Management System on ARC EV Processor IP with Visidon • Neural Network-Enhanced Radar Processing on ARC VPX5 DSP with SensorCortek Resources ARVIX.org https://arxiv.org/abs/1706.03762 ARC NPX6 NPU IP www.synopsys.com/arc © 2022 Synopsys 33