SlideShare a Scribd company logo
1 of 24
Download to read offline
Learning for 360°
Vision
Yu-Chuan Su
Research Scientist
Google
Expected market growth
[Precedence Research]
Applications
Growing Popularity of 360° Media
Virtual / Augmented Reality
Drone
Photography Surveillance
© 2023 Google 2
360° Image vs. Perspective Image
© 2023 Google 3
360° Image vs. Perspective Image
𝜃
𝜙
© 2023 Google 4
• Applications depend on visual recognition
[Zhang et al., ECCV 2014], [Hu et al., CVPR 2017], [Lai et al., TOG 2017], [Chou et al., AAAI
2018], [Yu et al., AAAI 2018], [Lee et al., CVPR 2018]
• CNNs for spherical data
[Boomsma and Frellsen, NeurIPS 2017], [Cohen et al., ICLR 2018], [Cheng et al., CVPR
2018], [Esteves et al., ECCV 2018], [Coors et al., ECCV 2018], [Zhang et al., ECCV 2018],
[Khasanova and Frossard, ICML 2019]
o New architectures designed specifically for spherical data
o Not data efficient – require annotations in spherical formats
Visual Recognition on 360° Imagery
© 2023 Google 5
Strategy I (Equirectangular)
[Hu CVPR 2017], [Lai TOG 2017]
• Fast – single projection
• Inaccurate – distortion in projection
Applying CNNs to 360° Imagery
© 2023 Google 6
Strategy II (Perspective)
[Zhang ECCV 2014], [Chou AAAI 2018],
[Yu AAAI 2018]
• Accurate – exact convolution output
• Slow – repeated projection
Applying CNNs to 360° Imagery
© 2023 Google 7
• Fast – single equirectangular projection
• Accurate – simulate exact convolution output
• Data efficient – does not require additional annotations
Spherical Convolution
© 2023 Google 8
• Goal – account for distortion
• Problem – distortion is polar angle (θ) / row dependent
• Solution – untie kernel weights along the rows*
Spherical Convolution Model Architecture
θ = 90° θ = 54° θ = 18°
Source
* We use one set of weights every 5 rows in practice
© 2023 Google 9
• Goal – accelerate training
• Idea – require 𝑁𝑒 to reproduce all intermediate features
Layer-wise Training
© 2023 Google 10
Apply SphConv to Faster R-CNN
• Pano2Vid (Train)
o 360° video dataset
o Sample 1,056 frames for training
• Pascal VOC 2007 (Test)
o Perspective object detection dataset
o Project bounding boxes to spheres at five polar angle (𝜽)
Experiment: Datasets
© 2023 Google 11
SphConv Evaluation
12
50% more accurate
than Strategy I
Orders of magnitude
faster than Strategy II
* [Hu et al., CVPR 2017], [Lai et al., TOG 2017]
# [Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018]
© 2023 Google
RMSE
Accuracy
• Model size increases linearly w.r.t. input resolution (H)
• SphConv kernels are highly correlated → weights are redundant
Problem of Spherical Convolution
CNN SphConv
Asymptotic c2k2 c2k2H
VGG* 56 MB 29 GB
* Over 640x320 input images
© 2023 Google 13
Idea – generates SphConv kernels from source kernels
Kernel Transformer Network
© 2023 Google 14
Kernel Transformer Network Architecture
15
© 2023 Google
CNN SphConv KTN
Asymptotic c2k2 c2k2H c2k2+c2
VGG 56 MB 29 GB 70 MB
Transferability of Kernel Transformer Network
We need only one KTN for different tasks!
© 2023 Google 16
• Appy KTN to Faster R-CNN
[Ren et al., NIPS 2015]
o Extract equirectangular feature map
o Project features to tangent planes
o Spherical non-maximum suppression
• Does not need new annotation
Application: 360° Object Detection
© 2023 Google 17
Experiment: Object Detection Accuracy
1[Cheng et al., CVPR 2018],
2[Cohen et al., ICLR 2018],
3[Esteves et al., ECCV 2018],
4[Coors et al., ECCV 2018]
• KTN is much more compact than SphConv
• KTN outperforms recent CNNs on spherical data
© 2023 Google 18
© 2023 Google 19
Experiment: KTN Transferability
© 2023 Google 19
Experiment: KTN Transferability
KTN performs almost identically
regardless of the training source!
© 2023 Google 21
Conclusion
1[Cheng et al., CVPR 2018],
2[Cohen et al., ICLR 2018],
3[Esteves et al., ECCV 2018],
4[Coors et al., ECCV 2018]
• Accounts for distortion in 360° images
• Does not require annotated 360° images
• Is transferable across different source CNNs
Open door to off-the-shelf CNN recognition on 360° images
© 2023 Google 22
Resources
• Su and Grauman, Learning Spherical
Convolution for Fast Features from 360°
Imagery, NeurIPS 2017
• Su and Grauman, Kernel Transformer
Networks for Compact Spherical
Convolution, CVPR 2018
• Su and Grauman, Learning Spherical
Convolution for 360 Recognition, PAMI
2021
• Zhang et al., PanoContext: A Whole-
Room 3D Context Model for Panoramic
Scene Understanding, ECCV 2014
• Hu et al., Deep 360 Pilot: Learning a
Deep Agent for Piloting through 360°
Sports Video, CVPR 2017
• Lai et al., Semantic-driven Generation of
Hyperlapse from 360° Video, TOG 2017
• Boomsma and Frellsen, Spherical
convolutions and their application in
molecular modelling, NeurIPS 2017
23
© 2023 Google
Resources
• Cheng et al., Cube Padding for Weakly-
Supervised Saliency Prediction in 360°
Videos, CVPR 2018
• Lee et al., A Memory Network Approach
for Story-Based Temporal Summarization
of 360° Videos, CVPR 2018
• Chou et al., Self-view Grounding Given a
Narrated 360° Video, AAAI 2018
• Yu et al., A Deep Ranking Model for
Spatio-Temporal Highlight Detection
from a 360 Video, AAAI 2018
• Cohen et al., Spherical CNNs, ICLR 2018
• Esteves et al., Learning SO(3)
Equivariant Representations with
Spherical CNNs, ECCV 2018
• Coors et al., SphereNet: Learning
Spherical Representations for Detection
and Classification in Omnidirectional
Images, ECCV 2018
• Zhang et al., Saliency Detection in 360°
Videos, ECCV 2018
• Khasanova and Frossard, Geometry
Aware Convolutional Filters for
Omnidirectional Images Representation,
ICML 2019
24
© 2023 Google

More Related Content

Similar to “Learning for 360° Vision,” a Presentation from Google

Understanding Users Behaviours in User-Centric Immersive Communications
Understanding Users Behaviours in User-Centric Immersive CommunicationsUnderstanding Users Behaviours in User-Centric Immersive Communications
Understanding Users Behaviours in User-Centric Immersive CommunicationsFörderverein Technische Fakultät
 
Mobile AR Lecture 8 - AR Panoramas
Mobile AR Lecture 8 - AR PanoramasMobile AR Lecture 8 - AR Panoramas
Mobile AR Lecture 8 - AR PanoramasMark Billinghurst
 
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformImproved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformFelipe Pedroso
 
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realityNAVER Engineering
 
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...YutaSuzuki27
 
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...Seiya Ito
 
Applying Machine Learning to Data Visaulization: What, Why, Where, and How
Applying Machine Learning to Data Visaulization: What, Why, Where, and HowApplying Machine Learning to Data Visaulization: What, Why, Where, and How
Applying Machine Learning to Data Visaulization: What, Why, Where, and HowQianwen Wang
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learningNAVER Engineering
 
YouTube Trending Video Dashboard
YouTube Trending Video DashboardYouTube Trending Video Dashboard
YouTube Trending Video DashboardIRJET Journal
 
Towards GQL 1 — A Property Graph Query Language Standard
Towards GQL 1 — A Property Graph Query Language StandardTowards GQL 1 — A Property Graph Query Language Standard
Towards GQL 1 — A Property Graph Query Language StandardNeo4j
 
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCS
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCSOctober 2018 ODTUG Webinar - Getting Started with Groovy in EPBCS
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCSKyle Goodfriend
 
Real Time Moving Object Detection for Day-Night Surveillance using AI
Real Time Moving Object Detection for Day-Night Surveillance using AIReal Time Moving Object Detection for Day-Night Surveillance using AI
Real Time Moving Object Detection for Day-Night Surveillance using AIIRJET Journal
 
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)Saimunur Rahman
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringIRJET Journal
 

Similar to “Learning for 360° Vision,” a Presentation from Google (20)

Understanding Users Behaviours in User-Centric Immersive Communications
Understanding Users Behaviours in User-Centric Immersive CommunicationsUnderstanding Users Behaviours in User-Centric Immersive Communications
Understanding Users Behaviours in User-Centric Immersive Communications
 
Mobile AR Lecture 8 - AR Panoramas
Mobile AR Lecture 8 - AR PanoramasMobile AR Lecture 8 - AR Panoramas
Mobile AR Lecture 8 - AR Panoramas
 
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformImproved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
 
RESNA2011-lim-69568
RESNA2011-lim-69568RESNA2011-lim-69568
RESNA2011-lim-69568
 
Introduction of slam
Introduction of slamIntroduction of slam
Introduction of slam
 
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
 
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...
Transfer Learning Model for Image Segmentation by Integrating U-NetPlusPlus a...
 
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
 
Applying Machine Learning to Data Visaulization: What, Why, Where, and How
Applying Machine Learning to Data Visaulization: What, Why, Where, and HowApplying Machine Learning to Data Visaulization: What, Why, Where, and How
Applying Machine Learning to Data Visaulization: What, Why, Where, and How
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 
YouTube Trending Video Dashboard
YouTube Trending Video DashboardYouTube Trending Video Dashboard
YouTube Trending Video Dashboard
 
Cloud Testing Framework
Cloud Testing FrameworkCloud Testing Framework
Cloud Testing Framework
 
Towards GQL 1 — A Property Graph Query Language Standard
Towards GQL 1 — A Property Graph Query Language StandardTowards GQL 1 — A Property Graph Query Language Standard
Towards GQL 1 — A Property Graph Query Language Standard
 
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCS
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCSOctober 2018 ODTUG Webinar - Getting Started with Groovy in EPBCS
October 2018 ODTUG Webinar - Getting Started with Groovy in EPBCS
 
1806 cosmic progress
1806 cosmic progress1806 cosmic progress
1806 cosmic progress
 
Using cosmic in agile projects
Using cosmic in agile projectsUsing cosmic in agile projects
Using cosmic in agile projects
 
40120140502005
4012014050200540120140502005
40120140502005
 
Real Time Moving Object Detection for Day-Night Surveillance using AI
Real Time Moving Object Detection for Day-Night Surveillance using AIReal Time Moving Object Detection for Day-Night Surveillance using AI
Real Time Moving Object Detection for Day-Night Surveillance using AI
 
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)
Reading group - Week 2 - Trajectory Pooled Deep-Convolutional Descriptors (TDD)
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question Answering
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

“Learning for 360° Vision,” a Presentation from Google

  • 1. Learning for 360° Vision Yu-Chuan Su Research Scientist Google
  • 2. Expected market growth [Precedence Research] Applications Growing Popularity of 360° Media Virtual / Augmented Reality Drone Photography Surveillance © 2023 Google 2
  • 3. 360° Image vs. Perspective Image © 2023 Google 3
  • 4. 360° Image vs. Perspective Image 𝜃 𝜙 © 2023 Google 4
  • 5. • Applications depend on visual recognition [Zhang et al., ECCV 2014], [Hu et al., CVPR 2017], [Lai et al., TOG 2017], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018], [Lee et al., CVPR 2018] • CNNs for spherical data [Boomsma and Frellsen, NeurIPS 2017], [Cohen et al., ICLR 2018], [Cheng et al., CVPR 2018], [Esteves et al., ECCV 2018], [Coors et al., ECCV 2018], [Zhang et al., ECCV 2018], [Khasanova and Frossard, ICML 2019] o New architectures designed specifically for spherical data o Not data efficient – require annotations in spherical formats Visual Recognition on 360° Imagery © 2023 Google 5
  • 6. Strategy I (Equirectangular) [Hu CVPR 2017], [Lai TOG 2017] • Fast – single projection • Inaccurate – distortion in projection Applying CNNs to 360° Imagery © 2023 Google 6
  • 7. Strategy II (Perspective) [Zhang ECCV 2014], [Chou AAAI 2018], [Yu AAAI 2018] • Accurate – exact convolution output • Slow – repeated projection Applying CNNs to 360° Imagery © 2023 Google 7
  • 8. • Fast – single equirectangular projection • Accurate – simulate exact convolution output • Data efficient – does not require additional annotations Spherical Convolution © 2023 Google 8
  • 9. • Goal – account for distortion • Problem – distortion is polar angle (θ) / row dependent • Solution – untie kernel weights along the rows* Spherical Convolution Model Architecture θ = 90° θ = 54° θ = 18° Source * We use one set of weights every 5 rows in practice © 2023 Google 9
  • 10. • Goal – accelerate training • Idea – require 𝑁𝑒 to reproduce all intermediate features Layer-wise Training © 2023 Google 10
  • 11. Apply SphConv to Faster R-CNN • Pano2Vid (Train) o 360° video dataset o Sample 1,056 frames for training • Pascal VOC 2007 (Test) o Perspective object detection dataset o Project bounding boxes to spheres at five polar angle (𝜽) Experiment: Datasets © 2023 Google 11
  • 12. SphConv Evaluation 12 50% more accurate than Strategy I Orders of magnitude faster than Strategy II * [Hu et al., CVPR 2017], [Lai et al., TOG 2017] # [Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018] © 2023 Google RMSE Accuracy
  • 13. • Model size increases linearly w.r.t. input resolution (H) • SphConv kernels are highly correlated → weights are redundant Problem of Spherical Convolution CNN SphConv Asymptotic c2k2 c2k2H VGG* 56 MB 29 GB * Over 640x320 input images © 2023 Google 13
  • 14. Idea – generates SphConv kernels from source kernels Kernel Transformer Network © 2023 Google 14
  • 15. Kernel Transformer Network Architecture 15 © 2023 Google CNN SphConv KTN Asymptotic c2k2 c2k2H c2k2+c2 VGG 56 MB 29 GB 70 MB
  • 16. Transferability of Kernel Transformer Network We need only one KTN for different tasks! © 2023 Google 16
  • 17. • Appy KTN to Faster R-CNN [Ren et al., NIPS 2015] o Extract equirectangular feature map o Project features to tangent planes o Spherical non-maximum suppression • Does not need new annotation Application: 360° Object Detection © 2023 Google 17
  • 18. Experiment: Object Detection Accuracy 1[Cheng et al., CVPR 2018], 2[Cohen et al., ICLR 2018], 3[Esteves et al., ECCV 2018], 4[Coors et al., ECCV 2018] • KTN is much more compact than SphConv • KTN outperforms recent CNNs on spherical data © 2023 Google 18
  • 21. Experiment: KTN Transferability KTN performs almost identically regardless of the training source! © 2023 Google 21
  • 22. Conclusion 1[Cheng et al., CVPR 2018], 2[Cohen et al., ICLR 2018], 3[Esteves et al., ECCV 2018], 4[Coors et al., ECCV 2018] • Accounts for distortion in 360° images • Does not require annotated 360° images • Is transferable across different source CNNs Open door to off-the-shelf CNN recognition on 360° images © 2023 Google 22
  • 23. Resources • Su and Grauman, Learning Spherical Convolution for Fast Features from 360° Imagery, NeurIPS 2017 • Su and Grauman, Kernel Transformer Networks for Compact Spherical Convolution, CVPR 2018 • Su and Grauman, Learning Spherical Convolution for 360 Recognition, PAMI 2021 • Zhang et al., PanoContext: A Whole- Room 3D Context Model for Panoramic Scene Understanding, ECCV 2014 • Hu et al., Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video, CVPR 2017 • Lai et al., Semantic-driven Generation of Hyperlapse from 360° Video, TOG 2017 • Boomsma and Frellsen, Spherical convolutions and their application in molecular modelling, NeurIPS 2017 23 © 2023 Google
  • 24. Resources • Cheng et al., Cube Padding for Weakly- Supervised Saliency Prediction in 360° Videos, CVPR 2018 • Lee et al., A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos, CVPR 2018 • Chou et al., Self-view Grounding Given a Narrated 360° Video, AAAI 2018 • Yu et al., A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video, AAAI 2018 • Cohen et al., Spherical CNNs, ICLR 2018 • Esteves et al., Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV 2018 • Coors et al., SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images, ECCV 2018 • Zhang et al., Saliency Detection in 360° Videos, ECCV 2018 • Khasanova and Frossard, Geometry Aware Convolutional Filters for Omnidirectional Images Representation, ICML 2019 24 © 2023 Google