Learning for 360° Vision
Yu-Chuan Su
Research Scientist
Google
Growing Popularity of 360° Media
• Expected market growth [Precedence Research]
• Applications: Virtual / Augmented Reality, Drone, Photography, Surveillance
© 2023 Google 2
360° Image vs. Perspective Image
[Figure: 360° image vs. perspective image, with angles θ and φ]
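The relation between the equirectangular pixel grid and the angles θ and φ can be sketched as follows. This is a minimal illustration, assuming θ is the polar angle measured from the top edge of the image (0 to π) and φ is the azimuth (−π to π). The function names `pixel_to_angles` and `horizontal_stretch` are hypothetical and do not come from the talk. The 1/sin θ factor reflects that every image row spans the full image width, while the circle it covers on the sphere shrinks toward the poles.

```python
import math


def pixel_to_angles(row, col, height, width):
    """Map the centre of an equirectangular pixel to (theta, phi) in radians.

    Assumed convention: theta is the polar angle (0 at the top edge, pi at the
    bottom edge) and phi is the azimuth in [-pi, pi).
    """
    theta = (row + 0.5) / height * math.pi
    phi = (col + 0.5) / width * 2.0 * math.pi - math.pi
    return theta, phi


def horizontal_stretch(theta):
    """Horizontal stretch of a row at polar angle theta, relative to the equator."""
    return 1.0 / math.sin(theta)


if __name__ == "__main__":
    for degrees in (90, 54, 18):
        print(degrees, round(horizontal_stretch(math.radians(degrees)), 3))
```

Under this convention the stretch is about 1.0 at θ = 90°, 1.236 at θ = 54°, and 3.236 at θ = 18°, so content near the poles appears several times wider than the same content at the equator.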
Visual Recognition on 360° Imagery
• Applications depend on visual recognition
[Zhang et al., ECCV 2014], [Hu et al., CVPR 2017], [Lai et al., TOG 2017], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018], [Lee et al., CVPR 2018]
• CNNs for spherical data
[Boomsma and Frellsen, NeurIPS 2017], [Cohen et al., ICLR 2018], [Cheng et al., CVPR 2018], [Esteves et al., ECCV 2018], [Coors et al., ECCV 2018], [Zhang et al., ECCV 2018], [Khasanova and Frossard, ICML 2019]
o New architectures designed specifically for spherical data
o Not data efficient – require annotations in spherical formats
Applying CNNs to 360° Imagery
Strategy I (Equirectangular)
[Hu et al., CVPR 2017], [Lai et al., TOG 2017]
• Fast – single projection
• Inaccurate – distortion in projection
Applying CNNs to 360° Imagery
Strategy II (Perspective)
[Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018]
• Accurate – exact convolution output
• Slow – repeated projection
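To make the cost of Strategy II concrete, the sketch below samples one perspective (tangent-plane) patch from an equirectangular image. It is an illustration under stated assumptions rather than the pipeline used in the cited works: the function name `tangent_patch`, the field of view, the nearest-neighbour sampling, and the angle convention (θ polar angle from the top edge, φ azimuth in [−π, π)) are all hypothetical choices. Strategy II has to repeat a projection like this for every viewing direction before running the CNN, which is where the slowdown comes from.

```python
import numpy as np


def tangent_patch(equi, theta0, phi0, fov_deg=60.0, size=5):
    """Sample a size x size perspective patch tangent to the sphere at (theta0, phi0).

    equi is an (H, W) or (H, W, C) equirectangular image. Nearest-neighbour
    sampling is used to keep the sketch short.
    """
    height, width = equi.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)
    ticks = np.linspace(-half, half, size)
    u, v = np.meshgrid(ticks, ticks)  # u points right, v points down on the plane

    # Orthonormal basis at the tangent point (the z axis points to the north pole).
    forward = np.array([np.sin(theta0) * np.cos(phi0),
                        np.sin(theta0) * np.sin(phi0),
                        np.cos(theta0)])
    right = np.array([-np.sin(phi0), np.cos(phi0), 0.0])
    down = np.array([np.cos(theta0) * np.cos(phi0),
                     np.cos(theta0) * np.sin(phi0),
                     -np.sin(theta0)])

    # Ray through every patch pixel, normalised onto the unit sphere.
    rays = forward + u[..., None] * right + v[..., None] * down
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Back to angles, then to equirectangular pixel indices.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    rows = np.clip(np.floor(theta / np.pi * height).astype(int), 0, height - 1)
    cols = np.floor((phi + np.pi) / (2.0 * np.pi) * width).astype(int) % width
    return equi[rows, cols]
```

A full Strategy II pass would call a function like this once per tangent direction and feed each patch to an ordinary CNN.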
Spherical Convolution
• Fast – single equirectangular projection
• Accurate – simulate exact convolution output
• Data efficient – does not require additional annotations
Spherical Convolution Model Architecture
• Goal – account for distortion
• Problem – distortion is polar angle (θ) / row dependent
• Solution – untie kernel weights along the rows*
[Figure panels: θ = 90°, θ = 54°, θ = 18°; Source]
* We use one set of weights every 5 rows in practice
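The idea of untying kernel weights along the rows can be sketched in a few lines of NumPy. This is a simplified, single-channel illustration and not the author's implementation: the function name `row_untied_conv` is hypothetical, all kernels share one fixed size, and vertical borders are zero-padded, all of which are assumptions made for brevity. The grouping of 5 rows per kernel follows the footnote on the slide.

```python
import numpy as np


def row_untied_conv(image, kernels, rows_per_kernel=5):
    """Correlate each output row with the kernel assigned to its row group.

    image:   (H, W) equirectangular feature map
    kernels: (G, k, k) array with G >= ceil(H / rows_per_kernel) and k odd
    """
    height, width = image.shape
    k = kernels.shape[-1]
    pad = k // 2
    # Wrap horizontally (left and right image edges meet on the sphere);
    # zero-pad vertically for simplicity.
    padded = np.pad(image, ((0, 0), (pad, pad)), mode="wrap")
    padded = np.pad(padded, ((pad, pad), (0, 0)), mode="constant")

    out = np.zeros((height, width), dtype=float)
    for y in range(height):
        kernel = kernels[y // rows_per_kernel]  # weights are shared only within a row group
        for dy in range(k):
            for dx in range(k):
                out[y] += kernel[dy, dx] * padded[y + dy, dx:dx + width]
    return out
```

Because every row group owns its own kernel, the number of weights grows with the image height, which is the model-size problem raised later in the talk.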
Layer-wise Training
• Goal – accelerate training
• Idea – require N_e to reproduce all intermediate features
Experiment: Datasets
Apply SphConv to Faster R-CNN
• Pano2Vid (Train)
o 360° video dataset
o Sample 1,056 frames for training
• Pascal VOC 2007 (Test)
o Perspective object detection dataset
o Project bounding boxes to spheres at five polar angles (θ)
SphConv Evaluation
• 50% more accurate than Strategy I*
• Orders of magnitude faster than Strategy II#
[Figure axes: RMSE, Accuracy]
* [Hu et al., CVPR 2017], [Lai et al., TOG 2017]
# [Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018]
Problem of Spherical Convolution
• Model size increases linearly w.r.t. input resolution (H)
• SphConv kernels are highly correlated → weights are redundant
             CNN      SphConv
Asymptotic   c²k²     c²k²H
VGG*         56 MB    29 GB
* Over 640×320 input images
Kernel Transformer Network
Idea – generate SphConv kernels from source kernels
Kernel Transformer Network Architecture
             CNN      SphConv   KTN
Asymptotic   c²k²     c²k²H     c²k² + c²
VGG          56 MB    29 GB     70 MB
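The asymptotic row of the table can be turned into a small worked example. The sketch below evaluates the three growth terms for a single layer. It assumes c is the channel count, k the kernel size, and H the number of image rows, and the function names and the example values (c = 512, k = 3, H = 40) are hypothetical. These are orders of growth, so the numbers are not expected to reproduce the measured VGG sizes in the table.

```python
def conv_params(c, k):
    """Standard convolution layer: one c x c x k x k kernel bank."""
    return c * c * k * k


def sphconv_params(c, k, height):
    """Row-untied SphConv layer: one kernel bank per image row."""
    return c * c * k * k * height


def ktn_params(c, k):
    """Source kernels plus the c^2 term listed for KTN in the table."""
    return c * c * k * k + c * c


if __name__ == "__main__":
    c, k, height = 512, 3, 40
    print("CNN    ", conv_params(c, k))             # 2359296
    print("SphConv", sphconv_params(c, k, height))  # 94371840
    print("KTN    ", ktn_params(c, k))              # 2621440
```

With these example values the SphConv term is H = 40 times the CNN term, while the KTN term adds only about 11% on top of the source kernels.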
Transferability of Kernel Transformer Network
We need only one KTN for different tasks!
Application: 360° Object Detection
• Apply KTN to Faster R-CNN
[Ren et al., NIPS 2015]
o Extract equirectangular feature map
o Project features to tangent planes
o Spherical non-maximum suppression
• Does not need new annotations
Experiment: Object Detection Accuracy
• KTN is much more compact than SphConv
• KTN outperforms recent CNNs on spherical data
1 [Cheng et al., CVPR 2018], 2 [Cohen et al., ICLR 2018], 3 [Esteves et al., ECCV 2018], 4 [Coors et al., ECCV 2018]
Experiment: KTN Transferability
KTN performs almost identically regardless of the training source!
Conclusion
KTN:
• Accounts for distortion in 360° images
• Does not require annotated 360° images
• Is transferable across different source CNNs
Opens the door to off-the-shelf CNN recognition on 360° images
Resources
• Su and Grauman, Learning Spherical Convolution for Fast Features from 360° Imagery, NeurIPS 2017
• Su and Grauman, Kernel Transformer Networks for Compact Spherical Convolution, CVPR 2019
• Su and Grauman, Learning Spherical Convolution for 360 Recognition, PAMI 2021
• Zhang et al., PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding, ECCV 2014
• Hu et al., Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video, CVPR 2017
• Lai et al., Semantic-driven Generation of Hyperlapse from 360° Video, TOG 2017
• Boomsma and Frellsen, Spherical convolutions and their application in molecular modelling, NeurIPS 2017
• Cheng et al., Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos, CVPR 2018
• Lee et al., A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos, CVPR 2018
• Chou et al., Self-view Grounding Given a Narrated 360° Video, AAAI 2018
• Yu et al., A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video, AAAI 2018
• Cohen et al., Spherical CNNs, ICLR 2018
• Esteves et al., Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV 2018
• Coors et al., SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images, ECCV 2018
• Zhang et al., Saliency Detection in 360° Videos, ECCV 2018
• Khasanova and Frossard, Geometry Aware Convolutional Filters for Omnidirectional Images Representation, ICML 2019

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

“Learning for 360° Vision,” a Presentation from Google

  • 1. Learning for 360° Vision. Yu-Chuan Su, Research Scientist, Google
  • 2. Growing Popularity of 360° Media
      • Expected market growth [Precedence Research]
      • Applications: Virtual / Augmented Reality, Drone Photography, Surveillance
      © 2023 Google
  • 3. 360° Image vs. Perspective Image
  • 4. 360° Image vs. Perspective Image (figure labels: θ, φ)
  • 5. Visual Recognition on 360° Imagery
      • Applications depend on visual recognition [Zhang et al., ECCV 2014], [Hu et al., CVPR 2017], [Lai et al., TOG 2017], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018], [Lee et al., CVPR 2018]
      • CNNs for spherical data [Boomsma and Frellsen, NeurIPS 2017], [Cohen et al., ICLR 2018], [Cheng et al., CVPR 2018], [Esteves et al., ECCV 2018], [Coors et al., ECCV 2018], [Zhang et al., ECCV 2018], [Khasanova and Frossard, ICML 2019]
          o New architectures designed specifically for spherical data
          o Not data efficient – require annotations in spherical formats
  • 6. Applying CNNs to 360° Imagery: Strategy I (Equirectangular) [Hu et al., CVPR 2017], [Lai et al., TOG 2017]
      • Fast – single projection
      • Inaccurate – distortion in projection
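The distortion behind Strategy I's inaccuracy can be made concrete with a short calculation. The sketch below is not from the slides; it assumes θ is the polar angle measured from the pole (so θ = 90° is the equator) and uses the standard property of the equirectangular projection that every row has the same pixel width, while the circle of latitude it represents shrinks in proportion to sin θ. The function name `horizontal_stretch` is hypothetical.

```python
import math


def horizontal_stretch(theta_deg: float) -> float:
    """Horizontal stretch factor of an equirectangular row at polar angle theta.

    Assumption: theta is measured from the pole, so 90 degrees is the equator.
    A row at polar angle theta covers a circle whose circumference scales with
    sin(theta), yet it is drawn with the full image width, so its content is
    stretched horizontally by 1 / sin(theta).
    """
    if not 0.0 < theta_deg < 180.0:
        raise ValueError("theta must lie strictly between 0 and 180 degrees")
    return 1.0 / math.sin(math.radians(theta_deg))


if __name__ == "__main__":
    for theta in (90.0, 54.0, 18.0):
        print(f"theta = {theta:.0f} deg: stretch factor {horizontal_stretch(theta):.2f}")
```

The stretch is 1 at the equator and grows without bound toward the poles, which is why a kernel learned on perspective images fits equatorial rows reasonably well and polar rows poorly.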
  • 7. Applying CNNs to 360° Imagery: Strategy II (Perspective) [Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018]
      • Accurate – exact convolution output
      • Slow – repeated projection
  • 8. Spherical Convolution
      • Fast – single equirectangular projection
      • Accurate – simulates exact convolution output
      • Data efficient – does not require additional annotations
  • 9. Spherical Convolution Model Architecture
      • Goal – account for distortion
      • Problem – distortion is polar angle (θ) / row dependent
      • Solution – untie kernel weights along the rows*
      Figure labels: Source; θ = 90°, θ = 54°, θ = 18°
      * We use one set of weights every 5 rows in practice
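The idea of untying kernel weights along the rows can be sketched as a convolution whose kernel is selected by the output row. This is a minimal single-channel illustration, not the authors' implementation: the function name `row_untied_conv`, the wrap-around padding in longitude, and the clamping at the top and bottom rows are all assumptions made for the example. Only the grouping of 5 rows per weight set comes from the slide.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[float]]


def row_untied_conv(image: Image, kernels: List[Kernel], rows_per_kernel: int = 5) -> Image:
    """Cross-correlate a single-channel image with a row-dependent kernel.

    Rows y in [i * rows_per_kernel, (i + 1) * rows_per_kernel) share kernels[i].
    Each kernel may have its own (odd) height and width.
    """
    height, width = len(image), len(image[0])
    needed = -(-height // rows_per_kernel)  # ceiling division
    if len(kernels) < needed:
        raise ValueError(f"need at least {needed} kernels, got {len(kernels)}")

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        kernel = kernels[y // rows_per_kernel]  # weights are tied within a row group
        kh, kw = len(kernel), len(kernel[0])
        for x in range(width):
            acc = 0.0
            for dy in range(kh):
                # Assumption: clamp at the top and bottom rows (the poles).
                yy = min(max(y + dy - kh // 2, 0), height - 1)
                for dx in range(kw):
                    # Assumption: wrap around horizontally, since longitude is periodic.
                    xx = (x + dx - kw // 2) % width
                    acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    image = [[float(x) for x in range(8)] for _ in range(10)]
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    wide_box = [[1.0, 1.0, 1.0]]  # a different kernel shape for the second row group
    result = row_untied_conv(image, [identity, wide_box], rows_per_kernel=5)
    print(result[0])
    print(result[5])
```

Because each row group has its own weights, the parameter count grows with the image height, which is the cost that slide 13 quantifies.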
  • 10. Layer-wise Training
      • Goal – accelerate training
      • Idea – require N_e to reproduce all intermediate features
  • 11. Experiment: Datasets. Apply SphConv to Faster R-CNN
      • Pano2Vid (Train)
          o 360° video dataset
          o Sample 1,056 frames for training
      • Pascal VOC 2007 (Test)
          o Perspective object detection dataset
          o Project bounding boxes to spheres at five polar angles (θ)
  • 12. SphConv Evaluation
      • 50% more accurate than Strategy I
      • Orders of magnitude faster than Strategy II
      Figure axes: RMSE, Accuracy
      * [Hu et al., CVPR 2017], [Lai et al., TOG 2017]
      # [Zhang et al., ECCV 2014], [Chou et al., AAAI 2018], [Yu et al., AAAI 2018]
  • 13. Problem of Spherical Convolution
      • Model size increases linearly w.r.t. input resolution (H)
      • SphConv kernels are highly correlated → weights are redundant
      Model size      CNN      SphConv
      Asymptotic      c²k²     c²k²H
      VGG*            56 MB    29 GB
      * Over 640x320 input images
  • 14. Kernel Transformer Network
      • Idea – generate SphConv kernels from source kernels
  • 15. Kernel Transformer Network Architecture
      Model size      CNN      SphConv    KTN
      Asymptotic      c²k²     c²k²H      c²k² + c²
      VGG             56 MB    29 GB      70 MB
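To get a feel for the asymptotic model sizes quoted for CNN, SphConv, and KTN, the expressions can be evaluated for a single layer. This is only a worked example under assumed values (c = 64 channels, k = 3, H = 320 rows); the function names are hypothetical. The asymptotic forms drop constants and ignore details such as sharing one weight set across several rows, so these counts are not expected to reproduce the 56 MB, 29 GB, and 70 MB figures reported for VGG.

```python
def cnn_params(c: int, k: int) -> int:
    """Weights of one standard conv layer: c input x c output channels, k x k kernel."""
    return c * c * k * k


def sphconv_params(c: int, k: int, h: int) -> int:
    """Row-untied weights: one c^2 k^2 kernel bank per row, for h rows."""
    return c * c * k * k * h


def ktn_params(c: int, k: int) -> int:
    """Source kernels plus a c^2 term for the transformer, independent of h."""
    return c * c * k * k + c * c


if __name__ == "__main__":
    c, k, h = 64, 3, 320  # assumed example values, not taken from the slides
    base = cnn_params(c, k)
    print(f"CNN:     {base:>12,d} weights")
    print(f"SphConv: {sphconv_params(c, k, h):>12,d} weights ({sphconv_params(c, k, h) // base}x)")
    print(f"KTN:     {ktn_params(c, k):>12,d} weights ({ktn_params(c, k) / base:.2f}x)")
```

The point of the comparison is the dependence on H: the SphConv count scales with the number of rows, while the KTN count adds only a term that does not grow with the input resolution.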
  • 16. Transferability of Kernel Transformer Network
      • We need only one KTN for different tasks!
  • 17. Application: 360° Object Detection
      • Apply KTN to Faster R-CNN [Ren et al., NIPS 2015]
          o Extract equirectangular feature map
          o Project features to tangent planes
          o Spherical non-maximum suppression
      • Does not need new annotation
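The slide does not say how spherical non-maximum suppression is computed. The sketch below is a simplified stand-in, not the authors' procedure: it runs greedy suppression using the great-circle angle between detection centers instead of box overlap. It is meant only to show why suppression has to happen on the sphere, since two detections on opposite edges of the equirectangular image can be neighbours. The names `angular_distance` and `spherical_nms`, the `(theta, phi, score)` layout, and the separation threshold are all assumptions.

```python
import math
from typing import List, Tuple

# (polar angle theta in degrees, azimuth phi in degrees, confidence score)
Detection = Tuple[float, float, float]


def angular_distance(a: Detection, b: Detection) -> float:
    """Great-circle angle in degrees between two detection centers."""
    t1, p1 = math.radians(a[0]), math.radians(a[1])
    t2, p2 = math.radians(b[0]), math.radians(b[1])
    # Dot product of unit vectors (sin t cos p, sin t sin p, cos t).
    cos_d = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def spherical_nms(detections: List[Detection], min_separation_deg: float) -> List[Detection]:
    """Greedy NMS: keep the best-scoring detection, drop others closer than the threshold."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(angular_distance(det, k) >= min_separation_deg for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    dets = [
        (90.0, 179.0, 0.9),   # near the right edge of the equirectangular image
        (90.0, -179.0, 0.8),  # near the left edge, but only 2 degrees away on the sphere
        (90.0, 0.0, 0.7),     # on the opposite side of the sphere
    ]
    print(spherical_nms(dets, min_separation_deg=10.0))
```

A planar NMS on the equirectangular image would treat the first two detections as far apart and keep both; measuring distance on the sphere removes the duplicate across the seam.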
  • 18. Experiment: Object Detection Accuracy
      • KTN is much more compact than SphConv
      • KTN outperforms recent CNNs on spherical data
      1 [Cheng et al., CVPR 2018], 2 [Cohen et al., ICLR 2018], 3 [Esteves et al., ECCV 2018], 4 [Coors et al., ECCV 2018]
  • 21. Experiment: KTN Transferability
      • KTN performs almost identically regardless of the training source!
  • 22. Conclusion
      • Accounts for distortion in 360° images
      • Does not require annotated 360° images
      • Is transferable across different source CNNs
      Opens the door to off-the-shelf CNN recognition on 360° images
      1 [Cheng et al., CVPR 2018], 2 [Cohen et al., ICLR 2018], 3 [Esteves et al., ECCV 2018], 4 [Coors et al., ECCV 2018]
  • 23. Resources
      • Su and Grauman, Learning Spherical Convolution for Fast Features from 360° Imagery, NeurIPS 2017
      • Su and Grauman, Kernel Transformer Networks for Compact Spherical Convolution, CVPR 2019
      • Su and Grauman, Learning Spherical Convolution for 360° Recognition, PAMI 2021
      • Zhang et al., PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding, ECCV 2014
      • Hu et al., Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video, CVPR 2017
      • Lai et al., Semantic-driven Generation of Hyperlapse from 360° Video, TOG 2017
      • Boomsma and Frellsen, Spherical convolutions and their application in molecular modelling, NeurIPS 2017
  • 24. Resources
      • Cheng et al., Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos, CVPR 2018
      • Lee et al., A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos, CVPR 2018
      • Chou et al., Self-view Grounding Given a Narrated 360° Video, AAAI 2018
      • Yu et al., A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video, AAAI 2018
      • Cohen et al., Spherical CNNs, ICLR 2018
      • Esteves et al., Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV 2018
      • Coors et al., SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images, ECCV 2018
      • Zhang et al., Saliency Detection in 360° Videos, ECCV 2018
      • Khasanova and Frossard, Geometry Aware Convolutional Filters for Omnidirectional Images Representation, ICML 2019