State of the Art Innovations in
Computer Vision
Christian Siagian
DataCon LA
August 16, 2019
Presentation Structure
• 10 minutes background to set the context
• 20 minutes current Computer Vision topics
• 10 minutes summary and questions
My Background
• Academic:
– Publications in Computer Vision, Robotic Vision,
Human Vision
– Beobot 2.0:
• parallel high-performing robotics vision mobile
platform
• full software architecture with vision localization and
navigation
• Start Up: AIO Robotics, Inc.
– fully integrated 3D printer, scanner, editor, object
search
– 2 patents and CES Innovation Awards 2016 & 2017
• Start Up: Eyenuk, Inc. (medical deep learning)
– retinal image lesion detection and segmentation
– end-to-end robotic system to automate eye
screening, monitoring, diagnosis, reporting
– Patent and Grant applications
• Competition Robotics:
– Robocup Soccer Robot & AUVSI Autonomous
Submarine
• Teaching:
– After school robotics program, USC robotics courses
• Learning:
– Academics, sports journalism, nutrition, art, music
Artificial Intelligence
• Fields: Machine Learning (ML), Computer
Vision (CV), Natural Language Processing
(NLP), and Robotics
– Digitally and in real world
– They are connected for particular applications
• We will focus on Computer Vision and related
topics
Connection with Data Science
• Computer Vision (CV) processes raw data to be used for
data science
• Raw input data: images (regular cameras, thermal cameras,
etc.), text, audio
– These data do not have direct semantic meaning:
• They do not measure a specific (or isolated) characteristic
– Models are needed to understand what is in the images, etc.
• Advantages of raw data:
– General purpose/richer source of information
– Target events can be obtained by further processing later
– Less reliant on manual entry, more natural interactions (with
customers)
Connection with Data Science, cont.
• Disadvantages of raw data:
– Systems/infrastructure (hardware and software
environment tools) are expensive
– Models are more complex
– Data are higher-dimensional, massive, and need
annotation (for learning)
Deep Learning: AlexNet 2012
• Trying to solve Object Recognition:
– Given an image (massive number of
pixels), determine the object (1
label)
• Given a labeled training dataset,
learn a function that maps an image
to its label
• The data should let the model learn
invariance to changes in:
– Appearance
– Interaction with the world
– Perspective (2D – 3D), including
size
– Occlusion
– Lighting
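One common way to expose a model to this kind of variation is data augmentation, which the deck lists later under component analyses. The sketch below is only an illustration under simple assumptions: the image is a tiny grayscale grid stored as a list of rows, and the two transforms (a mirror flip for viewpoint, a brightness scale for lighting) and their parameter ranges are hypothetical choices, not taken from the talk.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D grayscale image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def scale_brightness(image, factor):
    """Multiply every pixel by factor and clip to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng):
    """Return a randomly perturbed copy; the object label stays unchanged."""
    out = [list(row) for row in image]
    if rng.random() < 0.5:          # assumed flip probability
        out = horizontal_flip(out)
    return scale_brightness(out, rng.uniform(0.7, 1.3))  # assumed range


image = [[10, 20, 30],
         [40, 50, 60]]
flipped = horizontal_flip(image)
brighter = scale_brightness(image, 2.0)
sample = augment(image, random.Random(0))
```

Each augmented copy keeps the original label, so the training set shows the same object under more than one condition.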
Deep Learning: AlexNet 2012
• Data: Caltech 101, ImageNet: 2005
– About 1,000,000 images (1,000 categories,
about 1,000 images per category)
– The number of object categories in real life
is in the thousands
• Model: 1989
– Convolutional Neural Network:
Yann LeCun 1989: MNIST digit
recognition
– Deep network that jointly trains
both the feature extraction and
classification stage
• Systems/Infrastructure: 2010
– From video games (Sony
PlayStation): GPUs and CUDA give a
60 to 100 times speedup
• BLOG:
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
Data-driven features within a
compositional architecture
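To make "feature extraction" concrete, here is a minimal sketch of the three operations a convolutional stage typically stacks: convolution, ReLU, and max pooling. It is a toy in pure Python and not AlexNet itself. The 2x2 edge kernel is set by hand for illustration, whereas in a trained network the kernel values are learned from data.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (drops a trailing odd row/column)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-set kernel that responds to dark-to-bright vertical edges (assumption).
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))   # 4x4 response map
pooled = max_pool2x2(features)           # 2x2 summary
```

The pooled map is non-zero only in the column that contains the edge. A classifier stacked on top would consume such maps, and joint training adjusts the kernels and the classifier together.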
Solving Other Computer Vision
Problems
• Data-driven features are key to moving ALMOST ALL other difficult
Computer Vision tasks forward
– Note: Basic single image object/person/background recognition has
moved to Enterprise AI (e.g. Amazon Rekognition)
– Mature tasks, such as tracking, are available in many free libraries
(OpenCV, etc.)
• Complex algorithms hinge on: architecture & training
– Papers focus on architecture; training is tribal knowledge
• Whether the data is noisy
• Whether more data is needed
• Training regimen: hyper-parameter grid, fine-tuning, multiple stages, etc.
• Visualization
• Evaluation
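As an illustration of the "hyper-parameter grid" item, the sketch below enumerates every combination in a small search space and keeps the best-scoring one. The search space and the `validation_score` function are hypothetical stand-ins: in practice that function would train a model with the given configuration and return a validation metric.

```python
from itertools import product


def grid(search_space):
    """Yield one dict per combination of the listed hyper-parameter values."""
    names = list(search_space)
    for values in product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def validation_score(config):
    """Hypothetical stand-in for 'train a model, return validation accuracy'."""
    lr_penalty = abs(config["learning_rate"] - 0.01) * 10
    batch_penalty = abs(config["batch_size"] - 32) / 1000
    return 1.0 - lr_penalty - batch_penalty


# Assumed search space, for illustration only.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
}

configs = list(grid(search_space))
best = max(configs, key=validation_score)
```

A full grid grows multiplicatively with every added hyper-parameter, which is one reason training recipes tend to be built up in stages rather than searched exhaustively.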
Solving Other Computer Vision
Problems
• Additional key concepts
in architecture:
– Adding dependencies to
the past (recurrence):
• Recurrent Neural
Network (RNN)
Long range dependency:
“When I was in Paris I got
lost because I couldn’t
ask for directions in
_____”
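The recurrence idea can be sketched with a scalar vanilla RNN, where the hidden state at each step depends on the current input and on the previous hidden state. The weights below are picked by hand purely for illustration. Real models learn them and use vector states, and long-range cases like the Paris sentence are usually handled with gated variants such as LSTM or GRU.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias=0.0, h0=0.0):
    """Scalar vanilla RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# Two sequences that differ only in their first element.
seq_a = [1.0, 0.0, 0.0, 0.0]
seq_b = [0.0, 0.0, 0.0, 0.0]

# With a recurrent weight, the early input still influences the last state.
final_a = rnn_forward(seq_a, w_in=1.0, w_rec=0.9)[-1]
final_b = rnn_forward(seq_b, w_in=1.0, w_rec=0.9)[-1]

# Without recurrence (w_rec = 0), the early input is forgotten.
final_no_memory = rnn_forward(seq_a, w_in=1.0, w_rec=0.0)[-1]
```

Here `final_a` stays positive while `final_b` and `final_no_memory` are zero: only the recurrent connection carries the first input forward to the last step.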
Solving Other Computer Vision
Problems
• Additional key concepts in
architecture:
– Adding dependencies to the
past (recurrence):
• Recurrent Neural Network
(RNN)
– Undoing the dimensional
collapse to get more
details:
• Fully Convolutional
Network
Segmentation tasks, Neural
Network visualization
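"Undoing the dimensional collapse" means bringing a coarse feature map back to image resolution so that every pixel gets a prediction. The sketch below uses nearest-neighbour upsampling as a simplified stand-in. Fully convolutional networks often learn the upsampling instead (for example with transposed convolutions) and combine it with earlier, finer feature maps, which this toy does not attempt.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling: repeat each value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out


# Coarse 2x2 map of class scores, e.g. after pooling (values are made up).
coarse = [[0, 1],
          [1, 0]]
dense = upsample2x(coarse)  # 4x4 map, one value per output pixel
```

The dense map has the spatial layout a segmentation task needs, although its boundaries are blocky. Learned upsampling aims to recover the finer detail.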
Solving Other Computer Vision
Problems
• Additional key concepts in
architecture:
– Adding dependencies to the past
(recurrence):
• Recurrent Neural Network (RNN)
– Undoing the dimensional collapse:
• Fully Convolutional Network
– Using multiple networks:
• Joint Learning: jointly learn inter-
related tasks
• Generative Adversarial Network
(GAN): learning using competing
networks
Learning jointly can improve
performance on the individual
tasks
GANs are used for synthetic data
generation
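The "competing networks" idea can be made concrete through the two loss functions. In the sketch below, `d_real` and `d_fake` are assumed to be the discriminator's predicted probabilities that a real sample and a generated sample are real. The networks themselves are omitted, and the generator loss is written in the commonly used non-saturating form.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one predicted probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """D wants real samples scored 1 and generated samples scored 0."""
    return bce(d_real, 1) + bce(d_fake, 0)


def generator_loss(d_fake):
    """G wants D to score its samples as real (non-saturating form)."""
    return bce(d_fake, 1)


# D is winning: it spots the fake, so G's loss is high.
d_winning = (discriminator_loss(0.9, 0.1), generator_loss(0.1))
# G is winning: D is fooled, so D's loss is high.
g_winning = (discriminator_loss(0.9, 0.9), generator_loss(0.9))
```

The two objectives pull in opposite directions: whatever lowers the generator's loss on a generated sample raises the discriminator's loss on that same sample.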
Contemporary Computer Vision
• Topics:
– Deep Learning Theory: accuracy, efficiency
– Recognition: robustness, more detail, larger context
– Reconstruction: WILL NOT DELVE DEEP INTO THIS
• 6DOF pose, clothing, hair, light, deformation, mesh, depth, joint
• GAN is moving forward: generates control signals at multiple layers
– https://www.youtube.com/watch?v=kSLJriaOumA
• Inputs:
– Images, Videos, 3D data, special cameras (thermal, event cameras)
– Video combined with: audio, text (language), robots
• Applications:
– Medical
– Robots: language/semantic navigation, interacting with objects
Deep Learning Theory
• Graph Neural Networks
– Relationships: objects, joints
• Few-shot, one-shot, zero-shot
learning; weakly supervised and
unsupervised learning
• Measuring uncertainty & class
imbalance
• Active/online learning
• Open-set learning
• Architecture search
• Component analyses:
– ReLU, augmentation strategy
• Resource allocation/compression
• Stability/sensitivity/adversarial
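For the "uncertainty & class imbalance" item, two simple baselines are sketched below: predictive entropy of a softmax output as an uncertainty score, and inverse-frequency class weights for an imbalanced label set. These are generic textbook choices given as assumptions, not the methods of the papers this talk cites, and the labels are made up.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Predictive entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * count)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


uncertain = entropy(softmax([0.0, 0.0, 0.0]))   # uniform prediction
confident = entropy(softmax([10.0, 0.0, 0.0]))  # one dominant class

labels = ["cat"] * 8 + ["dog"] * 2              # hypothetical imbalanced set
weights = inverse_frequency_weights(labels)
```

The rare class receives the larger weight, so its errors count more in a weighted loss. High-entropy predictions are natural candidates for human review or for active-learning queries.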
Deep Learning Theory
• Graph Convolutional Networks [https://arxiv.org/pdf/1609.02907.pdf]
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Edge-Labeling_Graph_Neural_Network_for_Few-Shot_Learning_CVPR_2019_paper.pdf
• Few-shot, one-shot, zero-shot learning; weakly supervised and
unsupervised learning
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Few-Shot_Adaptive_Faster_R-CNN_CVPR_2019_paper.pdf
• Active learning
• Measuring uncertainty & class imbalance
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Khan_Striking_the_Right_Balance_With_Uncertainty_CVPR_2019_paper.pdf
• Online learning, open-set learning
• Architecture search:
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf
• Component analyses:
– ReLU, augmentation strategy
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.pdf
• Resource allocation/compression:
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Qiao_Neural_Rejuvenation_Improving_Deep_Network_Training_by_Enhancing_Computational_Resource_CVPR_2019_paper.pdf
• Stability/sensitivity/adversarial
Recognition
• Image: detection, recognition,
segmentation, landmarking,
identification in the crowd/wild:
– Face, hand & body pose estimation
• Skeleton, joint localization
• Dense pose
– Panoptic segmentation, RCNN-family
• Video: (person, object, background,
and combination):
– Action Recognition (1 person):
• most active in recognition
• Still limited to 80 actions: the space of actions is
unknown
• Segmenting actions in the wild and recognizing
simultaneous multiple actions are difficult
– Social relationships (multiple people)
– Video object segmentation: Faster R-CNN, etc.
(multiple objects)
– Surveillance: tracking & Re-identification
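Detection and tracking results like those above are commonly scored by how well a predicted box overlaps a reference box. The slides do not name a metric, so the sketch below assumes the widely used intersection over union (IoU) and assumes boxes given as `(x_min, y_min, x_max, y_max)` tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height of the overlap region (zero if the boxes are disjoint).
    iw = max(0.0, ix_max - ix_min)
    ih = max(0.0, iy_max - iy_min)
    inter = iw * ih

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


partial = iou((0, 0, 2, 2), (1, 1, 3, 3))   # overlap 1, union 7
same = iou((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
```

A tracker can use the same measure to associate a detection in one frame with the nearest box from the previous frame.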
Recognition, cont.
• Visual Question
Answering (VQA): words
& image connection:
– Visual dialog
– Video Captioning
• Video and Audio:
– Audio video event
recognition
– Video enhancement:
diarization
Overarching Trends
• Datasets dictate
research activity
– Largest datasets are from
large entities (Facebook,
Google DeepMind, etc.)
– Examples:
• Cityscapes: Dashboard
Cam: Segmentation:
semantic, instance
• COCO datasets:
Segmentation: semantic,
instance
• Kinetics Human Action
Dataset
• Social interaction capture:
CMU
• Person Re-identification
Trends/Predictions Moving Forward
• Training on smaller manually annotated datasets catches
up in performance
– Few-, one-, and zero-shot training
– Mixed use of real & synthetic data
• Grounded recognition and reconstruction (adding
more modules to solve a problem robustly):
– Image: recognition – segmentation (panoptic) – 3D object
reconstruction – space understanding
– Video: pose estimation – action recognition – action
forecasting – reconstruction
• The next superior building block should direct the field
again (following SIFT 2004, and DL features 2012)
How Do We Apply All This
Information?
• Have a working knowledge of the ML/CV
fundamentals:
– theory, software, hardware, models (CNN, RNN)
• Start with your use-case:
– find keywords in the papers
– search blogs for definition, background
• Run the open-source code
– Understand the limitations
– Are they acceptable to your business?

SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Christian Siagian

  • 1. State of the Art Innovations in Computer Vision Christian Siagian DataCon LA August 16, 2019
  • 2. Presentation Structure • 10 minutes background to set the information • 20 minutes current Computer Vision topics • 10 minutes summary and questions
  • 3. My Background • Academic: – Publications in Computer Vision, Robotic Vision, Human Vision – Beobot 2.0: • parallel high-performing robotics vision mobile platform • full software architecture with vision localization and navigation • Start Up: AIO Robotics, inc. – fully integrated 3D printer, scanner, editor, object search – 2 patents and CES Innovation Awards 2016 & 2017 • Start Up: Eyenuk, inc. Medical Deep Learning – retinal image lesion detection and segmentation – end-to-end robotic system to automate eye screening, monitoring, diagnosis, reporting – Patent and Grant applications • Competition Robotics: – Robocup Soccer Robot & AUVSI Autonomous Submarine • Teaching: – After school robotics program, USC robotics courses • Learning: – Academics, sports journalism, nutrition, art, music
  • 4. Artificial Intelligence • Fields: Machine Learning (ML), Computer Vision (CV), Natural Language Processing (NLP), and Robotics – Digitally and in real world – They are connected for particular applications • We will focus on Computer Vision and related topics
  • 5. Connection with Data Science • Computer Vision (CV) processes raw data to be used for data science • Raw input data: images (regular cameras, heat cameras, etc), texts, audio – These data do not have direct semantic meaning: • Not measuring specific (or isolated) characteristic – Create models to understand what is in the images, etc. • Advantage of raw data: – General purpose/richer source of information – Target events can be obtained by further processing later – Less reliant on manual entry, more natural interactions (with customers)
  • 6. Connection with Data Science • Disadvantage of raw data: – Systems/Infrastructure (hardware & software environment tools): are expensive – Models: are more complex – Data: are of higher dimension, massive, and need data annotations (for learning)
  • 7. Deep Learning: AlexNet 2012 • Trying to solve Object Recognition: – Given an image (massive number of pixels), determine the object (1 label) • Have labeled training dataset, would like to learn a function of the mapping • Data should encapsulate invariance in the presence of: – Appearance – Interaction with the world – Perspective (2D – 3D), including size – Occlusion – Lighting
  • 8. Deep Learning: AlexNet 2012 • Data: CalTech 101, ImageNet: 2005 – 1,000,000 images (1000 categories, 1000 images/category) – The set of all objects in real life is in the thousands • Model: 1989 – Convolutional Neural Network: Yann LeCun 1989: MNIST digit recognition – Deep network that jointly trains both the feature extraction and classification stages • Systems/Infrastructure: 2010 – From Video games (Sony PlayStations): GPU, CUDA: 60 – 100 times speed up • BLOG: https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
  • 9. Data-driven features within a compositional architecture
  • 10. Solving Other Computer Vision Problems • Data-driven features are key in moving efforts forward for ALMOST ALL other difficult Computer Vision tasks – Note: Basic single image object/person/background recognition has moved to Enterprise AI (e.g. Amazon Rekognition) – Mature tasks, such as tracking, are available in many free libraries (OpenCV, etc.) • Complex algorithms hinge on: architecture & training – Papers focus on architecture, training is tribal knowledge • Whether the data is noisy • Do we need more data • Training regimen: hyper-parameter grid, fine-tuning, multiple stages, etc. • Visualization • Evaluation
  • 11. Solving Other Computer Vision Problems • Additional key concepts in architecture: – Adding dependencies to the past (recurrence): • Recurrent Neural Network (RNN) Long range dependency: “When I was in Paris I got lost because I couldn’t ask for directions in _____”
  • 12. Solving Other Computer Vision Problems • Additional key concepts in architecture: – Adding dependencies to the past (recurrence): • Recurrent Neural Network (RNN) – Undoing the dimensional collapse to get more details: • Fully Convolutional Network: segmentation tasks, Neural Network visualization
  • 13. Solving Other Computer Vision Problems • Additional key concepts in architecture: – Adding dependencies to the past (recurrence): • Recurrent Neural Network (RNN) – Undoing the dimensional collapse: • Fully Convolutional Network – Using multiple networks: • Joint Learning: jointly learn inter-related tasks • Generative Adversarial Network (GAN): learning using competing networks. Learning jointly can improve individual task performance. GAN is used for synthetic data generation
  • 14. Contemporary Computer Vision • Topics: – Deep Learning Theory: accuracy, efficiency – Recognition: robustness, more detail, larger context – Reconstruction: WILL NOT DELVE DEEP INTO THIS • 6DOF pose, clothing, hair, light, deformation, mesh, depth, joint • GAN is moving forward: generates control signals at multiple layers – https://www.youtube.com/watch?v=kSLJriaOumA • Inputs: – Images, Videos, 3D data, special cameras (thermal, event cameras) – Video and: audio, text (language), robots • Applications: – Medical – Robots: language/semantic navigation, interacting with object
  • 15. Deep Learning Theory • Graph Neural Networks – Relationships: objects, joints • Few shot, one shot, zero shot learning. Weakly/unsupervised Learning • Measuring uncertainty & class imbalance • Active/online Learning • Open-set learning • Architectural search • Component analyses: – ReLU, Augmentation strategy • Resource allocation/compression • Stability/sensitivity/adversarial
  • 16. Deep Learning Theory • Graph Convolutional Networks [https://arxiv.org/pdf/1609.02907.pdf] – http://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Edge-Labeling_Graph_Neural_Network_for_Few-Shot_Learning_CVPR_2019_paper.pdf • Few shot, one shot, zero shot learning. Weakly/unsupervised Learning – http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Few-Shot_Adaptive_Faster_R-CNN_CVPR_2019_paper.pdf • Active Learning • Measuring uncertainty & class imbalance – http://openaccess.thecvf.com/content_CVPR_2019/papers/Khan_Striking_the_Right_Balance_With_Uncertainty_CVPR_2019_paper.pdf • Online learning, open-set • Architectural search: – http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf • Component analyses: – ReLU, Augmentation strategy – http://openaccess.thecvf.com/content_CVPR_2019/papers/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.pdf • Resource allocation/compression: – http://openaccess.thecvf.com/content_CVPR_2019/papers/Qiao_Neural_Rejuvenation_Improving_Deep_Network_Training_by_Enhancing_Computational_Resource_CVPR_2019_paper.pdf • Stability/sensitivity/adversarial
  • 17. Recognition • Image: detection, recognition, segmentation, landmarking, identification in the crowd/wild: – Face, hand & body pose estimation • Skeleton, joint localization • Dense pose – Panoptic segmentation, R-CNN family • Video: (person, object, background, and combination): – Action Recognition (1 person): • most active in recognition • Still in 80 actions: space of actions is unknown • Segmenting actions in the wild and handling simultaneous multiple actions are difficult – Social relationship (multiple persons) – Video Object segmentation: Faster R-CNN, etc. (multiple objects) – Surveillance: tracking & Re-identification
  • 18. Recognition, cont. • Visual Question Answering (VQA): words & image connection: – Visual dialog – Video Captioning • Video and Audio: – Audio video event recognition – Video enhancement: diarization
  • 19. Overarching Trends • Datasets dictate research activity – Largest datasets are from large entities (Facebook, Google DeepMind, etc.) – Examples: • Cityscapes: Dashboard Cam: Segmentation: semantic, instance • COCO datasets: Segmentation: semantic, instance • Kinetics Human Action Dataset • Social interaction capture: CMU • Person Re-identification
  • 20. Trends/Predictions Moving Forward • Smaller manually-annotated dataset training catches up in performance – Few, one, no shot training – mixed use real & synthetic data • Grounded recognition and reconstruction (adding more modules to solve a problem robustly): – Image: recognition – segmentation (panoptic) – 3D object reconstruction – space understanding – Video: pose estimation – action recognition – action forecasting – reconstruction • The next superior building block should direct the field again (following SIFT 2004, and DL features 2012)
  • 21. How Do We Apply All This Information? • Have a working knowledge of the ML/CV fundamentals: – theory, software, hardware, models (CNN, RNN) • Start with your use-case: – find keywords in the papers – search blogs for definition, background • Run the open-source code – Understand the limitations – Are they acceptable to your business?
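
As a rough illustration of the feature-extraction stage mentioned on slides 8 and 9, the following is a minimal, dependency-free sketch of one convolution, ReLU, and max-pooling step. It is not code from the talk. The function names (`conv2d`, `relu`, `max_pool`), the toy image, and the hand-picked edge kernel are all assumptions made for illustration; in a trained CNN the kernel weights would be learned from labeled data rather than chosen by hand.

```python
# Sketch of one CNN feature-extraction stage: convolution -> ReLU -> max pooling.
# All names and values below are hypothetical and chosen only for illustration.

def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear unit: negative responses become 0."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 image: dark left part (0), bright right part (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (a learned filter would replace this).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = conv2d(image, kernel)      # strong response where dark meets bright
features = max_pool(relu(response))   # smaller map that keeps the strongest response
```

Stacking several such stages, each with many learned kernels, is what produces the progression from edges to textures to object parts noted for slide 9. The pooling step is also the "dimensional collapse" that fully convolutional networks later undo for segmentation.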

Editor's Notes

  1. Evaluation - Sports Recruiting Self Improvements Robotics, Computer Vision, ML, AI,  robotics is sensor driven & bayesian model
  2. The future is in this field
  3. Flight cameras, Won’t talk a lot on Infrastructure,
  4. Flight cameras, Won’t talk a lot on Infrastructure,
  5. Edges, texture, more complex textures, objects
  6. creating new item from distribution Training
  7. creating new item from distribution Training
  8. Training can be difficult
  9. GAN paper: http://openaccess.thecvf.com/content_CVPR_2019/papers/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.pdf Animation from Single image Robotics: grasping http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Neural_Task_Graphs_Generalizing_to_Unseen_Tasks_From_a_Single_CVPR_2019_paper.pdf “Three Strong Accept” paper: semantic navigation: in the kitchen Interacting with people http://openaccess.thecvf.com/content_CVPR_2019/papers/Wortsman_Learning_to_Learn_How_to_Learn_Self-Adaptive_Visual_Navigation_Using_CVPR_2019_paper.pdf
  10. Architectural search: Network composed of spatial computation & within-layer computation. Scaling policies: http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_ELASTIC_Improving_CNNs_With_Dynamic_Scaling_Policies_CVPR_2019_paper.pdf
  11. Architectural search: Network composed of spatial computation & within-layer computation. Scaling policies: http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_ELASTIC_Improving_CNNs_With_Dynamic_Scaling_Policies_CVPR_2019_paper.pdf
  12. http://ikea.csail.mit.edu/ Pose estimation is moving forward with dense pose http://densepose.org/ https://github.com/facebookresearch/DensePose http://openaccess.thecvf.com/content_CVPR_2019/papers/Guler_HoloPose_Holistic_3D_Human_Reconstruction_In-The-Wild_CVPR_2019_paper.pdf Pose estimation: Hand & Pose http://openaccess.thecvf.com/content_CVPR_2019/papers/Ge_3D_Hand_Shape_and_Pose_Estimation_From_a_Single_RGB_CVPR_2019_paper.pdf http://openaccess.thecvf.com/content_CVPR_2019/papers/Pavllo_3D_Human_Pose_Estimation_in_Video_With_Temporal_Convolutions_and_CVPR_2019_paper.pdf Mask-R-CNN http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Mask_Scoring_R-CNN_CVPR_2019_paper.pdf Panoptic Segmentation [https://arxiv.org/pdf/1801.00868.pdf] Action recognition: Flow Representation http://openaccess.thecvf.com/content_CVPR_2019/papers/Piergiovanni_Representation_Flow_for_Action_Recognition_CVPR_2019_paper.pdf Video Salient object detection http://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Shifting_More_Attention_to_Video_Salient_Object_Detection_CVPR_2019_paper.pdf Object Relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_On_Exploring_Undetermined_Relationships_for_Visual_Relationship_Detection_CVPR_2019_paper.pdf Video Classification: http://openaccess.thecvf.com/content_CVPR_2019/papers/Bhardwaj_Efficient_Video_Classification_Using_Fewer_Frames_CVPR_2019_paper.pdf Relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Sun_Relational_Action_Forecasting_CVPR_2019_paper.pdf Performance/Action Quality http://openaccess.thecvf.com/content_CVPR_2019/papers/Parmar_What_and_How_Well_You_Performed_A_Multitask_Learning_Approach_CVPR_2019_paper.pdf http://openaccess.thecvf.com/content_CVPR_2019/papers/Doughty_The_Pros_and_Cons_Rank-Aware_Temporal_Attention_for_Skill_Determination_CVPR_2019_paper.pdf
  13. http://ikea.csail.mit.edu/ Pose estimation is moving forward with dense pose http://densepose.org/ https://github.com/facebookresearch/DensePose http://openaccess.thecvf.com/content_CVPR_2019/papers/Guler_HoloPose_Holistic_3D_Human_Reconstruction_In-The-Wild_CVPR_2019_paper.pdf Pose estimation: Hand & Pose http://openaccess.thecvf.com/content_CVPR_2019/papers/Ge_3D_Hand_Shape_and_Pose_Estimation_From_a_Single_RGB_CVPR_2019_paper.pdf http://openaccess.thecvf.com/content_CVPR_2019/papers/Pavllo_3D_Human_Pose_Estimation_in_Video_With_Temporal_Convolutions_and_CVPR_2019_paper.pdf Mask-R-CNN http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Mask_Scoring_R-CNN_CVPR_2019_paper.pdf Action recognition: Flow Representation http://openaccess.thecvf.com/content_CVPR_2019/papers/Piergiovanni_Representation_Flow_for_Action_Recognition_CVPR_2019_paper.pdf Video Salient object detection http://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Shifting_More_Attention_to_Video_Salient_Object_Detection_CVPR_2019_paper.pdf Object Relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_On_Exploring_Undetermined_Relationships_for_Visual_Relationship_Detection_CVPR_2019_paper.pdf Video Classification: http://openaccess.thecvf.com/content_CVPR_2019/papers/Bhardwaj_Efficient_Video_Classification_Using_Fewer_Frames_CVPR_2019_paper.pdf Relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Sun_Relational_Action_Forecasting_CVPR_2019_paper.pdf Performance/Action Quality http://openaccess.thecvf.com/content_CVPR_2019/papers/Parmar_What_and_How_Well_You_Performed_A_Multitask_Learning_Approach_CVPR_2019_paper.pdf http://openaccess.thecvf.com/content_CVPR_2019/papers/Doughty_The_Pros_and_Cons_Rank-Aware_Temporal_Attention_for_Skill_Determination_CVPR_2019_paper.pdf
  14. Larger context in visual reasoning from language: https://arxiv.org/abs/1906.08237 https://github.com/zihangdai/xlnet http://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf
  15. CityscapesDataset: https://www.cityscapes-dataset.com/ COCO datasets: http://cocodataset.org Kinetics Human Action Dataset: https://deepmind.com/research/open-source/kinetics Panoptic Studio Dataset: https://www.cs.cmu.edu/~hanbyulj/panoptic-studio/ Person Reidentification Dataset: https://amberer.gitlab.io/papers_in_ai/person-reid.html
  16. Reconstruction: mechanistic understanding Vs. Recognition: discriminative, deeper understanding
  17. Academics try to solve the general problem