Computer Vision Landscape: Present and Future
Sanghamitra Deb
Staff Data Scientist
Chegg Inc
Data Day Texas, 2023
Outline
• Images
• Enhanced Transcription
o Data Story
o Computer Vision model
o Metrics
o Deployment
• Computer Vision Landscape
• Image Embeddings
Images
Disclaimer: Images are replicas representing real scenarios
Enhanced Transcription
Computer Vision Model → Transcription Service
{"text": "Resonant ocean thicknesses at different forcing frequencies. (a) Location of Europa's first three largest resonant rotational-gravity modes as a function of forcing frequency and ocean thickness, for both zonal (m = 0) and sectoral (m = 2) degree-2 modes…"}
Reference paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020GL088317
Data Story
Version 1
• Collect data on cropped images
• Build object detection model
• Measure performance
Performance: Not good enough
Lessons learned: CV models cannot read. Unless objects are well defined and distinct, detection has a lot of errors.
Version 2
Redefine problem --- Detect bounding boxes for
• Text
• Equations
• Diagrams and Charts
• UI Elements
• Tables
Performance was good.
This model is currently in production
Version 3
Redefine problem --- Downstream applications need the text that was getting cropped out.
• Header Region
• Side Region
• Footer Region
• Question Region
• UI Elements
• Text
• Equations
• Diagrams & Charts
• Tables
Enhanced Transcription: Version 2
We are extracting Bounding Boxes.
• Text
• Equations
• Diagrams and Charts
• UI Elements
• Tables
Examples shown for each class: Tables, Text, Equations, UI Elements, Diagrams and Charts
Building Object Detection Model: Training Pipeline
What is Object Detection?
Metrics: Intersection over Union
Predictions: bounding boxes (BB) and classification labels. IoU is computed for each bounding box.
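To make the IoU definition concrete, here is a minimal sketch of the calculation for two axis-aligned boxes. The (x1, y1, x2, y2) corner format and the example coordinates are assumptions chosen for illustration; they are not taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes, in pixels.
predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)
print(iou(predicted, ground_truth))
```

An IoU of 1.0 means the predicted box matches the ground truth exactly; 0.0 means the two boxes do not overlap at all.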
Metrics: mAP@iou=0.5
Metrics are computed for a given IoU threshold.
For the same prediction, changing the IoU threshold can change whether it counts as a true positive or a false positive.
Average precision (AP) is computed for each class at an IoU threshold of 0.5. mAP is the mean of AP across all classes.
mAP@iou=0.5 >= 0.8
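The following is a simplified sketch of how AP and mAP at a fixed IoU threshold could be computed. It assumes a single image, boxes in (x1, y1, x2, y2) format, greedy matching of each prediction to the best unmatched ground-truth box, and all-point interpolation of the precision-recall curve. Evaluation tools differ in these details, so treat the function names and the toy data as illustrative only.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (confidence, box) for one class; gts: list of boxes."""
    if not preds or not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched, tp_flags = set(), []
    for _, box in preds:
        # Find the best still-unmatched ground-truth box for this prediction.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in matched and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        is_tp = best_j is not None and best_iou >= iou_thr
        if is_tp:
            matched.add(best_j)
        tp_flags.append(is_tp)

    # Precision and recall after each prediction, in confidence order.
    tp = 0
    precisions, recalls = [], []
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing, then integrate it over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class, iou_thr=0.5):
    """per_class: {class_name: (preds, gts)}; returns the mean AP over classes."""
    aps = [average_precision(p, g, iou_thr) for p, g in per_class.values()]
    return sum(aps) / len(aps)


# Toy data (made up): two classes with hypothetical boxes and confidences.
data = {
    "Text": ([(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))],
             [(0, 0, 10, 10), (20, 20, 30, 30)]),
    "Tables": ([(0.6, (0, 0, 10, 10))], [(0, 0, 10, 10)]),
}
print(mean_average_precision(data, iou_thr=0.5))
```

The same prediction can flip from true positive to false positive as the threshold rises: a box with IoU 0.8 against its ground truth counts at a threshold of 0.5 but not at 0.9.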
Collecting Training Data: Labelbox
• Retrieve archival images.
• Create annotation project.
• Write annotation guide. Make sure 5-10% of the data is reviewed for quality checks.
• Look for inter-annotator agreement on a small dataset.
• Collect labelled data.
• Do some spot checks for annotation quality.
Object Detection Models
Region-based Convolutional Neural Networks (R-CNN)
Cons: Very slow --- propagating thousands of region proposals (RPs) through the CNN and classifier takes a very long time
Vanishing/Exploding Gradients
Operation --- gradients of the “front” (earliest) layers in an n-layer network are computed by multiplying n small or large numbers.
When the network is deep, the product of n small numbers shrinks toward zero (vanishing).
When the network is deep, the product of n large numbers grows too large (exploding).
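A quick numeric illustration of the effect described above. The per-layer factors 0.5 and 1.5 are arbitrary values chosen for this sketch; real networks multiply layer-wise Jacobians rather than a single scalar.

```python
def product_of_factors(factor, n_layers):
    """Multiply the same per-layer gradient factor n_layers times."""
    result = 1.0
    for _ in range(n_layers):
        result *= factor
    return result


# Compare a "small" factor (< 1) and a "large" factor (> 1) as depth grows.
for n in (5, 20, 50):
    small = product_of_factors(0.5, n)
    large = product_of_factors(1.5, n)
    print(f"n={n:2d}  small factors -> {small:.3e}  large factors -> {large:.3e}")
```

With 50 layers, the product of factors below 1 is vanishingly small while the product of factors above 1 is in the hundreds of millions, which is the behaviour the slide calls vanishing and exploding.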
ResNet (2015)
Right: regular CNN. Left: fit a residual F(x) instead of the desired function H(x) directly. A skip / shortcut connection adds the input x to the output of a few weight layers.
Layers can be stacked to be about 150 layers deep (ResNet-152)
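The residual idea can be written compactly. This is a sketch in the usual ResNet notation, where F denotes the residual learned by the stacked weight layers and I is the identity; both symbols are introduced here for illustration.

```latex
H(x) = F(x) + x
\quad\Longrightarrow\quad
\frac{\partial H}{\partial x} = \frac{\partial F}{\partial x} + I
```

The identity term gives gradients a direct path through the shortcut, which is one common explanation for why very deep residual networks remain trainable.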
Plain Network vs ResNet
YOLO (You Only Look Once)
Unified Detection ---
• Uses features from the entire image for prediction.
• Predicts bounding boxes across all classes simultaneously.
• Bounding boxes and classes are predicted in one shot, i.e., by the same network.
Divide input into grids → class probability map → final detections
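As a small illustration of the grid step, the sketch below finds which grid cell contains the centre of a bounding box. In the original YOLO formulation that cell is responsible for predicting the object. The 7x7 grid, the 448x448 image size, and the function name are assumptions for illustration; later YOLO versions use different grid and anchor schemes.

```python
def responsible_cell(box, img_w, img_h, grid_size=7):
    """Return (row, col) of the grid cell containing the centre of box.

    box is (x1, y1, x2, y2) in pixels.
    """
    x1, y1, x2, y2 = box
    # Box centre, normalised to the [0, 1] range.
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    # Clamp so a centre exactly on the right or bottom edge stays inside the grid.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


# Hypothetical box in a 448x448 image.
print(responsible_cell((100, 200, 300, 400), img_w=448, img_h=448))
```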
YOLOv5 network
Why YOLO?
o Speed: YOLO algorithms run comparatively faster than other algorithms. A smaller model is able to process 155 frames per second.
o Accuracy: State-of-the-art performance on several object detection datasets, including COCO.
o Open source code is available in multiple deep learning frameworks.
o Code is well developed and easy to use.
Limitations: small objects that are grouped together do not have good recall
YOLOv5 PyTorch codebase
https://github.com/ultralytics/yolov5
Let's look into the repo
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt
Model size: --weights yolov5s.pt
Batch size: --batch 16
python detect.py --weights yolov5s.pt --source image.jpg
Deployment
• Load PyTorch model & predict bounding boxes
• Crop image with bounding box output
• Send cropped image to transcription service
• API output: {Transcribed text, Bounding box}
Version 2
Version 3
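The deployment flow above can be sketched as follows. Everything here is a stand-in: the image is a plain nested list, the detections are hard-coded rather than coming from a PyTorch model, and `transcribe` is a hypothetical callable representing the external transcription service. The names `crop`, `enhanced_transcription`, `text`, and `bounding_box` are illustrative, not the production API.

```python
def crop(image, box):
    """Crop a row-major image (list of pixel rows) to box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def enhanced_transcription(image, boxes, transcribe):
    """Crop each detected box, transcribe the crop, and pair text with its box."""
    results = []
    for box in boxes:
        cropped = crop(image, box)
        results.append({"text": transcribe(cropped), "bounding_box": box})
    return results


# Toy 4x4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Hypothetical detector output: one box in pixel coordinates.
boxes = [(1, 1, 3, 3)]


def fake_transcribe(cropped):
    """Placeholder for the transcription service; reports the crop size."""
    return f"{len(cropped)}x{len(cropped[0])} crop"


print(enhanced_transcription(image, boxes, fake_transcribe))
```

Passing the transcription call in as a function keeps the cropping logic testable without the external vendor, which also makes load testing of the vendor call easier to isolate.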
Measuring effectiveness of the Enhanced Transcription
Annotation Task: Labelbox
Which Transcription is better?
Improves Coverage
If the entire image was sent to the transcription service, more than 5% of the images returned “no content found”.
Cropping the image using object detection removes low-quality surrounding elements; this recovers the transcription for 2.7% of images.
Computer Vision Landscape
Diagram Embeddings
Example diagram topics: pulley diagram, Newton’s second law, friction, acceleration, moment of inertia
Extract diagram embeddings from pre-trained models such as ResNet.
Use cases
• Similarity-based applications --- recommendation systems.
• Converting general predictive models into multimodal models with text, image, and structured data features.
• Categorizing diagrams and creating a diagram ontology to create rich metadata.
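For the similarity use case, a minimal sketch of comparing diagram embeddings with cosine similarity. The three-dimensional vectors are made-up toy values standing in for real embeddings, which would come from a pre-trained network and be much higher-dimensional. The function and catalog names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, catalog):
    """Return the catalog key whose embedding is closest to query."""
    return max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))


# Toy embeddings (not real model outputs).
catalog = {
    "pulley diagram": [0.9, 0.1, 0.0],
    "friction": [0.0, 1.0, 0.0],
    "moment of inertia": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]
print(most_similar(query, catalog))
```

The same nearest-neighbour lookup underlies recommendation-style applications: embed the query diagram, then rank stored diagrams by similarity.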
Takeaways
o Computer Vision models can see but they cannot read.
o Doing a deep dive on metrics ahead of building the model is a good practice.
o YOLO performs well out of the box. It's open source and readily available with very low latency.
o Building a service that combines outputs from external vendors requires careful load testing.
o Having a vision beyond immediate deliverables creates avenues for overall enrichment of ML products.
Thank You
@sangha_deb
sdeb@chegg.com
References
• Computer Vision Models : https://medium.com/augmented-startups/top-6-object-detection-algorithms-b8e5c41b952f.
https://www.v7labs.com/blog/yolo-object-detection#h1
• https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2
• R-FCN : https://arxiv.org/pdf/1605.06409.pdf
• YOLOV5 - https://arxiv.org/pdf/2108.11539.pdf

Computer Vision Landscape : Present and Future

  • 1. Computer Vision Landscape : Present and Future Sanghamitra Deb Staff Data Scientist Chegg Inc Data Day Texas, 2023
  • 2. Outline • Images • Enhanced Transcription o Data Story o Computer Vision model o Metrics o Deployment • Computer Vision Landscape • Image Embeddings
  • 3. Images Disclaimer: Images are replica’s representing real scenarios
  • 4. Enhanced Transcription Computer Vision Model Transcription Service {”text”:”Resonant ocean thicknesses at different forcing frequencies. (a) Location of Europa's first three largest resonant rotational-gravity modes as a function of forcing frequency and ocean thickness, for both zonal (m = 0) and sectoral (m = 2) degree-2 modes…..”} Reference paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020GL088317
  • 5. Data Story Version 1 • Collect Data on cropped images • Build object Detection Model • Measure performance. Version 3 Version 2
  • 7. Data Story Version 1 • Collect Data on cropped images • Build object Detection Model • Measure performance. Performance : Not good enough Lessons learned: CV models cannot read, unless objects are well defined and distinct detection has a lot of errors Version 3 Version 2
  • 8. Data Story Version 1 • Collect Data on cropped images • Build object Detection Model • Measure performance. Performance : Not good enough Lessons learned: CV models cannot read, unless objects are well defined and distinct detection has a lot of errors Version 3 Version 2 Redefine problem --- Detect bounding boxes for • Text • Equations • Diagrams and Charts • UI Elements • Tables Performance was good. This model is currently in production
  • 9. Data Story Version 1 • Collect Data on cropped images • Build object Detection Model • Measure performance. Performance : Not good enough Lessons learned: CV models cannot read, unless objects are well defined and distinct detection has a lot of errors Version 2 Redefine problem --- Detect bounding boxes for • Text • Equations • Diagrams and Charts • UI Elements • Tables Performance was good. This model is currently in production Redefine problem --- Downstream applications need the text that was getting cropped out. • Header Region • Side Region • Footer Region • Question Region • UI Elements Version 3
  • 10. Data Story Redefine problem --- Detect bounding boxes for • Text • Equations • Diagrams and Charts • UI Elements • Tables Performance was good. This model is currently in production Version 1 Version 3 Version 2 • Collect Data on cropped images • Build object Detection Model • Measure performance. Performance : Not good enough Lessons learned: CV models cannot read, unless objects are well defined and distinct detection has a lot of errors Redefine problem --- Downstream applications need the text that was getting cropped out. • Header Region • Side Region • Footer Region • Question Region • • UI Elements • Text • Equations • Diagrams & Charts • Tables
  • 11. Enhanced Transcription: Version 2 We are extracting Bounding Boxes. • Text • Equations • Diagrams and Charts • UI Elements • Tables Tables Text
  • 12. Enhanced Transcription: Version 2 Equations UI Elements Diagrams and Charts
  • 13. Building Object Detection Model: Training Pipeline
  • 14. What is object Detection
  • 15. Metrics: Intersection over Union Predictions: Bounding Boxes (BB), classification labels. IOU is computed for each bounding box
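Intersection over Union for a single pair of boxes can be sketched as below. This is a minimal illustration, not the deck's own code; it assumes boxes are given as `(x_min, y_min, x_max, y_max)` corner coordinates, and the function name `iou` is chosen here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.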
  • 16. Metrics: mAP@iou=0.5 Metrics are computed for a given IoU threshold. The same prediction can count as a true positive or a false positive depending on the IoU threshold. Average precision is computed for each class at a threshold of 0.5. mAP is the mean across all classes. mAP@iou=0.5 >= 0.8
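The computation of average precision per class, then the mean across classes, can be sketched as follows. The slides do not say which AP variant was used; this sketch assumes the all-point interpolated area under the precision-recall curve, and assumes each prediction has already been marked as a true or false positive by matching it to a ground-truth box at IoU >= 0.5. The function names and the input layout are illustrative.

```python
def average_precision(is_tp, num_gt):
    """AP for one class.

    is_tp: booleans for predictions sorted by descending confidence
           (True = matched a ground-truth box at the chosen IoU threshold).
    num_gt: number of ground-truth boxes for this class.
    """
    if num_gt == 0:
        return 0.0

    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(is_tp, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps class name -> (is_tp list, num_gt)."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)
```

With this convention, a class whose ranked predictions are hit, miss, hit against two ground-truth boxes scores 5/6, and a class with two hits out of two scores 1.0.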
  • 17. Collecting Training Data: LabelBox Retrieve archival images. Create annotation project. Write annotation guide. Make sure 5-10% of the data is reviewed for quality checks. Look for inter-annotator agreement on a small dataset. Collect labelled data. Do some spot checks for annotation quality.
  • 19. Region-based Convolutional Neural Networks (R-CNN) Cons: Very slow --- propagating thousands of RPs through the CNN and classifier takes a very long time
  • 20. Vanishing/Exploding Gradients Operation --- multiplying n small / large numbers to compute gradients of the “front” layers in an n-layer network. When the network is deep, the product of n small numbers approaches zero (vanishes). When the network is deep, the product of n large numbers becomes too large (explodes).
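The arithmetic behind this can be checked with a toy calculation. The sketch below assumes every layer contributes the same scalar factor to the gradient, which is a simplification made only for illustration; the helper name is hypothetical.

```python
def product_of_factors(factor, n_layers):
    """Multiply the same per-layer factor n_layers times (toy model of a gradient chain)."""
    result = 1.0
    for _ in range(n_layers):
        result *= factor
    return result


# A factor below 1 shrinks toward zero; a factor above 1 blows up.
vanishing = product_of_factors(0.5, 50)
exploding = product_of_factors(1.5, 50)
```

With 50 layers, a per-layer factor of 0.5 gives a product below 1e-12, while a factor of 1.5 gives a product above 1e8.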
  • 21. ResNet (2015) Right: Regular CNN, Left: fit a residual F(x) = H(x) - x instead of the desired function H(x) directly. A skip (shortcut) connection adds the input x to the output of a few weight layers. Layers can be stacked to be 150 layers deep
  • 23. YOLO (You Only Look Once) Unified Detection --- • Uses features from the entire image for prediction • Predicts bounding boxes across all classes simultaneously. • Bounding boxes and classes are predicted in one shot, i.e., by the same network. Divide input into grids, class probability map, final detections
  • 25. Why YOLO? o Speed: YOLO algorithms run faster than other algorithms. The smaller model is able to process 155 frames per second. o Accuracy: State-of-the-art performance on several object detection datasets, including COCO. o Open source code is available in multiple deep learning frameworks. o Code is well developed and easy to use. Limitations: small objects that are grouped together do not have good recall
  • 26. YOLOv5 PyTorch codebase https://github.com/ultralytics/yolov5 Let's look into the repo python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt Model size Batch size python detect.py --weights yolov5s.pt --source image.jpg
  • 27. Deployment Load PyTorch model and predict bounding boxes; crop image with bounding box output; send cropped image to transcription service; API output: {Transcribed text, Bounding box} Version 2 Version 3
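The crop-then-transcribe flow on this slide can be sketched without any external dependencies. Everything here is an assumption made for illustration: the image is a plain list of pixel rows, detections are dicts with a `"box"` key in `(x_min, y_min, x_max, y_max)` pixel coordinates, and `transcribe` is a hypothetical callable standing in for the transcription service, whose real API is not described in the slides.

```python
def crop(image, box):
    """Crop a list-of-rows image to box = (x_min, y_min, x_max, y_max), clamped to the image."""
    height = len(image)
    width = len(image[0]) if image else 0
    x_min = max(0, int(box[0]))
    y_min = max(0, int(box[1]))
    x_max = min(width, int(box[2]))
    y_max = min(height, int(box[3]))
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def enhanced_transcription(image, detections, transcribe):
    """Crop each detected box, send it to `transcribe`, and pair the text with its box."""
    results = []
    for detection in detections:
        region = crop(image, detection["box"])
        if not region or not region[0]:
            continue  # box fell outside the image; nothing to transcribe
        results.append({"text": transcribe(region), "box": detection["box"]})
    return results
```

In a real service the detections would come from the object detection model and `transcribe` would wrap a network call, so error handling and logging around that call would matter more than the cropping itself.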
  • 28. Measuring effectiveness of the Enhanced Transcription Annotation Task: Labelbox Which Transcription is better?
  • 29. Improves Coverage When the entire image was sent to the transcription service, more than 5% of the images returned “no content found”. Cropping the image using object detection removes low-quality surrounding elements, which recovers the transcription for 2.7% of images
  • 31. Diagram Embeddings Pulley diagram Newton’s second law Friction acceleration Moment of Inertia Extract diagram embeddings from pre-trained models such as ResNet. Use cases • Similarity-based applications --- recommendation systems. • Converting general predictive models into multimodal models with text, image and structured data features. • Categorizing diagrams and creating a diagram ontology to create rich metadata.
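A similarity-based lookup over diagram embeddings can be sketched as below. The slides do not state which similarity measure was used; cosine similarity is assumed here as a common choice. The embedding values in the usage note are made-up toy vectors, not outputs of any real model, and the function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, catalog, top_k=1):
    """Return the names of the top_k catalog embeddings closest to the query."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

For example, with a toy catalog `{"pulley": [1.0, 0.0, 0.0], "friction": [0.0, 1.0, 0.0]}`, the query `[0.9, 0.1, 0.0]` ranks "pulley" first.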
  • 32. Takeaways o Computer Vision models can see but they cannot read. o Doing a deep dive on metrics ahead of building the model is a good practice. o YOLO performs well out of the box. It's open source and readily available, with very low latency. o Building a service that combines outputs from external vendors requires careful load testing. o Having a vision beyond immediate deliverables creates avenues for overall enrichment of ML products.
  • 34. References • Computer Vision Models: https://medium.com/augmented-startups/top-6-object-detection-algorithms-b8e5c41b952f, https://www.v7labs.com/blog/yolo-object-detection#h1 • mAP: https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2 • R-FCN: https://arxiv.org/pdf/1605.06409.pdf • YOLOv5: https://arxiv.org/pdf/2108.11539.pdf

Editor's Notes

  1. Classification and Localization --- Done using regression
  2. Selective search extracts several thousand region proposals (RPs). Each region proposal is labeled with a class and a ground-truth bounding box. A pre-trained CNN extracts features for each region proposal through forward propagation. These features are used to predict the class and bounding box of the region proposal using SVMs and linear regression. ROI pooling is followed by fully connected (FC) layers for classification and bounding box regression. The FC layers after ROI pooling are not shared across ROIs, so they take time to compute and carry a large number of parameters. This makes R-CNN approaches slow. Fast R-CNN performs the CNN forward propagation once on the entire image. Faster R-CNN replaces selective search with a region proposal network (RPN), which reduces the total number of region proposals and further improves speed.
  3. Yolo reasons globally about the full image … YOLO models treat object detection as a regression problem. It divides the image into an S × S grid and for each grid cell predicts B bounding boxes, confidence for those boxes, and C class probabilities. These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.
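The size of that output tensor follows directly from the formula. The sketch below just evaluates S × S × (B ∗ 5 + C); the function name is illustrative, and the example values S = 7, B = 2, C = 20 are the configuration reported in the original YOLO paper for PASCAL VOC, not values taken from these slides.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the YOLO prediction tensor: S x S x (B * 5 + C).

    Each of the B boxes per cell carries 5 numbers: x, y, width, height, confidence.
    The C class probabilities are predicted once per grid cell.
    """
    values_per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, values_per_cell)


shape = yolo_output_shape(7, 2, 20)
```

With S = 7, B = 2 and C = 20 this gives a 7 × 7 × 30 tensor.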
  4. The details of the architecture are beyond the scope of this presentation. YOLOv5 has improvements in data augmentation compared to previous models. ResNet is one of the backbones used in the architecture for extracting features. Transformers are used in the prediction head. Predictions from multiple heads are ensembled using techniques such as non-max suppression to predict the bounding boxes. Additionally, a ResNet model is trained using image patches cropped from the training data as a classification training set.
  5. Test the I/O contract. Send images that have no text and check the output. Make sure there is logging.