ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com
Machine Learning
Speaker:
ANCA CIURTE - AI Team Lead at Softvision
Outline
● Why machine learning on Android?
● Mostly:
○ Some insights about Object Detection algorithms
○ Practical example in TensorFlow
○ Data gathering and labeling
○ Model training
● Hopefully:
○ It will inspire you to dig deeper
○ It won’t confuse you too much :)
Machine learning
Why machine learning on Android?
Why machine learning on Android?
● Object detection
○ Is a very common Computer Vision problem
○ Identifies the objects in the image and
provides their precise location
● Why is it useful?
○ Street View (e.g. face blurring)
○ Self-driving cars (e.g. pedestrian detection), etc.
● Object detection: impact of deep learning
○ Deep convnets significantly improved both
accuracy and processing speed
● Why on Android?
○ We live in an era in which mobile has taken over
○ Running on mobile makes it possible to
deliver interactive, real-time applications
○ The latest phones have substantial computing
power
Machine learning
Some insights about Object Detection
Intuition about the convolution
[Figure: Input image * Convolution kernel (weights) = output; alternative labels shown in the figure: Convolution layer, Feature map, Network's parameters]
[Figure: another way to understand the convolution operation]
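To make the convolution intuition concrete, here is a minimal pure-Python sketch. It is an illustration only, not code from the talk: the function name `conv2d_valid`, the toy image and the edge-detecting kernel are all assumptions chosen for the demo. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped), with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hypothetical 3x3 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

In a convnet the kernel values are not hand-picked as they are here; they are the weights learned during training.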
Image classification with convnets
● Dataset
○ e.g. CIFAR-10 dataset:
■ consists of 60000 32x32 colour images in 10 classes,
with 6000 images per class.
■ There are 50000 training images and 10000 test images.
● Training phase
○ e.g. VGG16 network
○ input: labeled images (x, y)
○ Forward propagation: given the weights w_l, compute the predictions
○ Loss function: measures the error between the predictions and the labels y
○ Backward propagation: compute w_{l+1} by minimizing the loss
○ Repeat until convergence => w*
● Testing phase
○ Use the trained model to classify new instances
○ Output: predicted class
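The training loop on this slide (forward propagation, loss, backward propagation, repeat until convergence) can be sketched on a much smaller model than VGG16. The example below is a hedged illustration, not the talk's code: it trains a one-weight logistic classifier with plain gradient descent, and the toy dataset, learning rate and stopping tolerance are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, learning_rate=0.5, max_steps=5000, tolerance=1e-6):
    """Gradient descent on a 1-D logistic classifier; returns (w, b, final_loss)."""
    w, b = 0.0, 0.0
    previous_loss = float("inf")
    loss = previous_loss
    for _ in range(max_steps):
        # Forward propagation: given the current weights, compute predictions.
        predictions = [sigmoid(w * x + b) for x, _ in samples]
        # Loss function: mean binary cross-entropy between predictions and labels.
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for (_, y), p in zip(samples, predictions)
        ) / len(samples)
        # Repeat until convergence: stop when the loss no longer improves.
        if abs(previous_loss - loss) < tolerance:
            break
        previous_loss = loss
        # Backward propagation: gradients of the loss, then the weight update.
        grad_w = sum((p - y) * x for (x, y), p in zip(samples, predictions)) / len(samples)
        grad_b = sum(p - y for (_, y), p in zip(samples, predictions)) / len(samples)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Toy labeled data (x, y); deliberately not perfectly separable.
samples = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
w, b, final_loss = train(samples)
print("w* = %.3f, b* = %.3f, loss = %.4f" % (w, b, final_loss))
```

A real convnet follows the same loop, with the gradients computed by backpropagation through all layers instead of by the two closed-form sums used here.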
Relation between classification and object detection
● We have an accurate way of classifying images
○ e.g.: does this image contain a pedestrian?
● But how can we say WHERE this pedestrian is?
Solution:
● Sliding window
○ strategy:
■ split the image into fragments and classify them independently
○ challenges:
■ how to deal with varying object sizes, varying aspect ratios, object overlap and multiple responses
○ problem: the CNN has to be applied to a huge number of locations and scales, which is very computationally expensive
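A rough sketch of the sliding-window strategy shows where the cost comes from. Everything in it is an assumption made for illustration: the 640x480 frame, the three window shapes, the 8-pixel stride, and the `covers` function, which stands in for the CNN classifier by "firing" whenever a window fully contains a known pedestrian box.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield (x, y, w, h) boxes for every window size at every stride position."""
    for win_w, win_h in window_sizes:
        for y in range(0, image_h - win_h + 1, stride):
            for x in range(0, image_w - win_w + 1, stride):
                yield (x, y, win_w, win_h)


def covers(window, target):
    """True when `window` fully contains `target`; both are (x, y, w, h)."""
    wx, wy, ww, wh = window
    tx, ty, tw, th = target
    return wx <= tx and wy <= ty and wx + ww >= tx + tw and wy + wh >= ty + th


# Hypothetical setup: a 640x480 frame, three window shapes, stride of 8 pixels.
WINDOW_SIZES = [(64, 128), (128, 256), (256, 256)]
windows = list(sliding_windows(640, 480, WINDOW_SIZES, stride=8))

# Stand-in for the CNN classifier: positive when the window contains the pedestrian.
pedestrian = (300, 200, 40, 100)
detections = [w for w in windows if covers(w, pedestrian)]

print(len(windows), "windows to classify,", len(detections), "positive responses")
```

Even this small configuration requires thousands of classifier calls per frame, and a single object triggers many overlapping positive windows, which is the "multiple responses" challenge listed above.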
R-CNN (Region-based convolutional neural network)
Two steps:
● Select object proposals: Selective Search algorithm
○ its precision is too low for it to be used as an object
detector on its own, but it works fine as a first step in the
detection pipeline
● Apply a strong CNN classifier to the selected proposals
Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
It outperforms all the previous object detection algorithms
Limitations:
● Depends on an external algorithm for region hypotheses
● Needs to rescale object proposals to a fixed resolution
● Redundant computation - all features are
independently computed even for overlapping
proposal regions
Fast R-CNN
From R-CNN to Fast R-CNN:
● input: image + region proposals
● RoI pooling on the “conv5” feature map for feature
extraction
● softmax classifier instead of SVM classifier
● End-to-end multi-task training:
○ the last FC layer branches into two sibling
output layers:
■ one that produces softmax
probability estimates over K object
classes plus a background class
■ another layer that outputs the
bounding box coordinates for each
object.
Advantages:
● Higher detection quality (mAP) than R-CNN
● Training is single-stage
● Training can update all network layers at once
● No disk storage is required for feature caching
Girshick, “Fast R-CNN”, ICCV 2015
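For reference, the two sibling output layers are trained jointly with the multi-task loss defined in the Fast R-CNN paper. The notation below follows that paper rather than the slides: p is the predicted class distribution, u the ground-truth class (u = 0 is background), t^u the predicted box offsets for class u, v the ground-truth box targets, and λ a weight balancing the two terms.

```latex
L(p, u, t^{u}, v) = L_{\mathrm{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\mathrm{loc}}(t^{u}, v),
\qquad
L_{\mathrm{cls}}(p, u) = -\log p_{u}
```

The bracket [u ≥ 1] equals 1 for object classes and 0 for background, so the localization term (a smooth L1 loss in the paper) is ignored for background regions.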
Faster R-CNN
Faster R-CNN = Fast R-CNN + RPN (Region Proposal
Network)
● RPN
○ removes the dependency on an external RoI
hypothesis generation method
○ is a convolutional network trained end-to-end
○ generates a list of high-quality region proposals
(bbox coordinates + objectness scores)
● Then RPN + Fast R-CNN are merged into a single
network by sharing their convolutional features
○ predicts the class of the objects + a refined bbox
position
○ shared convolutional features enable nearly
cost-free region proposals
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards
Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
SSD (Single shot detector)
● Extra feature layers
○ additional convolutional feature layers of different sizes are placed at
the end of the base net
○ each added feature layer produces a set of detection predictions,
allowing predictions at multiple scales
○ this design leads to simple end-to-end training
● RoI proposals
○ the output space of region proposals contains a fixed set of default boxes
over different aspect ratios and scales per feature map location
○ for each default bounding box, predict
■ the shape offsets Δ(cx, cy, w, h) and
■ the confidence for all object categories (c1, …, cp)
● Non-maximum suppression
[Figure: default boxes on an 8x8 feature map and on a 4x4 feature map]
Wei Liu et al., “SSD: Single Shot MultiBox Detector”, ECCV 2016
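The final non-maximum suppression step can be sketched as follows. This is a generic greedy NMS written for illustration, not SSD's reference implementation: the box format (x_min, y_min, x_max, y_max), the 0.5 IoU threshold and the sample boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping responses to the same object plus one separate object.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In a multi-class detector such as SSD, this step is typically run per class, so boxes of different categories do not suppress each other.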
Compare modern convolutional object detectors
Lots of variables to set up ...
● base net:
○ VGG16
○ ResNet101
○ InceptionV2
○ InceptionV3
○ ResNet
○ MobileNet
● Object detection architecture:
○ R-CNN
○ Fast R-CNN
○ Faster R-CNN
○ SSD
● Input image resolution
● Number of region proposals
● Frozen weights - for fine-tuning
Takeaways:
● Faster R-CNN is slower but more accurate
● SSD is much faster but not as accurate (which makes it a good choice for mobile apps)
Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
Coding time
Problem to solve:
- a mobile app for real-time clothing detection
- class categories: Top, Pants, Shorts, Skirt and Dress
Frameworks:
● TensorFlow Object Detection API
- made by Google
- an open source framework built on top of TensorFlow that
makes it easy to construct, train and deploy object detection
models
- input: images + labels
- output: inference graph (.pb format)
● LabelImg
- an open source graphical image annotation tool
- annotations are saved as XML files in PASCAL VOC format,
the format used by the ImageNet dataset
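To show what a PASCAL VOC annotation looks like and how it can be flattened into tabular rows, here is a small sketch using only the Python standard library. The sample XML (file name, sizes and boxes) is invented for the demo, and the row layout (filename, width, height, class, xmin, ymin, xmax, ymax) is an assumption about what a conversion script might emit, not a description of any specific script.

```python
import xml.etree.ElementTree as ET

# Invented annotation in PASCAL VOC layout, for illustration only.
SAMPLE_ANNOTATION = """<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Dress</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>380</xmax><ymax>450</ymax></bndbox>
  </object>
  <object>
    <name>Top</name>
    <bndbox><xmin>400</xmin><ymin>80</ymin><xmax>600</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def voc_xml_to_rows(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) row per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return rows


rows = voc_xml_to_rows(SAMPLE_ANNOTATION)
for row in rows:
    print(row)
```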
Coding time: step by step
● Create the dataset and split it into train (70%) and test (30%) folders
● Label the images with the LabelImg tool (output: one .xml file per image in the dataset)
● Convert .xml to .csv (use the dataset/xml_to_csv.py script; output: train_labels.csv, test_labels.csv)
● Convert to TFRecord format
○ set paths (from ../models/research):
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
○ edit the generate_tfrecord.py file and change the label map + path to the train/test folder
○ finally, execute the generate_tfrecord.py script in a terminal:
python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
○ output: train.record, test.record
● Training
○ create a label map: label_map.pbtxt
○ optional, but recommended :), choose a pretrained model from here
○ prepare the .config file: .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
○ run the training script (from ../models/research/object_detection):
python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_pets.config
● Export inference graph:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config \
--trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
output: the model in .pb format
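Two of the steps above are easy to sketch in plain Python: the 70/30 split and the label map. The snippet is a hedged illustration under stated assumptions: the file names are placeholders, the helper names `split_dataset` and `label_map_text` are hypothetical, and the fixed random seed is only there to make the split reproducible. The label map text follows the `item { id / name }` layout used by the TensorFlow Object Detection API, with ids starting at 1 because 0 is reserved for the background class.

```python
import random

# The five classes named in the problem statement.
CLASSES = ["Top", "Pants", "Shorts", "Skirt", "Dress"]


def split_dataset(filenames, train_fraction=0.7, seed=42):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def label_map_text(class_names):
    """Build the contents of a label_map.pbtxt file; ids start at 1."""
    items = []
    for index, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(items)


# Placeholder file names standing in for the real image set.
filenames = ["img_%03d.jpg" % i for i in range(10)]
train_files, test_files = split_dataset(filenames)
label_map = label_map_text(CLASSES)

print(len(train_files), "train /", len(test_files), "test")
print(label_map)
```

The class names in the label map have to match the names used when labeling the images, otherwise the TFRecord conversion cannot map annotations to ids.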
e-mail: anca.ciurte@softvision.ro
Q&A
Integrating with Android
Speaker:
MIHALY NAGY - Android Community Influencer at Softvision
Android + TensorFlow
● Model File
● [Labels File]
● tensorflow-android dependency
● Boilerplate
● Integrate TF to process each frame
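The per-frame integration ends with turning raw model outputs into something drawable. The sketch below is written in Python rather than Android code, to stay consistent with the training commands, and rests on assumptions: it supposes the exported graph returns boxes as normalized [ymin, xmin, ymax, xmax], one score per box, and 1-based class ids, which is the usual layout for models exported with the TensorFlow Object Detection API. The function name `postprocess`, the 0.5 score threshold and the sample values are hypothetical.

```python
# Labels in label-map order; class ids are assumed to be 1-based.
LABELS = ["Top", "Pants", "Shorts", "Skirt", "Dress"]


def postprocess(boxes, scores, classes, frame_w, frame_h, min_score=0.5):
    """Filter detections by score and convert normalized boxes to pixel rectangles."""
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        results.append({
            "label": LABELS[int(class_id) - 1],
            "score": score,
            # (left, top, right, bottom) in pixels of the frame.
            "rect": (
                int(round(xmin * frame_w)), int(round(ymin * frame_h)),
                int(round(xmax * frame_w)), int(round(ymax * frame_h)),
            ),
        })
    return results


# Made-up outputs for one 640x480 frame: one confident detection, one weak one.
boxes = [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]
scores = [0.9, 0.3]
classes = [5.0, 1.0]
detections = postprocess(boxes, scores, classes, frame_w=640, frame_h=480)
print(detections)
```

On Android the same logic would run after each inference call, with the resulting rectangles drawn over the camera preview.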
[Diagram: each frame -> Bitmap -> Recognition]
Follow Along:
http://goo.gl/SYHSb7
https://github.com/code-twister/tf_example
Coding time
Thank You!
DroidCon Cluj 2018 - Hands on machine learning on android

More Related Content

Similar to DroidCon Cluj 2018 - Hands on machine learning on android

Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit
Antti Haapala
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
Hirantha Pradeep
 

Similar to DroidCon Cluj 2018 - Hands on machine learning on android (20)

Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)
 
KNN Algorithm Using R | Edureka
KNN Algorithm Using R | EdurekaKNN Algorithm Using R | Edureka
KNN Algorithm Using R | Edureka
 
Pelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewPelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper Review
 
Artificial intelligence use cases for International Dating Apps. iDate 2018. ...
Artificial intelligence use cases for International Dating Apps. iDate 2018. ...Artificial intelligence use cases for International Dating Apps. iDate 2018. ...
Artificial intelligence use cases for International Dating Apps. iDate 2018. ...
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVM
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
 
Avihu Efrat's Viola and Jones face detection slides
Avihu Efrat's Viola and Jones face detection slidesAvihu Efrat's Viola and Jones face detection slides
Avihu Efrat's Viola and Jones face detection slides
 
Machine learning ( Part 2 )
Machine learning ( Part 2 )Machine learning ( Part 2 )
Machine learning ( Part 2 )
 
Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
VIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape EstimationVIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Object Detection for Autonomous Cars using AI/ML
Object Detection for Autonomous Cars using AI/MLObject Detection for Autonomous Cars using AI/ML
Object Detection for Autonomous Cars using AI/ML
 
Fa19_P1.pptx
Fa19_P1.pptxFa19_P1.pptx
Fa19_P1.pptx
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methods
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
 
Automatic image moderation in classifieds
Automatic image moderation in classifiedsAutomatic image moderation in classifieds
Automatic image moderation in classifieds
 
Automatic image moderation in classifieds, Jarosław Szymczak
Automatic image moderation in classifieds, Jarosław SzymczakAutomatic image moderation in classifieds, Jarosław Szymczak
Automatic image moderation in classifieds, Jarosław Szymczak
 

Recently uploaded

Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 

Recently uploaded (20)

Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 

DroidCon Cluj 2018 - Hands on machine learning on android

  • 1.
  • 2.
  • 3. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Machine Learning Speaker: ANCA CIURTE - AI Team Lead at Softvision-
  • 4. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Outline ● Why machine learning on Android? ● Mostly: ○ Some insights about Object Detection algorithms ○ Practical example in Tensorflow ○ Data gathering and labeling ○ Model training ● Hopefully: ○ It will inspire you to deeg deeper ○ It won’t confuse you too much :)
  • 5. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Machine learning Why machine learning on Android?
  • 6. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Why machine learning on Android? ● Object detection ○ Is a very common Computer Vision problem ○ Identifies the objects in the image and provides their precise location
  • 7. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Why machine learning on Android? ● Object detection ○ Is a very common Computer Vision problem ○ Identifies the objects in the image and provides their precise location ● Why is it useful? ○ StreetView, ○ Self-driving cars etc. E.g.: Street view - face blurring E.g.: Self driving cars - pedestrian detection
  • 8. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Why machine learning on Android? ● Object detection ○ Is a very common Computer Vision problem ○ Identifies the objects in the image and provides their precise location ● Why is it useful? ○ StreetView, ○ Self-driving cars etc. ● Object detection: impact of deep learning ○ Deep convnets significantly increased accuracy and processing time
  • 9. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Why machine learning on Android? ● Object detection ○ Is a very common Computer Vision problem ○ Identifies the objects in the image and provides their precise location ● Why is it useful? ○ StreetView, ○ Self-driving cars etc. ● Object detection: impact of deep learning ○ Deep convnets significantly increased accuracy and processing time ● Why on Android? ○ We are living in the era when mobile took over ○ Running on mobile makes it possible to deliver interactive and real time applications ○ Latest released phones have great computing power
  • 10. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Machine learning Some insights about Object Detection
  • 11. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Image classification with convnets ● Dataset ○ e.g. Cifar-10 dataset: ■ consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. ■ There are 50000 training images and 10000 test images. ● Training phase ○ e.g. VGG 16 network ○ input: labeled images (x,y) Forward propagation (Given wl , compute predictions )
  • 12. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Intuition about the convolution Convolution Kernel (weights) Input image * = Another way to understand the convolution operation: or: Convolution layer or: Feature Map or: Network’s parameters
  • 13. ATLANTA | AUSTIN | PHILADELPHIA | BENTONVILLE | ROMANIA | INDIA | AUSTRALIA | BRAZIL | NEPAL | CANADA www.softvision.com Image classification with convnets ● Dataset ○ e.g. Cifar-10 dataset: ■ consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. ■ There are 50000 training images and 10000 test images. ● Training phase ○ e.g. VGG 16 network ○ input: labeled images (x,y) ● Testing phase ○ Use the trained model to classify new instances ○ Detection output: predicted class Forward propagation (Given wl , compute predictions ) Loss function: Backward propagation (compute wl+1 by minimizing the loss) Repeat until convergence => w*
  • 18. Relation between classification and object detection
    ● We have an accurate way of classifying images
      ○ e.g.: does this image contain a pedestrian?
    ● But how can we say WHERE this pedestrian is?
    Solution:
    ● Sliding window
      ○ strategy:
        ■ split the image into fragments and classify them independently
        ■ [Figure: all fragments, and the fragments classified as pedestrian]
      ○ challenges:
        ■ how to deal with various object sizes, various aspect ratios, object overlap or multiple responses
      ○ problem: the CNN must be applied to a huge number of locations and scales, which is very computationally expensive
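The sliding-window strategy, and why it gets expensive, can be illustrated with a short sketch. The image size, window sizes and stride below are hypothetical values chosen for the example. In a real detector, each generated window would be cropped and passed to the classifier.

```python
def sliding_windows(image_width, image_height, window_sizes, stride):
    """Yield (left, top, right, bottom) boxes for every window size and position."""
    for win_w, win_h in window_sizes:
        for top in range(0, image_height - win_h + 1, stride):
            for left in range(0, image_width - win_w + 1, stride):
                yield (left, top, left + win_w, top + win_h)


# Hypothetical setup: a 640x480 image, two pedestrian-shaped window sizes.
windows = list(sliding_windows(640, 480, [(64, 128), (128, 256)], stride=32))
print(len(windows))   # number of classifier calls for a single image
print(windows[0])     # first window, in pixel coordinates
```

Even with only two window sizes and a coarse stride, this yields hundreds of classifier evaluations per image. Covering more scales and aspect ratios, or using a finer stride, multiplies that count quickly.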
  • 21. R-CNN (Region-based convolutional neural network)
    Two steps:
    ● Select object proposals: Selective Search algorithm
      ○ its precision is too low for it to be used as an object detector on its own, but it works fine as a first step in the detection pipeline
    ● Apply a strong CNN classifier to the selected proposals
    It outperformed all the previous object detection algorithms
    Limitations:
    ● Depends on an external algorithm for the region hypotheses
    ● Needs to rescale object proposals to a fixed resolution
    ● Redundant computation: all features are computed independently, even for overlapping proposal regions
    Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
  • 23. Fast R-CNN
    From R-CNN to Fast R-CNN:
    ● input: image + region proposals
    ● region pooling on the “conv5” feature map for feature extraction
    ● softmax classifier instead of SVM classifier
    ● end-to-end multi-task training:
      ○ the last FC layer branches into two sibling output layers:
        ■ one that produces softmax probability estimates over K object classes
        ■ another that outputs the bounding box coordinates for each object
    Advantages:
    ● Higher detection quality (mAP) than R-CNN
    ● Training is single-stage
    ● Training can update all network layers at once
    ● No disk storage is required for feature caching
    Girshick, “Fast R-CNN”, ICCV 2015
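The "region pooling on the conv5 feature map" step can be sketched as a max pool over a fixed grid of bins inside each region of interest. This is a simplified illustration, not the Fast R-CNN implementation: the function name, the single-channel feature map and the bin rounding rule are assumptions. The point is that proposals of different sizes all produce an output of the same fixed size.

```python
def ceil_div(a, b):
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the cells inside `roi` into a fixed out_h x out_w grid.

    `roi` is (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/out), ceil((i+1)*h/out)) of the RoI.
        row_start = y0 + (i * roi_h) // out_h
        row_end = y0 + ceil_div((i + 1) * roi_h, out_h)
        pooled_row = []
        for j in range(out_w):
            col_start = x0 + (j * roi_w) // out_w
            col_end = x0 + ceil_div((j + 1) * roi_w, out_w)
            pooled_row.append(max(
                feature_map[r][c]
                for r in range(row_start, row_end)
                for c in range(col_start, col_end)
            ))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x6 single-channel feature map.
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
large_roi = roi_max_pool(feature_map, (1, 0, 6, 4), 2, 2)  # 5x4 region
small_roi = roi_max_pool(feature_map, (0, 1, 2, 3), 2, 2)  # 2x2 region
print(large_roi, small_roi)
```

Because every proposal is reduced to the same grid, the fully connected layers that follow can be shared across proposals, and the convolutional features are computed only once per image.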
  • 24. Faster R-CNN
    Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network)
    ● RPN
      ○ removes the dependency on an external ROI hypothesis generation method
      ○ is a convolutional network trained end-to-end
      ○ generates a list of high-quality region proposals (bbox coordinates + objectness scores)
    ● Then RPN + Fast R-CNN are merged into a single network by sharing their convolutional features
      ○ predicts the class of the objects + a refined bbox position
      ○ shared convolutional features enable nearly cost-free region proposals
    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
  • 26. SSD (Single shot detector)
    ● Extra feature layers
      ○ additional convolutional feature layers of different sizes are placed at the end of the base net
      ○ each added feature layer produces a set of detection predictions, allowing predictions at multiple scales
      ○ this design leads to simple end-to-end training
    ● ROI proposals
      ○ the output space of region proposals contains a fixed set of default boxes over different aspect ratios and scales per feature map location
      ○ for each default bounding box, predict:
        ■ the shape offsets Δ(cx, cy, w, h) and
        ■ the confidence for all object categories (c1, …, cp)
    ● Non-maximum suppression
    [Figure: default boxes on the 8x8 and 4x4 feature maps]
    Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
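The final non-maximum suppression step can be sketched as a greedy filter over scored boxes. This is a generic single-class version under assumed conventions: boxes are (x0, y0, x1, y1) tuples and the 0.5 IoU threshold is an illustrative choice. It is not the SSD reference code.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes on one object, one box elsewhere.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.82, so only the higher-scoring one survives. This is how the "multiple responses" problem from the sliding-window slide is handled.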
  • 29. Compare modern convolutional object detectors
    Lots of variables to set up ...
    ● Base net:
      ○ VGG16
      ○ ResNet101
      ○ InceptionV2
      ○ InceptionV3
      ○ ResNet
      ○ MobileNet
    ● Object detection architecture:
      ○ R-CNN
      ○ Fast R-CNN
      ○ Faster R-CNN
      ○ SSD
    ● Input image resolution
    ● Number of region proposals
    ● Frozen weights - for fine tuning
    Speed/accuracy trade-offs [figure]
    Takeaways:
    ● Faster R-CNN is slower but more accurate
    ● SSD is much faster but not as accurate (and is therefore a good choice for mobile apps)
    Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
  • 31. Coding time
    Problem to solve:
      - a mobile app for real-time clothes detection
      - class categories: Top, Pants, Shorts, Skirt and Dress
    Frameworks:
    ● TensorFlow Object Detection API
      - made by Google
      - an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models
      - input: images + labels
      - output: inference graph (.pb format)
    ● LabelImg
      - an open source graphical image annotation tool
      - annotations are saved as XML files in PASCAL VOC format, the format used by the ImageNet dataset
  • 32. Coding time: step by step
    ● Create the dataset and split it into train (70%) and test (30%) folders
    ● Label the images with the LabelImg tool (output: one .xml file for each image in the dataset)
    ● Convert .xml to .csv (use the dataset/xml_to_csv.py script; output: train.csv, test.csv)
    ● Convert to TFRecord format
      ○ set paths (from ../models/research):
          export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/object_detection
          export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
      ○ edit the generate_tfrecord.py file and change the label map + the path to the train/test folder
      ○ finally, execute the generate_tfrecord.py script in a terminal:
          python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
          python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
      ○ output: train.record, test.record
    ● Training
      ○ create a label map: label_map.pbtxt
      ○ optional, but recommended :), choose a pretrained model from here
      ○ prepare the .config file: .../models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config
      ○ run the training script (from ../models/research/object_detection):
          python legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v1_pets.config
    ● Export the inference graph:
          python export_inference_graph.py --input_type image_tensor --pipeline_config_path pipeline.config --trained_checkpoint_prefix=training/model.ckpt-10750 --output_directory=inference_graph
      ○ output: the model in .pb format
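Two of the steps above lend themselves to a small illustration: flattening LabelImg's PASCAL VOC XML into CSV rows, and writing a label map. The sketch below is not the xml_to_csv.py script referenced on the slide. The function names, the CSV column order and the sample annotation are assumptions made for the example, and only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layout for the CSV (one row per labeled object).
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Turn one PASCAL VOC annotation into one row per <object>."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


def label_map_pbtxt(class_names):
    """Build label map text; ids start at 1 because 0 is reserved for background."""
    items = ["item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name)
             for class_id, name in enumerate(class_names, start=1)]
    return "\n".join(items)


# Hypothetical annotation, in the shape LabelImg writes for one image.
SAMPLE_XML = """
<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>Dress</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>380</xmax><ymax>450</ymax></bndbox>
  </object>
  <object>
    <name>Top</name>
    <bndbox><xmin>400</xmin><ymin>80</ymin><xmax>600</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

rows = voc_xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
label_map = label_map_pbtxt(["Top", "Pants", "Shorts", "Skirt", "Dress"])
print(csv_text)
print(label_map)
```

In practice the conversion would loop over every .xml file in the train and test folders. The class names in the label map must match the names used while labeling.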
  • 33. Q&A
    e-mail: anca.ciurte@softvision.ro
  • 34. Integrating with Android
    Speaker: MIHALY NAGY - Android Community Influencer at Softvision
  • 36. Android + TensorFlow
    ● Model File
    ● [Labels File]
    ● tensorflow-android dependency
    ● Boilerplate
    ● Integrate TF to process each frame
  • 38. Android + TensorFlow
    [Figure labels: Bitmap, Recognition, each Frame]
  • 39. Android + TensorFlow
    Follow Along: http://goo.gl/SYHSb7 https://github.com/code-twister/tf_example
  • 40. Coding time
  • 41. Thank You!

Editor's Notes

  1. Running on mobile makes it possible to deliver interactive, real-time applications in a way that is not possible when depending on an internet connection.
  2. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
  3. Multiple scales and aspect ratios are handled by search windows of different sizes and aspect ratios, or by image scaling.
  4. From R-CNN to Fast R-CNN: region pooling on the “conv5” feature map for feature extraction; softmax classifier instead of SVM classifier; multi-task training. First, a CNN is applied to the whole original image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map and feeds it into a sequence of fully connected (fc) layers. The fc layers finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes, and another that outputs the bounding box coordinates for each object.
  6. A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.
  7. SSD approach: produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. The network generates scores for each default box. Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
  8. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016
  9. There are several object detection algorithms. The question is: how do they compare with each other? We define several meta-parameters that influence detector performance. Critical points on the curve can be identified. mAP = mean average precision. [Huang et al.] measured the influence of these meta-parameters on accuracy and speed. Jonathan Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017
  12. Recognition refers to the objects detected, not the process.