SlideShare a Scribd company logo
1 of 21
Download to read offline
© 2019 Kentaro Yoshioka
Dataset Culling: Towards
Efficient Training of Distillation-
based Domain Specific Models
K.Yoshioka(1)(2), E. Lee(2), S. Wong(2), M. Horowitz(2)
(1) Toshiba
(2) Stanford University
IEEE ICIP 2019 Sept. 25
1© 2019 Kentaro Yoshioka
Introduction
• Deep Learning based object detection has excellent accuracy.
• Apps: for security, infrastructure, transportation..
Image credit: [Nest.com]
2© 2019 Kentaro Yoshioka
Introduction
• Cost?
• Requires many GPU-hours, difficult to scale.
• Has accuracy-cost tradeoff.
•How can we break this tradeoff?
101-layer Resnet:
Imagenet accuracy 78%
10-layer Resnet:
Imagenet accuracy 60%
3© 2019 Kentaro Yoshioka
[1]D. Kang, “Noscope: optimizing neural network queries over video at scale,”
[2]R.Mullapudi “Online model distillation for efficient video inference,”
Introduction: Domain Specific Models
• Training compact domain specific models (DSMs) [1,2]
• DSMs: a specialized model for specific env.
{conference room, your house, your office, etc.}
• Cuts down computation cost 5-20x
Surveillance cam. data
General dataset
Images from MS-COCO(http://cocodataset.org/)
4© 2019 Kentaro Yoshioka
Introduction: What is Distillation?
• Teacher model teaches the small student model to learn
• Works without human interference
Teacher provides “answers”
Teacher model (large, general)
Train model
Domain data Teacher model (large, general)
Domain Specific Model
(Small, specialized)
5© 2019 Kentaro Yoshioka
Introduction: The Problem
• Can gather lots of training data easily..
• A day’s worth of surveillance data
=86,400 images @ 1FPS
• Training 86,400 images require over 100 GPU-hours
(Nvidia K80 on AWS) to train.
• Unable to scale to deploying DSMs to thousands of cameras
• Reducing the DSM training cost has not been explored.
6© 2019 Kentaro Yoshioka
Dataset Culling
7© 2019 Kentaro Yoshioka
Basic Idea of Dataset Culling
• Reduces the dataset size 300x
•Culls only “Easy” data; model accuracy is not harmed
Total training time:
104  2.2 GPU-hours
47x improvement 
8© 2019 Kentaro Yoshioka
What is good training data?
• “Difficult” data which the model makes a lot of mistakes.
• No backprop is done if the model can perfectly predict.
 Does not contribute to training.
• Comparing teacher-student predictions are costly..
• Can we assess from student predictions only?
9© 2019 Kentaro Yoshioka
Difficulty assessment from confidence
• Quantify good data by proposed “confidence loss”
• Assesses the difficulty of prediction from the output probability.
• Utilize a “pretrained” model.
Car=0.49
Car=0.69
Car=0.79
Car=1.0
Person=0.19
10© 2019 Kentaro Yoshioka
Difficulty assessment from confidence
• Quantify good data by proposed “confidence loss”
• Assesses the difficulty of prediction from the output probability
Car=0.49
Car=0.69
Car=0.79
Car=1.0
Image Conf. Loss: 3.79
 Cull images with low loss.
Σ
0
0.25
0.5
0.75
1
0 0.25 0.5 0.75 1
Lconf
Detection Confidence
Compute loss for all detections..
Person=0.19
11© 2019 Kentaro Yoshioka
Examples.
Data kept. Data culled.
12© 2019 Kentaro Yoshioka
Dataset culling pipeline
• First, cull dataset using only the student model
• Culls out majority of the data first (50x).
Cull
50x
13© 2019 Kentaro Yoshioka
Dataset culling pipeline
• Then, conduct a secondary culling using both
teacher-student predictions.
• Contributes to boosting the trained model accuracy.
• Data is culled up to 300x by the pipeline.
14© 2019 Kentaro Yoshioka
Dataset culling pipeline
Details in paper
• Then, conduct a secondary culling using both
teacher-student predictions.
• Contributes to boosting the trained model accuracy.
• Data is culled up to 300x by the pipeline.
15© 2019 Kentaro Yoshioka
Experiments
16© 2019 Kentaro Yoshioka
Experiment setups
• Models pretrained on MS-COCO:
• Student: Resnet-18 based Faster-RCNN
• Teacher: Resnet-101 based Faster-RCNN
• Dataset: 5 custom videos acquired from Youtube.
• Train: first 24-hours
• Validation: Subsequent 6-hours
• Utilize teacher output as ground-truths
17© 2019 Kentaro Yoshioka
Qualitative results
RawStudent TrainStudent TrainStudent+optResolution Teacher
mAP=90.2, comp=28G
mAP=94.8, comp=28G
mAP=81.6, comp=28G
mAP=78.2, comp=28G
mAP=71.3, comp=28G
mAP=52.7, comp=28G
Oracle, comp=1
Oracle, comp=1
Oracle, comp=1
mAP=89.6, comp=7G
mAP=93.2, comp=18G
mAP=80.7, comp=18G
udent TrainStudent TrainStudent+optResolution Teacher
mAP=90.2, comp=28G
mAP=94.8, comp=28G
mAP=81.6, comp=28G
comp=28G
comp=28G
comp=28G
Oracle, comp=128G
Oracle, comp=128G
Oracle, comp=128G
mAP=89.6, comp=7G
mAP=93.2, comp=18G
mAP=80.7, comp=18G
18© 2019 Kentaro Yoshioka
Quantitative Results
64 128 256
Full
(86,400)
No
Training
Mean Accuracy
[mAP]
85.56
(-3.0%)
88.3
(-0.3%)
89.3
(+0.8%)
88.5 58.6
Total train time
[hours]
1.9 (54x) 2.0 (50x) 2.2 (47x) 104 -
Student predictions 1.54 1.54 1.54 - -
Student training 0.07 0.14 0.28 96 -
Teacher predictions 0.33 0.33 0.33 8 -
Culled dataset size
• Can cull the dataset size to 300x, without accuracy drops
or even with slight improvements.
19© 2019 Kentaro Yoshioka
Conclusions
• While DSMs can reduce the inference cost, training them
can take many GPU-hours.
• We proposed Dataset Culling, which reduces the DSM
training cost by 47x.
•We found that by culling easy-to-predict data, the
accuracy drop can be minimized.
•Evaluated on our long-duration dataset, we saw little
accuracy penalty when trained with culled datasets.
•One step towards deploying DSMs to the real world 
Codes and dataset available:
https://github.com/kentaroy47/DatasetCulling
20© 2019 Kentaro Yoshioka
Ablation study
• Entropy implements the loss function for active learning.
• Using teacher-student comparisons achieve best accuracy
(Precision)
• Our dataset culling pipeline with Confidence + Precision has
the best tradeoff of accuracy and training time.

More Related Content

Similar to Dataset Culling: Towards Efficient Training of Distillation based Domain Specific Models

Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...DataWorks Summit
 
CAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGCAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGIRJET Journal
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
 
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGHANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGIRJET Journal
 
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGHANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGIRJET Journal
 
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET Journal
 
IRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep LearningIRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep LearningIRJET Journal
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET Journal
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation SystemIRJET Journal
 
Data-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsData-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsAndreas Metzger
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkDatabricks
 
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...TigerGraph
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
A Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated ContentsA Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated ContentsDuc Nguyen
 
Certified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasCertified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasiTrainMalaysia1
 
IRJET - Obstacle Detection using a Stereo Vision of a Car
IRJET -  	  Obstacle Detection using a Stereo Vision of a CarIRJET -  	  Obstacle Detection using a Stereo Vision of a Car
IRJET - Obstacle Detection using a Stereo Vision of a CarIRJET Journal
 
Power through your high school courseload with a responsive Chromebook
Power through your high school courseload with a responsive ChromebookPower through your high school courseload with a responsive Chromebook
Power through your high school courseload with a responsive ChromebookPrincipled Technologies
 
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...“Practical Image Data Augmentation Methods for Training Deep Learning Object ...
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...Edge AI and Vision Alliance
 

Similar to Dataset Culling: Towards Efficient Training of Distillation based Domain Specific Models (20)

Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
CAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGCAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNING
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGHANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
 
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNINGHANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
HANDWRITTEN DIGIT RECOGNITION USING MACHINE LEARNING
 
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNNIRJET- Sketch-Verse: Sketch Image Inversion using DCNN
IRJET- Sketch-Verse: Sketch Image Inversion using DCNN
 
IRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep LearningIRJET- Rice QA using Deep Learning
IRJET- Rice QA using Deep Learning
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural Networks
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
 
Data-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software SystemsData-driven AI for Self-Adaptive Software Systems
Data-driven AI for Self-Adaptive Software Systems
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
Self Driving Car
Self Driving CarSelf Driving Car
Self Driving Car
 
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
A Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated ContentsA Ensemble Learning-based No Reference QoE Model for User Generated Contents
A Ensemble Learning-based No Reference QoE Model for User Generated Contents
 
Certified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasCertified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. Nickholas
 
IRJET - Obstacle Detection using a Stereo Vision of a Car
IRJET -  	  Obstacle Detection using a Stereo Vision of a CarIRJET -  	  Obstacle Detection using a Stereo Vision of a Car
IRJET - Obstacle Detection using a Stereo Vision of a Car
 
Power through your high school courseload with a responsive Chromebook
Power through your high school courseload with a responsive ChromebookPower through your high school courseload with a responsive Chromebook
Power through your high school courseload with a responsive Chromebook
 
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...“Practical Image Data Augmentation Methods for Training Deep Learning Object ...
“Practical Image Data Augmentation Methods for Training Deep Learning Object ...
 

Recently uploaded

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 

Recently uploaded (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

Dataset Culling: Towards Efficient Training of Distillation based Domain Specific Models

  • 1. © 2019 Kentaro Yoshioka Dataset Culling: Towards Efficient Training of Distillation- based Domain Specific Models K.Yoshioka(1)(2), E. Lee(2), S. Wong(2), M. Horowitz(2) (1) Toshiba (2) Stanford University IEEE ICIP 2019 Sept. 25
  • 2. 1© 2019 Kentaro Yoshioka Introduction • Deep Learning based object detection has excellent accuracy. • Apps: for security, infrastructure, transportation.. Image credit: [Nest.com]
  • 3. 2© 2019 Kentaro Yoshioka Introduction • Cost? • Requires many GPU-hours, difficult to scale. • Has accuracy-cost tradeoff. •How can we break this tradeoff? 101-layer Resnet: Imagenet accuracy 78% 10-layer Resnet: Imagenet accuracy 60%
  • 4. 3© 2019 Kentaro Yoshioka [1]D. Kang, “Noscope: optimizing neural network queries over video at scale,” [2]R.Mullapudi “Online model distillation for efficient video inference,” Introduction: Domain Specific Models • Training compact domain specific models (DSMs) [1,2] • DSMs: a specialized model for specific env. {conference room, your house, your office, etc.} • Cuts down computation cost 5-20x Surveillance cam. data General dataset Images from MS-COCO(http://cocodataset.org/)
  • 5. 4© 2019 Kentaro Yoshioka Introduction: What is Distillation? • Teacher model teaches the small student model to learn • Works without human interference Teacher provides “answers” Teacher model (large, general) Train model Domain data Teacher model (large, general) Domain Specific Model (Small, specialized)
  • 6. 5© 2019 Kentaro Yoshioka Introduction: The Problem • Can gather lots of training data easily.. • A day’s worth of surveillance data =86,400 images @ 1FPS • Training 86,400 images require over 100 GPU-hours (Nvidia K80 on AWS) to train. • Unable to scale to deploying DSMs to thousands of cameras • Reducing the DSM training cost has not been explored.
  • 7. 6© 2019 Kentaro Yoshioka Dataset Culling
  • 8. 7© 2019 Kentaro Yoshioka Basic Idea of Dataset Culling • Reduces the dataset size 300x •Culls only “Easy” data; model accuracy is not harmed Total training time: 104  2.2 GPU-hours 47x improvement 
  • 9. 8© 2019 Kentaro Yoshioka What is good training data? • “Difficult” data which the model makes a lot of mistakes. • No backprop is done if the model can perfectly predict.  Does not contribute to training. • Comparing teacher-student predictions are costly.. • Can we assess from student predictions only?
  • 10. 9© 2019 Kentaro Yoshioka Difficulty assessment from confidence • Quantify good data by proposed “confidence loss” • Assesses the difficulty of prediction from the output probability. • Utilize a “pretrained” model. Car=0.49 Car=0.69 Car=0.79 Car=1.0 Person=0.19
  • 11. 10© 2019 Kentaro Yoshioka Difficulty assessment from confidence • Quantify good data by proposed “confidence loss” • Assesses the difficulty of prediction from the output probability Car=0.49 Car=0.69 Car=0.79 Car=1.0 Image Conf. Loss: 3.79  Cull images with low loss. Σ 0 0.25 0.5 0.75 1 0 0.25 0.5 0.75 1 Lconf Detection Confidence Compute loss for all detections.. Person=0.19
  • 12. 11© 2019 Kentaro Yoshioka Examples. Data kept. Data culled.
  • 13. 12© 2019 Kentaro Yoshioka Dataset culling pipeline • First, cull dataset using only the student model • Culls out majority of the data first (50x). Cull 50x
  • 14. 13© 2019 Kentaro Yoshioka Dataset culling pipeline • Then, conduct a secondary culling using both teacher-student predictions. • Contributes to boosting the trained model accuracy. • Data is culled up to 300x by the pipeline.
  • 15. 14© 2019 Kentaro Yoshioka Dataset culling pipeline Details in paper • Then, conduct a secondary culling using both teacher-student predictions. • Contributes to boosting the trained model accuracy. • Data is culled up to 300x by the pipeline.
  • 16. 15© 2019 Kentaro Yoshioka Experiments
  • 17. 16© 2019 Kentaro Yoshioka Experiment setups • Models pretrained on MS-COCO: • Student: Resnet-18 based Faster-RCNN • Teacher: Resnet-101 based Faster-RCNN • Dataset: 5 custom videos acquired from Youtube. • Train: first 24-hours • Validation: Subsequent 6-hours • Utilize teacher output as ground-truths
  • 18. 17© 2019 Kentaro Yoshioka Qualitative results RawStudent TrainStudent TrainStudent+optResolution Teacher mAP=90.2, comp=28G mAP=94.8, comp=28G mAP=81.6, comp=28G mAP=78.2, comp=28G mAP=71.3, comp=28G mAP=52.7, comp=28G Oracle, comp=1 Oracle, comp=1 Oracle, comp=1 mAP=89.6, comp=7G mAP=93.2, comp=18G mAP=80.7, comp=18G udent TrainStudent TrainStudent+optResolution Teacher mAP=90.2, comp=28G mAP=94.8, comp=28G mAP=81.6, comp=28G comp=28G comp=28G comp=28G Oracle, comp=128G Oracle, comp=128G Oracle, comp=128G mAP=89.6, comp=7G mAP=93.2, comp=18G mAP=80.7, comp=18G
  • 19. 18© 2019 Kentaro Yoshioka Quantitative Results 64 128 256 Full (86,400) No Training Mean Accuracy [mAP] 85.56 (-3.0%) 88.3 (-0.3%) 89.3 (+0.8%) 88.5 58.6 Total train time [hours] 1.9 (54x) 2.0 (50x) 2.2 (47x) 104 - Student predictions 1.54 1.54 1.54 - - Student training 0.07 0.14 0.28 96 - Teacher predictions 0.33 0.33 0.33 8 - Culled dataset size • Can cull the dataset size to 300x, without accuracy drops or even with slight improvements.
  • 20. 19© 2019 Kentaro Yoshioka Conclusions • While DSMs can reduce the inference cost, training them can take many GPU-hours. • We proposed Dataset Culling, which reduces the DSM training cost by 47x. •We found that by culling easy-to-predict data, the accuracy drop can be minimized. •Evaluated on our long-duration dataset, we saw little accuracy penalty when trained with culled datasets. •One step towards deploying DSMs to the real world  Codes and dataset available: https://github.com/kentaroy47/DatasetCulling
  • 21. 20© 2019 Kentaro Yoshioka Ablation study • Entropy implements the loss function for active learning. • Using teacher-student comparisons achieve best accuracy (Precision) • Our dataset culling pipeline with Confidence + Precision has the best tradeoff of accuracy and training time.