5 Practical Steps to
Successful
Deep Learning Research
Amir Alush, PhD
Co-founder & CTO
Brodmann17
Founded in 2016; a 19+ person team, mostly M.Sc. / Ph.D. machine learning researchers.
Backed by: lool Ventures, Maniv Ventures, Sony Innovation & SamsungNEXT
Brodmann17 has independently designed its deep learning technology from scratch
(patents pending), with optimal performance and accuracy by design.
Brodmann17 is developing perception software for the world’s largest Tier-1
automotive suppliers, for pre-install/aftermarket ADAS & autonomous driving.
Things I’ll Talk About
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
Step 1: Set Your Requirements
Open research can lead to great new products, but it’s also risky.
Always keep some of it alive!
Product-oriented research must have clear requirements:
● What is the task?
● What is the data?
● What is the target platform? CPU (ARM/Intel), GPU (ARM/NVIDIA)
Step 1: Set Your Requirements (example)
Smart Doorbell “requirements”:
● Task:
○ Alert when a human appears (once), with 98% recall and at most 1 false alarm per week
○ 0.3-1.5 meters distance from the camera
○ Full body / upper body only
○ Unique ID per person
● Input:
○ 720p RGB images, 30 fps
○ Camera height: 1.5 meters from the ground
● Platform & Run-Time:
○ Raspberry Pi 3, 1x A53 ARM CPU
○ 0.5 sec latency from appearance to alert
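A minimal sketch of how such requirements might be kept as a machine-readable config (Python; the field names are illustrative, the values come from the slide above, and none of this is real Brodmann17 code), so that data collection, annotation, and the evaluation metric can all refer to a single source:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductRequirements:
    task: str = "person_alert"
    min_recall: float = 0.98                 # alert on >= 98% of appearances
    max_false_alarms_per_week: int = 1
    distance_range_m: tuple = (0.3, 1.5)     # operating distance from the camera
    input_resolution: tuple = (1280, 720)    # 720p RGB
    input_fps: int = 30
    camera_height_m: float = 1.5
    platform: str = "raspberry_pi_3_1xA53"
    max_latency_s: float = 0.5               # appearance-to-alert latency

REQUIREMENTS = ProductRequirements()
print(REQUIREMENTS.min_recall, REQUIREMENTS.max_latency_s)
```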
Things I’ll Talk About
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
Step 2: Data Collection
Data collection is a long and expensive process:
● Long: proprietary setup, requires variability, takes its time, and may
depend on another company
● Expensive: buying data, special setups, storage, management, etc.
You should:
● Start early
● Collect the right data. Plan thoughtfully; the wrong data could hold back
your product release
Step 2: Data Collection (quantity)
How much data do I need?
● Quantity is important, but it comes with a price tag and takes time
● Quality is more important
It’s a continuous process:
1. Start with a small subset for a fast POC to reduce risk
2. Increase the collection rate
3. Keep collecting data to improve the research metrics
Also worth keeping in mind:
● Academic datasets
● Synthesizing data
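As a small illustration of point 1 above, a reproducible POC subset could be sampled from whatever has been collected so far; the fraction, seed, and clip names below are assumptions for the sketch:

```python
import random

# Hypothetical sketch: carve a small, fixed POC subset out of the collected clips
# before scaling up collection. A fixed seed keeps the subset reproducible.
def poc_subset(all_clips, fraction=0.05, seed=17):
    rng = random.Random(seed)
    k = max(1, int(len(all_clips) * fraction))
    return rng.sample(all_clips, k)

clips = [f"clip_{i:04d}.mp4" for i in range(1000)]  # illustrative clip IDs
print(len(poc_subset(clips)))                       # 50
```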
Step 2: Data Collection (quality)
Meet the product requirements:
● Same modality
● Cover the expected operating-condition distribution
(scene appearance, object appearance, viewpoint, etc.)
Would you use Pascal VOC / COCO for the Smart Doorbell?
Step 2: Data Collection (quality)
Meet the product requirements:
● Same modality
● Cover the expected operating-condition distribution
(scene appearance, object appearance, viewpoint, etc.)
Doorbell camera example
[Images: doorbell-camera sample frames — four marked unsuitable (X), one marked OK]
Step 2: Data Collection (quality)
Meet the product requirements:
● Same modality
● Cover the expected operating-condition distribution (e.g. scene appearance,
object appearance, viewpoint, scene type)
Traffic monitoring application
[Images: traffic-monitoring sample frames — five marked unsuitable (X), one marked OK]
Step 2: Data Collection (quality)
Data with variability:
● Collecting a correlated data set is easy, but it is not what you want
● Collect data under different conditions: e.g. location, time of day, season,
weather
● Collect data from multiple sources (cameras, devices)
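One hedged way to act on these bullets is to audit clip metadata against a target mix of conditions; the field names and target shares below are made up for illustration:

```python
from collections import Counter

# Hypothetical sketch: compare the collected clips against the condition mix we want
# covered, assuming each clip carries a small metadata record.
TARGET_SHARE = {"day": 0.6, "night": 0.3, "dusk": 0.1}

def coverage_report(clips, field="time_of_day"):
    counts = Counter(clip[field] for clip in clips)
    total = max(1, sum(counts.values()))
    return {cond: counts.get(cond, 0) / total for cond in TARGET_SHARE}

clips = [{"time_of_day": "day"}, {"time_of_day": "day"}, {"time_of_day": "night"}]
print(coverage_report(clips))   # day ~0.67, night ~0.33, dusk 0.0 — compare with TARGET_SHARE
```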
Things I’ll Talk About
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
Step 3: Data Annotation (Quantity)
Annotation is more expensive and time-consuming than collection:
● It can cost up to several dollars per frame!
● Understand what you will need in the research phase
Step 3: Data Annotation (Quantity)
Choose what data to annotate:
● You should not annotate all your data
● Annotate quality data
It’s a continuous process:
● Start with a small subset and a fixed annotation scheme
● Increase the annotation rate
Step 3: Data Annotation (Quality)
Supervised learning:
● This is the actual data your models are trained on
● Your model will only be as good as your data!
Annotation guidelines are derived from the product requirements:
● Usually not straightforward
● Should be finely detailed
● New annotation schemes / re-annotation / cleaning to improve the
research metrics
Step 3: Data Annotation (Quality)
How would you annotate this person?
Step 3: Data Annotation (Quality)
How would you annotate this face?
Consistency and clarity are important:
● Not to confuse your learning process
● Not to confuse your annotators
● Not to fail your research evaluation metric
● Other algorithms depend on this annotation
Step 3: Data Annotation (Quality)
How would you annotate these objects?
Step 3: Data Annotation (Quality)
Quality assurance:
● Several annotators per item → costly
● Familiar annotators (known by name) are a good choice
● A tight definition of the task
● Automatic validation (sketched below)
● Simple tasks, or pre-processing to simplify them
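A minimal sketch of the “automatic validation” bullet, assuming boxes are stored as (x1, y1, x2, y2) with a label; the image size and label set follow the doorbell example, and everything else is an assumption:

```python
# Hypothetical sanity checks that catch broken annotations before they reach training.
def validate_box(ann, img_w=1280, img_h=720, min_side=8,
                 allowed_labels=("full_body", "upper_body")):
    x1, y1, x2, y2 = ann["bbox"]
    return (
        0 <= x1 < x2 <= img_w and                            # inside the image, positive width
        0 <= y1 < y2 <= img_h and                            # inside the image, positive height
        (x2 - x1) >= min_side and (y2 - y1) >= min_side and  # not degenerate
        ann["label"] in allowed_labels                       # only labels the guidelines allow
    )

print(validate_box({"bbox": (100, 200, 180, 420), "label": "full_body"}))  # True
print(validate_box({"bbox": (100, 200, 100, 420), "label": "full_body"}))  # False (zero width)
```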
Step 3: Data Annotation (Costs)
Optimize costs & throughput:
● Bootstrap to initialize/prioritize annotation (sketched below)
● Use temporal information
● Use any available information
● Pre-process to simplify tasks
● Build your own annotation infrastructure, or use a 3rd party?
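A possible sketch of the “bootstrap” bullet: use any available detector to score unlabeled frames and send the least confident ones to annotators first. The detector here is a stand-in callable, not a specific library API:

```python
# Hypothetical sketch: prioritize frames for human annotation by detector confidence.
def prioritize_for_annotation(frames, detector, budget=1000):
    scored = []
    for frame in frames:
        detections = detector(frame)                      # [(label, confidence, bbox), ...]
        confidence = min((c for _, c, _ in detections), default=0.0)
        scored.append((confidence, frame))
    scored.sort(key=lambda item: item[0])                 # least confident first
    return [frame for _, frame in scored[:budget]]

fake_detector = lambda frame: [("person", 0.4, (0, 0, 50, 120))]   # illustrative stub
print(prioritize_for_annotation(["f1.png", "f2.png"], fake_detector, budget=1))
```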
Things I’ll Talk About
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
Step 4: Research Evaluation Metric
Thus far we have:
1. Product requirements
2. An initial data collection + annotation strategy
Before you start your research experiments, set a research
evaluation metric (a single number*).
*Andrew Ng
Step 4: Research Evaluation Metric
● There are many ways to evaluate an experiment:
○ e.g. TPR, aDR, FPR, mAP, latency, etc.
○ Improving one metric can lower another
● It’s more efficient (time & resources) to advance with a clear target
[Charts: evaluation metric vs. time/resources — when optimizing for a single evaluation metric the requirements are achieved; when optimizing for several evaluation metrics at once, reaching the requirements is uncertain]
Steps 1-4 Overview by Example
Smart Doorbell example:
Step 1 - Product Requirements
Step 2 - Data Collection:
● 720p RGB videos, 1.5 m camera height
● 20% “no objects” videos, 80% “with objects” videos
Step 3 - Data Annotation:
● Annotate only objects up to 1.5 m away
● Full-body + upper-body-only bounding boxes
● Annotate 5% of the videos in full, 95% sampled
Step 4 - Evaluation Metric:
● 98% recall (with at most 1 false positive per week and <500 ms latency) for the object detection task
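One way to turn this into “single number” code is to treat recall as the optimizing metric and the false-positive and latency budgets as hard constraints; this sketch follows the doorbell numbers but is an illustration, not the talk’s exact definition:

```python
# Hypothetical single research evaluation metric for the doorbell example.
def evaluation_metric(recall, false_positives_per_week, latency_s,
                      max_fp_per_week=1.0, max_latency_s=0.5):
    if false_positives_per_week > max_fp_per_week or latency_s > max_latency_s:
        return 0.0          # a hard constraint is violated: the experiment fails
    return recall           # otherwise rank experiments by recall alone

print(evaluation_metric(0.985, 0.7, 0.42))   # 0.985 -> meets the requirements
print(evaluation_metric(0.990, 3.0, 0.42))   # 0.0   -> too many false positives
```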
Things I’ll Talk About
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
Step 5: Research
[Diagram: the research loop — Research Experiments → Error Analysis → Data (collection/annotation) → back to Research Experiments]
Step 5: Research
Applied research is an empirical process.
1. Research Experiments Phase:
● Deep Learning architectures
● Learning hyperparameters
● Data manipulations
● Other ...
Step 5: Research
Applied research is an empirical process.
2. Analysis Phase:
● Split data into train / validation / test
● Bias / variance analysis (on the validation and training data)*
● Rank the factors that impact the evaluation metric the most (on the validation data):

Feature             %Error   Priority
Wrong annotation    25%      1
Close objects       20%      2
Truncated objects   3%       3
Umbrellas           1%       4
...                 ...      ...

* https://kevinzakka.github.io/2016/09/26/applying-deep-learning/
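A sketch of how such a ranking might be produced: tag every validation error with a suspected cause and rank causes by their share of all errors (the counts below simply mirror the table above and are not real data):

```python
from collections import Counter

# Illustrative tags: one per error case found on the validation set.
error_tags = (["wrong_annotation"] * 25 + ["close_objects"] * 20 +
              ["truncated_objects"] * 3 + ["umbrellas"] * 1)

def rank_error_causes(tags, total_errors):
    counts = Counter(tags)
    return [(cause, 100.0 * n / total_errors) for cause, n in counts.most_common()]

for priority, (cause, pct) in enumerate(rank_error_causes(error_tags, total_errors=100), 1):
    print(f"{priority}. {cause}: {pct:.0f}%")   # 1. wrong_annotation: 25% ...
```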
Step 5: Research
Applied research is an empirical process.
3. Data Phase:
● Clean data / re-annotate / change the annotation scheme
● Data Collection
Step 5: Research
Applied research is an empirical process.
Next Iteration:
● What to explore / fix next in our Deep Learning models
Step 5: Research
The research phase is very resource-demanding:
● Researchers
● Compute
● Time
Optimize these in order to shorten the time to product:
● Researchers → Increase productivity
● Compute → reduce costs
Step 5: Research
Running an experiment involves:
● Planning the experiment (ok)
● Setting up a compute environment (overhead)
● Data selection, preprocessing, and fetching (overhead)
● Monitoring and periodic evaluation (overhead)
● Managing a pipeline of algorithms (overhead)
● Saving intermediate results (overhead)
Step 5: Research
Running many experiments doesn’t scale by itself:
● Managing compute-environment resources and prioritization
● Monitoring many experiments
● Analysing the results of many experiments
● Experiment traceability over time: versioning of code, data, and experiment configuration
A dedicated infrastructure and management system is needed to:
● Manage shared resources
● Orchestrate the training of the different models
● Monitor the various experiments, training configurations, and models
● Build complicated algorithm pipelines and run them effortlessly
Build your own or use a 3rd party?
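Whichever way you go, the minimum for traceability is a per-run record of code version, configuration, and metric; a rough sketch, assuming the code lives in a git repository (paths and fields are illustrative, not any specific tool’s API):

```python
import json, subprocess, time
from pathlib import Path

# Hypothetical sketch: record what was run, on which code version, with which config,
# and what score it achieved, so experiments stay comparable over time.
def log_experiment(config, metric_value, out_dir="experiments"):
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H-%M-%S"),
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
        "config": config,                    # architecture, hyperparameters, data version
        "evaluation_metric": metric_value,   # the single research metric
    }
    out_path = Path(out_dir) / f"run_{record['timestamp']}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(record, indent=2))
    return out_path
```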
Topics Covered
1. Requirements
2. Data Collection
3. Data Annotation
4. Research Evaluation Metric
5. Research
We are always looking for new talent.
Passionate about AI and want to explore more?
We invite you to join us on our journey!
For job opportunities:
https://www.linkedin.com/company/brodmann17/
THANK YOU
Amir Alush, PhD - Co-founder & CTO
amir@brodmann17.com
