This is the presentation given at DLD Tel Aviv 2017. In this presentation we walk through the planning and implementation of a deep learning solution for image recognition, with a focus on the data.
It is based on the work we do at dataloop.ai and with its customers.
3. About me
Eran Shlomo
15 years of technical and innovation experience
Smartap Co-Founder and chief architect
Comentino Co-Founder and CTO
Cloud & Embedded Systems expert
Tech lead of the Intel partners program for Startups.
Dataloop.ai Co-founder & CEO
5. A Special Time in History
• ML technology is mature
• Compute power price is decreasing
• Data is the new oil
6. The Bigger Change, Scalability & Repeatability
[Diagram: classic programming takes input + a program and produces data; machine learning takes input + data and produces the program]
7. Model Objective
Computer vision models can be described by three objectives*
* We observe that traditional CV is usually needed for the data pipeline, while DL is the “core”
** Solutions are usually an ensemble of several models rather than one
9. Decisions, Decisions…
Pipeline planning
• Model ensemble plan
• Model type mapping
Performance
• FPS
• Power / thermal limitations
Environment
• Cloud/edge
• Accelerators (HW cost)
Expected accuracies
• Don’t ask… everybody wants high accuracies
• What is minimally acceptable?
10. Time for Some Hard Questions – ML2
What is the expected accuracy?
How much data is needed?
How much will it cost?
We call the answers “ML2”, and we train models to provide them.
11. Pipeline Planning
• Breaking the pipeline into the most basic units possible makes predictability much easier
• Example: build two classifiers with a and b classes rather than a single classifier with a + b classes (see the sketch after this list)
• How many classes? (aka class planning)
• Evaluate the SNR: high SNR == classification model, low SNR == segmentation
• Plan the pipeline for the most deterministic environment possible without business impact
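A minimal sketch of the split, assuming a toy Keras setup; the class counts, shapes, and architecture below are illustrative, not from the talk:

```python
# Hypothetical sketch: two small classifiers instead of one (a + b)-class model.
from tensorflow import keras

def make_classifier(num_classes):
    # Tiny placeholder CNN; the real architecture is chosen later.
    return keras.Sequential([
        keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

clf_a = make_classifier(num_classes=5)  # the "a" classes
clf_b = make_classifier(num_classes=4)  # the "b" classes
# vs. make_classifier(num_classes=9), which must separate all 9 classes at once
```

Each smaller unit is easier to train, debug, and predict accuracy for than the combined model.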
12. HW Limitations and Performance Requirements
• The pipeline is defined: we need to run X models every Y (milli)seconds
• What is the compute budget?
• Set a per-model compute budget plan (see the sketch below)
• Meet the power and thermal envelope
• You are now ready for model architecture selection
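A back-of-envelope sketch of the budget arithmetic; X, Y, and the hardware numbers below are illustrative assumptions:

```python
# Illustrative compute-budget arithmetic (all numbers are assumptions).
models_per_frame = 3        # X: models the pipeline runs per frame
frame_budget_ms = 33.3      # Y: ~30 FPS
hw_sustained_gflops = 500   # sustained throughput of the target accelerator

per_model_ms = frame_budget_ms / models_per_frame
per_model_gflop = hw_sustained_gflops * per_model_ms / 1000.0
print(f"{per_model_ms:.1f} ms, ~{per_model_gflop:.1f} GFLOP per model per frame")
# This per-model FLOP budget is what constrains the architecture selection.
```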
14. The Data Volume Illusion
• We tend to talk volumes pretty fast: data volume → data cost
• But data variance is as important as volume
• Deep learning is very good at modeling bounded patterns
• So when building a dataset:
• Consider all expected scenarios – these grow exponentially
• Each image should contain relevant information
• Quality annotation – the model is only as good as your data
• Augmentations are a free lunch (see the sketch below)
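A minimal augmentation sketch using Keras’ ImageDataGenerator; the transform ranges are illustrative choices, not recommendations from the talk:

```python
# Augmentations as a free lunch: each epoch sees fresh variants of every image
# at zero labeling cost. The transform ranges here are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random mirroring
    zoom_range=0.1,          # mild random zoom
)
# With existing arrays train_images, train_labels:
# model.fit(augmenter.flow(train_images, train_labels, batch_size=32), epochs=10)
```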
15. Exponential Data Growth
• Let’s take a self-driving car as an example. Scenario dimensions:
• Time of day
• Weather conditions
• Traffic density
• Road conditions
• …
• Now the datasets are multiplied (see the sketch after this list):
• Can the model detect a dog crossing while in a jammed junction with a green traffic light, on a rural bumpy road, on a rainy night?
• Data is the #1 cost/TTM factor in developing solutions
• The process is iterative and requires closing the data loop
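A quick sketch of why the combinations explode; the scenario dimensions and values below are illustrative:

```python
# Scenario coverage grows as the product of the options per dimension.
from itertools import product

scenarios = {
    "time_of_day": ["day", "dusk", "night"],
    "weather": ["clear", "rain", "fog", "snow"],
    "traffic_density": ["empty", "normal", "jammed"],
    "road_condition": ["smooth", "bumpy", "unpaved"],
}
combos = list(product(*scenarios.values()))
print(len(combos))  # 3 * 4 * 3 * 3 = 108 scenario combinations to cover
```

Every dimension you add multiplies, not adds to, the dataset you must collect.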
16. Data ≠ Information
• Information theory is very useful – models are information containers
• Minimal dataset – the dataset with the smallest number of items that still holds the required information
• Let’s get some intuition on information content and entropy (the same entropy as in the famous “cross-entropy loss”)
17. Information Content
• Shannon defined an information content function I(p) that satisfies the following, given an event with probability p:
• I(p) is anti-monotonic in p – increases and decreases in the probability of an event produce decreases and increases in information, respectively
• I(p) ≥ 0 – information is a non-negative quantity
• I(1) = 0 – events that always occur do not communicate information
• I(p1 · p2) = I(p1) + I(p2) – information due to independent events is additive
• The function I(p) = log(1/p) satisfies the above requirements of information behavior
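A quick numeric check of these properties, using base-2 logs so the results are in bits (the slides leave the base implicit):

```python
import math

def info_content(p):
    # Shannon information content: I(p) = log2(1/p), in bits.
    return math.log2(1.0 / p)

print(info_content(1.0))   # 0.0 -> certain events carry no information
print(info_content(0.5))   # 1.0 -> a fair coin flip carries one bit
# Additivity for independent events: I(p1 * p2) == I(p1) + I(p2)
print(math.isclose(info_content(0.5 * 0.25),
                   info_content(0.5) + info_content(0.25)))  # True
```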
18. Information Content – Example
I have a 4x4 card with a randomly selected number; you try to guess it by going serially through 1, 2, 3, …
You start at 1 and get a miss. What are the odds? 15/16.
How much information did you get from this result? log₂(16/15) ≈ 0.093
Rounds 2 and 3 yield log₂(15/14) (≈0.100) and log₂(14/13) (≈0.107), respectively.
Magic happens in round 4: we get log₂(13), a ≈3.7-bit information spike.
What is the sum of all of these?
What information is added in rounds 5, 6, …?
[4x4 card figure showing the numbers 1–16, with the chosen number hidden]
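Reproducing the card example numerically (base-2 logs):

```python
import math

# The 4x4 card holds N = 16 numbers; the hidden number turns out to be 4,
# so rounds 1-3 are misses and round 4 is the hit.
N = 16
total_bits = 0.0
for k in range(1, 5):
    remaining = N - (k - 1)          # candidates left before round k
    if k < 4:
        bits = math.log2(remaining / (remaining - 1))  # a miss
    else:
        bits = math.log2(remaining)                    # the hit, among 13
    total_bits += bits
    print(f"round {k}: {bits:.3f} bits")

print(f"total: {total_bits:.3f} bits")  # exactly log2(16) = 4 bits
# Rounds 5, 6, ... are certain outcomes, so each adds I(1) = 0 bits.
```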
19. Information Content – Example (cont’d)
Given a series of binary miss/hit samples while serially guessing a randomly chosen event i out of N:
Added information: log(N/(N−1)) + log((N−1)/(N−2)) + ⋯ + log((N−i+1)/1) + 0 + 0 + ⋯
The sum telescopes: [log N − log(N−1)] + [log(N−1) − log(N−2)] + ⋯ + log(N−i+1) = log N
Let’s go back to datasets: can we apply this to minimal dataset estimation?
20. So Your Dataset Is Ready
• Go and train on it
• It doesn’t meet your goals – time to debug
• In general, debugging a NN is an experimental process
21. Debug Actions
Start with the trivial:
• Have you tried several architectures, depths, activation functions, …?
• Are your classes balanced (also information-wise)?
• Is your data clean?
• Full retrain
• Are you overfitting? Overfitting ⇔ dataset information content < model information capacity (see the sketch below)
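A hedged sketch of the overfitting check; the gap threshold and the Keras-style history access are illustrative assumptions:

```python
# Heuristic overfitting check: a large train/validation gap suggests the model
# is memorizing (dataset information content < model information capacity).
def looks_overfit(train_acc, val_acc, gap=0.10):
    return train_acc - val_acc > gap

# With a Keras-style History object from model.fit(...):
# train_acc = history.history["accuracy"][-1]
# val_acc = history.history["val_accuracy"][-1]
print(looks_overfit(train_acc=0.99, val_acc=0.81))  # True -> likely overfitting
```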
22. Debug Actions – cont’d
Output the confusion matrix – this is your final priority list.
• Can you separate the confused items yourself?
• Compare activation heatmaps – it is very hard to identify a separating filter
• Dimensionally reduce your feature vectors, cluster and plot – are they separable? (see the sketch after this list)
• Increase confusion balance
• Increase confusion augmentation
• Merge classes
• Create a null class
• Add controlled noise
• Accept it as the final accuracy
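A scikit-learn sketch of the first two checks; the labels and feature vectors below are random stand-ins for real validation outputs:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA

# Random stand-ins; in practice these come from your validation set.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=200)
y_pred = np.where(rng.random(200) < 0.8, y_true, rng.integers(0, 4, size=200))
features = rng.normal(size=(200, 128))   # per-sample model feature vectors

cm = confusion_matrix(y_true, y_pred)
print(cm)                                 # the priority list: off-diagonal cells

off_diag = cm - np.diag(np.diag(cm))
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"most confused pair: true class {i} predicted as {j}")

# Dimensionally reduce the feature vectors, then cluster/plot the confused
# classes to see whether they are separable at all (t-SNE/UMAP also work).
xy = PCA(n_components=2).fit_transform(features)
```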
23. Summary
• This is a newborn field, based on experiments and rich with brute force
• It works…
• At Dataloop we are formalizing the process and building the platform to match the development process