merantix.com Adrian Locher
Establishing the future of AI in Europe
Berlin AI Dr. Rasmus Rothe May 10, 2017
3 learnings from applying Deep Learning to real world problems
Merantix GmbH, Berlin
HackZurich 2017: Sep 15 - 17, 2017
Quick reminder: Deep Learning
Neural networks in real world applications
Facebook face recognition
Neural networks in autonomous driving
Companies working on deep learning
How we work at Merantix
Dataset
Ventures
Products
Machine Learning
3 learnings
It is actually more difficult than in theory...
First learning:
Value of pretraining
Problem: Datasets are expensive
Example 1, medical diagnostics: cost of annotating 10’000 medical images
— 30min required per labelled image
— 100 EUR/hour
— 2 images/hour
— 50 EUR/image
Total: EUR 500’000
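The arithmetic behind the EUR 500’000 figure can be checked in a few lines. This is only a worked calculation using the numbers on the slide; the variable names are illustrative.

```python
# Inputs taken from the slide; variable names are illustrative only.
num_images = 10_000
minutes_per_image = 30
hourly_rate_eur = 100

images_per_hour = 60 / minutes_per_image                 # 2 images/hour
cost_per_image_eur = hourly_rate_eur / images_per_hour   # 50 EUR/image
total_cost_eur = num_images * cost_per_image_eur         # EUR 500,000

print(f"{images_per_hour:.0f} images/hour, "
      f"{cost_per_image_eur:.0f} EUR/image, "
      f"total EUR {total_cost_eur:,.0f}")
```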
Example 2 credit scoring: Cost of knowing if someone defaults
— To estimate default risk, labels of
defaulted people are required
— You can only get them if you let them
default
EUR 10’000/d
Assuming average default volume of EUR 10K
Pretraining is the solution!
1. Pretraining with cheap but large datasets on a related domain
2. Fine-tuning with well-labeled data
Performance boost!!
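The two steps can be sketched with a toy logistic regression in plain Python. This is a minimal illustration, not the setup used in the talk: the two labeling functions, the dataset sizes, and the learning rates are all assumptions chosen to keep the example small, and whether fine-tuning helps in practice depends on how closely the two domains are related.

```python
import math
import random

def predict(w, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def sgd(w, data, epochs, lr):
    """Plain SGD on the logistic loss; returns an updated copy of w."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def accuracy(w, data):
    return sum((predict(w, x) > 0.5) == (y == 1) for x, y in data) / len(data)

def make_data(n, label_fn, rng):
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((a, b, 1.0), label_fn(a, b)))  # last feature is a bias term
    return data

def related_label(a, b):
    """Assumed cheap task on a related domain."""
    return int(a + b > 0)

def target_label(a, b):
    """Assumed expensive target task."""
    return int(a + 0.5 * b > 0.1)

rng = random.Random(0)
pretrain_set = make_data(2000, related_label, rng)  # large and cheap
finetune_set = make_data(20, target_label, rng)     # small and well labeled
test_set = make_data(500, target_label, rng)

# Step 1: pretrain on the large related dataset.
w_pre = sgd([0.0, 0.0, 0.0], pretrain_set, epochs=3, lr=0.1)
# Step 2: fine-tune the pretrained weights on the small target dataset.
w_ft = sgd(w_pre, finetune_set, epochs=20, lr=0.05)

print("pretrained only :", accuracy(w_pre, test_set))
print("after fine-tune :", accuracy(w_ft, test_set))
```

The key design point is that step 2 starts from `w_pre` rather than from zeros, so the small labeled set only has to adjust an already useful model.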
How to get data for pretraining
Crawl data
Public datasets
Pretrained models
[Figure: example images from the IMDB and WIKI datasets with numeric labels]
Weakly labeled data: Medical imaging
We don’t have labeled data so we get the labels from medical reports
We extract text labels via NLP and use them for training
How do we do this?
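1. Condition  2. Prognosis
Keine Pleuraerguss in der linken Lunge (No pleural effusion in the left lung)
Keine Erguss in der linken Lunge (No effusion in the left lung)
Keine Pleuraergusses in der linken Lunge (No pleural effusion in the left lung)
Keine Randwinkelerguss in der rechte Lunge (No costophrenic angle effusion in the right lung)
Keine Erguß in der Lunge (No effusion in the lung)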
Word embeddings help to come up with smart rules
If “Kein”/“Keine” (“no”) → NO_EXISTENCE
If “Einige Beweise” (“some evidence”) → SMALLER_EXISTENCE
Else → DEFINITE_EXISTENCE
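The three rules above can be sketched as a small rule-based labeler. The function name and the regular expressions are assumptions made for illustration; the slide only states the rules themselves, and a production system would need many more patterns.

```python
import re

# Rules as stated on the slide; the regular expressions are an
# illustrative assumption about how they might be matched.
NEGATION = re.compile(r"\bkeine?\b", re.IGNORECASE)          # "Kein" / "Keine"
WEAK_EVIDENCE = re.compile(r"\beinige beweise\b", re.IGNORECASE)

def existence_label(sentence: str) -> str:
    """Map one report sentence to a weak label for the mentioned condition."""
    if NEGATION.search(sentence):
        return "NO_EXISTENCE"
    if WEAK_EVIDENCE.search(sentence):
        return "SMALLER_EXISTENCE"
    return "DEFINITE_EXISTENCE"

# The five example sentences from the slide, spelling variants included.
report_sentences = [
    "Keine Pleuraerguss in der linken Lunge",
    "Keine Erguss in der linken Lunge",
    "Keine Pleuraergusses in der linken Lunge",
    "Keine Randwinkelerguss in der rechte Lunge",
    "Keine Erguß in der Lunge",
]

for sentence in report_sentences:
    print(existence_label(sentence), "<-", sentence)
```

All five slide sentences start with “Keine”, so each one is labeled NO_EXISTENCE regardless of how the condition itself is spelled.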
Second learning:
Caveats of real label distributions
Academic datasets are balanced
Example 1: MNIST - equally many samples per digit
Example 2: Food-101 - perfectly balanced
[Figure: training set and test set]
Real world datasets are not...
Credit scoring: 1-2% of people default
Medical imaging: luckily, the majority of people are healthy
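One consequence of such skewed labels is that plain accuracy becomes misleading. The sketch below assumes a hypothetical portfolio of 10’000 applicants and the upper end of the 1-2% default rate, and evaluates a trivial model that never predicts a default.

```python
n_applicants = 10_000   # hypothetical portfolio size
default_rate = 0.02     # upper end of the 1-2% on the slide

n_defaults = round(n_applicants * default_rate)
labels = [1] * n_defaults + [0] * (n_applicants - n_defaults)  # 1 = default

predictions = [0] * n_applicants  # trivial model: nobody ever defaults

accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_applicants
caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = caught / n_defaults

print(f"accuracy = {accuracy:.2%}, recall on defaults = {recall:.0%}")
# prints: accuracy = 98.00%, recall on defaults = 0%
```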
And: Making mistakes can be expensive
Credit scoring: decision (Accept / Reject) vs. outcome (Paid / Defaulted), with cells marked $ and $$$$$
Medical imaging: decision (Diagnosed / Not diagnosed) vs. condition (Healthy / Sick)
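When the two kinds of mistake have different prices, the decision can be made by comparing expected costs instead of picking the most likely class. The sketch below is an assumption-laden illustration for the credit-scoring case: the cost values (1 and 5, echoing the "$" and "$$$$$" on the slide) and their assignment to the two error types are hypothetical.

```python
# Hypothetical costs in arbitrary units; the slide only shows "$" and "$$$$$".
COST_REJECT_GOOD = 1.0  # assumed: profit missed by rejecting someone who would have paid
COST_ACCEPT_BAD = 5.0   # assumed: loss from accepting someone who then defaults

def expected_cost(decision: str, p_default: float) -> float:
    """Expected cost of a decision given the predicted default probability."""
    if decision == "accept":
        return p_default * COST_ACCEPT_BAD
    return (1.0 - p_default) * COST_REJECT_GOOD

def best_decision(p_default: float) -> str:
    """Pick the decision with the lower expected cost."""
    return min(("accept", "reject"), key=lambda d: expected_cost(d, p_default))

for p in (0.02, 0.10, 0.25, 0.50):
    print(f"p_default = {p:.2f} -> {best_decision(p)}")
```

With these assumed costs the break-even point is a default probability of 1/6, well below the 50% threshold that a cost-blind classifier would use.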
How to cope with this
1. More data
[Figure: three samples labeled “Sick”]
2. Change labeling
Be careful
Training: Rare class A, Rare class B, Frequent class
Inference: Rare class A & B, Frequent class
How to cope with this
3. Sampling
Oversampling, undersampling, negative mining
[Figure labels: Easy, Hard]
4. Weighting
Training batch → weighting of loss
[Figure label: Hard]
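Sampling and weighting can both be sketched in plain Python. The helper names below are hypothetical, and inverse-frequency class weights are one common choice rather than something the slides specify; negative mining is not covered here.

```python
import math
import random
from collections import Counter

def oversample(samples, labels, rng):
    """Random oversampling: repeat rare-class samples until all classes are equally frequent."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    rng.shuffle(balanced)
    return balanced

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

def weighted_log_loss(probs, labels, weights):
    """Class-weighted binary cross-entropy; probs are P(label == 1)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= weights[y] * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights[y] for y in labels)

labels = [1] * 2 + [0] * 98   # 2% rare class, as in credit scoring
samples = list(range(100))    # stand-ins for real feature vectors

balanced = oversample(samples, labels, random.Random(0))
weights = class_weights(labels)
probs = [0.02] * 100          # a model that always predicts the base rate

print(Counter(y for _, y in balanced))
print(weights)
print(weighted_log_loss(probs, labels, {0: 1.0, 1: 1.0}))  # unweighted
print(weighted_log_loss(probs, labels, weights))           # rare class counts more
```

Oversampling changes what the model sees in each training batch, while weighting leaves the data untouched and scales the loss contribution of each class instead.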
Third learning:
Understanding black box models
Neural networks are black boxes
Linear regression / decision trees:
Decision mechanism can be easily explained
Neural networks:
Complex systems are hard to understand!
In reality: 100M+ parameters...
This is problematic in the real world! Why?
[Figure: images labeled King penguin, Starfish, Baseball, Electric guitar]
Panda (57.7% confidence) + ε = Gibbon (99.3% confidence)
Can the neural network be fooled?
Does it really work in production?
This is problematic in the real world! Why?
Why DIDN’T it work? What biases does it learn?
Our Picasso Visualizer in practice
Partial occlusion
Saliency map
Soon to be open-sourced!
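The general idea behind a partial-occlusion map can be sketched as follows: cover one patch of the image at a time and record how much the model's score drops. This is a minimal illustration of the technique, not the Picasso implementation; `toy_score` is a hypothetical stand-in for a trained network, and the patch size and fill value are assumptions.

```python
def occlusion_saliency(image, score_fn, patch=2, fill=0.0):
    """Return a map of score drops caused by occluding each patch of the image."""
    height, width = len(image), len(image[0])
    base_score = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, then blank out one patch
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base_score - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop  # large drop = important region
    return saliency

def toy_score(image):
    """Hypothetical model that only looks at the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0

image = [[1.0] * 4 for _ in range(4)]
saliency = occlusion_saliency(image, toy_score)

for row in saliency:
    print(row)
```

In this toy case only the top-left patch shows a non-zero drop, which reveals where the stand-in model is actually looking.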
Join us on our journey
1. Science: Research on the bleeding edge of deep learning.
2. Datasets: Get access to some of the best datasets in the world.
3. Business: Grow businesses in the space of AI/deep learning.
Website: merantix.com
Contact: Dr. Rasmus Rothe, rasmus@merantix.com
Social: Twitter: @merantix, GitHub: merantix
