3 learnings from applying Deep Learning to real world problems


Regrettably, datasets in the wild are much less clean than those in academia. At Merantix we apply Deep Learning to real-world problems. In my talk at the Berlin.AI event on May 10, 2017, I shared 3 key learnings.

Published in: Data & Analytics

  1. merantix.com; Adrian Locher; Establishing the future of AI in Europe; Berlin AI; Dr. Rasmus Rothe; May 10, 2017
  2. 3 learnings from applying Deep Learning to real-world problems. Merantix GmbH, Berlin
  3. HackZurich 2017: Sep 15 - 17, 2017
  4. Quick reminder: Deep Learning
  5. Neural networks in real-world applications: Facebook face recognition; neural networks in autonomous driving; companies working on deep learning
  6. How we work at Merantix: Dataset, Ventures, Products, Machine Learning
  7. 3 learnings
  8. It is actually more difficult than in theory...
  9. First learning: Value of pretraining
  10. Problem: Datasets are expensive. Example 1, medical diagnostics: cost of annotating 10’000 medical images. 30 min required per labelled image; 100 EUR/hour; 2 images/hour; 50 EUR/image; total EUR 500’000. Example 2, credit scoring: cost of knowing whether someone defaults. To estimate default risk, labels of defaulted people are required, and you can only get them if you let them default. EUR 10’000/d, assuming an average default volume of EUR 10K.
  11. Pretraining is the solution! (1) Pretraining with cheap but large datasets from a related domain. (2) Fine-tuning with well-labeled data. Performance boost!
  12. How to get data for pretraining: crawled data, public datasets, pretrained models. [Figure: IMDB, WIKI; sample labels 25, 36, 14, 51, 66, 34, 54, 18]
  13. Weakly labeled data: Medical imaging. We don’t have labeled data, so we get the labels from medical reports: we extract text labels via NLP and use them for training. How do we do this? (1) Condition, (2) Prognosis. Example report phrases (German, reproduced verbatim; all are variants of "no (pleural) effusion in the lung"): “Keine Pleuraerguss in der linken Lunge”, “Keine Erguss in der linken Lunge”, “Keine Pleuraergusses in der linken Lunge”, “Keine Randwinkelerguss in der rechte Lunge”, “Keine Erguß in der Lunge”. Word embeddings help to come up with smart rules: if “Kein”/“Keine” ("no") → NO_EXISTENCE; if “Einige Beweise” ("some evidence") → SMALLER_EXISTENCE; else → DEFINITE_EXISTENCE.
  14. Second learning: Caveats of real label distributions
  15. Academic datasets are balanced. Example 1: MNIST, equally many samples per digit. Example 2: Food 101, perfectly balanced. [Figure: training set and test set samples]
  16. Real-world datasets are not... Credit scoring: 1-2% of people default. Medical imaging: luckily, the majority of people are healthy.
  17. And: making mistakes can be expensive. Credit scoring: Accept / Reject against Paid / Defaulted ($ versus $$$$$). Medical imaging: Diagnosed / Not diagnosed against Healthy / Sick.
  18. How to cope with this: 1. More data. 2. Change labeling. Training: Rare class A, Rare class B, Frequent class. Inference: Rare class A & B, Frequent class. Be careful. [Figure: samples labeled Sick]
  19. How to cope with this: 3. Sampling: oversampling, undersampling, negative mining (labeled Easy / Hard on the slide). 4. Weighting: training batch, weighting of loss (labeled Hard on the slide).
  20. Third learning: Understanding black box models
  21. Neural networks are black boxes. Linear regression / decision trees: the decision mechanism can be easily explained. Neural networks: complex systems are hard to understand! In reality: 100M+ parameters.
  22. This is problematic in the real world! Why? [Figure: images labeled King penguin, Starfish, Baseball, Electric guitar] Panda (57.7% confidence) + E = Gibbon (99.3% confidence). Can the neural network be fooled? Does it really work in production?
  23. This is problematic in the real world! Why? Why DIDN’T it work? What biases does it learn?
  24. Our Picasso Visualizer in practice: partial occlusion, saliency map. Soon to be open-sourced!
  25. Join us on our journey. (1) Science: research on the bleeding edge of deep learning. (2) Datasets: get access to some of the best datasets in the world. (3) Business: grow businesses in the space of AI/deep learning.
  26. Website: merantix.com. Social: Twitter @merantix, Github merantix. Contact: Dr. Rasmus Rothe, rasmus@merantix.com
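
The annotation-cost figures on slide 10 can be checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted on the slide; the function name `annotation_cost_eur` is made up for illustration.

```python
def annotation_cost_eur(n_images, minutes_per_image, hourly_rate_eur):
    """Total labeling cost, given the time per image and the annotator's hourly rate."""
    hours = n_images * minutes_per_image / 60
    return hours * hourly_rate_eur


# Figures from slide 10: 10'000 images, 30 min per image, 100 EUR/hour.
total = annotation_cost_eur(10_000, 30, 100)
per_image = total / 10_000
print(total, per_image)
```

With the slide's inputs this gives 50 EUR per image and EUR 500'000 in total, matching the slide.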
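
The three keyword rules on slide 13 could be sketched roughly as follows. This is a minimal illustration, not Merantix's actual pipeline: the slide only names the rules and the three label strings, so the function name `weak_label`, the lowercasing, and the exact regular expression are assumptions.

```python
import re

NO_EXISTENCE = "NO_EXISTENCE"
SMALLER_EXISTENCE = "SMALLER_EXISTENCE"
DEFINITE_EXISTENCE = "DEFINITE_EXISTENCE"


def weak_label(sentence: str) -> str:
    """Assign a weak label to one report sentence using the slide's keyword rules."""
    text = sentence.lower()
    # Rule 1: "Kein" / "Keine" as a whole word means the finding is absent.
    if re.search(r"\bkeine?\b", text):
        return NO_EXISTENCE
    # Rule 2: "Einige Beweise" ("some evidence") means a weaker finding.
    if "einige beweise" in text:
        return SMALLER_EXISTENCE
    # Rule 3: otherwise treat the finding as present.
    return DEFINITE_EXISTENCE


for phrase in [
    "Keine Pleuraerguss in der linken Lunge",
    "Keine Erguß in der Lunge",
    "Pleuraerguss in der linken Lunge",
]:
    print(phrase, "->", weak_label(phrase))
```

Because the rules key only on the negation word, the spelling variants of the finding itself (Erguss, Erguß, Pleuraergusses) all map to the same label.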
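
Slide 17 points out that the two kinds of mistake do not cost the same. One common way to act on that, shown here as a hedged sketch rather than the method from the talk, is to accept a credit applicant only when the expected value is positive. The 1:5 gain-to-loss ratio mirrors the "$" versus "$$$$$" on the slide but is otherwise an assumption, as are the function names.

```python
def expected_value(p_default, gain_if_paid, loss_if_default):
    """Expected profit of accepting an applicant with the given default probability."""
    return (1 - p_default) * gain_if_paid - p_default * loss_if_default


def should_accept(p_default, gain_if_paid=1.0, loss_if_default=5.0):
    """Accept only if the expected profit is positive (assumed 1:5 gain/loss ratio)."""
    return expected_value(p_default, gain_if_paid, loss_if_default) > 0


def break_even_probability(gain_if_paid, loss_if_default):
    """Default probability at which accepting and rejecting are worth the same."""
    return gain_if_paid / (gain_if_paid + loss_if_default)


print(should_accept(0.10), should_accept(0.20), break_even_probability(1.0, 5.0))
```

With these assumed costs the decision threshold sits at a default probability of 1/6, well below the 0.5 cut-off a cost-blind classifier would use.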
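
Two of the coping strategies on slide 19, oversampling and loss weighting, can be sketched in plain Python. The slide names the techniques without giving formulas, so the inverse-frequency weighting and the "repeat rare samples until every class matches the largest one" policy below are illustrative assumptions, as are all function names.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1 / class frequency (assumed scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def oversample(samples, labels, seed=0):
    """Repeat rare-class samples at random until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = rng.choices(items, k=target - len(items))  # sampling with replacement
        balanced.extend((sample, label) for sample in items + extra)
    rng.shuffle(balanced)
    return balanced


# Toy data with a 2% rare class, in the spirit of slide 16.
labels = ["healthy"] * 98 + ["sick"] * 2
samples = list(range(100))
print(inverse_frequency_weights(labels))
print(Counter(label for _, label in oversample(samples, labels)))
```

Oversampling changes what the model sees in each batch, while weighting leaves the data alone and scales each sample's contribution to the loss; undersampling and negative mining from the same slide are not covered by this sketch.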
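
The partial-occlusion view mentioned on slide 24 can be illustrated with a generic occlusion-sensitivity sketch: cover one patch of the input at a time and record how much the model's score drops. This is not the Picasso Visualizer's API; `occlusion_map`, the list-of-lists image, and the toy scoring function are all assumptions made for the example.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Return a heat map of score drops when each patch of the image is blanked out."""
    height, width = len(image), len(image[0])
    base_score = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input stays untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base_score - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(image):
    """Stand-in for a model: only the top-left 2x2 corner influences the score."""
    return sum(image[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
for row in occlusion_map(image, toy_score):
    print(row)
```

Patches with a large drop are the regions the model relies on; in the toy example only the top-left corner lights up, which is exactly where the stand-in score looks.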
