Successfully reported this slideshow.
Your SlideShare is downloading. ×

Fueling-the-AI-revolution.pdf

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 39 Ad

More Related Content

Similar to Fueling-the-AI-revolution.pdf (20)

More from Lionel Wolberger (20)

Advertisement

Recently uploaded (20)

Fueling-the-AI-revolution.pdf

  1. 1. Software 2.0 | Fueling the AI Revolution www.understand.ai
  2. 2. Founded: March 10th, 2017 Research dates back to 2014 HQ: Karlsruhe (Germany) Heart of a key European Automotive & AI cluster Investment: $3.5M seed From Valley, UK & German entrepreneurial VCs Team: 53 associates, combination of AI & Automotive talent Customers: Automotive OEMs and Tier 1/2s e.g. VDA V&V-Methoden (Pegasus successor) Understand.ai A young fast-growing start-up
  3. 3. Agenda 1. Where is AI used? 2. What is required to build an AI system? 3. Characteristics of Data 4. Training & Validation Data 5. Data Engine 6. Processes and People 7. Conclusion Fueling the AI Revolution
  4. 4. Where is AI used?
  5. 5. ● speech to text ● night vision ● translation ● search & find ● ... Personal Assistance AI as...
  6. 6. ● Perception ● Planning ● Routing ● Behaviour Models ● ... Automotive AI in...
  7. 7. ● Diagnose Diseases ● Develop drugs faster ● Personalize treatment ● Enable gene editing ● Chatbot as a personal doctor (X2.ai) ● ... Medicine AI in...
  8. 8. What is required to build an AI system?
  9. 9. Infrastructure
  10. 10. Algorithms generated model performs significantly better than manually designed model Source: https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html
  11. 11. Data Amount of sleep lost over... Algorithms Data Sets PhD Tesla Source: Andrej Karpathy
  12. 12. Academia Academia versus Industry fixed dataset for comparability variable, high sophisticated algorithms variable results, often hard to reproduce
  13. 13. Industry Academia versus Industry variable, ever changing data set variable but less relevant algorithms fixed performance results needs to be reliably reproducible under a very wide range of different circumstances
  14. 14. Fixed Model Performance Industry
  15. 15. Characteristics of Data
  16. 16. Data Quality Distorting VOC Bounding Boxes ground-truth small noise (0.08) large noise (0.13) Source: VOC dataset
  17. 17. Data Quality Distorting VOC Bounding Boxes at loU 0.5 at loU 0.8 industry standard mAP VOC 0.6996 0.4626 mAP VOC 0.08 0.6794 0.2262 mAP VOC 0.13 0.6400 0.0562 -40,64 pts Source: VOC dataset
  18. 18. Quantity of Data The more the better! Source: Revisiting Unreasonable Effectiveness of Data in Deep Learning Era (https://arxiv.org/pdf/1707.02968.pdf) object detection performance on COCO object detection performance on PASCAL VOC 2007
  19. 19. Diversity of Data When is performance acceptable? Source: Tesla Autonomy Day
  20. 20. Diversity of Data Weird Edge Cases :)
  21. 21. Training & Validation Data
  22. 22. Validation Data Best Practice a satisfying run on the test set is no guarantee for happy customers...
  23. 23. Validation Data Best Practice
  24. 24. Required Validation Effort Validation effort might be insane...
  25. 25. Validate Autonomous Driving Be as good as a human driver 96 fatalities within 8.8bil miles +/- 20% 325mio hours of driving at 25mp/h Source: https://www.rand.org/content/dam/rand/pubs/research_reports/RR1400/RR1478/RAND_RR1478.pdf
  26. 26. Will we all be labelers? Tesla: 1.000 labelers Google: 6.000 labelers Baidu: 20.000 labelers
  27. 27. Tooling to Create Data Sets Data Engine
  28. 28. Data Engine Turbo Boost Data Collection
  29. 29. Data Selection Data Engine Data selection as a key technique to keep the data flow under control Most data is valueless! Build product around the problem to be solved: initial data and later data needs to be sourced as efficiently as possible
  30. 30. Training & Validation Data Data Engine Aim for coverage, no matter how big the data set grows Keep train set well balanced when adding new scenarios Data sets need to cover real world variance!
  31. 31. Uncovered Areas Data Engine Visualize output, implement statistics, analyze outliers, run new models in shadow mode Initial data will not cover real world variance + over time the world is changing (model staleness) Design product to make end-users naturally train your system!
  32. 32. Processes and People
  33. 33. Machine Learning Model Development Process
  34. 34. Split into 3 Teams One team to collect, condense and massage the data as well as to train and validate the algorithm One team to enrich data by labeling it and build annotation tools One team to build the data pipeline and surrounding stack to host the model
  35. 35. Traditional System Testing and Monitoring
  36. 36. ML-Based System Testing and Monitoring Behavior depends on dynamic qualities of data and model configuration choices
  37. 37. Conclusion
  38. 38. UAI is an awesome company Quality, Quantity and Diversity of training and validation data enable high performance AI Systems Validation requirements might have an insane impact on effort Build a data engine! QA for production grade ML Systems is highly complex Conclusion It’s all in the data!
  39. 39. Q&A Daniel Rödler daniel@understand.ai Software 2.0 Fueling the AI Revolution! Feedback, please. Thanks! Philip Kessler philip@understand.ai

×