Illarion Khlestov "Developing robust ML training systems"

BigData & Data Engineering

Published in: Engineering

  1. Developing robust training systems
  2. About me: Illarion Khlestov, Senior Research Engineer. GitHub: https://github.com/ikhlestov Blog: https://medium.com/@illarionkhlestov Facebook: https://www.facebook.com/i.khlestov
  3. Motivation
  4. It’s not complicated
  5. Structure 25-30 mins 10 mins
  6. Data handling
  7. Data variety issue
  8. Solution - unified format
  9. How to store data?
  10. Format validation - jsonschema - trafaret - validr - voluptuous
  11. Inbound data validation - Allowed range of entries - Annotation quality - Data quantity - Checksums - Suitability
  12. Visualization
  13. Plotly & Dash
  14. Plotly & Dash
  15. Plotly structuring
  16. Moving between servers
  17. Data downloaders
  18. Auto synchronization
  19. Data update & cleanup
  20. Training
  21. The simplest approach
  22. Results saving - id: - Logs - Weights - Graphs - Predictions - id example: - 1541023813_mobile_net_trained_with_SGD_001
  23. Configs
  24. Python configs
  25. Python configs import https://docs.python.org/3/library/importlib.html
  26. Configs updates
  27. Network validation - Test run: - Train - Validation - Test - Results saving - Size calculation - Speed calculation
  28. Training Speedup Image source
  29. Evaluation
  30. Summary table
  31. Configs diffs
  32. Model evaluation table - Dataset ID - Link to dataset statistics - Data preprocessing - Train graphs and logs - Performance - Accuracy - Speed
  33. Can you spot a frog-cat? Generated by facets
  34. Mislabeling Debugging
  35. Pre-production - Additional metrics - Don’t mix ids - Test auto-check
  36. Training result structure
  37. Bonus: Libraries - https://github.com/IDSIA/sacred - configure, organize, log and reproduce experiments - https://github.com/henripal/labnotebook - flexibly monitor, record, save, and query experiments - https://github.com/facebookresearch/visdom - visualizations of live, rich data - https://github.com/uber/horovod - distributed training framework - https://github.com/autonomio/talos - Hyperparameter Optimization for Keras Models - https://github.com/hyperopt/hyperopt - Hyperparameter Optimization in Python
  38. http://bit.ly/training_systems
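
The "Results saving" slide gives the example id 1541023813_mobile_net_trained_with_SGD_001, which reads like a Unix timestamp, a model name, a short description, and a zero-padded counter, with logs, weights, graphs, and predictions stored per id. The sketch below shows one way such ids and a per-run directory could be produced under that reading. The function names `make_run_id` and `create_run_dirs` and the subdirectory names are hypothetical illustrations, not taken from the talk.

```python
import time
from pathlib import Path
from typing import Optional


def make_run_id(model: str, note: str, index: int,
                timestamp: Optional[int] = None) -> str:
    """Build an id shaped like <unix time>_<model>_<note>_<counter>.

    The field layout is an assumption inferred from the slide's example id.
    """
    ts = int(time.time()) if timestamp is None else timestamp
    # Replace whitespace with underscores so the id is safe as a directory name.
    note_part = "_".join(note.split())
    return f"{ts}_{model}_{note_part}_{index:03d}"


def create_run_dirs(root: Path, run_id: str) -> Path:
    """Create one directory per run with a subdirectory for each artifact type."""
    run_dir = root / run_id
    # Subdirectory names mirror the slide's bullet list (hypothetical layout).
    for name in ("logs", "weights", "graphs", "predictions"):
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Keeping every artifact of a run under a single id makes it harder to mix results from different experiments, which matches the "Don't mix ids" point on the pre-production slide.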
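
The "Python configs import" slide points at the standard `importlib` documentation, and the next slide mentions config updates. The transcript does not show how the talk combines the two, so the following is only a sketch of one common pattern: load a plain `.py` file as a module with `importlib.util`, collect its upper-case names as settings, and apply overrides on top. The function name `load_config` and the upper-case naming convention are assumptions made for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Any, Dict, Optional


def load_config(path: Path,
                overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Execute a Python config file and return its UPPER_CASE names as a dict."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    # Runs the config file's code; only load files you trust.
    spec.loader.exec_module(module)
    # Assumed convention: settings are the module's upper-case names.
    config = {key: value for key, value in vars(module).items() if key.isupper()}
    # Overrides (for example from the command line) win over file values.
    config.update(overrides or {})
    return config
```

A call such as `load_config(Path("configs/mobile_net.py"), {"LEARNING_RATE": 0.01})` would return the file's settings with the learning rate replaced; the path and setting name here are hypothetical.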
