Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO at Hydrosphere.io

In this demo-based talk we discuss a solution, tooling, and architecture that allow machine learning engineers to be involved in the delivery phase and take ownership of the deployment and monitoring of machine learning pipelines. It allows data scientists to safely deploy early results as end-to-end AI applications in a self-serve mode, without assistance from engineering and operations teams. It shifts experimentation, and even training phases, from offline datasets to live production and closes the feedback loop between research and production.

  1. Monitoring AI with AI. Stepan Pushkarev, CTO of Hydrosphere.io
  2. About. Mission: Accelerate Machine Learning to Production. Open-source products: ML Lambda (ML deployment and serving), Sonar (data and ML monitoring), Mist (serverless proxy for Spark). Business model: PaaS and hands-on consulting.
  3. Traditional Software vs. Machine Learning applications:
     Explicit business rules -> ML-generated model
     Unit testing -> Model evaluation
     (Micro)service -> Model as a Service
     Docker per service -> Docker per model
     1 version of a microservice in prod -> 1-10-20 model versions in prod at a time
     Eng + QA team owning a service -> 1 ML engineer owning 10-20 models
     Fail loudly (exception, stack trace) -> Fail silently
     Can work forever if verified -> Performance declines over time; needs continuous retraining / redeployment
     App metrics monitoring -> Data monitoring | Model metrics monitoring
  4. Cost of an AI/ML Error ● Fun (© http://blog.ycombinator.com/how-adversarial-attacks-work/)
  5. Cost of an AI Error ● Fun ● Not fun
  6. Cost of an AI Error ● Fun ● Not fun ● Not fun at all…
  7. Cost of an AI Error ● Fun ● Not fun ● Not fun at all… ● Money
  8. Cost of an AI Error ● Fun ● Not fun ● Not fun at all… ● Money ● Business
  9. Where/why may AI fail in prod?
  10. Where/why may AI fail in prod? Everywhere!
  11. Where/why may AI fail in prod? Everywhere! ● Bad training data ● Bad serving data ● Training/serving data skew ● Misconfiguration ● Deployment issue ● Retraining issue ● Performance ● Concept drift
  12. AI Reliability Pyramid
  13. Reliable Training-Serving Pipelines: a comfort zone for the data scientist in the middle of production
  14. AI Reliability Pyramid
  15. Model Deployment and Integration: model.pkl / model.zip. How to integrate it into an AI application?
  16. Model server = Model Artifact + Metadata + Runtime + Deps + Sidecar. Example: matching_model v2 with a /predict contract (input: string text, bytes image; output: string summary), running on a JVM DL4j runtime with a GPU, exposed over gRPC/HTTP; the sidecar serves requests and handles routing, shadowing, pipelining, tracing, metrics, autoscaling, and A/B and canary releases.
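As a toy illustration of the /predict contract on slide 16, the sketch below wraps a pickled model in a bare HTTP endpoint. This is not ML Lambda's actual serving runtime or API; Flask, the model.pkl file name, and the text/summary fields are hypothetical placeholders, and everything the sidecar adds (routing, shadowing, tracing, metrics, autoscaling) is intentionally left out.

# Toy /predict endpoint; NOT ML Lambda's API. Assumes a scikit-learn-style
# model serialized to model.pkl by the data scientist.
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                      # e.g. {"text": "..."}
    prediction = model.predict([payload["text"]])     # single-item batch
    return jsonify({"summary": str(prediction[0])})   # mirrors the slide's output field

if __name__ == "__main__":
    app.run(port=9090)  # routing, shadowing, metrics, A/B all live in the sidecar, not here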
  17. Model Deployment takeaways ● Eliminates the hand-off chain Data Scientist -> ML Eng -> Data Eng -> SA Eng -> QA -> Ops ● Sticks components together: Data + Model + Applications + Automation = AI Application ● Enables a quick transition from research to production: ML engineers can deploy models many times a day. But wait… this is not safe! How do we ensure we won't break things in prod?
  18. AI Reliability Pyramid: 1) Is the model degraded? 2) What is the reason?
  19. Data Format Drift
  20. Concept Drift
  21. Concept Drift
  22. Data exploration in production. Research: the data scientist makes assumptions based on the results of data exploration.
  23. Data exploration in production. Research: the data scientist explores datasets and forms assumptions/hypotheses. Production: the model works if and only if the format and statistical properties of prod data are the same as in research. Push to prod.
  24. Data exploration in production. Research: the data scientist makes assumptions based on the results of data exploration. Production: the model works if and only if the format and statistical properties of prod data are the same as in research. Push to prod. Continuous data exploration and validation?
  25. Automatic Data Profiling ● An Avro/Protobuf schema can catch data format drift ● Statistical properties of input features are to be captured and continuously validated. Example schema:
      {"name": "User", "fields": [
        {"name": "name", "type": "string", "min_length": 2, "max_length": 128},
        {"name": "age", "type": ["int", "null"], "range": "[10, 100]"},
        {"name": "sex", "type": ["string", "null"], "enum": "[male, female, ...]"},
        {"name": "wage", "type": ["int", "null"], "validator": "a-distance"}
      ]}
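A minimal sketch of how range/enum annotations like those above could be checked against incoming records. Field names mirror the slide (the enum list is truncated there); the rule format and validation logic are illustrative, not Sonar's actual implementation.

# Illustrative field-level checks derived from the profile above; not Sonar's implementation.
PROFILE = {
    "name": {"type": str, "min_length": 2, "max_length": 128},
    "age":  {"type": int, "range": (10, 100)},
    "sex":  {"type": str, "enum": {"male", "female"}},   # truncated on the slide
}

def validate(record):
    errors = []
    for field, rules in PROFILE.items():
        value = record.get(field)
        if value is None:                 # nullable fields pass when absent
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "range" in rules and not rules["range"][0] <= value <= rules["range"][1]:
            errors.append(f"{field}: {value} outside {rules['range']}")
        if "min_length" in rules and not rules["min_length"] <= len(value) <= rules["max_length"]:
            errors.append(f"{field}: length {len(value)} out of bounds")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} not allowed")
    return errors

print(validate({"name": "A", "age": 7, "sex": "unknown"}))
# -> ['name: length 1 out of bounds', 'age: 7 outside (10, 100)', "sex: 'unknown' not allowed"]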
  26. Quality metrics generated from data profile checks
  27. How to deal with: a multidimensional dataset, data timeliness, data completeness, image data, complicated seasonality?
  28. Anomaly detection ● Rule-based programs -> statistical models -> machine learning models ● Deals with multidimensional datasets, timeliness, and complicated seasonality
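One concrete example of the "machine learning models" stage from the slide above: an Isolation Forest fitted on multidimensional training features and applied to a production batch. This is a generic scikit-learn sketch, not the specific detector used in Sonar; the synthetic data, feature count, and contamination value are assumptions.

# Generic anomaly-detection sketch (Isolation Forest); data and parameters are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))                       # multidimensional training features
detector = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

X_prod = np.vstack([rng.normal(size=(95, 8)),              # normal serving traffic
                    rng.normal(loc=6.0, size=(5, 8))])     # a few drifted rows
flags = detector.predict(X_prod)                           # -1 = anomaly, 1 = normal
print("anomalies in batch:", int((flags == -1).sum()))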
  29. Model Monitoring Metrics on streaming data ● System metrics (latency/throughput) ● Kolmogorov-Smirnov ● Q-Q plot, t-digest ● Spearman and Pearson correlations ● Density-based clustering algorithms with Elbow or Silhouette methods ● Deep Autoencoders ● Generative Adversarial Networks ● Random Cut Forest (AWS paper) ● “Bring your own” metric
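A minimal sketch of the first distributional metric on that list: a two-sample Kolmogorov-Smirnov test comparing one feature's training distribution with a recent window of serving traffic. The synthetic data and the 0.01 significance threshold are illustrative choices.

# Two-sample KS test on one feature; synthetic data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)   # captured at training time
serving_window = rng.normal(loc=0.4, scale=1.0, size=1_000)   # last N production requests

stat, p_value = ks_2samp(train_feature, serving_window)
if p_value < 0.01:
    print(f"possible drift: KS statistic={stat:.3f}, p={p_value:.2e}")   # raise an alert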
  30. GANs for monitoring data quality at serving time: {production input} -> {good} / {drift (fake)}
  31. Model server = Metadata + Model Artifact + Runtime + Deps + Sidecar + Training Metadata. Same /predict contract, runtime (JVM DL4j / TF / other, on GPU or CPU), model v2, and gRPC/HTTP server with a sidecar serving requests, plus training data stats (min, max, range, clusters, quantiles, autoencoder) shipped with the model and compared with prod data at runtime.
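A sketch of the "autoencoder as part of the training metadata" idea on this slide: fit an autoencoder on training features, store a reconstruction-error threshold next to the model artifact, and flag production inputs that reconstruct poorly at serving time. The network shape, Keras as the framework, and the 99th-percentile threshold are assumptions, not Hydrosphere's implementation.

# Autoencoder reconstruction error as a training-data "fingerprint"; shapes and
# threshold are illustrative assumptions.
import numpy as np
from tensorflow import keras

def fit_autoencoder(X_train, latent_dim=8, epochs=20):
    n_features = X_train.shape[1]
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(latent_dim, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_train, X_train, epochs=epochs, batch_size=64, verbose=0)
    train_err = np.mean((model.predict(X_train, verbose=0) - X_train) ** 2, axis=1)
    return model, np.quantile(train_err, 0.99)    # ship the threshold with the model artifact

def reconstruction_drift(model, threshold, batch):
    err = np.mean((model.predict(batch, verbose=0) - batch) ** 2, axis=1)
    return err > threshold                        # True = input unlike anything seen in training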
  32. Change of the Paradigm: shifts experimentation to a prod/shadowed environment
  33. Use Case: Kolmogorov-Smirnov in action
  34. Use Case: Monitoring an NLU system. Figure from: Bapna, Ankur, et al. "Towards zero-shot frame semantic parsing for domain scaling." arXiv preprint arXiv:1707.02363 (2017).
  35. Use Case: Monitoring an NLU system ● Train and test offline on the restaurants domain ● Deploy to prod ● Feed the model with new random Wiki data ● Monitor intermediate input representations (neural network hidden states). Source image: Kurata, Gakuto, et al. "Leveraging sentence-level information with encoder LSTM for semantic slot filling." arXiv preprint arXiv:1601.01530 (2016).
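A sketch of the monitoring step described on this slide: cluster the encoder's hidden-state vectors for in-domain dev/test sentences, then flag production sentences whose hidden states fall far from every cluster (as random Wiki text would for a restaurants-domain model). How hidden states are extracted depends on the network; here they are placeholder vectors, and KMeans plus a 99th-percentile distance threshold are illustrative choices.

# Hidden-state monitoring sketch; the vectors, KMeans, and threshold are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
dev_states = rng.normal(size=(2000, 128))               # hidden states for in-domain dev/test data

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(dev_states)
dev_dist = kmeans.transform(dev_states).min(axis=1)      # distance to the nearest cluster
threshold = np.quantile(dev_dist, 0.99)                  # "normal" distance band

def out_of_domain(prod_states):
    dist = kmeans.transform(prod_states).min(axis=1)
    return dist > threshold                              # True = likely out-of-domain input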
  36. Use Case: Monitoring an NLU system ● Red and purple: clusters of "bad" production data ● Yellow and blue: dev and test data
  37. AI Reliability Pyramid
  38. Drift Handling ● Unexpected or dramatic drift? Alert and bring an ML/Data engineer into the loop. ● Expected drift? Retrain. Open question to be solved with ML: classifying expected vs. unexpected drift.
  39. Model Retraining: common questions. When to retrain? When/how to push to prod? What data to retrain with? Option 1: manually, on demand. Works well for 1 model, but does not scale.
  40. Model Retraining: common questions. When to retrain? When/how to push to prod safely? What data to retrain with? Option 2: automatically, with the latest batch. Not safe, can be expensive, and the latest batch may not be representative.
  41. Solution: Reactive, AI-powered retraining
  42. Thank you - Stepan Pushkarev - @hydrospheredata - https://github.com/Hydrospheredata - https://hydrosphere.io/ - spushkarev@hydrosphere.io
