Machine Learning Vital Signs


Machine Learning Vital Signs: Metrics and Monitoring of AI in Production

This talk details how to track machine learning models in production to ensure model reliability, consistency, and performance into the future. Production models interact with the real world, and it is terrifying that often nobody has any idea how they are performing on live data. The world changes! Bias and variance can creep into your models over time, and you should know when that happens.

Published in: Technology


  1. 1. Machine Learning Vital Signs: Metrics and Monitoring of AI in Production • Donald Miner, Miner & Kasch • OSCON, July 16, 2019
  2. 2. We build a model • Someone builds a model • They test it • Everyone is happy • It works in prod
  3. 3. We build lots of models Multiple people build multiple models to solve multiple problems
  4. 4. Who’s watching all these models?
  5. 5. Which ones are working? What does working mean?
  6. 6. The world changes slowly: Over time, the nature of the world changes, and our models will not work as well.
  7. 7. The world changes abruptly: Big things can happen and fundamentally change the world. This may render previous models less useful or worthless.
  8. 8. The world changes periodically: Seasonal and periodic changes happen. This can impact model effectiveness temporarily or permanently.
  9. 9. Weird things happen then go away: Current events can change the world for a short period of time. Model effectiveness changes (usually for the worse) during that period.
  10. 10. Bugs: They will happen, and they can be troublesome to detect in machine learning pipelines.
  11. 11. Adversaries: Bad people exist. Could they exploit your model or training set to your detriment?
  12. 12. Proposed solution: Metrics & Monitoring • Instrument your models with “vital signs” • Catch it promptly when your model is: • Suddenly breaking • Drifting into worthlessness • Doing something strange
  13. 13. Machine Learning Vital Signs • A vital sign is some metric from a productionalized model that you can monitor for change over time • Have alerts in place that detect: • An unacceptable amount of drift over time • A surprising and unusual number of errors in one period • For each vital, ask: • What is the average of the vital? • What is the standard deviation of the vital? • What are acceptable bounds for the vital?
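
The three questions on this slide (average, standard deviation, acceptable bounds) can be sketched as a simple alert rule. This is only a minimal illustration, not the method from the talk: the names `vital_bounds` and `out_of_bounds`, the choice of mean plus or minus `k` standard deviations as the bounds, and the sample accuracy values are all assumptions made for the example.

```python
from statistics import mean, stdev

def vital_bounds(baseline, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def out_of_bounds(value, bounds):
    """True when a new reading of the vital falls outside the bounds."""
    low, high = bounds
    return value < low or value > high

# Hypothetical daily accuracy readings collected while the model was healthy.
baseline_accuracy = [0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.90]
bounds = vital_bounds(baseline_accuracy)

print(out_of_bounds(0.91, bounds))  # a typical reading
print(out_of_bounds(0.70, bounds))  # a sudden dip
```

A fixed multiple of the standard deviation is only one way to set bounds; a slow drift may stay inside them for a long time, so a separate check on the trend is probably still needed.
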
  14. 14. Vital: Accuracy • How often the model is correct • Will naturally decrease over time • Big dips (or jumps) can indicate that something is wrong • Measuring it can mimic how the data was initially labeled: • Automatically labeled as part of the data • Manually labeled… uh oh
  15. 15. Vital: Accuracy
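
As a rough sketch of the accuracy vital, the function below groups labeled predictions by time window and reports the fraction that were correct in each. The record layout `(window, predicted, actual)`, the function name, and the sample data are assumptions for illustration; it also assumes true labels eventually become available, which the slide notes is not always easy.

```python
from collections import defaultdict

def accuracy_by_window(records):
    """records: iterable of (window, predicted, actual).

    Returns {window: accuracy}, with windows in sorted order.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for window, predicted, actual in records:
        total[window] += 1
        correct[window] += int(predicted == actual)
    return {w: correct[w] / total[w] for w in sorted(total)}

# Hypothetical log of predictions, keyed by day.
records = [
    ("2019-07-01", "a", "a"),
    ("2019-07-01", "b", "a"),
    ("2019-07-02", "a", "a"),
    ("2019-07-02", "b", "b"),
]
daily_accuracy = accuracy_by_window(records)
print(daily_accuracy)
```
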
  16. 16. Vital: Accuracy Per Label • How often the model is correct, for each potential output label • More fine-grained than Accuracy • Can sometimes catch things that Accuracy misses when there is a large class imbalance
  17. 17. Vital: Accuracy Per Label
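
The class-imbalance point can be illustrated with a small sketch. It assumes "per label" means grouping by the true label (the slide does not say which way to group), and the labels and counts are made up for the example.

```python
from collections import Counter

def accuracy_per_label(predicted, actual):
    """Fraction of correct predictions for each true label."""
    correct, total = Counter(), Counter()
    for p, a in zip(predicted, actual):
        total[a] += 1
        if p == a:
            correct[a] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical imbalanced data: 95 "ok" cases and 5 "fraud" cases.
actual = ["ok"] * 95 + ["fraud"] * 5
# The model gets every "ok" right but only one of the five "fraud" cases.
predicted = ["ok"] * 95 + ["fraud"] + ["ok"] * 4

overall = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
per_label = accuracy_per_label(predicted, actual)
print(overall)
print(per_label)
```

Here the overall accuracy looks healthy while the rare label is mostly wrong, which is the kind of problem the per-label vital is meant to surface.
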
  18. 18. Vital: Model Agreement • How often the previous models, no longer in production, agree with the new model • Some disagreement is natural, but a large amount of disagreement can indicate a bug or problem • Can be an alternative to Accuracy if Accuracy is hard to measure
  19. 19. Vital: Model Agreement
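
A minimal sketch of the agreement vital, assuming both models are run on the same inputs and their outputs are collected in two equal-length lists. The function name and sample predictions are hypothetical.

```python
def agreement_rate(new_preds, old_preds):
    """Fraction of inputs on which the two models give the same output."""
    if len(new_preds) != len(old_preds):
        raise ValueError("both models must be scored on the same inputs")
    if not new_preds:
        raise ValueError("need at least one prediction")
    matches = sum(n == o for n, o in zip(new_preds, old_preds))
    return matches / len(new_preds)

# Hypothetical outputs of the production model and its predecessor.
new_model = ["spam", "ham", "ham", "spam"]
old_model = ["spam", "ham", "spam", "spam"]
rate = agreement_rate(new_model, old_model)
print(rate)
```

No true labels are needed, which is why this can stand in for accuracy; the trade-off is that agreement says nothing about which model is right.
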
  20. 20. Vital: Output Distribution How often each class is predicted or the distribution of regression output values • Can catch long-term trends, permanent changes, and seasonal changes • Can catch bugs and problems with large swings outside of a few standard deviations • Can be an alternative to Accuracy, but hard to tell the difference between “weird” and “bad”
  21. 21. Vital: Output Distribution (chart panels: GOOD, MAYBE, BAD)
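
One way to put a number on "how often each class is predicted" is to compare the current class frequencies with a baseline. The sketch below uses total variation distance for the comparison; that metric is a choice made for this example, not something the slides specify, and the labels and baseline values are invented.

```python
from collections import Counter

def class_distribution(predictions):
    """Fraction of predictions that fall into each class."""
    counts = Counter(predictions)
    n = len(predictions)
    return {label: c / n for label, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two class distributions (0 to 1)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Hypothetical baseline from a healthy period, and today's predictions.
baseline = {"cat": 0.5, "dog": 0.5}
today = class_distribution(["cat"] * 9 + ["dog"])

shift = total_variation(baseline, today)
print(today)
print(shift)
```

A large shift only says the output looks different from the baseline. As the slide notes, deciding whether that is "weird" or "bad" still takes a human or another vital.
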
  22. 22. Vital: Canaries • Does a test input produce the prediction we expect? • Can catch obvious issues if a test case the model should get right returns a wrong output value • Good at testing all-or-nothing problems, but struggles with trends • Very simple to implement and brutally effective
  23. 23. Vital: Canaries
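
A canary check can be very small. The sketch below assumes the model is exposed as a plain `predict` callable; `toy_model`, `broken_model`, and the canary inputs are stand-ins invented for the example.

```python
def run_canaries(predict, canaries):
    """Run each (input, expected) pair and return the failures.

    Each failure is a tuple of (input, expected, got).
    """
    failures = []
    for x, expected in canaries:
        got = predict(x)
        if got != expected:
            failures.append((x, expected, got))
    return failures

# Hypothetical stand-in for a deployed sentiment model.
def toy_model(text):
    return "positive" if "great" in text else "negative"

# Hypothetical stand-in for the same model after a bad deploy.
def broken_model(text):
    return "positive"

canaries = [
    ("this is great", "positive"),
    ("this is awful", "negative"),
]

print(run_canaries(toy_model, canaries))
print(run_canaries(broken_model, canaries))
```

Running this on a schedule and alerting on any non-empty result gives the all-or-nothing signal the slide describes.
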
  24. 24. Vital: Human Complaints Do humans agree with what the model is doing? • People love complaining about AI • Harness that power to give you feedback • Effective in large-scale applications that interact with humans • Can double as a continuous data labeling exercise
  25. 25. Metrics and Monitoring Tips • Figure out which vital signs can be measured for each model • Create log files • Send the log files somewhere • Make pretty charts • Build a dashboard • Watch it
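
The "create log files" step might look like the sketch below, which writes one JSON object per vital reading using Python's standard `logging` module. The logger name, field names, and model name are assumptions for the example; swapping the `StreamHandler` for a `logging.FileHandler` would write the same lines to a file that can be shipped elsewhere.

```python
import json
import logging
import sys
import time

logger = logging.getLogger("vitals")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stderr))

def log_vital(model, vital, value, timestamp=None):
    """Emit one vital reading as a single JSON line and return that line."""
    record = {
        "ts": time.time() if timestamp is None else timestamp,
        "model": model,
        "vital": vital,
        "value": value,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line

# Hypothetical model name and reading.
line = log_vital("churn-v3", "accuracy", 0.91, timestamp=1563235200)
```

One JSON object per line keeps the log easy to parse when building the charts and dashboard mentioned on the slide.
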
  26. 26. Summary • Track what your models are doing • Watch what your models are doing
  27. 27. Machine Learning Vital Signs: Metrics and Monitoring of AI in Production • Donald Miner, Miner & Kasch • OSCON, July 16, 2019