Monitoring AI with AI
Stepan Pushkarev
CTO of Hydrosphere.io
About
Mission: Accelerate Machine Learning to Production
Open-source products:
- ML Lambda: ML Deployment and Serving
- Sonar: Data and ML Monitoring
- Mist: Serverless proxy for Spark
Business Model: PaaS and hands-on consulting
Traditional Software                 | Machine Learning applications
Explicit business rules              | ML-generated model
Unit testing                         | Model Evaluation
(Micro)service                       | Model as a Service
Docker per service                   | Docker per Model
1 version of Microservice in prod    | 1-10-20 model versions in prod at a time
Eng + QA team owning a service       | 1 ML Engineer owning 10-20 models
Fail loudly (exception, stack trace) | Fail silently
Can work forever if verified         | Performance declines over time; needs continuous retraining / redeployment
App metrics monitoring               | Data monitoring and model metrics monitoring
Cost of an AI/ML Error
● Fun (image © http://blog.ycombinator.com/how-adversarial-attacks-work/)
● Not fun
● Not fun at all…
● Money
● Business
Where/why may AI fail in prod?
Everywhere!
● Bad training data
● Bad serving data
● Training/serving data skew
● Misconfiguration
● Deployment issue
● Retraining issue
● Performance
● Concept Drift
AI Reliability Pyramid
Reliable Training-Serving pipelines
Comfort Zone for Data Scientist in the middle of Production
Model Deployment and integration
model.pkl model.zip
How to integrate it into AI Application?
Model server = Model Artifact + Metadata + Runtime + Deps + Sidecar
Diagram labels:
- /predict
  input: string text; bytes image;
  output: string summary;
- JVM DL4j
- GPU
- matching_model v2 [ .... ]
- gRPC HTTP server
- sidecar serving requests: routing, shadowing, pipelining, tracing, metrics, autoscaling, A/B, canary
Model Deployment takeaways
● Eliminates hand-offs between Data Scientist -> ML Eng -> Data Eng -> SA Eng -> QA -> Ops
● Ties components together: Data + Model + Applications + Automation = AI Application
● Enables a quick transition from research to production. ML engineers can deploy models many times a day
But wait… This is not safe!
How do we ensure we won’t break things in prod?
AI Reliability Pyramid
1) Is the model degraded?
2) What is the reason?
Data Format Drift
Concept Drift
Data exploration in production
Research: Data Scientist makes assumptions based on the results of data exploration
Push to Prod
Production: The model works if and only if the format and statistical properties of prod data are the same as in research
Continuous data exploration and validation?
Automatic Data Profiling
● Avro/Protobuf schema can catch data format drifts
● Statistical properties of input features are to be captured and continuously validated
{"name": "User",
 "fields": [
   {"name": "name", "type": "string", "min_length": 2, "max_length": 128},
   {"name": "age", "type": ["int", "null"], "range": "[10, 100]"},
   {"name": "sex", "type": ["string", "null"], "enum": "[male, female, ...]"},
   {"name": "wage", "type": ["int", "null"], "validator": "a-distance"}
 ]
}
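A minimal sketch of how such a profile could be checked against one incoming record. The profile layout and the `validate_record` helper are assumptions made for illustration; the rule names (`min_length`, `max_length`, `range`) follow the slide and are not part of the Avro or Protobuf specifications. Only the length and range checks are covered here.

```python
from typing import Any

# Hypothetical profile loosely mirroring the schema on the slide.
# The rule names are illustrative extensions, not standard Avro attributes.
USER_PROFILE = {
    "name": {"type": str, "nullable": False, "min_length": 2, "max_length": 128},
    "age": {"type": int, "nullable": True, "range": (10, 100)},
}


def validate_record(record: dict[str, Any], profile: dict[str, dict]) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    violations = []
    for field, rules in profile.items():
        value = record.get(field)
        if value is None:
            # Missing and null are treated the same way in this sketch.
            if not rules.get("nullable", False):
                violations.append(f"{field}: missing or null")
            continue
        if not isinstance(value, rules["type"]):
            violations.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "min_length" in rules and len(value) < rules["min_length"]:
            violations.append(f"{field}: shorter than {rules['min_length']}")
        if "max_length" in rules and len(value) > rules["max_length"]:
            violations.append(f"{field}: longer than {rules['max_length']}")
        if "range" in rules:
            low, high = rules["range"]
            if not low <= value <= high:
                violations.append(f"{field}: {value} outside [{low}, {high}]")
    return violations
```

Counting violations per field over a time window would be one way to turn these checks into a quality metric.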
Quality metrics generated from
data profile checks
How to deal with
- multidimensional dataset
- data timeliness
- data completeness
- image data
- complicated seasonality?
Anomaly detection
● Rule-based programs -> statistical models -> machine learning models
● Deal with multidimensional datasets, timeliness and complicated seasonality
Model Monitoring Metrics on streaming data
● System metrics (latency/throughput)
● Kolmogorov-Smirnov
● Q-Q plot, t-digest
● Spearman and Pearson correlations
● Density-based clustering algorithms with Elbow or Silhouette methods
● Deep Autoencoders
● Generative Adversarial Networks
● Random Cut Forest (AWS paper)
● “Bring your own” metric
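As an illustration of one metric from this list, here is a small pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic applied to a single feature. The function names and the default `alpha` are assumptions, and the drift threshold uses the standard large-sample approximation, so it is only indicative for small windows.

```python
import math
from bisect import bisect_right


def ks_statistic(reference: list[float], production: list[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    if not reference or not production:
        raise ValueError("both samples must be non-empty")
    ref, prod = sorted(reference), sorted(production)
    max_gap = 0.0
    # The supremum of the CDF gap is reached at one of the sample points.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


def ks_drift(reference: list[float], production: list[float], alpha: float = 0.05) -> bool:
    """Flag drift when the statistic exceeds the approximate critical value."""
    n, m = len(reference), len(production)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, production) > critical
```

In a streaming setting, `reference` would typically hold training values of a feature and `production` a recent window of serving values.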
GANs for monitoring data quality at serving time
{production input} -> {good} or {drift (fake)}
Model server = Metadata + Model Artifact + Runtime + Deps + Sidecar + Training Metadata
Diagram labels:
- /predict (input, output)
- JVM DL4j / TF / Other
- GPU, CPU
- model v2 [ .... ]
- gRPC HTTP server
- sidecar serving requests
- training data stats (min, max; range; clusters; quantiles; autoencoder), compared with prod data at runtime
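A minimal sketch of the "capture training stats, compare with prod data" idea for a single numeric feature. The `FeatureStats` class and both helper functions are hypothetical names introduced here; only min, max and quartiles are covered, not clusters or autoencoders.

```python
import statistics
from dataclasses import dataclass


@dataclass
class FeatureStats:
    """Summary of one numeric feature, captured at training time."""
    minimum: float
    maximum: float
    quartiles: list[float]


def profile_training_feature(values: list[float]) -> FeatureStats:
    # statistics.quantiles with n=4 returns the three quartile cut points.
    return FeatureStats(min(values), max(values), statistics.quantiles(values, n=4))


def out_of_range_fraction(stats: FeatureStats, batch: list[float]) -> float:
    """Share of production values that fall outside the training range."""
    outside = sum(1 for v in batch if v < stats.minimum or v > stats.maximum)
    return outside / len(batch)
```

The stats object could be stored next to the model artifact as training metadata, and the sidecar could call `out_of_range_fraction` on each window of serving requests.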
Change of the Paradigm
Shifts experimentation to
prod/shadowed environment
Use Case: Kolmogorov-Smirnov in action
Use Case: Monitoring NLU system
Figure from: Bapna, Ankur, et al. "Towards zero-shot frame semantic parsing for domain scaling."
arXiv preprint arXiv:1707.02363 (2017).
Use Case: Monitoring NLU system
Source image: Kurata, Gakuto, et al. "Leveraging sentence-level information with encoder lstm for semantic slot filling." arXiv preprint
arXiv:1601.01530 (2016).
● Train and test offline on restaurants domain
● Deploy to prod
● Feed the model with new random Wiki data
● Monitor intermediate input representations (neural network hidden states)
Use Case: Monitoring NLU system
● Red and Purple - cluster of “Bad” production data
● Yellow and Blue - dev and test data
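One simple way to turn hidden-state monitoring into an automatic check is a distance-to-centroid test, sketched below. This is a swapped-in technique chosen for brevity, not the clustering shown on the slide; the function names and the 0.95 quantile are assumptions.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fit_radius(reference_states: list[list[float]], quantile: float = 0.95):
    """Return the centroid of dev/test hidden states and a distance radius
    that covers roughly the given share of them."""
    center = centroid(reference_states)
    distances = sorted(math.dist(v, center) for v in reference_states)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return center, distances[index]


def is_outlier(state: list[float], center: list[float], radius: float) -> bool:
    """True when a production hidden state lies farther out than the radius."""
    return math.dist(state, center) > radius
```

A rising share of outliers among production requests would suggest that inputs come from a domain the model was not trained on.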
Drift Handling
● Unexpected or dramatic drift? - Alert and bring an ML/Data Engineer into the loop.
● Expected drift? - Retrain.
Open question to be solved with ML: how to classify expected vs. unexpected drift.
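The routing described above can be sketched with fixed thresholds on a drift score in [0, 1]. The thresholds, the `Action` enum and `handle_drift` are placeholders invented for this example; since classifying expected vs. unexpected drift is stated as an open question, this is only a stand-in for that classifier.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RETRAIN = "retrain"
    ALERT = "alert"


def handle_drift(score: float, retrain_at: float = 0.2, alert_at: float = 0.5) -> Action:
    """Map a drift score to an action. Threshold values are illustrative only."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("drift score must be in [0, 1]")
    if score >= alert_at:
        # Dramatic drift: a human should look before anything is retrained.
        return Action.ALERT
    if score >= retrain_at:
        # Moderate drift is treated as expected: schedule retraining.
        return Action.RETRAIN
    return Action.NONE
```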
Model Retraining - common questions
● When to retrain?
● When/how to push to prod safely?
● What data to retrain with?
Manually on demand
● Works well for 1 model
● But does not scale
Automatically with the latest batch
● Not safe
● Can be expensive
● The latest batch may not be representative
Solution: Reactive AI-powered retraining
Thank you
- Stepan Pushkarev
- @hydrospheredata
- https://github.com/Hydrospheredata
- https://hydrosphere.io/
- spushkarev@hydrosphere.io
