A Missing Link in the ML Infrastructure Stack
Josh Tobin
Stealth Startup, UC Berkeley, Former OpenAI
Machine Learning is now a product engineering discipline
Missing Link in ML Infrastructure — SF Big Data Analytics
Josh Tobin
How did we get here?
ML analytics (2000s)
• Simple models run offline on medium to large datasets to produce reports
• Value comes from incorporating model insights into decisions
ML hype (2010s)
• Complicated models trained on massive datasets to produce papers
• Value comes from marketing potential of high-profile research output
ML products (2020s?)
• Reproducibility, scalability, and maintainability over complexity
• Value comes from models improving the business’s products or services
ML products require a fundamentally new process
[Diagram: “Flat-earth” ML. Stages: Select problem, Collect data, Clean and label, Train, Report]
ML products require a fundamentally new process
[Diagram: ML Product Engineering. Stages: Select problem, Collect data, Clean and label, Train, Report, Test, Deploy, Monitor]
ML teams that don’t make the transition die
What does it mean for you?
• Other disciplines will catch up to model training in prestige and pay
• The three Ps (papers, pie charts, PoCs) are no longer enough
Those that make the transition will create amazing things
• Autonomous Vehicles
• Real-time translation
• Drug discovery
• Marketing automation
• Personalization
• Document understanding
• Etc
Unlike flat-earth ML, ML products often:
• Run online and in real time
• Deal with constantly evolving data distributions
• Handle messy, long-tail real-world data
• Make predictions autonomously or semi-autonomously
This implies new ops & infra demands
• Run online and in real time
  Host and serve models with low latency
• Deal with constantly evolving data distributions
  Retrain models frequently, even continuously
• Handle messy, long-tail real-world data
  Inspect your data scalably, manage slices and edge cases
• Make predictions autonomously or semi-autonomously
  Quickly catch and diagnose bugs and distribution changes
Is the infrastructure stack keeping up?
[Diagram: pipeline stages Select problem, Collect data, Clean and label, Train, Report, Test, Deploy, Monitor]
Train: Reproducible pipelines, Training infrastructure, Experiment management
Is the infrastructure stack keeping up?
Test: Model perf exploration, CI/CD tools, Explainability tools
What’s still hard?
• Surfacing areas of poor performance
• Managing all your test cases
Is the infrastructure stack keeping up?
Deploy: Model serving, Feature stores
What’s still hard?
• Experimentation (AB tests, shadow tests)
• Online / offline consistency
Is the infrastructure stack keeping up?
Monitor: System monitoring, Data quality / drift (Deequ)
What’s still hard?
• Performance monitoring
• Drift is still a bit of an art
Is the infrastructure stack keeping up?
Collect data: Data lakes, warehouses
What’s still hard?
• Subsampling data
• Connecting the data back to the model
Is the infrastructure stack keeping up?
Clean and label: Labeling tools & services, Active learning tools
What’s still hard?
• What data should I label?
• What data should I train on?
Is the infrastructure stack keeping up?
Train
What’s still hard?
• How do I know when to retrain?
• (Retraining online)
Takeaways
• Many tools emerging to address the problems of ML product engineering
• Problems arise at the boundaries of the tools, especially anything that shepherds data through the process
• At all stages, granular understanding of model performance is lacking
The Evaluation Store
A central place to store and query online and offline ground truth and approximate model quality metrics
[Diagram: Eval Store, shown with Training, Evaluation, Production, Feature store, and Model hub. Labels: Data and prediction profiles; Metric & slice definitions; Feedback on model predictions]
Querying the evaluation store
What form do queries take?
• Subset of models in the store
• Subset of metrics in the store
• Subset of slices in the store
• Specification of the window of data
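The four query dimensions (models, metrics, slices, window of data) could be captured in a small data structure. The sketch below is a hypothetical illustration, not an actual eval store API: `EvalRecord`, `EvalQuery`, and `run_query` are invented names, and the "store" is just a Python list held in memory.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class EvalRecord:
    """One stored metric value (hypothetical schema)."""
    model: str
    metric: str
    slice_name: str
    timestamp: datetime
    value: float


@dataclass(frozen=True)
class EvalQuery:
    """Which models, metrics, slices, and window of data to read.

    A field left as None means "do not filter on this dimension".
    """
    models: Optional[Set[str]] = None
    metrics: Optional[Set[str]] = None
    slices: Optional[Set[str]] = None
    window_start: Optional[datetime] = None
    window_end: Optional[datetime] = None


def run_query(records: List[EvalRecord], query: EvalQuery) -> List[EvalRecord]:
    """Return the records that satisfy every filter set on the query."""
    def matches(r: EvalRecord) -> bool:
        if query.models is not None and r.model not in query.models:
            return False
        if query.metrics is not None and r.metric not in query.metrics:
            return False
        if query.slices is not None and r.slice_name not in query.slices:
            return False
        if query.window_start is not None and r.timestamp < query.window_start:
            return False
        if query.window_end is not None and r.timestamp > query.window_end:
            return False
        return True

    return [r for r in records if matches(r)]
```

A real store would push these filters down to a database; the point here is only that every query names a subset along each of the four dimensions.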
Querying the evaluation store
What form do queries take? E.g.,
Monitoring: What is the importance-weighted average drift across all of my features in my production model in the last 60 minutes?
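One plausible reading of "importance-weighted average drift" is a weighted mean of per-feature drift scores, with feature importances as the weights. The sketch below assumes that reading; the function name and both input dictionaries are hypothetical, and how drift and importance are measured is left open.

```python
from typing import Dict


def importance_weighted_drift(
    drift: Dict[str, float], importance: Dict[str, float]
) -> float:
    """Average per-feature drift, weighting each feature by its importance.

    `drift` maps feature name to a drift score over the query window.
    `importance` maps feature name to a non-negative importance weight.
    Both are assumed inputs, not something the talk specifies.
    """
    total_weight = sum(importance[name] for name in drift)
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")
    weighted = sum(drift[name] * importance[name] for name in drift)
    return weighted / total_weight
```

With this definition, heavy drift in a feature the model barely uses moves the result far less than mild drift in a feature it relies on.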
Querying the evaluation store
What form do queries take? E.g.,
Monitoring: How much worse is my accuracy in the last 7 days than it was during training?
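Answering this question requires computing the same metric, the same way, on training-time data and on recent labeled production data. A minimal sketch, assuming labels and predictions for both periods are already available as plain sequences (the function names are hypothetical):

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that equal their label."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def accuracy_drop(
    train_labels: Sequence[int],
    train_predictions: Sequence[int],
    recent_labels: Sequence[int],
    recent_predictions: Sequence[int],
) -> float:
    """Training accuracy minus recent accuracy; positive means it got worse."""
    return accuracy(train_labels, train_predictions) - accuracy(
        recent_labels, recent_predictions
    )
```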
Querying the evaluation store
What form do queries take? E.g.,
Testing: How do all of the metrics compare for model A and model B across all slices in my main evaluation set?
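A comparison like this reduces to a per-slice, per-metric difference between two models' results. The sketch below assumes each model's results are a dictionary keyed by (slice, metric); `MetricTable` and `compare_models` are hypothetical names used only for illustration.

```python
from typing import Dict, Tuple

# (slice name, metric name) -> metric value, for one model on one evaluation set
MetricTable = Dict[Tuple[str, str], float]


def compare_models(a: MetricTable, b: MetricTable) -> Dict[Tuple[str, str], float]:
    """Return b minus a for every (slice, metric) pair present in both tables."""
    shared = sorted(set(a) & set(b))
    return {key: b[key] - a[key] for key in shared}
```

Pairs that only one model has results for are skipped rather than guessed, so a missing slice shows up as an absent key instead of a misleading zero.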
Querying the evaluation store
What form do queries take? E.g.,
AB testing: How do my business metrics compare for model A and model B in the last 60 minutes?
A digression: approximate performance metrics
• In a perfect world, we would know right away how well the model performs on all data points seen in production
• In the real world, labels are unreliable, expensive, and delayed
• Approximate performance metrics are ways to guess which data points may have poor performance
  • E.g., distribution distance between these data points and a reference distribution
  • E.g., outlier detection
  • E.g., weak supervision (a la Snorkel)
  • E.g., metrics about your users (like engagement)
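As one concrete instance of a distribution distance, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for a single numeric feature. The choice of this statistic is an assumption made for illustration; the talk does not name a specific distance, and `ks_statistic` is a hypothetical helper written with the standard library only.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the empirical CDFs of the two samples:
    0.0 when they coincide, 1.0 when the samples do not overlap at all.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap
```

Here `reference` could be a feature's values at training time and `current` its values in a recent production window; a larger statistic suggests the model is seeing data unlike what it was trained on, which is a hint about performance and not a measurement of it.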
The Evaluation Store
[Diagram: Eval Store shown with the pipeline stages Select problem, Collect data, Clean and label, Train, Report, Test, Deploy, Monitor]
The Evaluation Store
Train:
• Register data distribution and performance for this model
• Warn us if training data looks too different from prod
The Evaluation Store
Test:
• Register performance for this model on all test slices
• Pull historical data that has been flagged as “interesting” (e.g., gave another model trouble)
• Pull definitions of slices
The Evaluation Store
Deploy:
• Run a shadow test or AB test by pulling the diff in model performance between versions
• Log data and approximate performance back to the eval store
The Evaluation Store
Monitor:
• Fire an alert when approximate performance on any of our slices dips below a threshold
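The alerting rule can be sketched as a per-slice threshold check. Everything named here is hypothetical: the slice scores are assumed to come from a query against the store, and `notify` stands in for whatever paging or messaging hook a team already uses.

```python
from typing import Callable, Dict, List


def slices_below_threshold(
    approx_performance: Dict[str, float], threshold: float
) -> List[str]:
    """Names of slices whose approximate performance is below the threshold."""
    return sorted(
        name for name, score in approx_performance.items() if score < threshold
    )


def check_and_alert(
    approx_performance: Dict[str, float],
    threshold: float,
    notify: Callable[[str], None] = print,
) -> List[str]:
    """Send one alert per failing slice and return the failing slice names."""
    failing = slices_below_threshold(approx_performance, threshold)
    for name in failing:
        notify(
            f"ALERT: slice '{name}' at {approx_performance[name]:.2f}, "
            f"below {threshold:.2f}"
        )
    return failing
```

Checking each slice separately is what lets a degradation on a small slice surface even when the aggregate metric still looks healthy.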
The Evaluation Store
Collect data:
• Log more data with low or uncertain approximate performance
The Evaluation Store
Clean and label:
• Inspect & label data with low approximate performance
The Evaluation Store
Train:
• Retrain when approximate performance dips below a threshold
What could an eval store help you with?
• Reduce organizational friction. Get stakeholders (ML eng, ML research, PM, MLOps, etc.) on the same page about metric and slice definitions
• Deploy models more confidently. Evaluate metrics and slices consistently in testing and prod. Make the metrics visible to stakeholders
• Catch production bugs faster. Catch degradations across any slice, and drill down to the data that caused the degradation
• Reduce data-related costs. Collect and label production data more intelligently
• Make your model better. Decide when to retrain. Pick the right data to retrain on.
Shouldn’t the feature store do this?
• Feature store is indexed by feature, eval store is indexed by model
• A model taking a feature as input doesn’t mean that it looks at the entire distribution
• A “poor quality” feature has different effects on different models
• Not all data will come through the feature store
• The two should talk to each other!
Wait, isn’t this just ML monitoring?
• Yes
• The hard part here is approximating how well your model might be performing right now
• That’s ML monitoring
Wait, isn’t this just ML monitoring?
• No
• Eval store should provide a consistent view of online and offline performance
• Eval store is tightly integrated into the entire MLOps stack
• Eval store keeps track of what data caused questionable performance, so it can be used for testing and retraining
ML monitoring
[Diagram: Training, Evaluation, Production, and Monitoring]
Eval store
[Diagram: Training, Evaluation, Production, and Eval store]
Case study 1: the Tesla data engine
youtube.com/watch?t=7714&v=Ucp0TTmvqOE
Case study 2: TFX data validation
https://mlsys.org/Conferences/2019/doc/2019/167.pdf
Case study 3: Overton (Apple)
https://machinelearning.apple.com/research/overton
A Missing Link in the ML Infra Stack?
• To turn ML into a product engineering discipline, we need an infrastructure stack that helps create a data flywheel
• What’s still missing?
  • Granular, online-offline understanding of model performance
  • Orchestrating data and models throughout the whole loop
• Maybe the Evaluation Store could help

More Related Content

What's hot

The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
Databricks
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
 
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Databricks
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
Georg Heiler
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
Stepan Pushkarev
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Sri Ambati
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Michael Hichwa
 
Cara v3 8 major new features
Cara v3 8 major new featuresCara v3 8 major new features
Cara v3 8 major new features
Generis
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
Syed Hadoop
 
Porting R Models into Scala Spark
Porting R Models into Scala SparkPorting R Models into Scala Spark
Porting R Models into Scala Spark
carl_pulley
 
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to ProductionData Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
Formulatedby
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
jeykottalam
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
Jasjeet Thind
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
Databricks
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Databricks
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
DataWorks Summit
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
Databricks
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
Bryan Yang
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
Jo-fai Chow
 

What's hot (20)

The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
 
Cara v3 8 major new features
Cara v3 8 major new featuresCara v3 8 major new features
Cara v3 8 major new features
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
 
Porting R Models into Scala Spark
Porting R Models into Scala SparkPorting R Models into Scala Spark
Porting R Models into Scala Spark
 
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to ProductionData Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
 
SAM—streaming analytics made easy
SAM—streaming analytics made easySAM—streaming analytics made easy
SAM—streaming analytics made easy
 
Zipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering FrameworkZipline - A Declarative Feature Engineering Framework
Zipline - A Declarative Feature Engineering Framework
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 

Similar to A missing link in the ML infrastructure stack?

Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in Production
Databricks
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
Turi, Inc.
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
Turi, Inc.
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
Databricks
 
ISO 19650 Information Management Process - Information Model Delivery (Episod...
ISO 19650 Information Management Process - Information Model Delivery (Episod...ISO 19650 Information Management Process - Information Model Delivery (Episod...
ISO 19650 Information Management Process - Information Model Delivery (Episod...
Clive Jordan - fighter of Evil BIM
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsVivastream
 
Preconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technologyPreconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technology
Jen Stirrup
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
Databricks
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle Analytics
Perficient, Inc.
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
Databricks
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya
 
Enabling Self Service Business Intelligence using Excel
Enabling Self Service Business Intelligenceusing ExcelEnabling Self Service Business Intelligenceusing Excel
Enabling Self Service Business Intelligence using ExcelAlan Koo
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
SrujanaMerugu1
 
Understanding the Lifecycle of a Data Analysis Project
Understanding the Lifecycle of a Data Analysis ProjectUnderstanding the Lifecycle of a Data Analysis Project
Understanding the Lifecycle of a Data Analysis Project
Level Education
 
Data quality in decision making - Dr. Philip Woodall, University of Cambridge
Data quality in decision making - Dr. Philip Woodall, University of CambridgeData quality in decision making - Dr. Philip Woodall, University of Cambridge
Data quality in decision making - Dr. Philip Woodall, University of Cambridge
BCS Data Management Specialist Group
 
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic BorstnarSupporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
Institute of Contemporary Sciences
 

Similar to A missing link in the ML infrastructure stack? (20)

Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in Production
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
ISO 19650 Information Management Process - Information Model Delivery (Episod...
ISO 19650 Information Management Process - Information Model Delivery (Episod...ISO 19650 Information Management Process - Information Model Delivery (Episod...
ISO 19650 Information Management Process - Information Model Delivery (Episod...
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
Preconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technologyPreconference Overview of data visualisation and technology
Preconference Overview of data visualisation and technology
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle Analytics
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
Enabling Self Service Business Intelligence using Excel
Enabling Self Service Business Intelligenceusing ExcelEnabling Self Service Business Intelligenceusing Excel
Enabling Self Service Business Intelligence using Excel
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
 
Understanding the Lifecycle of a Data Analysis Project
Understanding the Lifecycle of a Data Analysis ProjectUnderstanding the Lifecycle of a Data Analysis Project
Understanding the Lifecycle of a Data Analysis Project
 
Data quality in decision making - Dr. Philip Woodall, University of Cambridge
Data quality in decision making - Dr. Philip Woodall, University of CambridgeData quality in decision making - Dr. Philip Woodall, University of Cambridge
Data quality in decision making - Dr. Philip Woodall, University of Cambridge
 
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic BorstnarSupporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
Supporting B2Bsales forecasting by machine learning - Mirjana Klajic Borstnar
 





A missing link in the ML infrastructure stack?

  • 1. A Missing Link in the ML Infrastructure Stack. Josh Tobin (Stealth Startup, UC Berkeley, Former OpenAI)
  • 2. Machine Learning is now a product engineering discipline
  • 3. Machine Learning is now a product engineering discipline
  • 4. How did we get here? ML analytics (2000s): simple models run offline on medium to large datasets to produce reports; value comes from incorporating model insights into decisions. ML hype (2010s): complicated models trained on massive datasets to produce papers; value comes from the marketing potential of high-profile research output. ML products (2020s?): reproducibility, scalability, and maintainability over complexity; value comes from models improving the business’s products or services.
  • 5. ML products require a fundamentally new process. “Flat-earth” ML stages: Collect data, Clean and label, Train, Report, Select problem.
  • 6. ML products require a fundamentally new process. ML Product Engineering stages: Collect data, Clean and label, Train, Report, Select problem, Test, Deploy, Monitor.
  • 7. ML teams that don’t make the transition die.
  • 8. What does it mean for you? Other disciplines will catch up to model training in prestige and pay. The three Ps (papers, pie charts, PoCs) are no longer enough.
  • 9. Those that make the transition will create amazing things: autonomous vehicles, real-time translation, drug discovery, marketing automation, personalization, document understanding, etc.
  • 10. Unlike flat-earth ML, ML products often: run online and in real time; deal with constantly evolving data distributions; handle messy, long-tail real-world data; make predictions autonomously or semi-autonomously.
  • 11. This implies new ops & infra demands. Run online and in real time: host and serve models with low latency. Deal with constantly evolving data distributions: retrain models frequently, even continuously. Handle messy, long-tail real-world data: inspect your data scalably, manage slices and edge cases. Make predictions autonomously or semi-autonomously: quickly catch and diagnose bugs and distribution changes.
  • 12. Is the infrastructure stack keeping up? Stage in focus: Train. Tools: reproducible pipelines, training infrastructure, experiment management.
  • 13. Is the infrastructure stack keeping up? Stage in focus: Test. Tools: model performance exploration, CI/CD tools, explainability tools. What’s still hard? Surfacing areas of poor performance; managing all your test cases.
  • 14. Is the infrastructure stack keeping up? Stage in focus: Deploy. Tools: model serving, feature stores. What’s still hard? Experimentation (AB tests, shadow tests); online / offline consistency.
  • 15. Is the infrastructure stack keeping up? Stage in focus: Monitor. Tools: system monitoring, data quality / drift (Deequ). What’s still hard? Performance monitoring; drift is still a bit of an art.
  • 16. Is the infrastructure stack keeping up? Stage in focus: Collect data. Tools: data lakes, warehouses. What’s still hard? Subsampling data; connecting the data back to the model.
  • 17. Is the infrastructure stack keeping up? Stage in focus: Clean and label. Tools: labeling tools & services, active learning tools. What’s still hard? What data should I label? What data should I train on?
  • 18. Is the infrastructure stack keeping up? Stage in focus: Train. What’s still hard? How do I know when to retrain? (Retraining online.)
  • 19. Takeaways. Many tools are emerging to address the problems of ML product engineering. Problems arise at the boundaries of the tools, especially anything that shepherds data through the process. At all stages, granular understanding of model performance is lacking.
  • 20. The Evaluation Store: a central place to store and query online and offline ground truth and approximate model quality metrics.
  • 21. Diagram labels: Eval Store; Training, Evaluation, Production; data and prediction profiles; metric & slice definitions; feedback on model predictions; Feature store; Model hub.
  • 22. Querying the evaluation store. What form do queries take? A subset of models in the store; a subset of metrics in the store; a subset of slices in the store; a specification of the window of data.
  • 23. Querying the evaluation store. What form do queries take? A subset of models in the store; a subset of metrics in the store; a subset of slices in the store; a specification of the window of data. E.g. (monitoring): What is the importance-weighted average drift across all of my features in my production model in the last 60 minutes?
  • 24. Querying the evaluation store. What form do queries take? A subset of models in the store; a subset of metrics in the store; a subset of slices in the store; a specification of the window of data. E.g. (monitoring): How much worse is my accuracy in the last 7 days than it was during training?
  • 25. Querying the evaluation store. What form do queries take? A subset of models in the store; a subset of metrics in the store; a subset of slices in the store; a specification of the window of data. E.g. (testing): How do all of the metrics compare for model A and model B across all slices in my main evaluation set?
  • 26. Querying the evaluation store. What form do queries take? A subset of models in the store; a subset of metrics in the store; a subset of slices in the store; a specification of the window of data. E.g. (AB testing): How do my business metrics compare for model A and model B in the last 60 minutes?
  • 27. A digression: approximate performance metrics. In a perfect world, we would know right away how well the model performs on all data points seen in production. In the real world, labels are unreliable, expensive, and delayed. Approximate performance metrics are ways to guess which data points may have poor performance, e.g., distribution distance between these data points and a reference distribution; outlier detection; weak supervision (à la Snorkel); metrics about your users (like engagement).
  • 28. The Evaluation Store. Diagram: the pipeline stages (Collect data, Clean and label, Train, Report, Select problem, Test, Deploy, Monitor) with the Eval Store.
  • 29. The Evaluation Store. Stage in focus: Train. Register data distribution and performance for this model. Warn us if training data looks too different from prod.
  • 30. The Evaluation Store. Stage in focus: Test. Register performance for this model on all test slices. Pull historical data that has been flagged as “interesting” (e.g., gave another model trouble). Pull definitions of slices.
  • 31. The Evaluation Store. Stage in focus: Deploy. Run a shadow test or AB test by pulling the diff in model performance between versions. Log data and approximate performance back to the eval store.
  • 32. The Evaluation Store. Stage in focus: Monitor. Fire an alert when approximate performance on any of our slices dips below a threshold.
  • 33. The Evaluation Store. Stage in focus: Collect data. Log more data with low or uncertain approximate performance.
  • 34. The Evaluation Store. Stage in focus: Clean and label. Inspect & label data with low approximate performance.
  • 35. The Evaluation Store. Stage in focus: Train. Retrain when approximate performance dips below a threshold.
  • 36. What could an eval store help you with? Reduce organizational friction: get stakeholders (ML eng, ML research, PM, MLOps, etc.) on the same page about metric and slice definitions. Deploy models more confidently: evaluate metrics and slices consistently in testing and prod, and make the metrics visible to stakeholders. Catch production bugs faster: catch degradations across any slice, and drill down to the data that caused the degradation. Reduce data-related costs: collect and label production data more intelligently. Make your model better: decide when to retrain, and pick the right data to retrain on.
  • 37. Shouldn’t the feature store do this? The feature store is indexed by feature; the eval store is indexed by model. A model taking a feature as input doesn’t mean that it looks at the entire distribution. A “poor quality” feature has different effects on different models. Not all data will come through the feature store. The two should talk to each other!
  • 38. Wait, isn’t this just ML monitoring? Yes. The hard part here is approximating how well your model might be performing right now. That’s ML monitoring.
  • 39. Wait, isn’t this just ML monitoring? No. The eval store should provide a consistent view of online and offline performance. The eval store is tightly integrated into the entire MLOps stack. The eval store keeps track of what data caused questionable performance, so it can be used for testing and retraining.
  • 40. ML monitoring. Diagram labels: Evaluation, Production, Training, Monitoring.
  • 41. Eval store. Diagram labels: Evaluation, Production, Training, Eval store.
  • 42. Case study 1: the Tesla data engine. youtube.com/watch?t=7714&v=Ucp0TTmvqOE
  • 43. Case study 2: TFX data validation. https://mlsys.org/Conferences/2019/doc/2019/167.pdf
  • 44. Case study 3: Overton (Apple). https://machinelearning.apple.com/research/overton
  • 45. A Missing Link in the ML Infra Stack? To turn ML into a product engineering discipline, we need an infrastructure stack that helps create a data flywheel. What’s still missing? Granular, online-offline understanding of model performance; orchestrating data and models throughout the whole loop. Maybe the Evaluation Store could help.
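
The query form described in slides 22 to 26 (a subset of models, a subset of metrics, a subset of slices, and a window of data) and the threshold alert mentioned in slide 32 could be sketched in Python roughly as below. This is a minimal illustration under stated assumptions, not the talk's implementation: the names `MetricRecord`, `EvalStore`, and `slices_below_threshold`, the record schema, the in-memory storage, and the choice of a plain average as the aggregation are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class MetricRecord:
    """One logged metric value (hypothetical schema, not from the talk)."""
    model: str
    metric: str
    slice_name: str
    timestamp: datetime
    value: float


class EvalStore:
    """Toy in-memory evaluation store; a real one would sit on a database."""

    def __init__(self) -> None:
        self._records: list[MetricRecord] = []

    def log(self, record: MetricRecord) -> None:
        """Append one metric observation from training, testing, or production."""
        self._records.append(record)

    def query(
        self,
        models: set[str],
        metrics: set[str],
        slices: set[str],
        start: datetime,
        end: datetime,
    ) -> dict[tuple[str, str, str], float]:
        """Average each (model, metric, slice) over the window [start, end)."""
        buckets: dict[tuple[str, str, str], list[float]] = {}
        for r in self._records:
            in_scope = (
                r.model in models
                and r.metric in metrics
                and r.slice_name in slices
                and start <= r.timestamp < end
            )
            if in_scope:
                key = (r.model, r.metric, r.slice_name)
                buckets.setdefault(key, []).append(r.value)
        return {key: mean(values) for key, values in buckets.items()}


def slices_below_threshold(
    store: EvalStore,
    model: str,
    metric: str,
    slices: list[str],
    start: datetime,
    end: datetime,
    threshold: float,
) -> list[str]:
    """Return the slices whose windowed average falls below the threshold."""
    result = store.query({model}, {metric}, set(slices), start, end)
    return sorted(s for (_, _, s), value in result.items() if value < threshold)


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1, 12, 0)
    store = EvalStore()
    store.log(MetricRecord("model_a", "accuracy", "new_users", t0, 0.70))
    store.log(MetricRecord("model_a", "accuracy", "returning_users", t0, 0.95))
    window_end = t0 + timedelta(minutes=60)
    print(slices_below_threshold(
        store, "model_a", "accuracy",
        ["new_users", "returning_users"], t0, window_end, threshold=0.9,
    ))
```

Keeping training, test, and production metrics in one record schema is what would let the same `query` call serve the monitoring, testing, and AB testing questions; an approximate metric such as a drift score would be logged as just another `metric` name.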