SlideShare a Scribd company logo
11
Is that a Time Machine?
Some Design Patterns for Real-World Machine Learning Systems
Justin Basilico
Page Algorithms Engineering
ICML ML Systems Workshop
June 24, 2016
@JustinBasilico
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
22
Introduction
3
Focus
2006 2016
4
Netflix Scale
 > 81M members
 > 190 countries
 > 1000 device types
 > 3B hours/month
 > 36% of peak US
downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Machine Learning is Everywhere
Rows
Ranking
Over 80% of what
people watch
comes from our
recommendations
7
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision
Trees/Random Forests
 Gaussian Processes
 …
8
Systems
 AWS Cloud
 Online:
 Microservices
 Java
 EVCache, Cassandra
 Offline:
 Hive on S3
 Spark, Docker, Meson
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
More details on Netflix Techblog
99
Design Patterns
10
Why Design Patterns for ML Systems?
Idea Experiment Live
Problem
Problem
11
Design patterns provide…
 Common solutions to common problems
 No need to re-invent them
 A menu of approaches
 Reusable abstractions
 Transcend specific implementations
 Common terminology
 Eases communications of how something works
12
Some machine learning patterns…
 The Hulk
 The Lumberjack
 The Online Archive
 The Time Machine
 The Sentinel
 The Precog
 The Dagobah
 The Anytime Algorithm
 The Parameter Oracle
 The LEGO
 The Terminator
 The Inception
 The Feature Encoder
 The Hoarder
 The Transformer
 The Parameter Server
 The Log Space
 The Matrix Transposed
 The Overflow
 The Substitute
Thanks to: Aish Fenton, Yves Raimond, Dave Ray, Hossein Taghavi, Anuj Shah, DB Tsai, …
13
Application
Machine Learning in an Application
Machine Learning
Application
?Machine
Learned Model
Feature
Encoding
Output
Decoding
Predictor
14
Antipattern:
The Phantom Menace
(AKA Training/Serving Skew)
Different
code/data/platform
between training and
applying model
© Lucasfilm Ltd.
15
Training Pipeline
Application
“Typical” ML Pipeline: A tale of two worlds
Historical
Data
Generate
Features
Train Models
Validate &
Select Models
Application
Logic
Live
Data
Load
Model
Offline
Online
Collect Labels
Publish
Model
Evaluate
Model
Experimentation
16
The Sentinel
Validate model/data in
online environment before
letting it go live“You shall not pass!”
© New Line Cinema
17
Sentinel
Service
Application
Sentinel: Structure
Model
Model
Publisher
Model
Loader
Model Loader
Model
Validator
Offline
Online
Alert
Republish
Some potential checks:
• File format is valid
• Dependent data is available
• Accuracy on shadow live data
• Feature distributions match
• Output is properly calibrated
18
Sentinel
 Example: Checking that new ranking model is valid and
performs better than previous one
 Pros:
 Using a model requires both code and data are available
 Models may need to be versioned along-side code changes
 Ensure that a new model is no worse than previous one
 Cons:
 Sentinel needs to be in sync with application code
 Difficult to choose failure thresholds for data-based checks
19
The Hulk
(AKA Offline Precompute)
Train and evaluate your
full model offline then
publish final outputs
Scale for production
by batching
and brute force
© Disney
© Disney
20
Offline Precompute: Example Structure
Application
Cache
Historical
Data
OfflineOnline
Model Evaluation
Predictor
Data
Publisher
Generate
Features
Decode
Output
lookup
key -> output
save
21
Offline Precompute (aka The Hulk)
 Example: Computing unpersonalized video-to-video similarities
 Pros:
 Easy to set up based on experiment code
 Decouples implementation from online platform
 Can use more computationally expensive models
 Cons:
 Can’t depend on online facts or fresh data
 May have data gaps (e.g. handling new videos, users, etc.)
 May require cleanup to make consistent with online data
 Model output based on offline data; may not be properly calibrated
22
The Lumberjack
(AKA Feature Logging)
Train model on features
logged online from within
an application
Image via YouTube
23
Application
Feature Logging: Structure
Live
Data
Feature
Log Train Models
Predictor
Labels
log
id
Feature
Config
Generate
Features
Decode
Output
Model Evaluation
Offline
Online
24
 Example: Features of pages, rows, and videos in page generation
 Pros:
 Train on features exactly as seen online
 Easy to deploy trained model
 Can include impact of up-stream application logic
 Cons:
 Requires production-grade feature code and deployment
 Takes time to log enough data
 Need all dependent data also in production
 Adds risk to production servers for experimental features
 Feature data can be large; may require sampling
Feature Logging (aka The Lumberjack)
25
The Online Archive
Have online services save
history and expose to
offline systems via batch
interface
© Lucasfilm Ltd.
26
Online Archive: Structure
Live +
Historical
Data
Generate
Features
Collect Labels
Offline
Online
Application
Train Model
batch
interface
live
interface
27
Online Archive
 Example: Filtering online viewing history
 Pros:
 Provides access to online view of data at any time
 Can experiment with new features
 Cons:
 All dependent data needs to keep track of all history
 Only works for small data
 Requires batch interface also available within application
 May be other processes that edit history (e.g. slow arriving events)
 Service needs to handle two very different request loads so batch queries
don’t bring down the live system
28
The Time Machine
Snapshot facts and share
feature generation code
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
29
Application
Time Machine: Example Structure
FeaturesFact Log
Feature
Config
Predictor
Generate
Features
Decode
Output
Online
Snapshotter
Model Evaluation
Generate
Features
Labels
Data
Service
Bulk
Data
Other
Models
Live Data
30
Time Machine
 Example: Training ranking models in Spark*
 Pros:
 Easy to experiment with new features offline
 Allows testing impact of modifying non-ML components
 Can construct full application output after trying new model
 Can share snapshots across applications to help build new ones
 Cons:
 Fact data volume can be high; may require sampling
 Snapshotting requires deciding contexts to collect data for
* See http://bit.ly/sparktimetravel for more info
3131
Conclusions
32
Conclusion
 Some design patterns for avoiding online-offline discrepancies
 The Sentinel
 The Hulk
 The Lumberjack
 The Online Archive
 The Time Machine
 What useful patterns do you see for ML systems?
 Share them!
33
Thank You Justin Basilico
jbasilico@netflix.com
@JustinBasilico

More Related Content

What's hot

Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
Linas Baltrunas
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Justin Basilico
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
Parmeshwar Khurd
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
Jaya Kawale
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
Faisal Siddiqi
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and Deepak
Deepak Agarwal
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
Grace T. Huang
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
Fernando Amat
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
Xavier Amatriain
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 

What's hot (20)

Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Netflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time TravelNetflix Recommendations Feature Engineering with Time Travel
Netflix Recommendations Feature Engineering with Time Travel
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and Deepak
 
Data council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at NetflixData council SF 2020 Building a Personalized Messaging System at Netflix
Data council SF 2020 Building a Personalized Messaging System at Netflix
 
Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018 Artwork Personalization at Netflix Fernando Amat RecSys2018
Artwork Personalization at Netflix Fernando Amat RecSys2018
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 

Similar to Is that a Time Machine? Some Design Patterns for Real World Machine Learning Systems

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Kai Wähner
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
Andrey Sadovykh
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Kai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
confluent
 
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September LondonIcon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
IBM Systems UKI
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges
Ivica Crnkovic
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
Big Data Value Association
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
Ákos Horváth
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
Databricks
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
IncQuery Labs
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
Marcos Almeida
 
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu QianVirtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
Flink Forward
 

Similar to Is that a Time Machine? Some Design Patterns for Real World Machine Learning Systems (20)

Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogene...
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
 
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September LondonIcon solutions presentation - Pure Hybrid Cloud Event, 11th September London
Icon solutions presentation - Pure Hybrid Cloud Event, 11th September London
 
2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges2020 09-16-ai-engineering challanges
2020 09-16-ai-engineering challanges
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud
Unleashing Apache Kafka and TensorFlow in the Cloud

Unleashing Apache Kafka and TensorFlow in the Cloud

 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
 
Model driven engineering for big data management systems
Model driven engineering for big data management systemsModel driven engineering for big data management systems
Model driven engineering for big data management systems
 
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu QianVirtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
Virtual Flink Forward 2020: Machine learning with Flink in Weibo - Yu Qian
 

Recently uploaded

Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
HarpalGohil4
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 

Recently uploaded (20)

Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 

Is that a Time Machine? Some Design Patterns for Real World Machine Learning Systems

  • 1. 11 Is that a Time Machine? Some Design Patterns for Real-World Machine Learning Systems Justin Basilico Page Algorithms Engineering ICML ML Systems Workshop June 24, 2016 @JustinBasilico DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 4. 4 Netflix Scale  > 81M members  > 190 countries  > 1000 device types  > 3B hours/month  > 36% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Machine Learning is Everywhere Rows Ranking Over 80% of what people watch comes from our recommendations
  • 7. 7 Models & Algorithms  Regression (linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 8. 8 Systems  AWS Cloud  Online:  Microservices  Java  EVCache, Cassandra  Offline:  Hive on S3  Spark, Docker, Meson Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 10. 10 Why Design Patterns for ML Systems? Idea Experiment Live Problem Problem
  • 11. 11 Design patterns provide…  Common solutions to common problems  No need to re-invent them  A menu of approaches  Reusable abstractions  Transcend specific implementations  Common terminology  Eases communications of how something works
  • 12. 12 Some machine learning patterns…  The Hulk  The Lumberjack  The Online Archive  The Time Machine  The Sentinel  The Precog  The Dagobah  The Anytime Algorithm  The Parameter Oracle  The LEGO  The Terminator  The Inception  The Feature Encoder  The Hoarder  The Transformer  The Parameter Server  The Log Space  The Matrix Transposed  The Overflow  The Substitute Thanks to: Aish Fenton, Yves Raimond, Dave Ray, Hossein Taghavi, Anuj Shah, DB Tsai, …
  • 13. 13 Application Machine Learning in an Application Machine Learning Application ?Machine Learned Model Feature Encoding Output Decoding Predictor
  • 14. 14 Antipattern: The Phantom Menace (AKA Training/Serving Skew) Different code/data/platform between training and applying model © Lucasfilm Ltd.
  • 15. 15 Training Pipeline Application “Typical” ML Pipeline: A tale of two worlds Historical Data Generate Features Train Models Validate & Select Models Application Logic Live Data Load Model Offline Online Collect Labels Publish Model Evaluate Model Experimentation
  • 16. 16 The Sentinel Validate model/data in online environment before letting it go live“You shall not pass!” © New Line Cinema
  • 17. 17 Sentinel Service Application Sentinel: Structure Model Model Publisher Model Loader Model Loader Model Validator Offline Online Alert Republish Some potential checks: • File format is valid • Dependent data is available • Accuracy on shadow live data • Feature distributions match • Output is properly calibrated
  • 18. 18 Sentinel  Example: Checking that new ranking model is valid and performs better than previous one  Pros:  Using a model requires both code and data are available  Models may need to be versioned along-side code changes  Ensure that a new model is no worse than previous one  Cons:  Sentinel needs to be in sync with application code  Difficult to choose failure thresholds for data-based checks
  • 19. 19 The Hulk (AKA Offline Precompute) Train and evaluate your full model offline then publish final outputs Scale for production by batching and brute force © Disney © Disney
  • 20. 20 Offline Precompute: Example Structure Application Cache Historical Data OfflineOnline Model Evaluation Predictor Data Publisher Generate Features Decode Output lookup key -> output save
  • 21. 21 Offline Precompute (aka The Hulk)  Example: Computing unpersonalized video-to-video similarities  Pros:  Easy to set up based on experiment code  Decouples implementation from online platform  Can use more computationally expensive models  Cons:  Can’t depend on online facts or fresh data  May have data gaps (e.g. handling new videos, users, etc.)  May require cleanup to make consistent with online data  Model output based on offline data; may not be properly calibrated
  • 22. 22 The Lumberjack (AKA Feature Logging) Train model on features logged online from within an application Image via YouTube
  • 23. 23 Application Feature Logging: Structure Live Data Feature Log Train Models Predictor Labels log id Feature Config Generate Features Decode Output Model Evaluation Offline Online
  • 24. 24  Example: Features of pages, rows, and videos in page generation  Pros:  Train on features exactly as seen online  Easy to deploy trained model  Can include impact of up-stream application logic  Cons:  Requires production-grade feature code and deployment  Takes time to log enough data  Need all dependent data also in production  Adds risk to production servers for experimental features  Feature data can be large; may require sampling Feature Logging (aka The Lumberjack)
  • 25. 25 The Online Archive Have online services save history and expose to offline systems via batch interface © Lucasfilm Ltd.
  • 26. 26 Online Archive: Structure Live + Historical Data Generate Features Collect Labels Offline Online Application Train Model batch interface live interface
  • 27. 27 Online Archive  Example: Filtering online viewing history  Pros:  Provides access to online view of data at any time  Can experiment with new features  Cons:  All dependent data needs to keep track of all history  Only works for small data  Requires batch interface also available within application  May be other processes that edit history (e.g. slow arriving events)  Service needs to handle two very different request loads so batch queries don’t bring down the live system
  • 28. 28 The Time Machine Snapshot facts and share feature generation code DeLorean image by JMortonPhoto.com & OtoGodfrey.com
  • 29. 29 Application Time Machine: Example Structure FeaturesFact Log Feature Config Predictor Generate Features Decode Output Online Snapshotter Model Evaluation Generate Features Labels Data Service Bulk Data Other Models Live Data
  • 30. 30 Time Machine  Example: Training ranking models in Spark*  Pros:  Easy to experiment with new features offline  Allows testing impact of modifying non-ML components  Can construct full application output after trying new model  Can share snapshots across applications to help build new ones  Cons:  Fact data volume can be high; may require sampling  Snapshotting requires deciding contexts to collect data for * See http://bit.ly/sparktimetravel for more info
  • 32. 32 Conclusion  Some design patterns for avoiding online-offline discrepancies  The Sentinel  The Hulk  The Lumberjack  The Online Archive  The Time Machine  What useful patterns do you see for ML systems?  Share them!
  • 33. 33 Thank You Justin Basilico jbasilico@netflix.com @JustinBasilico