Bighead
Airbnb’s End-to-End Machine
Learning Infrastructure
Krishna Puttaswamy & Nick Handel
On behalf of ML Infra @ Airbnb
Context
In 2016
● Only major models in production
● Models took on average 8 weeks to build (source: survey of ML producers)
● Everything built in Aerosolve, Spark and Scala
● No support for Tensorflow, PyTorch, SK-Learn or other popular ML packages
● Significant discrepancies between offline and online data
ML Infra was formed with the charter to:
● Enable more users to build ML products
● Reduce time and effort
● Enable easier model evaluation
Q4 2016: Formation of our ML Infra team
Before ML
Infrastructure
ML has had a massive impact on Airbnb’s
product
● Search Ranking
● Smart Pricing
● Trust
● Paid Growth
● …And a few other major models
After ML
Infrastructure
But many other areas had high potential for ML
and had realized less of that
potential.
● Paid Growth - Hosts
● Classifying listings
● Experience Ranking + Personalization
● Host Availability
● Business Travel Classifier
● Room Type Categorizations
● Make Listing a Space Easier
● Customer Service Ticket Routing
● … And many more
Vision
Airbnb routinely ships ML-powered features throughout the
product.
Mission
Equip Airbnb with shared technology to build
production-ready ML applications with no incidental
complexity.
(Technology = tools, platforms, knowledge, shared feature data, etc.)
Value of ML
Infrastructure
Machine Learning Infrastructure can:
● Remove incidental complexities, by providing
generic, reusable solutions
● Simplify the workflow for intrinsic
complexities, by providing tooling, libraries,
and environments that make ML
development more efficient
And at the same time:
● Establish a standardized platform that
enables cross-company sharing of feature
data and model components
● “Make it easy to do the right thing” (ex:
consistent training/streaming/scoring logic)
Bighead: Motivations
Learnings:
● No consistency between ML Workflows
● New teams struggle to begin using ML
● Airbnb has a wide variety in ML applications
● Existing ML workflows are slow, fragmented, and brittle
● Incidental complexity vs. intrinsic complexity
● Build and forget - ML as a linear process
Q1 2017: Figuring out what to build
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Architecture
● Consistent environment across the stack
○ Use Docker
● Common workflow across different ML frameworks
○ Supports Scikit-learn, TF, PyTorch, etc.
● Modular components
○ Easy to customize parts
○ Easy to share data/pipelines
Key Design Decisions
Architecture
Components
air/mlinfravision
● Data Management: Zipline
● Training: Redspot / BigQueue
● Core ML Library: Bighead libraries
● Productionization: Deep Thought (online) / ML Automator (offline)
● Model Management: Model Repo
● Monitoring: Model Repo UI
Zipline (ML Data Management Framework)
Zipline - Why
● Defining features (especially windowed ones) with Hive was complicated and
error-prone
● Backfilling training sets (with inefficient Hive queries) was a major bottleneck
● No feature sharing
● Inconsistent offline and online datasets
● Warehouse is built as of end-of-day, so it lacked point-in-time features
● ML data pipelines lacked data quality checks or monitoring
● Ownership of pipelines was in disarray
A data management platform for ML
● Common (and simple) definition: Define the feature once and use it in batch
and streaming
● Training data backfills: Resource efficient and point-in-time correct with
scheduled updates
● Lambda updates: Features available both offline and online
● Data quality: Feature visualizations and automatic data quality monitoring
Zipline - Overview
Zipline - Feature definition language
Primary Key
Timestamp
Owner
Operation = Sum
Time windows
● Owner allows us
to trace
accountability
● Primary keys and
timestamp are
used to guarantee
point in time
correctness in
Training Set
● Operations and
time windows are
optional
● Spark efficiently
handles
aggregations
(windowed and
not)
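As a rough illustration only, the fields called out above (primary key, timestamp, owner, operation, time windows) could be modelled as follows. `FeatureDef` and `windowed_sum` are hypothetical names invented for this sketch; the slides do not show Zipline's actual definition language, and Zipline runs its aggregations on Spark rather than in plain Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical stand-in for a feature definition; not Zipline's real API."""
    name: str
    primary_key: str          # column identifying the entity, e.g. "user_id"
    timestamp: str            # column holding the event time
    value: str                # column being aggregated
    owner: str                # who is accountable for the feature
    operation: str = "sum"    # only "sum" is implemented in this sketch
    window: Optional[timedelta] = None  # None means an unwindowed aggregate


def windowed_sum(feature, events, key, as_of):
    """Sum `feature.value` for `key` over events strictly before `as_of`,
    restricted to the trailing window when one is set."""
    if feature.operation != "sum":
        raise ValueError("this sketch only implements sum")
    start = as_of - feature.window if feature.window else datetime.min
    return sum(
        e[feature.value]
        for e in events
        if e[feature.primary_key] == key and start <= e[feature.timestamp] < as_of
    )


bookings_7d = FeatureDef(
    name="bookings_by_user_7d",
    primary_key="user_id",
    timestamp="ts",
    value="bookings",
    owner="ml-infra",  # illustrative owner
    window=timedelta(days=7),
)

# Made-up events, for illustration only.
events = [
    {"user_id": 123, "ts": datetime(2018, 1, 1), "bookings": 1},
    {"user_id": 123, "ts": datetime(2018, 1, 5), "bookings": 2},
    {"user_id": 234, "ts": datetime(2018, 1, 5), "bookings": 4},
]

total = windowed_sum(bookings_7d, events, key=123, as_of=datetime(2018, 1, 6))
```

Making the window optional mirrors the slide's note that operations and time windows are optional; the same definition object could in principle drive both a batch and a streaming computation.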
Zipline - Data Quality and Collaboration
● Features can be
visualized and
browsed through
online editor
● Gives stats on
feature, and also
provides info on
ownership
Zipline - Training Data
PK1 = User ID | PK2 = Listing ID | Timestamp        | bookings_by_user | bookings_by_listing
123           | 456              | 2018-01-01 23... | 0                | 4
234           | 567              | 2018-01-04 01... | 2                | 8
456           | 789              | 2018-01-02 08... | 1                | 0
User provides: Primary keys, timestamps, list of features
Zipline computes feature values
point-in-time correct for those PKs and
those timestamps. And joins them
together.
FeatureSet 1 FeatureSet 2
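The point-in-time guarantee can be illustrated with a small sketch. For each (primary key, timestamp) row the user supplies, only events that occurred strictly before that timestamp are counted, so no future information leaks into the training set. The function name and the event layout are assumptions made for illustration; the slides do not show Zipline's implementation, which runs on Spark.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime


def point_in_time_counts(events, queries):
    """For each (key, ts) query, count events for that key with event time < ts."""
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)
    for times in by_key.values():
        times.sort()
    # bisect_left returns the number of event times strictly less than ts.
    return [bisect_left(by_key.get(key, []), ts) for key, ts in queries]


# Made-up booking events as (user_id, event_time) pairs.
booking_events = [
    (123, datetime(2017, 12, 30)),
    (234, datetime(2018, 1, 2)),
    (234, datetime(2018, 1, 3)),
    (234, datetime(2018, 1, 9)),  # after the query below, so it is ignored
]
queries = [(123, datetime(2017, 12, 1)), (234, datetime(2018, 1, 4))]

counts = point_in_time_counts(booking_events, queries)
```

Sorting each key's event times once and using binary search keeps every lookup logarithmic, which is one simple way to avoid rescanning all events per training row.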
Zipline - Training Data
Airflow integration for daily
update of training data
Label logic
● Labels are often
joined to features with
an offset for training
(60 days offset)
● But that offset does
not apply to scoring
data
Zipline - Training Data with Labels
[Figure: Features Table and Labels Table partitions. Training: features at ds=2017-08-16, labels at ds=2017-10-15. Scoring: features at ds=2017-10-15, labels not yet known (???).]
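The label offset can be sketched as below, assuming daily partitions keyed by a `ds` date. The function names and the sample values are hypothetical; only the 60-day offset and the partition dates come from the slides.

```python
from datetime import date, timedelta

LABEL_OFFSET = timedelta(days=60)  # the 60-day offset mentioned on the slide


def build_training_rows(features_by_ds, labels_by_ds, offset=LABEL_OFFSET):
    """Pair each feature partition with the label partition `offset` later.
    Partitions whose labels do not exist yet are skipped."""
    rows = []
    for ds, features in sorted(features_by_ds.items()):
        label_ds = ds + offset
        if label_ds in labels_by_ds:
            rows.append((ds, features, labels_by_ds[label_ds]))
    return rows


def scoring_features(features_by_ds):
    """Scoring uses the latest feature partition with no offset applied."""
    latest = max(features_by_ds)
    return latest, features_by_ds[latest]


# Made-up feature and label values, for illustration only.
features_by_ds = {
    date(2017, 8, 16): {"bookings_by_user": 2},
    date(2017, 10, 15): {"bookings_by_user": 5},
}
labels_by_ds = {date(2017, 10, 15): {"booked": 1}}

training = build_training_rows(features_by_ds, labels_by_ds)
scoring_ds, scoring = scoring_features(features_by_ds)
```

Here the 2017-08-16 features are paired with the 2017-10-15 labels for training, while scoring reads the 2017-10-15 features directly because their labels lie in the future.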
Features served
from online KV
store
Zipline schedules
daily batch
correction
Zipline - Consistent online and offline features
User writes one conf
Zipline starts the
streaming job
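A minimal sketch of the lambda-style pattern described here, assuming an in-memory dictionary in place of the real online KV store. The class and method names are hypothetical: the streaming path updates a feature incrementally, and the daily batch correction overwrites it with the value recomputed offline.

```python
class OnlineFeatureStore:
    """In-memory stand-in for an online key-value store (illustrative only)."""

    def __init__(self):
        self._values = {}

    def apply_stream_event(self, key, delta):
        """Streaming path: update the feature incrementally as events arrive."""
        self._values[key] = self._values.get(key, 0) + delta

    def apply_batch_correction(self, batch_values):
        """Batch path: overwrite with values recomputed from the warehouse,
        repairing any drift from late or dropped stream events."""
        self._values.update(batch_values)

    def get(self, key):
        return self._values.get(key, 0)


store = OnlineFeatureStore()
store.apply_stream_event("user:123", 1)
store.apply_stream_event("user:123", 1)
before = store.get("user:123")
# Suppose the stream missed one booking; the daily batch job has the true count.
store.apply_batch_correction({"user:123": 3})
after = store.get("user:123")
```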
● More efficient cluster usage: Hive and Spark jobs are optimized; Many weeks
to create training data backfills => a few hours
● Ease of use: Can define 100s of new features in a few hours (from many days)
● Online scoring with lambda: Features are automatically available in the online
scoring environment
● Collaboration: Many features are shared!
● Management: Clear data ownership and maintenance
Zipline - Impact
Redspot (Hosted Jupyter Notebook Service)
Architecture
● Started with JupyterHub (open-source project), which manages multiple Jupyter
Notebook servers (prototyping environment)
● But users were installing packages locally, and then creating virtualenvs for
other parts of our infra
○ Environment was very fragile
● Users wanted to be able to use JupyterHub on larger instances or instances
with GPUs
● Users also commonly wanted to share notebooks with teammates
Redspot - Why
Containerized environments
● Every user’s environment is containerized via Docker
○ Allows customizing the notebook environment without
affecting other users
■ e.g. install system/python packages
○ Easier to restore state, which helps with reproducibility
● Support using custom Docker images
○ Base images based on user’s needs
■ e.g. GPU access, pre-installed ML packages
○ Build your own image for a faster start time
Remote Instance Spawner
● For bigger jobs and total isolation,
Redspot allows launching a dedicated
instance
● Hardware resources not shared with
other users
● Automatically terminates idle instances
periodically
● A multi-tenant notebook environment
● Makes it easy to iterate and prototype ML models, share work
○ Integrated with the rest of our infra - so one can deploy a notebook to prod
● Improved upon open-source JupyterHub
○ Containerized; can bring custom Docker env
○ Remote notebook spawner for dedicated instances (P3 and X1 machines on
AWS)
○ Persist notebooks in EFS and share with teams
○ Reverting to prior checkpoint
Redspot Summary
Deep Thought (Online Inference Service)
Architecture
● Performant, scalable execution of model inference in production is hard
○ Engineers shouldn’t build one off solutions for every model.
○ Data scientists should be able to launch new models in production with minimal
eng involvement.
● Debugging differences between online inference and training is difficult
○ We should support the exact serialized version of the model the data scientist
built
○ We should be able to run the same Python transformations data scientists write
for training.
○ We should be able to load data computed in the warehouse or streaming easily
into online scoring.
Deep Thought - Why
● Deep Thought is a shared service for online inference
○ Support for pickled sklearn models, TensorFlow models, and custom code in
Python or Java
○ Add your model configuration to a file and deploy. Completely config driven so data
scientists don’t have to involve engineers to launch new models.
○ Engineers can then connect to a REST API from other services to get scores.
○ Support for loading data from K/V stores
○ Standardized logging, alerting and dashboarding for monitoring and offline
analysis of model performance
○ Process isolation to enable multi-tenancy without contention
○ Scalable and Reliable: 80+ models. Highest QPS service at Airbnb. Median response
time: 4ms. p95: 13ms.
Deep Thought - How
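The config-driven idea can be sketched as follows. Everything here is an assumption for illustration: the config keys, the `ModelServer` class, and the toy linear model are hypothetical, and the slides do not show Deep Thought's real config schema or REST interface. A pickled dictionary of parameters stands in for a pickled sklearn model.

```python
import json
import pickle

# Hypothetical config; not Deep Thought's real schema.
MODEL_CONFIG = json.loads("""
{
  "name": "toy_linear_model",
  "format": "pickle",
  "features": ["bookings_by_user", "bookings_by_listing"]
}
""")

# Stand-in for an artifact a data scientist would have trained and serialized.
ARTIFACT_BYTES = pickle.dumps({"weights": [0.5, 0.25], "bias": 1.0})

# One loader per supported serialization format. Only unpickle trusted data.
LOADERS = {"pickle": pickle.loads}


class ModelServer:
    """Keeps deployed models in memory and scores requests by model name."""

    def __init__(self):
        self._models = {}

    def deploy(self, config, artifact_bytes):
        load = LOADERS[config["format"]]
        self._models[config["name"]] = (config, load(artifact_bytes))

    def score(self, name, request):
        config, params = self._models[name]
        # Read features in the order the config declares them.
        values = [request[f] for f in config["features"]]
        return params["bias"] + sum(w * v for w, v in zip(params["weights"], values))


server = ModelServer()
server.deploy(MODEL_CONFIG, ARTIFACT_BYTES)
score = server.score(
    "toy_linear_model", {"bookings_by_user": 2, "bookings_by_listing": 8}
)
```

In a real service the `score` method would sit behind an HTTP endpoint; the point of the sketch is only that adding a model means adding a config entry and an artifact, not new serving code.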
Model Repo
Model Repo
Overview
Model Repo is Bighead’s model management service
● Contains prototype and production models
● Can serve models “raw” or trained
● The source of truth on which trained models are
in production
● Stores model health data
Model Repo
Internals
We decompose Models into two components:
● Model Version - raw model code + docker image
● Model Artifact - parameters learned via training
Model
Version
Model Artifact
Code
Docker
Image
A trained model consists of:
Model Version
+
Model Artifact
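One way to picture this decomposition is as plain data classes. The field names (`code_ref`, `docker_image`, `artifact_uri`) and the sample values are hypothetical, since the slides do not describe Model Repo's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    code_ref: str       # e.g. a commit hash for the raw model code
    docker_image: str   # image the model code runs in


@dataclass(frozen=True)
class ModelArtifact:
    artifact_uri: str   # where the parameters learned via training are stored


@dataclass(frozen=True)
class TrainedModel:
    version: ModelVersion
    artifact: ModelArtifact


# Illustrative values only.
version = ModelVersion(code_ref="abc123", docker_image="example/model:1")
run_1 = TrainedModel(version, ModelArtifact("store://artifacts/run-1"))
run_2 = TrainedModel(version, ModelArtifact("store://artifacts/run-2"))
```

Separating the two lets the same Model Version be retrained repeatedly, with each run yielding a distinct trained model that can be deployed or rolled back on its own.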
Production
Our built-in UI provides:
● Deployment - review changes, deploy, and rollback trained models
● Model Health - metrics, visualizations, alerting, central dashboard
● Experimentation - Ability to set up model experiments - e.g. split traffic
between two or more models
Model Repo: UI
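Splitting traffic between models is often done by hashing a stable request or user identifier, so the same identifier always reaches the same model. The sketch below shows that general technique; it is not taken from the slides, and `assign_model` and the model names are hypothetical.

```python
import hashlib


def assign_model(request_id, split):
    """Deterministically map a request id to a model according to `split`,
    a list of (model_name, weight) pairs whose weights sum to 1.0."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Use the first 8 bytes as a uniform number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for model_name, weight in split:
        cumulative += weight
        if point < cumulative:
            return model_name
    return split[-1][0]  # guard against floating-point rounding


split = [("model_a", 0.5), ("model_b", 0.5)]
assignments = [assign_model(f"request-{i}", split) for i in range(1000)]
share_a = assignments.count("model_a") / len(assignments)
```

Because the assignment depends only on the identifier and the split, no per-request state needs to be stored to keep the experiment consistent.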
ML Automator
● Tools and libraries for common tasks
○ Periodic training, evaluation and scoring on a model is common: Building Airflow
DAGs, uploading scores to K/V stores, dashboards on scores, alert on score changes
○ Scoring on large tables is tricky to scale
ML Automator - Why
● Once a model file is checked in, we generate the DAGs automatically to train/score it
● 40+ models using this feature
● Score on Spark for large datasets (we generate a virtualenv equivalent to the Docker image,
as Spark doesn’t run executors inside a Docker image)
ML Automator
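The idea of generating a DAG from a checked-in model can be sketched without Airflow, using the standard library's `graphlib` to order tasks. The task names and the `generate_dag` function are assumptions for illustration; ML Automator's real output is Airflow DAGs, which are not shown in the slides.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def generate_dag(model_name, periodic_training=True):
    """Return {task: set_of_upstream_tasks} for a checked-in model.
    Task names are illustrative, not ML Automator's real ones."""
    tasks = {}
    upstream = set()
    if periodic_training:
        tasks[f"{model_name}.train"] = set()
        tasks[f"{model_name}.evaluate"] = {f"{model_name}.train"}
        upstream = {f"{model_name}.evaluate"}
    tasks[f"{model_name}.score"] = set(upstream)
    tasks[f"{model_name}.upload_scores_to_kv"] = {f"{model_name}.score"}
    return tasks


dag = generate_dag("pricing")
# static_order() yields each task only after all of its upstream tasks.
run_order = list(TopologicalSorter(dag).static_order())
```

With training enabled the order is train, evaluate, score, then upload; with `periodic_training=False` only the scoring and upload tasks are generated.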
Core Libraries
ML Helpers - Why
● Transformations are re-written too often
○ There are many versions of transformations for NLP, data cleaning, imputing, etc.
○ Models used to “start from scratch” and rebuild the same things
○ Model observability -- understand what features are important
● Library of transformations; holds more than 50 different transformations including
automated preprocessing for common input formats
● Created example notebooks to show usage of our infra
○ Example usage of ML pipelines, contains diagnostics that help people debug and
improve models
○ Has been cloned and modified more than 20 times to build new models
● Improved Scikit-Learn Pipelines
○ Propagate feature metadata so we can plot feature importance at the end and
connect it to feature names
○ Pipelines for data processing are reusable in other pipelines
○ Added wrappers for model libraries (XGB, etc.) that can be serialized (robust to minor
version changes)
ML Helpers and Pipelines
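The metadata-propagation idea can be illustrated in plain Python. This sketch deliberately does not use scikit-learn's API: `Step`, `OneHot`, `Pipeline`, and `importance_by_name` are hypothetical stand-ins, and the importance values are made up. Each step returns the transformed rows together with updated feature names, so importances can be matched to names at the end.

```python
class Step:
    """Minimal transformer interface for this sketch (not scikit-learn's API)."""

    def transform(self, rows, names):
        raise NotImplementedError


class OneHot(Step):
    """Replace one categorical column with one 0/1 column per category."""

    def __init__(self, column, categories):
        self.column, self.categories = column, categories

    def transform(self, rows, names):
        idx = names.index(self.column)
        new_names = names[:idx] + names[idx + 1:] + [
            f"{self.column}={c}" for c in self.categories
        ]
        new_rows = [
            row[:idx] + row[idx + 1:]
            + [1.0 if row[idx] == c else 0.0 for c in self.categories]
            for row in rows
        ]
        return new_rows, new_names


class Pipeline(Step):
    """A pipeline is itself a Step, so it can be reused inside other pipelines."""

    def __init__(self, steps):
        self.steps = steps

    def transform(self, rows, names):
        for step in self.steps:
            rows, names = step.transform(rows, names)
        return rows, names


def importance_by_name(names, importances):
    """Attach feature names to importances, most important first."""
    return sorted(zip(names, importances), key=lambda p: p[1], reverse=True)


rows = [[2, "entire_home"], [5, "private_room"]]
names = ["bookings_by_user", "room_type"]
pipe = Pipeline([OneHot("room_type", ["entire_home", "private_room"])])
out_rows, out_names = pipe.transform(rows, names)
ranked = importance_by_name(out_names, [0.2, 0.7, 0.1])  # made-up importances
```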
Open Source in H1 2018
If you want to collaborate we can provide
early access
nick.handel@airbnb.com
krishna.puttaswamy@airbnb.com
Appendix
ML models have diverse dependency sets (tensorflow,
xgboost, etc.). We allow users to provide a docker image
within which model code always runs.
ML models don’t run in isolation however, so we’ve built a
lightweight API to interact with the “dockerized model”
Docker Container
Model
(user code)
Other ML
Infra
Services
Model
API
Dockerized
Models

Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
ashiklo9823
 

Recently uploaded (20)

Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
 
TEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with YouTEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with You
 
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
Private Girls Call Navi Mumbai 🛵🚡9820252231 💃 Choose Best And Top Girl Servic...
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
 
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
 
ERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in CoimbatoreERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in Coimbatore
 
High Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 ...
High Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 ...High Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 ...
High Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 ...
 
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
 
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDSAmadeus Travel API, Amadeus Booking API, Amadeus GDS
Amadeus Travel API, Amadeus Booking API, Amadeus GDS
 
GT degree offer diploma Transcript
GT degree offer diploma TranscriptGT degree offer diploma Transcript
GT degree offer diploma Transcript
 
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
 
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
 
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
Russian Girls Call Mumbai 🛵🚡9833363713 💃 Choose Best And Top Girl Service And...
 
ThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and DjangoThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and Django
 
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
Celebrity Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service A...
 
Busty Girls Call Mumbai 9930245274 Unlimited Short Providing Girls Service Av...
Busty Girls Call Mumbai 9930245274 Unlimited Short Providing Girls Service Av...Busty Girls Call Mumbai 9930245274 Unlimited Short Providing Girls Service Av...
Busty Girls Call Mumbai 9930245274 Unlimited Short Providing Girls Service Av...
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
Maximizing Efficiency and Profitability: Optimizing Data Systems, Enhancing C...
 
AI - Your Startup Sidekick (Leveraging AI to Bootstrap a Lean Startup).pdf
AI - Your Startup Sidekick (Leveraging AI to Bootstrap a Lean Startup).pdfAI - Your Startup Sidekick (Leveraging AI to Bootstrap a Lean Startup).pdf
AI - Your Startup Sidekick (Leveraging AI to Bootstrap a Lean Startup).pdf
 
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
 

ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure

  • 1. Bighead Airbnb’s End-to-End Machine Learning Infrastructure Krishna Puttaswamy & Nick Handel On behalf of ML Infra @ Airbnb
  • 3. In 2016 ● Only major models in production ● Models took on average 8 weeks to build (source: survey of ML producers) ● Everything built in Aerosolve, Spark and Scala ● No support for TensorFlow, PyTorch, scikit-learn or other popular ML packages ● Significant discrepancies between offline and online data ML Infra was formed with the charter to: ● Enable more users to build ML products ● Reduce time and effort ● Enable easier model evaluation Q4 2016: Formation of our ML Infra team
  • 4. Before ML Infrastructure ML has had a massive impact on Airbnb’s product ● Search Ranking ● Smart Pricing ● Trust ● Paid Growth ● …And a few other major models
  • 5. After ML Infrastructure But many other areas had high potential for ML yet realized less of that potential. ● Paid Growth - Hosts ● Classifying listing ● Experience Ranking + Personalization ● Host Availability ● Business Travel Classifier ● Room Type Categorizations ● Make Listing a Space Easier ● Customer Service Ticket Routing ● … And many more
  • 6. Vision Airbnb routinely ships ML-powered features throughout the product. Mission Equip Airbnb with shared technology to build production-ready ML applications with no incidental complexity. (Technology = tools, platforms, knowledge, shared feature data, etc.)
  • 7. Value of ML Infrastructure Machine Learning Infrastructure can: ● Remove incidental complexities, by providing generic, reusable solutions ● Simplify the workflow for intrinsic complexities, by providing tooling, libraries, and environments that make ML development more efficient And at the same time: ● Establish a standardized platform that enables cross-company sharing of feature data and model components ● “Make it easy to do the right thing” (ex: consistent training/streaming/scoring logic)
  • 9. Learnings: ● No consistency between ML Workflows ● New teams struggle to begin using ML ● Airbnb has a wide variety in ML applications ● Existing ML workflows are slow, fragmented, and brittle ● Incidental complexity vs. intrinsic complexity ● Build and forget - ML as a linear process Q1 2017: Figuring out what to build
  • 12. ● Consistent environment across the stack ○ Use Docker ● Common workflow across different ML frameworks ○ Supports Scikit-learn, TF, PyTorch, etc. ● Modular components ○ Easy to customize parts ○ Easy to share data/pipelines Key Design Decisions
  • 18. Components air/mlinfravision ● Data Management: Zipline ● Training: Redspot / BigQueue ● Core ML Library: Bighead libraries ● Productionization: Deep Thought (online) / ML Automator (offline) ● Model Management: Model Repo ● Monitoring: Model Repo UI
  • 19. Zipline (ML Data Management Framework)
  • 20. Zipline - Why ● Defining features (especially windowed) with Hive was complicated and error-prone ● Backfilling training sets (on inefficient Hive queries) was a major bottleneck ● No feature sharing ● Inconsistent offline and online datasets ● Warehouse is built as of end-of-day and lacked point-in-time features ● ML data pipelines lacked data quality checks or monitoring ● Ownership of pipelines was in disarray
  • 21. A data management platform for ML ● Common (and simple) definition: Define the feature once and use it in batch and streaming ● Training data backfills: Resource efficient and point-in-time correct with scheduled updates ● Lambda updates: Features available both offline and online ● Data quality: Feature visualizations and automatic data quality monitoring Zipline - Overview
  • 22. Zipline - Feature definition language Primary Key Timestamp Owner Operation = Sum Time windows ● Owner allows us to trace accountability ● Primary keys and timestamp are used to guarantee point in time correctness in Training Set ● Operations and time windows are optional ● Spark efficiently handles aggregations (windowed and not)
  • 23. Zipline - Data Quality and Collaboration ● Features can be visualized and browsed through online editor ● Gives stats on feature, and also provides info on ownership
  • 24. Zipline - Training Data PK1 = User ID PK2 = Listing ID Timestamp bookings_by_user bookings_by_listing 123 456 2018-01-01 23... 0 4 234 567 2018-01-04 01... 2 8 456 789 2018-01-02 08... 1 0 User provides: Primary keys, timestamps, list of features Zipline computes feature values point-in-time correct for those PKs and those timestamps. And joins them together. FeatureSet 1 FeatureSet 2
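The point-in-time join described on this slide can be illustrated with a minimal sketch. It is not Zipline's implementation (the slides say Zipline runs on Spark); the event list, the helper names `count_before` and `build_training_rows`, and the sample values are all hypothetical. The only idea it shows is that a feature value for a given primary key and timestamp is computed from events that happened strictly before that timestamp.

```python
from datetime import datetime

# Hypothetical raw booking events: (user_id, listing_id, booked_at).
BOOKINGS = [
    (234, 900, datetime(2018, 1, 1, 9, 0)),
    (234, 901, datetime(2018, 1, 3, 12, 0)),
    (234, 567, datetime(2018, 1, 5, 8, 0)),  # later than the query time below
    (456, 567, datetime(2018, 1, 1, 10, 0)),
]


def count_before(events, key_index, key, as_of):
    """Count events for `key` that happened strictly before `as_of`."""
    return sum(1 for e in events if e[key_index] == key and e[2] < as_of)


def build_training_rows(queries, events):
    """Attach point-in-time feature values to (user_id, listing_id, ts) rows."""
    rows = []
    for user_id, listing_id, ts in queries:
        rows.append({
            "user_id": user_id,
            "listing_id": listing_id,
            "timestamp": ts,
            # Each feature only sees events earlier than the row's timestamp.
            "bookings_by_user": count_before(events, 0, user_id, ts),
            "bookings_by_listing": count_before(events, 1, listing_id, ts),
        })
    return rows


queries = [(234, 567, datetime(2018, 1, 4, 1, 0))]
rows = build_training_rows(queries, BOOKINGS)
print(rows[0]["bookings_by_user"], rows[0]["bookings_by_listing"])
```

The booking made on 2018-01-05 is excluded from both counts because it falls after the row's timestamp; including it would leak future information into the training set.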
  • 25. Zipline - Training Data Airflow integration for daily update of training data
  • 26. Label logic ● Labels are often joined to features with an offset for training (60 days offset) ● But that offset does not apply to scoring data Zipline - Training Data with Labels ds=2017-08-16 ds=2017-10-15 ??? Features Table Labels Table Training ... ??? ds=2017-10-15 Scoring
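One reading of the label logic above, sketched in Python: for training, a features partition is paired with the labels partition 60 days later, while scoring uses the latest features partition with no offset. The function names are hypothetical and the pairing rule is an assumption drawn from the dates on the slide, not a description of Zipline's actual scheduler.

```python
from datetime import date, timedelta

LABEL_OFFSET_DAYS = 60  # offset quoted on the slide


def training_partitions(feature_ds, offset_days=LABEL_OFFSET_DAYS):
    """Return the (features, labels) partition dates joined for training."""
    return feature_ds, feature_ds + timedelta(days=offset_days)


def scoring_partition(latest_ds):
    """Scoring uses the latest features partition; no label offset applies."""
    return latest_ds


features_ds, labels_ds = training_partitions(date(2017, 8, 16))
print("train:", features_ds.isoformat(), "->", labels_ds.isoformat())
print("score:", scoring_partition(date(2017, 10, 15)).isoformat())
```

With these inputs the training pair is ds=2017-08-16 for features and ds=2017-10-15 for labels, which matches the two dates shown in the slide's diagram.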
  • 27. Features served from online KV store Zipline schedules daily batch correction Zipline - Consistent online and offline features User writes one conf Zipline starts the streaming job
  • 28. ● More efficient cluster usage: Hive and Spark jobs are optimized; Many weeks to create training data backfills => a few hours ● Ease of use: Can define 100s of new features in a few hours (from many days) ● Online scoring with lambda: Features are automatically available in the online scoring environment ● Collaboration: Many features are shared! ● Management: Clear data ownership and maintenance Zipline - Impact
  • 29. Redspot (Hosted Jupyter Notebook Service)
  • 31. ● Started with JupyterHub (open-source project), which manages multiple Jupyter Notebook Servers (prototyping environment) ● But users were installing packages locally, and then creating virtualenvs for other parts of our infra ○ Environment was very fragile ● Users wanted to be able to use JupyterHub on larger instances or instances with GPU ● Users also commonly wanted to share notebooks with teammates Redspot - Why
  • 32. Containerized environments ● Every user’s environment is containerized via Docker ○ Allows customizing the notebook environment without affecting other users ■ e.g. install system/Python packages ○ Easier to restore state, which helps with reproducibility ● Support using custom Docker images ○ Base images based on user’s needs ■ e.g. GPU access, pre-installed ML packages ○ Build your own image for a faster start time
  • 33. Remote Instance Spawner ● For bigger jobs and total isolation, Redspot allows launching a dedicated instance ● Hardware resources not shared with other users ● Automatically terminates idle instances periodically
  • 34. ● A multi-tenant notebook environment ● Makes it easy to iterate and prototype ML models, share work ○ Integrated with the rest of our infra - so one can deploy a notebook to prod ● Improved upon open source Jupyterhub ○ Containerized; can bring custom Docker env ○ Remote notebook spawner for dedicated instances (P3 and X1 machines on AWS) ○ Persist notebooks in EFS and share with teams ○ Reverting to prior checkpoint Redspot Summary
  • 35. Deep Thought (Online Inference Service)
  • 37. ● Performant, scalable execution of model inference in production is hard ○ Engineers shouldn’t build one-off solutions for every model. ○ Data scientists should be able to launch new models in production with minimal eng involvement. ● Debugging differences between online inference and training is difficult ○ We should support the exact serialized version of the model the data scientist built ○ We should be able to run the same Python transformations data scientists write for training. ○ We should be able to load data computed in the warehouse or streaming easily into online scoring. Deep Thought - Why
  • 38. ● Deep Thought is a shared service for online inference ○ Support for pickled sklearn models, TensorFlow models, and custom code in python or Java ○ Add your model configuration to a file and deploy. Completely config driven so data scientists don’t have to involve engineers to launch new models. ○ Engineers can then connect to a REST API from other services to get scores. ○ Support for loading data from K/V stores ○ Standardized logging, alerting and dashboarding for monitoring and offline analysis of model performance ○ Process isolation to enable multi-tenancy without contention ○ Scalable and Reliable: 80+ models. Highest QPS service at Airbnb. Median response time: 4ms. p95: 13ms. Deep Thought - How
  • 41. Model Repo Overview Model Repo is Bighead’s model management service ● Contains prototype and production models ● Can serve models “raw” or trained ● The source of truth on which trained models are in production ● Stores model health data
  • 42. Model Repo Internals We decompose Models into two components: ● Model Version - raw model code + docker image ● Model Artifact - parameters learned via training Model Version Model Artifact Code Docker Image A trained model consists of: Model Version + Model Artifact Production
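The decomposition on this slide can be sketched as a small data model. The class and field names below are hypothetical, chosen only to mirror the slide's terms; Model Repo's real schema is not given in the deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    """Raw model code plus the Docker image it runs in."""
    code_ref: str       # hypothetical: e.g. a commit hash
    docker_image: str   # hypothetical: e.g. an image tag


@dataclass(frozen=True)
class ModelArtifact:
    """Parameters learned via training."""
    parameters_path: str  # hypothetical storage location


@dataclass(frozen=True)
class Model:
    """A model is 'raw' until a Model Artifact is attached to its Model Version."""
    version: ModelVersion
    artifact: Optional[ModelArtifact] = None

    @property
    def is_trained(self) -> bool:
        return self.artifact is not None


raw = Model(ModelVersion(code_ref="abc123", docker_image="example/model:1"))
trained = Model(raw.version, ModelArtifact(parameters_path="artifacts/0001.bin"))
print(raw.is_trained, trained.is_trained)
```

Keeping the version and the artifact separate means the same code and image can be retrained into a new artifact without creating a new Model Version.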
  • 43. Our built-in UI provides: ● Deployment - review changes, deploy, and rollback trained models ● Model Health - metrics, visualizations, alerting, central dashboard ● Experimentation - Ability to set up model experiments - e.g. split traffic between two or more models Model Repo: UI
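The slides do not say how traffic is split between models in an experiment. One common approach, shown here purely as an assumption, is to hash a stable request key into a bucket so the same key always reaches the same model. The function name `assign_model` and the weights are hypothetical.

```python
import hashlib


def assign_model(request_key, weights):
    """Deterministically map a key to a model name according to traffic weights."""
    digest = hashlib.sha256(request_key.encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled to a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    total = sum(weights.values())
    cumulative = 0.0
    chosen = None
    for name, weight in weights.items():
        chosen = name
        cumulative += weight / total
        if bucket < cumulative:
            break
    return chosen  # falls back to the last model if rounding leaves a gap


weights = {"model_a": 0.9, "model_b": 0.1}  # hypothetical 90/10 split
print(assign_model("user-123", weights))
```

Hashing rather than random sampling keeps assignments stable across requests, which makes per-model metrics easier to compare.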
  • 45. ● Tools and libraries for common tasks ○ Periodic training, evaluation and scoring on a model is common: Building Airflow DAGs, uploading scores to K/V stores, dashboards on scores, alert on score changes ○ Scoring on large tables is tricky to scale ML Automator - Why
  • 46. ● Once a model file is checked in, we generate the DAGs automatically to train/score it ● 40+ models using this feature ● Score on Spark for large datasets (we generate a virtualenv equivalent to the Docker image, as Spark doesn’t run executors in a Docker image) ML Automator
  • 48. ML Helpers - Why ● Transformations are re-written too often ○ There are many versions of transformations for NLP, data cleaning, imputing, etc. ○ Models used to “start from scratch” and rebuild the same things ○ Model observability -- understand what features are important
  • 49. ● Library of transformations; holds more than 50 different transformations including automated preprocessing for common input formats ● Created example notebooks to show usage of our infra ○ Example usage of ML pipelines, contains diagnostics that help people debug and improve models ○ Has been cloned and modified more than 20 times to build new models ● Improved Scikit-Learn Pipelines ○ Propagate feature metadata so we can plot feature importance at the end and connect it to feature names ○ Pipelines for data processing are reusable in other pipelines ○ Added wrappers for model libraries (XGB, etc.) that can be serialized (robust to minor version changes) ML Helpers and Pipelines
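The idea of propagating feature metadata through a pipeline can be sketched without any ML library. This is not the Bighead or scikit-learn API: the `OneHot` class, its `transform(names, rows)` signature, and the importance values are hypothetical. Each step returns the new column names alongside the transformed rows, so importances can be tied back to readable feature names at the end.

```python
class OneHot:
    """Replace one categorical column with one indicator column per category."""

    def __init__(self, column, categories):
        self.column = column
        self.categories = categories

    def transform(self, names, rows):
        idx = names.index(self.column)
        # Carry the metadata forward: drop the source name, add derived names.
        new_names = names[:idx] + names[idx + 1:] + [
            f"{self.column}={c}" for c in self.categories
        ]
        new_rows = []
        for row in rows:
            encoded = [1.0 if row[idx] == c else 0.0 for c in self.categories]
            new_rows.append(row[:idx] + row[idx + 1:] + encoded)
        return new_names, new_rows


def rank_importances(names, importances):
    """Pair importances with feature names, largest first."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


names = ["price", "room_type"]
rows = [[120.0, "private"], [80.0, "shared"]]
names, rows = OneHot("room_type", ["private", "shared"]).transform(names, rows)

# Hypothetical importances, as a trained model might report per column.
ranked = rank_importances(names, [0.2, 0.7, 0.1])
print(ranked[0][0])
```

Because names travel with the data, a plot of feature importance can label each bar with the derived feature name rather than a bare column index.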
  • 50. Open Source in H1 2018 If you want to collaborate we can provide early access nick.handel@airbnb.com krishna.puttaswamy@airbnb.com
  • 52. ML models have diverse dependency sets (tensorflow, xgboost, etc.). We allow users to provide a docker image within which model code always runs. ML models don’t run in isolation however, so we’ve built a lightweight API to interact with the “dockerized model” Docker Container Model (user code) Other ML Infra Services Model API Dockerized Models