Parameter Server Approach for
Online Learning @ Twitter
Joe Xie, Yong Wang and Yue Lu
ML Infra Group, Ads Prediction Team
Oct 10, 2017
Outline
• Background
– Online learning
– Challenges
• Parameter Server Approaches
– v1.0 Decouple the training and prediction
– v2.0 Scale the training
– v3.0 Scale the model
• Future Directions
Background
Twitter is Realtime
• Twitter is all about real time: news, events, trends, hashtags.
– Users' interests and intent change in real time.
– Context changes in real time.
– New advertisers and new campaigns are added in real time.
• ML is increasingly at the core of everything we build at Twitter
– ML models dynamically adapt to changes over spans as short as a few hours or even minutes
Real time:
[Figure: Online Learning vs. Offline Learning. Each panel shows a Model over Time with a Data Stream and a Prediction Stream. Online learning has a single Learning Phase; offline learning has a Training Phase and a Serving Phase. Model access is labeled Read, Write, or Read & Write.]
Real time – Online Learning
Architecture
Simple and efficient for Ads Prediction and
Moments Relevance production services
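As a rough illustration of the online learning loop described above, here is a minimal sketch in which one model is both read (to serve a prediction) and written (to learn from the label) for every example in the stream. The class name, the plain SGD update and the learning rate are assumptions made for illustration; the slides do not specify the optimizer or the feature representation.

```python
import math


class OnlineLogisticRegression:
    """Sparse logistic regression trained one example at a time (hypothetical sketch)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # assumed value, not from the slides
        self.weights = {}                   # feature name -> weight

    def predict(self, features):
        """Return P(label = 1) for a sparse example {feature: value}."""
        score = sum(self.weights.get(name, 0.0) * value
                    for name, value in features.items())
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, features, label):
        """Apply one SGD step on the log loss for a single labelled example."""
        error = self.predict(features) - label
        for name, value in features.items():
            old = self.weights.get(name, 0.0)
            self.weights[name] = old - self.learning_rate * error * value


def run_stream(model, stream):
    """Serve a prediction for each example, then learn from its label."""
    predictions = []
    for features, label in stream:
        predictions.append(model.predict(features))  # read the model
        model.learn(features, label)                 # write the model
    return predictions
```

In this arrangement every instance that serves predictions must also see the training traffic, which is the coupling the challenges below refer to.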
Challenges
• Network fanout
– The same traffic stream is sent many times over, once to each prediction instance, wasting network bandwidth.
• Limit to training traffic size
– Online training throughput is currently limited by the capacity (CPU / network bandwidth) of a single Mesos worker.
• Limit to model size
– All models are hosted in the memory of each instance.
Parameter Server Approaches
Model Architecture
[Figure: model architecture. Boxes: Raw Features; Feature Crosses; Decision Tree (e.g., XGBoost...); Neural Network (e.g., Torch, TensorFlow...); ...; Distributed Large-scale Online Logistic Regression (Parameter Server).]
● Fully explore feature interactions without a training latency constraint.
● Historically, feature interactions do not change frequently.
● Flexible architecture that supports new model structures and external machine learning frameworks.
20X training data
- Parameter server v2.0 to scale the training traffic
10X features + algo complexity
- Parameter server v3.0 to scale the model size
10X prediction QPS
- Parameter server v1.0 to decouple the training and prediction requests
Parameter Server v1.0
[Figure: Parameter Server v1.0 architecture. Components: Training Traffic, Downsampling, Training Worker, Observation Service (Observation Workers), Prediction Service instances (Prediction Workers) each holding a Model, and Prediction Requests. Edge labels: Updates, Pull Model.]
10X prediction capacity
Higher serving efficiency
Through:
■ New architecture to decouple the training / prediction services into different clusters.
Parameter Server v1.0
• Separated training service
– Take training traffic to generate incremental model updates
• New observation service
– Consume incremental model updates
– Evaluate training traffic for model quality assurance
• Separated prediction service
– Consume incremental model updates
– Serve prediction requests
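The split between a training service that emits incremental updates and prediction instances that only consume them can be sketched as follows. This is a minimal illustration under assumptions: the names `TrainingWorker` and `PredictionInstance`, the SGD step, and the representation of an update as a sparse dictionary of weight changes are all hypothetical, and the transport between services is left out entirely.

```python
import math


def sigmoid(score):
    score = max(-30.0, min(30.0, score))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-score))


def score_of(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def apply_delta(weights, delta):
    """Add a sparse weight change into a weight dictionary in place."""
    for name, change in delta.items():
        weights[name] = weights.get(name, 0.0) + change


class TrainingWorker:
    """Consumes training traffic and emits incremental model updates."""

    def __init__(self, learning_rate=0.1):  # assumed learning rate
        self.learning_rate = learning_rate
        self.weights = {}

    def train(self, batch):
        """Train on a batch; return only the weights that changed, as deltas."""
        update = {}
        for features, label in batch:
            error = sigmoid(score_of(self.weights, features)) - label
            delta = {name: -self.learning_rate * error * value
                     for name, value in features.items()}
            apply_delta(self.weights, delta)
            apply_delta(update, delta)
        return update


class PredictionInstance:
    """Never sees training traffic; it only pulls updates and serves requests."""

    def __init__(self):
        self.weights = {}

    def pull(self, update):
        apply_delta(self.weights, update)

    def predict(self, features):
        return sigmoid(score_of(self.weights, features))
```

Because each update carries only the touched features, a prediction instance receives one small message per batch instead of the full training stream.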
Parameter Server v1.0
• Launched into ads engagement
prediction models.
– Mesos Efficiency: 40% reduction in CPU cores
required.
– Network Efficiency: 60% reduction in fan-out
messages required.
Parameter Server v2.0
[Figure: Parameter Server v2.0 architecture. Components: Training Traffic, Dispatch Workers, Training Workers, Parameter Server, Observation Service (Observation Worker), Prediction Service instances (Prediction Workers) each holding a Model, Downsampling, and Prediction Requests. Edge labels: NO downsampling, Push/Pull, Pull, Model.]
20X training data
Higher model quality
Through:
■ New architecture to distribute the training
Parameter Server v2.0
• New dispatch service
– Take un-sampled training traffic and dispatch it to the training service
• Updated training service
– Take training traffic and produce updates for the parameter service
– Receive model updates from the parameter service
• New parameter service
– Aggregate the updates from the training services
– Send model updates to the training / observation / prediction services
Parameter Server v2.0
• Launched into ads engagement prediction models.
• First version using simple model-average aggregation.
– 20x training capacity
– xx% model quality gain
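One plausible reading of "simple model-average aggregation" is that the parameter service averages the weight vectors reported by the training workers. The sketch below assumes sparse models stored as dictionaries and treats a feature missing from a worker as weight 0; both choices, and the function name `average_models`, are assumptions rather than details from the slides.

```python
def average_models(worker_models):
    """Average sparse weight dictionaries from several training workers.

    A feature absent from a worker's model is treated as weight 0.0
    (an assumption made for this sketch).
    """
    if not worker_models:
        return {}
    totals = {}
    for model in worker_models:
        for name, weight in model.items():
            totals[name] = totals.get(name, 0.0) + weight
    count = len(worker_models)
    return {name: total / count for name, total in totals.items()}
```

The averaged model would then be sent back to the training, observation and prediction services as the new shared state.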
Parameter Server v3.0
[Figure: Parameter Server v3.0 architecture. Same components as v2.0 (Training Traffic, Dispatch Workers, Training Workers, Observation Service, Prediction Service instances each holding a Model, Downsampling, Prediction Requests), now with multiple Parameter Server instances. Edge labels: NO downsampling, Push/Pull, Pull, Model.]
More complex model
Higher model quality
Through:
■ New architecture for model / feature sharding
Parameter Server v3.0
• Updated parameter service (in progress)
– Model sharding: each parameter instance hosts a single model instead of multiple models.
• xx% model quality gain in experimentation.
– Feature sharding: each parameter instance hosts part of a single model.
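Feature sharding can be pictured as routing each feature's weight to one parameter instance by a stable hash of the feature name. The sketch below is an assumption-heavy illustration: the CRC32-modulo routing, the `ShardedParameterServer` class and its `push` / `pull` methods are hypothetical, and the shards are in-process dictionaries standing in for separate parameter server instances.

```python
import zlib


def shard_for(feature, num_shards):
    """Map a feature name to a shard index with a stable hash (assumed scheme)."""
    return zlib.crc32(feature.encode("utf-8")) % num_shards


class ShardedParameterServer:
    """Holds one model split across several shards, keyed by feature name."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, feature):
        return self.shards[shard_for(feature, len(self.shards))]

    def push(self, delta):
        """Route each weight change to the shard that owns the feature."""
        for name, change in delta.items():
            shard = self._shard(name)
            shard[name] = shard.get(name, 0.0) + change

    def pull(self, features):
        """Fetch current weights for the requested features only."""
        return {name: self._shard(name).get(name, 0.0) for name in features}
```

With this layout no single instance needs to hold the whole model in memory, and a worker only pulls the weights for the features it actually sees.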
Future Directions
Future Work
DfMAy 2024 - key insights and contributions
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 

Parameter Server Approach for Online Learning at Twitter

  • 1. Parameter Server Approach for Online Learning @ Twitter Joe Xie, Yong Wang and Yue Lu ML Infra Group, Ads Prediction Team Oct 10, 2017
  • 2. Outline • Background – Online learning – Challenges • Parameter Server Approaches – v1.0 Decouple the training and prediction – v2.0 Scale the training – v3.0 Scale the model • Future Directions
  • 4. Twitter is Realtime • Twitter is all about real-time: news, events, trends, hashtags. – Users' interests and intent change in real time. – Context changes in real time. – New advertisers and new campaigns are added in real time. • ML is increasingly at the core of everything we build at Twitter – ML models dynamically adapt to changes over periods as short as a few hours or even minutes
  • 5. Real time: [Figure: Online Learning vs. Offline Learning. Each panel shows Time, Model, Data Stream and Prediction Stream. Phases: Learning Phase (online); Training Phase and Serving Phase (offline). Access labels: Read, Write, Read & Write.]
  • 6. Real time – Online Learning Architecture Simple and efficient for Ads Prediction and Moments Relevance production services
  • 7. Challenges • Network fanout – The same traffic stream is sent to every prediction instance, wasting network bandwidth. • Limit on training traffic size – Online training throughput is currently limited by the capacity (CPU / network bandwidth) of a single Mesos worker. • Limit on model size – All models are hosted in the memory of each instance.
  • 9. Model Architecture [Diagram labels: Raw Features; Feature Crosses; Decision Tree (e.g., XGBoost...); Neural Network (e.g., Torch, TensorFlow...); ...; Distributed Large-scale Online Logistic Regression (Parameter Server)] ● Fully explore the feature interactions without a training latency constraint. ● Historically, the feature interactions have not changed frequently. ● Flexible architecture that accommodates new model structures and external machine learning frameworks.
  • 10. Parameter Server Approaches • 20X training data – Parameter server v2.0 to scale the training traffic • 10X features + algorithm complexity – Parameter server v3.0 to scale the model size • 10X prediction QPS – Parameter server v1.0 to decouple the training and prediction requests
  • 11. Parameter Server v1.0 [Diagram labels: Training Traffic; Downsampling; Training Worker; Updates; Observation Service (Observation Workers); Instances of Prediction Service (Prediction Workers), each with a Model; Pull; Prediction Requests] ■ New architecture to decouple the training / prediction services into different clusters. 10X prediction capacity. Higher serving efficiency.
  • 12. Parameter Server v1.0 • Separated training service – Takes training traffic to generate incremental model updates • New observation service – Consumes incremental model updates – Evaluates training traffic for model quality assurance • Separated prediction service – Consumes incremental model updates – Serves prediction requests
  • 13. Parameter Server v1.0 • Launched into ads engagement prediction models. – Mesos Efficiency: 40% reduction in CPU cores required. – Network Efficiency: 60% reduction in fan-out messages required.
  • 14. Parameter Server v2.0 [Diagram labels: Training Traffic; Dispatch Workers (NO downsampling); Training Workers (Push/Pull); Parameter Server (Model); Observation Service (Observation Worker, Pull); Instances of Prediction Service (Prediction Workers), each with a Model; Pull; Downsampling; Prediction Requests] ■ New architecture to distribute the training. 20X training data. Higher model quality.
  • 15. Parameter Server v2.0 • New dispatch service – Takes un-sampled training traffic and dispatches it to the training service • Updated training service – Takes training traffic and produces updates for the parameter service – Receives model updates from the parameter service • New parameter service – Aggregates the updates from the training services – Sends model updates to the training / observation / prediction services
  • 16. Parameter Server v2.0 • Launched into ads engagement prediction models. • First version uses simple model-average aggregation. – 20x training capacity – xx% model quality gain
  • 17. Parameter Server v3.0 [Diagram labels: Training Traffic; Dispatch Workers (NO downsampling); Training Workers (Push/Pull); three Parameter Server instances (Model); Observation Service (Observation Worker, Pull); Instances of Prediction Service (Prediction Workers), each with a Model; Pull; Downsampling; Prediction Requests] ■ New architecture for model / feature sharding. More complex model. Higher model quality.
  • 18. Parameter Server v3.0 • Updated parameter service (in progress) – Model sharding: each parameter instance hosts a single model instead of multiple models. • xx% model quality gain in experimentation. – Feature sharding: each parameter instance hosts part of a single model.
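
The slides name two mechanisms without showing them: the simple model-average aggregation of v2.0 and the feature sharding of v3.0. The following is a minimal Python sketch of how the two could fit together, not the implementation used at Twitter. Everything in it is an assumption made for illustration: the names `average_models`, `shard_for` and `ShardedParameterServer`, the representation of a model as a dict from feature name to weight, the choice of CRC32 hashing to assign features to shards, and the rule that a feature missing from one worker's model counts as weight 0.0 in the average.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: a model is a sparse map from feature name to weight.
Weights = Dict[str, float]


def average_models(worker_models: List[Weights]) -> Weights:
    """Average each weight across workers.

    Assumption: a feature absent from a worker's model contributes 0.0.
    """
    if not worker_models:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for model in worker_models:
        for feature, weight in model.items():
            totals[feature] += weight
    count = len(worker_models)
    return {feature: total / count for feature, total in totals.items()}


def shard_for(feature: str, num_shards: int) -> int:
    """Map a feature to a shard with a stable hash (hypothetical scheme)."""
    return zlib.crc32(feature.encode("utf-8")) % num_shards


class ShardedParameterServer:
    """Hypothetical parameter service in which each shard holds part of one model."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.shards: List[Weights] = [{} for _ in range(num_shards)]

    def push(self, worker_models: List[Weights]) -> None:
        """Aggregate the workers' models and store each weight on its shard."""
        for feature, weight in average_models(worker_models).items():
            self.shards[shard_for(feature, self.num_shards)][feature] = weight

    def pull(self, features: Iterable[str]) -> Weights:
        """Fetch the requested weights; unknown features default to 0.0."""
        return {
            f: self.shards[shard_for(f, self.num_shards)].get(f, 0.0)
            for f in features
        }


# Usage: two training workers push their models; a prediction worker pulls.
server = ShardedParameterServer(num_shards=4)
server.push([{"a": 1.0, "b": 2.0}, {"a": 3.0}])
print(server.pull(["a", "b", "c"]))
```

Because the shard is a pure function of the feature name, any training, observation or prediction worker can locate a weight without a central lookup table, and no single instance has to hold the whole model in memory.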