ICML ’16
Scaling Machine Learning @Twitter
Jack Guo
ML is Important
• ~80% of DAU (daily active users) is attributed to teams doing ML
• ~90% of revenue comes from ads backed by ML models
• The ML platform supports many teams
– ads ranking, ads targeting, timeline ranking, anti-spam, recommendation, moments ranking, trends
ML is Large Scale
• Take ads ranking as an example
– Trillions of predictions made daily
– Hundreds of millions of weights per model
– Thousands of features per example
– Terabytes of training data
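To make these magnitudes concrete, a back-of-envelope calculation helps. This is a minimal sketch that assumes the lower bounds "trillions" = 10^12 predictions per day and "hundreds of millions" = 10^8 weights, stored as 32-bit floats; those numbers are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def predictions_per_second(predictions_per_day: float) -> float:
    """Average request rate implied by a daily prediction volume."""
    return predictions_per_day / SECONDS_PER_DAY


def weight_bytes(num_weights: int, bytes_per_weight: int = 4) -> int:
    """Storage for the raw weights alone, assuming 32-bit floats by default."""
    return num_weights * bytes_per_weight


# Assumed lower bounds: "trillions" -> 1e12 per day, "hundreds of millions" -> 1e8 weights.
qps = predictions_per_second(1e12)
size = weight_bytes(100_000_000)
print(f"~{qps / 1e6:.1f} million predictions per second on average")  # ~11.6 million
print(f"~{size / 1e6:.0f} MB of weights per model")  # ~400 MB
```

Even at these lower bounds, serving must sustain about 11.6 million predictions per second on average (peaks are necessarily higher), and every model carries hundreds of megabytes of weights before any per-feature metadata.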
ML is Realtime
• Twitter is all about realtime: news, events, videos, trends
• Advertiser campaigns target realtime events and hashtags, and can last as little as a few hours or even minutes
• ML needs to adapt to dynamically changing traffic
Scaling Challenges
• Organization scaling
– How can we support client teams efficiently?
• System scaling
– How can we train models and run inference efficiently?
– How can we enable fast iteration and experimentation?
Organization Scaling
• ML platform’s focus
– Define the feature, transform, and model formats
– Provide frameworks and tooling
• data ETL, trainers, parameter search, serving runtime, workflow management
– Onboard clients and provide support
• Client teams’ focus
– Define and extract features
– Own and maintain the training pipeline and serving runtime
Standardized Feature Format
• Enable feature sharing across teams
• Make ML platform iteration easy
• Feature format
– Supports 4 dense and 2 sparse feature types
– Uses hashed ids instead of string names for efficient serialization, storage, and compute
– Collocates the schema (id-to-name mapping) with the data
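The idea of hashed feature ids with a colocated schema can be sketched as follows. This is a hypothetical illustration, not Twitter's format: the 64-bit BLAKE2b hash, the `FeatureDataset` class, and the feature names are assumptions, and it models only a single numeric value type rather than the 4 dense and 2 sparse types the slide mentions.

```python
import hashlib
from typing import Dict, List


def feature_id(name: str) -> int:
    """Stable 64-bit id for a feature name (the BLAKE2b-based hash is an assumption)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class FeatureDataset:
    """Hypothetical container: records keyed by integer id, schema stored alongside."""

    def __init__(self) -> None:
        self.schema: Dict[int, str] = {}  # id -> name, travels with the data
        self.records: List[Dict[int, float]] = []  # each record: id -> value

    def add(self, features: Dict[str, float]) -> None:
        record = {}
        for name, value in features.items():
            fid = feature_id(name)
            known = self.schema.setdefault(fid, name)
            if known != name:  # two different names hashed to the same id
                raise ValueError(f"hash collision: {known!r} vs {name!r}")
            record[fid] = value
        self.records.append(record)

    def readable(self, index: int) -> Dict[str, float]:
        """Decode one record back to feature names for inspection."""
        return {self.schema[fid]: v for fid, v in self.records[index].items()}


ds = FeatureDataset()
ds.add({"user.follower_count": 120.0, "tweet.has_image": 1.0})
print(ds.records[0])   # integer keys only
print(ds.readable(0))  # {'user.follower_count': 120.0, 'tweet.has_image': 1.0}
```

Records carry only fixed-size integers, which are cheaper to serialize, store, and compare than strings, while the schema that ships with the data keeps the names recoverable for debugging and tooling.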
TrainingAPI
• Make operations on distributed data painless for ML practitioners
• Scala data ETL API
– Provide powerful abstractions for ML datasets and operations
– Fluent API, enabling imperative programming
• Keep data and metadata consistent across operations
Example
1. Take my dataset, whose path is given by “input”
2. Randomly sample 10% of it
3. Discretize it with the given discretizer
4. Left-join it with media labels on tweet id
5. Write the result to the path given by “output”
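The five steps above map naturally onto a fluent, chained API. The real TrainingAPI is a Scala library over distributed data; this is only a single-machine Python sketch of the same shape. The `Dataset` class, its method names, the JSON-lines file layout with a schema header, and the fixed-boundary "discretizer" are all hypothetical stand-ins.

```python
import json
import os
import random
import tempfile
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Dataset:
    """Hypothetical immutable dataset: each operation returns a new Dataset,
    and the column metadata is updated in the same step as the rows."""
    rows: List[Row]
    columns: Tuple[str, ...]

    @staticmethod
    def read(path: str) -> "Dataset":
        # Assumed layout: a JSON header line with the columns, then one JSON row per line.
        with open(path) as f:
            header = json.loads(f.readline())
            rows = [json.loads(line) for line in f if line.strip()]
        return Dataset(rows, tuple(header["columns"]))

    def write(self, path: str) -> None:
        # The schema is written next to the data so one never travels without the other.
        with open(path, "w") as f:
            f.write(json.dumps({"columns": list(self.columns)}) + "\n")
            for row in self.rows:
                f.write(json.dumps(row) + "\n")

    def sample(self, fraction: float, seed: int = 0) -> "Dataset":
        # Bernoulli sampling: keep each row independently with probability `fraction`.
        rng = random.Random(seed)
        return Dataset([r for r in self.rows if rng.random() < fraction], self.columns)

    def discretize(self, column: str, boundaries: List[float]) -> "Dataset":
        # Stand-in discretizer: replace the value with the index of its bucket.
        rows = [{**r, column: bisect_right(boundaries, r[column])} for r in self.rows]
        return Dataset(rows, self.columns)

    def left_join(self, other: "Dataset", key: str) -> "Dataset":
        # Assumes at most one row per key in `other`; unmatched rows get None.
        index = {r[key]: r for r in other.rows}
        extra = tuple(c for c in other.columns if c not in self.columns)
        rows = [{**r, **{c: index.get(r[key], {}).get(c) for c in extra}} for r in self.rows]
        return Dataset(rows, self.columns + extra)


# Usage mirroring steps 1-5, with toy data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    input_path, output_path = os.path.join(tmp, "input"), os.path.join(tmp, "output")
    Dataset([{"tweet_id": i, "score": i / 100} for i in range(100)],
            ("tweet_id", "score")).write(input_path)
    media_labels = Dataset([{"tweet_id": i, "media_label": "video"} for i in range(0, 100, 2)],
                           ("tweet_id", "media_label"))

    (Dataset.read(input_path)
        .sample(0.1, seed=42)
        .discretize("score", [0.25, 0.5, 0.75])
        .left_join(media_labels, key="tweet_id")
        .write(output_path))
    result = Dataset.read(output_path)
```

Because every operation returns a new `Dataset` whose `columns` are derived in the same call as its rows, the join cannot silently produce rows whose schema disagrees with the metadata. That is one simple way to get the "data and metadata consistency" property the previous slide describes.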
PredictionEngine
• Large-scale online SGD learning
• Architecture
– Transform: MDL, decision tree
– Feature crossing
– Logistic regression: Vowpal Wabbit or in-house JVM learner
(Diagram: DataRecords pass through Transform stages, a Cross stage, and a Logistic Regression stage)
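The transform → cross → logistic regression pipeline with online SGD can be sketched in a few dozen lines. This toy is not Vowpal Wabbit or Twitter's JVM learner: the fixed-boundary binning stands in for MDL or decision-tree transforms, and the CRC32 hashing into `NUM_WEIGHTS` slots, the learning rate, and the synthetic stream are all assumptions.

```python
import math
import random
import zlib
from bisect import bisect_right
from typing import Dict, List

NUM_WEIGHTS = 2 ** 20  # size of the hashed weight space (an assumption)


def transform(record: Dict[str, float], boundaries: Dict[str, List[float]]) -> List[str]:
    # Stand-in for MDL / decision-tree transforms: each value becomes a "name=bin" token.
    return [f"{name}={bisect_right(boundaries[name], v)}" for name, v in sorted(record.items())]


def cross(tokens: List[str]) -> List[str]:
    # Pairwise feature crosses, e.g. "x=3" and "y=0" -> "x=3&y=0".
    return tokens + [f"{a}&{b}" for i, a in enumerate(tokens) for b in tokens[i + 1:]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Toy online learner: one SGD step on log loss per example."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.bias = 0.0
        self.weights: Dict[int, float] = {}

    def _slot(self, token: str) -> int:
        return zlib.crc32(token.encode("utf-8")) % NUM_WEIGHTS

    def predict(self, tokens: List[str]) -> float:
        z = self.bias + sum(self.weights.get(self._slot(t), 0.0) for t in tokens)
        return sigmoid(z)

    def update(self, tokens: List[str], label: int) -> float:
        # For log loss, the gradient w.r.t. each active binary feature's weight is (p - y).
        p = self.predict(tokens)
        step = self.learning_rate * (p - label)
        self.bias -= step
        for t in tokens:
            slot = self._slot(t)
            self.weights[slot] = self.weights.get(slot, 0.0) - step
        return p


# Toy stream: the label depends only on x; the model learns it one example at a time.
boundaries = {"x": [0.25, 0.5, 0.75], "y": [0.5]}
model = OnlineLogisticRegression(learning_rate=0.1)
rng = random.Random(0)
for _ in range(2000):
    record = {"x": rng.random(), "y": rng.random()}
    model.update(cross(transform(record, boundaries)), 1 if record["x"] > 0.5 else 0)

p_high = model.predict(cross(transform({"x": 0.9, "y": 0.3}, boundaries)))
p_low = model.predict(cross(transform({"x": 0.1, "y": 0.3}, boundaries)))
```

Each example is seen once and immediately updates the weights, which is what lets an online learner track traffic that shifts within hours. The transforms turn continuous values into sparse binary tokens, so a linear model over tokens and their crosses can still capture non-linear effects.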
PredictionEngine Optimization
• Reduce serialization cost
– Model collocation
– Batch request API
• Reduce compute cost
– Feature ids instead of string names
– Transform sharing across models
– Feature crosses computed on the fly
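Several of these optimizations combine naturally when colocated models score one batch request. In the sketch below, each named transform runs once per record and is reused by every model that declares it, and crosses are generated lazily over integer ids instead of being materialized. The `Model` and `score_batch` names, the equal-width binning transform, and the CRC32-based cross ids are hypothetical.

```python
import math
import zlib
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Record = Dict[str, float]
Transform = Callable[[Record], Tuple[int, ...]]


def bin_transform(record: Record) -> Tuple[int, ...]:
    # Hypothetical shared transform: 4 equal-width bins on [0, 1), each "name=bin" hashed to an id.
    return tuple(zlib.crc32(f"{k}={min(int(v * 4), 3)}".encode()) for k, v in sorted(record.items()))


def crossed_ids(ids: Sequence[int]) -> Iterator[int]:
    # Yield base ids, then pairwise cross ids, lazily; no crossed record is materialized.
    yield from ids
    for a, b in combinations(ids, 2):
        yield zlib.crc32(f"{a}&{b}".encode())


class Model:
    def __init__(self, name: str, transform: str, weights: Dict[int, float]) -> None:
        self.name, self.transform, self.weights = name, transform, weights

    def score(self, ids: Sequence[int]) -> float:
        z = sum(self.weights.get(i, 0.0) for i in crossed_ids(ids))
        return 1.0 / (1.0 + math.exp(-z))


def score_batch(records: List[Record], models: List[Model],
                transforms: Dict[str, Transform]) -> List[Dict[str, float]]:
    """Score a batch with every colocated model, running each named transform once per record."""
    results = []
    for record in records:
        cache: Dict[str, Tuple[int, ...]] = {}
        row = {}
        for model in models:
            if model.transform not in cache:
                cache[model.transform] = transforms[model.transform](record)
            row[model.name] = model.score(cache[model.transform])
        results.append(row)
    return results


# Usage: two models share one transform; a counter shows it runs once per record, not per model.
calls = {"bins": 0}


def counting_bins(record: Record) -> Tuple[int, ...]:
    calls["bins"] += 1
    return bin_transform(record)


x_top_bin = bin_transform({"x": 0.9})[0]
models = [Model("ctr", "bins", {x_top_bin: 2.0}), Model("engagement", "bins", {})]
scores = score_batch([{"x": 0.1}, {"x": 0.6}, {"x": 0.9}], models, {"bins": counting_bins})
```

With N colocated models sharing a transform, the transform cost per record drops from N invocations to one, and the single batch call replaces N separate serialized requests.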
PredictionEngine Optimization
• Training/serving throughput
– Sharding for model updates
– Separation of training and prediction services
– Elastic load based on latency
• Realtime feedback
– Treat each ad impression as a non-click event
• Fault tolerance
– Snapshot the model at a fixed interval
– Anomalous traffic detection
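Three of these ideas (sharded updates, impressions as immediate non-click events, and periodic snapshots) can be illustrated together. This is a minimal single-process sketch under stated assumptions: the event shape, the modulo sharding by feature id, the toy ±0.1 update, and the `ShardedWeights` class are hypothetical, and treating a later click as a positive is an assumption about how the feedback loop completes.

```python
import copy
from typing import Dict, List, Tuple


def label(event: Dict) -> int:
    # Realtime feedback (assumed event shape): an impression is logged right away
    # as a non-click (0); a click on the same ad arrives later as a positive (1).
    return 1 if event["type"] == "click" else 0


class ShardedWeights:
    """Hypothetical parameter store: weights partitioned across shards by feature id,
    with a full snapshot whenever `interval` seconds have passed since the last one."""

    def __init__(self, num_shards: int, interval: float, start: float = 0.0) -> None:
        self.shards: List[Dict[int, float]] = [{} for _ in range(num_shards)]
        self.interval = interval
        self.last_snapshot = start
        self.snapshots: List[Tuple[float, List[Dict[int, float]]]] = []

    def shard_of(self, feature_id: int) -> int:
        # Every update for a given feature id goes to the same shard.
        return feature_id % len(self.shards)

    def apply_update(self, feature_id: int, delta: float, now: float) -> None:
        shard = self.shards[self.shard_of(feature_id)]
        shard[feature_id] = shard.get(feature_id, 0.0) + delta
        self._maybe_snapshot(now)

    def _maybe_snapshot(self, now: float) -> None:
        # Checked on each update for simplicity; a real service would more likely use a timer.
        if now - self.last_snapshot >= self.interval:
            self.snapshots.append((now, copy.deepcopy(self.shards)))
            self.last_snapshot = now

    def restore_latest(self) -> None:
        # Fault tolerance: roll back to the most recent snapshot, if any.
        if self.snapshots:
            self.shards = copy.deepcopy(self.snapshots[-1][1])


# Usage: toy update of +0.1 for a positive and -0.1 for a negative.
store = ShardedWeights(num_shards=4, interval=60.0)
events = [
    {"type": "impression", "feature_id": 7, "t": 10.0},
    {"type": "click", "feature_id": 7, "t": 30.0},
    {"type": "impression", "feature_id": 12, "t": 75.0},
]
for event in events:
    store.apply_update(event["feature_id"], 0.1 if label(event) else -0.1, now=event["t"])
```

Logging the impression as a negative immediately means the model learns from every served ad without waiting for a click window to close; the snapshot gives a recent known-good state to fall back to if a bad traffic burst corrupts the live weights.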
Tooling
• Automatic hyperparameter tuning
• Insight and interpretation
– Inspect data/models in a human-readable format
– Compute dataset statistics
– Visualize tree models
• Feature selection tool
– Forward/backward greedy search
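Forward and backward greedy feature selection can be sketched generically over any scoring function. In practice the score would come from training and evaluating a model on each candidate feature set; here the `toy_score` function, its utilities, and the per-feature cost are hypothetical so the search logic can be shown in isolation.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Score = Callable[[FrozenSet[str]], float]  # higher is better, e.g. a validation metric


def forward_select(features: Iterable[str], score: Score,
                   min_gain: float = 0.0) -> Tuple[List[str], float]:
    """Greedily add the feature with the best score until no gain exceeds min_gain."""
    selected: List[str] = []
    remaining = list(features)
    best = score(frozenset())
    while remaining:
        s, f = max((score(frozenset(selected + [f])), f) for f in remaining)
        if s - best <= min_gain:
            break
        selected.append(f)
        remaining.remove(f)
        best = s
    return selected, best


def backward_eliminate(features: Iterable[str], score: Score,
                       tolerance: float = 0.0) -> Tuple[List[str], float]:
    """Greedily drop the feature whose removal hurts least, while the drop stays within tolerance."""
    selected = list(features)
    best = score(frozenset(selected))
    while len(selected) > 1:
        s, f = max((score(frozenset(x for x in selected if x != f)), f) for f in selected)
        if s < best - tolerance:
            break
        selected.remove(f)
        best = s
    return selected, best


# Toy score (hypothetical): per-feature utility minus a small cost per feature used.
utility = {"a": 0.3, "b": 0.2, "c": 0.0, "d": -0.1}


def toy_score(fs: FrozenSet[str]) -> float:
    return sum(utility[f] for f in fs) - 0.01 * len(fs)


fwd = forward_select(utility, toy_score)
bwd = backward_eliminate(utility, toy_score)
```

Both searches cost O(n²) score evaluations in the worst case for n features, which is why greedy search rather than exhaustive search over all 2^n subsets is the practical choice when each evaluation means retraining a model.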
Work in progress
• Algorithm flexibility
– Large-scale Torch-based ML
• Better tooling
– Workflow management framework
– Visualization and interactive exploration
Thank you