Tsahi Glik
Sep 12, 2019
ML Infra @ Dropbox
Overview
ML @ Dropbox
Our signal sources:
● Files: multi-exabyte data
● File metadata: trillions
● User interactions: billions / day
ML @ Dropbox
ML Impact at Dropbox:
● Smart Sync
● Content Suggestions
● Team Activity Ranking
● Search Ranking
● OCR
And many more …
ML Platform
Challenges:
● Huge data sources that are isolated in various systems across production
● Multiple privacy levels of data
● Custom work to build dedicated services for each new use case
● Manual training workflows that are hard to reproduce
● Wide variety of development processes and ML frameworks
ML Platform
Mission:
Accelerate intelligent product development at Dropbox
By:
● Scalable access to data, both offline and online
● Protection of sensitive data, ensuring it is accessed only in approved ways
● Easy model deployment & experimentation
● Automated workflows
● Standardized processes, frameworks and tools
Platform Architecture
Online Data Collection
Antenna
What is Antenna?
● User activity service
● Provides various ways to query activity events
● Supports aggregations for simple summaries and histograms of activity data
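A minimal sketch of the two query directions and the simple aggregations listed above, assuming a hypothetical event shape (`ActivityEvent` with user, file, action and timestamp) and an in-memory store. `ActivityIndex` and its method names are illustrative only, not Antenna's actual API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityEvent:
    # Hypothetical event shape, not Antenna's schema.
    user_id: str
    file_id: str
    action: str  # e.g. "open" or "edit"
    timestamp: datetime


class ActivityIndex:
    """Toy in-memory user-activity store (illustrative, not Antenna's API)."""

    def __init__(self) -> None:
        # One index per query direction.
        self._by_user: dict[str, list[ActivityEvent]] = defaultdict(list)
        self._by_file: dict[str, list[ActivityEvent]] = defaultdict(list)

    def ingest(self, event: ActivityEvent) -> None:
        self._by_user[event.user_id].append(event)
        self._by_file[event.file_id].append(event)

    def files_for_user(self, user_id: str) -> set[str]:
        # Given a user: every file that user has interacted with.
        return {e.file_id for e in self._by_user.get(user_id, [])}

    def users_for_file(self, file_id: str) -> set[str]:
        # Given a file: every user that has interacted with it.
        return {e.user_id for e in self._by_file.get(file_id, [])}

    def count_actions(self, user_id: str, file_id: str, action: str) -> int:
        # Simple summary, e.g. how many edits a user made on one file.
        return sum(
            1
            for e in self._by_user.get(user_id, [])
            if e.file_id == file_id and e.action == action
        )

    def weekday_histogram(self, user_id: str) -> Counter:
        # Histogram of a user's events by day of week (0 = Monday).
        return Counter(e.timestamp.weekday() for e in self._by_user.get(user_id, []))
```

Keeping one index per query direction trades extra writes at ingestion time for cheap reads in both directions, which matches the "given a user" / "given a file" access pattern.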
Example usage of Antenna
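The speaker notes describe this example as predicting which file a user will open next: query the user's recent activity, generate candidate files with per-file aggregations, rank them with a model, and return the top N. A sketch of that flow, assuming activity arrives as hypothetical `(file_id, action, timestamp)` tuples and using a hand-written `toy_score` as a stand-in for the real ML model:

```python
from collections import defaultdict
from typing import Callable, Iterable


def suggest_files(
    events: Iterable[tuple[str, str, float]],
    now: float,
    score: Callable[[dict], float],
    top_n: int = 3,
) -> list[str]:
    """Rank a user's candidate files; events are hypothetical (file_id, action, ts) tuples."""
    # Candidate generation + aggregation: every file in the recent activity becomes a
    # candidate, with simple per-file counts and recency as features.
    features: dict[str, dict] = defaultdict(
        lambda: {"opens": 0, "edits": 0, "hours_since_last": float("inf")}
    )
    for file_id, action, ts in events:
        f = features[file_id]
        if action == "open":
            f["opens"] += 1
        elif action == "edit":
            f["edits"] += 1
        f["hours_since_last"] = min(f["hours_since_last"], (now - ts) / 3600)
    # Ranking: score each candidate with the model and keep the top N.
    ranked = sorted(features, key=lambda fid: score(features[fid]), reverse=True)
    return ranked[:top_n]


def toy_score(f: dict) -> float:
    # Hypothetical stand-in for the ML model: rewards frequent and recent activity.
    return f["opens"] + 2 * f["edits"] - 0.1 * f["hours_since_last"]
```

Recency is a feature here because, as the notes point out, activity from the last few minutes is relevant to what the user wants to do next, which is why the activity data must be fresh.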
Antenna Architecture
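According to the speaker notes, the same aggregators run online (updating indexes as events arrive) and offline (periodically rebuilding them from persisted raw events, which also backfills newly defined aggregators). One way to make both paths agree is to define each aggregator once, as an initial state plus a step function, and reuse it in both. `Aggregator` below is a hypothetical illustration of that idea, not Antenna's interface.

```python
from typing import Any, Callable, Iterable


class Aggregator:
    """Hypothetical aggregator defined once, run by online ingestion and offline workers."""

    def __init__(
        self,
        key: Callable[[dict], Any],
        init: Callable[[], Any],
        step: Callable[[Any, dict], Any],
    ) -> None:
        self.key = key    # which index entry an event updates, e.g. (user, file)
        self.init = init  # starting state for a key seen for the first time
        self.step = step  # (state, event) -> new state

    def apply(self, state: dict, event: dict) -> None:
        # Online path: fold each event into the aggregate as it arrives.
        k = self.key(event)
        state[k] = self.step(state.get(k, self.init()), event)

    def backfill(self, raw_events: Iterable[dict]) -> dict:
        # Offline path: rebuild the aggregate from the durable raw-event store.
        state: dict = {}
        for event in raw_events:
            self.apply(state, event)
        return state
```

Because both paths share `step`, a newly defined aggregator can be backfilled by calling `backfill` over historical events, and its results match what online ingestion would have produced.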
Content Ingestion pipeline
Read more in our blog post:
OCR
Content Ingestion Architecture
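The speaker notes break OCR into stages: classify whether an image contains OCR-able content, rectify it to align the text, extract word boxes with a deep net, convert each box to characters with an LSTM, and map character sequences to words with a lexicon-based algorithm. The sketch below wires those stages together with every model stubbed out as a callable. For the last step it substitutes Python's `difflib.get_close_matches` (plain string similarity) for the unspecified lexicon algorithm, so treat that step as an assumption, not the production method.

```python
import difflib
from typing import Callable, Optional


def lexicon_correct(chars: str, lexicon: list[str], cutoff: float = 0.7) -> str:
    # Stand-in for the lexicon-based step: snap a recognized character sequence to
    # the most similar lexicon word, or keep it unchanged if nothing is close enough.
    matches = difflib.get_close_matches(chars.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else chars


def run_ocr(
    image: bytes,
    is_ocrable: Callable[[bytes], bool],         # classifier (stubbed)
    rectify: Callable[[bytes], bytes],           # aligns the text (stubbed)
    detect_word_boxes: Callable[[bytes], list],  # deep net word-box detector (stubbed)
    recognize: Callable[[object], str],          # LSTM: word box -> characters (stubbed)
    lexicon: list[str],
) -> Optional[list[str]]:
    if not is_ocrable(image):
        return None  # no OCR-able content: skip the remaining models
    aligned = rectify(image)
    boxes = detect_word_boxes(aligned)
    return [lexicon_correct(recognize(box), lexicon) for box in boxes]
```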
Offline Data Preparation
Data Preparation - ETL pipeline
Data Preparation - Predict logger
- Converts raw logs into labeled datasets
- Logs partial information from different services at different times
- Eliminates discrepancies between online and offline data
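A sketch of how a predict logger might join partial log records into labeled examples, assuming every service tags its log lines with a shared request ID (`PartialRecord` and `join_partial_logs` are hypothetical names). Because the features are the ones logged at serving time, the offline dataset contains exactly what the online model saw.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PartialRecord:
    # Hypothetical log line: each service fills in only the fields it knows,
    # and all services share the same request_id.
    request_id: str
    features: Optional[dict] = None   # logged by the serving path at prediction time
    prediction: Optional[float] = None
    label: Optional[int] = None       # logged later, e.g. once the user acts


def join_partial_logs(records: list[PartialRecord]) -> list[dict[str, Any]]:
    """Merge partial records by request_id into labeled training examples."""
    merged: dict[str, dict[str, Any]] = {}
    for rec in records:
        row = merged.setdefault(rec.request_id, {"request_id": rec.request_id})
        for name in ("features", "prediction", "label"):
            value = getattr(rec, name)
            if value is not None:  # keeps falsy but valid values such as label=0
                row[name] = value
    # Keep only complete examples: serving-time features plus an observed label.
    return [row for row in merged.values() if "features" in row and "label" in row]
```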
Offline Training & Evaluation
Prototyping
● HDFS: signal and training data store
● Spark
● Zeppelin Notebooks: multi-user notebook environment
● Workbench: 40 cores, 400GB RAM
● dbxlearn: elastic ML training and hyperparameter tuning
dbxlearn
What is dbxlearn?
● dbxlearn provides an easy way to use computing at scale for training
● Core problems dbxlearn is addressing:
○ Elasticity
○ Standard way to train on different hardware configurations (GPU, TPU) on different cloud platforms
● Hybrid cloud architecture: interfaces with private clusters as well as public clouds
● Currently integrated with AWS, using SageMaker
dbxlearn Architecture
[Diagram: Dropbox Data Center (dbxlearn; datasets; training script as a bazelized binary) → export → Public Cloud (AWS): S3 data and code store → train/tune on the AWS SageMaker trainers cluster (training instances) → S3 model store → deploy]
dbxlearn workflow
$ dbxlearn train --py-binary <script> \
    --train_uri <...> --validation_uri <...> [--local]
$ dbxlearn tune --py-binary <script> --train_uri <...> --validation_uri <...>
$ dbxlearn query --tuning_job_id <id> print_top_summary
$ dbxlearn deploy-model --tuning_job_id <id> <experiment-group>
Model Deployment
Predict service
Live experimentation - Suggest backend
Shadow experimentation - Suggest backend
● Send live traffic to a shadow cluster running a different experiment variant
● Results are logged for experiment analysis
● Useful for collecting labeled datasets with the Predict Logger
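A minimal sketch of shadow serving, assuming both variants are plain callables: the live variant's result is returned to the user, the shadow variant sees the same request, and both outputs are logged for later analysis. `serve_with_shadow` is a hypothetical name. It runs the shadow synchronously for simplicity; a production system would more likely mirror traffic asynchronously to a separate cluster so the shadow cannot add latency.

```python
from typing import Any, Callable


def serve_with_shadow(
    request: dict,
    live_variant: Callable[[dict], Any],
    shadow_variant: Callable[[dict], Any],
    log: list,
) -> Any:
    """Serve the live variant; run the shadow variant on the same request and log both."""
    live_result = live_variant(request)
    try:
        shadow_result = shadow_variant(request)
    except Exception as exc:  # a failing shadow must never break live traffic
        shadow_result = f"error: {exc!r}"
    # Both outputs are logged for experiment analysis; only the live result is returned.
    log.append({"request": request, "live": live_result, "shadow": shadow_result})
    return live_result
```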
Example
Campaign Ranker - Using Multi-Armed Bandits
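The slides do not say which bandit algorithm the campaign ranker uses. As one common choice, here is a Beta-Bernoulli Thompson sampling sketch that treats each campaign as an arm and a click as the reward; the class name and the click-as-reward framing are assumptions for illustration.

```python
import random


class ThompsonCampaignRanker:
    """Beta-Bernoulli Thompson sampling over campaigns (illustrative, not Dropbox's ranker)."""

    def __init__(self, campaigns, seed=None):
        # Beta(1, 1) prior: each campaign starts with one pseudo-click and one pseudo-skip.
        self.successes = {c: 1 for c in campaigns}
        self.failures = {c: 1 for c in campaigns}
        self.rng = random.Random(seed)

    def rank(self):
        # Draw a plausible click-through rate per campaign from its posterior and sort
        # by the draw, so uncertain campaigns still get explored.
        draws = {
            c: self.rng.betavariate(self.successes[c], self.failures[c])
            for c in self.successes
        }
        return sorted(draws, key=draws.get, reverse=True)

    def update(self, campaign, clicked):
        # Posterior update after observing whether the shown campaign was clicked.
        if clicked:
            self.successes[campaign] += 1
        else:
            self.failures[campaign] += 1
```

Ranking by a random posterior draw rather than the posterior mean is what lets rarely shown campaigns occasionally rise to the top and gather evidence.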
Summary
● End-to-end platform that supports every step of the ML development workflow
● Deep integration with Dropbox's large-scale data sources
● Flexible APIs to support a wide variety of use cases
● Hybrid cloud architecture for elasticity and early adoption of new technologies
Next Challenges
● Better representation of data relations across multiple systems
● Democratize ML at Dropbox, extending our tools from ML developers to more engineers
Thank You

Editor's Notes

  1. Understand the scale of the data sources we use for ML features (not necessarily for training). We have a huge file repository, probably one of the largest in the world, containing exabytes of data, with millions of new files added every day. We also have file system trees, which give our systems valuable signals about how content is organized and grouped. And we record billions of events every day from users' interactions with these files, which give our systems valuable signals about how users work with files and collaborate with one another in their workplace. These huge data sources present a lot of potential for ML, but also a lot of challenges for ML developers.
  2. Go through some of the use cases. Smart Sync tries to predict which files you will need on each device, so we can make them available locally and vacuum unneeded files. Content Suggestions tries to simplify retrieval by predicting which file you are looking for. Team Activity Ranking tries to increase awareness of what others are doing while filtering out the noise. Search Ranking uses an ML model to rank search results. OCR extracts text from every image uploaded to Dropbox. These are some of the highlights, but there are many more.
  3. So what are the challenges we are trying to solve? Our huge data sources are isolated in various systems across production, which makes them hard to access for training. There are multiple privacy levels of data, some of which engineers are not allowed to review. Teams do custom work and build new services to solve their problems. Training workflows are manual, complex, and hard to reproduce. And a wide variety of ML frameworks and tools are in use across teams.
  4. So our mission is to accelerate intelligent product development at Dropbox. We are trying to achieve it by: providing models with scalable access to context, both offline and online; ensuring sensitive data is protected and accessed only in approved ways; making it easy to deploy models without building new services, and easy to experiment with new models; automating development workflows; and standardizing the process, frameworks, and tools for intelligent product development and release.
  5. All ML development shares a common basic workflow: data collection, data preparation, training and evaluation, and model deployment to production. We have developed components to support each step in this workflow. Online components integrate with Dropbox production systems to provide data in real time, and make it easy to deploy and experiment with new models. Offline components capture historical data and make it easy to access for training and prototyping. Management components make it easy to automate these workflows.
  6. Antenna serves user activity to online production services. Given a user, what are all the files the user has interacted with? Given a file, who are all the users that have interacted with it? We also support simple aggregations, like how many edits a user made on a specific file, or histograms of the number of events across the days of the week.
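To make the two lookup directions and the aggregations concrete, here is a minimal in-memory sketch. It is not Antenna's actual API or storage design; the ActivityEvent and ActivityIndex classes and their method names are hypothetical, chosen only to mirror the queries described above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityEvent:
    user_id: str
    file_id: str
    action: str          # e.g. "open", "edit", "share" (hypothetical action names)
    timestamp: datetime


class ActivityIndex:
    """Toy in-memory stand-in for a user activity service (hypothetical API)."""

    def __init__(self):
        self._files_by_user = defaultdict(set)
        self._users_by_file = defaultdict(set)
        self._events = []

    def ingest(self, event: ActivityEvent) -> None:
        # Update both directions of the user <-> file index as each event arrives.
        self._files_by_user[event.user_id].add(event.file_id)
        self._users_by_file[event.file_id].add(event.user_id)
        self._events.append(event)

    def files_for_user(self, user_id: str) -> set:
        return set(self._files_by_user.get(user_id, set()))

    def users_for_file(self, file_id: str) -> set:
        return set(self._users_by_file.get(file_id, set()))

    def count(self, user_id: str, file_id: str, action: str) -> int:
        # Simple aggregation: how many times a user performed an action on a file.
        return sum(1 for e in self._events
                   if e.user_id == user_id and e.file_id == file_id and e.action == action)

    def weekday_histogram(self, user_id: str) -> Counter:
        # Histogram of a user's events per day of week (0 = Monday ... 6 = Sunday).
        return Counter(e.timestamp.weekday() for e in self._events if e.user_id == user_id)
```

A production service would keep precomputed indexes and aggregations rather than scanning a list of events; the scan here only keeps the sketch short.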
  7. In this example we create suggestions for a user, trying to predict which file the user will open next. To do that, the service first queries Antenna, our user activity service, for the user's activity over the last few months. It then uses that activity to generate a candidate list of files, along with aggregations of the user's activity on each candidate. An ML model then ranks the candidates and returns the top N. Freshness matters here: the user's recent activity, even from the last few minutes, is relevant to what the user wants to do next.
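The query, candidate generation and ranking steps can be sketched as follows. This is an illustration under stated assumptions: the 90-day lookback, the two features, and the score function are hypothetical placeholders for the real activity query and the trained ranking model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    file_id: str
    timestamp: datetime


def build_features(events, now):
    """Aggregate a user's events into per-file features: event count and hours since last touch."""
    acc = {}
    for e in events:
        count, last = acc.get(e.file_id, (0, None))
        if last is None or e.timestamp > last:
            last = e.timestamp
        acc[e.file_id] = (count + 1, last)
    return {
        f: {"count": c, "hours_since_last": (now - last).total_seconds() / 3600}
        for f, (c, last) in acc.items()
    }


def score(features):
    # Hypothetical placeholder for the ranking model: more, and more recent, activity scores higher.
    return features["count"] / (1.0 + features["hours_since_last"])


def suggest(events, now, lookback=timedelta(days=90), top_n=3):
    recent = [e for e in events if now - e.timestamp <= lookback]  # activity window
    candidates = build_features(recent, now)                        # candidates + aggregations
    ranked = sorted(candidates, key=lambda f: score(candidates[f]), reverse=True)
    return ranked[:top_n]
```

With this toy score, a file opened five minutes ago outranks a file opened twice a couple of days ago, which is why stale activity data would visibly hurt the suggestions.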
  8. Online ingestion hosts process events in real time, updating indexes and aggregations so the data stays fresh. Offline components persist the raw events in a durable store and rebuild the indexes and aggregations periodically. ML developers can define new aggregators and indexes that run both in online ingestion and in offline workers, and get backfilled automatically.
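One way to guarantee that online and offline results agree is to define each aggregator once, as a pure update function, and run that same definition in both places. The sketch below assumes that design; the Aggregator interface, EditCount, OnlineHost and offline_rebuild are hypothetical names, not Antenna's real interfaces.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Aggregator(ABC):
    """Hypothetical interface: one definition, used by both online hosts and offline workers."""

    @abstractmethod
    def initial(self): ...

    @abstractmethod
    def update(self, state, event): ...


class EditCount(Aggregator):
    def initial(self):
        return 0

    def update(self, state, event):
        return state + 1 if event["action"] == "edit" else state


class OnlineHost:
    def __init__(self, agg: Aggregator):
        self.agg = agg
        self.state = defaultdict(agg.initial)

    def on_event(self, event):
        # Incremental update as each event arrives, so reads see fresh data.
        key = (event["user"], event["file"])
        self.state[key] = self.agg.update(self.state[key], event)


def offline_rebuild(agg: Aggregator, raw_events):
    # Periodic rebuild, or backfill of a newly defined aggregator, from the durable raw event log.
    state = defaultdict(agg.initial)
    for event in raw_events:
        key = (event["user"], event["file"])
        state[key] = agg.update(state[key], event)
    return dict(state)
```

Because both paths call the same update function, backfilling a new aggregator amounts to replaying the raw event log through offline_rebuild.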
  9. Antenna is our ingestion infrastructure for user activity; let's now talk about our content ingestion infrastructure. OCR is a good example of how an ML model runs in our content ingestion infrastructure. It is actually a multistep process involving not one but several models. Every image that passes through our ingestion pipeline is classified by whether it contains OCR-able content. The image is then rectified to align the text, a deep net model extracts word boxes, an LSTM model converts each word box into a sequence of characters, and finally a lexicon-based algorithm converts these character sequences into actual words.
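The last step, turning decoded character sequences into words, can be illustrated with one simple lexicon-based technique: snap each sequence to the closest lexicon word by Levenshtein edit distance, and keep it unchanged if nothing is close enough. This is a minimal sketch, not necessarily the algorithm used in Dropbox's pipeline; edit_distance, to_word and the max_dist threshold are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def to_word(chars: str, lexicon, max_dist: int = 2) -> str:
    """Map a decoded character sequence to the closest lexicon word, or keep it if none is close."""
    target = chars.lower()
    best = min(lexicon, key=lambda w: (edit_distance(target, w), w))
    return best if edit_distance(target, best) <= max_dist else chars
```

A typical LSTM misread such as "lnvoice" maps back to "invoice", while an unknown token stays as decoded instead of being forced onto a distant word.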
  10. Every file update is ingested by our indexers, which call a plugin framework to run a transform on the user content (in our case, an OCR model) and then store the results in Doc Store, which contains all derived data for each file. Running a transform on raw user content raises security concerns about exploits and vulnerabilities, so the plugin framework runs each plugin in a sandboxed environment we call a jail. This lets us be sure that any exploits in the frameworks we use, like ImageMagick and TF, cannot be used to gain access to our systems. One of the challenges we deal with here is how to simplify model deployment to this jailed environment and make it easy to experiment with different model variants.
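The indexer-to-plugin-to-Doc-Store flow can be sketched as below, assuming plugins exchange raw bytes on stdin and JSON on stdout. This only shows the shape of the flow: running each plugin in a separate, isolated interpreter process with a timeout is far weaker than a real jail, and FAKE_OCR_PLUGIN, run_plugin, ingest_file and the dict-based doc store are hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical plugin: reads raw file bytes on stdin and writes derived data as JSON on stdout.
# A real OCR plugin would run the model pipeline here.
FAKE_OCR_PLUGIN = (
    "import sys, json\n"
    "data = sys.stdin.buffer.read()\n"
    "print(json.dumps({'num_bytes': len(data)}))\n"
)


def run_plugin(plugin_code: str, content: bytes, timeout_s: float = 30.0) -> dict:
    """Run one transform in a separate interpreter process.

    This gives only process-level separation plus a timeout; a real jail adds much
    stronger isolation (for example syscall filtering, no network, restricted filesystem).
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", plugin_code],  # -I: isolated mode (no env vars, no user site-packages)
        input=content,
        capture_output=True,
        timeout=timeout_s,
        check=True,  # a crashing plugin raises instead of silently storing nothing
    )
    return json.loads(proc.stdout)


def ingest_file(file_id: str, content: bytes, plugins: dict, doc_store: dict) -> None:
    """On every file update, run each registered plugin and store its output as derived data."""
    derived = doc_store.setdefault(file_id, {})
    for name, code in plugins.items():
        derived[name] = run_plugin(code, content)
```

Keeping the plugin contract at "bytes in, JSON out" is one way to make swapping model variants a deployment change rather than a code change in the indexer.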
  11. Our ETL pipeline makes it easy to generate training data and signals from our data sources. We maintain interfaces for Spark jobs to import data from our data lake and from Antenna, and we use periodic Spark jobs orchestrated by Airflow to generate signals and training data. ML developers can then use the output signals and training data for training.
  12. We also provide a more specialized pipeline, which we call the predict logger, that helps automate the generation of labeled datasets from live traffic. The predict logger defines an API with a set of predict events that describe the life cycle of an online prediction, such as requested, predicted, viewed, and actions. These events are logged from different services at different times, and the predict logger merges them consistently, which helps developers avoid incidental complexity. The resulting labeled dataset is ready for training, with all the signals and context exactly as they were seen at serving time, which helps us avoid discrepancies between online and offline data.
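The merge step can be sketched as grouping events by a shared prediction id and deriving a label from which life-cycle stages occurred. Assumptions made for illustration only: each event carries a hypothetical prediction_id, stage and ts field; features are attached to the "predicted" event; and a row is labeled positive when any "action" event exists.

```python
from collections import defaultdict


def merge_predict_events(events):
    """Merge life-cycle events logged by different services into labeled training rows.

    Events may arrive out of order. Predictions that were never viewed are dropped,
    because no label can be inferred for them.
    """
    by_id = defaultdict(list)
    for e in events:
        by_id[e["prediction_id"]].append(e)

    rows = []
    for pid, evs in by_id.items():
        evs.sort(key=lambda e: e["ts"])
        stages = {e["stage"]: e for e in evs}  # the latest event of each stage wins
        if "predicted" not in stages or "viewed" not in stages:
            continue
        rows.append({
            "prediction_id": pid,
            # Features as logged at serving time, so offline rows match what the model saw online.
            "features": stages["predicted"]["features"],
            "item": stages["predicted"]["item"],
            "label": int("action" in stages),
        })
    return rows
```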
  13. Before full-scale training, developers first need to prototype. For prototyping, they need access to the training and signal data from the ETL pipeline, which they use for exploration and offline evaluation. We use workbenches in production that are integrated with a Spark cluster to give them access to all the offline data. After prototyping, they use dbxlearn for large-scale training and hyperparameter tuning.
  14. Training jobs require a lot of computing power. dbxlearn gives our developers an easy way to use computing at scale for training by letting them submit training jobs to remote clusters. It provides elasticity, making sure each job gets all the resources it needs when it needs them. It also provides a standard way to train on different hardware configurations, so we can use specialized hardware such as GPUs for training. We have built a hybrid cloud architecture that interfaces with our private cluster as well as with public clouds. Currently we are integrated with AWS and use SageMaker for training.
  15. A typical workflow with dbxlearn looks like this. Use "dbxlearn train --local" to test the training script locally. Then remove "--local" to test the training script in the cloud. If it runs well in the cloud, use "dbxlearn tune" to find the optimal hyperparameters automatically. Use "dbxlearn query" to check the status and results of all training jobs. If the results are good, use "dbxlearn deploy-model" to deploy the best model to the model store.
  16. We have a central service for hosting models in production, called the predict service. It loads models from the model store and provides a standard API for real-time inference. It supports multiple model inference partitions for resource isolation. The inference API can also act as a proxy for running inference on public cloud services.
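The partitioning idea can be sketched by giving each partition its own worker pool, so that a slow or busy model cannot starve models served from other partitions. This is an assumption-laden toy, not the predict service's real design: thread pools stand in for whatever isolation the production service actually uses, and the model store is just a dict of callables.

```python
from concurrent.futures import ThreadPoolExecutor


class PredictService:
    """Toy predict service: models loaded from a model store, served from per-partition worker pools."""

    def __init__(self, model_store, partitions, workers_per_partition=2):
        # model_store: model name -> callable(features) -> prediction (hypothetical format)
        # partitions: partition name -> list of model names served in that partition
        self._models = {}
        self._partition_of = {}
        self._pools = {}
        for partition, names in partitions.items():
            self._pools[partition] = ThreadPoolExecutor(max_workers=workers_per_partition)
            for name in names:
                self._models[name] = model_store[name]
                self._partition_of[name] = partition

    def predict(self, model_name, features, timeout_s=1.0):
        """Standard inference entry point: route the request to the model's own partition."""
        pool = self._pools[self._partition_of[model_name]]
        return pool.submit(self._models[model_name], features).result(timeout=timeout_s)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown()
```

The same predict entry point could forward some model names to a public cloud endpoint instead of a local pool, which is one way to read the proxy role described above.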
  17. The Suggest backend helps reduce the boilerplate of running live experiments. A simple config defines which signals to collect and which model to run. The client sends requests with a target experiment variant; the Suggest backend runs the signal collectors defined in the config and then runs the configured model. Standard logging is used to monitor experiment results, and a signal collector abstraction lets developers write their own collectors.
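A config-driven request handler along these lines might look like the sketch below. The config schema, the collector and model registries, and handle_request are hypothetical; real collectors would call services such as Antenna instead of returning constants.

```python
from typing import Callable, Dict, List

# Hypothetical signal collectors; in production these would query services such as Antenna.
SIGNAL_COLLECTORS: Dict[str, Callable[[str], dict]] = {
    "recent_files": lambda user: {"recent_files": ["a.docx", "b.pdf"]},
    "team_size": lambda user: {"team_size": 12},
}

# Hypothetical models keyed by name.
MODELS: Dict[str, Callable[[dict], List[str]]] = {
    "baseline": lambda signals: signals["recent_files"][:1],
    "ranker_v2": lambda signals: sorted(signals["recent_files"], reverse=True),
}

# Per-variant config: which signals to collect and which model to run.
EXPERIMENT_CONFIG = {
    "control": {"signals": ["recent_files"], "model": "baseline"},
    "treatment": {"signals": ["recent_files", "team_size"], "model": "ranker_v2"},
}


def handle_request(user: str, variant: str, log: list) -> List[str]:
    cfg = EXPERIMENT_CONFIG[variant]
    signals: dict = {}
    for name in cfg["signals"]:
        signals.update(SIGNAL_COLLECTORS[name](user))  # run each configured collector
    result = MODELS[cfg["model"]](signals)             # then the configured model
    # Standard logging, so variants can be compared without per-experiment code.
    log.append({"user": user, "variant": variant, "model": cfg["model"], "result": result})
    return result
```

Adding a new variant in this shape is a config change only, which is the boilerplate reduction the slide is after.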
  18. This example is a campaign ranker implemented as a contextual multi-armed bandit problem. At Dropbox we have a campaign framework that makes it easy to define campaigns that can be displayed on various UI surfaces to various populations. In this case, one of our web pages is displaying a campaign for Dropbox Business. For each impression there are many competing campaigns, which can be modeled as a multi-armed bandit problem where we need to choose which arm to play. We use UI features and user features as the context for this decision.
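To show what "choosing an arm given context" means, here is a minimal epsilon-greedy contextual bandit that keeps a mean reward per (context, arm) pair. Epsilon-greedy over discrete contexts is a swapped-in simple technique, not necessarily the policy Dropbox uses; the class name, context tuples and campaign names are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a mean reward per (context, arm); plays the best arm, exploring with probability epsilon.

    Context is a hashable tuple of (hypothetical) UI and user features, e.g. ("web_home", "basic_user").
    """

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.reward_sums = defaultdict(float)

    def mean_reward(self, context, arm):
        n = self.counts[(context, arm)]
        return self.reward_sums[(context, arm)] / n if n else 0.0

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: play a random campaign
        return max(self.arms, key=lambda arm: self.mean_reward(context, arm))  # exploit

    def update(self, context, arm, reward):
        # reward: e.g. 1.0 if the impression led to a click or conversion, else 0.0
        self.counts[(context, arm)] += 1
        self.reward_sums[(context, arm)] += reward
```

The same campaign can win in one context and lose in another, which is exactly what a context-free bandit cannot express.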
  19. Impressions are logged by the UI surfaces and by the predict service in the backend, using the predict logger. Our ETL pipeline periodically generates a new batch of training data from these logs. A training job then updates the policies in the contextual bandit model and stores the model in the model store. The predict service then loads the new model and uses it for future decisions, and this cycle repeats indefinitely. This demonstrates how our ETL pipeline automates the full workflow of labeled data generation, training and model deployment.
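The closed loop of log, train, publish and reload can be sketched with a versioned model store and a service that picks up new versions. This is a simplified illustration under stated assumptions: train_policy retrains a greedy per-context table from each batch alone, whereas a real job would more likely fold the batch into existing statistics and keep exploring. ModelStore, train_policy and PredictService here are hypothetical names.

```python
from collections import defaultdict


class ModelStore:
    """Hypothetical versioned model store: each publish creates a new version."""

    def __init__(self):
        self._versions = []

    def publish(self, model):
        self._versions.append(model)
        return len(self._versions)  # version numbers start at 1

    def latest(self):
        return len(self._versions), self._versions[-1]


def train_policy(batch):
    """Training job: turn logged (context, arm, reward) rows into a greedy policy table."""
    sums, counts = defaultdict(float), defaultdict(int)
    for context, arm, reward in batch:
        sums[(context, arm)] += reward
        counts[(context, arm)] += 1
    best = {}
    for (context, arm), n in counts.items():
        mean = sums[(context, arm)] / n
        if context not in best or mean > best[context][1]:
            best[context] = (arm, mean)
    return {context: arm for context, (arm, _) in best.items()}


class PredictService:
    def __init__(self, store):
        self.store = store
        self.version, self.policy = None, {}

    def refresh(self):
        # Load the newest published model if it differs from the one being served.
        version, policy = self.store.latest()
        if version != self.version:
            self.version, self.policy = version, policy

    def decide(self, context, default_arm):
        return self.policy.get(context, default_arm)
```

In use, each cycle is store.publish(train_policy(batch)) followed by svc.refresh(), with new impressions logged between cycles.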
  20. We have built an end-to-end platform that supports all steps of the ML development workflow. We provide deep integration with Dropbox's large-scale data sources to make this data accessible for offline training and online inference. We built our components with flexible APIs to support a wide variety of use cases. We chose a hybrid cloud architecture for easier elasticity and early adoption of new technologies.
  21. Currently, our exabytes of data live in multiple systems, but there are relationships across these systems that would be useful to know. So our challenge is: how can we expose the interactions among data across systems so that ML teams can use them as features? Also, our tools are currently used mainly by ML developers with deep ML expertise. We would like to make them accessible to more engineers, with better and simpler interfaces.