Dropbox Talk at Netflix ML Platform Meetup Sep 2019

Sep 12, 2019
ML Infra @ Dropbox
Overview

ML @ Dropbox - context
Our data sources:
● Files - Multiple exabytes of user content
● File metadata - File names and directory trees
● User activity - Billions of file access and sharing events per day

ML @ Dropbox - context
ML Impact at Dropbox:
● Search Ranking
● Content Suggestions
● File naming Suggestions
● Smart sync
● Spam detection
● Payments
● OCR
● Prompt campaign ranking

Realtime User Activity Signals - Antenna & UPS

Data collection - ETL
[Diagram: data flow across regions labeled Online data, Offline data, and Offline ETL]
● HDFS - Signal store
● Hive - Dropbox data lake
● Antenna - User activity logs
● Predict logs
● User/File metadata
● User/File activities
● Airflow - Workflow orchestration
● Spark jobs - Compute signals and use-case specific datasets
● HDFS - Training data store
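
As a rough illustration of what "compute signals" can mean in the ETL step above, the sketch below aggregates raw activity events into per-user counts. It is plain Python standing in for a Spark job, not Dropbox's code; the event fields (`user_id`, `action`) and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def compute_user_signals(events):
    """Aggregate raw activity events into per-user action counts.

    Each event is a dict with hypothetical keys "user_id" and "action".
    """
    signals = defaultdict(Counter)
    for event in events:
        signals[event["user_id"]][event["action"]] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {user: dict(counts) for user, counts in signals.items()}


# Hypothetical sample of user activity logs.
events = [
    {"user_id": "u1", "action": "open"},
    {"user_id": "u1", "action": "share"},
    {"user_id": "u1", "action": "open"},
    {"user_id": "u2", "action": "open"},
]
signals = compute_user_signals(events)
```

In a real pipeline the same group-and-count logic would run as a distributed job scheduled by the workflow orchestrator, with the output written to the signal store.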

Workbench and Notebooks
[Diagram]
● HDFS - Signal and training data store
● Spark
● Zeppelin Notebooks - Multi-user notebook environment
● Workbench - 40 cores, 400GB RAM
● dbxlearn - Elastic ML training and hyperparameter tuning

dbxlearn
[Diagram: regions labeled Dropbox Data Center and Public Cloud (AWS); arrows labeled train/tune, export, and deploy]
● Datasets
● Training script - bazelized binary
● S3 - Data and code store
● SageMaker - Training instances
● S3 - Model store

dbxlearn workflow
$ dbxlearn train --py-binary <script> --train_uri <...> --validation_uri <...> [--local]
$ dbxlearn tune --py-binary <script> --train_uri <...> --validation_uri <...>
$ dbxlearn query --tuning_job_id <id> print_top_summary
$ dbxlearn deploy-model --tuning-job_id <id> <experiment-group>
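
The slides do not show what a training script looks like. The sketch below is one guess at a minimal entry point, assuming the script receives the same `--train_uri` and `--validation_uri` flags that are passed to `dbxlearn train` (the deck does not confirm this). The function names and the returned dictionary are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the URI flags; assumed to mirror the dbxlearn command line."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--train_uri", required=True)
    parser.add_argument("--validation_uri", required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Placeholder: a real script would load both datasets, fit a model,
    # and report a validation metric for the tuner to compare.
    return {"train_uri": args.train_uri, "validation_uri": args.validation_uri}


# Example invocation with made-up local paths.
result = main(["--train_uri", "train.csv", "--validation_uri", "val.csv"])
```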

Predict logger
- Converting raw logs into labeled datasets
- Logging partial information from different services at different times
- Reward windows
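
One way to picture the points above: predictions and outcomes are logged separately, joined on a shared id, and a prediction is labeled positive only if an outcome arrives within the reward window. The sketch below assumes hypothetical log fields (`request_id`, `timestamp`, `features`) and is not the actual predict logger.

```python
from collections import defaultdict


def build_labeled_dataset(predict_logs, outcome_logs, reward_window_s):
    """Join prediction logs with outcome logs and assign binary labels.

    A prediction gets label 1 if any outcome with the same request_id was
    logged between 0 and reward_window_s seconds after it, else label 0.
    """
    outcomes = defaultdict(list)
    for outcome in outcome_logs:
        outcomes[outcome["request_id"]].append(outcome["timestamp"])

    rows = []
    for pred in predict_logs:
        rewarded = any(
            0 <= t - pred["timestamp"] <= reward_window_s
            for t in outcomes.get(pred["request_id"], [])
        )
        rows.append({
            "request_id": pred["request_id"],
            "features": pred["features"],
            "label": int(rewarded),
        })
    return rows


# Hypothetical logs: r1 is rewarded in time, r2 too late, r3 never.
predict_logs = [
    {"request_id": "r1", "timestamp": 100.0, "features": {"rank": 1}},
    {"request_id": "r2", "timestamp": 100.0, "features": {"rank": 2}},
    {"request_id": "r3", "timestamp": 100.0, "features": {"rank": 3}},
]
outcome_logs = [
    {"request_id": "r1", "timestamp": 130.0},
    {"request_id": "r2", "timestamp": 500.0},
]
dataset = build_labeled_dataset(predict_logs, outcome_logs, reward_window_s=60.0)
```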

Inference configuration
- Defines signal sources
- Enables complex inference graphs
- Allows shadow and A/B experimentation
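
The deck does not show the configuration format. As a sketch of the idea only, the hypothetical `InferenceConfig` below names its signal sources, splits traffic between A/B arms, and lists shadow models that would be scored without affecting the response. Every field and function name here is an assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InferenceConfig:
    """Hypothetical inference configuration; not the actual Dropbox schema."""
    signal_sources: list
    ab_arms: dict  # model name -> traffic share; shares should sum to 1.0
    shadow_models: list = field(default_factory=list)


def assign_model(config, user_id):
    """Deterministically map a user to an A/B arm by hashing the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
    cumulative = 0.0
    for name, share in config.ab_arms.items():
        cumulative += share
        if bucket < cumulative:
            return name
    # Fallback for floating point rounding at the upper edge.
    return list(config.ab_arms)[-1]


config = InferenceConfig(
    signal_sources=["user_activity", "file_metadata"],
    ab_arms={"control": 0.5, "treatment": 0.5},
    shadow_models=["candidate_v2"],
)
arm = assign_model(config, "user-123")
```

Hashing the user id keeps each user in the same arm across requests, which is a common choice for A/B experiments.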

RL example - Multi-Armed Bandits
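
The slide content for this example did not survive extraction, and the deck does not say which bandit algorithm is used. For readers unfamiliar with the setting, here is a minimal epsilon-greedy sketch, chosen only because it is the simplest common strategy. The class name and parameters are illustrative.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit with incremental mean estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the mean reward for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Simulated usage with made-up click probabilities per arm.
true_rates = [0.1, 0.5, 0.8]
bandit = EpsilonGreedyBandit(n_arms=3, epsilon=0.1, seed=0)
env = random.Random(1)
for _ in range(1000):
    arm = bandit.select_arm()
    reward = 1.0 if env.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
```

In a ranking or campaign-selection setting, each arm would correspond to a candidate item and the reward to a logged user response.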

Summary
● End-to-end platform supporting every step of ML development @ Dropbox
● Deep integration with Dropbox's most valuable data sources
● Built for easy integration with cloud services for model training and inference
● Built using open source technologies: Hadoop, Spark, TF, scikit-learn, …
● Supports a wide variety of use cases
