SlideShare a Scribd company logo
Tuning ML Models:
Scaling, Workflows,
and Architecture
Joseph Bradley
Solutions Architect
About me
▪ Solutions Architect at Databricks (½ a year)
▪ Software Engineer at Databricks (5 ½ years)
▪ Apache Spark committer and PMC member
Global company with over 5,000 customers and 450+ partners
Original creators of popular data and machine learning open source projects
A unified data analytics platform for accelerating innovation across
data engineering, data science, and analytics
Hyperparameter
Model 1: “Dress”
Model 2: “Sneaker”
Why the difference?
Model 2 had better hyperparameter settings:
• Learning rate
• Model structure
• ...
What’s a hyperparameter?
• Statistical: assumptions about your
model/data
• Practical: inputs your ML library does not
learn from data
• Algorithmic: problem-dependent configs
Hyperparameter tuning
Expert knowledge
Not in this talk:
• Statistical best practices
• Overview of methods for tuning
In this talk:
• Data Science workflow best practices
• Tips for the big data and cloud computing space
Black-box tuning Ignore until needed
à See references at end!
Hyperparameter tuning is tough
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9 10
# hyperparameters
(7 possible values each)
Fractioncoverage
Non-convex optimization
Curse of dimensionality
è Computational cost
Unintuitive hyperparameters
• Regularization
• Neural net structure
• and many more...
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Single-machine vs. distributed training
• Single-machine training
• Distributed training
• Training 1 model per group (per customer, product, etc.)
Single-machine training
Tuning
Scale out via distributed
hyperparameter tuning
à Train 1 model per Spark task
Driver
Worker WorkerWorker
Tuning
Tools for this:
• Hyperopt + SparkTrials
• sklearn + joblibspark
• Pandas UDFs
Distributed training
Driver
Worker WorkerWorker
Training
Tuning
ML
ML
Scale out via parallel
tuning
à Train 1 model at a time, or
train 2+ models in parallel
Tools for this:
• Apache Spark ML
• Hyperopt
Training one model per group
Driver
Worker WorkerWorker
Distribute
over groups
Tuning Tuning Tuning
Scale out by distributing
over groups
à Train 1 group’s model per
Spark task
Tools for this:
• Apache Spark, Pandas UDFs
• Hyperopt
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Getting started
Start small
• Bias towards smaller models, fewer
iterations, etc.
• Small is cheaper, and it may suffice.
• Regardless, it gives a baseline.
Think before you tune
• Separate train/val/test sets.
• Use early stopping or smart tuning
wherever possible.
• Pick hyperparameters carefully.
• Pick ranges carefully.
Models vs. pipelines
Best practice: Set up full pipeline before tuning.
• At what point does the pipeline compute the metric you care about?
Tuning models vs. tuning pipelines
• Tuning featurization
Optimizing tuning for pipelines
• Cache intermediate results
Evaluating and iterating
Validation data and metrics
• Record many metrics on both training and validation data.
Tuning hyperparameters independently vs. jointly
• Using smarter hyperparameter search algorithms
Tracking and reproducibility
• Data, code, params, metrics, models, metadata
• Tip: Parametrize code to facilitate tracking
(Demo)
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Handling code
Getting code to workers
• Generally simple: Pandas UDFs or
integrations (Hyperopt, etc.)
• Debugging code serialization
• Errors are often in worker logs and
look like ML library bugs (e.g., “no
module named X...”)
• Tip: For Python, import libraries
within closures
Passing configs and credentials
• E.g., MLflow active runs and
credentials
Helpful resource:
Distributed Hyperopt best practices
and troubleshooting
Moving data
Single-machine ML
• Broadcast
• Load from blob storage
• Caching data
Distributed ML
• Caching data
Blob storage data prep
• Delta Lake format
• Petastorm and TFRecords
Helpful resources:
• Distributed Hyperopt best practices
and troubleshooting
• Prepare data for distributed
training
Configuring clusters
Single-machine ML
• Sharing machine resources
• Selecting machine types
Distributed ML
• Right-sizing clusters
• Sharing a cluster
Helpful resource:
Scaling Hyperopt to Tune Machine
Learning Models in Python
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Tools to know about
Apache Spark CrossValidator, TrainValidationSplit, Pandas UDFs
MLflow Tracking, Auto-logging
ML in Python Hyperopt SparkTrials & more (See last year’s SAIS talk)
Scikit-learn sklearn.model_selection, skopt, joblib+Spark
TensorFlow HParams, Keras Tuner, Model Optimization Toolkit
PyTorch Ax
Resources
Blog posts + example notebooks
• Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt
Talks
• Best Practices for Hyperparameter Tuning with MLflow (SAIS 2019)
• Advanced Hyperparameter Optimization for Deep Learning with MLflow (SAIS 2019)
Project pages + docs
• Hyperopt docs and Github
• MLflow homepage and Github
Slides: tinyurl.com/sais2020-joseph
Notebook: tinyurl.com/sais2020-joseph-demo
Thanks!
Architectural patterns
Workflows
Tips and details
Getting started
Any questions?

More Related Content

What's hot

Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon Web Services
 
Databricks の始め方
Databricks の始め方Databricks の始め方
Databricks の始め方
Ryoma Nagata
 
わかりづらいS3クロスアカウントアクセス許可に立ち向かおう
わかりづらいS3クロスアカウントアクセス許可に立ち向かおうわかりづらいS3クロスアカウントアクセス許可に立ち向かおう
わかりづらいS3クロスアカウントアクセス許可に立ち向かおう
Takashi Toyosaki
 
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
TakeshiFukae
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
Amazon Web Services
 
20200218 AWS Black Belt Online Seminar Next Generation Redshift
20200218 AWS Black Belt Online Seminar Next Generation Redshift20200218 AWS Black Belt Online Seminar Next Generation Redshift
20200218 AWS Black Belt Online Seminar Next Generation Redshift
Amazon Web Services Japan
 
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
Amazon Web Services Japan
 
Amazon s3へのデータ転送における課題とその対処法を一挙紹介
Amazon s3へのデータ転送における課題とその対処法を一挙紹介Amazon s3へのデータ転送における課題とその対処法を一挙紹介
Amazon s3へのデータ転送における課題とその対処法を一挙紹介
Tetsunori Nishizawa
 
Azure Logic Apps
Azure Logic AppsAzure Logic Apps
Azure Logic Apps
BizTalk360
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
Amazon Web Services
 
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
NTT DATA Technology & Innovation
 
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreadingApache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
Yahoo!デベロッパーネットワーク
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Amazon Web Services
 
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
Amazon Web Services Korea
 
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 TokyoPrestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Treasure Data, Inc.
 
Deep Dive into Spark SQL with Advanced Performance Tuning
Deep Dive into Spark SQL with Advanced Performance TuningDeep Dive into Spark SQL with Advanced Performance Tuning
Deep Dive into Spark SQL with Advanced Performance Tuning
Takuya UESHIN
 
Amazon Connect ハンズオン初級編
Amazon Connect ハンズオン初級編Amazon Connect ハンズオン初級編
Amazon Connect ハンズオン初級編
Amazon Web Services Japan
 
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
日本マイクロソフト株式会社
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
Amazon Web Services Japan
 

What's hot (20)

Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
 
Databricks の始め方
Databricks の始め方Databricks の始め方
Databricks の始め方
 
わかりづらいS3クロスアカウントアクセス許可に立ち向かおう
わかりづらいS3クロスアカウントアクセス許可に立ち向かおうわかりづらいS3クロスアカウントアクセス許可に立ち向かおう
わかりづらいS3クロスアカウントアクセス許可に立ち向かおう
 
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
【AI:ML#16】Amazon Lexを用いたチャットボットの構築.pdf
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
20200218 AWS Black Belt Online Seminar Next Generation Redshift
20200218 AWS Black Belt Online Seminar Next Generation Redshift20200218 AWS Black Belt Online Seminar Next Generation Redshift
20200218 AWS Black Belt Online Seminar Next Generation Redshift
 
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
[AWS Summit 2012] クラウドデザインパターン#5 CDP バッチ処理編
 
Amazon s3へのデータ転送における課題とその対処法を一挙紹介
Amazon s3へのデータ転送における課題とその対処法を一挙紹介Amazon s3へのデータ転送における課題とその対処法を一挙紹介
Amazon s3へのデータ転送における課題とその対処法を一挙紹介
 
Azure Logic Apps
Azure Logic AppsAzure Logic Apps
Azure Logic Apps
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
AWS Organizations連携サービスの罠(Security JAWS 第26回 発表資料)
 
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreadingApache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
AWS를 활용한 상품 추천 서비스 구축::김태현:: AWS Summit Seoul 2018
 
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 TokyoPrestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
Prestoで実現するインタラクティブクエリ - dbtech showcase 2014 Tokyo
 
Deep Dive into Spark SQL with Advanced Performance Tuning
Deep Dive into Spark SQL with Advanced Performance TuningDeep Dive into Spark SQL with Advanced Performance Tuning
Deep Dive into Spark SQL with Advanced Performance Tuning
 
Amazon Connect ハンズオン初級編
Amazon Connect ハンズオン初級編Amazon Connect ハンズオン初級編
Amazon Connect ハンズオン初級編
 
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
【de:code 2020】 Azure Synapse Analytics 技術編 ~ 最新の統合分析プラットフォームによる新しい価値の創出(後編)
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch20190326 AWS Black Belt Online Seminar Amazon CloudWatch
20190326 AWS Black Belt Online Seminar Amazon CloudWatch
 

Similar to Tuning ML Models: Scaling, Workflows, and Architecture

Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
supportlogic
 
Apache Spark MLlib
Apache Spark MLlib Apache Spark MLlib
Apache Spark MLlib
Zahra Eskandari
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
Nick Pentreath
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
Databricks
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Databricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
Databricks
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
Databricks
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Sri Ambati
 
TensorFlow 101
TensorFlow 101TensorFlow 101
TensorFlow 101
Raghu Rajah
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Databricks
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
Fernando Ortega Gallego
 

Similar to Tuning ML Models: Scaling, Workflows, and Architecture (20)

Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
 
Apache Spark MLlib
Apache Spark MLlib Apache Spark MLlib
Apache Spark MLlib
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
TensorFlow 101
TensorFlow 101TensorFlow 101
TensorFlow 101
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 

Recently uploaded (20)

Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 

Tuning ML Models: Scaling, Workflows, and Architecture

  • 1. Tuning ML Models: Scaling, Workflows, and Architecture Joseph Bradley Solutions Architect
  • 2. About me ▪ Solutions Architect at Databricks (½ a year) ▪ Software Engineer at Databricks (5 ½ years) ▪ Apache Spark committer and PMC member
  • 3. Global company with over 5,000 customers and 450+ partners Original creators of popular data and machine learning open source projects A unified data analytics platform for accelerating innovation across data engineering, data science, and analytics
  • 4. Hyperparameter Model 1: “Dress” Model 2: “Sneaker” Why the difference? Model 2 had better hyperparameter settings: • Learning rate • Model structure • ... What’s a hyperparameter? • Statistical: assumptions about your model/data • Practical: inputs your ML library does not learn from data • Algorithmic: problem-dependent configs
  • 5. Hyperparameter tuning Expert knowledge Not in this talk: • Statistical best practices • Overview of methods for tuning In this talk: • Data Science workflow best practices • Tips for the big data and cloud computing space Black-box tuning Ignore until needed à See references at end!
  • 6. Hyperparameter tuning is tough 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 # hyperparameters (7 possible values each) Fractioncoverage Non-convex optimization Curse of dimensionality è Computational cost Unintuitive hyperparameters • Regularization • Neural net structure • and many more...
  • 7. In this talk Architectural patterns Workflows Tips and details Getting started
  • 8. In this talk Architectural patterns Workflows Tips and details Getting started
  • 9. Single-machine vs. distributed training • Single-machine training • Distributed training • Training 1 model per group (per customer, product, etc.)
  • 10. Single-machine training Tuning Scale out via distributed hyperparameter tuning à Train 1 model per Spark task Driver Worker WorkerWorker Tuning Tools for this: • Hyperopt + SparkTrials • sklearn + joblibspark • Pandas UDFs
  • 11. Distributed training Driver Worker WorkerWorker Training Tuning ML ML Scale out via parallel tuning à Train 1 model at a time, or train 2+ models in parallel Tools for this: • Apache Spark ML • Hyperopt
  • 12. Training one model per group Driver Worker WorkerWorker Distribute over groups Tuning Tuning Tuning Scale out by distributing over groups à Train 1 group’s model per Spark task Tools for this: • Apache Spark, Pandas UDFs • Hyperopt
  • 13. In this talk Architectural patterns Workflows Tips and details Getting started
  • 14. Getting started Start small • Bias towards smaller models, fewer iterations, etc. • Small is cheaper, and it may suffice. • Regardless, it gives a baseline. Think before you tune • Separate train/val/test sets. • Use early stopping or smart tuning wherever possible. • Pick hyperparameters carefully. • Pick ranges carefully.
  • 15. Models vs. pipelines Best practice: Set up full pipeline before tuning. • At what point does the pipeline compute the metric you care about? Tuning models vs. tuning pipelines • Tuning featurization Optimizing tuning for pipelines • Cache intermediate results
  • 16. Evaluating and iterating Validation data and metrics • Record many metrics on both training and validation data. Tuning hyperparameters independently vs. jointly • Using smarter hyperparameter search algorithms Tracking and reproducibility • Data, code, params, metrics, models, metadata • Tip: Parametrize code to facilitate tracking
  • 18. In this talk Architectural patterns Workflows Tips and details Getting started
  • 19. Handling code Getting code to workers • Generally simple: Pandas UDFs or integrations (Hyperopt, etc.) • Debugging code serialization • Errors are often in worker logs and look like ML library bugs (e.g., “no module named X...”) • Tip: For Python, import libraries within closures Passing configs and credentials • E.g., MLflow active runs and credentials Helpful resource: Distributed Hyperopt best practices and troubleshooting
  • 20. Moving data Single-machine ML • Broadcast • Load from blob storage • Caching data Distributed ML • Caching data Blob storage data prep • Delta Lake format • Petastorm and TFRecords Helpful resources: • Distributed Hyperopt best practices and troubleshooting • Prepare data for distributed training
  • 21. Configuring clusters Single-machine ML • Sharing machine resources • Selecting machine types Distributed ML • Right-sizing clusters • Sharing a cluster Helpful resource: Scaling Hyperopt to Tune Machine Learning Models in Python
  • 22. In this talk Architectural patterns Workflows Tips and details Getting started
  • 23. Tools to know about Apache Spark CrossValidator, TrainValidationSplit, Pandas UDFs MLflow Tracking, Auto-logging ML in Python Hyperopt SparkTrials & more (See last year’s SAIS talk) Scikit-learn sklearn.model_selection, skopt, joblib+Spark TensorFlow HParams, Keras Tuner, Model Optimization Toolkit PyTorch Ax
  • 24. Resources Blog posts + example notebooks • Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt Talks • Best Practices for Hyperparameter Tuning with MLflow (SAIS 2019) • Advanced Hyperparameter Optimization for Deep Learning with MLflow (SAIS 2019) Project pages + docs • Hyperopt docs and Github • MLflow homepage and Github Slides: tinyurl.com/sais2020-joseph Notebook: tinyurl.com/sais2020-joseph-demo
  • 25. Thanks! Architectural patterns Workflows Tips and details Getting started Any questions?