SlideShare a Scribd company logo
Best Practices for Hyperparameter Tuning
with
Joseph Bradley
April 24, 2019
Spark + AI Summit
About me
Joseph Bradley
• Software engineer at Databricks
• Apache Spark committer & PMC member
TEAM
About Databricks
Started Spark project (now Apache Spark) at UC Berkeley in 2009
PRODUCT
Unified Analytics Platform
MISSION
Making Big Data Simple
Try for free today.
databricks.com
Hyperparameters
• Express high-level concepts, such as statistical assumptions
• Are fixed before training or are hard to learn from data
• Affect objective, test time performance, computational cost
E.g.:
• Linear Regression: regularization, # iterations of optimization
• Neural Network: learning rate, # hidden layers
Tuning hyperparameters
E.g.: Fitting a
polynomial
Common goals:
• More flexible modeling process
• Reduced generalization error
• Faster training
• Plug & play ML
Challenges in tuning
Curse of dimensionality
Non-convex optimization
Computational cost
Unintuitive hyperparameters
Tuning in the Data Science workflow
Data
Tuning in the Data Science workflow
Training Data Test Data
ML Model
Tuning in the Data Science workflow
Training
Data
Validation
Data
Test Data
Final
ML Model
ML Model 1
ML Model 2
ML Model 3
Tuning in the Data Science workflow
ML Model
Featurization
Model family
selection
Hyperparameter
tuning
“AutoML” includes hyperparameter tuning.
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
Overview of tuning methods
• Manual search
• Grid search
• Random search
• Population-based algorithms
• Bayesian algorithms
Manual search
Select hyperparameter settings to try based on human intuition.
2 hyperparameters:
• [0, ..., 5]
• {A, B, ..., F}
Expert knowledge tells us to try:
(2,C), (2,D), (2,E), (3,C), (3,D), (3,E)
A B C D E F
0
1
2
3
4
5
Grid Search
Try points on a grid defined by ranges and step sizes
X-axis: {A,...,F}
Y-axis: 0-5, step = 1
A B C D E F
0
1
2
3
4
5
A B C D E F
0
1
2
3
4
5
Random Search
Sample from distributions over ranges
X-axis: Uniform({A,...,F})
Y-axis: Uniform([0,5])
Start with random search, then iterate:
• Use the previous “generation” to
inform the next generation
• E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Start with random search, then iterate:
• Use the previous “generation” to
inform the next generation
• E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Start with random search, then iterate:
• Use the previous “generation” to
inform the next generation
• E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Model the loss function:
Hyperparameters à loss
Iteratively search space, trading off
between exploration and exploitation
A B C D E F
0
1
2
3
4
5
Bayesian Optimization
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Bayesian OptimizationPerformance
Parameter Space
Comparing tuning methods
Iterative /
adaptive?
# evaluations
for P params
Model of
param space
Grid search No O(c^P) none
Random search No O(k) none
Population-based Yes O(k) implicit
Bayesian Yes O(k) explicit
Open-source tools for tuning
Grid
search
Random
search
Population
-based
Bayesian PyPi
downloads
last month
Github
stars
License
scikit-learn Yes Yes --- --- BSD
MLlib Yes --- --- Apache 2.0
scikit-
optimize
Yes 49,189 1,278 BSD
Hyperopt Yes Yes 98,282 3,286 BSD
DEAP Yes 26,700 2,789 LGPL v3
TPOT Yes 9,057 5,609 LGPL v3
GPyOpt Yes 4,959 451 BSD
As of mid-April 2019
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
Tracking
• Experiments
• Runs
• Parameters
• Metrics
• Tags & artifacts
Projects
• Directory or git
repository
• Entry points
• Environments
Models
• Storage format
• Flavors
• Deployment
tools
Organizing with
Training Data Validation Data Test Data
Final ML ModelML Model 1
ML Model 2
ML Model 3
Experiment
Main run
Child runs
Instrumenting tuning with
What to track in a run for a model
• Hyperparameters: all vs. ones being tuned
• Metric(s): training & validation, loss & objective, multiple objectives
• Tags: provenance, simple metadata
• Artifacts: serialized model, large metadata
Tip: Tune full pipeline, not 1 model.
Analyzing how tuning performs
Questions to answer
• Am I tuning the right hyperparameters?
• Am I exploring the right parts of the search space?
• Do I need to do another round of tuning?
Examining results
• Simple case: visualize param vs metric
• Challenges: multiple params and metrics, iterative experimentation
Moving models to production
Repeatable experiments via MLflow Projects
• Code checkpoints
• Environments
Model serialization via MLflow Models
• Flavors: TensorFlow, Keras, Spark, MLeap, ...
Deployment to prediction services
• Azure ML, AWS Sagemaker, Spark UDF
Auto-tracking MLlib with
Training Data Validation Data Test Data
Final ML ModelML Model 1
ML Model 2
ML Model 3
Experiment
Main run
Child runs
In Databricks
• CrossValidator &
TrainValidationSplit
• 1 run per setting of
hyperparameters
• Avg metrics for CV folds(demo)
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
Advanced topics
Efficient tuning
• Parallelizing hyperparameter
search
• Early stopping
• Transfer learning
Fancy tuning
• Multi-metric optimization
• Conditional/awkward
parameter spaces
Check out Maneesh Bhide’s talk:
"Advanced Hyperparameter
Optimization for Deep Learning"
to hear about early stopping,
multi-metric, & conditionals
Thursday @ 3:30pm, Room 3014
Advanced topics
Efficient tuning: Parallelizing
hyperparameter search
Challenge in analyzing results:
multiple parameters or
multiple metrics
Hyperopt + Apache Spark
+ MLflow integration
• Hyperopt: general tuning
library for ML in Python
• Spark integration:
parallelize model tuning in
batches
• MLflow integration: track
runs, analogous to MLlib +
MLflow integration
(demo)
Getting started
MLflow: http://mlflow.org
MLlib tuning
• Databricks auto-tracking with MLflow in private preview now, public
preview mid-May
Hyperopt
• Distributed tuning via Apache Spark: working to open-source the code
• Databricks auto-tracking with MLflow in public preview mid-May
Thank You!
Questions?
AMA @ DevLounge Theater
Thursday @ 10:30-11am
Thanks to Maneesh Bhide for
material for this talk!

More Related Content

What's hot

Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
Databricks
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Databricks
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
Databricks
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Databricks
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
Databricks
 
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나 AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
Amazon Web Services Korea
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
Amazon Web Services
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
Ori Reshef
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Databricks
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
Pieter de Bruin
 
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive [2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
Amazon Web Services Korea
 
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
Amazon Web Services Korea
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
Databricks
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
Splunk
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Amazon Web Services
 

What's hot (20)

Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나 AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
AWS 기반 소프트웨어 서비스(SaaS) -김용우 솔루션즈 아키텍트 :: AWS 파트너 테크시프트 세미나
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive [2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
[2017 AWS Startup Day] AWS 비용 최대 90% 절감하기: 스팟 인스턴스 Deep-Dive
 
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 

Similar to Best Practices for Hyperparameter Tuning with MLflow

Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
Databricks
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
Databricks
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDeep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Databricks
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Databricks
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
Michael Gerke
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Sri Ambati
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
Simon Hughes
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Machine learning
Machine learningMachine learning
Machine learning
Saravanan Subburayal
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
Databricks
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
Sri Ambati
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
Mark Tabladillo
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
Mark Conway
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
DotNetCampus
 

Similar to Best Practices for Hyperparameter Tuning with MLflow (20)

Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim HunterDeep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
Deep-Dive into Deep Learning Pipelines with Sue Ann Hong and Tim Hunter
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Machine learning
Machine learningMachine learning
Machine learning
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
AlphaPy
AlphaPyAlphaPy
AlphaPy
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 

Recently uploaded (20)

一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 

Best Practices for Hyperparameter Tuning with MLflow

  • 1. Best Practices for Hyperparameter Tuning with Joseph Bradley April 24, 2019 Spark + AI Summit
  • 2. About me Joseph Bradley • Software engineer at Databricks • Apache Spark committer & PMC member
  • 3. TEAM About Databricks Started Spark project (now Apache Spark) at UC Berkeley in 2009 PRODUCT Unified Analytics Platform MISSION Making Big Data Simple Try for free today. databricks.com
  • 4. Hyperparameters • Express high-level concepts, such as statistical assumptions • Are fixed before training or are hard to learn from data • Affect objective, test time performance, computational cost E.g.: • Linear Regression: regularization, # iterations of optimization • Neural Network: learning rate, # hidden layers
  • 5. Tuning hyperparameters E.g.: Fitting a polynomial Common goals: • More flexible modeling process • Reduced generalization error • Faster training • Plug & play ML
  • 6. Challenges in tuning Curse of dimensionality Non-convex optimization Computational cost Unintuitive hyperparameters
  • 7. Tuning in the Data Science workflow Data
  • 8. Tuning in the Data Science workflow Training Data Test Data ML Model
  • 9. Tuning in the Data Science workflow Training Data Validation Data Test Data Final ML Model ML Model 1 ML Model 2 ML Model 3
  • 10. Tuning in the Data Science workflow ML Model Featurization Model family selection Hyperparameter tuning “AutoML” includes hyperparameter tuning.
  • 11. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 12. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 13. Overview of tuning methods • Manual search • Grid search • Random search • Population-based algorithms • Bayesian algorithms
  • 14. Manual search Select hyperparameter settings to try based on human intuition. 2 hyperparameters: • [0, ..., 5] • {A, B, ..., F} Expert knowledge tells us to try: (2,C), (2,D), (2,E), (3,C), (3,D), (3,E) A B C D E F 0 1 2 3 4 5
  • 15. Grid Search Try points on a grid defined by ranges and step sizes X-axis: {A,...,F} Y-axis: 0-5, step = 1 A B C D E F 0 1 2 3 4 5
  • 16. A B C D E F 0 1 2 3 4 5 Random Search Sample from distributions over ranges X-axis: Uniform({A,...,F}) Y-axis: Uniform([0,5])
  • 17. Start with random search, then iterate: • Use the previous “generation” to inform the next generation • E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 18. Start with random search, then iterate: • Use the previous “generation” to inform the next generation • E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 19. Start with random search, then iterate: • Use the previous “generation” to inform the next generation • E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 20. Model the loss function: Hyperparameters à loss Iteratively search space, trading off between exploration and exploitation A B C D E F 0 1 2 3 4 5 Bayesian Optimization
  • 36. Comparing tuning methods Iterative / adaptive? # evaluations for P params Model of param space Grid search No O(c^P) none Random search No O(k) none Population-based Yes O(k) implicit Bayesian Yes O(k) explicit
  • 37. Open-source tools for tuning Grid search Random search Population -based Bayesian PyPi downloads last month Github stars License scikit-learn Yes Yes --- --- BSD MLlib Yes --- --- Apache 2.0 scikit- optimize Yes 49,189 1,278 BSD Hyperopt Yes Yes 98,282 3,286 BSD DEAP Yes 26,700 2,789 LGPL v3 TPOT Yes 9,057 5,609 LGPL v3 GPyOpt Yes 4,959 451 BSD As of mid-April 2019
  • 38. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 39. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 40. Tracking • Experiments • Runs • Parameters • Metrics • Tags & artifacts Projects • Directory or git repository • Entry points • Environments Models • Storage format • Flavors • Deployment tools
  • 41. Organizing with Training Data Validation Data Test Data Final ML ModelML Model 1 ML Model 2 ML Model 3 Experiment Main run Child runs
  • 42. Instrumenting tuning with What to track in a run for a model • Hyperparameters: all vs. ones being tuned • Metric(s): training & validation, loss & objective, multiple objectives • Tags: provenance, simple metadata • Artifacts: serialized model, large metadata Tip: Tune full pipeline, not 1 model.
  • 43. Analyzing how tuning performs Questions to answer • Am I tuning the right hyperparameters? • Am I exploring the right parts of the search space? • Do I need to do another round of tuning? Examining results • Simple case: visualize param vs metric • Challenges: multiple params and metrics, iterative experimentation
  • 44. Moving models to production Repeatable experiments via MLflow Projects • Code checkpoints • Environments Model serialization via MLflow Models • Flavors: TensorFlow, Keras, Spark, MLeap, ... Deployment to prediction services • Azure ML, AWS Sagemaker, Spark UDF
  • 45. Auto-tracking MLlib with Training Data Validation Data Test Data Final ML ModelML Model 1 ML Model 2 ML Model 3 Experiment Main run Child runs In Databricks • CrossValidator & TrainValidationSplit • 1 run per setting of hyperparameters • Avg metrics for CV folds(demo)
  • 46. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 47. This talk Popular methods for hyperparameter tuning • Overview of methods • Comparing methods • Open-source tools Tuning in practice with MLflow • Instrument tuning • Analyze results • Productionize models Beyond this talk
  • 48. Advanced topics Efficient tuning • Parallelizing hyperparameter search • Early stopping • Transfer learning Fancy tuning • Multi-metric optimization • Conditional/awkward parameter spaces Check out Maneesh Bhide’s talk: "Advanced Hyperparameter Optimization for Deep Learning" to hear about early stopping, multi-metric, & conditionals Thursday @ 3:30pm, Room 3014
  • 49. Advanced topics Efficient tuning: Parallelizing hyperparameter search Challenge in analyzing results: multiple parameters or multiple metrics Hyperopt + Apache Spark + MLflow integration • Hyperopt: general tuning library for ML in Python • Spark integration: parallelize model tuning in batches • MLflow integration: track runs, analogous to MLlib + MLflow integration (demo)
  • 50. Getting started MLflow: http://mlflow.org MLlib tuning • Databricks auto-tracking with MLflow in private preview now, public preview mid-May Hyperopt • Distributed tuning via Apache Spark: working to open-source the code • Databricks auto-tracking with MLflow in public preview mid-May
  • 51. Thank You! Questions? AMA @ DevLounge Theater Thursday @ 10:30-11am Thanks to Maneesh Bhide for material for this talk!