Using H2O Random Grid Search for Hyper-parameters Optimization

Jo-fai Chow
Jo-fai ChowData Scientist at H2O.ai - Customer Success, Smart Apps, Machine Learning & Data Visualization
Using H2O Random Grid Search for
Hyper-parameters Optimization
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
WHO AM I
• Customer Data Scientist at H2O.ai
• Background
o Telecom (Virgin Media)
o Data Science Platform (Domino Data Lab)
o Water Engineering + Machine Learning Research
(STREAM Industrial Doctorate Centre)
ABOUT H2O
• Company
o Team: 50 (45 shown)
o Founded in 2012, Mountain View,
California.
o Venture capital backed
• Products
o Open-source machine learning
platform.
o Flow (Web), R, Python, Spark,
Hadoop interfaces.
ABOUT THIS TALK
• Story of a baker and a data scientist
o Why you should care
• Hyper-parameters optimization
o Common techniques
o H2O Python API
• Other H2O features for streamlining workflow
STORY OF A BAKER
• Making a cake
o Source
• Ingredients
o Process:
• Mixing
• Baking
• Decorating
o End product
• A nice looking cake
Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
STORY OF A DATA SCIENTIST
• Making a data product
o Source
• Raw data
o Process:
• Data munging
• Analyzing / Modeling
• Reporting
o End product
• Apps, graphs or reports
Credit: www.simranjindal.com
BAKER AND DATA SCIENTIST
• What do they have in common?
o Process is important to bakers and data scientists.
Yet, most customers do not appreciate the effort.
o Most customers only care about raw materials
quality and end products.
WHY YOU SHOULD CARE
• We can use machine/software to automate
some laborious tasks.
• We can spend more time on quality assurance
and presentation.
• This talk is about making one specific task,
hyper-parameters tuning, more efficient.
HYPER-PARAMETERS OPTIMISATION
• Overview
o Optimizing an algorithm’s performance.
• e.g. Random Forest, Gradient Boosting Machine (GBM)
o Trying different sets of hyper-parameters within a
defined search space.
o No rules of thumb.
HYPER-PARAMETERS OPTIMISATION
• Example of hyper-parameters in H2O
o Random Forest:
• No. of trees, depth of trees, sample rate …
o Gradient Boosting Machine (GBM):
• No. of trees, depth of trees, learning rate, sample rate …
o Deep Learning:
• Activation, hidden layer sizes, L1, L2, dropout ratios …
COMMON TECHNIQUES
• Manual search
o Tuning by hand - inefficient
o Expert opinion (not always reliable)
• Grid search
o Automated search within a defined space
o Computationally expensive
• Random grid search
o More efficient than manual / grid search
o Equal performance in less time
RANDOM GRID SEARCH – DOES IT WORK?
• Random Search for Hyper-Parameter
Optimization
o Journal of Machine Learning Research (2012)
o James Bergstra and Yoshua Bengio
o “Compared with deep belief networks configured by a
thoughtful combination of manual search and grid
search, purely random search found statistically equal
performance on four of seven data sets, and superior
performance on one of seven.”
RELATED FEATURE – EARLY STOPPING
• A technique for regularization.
• Avoid over-fitting the training set.
• Useful when combined with hyper-parameter
search:
o Additional controls (e.g. time constraint,
tolerance)
H2O RANDOM GRID SEARCH
• Objectives
o Optimize model performance based on evaluation
metric.
o Explore the defined search space randomly.
o Use early-stopping for regularization and
additional controls.
RANDOM GRID SEARCH (PYTHON API)
RANDOM GRID SEARCH
• Outputs
o Best model based on metric
o A set of hyper-parameters for the best model
• Other APIs
o R, REST, Java (see documentation on GitHub)
OTHER H2O FEATURES
• h2oEnsemble
o Better predictive performance
• Sparkling Water = Spark + H2O
• Plain Old Java Object (POJO)
o Productionize H2O models
CONCLUSIONS
• Most people only care about the end product.
• Use H2O random grid search to save time on
hyper-parameters tuning.
• Spend more time on quality assurance and
presentation.
CONCLUSIONS
• H2O Random Grid Search
o An efficient way to tune hyper-parameters
o APIs for Python, R, Java, REST
o Do check out the code examples on GitHub
• Combine with other H2O features
o Streamline data science workflow
ACKNOWLEDGEMENTS
• GoDataDriven
• Conference sponsors
• My colleagues at H2O.ai
THANK YOU
• Resources
o Slides + code – github.com/h2oai/h2o-meetups
o Download H2O – www.h2o.ai
o Documentation – www.h2o.ai/docs/
o joe@h2o.ai
• We are hiring – www.h2o.ai/careers/
1 of 21

Recommended

Building Random Forest at Scale by
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at ScaleSri Ambati
144.9K views42 slides
Introduction to text classification using naive bayes by
Introduction to text classification using naive bayesIntroduction to text classification using naive bayes
Introduction to text classification using naive bayesDhwaj Raj
6.5K views25 slides
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods by
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsSalah Amean
11.6K views83 slides
Do you trust your artificial intelligence system? by
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Facultad de Informática UCM
131 views40 slides
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science by
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data ScienceCOVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data Science
COVID - 19 DATA ANALYSIS USING PYTHON and Introduction to Data ScienceVibhuti Mandral
3.8K views22 slides
Hyperparameter optimization with approximate gradient by
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientFabian Pedregosa
13.6K views18 slides

More Related Content

Similar to Using H2O Random Grid Search for Hyper-parameters Optimization

How and why you need to build a big data lab by
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data labChris Kernaghan
1.3K views27 slides
Scalable Automatic Machine Learning with H2O by
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OSri Ambati
679 views25 slides
Applied Machine learning using H2O, python and R Workshop by
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
1.8K views50 slides
eBay Experimentation Platform on Hadoop by
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
2.3K views34 slides
Experimentation Platform on Hadoop by
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
838 views34 slides
Open Platform for AI & ML modeling by
Open Platform for AI & ML modelingOpen Platform for AI & ML modeling
Open Platform for AI & ML modelingInstitute of Contemporary Sciences
74 views32 slides

Similar to Using H2O Random Grid Search for Hyper-parameters Optimization(20)

How and why you need to build a big data lab by Chris Kernaghan
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data lab
Chris Kernaghan1.3K views
Scalable Automatic Machine Learning with H2O by Sri Ambati
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O
Sri Ambati679 views
Applied Machine learning using H2O, python and R Workshop by Avkash Chauhan
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
Avkash Chauhan1.8K views
eBay Experimentation Platform on Hadoop by Tony Ng
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
Tony Ng2.3K views
Reproducible AI using MLflow and PyTorch by Databricks
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorch
Databricks1.2K views
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab by Sri Ambati
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Sri Ambati1.2K views
Intro to Machine Learning with H2O and AWS by Sri Ambati
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
Sri Ambati11.2K views
Introduction to H2O and Model Stacking Use Cases by Jo-fai Chow
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
Jo-fai Chow1.1K views
H2O at Poznan R Meetup by Jo-fai Chow
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
Jo-fai Chow753 views
Improving Model Predictions via Stacking and Hyper-parameters Tuning by Jo-fai Chow
Improving Model Predictions via Stacking and Hyper-parameters TuningImproving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters Tuning
Jo-fai Chow2.1K views
Scalable Machine Learning in R and Python with H2O by Sri Ambati
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
Sri Ambati2.1K views
H2O at Berlin R Meetup by Jo-fai Chow
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
Jo-fai Chow447 views
Berlin R Meetup by Sri Ambati
Berlin R MeetupBerlin R Meetup
Berlin R Meetup
Sri Ambati181 views
Best Practices for Hyperparameter Tuning with MLflow by Databricks
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
Databricks5.6K views
Three signs your architecture is too small for big data. Camp IT December 2014 by Craig Jordan
Three signs your architecture is too small for big data.  Camp IT December 2014Three signs your architecture is too small for big data.  Camp IT December 2014
Three signs your architecture is too small for big data. Camp IT December 2014
Craig Jordan702 views
The Effect of Third Party Implementations on Reproducibility by Balázs Hidasi
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on Reproducibility
Balázs Hidasi113 views

More from Jo-fai Chow

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny by
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyJo-fai Chow
702 views50 slides
Automatic and Interpretable Machine Learning in R with H2O and LIME by
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEJo-fai Chow
313 views92 slides
Automatic and Interpretable Machine Learning with H2O and LIME by
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEJo-fai Chow
3.7K views57 slides
H2O at BelgradeR Meetup by
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR MeetupJo-fai Chow
473 views100 slides
Introduction to Machine Learning with H2O and Python by
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonJo-fai Chow
716 views120 slides
Introduction to Machine Learning with H2O and Python by
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonJo-fai Chow
842 views108 slides

More from Jo-fai Chow(20)

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny by Jo-fai Chow
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Jo-fai Chow702 views
Automatic and Interpretable Machine Learning in R with H2O and LIME by Jo-fai Chow
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
Jo-fai Chow313 views
Automatic and Interpretable Machine Learning with H2O and LIME by Jo-fai Chow
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow3.7K views
H2O at BelgradeR Meetup by Jo-fai Chow
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR Meetup
Jo-fai Chow473 views
Introduction to Machine Learning with H2O and Python by Jo-fai Chow
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow716 views
Introduction to Machine Learning with H2O and Python by Jo-fai Chow
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow842 views
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek by Jo-fai Chow
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
Jo-fai Chow1.8K views
H2O Deep Water - Making Deep Learning Accessible to Everyone by Jo-fai Chow
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow566 views
H2O Deep Water - Making Deep Learning Accessible to Everyone by Jo-fai Chow
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow1K views
Project "Deep Water" by Jo-fai Chow
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
Jo-fai Chow1.5K views
H2O Machine Learning Use Cases by Jo-fai Chow
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
Jo-fai Chow1.5K views
Kaggle Competitions, New Friends, New Skills and New Opportunities by Jo-fai Chow
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New Opportunities
Jo-fai Chow363 views
Introduction to Generalised Low-Rank Model and Missing Values by Jo-fai Chow
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing Values
Jo-fai Chow2.3K views
Kaggle competitions, new friends, new skills and new opportunities by Jo-fai Chow
Kaggle competitions, new friends, new skills and new opportunitiesKaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunities
Jo-fai Chow973 views
Deploying your Predictive Models as a Service via Domino by Jo-fai Chow
Deploying your Predictive Models as a Service via DominoDeploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via Domino
Jo-fai Chow19.1K views
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain... by Jo-fai Chow
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
Jo-fai Chow1.2K views
Designing Sustainable Drainage Systems by Jo-fai Chow
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
Jo-fai Chow2.2K views
Developing a New Decision Support System for SuDS by Jo-fai Chow
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
Jo-fai Chow1.3K views
Udacity Statement (Introduction to Statistics, August 2012) by Jo-fai Chow
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
Jo-fai Chow365 views
Coursera Statement (Computational Investing, Part I, by Jo-fai Chow
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
Jo-fai Chow532 views

Recently uploaded

3196 The Case of The East River by
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East RiverErickANDRADE90
11 views4 slides
SAP-TCodes.pdf by
SAP-TCodes.pdfSAP-TCodes.pdf
SAP-TCodes.pdfmustafaghulam8181
6 views285 slides
Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides
How Leaders See Data? (Level 1) by
How Leaders See Data? (Level 1)How Leaders See Data? (Level 1)
How Leaders See Data? (Level 1)Narendra Narendra
13 views76 slides
Ukraine Infographic_22NOV2023_v2.pdf by
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdfAnastosiyaGurin
1.3K views3 slides
UNEP FI CRS Climate Risk Results.pptx by
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptxpekka28
11 views51 slides

Recently uploaded(20)

3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9011 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.3K views
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
CRIJ4385_Death Penalty_F23.pptx by yvettemm100
CRIJ4385_Death Penalty_F23.pptxCRIJ4385_Death Penalty_F23.pptx
CRIJ4385_Death Penalty_F23.pptx
yvettemm1006 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
Survey on Factuality in LLM's.pptx by NeethaSherra1
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views
Organic Shopping in Google Analytics 4.pdf by GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials11 views
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by DataScienceConferenc1
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862012 views
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf by vikas12611618
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf
vikas126116188 views

Using H2O Random Grid Search for Hyper-parameters Optimization

  • 1. Using H2O Random Grid Search for Hyper-parameters Optimization Jo-fai (Joe) Chow Data Scientist joe@h2o.ai
  • 2. WHO AM I • Customer Data Scientist at H2O.ai • Background o Telecom (Virgin Media) o Data Science Platform (Domino Data Lab) o Water Engineering + Machine Learning Research (STREAM Industrial Doctorate Centre)
  • 3. ABOUT H2O • Company o Team: 50 (45 shown) o Founded in 2012, Mountain View, California. o Venture capital backed • Products o Open-source machine learning platform. o Flow (Web), R, Python, Spark, Hadoop interfaces.
  • 4. ABOUT THIS TALK • Story of a baker and a data scientist o Why you should care • Hyper-parameters optimization o Common techniques o H2O Python API • Other H2O features for streamlining workflow
  • 5. STORY OF A BAKER • Making a cake o Source • Ingredients o Process: • Mixing • Baking • Decorating o End product • A nice looking cake Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
  • 6. STORY OF A DATA SCIENTIST • Making a data product o Source • Raw data o Process: • Data munging • Analyzing / Modeling • Reporting o End product • Apps, graphs or reports Credit: www.simranjindal.com
  • 7. BAKER AND DATA SCIENTIST • What do they have in common? o Process is important to bakers and data scientists. Yet, most customers do not appreciate the effort. o Most customers only care about raw materials quality and end products.
  • 8. WHY YOU SHOULD CARE • We can use machine/software to automate some laborious tasks. • We can spend more time on quality assurance and presentation. • This talk is about making one specific task, hyper-parameters tuning, more efficient.
  • 9. HYPER-PARAMETERS OPTIMISATION • Overview o Optimizing an algorithm’s performance. • e.g. Random Forest, Gradient Boosting Machine (GBM) o Trying different sets of hyper-parameters within a defined search space. o No rules of thumb.
  • 10. HYPER-PARAMETERS OPTIMISATION • Example of hyper-parameters in H2O o Random Forest: • No. of trees, depth of trees, sample rate … o Gradient Boosting Machine (GBM): • No. of trees, depth of trees, learning rate, sample rate … o Deep Learning: • Activation, hidden layer sizes, L1, L2, dropout ratios …
  • 11. COMMON TECHNIQUES • Manual search o Tuning by hand - inefficient o Expert opinion (not always reliable) • Grid search o Automated search within a defined space o Computationally expensive • Random grid search o More efficient than manual / grid search o Equal performance in less time
  • 12. RANDOM GRID SEARCH – DOES IT WORK? • Random Search for Hyper-Parameter Optimization o Journal of Machine Learning Research (2012) o James Bergstra and Yoshua Bengio o “Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search found statistically equal performance on four of seven data sets, and superior performance on one of seven.”
  • 13. RELATED FEATURE – EARLY STOPPING • A technique for regularization. • Avoid over-fitting the training set. • Useful when combined with hyper-parameter search: o Additional controls (e.g. time constraint, tolerance)
  • 14. H2O RANDOM GRID SEARCH • Objectives o Optimize model performance based on evaluation metric. o Explore the defined search space randomly. o Use early-stopping for regularization and additional controls.
  • 15. RANDOM GRID SEARCH (PYTHON API)
  • 16. RANDOM GRID SEARCH • Outputs o Best model based on metric o A set of hyper-parameters for the best model • Other APIs o R, REST, Java (see documentation on GitHub)
  • 17. OTHER H2O FEATURES • h2oEnsemble o Better predictive performance • Sparkling Water = Spark + H2O • Plain Old Java Object (POJO) o Productionize H2O models
  • 18. CONCLUSIONS • Most people only care about the end product. • Use H2O random grid search to save time on hyper-parameters tuning. • Spend more time on quality assurance and presentation.
  • 19. CONCLUSIONS • H2O Random Grid Search o An efficient way to tune hyper-parameters o APIs for Python, R, Java, REST o Do check out the code examples on GitHub • Combine with other H2O features o Streamline data science workflow
  • 20. ACKNOWLEDGEMENTS • GoDataDriven • Conference sponsors • My colleagues at H2O.ai
  • 21. THANK YOU • Resources o Slides + code – github.com/h2oai/h2o-meetups o Download H2O – www.h2o.ai o Documentation – www.h2o.ai/docs/ o joe@h2o.ai • We are hiring – www.h2o.ai/careers/

Editor's Notes

  1. New face in PyData conference Thanks for having me I am Joe. I work for H2O. Only joined 6 weeks ago Not really a contributor to their code base – more like a long-term user (using H2O for my previous jobs since 2014) Today’s talk is about one specific feature in H2O Instead of giving you a full demo Tell you a story from a user’s perspective
  2. A little bit more background of me H2O – helping customers leverage H2O with their data, build smart apps Before I joined H2O VM – building data apps for CEO and CFO – answering specific business questions Domino – evangelist – blogging and going to meetups and conferences Before data science journey, I was a civil engineer – working on water related research
  3. Team is growing rapidly Based in Mountain View – I am the only guy in UK/Europe VC: $34M in total
  4. Start with a story of baker / data scientist Background information Explain why you should care about the main topic – hyper-para tuning What is it Techniques API Other useful features in H2O
  5. Start with a story Or a workflow of making a cake
  6. Raw data – structured / unstructured / different databases
  7. Customers don’t see much of the process