Scalable Automatic Machine Learning with H2O


In this presentation, Parul Pandey will provide a history and overview of the field of “Automatic Machine Learning” (AutoML), followed by a detailed look inside H2O’s open source AutoML algorithm. H2O AutoML provides an easy-to-use interface that automates data pre-processing and the training and tuning of a large selection of candidate models (including multiple stacked ensemble models for superior model performance). The result of an AutoML run is a “leaderboard” of H2O models that can be easily exported for use in production. AutoML is available in all H2O interfaces (R, Python, Scala, web GUI) and, due to the distributed nature of the H2O platform, can scale to very large datasets. The presentation will end with a demo of H2O AutoML in R and Python, including a handful of code examples to get you started using automatic machine learning on your own projects.
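
As a rough illustration of the workflow described above, here is a minimal Python sketch of an AutoML run that ends with a leaderboard. It is not taken from the slides. The file name `train.csv`, the target column `response`, the `max_models=10` budget, and the helper names `predictor_columns` and `run_automl` are all assumptions made for this example, and the target is assumed to be a binary class label.

```python
def predictor_columns(columns, target):
    """Return every column name except the target (hypothetical helper)."""
    return [name for name in columns if name != target]


def run_automl(csv_path, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return the fitted AutoML object."""
    # Imported inside the function so the helper above works without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()                                # start or connect to a local H2O cluster
    train = h2o.import_file(csv_path)         # load the data as an H2OFrame
    train[target] = train[target].asfactor()  # assumption: classification target
    predictors = predictor_columns(train.columns, target)

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=predictors, y=target, training_frame=train)
    return aml


# Example use (assumed file and column names; requires the h2o package and Java):
#   aml = run_automl("train.csv", "response")
#   print(aml.leaderboard.head())   # models ranked by the default metric
#   best = aml.leader               # top model on the leaderboard
```

Capping the run with `max_models` (or with `max_runtime_secs`) keeps a first experiment short; either limit can be raised once the pipeline works end to end.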

Parul's Bio:
Parul is a Data Science Evangelist who combines data science, evangelism and community in her work. Her focus is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.

Published in: Technology

  1. Scalable Automatic Machine Learning with H2O. Parul Pandey, Data Science Evangelist
  2. What is H2O?
     The company:
     • Founded in 2012
     • Advised by Stanford Professors Hastie, Tibshirani & Boyd
     • Headquarters: Mountain View, California, USA
     H2O, the platform:
     • Open Source Software (Apache 2.0 Licensed)
     • R, Python, Scala, Java and Web Interfaces
     • Distributed Machine Learning Algorithms for Big Data
  3. H2O Tools
  4. H2O in Industry
  5. Agenda
     • H2O Platform
     • Automatic Machine Learning (AutoML)
     • H2O AutoML Overview
     • Demo
  6. H2O Platform
  7. H2O Machine Learning Platform
     • Open source, distributed (multi-core + multi-node) implementations of cutting edge ML algorithms.
     • Core algorithms written in high performance Java.
     • APIs available in R, Python, Scala; web GUI.
     • Easily deploy models to production as pure Java code.
     • Works on Hadoop, Spark, AWS, your laptop, etc.
  8. H2O Machine Learning Features
     • Supervised & unsupervised machine learning algos (GBM, RF, DNN, GLM, Stacked Ensembles, etc.)
     • Imputation, normalization & auto one-hot-encoding
     • Automatic early stopping
     • Cross-validation, grid search & random search
     • Variable importance, model evaluation metrics, plots
  9. Intro to Automatic Machine Learning
  10. Aspects of Automatic Machine Learning: Data Prep, Model Generation, Ensembles
  11. H2O’s AutoML
  12. H2O AutoML
     • Basic data pre-processing (as in all H2O algos).
     • Trains a random grid of algorithms such as GBMs, DNNs, GLMs, etc. using a carefully chosen hyper-parameter space.
     • Individual models are tuned using cross-validation.
     • Two Stacked Ensembles are trained (“All Models” ensemble & a lightweight “Best of Family” ensemble).
     • Returns a sorted “Leaderboard” of all models.
     • All models can be easily exported to production.
  13.
  14. Random Grid Search & Stacking
     • Random Grid Search combined with Stacked Ensembles is a powerful combination.
     • Ensembles perform particularly well if the models they are based on (1) are individually strong, and (2) make uncorrelated errors.
     • Stacking uses a second-level metalearning algorithm to find the optimal combination of base learners.
  15. Who is it for?
  16. H2O AutoML in R
  17. H2O AutoML in Python
  18. H2O AutoML in Flow GUI
  19. H2O AutoML Leaderboard: example leaderboard for binary classification
  20. H2O AutoML Tutorial
  21. Learn H2O AutoML!
     • Docs
     • R & Py tutorials
     • Blog: A Deep Dive into H2O’s AutoML
  22. H2O Resources
     • Documentation
     • Tutorials
     • Slidedecks
     • Videos
     • Stack Overflow
     • Google Group
     • Gitter
     • Events & Meetups
  23. Contribute to H2O! Get in touch over email, Gitter or JIRA.
  24. Thank you!
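
Slide 14 states that ensembles work best when the base models make uncorrelated errors. The sketch below is one way to see why, using only the Python standard library. It is a simplified simulation, not H2O code: it combines models with a plain average instead of a trained metalearner, and the function name `averaged_rmse`, the correlation parameter `rho`, and the Gaussian error model are assumptions made for this example.

```python
import math
import random


def averaged_rmse(n_models, rho, n_points=5000, seed=0):
    """RMSE of the plain average of n_models simulated predictors.

    Each model's error has unit variance; rho is the assumed pairwise
    correlation between the errors of any two models.
    """
    rng = random.Random(seed)
    total_squared_error = 0.0
    for _ in range(n_points):
        shared = rng.gauss(0.0, 1.0)  # noise component common to every model
        errors = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_models)
        ]
        mean_error = sum(errors) / n_models  # error of the averaged prediction
        total_squared_error += mean_error ** 2
    return math.sqrt(total_squared_error / n_points)


uncorrelated = averaged_rmse(n_models=5, rho=0.0)
correlated = averaged_rmse(n_models=5, rho=0.9)
print(f"5 models, uncorrelated errors: RMSE = {uncorrelated:.3f}")
print(f"5 models, correlated errors:   RMSE = {correlated:.3f}")
```

Under this error model the variance of the averaged error is rho + (1 - rho) / n, so five equally strong models with independent errors cut the error variance to one fifth, while at rho = 0.9 averaging barely helps. A trained metalearner, as used in stacking, can additionally weight the base models unequally.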