
H2O Advancements - Arno Candel



Arno Candel, Chief Architect, talks about what's new in H2O including all the new advancements in the algorithms.

- Powered by the open source machine learning software Contributors welcome at:
- To view videos on H2O open source machine learning software, go to:

Published in: Data & Analytics


  1. H2O Advancements. Arno Candel, PhD, Chief Architect, Physicist & Hacker, @ArnoCandel. July 18, 2016
  2. New Enterprise Features: Mission-Critical Capabilities for Enterprise Production Use
     • Auth & Security: LDAP/Kerberos/HTTPS/Encryption for max. compliance, IPv6
     • Semi-Supervised: Pre-train Deep Learning model on unlabeled data, then fine-tune
     • Large POJOs: Productionize large Java models (multi-GB source code)
     • Hyper Parameter Tuning: Automatically tunes model parameters for the desired metric, with convergence-based early stopping for models and search
     • Steam: Platform for Data Products - next-gen product
     • Sparkling Water 2.0: The killer app for the latest version of Apache Spark
     • Advanced Munging: Bringing R’s data.table to H2O - scalable, fast and distributed
  3. New Features for H2O Tree Algorithms: GBM + DRF. Highest Accuracy for Smarter Applications
     • Optimal missing value handling: Missing data has meaning, is rarely missing at random. Optimal splits are found taking missing values into account.
     • Quantile-based histograms: Finds optimal split points for data with outliers, e.g. -99999,0,1,2,3,4,5
     • New Algorithm: ExtraTreesClassifier (pick best among random split points)
     • Robust Regression: Huber loss for GBM. Higher accuracy for models on data with outliers; quadratic loss for inliers, linear loss for outliers.
     • More tuning parameters for GBM:
         col_sample_rate_change_per_level - H2O exclusive
         learn_rate_annealing - faster training
         min_split_improvement - avoids overfitting
         max_abs_leafnode_pred - avoids overfitting
         sample_rate_per_class - for imbalanced datasets
  4. Deep Water: Next-Gen Deep Learning in H2O. Enterprise Deep Learning for Business Transformation
     • Integration with existing GPU backends: Leverage open-source tools and research for TensorFlow, Caffe, mxnet, Theano, etc.
     • Scalability and Ease of Use/Deployment of H2O: Distributed training, real-time model inspection. Flow, R, Python, Spark/Scala, Java, REST, POJO, Steam
     • Convolutional Neural Networks: Image, video, speech recognition, etc.
     • Recurrent Neural Networks: Sequences, time series, etc. NLP: natural language processing
     • Hybrid Neural Networks Architectures: Speech to text translation, image captioning, scene parsing, etc.
     • Much more about Deep Learning tomorrow afternoon!
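
The tree-algorithm slide names two ideas that are easy to show in a few lines: Huber loss (quadratic for inliers, linear for outliers) and quantile-based histogram bins for data with outliers. The following is a minimal plain-Python sketch of both, not H2O's implementation. The function names, the `delta` threshold, and the choice of four bins are assumptions made for illustration; only the sample data comes from the slide.

```python
import statistics


def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it.

    `delta` is an illustrative threshold, not an H2O parameter name.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def uniform_edges(values, n_bins):
    """Interior bin edges for equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior bin edges placed at evenly spaced quantiles of the data."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


# Example data from the slide: one extreme outlier plus a small range.
data = [-99999, 0, 1, 2, 3, 4, 5]

print("uniform :", uniform_edges(data, 4))
print("quantile:", quantile_edges(data, 4))
print("huber(0.5), huber(3.0):", huber_loss(0.5), huber_loss(3.0))
```

With equal-width bins, every candidate split point lands between the outlier and zero, so the values 0 through 5 cannot be separated from one another. Quantile-based edges fall inside that range, which is the motivation the slide gives for quantile-based histograms.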