H2O Advancements - Arno Candel

Arno Candel, Chief Architect at H2O.ai, talks about what's new in H2O, including the latest advancements in its algorithms.

- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

  1. H2O Advancements: Arno Candel, PhD, Chief Architect, Physicist & Hacker, H2O.ai (@ArnoCandel), July 18, 2016
  2. New Enterprise Features: Mission-Critical Capabilities for Enterprise Production Use
     - Auth & Security: LDAP/Kerberos/HTTPS/encryption for max. compliance; IPv6
     - Semi-Supervised: pre-train a Deep Learning model on unlabeled data, then fine-tune
     - Large POJOs: productionize large Java models (multi-GB source code)
     - Hyper Parameter Tuning: automatically tunes model parameters for the desired metric, with convergence-based early stopping for both the models and the search
     - Steam: platform for data products (next-gen product)
     - Sparkling Water 2.0: the killer app for the latest version of Apache Spark
     - Advanced Munging: bringing R’s data.table to H2O; scalable, fast and distributed
  3. New Features for H2O Tree Algorithms (GBM + DRF): Highest Accuracy for Smarter Applications
     - Optimal missing value handling: missing data has meaning and is rarely missing at random; optimal splits are found taking missing values into account
     - Quantile-based histograms: find optimal split points for data with outliers, e.g. -99999, 0, 1, 2, 3, 4, 5
     - New algorithm: ExtraTreesClassifier (pick the best among random split points)
     - Robust regression: Huber loss for GBM; higher accuracy for models on data with outliers (quadratic loss for inliers, linear loss for outliers)
     - More tuning parameters for GBM:
       - col_sample_rate_change_per_level (H2O exclusive)
       - learn_rate_annealing (faster training)
       - min_split_improvement (avoids overfitting)
       - max_abs_leafnode_pred (avoids overfitting)
       - sample_rate_per_class (for imbalanced datasets)
  4. Deep Water: Next-Gen Deep Learning in H2O (Enterprise Deep Learning for Business Transformation)
     - Integration with existing GPU backends: leverage open-source tools and research for TensorFlow, Caffe, mxnet, Theano, etc.
     - Scalability and ease of use/deployment of H2O: distributed training, real-time model inspection; Flow, R, Python, Spark/Scala, Java, REST, POJO, Steam
     - Convolutional Neural Networks: image, video, speech recognition, etc.
     - Recurrent Neural Networks: sequences, time series, etc.; NLP (natural language processing)
     - Hybrid Neural Network Architectures: speech-to-text translation, image captioning, scene parsing, etc.

     Much more about Deep Learning tomorrow afternoon!
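
Slide 2 lists hyper parameter tuning with convergence-based early stopping for both the models and the search. The Python sketch below illustrates that idea with a random search over a small discrete grid; it is not H2O's implementation. The stopping rule (compare the best score among the last `stopping_rounds` results with the best earlier score, using a relative `stopping_tolerance`), the helper names `should_stop` and `random_search`, and the toy objective are all assumptions made for illustration.

```python
import random


def should_stop(history, stopping_rounds, stopping_tolerance):
    """Convergence check for a positive metric where lower is better.

    Assumed rule (not H2O's exact criterion): stop once the best score among
    the last `stopping_rounds` entries fails to beat the best earlier score
    by a relative margin of `stopping_tolerance`.
    """
    if len(history) <= stopping_rounds:
        return False  # not enough history to judge convergence yet
    best_before = min(history[:-stopping_rounds])
    best_recent = min(history[-stopping_rounds:])
    return best_recent > best_before * (1.0 - stopping_tolerance)


def random_search(objective, space, max_models, stopping_rounds=3,
                  stopping_tolerance=0.01, seed=42):
    """Hypothetical random search over a discrete grid, stopped early once the
    best-so-far score converges. Returns (best_params, best_score, models_built)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    best_so_far = []
    for _ in range(max_models):
        # Sample one value per hyperparameter.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)  # stands in for "train a model, return validation loss"
        if score < best_score:
            best_params, best_score = params, score
        best_so_far.append(best_score)
        if should_stop(best_so_far, stopping_rounds, stopping_tolerance):
            break
    return best_params, best_score, len(best_so_far)


def toy_loss(params):
    """Toy objective, minimized at max_depth=5 with the smallest learn_rate."""
    return (params["max_depth"] - 5) ** 2 + params["learn_rate"]


if __name__ == "__main__":
    space = {"max_depth": [3, 5, 7, 9], "learn_rate": [0.01, 0.05, 0.1]}
    print(random_search(toy_loss, space, max_models=50))
```

The same `should_stop` check could also be applied to per-iteration validation scores while a single model trains, which corresponds to the "early stopping for models" half of the slide; here it is only applied to the sequence of best-so-far search results.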
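
Slide 3 says optimal splits are found taking missing values into account. One simple way to do that is to evaluate every threshold twice, once sending missing rows left and once sending them right, and keep the cheaper option. The Python sketch below does exactly that under assumptions of our own: a regression tree, a sum-of-squared-errors criterion, an exhaustive threshold scan, `None` marking a missing value, and the hypothetical helpers `sse` and `best_split_with_missing`. It is not H2O's code.

```python
def sse(ys):
    """Sum of squared errors around the mean (the split criterion assumed here)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split_with_missing(xs, ys):
    """Find the split `x <= t` minimizing total SSE, trying both directions
    for rows whose feature value is missing (None).

    Returns (cost, threshold, missing_go_left).
    """
    present = sorted({x for x in xs if x is not None})
    best = None
    # Using the largest value as t with missing rows sent right tests
    # "missing vs. not missing" as a split of its own.
    for t in present:
        for missing_left in (True, False):
            left, right = [], []
            for x, y in zip(xs, ys):
                if x is None:
                    (left if missing_left else right).append(y)
                elif x <= t:
                    left.append(y)
                else:
                    right.append(y)
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, t, missing_left)
    return best


xs = [1, 2, 3, 4, None, None]
ys = [0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
print(best_split_with_missing(xs, ys))  # (0.0, 2, False)
```

In the example the missing rows behave like the high-`x` rows, so the best split sends them right; that is the sense in which missing data "has meaning" and should not simply be imputed or dropped.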
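
Slide 3's quantile-based histogram point uses the data -99999, 0, 1, 2, 3, 4, 5. The Python sketch below shows why equal-width bins struggle there: with 4 bins the outlier stretches the range so far that 0 through 5 all land in one bin, leaving no candidate split between them, while edges taken at roughly equal-count positions keep them apart. The bin count, the edge rule in `quantile_edges`, and all helper names are illustrative assumptions, not H2O's binning scheme.

```python
import bisect


def uniform_edges(values, nbins):
    """Equal-width interior bin edges between min and max (nbins - 1 edges)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(1, nbins)]


def quantile_edges(values, nbins):
    """Interior edges read off the sorted data at roughly equal-count positions."""
    s = sorted(values)
    n = len(s)
    return sorted({s[(i * n) // nbins] for i in range(1, nbins)})


def bin_index(x, edges):
    """Bin number of x: how many edges are <= x."""
    return bisect.bisect_right(edges, x)


def occupied_bins(values, edges):
    """Distinct bins actually used; splits can only fall between these."""
    return sorted({bin_index(x, edges) for x in values})


data = [-99999, 0, 1, 2, 3, 4, 5]  # the outlier example from the slide
print(occupied_bins(data, uniform_edges(data, 4)))   # [0, 3]
print(occupied_bins(data, quantile_edges(data, 4)))  # [0, 1, 2, 3]
```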
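
Slide 3 describes Huber loss as quadratic loss for inliers and linear loss for outliers. A minimal Python sketch of the loss, and of the pseudo-residual a gradient boosting step would fit under it, is below. The threshold `delta` is a fixed argument here by assumption (in practice it is often derived from a quantile of the absolute residuals), and the function names are hypothetical.

```python
def huber_loss(y, f, delta):
    """Huber loss of prediction f for target y: quadratic within delta, linear beyond."""
    r = abs(y - f)
    if r <= delta:
        return 0.5 * r * r
    # Linear branch, offset so the loss is continuous at r == delta.
    return delta * (r - 0.5 * delta)


def huber_negative_gradient(y, f, delta):
    """Pseudo-residual the next tree would fit: the residual clipped to [-delta, delta]."""
    r = y - f
    return max(-delta, min(delta, r))


# An outlier with residual 100 contributes linearly, not quadratically.
print(huber_loss(100.0, 0.0, 1.0))               # 99.5
print(0.5 * 100.0 ** 2)                          # 5000.0 under squared loss
print(huber_negative_gradient(100.0, 0.0, 1.0))  # 1.0
print(huber_negative_gradient(0.5, 0.0, 1.0))    # 0.5
```

Because the negative gradient is clipped at ±delta, a single extreme target cannot dominate the next tree the way it would under squared loss, which is where the accuracy gain on data with outliers comes from.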