Generalized linear models (GLMs) and gradient boosting machines (GBMs) are two of the most widely used supervised learning approaches in all of commercial data science. GLMs have been the go-to predictive and inferential modeling tool for decades, but important mathematical and computational advances have been made in training GLMs in recent years. This talk will contrast H2O’s implementation of penalized GLM techniques with ordinary least squares and give specific hints for building regularized and accurate GLMs for both predictive and inferential purposes. As more organizations begin experimenting with and embracing algorithms from the machine learning tradition, GBMs have come to prominence due to their predictive accuracy, the ability to train on real-world data, and resistance to overfitting training data. This talk will give some background on the GBM approach, some insight into the H2O implementation, and some tips for tuning and interpreting GBMs in H2O.
Patrick Hall is a senior data scientist and product engineer at H2O.ai. Patrick works with H2O.ai customers to derive substantive business value from machine learning technologies. His product work at H2O.ai focuses on two important aspects of applied machine learning, model interpretability and model deployment. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.
Prior to joining H2O.ai, Patrick held global customer facing roles and R & D research roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.