This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/0pvvDHfxdZ8
Driverless AI is H2O.ai's latest flagship product for automatic machine learning. It fully automates some of the most challenging and productive tasks in applied data science such as feature engineering, model tuning, model ensembling and model deployment. Driverless AI turns Kaggle-winning grandmaster recipes into production-ready code, and is specifically designed to avoid common mistakes such as under- or overfitting, data leakage or improper model validation, some of the hardest challenges in data science. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy.
Driverless AI is now equipped with time-series functionality. Time-series modeling helps forecast sales, predict industrial machine failure and more. With the time-series capability in Driverless AI, H2O.ai directly addresses some of the most pressing concerns of organizations across industries, for use cases such as transactional data in capital markets, tracking in-store and online sales in retail, and using sensor data in manufacturing to improve supply chains or predictive maintenance.
Bio: Marios Michailidis is a Competitive Data Scientist at H2O.ai. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece, an MSc in Risk Management from the University of Southampton and a PhD in machine learning from UCL. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects on themes including acquisition, retention, recommenders, fraud detection, portfolio optimization and more. He is the creator of KazAnova, a freeware GUI for credit scoring and data mining written 100% in Java, and of the StackNet Meta-Modelling Framework. In his spare time he loves competing in data science challenges and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. He currently ranks 3rd.
Bio: A Kaggle Grandmaster and a Data Scientist at H2O.ai, Mathias Müller holds an AI- and ML-focused diploma (equivalent to an M.Sc.) in computer science from Humboldt University in Berlin. During his studies, he worked on computer vision in the context of bio-inspired visual navigation of autonomous flying quadrocopters. Prior to H2O.ai, he worked as a machine learning engineer for FSD Fahrzeugsystemdaten GmbH in the automotive sector. He came across Kaggle by chance while looking for a more ML-focused platform than TopCoder. This is where he entered his first predictive modeling competition and climbed the ladder to become a Grandmaster. He is an active contributor to XGBoost and is working on Driverless AI with H2O.ai.
11. Date        Sales  Lag1  Lag2  Moving Average
    1/1/2018    100    -     -     -
    2/1/2018    150    100   -     100
    3/1/2018    160    150   100   125
    4/1/2018    200    160   150   155
    5/1/2018    210    200   160   180
    6/1/2018    150    210   200   205
    7/1/2018    160    150   210   180
    8/1/2018    120    160   150   155
    9/1/2018    80     120   160   140
    10/1/2018   70     80    120   100
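The lag and moving-average columns of this example can be reproduced with a few lines of plain Python. This is a minimal sketch, not Driverless AI code: the helper names `lag` and `moving_average` are made up for illustration, and the moving average is assumed to be the row-wise mean of the available Lag1 and Lag2 values, which matches the numbers on the slide.

```python
from statistics import mean

# Sales column from the slide, oldest observation first.
sales = [100, 150, 160, 200, 210, 150, 160, 120, 80, 70]

def lag(values, k):
    """Shift the series down by k rows; the first k rows have no history (None)."""
    return [None] * k + values[:len(values) - k]

def moving_average(*lag_columns):
    """Row-wise mean over the given lag columns, skipping missing entries."""
    result = []
    for row in zip(*lag_columns):
        known = [v for v in row if v is not None]
        result.append(mean(known) if known else None)
    return result

lag1 = lag(sales, 1)
lag2 = lag(sales, 2)
ma = moving_average(lag1, lag2)

for row in zip(sales, lag1, lag2, ma):
    print(row)
```

Because every feature is built only from earlier rows, the current row's target never leaks into its own features.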
Feature Engineering (cont.)
• Lags on subsets of the specified group columns (e.g. {Store, Department} vs. {Department} vs. {Store})
• Exponentially Weighted Moving Averages (EWMA) of n-th order differenced lags
• Aggregation of lags (mean, std, sums, etc.)
• Interactions of lags (e.g. Lag2 - Lag1)
• Linear regression on lags (taking slope and/or intercept as new features)
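Several of the lag-derived features listed above can be sketched in plain Python. The function name `lag_features`, the smoothing factor `alpha` and the use of first-order differences are assumptions made for this illustration; the talk does not specify how Driverless AI computes these features internally.

```python
from statistics import mean, pstdev

def lag_features(history, alpha=0.5):
    """Derive features from past target values (oldest first, at least 2 values).

    `alpha` is an assumed EWMA smoothing factor, not a value from the talk.
    """
    lag1, lag2 = history[-1], history[-2]
    n = len(history)

    # Linear regression on lags: least-squares line through (0, y0), (1, y1), ...
    x_mean = (n - 1) / 2
    y_mean = mean(history)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # EWMA of first-order differenced lags.
    diffs = [b - a for a, b in zip(history, history[1:])]
    ewma = diffs[0]
    for d in diffs[1:]:
        ewma = alpha * d + (1 - alpha) * ewma

    return {
        "lag_mean": y_mean,             # aggregation of lags
        "lag_std": pstdev(history),     # aggregation of lags
        "lag_sum": sum(history),        # aggregation of lags
        "lag2_minus_lag1": lag2 - lag1, # interaction of lags
        "slope": slope,
        "intercept": intercept,
        "ewma_diff1": ewma,
    }

# Features for the 4/1/2018 row of the sales example: history is 100, 150, 160.
features = lag_features([100, 150, 160])
print(features)
```

Lags on subsets of group columns follow the same pattern: the history passed in is simply restricted to rows sharing the chosen group key, for example one store or one department.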
12. Candidates for Lag-Sizes
• Ranking based on autocorrelation
• Pre-defined intervals (based on estimated frequency)
Daily data
• [7, 14, 21, …]
• [14, 28, 32, …]
• …
Weekly data
• [2, 4, 6, 8, …]
• [4, 8, 12, 16, …]
• …
…
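Ranking candidate lag sizes by autocorrelation can be sketched as follows. This is an illustration under stated assumptions rather than the product's implementation: the helpers `autocorrelation` and `rank_lags` are hypothetical, the sample autocorrelation uses the standard biased estimator, candidates are ranked by absolute value, and the series is synthetic daily data with a weekly pattern.

```python
from statistics import mean

def autocorrelation(series, k):
    """Sample autocorrelation of the series at lag k (biased estimator)."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - k] - m) for t in range(k, len(series)))
    return num / denom

def rank_lags(series, candidates):
    """Sort candidate lag sizes by absolute autocorrelation, strongest first."""
    scored = [(k, autocorrelation(series, k)) for k in candidates]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Synthetic daily series with a weekly pattern (higher values on two days a week).
series = [10, 12, 11, 13, 12, 30, 28] * 8

ranked = rank_lags(series, candidates=range(1, 15))
for k, acf in ranked[:3]:
    print(k, round(acf, 3))
```

For this series the weekly lags 7 and 14 come out on top, which is consistent with the pre-defined interval [7, 14, 21, …] suggested for daily data.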
13. Regularization of Lag-Features
• Lower bound for considered lag sizes
• Dropout
• Random replacement of actual lag values by "n.a."
• Align frequency of available lag information between train and validation/test
• Target binning
• Reduces the number of possible splits the GBM can perform
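The three regularization ideas can be illustrated with a short sketch. All names (`apply_lower_bound`, `lag_dropout`, `bin_values`), the dropout rate, the equal-width binning scheme and the use of bin midpoints are assumptions chosen for the example; the slides do not describe the exact procedure.

```python
import random

def apply_lower_bound(candidates, min_lag):
    """Keep only lag sizes at or above the lower bound."""
    return [k for k in candidates if k >= min_lag]

def lag_dropout(lag_values, rate, seed=0):
    """Randomly replace lag values by None ("n.a.") with probability `rate`."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in lag_values]

def bin_values(values, n_bins):
    """Map each value to the midpoint of its equal-width bin.

    Fewer distinct values leave the GBM fewer candidate split points.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return list(values)
    binned = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # top edge goes in last bin
        binned.append(lo + (idx + 0.5) * width)
    return binned

# Lag1 values from the sales example (first row without history omitted).
lag1 = [100, 150, 160, 200, 210, 150, 160, 120, 80]

print(apply_lower_bound([1, 2, 7, 14], min_lag=7))
dropped = lag_dropout(lag1, rate=0.3, seed=42)
print(dropped)
binned = bin_values(lag1, n_bins=3)
print(binned)
```

Dropout makes the training data resemble validation and test data, where recent lag values are often unavailable, so the model cannot rely on lags that will be missing at prediction time.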