
Scalable and Automatic Machine Learning with H2O


H2O is widely used for machine learning projects. A TechCrunch article, published in January 2017 by John Mannes, reported that around 20% of Fortune 500 companies use H2O.

Talk 1: Introduction to Scalable & Automatic Machine Learning with H2O

In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O and other tools have made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models.

In this presentation, Joe will introduce the AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface that automates the training of a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top-performing model on the AutoML Leaderboard.
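
As a rough illustration of the interface described above, the following is a minimal sketch using the H2O Python package. It is not code from the talk (the talk's AutoML demo uses the R interface). The function name `run_automl`, its parameters, and the default values are hypothetical choices made for this example, and running it needs the `h2o` package plus a Java runtime.

```python
def run_automl(csv_path, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return the leaderboard.

    Hypothetical helper for illustration; csv_path and target are
    supplied by the caller.
    """
    # Imported lazily so this module still loads where h2o is not installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)

    # Use every column except the target as a predictor.
    predictors = [col for col in frame.columns if col != target]

    # Cap the number of base models; AutoML adds stacked ensembles on top.
    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=predictors, y=target, training_frame=frame)

    # The leaderboard ranks all trained models, including the ensembles.
    return aml.leaderboard
```

Calling `run_automl("housing.csv", "price")` (file and column names assumed) would return an H2O frame of ranked models; the first row is the current leader.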

Talk 2: Making Multimillion-dollar Baseball Decisions with H2O AutoML and Shiny

Joe recently teamed up with IBM and Aginity to create a proof of concept "Moneyball" app for the IBM Think conference in Vegas. The original goal was to prove that different tools (e.g. H2O, Aginity AMP, IBM Data Science Experience, R and Shiny) could work together seamlessly for common business use-cases. Little did Joe know, the app would be used by Ari Kaplan (the real "Moneyball" guy) to validate the future performance of some baseball players. Ari recommended one player to a Major League Baseball team. The player was signed the next day with a multimillion-dollar contract. This talk is about Joe's journey to a real "Moneyball" application.

Bio: Jo-fai (or Joe) Chow is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in the UK, where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist, promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at the STREAM Industrial Doctorate Centre, working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specializing in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.

Published in: Technology

Scalable and Automatic Machine Learning with H2O

  1. Scalable and Automatic Machine Learning with H2O: Introduction, demos and a real-world use-case. Jo-fai (Joe) Chow, Data Scientist / Community Manager, joe@h2o.ai, @matlabulous
  2. Agenda
      • Talk 1: Introduction to H2O
          • Company and People
          • H2O Open Source ML Platform
          • Demos
              • H2O on Hadoop (320 Cores)
              • AutoML
          • Other News
      • Talk 2: Moneyball
          • From a proof-of-concept project to a multimillion-dollar contract
  3. Company Overview
      • Founded: 2012, Series C in Nov 2017
      • Products: Driverless AI (Automated Machine Learning), H2O Open Source Machine Learning, Sparkling Water
      • Mission: Democratize AI. Do Good
      • Team: ~100 employees; Distributed Systems Engineers doing Machine Learning; World-class visualization designers
      • Offices: Mountain View, London, Prague
  4. Our Mission: Make Machine Learning Accessible to Everyone
  5. Scientific Advisory Council
  6. H2O Team
  7. H2O Team: Arno Candel, CTO (Fortune’s 2014 Big Data All-Star); Sri Ambati, Co-founder & CEO
  8. H2O Team: Origin of R Package `ggplot2`
  9. H2O Team: Matt Dowle
  10. H2O Team: Erin LeDell, Chief ML Scientist (Women in ML/DS & R-Ladies Global)
  11. H2O Team: 1st, 4th, 25th, 48th, 33rd. Their Highest Rank in Kaggle (about 80,000 competitors)
  12. H2O Team: 1st, 4th, 25th, 48th, 33rd, 181st. Trying to get closer to them at some point …
  13. H2O Team: Joe, Avni, Priya. Bonsoir!
  14. H2O Team in UK: Feb 2016 - Present; June 2017 - Present
  15. Joe’s Roles at H2O.ai
      • Data Scientist / Sales Engineer / Speaker / Meetup Organiser / Community Evangelist (on paper)
      • Unofficial Photographer of H2O.ai SWAG (the travelling data scientist)
      • H2O.ai SWAG EMEA Distributor (please help yourself)
  16. Joe’s Real Job at H2O.ai. Reminder: #360Selfie
  17. H2O Products
      • In-Memory, Distributed Machine Learning Algorithms with H2O Flow GUI
      • H2O AI Open Source Engine Integration with Spark
      • Lightning Fast machine learning on GPUs
      • Automatic feature engineering, machine learning and interpretability
      • Secure multi-tenant H2O clusters
  18. Worldwide Community Adoption (* data from Google Analytics embedded in the end user product)
  19. Gartner names H2O as Leader with the most completeness of vision (CONFIDENTIAL)
      • H2O.ai recognized as a technology leader with most completeness of vision
      • H2O.ai was recognized for the mindshare, partner network and status as a quasi-industry standard for machine learning and AI.
      • H2O customers gave the highest overall score among all the vendors for sales relationship and account management, customer support (onboarding, troubleshooting, etc.) and overall service and support.
  20. Platforms with H2O integration (CONFIDENTIAL). H2O + KNIME Talk at KNIME Summit, Mar 2017
  21. H2O.ai Solution Leadership Across Verticals: Financial, Insurance, Marketing, Telecom, Healthcare, Retail, Advisory & Accounting
  22. Community Expansion. Find out more: www.h2o.ai/community/
  24. H2O Products (same list as slide 17), highlighting: In-Memory, Distributed Machine Learning Algorithms with H2O Flow GUI
  25. High Level Architecture (diagram labels): HDFS, S3, NFS, Local, SQL; Load Data; Distributed In-Memory; Loss-less Compression; H2O Compute Engine; Exploratory & Descriptive Analysis; Feature Engineering & Selection; Supervised & Unsupervised Modeling; Model Evaluation & Selection; Predict; Data & Model Storage; Production Scoring Environment; Model Export: Plain Old Java Object; Data Prep Export: Plain Old Java Object; Your Imagination
  26. High Level Architecture (same diagram as slide 25), highlighting: Import Data from Multiple Sources
  27. Supported Formats & Data Sources
      • 9 Formats: CSV, XLS, XLSX, ORC*, Hive*, SVMLight, ARFF, Parquet, Avro 1.8.0*
      • 5 Sources: HDFS, S3, NFS, LOCAL, SQL
      • File type or Folder of Files
      • * 1. only if H2O is running as a Hadoop job
      • * 2. Hive files that are saved in ORC format
      • * 3. without multi-file parsing or column type modification
  28. High Level Architecture (same diagram as slide 25), highlighting: Fast, Scalable & Distributed Compute Engine Written in Java
  29. H2O Core (diagram labels): CPU; Model Building; H2O
  30. H2O Core (diagram labels): H2O, H2O, H2O
  31. H2O Core (diagram labels): CPU, CPU, CPU; Model Building; H2O Distributed In-Memory
  32. H2O Core (diagram labels): YARN; CPU, CPU, CPU; Model Building; H2O Distributed In-Memory; SQL, NFS, S3; Firewall or Cloud
  33. Foundation for Distributed Algorithms
      Distributed Algorithms: Advantageous Foundation
      • Foundation for In-Memory Distributed Algorithm Calculation: Distributed Data Frames and columnar compression
      • All algorithms are distributed in H2O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations.
      • Only enterprise-grade, open-source distributed algorithms in the market
      Distributed Algorithms: User Benefits
      • “Out-of-box” functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java
      • Designed for all sizes of data sets, especially large data
      • Highly optimized Java code for model exports
      • In-house expertise for all algorithms
      Diagram labels: Parallel Parse into Distributed Rows; Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM
  34. H2O-3 Algorithms Overview
      Supervised Learning
      • Statistical Analysis
          • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
          • Naïve Bayes
      • Ensembles
          • Distributed Random Forest: Classification or regression models
          • Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
      • Deep Neural Networks
          • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
      Unsupervised Learning
      • Clustering
          • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k
      • Dimensionality Reduction
          • Principal Component Analysis: Linearly transforms correlated variables to independent components
          • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
      • Anomaly Detection
          • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning
  35. High Level Architecture (same diagram as slide 25), highlighting: Multiple Interfaces
  36. H2O Flow (Web): First Demo
  37. Using H2O with R and Python: Second Demo
  38. High Level Architecture (same diagram as slide 25), highlighting: Export Standalone Models for Production
  39. URL: docs.h2o.ai
  40. Demo: H2O on a 320-Core Hadoop Cluster (Web Interface)
  41. https://www.kaggle.com/c/higgs-boson
  42. Learning from Higgs Boson Machine Data. Diagram labels: Sensors (Detector) Data; Historical Outcome: Is it a Higgs Particle (Yes/No); Learn the Pattern; Predicted Outcome. 11M Rows, 28 Features, Raw Data Size: 7.48 GB
  43. 11M Rows. Size (Raw): 7.48 GB. Compressed: 2.00 GB (≈ 27% of Raw)
  44. 10 nodes. 10 x 32 = 320 Cores. 10 x 29.6 = 296 GB Memory
  45. H2O Water Meter (CPU Monitor). 10 x 32 = 320 Cores
  46. Demo: AutoML. Automatic Machine Learning with H2O (R Interface)
  47. Think 2018 / 3456 / March, 2018 / © 2018 IBM Corporation
  48. AutoML
  52. Learning from Boston Housing Data. Diagram labels: Crime, No. of rooms, Age …; Historical House Price; H2O AutoML: Learn the Pattern; Predicted House Price
  60. Other H2O News: Latest Developments, Events
  61. H2O Products (same list as slide 17), highlighting: Lightning Fast machine learning on GPUs
  62. Algorithms on H2O-3 (CPU): same algorithm overview as slide 34. “Confidential and property of H2O.ai. All rights reserved”
  63. Algorithms on H2O4GPU (more to come): same algorithm overview as slide 34.
  64. https://github.com/h2oai/h2o4gpu
  66. End of First Talk. Any Questions?
  67. Making Multimillion-Dollar Decisions with H2O AutoML, LIME and Shiny: My journey to a real Moneyball application
  68. About Moneyball
      • The first rule of Moneyball: You do not ask me about the names of team and player involved.
      • The second rule of Moneyball: You do not ask me about the names of team and player involved. (… for legal reasons …)
      • The third rule of Moneyball: If you happen to guess the names right, I can neither confirm nor deny.
  69. About Moneyball: Billy Beane; Peter Brand (based on Paul DePodesta)
  70. Ari Kaplan: the Real “Moneyball” Guy
      • The real characters in the movie (Billy Beane and Paul DePodesta) did not want to work with Hollywood.
      • The filmmaker interviewed Ari instead and created the Paul DePodesta character based on Ari’s real-life story.
      • Ari happens to work at Aginity so we have a real “Moneyball” guy for this demo.
  71. A Proof-of-Concept Demo for IBM Think Conference
  72. Enterprise Solution
      The Architecture (diagram labels): DB2; Machine Learning & AI Libraries; Data Science Modeling Tools; Augmented Reality Visualization; Analytic Management and Reuse Layer; Hadoop Data Environment; High-performing Database for Analytics
      The Workflow
      1. Data loaded into the databases
      2. Connected diverse data sources to Amp
      3. Amp used to create derived attributes and publish them and data to DSX and H2O
      4. DSX and H2O to build and tweak statistical and machine learning models
      5. Visualizations tested in Immersive Insights
      6. Steps 4 and 5 repeated to get settled data
      7. Statistical and machine learning models saved in Amp
      8. Data exported to Immersive Insights for final visualizations
  73. Approach One: Learning from Lahman only. Diagram labels: Lahman: Age, Height, Weight …; Historical Performance Stats: Home Runs, Batting Average …; H2O AutoML: Learn the Pattern; Predictions; Sliding Windows (Stats from previous n years); About 300 Lahman Features
  74. Approach Two: Learning from Lahman & AriDB. Diagram labels: Lahman: Age, Height, Weight …; AriDB: Fastball, curveball, slider, velocity …; Historical Performance Stats: Home Runs, Batting Average …; H2O AutoML: Learn the Pattern; Predictions; Sliding Windows (Stats from previous n years); About 300 Lahman Features + 200 AriDB Features
  75. Timeline
      • March 19: AutoML Predictions finalized. Initial presentation in Excel.
      • March 20: Version 1 of Shiny app. Ari used the app to validate some players he had in mind and recommended one player to his team.
      • March 21: Multimillion-dollar contract finalized.
      • March 22: Moneyball presentation at IBM Think
  76. Shiny App Presentation. Green: Predictions based on Lahman only. Orange: Predictions based on AriDB + Lahman
  77. Acknowledgement
  78. Merci beaucoup!
      • Organisers & Sponsors: Alexia Audevart, Christophe Regouby, HarryCow Coworking
      • H2O’s Mission: Democratize AI; Make Machine Learning Accessible to Everyone
      • Code, Slides & Documents: bit.ly/h2o_meetups, docs.h2o.ai
      • Contact: joe@h2o.ai, @matlabulous, github.com/woobe
      • Please search/ask questions on Stack Overflow: use the tag `h2o` (not h2 zero)
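
Slide 33 mentions a fine-grained map-reduce histogram calculation for GBM. The sketch below shows the general idea only, in plain Python, and is not H2O's implementation: each data chunk (standing in for a node) builds a local histogram in the map step, and the partial histograms are summed in the reduce step. The function names, the bin settings, and the sample values are all made up for illustration, and the sketch counts rows per bin rather than the per-bin statistics a real GBM would accumulate.

```python
from collections import Counter
from functools import reduce


def local_histogram(chunk, n_bins, lo, hi):
    """Map step: bin the values held in one chunk."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for value in chunk:
        # Clamp so that value == hi falls into the last bin.
        bin_index = min(int((value - lo) / width), n_bins - 1)
        counts[bin_index] += 1
    return counts


def merge(left, right):
    """Reduce step: add two partial histograms bin by bin."""
    return left + right


def distributed_histogram(chunks, n_bins, lo, hi):
    """Build per-chunk histograms, then combine them into one."""
    partials = [local_histogram(chunk, n_bins, lo, hi) for chunk in chunks]
    total = reduce(merge, partials, Counter())
    return [total[b] for b in range(n_bins)]


# Three chunks stand in for three nodes; the values are illustrative.
chunks = [[0.1, 0.4, 0.9], [0.2, 0.6], [0.95, 1.0, 0.05]]
hist = distributed_histogram(chunks, n_bins=4, lo=0.0, hi=1.0)
print(hist)
```

Because the reduce step is a plain sum, the merged result matches a histogram computed over all the data on one machine, which is what lets the work be split across nodes.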
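
The figures quoted on slides 43 and 44 can be checked with a few lines of arithmetic. The inputs below are taken from those slides; the variable names are chosen for this example, and the last ratio (compressed data as a share of cluster memory) is an extra comparison that the slides do not state.

```python
# Figures from slides 43 and 44.
nodes = 10
cores_per_node = 32
memory_per_node_gb = 29.6
raw_gb = 7.48
compressed_gb = 2.00

total_cores = nodes * cores_per_node
total_memory_gb = nodes * memory_per_node_gb
compression_ratio = compressed_gb / raw_gb

# Extra comparison, not on the slides: share of cluster memory used
# by the compressed data set.
memory_share = compressed_gb / total_memory_gb

print(f"cores: {total_cores}")
print(f"memory: {total_memory_gb:.0f} GB")
print(f"compressed size: {compression_ratio:.0%} of raw")
print(f"share of cluster memory: {memory_share:.1%}")
```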
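
Slides 73 and 74 mention sliding windows built from the stats of the previous n years. One way to read that, sketched below under stated assumptions, is to turn each of the previous n seasons into lagged feature columns for the season being predicted. The function name, the feature naming scheme, and the sample numbers are hypothetical; they are not taken from the Lahman database or from the talk's code.

```python
def sliding_window_features(stats_by_year, target_year, n):
    """Collect stats from the n seasons before target_year as lag features."""
    features = {}
    for lag in range(1, n + 1):
        season = stats_by_year.get(target_year - lag)
        if season is None:
            continue  # no record for that season, so no lag feature
        for name, value in season.items():
            features[f"{name}_lag{lag}"] = value
    return features


# Made-up numbers for illustration only, not real player data.
player = {
    2015: {"home_runs": 20, "batting_avg": 0.270},
    2016: {"home_runs": 25, "batting_avg": 0.285},
    2017: {"home_runs": 30, "batting_avg": 0.300},
}
row = sliding_window_features(player, target_year=2018, n=2)
print(row)
```

With many stats and several lags, a table built this way grows wide quickly, which is consistent with the feature counts of a few hundred mentioned on the slides.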
