H2O.ai
Machine Intelligence
Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications
Python with H2O
Cliff Click
Who Am I?
Cliff Click
CTO, Co-Founder H2O.ai
cliff@h2o.ai
40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs low-latency GC, custom Java hardware, NonBlockingHashMap
20 patents, dozens of papers
100s of public talks
PhD Computer Science
1995 Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”
H2O Open Source In-Memory
Machine Learning for Big Data
Distributed In-Memory Math Platform
GLM, GBM, RF, K-Means, PCA, Deep Learning
Easy to use SDK & API
Java, R/CRAN, Scala, Spark, Python, JSON, Browser GUI
Use ALL your data
Modeling without sampling
HDFS, S3, NFS, NoSQL
Big Data & Better Algorithms
Better Predictions!
Customer Support: TBD
Head of Sales: TBD
Distributed Systems Engineers: Making ML Scale!
Practical Machine Learning
Value                     Requirements
Fast & Interactive        In-Memory
Big Data (No Sampling)    Distributed
Ownership                 Open Source
Extensibility             API/SDK
Portability               Java, REST/JSON
Infrastructure            Cloud or On-Premise Hadoop or Private Cluster
H2O Architecture
Prediction Engine
R & Exec Engine
Web Interface
Spark Scala REPL
Nano-Fast Scoring Engine
Distributed In-Memory K/V Store
Column-Compressed Data
Map/Reduce
Memory Manager
Algorithms: GBM, Random Forest, GLM, PCA, K-Means, Deep Learning
HDFS, S3, NFS
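
The slide names column-compressed data in memory but does not say how the compression works. As a rough illustration only, and not H2O's actual encoding, the sketch below stores an integer column as one byte per value plus a shared bias whenever the value range allows it, and falls back to 8-byte integers otherwise. The names `compress_column` and `read_value` are hypothetical.

```python
from array import array

def compress_column(values):
    """Pack an integer column into one byte per value when its range fits.

    Assumption for illustration: this bias-plus-byte scheme is one simple
    columnar encoding, not the encoding H2O itself uses.
    """
    lo, hi = min(values), max(values)
    if hi - lo <= 255:
        # Every value is stored as its offset from the column minimum.
        return {"bias": lo, "data": bytes(v - lo for v in values)}
    # Fallback: plain 8-byte signed integers, no bias.
    return {"bias": 0, "data": array("q", values)}

def read_value(col, i):
    """Decode row i of a compressed column."""
    return col["bias"] + col["data"][i]

narrow = compress_column([1000, 1003, 1255, 1001])  # range fits in a byte
wide = compress_column([0, 100000])                 # range too large
```

Reading a value costs one addition, so the column can be scanned in its compressed form without first expanding it.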
Demo!
Python Demo
● CitiBike of NYC
● Predict bikes-per-hour-per-station
  – From per-trip logs
● 10M rows of data
● Group-By, date/time feature-munging
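
The demo itself is not reproduced in these slides. As a minimal sketch of the group-by and date/time munging step it describes, the plain-Python example below counts trips per station per hour from per-trip rows and derives a few time features. It uses only the standard library rather than the H2O Python API, and the trip rows, station names, and feature names are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical per-trip log rows (start time, start station); not real CitiBike data.
trips = [
    ("2013-07-01 08:05:00", "Station A"),
    ("2013-07-01 08:40:00", "Station A"),
    ("2013-07-01 09:10:00", "Station A"),
    ("2013-07-01 08:15:00", "Station B"),
    ("2013-07-06 08:30:00", "Station B"),
]

def hour_key(start_time, station):
    """Truncate a trip's start time to the hour and pair it with the station."""
    t = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    return (station, t.date(), t.hour)

# Group-by: number of trips per (station, day, hour).
bikes_per_hour = Counter(hour_key(ts, st) for ts, st in trips)

def features(station, day, hour):
    """Date/time features a model could use to predict the hourly count."""
    return {
        "station": station,
        "hour": hour,
        "day_of_week": day.weekday(),       # Monday = 0
        "is_weekend": day.weekday() >= 5,   # Saturday or Sunday
    }

# One training row per (station, day, hour) group, with the count as the target.
rows = [dict(features(*key), bikes=count)
        for key, count in sorted(bikes_per_hour.items())]
```

Each entry in `rows` is one station-hour with its trip count as the value to predict, the same shape of table the demo builds from the raw logs.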
H2O: A Platform for Big Math
● Most Any Java on Big 2-D Tables
  – Write it like single-threaded POJO code
  – Runs distributed & parallel by default
● Fast: billion-row logistic regression takes 4 sec
● World's first parallel & distributed GBM
  – Plus Deep Learning / Neural Nets, RF, PCA, GLM...
● R integration: use terabyte datasets from R
● Sparkling Water: Direct Spark integration
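
The idea of writing single-threaded code that runs in parallel can be sketched with a small map/reduce over chunks of one column. This is an illustration of the pattern in Python, not H2O's Java API: the per-chunk function is ordinary sequential code, and a thread pool stands in for the cluster. The function names and the chunk size are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split_into_chunks(column, chunk_size):
    """Cut a column into contiguous chunks, as a distributed store might."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]

def map_chunk(chunk):
    """Plain single-threaded logic for one chunk: partial (sum, count)."""
    return (sum(chunk), len(chunk))

def reduce_pair(a, b):
    """Combine two partial results; order of combination does not matter."""
    return (a[0] + b[0], a[1] + b[1])

def parallel_mean(column, chunk_size=4, workers=2):
    """Mean of a column computed as map over chunks, then reduce."""
    if not column:
        raise ValueError("column is empty")
    chunks = split_into_chunks(column, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials, (0.0, 0))
    return total / count

mean = parallel_mean(list(range(1, 11)))
```

Because `reduce_pair` is associative and commutative, the partial results can be combined in any order, which is what lets the same code run on one thread or on many machines.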
H2O: A Platform for Big Math
● Easy launch: “java -jar h2o.jar”
  – No GC tuning: -Xmx as big as you like
● Production ready:
  – Private on-premise cluster OR
  – In the Cloud
  – Hadoop, YARN, EC2, or standalone cluster
  – HDFS, S3, NFS, URI & other data sources
  – Open Source, Apache v2
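
For scripting the launch from Python, a small helper can assemble the command quoted on the slide. This sketch only builds the argument list from the two pieces the slide names, `java -jar h2o.jar` and the `-Xmx` heap flag. The helper name `h2o_launch_command` and the example heap size are hypothetical, and nothing is started here.

```python
def h2o_launch_command(jar_path="h2o.jar", heap=None):
    """Build the argument list for launching H2O as a standalone JVM.

    heap is a JVM heap size string such as "8g"; it is passed through as
    -Xmx<heap>. With heap=None the JVM default is used.
    """
    cmd = ["java"]
    if heap:
        cmd.append("-Xmx" + heap)
    cmd += ["-jar", jar_path]
    return cmd

default_cmd = h2o_launch_command()
big_heap_cmd = h2o_launch_command(heap="8g")
```

Passing the resulting list to `subprocess.Popen` would start the process, assuming `java` is on the path and the jar exists at the given location.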

Python and H2O with Cliff Click at PyData Dallas 2015
