Python and H2O with Cliff Click at PyData Dallas 2015

Python with H2O, Cliff Click's talk from PyData Dallas 2015.


  1. H2O.ai Machine Intelligence: Fast, Scalable In-Memory Machine and Deep Learning for Smarter Applications. Python with H2O. Cliff Click.
  2. Who Am I? Cliff Click, CTO and Co-Founder, H2O.ai (cliff@h2o.ai). 40 yrs coding; 35 yrs building compilers; 30 yrs distributed computation; 20 yrs OS, device drivers, HPC, HotSpot; 10 yrs low-latency GC, custom Java hardware, NonBlockingHashMap. 20 patents, dozens of papers, 100s of public talks. PhD Computer Science, 1995, Rice University. HotSpot JVM Server Compiler: “showed the world JITing is possible”.
  3. H2O: Open Source In-Memory Machine Learning for Big Data. Distributed In-Memory Math Platform: GLM, GBM, RF, K-Means, PCA, Deep Learning. Easy-to-use SDK & API: Java, R/CRAN, Scala, Spark, Python, JSON, Browser GUI. Use ALL your data: modeling without sampling; HDFS, S3, NFS, NoSQL. Big Data & Better Algorithms: Better Predictions!
  4. TBD. Customer Support TBD Head of Sales Distributed Systems Engineers Making ML Scale!
  5. Practical Machine Learning (Value: Requirements). Fast & Interactive: In-Memory. Big Data (No Sampling): Distributed. Ownership: Open Source. Extensibility: API/SDK. Portability: Java, REST/JSON. Infrastructure: Cloud or On-Premise, Hadoop or Private Cluster.
  6. H2O Architecture: Prediction Engine; R & Exec Engine; Web Interface; Spark Scala REPL; Nano-Fast Scoring Engine; Distributed In-Memory K/V Store; Column Compress Data; Map/Reduce; Memory Manager; Algorithms! (GBM, Random Forest, GLM, PCA, K-Means, Deep Learning); HDFS, S3, NFS.
  7. H2O Architecture (repeats the content of slide 6).
  8. Demo! Python Demo
      ● CitiBike of NYC
      ● Predict bikes-per-hour-per-station
        – From per-trip logs
      ● 10M rows of data
      ● Group-By, date/time feature-munging
  9. H2O: A Platform for Big Math
      ● Most Any Java on Big 2-D Tables
        – Write like it's single-thread POJO code
        – Runs distributed & parallel by default
      ● Fast: billion row logistic regression takes 4 sec
      ● World's first parallel & distributed GBM
        – Plus GBM, Deep Learn / Neural Nets, RF, PCA, GLM...
      ● R integration: use terabyte datasets from R
      ● Sparkling Water: Direct Spark integration
  10. H2O: A Platform for Big Math
      ● Easy launch: “java -jar h2o.jar”
        – No GC tuning: -Xmx as big as you like
      ● Production ready:
        – Private on-premise cluster OR
        – In the Cloud
        – Hadoop, Yarn, EC2, or standalone cluster
        – HDFS, S3, NFS, URI & other datasources
        – Open Source, Apache v2
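
The demo on slide 8 (group per-trip logs into bikes-per-hour-per-station, then derive date/time features) and the map/reduce style from slide 9 can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the H2O API, the trip rows and station names are made up, and the helper names (`chunked`, `map_chunk`, `reduce_counts`, `bikes_per_hour_per_station`, `add_time_features`) are hypothetical. Each chunk is counted independently and the partial counts are then merged, which is the shape of computation a distributed engine can run in parallel.

```python
from collections import Counter
from datetime import datetime
from functools import reduce

# Hypothetical per-trip log rows: (start_time, start_station).
# The real CitiBike schema is not shown in the slides.
TRIPS = [
    ("2013-07-01 08:05:00", "Station A"),
    ("2013-07-01 08:40:00", "Station A"),
    ("2013-07-01 09:10:00", "Station A"),
    ("2013-07-01 08:15:00", "Station B"),
    ("2013-07-02 08:30:00", "Station B"),
]


def chunked(rows, size):
    """Split the rows into fixed-size chunks, standing in for data partitions."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


def map_chunk(chunk):
    """Map step: count trips per (station, day, hour) within one chunk."""
    counts = Counter()
    for start_time, station in chunk:
        ts = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        counts[(station, ts.date().isoformat(), ts.hour)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results by adding their counts."""
    return left + right


def bikes_per_hour_per_station(rows, chunk_size=2):
    """Group-by expressed as map over chunks followed by a pairwise reduce."""
    partials = (map_chunk(chunk) for chunk in chunked(rows, chunk_size))
    return reduce(reduce_counts, partials, Counter())


def add_time_features(counts):
    """Date/time feature munging: attach weekday and weekend flag per group."""
    rows = []
    for (station, day, hour), bikes in sorted(counts.items()):
        weekday = datetime.strptime(day, "%Y-%m-%d").weekday()  # Monday == 0
        rows.append({
            "station": station,
            "day": day,
            "hour": hour,
            "weekday": weekday,
            "is_weekend": weekday >= 5,
            "bikes": bikes,
        })
    return rows


if __name__ == "__main__":
    for row in add_time_features(bikes_per_hour_per_station(TRIPS)):
        print(row)
```

Because the reduce step is plain addition of counts, the result does not depend on how the rows are split into chunks; that property is what lets the map step run on separate partitions independently.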
