Beauty and Big Data
[Made possible by H2O and Tableau]
Amy Wang
Published in: Software
  1. Beauty and Big Data [Made possible by H2O and Tableau]. Amy Wang
  2. “A data scientist knows more statistics than a computer scientist and more computer science than a statistician.”
  3. What is H2O?
     Math Platform: Open source in-memory prediction engine
     • Parallelized and distributed algorithms that make the most of multithreaded systems
     • GLM, Random Forest, GBM, PCA, etc.
     API: Easy to use and adopt
     • Written in Java – perfect for Java programmers
     • REST API (JSON) – drives H2O from R, Python, Excel
     Big Data: More data? Or better models? Both?
     • Use all of your data – model without down sampling
     • Run a simple GLM or a more complex GBM to find the best fit for the data
     • More Data + Better Models = Better Predictions
  4. [Architecture diagram] H2O
     • SQL, HDFS, NoSQL, S3
     • R, JSON, Scala, Java, Python
     • Intelligent Enterprise Applications
     • Prediction Engine, Memory Manager, Nano Fast Scoring Engine, Columnar Compression, Query Processor, R-engine, In-Mem Map Reduce
     • ensembles, Solvers, Deep learning, Cluster, Classify, Regression, Trees, Forest, Boosting, Gradients, Processes
     • 2M Row ingest / sec, 50M Row Regression / sec, 750M Row Aggregates / sec
     • On Premise, On / Off Hadoop, On EC2
  5. Installation Process. Start playing with H2O with R yourself!
     Grab H2O and our R package:
     • Download from website: 0xdata.com/downloads
     • Build from git: https://github.com/0xdata/h2o
     Get support at:
     • http://docs.0xdata.com/
  6. Demo: Big Data Workflow using R with H2O
  7. OSEMN
     • iNterpret [in Tableau]
     • Model [in H2O] and Explore the different models
     • Explore [in R or Tableau]
     • Obtain and Scrub
  8. H2O Data
  9. Demo: Big Data Modeling. Visualization in Tableau through R with H2O
  10. A little about us
  11. Advisors; Systems, Data, File Systems and Hadoop; Scientific Advisory Council; Investors
      • Doug Lea: ACM Fellow; malloc for C, fork-join, Java memory model; SUNY Oswego
      • Chris Pouliot: VP of Data Science, Lyft; formerly Netflix, Google
      • Dhruba Borthakur: HDFS, Hive, Facebook
      • Stephen Boyd: Professor of Electrical Engineering, Stanford
      • Rob Tibshirani: Professor of Health Research and Policy, and Statistics, Stanford
      • Trevor Hastie: Professor of Statistics, Stanford
      • Jishnu Bhattacharjee: Nexus Venture Partners
      • Anand Babu Periasamy: Founder, Gluster (RedHat)
      • Anand Rajaraman: Founder, Junglee (Amazon), Kosmix (WalmartLabs)
      • Dipchand “Deep” Nishar: SVP of Products & UX (LinkedIn)
      We’ve Got the Who’s Who of Predictive Analytics
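
The OSEMN workflow named on slide 7 (Obtain, Scrub, Explore, Model, iNterpret) can be sketched as a chain of small functions. This is only an illustration in plain Python with the standard library, not the demo from the talk: the toy CSV data, the function names, and the ordinary least squares fit used as a stand-in for the H2O modeling step are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical toy data (not from the slides); one row has a missing value.
RAW_CSV = """x,y
1,2.1
2,3.9
3,
4,8.2
5,9.8
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing fields and convert the rest to floats."""
    return [(float(r["x"]), float(r["y"])) for r in rows if r["x"] and r["y"]]


def explore(data):
    """Explore: compute a few summary statistics before modeling."""
    xs, ys = zip(*data)
    return {
        "n": len(data),
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
    }


def model(data):
    """Model: fit y = intercept + slope * x by ordinary least squares.

    A stand-in for the modeling step; the slides do this in H2O.
    """
    xs, ys = zip(*data)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interpret(fit):
    """iNterpret: turn the fitted coefficients into a readable statement."""
    intercept, slope = fit
    return f"y changes by about {slope:.2f} per unit of x (intercept {intercept:.2f})"


if __name__ == "__main__":
    data = scrub(obtain(RAW_CSV))
    print(explore(data))
    print(interpret(model(data)))
```

Keeping each stage as its own function mirrors the split on the slide, where different stages run in different tools (R, H2O, Tableau), so any one stage could be swapped out without touching the others.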
