Big Data and R –
H2O Saves the Day
December 12, 2013, Anqi Fu


Anqi Fu presents H2O and R; an intro to getting the most out of big data with R and H2O


Anqi Fu exposes the code behind integrating R with H2O, and demos how users can manipulate, slice, dice, and examine data to ask different questions using ALL of the same big data.

Published in: Technology
Notes
  • Pull up R and demo this in the console, making sure everyone can follow along
  • H2OParsedData: each data set or calculation is associated with a unique hex key, so the object acts like a “pointer”. Model: coefficients, deviance, aic, df.residual, etc.
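
The “pointer” idea in the note above can be sketched with plain S4 classes. This is only an illustration under stated assumptions: the class names `SketchClient`, `SketchParsedData` and `SketchGLMModel` are hypothetical stand-ins that copy the slot layout listed in the talk, they are not the H2O R package's own definitions, and the IP, port, model key and model values are made-up placeholders.

```r
library(methods)

# Hypothetical stand-ins mirroring the slot layout from the talk.
# These are NOT the classes shipped in the H2O R package.
setClass("SketchClient", representation(ip = "character", port = "numeric"))
setClass("SketchParsedData", representation(h2o = "SketchClient", key = "character"))
setClass("SketchGLMModel",
         representation(key = "character", data = "SketchParsedData", model = "list"))

# Placeholder connection details (assumed values, for illustration only).
client <- new("SketchClient", ip = "127.0.0.1", port = 54321)

# The handle stores only the key and the connection, not the data itself.
prostate <- new("SketchParsedData", h2o = client, key = "prostate.hex")

# Placeholder model contents; the numbers are invented, not real results.
my_model <- new("SketchGLMModel",
                key = "glm_sketch_key",
                data = prostate,
                model = list(coefficients = c(Intercept = 0.5, AGE = -0.01),
                             deviance = 100, aic = 110))

# Same access pattern as the talk's example: myModel@model$coefficients
my_model@model$coefficients
my_model@data@key
```

Because the R-side object carries only a key, copying or passing it around in R is cheap no matter how large the data set it refers to is.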

    1. Big Data and R – H2O Saves the Day. December 12, 2013, Anqi Fu. 4/23/13
    2. About Me
       Anqi Fu: Anqi@0xdata.com
       • Math Hacker at 0xdata
       • R+H2O rockstar
       • Economics and Statistics Background
    3. Installation
       Everything in the demo is something YOU CAN DO. Grab H2O and our R package and try it yourself:
       • on our website (www.0xdata.com/downloadtable)
       • on our git (https://github.com/organizations/0xdata)
       Get support at:
       • http://s3.amazonaws.com/h2o-release/h2o/master/1144/docswebsite/index.html
    4. “The non-scientist in the street probably has a clearer notion of physics, chemistry and biology than of statistics, regarding statisticians as numerical philatelists, mere collectors of numbers.” Stephen Senn - Anonymous
    5. DO Try This At Home!!! (here are the basic steps)
    6. Basic Steps
       1. Get R, H2O, and the R package (both H2O and the R package are in any of our download files; you can find them on our web page).
       2. Install the H2O R package in R (along with any dependencies you may not have).
       3. Tell R to talk to H2O with a single command: > h2o.init()
       4. Let the connection automatically install the algorithms used by H2O+R.
       5. Analyze Big Data!
    7. Overview of Objects
       • H2OClient: ip=character, port=numeric
       • H2OParsedData: h2o=H2OClient, key=character
         H2O: key=“prostate.hex”, key=“airlines.hex”
       • H2OGLMModel: key=character, data=H2OParsedData, model=list(coefficients, deviance, aic, etc.)
       Example: myModel@model$coefficients
    8. Overview of Methods
       Standard R                    H2O
       read.csv, read.table, etc.    h2o.importFile, h2o.importURL
       summary                       summary (limited to data only)
       glm, glmnet                   h2o.glm(y, x, data, family, nfolds, alpha, lambda)
       kmeans                        h2o.kmeans(data, centers, cols, iter.max)
       randomForest, cforest         h2o.randomForest(y, x_ignore, data, ntree, depth, classwt)
    9. Demo: Big Data Manipulation and Modeling in R with H2O
    10. Thanks
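
For comparison, the “Standard R” column of the methods overview can be run in base R on a small in-memory data set. The sketch below is not the demo from the talk: the `mtcars` data set and the chosen columns are assumptions made purely for illustration. The H2O counterparts appear only in comments, with argument names copied from slide 8, because their exact signatures may differ between H2O releases.

```r
# Built-in data set, used purely for illustration (not from the talk).
data(mtcars)

# In an H2O session these steps would follow h2o.init() (step 3 of the
# basic steps) and a data import such as h2o.importFile.

# summary: the slide lists summary (limited to data only) on the H2O side.
summary(mtcars[, c("hp", "wt", "am")])

# glm: logistic regression of transmission type on horsepower and weight.
# Slide 8 lists h2o.glm(y, x, data, family, nfolds, alpha, lambda).
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
coef(fit)

# kmeans: three clusters on two numeric columns.
# Slide 8 lists h2o.kmeans(data, centers, cols, iter.max).
set.seed(1)
clusters <- kmeans(mtcars[, c("mpg", "wt")], centers = 3, iter.max = 10)
table(clusters$cluster)
```

The base R calls hold all the data in the R process, whereas the H2O functions named in the comments operate on data referenced by key, as described in the objects overview.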
