Meetup8 29 2013

4/23/13
Hack Data With Math
Meetup 8/29/2013

Introduction
What H2O is:
● Machine learning platform
● Distributed
● In-memory
● Open Source
What H2O can do:
● Scales your analysis:
○ Handles large datasets: billions of rows, 100s of GBs
○ Performs computations very quickly (near Fortran speeds)

Agenda
Use H2O with these data:
● MNIST Data
● Kaggle Allstate Data

MNIST DATA: Recognizing Handwritten Digits

MNIST Data
➢ Each observation has 784 features, one for each pixel in the 28 × 28 image
➢ Each feature value ranges from 0 to 255, where 0 is blank and 255 is totally black
➢ There are ~60K observations total
Column:        0   |    1   ...   783   784
               5   |    0   ...   231   255
               .   |    .   ...     .     .
               .   |    .   ...     .     .
               1   |  120   ...     4     0
        Class Labels | Pixel Values

Random Forest
Random Forest For Classification
➢ Build a committee of decorrelated decision trees and call it a forest
➢ Give data to the committee for prediction
➢ Classify each row of data by majority vote
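
The voting step above can be sketched in a few lines of Python. This is only an illustration, not H2O's implementation: the three "trees" are hypothetical stand-ins that each look at a single pixel, and the names `majority_vote` and `forest_predict` are made up for this example.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label chosen by the most trees.

    Ties go to the label that was seen first.
    """
    return Counter(votes).most_common(1)[0][0]

def forest_predict(trees, row):
    """Ask every tree for a prediction on one row, then take the majority."""
    return majority_vote([tree(row) for tree in trees])

# Hypothetical stand-ins for trained trees: each one thresholds a single pixel.
trees = [
    lambda row: 1 if row[0] > 128 else 0,
    lambda row: 1 if row[1] > 128 else 0,
    lambda row: 1 if row[2] > 128 else 0,
]

# Two of the three trees vote for class 1 on this row.
prediction = forest_predict(trees, [200, 30, 255])
print(prediction)
```

In a real forest each tree would be trained on a different random sample of rows and features, which is what keeps the committee decorrelated.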

Random Forest
Pros
➢ Decision trees model complex interactions
➢ Committee of trees reduces classification error
➢ Easy to train and tune on small data

MNIST Data Class Label Counts
Inspect data by piping together command line tools into lengthy statements...
Or use H2O! Demo time!
Direct your browser to 192.168.1.161:xxxxx
xxxxx is your provided port number
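
For readers following along without the demo, the class label counts can also be computed with a short script. The sketch below assumes the data is a CSV whose first column is the class label, as in the table layout shown earlier. The three-row sample is made up, and `count_labels` is a name chosen for this example.

```python
import csv
import io
from collections import Counter

def count_labels(csv_file):
    """Count how often each class label (first column) appears."""
    counts = Counter()
    for row in csv.reader(csv_file):
        if row:  # skip blank lines
            counts[row[0]] += 1
    return counts

# Made-up sample: a label followed by only four pixel columns.
sample = io.StringIO("5,0,12,231,255\n1,120,0,4,0\n5,3,0,0,17\n")
label_counts = count_labels(sample)

for label, count in sorted(label_counts.items()):
    print(label, count)
```

To run it on a real file, pass an open file handle in place of the `io.StringIO` sample.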

Bodily Injury Claims: Allstate Data

Generalized Linear Modeling (GLM)
Supervised Learning For Prediction:
> Train on data with known labels
> Validate on out-of-sample data with known labels
> Test on new data
● Requires enough training data for the model to adequately capture complex interactions between variables
● Use shrinkage methods to improve predictive power
● A model can be judged mostly by its “ability to predict”
○ Model is no good when predictive power falls below some threshold
Examples:
○ Regression: Linear, Logistic, Poisson, Tweedie
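
As a rough illustration of the train/validate workflow and of shrinkage, the sketch below fits a one-variable linear model with no intercept under an L2 (ridge) penalty and compares several penalty strengths on held-out data. The dataset is made up, ridge is just one shrinkage method chosen for simplicity, and none of this reflects how H2O fits a GLM.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a no-intercept linear fit with an L2 (ridge) penalty lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, beta):
    """Mean squared prediction error of the model y = beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data, roughly y = 2x with noise.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
valid_x = [5.0, 6.0]
valid_y = [10.1, 11.8]

# Train once per penalty strength, then score each fit on the validation set.
lambdas = [0.0, 0.1, 1.0, 10.0]
slopes = {lam: ridge_slope(train_x, train_y, lam) for lam in lambdas}
valid_err = {lam: mse(valid_x, valid_y, slopes[lam]) for lam in lambdas}

# Keep the penalty with the lowest validation error.
best_lam = min(valid_err, key=valid_err.get)
print(slopes, best_lam)
```

A larger penalty pulls the slope toward zero. The validation set is what decides how much shrinkage helps prediction, and a separate test set would then give the final estimate of predictive power.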