Top 10 Performance Gotchas for scaling in-memory Algorithms.

Better
Predictions!

H2O – The Open Source Math Engine !

H2O –
Open Source
in-memory
Machine Learning
for Big Data
4/23/13

Universe is sparse. Life is messy.  
Data is sparse & messy.!
- Lao Tzu

Hadoop = opportunity
Not enough Data Scientists
Analysts won’t code java

Group
By

Grep

Messy

NAs

Classiﬁca-on

Regression

Clustering

Ensembles
100’s

nanos

models

H 2O

Big Data

the
Adhoc

Explora-on

Math

Modeling

Real-‐-me

Scoring

Prediction
Engine

No New API!
Big
Data

Explora-on

Modeling

Scoring

Real-‐-me

H 2O
the
Prediction
Engine

Approximate!
results each step!

Intellectual

Legacy

Math
needs

to
be
free

Open
Source

Support and Innovation

hFps://github.com/0xdata/h2o

H 2O
the
Prediction
Engine

All Top 10ʼs are binary!
- Anonymous

10

Move Code not Data

Data chunks > code chunks
TCP for Data. UDP for Control.
>> Generated Java Assist

A Chunk, Unit of Parallel Access
A Frame: Vec[]
age

sex

zip

ID

car

JVM
1
Heap
JVM
2
Heap
JVM
3
Heap
JVM
4
Heap

Vecs aligned
in heaps
l Optimized for
concurrent access
l Random access
any row, any JVM
l

9

Chunk-ing Express!

season for Variable-sized chunks
and a season Uniform chunks.
Tightly-packed!
(chunk is also unit of batch!)

8

Reduce early. Reduce Often!

No Expensive intermediate states.
Fine-grain parallelism wins!
>> Fork / Join

8

Reduce early. Reduce Often!

Vec
Vec
Vec
Vec
Vec

All CPUs grab
Chunks in parallel
Map/Reduce & F/J
handles all sync

JVM
1
Hea
p
JVM
2
Hea
p
JVM
3
Hea
p
JVM
4
Hea
p

7

Slow is not different from Dead

Debugging slow
>> Heartbeats, Messages
Two General’s Paradox

6

Memory Manager

in-memory system as good as
your memory manager!
lazy eviction.
compress.
align.
Corollary: Track down Leaks!

5

Memory Overheads

Use primitives
// A Distributed Vector
//
much more than 2billion elements
class Vec {
long length(); // more than an int's worth
// fast random access
double at(long idx); // Get the idx'th elem
boolean isNA(long idx);

}

void set(long idx, double d); // writable
void append(double d); // variable sized

4

Cache-‐Oblivious

Tree size
Bin size

Recursively divide
Till Data à Cache

3

EC2 – Nothing is bounded

User-mode reliability
S3 Readers will TCP Reset
Mux your connections
Not all toolkits are equal.
>> JetS3

2 No Locks, No Cry

Non-Blocking Data Structures.

// VOLATILE READ before key compare.
// CAS
private final boolean CAS_kvs( final Object[]
oldkvs, final Object[] newkvs ) {
return _unsafe.compareAndSwapObject(this,
_kvs_offset, oldkvs, newkvs );
}

1 endian wars ended!
Keep-It-Simple-Serialization.

byte[ ]. roll-your-own. fast.
public AutoBuffer putA1
( byte[] ary, int sofar, int length )
{
while( sofar < length ) {
int len = Math.min(length - sofar, _bb.remaining());
_bb.put(ary, sofar, len);
sofar += len;
if( sofar < length ) sendPartial();
}
return this;

}

Data Movement is a Defect.
Slowing down helps communication.
Got Speed?

0

Math always produces a number

Accuracy rules over speed.
Predictive Performance

1

Shuffle

Data presentation bias.
Sorted data => interesting results

2

Random acts of Kindness?

3

Convex Problems: ADMM

4  Amdahl strikes:
Cholesky / QR Decomposition

Matrix operations
jama, jblas.. all single node.
Distributed version
needs data transfer!

5

Random
Forests

embarrassingly parallel
binning
tree-building
splits

6

Boos-ng

iterate & stage
weak-learners =>
strong learners
each tree can be parallel
minimize communication

7

Neural
Nets
&
Clustering

embarrassingly parallel
pre-calculate base stats
distance calculation
weight matrices – small footprint

8

Ensembles

Daisy chain a bunch of models
Interleave.
JIT – Minimize loops over data.

9

Tools

Deterministic versions first!
Got Pen & Paper?
Optimize often.
Test Big Data soon.

Replace NAs to improves 
predictive performance by about 10pc. 
 
 
!
- Newton

Munging Missing Features 
impute NAs with mean 
impute NAs with knn 
impute with recursive pca!
- Boyd

Unbalanced data 
single rare classes 
Fraud / No-Fraud!
Stratify

Unbalanced data 
multiple rare classes 
Browse, Click, Purchase!
Stratify

10

Data

is the System

Use Customer Data
Algorithms for Sparse vs. Dense
Unbalanced Data.
Robustness under noise

Before H2O

Velocity:
Events

Online
Scoring

Volume:
HDFS

Rule
Engine

Munging
slice n dice
Features

HIVE/SQL

Applications

Explora-on

Data Scientist

Modeling

Oﬄine
Scoring

Engineer

Business Analyst

Ensemble models
Low latency

Classification
Regression
Clustering
Optimal Model
Predictions

Big
Data

Explora-on

Modeling

Scoring

Real-‐-me

Big Data beats Better Algorithms!

Big
Data

Explora-on

Modeling

Scoring

Real-‐-me

Big Data and Better Algorithms!
Scale & Parallelism!

Distributed Coding Taxonomy

l 

No Distribution Coding:
l 
l 

l 

Whole Algorithms, Whole Vector-Math!
REST + JSON: e.g. load data, GLM, get results!

Simple Data-Parallel Coding:
l 
l 

l 

Per-Row (or neighbor row) Math!
Map/Reduce-style: e.g. Any dense linear algebra!

Complex Data-Parallel Coding
l 

K/V Store, Graph Algo's, e.g. PageRank!

0xdata.c45

Distributed Coding Taxonomy

l 

No Distribution Coding:
l 

l 

Whole Algorithms, Whole Vector-Math!

l 

REST + JSON: e.g. load data, GLM, get results!

Simple Data-Parallel Coding:
l 

Per-Row (or neighbor row) Math!

l 

l 

Read
the
docs!

This
talk!

Map/Reduce-style: e.g. Any dense linear algebra!

Complex Data-Parallel Coding
l 

K/V Store, Graph Algo's, e.g. PageRank!

Join
our
GIT!

46

Distributed Data Taxonomy

Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame

0xdata.c47

Usecases

Conversion, Retention & Churn!
•  Lead Conversion!
•  Engagement!
•  Product Placement!
•  Recommendations!
Pricing Engine!
Fraud Detection!

Top 10 Performance Gotchas for scaling in-memory Algorithms.

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Top 10 Performance Gotchas for scaling in-memory Algorithms.

Similar to Top 10 Performance Gotchas for scaling in-memory Algorithms. (20)

More from srisatish ambati

More from srisatish ambati (10)

Recently uploaded

Recently uploaded (20)

Top 10 Performance Gotchas for scaling in-memory Algorithms.