GBM:
Distributed Tree Algorithms
on H2O
Cliff Click, CTO 0xdata
cliffc@0xdata.com
http://0xdata.com
http://cliffc.org/blog
0xdata.com 2
H2O is...
● Pure Java, Open Source: 0xdata.com
● https://github.com/0xdata/h2o/
● A Platform for doing Math
● Parallel Distributed Math
● In-memory analytics: GLM, GBM, RF, Logistic Reg
● Accessible via REST & JSON
● A K/V Store: ~150ns per get or put
● Distributed Fork/Join + Map/Reduce + K/V
0xdata.com 3
Agenda
● Building Blocks For Big Data:
● Vecs & Frames & Chunks
● Distributed Tree Algorithms
● Access Patterns & Execution
● GBM on H2O
● Performance
0xdata.com 4
A Collection of Distributed Vectors
// A Distributed Vector
// can hold far more than 2 billion elements
class Vec {
long length(); // more than an int's worth
// fast random access
double at(long idx); // Get the idx'th elem
boolean isNA(long idx);
void set(long idx, double d); // writable
void append(double d); // variable sized
}
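A minimal single-JVM sketch of the idea behind a long-indexed, appendable Vec, assuming fixed-size chunks held as plain double[] arrays. ChunkedVec and CHUNK_SIZE are hypothetical names for illustration, not H2O classes; the real Vec is distributed across JVMs and stored compressed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Vec: long-indexed, writable, appendable,
// stored as fixed-size chunks. NaN marks a missing value (NA).
final class ChunkedVec {
  static final int CHUNK_SIZE = 4;              // tiny on purpose, for the demo
  private final List<double[]> chunks = new ArrayList<>();
  private long length;

  long length() { return length; }

  double at(long idx) {
    check(idx);
    return chunks.get((int) (idx / CHUNK_SIZE))[(int) (idx % CHUNK_SIZE)];
  }

  boolean isNA(long idx) { return Double.isNaN(at(idx)); }

  void set(long idx, double d) {
    check(idx);
    chunks.get((int) (idx / CHUNK_SIZE))[(int) (idx % CHUNK_SIZE)] = d;
  }

  void append(double d) {
    // Start a new chunk whenever the last one is full.
    if (length % CHUNK_SIZE == 0) chunks.add(new double[CHUNK_SIZE]);
    length++;
    set(length - 1, d);
  }

  private void check(long idx) {
    if (idx < 0 || idx >= length) throw new IndexOutOfBoundsException("idx=" + idx);
  }

  public static void main(String[] args) {
    ChunkedVec v = new ChunkedVec();
    for (int i = 0; i < 10; i++) v.append(i * 1.5);
    v.set(3, Double.NaN);
    System.out.println(v.length() + " elements, at(9)=" + v.at(9) + ", isNA(3)=" + v.isNA(3));
  }
}
```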
0xdata.com 5
Frames
A Frame: Vec[]
Columns: age, sex, zip, ID, car
[Diagram: the Vecs of a Frame laid out across the heaps of JVM 1 through JVM 4]
● Vecs aligned in heaps
● Optimized for concurrent access
● Random access: any row, any JVM
● But faster if local... more on that later
0xdata.com 6
Distributed Data Taxonomy
A Chunk, Unit of Parallel Access
[Diagram: five Vecs split into Chunks across the heaps of JVM 1 through JVM 4]
● Typically 1e3 to 1e6 elements
● Stored compressed
● In byte arrays
● Get/put is a few clock cycles, including compression
0xdata.com 7
Distributed Parallel Execution
[Diagram: five Vecs split into Chunks across the heaps of JVM 1 through JVM 4]
● All CPUs grab Chunks in parallel
● F/J load balances
● Code moves to Data
● Map/Reduce & F/J handle all sync
● H2O handles all comm and data management
0xdata.com 8
Distributed Data Taxonomy
Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame
0xdata.com 9
Distributed Coding Taxonomy
● No Distribution Coding:
● Whole Algorithms, Whole Vector-Math
● REST + JSON: e.g. load data, GLM, get results
● Simple Data-Parallel Coding:
● Per-Row (or neighbor row) Math
● Map/Reduce-style: e.g. Any dense linear algebra
● Complex Data-Parallel Coding
● K/V Store, Graph Algo's, e.g. PageRank
0xdata.com 10
Distributed Coding Taxonomy
● No Distribution Coding: (Read the docs!)
● Whole Algorithms, Whole Vector-Math
● REST + JSON: e.g. load data, GLM, get results
● Simple Data-Parallel Coding: (This talk!)
● Per-Row (or neighbor row) Math
● Map/Reduce-style: e.g. Any dense linear algebra
● Complex Data-Parallel Coding: (Join our GIT!)
● K/V Store, Graph Algo's, e.g. PageRank
0xdata.com 11
Simple Data-Parallel Coding
● Map/Reduce Per-Row: Stateless
● Example from Linear Regression, Σ y²
● Auto-parallel, auto-distributed
● Near Fortran speed, Java Ease
double sumY2 = new MRTask() {
double map( double d ) { return d*d; }
double reduce( double d1, double d2 ) {
return d1+d2;
}
}.doAll( vecY );
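For readers without H2O at hand, the same stateless map/reduce can be sketched in plain Java. This is a single-JVM analogue that assumes chunks are plain double[] arrays and uses Java parallel streams in place of MRTask; SumY2 and sumOfSquares are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical single-JVM analogue of the MRTask above:
// "map" squares each element, "reduce" adds the partial sums.
final class SumY2 {
  static double sumOfSquares(double[][] chunks) {
    return Arrays.stream(chunks)
        .parallel()                      // chunks may be processed in parallel
        .mapToDouble(c -> {              // map: partial sum for one chunk
          double s = 0;
          for (double d : c) s += d * d;
          return s;
        })
        .sum();                          // reduce: combine the partial sums
  }

  public static void main(String[] args) {
    double[][] vecY = { {1, 2}, {3}, {4, 5} };
    System.out.println("sumY2 = " + sumOfSquares(vecY));
  }
}
```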
0xdata.com 12
Simple Data-Parallel Coding
● Map/Reduce Per-Row: Stateful
● Linear Regression Pass1: Σ x, Σ y, Σ y²
class LRPass1 extends MRTask {
double sumX, sumY, sumY2; // I Can Haz State?
void map( double X, double Y ) {
sumX += X; sumY += Y; sumY2 += Y*Y;
}
void reduce( LRPass1 that ) {
sumX += that.sumX ;
sumY += that.sumY ;
sumY2 += that.sumY2;
}
}
0xdata.com 13
Simple Data-Parallel Coding
● Map/Reduce Per-Row: Batch Stateful
class LRPass1 extends MRTask {
double sumX, sumY, sumY2;
void map( Chunk CX, Chunk CY ) {// Whole Chunks
for( int i=0; i<CX.len; i++ ){// Batch!
double X = CX.at(i), Y = CY.at(i);
sumX += X; sumY += Y; sumY2 += Y*Y;
}
}
void reduce( LRPass1 that ) {
sumX += that.sumX ;
sumY += that.sumY ;
sumY2 += that.sumY2;
}
}
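The batch stateful pattern can be sketched on one JVM with the standard Fork/Join framework. This is an illustration under assumptions, not H2O's MRTask: LRPass1Sketch and its doAll helper are hypothetical, and chunks are plain double[] arrays. Each task keeps private sums, map() fills them from a whole chunk, and reduce() rolls up the two halves.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical single-JVM analogue of LRPass1 built on java.util.concurrent.
final class LRPass1Sketch extends RecursiveTask<LRPass1Sketch> {
  double sumX, sumY, sumY2;
  private final double[][] xs, ys;   // chunked columns X and Y
  private final int lo, hi;          // this task covers chunks [lo, hi)

  LRPass1Sketch(double[][] xs, double[][] ys, int lo, int hi) {
    this.xs = xs; this.ys = ys; this.lo = lo; this.hi = hi;
  }

  void map(double[] cx, double[] cy) {         // whole chunk, batch loop
    for (int i = 0; i < cx.length; i++) {
      sumX += cx[i]; sumY += cy[i]; sumY2 += cy[i] * cy[i];
    }
  }

  void reduce(LRPass1Sketch that) {
    sumX += that.sumX; sumY += that.sumY; sumY2 += that.sumY2;
  }

  @Override protected LRPass1Sketch compute() {
    if (hi - lo <= 1) {                        // leaf: at most one chunk
      if (hi > lo) map(xs[lo], ys[lo]);
      return this;
    }
    int mid = (lo + hi) >>> 1;
    LRPass1Sketch left = new LRPass1Sketch(xs, ys, lo, mid);
    LRPass1Sketch right = new LRPass1Sketch(xs, ys, mid, hi);
    left.fork();                               // left half runs asynchronously
    right.compute();                           // right half runs in this thread
    left.join();
    reduce(left); reduce(right);               // roll up both halves
    return this;
  }

  static LRPass1Sketch doAll(double[][] xs, double[][] ys) {
    return ForkJoinPool.commonPool().invoke(new LRPass1Sketch(xs, ys, 0, xs.length));
  }

  public static void main(String[] args) {
    LRPass1Sketch r = doAll(new double[][] { {1, 2}, {3} }, new double[][] { {2, 4}, {6} });
    System.out.println(r.sumX + " " + r.sumY + " " + r.sumY2);
  }
}
```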
0xdata.com 14
Distributed Trees
● Overlay a Tree over the data
● Really: Assign a Tree Node to each Row
● Number the Nodes
● Store "Node_ID" per row in a temp Vec
● Make a pass over all Rows
● Nodes not visited in order...
● but all rows, all Nodes efficiently visited
● Do work (e.g. histogram) per Row/Node
Vec nids = v.makeZero();
… nids.set(row,nid)...
0xdata.com 15
Distributed Trees
● An initial Tree
● All rows at nid==0
● MRTask: compute stats
● Use the stats to make a decision...
● (varies by algorithm)!
X  Y     nids
A  1.2   0
B  3.1   0
C  -2.0  0
D  1.1   0
Tree: nid=0 (MRTask.sum=3.4)
0xdata.com 16
Distributed Trees
● Next layer in the Tree (and MRTask across rows)
● Each row: decide!
– If "1<Y<1.5" go left else right
● Compute stats per new leaf
● Each pass across all rows builds entire layer
X  Y     nids
A  1.2   1
B  3.1   2
C  -2.0  2
D  1.1   1
Tree: nid=0 [1 < Y < 1.5] → left nid=1 (sum=2.3), right nid=2 (sum=1.1)
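The layer step on this slide can be sketched in a few lines of plain Java, using the four example rows. LayerPass, split, and sumPerNode are hypothetical names, and the node numbering (1 for left, 2 for right) is taken from the slide's table; a real pass would run per chunk inside an MRTask.

```java
import java.util.Arrays;

// Hypothetical sketch: one pass reassigns every row at nid 0 to a child node,
// then a per-node statistic (here: sum of Y) is computed for the new leaves.
final class LayerPass {
  // Decision from the slide: if 1 < Y < 1.5 go left (nid 1), else right (nid 2).
  static void split(double[] y, int[] nids) {
    for (int r = 0; r < y.length; r++)
      if (nids[r] == 0) nids[r] = (1 < y[r] && y[r] < 1.5) ? 1 : 2;
  }

  static double[] sumPerNode(double[] y, int[] nids, int numNodes) {
    double[] sums = new double[numNodes];
    for (int r = 0; r < y.length; r++) sums[nids[r]] += y[r];
    return sums;
  }

  public static void main(String[] args) {
    double[] y = { 1.2, 3.1, -2.0, 1.1 };      // rows A, B, C, D
    int[] nids = new int[y.length];            // all rows start at nid 0
    split(y, nids);
    System.out.println(Arrays.toString(nids));
    System.out.println(Arrays.toString(sumPerNode(y, nids, 3)));
  }
}
```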
0xdata.com 17
Distributed Trees
● Another MRTask, another layer...
● i.e., a 5-deep tree takes 5 passes
X  Y     nids
A  1.2   3
B  3.1   2
C  -2.0  2
D  1.1   4
Tree: nid=0 [1 < Y < 1.5] → nid=1 [Y==1.1] → nid=3, nid=4; nid=2 is a leaf (sum=1.1)
Diagram labels under the new nodes: sum=-2.0, sum=3.1
0xdata.com 18
Distributed Trees
● Each pass is over one layer in the tree
● Builds per-node histogram in map+reduce calls
class Pass extends MRTask2<Pass> {
  void map( Chunk chks[] ) {
    Chunk nids = chks[...]; // Node-IDs per row
    for( int r=0; r<nids.len; r++ ) { // All rows
      int nid = (int)nids.at80(r); // Node-ID for THIS row
      // Lazy: not all Chunks see all Nodes
      if( dHisto[nid]==null ) dHisto[nid]=...
      // Accumulate histogram stats per node
      dHisto[nid].accum(chks,r);
    }
  }
}
new Pass().doAll(myDataFrame,nids);
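A self-contained sketch of the per-node histogram pass, assuming fixed-width bins over a known value range and plain arrays for chunks. HistoPass, BINS, and the binning rule are hypothetical; the slide does not say how H2O bins values. The points illustrated are the lazy per-node allocation in map() and the roll-up in reduce().

```java
import java.util.Arrays;

// Hypothetical sketch: one histogram per tree node, allocated lazily because
// a given chunk usually sees only some of the nodes.
final class HistoPass {
  static final int BINS = 4;
  final double min, max;
  final long[][] histo;                  // histo[nid] stays null until needed

  HistoPass(int numNodes, double min, double max) {
    this.histo = new long[numNodes][];
    this.min = min; this.max = max;
  }

  int bin(double v) {                    // fixed-width bins, clamped to range
    int b = (int) ((v - min) / (max - min) * BINS);
    return Math.min(Math.max(b, 0), BINS - 1);
  }

  void map(double[] vals, int[] nids) {  // one chunk of values plus node ids
    for (int r = 0; r < vals.length; r++) {
      int nid = nids[r];
      if (histo[nid] == null) histo[nid] = new long[BINS];
      histo[nid][bin(vals[r])]++;
    }
  }

  void reduce(HistoPass that) {          // roll up another chunk's histograms
    for (int n = 0; n < histo.length; n++) {
      if (that.histo[n] == null) continue;
      if (histo[n] == null) { histo[n] = that.histo[n].clone(); continue; }
      for (int b = 0; b < BINS; b++) histo[n][b] += that.histo[n][b];
    }
  }

  public static void main(String[] args) {
    HistoPass a = new HistoPass(3, 0, 4), b = new HistoPass(3, 0, 4);
    a.map(new double[] { 0.5, 1.5 }, new int[] { 0, 1 });
    b.map(new double[] { 2.5, 3.5, 0.1 }, new int[] { 1, 1, 0 });
    a.reduce(b);
    System.out.println(Arrays.deepToString(a.histo));
  }
}
```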
0xdata.com 19
Distributed Trees
● Each pass analyzes one Tree level
● Then decide how to build next level
● Reassign Rows to new levels in another pass
– (actually merge the two passes)
● Builds a Histogram-per-Node
● Which requires a reduce() call to roll up
● All Histograms for one level done in parallel
0xdata.com 20
Distributed Trees: utilities
● “score+build” in one pass:
● Test each row against decision from prior pass
● Assign to a new leaf
● Build histogram on that leaf
● “score”: just walk the tree, and get results
● “compress”: Tree from POJO to byte[]
● Easily 10x smaller, can still walk, score, print
● Plus utilities to walk, print, display
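One way to picture the "compress" utility is a pre-order byte layout that can be scored without rebuilding objects. The sketch below is an assumption for illustration only: the layout (tag byte, float split, int size of the left subtree) and the names CompressedTree, Node, compress, and score are hypothetical and not H2O's actual format, and the tree splits on a single feature.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: flatten a POJO tree into a byte[] and walk the bytes.
final class CompressedTree {
  static final class Node {
    final float split, value; final Node left, right;
    Node(float value) { this.value = value; this.split = 0; this.left = null; this.right = null; }
    Node(float split, Node left, Node right) {
      this.split = split; this.left = left; this.right = right; this.value = 0;
    }
    boolean isLeaf() { return left == null; }
  }

  // Leaf: tag + float value. Inner: tag + float split + int leftSize + children.
  static int size(Node n) {
    return n.isLeaf() ? 1 + 4 : 1 + 4 + 4 + size(n.left) + size(n.right);
  }

  static byte[] compress(Node root) {
    ByteBuffer bb = ByteBuffer.allocate(size(root));
    write(root, bb);
    return bb.array();
  }

  private static void write(Node n, ByteBuffer bb) {
    if (n.isLeaf()) { bb.put((byte) 0).putFloat(n.value); return; }
    bb.put((byte) 1).putFloat(n.split).putInt(size(n.left));
    write(n.left, bb);
    write(n.right, bb);
  }

  // Walk the bytes: x < split goes left, otherwise skip the left subtree.
  static float score(byte[] tree, float x) {
    ByteBuffer bb = ByteBuffer.wrap(tree);
    while (bb.get() == 1) {
      float split = bb.getFloat();
      int leftSize = bb.getInt();
      if (x >= split) bb.position(bb.position() + leftSize);
    }
    return bb.getFloat();
  }

  public static void main(String[] args) {
    Node root = new Node(1.5f, new Node(10f), new Node(3.0f, new Node(20f), new Node(30f)));
    byte[] bytes = compress(root);
    System.out.println(bytes.length + " bytes, score(2.0)=" + score(bytes, 2.0f));
  }
}
```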
0xdata.com 21
GBM on Distributed Trees
● GBM builds 1 Tree, 1 level at a time, but...
● We run the entire level in parallel & distributed
● Built breadth-first because it's "free"
● More data offset by more CPUs
● Classic GBM otherwise
● Build residuals tree-by-tree
● Tuning knobs: trees, depth, shrinkage, min_rows
● Pure Java
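The "classic GBM" loop (fit a tree to the residuals, add it scaled by the shrinkage) can be sketched on a single machine. This is a toy under stated assumptions: squared-error regression on one feature, depth fixed at 1 (stumps), and no min_rows knob. TinyGbm, Stump, fitStump, and fitAndPredict are hypothetical names, and nothing here is distributed.

```java
// Hypothetical single-machine sketch of gradient boosting with stumps.
final class TinyGbm {
  static final class Stump {
    final double split, left, right;
    Stump(double split, double left, double right) { this.split = split; this.left = left; this.right = right; }
    double predict(double x) { return x < split ? left : right; }
  }

  // Best single split on x for squared error against the residuals.
  static Stump fitStump(double[] x, double[] resid) {
    Stump best = null;
    double bestErr = Double.POSITIVE_INFINITY;
    for (double s : x) {                         // candidate splits at data values
      double sl = 0, sr = 0; int nl = 0, nr = 0;
      for (int i = 0; i < x.length; i++)
        if (x[i] < s) { sl += resid[i]; nl++; } else { sr += resid[i]; nr++; }
      if (nl == 0 || nr == 0) continue;          // split must separate rows
      double ml = sl / nl, mr = sr / nr, err = 0;
      for (int i = 0; i < x.length; i++) {
        double d = resid[i] - (x[i] < s ? ml : mr);
        err += d * d;
      }
      if (err < bestErr) { bestErr = err; best = new Stump(s, ml, mr); }
    }
    return best;                                 // null if no valid split exists
  }

  static double[] fitAndPredict(double[] x, double[] y, int trees, double shrinkage) {
    double mean = 0;
    for (double v : y) mean += v / y.length;
    double[] pred = new double[y.length];
    java.util.Arrays.fill(pred, mean);           // initial model: the mean
    for (int t = 0; t < trees; t++) {
      double[] resid = new double[y.length];     // residuals of current model
      for (int i = 0; i < y.length; i++) resid[i] = y[i] - pred[i];
      Stump s = fitStump(x, resid);
      if (s == null) break;
      for (int i = 0; i < y.length; i++) pred[i] += shrinkage * s.predict(x[i]);
    }
    return pred;
  }

  public static void main(String[] args) {
    double[] p = fitAndPredict(new double[] {1, 2, 3, 4}, new double[] {1, 1, 5, 5}, 50, 0.5);
    System.out.println(java.util.Arrays.toString(p));
  }
}
```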
0xdata.com 22
GBM on Distributed Trees
● Limiting factor: latency in turning over a level
● About 4x faster than R single-node on covtype
● Does the per-level compute in parallel
● Requires sending histograms over network
– Can get big for very deep trees
0xdata.com 23
Summary: Write (parallel) Java
● Most simple Java “just works”
● Fast: parallel distributed reads, writes, appends
● Reads same speed as plain Java array loads
● Writes, appends: slightly slower (compression)
● Typically memory bandwidth limited
– (may be CPU limited in a few cases)
● Slower: conflicting writes (but follows strict JMM)
● Also supports transactional updates
0xdata.com 24
Summary: Writing Analytics
● We're writing Big Data Analytics
● Generalized Linear Modeling (ADMM, GLMNET)
– Logistic Regression, Poisson, Gamma
● Random Forest, GBM, KMeans++, KNN
● State-of-the-art Algorithms, running Distributed
● Solidly working on 100G datasets
● Heading for Tera Scale
● Paying customers (in production!)
● Come write your own (distributed) algorithm!!!
0xdata.com 25
Cool Systems Stuff...
● … that I ran out of space for
● Reliable UDP, integrated w/RPC
● TCP is reliably UNReliable
● Already have a reliable UDP framework, so no prob
● Fork/Join Goodies:
● Priority Queues
● Distributed F/J
● Surviving fork bombs & lost threads
● K/V does JMM via hardware-like MESI protocol
0xdata.com 27
The Platform
[Diagram: two JVMs (JVM 1, JVM 2), each showing the same stack: NFS / HDFS; byte[]; extends Iced; extends DTask; AutoBuffer; RPC; extends DRemoteTask (D/F/J); extends MRTask (User code?). The two JVMs are connected by K/V get/put over UDP / TCP.]
0xdata.com 28
Other Simple Examples
● Filter & Count (underage males):
● (can pass in any number of Vecs or a Frame)
long underageMales = new MRTask() {
long map( long age, long sex ) {
return (age<=17 && sex==MALE) ? 1 : 0;
}
long reduce( long d1, long d2 ) {
return d1+d2;
}
}.doAll( vecAge, vecSex );
0xdata.com 29
Other Simple Examples
● Filter into new set (underage males):
● Can write or append subset of rows
– (append order is preserved)
class Filter extends MRTask {
void map(Chunk CRisk, Chunk CAge, Chunk CSex){
for( int i=0; i<CAge.len; i++ )
if( CAge.at(i)<=17 && CSex.at(i)==MALE )
CRisk.append(CAge.at(i)); // build a set
}
};
Vec risk = new AppendableVec();
new Filter().doAll( risk, vecAge, vecSex );
...risk... // all the underage males
0xdata.com 31
Other Simple Examples
● Group-by: count of car-types by age
class AgeHisto extends MRTask {
  long carAges[][]; // count of cars by age
  void map( Chunk CAge, Chunk CCar ) {
    carAges = new long[numAges][numCars];
    for( int i=0; i<CAge.len; i++ )
      carAges[(int)CAge.at(i)][(int)CCar.at(i)]++; // at() returns a double
  }
  void reduce( AgeHisto that ) {
    for( int i=0; i<carAges.length; i++ )
      for( int j=0; j<carAges[i].length; j++ )
        carAges[i][j] += that.carAges[i][j];
  }
}
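The group-by pattern also runs without H2O if chunks are modeled as int arrays. In this sketch AgeHistoSketch is a hypothetical stand-in: each instance handles one chunk in map(), and reduce() merges another instance's counts, tolerating instances whose map() never ran.

```java
import java.util.Arrays;

// Hypothetical single-JVM analogue of AgeHisto: the output array is created
// in map() (private to that call) and rolled up pairwise in reduce().
final class AgeHistoSketch {
  final int numAges, numCars;
  long[][] carAges;                       // output field: count of cars by age

  AgeHistoSketch(int numAges, int numCars) { this.numAges = numAges; this.numCars = numCars; }

  void map(int[] ages, int[] cars) {      // one chunk of each column
    carAges = new long[numAges][numCars];
    for (int i = 0; i < ages.length; i++) carAges[ages[i]][cars[i]]++;
  }

  void reduce(AgeHistoSketch that) {
    if (that.carAges == null) return;     // nothing to merge
    if (carAges == null) { carAges = that.carAges; return; }
    for (int i = 0; i < carAges.length; i++)
      for (int j = 0; j < carAges[i].length; j++)
        carAges[i][j] += that.carAges[i][j];
  }

  public static void main(String[] args) {
    AgeHistoSketch a = new AgeHistoSketch(3, 2), b = new AgeHistoSketch(3, 2);
    a.map(new int[] { 0, 1, 1 }, new int[] { 0, 1, 1 });
    b.map(new int[] { 1, 2 }, new int[] { 1, 0 });
    a.reduce(b);
    System.out.println(Arrays.deepToString(a.carAges));
  }
}
```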
0xdata.com 32
Other Simple Examples
● Group-by: count of car-types by age (same AgeHisto code as the previous slide)
● Setting carAges in map() makes it an output field.
● Private per-map call, single-threaded write access.
● Must be rolled up in the reduce call.
0xdata.com 33
Other Simple Examples
● Uniques
● Uses distributed hash set
class Uniques extends MRTask {
DNonBlockingHashSet<Long> dnbhs = new ...;
void map( long id ) { dnbhs.add(id); }
void reduce( Uniques that ) {
dnbhs.putAll(that.dnbhs);
}
};
long uniques = new Uniques().
doAll( vecVisitors ).dnbhs.size();
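On a single JVM the same idea can be sketched with a concurrent set from the standard library. UniquesSketch and countUniques are hypothetical names, and ConcurrentHashMap.newKeySet() stands in for the distributed hash set; because all threads share one set here, no reduce step is needed, whereas the distributed version merges one set per node.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-JVM analogue of Uniques: all map calls write into one
// shared concurrent set, and the answer is its final size.
final class UniquesSketch {
  static long countUniques(long[][] chunks) {
    Set<Long> seen = ConcurrentHashMap.newKeySet();   // safe for concurrent adds
    Arrays.stream(chunks).parallel().forEach(c -> {
      for (long id : c) seen.add(id);                 // map: add every id
    });
    return seen.size();
  }

  public static void main(String[] args) {
    long[][] visitors = { {1, 2, 3}, {3, 4}, {1, 1} };
    System.out.println("uniques = " + countUniques(visitors));
  }
}
```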
0xdata.com 34
Other Simple Examples
● Uniques (same Uniques code as the previous slide)
● Setting dnbhs in <init> makes it an input field.
● Shared across all map() calls. Often read-only.
● This one is written, so it needs a reduce.

Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 

Recently uploaded (20)

The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

GBM in H2O with Cliff Click: H2O API

  • 1. GBM: Distributed Tree Algorithms on H2O Cliff Click, CTO 0xdata cliffc@0xdata.com http://0xdata.com http://cliffc.org/blog
  • 2. H2O is... ● Pure Java, Open Source: 0xdata.com ● https://github.com/0xdata/h2o/ ● A Platform for doing Math ● Parallel Distributed Math ● In-memory analytics: GLM, GBM, RF, Logistic Reg ● Accessible via REST & JSON ● A K/V Store: ~150ns per get or put ● Distributed Fork/Join + Map/Reduce + K/V
  • 3. Agenda ● Building Blocks For Big Data: ● Vecs & Frames & Chunks ● Distributed Tree Algorithms ● Access Patterns & Execution ● GBM on H2O ● Performance
  • 4. A Collection of Distributed Vectors
        // A Distributed Vector
        // much more than 2 billion elements
        class Vec {
          long length();                // more than an int's worth
          // fast random access
          double at(long idx);          // Get the idx'th elem
          boolean isNA(long idx);
          void set(long idx, double d); // writable
          void append(double d);        // variable sized
        }
  • 5. Frames ● A Frame: Vec[] (columns: age, sex, zip, ID, car), spread across the heaps of JVM 1 to JVM 4 ● Vecs aligned in heaps ● Optimized for concurrent access ● Random access any row, any JVM ● But faster if local... more on that later
  • 6. Distributed Data Taxonomy ● A Chunk, Unit of Parallel Access (each Vec is split into Chunks across the heaps of JVM 1 to JVM 4) ● Typically 1e3 to 1e6 elements ● Stored compressed ● In byte arrays ● Get/put is a few clock cycles including compression
  • 7. Distributed Parallel Execution (Vecs spread across the heaps of JVM 1 to JVM 4) ● All CPUs grab Chunks in parallel ● F/J load balances ● Code moves to Data ● Map/Reduce & F/J handle all sync ● H2O handles all comm, data management
  • 8. Distributed Data Taxonomy ● Frame – a collection of Vecs ● Vec – a collection of Chunks ● Chunk – a collection of 1e3 to 1e6 elems ● elem – a java double ● Row i – i'th elements of all the Vecs in a Frame
  • 9. Distributed Coding Taxonomy ● No Distribution Coding: ● Whole Algorithms, Whole Vector-Math ● REST + JSON: e.g. load data, GLM, get results ● Simple Data-Parallel Coding: ● Per-Row (or neighbor row) Math ● Map/Reduce-style: e.g. Any dense linear algebra ● Complex Data-Parallel Coding ● K/V Store, Graph Algo's, e.g. PageRank
  • 10. Distributed Coding Taxonomy ● No Distribution Coding: ● Whole Algorithms, Whole Vector-Math ● REST + JSON: e.g. load data, GLM, get results ● Simple Data-Parallel Coding: ● Per-Row (or neighbor row) Math ● Map/Reduce-style: e.g. Any dense linear algebra ● Complex Data-Parallel Coding ● K/V Store, Graph Algo's, e.g. PageRank ● Read the docs! This talk! Join our GIT!
  • 11. Simple Data-Parallel Coding ● Map/Reduce Per-Row: Stateless ● Example from Linear Regression, Σ y² ● Auto-parallel, auto-distributed ● Near Fortran speed, Java Ease
        double sumY2 = new MRTask() {
          double map( double d ) { return d*d; }
          double reduce( double d1, double d2 ) { return d1+d2; }
        }.doAll( vecY );
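The stateless map/reduce pattern on this slide can be tried out without a cluster. The following is a minimal plain-Java sketch, not H2O code: the class name `SumOfSquaresSketch` is hypothetical, each inner array stands in for one Chunk, and a parallel stream stands in for the distributed Fork/Join scheduling.

```java
import java.util.Arrays;

final class SumOfSquaresSketch {
    // "map": applied independently to every element.
    static double map(double d) { return d * d; }

    // "reduce": combines two partial results; must be associative.
    static double reduce(double d1, double d2) { return d1 + d2; }

    // Each inner array plays the role of one chunk of the vector.
    static double sumOfSquares(double[][] chunks) {
        return Arrays.stream(chunks)
                .parallel()                       // chunks are processed independently
                .mapToDouble(chunk -> {
                    double partial = 0.0;         // per-chunk partial sum
                    for (double d : chunk) partial = reduce(partial, map(d));
                    return partial;
                })
                .reduce(0.0, SumOfSquaresSketch::reduce); // roll up the partials
    }

    public static void main(String[] args) {
        double[][] vecY = { {1.0, 2.0}, {3.0}, {4.0, 5.0} };
        System.out.println(sumOfSquares(vecY)); // prints 55.0
    }
}
```

Because the reduce is a plain addition, the order in which chunk partials are combined does not matter for this small integer-valued input; with arbitrary doubles, different combination orders can differ in the last bits.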
  • 12. Simple Data-Parallel Coding ● Map/Reduce Per-Row: State-full ● Linear Regression Pass1: Σ x, Σ y, Σ y²
        class LRPass1 extends MRTask {
          double sumX, sumY, sumY2; // I Can Haz State?
          void map( double X, double Y ) {
            sumX += X; sumY += Y; sumY2 += Y*Y;
          }
          void reduce( LRPass1 that ) {
            sumX += that.sumX ;
            sumY += that.sumY ;
            sumY2 += that.sumY2;
          }
        }
  • 13. Simple Data-Parallel Coding ● Map/Reduce Per-Row: Batch State-full
        class LRPass1 extends MRTask {
          double sumX, sumY, sumY2;
          void map( Chunk CX, Chunk CY ) {   // Whole Chunks
            for( int i=0; i<CX.len; i++ ) {  // Batch!
              double X = CX.at(i), Y = CY.at(i);
              sumX += X; sumY += Y; sumY2 += Y*Y;
            }
          }
          void reduce( LRPass1 that ) {
            sumX += that.sumX ;
            sumY += that.sumY ;
            sumY2 += that.sumY2;
          }
        }
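To see why the stateful version needs a `reduce`, here is a small plain-Java sketch of the same pass. It is an illustration under assumptions, not the H2O API: `LRPass1Sketch` and its `doAll` helper are hypothetical, arrays stand in for Chunks, and the chunks are visited sequentially where H2O would run them in parallel.

```java
final class LRPass1Sketch {
    double sumX, sumY, sumY2;   // state private to one task instance

    // Batch map: one call handles a whole chunk of X and the aligned chunk of Y.
    void map(double[] cx, double[] cy) {
        for (int i = 0; i < cx.length; i++) {
            double x = cx[i], y = cy[i];
            sumX += x;
            sumY += y;
            sumY2 += y * y;
        }
    }

    // Reduce: fold another task's partial sums into this one.
    void reduce(LRPass1Sketch that) {
        sumX += that.sumX;
        sumY += that.sumY;
        sumY2 += that.sumY2;
    }

    // Hypothetical driver: one fresh task per chunk, then a roll-up.
    static LRPass1Sketch doAll(double[][] xChunks, double[][] yChunks) {
        LRPass1Sketch total = new LRPass1Sketch();
        for (int c = 0; c < xChunks.length; c++) {
            LRPass1Sketch local = new LRPass1Sketch(); // no sharing between chunks
            local.map(xChunks[c], yChunks[c]);
            total.reduce(local);
        }
        return total;
    }

    public static void main(String[] args) {
        LRPass1Sketch r = doAll(new double[][]{{1, 2}, {3}}, new double[][]{{2, 4}, {6}});
        System.out.println(r.sumX + " " + r.sumY + " " + r.sumY2);
    }
}
```

Giving every chunk its own task instance means the `map` loop never contends on shared fields; the only coordination is the final `reduce`.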
  • 14. Distributed Trees ● Overlay a Tree over the data ● Really: Assign a Tree Node to each Row ● Number the Nodes ● Store "Node_ID" per row in a temp Vec ● Make a pass over all Rows ● Nodes not visited in order... ● but all rows, all Nodes efficiently visited ● Do work (e.g. histogram) per Row/Node
        Vec nids = v.makeZero();
        … nids.set(row,nid)...
  • 15. Distributed Trees ● An initial Tree ● All rows at nid==0 ● MRTask: compute stats ● Use the stats to make a decision... ● (varies by algorithm)!
        X  Y    nids
        A  1.2  0
        B  3.1  0
        C  -2.  0
        D  1.1  0
        Tree: nid=0, MRTask.sum=3.4
  • 16. Distributed Trees ● Next layer in the Tree (and MRTask across rows) ● Each row: decide! – If "1<Y<1.5" go left else right ● Compute stats per new leaf ● Each pass across all rows builds entire layer
        X  Y    nids
        A  1.2  1
        B  3.1  2
        C  -2.  2
        D  1.1  1
        Tree: nid=0 splits on 1 < Y < 1.5 into nid=1 (sum=2.3) and nid=2 (sum=1.1)
  • 17. Distributed Trees ● Another MRTask, another layer... ● i.e., a 5-deep tree takes 5 passes
        X  Y    nids
        A  1.2  3
        B  3.1  2
        C  -2.  2
        D  1.1  4
        Tree diagram labels: nid=0 (1 < Y < 1.5), sum=1.1, Y==1.1, leaf, nid=3, nid=4, sum= -2., sum=3.1
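The four-row example from these slides can be replayed in a few lines. This is a hedged sketch in plain Java rather than H2O code: `TreeLayerSketch` and `buildLayer` are hypothetical names, an `int[]` stands in for the temporary Node_ID Vec, and the only statistic kept per new node is the sum of Y.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

final class TreeLayerSketch {
    // One pass over all rows: every row sitting at `parent` is tested against the
    // split decision, moved to the `left` or `right` child, and its Y value is
    // accumulated into that child's sum. Rows at other nodes are left untouched.
    static double[] buildLayer(double[] y, int[] nids, int parent,
                               DoublePredicate goLeft, int left, int right) {
        double[] sums = new double[Math.max(left, right) + 1];
        for (int r = 0; r < y.length; r++) {
            if (nids[r] == parent) {
                nids[r] = goLeft.test(y[r]) ? left : right;
                sums[nids[r]] += y[r];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[] y = {1.2, 3.1, -2.0, 1.1};   // rows A, B, C, D
        int[] nids = new int[y.length];       // all rows start at nid == 0
        double[] sums = buildLayer(y, nids, 0, v -> v > 1.0 && v < 1.5, 1, 2);
        System.out.println(Arrays.toString(nids)); // prints [1, 2, 2, 1]
        System.out.println(sums[1] + " " + sums[2]);
    }
}
```

A further call with `parent` set to 1 and a new decision would build the next layer, which is why a 5-deep tree costs 5 passes over the rows.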
  • 18. Distributed Trees ● Each pass is over one layer in the tree ● Builds per-node histogram in map+reduce calls
        class Pass extends MRTask2<Pass> {
          void map( Chunk chks[] ) {
            Chunk nids = chks[...];            // Node-IDs per row
            for( int r=0; r<nids.len; r++ ) {  // All rows
              int nid = (int)nids.at80(r);     // Node-ID THIS row
              // Lazy: not all Chunks see all Nodes
              if( dHisto[nid]==null ) dHisto[nid]=...
              // Accumulate histogram stats per node
              dHisto[nid].accum(chks,r);
            }
          }
        }
        new Pass().doAll(myDataFrame,nids);
  • 19. Distributed Trees ● Each pass analyzes one Tree level ● Then decide how to build next level ● Reassign Rows to new levels in another pass – (actually merge the two passes) ● Builds a Histogram-per-Node ● Which requires a reduce() call to roll up ● All Histograms for one level done in parallel
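The histogram-per-node idea, including the lazy allocation and the reduce() roll-up, can be sketched in plain Java. Everything here is an assumption made for illustration: the class `HistoPassSketch`, the fixed bin range of -4 to 4, the bin count of 4, and the use of simple counts as the histogram statistic. The slides do not say how H2O bins values or what its histograms accumulate.

```java
import java.util.Arrays;

final class HistoPassSketch {
    static final int BINS = 4;                 // assumed bin count
    static final double MIN = -4.0, MAX = 4.0; // assumed fixed value range

    // counts[nid] stays null until this task actually sees a row at that node.
    final long[][] counts;

    HistoPassSketch(int numNodes) { counts = new long[numNodes][]; }

    static int bin(double v) {
        int b = (int) ((v - MIN) / (MAX - MIN) * BINS);
        return Math.max(0, Math.min(BINS - 1, b)); // clamp out-of-range values
    }

    // map: one chunk of values plus the aligned chunk of node IDs.
    void map(double[] col, int[] nids) {
        for (int r = 0; r < col.length; r++) {
            int nid = nids[r];
            if (counts[nid] == null) counts[nid] = new long[BINS]; // lazy
            counts[nid][bin(col[r])]++;
        }
    }

    // reduce: merge another task's per-node histograms into this one.
    void reduce(HistoPassSketch that) {
        for (int n = 0; n < counts.length; n++) {
            if (that.counts[n] == null) continue;
            if (counts[n] == null) { counts[n] = that.counts[n]; continue; }
            for (int b = 0; b < BINS; b++) counts[n][b] += that.counts[n][b];
        }
    }

    public static void main(String[] args) {
        HistoPassSketch a = new HistoPassSketch(3), b = new HistoPassSketch(3);
        a.map(new double[]{1.2, 3.1}, new int[]{1, 2});   // rows A, B
        b.map(new double[]{-2.0, 1.1}, new int[]{2, 1});  // rows C, D
        a.reduce(b);
        System.out.println(Arrays.toString(a.counts[1]) + " " + Arrays.toString(a.counts[2]));
    }
}
```

The merged arrays are what would have to travel over the network in a distributed setting, one histogram per live node in the level.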
  • 20. Distributed Trees: utilities ● “score+build” in one pass: ● Test each row against decision from prior pass ● Assign to a new leaf ● Build histogram on that leaf ● “score”: just walk the tree, and get results ● “compress”: Tree from POJO to byte[] ● Easily 10x smaller, can still walk, score, print ● Plus utilities to walk, print, display
  • 21. GBM on Distributed Trees ● GBM builds 1 Tree, 1 level at a time, but... ● We run the entire level in parallel & distributed ● Built breadth-first because it's "free" ● More data offset by more CPUs ● Classic GBM otherwise ● Build residuals tree-by-tree ● Tuning knobs: trees, depth, shrinkage, min_rows ● Pure Java
  • 22. GBM on Distributed Trees ● Limiting factor: latency in turning over a level ● About 4x faster than R single-node on covtype ● Does the per-level compute in parallel ● Requires sending histograms over network – Can get big for very deep trees
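"Build residuals tree-by-tree" with a shrinkage knob is the textbook gradient-boosting loop for squared error. The sketch below shows that loop on a single feature with depth-1 trees (stumps). It is single-threaded plain Java, not H2O's implementation: `GbmStumpSketch` and all of its members are hypothetical, the exhaustive threshold search replaces the histogram-based split finding described earlier, and the `depth` and `min_rows` knobs are not modelled. It assumes a non-empty input.

```java
import java.util.Arrays;

final class GbmStumpSketch {
    // A depth-1 tree: one threshold and one predicted value per side.
    static final class Stump {
        final double threshold, left, right;
        Stump(double threshold, double left, double right) {
            this.threshold = threshold; this.left = left; this.right = right;
        }
        double predict(double x) { return x < threshold ? left : right; }
    }

    // Fit the stump that minimises squared error against the current residuals.
    static Stump fitStump(double[] x, double[] residual) {
        Stump best = null;
        double bestErr = Double.POSITIVE_INFINITY;
        for (double t : x) {                        // candidate thresholds: observed values
            double sumL = 0, sumR = 0; int nL = 0, nR = 0;
            for (int i = 0; i < x.length; i++) {
                if (x[i] < t) { sumL += residual[i]; nL++; } else { sumR += residual[i]; nR++; }
            }
            if (nL == 0 || nR == 0) continue;       // not a real split
            double mL = sumL / nL, mR = sumR / nR, err = 0;
            for (int i = 0; i < x.length; i++) {
                double d = residual[i] - (x[i] < t ? mL : mR);
                err += d * d;
            }
            if (err < bestErr) { bestErr = err; best = new Stump(t, mL, mR); }
        }
        return best;                                // null if no split exists
    }

    // Returns the training-set predictions after boosting `trees` stumps.
    static double[] fit(double[] x, double[] y, int trees, double shrinkage) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double[] f = new double[y.length];
        Arrays.fill(f, mean);                       // initial model: the mean of y
        double[] residual = new double[y.length];
        for (int t = 0; t < trees; t++) {
            for (int i = 0; i < y.length; i++) residual[i] = y[i] - f[i];
            Stump s = fitStump(x, residual);
            if (s == null) break;
            for (int i = 0; i < y.length; i++) f[i] += shrinkage * s.predict(x[i]);
        }
        return f;
    }

    static double mse(double[] f, double[] y) {
        double e = 0;
        for (int i = 0; i < y.length; i++) e += (y[i] - f[i]) * (y[i] - f[i]);
        return e / y.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4}, y = {1, 1, 3, 3};
        System.out.println("mse with 0 trees:  " + mse(fit(x, y, 0, 0.1), y));
        System.out.println("mse with 50 trees: " + mse(fit(x, y, 50, 0.1), y));
    }
}
```

Each tree depends on the residuals left by the previous one, so the trees themselves are built in sequence; the parallelism described on the slides comes from distributing the work inside each level.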
  • 23. Summary: Write (parallel) Java ● Most simple Java “just works” ● Fast: parallel distributed reads, writes, appends ● Reads same speed as plain Java array loads ● Writes, appends: slightly slower (compression) ● Typically memory bandwidth limited – (may be CPU limited in a few cases) ● Slower: conflicting writes (but follows strict JMM) ● Also supports transactional updates
  • 24. Summary: Writing Analytics ● We're writing Big Data Analytics ● Generalized Linear Modeling (ADMM, GLMNET) – Logistic Regression, Poisson, Gamma ● Random Forest, GBM, KMeans++, KNN ● State-of-the-art Algorithms, running Distributed ● Solidly working on 100G datasets ● Heading for Tera Scale ● Paying customers (in production!) ● Come write your own (distributed) algorithm!!!
  • 25. Cool Systems Stuff... ● … that I ran out of space for ● Reliable UDP, integrated w/RPC ● TCP is reliably UNReliable ● Already have a reliable UDP framework, so no prob ● Fork/Join Goodies: ● Priority Queues ● Distributed F/J ● Surviving fork bombs & lost threads ● K/V does JMM via hardware-like MESI protocol
  • 26. H2O is... ● Pure Java, Open Source: 0xdata.com ● https://github.com/0xdata/h2o/ ● A Platform for doing Math ● Parallel Distributed Math ● In-memory analytics: GLM, GBM, RF, Logistic Reg ● Accessible via REST & JSON ● A K/V Store: ~150ns per get or put ● Distributed Fork/Join + Map/Reduce + K/V
  • 27. The Platform (diagram) ● Layer labels shown for each of JVM 1 and JVM 2: NFS, HDFS; byte[]; extends Iced; extends DTask; AutoBuffer; RPC; extends DRemoteTask; D/F/J; extends MRTask; User code? ● Between the JVMs: K/V get/put, UDP / TCP
  • 28. Other Simple Examples ● Filter & Count (underage males): ● (can pass in any number of Vecs or a Frame)
        long sumY2 = new MRTask() {
          long map( long age, long sex ) {
            return (age<=17 && sex==MALE) ? 1 : 0;
          }
          long reduce( long d1, long d2 ) {
            return d1+d2;
          }
        }.doAll( vecAge, vecSex );
  • 29. Other Simple Examples ● Filter into new set (underage males): ● Can write or append subset of rows – (append order is preserved)
        class Filter extends MRTask {
          void map(Chunk CRisk, Chunk CAge, Chunk CSex){
            for( int i=0; i<CAge.len; i++ )
              if( CAge.at(i)<=17 && CSex.at(i)==MALE )
                CRisk.append(CAge.at(i)); // build a set
          }
        };
        Vec risk = new AppendableVec();
        new Filter().doAll( risk, vecAge, vecSex );
        ...risk... // all the underage males
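One way to picture "append order is preserved" is that each chunk appends to its own private output, and the outputs are then concatenated in chunk order. The plain-Java sketch below shows that idea only; it is not how `AppendableVec` is implemented, and `FilterSketch`, the `MALE` encoding of 1, and the array-based chunks are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterSketch {
    static final int MALE = 1;   // hypothetical encoding of the sex column

    // Each chunk filters into its own local list; concatenating the local lists
    // in chunk order keeps the surviving rows in their original order.
    static List<Double> filter(double[][] ageChunks, int[][] sexChunks) {
        List<Double> risk = new ArrayList<>();
        for (int c = 0; c < ageChunks.length; c++) {
            List<Double> local = new ArrayList<>();       // private to this chunk
            for (int i = 0; i < ageChunks[c].length; i++) {
                if (ageChunks[c][i] <= 17 && sexChunks[c][i] == MALE) {
                    local.add(ageChunks[c][i]);
                }
            }
            risk.addAll(local);                           // append in chunk order
        }
        return risk;
    }

    public static void main(String[] args) {
        double[][] ages = { {16, 40}, {12, 17} };
        int[][] sexes = { {1, 1}, {0, 1} };
        System.out.println(filter(ages, sexes)); // prints [16.0, 17.0]
    }
}
```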
  • 31. Other Simple Examples ● Group-by: count of car-types by age
        class AgeHisto extends MRTask {
          long carAges[][]; // count of cars by age
          void map( Chunk CAge, Chunk CCar ) {
            carAges = new long[numAges][numCars];
            for( int i=0; i<CAge.len; i++ )
              carAges[(int)CAge.at(i)][(int)CCar.at(i)]++;
          }
          void reduce( AgeHisto that ) {
            for( int i=0; i<carAges.length; i++ )
              for( int j=0; j<carAges[i].length; j++ )
                carAges[i][j] += that.carAges[i][j];
          }
        }
  • 32. Other Simple Examples ● Group-by: count of car-types by age
        class AgeHisto extends MRTask {
          long carAges[][]; // count of cars by age
          void map( Chunk CAge, Chunk CCar ) {
            carAges = new long[numAges][numCars];
            for( int i=0; i<CAge.len; i++ )
              carAges[(int)CAge.at(i)][(int)CCar.at(i)]++;
          }
          void reduce( AgeHisto that ) {
            for( int i=0; i<carAges.length; i++ )
              for( int j=0; j<carAges[i].length; j++ )
                carAges[i][j] += that.carAges[i][j];
          }
        }
        Setting carAges in map() makes it an output field. Private per-map call, single-threaded write access. Must be rolled-up in the reduce call.
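The output-field pattern (allocate in map, roll up in reduce) can be exercised in plain Java. This is a sketch under assumptions: `AgeHistoSketch` and its `doAll` driver are hypothetical, `int[]` arrays stand in for the age and car-type Chunks, and ages and car types are assumed to be small non-negative integer codes.

```java
final class AgeHistoSketch {
    final int numAges, numCars;
    long[][] carAges;            // output field: allocated inside map()

    AgeHistoSketch(int numAges, int numCars) {
        this.numAges = numAges;
        this.numCars = numCars;
    }

    // map: private 2-D count table for this chunk only.
    void map(int[] ages, int[] cars) {
        carAges = new long[numAges][numCars];
        for (int i = 0; i < ages.length; i++) carAges[ages[i]][cars[i]]++;
    }

    // reduce: element-wise sum of the two tables.
    void reduce(AgeHistoSketch that) {
        if (that.carAges == null) return;
        if (carAges == null) { carAges = that.carAges; return; }
        for (int i = 0; i < carAges.length; i++)
            for (int j = 0; j < carAges[i].length; j++)
                carAges[i][j] += that.carAges[i][j];
    }

    // Hypothetical driver: one task per chunk, rolled up into a total.
    static long[][] doAll(int[][] ageChunks, int[][] carChunks, int numAges, int numCars) {
        AgeHistoSketch total = new AgeHistoSketch(numAges, numCars);
        for (int c = 0; c < ageChunks.length; c++) {
            AgeHistoSketch local = new AgeHistoSketch(numAges, numCars);
            local.map(ageChunks[c], carChunks[c]);
            total.reduce(local);
        }
        return total.carAges;
    }

    public static void main(String[] args) {
        long[][] counts = doAll(new int[][]{{0, 1}, {1}}, new int[][]{{0, 1}, {1}}, 2, 2);
        System.out.println(counts[0][0] + " " + counts[1][1]); // prints 1 2
    }
}
```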
  • 33. Other Simple Examples ● Uniques ● Uses distributed hash set
        class Uniques extends MRTask {
          DNonBlockingHashSet<Long> dnbhs = new ...;
          void map( long id ) { dnbhs.add(id); }
          void reduce( Uniques that ) {
            dnbhs.putAll(that.dnbhs);
          }
        };
        long uniques = new Uniques().
          doAll( vecVisitors ).dnbhs.size();
  • 34. Other Simple Examples ● Uniques ● Uses distributed hash set
        class Uniques extends MRTask {
          DNonBlockingHashSet<Long> dnbhs = new ...;
          void map( long id ) { dnbhs.add(id); }
          void reduce( Uniques that ) {
            dnbhs.putAll(that.dnbhs);
          }
        };
        long uniques = new Uniques().
          doAll( vecVisitors ).dnbhs.size();
        Setting dnbhs in <init> makes it an input field. Shared across all maps(). Often read-only. This one is written, so needs a reduce.
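The input-field pattern (one set shared by all maps on a node, merged across nodes in reduce) can be approximated with the JDK's concurrent set. The sketch below is an illustration only: `UniquesSketch` is a hypothetical name, `ConcurrentHashMap.newKeySet()` stands in for `DNonBlockingHashSet`, and the outermost array dimension simulates separate JVMs.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class UniquesSketch {
    // All maps on one simulated node write into the same thread-safe set.
    static Set<Long> mapOnNode(long[][] localChunks) {
        Set<Long> shared = ConcurrentHashMap.newKeySet();
        Arrays.stream(localChunks)
              .parallel()
              .forEach(chunk -> { for (long id : chunk) shared.add(id); });
        return shared;
    }

    // reduce: because the shared set was written, node results must be merged.
    static long countUniques(long[][][] nodes) {
        Set<Long> total = ConcurrentHashMap.newKeySet();
        for (long[][] node : nodes) total.addAll(mapOnNode(node));
        return total.size();
    }

    public static void main(String[] args) {
        long[][][] visitors = {
            { {1L, 2L}, {2L, 3L} },   // chunks on node 1
            { {3L, 4L} }              // chunks on node 2
        };
        System.out.println(countUniques(visitors)); // prints 4
    }
}
```

Within one node the shared set removes the need for a per-chunk merge; the reduce step only has to combine one set per node.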