Building a Big Data 
Machine Learning Platform 
Cliff Click, CTO 0xdata 
cliffc@0xdata.com 
http://0xdata.com 
http://cliffc.org/blog
2 
H2O is... 
● Pure Java, Open Source: 0xdata.com 
● https://github.com/0xdata/h2o/ 
● A Platform for doing Parallel Distributed Math 
● In-memory analytics: GLM, GBM, RF, Logistic Reg, 
Deep Learning, PCA, Kmeans... 
● Accessible via REST & JSON, R, Python, Java 
● Scala & now Spark!
3 
Sparkling Water 
● Bleeding edge: Spark & H2O RDDs 
● Move data back & forth, model & munge 
● Same process, same JVM 
● H2O Data as a: 
● Spark RDD or 
● Scala Collection 
● Code in: 
Frame.toRDD.runJob(...) 
Frame.foreach{...} 
● https://github.com/0xdata/h2o-dev 
● https://github.com/0xdata/perrier
4 
A Collection of Distributed Vectors 
// A Distributed Vector 
// much more than 2 billion elements 
class Vec { 
long length(); // more than an int's worth 
// fast random access 
double at(long idx); // Get the idx'th elem 
boolean isNA(long idx); 
void set(long idx, double d); // writable 
void append(double d); // variable sized 
}
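A hedged usage sketch (not from the deck) built only on the Vec API above: sum a Vec with the random-access calls, skipping missing values.
// Sum a Vec using only the API shown above; skip NAs
double sum( Vec v ) { 
  double s = 0; 
  for( long i=0; i<v.length(); i++ ) 
    if( !v.isNA(i) ) 
      s += v.at(i); 
  return s; 
}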
5 
Distributed Data Taxonomy 
A Single Vector 
Vec
6 
Distributed Data Taxonomy 
A Very Large Single Vec 
Vec 
>> 2 billion elements 
●Java primitive 
●Usually double 
●Length is a long 
●>> 2^31 elements 
●Compressed 
● Often 2x to 4x 
●Random access 
●Linear access is 
FORTRAN speed
7 
Distributed Data Taxonomy 
A Single Distributed Vec 
Vec: >> 2 billion elements, split across four JVM heaps (32 GB each) 
●Java Heap 
● Data In-Heap 
● Not off heap 
●Split Across Heaps 
●GC management 
● Watch FullGC 
● Spill-to-disk 
● GC very cheap 
● Default GC 
●To-the-metal speed 
●Java ease
8 
Distributed Data Taxonomy 
A Collection of Distributed Vecs 
Five Vecs spread across four JVM heaps 
●Vecs aligned 
in heaps 
●Optimized for 
concurrent access 
●Random access 
any row, any JVM 
●But faster if local... 
more on that later
9 
Distributed Data Taxonomy 
A Frame: Vec[] 
Columns age, sex, zip, ID, car, spread across four JVM heaps 
●Similar to R frame 
●Change Vecs freely 
●Add, remove Vecs 
●Describes a row of 
user data 
●Struct-of-Arrays 
(vs ary-of-structs)
10 
Distributed Data Taxonomy 
A Chunk, Unit of Parallel Access 
Five Vecs, each split into Chunks across four JVM heaps 
●Typically 1e3 to 
1e6 elements 
●Stored compressed 
●In byte arrays 
●Get/put is a few 
clock cycles 
including 
compression 
●Compression is 
Good: more data 
per cache-miss
11 
Distributed Data Taxonomy 
A Chunk[]: Concurrent Vec Access 
Columns age, sex, zip, ID, car; one Chunk per Vec in each of four JVM heaps 
●Access a Row in a single thread 
●Like a Java object (e.g. class Person { ... }) 
●Can read & write: Mutable Vectors 
●Both are full Java speed 
●Conflicting writes: use JMM rules 
12 
Distributed Data Taxonomy 
Single-Threaded Execution 
(five Vecs across four JVM heaps) 
●One CPU works a 
Chunk of rows 
●Fork/Join work unit 
●Big enough to cover 
control overheads 
●Small enough to 
get fine-grained par 
●Map/Reduce 
●Code written in a 
simple single-threaded 
style
13 
Distributed Data Taxonomy 
Distributed Parallel Execution 
(five Vecs across four JVM heaps) 
●All CPUs grab 
Chunks in parallel 
●F/J load balances 
●Code moves to Data 
●Map/Reduce & F/J 
handles all sync 
●H2O handles all 
comm, data manage
14 
Distributed Data Taxonomy 
Frame – a collection of Vecs 
Vec – a collection of Chunks 
Chunk – a collection of 1e3 to 1e6 elems 
elem – a Java double 
Row i – the i'th element of each Vec in a Frame
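As a hedged illustration of this taxonomy (the accessor names numCols() and vec(c) are assumptions, not shown in the deck): reading Row i means reading the i'th element of every Vec in the Frame.
// Hypothetical Frame accessors numCols()/vec(c); Vec.at(i) is from the earlier slide
double[] row( Frame fr, long i ) { 
  double[] r = new double[fr.numCols()]; 
  for( int c=0; c<fr.numCols(); c++ ) 
    r[c] = fr.vec(c).at(i); 
  return r; 
}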
15 
Sparkling Water: Spark and H2O 
● Convert RDDs <==> Frames 
● In memory, simple fast call 
● In process, no external tooling needed 
● Distributed – data does not move* 
● Eager, not Lazy 
● Makes a data copy! 
● H2O data is highly compressed 
● Often 1/4 to 1/10th original size 
*See fine print
16 
Spark Partitions and H2O Chunks 
ID   age  sex  salary 
101  27   M    50000 
102  21   F    25000 
103  45   -    120000 
104  53   M    150000 
One JVM heap holds this table either as a Spark Partition[User] or as H2O Chunks. 
These structures are limited to one JVM heap; there can be many per heap, limited only by memory. 
*Only the data correspondence is shown; a real data copy is required!
17 
Spark RDDs and H2O Frames 
Four JVM heaps each hold one piece of the same (ID, age, sex, salary) table: 
heap #1 rows 101-108, heap #2 rows 109-116, heap #3 rows 117-124, heap #4 rows 125-132. 
The whole table is one H2O Frame and one Spark RDD; each column is a Vec. 
Within a heap, each column piece is an H2O Chunk and each row block is a Spark Partition.
18 
Sparkling Water 
● Convert to H2O Frame 
● Eager, executes RDDs immediately 
● Makes a compressed H2O copy 
val fr = toDataFrame(sparkCx,rdd) 
● Convert to Spark RDD 
● Lazy, defines a normal RDD 
● When executed, acts as a checkpoint 
val rdd = toRDD(sparkCx,fr)
19 
Distributed Coding Taxonomy 
● No Distribution Coding: 
● Whole Algorithms, Whole Vector-Math 
● REST + JSON: e.g. load data, GLM, get results 
● R, Python, Web, bash/curl 
● Simple Data-Parallel Coding: 
● Map/Reduce-style: e.g. Any dense linear algebra 
● Java/Scala foreach* style 
● Complex Data-Parallel Coding 
● K/V Store, Graph Algo's, e.g. PageRank
20 
Simple Data-Parallel Coding 
● Map/Reduce Per-Row: Stateless 
● Example from Linear Regression: Σ y² 
double sumY2 = new MRTask() { 
double map( double d ) { return d*d; } 
double reduce( double d1, double d2 ) { 
return d1+d2; 
} 
}.doAll( vecY ); 
● Auto-parallel, auto-distributed 
● Fortran speed, Java Ease
21 
Simple Data-Parallel Coding 
● Scala version in development: 
MR(var X:Double = 0) { 
  def map = (d:Double) => { X*X } 
  def reduce = (that) => { X+that.X } 
}.doAll( vecY ); 
● Or: 
MR(0).map(_*_).reduce(add).doAll( vecY );
22 
Simple Data-Parallel Coding 
● Map/Reduce Per-Row: Stateful 
● Linear Regression Pass 1: Σ x, Σ y, Σ y² 
class LRPass1 extends MRTask { 
double sumX, sumY, sumY2; // I Can Haz State? 
void map( double X, double Y ) { 
sumX += X; sumY += Y; sumY2 += Y*Y; 
} 
void reduce( LRPass1 that ) { 
sumX += that.sumX ; 
sumY += that.sumY ; 
sumY2 += that.sumY2; 
} 
}
23 
Simple Data-Parallel Coding 
● Scala version in development: 
MR { var X, Y, X2=0.0; var nrows=0L; 
def map = (x:Double, y:Double) => 
{ X=x; Y=y; X2=x*x; nrows=1 } 
def reduce = (that) => 
{ X += that.X ; Y += that.Y; 
X2+= that.X2; nrows += that.nrows } 
}.doAll(vecX,vecY)
24 
Simple Data-Parallel Coding 
● Map/Reduce Per-Row: Batch Stateful 
class LRPass1 extends MRTask { 
double sumX, sumY, sumY2; 
void map( Chunk CX, Chunk CY ) {// Whole Chunks 
for( int i=0; i<CX.len; i++ ){// Batch! 
double X = CX.at(i), Y = CY.at(i); 
sumX += X; sumY += Y; sumY2 += Y*Y; 
} 
} 
void reduce( LRPass1 that ) { 
sumX += that.sumX ; 
sumY += that.sumY ; 
sumY2 += that.sumY2; 
} 
}
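A hedged usage sketch: doAll(...) returns the task itself (as the Uniques example later shows), so the rolled-up sums can be read straight off it.
LRPass1 lr = new LRPass1().doAll( vecX, vecY ); 
double meanY = lr.sumY / vecY.length(); // e.g. the mean of Y from the rolled-up sum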
25 
Other Simple Examples 
● Filter & Count (underage males): 
● (can pass in any number of Vecs or a Frame) 
long count = new MRTask() { 
long map( long age, long sex ) { 
return (age<=17 && sex==MALE) ? 1 : 0; 
} 
long reduce( long d1, long d2 ) { 
return d1+d2; 
} 
}.doAll( vecAge, vecSex ); 
● Scala syntax 
MR(0).map(_('age)<=17 && _('sex)==MALE ) 
.reduce(add).doAll( frame );
26 
Other Simple Examples 
● Filter into new set (underage males): 
● Can write or append subset of rows 
– (append order is preserved) 
class Filter extends MRTask { 
void map(Chunk CRisk, Chunk CAge, Chunk CSex){ 
for( int i=0; i<CAge.len; i++ ) 
if( CAge.at(i)<=17 && CSex.at(i)==MALE ) 
CRisk.append(CAge.at(i)); // build a set 
} 
}; 
Vec risk = new AppendableVec(); 
new Filter().doAll( risk, vecAge, vecSex ); 
...risk... // all the underage males
27 
Other Simple Examples 
● Filter into new set (underage males): 
● Can write or append subset of rows 
– (append order is preserved) 
class Filter extends MRTask { 
void map(Chunk CRisk, Chunk CAge, Chunk CSex){ 
for( int i=0; i<CAge.len; i++ ) 
if( CAge.at(i)<=17 && CSex.at(i)==MALE ) 
CRisk.append(CAge.at(i)); // build a set 
} 
}; 
Vec risk = new AppendableVec(); 
new Filter().doAll( risk, vecAge, vecSex ); 
...risk... // all the underage males
28 
Other Simple Examples 
● Group-by: count of car-types by age 
class AgeHisto extends MRTask { 
long carAges[][]; // count of cars by age 
void map( Chunk CAge, Chunk CCar ) { 
carAges = new long[numAges][numCars]; 
for( int i=0; i<CAge.len; i++ ) 
carAges[(int)CAge.at(i)][(int)CCar.at(i)]++; 
} 
void reduce( AgeHisto that ) { 
for( int i=0; i<carAges.length; i++ ) 
for( int j=0; j<carAges[i].length; j++ ) 
carAges[i][j] += that.carAges[i][j]; 
} 
}
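A hedged usage sketch, again assuming doAll(...) returns the task:
long[][] counts = new AgeHisto().doAll( vecAge, vecCar ).carAges; 
// counts[age][car] = number of rows with that age and car type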
29 
Other Simple Examples 
● Group-by: count of car-types by age 
Setting carAges in map() makes it an output field: 
private to each map() call, single-threaded write access, 
and it must be rolled up in the reduce() call. 
class AgeHisto extends MRTask { 
long carAges[][]; // count of cars by age 
void map( Chunk CAge, Chunk CCar ) { 
carAges = new long[numAges][numCars]; 
for( int i=0; i<CAge.len; i++ ) 
carAges[(int)CAge.at(i)][(int)CCar.at(i)]++; 
} 
void reduce( AgeHisto that ) { 
for( int i=0; i<carAges.length; i++ ) 
for( int j=0; j<carAges[i].length; j++ ) 
carAges[i][j] += that.carAges[i][j]; 
} 
}
30 
Other Simple Examples 
● Uniques 
● Uses distributed hash set 
class Uniques extends MRTask { 
DNonBlockingHashSet<Long> dnbhs = new ...; 
void map( long id ) { dnbhs.add(id); } 
void reduce( Uniques that ) { 
dnbhs.putAll(that.dnbhs); 
} 
}; 
long uniques = new Uniques(). 
doAll( vecVisitors ).dnbhs.size();
31 
Other Simple Examples 
● Uniques 
● Uses distributed hash set 
Setting dnbhs in <init> makes it an input field: 
shared across all map() calls, often read-only. 
This one is written, so it needs a reduce. 
class Uniques extends MRTask { 
DNonBlockingHashSet<Long> dnbhs = new ...; 
void map( long id ) { dnbhs.add(id); } 
void reduce( Uniques that ) { 
dnbhs.putAll(that.dnbhs); 
} 
}; 
long uniques = new Uniques(). 
doAll( vecVisitors ).dnbhs.size();
32 
Summary: Write (parallel) Java 
● Most simple Java “just works” 
● Scala API is experimental, but will also "just work" 
● Fast: parallel distributed reads, writes, appends 
● Reads same speed as plain Java array loads 
● Writes, appends: slightly slower (compression) 
● Typically memory bandwidth limited 
– (may be CPU limited in a few cases) 
● Slower: conflicting writes (but follows strict JMM) 
● Also supports transactional updates
33 
Summary: Writing Analytics 
● We're writing Big Data Distributed Analytics 
● Deep Learning 
● Generalized Linear Modeling (ADMM, GLMNET) 
– Logistic Regression, Poisson, Gamma 
● Random Forest, GBM, KMeans++, KNN 
● Solidly working on 100G datasets 
● Testing Tera Scale Now 
● Paying customers (in production!) 
● Come write your own (distributed) algorithm!!!
34 
Q & A 
0xdata.com 
https://github.com/0xdata/h2o 
https://github.com/0xdata/h2o-dev 
https://github.com/0xdata/h2o-perrier
35 
Cool Systems Stuff... 
● … that I ran out of space for 
● Reliable UDP, integrated w/RPC 
● TCP is reliably UNReliable 
● Already have a reliable UDP framework, so no prob 
● Fork/Join Goodies: 
● Priority Queues 
● Distributed F/J 
● Surviving fork bombs & lost threads 
● K/V does JMM via hardware-like MESI protocol
36 
Speed Concerns 
● How fast is fast? 
● Data is Big (by definition!) & must see it all 
● Typically: less math than memory bandwidth 
● So decompression happens while waiting for mem 
● More (de)compression is better 
● Currently 15 compression schemes 
● Picked per-chunk, so can (does) vary across dataset 
● All decompression schemes take 2-10 cycles max 
● Time leftover for plenty of math
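A hedged illustration of the per-chunk idea (not one of H2O's actual 15 schemes): if every value in a chunk fits a small range, store byte offsets from the minimum and decode with a couple of ALU ops.
// Toy per-chunk compressor: 8x smaller when the value range fits in a byte
static Object compress( long[] chunk ) { 
  long min = Long.MAX_VALUE, max = Long.MIN_VALUE; 
  for( long v : chunk ) { min = Math.min(min,v); max = Math.max(max,v); } 
  if( max - min < 256 ) {            // 1 byte per element 
    byte[] packed = new byte[chunk.length]; 
    for( int i=0; i<chunk.length; i++ ) packed[i] = (byte)(chunk[i] - min); 
    return packed;                   // decode: min + (packed[i] & 0xFF), a few cycles 
  } 
  return chunk;                      // fall back to uncompressed 
}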
37 
Speed Concerns 
● Serialization: 
● Rarely send Big Data around (there's too much of it; 
access must normally be node-local) 
● Instead it's POJO's doing the math (Histograms, 
Gram Matrix, sums & variances, etc) 
● Bytecode weaver on class load 
● Write fields via Unsafe into DirectByteBuffers 
● 2-byte unique token defines type (and nested types) 
● Compression on that too! (more CPU than network)
38 
Serialization 
● Write fields via Unsafe into DirectByteBuffers 
● All from simple JIT'd code - 
– Just the loads & stores, nothing else 
● 2-byte token once per top-level RPC 
– (more tokens if subclassed objects used) 
● Streaming async NIO 
● Multiple shared TCP & UDP channels 
– Small stuff via UDP & big via TCP 
● Full app-level retry & error recovery 
– (can pull cable & re-insert & all will recover)
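A hedged sketch of the underlying trick (not H2O's generated code; the Point class and its fields are made-up examples): read primitive fields via Unsafe field offsets and write them straight into a DirectByteBuffer.
import java.lang.reflect.Field; 
import java.nio.ByteBuffer; 
import sun.misc.Unsafe; 

class UnsafeWrite { 
  static final Unsafe U; 
  static { 
    try { 
      Field f = Unsafe.class.getDeclaredField("theUnsafe"); 
      f.setAccessible(true); 
      U = (Unsafe)f.get(null); 
    } catch( Exception e ) { throw new RuntimeException(e); } 
  } 
  static class Point { double x = 1.5; long id = 42; } // example POJO 

  static ByteBuffer write( Point p ) throws Exception { 
    ByteBuffer bb = ByteBuffer.allocateDirect(16); 
    long xOff  = U.objectFieldOffset(Point.class.getDeclaredField("x")); 
    long idOff = U.objectFieldOffset(Point.class.getDeclaredField("id")); 
    bb.putDouble(U.getDouble(p, xOff)); // just the load & the store 
    bb.putLong  (U.getLong  (p, idOff)); 
    bb.flip(); 
    return bb; 
  } 
}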
39 
Map / Reduce 
● Map: Once-per-chunk; typically 1000's per-node 
● Using Fork/Join for fine-grained parallelism 
● Reduce: reduce-early-often – after every 2 maps 
● Deterministic, same Maps, same rows every time 
● Until all the Maps & Reduces on a Node are done 
● Then ship results over-the-wire 
● And Reduce globally in a log-tree rollup 
● Network latency is 2 log-tree traversals
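A hedged sketch of the node-local half of this using plain JDK Fork/Join (not H2O's MRTask): each leaf task maps one chunk, and results reduce pairwise up the task tree.
import java.util.concurrent.ForkJoinPool; 
import java.util.concurrent.RecursiveTask; 

class SumChunks extends RecursiveTask<Double> { 
  final double[][] chunks; final int lo, hi; 
  SumChunks( double[][] chunks, int lo, int hi ) { this.chunks=chunks; this.lo=lo; this.hi=hi; } 
  protected Double compute() { 
    if( hi - lo == 1 ) {             // "map": one chunk per leaf task 
      double s = 0; 
      for( double d : chunks[lo] ) s += d; 
      return s; 
    } 
    int mid = (lo + hi) >>> 1; 
    SumChunks left = new SumChunks(chunks, lo, mid); 
    left.fork();                     // run the left half in parallel 
    double right = new SumChunks(chunks, mid, hi).compute(); 
    return right + left.join();      // "reduce": combine two halves 
  } 
} 
// double sum = ForkJoinPool.commonPool().invoke(new SumChunks(chunks, 0, chunks.length));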
40 
Fork/Join Experience 
● Really Good (except when it's not) 
● Good Stuff: easy to write... 
● (after a steep learning curve) 
● Works! Fine to have many many small jobs, load 
balances across CPUs, keeps 'em all busy, etc. 
● Full-featured, flexible 
● We've got 100's of uses of it scattered throughout
41 
Fork / Join Experience 
● Really Good (except when it's not) 
● Blocking threads is hard on F/J 
– (ManagedBlocker.block API is painful) 
– Still get thread starvation sometimes 
● "CountedCompleters" – CPS by any other name 
– Painful to write explicit-CPS in Java 
● No priority queues – a Must Have 
● And no Java thread priorities 
● So built up priority queues over F/J & JVM
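For reference, a hedged sketch of the ManagedBlocker API called painful above: wrap the blocking wait so the F/J pool can compensate with a spare thread.
import java.util.concurrent.CountDownLatch; 
import java.util.concurrent.ForkJoinPool; 

class LatchBlocker implements ForkJoinPool.ManagedBlocker { 
  final CountDownLatch latch; 
  LatchBlocker( CountDownLatch latch ) { this.latch = latch; } 
  public boolean block() throws InterruptedException { latch.await(); return true; } 
  public boolean isReleasable() { return latch.getCount() == 0; } 
} 
// Inside an F/J task: ForkJoinPool.managedBlock(new LatchBlocker(latch));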
42 
Fork/Join Experience 
● By default, exceptions are silently dropped 
● Usual symptom: all threads idle, but job not done 
● Complete maintenance disaster – must catch & 
track & log all exceptions 
– (and even pass around cluster distributed) 
● Forgotten “tryComplete()” not too hard to track 
● Fork-Bomb – must cap all thread pools 
● Which can lead to deadlock 
● Which leads to using CPS-style occasionally 
● Despite issues, I'd use it again
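A hedged sketch of the workaround described above (not H2O's code): catch everything inside compute(), record it yourself, and fail the task explicitly instead of letting F/J swallow it.
import java.util.concurrent.CountedCompleter; 
import java.util.concurrent.atomic.AtomicReference; 

abstract class LoggedTask extends CountedCompleter<Void> { 
  static final AtomicReference<Throwable> FIRST_ERROR = new AtomicReference<>(); 
  public void compute() { 
    try { 
      run();                              // the real work 
      tryComplete(); 
    } catch( Throwable t ) { 
      FIRST_ERROR.compareAndSet(null, t); // track & log it yourself 
      completeExceptionally(t);           // don't let it vanish silently 
    } 
  } 
  abstract void run(); 
}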
43 
The Platform 
Each JVM runs the same stack: 
User code extends MRTask 
extends DRemoteTask (distributed Fork/Join) 
extends DTask 
extends Iced (serialized to byte[] via AutoBuffer) 
RPC; data on NFS / HDFS 
Nodes talk to each other via K/V get/put over UDP / TCP
44 
TCP Fails 
● In <5mins, I can force a TCP fail on Linux 
● "Fail": means Server opens+writes+closes 
● NO ERRORS 
● Client gets no data, no errors 
● In my lab (no virtualization) or EC2 
● Basically, H2O can mimic a DDOS attack 
● And Linux will "cheat" on the TCP protocol 
● And cancel valid, in-progress, TCP handshakes 
● Verified w/Wireshark
45 
TCP Fails 
● Any formal verification? (yes lots) 
● Of recent Linux kernels? 
● Ones with DDOS-defense built-in?
