© 2015 IBM Corporation
S7/8: SystemML’s Optimizer and Runtime
Matthias Boehm (1), Arvind C. Surve (2)
(1) IBM Research – Almaden
(2) IBM Spark Technology Center
IBM Research
Abstraction: The Good, the Bad and the Ugly
q = t(X) %*% (w * (X %*% v))
[adapted from Peter Alvaro: "I See What You Mean", Strange Loop, 2015]
Simple & Analysis-Centric
Data Independence
Platform Independence
Adaptivity
(Missing) Size Information
Operator Selection
(Missing) Rewrites
Distributed Operations
Distributed Storage
(Implicit) Copy-on-Write
Data Skew
Load Imbalance
Latency
Complex Control Flow
Local / Remote Memory Budgets
The Ugly: Expectations ≠ Reality
→ Understanding of optimizer and runtime techniques underpinning declarative, large-scale ML
Efficiency & Performance
Outline
§ Common Framework
§ Optimizer-Centric Techniques
§ Runtime-Centric Techniques
– ParFor Optimizer/Runtime
– Buffer Pool + Specific Optimizations
– Spark-Specific Rewrites
– Partitioning-Preserving Operations
– Update In-Place
– Ongoing Research (CLA)
Optimization through ParFor
§ Motivation
– SystemML focuses primarily on data parallelism
– Dedicated parfor construct for task parallelism
§ ParFor approach:
– Complementary parfor parallelization strategies
– Cost-based optimization framework for task-parallel ML
– Memory budget as common constraint
Recap: Basic HOP DAG Compilation
Example Pearson Correlation
§ DML Script
§ HOP DAG
X = read( "./in/X" ); #data on HDFS
Y = read( "./in/Y" );
m = nrow(X);
sigmaX = sqrt( centralMoment(X,2)*(m/(m-1.0)) );
sigmaY = sqrt( centralMoment(Y,2)*(m/(m-1.0)) );
r = cov(X,Y) / (sigmaX * sigmaY);
write( r, "./out/r" );
[Figure: HOP DAG of the script. Inputs X ("./in/X", 10^6 x 1) and Y ("./in/Y", 10^6 x 1); operators b(cov), 2x b(cm) with constant 2, 2x b(*), 2x u(sqrt), b(*), 2x b(/), b(-) over the constants 1,000,000 and 1; output r ("./out/r"). Shown w/o constant folding (1.000001).]
u() … unary operator
b() … binary operator
cov … covariance
cm … central moment
sqrt … square root
$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$
Exploit Spark/MR
data parallelism
if beneficial/required
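The script can be mirrored in plain Python to see what each built-in contributes. This is only a sketch: it assumes that centralMoment(X,2) is the population second central moment (denominator m) and that cov(X,Y) is the sample covariance (denominator m-1), which is consistent with the m/(m-1.0) correction in the script. The function names are illustrative and not part of SystemML.

```python
import math

def central_moment2(x):
    """Population second central moment: mean of squared deviations."""
    m = len(x)
    mu = sum(x) / m
    return sum((v - mu) ** 2 for v in x) / m

def sample_cov(x, y):
    """Sample covariance with the (m - 1) denominator."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (m - 1)

def pearson(x, y):
    """r = cov(X,Y) / (sigmaX * sigmaY), following the DML script step by step."""
    m = len(x)
    sigma_x = math.sqrt(central_moment2(x) * (m / (m - 1.0)))
    sigma_y = math.sqrt(central_moment2(y) * (m / (m - 1.0)))
    return sample_cov(x, y) / (sigma_x * sigma_y)

x = [1.0, 2.0, 3.0, 4.0]
r_pos = pearson(x, [2.0, 4.0, 6.0, 8.0])   # perfectly correlated
r_neg = pearson(x, [8.0, 6.0, 4.0, 2.0])   # perfectly anti-correlated
```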
Running Example: Pairwise Pearson Correlation
§ Representative for more complex bivariate statistics
(Pearson's R, ANOVA F, chi-squared, degrees of freedom, p-value, Cramér's V, Spearman, etc.)
D = read("./input/D");
m = nrow(D);
n = ncol(D);
R = matrix(0, rows=n, cols=n);
parfor( i in 1:(n-1) ) {
  X = D[ ,i];
  m2X = centralMoment(X,2);
  sigmaX = sqrt( m2X*(m/(m-1.0)) );
  parfor( j in (i+1):n ) {
    Y = D[ ,j];
    m2Y = centralMoment(Y,2);
    sigmaY = sqrt( m2Y*(m/(m-1.0)) );
    R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
  }
}
write(R, "./output/R");
Challenges:
• Triangular nested loop
• Column-wise access on unordered distributed data
• Bivariate all-to-all data shuffling pattern
Exploit task and data parallelism if beneficial/required
Overview Parallelization Strategies
§ Conceptual Design: Master/worker
– Task: group of parfor iterations
§ Task Partitioning
– Naive, static, fixed, factoring, factoring_cmax
– Task overhead vs load balance?
§ Task Execution
– Local, remote (Spark/MR), remoteDP (Spark/MR)
– Various runtime optimizations
– Degree of parallelism/IO/latency?
§ Result Aggregation
– Local memory, local file, remote (Spark/MR)
– W/ and w/o compare
– Result locality/IO/latency?
n = 12
parfor( i in 1:(n-1) ) {
X = D[ ,i];
…
R[i,j] = …
}
→ Optimizer leverages these to generate efficient execution plans
Example Task Partitioning
§ Scenario: k=24 workers, 10,000 iterations
[Figure: number of iterations per task (y-axis) over task index (x-axis), one panel per partitioning scheme]
Scheme                 Tasks
Naive                  1 to 10,000 (1 iteration per task)
Static                 1 to 24
Fixed(250)             1 to 40
Factoring              1 to 208
Factoring CMAX (150)   1 to 228
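The task counts in this scenario can be reproduced with a small sketch of the partitioning schemes. The rounding rule used for factoring (each batch hands out k tasks of size ceil(remaining / (2k))) is an assumption; it yields 208 tasks for k=24 and 10,000 iterations, but with a cap the count may differ slightly from the figure. All function names are hypothetical.

```python
import math

def _chunks(n, size):
    """Split n iterations into tasks of at most `size` iterations."""
    tasks, remaining = [], n
    while remaining > 0:
        t = min(size, remaining)
        tasks.append(t)
        remaining -= t
    return tasks

def partition_naive(n):
    return [1] * n                          # one iteration per task

def partition_static(n, k):
    return _chunks(n, math.ceil(n / k))     # one task per worker

def partition_fixed(n, size):
    return _chunks(n, size)                 # user-defined task size

def partition_factoring(n, k, cmax=None):
    """Waves of k equal tasks; every wave covers about half of the remaining work."""
    tasks, remaining = [], n
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * k)))
        if cmax is not None:
            size = min(size, cmax)          # optional cap on the task size
        for _ in range(k):
            if remaining <= 0:
                break
            t = min(size, remaining)
            tasks.append(t)
            remaining -= t
    return tasks

n, k = 10_000, 24
naive = partition_naive(n)
static = partition_static(n, k)
fixed = partition_fixed(n, 250)
factoring = partition_factoring(n, k)
factoring_cmax = partition_factoring(n, k, cmax=150)
```

Naive partitioning maximizes load balance at the price of per-task overhead, static minimizes overhead but cannot rebalance, and factoring sits in between by starting with large tasks and finishing with small ones.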
Task Execution: Local and Remote Parallelism
Local execution (multicore): ParFOR (local)
– Task partitioning fills a task queue:
  w1: i, {1,2,3}
  w2: i, {4,5,6}
  w3: i, {7,8}
  w4: i, {9,10}
  w5: i, {11}
– Local ParWorker 1 ... Local ParWorker k:
  while( w ← deq() )
    foreach pi ∈ w
      execute(prog(pi))
– Parallel result aggregation
Remote execution (cluster): ParFOR (remote)
– Task partitioning produces the same tasks w1 ... w5
– Hadoop ParWorker Mapper 1 ... ParWorker Mapper k:
  map(key, value)
    w ← parse(value)
    foreach pi ∈ w
      execute(prog(pi))
– Parallel result aggregation (... A|MATRIX|./out/A7tmp)
Hybrid parallelism: combinations of local/remote and data-parallel jobs
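The local worker loop (dequeue a task, run the parfor body for each of its iterations) can be sketched with threads and a shared queue. This is a minimal illustration, not SystemML's implementation: run_parfor_local and the shared result dictionary are hypothetical, and result aggregation is reduced to a locked dictionary write.

```python
import queue
import threading

def run_parfor_local(tasks, body, k):
    """Run body(i) for every iteration i in `tasks` using k local worker threads."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}                  # iteration -> result of the parfor body
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()    # w <- deq()
            except queue.Empty:
                return                            # no tasks left
            for i in task:                        # foreach pi in w
                value = body(i)                   # execute(prog(pi))
                with lock:
                    results[i] = value

    threads = [threading.Thread(target=worker) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Tasks as shown in the task queue of the figure.
tasks = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10], [11]]
out = run_parfor_local(tasks, lambda i: i * i, k=2)
```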
Task Execution: Runtime Optimizations
§ Data Partitioning
– Problem: Repeated MR jobs for indexed access
– Access-awareness (cost estimation, correct plan generation)
– Operators: local file-based, remote MR job
§ Data Locality
– Problem: Co-location of parfor tasks to partitions/matrices
– Location reporting per logical parfor task (e.g., for parfor(i) → D[, i])
parfor( i in 1:(n-1) ) {
X = D[ ,i]; …
parfor( j in (i+1):n ){
Y = D[ ,j]; …
}}
[Figure: data locality example with partitions and task file]
Partitions on Node 1: D1, D2, D6, D7, D8
Partitions on Node 2: D3, D4, D5, D9, D10, D11
Task file:
  w1: i, {1,2,3}
  w2: i, {4,5,6}
  w3: i, {7,8}
  w4: i, {9,10}
  w5: i, {11}
Reported locations (one entry per task): Node 1; Node 2; Node 1; Node 1, 2; Node 2
Optimization Framework – Problem Formulation
§ Design: Runtime optimization for each top-level parfor
§ Plan Tree P
– Nodes N_P
• Exec type et
• Parallelism k
• Attributes A
– Height h
– Exec contexts EC_P
§ Plan Tree Optimization Problem
–
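A hedged reading of the optimization problem, based only on the legend of this slide (cm as memory constraint and ck as parallelism constraint per execution context) and on the time- and memory-based cost model: minimize the estimated execution time of the plan tree while every execution context respects its budgets. The symbols $\hat{T}$ (time estimate), $\hat{M}$ (memory estimate), $K$ (degree of parallelism) and $r(\cdot)$ (root node of a plan tree or execution context) are notation assumed here.

```latex
\min_{P} \; \hat{T}\big(r(P)\big)
\quad \text{s.t.} \quad
\forall\, ec \in EC_P:\;
\hat{M}\big(r(ec)\big) \le cm_{ec}
\;\wedge\;
K\big(r(ec)\big) \le ck_{ec}
```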
[Figure: example plan trees with nodes ParFOR, Generic, RIX, LIX, b(cm), b(cov), annotated with execution contexts ec0 and ec1 (MR) and the per-context constraints cm_ec = 600 MB, ck_ec = 1 and cm_ec = 1024 MB, ck_ec = 16]
ec … execution context
cm … memory constraint
ck … parallelism constraint
[M. Boehm et al.: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7), 2014]
[M. Boehm et al.: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs. CoRR, 2015]
Optimization Framework – Cost Model / Optimizer
§ Overview Heuristic Optimizer
– Time- and memory-based cost model w/o shared reads
– Heuristic high-impact rewrites
– Transformation-based search strategy with global opt scope
§ Cost Model
– HOP DAG size propagation
– Worst-case memory estimates
– Time estimates
– Plan tree statistics aggregation
[Figure: plan tree P (ParFOR with k=4, nodes Generic, RIX, LIX, b(cm), b(cov)) mapped to HOP DAGs with propagated sizes d1 x d2 and memory estimates M = (<output mem>, <operation mem>)]
D:       d1=1M, d2=10   M = (80 MB, 80 MB)
RIX (X): d1=1M, d2=1    M = (8 MB, 88 MB)
Y:       d1=1M, d2=1    M = (8 MB, 8 MB)
b(cm):   d1=0, d2=0     M = (0 MB, 8 MB)
b(cov):  d1=0, d2=0     M = (0 MB, 16 MB)
Aggregated: M=88MB per worker, M=352MB for the parfor with k=4
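The numbers in this figure are consistent with a simple worst-case estimate: a dense matrix needs rows x cols x 8 bytes, an operation needs its inputs plus its output, and a parfor needs k times the largest operation of its body. The sketch below reproduces them under exactly these assumptions; the helper names are hypothetical and the aggregation rule is one reading of the slide, not a statement about SystemML's cost model.

```python
MB = 10**6  # decimal megabytes, so that a 1M x 10 dense matrix is 80 MB

def dense_size_mb(rows, cols, bytes_per_cell=8):
    """Worst-case (dense, double precision) size of a matrix in MB."""
    return rows * cols * bytes_per_cell / MB

def op_mem_mb(inputs, output):
    """Operation memory: all inputs and the output are held at the same time."""
    return sum(dense_size_mb(*dims) for dims in inputs) + dense_size_mb(*output)

D = (1_000_000, 10)      # d1=1M, d2=10
X = (1_000_000, 1)       # d1=1M, d2=1
SCALAR = (0, 0)          # scalars are shown with d1=0, d2=0

rix = op_mem_mb([D], X)            # X = D[ ,i]
cm = op_mem_mb([X], SCALAR)        # centralMoment(X,2)
cov = op_mem_mb([X, X], SCALAR)    # cov(X,Y)

body_mem = max(rix, cm, cov)       # largest operation in the loop body
k = 4
parfor_mem = k * body_mem          # k workers run the body concurrently
```

With a memory constraint cm of the execution context, the optimizer can compare parfor_mem against the budget and reduce k or switch to remote execution if it does not fit.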
Hands-On Lab: Task-Parallel ParFor Programs
§ Exercise: Pairwise Pearson Correlation
– a) Simple for loop w/ -stats
– b) Task-parallel parfor w/ -stats
D = rand(rows=100000, cols=100);
m = nrow(D);
n = ncol(D);
R = matrix(0, rows=n, cols=n);
parfor( i in 1:(n-1) ) {
  X = D[ ,i];
  m2X = centralMoment(X,2);
  sigmaX = sqrt( m2X*(m/(m-1.0)) );
  parfor( j in (i+1):n ) {
    Y = D[ ,j];
    m2Y = centralMoment(Y,2);
    sigmaY = sqrt( m2Y*(m/(m-1.0)) );
    R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
  }
}
write(R, "./tmp/R", format="binary");
Buffer Pool Overview
§ Motivation
– Exchange of intermediates between local and remote operations
(HDFS, RDDs, GPU device memory)
– Eviction of in-memory objects (integrated with garbage collector)
§ Primitives
– acquireRead, acquireModify, release, exportData, getRdd, getBroadcast
§ Spark Specifics
– Lineage tracking RDDs/broadcasts
– Guarded RDD collect/parallelize
– Partitioned broadcast variables
[Figure: MatrixObject/WriteBuffer with lineage tracking]
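The interplay of pinning and eviction behind acquireRead and release can be illustrated with a toy cacheable object. This is a sketch only: the class and method names are hypothetical, eviction is triggered explicitly here (whereas the slide states that eviction is integrated with the garbage collector), and evicted data goes to a local temporary file via pickle.

```python
import os
import pickle
import tempfile

class CacheableData:
    """Toy cacheable object: an in-memory payload that can be evicted to a local file."""

    def __init__(self, data):
        self._data = data
        self._pins = 0        # number of active readers
        self._path = None     # eviction file, if currently evicted

    def acquire_read(self):
        """Pin the object and return its data, restoring it from disk if evicted."""
        if self._data is None:
            with open(self._path, "rb") as f:
                self._data = pickle.load(f)
            os.remove(self._path)
            self._path = None
        self._pins += 1
        return self._data

    def release(self):
        """Unpin the object so that it becomes a candidate for eviction again."""
        self._pins -= 1

    def evict(self):
        """Write the data to disk and drop the in-memory copy; refused while pinned."""
        if self._pins > 0 or self._data is None:
            return False
        fd, self._path = tempfile.mkstemp(suffix=".bin")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self._data, f)
        self._data = None
        return True

    def in_memory(self):
        return self._data is not None

m = CacheableData([[1.0, 2.0], [3.0, 4.0]])
m.acquire_read()
evicted_while_pinned = m.evict()      # refused: an operation is still reading
m.release()
evicted_after_release = m.evict()     # succeeds: nobody holds the object
was_in_memory = m.in_memory()
restored = m.acquire_read()           # transparently reloaded from the file
m.release()
```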
Spark-Specific Optimizations
§ Spark-Specific Rewrites
– Automatic caching/checkpoint injection
(MEM_DISK / MEM_DISK_SER)
– Automatic repartition injection
§ Operator Selection
– Spark exec type selection
– Transitive Spark exec type
– Physical operator selection
§ Extended ParFor Optimizer
– Deferred checkpoint/repartition injection
– Eager checkpointing/repartitioning
– Fair scheduling for concurrent jobs
– Local degree of parallelism
§ Runtime Optimizations
– Lazy Spark context creation
– Short-circuit read/collect
X = read($1);
y = read($2);
...
r = -(t(X) %*% y);
while(i < maxi & norm_r2 > norm_r2_trgt) {
  q = t(X)%*%(X%*%p) + lambda*p;
  alpha = norm_r2 / (t(p)%*%q);
  w = w + alpha * p;
  old_norm_r2 = norm_r2;
  r = r + alpha * q;
  norm_r2 = sum(r * r);
  beta = norm_r2 / old_norm_r2;
  p = -r + beta * p;
  i = i + 1;
}
...
write(w, $4);
Ex: Checkpoint Injection LinregCG (chkpt X MEM_DISK)
[Figure: Spark executor (24 cores) memory split: 25% user, 75% data & exec (50% min and 75% max)]
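Why X is the variable that gets a checkpoint in the LinregCG example can be seen from a simple read/write analysis of the loop body: X is a matrix that every iteration reads but no statement overwrites, so caching it avoids recomputing or re-reading it per iteration. The sketch below is a simplified stand-in for such a rewrite; the statement encoding and the function name are assumptions made for illustration.

```python
def loop_checkpoint_candidates(body, matrix_vars):
    """Return matrix variables that the loop body reads but never writes."""
    read, written = set(), set()
    for target, sources in body:
        read.update(sources)
        written.add(target)
    return sorted(v for v in read - written if v in matrix_vars)

# One (target, variables read) pair per statement of the LinregCG while-loop body.
linreg_cg_body = [
    ("q", {"X", "p", "lambda"}),
    ("alpha", {"norm_r2", "p", "q"}),
    ("w", {"w", "alpha", "p"}),
    ("old_norm_r2", {"norm_r2"}),
    ("r", {"r", "alpha", "q"}),
    ("norm_r2", {"r"}),
    ("beta", {"norm_r2", "old_norm_r2"}),
    ("p", {"r", "beta", "p"}),
    ("i", {"i"}),
]

# lambda is also read-only but is a scalar, so only matrices are considered.
candidates = loop_checkpoint_candidates(
    linreg_cg_body, matrix_vars={"X", "p", "q", "r", "w"}
)
```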
SystemML on Spark: Lessons Learned
§ Spark over Custom Framework
– Well engineered framework with strong contributor base
– Seamless data preparation and feature engineering
§ Stateful Distributed Caching
– Standing executors with distributed caching and fast task scheduling
– Challenges: task parallelism, memory constraints, fair resource management
§ Memory Efficiency
– Compact data structures to avoid cache spilling (serialization, CSR)
– Custom serialization and compression
§ Lazy RDD Evaluation
– Automatic grouping of operations into distributed jobs, incl partitioning
– Challenges: multiple actions/repeated execution, runtime plan compilation!
§ Declarative ML
– Introduction of Spark backend did not require algorithm changes!
– Automatically exploit distributed caching and partitioning via rewrites
Partitioning-Preserving Operations on Spark
§ Partitioning-preserving ops
– Op is partitioning-preserving if key not changed (guaranteed)
– 1) Implicit: Use restrictive APIs (mapValues() vs mapToPair())
– 2) Explicit: Partition computation w/ declaration of partitioning-preserving
(memory-efficiency via “lazy iterators”)
§ Partitioning-exploiting ops
– 1) Implicit: Operations based on join, cogroup, etc
– 2) Explicit: Custom physical operators on original keys (e.g., zipmm)
[Figure: physical blocking and partitioning]
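The difference between a key-preserving and a key-changing map can be shown without Spark. The sketch below simulates a partitioned collection of matrix blocks keyed by (row block, column block); part_of is a deliberately simple stand-in for a hash partitioner, and all names are hypothetical. A values-only map leaves every block in the partition its key belongs to, whereas a transpose-like map swaps the key components and leaves blocks in the wrong partition, so a shuffle would be needed before any partitioning-exploiting operation.

```python
def part_of(key, num_parts):
    """Stand-in partitioner: place a block by its row-block index."""
    return key[0] % num_parts

def partition(pairs, num_parts):
    parts = [[] for _ in range(num_parts)]
    for key, value in pairs:
        parts[part_of(key, num_parts)].append((key, value))
    return parts

def map_values(parts, f):
    """Keys are untouched, so the existing partitioning stays valid."""
    return [[(k, f(v)) for k, v in part] for part in parts]

def map_to_pair(parts, f):
    """f may change keys; the data stays where it is, the guarantee is lost."""
    return [[f(k, v) for k, v in part] for part in parts]

def is_partitioned(parts):
    """Check that every pair sits in the partition its key maps to."""
    n = len(parts)
    return all(
        part_of(k, n) == idx
        for idx, part in enumerate(parts)
        for k, _ in part
    )

blocks = [((i, 1), float(i)) for i in range(1, 7)]    # blocks (1,1) ... (6,1)
parts = partition(blocks, 3)

scaled = map_values(parts, lambda v: 2 * v)                        # like mapValues()
transposed = map_to_pair(parts, lambda k, v: ((k[1], k[0]), v))    # like mapToPair()

keeps_layout = is_partitioned(scaled)
breaks_layout = not is_partitioned(transposed)
```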
Partitioning-Exploiting ZIPMM
§ Operation: Z = t(X) %*% y
§ Approach: Naive
– Operations: Transpose, Join, Multiplication
– Shuffle: partitions not preserved after transpose, as keys changed (input blocks 1,1 1,2 1,3 become t(X) blocks 1,1 2,1 3,1)
§ Approach: zipmm
– Operations: Join, Transpose & Multiplication
– Avoid unnecessary shuffle (X and y are joined on the original keys 1,1 1,2 1,3)
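The idea behind zipmm can be checked numerically: if X and y are split into aligned row blocks, then t(X) %*% y equals the sum over blocks of t(X_i) %*% y_i, so each block pair can be joined on its original key, transposed locally and multiplied, with only the small partial results being aggregated. The following pure-Python sketch uses hypothetical helper names and nested lists as matrices.

```python
def transpose(block):
    return [list(col) for col in zip(*block)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def zipmm(x_blocks, y_blocks):
    """Z = t(X) %*% y from row-aligned blocks: sum over i of t(X_i) %*% y_i."""
    z = None
    for xb, yb in zip(x_blocks, y_blocks):        # join on the original block keys
        partial = matmul(transpose(xb), yb)       # local transpose and multiply
        z = partial if z is None else add(z, partial)
    return z

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [[1], [0], [2], [1]]

# Two aligned row blocks of X and y.
x_blocks = [X[0:2], X[2:4]]
y_blocks = [y[0:2], y[2:4]]

z_zip = zipmm(x_blocks, y_blocks)
z_ref = matmul(transpose(X), y)    # naive: transpose the whole matrix first
```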
Example Multiclass SVM
§ Example: Multiclass SVM
– Vectors in nrow(X) neither fit into driver nor broadcast
(MapMM not applicable)
– ncol(X) ≤ Bc (zipmm applicable)
parfor(iter_class in 1:num_classes) {
  Y_local = 2 * (Y == iter_class) - 1;
  g_old = t(X) %*% Y_local;
  ...
  while( continue ) {
    Xd = X %*% s;
    ... inner while loop (compute step_sz)
    Xw = Xw + step_sz * Xd;
    out = 1 - Y_local * Xw;
    out = (out > 0) * out;
    g_new = t(X) %*% (out * Y_local) ...
  }
}
Annotations: repart, chkpt X MEM_DISK; chkpt y_local MEM_DISK; zipmm; chkpt Xd, Xw MEM_DISK
Hands-On Lab: Partitioning-Preserving Operations
§ Exercise: MultiClass SVM
– W/o repartition injection
– W/ repartitioning injection
parfor(iter_class in 1:num_classes) {
  Y_local = 2 * (Y == iter_class) - 1;
  g_old = t(X) %*% Y_local;
  ...
  while( continue ) {
    Xd = X %*% s;
    ... inner while loop (compute step_sz)
    Xw = Xw + step_sz * Xd;
    out = 1 - Y_local * Xw;
    out = (out > 0) * out;
    g_new = t(X) %*% (out * Y_local) ...
  }
}
Update In-Place
§ Loop Update In-Place
– 1) ParFor result indexing / intermediates (w/ pinned matrix objects)
– 2) For/while/parfor loops with pure left indexing access to variable
– Both require pinning / shallow serialize to overcome buffer pool serialization
– Example Type 2:
    for(i in 1:nrow(X))
      for(j in 1:ncol(X))
        X[i,j] = i+j;
§ Where we cannot apply Update In-Place
– Matrix object cannot fit into local memory budget (CP only)
– Interleaving operations (mix of update and reference, might be non-obvious)
– Example (update in-place would create incorrect results!):
    R = X;
    X[i,j] = i+j;
    y = sum(R);
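The aliasing problem in the second example can be reproduced in a few lines. DML variables have value semantics, so after R = X a later update of X must not be visible through R. The sketch below contrasts a copy-on-write left indexing with an in-place one; the helper names are hypothetical and nested Python lists stand in for matrices.

```python
import copy

def left_index_copy(x, i, j, value):
    """Copy-on-write update: returns a new matrix, the input is untouched."""
    out = copy.deepcopy(x)
    out[i][j] = value
    return out

def left_index_inplace(x, i, j, value):
    """In-place update: mutates and returns the same object."""
    x[i][j] = value
    return x

def total(x):
    return sum(sum(row) for row in x)

# Safe case: X is the only reference, so updating in place is unobservable.
X = [[0, 0], [0, 0]]
X = left_index_inplace(X, 0, 1, 5)
sum_safe = total(X)

# Interleaved reference with copy-on-write: R keeps the old values.
X2 = [[1, 1], [1, 1]]
R2 = X2
X2 = left_index_copy(X2, 0, 0, 10)
sum_cow = total(R2)

# Interleaved reference with update in-place: R silently changes as well.
X3 = [[1, 1], [1, 1]]
R3 = X3
X3 = left_index_inplace(X3, 0, 0, 10)
sum_inplace = total(R3)
```

Here sum_cow is the expected result, while sum_inplace reflects the update that was only meant for X.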
Hands-On Lab: Update In-Place
§ Exercise: Update In-Place (SystemML master/0.11 only):
– a) Update in-place application (investigate -explain and -stats)
– b) Update in-place not applicable – why?
# a)
for(i in 1:nrow(X))
  for(j in 1:ncol(X))
    X[i,j] = i+j;
# b)
for(i in 1:nrow(X)) {
  for(j in 1:ncol(X)) {
    print(sum(X));
    X[i,j] = i+j;
  }
}
Compressed Linear Algebra
§ Motivation / Problem
– Iterative ML algorithms w/ repeated read-only data access
– IO-bound matrix-vector multiplications → crucial to fit data in memory
– General-purpose heavy-/lightweight techniques too slow / modest comp. ratios
§ Goals
– Performance close to uncompressed
– Good compression ratios
[A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]
Compressed Linear Algebra (2)
§ Approach
– Database compression
– LA over compressed rep.
– Column-compression schemes (OLE, RLE, UC)
– Cache-conscious CLA ops
– Sampling-based compression algorithm
§ Results
Algorithm  Dataset             ULA       Snappy    CLA
GLM        Mnist40m (90GB)     409s      647s      397s
GLM        Mnist240m (540GB)   74,301s   23,717s   2,787s
MLogreg    Mnist40m (90GB)     630s      875s      622s
MLogreg    Mnist240m (540GB)   83,153s   27,626s   4,379s
L2SVM      Mnist40m (90GB)     394s      461s      429s
L2SVM      Mnist240m (540GB)   14,041s   8,423s    2,593s
Up to 26x (GLM on Mnist240m: 74,301s ULA vs 2,787s CLA)
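What "LA over compressed rep." means can be illustrated with run-length encoding (RLE) of single columns and a matrix-vector multiplication that works on the runs directly, without decompressing. This is a strongly simplified sketch: it treats every column on its own, covers only RLE, and uses hypothetical function names and a plain dictionary layout that are not taken from the CLA implementation.

```python
def rle_encode(column):
    """Encode a column as {value: [(start, length), ...]}, skipping zeros."""
    runs = {}
    i, n = 0, len(column)
    while i < n:
        j = i
        while j < n and column[j] == column[i]:
            j += 1                                # extend the current run
        if column[i] != 0:
            runs.setdefault(column[i], []).append((i, j - i))
        i = j
    return runs

def matvec_rle(columns, v, num_rows):
    """q = X %*% v computed directly on the RLE-encoded columns of X."""
    q = [0.0] * num_rows
    for col_runs, vj in zip(columns, v):
        for value, runs in col_runs.items():
            contrib = value * vj                  # one multiply per distinct value
            for start, length in runs:
                for r in range(start, start + length):
                    q[r] += contrib               # add to all rows of the run
    return q

X = [[7, 0], [7, 0], [7, 3], [0, 3], [0, 3]]
v = [2.0, 1.0]

cols = [rle_encode([row[j] for row in X]) for j in range(2)]
q = matvec_rle(cols, v, num_rows=len(X))
```

Because each distinct value is multiplied with the vector entry only once per column, the work depends on the number of runs rather than on the number of cells, which is where both the memory savings and the near-uncompressed performance come from in this toy setting.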
SystemML is Open Source:
• Apache Incubator Project (11/2015)
• Website: http://systemml.apache.org/
• Source code: https://github.com/apache/incubator-systemml

More Related Content

What's hot

Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Ruairi de Frein
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
Alexandros Karatzoglou
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysis
David Gleich
 
pmux
pmuxpmux
pmux
maebashi
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Alexis Perrier
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
Yanchang Zhao
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
Asai Masataro
 
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung HanHomomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
vpnmentor
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
International Islamic University
 
Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...
Erlangen Artificial Intelligence & Machine Learning Meetup
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
Pietro Michiardi
 
Core concepts of C++
Core concepts of C++  Core concepts of C++
Core concepts of C++
Martin Ayvazyan
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
Roger Rafanell Mas
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
LEGATO project
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICS
MohammedMedani4
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
Samuel Bosch
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-pres
jsofra
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
rerngvit yanggratoke
 

What's hot (19)

Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduc...
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysis
 
pmux
pmuxpmux
pmux
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
[AAAI-16] Tiebreaking Strategies for A* Search: How to Explore the Final Fron...
 
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung HanHomomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
Homomorphic Lower Digit Removal and Improved FHE Bootstrapping by Kyoohyung Han
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
 
Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
Core concepts of C++
Core concepts of C++  Core concepts of C++
Core concepts of C++
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
ELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICSELECTRICAL POWER SYSTEMS ECONOMICS
ELECTRICAL POWER SYSTEMS ECONOMICS
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-pres
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 

Viewers also liked

Krishna Sharma
Krishna SharmaKrishna Sharma
Krishna Sharma
Krishna Sharma
 
THE RIG RECRUITMENT Brochure New 2016
THE RIG RECRUITMENT Brochure New 2016THE RIG RECRUITMENT Brochure New 2016
THE RIG RECRUITMENT Brochure New 2016
David Pearson
 
California Home Builders - The Heritage Collection
California Home Builders - The Heritage CollectionCalifornia Home Builders - The Heritage Collection
California Home Builders - The Heritage Collection
Jamie General
 
امتحان دی ماه وب مقدماتی
امتحان دی ماه وب مقدماتیامتحان دی ماه وب مقدماتی
امتحان دی ماه وب مقدماتیsomayeh daneshparvar
 
Qantas News Well-earned Breaks
Qantas News Well-earned BreaksQantas News Well-earned Breaks
Qantas News Well-earned Breaks
Stephanie Christopher
 
California Home Builders - La Ventana
California Home Builders - La VentanaCalifornia Home Builders - La Ventana
California Home Builders - La Ventana
Jamie General
 
Satyajith resume
Satyajith resumeSatyajith resume
Satyajith resume
satyajith shetty
 
5. implicaciones éticas en torno al acceso y uso de la información.
5. implicaciones éticas en torno al acceso y uso de la información.5. implicaciones éticas en torno al acceso y uso de la información.
5. implicaciones éticas en torno al acceso y uso de la información.
Margarita Perez Robles
 
Top Ten things that have been proven to effect software reliability
Top Ten things that have been proven to effect software reliabilityTop Ten things that have been proven to effect software reliability
Top Ten things that have been proven to effect software reliability
Ann Marie Neufelder
 
How to get the most out of your doctor's visits dr. potter
How to get the most out of your doctor's visits dr. potterHow to get the most out of your doctor's visits dr. potter
How to get the most out of your doctor's visits dr. potter
lupusdmv
 
Deber de Informatica Sol Gomez
Deber de Informatica Sol Gomez Deber de Informatica Sol Gomez
Deber de Informatica Sol Gomez
Sol Gomez
 
On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)
Maximiliano Del Torchio
 
Coordenadas curvilineas ortogonales
Coordenadas curvilineas ortogonalesCoordenadas curvilineas ortogonales
Coordenadas curvilineas ortogonales
sosarafael
 
Orçamento programa do município exercício 2016
Orçamento programa do município   exercício 2016Orçamento programa do município   exercício 2016
Orçamento programa do município exercício 2016
Jonhcp
 
Gianluca Fiorelli - SMM Internazionale
Gianluca Fiorelli - SMM InternazionaleGianluca Fiorelli - SMM Internazionale
Gianluca Fiorelli - SMM Internazionale
Elena Minchenok
 
Thinh Hoang Resume
Thinh Hoang Resume Thinh Hoang Resume
Thinh Hoang Resume
Thinh Hoang
 

Viewers also liked (16)

Krishna Sharma
Krishna SharmaKrishna Sharma
Krishna Sharma
 
THE RIG RECRUITMENT Brochure New 2016
THE RIG RECRUITMENT Brochure New 2016THE RIG RECRUITMENT Brochure New 2016
THE RIG RECRUITMENT Brochure New 2016
 
California Home Builders - The Heritage Collection
California Home Builders - The Heritage CollectionCalifornia Home Builders - The Heritage Collection
California Home Builders - The Heritage Collection
 
امتحان دی ماه وب مقدماتی
امتحان دی ماه وب مقدماتیامتحان دی ماه وب مقدماتی
امتحان دی ماه وب مقدماتی
 
Qantas News Well-earned Breaks
Qantas News Well-earned BreaksQantas News Well-earned Breaks
Qantas News Well-earned Breaks
 
California Home Builders - La Ventana
California Home Builders - La VentanaCalifornia Home Builders - La Ventana
California Home Builders - La Ventana
 
Satyajith resume
Satyajith resumeSatyajith resume
Satyajith resume
 
5. implicaciones éticas en torno al acceso y uso de la información.
5. implicaciones éticas en torno al acceso y uso de la información.5. implicaciones éticas en torno al acceso y uso de la información.
5. implicaciones éticas en torno al acceso y uso de la información.
 
Top Ten things that have been proven to effect software reliability
Top Ten things that have been proven to effect software reliabilityTop Ten things that have been proven to effect software reliability
Top Ten things that have been proven to effect software reliability
 
How to get the most out of your doctor's visits dr. potter
How to get the most out of your doctor's visits dr. potterHow to get the most out of your doctor's visits dr. potter
How to get the most out of your doctor's visits dr. potter
 
Deber de Informatica Sol Gomez
Deber de Informatica Sol Gomez Deber de Informatica Sol Gomez
Deber de Informatica Sol Gomez
 
On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)
 
Coordenadas curvilineas ortogonales
Coordenadas curvilineas ortogonalesCoordenadas curvilineas ortogonales
Coordenadas curvilineas ortogonales
 
Orçamento programa do município exercício 2016
Orçamento programa do município   exercício 2016Orçamento programa do município   exercício 2016
Orçamento programa do município exercício 2016
 
Gianluca Fiorelli - SMM Internazionale
Gianluca Fiorelli - SMM InternazionaleGianluca Fiorelli - SMM Internazionale
Gianluca Fiorelli - SMM Internazionale
 
Thinh Hoang Resume
Thinh Hoang Resume Thinh Hoang Resume
Thinh Hoang Resume
 

Similar to Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias Boehm

Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Arvind Surve
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Arvind Surve
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Intel® Software
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
Massimo Schenone
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
Intel® Software
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Vijay Srinivas Agneeswaran, Ph.D
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
MLconf
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
Rizwan Habib
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Vincent Poncet
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
Carol McDonald
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
Cloudera, Inc.
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Chetan Khatri
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
Facultad de Informática UCM
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
MapR Technologies
 
[214]유연하고 확장성 있는 빅데이터 처리
[214]유연하고 확장성 있는 빅데이터 처리[214]유연하고 확장성 있는 빅데이터 처리
[214]유연하고 확장성 있는 빅데이터 처리
NAVER D2
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - Spark
Ivan Morozov
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
Chetan Khatri
 
Accelerating the Development of Efficient CP Optimizer Models
Accelerating the Development of Efficient CP Optimizer ModelsAccelerating the Development of Efficient CP Optimizer Models
Accelerating the Development of Efficient CP Optimizer Models
Philippe Laborie
 

Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias Boehm

  • 1. S7/8: SystemML’s Optimizer and Runtime
    Matthias Boehm (IBM Research – Almaden), Arvind C. Surve (IBM Spark Technology Center)
    © 2015 IBM Corporation
  • 2. Abstraction: The Good, the Bad and the Ugly
    Example: q = t(X) %*% (w * (X %*% v))
    [adapted from Peter Alvaro: "I See What You Mean", Strange Loop, 2015]
    [Figure labels: Simple & Analysis-Centric; Data Independence; Platform Independence; Adaptivity; (Missing) Size Information; Operator Selection; (Missing) Rewrites; Distributed Operations; Distributed Storage; (Implicit) Copy-on-Write; Data Skew; Load Imbalance; Latency; Complex Control Flow; Local / Remote Memory Budgets; Efficiency & Performance]
    The Ugly: Expectations ≠ Reality
    → Understanding of optimizer and runtime techniques underpinning declarative, large-scale ML
  • 3. Outline
    § Common Framework
    § Optimizer-Centric Techniques
    § Runtime-Centric Techniques
      – ParFor Optimizer/Runtime
      – Buffer Pool + Specific Optimizations
      – Spark-Specific Rewrites
      – Partitioning-Preserving Operations
      – Update In-Place
      – Ongoing Research (CLA)
  • 4. Optimization through ParFor
    § Motivation
      – SystemML focuses primarily on data parallelism
      – Dedicated parfor construct for task parallelism
    § ParFor approach
      – Complementary parfor parallelization strategies
      – Cost-based optimization framework for task-parallel ML
      – Memory budget as common constraint
  • 5. Recap: Basic HOP DAG Compilation (Example: Pearson Correlation)
    § DML Script
        X = read( "./in/X" ); #data on HDFS
        Y = read( "./in/Y" );
        m = nrow(X);
        sigmaX = sqrt( centralMoment(X,2)*(m/(m-1.0)) );
        sigmaY = sqrt( centralMoment(Y,2)*(m/(m-1.0)) );
        r = cov(X,Y) / (sigmaX * sigmaY);
        write( r, "./out/r" );
    § HOP DAG
      [Figure: HOP DAG of the script. Inputs X ("./in/X", 10^6 x 1) and Y ("./in/Y", 10^6 x 1); operators b(cov), b(cm), b(*), u(sqrt), b(/), b(-); constants 2, 1,000,000 and 1, shown w/o constant folding (1.000001); output r ("./out/r").]
      Legend: u() … unary operator; b() … binary operator; cov … covariance; cm … central moment; sqrt … square root
    $\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$
    Exploit Spark/MR data parallelism if beneficial/required
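The DML script on this slide can be mirrored in plain Python to make the arithmetic explicit. This is only a sketch: it assumes that `centralMoment(X,2)` is the biased second central moment (divided by m, which is why the script rescales by m/(m-1.0)) and that `cov` is the sample covariance (divided by m-1). The function names are illustrative and not part of SystemML.

```python
import math

def central_moment2(x):
    # Assumed semantics of centralMoment(X,2): biased second central moment.
    m = len(x)
    mean = sum(x) / m
    return sum((v - mean) ** 2 for v in x) / m

def cov(x, y):
    # Assumed semantics of cov(X,Y): sample covariance (divide by m-1).
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (m - 1)

def pearson(x, y):
    # Same structure as the DML script: rescale the moment to the sample variance.
    m = len(x)
    sigma_x = math.sqrt(central_moment2(x) * (m / (m - 1.0)))
    sigma_y = math.sqrt(central_moment2(y) * (m / (m - 1.0)))
    return cov(x, y) / (sigma_x * sigma_y)

r_pos = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Perfectly linear inputs give a correlation of magnitude one, which is a quick sanity check for the rescaling factor.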
  • 6. Running Example: Pairwise Pearson Correlation
    § Representative for more complex bivariate statistics (Pearson's R, Anova F, Chi-squared, Degree of freedom, P-value, Cramers V, Spearman, etc.)
        D = read("./input/D");
        m = nrow(D);
        n = ncol(D);
        R = matrix(0, rows=n, cols=n);
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          m2X = centralMoment(X,2);
          sigmaX = sqrt( m2X*(m/(m-1.0)) );
          parfor( j in (i+1):n ) {
            Y = D[ ,j];
            m2Y = centralMoment(Y,2);
            sigmaY = sqrt( m2Y*(m/(m-1.0)) );
            R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
          }
        }
        write(R, "./output/R");
    § Challenges
      – Triangular nested loop
      – Column-wise access on unordered distributed data
      – Bivariate all-to-all data shuffling pattern
    Exploit task and data parallelism if beneficial/required
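The triangular nested loop of this slide can be sketched in Python, with the outer loop spread over a thread pool as a stand-in for task parallelism. This is an illustration of the loop structure only, not of how SystemML executes parfor; `pairwise_pearson` and its `workers` parameter are hypothetical names, and the correlation is computed with the algebraically equivalent sum-of-products form.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def pearson(x, y):
    # Pearson correlation via centered sums (equivalent to cov / (sigmaX * sigmaY)).
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_pearson(columns, workers=4):
    # columns[i] plays the role of D[ ,i]; only the upper triangle is filled.
    n = len(columns)
    R = [[0.0] * n for _ in range(n)]

    def outer_iteration(i):
        # One outer iteration covers all pairs (i, j) with j > i.
        return i, [pearson(columns[i], columns[j]) for j in range(i + 1, n)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, values in pool.map(outer_iteration, range(n - 1)):
            R[i][i + 1:] = values
    return R

columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
R = pairwise_pearson(columns)
```

Note that the outer iterations do unequal amounts of work (iteration i handles n-i pairs), which is the load-balance issue the task partitioning schemes address.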
  • 7. Overview Parallelization Strategies
    § Conceptual Design: Master/worker
      – Task: group of parfor iterations
    § Task Partitioning
      – Naive, static, fixed, factoring, factoring_cmax
      – Task overhead vs load balance?
    § Task Execution
      – Local, remote (Spark/MR), remoteDP (Spark/MR)
      – Various runtime optimizations
      – Degree of parallelism/IO/latency?
    § Result Aggregation
      – Local memory, local file, remote (Spark/MR)
      – W/ and w/o compare
      – Result locality/IO/latency?
    Example:
        n = 12
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          …
          R[i,j] = …
        }
    → Optimizer leverages these to generate efficient execution plans
  • 8. Example Task Partitioning
    § Scenario: k=24 workers, 10,000 iterations
    [Figure: five bar charts of # of iterations per task. Naive: 1 iteration per task (tasks 1 to 10000); Static: tasks 1 to 24; Fixed(250): tasks 1 to 40; Factoring: tasks 1 to 208; Factoring CMAX (150): tasks 1 to 228.]
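The task counts in this scenario can be reproduced with a small sketch of four of the partitioning schemes. The factoring rule used here (batches of k tasks, each of size ceil(remaining / (2k))) is an assumption based on the classic factoring self-scheduling scheme, not taken from the SystemML source; under that assumption it yields the 208 tasks shown in the figure. Factoring CMAX is omitted because its exact rounding is not recoverable from the slide.

```python
def naive(n):
    # One iteration per task.
    return [1] * n

def static(n, k):
    # k tasks of (almost) equal size.
    size = -(-n // k)  # integer ceil(n / k)
    sizes = []
    remaining = n
    while remaining > 0:
        s = min(size, remaining)
        sizes.append(s)
        remaining -= s
    return sizes

def fixed(n, size):
    # Tasks of a fixed, user-given size.
    return [min(size, n - start) for start in range(0, n, size)]

def factoring(n, k, x=2):
    # Assumed rule: repeatedly hand out a batch of k tasks,
    # each of size ceil(remaining / (x * k)); task sizes shrink per batch.
    sizes = []
    remaining = n
    while remaining > 0:
        size = max(1, -(-remaining // (x * k)))
        for _ in range(k):
            if remaining == 0:
                break
            s = min(size, remaining)
            sizes.append(s)
            remaining -= s
    return sizes

n_iterations, k_workers = 10000, 24
counts = {
    "naive": len(naive(n_iterations)),
    "static": len(static(n_iterations, k_workers)),
    "fixed(250)": len(fixed(n_iterations, 250)),
    "factoring": len(factoring(n_iterations, k_workers)),
}
```

Large early tasks keep scheduling overhead low, while the small trailing tasks let workers that finish early pick up remaining work, which is the trade-off between task overhead and load balance named on the previous slide.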
  • 9. Task Execution: Local and Remote Parallelism
    § Local execution (multicore)
      [Figure: ParFOR (local) with Task Partitioning, Task Queue, Local ParWorker 1 … k, and Parallel Result Aggregation]
        while( w ← deq() )
          foreach pi ∈ w
            execute( prog(pi) )
    § Remote execution (cluster)
      [Figure: ParFOR (remote) with Task Partitioning, Hadoop ParWorker Mapper 1 … k, and Parallel Result Aggregation; result entries such as A|MATRIX|./out/A7tmp]
        map( key, value )
          w ← parse(value)
          foreach pi ∈ w
            execute( prog(pi) )
    § Example tasks: w1: i, {1,2,3}; w2: i, {4,5,6}; w3: i, {7,8}; w4: i, {9,10}; w5: i, {11}
    Hybrid parallelism: combinations of local/remote and data-parallel jobs
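The local worker loop on this slide (dequeue a task, then execute the loop body for every iteration in it) can be sketched with Python threads. This is a minimal illustration, assuming a shared in-memory queue and a results dictionary as the aggregation target; `run_parfor_local` and `body` are hypothetical names, and the task list is the w1 to w5 example from the slide.

```python
import queue
import threading

def run_parfor_local(tasks, body, k):
    # Task queue filled once by the master, as after task partitioning.
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}
    lock = threading.Lock()

    def par_worker():
        # while( w <- deq() ): foreach pi in w: execute(prog(pi))
        while True:
            try:
                w = task_queue.get_nowait()
            except queue.Empty:
                return
            for i in w:
                value = body(i)
                with lock:
                    results[i] = value

    workers = [threading.Thread(target=par_worker) for _ in range(k)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results

tasks = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10], [11]]
results = run_parfor_local(tasks, lambda i: i * i, k=2)
```

Because workers pull tasks on demand, a worker that finishes a short task simply takes the next one, independent of how the tasks were sized.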
  • 10. Task Execution: Runtime Optimizations
    § Data Partitioning
      – Problem: Repeated MR jobs for indexed access
      – Access-awareness (cost estimation, correct plan generation)
      – Operators: local file-based, remote MR job
    § Data Locality
      – Problem: Co-location of parfor tasks to partitions/matrices
      – Location reporting per logical parfor task (e.g., for parfor(i) → D[, i])
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          …
          parfor( j in (i+1):n ) {
            Y = D[ ,j];
            …
          }
        }
    [Figure: column partitions D1 … D11 distributed over Node 1 and Node 2 (one group D1, D2, D6, D7, D8; the other D3, D4, D5, D9, D10, D11), the task file with w1: i, {1,2,3}; w2: i, {4,5,6}; w3: i, {7,8}; w4: i, {9,10}; w5: i, {11}, and the reported locations per task (Node 1, Node 2, or Node 1, 2).]
  • 11. Optimization Framework – Problem Formulation
    § Design: Runtime optimization for each top-level parfor
    § Plan Tree P
      – Nodes NP
        • Exec type et
        • Parallelism k
        • Attributes A
      – Height h
      – Exec contexts ECP
    § Plan Tree Optimization Problem
      – [equation image not preserved]
    [Figure: plan tree with ParFOR, Generic, RIX, LIX, b(cm) and b(cov) nodes, split into execution contexts ec0 and ec1 (MR); constraint annotations cm_ec = 600 MB, ck_ec = 1 and cm_ec = 1024 MB, ck_ec = 16.]
    Legend: ec … execution context; cm … memory constraint; ck … parallelism constraint
    [M. Boehm et al. Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7), 2014]
    [M. Boehm et al. Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs. CoRR, 2015]
  • 12. Optimization Framework – Cost Model / Optimizer
    § Overview Heuristic Optimizer
      – Time- and memory-based cost model w/o shared reads
      – Heuristic high-impact rewrites
      – Transformation-based search strategy with global opt scope
    § Cost Model
      – HOP DAG size propagation
      – Worst-case memory estimates
      – Time estimates
      – Plan tree statistics aggregation
    [Figure: plan tree P (ParFOR with k=4 over Generic, RIX, LIX, b(cm) and b(cov) nodes) and its mapped HOP DAGs over D, X, Y and j. Dimension annotations: d1=0, d2=0; d1=1M, d2=1; d1=1M, d2=10. Memory estimates M = (<output mem>, <operation mem>): (80 MB, 80 MB), (8 MB, 8 MB), (8 MB, 88 MB), (0 MB, 8 MB), (0 MB, 16 MB). Aggregates: M=88 MB and M=352 MB.]
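The memory annotations in this figure are consistent with a simple worst-case model, sketched below. Everything here is an assumption made for illustration: dense matrices cost 8 bytes per cell, scalar outputs count as 0 MB, an operation's memory is the sum of its inputs plus its output, and the parfor estimate is the largest operation estimate in the body times the degree of parallelism k. Assigning the individual numbers to D, RIX, b(cm) and b(cov) is a reading of the figure, not something the slide states.

```python
MB = 10 ** 6  # decimal megabytes, matching 1M x 1 doubles = 8 MB

def dense_mem(rows, cols):
    # Worst-case dense size: 8 bytes per double-precision cell.
    return rows * cols * 8

def operation_mem(input_mems, output_mem):
    # Assumed rule: all inputs and the output are held in memory at once.
    return sum(input_mems) + output_mem

mem_D = dense_mem(10 ** 6, 10)   # input matrix D, 1M x 10
mem_col = dense_mem(10 ** 6, 1)  # one column X or Y, 1M x 1
scalar = 0                       # scalar outputs treated as negligible

rix = operation_mem([mem_D], mem_col)            # right indexing D[ ,i]
cm = operation_mem([mem_col], scalar)            # centralMoment(X,2)
cov = operation_mem([mem_col, mem_col], scalar)  # cov(X,Y)

body_max = max(rix, cm, cov)  # largest single-operation estimate in the body
parfor_k4 = 4 * body_max      # assumed aggregation for k=4 parallel workers

estimates_mb = {
    "D": mem_D // MB,
    "RIX": rix // MB,
    "b(cm)": cm // MB,
    "b(cov)": cov // MB,
    "body": body_max // MB,
    "parfor(k=4)": parfor_k4 // MB,
}
```

Under these assumptions the numbers 80, 88, 8, 16, 88 and 352 MB from the figure fall out directly, which illustrates why the memory budget bounds the degree of parallelism the optimizer can choose.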
  • 13. Hands-On Lab: Task-Parallel ParFor Programs
    § Exercise: Pairwise Pearson Correlation
      – a) Simple for-loop w/ -stats
      – b) Task-parallel parfor w/ -stats
        D = rand(rows=100000, cols=100);
        m = nrow(D);
        n = ncol(D);
        R = matrix(0, rows=n, cols=n);
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          m2X = centralMoment(X,2);
          sigmaX = sqrt( m2X*(m/(m-1.0)) );
          parfor( j in (i+1):n ) {
            Y = D[ ,j];
            m2Y = centralMoment(Y,2);
            sigmaY = sqrt( m2Y*(m/(m-1.0)) );
            R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
          }
        }
        write(R, "./tmp/R", format="binary");
  • 14. Outline
    § Common Framework
    § Optimizer-Centric Techniques
    § Runtime-Centric Techniques
      – ParFor Optimizer/Runtime
      – Buffer Pool + Specific Optimizations
      – Spark-Specific Rewrites
      – Partitioning-Preserving Operations
      – Update In-Place
      – Ongoing Research (CLA)
  • 15. Buffer Pool Overview
    § Motivation
      – Exchange of intermediates between local and remote operations (HDFS, RDDs, GPU device memory)
      – Eviction of in-memory objects (integrated with garbage collector)
    § Primitives
      – acquireRead, acquireModify, release, exportData, getRdd, getBroadcast
    § Spark Specifics
      – Lineage tracking RDDs/broadcasts
      – Guarded RDD collect/parallelize
      – Partitioned Broadcast variables
    [Figure labels: MatrixObject/WriteBuffer; Lineage Tracking]
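The pin-and-evict idea behind primitives such as acquireRead and release can be illustrated with a toy buffer pool. This is not SystemML's implementation: the class, its capacity counted in objects, and the dictionary standing in for evicted (on-disk) state are all hypothetical, and only the primitive names are borrowed from the slide.

```python
class ToyBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity  # max number of in-memory objects (toy limit)
        self.memory = {}          # name -> data currently in memory
        self.evicted = {}         # name -> data evicted (stand-in for local disk)
        self.pins = {}            # name -> number of active readers

    def put(self, name, data):
        self._make_room()
        self.memory[name] = data
        self.pins.setdefault(name, 0)

    def acquire_read(self, name):
        # Restore the object if it was evicted, then pin it.
        if name not in self.memory:
            self._make_room()
            self.memory[name] = self.evicted.pop(name)
        self.pins[name] += 1
        return self.memory[name]

    def release(self, name):
        # Unpin; the object becomes a candidate for eviction again.
        self.pins[name] -= 1

    def _make_room(self):
        # Evict unpinned objects until one slot is free.
        while len(self.memory) >= self.capacity:
            victim = next((n for n in self.memory if self.pins[n] == 0), None)
            if victim is None:
                raise MemoryError("all in-memory objects are pinned")
            self.evicted[victim] = self.memory.pop(victim)

pool = ToyBufferPool(capacity=2)
pool.put("A", [1, 2])
pool.put("B", [3, 4])
pool.put("C", [5, 6])           # evicts the unpinned A
data_a = pool.acquire_read("A")  # restores A, evicting the unpinned B
```

Pinned objects are never evicted, so an operation that has acquired its inputs can rely on them staying in memory until it calls release.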
  • 17. Spark-Specific Optimizations
    § Spark-Specific Rewrites
      – Automatic caching/checkpoint injection (MEM_DISK / MEM_DISK_SER)
      – Automatic repartition injection
    § Operator Selection
      – Spark exec type selection
      – Transitive Spark exec type
      – Physical operator selection
    § Extended ParFor Optimizer
      – Deferred checkpoint/repartition injection
      – Eager checkpointing/repartitioning
      – Fair scheduling for concurrent jobs
      – Local degree of parallelism
    § Runtime Optimizations
      – Lazy Spark context creation
      – Short-circuit read/collect
    § Ex: Checkpoint Injection LinregCG (injected: chkpt X MEM_DISK)
        X = read($1);
        y = read($2);
        ...
        r = -(t(X) %*% y);
        while(i < maxi & norm_r2 > norm_r2_trgt) {
          q = t(X)%*%(X%*%p) + lambda*p;
          alpha = norm_r2 / (t(p)%*%q);
          w = w + alpha * p;
          old_norm_r2 = norm_r2;
          r = r + alpha * q;
          norm_r2 = sum(r * r);
          beta = norm_r2 / old_norm_r2;
          p = -r + beta * p;
          i = i + 1;
        }
        ...
        write(w, $4);
    [Figure: Spark Exec (24 cores): 25% user, 75% data&exec (50% Min & 75% Max)]
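Why a checkpoint on X pays off in the LinregCG loop can be shown with a toy model of lazy evaluation in Python. The `LazyDataset` class is hypothetical and only mimics the general behavior of a lazily evaluated dataset that is recomputed on every action unless it has been persisted; it is not Spark's or SystemML's API.

```python
class LazyDataset:
    """Hypothetical lazily evaluated dataset with optional persistence."""

    def __init__(self, loader):
        self.loader = loader     # function that (re)reads the data
        self.persisted = False
        self._cache = None
        self.loads = 0           # how often the loader actually ran

    def persist(self):
        self.persisted = True
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache
        self.loads += 1
        data = self.loader()
        if self.persisted:
            self._cache = data
        return data


def run_loop(X, iterations):
    """Loop that reads the loop-invariant X in every iteration."""
    p = [1.0, 1.0]
    for _ in range(iterations):
        rows = X.collect()
        p = [sum(a * b for a, b in zip(row, p)) for row in rows]
    return p


def load():
    return [[0.5, 0.0], [0.0, 0.5]]


plain = LazyDataset(load)
cached = LazyDataset(load).persist()
result_plain = run_loop(plain, 5)
result_cached = run_loop(cached, 5)
print(plain.loads, cached.loads)
```

Both variants compute the same result, but the persisted dataset is read once instead of once per iteration, which is the effect the injected `chkpt X MEM_DISK` aims for.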
  • 18. SystemML on Spark: Lessons Learned
    § Spark over Custom Framework
      – Well engineered framework with strong contributor base
      – Seamless data preparation and feature engineering
    § Stateful Distributed Caching
      – Standing executors with distributed caching and fast task scheduling
      – Challenges: task parallelism, memory constraints, fair resource management
    § Memory Efficiency
      – Compact data structures to avoid cache spilling (serialization, CSR)
      – Custom serialization and compression
    § Lazy RDD Evaluation
      – Automatic grouping of operations into distributed jobs, incl. partitioning
      – Challenges: multiple actions/repeated execution, runtime plan compilation!
    § Declarative ML
      – Introduction of Spark backend did not require algorithm changes!
      – Automatically exploit distributed caching and partitioning via rewrites
  • 20. Partitioning-Preserving Operations on Spark
    § Partitioning-preserving ops
      – Op is partitioning-preserving if key not changed (guaranteed)
      – 1) Implicit: Use restrictive APIs (mapValues() vs mapToPair())
      – 2) Explicit: Partition computation w/ declaration of partitioning-preserving (memory-efficiency via "lazy iterators")
    § Partitioning-exploiting ops
      – 1) Implicit: Operations based on join, cogroup, etc.
      – 2) Explicit: Custom physical operators on original keys (e.g., zipmm)
    [Figure: Physical Blocking and Partitioning]
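The difference between a restrictive and a general API can be sketched with a toy Python pair collection. The class and its methods are hypothetical stand-ins named after mapValues() and mapToPair(); the sketch only models the bookkeeping of a known partitioner, under the assumption that a later join needs a shuffle whenever that information is missing or differs.

```python
class PairCollection:
    """Hypothetical hash-partitioned collection of (key, value) pairs."""

    def __init__(self, partitions, partitioner=None):
        self.partitions = partitions    # list of lists of (key, value)
        self.partitioner = partitioner  # number of hash partitions, None if unknown

    @staticmethod
    def hash_partitioned(pairs, n):
        parts = [[] for _ in range(n)]
        for k, v in pairs:
            parts[hash(k) % n].append((k, v))
        return PairCollection(parts, n)

    def map_values(self, f):
        # Keys cannot change, so the partitioner is still valid.
        mapped = [[(k, f(v)) for k, v in p] for p in self.partitions]
        return PairCollection(mapped, self.partitioner)

    def map_to_pair(self, f):
        # Keys may change, so the partitioner has to be dropped.
        mapped = [[f(k, v) for k, v in p] for p in self.partitions]
        return PairCollection(mapped, None)

    def needs_shuffle_for_join(self, other):
        return self.partitioner is None or self.partitioner != other.partitioner


X = PairCollection.hash_partitioned([((1, 1), 2.0), ((2, 1), 3.0)], 2)
scaled = X.map_values(lambda v: 2 * v)
transposed = X.map_to_pair(lambda k, v: ((k[1], k[0]), v))
```

Here `scaled` can be joined with `X` without a shuffle, while `transposed` cannot, even though the general API might happen to leave some keys unchanged.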
  • 21. Partitioning-Exploiting ZIPMM
    § Operation: Z = t(X) %*% y
    § Approach: Naive
      – Operations: Transpose, Join, Multiplication
      – Shuffle: partitions not preserved after transpose, as keys changed
    § Approach: zipmm
      – Operations: Join, Transpose & Multiplication
      – Avoid unnecessary shuffle
    [Figure: block indexes (1,1 1,2 1,3 and 1,1 2,1 3,1) of the input X and y, of t(X), and of Z under the naive and zipmm approaches]
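The zipmm idea can be sketched in plain Python under the assumption that X and y are split into row blocks with identical keys: each pair of aligned blocks is joined first, transposed and multiplied locally, and the partial results are summed. The helper names and the dictionary-based blocking are illustrative assumptions, not SystemML's operator code.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def zipmm(x_blocks, y_blocks):
    """Compute t(X) %*% y from row blocks that share the same keys.

    Each key is joined on its original position, so no block has to move:
    t(X) %*% y = sum over i of t(X_i) %*% y_i.
    """
    result = None
    for key, xb in x_blocks.items():
        partial = matmul(transpose(xb), y_blocks[key])
        result = partial if result is None else add(result, partial)
    return result


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [[1], [1], [1], [1]]
x_blocks = {1: X[:2], 2: X[2:]}   # row blocks (1,1) and (2,1)
y_blocks = {1: y[:2], 2: y[2:]}
Z = zipmm(x_blocks, y_blocks)
```

The naive plan would transpose X first, which changes every block key from (i,1) to (1,i) and therefore forces a shuffle before the join with y.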
  • 22. Example Multiclass SVM
    § Example: Multiclass SVM
      – Vectors in nrow(X) neither fit into driver nor broadcast (MapMM not applicable)
      – ncol(X) ≤ Bc (zipmm applicable)
        parfor(iter_class in 1:num_classes) {
          Y_local = 2 * (Y == iter_class) - 1;
          g_old = t(X) %*% Y_local;
          ...
          while( continue ) {
            Xd = X %*% s;
            ... inner while loop (compute step_sz)
            Xw = Xw + step_sz * Xd;
            out = 1 - Y_local * Xw;
            out = (out > 0) * out;
            g_new = t(X) %*% (out * Y_local)
            ...
    § Plan annotations
      – repart, chkpt X MEM_DISK
      – chkpt y_local MEM_DISK
      – zipmm
      – chkpt Xd, Xw MEM_DISK
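The two applicability conditions on this slide can be written down as a small decision function. This Python sketch is a hypothetical illustration of the stated rules only: the function name, the byte-based broadcast budget, and the generic shuffle-based fallback are assumptions, and SystemML's actual operator selection considers more alternatives.

```python
def select_matmult_operator(vector_bytes, broadcast_budget_bytes, ncol_x, block_size_cols):
    """Pick a physical operator for t(X) %*% v following the slide's two rules."""
    if vector_bytes <= broadcast_budget_bytes:
        return "mapmm"            # vector fits into driver and broadcast
    if ncol_x <= block_size_cols:
        return "zipmm"            # X has a single column block
    return "shuffle-based"        # assumed generic fallback


# Illustrative numbers only: an 8 GB vector, a 1 GB budget, 100 columns, Bc = 1000.
choice = select_matmult_operator(8 * 10**9, 10**9, 100, 1000)
print(choice)
```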
  • 23. Hands-On Lab: Partitioning-Preserving Operations
    § Exercise: MultiClass SVM
      – W/o repartition injection
      – W/ repartition injection
        parfor(iter_class in 1:num_classes) {
          Y_local = 2 * (Y == iter_class) - 1;
          g_old = t(X) %*% Y_local;
          ...
          while( continue ) {
            Xd = X %*% s;
            ... inner while loop (compute step_sz)
            Xw = Xw + step_sz * Xd;
            out = 1 - Y_local * Xw;
            out = (out > 0) * out;
            g_new = t(X) %*% (out * Y_local)
            ...
          }
        }
  • 25. Update In-Place
    § Loop Update In-Place
      – 1) ParFor result indexing / intermediates (w/ pinned matrix objects)
      – 2) For/while/parfor loops with pure left indexing access to variable
      – Both require pinning / shallow serialize to overcome buffer pool serialization
      – Example Type 2:
        for(i in 1:nrow(X))
          for(j in 1:ncol(X))
            X[i,j] = i+j;
    § Where we cannot apply Update In-Place
      – Matrix object cannot fit into local memory budget (CP only)
      – Interleaving operations (mix of update and reference, might be non-obvious)
      – Example:
        R = X;
        X[i,j] = i+j;
        y = sum(R);
        Would create incorrect results!
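The failure case can be reproduced in a few lines of Python, where list assignment creates an alias in the same way that a shared matrix object would. The two update helpers are hypothetical: one models copy-on-write left indexing, the other models an in-place update that is applied although another variable still references the matrix.

```python
import copy


def update_copy_on_write(X, i, j, value):
    """Left indexing with copy-on-write: the original object stays untouched."""
    X_new = copy.deepcopy(X)
    X_new[i][j] = value
    return X_new


def update_in_place(X, i, j, value):
    """Left indexing in place: every alias of X observes the change."""
    X[i][j] = value
    return X


def total(M):
    return sum(sum(row) for row in M)


# Copy-on-write: R keeps the old contents of X.
X = [[0, 0], [0, 0]]
R = X
X = update_copy_on_write(X, 0, 0, 5)
sum_cow = total(R)

# In-place: R silently changes together with X.
X2 = [[0, 0], [0, 0]]
R2 = X2
X2 = update_in_place(X2, 0, 0, 5)
sum_in_place = total(R2)
print(sum_cow, sum_in_place)
```

With copy-on-write semantics sum(R) is still 0, whereas the in-place variant yields 5, which is why interleaved references to the updated variable rule out the optimization.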
  • 26. Hands-On Lab: Update In-Place
    § Exercise: Update In-Place (SystemML master/0.11 only):
      – a) Update in-place application (investigate -explain and -stats)
        for(i in 1:nrow(X))
          for(j in 1:ncol(X))
            X[i,j] = i+j;
      – b) Update in-place not applicable – why?
        for(i in 1:nrow(X)) {
          for(j in 1:ncol(X)) {
            print(sum(X));
            X[i,j] = i+j;
          }
        }
  • 28. Compressed Linear Algebra
    § Motivation / Problem
      – Iterative ML algorithms w/ repeated read-only data access
      – IO-bound matrix-vector multiplications → crucial to fit data in memory
      – General-purpose heavy-/lightweight techniques too slow / modest comp. ratios
    § Goals
      – Performance close to uncompressed
      – Good compression ratios
    [A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]
  • 29. Compressed Linear Algebra (2)
    § Approach
      – Database compression
      – LA over compressed rep.
      – Column-compression schemes (OLE, RLE, UC)
      – Cache-conscious CLA ops
      – Sampling-based compression algorithm
    § Results
      Algorithm   Dataset              ULA        Snappy     CLA
      GLM         Mnist40m (90GB)      409s       647s       397s
      GLM         Mnist240m (540GB)    74,301s    23,717s    2,787s
      MLogreg     Mnist40m (90GB)      630s       875s       622s
      MLogreg     Mnist240m (540GB)    83,153s    27,626s    4,379s
      L2SVM       Mnist40m (90GB)      394s       461s       429s
      L2SVM       Mnist240m (540GB)    14,041s    8,423s     2,593s
      Up to 26x
    [A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]
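How linear algebra can run directly on a compressed representation is illustrated by the following Python sketch of a matrix-vector multiplication over run-length encoded columns. It is a simplified stand-in for the RLE scheme named on the slide: each column is compressed on its own, zeros are not stored, and the layout as a dictionary from value to runs is an assumption rather than the format used in the paper.

```python
def rle_compress(column):
    """Encode a column as {value: [(start, length), ...]}, skipping zeros."""
    runs = {}
    start = 0
    for i in range(1, len(column) + 1):
        if i == len(column) or column[i] != column[start]:
            if column[start] != 0:
                runs.setdefault(column[start], []).append((start, i - start))
            start = i
    return runs


def matvec_compressed(compressed_cols, v, nrows):
    """Compute q = X %*% v without decompressing the columns."""
    q = [0.0] * nrows
    for j, runs in enumerate(compressed_cols):
        for value, segments in runs.items():
            contrib = value * v[j]   # one multiplication per distinct value
            for start, length in segments:
                for i in range(start, start + length):
                    q[i] += contrib
    return q


X = [[1, 0], [1, 2], [1, 2], [3, 2]]
v = [2, 1]
columns = [list(col) for col in zip(*X)]
compressed = [rle_compress(col) for col in columns]
q = matvec_compressed(compressed, v, len(X))
```

The multiplication is done once per distinct value and column instead of once per cell, so columns with few distinct values and long runs save both memory and arithmetic.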
  • 30. SystemML is Open Source:
      – Apache Incubator Project (11/2015)
      – Website: http://systemml.apache.org/
      – Source code: https://github.com/apache/incubator-systemml