© 2015 IBM Corporation
S7/8: SystemML’s Optimizer and Runtime
Matthias Boehm¹, Arvind C. Surve²
¹ IBM Research – Almaden
² IBM Spark Technology Center
IBM Research
Abstraction: The Good, the Bad and the Ugly
q = t(X) %*% (w * (X %*% v))
[adapted from Peter Alvaro: "I See What You Mean", Strange Loop, 2015]
§ The Good: simple & analysis-centric, data independence, platform independence, adaptivity
§ The Bad: (missing) size information, operator selection, (missing) rewrites, distributed operations, distributed storage, (implicit) copy-on-write, data skew, load imbalance, latency, complex control flow, local/remote memory budgets
§ The Ugly: Expectations ≠ Reality
→ Understanding of optimizer and runtime techniques underpinning declarative, large-scale ML → efficiency & performance
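The one-line expression above hides several execution decisions. As a minimal sketch (pure Python, matrices as lists of rows, hypothetical function names, not SystemML code), the two functions below compute the same q. The first evaluates it literally and materializes X %*% v, the elementwise product, and t(X). The second makes a single pass over the rows of X without any of those intermediates.

```python
def naive_q(X, w, v):
    """q = t(X) %*% (w * (X %*% v)) evaluated literally, one operator at a time."""
    Xv = [sum(a * b for a, b in zip(row, v)) for row in X]      # X %*% v
    wXv = [w_i * xv_i for w_i, xv_i in zip(w, Xv)]               # w * (X %*% v)
    Xt = [list(col) for col in zip(*X)]                          # materialized t(X)
    return [sum(a * b for a, b in zip(row, wXv)) for row in Xt]  # t(X) %*% (...)


def fused_q(X, w, v):
    """Same result in one pass over the rows of X, without materializing t(X)
    or the intermediate vectors."""
    q = [0.0] * len(X[0])
    for row, w_i in zip(X, w):
        s = w_i * sum(a * b for a, b in zip(row, v))  # i-th entry of w * (X %*% v)
        for j, x_ij in enumerate(row):
            q[j] += x_ij * s                          # accumulate t(X)[j, i] * s
    return q


X = [[1, 2], [3, 4], [5, 6]]
w = [1, 0, 2]
v = [1, 1]
print(naive_q(X, w, v), fused_q(X, w, v))
```

Both variants are correct. Which one is preferable depends on sizes, sparsity, and where the data lives, and that decision is what the optimizer and runtime techniques in this deck automate.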
Outline
§ Common Framework
§ Optimizer-Centric Techniques
§ Runtime-Centric Techniques
– ParFor Optimizer/Runtime
– Buffer Pool + Specific Optimizations
– Spark-Specific Rewrites
– Partitioning-Preserving Operations
– Update In-Place
– Ongoing Research (CLA)
Optimization through ParFor
§ Motivation
– SystemML focuses primarily on data parallelism
– Dedicated parfor construct for task parallelism
§ ParFor approach:
– Complementary parfor parallelization strategies
– Cost-based optimization framework for task-parallel ML
– Memory budget as common constraint
Recap: Basic HOP DAG Compilation
Example: Pearson Correlation
§ DML Script
X = read( "./in/X" ); #data on HDFS
Y = read( "./in/Y" );
m = nrow(X);
sigmaX = sqrt( centralMoment(X,2)*(m/(m-1.0)) );
sigmaY = sqrt( centralMoment(Y,2)*(m/(m-1.0)) );
r = cov(X,Y) / (sigmaX * sigmaY);
write( r, "./out/r" );
§ HOP DAG (figure)
– Data nodes X ("./in/X", 10^6 x 1) and Y ("./in/Y", 10^6 x 1)
– Per input: b(cm) with order 2, b(*) with the correction factor m/(m-1.0), u(sqrt)
– Shared correction factor: b(/) of 1,000,000 by b(-)(1,000,000, 1), shown w/o constant folding (folded: 1.000001)
– b(*) of both sigmas; b(/) of b(cov)(X, Y) by that product; written to r ("./out/r")
– Legend: u() … unary operator, b() … binary operator, cov … covariance, cm … central moment, sqrt … square root
§ Computes $\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$
Exploit Spark/MR data parallelism if beneficial/required
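The following is a minimal pure-Python sketch of what the script computes. It assumes, as the m/(m-1.0) correction suggests, that centralMoment(X,2) divides by m while cov(X,Y) is the sample covariance with divisor m - 1. The helper names are hypothetical stand-ins for the DML builtins.

```python
import math


def central_moment(x, order):
    """order-th central moment with divisor m (assumed semantics of centralMoment)."""
    m = len(x)
    mean = sum(x) / m
    return sum((x_i - mean) ** order for x_i in x) / m


def cov(x, y):
    """Sample covariance with divisor m - 1 (assumed semantics of cov)."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (m - 1)


def pearson(x, y):
    """Mirrors the DML script: rescale the 2nd central moments to sample variances."""
    m = len(x)
    sigma_x = math.sqrt(central_moment(x, 2) * (m / (m - 1.0)))
    sigma_y = math.sqrt(central_moment(y, 2) * (m / (m - 1.0)))
    return cov(x, y) / (sigma_x * sigma_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly correlated
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly anti-correlated
print(pearson([1, 2, 3], [1, 3, 2]))
```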
Running Example: Pairwise Pearson Correlation
§ Representative of more complex bivariate statistics
(Pearson's R, ANOVA F, chi-squared, degrees of freedom, p-value, Cramér's V, Spearman, etc.)
D = read("./input/D");
m = nrow(D);
n = ncol(D);
R = matrix(0, rows=n, cols=n);
parfor( i in 1:(n-1) ) {
X = D[ ,i];
m2X = centralMoment(X,2);
sigmaX = sqrt( m2X*(m/(m-1.0)) );
parfor( j in (i+1):n ) {
Y = D[ ,j];
m2Y = centralMoment(Y,2);
sigmaY = sqrt( m2Y*(m/(m-1.0)) );
R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
}}
write(R, "./output/R");
Challenges:
• Triangular nested loop
• Column-wise access on unordered distributed data
• Bivariate all-to-all data shuffling pattern
Exploit task and data parallelism if beneficial/required
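The triangular loop matters for parallelization. Outer iteration i runs n - i inner iterations, so splitting the outer iterations into equally sized groups yields unequal work. The small Python sketch below uses hypothetical helper names and shows this for n = 12 with two contiguous groups.

```python
def inner_iterations(n):
    """Work per outer iteration of parfor(i in 1:(n-1)) { parfor(j in (i+1):n) ... }:
    outer iteration i runs n - i inner iterations."""
    return {i: n - i for i in range(1, n)}


def contiguous_groups(items, k):
    """Split items into k contiguous groups whose sizes differ by at most one."""
    q, r = divmod(len(items), k)
    groups, start = [], 0
    for g in range(k):
        size = q + (1 if g < r else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


n = 12
work = inner_iterations(n)
groups = contiguous_groups(sorted(work), 2)
print(sum(work.values()), [sum(work[i] for i in g) for g in groups])
```

With n = 12, the 66 inner iterations split into 51 and 15, a 3.4x imbalance between the two groups.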
Overview Parallelization Strategies
§ Conceptual Design: Master/worker
– Task: group of parfor iterations
§ Task Partitioning
– Naive, static, fixed, factoring, factoring_cmax
– Task overhead vs load balance?
§ Task Execution
– Local, remote (Spark/MR), remoteDP (Spark/MR)
– Various runtime optimizations
– Degree of parallelism/IO/latency?
§ Result Aggregation
– Local memory, local file, remote (Spark/MR)
– W/ and w/o compare
– Result locality/IO/latency?
n = 12
parfor( i in 1:(n-1) ) {
X = D[ ,i];
…
R[i,j] = …
}
→ The optimizer leverages these strategies to generate efficient execution plans
Example Task Partitioning
§ Scenario: k=24 workers, 10,000 iterations
Figure: # of iterations per task for each task partitioner
– Naive: 10,000 tasks (1 iteration per task)
– Static: 24 tasks
– Fixed(250): 40 tasks
– Factoring: 208 tasks
– Factoring CMAX (150): 228 tasks
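The sketch below implements the five partitioners for this scenario, each returning the number of iterations per task. Naive, static, and fixed follow from their names. For factoring, the sketch assumes the classic rule: each round hands out k tasks of ⌈R/(2k)⌉ iterations, where R is the number of iterations remaining at the start of the round. Factoring_cmax additionally caps the task size. This is an illustration, not SystemML's partitioner code. With k = 24 and 10,000 iterations it yields 208 factoring tasks with a largest task of 209 iterations, consistent with the factoring panel. The tail rounding of the capped variant is an assumption and may differ from the chart.

```python
import math


def naive(n):
    """One iteration per task."""
    return [1] * n


def static(n, k):
    """k tasks of (nearly) equal size."""
    q, r = divmod(n, k)
    return [q + (1 if t < r else 0) for t in range(k)]


def fixed(n, size):
    """Tasks of a fixed size; the last task takes the remainder."""
    full, rest = divmod(n, size)
    return [size] * full + ([rest] if rest else [])


def factoring(n, k, cmax=None):
    """Per round, k tasks of ceil(R / (2k)) iterations, where R is the number of
    iterations remaining at the start of the round; optionally capped at cmax.
    The last round stops early once all iterations are assigned."""
    sizes, remaining = [], n
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * k)))
        if cmax is not None:
            size = min(size, cmax)
        for _ in range(k):
            if remaining == 0:
                break
            take = min(size, remaining)
            sizes.append(take)
            remaining -= take
    return sizes


n, k = 10_000, 24
for name, sizes in [("naive", naive(n)), ("static", static(n, k)),
                    ("fixed(250)", fixed(n, 250)), ("factoring", factoring(n, k)),
                    ("factoring_cmax(150)", factoring(n, k, cmax=150))]:
    print(f"{name}: {len(sizes)} tasks, largest task {max(sizes)} iterations")
```

Large early tasks keep per-task overhead low, and the shrinking tail tasks let fast workers absorb the remaining imbalance. This reflects the "task overhead vs load balance" trade-off named on the previous slide.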
Task Execution: Local and Remote Parallelism
Local execution (multicore):
– ParFOR (local): task partitioning into a task queue, e.g., w1: i, {1,2,3}; w2: i, {4,5,6}; w3: i, {7,8}; w4: i, {9,10}; w5: i, {11}; ...
– Local ParWorker 1 … k: while(w ← deq()) foreach pi ∈ w execute(prog(pi))
– Parallel result aggregation
Remote execution (cluster):
– ParFOR (remote): task partitioning; the same tasks w1 … w5 are passed as input to Hadoop
– ParWorker Mapper 1 … k: map(key,value) w ← parse(value) foreach pi ∈ w execute(prog(pi))
– Parallel result aggregation over the worker results (e.g., A|MATRIX|./out/A7tmp)
Hybrid parallelism: combinations of local/remote and data-parallel jobs
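Here is a minimal Python sketch of the local (multicore) variant. A task queue is filled up front, and k workers each loop over while(w ← deq()) foreach pi ∈ w execute(prog(pi)). The function run_local_parfor and its parameters are hypothetical, and a shared dictionary stands in for parallel result aggregation.

```python
import queue
import threading


def run_local_parfor(tasks, body, k):
    """k worker threads dequeue tasks (groups of iterations) from a shared queue
    and execute the loop body once per iteration."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()   # w <- deq()
            except queue.Empty:
                return                           # queue drained: worker terminates
            for i in task:                       # foreach p_i in w
                value = body(i)                  # execute(prog(p_i))
                with lock:
                    results[i] = value           # result aggregation (shared dict)

    threads = [threading.Thread(target=worker) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


tasks = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10], [11]]   # w1 ... w5 from the figure
results = run_local_parfor(tasks, body=lambda i: i * i, k=2)
print(sorted(results.items()))
```

Threads illustrate the control flow only. In CPython, the global interpreter lock would limit speedups for CPU-bound loop bodies.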
Task Execution: Runtime Optimizations
§ Data Partitioning
– Problem: Repeated MR jobs for indexed access
– Access-awareness (cost estimation, correct plan generation)
– Operators: local file-based, remote MR job
§ Data Locality
– Problem: Co-location of parfor tasks to partitions/matrices
– Location reporting per logical parfor task (e.g., for parfor(i) → D[, i])
parfor( i in 1:(n-1) ) {
X = D[ ,i]; …
parfor( j in (i+1):n ){
Y = D[ ,j]; …
}}
Figure: column partitions of D distributed across two nodes (Node 1: D1, D2, D6, D7, D8; Node 2: D3, D4, D5, D9, D10, D11). The tasks w1: i, {1,2,3} … w5: i, {11} in the task file report the locations (Node 1, Node 2, or Node 1, 2) of the partitions they access.
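The following is a sketch of location reporting under two assumptions: the partition placement is taken from the figure, and a task reports every node that hosts at least one partition D[, i] it accesses. The function and variable names are hypothetical. A scheduler could use these reports as preferred locations when placing the tasks.

```python
def reported_locations(tasks, partition_node):
    """For each task (name -> iterations i), report the sorted set of nodes that
    host at least one of the column partitions D[, i] the task accesses."""
    return {name: sorted({partition_node[i] for i in iterations})
            for name, iterations in tasks.items()}


# Placement of D1..D11 as assumed from the figure
partition_node = {i: "Node 1" for i in (1, 2, 6, 7, 8)}
partition_node.update({i: "Node 2" for i in (3, 4, 5, 9, 10, 11)})
tasks = {"w1": [1, 2, 3], "w2": [4, 5, 6], "w3": [7, 8], "w4": [9, 10], "w5": [11]}

for name, nodes in reported_locations(tasks, partition_node).items():
    print(name, nodes)
```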
Optimization Framework – Problem Formulation
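§ Design: Runtime optimization for each top-level parfor
§ Plan Tree P
– Nodes N_P
• Exec type et
• Parallelism k
• Attributes A
– Height h
– Exec contexts EC_P
§ Plan Tree Optimization Problem
– Find the plan with minimal estimated execution time such that, in every execution context ec ∈ EC_P, the memory constraint cm_ec and the parallelism constraint ck_ec hold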
Figure: plan tree P for the pairwise Pearson example (outer ParFOR with a generic block RIX, b(cm), ...; nested ParFOR with a generic block RIX, LIX, b(cov), ...), mapped to two execution contexts, ec0 and ec1 (MR), with constraints cm_ec = 600 MB, ck_ec = 1 and cm_ec = 1024 MB, ck_ec = 16
ec … execution context
cm … memory constraint
ck … parallelism constraint
[M. Boehm et al.: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7), 2014]
[M. Boehm et al.: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs. CoRR, 2015]
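The memory and parallelism constraints can be illustrated with a small Python sketch. It picks the largest degree of parallelism k for a parfor body with a given per-worker memory estimate, under the two constraint pairs from the figure. The function name is hypothetical. The sketch assumes each worker needs its own copy of the body's working set, mirroring the k × body aggregation on the cost-model slide, where the body estimate is 88 MB.

```python
def max_parallelism(body_mem_mb, cm_mb, ck):
    """Largest degree of parallelism k with k <= ck and k * body_mem_mb <= cm_mb,
    assuming each of the k workers holds its own copy of the body's working set.
    Returns 0 if not even a single worker fits into the memory budget."""
    if body_mem_mb <= 0:
        return ck
    return min(ck, int(cm_mb // body_mem_mb))


print(max_parallelism(88, 1024, 16))  # memory-bound: 11 workers
print(max_parallelism(88, 600, 1))    # parallelism-bound: 1 worker
```

A result of 0 signals that the body does not fit the context's memory budget at all. An optimizer would then need a different plan, such as another execution type.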
Optimization Framework – Cost Model / Optimizer
§ Overview Heuristic Optimizer
– Time- and memory-based cost model w/o shared reads
– Heuristic high-impact rewrites
– Transformation-based search strategy with global opt scope
§ Cost Model
– HOP DAG size propagation
– Worst-case memory estimates
– Time estimates
– Plan tree statistics aggregation
Figure: plan tree P (ParFOR, k=4) with mapped HOP DAGs and memory estimates M = (<output mem>, <operation mem>)
– D (d1 = 1M, d2 = 10): M = (80 MB, 80 MB)
– X (d1 = 1M, d2 = 1): M = (8 MB, 8 MB)
– RIX (D[, j] → Y, d1 = 1M, d2 = 1): M = (8 MB, 88 MB)
– b(cm): M = (0 MB, 8 MB); b(cov): M = (0 MB, 16 MB)
– Scalars such as j: d1 = 0, d2 = 0
– Generic block: M = 88 MB; ParFOR with k = 4: M = 352 MB
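The arithmetic behind the figure can be sketched in Python. The sketch uses these rules:
- Worst-case dense size is rows × cols × 8 bytes.
- Operation memory is the output plus all inputs.
- A block's estimate is the maximum over its operators.
- A parfor's estimate is k times its body.

These aggregation rules are assumptions that reproduce the figure's numbers. This is not SystemML's cost-model code.

```python
BYTES_PER_DOUBLE = 8
MB = 1_000_000  # decimal megabytes, matching the rounded values in the figure


def dense_mb(rows, cols):
    """Worst-case (dense, double-precision) size estimate in MB."""
    return rows * cols * BYTES_PER_DOUBLE / MB


def op_mem(output_mb, *inputs_mb):
    """Operation memory: output plus all inputs held in memory at once."""
    return output_mb + sum(inputs_mb)


D = dense_mb(1_000_000, 10)   # read input D
X = dense_mb(1_000_000, 1)    # one column of D
Y = dense_mb(1_000_000, 1)
rix = op_mem(X, D)            # right indexing D[, j]
cm = op_mem(0, X)             # centralMoment: scalar output
cv = op_mem(0, X, Y)          # cov: scalar output, two vector inputs
body = max(rix, cm, cv)       # generic block: the most expensive operator
parfor_k4 = 4 * body          # k = 4 workers, one working set each
print(D, X, rix, cm, cv, body, parfor_k4)
```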
Hands-On Lab: Task-Parallel ParFor Programs
§ Exercise: Pairwise Pearson Correlation
– a) Simple for loop w/ -stats
– b) Task-parallel parfor w/ -stats
D = rand(rows=100000, cols=100);
m = nrow(D);
n = ncol(D);
R = matrix(0, rows=n, cols=n);
parfor( i in 1:(n-1) ) {
X = D[ ,i];
m2X = centralMoment(X,2);
sigmaX = sqrt( m2X*(m/(m-1.0)) );
parfor( j in (i+1):n ) {
Y = D[ ,j];
m2Y = centralMoment(Y,2);
sigmaY = sqrt( m2Y*(m/(m-1.0)) );
R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
}}
write(R, "./tmp/R", format="binary");
Buffer Pool Overview
§ Motivation
– Exchange of intermediates between local and remote operations (HDFS, RDDs, GPU device memory)
– Eviction of in-memory objects (integrated with garbage collector)
§ Primitives
– acquireRead, acquireModify, release, exportData, getRdd, getBroadcast
§ Spark Specifics
– Lineage tracking of RDDs/broadcasts
– Guarded RDD collect/parallelize
– Partitioned broadcast variables
Figure: MatrixObject/WriteBuffer with lineage tracking
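Below is a toy Python buffer pool, loosely modeled on the acquireRead/release primitives listed above. The snake_case names, sizes, and LRU policy are assumptions, and a dictionary simulates the local disk. The sketch behaves as follows:
- Pinned objects are never evicted.
- Unpinned objects are evicted in least-recently-used order once the memory budget is exceeded.
- Evicted objects are restored on the next read.

acquireModify, exportData, RDD/broadcast handles, and garbage-collector integration are not modeled.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool with pinning and LRU eviction to a simulated disk."""

    def __init__(self, budget):
        self.budget = budget          # memory budget in arbitrary size units
        self.mem = OrderedDict()      # name -> (size, data), least recently used first
        self.disk = {}                # evicted objects: name -> (size, data)
        self.pins = {}                # name -> number of active acquires

    def _used(self):
        return sum(size for size, _ in self.mem.values())

    def _evict(self):
        """Evict unpinned objects in LRU order until the budget is met (if possible)."""
        for name in list(self.mem):
            if self._used() <= self.budget:
                return
            if self.pins.get(name, 0) == 0:
                self.disk[name] = self.mem.pop(name)

    def put(self, name, size, data):
        self.mem[name] = (size, data)
        self.mem.move_to_end(name)
        self._evict()

    def acquire_read(self, name):
        if name not in self.mem:                  # restore an evicted object
            self.mem[name] = self.disk.pop(name)
        self.mem.move_to_end(name)                # mark as most recently used
        self.pins[name] = self.pins.get(name, 0) + 1
        self._evict()                             # may evict others, never this pinned one
        return self.mem[name][1]

    def release(self, name):
        self.pins[name] -= 1
        self._evict()


pool = BufferPool(budget=10)
pool.put("A", 6, "block A")
pool.put("B", 6, "block B")      # exceeds the budget: A (LRU, unpinned) is evicted
print(sorted(pool.mem), sorted(pool.disk))
print(pool.acquire_read("A"))    # A is restored and pinned; B is evicted instead
pool.release("A")
print(sorted(pool.mem), sorted(pool.disk))
```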
Spark-Specific Optimizations
§ Spark-Specific Rewrites
– Automatic caching/checkpoint injection (MEM_DISK / MEM_DISK_SER)
– Automatic repartition injection
§ Operator Selection
– Spark exec type selection
– Transitive Spark exec type
– Physical operator selection
§ Extended ParFor Optimizer
– Deferred checkpoint/repartition injection
– Eager checkpointing/repartitioning
– Fair scheduling for concurrent jobs
– Local degree of parallelism
§ Runtime Optimizations
– Lazy Spark context creation
– Short-circuit read/collect
X = read($1);
y = read($2);
...
r = -(t(X) %*% y);
while(i < maxi & norm_r2 > norm_r2_trgt) {
q = t(X)%*%(X%*%p) + lambda*p;
alpha = norm_r2 / (t(p)%*%q);
w = w + alpha * p;
old_norm_r2 = norm_r2;
r = r + alpha * q;
norm_r2 = sum(r * r);
beta = norm_r2 / old_norm_r2;
p = -r + beta * p;
i = i + 1;
}
...
write(w, $4);
Ex: Checkpoint injection in LinregCG: chkpt X MEM_DISK (X is accessed in every loop iteration)
Figure: Spark executor memory (24 cores): 25% user, 75% data & execution (50% min & 75% max)
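The following pure-Python sketch of the LinregCG loop takes X as a list of rows. It solves (t(X) %*% X + lambda * I) w = t(X) %*% y with the same update sequence as the DML. It makes the access pattern visible: every iteration reads X twice, once for X %*% p and once for t(X) %*% (...). That repeated access is why caching X before the loop pays off. The parameters max_iter and tol are hypothetical stand-ins for the elided maxi/norm_r2_trgt initialization.

```python
def matvec(X, v):
    """X %*% v for X given as a list of rows."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]


def t_matvec(X, u):
    """t(X) %*% u, computed row by row without materializing t(X)."""
    out = [0.0] * len(X[0])
    for row, u_i in zip(X, u):
        for j, x_ij in enumerate(row):
            out[j] += x_ij * u_i
    return out


def linreg_cg(X, y, lam=0.0, max_iter=100, tol=1e-12):
    """Conjugate gradient for (t(X) %*% X + lam * I) w = t(X) %*% y."""
    w = [0.0] * len(X[0])
    r = [-g for g in t_matvec(X, y)]                  # r = -(t(X) %*% y)
    p = [-r_j for r_j in r]                           # p = -r
    norm_r2 = sum(r_j * r_j for r_j in r)
    norm_r2_trgt = tol * norm_r2                      # hypothetical relative target
    i = 0
    while i < max_iter and norm_r2 > norm_r2_trgt:
        # q = t(X) %*% (X %*% p) + lambda * p: two passes over X per iteration
        q = [a + lam * p_j for a, p_j in zip(t_matvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / sum(p_j * q_j for p_j, q_j in zip(p, q))
        w = [w_j + alpha * p_j for w_j, p_j in zip(w, p)]
        old_norm_r2 = norm_r2
        r = [r_j + alpha * q_j for r_j, q_j in zip(r, q)]
        norm_r2 = sum(r_j * r_j for r_j in r)
        beta = norm_r2 / old_norm_r2
        p = [-r_j + beta * p_j for r_j, p_j in zip(r, p)]
        i += 1
    return w, i


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w, iterations = linreg_cg(X, y)
print([round(w_j, 6) for w_j in w], iterations)
```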
SystemML on Spark: Lessons Learned
§ Spark over Custom Framework
– Well engineered framework with strong contributor base
– Seamless data preparation and feature engineering
§ Stateful Distributed Caching
– Standing executors with distributed caching and fast task scheduling
– Challenges: task parallelism, memory constraints, fair resource management
§ Memory Efficiency
– Compact data structures to avoid cache spilling (serialization, CSR)
– Custom serialization and compression
§ Lazy RDD Evaluation
– Automatic grouping of operations into distributed jobs, incl. partitioning
– Challenges: multiple actions/repeated execution, runtime plan compilation!
§ Declarative ML
– Introduction of Spark backend did not require algorithm changes!
– Automatically exploit distributed caching and partitioning via rewrites
Partitioning-Preserving Operations on Spark
§ Partitioning-preserving ops
– An op is partitioning-preserving if it is guaranteed not to change the keys
– 1) Implicit: Use restrictive APIs (mapValues() vs mapToPair())
– 2) Explicit: Partition computation w/ declaration of partitioning-preserving (memory efficiency via "lazy iterators")
§ Partitioning-exploiting ops
– 1) Implicit: Operations based on join, cogroup, etc
– 2) Explicit: Custom physical operators on original keys (e.g., zipmm)
Figure: physical blocking and partitioning
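The implicit case can be simulated in pure Python (this is not Spark code). Records keyed by block index (i, j) are placed by a hypothetical row-block partitioner. A mapValues-style operation leaves every record in a valid partition. A mapToPair-style transpose that swaps the keys does not, so the data would have to be shuffled again.

```python
def partition_of(key, p):
    """Hypothetical row-block partitioner: block (i, j) goes to partition i % p."""
    return key[0] % p


def partition(records, p):
    parts = [[] for _ in range(p)]
    for key, value in records:
        parts[partition_of(key, p)].append((key, value))
    return parts


def map_values(parts, f):
    """Analogue of mapValues(): only values change, so records stay where they are valid."""
    return [[(key, f(value)) for key, value in part] for part in parts]


def map_to_pair(parts, f):
    """Analogue of mapToPair(): f may emit new keys, so the partitioning can break."""
    return [[f(key, value) for key, value in part] for part in parts]


def is_partitioned(parts):
    """True if every record sits in the partition its key maps to."""
    p = len(parts)
    return all(partition_of(key, p) == idx for idx, part in enumerate(parts) for key, _ in part)


blocks = [((i, j), float(10 * i + j)) for i in range(1, 4) for j in range(1, 4)]
parts = partition(blocks, 2)
scaled = map_values(parts, lambda v: 2 * v)                            # e.g., X * 2
transposed = map_to_pair(parts, lambda key, v: ((key[1], key[0]), v))  # e.g., t(X)
print(is_partitioned(scaled), is_partitioned(transposed))
```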
Partitioning-Exploiting ZIPMM
§ Operation: Z = t(X) %*% y
§ Input: X and y blocked with keys 1,1 / 1,2 / 1,3
§ Approach: Naive
– Operations: Transpose, Join, Multiplication
– Partitions not preserved after transpose, as keys changed (t(X): 1,1 / 2,1 / 3,1) → Shuffle
§ Approach: zipmm
– Operations: Join, Transpose & Multiplication on the original keys
– Avoid unnecessary shuffle
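The following pure-Python sketch illustrates the zipmm idea. It assumes X has a single column block, so that X and y are both keyed by the row-block index i. The blocks are joined on i, t(X_i) %*% y_i is computed locally, and the partial results are summed. No block is re-keyed, so no shuffle would be needed. Names and the block layout are illustrative.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A, B)]


def zipmm(X_blocks, y_blocks):
    """t(X) %*% y from row blocks keyed by the same row-block index i:
    join on i, multiply t(X_i) %*% y_i locally, then sum the partial results."""
    partials = [matmul(transpose(X_blocks[i]), y_blocks[i]) for i in X_blocks if i in y_blocks]
    result = partials[0]
    for partial in partials[1:]:
        result = add(result, partial)
    return result


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [[1], [1], [2], [2]]
X_blocks = {1: X[0:2], 2: X[2:4]}   # row blocks of 2 rows, keyed by block index
y_blocks = {1: y[0:2], 2: y[2:4]}
print(zipmm(X_blocks, y_blocks), matmul(transpose(X), y))
```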
Example Multiclass SVM
§ Example: Multiclass SVM
– Vectors of length nrow(X) fit neither into the driver nor into a broadcast (MapMM not applicable)
– ncol(X) ≤ Bc (zipmm applicable)
parfor(iter_class in 1:num_classes) {
Y_local = 2 * (Y == iter_class) - 1;
g_old = t(X) %*% Y_local;
...
while( continue ) {
Xd = X %*% s;
... inner while loop (compute step_sz)
Xw = Xw + step_sz * Xd;
out = 1 - Y_local * Xw;
out = (out > 0) * out;
g_new = t(X) %*% (out * Y_local) ...
}
}
Optimizer decisions: repart, chkpt X MEM_DISK; chkpt Y_local MEM_DISK; zipmm; chkpt Xd, Xw MEM_DISK
Hands-On Lab: Partitioning-Preserving Operations
§ Exercise: MultiClass SVM
– W/o repartition injection
– W/ repartition injection
parfor(iter_class in 1:num_classes) {
Y_local = 2 * (Y == iter_class) - 1;
g_old = t(X) %*% Y_local;
...
while( continue ) {
Xd = X %*% s;
... inner while loop (compute step_sz)
Xw = Xw + step_sz * Xd;
out = 1 - Y_local * Xw;
out = (out > 0) * out;
g_new = t(X) %*% (out * Y_local) ...
}
}
Update In-Place
§ Loop Update In-Place
– 1) ParFor result indexing / intermediates (w/ pinned matrix objects)
– 2) For/while/parfor loops with pure left-indexing access to a variable
– Both require pinning / shallow serialization to overcome buffer pool serialization
– Example Type 2:
for(i in 1:nrow(X))
for(j in 1:ncol(X))
X[i,j] = i+j;
§ Where we cannot apply Update In-Place
– Matrix object does not fit into the local memory budget (CP only)
– Interleaving operations (mix of update and reference, might be non-obvious)
– Example (update in-place would create incorrect results!):
R = X;
X[i,j] = i+j;
y = sum(R);
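Python lists can show why interleaving breaks update in-place (the helpers are hypothetical). Under value semantics, the copy-on-write update leaves R = X untouched. A true in-place update of X is also visible through R, so y = sum(R) would silently change.

```python
def update_copy_on_write(M, i, j, value):
    """Value semantics: return an updated copy; other references to M are unaffected."""
    updated = [row[:] for row in M]
    updated[i][j] = value
    return updated


def update_in_place(M, i, j, value):
    """In-place update: mutates M, so every alias observes the change."""
    M[i][j] = value
    return M


def total(M):
    return sum(sum(row) for row in M)


X = [[0, 0], [0, 0]]
R = X                                        # R = X;  (R aliases the same object)
X = update_copy_on_write(X, 0, 0, 5)         # X[1,1] = 5;
print(total(R))                              # y = sum(R); -> 0, as intended

X = [[0, 0], [0, 0]]
R = X
X = update_in_place(X, 0, 0, 5)
print(total(R))                              # -> 5: R changed behind the script's back
```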
Hands-On Lab: Update In-Place
§ Exercise: Update In-Place (SystemML master/0.11 only):
– a) Update in-place application (investigate -explain and -stats)
– b) Update in-place not applicable: why?
for(i in 1:nrow(X))
for(j in 1:ncol(X))
X[i,j] = i+j;
for(i in 1:nrow(X)) {
for(j in 1:ncol(X)) {
print(sum(X));
X[i,j] = i+j;
}
}
Compressed Linear Algebra
§ Motivation / Problem
– Iterative ML algorithms w/ repeated read-only data access
– IO-bound matrix-vector multiplications → crucial to fit data in memory
– General-purpose heavyweight/lightweight compression techniques: too slow or only modest compression ratios
§ Goals
– Performance close to uncompressed
– Good compression ratios
[A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]
Compressed Linear Algebra (2)
§ Approach
– Database compression
– LA over compressed representation
– Column-compression schemes (OLE, RLE, UC)
– Cache-conscious CLA ops
– Sampling-based compression algorithm
§ Results
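Runtime in seconds (ULA = uncompressed linear algebra, Snappy = Snappy-compressed, CLA = compressed linear algebra)
Algorithm  Dataset             ULA       Snappy    CLA
GLM        Mnist40m (90GB)     409       647       397
GLM        Mnist240m (540GB)   74,301    23,717    2,787
MLogreg    Mnist40m (90GB)     630       875       622
MLogreg    Mnist240m (540GB)   83,153    27,626    4,379
L2SVM      Mnist40m (90GB)     394       461       429
L2SVM      Mnist240m (540GB)   14,041    8,423     2,593
→ Up to 26x faster than ULA (GLM, Mnist240m)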
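The idea of linear algebra over a compressed representation can be sketched with a simplified, value-grouped run-length encoding in the spirit of the RLE scheme listed above. The function names and layout are hypothetical. The actual CLA formats (OLE, RLE, UC), cache-conscious operations, and sampling-based compression planning are not modeled. The matrix-vector product runs directly on the compressed columns: each distinct value is multiplied by v[j] once, and the product is added to all rows in that value's runs.

```python
def compress_column(col):
    """Group a column by distinct non-zero value; each value keeps its (start, length) runs."""
    groups, start = {}, 0
    while start < len(col):
        end = start
        while end < len(col) and col[end] == col[start]:
            end += 1
        if col[start] != 0:                   # zeros contribute nothing to X %*% v
            groups.setdefault(col[start], []).append((start, end - start))
        start = end
    return groups


def compressed_matvec(columns, v, nrows):
    """q = X %*% v over compressed columns: value * v[j] is computed once per
    distinct value and added to every row covered by that value's runs."""
    q = [0.0] * nrows
    for j, groups in enumerate(columns):
        for value, runs in groups.items():
            contribution = value * v[j]
            for start, length in runs:
                for row in range(start, start + length):
                    q[row] += contribution
    return q


cols = [[1, 1, 1, 0, 2, 2], [3, 3, 0, 0, 0, 3]]   # X stored column-wise
compressed = [compress_column(c) for c in cols]
print(compressed)
print(compressed_matvec(compressed, [2, 1], nrows=6))
```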
SystemML is Open Source:
• Apache Incubator Project (11/2015)
• Website: http://systemml.apache.org/
• Source code: https://github.com/apache/incubator-systemml

NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
[214]유연하고 확장성 있는 빅데이터 처리
[214]유연하고 확장성 있는 빅데이터 처리[214]유연하고 확장성 있는 빅데이터 처리
[214]유연하고 확장성 있는 빅데이터 처리
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - Spark
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
Accelerating the Development of Efficient CP Optimizer Models
Accelerating the Development of Efficient CP Optimizer ModelsAccelerating the Development of Efficient CP Optimizer Models
Accelerating the Development of Efficient CP Optimizer Models
 

More from Arvind Surve

Apache SystemML Architecture by Niketan Panesar
Apache SystemML Architecture by Niketan PanesarApache SystemML Architecture by Niketan Panesar
Apache SystemML Architecture by Niketan PanesarArvind Surve
 
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by  Prithviraj SenClustering and Factorization using Apache SystemML by  Prithviraj Sen
Clustering and Factorization using Apache SystemML by Prithviraj SenArvind Surve
 
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by  Alexandre V EvfimievskiClustering and Factorization using Apache SystemML by  Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by Alexandre V EvfimievskiArvind Surve
 
Classification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenClassification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenArvind Surve
 
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Arvind Surve
 
DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation processArvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldApache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldArvind Surve
 
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Arvind Surve
 
Regression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiRegression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiArvind Surve
 
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldApache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldArvind Surve
 

More from Arvind Surve (11)

Apache SystemML Architecture by Niketan Panesar
Apache SystemML Architecture by Niketan PanesarApache SystemML Architecture by Niketan Panesar
Apache SystemML Architecture by Niketan Panesar
 
Clustering and Factorization using Apache SystemML by Prithviraj Sen
Clustering and Factorization using Apache SystemML by  Prithviraj SenClustering and Factorization using Apache SystemML by  Prithviraj Sen
Clustering and Factorization using Apache SystemML by Prithviraj Sen
 
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by  Alexandre V EvfimievskiClustering and Factorization using Apache SystemML by  Alexandre V Evfimievski
Clustering and Factorization using Apache SystemML by Alexandre V Evfimievski
 
Classification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj SenClassification using Apache SystemML by Prithviraj Sen
Classification using Apache SystemML by Prithviraj Sen
 
Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...Data preparation, training and validation using SystemML by Faraz Makari Mans...
Data preparation, training and validation using SystemML by Faraz Makari Mans...
 
DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation process
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldApache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold Reinwald
 
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias...
 
Regression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V EvfimievskiRegression using Apache SystemML by Alexandre V Evfimievski
Regression using Apache SystemML by Alexandre V Evfimievski
 
Apache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold ReinwaldApache SystemML 2016 Summer class primer by Berthold Reinwald
Apache SystemML 2016 Summer class primer by Berthold Reinwald
 

Recently uploaded

Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 

Recently uploaded (20)

Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 

Apache SystemML Optimizer and Runtime techniques by Arvind Surve and Matthias Boehm

  • 1. S7/8: SystemML's Optimizer and Runtime
    Matthias Boehm (1), Arvind C. Surve (2)
    (1) IBM Research – Almaden
    (2) IBM Spark Technology Center
    © 2015 IBM Corporation
  • 2. Abstraction: The Good, the Bad and the Ugly
    q = t(X) %*% (w * (X %*% v))
    [adapted from Peter Alvaro: "I See What You Mean", Strange Loop, 2015]
    Expectations: Simple & Analysis-Centric, Data Independence, Platform Independence, Adaptivity, Efficiency & Performance
    Reality: (Missing) Size Information, Operator Selection, (Missing) Rewrites, Distributed Operations, Distributed Storage, (Implicit) Copy-on-Write, Data Skew, Load Imbalance, Latency, Complex Control Flow, Local/Remote Memory Budgets
    The Ugly: Expectations ≠ Reality
    → Understanding of the optimizer and runtime techniques underpinning declarative, large-scale ML
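To make the slide's expression concrete, here is a minimal pure-Python sketch (not SystemML code) that evaluates q = t(X) %*% (w * (X %*% v)) right to left, so only matrix-vector products are needed, and checks it against a naive plan that materializes the n x n matrix t(X) %*% diag(w) %*% X first. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(X: Matrix, v: Vector) -> Vector:
    """X %*% v for a dense row-major matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def tmatvec(X: Matrix, u: Vector) -> Vector:
    """t(X) %*% u without materializing the transpose."""
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out


def q_fused(X: Matrix, w: Vector, v: Vector) -> Vector:
    """Right-to-left plan: two matrix-vector products and one elementwise product."""
    xv = matvec(X, v)
    return tmatvec(X, [wi * xi for wi, xi in zip(w, xv)])


def q_naive(X: Matrix, w: Vector, v: Vector) -> Vector:
    """Naive plan: build t(X) %*% diag(w) %*% X (n x n), then multiply by v."""
    n = len(X[0])
    G = [[sum(X[r][i] * w[r] * X[r][j] for r in range(len(X))) for j in range(n)]
         for i in range(n)]
    return matvec(G, v)


X = [[1, 2], [3, 4], [5, 6]]
w = [1.0, 0.5, 2.0]
v = [1.0, -1.0]
print(q_fused(X, w, v), q_naive(X, w, v))
```

Both plans compute the same q, but for an m x n input the fused plan costs O(mn) and never allocates an n x n intermediate, which is the kind of decision the declarative abstraction leaves to the optimizer.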
  • 3. Outline
    § Common Framework
    § Optimizer-Centric Techniques
    § Runtime-Centric Techniques
      – ParFor Optimizer/Runtime
      – Buffer Pool + Specific Optimizations
      – Spark-Specific Rewrites
      – Partitioning-Preserving Operations
      – Update In-Place
      – Ongoing Research (CLA)
  • 4. Optimization through ParFor
    § Motivation
      – SystemML focuses primarily on data parallelism
      – Dedicated parfor construct for task parallelism
    § ParFor approach
      – Complementary parfor parallelization strategies
      – Cost-based optimization framework for task-parallel ML
      – Memory budget as common constraint
  • 5. Recap: Basic HOP DAG Compilation
    Example: Pearson Correlation, $\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$
    § DML Script
        X = read("./in/X"); # data on HDFS
        Y = read("./in/Y");
        m = nrow(X);
        sigmaX = sqrt( centralMoment(X,2)*(m/(m-1.0)) );
        sigmaY = sqrt( centralMoment(Y,2)*(m/(m-1.0)) );
        r = cov(X,Y) / (sigmaX * sigmaY);
        write(r, "./out/r");
    § HOP DAG
        [Figure: HOP DAG of the script above: inputs X ("./in/X") and Y ("./in/Y"), each 10^6 x 1; operators b(cov), b(cm), b(*), b(/), b(-), u(sqrt) with literals 1,000,000, 1 and 2; output r ("./out/r"). The m/(m-1.0) subexpression is shown w/o constant folding (folded value: 1.000001).]
        Legend: u() … unary operator; b() … binary operator; cov … covariance; cm … central moment; sqrt … square root
    → Exploit Spark/MR data parallelism if beneficial/required
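The script's arithmetic can be mirrored in a few lines of pure Python. This is a sketch, not SystemML's implementation: central_moment, sample_cov and pearson are hypothetical helpers, and sample_cov assumes DML's cov uses the (m - 1) denominator, which is what makes it consistent with the m/(m-1.0) correction applied to the central moments.

```python
import math
from typing import Sequence


def central_moment(x: Sequence[float], order: int) -> float:
    """Population central moment: mean of (x - mean(x)) ** order."""
    mu = sum(x) / len(x)
    return sum((xi - mu) ** order for xi in x) / len(x)


def sample_cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with the (m - 1) denominator (assumed to match DML's cov)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (m - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    m = len(x)
    # m / (m - 1.0) turns the population second moment into the sample variance
    sigma_x = math.sqrt(central_moment(x, 2) * (m / (m - 1.0)))
    sigma_y = math.sqrt(central_moment(y, 2) * (m / (m - 1.0)))
    return sample_cov(x, y) / (sigma_x * sigma_y)


print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))
```

For m = 10^6 the factor m/(m-1.0) is the 1.000001 constant that constant folding produces in the HOP DAG.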
  • 6. Running Example: Pairwise Pearson Correlation
    § Representative for more complex bivariate statistics (Pearson's R, ANOVA F, Chi-squared, Degree of freedom, P-value, Cramér's V, Spearman, etc.)
        D = read("./input/D");
        m = nrow(D);
        n = ncol(D);
        R = matrix(0, rows=n, cols=n);
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          m2X = centralMoment(X,2);
          sigmaX = sqrt( m2X*(m/(m-1.0)) );
          parfor( j in (i+1):n ) {
            Y = D[ ,j];
            m2Y = centralMoment(Y,2);
            sigmaY = sqrt( m2Y*(m/(m-1.0)) );
            R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
          }
        }
        write(R, "./output/R");
    Challenges:
      • Triangular nested loop
      • Column-wise access on unordered distributed data
      • Bivariate all-to-all data shuffling pattern
    → Exploit task and data parallelism if beneficial/required
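The structure of the triangular parfor can be sketched in Python with a thread pool standing in for parallel workers. This is an illustration under assumptions, not SystemML's runtime: pairwise_pearson is a hypothetical name, and because of CPython's GIL the threads show independence of iterations rather than real speedup. Each outer iteration writes only row i of R, which is what makes the loop safe to run in parallel.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's r; the 1/(m-1) factors cancel, so plain sums of deviations suffice."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_pearson(D: List[List[float]], workers: int = 4) -> List[List[float]]:
    """Upper-triangular R[i][j] = r(D[, i], D[, j]) for i < j (0-based indices here)."""
    n = len(D[0])
    cols = [[row[c] for row in D] for c in range(n)]  # column-wise access D[ ,i]
    R = [[0.0] * n for _ in range(n)]

    def outer_iteration(i: int) -> None:
        # Disjoint writes: iteration i only touches row i of R
        for j in range(i + 1, n):
            R[i][j] = pearson(cols[i], cols[j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(outer_iteration, range(n - 1)))
    return R


D = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0]]
R = pairwise_pearson(D)
```

Note that the outer iterations carry unequal work (n-1-i inner iterations each), which is exactly why the task partitioning strategies on the next slides matter.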
  • 7. Overview: Parallelization Strategies
    § Conceptual Design: Master/worker
      – Task: group of parfor iterations
    § Task Partitioning
      – Naive, static, fixed, factoring, factoring_cmax
      – Task overhead vs. load balance?
    § Task Execution
      – Local, remote (Spark/MR), remoteDP (Spark/MR)
      – Various runtime optimizations
      – Degree of parallelism/IO/latency?
    § Result Aggregation
      – Local memory, local file, remote (Spark/MR)
      – W/ and w/o compare
      – Result locality/IO/latency?
    Example (n = 12):
        parfor( i in 1:(n-1) ) { X = D[ ,i]; … R[i,j] = … }
    → Optimizer leverages these to generate efficient execution plans
  • 8. Example Task Partitioning
    § Scenario: k=24 workers, 10,000 iterations
    [Figure: number of iterations per task for each partitioner: Naive (1 iteration per task, 10,000 tasks), Fixed(250) (40 tasks), Static (24 tasks), Factoring (208 tasks), Factoring CMAX(150) (228 tasks).]
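The partitioners in the figure can be sketched as functions returning task sizes. This is a hedged sketch, not SystemML's code: the factoring variant assumes the classic factoring rule where each round creates k tasks of size ceil(remaining / (2k)), optionally capped at cmax; function names are hypothetical.

```python
import math
from typing import List, Optional


def naive(n_iter: int) -> List[int]:
    """One iteration per task."""
    return [1] * n_iter


def fixed(n_iter: int, size: int) -> List[int]:
    """Tasks of a fixed size; the last task takes the remainder."""
    return [min(size, n_iter - s) for s in range(0, n_iter, size)]


def static(n_iter: int, k: int) -> List[int]:
    """At most k tasks of size ceil(n_iter / k), roughly one per worker."""
    return fixed(n_iter, math.ceil(n_iter / k))


def factoring(n_iter: int, k: int, cmax: Optional[int] = None) -> List[int]:
    """Rounds of k tasks with size ceil(remaining / (2k)), optionally capped at cmax."""
    sizes: List[int] = []
    remaining = n_iter
    while remaining > 0:
        chunk = max(1, math.ceil(remaining / (2 * k)))
        if cmax is not None:
            chunk = min(chunk, cmax)
        for _ in range(k):
            if remaining == 0:
                break
            size = min(chunk, remaining)
            sizes.append(size)
            remaining -= size
    return sizes


print(len(static(10000, 24)), len(fixed(10000, 250)), len(factoring(10000, 24)))
```

For the slide's scenario this sketch yields 24 static, 40 fixed, 10,000 naive, and 208 factoring tasks; the exact task count of the capped variant depends on implementation details. Factoring starts with large tasks (low overhead) and shrinks them geometrically so that late tasks are small enough to balance load.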
  • 9. Task Execution: Local and Remote Parallelism
    § Local execution (multicore): ParFOR (local) with Local ParWorker 1 … k
        while(w ← deq())
          foreach p_i ∈ w
            execute(prog(p_i))
    § Remote execution (cluster): ParFOR (remote) with Hadoop ParWorker Mapper 1 … k
        map(key, value)
          w ← parse(value)
          foreach p_i ∈ w
            execute(prog(p_i))
    § In both cases: Task Partitioning → tasks (w1: i, {1,2,3}; w2: i, {4,5,6}; w3: i, {7,8}; w4: i, {9,10}; w5: i, {11}) → Parallel Result Aggregation (remote result entries such as A|MATRIX|./out/A7tmp)
    → Hybrid parallelism: combinations of local/remote and data-parallel jobs
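A minimal sketch of the local (multicore) variant, assuming Python threads stand in for Local ParWorkers and a lock-guarded dict stands in for parallel result aggregation. run_local_parfor and its parameters are hypothetical names, not SystemML APIs; the task list reproduces w1 to w5 from the slide.

```python
import queue
import threading
from typing import Callable, Dict, List


def run_local_parfor(tasks: List[List[int]], body: Callable[[int], int], k: int) -> Dict[int, int]:
    """Master/worker sketch: k workers dequeue tasks (groups of iterations) and run the body."""
    task_queue: "queue.Queue[List[int]]" = queue.Queue()
    for w in tasks:  # the master fills the queue before starting the workers
        task_queue.put(w)
    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                w = task_queue.get_nowait()  # while(w <- deq())
            except queue.Empty:
                return
            for i in w:  # foreach p_i in w: execute(prog(p_i))
                value = body(i)
                with lock:  # stand-in for result aggregation
                    results[i] = value

    threads = [threading.Thread(target=worker) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


tasks = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10], [11]]
results = run_local_parfor(tasks, lambda i: i * i, k=3)
```

Pulling tasks from a shared queue (rather than pre-assigning them) lets fast workers take more tasks, which is how the partitioning choice translates into load balance.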
  • 10. Task Execution: Runtime Optimizations
    § Data Partitioning
      – Problem: Repeated MR jobs for indexed access
      – Access-awareness (cost estimation, correct plan generation)
      – Operators: local file-based, remote MR job
    § Data Locality
      – Problem: Co-location of parfor tasks to partitions/matrices
      – Location reporting per logical parfor task (e.g., for parfor(i) → D[ ,i])
    Example:
        parfor( i in 1:(n-1) ) { X = D[ ,i]; …
          parfor( j in (i+1):n ) { Y = D[ ,j]; … }}
    [Figure: file partitions of D distributed over Node 1 (D1, D2, D6, D7, D8) and Node 2 (D3, D4, D5, D9, D10, D11); tasks w1: i, {1,2,3}; w2: i, {4,5,6}; w3: i, {7,8}; w4: i, {9,10}; w5: i, {11} with their reported locations (Node 1, Node 2, or Node 1, 2).]
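Location reporting can be sketched as a lookup from a task's iterations to the nodes holding the partitions D[ ,i] it reads. The placement dictionary below is an assumption read off the figure, and reported_locations is a hypothetical helper; a scheduler could then prefer running each task on one of its reported nodes.

```python
from typing import Dict, List, Set

# Assumed placement of column partitions D1..D11, as drawn in the figure
PARTITION_NODES: Dict[int, str] = {
    1: "Node 1", 2: "Node 1", 6: "Node 1", 7: "Node 1", 8: "Node 1",
    3: "Node 2", 4: "Node 2", 5: "Node 2", 9: "Node 2", 10: "Node 2", 11: "Node 2",
}


def reported_locations(task: List[int], placement: Dict[int, str]) -> Set[str]:
    """A task over iterations i reads D[ ,i], so it reports the nodes holding those partitions."""
    return {placement[i] for i in task}


for name, task in [("w1", [1, 2, 3]), ("w3", [7, 8]), ("w5", [11])]:
    print(name, sorted(reported_locations(task, PARTITION_NODES)))
```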
  • 11. Optimization Framework – Problem Formulation
    § Design: Runtime optimization for each top-level parfor
    § Plan Tree P
      – Nodes N_P (exec type et, parallelism k, attributes A)
      – Height h
      – Exec contexts EC_P
    § Plan Tree Optimization Problem
    [Figure: plan tree with nested ParFOR and Generic nodes over RIX, LIX, b(cm) and b(cov) operators, assigned to execution contexts ec0 and ec1 (MR); constraints shown: cm_ec = 600 MB, ck_ec = 1 and cm_ec = 1024 MB, ck_ec = 16.]
    Legend: ec … execution context; cm … memory constraint; ck … parallelism constraint
    [M. Boehm et al.: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7), 2014]
    [M. Boehm et al.: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs. CoRR, 2015]
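The slide's formula did not survive extraction. A plausible formalization in the slide's own symbols, offered as an assumption rather than the paper's exact definition: find the plan tree with minimal estimated execution time $\hat{T}$ such that, in every execution context, the worst-case memory estimate $\hat{M}$ stays within $cm_{ec}$ and the degree of parallelism $K$ stays within $ck_{ec}$.

```latex
$$
\min_{P} \; \hat{T}(P)
\quad \text{s.t.} \quad
\forall\, ec \in EC_P:\;
\hat{M}(P, ec) \le cm_{ec}
\;\wedge\;
K(P, ec) \le ck_{ec}
$$
```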
  • 12. Optimization Framework – Cost Model / Optimizer
    § Overview Heuristic Optimizer
      – Time- and memory-based cost model w/o shared reads
      – Heuristic high-impact rewrites
      – Transformation-based search strategy with global opt scope
    § Cost Model
      – HOP DAG size propagation
      – Worst-case memory estimates
      – Time estimates
      – Plan tree statistics aggregation
    [Figure: plan tree P (k=4) with mapped HOP DAGs over D, RIX, b(cm), b(cov), X, Y and j; size annotations d1 = 1M, d2 = 10 (D), d1 = 1M, d2 = 1 (column vectors), d1 = 0, d2 = 0 (scalars); memory estimates M = (<output mem>, <operation mem>) of (80 MB, 80 MB), (8 MB, 8 MB), (8 MB, 88 MB), (0 MB, 8 MB) and (0 MB, 16 MB); aggregated estimates M = 88 MB and M = 352 MB.]
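The figure's numbers follow from simple worst-case arithmetic. This sketch assumes dense double-precision storage, decimal megabytes, that the 88 MB figure is the right-indexing operation X = D[ ,i] (80 MB input plus 8 MB output), and that k=4 concurrent workers multiply it (no shared reads). The helper names are hypothetical.

```python
from typing import List

BYTES_PER_DOUBLE = 8
MB = 1_000_000  # decimal megabytes, matching 80 MB for a 1M x 10 matrix


def dense_mem_mb(rows: int, cols: int) -> float:
    """Worst-case (dense, double precision) size of a rows x cols matrix in MB."""
    return rows * cols * BYTES_PER_DOUBLE / MB


def operation_mem_mb(input_sizes: List[float], output_size: float) -> float:
    """Operation memory: all inputs and the output must be in memory at the same time."""
    return sum(input_sizes) + output_size


def parfor_mem_mb(per_worker_mb: float, k: int) -> float:
    """Without shared reads, k concurrent workers each need the full per-worker estimate."""
    return per_worker_mb * k


d_mb = dense_mem_mb(1_000_000, 10)                              # D: 80 MB
rix_mb = operation_mem_mb([d_mb], dense_mem_mb(1_000_000, 1))   # X = D[ ,i]: 88 MB
print(d_mb, rix_mb, parfor_mem_mb(rix_mb, 4))
```

Ignoring shared reads makes the estimate pessimistic (each worker is charged for D), but it keeps the memory constraint safe.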
  • 13. Hands-On Lab: Task-Parallel ParFor Programs
    § Exercise: Pairwise Pearson Correlation
      – a) Simple for-loop w/ -stats
      – b) Task-parallel parfor w/ -stats
        D = rand(rows=100000, cols=100);
        m = nrow(D);
        n = ncol(D);
        R = matrix(0, rows=n, cols=n);
        parfor( i in 1:(n-1) ) {
          X = D[ ,i];
          m2X = centralMoment(X,2);
          sigmaX = sqrt( m2X*(m/(m-1.0)) );
          parfor( j in (i+1):n ) {
            Y = D[ ,j];
            m2Y = centralMoment(Y,2);
            sigmaY = sqrt( m2Y*(m/(m-1.0)) );
            R[i,j] = cov(X,Y) / (sigmaX*sigmaY);
          }
        }
        write(R, "./tmp/R", format="binary");
  • 15. Buffer Pool Overview
    § Motivation
      – Exchange of intermediates between local and remote operations (HDFS, RDDs, GPU device memory)
      – Eviction of in-memory objects (integrated with garbage collector)
    § Primitives
      – acquireRead, acquireModify, release, exportData, getRdd, getBroadcast
    § Spark Specifics
      – Lineage tracking RDDs/broadcasts
      – Guarded RDD collect/parallelize
      – Partitioned broadcast variables
    [Figure: MatrixObject/WriteBuffer with lineage tracking]
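The pin/release protocol behind acquireRead and release can be sketched as a toy buffer pool: acquired objects are pinned, and only unpinned objects are evicted (least recently used first) when the budget is exceeded. BufferPool, its snake_case methods, and the object-count budget are hypothetical simplifications; the real buffer pool budgets bytes and spills to local files or HDFS.

```python
from collections import OrderedDict
from typing import Dict, List


class BufferPool:
    """Toy buffer pool with pinning and LRU eviction of unpinned objects."""

    def __init__(self, budget: int) -> None:
        self.budget = budget  # max number of in-memory objects (simplification)
        self.memory: "OrderedDict[str, List[float]]" = OrderedDict()
        self.disk: Dict[str, List[float]] = {}  # stand-in for local files / HDFS
        self.pins: Dict[str, int] = {}

    def put(self, name: str, data: List[float]) -> None:
        self.memory[name] = data
        self.pins.setdefault(name, 0)
        self._evict()

    def acquire_read(self, name: str) -> List[float]:
        if name not in self.memory:  # restore an evicted object
            self.memory[name] = self.disk.pop(name)
        self.memory.move_to_end(name)  # mark as most recently used
        self.pins[name] += 1
        self._evict()
        return self.memory[name]

    def release(self, name: str) -> None:
        self.pins[name] -= 1
        self._evict()

    def _evict(self) -> None:
        for name in list(self.memory):  # least recently used first
            if len(self.memory) <= self.budget:
                break
            if self.pins[name] == 0:  # pinned objects are never evicted
                self.disk[name] = self.memory.pop(name)


bp = BufferPool(budget=2)
bp.put("A", [1.0])
bp.put("B", [2.0])
a = bp.acquire_read("A")  # A is pinned
bp.put("C", [3.0])        # over budget: B (unpinned, LRU) is spilled
```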
  • 17. Spark-Specific Optimizations
    § Spark-Specific Rewrites
      – Automatic caching/checkpoint injection (MEM_DISK / MEM_DISK_SER)
      – Automatic repartition injection
    § Operator Selection
      – Spark exec type selection
      – Transitive Spark exec type
      – Physical operator selection
    § Extended ParFor Optimizer
      – Deferred checkpoint/repartition injection
      – Eager checkpointing/repartitioning
      – Fair scheduling for concurrent jobs
      – Local degree of parallelism
    § Runtime Optimizations
      – Lazy Spark context creation
      – Short-circuit read/collect
    Example: checkpoint injection in LinregCG (injected: chkpt X MEM_DISK)
        X = read($1);
        y = read($2);
        ...
        r = -(t(X) %*% y);
        while(i < maxi & norm_r2 > norm_r2_trgt) {
          q = t(X) %*% (X %*% p) + lambda * p;
          alpha = norm_r2 / (t(p) %*% q);
          w = w + alpha * p;
          old_norm_r2 = norm_r2;
          r = r + alpha * q;
          norm_r2 = sum(r * r);
          beta = norm_r2 / old_norm_r2;
          p = -r + beta * p;
          i = i + 1;
        }
        ...
        write(w, $4);
    [Figure: Spark executor (24 cores) memory: 25% user, 75% data & execution (50% min & 75% max)]
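The LinregCG loop is conjugate gradient on the normal equations (t(X) %*% X + lambda * I) w = t(X) %*% y. A minimal pure-Python sketch under assumptions: linreg_cg and the tol parameter (standing in for norm_r2_trgt) are hypothetical, and the dense helpers replace DML's distributed operators. Each iteration reads X twice (X %*% p and t(X) %*% ...), which is why caching X via an injected checkpoint pays off on Spark.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(X: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def tmatvec(X: Matrix, u: Vector) -> Vector:
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linreg_cg(X: Matrix, y: Vector, lam: float = 0.0, maxi: int = 100, tol: float = 1e-12) -> Vector:
    w = [0.0] * len(X[0])
    r = [-v for v in tmatvec(X, y)]  # r = -(t(X) %*% y), residual at w = 0
    p = [-v for v in r]
    norm_r2 = dot(r, r)
    norm_r2_trgt = tol * norm_r2     # hypothetical stopping target
    i = 0
    while i < maxi and norm_r2 > norm_r2_trgt:
        # q = t(X) %*% (X %*% p) + lambda * p: two passes over X per iteration
        q = [a + lam * b for a, b in zip(tmatvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        old_norm_r2 = norm_r2
        r = [ri + alpha * qi for ri, qi in zip(r, q)]
        norm_r2 = dot(r, r)
        beta = norm_r2 / old_norm_r2
        p = [-ri + beta * pi for ri, pi in zip(r, p)]
        i += 1
    return w


w = linreg_cg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
```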
  • 18. SystemML on Spark: Lessons Learned
    § Spark over Custom Framework
      – Well-engineered framework with strong contributor base
      – Seamless data preparation and feature engineering
    § Stateful Distributed Caching
      – Standing executors with distributed caching and fast task scheduling
      – Challenges: task parallelism, memory constraints, fair resource management
    § Memory Efficiency
      – Compact data structures to avoid cache spilling (serialization, CSR)
      – Custom serialization and compression
    § Lazy RDD Evaluation
      – Automatic grouping of operations into distributed jobs, incl. partitioning
      – Challenges: multiple actions/repeated execution, runtime plan compilation!
    § Declarative ML
      – Introduction of the Spark backend did not require algorithm changes!
      – Automatically exploit distributed caching and partitioning via rewrites
  • 20. Partitioning-Preserving Operations on Spark
    § Partitioning-preserving ops
      – Op is partitioning-preserving if key not changed (guaranteed)
      – 1) Implicit: Use restrictive APIs (mapValues() vs. mapToPair())
      – 2) Explicit: Partition computation w/ declaration of partitioning-preserving (memory efficiency via "lazy iterators")
    § Partitioning-exploiting ops
      – 1) Implicit: Operations based on join, cogroup, etc.
      – 2) Explicit: Custom physical operators on original keys (e.g., zipmm)
    [Figure: Physical Blocking and Partitioning]
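The difference between a value-only map and a key-changing map can be simulated with a toy keyed dataset. This is not Spark: PairDataset, map_values, map_pairs and the shuffle flag are hypothetical stand-ins chosen to mirror the mapValues() vs. mapToPair() distinction. A value-only map keeps the partitioner, so a later join with an identically partitioned dataset needs no shuffle; a key-changing map must drop the partitioner.

```python
from typing import Callable, Dict, List, Optional, Tuple

KV = Tuple[int, float]


class PairDataset:
    """Toy keyed dataset; `partitioner` is None when the key layout is unknown."""

    def __init__(self, partitions: List[List[KV]], partitioner: Optional[int]) -> None:
        self.partitions = partitions
        self.partitioner = partitioner  # number of hash partitions, if hash-partitioned

    @staticmethod
    def hash_partition(items: List[KV], n: int) -> "PairDataset":
        parts: List[List[KV]] = [[] for _ in range(n)]
        for k, v in items:
            parts[hash(k) % n].append((k, v))
        return PairDataset(parts, n)

    def map_values(self, f: Callable[[float], float]) -> "PairDataset":
        # Keys cannot change, so the partitioning is preserved (cf. mapValues())
        return PairDataset([[(k, f(v)) for k, v in p] for p in self.partitions], self.partitioner)

    def map_pairs(self, f: Callable[[KV], KV]) -> "PairDataset":
        # Keys may change, so the partitioning is unknown afterwards (cf. mapToPair())
        return PairDataset([[f(kv) for kv in p] for p in self.partitions], None)

    def join(self, other: "PairDataset") -> Tuple[Dict[int, Tuple[float, float]], bool]:
        """Joined values plus a flag telling whether a real system would need a shuffle."""
        shuffle = self.partitioner is None or self.partitioner != other.partitioner
        left = dict(kv for p in self.partitions for kv in p)
        right = dict(kv for p in other.partitions for kv in p)
        return {k: (left[k], right[k]) for k in left if k in right}, shuffle


a = PairDataset.hash_partition([(i, float(i)) for i in range(6)], 3)
b = PairDataset.hash_partition([(i, 1.0) for i in range(6)], 3)
joined, needs_shuffle = a.map_values(lambda v: 2 * v).join(b)
```

Even an identity function passed to map_pairs loses the partitioner, because the system cannot prove the keys are unchanged; that is the point of the "guaranteed" qualifier on the slide.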
  • 21. Partitioning-Exploiting ZIPMM
    § Operation: Z = t(X) %*% y
    § Approach: Naive
      – Operations: Transpose, Join, Multiplication
      – Shuffle: partitions are not preserved after the transpose because the block keys change (e.g., blocks 1,1 1,2 1,3 become 1,1 2,1 3,1)
    § Approach: zipmm
      – Operations: Join, Transpose & Multiplication
      – Avoids the unnecessary shuffle by joining X and y on their original keys
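The zipmm idea can be sketched with blocks keyed by row-block index: X and y share the same keys, so they are joined without repartitioning, each pair is transposed and multiplied locally, and the partial products are summed. This is a hedged sketch, not SystemML's operator; block_tmm and zipmm are hypothetical names, and it assumes X has a single column block (ncol(X) ≤ Bc), so X's keys align with y's.

```python
from typing import Dict, List

Block = List[List[float]]  # dense block, row-major


def block_tmm(Xb: Block, yb: Block) -> Block:
    """t(Xb) %*% yb for one pair of row-aligned blocks."""
    cols_x, cols_y = len(Xb[0]), len(yb[0])
    out = [[0.0] * cols_y for _ in range(cols_x)]
    for xr, yr in zip(Xb, yb):
        for i in range(cols_x):
            for j in range(cols_y):
                out[i][j] += xr[i] * yr[j]
    return out


def zipmm(X: Dict[int, Block], y: Dict[int, Block]) -> Block:
    """Z = t(X) %*% y: join on row-block keys, local transpose-multiply, then sum."""
    keys = sorted(X.keys() & y.keys())  # join on the original (unchanged) keys
    Z = block_tmm(X[keys[0]], y[keys[0]])
    for key in keys[1:]:
        part = block_tmm(X[key], y[key])
        Z = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(Z, part)]
    return Z


X = {1: [[1, 2], [3, 4]], 2: [[5, 6]]}  # row blocks of a 3 x 2 matrix
y = {1: [[1], [1]], 2: [[2]]}           # matching row blocks of a 3 x 1 vector
Z = zipmm(X, y)
```

The result is only ncol(X) x 1, so the final aggregation is cheap compared to shuffling X after a transpose.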
  • 22. Example Multiclass SVM
    § Example: Multiclass SVM
      – Vectors of size nrow(X) fit neither into the driver nor into a broadcast (MapMM not applicable)
      – ncol(X) ≤ Bc (zipmm applicable)
        parfor(iter_class in 1:num_classes) {
          Y_local = 2 * (Y == iter_class) - 1;
          g_old = t(X) %*% Y_local;
          ...
          while( continue ) {
            Xd = X %*% s;
            ... inner while loop (compute step_sz)
            Xw = Xw + step_sz * Xd;
            out = 1 - Y_local * Xw;
            out = (out > 0) * out;
            g_new = t(X) %*% (out * Y_local)
            ...
          }
        }
    Optimizer decisions annotated on the slide: repart, chkpt X MEM_DISK; chkpt y_local MEM_DISK; zipmm; chkpt Xd, Xw MEM_DISK
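The per-class work inside the parfor can be sketched in pure Python: the one-vs-rest label encoding Y_local = 2 * (Y == iter_class) - 1 and the data term t(X) %*% (out * Y_local) with out = max(0, 1 - Y_local * Xw). The function names are hypothetical, and the step-size search and anything elided by "..." on the slide are not reproduced. The sketch makes visible that every iteration scans X again, which is why X is checkpointed and t(X) %*% (...) is compiled to zipmm.

```python
from typing import List


def one_vs_rest_labels(Y: List[int], c: int) -> List[float]:
    """Y_local = 2 * (Y == c) - 1: +1 for class c, -1 otherwise."""
    return [2.0 * (label == c) - 1.0 for label in Y]


def hinge_gradient_term(X: List[List[float]], y_local: List[float], Xw: List[float]) -> List[float]:
    """out = max(0, 1 - y_local * Xw); returns t(X) %*% (out * y_local)."""
    out = [max(0.0, 1.0 - yi * xwi) for yi, xwi in zip(y_local, Xw)]  # (out > 0) * out
    g = [0.0] * len(X[0])
    for row, oi, yi in zip(X, out, y_local):
        for j, a in enumerate(row):
            g[j] += a * oi * yi
    return g


y_local = one_vs_rest_labels([1, 2, 3, 2], 2)
g = hinge_gradient_term([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], [0.5, 2.0])
```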
  • 23. Hands-On Lab: Partitioning-Preserving Operations
    § Exercise: MultiClass SVM
      – W/o repartition injection
      – W/ repartition injection
        parfor(iter_class in 1:num_classes) {
          Y_local = 2 * (Y == iter_class) - 1;
          g_old = t(X) %*% Y_local;
          ...
          while( continue ) {
            Xd = X %*% s;
            ... inner while loop (compute step_sz)
            Xw = Xw + step_sz * Xd;
            out = 1 - Y_local * Xw;
            out = (out > 0) * out;
            g_new = t(X) %*% (out * Y_local)
            ...
          }
        }
  • 25. Update In-Place
    § Loop update in-place
      – 1) ParFor result indexing / intermediates (with pinned matrix objects)
      – 2) For/while/parfor loops with pure left-indexing access to a variable
      – Both require pinning / shallow serialization to overcome buffer-pool serialization
      – Example (type 2):
          for(i in 1:nrow(X))
            for(j in 1:ncol(X))
              X[i,j] = i+j;
    § Where update in-place cannot be applied
      – The matrix object does not fit into the local memory budget (CP only)
      – Interleaving operations (a mix of updates and references, which might be non-obvious)
      – Example: R and X refer to the same matrix object, so updating X in place would also change R, and sum(R) would produce an incorrect result:
          R = X;
          X[i,j] = i+j;
          y = sum(R);
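The interleaving problem can be illustrated with a reference-counted copy-on-write model in Python. This is a hypothetical runtime-side simplification (`MatrixObject`, `assign`, and `left_index` are illustrative names); the loop update in-place described on the slide is decided by the compiler, not by a runtime check like this one.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MatrixObject:
    data: List[float]   # flattened cells of a small matrix
    refs: int = 1       # number of variables currently bound to this object

def assign(env: Dict[str, MatrixObject], dst: str, src: str) -> None:
    """R = X: bind dst to the same object without copying (copy-on-write)."""
    obj = env[src]
    obj.refs += 1
    env[dst] = obj

def left_index(env: Dict[str, MatrixObject], name: str, idx: int, value: float,
               force_in_place: bool = False) -> None:
    """X[idx] = value: update in place only if no other variable shares the object."""
    obj = env[name]
    if obj.refs == 1 or force_in_place:
        obj.data[idx] = value                  # in-place: no copy
    else:
        obj.refs -= 1                          # detach `name` from the shared object
        copy = MatrixObject(list(obj.data))    # copy-on-write
        copy.data[idx] = value
        env[name] = copy
```

With `R = X; X[i,j] = i+j; y = sum(R);`, the safe path copies X, so sum(R) still sees the old values. Forcing the update in place (`force_in_place=True`) makes sum(R) observe the new cell, which is the incorrect result from the slide. In the pure left-indexing loop, the object is never shared, so every update stays in place.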
  • 26. Hands-On Lab: Update In-Place
    § Exercise: update in-place (SystemML master/0.11 only)
      – a) Update in-place applied (investigate -explain and -stats):
          for(i in 1:nrow(X))
            for(j in 1:ncol(X))
              X[i,j] = i+j;
      – b) Update in-place not applicable: why?
          for(i in 1:nrow(X)) {
            for(j in 1:ncol(X)) {
              print(sum(X));
              X[i,j] = i+j;
            }
          }
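For exercise b), a simplified version of the type-2 rule from slide 25 can be written as a check over a flattened loop body. The `Stmt` representation and `pure_left_indexing` are hypothetical; they only capture "the variable is written via left indexing and never read", not SystemML's actual analysis.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Stmt:
    reads: FrozenSet[str]           # variables read (excluding the left-indexed target itself)
    target: Optional[str] = None    # variable written, or None (e.g. print)
    left_indexed: bool = False      # True for writes of the form X[i,j] = ...

def pure_left_indexing(var: str, body: List[Stmt]) -> bool:
    """True if the (flattened) loop body writes `var` only via left indexing and never reads it."""
    written = False
    for s in body:
        if var in s.reads:
            return False            # a reference such as sum(X) interleaves with the updates
        if s.target == var:
            if not s.left_indexed:
                return False        # a full reassignment X = ... is not a pure update
            written = True
    return written
```

Applied to exercise b), the statement print(sum(X)) reads X between updates, so the check fails, whereas the body of exercise a) only contains X[i,j] = i+j.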
  • 27. Outline (repeated from slide 3)
  • 28. Compressed Linear Algebra
    § Motivation / problem
      – Iterative ML algorithms with repeated read-only data access
      – IO-bound matrix-vector multiplications → crucial to fit the data in memory
      – General-purpose compression: heavyweight techniques are too slow, lightweight techniques achieve only modest compression ratios
    § Goals
      – Performance close to uncompressed
      – Good compression ratios
    [A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]
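  • 29. Compressed Linear Algebra (2)
    § Approach
      – Database compression techniques
      – Linear algebra over the compressed representation
      – Column-compression schemes: OLE (offset-list encoding), RLE (run-length encoding), UC (uncompressed columns)
      – Cache-conscious CLA operations
      – Sampling-based compression algorithm
    § Results (runtimes in seconds; ULA = uncompressed linear algebra)
      | Algorithm | Dataset           | ULA    | Snappy | CLA   |
      |-----------|-------------------|--------|--------|-------|
      | GLM       | Mnist40m (90GB)   | 409    | 647    | 397   |
      | GLM       | Mnist240m (540GB) | 74,301 | 23,717 | 2,787 |
      | MLogreg   | Mnist40m (90GB)   | 630    | 875    | 622   |
      | MLogreg   | Mnist240m (540GB) | 83,153 | 27,626 | 4,379 |
      | L2SVM     | Mnist40m (90GB)   | 394    | 461    | 429   |
      | L2SVM     | Mnist240m (540GB) | 14,041 | 8,423  | 2,593 |
      – Up to 26x faster than ULA (GLM on Mnist240m: 74,301 s vs. 2,787 s)
    [A. Elgohary, M. Boehm, P. J. Haas, F. R. Reiss, B. Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning. PVLDB 9(12), 2016]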
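The OLE and RLE formats, and matrix-vector multiplication directly over the compressed representation, can be sketched for single columns in Python. This is an illustration under simplifying assumptions: the CLA paper's formats include refinements such as column co-coding and segmented offset lists that are omitted here, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def ole_encode(col: List[float]) -> Dict[float, List[int]]:
    """Offset-list encoding: distinct non-zero value -> sorted row offsets (zeros implicit)."""
    offsets = defaultdict(list)
    for r, val in enumerate(col):
        if val != 0:
            offsets[val].append(r)
    return dict(offsets)

def rle_encode(col: List[float]) -> Dict[float, List[Tuple[int, int]]]:
    """Run-length encoding: distinct non-zero value -> list of (start row, run length)."""
    runs = defaultdict(list)
    r = 0
    while r < len(col):
        start, val = r, col[r]
        while r < len(col) and col[r] == val:
            r += 1
        if val != 0:
            runs[val].append((start, r - start))
    return dict(runs)

def matvec_ole(cols: List[Dict[float, List[int]]], v: List[float], nrow: int) -> List[float]:
    """q = X %*% v computed directly on OLE-encoded columns, without decompressing."""
    q = [0.0] * nrow
    for c, groups in enumerate(cols):
        for val, rows in groups.items():
            contrib = val * v[c]        # one multiply per distinct value, not per row
            for r in rows:
                q[r] += contrib
    return q
```

The multiply touches each distinct value once to form val * v[c] and then only scatters it to the stored offsets, which is why columns with few distinct values both compress well and stay fast to operate on.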
  • 30. SystemML is Open Source
    – Apache Incubator Project (11/2015)
    – Website: http://systemml.apache.org/
    – Source code: https://github.com/apache/incubator-systemml