SystemML Architecture
Niketan Pansare, Berthold Reinwald
July 25th, 2016

Apache SystemML Architecture by Niketan Pansare

This deck presents the high-level design and architecture of Apache SystemML, covering its language, compiler, and runtime modules. It describes how the compilation chain is generated and how variable analysis is done, shows the HOPs and runtime plan for a sample use case, and shows how to get statistics and use some of the diagnostic tools.

  1. SystemML Architecture. Niketan Pansare, Berthold Reinwald. July 25th, 2016.
  2. Agenda (from http://systemml.apache.org/)
      • High-level Design & APIs
      • Architecture Overview
      • Tooling
      • Important links
  3. Agenda
      • High-level Design & APIs
      • Architecture Overview
        • Language
        • Compiler
        • Runtime
        • Two examples:
          • Simple DML expression with an example dataset
          • Linear Regression with varying data sizes
      • Tooling
      • Important links
  4. Agenda (repeats slide 3)
  5. SystemML Design
      • DML Scripts, written in DML (Declarative Machine Learning Language), and Data
      • Data: 1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][]
      • SystemML: hybrid execution plans*
          CP + b sb _mVar1
          SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
          CP * y _mVar2 _mVar3
      • In-Memory Single Node (scale-up), since 2012
      • Hadoop (since 2010) or Spark (since 2015) Cluster (scale-out)
  6. SystemML Design
      • Hadoop or Spark Cluster (scale-out), since 2010
      • In-Memory Single Node (scale-up), since 2012
      • DML Scripts and Data (1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][])
      • Command line API* (also MLContext*): -exec hadoop
  7. SystemML Design
      • Hadoop or Spark Cluster (scale-out)
      • In-Memory Single Node (scale-up), since 2012
      • DML Scripts and Data (1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][])
      • Command line API* (also MLContext*). Two options:
        1. -exec singlenode
        2. Use standalone jar (preserves rewrites, but may spawn Local MR jobs)
  8. SystemML Design
      • Spark Cluster (scale-out), since 2015
      • In-Memory Single Node (scale-up), since 2012
      • DML Scripts and Data (1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][])
      • Command line API* (also MLContext*)
  9. SystemML Design
      • Spark Cluster (scale-out), since 2015
      • In-Memory Single Node (scale-up), since 2012
      • DML Scripts and Data (1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][])
      • MLContext API (Java/Python/Scala): https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
  10. SystemML Design
      • In-Memory Single Node (scale-up), since 2012
      • DML Scripts and Data (1. On disk/HDFS; 2. RDD/DataFrame; 3. double[][])
      • JMLC API: https://apache.github.io/incubator-systemml/jmlc.html
  11. Agenda (repeats slide 3)
  12. From DML to Execution Plan
      • DML Scripts, written in DML (Declarative Machine Learning Language), and Data
      • SystemML: hybrid execution plans*
          CP + b sb _mVar1
          SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
          CP * y _mVar2 _mVar3
      • In-Memory Single Node (scale-up), since 2012
      • Hadoop (since 2010) or Spark (since 2015) Cluster (scale-out)
  13. From DML to Execution Plan
      • Layers: Language, Compiler, Runtime
      • DML Scripts, written in DML (Declarative Machine Learning Language), and Data
      • Hybrid execution plans*
          CP + b sb _mVar1
          SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
          CP * y _mVar2 _mVar3
      • Assuming an example dataset X: 100M x 500, y: 100M x 1, b/sb: 500 x 1
      • In-Memory Single Node (scale-up), since 2012
      • Hadoop (since 2010) or Spark (since 2015) Cluster (scale-out)
  14. SystemML Compilation Chain
  15. SystemML Compilation Chain
      • Parsing
        • Parse input DML/PyDML using Antlr v4 (see Dml.g4 and Pydml.g4)
        • Perform syntactic validation
        • Construct DMLProgram (=> list of Statement and function blocks)
      • Live Variable Analysis
        • Classic dataflow analysis
        • A variable is “live” if it holds a value that may be needed in the future
        • Dead code elimination
      • Semantic Validation
  16. SystemML Compilation Chain
      • Dataflow in DAGs of operations on matrices, frames, and scalars
      • Choosing from alternative execution plans based on memory and cost estimates
      • Operator ordering & selection; hybrid plans
  17. SystemML Compilation Chain
      * Discussed later in Tooling
        spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
  18. SystemML Compilation Chain
      • Low-level physical execution plan (LOPDags)
        • Over key-value pairs for MR
        • Over RDDs for Spark
      • “Piggybacking” operations into a minimal number of Map-Reduce jobs
  19. SystemML Compilation Chain (Spark)
        CP + b sb _mVar1
        SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
        CP * y _mVar2 _mVar3
  20. SystemML Runtime
      • Hybrid Runtime
        • CP: single machine operations & orchestrate jobs
        • MR: generic Map-Reduce jobs & operations
        • SP: Spark Jobs
      • Numerically stable operators
      • Dense / sparse matrix representation
      • Multi-Level buffer pool (caching) to evict in-memory objects
      • Dynamic Recompilation for initial unknowns
      • Diagram components: Control Program; Runtime Program; Buffer Pool; ParFor Optimizer/Runtime; MR Inst; Spark Inst; CP Inst; Recompiler; DFS IO; Mem/FS IO; Generic MR Jobs; MatrixBlock Library (single/multi-threaded)
  21. From DML to Execution Plan
      • Layers: Language, Compiler, Runtime
      • DML Scripts, written in DML (Declarative Machine Learning Language): LinearRegression.dml
      • Data: varying data sizes
      • Hybrid execution plans*
          CP + b sb _mVar1
          SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE
          CP * y _mVar2 _mVar3
      • In-Memory Single Node (scale-up), since 2012
      • Hadoop (since 2010) or Spark (since 2015) Cluster (scale-out)
  22. A Data Scientist – Linear Regression
      • Model: X w ≈ y, where X holds the explanatory/independent variables, y is the predicted/dependent variable, and w is the model
      • Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
      • Script structure: initialize; initial direction; iterate until convergence (step size, update w, next direction); accuracy measures
      • Conjugate Gradient Method:
        • Start off with the (negative) gradient
        • For each step:
          1. Move to the optimal point along the chosen direction;
          2. Recompute the gradient;
          3. Project it onto the subspace conjugate* to all prior directions;
          4. Use this as the next direction
        • (* conjugate = orthogonal given A as the metric), with $A = X^T X + \lambda I$
  23. SystemML – Run LinReg CG on Spark
      • Setup: 20 GB Driver on 16c; 6 x 55 GB Executors
      • Sizes of X (y is 100M x 1 in each case): 100M x 10,000 (8 TB); 100M x 1,000 (800 GB); 100M x 100 (80 GB); 100M x 10 (8 GB)
      • Plans shown:
        • Multithreaded Single Node (… tMMp …)
        • Hybrid Plan with RDD caching and fused operator: … x.persist(); ... X.mapValues(tMMv).reduce() … (Driver; Executors with RDD cache: X, fused tMMv)
        • Hybrid Plan with RDD out-of-core and fused operator: … x.persist(); ... X.mapValues(tMMv).reduce() ... (Driver; Executors with RDD cache: X, tMMv, spilling)
        • Hybrid Plan with RDD out-of-core and different operators: … x.persist(); ... // 2 MxV mult with broadcast, mapToPair, and reduceByKey ... (Driver with driver cache; Executors with RDD cache: X, Mv, tvM)
  24. Agenda
      • Architecture Overview
        • Language & APIs
        • Compiler
        • Runtime
        • Two examples:
          • Simple DML expression with an example dataset
          • Linear Regression with varying data sizes
      • Tooling
      • Important links
  25. SystemML’s Compilation Chain / Overview Tools
      • Tools along the chain: EXPLAIN hops; EXPLAIN runtime; EXPLAIN *_recompile; STATS; DEBUG
      • HOP (High-level operator); LOP (Low-level operator)
      • [Matthias Boehm et al: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull 2014]
  26. Explain (Understanding Execution Plans)
      • Overview
        • Shows generated execution plan (at different compilation steps)
        • Introduced 05/2014 for internal usage
        • Important tool for understanding/debugging optimizer choices!
      • Usage
        • hadoop jar SystemML.jar -f test.dml -explain [hops | runtime | hops_recompile | runtime_recompile]
        • Hops: program w/ hop dags after optimization
        • Runtime (default): program w/ generated runtime instructions
        • Hops_recompile: see hops + hop dag after every recompile
        • Runtime_recompile: see runtime + generated runtime instructions after every recompile
  27. Explain: Understanding HOP DAGs (simple DML), Spark
      • HOP ID
      • HOP opcode
      • HOP input data dependencies (via HOP IDs)
      • HOP output matrix characteristics (rlen, clen, brlen, bclen, nnz)
      • Hop memory estimates (all inputs, intermediates, output → operation mem)
      • Hop execution type (CP/SP/MR)
      • Optional: indicators of reblock/checkpointing (caching) of hop outputs
      • Options shown: -explain hops; -explain recompile_hops
      • spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
      • Annotation: Broadcast mem budget
  28. Explain: Understanding HOP DAGs (entire script)
      • Example DML Script (Simplified LinregDS)
          X = read($1);
          y = read($2);
          intercept = $3;
          lambda = $4;
          if( intercept == 1 ) {
            ones = matrix(1, nrow(X), 1);
            X = append(X, ones);
          }
          I = matrix(1, ncol(X), 1);
          A = t(X) %*% X + diag(I*lambda);
          b = t(X) %*% y;
          beta = solve(A, b);
          write(beta, $5);
      • Invocation: hadoop jar SystemML.jar -f linregds.dml -args X y 0 0 beta
      • Scenario: X: 100,000 x 1,000, 1.0; y: 100,000 x 1, 1.0 (800MB, 200+GFlop)
  29. Explain: Understanding HOP DAGs (2)
      • Explain Hops
          15/07/05 17:18:06 INFO api.DMLScript: EXPLAIN (HOPS):
          # Memory Budget local/remote = 57344MB/1434MB/1434MB
          # Degree of Parallelism (vcores) local/remote = 24/144/72
          PROGRAM
          --MAIN PROGRAM
          ----GENERIC (lines 1-4) [recompile=false]
          ------(10) PRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
          ------(11) TWrite X (10) [100000,1000,1000,1000,100000000] [763,0,0 -> 763MB], CP
          ------(21) PRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
          ------(22) TWrite y (21) [100000,1,1000,1000,100000] [1,0,0 -> 1MB], CP
          ------(24) TWrite intercept [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
          ------(26) TWrite lambda [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
          ----GENERIC (lines 11-16) [recompile=false]
          ------(42) TRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP
          ------(52) r(t) (42) [1000,100000,1000,1000,100000000] [763,0,763 -> 1526MB]
          ------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP
          ------(43) TRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP
          ------(59) ba(+*) (52,43) [1000,1,1000,1000,-1] [764,0,0 -> 764MB], CP
          ------(60) b(solve) (53,59) [1000,1,1000,1000,-1] [8,8,0 -> 15MB], CP
          ------(66) PWrite beta (60) [1000,1,-1,-1,-1] [0,0,0 -> 0MB], CP
      • Annotations: Cluster Characteristics (the two # lines); Program Structure (incl recompile); Unrolled HOP DAG
      • Notes: if branch (6-9) and regularization removed by rewrites
  30. Explain: Understanding Runtime Plans (1) [IBM Research]
      • Explain Runtime (simplified filenames, removed rmvar)
          15/07/05 17:18:53 INFO api.DMLScript: EXPLAIN (RUNTIME):
          # Memory Budget local/remote = 57344MB/1434MB/1434MB
          # Degree of Parallelism (vcores) local/remote = 24/144/72
          PROGRAM ( size CP/MR = 25/0 )
          --MAIN PROGRAM
          ----GENERIC (lines 1-4) [recompile=false]
          ------CP createvar pREADX X false binaryblock 100000 1000 1000 1000 100000000
          ------CP createvar pREADy y false binaryblock 100000 1 1000 1000 100000
          ------CP assignvar 0.SCALAR.INT.true intercept.SCALAR.INT
          ------CP assignvar 0.0.SCALAR.DOUBLE.true lambda.SCALAR.DOUBLE
          ------CP cpvar pREADX X
          ------CP cpvar pREADy y
          ----GENERIC (lines 11-16) [recompile=false]
          ------CP createvar _mVar2 .../_t0/temp1 true binaryblock 1000 1000 1000 1000 -1
          ------CP tsmm X.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 24
          ------CP createvar _mVar3 .../_t0/temp2 true binaryblock 1 100000 1000 1000 100000
          ------CP r' y.MATRIX.DOUBLE _mVar3.MATRIX.DOUBLE
          ------CP createvar _mVar4 .../_t0/temp3 true binaryblock 1 1000 1000 1000 -1
          ------CP ba+* _mVar3.MATRIX.DOUBLE X.MATRIX.DOUBLE _mVar4.MATRIX.DOUBLE 24
          ------CP createvar _mVar5 .../_t0/temp4 true binaryblock 1000 1 1000 1000 -1
          ------CP r' _mVar4.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE
          ------CP createvar _mVar6 .../_t0/temp5 true binaryblock 1000 1 1000 1000 -1
          ------CP solve _mVar2.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE _mVar6.MATRIX.DOUBLE
          ------CP write _mVar6.MATRIX.DOUBLE .../beta.SCALAR.STRING.true textcell.SCALAR.STRING.true
      • Literally a string representation of runtime instructions
  31. Stats (Profiling Runtime Statistics)
      • Overview
        • Profiles and shows aggregated runtime statistics of potential bottlenecks
        • Introduced 01/2014 for internal usage, extension of buffer pool stats 01/2013
        • Important tool for understanding runtime characteristics and profiling/tuning system internals by developers
      • Usage
        • hadoop jar SystemML.jar -f test.dml -stats
  32. SystemML Statistics
      • Total exec time
      • Buffer pool stats
      • Dynamic recompilation stats
      • JVM stats (JIT, GC)
      • Heavy hitter instructions (incl. buffer pool times)
      • Optional: parfor stats (if program contains parfors)
  33. Debug (Script Debugging)
      • Overview
        • Script-level debugging by end-users (and developers)
        • Introduced 09/2014 as result of intern project
        • gdb-inspired command-line debugger interface
      • Usage
        • hadoop jar SystemML.jar -f test.dml -debug
  34. Agenda (repeats slide 24)
  35. Important Links
      • Website: http://systemml.apache.org/
  36. Important Links
      • Website: http://systemml.apache.org/
      • Interested in SystemML?
        • Go to https://github.com/apache/incubator-systemml and “Star it”
  37. Important Links
      • Website: http://systemml.apache.org/
      • Interested in SystemML?
        • Go to https://github.com/apache/incubator-systemml and “Star it”
      • Want to contribute to SystemML?
        • See http://apache.github.io/incubator-systemml/contributing-to-systemml.html
        • List of issues: https://issues.apache.org/jira/browse/SYSTEMML/
        • Ask any of our PMC members for suggestions
      • Want to try out SystemML?
        • Laptop: http://apache.github.io/incubator-systemml/quick-start-guide.html (Does not require Hadoop/Spark installation)
        • Spark Cluster: http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html (Includes Jupyter/Zeppelin demo)
  38. Thank You
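
Slide 15 names live variable analysis and dead code elimination as steps of the compilation chain. The sketch below illustrates the classic backward dataflow pass on straight-line code only. The tuple representation `(target, reads, has_side_effect)` and both function names are hypothetical choices for illustration; this is not SystemML's DMLProgram structure or its actual implementation.

```python
# Illustrative sketch only: a statement is (target, reads, has_side_effect).
# This tuple layout is an assumption, not SystemML's internal representation.

def live_variable_analysis(statements, live_out=frozenset()):
    """Backward pass: return the set of variables live *after* each statement."""
    live = set(live_out)
    live_after = [None] * len(statements)
    for i in range(len(statements) - 1, -1, -1):
        target, reads, _ = statements[i]
        live_after[i] = set(live)
        # live_before = (live_after - {defined variable}) | {variables read}
        live.discard(target)
        live |= set(reads)
    return live_after


def eliminate_dead_code(statements, live_out=frozenset()):
    """Drop assignments whose target is never read later and that have no side effect."""
    kept = []
    live = set(live_out)
    for target, reads, side_effect in reversed(statements):
        if side_effect or target in live:
            live.discard(target)
            live |= set(reads)
            kept.append((target, reads, side_effect))
    kept.reverse()
    return kept


# A toy program in the spirit of the DML examples in this deck.
program = [
    ("X", set(), False),        # X = read($1)
    ("y", set(), False),        # y = read($2)
    ("unused", {"X"}, False),   # unused = t(X), never read again
    ("b", {"X", "y"}, False),   # b = t(X) %*% y
    (None, {"b"}, True),        # write(b, $3), a side effect
]

live_sets = live_variable_analysis(program)
optimized = eliminate_dead_code(program)
```

Here `unused` is not live after its assignment, so `eliminate_dead_code` drops that statement while keeping the reads, the product, and the write.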
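
The three-instruction hybrid plan on slides 5, 13, and 19 can be read as one expression evaluated in three steps. The operands suggest something like `y * (X %*% (b + sb))`, although the deck does not show the DML source, so that reading is an assumption. The sketch below mimics the steps on toy data in plain Python. It also assumes that `mapmm` is a map-side multiply in which the small right-hand operand is broadcast to every partition of `X`; the function names and the partition layout are hypothetical.

```python
# Toy re-enactment of the hybrid plan; not SystemML code.

def add_vectors(a, b):
    """Step 1 (CP): small vector addition on the driver."""
    return [ai + bi for ai, bi in zip(a, b)]


def mapmm(row_partitions, broadcast_vector):
    """Step 2 (SPARK, assumed semantics): every partition of X's rows is
    multiplied by the same broadcast vector, so no shuffle is required."""
    result = []
    for partition in row_partitions:
        for row in partition:
            result.append(sum(x * v for x, v in zip(row, broadcast_vector)))
    return result


def multiply_elementwise(a, b):
    """Step 3 (CP): element-wise product of two column vectors."""
    return [ai * bi for ai, bi in zip(a, b)]


# X is split into two row partitions, standing in for a distributed matrix.
X_partitions = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
y = [1.0, 2.0, 3.0]
b = [0.5, 0.5]
sb = [0.5, 1.5]

m_var1 = add_vectors(b, sb)                # CP + b sb _mVar1      -> [1.0, 2.0]
m_var2 = mapmm(X_partitions, m_var1)       # SPARK mapmm X _mVar1  -> [5.0, 11.0, 17.0]
m_var3 = multiply_elementwise(y, m_var2)   # CP * y _mVar2 _mVar3  -> [5.0, 22.0, 51.0]
```

With the example sizes from slide 13 (X: 100M x 500, b/sb: 500 x 1), only the step that touches `X` is large, which is consistent with it being the one instruction placed on Spark.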
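
Slide 22 outlines the conjugate gradient method for the regularized least-squares problem. A minimal pure-Python sketch of those steps follows, assuming the system being solved is (XᵀX + λI) w = Xᵀy and using plain lists instead of distributed matrices. The helper names (`dot`, `matvec`, `t_matvec`, `linreg_cg`) and the stopping rule are illustrative assumptions, not the contents of LinearRegression.dml.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Compute M %*% v for a row-major list-of-lists M."""
    return [dot(row, v) for row in M]


def t_matvec(M, v):
    """Compute t(M) %*% v without materializing the transpose."""
    out = [0.0] * len(M[0])
    for row, vi in zip(M, v):
        for j, mij in enumerate(row):
            out[j] += mij * vi
    return out


def linreg_cg(X, y, lam=0.0, tol=1e-12, max_iter=None):
    """Solve (t(X) X + lam I) w = t(X) y by conjugate gradient."""
    n = len(X[0])
    if max_iter is None:
        max_iter = n
    w = [0.0] * n                        # initialize
    r = [-v for v in t_matvec(X, y)]     # gradient at w = 0
    p = [-ri for ri in r]                # initial direction: negative gradient
    norm_r2 = dot(r, r)
    for _ in range(max_iter):            # iterate until convergence
        if norm_r2 <= tol:
            break
        # q = A p, with A = t(X) X + lam I applied as two matrix-vector products
        q = [aj + lam * pj for aj, pj in zip(t_matvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)                      # step size
        w = [wj + alpha * pj for wj, pj in zip(w, p)]    # update w
        r = [rj + alpha * qj for rj, qj in zip(r, q)]    # recompute gradient
        old_norm_r2 = norm_r2
        norm_r2 = dot(r, r)
        # next direction: new negative gradient made conjugate to prior directions
        p = [-rj + (norm_r2 / old_norm_r2) * pj for rj, pj in zip(r, p)]
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = linreg_cg(X, y)   # approximately [1.0, 2.0]
```

Note that `A` is never formed explicitly: each iteration needs only `X %*% p` followed by `t(X) %*% (...)`, which matches the "2 MxV mult" annotation on slide 23.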
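
The bracketed memory estimates in the EXPLAIN output on slide 29 can be reproduced with simple arithmetic, assuming dense matrices of 8-byte doubles and 1 MB = 1024 x 1024 bytes. The sketch below does that and adds a deliberately simplified execution-type rule: run in CP when the operation estimate fits the local memory budget, otherwise go distributed. The function names and that single-threshold rule are assumptions for illustration; the real optimizer weighs more than this.

```python
MB = 1024 * 1024


def dense_matrix_mb(rows, cols, bytes_per_cell=8):
    """Size of a dense double-precision matrix in MB (assumed layout)."""
    return rows * cols * bytes_per_cell / MB


def operation_mb(input_mbs, intermediate_mb, output_mb):
    """Operation memory = all inputs + intermediates + output (slide 27)."""
    return sum(input_mbs) + intermediate_mb + output_mb


def choose_exec_type(operation_mem_mb, local_budget_mb):
    """Simplified rule (an assumption): CP if the estimate fits locally."""
    return "CP" if operation_mem_mb <= local_budget_mb else "SP"


x_mb = dense_matrix_mb(100_000, 1_000)   # X, shown as 763MB
xtx_mb = dense_matrix_mb(1_000, 1_000)   # t(X) %*% X output, shown as 8MB

# Hop (52) r(t): one input X, no intermediate, output t(X).
transpose_mb = operation_mb([x_mb], 0.0, x_mb)

# Hop (53) ba(+*): inputs t(X) and X; the 8MB intermediate is taken from the
# EXPLAIN output rather than derived here.
matmult_mb = operation_mb([x_mb, x_mb], xtx_mb, xtx_mb)

local_budget_mb = 57344   # from "# Memory Budget local/remote"
exec_type = choose_exec_type(matmult_mb, local_budget_mb)
```

Rounded to whole megabytes these come to 763, 1526, and 1541, matching hops (10), (52), and (53), and the matrix multiply fits comfortably inside the 57344MB local budget, consistent with every hop in that plan being marked CP.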
