Scalable	Machine	Learning	
with	Apache	SystemML
Berthold	Reinwald,	Nakul	Jindal
IBM
June	21st,	2016
1
Agenda
• What is Apache SystemML
• How to implement SystemML algorithms
→ data scientist
• How to run SystemML algorithms
→ user
• How does SystemML work
→ SystemML developer
2
What	is	Apache	SystemML
• In	a	nutshell
• a	language	for	data	scientists	to	implement	scalable	ML	algorithms	
• 2	language	variants:	R-like	and	Python-like	syntax
• Strong	foundation	of	linear	algebra	operations	and	statistical	functions
• Comes	with	approx.	20+	algorithms	pre-implemented
• Cost-based	optimizer	to	compile	execution	plans
• Depending	on	data	characteristics	(tall/skinny,	short/wide;	dense/sparse)	
and	cluster	characteristics
• ranging	from	single	node	to	clusters	(MapReduce,	Spark);	hybrid	plans
• APIs	&	Tools
• Command	line:	hadoop jar,	spark-submit,	standalone	Java	app
• JMLC:	embed	as	library
• Spark	MLContext:	Scala,	Python,	and	Java
• Tools
• REPL	(Scala	Spark	and	pyspark)
• Spark	ML	pipeline
3
Big	Data	Analytics	- Characteristics
• Large	number	of	models
• Large	number	of	data	points
• Large	number	of	features
• Sparse	data
• Large	number/size	of	intermediates
• Large	number	of	pairs
• Custom	analytics
4
SystemML	– Declarative	ML
• Analytics	language	for	data	scientists
(“The	SQL	for	analytics”)
• Algorithms	expressed	in	a	declarative,	
high-level	language	DML	with	R-like	syntax
• Productivity	of	data	scientists	
• Enable
• Solutions	 development
• Tools
• Compiler
• Cost-based	optimizer	to	generate	
execution	plans	and	to	parallelize
• based	on	data	characteristics
• based	on	cluster	and	machine	characteristics
• Physical	operators	for	in-memory	single	node	
and	cluster	execution
• Performance	&	Scalability
5
High-Level	SystemML	Architecture
6
[Figure: DML scripts, written in DML (Declarative Machine Learning Language), pass through the Language, Compiler, and Runtime layers and execute either in-memory on a single node (scale-up) or on a Hadoop or Spark cluster (scale-out)]
Apache	SystemML Incubator	Project
• June,	2015:	SystemML open	source	announced	at	
Spark	Summit
• Sep.,	2015:	public	github
• Oct.,	2015:	1st open	source	binary	release	(0.8.0)
• Nov.,	2015:	Enter	Apache	incubation
• http://systemml.apache.org/
• https://github.com/apache/incubator-systemml
• Jan.,	2016:	SystemML 0.9.0	(1st Apache	release)
• June,	2016:	SystemML 0.10.0	release
7
Apache	SystemML	Incubator
http://systemml.apache.org/
• Get	SystemML
• Documentation
• DML	Reference	Guide
• Algorithms	Guide
• Running
• Community
• JIRA	server
• GitHub
8
DML	Language	Reference	Guide
9
https://apache.github.io/incubator-systemml/dml-language-reference.html
Sample	Code
A = 1.0 # A is a double
X <- matrix("4 3 2 5 7 8", rows=3, cols=2) # X is a 3x2 matrix; '<-' is assignment
Y = matrix(1, rows=3, cols=2) # Y = matrix of size 3,2 with all 1s
b <- t(X) %*% Y # %*% is matrix multiply, t(X) is transpose
S = "hello world"
i=0
while(i < max_iteration) {
H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)) # * is element by element mult
W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
i = i + 1; # i is an integer
}
print (toString(H)) # toString converts a matrix to a string
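The two update lines in the loop above are multiplicative updates for a non-negative factorization V ≈ W %*% H. The sketch below mirrors them in plain Python so the broadcasting is explicit. It is only an illustration: the helper names (`matmul`, `transpose`, `ratio_to_reconstruction`, `update_step`, `factorize`) are hypothetical, matrices are lists of rows, and none of this is how SystemML executes the script.

```python
# Illustrative pure-Python version of the multiplicative updates in the DML loop.
# All helper names are hypothetical; matrices are lists of rows with positive entries.

def matmul(A, B):
    """Matrix product, the counterpart of DML's %*%."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Counterpart of DML's t()."""
    return [list(col) for col in zip(*A)]

def ratio_to_reconstruction(V, W, H):
    """Element-wise V / (W %*% H)."""
    WH = matmul(W, H)
    return [[v / wh for v, wh in zip(v_row, wh_row)] for v_row, wh_row in zip(V, WH)]

def update_step(V, W, H):
    """One pass of the loop body: update H first, then W using the new H."""
    rank = len(H)
    # H = (H * (t(W) %*% (V / (W %*% H)))) / t(colSums(W))
    numer = matmul(transpose(W), ratio_to_reconstruction(V, W, H))
    col_sums_w = [sum(col) for col in zip(*W)]
    H = [[H[k][j] * numer[k][j] / col_sums_w[k] for j in range(len(H[0]))]
         for k in range(rank)]
    # W = (W * ((V / (W %*% H)) %*% t(H))) / t(rowSums(H))
    numer = matmul(ratio_to_reconstruction(V, W, H), transpose(H))
    row_sums_h = [sum(row) for row in H]
    W = [[W[i][k] * numer[i][k] / row_sums_h[k] for k in range(rank)]
         for i in range(len(W))]
    return W, H

def factorize(V, W, H, max_iteration):
    """Repeat the update step max_iteration times, like the while loop."""
    i = 0
    while i < max_iteration:
        W, H = update_step(V, W, H)
        i = i + 1
    return W, H
```

Dividing by `t(colSums(W))` scales each row of H, and dividing by `t(rowSums(H))` scales each column of W. A side effect of the W update is that every row of W %*% H sums to the same value as the matching row of V.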
10
Sample	Code
source("nn/layers/affine.dml") as affine # import a file into the "affine" namespace
[W, b] = affine::init(D, M) # calls the init function, which returns multiple values
parfor (i in 1:nrow(X)) { # i iterates over 1 through num rows in X in parallel
for (j in 1:ncol(X)) { # j iterates over 1 through num cols in X
# Computation ...
}
}
write (M, fileM, format="text") # M=matrix, fileM=file, also writes to HDFS
X = read (fileX) # fileX=file, also reads from HDFS
if (ncol (A) > 1) {
# Matrix A is being sliced by a given range of columns
A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
}
11
Sample	Code
interpSpline = function(
double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) {
i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
# misc computation …
q = as.scalar(qm)
}
eigen = externalFunction(Matrix[Double] A)
return(Matrix[Double] eval, Matrix[Double] evec)
implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper",
exectype="mem")
12
Sample	Code	(From	LinearRegDS.dml*)
A = t(X) %*% X
b = t(X) %*% y
if (intercept_status == 2) {
A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
}
A = A + diag (lambda)
print ("Calling the Direct Solver...")
beta_unscaled = solve (A, b)
*https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
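As a rough illustration of what the direct-solve path computes, the sketch below forms A = t(X) %*% X + diag(lambda) and b = t(X) %*% y and solves A beta = b in plain Python. It assumes a scalar lambda and skips the `intercept_status == 2` rescaling branch. The function names `solve_linear_system` and `linreg_direct_solve` are hypothetical, and the Gaussian elimination here merely stands in for DML's built-in `solve`.

```python
# Minimal sketch of the normal-equations solve; not the LinearRegDS.dml implementation.

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def linreg_direct_solve(X, y, lam=0.0):
    """Return beta solving (t(X) X + lam * I) beta = t(X) y."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y_k for row, y_k in zip(X, y)) for i in range(m)]
    for i in range(m):
        A[i][i] += lam  # A = A + diag(lambda)
    return solve_linear_system(A, b)

# Example: y = 2 + 3 * x, with a column of ones acting as the intercept.
X_example = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y_example = [5.0, 8.0, 11.0, 14.0]
beta_example = linreg_direct_solve(X_example, y_example)
```

A is only m x m for m features, so the direct solve stays cheap when the number of features is small, however many rows X has.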
13
DML	Editor	Support
• Very	rudimentary	editor	support
• Bit	of	shameless	self-promotion	 :	
• Atom	– Hackable	Text	editor
• Install	package	- https://atom.io/packages/language-dml
• From	GUI	- http://flight-manual.atom.io/using-atom/sections/atom-packages/
• Or	from	command	line	– apm	install	language-dml
• Rudimentary	snippet	based	completion	of	builtin	function
• Vim
• Install	package	- https://github.com/nakul02/vim-dml
• Works	with	Vundle	(vim	package	manager)
• There	is	an	experimental	Zeppelin	Notebook	integration	with	DML	–
• https://issues.apache.org/jira/browse/SYSTEMML-542
• Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
• Please	send	feedback	when	using	these,	requests	for	features,	bugs
• I’ll	work	on	them	when	I	can
14
SystemML Algorithms
15
Category: Description
• Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
• Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
• Clustering: k-Means
• Regression, Linear Regression: system of equations; CG (conjugate gradient descent)
• Regression, Generalized Linear Models (GLM):
   • Distributions: Gaussian, Poisson, Gamma, InverseGaussian, Binomial, Bernoulli
   • Links for all distributions: identity, log, sq. root, inverse, 1/μ^2
   • Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
• Regression, Stepwise: Linear; GLM
• Dimension Reduction: PCA
• Matrix Factorization: ALS with direct solve; ALS with CG (conjugate gradient descent)
• Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
• Predict: Algorithm-specific scoring
• Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
Documentation: https://apache.github.io/incubator-systemml/algorithms-reference.html
Scripts:	/usr/SystemML/systemml-0.10.0-incubating/scripts/algorithms/
Running	/	Invoking	SystemML
• Command	line
• Standalone	(Java	application	in	single	JVM,	in	bin	folder)
• Spark	(spark-submit,	in	scripts	folder)
• hadoop command	line
• APIs	(MLContext)
• Scala,	e.g.	run	from	Spark	shell
• Python,	e.g.	run	from	PySpark
• Java
• In-Memory
16
MLContext	API	– Example	Usage
val ml = new MLContext(sc)
val X_train = sc.textFile("amazon0601.txt")
.filter(!_.startsWith("#"))
.map(_.split("\t") match{case Array(prod1, prod2)=>(prod1.toInt, prod2.toInt,1.0)})
.toDF("prod_i", "prod_j", "x_ij")
.filter("prod_i < 5000 AND prod_j < 5000") // Change to smaller number
.cache()
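For readers less familiar with Scala, the same preparation can be sketched in plain Python without Spark: skip comment lines, split each remaining line on a tab, attach a weight of 1.0, and keep only pairs whose ids are both below 5000. The function name `parse_edges` and the `max_id` parameter are hypothetical and used only for illustration.

```python
# Plain-Python sketch of the filter / map / filter chain; no Spark involved.

def parse_edges(lines, max_id=5000):
    """Turn 'prod_i<TAB>prod_j' lines into (prod_i, prod_j, 1.0) triples."""
    triples = []
    for line in lines:
        if line.startswith("#"):  # header and comment lines
            continue
        prod_i, prod_j = (int(token) for token in line.split("\t"))
        if prod_i < max_id and prod_j < max_id:  # keep a smaller subset
            triples.append((prod_i, prod_j, 1.0))
    return triples

sample_lines = ["# FromNodeId\tToNodeId", "0\t1", "4999\t5000", "12\t7"]
sample_triples = parse_edges(sample_lines)
```

Each triple is one non-zero cell (row, column, value) of the sparse matrix X that the script later reads.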
17
MLContext API	– Example	Usage
val pnmf =
"""
# data & args
X = read($X)
rank = as.integer($rank)
# Computation ....
write(negloglik, $negloglikout)
write(W, $Wout)
write(H, $Hout)
"""
ml.registerInput("X", X_train)
ml.registerOutput("W")
ml.registerOutput("H")
ml.registerOutput("negloglik")
val outputs = ml.executeScript(pnmf,
Map("maxiter" -> "100", "rank" -> "10"))
val negloglik = getScalarDouble(outputs,
"negloglik")
19
Run	LinReg	CG	from	Spark	Shell	
(MLContext)
20
Run	SystemML	in	ML	Pipeline
21
End-to-end	on	Spark	…	in	Code
22
import org.apache.spark.sql._
val ctx = new org.apache.spark.sql.SQLContext(sc)
val tweets = ctx.jsonFile("hdfs:/twitter/decahose")
tweets.registerAsTable("tweetTable")
ctx.sql("SELECT text FROM tweetTable LIMIT 5").collect.foreach(println)
ctx.sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC LIMIT 10").collect.foreach(println)
val texts = ctx.sql("SELECT text FROM tweetTable").map(_.head.toString)
def featurize(str: String): Vector = { ... }
val vectors = texts.map(featurize).toDF.cache()
val mcV = new MatrixCharacteristics(vectors.count, vocabSize, 1000,1000)
val V = RDDConvertUtilsExt(sc, vectors, mcV, false, "_1")
val ml = new com.ibm.bi.dml.api.MLContext(sc)
ml.registerInput("V", V, mcV)
ml.registerOutput("W")
ml.registerOutput("H")
val args = Array(numTopics, numGNMFIter)
val out = ml.execute("GNMF.dml", args)
val W = out.getDF("W")
val H = out.getDF("H")
def getWords(r: Row): Array[(String, Double)] = { ... }
val topics = H.rdd.map(getWords)
[Slide annotations marking the stages of the code above: Twitter data; explore data in SQL; data set; training set; topic modeling; SQL, ML; get topics]
SystemML	Architecture	
Language
• R-like syntax
• Linear algebra, statistical functions, control structures, etc.
• User-defined & external functions
• Parsing
• Statement blocks & statements
• Program analysis, type inference, dead code elimination
High-Level Operator (HOP) Component
• Dataflow in DAGs of operations on matrices, frames, and scalars
• Choosing from alternative execution plans based on memory and cost estimates: operator ordering & selection; hybrid plans
Low-Level Operator (LOP) Component
• Low-level physical execution plan (LOP DAGs) over key-value pairs
• "Piggybacking" operations into a minimal number of Map-Reduce jobs
Runtime
• Hybrid Runtime
• CP: single machine operations & orchestrate jobs
• MR: generic Map-Reduce jobs & operations
• SP: Spark Jobs
• Numerically stable operators
• Dense / sparse matrix representation
• Multi-level buffer pool (caching) to evict in-memory objects
• Dynamic recompilation for initial unknowns
[Figure: SystemML architecture stack. APIs: Command Line, JMLC, Spark MLContext. Compiler: Parser/Language, High-Level Operators, Low-Level Operators, cost-based optimizations. Runtime: Control Program, Runtime Program, Buffer Pool, ParFor Optimizer/Runtime, Recompiler, CP / Spark / MR instructions, generic MR jobs, MatrixBlock library (single/multi-threaded), Mem/FS IO, DFS IO]
23
SystemML	Compilation	Chain
24
CP + b sb _mVar1
SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
CP * y _mVar2 _mVar3
Selected Algebraic Simplification Rewrites
25
Name: Dynamic Pattern
• Remove Unnecessary Indexing: X[a:b,c:d] = Y → X = Y iff dims(X)=dims(Y); X = Y[, 1] → X = Y iff ncol(Y)=1
• Remove Empty Matrix Multiply: X%*%Y → matrix(0,nrow(X),ncol(Y)) iff nnz(X)=0|nnz(Y)=0
• Remove Unnecessary Outer Product: X*(Y%*%matrix(1,...)) → X*Y iff ncol(Y)=1
• Simplify Diag Aggregates: sum(diag(X)) → trace(X) iff ncol(X)=1
• Simplify Matrix Mult Diag: diag(X)%*%Y → X*Y iff ncol(X)=1&ncol(Y)=1
• Simplify Diag Matrix Mult: diag(X%*%Y) → rowSums(X*t(Y)) iff ncol(Y)>1
• Simplify Dot Product Sum: sum(X^2) → t(X)%*%X iff ncol(X)=1
Name: Static Pattern
• Remove Unnecessary Operations: t(t(X)), X/1, X*1, X-0 → X; matrix(1,)/X → 1/X; rand(,min=-1,max=1)*7 → rand(,min=-7,max=7)
• Binary to Unary: X+X → 2*X; X*X → X^2; X-X*Y → X*(1-Y)
• Simplify Diag Aggregates: trace(X%*%Y) → sum(X*t(Y))
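A few of these rewrites are easy to check numerically. The sketch below evaluates both sides of three patterns on small integer matrices in plain Python. The helpers (`matmul`, `transpose`, `elementwise_product`, `total`, `trace`) are hypothetical stand-ins for the DML operators, not SystemML code.

```python
# Numerical sanity check of three rewrite patterns on small integer matrices.

def matmul(A, B):
    """Counterpart of DML's %*%."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Counterpart of DML's t()."""
    return [list(col) for col in zip(*A)]

def elementwise_product(A, B):
    """Counterpart of DML's element-wise *."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def total(A):
    """Counterpart of DML's sum()."""
    return sum(sum(row) for row in A)

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

X = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
Y = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
v = [[1], [2], [3]]               # column vector, ncol(v) = 1

# trace(X %*% Y) -> sum(X * t(Y))
lhs_trace = trace(matmul(X, Y))
rhs_trace = total(elementwise_product(X, transpose(Y)))

# diag(X %*% Y) -> rowSums(X * t(Y))
product = matmul(X, Y)
lhs_diag = [product[i][i] for i in range(len(product))]
rhs_diag = [sum(row) for row in elementwise_product(X, transpose(Y))]

# sum(v^2) -> t(v) %*% v for a column vector
lhs_dot = total(elementwise_product(v, v))
rhs_dot = matmul(transpose(v), v)[0][0]
```

The first two rewrites pay off because the right-hand side never materializes the full product X %*% Y. It only needs the entries that end up on the diagonal.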
A	Data	Scientist	– Linear	Regression
26
[Figure: X w ≈ y, where X holds the explanatory/independent variables, w is the model, and y is the predicted/dependent variable]
Optimization problem: $w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2$
Conjugate Gradient Method:
• Start off with the (negative) gradient
• For each step
1. Move to the optimal point along the chosen direction;
2. Recompute the gradient;
3. Project it onto the subspace conjugate* to all prior directions;
4. Use this as the next direction
(* conjugate = orthogonal given A as the metric, with $A = X^T X + \lambda I$)
[Flowchart labels: initialize; initial direction; step size; update w; accuracy measures; next direction; iterate until convergence]
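The steps above can be sketched in plain Python as a conjugate gradient solver for (t(X) X + λI) w = t(X) y. This is a minimal illustration under stated assumptions, not the SystemML LinearRegCG script: the function name `conjugate_gradient_ridge`, the stopping tolerance, and the iteration cap are hypothetical choices.

```python
# Conjugate gradient sketch for ridge regression; X is a list of rows, y a list.

def conjugate_gradient_ridge(X, y, lam=0.0, max_iter=100, tol=1e-18):
    n_features = len(X[0])

    def apply_A(p):
        """Compute (t(X) X + lam * I) p without forming t(X) X."""
        Xp = [sum(x_ij * p_j for x_ij, p_j in zip(row, p)) for row in X]
        XtXp = [sum(row[j] * v for row, v in zip(X, Xp)) for j in range(n_features)]
        return [a + lam * p_j for a, p_j in zip(XtXp, p)]

    w = [0.0] * n_features                       # initialize
    # Gradient at w = 0 is A w - t(X) y = -t(X) y.
    r = [-sum(row[j] * y_k for row, y_k in zip(X, y)) for j in range(n_features)]
    p = [-g for g in r]                          # initial direction: negative gradient
    norm_r2 = sum(g * g for g in r)
    initial_norm_r2 = norm_r2
    for _ in range(max_iter):
        if norm_r2 <= tol * initial_norm_r2:     # converged
            break
        q = apply_A(p)
        alpha = norm_r2 / sum(p_j * q_j for p_j, q_j in zip(p, q))   # step size
        w = [w_j + alpha * p_j for w_j, p_j in zip(w, p)]            # update w
        r = [r_j + alpha * q_j for r_j, q_j in zip(r, q)]            # recompute gradient
        old_norm_r2 = norm_r2
        norm_r2 = sum(g * g for g in r)
        # Next direction: new negative gradient made conjugate to the previous one.
        p = [-r_j + (norm_r2 / old_norm_r2) * p_j for r_j, p_j in zip(r, p)]
    return w

# Example: y = 2 + 3 * x, with a column of ones acting as the intercept.
X_cg = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y_cg = [5.0, 8.0, 11.0, 14.0]
w_cg = conjugate_gradient_ridge(X_cg, y_cg)
```

Each iteration touches X only through two matrix-vector products, X p and t(X)(X p), so the features-by-features matrix A never has to be built.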
SystemML – Run	LinReg CG	on	Spark
27
[Figure: LinReg CG on Spark with y of size 100M x 1 and X of size 100M x 10, 100M x 100, 100M x 1,000, and 100M x 10,000 (8 GB, 80 GB, 800 GB, and 8 TB). Cluster: 20 GB driver on 16 cores, 6 x 55 GB executors.]
• 8 GB: multithreaded single node; fused operator (tMMp) in the driver
• 80 GB: hybrid plan with RDD caching and fused operator; X in the executors' RDD cache; x.persist(); ... X.mapValues(tMMp).reduce()
• 800 GB: hybrid plan with RDD out-of-core and fused operator; same calls, with the RDD cache spilling
• 8 TB: hybrid plan with RDD out-of-core and different operators; x.persist(); ... then 2 MxV mult with broadcast, mapToPair, and reduceByKey
LinReg CG	for	varying	Data
28
Execution time in secs (plotted on a log scale) by data size:
Plan       8 GB          80 GB          800 GB        8 TB
           (100M x 10)   (100M x 100)   (100M x 1K)   (100M x 10K)
CP+Spark   21            92             2,065         40,395
Spark      76            124            2,159         40,130
CP+MR      24            277            2,613         41,006
Note
• Driver with 20 GB, 16c
• 6 Executors each 55 GB, 24c
• Convergence in 3-4 iterations
• SystemML as of 10/2015
Chart annotations
• 8 GB: single node MT avoids Spark Ctx & distributed ops (3.6x)
• 80 GB: hybrid plan & RDD caching (3x)
• 800 GB: out of core (1.2x)
• 8 TB: fully utilized
• Cost-based optimization is important
• Hybrid execution plans benefit especially medium-sized data sets
• Aggregated in-memory data sets are sweet spot for Spark esp. for iterative algorithms
• Graceful degradation for out-of-core
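The data sizes and speedup factors on this slide can be reproduced with a little arithmetic. The sketch below assumes dense storage at 8 bytes per value and decimal gigabytes. It also guesses which pairs of timings the 3.6x, 3x, and 1.2x annotations compare, since the slide does not state the baselines. All function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the slide's sizes and speedup annotations.

def dense_size_gb(rows, cols, bytes_per_value=8):
    """Size of a dense matrix of doubles in decimal gigabytes."""
    return rows * cols * bytes_per_value / 1e9

rows = 100_000_000
sizes_gb = [dense_size_gb(rows, cols) for cols in (10, 100, 1_000, 10_000)]

# Execution times in seconds, in the order 8 GB, 80 GB, 800 GB, 8 TB.
timings = {
    "CP+Spark": [21, 92, 2065, 40395],
    "Spark":    [76, 124, 2159, 40130],
    "CP+MR":    [24, 277, 2613, 41006],
}

def ratio(slower, faster, index):
    """How many times longer the slower plan takes at one data size."""
    return timings[slower][index] / timings[faster][index]

# Assumed pairings; they are consistent with the rounded annotations.
single_node_gain = ratio("Spark", "CP+Spark", 0)   # 8 GB
rdd_caching_gain = ratio("CP+MR", "CP+Spark", 1)   # 80 GB
out_of_core_gain = ratio("CP+MR", "Spark", 2)      # 800 GB

# Aggregate executor memory decides what can stay cached in memory.
executor_memory_gb = 6 * 55
fits_in_executor_memory = [size <= executor_memory_gb for size in sizes_gb]
```

Under these assumptions only the 8 GB and 80 GB inputs fit into the 330 GB of combined executor memory, which lines up with the switch from RDD caching to out-of-core plans at 800 GB.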
Apache	SystemML	- Summary
• Cost-based	compilation	of	machine	learning	algorithms	generates	execution	plans
• for	single-node	in-memory,	cluster,	and	hybrid	execution
• for	varying	data	characteristics:
• varying	number	of	observations	(1,000s	to	10s	of	billions)
• varying	number	of	variables	(10s	to	10s	of	millions)
• dense	and	sparse	data
• for	varying	cluster	characteristics	(memory	configurations,	degree	of	parallelism)
• Out-of-the-box,	scalable	machine	learning	algorithms
• e.g.	descriptive	statistics,	regression,	clustering,	and	classification
• "Roll-your-own"	algorithms
• Enable	programmer	productivity	(no	worry	about	scalability,	numeric	stability,	and	
optimizations)
• Fast	turn-around	for	new	algorithms
• Higher-level	language	shields	algorithm	development	investment	from	platform	
progression
• Yarn	for	resource	negotiation	and	elasticity
• Spark	for	in-memory,	iterative	processing
29
Roadmap
• Algorithms
• kNN,	word2vec,	non-linear	SVM,	etc.
• Deep	learning
• Engine
• Compressed	Linear	Algebra
• Code	Gen
• Extensions	for	Deep	Learning
• GPU	backend
• Usability
• DML	notebook
• Language	integration
• API	cleanup
30
Research	Papers
• Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. Conditional Accept at VLDB 2016
• Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB 2016
• Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD Conference 2015: 137-152
• Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015: 173-182
• Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015: 1191-1202
• Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3): 52-62 (2014)
• Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014)
• Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshops 2014: 1656-1663
• Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012: 1351-1359
• Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011: 231-242
31
[Slide annotations labeling the papers by topic: custom algorithm; optimizer; resource elasticity; GPU; sampling; numeric stability; task parallelism; 1st paper on Spark; compression]
32
Thank You

More Related Content

What's hot

Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocaml
pramode_ce
 
Systematic Generation Data and Types in C++
Systematic Generation Data and Types in C++Systematic Generation Data and Types in C++
Systematic Generation Data and Types in C++
Sumant Tambe
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
Akhilesh Joshi
 
Exploratory data analysis using r
Exploratory data analysis using rExploratory data analysis using r
Exploratory data analysis using r
Tahera Shaikh
 
simple linear regression
simple linear regressionsimple linear regression
simple linear regression
Akhilesh Joshi
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegressionDaniel K
 
support vector regression
support vector regressionsupport vector regression
support vector regression
Akhilesh Joshi
 
Scheme 核心概念(一)
Scheme 核心概念(一)Scheme 核心概念(一)
Scheme 核心概念(一)
維然 柯維然
 
Hive function-cheat-sheet
Hive function-cheat-sheetHive function-cheat-sheet
Hive function-cheat-sheet
Dr. Volkan OBAN
 
Thinking Functionally with JavaScript
Thinking Functionally with JavaScriptThinking Functionally with JavaScript
Thinking Functionally with JavaScript
Luis Atencio
 
Lec2
Lec2Lec2
Lec1
Lec1Lec1
Queue implementation
Queue implementationQueue implementation
Queue implementation
Rajendran
 
logistic regression with python and R
logistic regression with python and Rlogistic regression with python and R
logistic regression with python and R
Akhilesh Joshi
 
No more promises lets RxJS 2 Edit
No more promises lets RxJS 2 EditNo more promises lets RxJS 2 Edit
No more promises lets RxJS 2 Edit
Ilia Idakiev
 
Java patterns in Scala
Java patterns in ScalaJava patterns in Scala
Java patterns in Scala
Radim Pavlicek
 
ScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin OderskyScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin Odersky
Typesafe
 
Introduction to java 8 stream api
Introduction to java 8 stream apiIntroduction to java 8 stream api
Introduction to java 8 stream api
Vladislav sidlyarevich
 
New features in jdk8 iti
New features in jdk8 itiNew features in jdk8 iti
New features in jdk8 iti
Ahmed mar3y
 
decision tree regression
decision tree regressiondecision tree regression
decision tree regression
Akhilesh Joshi
 

What's hot (20)

Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocaml
 
Systematic Generation Data and Types in C++
Systematic Generation Data and Types in C++Systematic Generation Data and Types in C++
Systematic Generation Data and Types in C++
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
 
Exploratory data analysis using r
Exploratory data analysis using rExploratory data analysis using r
Exploratory data analysis using r
 
simple linear regression
simple linear regressionsimple linear regression
simple linear regression
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegression
 
support vector regression
support vector regressionsupport vector regression
support vector regression
 
Scheme 核心概念(一)
Scheme 核心概念(一)Scheme 核心概念(一)
Scheme 核心概念(一)
 
Hive function-cheat-sheet
Hive function-cheat-sheetHive function-cheat-sheet
Hive function-cheat-sheet
 
Thinking Functionally with JavaScript
Thinking Functionally with JavaScriptThinking Functionally with JavaScript
Thinking Functionally with JavaScript
 
Lec2
Lec2Lec2
Lec2
 
Lec1
Lec1Lec1
Lec1
 
Queue implementation
Queue implementationQueue implementation
Queue implementation
 
logistic regression with python and R
logistic regression with python and Rlogistic regression with python and R
logistic regression with python and R
 
No more promises lets RxJS 2 Edit
No more promises lets RxJS 2 EditNo more promises lets RxJS 2 Edit
No more promises lets RxJS 2 Edit
 
Java patterns in Scala
Java patterns in ScalaJava patterns in Scala
Java patterns in Scala
 
ScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin OderskyScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin Odersky
 
Introduction to java 8 stream api
Introduction to java 8 stream apiIntroduction to java 8 stream api
Introduction to java 8 stream api
 
New features in jdk8 iti
New features in jdk8 itiNew features in jdk8 iti
New features in jdk8 iti
 
decision tree regression
decision tree regressiondecision tree regression
decision tree regression
 

Viewers also liked

Our Culture
Our CultureOur Culture
Our Culture
Vũ Nguyễn
 
Star Wars and Character Merchandising
Star Wars and Character Merchandising Star Wars and Character Merchandising
Star Wars and Character Merchandising
Mehmet – Nafi Artemel
 
Drogas y alcoholismo en el mundo juvenil
Drogas y alcoholismo en el mundo juvenilDrogas y alcoholismo en el mundo juvenil
Drogas y alcoholismo en el mundo juvenil
kennisse1
 
Chemicalcombinationsbalancingchemeqns
ChemicalcombinationsbalancingchemeqnsChemicalcombinationsbalancingchemeqns
Chemicalcombinationsbalancingchemeqns
Conferat Conferat
 
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEUHOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEUG Kavak
 
Growing your eBay Sales with Linnworks
Growing your eBay Sales with LinnworksGrowing your eBay Sales with Linnworks
Growing your eBay Sales with Linnworks
Linnworks
 
Витухина Юлия Анатольевна
Витухина Юлия АнатольевнаВитухина Юлия Анатольевна
Витухина Юлия Анатольевна
school135
 
Herbario
HerbarioHerbario
Herbario
andruxitoss
 
Theoryofsupply
TheoryofsupplyTheoryofsupply
Theoryofsupply
Conferat Conferat
 
On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)
Maximiliano Del Torchio
 
Extracto de el gran juego
Extracto de el gran juegoExtracto de el gran juego
Extracto de el gran juego
JOHNNY JARA RAMOS
 
Maryland summit jhh 2015 how to live longer and better with lupus
Maryland summit jhh 2015 how to live longer and better with lupusMaryland summit jhh 2015 how to live longer and better with lupus
Maryland summit jhh 2015 how to live longer and better with lupus
lupusdmv
 

Viewers also liked (18)

Catalogue
CatalogueCatalogue
Catalogue
 
Our Culture
Our CultureOur Culture
Our Culture
 
Star Wars and Character Merchandising
Star Wars and Character Merchandising Star Wars and Character Merchandising
Star Wars and Character Merchandising
 
SUJEET MISHRA (1)
SUJEET MISHRA (1)SUJEET MISHRA (1)
SUJEET MISHRA (1)
 
Drogas y alcoholismo en el mundo juvenil
Drogas y alcoholismo en el mundo juvenilDrogas y alcoholismo en el mundo juvenil
Drogas y alcoholismo en el mundo juvenil
 
Chemicalcombinationsbalancingchemeqns
ChemicalcombinationsbalancingchemeqnsChemicalcombinationsbalancingchemeqns
Chemicalcombinationsbalancingchemeqns
 
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEUHOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
HOW TECHNOLOGY HAS CHANGED EDUCATION - DEU
 
Білорусь
БілорусьБілорусь
Білорусь
 
Growing your eBay Sales with Linnworks
Growing your eBay Sales with LinnworksGrowing your eBay Sales with Linnworks
Growing your eBay Sales with Linnworks
 
Education
EducationEducation
Education
 
Витухина Юлия Анатольевна
Витухина Юлия АнатольевнаВитухина Юлия Анатольевна
Витухина Юлия Анатольевна
 
Herbario
HerbarioHerbario
Herbario
 
week 7 (2)
week 7 (2)week 7 (2)
week 7 (2)
 
Theoryofsupply
TheoryofsupplyTheoryofsupply
Theoryofsupply
 
On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)On - Fideicomiso Ganadero (1)
On - Fideicomiso Ganadero (1)
 
Extracto de el gran juego
Extracto de el gran juegoExtracto de el gran juego
Extracto de el gran juego
 
Maryland summit jhh 2015 how to live longer and better with lupus
Maryland summit jhh 2015 how to live longer and better with lupusMaryland summit jhh 2015 how to live longer and better with lupus
Maryland summit jhh 2015 how to live longer and better with lupus
 
Social Media
Social MediaSocial Media
Social Media
 

Similar to Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal

DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation process
Arvind Surve
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and Invocation
Arvind Surve
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
Albert Bifet
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
Baishampayan Ghose
 
MLconf NYC Xiangrui Meng
MLconf NYC Xiangrui MengMLconf NYC Xiangrui Meng
MLconf NYC Xiangrui MengMLconf
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java Programmers
Eric Pederson
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
Intel IT Center
 
Functional Programming Past Present Future
Functional Programming Past Present FutureFunctional Programming Past Present Future
Functional Programming Past Present Future
IndicThreads
 
Functional Programming - Past, Present and Future
Functional Programming - Past, Present and FutureFunctional Programming - Past, Present and Future
Functional Programming - Past, Present and Future
Pushkar Kulkarni
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
Ed Kohlwey
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
The Statistical and Applied Mathematical Sciences Institute
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
DB Tsai
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big Data
Leonardo Gamas
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
Joel Falcou
 
Basic concept of MATLAB.ppt
Basic concept of MATLAB.pptBasic concept of MATLAB.ppt
Basic concept of MATLAB.ppt
aliraza2732
 
Lambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter LawreyLambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter Lawrey
JAXLondon_Conference
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
Massimo Schenone
 
Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with spark
Angelo Leto
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
Petr Zapletal
 

Similar to Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal (20)

DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation process
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and Invocation
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
MLconf NYC Xiangrui Meng
MLconf NYC Xiangrui MengMLconf NYC Xiangrui Meng
MLconf NYC Xiangrui Meng
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java Programmers
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal

  • 2. Agenda
    • What is Apache SystemML
    • How to implement SystemML algorithms → data scientist
    • How to run SystemML algorithms → user
    • How does SystemML work → SystemML developer
  • 3. What is Apache SystemML
    • In a nutshell
      • a language for data scientists to implement scalable ML algorithms
      • 2 language variants: R-like and Python-like syntax
      • Strong foundation of linear algebra operations and statistical functions
      • Comes with approx. 20+ algorithms pre-implemented
      • Cost-based optimizer to compile execution plans
        • Depending on data characteristics (tall/skinny, short/wide; dense/sparse) and cluster characteristics
        • ranging from single node to clusters (MapReduce, Spark); hybrid plans
    • APIs & Tools
      • Command line: hadoop jar, spark-submit, standalone Java app
      • JMLC: embed as library
      • Spark MLContext: Scala, Python, and Java
      • Tools
        • REPL (Scala Spark and pyspark)
        • Spark ML pipeline
  • 4. Big Data Analytics - Characteristics
    • Large number of models
    • Large number of data points
    • Large number of features
    • Sparse data
    • Large number/size of intermediates
    • Large number of pairs
    • Custom analytics
  • 5. SystemML – Declarative ML
    • Analytics language for data scientists ("The SQL for analytics")
      • Algorithms expressed in a declarative, high-level language DML with R-like syntax
      • Productivity of data scientists
      • Enable
        • Solutions development
        • Tools
    • Compiler
      • Cost-based optimizer to generate execution plans and to parallelize
        • based on data characteristics
        • based on cluster and machine characteristics
      • Physical operators for in-memory single node and cluster execution
    • Performance & Scalability
  • 6. High-Level SystemML Architecture
    [Figure: layered architecture. Labels: DML Scripts; Language: DML (Declarative Machine Learning Language); Compiler; Runtime; In-Memory Single Node (scale-up); Hadoop or Spark Cluster (scale-out)]
  • 7. Apache SystemML Incubator Project
    • June, 2015: SystemML open source announced at Spark Summit
    • Sep., 2015: public github
    • Oct., 2015: 1st open source binary release (0.8.0)
    • Nov., 2015: Enter Apache incubation
      • http://systemml.apache.org/
      • https://github.com/apache/incubator-systemml
    • Jan., 2016: SystemML 0.9.0 (1st Apache release)
    • June, 2016: SystemML 0.10.0 release
  • 8. Apache SystemML Incubator
    http://systemml.apache.org/
    • Get SystemML
    • Documentation
      • DML Reference Guide
      • Algorithms Guide
      • Running
    • Community
      • JIRA server
      • GitHub
  • 10. Sample Code
      A = 1.0                                       # A is a double
      X <- matrix("4 3 2 5 7 8", rows=3, cols=2)    # X = matrix of size 3,2; '<-' is assignment
      Y = matrix(1, rows=3, cols=2)                 # Y = matrix of size 3,2 with all 1s
      b <- t(X) %*% Y                               # %*% is matrix multiply, t(X) is transpose
      S = "hello world"
      i = 0
      while (i < max_iteration) {
        H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W))   # * is element by element mult
        W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H))
        i = i + 1;                                  # i is an integer
      }
      print (toString(H))                           # toString converts a matrix to a string
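The `while` loop on slide 10 has the shape of a multiplicative-update rule for non-negative matrix factorization (V ≈ W %*% H). The sketch below restates one loop iteration in plain Python so the matrix shapes and the broadcasted divisions are explicit. It is an illustration under assumptions, not SystemML code: the helper names (`matmul`, `transpose`, `ratio`, `nmf_step`) and the sample values for `V`, `W`, and `H` are hypothetical.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def ratio(V, W, H):
    """Element-wise V / (W %*% H)."""
    WH = matmul(W, H)
    return [[v / wh for v, wh in zip(rv, rwh)] for rv, rwh in zip(V, WH)]


def nmf_step(V, W, H):
    """One iteration of the slide's loop body (hypothetical helper name)."""
    # H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W))
    num = matmul(transpose(W), ratio(V, W, H))          # r x n
    col_sums_W = [sum(col) for col in zip(*W)]          # one value per row of H
    H = [[h * n / col_sums_W[k] for h, n in zip(H[k], num[k])]
         for k in range(len(H))]
    # W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H)), using the updated H
    num = matmul(ratio(V, W, H), transpose(H))          # m x r
    row_sums_H = [sum(row) for row in H]                # one value per column of W
    W = [[w * n / s for w, n, s in zip(W[i], num[i], row_sums_H)]
         for i in range(len(W))]
    return W, H


# Illustrative data only: V is 4 x 3, rank r = 2, strictly positive start values.
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [2.0, 1.0, 0.0]]
W = [[1.0 + 0.5 * ((i + 2 * k) % 3) for k in range(2)] for i in range(4)]
H = [[1.0 + 0.25 * ((k + j) % 4) for j in range(3)] for k in range(2)]
for _ in range(20):
    W, H = nmf_step(V, W, H)
```

Because the W update runs last, the row sums of W %*% H match the row sums of V after every step, which gives a quick sanity check on an implementation of these updates.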
  • 11. Sample Code
      source("nn/layers/affine.dml") as affine   # import a file in the "affine" namespace
      [W, b] = affine::init(D, M)                # calls the init function, multiple return
      parfor (i in 1:nrow(X)) {                  # i iterates over 1 through num rows in X in parallel
        for (j in 1:ncol(X)) {                   # j iterates over 1 through num cols in X
          # Computation ...
        }
      }
      write (M, fileM, format="text")            # M=matrix, fileM=file, also writes to HDFS
      X = read (fileX)                           # fileX=file, also reads from HDFS
      if (ncol (A) > 1) {                        # Matrix A is being sliced by a given range of columns
        A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
      }
  • 12. Sample Code
      interpSpline = function(double x, matrix[double] X, matrix[double] Y, matrix[double] K)
        return (double q) {
        i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
        # misc computation …
        q = as.scalar(qm)
      }

      eigen = externalFunction(Matrix[Double] A)
        return (Matrix[Double] eval, Matrix[Double] evec)
        implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem")
  • 13. Sample Code (From LinearRegDS.dml*)
      A = t(X) %*% X
      b = t(X) %*% y
      if (intercept_status == 2) {
        A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
        A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
        b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
      }
      A = A + diag (lambda)
      print ("Calling the Direct Solver...")
      beta_unscaled = solve (A, b)
    * https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
  • 14. DML Editor Support
    • Very rudimentary editor support
    • Bit of shameless self-promotion:
      • Atom – Hackable Text editor
        • Install package - https://atom.io/packages/language-dml
          • From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/
          • Or from command line – apm install language-dml
        • Rudimentary snippet-based completion of builtin functions
      • Vim
        • Install package - https://github.com/nakul02/vim-dml
        • Works with Vundle (vim package manager)
    • There is an experimental Zeppelin Notebook integration with DML
      • https://issues.apache.org/jira/browse/SYSTEMML-542
      • Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
    • Please send feedback when using these, requests for features, bugs
      • I’ll work on them when I can
  • 15. SystemML Algorithms
    Category: Description
    • Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
    • Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
    • Clustering: k-Means
    • Regression:
      • Linear Regression: system of equations; CG (conjugate gradient descent)
      • Generalized Linear Models (GLM)
        • Distributions: Gaussian, Poisson, Gamma, InverseGaussian, Binomial, Bernoulli
        • Links for all distributions: identity, log, sq. root, inverse, 1/μ^2
        • Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
      • Stepwise: Linear; GLM
    • Dimension Reduction: PCA
    • Matrix Factorization: ALS: direct solve; CG (conjugate gradient descent)
    • Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
    • Predict: Algorithm-specific scoring
    • Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
    Documentation: https://apache.github.io/incubator-systemml/algorithms-reference.html
    Scripts: /usr/SystemML/systemml-0.10.0-incubating/scripts/algorithms/
  • 16. Running / Invoking SystemML
    • Command line
      • Standalone (Java application in single JVM, in bin folder)
      • Spark (spark-submit, in scripts folder)
      • hadoop command line
    • APIs (MLContext)
      • Scala, e.g. run from Spark shell
      • Python, e.g. run from PySpark
      • Java
    • In-Memory
  • 17. MLContext API – Example Usage
      val ml = new MLContext(sc)
      val X_train = sc.textFile("amazon0601.txt")
        .filter(!_.startsWith("#"))
        .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })
        .toDF("prod_i", "prod_j", "x_ij")
        .filter("prod_i < 5000 AND prod_j < 5000") // Change to smaller number
        .cache()
  • 18. MLContext API – Example Usage
      val pnmf = """
      # data & args
      X = read($X)
      rank = as.integer($rank)
      # Computation ....
      write(negloglik, $negloglikout)
      write(W, $Wout)
      write(H, $Hout)
      """
  • 19. MLContext API – Example Usage
      val pnmf = """
      # data & args
      X = read($X)
      rank = as.integer($rank)
      # Computation ....
      write(negloglik, $negloglikout)
      write(W, $Wout)
      write(H, $Hout)
      """
      ml.registerInput("X", X_train)
      ml.registerOutput("W")
      ml.registerOutput("H")
      ml.registerOutput("negloglik")
      val outputs = ml.executeScript(pnmf, Map("maxiter" -> "100", "rank" -> "10"))
      val negloglik = getScalarDouble(outputs, "negloglik")
  • 22. End-to-end on Spark … in Code
      import org.apache.spark.sql._
      val ctx = new org.apache.spark.sql.SQLContext(sc)
      val tweets = ctx.jsonFile("hdfs:/twitter/decahose")
      tweets.registerAsTable("tweetTable")

      ctx.sql("SELECT text FROM tweetTable LIMIT 5").collect.foreach(println)
      ctx.sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC LIMIT 10").collect.foreach(println)

      val texts = ctx.sql("SELECT text FROM tweetTable").map(_.head.toString)
      def featurize(str: String): Vector = { ... }
      val vectors = texts.map(featurize).toDF.cache()

      val mcV = new MatrixCharacteristics(vectors.count, vocabSize, 1000,1000)
      val V = RDDConvertUtilsExt(sc, vectors, mcV, false, "_1")
      val ml = new com.ibm.bi.dml.api.MLContext(sc)
      ml.registerInput("V", V, mcV)
      ml.registerOutput("W")
      ml.registerOutput("H")
      val args = Array(numTopics, numGNMFIter)
      val out = ml.execute("GNMF.dml", args)

      val W = out.getDF("W")
      val H = out.getDF("H")
      def getWords(r: Row): Array[(String, Double)] = { ... }
      val topics = H.rdd.map(getWords)
    [Figure labels: Twitter Data; Explore Data In SQL; Data Set; Training Set; Topic Modeling; Get Topics; SQL; ML]
  • 23. SystemML Architecture
    • Language
      • R-like syntax
      • Linear algebra, statistical functions, control structures, etc.
      • User-defined & external functions
      • Parsing
        • Statement blocks & statements
        • Program analysis, type inference, dead code elimination
    • High-Level Operator (HOP) Component
      • Dataflow in DAGs of operations on matrices, frames, and scalars
      • Choosing from alternative execution plans based on memory and cost estimates: operator ordering & selection; hybrid plans
    • Low-Level Operator (LOP) Component
      • Low-level physical execution plan (LOP DAGs) over key-value pairs
      • "Piggybacking" operations into a minimal number of Map-Reduce jobs
    • Runtime
      • Hybrid Runtime
        • CP: single machine operations & orchestrate jobs
        • MR: generic Map-Reduce jobs & operations
        • SP: Spark Jobs
      • Numerically stable operators
      • Dense / sparse matrix representation
      • Multi-level buffer pool (caching) to evict in-memory objects
      • Dynamic recompilation for initial unknowns
    [Figure labels: APIs (Command Line, JMLC, Spark MLContext); Compiler (Parser/Language, High-Level Operators, Low-Level Operators, Cost-based optimizations); Runtime (Control Program, Runtime Program, Buffer Pool, ParFor Optimizer/Runtime, Recompiler, CP Inst, Spark Inst, MR Inst, Mem/FS IO, DFS IO, Generic MR Jobs, MatrixBlock Library (single/multi-threaded))]
  • 24. SystemML Compilation Chain
    [Figure: compilation chain; the runtime instructions shown are:]
      CP + b sb _mVar1
      SPARK mapmm X.MATRIX.DOUBLE _mvar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
      CP * y _mVar2 _mVar3
  • 25. Selected Algebraic Simplification Rewrites
    Name: Dynamic Pattern
    • Remove Unnecessary Indexing:
        X[a:b,c:d] = Y → X = Y iff dims(X)=dims(Y)
        X = Y[, 1] → X = Y iff ncol(Y)=1
    • Remove Empty Matrix Multiply: X%*%Y → matrix(0,nrow(X),ncol(Y)) iff nnz(X)=0|nnz(Y)=0
    • Remove Unnecessary Outer Product: X*(Y%*%matrix(1,...)) → X*Y iff ncol(Y)=1
    • Simplify Diag Aggregates: sum(diag(X)) → trace(X) iff ncol(X)=1
    • Simplify Matrix Mult Diag: diag(X)%*%Y → X*Y iff ncol(X)=1&ncol(Y)=1
    • Simplify Diag Matrix Mult: diag(X%*%Y) → rowSums(X*t(Y)) iff ncol(Y)>1
    • Simplify Dot Product Sum: sum(X^2) → t(X)%*%X iff ncol(X)=1
    Name: Static Pattern
    • Remove Unnecessary Operations:
        t(t(X)), X/1, X*1, X-0 → X
        matrix(1,)/X → 1/X
        rand(,min=-1,max=1)*7 → rand(,min=-7,max=7)
    • Binary to Unary:
        X+X → 2*X
        X*X → X^2
        X-X*Y → X*(1-Y)
    • Simplify Diag Aggregates: trace(X%*%Y) → sum(X*t(Y))
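Several of the rewrites on slide 25 are plain linear-algebra identities, so they can be checked numerically on small inputs. The sketch below does that in plain Python for four of them. The helper functions and the sample matrices are made up for illustration and say nothing about how the SystemML compiler implements the rewrites.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def hadamard(A, B):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def total(A):
    """Sum of all cells, like DML's sum()."""
    return sum(sum(row) for row in A)


# Illustrative integer inputs, so every comparison is exact.
X = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
Y = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
XY = matmul(X, Y)                   # 2 x 2, the expensive intermediate
X_tY = hadamard(X, transpose(Y))    # 2 x 3, no matrix multiply needed

# trace(X%*%Y) → sum(X*t(Y))
trace_lhs = sum(XY[i][i] for i in range(len(XY)))
trace_rhs = total(X_tY)

# diag(X%*%Y) → rowSums(X*t(Y))
diag_lhs = [XY[i][i] for i in range(len(XY))]
diag_rhs = [sum(row) for row in X_tY]

# sum(X^2) → t(X)%*%X for a column vector
v = [[1], [2], [3]]
dot_lhs = total(hadamard(v, v))
dot_rhs = matmul(transpose(v), v)[0][0]

# X-X*Y → X*(1-Y), element-wise with an equally shaped second operand T
T = transpose(Y)
sub_lhs = [[x - x * t for x, t in zip(rx, rt)] for rx, rt in zip(X, T)]
sub_rhs = [[x * (1 - t) for x, t in zip(rx, rt)] for rx, rt in zip(X, T)]

print(trace_lhs == trace_rhs, diag_lhs == diag_rhs,
      dot_lhs == dot_rhs, sub_lhs == sub_rhs)
```

The trace and diag rewrites show where the saving comes from: the right-hand sides never form the full product X %*% Y.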
  • 26. A Data Scientist – Linear Regression
    • Model: X w ≈ y, where X holds the explanatory/independent variables, y is the predicted/dependent variable, and w is the model
    • Optimization problem: w = \arg\min_w \|Xw - y\|^2 + \lambda \|w\|^2
    • Conjugate Gradient Method:
      • Start off with the (negative) gradient
      • For each step
        1. Move to the optimal point along the chosen direction;
        2. Recompute the gradient;
        3. Project it onto the subspace conjugate* to all prior directions;
        4. Use this as the next direction
      (* conjugate = orthogonal given A as the metric, with A = X^T X + \lambda)
    [Figure: annotated script. Labels: initialize; initial direction; step size; update w; next direction; accuracy measures; iterate until convergence]
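The four conjugate gradient steps on slide 26 can be sketched in plain Python as follows. This is a minimal illustration, not the SystemML LinearRegCG script. It assumes the regularizer enters as λ times the identity (the slide writes only "λ"), starts from w = 0, and uses hypothetical names (`linreg_cg`, `dot`, `matvec`, `t_matvec`, `tol`, `max_iter`).

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """M %*% v for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def t_matvec(M, v):
    """t(M) %*% v without materializing the transpose."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def linreg_cg(X, y, lam=0.0, max_iter=100, tol=1e-12):
    """Solve (X^T X + lam*I) w = X^T y by conjugate gradient (illustrative sketch)."""
    n = len(X[0])
    w = [0.0] * n
    r = [-g for g in t_matvec(X, y)]      # gradient at w = 0 is -X^T y
    p = [-ri for ri in r]                 # initial direction: negative gradient
    norm_r2 = dot(r, r)
    norm_r2_initial = norm_r2
    for _ in range(max_iter):
        if norm_r2 <= tol * norm_r2_initial:
            break                         # converged (also covers a zero gradient)
        # q = A p with A = X^T X + lam*I, computed as two matrix-vector products
        q = [a + lam * b for a, b in zip(t_matvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)       # 1. optimal step along direction p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri + alpha * qi for ri, qi in zip(r, q)]   # 2. recompute the gradient
        new_norm_r2 = dot(r, r)
        beta = new_norm_r2 / norm_r2      # 3. make the next direction A-conjugate
        p = [-ri + beta * pi for ri, pi in zip(r, p)]   # 4. next direction
        norm_r2 = new_norm_r2
    return w


# Illustrative data: 3 observations, 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w_plain = linreg_cg(X, y)             # no regularization
w_ridge = linreg_cg(X, y, lam=1.0)    # with lambda = 1
print(w_plain, w_ridge)
```

The loop never forms A explicitly and touches X only through matrix-vector products, which is why the method suits tall, skinny data.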
  • 27. SystemML – Run LinReg CG on Spark
      • Data: X is 100M x 10 (8 GB), 100M x 100 (80 GB), 100M x 1,000 (800 GB), or 100M x 10,000 (8 TB); y is 100M x 1
      • Cluster: 20 GB driver on 16c; 6 x 55 GB executors
      • Plans shown:
          • Multithreaded single node (tMMp)
          • Hybrid plan with RDD caching and fused operator: x.persist(); ... X.mapValues(tMMp).reduce() ... [diagram: driver; executors with RDD cache: X, running tMMv]
          • Hybrid plan with RDD out-of-core and fused operator: x.persist(); ... X.mapValues(tMMp).reduce() ... [diagram: driver; executors with RDD cache: X, running tMMv; spilling]
          • Hybrid plan with RDD out-of-core and different operators: x.persist(); ... // 2 MxV mult with broadcast, mapToPair, and reduceByKey ... [diagram: driver with driver cache; executors with RDD cache: X, running Mv and tvM]
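The data sizes on this slide follow from dense double-precision storage (8 bytes per cell), and the choice between a single-node and a distributed plan depends on whether that estimate fits in the driver's memory. The sketch below is a deliberately simplified illustration of that idea, not SystemML's optimizer: `dense_size_bytes`, `choose_backend`, and the 70% `budget_fraction` are hypothetical names and values, and a real cost-based optimizer would also account for sparsity, intermediates, and per-operator estimates.

```python
GB = 10 ** 9  # decimal gigabytes, matching the sizes quoted on the slide

def dense_size_bytes(rows: int, cols: int) -> int:
    """Size estimate for a dense matrix of 8-byte doubles."""
    return rows * cols * 8

def choose_backend(rows: int, cols: int, driver_heap_bytes: int,
                   budget_fraction: float = 0.7) -> str:
    """Pick 'CP' (single node) if the estimate fits the assumed budget, else 'SPARK'."""
    budget = driver_heap_bytes * budget_fraction   # assumed usable share of the heap
    return "CP" if dense_size_bytes(rows, cols) <= budget else "SPARK"

rows = 100_000_000
for cols in (10, 100, 1_000, 10_000):
    size_gb = dense_size_bytes(rows, cols) / GB
    backend = choose_backend(rows, cols, driver_heap_bytes=20 * GB)
    print(f"X: 100M x {cols:>6} = {size_gb:>6.0f} GB -> {backend}")
```

With a 20 GB driver, only the 8 GB input fits under this toy rule, which is consistent with the slide showing a multithreaded single-node plan for the smallest data set and Spark-based plans for the larger ones.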
  • 28. LinReg CG for varying Data
      • Execution time in secs by data size (the chart uses a log scale):
          Configuration | 8 GB (100M x 10) | 80 GB (100M x 100) | 800 GB (100M x 1K) | 8 TB (100M x 10K)
          CP+Spark | 21 | 92 | 2,065 | 40,395
          Spark | 76 | 124 | 2,159 | 40,130
          CP+MR | 24 | 277 | 2,613 | 41,006
      • Note: driver with 20 GB, 16c; 6 executors, each 55 GB, 24c; convergence in 3-4 iterations; SystemML as of 10/2015
      • Chart annotations: single node MT avoids Spark Ctx & distributed ops (3.6x); hybrid plan & RDD caching (3x); out of core (1.2x); fully utilized
      • Cost-based optimization is important
      • Hybrid execution plans benefit especially medium-sized data sets
      • Aggregated in-memory data sets are sweet spot for Spark, esp. for iterative algorithms
      • Graceful degradation for out-of-core
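The speedup annotations on this slide can be related to the measured times with simple ratios. The slide does not state which pair of configurations each annotation compares, so the following Python sketch only computes, for every data size, how much slower each configuration is than the fastest one; the dictionary layout and the helper name `ratios_vs_fastest` are illustrative assumptions.

```python
# Execution times in seconds, copied from the slide.
sizes = ["8 GB", "80 GB", "800 GB", "8 TB"]
times = {
    "CP+Spark": [21, 92, 2065, 40395],
    "Spark":    [76, 124, 2159, 40130],
    "CP+MR":    [24, 277, 2613, 41006],
}

def ratios_vs_fastest(times, sizes):
    """For each data size, divide every configuration's time by the fastest time."""
    result = {}
    for i, size in enumerate(sizes):
        fastest = min(t[i] for t in times.values())
        result[size] = {name: t[i] / fastest for name, t in times.items()}
    return result

ratios = ratios_vs_fastest(times, sizes)
for size in sizes:
    row = ", ".join(f"{name}: {r:.2f}x" for name, r in ratios[size].items())
    print(f"{size:>6}: {row}")
```

Under this reading, 76/21 is about 3.6 (Spark vs. CP+Spark at 8 GB) and 277/92 is about 3.0 (CP+MR vs. CP+Spark at 80 GB), which is consistent with the first two annotations. The 1.2x annotation is close to 2,613/2,159 (CP+MR vs. Spark at 800 GB), but that pairing is a guess.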
  • 29. Apache SystemML - Summary
      • Cost-based compilation of machine learning algorithms generates execution plans
          • for single-node in-memory, cluster, and hybrid execution
          • for varying data characteristics:
              • varying number of observations (1,000s to 10s of billions)
              • varying number of variables (10s to 10s of millions)
              • dense and sparse data
          • for varying cluster characteristics (memory configurations, degree of parallelism)
      • Out-of-the-box, scalable machine learning algorithms
          • e.g. descriptive statistics, regression, clustering, and classification
      • "Roll-your-own" algorithms
          • Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
          • Fast turn-around for new algorithms
      • Higher-level language shields algorithm development investment from platform progression
          • YARN for resource negotiation and elasticity
          • Spark for in-memory, iterative processing
  • 30. Roadmap
      • Algorithms
          • kNN, word2vec, non-linear SVM, etc.
          • Deep learning
      • Engine
          • Compressed Linear Algebra
          • Code Gen
          • Extensions for Deep Learning
          • GPU backend
      • Usability
          • DML notebook
          • Language integration
          • API cleanup
  • 31. Research Papers
      • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. Conditional Accept at VLDB 2016
      • Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB 2016
      • Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD Conference 2015:137-152
      • Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPoPP 2015:173-182
      • Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015:1191-1202
      • Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3):52-62 (2014)
      • Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014)
      • Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshops 2014:1656-1663
      • Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012:1351-1359
      • Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011:231-242
      • [Topic callouts on the slide: Custom Algorithm; Optimizer; Resource Elasticity; GPU; Sampling; Numeric Stability; Task Parallelism; 1st paper on Spark; Compression]