Apache	SystemML Class
“I predict what you will do next summer.”
Summer	2016
Class	Description
• Goal
• Teach	scalable	machine	learning	with	Apache	SystemML
• Attract	potential	contributors
• Audience
• Initially summer interns, with the goal of developing this into, or folding it into, a university class
• Duration	~16	hours
• Content
• Development	of	scalable	machine	learning	algorithms
• SystemML usage	and	hands-on	exercises
• Advanced SystemML internals
• Office	hours
• At Adlab: Thursday, 4-5 pm (may be expanded on demand)
Outline
1. SystemML Primer	
2. Machine	Learning	Algorithms
3. Advanced	SystemML Internals
SystemML	Primer
• Goal
• Teach enough DML, SystemML usage, and Spark for people to write and run SystemML algorithms on Spark and understand how they execute.
• Content
• DML	syntax
• SystemML	usage	
• Some	Spark
Machine	Learning	Algorithms
• Descriptive Statistics, Data Preparation, and Train/Test/Cross-Validation
• Regression
• Classification
• Clustering & Matrix Factorization
Each session / chosen algorithm follows a similar structure:
• Possible	Applications
• Math	/	Alternatives	/	Discussion
• DML	formulation
• Data	generation
• Hands-on	exercises
• Performance
• Accuracy
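
The session structure above (data generation, formulation, hands-on run, accuracy check) can be illustrated with a minimal sketch. It is written in plain Python rather than DML and uses one-feature least-squares regression as a stand-in algorithm. The function names, the 80/20 train/test split, and all parameter values are hypothetical choices made for illustration; they are not taken from the class material.

```python
import random

def generate_data(n, slope, intercept, noise_sd, seed):
    """Data generation: y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Formulation: closed-form least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(xs, ys, slope, intercept):
    """Accuracy: root mean squared error of the fitted line."""
    n = len(xs)
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (squared / n) ** 0.5

# Hypothetical setup: 1000 synthetic points, first 800 for training.
xs, ys = generate_data(n=1000, slope=2.0, intercept=1.0, noise_sd=0.5, seed=42)
train_x, test_x = xs[:800], xs[800:]
train_y, test_y = ys[:800], ys[800:]

slope, intercept = fit_line(train_x, train_y)
test_rmse = rmse(test_x, test_y, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={test_rmse:.3f}")
```

Because the data is generated from known parameters, the fitted slope and intercept can be compared directly against the true values, and the held-out RMSE should land near the noise level used during generation.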
Advanced	SystemML	Internals
• Architecture
• Compiler
• Rewrites
• Optimizer
• Runtime
• Buffer	pool
• Storage
• Advanced	Operators
• Spark	Backend
• Performance	debugging

| S# | Date | Time | Room | Category | Title | Content | Instructor |
|----|------|------|------|----------|-------|---------|------------|
| S1 | 6/21 | 9-12 am | G1-404 | SystemML Primer | Scalable Machine Learning with Apache SystemML | • Intro ML • DML • SystemML usage • Architecture | Berthold Reinwald, Nakul Jindal |
| S2 | 6/27 | 4-6 pm | | ML Algs | Data Prep, Descriptive Statistics, and Train/Test/Cross-validation | • Math • DML • Data-gen • Hands-on • Perf & Accuracy | Faraz Makari Manshadi |
| S3 | 7/5 | 4-6 | | ML Algs | Regression | • Linear, log., GLM, Cox, Time series; CG method • DML • Data-gen • Hands-on • Perf & Accuracy | Alexandre Evfimievski |
| S4 | 7/11 | 4-6 | | ML Algs | Classification | • NaïveBayes, SVM, decTree, RF • DML • Data-gen • Hands-on • Perf & Accuracy | Prithvi Sen |
| S5 | 7/18 | 4-6 | | ML Algs | Clustering & Matrix Factorization | • kMeans, MF, ALS, PCA, … • DML • Data gen • Hands-on • Perf & Accuracy | Alexandre Evfimievski, Prithvi Sen |
| S6 | 7/25 | 4-6 pm | | SystemML Internals | Apache SystemML Architecture | • Architecture • Hops/Lops • CP/Cluster | Berthold Reinwald, Niketan Pansare |
| S7 | 8/1 | 4-6 pm | | SystemML Internals | Apache SystemML Optimizer | • Rewrites • Optimizer • Cost model | Matthias Boehm, Arvind Surve |
| S8 | 8/8 | 4-6 pm | | SystemML Internals | Apache SystemML Runtime | • Buffer pool • Storage • Spark backend • Matrix block lib • Performance debugging | Matthias Boehm, Arvind Surve |

Apache SystemML 2016 Summer class primer by Berthold Reinwald