Scalable Machine/Deep Learning with
Apache SystemML on Power
Berthold Reinwald
IBM Research – Almaden
San Jose, CA
Nov. 17th, 2017
• Use cases
• What is Apache SystemML
• Demos on Power
– Handwritten Digits Image Classification
– Medical Image Segmentation
• Inside SystemML
– Compiler, optimizer, and runtime
– Advanced Features
Tumor Proliferation Score
Medical Image Segmentation
Enterprise Use Cases for Scalable Machine Learning
• Insurance
• Problem Description
– Find the optimal subset of features that leads to the best regression model
• Problem Size
– 1.1M observations, 95 features, subsets of 15 variables
• Algorithm
– Parallelization of independent model building
• Automotive
• Problem Description
– Customer Satisfaction
• Problem Size
– 2 million cars with 8,000 reacquired cars, 10 million repair cases, 25 million parts exchanges
• Algorithms
– Logistic regression using ~22k feature variables
– Increasing the number of features from ~250 to ~21,800 improved precision/recall by an order of magnitude
– Sequence mining using a very low support value
– Very large number of intermediate result sequences
• Air Transportation
• Problem Description
– Predict passenger volumes at locations in an airport
• Problem Size
– WiFi data with ~66M rows for ~1.3M MAC addresses
• Algorithms
– Multiple models per location, per passenger type
– Time-series analysis using seasonal and non-seasonal auto-regressive and moving-average components along with differencing operations (ARIMA and Holt-Winters triple exponential smoothing)
Financial Services
Problem Description
– Compute correlations between Financial Analysts’
performance metrics and sentiments extracted from surveys
submitted by them
– Descriptive (Bivariate) Statistics: Chi-squared test, Spearman’s
Rho, Gamma, Kendall’s Tau-B, Odds-Ratio test, F-test (stratified
and unstratified)
Retail Banking
Problem Description
– Use statistical analysis on social media data linked to the bank’s
data to identify customer segments of interest, find predictors
of purchase intent, and gauge sentiment towards bank’s
– Bivariate odds ratios and binomial proportions with confidence
Services Company
– Compute a benchmark index by mapping producers’ financial
reports into a normalized schema, using analytics to extrapolate
missing reports and/or impute missing values.
– Regularized least-squares loss minimization and Gibbs sampling
(MCMC) jointly over the parameter space and over the missing
(estimated) values
Why Apache SystemML
• Today’s Roles of Data Scientists
– Algorithm researcher: Invent new optimization schemes
– Systems programmer: provide distributed
– Deployment engineer: Run for varying datasets
– Systems researcher: Optimize clusters
• SystemML simplifies the life of data scientists
– in implementing custom machine learning
– running algorithms distributed if needed
– running algorithms varying from small data to large data
Apache SystemML – Declarative Machine Learning
• Productivity of data scientists
– Machine learning language for data scientists (“The SQL for analytics”)
– Strong foundation in linear algebra and statistical functions
– Comes with 20+ algorithms pre-implemented
– Enables solutions development and tools
• Scalability & Performance
– Built on data parallel platforms, e.g. Spark
• Cost-based optimizer to compile execution plans
– Depending on data characteristics (tall/skinny, short/wide) and cluster
– Ranging from in-memory single node to clusters (MapReduce, Spark), and hybrid plans
• APIs & Tools
– Command line: standalone Java app, spark-submit, hadoop jar
– Use in Spark through Scala, Python, R, and Java APIs
– Embeddable scoring library
– Tools: REPL (Scala Spark and pyspark), SparkR, SparkML, Jupyter, Zeppelin Notebooks
[Figure: deployment targets: Single Node, Hadoop or Spark Cluster, GPU backend (in progress)]
SystemML integrated in Spark Ecosystem
[Diagram: Spark Core Engine with Streaming and MLlib (Machine Learning) components]
• Spark API to SystemML
• SystemML to run against Spark core for distributed
Apache SystemML Open Source
• Apache open source project
– Nov. 2015, Start SystemML Apache Incubator Project
– …
– Feb. 2017, Release 0.12.0 on Spark 1.6.x …, Python API.
– May 2017, Release 0.14.0 on Spark 2.0.2+.
– May 2017, Apache Top Level Project
– Sep 2017, Release 0.15
• Release downloads
– Binaries
– Coordinates to Maven repository
• GitHub source code
• Documentation
• 3 Hours KDD Hands-On Tutorial (
kdd2017.html), Aug. 2017
SystemML’s Scalable Algorithms
Category: Description
Descriptive Statistics: Stratified Bivariate
Classification: Logistic Regression (multinomial); Multi-Class SVM, non-linear SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
Clustering: k-Means
Linear Regression: system of equations; CG (conjugate gradient)
Generalized Linear Models (GLM):
– Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli
– Links for all distributions: identity, log, sq. root, inverse, 1/μ^2
– Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
Dimension Reduction: PCA, Probabilistic PCA
Matrix Factorization: ALS with direct solve or CG (conjugate gradient)
Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
Deep Learning: Autoencoder, word2vec, CNN, LSTM, RBM … and Deep Learning Library (DML-bodied) functions
Predict: Algorithm-specific scoring
Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
PMML models: lm, kmeans, svm, glm, mlogit
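The Binomial / Bernoulli link functions listed for GLM (logit, probit, cloglog, cauchit) can be illustrated with a small plain-Python sketch. It shows the inverse of each link, which maps a linear predictor eta to a probability. The function and dictionary names are illustrative assumptions and are not SystemML APIs.

```python
import math

# Inverse link functions: map a linear predictor eta to a mean in (0, 1).
# All names below are hypothetical helpers, not part of SystemML.

def inv_logit(eta):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of probit: the standard normal CDF, written via erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse of cloglog(p) = log(-log(1 - p))."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_cauchit(eta):
    """Inverse of cauchit: the standard Cauchy CDF."""
    return 0.5 + math.atan(eta) / math.pi

INVERSE_LINKS = {
    "logit": inv_logit,
    "probit": inv_probit,
    "cloglog": inv_cloglog,
    "cauchit": inv_cauchit,
}

def mean_response(link, eta):
    """Predicted probability for a given link name and linear predictor."""
    return INVERSE_LINKS[link](eta)
```

For example, mean_response("logit", 0.0) and mean_response("probit", 0.0) both return 0.5, while cloglog is asymmetric around eta = 0.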
Effect of Deep Learning: ImageNet Large-Scale Visual Recognition Challenge
ResNet (34 layer)
• Fully connected layer
Reference: Convolutional Neural Networks for Visual Recognition.
• Fully connected layer
• Convolution layer
• Fewer parameters than a fully connected (FC) layer
• Useful for capturing local (spatial) features
• Output #channels = #filters
Reference: Convolutional Neural Networks for Visual Recognition.
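To make the parameter comparison concrete, the sketch below counts weights for a convolution layer and for a fully connected layer that produces an output of the same size. The layer sizes (a 28x28 single-channel input, 32 filters of size 5x5, padding 2) are assumptions chosen for illustration and do not come from the talk.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected (affine) layer."""
    return in_features * out_features + out_features

def conv_params(in_channels, num_filters, kernel_h, kernel_w):
    """Weights plus biases of a convolution layer: one kernel per filter."""
    return num_filters * (in_channels * kernel_h * kernel_w) + num_filters

def conv_output_shape(in_h, in_w, num_filters, kernel_h, kernel_w, stride=1, pad=0):
    """Output shape (channels, height, width); channels equals #filters."""
    out_h = (in_h + 2 * pad - kernel_h) // stride + 1
    out_w = (in_w + 2 * pad - kernel_w) // stride + 1
    return (num_filters, out_h, out_w)

# Assumed example: 1x28x28 input, 32 filters of 5x5, padding 2, stride 1.
channels, out_h, out_w = conv_output_shape(28, 28, 32, 5, 5, stride=1, pad=2)
conv_count = conv_params(1, 32, 5, 5)
fc_count = fc_params(1 * 28 * 28, channels * out_h * out_w)
```

Under these assumptions the convolution layer needs 832 parameters, while a fully connected layer with the same input and output sizes needs 19,694,080.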
Deep Learning Support
NN library: Reuse existing infrastructure to implement custom DNNs like other training algorithms
• Small number of DL-specific built-in functions
– e.g. convolution
• NN library of layers and training optimizers to stack layers, e.g.
– Affine (fully-connected) layer is matrix multiplication
– Convolution layer invokes new convolution function
• Caffe/Keras2DML to import existing DNNs
• Transfer learning to continue training on different data
• GPU and native BLAS libraries
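The statement that an affine layer is a matrix multiplication can be sketched in a few lines of plain Python. The helper names (matmul, affine_forward, relu_forward) are hypothetical and only mimic the idea of stacking layers; they are not the SystemML NN library API.

```python
def matmul(A, B):
    """Dense matrix product of A (n x k) and B (k x m), as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def affine_forward(X, W, b):
    """Affine (fully connected) layer: out = X W + b, with b added to every row."""
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(X, W)]

def relu_forward(X):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in X]

# Stack two layers on one input row with 2 features (values are made up).
X = [[1.0, 2.0]]
W1 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, 1.0]]
b1 = [0.5, -3.0, 0.0]
hidden = relu_forward(affine_forward(X, W1, b1))
```

Here X W1 is [1, 2, 3], adding b1 gives [1.5, -1, 3], and the ReLU clips the negative entry to 0.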
Handwritten Digits Image Classification
Using LeNet CNN
Medical Image Segmentation
Using U-Net CNN
Automatic Algebraic Simplification Rewrites lead to Significant Performance Improvements
• Simplify operations over mmult → Eliminate unnecessary compute
– trace(X %*% Y) → sum(X * t(Y))
• Remove unnecessary operations → Merging operations
– rand(…, min=-1, max=1) * 7 → rand(…, min=-7, max=7)
• Binary to unary operations → Reduce amount of data touched
– X*X → X^2
• Remove unnecessary indexing → Eliminate operations (conditional)
– X[a:b,c:d] = Y → X = Y iff dims(X)=dims(Y)
• … 10’s more rewrite rules
Compilation Chain
Compressed Linear Algebra (CLA)
• Motivation: Iterative ML algorithms with I/O-bound matrix-vector (MV) multiplications
• Key Ideas: Use lightweight database compression techniques and perform linear algebra (LA) operations on compressed matrices (without decompression)
• Experiments
– LinregCG, 10 iterations, SystemML 0.14
– 1+6 node cluster, Spark 2.1
Dataset Gzip Snappy CLA
Higgs 1.93 1.38 2.17
Census 17.11 6.04 35.69
Covtype 10.40 6.13 18.19
ImageNet 5.54 3.35 7.34
Mnist8m 4.12 2.60 7.32
Airline78 7.07 4.28 7.44
Compression Ratios
[Chart: End-to-End Performance [sec] for Mnist40m (90GB), Mnist240m (540GB), and Mnist480m (1.1TB); series include Snappy (RDD Compression)]
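The key idea of operating directly on compressed data can be sketched as follows. This is a deliberately simplified illustration, assuming each column is encoded on its own as a dictionary from distinct non-zero value to the list of row offsets where it occurs; the actual CLA encoding formats and column grouping are not reproduced here, and all names are hypothetical.

```python
def compress_column(col):
    """Map each distinct non-zero value to the row offsets where it occurs."""
    groups = {}
    for i, val in enumerate(col):
        if val != 0:
            groups.setdefault(val, []).append(i)
    return groups

def compress(matrix):
    """Compress a row-major matrix column by column."""
    n_rows = len(matrix)
    return n_rows, [compress_column(col) for col in zip(*matrix)]

def matvec_compressed(compressed, v):
    """Compute q = X v without decompressing X.

    Each distinct value is multiplied with v[j] once, and the result is
    added at every row offset where that value occurs.
    """
    n_rows, columns = compressed
    q = [0.0] * n_rows
    for j, groups in enumerate(columns):
        for val, offsets in groups.items():
            scaled = val * v[j]
            for i in offsets:
                q[i] += scaled
    return q

X = [[7, 0, 2],
     [7, 3, 2],
     [0, 3, 2],
     [7, 0, 9]]
q = matvec_compressed(compress(X), [1, 2, 3])
```

Columns with few distinct values need few multiplications and little storage, which is the kind of data where compression helps an I/O-bound matrix-vector multiplication.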
Code Generation for Operator Fusion
• Motivation
– Ubiquitous Fusion Opportunities
– High Performance Impact
• Key Ideas
– Template skeletons (Row, Cell, Outer, MultiAgg)
– Candidate exploration to identify fusion opportunities
– Candidate selection via cost-based optimizer or heuristics
– Codegen with janino / javac during compile and dynamic recompile
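What fusion buys can be illustrated with a cell-wise expression such as sum(X * Y * Z). The sketch below contrasts an unfused evaluation, which materializes two intermediate matrices, with a fused single pass. It is a hand-written illustration in plain Python with hypothetical names, not code produced by SystemML's code generator.

```python
def sum_unfused(X, Y, Z):
    """Evaluate operator by operator: two full intermediates are allocated."""
    T1 = [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]   # X * Y
    T2 = [[t * z for t, z in zip(rt, rz)] for rt, rz in zip(T1, Z)]  # (X * Y) * Z
    return sum(sum(row) for row in T2)

def sum_fused(X, Y, Z):
    """Fused operator: one pass over the inputs, no intermediate matrices."""
    acc = 0.0
    for rx, ry, rz in zip(X, Y, Z):
        for x, y, z in zip(rx, ry, rz):
            acc += x * y * z
    return acc

X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
Z = [[1, 0], [2, 1]]
unfused = sum_unfused(X, Y, Z)
fused = sum_fused(X, Y, Z)
```

Both variants return the same value; the fused one reads each input cell once and writes nothing but the accumulator, which matters most for memory-bound operations.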
Codegen Micro Benchmarks (FP64)
[Charts, data size 20K x 20K: sum(X ʘ Y ʘ Z), dense; sum(X ʘ Y ʘ Z), sparse; (X v), dense; sum(X ʘ log(UV + 1e-15))]
#1 Gen close to hand-coded fused ops
#2 TF/Julia Gen only single-
#3 TF w/ very limited sparse
#4 Sparse Gen
Gen better than hand-coded ops
#5 TF w/ poor for data-intensive ops,
#6 Gen at peak mem
#7 Autom. across chains of ops
SystemML on Power Environment
• Contributed native ppc64le libraries for JCuda to mavenized jcuda
– GPU backend on Power for SystemML
• Contributed native ppc64le libraries to protoc project
– Useful for compiling Caffe proto files
• Supported native BLAS operations in SystemML
– Matrix Multiplication, Convolution (forward/backward)
– OpenBLAS with OpenMP support
Linear Regression Conjugate Gradient (preliminary 1/2)
[Chart: x86 CPU Time and x86 GPU Time vs. no. of rows of input matrix (in thousands): 64, 128, 256, 512, 1024, 2048]
Data: random with sparsity 0.95, 1000 features
Icpt: 0, maxi: 20, tol: 0.001, reg: 0.01
Driver-memory: 100G, local[*] master
M-V multiplication chain is memory bound, but more cores help with parallelization.
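The matrix-vector multiplication chain mentioned above is the core of a conjugate gradient solver for regularized linear regression. The following plain-Python sketch solves (t(X) X + reg I) w = t(X) y without ever forming t(X) X. The parameter names reg, maxi, and tol mirror the settings listed on the slide, and no intercept is fitted (icpt 0); everything else is an assumption for illustration and not SystemML's own script.

```python
def matvec(X, v):
    """X v for a row-major matrix X."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def t_matvec(X, u):
    """t(X) u, computed row by row without building the transpose."""
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linreg_cg(X, y, reg=0.01, maxi=20, tol=0.001):
    """Conjugate gradient on the normal equations with L2 regularization."""
    w = [0.0] * len(X[0])
    r = [-g for g in t_matvec(X, y)]       # gradient at w = 0 is -t(X) y
    p = [-ri for ri in r]                  # first search direction
    norm_r2 = dot(r, r)
    target = norm_r2 * tol * tol           # stop at relative residual tol
    i = 0
    while i < maxi and norm_r2 > target:
        # The M-V chain: q = t(X) (X p) + reg * p
        q = [qi + reg * pi for qi, pi in zip(t_matvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri + alpha * qi for ri, qi in zip(r, q)]
        old_norm_r2 = norm_r2
        norm_r2 = dot(r, r)
        p = [-ri + (norm_r2 / old_norm_r2) * pi for ri, pi in zip(r, p)]
        i += 1
    return w

# Tiny made-up example where y = X [1, 2] exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w_exact = linreg_cg(X, y, reg=0.0)
w_reg = linreg_cg(X, y)
```

Each iteration touches X twice (once for X p, once for t(X) applied to the result) and does very little arithmetic per element read, which is consistent with the chain being memory bound.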
Linear Regression Conjugate Gradient (preliminary 2/2)
[Chart: x86 GPU Time vs. no. of rows of input matrix (in thousands): 64, 256, 1024]
Data: random with sparsity 0.95, 1000 features
Icpt: 0, maxi: 20, tol: 0.001, reg: 0.01
Driver-memory: 100G, local[*] master
[Chart: CPU-GPU Transfer Time (PPC toDev Time, x86 toDev Time) vs. no. of rows of input matrix (in thousands): 64, 256, 1024]
Most of the time is spent in transferring data from host to device
-> 2x performance benefit due to CPU-GPU NVLink
More Details
• Matthias Boehm, Alexandre Evfimievski, Niketan Pansare, Berthold Reinwald, Prithvi Sen: Declarative, Large-Scale Machine Learning with Apache SystemML, 3 hours hands-on tutorial, KDD 2017
• Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning. CIDR 2017
• Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. VLDB 2016 (Best Paper Award)
– Extended Version to appear in VLDB Journal, 2017
– Summary Version to appear in ACM SIGMOD Record Research Highlights, 2017
• Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB
• Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD 2015: 137-152
• Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015: 173-182
• Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015: 1191-1202
• Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3): 52-62 (2014)
• Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014)
• Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshop 2014: 1656-1663
• Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012: 1351-
• Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011: 231-242
• SystemML simplifies the life of data scientists
• Custom Machine/Deep Learning Algorithms
• Scale up & out
• Mixed Workloads
– Memory access bound
– Compute bound
• Strike balance between
– Data transfer
– Parallelism

180 nm Tape out experience using Open POWER ISA
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
Poster from NUS
Poster from NUSPoster from NUS
Poster from NUS
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
AI in the enterprise
AI in the enterprise AI in the enterprise
AI in the enterprise

Recently uploaded

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications

Recently uploaded (20)

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications

System mldl meetup

  • 1. Scalable Machine/Deep Learning with Apache SystemML on Power. Berthold Reinwald, IBM Research – Almaden, San Jose, CA. Nov. 17th, 2017
  • 2. Agenda
      - Use cases
      - What is Apache SystemML
      - Demos on Power
          - Handwritten Digits Image Classification
          - Medical Image Segmentation
      - Inside SystemML
          - Compiler, optimizer, and runtime
          - Advanced Features
  • 5. Enterprise Use Cases for Scalable Machine Learning
      - Insurance
          - Problem description: find the optimal subset of features that leads to the best regression model
          - Problem size: 1.1M observations, 95 features, subsets of 15 variables
          - Algorithm: parallelization of independent model building
      - Automotive
          - Problem description: customer satisfaction
          - Problem size: 2 million cars with 8,000 reacquired cars, 10 million repair cases, 25 million parts exchanges
          - Algorithms:
              - Logistic regression using ~22k feature variables; increasing the number of features from ~250 to ~21,800 improved precision/recall by an order of magnitude
              - Sequence mining using a very low support value, with a very large number of intermediate result sequences
      - Air Transportation
          - Problem description: predict passenger volumes at locations in an airport
          - Problem size: WiFi data with ~66M rows for ~1.3M MAC addresses
          - Algorithms:
              - Multiple models per location, per passenger type
              - Time-series analysis using seasonal and non-seasonal auto-regressive and moving-average components along with differencing operations (ARIMA and Holt-Winters triple exponential smoothing)
      - Financial Services
          - Problem description: compute correlations between financial analysts’ performance metrics and sentiments extracted from surveys submitted by them
          - Algorithms: descriptive (bivariate) statistics: Chi-squared test, Spearman’s Rho, Gamma, Kendall’s Tau-B, Odds-Ratio test, F-test (stratified and unstratified)
      - Retail Banking
          - Problem description: use statistical analysis on social media data linked to the bank’s data to identify customer segments of interest, find predictors of purchase intent, and gauge sentiment towards the bank’s products
          - Algorithms: bivariate odds ratios and binomial proportions with confidence intervals
      - Services Company
          - Problem: compute a benchmark index by mapping producers’ financial reports into a normalized schema, using analytics to extrapolate missing reports and/or impute missing values
          - Algorithms: regularized least-squares loss minimization and Gibbs sampling (MCMC) jointly over the parameter space and over the missing (estimated) values
  • 6. Why Apache SystemML
      - Today’s roles of data scientists
          - Algorithm researcher: invent new optimization schemes
          - Systems programmer: provide distributed implementations
          - Deployment engineer: run for varying datasets
          - Systems researcher: optimize clusters
      - SystemML simplifies the life of data scientists
          - in implementing custom machine learning
          - in running algorithms distributed if needed
          - in running algorithms on data ranging from small to large
      [Figure labels: NIPS, ICML, KDD, JMLR]
  • 7. Apache SystemML – Declarative Machine Learning
      - Productivity of data scientists
          - Machine learning language for data scientists (“The SQL for analytics”)
          - Strong foundation in linear algebra and statistical functions
          - Comes with 20+ algorithms pre-implemented
          - Enables solutions development and tools
      - Scalability & performance
          - Built on data-parallel platforms, e.g. Spark
      - Cost-based optimizer to compile execution plans
          - Depending on data characteristics (tall/skinny, short/wide) and cluster characteristics
          - Ranging from in-memory single node to clusters (MapReduce, Spark), and hybrid plans
      - APIs & tools
          - Command line: standalone Java app, spark-submit, hadoop jar
          - Use in Spark through Scala, Python, R, and Java APIs
          - Embeddable scoring library
          - Tools: REPL (Scala Spark and pyspark), SparkR, SparkML, Jupyter, Zeppelin notebooks
      [Figure labels: Language, Compiler, Runtime; In-Memory Single Node (scale-up); Hadoop or Spark Cluster (scale-out); GPU backend (in progress)]
  • 8. SystemML integrated in Spark Ecosystem
      [Figure labels: Spark Core Engine, Spark SQL, Spark Streaming, MLlib (Machine Learning), GraphX, SystemML (Analytics Library, Custom Analytics)]
      - DataFrame: Spark API to SystemML
      - SystemML runs against Spark core for distributed computations
  • 9. Apache SystemML Open Source
      - Apache open source project
          - Nov. 2015: start of SystemML Apache Incubator project
          - …
          - Feb. 2017: release 0.12.0 on Spark 1.6.x …, Python API
          - May 2017: release 0.14.0 on Spark 2.0.2+
          - May 2017: Apache Top Level Project
          - Sep. 2017: release 0.15
      - Release downloads
          - Binaries
          - Coordinates to Maven repository
      - GitHub source code
      - Documentation
      - 3-hour KDD hands-on tutorial ( kdd2017.html), Aug. 2017
  • 10. SystemML’s Scalable Algorithms
      Category: Description
      - Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
      - Classification: Logistic Regression (multinomial); Multi-Class SVM, non-linear SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest; kNN
      - Clustering: k-Means
      - Regression:
          - Linear Regression: system of equations; CG (conjugate gradient descent)
          - Generalized Linear Models (GLM):
              - Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli
              - Links for all distributions: identity, log, sq. root, inverse, 1/μ^2
              - Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
          - Stepwise: Linear; GLM
          - Lasso
      - Dimension Reduction: PCA, Probabilistic PCA
      - Matrix Factorization: ALS: direct solve; CG (conjugate gradient descent)
      - Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
      - Deep Learning: Autoencoder, word2vec, CNN, LSTM, RBM … and Deep Learning Library (DML-bodied) functions
      - Predict: Algorithm-specific scoring
      - Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
      - PMML models: lm, kmeans, svm, glm, mlogit
  • 11. Effect of Deep Learning: ImageNet Large-Scale Visual Recognition Challenge
      [Figure labels: AlexNet, GoogleNet, ResNet (34 layer)]
  • 12. Layers
      - Fully connected layer
      Reference: Convolutional Neural Networks for Visual Recognition.
  • 13. Layers
      - Fully connected layer
      - Convolution layer
          - Fewer parameters than a fully connected (FC) layer
          - Useful for capturing spatially local features
          - Number of output channels = number of filters
      Reference: Convolutional Neural Networks for Visual Recognition.
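To make the parameter saving of a convolution layer concrete, the sketch below counts the weights and biases of both layer types. The input size (28 x 28, one channel), the 32 filters, and the 5 x 5 kernel are assumed figures chosen for illustration; they do not come from the slides, and the helper names are hypothetical.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels, num_filters, kernel_h, kernel_w):
    """Weights plus biases of a convolution layer (one bias per filter)."""
    return num_filters * in_channels * kernel_h * kernel_w + num_filters


# Assumed example: 28x28 single-channel input, 32 output channels of the same
# spatial size (i.e. a padded convolution with 5x5 filters).
height, width, channels = 28, 28, 1
num_filters, kernel = 32, 5

fc_count = fc_params(height * width * channels, height * width * num_filters)
conv_count = conv_params(channels, num_filters, kernel, kernel)

print("fully connected:", fc_count)
print("convolution:    ", conv_count)
```

The convolution layer shares its small filters across all spatial positions, so its parameter count depends only on the filter shape and the number of filters, not on the image size.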
  • 14. Deep Learning Support
      NN library: reuse existing infrastructure to implement custom DNNs like other training algorithms
      - Small number of DL-specific built-in functions, e.g. convolution
      - NN library of layers and training optimizers to stack layers, e.g.
          - Affine (fully connected) layer is a matrix multiplication
          - Convolution layer invokes the new convolution function
      - Caffe/Keras2DML to import existing DNNs
      - Transfer learning to continue training on different data
      - GPU and native BLAS libraries
  • 17. Automatic Algebraic Simplification Rewrites lead to Significant Performance Improvements
      - Simplify operations over mmult: eliminate unnecessary compute
          - trace(X %*% Y) -> sum(X * t(Y))
      - Remove unnecessary operations: merging operations
          - rand(…, min=-1, max=1) * 7 -> rand(…, min=-7, max=7)
      - Binary to unary operations: reduce amount of data touched
          - X*X -> X^2
      - Remove unnecessary indexing: eliminate operations (conditional)
          - X[a:b,c:d] = Y -> X = Y iff dims(X)=dims(Y)
      - … tens more rewrite rules
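The first rewrite on the slide can be checked numerically. The sketch below is plain Python rather than DML, with hypothetical helper names; it compares trace(X %*% Y), which materializes the full product, against sum(X * t(Y)), which only touches the entries that contribute to the diagonal.

```python
def matmul(A, B):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def sum_elementwise_product(A, B):
    """sum(A * B) with element-wise multiplication."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


X = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
Y = [[7, 8], [9, 10], [11, 12]]   # 3 x 2

naive = trace(matmul(X, Y))                           # builds a 2 x 2 product first
rewritten = sum_elementwise_product(X, transpose(Y))  # no matrix product needed

print(naive, rewritten)
```

For an m x n matrix X and an n x m matrix Y, the naive form costs on the order of m*m*n multiplications while the rewritten form needs only m*n, which is why such a rewrite pays off on large inputs.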
  • 19. Compressed Linear Algebra (CLA)
      - Motivation: iterative ML algorithms with I/O-bound MV multiplications
      - Key ideas: use lightweight DB compression techniques and perform LA operations on compressed matrices (w/o decompression)
      - Experiments
          - LinregCG, 10 iterations, SystemML 0.14
          - 1+6 node cluster, Spark 2.1

      Compression ratios:
      Dataset     Gzip    Snappy   CLA
      Higgs       1.93    1.38     2.17
      Census      17.11   6.04     35.69
      Covtype     10.40   6.13     18.19
      ImageNet    5.54    3.35     7.34
      Mnist8m     4.12    2.60     7.32
      Airline78   7.07    4.28     7.44

      End-to-end performance [sec]:
      Dataset (size)       Uncompressed   Snappy (RDD compression)   CLA
      Mnist40m (90GB)      89             135                        93
      Mnist240m (540GB)    3409           765                        463
      Mnist480m (1.1TB)    5663           2730                       998
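The key idea of operating on compressed data without decompressing it can be illustrated with a small sketch. It is a simplified illustration, assuming a per-column encoding that maps each distinct non-zero value to the list of rows where it occurs; it is not SystemML's actual CLA implementation, and all function names are hypothetical.

```python
def compress_column(column):
    """Map each distinct non-zero value to the row indices where it occurs."""
    groups = {}
    for row_index, value in enumerate(column):
        if value != 0:
            groups.setdefault(value, []).append(row_index)
    return groups


def compress(matrix):
    """Compress a list-of-rows matrix column by column."""
    num_rows = len(matrix)
    num_cols = len(matrix[0])
    columns = [compress_column([row[j] for row in matrix]) for j in range(num_cols)]
    return num_rows, columns


def matvec_compressed(compressed, v):
    """Compute M @ v directly on the compressed representation."""
    num_rows, columns = compressed
    out = [0.0] * num_rows
    for j, groups in enumerate(columns):
        for value, rows in groups.items():
            contribution = value * v[j]  # one multiplication per distinct value
            for i in rows:
                out[i] += contribution
    return out


M = [[1, 0, 2],
     [1, 3, 2],
     [0, 3, 2],
     [1, 0, 0]]
v = [1, 2, 3]

print(matvec_compressed(compress(M), v))
```

Columns with few distinct values compress well under this scheme, and the matrix-vector product needs one multiplication per distinct value per column instead of one per cell.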
  • 20. Code Generation for Operator Fusion
      - Motivation
          - Ubiquitous fusion opportunities
          - High performance impact
      - Key ideas
          - Template skeletons (Row, Cell, Outer, MultiAgg)
          - Candidate exploration to identify fusion opportunities
          - Candidate selection via cost-based optimizer or heuristics
          - Codegen with janino / javac during compile and dynamic recompile
      [Figure: example fusion patterns: multi-aggregate (a=sum(X^2), b=sum(X*Y), c=sum(Y^2)); sum(X*Y*Z); X^T (X v) computed in a 1st and 2nd pass; sum(X * log(U V^T)) with sparsity exploitation]
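The multi-aggregate pattern from the slide (a=sum(X^2), b=sum(X*Y), c=sum(Y^2)) shows why fusion helps. The sketch below is hand-written Python over flat lists, not generated SystemML code, and the function names are hypothetical; it contrasts three separate scans with one fused scan.

```python
def unfused(X, Y):
    """Three independent aggregates, each scanning its inputs separately."""
    a = sum(x * x for x in X)
    b = sum(x * y for x, y in zip(X, Y))
    c = sum(y * y for y in Y)
    return a, b, c


def fused(X, Y):
    """One pass over X and Y that updates all three aggregates together."""
    a = b = c = 0.0
    for x, y in zip(X, Y):
        a += x * x
        b += x * y
        c += y * y
    return a, b, c


X = [1.0, 2.0, 3.0]
Y = [4.0, 5.0, 6.0]

print(unfused(X, Y))
print(fused(X, Y))
```

Both versions return the same values, but the fused loop reads each input once and creates no intermediates such as X^2 or X*Y, which matters when the operation is bound by memory bandwidth.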
  • 21. Codegen Micro Benchmarks (FP64)
      Benchmarked expressions:
      - sum(X ʘ Y ʘ Z), dense
      - sum(X ʘ Y ʘ Z), sparse
      - X^T (X v), dense
      - sum(X ʘ log(U V^T + 1e-15))
      Panel annotations: sparsity 0.1; data size 20K x 20K
      Observations:
      - #1 Gen close to hand-coded fused ops
      - #2 TF/Julia Gen only single-threaded
      - #3 TF with very limited sparse support
      - #4 Sparse Gen challenging; Gen better than hand-coded ops
      - #5 TF with poor performance for data-intensive ops
      - #6 Gen at peak memory bandwidth
      - #7 Automatic sparsity exploitation across chains of ops
  • 22. SystemML on Power Environment
      - Contributed native ppc64le libraries for JCuda to the mavenized jcuda project
          - GPU backend on Power for SystemML
      - Contributed native ppc64le libraries to the protoc project
          - Useful for compiling Caffe proto files
      - Supported native BLAS operations in SystemML
          - Matrix multiplication, convolution (forward/backward)
          - OpenBLAS with OpenMP support
  • 23. Linear Regression Conjugate Gradient (preliminary 1/2)
      [Figure: time in seconds vs. number of rows of the input matrix in thousands (64 to 2048); series: PPC CPU time, PPC GPU time, x86 CPU time, x86 GPU time]
      - Data: random with sparsity 0.95, 1000 features
      - Parameters: icpt: 0, maxi: 20, tol: 0.001, reg: 0.01
      - Driver memory: 100G, local[*] master
      - The M-V multiplication chain is memory bound, but more cores help with parallelization.
  • 24. Linear Regression Conjugate Gradient (preliminary 2/2)
      [Figure: time in seconds vs. number of rows of the input matrix in thousands (64, 256, 1024); series: PPC GPU time, x86 GPU time]
      [Figure: CPU-GPU transfer time in seconds vs. number of rows of the input matrix in thousands (64, 256, 1024); series: PPC toDev time, x86 toDev time]
      - Data: random with sparsity 0.95, 1000 features
      - Parameters: icpt: 0, maxi: 20, tol: 0.001, reg: 0.01
      - Driver memory: 100G, local[*] master
      - Most of the time is spent transferring data from host to device -> 2x performance benefit due to CPU-GPU NVLink
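The benchmark in the two slides above runs linear regression with conjugate gradient, whose inner loop is the matrix-vector chain X^T (X p). The sketch below is a minimal pure-Python version under stated assumptions: no intercept (icpt: 0), L2 regularization `reg`, at most `maxi` iterations, and a stopping rule based on the relative residual norm `tol`. The parameter names follow the slide; the function names and the exact stopping rule are assumptions rather than SystemML's script.

```python
def matvec(X, v):
    """X @ v for a list-of-rows matrix."""
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]


def t_matvec(X, u):
    """X^T @ u without materializing the transpose."""
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, x in enumerate(row):
            out[j] += x * ui
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linreg_cg(X, y, reg=0.01, maxi=20, tol=0.001):
    """Solve (X^T X + reg * I) w = X^T y with conjugate gradient."""
    w = [0.0] * len(X[0])
    r = [-g for g in t_matvec(X, y)]   # gradient at w = 0
    p = [-ri for ri in r]              # initial search direction
    norm_r2 = dot(r, r)
    target = tol * tol * norm_r2
    iteration = 0
    while iteration < maxi and norm_r2 > target:
        # Matrix-vector chain: q = X^T (X p) + reg * p
        q = [qi + reg * pi for qi, pi in zip(t_matvec(X, matvec(X, p)), p)]
        alpha = norm_r2 / dot(p, q)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri + alpha * qi for ri, qi in zip(r, q)]
        old_norm_r2 = norm_r2
        norm_r2 = dot(r, r)
        p = [-ri + (norm_r2 / old_norm_r2) * pi for ri, pi in zip(r, p)]
        iteration += 1
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(linreg_cg(X, y, reg=0.0, tol=1e-9))
```

Each iteration performs two passes over X and only a handful of vector operations, which is consistent with the slide's remark that the workload is bound by memory access rather than compute.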
  • 25. More Details
      - Matthias Boehm, Alexandre Evfimievski, Niketan Pansare, Berthold Reinwald, Prithviraj Sen: Declarative, Large-Scale Machine Learning with Apache SystemML. 3-hour hands-on tutorial, KDD 2017
      - Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning. CIDR 2017
      - Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large Scale Machine Learning. VLDB 2016 (Best Paper Award)
          - Extended version to appear in VLDB Journal, 2017
          - Summary version to appear in ACM SIGMOD Record Research Highlights, 2017
      - Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark. VLDB 2016
      - Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD 2015: 137-152
      - Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015: 173-182
      - Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015: 1191-1202
      - Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3): 52-62 (2014)
      - Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014)
      - Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshop 2014: 1656-1663
      - Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012: 1351-1359
      - Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011: 231-242
      [Slide callouts: Hands on Tutorial, Automatic Rewr & Fusion, Compression, 1st paper on Spark, Resource Elasticity, GPU, Sampling, Optimizer, Task Parallelism, Custom Algorithm, Numeric Stability]
  • 26. Summary
      - SystemML simplifies the life of data scientists
      - Custom machine/deep learning algorithms
      - Scale up & out
      - Mixed workloads
          - Memory access bound
          - Compute bound
      - Strike a balance between
          - Data transfer
          - Parallelism
