Submit Search
Upload
Alpine Tech Talk: System ML by Berthold Reinwald
•
1 like
•
1,784 views
Chester Chen
Follow
Alpine Tech Talk Series: System ML by Berthold Reinwald IBM Almaden Research Center
Read less
Read more
Software
Report
Share
Report
Share
1 of 25
Download now
Download to read offline
Recommended
E05312426
E05312426
IOSR-JEN
Performance Comparison of Dimensionality Reduction Methods using MCDR
Performance Comparison of Dimensionality Reduction Methods using MCDR
AM Publications
STARBUCKS Site Selection Analysis drift
STARBUCKS Site Selection Analysis drift
Park JunPyo
Association Rule Mining using RHadoop
Association Rule Mining using RHadoop
IRJET Journal
Data Imputation by Soft Computing
Data Imputation by Soft Computing
ijtsrd
Automatic suggestion of query-rewrite rules for enterprise search
Automatic suggestion of query-rewrite rules for enterprise search
Yunyao Li
Dmml report final
Dmml report final
sarthakkhare3
Graph Analytics
Graph Analytics
Khalid Salama
Recommended
E05312426
E05312426
IOSR-JEN
Performance Comparison of Dimensionality Reduction Methods using MCDR
Performance Comparison of Dimensionality Reduction Methods using MCDR
AM Publications
STARBUCKS Site Selection Analysis drift
STARBUCKS Site Selection Analysis drift
Park JunPyo
Association Rule Mining using RHadoop
Association Rule Mining using RHadoop
IRJET Journal
Data Imputation by Soft Computing
Data Imputation by Soft Computing
ijtsrd
Automatic suggestion of query-rewrite rules for enterprise search
Automatic suggestion of query-rewrite rules for enterprise search
Yunyao Li
Dmml report final
Dmml report final
sarthakkhare3
Graph Analytics
Graph Analytics
Khalid Salama
Exploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active Learning
Yunyao Li
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
HPCC Systems
Automation Tool Development to Improve Machine Results using Data Analysis
Automation Tool Development to Improve Machine Results using Data Analysis
IRJET Journal
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Bob Fourer
Churn Modeling For Mobile Telecommunications
Churn Modeling For Mobile Telecommunications
Salford Systems
Innovaccer service capabilities with case studies
Innovaccer service capabilities with case studies
Abhinav Shashank
Feature Importance Analysis with XGBoost in Tax audit
Feature Importance Analysis with XGBoost in Tax audit
Michael BENESTY
Big Data Graph Analytics
Big Data Graph Analytics
Cristian Spigariol
Building a Predictive Model
Building a Predictive Model
DKALab
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining Techniques
IRJET Journal
Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
Benjamin Bengfort
Issues in Query Processing and Optimization
Issues in Query Processing and Optimization
Editor IJMTER
Graphs and Financial Services Analytics
Graphs and Financial Services Analytics
Neo4j
Graph analytic and machine learning
Graph analytic and machine learning
Stanley Wang
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
IRJET Journal
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
Chester Chen
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
Chester Chen
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Chester Chen
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
Kaushalya Madhawa
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Yusuke Iwasawa
Neural_Programmer_Interpreter
Neural_Programmer_Interpreter
Katy Lee
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
Katy Lee
More Related Content
What's hot
Exploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active Learning
Yunyao Li
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
HPCC Systems
Automation Tool Development to Improve Machine Results using Data Analysis
Automation Tool Development to Improve Machine Results using Data Analysis
IRJET Journal
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Bob Fourer
Churn Modeling For Mobile Telecommunications
Churn Modeling For Mobile Telecommunications
Salford Systems
Innovaccer service capabilities with case studies
Innovaccer service capabilities with case studies
Abhinav Shashank
Feature Importance Analysis with XGBoost in Tax audit
Feature Importance Analysis with XGBoost in Tax audit
Michael BENESTY
Big Data Graph Analytics
Big Data Graph Analytics
Cristian Spigariol
Building a Predictive Model
Building a Predictive Model
DKALab
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining Techniques
IRJET Journal
Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
Benjamin Bengfort
Issues in Query Processing and Optimization
Issues in Query Processing and Optimization
Editor IJMTER
Graphs and Financial Services Analytics
Graphs and Financial Services Analytics
Neo4j
Graph analytic and machine learning
Graph analytic and machine learning
Stanley Wang
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
IRJET Journal
What's hot
(15)
Exploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active Learning
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
Data Patterns - A Native Open Source Data Profiling Tool for HPCC Systems
Automation Tool Development to Improve Machine Results using Data Analysis
Automation Tool Development to Improve Machine Results using Data Analysis
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Developing Optimization Applications Quickly and Effectively with Algebraic M...
Churn Modeling For Mobile Telecommunications
Churn Modeling For Mobile Telecommunications
Innovaccer service capabilities with case studies
Innovaccer service capabilities with case studies
Feature Importance Analysis with XGBoost in Tax audit
Feature Importance Analysis with XGBoost in Tax audit
Big Data Graph Analytics
Big Data Graph Analytics
Building a Predictive Model
Building a Predictive Model
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining Techniques
Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
Issues in Query Processing and Optimization
Issues in Query Processing and Optimization
Graphs and Financial Services Analytics
Graphs and Financial Services Analytics
Graph analytic and machine learning
Graph analytic and machine learning
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
IRJET- Solving Supply Chain Network Design Problem using Center of Gravit...
Viewers also liked
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
Chester Chen
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
Chester Chen
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Chester Chen
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
Kaushalya Madhawa
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Yusuke Iwasawa
Neural_Programmer_Interpreter
Neural_Programmer_Interpreter
Katy Lee
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
Katy Lee
[DL輪読会] Hybrid computing using a neural network with dynamic external memory
[DL輪読会] Hybrid computing using a neural network with dynamic external memory
Yusuke Iwasawa
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
홍배 김
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
Deep Learning JP
[DL輪読会]Unsupervised Cross-Domain Image Generation
[DL輪読会]Unsupervised Cross-Domain Image Generation
Deep Learning JP
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
Deep Learning JP
[DL輪読会]StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generat...
[DL輪読会]StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generat...
Deep Learning JP
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
Chester Chen
[DL輪読会] GAN系の研究まとめ (NIPS2016とICLR2016が中心)
[DL輪読会] GAN系の研究まとめ (NIPS2016とICLR2016が中心)
Yusuke Iwasawa
[DL輪読会]Understanding deep learning requires rethinking generalization
[DL輪読会]Understanding deep learning requires rethinking generalization
Deep Learning JP
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Alluxio, Inc.
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
Data Science Milan
Viewers also liked
(18)
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Neural_Programmer_Interpreter
Neural_Programmer_Interpreter
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
[DL輪読会] Hybrid computing using a neural network with dynamic external memory
[DL輪読会] Hybrid computing using a neural network with dynamic external memory
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Unsupervised Cross-Domain Image Generation
[DL輪読会]Unsupervised Cross-Domain Image Generation
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generat...
[DL輪読会]StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generat...
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
[DL輪読会] GAN系の研究まとめ (NIPS2016とICLR2016が中心)
[DL輪読会] GAN系の研究まとめ (NIPS2016とICLR2016が中心)
[DL輪読会]Understanding deep learning requires rethinking generalization
[DL輪読会]Understanding deep learning requires rethinking generalization
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
The Barclays Data Science Hackathon: Building Retail Recommender Systems base...
Similar to Alpine Tech Talk: System ML by Berthold Reinwald
Internship Presentation.pdf
Internship Presentation.pdf
vishwajeetparmar1
AI for Software Engineering
AI for Software Engineering
Miroslaw Staron
big-data-anallytics.pptx
big-data-anallytics.pptx
Sangamesh Kalyan
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Rising Media, Inc.
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
Vadlamudi Saketh
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Pouria Amirian
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Pouria Amirian
IRJET - House Price Predictor using ML through Artificial Neural Network
IRJET - House Price Predictor using ML through Artificial Neural Network
IRJET Journal
Applying linear regression and predictive analytics
Applying linear regression and predictive analytics
MariaDB plc
Demystifying Data Science
Demystifying Data Science
Data Science Milan
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
DATAVERSITY
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNING
IRJET Journal
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
sumitkumar600840
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
Infochimps, a CSC Big Data Business
Data Science in the Elastic Stack
Data Science in the Elastic Stack
Rochelle Sonnenberg
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
Matouš Havlena
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
Value Amplify Consulting
MLIntro_ADA.pptx
MLIntro_ADA.pptx
ADA Consulting
Bhadale group of companies data science project methodologies catalogue
Bhadale group of companies data science project methodologies catalogue
Vijayananda Mohire
Decision Making Framework in e-Business Cloud Environment Using Software Metr...
Decision Making Framework in e-Business Cloud Environment Using Software Metr...
ijitjournal
Similar to Alpine Tech Talk: System ML by Berthold Reinwald
(20)
Internship Presentation.pdf
Internship Presentation.pdf
AI for Software Engineering
AI for Software Engineering
big-data-anallytics.pptx
big-data-anallytics.pptx
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
IRJET - House Price Predictor using ML through Artificial Neural Network
IRJET - House Price Predictor using ML through Artificial Neural Network
Applying linear regression and predictive analytics
Applying linear regression and predictive analytics
Demystifying Data Science
Demystifying Data Science
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
BIG MART SALES PREDICTION USING MACHINE LEARNING
BIG MART SALES PREDICTION USING MACHINE LEARNING
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
Data Science in the Elastic Stack
Data Science in the Elastic Stack
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
MLIntro_ADA.pptx
MLIntro_ADA.pptx
Bhadale group of companies data science project methodologies catalogue
Bhadale group of companies data science project methodologies catalogue
Decision Making Framework in e-Business Cloud Environment Using Software Metr...
Decision Making Framework in e-Business Cloud Environment Using Software Metr...
More from Chester Chen
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
Chester Chen
zookeeer+raft-2.pdf
zookeeer+raft-2.pdf
Chester Chen
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
Chester Chen
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
Chester Chen
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
Chester Chen
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
Chester Chen
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
Chester Chen
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
Chester Chen
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
Chester Chen
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
Chester Chen
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
Chester Chen
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
Chester Chen
Sf big analytics: bighead
Sf big analytics: bighead
Chester Chen
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Chester Chen
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Chester Chen
2018 data warehouse features in spark
2018 data warehouse features in spark
Chester Chen
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
Chester Chen
2018 02 20-jeg_index
2018 02 20-jeg_index
Chester Chen
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
Chester Chen
More from Chester Chen
(20)
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
zookeeer+raft-2.pdf
zookeeer+raft-2.pdf
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
Sf big analytics: bighead
Sf big analytics: bighead
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
2018 data warehouse features in spark
2018 data warehouse features in spark
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
2018 02 20-jeg_index
2018 02 20-jeg_index
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
Recently uploaded
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
masabamasaba
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
masabamasaba
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
ayushiqss
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
masabamasaba
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
Jim McKeeth
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
Papp Krisztián
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
masabamasaba
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Steffen Staab
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
Presentation.STUDIO
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
masabamasaba
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
masabamasaba
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
harshavardhanraghave
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
Willy Marroquin (WillyDevNET)
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
masabamasaba
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
The title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
kalichargn70th171
Recently uploaded
(20)
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
The title is not connected to what is inside
The title is not connected to what is inside
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
Alpine Tech Talk: System ML by Berthold Reinwald
1.
© 2015 IBM
Corporation SystemML Scalable Machine Learning SystemML Team, IBM Almaden – Research Presented by: Berthold Reinwald, Technical Lead reinwald@us.ibm.com September, 2015
2.
© 2015 IBM
Corporation SystemML in IBM BigInsights Data Scientist • Fruition of 5+ years research project • Overview SystemML engine Broad class of scalable algorithms Stand alone, MapReduce, Spark Open Source • Some Stats Proof of Concepts with customers 8 technical research publications 2 BigInsights Data Scientist module: Accelerate data science teams with advanced analytics to extract valuable insights from Hadoop • Big R: Statistical analysis & distributed frames using entire Hadoop cluster • Machine Learning: Scalable algorithms • Text Analytics: Text extraction via business web tooling
3.
© 2015 IBM
Corporation More Information • Sebastian Schelter, Juan Soto, Volker Markl, Douglas Burdick, Berthold Reinwald, Alexandre V. Evfimievski: Efficient sample generation for scalable meta learning. ICDE 2015: 1191-1202 • Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On optimizing machine learning workloads via kernel fusion. PPOPP 2015: 173-182 • Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning. SIGMOD Conference 2015: 137-152 • Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 37(3): 52-62 (2014) • Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML. PVLDB 7(7): 553-564 (2014) • Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning. IPDPS Workshops 2014: 1656-1663 • Yuanyuan Tian, Shirish Tatikonda, Berthold Reinwald: Scalable and Numerically Stable Descriptive Statistics in SystemML. ICDE 2012: 1351-1359 • Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D. Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan: SystemML: Declarative machine learning on MapReduce. ICDE 2011: 231-242 3 Algorithm Optimizer Resource Elasticity GPU Sampling Numeric Stability Task Parallelism 1st paper
4.
© 2015 IBM
Corporation SystemML Open Source announced in June 2015 4
5.
© 2015 IBM
Corporation Big Data Analytics Usecases • Insurance Problem Description • optimal subset of features that leads to the best regression model Problem Size • 1.1M observations, 95 features, Subsets of 15 variables Algorithm • Parallelization of independent model building • Automotive Problem Description • Customer Satisfaction Problem Size • 2 mill cars with 8,000 reacquired cars, 10 mill repair cases, 25 mill parts exchanges Algorithms • Logistic regression using ~22k feature variables • Increasing the #features from ~250 to ~21,800, improved precision/recall by order of magnitude • Sequence mining using very low support value • Very large number of intermediate result sequences. • Air Transportation Problem Description • Predict passenger volumes at locations in an airport Problem Size • WiFi data with ~66 M rows for ~1.3 M MAC addr. Algorithms • Multiple models per location, per passenger type • Time-series analysis using seasonal and non-seasonal auto-regressive, moving average components along with differencing operations (Arima and Holt-Winters triple exponential smoothing) • Financial Services Problem Description • Compute correlations between Financial Analysts’ performance metrics and sentiments extracted from surveys submitted by them Algorithms • Descriptive (Bivariate) Statistics: Chi-squared test, Spearman’s Rho, Gamma, Kendall’s Tau-B, Odds-Ratio test, F-test (stratified and unstratified) • Retail Banking Problem Description • Use statistical analysis on social media data linked to the bank’s data to identify customer segments of interest, find predictors of purchase intent, and gauge sentiment towards bank’s products. Algorithms • Bivariate odds ratios and binomial proportions with confidence intervals • Services Company Problem • Compute a benchmark index by mapping producers’ financial reports into a normalized schema, using analytics to extrapolate missing reports and/or impute missing values. Algorithms • Regularized least-squares loss minimization and Gibbs sampling (MCMC) jointly over the parameter space and over the missing (estimated) values 5 More • PCA on 7k attributes at rail road company • GLM on 10B rows at insurance company
6.
© 2015 IBM
Corporation SystemML • Algorithms expressed in a declarative, high-level language with R-like syntax • Cost-based compilation of algorithms to generate execution plans Compilation and parallelization • Based on data characteristics • Based on cluster and machine characteristics In-Memory single node and cluster execution • Enable algorithm developer productivity to build additional algorithms (scalability, numeric stability and optimizations) 6 Linear Regression
7.
© 2015 IBM
Corporation Example: Gaussian Non-negative Matrix Factorization 7 package gnmf; import java.io.IOException; import java.net.URISyntaxException; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.JobConf; public class MatrixGNMF { public static void main(String[] args) throws IOException, URISyntaxException { if(args.length < 10) { System.out.println("missing parameters"); System.out.println("expected parameters: [directory of v] [directory of w] [directory of h] " + "[k] [num mappers] [num reducers] [replication] [working directory] " + "[final directory of w] [final directory of h]"); System.exit(1); } String vDir = args[0]; String wDir = args[1]; String hDir = args[2]; int k = Integer.parseInt(args[3]); int numMappers = Integer.parseInt(args[4]); int numReducers = Integer.parseInt(args[5]); int replication = Integer.parseInt(args[6]); String outputDir = args[7]; String wFinalDir = args[8]; String hFinalDir = args[9]; JobConf mainJob = new JobConf(MatrixGNMF.class); String vDirectory; String wDirectory; String hDirectory; FileSystem.get(mainJob).delete(new Path(outputDir)); vDirectory = vDir; hDirectory = hDir; wDirectory = wDir; String workingDirectory; String resultDirectoryX; String resultDirectoryY; long start = System.currentTimeMillis(); System.gc(); System.out.println("starting calculation"); System.out.print("calculating X = WT * V... "); workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication, UpdateWHStep1.UPDATE_TYPE_H, vDirectory, wDirectory, outputDir, k); resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication, workingDirectory, outputDir); FileSystem.get(mainJob).delete(new Path(workingDirectory)); System.out.println("done"); System.out.print("calculating Y = WT * W * H... "); workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication, wDirectory, outputDir); resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory, UpdateWHStep4.UPDATE_TYPE_H, hDirectory, outputDir); FileSystem.get(mainJob).delete(new Path(workingDirectory)); System.out.println("done"); System.out.print("calculating H = H .* X ./ Y... "); workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication, hDirectory, resultDirectoryX, resultDirectoryY, hFinalDir, k); System.out.println("done"); FileSystem.get(mainJob).delete(new Path(resultDirectoryX)); FileSystem.get(mainJob).delete(new Path(resultDirectoryY)); System.out.print("storing back H... "); FileSystem.get(mainJob).delete(new Path(hDirectory)); hDirectory = workingDirectory; System.out.println("done"); System.out.print("calculating X = V * HT... "); workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication, UpdateWHStep1.UPDATE_TYPE_W, vDirectory, hDirectory, outputDir, k); resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication, workingDirectory, outputDir); FileSystem.get(mainJob).delete(new Path(workingDirectory)); System.out.println("done"); System.out.print("calculating Y = W * H * HT... "); workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication, hDirectory, outputDir); resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory, UpdateWHStep4.UPDATE_TYPE_W, wDirectory, outputDir); FileSystem.get(mainJob).delete(new Path(workingDirectory)); System.out.println("done"); System.out.print("calculating W = W .* X ./ Y... "); workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication, wDirectory, resultDirectoryX, resultDirectoryY, wFinalDir, k); System.out.println("done"); FileSystem.get(mainJob).delete(new Path(resultDirectoryX)); FileSystem.get(mainJob).delete(new Path(resultDirectoryY)); System.out.print("storing back W... "); FileSystem.get(mainJob).delete(new Path(wDirectory)); wDirectory = workingDirectory; System.out.println("done"); long requiredTime = System.currentTimeMillis() - start; long requiredTimeMilliseconds = requiredTime % 1000; requiredTime -= requiredTimeMilliseconds; requiredTime /= 1000; long requiredTimeSeconds = requiredTime % 60; requiredTime -= requiredTimeSeconds; requiredTime /= 60; long requiredTimeMinutes = requiredTime % 60; requiredTime -= requiredTimeMinutes; requiredTime /= 60; long requiredTimeHours = requiredTime; } } package gnmf; import gnmf.io.MatrixObject; import gnmf.io.MatrixVector; import gnmf.io.TaggedIndex; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.SequenceFileInputFormat; import org.apache.hadoop.mapred.SequenceFileOutputFormat; public class UpdateWHStep2 { static class UpdateWHStep2Mapper extends MapReduceBase implements Mapper<TaggedIndex, MatrixVector, TaggedIndex, MatrixVector> { @Override public void map(TaggedIndex key, MatrixVector value, OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter) throws IOException { out.collect(key, value); } } static class UpdateWHStep2Reducer extends MapReduceBase implements Reducer<TaggedIndex, MatrixVector, TaggedIndex, MatrixObject> { @Override public void reduce(TaggedIndex key, Iterator<MatrixVector> values, OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter) throws IOException { MatrixVector result = null; while(values.hasNext()) { MatrixVector current = values.next(); if(result == null) { result = current.getCopy(); } else { result.addVector(current); } } if(result != null) { out.collect(new TaggedIndex(key.getIndex(), TaggedIndex.TYPE_VECTOR_X), new MatrixObject(result)); } } } public static String runJob(int numMappers, int numReducers, int replication, String inputDir, String outputDir) throws IOException { String workingDirectory = outputDir + System.currentTimeMillis() + "- UpdateWHStep2/"; JobConf job = new JobConf(UpdateWHStep2.class); job.setJobName("MatrixGNMFUpdateWHStep2"); job.setInputFormat(SequenceFileInputFormat.class); FileInputFormat.setInputPaths(job, new Path(inputDir)); job.setOutputFormat(SequenceFileOutputFormat.class); FileOutputFormat.setOutputPath(job, new Path(workingDirectory)); job.setNumMapTasks(numMappers); job.setMapperClass(UpdateWHStep2Mapper.class); job.setMapOutputKeyClass(TaggedIndex.class); job.setMapOutputValueClass(MatrixVector.class); job.setNumReduceTasks(numReducers); job.setReducerClass(UpdateWHStep2Reducer.class); job.setOutputKeyClass(TaggedIndex.class); job.setOutputValueClass(MatrixObject.class); JobClient.runJob(job); return workingDirectory; } } package gnmf; import gnmf.io.MatrixCell; import gnmf.io.MatrixFormats; import gnmf.io.MatrixObject; import gnmf.io.MatrixVector; import gnmf.io.TaggedIndex; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.SequenceFileInputFormat; import org.apache.hadoop.mapred.SequenceFileOutputFormat; public class UpdateWHStep1 { public static final int UPDATE_TYPE_H = 0; public static final int UPDATE_TYPE_W = 1; static class UpdateWHStep1Mapper extends MapReduceBase implements Mapper<TaggedIndex, MatrixObject, TaggedIndex, MatrixObject> { private int updateType; @Override public void map(TaggedIndex key, MatrixObject value, OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter) throws IOException { if(updateType == UPDATE_TYPE_W && key.getType() == TaggedIndex.TYPE_CELL) { MatrixCell current = (MatrixCell) value.getObject(); out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_CELL), new MatrixObject(new MatrixCell(key.getIndex(), current.getValue()))); } else { out.collect(key, value); } } @Override public void configure(JobConf job) { updateType = job.getInt("gnmf.updateType", 0); } } static class UpdateWHStep1Reducer extends MapReduceBase implements Reducer<TaggedIndex, MatrixObject, TaggedIndex, MatrixVector> { private double[] baseVector = null; private int vectorSizeK; @Override public void reduce(TaggedIndex key, Iterator<MatrixObject> values, OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter) throws IOException { if(key.getType() == TaggedIndex.TYPE_VECTOR) { if(!values.hasNext()) throw new RuntimeException("expected vector"); MatrixFormats current = values.next().getObject(); if(!(current instanceof MatrixVector)) throw new RuntimeException("expected vector"); baseVector = ((MatrixVector) current).getValues(); } else { while(values.hasNext()) { MatrixCell current = (MatrixCell) values.next().getObject(); if(baseVector == null) { out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR), new MatrixVector(vectorSizeK)); } else { if(baseVector.length == 0) throw new RuntimeException("base vector is corrupted"); MatrixVector resultingVector = new MatrixVector(baseVector); resultingVector.multiplyWithScalar(current.getValue()); if(resultingVector.getValues().length == 0) throw new RuntimeException("multiplying with scalar failed"); out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR), resultingVector); } } baseVector = null; } } @Override public void configure(JobConf job) { vectorSizeK = job.getInt("dml.matrix.gnmf.k", 0); if(vectorSizeK == 0) throw new RuntimeException("invalid k specified"); } } public static String runJob(int numMappers, int numReducers, int replication, int updateType, String matrixInputDir, String whInputDir, String outputDir, int k) throws IOException {
8.
© 2015 IBM
Corporation SystemML lm() - Scalability and Performance 8 lm() 28x Performance (data fit in memory) Scalability (data larger than aggr. memory) R out-of-memory R 3.1.1 lm() Cluster • 6 nodes w/ 12 cores each • M/R capacity: 144/72 • M/R JVM: 2 GB
9.
© 2015 IBM
Corporation SystemML - Scalable Algorithms in BigInsights 9 Category Description Descriptive Statistics Univariate Bivariate Stratified Bivariate Classification Logistic Regression (multinomial) Multi-Class SVM Naïve Bayes (multinomial) Decision Trees Random Forest Clustering k-Means Regression Linear Regression system of equations CG (conjugate gradient descent) Generalized Linear Models (GLM) Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial and Bernoulli Links for all distributions: identity, log, sq. root, inverse, 1/μ2 Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit Stepwise Linear GLM Dimension Reduction PCA Matrix Factorization ALS Survival Models Kaplan Meier Cox Predict Scoring Transformation Recoding, dummy coding, binning, scaling, missing value imputation
10.
© 2015 IBM
Corporation High-Level SystemML Architecture 10 IBM Hadoop or Spark Cluster (scale-out) In-Memory Single Node (scale-up) Runtime Compiler Language DML Scripts DML (Declarative Machine Learning Language)
11.
© 2015 IBM
Corporation SystemML Architecture 11 Language • R- like syntax w/ constructs for meta learning and task parallelism • Rich set of statistical functions • User-defined & external function • Parsing • Statement blocks & statements • Program Analysis, type inference, dead code elimination High-Level Operator (HOP) Component • Represent dataflow in DAGs of operations on matrices, scalars • Choosing from alternative execution plans based on time and cost estimates: operator ordering & selection; hybrid plans Low-Level Operator (LOP) Component • Low-level physical execution plan (LOPDags) over key-value pairs • “Piggybacking” operations into minimal number Map-Reduce jobs Runtime • Hybrid Runtime • CP: single machine operations & orchestrate MR jobs • MR: generic Map-Reduce jobs & operations • SP: Spark Jobs • Numerically stable operators • Dense / sparse matrix representation • Multi-Level buffer pool (caching) to evict in-memory objects • Dynamic Recompilation for initial unknowns Language HOP Component LOP Component Runtime DML Script Hadoop R- like syntax
12.
© 2015 IBM
Corporation SystemML Compilation Chain 12 Spark CP + b sb _mVar1 SPARK mapmm X.MATRIX.DOUBLE _mvar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE CP * y _mVar2 _mVar3
13.
© 2015 IBM
Corporation Selected Algebraic Simplification Rewrites 13 Name Pattern Remove Unnecessary Indexing X[a:b,c:d] = Y X = Y iff dims(X)=dims(Y) X = Y[, 1] X = Y iff ncol(Y)=1 Remove Empty Matrix Multiply X%*%Y matrix(0,nrow(X),ncol(Y)) iff nnz(X)=0|nnz(Y)=0 Removed Unnecessary Outer Product X*(Y%*%matrix(1,...)) X*Y iff ncol(Y)=1 Simplify Diag Aggregates sum(diag(X))trace(X) iff ncol(X)=1 Simplify Matrix Mult Diag diag(X)%*%Y X*Y iff ncol(X)=1&ncol(Y)=1 Simplify Diag Matrix Mult diag(X%*%Y) rowSums(X*t(Y)) iff ncol(Y)>1 Simplify Dot Product Sum sum(X^2) t(X)%*%X iff ncol(X)=1 Name Static Pattern Remove Unnecessary Operations t(t(X)), X/1, X*1, X-0 X matrix(1,)/X 1/X rand(,min=-1,max=1)*7 rand(,min=-7,max=7) Binary to Unary X+X 2*X X*X X^2 X-X*Y X*(1-Y) Simplify Diag Aggregates trace(X%*%Y)sum(X*t(Y))
14.
© 2015 IBM
Corporation Example Operator Selection: Matrix Multiplication • Physical Operators • Hop-Lop Rewrites Partitioning (w/o, CP/MR, colblock/rowblock) Aggregation (w/o, singleblock/multiblock) Transpose-MM rewrite t(X)%*%y t(t(y)%*%X) Empty block materialization in output CP degree of parallelism (multi-threaded mm) 14 X r(t) ba(+*) y MR MR t(X)%*%yExample: Exec Type Physical MM Operators CP MM MMChain TSMM PMM MR / Spark MapMM MapMMChain TSMM (transpose-self mm) PMM (permutation mm) CPMM (cross-product mm) RMM (replication mm) Zipmm (partition aware mm) MapMM (MR,left) Transform (CP,’) Partition (CP,col) Transform (CP,’) Xy Aggregate (MR,ak+) Group (MR)
15.
© 2015 IBM
Corporation SystemML compiles hybrid runtime plans ranging from in- memory, single machine (CP) to large-scale, cluster compute • Example • Challenge Guaranteed hard memory constraints (budget of JVM size) for arbitrary complex ML programs • Key Technical Innovations CP & distributed runtime: Single machine & distributed operations, integrated runtime Caching: Reuse and eviction of in-memory objects (buffer pool) Cost Model: Accurate time and worst-case memory estimates Optimizer: Rewrites and cost-based runtime plan generation Dynamic Recompiler: Re-optimization for initial unknowns 15 Data size Runtime CP CP/Cluster Cluster Gradually exploit cluster parallelism High performance computing for small data sizes. Scalable computing for large data sizes. Hybrid Plans tokensdocuments 1 1 0.10 1 2 0.30 1 3 0.22 1 4 1.24 : : : : : : W H Ktopics wordsK topics documents 1 1 0.10 1 2 0.30 : : : V
16.
© 2015 IBM
Corporation Compilation of Execution plan for bigr.lm() 16 1 MR Job In-Mem Master + + + M M M … R Mappers compute • X’X for each block in X Combiners partial aggr. intermediate blocks Single reducer for final aggregation as only 1 result block Compute b, and execute solve(A, b) on small A, b (<2 MB) A= b’ X= (automatic, internal matrix block representation) … 1k 1k 300M 500 X 4 TB text file 300M 1 y 9 GB text file Data Characteristics 3.5 GB Map Task JVM 7 GB In-Mem Master JVM 128 MB HDFS block size Cluster Configuration Hadoop distributed cache y’ • y’X for each block in X, because X’y rewr to (y’X)’ How will execution plan change, if changes in • Data characteristics • More columns and rows • Less columns and rows • Cluster characteristics • Smaller task JVM size beta=
17.
© 2015 IBM
Corporation Different Execution Plans for bigr.lm, if … 17 300M 1500 X 300M 1 y 3.5 GB Map Task JVM 7 GB In-Mem Master JVM Data: X has 3 times more columns 300M 500 X 300M 1 y 1.5 GB Map Task JVM 7 GB In-Mem Master JVM Cluster: Change in Cluster configuration 1M 100 X 1M 1 y 3.5 GB Map Task JVM 7 GB In-Mem Master JVM Data: X is small and fits in memory 600M 500 X 600M 1 y 3.5 GB Map Task JVM 7 GB In-Mem Master JVM Data: X has 2 times more rows X’X job1 X’X job2 X’y job solve X’y job1 X’y job2 X’X job solve Solve X’X X’y X’y job1 X’y job2 X’X job solve 300M 500 X 300M 1 y Data Characteristics 3.5 GB Map Task JVM 7 GB In-Mem Master JVM Cluster Configuration Data: X’X and X’y job solve
18.
© 2015 IBM
Corporation SystemML Engine Key Components • Compiler Language parser (~25 KLoC Java) • Parsing, live variable analysis, semantic analysis Optimizer (~35 KLoC Java) • Hops, Lops • Rewrites, intra/inter-procedural analysis, memory estimates, cost model, operator selection • Parfor Optimizer, resource optimizer, global data flow opt. • Execution plan generation • Runtime (~70 KLoC Java) Runtime instructions Core runtime operations Buffer pool and IO Dynamic recompilation UDF framework YARN integration (SystemML AM) 18
19.
© 2015 IBM
Corporation Some Observations on SystemML with Spark (1/2) • Richer Spark Core API significantly simplified implementation • Symbol table tracks matrix runtime data either as single (large) MatrixBlock that is kept in driver JVM, • Used for single node instructions • Backed by multi-level cache or as distributed collection of MatrixBlocks in cluster • JavaPairRDD<MatrixIndexes, MatrixBlock> • Used for distributed Spark instructions • Subject to lazy evaluation • If beneficial, cache RDD – Before loops in iterative algorithms, if read only – Storage level: MEMORY_AND_DISK • Spark’s narrow dependency provides LOP piggybacking but problems with multiple consumers w/ individual actions (multiple scans) 19
20.
© 2015 IBM
Corporation Some Observations on Spark with Spark (2/2) • Robust handling of broadcast variables from driver Observe memory constraints Broadcast partitioned matrix blocks for efficiency • Preserve input RDD’s partitioning whenever possible to avoid shuffle e.g., matrix-vector binary operations using mapPartitions in combination with broadcast • Optimize degree of parallelism by shuffling if necessary e.g. coalesce RDDs before loops, taking into account the metadata information of data involved • Reduce overhead of Spark framework whenever possible for small-medium datasets Example: Lazy SparkContext 20
21.
© 2015 IBM
Corporation Performance SystemML Spark Backend In-Memory Data Set (160 GB) Large-Scale Data Set (1.6 TB) 5.1x1.4x 6.4x 9.7x 0.8x 1.3x 1.9x 1.9x
22.
© 2015 IBM
Corporation SystemML Spark MLContext • Fit into Spark APIs, consume and produce DataFrames • Exploit SystemML’s compiler to produce execution plans with Spark backend. • Useable from Scala, Java, Python, R/SparkR 22
23.
© 2015 IBM
Corporation Run SystemML in ML Pipeline 23
24.
© 2015 IBM
Corporation BigR Interface for SystemML Connect to BI cluster Data frame proxy to large data file Data transformation step Run scalable linear regression on cluster 24
25.
© 2015 IBM
Corporation SystemML Scalable Machine Learning - Summary • Cost-based compilation of machine learning algorithms generates execution plans for single-node in-memory, cluster, and hybrid execution for varying data characteristics: • varying number of observations (1,000s to 10s of billions) • varying number of variables (10s to 10s of millions) • dense and sparse data for varying cluster characteristics (memory configurations, degree of parallelism) • Out-of-the-box, scalable machine learning algorithms e.g. descriptive statistics, regression, clustering, and classification • "Roll-your-own" algorithms Enable programmer productivity (no worry about scalability, numeric stability, and optimizations) Fast turn-around for new algorithms • Machine-learning specific language constructs such as ensemble learning and cross-validation • Higher-level language shields algorithm development investment from platform progression Yarn for resource negotiation and elasticity Spark for in-memory, iterative processing • Open Source Commitment: R, Spark, Hadoop, etc. Platform to build, customize and run pre-processing, feature engineering, and machine learning algorithms in R-like syntax.
Download now