SlideShare a Scribd company logo
1 of 20
Download to read offline
IBM Research | Brazil
Helping HPC Users Specify Job Memory Requirements via
Machine Learning
Eduardo Rodrigues
SuperComputing’16 - Salt Lake City
1/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Motivation
End-users must specify several parameters in their job submissions to the queue
system, e.g.:
Number of processors
Queue / Partition
Memory requirements
Other resource requirements
Those parameters have direct impact in the job turnaround time and, more
importantly, in the total system utilization
Frequently, end-users are not aware of the implications of the parameters they use
System log keeps valuable information that can be leveraged to improve
parameter choice
2/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Related work
Karnak has been used in XSEDE to
predict waiting time and runtime
Useful for users to plan their
experiments
The method may not apply well for
other job parameters, for example
memory requirements fa fb
labela
b
c
d
e
Neighbor (p) Query (q)
Query neighborhood
E(q)
x
D (q,x) = D (x,q) = dx
Point in the knowledge base
3/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Memory requirements
System owner wants to maximize utilization
Users may not specify memory precisely
Log data can provide training examples for a machine learning approach for
predicting memory requirements
This can be seen as a supervised learning task
We have a set of features (e.g. user id, cwd, command parameters, submission
time, etc)
We want to predict memory requirements (label)
4/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
The Wisdom of Crowds
There are many learning algorithms available, e.g. Classification
trees, Neural Networks, Instance-based learners, etc
Instead of relying on a single algorithm, we aggregate the predictions
of several methods
"Aggregating the judgment of many consistently beats the accuracy
of the average member of the group"
5/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
The voting strategy
We turn the task of memory prediction into a classification task
The predictions fall into regular bins
A set of methods is used
Each method is trained with log data
The out-of-sample accuracy is estimated by validation
During prediction, the accuracy weights the vote of each method
6/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Three main components
The system has three modules
Asynchronous data collection process
Asynchronous model construction
Prediction request handling
7/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Asynchronous data collection process
On-line and off-line mode
Data curation
Database independent
8/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Asynchronous model construction
Each model is trained and validated in
parallel
Models are stored in a database
9/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Prediction request handling
Receives as input a bsub command /
script
Returns the predicted memory
requirement
Optionally, submit job with the memory
requirement set to the predicted value
10/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Learning methods
Methods used:
Support Vector Machines (SVM)
multi label classification by one versus all approach
two models (svm-1, svm-2)
Random Forests
Neural Networks (mlp-1, mlp-2)
K-Nearest Neighbors (KNN)
regular voting-based classification (knn-1)
same method used in XSEDE for queue time and runtime predictions (knn-2)
11/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Performance evaluation
0 1 2 3 4
time
test segments
0 1 2 3 4
time
train
test segments
train
log file
log file
In production, the tool does not need to wait a specific number of jobs to be retrained
12/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Features used
Feature Type Description
User ID Category User who submitted the job
Group ID Category User group that submitted the job
Queue ID Category Number of the queue the job has been submitted to
Working directory Category Directory where the job executes
ResReq Category Resources requested (e.g. architecture type, GPU)
Command Category Command executed
Priority Number User priority
Submission time Number Time at which the job was submitted
Requested time Number Amount of time requested to execute the job
Requested processors Number Number of processors requested at the submission time
Weekday Number Day of the week in which the job was submitted
Time since midnight Number Time of the day at which the job was
13/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Accuracy of the methods
2,128-node x86 system
Test performance
segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2
0 0.7910 0.8606 0.3948 0.6546 0.8610 0.7278 0.6558 0.6826
1 0.6024 0.7448 0.1202 0.7544 0.7448 0.7180 0.7540 0.5492
2 0.6626 0.8640 0.0484 0.8698 0.8630 0.8666 0.6778 0.6770
3 0.8066 0.8836 0.2464 0.8918 0.7726 0.8842 0.8792 0.5166
4 0.7742 0.8038 0.7568 0.7812 0.8014 0.8140 0.7772 0.8034
14/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Accuracy of the methods
2,128-node x86 system
Test performance
segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2
0 0.7910 0.8606 0.3948 0.6546 0.8610 0.7278 0.6558 0.6826
1 0.6024 0.7448 0.1202 0.7544 0.7448 0.7180 0.7540 0.5492
2 0.6626 0.8640 0.0484 0.8698 0.8630 0.8666 0.6778 0.6770
3 0.8066 0.8836 0.2464 0.8918 0.7726 0.8842 0.8792 0.5166
4 0.7742 0.8038 0.7568 0.7812 0.8014 0.8140 0.7772 0.8034
KNN-2, which is used to predict waiting time and running time, was not very consistent.
14/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Comparison between mode and poll
x86 system
0 1 2 3 4
Segment
0.0
0.2
0.4
0.6
0.8
1.0
Accuracy
0.791
0.602
0.663
0.807
0.774
0.909
0.754
0.869 0.892
0.782
Prediction performance in the x86 system
mode poll
15/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Accuracy of the methods
26-node Power8 system
segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2
0 0.9856 0.9856 0.0024 0.9890 0.9848 0.0666 0.9796 0.9796
1 0.9628 0.9610 0.0014 0.9656 0.9610 0.7772 0.9620 0.9610
2 0.9398 0.9436 0.0042 0.9428 0.9398 0.1322 0.2912 0.2826
3 0.8276 0.8264 0.0092 0.8360 0.8162 0.7952 0.8278 0.8168
4 0.7386 0.7412 0.0232 0.7474 0.7400 0.5162 0.7410 0.7274
16/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Comparison between mode and poll
Power8 system
0 1 2 3 4
Segment
0.0
0.2
0.4
0.6
0.8
1.0
Accuracy
0.986 0.963 0.940
0.828
0.739
0.989 0.965 0.942
0.835
0.754
Prediction performance in the POWER8 system
mode poll
17/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
Final remarks
System log data can be leveraged to improve resource requirement specifications
One machine learning method may not fit all
In this tool we explored memory prediction, but other resources can be predicted
as well
18/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
QUESTIONS ?
19/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation

More Related Content

What's hot

Mahati's PPT Mainframes
Mahati's PPT MainframesMahati's PPT Mainframes
Mahati's PPT MainframesMahati V
 
Smart analytic optimizer how it works
Smart analytic optimizer   how it worksSmart analytic optimizer   how it works
Smart analytic optimizer how it worksWillie Favero
 
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...0xdaryl
 
z/OS Small Enhancements - Episode 2013A
z/OS Small Enhancements - Episode 2013Az/OS Small Enhancements - Episode 2013A
z/OS Small Enhancements - Episode 2013AMarna Walle
 
Computer Fundamentals Chapter 14 os
Computer Fundamentals Chapter 14 osComputer Fundamentals Chapter 14 os
Computer Fundamentals Chapter 14 osSaumya Sahu
 
Chapter 20 co c-ppt
Chapter 20 co c-pptChapter 20 co c-ppt
Chapter 20 co c-pptsumatipuri
 
openGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitiesopenGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitieswot chin
 
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Robert Hain
 
Mis lesson5 PPT
Mis lesson5 PPTMis lesson5 PPT
Mis lesson5 PPTgeunice
 
Higher Order Thinking - Question paper setting
Higher Order Thinking - Question paper settingHigher Order Thinking - Question paper setting
Higher Order Thinking - Question paper settingPradeep Kumar TS
 

What's hot (14)

Mahati's PPT Mainframes
Mahati's PPT MainframesMahati's PPT Mainframes
Mahati's PPT Mainframes
 
Smart analytic optimizer how it works
Smart analytic optimizer   how it worksSmart analytic optimizer   how it works
Smart analytic optimizer how it works
 
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...
JavaOne 2013 CON7370: Java Interprocess Communication Challenges in Low-Laten...
 
z/OS Small Enhancements - Episode 2013A
z/OS Small Enhancements - Episode 2013Az/OS Small Enhancements - Episode 2013A
z/OS Small Enhancements - Episode 2013A
 
Embedded os
Embedded osEmbedded os
Embedded os
 
Computer Fundamentals Chapter 14 os
Computer Fundamentals Chapter 14 osComputer Fundamentals Chapter 14 os
Computer Fundamentals Chapter 14 os
 
Chapter 15 asp
Chapter 15 aspChapter 15 asp
Chapter 15 asp
 
Chapter 13 sio
Chapter 13 sioChapter 13 sio
Chapter 13 sio
 
Chapter 20 co c-ppt
Chapter 20 co c-pptChapter 20 co c-ppt
Chapter 20 co c-ppt
 
openGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilitiesopenGauss - The evolution route of openGauss' AIcapabilities
openGauss - The evolution route of openGauss' AIcapabilities
 
Microassembler a10
Microassembler a10Microassembler a10
Microassembler a10
 
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...Ims13   ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
Ims13 ims tools ims v13 migration workshop - IMS UG May 2014 Sydney & Melbo...
 
Mis lesson5 PPT
Mis lesson5 PPTMis lesson5 PPT
Mis lesson5 PPT
 
Higher Order Thinking - Question paper setting
Higher Order Thinking - Question paper settingHigher Order Thinking - Question paper setting
Higher Order Thinking - Question paper setting
 

Viewers also liked

Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...
Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...
Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...LitheSpeed
 
Blockchain for Business
Blockchain for BusinessBlockchain for Business
Blockchain for BusinessFloyd DCosta
 
ATAGTR2017 Artificial Intelligence in Software Testing – Demystified
ATAGTR2017 Artificial Intelligence in Software Testing – DemystifiedATAGTR2017 Artificial Intelligence in Software Testing – Demystified
ATAGTR2017 Artificial Intelligence in Software Testing – DemystifiedAgile Testing Alliance
 
Artificial Intelligence in Project Management by Dr. Khaled A. Hamdy
Artificial Intelligence in Project Management by  Dr. Khaled A. HamdyArtificial Intelligence in Project Management by  Dr. Khaled A. Hamdy
Artificial Intelligence in Project Management by Dr. Khaled A. HamdyAgile ME
 
Paralegal Outsourcing Services by Legal Support Experts
Paralegal Outsourcing Services by Legal Support ExpertsParalegal Outsourcing Services by Legal Support Experts
Paralegal Outsourcing Services by Legal Support ExpertsCogneesol
 
Community connectivity : building the Internet from scratch
Community connectivity : building the Internet from scratchCommunity connectivity : building the Internet from scratch
Community connectivity : building the Internet from scratchFGV Brazil
 
Die Macht der Daten - CeBIT 2017
Die Macht der Daten - CeBIT 2017Die Macht der Daten - CeBIT 2017
Die Macht der Daten - CeBIT 2017Detlev Sandel
 
MDR aspects for the sterilisation industry
MDR aspects for the sterilisation industryMDR aspects for the sterilisation industry
MDR aspects for the sterilisation industryErik Vollebregt
 
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...Blue Mail Media Inc
 
Karnataka Health department annual report of 2015-16
Karnataka Health department annual report of 2015-16Karnataka Health department annual report of 2015-16
Karnataka Health department annual report of 2015-16Trinity Care Foundation
 
Gender equality and empowerment of women through ICT
Gender equality and empowerment of women through ICTGender equality and empowerment of women through ICT
Gender equality and empowerment of women through ICTEricsson
 
Etude Métiers et Compétences du marketing et de la communication dans un cont...
Etude Métiers et Compétences du marketing et de la communication dans un cont...Etude Métiers et Compétences du marketing et de la communication dans un cont...
Etude Métiers et Compétences du marketing et de la communication dans un cont...Nicolas Bariteau
 
CompTIA IT Employment Tracker - March 2017
CompTIA IT Employment Tracker - March 2017CompTIA IT Employment Tracker - March 2017
CompTIA IT Employment Tracker - March 2017CompTIA
 
Venture Scanner Energy Tech Report Q1 2017
Venture Scanner Energy Tech Report Q1 2017Venture Scanner Energy Tech Report Q1 2017
Venture Scanner Energy Tech Report Q1 2017Nathan Pacer
 
Event Report - SAP Ariba Live - The quest to make procurement awesome
Event Report - SAP Ariba Live - The quest to make procurement awesomeEvent Report - SAP Ariba Live - The quest to make procurement awesome
Event Report - SAP Ariba Live - The quest to make procurement awesomeHolger Mueller
 
Fiche pédagogique pour la correction phonétique fle
Fiche pédagogique pour la correction phonétique fleFiche pédagogique pour la correction phonétique fle
Fiche pédagogique pour la correction phonétique fleMichel Billières
 
Inspiring Marketing Episode 1: The Rise of Messengers
Inspiring Marketing Episode 1: The Rise of MessengersInspiring Marketing Episode 1: The Rise of Messengers
Inspiring Marketing Episode 1: The Rise of MessengersKepios
 

Viewers also liked (20)

Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...
Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...
Culture Eats Strategy for Breakfast - Dealing with Cultural Differences in Ka...
 
Blockchain for Business
Blockchain for BusinessBlockchain for Business
Blockchain for Business
 
ATAGTR2017 Artificial Intelligence in Software Testing – Demystified
ATAGTR2017 Artificial Intelligence in Software Testing – DemystifiedATAGTR2017 Artificial Intelligence in Software Testing – Demystified
ATAGTR2017 Artificial Intelligence in Software Testing – Demystified
 
Artificial Intelligence in Project Management by Dr. Khaled A. Hamdy
Artificial Intelligence in Project Management by  Dr. Khaled A. HamdyArtificial Intelligence in Project Management by  Dr. Khaled A. Hamdy
Artificial Intelligence in Project Management by Dr. Khaled A. Hamdy
 
Paralegal Outsourcing Services by Legal Support Experts
Paralegal Outsourcing Services by Legal Support ExpertsParalegal Outsourcing Services by Legal Support Experts
Paralegal Outsourcing Services by Legal Support Experts
 
Community connectivity : building the Internet from scratch
Community connectivity : building the Internet from scratchCommunity connectivity : building the Internet from scratch
Community connectivity : building the Internet from scratch
 
Die Macht der Daten - CeBIT 2017
Die Macht der Daten - CeBIT 2017Die Macht der Daten - CeBIT 2017
Die Macht der Daten - CeBIT 2017
 
Marketing automation ebook (Spanish)
Marketing automation ebook (Spanish)Marketing automation ebook (Spanish)
Marketing automation ebook (Spanish)
 
MDR aspects for the sterilisation industry
MDR aspects for the sterilisation industryMDR aspects for the sterilisation industry
MDR aspects for the sterilisation industry
 
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...
[Infographic] Marketing Is Moving From The Demand Generation Approach To An A...
 
Karnataka Health department annual report of 2015-16
Karnataka Health department annual report of 2015-16Karnataka Health department annual report of 2015-16
Karnataka Health department annual report of 2015-16
 
Gender equality and empowerment of women through ICT
Gender equality and empowerment of women through ICTGender equality and empowerment of women through ICT
Gender equality and empowerment of women through ICT
 
Etude Métiers et Compétences du marketing et de la communication dans un cont...
Etude Métiers et Compétences du marketing et de la communication dans un cont...Etude Métiers et Compétences du marketing et de la communication dans un cont...
Etude Métiers et Compétences du marketing et de la communication dans un cont...
 
CompTIA IT Employment Tracker - March 2017
CompTIA IT Employment Tracker - March 2017CompTIA IT Employment Tracker - March 2017
CompTIA IT Employment Tracker - March 2017
 
Venture Scanner Energy Tech Report Q1 2017
Venture Scanner Energy Tech Report Q1 2017Venture Scanner Energy Tech Report Q1 2017
Venture Scanner Energy Tech Report Q1 2017
 
Event Report - SAP Ariba Live - The quest to make procurement awesome
Event Report - SAP Ariba Live - The quest to make procurement awesomeEvent Report - SAP Ariba Live - The quest to make procurement awesome
Event Report - SAP Ariba Live - The quest to make procurement awesome
 
Fiche pédagogique pour la correction phonétique fle
Fiche pédagogique pour la correction phonétique fleFiche pédagogique pour la correction phonétique fle
Fiche pédagogique pour la correction phonétique fle
 
Pour une politique documentaire musicale globale et « réseausonnée »... de l...
Pour une politique documentaire  musicale globale et « réseausonnée »... de l...Pour une politique documentaire  musicale globale et « réseausonnée »... de l...
Pour une politique documentaire musicale globale et « réseausonnée »... de l...
 
Inspiring Marketing Episode 1: The Rise of Messengers
Inspiring Marketing Episode 1: The Rise of MessengersInspiring Marketing Episode 1: The Rise of Messengers
Inspiring Marketing Episode 1: The Rise of Messengers
 
Smart TV Insecurity
Smart TV InsecuritySmart TV Insecurity
Smart TV Insecurity
 

Similar to SC16: Helping HPC Users Specify Job Memory Requirements via Machine Learning

STOCK MARKET PREDICTION USING NEURAL NETWORKS
STOCK MARKET PREDICTION USING NEURAL NETWORKSSTOCK MARKET PREDICTION USING NEURAL NETWORKS
STOCK MARKET PREDICTION USING NEURAL NETWORKSIRJET Journal
 
Parallel Application Performance Prediction of Using Analysis Based Modeling
Parallel Application Performance Prediction of Using Analysis Based ModelingParallel Application Performance Prediction of Using Analysis Based Modeling
Parallel Application Performance Prediction of Using Analysis Based ModelingJason Liu
 
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive ApplicationsM3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive ApplicationsVladislavKashansky
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfManjuAppukuttan2
 
UNIT I_Introduction.pptx
UNIT I_Introduction.pptxUNIT I_Introduction.pptx
UNIT I_Introduction.pptxssuser4ca1eb
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING (CAD/CAM
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING  (CAD/CAMCOMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING  (CAD/CAM
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING (CAD/CAMshishirrathod1
 
Sap implementation
Sap implementationSap implementation
Sap implementationsydraza786
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Indrajit Poddar
 
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...terraborealis
 
Class 1 introduction to embedded systems
Class 1 introduction to embedded systemsClass 1 introduction to embedded systems
Class 1 introduction to embedded systemsSURYAPRAKASH S
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance ComputingNous Infosystems
 
2 colin walls - how to measure rtos performance
2    colin walls - how to measure rtos performance2    colin walls - how to measure rtos performance
2 colin walls - how to measure rtos performanceIevgenii Katsan
 

Similar to SC16: Helping HPC Users Specify Job Memory Requirements via Machine Learning (20)

STOCK MARKET PREDICTION USING NEURAL NETWORKS
STOCK MARKET PREDICTION USING NEURAL NETWORKSSTOCK MARKET PREDICTION USING NEURAL NETWORKS
STOCK MARKET PREDICTION USING NEURAL NETWORKS
 
hetero_pim
hetero_pimhetero_pim
hetero_pim
 
Parallel Application Performance Prediction of Using Analysis Based Modeling
Parallel Application Performance Prediction of Using Analysis Based ModelingParallel Application Performance Prediction of Using Analysis Based Modeling
Parallel Application Performance Prediction of Using Analysis Based Modeling
 
Kairos aarohan
Kairos  aarohanKairos  aarohan
Kairos aarohan
 
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive ApplicationsM3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
 
UNIT I.pptx
UNIT I.pptxUNIT I.pptx
UNIT I.pptx
 
SA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdfSA UNIT I STREAMING ANALYTICS.pdf
SA UNIT I STREAMING ANALYTICS.pdf
 
UNIT I_Introduction.pptx
UNIT I_Introduction.pptxUNIT I_Introduction.pptx
UNIT I_Introduction.pptx
 
CNMES 2017 Software Cost Estimating with COSMIC - Critical knowledge for toda...
CNMES 2017 Software Cost Estimating with COSMIC - Critical knowledge for toda...CNMES 2017 Software Cost Estimating with COSMIC - Critical knowledge for toda...
CNMES 2017 Software Cost Estimating with COSMIC - Critical knowledge for toda...
 
Ameya_Kasbekar_Resume
Ameya_Kasbekar_ResumeAmeya_Kasbekar_Resume
Ameya_Kasbekar_Resume
 
Co question 2006
Co question 2006Co question 2006
Co question 2006
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING (CAD/CAM
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING  (CAD/CAMCOMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING  (CAD/CAM
COMPUTER AIDED DESIGN / COMPUTER AIDED MANUFACTURING (CAD/CAM
 
Sap implementation
Sap implementationSap implementation
Sap implementation
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...
Poly3D Case Study: The Impact of Cache Misses on Performance of CPU-Intensive...
 
Class 1 introduction to embedded systems
Class 1 introduction to embedded systemsClass 1 introduction to embedded systems
Class 1 introduction to embedded systems
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
2 colin walls - how to measure rtos performance
2    colin walls - how to measure rtos performance2    colin walls - how to measure rtos performance
2 colin walls - how to measure rtos performance
 

Recently uploaded

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

SC16: Helping HPC Users Specify Job Memory Requirements via Machine Learning

  • 1. IBM Research | Brazil Helping HPC Users Specify Job Memory Requirements via Machine Learning Eduardo Rodrigues SuperComputing’16 - Salt Lake City 1/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 2. Motivation End-users must specify several parameters in their job submissions to the queue system, e.g.: Number of processors Queue / Partition Memory requirements Other resource requirements Those parameters have direct impact in the job turnaround time and, more importantly, in the total system utilization Frequently, end-users are not aware of the implications of the parameters they use System log keeps valuable information that can be leveraged to improve parameter choice 2/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 3. Related work Karnak has been used in XSEDE to predict waiting time and runtime Useful for users to plan their experiments The method may not apply well for other job parameters, for example memory requirements fa fb labela b c d e Neighbor (p) Query (q) Query neighborhood E(q) x D (q,x) = D (x,q) = dx Point in the knowledge base 3/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 4. Memory requirements System owner wants to maximize utilization Users may not specify memory precisely Log data can provide training examples for a machine learning approach for predicting memory requirements This can be seen as a supervised learning task We have a set of features (e.g. user id, cwd, command parameters, submission time, etc) We want to predict memory requirements (label) 4/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 5. The Wisdom of Crowds There are many learning algorithms available, e.g. Classification trees, Neural Networks, Instance-based learners, etc Instead of relying on a single algorithm, we aggregate the predictions of several methods "Aggregating the judgment of many consistently beats the accuracy of the average member of the group" 5/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 6. The voting strategy We turn the task of memory prediction into a classification task The predictions fall into regular bins A set of methods is used Each method is trained with log data The out-of-sample accuracy is estimated by validation During prediction, the accuracy weights the vote of each method 6/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 7. Three main components The system has three modules Asynchronous data collection process Asynchronous model construction Prediction request handling 7/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 8. Asynchronous data collection process On-line and off-line mode Data curation Database independent 8/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 9. Asynchronous model construction Each model is trained and validated in parallel Models are stored in a database 9/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 10. Prediction request handling Receives as input a bsub command / script Returns the predicted memory requirement Optionally, submit job with the memory requirement set to the predicted value 10/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 11. Learning methods Methods used: Support Vector Machines (SVM) multi label classification by one versus all approach two models (svm-1, svm-2) Random Forests Neural Networks (mlp-1, mlp-2) K-Nearest Neighbors (KNN) regular voting-based classification (knn-1) same method used in XSEDE for queue time and runtime predictions (knn-2) 11/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 12. Performance evaluation 0 1 2 3 4 time test segments 0 1 2 3 4 time train test segments train log file log file In production, the tool does not need to wait a specific number of jobs to be retrained 12/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 13. Features used Feature Type Description User ID Category User who submitted the job Group ID Category User group that submitted the job Queue ID Category Number of the queue the job has been submitted to Working directory Category Directory where the job executes ResReq Category Resources requested (e.g. architecture type, GPU) Command Category Command executed Priority Number User priority Submission time Number Time at which the job was submitted Requested time Number Amount of time requested to execute the job Requested processors Number Number of processors requested at the submission time Weekday Number Day of the week in which the job was submitted Time since midnight Number Time of the day at which the job was 13/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 14. Accuracy of the methods 2,128-node x86 system Test performance segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2 0 0.7910 0.8606 0.3948 0.6546 0.8610 0.7278 0.6558 0.6826 1 0.6024 0.7448 0.1202 0.7544 0.7448 0.7180 0.7540 0.5492 2 0.6626 0.8640 0.0484 0.8698 0.8630 0.8666 0.6778 0.6770 3 0.8066 0.8836 0.2464 0.8918 0.7726 0.8842 0.8792 0.5166 4 0.7742 0.8038 0.7568 0.7812 0.8014 0.8140 0.7772 0.8034 14/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 15. Accuracy of the methods 2,128-node x86 system Test performance segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2 0 0.7910 0.8606 0.3948 0.6546 0.8610 0.7278 0.6558 0.6826 1 0.6024 0.7448 0.1202 0.7544 0.7448 0.7180 0.7540 0.5492 2 0.6626 0.8640 0.0484 0.8698 0.8630 0.8666 0.6778 0.6770 3 0.8066 0.8836 0.2464 0.8918 0.7726 0.8842 0.8792 0.5166 4 0.7742 0.8038 0.7568 0.7812 0.8014 0.8140 0.7772 0.8034 KNN-2, which is used to predict waiting time and running time, was not very consistent. 14/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 16. Comparison between mode and poll x86 system 0 1 2 3 4 Segment 0.0 0.2 0.4 0.6 0.8 1.0 Accuracy 0.791 0.602 0.663 0.807 0.774 0.909 0.754 0.869 0.892 0.782 Prediction performance in the x86 system mode poll 15/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 17. Accuracy of the methods 26-node Power8 system segment mode svm-1 svm-2 rforest mlp-1 mlp-2 knn-1 knn-2 0 0.9856 0.9856 0.0024 0.9890 0.9848 0.0666 0.9796 0.9796 1 0.9628 0.9610 0.0014 0.9656 0.9610 0.7772 0.9620 0.9610 2 0.9398 0.9436 0.0042 0.9428 0.9398 0.1322 0.2912 0.2826 3 0.8276 0.8264 0.0092 0.8360 0.8162 0.7952 0.8278 0.8168 4 0.7386 0.7412 0.0232 0.7474 0.7400 0.5162 0.7410 0.7274 16/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 18. Comparison between mode and poll Power8 system 0 1 2 3 4 Segment 0.0 0.2 0.4 0.6 0.8 1.0 Accuracy 0.986 0.963 0.940 0.828 0.739 0.989 0.965 0.942 0.835 0.754 Prediction performance in the POWER8 system mode poll 17/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 19. Final remarks System log data can be leveraged to improve resource requirement specifications One machine learning method may not fit all In this tool we explored memory prediction, but other resources can be predicted as well 18/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation
  • 20. QUESTIONS ? 19/19 Eduardo Rodrigues – Helping HPC Users Specify Job Memory Requirements via Machine Learning © 2016 IBM Corporation