
Program Performance Analysis Toolkit Adaptor



The Adaptor framework automates experimentation, data collection, and analysis in the field of program performance and tuning.

It can be used, for example, for the estimation of computer system performance during its design, or for the search of optimal compiler settings by methods of iterative compilation and machine-learning-driven techniques.

Contact information: Michael K. Pankov
• mkpankov@gmail.com
• michaelpankov.com

Source on GitHub: https://github.com/constantius9/adaptor

This is an extended and edited version of my diploma defense keynote, given on June 19, 2013



  1. Programs Performance Analysis Toolkit Adaptor
     Michael K. Pankov
     Advisor: Anatoly P. Karpenko
     Bauman Moscow State Technical University, October 11, 2013
  2. Goal, tasks, and importance of the work
     Goal: develop a method and software toolkit for modeling the performance of programs on general-purpose computers.
     Tasks:
     1. Develop a method of modeling program performance.
     2. Implement the performance analysis and modeling toolkit.
     3. Study the efficiency of the toolkit on a set of benchmarks.
     Importance:
     1. Estimation of computer performance during its design.
     2. Search for optimal compiler settings by methods of iterative compilation and machine-learning-driven techniques.
  3. Overview
     There is a lot of recent research: see C. Dubach, G. Fursin, B. C. Lee, W. Wu. In particular, there is the cTuning public repository for research, and the corresponding Collective Mind program run by G. Fursin.
     This work is about modeling the performance of general-purpose computer programs, with feature ranking by means of Earth Importance and regression by means of k-Nearest Neighbors and Earth Regression. We try to accomplish automatic detection of relevant features.
  4. Method of statistical program performance analysis "Velocitas"
     1. Perform a series of experiments measuring program execution time and form a set U:
        $U = \{(X_i, y_i)\}, \quad X_i = (x_{ij}), \quad i \in [1; m], \; j \in [1; n]$
        where $X_i$ is the feature vector (CPU frequency, number of rows of the processed matrix, etc.), $y_i$ is the response (execution time), $m$ is the number of experiments, and $n$ is the number of features.
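As a minimal sketch of step 1 (not from the slides; the feature names are hypothetical), the experiment set U can be represented in Python as a list of (feature vector, response) pairs:

```python
# Each entry is (X_i, y_i): measured features paired with the observed
# execution time in seconds. Feature names here are illustrative.
experiments = [
    ({"cpu_mhz": 2330, "rows": 256}, 0.41),
    ({"cpu_mhz": 2330, "rows": 512}, 3.20),
    ({"cpu_mhz": 2530, "rows": 512}, 2.95),
]

m = len(experiments)        # number of experiments
n = len(experiments[0][0])  # number of features
```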
  5. 2. Split the set U into a training sample D and a test sample C by randomly assigning about 70% of the experiments to D:
        $D = \{d_i \mid f_{\mathrm{rand}}(d_i) > 0.3\},$ (1)
        $d_i = (X_i, y_i),$ (2)
        $f_{\mathrm{rand}}(d) \in [0; 1],$ (3)
        $i \in [1; m],$ (4)
        $C = U \setminus D$ (5)
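The 70/30 split of step 2 can be sketched in Python as follows; the function name, seed, and threshold default are illustrative, not the toolkit's:

```python
import random

def split(U, threshold=0.3, seed=42):
    """Assign an experiment d_i to the training sample D when a uniform
    random draw exceeds the threshold, so roughly 70% of U lands in D;
    the rest form the test sample C = U \\ D."""
    rng = random.Random(seed)
    D, C = [], []
    for d in U:
        (D if rng.random() > threshold else C).append(d)
    return D, C

U = [((i,), float(i)) for i in range(100)]
D, C = split(U)
```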
  6. 3. Extract features $x_{ik}$:
        $x_{ik} = f(X_i),$ (6)
        $X_i = (x_{ij}),$ (7)
        $D = \{(X_i, y_i)\},$ (8)
        $i \in [1; m],$ (9)
        $j \in [1; n + r],$ (10)
        $k \in [n + 1; n + r]$ (11)
        where $r$ is the number of additional, derived features (e.g. "size of input data").
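A sketch of step 3, deriving an additional feature from the measured ones. The feature names and the particular derivation (size as width times height) are illustrative assumptions, not taken from the slides:

```python
def extract_features(X):
    """Append a derived feature to a feature vector: a hypothetical
    "size of input data" computed from width and height features."""
    width, height = X["width"], X["height"]
    return {**X, "size": width * height}

X = {"width": 512, "height": 512, "cpu_mhz": 2330}
X2 = extract_features(X)
```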
  7. 4. Filter the training set D to remove noise and incorrect measurements:
        $D' = D \setminus \{(X_i, y_i) \mid P(X_i, y_i)\}$
        where $P$ is the experiment selection predicate (we remove all experiments whose measured execution time is less than $t_{\min}$).
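The filtering predicate of step 4 can be sketched directly; the function name and the $t_{\min}$ value are illustrative:

```python
def filter_measurements(D, t_min=0.001):
    """Drop experiments whose measured execution time y_i is below
    t_min; such short runs are dominated by measurement overhead."""
    return [(X, y) for (X, y) in D if y >= t_min]

D = [((256,), 0.41), ((2,), 0.0002), ((512,), 3.20)]
D_filtered = filter_measurements(D)
```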
  8. 5. Rank the features and select only those with non-zero importance:
        $s_j = f_{\mathrm{rank}}(D'),$ (12)
        $j \in [1; n],$ (13)
        $D'' = \{(X_i, y_i) \mid s_j > 0\}$ (14)
        where $s_j$ is the scalar importance of a particular feature and $f_{\mathrm{rank}}$ is the feature ranking function (we used MSE, Relief F, and Earth Importance).
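The selection half of step 5 can be sketched as follows. The importance scores would come from a ranking method such as Relief F, MSE, or Earth Importance; here they are supplied directly, and the function name is illustrative:

```python
def select_features(D, scores):
    """Keep only the feature columns whose importance score s_j is
    positive, dropping the rest from every feature vector."""
    keep = [j for j, s in enumerate(scores) if s > 0]
    reduced = [([X[j] for j in keep], y) for (X, y) in D]
    return reduced, keep

D = [((256, 2330, 0.5), 0.41), ((512, 2330, 0.5), 3.20)]
reduced, keep = select_features(D, [4.9, 3.3, 0.0])
```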
  9. 6. Fit a regression model of one of four kinds (linear, random forest, Earth, k nearest neighbors):
        $M_p = \{f_{\mathrm{pred}}, B\},$ (15)
        $B = f_{\mathrm{fit}}(D''), \quad p \in [1; 4]$ (16)
        where $B$ is the vector of model parameters, $f_{\mathrm{fit}}$ is the learning function, and $f_{\mathrm{pred}}$ is the prediction function (they are defined separately for each model).
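As one concrete instance of step 6, here is a minimal k-nearest-neighbors regression sketch. It is a stand-in for the models the toolkit fits via Orange, not the toolkit's own code:

```python
def knn_predict(D, X, k=3):
    """Predict the response for feature vector X as the mean y of the
    k training points nearest to X by Euclidean distance."""
    def dist(A):
        return sum((a - b) ** 2 for a, b in zip(A, X)) ** 0.5
    nearest = sorted(D, key=lambda d: dist(d[0]))[:k]
    return sum(y for _, y in nearest) / k

# Toy training sample: response is exactly twice the single feature.
D = [((float(i),), 2.0 * i) for i in range(10)]
```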
  10. 7. Test the model by the RRSE metric:
        $C = U \setminus D$ (17)
        $= \{(X_i, y_i)\},$ (18)
        $i \in [1; m],$ (19)
        $X_i = (x_{ik}),$ (20)
        $k \in [1; n + r]$ (21)
        $\tilde{Y} = f_{\mathrm{pred}}(X, B),$ (22)
        $\mathrm{RRSE} = \sqrt{\dfrac{\sum_{i=1}^{m} (\tilde{y}_i - y_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$ (23)
        where $\tilde{y}_i$ is the predicted value of the response and $\bar{y}$ is the average value of the response in the testing sample.
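The RRSE metric of eq. (23) translates directly into a few lines of Python:

```python
def rrse(y_pred, y_true):
    """Root Relative Squared Error: the model's squared error relative
    to always predicting the mean response, under a square root. 0 is
    a perfect fit; 1 matches the mean-only baseline."""
    mean = sum(y_true) / len(y_true)
    num = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    den = sum((t - mean) ** 2 for t in y_true)
    return (num / den) ** 0.5
```

Predicting the mean for every point gives exactly RRSE = 1, which is why values well below 1 (such as the 0.05 reported later for k Nearest Neighbors) indicate a useful model.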
  11. Architecture of the Adaptor framework
      Database server: data views.
      Client: database interaction module, program building module, experimentation module, information retrieval module, information analysis module.
  12. Technology stack
      Database server: the distributed client-server document-oriented storage CouchDB, on the Cloudant cloud platform.
      Client: Python with the Orange statistical framework, on GNU/Linux on the x86 platform.
  13. Database interaction module
      Provides a high-level API for storing Python objects as database documents.
      Uses a local CouchDB server as a fall-back if the remote one isn't available.
  14. Program building module
      Manages paths to the source files of experimental programs. Sources are kept in a hierarchical directory structure, so specifying only the name of the program to build is enough for its sources to be found.
      Manages build tools and their settings.
  15. Experimentation module
      Calibrates the program execution time measurement before every series of runs: the execution time of a "simplest" program is subtracted to avoid systematic error.
      Runs the program being studied until the relative dispersion of the time measurement becomes sufficiently low ($d_{rel} < 5\%$).
      Passes experiment data to the database interaction module.
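The "run until the relative dispersion is low" loop can be sketched as below. The function name, bounds, and stopping details are illustrative assumptions, not the module's actual implementation:

```python
import statistics
import time

def measure(run, rel_tol=0.05, min_runs=5, max_runs=100):
    """Repeat a timed execution of `run` (a zero-argument callable)
    until the relative dispersion of the measurements (sample standard
    deviation over mean) drops below rel_tol, bounded by max_runs.
    Returns the mean measured time."""
    times = []
    for _ in range(max_runs):
        t0 = time.perf_counter()
        run()
        times.append(time.perf_counter() - t0)
        if len(times) >= min_runs:
            mean = statistics.mean(times)
            if mean > 0 and statistics.stdev(times) / mean < rel_tol:
                break
    return statistics.mean(times)
```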
  16. Information retrieval module
      Collects information on the platform used and the experiment being carried out:
      CPU: frequency, cache size, instruction set extensions, etc.
      Compiler.
      Experiment: studied program, size of input data.
  17. Data analysis module
      Receives data from the database and saves it to CSV files for input to the Orange statistical analysis system; graphs results using the Python library matplotlib.
      Two groups of program performance models: simplest (1 feature) and more complex (3-5 features).
      Four regression models in both groups: linear, k nearest neighbors, multivariate adaptive regression splines, random forest.
  18. Data analysis module (cont.)
      A scheme of 40 data analysis components in the Orange system: reading in, preprocessing, filtering, feature extraction, feature ranking, predictor fitting, evaluation of prediction results, and saving predictions to a CSV file.
  19. Platform
      Intel CPUs: Core 2 Quad Q8200 (2.33 GHz, 2 MB cache), Core i5 M460 (2.53 GHz, 3 MB cache), Xeon E5430 (2.66 GHz, 6 MB cache).
      Ubuntu 12.04, gcc and llvm compilers.
      Polybench/C 3.2 benchmark set, 28 programs in total: linear algebra, solution of systems of linear algebraic equations and of ordinary differential equations. Input data is generated by deterministic algorithms.
      The performance of chosen programs from the benchmark set is modeled using the Adaptor framework:
      symm: multiplication of symmetric matrices; square matrices of dimensionality $2^i$, $i = f_{\mathrm{rand}}(1, 10)$.
      ludcmp: LU decomposition; square matrices of dimensionality $f_{\mathrm{rand}}(2, 1024)$.
      1000 experiments per CPU.
  20. Feature ranking: symm program

      Attribute   Relief F   Mean Square Error   Earth Importance
      size        0.268      0.573               4.9
      cpu mhz     0.000      0.006               3.3
      width       0.130      0.573               0.7
      cpu cache   0.000      0.006               0.5
      height      0.130      0.573               0.0

      Earth Importance selected only the relevant features.
  21. Feature ranking: symm program (cont.)
      428 experiments; 1 feature: matrix dimensionality.

                            RMSE     RRSE    R2
      k Nearest Neighbors   5.761    0.051   0.997
      Random Forest         5.961    0.052   0.997
      Linear Regression     15.869   0.139   0.981

      The Root Relative Squared Error of k Nearest Neighbors is approx. 5%.
  22. Resulting model of performance
      k Nearest Neighbors model of the performance of the symm program on the Intel Core 2 Quad Q8200 CPU.
  23. Resulting model of performance
      Comparison of models of the performance of the ludcmp program.
      468 experiments; 2 features: width of matrix, CPU frequency.

                            RMSE    RRSE    R2
      k Nearest Neighbors   1.093   0.048   0.998
      Linear Regression     9.067   0.394   0.845
  24. Where models fail
      Amazon throttles its micro servers, so the data is split into two "curves".
      Earth Regression at least tries to follow the "main curve"; k Nearest Neighbors is much worse in this situation.
  25. Results of evaluation
      Most suitable feature ranking method: Earth Importance.
      Most suitable regression method: k Nearest Neighbors.
  26. Further work
      The Velocitas method is promising and scales to larger feature sets; data filtering to reduce noise can help it get even better.
      Orange is a decent statistical framework, but interactive work with it limits batch processing. For larger data sets and increased automation of the Adaptor framework, either its API or other libraries (e.g. sklearn) should be used.
      Custom research scenario support is required.
      It would be interesting to perform experiments on GPUs to study the effects of massively parallel execution.
  27. Thank you!
      Contact information: Michael K. Pankov, mkpankov@gmail.com, michaelpankov.com
      This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.
