Collective Mind: bringing reproducible research to the masses


When trying to make auto-tuning practical using a common infrastructure, a public repository of knowledge and machine learning (cTuning.org), we faced a major problem with the reproducibility of experimental results collected from multiple users. This was largely due to a lack of information about all software and hardware dependencies, as well as a large variation in the measured characteristics.

I will present a possible collaborative approach to solving the above problems using the new Collective Mind knowledge management system. This modular infrastructure is intended to preserve and share over the Internet whole experimental setups with all related artifacts and their software and hardware dependencies, not just performance data. Researchers can take advantage of shared components and data with extensible meta-descriptions at http://c-mind.org/repo to quickly prototype and validate research techniques, particularly for software and hardware optimization and co-design. At the same time, behavior anomalies or model mispredictions can be exposed in a reproducible way to the interdisciplinary community for further analysis and improvement. This approach supports our new open publication model in computer engineering, where all results and artifacts are continuously shared and validated by the community (c-mind.org/events/trust2014).

This presentation supports our recent publications:
* http://iospress.metapress.com/content/f255p63828m8l384
* http://hal.inria.fr/hal-01054763

  1. Collective Mind: bringing reproducible research to the masses. Grigori Fursin, POSTALE, INRIA Saclay, France. INRIA-Illinois-ANL Joint Laboratory Workshop, France, June 2014.
  2. Challenge: how to design the next generation of faster, smaller, cheaper, more power-efficient and reliable computer systems (software and hardware)? Long-term interdisciplinary vision: • Share code and data in a reproducible way along with publications • Use big-data analytics for program optimization, run-time adaptation and architecture co-design • Bring the interdisciplinary community together to validate experimental results, ensure reproducibility and improve optimization predictions. This message has been continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys) and STMicroelectronics.
  3. Talk outline: • Motivation: general problems in computer engineering • cTuning: big-data driven program optimization and architecture co-design, and the problems we encountered • Collective Mind: collaborative and reproducible research and experimentation in computer engineering • Reproducibility as a side effect • Conclusions and future work.
  4. Available solutions. [Diagram of the components standing between an end-user task and the final result: algorithm, application, compilers, binary and libraries, run-time environment, architecture, state of the system, data set, storage.] End users require faster, smaller and more power-efficient systems.
  5. Delivering an optimal solution is non-trivial. [Diagram: end-user task → GCC optimizations → result.] Fundamental problems: 1) too many design and optimization choices at all levels; 2) always a multi-objective optimization: performance vs. compilation time vs. code size vs. system size vs. power consumption vs. reliability vs. return on investment; 3) complex relationships and interactions between ALL software and hardware components. Empirical auto-tuning is too time-consuming, ad hoc and tedious to become mainstream!
  6. Combine auto-tuning with machine learning and crowdsourcing. [Diagram: cTuning.org, a plugin-based auto-tuning framework and public repository. Training: shared programs or kernels 1..N pass through plugin-based MILEPOST GCC, which extracts semantic program features, collects dynamic features and hardware counters, clusters programs and builds a predictive model. Prediction: for an unseen program, MILEPOST GCC plugins extract semantic features and predict optimizations that minimize execution time, power consumption, code size, etc.] References: • G. Fursin et al. MILEPOST GCC: machine learning based self-tuning compiler, 2008, 2011 • G. Fursin and O. Temam. Collective optimization: a practical collaborative approach, 2010 • G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, 2009 • F. Agakov et al. Using machine learning to focus iterative optimization, 2006.
  7. Combine auto-tuning with machine learning and crowdsourcing (continued). [Same framework diagram and references as on the previous slide.] In 2009 we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys). This approach is now becoming mainstream - so is everything solved?
  8. Technological chaos. [Word cloud of rapidly evolving tools, languages, libraries, profilers, optimizations and hardware features: GCC 4.1.x-4.8.x, ICC 10.1-12.1, LLVM 2.6-3.4, Open64, XLC, Jikes, Testarossa, Phoenix, MVS 2013, OpenMP, MPI, HMPP, OpenCL, CUDA 4.x/5.x, TBB, MKL, ATLAS, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, SimpleScalar, hardware counters, IPA, polyhedral transformations, LTO, pass reordering, predictive scheduling, KNN, SVM, genetic algorithms, threads, per-phase reconfiguration, algorithm precision, ARM v6/v8, Intel SandyBridge, SSE4, AVX, ISA, cache size, frequency, bandwidth, TLB, memory size, HDD size, execution time, reliability.] We also experienced a few problems: • difficulty reproducing results collected from multiple users (including variability of performance data and constant changes in the system) • difficulty reproducing and validating existing and related techniques from publications (no full specifications and dependencies) • lack of common, large and diverse benchmarks and data sets • difficulty exposing choices and extracting features (tools are not prepared for auto-tuning and machine learning) • difficulty experimenting.
  9. Technological chaos (continued). [Same word cloud as on the previous slide.] Further problems: • by the end of the experiments, new tool versions are often available • the common life span of experiments and ad-hoc frameworks is the end of an MSc or PhD project • researchers often focus on publications rather than on practical and reproducible solutions • since 2009 we have been asking the community to share code, performance data and all related artifacts (experimental setups): at ADAPT'14 only two papers had submitted artifacts; PLDI'14 had several papers with research artifacts - these problems will be discussed in two days at ACM SIGPLAN TRUST'14.
  10. Collective Mind: towards systematic and reproducible experimentation. [Diagram: hardwired experimental setups that are very difficult to extend or share - multiple tool versions (tool A v1..vN, tool B v1..vM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, and experiments stored as collections of CSV, XLS, TXT and other files. Meta-information to expose: behavior, choices, features, state, dependencies.] Tools are not prepared for auto-tuning and adaptation! Users struggle to expose this meta-information. Motivation for Collective Mind (cM): • How to preserve, share and reuse practical knowledge and experience in program optimization and hardware co-design? • How to make machine-learning-driven optimization and run-time adaptation practical? • How to ensure reproducibility of experimental results? Share the whole experimental setup with all related artifacts, SW/HW dependencies and unified meta-information.
  11. Wrappers around tools: a cM module (wrapper) with unified and formalized input and output. [Diagram: each tool version (tool B vi) is wrapped by a cM module that processes the command line, passes the original unmodified ad-hoc input to the tool, collects the generated files and exposes behavior, choices, features, state and dependencies.] Command format: cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line). Examples: cm compiler build -- icc -fast *.c ; cm code.source build ct_compiler=icc13 ct_optimizations=-fast ; cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm . Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
  12. Exposing meta-information in a unified way: the cM module (wrapper) with unified and formalized input and output. [Diagram: the wrapper receives unified JSON input (meta-data) and, if it exists, the original unmodified ad-hoc input; an action function processes the command line, runs the tool (tool B vi), parses and unifies the generated files and output, and returns unified JSON output (meta-data). The behavior of a component is formalized as a function b = B(c, f, s) of choices, features and state, using flattened JSON vectors (either string categories or integer/float values).] Command format and examples as on the previous slide. Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
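To make the wrapper idea concrete, here is a minimal sketch in Python of a cM-style module: unified JSON meta-data in, an arbitrary tool invocation in the middle, unified JSON meta-data out. The function and key names (run_module, "choices", "characteristics") are illustrative assumptions, not the actual Collective Mind API.

```python
import json
import subprocess
import time

def run_module(input_json):
    """Illustrative cM-style wrapper: unified JSON in, unified JSON out.

    Expects meta-data such as {"tool": "gcc", "choices": {"flags": "-O3"},
    "cmd": ["gcc", "-O3", "test.c", "-o", "a.out"]}. All keys are hypothetical.
    """
    meta = json.loads(input_json)
    start = time.time()
    proc = subprocess.run(meta["cmd"], capture_output=True, text=True)
    elapsed = time.time() - start

    # Expose behavior, choices and state in a flat, unified form,
    # mirroring the formalization b = B(c, f, s) from the slide.
    output = {
        "choices": meta.get("choices", {}),
        "characteristics": {"execution_time": elapsed,
                            "return_code": proc.returncode},
        "state": {"tool": meta.get("tool")},
        "raw_stdout": proc.stdout,
    }
    return json.dumps(output, indent=2)

if __name__ == "__main__":
    print(run_module(json.dumps({"tool": "echo",
                                 "choices": {"flags": ""},
                                 "cmd": ["echo", "hello"]})))
```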
  13. Adding SW/HW dependency checks: the cM module (wrapper) with unified and formalized input and output. [Same diagram as on the previous slide, extended with two steps: check dependencies and set the environment for a given tool version.] Multiple tool versions can co-exist, while their interface is abstracted by the cM module. Command format and examples as on slide 11. Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
  14. Assembling, preserving, sharing and extending the whole pipeline as "LEGO": chaining cM components (wrappers) into an experimental pipeline for a given research and experimentation scenario. [Diagram: pipeline stages - choose exploration strategy, generate choices (code sample, data set, compiler, flags, architecture …), compile source code, run code, test behavior normality, Pareto filter, modeling and prediction, complexity reduction - connected to a public modular auto-tuning and machine learning repository and buildbot, unified web services, an interdisciplinary crowd and shared scenarios from past research.]
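As an illustration of such chaining (not the actual cM implementation), a pipeline can be sketched as a list of stages that each read and extend the same JSON-like dictionary; the stage names and values below are made up.

```python
def generate_choices(state):
    # Pick one combination of compiler flags to evaluate (hypothetical choice space).
    state["choices"] = {"compiler": "gcc", "flags": "-O3 -funroll-loops"}
    return state

def compile_code(state):
    # In a real pipeline this stage would call the compiler wrapper module.
    state["binary_size"] = 123456
    return state

def run_code(state):
    # In a real pipeline this stage would execute the binary and measure time.
    state["execution_time"] = 1.27
    return state

def pareto_filter(state):
    # Record the (time, size) point for later Pareto filtering.
    state.setdefault("frontier", []).append(
        (state["execution_time"], state["binary_size"]))
    return state

# Chain the stages like "LEGO" blocks: each wrapper reads and extends
# the same unified meta-data dictionary.
pipeline = [generate_choices, compile_code, run_code, pareto_filter]

state = {}
for stage in pipeline:
    state = stage(state)
print(state)
```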
  15. Data abstraction in Collective Mind (c-mind.org/repo). Entries are grouped by cM modules, for example: module compiler - GCC 4.4.4, GCC 4.7.1, LLVM 3.1, LLVM 3.4 …; module package - GCC 4.7.1 bin, GCC 4.7.1 source, LLVM 3.4, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7 …; module dataset - image-jpeg-0001, bzip2-0006, txt-0012 … Each entry combines files and directories with a JSON meta-description (compiler flags, installation info, features, actions). cM repository directory structure: .cmr / module UOA / data UOA (UID or alias) / .cm / data.json
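A minimal sketch of how code might walk such a repository layout and load each entry's JSON meta-description; the repository path is made up, and the exact root marker and file names of the real system may differ.

```python
import json
from pathlib import Path

def load_meta(repo_root):
    """Yield (module, data_entry, meta) for every entry in a cM-style repository.

    Assumes the layout from the slide: <repo>/<module UOA>/<data UOA>/.cm/data.json
    """
    for meta_file in Path(repo_root).glob("*/*/.cm/data.json"):
        data_entry = meta_file.parent.parent
        module = data_entry.parent
        with open(meta_file) as f:
            yield module.name, data_entry.name, json.load(f)

if __name__ == "__main__":
    # Hypothetical local checkout of a shared repository.
    for module, entry, meta in load_meta("./my-cm-repo"):
        print(module, entry, list(meta.keys()))
```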
  16. Since 2005: systematic, big-data driven optimization and co-design. [Word cloud of the current state of computer engineering, similar to slide 8: many versions of compilers, languages, libraries, profilers, optimizations and hardware parameters.] Typical outcome today when prototyping a research idea, validating existing work or performing an end-user task: a quick, non-reproducible hack? An ad-hoc heuristic? A quick publication? No shared code and data? The alternative, a collaborative infrastructure and repository (cTuning.org; c-mind.org/repo) used by the "crowd": • share code and data with their meta-descriptions and dependencies • systematize and classify the collected optimization knowledge (clustering, predictive modelling) • develop and preserve the whole experimental pipeline • extrapolate the collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power-efficient and reliable computer systems. This helped the interdisciplinary community apply "big data analytics" to the analysis, optimization and co-design of computer systems.
  17. Top-down problem (tuning) decomposition, similar to physics: gradually expose some characteristics and some choices at each level. [Figure: decomposition levels with example characteristics and choices - algorithm selection (time; productivity, variable accuracy, complexity); language (MPI, OpenMP, TBB, MapReduce …); program compilation (time; compiler flags, pragmas); code analysis and transformations (time, memory usage, code size; transformation ordering, polyhedral transformations, transformation parameters, instruction ordering); running the code at process / thread / function / codelet / loop / instruction granularity; run-time environment (time, power consumption; pinning/scheduling); system (cost, size; CPU/GPU, frequency, memory hierarchy); data set (size, values, description; precision); run-time analysis (time, precision; hardware counters, power meters); run-time state (processor state, cache state; helper threads, hardware counters); profile analysis (time, size; instrumentation, profiling).] Coarse-grain vs. fine-grain effects: the choice depends on user requirements and expected ROI.
  18. Growing, plugin-based cM pipeline for auto-tuning and learning: • init pipeline • detect system information • initialize parameters • prepare data set • clean program • prepare compiler flags • use compiler profiling • use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning • use the universal Alchemist plugin (with any OpenME-compatible compiler or tool) • use the Alchemist plugin (currently for GCC) • build program • get objdump and md5sum (if supported) • use OpenME for fine-grain program analysis and online tuning (build & run) • use Intel VTune Amplifier to collect hardware counters • use perf to collect hardware counters • set frequency (in Unix, if supported) • get system state before execution • run program • check output for correctness (use the data set UID to save different outputs) • finish OpenME • misc info • observed characteristics • observed statistical characteristics • finalize pipeline. http://c-mind.org/ctuning-pipeline
  19. Publicly shared research material (c-mind.org/repo). Our Collective Mind buildbot and plugin-based auto-tuning pipeline support the following shared benchmarks and codelets: • Polybench - numerical kernels with the parameters of all matrices exposed in cM (CPU: 28 prepared benchmarks; CUDA: 15; OpenCL: 15) • cBench - 23 benchmarks with 20 and 1000 data sets per benchmark • Codelets - 44 codelets from the embedded domain (provided by CAPS Entreprise) • SPEC 2000/2006 • descriptions of 32-bit and 64-bit OSes: Windows, Linux, Android • descriptions of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x • support for collecting hardware counters: perf, Intel VTune • support for frequency modification • validated on laptops, mobiles, tablets and GRID/cloud - can even work from a USB key. Speeds up research and innovation!
  20. Automatic, empirical and adaptive modeling of program behavior. [3D plot: CPI of matmul on an Intel i5 (Dell E6320) as a function of data set features Nk (matrix size, with Nj = 100) and Ni (matrix size).]
  21. Automatic, empirical and adaptive modeling of program behavior (continued). [Same CPI plot for matmul on an Intel i5 (Dell E6320) as on the previous slide.] Off-the-shelf models can handle some of these cases - for example, a MARS (Earth) model. Share the model along with the application; continuously refine the model (minimize RMSE and size).
  22. Automatic, empirical and adaptive modeling of program behavior (continued). [Same CPI plot as on the previous slides.] Off-the-shelf models can handle some of these cases - for example, a MARS (Earth) model. Share the model along with the application; continuously refine the model (minimize RMSE and size). Model-driven auto-tuning: target optimizations or architecture reconfiguration at areas with similar performance (see our past publications).
  23. Systematic benchmarking, compiler tuning and program optimization. Program: image corner detection; processor: ARM v6, 830 MHz; compiler: Sourcery GCC for ARM v4.7.3; OS: Android OS v2.3.5; system: Samsung Galaxy Y; data set: MiDataSet #1, image, 600x450x8b PGM, 263 KB. [Scatter plot: execution time (sec.) vs. binary size (bytes) for 500 combinations of random flags -O3 -f(no-)FLAG, with the -O3 baseline marked.] Use a Pareto frontier filter; pack experimental data on the fly. Powered by Collective Mind Node (Android app on Google Play).
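A minimal sketch of the kind of 2-D Pareto frontier filter mentioned above, keeping only flag combinations that are not dominated in both execution time and binary size; the data points are fabricated for illustration.

```python
def pareto_frontier(points):
    """Return the non-dominated points when minimizing both objectives.

    Each point is (execution_time, binary_size, label).
    """
    frontier = []
    for p in sorted(points):  # sort by execution time (first element)
        # p is kept only if no already accepted point is at least as small
        # in binary size (accepted points are never slower than p).
        if all(q[1] > p[1] for q in frontier):
            frontier.append(p)
    return frontier

# Fabricated results of a few flag combinations.
results = [
    (1.20, 52000, "-O3"),
    (1.05, 61000, "-O3 -funroll-loops"),
    (1.30, 49000, "-Os"),
    (1.25, 60000, "-O3 -fno-if-conversion"),
]
print(pareto_frontier(results))
```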
  24. Clustering shared applications by optimizations. [Diagram: the training set is clustered by distinct combinations of compiler optimizations (choices c); some ad-hoc predictive model maps ad-hoc program features f (MILEPOST GCC features, hardware counters) to an optimization cluster; for an unseen program, its features are used to predict the optimization cluster and hence the choices c.] c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20,000 data sets in preparation.
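A toy sketch of this idea using a nearest-neighbour classifier over program feature vectors; the feature values and cluster labels are invented purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented MILEPOST-style feature vectors for a few training programs
# and the index of the best-performing optimization cluster for each.
features = np.array([
    [12, 3, 0.7],   # e.g. basic blocks, loops, branch density
    [40, 9, 0.2],
    [11, 2, 0.8],
    [38, 8, 0.3],
])
best_cluster = np.array([0, 1, 0, 1])  # 0: "-O3", 1: "-O3 -fno-if-conversion ..."

model = KNeighborsClassifier(n_neighbors=1).fit(features, best_cluster)

# Predict the optimization cluster for an unseen program from its features.
unseen = np.array([[36, 7, 0.25]])
print("predicted optimization cluster:", model.predict(unseen)[0])
```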
  25. Split-compilation and run-time adaptation: statically enabling dynamic optimizations. [Plot: execution time (ms) vs. data set feature N (size) for CPU, GPU and an adaptive scheduler.] References: • Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, Nacho Navarro. Predictive runtime code scheduling for heterogeneous architectures. HiPEAC 2009 • Grigori Fursin, Albert Cohen, Michael F. P. O'Boyle, Olivier Temam. A practical method for quickly evaluating program optimizations. HiPEAC 2005.
  26. Reproducibility of experimental results - reproducibility came as a side effect! • The whole experimental setup can be preserved with all data and software dependencies • Statistical analysis (normality test) can be performed on the measured characteristics • The community can add missing features or improve the machine learning models.
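For instance, a normality test over repeated execution-time measurements (here using SciPy; the measurements are fabricated) can flag multi-modal behavior that deserves community attention, as illustrated on the next slides.

```python
import numpy as np
from scipy import stats

# Fabricated repeated measurements of execution time (seconds):
# two modes, e.g. caused by the CPU switching between 800 MHz and 2400 MHz.
times = np.array([0.82, 0.81, 0.83, 0.80, 0.82,
                  2.41, 2.39, 2.43, 2.40, 2.42])

stat, p_value = stats.shapiro(times)   # Shapiro-Wilk normality test
if p_value < 0.05:
    print("execution time is not normally distributed (p=%.4f):" % p_value,
          "expose this unexpected behavior to the community")
else:
    print("no evidence against normality (p=%.4f)" % p_value)
```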
  27. Reproducibility of experimental results (continued). [Histogram: distribution of execution time (sec.) with an unexpected second mode.] Unexpected behavior is exposed to the community, including domain specialists, to explain it, find the missing feature and add it to the system. Same points as on the previous slide: preserve the whole setup, run normality tests, let the community add missing features or improve models.
  28. Reproducibility of experimental results (continued). [Histogram: the execution-time distribution splits into class A and class B, explained by the CPU frequency switching between 800 MHz and 2400 MHz.] Unexpected behavior is exposed to the community, including domain specialists, to explain it, find the missing feature and add it to the system.
  29. Tricky part: finding the right features. Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0; Results per class: shared data set sample1 - reference execution time with -O3, no change with -O3 -fno-if-conversion; shared data set sample2 - no change with -O3, +17.3% improvement with -O3 -fno-if-conversion.
  30. Tricky part: finding the right features (continued). Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0; Results per class: shared data set sample1 (monitored during the day) - reference execution time with -O3, no change with -O3 -fno-if-conversion; shared data set sample2 (monitored during the night) - no change with -O3, +17.3% improvement with -O3 -fno-if-conversion.
  31. Tricky part: finding the right features (continued). Same results as above, plus the adaptive solution: if get_feature(TIME_OF_THE_DAY)==NIGHT bw_filter_codelet_day(buffers); else bw_filter_codelet_night(buffers); The feature "TIME_OF_THE_DAY" relates to the algorithm, the data set and the run-time. It can't be found by ML - it simply does not exist in the system! Split-compilation (cloning and run-time adaptation) can be used instead.
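A small Python sketch of the same run-time adaptation idea (the names and the day/night split are illustrative, and the clones are functionally identical here since Python cannot express the compiler-level difference): the program carries two differently tuned clones of the codelet and dispatches between them based on a feature that only exists at run time.

```python
import datetime

def bw_filter_reference(img, threshold):
    # Clone tuned for the conditions observed with data set sample1.
    return [[255 if pixel > threshold else 0 for pixel in row] for row in img]

def bw_filter_no_if_conversion(img, threshold):
    # Clone standing in for the variant built with -O3 -fno-if-conversion,
    # which was 17.3% faster on data set sample2 in the slide's experiment.
    return [[255 if pixel > threshold else 0 for pixel in row] for row in img]

def bw_filter_adaptive(img, threshold):
    # Run-time dispatch on a feature ("time of day") that is invisible
    # to machine-learning models trained only on static features.
    hour = datetime.datetime.now().hour
    night = hour >= 22 or hour < 6
    clone = bw_filter_no_if_conversion if night else bw_filter_reference
    return clone(img, threshold)

print(bw_filter_adaptive([[10, 200], [130, 40]], threshold=128))
```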
  32. Example of characterizing/explaining the behavior of computer systems: add one property, the matrix size. [Scatter plot: program/architecture behavior (CPI, 0-6) vs. data set property (matrix size, 0-5000).]
  33. Example of characterizing/explaining the behavior of computer systems (continued). Try to build a model that correlates the objective (CPI) with the features (matrix size). Start from simple models, such as linear regression, to detect coarse-grain effects. [Same CPI vs. matrix size plot as on the previous slide.]
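A minimal sketch of this first modeling step with NumPy; the CPI observations below are synthetic.

```python
import numpy as np

# Synthetic observations: CPI grows with the matrix size once the
# working set no longer fits in cache.
matrix_size = np.array([100, 500, 1000, 1500, 2000, 3000, 4000, 5000])
cpi = np.array([1.1, 1.2, 1.3, 1.6, 2.4, 3.1, 3.9, 4.8])

# Fit a first-degree polynomial (linear regression) to detect the coarse trend.
slope, intercept = np.polyfit(matrix_size, cpi, deg=1)
predicted = slope * matrix_size + intercept
rmse = np.sqrt(np.mean((predicted - cpi) ** 2))

print("CPI ~ %.5f * size + %.2f, RMSE = %.2f" % (slope, intercept, rmse))
```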
  34. Example of characterizing/explaining the behavior of computer systems (continued). [Same CPI vs. matrix size plot as before.] With more observations, validate the model and detect discrepancies! Continuously retrain models to fit new data! Use the model to "focus" exploration on "unusual" behavior!
  35. Example of characterizing/explaining the behavior of computer systems (continued). Gradually increase model complexity if needed (hierarchical modeling), for example to detect and characterize fine-grain effects (singularities). [Same CPI vs. matrix size plot as before.]
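One way to sketch this gradual refinement (again on synthetic data) is to increase the model complexity, here the polynomial degree, only while it keeps reducing the RMSE by a meaningful margin, trading model size against precision.

```python
import numpy as np

rng = np.random.default_rng(0)
size = np.linspace(100, 5000, 60)
# Synthetic CPI curve with a "step" (e.g. a cache-capacity effect) plus noise.
cpi = 1.0 + 3.0 / (1.0 + np.exp(-(size - 2000) / 200)) \
      + rng.normal(0, 0.05, size.size)

def rmse(deg):
    coeffs = np.polyfit(size, cpi, deg)
    return np.sqrt(np.mean((np.polyval(coeffs, size) - cpi) ** 2))

best_deg, best_err = 1, rmse(1)
for deg in range(2, 8):
    err = rmse(deg)
    if best_err - err < 0.02:   # stop when extra complexity barely helps
        break
    best_deg, best_err = deg, err

print("chosen degree:", best_deg, "RMSE: %.3f" % best_err)
```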
  36. Example of characterizing/explaining the behavior of computer systems (continued). Start adding more properties, for example one more architecture with a twice larger cache. Use an automatic approach to correlate all objectives and features. [CPI vs. matrix size plot with two curves, one for L3 = 4 MB and one for L3 = 8 MB.]
  37. Example of characterizing/explaining the behavior of computer systems (continued). Continuously build and refine classification (decision trees, for example) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.). Focus exploration on unexplored areas, areas with high variability, or areas with a high misprediction rate of the models. [cM predictive model module with parameters β and ε: CPI = ε + 1000 × β × data size.]
  38. Model optimization and data compaction. [Plot: code/architecture behavior (CPI) vs. data set feature (matrix size), partitioned by a decision tree into regions: size < 1012; 1012 < size < 2042; size > 2042 & GCC; size > 2042 & ICC & -O2; size > 2042 & ICC & -O3.] Optimize the decision tree (many different algorithms exist). Balance precision against the cost of modeling, i.e. ROI (coarse-grain vs. fine-grain effects). Compact data on-line before sharing it with other users!
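A toy version of such a decision-tree classifier with scikit-learn; the feature values and class labels are fabricated to loosely mirror the regions named on the slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Fabricated training data: (matrix size, compiler id) -> observed behavior class.
# compiler id: 0 = GCC, 1 = ICC
X = np.array([[500, 0], [900, 1], [1500, 0], [1800, 1],
              [2500, 0], [3000, 0], [2600, 1], [4000, 1]])
y = np.array(["small", "small", "medium", "medium",
              "large-gcc", "large-gcc", "large-icc", "large-icc"])

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Print the learned partitioning of the feature space and predict a new point.
print(export_text(tree, feature_names=["matrix_size", "compiler"]))
print(tree.predict([[2042, 1]]))
```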
  39. Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance tuning results, etc. Open-access publication: http://hal.inria.fr/hal-00685276 - Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler. #ctuning-opt-case 24857532370695782. We need a new publication model in computer engineering where results are shared and validated by the community.
  40. What have we learnt from cTuning? It's fun and motivating to work with the community! A comment about MILEPOST GCC from Slashdot.org (http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design): "GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC strikes back…" The community was interested in validating and improving the techniques! The community can identify missing related citations and projects! Open discussions can provide new directions for research!
  41. What have we learnt from cTuning? (continued) [Same Slashdot example and observations as on the previous slide.] Not all feedback is positive - however, unlike with unfair reviews, you can engage in discussions and explain your position!
  42. Current status and future work: • pilot live repository for public curation of research material: http://c-mind.org/repo • infrastructure available at SourceForge under the standard BSD license: http://c-mind.org • example of crowdsourcing compiler flag auto-tuning using mobile phones: "Collective Mind Node" in the Google Play Store • preparing projects and raising funding to make cM more user-friendly and to add more research scenarios • PLDI'14 and ADAPT'14 featured validation of research results by the community - the outcome will be discussed at ACM SIGPLAN TRUST'14 at PLDI'14 in a few days: http://c-mind.org/events/trust2014 • ADAPT'15 (likely at HiPEAC'15) will feature the new publication model. Several recent publications: • Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento. "Collective Mind: towards practical and collaborative auto-tuning", accepted for the special issue on automatic performance tuning for HPC architectures, Scientific Programming Journal, IOS Press, 2014 • Grigori Fursin and Christophe Dubach. "Community-driven reviewing and validation of publications", ACM SIGPLAN TRUST'14.
  43. Acknowledgements: • colleagues from ARM (UK): Anton Lokhmotov • colleagues from STMicroelectronics (France): Christophe Guillone, Antoine Moynault, Christian Bertin • colleagues from NCAR (USA): Davide Del Vento and interns • colleagues from Intel (USA): David Kuck and David Wong • the cTuning/Collective Mind community • the EU FP6 and FP7 programmes and the HiPEAC network of excellence, http://www.hipeac.net. Questions? Comments?
