Collective Knowledge:
python and scikit-learn based open research SDK
for collaborative data management and exchange
We would like to share our experience with a python-based Collective Knowledge SDK for collaborative and reproducible experimentation. It helps organize and share experimental setups (code, data and meta) as unified and reusable components with a JSON API via GitHub. It also helps unify, automate and crowdsource analysis and exploration of multi-dimensional optimization spaces using scikit-learn.

  1. Collective Knowledge: python and scikit-learn based open research SDK for collaborative data management and exchange
     PyData, London, 20 June 2015
     Grigori Fursin, cTuning Foundation, France
     Anton Lokhmotov, dividiti, UK
  2. Grigori Fursin, Anton Lokhmotov “Python and scikit-learn based open research SDK for collaborative data management and exchange”
     Outline
     • Background
     • Back to basics: major problems in computer engineering
     • Machine learning for performance/energy optimization
     • Collective Knowledge Infrastructure & Repository
     • Organizing local code and data using Python wrappers + JSON
     • Sharing all artifacts as reusable components
     • Designing collaborative experiments from shared components
     • Reproducing experiments
     • Connecting predictive analytics
     • Conclusions and future work
  3. Interdisciplinary background (physics, electronics, ML)
     • 1999-2004: PhD in computer science, University of Edinburgh, UK. Prepared foundation for machine-learning based performance autotuning
     • 2007-2010: Tenured research scientist at INRIA, France; adjunct professor at Paris South University, France. Developed self-tuning compiler GCC combined with machine learning
     • 2010-2011: Head of application optimization group at Intel Exascale Lab, France. Application characterization and optimization for exascale systems via
     • 2012-2014: Senior tenured research scientist, INRIA, France. Collective Mind Project – open platform for sharing optimization knowledge
     • 2014-now: Chief Scientist, non-profit cTuning foundation, France; CTO, dividiti, UK. Collective Knowledge Project – python-based framework and repository for collaborative and reproducible experimentation in computer engineering combined with predictive analytics
     Close collaboration with IBM, Intel, ARM, ARC, STMicroelectronics
     Presented work and opinions are my own!
  4. Back to 1993
     Semiconductor neural element: base of neural accelerators and brain-inspired computers
     Modeling and understanding brain functions
     [Figure labels: 1, -1, θ (threshold)]
     Faced major problems during modeling:
     • Too slow
     • Too unreliable
     • Too costly
     • Too much data
  5. Back to basics
     [Figure labels: Idea, Result]
     Researchers and developers do not necessarily care about details of underlying technology but simply want to:
     • get result as fast as possible
     • minimize all costs: power consumption, data/memory footprint, inaccuracies, price, size, faults …
     • guarantee some constraints: power budget, real-time processing, bandwidth, QoS …
     G. Fursin, A. Lokhmotov, et al. “Collective Mind, Part II: Towards Performance- and Cost-Aware Software Engineering as a Natural Science”, CPC’15, London, UK, available at arXiv
  6. Back to basics: available solutions
     [Figure labels: Idea, Result]
     Components: Algorithm, Application, Compilers, Binary and libraries, Architecture, Run-time environment, State of the system, Data set
     Choose “best” solution from all available choices
     • Service/application providers (HPC, supercomputers, mobile systems)
     • Hardware and software designers
     20 years ago it was relatively simple!
  7. Back to basics: technological chaos
     [Figure labels: Idea, Result]
     [Word cloud: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration, cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, processors, threads, power consumption, execution time, reliability]
     Is your system optimal? No one knows ...
     Fundamental problems:
     1) Ever rising complexity of computer systems: too many design and optimization choices at ALL levels
     2) It’s not only performance that matters: multiple user objectives vs choices, benefit vs optimization time
     3) Complex relationship and interactions between software and hardware components
     4) Too many ever changing tools with non-unified interfaces changing from version to version: technological chaos
     5) No common methodology for performance/energy evaluation and benchmarking
  8. Back to basics: ever rising complexity
     [Figure labels: Idea, Result]
     GCC compiler (similar trends in LLVM)
     • OpenCL/CUDA/OpenMP/MPI parameters
     • CPU/GPU frequency
     • number of threads
     • algorithm accuracy/precision
     • …
     Large, multi-dimensional design and optimization spaces
  9. Back to basics: SW/HW autotuning
     • Program: image corner detection
     • Processor: ARM v7 (Cortex A15), 2.0GHz
     • Compiler: GCC for ARM v4.9.2
     • OS: Ubuntu 14.04.02 LTS
     • System: ODROID-XU3
     • Data set: MiDataSet #1, image, 600x450x8b PGM, 263KB
     500 combinations of random flags: -Ox -f(no-)FLAG
     [Plot annotations: GCC v4.9.2 -O3 == LLVM v3.4 -O3; cluster around -Os with “bad” flags; cluster around -O0 with “bad” flags; cluster around -O1, -O2 with “bad” flags; Pareto frontier; ~20% improvement]
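The Pareto frontier named in the plot annotations can be illustrated with a minimal Python sketch. It assumes two objectives that are both minimized, for instance execution time and code size (the slide text does not name the axes, so this pairing is an assumption). The sample points are synthetic and are not measurements from the talk.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p when q is no worse in both objectives
    and strictly better in at least one (both are minimized).
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Synthetic (execution time, code size) pairs, for illustration only.
samples = [(10.3, 130), (9.8, 150), (12.0, 120), (10.0, 160), (9.8, 170)]
front = pareto_front(samples)
print(front)
```

The quadratic scan is enough for a few hundred flag combinations; a sort-based sweep would be the natural replacement for larger explorations.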
  10. How do realistic programs behave?
      Continuously tuning 285 shared code and dataset combinations from 8 benchmarks including NAS, MiBench, SPEC2000, SPEC2006, Powerstone, UTDSP and SNU-RT using GRID 5000; Intel E5520, 2.6MHz; GCC 4.6.3; at least 5000 random combinations of flags
      Compilers are tested with a limited set of (possibly non-representative) benchmarks
      Continuously tuning (crowd-tuning) shared benchmarks and datasets using GRID5000, mobile phones, tablets, laptops, and other spare resources
      Collective Mind Node (Android Apps on Google Play): https://play.google.com/store/apps/details?id=com.collective_mind.node
  11. Back to basics: cost of computation
      Analysis of computation cost of my neural network kernel in the past 10 years
      Processors:
      P1) Intel Core i5-2540M, 2.60GHz, 2 cores
      P2) Qualcomm MSM7625A FFA, ARM Cortex A5, 1 GHz, 1 core
      P3) Allwinner A20 (sun7i), ARM Cortex A7, 1.6GHz, Mali400 GPU, 2 core
      P4) NVidia Quadro NVS 135M, 400MHz, 16 cores
      Data sets:
      D1) grayscale image 1, size=1536x1536
      D2) grayscale image 2, size=1536x1536
      Operating systems:
      O1) Windows 7 Pro SP1, cost~170 euros
      O2) O1 with MinGW32
      O3) OpenSuse 12.1, Kernel 3.1.10
      O4) Android 4.1.2, Kernel 3.4.0
      O5) Android 4.2.2, Kernel 3.3.0
      Processor modes:
      W1) 32 bit processor mode
      W2) 64 bit processor mode
      T values:
      T1) 7.2E10
      T2) 9.6E9
      T3) 2.4E9
      T4) 1.0E9
      Systems:
      S1) Dell Laptop Latitude E6320, Mem=8Gb, 52W, 1200 euro
      S2) Samsung Mobile GT-S6312, Mem=0.8Gb, 5W, 200 euros
      S3) Polaroid Tablet MID0927, Mem=1Gb, 13W, 100 euros
      S4) Semiconductor neural network, 1.5 years development
      Compilers:
      X1) GCC 4.1.1, opt.flags~190, release date=2006
      X2) GCC 4.4.1, opt.flags~270, release date=2009
      X3) GCC 4.4.4, opt.flags~270, release date=2010
      X4) GCC 4.6.3, opt.flags~320, release date=2012
      X5) GCC 4.7.2, opt.flags~340, release date=2012
      X6) GCC 4.8.3, opt.flags~350, release date=2014
      X7) GCC 4.9.1, opt.flags~357, release date=2014
      X8) LLVM 3.1, release date=2012
      X9) LLVM 3.4.2, release date=2014
      X10) Open64 5.0, release date=2011
      X11) PathScale 2.3.1, release date=2006
      X12) NVidia CUDA Toolkit 5.0, release date=2012
      X13) Intel Composer XE 2011, cost = ~800euro
      X14) Microsoft Visual Studio 2013
      Optimizations:
      Y1) Performance (usually -O3)
      Y2) Size (usually -Os)
      Y3) -O3 -fmodulo-sched -funroll-all-loops
      Y4) -O3 -funroll-all-loops
      Y5) -O3 -fprefetch-loop-arrays
      Y6) -O3 -fno-if-conversion
      Y7) Auto-tuning with more than 6 flags (-fif-conversion)
      Y8) Auto-tuning with more than 6 flags (-fno-if-conversion)
  12. Back to basics: cost of computation
      [Plot with points labeled 1-11, A-F, I, II and $; annotations: “Available resource: P1, one core”, “Available resource: P1, two cores”]
      1) P1 O3 W2 X1 Y1 T2 D1
      2) P1 O3 W2 X7 Y1 T2 D1
      3) P1 O3 W2 X1 Y7 T2 D1
      4) P1 O3 W2 X7 Y5 T2 D1
      5) P1 O3 W2 X11 Y1 T2 D1
      6) P1 O3 W2 X9 Y1 T2 D1
      7) P1 O3 W2 X3 Y7 T2 D1
      8) P1 O3 W2 X4 Y8 T2 D2
      9) P1 O1 W1 X14 Y1 T3 D1
      10) P1 O1 W1 X13 Y1 T2 D1
      11) P1 O3 W2 X7 Y8 T2 D2
      A) P3 O5 W1 X1 Y1 T4 D1
      B) P3 O5 W1 X4 Y1 T4 D1
      C) P3 O5 W1 X4 Y7 T4 D1
      D) P3 O5 W1 X6 Y1 T4 D1
      E) P3 O5 W1 X6 Y7 T4 D1
      F) P3 O5 W1 X9 Y1 T4 D1
      I) P2 O4 W1 X1 Y1 T4 D1
      II) P2 O4 W1 X6 Y1 T4 D1
      $) P4 O3 W1 X12 Y1 T1 D1
      Can plot similar graphs with consumed energy, price, frequency, faults or anything else depending on user needs
  13. [Same plot and configurations 1-11, A-F, I, II and $ as on slide 12]
      Can plot similar graphs with consumed energy, price, frequency, faults depending on user needs
      • Most of the time underperforming systems!
      • Waste of expensive resources and time!
      • Getting worse, not better!
      What should we do?
  14. Why not use machine learning to predict optimizations?
      [Diagram labels: User task, Result, Requirements ( r ), Properties ( p ), System/task state ( s ), Behavior / characteristics ( b )]
      • Consider tasks and computational resources as a complex physical system
      • Gradually expose all available algorithm, design and optimization choices
      • Expose additional information
      • Continuously observe behavior (characteristics); check for normality
      • Continuously learning (modeling) observed behavior
      • Predict optimal choices / behavior if enough knowledge
      • If unexpected behavior, continuously improve models (active learning), increase granularity, find more properties
      Combine interdisciplinary knowledge in physics, electronics, mathematics, neural networks and machine learning
  15. Combine autotuning with machine learning and crowdsourcing
      cTuning.org: plugin-based auto-tuning framework and public repository
      Training (Program or kernel 1 … Program or kernel N), plugin-based MILEPOST GCC:
      • Plugins: monitor and explore optimization space
      • Plugins: extract semantic program features
      • Collect dynamic features
      • Cluster
      • Build predictive model
      Prediction (unseen program), MILEPOST GCC:
      • Plugins: extract semantic program features
      • Collect hardware counters
      • Predict optimization to minimize execution time, power consumption, code size, etc.
      References:
      • G. Fursin et al. MILEPOST GCC: Machine learning based self-tuning compiler. 2008, 2011
      • G. Fursin and O. Temam. Collective optimization: A practical collaborative approach. 2010
      • G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
      • F. Agakov et al. Using Machine Learning to Focus Iterative Optimization. 2006
  16. Combine autotuning with machine learning and crowdsourcing
      [Same MILEPOST GCC / cTuning.org training and prediction diagram as on slide 15]
      In 2009, we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys)
      Now becomes a hot topic - everything is solved?
      References:
      • G. Fursin et al. MILEPOST GCC: Machine learning based self-tuning compiler. 2008, 2011
      • G. Fursin and O. Temam. Collective optimization: A practical collaborative approach. 2010
      • G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
      • F. Agakov et al. Using Machine Learning to Focus Iterative Optimization. 2006
  17. Technological chaos
      [Word cloud as on the earlier “technological chaos” slide, with additions: MVS 2013, CUDA 4.x, CUDA 5.x, predictive scheduling, KNN, SVM, genetic algorithms, ARM v6, ARM v8, Intel SandyBridge, SSE4, AVX, GCC 4.8.x, LLVM 3.4, SimpleScalar, algorithm precision]
      We also experienced a few more problems:
      • Everything changes all the time
      • Difficult to reproduce results collected from multiple users (including variability of performance data and constant changes in the system)
      • Difficult to expose choices, observe behavior and extract features (tools are not prepared for auto-tuning and machine learning)
      • Difficult to share experimental setups (many SW/HW dependencies) including code, data and their features
      • Difficult to save heterogeneous and continuously changing data in MySQL
      It’s not about machine learning – it’s about effective data and knowledge management
  18. Collective Knowledge Project: how to keep track of all past R&D?
      [Background: word cloud of tools and techniques as on the previous slide]
      Gradually cleaning up the mess
      • Group: programs (image corner detection, matmul CUDA, compression, neural network OpenCL)
        Have some common functions: compile, run, etc …
        Have some common meta: which datasets can use, how to compile, CMD, …
      • Group: data sets (image-jpeg-0001, bzip2-0006, txt-0012, video-raw-1280x1024)
        Have some common functions: find, extract features
        Have some (common) meta: filename, size, width, height, colors, …
      • Group: packages (GCC 5.0.1 bin, GCC 5.0.1 source, LLVM 3.6, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7)
        Have some common functions: install, check dependencies
        Have some (common) meta: dependencies, installation scripts, …
  19. Saviour – python, extensible json, and local repository
      Gradually cleaning up the mess
      • Python wrapper: program. Functions: compile, run. Entries (each with meta.json): image corner detection, matmul CUDA, compression, neural network OpenCL
      • Python wrapper: dataset. Functions: extract_features. Entries (each with meta.json): image-jpeg-0001, bzip2-0006, txt-0012, video-raw-1280x1024
      • Python wrapper: package. Functions: install. Entries (each with meta.json): GCC 5.0.1 bin, GCC 5.0.1 source, LLVM 3.6, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7
      Wrappers and entries are identified by UID or alias (UOA)
  20. Saviour – python, extensible json, and local repository
      • module: compiler, package, dataset, …
      • compiler: GCC 4.4.4, GCC 4.9.2, LLVM 3.1, LLVM 3.4, …
      • package: GCC 5.0.1 bin, GCC 5.0.1 source, LLVM 3.6, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7, …
      • dataset: image-jpeg-0001, bzip2-0006, video-raw-1280x1024, …
      [Diagram labels: CK module; JSON meta-description; Files, directories; Compiler flags; Installation info; Features; Actions; Dependencies between data and modules]
      cM repository directory structure: .cmr / module UOA / data UOA (UID or alias) / .cm / data.json
  21. Saviour – python, extensible json, and local repository
      [Same module/entry diagram and cM repository directory structure as on slide 20: .cmr / module UOA / data UOA (UID or alias) / .cm / data.json]
      Data can always be found via CID (similar to DOI): (repo UOA) : module UOA : data UOA
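The CID scheme and directory layout above can be sketched in a few lines of Python. This is an illustration only, not CK's own resolver: the function names `parse_cid` and `meta_path` and the dictionary keys `repo_uoa` and `data_uoa` are hypothetical, and the path simply follows the "module UOA / data UOA / .cm / data.json" layout quoted on the slide.

```python
import os


def parse_cid(cid):
    """Split '(repo UOA):module UOA:data UOA' into its parts.

    The repository part is optional, as the parentheses on the slide suggest.
    """
    parts = [p.strip() for p in cid.split(":")]
    if len(parts) == 2:
        repo = None
        module, data = parts
    elif len(parts) == 3:
        repo, module, data = parts
    else:
        raise ValueError("expected 'module:data' or 'repo:module:data'")
    if not module or not data:
        raise ValueError("module and data parts must not be empty")
    # Key names below are hypothetical, chosen for this sketch.
    return {"repo_uoa": repo, "module_uoa": module, "data_uoa": data}


def meta_path(repo_root, cid):
    """Build the path of an entry's JSON meta file under a repository root."""
    c = parse_cid(cid)
    return os.path.join(repo_root, c["module_uoa"], c["data_uoa"], ".cm", "data.json")


print(parse_cid("dataset:image-jpeg-0001"))
print(meta_path(".cmr", "dataset:image-jpeg-0001"))
```

Keeping the identifier purely textual is what lets the same CID be used on the command line, in Python and in a URL.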
  22. Connecting it all together – CK python-based interface
      http://github.com/ctuning/ck
      Tiny CK infrastructure (~200Kb), permissive BSD-license (preparing package for DEBIAN and PIP)
      Perform some action on an entry from CMD:
          ck [action] [module_uoa] [CID] @input.json
      For example:
          ck list program
          ck find dataset:image-jpeg-0001
          ck compile program:slambench-1.1-opencl
      From python/ipython notebook:
          import ck.kernel as ck
          r = ck.access({'action': 'compile', 'cid': 'slambench-1.1-opencl'})
          if r['return'] > 0:
              print(r['error'])
              exit(1)
      As a web-service with simple JSON-based API:
          ck start web
          firefox http://localhost:3344/?action=load&cid=dataset:image-jpeg-0001
      (returns meta in JSON or HTML)
      Can perform P2P information exchange
      Input to all functions: schema-free dict/JSON, extended when needed and abstracted by module. Fixed keys: action, module_uoa, CID
      Output from all functions: dict/JSON. Fixed keys: return, error
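The dict-in, dict-out convention with the fixed keys `return` and `error` can be mimicked in plain Python. The sketch below is a stand-in written for illustration and does not reproduce CK's implementation: `toy_access`, `compile_program`, the `ACTIONS` table and the `compiled` key are all hypothetical names.

```python
def compile_program(i):
    """Hypothetical action: validates its input and reports success."""
    cid = i.get("cid")
    if not cid:
        return {"return": 1, "error": "'cid' is not defined"}
    # A real wrapper would invoke a compiler here; this one only echoes.
    return {"return": 0, "cid": cid, "compiled": True}


# Hypothetical action table: action name -> function taking and returning a dict.
ACTIONS = {"compile": compile_program}


def toy_access(i):
    """Dispatch on the fixed 'action' key; never raise, always return a dict."""
    action = i.get("action")
    func = ACTIONS.get(action)
    if func is None:
        return {"return": 1, "error": "unknown action: %s" % action}
    return func(i)


ok = toy_access({"action": "compile", "cid": "program:slambench-1.1-opencl"})
bad = toy_access({"action": "fly"})
for r in (ok, bad):
    if r["return"] > 0:
        print("error:", r["error"])
    else:
        print("done:", r["cid"])
```

Returning an error code in the dictionary instead of raising keeps the same contract usable from the command line, Python and a JSON web service.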
  23. Collective Knowledge concept
      Hardwired experimental setups, very difficult to change, scale or share:
      • Tool A (V1, V2, … VN)
      • Tool B (V1, V2, … VM)
      • Ad-hoc tuning scripts
      • Ad-hoc analysis and learning scripts
      • Collection of CSV, XLS, TXT and other files
      • Experiments
      Meta description that should be exposed in the information flow for auto-tuning and machine learning: Behavior, Choices, Features, State
  24. Collective Knowledge concept
      CK module (wrapper) with unified and formalized input and output
      • Original unmodified ad-hoc input
      • Process CMD
      • Tool B Vi
      • Generated files
      Exposed meta description: Behavior, Choices, Features, State
      [Background: Tool A (V1 … VN), Tool B (V1 … VM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, collection of CSV, XLS, TXT and other files, experiments]
  25. Collective Knowledge concept
      CK module (wrapper) with unified and formalized input and output
      • Unified JSON input (meta-data); unified JSON input (if exists); original unmodified ad-hoc input
      • Action → action function → process CMD → Tool B Vi → generated files
      • Parse and unify output
      • Unified JSON output (meta-data)
      Formalized function (model) of a component behavior: b = B( c , f , s ), with behavior b, choices c, features f, state s
      Flattened JSON vectors (either string categories or integer/float values)
      [Background: Tool A (V1 … VN), Tool B (V1 … VM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, collection of CSV, XLS, TXT and other files, experiments]
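To model b = B( c , f , s ), flattened vectors that mix string categories with integer/float values have to become purely numeric rows. The following minimal sketch does this with one-hot encoding in plain Python; it is an assumption about one reasonable encoding, not the method used in CK. The record keys are modeled on the flattened key format shown on a later slide, and the values are synthetic. scikit-learn's `DictVectorizer` offers a comparable encoding for string-valued features.

```python
def vectorize(records):
    """Turn flat dicts (string categories or numbers) into numeric rows.

    String values get one column per (key, category) pair (one-hot);
    numeric values get a single column per key.
    """
    columns = set()
    for rec in records:
        for key, value in rec.items():
            columns.add((key, value) if isinstance(value, str) else (key, None))
    # Stable, readable column order: by key, then by category.
    columns = sorted(columns, key=lambda c: (c[0], "" if c[1] is None else c[1]))

    rows = []
    for rec in records:
        row = []
        for key, category in columns:
            value = rec.get(key, 0.0)
            if category is None:
                # Numeric column; missing or mistyped values fall back to 0.0.
                row.append(0.0 if isinstance(value, str) else float(value))
            else:
                row.append(1.0 if value == category else 0.0)
        rows.append(row)
    return columns, rows


# Synthetic records, for illustration only.
records = [
    {"##choices#compiler": "gcc", "##state#frequency": 2.27},
    {"##choices#compiler": "llvm", "##state#frequency": 2.27},
]
columns, rows = vectorize(records)
print(columns)
print(rows)
```

Rows in this form can be passed to any regressor or classifier that expects a numeric feature matrix.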
  26. Collective Knowledge concept
      CK module (wrapper) with unified and formalized input and output
      • Unified JSON input (meta-data); unified JSON input (if exists); original unmodified ad-hoc input
      • Action → action function → set environment for a given tool version → process CMD → Tool B Vi → generated files
      • Parse and unify output
      • Unified JSON output (meta-data)
      Formalized function (model) of a component behavior: b = B( c , f , s )
      Flattened JSON vectors (either string categories or integer/float values)
      Multiple tool versions can co-exist, while their interface is abstracted by CK module
      [Background: Tool A (V1 … VN), Tool B (V1 … VM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, collection of CSV, XLS, TXT and other files, experiments]
  27. Collective Knowledge concept
      CK module (wrapper) with unified and formalized input and output
      • Unified JSON input (meta-data); unified JSON input (if exists); original unmodified ad-hoc input
      • Action → action function → set environment for a given tool version → process CMD → Tool B Vi → generated files
      • Parse and unify output
      • Unified JSON output (meta-data)
      Formalized function (model) of a component behavior: b = B( c , f , s )
      Flattened JSON vectors (either string categories or integer/float values)
      Example commands:
          ck run pipeline:program --speed --energy --dataset_uoa=image_1024_768 --record --record_uoa=test123
          ck add experiment:test123
          ck replay experiment:test123
      Runs on any HW, SW and OS (Android, Linux, Windows, MacOS …)
      [Background: Tool A (V1 … VN), Tool B (V1 … VM), ad-hoc tuning scripts, ad-hoc analysis and learning scripts, collection of CSV, XLS, TXT and other files, experiments]
  28. Assembling, preserving, sharing and extending experimental pipeline as “LEGO”
      [Same CK module (wrapper) diagram as on the previous slides: unified JSON input and output, action function, environment setup for a given tool version, b = B( c , f , s ), flattened JSON vectors]
      Chaining CK components (wrappers) to an experimental pipeline for a given research and experimentation scenario:
      • Choose exploration strategy
      • Generate choices (code sample, data set, compiler, flags, architecture …)
      • Compile source code
      • Run code
      • Test behavior normality
      • Pareto filter
      • Modeling and prediction
      • Complexity reduction
      Public modular auto-tuning and machine learning repository and buildbot
      Unified web services
      Interdisciplinary crowd
      Shared scenarios from past research …
  29. Gradually adding specification (agile development)
      cTuning experiment module, data.json:
          {
            "characteristics": {
              "execution times": ["10.3", "10.1", "13.3"],
              "code size": "131938", ...},
            "choices": {
              "os": "linux", "os version": "2.6.32-5-amd64",
              "compiler": "gcc", "compiler version": "4.6.3",
              "compiler_flags": "-O3 -fno-if-conversion",
              "platform": {"processor": "intel xeon e5520", "l2": "8192", ...}, ...},
            "features": {
              "semantic features": {"number_of_bb": "24", ...},
              "hardware counters": {"cpi": "1.4" ...}, ...},
            "state": {
              "frequency": "2.27", ...}
          }
      cM flattened JSON key: ##characteristics#execution_times@1
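The flattened key `##characteristics#execution_times@1` suggests a simple recursive scheme: `#` before each dictionary key and `@` before each list index, starting from a root prefix `#`. The sketch below implements that reading; the rule is inferred from this single example, so treat it as an assumption, and `flatten` is a name chosen for this illustration. Keys are written with underscores here because the flattened key on the slide uses them, while the JSON on the slide shows spaces.

```python
def flatten(obj, prefix="#"):
    """Flatten nested dicts/lists into {flattened_key: scalar}.

    '#' introduces a dict key and '@' a list index, so that
    {'a': {'b': [x, y]}} yields '##a#b@0' and '##a#b@1'.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + "#" + key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, prefix + "@" + str(index)))
    else:
        flat[prefix] = obj
    return flat


# Reduced version of the slide's data.json.
entry = {
    "characteristics": {
        "execution_times": ["10.3", "10.1", "13.3"],
        "code_size": "131938",
    },
    "choices": {"compiler": "gcc"},
}
flat = flatten(entry)
for key in sorted(flat):
    print(key, "=", flat[key])
```

A flat key space like this gives every measured or chosen value a stable name that can be used as a column in tables and models.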
  30. Gradually adding specification (agile development). cTuning experiment module, data.json: { "characteristics": { "execution times": ["10.3", "10.1", "13.3"], "code size": "131938", ... }, "choices": { "os": "linux", "os version": "2.6.32-5-amd64", "compiler": "gcc", "compiler version": "4.6.3", "compiler_flags": "-O3 -fno-if-conversion", "platform": { "processor": "intel xeon e5520", "l2": "8192", ... }, ... }, "features": { "semantic features": { "number_of_bb": "24", ... }, "hardware counters": { "cpi": "1.4", ... }, ... }, "state": { "frequency": "2.27", ... } } cM flattened JSON key: ##characteristics#execution_times@1. Specification of a flattened key: "flattened_json_key": { "type": "text" | "integer" | "float" | "dict" | "list" | "uid", "characteristic": "yes" | "no", "feature": "yes" | "no", "state": "yes" | "no", "has_choice": "yes" | "no", "choices": [ list of strings if categorical choice ], "explore_start": "start number if numerical range", "explore_stop": "stop number if numerical range", "explore_step": "step if numerical range", "can_be_omitted": "yes" | "no", ... }
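The flattened key shown on this slide (##characteristics#execution_times@1) suggests a naming scheme in which dictionary keys are joined with "#" and list indices with "@". The sketch below is inferred from that single example rather than taken from the CK sources, so the `flatten` helper and its handling of edge cases are assumptions. The sample keys use underscores to match the flattened key on the slide.

```python
def flatten(obj, prefix="#"):
    """Flatten nested dicts and lists into a single-level dict of scalar values."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Dictionary keys are appended with a '#' separator.
            flat.update(flatten(value, f"{prefix}#{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            # List positions are appended with an '@' separator (0-based).
            flat.update(flatten(value, f"{prefix}@{index}"))
    else:
        flat[prefix] = obj
    return flat


data = {
    "characteristics": {
        "execution_times": ["10.3", "10.1", "13.3"],
        "code_size": "131938",
    },
    "state": {"frequency": "2.27"},
}

flat = flatten(data)
for key in sorted(flat):
    print(key, "=", flat[key])
```

Each flattened key then addresses exactly one scalar, which is what makes a per-key specification (type, feature or characteristic, exploration range) possible.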
  31. Organizing local and shared (public or private) repos. User A, CK repos: ck-analytics, ck-env, ctuning-datasets-min, ck-autotuning, ctuning-programs, private-experiments. User B, CK repos: private-experiments, ctuning-programs, ctuning-datasets-min, ck-analytics, ck-env, ck-autotuning. Private GIT; semi-private GIT; public GIT (github; bitbucket). See our CK repositories at github.com/ctuning. Can be used in companies via private repos, while supporting a common experimental methodology (reporting performance issues, sharing code samples and data sets). Enabling open collaboration and code/data sharing as reusable components with CK wrappers. Get a new repo simply with: ck pull repo:ck-analytics. CK web interface (with JSON API): see cknowledge.org/repo. All interconnected and reusable artifacts (code & data), experiments, interactive graphs, predictive models, papers, reports, … Tiny CK core (~100Kb, Python + JSON API): github.com/ctuning/ck. Optional JSON-based ElasticSearch indexing.
  32. Apply top-down experimental methodology similar to physics: gradually expose some characteristics and gradually expose some choices at each level (given here as level: characteristics | choices). Algorithm selection: (time) productivity, variable-accuracy, complexity … | language, MPI, OpenMP, TBB, MapReduce … Compile program: time … | compiler flags; pragmas … Code analysis & transformations: time; memory usage; code size … | transformation ordering; polyhedral transformations; transformation parameters; instruction ordering … Granularity: process, thread, function, codelet, loop, instruction. Run code, run-time environment: time; power consumption … | pinning/scheduling … System: cost; size … | CPU/GPU; frequency; memory hierarchy … Data set: size; values; description … | precision … Run-time analysis: time; precision … | hardware counters; power meters … Run-time state: processor state; cache state … | helper threads; hardware counters … Analyze profile: time; size … | instrumentation; profiling … Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI.
  33. Assemble pipelines from shared components • Init pipeline • Detected system information • Initialize parameters • Prepare dataset • Clean program • Prepare compiler flags • Use compiler profiling • Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning • Use universal Alchemist plugin (with any OpenME-compatible compiler or tool) • Use Alchemist plugin (currently for GCC) • Compile program • Get objdump and md5sum (if supported) • Use OpenME for fine-grain program analysis and online tuning (build & run) • Use 'Intel VTune Amplifier' to collect hardware counters • Use 'perf' to collect hardware counters • Set frequency (in Unix, if supported) • Get system state before execution • Run program • Check output for correctness (use dataset UID to save different outputs) • Finish OpenME • Misc info • Observed characteristics • Observed statistical characteristics • Finalize pipeline. We can easily assemble, extend and customize research, design and experimentation pipelines for company needs! We gradually unify and clean up ad-hoc setups! http://cknowledge.org/repo • Hundreds of benchmarks/kernels/codelets (CPU, OpenMP, OpenCL, CUDA) • Thousands of data sets • Description of major compilers: GCC 4.x, GCC 5.x, LLVM 3.x, ICC 12.x
  34. Adaptive workload scheduling combined with active learning. Our user had a real-time, machine-learning-based image processing application running on mobile devices with GPUs: should it always be offloaded to the GPU? Application: OpenCL-based real-time video stream processing for mobile devices. Experiments: 276 builds/runs with random features. Original features (properties): V1=GWS0, V2=GWS1, V3=GWS2, V4=cpu_freq, V5=gpu_freq, V6=block size, V7=image cols, V8=image rows. Designed features: V9=image size, V10=size_div_by_cpu_freq, V11=size_div_by_gpu_freq, V12=cpu_freq_div_by_gpu, V13=size_div_by_cpu_div_by_gpu_freq, V14=image_size_div_by_cpu_freq. Characteristics: CPU execution time; GPU ONLY execution time; GPU + MEM COPY execution time. Devices: Chromebook 1: 4x Mali-T60x / 2x A15; Chromebook 2: 4x Mali-T62x / 4x A15. Objective (divide execution times): CPU/GPU COPY > 1.07 (true/false)? (useful for adaptive scheduling). Commands: ck build model.sklearn; ck validate module.sklearn (operates with ‘features’ and ‘characteristics’ keys in JSON). EU FP7 TETRACOM project: cTuning and ARM
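As a rough illustration of the designed features and the true/false objective on this slide, the sketch below derives three of the designed features from the original ones and computes the label from two measured times. The exact formulas behind V9, V12 and V14 are assumptions read off the feature names (for example, image size as cols × rows), the helper names are hypothetical, and the sample values are invented for illustration rather than taken from the 276 experiments.

```python
def design_features(v):
    """Add derived features to a dict of original features (formulas are assumptions)."""
    image_size = v["image_cols"] * v["image_rows"]  # assumed meaning of V9
    return {
        **v,
        "image_size": image_size,
        "cpu_freq_div_by_gpu": v["cpu_freq"] / v["gpu_freq"],      # assumed V12
        "image_size_div_by_cpu_freq": image_size / v["cpu_freq"],  # assumed V14
    }


def should_offload(cpu_time, gpu_copy_time, threshold=1.07):
    """Objective from the slide: is CPU time / (GPU + memory copy time) above 1.07?"""
    return cpu_time / gpu_copy_time > threshold


# Made-up sample point, for illustration only.
sample = {"image_cols": 640, "image_rows": 480, "cpu_freq": 1600, "gpu_freq": 400}
features = design_features(sample)
label = should_offload(cpu_time=2.0, gpu_copy_time=1.0)
print(features, label)
```

Feature dictionaries and labels of this shape could then be passed to any classifier; the slides report using a scikit-learn decision tree for that step.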
  35. Adaptive workload scheduling combined with active learning. Samsung Chromebook 1: decision tree built automatically with scikit-learn when more data is available. Not a black box: it gives engineers hints about where to focus their attention. Can drive further exploration of areas with “unusual” behavior. 96% prediction rate. EU FP7 TETRACOM project: cTuning and ARM
  36. Adaptive workload scheduling combined with active learning. Samsung Chromebook 2, using the old model: 74% prediction rate. EU FP7 TETRACOM project: cTuning and ARM
  37. Adaptive workload scheduling combined with active learning. Samsung Chromebook 2, more data, new model: 96% prediction rate. Adaptive scheduling gives ~32% performance improvement in comparison with always using the GPU. Results shared with the community for reproducibility: cknowledge.org/repo/web.php?wcid=bc0409fb61f0aa82:fd54cd4b3b73b72b cknowledge.org/repo/web.php?wcid=bc0409fb61f0aa82:3bfd697a48fbba16
  38. Reproducibility of experimental results as a side effect. Reproducibility comes as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis for characteristics • Community can add missing features or improve machine learning models. Variation of experimental results: 10.5 ± 6.5 secs.
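The statistical analysis of characteristics mentioned on this slide can be sketched with the Python standard library. The `summarize` helper is hypothetical, and the input reuses the three execution times from the data.json example on slide 30, not the measurements behind the 10.5 ± 6.5 secs figure.

```python
import statistics


def summarize(times):
    """Summarize repeated measurements of one characteristic (values may be strings)."""
    values = [float(t) for t in times]
    mean = statistics.mean(values)
    # Sample standard deviation needs at least two measurements.
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "mean": mean,
        "stdev": stdev,
        "min": min(values),
        "max": max(values),
        # Large relative variation hints at a missing feature or unstable setup.
        "relative_variation": stdev / mean,
    }


summary = summarize(["10.3", "10.1", "13.3"])
print(f"{summary['mean']:.2f} ± {summary['stdev']:.2f} secs")
```

A high relative variation is the kind of signal that would prompt exposing an experiment to the community to find the missing feature.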
  39. Reproducibility of experimental results as a side effect. [Figure: distribution of execution time (sec.)] Unexpected behavior: expose it to the community, including experts, to explain it, find the missing feature and add it to the system. Reproducibility comes as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis for characteristics • Community can add missing features or improve machine learning models
  40. Reproducibility of experimental results as a side effect. [Figure: distribution of execution time (sec.), Class A and Class B; CPU frequency: 800MHz, 2400MHz] Unexpected behavior: expose it to the community, including experts, to explain it, find the missing feature and add it to the system. Reproducibility comes as a side effect! • Can preserve the whole experimental setup with all data and software dependencies • Can perform statistical analysis for characteristics • Community can add missing features or improve machine learning models
  41. Grigori Fursin “Systematizing tuning of computer systems using crowdsourcing and statistics”, HPSC 2013, NTU, Taiwan, March 2013. Making computer engineering a data science. Current state of computer engineering: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.1, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, likwid, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration, cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability. Task, result: quick, non-reproducible hack? Ad-hoc heuristic? Quick publication? Waste of expensive resources and energy? cTuning.org collaborative approach: classification, predictive modeling; optimal solutions; systematization and unification of collective knowledge (big data); “crowd”; collaborative infrastructure and repository for continuous online learning. Continuous systematization and unification of design and optimization of computer systems. Extrapolate collective knowledge to build faster and more power-efficient self-tuning computer systems to boost innovation in science and technology!
  42. Current status • Preparing Collective Knowledge for release (~August 2015). Beta already available (BSD license): http://github.com/ctuning/ck • Testing pilot live repository for code, data and interactive report sharing: http://cknowledge.org/repo • Crowdsourcing experiments using spare mobile phones or cloud services: https://play.google.com/store/apps/details?id=com.collective_mind.node • Preparing documentation and interactive demos (will take some time) • Developing common methodology with ACM on code/data sharing along with publications, and validation of experimental results (Artifact Evaluation at major compiler/architecture conferences including CGO / PPoPP) • Raising more funds to continue this R&D
  43. Our approach opened up many interesting R&D opportunities. Thank you for your attention! Follow us: @c_tuning @grigori_fursin http://github.com/ctuning/ck http://cknowledge.org/repo Get in touch: Grigori.Fursin@cTuning.org / anton@dividiti.com Recent publications about the CK concept and community activities: • “Collective Mind: Towards practical and collaborative autotuning”, Journal of Scientific Programming 22 (4), 2014, http://hal.inria.fr/hal-01054763 • “Collective Mind, Part II: Towards Performance- and Cost-Aware Software Engineering as a Natural Science”, CPC 2015, London, UK, http://arxiv.org/abs/1506.06256 • “Community-driven reviewing and validation of publications”, TRUST’14@PLDI’14, Edinburgh, UK, http://arxiv.org/abs/1406.4020
