Data Analytics, Machine Learning, and HPC in Today’s Changing Application Environment

This session explains what solutions desired by IT/Internet/Silicon Valley companies can look like, how they may differ from those of the more "classical" consumers of machine learning and analytics, and the challenges that current and future HPC development may have to cope with.



  1. Intel HPC Developer Convention Salt Lake City 2016, Machine Learning Track. Franz J. Király: Data Analytics, Machine Learning and HPC in today's changing application environment.
  2. An overview of data analytics. [Diagram linking: DATA; exploration; scientific questions; statistical questions; methods; quantitative modelling (predictive/inferential vs. descriptive/explanatory); statistical programming (R, Python); the scientific method; scientific and statistical validation; knowledge (practical).]
  3. Data analytics and data science in a broader context. [Diagram: raw data -> clean data -> data analytics (data mining, machine learning, statistics, modelling) -> relevant findings and knowledge.] A lot of problems and subtleties arise already at the raw-to-clean-data stages; often, most of the manpower in a „data" project needs to go there first, before one can attempt reliable modelling. For the findings to become knowledge, the underlying arguments need to be explained well and properly.
  4. Big Data?
  5. What „Big Data" may mean in practice. [Chart: number of data samples / number of features vs. strategies that stop working in reasonable time, with matching solution strategies.] Roughly: manual exploratory data analysis stops working at around 1.000 samples / 100 features; super-linear algorithms such as kernel methods and OLS at around 10.000 samples; random forests and L1/LASSO at around the same order, 10.000.000 samples; even reading in all the data at around 10.000.000.000 samples. Solution strategies: sub-sampling, feature extraction and feature selection, large-scale strategies for super-linear algorithms, linear algorithms, on-line models, and distributed computing.
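
As an illustration of the sub-sampling strategy (a sketch of mine, not part of the slides), assuming scikit-learn and synthetic data with made-up sizes: a super-linear kernel method is fitted on a random sub-sample when the full dataset is too large for it.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
N, d = 1_000_000, 10                      # too many samples for an O(N^2)-O(N^3) kernel method
X = rng.normal(size=(N, d))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=N)

n_sub = 5_000                             # sub-sample size at which kernel methods are still feasible
idx = rng.choice(N, size=n_sub, replace=False)

model = KernelRidge(kernel="rbf", alpha=1.0)
model.fit(X[idx], y[idx])                 # fit on the random sub-sample only
print(model.predict(X[:5]))               # prediction on new data stays cheap
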
  6. Large-scale motifs in data science = where high-performance computing is helpful/impactful.
„Big models" = the „classic", beloved by everyone: not necessarily a lot of data, but computationally intensive models. Classical example: finite elements and other numerical models; new fancy example: large neural networks aka „deep learning". Common HPC motif: divide/conquer in parts-of-model, e.g. neurons/nodes.
„Big data" = what it says, a lot of data (ca 1 million samples or more); the computational challenge arises from processing all of the data. Example: histogram or linear regression with huge amounts of data. Common HPC motif: divide/conquer the training/fitting of the model, e.g. batchwise/epoch fitting (see the sketch below).
Model validation and model selection = this talk's focus. Answers the question: which model is best for your data? Demanding even for simple models and small amounts of data! Example: is deep learning better than logistic regression, or guessing?
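
A minimal sketch of the batchwise-fitting motif, assuming scikit-learn's SGDRegressor and synthetic streamed batches; the sizes and data are made up, not from the talk.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
true_coef = rng.normal(size=20)
model = SGDRegressor()                      # linear model fitted by stochastic gradient descent

# "Big data" motif: divide/conquer the fitting itself, batch by batch.
for _ in range(100):                        # e.g. 100 batches of 10.000 samples streamed from disk
    X_batch = rng.normal(size=(10_000, 20))
    y_batch = X_batch @ true_coef + rng.normal(scale=0.1, size=10_000)
    model.partial_fit(X_batch, y_batch)     # update the fit using only this batch

print(model.coef_[:5], true_coef[:5])       # learned coefficients approach the true ones
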
  7. Meta-modelling: stylized case studies.
Customer: Hospital specializing in the treatment of patients with a certain disease. Patients with this disease are at risk of experiencing an adverse event (e.g. death). Data set: complete clinical records of 1.000 patients, including the event if it occurred. Scientific question: depending on patient characteristics, predict the event risk.
Customer: Retailer who wants to accurately model the behaviour of customers. Customers can buy (or not buy) any of a number of products, or churn. Data set: complete customer and purchase records of 100.000 customers. Scientific question: predict future customer behaviour given past behaviour.
Customer: Manufacturer who wishes to find the best parameter settings for machines. Parameters influence the amount/quality of product (or whether the machine breaks). Data set: outcomes for 10.000 parameter settings on those machines. Scientific question: find the parameter settings which optimize the above.
Of interest: model interpretability; how accurate the predictions are expected to be; whether the algorithm/model is (easily) deployable in the „real world". Not of interest: which algorithm/strategy, out of many, exactly solves the task.
  8. Model validation and model selection: a scientific necessity implied by the scientific method and the following:
1. There is no model that is good for all data (otherwise the concept of a model would be unnecessary).
2. For given data, there is no a-priori reason to believe that a certain type of model will be the best one (any such belief is not empirically justified, hence pseudoscientific).
3. No model can be trusted unless its validity has been verified by a model-independent argument (otherwise the justification of validity is circular, hence faulty).
Machine learning = data-centric and data-dependent modelling; it provides algorithms & theory for meta-modelling, and powerful algorithms motivated by meta-modelling optimality.
  9. Machine Learning and Meta-Modelling in a Nutshell
  10. Leitmotifs of Machine Learning, from the intersection of engineering, statistics and computer science.
Engineering & statistics idea: statistical models are objects in their own right, i.e. „learning machines" / modelling strategies.
Engineering & computer science idea: any abstract algorithm, possibly non-explicit, can be a modelling strategy / learning machine.
Computer science & statistics idea: the future performance of an algorithm / learning machine can (and should) be estimated: „model validation", „model selection", „computational learning".
  11. Problem types in Machine Learning. Supervised Learning: some data is labelled by an expert/oracle; the task is to predict the label from the covariates. Statistical models are usually discriminative. Examples: regression, classification.
  12. Problem types in Machine Learning. Unsupervised Learning: the training data is not pre-labelled; the task is to find „structure" or „pattern" in the data. Statistical models are usually generative. Examples: clustering, dimension reduction.
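
A minimal illustration of the two problem types, assuming scikit-learn and a toy dataset of my own (not from the slides):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Supervised: labels are given; the task is to predict them for new data.
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
print(clf.predict(rng.normal(size=(3, 2))))

# Unsupervised: no labels; the task is to find structure (here: clusters).
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:10])
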
  13. Advanced learning tasks.
Complications in the labelling: semi-supervised learning (some training data are labelled, some are not); anomaly detection (all or most data are „positive examples", the task is to flag „test negatives"); reinforcement learning (data are not directly labelled, only indirect gain/loss).
Complications through correlated data and/or time: on-line learning (the data is revealed over time, models need to update); forecasting (each data point has a time stamp, predict the temporal future); transfer learning (the data comes in dissimilar batches, train and test may be distinct).
  14. What is a Learning Machine? ... an algorithm that solves, e.g., the previous tasks. Illustration: supervised learning machine. [Diagram: observations („training data") go into model fitting (“learning”); the fitted model produces predictions on new data, e.g. to base decisions on; model tuning parameters steer the fitting.] Examples: generalized linear model, linear regression, support vector machine, neural networks (= „deep learning"), random forests, gradient boosting, …
  15. Example: Linear Regression. [Same diagram: observations („training data"), model fitting (“learning”), fitted model, prediction on new data.] Tuning parameter: fit an intercept or not?
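
A concrete version of this fit/predict workflow, sketched with scikit-learn's LinearRegression on synthetic data (my example; the slide only names the tuning parameter):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))          # observations ("training data")
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 3.0

# Tuning parameter of this learning machine: fit an intercept or not?
model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)                  # model fitting ("learning")

X_new = rng.normal(size=(5, 3))              # new data
print(model.predict(X_new))                  # prediction
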
  16. Model validation: does the model make sense? [Diagram: „the truth" yields „training data" and „test data"; the learning machine / prediction strategy (e.g. regression, GLM, advanced methods) learns a model from the training data („in-sample"); the learnt model makes predictions for the held-out test data („out-of-sample", „hold-out"), which are compared with the „test labels" to quantify goodness, e.g. by evaluating the regression model.] The only (general) way to test goodness of prediction is actually observing prediction! Predictive models need to be validated on unseen data, which means the part of the data used for testing has not been seen by the algorithm before. (Note: this includes the case where the machine = linear regression, deep learning, etc.)
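
A minimal hold-out validation sketch, assuming scikit-learn and synthetic data (illustrative, not from the slides):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

# Hold out "test data" that the learning machine never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)        # in-sample
y_pred = model.predict(X_test)                          # out-of-sample prediction
print("hold-out RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
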
  17. „Re-sampling": the state-of-the-art principle in model validation, model comparison and meta-modelling. Multiple algorithms are compared on multiple data splits/sub-datasets. [Diagram: all data is re-sampled into training data 1, 2, 3 with corresponding test data; Predictors 1, 2, 3 are fitted on each training set; their errors 1, 2, 3 on each test set are aggregated for comparison.]
Types of re-sampling, how to obtain the training/test splits, and pros/cons:
k-fold cross-validation: 1. divide the data into k (almost) equal parts; 2. obtain k train/test splits, where each part is test data exactly once and the rest of the data is the training set. Often k=5: a good compromise between runtime and accuracy when k is small compared to the data size.
Leave-one-out: = [number of data points]-fold c.v. Very accurate, high run-time.
Repeated sub-sampling: parameters are the training/test size and the number of repetitions. 1. obtain a random sub-sample of training/test data of the specified sizes (train/test need not cover all the data); 2. repeat 1. the desired number of times. Can be combined with k-fold. Can be arbitrarily quick, and can be arbitrarily inaccurate (depending on the parameter choice).
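
A sketch of these re-sampling schemes with scikit-learn's splitter objects, on made-up data (my illustration, not from the slides):

import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

# k-fold cross-validation (k=5): each point is in a test fold exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(Ridge(), X, y, cv=kfold, scoring="r2"))

# Repeated sub-sampling: random train/test splits of a chosen size, repeated 10 times.
subsampling = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
print(cross_val_score(Ridge(), X, y, cv=subsampling, scoring="r2"))

# Leave-one-out would be KFold(n_splits=len(X)): very accurate, but high run-time.
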
  18. Quantitative model comparison: a „benchmarking experiment" results in a table like this (one row per compared model):

  model      RMSE          MAE
  model 1    15.3 ± 1.2    12.3 ± 1.1
  model 2     9.5 ± 0.9     7.3 ± 0.8
  model 3    13.6 ± 0.7    11.4 ± 0.9
  model 4    20.1 ± 1.4    18.1 ± 1.7

Which model is best? Confidence regions (or paired tests) are used to compare models to each other: A is better than B / B is better than A / A and B are equally good. An uninformed model (stupid model / random guess) needs to be included, otherwise the statement „is better than an uninformed guess" cannot be made. „Useful model" = (significantly) better than the uninformed baseline.
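
A minimal benchmarking sketch along these lines, assuming scikit-learn, a dummy (uninformed) baseline and synthetic data; the ± values here are crude standard errors over folds, not the paired tests mentioned above:

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.dummy import DummyRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=400)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "uninformed (mean)": DummyRegressor(strategy="mean"),
    "ridge regression": Ridge(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    rmse = np.sqrt(-neg_mse)                       # per-fold RMSE
    # crude "confidence region": mean +/- 2 standard errors over the folds
    print(f"{name:>20}: RMSE {rmse.mean():.2f} +/- {2 * rmse.std() / np.sqrt(len(rmse)):.2f}")
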
  19. Meta-model: automated parameter tuning. [Diagram: the re-sampled training data is split into inner training and test sets; Parameters 1, 2, 3, … are each evaluated for model goodness; the best parameters are then re-fitted on the whole training data.] Re-sampling is used to determine the [best parameter setting]. Important caveats: which measure of predictive goodness, and which inner re-sampling scheme? (Methods are usually less sensitive to these „new" tuning parameters.) For validation, new unseen data needs to be used: the „inner" training/test splits need to be part of any „outer" training set, otherwise validation is not out-of-sample! Multi-fold schemes are nested: „splits within splits". [Diagram: all data is split into training data (further split into tuning train / tuning test) and „real" test data; the model with the best parameters is fit to all of the training data, then used to predict & quantify on the „real" test data.]
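
A sketch of such nested („splits within splits") tuning and validation, assuming scikit-learn's GridSearchCV wrapped in an outer cross-validation (my illustration; models, grids and data are made up):

import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Inner re-sampling: tune the parameter on "tuning train / tuning test" splits.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
tuned_ridge = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer re-sampling: validate the *tuned* machine on data its tuning never saw.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
print(cross_val_score(tuned_ridge, X, y, cv=outer_cv))   # nested "splits within splits"
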
  20. Meta-Strategies in ML. „Model tuning": a model with tuning parameters; the best tuning parameters are determined using a data-driven tuning algorithm. „Ensemble learning": a number of (possibly „weak") models A, B, C, D are combined into a „strong" ensemble model.
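
A minimal ensemble-learning sketch, assuming scikit-learn's VotingRegressor over a few base models (my choice of models and data, not the speaker's):

import numpy as np
from sklearn.ensemble import VotingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.3, size=300)

# Several (possibly weak) base models A, B, C ...
base_models = [
    ("ridge", Ridge()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
]
# ... combined into one ensemble model; the ensemble is itself a learning machine.
ensemble = VotingRegressor(estimators=base_models)
print(cross_val_score(ensemble, X, y, cv=5).mean())
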
  21. Object dependencies in the ML workflow. One interesting dataset (all data, N = 100-100.000 data points, „small data") is re-sampled into multiple train/test splits („typical number of" outer splits: 5-10). On each of these, the strategies 1, 2, …, M are compared (M = 5-20), most of which are parameter-tuned by the same principle (3-5 nested splits, 10-10.000 parameter combinations). Ensembles add further nesting (10-1.000 base learners). Runtime = 10 x 10 x 5 x 1.000 (x 100) x one run on N samples (usually O(N²) or O(N³)).
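
To make the runtime arithmetic explicit, a small worked count of model fits, using the illustrative numbers from the slide:

outer_splits = 10           # train/test splits of the whole dataset
strategies = 10             # modelling strategies being compared
inner_splits = 5            # nested splits used for parameter tuning
param_combinations = 1_000  # tuning grid per strategy
base_learners = 100         # optional further factor for ensembles

fits = outer_splits * strategies * inner_splits * param_combinations
print(fits)                 # 500,000 single model fits, each on ~N samples
print(fits * base_learners) # 50,000,000 if each fit is itself an ensemble
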
  22. Machine Learning Toolboxes
  23. An incomplete list of influential toolboxes. [Table comparing toolboxes (mostly shown as logos) by: language (R, e.g. caret; Python; Java; multi-interface via 3rd-party wrappers), modular API (e.g., methods), coverage of common models (ranging from „few, mostly classifiers" or „mostly kernels" to fairly complete), model tuning / meta-methods, model validation and comparison (often only „some" or „not entirely"), and GUI.] scikit-learn is perhaps the most widely used ML toolbox.
  24. The object-oriented ML Toolbox API: Learning Machines as found in the R/mlr or scikit-learn packages. Leading principles: encapsulation and modularization, i.e. modular structure and object orientation. Abstraction models objects with a unified API; e.g. a „learning machine" object for linear regression exposes fit(traindata) and predict(testdata), plus metadata & model info.
Concept, what the public interface does, and how it is abstracted in R/mlr vs. sklearn:
Learning machine: fitting, predicting, set parameters; Learner in mlr, estimator in sklearn.
Re-sampling scheme: sample, apply & get results; ResampleDesc in mlr, splitter classes in model_selection in sklearn.
Evaluation metric: compute from results, tabulate; Measure in mlr, metrics classes in metrics in sklearn.
Meta-modelling (tuning, ensembling, pipelining): wrapping machines by strategy; various wrappers in mlr; various wrappers, fused classes and Pipeline in sklearn.
Learning task: benchmark, list strategies/measures; Task in mlr; implicit, not encapsulated in sklearn.
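
A sketch of how these abstractions fit together on the scikit-learn side (estimator, splitter, metric, Pipeline), with assumed data and parameter values:

import numpy as np
from sklearn.linear_model import Ridge                         # "estimator" (learning machine)
from sklearn.model_selection import KFold, cross_validate      # "splitter" (re-sampling scheme)
from sklearn.metrics import make_scorer, mean_absolute_error   # "metrics" (evaluation)
from sklearn.pipeline import Pipeline                          # meta-modelling by wrapping/pipelining
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - X[:, 2] + rng.normal(scale=0.2, size=200)

machine = Pipeline([("scale", StandardScaler()), ("ridge", Ridge(alpha=1.0))])
resampling = KFold(n_splits=5, shuffle=True, random_state=0)
metric = make_scorer(mean_absolute_error, greater_is_better=False)

results = cross_validate(machine, X, y, cv=resampling, scoring=metric)
print(-results["test_score"])   # per-split MAE
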
  25. HPC for benchmarking/validation today. [Diagram: the same dependency graph as before, with all data (N = 100-100.000 data points, „small data"), 5-10 outer splits, 3-5 nested splits, strategies 1, 2, …, M (M = 5-20), 10-10.000 parameter combinations and 10-1.000 base learners; the nesting levels are marked 1-4.] At the selected level, the work is distributed to clusters/cores (one of levels 1-4): scikit-learn via joblib, mlr via parallelMap. Plus algorithm-specific HPC interfaces, e.g. for deep learning (mutually exclusive).
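
A sketch of level-wise parallelization in scikit-learn via the joblib-backed n_jobs argument; the choice of level, models and data here is mine, for illustration:

import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)

# In scikit-learn, joblib-backed parallelism is exposed via n_jobs;
# here it is switched on at one level only (the parameter grid), not nested.
tuned = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    n_jobs=-1,                      # distribute the grid x inner-split fits over all cores
)
print(cross_val_score(tuned, X, y, cv=5, n_jobs=1))  # outer loop kept serial
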
  26. HPC support tomorrow? A possible layered architecture:
Layer 1: Data/task pipeline, DATA (e.g. Hadoop).
Layer 2: Scheduler for algorithms and meta-algorithms, holding the full graph of dependencies: re-samples, algorithms, parameters, … Combining (?) MapReduce, DAAL, dask, joblib -> TBB? (image source: continuum analytics)
Layer 3: Optimized primitives, e.g. MKL, CUDA, BLAS: linear systems, convex optimization, stoch. gradient descent. (image source: Intel math kernel library)
Layer 4: Hardware API, e.g. distributed, multi-core, multi-type/heterogeneous.
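
One way such a dependency graph could be expressed with today's tools is dask.delayed; the following is my own minimal sketch (not the architecture proposed in the talk), assuming dask and scikit-learn, building the re-sample x parameter graph lazily and handing it to a scheduler:

import numpy as np
from dask import delayed, compute
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

@delayed
def fit_and_score(train_idx, test_idx, alpha):
    model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
    return mean_squared_error(y[test_idx], model.predict(X[test_idx]))

# Full graph of dependencies: re-samples x parameters (algorithms would add one more axis).
splits = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
tasks = [fit_and_score(tr, te, alpha) for tr, te in splits for alpha in (0.1, 1.0, 10.0)]

# The scheduler decides how to execute the graph (threads, processes, or a cluster).
print(compute(*tasks, scheduler="threads"))
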
  27. Challenges in ML APIs and HPC. Surprisingly few resources have been invested in ML toolboxes; the most advanced toolboxes are currently open-source & academic. Features that would be desirable to the practitioner but are not available without mid-scale software development:
Integration of (a) data management, (b) exploration and (c) modelling: data heterogeneity, multiple datasets, time series, spatial features, images, etc.; especially challenging is integration in large-scale scenarios.
Non-standard modelling tasks and structured data (incl. time series): forecasting, on-line learning, anomaly detection, change point detection; meta-modelling and re-sampling for these is an order of magnitude more costly.
Full HPC integration on a granular level for distributed ML benchmarking: e.g. MapReduce for divide/conquer over data, model parts, and models; making full use of parallelism for nesting and computational redundancies; a complete HPC architecture for the whole model benchmarking workflow.
