MACHINE LEARNING IN HIGH
ENERGY PHYSICS
LECTURE #4
Alex Rogozhnikov, 2015
WEIGHTED VOTING
The way to introduce importance of classifiers
D(x) = ∑_j α_j d_j(x)
GENERAL CASE OF ENSEMBLING:
D(x) = f(d_1(x), d_2(x), …, d_J(x))
COMPARISON OF DISTRIBUTIONS
good options exist to compare 1d distributions
use the classifier's output to build a discriminating variable
ROC AUC + Mann–Whitney U test to get significance
RANDOM FOREST
composition of independent trees
ADABOOST
After building the j-th base classifier:
1. compute its voting weight: α_j = (1/2) ln(w_correct / w_wrong)
2. increase weights of misclassified events: w_i ← w_i × e^(−α_j y_i d_j(x_i))
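A minimal numpy sketch of this update rule (function and variable names are illustrative, not from the lecture; the base learner is assumed to return ±1 predictions):

import numpy as np

def adaboost_step(w, y, d_pred, eps=1e-12):
    """One AdaBoost iteration: compute alpha_j and reweight events.

    w      -- current sample weights
    y      -- true labels, +1 or -1
    d_pred -- predictions of the j-th base classifier, +1 or -1
    """
    correct = (y == d_pred)
    w_correct = w[correct].sum()
    w_wrong = w[~correct].sum() + eps
    alpha = 0.5 * np.log(w_correct / w_wrong)
    # misclassified events (y_i * d_j(x_i) = -1) get their weight increased
    w_new = w * np.exp(-alpha * y * d_pred)
    return alpha, w_new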
GRADIENT BOOSTING
ℒ → min
D_j(x) = ∑_{j'=1}^{j} α_{j'} d_{j'}(x)
D_j(x) = D_{j−1}(x) + α_j d_j(x)
At the j-th iteration:
pseudo-residual: z_i = −∂ℒ/∂D(x_i) |_{D(x) = D_{j−1}(x)}
train regressor d_j to minimize MSE: ∑_i (d_j(x_i) − z_i)² → min
find optimal α_j
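A toy sketch of this loop for the MSE loss, where the pseudo-residual is simply z_i = y_i − D(x_i); real implementations (e.g. sklearn's GradientBoostingRegressor) add subsampling and per-leaf optimization:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_mse(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Toy gradient boosting for the MSE loss."""
    base = y.mean()                            # D_0(x) = const
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - prediction              # pseudo-residual z_i for MSE
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        prediction += learning_rate * tree.predict(X)  # fixed alpha_j = eta
        trees.append(tree)
    return base, trees

def predict_gb(X, base, trees, learning_rate=0.1):
    return base + learning_rate * sum(t.predict(X) for t in trees)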
LOSSES
regression, y ∈ ℝ
Mean Squared Error: ∑_i (d(x_i) − y_i)²
Mean Absolute Error: ∑_i |d(x_i) − y_i|
binary classification, y_i = ±1
ExpLoss (aka AdaLoss): ∑_i e^(−y_i d(x_i))
LogLoss: ∑_i log(1 + e^(−y_i d(x_i)))
ranking losses
boosting to uniformity: FlatnessLoss
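For reference, the two classification losses as numpy helpers (d is the array of decision-function values, y ∈ {−1, +1}; a minimal sketch):

import numpy as np

def exp_loss(y, d):
    """ExpLoss (AdaLoss): sum_i exp(-y_i d(x_i))."""
    return np.exp(-y * d).sum()

def log_loss(y, d):
    """LogLoss: sum_i log(1 + exp(-y_i d(x_i)))."""
    return np.logaddexp(0, -y * d).sum()  # numerically stable log(1 + e^x)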
TUNING GRADIENT BOOSTING OVER
DECISION TREES
Parameters:
loss function
pre-pruning: maximal depth, minimal leaf size
subsample, max_features = N/3
η (learning_rate)
number of trees
LOSS FUNCTION
different combinations may also be used
TUNING GBDT
1. set a high but feasible number of trees
2. find optimal parameters by checking combinations
3. decrease the learning rate, increase the number of trees
See also: GBDT tutorial
ENSEMBLES: BAGGING OVER
BOOSTING
Different variations [1], [2] are claimed to outperform a single GBDT.
Very complex training; better quality if the GB estimators are overfitted.

from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

clf = BaggingClassifier(base_estimator=GradientBoostingClassifier(),
                        n_estimators=100)
ENSEMBLES: STACKING
Correcting the output of several classifiers with a new classifier:
D(x) = f(d_1(x), d_2(x), …, d_J(x))
To get unbiased predictions, use a holdout or k-folding.
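A minimal stacking sketch using out-of-fold predictions via scikit-learn's cross_val_predict (the choice of base and meta models here is arbitrary):

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def fit_stacking(X, y):
    base = [RandomForestClassifier(n_estimators=100),
            GradientBoostingClassifier()]
    # out-of-fold predictions d_1(x), ..., d_J(x): unbiased via k-folding
    meta_features = np.column_stack([
        cross_val_predict(clf, X, y, cv=5, method='predict_proba')[:, 1]
        for clf in base])
    meta = LogisticRegression().fit(meta_features, y)  # f(d_1, ..., d_J)
    for clf in base:
        clf.fit(X, y)  # refit base classifiers on all data for prediction
    return base, meta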
REWEIGHTING
Given two distributions, target and original, find new weights for the original distribution such that the two distributions coincide.
REWEIGHTING
Solution for 1 or 2 variables: reweight using bins (so-called
'histogram division').
Typical application: Monte Carlo reweighting.
REWEIGHTING WITH BINS
Split the space into bins; for each bin:
compute w_original, w_target in the bin
multiply weights of the original distribution in the bin to compensate the difference (see the sketch after the problems below):
w_i ← w_i × w_target / w_original
Problems
good in 1d, works sometimes in 2d, nearly impossible in dimensions > 2
too few events in a bin ⇒ reweighting rule is unstable
we can reweight several times over 1d if variables aren't correlated
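In 1d, histogram division is a few lines of numpy (a sketch; binning choices and handling of empty bins are left to the analyst):

import numpy as np

def bin_reweight(original, target, weights, n_bins=30):
    """1d histogram division: reweight `original` to match `target`."""
    lo = min(original.min(), target.min())
    hi = max(original.max(), target.max())
    w_target, edges = np.histogram(target, bins=n_bins, range=(lo, hi))
    w_original, _ = np.histogram(original, bins=edges, weights=weights)
    # multiply weights in each bin by w_target / w_original (1.0 for empty bins)
    ratio = np.where(w_original > 0,
                     w_target / np.maximum(w_original, 1e-12), 1.0)
    idx = np.clip(np.digitize(original, edges) - 1, 0, n_bins - 1)
    return weights * ratio[idx]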
reweighting using an enhanced version of the bin reweighter
reweighting using a gradient boosted reweighter
GRADIENT BOOSTED REWEIGHTER
works well in high dimensions
works with correlated variables
produces a more stable reweighting rule
HOW DOES THE GB REWEIGHTER
WORK?
Iteratively:
1. Find a tree that discriminates the two distributions
2. Correct weights in each leaf
3. Reweight the original distribution and repeat the process.
Fewer bins, each guaranteed to contain high statistics.
DISCRIMINATING TREE
When looking for a tree with good discrimination, optimize
χ² = ∑_{l ∈ leaves} (w_{l,target} − w_{l,original})² / (w_{l,target} + w_{l,original})
As before, greedy optimization is used to build a tree that provides maximal χ².
+ introducing all heuristics from standard GB.
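The statistic itself, given per-leaf weight sums (a tiny illustrative helper; array names are not from the lecture):

import numpy as np

def leaves_chi2(w_target, w_original):
    """Symmetrized chi2 over leaves: sum_l (wt_l - wo_l)^2 / (wt_l + wo_l)."""
    w_target = np.asarray(w_target, dtype=float)
    w_original = np.asarray(w_original, dtype=float)
    return ((w_target - w_original) ** 2 / (w_target + w_original)).sum()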
QUALITY ESTIMATION
The quality of the model can be estimated by the value of the optimized loss function. Remember, though, that different classifiers use different optimization targets:
LogLoss vs. ExpLoss
QUALITY ESTIMATION
different targets of optimization ⇒ use general quality metrics
TESTING HYPOTHESIS WITH CUT
Example: we test the hypothesis that there is a signal channel with fixed Br ≠ 0 vs. no signal channel (Br = 0).
H_0: signal channel with Br = const
H_1: there is no signal channel
Putting a threshold: d(x) > threshold
Estimate the numbers of signal / bkg events that will be selected:
s = α · tpr,  b = β · fpr
H_0: n_obs ∼ Poiss(s + b)
H_1: n_obs ∼ Poiss(b)
TESTING HYPOTHESIS WITH CUT
Select some appropriate metric:
AMS² = 2(s + b) log(1 + s/b) − 2s
Maximize it by selecting the best threshold
a special holdout shall be used
or use k-folding
very poor usage of information from the classifier
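A numpy sketch of this threshold scan; α and β here are the assumed total signal and background yields, and tpr/fpr come from a ROC curve:

import numpy as np

def ams2(s, b, eps=1e-12):
    """AMS^2 = 2(s + b) log(1 + s/b) - 2s."""
    return 2 * (s + b) * np.log1p(s / (b + eps)) - 2 * s

def best_threshold(thresholds, tpr, fpr, alpha, beta):
    """Scan thresholds; s = alpha * tpr, b = beta * fpr."""
    scores = ams2(alpha * tpr, beta * fpr)
    i = np.argmax(scores)
    return thresholds[i], scores[i]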
TESTING HYPOTHESIS WITH BINS
Splitting all data in n bins:
x ∈ k-th bin ⇔ thr_{k−1} < d(x) ≤ thr_k
Expected amounts of signal and background:
s_k = α(tpr_{k−1} − tpr_k),  b_k = β(fpr_{k−1} − fpr_k)
H_0: n_k ∼ Poiss(s_k + b_k)
H_1: n_k ∼ Poiss(b_k)
Optimal statistic:
∑_k c_k n_k,  c_k = log(1 + s_k / b_k)
take some approximation to the test power and optimize thresholds
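The optimal statistic as a small numpy helper (per-bin arrays assumed; a sketch):

import numpy as np

def optimal_statistic(n_k, s_k, b_k, eps=1e-12):
    """Sum_k c_k n_k with c_k = log(1 + s_k / b_k)."""
    c_k = np.log1p(s_k / (b_k + eps))
    return (c_k * n_k).sum()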
FINDING OPTIMAL PARAMETERS
some algorithms have many parameters
not all of them can be guessed
checking all combinations takes too long
FINDING OPTIMAL PARAMETERS
randomly picking parameters is a partial solution
treating the quality metric as a target function, we can optimize it, but:
no gradient with respect to the parameters
noisy results
reconstructing the function is a problem
Before running grid optimization, make sure your metric is stable (e.g. by training/testing on different subsets).
Overfitting by using many attempts is a real issue.
OPTIMAL GRID SEARCH
stochastic optimization (Metropolis-Hastings, annealing)
regression techniques, reusing all known information
(ML to optimize ML!)
OPTIMAL GRID SEARCH USING
REGRESSION
General algorithm (point of grid = set of parameters):
1. evaluations at random points
2. build regression model based on known results
3. select the point with best expected quality according to
trained model
4. evaluate quality at this point
5. go to 2 if not enough evaluations
Why not use linear regression?
GAUSSIAN PROCESSES FOR
REGRESSION
Some definitions: Y ∼ GP(m, K), where m and K are the mean and covariance functions: m(x), K(x_1, x_2).
m(x) = 𝔼 Y(x) represents our prior expectation of quality (may be taken constant)
K(x_1, x_2) = 𝔼 Y(x_1) Y(x_2) represents the influence of known results on the expectation at new points
RBF kernel is the most useful:
K(x_1, x_2) = exp(−|x_1 − x_2|²)
We can model the posterior distribution of results at each point.
Gaussian Process Demo on Mathematica
Also see: http://www.tmpl.fi/gp/
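A minimal example with scikit-learn's GaussianProcessRegressor and an RBF kernel (modern sklearn API; the data and the length scale are made up for illustration):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# known evaluations: quality measured at a few parameter points
X = np.array([[0.1], [0.4], [0.7], [0.9]])
y = np.array([0.62, 0.71, 0.68, 0.55])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X, y)

# posterior mean and uncertainty on a grid of candidate points
grid = np.linspace(0, 1, 101).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)
best = grid[np.argmax(mean + std)]  # optimistic pick for the next evaluation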
x MINUTES BREAK
PREDICTION SPEED
Cuts vs classifier
cuts are interpretable
cuts are applied really fast
ML classifiers are applied much slower
SPEED UP WAYS
Method-specific
Logistic regression: sparsity
Neural networks: removing neurons
GBDT: pruning by selecting best trees
Staging of classifiers (à la triggers)
SPEED UP: LOOKUP TABLES
Split the feature space into bins; the result is a simple piecewise-constant function
SPEED UP: LOOKUP TABLES
Training:
1. split each variable into bins
2. replace values with index of bin
3. train any classifier on indices of bins
4. create lookup table (evaluate answer for each
combination of bins)
Prediction:
1. convert features to bins' indices
2. take answer from lookup table
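A sketch of this training/prediction cycle (quantile binning is one arbitrary choice; note the table has n_bins ** n_features entries, which motivates the problems listed below):

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_lookup(X, y, n_bins=8):
    """Bin each feature, train on bin indices, tabulate all answers."""
    # interior bin edges per feature
    edges = [np.percentile(X[:, j], np.linspace(0, 100, n_bins + 1)[1:-1])
             for j in range(X.shape[1])]
    indices = np.column_stack([np.digitize(X[:, j], edges[j])
                               for j in range(X.shape[1])])
    clf = GradientBoostingClassifier().fit(indices, y)
    # evaluate the answer for every combination of bins
    grids = np.meshgrid(*[np.arange(n_bins)] * X.shape[1], indexing='ij')
    combos = np.column_stack([g.ravel() for g in grids])
    table = clf.predict_proba(combos)[:, 1].reshape([n_bins] * X.shape[1])
    return edges, table

def predict_lookup(X, edges, table):
    """Prediction: convert features to bin indices, read the table."""
    idx = tuple(np.digitize(X[:, j], edges[j]) for j in range(X.shape[1]))
    return table[idx]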
LOOKUP TABLES
speed is comparable to cuts
allows using any ML model behind it
used in the LHCb topological trigger ('Bonsai BDT')
Problems:
too many combinations when the number of features N > 10
(8 bins in 10 features: 8^10 entries ∼ 1 Gb)
finding optimal thresholds of bins
UNSUPERVISED LEARNING: PCA
[PEARSON, 1901]
PCA finds the axes along which variation is maximal
(based on the principal axis theorem)
PCA DESCRIPTION
PCA is based on the principal axes:
Q = U^T Λ U
U is an orthogonal matrix, Λ is a diagonal matrix,
Λ = diag(λ_1, λ_2, …, λ_n),  λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n
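The same decomposition in numpy, with Q taken as the covariance matrix of the data (a minimal sketch; note numpy's eigh uses the convention Q = U Λ U^T with eigenvectors as columns):

import numpy as np

def pca_transform(X, n_components=2):
    """Project X onto the leading principal axes of its covariance Q."""
    Xc = X - X.mean(axis=0)
    Q = np.cov(Xc, rowvar=False)            # covariance matrix
    eigvals, U = np.linalg.eigh(Q)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # lambda_1 >= lambda_2 >= ...
    return Xc @ U[:, order[:n_components]]  # coordinates along principal axes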
PCA: EIGENFACES
Emotion = α[scared] + β[laughs] + γ[angry] + …
PCA: CALORIMETER
Electron/jet recognition in ATLAS
AUTOENCODERS
autoencoding: y = x, error |ŷ − y| → min, hidden layer is smaller
CONNECTION TO PCA
When optimizing MSE:
∑_i (ŷ_i − x_i)²
and the activations are linear:
h_i = ∑_j w_ij x_j
ŷ_i = ∑_j a_ij h_j
the optimal solution of this optimization problem is PCA.
BOTTLENECK
There is an informational bottleneck ⇒ the algorithm will be eager to pass the most precious information through it.
MANIFOLD LEARNING TECHNIQUES
Preserve neighbourhood/distances.
Digits samples from MNIST
PCA and autoencoder:
PART-OF-SPEECH TAGGING
MARKOV CHAIN
MARKOV CHAIN IN TEXT
TOPIC MODELLING (LDA)
bag of words
a generative model that describes probabilities p(x)
fitting by maximizing the log-likelihood
parameters of this model are useful
COLLABORATIVE RESEARCH
problem: different environments
problem: data ping-pong
solution: dedicated machine (server) for a team
restart the research = run script or notebook
!ping www.bbc.co.uk
REPRODUCIBLE RESEARCH
readable code
code + description + plots together
keep versions + reviews
Solution: IPython notebook + github
Example notebook
EVEN MORE REPRODUCIBILITY?
Analysis preservation: VM with all needed stuff.
Docker
BIRD'S EYE VIEW OF ML:
Classification
Regression
Ranking
Dimensionality reduction
CLUSTERING
DENSITY ESTIMATION
REPRESENTATION LEARNING
Word2vec finds representations of words.
THE END