A brief introduction to linear discriminant analysis and some of its extensions. Much of the material is taken from The Elements of Statistical Learning by Hastie et al. (2008).
Linear Discriminant Analysis and Its Generalization
1. Linear Discriminant Analysis and Its Generalization
Chapters 4 and 12 of The Elements of Statistical Learning
Presented by Ilsang Ohn
Department of Statistics, Seoul National University
September 3, 2014
2. Contents
1 Linear Discriminant Analysis
2 Flexible Discriminant Analysis
3 Penalized Discriminant Analysis
4 Mixture Discriminant Analysis
3. Review of Linear Discriminant Analysis
4. LDA: Overview
• Linear discriminant analysis (LDA) does classification by assuming that the data within each class are normally distributed:
$$f_k(x) = P(X = x \mid G = k) = N(\mu_k, \Sigma).$$
• We allow each class to have its own mean $\mu_k \in \mathbb{R}^p$, but we assume a common variance matrix $\Sigma \in \mathbb{R}^{p \times p}$. Thus
$$f_k(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right).$$
• We want to find the $k$ for which $P(G = k \mid X = x) \propto f_k(x)\pi_k$ is largest.
5. LDA: Overview
• The linear discriminant functions are derived from the relation
$$\log(f_k(x)\pi_k) = -\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k) + \log \pi_k + C = x^T \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log \pi_k + C',$$
where $C$ and $C'$ collect terms that do not depend on $k$, and we denote
$$\delta_k(x) = x^T \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log \pi_k.$$
• The decision rule is $G(x) = \operatorname{argmax}_k \delta_k(x)$.
• The Bayes classifier is a linear classifier.
6. LDA: Overview
• We estimate the parameters from the training data $x_i \in \mathbb{R}^p$ and $y_i \in \{1, \dots, K\}$ by
• $\hat\pi_k = N_k/N$
• $\hat\mu_k = N_k^{-1}\sum_{y_i = k} x_i$, the centroid of class $k$
• $\hat\Sigma = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{y_i = k}(x_i - \hat\mu_k)(x_i - \hat\mu_k)^T$, the pooled sample variance matrix
• The decision boundary between each pair of classes $k$ and $l$ is given by $\{x : \delta_k(x) = \delta_l(x)\}$, which is equivalent to
$$(\hat\mu_k - \hat\mu_l)^T \hat\Sigma^{-1} x = \frac{1}{2}(\hat\mu_k + \hat\mu_l)^T \hat\Sigma^{-1}(\hat\mu_k - \hat\mu_l) - \log(\hat\pi_k/\hat\pi_l).$$
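To make the estimation step concrete, here is a minimal NumPy sketch of the plug-in rule above; the three-class synthetic data and all variable names are our own illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: K = 3 classes in R^2 sharing an identity covariance
K, p = 3, 2
true_means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.multivariate_normal(m, np.eye(p), 100) for m in true_means])
y = np.repeat(np.arange(K), 100)
N = len(y)

pi_hat = np.array([(y == k).mean() for k in range(K)])          # N_k / N
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(K)])   # centroids
Sigma_hat = sum((X[y == k] - mu_hat[k]).T @ (X[y == k] - mu_hat[k])
                for k in range(K)) / (N - K)                    # pooled

Sigma_inv = np.linalg.inv(Sigma_hat)
# delta_k(x) = x^T Sigma^-1 mu_k - (1/2) mu_k^T Sigma^-1 mu_k + log pi_k
deltas = (X @ Sigma_inv @ mu_hat.T
          - 0.5 * np.einsum('kp,pq,kq->k', mu_hat, Sigma_inv, mu_hat)
          + np.log(pi_hat))
print("training accuracy:", (deltas.argmax(axis=1) == y).mean())
```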
7. Fisher’s discriminant analysis
• Fisher's idea is to find a direction $v$ that solves
$$\max_v \; \frac{v^T B v}{v^T W v},$$
where
- $B = \sum_{k=1}^{K}(\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^T$: the between-class covariance matrix
- $W = \sum_{k=1}^{K}\sum_{y_i = k}(x_i - \bar{x}_k)(x_i - \bar{x}_k)^T$: the within-class covariance matrix, previously denoted by $(N-K)\hat\Sigma$
• This ratio is maximized by $v_1 = e_1$, the eigenvector of $W^{-1}B$ with the largest eigenvalue. The linear combination $v_1^T X$ is called the first discriminant. Similarly one can find the next direction $v_2$, orthogonal in $W$ to $v_1$.
• Fisher's canonical discriminant analysis finds $L \le K - 1$ canonical coordinates (or a rank-$L$ subspace) that best separate the categories.
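The ratio above can be maximized by solving the generalized symmetric eigenproblem $Bv = \lambda Wv$ directly; a sketch, reusing X, y, K, and mu_hat from the previous snippet:

```python
from scipy.linalg import eigh

xbar = X.mean(axis=0)
# Between- and within-class covariance matrices as defined above
B = sum(np.outer(mu_hat[k] - xbar, mu_hat[k] - xbar) for k in range(K))
W = sum((X[y == k] - mu_hat[k]).T @ (X[y == k] - mu_hat[k]) for k in range(K))

evals, evecs = eigh(B, W)       # solves B v = lambda W v, ascending order
V = evecs[:, ::-1][:, :K - 1]   # leading L = K - 1 discriminant directions
Z = (X - xbar) @ V              # canonical coordinates
```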
8. Fisher’s discriminant analysis
• Consequently, we have $v_1, \dots, v_L$, $L \le K - 1$, which are the eigenvectors of $W^{-1}B$ with non-zero eigenvalues.
• Fisher's discriminant rule assigns $x$ to the class closest in Mahalanobis distance, so the rule is given by
$$\begin{aligned} G(x) &= \operatorname{argmin}_k \sum_{l=1}^{L} [v_l^T(x - \bar{x}_k)]^2 \\ &= \operatorname{argmin}_k (x - \bar{x}_k)^T \hat\Sigma^{-1}(x - \bar{x}_k) \\ &= \operatorname{argmin}_k \left(-2\delta_k(x) + x^T \hat\Sigma^{-1} x + 2\log \pi_k\right) \\ &= \operatorname{argmax}_k \left(\delta_k(x) - \log \pi_k\right). \end{aligned}$$
• Thus Fisher's rule is equivalent to the Gaussian classification rule with equal prior probabilities.
9. LDA by optimal scoring
• The standard way of carrying out a (Fisher’s) canonical discriminant
analysis is by way of a suitable SVD.
• There is a somewhat different approach: optimal scoring.
• This method performs LDA via linear regression on derived responses.
10. LDA by optimal scoring
• Recall $\mathcal{G} = \{1, \dots, K\}$.
• $\theta : \mathcal{G} \to \mathbb{R}$ is a function that assigns scores to the classes, such that the transformed class labels are optimally predicted by linear regression on $X$.
• We find $L \le K - 1$ sets of independent scorings for the class labels, $\{\theta_1, \dots, \theta_L\}$, and $L$ corresponding linear maps $\eta_l(X) = X^T \beta_l$, chosen to be optimal for multiple regression in $\mathbb{R}^p$.
• The $\theta_l$ and $\beta_l$ are chosen to minimize the average squared residual
$$\mathrm{ASR} = \frac{1}{N}\sum_{l=1}^{L}\sum_{i=1}^{N}\left(\theta_l(g_i) - x_i^T \beta_l\right)^2.$$
11. LDA by optimal scoring
Notation
• $Y$: $N \times K$ indicator response matrix
• $P_X = X(X^T X)^{-1}X^T$: projection matrix onto the column space of the predictors
• $\Theta$: $K \times L$ matrix of $L$ score vectors for the $K$ classes
• $\Theta^* = Y\Theta$: $N \times L$ matrix with $\Theta^*_{ij} = \theta_j(g_i)$
12. LDA by optimal scoring
Problem
• Minimize ASR by regressing $\Theta^*$ on $X$; that is, find the $\Theta$ that minimizes
$$\mathrm{ASR}(\Theta) = \mathrm{tr}\big(\Theta^{*T}(I - P_X)\Theta^*\big)/N = \mathrm{tr}\big(\Theta^T Y^T (I - P_X) Y \Theta\big)/N.$$
• $\mathrm{ASR}(\Theta)$ is minimized by taking $\Theta$ to be the $L$ largest eigenvectors of $Y^T P_X Y$, with normalization $\Theta^T D_p \Theta = I_L$.
• Here $D_p = Y^T Y/N$ is the diagonal matrix of sample class proportions $N_j/N$.
13. LDA by optimal scoring
Way to the solution
1. Initialize: Form the $N \times K$ indicator matrix $Y$.
2. Multivariate regression: Set $\hat{Y} = P_X Y$ and denote the $p \times K$ coefficient matrix by $B$: $\hat{Y} = XB$.
3. Optimal scores: Obtain the eigenvector matrix $\Theta$ of $Y^T \hat{Y} = Y^T P_X Y$, with normalization $\Theta^T D_p \Theta = I$.
4. Update: Update the coefficient matrix in step 2 to reflect the optimal scores: $B \leftarrow B\Theta$. The final optimally scaled regression fit is the $(K-1)$-vector function $\eta(x) = B^T x$ (see the sketch below).
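A NumPy sketch of these four steps, reusing X, y, N, K, and eigh from the earlier snippets. Centering X stands in for an intercept; because of the centering, the trivial constant score has eigenvalue 0 rather than 1, so taking the top $K-1$ eigenvectors drops it automatically.

```python
Y = np.eye(K)[y]                                 # step 1: indicator matrix
Xc = X - X.mean(axis=0)

Bcoef, *_ = np.linalg.lstsq(Xc, Y, rcond=None)   # step 2: Yhat = P_X Y = X B
Yhat = Xc @ Bcoef

Dp = (Y.T @ Y) / N                               # class proportions
# Step 3: eigenvectors of Y^T P_X Y / N, normalized so Theta^T Dp Theta = I
evals, Theta = eigh((Y.T @ Yhat) / N, Dp)
Theta = Theta[:, np.argsort(evals)[::-1][:K - 1]]

Bos = Bcoef @ Theta                              # step 4: B <- B Theta
eta = Xc @ Bos                                   # eta(x) = B^T x
```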
14. LDA by optimal scoring
• The sequence of discriminant vectors $\nu_l$ in LDA is identical to the sequence $\beta_l$, up to a constant.
• That is, the coefficient matrix $B$ is, up to a diagonal scale matrix, the same as the discriminant analysis coefficient matrix $V$:
$$V^T x = D B^T x = D\eta(x),$$
where $D_{ll} = 1/[\alpha_l^2(1 - \alpha_l^2)]$ and $x$ is a test point. Here $\alpha_l^2$ is the $l$th largest eigenvalue from the optimal scoring problem.
• Then the Mahalanobis distance is given by
$$\delta_J(x, \hat\mu_k) = \sum_{l=1}^{K-1} w_l\big(\hat\eta_l(x) - \bar\eta_l^k\big)^2 + D(x),$$
where $\bar\eta_l^k = N_k^{-1}\sum_{g_i = k}\hat\eta_l(x_i)$ and $w_l = 1/[\alpha_l^2(1 - \alpha_l^2)]$.
15. Generalization of LDA
• FDA: Allow non-linear decision boundaries
• PDA: Expand the predictors into a large basis set, then penalize the coefficients to be smooth
• MDA: Model each class by a mixture of two or more Gaussians with different centroids but a common covariance, rather than by a single Gaussian as in LDA
16. Flexible Discriminant Analysis
(Hastie et al., 1994)
17. FDA: Overview
• The optimal scoring method provides a starting point for generalizing LDA to a nonparametric version.
• We replace the linear projection operator $P_X$ by a nonparametric regression procedure, which we denote by the linear operator $S$.
• One simple and effective approach is to expand $X$ into a larger set of basis variables $h(X)$ and then simply use $S = P_{h(X)}$ in place of $P_X$.
18. FDA: Overview
• These regression problems are defined via the criterion
$$\mathrm{ASR}\big(\{\theta_l, \eta_l\}_{l=1}^{L}\big) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\big(\theta_l(g_i) - \eta_l(x_i)\big)^2 + \lambda J(\eta_l)\right],$$
where $J$ is a regularizer appropriate for some form of nonparametric regression (e.g., smoothing splines, additive splines, and lower-order ANOVA models).
19. FDA by optimal scoring
Way to the solution
1. Initialize: Form the $N \times K$ indicator matrix $Y$.
2. Multivariate nonparametric regression: Fit a multi-response adaptive nonparametric regression of $Y$ on $X$, giving fitted values $\hat{Y}$. Let $S_\lambda$ be the linear operator that fits the final chosen model, and let $\eta^*(x)$ be the vector of fitted regression functions.
3. Optimal scores: Compute the eigen-decomposition $\Theta$ of $Y^T \hat{Y} = Y^T S_\lambda Y$, where the eigenvectors $\Theta$ are normalized: $\Theta^T D_p \Theta = I_K$.
4. Update: Update the final model from step 2 using the optimal scores: $\eta(x) \leftarrow \Theta^T \eta^*(x)$ (see the sketch below).
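In the basis-expansion case $S = P_{h(X)}$, only the regression step changes; degree-2 polynomial features are one illustrative choice of $h$ (our own, not from the slides). This sketch reuses X, Y, N, K, Dp, and eigh from the optimal-scoring snippet:

```python
from sklearn.preprocessing import PolynomialFeatures

H = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
Hc = H - H.mean(axis=0)                          # centered basis expansion

Bh, *_ = np.linalg.lstsq(Hc, Y, rcond=None)      # step 2 on h(X) instead of X
Yhat = Hc @ Bh
evals, Theta = eigh((Y.T @ Yhat) / N, Dp)        # step 3, as before
Theta = Theta[:, np.argsort(evals)[::-1][:K - 1]]
eta = Hc @ Bh @ Theta                            # step 4: nonlinear eta(x)
```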
21. PDA: Overview
• Although FDA is motivated by generalizing optimal scoring, it can also be viewed directly as a form of regularized discriminant analysis.
• Suppose the regression procedure used in FDA amounts to a linear regression onto a basis expansion $h(X)$, with a quadratic penalty on the coefficients:
$$\mathrm{ASR}\big(\{\theta_l, \beta_l\}_{l=1}^{L}\big) = \frac{1}{N}\sum_{l=1}^{L}\left[\sum_{i=1}^{N}\big(\theta_l(g_i) - h^T(x_i)\beta_l\big)^2 + \lambda\beta_l^T \Omega\beta_l\right].$$
• The role of $\Omega$ is to penalize "rough" coefficient vectors.
• The steps in FDA can then be viewed as a generalized form of LDA, which we call PDA.
22. PDA: Overview
• Enlarge the set of predictors $X$ via a basis expansion $h(X)$.
• Use (penalized) LDA in the enlarged space, where the penalized Mahalanobis distance is given by
$$D(x, \mu) = \big(h(x) - h(\mu)\big)^T (\Sigma_W + \lambda\Omega)^{-1}\big(h(x) - h(\mu)\big),$$
where $\Sigma_W$ is the within-class covariance matrix of the derived variables $h(x_i)$.
• Decompose the classification subspace using the penalized metric:
$$\max_u \; u^T \Sigma_{\mathrm{Bet}}\, u \quad \text{subject to} \quad u^T(\Sigma_W + \lambda\Omega)u = 1,$$
where $\Sigma_{\mathrm{Bet}}$ is the between-class covariance of the derived variables.
23. PDA by optimal scoring
Way to the solution
1. Initialize: Form $Y$ and $H = (h_{ij}) = (h_j(x_i))$.
2. Multivariate penalized regression: Fit a penalized multi-response regression of $Y$ on $H$, giving fitted values $\hat{Y} = S(\Omega)Y$. Here $S(\Omega) = H(H^T H + \Omega)^{-1}H^T$ is the smoother matrix of $H$ regularized by $\Omega$, and $\beta = (H^T H + \Omega)^{-1}H^T Y\theta$ is the penalized least squares estimate.
3. Optimal scores: Compute the eigen-decomposition $\Theta$ of $Y^T \hat{Y} = Y^T S(\Omega)Y$, where the eigenvectors $\Theta$ are normalized: $\Theta^T D_p \Theta = I_K$.
4. Update: Update the coefficients using the optimal scores: $\beta \leftarrow \beta\Theta$ (see the sketch below).
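A sketch of steps 2-4 with the illustrative ridge-type choice $\Omega = \lambda I$ (a simple stand-in for a genuine roughness penalty), reusing Hc, Y, N, K, Dp, and eigh from the FDA sketch:

```python
lam = 1.0
Omega = lam * np.eye(Hc.shape[1])                     # assumed penalty matrix
beta = np.linalg.solve(Hc.T @ Hc + Omega, Hc.T @ Y)   # step 2: penalized LS
Yhat = Hc @ beta                                      # S(Omega) Y
evals, Theta = eigh((Y.T @ Yhat) / N, Dp)             # step 3
Theta = Theta[:, np.argsort(evals)[::-1][:K - 1]]
beta = beta @ Theta                                   # step 4: beta <- beta Theta
```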
24. Mixture Discriminant Analysis
(Hastie and Tibshirani, 1996)
25. MDA: Overview
• Linear discriminant analysis can be viewed as a prototype classifier: each class is represented by its centroid, and we classify to the closest one using an appropriate metric.
• In many situations a single prototype is not sufficient to represent inhomogeneous classes, and mixture models are more appropriate.
27. MDA: Overview
• A Gaussian mixture model for the $k$th class has density
$$P(X \mid G = k) = \sum_{r=1}^{R_k}\pi_{kr}\,\phi(X; \mu_{kr}, \Sigma),$$
where the mixing proportions $\pi_{kr}$ sum to one and $R_k$ is the number of prototypes for the $k$th class.
• The class posterior probabilities are given by
$$P(G = k \mid X = x) = \frac{\sum_{r=1}^{R_k}\pi_{kr}\,\phi(x; \mu_{kr}, \Sigma)\,\Pi_k}{\sum_{l=1}^{K}\sum_{r=1}^{R_l}\pi_{lr}\,\phi(x; \mu_{lr}, \Sigma)\,\Pi_l},$$
where the $\Pi_k$ represent the class prior probabilities.
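A small sketch of this posterior computation; the parameter layout (lists of per-class prototype means and mixing weights) is our own convention, not from the slides:

```python
from scipy.stats import multivariate_normal

def class_posteriors(x, protos, mix, Pi, Sigma):
    """P(G = k | X = x) under per-class Gaussian mixtures with shared Sigma.

    protos[k]: (R_k, p) subclass means; mix[k]: (R_k,) weights summing to 1;
    Pi: (K,) class prior probabilities."""
    dens = np.array([Pi[k] * sum(w * multivariate_normal.pdf(x, mean=m, cov=Sigma)
                                 for w, m in zip(mix[k], protos[k]))
                     for k in range(len(Pi))])
    return dens / dens.sum()
```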
28. MDA: Estimation
• We estimate the parameters by maximum likelihood, using the joint log-likelihood based on $P(G, X)$:
$$\sum_{k=1}^{K}\sum_{g_i = k}\log\left[\sum_{r=1}^{R_k}\pi_{kr}\,\phi(x_i; \mu_{kr}, \Sigma)\,\Pi_k\right].$$
• We maximize this likelihood with the EM algorithm.
29. MDA: Estimation
• E-step: Given the current parameters, compute the responsibility of subclass $c_{kr}$ within class $k$ for each of the class-$k$ observations ($g_i = k$):
$$\hat{p}(c_{kr} \mid x_i, g_i) = \frac{\pi_{kr}\,\phi(x_i; \mu_{kr}, \Sigma)}{\sum_{r'=1}^{R_k}\pi_{kr'}\,\phi(x_i; \mu_{kr'}, \Sigma)}.$$
• M-step: Compute the weighted MLEs for the parameters of each of the component Gaussians within each of the classes, using the weights from the E-step.
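The E-step responsibilities follow the same pattern; a sketch reusing multivariate_normal from the previous snippet:

```python
def responsibilities(x, protos_k, mix_k, Sigma):
    """p(c_kr | x, g = k) for one class-k observation x."""
    w = np.array([pi_r * multivariate_normal.pdf(x, mean=m, cov=Sigma)
                  for pi_r, m in zip(mix_k, protos_k)])
    return w / w.sum()
```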
30. MDA: Estimation
• The M-step is a weighted version of LDA, with $R = \sum_{k=1}^{K} R_k$ classes and $\sum_{k=1}^{K} N_k R_k$ observations.
• We can use optimal scoring as before to solve the weighted LDA problem, which allows us to use a weighted version of FDA or PDA at this stage.
31. MDA: Estimation
• The indicator matrix $Y_{N \times K}$ collapses in this case to a blurred response matrix $Z_{N \times R}$.
• For example:

          c11   c12   c13   c21   c22   c23   c31   c32   c33
g1 = 2    0     0     0     0.3   0.5   0.2   0     0     0
g2 = 1    0.9   0.1   0.0   0     0     0     0     0     0
g3 = 1    0.1   0.8   0.1   0     0     0     0     0     0
g4 = 3    0     0     0     0     0     0     0.5   0.4   0.1
...
gN = 3    0     0     0     0     0     0     0.5   0.4   0.1

where the entries in a class-$k$ row correspond to $\hat{p}(c_{kr} \mid x_i, g_i)$.
32. MDA: Estimation by optimal scoring
Optimal scoring over the EM steps of MDA:
1. Initialize: Start with the set of $R_k$ subclasses $c_{kr}$ and the associated subclass probabilities $\hat{p}(c_{kr} \mid x_i, g_i)$.
2. Blurred matrix: If $g_i = k$, fill the $k$th block of $R_k$ entries in the $i$th row of $Z$ with the values $\hat{p}(c_{kr} \mid x_i, g_i)$, and the rest with 0s.
3. Multivariate nonparametric regression: Fit a multi-response adaptive nonparametric regression of $Z$ on $X$, giving fitted values $\hat{Z}$. Let $\eta^*(x)$ be the vector of fitted regression functions.
4. Optimal scores: Let $\Theta$ be the largest $K$ non-trivial eigenvectors of $Z^T \hat{Z}$, with normalization $\Theta^T D_p \Theta = I_K$.
5. Update: Update the final model from step 3 using the optimal scores: $\eta(x) \leftarrow \Theta^T \eta^*(x)$, and update $\hat{p}(c_{kr} \mid x_i, g_i)$ and $\hat\pi_{kr}$.