Prepared By
Ms. W. Ancy Breen, A.P/CSE
SRMIST
• A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis.
• While real data are never exactly multivariate normal, the normal density is often a useful approximation to the “true” population distribution because of a central limit effect.
• One advantage of the multivariate normal distribution is that it is mathematically tractable, so “nice” results can be obtained.
• Many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both a population model for certain natural phenomena and an approximate sampling distribution for many statistics.
• The multivariate normal distribution is completely determined by its mean vector and covariance matrix.
• The sample mean vector and sample covariance matrix constitute a sufficient set of statistics.
• Fundamental aspect of multivariate analysis: the measurement and analysis of dependence
  • between variables
  • between sets of variables
  • between variables and sets of variables
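As a minimal sketch of the two quantities named above, the sample mean vector and the sample covariance matrix can be computed with the standard library alone. The function names and the toy data are illustrative assumptions, not part of the original slides.

```python
from typing import List


def mean_vector(rows: List[List[float]]) -> List[float]:
    """Column-wise sample mean of an n x p data matrix."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def covariance_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [
        [
            sum((row[i] - m[i]) * (row[j] - m[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical data: 4 observations of 2 variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
xbar = mean_vector(data)
S = covariance_matrix(data)
```

The off-diagonal entries of `S` measure the dependence between pairs of variables; the diagonal entries are the individual variances.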
• Multiple correlation coefficient: an extension of correlation to the relationship of one variable to a set of variables.
• Partial correlation coefficient: a measure of dependence between two variables when the effects of other correlated variables have been removed.
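Both coefficients can be written in terms of ordinary pairwise correlations in the simplest cases. The sketch below covers one controlling variable (partial correlation) and two predictors (multiple correlation); the function names are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Ordinary (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_corr(r_y1, r_y2, r_12):
    """Multiple correlation of y with the pair of predictors (x1, x2)."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)
```

For instance, if x and y are correlated only because both depend on z, `partial_corr` returns a value near zero even though the raw correlation is large.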
• Univariate case: testing the hypothesis that the mean of a variable is zero.
• Multivariate case: testing the hypothesis that the vector of the means of several variables is the zero vector.
• Generalizing a test procedure from univariate to multivariate statistics requires taking the dependence between variables into account.
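For reference, one standard way to carry out this generalization is Hotelling's T² statistic, which the slides do not name. The notation below is assumed: n observations on p variables, sample mean x̄ (vector in the multivariate case), sample standard deviation s, and sample covariance matrix S.

```latex
% Univariate test of H_0: \mu = 0
t = \frac{\bar{x}}{s/\sqrt{n}},
\qquad
t^2 = n\,\bar{x}\,(s^2)^{-1}\,\bar{x}

% Multivariate test of H_0: \boldsymbol{\mu} = \mathbf{0}
T^2 = n\,\bar{\mathbf{x}}^{\top}\mathbf{S}^{-1}\bar{\mathbf{x}}

% Under H_0 and multivariate normality
\frac{n-p}{p\,(n-1)}\,T^2 \sim F_{p,\,n-p}
```

The inverse covariance matrix is where the dependence between variables enters: with p = 1 the statistic reduces to the squared t statistic.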
• Univariate normal distribution: the effect studied is the sum of many independent random effects.
• Multivariate normal distribution: the multiple measurements are sums of small independent effects.
• The central limit theorem leads to the univariate normal distribution for single variables.
• The general central limit theorem for several variables leads to the multivariate normal distribution.
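The central limit effect can be illustrated with a small simulation. This is a sketch under stated assumptions: each "small effect" is modeled as an independent Uniform(-1, 1) draw, and the number of effects, the number of trials, and the seed are arbitrary choices.

```python
import random
import statistics


def sum_of_effects(k, rng):
    """One measurement modeled as the sum of k independent Uniform(-1, 1) effects."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is reproducible
k = 30
sums = [sum_of_effects(k, rng) for _ in range(20000)]

# Each effect has mean 0 and variance 1/3, so the sum has variance k/3.
theory_sd = (k / 3) ** 0.5
sample_mean = statistics.fmean(sums)
sample_sd = statistics.stdev(sums)

# For a normal distribution about 68.3% of values lie within one standard deviation.
within_one_sd = sum(abs(s) <= theory_sd for s in sums) / len(sums)
```

Although no single effect is normally distributed, `within_one_sd` should come out close to the normal value of roughly 0.683.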
• Multivariate methods are well developed and can be studied in an organized and systematic way.
• Practical use:
  • The methods of analysis are mainly based on standard operations of matrix algebra.
  • The distributions of many of the statistics involved can be obtained exactly or at least characterized.
  • Optimum properties of procedures can be deduced.
• Resampling methods such as the bootstrap and cross-validation permit the evaluation of observed variability and of the significance of results.
• This reduces the reliance on tables of significance points.
• Nonparametric techniques are available when nothing is known about the underlying distributions.
• Space does not permit treatment of outliers or of transformations of variables to approximate normality and homoscedasticity.
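As a rough illustration of the resampling idea mentioned above, the bootstrap estimates the variability of a statistic by recomputing it on samples drawn with replacement from the observed data. The helper name `bootstrap_se`, the data values, and the number of resamples are hypothetical.

```python
import random
import statistics


def bootstrap_se(sample, stat, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on n_resamples samples drawn with replacement.
    replicates = [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


# Hypothetical measurements.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 4.0]

se_boot = bootstrap_se(data, statistics.fmean)
se_formula = statistics.stdev(data) / len(data) ** 0.5  # textbook s / sqrt(n)
```

For the mean, the bootstrap value can be compared against the textbook formula; the appeal of the method is that the same function works for statistics with no simple formula or table.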
• Image analysis using multivariate analysis
• Meteorological data analysis
• Social network analysis
• Others
• Joint distributions
• Marginal distributions
• Statistical independence
• Conditional distribution
• The Normal N(μ, σ²) distribution has a density of the form
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, for σ > 0.
• Definition: extend the notion of a Normal random variable to include constants, treating them as N(μ, 0) zero-variance (degenerate) random variables.
• A random vector X = (X1, ..., Xn) ∈ Rⁿ has a multivariate Normal distribution, or a jointly Normal distribution, if for every constant vector w ∈ Rⁿ the linear combination $w^{\top}X = w_1 X_1 + \dots + w_n X_n$ has a univariate Normal distribution.
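For completeness, the definition above connects to the usual density formula as follows. The symbols μ (mean vector) and Σ (covariance matrix) are introduced here as assumptions about notation; the density exists only when Σ is positive definite, whereas the linear-combination definition also covers degenerate cases.

```latex
% Multivariate normal density on R^n (requires \Sigma positive definite)
f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left( -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}
  \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

% Distribution of any linear combination
\mathbf{w}^{\top}\mathbf{X} \sim
  N\!\left( \mathbf{w}^{\top}\boldsymbol{\mu},\;
  \mathbf{w}^{\top}\Sigma\,\mathbf{w} \right)
```

When w is chosen so that wᵀΣw = 0, the linear combination is a constant, which is why constants are admitted as N(μ, 0) variables.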
• Two different experiments:
  • Rolling a die
  • Flipping a coin
• Identify an event from each experiment:
  • A = rolling an even number
  • B = flipping heads
• Intersection: occurs whenever event A and event B both occur.
  • Notation: either “A and B” or “A ∩ B”
  • Example: rolling an even number and flipping heads
• Experiment 1: rolling a die
  • Event A1: rolling an even number
  • Event A2: rolling an odd number
• Experiment 2: flipping a coin
  • Event B1: flipping heads
  • Event B2: flipping tails
• Four possible intersections:
  • Rolling even and flipping heads
  • Rolling even and flipping tails
  • Rolling odd and flipping heads
  • Rolling odd and flipping tails
• Two different experiments:
  • Rolling a die
  • Flipping a coin
• Identify an event from each experiment:
  • A = rolling an even number
  • B = flipping heads
• Joint distribution: the probability that the intersection of two events occurs.
• Notation: either “P(A and B)” or “P(A ∩ B)”
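The die-and-coin example can be checked by enumeration. This sketch assumes a fair six-sided die and a fair coin that do not influence each other, so all 12 combined outcomes are equally likely; the slides do not state these assumptions explicitly.

```python
from fractions import Fraction
from itertools import product

# All 12 combined outcomes, assumed equally likely (fair die, fair coin).
outcomes = list(product(range(1, 7), ("H", "T")))


def prob(event):
    """Probability of an event given as a predicate on (die, coin) outcomes."""
    hits = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(hits, len(outcomes))


def is_even(outcome):
    return outcome[0] % 2 == 0


def is_heads(outcome):
    return outcome[1] == "H"


# Joint probabilities of the four intersections.
joint = {
    ("even", "heads"): prob(lambda o: is_even(o) and is_heads(o)),
    ("even", "tails"): prob(lambda o: is_even(o) and not is_heads(o)),
    ("odd", "heads"): prob(lambda o: not is_even(o) and is_heads(o)),
    ("odd", "tails"): prob(lambda o: not is_even(o) and not is_heads(o)),
}
```

Under these assumptions each of the four intersections has probability 1/4, and the four joint probabilities sum to 1.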
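• Outcomes for one experiment are listed along the rows.
• Outcomes for the other experiment are listed at the top of the columns.
• Joint probabilities go inside the table; for example, the cell in the first row and first column holds the probability that the first row outcome and the first column outcome both occur.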
• Subjects in a sample are asked if they smoke.
• Results are further broken down by gender.
• Table of Probabilities:
• Question: What is the probability that a person is male and does not smoke?
• Marginal distribution: the probability that an individual event from one experiment occurs, regardless of the outcomes from another experiment.
  • Computed by adding the joint probabilities across the row (or down the column) of the desired event
  • Always involves only one experiment
  • Named for the fact that marginal probabilities are written in the margins of the table
  • Notation: P(A1)
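The row-and-column summation described above can be sketched as follows. The joint probabilities are hypothetical values chosen only for illustration (they are not taken from any table in the slides), and the labels reuse the A1/A2 and B1/B2 notation.

```python
from fractions import Fraction

# Hypothetical joint probability table: rows are A1/A2, columns are B1/B2.
joint = {
    ("A1", "B1"): Fraction(30, 100),
    ("A1", "B2"): Fraction(20, 100),
    ("A2", "B1"): Fraction(35, 100),
    ("A2", "B2"): Fraction(15, 100),
}


def marginal_row(table, a):
    """P(a): add the joint probabilities across the row for event a."""
    return sum(p for (row, _), p in table.items() if row == a)


def marginal_col(table, b):
    """P(b): add the joint probabilities down the column for event b."""
    return sum(p for (_, col), p in table.items() if col == b)


p_a1 = marginal_row(joint, "A1")
p_b1 = marginal_col(joint, "B1")
```

Exact fractions are used so that the row marginals and the column marginals each sum to exactly 1.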
• Subjects are asked if they would vote for a qualified woman for President. Results are broken down by gender.
• When one event occurs, it may affect the probability of an event from a different experiment.
• Conditional distribution: the probability that a second event (B) will occur given that we know that the first event (A) has already occurred.
• Note: A and B come from two different experiments.
• Notation: P(B | A), where the vertical bar “|” means “given”.
• To calculate the conditional probability:
  1. Find the joint probability of A and B.
  2. Find the marginal probability of the event that has already occurred (event A).
  3. Divide the joint probability by the marginal probability.
• College students were asked if they have ever cheated on an exam. Results were broken down by gender.
• Question: given that a student has cheated, what is the probability that the student is male?
• Answer: P(Male | Cheater) = P(Male and Cheater) / P(Cheater) = 0.32 / 0.60 ≈ 0.5333
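The calculation can be reproduced with a few lines of code, using the two figures given in the example (joint probability 0.32, marginal probability 0.60). The function name `conditional` is an illustrative choice.

```python
def conditional(joint_ab: float, marginal_a: float) -> float:
    """P(B | A) = P(A and B) / P(A); undefined when P(A) is zero."""
    if marginal_a <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return joint_ab / marginal_a


# Figures from the cheating example: P(Male and Cheater) and P(Cheater).
p_male_given_cheater = conditional(0.32, 0.60)
```

Because the joint probability can never exceed the marginal, the result always lies between 0 and 1; here it is about 0.5333.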

Multivariate and Conditional Distribution
