2. A generalization of the familiar bell-shaped normal
density to several dimensions plays a fundamental role in
multivariate analysis.
While real data are never exactly multivariate normal, the
normal density is often a useful approximation to the
“true” population distribution because of a central limit
effect.
One advantage of the multivariate normal distribution
stems from the fact that it is mathematically tractable
and “nice” results can be obtained.
Many real-world problems fall naturally within the
framework of normal theory. The importance of the
normal distribution rests on its dual role as both
population model for certain natural phenomena and
approximate sampling distribution for many statistics.
3. The multivariate normal distribution is completely
determined by its mean vector and covariance matrix.
The sample mean vector and covariance matrix
constitute a sufficient set of statistics.
Fundamental aspects of multivariate analysis:
The measurement and analysis of dependence
Between variables
Between sets of variables
Between variables and sets of variables
4. Multiple correlation coefficient:
extension of correlation to the relationship
of one variable to a set of variables.
Partial correlation coefficient:
A measure of dependence between two
variables when the effects of other correlated
variables have been removed.
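As a rough illustration of the partial correlation idea, the sketch below uses the standard first-order formula for two variables with a single controlling variable. The function name `partial_correlation` and its argument names are illustrative and do not come from the slides.

```python
import math


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation of variables 1 and 2, controlling for 3.

    Uses r12.3 = (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2)),
    where the inputs are ordinary pairwise correlation coefficients.
    """
    denom = math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    if denom == 0.0:
        raise ValueError("variable 3 is perfectly correlated with variable 1 or 2")
    return (r12 - r13 * r23) / denom
```

When variable 3 is uncorrelated with both of the others, the partial correlation reduces to the ordinary correlation r12.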
5. Univariate case:
Testing the hypothesis that the mean of a
variable is zero.
Multivariate case:
Testing the hypothesis that the vector of
the means of several variables is the zero
vector.
Generalizing a test procedure from univariate
to multivariate statistics requires taking
into account the dependence between
variables.
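One common way to carry the univariate test over to a mean vector is Hotelling's T² statistic, which the slides do not name explicitly. The sketch below computes it for two variables only, with the 2x2 inverse written out by hand. The function name and the sample data are hypothetical.

```python
from statistics import mean


def hotelling_t2_two_vars(data):
    """T^2 = n * xbar' S^{-1} xbar for the hypothesis: mean vector = (0, 0).

    `data` is a list of (x1, x2) observations; S is the sample
    covariance matrix with divisor n - 1.
    """
    n = len(data)
    x1 = [row[0] for row in data]
    x2 = [row[1] for row in data]
    m1, m2 = mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    # Quadratic form xbar' S^{-1} xbar using the explicit 2x2 inverse
    quad = (m1 ** 2 * s22 - 2 * m1 * m2 * s12 + m2 ** 2 * s11) / det
    return n * quad


# Hypothetical observations, made up for illustration only
sample = [(2.0, 3.0), (0.0, -1.0), (3.0, 0.0), (-1.0, 2.0)]
t2 = hotelling_t2_two_vars(sample)
```

The cross term involving s12 is where the dependence between the two variables enters the statistic; with s12 = 0 it reduces to the sum of the two squared univariate t statistics.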
6. Univariate normal distribution: the effect studied is
the sum of many independent random effects.
Multivariate normal distribution: the multiple
measurements are sums of small independent
effects.
Central limit theorem leads to the univariate
normal distribution for single variables.
The general central limit theorem for several
variables leads to the multivariate normal
distribution.
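The central limit effect for a single variable can be checked with a small simulation. The sketch below is a minimal illustration under assumed settings: each measurement is the sum of 48 independent uniform effects, and the seed and sample size are arbitrary choices.

```python
import random
import statistics


def sum_of_small_effects(rng: random.Random, k: int = 48) -> float:
    """One measurement modeled as the sum of k independent uniform effects."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sum_of_small_effects(rng) for _ in range(2000)]

sample_mean = statistics.fmean(draws)
sample_sd = statistics.stdev(draws)

# Each uniform(-0.5, 0.5) effect has variance 1/12, so the sum of 48
# effects has mean 0 and standard deviation sqrt(48 / 12) = 2.
# Under a normal approximation, roughly 68% of draws lie within one sd of 0.
share_within_one_sd = sum(abs(d) <= 2.0 for d in draws) / len(draws)
```

The sample mean, standard deviation, and share within one standard deviation should come out close to 0, 2, and 0.68 respectively, even though no individual effect is normally distributed.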
7. Multivariate methods are developed and can be
studied in an organized and systematic way.
Practical use
The methods of analysis are mainly based on
standard operations of matrix algebra.
The distributions of many statistics involved can
be obtained exactly or at least characterized.
Optimum properties of procedures can be
deduced.
8. Permit the evaluation of observed variability and
significance of results by resampling methods
such as bootstrap and cross-validation.
Reduces the reliance on tables of significance
points
Nonparametric techniques are available when
nothing is known about the underlying
distributions.
Space does not permit treatment of outliers and
transformation of variables to approximate
normality and homoscedasticity.
9. Image analysis using multivariate
analysis
Meteorological data analysis
Social network analysis
Others
11. The Normal N(μ, σ²) with σ > 0 has a density of the form
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Definition: Extend the notion of a Normal random variable to
include constants as N(μ, 0) zero-variance (degenerate) random
variables.
A random vector X = (X1, . . . , Xn) ∈ R^n has a multivariate
Normal distribution or a jointly Normal distribution if for every
constant vector w ∈ R^n the linear combination
$w^\top X = w_1 X_1 + \cdots + w_n X_n$
has a univariate Normal distribution.
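If X is multivariate Normal with mean vector μ and covariance matrix Σ, the linear combination w'X is N(w'μ, w'Σw). The sketch below computes those two parameters in plain Python. The function name and the numeric values of μ and Σ are hypothetical, chosen only for illustration.

```python
def linear_combination_params(w, mu, sigma):
    """Mean and variance of w'X when X has mean vector mu and covariance sigma."""
    n = len(w)
    mean = sum(w[i] * mu[i] for i in range(n))
    variance = sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))
    return mean, variance


# Hypothetical parameters, chosen only for illustration
mu = [1.0, 2.0]
sigma = [[2.0, 0.5],
         [0.5, 1.0]]

mean_diff, var_diff = linear_combination_params([1.0, -1.0], mu, sigma)  # X1 - X2
mean_zero, var_zero = linear_combination_params([0.0, 0.0], mu, sigma)   # w = 0
```

With w = 0 the combination is the constant 0, that is N(0, 0), which is why the definition needs the degenerate zero-variance case.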
13. Two different experiments
Rolling a die
Flipping a coin
Identify an event from each experiment
A=Rolling even number
B=Flipping heads
Intersection: occurs whenever event A and event B both
occur.
Notation: Either “A and B” or “A ∩ B”
Rolling an even number and flipping heads
14. Experiment 1: Rolling a die
Event A1: Rolling an even number
Event A2: Rolling an odd number
Experiment 2: Flipping a coin
Event B1: Flipping heads
Event B2: Flipping tails
Four possible intersections
Rolling even and flipping heads
Rolling even and flipping tails
Rolling odd and flipping heads
Rolling odd and flipping tails
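The four intersections listed above are simply every pairing of one die event with one coin event. A minimal sketch of that enumeration, with illustrative variable names:

```python
from itertools import product

die_events = ["even", "odd"]       # events A1, A2 from rolling a die
coin_events = ["heads", "tails"]   # events B1, B2 from flipping a coin

# Every pairing of one die event with one coin event is an intersection
intersections = [
    f"Rolling {a} and flipping {b}"
    for a, b in product(die_events, coin_events)
]

for description in intersections:
    print(description)
```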
15. Two different experiments
Rolling a die
Flipping a coin
Identify an event from each experiment
A=Rolling even number
B=Flipping heads
Joint Distribution: the probability that the
intersection of two events occurs
Notation: Either “P(A and B)” or “P(A ∩ B)”
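For the die and coin example, the joint probability can be found by counting outcomes. The sketch below assumes a fair die and a fair coin, which the slides do not state explicitly; the helper `probability` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Assumption: fair die and fair coin, so all 12 (roll, flip) pairs are equally likely
sample_space = list(product(range(1, 7), ["heads", "tails"]))


def probability(event) -> Fraction:
    """Probability of an event given as a predicate on (roll, flip) pairs."""
    favorable = sum(1 for outcome in sample_space if event(*outcome))
    return Fraction(favorable, len(sample_space))


# A = rolling an even number, B = flipping heads
p_a_and_b = probability(lambda roll, flip: roll % 2 == 0 and flip == "heads")
```

Three of the twelve equally likely outcomes satisfy both events, so P(A ∩ B) = 3/12 = 1/4 under these assumptions.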
16. Outcomes for one experiment listed along
the rows.
Outcomes for other experiment listed at top
of columns.
Joint probabilities go inside the table
(for example, first row, first column).
17. Subjects in a sample are asked if they smoke.
Results are further broken down by gender.
Table of Probabilities:
Question: What is the probability that a
person is male and does not smoke?
18. Marginal Distribution: Probability that an
individual event from one experiment occurs,
regardless of the outcomes from another
experiment
Computed by adding the probabilities across
the row (or down the column) of the desired
event
Marginal probabilities always involve only one experiment
They get their name from the fact that they are
written in the margins of the table
Notation: P(A1)
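Adding across rows and down columns can be sketched on the die and coin events from the earlier slides. The joint table below assumes a fair die and a fair coin, so each cell is 1/4; the variable names are illustrative.

```python
from fractions import Fraction

# Joint table for the die and coin events, assuming a fair die and a fair coin
quarter = Fraction(1, 4)
joint = {
    "even": {"heads": quarter, "tails": quarter},
    "odd":  {"heads": quarter, "tails": quarter},
}

# Row marginals: add across each row
row_marginals = {a: sum(row.values()) for a, row in joint.items()}

# Column marginals: add down each column
col_marginals = {b: sum(joint[a][b] for a in joint) for b in ["heads", "tails"]}
```

Each set of marginals sums to 1, which is a quick consistency check on any table of joint probabilities.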
19. Subjects are asked if they would vote for a
qualified woman for President. Results are
broken down by gender.
20. When one event occurs, it may impact the
probability of an event from a different
experiment.
Conditional Distribution: The probability that
a second event (B) will occur given that we
know that the first event (A) has already
occurred.
Note: A and B come from two different
experiments
Notation: P(B | A). The vertical bar “|” means
“given”
21. To calculate the conditional probability:
Find the joint probability of A and B
Find the marginal probability of the event
that has already occurred (Event A)
Divide the joint probability by the marginal
probability
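The three steps above amount to a single division. A minimal sketch, using the die and coin events and assuming a fair die and a fair coin; the function name is hypothetical.

```python
def conditional_probability(joint_ab: float, marginal_a: float) -> float:
    """P(B | A) = P(A and B) / P(A)."""
    if marginal_a <= 0:
        raise ValueError("P(A) must be positive to condition on A")
    return joint_ab / marginal_a


# Assuming a fair die and coin: P(even and heads) = 1/4, P(even) = 1/2
p_heads_given_even = conditional_probability(0.25, 0.5)
```

Here the result equals P(heads) = 1/2, because the die roll carries no information about the coin flip.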
22. College students were asked if they have ever
cheated on an exam. Results were broken
down by gender.
Question: Given that a student has
cheated, what is the probability he is male?
Answer: P(Male | Cheater) = P(Male and Cheater) / P(Cheater)
= 0.32/0.60 ≈ 0.5333