Multidimensional Data
Dr. Ashutosh Satapathy
Assistant Professor, Department of CSE
VR Siddhartha Engineering College
Kanuru, Vijayawada
October 19, 2022
Outline
1. Multivariate and High-Dimensional Problems
2. Visualisation
   - Three-Dimensional Visualisation
   - Parallel Coordinate Plots
3. Multivariate Random Vectors and Data
   - Population Case
   - Sample Case
   - Multivariate Random Vectors
   - Gaussian Random Vectors
   - Marginal and Conditional Normal Distributions
Multivariate and High-Dimensional Problems
Early in the twentieth century, scientists such as Pearson (1901),
Hotelling (1933) and Fisher (1936) developed methods for
analysing multivariate data in order to
1 Understand the structure in the data and summarise it in simpler
ways.
2 Understand the relationship of one part of the data to another part.
3 Make decisions and inferences based on the data.
The early methods these scientists developed are linear; as time
moved on, more complex methods were developed.
The essential structure of these data sets can often be obscured by noise.
The aim is to reduce the original data in such a way that informative and interesting structure in the data is preserved, while noisy, irrelevant or purely random variables, dimensions or features are removed, as these can adversely affect the analysis.
Multivariate and High-Dimensional Problems
Traditionally one assumes that the dimension d is small compared to
the sample size n.
Many recent data sets do not fit into this framework; we encounter
the following problems.
Data whose dimension is comparable to the sample size, and both
are large.
High-dimension, low sample size data, whose dimension d vastly exceeds the sample size n, so d ≫ n.
Functional data whose observations are functions.
High-dimensional and functional data pose special challenges.
Visualisation
Before we analyse a set of data, it is important to look at it.
Often we get useful clues such as skewness, bi- or multi-modality,
outliers, or distinct groupings.
Graphical displays are exploratory data-analysis tools, which, if
appropriately used, can enhance our understanding of data.
Visual clues are easier to understand and interpret than numbers
alone, and the information you can get from graphical displays can
help you understand answers that are based on numbers.
Three-Dimensional Visualisation
Two-dimensional scatter-plots are a natural – though limited – way
of looking at data with three or more variables.
As the number of variables, and therefore the dimension, increases, we can of course still display three of the d dimensions in scatter-plots, but it is less clear how one can look at more than three dimensions in a single plot.
Figure 2.1 displays 10,000 observations of the three variables CD3, CD8 and CD4 from the five-dimensional HIV+ and HIV− data sets.
The data sets contain measurements of blood cells relevant to HIV.
Three-Dimensional Visualisation
Figure 2.1: HIV+ data (left) and HIV- data (right) of variables CD3, CD8 and
CD4.
There are differences between the point clouds in the two figures, and an
important task is to exhibit and quantify the differences.
Three-Dimensional Visualisation
The data of Figure 2.1 are projected onto a number of orthogonal directions, and the lower-dimensional projected data are displayed in Figure 2.2.
Figure 2.2: Orthogonal projections of the HIV+ data (left) and the HIV− data (right).
Three-Dimensional Visualisation
We can see a smaller fourth cluster in the top right corner of the
HIV- data, which seems to have almost disappeared in the HIV+ data
in the left panel.
Many of the methods we explore use projections: Principal
Component Analysis, Factor Analysis, Multidimensional Scaling,
Independent Component Analysis and Projection Pursuit.
In each case the projections focus on different aspects and properties
of the data.
Three-Dimensional Visualisation
Figure 2.3: Three different species of Iris flowers.
We display the four variables of Fisher’s iris data – sepal length, sepal
width, petal length and petal width – in a sequence of three-
dimensional scatter-plots.
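As an illustration of such three-dimensional scatter-plots, the short sketch below draws one of the four panels of Figure 2.4 from Fisher's iris data. It is not the original code behind the slides; it assumes scikit-learn (for the iris data) and matplotlib are installed, and the colour coding follows the figure caption.

```python
# A minimal sketch (not from the slides): one 3-D scatter-plot of Fisher's
# iris data, features 1, 2 and 3, one colour per species.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target          # X: 150 x 4, columns = features 1..4

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # 3-D axes
for label, colour in zip(range(3), ["red", "green", "black"]):
    sel = y == label
    ax.scatter(X[sel, 0], X[sel, 1], X[sel, 2], c=colour,
               label=iris.target_names[label])
ax.set_xlabel("sepal length")
ax.set_ylabel("sepal width")
ax.set_zlabel("petal length")
ax.legend()
plt.show()
```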
Three-Dimensional Visualisation
Figure 2.4: Features 1, 2 and 3 (top left), features 1, 2 and 4 (top right), features
1, 3 and 4 (bottom left) and features 2, 3 and 4 (bottom right).
Red refers to Setosa, green to Versicolor and black to Virginica.
Parallel Coordinate Plots
As the dimension grows, three-dimensional scatter-plots become
less relevant, unless we know that only some variables are important.
An alternative, which allows us to see all variables at once, is to present
the data in the form of parallel coordinate plots.
The idea is to present the data as two-dimensional graphs.
The variable numbers are represented as values on the y-axis in a
vertical parallel coordinate plot.
For a vector X = [X1,..., Xd ]T we represent the first variable X1 by
the point (X1, 1) and the jth variable Xj by (Xj , j).
Finally, we connect the d points by a line which goes from (X1, 1) to
(X2, 2) and so on to (Xd , d).
We apply the same rule to each subsequent d-dimensional feature vector.
Figure 2.5 shows a vertical parallel coordinate plot for Fisher’s iris
data.
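The construction described above translates directly into code. The sketch below is not from the slides; it uses pandas.plotting.parallel_coordinates, which draws the variables along the x-axis (the horizontal variant discussed later), and it assumes pandas, matplotlib and scikit-learn are available.

```python
# A minimal sketch (not from the slides): a parallel coordinate plot of the
# iris data with one colour per species, as in Figure 2.5.
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()                        # 4 feature columns + 'target'
df["species"] = iris.target_names[iris.target]

parallel_coordinates(df.drop(columns="target"), class_column="species",
                     color=["red", "green", "black"])
plt.ylabel("feature value")
plt.show()
```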
Parallel Coordinate Plots
Figure 2.5: Iris data with variables represented on the y-axis and separate colours
for the three species.
Red refers to the observations of Setosa, green to those of
Versicolor and black to those of Virginica.
Unlike the previous Figure 2.4, Figure 2.5 tells us that dimension 3 (petal length) separates the two groups most strongly.
Parallel Coordinate Plots
Instead of the three colours shown in Figure 2.5, a different colour can be used for each observation, as in Figure 2.6.
In a horizontal parallel coordinate plot, the x-axis represents the
variable numbers 1, ..., d. For a feature vector X = [X1 ··· Xd]T,
the first variable gives rise to the point (1, X1) and the jth variable
Xj to (j, Xj).
The d points are connected by a line, starting with (1, X1), then (2,
X2), until we reach (d, Xd).
Because the variables are presented along the x-axis, horizontal parallel coordinate plots are the form most often used.
The differently coloured lines make it easier to trace particular
observations in Figure 2.6.
Parallel Coordinate Plots
Figure 2.6: Parallel coordinate view of the illicit drug market data.
Figure 2.6 shows the 66 monthly observations on 15 features or
variables of the illicit drug market data.
Each observation (month) is displayed in a different colour.
Looking at variable 5, heroin overdose, the question arises whether
there could be two groups of observations corresponding to the high
and low values of this variable.
Population Case
In data science, population is the entire set of items from which you
draw data for a statistical study. It can be a group of individuals, a
set of items, etc.
Generally, population refers to the people who live in a particular area
at a specific time. But in data science, population refers to data on
your study of interest.
It can be a group of individuals, objects, events, organizations, etc.
You use populations to draw conclusions. An example of a population would be the entire student body at a school, with the problem statement being the percentage of students who speak English fluently.
If you had to collect the same data from the entire country of India, it would be impossible to draw reliable conclusions because of geographical and accessibility constraints, making the data biased towards certain regions or groups.
Sample Case
A sample is defined as a smaller and more manageable representation of a larger group: a subset of a larger population that retains the characteristics of that population.
A sample is used in testing when the population size is too large for
all members or observations to be included in the test.
The sample is an unbiased subset of the population that best
represents the whole data.
The process of collecting data from a small subsection of the
population and then using it to generalize over the entire set is called
sampling.
Samples are used when the population is too large or effectively unlimited in size, so that collecting reliable data on every member is not feasible.
A sample should generally be unbiased and satisfy all variations
present in a population. A sample should typically be chosen at
random.
Multivariate Random Vectors
Random vectors are vector-valued functions defined on a sample space.
A vector-valued function is a mathematical function of one or more variables whose range is a set of multidimensional vectors or infinite-dimensional vectors.
In Cartesian 3-space it can be represented as v(t) = ⟨f(t), g(t), h(t)⟩, where v(t) is the vector function and f(t), g(t) and h(t) are the coordinate functions.
We refer to a collection of random vectors as the data or the random sample.
Specific feature values are measured for each of the random vectors in the collection.
We call these values the realised or observed values of the data, or simply the observed data.
The observed values are no longer random.
The Population Case
Let
\[
X = [X_1\ X_2\ \cdots\ X_d]^T \tag{1}
\]
be a random vector from a distribution F: R^d → [0, 1]. The individual X_j, with j ≤ d, are random variables, also called the variables, components or entries of X; X is d-dimensional or d-variate.
X has a finite d-dimensional mean or expected value EX and a finite d×d covariance matrix var(X):
\[
\mu = EX, \qquad \Sigma = \operatorname{var}(X) = E\left[(X - \mu)(X - \mu)^T\right] \tag{2}
\]
The mean µ and covariance matrix Σ are
\[
\mu = [\mu_1\ \mu_2\ \cdots\ \mu_d]^T, \qquad
\Sigma = \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2
\end{pmatrix} \tag{3}
\]
The Population Case
Here σ_j^2 = var(X_j) and σ_jk = cov(X_j, X_k); we also write σ_jj for the diagonal elements σ_j^2 of Σ.
\[
X \sim (\mu, \Sigma) \tag{4}
\]
Equation 4 is shorthand for a random vector X which has mean µ and covariance matrix Σ.
If X is a d-dimensional random vector and A is a d × k matrix, for some k ≥ 1, then A^T X is a k-dimensional random vector.

Result 1.1
Let X ∼ (µ, Σ) be a d-variate random vector. Let A and B be matrices of size d × k and d × l, respectively.
a. The mean and covariance matrix of the k-variate random vector A^T X are A^T X ∼ (A^T µ, A^T Σ A).
b. The random vectors A^T X and B^T X are uncorrelated if and only if A^T Σ B = 0_{k×l} (all entries are 0).
The Population Case
Question 1: Suppose you have a set of n = 5 data items, representing 5 insects, where each data item has a height (X), width (Y) and speed (Z) (therefore d = 3).

Table 3.1: Three features of five different insects.

Insect  Height (cm)  Width (cm)  Speed (m/s)
I1      0.64         0.58        0.29
I2      0.66         0.57        0.33
I3      0.68         0.59        0.37
I4      0.69         0.66        0.46
I5      0.73         0.60        0.55

Solution:
\[
\mu = \left[\frac{0.64+0.66+0.68+0.69+0.73}{5},\ \frac{0.58+0.57+0.59+0.66+0.60}{5},\ \frac{0.29+0.33+0.37+0.46+0.55}{5}\right]^T = [0.68,\ 0.60,\ 0.40]^T
\]
The Population Case
\[
I_1 - \mu = [-0.04, -0.02, -0.11]^T, \quad (I_1-\mu)(I_1-\mu)^T = \begin{pmatrix} 0.0016 & 0.0008 & 0.0044 \\ 0.0008 & 0.0004 & 0.0022 \\ 0.0044 & 0.0022 & 0.0121 \end{pmatrix}
\]
\[
I_2 - \mu = [-0.02, -0.03, -0.07]^T, \quad (I_2-\mu)(I_2-\mu)^T = \begin{pmatrix} 0.0004 & 0.0006 & 0.0014 \\ 0.0006 & 0.0009 & 0.0021 \\ 0.0014 & 0.0021 & 0.0049 \end{pmatrix}
\]
\[
I_3 - \mu = [0, -0.01, -0.03]^T, \quad (I_3-\mu)(I_3-\mu)^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0.0001 & 0.0003 \\ 0 & 0.0003 & 0.0009 \end{pmatrix}
\]
\[
I_4 - \mu = [0.01, 0.06, 0.06]^T, \quad (I_4-\mu)(I_4-\mu)^T = \begin{pmatrix} 0.0001 & 0.0006 & 0.0006 \\ 0.0006 & 0.0036 & 0.0036 \\ 0.0006 & 0.0036 & 0.0036 \end{pmatrix}
\]
\[
I_5 - \mu = [0.05, 0, 0.15]^T, \quad (I_5-\mu)(I_5-\mu)^T = \begin{pmatrix} 0.0025 & 0 & 0.0075 \\ 0 & 0 & 0 \\ 0.0075 & 0 & 0.0225 \end{pmatrix}
\]
The Population Case
\[
\Sigma = \frac{1}{n}\sum_{i=1}^{n}(I_i - \mu)(I_i - \mu)^T
= \frac{1}{5}\begin{pmatrix} 0.0046 & 0.0020 & 0.0139 \\ 0.0020 & 0.0050 & 0.0082 \\ 0.0139 & 0.0082 & 0.0440 \end{pmatrix}
= \begin{pmatrix} 9.2\mathrm{E}{-4} & 4.0\mathrm{E}{-4} & 0.00278 \\ 4.0\mathrm{E}{-4} & 1.0\mathrm{E}{-3} & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}
\]
Definition
Mean: The mean is the average of a collection of numbers.
Variance: The expectation of the squared deviation of a random variable from its mean.
Covariance: A measure of the relationship between two random variables and of the extent to which they change together.
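The computation of Question 1 can be checked with a few lines of NumPy. The sketch below is ours, not from the slides; the array name insects and the layout (one row per insect) are our choices.

```python
# A minimal sketch (not from the slides) reproducing the population mean and
# covariance matrix of Question 1 with NumPy.
import numpy as np

# Rows: I1..I5; columns: height (cm), width (cm), speed (m/s).
insects = np.array([[0.64, 0.58, 0.29],
                    [0.66, 0.57, 0.33],
                    [0.68, 0.59, 0.37],
                    [0.69, 0.66, 0.46],
                    [0.73, 0.60, 0.55]])

mu = insects.mean(axis=0)                    # population mean
centred = insects - mu
Sigma = centred.T @ centred / len(insects)   # divide by n (population case)

print(mu)      # [0.68 0.60 0.40]
print(Sigma)   # matches the slide's matrix, up to floating-point rounding
```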
The Population Case
Question 2: Verify Result 1.1a, i.e. µ_{A^T X} = A^T µ_X and Σ_{A^T X} = A^T Σ_X A, for
\[
A = \begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}, \qquad
X = \begin{pmatrix} 0.64 & 0.66 & 0.68 & 0.69 & 0.73 \\ 0.58 & 0.57 & 0.59 & 0.66 & 0.60 \\ 0.29 & 0.33 & 0.37 & 0.46 & 0.55 \end{pmatrix}
\]
Solution:
\[
A^T X = \begin{pmatrix} 0.0273 & 0.0288 & 0.0306 & 0.0342 & 0.0371 \\ 0.0296 & 0.0303 & 0.0319 & 0.0359 & 0.0363 \end{pmatrix}
\]
\[
\mu_{A^TX} = \begin{pmatrix} 0.0316 \\ 0.0328 \end{pmatrix} \quad\text{and}\quad
A^T\mu_X = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}\begin{pmatrix} 0.68 \\ 0.60 \\ 0.40 \end{pmatrix} = \begin{pmatrix} 0.0316 \\ 0.0328 \end{pmatrix}
\]
Hence, µ_{A^T X} = A^T µ_X.
The Population Case
\[
A^T\Sigma_X A = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}
\begin{pmatrix} 9.2\mathrm{E}{-4} & 4.0\mathrm{E}{-4} & 0.00278 \\ 4.0\mathrm{E}{-4} & 1.0\mathrm{E}{-3} & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}
\begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}
= \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix}
\]
\[
(A^TX)_1 - \mu_{A^TX} = [-0.0043, -0.0032]^T, \quad ((A^TX)_1 - \mu_{A^TX})((A^TX)_1 - \mu_{A^TX})^T = \begin{pmatrix} 0.00001849 & 0.00001376 \\ 0.00001376 & 0.00001024 \end{pmatrix}
\]
\[
(A^TX)_2 - \mu_{A^TX} = [-0.0028, -0.0025]^T, \quad ((A^TX)_2 - \mu_{A^TX})((A^TX)_2 - \mu_{A^TX})^T = \begin{pmatrix} 0.00000784 & 0.000007 \\ 0.000007 & 0.00000625 \end{pmatrix}
\]
\[
(A^TX)_3 - \mu_{A^TX} = [-0.001, -0.0009]^T, \quad ((A^TX)_3 - \mu_{A^TX})((A^TX)_3 - \mu_{A^TX})^T = \begin{pmatrix} 0.000001 & 0.0000009 \\ 0.0000009 & 0.00000081 \end{pmatrix}
\]
The Population Case
\[
(A^TX)_4 - \mu_{A^TX} = [0.0026, 0.0031]^T, \quad ((A^TX)_4 - \mu_{A^TX})((A^TX)_4 - \mu_{A^TX})^T = \begin{pmatrix} 0.00000676 & 0.00000806 \\ 0.00000806 & 0.00000961 \end{pmatrix}
\]
\[
(A^TX)_5 - \mu_{A^TX} = [0.0055, 0.0035]^T, \quad ((A^TX)_5 - \mu_{A^TX})((A^TX)_5 - \mu_{A^TX})^T = \begin{pmatrix} 0.00003025 & 0.00001925 \\ 0.00001925 & 0.00001225 \end{pmatrix}
\]
\[
\Sigma_{A^TX} = \frac{1}{n}\sum_{i=1}^{n}\big((A^TX)_i - \mu_{A^TX}\big)\big((A^TX)_i - \mu_{A^TX}\big)^T
= \frac{1}{5}\begin{pmatrix} 0.00006434 & 0.00004897 \\ 0.00004897 & 0.00003916 \end{pmatrix}
= \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix}
\]
Hence, Σ_{A^T X} = A^T Σ_X A.
The Population Case
Question 3: Verify Result 1.1b, i.e. whether A^T X and B^T X are correlated, for
\[
A = \begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}, \quad
B = \begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}, \quad
X = \begin{pmatrix} 0.64 & 0.66 & 0.68 & 0.69 & 0.73 \\ 0.58 & 0.57 & 0.59 & 0.66 & 0.60 \\ 0.29 & 0.33 & 0.37 & 0.46 & 0.55 \end{pmatrix}
\]
Solution:
\[
\Sigma = \begin{pmatrix} 9.2\mathrm{E}{-4} & 4.0\mathrm{E}{-4} & 0.00278 \\ 4.0\mathrm{E}{-4} & 1.0\mathrm{E}{-3} & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}
\]
\[
A^T\Sigma B = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}
\begin{pmatrix} 9.2\mathrm{E}{-4} & 4.0\mathrm{E}{-4} & 0.00278 \\ 4.0\mathrm{E}{-4} & 1.0\mathrm{E}{-3} & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}
\begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}
= \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix}
\]
Since A^T Σ B ≠ 0_{2×2}, A^T X and B^T X are correlated with each other.
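Both parts of Result 1.1 can also be verified numerically. The following sketch is ours rather than the slides' code; it reuses the data matrix X and the matrices A and B of Questions 2 and 3.

```python
# A minimal sketch (not from the slides) checking Result 1.1 for Questions 2 and 3.
import numpy as np

X = np.array([[0.64, 0.66, 0.68, 0.69, 0.73],
              [0.58, 0.57, 0.59, 0.66, 0.60],
              [0.29, 0.33, 0.37, 0.46, 0.55]])          # d x n data matrix
A = np.array([[0.02, 0.01], [0.01, 0.03], [0.03, 0.02]])
B = A.copy()                                            # Question 3 uses B = A

mu = X.mean(axis=1)
Sigma = (X - mu[:, None]) @ (X - mu[:, None]).T / X.shape[1]

Y = A.T @ X                                             # projected data, 2 x n
mu_Y = Y.mean(axis=1)
Sigma_Y = (Y - mu_Y[:, None]) @ (Y - mu_Y[:, None]).T / Y.shape[1]

print(np.allclose(mu_Y, A.T @ mu))           # Result 1.1a, mean: True
print(np.allclose(Sigma_Y, A.T @ Sigma @ A)) # Result 1.1a, covariance: True
print(A.T @ Sigma @ B)                       # non-zero, so A^T X and B^T X
                                             # are correlated (Result 1.1b)
```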
The Sample Case
Let X_1, ..., X_n be d-dimensional random vectors. We assume that the X_i are independent and from the same distribution F: R^d → [0, 1] with finite mean µ and covariance matrix Σ.
We omit reference to F when knowledge of the distribution is not required.
In statistics one often identifies a random vector with its observed values and writes X_i = x_i.
We explore properties of random samples but only encounter observed values of random vectors. For this reason we write
\[
X = [X_1\ X_2\ \cdots\ X_n] \tag{5}
\]
for the sample of independent random vectors X_i and call this collection a random sample or data.
The Sample Case
\[
X = \begin{pmatrix}
X_{11} & X_{21} & \cdots & X_{n1} \\
X_{12} & X_{22} & \cdots & X_{n2} \\
\vdots & \vdots & & \vdots \\
X_{1d} & X_{2d} & \cdots & X_{nd}
\end{pmatrix}
= \begin{pmatrix} X_{\bullet 1} \\ X_{\bullet 2} \\ \vdots \\ X_{\bullet d} \end{pmatrix} \tag{6}
\]
The ith column of X is the ith random vector X_i, and the jth row X_{•j} is the jth variable across all n random vectors; the i in X_{ij} refers to the ith vector X_i, and the j refers to the jth variable.
For data, the mean µ and covariance matrix Σ are usually not known; instead, we work with the sample mean X̄ and the sample covariance matrix S. This is represented by
\[
X \sim \mathrm{Sam}(\bar{X}, S) \tag{7}
\]
The sample mean and sample covariance matrix depend on the sample size n.
The Sample Case
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T \tag{8}
\]
Definitions of the sample covariance matrix in the literature use either the factor n^{-1} or (n-1)^{-1}; the factor (n-1)^{-1} is preferred because it yields an unbiased estimator of the population covariance matrix Σ.
\[
X_{\mathrm{cent}} = X - \bar{X} = [X_1 - \bar{X},\ X_2 - \bar{X},\ \ldots,\ X_n - \bar{X}] \tag{9}
\]
X_cent is the centred data, and it is of size d×n. Using this notation, the d×d sample covariance matrix S becomes
\[
S = \frac{1}{n-1}\, X_{\mathrm{cent}} X_{\mathrm{cent}}^T = \frac{1}{n-1}(X - \bar{X})(X - \bar{X})^T \tag{10}
\]
The Sample Case
The entries of the sample covariance matrix S are s_jk, where
\[
s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{ij} - m_j)(X_{ik} - m_k) \tag{11}
\]
Here X̄ = [m_1, ..., m_d]^T, and m_j is the sample mean of the jth variable.
As for the population, we write s_j^2 or s_jj for the diagonal elements of S.
Consider a ∈ R^d; then the projection of X onto a is a^T X.
Similarly, the projection of the data matrix X onto a is done element-wise for each random vector X_i and results in the 1×n vector a^T X.
The Sample Case
Question 4: The math and science scores of good, average and poor students from a class are given as follows:

Student  Math (X)  Science (Y)
1        92        68
2        55        30
3        100       78

Find the sample mean X̄, the sample covariance matrix S and s_12 of the above data.
Solution:
\[
X = \begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix}, \qquad
\bar{X} = \begin{pmatrix} \frac{92+55+100}{3} \\[2pt] \frac{68+30+78}{3} \end{pmatrix} = \begin{pmatrix} 82.33 \\ 58.66 \end{pmatrix}
\]
\[
X_1 - \bar{X} = [9.67, 9.34]^T, \quad (X_1 - \bar{X})(X_1 - \bar{X})^T = \begin{pmatrix} 93.5089 & 90.3178 \\ 90.3178 & 87.2356 \end{pmatrix}
\]
\[
X_2 - \bar{X} = [-27.33, -28.66]^T, \quad (X_2 - \bar{X})(X_2 - \bar{X})^T = \begin{pmatrix} 746.9289 & 783.2778 \\ 783.2778 & 821.3956 \end{pmatrix}
\]
The Sample Case
\[
X_3 - \bar{X} = [17.67, 19.34]^T, \quad (X_3 - \bar{X})(X_3 - \bar{X})^T = \begin{pmatrix} 312.2289 & 341.7378 \\ 341.7378 & 374.0356 \end{pmatrix}
\]
\[
S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T
= \frac{1}{2}\begin{pmatrix} 1152.6667 & 1215.3334 \\ 1215.3334 & 1282.6668 \end{pmatrix}
= \begin{pmatrix} 576.3335 & 607.6667 \\ 607.6667 & 641.3334 \end{pmatrix}
\]
From Equation 11,
\[
s_{12} = \frac{1}{2}\sum_{i=1}^{3}(X_{i1} - m_1)(X_{i2} - m_2)
= \frac{1}{2}\big[(92 - 82.33)(68 - 58.66) + (55 - 82.33)(30 - 58.66) + (100 - 82.33)(78 - 58.66)\big] = 607.6667
\]
The Sample Case
Question 5: Compute the projection of the matrix
\[
X = \begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix}
\]
onto the vector a = [−45, 45]^T.
Solution:
\[
P = a^T X = \begin{pmatrix} -45 & 45 \end{pmatrix}\begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix} = \begin{pmatrix} -1080 & -1125 & -990 \end{pmatrix}
\]
So the projection of X onto the vector [−45, 45]^T is a 1×3 matrix.
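Questions 4 and 5 can be reproduced with NumPy as a check on the hand calculations. The sketch below is ours; np.cov with ddof=1 implements the 1/(n-1) normalisation of Equation 8, and the final line is the projection a^T X of Question 5.

```python
# A minimal sketch (not from the slides) reproducing Questions 4 and 5.
import numpy as np

# Columns are students, rows are the variables math and science (a d x n matrix).
scores = np.array([[92.0, 55.0, 100.0],
                   [68.0, 30.0,  78.0]])

xbar = scores.mean(axis=1)                 # sample mean
S = np.cov(scores, ddof=1)                 # sample covariance, divides by n-1
print(xbar)      # approximately [82.33, 58.67]
print(S)         # approximately [[576.33, 607.67], [607.67, 641.33]]
print(S[0, 1])   # s12, approximately 607.67

# Question 5: projection of the data matrix onto a = [-45, 45]^T.
a = np.array([-45.0, 45.0])
print(a @ scores)                          # [-1080. -1125.  -990.]
```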
Gaussian Random Vectors
The univariate normal probability density function f is
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \tag{12}
\]
\[
X \sim N(\mu, \sigma^2) \tag{13}
\]
Equation 13 is shorthand for a random value from the univariate normal distribution with mean µ and variance σ^2.
Figure 3.1: Three normal pdfs based on 1000 random values each, with (µ, σ) equal to (0, 0.8), (−2, 1) and (3, 2), respectively.
Gaussian Random Vectors
The d-variate normal probability density function f is
\[
f(X) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}(X - \mu)^T\Sigma^{-1}(X - \mu)\right) \tag{14}
\]
\[
X \sim N(\mu, \Sigma) \tag{15}
\]
Equation 15 is shorthand for a d-dimensional random vector from the d-variate normal distribution with mean µ and covariance matrix Σ.
Figure 3.2: Two-dimensional normal pdf with µ = [1, 2]^T and Σ = [0.25 0.3; 0.3 1].
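As a quick check of Equation 14, the sketch below evaluates the density directly and compares it with scipy.stats.multivariate_normal, using the mean and covariance matrix of Figure 3.2. It is our own illustration, it assumes NumPy and SciPy are installed, and the evaluation point x is an arbitrary choice.

```python
# A minimal sketch (not from the slides): Equation (14) evaluated by hand and
# checked against SciPy for the parameters of Figure 3.2.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[0.25, 0.3], [0.3, 1.0]])
x = np.array([1.5, 2.5])          # arbitrary evaluation point

d = len(mu)
diff = x - mu
f = ((2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
     * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)))   # Equation (14)

print(f)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))        # same value
```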
Gaussian Random Vectors
Result 1.2
Let X ∼ N(µ, Σ) be d-variate, and assume that Σ^{-1} exists.
1. Let X_Σ = Σ^{-1/2}(X − µ); then X_Σ ∼ N(0, I_{d×d}), where I_{d×d} is the d×d identity matrix.
2. Let X² = (X − µ)^T Σ^{-1}(X − µ); then X² ∼ χ²_d, the chi-squared distribution with d degrees of freedom.

Question 6: Let X ∼ N(µ, Σ) be 2-variate, where µ = [2, 3]^T,
\[
\Sigma = \begin{pmatrix} 4 & 0 \\ 0 & 16 \end{pmatrix} \quad\text{and}\quad \Sigma^{-1} = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.0625 \end{pmatrix}.
\]
Verify Results 1.2.1 and 1.2.2.
Solution: (worked out numerically with simulated data and plots on the original slides; a sketch of one such verification is given below.)
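A possible way to carry out this verification is sketched below, under our own choices of random seed and variable names (the sample size 5000 matches the slides): the standardised data should have mean close to 0 and covariance close to the identity (Result 1.2.1), and the squared Mahalanobis distances should have mean close to d and variance close to 2d, as for a χ²_d distribution (Result 1.2.2).

```python
# A minimal sketch (not from the slides) checking Result 1.2 by simulation
# for the parameters of Question 6.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, 3.0])
Sigma = np.array([[4.0, 0.0], [0.0, 16.0]])
n, d = 5000, 2

X = rng.multivariate_normal(mu, Sigma, size=n)            # n x d sample

# Result 1.2.1: X_Sigma = Sigma^(-1/2) (X - mu) should be approx N(0, I).
Sigma_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Sigma)))   # Sigma is diagonal here
X_std = (X - mu) @ Sigma_inv_sqrt
print(X_std.mean(axis=0))          # approx [0, 0]
print(np.cov(X_std.T))             # approx identity

# Result 1.2.2: X^2 = (X - mu)^T Sigma^{-1} (X - mu) should be chi-squared(d).
X2 = np.sum(((X - mu) @ np.linalg.inv(Sigma)) * (X - mu), axis=1)
print(X2.mean(), X2.var())         # approx d = 2 and 2d = 4
```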
Gaussian Random Vectors
Hence, the quantity X² is a scalar random variable which has, as in the one-dimensional case, a χ² distribution, but this time with d degrees of freedom.
Fix a dimension d ≥ 1. Let X_i ∼ N(µ, Σ) be independent d-dimensional random vectors for i = 1, ..., n, with sample mean X̄ and sample covariance matrix S.
We define Hotelling's T² by
\[
T^2 = n(\bar{X} - \mu)^T S^{-1}(\bar{X} - \mu) \tag{16}
\]
Question 7: Compute Hotelling's T² for the sample X of size 2×5000 from Question 6.
Solution: (computed numerically on the original slides; a sketch follows below.)
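A sketch of the computation is given below; it is ours, not the slides' code, and it simulates a 2×5000 sample with the parameters of Question 6 before applying Equations 16 and 18.

```python
# A minimal sketch (not from the slides): Hotelling's T^2 for a simulated
# sample with the parameters of Question 6, and the scaled F statistic.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, 3.0])
Sigma = np.array([[4.0, 0.0], [0.0, 16.0]])
n, d = 5000, 2

X = rng.multivariate_normal(mu, Sigma, size=n)

xbar = X.mean(axis=0)
S = np.cov(X.T, ddof=1)                                   # sample covariance

diff = xbar - mu
T2 = n * diff @ np.linalg.solve(S, diff)                  # Equation (16)
F_stat = (n - d) / ((n - 1) * d) * T2                     # Equation (18)
print(T2, F_stat)   # both small when the data are drawn with the true mu
```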
Gaussian Random Vectors
Further, let Z_j ∼ N(0, Σ) for j = 1, ..., m be independent d-dimensional random vectors, and let
\[
W = \sum_{j=1}^{m} Z_j Z_j^T \tag{17}
\]
Gaussian Random Vectors
In Equation 17, W is the d×d random matrix generated by the Z_j.
W has the Wishart distribution W(m, Σ) with m degrees of freedom and covariance matrix Σ; m is the number of summands and Σ is the common d×d covariance matrix.

Result 1.3
Let X_i ∼ N(µ, Σ) be d-dimensional random vectors for i = 1, ..., n. Let S be the sample covariance matrix, and assume that S is invertible.
1. The sample mean X̄ satisfies X̄ ∼ N(µ, Σ/n).
2. For n observations X_i and their sample covariance matrix S there exist n−1 independent random vectors Z_j ∼ N(0, Σ) such that
\[
S = \frac{1}{n-1}\sum_{j=1}^{n-1} Z_j Z_j^T,
\]
so that (n−1)S has a W(n−1, Σ) Wishart distribution.
Gaussian Random Vectors
Result 1.3 (continued)
3. Assume that n > d. Let T² be given by Equation 16. It follows that
\[
\frac{n-d}{(n-1)d}\, T^2 \sim F_{d,\, n-d} \tag{18}
\]
the F distribution with d and n−d degrees of freedom.
That is, for random data of size n and dimension d from a Gaussian distribution, the scaled statistic (n−d)/[(n−1)d] · T² has an F distribution with d and n−d degrees of freedom.
Gaussian Random Vectors
Question 8: Let Z ∼ N(µ, Σ) be 2-variate, where µ = [0, 0]^T and
\[
\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 16 \end{pmatrix}.
\]
Compute the W and (n−1)S matrices, and plot the sample mean distribution.
Solution: (computed and plotted on the original slides; a sketch follows below.)
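One way to set up this experiment is sketched below. It is our own illustration: the number of summands m and the seed are arbitrary choices, W is built as in Equation 17, and (n−1)S is computed from an independent sample with the same covariance matrix so that the two can be compared.

```python
# A minimal sketch (not from the slides) for Question 8: W of Equation (17)
# versus (n-1)S for the same covariance matrix.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[4.0, 2.0], [2.0, 16.0]])
m = 999                                     # degrees of freedom for W

Z = rng.multivariate_normal(np.zeros(2), Sigma, size=m)
W = Z.T @ Z                                 # W = sum_j Z_j Z_j^T ~ W(m, Sigma)

n = m + 1                                   # so that (n-1)S has n-1 = m d.o.f.
X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
S = np.cov(X.T, ddof=1)

print(W / m)                                # approx Sigma
print((n - 1) * S / m)                      # approx Sigma as well
```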
Gaussian Random Vectors
The W matrix computed from the sample covariance matrix S closely matches the W matrix computed using the population mean µ.
Gaussian Random Vectors
Let X ∼ (µ, Σ) be d-dimensional. The multivariate normal probability density function f is
\[
f(X_i) = (2\pi)^{-d/2}\det(\Sigma)^{-1/2}\exp\left(-\tfrac{1}{2}(X_i - \mu)^T\Sigma^{-1}(X_i - \mu)\right) \tag{19}
\]
where det(Σ) is the determinant of Σ and X = [X_1, X_2, ..., X_n] is a sample of independent random vectors from the normal distribution with mean µ and covariance matrix Σ.
The normal or Gaussian likelihood (function) L is a function of the parameter θ of interest, conditional on the data:
\[
L(\theta\,|\,X) = (2\pi)^{-nd/2}\det(\Sigma)^{-n/2}\exp\left(-\tfrac{1}{2}\sum_{i=1}^{n}(X_i - \mu)^T\Sigma^{-1}(X_i - \mu)\right) \tag{20}
\]
The parameters of interest are the mean µ and the covariance matrix Σ, so θ = (µ, Σ).
Gaussian Random Vectors
The maximum likelihood estimator (MLE) of θ, denoted by θ̂, is
\[
\hat{\theta} = (\hat{\mu}, \hat{\Sigma}) \tag{21}
\]
\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \tag{22}
\]
\[
\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T = \frac{n-1}{n}\,S \tag{23}
\]
Here µ̂, X̄, Σ̂ and S are the estimated population mean, the sample mean, the estimated population covariance matrix and the sample covariance matrix, respectively.
Gaussian Random Vectors
Question 9: From a sample of size 5000 (Question 6), compute the maximum likelihood estimates of the population mean and population covariance matrix.
Solution: (computed numerically on the original slides; a sketch follows below.)
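A sketch of the computation is given below; it is ours rather than the slides' code, and it draws a fresh sample of size 5000 with the parameters of Question 6 before applying Equations 22 and 23.

```python
# A minimal sketch (not from the slides) for Question 9: MLEs of mu and Sigma.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([2.0, 3.0])
Sigma = np.array([[4.0, 0.0], [0.0, 16.0]])
n = 5000

X = rng.multivariate_normal(mu, Sigma, size=n)

mu_hat = X.mean(axis=0)                     # Equation (22): MLE of mu
S = np.cov(X.T, ddof=1)                     # sample covariance matrix
Sigma_hat = (n - 1) / n * S                 # Equation (23): MLE of Sigma

print(mu_hat)       # close to [2, 3]
print(Sigma_hat)    # close to [[4, 0], [0, 16]]
```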
Marginal and Conditional Normal Distributions
Consider a normal random vector X = [X_1, X_2, ..., X_d]^T. Let X^[1] be the vector consisting of the first d_1 entries of X, and let X^[2] be the vector consisting of the remaining d_2 entries:
\[
X = \begin{pmatrix} X^{[1]} \\ X^{[2]} \end{pmatrix} \tag{24}
\]
For ι = 1, 2 we let µ_ι be the mean of X^[ι] and Σ_ι its covariance matrix.

Question 10: Let X ∼ N(µ, Σ) be 4-variate, where µ = [2, 3, 2, 3]^T,
\[
\Sigma = \begin{pmatrix} 4 & 1 & 1 & 1 \\ 1 & 4 & 1 & 1 \\ 1 & 1 & 4 & 1 \\ 1 & 1 & 1 & 4 \end{pmatrix} \quad\text{and}\quad
\Sigma^{-1} = \begin{pmatrix} 0.286 & -0.048 & -0.048 & -0.048 \\ -0.048 & 0.286 & -0.048 & -0.048 \\ -0.048 & -0.048 & 0.286 & -0.048 \\ -0.048 & -0.048 & -0.048 & 0.286 \end{pmatrix}.
\]
Compute µ_1 and Σ_1 of X^[1] and µ_2 and Σ_2 of X^[2], where d_1 and d_2 are both 2. Analyse all the properties from Results 1.4 and 1.5.
Marginal and Conditional Normal Distributions
Result 1.4
Assume that X^[1], X^[2] and X are given by Equation 24 for some d_1, d_2 < d such that d_1 + d_2 = d. Assume also that X ∼ N(µ, Σ).
1. For j = 1, ..., d, the jth variable X_j of X has the distribution N(µ_j, σ_j^2).
2. For ι = 1, 2, X^[ι] has the distribution N(µ_ι, Σ_ι).
3. The (between) covariance matrix cov(X^[1], X^[2]) of X^[1] and X^[2] is the d_1×d_2 submatrix Σ_12 of
\[
\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_2 \end{pmatrix} \tag{25}
\]
The marginal distributions of normal random vectors are therefore normal, with the means and covariance matrices of the corresponding parts of the original random vector.
Marginal and Conditional Normal Distributions
Result 1.5
Assume that X^[1], X^[2] and X are given by Equation 24 for some d_1, d_2 < d such that d_1 + d_2 = d. Assume also that X ∼ N(µ, Σ) and that Σ_1 and Σ_2 are invertible.
If X^[1] and X^[2] are independent, then the covariance matrix Σ_12 of X^[1] and X^[2] satisfies
\[
\Sigma_{12} = 0_{d_1\times d_2} \tag{26}
\]
Assume that Σ_12 ≠ 0_{d_1×d_2}. Put X^[2|1] = X^[2] − Σ_12^T Σ_1^{-1} X^[1]. Then X^[2|1] is a d_2-dimensional random vector which is independent of X^[1], and X^[2|1] ∼ N(µ_{2|1}, Σ_{2|1}) with
\[
\mu_{2|1} = \mu_2 - \Sigma_{12}^T\Sigma_1^{-1}\mu_1 \quad\text{and}\quad \Sigma_{2|1} = \Sigma_2 - \Sigma_{12}^T\Sigma_1^{-1}\Sigma_{12} \tag{27}
\]
Marginal and Conditional Normal Distributions
Result 1.5 (continued)
Let (X^[1] | X^[2]) be the conditional random vector X^[1] given X^[2]. Then (X^[1] | X^[2]) ∼ N(µ_{X1|X2}, Σ_{X1|X2}) with
\[
\mu_{X_1|X_2} = \mu_1 + \Sigma_{12}\Sigma_2^{-1}\big(X^{[2]} - \mu_2\big) \tag{28}
\]
\[
\Sigma_{X_1|X_2} = \Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^T \tag{29}
\]
The first property states that independence always implies uncorrelatedness, and that for the normal distribution the converse holds too. The second property shows how one can uncorrelate the vectors X^[1] and X^[2]. The last property describes the adjustments that are needed when the sub-vectors have a non-zero covariance matrix.
Marginal and Conditional Normal Distributions
Q.10 solution: (worked out numerically on the original slides; a sketch of the computation of the marginal and conditional parameters follows below.)
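A sketch of the computation is given below. It is our own illustration: the partition uses d1 = d2 = 2 as in Question 10, and the value chosen for X^[2] in the conditional mean of Equation 28 is arbitrary.

```python
# A minimal sketch (not from the slides) for Question 10: partition mu and
# Sigma into the blocks of Equation (25) and apply Results 1.4 and 1.5.
import numpy as np

mu = np.array([2.0, 3.0, 2.0, 3.0])
Sigma = 3.0 * np.eye(4) + np.ones((4, 4))        # the matrix of Question 10
d1 = 2

mu1, mu2 = mu[:d1], mu[d1:]
Sigma1, Sigma2 = Sigma[:d1, :d1], Sigma[d1:, d1:]
Sigma12 = Sigma[:d1, d1:]                        # between-covariance block

print(mu1, Sigma1)                               # marginal parameters of X^[1]
print(mu2, Sigma2)                               # marginal parameters of X^[2]

# Result 1.5: parameters of the conditional distribution (X^[1] | X^[2] = x2).
x2 = np.array([1.0, 4.0])                        # illustrative value of X^[2]
mu_cond = mu1 + Sigma12 @ np.linalg.solve(Sigma2, x2 - mu2)        # Eq. (28)
Sigma_cond = Sigma1 - Sigma12 @ np.linalg.solve(Sigma2, Sigma12.T) # Eq. (29)
print(mu_cond)
print(Sigma_cond)
```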
Summary
Here, we have discussed
Different types of multivariate and high-dimensional problems.
Three-dimensional visualisation of features from the HIV and Iris
flower data-sets.
Data visualisation using vertical parallel coordinate plots.
Data visualisation using horizontal parallel coordinate plots.
Differentiation between population cases and sample cases.
Population mean, population covariance matrix, sample mean and
sample covariance matrix of multivariate random vectors.
Population mean, population covariance matrix, sample mean and
sample covariance matrix of Gaussian random vectors.
Parameters and properties of marginal and conditional normal
distributions.
For Further Reading I
I. Koch. Analysis of Multivariate and High-Dimensional Data (Vol. 32). Cambridge University Press, 2014.
F. Emdad and S. R. Zekavat. High Dimensional Data Analysis: Overview, Analysis and Applications. VDM Verlag, 2008.
More Related Content

Similar to Multidimensional Data

The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
Cagatay Turkay
 
Business statistics what and why
Business statistics what and whyBusiness statistics what and why
Business statistics what and whydibasharmin
 
Graphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatisticsGraphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatistics
VandanaGupta127
 
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.pptBRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
AbdifatahAhmedHurre
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
tuxette
 
STATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptxSTATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptx
MulbahKromah
 
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
Simar Neasy
 
Image investigation using higher moment statistics and edge detection for rec...
Image investigation using higher moment statistics and edge detection for rec...Image investigation using higher moment statistics and edge detection for rec...
Image investigation using higher moment statistics and edge detection for rec...
journalBEEI
 
Engineering Statistics
Engineering Statistics Engineering Statistics
Engineering Statistics
Bahzad5
 
Data structure
Data   structureData   structure
Lect 5 data models-gis
Lect 5 data models-gisLect 5 data models-gis
Lect 5 data models-gis
Rehana Jamal
 
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and AI...
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and  AI...Artificial Intelligence - Data Analysis, Creative & Critical Thinking and  AI...
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and AI...
deboshreechatterjee2
 
Chapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.pptChapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.ppt
Subrata Kumer Paul
 
Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3
OllieShoresna
 
Math Stats Probability
Math Stats ProbabilityMath Stats Probability
Math Stats Probability
Mark Brahier
 
02Data.ppt
02Data.ppt02Data.ppt
02Data.ppt
AlwinHilton
 
02Data.ppt
02Data.ppt02Data.ppt
02Data.ppt
TanviBhasin2
 
Data science
Data scienceData science
Data science
Rakibul Hasan Pranto
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
kobra22
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boaraileeanne
 

Similar to Multidimensional Data (20)

The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
 
Business statistics what and why
Business statistics what and whyBusiness statistics what and why
Business statistics what and why
 
Graphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatisticsGraphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatistics
 
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.pptBRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
STATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptxSTATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptx
 
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
A Critique Of Anscombe S Work On Statistical Analysis Using Graphs (2013 Home...
 
Image investigation using higher moment statistics and edge detection for rec...
Image investigation using higher moment statistics and edge detection for rec...Image investigation using higher moment statistics and edge detection for rec...
Image investigation using higher moment statistics and edge detection for rec...
 
Engineering Statistics
Engineering Statistics Engineering Statistics
Engineering Statistics
 
Data structure
Data   structureData   structure
Data structure
 
Lect 5 data models-gis
Lect 5 data models-gisLect 5 data models-gis
Lect 5 data models-gis
 
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and AI...
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and  AI...Artificial Intelligence - Data Analysis, Creative & Critical Thinking and  AI...
Artificial Intelligence - Data Analysis, Creative & Critical Thinking and AI...
 
Chapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.pptChapter 2. Know Your Data.ppt
Chapter 2. Know Your Data.ppt
 
Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3Data Mining Exploring DataLecture Notes for Chapter 3
Data Mining Exploring DataLecture Notes for Chapter 3
 
Math Stats Probability
Math Stats ProbabilityMath Stats Probability
Math Stats Probability
 
02Data.ppt
02Data.ppt02Data.ppt
02Data.ppt
 
02Data.ppt
02Data.ppt02Data.ppt
02Data.ppt
 
Data science
Data scienceData science
Data science
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boa
 

More from Ashutosh Satapathy

Introduction to Data Structures .
Introduction to Data Structures        .Introduction to Data Structures        .
Introduction to Data Structures .
Ashutosh Satapathy
 
Searching and Sorting Algorithms
Searching and Sorting AlgorithmsSearching and Sorting Algorithms
Searching and Sorting Algorithms
Ashutosh Satapathy
 
Time and Space Complexity
Time and Space ComplexityTime and Space Complexity
Time and Space Complexity
Ashutosh Satapathy
 
Algorithm Specification and Data Abstraction
Algorithm Specification and Data Abstraction Algorithm Specification and Data Abstraction
Algorithm Specification and Data Abstraction
Ashutosh Satapathy
 
ORAM
ORAMORAM
ObliVM
ObliVMObliVM
Secure Multi-Party Computation
Secure Multi-Party ComputationSecure Multi-Party Computation
Secure Multi-Party Computation
Ashutosh Satapathy
 

More from Ashutosh Satapathy (7)

Introduction to Data Structures .
Introduction to Data Structures        .Introduction to Data Structures        .
Introduction to Data Structures .
 
Searching and Sorting Algorithms
Searching and Sorting AlgorithmsSearching and Sorting Algorithms
Searching and Sorting Algorithms
 
Time and Space Complexity
Time and Space ComplexityTime and Space Complexity
Time and Space Complexity
 
Algorithm Specification and Data Abstraction
Algorithm Specification and Data Abstraction Algorithm Specification and Data Abstraction
Algorithm Specification and Data Abstraction
 
ORAM
ORAMORAM
ORAM
 
ObliVM
ObliVMObliVM
ObliVM
 
Secure Multi-Party Computation
Secure Multi-Party ComputationSecure Multi-Party Computation
Secure Multi-Party Computation
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

Multidimensional Data

  • 1. Multidimensional Data Dr. Ashutosh Satapathy Assistant Professor, Department of CSE VR Siddhartha Engineering College Kanuru, Vijayawada October 19, 2022 Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 1 / 86
  • 2. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 2 / 86
  • 3. Multivariate and High-Dimensional Problems Early in the twentieth century, scientists such as Pearson (1901), Hotelling (1933) and Fisher (1936) developed methods for analysing multivariate data in order to 1 Understand the structure in the data and summarise it in simpler ways. 2 Understand the relationship of one part of the data to another part. 3 Make decisions and inferences based on the data. The early methods these scientists developed are linear; as time moved on, more complex methods were developed. These data sets essential structure can often be obscured by noise. Reduce the original data in such a way that informative and interesting structure in the data is preserved while noisy, irrelevant or purely random variables, dimensions or features are removed, as these can adversely affect the analysis. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 3 / 86
  • 4. Multivariate and High-Dimensional Problems Traditionally one assumes that the dimension d is small compared to the sample size n. Many recent data sets do not fit into this framework; we encounter the following problems. Data whose dimension is comparable to the sample size, and both are large. High-dimension and low sample size data whose dimension d vastly exceeds the sample size n, so d ≥ n. Functional data whose observations are functions. High-dimensional and functional data pose special challenges. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 4 / 86
  • 5. Visualisation Before we analyse a set of data, it is important to look at it. Often we get useful clues such as skewness, bi- or multi-modality, outliers, or distinct groupings. Graphical displays are exploratory data-analysis tools, which, if appropriately used, can enhance our understanding of data. Visual clues are easier to understand and interpret than numbers alone, and the information you can get from graphical displays can help you understand answers that are based on numbers. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 5 / 86
  • 6. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 6 / 86
  • 7. Three-Dimensional Visualisation Two-dimensional scatter-plots are a natural – though limited – way of looking at data with three or more variables. As the number of variables, and therefore the dimension increases. We can, of course, still display three of the d dimensions in scatter-plots, but it is less clear how one can look at more than three dimensions in a single plot. Figure 2.1 display the 10,000 observations and the three variables CD3, CD8 and CD4 of the five-dimensional HIV+ and HIV- data sets. The data-sets contain measurements of blood cells relevant to HIV. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 7 / 86
  • 8. Three-Dimensional Visualisation Figure 2.1: HIV+ data (left) and HIV- data (right) of variables CD3, CD8 and CD4. There are differences between the point clouds in the two figures, and an important task is to exhibit and quantify the differences. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 8 / 86
  • 9. Three-Dimensional Visualisation Projecting the Figure 2.1 data onto a number of orthogonal directions and displaying the lower-dimensional projected data in Figure 2.2. Figure 2.2: Orthogonal projections of the HIV+ data (left) and the HIV- data(right). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 9 / 86
  • 10. Three-Dimensional Visualisation We can see a smaller fourth cluster in the top right corner of the HIV- data, which seems to have almost disappeared in the HIV+ data in the left panel. Many of the methods we explore use projections: Principal Component Analysis, Factor Analysis, Multidimensional Scaling, Independent Component Analysis and Projection Pursuit. In each case the projections focus on different aspects and properties of the data. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 10 / 86
  • 11. Three-Dimensional Visualisation Figure 2.3: Three different species of Iris flowers. We display the four variables of Fisher’s iris data – sepal length, sepal width, petal length and petal width – in a sequence of three- dimensional scatter-plots. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 11 / 86
  • 12. Three-Dimensional Visualisation Figure 2.4: Features 1, 2 and 3 (top left), features 1, 2 and 4 (top right), features 1, 3 and 4 (bottom left) and features 2, 3 and 4 (bottom right). Red refers to Setosa, green to Versicolor and black to Virginica. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 12 / 86
  • 13. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 13 / 86
  • 14. Parallel Coordinate Plots As the dimension grows, three-dimensional scatter-plots become less relevant, unless we know that only some variables are important. An alternative, which allows us to see all variables at once, is to present the data in the form of parallel coordinate plots. The idea is to present the data as two-dimensional graphs. In a vertical parallel coordinate plot the variable numbers are represented as values on the y-axis. For a vector X = [X1, ..., Xd]T we represent the first variable X1 by the point (X1, 1) and the jth variable Xj by (Xj, j). Finally, we connect the d points by a line which goes from (X1, 1) to (X2, 2) and so on to (Xd, d). We apply the same rule to the next d-dimensional feature vector. Figure 2.5 shows a vertical parallel coordinate plot for Fisher’s iris data. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 14 / 86
  • 15. Parallel Coordinate Plots Figure 2.5: Iris data with variables represented on the y-axis and separate colours for the three species. Red refers to the observations of Setosa, green to those of Versicolor and black to those of Virginica. Unlike the previous Figure 2.4, Figure 2.5 tells us that dimension 3 separates the two groups most strongly. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 15 / 86
  • 16. Parallel Coordinate Plots Instead of the three colours of Figure 2.5, a different colour can be used for each observation, as in Figure 2.6. In a horizontal parallel coordinate plot, the x-axis represents the variable numbers 1, ..., d. For a feature vector X = [X1 ··· Xd]T, the first variable gives rise to the point (1, X1) and the jth variable Xj to (j, Xj). The d points are connected by a line, starting with (1, X1), then (2, X2), until we reach (d, Xd). Because the variables are laid out along the x-axis, horizontal parallel coordinate plots are often used. The differently coloured lines make it easier to trace particular observations in Figure 2.6; a sketch of how such a plot can be produced follows below. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 16 / 86
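The slides do not include the code behind Figures 2.5 and 2.6, so the following is only a minimal Python sketch (not the author's code) of a parallel coordinate plot for Fisher's iris data; it assumes pandas, matplotlib and scikit-learn are available and uses the built-in sklearn copy of the iris data as a stand-in.

  # Parallel coordinate plot of Fisher's iris data (sketch; not the author's code).
  import matplotlib.pyplot as plt
  from pandas.plotting import parallel_coordinates
  from sklearn.datasets import load_iris

  iris = load_iris(as_frame=True)
  df = iris.frame.rename(columns={"target": "species"})
  df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

  # Each observation becomes one polygonal line across the d = 4 variables.
  parallel_coordinates(df, "species", color=["red", "green", "black"])
  plt.xlabel("variable")
  plt.ylabel("measurement (cm)")
  plt.show()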
  • 17. Parallel Coordinate Plots Figure 2.6: Parallel coordinate view of the illicit drug market data. Figure 2.6 shows the 66 monthly observations on 15 features or variables of the illicit drug market data. Each observation (month) is displayed in a different colour. Looking at variable 5, heroin overdose, the question arises whether there could be two groups of observations corresponding to the high and low values of this variable. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 17 / 86
  • 18. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 18 / 86
  • 19. Population Case In data science, a population is the entire set of items from which you draw data for a statistical study; it can be a group of individuals, objects, events, organisations, and so on. In everyday usage, population refers to the people who live in a particular area at a specific time, but in data science it refers to all the data relevant to your study of interest. You use populations to draw conclusions. An example of a population would be the entire student body at a school, with the question of interest being the percentage of students who speak English fluently. If you had to collect the same data for the entire country of India, it would be very difficult to draw reliable conclusions: geographical and accessibility constraints would make the data biased towards certain regions or groups. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 19 / 86
  • 20. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 20 / 86
  • 21. Sample Case A sample is a smaller, more manageable representation of a larger group: a subset of a larger population that retains the characteristics of that population. A sample is used when the population size is too large for all members or observations to be included in the study. Ideally, the sample is an unbiased subset of the population that best represents the whole data. The process of collecting data from a small subsection of the population and then using it to generalise over the entire set is called sampling. Samples are used when the population is too large, or effectively unlimited in size, for data collected on every member to be practical or reliable. A sample should generally be unbiased, should reflect all the variation present in the population, and should typically be chosen at random. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 21 / 86
  • 22. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 22 / 86
  • 23. Multivariate Random Vectors Random vectors are vector-valued functions defined on a sample space. A vector-valued function is a mathematical function of one or more variables whose range is a set of multidimensional vectors or infinite-dimensional vectors. In Cartesian 3-space it can be represented as v(t) = <f(t), g(t), h(t)>, where v(t) is the vector function and f(t), g(t) and h(t) are its coordinate functions. We refer to a collection of random vectors as the data or the random sample. Specific feature values are measured for each of the random vectors in the collection; we call these values the realised or observed values of the data, or simply the observed data. The observed values are no longer random. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 23 / 86
  • 24. The Population Case Let $X = [X_1\ X_2\ \cdots\ X_d]^T$ (1) be a random vector from a distribution $F: \mathbb{R}^d \to [0, 1]$. The individual $X_j$, with $j \le d$, are random variables, also called the variables, components or entries of X. X is d-dimensional or d-variate. X has a finite d-dimensional mean or expected value EX and a finite d×d covariance matrix var(X): $\mu = EX$, $\Sigma = \operatorname{var}(X) = E[(X - \mu)(X - \mu)^T]$ (2). The µ and Σ are $\mu = [\mu_1\ \mu_2\ \cdots\ \mu_d]^T$, $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{pmatrix}$ (3). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 24 / 86
  • 25. The Population Case Here $\sigma_j^2 = \operatorname{var}(X_j)$ and $\sigma_{jk} = \operatorname{cov}(X_j, X_k)$; we also write $\sigma_{jj}$ for the diagonal elements $\sigma_j^2$ of Σ. $X \sim (\mu, \Sigma)$ (4). Equation 4 is shorthand for a random vector X which has mean µ and covariance matrix Σ. If X is a d-dimensional random vector and A is a d × k matrix, for some k ≥ 1, then $A^T X$ is a k-dimensional random vector. Result 1.1 Let $X \sim (\mu, \Sigma)$ be a d-variate random vector. Let A and B be matrices of size d × k and d × l, respectively. The mean and covariance matrix of the k-variate random vector $A^T X$ are $A^T X \sim (A^T \mu, A^T \Sigma A)$. The random vectors $A^T X$ and $B^T X$ are uncorrelated if and only if $A^T \Sigma B = 0_{k \times l}$ (the k × l matrix of zeros). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 25 / 86
  • 26. The Population Case Question 1: Suppose you have a set of n = 5 data items, representing 5 insects, where each data item has a height (X), width (Y) and speed (Z) (therefore d = 3). Table 3.1: Three features of five different insects, given as (Height (cm), Width (cm), Speed (m/s)): I1 = (0.64, 0.58, 0.29); I2 = (0.66, 0.57, 0.33); I3 = (0.68, 0.59, 0.37); I4 = (0.69, 0.66, 0.46); I5 = (0.73, 0.60, 0.55). Solution: mean $\mu = \left[\frac{0.64+0.66+0.68+0.69+0.73}{5},\ \frac{0.58+0.57+0.59+0.66+0.60}{5},\ \frac{0.29+0.33+0.37+0.46+0.55}{5}\right]^T = [0.68, 0.60, 0.40]^T$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 26 / 86
  • 27. The Population Case $I_1 - \mu = [-0.04, -0.02, -0.11]^T$, $(I_1-\mu)(I_1-\mu)^T = \begin{pmatrix} 0.0016 & 0.0008 & 0.0044 \\ 0.0008 & 0.0004 & 0.0022 \\ 0.0044 & 0.0022 & 0.0121 \end{pmatrix}$; $I_2 - \mu = [-0.02, -0.03, -0.07]^T$, $(I_2-\mu)(I_2-\mu)^T = \begin{pmatrix} 0.0004 & 0.0006 & 0.0014 \\ 0.0006 & 0.0009 & 0.0021 \\ 0.0014 & 0.0021 & 0.0049 \end{pmatrix}$; $I_3 - \mu = [0, -0.01, -0.03]^T$, $(I_3-\mu)(I_3-\mu)^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0.0001 & 0.0003 \\ 0 & 0.0003 & 0.0009 \end{pmatrix}$; $I_4 - \mu = [0.01, 0.06, 0.06]^T$, $(I_4-\mu)(I_4-\mu)^T = \begin{pmatrix} 0.0001 & 0.0006 & 0.0006 \\ 0.0006 & 0.0036 & 0.0036 \\ 0.0006 & 0.0036 & 0.0036 \end{pmatrix}$; $I_5 - \mu = [0.05, 0, 0.15]^T$, $(I_5-\mu)(I_5-\mu)^T = \begin{pmatrix} 0.0025 & 0 & 0.0075 \\ 0 & 0 & 0 \\ 0.0075 & 0 & 0.0225 \end{pmatrix}$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 27 / 86
  • 28. The Population Case Covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(I_i - \mu)(I_i - \mu)^T = \frac{1}{5}\begin{pmatrix} 0.0046 & 0.0020 & 0.0139 \\ 0.0020 & 0.0050 & 0.0082 \\ 0.0139 & 0.0082 & 0.0440 \end{pmatrix} = \begin{pmatrix} 0.00092 & 0.0004 & 0.00278 \\ 0.0004 & 0.001 & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}$. Definitions. Mean: the average or the most common value in a collection of numbers. Variance: the expectation of the squared deviation of a random variable from its mean. Covariance: a measure of the relationship between two random variables and the extent to which they change together. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 28 / 86
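The arithmetic of Question 1 can be checked with a few lines of numpy; this sketch is an addition, not part of the original slides, and the division by n reflects the population covariance matrix used here.

  import numpy as np

  # Rows = insects I1..I5, columns = height, width, speed (Table 3.1).
  X = np.array([[0.64, 0.58, 0.29],
                [0.66, 0.57, 0.33],
                [0.68, 0.59, 0.37],
                [0.69, 0.66, 0.46],
                [0.73, 0.60, 0.55]])

  mu = X.mean(axis=0)                       # [0.68, 0.60, 0.40]
  centred = X - mu
  Sigma = centred.T @ centred / X.shape[0]  # divide by n: population covariance
  print(mu)
  print(Sigma)                              # matches the 3 x 3 matrix above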
  • 29. The Population Case Question 2: Verify Result 1.1a, $\mu_{A^TX} = A^T\mu_X$ and $\Sigma_{A^TX} = A^T\Sigma_X A$, for $A = \begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}$ and $X = \begin{pmatrix} 0.64 & 0.66 & 0.68 & 0.69 & 0.73 \\ 0.58 & 0.57 & 0.59 & 0.66 & 0.60 \\ 0.29 & 0.33 & 0.37 & 0.46 & 0.55 \end{pmatrix}$. Solution: $A^TX = \begin{pmatrix} 0.0273 & 0.0288 & 0.0306 & 0.0342 & 0.0371 \\ 0.0296 & 0.0303 & 0.0319 & 0.0359 & 0.0363 \end{pmatrix}$, $\mu_{A^TX} = \begin{pmatrix} 0.0316 \\ 0.0328 \end{pmatrix}$ and $A^T\mu_X = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}\begin{pmatrix} 0.68 \\ 0.60 \\ 0.40 \end{pmatrix} = \begin{pmatrix} 0.0316 \\ 0.0328 \end{pmatrix}$. Hence $\mu_{A^TX} = A^T\mu_X$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 29 / 86
  • 30. The Population Case $A^T\Sigma_X A = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}\begin{pmatrix} 0.00092 & 0.0004 & 0.00278 \\ 0.0004 & 0.001 & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}\begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix} = \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix}$. $(A^TX)_1 - \mu_{A^TX} = [-0.0043, -0.0032]^T$, $((A^TX)_1 - \mu_{A^TX})((A^TX)_1 - \mu_{A^TX})^T = \begin{pmatrix} 0.00001849 & 0.00001376 \\ 0.00001376 & 0.00001024 \end{pmatrix}$; $(A^TX)_2 - \mu_{A^TX} = [-0.0028, -0.0025]^T$, $((A^TX)_2 - \mu_{A^TX})((A^TX)_2 - \mu_{A^TX})^T = \begin{pmatrix} 0.00000784 & 0.000007 \\ 0.000007 & 0.00000625 \end{pmatrix}$; $(A^TX)_3 - \mu_{A^TX} = [-0.001, -0.0009]^T$, $((A^TX)_3 - \mu_{A^TX})((A^TX)_3 - \mu_{A^TX})^T = \begin{pmatrix} 0.000001 & 0.0000009 \\ 0.0000009 & 0.00000081 \end{pmatrix}$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 30 / 86
  • 31. The Population Case $(A^TX)_4 - \mu_{A^TX} = [0.0026, 0.0031]^T$, $((A^TX)_4 - \mu_{A^TX})((A^TX)_4 - \mu_{A^TX})^T = \begin{pmatrix} 0.00000676 & 0.00000806 \\ 0.00000806 & 0.00000961 \end{pmatrix}$; $(A^TX)_5 - \mu_{A^TX} = [0.0055, 0.0035]^T$, $((A^TX)_5 - \mu_{A^TX})((A^TX)_5 - \mu_{A^TX})^T = \begin{pmatrix} 0.00003025 & 0.00001925 \\ 0.00001925 & 0.00001225 \end{pmatrix}$. Covariance matrix $\Sigma_{A^TX} = \frac{1}{n}\sum_{i=1}^{n}((A^TX)_i - \mu_{A^TX})((A^TX)_i - \mu_{A^TX})^T = \frac{1}{5}\begin{pmatrix} 0.00006434 & 0.00004897 \\ 0.00004897 & 0.00003916 \end{pmatrix} = \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix}$. Hence $\Sigma_{A^TX} = A^T\Sigma_X A$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 31 / 86
  • 32. The Population Case Question 3: Verify Result 1.1b, i.e. that $A^TX$ and $B^TX$ are correlated, for $A = B = \begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix}$ and $X = \begin{pmatrix} 0.64 & 0.66 & 0.68 & 0.69 & 0.73 \\ 0.58 & 0.57 & 0.59 & 0.66 & 0.60 \\ 0.29 & 0.33 & 0.37 & 0.46 & 0.55 \end{pmatrix}$. Solution: $\Sigma = \begin{pmatrix} 0.00092 & 0.0004 & 0.00278 \\ 0.0004 & 0.001 & 0.00164 \\ 0.00278 & 0.00164 & 0.0088 \end{pmatrix}$ and $A^T\Sigma B = \begin{pmatrix} 0.02 & 0.01 & 0.03 \\ 0.01 & 0.03 & 0.02 \end{pmatrix}\,\Sigma\,\begin{pmatrix} 0.02 & 0.01 \\ 0.01 & 0.03 \\ 0.03 & 0.02 \end{pmatrix} = \begin{pmatrix} 0.000012868 & 0.000009794 \\ 0.000009794 & 0.000007832 \end{pmatrix} \ne 0_{2\times 2}$. Hence $A^TX$ and $B^TX$ are correlated with each other. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 32 / 86
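The following numpy sketch (again an addition, not from the slides) verifies Result 1.1 for the A, B and X of Questions 2 and 3: the projected data A^T X has mean A^T µ and covariance A^T Σ A, and the non-zero A^T Σ B confirms that A^T X and B^T X are correlated.

  import numpy as np

  X = np.array([[0.64, 0.66, 0.68, 0.69, 0.73],    # d x n data matrix
                [0.58, 0.57, 0.59, 0.66, 0.60],
                [0.29, 0.33, 0.37, 0.46, 0.55]])
  A = np.array([[0.02, 0.01],
                [0.01, 0.03],
                [0.03, 0.02]])
  B = A.copy()                                     # Question 3 takes B equal to A

  mu = X.mean(axis=1)
  Sigma = np.cov(X, bias=True)                     # population-style: divide by n

  Y = A.T @ X                                      # projected 2 x 5 data
  print(np.allclose(Y.mean(axis=1), A.T @ mu))              # True: mean is A^T mu
  print(np.allclose(np.cov(Y, bias=True), A.T @ Sigma @ A)) # True: cov is A^T Sigma A
  print(A.T @ Sigma @ B)                           # nonzero, so A^T X and B^T X are correlated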
  • 33. The Sample Case Let $X_1, \dots, X_n$ be d-dimensional random vectors. We assume that the $X_i$ are independent and from the same distribution $F: \mathbb{R}^d \to [0, 1]$ with finite mean µ and covariance matrix Σ. We omit reference to F when knowledge of the distribution is not required. In statistics one often identifies a random vector with its observed values and writes $X_i = x_i$. We explore properties of random samples but only encounter observed values of random vectors. For this reason we write $X = [X_1\ X_2\ \cdots\ X_n]$ (5) for the d × n matrix whose columns are the independent random vectors $X_i$, and call this collection a random sample or data. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 33 / 86
  • 34. The Sample Case $X = \begin{pmatrix} X_{11} & X_{21} & \cdots & X_{n1} \\ X_{12} & X_{22} & \cdots & X_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1d} & X_{2d} & \cdots & X_{nd} \end{pmatrix} = \begin{pmatrix} X_{\bullet 1} \\ X_{\bullet 2} \\ \vdots \\ X_{\bullet d} \end{pmatrix}$ (6). The ith column of X is the ith random vector $X_i$, and the jth row $X_{\bullet j}$ is the jth variable across all n random vectors; in $X_{ij}$, the i refers to the ith vector $X_i$ and the j to the jth variable. For data, the mean µ and covariance matrix Σ are usually not known; instead, we work with the sample mean $\bar{X}$ and the sample covariance matrix S. This is represented by $X \sim \operatorname{Sam}(\bar{X}, S)$ (7). The sample mean and sample covariance matrix depend on the sample size n. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 34 / 86
  • 35. The Sample Case $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T$ (8). Definitions of the sample covariance matrix in the literature use either $n^{-1}$ or $(n-1)^{-1}$; $(n-1)^{-1}$ is preferred because it gives an unbiased estimator of the population covariance matrix Σ. $X_{\mathrm{cent}} = X - \bar{X} = [X_1 - \bar{X}\ \ X_2 - \bar{X}\ \cdots\ X_n - \bar{X}]$ (9). $X_{\mathrm{cent}}$ is the centred data and it is of size d × n. Using this notation, the d × d sample covariance matrix S becomes $S = \frac{1}{n-1} X_{\mathrm{cent}} X_{\mathrm{cent}}^T = \frac{1}{n-1}(X - \bar{X})(X - \bar{X})^T$ (10). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 35 / 86
  • 36. The Sample Case The entries of the sample covariance matrix S are $s_{jk}$, with $s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{ij} - m_j)(X_{ik} - m_k)$ (11), where $\bar{X} = [m_1, \dots, m_d]^T$ and $m_j$ is the sample mean of the jth variable. As for the population, we write $s_j^2$ or $s_{jj}$ for the diagonal elements of S. Consider $a \in \mathbb{R}^d$; then the projection of X onto a is $a^T X$. Similarly, the projection of the matrix X onto a is applied to each random vector $X_i$ and results in the 1 × n row vector $a^T X$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 36 / 86
  • 37. The Sample Case Question 4: The math and science scores of good, average and poor students from a class are as follows: Student 1: Math (X) 92, Science (Y) 68; Student 2: Math 55, Science 30; Student 3: Math 100, Science 78. Find the sample mean $\bar{X}$, the covariance matrix S and $s_{12}$ for these data. Solution: $X = \begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix}$, $\bar{X} = \begin{pmatrix} \frac{92+55+100}{3} \\ \frac{68+30+78}{3} \end{pmatrix} = \begin{pmatrix} 82.33 \\ 58.66 \end{pmatrix}$. $X_1 - \bar{X} = [9.67, 9.34]^T$, $(X_1 - \bar{X})(X_1 - \bar{X})^T = \begin{pmatrix} 93.5089 & 90.3178 \\ 90.3178 & 87.2356 \end{pmatrix}$; $X_2 - \bar{X} = [-27.33, -28.66]^T$, $(X_2 - \bar{X})(X_2 - \bar{X})^T = \begin{pmatrix} 746.9289 & 783.2778 \\ 783.2778 & 821.3956 \end{pmatrix}$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 37 / 86
  • 38. The Sample Case $X_3 - \bar{X} = [17.67, 19.34]^T$, $(X_3 - \bar{X})(X_3 - \bar{X})^T = \begin{pmatrix} 312.2289 & 341.7378 \\ 341.7378 & 374.0356 \end{pmatrix}$. Sample covariance matrix $S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T = \frac{1}{2}\begin{pmatrix} 1152.6667 & 1215.3334 \\ 1215.3334 & 1282.6668 \end{pmatrix} = \begin{pmatrix} 576.3335 & 607.6667 \\ 607.6667 & 641.3334 \end{pmatrix}$. From Equation 11, $s_{12} = \frac{1}{2}\sum_{i=1}^{3}(X_{i1} - m_1)(X_{i2} - m_2) = \frac{1}{2}[(92 - 82.33)(68 - 58.66) + (55 - 82.33)(30 - 58.66) + (100 - 82.33)(78 - 58.66)] = 607.6667$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 38 / 86
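A short numpy check of Questions 4 and 5 (an added sketch, not part of the slides): np.cov with ddof=1 gives the (n-1)-divisor sample covariance matrix S of Equation 8, and the projection of Question 5 is a single matrix product.

  import numpy as np

  X = np.array([[92, 55, 100],     # row 1: math scores of the three students
                [68, 30,  78]])    # row 2: science scores

  xbar = X.mean(axis=1)            # sample mean, roughly [82.33, 58.67]
  S = np.cov(X, ddof=1)            # 2 x 2 sample covariance matrix, divisor n - 1
  print(xbar)
  print(S)                         # S[0, 1] is s12, about 607.67

  a = np.array([-45, 45])
  print(a @ X)                     # projection a^T X = [-1080, -1125, -990]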
  • 39. The Sample Case Question 5: Compute the projection of the matrix $X = \begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix}$ onto the vector $a = [-45, 45]^T$. Solution: $P = a^T X = \begin{pmatrix} -45 & 45 \end{pmatrix}\begin{pmatrix} 92 & 55 & 100 \\ 68 & 30 & 78 \end{pmatrix} = \begin{pmatrix} -1080 & -1125 & -990 \end{pmatrix}$. So the projection of X onto the vector $[-45, 45]^T$ is a 1 × 3 matrix. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 39 / 86
  • 40. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 40 / 86
  • 41. Gaussian Random Vectors The univariate normal probability density function f is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ (12), and $X \sim N(\mu, \sigma^2)$ (13). Equation 13 is shorthand for a random variable from the univariate normal distribution with mean µ and variance σ². Figure 3.1: Three normal pdfs of 1000 random values each, with (µ, σ) equal to (0, 0.8), (-2, 1) and (3, 2) respectively. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 41 / 86
  • 42. Gaussian Random Vectors The d-variate normal probability density function f is $f(x) = (2\pi)^{-d/2}\,|\Sigma|^{-1/2} \exp\!\left[-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1}(x - \mu)\right]$ (14), and $X \sim N(\mu, \Sigma)$ (15). Equation 15 is shorthand for a d-dimensional random vector from the d-variate normal distribution with mean µ and covariance matrix Σ. Figure 3.2: Two-dimensional normal pdf with $\mu = [1, 2]^T$ and $\Sigma = \begin{pmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{pmatrix}$. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 42 / 86
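As an added illustration of Equation 14 (not the code that produced Figure 3.2), the bivariate density with the µ and Σ of Figure 3.2 can be evaluated both through scipy and directly from the formula; the evaluation point is chosen arbitrarily at the mean.

  import numpy as np
  from scipy.stats import multivariate_normal

  mu = np.array([1.0, 2.0])
  Sigma = np.array([[0.25, 0.3],
                    [0.3,  1.0]])

  rv = multivariate_normal(mean=mu, cov=Sigma)
  print(rv.pdf(mu))                # density at the mean

  # Direct evaluation of Equation 14 at the same point, for comparison.
  x, d = mu, 2
  val = ((2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
         * np.exp(-0.5 * (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)))
  print(val)                       # same value as rv.pdf(mu)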
  • 43. Gaussian Random Vectors Result 1.2 Let $X \sim N(\mu, \Sigma)$ be d-variate, and assume that $\Sigma^{-1}$ exists. 1. Let $X_\Sigma = \Sigma^{-1/2}(X - \mu)$; then $X_\Sigma \sim N(0, I_{d\times d})$, where $I_{d\times d}$ is the d × d identity matrix. 2. Let $X^2 = (X - \mu)^T \Sigma^{-1}(X - \mu)$; then $X^2 \sim \chi^2_d$, the chi-squared distribution with d degrees of freedom. Question 6: Let $X \sim N(\mu, \Sigma)$ be 2-variate, where $\mu = [2, 3]^T$, $\Sigma = \begin{pmatrix} 4 & 0 \\ 0 & 16 \end{pmatrix}$ and $\Sigma^{-1} = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.0625 \end{pmatrix}$. Verify Results 1.2.1 and 1.2.2. Solution: Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 43 / 86
  • 44–52. Gaussian Random Vectors: solution to Question 6 (the code and output shown on these slides were not captured in the transcript).
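Since the solution slides 44-52 are not preserved in the transcript, here is a hedged Python sketch of one way to verify Result 1.2 empirically for the µ and Σ of Question 6; the sample size n = 5000 and the random seed are assumptions.

  import numpy as np
  from scipy.stats import chi2

  rng = np.random.default_rng(0)                       # seed is an arbitrary choice
  mu = np.array([2.0, 3.0])
  Sigma = np.array([[4.0, 0.0],
                    [0.0, 16.0]])
  n, d = 5000, 2

  X = rng.multivariate_normal(mu, Sigma, size=n)       # n x d sample

  # Result 1.2.1: X_Sigma = Sigma^{-1/2}(X - mu) should be roughly N(0, I).
  evals, evecs = np.linalg.eigh(Sigma)
  Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
  X_std = (X - mu) @ Sigma_inv_sqrt
  print(X_std.mean(axis=0))                            # close to [0, 0]
  print(np.cov(X_std.T))                               # close to the identity matrix

  # Result 1.2.2: (X - mu)^T Sigma^{-1} (X - mu) should be roughly chi^2_d.
  diff = X - mu
  m2 = np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
  print(m2.mean(), chi2(df=d).mean())                  # both close to d = 2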
  • 53. Gaussian Random Vectors Hence, the quantity $X^2$ is a scalar random variable which has, as in the one-dimensional case, a $\chi^2$ distribution, but this time with d degrees of freedom. Fix a dimension d ≥ 1. Let $X_i \sim N(\mu, \Sigma)$ be independent d-dimensional random vectors for i = 1, ..., n with sample mean $\bar{X}$ and sample covariance matrix S. We define Hotelling’s $T^2$ by $T^2 = n(\bar{X} - \mu)^T S^{-1}(\bar{X} - \mu)$ (16). Question 7: Compute Hotelling’s $T^2$ for the 2 × 5000 sample X of Question 6. Solution: Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 53 / 86
  • 54. Gaussian Random Vectors: solution to Question 7 (the code and output shown on this slide were not captured in the transcript).
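The computation on slide 54 is likewise not preserved; a minimal sketch of Hotelling's T² for a simulated sample with the parameters of Question 6 (sample size and seed are assumptions) is given below, together with the F-scaling of Result 1.3.3.

  import numpy as np
  from scipy.stats import f as f_dist

  rng = np.random.default_rng(1)                          # seed is an arbitrary choice
  mu = np.array([2.0, 3.0])
  Sigma = np.array([[4.0, 0.0],
                    [0.0, 16.0]])
  n, d = 5000, 2

  X = rng.multivariate_normal(mu, Sigma, size=n)
  xbar = X.mean(axis=0)
  S = np.cov(X.T, ddof=1)

  T2 = n * (xbar - mu) @ np.linalg.inv(S) @ (xbar - mu)   # Equation 16
  F_stat = (n - d) / ((n - 1) * d) * T2                   # scaling of Equation 18
  print(T2, F_stat, f_dist(d, n - d).mean())              # F_stat is one draw from F_{d, n-d}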
  • 55. Gaussian Random Vectors Further, let $Z_j \sim N(0, \Sigma)$ for j = 1, ..., m be independent d-dimensional random vectors, and let $W = \sum_{j=1}^{m} Z_j Z_j^T$ (17). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 55 / 86
  • 56. Gaussian Random Vectors In Equation 17, W is the d × d random matrix generated by the $Z_j$. W has the Wishart distribution W(m, Σ) with m degrees of freedom and covariance matrix Σ; m is the number of summands and Σ is the common d × d covariance matrix. Result 1.3 Let $X_i \sim N(\mu, \Sigma)$ be d-dimensional random vectors for i = 1, ..., n. Let S be the sample covariance matrix, and assume that S is invertible. 1. The sample mean satisfies $\bar{X} \sim N(\mu, \Sigma/n)$. 2. For n observations $X_i$ and their sample covariance matrix S there exist n-1 independent random vectors $Z_j \sim N(0, \Sigma)$ such that $S = \frac{1}{n-1}\sum_{j=1}^{n-1} Z_j Z_j^T$, so that (n-1)S has a W(n-1, Σ) Wishart distribution. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 56 / 86
  • 57. Gaussian Random Vectors Result 1.3 (continued) 3. Assume that n > d. Let $T^2$ be given by Equation 16. It follows that $\frac{n-d}{(n-1)d}\, T^2 \sim F_{d,\, n-d}$ (18), the F distribution with d and n-d degrees of freedom. In other words, the scaled statistic $\frac{n-d}{(n-1)d} T^2$, computed from a Gaussian sample of size n and dimension d, has an F distribution with d and n-d degrees of freedom. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 57 / 86
  • 58. Gaussian Random Vectors Question 8: Let $Z \sim N(\mu, \Sigma)$ be 2-variate, where $\mu = [0, 0]^T$ and $\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 16 \end{pmatrix}$. Compute the W and (n-1)S matrices, and plot the distribution of the sample mean. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 58 / 86
  • 59–65. Gaussian Random Vectors: solution to Question 8 (the code, matrices and sample-mean plot shown on these slides were not captured in the transcript).
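Slides 59-65 contained the code, matrices and plot for Question 8 but were not captured; the sketch below (sample size, number of repetitions and seed are assumptions) shows one way to compute W from Equation 17, (n-1)S from a simulated sample, and an empirical check of Result 1.3.1.

  import numpy as np

  rng = np.random.default_rng(2)            # seed and sizes below are arbitrary choices
  mu = np.array([0.0, 0.0])
  Sigma = np.array([[4.0, 2.0],
                    [2.0, 16.0]])
  n = 1000

  # W = sum_j Z_j Z_j^T with Z_j ~ N(0, Sigma) and m = n - 1 summands (Equation 17).
  Z = rng.multivariate_normal(np.zeros(2), Sigma, size=n - 1)
  W = Z.T @ Z

  # (n-1)S from a sample X_i ~ N(mu, Sigma); by Result 1.3.2 it is W(n-1, Sigma) distributed.
  X = rng.multivariate_normal(mu, Sigma, size=n)
  S = np.cov(X.T, ddof=1)
  print(W / (n - 1))                        # both W/(n-1) and S estimate Sigma
  print(S)

  # Result 1.3.1: the sample mean is N(mu, Sigma/n) distributed.
  means = np.array([rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
                    for _ in range(2000)])
  print(np.cov(means.T))                    # close to Sigma / n
  print(Sigma / n)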
  • 66. Gaussian Random Vectors The W matrix computed from the sample covariance matrix S is identical to the W matrix computed using the population mean µ (slide no. 62). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 66 / 86
  • 67. Gaussian Random Vectors Let $X \sim (\mu, \Sigma)$ be d-dimensional. The multivariate normal probability density function f is $f(X_i) = (2\pi)^{-d/2} \det(\Sigma)^{-1/2} \exp\!\left[-\tfrac{1}{2}(X_i - \mu)^T \Sigma^{-1}(X_i - \mu)\right]$ (19), where det(Σ) is the determinant of Σ and $X = [X_1\ X_2\ \cdots\ X_n]$ is the data of independent random vectors from the normal distribution with mean µ and covariance matrix Σ. The normal or Gaussian likelihood (function) L, as a function of the parameter θ of interest conditional on the data, is $L(\theta \mid X) = (2\pi)^{-nd/2} \det(\Sigma)^{-n/2} \exp\!\left[-\tfrac{1}{2}\sum_{i=1}^{n}(X_i - \mu)^T \Sigma^{-1}(X_i - \mu)\right]$ (20). The parameter of interest θ is the mean µ and the covariance matrix Σ, so θ = (µ, Σ). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 67 / 86
  • 68. Gaussian Random Vectors The maximum likelihood estimator (MLE) of θ, denoted by θ̂, is $\hat{\theta} = (\hat{\mu}, \hat{\Sigma})$ (21), with $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$ (22) and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T = \frac{n-1}{n} S$ (23). Here $\hat{\mu}$, $\bar{X}$, $\hat{\Sigma}$ and S are the estimated population mean, the sample mean, the estimated population covariance matrix and the sample covariance matrix, respectively. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 68 / 86
  • 69. Gaussian Random Vectors Question 9: From the sample of size 5000 (Question 6), compute the maximum likelihood estimates of the population mean and population covariance matrix. Solution: (the code and output shown on this slide were not captured in the transcript). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 69 / 86
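The numerical answer is not preserved in the transcript; a sketch under the assumption of a simulated sample of size 5000 with the parameters of Question 6 follows, applying Equations 22 and 23.

  import numpy as np

  rng = np.random.default_rng(3)            # seed is an arbitrary choice
  mu = np.array([2.0, 3.0])
  Sigma = np.array([[4.0, 0.0],
                    [0.0, 16.0]])
  n = 5000

  X = rng.multivariate_normal(mu, Sigma, size=n)

  mu_hat = X.mean(axis=0)                   # Equation 22: the MLE of mu is the sample mean
  S = np.cov(X.T, ddof=1)                   # sample covariance matrix, divisor n - 1
  Sigma_hat = (n - 1) / n * S               # Equation 23: the MLE of Sigma uses divisor n
  print(mu_hat)
  print(Sigma_hat)                          # both close to the population values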
  • 70. Outline 1 Multivariate and High-Dimensional Problems 2 Visualisation Three-Dimensional Visualisation Parallel Coordinate Plots 3 Multivariate Random Vectors and Data Population Case Sample Case Multivariate Random Vectors Gaussian Random Vectors Marginal and Conditional Normal Distributions Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 70 / 86
  • 71. Marginal and Conditional Normal Distributions Consider a normal random vector $X = [X_1, X_2, \dots, X_d]^T$. Let $X^{[1]}$ be the vector consisting of the first $d_1$ entries of X, and let $X^{[2]}$ be the vector consisting of the remaining $d_2$ entries: $X = \begin{pmatrix} X^{[1]} \\ X^{[2]} \end{pmatrix}$ (24). For ι = 1, 2 we let $\mu_\iota$ be the mean of $X^{[\iota]}$ and $\Sigma_\iota$ its covariance matrix. Question 10: Let $X \sim N(\mu, \Sigma)$ be 4-variate, where $\mu = [2, 3, 2, 3]^T$, $\Sigma = \begin{pmatrix} 4 & 1 & 1 & 1 \\ 1 & 4 & 1 & 1 \\ 1 & 1 & 4 & 1 \\ 1 & 1 & 1 & 4 \end{pmatrix}$ and $\Sigma^{-1} = \begin{pmatrix} 0.286 & -0.048 & -0.048 & -0.048 \\ -0.048 & 0.286 & -0.048 & -0.048 \\ -0.048 & -0.048 & 0.286 & -0.048 \\ -0.048 & -0.048 & -0.048 & 0.286 \end{pmatrix}$. Compute $\mu_1$ and $\Sigma_1$ of $X^{[1]}$ and $\mu_2$ and $\Sigma_2$ of $X^{[2]}$, where $d_1 = d_2 = 2$, and analyse all the properties from Results 1.4 and 1.5. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 71 / 86
  • 72. Marginal and Conditional Normal Distributions Result 1.4 Assume that $X^{[1]}$, $X^{[2]}$ and X are given by Equation 24 for some $d_1, d_2 < d$ such that $d_1 + d_2 = d$. Assume also that $X \sim N(\mu, \Sigma)$. 1. For j = 1, ..., d the jth variable $X_j$ of X has the distribution $N(\mu_j, \sigma_j^2)$. 2. For ι = 1, 2, $X^{[\iota]}$ has the distribution $N(\mu_\iota, \Sigma_\iota)$. 3. The (between) covariance matrix $\operatorname{cov}(X^{[1]}, X^{[2]})$ of $X^{[1]}$ and $X^{[2]}$ is the $d_1 \times d_2$ submatrix $\Sigma_{12}$ of $\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_2 \end{pmatrix}$ (25). The marginal distributions of normal random vectors are normal, with the means and covariance matrices of the corresponding parts of the original random vector. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 72 / 86
  • 73. Marginal and Conditional Normal Distributions Result 1.5 Assume that $X^{[1]}$, $X^{[2]}$ and X are given by Equation 24 for some $d_1, d_2 < d$ such that $d_1 + d_2 = d$. Assume also that $X \sim N(\mu, \Sigma)$ and that $\Sigma_1$ and $\Sigma_2$ are invertible. 1. If $X^{[1]}$ and $X^{[2]}$ are independent, the covariance matrix $\Sigma_{12}$ of $X^{[1]}$ and $X^{[2]}$ satisfies $\Sigma_{12} = 0_{d_1 \times d_2}$ (26). 2. Assume that $\Sigma_{12} \ne 0_{d_1 \times d_2}$. Put $X_{2|1} = X^{[2]} - \Sigma_{12}^T \Sigma_1^{-1} X^{[1]}$. Then $X_{2|1}$ is a $d_2$-dimensional random vector which is independent of $X^{[1]}$, and $X_{2|1} \sim N(\mu_{2|1}, \Sigma_{2|1})$ with $\mu_{2|1} = \mu_2 - \Sigma_{12}^T \Sigma_1^{-1} \mu_1$ and $\Sigma_{2|1} = \Sigma_2 - \Sigma_{12}^T \Sigma_1^{-1} \Sigma_{12}$ (27). Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 73 / 86
  • 74. Marginal and Conditional Normal Distributions Result 1.5 (continued) 3. Let $(X^{[1]} \mid X^{[2]})$ be the conditional random vector $X^{[1]}$ given $X^{[2]}$. Then $(X^{[1]} \mid X^{[2]}) \sim N(\mu_{X_1|X_2}, \Sigma_{X_1|X_2})$ with $\mu_{X_1|X_2} = \mu_1 + \Sigma_{12}\Sigma_2^{-1}(X^{[2]} - \mu_2)$ (28) and $\Sigma_{X_1|X_2} = \Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^T$ (29). The first property states that independence always implies uncorrelatedness, and that for the normal distribution the converse holds too. The second property shows how one can uncorrelate the vectors $X^{[1]}$ and $X^{[2]}$. The last property describes the adjustments that are needed when the sub-vectors have a non-zero between covariance matrix. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 74 / 86
  • 75–84. Marginal and Conditional Normal Distributions: solution to Question 10 (the worked computations shown on these slides were not captured in the transcript).
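Since slides 75-84 are not preserved, the following numpy sketch computes the quantities that Results 1.4 and 1.5 ask for in Question 10; the value of X^{[2]} used for the conditional distribution in Equations 28-29 is an arbitrary illustrative choice.

  import numpy as np

  mu = np.array([2.0, 3.0, 2.0, 3.0])
  Sigma = np.full((4, 4), 1.0) + 3.0 * np.eye(4)     # 4 on the diagonal, 1 elsewhere

  d1 = 2
  mu1, mu2 = mu[:d1], mu[d1:]
  S1 = Sigma[:d1, :d1]                               # Sigma_1
  S2 = Sigma[d1:, d1:]                               # Sigma_2
  S12 = Sigma[:d1, d1:]                              # Sigma_12, the between covariance

  # Result 1.5.2: uncorrelating transform X_{2|1} = X^{[2]} - Sigma_12^T Sigma_1^{-1} X^{[1]}.
  mu21 = mu2 - S12.T @ np.linalg.inv(S1) @ mu1
  Sig21 = S2 - S12.T @ np.linalg.inv(S1) @ S12
  print(mu21)
  print(Sig21)

  # Equations 28-29: conditional distribution of X^{[1]} given X^{[2]} = x2 (illustrative x2).
  x2 = np.array([2.5, 3.5])
  mu_cond = mu1 + S12 @ np.linalg.inv(S2) @ (x2 - mu2)
  Sig_cond = S1 - S12 @ np.linalg.inv(S2) @ S12.T
  print(mu_cond)
  print(Sig_cond)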
  • 85. Summary Here, we have discussed Different types of multivariate and high-dimensional problems. Three-dimensional visualisation of features from the HIV and Iris flower data sets. Data visualisation using vertical parallel coordinate plots. Data visualisation using horizontal parallel coordinate plots. Differentiation between population cases and sample cases. Population mean, population covariance matrix, sample mean and sample covariance matrix of multivariate random vectors. Population mean, population covariance matrix, sample mean and sample covariance matrix of Gaussian random vectors. Parameters and properties of marginal and conditional normal distributions. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 85 / 86
  • 86. For Further Reading: I. Koch, Analysis of Multivariate and High-Dimensional Data (Vol. 32), Cambridge University Press, 2014; F. Emdad and S. R. Zekavat, High Dimensional Data Analysis: Overview, Analysis and Applications, VDM Verlag, 2008. Dr. Ashutosh Satapathy Multidimensional Data October 19, 2022 86 / 86