LINEAR ALGEBRA
MA204
Problem Definition
Suppose we have n individuals and for each individual we measure the
same m variables, i.e. we have n data points, each a point in $\mathbb{R}^m$.
For example, we have 120 students in a class and we have measured
the following for each student:
‘Registration Number’, ‘MA204 End-Sem Marks’, ‘MA204 grade’, i.e. m
= 3 and n = 120 here.
1: Which are the variables that are correlated?
In our example above, we can expect a correlation between the
‘MA204 grade’ and ‘MA204 End-Sem Marks’ variables,
but we would not expect any correlation between the ‘Registration
Number’ and ‘MA204 grade’ of a student.
2: Which variables are the most important in describing the full
dataset?
Some variables are more important than others in describing the dataset,
while some variables contribute no significant information about it.
3: Can the data be visualized in a simpler way?
In our example, are the data points in $\mathbb{R}^3$ essentially clustered
around a plane, or is there an even simpler way of seeing the data?
Linear Algebra Application in Principal Component Analysis
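Let us take a dataset A (3×4), with one row per individual (n = 3) and one column per variable (m = 4):
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix}$$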
Mean and variance:
We know that the mean of n points is given by
$$\mu = \frac{1}{n}\,(a_1 + a_2 + \dots + a_n)$$
and the variance is given by
$$\sigma^2 = \frac{1}{n-1}\left[(a_1-\mu)^2 + (a_2-\mu)^2 + \dots + (a_n-\mu)^2\right]$$
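As a quick check of the two formulas, here is a minimal sketch in plain Python. The sample values are hypothetical and chosen only for illustration; the standard-library `statistics` module is used to confirm that its `variance` also uses the 1/(n−1) factor.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
a = [2.0, 4.0, 4.0, 6.0]
n = len(a)

# Mean: (a1 + a2 + ... + an) / n
mu = sum(a) / n

# Sample variance: squared deviations from the mean, divided by (n - 1)
var = sum((x - mu) ** 2 for x in a) / (n - 1)

# The standard library agrees with the hand-written formulas.
assert mu == statistics.mean(a)
assert abs(var - statistics.variance(a)) < 1e-12

print(mu, var)
```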
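Now we recenter the data so that every variable has mean zero. This is done by subtracting
the mean of each column of A from every entry of that column.
Transposing the result gives the 4×3 matrix B, each of whose rows has mean zero:
$$B = \begin{pmatrix} a_{11}-\mu_1 & a_{21}-\mu_1 & a_{31}-\mu_1 \\ a_{12}-\mu_2 & a_{22}-\mu_2 & a_{32}-\mu_2 \\ a_{13}-\mu_3 & a_{23}-\mu_3 & a_{33}-\mu_3 \\ a_{14}-\mu_4 & a_{24}-\mu_4 & a_{34}-\mu_4 \end{pmatrix}$$
where $\mu_i$ is the mean of the $i$th column of A.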
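The recentering step can be sketched with NumPy as follows. The entries of `A` below are hypothetical, not values from the slides; only the 3×4 shape (3 individuals, 4 variables) follows the text.

```python
import numpy as np

# Hypothetical 3x4 data matrix: 3 individuals (rows), 4 variables (columns).
A = np.array([[1.0, 2.0, 0.0, 5.0],
              [3.0, 2.0, 1.0, 1.0],
              [5.0, 5.0, 2.0, 3.0]])

mu = A.mean(axis=0)   # mean of each column, one value per variable
B = (A - mu).T        # subtract the column means, then transpose to 4x3

print(B.shape)        # (4, 3)
print(B.sum(axis=1))  # every row of B sums to zero (up to rounding)
```

Broadcasting subtracts `mu` from every row of `A`, so no explicit loop over individuals is needed.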
Covariance:
Let us measure how two columns of the data, call them a and b, vary together. Their covariance
tells us how much b varies as a varies:
$$\mathrm{cov}(a, b) = \frac{1}{n-1}\left[(a_1-\mu_a)(b_1-\mu_b) + (a_2-\mu_a)(b_2-\mu_b) + \dots + (a_n-\mu_a)(b_n-\mu_b)\right]$$
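The covariance formula translates directly into a few lines of Python. This is a minimal sketch: the function name `cov` and the marks and grade-point values are hypothetical and used only for illustration.

```python
def cov(a, b):
    """Sample covariance of two equal-length sequences, with the 1/(n-1) factor."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    return sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / (n - 1)

# Hypothetical values for two variables measured on four students.
marks = [35.0, 50.0, 65.0, 80.0]
grade_points = [5.0, 6.0, 8.0, 9.0]

c = cov(marks, grade_points)   # positive: the two variables rise together
v = cov(marks, marks)          # covariance of a variable with itself is its variance
print(c, v)
```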
Now, let S be defined as
$$S = \frac{1}{n-1}\, B B^T$$
Clearly S is a symmetric matrix, since $(BB^T)^T = BB^T$.
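Now,
$$S_{ii} = \frac{1}{n-1}\left[(a_{1i}-\mu_i)^2 + (a_{2i}-\mu_i)^2 + (a_{3i}-\mu_i)^2\right]$$
and
$$S_{ij} = \frac{1}{n-1}\left[(a_{1i}-\mu_i)(a_{1j}-\mu_j) + (a_{2i}-\mu_i)(a_{2j}-\mu_j) + (a_{3i}-\mu_i)(a_{3j}-\mu_j)\right]$$
Clearly $S_{ii}$ represents the variance of the $i$th variable and $S_{ij}$ represents the covariance of the $i$th
and $j$th variables.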
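A minimal NumPy sketch of the covariance matrix follows, again with hypothetical entries for the 3×4 data matrix. It builds S from the centered matrix and compares it against `np.cov`, which also uses the 1/(n−1) normalization by default.

```python
import numpy as np

# Hypothetical 3x4 data matrix: 3 individuals (rows), 4 variables (columns).
A = np.array([[1.0, 2.0, 0.0, 5.0],
              [3.0, 2.0, 1.0, 1.0],
              [5.0, 5.0, 2.0, 3.0]])
n = A.shape[0]

B = (A - A.mean(axis=0)).T   # centered data, 4x3 (one row per variable)
S = B @ B.T / (n - 1)        # 4x4 covariance matrix

# S is symmetric, and its diagonal holds the per-variable variances.
assert np.allclose(S, S.T)
assert np.allclose(np.diag(S), A.var(axis=0, ddof=1))

# rowvar=False tells NumPy that columns of A are the variables.
assert np.allclose(S, np.cov(A, rowvar=False))
print(S)
```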
Spectral Theorem:
If A is a real n×n symmetric matrix (meaning $A = A^T$), then A is orthogonally diagonalizable and has only real
eigenvalues. In other words, there exist real numbers $\lambda_1, \dots, \lambda_n$ (the eigenvalues) and mutually orthogonal,
non-zero real vectors $u_1, \dots, u_n$ (the eigenvectors) such that for each i = 1, 2, …, n: $A u_i = \lambda_i u_i$
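The theorem can be checked numerically on a small case. The sketch below uses a hypothetical 2×2 symmetric matrix `M` and `np.linalg.eigh`, NumPy's eigen-solver for symmetric matrices, which returns the eigenvalues in ascending order and orthonormal eigenvectors as columns.

```python
import numpy as np

# Hypothetical symmetric matrix, chosen only for illustration.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(M)   # lam: real eigenvalues, U: eigenvectors as columns

# The eigenvectors are orthonormal: U^T U = I.
assert np.allclose(U.T @ U, np.eye(2))

# Each column satisfies M u_i = lam_i u_i.
assert np.allclose(M @ U, U * lam)

# Orthogonal diagonalization: M = U diag(lam) U^T.
assert np.allclose(U @ np.diag(lam) @ U.T, M)
print(lam)
```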
For any real matrix A, the matrices $AA^T$ and $A^TA$ share the same non-zero eigenvalues, and the
eigenvalues of $AA^T$ and $A^TA$ are non-negative numbers.
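A short derivation of both claims may help. It is the standard argument, sketched here for an arbitrary real matrix A; the symbols $v$ and $\lambda$ (an eigenvector of $A^TA$ and its eigenvalue) are introduced only for this sketch.

```latex
% Non-negativity: let A^T A v = \lambda v with v \neq 0.
\lambda \lVert v \rVert^2 = v^T (A^T A v) = (Av)^T (Av) = \lVert Av \rVert^2 \ge 0
\quad\Longrightarrow\quad \lambda \ge 0 .

% Shared non-zero eigenvalues: if \lambda \neq 0, multiply A^T A v = \lambda v by A.
(A A^T)(Av) = A\,(A^T A v) = \lambda\,(Av),
\qquad Av \neq 0 \ \text{because}\ A^T (Av) = \lambda v \neq 0 .
```

So every non-zero eigenvalue of $A^TA$ is also an eigenvalue of $AA^T$, with eigenvector $Av$. Swapping the roles of $A$ and $A^T$ gives the converse.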
From the Spectral theorem, we can orthogonally diagonalize S, as it is a symmetric
matrix. Let the eigenvalues of S, ordered from largest to smallest, be $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4 \ge 0$ and the corresponding
orthonormal eigenvectors be $u_1, u_2, u_3, u_4$. These eigenvectors are called the
principal components of the dataset.
The trace of S, denoted T, is the sum of its diagonal elements, which in turn is the
sum of the variances of all the variables and hence is the total variance.
The trace of a matrix is also equal to the sum of its eigenvalues, so $T = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4$.
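The eigen-decomposition of S and the trace identity can be sketched as follows. The data matrix is hypothetical. `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed here to put the largest first.

```python
import numpy as np

# Hypothetical 3x4 data matrix: 3 individuals (rows), 4 variables (columns).
A = np.array([[1.0, 2.0, 0.0, 5.0],
              [3.0, 2.0, 1.0, 1.0],
              [5.0, 5.0, 2.0, 3.0]])
n = A.shape[0]

B = (A - A.mean(axis=0)).T   # centered data, 4x3
S = B @ B.T / (n - 1)        # 4x4 covariance matrix

lam, U = np.linalg.eigh(S)       # ascending eigenvalues, orthonormal columns
lam, U = lam[::-1], U[:, ::-1]   # reorder so the largest eigenvalue comes first

T = np.trace(S)                  # total variance: sum of the diagonal of S

assert np.isclose(T, lam.sum())            # trace equals sum of eigenvalues
assert np.allclose(U.T @ U, np.eye(4))     # principal components are orthonormal
assert np.all(lam > -1e-9)                 # eigenvalues are non-negative up to rounding
print(T, lam)
```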
The following interpretation is fundamental to PCA:
• The direction in $\mathbb{R}^m$ given by $u_1$ (the first principal direction)
“explains” or “accounts for” an amount $\lambda_1$ of the total variance T,
that is, the fraction $\lambda_1/T$. Similarly, the
second principal direction $u_2$ accounts for the fraction $\lambda_2/T$ of the
total variance, and so on.
• Thus, the vector $u_1 \in \mathbb{R}^m$ points in the most “significant”
direction of the data set.
• Among directions that are orthogonal to $u_1$, the vector $u_2$ points in the most
significant direction. Among directions that are orthogonal to both $u_1$ and $u_2$,
the vector $u_3$ points in the most significant direction, and so on.
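The interpretation above can be illustrated numerically. This is a minimal sketch with a hypothetical data matrix: it computes the fractions $\lambda_i/T$ and checks that the variance of the data projected onto $u_1$ equals $\lambda_1$.

```python
import numpy as np

# Hypothetical 3x4 data matrix: 3 individuals (rows), 4 variables (columns).
A = np.array([[1.0, 2.0, 0.0, 5.0],
              [3.0, 2.0, 1.0, 1.0],
              [5.0, 5.0, 2.0, 3.0]])
n = A.shape[0]

B = (A - A.mean(axis=0)).T   # centered data, 4x3
S = B @ B.T / (n - 1)        # covariance matrix

lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]   # largest eigenvalue first

T = np.trace(S)
fractions = lam / T              # share of total variance per principal direction

u1 = U[:, 0]                     # first principal direction
scores = B.T @ u1                # coordinate of each individual along u1
proj_var = np.var(scores, ddof=1)

assert np.isclose(fractions.sum(), 1.0)
assert np.isclose(proj_var, lam[0])   # variance along u1 is exactly lambda_1
print(fractions)
```

With only three individuals the centered data has rank at most 2, so in this toy case the last two fractions are zero up to rounding.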
Dimension reduction:
It is often the case that the largest few eigenvalues of S are much greater
than all the others. For instance, suppose m = 10, the total variance T = 100,
and $\lambda_1 = 90.5$, $\lambda_2 = 8.9$ and $\lambda_3, \dots, \lambda_{10}$ are all less than 0.1. This means that
the first and the second principal directions explain 99.4 percent of the
total variation in the data. Thus, even though our data points might
form a cloud in $\mathbb{R}^{10}$ (which seems impossible to visualize), PCA tells us
that these points cluster near a two-dimensional plane (spanned by $u_1$
and $u_2$). In fact, the data points will look something like a rectangular strip
inside that plane, since $\lambda_1$ is a lot bigger than $\lambda_2$.
We have effectively reduced the problem from ten dimensions
down to two.
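The reduction from ten dimensions to two can be sketched on synthetic data. Everything below is an assumption made for illustration: the points are generated near a random plane in $\mathbb{R}^{10}$ with variances of roughly 90.5 and 8.9 along two directions plus small noise, loosely mimicking the numbers quoted above. Here the rows of `X` are individuals, so the covariance matrix is written as `Xc.T @ Xc / (n - 1)`, which is the same as $BB^T/(n-1)$ with $B$ the transpose of `Xc`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 10

# Random orthonormal basis (10x2) for a hypothetical plane in R^10.
Q, _ = np.linalg.qr(rng.normal(size=(m, 2)))

# Latent 2-D coordinates with variances about 90.5 and 8.9, plus small noise.
latent = rng.normal(size=(n, 2)) * np.sqrt([90.5, 8.9])
X = latent @ Q.T + 0.2 * rng.normal(size=(n, m))

Xc = X - X.mean(axis=0)          # center each variable
S = Xc.T @ Xc / (n - 1)          # 10x10 covariance matrix

lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]   # largest eigenvalue first

top2 = lam[:2].sum() / lam.sum() # fraction of variance explained by u1 and u2
Y = Xc @ U[:, :2]                # each point reduced to 2 coordinates

quoted = (90.5 + 8.9) / 100.0    # the 99.4 percent figure from the text
print(Y.shape, round(top2, 3), quoted)
```

Plotting the two columns of `Y` against each other would show the elongated strip described above, since the spread along the first coordinate is much larger than along the second.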
