DATA MINING
LECTURE 8
Dimensionality Reduction
PCA -- SVD
Subrata Kumer Paul
Assistant Professor, Dept. of CSE, BAUET
sksubrata96@gmail.com
The curse of dimensionality
• Real data usually have thousands, or millions of
dimensions
• E.g., web documents, where the dimensionality is the
vocabulary of words
• Facebook graph, where the dimensionality is the
number of users
• Huge number of dimensions causes problems
• Data becomes very sparse, and some algorithms become
meaningless (e.g., density-based clustering)
• The complexity of several algorithms depends on the
dimensionality and they become infeasible.
Dimensionality Reduction
• Usually the data can be described with fewer
dimensions, without losing much of the meaning
of the data.
• The data reside in a space of lower dimensionality
• Essentially, we assume that some of the data is
noise, and we can approximate the useful part
with a lower dimensionality space.
• Dimensionality reduction does not just reduce the
amount of data, it often brings out the useful part of the
data
Dimensionality Reduction
• We have already seen a form of dimensionality
reduction
• LSH and random projections reduce the
dimension while approximately preserving the distances
Data in the form of a matrix
• We are given n objects and d attributes describing
the objects. Each object has d numeric values
describing it.
• We will represent the data as an n×d real matrix A.
• We can now use tools from linear algebra to process the
data matrix
• Our goal is to produce a new n×k matrix B such that
• It preserves as much of the information in the original matrix
A as possible
• It reveals something about the structure of the data in A
Example: Document matrices
Rows: n documents. Columns: d terms (e.g., theorem, proof, etc.)
Aij = frequency of the j-th term in the i-th document
Find subsets of terms that bring documents
together
Example: Recommendation systems
Rows: n customers. Columns: d movies
Aij = rating of the j-th movie by the i-th customer
Find subsets of movies that capture the behavior of the customers
Linear algebra
• We assume that vectors are column vectors.
• We use 𝑣𝑇 for the transpose of vector 𝑣 (row vector)
• Dot product: 𝑢𝑇𝑣 (1×n · n×1 → 1×1)
• The dot product is the projection of vector 𝑣 on 𝑢 (and vice versa)
• [1, 2, 3] · [4, 1, 2]𝑇 = 12
• 𝑢𝑇𝑣 = ||𝑣|| ||𝑢|| cos(𝑢, 𝑣)
• If ||𝑢|| = 1 (unit vector) then 𝑢𝑇𝑣 is the projection length of 𝑣 on 𝑢
• [−1, 2, 3] · [4, −1, 2]𝑇 = 0: orthogonal vectors
• Orthonormal vectors: two unit vectors that are orthogonal
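To make these dot-product facts concrete, here is a small NumPy sketch (my own illustration, not part of the slides) that reproduces the two computations above and the projection-length interpretation.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 1, 2])
print(u @ v)                 # 4 + 2 + 6 = 12

w = np.array([-1, 2, 3])
print(w @ v)                 # -4 - 2 + 6 = 0, so w and v are orthogonal

# Projection length of v on the direction of u (u normalized to a unit vector)
u_hat = u / np.linalg.norm(u)
print(u_hat @ v)
```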
Matrices
• An n×m matrix A is a collection of n row vectors and m column vectors
𝐴 = [𝑎1 𝑎2 𝑎3] (written by columns), or 𝐴 = [𝛼1𝑇 ; 𝛼2𝑇 ; 𝛼3𝑇] (written by rows)
• Matrix-vector multiplication
• Right multiplication 𝐴𝑢: projection of 𝑢 onto the row vectors of 𝐴, or
projection of the row vectors of 𝐴 onto 𝑢.
• Left multiplication 𝑢𝑇𝐴: projection of 𝑢 onto the column vectors of 𝐴, or
projection of the column vectors of 𝐴 onto 𝑢
• Example:
[1, 2, 3] ·
1 0
0 1
0 0
= [1, 2]
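A brief NumPy sketch (illustrative, reusing the vector and matrix from the example above) of left and right matrix-vector multiplication.

```python
import numpy as np

u = np.array([1, 2, 3])
A = np.array([[1, 0],
              [0, 1],
              [0, 0]])

print(u @ A)            # left multiplication u^T A -> [1 2]

v = np.array([1, 2])
print(A @ v)            # right multiplication A v -> [1 2 0]
```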
Rank
• Row space of A: The set of vectors that can be
written as a linear combination of the rows of A
• All vectors of the form 𝑣 = 𝑢𝑇𝐴
• Column space of A: The set of vectors that can be
written as a linear combination of the columns of A
• All vectors of the form 𝑣 = 𝐴𝑢.
• Rank of A: the number of linearly independent row (or
column) vectors
• These vectors define a basis for the row (or column) space
of A
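The rank can also be checked numerically. The sketch below is my own toy example, using numpy.linalg.matrix_rank; the matrices are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, -1],
              [2, 4, -2],
              [3, 6, -3]])
print(np.linalg.matrix_rank(A))   # 1: every row is a multiple of [1, 2, -1]

B = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0]])
print(np.linalg.matrix_rank(B))   # 2: the third row is the sum of the first two
```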
Rank-1 matrices
• In a rank-1 matrix, all columns (or rows) are
multiples of the same column (or row) vector
𝐴 =
1 2 −1
2 4 −2
3 6 −3
• All rows are multiples of the row vector 𝑟𝑇 = [1, 2, −1]
• All columns are multiples of the column vector 𝑐 = [1, 2, 3]𝑇
• Outer product: 𝑢𝑣𝑇 (n×1 · 1×m → n×m)
• The resulting n×m matrix has rank 1: all rows (and all columns) are
linearly dependent
• 𝐴 = 𝑐𝑟𝑇
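The outer-product construction can be verified directly. The following sketch (assuming NumPy, with the prototypes c and r from the example above) rebuilds the rank-1 matrix A.

```python
import numpy as np

c = np.array([1, 2, 3])        # column prototype
r = np.array([1, 2, -1])       # row prototype
A = np.outer(c, r)             # outer product c r^T

print(A)                           # [[1 2 -1], [2 4 -2], [3 6 -3]]
print(np.linalg.matrix_rank(A))    # 1
```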
Eigenvectors
• (Right) Eigenvector of matrix A: a vector v such
that 𝐴𝑣 = 𝜆𝑣
• 𝜆: eigenvalue of eigenvector 𝑣
• A symmetric square matrix A of rank r has r orthonormal
eigenvectors 𝑢1, 𝑢2, … , 𝑢𝑟 with non-zero eigenvalues
𝜆1, 𝜆2, … , 𝜆𝑟.
• Eigenvectors define an orthonormal basis for the
column space of A
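A small sketch of this statement for a symmetric matrix, using numpy.linalg.eigh (my own toy example, not from the lecture): the eigenvectors come back orthonormal and satisfy Av = λv.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric matrix
vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
print(vals)                             # [1. 3.]

v = vecs[:, 1]                          # eigenvector for lambda = 3
print(np.allclose(A @ v, vals[1] * v))          # True: A v = lambda v
print(np.allclose(vecs.T @ vecs, np.eye(2)))    # True: columns are orthonormal
```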
Singular Value Decomposition
𝐴 = 𝑈 Σ 𝑉𝑇 = [𝑢1, 𝑢2, ⋯, 𝑢𝑟] · diag(𝜎1, 𝜎2, …, 𝜎𝑟) · [𝑣1𝑇 ; 𝑣2𝑇 ; ⋮ ; 𝑣𝑟𝑇]
[n×m] = [n×r] [r×r] [r×m], where r is the rank of matrix A
• 𝜎1 ≥ 𝜎2 ≥ ⋯ ≥ 𝜎𝑟: singular values of matrix 𝐴 (also the square roots of the
eigenvalues of 𝐴𝐴𝑇 and 𝐴𝑇𝐴)
• 𝑢1, 𝑢2, …, 𝑢𝑟: left singular vectors of 𝐴 (also eigenvectors of 𝐴𝐴𝑇)
• 𝑣1, 𝑣2, …, 𝑣𝑟: right singular vectors of 𝐴 (also eigenvectors of 𝐴𝑇𝐴)
𝐴 = 𝜎1𝑢1𝑣1𝑇 + 𝜎2𝑢2𝑣2𝑇 + ⋯ + 𝜎𝑟𝑢𝑟𝑣𝑟𝑇
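As a sanity check, the decomposition can be computed with numpy.linalg.svd; the matrix below is a made-up example, not from the lecture.

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0],
              [3.0, 6.0, -3.0],
              [0.0, 1.0,  1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                       # singular values, in decreasing order
print(np.allclose(A, U @ np.diag(s) @ Vt))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))         # True: left singular vectors are orthonormal
```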
Symmetric matrices
• Special case: A is symmetric positive definite
matrix
𝐴 = 𝜆1𝑢1𝑢1𝑇 + 𝜆2𝑢2𝑢2𝑇 + ⋯ + 𝜆𝑟𝑢𝑟𝑢𝑟𝑇
• 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑟 ≥ 0: Eigenvalues of A
• 𝑢1, 𝑢2, … , 𝑢𝑟: Eigenvectors of A
Singular Value Decomposition
• The left singular vectors are an orthonormal basis
for the column space of A.
• The right singular vectors are an orthonormal
basis for the row space of A.
• If A has rank r, then A can be written as the sum
of r rank-1 matrices
• There are r “linear components” (trends) in A.
• Linear trend: the tendency of the row vectors of A to align
with vector v
• Strength of the i-th linear trend: ||𝐴𝒗𝒊|| = 𝝈𝒊
An (extreme) example
• Document-term matrix
• Blue and Red rows (columns) are linearly dependent
• There are two prototype documents (vectors of words): blue
and red
• To describe the data, it is enough to describe the two prototypes, and the
projection weights for each row
• A is a rank-2 matrix
𝐴 = [𝑤1, 𝑤2] · [𝑑1𝑇 ; 𝑑2𝑇]
A (more realistic) example
• Document-term matrix
• There are two prototype documents and words but
they are noisy
• We now have more than two singular vectors, but the
strongest ones are still about the two types.
• By keeping the two strongest singular vectors we obtain most
of the information in the data.
• This is a rank-2 approximation of the matrix A
Rank-k approximations (Ak)
𝐴𝑘 = 𝑈𝑘 Σ𝑘 𝑉𝑘𝑇
[n×d] ≈ [n×k] [k×k] [k×d]
Uk (Vk): orthogonal matrix containing the top k left (right)
singular vectors of A.
Σk: diagonal matrix containing the top k singular values of A.
Ak is an approximation of A, and in fact the best approximation of A among all rank-k matrices.
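A minimal sketch (illustrative, with random data) of forming A_k from the top-k singular vectors and values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 40))                 # n x d data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # [n x k][k x k][k x d]
print(A_k.shape)                               # (100, 40), same shape as A
print(np.linalg.matrix_rank(A_k))              # 5
```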
SVD as an optimization
• The rank-k approximation matrix 𝐴𝑘 produced by
the top-k singular vectors of A minimizes the
Frobenius norm of the difference with the matrix A:
𝐴𝑘 = arg min_{𝐵: rank(𝐵) = 𝑘} ||𝐴 − 𝐵||𝐹²
where ||𝐴 − 𝐵||𝐹² = Σ𝑖,𝑗 (𝐴𝑖𝑗 − 𝐵𝑖𝑗)²
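The optimality statement has a useful corollary: the Frobenius error of A_k equals the sum of the discarded squared singular values. The sketch below (random data, my own illustration) checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 30))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k, 'fro') ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))   # True: ||A - A_k||_F^2 = sum of discarded sigma^2
```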
What does this mean?
• We can project the row (and column) vectors of
the matrix A into a k-dimensional space and
preserve most of the information
• (Ideally) The k dimensions reveal latent
features/aspects/topics of the term (document)
space.
• (Ideally) The 𝐴𝑘 approximation of matrix A
contains all the useful information, and what is
discarded is noise
Latent factor model
• Rows (columns) are linear combinations of k
latent factors
• E.g., in our extreme document example there are two
factors
• Some noise is added to this rank-k matrix
resulting in higher rank
• SVD retrieves the latent factors (hopefully).
(Figure: 𝐴 = 𝑈 Σ 𝑉𝑇 for an objects × features matrix; the leading components are significant, the remaining components are noise.)
SVD and Rank-k approximations
Application: Recommender systems
• Data: Users rating movies
• Sparse and often noisy
• Assumption: There are k basic user profiles, and
each user is a linear combination of these profiles
• E.g., action, comedy, drama, romance
• Each user is a weighted combination of these profiles
• The “true” matrix has rank k
• What we observe is a noisy, and incomplete version
of this matrix 𝐴
• The rank-k approximation 𝐴𝑘 of the observed matrix is provably close to the true rank-k matrix
• Algorithm: compute 𝐴𝑘 and predict for user 𝑢 and
movie 𝑚, the value 𝐴𝑘[𝑚, 𝑢].
• Model-based collaborative filtering
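A minimal sketch of this idea (my own simplification, not the lecture's full algorithm): missing ratings are marked with 0 and filled with the user's mean, which is just one possible choice, before taking the rank-k approximation and reading off a prediction.

```python
import numpy as np

ratings = np.array([[5., 4., 0., 1.],    # rows: users, columns: movies; 0 = not rated (assumption)
                    [4., 5., 1., 0.],
                    [0., 1., 5., 4.],
                    [1., 0., 4., 5.]])
observed = ratings > 0
user_mean = ratings.sum(axis=1) / observed.sum(axis=1)
filled = np.where(observed, ratings, user_mean[:, None])   # fill missing entries

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]                # rank-k "true" profile matrix

u, m = 0, 2
print(A_k[u, m])     # predicted rating of user u for movie m
```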
SVD and PCA
• PCA is a special case of SVD, applied to the centered
data matrix, from which the covariance matrix is computed.
Covariance matrix
• Goal: reduce the dimensionality while preserving the
“information in the data”
• Information in the data: variability in the data
• We measure variability using the covariance matrix.
• Sample covariance of variables X and Y: Σ𝑖 (𝑥𝑖 − 𝜇𝑋)(𝑦𝑖 − 𝜇𝑌)
• Given matrix A, remove the mean of each column
from the column vectors to get the centered matrix C
• The matrix 𝑉 = 𝐶𝑇𝐶 is the covariance matrix of the
row vectors of A.
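A short sketch (illustrative) of the centering step and the matrix CᵀC; note that the usual statistical definition of covariance also divides by n−1.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 3))

C = A - A.mean(axis=0)      # remove the mean of each column
V = C.T @ C                 # covariance matrix of the rows (up to a 1/(n-1) factor)

print(np.allclose(C.mean(axis=0), 0))   # True: columns of C have zero mean
print(np.allclose(V, V.T))              # True: the covariance matrix is symmetric
```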
PCA: Principal Component Analysis
• We will project the rows of matrix A into a new set
of attributes (dimensions) such that:
• The attributes have zero covariance to each other (they
are orthogonal)
• Each attribute captures the most remaining variance in
the data, while orthogonal to the existing attributes
• The first attribute should capture the most variance in the data
• For matrix C, the variance of the rows of C when
projected onto vector x is given by 𝜎² = ||𝐶𝑥||²
• The first right singular vector of C maximizes 𝜎²!
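A sketch (with synthetic data of my own) checking that the first right singular vector of C attains the maximal projected variance σ1², and that random unit directions do no better.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0],
                                          [0.0, 0.5]])     # correlated 2-d points
C = A - A.mean(axis=0)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
v1 = Vt[0]                                                 # first right singular vector
print(np.isclose(np.linalg.norm(C @ v1) ** 2, s[0] ** 2))  # True: projected variance = sigma_1^2

for _ in range(5):                       # no random unit direction exceeds sigma_1^2
    x = rng.normal(size=2)
    x /= np.linalg.norm(x)
    assert np.linalg.norm(C @ x) ** 2 <= s[0] ** 2 + 1e-8
```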
PCA
Input: 2-dimensional points
Output:
• 1st (right) singular vector: direction of maximal variance.
• 2nd (right) singular vector: direction of maximal variance,
after removing the projection of the data along the first singular vector.
Singular values
• 𝜎1: measures how much of the data variance is explained by the first singular vector.
• 𝜎2: measures how much of the data variance is explained by the second singular vector.
Singular values tell us something about
the variance
• The variance in the direction of the k-th principal
component is given by the corresponding squared singular value 𝜎𝑘²
• Singular values can be used to estimate how many
components to keep
• Rule of thumb: keep enough to explain 85% of the
variation:
(𝜎1² + 𝜎2² + ⋯ + 𝜎𝑘²) / (𝜎1² + 𝜎2² + ⋯ + 𝜎𝑛²) ≥ 0.85
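A small sketch (random data, illustrative only) of applying this rule of thumb to pick k.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(100, 20))
C = A - A.mean(axis=0)
s = np.linalg.svd(C, compute_uv=False)        # singular values only

explained = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(explained, 0.85)) + 1  # smallest k with >= 85% of the variance explained
print(k, explained[k - 1])
```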
Example
• Data: rows are students, columns are drugs (legal and illegal); 𝑎𝑖𝑗: usage of student i of drug j
𝐴 =
𝑎11 ⋯ 𝑎1𝑛
⋮ ⋱ ⋮
𝑎𝑛1 ⋯ 𝑎𝑛𝑛
𝐴 = 𝑈Σ𝑉𝑇
• First right singular vector 𝑣1
• More or less the same weight to all drugs
• Discriminates heavy from light users
• Second right singular vector
• Positive values for legal drugs, negative values for illegal drugs
Another property of PCA/SVD
• The chosen vectors minimize the sum of squared
differences between the data vectors and their low-dimensional
projections.
Application
• Latent Semantic Indexing (LSI):
• Apply PCA on the document-term matrix, and index the
k-dimensional vectors
• When a query comes, project it onto the k-dimensional
space and compute cosine similarity in this space
• Principal components capture main topics, and enrich
the document representation
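A toy LSI-style sketch (the term-document matrix and the query are made up for illustration): documents and the query are projected onto the top-k right singular vectors and compared by cosine similarity in that space.

```python
import numpy as np

A = np.array([[2., 1., 0., 0.],      # rows: documents, columns: terms (toy data)
              [1., 2., 0., 0.],
              [0., 0., 1., 2.],
              [0., 0., 2., 1.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
docs_k = A @ Vt[:k].T                # documents in the k-dimensional latent space

q = np.array([1., 1., 0., 0.])       # a query over the same term vocabulary (assumed)
q_k = q @ Vt[:k].T                   # project the query into the same space

cos = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
print(cos)                           # the first two documents score highest
```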
SVD is “the Rolls-Royce and the Swiss
Army Knife of Numerical Linear
Algebra.”*
*Dianne O’Leary, MMDS ’06
