Lunch and Learn at Data Science for Social Good
NOVA-SBE and University of Chicago
By Manas Gaur
Singular Value Decomposition
• Singular Value Decomposition (SVD) is closely related to spectral (eigen) decomposition and is sometimes loosely called by that name
• Instead of using two coordinates $(x, y)$ to describe point locations, let's use only one coordinate $z$
• A point's position is its location along the vector $v_1$, the first right singular vector
• How to choose $v_1$? Minimize the reconstruction error
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Singular Value Decomposition
• Goal: minimize the sum of reconstruction errors:
$$\sum_{i=1}^{N} \sum_{j=1}^{D} \lVert x_{ij} - z_{ij} \rVert^2$$
• where $x_{ij}$ are the "old" and $z_{ij}$ are the "new" coordinates
• SVD gives the 'best' axis to project on, where 'best' means minimizing the sum of reconstruction errors
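The choice of $v_1$ can be illustrated with a small sketch. The 2-D points below are made up for illustration, and `reconstruction_error` is a hypothetical helper, not something from the slides. The sketch projects each point onto a candidate direction, maps it back, and sums the squared differences. The first right singular vector should do at least as well as any other direction through the origin.

```python
import numpy as np

# Hypothetical 2-D points lying roughly along one direction (illustrative data).
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])

# Rows of Vt are the right singular vectors; the first row is v1.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]


def reconstruction_error(X, v):
    """Sum of squared differences between points and their projection onto v."""
    v = v / np.linalg.norm(v)   # make sure v is a unit vector
    z = X @ v                   # one new coordinate per point
    X_hat = np.outer(z, v)      # points mapped back into the original space
    return float(np.sum((X - X_hat) ** 2))


err_v1 = reconstruction_error(X, v1)
err_x_axis = reconstruction_error(X, np.array([1.0, 0.0]))
print(err_v1, err_x_axis)
```

Note that this uses the raw (uncentered) data, so the candidate axes all pass through the origin.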
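Singular Value Decomposition
• $A = U \Sigma V^T$ - example:
• U: "user-to-concept" matrix
• V: "movie-to-concept" matrix
$$
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
3 & 3 & 3 & 0 & 0 \\
4 & 4 & 4 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 2 & 0 & 4 & 4 \\
0 & 0 & 0 & 5 & 5 \\
0 & 1 & 0 & 2 & 2
\end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix}
0.13 & 0.02 & -0.01 \\
0.41 & 0.07 & -0.03 \\
0.55 & 0.09 & -0.04 \\
0.68 & 0.11 & -0.05 \\
0.15 & -0.59 & 0.65 \\
0.07 & -0.73 & -0.67 \\
0.07 & -0.29 & 0.32
\end{bmatrix}}_{U}
\times
\underbrace{\begin{bmatrix}
12.4 & 0 & 0 \\
0 & 9.5 & 0 \\
0 & 0 & 1.3
\end{bmatrix}}_{\Sigma}
\times
\underbrace{\begin{bmatrix}
0.56 & 0.59 & 0.56 & 0.09 & 0.09 \\
0.12 & -0.02 & 0.12 & -0.69 & -0.69 \\
0.40 & -0.80 & 0.40 & 0.09 & 0.09
\end{bmatrix}}_{V^T}
$$
[Figure: axes are Movie 1 rating and Movie 2 rating; annotation: variance ('spread') on the v1 axis]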
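The factorization in this example can be reproduced with NumPy. This is a minimal sketch, assuming `numpy.linalg.svd` as the solver (the slides do not say how the numbers were obtained). The signs of individual singular vectors may differ from the slide, since an SVD is only unique up to such sign flips, and the printed singular values may differ slightly from the slide's one-decimal figures.

```python
import numpy as np

# Ratings matrix from the slide: 7 users (rows) by 5 movies (columns).
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the concepts whose singular value is numerically non-zero.
rank = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]

# Multiplying the three factors back together should recover A.
A_rebuilt = U @ np.diag(s) @ Vt

print("singular values:", np.round(s, 2))
print("user-to-concept shape:", U.shape)
print("movie-to-concept shape:", Vt.T.shape)
```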
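Singular Value Decomposition
• $A = U \Sigma V^T$
• $B = U \Sigma_k V^T$, where $\Sigma_k$ keeps only the $k$ largest singular values of $\Sigma$ and sets the rest to zero
• B is the best rank-$k$ approximation of A (smallest Frobenius-norm error)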
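As a rough illustration of the approximation step, the sketch below truncates the SVD of the ratings matrix from the earlier example to two singular values. The function name `low_rank_approx` is hypothetical. For a rank-3 matrix truncated to rank 2, the Frobenius norm of the error should equal the one discarded singular value.

```python
import numpy as np

# Ratings matrix from the earlier slide (7 users by 5 movies).
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)


def low_rank_approx(A, k):
    """Rebuild A from its k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]


B = low_rank_approx(A, 2)
error = np.linalg.norm(A - B)  # Frobenius norm of the difference
print(np.round(B, 2))
print("approximation error:", round(float(error), 3))
```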
How Many Singular Values Should We Retain?
• A useful rule of thumb is to retain enough singular values to make up 90% of the energy in $\Sigma$.
• That is, the sum of the squares of the retained singular values should be at least 90% of the sum of the squares of all the singular values.
• Example: the total energy is $(12.4)^2 + (9.5)^2 + (1.3)^2 = 245.70$, while the retained energy is $(12.4)^2 + (9.5)^2 = 244.01$.
• We have retained over 99% of the energy. However, were we to eliminate the second singular value, 9.5, the retained energy would be only $(12.4)^2 / 245.70$, or about 63%.
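The rule of thumb above can be written as a short helper. This is a sketch in plain Python. The function names and the default 90% threshold parameter are illustrative choices, and the singular values are the rounded ones from the example.

```python
def energy_retained(singular_values, k):
    """Fraction of total energy (sum of squares) kept by the k largest values."""
    ordered = sorted(singular_values, reverse=True)
    total = sum(s * s for s in ordered)
    kept = sum(s * s for s in ordered[:k])
    return kept / total


def smallest_k(singular_values, threshold=0.90):
    """Smallest number of singular values whose energy reaches the threshold."""
    for k in range(1, len(singular_values) + 1):
        if energy_retained(singular_values, k) >= threshold:
            return k
    return len(singular_values)


sigmas = [12.4, 9.5, 1.3]
for k in range(1, len(sigmas) + 1):
    print(k, round(energy_retained(sigmas, k), 4))
print("values to keep for 90% energy:", smallest_k(sigmas))
```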
Relation to Eigen-decomposition
• SVD gives us:
• $A = U \Sigma V^T$
• Eigen-decomposition:
• $A = X \Lambda X^T$
• A is symmetric
• U, V, X are orthonormal ($U^T U = I$)
• $\Lambda$, $\Sigma$ are diagonal
• Now let's calculate:
• $A A^T = U \Sigma V^T (U \Sigma V^T)^T = U \Sigma V^T (V \Sigma^T U^T) = U \Sigma \Sigma^T U^T$
• $A^T A = V \Sigma^T U^T (U \Sigma V^T) = V \Sigma \Sigma^T V^T$
• Both products have the form $X \Lambda^2 X^T$, which shows how to compute the SVD using eigenvalue decomposition!
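The relationship derived above can be checked numerically. The sketch below assumes a random full-column-rank matrix (made up for illustration). It takes the eigendecomposition of $A^T A$ to get $V$ and the squared singular values, then recovers $U$ column by column as $A v_i / \sigma_i$. This is meant as a demonstration of the identity rather than a recommended way to compute an SVD in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))  # hypothetical data; full column rank is assumed

# Reference singular values from the library SVD.
s = np.linalg.svd(A, compute_uv=False)

# Eigendecomposition of the symmetric matrix A^T A (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

# Singular values are the square roots of the eigenvalues of A^T A.
sigma_from_eig = np.sqrt(np.clip(eigvals, 0.0, None))

# Each left singular vector is u_i = A v_i / sigma_i (needs sigma_i > 0).
U = (A @ V) / sigma_from_eig

A_rebuilt = U @ np.diag(sigma_from_eig) @ V.T
print(np.round(sigma_from_eig, 4))
print(np.round(s, 4))
```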
Non-Linear Dimensionality Reduction
Brainstorming
• What is the dimensionality of data?
• What are the degrees of freedom of data?
• Does the data always exist in a high-dimensional space?
• What is the rank of a matrix?
• What motivates non-linear dimensionality reduction?
• Can the MNIST problem, popular in deep learning, be solved by a simple machine learning model?
Why do we need dimensionality reduction?
• You need to visualize the data for non-technical board members who are probably not familiar with terms like cosine similarity.
• You are working under a constraint, such as preserving 80% of the information in the data.
• You need to reduce both the data you have and any new data as it arrives. Which method would you choose?
Non-Linear Dimensionality Reduction
• Consider a low-dimensional surface embedded non-linearly in a high-dimensional space. Such a surface is called a manifold.
• A good way to represent data points is by their low-dimensional coordinates.
• The low-dimensional representation of the data should capture information about high-dimensional pairwise distances.
• Non-linear dimensionality reduction is also called manifold learning.
• Idea: recover the low-dimensional surface.
• Isomap
• Stochastic Neighbor Embedding (SNE)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
NLDR over PCA
[Figure: two panels labeled NLDR and PCA. Source: https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction]
Isomap
• Isomap uses the same core ideas as the MDS algorithm:
• Obtain a matrix of proximities (distances between points in a dataset).
• Convert this distance matrix into a matrix of inner products.
• An eigendecomposition of this matrix gives us the lower-dimensional embedding.
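The MDS steps listed above can be sketched as follows. This is classical MDS only, under the assumption that a distance matrix is already available. Isomap itself would supply shortest-path distances over a nearest-neighbor graph, whereas the example here uses plain Euclidean distances between four made-up points. The function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Embed points from a pairwise distance matrix D via eigendecomposition."""
    n = D.shape[0]
    # Double centering turns squared distances into a matrix of inner products.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecomposition of the symmetric inner-product matrix.
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors by the square roots of their eigenvalues.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Hypothetical example: four corners of a 3-by-4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

Y = classical_mds(D)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.round(D_embedded, 3))
```

The recovered coordinates may be rotated or reflected relative to the originals, but the pairwise distances should match.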
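Stochastic Neighbor Embedding (SNE)
High-dimensional space:
$$P_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
Low-dimensional space (2-D), with $\sigma = 1/\sqrt{2}$ so that $2\sigma^2 = 1$:
$$Q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}$$
Minimization function:
$$KL(P \,\|\, Q) = \sum_{i} \sum_{j} P_{j|i} \log \frac{P_{j|i}}{Q_{j|i}}, \qquad \min_{y_i, y_j} KL(P \,\|\, Q)$$
Cost asymmetry:
1. A large $P_{j|i}$ modeled as a low $Q_{j|i}$ → high cost
2. A small $P_{j|i}$ modeled as a high $Q_{j|i}$ → low cost
SNE versus t-SNE:
1. SNE is not symmetric (in general $P_{j|i} \neq P_{i|j}$), whereas t-SNE uses symmetric probabilities.
2. Symmetry gives t-SNE a simpler gradient, which makes it faster to optimize.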
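The SNE quantities can be computed directly for a small example. This sketch makes several assumptions that are not in the slides: the data are random, every point in the high-dimensional space shares the same fixed $2\sigma_i^2$ (SNE normally tunes each $\sigma_i$), and the low-dimensional points are random rather than optimized. The function names are hypothetical.

```python
import numpy as np


def conditional_probs(Z, two_sigma_sq):
    """Row i holds P(j | i): a softmax over -||z_i - z_j||^2 / (2 sigma_i^2), j != i."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / np.asarray(two_sigma_sq, dtype=float).reshape(-1, 1)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def sne_cost(P, Q):
    """Sum over i of KL(P_i || Q_i), skipping the diagonal entries."""
    mask = ~np.eye(P.shape[0], dtype=bool)
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))   # hypothetical high-dimensional points
Y = rng.normal(size=(5, 2))    # hypothetical (unoptimized) 2-D map points

P = conditional_probs(X, two_sigma_sq=np.full(5, 2.0))  # assumed fixed widths
Q = conditional_probs(Y, two_sigma_sq=1.0)              # sigma = 1/sqrt(2)
cost = sne_cost(P, Q)
print("KL cost:", round(cost, 4))
```

A real implementation would minimize `cost` over `Y`, for example by gradient descent. The sketch also does not guard against probabilities that underflow to zero.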
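T-Stochastic Neighbor Embedding (t-SNE)
High-dimensional space:
$$P_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
which t-SNE symmetrizes as $P_{ij} = (P_{j|i} + P_{i|j}) / 2N$ for $N$ points.
Low-dimensional space (2-D):
$$Q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
1. The Student t-distribution has heavier tails than the Gaussian, so points that are moderately far apart in the high-dimensional space can sit farther apart in the low-dimensional map without a large penalty.
2. There are some heuristics underlying t-SNE.
3. It helps develop an intuition for what is going on in high-dimensional data.
4. It can find structure where other dimensionality-reduction algorithms cannot.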
