LSA & PLSA
INTRODUCTION
Project Report
陳育婷, first-year master's student, Master's Program in Statistics, National Taiwan University
CITATION
LSA:
[1] M. W. Berry, S. T. Dumais, and T. A. Letsche, "Computational methods for intelligent information access," in Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing, IEEE, 1995, p. 20. (cited by 179)
[2] S. T. Dumais, "Latent semantic analysis," Annual Review of Information Science and Technology, vol. 38, no. 1, pp. 188-230, 2004. (cited by 708)
PLSA:
[3] T. Hofmann, "Probabilistic latent semantic indexing," ACM SIGIR Forum, vol. 51, no. 2, pp. 211-218, ACM, 2017. (cited by 5520)
[4] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, no. 1-2, pp. 177-196, 2001. (cited by 2615)
CONTENTS
1. Introduction
2. SVD (singular value decomposition)
3. LSA (latent semantic analysis)
4. PLSA (probabilistic latent semantic analysis)
5. LSA vs. PLSA
INTRODUCTION
• Information retrieval
  • Typical approach: lexical matching between the words in users' requests and the words in, or assigned to, documents in a database.
  • Problem: lexical matching fails because of fundamental characteristics of human word usage. People choose the same keyword to describe well-known objects only 20 percent of the time.
    → People use a wide variety of words to describe the same object or concept (synonymy).
    Ex. "human-computer interaction" vs. "man-machine study"
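A minimal sketch of why pure lexical matching misses synonyms, assuming a bag-of-words representation scored by cosine similarity. The slides do not fix a scoring function; `bag_of_words` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split hyphenated words, and count whitespace-separated tokens."""
    return Counter(text.lower().replace("-", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = bag_of_words("human-computer interaction")
doc = bag_of_words("man-machine study")
print(cosine(query, doc))  # 0.0: no shared terms, so the document is never matched
```

The two phrases describe the same concept, yet their overlap in term space is empty, which is exactly the failure LSA targets.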
INTRODUCTION
• Solutions
  • Stemming: converting words to their morphological root.
    Ex. "retrieving", "retrieval" → retrieve. Limitation: it cannot relate words that are not morphologically related.
  • Controlled vocabulary: requiring query and index terms to belong to a predefined set of terms. Limitation: building that set is a time-consuming manual process.
  • LSA:
    1. A fully unsupervised, automatic statistical approach.
    2. It recovers the latent structure in word usage that is obscured by variability in word choice.
    → SVD: the reduced subspace represents important associative relationships between terms and documents that are not evident in individual documents.
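The stemming idea and its limitation, in a toy suffix-stripping sketch. The suffix list and `toy_stem` are hypothetical simplifications, not the Porter stemmer or any production algorithm.

```python
# Hypothetical toy suffix list, checked in order; real stemmers use far richer rules.
SUFFIXES = ("ing", "al", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Morphological variants collapse to one stem ...
print({toy_stem(w) for w in ["retrieving", "retrieval", "retrieve"]})  # {'retriev'}
# ... but synonyms with different roots stay apart.
print(toy_stem("car"), toy_stem("automobile"))  # car automobil
```

However good the suffix rules, "car" and "automobile" never share a stem, which is why stemming alone does not solve synonymy.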
SVD
• Given any m × n matrix $A$ with rank $r$, it can be factorized as
  $$A = U\Sigma V^T = \sum_{i=1}^{r} u_i \sigma_i v_i^T$$
• $U$: diagonalizes $AA^T$; its columns are orthonormal eigenvectors of $AA^T$.
• $\Sigma$: diagonal matrix of the positive singular values of $A$, which are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$.
• $V$: diagonalizes $A^TA$; its columns are orthonormal eigenvectors of $A^TA$.
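A quick numerical check of these properties, assuming NumPy is available. The 5 × 4 random matrix is arbitrary illustrative data; `np.linalg.svd` with `full_matrices=False` returns the thin SVD with singular values in descending order.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))  # arbitrary 5 x 4 example matrix (full rank with probability 1)

# Thin SVD: U is 5 x 4, s holds the singular values (descending), Vt is 4 x 4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# A = sum_i sigma_i u_i v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))  # True

# sigma_i^2 are the eigenvalues of A^T A (eigvalsh returns them in ascending order).
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s**2, eig_AtA))  # True

# Columns of V are eigenvectors of A^T A, columns of U are eigenvectors of A A^T.
print(np.allclose(A.T @ A @ V, V * s**2))  # True
print(np.allclose(A @ A.T @ U, U * s**2))  # True
```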
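SVD
• Definition (Frobenius norm): $\|A\|_F^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2$, with $\sigma_1^2 \ge \cdots \ge \sigma_r^2$.
  Rank-k truncation:
  $$A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} u_i \sigma_i v_i^T$$
• Theorem:
  $$\min_{\operatorname{rank}(B)=k} \|A - B\|_F^2 = \|A - A_k\|_F^2 = \sigma_{k+1}^2 + \cdots + \sigma_r^2$$
• Proof: multiplying on the left by $U^T$ and on the right by $V$ leaves the Frobenius norm unchanged, so
  $$\min_{\operatorname{rank}(B)=k} \|A - B\|_F^2 = \min_{\operatorname{rank}(B)=k} \|U\Sigma V^T - B\|_F^2 = \min_{\operatorname{rank}(B)=k} \|\Sigma - U^T B V\|_F^2.$$
  Since $\Sigma$ is diagonal, this minimum is attained at $U^T B V = \Sigma_k$, i.e., by keeping the $k$ largest singular values and setting the rest to zero. Therefore $B = A_k = U_k \Sigma_k V_k^T$.
• Use $A_k$ to approximate $A$.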
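A numerical sanity check of the definition and the theorem (the Eckart-Young result), assuming NumPy. The 6 × 5 matrix, the choice k = 2, and the competing random rank-2 matrix `B` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# ||A||_F^2 equals the sum of the squared singular values.
print(np.isclose(np.linalg.norm(A, "fro") ** 2, np.sum(s**2)))  # True

k = 2
# Truncated SVD: keep only the k largest singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Squared approximation error equals sigma_{k+1}^2 + ... + sigma_r^2.
err = np.linalg.norm(A - A_k, "fro") ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))  # True

# Any other matrix of rank k, e.g. a random one, does no better.
B = rng.random((6, k)) @ rng.random((k, 5))
print(np.linalg.norm(A - B, "fro") ** 2 >= err)  # True
```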
LSA (LATENT SEMANTIC ANALYSIS)
1. Term-document matrix: rows are individual words (terms) and columns are documents.
2. Transformed matrix, e.g., TF-IDF = term frequency × inverse document frequency.
3. Dimension reduction: this is where SVD comes into play.
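The three steps as a small NumPy sketch. The five-document corpus is made up for illustration, and the TF-IDF variant used here (raw count × log(N / df)) is one common choice among several; the slides do not specify which weighting to use.

```python
import numpy as np

# Toy corpus (illustrative assumption); each string is one document.
docs = [
    "human computer interaction",
    "computer user interface",
    "user interface system",
    "graph of trees",
    "graph minors trees",
]

# 1. Term-document matrix: rows are terms, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[row[w], j] += 1

# 2. TF-IDF weighting: raw count x log(N / document frequency).
df = np.count_nonzero(counts, axis=1)
idf = np.log(len(docs) / df)
X = counts * idf[:, None]

# 3. Dimension reduction: rank-k truncated SVD.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
term_vecs = U[:, :k] * s[:k]    # terms in k-space (rows of U_k Sigma_k)
doc_vecs = Vt[:k, :].T * s[:k]  # documents in k-space (rows of V_k Sigma_k)
print(term_vecs.shape, doc_vecs.shape)  # (10, 2) (5, 2)
```

Scaling the singular vectors by $\Sigma_k$ is a common convention for placing terms and documents in the same factor space; other weightings are also used in the literature.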
LSA (LATENT SEMANTIC ANALYSIS)
• Since the number of dimensions k is smaller than the number of unique terms m, minor differences are ignored. Terms that occur in similar documents end up near each other in the k-dimensional factor space, so a document that shares no words with a user's query may nonetheless be near it in k-space.
• LSA makes no use of linguistic techniques for analyzing morphological, syntactic, or semantic relations, nor of human-constructed resources such as dictionaries. Its only input is a large amount of text.
• Document-document, term-term, and term-document similarities are computed in the reduced-dimensional approximation to A.
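A sketch of the claim that a document sharing no words with a query can still be close to it in k-space. Assumptions: a made-up three-document corpus, raw counts (no TF-IDF) to keep the numbers small, and the common convention of folding a query vector $q$ into the reduced space as $q^T U_k \Sigma_k^{-1}$, here rescaled by $\Sigma_k$ so it is comparable with the rows of $V_k \Sigma_k$. The helpers `to_vector` and `cos` are hypothetical.

```python
import numpy as np

# Toy corpus and query (illustrative assumptions).
docs = ["human computer", "computer interface", "graph trees"]
query = "human"

vocab = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(vocab)}


def to_vector(text):
    """Count vector of `text` over the corpus vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in row:
            v[row[w]] += 1
    return v


def cos(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


A = np.column_stack([to_vector(d) for d in docs])  # terms x documents
q = to_vector(query)

# Similarity in the original term space: pure lexical matching.
raw_sims = [cos(q, A[:, j]) for j in range(len(docs))]

# Rank-k LSA space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
doc_k = Vk * sk  # documents as rows of V_k Sigma_k
q_k = q @ Uk     # (q^T U_k Sigma_k^{-1}) Sigma_k = q^T U_k
lsa_sims = [cos(q_k, doc_k[j]) for j in range(len(docs))]

print(raw_sims)
print(lsa_sims)
```

Under these assumptions, `raw_sims[1]` is 0 because "human" never occurs in "computer interface", while `lsa_sims[1]` is close to 1: both documents load on the same latent dimension through their shared term "computer". The graph document stays near 0 in both spaces.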
LSA (LATENT SEMANTIC ANALYSIS)
PLSA (PROBABILISTIC LSA)
• An aspect model: a latent variable model that associates an unobserved class variable $z \in Z = \{z_1, \dots, z_k\}$ ($k \ll M, N$) with each observation, namely the occurrence of a word $w \in W = \{w_1, \dots, w_M\}$ in a particular document $d \in D = \{d_1, \dots, d_N\}$. Each observation is generated as follows:
1. select a document $d$ with probability $P(d)$,
2. pick a latent class $z$ with probability $P(z|d)$,
3. generate a word $w$ with probability $P(w|z)$.
→ This yields an observed pair $(d, w)$, while the latent class variable $z$ is discarded.
→ Translating this process into a joint probability model gives
$$P(d, w) = P(d)P(w|d) = P(d)\sum_{z \in Z} P(w|z)P(z|d) = \sum_{z \in Z} P(z)P(w|z)P(d|z),$$
where the last step uses Bayes' rule, $P(d)P(z|d) = P(z)P(d|z)$.
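A sketch of the generative process and of the equality between the two forms of $P(d, w)$, assuming NumPy. All parameter values are randomly drawn hypothetical distributions; `K` plays the role of the number of latent classes $k$, and `sample_pair` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 4, 6, 2  # documents, words, latent classes (illustrative sizes)

# Hypothetical model parameters; every distribution is normalized.
P_z = np.array([0.6, 0.4])                  # P(z)
P_d_given_z = rng.dirichlet(np.ones(N), K)  # K x N, row z = P(d | z)
P_w_given_z = rng.dirichlet(np.ones(M), K)  # K x M, row z = P(w | z)

# Symmetric form: P(d, w) = sum_z P(z) P(w|z) P(d|z), an N x M matrix.
P_dw = np.einsum("k,kn,km->nm", P_z, P_d_given_z, P_w_given_z)

# Asymmetric form: P(d) P(w|d), with P(w|d) = sum_z P(w|z) P(z|d).
P_d = P_z @ P_d_given_z                           # P(d) = sum_z P(z) P(d|z)
P_z_given_d = (P_z[:, None] * P_d_given_z) / P_d  # K x N, Bayes' rule
P_w_given_d = P_z_given_d.T @ P_w_given_z         # N x M
print(np.allclose(P_dw, P_d[:, None] * P_w_given_d))  # True


def sample_pair():
    """Draw d ~ P(d), z ~ P(z|d), w ~ P(w|z); z is then discarded."""
    d = rng.choice(N, p=P_d)
    z = rng.choice(K, p=P_z_given_d[:, d])
    w = rng.choice(M, p=P_w_given_z[z])
    return int(d), int(w)


pairs = [sample_pair() for _ in range(5)]
print(pairs)
```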
PLSA (PROBABILISTIC LSA)
1. $P(z)$, $P(w|z)$, and $P(d|z)$ are determined by maximizing the log-likelihood function
$$\mathcal{L} = \sum_{d \in D} \sum_{w \in W} n(d, w) \log P(d, w),$$
where $n(d, w)$ is the number of times word $w$ occurs in document $d$ (multinomial sampling).
2. The standard procedure for maximum likelihood estimation in latent variable models is the Expectation Maximization (EM) algorithm.
(i) E-step:
$$P(z|d, w) = \frac{P(z)P(w|z)P(d|z)}{\sum_{z'} P(z')P(w|z')P(d|z')}$$
(ii) M-step:
$$P(w|z) = \frac{\sum_{d} n(d, w)P(z|d, w)}{\sum_{d, w'} n(d, w')P(z|d, w')}, \qquad P(d|z) = \frac{\sum_{w} n(d, w)P(z|d, w)}{\sum_{d', w} n(d', w)P(z|d', w)},$$
$$P(z) = \frac{1}{R}\sum_{d, w} n(d, w)P(z|d, w), \qquad R = \sum_{d, w} n(d, w).$$
Alternating (i) with (ii) defines a convergent procedure that approaches a local maximum of the log-likelihood $\mathcal{L}$.
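The E- and M-steps above translate almost line by line into NumPy. This is a minimal sketch, not Hofmann's implementation: `plsa_em`, the random Dirichlet initialization, the fixed iteration count, and the `tiny` guard against 0/0 are all assumptions made for illustration.

```python
import numpy as np


def plsa_em(n_dw, K, iters=100, seed=0):
    """Fit the symmetric PLSA model to an N x M count matrix n(d, w) by EM.

    Returns P(z) (length K), P(d|z) (K x N), P(w|z) (K x M), and the
    log-likelihood L evaluated at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    N, M = n_dw.shape
    tiny = np.finfo(float).tiny  # guards against 0/0 for all-zero rows or columns
    # Random normalized start (an assumption); EM only reaches a local maximum.
    P_z = np.full(K, 1.0 / K)
    P_d_z = rng.dirichlet(np.ones(N), K)
    P_w_z = rng.dirichlet(np.ones(M), K)
    trace = []
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(z) P(w|z) P(d|z); K x N x M array.
        joint = P_z[:, None, None] * P_d_z[:, :, None] * P_w_z[:, None, :]
        P_dw = np.maximum(joint.sum(axis=0), tiny)  # P(d, w)
        post = joint / P_dw
        trace.append(float(np.sum(n_dw * np.log(P_dw))))  # L = sum n(d,w) log P(d,w)
        # M-step: re-estimate from the expected counts n(d,w) P(z|d,w).
        expected = n_dw[None, :, :] * post
        mass = expected.sum(axis=(1, 2))  # sum_{d,w} n(d,w) P(z|d,w)
        P_w_z = expected.sum(axis=1) / mass[:, None]
        P_d_z = expected.sum(axis=2) / mass[:, None]
        P_z = mass / n_dw.sum()  # divide by R = sum_{d,w} n(d,w)
    return P_z, P_d_z, P_w_z, trace


# Toy counts (illustrative): documents 0-1 use words 0-1, documents 2-3 use words 2-3.
counts = np.array([[2, 1, 0, 0],
                   [1, 2, 0, 0],
                   [0, 0, 3, 1],
                   [0, 0, 1, 2]], dtype=float)
P_z, P_d_z, P_w_z, trace = plsa_em(counts, K=2, iters=50)
print(np.round(P_w_z, 2))
```

Because EM never decreases the likelihood, `trace` should be non-decreasing up to floating-point noise; that is a cheap correctness check for any implementation of these updates. The K × N × M posterior array is fine for toy data but would need a sparse formulation for real corpora.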
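PLSA (PROBABILISTIC LSA)
In matrix form ($\mathbf{P} = U\Sigma V^T$):
$$\begin{pmatrix} P(d_1, w_1) & \cdots & P(d_1, w_M) \\ \vdots & \ddots & \vdots \\ P(d_N, w_1) & \cdots & P(d_N, w_M) \end{pmatrix} = \begin{pmatrix} P(d_1|z_1) & \cdots & P(d_1|z_k) \\ \vdots & \ddots & \vdots \\ P(d_N|z_1) & \cdots & P(d_N|z_k) \end{pmatrix} \begin{pmatrix} P(z_1) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P(z_k) \end{pmatrix} \begin{pmatrix} P(w_1|z_1) & \cdots & P(w_M|z_1) \\ \vdots & \ddots & \vdots \\ P(w_1|z_k) & \cdots & P(w_M|z_k) \end{pmatrix}$$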
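A sketch of this SVD-like factorization with randomly drawn, hypothetical parameters, assuming NumPy. It also shows one difference from the true SVD: the PLSA factors are nonnegative probability vectors, not orthonormal vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 5, 7, 3  # documents, words, latent classes (illustrative sizes)

P_z = rng.dirichlet(np.ones(K))         # P(z_1), ..., P(z_K)
U_hat = rng.dirichlet(np.ones(N), K).T  # N x K, U_hat[i, k] = P(d_i | z_k)
V_hat = rng.dirichlet(np.ones(M), K).T  # M x K, V_hat[j, k] = P(w_j | z_k)
Sigma_hat = np.diag(P_z)                # K x K diagonal matrix

P = U_hat @ Sigma_hat @ V_hat.T         # N x M, P[i, j] = P(d_i, w_j)

print(np.isclose(P.sum(), 1.0))  # True: a joint distribution over (d, w)
print(np.all(P >= 0))            # True: all factors are nonnegative
print(np.linalg.matrix_rank(P) <= K)  # True: rank at most K, like a rank-K SVD

# Each column of U_hat is a probability vector with several positive entries,
# so its Euclidean norm is below 1 and U_hat is not orthonormal.
print(np.allclose(U_hat.T @ U_hat, np.eye(K)))  # False
```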
LSA VS PLSA
1. The objective function used to determine the optimal approximation.
LSA: the L2 (Frobenius) norm, which corresponds to an implicit additive Gaussian noise assumption on (possibly transformed) counts.
PLSA: the likelihood function of multinomial sampling, which aims at an explicit maximization of the predictive power of the model.
2. Interpretation of the directions (latent dimensions).
LSA: no obvious interpretation.
PLSA: class-conditional word distributions, each defining a certain topical context.
3. Both LSA and PLSA can be applied to a wide range of tasks beyond information retrieval, e.g., document clustering, literature-based discovery, and modeling human memory.
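The first point can be made concrete by writing the two objectives side by side. For the Gaussian reading of LSA, assume (only for this derivation) i.i.d. noise $a_{ij} = b_{ij} + \varepsilon_{ij}$ with $\varepsilon_{ij} \sim \mathcal{N}(0, s^2)$ on the entries of the m × n matrix $A$:

```latex
\begin{aligned}
\text{LSA:}\quad & \min_{\operatorname{rank}(B)=k} \|A - B\|_F^2
  = \min_{\operatorname{rank}(B)=k} \sum_{i,j} (a_{ij} - b_{ij})^2, \\
& \log \prod_{i,j} \mathcal{N}(a_{ij} \mid b_{ij}, s^2)
  = -\frac{1}{2s^2} \sum_{i,j} (a_{ij} - b_{ij})^2 - \frac{mn}{2}\log(2\pi s^2), \\
\text{PLSA:}\quad & \max \; \mathcal{L}
  = \sum_{d \in D}\sum_{w \in W} n(d, w) \log \sum_{z \in Z} P(z)\, P(w|z)\, P(d|z).
\end{aligned}
```

For a fixed $s^2$, maximizing the Gaussian log-likelihood over $B$ is therefore the same as minimizing $\|A - B\|_F^2$, whereas PLSA maximizes a multinomial likelihood over nonnegative, normalized factors.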
THANKS!
