# Latent Semantic Analysis, Auro Tripathy

LSA overview, covering Singular Value Decomposition, Dimensionality Reduction, and LSA in Information Retrieval.



1. Latent Semantic Analysis. Auro Tripathy, ipserv@yahoo.com
2. Outline
    - Introduction
    - Singular Value Decomposition
    - Dimensionality Reduction
    - LSA in Information Retrieval
3. Latent Semantic Analysis: Introduction
4. A mathematical treatment capable of inferring meaning
    - Measures of word-word, word-passage, and passage-passage relations that correlate well with human judgments of semantic similarity
    - Similarity estimates are NOT based on contiguity frequencies, co-occurrence counts, or usage correlations
    - A mathematical way of inferring deeper relationships; hence "latent"
5. Akin to a well-read nun dispensing sex advice: analysis of text alone
    - Its knowledge does NOT come from perceived information about the physical world, NOT from instinct, NOT from feelings, NOT from emotions
    - Does NOT take into account word order, phrases, syntactic relationships, or logic
    - It takes in large amounts of text and looks for mutual interdependencies in the text
6. Words and Passages
    - LSA represents the meaning of a word as the average of the meaning of all the passages in which it appears...
    - ...and the meaning of a passage as the average of the meaning of the words it contains
7. What is LSA? A mathematical technique for extracting and inferring relations of expected contextual usage of words in documents
8. What LSA is not
    - Not a natural language processing program
    - Not an artificial intelligence program
    - Does NOT use dictionaries or databases
    - Does NOT use syntactic parsers
    - Does NOT use morphologies
    - Takes as input only words and text paragraphs
9. 9. Example Titles of N=9 technical memoranda  Five on human-computer interaction  Four on mathematical graph theory  Disjoint topics Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
10. Sample word-by-document matrix
    - Word selection criterion: the word occurs in at least two of the titles
    - Cell values reflect how much was said about a topic
    - Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
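The matrix construction on slide 10 can be sketched in a few lines. The titles below are toy stand-ins (not the nine Landauer/Foltz/Laham memoranda), and the selection rule is the one the slide states: keep a word only if it occurs in at least two titles.

```python
from collections import Counter

import numpy as np

# Toy titles standing in for the nine technical memoranda.
docs = [
    "human machine interface for computer applications",
    "user opinion of computer response time",
    "graph of paths in trees",
    "intersection graph of paths",
]

doc_words = [d.split() for d in docs]

# Keep only words appearing in at least two titles (a real pipeline
# would usually also drop stopwords such as "of").
doc_freq = Counter(w for words in doc_words for w in set(words))
vocab = sorted(w for w, n in doc_freq.items() if n >= 2)

# Rows are words, columns are documents; cells are raw counts.
M = np.array([[words.count(w) for words in doc_words] for w in vocab])
print(vocab)
print(M)
```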
11. Semantic similarity using the Spearman rank correlation coefficient
    - The correlation between *human* and *user* is negative: Spearman ρ(human, user) = -0.38
    - The correlation between *human* and *minor* is also negative: Spearman ρ(human, minor) = -0.29
    - Expected: the words never appear in the same passage, so there are no co-occurrences
    - http://en.wikipedia.org/wiki/Spearmans_rank_correlation_coefficient
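Slide 11's check can be reproduced with scipy's `spearmanr` on two hypothetical term rows, chosen here so that the two words never share a title (the actual rows are in the Landauer/Foltz/Laham tutorial):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical occurrence counts for two words across nine titles;
# the words never appear in the same title, so the rank correlation
# comes out negative, as the slide expects.
human = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0])
user = np.array([0, 1, 1, 0, 1, 0, 0, 0, 0])

rho, _ = spearmanr(human, user)
print(round(float(rho), 2))
```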
12. Singular Value Decomposition
13. The Term Space (figure: terms plotted against documents). Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
14. The Document Space (figure: documents plotted against terms). Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
15. The Semantic Space: one space for terms and documents
    - Represent terms AND documents in one space
    - Makes it possible to calculate similarities:
        - between documents
        - between terms
        - between terms and documents
16. The Decomposition: M = T S Dᵀ
    - M is the t×d term-by-document matrix; T is t×r, S is r×r, and Dᵀ is r×d
    - Splits the term-document matrix into three matrices
    - The result is a new space, the SVD space, because SVD found new axes along which the terms and documents can be grouped
17. New term vectors, new document vectors, and singular values
    - T contains in its rows the term vectors scaled to a new basis
    - Dᵀ contains the new vectors of the documents
    - S contains the singular values σ1, σ2, ..., σn, where σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0
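Slides 16-17 map directly onto `numpy.linalg.svd`, which returns T, the singular values (already sorted in non-increasing order), and Dᵀ. A minimal sketch with a toy 4×3 term-by-document matrix:

```python
import numpy as np

# Toy t x d term-by-document matrix (t = 4 terms, d = 3 documents).
M = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])

# T is t x r, s holds the r singular values, Dt is r x d.
T, s, Dt = np.linalg.svd(M, full_matrices=False)
S = np.diag(s)

# Singular values satisfy sigma_1 >= sigma_2 >= ... >= sigma_n >= 0.
print(np.round(s, 3))

# The three factors reconstruct M exactly (up to float tolerance).
print(np.allclose(T @ S @ Dt, M))
```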
18. Dimensionality reduction: to reveal the latent semantic structure
19. Reduce to k dimensions: M ≈ T S Dᵀ, with T truncated to t×k, S to k×k, and Dᵀ to k×d
20. Example: term vectors reduced to two dimensions (T, S, and D shown). Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
21. Reconstruction of the original matrix from the reduced dimensions (new matrix vs. original). Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
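The truncation and reconstruction on slides 19-21 amount to keeping the k largest singular values and their vectors and multiplying back. A sketch with the same kind of toy matrix (the slides use the nine-title matrix and k = 2):

```python
import numpy as np

# Toy term-by-document matrix.
M = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
T, s, Dt = np.linalg.svd(M, full_matrices=False)

# Keep only the k largest singular values and their vectors.
k = 2
M_k = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]

# M_k is the best rank-k approximation of M; its entries shift away
# from the raw counts toward the latent structure.
print(np.round(M_k, 2))
```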
22. Recomputed semantic similarity using the Spearman rank correlation coefficient
    - New: Spearman ρ(human, user) = +0.94; Spearman ρ(human, minor) = -0.83
    - Original: Spearman ρ(human, user) = -0.38; Spearman ρ(human, minor) = -0.29
    - The human-user correlation went up and the human-minor correlation went down
23. Correlation between a title and all other titles (raw data)
    - Correlation between the human-computer interaction titles was low: average correlation 0.2, and half the Spearman correlations were 0
    - Correlation among the four graph-theory titles (mx/my) was mixed: average Spearman correlation 0.44
    - Correlation between the human-computer interaction titles and the graph-theory titles was -0.3, despite no semantic overlap
    - Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
24. Correlation in the reduced-dimension (k = 2) space
    - Average correlations jumped from 0.2 to 0.92
    - Correlation between the graph-theory titles (mx/my) was HIGH: 1.0
    - Correlation between the human-computer interaction titles and the graph-theory titles was strongly negative
    - Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
25. LSA in Information Retrieval
26. How to treat a query
    - Build the term-by-document matrix
    - Perform SVD; reduce to 50-400 dimensions
    - A query is a "pseudo-document": the weighted average of the vectors of the words it contains
    - Use a similarity metric (such as cosine) between the query vector and the document vectors
    - Rank the results
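The query recipe above can be sketched end to end. One common LSI convention (Deerwester et al.) folds the query in as q̂ = qᵀ T_k S_k⁻¹ and compares it with the document rows of D_k by cosine; the matrix and query here are toy data, not the slides' example:

```python
import numpy as np

# Toy 5-term x 4-document count matrix.
M = np.array([[1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 1.]])
T, s, Dt = np.linalg.svd(M, full_matrices=False)

k = 2
Tk, Sk, Dtk = T[:, :k], np.diag(s[:k]), Dt[:k, :]

# Fold the query in as a pseudo-document: q_hat = q^T T_k S_k^{-1}.
q = np.array([1., 1., 0., 0., 0.])  # query uses terms 0 and 1
q_hat = q @ Tk @ np.linalg.inv(Sk)

# Rows of D_k are the document coordinates in the reduced space.
docs_k = Dtk.T

# Cosine similarity between the query and every document, then rank.
cos = docs_k @ q_hat / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_hat))
ranking = np.argsort(-cos)
print(ranking)
```

Here the query happens to equal document 0's term counts, so document 0 comes back first with cosine 1.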
27. The query vector
    - Does better than literal matching between the terms in the query and the documents
    - Superior when the query and document use different words
    - Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
28. References
    - Latent Semantic Indexing and Information Retrieval, Johanna Geiß
    - An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham