Language Technology Enhanced Learning - Presentation Transcript
Language Technology Enhanced Learning
Fridolin Wild, The Open University, UK
Gaston Burek, University of Tübingen
Adriana Berlanga, Open University, NL
Workshop Outline
1 | Deep Introduction: Latent-Semantic Analysis (LSA)
2 | Quick Introduction: Working with R
3 | Experiment: Simple Content-Based Feedback
4 | Experiment: Topic Proxy
Latent-Semantic Analysis (LSA)
Latent Semantic Analysis
Assumption: language utterances do have a semantic structure
However, this structure is obscured by word usage (noise, synonymy, polysemy, …)
Proposed LSA solution:
map the doc-term matrix
onto conceptual indices
derived statistically (truncated SVD)
and make similarity comparisons using, e.g., angles
Figure: input (e.g., documents) and the resulting matrix M (TEXTMATRIX)
Source: Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6):391-407.
Only the red terms appear in more than one document, so strip the rest.
term = feature
vocabulary = ordered set of features
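The stripping step above can be sketched in plain Python. This is a minimal illustration, not the R code the workshop uses: the function name `textmatrix` only mirrors the slide's TEXTMATRIX label, and lowercasing plus whitespace tokenization (with no stop-word removal) are assumptions. The toy documents are hypothetical.

```python
from collections import Counter


def textmatrix(docs):
    """Build a term-by-document count matrix, keeping only terms that
    occur in more than one document."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Vocabulary = ordered set of features (terms found in > 1 document).
    vocabulary = sorted(term for term, n in df.items() if n > 1)
    counts = [Counter(tokens) for tokens in tokenized]
    # One row per term, one column per document (missing terms count 0).
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical toy documents (stop words already removed by hand).
docs = [
    "human machine interface computer",
    "user computer system response time",
    "user interface system",
]
vocabulary, matrix = textmatrix(docs)
print(vocabulary)  # ['computer', 'interface', 'system', 'user']
print(matrix)      # [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
```

Terms such as "human" or "response" occur in only one document, so they drop out of the vocabulary, just as the non-red terms do on the slide.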
Singular Value Decomposition: M = TSD’
Truncated SVD … we will get a different matrix (different values, but still of the same format as M).
Latent-semantic space
Reconstructed, Reduced Matrix (document m4: "Graph minors: A survey")
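The decomposition, truncation, and reconstruction steps can be sketched with NumPy's thin SVD. This is a sketch under assumptions: the helper name `truncated_svd` and the 4 x 3 toy matrix are hypothetical, and k is the number of retained dimensions.

```python
import numpy as np


def truncated_svd(M, k):
    """Rank-k truncated SVD of M; returns T_k, s_k, D_k and M_k = T_k S_k D_k'."""
    # Thin SVD: M = T @ diag(s) @ Dt, singular values sorted in descending order.
    T, s, Dt = np.linalg.svd(M, full_matrices=False)
    T_k, s_k, D_k = T[:, :k], s[:k], Dt[:k, :].T
    # Same shape as M, but with different values (the reduced space).
    M_k = T_k @ np.diag(s_k) @ D_k.T
    return T_k, s_k, D_k, M_k


# Toy term-by-document matrix (4 terms x 3 documents), illustrative only.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)
T_2, s_2, D_2, M_2 = truncated_svd(M, k=2)
print(M_2.shape)  # (4, 3)
```

Keeping every singular value (here k=3) reproduces M exactly. That is why the unreduced case is just the pure vector space model, and only a smaller k yields a genuinely different matrix.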
Similarity in a Latent-Semantic Space (Landauer, 2007)
Figure: Query, Target 1, and Target 2 as vectors along an X and a Y dimension, with Angle 1 and Angle 2 between the query and each target
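Angle-based comparison can be sketched in plain Python as the cosine between vectors. The 2-D coordinates below are hypothetical stand-ins for the figure's Query and Targets, not values read from the slide.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between u and v in degrees (smaller angle = more similar)."""
    # Clamp to [-1, 1] to guard against floating-point round-off.
    c = max(-1.0, min(1.0, cosine(u, v)))
    return math.degrees(math.acos(c))


# Hypothetical coordinates as (X dimension, Y dimension).
query = [1.0, 1.0]
target1 = [2.0, 1.8]
target2 = [0.2, 1.5]
for name, target in [("Target 1", target1), ("Target 2", target2)]:
    print(name, round(angle_degrees(query, target), 1))
```

Target 1 forms the smaller angle with the query, so it counts as more similar. Vector length plays no role: [3, 4] and [6, 8] have a cosine of 1.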
doc2doc similarities
Unreduced = pure vector space model
- Based on M = TSD’
- Pearson Correlation over document vectors
Reduced
- Based on $M_2 = TS_2D'$
- Pearson Correlation over document vectors
(Landauer, 2007)
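The unreduced and reduced doc2doc comparisons can be sketched with NumPy. This is a sketch, not the workshop's R code: the helper name `doc2doc_pearson` and the toy matrix are hypothetical. Passing `rowvar=False` makes `np.corrcoef` treat each column (document) as one variable.

```python
import numpy as np


def doc2doc_pearson(M, k=None):
    """Pearson correlations between document vectors (columns of M).

    k=None: unreduced (pure vector space model, M = TSD').
    k=int : reduced, computed on M_k = T S_k D'.
    """
    if k is not None:
        T, s, Dt = np.linalg.svd(M, full_matrices=False)
        M = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]
    # rowvar=False: each column (document) is one variable.
    return np.corrcoef(M, rowvar=False)


# Toy term-by-document matrix (4 terms x 3 documents), illustrative only.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)
print(np.round(doc2doc_pearson(M), 2))       # unreduced
print(np.round(doc2doc_pearson(M, k=2), 2))  # reduced to 2 dimensions
```

Both calls return a documents-by-documents matrix. Only the input differs: the original counts versus the rank-2 reconstruction.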
Configurations: 4 × 12 × 7 × 2 × 3 = 2016 combinations
Updating: Folding-In
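Folding-in, as it is usually formulated for LSA, projects a new document's term vector d into the existing space as d_hat = S_k^{-1} T_k' d, without recomputing the SVD. The NumPy sketch below assumes that formulation; the helper name `fold_in`, the toy matrix, and the new document are hypothetical.

```python
import numpy as np


def fold_in(d, T_k, s_k):
    """Fold a new document's term vector d into an existing k-dim space.

    d_hat = S_k^{-1} T_k' d; the SVD factors themselves are not recomputed.
    """
    return (T_k.T @ d) / s_k


# Toy term-by-document matrix (4 terms x 3 documents), illustrative only.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)
T, s, Dt = np.linalg.svd(M, full_matrices=False)
k = 2
T_k, s_k = T[:, :k], s[:k]

# Hypothetical new document, counted over the same 4-term vocabulary.
new_doc = np.array([1.0, 0.0, 1.0, 0.0])
d_hat = fold_in(new_doc, T_k, s_k)   # coordinates in the k-dim space
reconstructed = T_k @ (s_k * d_hat)  # back in term space, like a column of M_k
print(d_hat.shape, reconstructed.shape)  # (2,) (4,)
```

Folding in one of the original columns recovers that document's row of D_k. Because T_k and s_k stay fixed, folded-in documents cannot shift the factors. The trade-off is that terms outside the existing vocabulary are simply ignored.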
SVD factor stability
Different texts – different factors
Challenge: avoid unwanted factor changes (e.g., caused by bad essays)