The Geometry of Learning


Latent Semantic Analysis (LSA) is a mathematical technique for computationally modeling the meaning of words and larger units of text. LSA works by applying Singular Value Decomposition (SVD) to a term-document matrix containing frequency counts for all words across all of the documents or passages in the corpus. After this SVD step, the meaning of a word is represented as a vector in a multidimensional semantic space, which makes it possible to compare word meanings, for instance by computing the cosine between two word vectors.
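The pipeline described above can be sketched in a few lines of base R. The term-document matrix and its term labels are hypothetical toy data, not from the talk:

```r
# Toy term-document matrix: rows = terms, columns = documents,
# cells = frequency counts (hypothetical data).
M <- matrix(c(1, 0, 1,
              1, 1, 0,
              0, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("human", "computer", "system", "user"),
                            c("d1", "d2", "d3")))

# Truncated SVD: keep only the k largest singular values.
k <- 2
s <- svd(M)
Tk <- s$u[, 1:k]            # term loadings
Sk <- diag(s$d[1:k])        # singular values
term_vectors <- Tk %*% Sk   # term positions in the semantic space
rownames(term_vectors) <- rownames(M)

# Compare two word meanings via the cosine between their vectors.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
sim <- cosine(term_vectors["human", ], term_vectors["computer", ])
```

The cosine lies between -1 and 1; higher values indicate that the two words occur in more similar contexts.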

LSA has been successfully used in a large variety of language-related applications, from automatic grading of student essays to predicting click trails in website navigation. In Coh-Metrix (Graesser et al. 2004), a computational tool that produces indices of the linguistic and discourse representations of a text, LSA was used as a measure of text cohesion, on the assumption that cohesion increases as a function of higher cosine scores between adjacent sentences.

Besides being interesting as a technique for building programs that need to deal with semantics, LSA is also interesting as a model of human cognition: it can match human performance on word association tasks and vocabulary tests. In this talk, Fridolin will focus on LSA as a tool for modeling language acquisition. After framing the talk by sketching the key concepts of learning, information, and competence acquisition, and after outlining presuppositions, he will give an introduction to meaningful interaction analysis (MIA). MIA is a means to inspect learning with the support of language analysis that is geometrical in nature: it fuses latent semantic analysis (LSA) with network analysis (NA/SNA). LSA, NA/SNA, and MIA are illustrated by several examples.




The Geometry of Learning: Presentation Transcript

  • The Geometry of Learning
    November 17th, 2009, Utrecht, The Netherlands
    Fridolin Wild, KMi, The Open University
  • (created with http://www.wordle.net)
  • Outline
    Context & Framing Theories
    Latent Semantic Analysis (LSA)
    Social Network Analysis (SNA)
    Meaningful Interaction Analysis (MIA)
    Conclusion & Outlook
  • Context & Theories
  • Information
    Information could be the quality of a certain signal.
    Information could be a logical abstractor, the release mechanism.
    Information & Knowledge
    Knowledge could be the delta at the receiver (a paper, a human, a library).
  • Learning is change
    Learning is about competence development
    Competence becomes visible in performance
    Professional competence is mainly about (re-)constructing and processing information and knowledge from cues
    Professional competence development is much about learning concepts from language
    Professional performance is much about demonstrating conceptual knowledge with language
    What is learning about?
  • Tying shoelaces
    Douglas Adams’ ‘meaning of liff’:
    Epping: The futile movements of forefingers and eyebrows used when failing to attract the attention of waiters and barmen.
    Shoeburyness: The vague uncomfortable feeling you get when sitting on a seat which is still warm from somebody else's bottom.
    I have been convincingly Sapir-Whorfed by this book.
    Non-textual concepts: things we can’t (easily) learn from language
  • Latent Semantic Analysis
  • Word Choice
    Educated adult understands ~100,000 word forms
    An average sentence contains 20 tokens.
    Thus 100,000^20 possible combinations of words in a sentence
    ⇒ a maximum of log2(100,000^20) ≈ 332 bits in word choice alone.
    20! ≈ 2.4 × 10^18 possible orders of 20 words ⇒ a maximum of log2(20!) ≈ 61 bits from the order of the words.
    332 / (61 + 332) ≈ 84% word choice
    (Landauer, 2007)
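The slide's arithmetic can be checked directly in R:

```r
# Reproducing the word-choice vs. word-order information estimate
# (Landauer, 2007) from the figures on the slide.
vocabulary <- 1e5   # ~100,000 word forms known by an educated adult
tokens     <- 20    # tokens in an average sentence

bits_choice <- log2(vocabulary^tokens)  # information in word choice
bits_order  <- log2(factorial(tokens))  # information in word order

share_choice <- bits_choice / (bits_choice + bits_order)

round(bits_choice)          # 332
round(bits_order)           # 61
round(100 * share_choice)   # 84 (per cent)
```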
  • Latent Semantic Analysis
    “Humans learn word meanings and how to combine them into passage meaning through experience with ~paragraph unitized verbal environments.”
    “They don’t remember all the separate words of a passage; they remember its overall gist or meaning.”
    “LSA learns by ‘reading’ ~paragraph unitized texts that represent the environment.”
    “It doesn’t remember all the separate words of a text; it remembers its overall gist or meaning.”
    (Landauer, 2007)
  • Latent Semantics
    latent-semantic space
    In other words:
    Assumption: language utterances have a semantic structure
    Problem: the structure is obscured by word usage (noise, synonymy, polysemy, …)
    Solution: map the doc-term matrix using conceptual indices derived statistically (truncated SVD) and make similarity comparisons using angles
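The truncated-SVD mapping can be sketched in base R on a hypothetical toy doc-term matrix (random counts, not the talk's data):

```r
# Truncated SVD on a toy doc-term matrix.
set.seed(1)
M <- matrix(sample(0:3, 30, replace = TRUE), nrow = 6)  # 6 terms x 5 docs

s <- svd(M)
k <- 2  # number of latent dimensions to keep

# Reconstruct from only the k largest singular values: same format as M,
# but noisy word-usage variation is smoothed away; similarity comparisons
# are then made via angles (cosines) between column vectors.
Mk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
```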
  • Input (e.g., documents)
    term = feature
    vocabulary = ordered set of features
    Only the red terms appear in more than one document, so strip the rest.
    { M } =
    Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis, In: Journal of the American Society for Information Science, 41(6):391-407
  • Singular Value Decomposition
  • Truncated SVD
    latent-semantic space
    … we will get a different matrix (different values, but still of the same format as M).
  • (Landauer, 2007)
  • Reconstructed, Reduced Matrix
    m4: Graphminors: A survey
  • Similarity in a Latent-Semantic Space
    [Figure: two target vectors in a two-dimensional space; their similarity is the angle between them]
    (Landauer, 2007)
  • doc2doc - similarities
    Unreduced = pure vector space model
    - based on M = TSD’
    - Pearson correlation over document vectors
    Reduced = latent-semantic space
    - based on M2 = TS2D’
    - Pearson correlation over document vectors
  • Typical, simple workflow
    tm = textmatrix("dir/")
    tm = lw_logtf(tm) * gw_idf(tm)
    space = lsa(tm, dims=dimcalc_share())
    tm3 = fold_in(tm, space)
  • Processing Pipeline (with Options)
    4 x 12 x 7 x 2 x 3 = 2016 Combinations
  • b) SVD is computationally expensive
    From seconds (lower hundreds of documents, optimised linear algebra libraries, truncated SVD)
    To minutes (hundreds to thousands of documents)
    To hours (tens and hundreds of thousands)
    a) SVD factor stability
    SVD calculates factors over a given text base; different texts – different factors
    Problem: avoid unwanted factor changes
    Solution: folding-in instead of recalculating
    Projecting by Folding-In
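Folding-in can be sketched in base R following the projection formula in Berry et al. (1995), d_hat = t(d) Tk inv(Sk); the toy matrix and the new document vector are hypothetical:

```r
# Build a small existing space from a truncated SVD (toy data).
set.seed(1)
M <- matrix(sample(0:2, 24, replace = TRUE), nrow = 6)  # 6 terms x 4 docs
s <- svd(M)
k <- 2
Tk <- s$u[, 1:k]
Sk <- diag(s$d[1:k])

# A hypothetical new document over the same 6-term vocabulary.
d_new <- c(1, 0, 2, 0, 1, 0)

# Project the new document into the existing space instead of
# recomputing the SVD; the factors stay stable.
d_hat <- t(d_new) %*% Tk %*% solve(Sk)
```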
  • 2
    Folding-In in Detail
    (cf. Berry et al., 1995)
    (2) convert
    vector to
    (1) convert
    Vector to
  • The Value of Singular Values
    Pearson(jahr, wien)
    Pearson(eu, österreich)
  • Simple LSA application
  • Summary Writing: Working Principle
    (Landauer, 2007)
  • Summary Writing
    [Figure: three gold standard summaries and two essays plotted in a two-dimensional space]
  • ‘Dumb’ Summary Writing (Code)
    library("lsa")  # load package
    # load training texts
    trm = textmatrix("trainingtexts/")
    trm = lw_bintf(trm) * gw_idf(trm)  # weighting
    space = lsa(trm)  # create an LSA space
    # fold-in summaries to be tested (including gold standard text)
    tem = textmatrix("testessays/", vocabulary=rownames(trm))
    tem_red = fold_in(tem, space)
    # score a summary by comparing with
    # gold standard text (very simple method!)
    cor(tem_red[,"goldstandard.txt"], tem_red[,"E1.txt"])
    => 0.7
  • Evaluating Effectiveness
    Compare Machine Scores with Human Scores
    Human-to-Human Correlation
    Usually around .6
    Increased by familiarity between assessors, tighter assessment schemes, …
    Scores vary even more strongly with decreasing subject familiarity (.8 at high familiarity, worst test -.07)
    • Test Collection: 43 German Essays, scored from 0 to 5 points (ratio scaled), average length: 56.4 words
    • Training Collection: 3 ‘golden essays’, plus 302 documents from a marketing glossary, average length: 56.1 words
  • (Positive) Evaluation Results
    LSA machine scores:
    Spearman's rank correlation rho
    data: humanscores[names(machinescores), ] and machinescores
    S = 914.5772, p-value = 0.0001049
    alternative hypothesis: true rho is not equal to 0
    Pure vector space model:
    Spearman's rank correlation rho
    data: humanscores[names(machinescores), ] and machinescores
    S = 1616.007, p-value = 0.02188
    alternative hypothesis: true rho is not equal to 0
  • (S)NA
  • Social Network Analysis
    Existing for a long time (term coined 1954)
    Basic idea:
    Actors and Relationships between them (e.g. Interactions)
    Actors can be people (groups, media, tags, …)
    Actors and Ties form a Graph (edges and nodes)
    Within that graph, certain structures can be investigated
    Betweenness, Degree of Centrality, Density, Cohesion
    Structural Patterns can be identified (e.g. the Troll)
  • Forum Messages
  • Incidence Matrix
    msg_id = incident, authors appear in incidents
  • Derive Adjacency Matrix
    = t(im) %*% im
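On a toy incidence matrix (hypothetical forum messages and author names), the `t(im) %*% im` step looks like this:

```r
# Incidence matrix: rows = messages, columns = authors;
# 1 means the author appears in that message (hypothetical data).
im <- matrix(c(1, 1, 0,
               0, 1, 1,
               1, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("msg1", "msg2", "msg3"),
                             c("anna", "ben", "cara")))

# Author-by-author adjacency matrix: off-diagonal cells count shared
# messages; the diagonal counts each author's own messages.
am <- t(im) %*% im
```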
  • Visualization: Sociogramme
  • Measuring Techniques (Sample)
    Closeness: how close to all others
    Degree centrality: number of (in/out) connections to others
    Betweenness: how often intermediary
    Components: e.g. k-means cluster (k=3)
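As a minimal base-R sketch (a real analysis would use a package such as igraph or sna), degree centrality can be read straight off a toy adjacency matrix; the actor names are hypothetical:

```r
# Undirected toy adjacency matrix of four actors.
am <- matrix(c(0, 1, 1, 0,
               1, 0, 1, 1,
               1, 1, 0, 0,
               0, 1, 0, 0),
             nrow = 4, byrow = TRUE,
             dimnames = rep(list(c("a", "b", "c", "d")), 2))

# Degree centrality: number of connections per actor.
degree <- rowSums(am > 0)
# 'b' is the best-connected actor here; 'd' reaches the others
# only through 'b', so 'b' would also score highest on betweenness.
```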
  • SNA applications
  • Co-Authorship Network WI (2005)
  • Paper Collaboration Prolearn
    e.g. co-authorships of ~30 deliverables of three work packages (ProLearn NoE)
    Roles: reviewer (red), editor (green), contributor
    Size: Prestige()
    But: type of interaction? Content of interaction? => not possible!
  • TEL Project Cooperation (2004-2007)
  • iCamp Collaboration (Y1)
    Shades of yellow: WP leadership
    Red: coordinator
  • MIA
  • Meaningful Interaction Analysis (MIA)
    Fusion: Combining LSA with SNA
    Terms and Documents (or anything else represented with column vectors or row vectors) are mapped into same space by LSA
    Semantic proximity can be measured between them: how close is a term to a document?
    (S)NA allows analysing the resulting graph structures
    By e.g. cluster or component analysis
    By e.g. identifying central descriptors for these
  • The mathemagics behind
    Meaningful Interaction Analysis
  • Truncated SVD
    latent-semantic space
    … we will get a different matrix (different values, but still of the same format as M).
  • Knowledge Proxy: LSA Part
    Tk = left singular matrix = ‘term loadings’ on the singular values
    Dk = right singular matrix = ‘document loadings’ on the singular values
    Multiply them into the same space:
    VT = Tk Sk
    VD = Dk Sk
    Cosine distance matrix over ... = a graph
    Extension: add author vectors VA through cluster centroids or vector addition of their publication vectors
    latent-semantic space
    Of course: use the existing space and fold in the whole sets of vectors
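The two mappings and the cosine matrix can be sketched in base R on hypothetical toy data:

```r
# Map terms and documents into the same latent-semantic space.
set.seed(42)
M <- matrix(sample(0:2, 40, replace = TRUE), nrow = 8)  # 8 terms x 5 docs
s <- svd(M)
k <- 2
VT <- s$u[, 1:k] %*% diag(s$d[1:k])  # term vectors in the space
VD <- s$v[, 1:k] %*% diag(s$d[1:k])  # document vectors in the same space

# Stack all vectors and build the cosine matrix over them: a graph in
# which every term and every document is a node.
V <- rbind(VT, VD)
norms <- sqrt(rowSums(V^2))
cos_sim <- (V %*% t(V)) / (norms %o% norms)
```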
  • Knowledge Proxy: SNA Part:Filter the Network
    Every vector has a cosine distance to every other (maybe negative)!
    So: filter for the desired similarity strength
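The filtering step, sketched in base R (the cosine values below are made up for illustration):

```r
# A small cosine similarity matrix over three vectors (hypothetical values).
cos_sim <- matrix(c( 1.0, 0.8, -0.2,
                     0.8, 1.0,  0.3,
                    -0.2, 0.3,  1.0), nrow = 3)

# Keep only edges above the desired similarity strength.
threshold <- 0.5
adj <- (cos_sim >= threshold) * 1
diag(adj) <- 0  # drop self-loops
```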
  • ConSpect: monitoring conceptual development
  • TopicProxy (30 people, 2005)
  • Spot unwanted fragmentation
    e.g. two authors work on the same topic, but with different collaborator groups and with different literature
    Intervention Instrument: automatically recommend to hold a flashmeeting
    Bringing together what belongs together
    Wild, Ochoa, Heinze, Crespo, Quick (2009, to appear)
  • //eof.