LSA algorithm

How Does LSA Work?
Andrew Koo - Insight Data Science

Latent Semantic Analysis
• Separate the text into sentences using a trained model
• Build a sparse matrix of words and the number of times each appears in each sentence
• Normalize each word count with tf-idf
• Use singular value decomposition to reduce each sentence vector to a multidimensional “conceptual space”
• Pick the top sentences based on the absolute value (length) of each sentence vector in the “conceptual space”

1. Separate the Text into Sentences
• Apply Tokenizer from the Python sumy Library
Input: “Hi world! Hello world! This is Andrew.”
Output: [“Hi world!”, “Hello world!”, “This is Andrew.”]
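
The slide uses the Tokenizer from sumy, which relies on a trained sentence model. As a rough stand-in that needs no trained model, the sketch below splits on sentence-ending punctuation with a regular expression. `split_sentences` is a hypothetical helper, not part of sumy, and it will mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sentences = split_sentences("Hi world! Hello world! This is Andrew.")
print(sentences)
```

For real text, a trained tokenizer like the one named on the slide is the safer choice; the regex only illustrates what the step produces.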

2. Build a sparse matrix of words and the number of times each appears in each sentence
[“Hi world!”, “Hello world!”, “This is Andrew.”]
(Sen, word)   Count
(0, 2)        1
(0, 5)        1
(1, 5)        1
(1, 1)        1
(2, 4)        1
(2, 3)        1
(2, 0)        1
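
A minimal sketch of this step in plain Python. It assumes words are lowercased, punctuation is dropped, and word ids are assigned in alphabetical order (andrew=0, hello=1, hi=2, is=3, this=4, world=5), which is consistent with the indices in the example. `build_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

def build_counts(sentences):
    """Return (vocab, counts); counts maps (sentence, word id) -> count."""
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    # Assumption: word ids follow alphabetical order of the vocabulary.
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokenized for w in t}))}
    counts = {}
    for sen, tokens in enumerate(tokenized):
        for word, n in Counter(tokens).items():
            counts[(sen, vocab[word])] = n  # only non-zero entries are stored
    return vocab, counts

vocab, counts = build_counts(["Hi world!", "Hello world!", "This is Andrew."])
for key in sorted(counts):
    print(key, counts[key])
```

Storing only the non-zero (sentence, word) pairs is what makes the matrix sparse: most words never occur in most sentences.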

3. Normalize each word with tf-idf
• tf: term frequency - how frequently a term occurs in a document
• idf: inverse document frequency - how important a word is (down-weights frequent terms, e.g. is, does, how)
(Sen, word)   Count   tf-idf
(0, 2)        1       0.796
(0, 5)        1       0.605
(1, 5)        1       0.605
(1, 1)        1       0.796
(2, 4)        1       0.577
(2, 3)        1       0.577
(2, 0)        1       0.577
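
The slide does not state which tf-idf variant is used. The sketch below assumes a smoothed idf, ln((1 + N) / (1 + df)) + 1, followed by scaling each sentence to unit length. That is the default behaviour of scikit-learn's TfidfVectorizer, and under that assumption the numbers in the example come out the same. `tfidf` is a hypothetical helper.

```python
import math

def tfidf(counts, n_sentences):
    """counts maps (sentence, word) -> raw count; returns the same keys -> weight."""
    # Document frequency: the number of sentences each word appears in.
    df = {}
    for (_, word) in counts:
        df[word] = df.get(word, 0) + 1
    # Assumed smoothed idf: ln((1 + N) / (1 + df)) + 1.
    weights = {
        (sen, word): n * (math.log((1 + n_sentences) / (1 + df[word])) + 1)
        for (sen, word), n in counts.items()
    }
    # L2-normalize each sentence so its weight vector has unit length.
    sq_norms = {}
    for (sen, _), w in weights.items():
        sq_norms[sen] = sq_norms.get(sen, 0.0) + w * w
    return {k: w / math.sqrt(sq_norms[k[0]]) for k, w in weights.items()}

counts = {(0, 2): 1, (0, 5): 1, (1, 5): 1, (1, 1): 1, (2, 4): 1, (2, 3): 1, (2, 0): 1}
for key, w in tfidf(counts, n_sentences=3).items():
    print(key, round(w, 3))
```

"world" (word 5) appears in two of the three sentences, so it gets a lower weight than "hi" or "hello", which each appear once.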

4. Use singular value decomposition to reduce each sentence vector to multidimensional “conceptual” space
Normalized word-sentence matrix = Transform matrix (U) × Scaling matrix (λ) × Concept matrix (V^T)
Multiply the normalized word-sentence matrix by U^T to transform each sentence to a vector in the multidimensional conceptual space
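
Why multiplying by U^T gives the concept-space vectors can be seen in one line. The derivation assumes the usual SVD convention in which U has orthonormal columns; the symbols A (the normalized word-sentence matrix) and Σ (the scaling matrix) are notation chosen here, not taken from the slide.

```latex
A = U \Sigma V^{T}, \qquad U^{T} U = I
\;\Longrightarrow\;
U^{T} A = U^{T} U \, \Sigma V^{T} = \Sigma V^{T},
\qquad \Sigma = \operatorname{diag}(\lambda_1, \lambda_2, \dots)
```

Row i of Σ V^T is λ_i V_i^T, and column j holds the coordinates of sentence j in the conceptual space.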

5. Pick top sentences based on the absolute value of the sentence vector in the “conceptual space”
Concept matrix rows: λ1V1^T, λ2V2^T, λ3V3^T, λ4V4^T, λ5V5^T, λ6V6^T, λ7V7^T
Concept matrix columns (sentence vectors): S’0, S’1, S’2, S’3, S’4, S’5, S’6
Example sentence vector: (0.400, 0.213, 0.243, 0.762, 0.145, 0.123, 0.254)
The absolute value (length) of this vector is the importance score of this sentence
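
A small sketch of the scoring step, taking "absolute value" to mean the Euclidean length of the vector. `importance` and `top_sentences` are hypothetical helper names, and returning the selected sentences in document order is an assumption about how the summary is assembled.

```python
import math

def importance(sentence_vector):
    """Euclidean length of a sentence vector in the conceptual space."""
    return math.sqrt(sum(x * x for x in sentence_vector))

def top_sentences(sentence_vectors, k):
    """Indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(sentence_vectors)),
                    key=lambda i: importance(sentence_vectors[i]),
                    reverse=True)
    return sorted(ranked[:k])

# The example sentence vector from the slide.
slide_vector = [0.400, 0.213, 0.243, 0.762, 0.145, 0.123, 0.254]
print(round(importance(slide_vector), 3))
```

Because each coordinate is already scaled by its λ, concepts with larger singular values contribute more to a sentence's score.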
