SlideShare a Scribd company logo
How Does LSA Work? 
Andrew Koo - Insight Data Science
Latent Semantic Analysis 
• Separate the text into sentences based on a trained model 
• Build a sparse matrix of words and the count it appears in 
each sentence 
• Normalize each word with tf-idf 
• Use singular value decomposition to reduce each 
sentence vector to multidimensional “conceptual space” 
• Pick top sentences based on the absolute value of the 
sentence vector in the “conceptual space”
1. Separate the Text into Sentences 
• Apply Tokenizer from the Python sumy Library 
“Hi world! Hello 
world! This is 
Andrew.” 
[“Hi world!”, “Hello 
world!”, “This is 
Andrew.”]
2. Build a sparse matrix of words and 
the count it appears in each sentence 
[“Hi world!”, “Hello 
world!”, “This is 
Andrew.”] 
(Sen , word) Count 
(0 , 2) 
(0 , 5) 
(1 , 5) 
(1 , 1) 
(2 , 4) 
(2 , 3) 
(2 , 0) 
1 
1 
1 
1 
1 
1 
1
3. Normalize each word with tf-idf 
• tf: term frequency - how frequent a term occurs in a document 
• idf: inverse doc frequency - how important a word is (weigh 
down the frequent terms, ex: is, does, how) 
(Sen , word) Count 
(0 , 2) 
(0 , 5) 
(1 , 5) 
(1 , 1) 
(2 , 4) 
(2 , 3) 
(2 , 0) 
1 
1 
1 
1 
1 
1 
1 
(Sen , word) Count 
(0 , 2) 
(0 , 5) 
(1 , 5) 
(1 , 1) 
(2 , 4) 
(2 , 3) 
(2 , 0) 
0.796 
0.605 
0.605 
0.796 
0.577 
0.577 
0.577
4. Use singular value decomposition to 
reduce each sentence vector to 
multidimensional “conceptual” space 
Normalized word-sentence 
matrix 
Transform 
matrix 
Scaling 
matrix 
Concept 
matrix 
Multiply the normalized word-sentence matrix by UT to transform 
each sentence to a vector in the multidimensional conceptual space
5. Pick top sentences based on the 
absolute value of the sentence vector in 
the “conceptual space” 
Concept Vector 
— λ1V1T — 
— λ2V2T — 
— λ3V3T — 
— λ4V4T — 
— λ5V5T — 
— λ6V6T — 
— λ7V7T — 
= 
Sentence Vector 
S’0 S’1 S’2 S’3 S’4 S’5 S’6 
0.400 
0.213 
0.243 
0.762 
0.145 
0.123 
0.254 
The absolute value of this vector is the importance score of this 
sentence

More Related Content

What's hot

Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
Tho Phan
 
String Matching with Finite Automata,Aho corasick,
String Matching with Finite Automata,Aho corasick,String Matching with Finite Automata,Aho corasick,
String Matching with Finite Automata,Aho corasick,
8neutron8
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
Roelof Pieters
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptx
HeneWijaya
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
Leiden University
 
Machine Tanslation
Machine TanslationMachine Tanslation
Machine Tanslation
Mahsa Mohaghegh
 
Text Similarity
Text SimilarityText Similarity
Word Embedding to Document distances
Word Embedding to Document distancesWord Embedding to Document distances
Word Embedding to Document distances
Ganesh Borle
 
Text summarization
Text summarization Text summarization
Text summarization
prateek khandelwal
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
ankit_ppt
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
Roberto Pereira Silveira
 
Word2Vec
Word2VecWord2Vec
Word2Vec
hyunyoung Lee
 
Hidden Markov Models with applications to speech recognition
Hidden Markov Models with applications to speech recognitionHidden Markov Models with applications to speech recognition
Hidden Markov Models with applications to speech recognition
butest
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
Grigory Sapunov
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
Saurav Shrestha
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
Alia Hamwi
 
Message Authentication Code & HMAC
Message Authentication Code & HMACMessage Authentication Code & HMAC
Message Authentication Code & HMAC
Krishna Gehlot
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
Christian Perone
 
NLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishNLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for English
Hemantha Kulathilake
 

What's hot (20)

Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
 
String Matching with Finite Automata,Aho corasick,
String Matching with Finite Automata,Aho corasick,String Matching with Finite Automata,Aho corasick,
String Matching with Finite Automata,Aho corasick,
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptx
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Machine Tanslation
Machine TanslationMachine Tanslation
Machine Tanslation
 
Text Similarity
Text SimilarityText Similarity
Text Similarity
 
Word Embedding to Document distances
Word Embedding to Document distancesWord Embedding to Document distances
Word Embedding to Document distances
 
Text summarization
Text summarization Text summarization
Text summarization
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Hidden Markov Models with applications to speech recognition
Hidden Markov Models with applications to speech recognitionHidden Markov Models with applications to speech recognition
Hidden Markov Models with applications to speech recognition
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Message Authentication Code & HMAC
Message Authentication Code & HMACMessage Authentication Code & HMAC
Message Authentication Code & HMAC
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
NLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishNLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for English
 

Viewers also liked

Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
Sudarsun Santhiappan
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
Innovation Engineering
 
Latent Semantic Indexing and Analysis
Latent Semantic Indexing and AnalysisLatent Semantic Indexing and Analysis
Latent Semantic Indexing and Analysis
Mercy Livingstone
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
Robert Viseur
 
How to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
How to use Latent Semantic Analysis to Glean Real Insight - Franco AmalfiHow to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
How to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
Social Media Camp
 
Latent Semanctic Analysis Auro Tripathy
Latent Semanctic Analysis Auro TripathyLatent Semanctic Analysis Auro Tripathy
Latent Semanctic Analysis Auro Tripathy
Auro Tripathy
 
Using ls as in class 2015
Using ls as in class 2015Using ls as in class 2015
Using ls as in class 2015
MrsMcGinty
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
Marina Santini
 
Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and Applications
Ayush Jain
 
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Alexis Perrier
 
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Jonathan Sedar
 
Mathematical approach for Text Mining 1
Mathematical approach for Text Mining 1Mathematical approach for Text Mining 1
Mathematical approach for Text Mining 1
Kyunghoon Kim
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
rchbeir
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
Christoph Trattner
 
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
Damiano Spina
 
Geometric Aspects of LSA
Geometric Aspects of LSAGeometric Aspects of LSA
Geometric Aspects of LSA
Heinrich Hartmann
 
20 cv mil_models_for_words
20 cv mil_models_for_words20 cv mil_models_for_words
20 cv mil_models_for_words
zukun
 
Analysis of Reviews on Sony Z3
Analysis of Reviews on Sony Z3Analysis of Reviews on Sony Z3
Analysis of Reviews on Sony Z3
Krishna Bollojula
 
AutoCardSorter - Designing the Information Architecture of a web site using L...
AutoCardSorter - Designing the Information Architecture of a web site using L...AutoCardSorter - Designing the Information Architecture of a web site using L...
AutoCardSorter - Designing the Information Architecture of a web site using L...
Christos Katsanos
 
Latent Semantic Indexing and Search Engines Optimimization (SEO)
Latent Semantic Indexing and Search Engines Optimimization (SEO)Latent Semantic Indexing and Search Engines Optimimization (SEO)
Latent Semantic Indexing and Search Engines Optimimization (SEO)
muzzy4friends
 

Viewers also liked (20)

Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
 
Latent Semantic Indexing and Analysis
Latent Semantic Indexing and AnalysisLatent Semantic Indexing and Analysis
Latent Semantic Indexing and Analysis
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
How to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
How to use Latent Semantic Analysis to Glean Real Insight - Franco AmalfiHow to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
How to use Latent Semantic Analysis to Glean Real Insight - Franco Amalfi
 
Latent Semanctic Analysis Auro Tripathy
Latent Semanctic Analysis Auro TripathyLatent Semanctic Analysis Auro Tripathy
Latent Semanctic Analysis Auro Tripathy
 
Using ls as in class 2015
Using ls as in class 2015Using ls as in class 2015
Using ls as in class 2015
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and Applications
 
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe...
 
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016
 
Mathematical approach for Text Mining 1
Mathematical approach for Text Mining 1Mathematical approach for Text Mining 1
Mathematical approach for Text Mining 1
 
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
LSI latent (par HATOUM Saria et DONGO ESCALANTE Irvin Franco)
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
 
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
 
Geometric Aspects of LSA
Geometric Aspects of LSAGeometric Aspects of LSA
Geometric Aspects of LSA
 
20 cv mil_models_for_words
20 cv mil_models_for_words20 cv mil_models_for_words
20 cv mil_models_for_words
 
Analysis of Reviews on Sony Z3
Analysis of Reviews on Sony Z3Analysis of Reviews on Sony Z3
Analysis of Reviews on Sony Z3
 
AutoCardSorter - Designing the Information Architecture of a web site using L...
AutoCardSorter - Designing the Information Architecture of a web site using L...AutoCardSorter - Designing the Information Architecture of a web site using L...
AutoCardSorter - Designing the Information Architecture of a web site using L...
 
Latent Semantic Indexing and Search Engines Optimimization (SEO)
Latent Semantic Indexing and Search Engines Optimimization (SEO)Latent Semantic Indexing and Search Engines Optimimization (SEO)
Latent Semantic Indexing and Search Engines Optimimization (SEO)
 

Similar to LSA algorithm

Text features
Text featuresText features
Text features
Shruti kar
 
Subword tokenizers
Subword tokenizersSubword tokenizers
Subword tokenizers
Ha Loc Do
 
wordembedding.pptx
wordembedding.pptxwordembedding.pptx
wordembedding.pptx
JOBANPREETSINGH62
 
Word embedding
Word embedding Word embedding
Word embedding
ShivaniChoudhary74
 
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor... Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
andrejusb
 
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine TranslationRoee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
Association for Computational Linguistics
 
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
Kunwoo Park
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
NILESH VERMA
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
Shruti kar
 
Beginning text analysis
Beginning text analysisBeginning text analysis
Beginning text analysis
Barry DeCicco
 
Dialog system understanding
Dialog system understandingDialog system understanding
Dialog system understanding
Tran Trung
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectors
Osebe Sammi
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
Jonathan Mugan
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
Massimo Schenone
 
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
HPCC Systems
 
Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242
Josh Patterson
 
DATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptxDATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptx
DrPraveenPawar
 
CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)
Hon Weng Chong
 
Search pitb
Search pitbSearch pitb
Search pitb
Nawab Iqbal
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
Sameer Wadkar
 

Similar to LSA algorithm (20)

Text features
Text featuresText features
Text features
 
Subword tokenizers
Subword tokenizersSubword tokenizers
Subword tokenizers
 
wordembedding.pptx
wordembedding.pptxwordembedding.pptx
wordembedding.pptx
 
Word embedding
Word embedding Word embedding
Word embedding
 
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor... Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine TranslationRoee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
Roee Aharoni - 2017 - Towards String-to-Tree Neural Machine Translation
 
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent...
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Beginning text analysis
Beginning text analysisBeginning text analysis
Beginning text analysis
 
Dialog system understanding
Dialog system understandingDialog system understanding
Dialog system understanding
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectors
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
Using High Dimensional Representation of Words (CBOW) to Find Domain Based Co...
 
Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242
 
DATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptxDATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptx
 
CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)
 
Search pitb
Search pitbSearch pitb
Search pitb
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
 

Recently uploaded

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 

Recently uploaded (20)

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

LSA algorithm

  • 1. How Does LSA Work? Andrew Koo - Insight Data Science
  • 2. Latent Semantic Analysis • Separate the text into sentences based on a trained model • Build a sparse matrix of words and the count it appears in each sentence • Normalize each word with tf-idf • Use singular value decomposition to reduce each sentence vector to multidimensional “conceptual space” • Pick top sentences based on the absolute value of the sentence vector in the “conceptual space”
  • 3. 1. Separate the Text into Sentences • Apply Tokenizer from the Python sumy Library “Hi world! Hello world! This is Andrew.” [“Hi world!”, “Hello world!”, “This is Andrew.”]
  • 4. 2. Build a sparse matrix of words and the count it appears in each sentence [“Hi world!”, “Hello world!”, “This is Andrew.”] (Sen , word) Count (0 , 2) (0 , 5) (1 , 5) (1 , 1) (2 , 4) (2 , 3) (2 , 0) 1 1 1 1 1 1 1
  • 5. 3. Normalize each word with tf-idf • tf: term frequency - how frequent a term occurs in a document • idf: inverse doc frequency - how important a word is (weigh down the frequent terms, ex: is, does, how) (Sen , word) Count (0 , 2) (0 , 5) (1 , 5) (1 , 1) (2 , 4) (2 , 3) (2 , 0) 1 1 1 1 1 1 1 (Sen , word) Count (0 , 2) (0 , 5) (1 , 5) (1 , 1) (2 , 4) (2 , 3) (2 , 0) 0.796 0.605 0.605 0.796 0.577 0.577 0.577
  • 6. 4. Use singular value decomposition to reduce each sentence vector to multidimensional “conceptual” space Normalized word-sentence matrix Transform matrix Scaling matrix Concept matrix Multiply the normalized word-sentence matrix by UT to transform each sentence to a vector in the multidimensional conceptual space
  • 7. 5. Pick top sentences based on the absolute value of the sentence vector in the “conceptual space” Concept Vector — λ1V1T — — λ2V2T — — λ3V3T — — λ4V4T — — λ5V5T — — λ6V6T — — λ7V7T — = Sentence Vector S’0 S’1 S’2 S’3 S’4 S’5 S’6 0.400 0.213 0.243 0.762 0.145 0.123 0.254 The absolute value of this vector is the importance score of this sentence