Vector Space Model
By: Tharuka Vishwajith
Boolean Model
• Based on set theory and Boolean logic
• Exact matching of documents to a user query
• Uses the Boolean AND, OR and NOT operators
Term      D1  D2  D3  D4  D5  D6
Cat        1   1   0   1   0   1
Dog        1   1   1   1   1   0
Rat        0   1   0   1   0   1
Apple      0   0   0   0   1   0
Orange     0   0   1   1   0   1
Computer   0   0   0   1   1   1
• query: Dog AND Cat AND NOT Computer
• computation: 111110 AND 110101 AND 111000 = 110000
• result: document set {D1,D2}
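The query above can be reproduced with plain integer bit operations. The following is a minimal sketch, not a production index: each row of the incidence table is stored as a 6-bit integer (leftmost bit is D1), and the names `INCIDENCE`, `ALL_DOCS` and `bits_to_docs` are made up for this illustration.

```python
# Term-document incidence rows from the table, one bit per document (D1..D6).
DOCS = ["D1", "D2", "D3", "D4", "D5", "D6"]
INCIDENCE = {
    "Cat":      0b110101,
    "Dog":      0b111110,
    "Rat":      0b010101,
    "Apple":    0b000010,
    "Orange":   0b001101,
    "Computer": 0b000111,
}
# Mask 111111: Python integers are unbounded, so NOT must be clipped to 6 bits.
ALL_DOCS = (1 << len(DOCS)) - 1


def bits_to_docs(bits):
    """Translate a bit-vector back into document names (leftmost bit is D1)."""
    n = len(DOCS)
    return [doc for i, doc in enumerate(DOCS) if bits & (1 << (n - 1 - i))]


# Dog AND Cat AND NOT Computer
not_computer = ~INCIDENCE["Computer"] & ALL_DOCS
result = INCIDENCE["Dog"] & INCIDENCE["Cat"] & not_computer
print(format(result, "06b"), bits_to_docs(result))  # 110000 ['D1', 'D2']
```

The answer is a set, not a ranked list: every matching document is returned on equal footing.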
Boolean Model (continued)
Advantages
• Relatively easy to implement and scalable
• Fast query processing based on parallel scanning of indexes
Disadvantages
• Does not pay attention to synonymy
• Does not pay attention to polysemy
• No ranking of output
• Often the user has to learn a special syntax, such as the use of double quotes to search for phrases
Vector Space Model
• Algebraic model representing text documents and queries as vectors based on the index terms
• One dimension for each term
• Compute the similarity (angle) between the query vector and the document vectors
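[Figure: documents D1 and D2 and the Query drawn as vectors in a two-term space with axes Dog and Computer, with coordinate labels 1, 2, 5 and 8; angles θ1 and θ2 are marked between the query and the document vectors.]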
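The similarity between a query and a document is conventionally taken as the cosine of the angle between their vectors. As a sketch in standard notation (the symbols $\vec{q}$, $\vec{d}$ and $n$ are introduced here and do not appear on the slides), with $n$ index terms:

```latex
\cos\theta
  = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \, \lVert \vec{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

A smaller angle gives a cosine closer to 1, so documents can be ranked by this value, which the Boolean model cannot do.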
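Cosine similarity among 3 documents
(SaS = Sense and Sensibility, PaP = Pride and Prejudice, WH = Wuthering Heights)
Term frequency (tf) count:
Term       SaS  PaP  WH
affection  115   58  20
jealous     10    7  11
gossip       2    0   6
wuthering    0    0  38
Log normalization: $w = 1 + \log_{10}(\mathrm{tf})$ if $\mathrm{tf} > 0$, otherwise $w = 0$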
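The log weighting step can be sketched as below. This assumes a base-10 logarithm and a weight of 0 for absent terms, which is consistent with the weights the slides derive from these counts; the names `TF` and `log_weight` are illustrative only.

```python
import math

# Raw term counts from the table above.
TF = {
    "affection": {"SaS": 115, "PaP": 58, "WH": 20},
    "jealous":   {"SaS": 10,  "PaP": 7,  "WH": 11},
    "gossip":    {"SaS": 2,   "PaP": 0,  "WH": 6},
    "wuthering": {"SaS": 0,   "PaP": 0,  "WH": 38},
}


def log_weight(tf):
    """Return 1 + log10(tf) for tf > 0, and 0 for a term that does not occur."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


for term, counts in TF.items():
    print(term, {doc: round(log_weight(tf), 2) for doc, tf in counts.items()})
```

The logarithm dampens large counts: 115 occurrences of "affection" weigh about 3.06, not 115 times as much as a single occurrence.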
Log Frequency Weightage
Term       SaS   PaP   WH
affection  3.06  2.76  2.30
jealous    2.00  1.84  2.04
gossip     1.30  0     1.78
wuthering  0     0     2.58
Vector length of SaS: $\sqrt{3.06^2 + 2.00^2 + 1.30^2 + 0^2} \approx 3.88$
Vector length of PaP: $\sqrt{2.76^2 + 1.84^2 + 0^2 + 0^2} \approx 3.32$
Vector length of WH: $\sqrt{2.30^2 + 2.04^2 + 1.78^2 + 2.58^2} \approx 4.39$
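The vector lengths are Euclidean norms: the square root of the sum of squared weights. A small sketch using the two-decimal log frequency weights (the names `WEIGHTS` and `vector_length` are hypothetical):

```python
import math

# Log frequency weights in term order: affection, jealous, gossip, wuthering.
WEIGHTS = {
    "SaS": [3.06, 2.00, 1.30, 0.0],
    "PaP": [2.76, 1.84, 0.0, 0.0],
    "WH":  [2.30, 2.04, 1.78, 2.58],
}


def vector_length(vec):
    """Euclidean (L2) length: square root of the sum of squared components."""
    return math.sqrt(sum(w * w for w in vec))


lengths = {doc: vector_length(vec) for doc, vec in WEIGHTS.items()}
for doc, length in lengths.items():
    print(f"{doc}: {length:.2f}")
```

Because the inputs are already rounded, the last printed digit can differ slightly from a calculation on unrounded weights.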
Length Normalization
Each log frequency weight is divided by the length of its document vector (SaS ≈ 3.88, PaP ≈ 3.32, WH ≈ 4.39):
Term       SaS          PaP          WH
affection  3.06 / 3.88  2.76 / 3.32  2.30 / 4.39
jealous    2.00 / 3.88  1.84 / 3.32  2.04 / 4.39
gossip     1.30 / 3.88  0 / 3.32     1.78 / 4.39
wuthering  0 / 3.88     0 / 3.32     2.58 / 4.39
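Dividing every component by the vector's length yields a unit vector, which is what makes documents of different sizes comparable. A minimal sketch, with `normalize` as an illustrative helper and an all-zero vector returned unchanged by assumption:

```python
import math


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    length = math.sqrt(sum(w * w for w in vec))
    if length == 0:
        return list(vec)
    return [w / length for w in vec]


# Log frequency weights for SaS: affection, jealous, gossip, wuthering.
sas = normalize([3.06, 2.00, 1.30, 0.0])
print([round(w, 3) for w in sas])

# After normalization the squared components sum to 1 (up to float error).
print(sum(w * w for w in sas))
```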
After Length Normalization
Term       SaS    PaP    WH
affection  0.789  0.832  0.524
jealous    0.515  0.555  0.465
gossip     0.335  0      0.405
wuthering  0      0      0.588
Every vector now has unit length, so the cosine is simply the dot product (terms with a zero weight in either document drop out):
$\cos(\mathrm{SaS}, \mathrm{PaP}) \approx 0.789 \times 0.832 + 0.515 \times 0.555 \approx 0.94$
$\cos(\mathrm{PaP}, \mathrm{WH}) \approx 0.832 \times 0.524 + 0.555 \times 0.465 \approx 0.69$
$\cos(\mathrm{SaS}, \mathrm{WH}) \approx 0.789 \times 0.524 + 0.515 \times 0.465 + 0.335 \times 0.405 \approx 0.79$
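The whole worked example, from raw counts to pairwise cosines, can be sketched end to end as follows. It assumes base-10 log weighting with weight 0 for absent terms; the names `COUNTS`, `unit_vector` and `cosine` are illustrative and not taken from the slides.

```python
import math

# Raw counts in term order: affection, jealous, gossip, wuthering.
COUNTS = {
    "SaS": [115, 10, 2, 0],
    "PaP": [58, 7, 0, 0],
    "WH":  [20, 11, 6, 38],
}


def log_weight(tf):
    """1 + log10(tf) for tf > 0, otherwise 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def unit_vector(counts):
    """Apply log weighting, then scale the result to unit length."""
    weights = [log_weight(tf) for tf in counts]
    length = math.sqrt(sum(w * w for w in weights))
    return [w / length for w in weights] if length else weights


def cosine(a, b):
    """Dot product; equals the cosine because both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))


UNIT = {doc: unit_vector(counts) for doc, counts in COUNTS.items()}
for a, b in [("SaS", "PaP"), ("SaS", "WH"), ("PaP", "WH")]:
    print(f"cos({a}, {b}) = {cosine(UNIT[a], UNIT[b]):.2f}")
```

Working from unrounded weights, this gives about 0.94, 0.79 and 0.69; hand calculations that round intermediate values to two decimals can differ in the last digit. Either way, the two Austen novels (SaS and PaP) come out as the most similar pair.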
Vector space model in information retrieval