SlideShare a Scribd company logo
1 of 15
Information Retrieval : 10
Vector and Probabilistic Models
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
The Vector Model
• Boolean matching and binary weights is too
limiting
• The vector model proposes a framework in which
partial matching is possible
• This is accomplished by assigning non-binary
weights to index terms in queries and in
documents
• Term weights are used to compute a degree of
similarity between a query and each document
• The documents are ranked in decreasing order of
their degree of similarity
The Vector Model
• For the vector model:
• The weight wi,j associated with a pair (ki, dj) is positive and
non-binary
• The index terms are assumed to be all mutually independent
• They are represented as unit vectors of a t-dimensional space
(t is the total number of index terms)
• The representations of document dj and query q are t-
dimensional vectors given by :
The Vector Model
The Vector Model
• Weights in the Vector model are basically tf-idf weights
• These equations should only be applied for values of term
frequency greater than zero
• If the term frequency is zero, the respective weight is also
zero
The Vector Model
• Document ranks computed by the Vector model for the query
“to do”
The Vector Model
• Advantages:
– term-weighting improves quality of the answer set
– partial matching allows retrieval of docs that
approximate the query conditions
– cosine ranking formula sorts documents according to
a degree of similarity to the query
– document length normalization is naturally built-in
into the ranking
• Disadvantages:
– It assumes independence of index terms
Probabilistic Model
• The probabilistic model captures the IR problem
using a probabilistic framework
• Given a user query, there is an ideal answer set
for this query
• Given a description of this ideal answer set, we
could retrieve the relevant documents
• Querying is seen as a specification of the
properties of this ideal answer set
• But, what are these properties?
Probabilistic Model
• An initial set of documents is retrieved somehow
• The user inspects these docs looking for the
relevant ones (in truth, only top 10-20 need to be
inspected)
• The IR system uses this information to refine the
description of the ideal answer set
• By repeating this process, it is expected that the
description of the ideal answer set will improve
Probabilistic Ranking Principle
• The probabilistic model
– Tries to estimate the probability that a document will
be relevant to a user query
– Assumes that this probability depends on the query
and document representations only
– The ideal answer set, referred to as R, should
maximize the probability of relevance
• But,
– How to compute these probabilities?
– What is the sample space?
The Ranking
The Ranking
Ranking Formula
• The previous equation cannot be computed without
estimates of ri and R
• One possibility is to assume R = ri = 0, as a way to
boostrap the ranking equation, which leads to
• This equation provides an idf-like ranking computation
• In the absence of relevance information, this is the
equation for ranking in the probabilistic model
Advantages and Disadvantages
• Advantages:
– Docs ranked in decreasing order of probability of
relevance
• Disadvantages:
– need to guess initial estimates for piR
– method does not take into account tf factors
– the lack of document length normalization
Assignment
• Explain the vector model of information
Retrieval
• Explain the probabilistic model of Information
Retrieval

More Related Content

What's hot

Link Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalLink Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalCarlos Castillo (ChaTo)
 
Random graph models
Random graph modelsRandom graph models
Random graph modelsnetworksuw
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrievalNanthini Dominique
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slidesmahavir_a
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval ssilambu111
 
The vector space model
The vector space modelThe vector space model
The vector space modelpkgosh
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and SegmentationA B Shinde
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesKush Kulshrestha
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large DatabaseEr. Nawaraj Bhandari
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notesBAIRAVI T
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsSelman Bozkır
 
Image feature extraction
Image feature extractionImage feature extraction
Image feature extractionRushin Shah
 
Back face detection
Back face detectionBack face detection
Back face detectionPooja Dixit
 

What's hot (20)

Information Retrieval Evaluation
Information Retrieval EvaluationInformation Retrieval Evaluation
Information Retrieval Evaluation
 
Link Analysis for Web Information Retrieval
Link Analysis for Web Information RetrievalLink Analysis for Web Information Retrieval
Link Analysis for Web Information Retrieval
 
Random graph models
Random graph modelsRandom graph models
Random graph models
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
 
The vector space model
The vector space modelThe vector space model
The vector space model
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Term weighting
Term weightingTerm weighting
Term weighting
 
IR
IRIR
IR
 
Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and Segmentation
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning Techniques
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Probabilistic information retrieval models & systems
Probabilistic information retrieval models & systemsProbabilistic information retrieval models & systems
Probabilistic information retrieval models & systems
 
Image feature extraction
Image feature extractionImage feature extraction
Image feature extraction
 
Back face detection
Back face detectionBack face detection
Back face detection
 

Similar to Vector and Probabilistic Models of Information Retrieval

Information retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsInformation retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsVaibhav Khanna
 
information retrieval
information retrievalinformation retrieval
information retrievalssbd6985
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrievalssbd6985
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptxthenmozhip8
 
Information retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsInformation retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsVaibhav Khanna
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Information retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessInformation retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessVaibhav Khanna
 
Information retrieval 6 ir models
Information retrieval 6 ir modelsInformation retrieval 6 ir models
Information retrieval 6 ir modelsVaibhav Khanna
 
Information retrieval systems irt ppt do
Information retrieval systems irt ppt doInformation retrieval systems irt ppt do
Information retrieval systems irt ppt doPonnuthuraiSelvaraj1
 
LSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationLSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationMaruf Aytekin
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspacePrakash Dubey
 
Recommending Scientific Papers: Investigating the User Curriculum
Recommending Scientific Papers: Investigating the User CurriculumRecommending Scientific Papers: Investigating the User Curriculum
Recommending Scientific Papers: Investigating the User CurriculumJonathas Magalhães
 
Recommender system
Recommender systemRecommender system
Recommender systemBhumi Patel
 
Terms for smartPLS.pptx
Terms for smartPLS.pptxTerms for smartPLS.pptx
Terms for smartPLS.pptxkinmengcheng1
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithmRupali Bhatnagar
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit ivmalathieswaran29
 
'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 Georgina Tilby
 

Similar to Vector and Probabilistic Models of Information Retrieval (20)

Information retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based modelsInformation retrieval 12 modern ir and set based models
Information retrieval 12 modern ir and set based models
 
information retrieval
information retrievalinformation retrieval
information retrieval
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptx
 
Information retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic modelsInformation retrieval 13 alternative set theoretic models
Information retrieval 13 alternative set theoretic models
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Information retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomnessInformation retrieval 20 divergence from randomness
Information retrieval 20 divergence from randomness
 
Information retrieval 6 ir models
Information retrieval 6 ir modelsInformation retrieval 6 ir models
Information retrieval 6 ir models
 
Information retrieval systems irt ppt do
Information retrieval systems irt ppt doInformation retrieval systems irt ppt do
Information retrieval systems irt ppt do
 
LSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in RecommendationLSH for
 Prediction Problem in Recommendation
LSH for
 Prediction Problem in Recommendation
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
 
Recommending Scientific Papers: Investigating the User Curriculum
Recommending Scientific Papers: Investigating the User CurriculumRecommending Scientific Papers: Investigating the User Curriculum
Recommending Scientific Papers: Investigating the User Curriculum
 
Chapter 7.pdf
Chapter 7.pdfChapter 7.pdf
Chapter 7.pdf
 
qury.pdf
qury.pdfqury.pdf
qury.pdf
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Terms for smartPLS.pptx
Terms for smartPLS.pptxTerms for smartPLS.pptx
Terms for smartPLS.pptx
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
 
'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015
 
TCI in primary care - SEM (2006)
TCI in primary care - SEM (2006)TCI in primary care - SEM (2006)
TCI in primary care - SEM (2006)
 

More from Vaibhav Khanna

Information and network security 47 authentication applications
Information and network security 47 authentication applicationsInformation and network security 47 authentication applications
Information and network security 47 authentication applicationsVaibhav Khanna
 
Information and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmInformation and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmVaibhav Khanna
 
Information and network security 45 digital signature standard
Information and network security 45 digital signature standardInformation and network security 45 digital signature standard
Information and network security 45 digital signature standardVaibhav Khanna
 
Information and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesInformation and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesVaibhav Khanna
 
Information and network security 43 digital signatures
Information and network security 43 digital signaturesInformation and network security 43 digital signatures
Information and network security 43 digital signaturesVaibhav Khanna
 
Information and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeInformation and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeVaibhav Khanna
 
Information and network security 41 message authentication code
Information and network security 41 message authentication codeInformation and network security 41 message authentication code
Information and network security 41 message authentication codeVaibhav Khanna
 
Information and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmInformation and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmVaibhav Khanna
 
Information and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmInformation and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmVaibhav Khanna
 
Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Vaibhav Khanna
 
Information and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationInformation and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationVaibhav Khanna
 
Information and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremInformation and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremVaibhav Khanna
 
Information and network security 34 primality
Information and network security 34 primalityInformation and network security 34 primality
Information and network security 34 primalityVaibhav Khanna
 
Information and network security 33 rsa algorithm
Information and network security 33 rsa algorithmInformation and network security 33 rsa algorithm
Information and network security 33 rsa algorithmVaibhav Khanna
 
Information and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsInformation and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsVaibhav Khanna
 
Information and network security 31 public key cryptography
Information and network security 31 public key cryptographyInformation and network security 31 public key cryptography
Information and network security 31 public key cryptographyVaibhav Khanna
 
Information and network security 30 random numbers
Information and network security 30 random numbersInformation and network security 30 random numbers
Information and network security 30 random numbersVaibhav Khanna
 
Information and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmInformation and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmVaibhav Khanna
 
Information and network security 28 blowfish
Information and network security 28 blowfishInformation and network security 28 blowfish
Information and network security 28 blowfishVaibhav Khanna
 
Information and network security 27 triple des
Information and network security 27 triple desInformation and network security 27 triple des
Information and network security 27 triple desVaibhav Khanna
 

More from Vaibhav Khanna (20)

Information and network security 47 authentication applications
Information and network security 47 authentication applicationsInformation and network security 47 authentication applications
Information and network security 47 authentication applications
 
Information and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithmInformation and network security 46 digital signature algorithm
Information and network security 46 digital signature algorithm
 
Information and network security 45 digital signature standard
Information and network security 45 digital signature standardInformation and network security 45 digital signature standard
Information and network security 45 digital signature standard
 
Information and network security 44 direct digital signatures
Information and network security 44 direct digital signaturesInformation and network security 44 direct digital signatures
Information and network security 44 direct digital signatures
 
Information and network security 43 digital signatures
Information and network security 43 digital signaturesInformation and network security 43 digital signatures
Information and network security 43 digital signatures
 
Information and network security 42 security of message authentication code
Information and network security 42 security of message authentication codeInformation and network security 42 security of message authentication code
Information and network security 42 security of message authentication code
 
Information and network security 41 message authentication code
Information and network security 41 message authentication codeInformation and network security 41 message authentication code
Information and network security 41 message authentication code
 
Information and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithmInformation and network security 40 sha3 secure hash algorithm
Information and network security 40 sha3 secure hash algorithm
 
Information and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithmInformation and network security 39 secure hash algorithm
Information and network security 39 secure hash algorithm
 
Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...Information and network security 38 birthday attacks and security of hash fun...
Information and network security 38 birthday attacks and security of hash fun...
 
Information and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authenticationInformation and network security 37 hash functions and message authentication
Information and network security 37 hash functions and message authentication
 
Information and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theoremInformation and network security 35 the chinese remainder theorem
Information and network security 35 the chinese remainder theorem
 
Information and network security 34 primality
Information and network security 34 primalityInformation and network security 34 primality
Information and network security 34 primality
 
Information and network security 33 rsa algorithm
Information and network security 33 rsa algorithmInformation and network security 33 rsa algorithm
Information and network security 33 rsa algorithm
 
Information and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystemsInformation and network security 32 principles of public key cryptosystems
Information and network security 32 principles of public key cryptosystems
 
Information and network security 31 public key cryptography
Information and network security 31 public key cryptographyInformation and network security 31 public key cryptography
Information and network security 31 public key cryptography
 
Information and network security 30 random numbers
Information and network security 30 random numbersInformation and network security 30 random numbers
Information and network security 30 random numbers
 
Information and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithmInformation and network security 29 international data encryption algorithm
Information and network security 29 international data encryption algorithm
 
Information and network security 28 blowfish
Information and network security 28 blowfishInformation and network security 28 blowfish
Information and network security 28 blowfish
 
Information and network security 27 triple des
Information and network security 27 triple desInformation and network security 27 triple des
Information and network security 27 triple des
 

Recently uploaded

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

Vector and Probabilistic Models of Information Retrieval

  • 1. Information Retrieval : 10 Vector and Probabilistic Models Prof Neeraj Bhargava Vaibhav Khanna Department of Computer Science School of Engineering and Systems Sciences Maharshi Dayanand Saraswati University Ajmer
  • 2. The Vector Model • Boolean matching and binary weights is too limiting • The vector model proposes a framework in which partial matching is possible • This is accomplished by assigning non-binary weights to index terms in queries and in documents • Term weights are used to compute a degree of similarity between a query and each document • The documents are ranked in decreasing order of their degree of similarity
  • 3. The Vector Model • For the vector model: • The weight wi,j associated with a pair (ki, dj) is positive and non-binary • The index terms are assumed to be all mutually independent • They are represented as unit vectors of a t-dimensional space (t is the total number of index terms) • The representations of document dj and query q are t- dimensional vectors given by :
  • 5. The Vector Model • Weights in the Vector model are basically tf-idf weights • These equations should only be applied for values of term frequency greater than zero • If the term frequency is zero, the respective weight is also zero
  • 6. The Vector Model • Document ranks computed by the Vector model for the query “to do”
  • 7. The Vector Model • Advantages: – term-weighting improves quality of the answer set – partial matching allows retrieval of docs that approximate the query conditions – cosine ranking formula sorts documents according to a degree of similarity to the query – document length normalization is naturally built-in into the ranking • Disadvantages: – It assumes independence of index terms
  • 8. Probabilistic Model • The probabilistic model captures the IR problem using a probabilistic framework • Given a user query, there is an ideal answer set for this query • Given a description of this ideal answer set, we could retrieve the relevant documents • Querying is seen as a specification of the properties of this ideal answer set • But, what are these properties?
  • 9. Probabilistic Model • An initial set of documents is retrieved somehow • The user inspects these docs looking for the relevant ones (in truth, only top 10-20 need to be inspected) • The IR system uses this information to refine the description of the ideal answer set • By repeating this process, it is expected that the description of the ideal answer set will improve
  • 10. Probabilistic Ranking Principle • The probabilistic model – Tries to estimate the probability that a document will be relevant to a user query – Assumes that this probability depends on the query and document representations only – The ideal answer set, referred to as R, should maximize the probability of relevance • But, – How to compute these probabilities? – What is the sample space?
  • 13. Ranking Formula • The previous equation cannot be computed without estimates of ri and R • One possibility is to assume R = ri = 0, as a way to boostrap the ranking equation, which leads to • This equation provides an idf-like ranking computation • In the absence of relevance information, this is the equation for ranking in the probabilistic model
  • 14. Advantages and Disadvantages • Advantages: – Docs ranked in decreasing order of probability of relevance • Disadvantages: – need to guess initial estimates for piR – method does not take into account tf factors – the lack of document length normalization
  • 15. Assignment • Explain the vector model of information Retrieval • Explain the probabilistic model of Information Retrieval