Vector and Probabilistic Models of Information Retrieval

The vector space model, or term vector model, is an algebraic model for representing text documents (and objects in general) as vectors of identifiers, such as index terms. It is used in information filtering, information retrieval, indexing, and relevance ranking.

Information Retrieval : 10
Vector and Probabilistic Models
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer

The Vector Model
• Boolean matching and binary weights are too
limiting
• The vector model proposes a framework in which
partial matching is possible
• This is accomplished by assigning non-binary
weights to index terms in queries and in
documents
• Term weights are used to compute a degree of
similarity between a query and each document
• The documents are ranked in decreasing order of
their degree of similarity

The Vector Model
• For the vector model:
– The weight w_{i,j} associated with a pair (k_i, d_j) is positive and
non-binary
– The index terms are assumed to be all mutually independent
– They are represented as unit vectors of a t-dimensional space
(t is the total number of index terms)
– The representations of document d_j and query q are
t-dimensional vectors given by:
d_j = (w_{1,j}, w_{2,j}, ..., w_{t,j})
q = (w_{1,q}, w_{2,q}, ..., w_{t,q})

The Vector Model
• Weights in the vector model are basically tf-idf weights
• The tf-idf equations should only be applied when the term
frequency is greater than zero
• If the term frequency is zero, the respective weight is also
zero
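
The weighting equations themselves did not survive in this transcript. As a rough sketch, assuming one common tf-idf variant, w = (1 + log2 f) × log2(N / n), the weights could be computed as below. The function name `tf_idf_weights`, the whitespace tokenizer, and the choice of base-2 logarithms are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: w = (1 + log2 f) * log2(N / n), where f is the
    frequency of the term in the document, N is the number of documents,
    and n is the number of documents that contain the term.
    """
    N = len(docs)
    # Naive tokenization (lowercase + whitespace split), for illustration only.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        # Only terms with frequency > 0 are stored, so any absent term
        # implicitly has weight zero, as the slide requires.
        weights.append({t: (1 + math.log2(f)) * math.log2(N / df[t])
                        for t, f in c.items()})
    return weights
```

With this variant, a term that occurs in every document gets weight zero, because log2(N / n) is zero when n = N.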

The Vector Model
• Document ranks computed by the Vector model for the query
“to do”

The Vector Model
• Advantages:
– term-weighting improves quality of the answer set
– partial matching allows retrieval of docs that
approximate the query conditions
– cosine ranking formula sorts documents according to
a degree of similarity to the query
– document length normalization is naturally built
into the ranking
• Disadvantages:
– It assumes independence of index terms

Probabilistic Model
• The probabilistic model captures the IR problem
using a probabilistic framework
• Given a user query, there is an ideal answer set
for this query
• Given a description of this ideal answer set, we
could retrieve the relevant documents
• Querying is seen as a specification of the
properties of this ideal answer set
• But, what are these properties?

Probabilistic Model
• An initial set of documents is retrieved somehow
• The user inspects these docs looking for the
relevant ones (in truth, only the top 10-20 need to be
inspected)
• The IR system uses this information to refine the
description of the ideal answer set
• By repeating this process, it is expected that the
description of the ideal answer set will improve

Probabilistic Ranking Principle
• The probabilistic model
– Tries to estimate the probability that a document will
be relevant to a user query
– Assumes that this probability depends on the query
and document representations only
– The ideal answer set, referred to as R, should
maximize the probability of relevance
• But,
– How to compute these probabilities?
– What is the sample space?

Ranking Formula
• The previous equation cannot be computed without
estimates of r_i and R
• One possibility is to assume R = r_i = 0, as a way to
bootstrap the ranking equation, which leads to a
simplified ranking equation
• This equation provides an idf-like ranking computation
• In the absence of relevance information, this is the
equation for ranking in the probabilistic model
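
The ranking equations are not preserved in this transcript. The sketch below assumes the Robertson-Sparck Jones term weight with 0.5 smoothing, as commonly given in textbooks: log(((r_i + 0.5)(N − n_i − R + r_i + 0.5)) / ((R − r_i + 0.5)(n_i − r_i + 0.5))). Here N (collection size) and n_i (number of documents containing term k_i) are assumed notation; R and r_i follow the slide, read as the number of known relevant documents and the number of those that contain k_i. The function names are hypothetical.

```python
import math

def rsj_term_weight(N, n_i, R=0, r_i=0):
    """Assumed Robertson-Sparck Jones weight of one term, 0.5-smoothed.

    With the defaults R = r_i = 0 (no relevance information) this reduces
    to log((N - n_i + 0.5) / (n_i + 0.5)), an idf-like quantity.
    """
    num = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    den = (R - r_i + 0.5) * (n_i - r_i + 0.5)
    return math.log(num / den)

def score(query_terms, doc_terms, N, df, R=0, r=None):
    """Sum the term weights over terms shared by the query and the document.

    df maps each term to n_i; r maps each term to r_i (missing terms count as 0).
    """
    r = r or {}
    shared = set(query_terms) & set(doc_terms)
    return sum(rsj_term_weight(N, df[t], R, r.get(t, 0)) for t in shared)
```

Under this assumed form, a rare term weighs more than a common one, and a term that occurs in more than half of the collection gets a negative weight when R = r_i = 0. Once the user has judged some documents, passing the observed R and r_i values re-estimates the weights, which corresponds to the refinement loop described earlier.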

Advantages and Disadvantages
• Advantages:
– Docs ranked in decreasing order of probability of
relevance
• Disadvantages:
– need to guess initial estimates for p_{iR}
– method does not take into account tf factors
– lacks document length normalization

Assignment
• Explain the vector model of Information
Retrieval
• Explain the probabilistic model of Information
Retrieval
