UNIT-II (Modelling and Retrieval
Evaluation)
IV Year / VIII Semester
By
P.THENMOZHI AP/CSE
KNCET.
KONGUNADU COLLEGE OF ENGINEERING AND
TECHNOLOGY
(Autonomous)
NAMAKKAL- TRICHY MAIN ROAD, THOTTIAM
DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
CS8080 – Information Retrieval Techniques
Syllabus
MODELING AND RETRIEVAL
EVALUATION
• Basic Retrieval Models
• An IR model governs how a document and a
query are represented and how the relevance
of a document to a user query is defined.
• There are three main IR models:
– Boolean model
– Vector space model
– Probabilistic model
• Each term is associated with a weight. Given a
collection of documents D, let
• V = {t1, t2, ..., t|V|} be the set of distinct
terms in the collection, where ti is a term.
• The set V is usually called the vocabulary of
the collection, and |V| is its size,
i.e., the number of terms in V.
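As a minimal sketch of the vocabulary definition, the snippet below collects the distinct terms of a small collection. The three documents, the whitespace tokenizer, and the function names are illustrative assumptions, not part of the course material.

```python
# Hypothetical toy collection D; the texts are illustrative only.
docs = {
    "d1": "information retrieval models",
    "d2": "boolean retrieval model",
    "d3": "vector space model",
}


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def build_vocabulary(collection):
    """Return the sorted list of distinct terms across all documents."""
    terms = set()
    for text in collection.values():
        terms.update(tokenize(text))
    return sorted(terms)


V = build_vocabulary(docs)
print(V)       # the vocabulary V
print(len(V))  # |V|, the number of distinct terms
```

No stemming is applied here, so "model" and "models" count as two different terms; a real system would usually normalize them.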
• An IR model is a quadruple [D, Q, F, R(qi, dj)]
where
• 1. D is a set of logical views for the documents
in the collection
• 2. Q is a set of logical views for the user
queries
• 3. F is a framework for modeling documents
and queries
• 4. R(qi, dj) is a ranking function
Boolean Model
• The Boolean model is one of the earliest and
simplest information retrieval models.
• It uses the notion of exact matching to match
documents to the user query.
• Both the query and the retrieval are based on
Boolean algebra.
• In the Boolean model, documents and queries
are represented as sets of terms.
• That is, each term is only considered present
or absent in a document.
• Boolean Queries:
• Query terms are combined logically using the Boolean
operators AND, OR, and NOT, which have their usual
semantics in logic.
• Thus, a Boolean query has precise semantics.
• For instance, the query ((x AND y) AND (NOT z)) says
that a retrieved document must contain both the terms
x and y but not z.
• As another example, the query expression (x OR y)
means that at least one of these terms must be in each
retrieved document.
• Here, we assume that x, y and z are terms. In general,
they can be Boolean expressions themselves.
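The query semantics above can be sketched as a small recursive evaluator over a document's term set. The tuple encoding of queries and the name `evaluate` are assumptions made for this illustration only.

```python
def evaluate(expr, doc_terms):
    """Evaluate a nested Boolean query against a document's set of terms.

    expr is either a term (str) or a tuple of the form
    ("AND", a, b), ("OR", a, b), or ("NOT", a), where a and b
    may themselves be Boolean expressions.
    """
    if isinstance(expr, str):
        # A bare term is true exactly when it is present in the document.
        return expr in doc_terms
    op = expr[0]
    if op == "AND":
        return evaluate(expr[1], doc_terms) and evaluate(expr[2], doc_terms)
    if op == "OR":
        return evaluate(expr[1], doc_terms) or evaluate(expr[2], doc_terms)
    if op == "NOT":
        return not evaluate(expr[1], doc_terms)
    raise ValueError(f"unknown operator: {op}")


# ((x AND y) AND (NOT z))
query = ("AND", ("AND", "x", "y"), ("NOT", "z"))

print(evaluate(query, {"x", "y"}))         # True: has x and y, lacks z
print(evaluate(query, {"x", "y", "z"}))    # False: contains z
print(evaluate(("OR", "x", "y"), {"y"}))   # True: at least one term present
```

Because operands are evaluated recursively, x, y and z can be replaced by arbitrary sub-expressions without changing the function.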
• Document Retrieval:
• Given a Boolean query, the system retrieves
every document that makes the query
logically true.
• Thus, the retrieval is based on the binary
decision criterion, i.e., a document is either
relevant or irrelevant. Intuitively, this is called
exact match.
• Most search engines support some limited
forms of Boolean retrieval using explicit
inclusion and exclusion operators.
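One common way to retrieve every document that makes a query true is to keep, for each term, the set of documents containing it and combine those sets with set operations. The sketch below assumes a hypothetical three-document collection and a simple in-memory index; it is not a description of how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical collection: document id -> text.
docs = {
    1: "x y",
    2: "x y z",
    3: "y w",
}


def build_index(collection):
    """Map each term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


index = build_index(docs)
all_ids = set(docs)

# (x AND y) AND (NOT z): intersection for AND, complement for NOT.
result_and_not = (index["x"] & index["y"]) & (all_ids - index["z"])

# x OR y: union of the two document sets.
result_or = index["x"] | index["y"]

print(sorted(result_and_not))  # [1]
print(sorted(result_or))       # [1, 2, 3]
```

The result is an unordered set: each document is either in it or not, which is the exact-match behaviour described above.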
• Drawbacks of the Boolean Model
• No ranking of the documents is provided
(absence of a grading scale)
• The information need has to be translated into a
Boolean expression, which most users find
awkward
• The Boolean queries formulated by users
are most often too simplistic.
TF-IDF (Term Frequency/Inverse
Document Frequency) Weighting
• We assign to each term in a document a
weight for that term that depends on the
number of occurrences of the term in the
document.
• We would like to compute a score between a
query term t and a document d, based on the
weight of t in d. The simplest approach is to
assign the weight to be equal to the number
of occurrences of term t in document d.
• This weighting scheme is referred to as term
frequency and is denoted $\mathrm{tf}_{t,d}$, with the
subscripts denoting the term and the
document in order.
• For a document d, the set of weights
determined by the tf weights above (or indeed
any weighting function that maps the number
of occurrences of t in d to a positive real
value) may be viewed as a quantitative digest
of that document.
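A minimal sketch of raw term frequency, assuming lowercase whitespace tokenization; the sample sentence and the function name are illustrative.

```python
from collections import Counter


def term_frequencies(text):
    """Count how many times each term occurs in one document."""
    return Counter(text.lower().split())


tf = term_frequencies("to be or not to be")

print(tf["to"])        # 2
print(tf["or"])        # 1
print(tf["question"])  # 0, a Counter returns 0 for absent terms
```

The resulting mapping from terms to counts is the "quantitative digest" of the document: word order is discarded and only the counts remain.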
• How is the document frequency df of a term
used to scale its weight? Denoting as usual the
total number of documents in a collection by
N, we define the inverse document frequency
(idf) of a term t as follows:
• $\mathrm{idf}_t = \log \dfrac{N}{\mathrm{df}_t}$
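The idf definition can be sketched as follows. The slides do not state the base of the logarithm; base 10 is assumed here, and the four small documents are made up for illustration.

```python
import math


def inverse_document_frequency(term, doc_term_sets):
    """Return log10(N / df_t) for a term over a list of term sets.

    N is the number of documents and df_t the number of documents
    that contain the term. Base 10 is an assumption of this sketch.
    """
    n_docs = len(doc_term_sets)
    df = sum(1 for terms in doc_term_sets if term in terms)
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log10(n_docs / df)


# Hypothetical collection of N = 4 documents, each given as a set of terms.
collection = [
    {"the", "car"},
    {"the", "insurance"},
    {"the", "auto"},
    {"the", "rare"},
]

idf_common = inverse_document_frequency("the", collection)
idf_rare = inverse_document_frequency("rare", collection)

print(idf_common)  # 0.0, the term occurs in every document
print(idf_rare)    # log10(4), roughly 0.602
```

A term that appears in every document gets idf 0, while a term that appears in few documents gets a larger value, which is the scaling effect the definition is meant to achieve.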
• Tf-idf weighting
• We now combine the definitions of term
frequency and inverse document frequency, to
produce a composite weight for each term in
each document.
• The tf-idf weighting scheme assigns to term t
a weight in document d given by
• $\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t$.
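As a worked illustration of the composite weight, the sketch below multiplies a term count by a base-10 idf. The function name, the base of the logarithm, and the example numbers are assumptions chosen for easy arithmetic.

```python
import math


def tf_idf_weight(tf, df, n_docs):
    """tf-idf weight computed from raw counts.

    tf: occurrences of the term in the document
    df: number of documents containing the term
    n_docs: total number of documents N
    Returns 0.0 when the term is absent from the document or collection.
    """
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log10(n_docs / df)


# A term occurring 3 times in d and in 10 of 1000 documents:
# idf = log10(1000 / 10) = 2, so the weight is about 3 * 2 = 6.
rare_weight = tf_idf_weight(tf=3, df=10, n_docs=1000)

# The same count for a term found in every document gives weight 0.
common_weight = tf_idf_weight(tf=3, df=1000, n_docs=1000)

print(rare_weight)
print(common_weight)
```

The weight is high when a term is frequent within the document but rare across the collection, and it drops to zero for a term that occurs in all documents.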
• In the simplest scoring scheme, the score of
document d is the sum, over all query terms,
of the number of times each of the query
terms occurs in d.
• We can refine this idea so that we add up not
the number of occurrences of each query
term t in d, but instead the tf-idf weight of
each term in d.
• $\mathrm{Score}(q, d) = \sum_{t \in q} \text{tf-idf}_{t,d}$.
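The refined score can be sketched by summing tf-idf weights over the query terms and sorting documents by the result. The toy collection, the base-10 logarithm, and the names `score` and `ranking` are assumptions of this example.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "car insurance auto insurance",
    "d2": "car car best",
    "d3": "auto best auto",
}

tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
N = len(tokenized)

# Document frequency: number of documents in which each term appears.
df = Counter(term for tokens in tokenized.values() for term in set(tokens))


def score(query, doc_id):
    """Sum of tf-idf weights of the query terms in one document."""
    tf = Counter(tokenized[doc_id])
    total = 0.0
    for term in query.split():
        if df[term]:  # terms absent from the collection contribute nothing
            total += tf[term] * math.log10(N / df[term])
    return total


query = "car insurance"
ranking = sorted(tokenized, key=lambda d: score(query, d), reverse=True)

for doc_id in ranking:
    print(doc_id, round(score(query, doc_id), 3))
```

Unlike Boolean retrieval, this produces a graded ordering: d1 matches both query terms and ranks first, d2 matches only "car", and d3 matches neither and scores zero.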
Cosine similarity
• Documents could be ranked by computing the distance between
the points representing the documents and the query.
• More commonly, a similarity measure is used (rather than a
distance or dissimilarity measure), so that the documents with the
highest scores are the most similar to the query.
• A number of similarity measures have been proposed and tested
for this purpose.
• The most successful of these is the cosine correlation similarity
measure.
• The cosine correlation measures the cosine of the angle between
the query and the document vectors.
• When the vectors are normalized so that all documents and queries
are represented by vectors of equal length, the cosine of the angle
between two identical vectors will be 1 (the angle is zero), and for
two vectors that do not share any non-zero terms, the cosine will
be 0.
• The cosine measure is defined as:
• $\mathrm{Cosine}(D_i, Q) = \dfrac{\sum_{j=1}^{t} d_{ij} \cdot q_j}{\sqrt{\sum_{j=1}^{t} d_{ij}^2 \cdot \sum_{j=1}^{t} q_j^2}}$
• The numerator of this measure is the sum of the products
of the term weights for the matching query and document
terms (known as the dot product or inner product).
• The denominator normalizes this score by dividing by the
product of the lengths of the two vectors. There is no
theoretical reason why the cosine correlation should be
preferred to other similarity measures, but it does perform
somewhat better in evaluations of search quality.
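A minimal sketch of the cosine measure for two weight vectors over the same t terms. Returning 0.0 for an all-zero vector is a convention assumed here, since the formula itself is undefined in that case.

```python
import math


def cosine(d, q):
    """Cosine of the angle between document vector d and query vector q.

    Both arguments are equal-length sequences of term weights.
    """
    dot = sum(dj * qj for dj, qj in zip(d, q))    # numerator: dot product
    length_d = math.sqrt(sum(dj * dj for dj in d))
    length_q = math.sqrt(sum(qj * qj for qj in q))
    if length_d == 0 or length_q == 0:
        return 0.0                                # assumed convention
    return dot / (length_d * length_q)


same_direction = cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
no_shared_terms = cosine([1.0, 0.0], [0.0, 1.0])

print(same_direction)   # close to 1: the vectors point the same way
print(no_shared_terms)  # 0.0: no non-zero term in common
```

The first pair differs only in length, so dividing by the product of the vector lengths removes that difference and the similarity is 1 up to floating-point rounding.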
