3/10/2021
Chapter Six
Query Languages
• Keyword-based queries
• Pattern matching
• Structural queries
Keyword-based querying
• Queries are combinations of words.
• The document collection is searched for documents that
contain these words.
• Word queries are intuitive, easy to express and provide
fast ranking.
• Here, the concept of word must be defined.
– A word is a sequence of letters terminated by a
separator (period, comma, space, etc.).
– Definition of letter and separator is flexible; e.g.,
hyphen could be defined as a letter or as a separator.
– Usually, common words (such as “a”, “the”, “of”, …)
are ignored.
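The word definition above can be sketched as a small tokenizer. This is only an illustration: the function name `tokenize`, the `hyphen_is_letter` switch, and the three-word stop list are assumptions made for the example, not part of the slides.

```python
import re

# Assumed stop-word list; the slide names only "a", "the", "of" as examples.
STOP_WORDS = {"a", "the", "of"}

def tokenize(text, hyphen_is_letter=False, stop_words=STOP_WORDS):
    """Split text into lowercase words and drop common words."""
    # A "letter" is a-z, plus the hyphen when hyphen_is_letter is set;
    # every other character acts as a separator.
    letters = r"[a-z\-]+" if hyphen_is_letter else r"[a-z]+"
    words = re.findall(letters, text.lower())
    return [w for w in words if w not in stop_words]

print(tokenize("The state-of-the-art, in a word."))
# ['state', 'art', 'in', 'word']
print(tokenize("The state-of-the-art, in a word.", hyphen_is_letter=True))
# ['state-of-the-art', 'in', 'word']
```

Treating the hyphen as a letter keeps "state-of-the-art" as one word; treating it as a separator splits it and lets the stop list remove "of" and "the".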
Single-word queries
• A query is a single word
– Usually used for searching in document images
• Simplest form of query.
• All documents that include this word are retrieved.
• Documents may be ranked by the frequency of this
word in the document.
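A minimal sketch of a single-word query with frequency ranking, assuming whitespace-separated words and a tiny made-up collection (`docs` and `single_word_query` are hypothetical names):

```python
from collections import Counter

def single_word_query(word, docs):
    """Return (doc_id, frequency) pairs for documents containing word,
    most frequent first."""
    hits = []
    for doc_id, text in docs.items():
        freq = Counter(text.lower().split())[word.lower()]
        if freq > 0:
            hits.append((doc_id, freq))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Made-up collection, for illustration only.
docs = {
    "d1": "red bird on a red roof",
    "d2": "blue bird",
    "d3": "a red car",
}
print(single_word_query("red", docs))  # [('d1', 2), ('d3', 1)]
```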
Phrase queries
• In this case a query is a sequence of words treated as a
single unit.
• Also called a “literal string” or “exact phrase” query.
• The phrase is usually surrounded by quotation marks.
• All documents that include this phrase are retrieved.
• Usually, separators (commas, colons, etc.) and common
words (e.g., “a”, “the”, “of”, “for”, …) in the phrase are
ignored.
• In effect, the query is for a set of words that must appear
in sequence.
• Allows users to specify a context and thus gain more
precision.
• Example: “Information Processing for Document
Retrieval”.
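One way to read the phrase-query rules is sketched below: separators and common words are dropped from both the phrase and the document, and the remaining words must appear consecutively and in order. The stop list and the helper names `content_words` and `contains_phrase` are assumptions for this example.

```python
import re

# Assumed stop list, taken from the examples on the slide.
STOP_WORDS = {"a", "the", "of", "for"}

def content_words(text):
    """Lowercase words with separators and common words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def contains_phrase(document, phrase):
    """True if the phrase's content words occur consecutively, in order."""
    doc, target = content_words(document), content_words(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

phrase = "Information Processing for Document Retrieval"
print(contains_phrase(
    "A survey of information processing, for the document retrieval community.",
    phrase))  # True
print(contains_phrase("Document retrieval and information processing.", phrase))  # False
```

The second document contains every word of the phrase but not in sequence, so it is not retrieved.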
Multiple-word queries
• In this case a query is a set of words (or phrases).
• Two options: a document is retrieved if it includes
– any of the query words, or
– each of the query words.
• Documents are ranked by the number of query words they
contain:
– A document containing n query words is ranked higher
than a document containing m < n query words.
– Documents are ranked in decreasing order: those
containing all the query words are ranked at the top, those
containing only one query word at the bottom.
– Frequency counts may be used to break ties among
documents that contain the same query words.
– Example: what is the result for the query “Red Bird”?
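The ranking rule can be tried on the “Red Bird” query with a short sketch. The collection is made up, and the function name `multi_word_query` and its `require_all` flag (switching between the "any" and "each" options) are assumptions for illustration.

```python
def multi_word_query(query, docs, require_all=False):
    """Rank documents by the number of distinct query words they contain.
    Ties are broken by the total frequency of the matched query words."""
    terms = set(query.lower().split())
    ranked = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        matched = terms & set(words)
        if not matched or (require_all and matched != terms):
            continue
        total_freq = sum(words.count(t) for t in matched)
        ranked.append((doc_id, len(matched), total_freq))
    # Sort by matched-word count first, then by frequency, both descending.
    return sorted(ranked, key=lambda r: (r[1], r[2]), reverse=True)

# Made-up collection, for illustration only.
docs = {
    "d1": "the red bird sat near another red bird",
    "d2": "a red car",
    "d3": "a bird and a red roof",
    "d4": "blue sky",
}
print(multi_word_query("Red Bird", docs))
# [('d1', 2, 4), ('d3', 2, 2), ('d2', 1, 1)]
print(multi_word_query("Red Bird", docs, require_all=True))
# [('d1', 2, 4), ('d3', 2, 2)]
```

Here d1 and d3 both contain the two query words, so the frequency count decides their order; d2 contains only "red" and is ranked last.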
Proximity queries
• Restricts the distance within a document between two
search terms.
• Important for large documents in which the two search
words may appear in different contexts.
• Proximity specifications limit the acceptable occurrences
and hence increase the precision of the search.
• General format: Word1 within m units of Word2.
– The unit may be a character, word, paragraph, etc.
• Examples:
– Information within 5 words of Retrieval:
Finds documents that discuss “Information Processing for
Document Retrieval” but not “Information Processing and
Searching for Relevant Document Retrieval”.
– Nuclear within 0 paragraphs of science:
Finds documents that discuss “Nuclear” and “science” in
the same paragraph.
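A sketch of the word-level case, "Word1 within m words of Word2". It assumes that distance is the difference between word positions, that every word counts (common words included), and that either order is accepted; the slides do not fix these details, and `within_words` is a hypothetical name.

```python
import re

def within_words(text, word1, word2, m):
    """True if some occurrence of word1 lies at most m word positions
    from some occurrence of word2, in either order."""
    words = re.findall(r"[a-z]+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= m for i in pos1 for j in pos2)

near = "Information Processing for Document Retrieval"
far = "Information Processing and Searching for Relevant Document Retrieval"
print(within_words(near, "Information", "Retrieval", 5))  # True  (distance 4)
print(within_words(far, "Information", "Retrieval", 5))   # False (distance 7)
```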
Boolean queries
• Based on concepts from logic: a query describes the
information need by relating multiple words with Boolean
operators.
• Operators: AND, OR, NOT
• Semantics: for each query word w, a corresponding set
Dw is constructed that includes the documents that
contain w.
• The Boolean expression is then interpreted as an
expression on the corresponding document sets with the
corresponding set operators (intersection, union,
complement):
– AND: Finds only documents containing all of the
specified words or phrases.
– OR: Finds documents containing at least one of the
specified words or phrases.
– NOT: Excludes documents containing the specified
word or phrase.
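The set semantics can be sketched directly with Python sets: build Dw for every word, then map AND, OR and NOT to intersection, union and complement. The collection and the name `build_sets` are made up for the example.

```python
def build_sets(docs):
    """Map each word w to D_w, the set of ids of documents containing w."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index

# Made-up collection, for illustration only.
docs = {
    "d1": "computer server",
    "d2": "computer mainframe",
    "d3": "server",
    "d4": "printer",
}
D = build_sets(docs)
all_docs = set(docs)

print(sorted(D["computer"] & D["server"]))  # AND, intersection: ['d1']
print(sorted(D["computer"] | D["server"]))  # OR, union: ['d1', 'd2', 'd3']
print(sorted(all_docs - D["mainframe"]))    # NOT, complement: ['d1', 'd3', 'd4']
```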
Boolean Queries
• Precedence: order of operations
– NOT, then AND, then OR
– Use parentheses to override precedence.
– Operators with the same precedence are processed
left to right.
• Truth table

P      Q      NOT P  P AND Q  P OR Q
FALSE  FALSE  TRUE   FALSE    FALSE
FALSE  TRUE   TRUE   FALSE    TRUE
TRUE   FALSE  FALSE  FALSE    TRUE
TRUE   TRUE   FALSE  TRUE     TRUE
Examples: Boolean queries
1. computer OR server
– Finds documents containing computer, server, or both.
2. (computer OR server) NOT mainframe
– Selects all documents that discuss computers or servers,
excluding any that discuss mainframes. Here NOT is used
as a binary operator meaning AND NOT.
3. computer NOT (server OR mainframe)
– Selects all documents that discuss computers and do not
discuss either servers or mainframes.
4. computer OR server NOT mainframe
– Selects all documents that discuss computers, plus
documents that discuss servers but do not discuss
mainframes (NOT is applied before OR).
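The four example queries can be checked against hypothetical document sets. The sets below are invented for illustration; set difference plays the role of the binary NOT (AND NOT), and the parentheses spell out the precedence.

```python
# Hypothetical document sets: ids of documents mentioning each word.
computer = {"d1", "d2", "d4", "d5"}
server = {"d1", "d3", "d5"}
mainframe = {"d2", "d5", "d6"}

ex1 = computer | server                 # 1. computer OR server
ex2 = (computer | server) - mainframe   # 2. (computer OR server) NOT mainframe
ex3 = computer - (server | mainframe)   # 3. computer NOT (server OR mainframe)
ex4 = computer | (server - mainframe)   # 4. computer OR server NOT mainframe

for name, result in [("1", ex1), ("2", ex2), ("3", ex3), ("4", ex4)]:
    print(name, sorted(result))
# 1 ['d1', 'd2', 'd3', 'd4', 'd5']
# 2 ['d1', 'd3', 'd4']
# 3 ['d4']
# 4 ['d1', 'd2', 'd3', 'd4', 'd5']
```

Queries 2 and 4 differ only in the parentheses, yet query 4 still returns d2 and d5, which mention mainframes, because NOT is applied to server alone.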
Penalizing documents
• When interpreting queries, some models demote
documents that include keywords that were not requested.
– Example: Assume the vector model with the cosine
measure and the simple case that both documents and
queries use binary values. Consider these two
documents and a query:
– d1 = (0,1,0,1,0), d2 = (0,1,1,1,0), q = (0,1,0,1,0)
– sim(q, d1) = 2 / (√2 · √2) = 1.0,
sim(q, d2) = 2 / (√2 · √3) ≈ 0.82
– d2 is demoted because it includes an extra keyword not
requested by q.
• In contrast, the Boolean model does not “penalize”
documents with extra (non-requested) keywords.
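The two similarity values can be reproduced with a few lines of Python. The helper name `cosine` is an assumption; the vectors are the ones given on the slide.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

d1 = (0, 1, 0, 1, 0)
d2 = (0, 1, 1, 1, 0)
q = (0, 1, 0, 1, 0)

print(round(cosine(q, d1), 2))  # 1.0
print(round(cosine(q, d2), 2))  # 0.82
```

The dot product is 2 in both cases; only the norm of d2 grows (√3 instead of √2), which is what lowers its score.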
Natural language
• Using natural language for querying is very attractive.
• Example: “Find all the documents that discuss
campaign finance reforms, including documents that
discuss violations of campaign financing regulations. Do
not include documents that discuss campaign
contributions by the gun and the tobacco industries.”
• Natural language queries are converted to a formal
language for processing against a set of documents.
• Such translation requires intelligence and is still a
challenge.
Natural language
• Pseudo NL processing: System scans the text and extracts
recognized terms and Boolean connectors. The
grammaticality of the text is not important.
– Often used by search engines.
• Problem: Recognizing the negation in the search statement
(“Do not include...”).
• Compromise: Users enter natural language clauses
connected with Boolean operators.
• In the above example: “campaign finance reforms” or
“violations of campaign financing regulations” and not
“campaign contributions by the gun and the tobacco
industries”.
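A rough sketch of how a system might pick apart the compromise form: quoted natural language clauses are extracted, and whatever sits between them is taken as a Boolean connector. It assumes straight double quotes around each clause, and `parse_clauses` is a hypothetical helper, not a description of any particular search engine.

```python
import re

def parse_clauses(query):
    """Split a query into quoted clauses and the connectors between them."""
    clauses = re.findall(r'"([^"]+)"', query)
    # Text between consecutive quoted clauses is treated as a connector.
    between = re.split(r'"[^"]+"', query)[1:-1]
    connectors = [c.strip().lower() for c in between]
    return clauses, connectors

query = ('"campaign finance reforms" or '
         '"violations of campaign financing regulations" and not '
         '"campaign contributions by the gun and the tobacco industries"')
clauses, connectors = parse_clauses(query)
print(connectors)    # ['or', 'and not']
print(len(clauses))  # 3
```

Note that the "and" inside the last clause is left alone, since only text outside the quotes is read as a connector.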
Thank you
