Query Processing
Dr. Shafiq
The merge
- For query (T2 AND T3):
- (Step 1) Maintain a pointer into each posting list and walk through the posting lists simultaneously.
- (Step 2) At each step, compare the DocIDs under the two pointers.
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 13 21 34 (Sec. 1.3)
The merge
- For query (T2 AND T3):
- (Step 2a) If both DocIDs are the same, add that DocID to the merged list and advance both pointers.
- (Step 2b) Otherwise, advance the pointer with the smaller DocID. For example, T3's first DocID (1) is smaller than T2's (2), so the T3 pointer is advanced.
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 13 21 34; merged list so far: 2 (Sec. 1.3)
The merge
- For query (T2 AND T3):
- What will be the next step?
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 13 21 34; merged list so far: 2
The merge
- Walk through the two postings lists simultaneously, in time linear in the total number of postings entries.
- For query (T2 AND T3):
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 13 21 34; merged result: 2 8
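The two-pointer walk described in these slides can be sketched in Python as follows. This is a minimal sketch, assuming postings lists are plain Python lists of integer DocIDs already sorted in increasing order; the function name `intersect` and the comparison counter are illustrative additions, not part of the original slides.

```python
def intersect(p1, p2):
    """Intersect two postings lists of DocIDs, both sorted in increasing order.

    Returns (common DocIDs, number of DocID comparisons made).
    """
    answer = []
    comparisons = 0
    i = j = 0  # one pointer per postings list
    while i < len(p1) and j < len(p2):
        comparisons += 1  # compare the DocIDs under the two pointers
        if p1[i] == p2[j]:
            # Step 2a: same DocID, so add it to the result and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Step 2b: advance the pointer with the smaller DocID.
            i += 1
        else:
            j += 1
    return answer, comparisons


t2 = [2, 4, 8, 16, 32, 64, 128]
t3 = [1, 2, 3, 5, 8, 13, 21, 34]
result, comparisons = intersect(t2, t3)
print(result, comparisons)  # [2, 8] 11
```

Every iteration advances at least one pointer, so for lists of lengths x and y the loop runs at most x + y times.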
Merging Posting Lists
- If the posting list lengths are x and y, the merge takes O(x+y) operations.
- So the complexity of querying is O(n). What is n here?
- Crucial: postings must be sorted by docID.
Query optimization
- What is the best order for query processing?
- Consider a query that is an AND of n terms.
- For each of the n terms, get its postings, then AND them together.
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 16 21 34; T4 → 13 16
Query: T3 AND T2 AND T4
Query optimization example
- Process terms in order of increasing frequency:
  - start with the smallest set, then keep cutting it down further.
Execute the query as (T4 AND T2) AND T3.
Figure: T2 → 2 4 8 16 32 64 128; T3 → 1 2 3 5 8 16 21 34; T4 → 13 16 21
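The "smallest set first" rule can be sketched as follows. This is a minimal sketch, assuming a hypothetical in-memory `index` dict that maps each term to its sorted postings list, and using the length of a postings list as the term's frequency; the postings are the example lists shown for T2, T3 and T4.

```python
def intersect(p1, p2):
    """Two-pointer intersection of two postings lists sorted by docID."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def and_query(index, terms):
    """AND together the postings of `terms`, rarest term first.

    `index` is a hypothetical dict mapping each term to its sorted postings list;
    the length of a postings list stands in for the term's frequency.
    """
    order = sorted(terms, key=lambda t: len(index[t]))
    result = index[order[0]]
    for term in order[1:]:
        if not result:  # the intermediate result can only shrink, so stop early
            break
        result = intersect(result, index[term])
    return order, result


index = {
    "T2": [2, 4, 8, 16, 32, 64, 128],
    "T3": [1, 2, 3, 5, 8, 16, 21, 34],
    "T4": [13, 16, 21],
}
order, result = and_query(index, ["T3", "T2", "T4"])
print(order, result)  # ['T4', 'T2', 'T3'] [16]
```

Because every intermediate result is no larger than the smallest list seen so far, starting from the rarest term keeps each subsequent merge short.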
More general optimization
Exercise
- Recommend a query processing order for
  (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
  given the following term frequencies:

  Term           Freq
  eyes           213312
  kaleidoscope    87009
  marmalade      107913
  skies          271658
  tangerine       46653
  trees          316812
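One way to approach this exercise, assumed here rather than taken from the slides, is to estimate the size of each OR group by the sum of its terms' frequencies (an upper bound on the size of the union), then process the AND of the groups in increasing order of those estimates, following the same "smallest set first" rule as above. The names `doc_freq`, `query` and `estimate` are illustrative.

```python
# Term frequencies from the exercise table.
doc_freq = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}

# The query as a conjunction of OR groups.
query = [("tangerine", "trees"), ("marmalade", "skies"), ("kaleidoscope", "eyes")]


def estimate(group):
    """Assumed conservative estimate of an OR group's size: the sum of its terms'
    frequencies, which is an upper bound on the size of the union."""
    return sum(doc_freq[term] for term in group)


# Process the AND of the OR groups, smallest estimate first.
plan = sorted(query, key=estimate)
for group in plan:
    print(" OR ".join(group), estimate(group))
# kaleidoscope OR eyes 300321
# tangerine OR trees 363465
# marmalade OR skies 379571
```

Under this assumption, (kaleidoscope OR eyes) would be evaluated first, then (tangerine OR trees), then (marmalade OR skies).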
FASTER POSTINGS MERGES: SKIP POINTERS/SKIP LISTS
Recall basic merge
- Walk through the two postings lists simultaneously, in time linear in the total number of postings entries.
Figure: T2 → 2 4 8 41 48 64 128; T3 → 1 2 3 8 11 17 21 31; merged result: 2 8
If the list lengths are x and y, the merge takes O(x+y) operations.
Can we do better?
Yes (if the index isn't changing too fast).
Augment postings with skip pointers
(at indexing time)
- Why?
  - To skip postings that will not figure in the search results.
- How?
- Where do we place skip pointers?
Figure: T2 → 2 4 8 41 48 64 128, with skip pointers to 41 and 128; T3 → 1 2 3 8 11 17 21 31, with skip pointers to 11 and 31
Query processing with skip pointers
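Figure: T2 → 2 4 8 41 48 64 128, with skip pointers to 41 and 128; T3 → 1 2 3 8 11 17 21 31, with skip pointers to 11 and 31
Suppose we’ve stepped through the lists until we process 8 on each list. We match it and advance both pointers.
We then have 41 on the upper list (T2) and 11 on the lower list (T3). 11 is smaller.
The skip successor of 11 on the lower list is 31, which is still smaller than 41, so we can follow the skip pointer and jump straight to 31 without examining 17 and 21.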
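The skip-pointer walk in this example can be sketched as follows. This is a minimal sketch under stated assumptions: skip pointers are stored as a hypothetical dict from a list index to the index it jumps to, and the pointer positions (2 → 41, 41 → 128 on T2; 1 → 11, 11 → 31 on T3) are read off the figure. A skip is followed only when its target is strictly smaller than the other list's current DocID, which guarantees the posting we land on cannot match and can safely be stepped past.

```python
def intersect_with_skips(a, skips_a, b, skips_b):
    """Intersect postings lists a and b (sorted by docID) that carry skip pointers.

    skips_a / skips_b map a list index to the index its skip pointer jumps to.
    """
    answer = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            answer.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Follow skips on a while the target docID is still below b[j]; every
            # posting jumped over is smaller still, so none of them can match.
            while i in skips_a and a[skips_a[i]] < b[j]:
                i = skips_a[i]
            i += 1  # a[i] < b[j] still holds, so step past it
        else:
            while j in skips_b and b[skips_b[j]] < a[i]:
                j = skips_b[j]
            j += 1
    return answer


# Assumed skip layout, read off the figure: 2 -> 41 -> 128 on T2, 1 -> 11 -> 31 on T3.
t2, t2_skips = [2, 4, 8, 41, 48, 64, 128], {0: 3, 3: 6}
t3, t3_skips = [1, 2, 3, 8, 11, 17, 21, 31], {0: 4, 4: 7}
print(intersect_with_skips(t2, t2_skips, t3, t3_skips))  # [2, 8]
```

With 41 on T2 and 11 on T3, the inner loop jumps from 11 to 31 (since 31 < 41), so 17 and 21 are never compared. A variant that also follows a skip when its target equals the other DocID is possible, at the cost of a slightly more involved invariant.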
Where do we place skips?
- Tradeoff:
  - More skips → shorter skip spans → more likely to skip. But lots of comparisons to skip pointers.
  - Fewer skips → few pointer comparisons, but then long skip spans → few successful skips.
Where do we place skips?
A simple heuristic for placing skips, which has been found to work well in practice, is that for a postings list of length p, use √p evenly-spaced skip pointers.
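The √p heuristic can be sketched as a small indexing-time helper. This is an illustrative sketch: `add_skips` is a hypothetical name, the skip length is assumed to be ⌊√p⌋ (computed with `math.isqrt`), and the last pointer is assumed to jump to the final posting so the tail of the list is covered. Applied to the 16-entry list from the quiz that follows, it yields the pointers 4 → 14, 14 → 22, 22 → 120, 120 → 180 used in the solution.

```python
import math


def add_skips(postings):
    """Place roughly sqrt(p) evenly spaced skip pointers on a postings list.

    Returns a dict mapping a list index to the index its skip pointer targets.
    """
    p = len(postings)
    step = max(1, math.isqrt(p))  # assumed skip length: floor(sqrt(p))
    skips = {}
    for i in range(0, p - 1, step):
        skips[i] = min(i + step, p - 1)  # the last pointer may be shorter, ending at the tail
    return skips


t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
skips = add_skips(t1)
print({t1[i]: t1[j] for i, j in skips.items()})
# {4: 14, 14: 22, 22: 120, 120: 180}
```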
Quiz
- We have a two-word query whose postings lists are:
  t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
  t2 = [47]
- Work out how many comparisons would be done to intersect the two postings lists with each of the following two strategies. Briefly justify your answers:
  a) Using standard postings lists.
  b) Using postings lists stored with skip pointers, with a skip length of √p, where p is the length of the list.
Solution
- a) The number of comparisons would be 11, since the pointer on t1 must step through every posting up to 47:
  (4,47), (6,47), (10,47), (12,47), (14,47), (16,47), (18,47), (20,47), (22,47), (32,47), (47,47)
- b) Since the skip length is √16 = 4, the skip pointers on t1 would be 4 → 14, 14 → 22, 22 → 120, 120 → 180. The skips 4 → 14 and 14 → 22 are followed, but 120 > 47, so the skip from 22 is not taken and the walk continues one posting at a time through 32 to 47. The number of comparisons would be 6:
  (4,47), (14,47), (22,47), (120,47), (32,47), (47,47).
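The counts in this solution can be checked with a small instrumented version of the intersection. This sketch assumes one comparison per pair of DocIDs examined, with each check of a skip target counted as well; the helper names `build_skips` and `count_comparisons` are hypothetical. Under this convention it reproduces both totals above; other counting conventions may give different numbers.

```python
import math


def build_skips(postings, step):
    """Skip pointers every `step` positions; the last one points at the tail."""
    n = len(postings)
    return {i: min(i + step, n - 1) for i in range(0, n - 1, step)}


def count_comparisons(a, b, skips_a=None, skips_b=None):
    """Intersect sorted lists a and b, counting DocID comparisons.

    Each (DocID, DocID) pair examined counts once, including skip-target checks.
    """
    skips_a, skips_b = skips_a or {}, skips_b or {}
    answer, comparisons, i, j = [], 0, 0, 0
    while i < len(a) and j < len(b):
        comparisons += 1  # compare a[i] with b[j]
        if a[i] == b[j]:
            answer.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            while i in skips_a:
                comparisons += 1  # compare the skip target with b[j]
                if a[skips_a[i]] >= b[j]:
                    break
                i = skips_a[i]
            i += 1  # a[i] < b[j], so step past it
        else:
            while j in skips_b:
                comparisons += 1  # compare the skip target with a[i]
                if b[skips_b[j]] >= a[i]:
                    break
                j = skips_b[j]
            j += 1
    return answer, comparisons


t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
t2 = [47]
plain = count_comparisons(t1, t2)
skipped = count_comparisons(t1, t2, skips_a=build_skips(t1, math.isqrt(len(t1))))
print(plain, skipped)  # ([47], 11) ([47], 6)
```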
