2. The merge
For query (T2 AND T3)
(Step 1): Maintain a pointer into each posting list and walk through the lists simultaneously.
(Step 2): At each step, compare the DocIDs under the two pointers.
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 13 21 34
Sec. 1.3
3. The merge
For query (T2 AND T3)
(Step 2a): If both DocIDs are the same, put the DocID in the merge list and advance both pointers.
(Step 2b): Otherwise, advance the pointer on the smaller DocID. Here the DocID in the T3 posting list is smaller, so the T3 pointer is advanced.
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 13 21 34
Merge list: 2
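Steps 1, 2a and 2b can be sketched in Python as follows. This is a minimal sketch, assuming each posting list is an in-memory Python list of docIDs sorted in increasing order; the function name intersect is illustrative and not taken from the slides.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Intersect two posting lists sorted by docID (Steps 1, 2a, 2b)."""
    answer = []
    i = j = 0  # Step 1: one pointer per posting list
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Step 2a: same DocID, so record it and advance both pointers
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Step 2b: advance the pointer that sits on the smaller DocID
            i += 1
        else:
            j += 1
    return answer


T2 = [2, 4, 8, 16, 32, 64, 128]
T3 = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(T2, T3))  # [2, 8]
```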
4. The merge
For query (T2 AND T3)
What will be the next step?
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 13 21 34
Merge list: 2
5. The merge
Walk through the two postings
simultaneously, in time linear in the total
number of postings entries
For query (T2 AND T3)
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 13 21 34
Merge list: 2 8
7. Merging Posting Lists
If the posting list lengths are x and y, the merge
takes O(x+y) operations.
The complexity of querying is O(n).
What is n here?
Crucial: postings sorted by docID.
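A hypothetical instrumented version of the same walk makes the O(x+y) bound concrete: every loop iteration performs one docID comparison and advances at least one pointer, so the number of iterations cannot exceed x + y. The sketch assumes in-memory sorted Python lists; intersect_counting is an illustrative name.

```python
def intersect_counting(p1: list[int], p2: list[int]) -> tuple[list[int], int]:
    """Merge-intersect two sorted posting lists and count docID comparisons."""
    answer, comparisons = [], 0
    i = j = 0
    while i < len(p1) and j < len(p2):
        comparisons += 1  # one three-way comparison of p1[i] and p2[j]
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
        # every iteration advanced at least one pointer, so comparisons <= x + y
    return answer, comparisons


T2 = [2, 4, 8, 16, 32, 64, 128]
T3 = [1, 2, 3, 5, 8, 13, 21, 34]
result, n_cmp = intersect_counting(T2, T3)
print(result, n_cmp, len(T2) + len(T3))  # [2, 8] 11 15
```

On the T2 and T3 lists from the merge example (x = 7, y = 8) it returns [2, 8] after 11 comparisons. Fed unsorted lists such as [8, 2] and [2, 8], the same loop returns [8] and silently misses docID 2, which is why the postings must be sorted by docID.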
8. Query optimization
What is the best order for query
processing?
Consider a query that is an AND of n
terms.
For each of the n terms, get its postings,
then AND them together.
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 16 21 34
T4: 13 16
Query: T3 AND T2 AND T4
9. Query optimization example
Process in order of increasing frequency:
start with the smallest set, then keep cutting
further.
Execute the query as (T4 AND T2) AND T3.
T2: 2 4 8 16 32 64 128
T3: 1 2 3 5 8 16 21 34
T4: 13 16 21
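The heuristic of processing terms in order of increasing frequency can be sketched as below. It assumes a term's frequency is the length of its posting list (its document frequency) and reuses a plain merge-based intersect; both function names are hypothetical.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Plain merge of two docID-sorted posting lists."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def and_query(postings: dict[str, list[int]], terms: list[str]) -> tuple[list[str], list[int]]:
    """AND the terms, starting from the shortest posting list (lowest frequency)."""
    order = sorted(terms, key=lambda t: len(postings[t]))
    result = postings[order[0]]
    for term in order[1:]:
        if not result:  # nothing left to cut: the answer is empty
            break
        result = intersect(result, postings[term])
    return order, result


POSTINGS = {
    "T2": [2, 4, 8, 16, 32, 64, 128],
    "T3": [1, 2, 3, 5, 8, 16, 21, 34],
    "T4": [13, 16, 21],
}
print(and_query(POSTINGS, ["T3", "T2", "T4"]))  # (['T4', 'T2', 'T3'], [16])
```

On the lists above it orders the terms T4, T2, T3 and returns [16]. Stopping as soon as the running result is empty is an optional extra, not something the slide prescribes.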
11. Exercise
Recommend a query processing order for
(tangerine OR trees) AND
(marmalade OR skies) AND
(kaleidoscope OR eyes)
given these term frequencies:
Term           Freq
eyes           213312
kaleidoscope   87009
marmalade      107913
skies          271658
tangerine      46653
trees          316812
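One way to approach the exercise, sketched under the assumption that the size of an OR clause is estimated conservatively by the sum of its terms' frequencies (an upper bound, since |A OR B| ≤ |A| + |B|): estimate each clause, then process the clauses in increasing order of that estimate. The names FREQ, QUERY, estimated_or_size and processing_order are hypothetical.

```python
FREQ = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}
# (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
QUERY = [("tangerine", "trees"), ("marmalade", "skies"), ("kaleidoscope", "eyes")]


def estimated_or_size(clause: tuple[str, ...]) -> int:
    """Upper bound on the OR's result size: |A OR B| <= |A| + |B|."""
    return sum(FREQ[term] for term in clause)


def processing_order(query: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Process the OR clauses in increasing order of estimated size."""
    return sorted(query, key=estimated_or_size)


for clause in processing_order(QUERY):
    print(" OR ".join(clause), estimated_or_size(clause))
```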
13. Recall basic merge
Walk through the two postings
simultaneously, in time linear in the total
number of postings entries
T2: 2 4 8 41 48 64 128
T3: 1 2 3 8 11 17 21 31
Merge list: 2 8
If the list lengths are x and y, the merge takes O(x+y)
operations.
Can we do better?
Yes (if the index isn’t changing too fast).
14. Augment postings with skip pointers
(at indexing time)
Why?
To skip postings that will not figure in the
search results.
How?
Where do we place skip pointers?
Upper list: 2 4 8 41 48 64 128 (skip pointers 2 → 41 and 41 → 128)
Lower list: 1 2 3 8 11 17 21 31 (skip pointers 1 → 11 and 11 → 31)
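A minimal sketch of how the skip pointers in the figure might be stored at indexing time: each skip pointer maps a position in the list to the position it jumps to. The class name SkipPostings and the position-based representation are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SkipPostings:
    """A docID-sorted posting list plus skip pointers added at indexing time."""
    docids: list[int]
    # skip pointer: position in docids -> position it jumps to
    skips: dict[int, int] = field(default_factory=dict)

    def skip_pairs(self) -> list[tuple[int, int]]:
        """Skip pointers expressed as (from docID, to docID)."""
        return [(self.docids[a], self.docids[b]) for a, b in sorted(self.skips.items())]


# Positions read off the figure: 2 -> 41 -> 128 and 1 -> 11 -> 31
upper = SkipPostings([2, 4, 8, 41, 48, 64, 128], {0: 3, 3: 6})
lower = SkipPostings([1, 2, 3, 8, 11, 17, 21, 31], {0: 4, 4: 7})
print(upper.skip_pairs())  # [(2, 41), (41, 128)]
print(lower.skip_pairs())  # [(1, 11), (11, 31)]
```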
15. Query processing with skip pointers
Upper list: 2 4 8 41 48 64 128 (skip pointers 2 → 41 and 41 → 128)
Lower list: 1 2 3 8 11 17 21 31 (skip pointers 1 → 11 and 11 → 31)
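Suppose we’ve stepped through the lists until we process 8
on each list. We match it and advance.
We then have 41 on the upper list and 11 on the lower list. 11 is smaller.
But the skip successor of 11 on the lower list is 31, which is still smaller than 41, so we can skip ahead past the intervening postings 17 and 21.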
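The walk on this slide is the usual merge, except that when the pointer on the smaller DocID has a skip pointer whose target is still no larger than the other DocID, it follows the skip instead of advancing one posting. A minimal sketch, assuming skip pointers are given as position-to-position dictionaries; intersect_with_skips and advance are hypothetical names.

```python
def advance(p: list[int], skips: dict[int, int], k: int, other: int,
            taken: list[tuple[int, int]]) -> int:
    """Move past p[k] (known to be < other), following skips that do not overshoot."""
    if k in skips and p[skips[k]] <= other:
        while k in skips and p[skips[k]] <= other:
            taken.append((p[k], p[skips[k]]))  # record the skip for inspection
            k = skips[k]
        return k
    return k + 1  # no usable skip pointer: ordinary one-step advance


def intersect_with_skips(p1: list[int], s1: dict[int, int],
                         p2: list[int], s2: dict[int, int]):
    """Intersect two sorted posting lists that carry skip pointers s1 and s2."""
    answer: list[int] = []
    taken: list[tuple[int, int]] = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i = advance(p1, s1, i, p2[j], taken)
        else:
            j = advance(p2, s2, j, p1[i], taken)
    return answer, taken


upper, upper_skips = [2, 4, 8, 41, 48, 64, 128], {0: 3, 3: 6}
lower, lower_skips = [1, 2, 3, 8, 11, 17, 21, 31], {0: 4, 4: 7}
print(intersect_with_skips(upper, upper_skips, lower, lower_skips))
# ([2, 8], [(11, 31)])
```

On the two lists from the figure it returns [2, 8] and records a single skip, 11 → 31, taken while the upper pointer sits on 41.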
16. Where do we place skips?
Tradeoff:
More skips → shorter skip spans → more likely to skip. But lots of comparisons to skip pointers.
Fewer skips → fewer pointer comparisons, but then long skip spans → few successful skips.
17. Where do we place skips?
A simple heuristic for placing skips, which has been found to work well in practice, is that for a postings list of length p, use √p evenly-spaced skip pointers.
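The √p heuristic can be sketched as below. The slide does not say how to handle a length that is not a perfect square or where the final pointer lands, so this sketch assumes a span of ⌊√p⌋ positions and clips the last pointer to the final posting; place_skips is a hypothetical name.

```python
import math


def place_skips(p: int) -> dict[int, int]:
    """Evenly spaced skip pointers for a posting list of length p.

    Uses a span of floor(sqrt(p)) positions, giving roughly sqrt(p) pointers;
    the last pointer is clipped to the final posting.
    """
    span = max(1, math.isqrt(p))
    skips = {}
    for start in range(0, p, span):
        target = min(start + span, p - 1)
        if target > start:  # a pointer to itself would be useless
            skips[start] = target
    return skips


t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
skips = place_skips(len(t1))
print(skips)  # {0: 4, 4: 8, 8: 12, 12: 15}
print([(t1[a], t1[b]) for a, b in skips.items()])
# [(4, 14), (14, 22), (22, 120), (120, 180)]
```

For p = 16 this yields four pointers, which on t1 from the quiz below correspond to 4 → 14, 14 → 22, 22 → 120 and 120 → 180.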
18. Quiz
We have a two-word query.
t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
t2 = [47]
Work out how many comparisons would be done to
intersect the two postings lists with the following two
strategies. Briefly justify your answers:
a) Using standard postings lists
b) Using postings lists stored with skip pointers, with a skip length of √p, where p is the length of the list
19. Solution
a) The number of comparisons would be
11, as shown in the following:
(4,47), (6,47), (10,47), (12,47), (14,47), (16,47), (18,47), (20,47),
(22,47), (32,47),(47,47)
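b) Since the skip length is √16 = 4, the skip pointers on t1 would be 4 → 14, 14 → 22, 22 → 120, 120 → 180.
The skips 4 → 14 and 14 → 22 are taken because 14 and 22 are ≤ 47; the skip 22 → 120 is not, because 120 > 47, so we step through 32 and then reach 47. The number of comparisons would be 6:
(4,47), (14,47), (22,47), (120,47), (32,47), (47,47).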
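Both answers can be checked mechanically. The sketch below logs every (docID from t1, docID from t2) pair the intersection compares, including comparisons against skip targets, and counts each distinct pair once, which is the accounting the listed pairs follow. A loop that counted every individual test would report more for b), because (22,47) and (120,47) are each tested twice. The skip placement (span ⌊√p⌋, last pointer clipped to the final posting) and the function names skip_table and compared_pairs are assumptions.

```python
import math


def skip_table(p: int) -> dict[int, int]:
    """Skip pointers every floor(sqrt(p)) positions, last one clipped to the end."""
    span = max(1, math.isqrt(p))
    table = {}
    for start in range(0, p, span):
        target = min(start + span, p - 1)
        if target > start:
            table[start] = target
    return table


def compared_pairs(p1: list[int], p2: list[int], use_skips: bool):
    """Intersect p1 and p2; return (answer, distinct compared pairs in first-seen order).

    Pairs are (docID from p1, docID from p2); comparisons against a skip
    target are recorded too.
    """
    s1 = skip_table(len(p1)) if use_skips else {}
    s2 = skip_table(len(p2)) if use_skips else {}
    seen: dict[tuple[int, int], None] = {}  # dict keeps insertion order
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        seen[(p1[i], p2[j])] = None
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            jumped = False
            while i in s1:
                seen[(p1[s1[i]], p2[j])] = None  # compare the skip target
                if p1[s1[i]] > p2[j]:
                    break  # skipping would overshoot
                i, jumped = s1[i], True
            if not jumped:
                i += 1
        else:
            jumped = False
            while j in s2:
                seen[(p1[i], p2[s2[j]])] = None
                if p2[s2[j]] > p1[i]:
                    break
                j, jumped = s2[j], True
            if not jumped:
                j += 1
    return answer, list(seen)


t1 = [4, 6, 10, 12, 14, 16, 18, 20, 22, 32, 47, 81, 120, 122, 157, 180]
t2 = [47]
for use_skips in (False, True):
    answer, pairs = compared_pairs(t1, t2, use_skips)
    print(use_skips, answer, len(pairs), pairs)
```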