1. List Intersection for Web Search: Algorithms, Cost Models, and Optimizations
Sunghwan Kim (POSTECH), Taesung Lee (IBM Research AI),
Seung-won Hwang (Yonsei University), Sameh Elnikety (Microsoft Research)
VLDB 2019
2. List Intersection in Web Search
Multi-word query in web search engine: list intersection of posting lists.
Corpus: document → word list. Posting list: word → document list.
ℐ(𝑥) = posting list of word 𝑥.

Doc ID  Text
105     … research, so to generate the optimal query plan for the given scenario, as commonly used in the database systems …
…       …
592     … My research interests are in database system and data-driven intelligence, …
…       …

ℐ(𝑥)          document IDs
database      … 105 … 592 842 …
system        … 105 … 592 751 …
research      … 105 … 592 642 …
data-driven   … 321 … 592 632 …
intelligence  … 256 … 592 925 …
3-5. List Intersection in Web Search
Multi-word query in web search engine: list intersection of posting lists.
ℐ(𝑥) = inverted list of word 𝑥.

Q. “database system research”
A. ℐ(“database”) ∩ ℐ(“system”) ∩ ℐ(“research”)

ℐ(𝑥)       document IDs
database   … 105 … 592 842 …
system     … 105 … 592 751 …
research   … 105 … 592 642 …
∩          … 105 … 592 …
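The intersection above can be computed by a scan-based merge over the sorted posting lists. A minimal sketch, using the toy posting lists from the slide (the function names are illustrative, not from the paper):

```python
from functools import reduce

def intersect2(a, b):
    """Scan-based (merge) intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:            # common document ID found
            out.append(a[i])
            i += 1
            j += 1
    return out

def intersect(*lists):
    """k-way intersection by folding the 2-way merge."""
    return reduce(intersect2, lists)

posting = {  # toy posting lists from the slide
    "database": [105, 592, 842],
    "system":   [105, 592, 751],
    "research": [105, 592, 642],
}
result = intersect(*(posting[w] for w in ("database", "system", "research")))
```

For the query “database system research” this yields the document IDs 105 and 592, matching the table above.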
22. Challenge for Optimization
Scenario #1. length ratio 1:1 (|A| = 1M, |B| = 1M)
$(Scan-based) < $(Search-based)
Scenario #2. length ratio 1:1000 (|A| = 1K, |B| = 1M)
$(Scan-based) > $(Search-based)
No method wins in every scenario → query optimization is required.
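The two scenarios can be made concrete with back-of-the-envelope comparison counts. These formulas are rough illustrations, not the paper's cost model: a merge scans both lists, while a search-based method does roughly one logarithmic probe into the longer list per element of the shorter list.

```python
import math

def scan_comparisons(n_a, n_b):
    """Rough comparison count of a scan-based (merge) intersection."""
    return n_a + n_b

def search_comparisons(n_a, n_b):
    """Rough comparison count of a search-based intersection: a
    binary-search-like probe into the longer list for each element of
    the shorter list (constant factors here are illustrative)."""
    short, long_ = min(n_a, n_b), max(n_a, n_b)
    return short * (1 + 2 * math.ceil(math.log2(long_ / short + 1)))

# Scenario #1 (1M : 1M): scanning wins.
# Scenario #2 (1K : 1M): searching wins.
```

Even under this crude model, the winner flips with the length ratio, which is exactly why a query optimizer is needed.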
24-31. Complexity of Cost Estimation
■ Cost of a comparison is not uniform.
■ Modern architectures pipeline the execution of instructions.
■ A branch can block the pipeline.
– The CPU predicts the result of the branch and fills the pipeline speculatively.
– Failure: 10-40 cycles of penalty.
■ Branch misprediction
[Figure: a 5-stage pipeline (Fetch, Decode, Execute, Memory, Write Back) over cycles 𝑡1-𝑡5, executing instructions 6-14 in parallel; when instruction 10 is a branch (JEQ 6), the next instruction (6 or 11?) is unknown, so the CPU predicts it, and on a wrong prediction all speculative pipeline work is lost.]
32. Complexity of Cost Estimation
■ Cost of a comparison is not uniform.
■ Modern architectures pipeline the execution of instructions.
■ A branch can block the pipeline.
– The CPU predicts the result of the branch.
– Failure (branch misprediction): 10-40 cycles of penalty.
■ Cache/TLB misses are expensive.
– From 12 to 200+ cycles of latency.

Cache miss penalties in Intel Skylake:
Access result       Penalty
L1 hit              4 cycles
L1 miss + L2 hit    12 cycles
L2 miss + L3 hit    42 cycles
L3 miss             42 cycles + RAM latency (200+ cycles)
33. Motivations
There is no single winner in all scenarios.
Optimization based on the number of comparisons is not always optimal.
The cost of a comparison is not always the same on modern architectures.
42. Cost-based Query Optimizer
Cost Optimizer
■ Suggests a cost-optimal execution plan for a given query.
Challenges
■ The cost of optimization should be negligible.
■ Thus, it requires a lightweight cost model with high accuracy.
43. Cost Model
Cost model
■ Estimates the cost of each algorithm for a given input.
■ Considers input properties:
– lengths and correlations.
[Figure: example input properties: lengths |𝐴| = 8M and |𝐵| = 16M, correlation |𝐴 ∩ 𝐵| = 4M.]
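The two input properties can be computed directly from the lists. A minimal sketch, where the containment-based correlation is one illustrative choice, not necessarily the measure used in the paper:

```python
def input_properties(a, b):
    """Lengths and an overlap-based correlation of two ID lists.
    Correlation is measured here as containment of the shorter list,
    so |A| = 8M, |B| = 16M, |A ∩ B| = 4M would give correlation 0.5.
    (Illustrative definition, not necessarily the paper's measure.)"""
    overlap = len(set(a) & set(b))
    return len(a), len(b), overlap / min(len(a), len(b))
```

In practice a search engine would estimate these properties from statistics rather than materializing the intersection.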
44. Cost Model
Cost model
■ Estimates the cost of each algorithm for a given input.
■ Considers input properties:
– lengths and correlations.
Challenges
■ Cost depends on hardware properties.
– e.g. cache efficiency, branch misprediction.
■ Analysis of an algorithm is complex.
[Figure: architectures (e.g. Intel Xeon) and algorithms feed the cost model, which combines a unit cost vector with event counts.]
46-49. Cost Model
Procedures
1. Identify expensive events.
– Loops, branches, or memory accesses.
2. Parametrize architecture properties.
– Unit cost: the cost of one execution of an event.
3. Model a function of event counts.
– e.g. # iterations, # mispredictions.
– Computed in real time as 𝑓(𝑞𝑢𝑒𝑟𝑦).
[Figure: the cost model pairs a per-architecture unit cost vector (step 2) with per-query event counts 𝑓(𝑞𝑢𝑒𝑟𝑦) (step 3) over the events identified in step 1.]
50-52. Step 1. Event Identification
Event identification
■ Total cost = sum of event costs.
■ Cost classification (2 levels)
– 1st level: base latency, branch misprediction, memory overhead.
– 2nd level: identified events expected to affect the entire cost.
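The decomposition "total cost = sum of event costs" amounts to a dot product between a per-architecture unit cost vector and a per-query event count vector. A sketch with hypothetical event names and cycle counts (the values are illustrative, not measured):

```python
def total_cost(unit_costs, event_counts):
    """Total cost as Σ unit_cost(e) · count(e) over identified events."""
    return sum(unit_costs[e] * n for e, n in event_counts.items())

# hypothetical unit costs (cycles per event) for one architecture
unit_costs = {"iteration": 4.0, "misprediction": 20.0, "cache_miss": 42.0}
# hypothetical event counts for one query
events = {"iteration": 1000, "misprediction": 50, "cache_miss": 10}
```

The unit costs depend only on the machine, while the event counts depend only on the query, which is what makes the model both portable and cheap to evaluate at query time.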
53. Step 2. Parametrize Unit Cost
Parametrize unit cost
■ Learn on the target machine by using a synthetic test set.
■ Use a gradient descent solver.
[Figure: the unit cost vector of each architecture (e.g. Intel Xeon) is learned from the test set.]
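Learning the unit cost vector from a synthetic test set can be sketched as a least-squares fit by gradient descent: run the test set, record each run's event counts and measured cycles, then solve counts · c ≈ cycles. The data below is synthetic and the solver details are an illustration; the paper's setup may differ.

```python
import numpy as np

def fit_unit_costs(counts, cycles, lr=0.02, steps=2000):
    """Gradient descent on the squared error ||counts @ c - cycles||^2,
    returning the fitted unit cost vector c."""
    c = np.zeros(counts.shape[1])
    for _ in range(steps):
        grad = counts.T @ (counts @ c - cycles) / len(cycles)
        c -= lr * grad
    return c

# synthetic test set: rows are runs, columns are event counts per run
counts = np.array([[10.0, 2.0], [3.0, 7.0], [8.0, 1.0]])
true_c = np.array([1.5, 20.0])   # "ground truth" unit costs (cycles)
cycles = counts @ true_c         # measured cost of each run
```

With enough varied runs the fit recovers one unit cost per event type for the machine at hand.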
54-56. Step 3. Event Count Estimation
■ Understanding the displacement (𝜹) distribution is key to estimating the event counts.
– Displacement: the distance the cursor moved.
[Figure: searching for pivot 25 in the list … 13 14 19 20 21 23 26 28 …; the displacement is 6 for both a 2-Merge search and a 2-Gallop search. For the 2-Gallop search: # comparisons = 5, # references = 6, # cache misses = 0, # MISP = 2.]
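The 2-Gallop search in the figure can be sketched as follows. For the slide's example (pivot 25 against … 13 14 19 20 21 23 26 28 …) it lands 6 positions ahead with 5 comparisons, matching the displacement and comparison counts shown; the event instrumentation here is an illustration, not the paper's exact accounting.

```python
def gallop_search(lst, start, pivot):
    """Galloping search: find the first index >= start with
    lst[idx] >= pivot, assuming lst[start] < pivot (the cursor has not
    yet passed the pivot). Doubles the step until it overshoots, then
    binary-searches the last bracket. Returns (index, comparison count)."""
    comparisons = 0
    lo, step = start, 1
    while lo + step < len(lst):          # gallop phase
        comparisons += 1
        if lst[lo + step] < pivot:
            lo, step = lo + step, step * 2
        else:
            break
    # binary phase over the bracket (lo, lo + step]
    b_lo, b_hi = lo + 1, min(lo + step + 1, len(lst))
    while b_lo < b_hi:
        mid = (b_lo + b_hi) // 2
        comparisons += 1
        if lst[mid] < pivot:
            b_lo = mid + 1
        else:
            b_hi = mid
    return b_lo, comparisons
```

Given the displacement 𝜹 of a search, counts like these can be written as functions of 𝜹, which is why the displacement distribution drives the event count estimates.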
57-58. Step 3. Event Count Estimation
■ Understanding the displacement (𝜹) distribution is key to estimating the event counts.
– e.g. memory reference count of 2-Gallop: 2|𝐿1| · Σ_{𝛿=0..|𝐿2|} 𝑃(𝐷 = 𝛿) · (1 + log2 𝛿)
■ Model the displacement distribution by using
– the negative hypergeometric (NHG) distribution and a Markov chain.
[Figure: displacement distribution (2-way, |𝐴| : |𝐵| = 1:1), alongside the 2-Gallop search example (pivot 25, displacement = 6, # comparisons = 5, # references = 6, # cache misses = 0, # MISP = 2).]
59. Experimental Settings
Machine
■ SandyBridge i7-3820 3.60GHz
■ IvyBridge i7-3770K 2.93GHz
Synthetic Dataset
■ Number of lists: 2 to 4
■ Length ratio (min:max): 1:1 to 1:1024
■ Correlation: 0 to 1
Set of Algorithms in Optimizer
■ 2-way
– 2-Merge, 2-Gallop, 2-SIMD, STL
– SIMDV1, SIMDV3, SIMD Gallop [1]
– SIMD Inoue [2]
■ k-way: k-Merge, k-Gallop
■ Build a cost model for each algorithm.

[1] D. Lemire et al. SIMD compression and the intersection of sorted integers. Software: Practice and Experience, 2016.
[2] H. Inoue et al. Faster set intersection with SIMD instructions by reducing branch mispredictions. VLDB 2014.
60. Accuracy of the Cost Model - 2-way and k-way
[Figures: accuracy of the 2-way algorithm cost models (left) and of the k-way algorithm cost models (right).]
61. Accuracy of the Cost Model - Adaptive to machines
[Figures: accuracy of the 2-way algorithm cost model on SandyBridge (left) and on IvyBridge (right).]
62. Optimizer Efficiency
Comparison of representative algorithms on the four-list synthetic dataset.
(Bold: best of all; italic: best among single algorithms)
63. Conclusion
■ Cost analysis and estimation should take into account the properties of the architecture.
■ Probabilistic analysis of an algorithm's operations yields more accurate cost estimates.
■ Based on these two observations, we propose a cost-based optimizer equipped with a lightweight and accurate cost model.
66. Top-down Analysis
■ Hardware profiling method.
■ Finds bottlenecks/hotspots.
■ Cost decomposition into 4 major cost factors.
■ Based on the unit utilization at each cycle.

Backend ↓ \ Frontend →   Stall            Fully Utilized
Stall                    Bad speculation  Backend bound
Fully Utilized           Frontend bound   Retiring (no overhead)
67. Step 1. Event Identification
■ Cycle accounting (e.g. top-down analysis)
– hard to identify the causes of cost, thus hard to use for cost estimation.
■ We resolve this mismatch by using our cause-based pivoting.
69. Identify Expensive Events - Example: 2-Merge
2-Merge
■ Base latency (α)
■ Branch misprediction (β)
– Significant at low length ratios.
■ Memory stalls (γ)
– Marginal (due to sequential scan).
[Figure: the 2-Merge loop annotated with base latency 𝛼 and the three branches 𝛽: >, <, =.]
70. Probabilistic Analysis of Algorithms - 2-way displacement analysis
𝛿 can be modeled by the negative hypergeometric (NHG) distribution.
NHG(k; N, K, r) = probability that the k-th “success” shows up after r “failures”,
drawing from a population of N with K successes and N − K failures.
e.g. NHG(4; 8, 4, 2) = probability that the LA Dodgers beat the Milwaukee Brewers 4-2 in the NLCS
(assumption: the win distribution is uniform).
[Figure: elements of A are “failures” and elements of B − A are “successes”; the displacements 𝛿 of successive searches for x, y, and z.]
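The NHG probability in the example can be computed directly. A sketch under the slide's uniform-win assumption; note that a best-of-seven series stops at the 4th win, which is exactly the "k-th success right after r failures" event:

```python
from math import comb

def nhg_pmf(k, N, K, r):
    """P[the k-th success appears immediately after exactly r failures]
    when drawing without replacement from N items containing K successes."""
    if k > K or r > N - K or k + r > N:
        return 0.0
    # first k+r-1 draws hold k-1 successes and r failures ...
    p_prefix = comb(K, k - 1) * comb(N - K, r) / comb(N, k + r - 1)
    # ... and the next draw is a success
    p_next = (K - (k - 1)) / (N - (k + r - 1))
    return p_prefix * p_next
```

For the slide's example, nhg_pmf(4, 8, 4, 2) evaluates to 1/7, and the probabilities over r = 0..4 sum to 1, as expected for a distribution over the number of failures before the 4th success.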
71. Probabilistic Analysis of Algorithms - k-way displacement analysis
Displacement 𝛿𝑖 is affected by each of the k searches:
𝛿𝑖 = Σ_{𝑗=1..𝑘} 𝛿𝑖𝑗
Decompose a displacement 𝛿𝑖 into several subdisplacements 𝛿𝑖𝑗.
Model each 𝛿𝑖𝑗 through a Markov chain:
if the search succeeds, 𝑣𝑗−1 = 𝑣𝑗 and 𝛿𝑖𝑗 = 0;
otherwise, 𝑣𝑗−1 ≠ 𝑣𝑗 and 𝛿𝑖𝑗 follows the NHG
(similar to the 2-way case).
Editor's Notes
Hi, I’m SungHwan from POSTECH Korea, and it is a great honor to present my work at VLDB.
Today, I’ll talk about cost-based optimization for list intersection in modern architecture.
Web search engines access the web corpus using posting lists and use intersection algorithms to process multi-keyword queries.
Each posting list contains the IDs of the documents that contain the keyword assigned to it.
For example, the posting list for the keyword ‘database’ contains the IDs of the documents whose text contains the word ‘database’.
So when a multi-keyword query comes in,
the search engine computes the query by
intersecting posting lists corresponding to the given query keywords.
The list intersection can be computed by a sequence of searches, like searching for every element of A in B.
For example, we search for the first value of A, 25, in the other list, and if the search fails,
we discard all of the preceding elements,
and we move on to the next element, 37.
If the value is found in the other list, we add it to the result,
and we repeat again and again until any cursor reaches the end of its list.
So the main algorithm design issue is how we search for an element in an array list.
There are two main categories of list intersection algorithms;
(clk)
the first is scan-based approaches that scan through the elements one by one,
and the other is search-based approaches that try to jump over a few elements to reduce the number of comparisons.
Regardless of which algorithm we use, the distance the cursor moves will not change for a specific search.
In this case, the distances are six in both searches.
We call this distance the displacement; it is the quantity that we want to analyze, which will be covered later.
It might sound like a search-based algorithm skipping some elements should be clearly better than a scan-based algorithm going through all items, but the problem is not that simple.
It means that there’s no single winner among intersection algorithms in all scenarios.
For example, in scenario #1 with one million items in the two lists, the scan-based intersection approach has a lower cost than a search-based approach in general.
On the other hand, in scenario #2 with only one thousand items in one list, and one million items in the other, a search-based approach is significantly faster than a scan-based algorithm.
So to propose an optimal plan for computing a list intersection, we should understand the cost of each algorithm for the given input.
However, the cost estimation is also challenging, because the cost of a comparison is not uniform, so we cannot estimate the cost of a list intersection by estimating the number of comparisons.
That is because modern architectures pipeline the execution of instructions.
For example, in a 5-stage pipelined architecture, an instruction is executed across 5 cycles in five stages,
and in each cycle, the architecture tries to fully utilize its units by parallelizing instruction execution.
So at time 1, not only is instruction 10 fetched, but the prior instructions 6 to 9 are already in the other pipeline stages,
and similarly the architecture tries to fill the pipeline in every cycle.
This works ideally if all the pipeline stages complete within a cycle and we have a fixed sequence of instructions to execute.
However, if there is a conditional branch, the problem is much more complicated.
For example, if instruction 10 is a branch operation, then what is the next instruction after 10?
It is unclear before the branch is resolved; however, like an evil CEO, the modern architecture does not let its workers idle,
which means that it predicts the result of the branch and applies the prediction to the pipeline.
So the pipeline is filled based on the predicted result.
However, a problem arises if the prediction turns out to be wrong:
in this case, we lose all of the pipelined work and pay a penalty of around 10 to 40 cycles.
We call this situation a branch misprediction and this penalty the branch misprediction penalty.
The other overhead that breaks the scalability of the architecture is memory overhead, mainly caused by cache or TLB misses.
For example, on the Skylake architecture, we can access memory in only 4 cycles in the best case; however, in the worst case, we spend several hundred cycles.
So we identify two challenges as follows: the first is that no algorithm wins in all scenarios.
The second is that the cost of a comparison is not always the same.
So the computation plan should be carefully selected by considering the properties of the given query and the server machine.
The role of the query optimizer is to suggest a plan for computing a given query.
For example, for the query “database systems”, the optimizer suggests a plan to compute the query with
the fastest algorithm for the given input, like 2-Merge,
and in some other scenario it can suggest using another algorithm.
Rather than working on a specific architecture with a fixed set of algorithms, we have two main objectives:
the first is to propose an optimization method that can adapt to different architectures, like state-of-the-art AMD or Intel chips;
the second is to propose a general approach to algorithm analysis for cost estimation that is applicable to future algorithms.
So the objective of the cost optimizer can be formally defined as providing a cost-optimal execution plan for the given query.
The main challenge is that the run-time cost of the optimizer should be negligible compared to the benefit from the optimization, so it requires a highly accurate cost model with little computational overhead.
We develop a lightweight cost model that takes lists as input and returns the expected cost by considering two properties: one is the lengths of lists and the other is the correlations between lists.
To build high accuracy cost model,
we need to model both hardware and algorithm properties.
(Help the audience understand that the unit cost is some kind of vector, and so on.)
So we will introduce the procedure of generating cost model.
The first one is to identify expensive events from the algorithm implementation,
such as loops, branches or memory references.
And then, parametrize architecture properties into the unit cost which represents the cost of an event execution.
Then we model the event count function, which accepts a query and returns the vector of event counts, such as the number of iterations of a loop or the number of mispredictions of a specific branch.
As the event counts rely on the input properties, the event count vector is computed at query time.
We call this framework the cost model.
So we formulate the model by decomposing the total cost into the sum of event costs.
In detail, we first divide the cost of an application into three important factors: base latency, and the two overheads of misprediction and memory access.
// Branch misprediction and memory overhead should be covered earlier.
Then, we further classify the cost into events according to which are expected to affect the entire cost significantly.
And we parametrize the unit cost of each event on the given machine by learning from the synthetic test set.
To reduce error, we adopt a gradient descent solver in the learning stage.
The next step is to compute the counts of the identified events by understanding the behavior of the algorithms.
As mentioned earlier, there are several algorithms, but even if the algorithms differ, the displacement, the distance of a cursor move, is the same.
We nominate this displacement as the key feature of event count analysis.
For example, for a given displacement of a single search, we can calculate the event counts of the search based on the displacement,
and we can further formulate a function of each count corresponding to the displacement.
Then we can formulate the estimated event counts by understanding the displacement distribution.
For example, we can compute the total memory reference count by using the distribution and the function of memory references corresponding to the displacement.
And we formulate the distribution by using the negative hypergeometric distribution and a Markov chain.
This graph shows an example of the displacement distribution for two equal-length lists.
So the average displacement is 1 in this case, but in 50% of the searches the cursor does not move forward to the next value.
See our paper for further details about the modeling.
What is the difference between the left and right graphs? If we are not going to explain it, it might be better not to show them at all.
We must not give the misimpression that the right one is continuous.
If we show them, highlight and explain only the difference we want to show; emphasize only the points being explained.
Gyeongjae: distinguish the graphs by red and blue? The two graphs have the same shape but different scales.. ??
Next, we demonstrate our work.
As our goal is to show the efficiency, effectiveness, and adaptiveness of our method, we test it in a wide range of scenarios with various state-of-the-art algorithms on two machines.
We first introduce the accuracy of the cost model.
As we can see, the estimated cost closely follows the actual cost over a wide range of length ratios.
This shows that we successfully modeled most of the hardware-related features.
Also, our approach can adapt to different machines very well.
The left is the result on the SandyBridge machine, and the right is the result on the IvyBridge machine.
We also demonstrate that our cost-based optimizer is clearly better than using any single algorithm in every setting.
So the key messages are that cost analysis should reflect the characteristics of the architecture, such as branch misprediction and memory references, and that
analyzing the behavior of the algorithm can provide much more accurate cost estimates.
Based on these observations, we propose a new cost-based optimizer equipped with a lightweight and accurate cost model.
Thank you very much for listening to my presentation, and please visit my poster session today for further discussion.
Thank you!
The analysis of the computation is very complex.
The state-of-the-art analysis method, named top-down analysis, is suitable for reverse engineering a computation, such as finding bottlenecks, but is not suitable for cost estimation.
Thus, we translate this method into our new event-based language.
New cost classifications are required to resolve this mismatch, and we introduce a new kind of cost factor by using cause-based pivoting.
So we first classify the cost into base latency, branch misprediction, and memory overhead.
In our work, for each algorithm, we first analyze the cost of the algorithm,
then find the cost parameters of the algorithm.
With this definition, we can get the distribution from the NHG model.
The NHG model provides the probability of the k-th “success” appearing after r “failures”.
So we can compute the probability that SK beats Doosan in the same way.
Next, the distribution in the k-way computation is much more complex, because we may need to accumulate results across the whole round-robin process.
To solve this problem, we divide the displacement into k subdisplacements and solve each subproblem by using the Markov chain.
The Markov chain provides the probabilities of a successful and an unsuccessful search; for an unsuccessful search, we can get the distribution by NHG, and otherwise the subdisplacement is 0.