LIST INTERSECTION FOR WEB SEARCH: ALGORITHMS, COST MODELS, AND OPTIMIZATIONS
Sunghwan Kim (POSTECH), Taesung Lee (IBM Research AI),
Seung-won Hwang (Yonsei University), Sameh Elnikety (Microsoft Research)
VLDB 2019
List Intersection in Web Search
Multi-word query in a web search engine: list intersection of posting lists
■ Corpus: document → word list. Posting list: word → document list.
■ ℐ(x) = posting (inverted) list of word x

Doc id | Text
105 | … research, so to generate the optimal query plan for the given scenario, as commonly used in the database systems …
592 | … My research interests are in database system and data-driven intelligence, …
… | …

Word x | ℐ(x): document IDs
database | … 105 … 592 842 …
system | … 105 … 592 751 …
research | … 105 … 592 642 …
data-driven | … 321 … 592 632 …
intelligence | … 256 … 592 925 …
Q. “database system research”
A. ℐ(“database”) ∩ ℐ(“system”) ∩ ℐ(“research”)
Result: … 105 … 592 … (the documents containing all three words)
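The following is a minimal sketch of answering such a query. It assumes that each posting list is a Python list of document IDs in ascending order.

- The lists are intersected pairwise with a two-pointer merge, which is a scan-based method.
- Processing the shortest list first is a common heuristic that is assumed here, not stated in the talk.
- The function names are illustrative.

```python
from functools import reduce


def merge_intersect(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer merge of two ascending posting lists."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1              # advance the cursor that points at the smaller ID
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])    # the ID occurs in both lists
            i += 1
            j += 1
    return out


def answer_query(index: dict[str, list[int]], words: list[str]) -> list[int]:
    """Intersect the posting lists of all query words, shortest list first."""
    lists = sorted((index[w] for w in words), key=len)
    return reduce(merge_intersect, lists)


# Toy index holding only the document IDs visible on the slide.
index = {
    "database": [105, 592, 842],
    "system": [105, 592, 751],
    "research": [105, 592, 642],
}
print(answer_query(index, ["database", "system", "research"]))  # [105, 592]
```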
List Intersection Algorithms
To intersect, a cursor j must advance past every element smaller than the pivot (25) in the list … 13 14 19 20 21 23 26 28 … How?
■ Scan-based algorithm (e.g., Merge, SIMD)
– Compare element by element: 13<25, 14<25, …, 26>25.
■ Search-based algorithm (e.g., Binary search, Gallop)
– Probe at growing distances: 13<25, 14<25, 20<25, 28>25; then binary search between 20 and 28.
■ Either way, the cursor moves 6 elements (from 13 to 26).
displacement (𝜹): the distance the cursor moves
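The two cursor-advance strategies can be sketched as follows, assuming a sorted Python list and a pivot.

- Both functions return the index of the first element ≥ pivot and the number of comparisons made.
- The gallop probes offsets 0, 1, 3, 7, … from the cursor, which reproduces the probe order on the slide (13, 14, 20, 28).
- Exact comparison counts depend on implementation details.
- The function names are hypothetical.

```python
def scan_search(lst: list[int], start: int, pivot: int) -> tuple[int, int]:
    """Scan-based: step one element at a time from `start` until lst[j] >= pivot.
    Returns (index of first element >= pivot, number of comparisons)."""
    j, comparisons = start, 0
    while j < len(lst):
        comparisons += 1
        if lst[j] >= pivot:
            break
        j += 1
    return j, comparisons


def gallop_search(lst: list[int], start: int, pivot: int) -> tuple[int, int]:
    """Search-based: probe start, start+1, start+3, start+7, ... until an element
    >= pivot is found, then binary-search the bracketed range.
    Returns (index of first element >= pivot, number of comparisons)."""
    comparisons = 0
    lo, probe, step = start, start, 1   # invariant: lst[start:lo] are all < pivot
    while probe < len(lst):
        comparisons += 1
        if lst[probe] >= pivot:
            break
        lo = probe + 1
        probe = start + 2 * step - 1     # offsets 0, 1, 3, 7, 15, ...
        step *= 2
    hi = min(probe, len(lst))           # the answer lies in [lo, hi]
    while lo < hi:                      # binary search for the first element >= pivot
        mid = (lo + hi) // 2
        comparisons += 1
        if lst[mid] < pivot:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


lst = [13, 14, 19, 20, 21, 23, 26, 28]
print(scan_search(lst, 0, 25))    # (6, 7)
print(gallop_search(lst, 0, 25))  # (6, 6)
```

For pivot 25, both strategies stop at index 6 (element 26), so the displacement is 6. The scan makes 7 comparisons, and this gallop makes 6.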
Challenge for Optimization
Scenario #1. Length ratio 1:1 (|A| = 1M, |B| = 1M)
Cost(Scan-based) < Cost(Search-based)
Scenario #2. Length ratio 1:1000 (|A| = 1K, |B| = 1M)
Cost(Scan-based) > Cost(Search-based)
No method wins in every scenario → query optimization is required.
Complexity of Cost Estimation
■ Cost of a comparison is not uniform.
■ Modern architecture pipelines the
execution of instructions.
[Figure: five-stage instruction pipeline (Fetch, Decode, Execute, Memory, Write Back) over cycles t1–t5; consecutive instructions occupy successive stages at the same time.]
■ A branch can stall the pipeline.
[Figure: instruction 10 is a conditional jump (JEQ 6). Until it resolves, the CPU cannot know whether to fetch instruction 6 or 11, so the stages behind it would sit empty.]
– The CPU predicts the outcome of the branch.
[Figure: the CPU predicts the target of JEQ 6 and keeps fetching along the predicted path (6, 7, 8, …). A correct prediction keeps the pipeline full; a wrong one forces the speculatively fetched instructions to be discarded.]
– Failure (branch misprediction): 10-40 cycles of penalty.
■ Cache/TLB misses are expensive.
– From 12 to 200+ cycles of latency.
Cache miss penalties in Intel Skylake:
Access result | Penalty
L1 hit | 4 cycles
L1 miss + L2 hit | 12 cycles
L2 miss + L3 hit | 42 cycles
L3 miss | 42 cycles + RAM latency (200+ cycles)
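The sketch below shows how the Skylake table translates into an average memory cost. It weights each outcome's latency by the fraction of accesses that end there.

- The outcome fractions in the example are made up.
- The RAM latency is fixed at the table's 200-cycle lower bound, as an assumption.

```python
import math

# Latency in cycles per outcome, taken from the Skylake table.
# The RAM term uses the table's 200-cycle lower bound (an assumption).
LATENCY = {
    "L1 hit": 4,
    "L1 miss + L2 hit": 12,
    "L2 miss + L3 hit": 42,
    "L3 miss": 42 + 200,
}


def expected_access_cycles(fractions: dict[str, float]) -> float:
    """Average cycles per memory access, given the fraction of accesses
    ending in each outcome (the fractions must sum to 1)."""
    if not math.isclose(sum(fractions.values()), 1.0):
        raise ValueError("fractions must sum to 1")
    return sum(LATENCY[outcome] * p for outcome, p in fractions.items())


# Made-up outcome mix: 90% L1 hits, 2% of accesses go all the way to RAM.
mix = {"L1 hit": 0.90, "L1 miss + L2 hit": 0.05, "L2 miss + L3 hit": 0.03, "L3 miss": 0.02}
print(round(expected_access_cycles(mix), 2))  # 10.3
```

In this example, the 2% of accesses that reach RAM contribute 4.84 of the 10.3 cycles. A small miss rate can therefore dominate the memory cost.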
Motivations
There is no single winner in all scenarios.
Optimization based on the number of comparisons is not always optimal.
The cost of a comparison is not always the same in modern architectures.
Cost-based Query Optimizer
[Figure: the optimizer selects among intersection algorithms (k-Merge, 2-Merge, 2-Gallop, 2-SIMD, SIMDV1, …) for the given architecture.]
■ Goal 1. Adaptive to different machines (e.g., Intel Xeon, Intel Coffee Lake, AMD Ryzen)
■ Goal 2. Applicable to future algorithms
Cost Optimizer
■ Suggests the cost-optimal execution plan for a given query.
Challenges
■ The cost of optimization itself should be negligible.
■ Thus, it requires a lightweight cost model with high accuracy.
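Below is a toy version of the optimizer's decision. It assumes two hypothetical closed-form cost estimates:

- a scan cost proportional to |A| + |B|;
- a gallop cost proportional to |A|·(1 + log2(|B|/|A| + 1)).

The constants are placeholders, not measured values. The point is only that the cheapest plan flips between the 1:1 and 1:1000 scenarios described earlier.

```python
import math
from typing import Callable

# Hypothetical closed-form cost estimates in arbitrary units; the constants
# 1.0 and 3.0 are placeholders, not learned values.
CostFn = Callable[[int, int], float]
CANDIDATES: dict[str, CostFn] = {
    # Scan-based: visits every element of both lists.
    "2-Merge": lambda a, b: 1.0 * (a + b),
    # Search-based: one gallop search in B per element of the shorter list A.
    "2-Gallop": lambda a, b: 3.0 * a * (1 + math.log2(b / a + 1)),
}


def choose_plan(len_a: int, len_b: int) -> str:
    """Return the candidate with the lowest estimated cost (assumes len_a <= len_b)."""
    return min(CANDIDATES, key=lambda name: CANDIDATES[name](len_a, len_b))


print(choose_plan(1_000_000, 1_000_000))  # 2-Merge
print(choose_plan(1_000, 1_000_000))      # 2-Gallop
```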
Cost Model
■ Estimates the cost of algorithms for a given input.
■ Considers input properties.
– Lengths and correlations (e.g., |A| = 8M, |B| = 16M, |A ∩ B| = 4M).
Challenges
■ Cost depends on hardware properties.
– e.g., cache efficiency, branch misprediction.
■ Analysis of algorithms is complex.
Procedures
1. Identify expensive events.
– Loops, branches, or memory accesses.
2. Parametrize architecture properties.
– Unit cost: the cost of executing an event once.
3. Model the event counts as a function of the query.
– e.g., # iterations, # mispredictions
– Computed in real time.
[Figure: cost = unit cost vector (per architecture, e.g., Intel Xeon) · event counts f(query) (per algorithm).]
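The cost formula can be sketched as a dot product between a per-machine unit cost vector and per-query event counts.

- The event names mirror the 2-Gallop example in the appendix.
- All numbers are hypothetical.
- Swapping in another machine's unit cost vector changes the estimate without touching the event-count model.

```python
def estimate_cost(unit_cost: dict[str, float], event_counts: dict[str, float]) -> float:
    """Cost = unit cost vector · event counts (events absent from event_counts count as 0)."""
    return sum(cost * event_counts.get(event, 0.0) for event, cost in unit_cost.items())


# Hypothetical unit costs (cycles per event) for one machine.
machine_a = {"alpha_out": 2.0, "alpha_in": 1.0, "beta_out": 15.0,
             "beta_in": 15.0, "gamma_mem": 0.5, "gamma_miss": 40.0}
# A hypothetical second machine with a cheaper misprediction penalty.
machine_b = {**machine_a, "beta_out": 10.0, "beta_in": 10.0}

# Hypothetical event counts f(query) for one 2-Gallop execution.
counts = {"alpha_out": 1000, "alpha_in": 4000, "beta_out": 100,
          "beta_in": 300, "gamma_mem": 6000, "gamma_miss": 50}

print(estimate_cost(machine_a, counts))  # 17000.0
print(estimate_cost(machine_b, counts))  # 15000.0
```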
Step 1. Event Identification
Event identification
■ Total cost = sum of event costs.
■ Cost classification (2 levels)
– Level 1: base latency, branch misprediction, memory overhead.
– Level 2: identified events that are expected to affect the overall cost.
Step 2. Parametrize Unit Cost
Parametrize unit cost
■ Learn the unit cost vector on each machine using a synthetic test set.
■ Use a gradient descent solver.
[Figure: unit cost vector of an architecture (e.g., Intel Xeon), learned from the test set.]
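Below is a minimal sketch of Step 2. The talk states only that a gradient descent solver learns the unit costs from a synthetic test set. The following details are assumptions:

- mean squared error as the loss;
- a non-negativity projection;
- per-column feature scaling;
- the event columns.

The "measured" cycles are generated from hypothetical true unit costs without noise, so the fit should recover those costs.

```python
def fit_unit_costs(counts: list[list[float]], cycles: list[float],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Learn per-event unit costs w so that counts · w ≈ cycles, using
    projected batch gradient descent on the mean squared error."""
    n, d = len(counts), len(counts[0])
    # Scale each event column by its maximum so one learning rate suits every column.
    scale = [max(abs(row[k]) for row in counts) or 1.0 for k in range(d)]
    x = [[row[k] / scale[k] for k in range(d)] for row in counts]
    w = [0.0] * d
    for _ in range(epochs):
        residual = [sum(xk * wk for xk, wk in zip(xi, w)) - yi
                    for xi, yi in zip(x, cycles)]
        grad = [2.0 / n * sum(r * xi[k] for r, xi in zip(residual, x)) for k in range(d)]
        # Gradient step, then project onto w >= 0 (unit costs cannot be negative).
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]
    # Undo the column scaling to get cycles per event.
    return [wk / sk for wk, sk in zip(w, scale)]


# Hypothetical unit costs: cycles per loop iteration, per misprediction, per cache miss.
TRUE_COSTS = [2.0, 15.0, 40.0]
# Hypothetical synthetic test set: event counts observed in six runs.
COUNTS = [
    [1000, 10, 5],
    [1000, 200, 5],
    [1000, 10, 100],
    [4000, 50, 20],
    [2000, 300, 80],
    [500, 20, 150],
]
# Noise-free "measurements" generated from the hypothetical true costs.
CYCLES = [sum(c * u for c, u in zip(row, TRUE_COSTS)) for row in COUNTS]

learned = fit_unit_costs(COUNTS, CYCLES)
print([round(u, 3) for u in learned])  # [2.0, 15.0, 40.0]
```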
Step 3. Event Count Estimation
[Figure: a single search of the 2-Merge and 2-Gallop algorithms for pivot 25 in … 13 14 19 20 21 23 26 28 ….]
Events of one 2-Gallop search: displacement = 6, # comparisons = 5, # references = 6, # cache misses = 0, # mispredictions (MISP) = 2.
■ Understanding the displacement (𝜹) distribution is key to estimating the event counts.
– Displacement: the distance the cursor moved.
– e.g., memory reference count of 2-Gallop: $2|L_1|\sum_{\delta=0}^{|L_2|} P(D=\delta)\cdot(1+\log_2\delta)$
■ Model the displacement distribution using
– the negative hypergeometric (NHG) distribution and a Markov chain.
[Figure: displacement distribution (2-way, |A| : |B| = 1:1).]
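The displacement distribution can also be measured empirically. The sketch makes these assumptions:

- For each element of the shorter list A, δ is how far the cursor in the longer list B moves to reach the first element ≥ that value.
- The lists are uncorrelated random lists (correlation 0) at a 1:1 ratio.
- The list sizes, universe and seed are arbitrary.

```python
import bisect
import random
from collections import Counter


def displacements(a: list[int], b: list[int]) -> list[int]:
    """Cursor displacement in b for each element of a (both sorted ascending)."""
    result, cursor = [], 0
    for value in a:
        new_cursor = bisect.bisect_left(b, value, lo=cursor)  # first b[i] >= value
        result.append(new_cursor - cursor)
        cursor = new_cursor
    return result


def random_sorted_list(length: int, universe: int, rng: random.Random) -> list[int]:
    """Sorted list of distinct IDs drawn uniformly from range(universe)."""
    return sorted(rng.sample(range(universe), length))


rng = random.Random(42)
a = random_sorted_list(10_000, 1_000_000, rng)
b = random_sorted_list(10_000, 1_000_000, rng)
histogram = Counter(displacements(a, b))
for delta in sorted(histogram)[:6]:
    print(delta, histogram[delta])  # number of searches with this displacement
```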
Experimental Settings
Machine
■ SandyBridge i7-3820 3.60GHz
■ IvyBridge i7-3770K 2.93GHz
Synthetic Dataset
■ Number of lists: 2 to 4
■ Length ratio (min:max): 1:1 to 1:1024
■ Correlation: 0 to 1
Set of Algorithms in the Optimizer
■ 2-way
– 2-Merge, 2-Gallop, 2-SIMD, STL
– SIMDV1, SIMDV3, SIMD Gallop [1]
– SIMD Inoue [2]
■ k-way: k-Merge, k-Gallop
■ A cost model is built for each algorithm.
[1] D. Lemire et al., SIMD compression and the intersection of sorted integers. Software: Practice and Experience, 2016.
[2] H. Inoue et al., Faster set intersection with SIMD instructions by reducing branch mispredictions. VLDB 2014.
Accuracy of the Cost Model: 2-way and k-way
[Figures: accuracy of the 2-way algorithm cost model; accuracy of the k-way algorithm cost model.]
Accuracy of the Cost Model: Adaptive to Machine
[Figures: accuracy of the 2-way algorithm cost model on SandyBridge and on IvyBridge.]
Optimizer Efficiency
[Table: comparison of representative algorithms on the four-list synthetic dataset. Bold: best of all; italic: best among single algorithms.]
Conclusion
■ Cost analysis and estimation should take into account the properties of the architecture.
■ Probabilistic analysis of an algorithm's operations yields more accurate cost estimates.
■ Based on these two observations, we propose a cost-based optimizer equipped with a lightweight and accurate cost model.
Thank you!
Contact: Sunghwan Kim (sunghwan08@gmail.com)
APPENDIX
Top-down Analysis
■ Hardware profiling method
■ Finds bottlenecks/hotspots
■ Cost decomposition into 4 major cost factors
■ Based on the utilization of each part at each cycle.
Backend ↓ \ Frontend → | Stall | Fully utilized
Stall | Bad speculation | Backend bound
Fully utilized | Frontend bound | Retiring (no overhead)
Step 1. Event Identification
■ Cycle accounting (e.g., top-down analysis)
– It is hard to identify the causes of cost → hard to use for cost estimation.
■ We resolve this mismatch with cause-based pivoting.
[Figure: our cause-based pivoting.]
Identify expensive events: Example of 2-Gallop
■ Base latency (α)
– Inner/outer loops (α_in, α_out)
■ Branch misprediction (β)
– Inner/outer loops (β_in, β_out)
■ Memory stalls (γ)
– # references (γ_mem)
– # cache misses (γ_miss)
[Figure: 2-Gallop code annotated with α_out, α_in, β_out, β_in, and γ (memory references).]
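As a sketch of counting level-2 events, the 2-Gallop intersection below assumes sorted Python lists, with A shorter than B. It tallies four counters:

- outer-loop iterations;
- gallop probes;
- binary-search steps;
- total displacement.

Mapping these counters onto α_out, α_in and γ_mem is an assumption. The exact counts depend on implementation details.

```python
def gallop_intersect(a: list[int], b: list[int]) -> tuple[list[int], dict[str, int]]:
    """2-Gallop intersection of sorted lists a (shorter) and b (longer) that also
    counts outer iterations, gallop probes, binary-search steps, and displacement."""
    events = {"outer": 0, "gallop": 0, "bsearch": 0, "displacement": 0}
    out: list[int] = []
    cursor = 0
    for value in a:
        if cursor == len(b):
            break                       # b is exhausted: no further matches possible
        events["outer"] += 1
        lo, probe, step = cursor, cursor, 1
        while probe < len(b):           # gallop: offsets 0, 1, 3, 7, ... from the cursor
            events["gallop"] += 1
            if b[probe] >= value:
                break
            lo = probe + 1              # every element up to probe is < value
            probe = cursor + 2 * step - 1
            step *= 2
        hi = min(probe, len(b))         # the first element >= value lies in [lo, hi]
        while lo < hi:
            events["bsearch"] += 1
            mid = (lo + hi) // 2
            if b[mid] < value:
                lo = mid + 1
            else:
                hi = mid
        events["displacement"] += lo - cursor
        cursor = lo
        if cursor < len(b) and b[cursor] == value:
            out.append(value)
    return out, events


print(gallop_intersect([3, 10], list(range(1, 11))))
```

For A = [3, 10] and B = 1..10, it returns [3, 10] with these counts:

- 2 outer iterations;
- 7 gallop probes;
- 3 binary-search steps;
- a total displacement of 9.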
Identify expensive events: Example of 2-Merge
■ Base latency (α)
■ Branch misprediction (β)
– Significant at low length ratios
■ Memory stalls (γ)
– Marginal (due to the sequential scan)
[Figure: 2-Merge code annotated with α and with β on the >, <, = branches.]
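The appendix attributes 2-Merge's misprediction cost to its three-way branch (β: >, <, =) at low length ratios. The sketch below only counts how often each branch is taken. It cannot measure mispredictions, which depend on the CPU's predictor.

As a rough intuition:

- A branch dominated by one outcome, as with a skewed length ratio, is easy to predict.
- A branch whose outcomes mix irregularly, as with comparable uncorrelated lists, is not.

```python
import random
from collections import Counter


def merge_branch_outcomes(a: list[int], b: list[int]) -> tuple[list[int], Counter]:
    """2-Merge intersection of two sorted lists that also tallies which branch
    ('<', '>', '=') each loop iteration takes."""
    i = j = 0
    result: list[int] = []
    taken: Counter = Counter()
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            taken["<"] += 1
            i += 1
        elif a[i] > b[j]:
            taken[">"] += 1
            j += 1
        else:
            taken["="] += 1
            result.append(a[i])
            i += 1
            j += 1
    return result, taken


# Skewed ratio (1:1000): almost every iteration takes the '>' branch.
print(merge_branch_outcomes([500], list(range(1000)))[1])

# Comparable, uncorrelated random lists: '<' and '>' are each taken roughly half the time.
rng = random.Random(7)
a = sorted(rng.sample(range(10_000), 1_000))
b = sorted(rng.sample(range(10_000), 1_000))
print(merge_branch_outcomes(a, b)[1])
```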
Probabilistic Analysis of Algorithms: 2-way displacement analysis
■ 𝛿 can be modeled by the negative hypergeometric distribution (NHG).
– NHG(k; N, K, r): probability that the k-th “success” shows up after exactly r “failures”, drawing without replacement from a population of N items with K successes and N−K failures.
– e.g., NHG(4; 8, 4, 2): probability that the LA Dodgers beat the Milwaukee Brewers 4-2 in an NLCS (assumption: the winning distribution is uniform).
[Figure: merged order of A (failures) and B – A (successes), marking the 𝛿 of the searches for y and z among elements x, y, z.]
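The NHG probability in this parameterization can be computed exactly with binomial coefficients. The function below is a sketch, and its name and argument names are illustrative.

Under the urn model implied by NHG(4; 8, 4, 2), four wins and four losses appear in random order. In that model, a 4-2 series result has probability 1/7.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k: int, total: int, successes: int, r: int) -> Fraction:
    """NHG(k; N, K, r): probability that the k-th success appears after exactly
    r failures, drawing without replacement from total = N items of which
    successes = K are successes."""
    failures = total - successes
    if not (1 <= k <= successes) or not (0 <= r <= failures):
        return Fraction(0)
    # The first k + r - 1 draws hold k - 1 successes and r failures, draw k + r
    # is a success, and the remaining total - k - r draws hold failures - r failures.
    favorable = comb(k + r - 1, r) * comb(total - k - r, failures - r)
    return Fraction(favorable, comb(total, successes))


print(nhg_pmf(4, 8, 4, 2))  # 1/7
```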
Probabilistic Analysis of Algorithms: k-way displacement analysis
■ In k-way intersection, the displacement 𝛿_i of each search is made up of k parts:
$\delta_i = \sum_{j=1}^{k} \delta_{ij}$
■ Decompose a displacement 𝛿_i into several subdisplacements 𝛿_ij.
■ Model each 𝛿_ij with a Markov chain.
– If the search succeeds, v_{j−1} = v_j and 𝛿_ij = 0.
– Otherwise, v_{j−1} ≠ v_j and 𝛿_ij follows NHG (similar to the 2-way case).
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 

List intersection for web search: Algorithms, Cost Models, and Optimizations

  • 1. LIST INTERSECTION FOR WEB SEARCH: ALGORITHMS, COST MODELS, AND OPTIMIZATIONS. Sunghwan Kim (POSTECH), Taesung Lee (IBM Research AI), Seung-won Hwang (Yonsei University), Sameh Elnikety (Microsoft Research). VLDB 2019
  • 2. List Intersection in Web Search. Multi-word query in web search engine: list intersection of posting lists. Corpus: document → word list; posting list: word → document list; ℐ(x) = posting list of word x.
      Example documents: 105 "… research, so to generate the optimal query plan for the given scenario, as commonly used in the database systems …"; 592 "… My research interests are in database system and data-driven intelligence, …"
      Posting lists (document IDs): database: … 105 … 592 842 …; system: … 105 … 592 751 …; research: … 105 … 592 642 …; data-driven: … 321 … 592 632 …; intelligence: … 256 … 592 925 …
  • 3. List Intersection in Web Search. Multi-word query in web search engine: list intersection of posting lists. ℐ(x) = inverted list of word x. Q. “database system research”
  • 4. List Intersection in Web Search. Multi-word query in web search engine: list intersection of posting lists. ℐ(x) = inverted list of word x. Q. “database system research” A. ℐ(“database”) ∩ ℐ(“system”) ∩ ℐ(“research”)
  • 5. List Intersection in Web Search. Multi-word query in web search engine: list intersection of posting lists. ℐ(x) = inverted list of word x. Q. “database system research” A. ℐ(“database”) ∩ ℐ(“system”) ∩ ℐ(“research”)
      Posting lists (document IDs): database: … 105 … 592 842 …; system: … 105 … 592 751 …; research: … 105 … 592 642 …; ∩: … 105 … 592 …
  • 12. List Intersection Algorithms. [Figure: the pivot 25 from list A is searched for in list B = … 13 14 19 20 21 23 26 28 …] How?
  • 13. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: the scan-based algorithm compares the pivot 25 with the elements of B in order: 13<25]
  • 14. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: scan-based: 14<25]
  • 15. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: scan-based: the scan stops at 26>25]
  • 16. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: scan-based stopped at 26>25; search-based starts: 13<25]
  • 17. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: search-based: 14<25]
  • 18. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: search-based jumps ahead: 20<25, then 28>25]
  • 19. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: search-based finishes with a binary search and stops at 26>25]
  • 20. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). [Figure: in both algorithms the cursor moves 6 elements, from 13 to 26]
  • 21. List Intersection Algorithms: scan-based algorithm (e.g., Merge, SIMD) vs. search-based algorithm (e.g., Binary search, Gallop). Displacement (δ): distance the cursor moves.
  • 22. Challenge for Optimization. Scenario #1: length ratio 1:1 (|A| = 1M, |B| = 1M): cost(scan-based) < cost(search-based). Scenario #2: length ratio 1:1000 (|A| = 1K, |B| = 1M): cost(scan-based) > cost(search-based). No method wins in every scenario → requires query optimization.
  • 23. Complexity of Cost Estimation ■ Cost of a comparison is not uniform.
  • 24. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. [Figure: instruction 10 passes through the five pipeline stages (Fetch, Decode, Execute, Memory, Write Back) over cycles t1 to t5]
  • 25. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. [Figure: while instruction 10 is fetched, instructions 6 to 9 already occupy the later stages]
  • 26. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. [Figure: every stage stays busy in every cycle as instructions 6 to 14 flow through the pipeline]
  • 27. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. [Figure: instruction 10 is a branch (JEQ 6); is the next instruction 6 or 11? Until then, the pipeline is empty]
  • 28. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. – CPU predicts the result of the branch. [Figure: JEQ 6: predict 6 or 11!]
  • 29. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. – CPU predicts the result of the branch. [Figure: the pipeline is filled with the predicted path 6, 7, 8, 9: Predicted!]
  • 30. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. – CPU predicts the result of the branch. [Figure: the prediction turns out wrong]
  • 31. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. – CPU predicts the result of the branch. – Failure: 10-40 cycles of penalty. ■ Branch misprediction [Figure: the speculatively filled pipeline work is lost]
  • 32. Complexity of Cost Estimation ■ Cost of a comparison is not uniform. ■ Modern architecture pipelines the execution of instructions. ■ Branch can block the pipeline. – CPU predicts the result of the branch. – Failure: 10-40 cycles of penalty. ■ Branch misprediction ■ Cache/TLB misses are expensive. – From 12 to 200+ cycles of latency.
      Cache miss penalties in Intel Skylake:
      Access result      | Penalty
      L1 hit             | 4 cycles
      L1 miss + L2 hit   | 12 cycles
      L2 miss + L3 hit   | 42 cycles
      L3 miss            | 42 cycles + RAM latency (200+ cycles)
  • 33. Motivations. There is no single winner in all scenarios. Optimization based on the number of comparisons is not always optimal: the cost of a comparison is not always the same on modern architectures.
  • 39. Cost-based Query Optimizer. Goal 1. Adaptive to different machines. [Figure: Intel Xeon, Intel Coffee Lake, and AMD Ryzen, each with its own set of algorithms]
  • 40. Cost-based Query Optimizer. Goal 2. Applicable to future algorithms. [Figure: each machine's set of algorithms extended with future algorithms]
  • 41. Cost-based Query Optimizer. Cost optimizer ■ Suggests the cost-optimal execution plan for a given query.
  • 42. Cost-based Query Optimizer. Cost optimizer ■ Suggests the cost-optimal execution plan for a given query. Challenges ■ The cost of optimization should be negligible. ■ Thus, it requires a lightweight cost model with high accuracy.
  • 43. Cost Model ■ Estimates the cost of algorithms for a given input ■ Considers input properties – Lengths and correlations [Figure: lengths |A| = 8M, |B| = 16M; correlation |A ∩ B| = 4M]
  • 44. Cost Model ■ Estimates the cost of algorithms for a given input ■ Considers input properties – Lengths and correlations. Challenges ■ Cost depends on hardware properties, e.g., cache efficiency and branch misprediction. ■ Analysis of algorithms is complex. [Figure: architectures (e.g., Intel Xeon) and algorithms feed the cost model through a unit cost vector and event counts]
  • 45. Cost Model Procedures 1. Identify expensive events. – Loops, branches, or memory accesses. [Figure: 1. identify events from the algorithms]
  • 46. Cost Model Procedures 1. Identify expensive events. – Loops, branches, or memory accesses. 2. Parametrize architecture properties. – Unit cost: cost of one event execution. [Figure: 2. parametrize the architecture (e.g., Intel Xeon) as a unit cost vector]
  • 47. Cost Model Procedures 1. Identify expensive events. – Loops, branches, or memory accesses. 2. Parametrize architecture properties. – Unit cost: cost of one event execution. 3. Model the event counts as a function of the query. – e.g., # iterations, # mispredictions [Figure: 3. model the event counts as f(query)]
  • 48. Cost Model Procedures 1. Identify expensive events. – Loops, branches, or memory accesses. 2. Parametrize architecture properties. – Unit cost: cost of one event execution. 3. Model the event counts as a function of the query. – e.g., # iterations, # mispredictions – Computed in real time. [Figure: f(query) computes the event counts in real time]
  • 49. Cost Model Procedures 1. Identify expensive events. – Loops, branches, or memory accesses. 2. Parametrize architecture properties. – Unit cost: cost of one event execution. 3. Model the event counts as a function of the query. – e.g., # iterations, # mispredictions – Computed in real time. [Figure: the cost model combines the unit cost vector with the event counts f(query)]
  • 50. Step 1. Event Identification ■ Total cost = sum of event costs. [Figure: cost of an event]
  • 51. Step 1. Event Identification ■ Total cost = sum of event costs. ■ Cost classification (2 levels) – Level 1: base latency, branch misprediction, memory overhead.
  • 52. Step 1. Event Identification ■ Total cost = sum of event costs. ■ Cost classification (2 levels) – Level 1: base latency, branch misprediction, memory overhead. – Level 2: identified events → those expected to significantly affect the total cost.
  • 53. Step 2. Parametrize Unit Cost ■ Learn the unit costs on the machine using a synthetic test set. ■ Use a gradient descent solver. [Figure: the unit cost vector of an architecture (e.g., Intel Xeon), learned from the test set]
  • 54. Step 3. Event Count Estimation ■ Understanding the displacement (δ) distribution is key to estimating the event counts. – Displacement: distance the cursor moved. [Figure: a search of the 2-Merge algorithm and a search of the 2-Gallop algorithm for the pivot 25 in … 13 14 19 20 21 23 26 28 …]
  • 55. Step 3. Event Count Estimation ■ Understanding the displacement (δ) distribution is key to estimating the event counts. – Displacement: distance the cursor moved. [Figure: a search of the 2-Gallop algorithm: displacement = 6, # comparisons = 5, # references = 6, # cache misses = 0, # MISP (mispredictions) = 2]
  • 56. Step 3. Event Count Estimation ■ Understanding the displacement (δ) distribution is key to estimating the event counts. – Displacement: distance the cursor moved. [Figure: a search of the 2-Gallop algorithm: displacement = 6, # comparisons = 5, # references = 6, # cache misses = 0, # MISP (mispredictions) = 2]
  • 57. Step 3. Event Count Estimation ■ Understanding the displacement (δ) distribution is key to estimating the event counts. – e.g., memory reference count of 2-Gallop: $2|L_1| \sum_{\delta=0}^{|L_2|} P(D=\delta)\cdot\left(1+\log_2 \delta\right)$, where $P(D=\delta)$ is the displacement distribution. [Figure: a search of the 2-Gallop algorithm: displacement = 6, # comparisons = 5, # references = 6, # cache misses = 0, # MISP = 2]
  • 58. Step 3. Event Count Estimation ■ Understanding the displacement (δ) distribution is key to estimating the event counts. – e.g., memory reference count of 2-Gallop: $2|L_1| \sum_{\delta=0}^{|L_2|} P(D=\delta)\cdot\left(1+\log_2 \delta\right)$ ■ Modeling the displacement distribution using – the negative hypergeometric (NHG) distribution and a Markov chain. [Figure: displacement distribution (2-way, |A| : |B| = 1:1)]
  • 59. Experimental Settings. Machines ■ SandyBridge i7-3820 3.60 GHz ■ IvyBridge i7-3770K 2.93 GHz. Synthetic dataset ■ Number of lists: 2 to 4 ■ Length ratio (min:max): 1:1 to 1:1024 ■ Correlation: 0 to 1. Set of algorithms in the optimizer ■ 2-way – 2-Merge, 2-Gallop, 2-SIMD, STL – SIMDV1, SIMDV3, SIMD Gallop [1] – SIMD Inoue [2] ■ k-way: k-Merge, k-Gallop ■ Build a cost model for each algorithm.
      [1] D. Lemire et al., SIMD compression and the intersection of sorted integers. Software: Practice and Experience, 2016.
      [2] H. Inoue et al., Faster set intersection with SIMD instructions by reducing branch mispredictions. VLDB 2014.
  • 60. Accuracy of the Cost Model - 2-way and k-way. [Figures: accuracy of the 2-way algorithm cost model; accuracy of the k-way algorithm cost model]
  • 61. Accuracy of the Cost Model - Adaptive to machine. [Figure: accuracy of the 2-way algorithm cost model on SandyBridge and on IvyBridge]
  • 62. Optimizer Efficiency. [Table: comparison of representative algorithms on the four-list synthetic dataset (Bold: best of all; Italic: best among single algorithms)]
  • 63. Conclusion ■ Cost analysis and estimation should take the properties of the architecture into account. ■ Analyzing the probability of the algorithm’s operations yields more accurate cost estimation. ■ Based on these two observations, we propose a cost-based optimizer equipped with a lightweight and accurate cost model.
  • 66. Top-down Analysis ■ Hardware profiling method ■ Finding bottlenecks/hotspots ■ Cost decomposition into 4 major cost factors ■ Based on the utilization of pipeline parts at each cycle. [Figure: the four factors Frontend bound, Backend bound, Bad speculation, and Retiring (no overhead), arranged by whether the frontend and backend are stalled or fully utilized]
  • 67. Step 1. Event Identification ■ Cycle accounting (e.g., top-down analysis) – hard to identify the causes of cost → hard to use for cost estimation. ■ We solve this mismatch by using cause-based pivoting. [Figure: our cause-based pivoting]
  • 68. Identify expensive events - Example: 2-Gallop ■ Base latency (α) – Inner/outer loops ($\alpha_{in}$, $\alpha_{out}$) ■ Branch misprediction (β) – Inner/outer loops ($\beta_{in}$, $\beta_{out}$) ■ Memory stalls (γ) – # references ($\gamma_{mem}$) – # cache misses ($\gamma_{miss}$) [Figure: 2-Gallop annotated with $\alpha_{out}$, $\alpha_{in}$, $\beta_{out}$, $\beta_{in}$, and γ: memory references]
  • 69. Identify expensive events - Example: 2-Merge ■ Base latency (α) ■ Branch misprediction (β) – Significant at low length ratio ■ Memory stalls (γ) – Marginal (due to sequential scan) [Figure: 2-Merge annotated with α and with β at the >, <, = comparisons]
  • 70. Probabilistic Analysis of Algorithms - 2-way displacement analysis. δ can be modeled by the negative hypergeometric (NHG) distribution. NHG(k; N, K, r): probability that k “successes” show up before the r-th “failure”, in a population of N with K successes and N−K failures. e.g., NHG(4; 8, 4, 2) = probability that the LA Dodgers beat the Milwaukee Brewers 4-2 in the NLCS (assumption: winning distribution is uniform). [Figure: elements of A are failures and elements of B − A are successes; the successes between consecutive elements x, y, z of A give the δ of each search]
  • 71. Probabilistic Analysis of Algorithms - k-way displacement analysis. The displacement $\delta_i$ of each search is affected by all k lists: $\delta_i = \sum_{j=1}^{k} \delta_{ij}$. Decompose a displacement $\delta_i$ into subdisplacements $\delta_{ij}$ and model each $\delta_{ij}$ with a Markov chain. If the search succeeds, $v_{j-1} = v_j$ and $\delta_{ij} = 0$. Otherwise, $v_{j-1} \neq v_j$ and $\delta_{ij}$ follows an NHG distribution (as in the 2-way case).
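The two search strategies of slides 12-21 can be sketched in Python as follows, assuming sorted, duplicate-free lists of document IDs. `merge_intersect` scans B one element at a time (a 2-Merge-style scan), while `gallop_intersect` probes B at doubling offsets and then binary-searches the bracketed range; this is one common galloping variant, not necessarily the exact probe sequence drawn on the slides. Both record the displacement δ of every search, i.e., how far the cursor in B moves. The function names and the values 37 and 40 appended to the slide's example list are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def merge_intersect(a: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Scan-based intersection: advance the cursor in b one element at a time."""
    out: List[int] = []
    displacements: List[int] = []
    j = 0
    for x in a:
        if j >= len(b):
            break  # cursor reached the end of b: no more matches possible
        start = j
        while j < len(b) and b[j] < x:  # linear scan to the first b[j] >= x
            j += 1
        displacements.append(j - start)
        if j < len(b) and b[j] == x:
            out.append(x)
    return out, displacements


def gallop_intersect(a: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Search-based intersection: gallop with doubling steps, then binary search."""
    out: List[int] = []
    displacements: List[int] = []
    j, n = 0, len(b)
    for x in a:
        if j >= n:
            break
        start = j
        if b[j] < x:
            bound = 1
            # Probe b[j+1], b[j+2], b[j+4], ... until we reach x or the end of b.
            while j + bound < n and b[j + bound] < x:
                bound *= 2
            # b[j + bound // 2] < x, so the first b[p] >= x (or p == n) is in
            # the half-open bisect range below.
            j = bisect_left(b, x, j + bound // 2, min(j + bound, n))
        displacements.append(j - start)
        if j < n and b[j] == x:
            out.append(x)
    return out, displacements


if __name__ == "__main__":
    a = [25, 37]
    b = [13, 14, 19, 20, 21, 23, 26, 28, 37, 40]
    print(merge_intersect(a, b))   # ([37], [6, 2])
    print(gallop_intersect(a, b))  # ([37], [6, 2])
```

Both functions return identical displacements by construction: each search ends at the first element of B that is not smaller than the pivot, and only the way the cursor gets there differs. That is why the displacement, rather than the comparison count, is a convenient algorithm-independent feature for cost estimation.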
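Slides 49-53 describe the cost model as a unit cost vector combined with event counts, with the unit costs learned on each machine from a synthetic test set. A minimal sketch, assuming the total cost is the dot product of the two vectors. For brevity it fits the unit costs by ordinary least squares via the normal equations, not with the gradient descent solver the slides mention. The three events (loop iterations, branch mispredictions, cache misses), their counts, and the "true" unit costs used to generate the measurements are all hypothetical.

```python
from typing import List, Sequence


def estimate_cost(unit_costs: Sequence[float], counts: Sequence[float]) -> float:
    """Total cost = sum over events of (unit cost x event count)."""
    return sum(u * c for u, c in zip(unit_costs, counts))


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_unit_costs(counts: List[List[float]], measured: List[float]) -> List[float]:
    """Least-squares unit costs u: solve (X^T X) u = X^T y."""
    k = len(counts[0])
    xtx = [[sum(row[a] * row[b] for row in counts) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(counts, measured)) for a in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical cycles per iteration / misprediction / cache miss.
    true_unit = [1.0, 15.0, 40.0]
    # Hypothetical synthetic test set: one row of event counts per query.
    counts = [[1000 * q, 50 * ((7 * q) % 11 + 1), 5 * ((3 * q) % 7 + 1)]
              for q in range(1, 21)]
    measured = [estimate_cost(true_unit, row) for row in counts]
    print([round(u, 3) for u in fit_unit_costs(counts, measured)])  # [1.0, 15.0, 40.0]
```

On a real machine the measured costs would be noisy timings, so the fitted vector only approximates the per-event costs; keeping the event vector small is what lets the estimate be evaluated cheaply at query time.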
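The negative hypergeometric model of slides 57-58 and 70 can be checked numerically. The sketch below assumes (a simplification, not necessarily the paper's exact model) that the elements of A ("failures") and of B − A ("successes") are interleaved uniformly at random. Under that assumption the displacement of one search is the number of successes drawn before the next failure, i.e. NHG(δ; N, K, r = 1) with N = |A| + |B − A| and K = |B − A|. `expected_event_count` then weights a per-search event function f(δ) by this distribution, mirroring the shape of the slide's memory-reference formula. The function names and the example f, including its log2(δ + 1) term (used only to avoid log2 0), are hypothetical.

```python
from math import comb, log2
from typing import Callable, List


def nhg_pmf(k: int, N: int, K: int, r: int) -> float:
    """P(exactly k successes before the r-th failure), drawing without
    replacement from N items of which K are successes (assumes N - K >= r)."""
    if k < 0 or k > K:
        return 0.0
    return comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)


def displacement_pmf(len_a: int, len_b_minus_a: int) -> List[float]:
    """P(delta = k) for k = 0..|B - A| under the uniform-interleaving assumption."""
    N, K = len_a + len_b_minus_a, len_b_minus_a
    return [nhg_pmf(k, N, K, 1) for k in range(K + 1)]


def expected_event_count(len_a: int, pmf: List[float],
                         f: Callable[[int], float]) -> float:
    """|A| searches, each contributing f(delta) events on average."""
    return len_a * sum(p * f(d) for d, p in enumerate(pmf))


if __name__ == "__main__":
    # Slide 70's example: NHG(4; 8, 4, 2) = C(5,4) * C(2,0) / C(8,4) = 5/70.
    print(nhg_pmf(4, 8, 4, 2))
    # 1:1 length ratio as on slide 58: |A| = |B| = 1000, no common elements.
    pmf = displacement_pmf(1000, 1000)
    print(pmf[0])  # 0.5: half of the searches do not move the cursor
    print(sum(d * p for d, p in enumerate(pmf)))  # 1000/1001, close to 1
    # Hypothetical per-search memory-reference function for a galloping search.
    print(expected_event_count(1000, pmf, lambda d: 2 * (1 + log2(d + 1))))
```

The 1:1 case reproduces what the editor's note for slide 58 describes: the mean displacement is close to 1, yet in half of the searches the cursor does not advance at all.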

Editor's Notes

  1. Hi, I’m SungHwan from POSTECH, Korea, and it is a great honor to present my work at VLDB. Today, I’ll talk about cost-based optimization for list intersection on modern architectures.
  2. Web search engines access the web corpus through posting lists and use intersection algorithms to process multi-keyword queries. Each posting list holds the IDs of the documents that contain the list's keyword. For example, the posting list for the keyword ‘database’ contains the IDs of all documents whose text contains the word ‘database’.
  3. So when a multi-keyword query comes in,
  4. the search engine computes the query by
  5. intersecting the posting lists corresponding to the given query keywords.
  6. List intersection can be computed as a sequence of searches, for example by searching for every element of A in B.
  7. For example, we search for 25, the first value of A, in the other list. If the search fails,
  8. we discard all of the preceding elements,
  9. and we move on to the next element, 37.
  10. If the value is found in the other list, we add it to the result.
  11. We repeat this until either cursor reaches the end of its list.
  12. So the main algorithm design issue is how to search for an element in a sorted list.
  13-15. There are two main categories of list intersection algorithms. (click) The first is scan-based approaches, which scan through the elements one by one,
  16-19. and the other is search-based approaches, which try to jump over several elements to reduce the number of comparisons.
  20. Regardless of which algorithm we use, the distance the cursor moves does not change for a given search.
  21. In this case, the distance is six in both searches. We call this distance the displacement; it is the quantity we want to analyze, and we will cover it later.
  22. It might sound like a search-based algorithm, which skips some elements, should clearly beat a scan-based algorithm, which goes through all items, but the problem is not that simple: there is no single winner among intersection algorithms in all scenarios. For example, in scenario #1, with one million items in each of the two lists, the scan-based approach generally has a lower cost than a search-based approach. On the other hand, in scenario #2, with only one thousand items in one list and one million items in the other, a search-based approach is significantly faster than a scan-based one.
  23. So to propose an optimal plan for computing a list intersection, we need to understand the cost of each algorithm for the given input. However, cost estimation is also challenging: the cost of a comparison is not uniform, so we cannot estimate the cost of list intersection simply by counting comparisons.
  24. This is because modern architectures pipeline the execution of instructions. For example, in a 5-stage pipelined architecture, an instruction is executed in five steps across five cycles.
  25. In each cycle, a modern architecture tries to fully utilize its units by executing instructions in parallel. So at time t1, while instruction 10 is being fetched, the earlier instructions 6 to 9 are already in other pipeline stages.
  26. Similarly, the architecture tries to fill the pipeline in every cycle. This works ideally if every pipeline stage completes in one cycle and we have a fixed sequence of instructions to execute.
  27. However, if there is a conditional branch, the problem becomes much more complicated. For example, if instruction 10 is a branch, which instruction comes next?
  28. This is unclear until the branch is resolved. However, like an evil CEO who never lets employees sit idle, a modern architecture predicts the result of the branch and fills the pipeline according to the prediction.
  29. So the pipeline is filled based on the predicted result.
  30. However, a problem arises if the prediction turns out to be wrong.
  31. In this case, we lose all of the work in the pipeline and pay a penalty of around 10 to 40 cycles. We call this situation a branch misprediction, and this penalty the branch misprediction penalty.
  32. The other overhead that limits the scalability of the architecture is memory overhead, mainly caused by cache or TLB misses. For example, on the Skylake architecture, a memory access takes only 4 cycles in the best case, but hundreds of cycles in the worst case.
  33. So we identify two challenges: first, no algorithm wins in all scenarios; second, the cost of a comparison is not always the same. Therefore, the execution plan should be selected carefully, considering the properties of both the given query and the server machine.
  34. The role of a query optimizer is to suggest a plan for computing a given query.
  35. For example, for the query “database systems”, the optimizer suggests a plan that computes the query with
  36. the fastest algorithm for the given input, like 2-Merge,
  37. and in some other scenario it may suggest a different algorithm.
  38. Beyond working on a specific architecture with a fixed set of algorithms, we have two main objectives.
  39. The first is an optimization method that can adapt to different architectures, such as state-of-the-art AMD or Intel processors.
  40. The second is a general approach to algorithm analysis for cost estimation that is applicable to future algorithms.
  41. So the objective of the cost optimizer can be formally defined as providing the cost-optimal execution plan for the given query, and
  42. the main challenge is that the run-time cost of the optimizer must be negligible compared to the benefit of the optimization, so it requires a highly accurate cost model with little computational overhead.
  43. We develop a lightweight cost model that takes lists as input and returns the expected cost, considering two properties: the lengths of the lists and the correlations between them.
  44. To build a highly accurate cost model, we need to model both hardware and algorithm properties. [Note: help the audience understand that the unit cost is a kind of vector, etc.]
  45. Now we introduce the procedure for building the cost model. The first step is to identify expensive events in the algorithm implementation, such as loops, branches, or memory references.
  46. Then, we parametrize the architecture properties as unit costs, each representing the cost of executing one event.
  47. Then we model the event-count function, which takes a query and returns the vector of event counts, such as the number of loop iterations or the number of mispredictions of a specific branch.
  48. As the event counts depend on the input properties, the event count vector is computed at query time.
  49. We call this framework the cost model.
  50. So we formulate the model by decomposing the total cost into the sum of event costs.
  51. In detail, we first divide the cost of an application into three major factors: base latency, and the two overheads of branch misprediction and memory access. // Branch misprediction and memory overhead need to be covered earlier.
  52. Then, we further classify the cost into events according to which are expected to significantly affect the total cost.
  53. Then we parametrize the unit cost of each event on the given machine by learning from the synthetic test set. To reduce error, we use a gradient descent solver in the learning stage.
  54. The next step is to compute the count of each identified event by understanding the behavior of the algorithms. As mentioned earlier, there are several algorithms, but even if the algorithms differ, the displacement, which is the distance the cursor moves, is the same. We take this displacement as the key feature of event count analysis.
  55. For example, given the displacement of a single search, we can calculate the event counts of that search,
  56. and we can further formulate each count as a function of the displacement.
  57. Then, we can estimate the event counts by understanding the displacement distribution. For example, we can compute the total memory reference count from the distribution and the function giving the memory references for each displacement.
  58. We formulate the distribution using the negative hypergeometric distribution and a Markov chain. This graph shows an example of the displacement distribution for two lists of equal length. The average displacement is close to 1 in this case, but in 50% of the searches the cursor does not move forward to the next value. See our paper for further details of the modeling. [Review comments, translated from Korean: What is the difference between the left and right graphs? If we are not going to explain it, wouldn't it be better not to show it at all? The right graph must not give the impression of being continuous. If we show it, explain it by highlighting only the difference we want to show; highlight only the differences relevant to what we are explaining. Kyungjae: distinguish the graphs with red and blue? The two graphs have the same shape, just a different scale...?]
  59. Next, we present our experiments. As our goal is to show the efficiency, effectiveness, and adaptiveness of our method, we test it in a wide range of scenarios with various state-of-the-art algorithms on two machines.
  60. We first show the accuracy of the cost model. As we can see, the estimated cost closely follows the actual cost over a wide range of length ratios. This indicates that we successfully modeled most of the hardware-related features.
  61. Our approach also adapts very well to different machines. The left shows the result on SandyBridge, and the right the result on IvyBridge.
  62. We also show that our cost-based optimizer is consistently better than using any single algorithm at every stage.
  63. So the key messages are that cost analysis should reflect the characteristics of the architecture, such as branch misprediction and memory references, and that analyzing the behavior of algorithms yields much more accurate cost estimation. Based on these observations, we propose a new cost-based optimizer equipped with a lightweight and accurate cost model.
  64. Thank you very much for listening to my presentation, and please visit my poster session today for further discussion. Thank you!
  65. The analysis of computation is very complex. The state-of-the-art analysis method, called top-down analysis, is suitable for reverse engineering a computation, such as finding bottlenecks, but not for cost estimation. So we translate this method into our new event-based language.
  66. Therefore, a new cost classification is required to resolve this mismatch, and we introduce new kinds of cost factors using cause-based pivoting. We first classify the cost into base latency, branch misprediction, and memory overhead.
  67-68. In our work, for each algorithm, we first analyze its cost and then find its cost parameters.
  69. With this definition, we can obtain the distribution from the NHG model. The NHG model gives the probability of k “successes” before the r-th failure. So, for example, we can compute the probability that SK beats Doosan.
  70. Next, the distribution in k-way computation is much more complex, because we may need to accumulate results across the whole round-robin process. To solve this, we divide the displacement into k subdisplacements and solve each subproblem with a Markov chain. The Markov chain gives the probabilities of a successful and an unsuccessful search; for an unsuccessful search, the subdisplacement follows the NHG distribution, and otherwise it is 0.
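Notes 22 and 33-43 describe the optimizer's job: estimate each candidate algorithm's cost for the given input and pick the cheapest. The sketch below assumes a two-event model per algorithm (loop iterations and branch mispredictions). The event-count formulas and unit costs in the hypothetical `MODELS` table are placeholders chosen only to reproduce the qualitative picture of slide 22, not the paper's fitted models.

```python
from math import log2
from typing import Callable, Dict, List, Tuple

# An event-count function maps (shorter length, longer length) to event counts.
CountFn = Callable[[int, int], List[float]]

# Hypothetical models: (event-count function, unit-cost vector in cycles).
# Events are [loop iterations, branch mispredictions] for both algorithms.
MODELS: Dict[str, Tuple[CountFn, List[float]]] = {
    # Scan: touches every element; assume mispredictions grow with the shorter list.
    "2-Merge": (lambda a, b: [a + b, min(a, b)], [1.0, 15.0]),
    # Gallop: assume about 1 + 2*log2(1 + b/a) probes per search of the shorter list.
    "2-Gallop": (lambda a, b: [a * (1 + 2 * log2(1 + b / a)), a], [3.0, 15.0]),
}


def estimated_cost(counts: List[float], unit_costs: List[float]) -> float:
    """Dot product of event counts and unit costs."""
    return sum(c * u for c, u in zip(counts, unit_costs))


def choose_plan(len_a: int, len_b: int,
                models: Dict[str, Tuple[CountFn, List[float]]] = MODELS
                ) -> Tuple[str, Dict[str, float]]:
    """Return the algorithm with the lowest estimated cost, plus all estimates."""
    short, long_ = sorted((len_a, len_b))
    if short < 1:
        raise ValueError("both lists must be non-empty")
    costs = {name: estimated_cost(count_fn(short, long_), unit)
             for name, (count_fn, unit) in models.items()}
    return min(costs, key=costs.__getitem__), costs


if __name__ == "__main__":
    print(choose_plan(10**6, 10**6)[0])  # 2-Merge  (slide 22, scenario #1)
    print(choose_plan(10**3, 10**6)[0])  # 2-Gallop (slide 22, scenario #2)
```

Because each estimate is only a handful of arithmetic operations on the list lengths, the optimizer's own overhead stays negligible, which is the requirement stated in note 42.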