London Information Retrieval Meetup
19 Feb 2019
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Skipping
Elia Porciani, Software Engineer
19th February 2019
Introduction
● Top-k retrieval and inverted index
● Introduction to early termination techniques
● Block-Max WAND
Faster BlockMax WAND with Variable-sized Blocks
A Mallia, G Ottaviano, E Porciani, N Tonellotto, R Venturini
SIGIR, 2017
Faster BlockMax WAND with Longer Skipping
A Mallia, E Porciani
ECIR, 2019
Queries over search engines
Inverted index
Documents
term1 term2
term3 term4
term5
Inverted index compression
We compressed posting lists with
partitioned Elias-Fano.
Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’14.
Inverted list: 1 2 5 7 12 13 14 20
D-gaps:        1 1 3 2 5 1 1 6
Only a few bits are necessary to store each item of an inverted list.
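The d-gap transformation shown on this slide can be sketched in a few lines. This is only an illustration of the gap idea using the slide's own numbers; it is not partitioned Elias-Fano, and the function names `to_dgaps` and `from_dgaps` are made up for this example.

```python
def to_dgaps(doc_ids):
    """Turn a strictly increasing list of doc-ids into gaps between neighbours."""
    gaps = []
    previous = 0  # assumption: the first gap is the first doc-id itself
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Rebuild the doc-ids with a running (prefix) sum over the gaps."""
    doc_ids = []
    current = 0
    for gap in gaps:
        current += gap
        doc_ids.append(current)
    return doc_ids


inverted_list = [1, 2, 5, 7, 12, 13, 14, 20]
dgaps = to_dgaps(inverted_list)
print(dgaps)  # [1, 1, 3, 2, 5, 1, 1, 6]
```

The gaps are much smaller than the doc-ids themselves, which is why each item needs only a few bits.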
Top-K Retrieval
We are interested only in the top k
documents, with k small.
Tf-Idf
In detail, we use Okapi BM25
Posting list of a term:
Doc-ids:     1 2 5 7 12 13 14 20
Frequencies: 3 2 1 8 2 4 6 2
Term frequency: $tf_{ij} = \frac{|n_{ij}|}{|d_j|}$
Inverse document frequency: $idf_i = \frac{|D|}{|\{d : i \in d\}|}$
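As a rough illustration of the two ratios on this slide, the sketch below computes them exactly as written there, over a made-up toy collection. Practical scoring functions, Okapi BM25 included, additionally apply a logarithm to the idf and saturate the term frequency; none of that is shown here, and the function names are hypothetical.

```python
def term_frequency(term, document):
    """tf_ij: occurrences of the term divided by the document length."""
    return document.count(term) / len(document)


def inverse_document_frequency(term, documents):
    """idf_i: number of documents divided by the number containing the term.

    Assumes the term occurs in at least one document.
    """
    containing = sum(1 for document in documents if term in document)
    return len(documents) / containing


# Hypothetical toy collection: each document is a list of tokens.
documents = [
    ["top", "k", "retrieval"],
    ["inverted", "index", "retrieval", "retrieval"],
    ["block", "max", "wand"],
]
tf = term_frequency("retrieval", documents[1])
idf = inverse_document_frequency("retrieval", documents)
print(tf, idf, tf * idf)  # 0.5 1.5 0.75
```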
Inverted list iterator operations
next()        Find the next document id
nextGEQ(k)    Find the next document id in the list with id >= k
score()       Compute the score of the current document id, considering its associated frequency
The score() function involves floating-point computations.
Iterating over the inverted index is expensive because it is compressed.
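A minimal sketch of the three iterator operations, assuming an uncompressed in-memory list. The class name, the `weight` parameter and the `END` sentinel are assumptions made for this example, and the toy `score()` is not BM25; a real index would decode compressed blocks instead of using a binary search over a Python list.

```python
import bisect


class PostingListIterator:
    """Hypothetical in-memory cursor over one uncompressed posting list."""

    END = float("inf")  # sentinel doc-id returned once the list is exhausted

    def __init__(self, doc_ids, frequencies, weight=1.0):
        self.doc_ids = doc_ids          # sorted doc-ids
        self.frequencies = frequencies  # frequency stored with each doc-id
        self.weight = weight            # stand-in for a per-term weight
        self.position = 0

    def docid(self):
        if self.position < len(self.doc_ids):
            return self.doc_ids[self.position]
        return self.END

    def next(self):
        """Move to the next doc-id in the list."""
        if self.position < len(self.doc_ids):
            self.position += 1
        return self.docid()

    def next_geq(self, k):
        """Move to the first doc-id >= k, never moving backwards."""
        self.position = bisect.bisect_left(self.doc_ids, k, lo=self.position)
        return self.docid()

    def score(self):
        """Toy floating-point score of the current posting (not BM25)."""
        return self.weight * self.frequencies[self.position]


cursor = PostingListIterator([1, 2, 5, 7, 12, 13, 14, 20], [3, 2, 1, 8, 2, 4, 6, 2])
print(cursor.next_geq(6), cursor.score())  # 7 8.0
```

Keeping `next_geq` forward-only mirrors how skipping is used by the early termination algorithms: cursors only ever advance.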
Ranked-Or
[Figure: posting lists of terms T1 to T5 aligned by doc-id]
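For reference, exhaustive disjunctive (ranked-OR) evaluation can be sketched as follows. Every posting of every query term is scored, which is the baseline that early termination tries to avoid. The input format (one dict of precomputed scores per term) is an assumption for this example, and for brevity the scores are accumulated one list at a time, which may differ from the traversal order used in the talk.

```python
import heapq


def ranked_or(posting_lists, k):
    """Exhaustive disjunctive top-k: score every document in any list.

    posting_lists: one dict per query term mapping doc-id -> precomputed score
    (a stand-in for iterating a compressed list and calling score()).
    """
    totals = {}
    for postings in posting_lists:
        for doc_id, score in postings.items():
            totals[doc_id] = totals.get(doc_id, 0.0) + score
    # Highest score first; ties broken by smaller doc-id.
    return heapq.nsmallest(k, totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for three query terms.
lists = [{1: 1.0, 3: 2.0}, {3: 1.5, 4: 0.5}, {1: 0.25}]
print(ranked_or(lists, 2))  # [(3, 3.5), (1, 1.25)]
```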
Early termination techniques
● It is not necessary to compute the score function on all the postings.
● MaxScore
● WAND
● Block-Max WAND
These algorithms compute the exact top-k documents.
Wand
[Figure: posting lists T1 to T5 aligned by doc-id, with the pivot list marked]
θ = 15.8
Maximum scores (Ms) of the lists: 5.4, 5.0, 4.2, 4.3, 2.3
Partial sums of maximum scores shown: 5.4, 14.6, 9.6, 16.9
Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. Efficient query evaluation using a two-level retrieval process. CIKM ’03.
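WAND's pivot selection can be sketched as below: lists are ordered by their current doc-id and their maximum scores are accumulated until the sum exceeds the threshold. The maximum scores and threshold are the ones on the slide; the current doc-ids are invented, and the assignment of scores to list order is inferred from the partial sums shown on the slide (5.4, 9.6, 14.6, 16.9), so treat both as assumptions. Whether the comparison is strict is also a choice made for this sketch.

```python
def select_pivot(cursors, theta):
    """Return the index of the pivot list, or None if no document can qualify.

    cursors: (current_doc_id, max_score) pairs, one per query term.
    theta: current threshold, i.e. the k-th best score found so far.
    """
    # Visit the lists in increasing order of their current doc-id.
    ordered = sorted(range(len(cursors)), key=lambda i: cursors[i][0])
    upper_bound = 0.0
    for index in ordered:
        upper_bound += cursors[index][1]
        if upper_bound > theta:
            return index
    return None


# T1..T5 with the slide's max scores; the doc-ids are hypothetical.
cursors = [(3, 5.4), (9, 5.0), (5, 4.2), (20, 4.3), (12, 2.3)]
pivot = select_pivot(cursors, theta=15.8)
print(pivot, cursors[pivot][0])  # 4 12
```

Documents before the pivot's doc-id cannot reach the threshold even with every preceding list contributing its maximum score, so they can be skipped without being scored.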
Block-Max-Wand
[Figure: posting lists T1 to T5 aligned by doc-id and split into blocks, each labeled with its block-max score; the pivot list is marked]
θ = 15.8
Block-max scores shown: 3.2, 4.2, 4.5, 5.4, 2.1, 2.1, 3.2, 2.1, 2.3, 0.8, 3.5, 1.4, 4.1, 2.7, 2.0
Block max upper estimation = 10.2
Maximum scores (Ms) of the lists: 5.4, 5.0, 4.2, 4.3, 2.3
Partial sums of maximum scores shown: 5.4, 14.6, 9.6, 16.9
The smaller the average approximation error, the better the performance.
Shuai Ding and Torsten Suel. Faster top-k document retrieval using block-max indexes. SIGIR ’11.
Block Max Wand
1. Select the pivot as in WAND.
2. Compute the sum of the block-max contributions (block-max sum) for the pivot doc-id.
3. If the block-max sum exceeds the threshold:
   a. Fully evaluate the pivot document.
   b. Move the iterator to pivot.docid + 1.
4. Otherwise, move the iterator to the leftmost boundary of the blocks evaluated.
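Steps 2 to 4 above can be sketched as a single decision function. The function name, the tuple layout and the example values are hypothetical (the block maxima were merely chosen to sum to 10.2, the estimate shown in the earlier figure). The optional `next_list_doc_id` cap follows the original Block-Max WAND description by Ding and Suel rather than anything stated on the slide.

```python
def block_max_check(blocks, theta, next_list_doc_id=None):
    """Decide what to do with the pivot document.

    blocks: (block_max_score, block_last_doc_id) for the block that contains
            the pivot doc-id in each list up to and including the pivot list.
    next_list_doc_id: current doc-id of the first list after the pivot, if any.
    Returns ("evaluate", None) or ("skip", next_doc_id).
    """
    block_max_sum = sum(block_max for block_max, _ in blocks)
    if block_max_sum > theta:
        return ("evaluate", None)
    # The tighter bound rules out every document up to the block that ends
    # first, so jump just past that leftmost boundary.
    target = min(last_doc_id for _, last_doc_id in blocks) + 1
    # Do not jump over a document that a later list could still contribute to.
    if next_list_doc_id is not None:
        target = min(target, next_list_doc_id)
    return ("skip", target)


blocks = [(4.5, 40), (3.2, 35), (2.5, 52)]  # hypothetical blocks, sum about 10.2
print(block_max_check(blocks, theta=15.8))  # ('skip', 36)
```

With the list-wide maximum scores the pivot looked promising (16.9 against 15.8), but the block-level bound is well below the threshold, so the document is never fully scored.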
Partitioning
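Fixed-size blocks vs. variable-size blocks
Approximation error of a partition B:
$\sum_{b \in B} \left( \max_{s \in b} s \cdot |b| - \sum_{s \in b} s \right)$
Objective:
$\min \sum_{b \in B} \left( \max_{s \in b} s \cdot |b| \right) + S$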
Shortest Path Problem
• V: the postings, sorted by their position in the list
• E: every possible block in the list
• C(i,j): the approximation error of block (i, j)
We add a fixed cost F to the cost function C(i,j).
Solving the problem exactly takes $O(n^2)$ time.
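The exact quadratic solution can be sketched as a dynamic program over block end positions, which is equivalent to a shortest path on the graph described above. The error of a block is taken to be its maximum score times its length minus the sum of its scores, plus the fixed cost F per block. This is a plain sketch under those assumptions, with nonnegative scores assumed; it is not the approximation algorithm and not the authors' implementation.

```python
def optimal_partition(scores, fixed_cost):
    """Exact O(n^2) dynamic program over block end positions.

    best[j] is the cheapest way to split scores[:j] into blocks, where a block
    scores[i:j] costs its approximation error plus fixed_cost.
    Returns (total_cost, list of block end positions).
    """
    n = len(scores)
    best = [0.0] + [float("inf")] * n
    previous = [0] * (n + 1)
    for j in range(1, n + 1):
        block_max = 0.0  # assumes nonnegative scores
        block_sum = 0.0
        # Grow the candidate block leftwards: scores[i:j] for i = j-1 .. 0.
        for i in range(j - 1, -1, -1):
            block_max = max(block_max, scores[i])
            block_sum += scores[i]
            error = block_max * (j - i) - block_sum
            cost = best[i] + error + fixed_cost
            if cost < best[j]:
                best[j] = cost
                previous[j] = i
    # Walk the back-pointers to recover the block boundaries.
    ends = []
    j = n
    while j > 0:
        ends.append(j)
        j = previous[j]
    return best[n], ends[::-1]


print(optimal_partition([1.0, 1.0, 1.0, 5.0, 5.0, 5.0], fixed_cost=0.5))
# (1.0, [3, 6])
```

In the example the two runs of equal scores become two blocks with zero error, so only the fixed cost is paid twice. Without the fixed cost, one block per posting would always be optimal, which is why F is added.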
Approximation algorithm
● Monotonicity: $O(n^2) \to O(n \log U)$
  $C(i, j) \le C(i, j + 1)$
  $C(i, j) \le C(i - 1, j)$
  $G_1 = \{(i, j) \in G \mid \exists k.\ C(i, j) \le F(1 + \alpha)^k \le C(i, j + 1)\}$
● Quasi-subadditivity: $O(n \log U) \to O(n)$
  $C(i, k) + C(k + 1, j) \le C(i, j) + F + 1$
  $G_2 = \{(i, j) \in G_1 \mid C(i, j) \le F/\beta\}$
$sp(G_2) \le (1 + \alpha)(1 + \beta)\, sp(G)$
Experimental analysis
Collection   Size      Size after compression   # lists      # postings   # documents
Gov2         44 GiB    4.4 GiB                  35 million   6 billion    24 million
Clueweb09    120 GiB   15 GiB                   92 million   15 billion   50 million
● Trec2005 and Trec2006 query collections.
● The code is written in C++11 and compiled with GCC 5.3.1 with the highest optimization settings.
● It is executed on an 8-core i7-4790K with 32 GiB RAM running Linux kernel v. 4.4.0.
Choosing block size
[Figure: two plots with block size on the x-axis]
Block-Max-Wand Compression
Maximum impact
element
Boundary doc-id
Block-Max-Wand Compression (score quantization)
Uniform partitioning
Opt partitioning
Sort
Compression algorithms comparisons
Time in ms
            Gov2                            Clueweb09
            Trec2005       Trec2006         Trec2005        Trec2006
Wand        7.06 (1.92x)   12.92 (1.55x)    28.85 (2.25x)   37.55 (1.40x)
MaxScore    6.59 (1.79x)   11.35 (1.36x)    23.58 (1.84x)   32.28 (1.21x)
BMW         3.67           8.33             12.81           26.64

Space in bits per posting
                 Gov2           Clueweb09
Plain index      6.91           8.36
Wand/MaxScore    7.24 (1.04x)   8.65 (1.03x)
BMW/VBMW         9.14 (1.32x)   10.68 (1.27x)
VBMW c.          8.07 (1.16x)   9.51 (1.13x)

Time in ms
            Gov2                            Clueweb09
            Trec2005       Trec2006         Trec2005        Trec2006
Wand        7.06 (3.34x)   12.92 (2.72x)    28.85 (3.98x)   37.55 (2.55x)
MaxScore    6.59 (3.12x)   11.35 (2.39x)    23.58 (3.25x)   32.28 (2.11x)
BMW         3.67 (1.73x)   8.33 (1.75x)     12.81 (1.77x)   26.64 (1.74x)
VBMW        2.11           4.75             7.25            15.30
VBMW c.     2.35 (1.11x)   5.29 (1.11x)     8.21 (1.13x)    17.00 (1.11x)
Longer skipping
We can do better than skipping to the block boundary.
[Figure labels: Boundary, Ls-boundary]
● Iterate over the blocks at runtime
● Add a pointer per block
Longer skipping
ClueWeb - Trec2005
              2              3              4              5               6+
VBMW          3.17 (1.45x)   6.39 (1.13x)   8.92 (1.04x)   14.46 (1.00x)   32.04 (1.03x)
VBMW LS       2.18           5.66           8.57           14.44           31.05 (1.04x)
VBMW c.       3.53 (1.31x)   6.97 (1.15x)   9.86 (1.04x)   16.06 (1.00x)   36.26 (1.01x)
VBMW LSP c.   2.68           6.3            9.52           16.07           36.01
Thank you

 

Recently uploaded (20)

The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Skipping

  • 1. London Information Retrieval Meetup 19 Feb 2019 Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Skipping Elia Porciani, Software Engineer 19th February 2019
  • 2. Introduction ● Top-k retrieval and inverted index ● Introduction to early termination techniques ● Block max wand. A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto, R. Venturini. Faster BlockMax WAND with Variable-sized Blocks. SIGIR, 2017. A. Mallia, E. Porciani. Faster BlockMax WAND with Longer Skipping. ECIR, 2019.
  • 5. Inverted index compression We compressed posting lists with partitioned Elias-Fano (Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’14). Inverted list: 1 2 5 7 12 13 14 20. D-gaps: 1 1 3 2 5 1 1 6. Only a few bits are necessary to store each item of an inverted list.
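The d-gap row on this slide can be reproduced with a few lines of Python. This is only a sketch of the gap transformation itself, not of partitioned Elias-Fano; the function names `to_dgaps` and `from_dgaps` are made up for the illustration.

```python
from itertools import accumulate

def to_dgaps(doc_ids):
    """Replace each doc id by its distance from the previous one (the first id is kept)."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]

def from_dgaps(gaps):
    """Rebuild the doc ids with a running sum over the gaps."""
    return list(accumulate(gaps))

inverted_list = [1, 2, 5, 7, 12, 13, 14, 20]   # the list shown on the slide
dgaps = to_dgaps(inverted_list)
print(dgaps)
print(from_dgaps(dgaps) == inverted_list)
```

The gaps are much smaller than the doc ids they replace, which is why each item can be stored in few bits.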
  • 6. Top-K Retrieval We are interested only in the K highest-scoring documents, with K small.
  • 7. Tf-Idf In detail, we use Okapi BM25. Doc-id: 1 2 5 7 12 13 14 20. Frequencies: 3 2 1 8 2 4 6 2. Term frequency: $tf_{ij} = \frac{|n_{ij}|}{|d_j|}$. Inverse document frequency: $idf_i = \frac{|D|}{|\{d : i \in d\}|}$.
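As a small worked example, the two ratios on this slide can be computed over a toy corpus. The corpus and the function names are invented for the illustration, and the code follows the slide's formulas literally (many tf-idf variants additionally take a logarithm of the idf ratio, and the talk itself scores with BM25).

```python
def tf(term, doc):
    """Occurrences of `term` in `doc` divided by the document length."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Number of documents divided by the number of documents containing `term`.

    Raises ZeroDivisionError if no document contains the term.
    """
    containing = sum(1 for doc in docs if term in doc)
    return len(docs) / containing

# Hypothetical four-document corpus, already tokenized.
docs = [
    ["top", "k", "retrieval"],
    ["inverted", "index", "retrieval"],
    ["block", "max", "wand"],
    ["top", "k", "top"],
]
print(tf("top", docs[3]), idf("retrieval", docs))
```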
  • 8. Inverted list iterator operations next(): find the next document id. nextGEQ(k): find the next document id in the list with id >= k. score(): compute the score of the current document id, considering the associated frequency. The score() function involves floating-point computations. Iterating over the inverted index is expensive because it is compressed.
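The three operations can be sketched over an uncompressed list as follows. This is a minimal illustration, not the index used in the talk: the class name, the `END` sentinel and the placeholder `score()` (which just returns the frequency instead of a BM25 value) are assumptions.

```python
from bisect import bisect_left

END = float("inf")  # hypothetical sentinel returned once the list is exhausted

class PostingIterator:
    def __init__(self, doc_ids, freqs):
        self.doc_ids = doc_ids   # sorted document ids
        self.freqs = freqs       # term frequency of each posting
        self.pos = 0

    def docid(self):
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else END

    def next(self):
        """Move to the next document id."""
        self.pos = min(self.pos + 1, len(self.doc_ids))
        return self.docid()

    def next_geq(self, k):
        """Move to the first document id >= k, searching only forward."""
        self.pos = bisect_left(self.doc_ids, k, self.pos)
        return self.docid()

    def score(self):
        """Placeholder score: the raw frequency as a float (the talk uses BM25)."""
        return float(self.freqs[self.pos])
```

With the doc ids and frequencies from the Tf-Idf slide, `next_geq(6)` lands on document 7, whose frequency is 8.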
  • 10. Early termination techniques ● It is not necessary to compute the score function on all the postings. ● MaxScore ● Wand ● BlockMaxWand. These algorithms compute the exact top-k documents.
  • 11. Wand [Figure: posting lists T1 to T5 along the doc-id axis, with threshold 𝜭 = 15.8, max scores Ms = 5.4, 5.0, 4.2, 4.3, 2.3, partial sums 5.4, 14.6, 9.6, 16.9, and the pivot list marked.] Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Zien. Efficient query evaluation using a two-level retrieval process. CIKM ’03.
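Pivot selection in Wand can be sketched with the numbers from this slide. The assignment of the max scores to T1..T5 in listing order, the current doc ids and the strict comparison against the threshold are assumptions made for the illustration; the figure itself is not recoverable from the transcript.

```python
def select_pivot(cursors, threshold):
    """cursors: (current doc id, list-wide max score) pairs, one per term.

    Sort the lists by current doc id and accumulate their max scores;
    the pivot is the first list at which the running sum exceeds the threshold.
    Returns (index in sorted order, pivot doc id), or None if no document can qualify.
    """
    upper_bound = 0.0
    for i, (docid, max_score) in enumerate(sorted(cursors)):
        upper_bound += max_score
        if upper_bound > threshold:
            return i, docid
    return None

# Hypothetical current doc ids; max scores taken from the slide.
cursors = [
    (10, 5.4),  # T1
    (17, 5.0),  # T2
    (14, 4.2),  # T3
    (30, 4.3),  # T4
    (23, 2.3),  # T5
]
print(select_pivot(cursors, 15.8))
```

Under these assumptions the running sums are 5.4, 9.6, 14.6 and 16.9, so the fourth list in doc-id order becomes the pivot and no document before its current doc id needs to be scored.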
  • 12. Block-Max-Wand [Figure: posting lists T1 to T5 along the doc-id axis, split into blocks with block maxima 3.2, 4.2, 4.5, 5.4, 2.1, 2.1, 3.2, 2.1, 2.3, 0.8, 3.5, 1.4, 4.1, 2.7, 2.0; threshold 𝜭 = 15.8; max scores Ms = 5.4, 5.0, 4.2, 4.3, 2.3; partial sums 5.4, 14.6, 9.6, 16.9; block max upper estimation = 10.2; pivot list marked.] The smaller the average approximation error, the better the performance. Shuai Ding and Torsten Suel. Faster top-k document retrieval using block-max indexes. SIGIR ’11.
  • 13. Block Max Wand 1. Select the pivot as in Wand. 2. Compute the sum of the block max contributions (block max sum) for the pivot doc-id. 3. If the block max sum exceeds the threshold: 3.1. Fully evaluate the pivot document. 3.2. Move the iterators to pivot.docid + 1. 4. Otherwise, move the iterator to the leftmost boundary of the evaluated blocks.
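The steps above can be sketched end to end on uncompressed lists. This is a simplified illustration under several assumptions, not the implementation evaluated in the talk: scores are precomputed per posting and strictly positive, blocks have a fixed size, every list up to the pivot is advanced on a skip, and all names (`BlockMaxList`, `bmw_top_k`, `make_lists`) are hypothetical.

```python
import heapq
from bisect import bisect_left

END = float("inf")  # sentinel doc id of an exhausted list

class BlockMaxList:
    """Posting list with per-posting scores and fixed-size blocks."""
    def __init__(self, doc_ids, scores, block_size=2):
        self.doc_ids, self.scores, self.pos = doc_ids, scores, 0
        self.max_score = max(scores)
        starts = range(0, len(doc_ids), block_size)
        self.block_last = [doc_ids[min(s + block_size, len(doc_ids)) - 1] for s in starts]
        self.block_max = [max(scores[s:s + block_size]) for s in starts]

    def docid(self):
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else END

    def next_geq(self, target):
        self.pos = bisect_left(self.doc_ids, target, self.pos)

    def block_at(self, target):
        """(block max, last doc id) of the block that would contain `target`."""
        b = bisect_left(self.block_last, target)
        return (self.block_max[b], self.block_last[b]) if b < len(self.block_last) else (0.0, END)

def bmw_top_k(lists, k):
    heap, threshold = [], 0.0  # min-heap of (score, doc id); k >= 1 assumed
    while True:
        lists.sort(key=BlockMaxList.docid)
        # 1. Pivot selection as in Wand, using the list-wide max scores.
        acc, pivot = 0.0, None
        for i, lst in enumerate(lists):
            if lst.docid() == END:
                break
            acc += lst.max_score
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            return sorted(heap, reverse=True)
        pivot_doc = lists[pivot].docid()
        while pivot + 1 < len(lists) and lists[pivot + 1].docid() == pivot_doc:
            pivot += 1  # include every list already positioned on the pivot document
        head = lists[:pivot + 1]
        # 2. Block max sum for the pivot doc id.
        blocks = [lst.block_at(pivot_doc) for lst in head]
        if sum(block_max for block_max, _ in blocks) > threshold:
            # 3. Fully evaluate the pivot document, then move past it.
            for lst in head:
                lst.next_geq(pivot_doc)
            score = sum(lst.scores[lst.pos] for lst in head if lst.docid() == pivot_doc)
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0]
            target = pivot_doc + 1
        else:
            # 4. Skip past the leftmost block boundary, but not beyond the next list.
            target = min(last for _, last in blocks) + 1
            if pivot + 1 < len(lists):
                target = min(target, lists[pivot + 1].docid())
        for lst in head:
            lst.next_geq(target)

# Hypothetical toy index: (doc ids, per-posting scores) for three terms.
data = [
    ([1, 3, 5, 8, 13, 21], [1.0, 2.5, 0.5, 3.0, 1.5, 2.0]),
    ([2, 3, 8, 9, 21, 30], [0.5, 1.0, 4.0, 0.5, 1.0, 2.5]),
    ([3, 5, 9, 13, 30, 31], [2.0, 1.5, 1.0, 0.5, 3.5, 0.25]),
]

def make_lists(data):
    return [BlockMaxList(ids, scores) for ids, scores in data]

print(bmw_top_k(make_lists(data), 3))
```

Because the skip is taken only when the block maxima prove that no document in the skipped range can beat the threshold, the result matches an exhaustive evaluation on this toy data.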
  • 14. Partitioning Fixed size blocks. Variable size blocks. $\sum_{b \in B} \left( |b| \max_{s \in b} s - \sum_{s \in b} s \right)$. $\min \sum_{b \in B} \left( |b| \max_{s \in b} s \right) + S$.
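To make the difference between the two partitionings concrete, the sketch below computes the approximation error of a partitioning, taken here as the sum over blocks of (block max times block length minus the sum of the scores in the block). The score list and the function name are made up for the illustration.

```python
def approximation_error(scores, boundaries):
    """boundaries: exclusive end index of each consecutive block; the last one is len(scores)."""
    error, start = 0.0, 0
    for end in boundaries:
        block = scores[start:end]
        error += max(block) * len(block) - sum(block)
        start = end
    return error

# Hypothetical per-posting scores with two plateaus.
scores = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0, 1.0]

fixed_error = approximation_error(scores, [2, 4, 6, 8])   # four blocks of size 2
variable_error = approximation_error(scores, [3, 7, 8])   # three blocks aligned with the plateaus
print(fixed_error, variable_error)
```

On this input the variable-size partitioning uses fewer blocks and still has a lower error, because its boundaries follow the score distribution.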
  • 15. Shortest Path Problem • V: postings sorted by their position in the list • E: every possible block in the list • C(i,j): the approximation error. We add a fixed cost F to the cost function C(i,j). $O(n^2)$
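The exact quadratic solution can be sketched as a dynamic program over the implicit graph: `best[j]` is the cheapest way to partition the first `j` postings, and every edge (i, j) costs the approximation error of the block `scores[i:j]` plus the fixed cost F. The function name and the toy input are assumptions, and the error definition is the one assumed for the partitioning slide.

```python
def optimal_partition(scores, fixed_cost):
    """Return (total cost, exclusive block end indices) of the cheapest partitioning."""
    n = len(scores)
    best = [0.0] + [float("inf")] * n   # best[j]: min cost to cover scores[:j]
    prev = [0] * (n + 1)                # prev[j]: start of the last block in that solution
    for i in range(n):                  # nodes are visited in topological order
        block_max, block_sum = float("-inf"), 0.0
        for j in range(i + 1, n + 1):   # edge (i, j) is the block scores[i:j]
            block_max = max(block_max, scores[j - 1])
            block_sum += scores[j - 1]
            cost = best[i] + block_max * (j - i) - block_sum + fixed_cost
            if cost < best[j]:
                best[j], prev[j] = cost, i
    boundaries, j = [], n
    while j > 0:                        # walk the shortest path backwards
        boundaries.append(j)
        j = prev[j]
    return best[n], boundaries[::-1]

scores = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0, 1.0]  # hypothetical per-posting scores
print(optimal_partition(scores, 1.0))
print(optimal_partition(scores, 10.0))
```

A small F favours many precise blocks, while a large F pushes the solution toward a single coarse block, which is the trade-off the fixed cost is meant to control.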
  • 16. Approximation algorithm ● Monotonicity: $C(i, j) \le C(i, j + 1)$ and $C(i, j) \le C(i - 1, j)$. $O(n^2) \to O(n \log U)$ with $G_1 = \{(i, j) \in G \mid \exists k .\ C(i, j) \le F(1 + \alpha)^k \le C(i, j + 1)\}$. ● Quasi-subadditivity: $C(i, k) + C(k + 1, j) \le C(i, j) + F + 1$. $O(n \log U) \to O(n)$ with $G_2 = \{(i, j) \in G_1 \mid C(i, j) \le F/\beta\}$. $sp(G_2) = (1 + \alpha)(1 + \beta)\, sp(G)$
  • 17. Experimental analysis
    Collection: size; size after compression; # lists; # postings; # documents
    Gov2: 44 GiB; 4.4 GiB; 35 million; 6 billion; 24 million
    Clueweb09: 120 GiB; 15 GiB; 92 million; 15 billion; 50 million
    ● Trec2005 and Trec2006 query collections.
    ● The code is written in C++11 and compiled with GCC 5.3.1 with the highest optimization settings, and it is executed on an 8-core i7-4790K with 32 GiB of RAM running Linux kernel v. 4.4.0.
  • 18. Choosing block size [Figure: two plots, each with block size on the x-axis.]
  • 20. Block-Max-Wand Compression (score quantization) Uniform partitioning Opt partitioning Sort
  • 22. Time in ms (ratios in parentheses are relative to the BMW row)
    Columns: Gov2 Trec2005; Gov2 Trec2006; Clueweb09 Trec2005; Clueweb09 Trec2006
    Wand: 7.06 (1.92x); 12.92 (1.55x); 28.85 (2.25x); 37.55 (1.40x)
    MaxScore: 6.59 (1.79x); 11.35 (1.36x); 23.58 (1.84x); 32.28 (1.21x)
    BMW: 3.67; 8.33; 12.81; 26.64
    Space in bits per posting (ratios in parentheses are relative to the plain index)
    Columns: Gov2; Clueweb09
    Plain index: 6.91; 8.36
    Wand/MaxScore: 7.24 (1.04x); 8.65 (1.03x)
    BMW/VBMW: 9.14 (1.32x); 10.68 (1.27x)
    VBMW c.: 8.07 (1.16x); 9.51 (1.13x)
    Time in ms (ratios in parentheses are relative to the VBMW row)
    Columns: Gov2 Trec2005; Gov2 Trec2006; Clueweb09 Trec2005; Clueweb09 Trec2006
    Wand: 7.06 (3.34x); 12.92 (2.72x); 28.85 (3.98x); 37.55 (2.55x)
    MaxScore: 6.59 (3.12x); 11.35 (2.39x); 23.58 (3.25x); 32.28 (2.11x)
    BMW: 3.67 (1.73x); 8.33 (1.75x); 12.81 (1.77x); 26.64 (1.74x)
    VBMW: 2.11; 4.75; 7.25; 15.30
    VBMW c.: 2.35 (1.11x); 5.29 (1.11x); 8.21 (1.13x); 17.00 (1.11x)
  • 23. Longer skipping We can do better than skipping to the block boundary. [Figure: the block boundary compared with the longer-skipping boundary (LS-boundary).] ● Iterate over the blocks at runtime ● Add a pointer per block
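One possible reading of the two options on this slide, for a single list, is sketched below: either scan the following blocks at query time until one has a block max above a given budget, or precompute for every block a pointer to the next block with a larger block max and follow those pointers. This is an assumption about the idea, not the construction from the cited paper; the names, the `budget` parameter and the block maxima are hypothetical.

```python
def next_greater_pointers(block_max):
    """For each block, index of the first later block with a strictly larger max (or len)."""
    n = len(block_max)
    ptr = [n] * n
    stack = []  # blocks still waiting for a larger block max
    for j, value in enumerate(block_max):
        while stack and block_max[stack[-1]] < value:
            ptr[stack.pop()] = j
        stack.append(j)
    return ptr

def skip_runtime(block_max, start, budget):
    """Option 1: iterate over the blocks until one may contribute more than `budget`."""
    j = start
    while j < len(block_max) and block_max[j] <= budget:
        j += 1
    return j

def skip_with_pointers(block_max, ptr, start, budget):
    """Option 2: follow the per-block pointers; blocks jumped over cannot exceed `budget`."""
    j = start
    while j < len(block_max) and block_max[j] <= budget:
        j = ptr[j]
    return j

block_max = [2.1, 0.8, 1.4, 2.0, 3.5, 1.0, 4.1]  # hypothetical block maxima of one list
ptr = next_greater_pointers(block_max)
print(ptr, skip_with_pointers(block_max, ptr, 0, 2.5))
```

Both functions return the same block; the pointer variant trades one extra integer per block for fewer comparisons when many consecutive blocks stay under the budget.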
  • 24. Block-Max-Wand [Figure: posting lists T1 to T5 along the doc-id axis, split into blocks with block maxima 3.2, 4.2, 4.5, 5.4, 2.1, 2.1, 3.2, 2.1, 2.3, 0.8, 3.5, 1.4, 4.1, 2.7, 2.0; threshold 𝜭 = 15.8; max scores Ms = 5.4, 5.0, 4.2, 4.3, 2.3; partial sums 5.4, 14.6, 9.6, 16.9; block max upper estimation = 10.2; pivot list marked.] Shuai Ding and Torsten Suel. Faster top-k document retrieval using block-max indexes. SIGIR ’11.
  • 25. Longer skipping (ClueWeb - Trec2005)
    Columns: 2; 3; 4; 5; 6+
    VBMW: 3.17 (1.45x); 6.39 (1.13x); 8.92 (1.04x); 14.46 (1.00x); 32.04 (1.03x)
    VBMW LS: 2.18; 5.66; 8.57; 14.44; 31.05 (1.04x)
    VBMW c.: 3.53 (1.31x); 6.97 (1.15x); 9.86 (1.04x); 16.06 (1.00x); 36.26 (1.01x)
    VBMW LSP c.: 2.68; 6.3; 9.52; 16.07; 36.01