SlideShare a Scribd company logo
1 of 25
Query Reranking As A
Service
ABOLFAZL ASUDEH
NAN ZHANG
GAUTAM DAS
© 2016 VLDB Endowment 21508097/16/03
UNIVERSITY OF TEXAS AT ARLINGTON
GEORGE WASHINGTON UNIVERSITY
UNIVERSITY OF TEXAS AT ARLINGTON
Ranked Retrieval Model
Practical Challenges
How to design a small number of “good” ranking functions when different users often have
diverse and sometimes contradicting preferences
◦ e.g., “transit time” in airfare search, “mileage per year” in used car search
◦ Similarly, users with disabilities or special needs often need special ranking functions
Some database owners lack the expertise, resources, or even motivation to properly study the
requirements of their users
Motivation
Backend
DB
Top-k
web
Interface
Reranking
Service
User
filtering+
ranking
Applications
Meta-Ranking Service
◦ remember user preferences across multiple web databases (e.g., multiple car dealers)
Applications dedicated for users with disabilities, special needs, etc.
Critical Requirements
Accuracy: The output query answer must precisely follow the user-specified ranking function
Versatility: Support a wide variety of user-specific ranking function (so long as it is monotonic).
Allow pass-through of selection conditions. No knowledge of system ranking function.
Efficiency: the number of queries issued to the web database must be minimized.
◦ Because almost all such databases enforce certain query-rate limit.
◦ e.g., Google Flights allows only 50 free queries per user per day.
Solution Novelty: Baselines
Crawling the entire database
◦ Problem: efficiency
”Page down” [TZD13a, TZD13b] to retrieve more than k tuples
◦ Problem: no guarantee of accuracy unless retrieving all n tuples
Database Model
Hidden (web) Database
◦ Limited query interface
◦ Ranked Retrieval Model (Top-k)
Problem Statement
Query Reranking (Get-Next) Problem:
◦ Consider: a hidden web database D with a top-k interface
◦ with an arbitrary, unknown, system ranking function.
◦ Given:
◦ a user query q
◦ a user-specied monotonic ranking function S
◦ the top-h (h ≥ 0 can be greater than, equal to, or smaller than k) tuples satisfying q according to S
◦ Find
◦ the No. (h + 1) tuple for q while minimizing the number of queries issued D.
1D-RERANK
1D Re-Ranking Problem
Given:
◦ An attribute
◦ th, the h-ranked tuple according to the attribute
Find:
◦ th+1, the (h+1)-ranked tuple according to the attribute
Why?
◦ 1D  MD using TA or Fagin’s algorithm
◦ Rank by “transit time”
1D-BASELINE
th
q0
t0
q1
t1
q2
th+1
NULL
Problem: the query cost depend on the correlation between the system ranking function and
the attribute!
Negative Result: no algorithm can outperform 1D-BASELINE in the worst-case scenario.
1D-BINARY
th
q0
t0
q1
t1
q2
th+1
q3
NULL NULL
query cost:
1D-RERANK
Key Observation: dense regions  repeated incurrence of high query cost for re-ranking
Key Idea: Crawl dense regions to benefit future re-ranking requests (On-The-Fly Indexing)
follow 1D-BINARY
until search region
is not dense yet
Oracle
ask from oracle
if region is dense
MD-RERANK
Applying Sorted-List Algorithms (e.g. TA)
on 1D-RERANK
The Get-next operation designed by 1D-RERANK can
get used to develop a sorted-access top-k algorithm
(e.g. TA) on top of it and enable HD Get-next.
It has a major efficiency problem, mainly because of
not leveraging the full power provided by client-
server databases.
MD-BASELINE
q0: select *  t0
q1: where x < t0[x] and y < l(y)  null
q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1
q3: where x ≥ t0[x] and x < b(x) and y < t0[y]  null
q4: where x ≥ b(x) and x < I(x) and y < b(y) null
q5: where x < t1[x] and y < t1[y]  null
x
y
t0
l(y)
l(x)
t1
b(y)
b(x)

apply 1D-RERANK
MD-BINARY
q0: select *  t0
q1: where x < t0[x] and y < l(y)  null
q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1
q3: where x ≥ t0[x] and x < v0[x] and y < v0[y]  null
q4: where x ≥ t0[x] and x < b(x) and y < t0[y]  null
q5: where x ≥ b(x) and x < I(x) and y < b(y) null
q6: where x < t1[x] and y < t1[y]  null
x
y
t0
l(y)
l(x)
t1

V0
apply 1D-RERANK
b(x)
b(y)
MD-RERANK
High-level idea: follow MD-BINARY until a remaining search space (1) is covered by a region in
the index; or (2) has volume smaller than the threshold.
On-The-Fly Indexing: proactively record as an index densely located tuples once we encounter
them, so that we do not need to incur a high query cost every time a query q triggers visits to
the same dense region.
Experiments setup
Simulating the Top-k web interface on top of an offline dataset.
◦ US Department of Transportation (DOT): 457,013 tuples and over 28 attributes.
◦ Considered two system ranking functions:
◦ SR1(positively correlated ): 0.3 AIR-TIME + TAXI-IN (SR1) – Default function
◦ SR2 (negatively correlated): -0.1 DISTANCE - DEP-DELAY (SR2)
◦ Default k = 10.
Online Experiments
◦ Blue Nile (BN) diamonds: largest online retailer of diamonds; contained 117,641 tuples (diamonds) over
12 attributes.
◦ Yahoo! Autos (YA): offers a popular search service for used cars; contained 13,169 cars within 30 mile
of New York city over 8 attributes.
Offline Experiment Results
1D, Impact of n (SR1) 1D, Impact of n (SR2) 1D, Impact of System-k
Offline Experiment Results
MD, Impact of n (SR1) MD, Impact of n (SR2) MD, Impact of System-k
Online Experiment Results
BN, 1D: Top-k query cost YA, 1D: Top-k query cost
Online Experiment Results
BN, MD: Top-k query cost YA, MD: Top-k query cost
Thanks!

More Related Content

Similar to Query Reranking As A Service

Efficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databasesEfficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databases
Rui Vieira
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic ranking
FELIX75
 

Similar to Query Reranking As A Service (20)

On Relevant Query Answering over Streaming and Distributed Data
On Relevant Query Answering over Streaming and Distributed DataOn Relevant Query Answering over Streaming and Distributed Data
On Relevant Query Answering over Streaming and Distributed Data
 
Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph Revolution
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 
Secondary Spectrum Usage for Mobile Devices
Secondary Spectrum Usage for Mobile DevicesSecondary Spectrum Usage for Mobile Devices
Secondary Spectrum Usage for Mobile Devices
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
 
Real-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao JiangReal-time Fraudulent Trips Detection with Xueyao Jiang
Real-time Fraudulent Trips Detection with Xueyao Jiang
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
How Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfHow Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdf
 
Efficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databasesEfficient top-k queries processing in column-family distributed databases
Efficient top-k queries processing in column-family distributed databases
 
Junhua wang ai_next_con
Junhua wang ai_next_conJunhua wang ai_next_con
Junhua wang ai_next_con
 
Designing States, Actions, and Rewards for Using POMDP in Session Search
Designing States, Actions, and Rewards for Using POMDP in Session SearchDesigning States, Actions, and Rewards for Using POMDP in Session Search
Designing States, Actions, and Rewards for Using POMDP in Session Search
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
 
probabilistic ranking
probabilistic rankingprobabilistic ranking
probabilistic ranking
 
Deep Learning Inference at speed and scale
Deep Learning Inference at speed and scaleDeep Learning Inference at speed and scale
Deep Learning Inference at speed and scale
 
QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uber
 

More from Abolfazl Asudeh

MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
Abolfazl Asudeh
 
Using incompletely cooperative game theory in wireless sensor networks
Using incompletely cooperative game theory in wireless sensor networksUsing incompletely cooperative game theory in wireless sensor networks
Using incompletely cooperative game theory in wireless sensor networks
Abolfazl Asudeh
 

More from Abolfazl Asudeh (6)

Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...
Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...Efficient Computation ofRegret-ratio Minimizing Set:A Compact Maxima Repres...
Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Repres...
 
[Slides] Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons
[Slides] Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons[Slides] Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons
[Slides] Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
 
GBLENDER: Towards blending visual query formulation and query processing in g...
GBLENDER: Towards blending visual query formulation and query processing in g...GBLENDER: Towards blending visual query formulation and query processing in g...
GBLENDER: Towards blending visual query formulation and query processing in g...
 
Using incompletely cooperative game theory in wireless sensor networks
Using incompletely cooperative game theory in wireless sensor networksUsing incompletely cooperative game theory in wireless sensor networks
Using incompletely cooperative game theory in wireless sensor networks
 
PREGEL a system for large scale graph processing
PREGEL a system for large scale graph processingPREGEL a system for large scale graph processing
PREGEL a system for large scale graph processing
 

Recently uploaded

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Query Reranking As A Service

  • 1. Query Reranking As A Service ABOLFAZL ASUDEH NAN ZHANG GAUTAM DAS © 2016 VLDB Endowment 21508097/16/03 UNIVERSITY OF TEXAS AT ARLINGTON GEORGE WASHINGTON UNIVERSITY UNIVERSITY OF TEXAS AT ARLINGTON
  • 3. Practical Challenges How to design a small number of “good” ranking functions when different users often have diverse and sometimes contradicting preferences ◦ e.g., “transit time” in airfare search, “mileage per year” in used car search ◦ Similarly, users with disabilities or special needs often need special ranking functions Some database owners lack the expertise, resources, or even motivation to properly study the requirements of their users
  • 5. Applications Meta-Ranking Service ◦ remember user preferences across multiple web databases (e.g., multiple car dealers) Applications dedicated for users with disabilities, special needs, etc.
  • 6. Critical Requirements Accuracy: The output query answer must precisely follow the user-specified ranking function Versatility: Support a wide variety of user-specific ranking function (so long as it is monotonic). Allow pass-through of selection conditions. No knowledge of system ranking function. Efficiency: the number of queries issued to the web database must be minimized. ◦ Because almost all such databases enforce certain query-rate limit. ◦ e.g., Google Flights allows only 50 free queries per user per day.
  • 7. Solution Novelty: Baselines Crawling the entire database ◦ Problem: efficiency ”Page down” [TZD13a, TZD13b] to retrieve more than k tuples ◦ Problem: no guarantee of accuracy unless retrieving all n tuples
  • 8. Database Model Hidden (web) Database ◦ Limited query interface ◦ Ranked Retrieval Model (Top-k)
  • 9. Problem Statement Query Reranking (Get-Next) Problem: ◦ Consider: a hidden web database D with a top-k interface ◦ with an arbitrary, unknown, system ranking function. ◦ Given: ◦ a user query q ◦ a user-specied monotonic ranking function S ◦ the top-h (h ≥ 0 can be greater than, equal to, or smaller than k) tuples satisfying q according to S ◦ Find ◦ the No. (h + 1) tuple for q while minimizing the number of queries issued D.
  • 11. 1D Re-Ranking Problem Given: ◦ An attribute ◦ th, the h-ranked tuple according to the attribute Find: ◦ th+1, the (h+1)-ranked tuple according to the attribute Why? ◦ 1D  MD using TA or Fagin’s algorithm ◦ Rank by “transit time”
  • 12. 1D-BASELINE th q0 t0 q1 t1 q2 th+1 NULL Problem: the query cost depend on the correlation between the system ranking function and the attribute! Negative Result: no algorithm can outperform 1D-BASELINE in the worst-case scenario.
  • 14. 1D-RERANK Key Observation: dense regions  repeated incurrence of high query cost for re-ranking Key Idea: Crawl dense regions to benefit future re-ranking requests (On-The-Fly Indexing) follow 1D-BINARY until search region is not dense yet Oracle ask from oracle if region is dense
  • 16. Applying Sorted-List Algorithms (e.g. TA) on 1D-RERANK The Get-next operation designed by 1D-RERANK can get used to develop a sorted-access top-k algorithm (e.g. TA) on top of it and enable HD Get-next. It has a major efficiency problem, mainly because of not leveraging the full power provided by client- server databases.
  • 17. MD-BASELINE q0: select *  t0 q1: where x < t0[x] and y < l(y)  null q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1 q3: where x ≥ t0[x] and x < b(x) and y < t0[y]  null q4: where x ≥ b(x) and x < I(x) and y < b(y) null q5: where x < t1[x] and y < t1[y]  null x y t0 l(y) l(x) t1 b(y) b(x)  apply 1D-RERANK
  • 18. MD-BINARY q0: select *  t0 q1: where x < t0[x] and y < l(y)  null q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1 q3: where x ≥ t0[x] and x < v0[x] and y < v0[y]  null q4: where x ≥ t0[x] and x < b(x) and y < t0[y]  null q5: where x ≥ b(x) and x < I(x) and y < b(y) null q6: where x < t1[x] and y < t1[y]  null x y t0 l(y) l(x) t1  V0 apply 1D-RERANK b(x) b(y)
  • 19. MD-RERANK High-level idea: follow MD-BINARY until a remaining search space (1) is covered by a region in the index; or (2) has volume smaller than the threshold. On-The-Fly Indexing: proactively record as an index densely located tuples once we encounter them, so that we do not need to incur a high query cost every time a query q triggers visits to the same dense region.
  • 20. Experiments setup Simulating the Top-k web interface on top of an offline dataset. ◦ US Department of Transportation (DOT): 457,013 tuples and over 28 attributes. ◦ Considered two system ranking functions: ◦ SR1(positively correlated ): 0.3 AIR-TIME + TAXI-IN (SR1) – Default function ◦ SR2 (negatively correlated): -0.1 DISTANCE - DEP-DELAY (SR2) ◦ Default k = 10. Online Experiments ◦ Blue Nile (BN) diamonds: largest online retailer of diamonds; contained 117,641 tuples (diamonds) over 12 attributes. ◦ Yahoo! Autos (YA): offers a popular search service for used cars; contained 13,169 cars within 30 mile of New York city over 8 attributes.
  • 21. Offline Experiment Results 1D, Impact of n (SR1) 1D, Impact of n (SR2) 1D, Impact of System-k
  • 22. Offline Experiment Results MD, Impact of n (SR1) MD, Impact of n (SR2) MD, Impact of System-k
  • 23. Online Experiment Results BN, 1D: Top-k query cost YA, 1D: Top-k query cost
  • 24. Online Experiment Results BN, MD: Top-k query cost YA, MD: Top-k query cost