Answering Why-not Questions
on Top-K Queries
Andy He and Eric Lo
The Hong Kong Polytechnic University
Background
 The database community has focused on
performance issues for decades
 Recently, more people have turned their focus
to usability issues
 Supporting keyword search
 Query auto-completion
 Explaining your query result (a.k.a. Why and
Why-Not Questions)
2/33
Why-Not Questions
 You pose a query Q
 The database returns a result R
 R surprises you
 E.g., a tuple m that you expected in the
result is missing, and you ask “WHY??!”
 You pose a why-not question (Q,R,m)
 The database returns an explanation E
3/33
The (short) history of Why-Not
 Chapman and Jagadish
 “Why Not?” [SIGMOD 09]
 Select-Project-Join (SPJ) Queries
 Explanation E = “tell you which operator excludes
the expected tuple”
 Huang, Chen, A.H. Doan, and J. Naughton
 “On the Provenance of Non-Answers to Queries
Over Extracted Data” [PVLDB 09]
 SPJ Queries
 Explanation E =“tell you how to modify the data”
4/33
The (short) history of Why-Not
 Herschel and Hernández
 “Explaining Missing Answers to SPJUA Queries”
[PVLDB 10]
 SPJUA Queries
 Explanation E =“tell you how to modify the data”
 Tran and C.Y. Chan
 “How to ConQueR Why-Not Questions” [SIGMOD 10]
 SPJA Queries
 Explanation E = “tell you how to modify your
query”
5/33
About this work
 Why-Not question on Top-k queries.
 Hotel <Price, Distance to CityCenter>
 Top-3 Hotel
 Weighting w_origin = <0.5, 0.5>
 Result
 Rank 1: Sheraton
 Rank 2: Westin
 Rank 3: InterContinental
 “WHY is my favorite, Renaissance, NOT in the Top-3 result?”
 Is my value of k too small?
 Or should I revise my weighting?
 Or do I need to modify both k and the weighting?
 Explanation E = “tell you how to refine your Top-K query in
order to get your favorites back into the result”
6/33
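The top-3 query on this slide can be sketched as a weighted-sum ranking. This is a minimal sketch, not the authors' implementation: the attribute values are hypothetical (the slides give no numbers), both attributes are assumed to be normalized so that larger is better, and `score` and `top_k` are illustrative names.

```python
# Hypothetical, normalized attribute values (larger = better); not from the slides.
HOTELS = {
    "Sheraton":         (0.9, 0.9),   # (price score, distance score)
    "Westin":           (0.8, 0.8),
    "InterContinental": (0.9, 0.5),
    "Hilton":           (0.7, 0.5),
    "Renaissance":      (0.1, 0.85),
}


def score(attrs, weights):
    """Weighted sum of a tuple's attribute values."""
    return sum(w * a for w, a in zip(weights, attrs))


def top_k(data, weights, k):
    """Return the names of the k highest-scoring tuples."""
    ranked = sorted(data, key=lambda name: score(data[name], weights), reverse=True)
    return ranked[:k]


# Original query: Renaissance is missing from the top-3.
print(top_k(HOTELS, (0.5, 0.5), 3))
# A different weighting brings it back without changing k.
print(top_k(HOTELS, (0.1, 0.9), 3))
```

With these made-up values the first call reproduces the slide's result, which is what prompts the why-not question.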
One possible answer
-only modify k
 Original query
Q(k_original=3, w_original=<0.5,0.5>)
 The ranking of Renaissance under the
original weighting w_original=<0.5,0.5>
 Rank 1: Sheraton
 Rank 2: Westin
 Rank 3: InterContinental
 Rank 4: Hilton
 Rank 5: Renaissance
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
7/33
Another possible answer
-only modify weighting
 Original query Q(k=3, w_original=<0.5,0.5>)
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
 If we set weighting w=<0.1,0.9>
 Rank 1: Hotel E
 Rank 2: Hotel F
 Rank 3: Renaissance
 Refined query #2: Q2(k=3,w=<0.1,0.9>)
8/33
Yet another possible answer
-modify both
 Original query Q(k=3,w=<0.5,0.5>)
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
 Refined query #2: Q2(k=3,w=<0.1,0.9>)
 If we set weighting w=<0.9,0.1>
 Rank 1: Hotel A
 Rank 2: Hotel B
 Rank 3: Hotel C
 …
 …
 Rank 10000: Renaissance
 Refined query #3: Q3(k=10000,w=<0.9,0.1>)
9/33
Our objective
 Find the refined query that minimizes a
penalty function while keeping the missing tuple m
in the Top-K results
Penalty preference options:
  PMK   Prefer Modify K
  PMW   Prefer Modify Weighting
  NM    Never Mind (Default)
10/33
Basic idea
 For each weighting w_i ∈ W
 Run PROGRESS(w_i, UNTIL-SEE-m)
 Obtain the ranking r_i of m under the weighting w_i
 Form a refined query Q_i(k=r_i, w=w_i)
 Return the refined query with the least
penalty
Problem: W is infinite!
11/33
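The loop on this slide can be sketched for a finite list of candidate weightings. This is only an illustration under assumptions: `progress_until_see` stands in for PROGRESS(w_i, UNTIL-SEE-m) by sorting the whole data set (a real progressive top-k operator would fetch tuples incrementally), and the hotel values are hypothetical with larger meaning better.

```python
# Hypothetical, normalized attribute values (larger = better); not from the slides.
HOTELS = {
    "Sheraton":         (0.9, 0.9),
    "Westin":           (0.8, 0.8),
    "InterContinental": (0.9, 0.5),
    "Hilton":           (0.7, 0.5),
    "Renaissance":      (0.1, 0.85),
}


def score(attrs, weights):
    """Weighted sum of a tuple's attribute values."""
    return sum(w * a for w, a in zip(weights, attrs))


def progress_until_see(data, weights, missing):
    """Scan tuples in descending score order and return the rank of `missing`."""
    ranked = sorted(data, key=lambda name: score(data[name], weights), reverse=True)
    for rank, name in enumerate(ranked, start=1):
        if name == missing:
            return rank
    raise KeyError(f"{missing!r} is not in the data set")


def refined_queries(data, candidate_weightings, missing):
    """Form one refined query (k = rank of the missing tuple, w) per weighting."""
    return [(progress_until_see(data, w, missing), w) for w in candidate_weightings]


queries = refined_queries(HOTELS, [(0.5, 0.5), (0.1, 0.9)], "Renaissance")
print(queries)
```

Picking the final answer then only requires evaluating a penalty function on each `(k, w)` pair and taking the minimum.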
Our approach: sampling
 For each weighting w_i ∈ W
 Run PROGRESS(w_i, UNTIL-SEE-m)
 Obtain the ranking r_i of m under the weighting w_i
 Form a refined query Q_i(k=r_i, w=w_i)
 Return the refined query with the least
penalty
W is a set of weightings drawn from a restricted weighting space
Key Theorem: The optimal refined query
Q_best is either Q1 or else has a weighting
w_best in a restricted weighting space.
12/33
How large should the sample size be?
 We say a refined query is a best-T% refined
query if its penalty is smaller than that of (1-T)% of the refined
queries
 And we want to obtain such a query with a
probability larger than a threshold Pr
13/33
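One way to turn this requirement into a concrete sample size is sketched here. It assumes the weightings are sampled independently and uniformly, so that each sample is a best-T% refined query with probability T. Under that assumption, at least one of s samples qualifies with probability 1 - (1 - T)^s, and s is chosen as the smallest integer making this at least Pr. The function name and the example numbers are illustrative, not taken from the slides.

```python
import math


def sample_size(t, pr):
    """Smallest s with 1 - (1 - t)**s >= pr, for fractions 0 < t, pr < 1."""
    if not (0.0 < t < 1.0 and 0.0 < pr < 1.0):
        raise ValueError("t and pr must be fractions strictly between 0 and 1")
    return math.ceil(math.log(1.0 - pr) / math.log(1.0 - t))


# Example: a best-10% refined query with probability at least 0.9.
s = sample_size(0.1, 0.9)
print(s, 1.0 - (1.0 - 0.1) ** s)
```

The required size grows quickly as T shrinks, which is why T and Pr act as quality and running-time knobs.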
The PROGRESS operation can be
expensive
 Original query Q(k=3, w_original=<0.5,0.5>)
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
 If we set weighting w=<0.9,0.1>
 Rank 1: Hotel A
 Rank 2: Hotel B
 Rank 3: Hotel C
 …
 …
 Rank 10000: Renaissance
 Refined query: Q2(k=10000,w=<0.9,0.1>)
Very slow!
14/33
Two optimization techniques
 Stop each PROGRESS operation early
 Skip some PROGRESS operations
15/33
Stop earlier
 The original query Q(k=3, w_origin=<0.5,0.5>)
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
 If we set weighting w=<0.9,0.1>
 Rank 1: Hotel A
 Rank 2: Hotel B
 Rank 3: Hotel C
 …
 Rank 5: Hotel D
 …
16/33
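A possible stopping rule is sketched below, assuming the basic penalty model λ_k ∆k + λ_w ∆w with ∆w measured as Euclidean distance. Once some refined query with penalty `best_penalty` is known, a scan under a new weighting w can stop at the largest rank that could still yield a strictly smaller penalty. The helper names and the λ values in the example are hypothetical, and the slides do not state that the authors compute the cutoff exactly this way.

```python
import math


def max_useful_rank(best_penalty, w, k_orig, w_orig, lam_k, lam_w):
    """Largest rank of m that would still beat best_penalty, or None if no rank can."""
    budget = best_penalty - lam_w * math.dist(w, w_orig)
    if budget <= 0:
        return None  # the weighting change alone already costs too much
    # Largest integer delta_k with lam_k * delta_k < budget.
    delta_k = math.ceil(budget / lam_k) - 1
    return k_orig + delta_k


def progress_with_cutoff(ranked_names, missing, limit):
    """Scan a ranked list; give up (return None) once the rank exceeds `limit`."""
    for rank, name in enumerate(ranked_names, start=1):
        if rank > limit:
            return None
        if name == missing:
            return rank
    return None


# Q1(k=5, w=<0.5,0.5>) has penalty 0.5 * 2 = 1.0 when lam_k = lam_w = 0.5.
limit = max_useful_rank(1.0, (0.9, 0.1), 3, (0.5, 0.5), 0.5, 0.5)
print(limit)
print(progress_with_cutoff(["A", "B", "C", "D", "E", "Renaissance"], "Renaissance", limit))
```

If the scan returns None, the weighting is discarded without ever reaching the missing tuple's true rank.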
Skip PROGRESS operation (a)
 Similar weightings may lead to similar rankings
 Based on the “Reverse Top-K” paper, ICDE’10
 Therefore
 The query result of PROGRESS(w_x, UNTIL-SEE-m)
 could be used to deduce
 The query result of PROGRESS(w_y, UNTIL-SEE-m)
 [Provided that w_x and w_y are similar]
17/33
Skip PROGRESS operation (a)
 E.g., Original query Q(k=3, w_origin=<0.5,0.5>)
 Refined query #1: Q1(k=5,w=<0.5,0.5>)
What do the scores look like if we set w=<0.6,0.4>?

Hotel              Score under w=<0.5,0.5>   Score under w=<0.6,0.4>
Sheraton           10                        9
Westin             9                         10
InterContinental   8                         7
Hilton             7                         8
Renaissance        6                         5
18/33
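One simple way to reuse an earlier PROGRESS result is sketched here. It is an assumption for illustration, not necessarily the deduction the authors use. The tuples already retrieved under w_x are a subset of the data, so the number of them that outscore m under a new weighting w_y gives a lower bound on m's rank under w_y. If even that lower bound leads to a penalty no better than the best found so far, PROGRESS(w_y, UNTIL-SEE-m) can be skipped. The attribute values are hypothetical with larger meaning better.

```python
def score(attrs, weights):
    """Weighted sum of a tuple's attribute values."""
    return sum(w * a for w, a in zip(weights, attrs))


def rank_lower_bound(cached_attrs, missing_attrs, weights):
    """1 + number of cached tuples scoring strictly higher than m under `weights`."""
    m_score = score(missing_attrs, weights)
    return 1 + sum(1 for attrs in cached_attrs if score(attrs, weights) > m_score)


# Tuples seen before m during an earlier PROGRESS run (hypothetical values).
CACHED = [(0.9, 0.9), (0.8, 0.8), (0.9, 0.5), (0.7, 0.5)]
M = (0.1, 0.85)

print(rank_lower_bound(CACHED, M, (0.6, 0.4)))
print(rank_lower_bound(CACHED, M, (0.1, 0.9)))
```

The bound is cheap because it only rescans cached tuples and never touches the full data set.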
Skip PROGRESS operation(b)
 We can skip a weighting w if its
change ∆w from the original weighting
w_origin is too large.
 E.g., if we already have a refined query whose penalty
equals 0.5, and a weighting w has a change
∆w of 1, we can skip w entirely.
19/33
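The skip test can be sketched under the basic penalty model λ_k ∆k + λ_w ∆w. Since ∆k is never negative, the term λ_w ∆w alone is a lower bound on the penalty of any refined query that uses w. The value λ_w = 0.5 in the example is a hypothetical choice, and `can_skip` is an illustrative name.

```python
def can_skip(delta_w, lam_w, best_penalty):
    """True if a weighting with change delta_w cannot beat best_penalty."""
    return lam_w * delta_w >= best_penalty


# Slide example: best penalty so far is 0.5 and the candidate has delta_w = 1.
print(can_skip(1.0, 0.5, 0.5))
# A weighting close to the original one still has to be examined.
print(can_skip(0.2, 0.5, 0.5))
```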
Experiments
 Case Study on NBA data
 Experiments on Synthetic Data
20/33
Case study on NBA data
 Compare with a pure random sampling
version
 which does not draw samples from the restricted
weighting space but from the complete
weighting space
21/33
Find the top-3 centers in NBA history
 5 Attributes (Weighting = 1/5 each)
 POINTS
 REBOUND
 BLOCKING
 FIELD GOAL
 FREE THROW
 Initial Result
 Rank 1: Chamberlain
 Rank 2: Abdul-Jabbar
 Rank 3: O’Neal
22/33
Find the top-3 centers in NBA history
                 Sampling on the restricted   Sampling on the whole
                 sampling space               weighting space
Refined query    Top-3                        Top-7
∆k               0                            4
Time (ms)        156                          154
Penalty          0.069                        0.28
Why Not ?!
We choose “Prefer Modify Weighting”
23/33
Synthetic Data
 Uniform, Anti-correlated, Correlated
 Scalability
24/33
Varying query dimensions
25/33
Varying k_o
26/33
Varying the ranking of the missing
object
27/33
Varying the number of missing
objects
28/33
Varying T%
29/33
[Figure panels: Time, Quality]
Varying Pr
30/33
Optimization effectiveness
31/33
Conclusions
 We are the first to answer why-not questions on top-k
queries
 We prove that finding the optimal answer is
computationally expensive
 A sampling-based method is proposed
 The optimal answer is proved to be in a restricted
sample space
 Two optimization techniques are proposed
 Stop each PROGRESS operation early
 Skip some PROGRESS operations
32/33
Thanks
Q&A
Dealing with multiple missing objects M
 We have to modify the algorithm a little bit:
 Do a simple filtering on the set of missing
objects
 If m_i dominates m_j in the data space
 Remove m_i from M, because every time m_j shows
up in a top-k result, m_i must be there too
 Condition UNTIL-SEE-m becomes
UNTIL-SEE-ALL-OBJECTS-IN-M
34/33
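The filtering step can be sketched as follows, assuming larger attribute values are better and all weights are non-negative, so that a dominating object never scores lower than the object it dominates. The object names and values are hypothetical.

```python
def dominates(a, b):
    """a dominates b: at least as good in every attribute, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def filter_missing(missing):
    """Drop every object that dominates another missing object.

    A dropped object is guaranteed to appear whenever the object it
    dominates appears, so it does not need to be tracked separately.
    """
    return {
        name: attrs
        for name, attrs in missing.items()
        if not any(
            dominates(attrs, other)
            for other_name, other in missing.items()
            if other_name != name
        )
    }


M = {"A": (0.9, 0.9), "B": (0.5, 0.5), "C": (0.1, 0.95)}
print(filter_missing(M))
```

Here A dominates B, so only B and C remain and UNTIL-SEE-ALL-OBJECTS-IN-M has fewer objects to wait for.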
Penalty Model
 Original Query Q(3, w_origin)
 Refined Query Q1(5, w_origin)
 Penalty of changing k
 ∆k = 5 - 3 = 2
 Penalty of changing w
 ∆w = ||w_origin - w_origin||_2 = 0
 Basic penalty model
 Penalty(5, w_origin) = λ_k ∆k + λ_w ∆w
 (λ_k + λ_w = 1)
35/33
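The basic penalty model can be sketched and applied to the three refined queries from the earlier slides. The λ values below are hypothetical, since the slides only require λ_k + λ_w = 1. Clamping ∆k at zero for k below the original is also an assumption.

```python
import math


def penalty(k, w, k_orig, w_orig, lam_k, lam_w):
    """Basic penalty: lam_k * delta_k + lam_w * delta_w, with lam_k + lam_w = 1."""
    if abs(lam_k + lam_w - 1.0) > 1e-9:
        raise ValueError("lam_k + lam_w must equal 1")
    delta_k = max(0, k - k_orig)       # assumption: lowering k is not penalized
    delta_w = math.dist(w, w_orig)     # Euclidean (L2) distance between weightings
    return lam_k * delta_k + lam_w * delta_w


K_ORIG, W_ORIG = 3, (0.5, 0.5)
# Refined queries Q1, Q2, Q3 as (k, w) pairs, taken from the slides.
CANDIDATES = [(5, (0.5, 0.5)), (3, (0.1, 0.9)), (10000, (0.9, 0.1))]


def best_query(lam_k, lam_w):
    """Return the candidate with the smallest penalty for the given lambdas."""
    return min(
        CANDIDATES,
        key=lambda q: penalty(q[0], q[1], K_ORIG, W_ORIG, lam_k, lam_w),
    )


print(penalty(5, W_ORIG, K_ORIG, W_ORIG, 0.5, 0.5))  # Q1: 0.5 * 2 + 0.5 * 0
print(best_query(0.5, 0.5))
print(best_query(0.1, 0.9))
```

Shifting weight from λ_k to λ_w makes changes to k cheaper than changes to the weighting, so the preferred refined query moves from Q2 to Q1 in this example.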
Normalized penalty function
36/33
