London Information Retrieval Meetup September 2021 @ DESIRES 2021
Giorgio Maria Di Nunzio, Department of Information Engineering, University of Padova 15/09/2021
Intelligent Interactive Information Access (IIIA) Hub http://iiia.dei.unipd.it
Query Performance Prediction by Means of Intent-Aware Metrics in Systematic Reviews
1. Am I Missing Something?
Query Performance Prediction by Means of
Intent-Aware Metrics in Systematic Reviews
Giorgio Maria Di Nunzio
Department of Information Engineering
University of Padova
Intelligent Interactive Information Access (IIIA) Hub
London IR Meetup @ DESIRES 2021
15th September 2021, Padova, Italy
6. Example
• To produce high-quality, relevant, up-to-date
systematic reviews and other synthesized research
evidence to inform health decision making.
34. Problem
• Build an effective system given limited resources
• Resources can be
• limited time (+ infinite money)
• limited money (+ infinite time)
• limited time and limited money
43. When to Stop Reviewing?
Li, D., Kanoulas, E. TOIS 2021
When to Stop Reviewing in Technology-Assisted Reviews:
Sampling from an Adaptive Distribution to Estimate Residual Relevant Documents
44. When to Stop Reviewing?
Giorgio Maria Di Nunzio, ECIR 2018
A Study of an Automatic Stopping Strategy for Technologically Assisted Medical Reviews
47. When to Stop Reviewing?
[Figure: average recall (y-axis, 0.5 to 1.0) against documents shown (feedback) (x-axis, 20000 to 80000), one curve per type: abs-hh-ratio, abs-th-ratio, bm25, equal, prop. Slides 48 to 53 repeat the plot with the annotations "Baseline", "200 per query", "400 per query", "600 per query", "800 per query", and "1000 per query"; slides 54 to 56 repeat it without annotation.]
60. When to Stop Reviewing?
Accurate estimate of recall, but more effort and fewer relevant documents
Or
More relevant documents with less effort, but inaccurate recall
69. Query Performance Prediction
• Problem: Can we order the different reformulations in
decreasing order according to some evaluation measures?
• Is there any reformulation more promising than the others?
• Pre-retrieval predictors: use statistics about queries and the
collection in order to make a prediction.
• Post-retrieval predictors: use the results, such as the retrieval
status value and rank of documents to make a prediction
about the effectiveness of a query.
Scells, H., Azzopardi, L., Zuccon, G., Koopman, B. SIGIR 2018
Query Variation Performance Prediction for Systematic Reviews
70. ScentBar
Umemoto, K., Yamamoto, T., Tanaka, K. SIGIR 2016
ScentBar: A Query Suggestion Interface Visualizing
the Amount of Missed Relevant Information for Intrinsically Diverse Search
71. ScentBar
Umemoto, K., Yamamoto, T., Tanaka, K. SIGIR 2016
ScentBar: A Query Suggestion Interface Visualizing
the Amount of Missed Relevant Information for Intrinsically Diverse Search
Smoking cigarettes
73. Intent-Aware GAIN Metric
• Importance: documents relevant to a central aspect of the
search topic produce higher gain than those relevant to a
peripheral one.
• Relevance: highly relevant documents produce higher gain
than partially relevant ones.
• Novelty: documents relevant to an unexplored aspect
produce higher gain than those relevant to a fully explored
aspect.
88. Intent-Aware GAIN
• Umemoto et al. define the amount of Missed Information MI
to estimate what could have happened if the user had
browsed more documents
• Instead, we want to transform the GAIN function into a proxy
for a Query Variant Performance Prediction function
89. A Gain for Query Reformulations
• Hypotheses
• We do not have a “reference” topic t (which is unknown)
• But we know that the user has some information need “i”
• We have one single cluster of query variants “Vi”
Di Nunzio, G.M., Faggioli, G. 2021 Applied Sciences (submitted)
A Study of a Gain Based Approach for Query Aspects in Recall Oriented Tasks
https://www.preprints.org/manuscript/202109.0198/v1
90. A Gain for query reformulations
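\[
\mathrm{Gain}_{i,q}(D) = 1 - \prod_{d \in D} \bigl(1 - \mathrm{Rel}_{i,q}(d)\bigr)
\]
\[
\mathrm{Rel}_{i,q}(d) = \frac{\sum_{s \in V_i} \mathrm{Imp}_q(s)\,\mathrm{Rel}_s(d)}{\sum_{s \in V_i} \mathrm{Imp}_q(s)}
\]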
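The gain on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the toy importance and relevance values are hypothetical and do not come from the talk or the paper.

```python
from math import prod


def intent_relevance(doc, variants, importance, relevance):
    """Rel_{i,q}(d): importance-weighted average of Rel_s(d) over s in V_i."""
    total_importance = sum(importance[s] for s in variants)
    weighted = sum(importance[s] * relevance[s].get(doc, 0.0) for s in variants)
    return weighted / total_importance


def gain(docs, variants, importance, relevance):
    """Gain_{i,q}(D) = 1 - prod over d in D of (1 - Rel_{i,q}(d))."""
    return 1.0 - prod(
        1.0 - intent_relevance(d, variants, importance, relevance) for d in docs
    )


# Toy data (assumed values): two query variants, two documents.
variants = ["q1", "q2"]
importance = {"q1": 0.75, "q2": 0.25}           # Imp_q(s)
relevance = {                                   # Rel_s(d), in [0, 1]
    "q1": {"d1": 1.0, "d2": 0.0},
    "q2": {"d1": 0.0, "d2": 1.0},
}

example_gain = gain(["d1", "d2"], variants, importance, relevance)
print(example_gain)  # 1 - (1 - 0.75) * (1 - 0.25) = 0.8125
```

Because the gain is one minus a product of terms below 1, every additional document with non-zero relevance pushes the value closer to 1.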
92. Query Variants Similarity Matrix
• We define Dq as the set of documents retrieved by q.
• Di is the set of all documents retrieved by at least one
reformulation q.
• We define R ∈ ℝ^(|Vi| × |Di|) as the matrix of rankings for the information
need i, where each row corresponds to a specific
reformulation and each column to a document.
• The value of an element of R is defined as |Di| minus the rank of
document d retrieved by q.
93. Query Variants Similarity Matrix
• Finally, we build the symmetric matrix S by computing the
cosine similarity between each pair of rows.
• The row (or column) with the largest sum corresponds to the
query variant “closest” to the ideal centroid.
• We use the values of the sum of the rows (or columns) to
order the importance of each variant.
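The construction of R and S described on these two slides can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names and toy runs are hypothetical, ranks are assumed to start at 1, and a document that a variant did not retrieve is assumed to get the value 0 (the slides do not say how such entries are filled).

```python
from math import sqrt


def ranking_matrix(runs):
    """Build the rows of R. `runs` maps each variant to its ranked doc ids."""
    all_docs = sorted({d for ranked in runs.values() for d in ranked})  # D_i
    n = len(all_docs)
    rows = {}
    for q, ranked in runs.items():
        rank_of = {d: r for r, d in enumerate(ranked, start=1)}
        # |D_i| minus the rank; 0 for documents q did not retrieve (assumption).
        rows[q] = [n - rank_of[d] if d in rank_of else 0 for d in all_docs]
    return rows


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def variant_importance(runs):
    """Row sums of the symmetric similarity matrix S, one score per variant."""
    rows = ranking_matrix(runs)
    return {q: sum(cosine(rows[q], rows[s]) for s in rows) for q in rows}


# Toy runs (assumed data): three variants of one information need.
runs = {
    "q1": ["a", "b", "c"],
    "q2": ["a", "b", "d"],
    "q3": ["d", "c", "a"],
}
scores = variant_importance(runs)
ordering = sorted(scores, key=scores.get, reverse=True)
print(ordering)  # variants from most to least "central"
```

In this toy case q2 agrees most with the other two rankings, so it has the largest row sum and is treated as the variant closest to the ideal centroid.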
95. Experimental Setting
• Task: Predict the best query variants for a recall-oriented task
• CLEF 2018 eHealth Consumer Health Search (CHS) task
• 50 information needs with 7 query reformulations each
• Collection of 5,535,120 Web pages (selected domains
acquired from the CommonCrawl)
• 500 relevance assessments per information need
107. Final Remarks
• “Traditional” QPP approaches are less accurate for recall-oriented
tasks.
• For recall-based tasks where the number of documents to
retrieve may be large, N > 100, the original definition of GAIN
saturates quickly to 1.
• We proposed an alternative definition that mitigates this
problem, and we also presented a similarity-based approach
that tries to capture the ‘optimal’ query reformulation.
• Our approach significantly improves the prediction of the
order of importance of each reformulation in terms of recall.
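A small worked example may help show why a product-based gain saturates. Assume, purely for illustration, that every one of the N documents has the same relevance r; the value r = 0.05 is an assumption and is not taken from the experiments.

```latex
\[
\mathrm{Gain}(D) = 1 - (1 - r)^{N},
\qquad
r = 0.05,\; N = 100
\;\Rightarrow\;
\mathrm{Gain}(D) = 1 - 0.95^{100} \approx 0.994 .
\]
```

Under this assumption the gain is already within 1% of its maximum at N = 100, so it can barely distinguish between reformulations that retrieve long result lists.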
110. Special Issues now!
(with Evangelos Kanoulas)
• Special Issue Intelligent Systems with Applications (ISWA) Elsevier
• Technology Assisted Review Systems
• Extended deadline October 30th 2021
• Research Topic at Frontiers in Research Metrics and Analytics
• Evaluation in High-Recall IR Systems, evaluation metrics and
protocols
• Submissions open until April 2022
111. Thank you!
Am I Missing Something?
Giorgio Maria Di Nunzio
@airamoigroig
Intelligent Interactive Information Access (IIIA) Hub
http://iiia.dei.unipd.it
@iiia_unipd
London IR Meetup @ DESIRES 2021
15th September 2021, Padova, Italy