Query Performance Prediction by Means of Intent-Aware Metrics in Systematic Reviews

Am I Missing Something?
Query Performance Prediction by Means of Intent-Aware Metrics in Systematic Reviews
Giorgio Maria Di Nunzio

Department of Information Engineering

University of Padova

Intelligent Interactive Information Access (IIIA) Hub
London IR Meetup @ DESIRES 2021

15th September 2021, Padova, Italy

High Precision or High Recall
Gotta catch ‘em all!

When?
eDiscovery
Systematic reviews

Example
• To produce high-quality, relevant, up-to-date  
systematic reviews and other synthesized research
evidence to inform health decision making.

Example
disc herniation in patients
with low-back pain
1901 documents

Physical examination for
lumbar radiculopathy
546 documents

Overview
• Continuous Active Learning (CAL) in Recall Oriented Tasks

• Estimate Recall with limited resources

• Query (Variant) Performance Prediction

PART I

Continuous Active Learning (CAL)

in Recall Oriented Tasks

Continuous Active Learning
[Animated figure: documents D1 to D5 are indexed by an IR System; a Query produces a ranking, one document is assessed (D3, then D4, then D2), and each assessment is fed back with the Query to produce a new ranking.]

Rank Assess
Formulate
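
The Formulate, Rank, Assess cycle can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the term-overlap scorer, the `assess` callback standing in for the human reviewer, and the example documents are all hypothetical and are not the system used in the talk.

```python
from collections import Counter

def score(doc_terms, profile):
    """Toy ranker: sum of the profile weights of the distinct document terms."""
    return sum(profile[t] for t in set(doc_terms))

def continuous_active_learning(query, docs, assess, budget):
    """Rank, assess the top unjudged document, feed the judgement back, repeat.

    docs:   dict mapping doc id -> list of terms
    assess: callable doc id -> bool (a human reviewer in a real setting)
    """
    profile = Counter(query)              # term weights, seeded by the query
    judged, relevant = [], []
    while len(judged) < min(budget, len(docs)):
        unjudged = [d for d in docs if d not in judged]
        # Rank: highest score first, ties broken by document id.
        top = min(unjudged, key=lambda d: (-score(docs[d], profile), d))
        judged.append(top)
        if assess(top):                   # Assess
            relevant.append(top)
            profile.update(set(docs[top]))  # Feedback: reinforce its terms
    return judged, relevant

# Hypothetical mini-collection.
docs = {
    "D1": ["knee", "surgery"],
    "D2": ["lumbar", "radiculopathy", "examination"],
    "D3": ["low", "back", "pain", "disc", "herniation"],
    "D4": ["disc", "herniation", "lumbar"],
    "D5": ["back", "pain", "exercise"],
}
truth = {"D2", "D3", "D4"}
judged, relevant = continuous_active_learning(
    ["back", "pain"], docs, lambda d: d in truth, budget=4)
```

In this toy run, D4 and D2 share no term with the query and are reached only through the terms learned from earlier relevant documents, which is the point of feeding assessments back into the ranker.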

Gotta Catch ‘em All…right?

PART II

Estimate Recall with Limited Resources

Problem
• Build an effective system given limited resources

• Resources can be

• limited time (+ infinite money)

• limited money (+ infinite time)

• limited time and limited money

Ranking or Sampling?
“Distill” as many relevant documents as possible

Estimate the proportion of relevant documents

Rank Assess
Formulate

Rank Assess
Formulate Sample
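
To make the "Sample" step concrete, here is a minimal sketch of estimating the proportion of relevant documents from a simple random sample. It is an assumption-laden illustration: uniform sampling and a normal-approximation standard error are used for simplicity, and this is not the adaptive sampling distribution of Li and Kanoulas. The names `estimate_prevalence` and `assess` are hypothetical.

```python
import math
import random

def estimate_prevalence(doc_ids, assess, sample_size, seed=0):
    """Estimate the fraction of relevant documents from a uniform sample.

    doc_ids: list of document ids
    assess:  callable doc id -> bool (each call costs reviewer effort)
    Returns (estimated proportion, normal-approximation standard error).
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, min(sample_size, len(doc_ids)))
    hits = sum(1 for d in sample if assess(d))
    p = hits / len(sample)
    # Standard error of a proportion; ignores the finite-population correction.
    se = math.sqrt(p * (1 - p) / len(sample))
    return p, se

# Hypothetical collection: 1000 documents, the first 100 are relevant.
collection = list(range(1000))
is_relevant = lambda d: d < 100
p_hat, se_hat = estimate_prevalence(collection, is_relevant, sample_size=200)
```

Every sampled document costs one assessment that is not spent on the top of the ranking, which is the trade-off between ranking and sampling.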

When to Stop Reviewing?
Li, D., Kanoulas, E. TOIS 2021
When to Stop Reviewing in Technology-Assisted Reviews: Sampling from an Adaptive Distribution to Estimate Residual Relevant Documents

Giorgio Maria Di Nunzio, ECIR 2018
A Study of an Automatic Stopping Strategy for Technologically Assisted Medical Reviews

[Figure: average recall (y-axis, 0.5 to 1.0) against documents shown (feedback) (x-axis, 20000 to 80000) for the strategies abs-hh-ratio, abs-th-ratio, bm25, equal, and prop. Panels: Baseline; 200, 400, 600, 800, and 1000 per query.]

Accurate estimate of recall but more effort and fewer relevant documents

Or

More relevant documents with less effort but inaccurate recall

PART III

Query (Variant) Performance Prediction

Query Reformulation
[Diagram, built up over several slides, with the steps Formulate, Rank, Assess, and Sample, to which Reformulate steps are added; the last build shows Formulate, two Reformulate steps, and three Rank steps.]

Query Performance (post-hoc)
[Same diagram: Formulate, two Reformulate steps, and three Rank steps.]

Query Performance Prediction
[Same diagram, with three question marks.]
Scells, H., Azzopardi, L., Zuccon, G., Koopman, B. SIGIR 2018
Query Variation Performance Prediction for Systematic Reviews

Query Performance Prediction
• Problem: Can we order the different reformulations in decreasing order according to some evaluation measures?

• Is there any reformulation more promising than the others?

• Pre-retrieval predictors: use statistics about queries and the collection in order to make a prediction.

• Post-retrieval predictors: use the results, such as the retrieval status value and rank of documents to make a prediction about the effectiveness of a query.
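
As a rough illustration of the two families of predictors, the sketch below computes one pre-retrieval signal (the average inverse document frequency of the query terms) and one post-retrieval signal (the spread of the top retrieval scores). These two predictors are chosen here only as familiar examples; they are an assumption and not necessarily the ones evaluated in the talk. The document frequencies are hypothetical.

```python
import math
import statistics

def avg_idf(query_terms, doc_freq, num_docs):
    """Pre-retrieval: mean inverse document frequency of the query terms.

    Uses only collection statistics, so no ranking has to be computed.
    """
    idfs = [math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in query_terms]
    return sum(idfs) / len(idfs)

def score_spread(scores, k=100):
    """Post-retrieval: population standard deviation of the top-k scores."""
    top = sorted(scores, reverse=True)[:k]
    return statistics.pstdev(top)

# Hypothetical collection statistics for two single-term query variants.
doc_freq = {"pain": 999, "radiculopathy": 9}
common = avg_idf(["pain"], doc_freq, num_docs=1000)
specific = avg_idf(["radiculopathy"], doc_freq, num_docs=1000)
```

Ordering the reformulations by such a score gives one candidate answer to the question of which reformulation is more promising; whether that ordering agrees with the actual recall is what has to be evaluated.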

ScentBar
Umemoto, K., Yamamoto, T., Tanaka, K. SIGIR 2016

ScentBar: A Query Suggestion Interface Visualizing the Amount of Missed Relevant Information for Intrinsically Diverse Search

Smoking cigarettes

Intent-Aware GAIN Metric
• Importance: documents relevant to a central aspect of the
search topic produce higher gain than those relevant to a
peripheral one.

• Relevance: highly relevant documents produce higher gain
than partially relevant ones.

• Novelty: documents relevant to an unexplored aspect
produce higher gain than those relevant to a fully explored
aspect.

Query Aspects
[Diagram, built in three steps: a Topic t with Subtopics s1 to s5; the Subtopics grouped into Clusters C1 and C2; and, in the last step, s2 and s5 relabelled as Aspects a1 and a2.]

Importance of a Subtopic
$$Imp_t(s) = \sum_{d \in D^N_s \cap D^N_t} \frac{1}{Rank_t(d)}$$
[Figure: two example rankings, (D1, D2, D3, D4, D5) and (D7, D2, D5, D8, D4), with the labels t and s.]
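
A worked example may help here. The sketch below sums the reciprocal rank in the topic ranking over the documents that appear in the top N of both rankings. It assumes, for illustration only, that (D1, ..., D5) is the ranking for the topic t and (D7, D2, D5, D8, D4) is the ranking for the subtopic s; the function name `importance` is hypothetical.

```python
def importance(ranking_t, ranking_s, n):
    """Imp_t(s): sum of 1 / Rank_t(d) over documents in both top-n lists."""
    top_s = set(ranking_s[:n])
    return sum(1.0 / rank
               for rank, d in enumerate(ranking_t[:n], start=1)
               if d in top_s)

# Assumed assignment of the two example rankings (see the lead-in).
ranking_t = ["D1", "D2", "D3", "D4", "D5"]
ranking_s = ["D7", "D2", "D5", "D8", "D4"]

# Shared documents are D2, D4, D5, at ranks 2, 4, 5 for t:
# 1/2 + 1/4 + 1/5 = 0.95
imp = importance(ranking_t, ranking_s, n=5)
```

A subtopic whose top documents sit near the top of the topic ranking therefore gets a higher importance than one that overlaps only with low-ranked documents.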

Building the IA-GAIN
$$P(a|t) = \frac{Imp_t(a)}{\sum_{a' \in A_t} Imp_t(a')}$$
$$Gain_{t,a}(D) = 1 - \prod_{d \in D} \left(1 - Rel_{t,a}(d)\right)$$
$$Rel_{t,a}(d) = \frac{\sum_{s \in C_a} Imp_t(s) \cdot Rel_s(d)}{\sum_{s \in C_a} Imp_t(s)}$$
$$Rel_s(d) = \frac{1}{Rank_s(d)}$$

Intent-Aware GAIN
$$Gain\text{-}IA_t(D) = \sum_{a \in A_t} P(a|t) \cdot Gain_{t,a}(D)$$
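
The intent-aware gain can be put together as in the following sketch, which combines the relevance of a document to a subtopic, its importance-weighted average over a cluster, the per-aspect gain, and the final weighted sum over aspects. It rests on assumptions: the importance values are passed in as plain dictionaries, a document missing from a subtopic ranking gets relevance 0, and all names and numbers are hypothetical.

```python
def rel_s(ranking_s, d):
    """Rel_s(d) = 1 / Rank_s(d); 0 if d is not retrieved for s (assumption)."""
    return 1.0 / (ranking_s.index(d) + 1) if d in ranking_s else 0.0

def rel_aspect(cluster, imp_sub, rankings, d):
    """Rel_{t,a}(d): importance-weighted mean of Rel_s(d) over s in C_a."""
    total = sum(imp_sub[s] for s in cluster)
    return sum(imp_sub[s] * rel_s(rankings[s], d) for s in cluster) / total

def gain_aspect(cluster, imp_sub, rankings, docs):
    """Gain_{t,a}(D) = 1 - prod over d in D of (1 - Rel_{t,a}(d))."""
    remaining = 1.0
    for d in docs:
        remaining *= 1.0 - rel_aspect(cluster, imp_sub, rankings, d)
    return 1.0 - remaining

def gain_ia(clusters, imp_aspect, imp_sub, rankings, docs):
    """Gain-IA_t(D) = sum over aspects a of P(a|t) * Gain_{t,a}(D)."""
    z = sum(imp_aspect.values())
    return sum(imp_aspect[a] / z
               * gain_aspect(clusters[a], imp_sub, rankings, docs)
               for a in clusters)

# Hypothetical topic with two aspects and three subtopics.
clusters = {"a1": ["s1", "s2"], "a2": ["s3"]}
imp_sub = {"s1": 1.0, "s2": 1.0, "s3": 2.0}
imp_aspect = {"a1": 1.0, "a2": 3.0}
rankings = {"s1": ["D1", "D2"], "s2": ["D2", "D1"], "s3": ["D3"]}

g_one = gain_ia(clusters, imp_aspect, imp_sub, rankings, ["D1"])
g_two = gain_ia(clusters, imp_aspect, imp_sub, rankings, ["D1", "D3"])
```

In this example D1 covers only the less important aspect a1, so the gain stays low until D3, which is relevant to the more important aspect a2, is added to the set.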

Intent-Aware GAIN
• Umemoto et al. define the amount of Missed Information (MI) to estimate what could have happened if the user had browsed more documents

• Instead, we want to transform the GAIN function into a proxy
for a Query Variant Performance Prediction function

A Gain for Query Reformulations
• Hypotheses

• We do not have a “reference” topic t (which is unknown)

• But we know that the user has some information need “i”

• We have one single cluster of query variants “Vi”
Di Nunzio, G.M., Faggioli, G. 2021 Applied Sciences (submitted)

A Study of a Gain Based Approach for Query Aspects in Recall Oriented Tasks

https://www.preprints.org/manuscript/202109.0198/v1

A Gain for Query Reformulations
$$Gain_{i,q}(D) = 1 - \prod_{d \in D} \left(1 - Rel_{i,q}(d)\right)$$
$$Rel_{i,q}(d) = \frac{\sum_{s \in V_i} Imp_q(s) \cdot Rel_s(d)}{\sum_{s \in V_i} Imp_q(s)}$$

Avoid Saturated GAIN
$$GAIN_{i,q}(D) = 1 - \frac{\sum_{d \in D} \left(1 - Rel_{i,q}(d)\right)}{|D|}$$
“Mean” Gain
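
The difference between the product form and the “mean” form can be seen numerically. The sketch below compares the two on a hypothetical list of 100 relevance values that decay as 1/(r+1); these values are an assumption chosen only because the product then has a closed form, 1/101.

```python
import math

def product_gain(rels):
    """Original form: 1 - prod over documents of (1 - Rel(d))."""
    return 1.0 - math.prod(1.0 - r for r in rels)

def mean_gain(rels):
    """'Mean' form: 1 - average over documents of (1 - Rel(d))."""
    return 1.0 - sum(1.0 - r for r in rels) / len(rels)

# Hypothetical relevance values for 100 retrieved documents.
rels = [1.0 / (r + 1) for r in range(1, 101)]

saturated = product_gain(rels)   # product telescopes to 1/101, gain near 0.99
averaged = mean_gain(rels)       # stays well below 1
```

With the product form, one hundred weakly relevant documents already push the gain to about 0.99, so different query variants become hard to tell apart; the mean form stays far from 1 on the same input.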

Query Variants Similarity Matrix
• We define Dq as the set of documents retrieved by q.

• Di is the set of all documents retrieved by at least one reformulation q.

• We define R ∈ ℝ^(|Vi| × |Di|) as the matrix of rankings for the information need i, where each row corresponds to a specific reformulation and each column to a document.

• The value of an element of R is defined as |Di| minus the rank of document d retrieved by q.

• Finally, we build the symmetric matrix S by computing the cosine similarity between each pair of rows.

• The row (or column) with the largest sum corresponds to the query variant “closest” to the ideal centroid.

• We use the values of the sum of the rows (or columns) to order the importance of each variant.

      V1    V2    V3    Sum
V1    1     0.2   0.4   1.6
V2    0.2   1     0.7   1.9
V3    0.4   0.7   1     2.1
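
The construction of R, S, and the row sums can be sketched as follows. One detail is an assumption: the slides do not say which value R takes when a variant does not retrieve a document, so the sketch uses 0. Function names are hypothetical.

```python
import math

def ranking_matrix(rankings, all_docs):
    """R[q][d] = |Di| - rank of d for variant q; 0 if not retrieved (assumption)."""
    size = len(all_docs)
    rows = []
    for ranking in rankings:
        pos = {d: r for r, d in enumerate(ranking, start=1)}
        rows.append([size - pos[d] if d in pos else 0 for d in all_docs])
    return rows

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(rows):
    """Symmetric matrix S of pairwise cosine similarities between rows of R."""
    return [[cosine(u, v) for v in rows] for u in rows]

def variant_scores(S):
    """Row sums of S; the largest marks the variant closest to the centroid."""
    return [sum(row) for row in S]

# The example matrix from the slide.
S_slide = [[1, 0.2, 0.4], [0.2, 1, 0.7], [0.4, 0.7, 1]]
sums = variant_scores(S_slide)
best = max(range(len(sums)), key=sums.__getitem__)   # index 2, i.e. V3

# Full pipeline on three hypothetical variants over three documents.
docs = ["D1", "D2", "D3"]
R = ranking_matrix([["D1", "D2", "D3"], ["D1", "D2", "D3"], ["D3", "D2", "D1"]], docs)
scores = variant_scores(similarity_matrix(R))
```

In the slide's example the row sums are 1.6, 1.9, and 2.1, so V3 would be ranked as the most important variant, followed by V2 and V1.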

Experimental Setting
• Task: Predict the best query variants for a recall-oriented task

• CLEF 2018 eHealth Consumer Health Search (CHS) task

• 50 information needs with 7 query reformulations each

• Collection of 5,535,120 Web pages (selected domains acquired from the CommonCrawl)

• 500 relevance assessments per information need

Inter-Topic Traditional QPP (sanity check)

Final Remarks
• “Traditional” QPP approaches are less accurate for recall-oriented tasks.

• For recall-based tasks where the number of documents to retrieve may be large (N > 100), the original definition of GAIN saturates quickly to 1.

• We proposed an alternative definition that mitigates this problem, and we also presented a similarity-based approach that tries to capture the ‘optimal’ query reformulation.

• Our approach significantly improves the prediction of the order of importance of each reformulation in terms of recall.

Future Work
[Diagrams repeated from Part III: the loop with Formulate, Rank, Assess, Sample, and Reformulate, and the view with Formulate, two Reformulate steps, and three Rank steps.]

Special Issues now!

(with Evangelos Kanoulas)
• Special Issue Intelligent Systems with Applications (ISWA) Elsevier

• Technology Assisted Review Systems
• Extended deadline October 30th 2021
• Research Topic at Frontiers in Research Metrics and Analytics

• Evaluation in High-Recall IR Systems, evaluation metrics and
protocols
• Submissions open until April 2022
#ads

Thank you!
 
Am I Missing Something?
Giorgio Maria Di Nunzio

@airamoigroig

Intelligent Interactive Information Access (IIIA) Hub

http://iiia.dei.unipd.it 
@iiia_unipd

