Causality-driven Ad-hoc Information Retrieval
Suchana Datta
PhD Student
University College Dublin
Under the supervision of
Dr. Derek Greene (University College Dublin)
&
Dr. Debasis Ganguly (University of Glasgow)
23rd May, 2022
S. Datta (UCD) CausalIR 23rd May 1 / 39
Outline
1 Causal Information Retrieval - the Challenges
2 A Factored Causal Relevance Model (SIGIR’20)
3 A ‘Pointwise-Query, Listwise-Document’-based QPP Approach (SIGIR’22)
4 Concluding Remarks
Causal Information Retrieval - the Challenges
What are we interested in?
May I ask you a question?
What does Google say?
What does Google tell us?
Let’s make the query more specific
Where is the ‘WHY’?
Is a traditional search system adequate for retrieving causally relevant information?
Is a new research paradigm (i.e., Causal Information Retrieval) required to identify causally relevant information?
Dataset Construction
https://www.telegraphindia.com
http://fire.irsi.res.in/fire/static/data
Nature of Causal Documents
Figure: Term overlaps between the topical and causal sets of documents
Figure: Cosine similarities between topical and causal documents
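The two measures named in these figures can be sketched as follows. This is a minimal illustration under stated assumptions: lowercase whitespace tokenization, Jaccard overlap of the two vocabularies as the "term overlap", and raw term-frequency vectors for cosine similarity. The tokenization and weighting actually used in the study are not given on the slide, and the toy documents are made up.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def term_overlap(topical_docs, causal_docs):
    """Jaccard overlap between the vocabularies of two document sets."""
    topical_vocab = {t for d in topical_docs for t in tokenize(d)}
    causal_vocab = {t for d in causal_docs for t in tokenize(d)}
    union = topical_vocab | causal_vocab
    return len(topical_vocab & causal_vocab) / len(union) if union else 0.0


def cosine(doc_a, doc_b):
    """Cosine similarity between the raw term-frequency vectors of two documents."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy documents, for illustration only.
topical = ["floods hit the city after heavy rain", "rescue teams reach flooded city"]
causal = ["heavy deforestation upstream increased river runoff in the city"]

print(round(term_overlap(topical, causal), 3), round(cosine(topical[0], causal[0]), 3))
```

A low overlap and a low cosine score between the two sets is the kind of signal that suggests causally relevant documents share little vocabulary with topically relevant ones.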
Initial Investigations
A Factored Causal Relevance Model (SIGIR’20)
We Propose - FCRLM
A feedback model that estimates a distribution over terms that are relatively infrequent but carry high weights in the topically relevant distribution; such terms point to potential causal relevance.
– Datta, S., Ganguly, D., Roy, D., Bonin, F., Jochim, C. and Mitra, M., 2020,
July. Retrieving potential causes from a query event. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in
Information Retrieval (pp. 1689-1692).
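The intuition above (terms that are infrequent yet highly weighted in the topically relevant distribution) can be illustrated with a toy reweighting. This is NOT the FCRLM estimator from the paper: it assumes an RM1-style relevance model built from hypothetical top-k feedback documents with uniform document weights, multiplied by a smoothed IDF-style rarity factor. Both choices and all function names are hypothetical; see the paper and repository for the actual model.

```python
from collections import Counter
import math


def relevance_model(feedback_docs):
    """RM1-style estimate with uniform document weights (a simplifying assumption):
    P(w|R) = (1/k) * sum over d of tf(w, d) / |d|."""
    k = len(feedback_docs)
    p_w_r = Counter()
    for doc in feedback_docs:
        tokens = doc.lower().split()
        for term, tf in Counter(tokens).items():
            p_w_r[term] += tf / (len(tokens) * k)
    return p_w_r


def rarity_weighted(p_w_r, collection):
    """Hypothetical reweighting: multiply P(w|R) by a smoothed IDF factor so that
    highly weighted but collection-rare terms rise to the top, then renormalise."""
    n = len(collection)
    doc_vocabs = [set(d.lower().split()) for d in collection]
    scores = {}
    for term, p in p_w_r.items():
        df = sum(term in vocab for vocab in doc_vocabs)
        scores[term] = p * math.log((n + 1) / (df + 1))
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total > 0 else scores


# Hypothetical toy collection; the first two documents act as feedback documents.
collection = [
    "the river flooded the town",
    "the dam released water",
    "heavy rain in the hills",
    "the town council met",
]
weights = rarity_weighted(relevance_model(collection[:2]), collection)
print(sorted(weights, key=weights.get, reverse=True)[:3])
```

A term such as "the" carries the most mass in P(w|R) but appears in every document, so the rarity factor drives its weight to zero, while rarer feedback terms move up.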
Model Architecture
Available at: https://github.com/suchanadatta/Factored-Causal-RLM.git
How Does FCRLM Work?
FCRLM Provides Causally Relevant Information
Performance of FCRLM
CAIR - A Shared Task
For the dataset: https://github.com/suchanadatta/CAIR-DataSet.git
Datta, S., Ganguly, D., Roy, D., Greene, D., Jochim, C. and Bonin, F., 2020, December. Overview of the Causality-driven Adhoc Information Retrieval (CAIR) task at FIRE-2020. In Forum for Information Retrieval Evaluation (pp. 14-17).
Datta, S., Ganguly, D., Roy, D. and Greene, D., 2021, December. Overview of the Causality-driven Adhoc Information Retrieval (CAIR) task at FIRE-2021. In Forum for Information Retrieval Evaluation (pp. 25-27).
We Want More Fine-grained Information
Datta, S., Greene, D., Ganguly, D., Roy, D. and Mitra, M., 2020. Where’s the
Why? In Search of Chains of Causes for Query Events. In AICS (pp. 109-120).
Dataset Construction
Which piece of text is a Query?
QPP (Query Performance Prediction): predicting how well the retrieved documents satisfy the information need behind a query.
A ‘Pointwise-Query, Listwise-Document’-based QPP Approach (SIGIR’22)
We propose - qppBERT-PL
An end-to-end neural cross-encoder-based approach, trained pointwise on individual queries but listwise over the top-ranked documents (split into chunks).
– Datta, S., MacAvaney, S., Ganguly, D. and Greene, D. A ‘Pointwise-Query, Listwise-Document’-based Query Performance Prediction Approach (to appear in the proceedings of SIGIR’22).
What Do We Propose? - qppBERT-PL
A novel architecture and objective function for pointwise neural QPP.
Transforms the pointwise QPP objective into a classification task rather than a regression task.
Models the top-ranked documents as a sequence of chunks (listwise) rather than as a single set.
Incorporates the relative positions (i.e., ranks) of the top documents.
End-to-end Architecture of qppBERT-PL
Popular unsupervised QPP methods (e.g., NQC, WIG) work well when they use information from the top-100 documents.
Encoding a long sequence of 100 documents is likely to be noisy.
The top-ranked set is therefore segmented into equal-sized partitions (chunks).
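NQC and WIG, the unsupervised predictors mentioned above, can be sketched from their commonly used definitions: NQC is the standard deviation of the top-k retrieval scores divided by the magnitude of the corpus-level score s(D), and WIG is the mean gap between the top-k scores and s(D), scaled by 1/sqrt(|q|). The sketch assumes the retrieval scores and s(D) are already available (e.g., from a query-likelihood run); the numbers below are made up, and k = 4 stands in for the top-100 used in practice.

```python
import math


def nqc(top_scores, corpus_score):
    """Normalized Query Commitment: std. dev. of the top-k scores divided by |s(D)|."""
    k = len(top_scores)
    mu = sum(top_scores) / k
    std = math.sqrt(sum((s - mu) ** 2 for s in top_scores) / k)
    return std / abs(corpus_score)


def wig(top_scores, corpus_score, query_length):
    """Weighted Information Gain: mean of (s(d) - s(D)) over the top-k, divided by sqrt(|q|)."""
    k = len(top_scores)
    return sum(s - corpus_score for s in top_scores) / (k * math.sqrt(query_length))


# Hypothetical query-likelihood (log) scores of the top-4 documents and of the corpus.
scores = [-4.0, -4.5, -5.0, -6.5]
corpus = -10.0
print(nqc(scores, corpus), wig(scores, corpus, query_length=4))
```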
End-to-end Architecture of qppBERT-PL
A BERT-based cross-encoder models the interactions between the query and the document terms of each chunk.
The resulting sequence of interactions is encoded with an LSTM.
Ranks are encoded via BERT positional embeddings.
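The input side described on the last two slides, splitting the top-ranked list into equal-sized chunks and keeping each document's rank, can be sketched as plain data preparation. The chunk size p, the handling of a shorter final chunk, and the helper names are assumptions for illustration; in qppBERT-PL the query-document interactions and rank encodings are computed inside the model (cross-encoder and positional embeddings), which this sketch does not reproduce.

```python
def chunk_ranked_list(ranked_docs, p):
    """Split a ranked list into consecutive chunks of size p, keeping 1-based ranks.
    A shorter final chunk is kept as-is (an assumption; it could also be padded or dropped)."""
    return [
        [(rank, doc) for rank, doc in enumerate(ranked_docs[start:start + p], start=start + 1)]
        for start in range(0, len(ranked_docs), p)
    ]


def pair_with_query(query, chunk):
    """Hypothetical per-chunk input: one (query, document) pair per document, each
    tagged with its rank so a model could look up a rank/position embedding."""
    return [{"rank": rank, "query": query, "doc": doc} for rank, doc in chunk]


docs = [f"doc{i}" for i in range(1, 101)]  # hypothetical top-100 ranked list
chunks = chunk_ranked_list(docs, p=4)
print(len(chunks), chunks[0], chunks[-1][-1])
print(pair_with_query("why did the river flood", chunks[1])[0])
```

Keeping ranks alongside documents lets the order within and across chunks survive the split, which is what the rank embeddings rely on.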
End-to-end Architecture of qppBERT-PL
The encoded representation is passed through a fully connected (FC) layer.
The network terminates in a (p + 1)-dimensional softmax representing the probability of finding r relevant documents within the p-sized chunk (r ∈ {0, 1, …, p}).
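Framing pointwise QPP as classification means each chunk receives a label r, the number of relevant documents among its p documents, and the network outputs a distribution over the p + 1 classes. The sketch below derives that label from relevance judgments and applies a numerically stable softmax and cross-entropy to a vector of p + 1 logits. The logits and judgments are made-up placeholders standing in for the FC layer's output, and deriving labels this way is a reading of the slide, not the released training code.

```python
import math


def chunk_label(chunk_doc_ids, relevant_ids):
    """Class label r in {0, ..., p}: the number of relevant documents in the chunk."""
    return sum(doc_id in relevant_ids for doc_id in chunk_doc_ids)


def softmax(logits):
    """Numerically stable softmax over the p + 1 class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Classification loss for a single chunk."""
    return -math.log(probs[label])


p = 4
chunk = ["d1", "d2", "d3", "d4"]
qrels = {"d2", "d4", "d9"}              # hypothetical relevant documents
logits = [0.1, 0.3, 1.2, 0.4, -0.5]     # hypothetical FC-layer output, length p + 1
probs = softmax(logits)
r = chunk_label(chunk, qrels)
print(r, round(cross_entropy(probs, r), 4))
```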
End-to-end Architecture of qppBERT-PL
A weighted average is computed over the network outputs predicted for each p-sized partition of the top documents.
The aggregated scores are used to sort the queries in descending order of predicted performance.
Available at: https://github.com/suchanadatta/qppBERT-PL.git
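The aggregation step can be sketched as follows, under two loudly stated assumptions: each chunk's (p + 1)-way distribution is collapsed to the expected number of relevant documents, E[r] = Σ r·P(r), and chunk i receives a hypothetical weight of 1/i so that higher-ranked chunks count more. The weighting actually used by qppBERT-PL may differ; the per-chunk outputs below are made up.

```python
def expected_relevant(probs):
    """Collapse a chunk's (p + 1)-way distribution to E[r] = sum over r of r * P(r)."""
    return sum(r * pr for r, pr in enumerate(probs))


def query_score(chunk_probs):
    """Weighted average over chunks; the 1/i weight for the i-th chunk is a
    hypothetical choice that favours higher-ranked chunks."""
    weights = [1.0 / i for i in range(1, len(chunk_probs) + 1)]
    total = sum(w * expected_relevant(pr) for w, pr in zip(weights, chunk_probs))
    return total / sum(weights)


# Hypothetical per-chunk outputs (p = 2, so 3 classes) for two queries.
outputs = {
    "q1": [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]],
    "q2": [[0.6, 0.3, 0.1], [0.8, 0.2, 0.0]],
}
scores = {q: query_score(chunks) for q, chunks in outputs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # descending predicted performance
print(ranking)
```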
Performance of qppBERT-PL
qppBERT-PL predicts query performance more effectively than the supervised and unsupervised methods it was compared against.
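QPP methods are usually compared by the rank correlation between their predicted scores and the actual per-query retrieval effectiveness (e.g., average precision). A minimal Kendall's τ (the tau-a variant, without tie correction) is sketched below on made-up values; it illustrates the evaluation idea only and does not reproduce the results reported on this slide. In practice a library routine such as `scipy.stats.kendalltau` would typically be used instead.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical predicted QPP scores and actual AP values for five queries.
predicted = [0.9, 0.4, 0.7, 0.2, 0.5]
actual_ap = [0.62, 0.38, 0.41, 0.05, 0.35]
print(kendall_tau_a(predicted, actual_ap))
```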
Performance of qppBERT-PL
Sequence modeling, chunking, and rank embeddings are critical components of qppBERT-PL.
Concluding Remarks
FCRLM - a feedback model that estimates a distribution over terms that are relatively infrequent but carry high weights in the topically relevant distribution, pointing to potential causal relevance.
We would like to incorporate adaptive feedback as a feature that helps users decide whether or not to use feedback.
qppBERT-PL - the first QPP contribution that transforms the pointwise QPP objective into a classification task.
We are interested in exploring ways to aggregate information from short passages to predict QPP scores for longer documents.
Thank you!
Many thanks to the IR Group at the University of Glasgow for inviting me.
For any questions, please e-mail me at:
suchana.datta@ucdconnect.ie