CLIC-it 2020 - Seventh Italian Conference on Computational Linguistics - Bologna, 1 - 3 March, 2021 - ONLINE Conference
Joint work with Cataldo Musto, Marco De Gemmis, Pasquale Lops, Giovanni Semeraro
Università degli Studi di Bari Aldo Moro
SWAP Research Group - https://www.di.uniba.it/~swap
Exploiting Distributional Semantics Models for Natural Language Context-aware Justifications for Recommender Systems
1. EXPLOITING DISTRIBUTIONAL SEMANTICS MODELS FOR NATURAL LANGUAGE CONTEXT-AWARE JUSTIFICATIONS FOR RECOMMENDER SYSTEMS
GIUSEPPE SPILLO, CATALDO MUSTO, MARCO DE GEMMIS, PASQUALE LOPS, GIOVANNI SEMERARO
UNIVERSITÀ DEGLI STUDI DI BARI «ALDO MORO»
SWAP RESEARCH GROUP – HTTPS://WWW.DI.UNIBA.IT/~SWAP
CLIC-it 2020 – Seventh Italian Conference on Computational Linguistics
Bologna, 1-3 March 2021 – ONLINE Conference
linkedin.com/giuseppe-spillo-89542b1b6/
@spillo_giuseppe
2. INTRODUCTION
In this project we present a methodology to generate context-aware natural language justifications that support the suggestions produced by a recommendation algorithm, exploiting distributional semantics models.
[Figure: an example recommendation – "I suggest you Lost in Translation"]
3. WHY CONTEXT-AWARE JUSTIFICATIONS?
Intuition: just as the selection of the most suitable item is influenced by the context of usage, a justification that supports a recommendation should vary depending on the contextual situation in which the item will be consumed.
[Figure: the "company" context and its possible values – alone, friends, couple]
4. AN EXAMPLE OF CONTEXT-AWARE JUSTIFICATION
I recommend you Lost in Translation because people who liked this movie think that…
Context: low attention
…it's suitable if you don't want to be focused on it, because it is simple in plot and direction.
Context: couple
…it's perfect to spend an evening in sweet company, because this entirely unexpected ending is one of the most romantic and hopeful moments you will ever see on screen.
5. DISTRIBUTIONAL SEMANTICS MODELS
We designed a natural language processing pipeline that exploits distributional semantics models (DSMs) to build a term-context matrix that encodes the importance of terms and concepts in each contextual dimension.
In this way we can obtain a vector space representation of each context, which is used to identify the most suitable pieces of information to be combined in a justification.
6. JUSTIFICATIONS, NOT EXPLANATIONS
We are not referring to explanations, since ours are post-hoc justifications: a recommender system suggests an item, and our framework generates a justification that is independent of the recommendation mechanism but adapts to the user's context of consumption.
In this way we can justify a recommendation even for items that do not have a minimum number of ratings.
7. THE PIPELINE OF THE FRAMEWORK
CONTEXT LEARNER: uses DSMs to learn a vector space representation of each context.
RANKER: implements a scoring mechanism to identify the most suitable review excerpts that can support the recommendation.
GENERATOR: puts together the previously retrieved pieces of information and provides the user with a context-aware justification of the item.
9. THE CONTEXT LEARNER
As said before, the idea of this module is to construct a context learner by exploiting Distributional Semantics Models.
Given a set of terms 𝑇 = {𝑡1, … , 𝑡𝑛} and a set of contexts 𝐶 = {𝑐1, … , 𝑐𝑘}, this module constructs a term-context matrix 𝐶𝑛,𝑘 that encodes the importance of each term 𝑡𝑖 for each context 𝑐𝑗.
How can we construct this matrix?
10. MANUAL ANNOTATION
Starting from a set 𝑅 of user reviews, we split each review 𝑟 ∈ 𝑅 into sentences to obtain a set of sentences 𝑆.
Then, given this set of sentences 𝑆, we manually annotated a subset of them to obtain a set 𝑆′ = {𝑠1, 𝑠2, … , 𝑠𝑚}, where each 𝑠𝑖 is labelled with one or more contextual settings, based on the concepts mentioned in the sentence. Each 𝑠𝑖 can be annotated with more than one context.

Sent | c1 | c2 | c3 | c4 | c5
s1   | ✓  |    |    | ✓  |
s2   |    | ✓  |    |    | ✓
s3   | ✓  |    |    | ✓  |
s4   |    | ✓  | ✓  |    | ✓
11. MANUAL ANNOTATION
For example, the sentence 'very romantic movie' can be annotated with the context company=couple.
The intuition is that a sentence expressing the concept of "romantic" can be very useful to support the recommendation of an item to a user who expresses her desire to spend time with her partner.
12. SENTENCE-CONTEXT MATRIX
In this way we built the sentence-context matrix 𝐴𝑚,𝑘, in which each entry 𝐴𝑠𝑖,𝑐𝑗 is equal to 1 if the sentence 𝑠𝑖 is annotated with the context 𝑐𝑗 (i.e., the concepts mentioned in that sentence are relevant for that particular context), and 0 otherwise.

Sent | c1 | c2 | c3 | c4 | c5
s1   | 1  | 0  | 0  | 1  | 0
s2   | 0  | 1  | 0  | 0  | 1
s3   | 1  | 0  | 0  | 1  | 0
s4   | 0  | 1  | 1  | 0  | 1
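The construction of this binary matrix can be sketched as follows; the sentence labels are the hypothetical ones from the example matrix, not real annotated data:

```python
import numpy as np

# Manually annotated sentences (hypothetical labels, following the
# example matrix): each sentence maps to its relevant contexts.
annotations = {
    "s1": ["c1", "c4"],
    "s2": ["c2", "c5"],
    "s3": ["c1", "c4"],
    "s4": ["c2", "c3", "c5"],
}
sentences = sorted(annotations)
contexts = ["c1", "c2", "c3", "c4", "c5"]

# A[i, j] = 1 if sentence i is annotated with context j, 0 otherwise.
A = np.array([[1 if c in annotations[s] else 0 for c in contexts]
              for s in sentences])
```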
13. TERM-SENTENCE MATRIX
The next step is to split all the annotated sentences 𝑠 ∈ 𝑆′ into terms 𝑡𝑖 ∈ 𝑇 = {𝑡1, 𝑡2, … , 𝑡𝑛}, to identify the specific concepts expressed in each annotated sentence, and to build a term-sentence matrix 𝑉𝑛,𝑚.
Each value of this matrix contains the TF-IDF score of the term 𝑡𝑖 in the sentence 𝑠𝑗; the IDF values are computed on all the annotated sentences.
We also used NLP techniques to reduce the size of the vocabulary of terms, including tokenization, lemmatization, and POS-tag filtering, using the CoreNLP framework.
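A minimal sketch of this step, assuming the sentences have already been tokenized and lemmatized (the slides use CoreNLP for that; here plain token lists are taken as input):

```python
import math
from collections import Counter

def term_sentence_matrix(sentences):
    """Build a TF-IDF term-sentence matrix V (terms x sentences).

    `sentences` is a list of token lists (already lemmatized/filtered).
    TF is the raw count of the term in the sentence; IDF is the
    canonical log(m / df) computed over all annotated sentences.
    """
    vocab = sorted({t for s in sentences for t in s})
    m = len(sentences)
    df = Counter(t for s in sentences for t in set(s))
    idf = {t: math.log(m / df[t]) for t in vocab}
    V = [[s.count(t) * idf[t] for s in sentences] for t in vocab]
    return vocab, V
```

For example, with three annotated sentences the matrix has one row per lemma and one column per sentence, and a term occurring in every sentence gets an IDF of zero.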
15. TERM-CONTEXT MATRIX
Once obtained the term-sentence matrix 𝑉𝑛,𝑚 and the sentence-context matrix 𝐴𝑚,𝑘, it is possible to compute the term-context matrix 𝐶𝑛,𝑘 by simply multiplying them:

$$
C_{n,k} = V_{n,m} \times A_{m,k} =
\begin{pmatrix} v_{1,1} & \cdots & v_{1,m} \\ \vdots & \ddots & \vdots \\ v_{n,1} & \cdots & v_{n,m} \end{pmatrix}
\times
\begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,k} \end{pmatrix}
=
\begin{pmatrix} c_{1,1} & \cdots & c_{1,k} \\ \vdots & \ddots & \vdots \\ c_{n,1} & \cdots & c_{n,k} \end{pmatrix}
$$

Term     | Good mood | High attention | Alone | Couple
happy    | 3.4       | 1.5            | 1.7   | 2.4
fun      | 2.8       | 1.3            | 2.1   | 2.8
focusing | 1.0       | 3.9            | 2.6   | 0.4
romantic | 1.1       | 0.7            | 0.4   | 4.7
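The multiplication can be sketched with NumPy; the toy matrices below are illustrative, not taken from the slides:

```python
import numpy as np

# Toy term-sentence TF-IDF matrix V (n terms x m sentences) and
# binary sentence-context matrix A (m sentences x k contexts).
V = np.array([[1.2, 0.0, 0.9],   # happy
              [0.0, 1.5, 0.0],   # focusing
              [2.1, 0.0, 0.0]])  # romantic
A = np.array([[1, 0],   # s1 -> good mood
              [0, 1],   # s2 -> high attention
              [1, 0]])  # s3 -> good mood

# Term-context matrix: C[i, j] sums the TF-IDF of term i over the
# sentences annotated with context j.
C = V @ A
```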
16. CONTEXT VECTORS
We can obtain two different outputs.
First, we can extract column vectors from matrix 𝐶. Each column vector 𝑐𝑗 represents the vector space representation of the context 𝑐𝑗, obtained by exploiting DSMs.

Term     | Good mood | High attention | Alone | Couple
happy    | 3.4       | 1.5            | 1.7   | 2.4
fun      | 2.8       | 1.3            | 2.1   | 2.8
focusing | 1.0       | 3.9            | 2.6   | 0.4
romantic | 1.1       | 0.7            | 0.4   | 4.7

Column vectors of the contexts «good mood» and «couple»
17. LEXICON GENERATED
We can obtain two different outputs.
Second, we can obtain the lexicon of a contextual dimension by extracting the first k lemmas with the highest TF-IDF scores for each column.

Term     | Good mood | Couple
happy    | 3.4       | 2.4
fun      | 2.8       | 2.8
focusing | 1.0       | 0.4
romantic | 1.1       | 4.7

Sorting by scores:
Good mood: happy, fun, focusing, romantic
Couple: romantic, fun, happy, focusing
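Extracting the lexicon of a context then amounts to sorting a column of the term-context matrix; a sketch using the scores from the example table:

```python
import numpy as np

# Term-context scores from the example table
# (rows: terms; columns: good mood, couple).
terms = ["happy", "fun", "focusing", "romantic"]
C = np.array([[3.4, 2.4],
              [2.8, 2.8],
              [1.0, 0.4],
              [1.1, 4.7]])

def context_lexicon(C, terms, col, k):
    """Return the k lemmas with the highest scores in context column `col`."""
    order = np.argsort(C[:, col])[::-1]  # indices sorted by descending score
    return [terms[i] for i in order[:k]]
```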
19. THE RANKER
Given the set of contextual vectors generated by the context learner, a recommended item with its reviews, and the current contextual situation of the user, the aim of the ranker is to choose from the user reviews the most relevant sentences for the current contexts of the user, which will then be included in the justification.
[Figure: given the candidate sentences "Very romantic movie", "Engaging plot", and "Perfect for a relaxing night", the ranker selects "Very romantic movie" for the context company=couple]
20. SENTENCE REPRESENTATION
To establish the relevance of a sentence for a context, we used the cosine similarity between the vector representation of the context (given by the context learner) and the vector representation of the sentence, which is built at this step.
We chose only sentences with a positive sentiment: this has been decided because the justification has to convince the user to consume the item.
Since each contextual vector has an n-dimensional representation, the ranker has to build the same n-dimensional representation for the sentence.
21. SENTENCE VECTOR
To build this representation, the reviews of an item are first split into sentences.
Then, these sentences are filtered by sentiment (only positive ones are kept), tokenized, and lemmatized (as done before).
Finally, the vector 𝑠𝑖 is instantiated in the same space defined by the term-context matrix 𝐶𝑛,𝑘.
In particular, 𝑠𝑖 = (𝑣𝑡1, 𝑣𝑡2, … , 𝑣𝑡𝑛)ᵀ, where each 𝑣𝑡𝑗 represents the TF-IDF score of the term 𝑡𝑗 (TF counts how many times 𝑡𝑗 appears in 𝑠𝑖, while IDF is calculated in the canonical way).
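A minimal sketch of this instantiation, assuming the vocabulary and IDF values computed by the context learner are available as `vocab` and `idf` (hypothetical names):

```python
def sentence_vector(tokens, vocab, idf):
    """Instantiate a sentence in the space defined by the term-context
    matrix: one TF-IDF component per term of the vocabulary.

    `tokens` is the lemmatized, sentiment-filtered sentence; terms
    outside the vocabulary contribute nothing.
    """
    return [tokens.count(t) * idf.get(t, 0.0) for t in vocab]
```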
22. COSINE SIMILARITY
At this point, we have both the contextual vectors and the sentence vector representations, so it is possible to compute the cosine similarity between them.
The sentence with the highest cosine similarity score is considered the most relevant sentence for that context.
This is performed for each context of consumption of the user: one sentence will be chosen for each of them.
Let's see a practical example.
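The ranking step can be sketched as follows (the vector values in the test below are illustrative):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_sentences(sentence_vectors, context_vector):
    """Return the index of the sentence most relevant to the context,
    i.e. the one with the highest cosine similarity."""
    scores = [cosine(s, context_vector) for s in sentence_vectors]
    return int(np.argmax(scores))
```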
23. VISUAL EXAMPLE
Let us suppose that we instantiated two different sentence vectors, related to the same item:
s1 = 'the plot is really interesting and engaging'
s2 = 'wonderful love story'
Let us also suppose that the user's consumption contexts are:
𝑐1 = 'attention:high'
𝑐2 = 'company:couple'
24. VISUAL EXAMPLE
The closest sentence vector to the context vector c1 is s1, so s1 will be chosen for that context.
The closest sentence vector to the context vector c2 is s2, so s2 will be chosen for that context.
26. THE GENERATOR
The goal of the generator module is to put together the chosen sentences in a single natural language justification to be presented to the user.
The generated justifications are based on the combination of a fixed part, which is common to all the justifications, and a dynamic part that depends on the outputs returned by the previous steps.
The top-1 sentence for each current contextual dimension is selected, and the different sentences are merged by exploiting simple connectives, such as adverbs and conjunctions.
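A minimal sketch of this merging step; the fixed opening and the connective below are illustrative, not the exact templates used by the framework:

```python
def generate_justification(item, top_sentences):
    """Combine a fixed opening with the top-1 sentence per context,
    merged with a simple connective."""
    fixed = f"I recommend you {item} because people who liked this movie think that "
    # Strip trailing periods so the connective reads naturally.
    dynamic = "; moreover, ".join(s.rstrip(".") for s in top_sentences)
    return fixed + dynamic + "."
```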
27. THE GENERATOR
Following the previous example, and supposing that the recommended item is Lost in Translation, a real justification provided by this framework could be the one shown in the figure.
[Figure: example of a generated justification for Lost in Translation]
28. FILMANDO
We tested this methodology in the movie domain.
We defined a set of consumption contexts for movies and, given a set of movies with their reviews, we applied the pipeline and built a web app integrating the results.
29. CONTEXTS OF CONSUMPTION CHOSEN
We defined 3 different contextual dimensions, each of which can assume different values:
Attention level: high, low
Mood: good mood, bad mood
Company: alone, friends, couple
30. EXPERIMENT SPECIFICATIONS
For both context learners (the one based on matrix multiplication and the one based on PMI) we generated 3 kinds of matrix configurations:
The first, based on unigrams
The second, based on bigrams
The third, based on the combination of unigrams and bigrams
The intuition behind this choice is that two single words, taken alone, may assume one meaning, but considered together they could mean something else.
31. RESEARCH QUESTIONS
1) How effective are DSM-based justifications, varying different combinations of the parameters?
2) Do DSM-based justification algorithms obtain performance at least comparable to a static justification algorithm?
3) Do context-aware justifications obtain better performance than non-contextual justifications?
35. 1) EFFECTIVENESS OF THE MODEL
DSMs effectiveness

Question                                                                               | Unigrams | Bigrams | Unigrams + Bigrams
Transparency «I understood why the movie was suggested to me»                          | 3.38     | 3.81    | 3.64
Persuasion «The justification made the recommendation more convincing»                 | 3.56     | 3.62    | 3.54
Engagement «The justification allowed me to discover more information about the movie» | 3.54     | 3.72    | 3.70
Trust «The justification increased my trust in recommender systems»                    | 3.44     | 3.66    | 3.61

Bigrams behave better
36. 2) VS STATIC CONTEXTUAL BASELINE
Preferences: DSMs vs. static contextual baseline

             | CA + DSMs | Baseline | Indifferent
Transparency | 53.28%    | 38.10%   | 19.52%
Persuasion   | 24.10%    | 36.33%   | 19.57%
Engagement   | 49.31%    | 39.23%   | 11.56%
Trust        | 42.86%    | 39.31%   | 17.83%

Improvements over a contextual baseline based on a static lexicon, except for the engagement
37. 3) VS NON-CONTEXTUAL DISTRIBUTIONAL BASELINE
Preferences: DSMs vs. distributional non-contextual baseline

             | CA + DSMs | Baseline | Indifferent
Transparency | 52.38%    | 38.10%   | 19.32%
Persuasion   | 54.10%    | 36.33%   | 19.57%
Engagement   | 49.31%    | 39.23%   | 11.56%
Trust        | 42.86%    | 39.31%   | 17.83%

Great improvements over a non-contextual baseline based on DSMs
38. RECAP
The model seems to be appreciated by users.
A representation based on bigrams better captures the semantics of the different contexts of consumption.
Users tend to prefer context-aware justifications, and DSMs allow us to build a more effective representation.