Time is a crucial factor in distributional models of language and knowledge. For example, tracking word meaning shift and entity evolution has several applications, and time may sneak into similarity as computed with these models in ways that are difficult to control. In this presentation, we discuss two novel approaches to factor time into word and knowledge representations learned from text: explicit, with representations of temporal references (e.g., years, days, etc.), and implicit, with time-dependent representations of words and entities (e.g., amazon_1975 vs. amazon_2012). Finally, since this is an emerging field of research, we will discuss several open topics in this research domain.
FBK, Trento, 10/5/2019
Two Approaches to Factor Time into Word and Entity Representations Learned from Text
1. Two Approaches to Factor Time into
Word and Entity Representations
Learned from Text
Matteo Palmonari and Federico Bianchi
Department of Informatics, Systems and Communication,
University of Milan-Bicocca
Talk@FBK Trento - 10/5/2019
INSID&S Lab
Interaction and Semantics for
Innovation with Data & Services
2. Outline
● Learning Word and Entity Representations from Text
● Factoring Time into Word and Entity Representations
Learned from Text
● Time-dependent Word Representations
● Representation of Temporal Entities and Time-aware
Similarity
● Future Work
4. Knowledge Graphs & Semantics
● Knowledge Graphs:
○ large representations of structured
knowledge
○ < subject, predicate, object >
○ ~1.3 billion triples in DBpedia
○ symbols to refer to entities, types, and
relations
○ types organized in sub-types graphs
● Model-theoretic or rule-based semantics
[Figure: DBpedia-style KG fragment — Barack Obama –birthPlace→ Honolulu, Barack Obama –isMarriedTo→ Michelle Obama, Jay-Z; types: Politician, MusicalArtist, Person, City, Place, Agent, Thing]
5. ‘Traditional’ Semantics: Interpretation and Inference
Intuitive interpretation of symbols (remark: interpretation functions are
a bit more complex than this)
● Barack Obama: a symbol denoting a domain object
● Married to: a symbol representing a relation between pairs of
domain objects
● Politician: a symbol representing a set of domain objects
Interpretation of sentences and inference
● “Barack Obama is married to Michelle Obama” (S) is true if the
objects denoted by Barack Obama and Michelle Obama belong to
the set of married couples
● “All the friends of the husband are also friends of the wife” + S +
“Barack Obama is friend of Jay-Z”
○ “Michelle Obama is friend of Jay-Z”
Symbolic
Knowledge Representation
&
Reasoning
Credit: http://ontogenesis.knowledgeblog.org/1376
6. ‘Traditional’ Semantics: Interpretation and Inference
Difficult to answer other questions:
● Who’s the US president most similar to Barack Obama?
● Which concept is similar to the concept Politician?
● Who’s the equivalent of Barack Obama in France?
7. Distributional Semantics: Meaning from Usage
● “The meaning of a word is its use in the language” (Wittgenstein, 1953)
● “You shall know a word by the company it keeps” (Firth, 1957)
Distributional Hypothesis:
similar words tend to appear in similar contexts
8. Distributional Semantics: Meaning from Usage
(From Lenci & Evert): what’s the meaning of ‘bardiwac’?
‘Bardiwac’ is a heavy red alcoholic beverage made from grape
● He handed her a glass of bardiwac
● Beef dishes are made to complement the bardiwacs
● Nigel staggered to his feet, face flushed from too much bardiwac
● Malbec, one of the lesser-known bardiwac grapes, responds well to
Australia’s sunshine
● I dined on bread and cheese and this excellent bardiwac
● The drinks were delicious: blood-red bardiwac as well as light, sweet
Rhenish
‘Bardiwac’ appears in drinking-related contexts, close to words like ‘glass’ and ‘grape’
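The distributional hypothesis can be made concrete with plain co-occurrence counts. A minimal sketch (toy corpus and window size invented for illustration): words sharing contexts get high cosine similarity.

```python
from collections import Counter
from math import sqrt

# Toy corpus: 'bardiwac' shares contexts with 'wine', not with 'cat'.
sentences = [
    "he drank a glass of bardiwac".split(),
    "he drank a glass of wine".split(),
    "the bardiwac grapes grow well".split(),
    "the wine grapes grow well".split(),
    "the cat sleeps on the mat".split(),
]

def context_vector(word, window=2):
    """Count words co-occurring with `word` within +/- `window` tokens."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok == word:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

print(cosine(context_vector("bardiwac"), context_vector("wine")))  # high
print(cosine(context_vector("bardiwac"), context_vector("cat")))   # low
```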
9. Distributional Semantics of Words with Word2Vec
● Vector representations of words, i.e., word embeddings, are generated from a
text corpus using a neural network [Mikolov+, 2013]
cat
dog
The big black cat eats its food.
My little black cat sleeps all day.
Sometimes my dog eats too much!
● The neural network generates vectors so as to predict a target word given its
context, or, a context given a target word
● Similar words appear in similar contexts and have similar vectors
● Other algorithms to generate word representations exist, e.g., ELMo and BERT
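The "predict a target word given its context" setup can be sketched in a few lines. This is a toy forward pass with untrained random matrices (vocabulary and sizes invented), not a full word2vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "black", "cat", "dog", "eats", "food"]
idx = {w: i for i, w in enumerate(vocab)}
V, d = len(vocab), 4

# Word2vec keeps two matrices: input (context) vectors and target vectors.
W_in = rng.normal(size=(V, d))    # one row per context word
W_out = rng.normal(size=(V, d))   # one row per target word

def predict_target(context_words):
    """Average the context vectors, then score every word as the target."""
    h = np.mean([W_in[idx[w]] for w in context_words], axis=0)
    scores = W_out @ h
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()     # softmax over the vocabulary

p = predict_target(["the", "black", "eats"])
print(vocab[int(np.argmax(p))])    # untrained, so an arbitrary word
```

Training adjusts both matrices so that the softmax puts high probability on the true target; similar contexts then pull words toward similar vectors.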
10. Words vs. Entities
Paris has many meanings … 21 pages of cities named Paris in Wikipedia
KGs provide large inventories of entities to disambiguate names
11. Knowledge Graphs & Semantics
● Knowledge Graphs:
○ large representations of structured
knowledge
○ < subject, predicate, object >
○ ~1.3 billion triples in DBpedia
○ symbols to refer to entities, types, and
relations
○ types organized in sub-types graphs
● Model-theoretic or rule-based semantics
● Vector-based semantics by learning
representations of entities, types and relations
from data
○ TransE [Bordes+,2013], …, ComplEx [Trouillon+,2016]
○ Logic Tensor Networks [Serafini+2016]
○ ...
TEE: a model for representing entities and types
grounded in distributional semantics
12. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
13. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
Link to DBpedia
entities via named
entity linking tools
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
14. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
15. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
16. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Concatenate
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
17. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Concatenate
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
v(Rome) ⊕ v(City) = [1 3 6 3 1 | 9 5 6]
Wikipedia’s abstracts
19. TEE: Usefulness of Typed Entity Embeddings
Entity space: Rome = [1 3 6 3 1], Paris = [5 2 2 2 4], Italy = [1 3 4 9 1]
sim(Rome, Paris) = 0.65
sim(Rome, Italy) = 0.79
Joint space (entity ⊕ type):
sim(City_Rome, City_Paris) = 0.79
sim(City_Rome, Country_Italy) = 0.71
Rome in the joint space is now nearer to Paris than to Italy
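A toy illustration of how concatenating type vectors can reorder similarities. All vectors below are made up for the example (not the actual TEE values):

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy entity vectors (invented for illustration):
rome  = np.array([1.0, 0.0])
italy = np.array([1.0, 0.2])   # co-occurs heavily with Rome -> very similar
paris = np.array([0.6, 0.8])

city    = np.array([1.0, 0.0])  # toy type vectors
country = np.array([0.0, 1.0])

# Entity-only space: Rome looks closer to Italy than to Paris.
print(cos(rome, italy), cos(rome, paris))          # ~0.98 vs ~0.60

# Joint space: concatenate each entity with its most specific type.
rome_t  = np.concatenate([rome, city])
italy_t = np.concatenate([italy, country])
paris_t = np.concatenate([paris, city])
print(cos(rome_t, paris_t), cos(rome_t, italy_t))  # ~0.80 vs ~0.50
```

The shared type vector pulls the two cities together, flipping the ranking, which is the effect the slide reports.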
20. Analogies with Typed Entity Embeddings
● Accuracy for analogies with entities up to 0.92 on datasets used to test
word embeddings and adapted for entities
● Accuracy for analogies with words after disambiguating with entities up to
0.86 vs. 0.80 (best word2vec)
● Analogies with types and other interesting properties of type embeddings
discussed in previous work [Bianchi+2017]
Joint Work with Fabio Massimo Zanzotto, Università degli Studi di Roma 'Tor Vergata'
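Analogies over embeddings are typically answered with the 3CosAdd rule from the word-embedding literature [Mikolov+, 2013]. A toy sketch with hand-built (not trained) vectors:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy, hand-built vectors (invented for illustration):
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([0.1, 1.9]),
    "apple": np.array([1.0, -0.5]),
}

def analogy(a, b, c):
    """a : b = c : ?  via 3CosAdd: nearest neighbor of v(b) - v(a) + v(c)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], target))

print(analogy("man", "king", "woman"))  # -> queen
```

The same rule applies unchanged to entity vectors (e.g., dbr:Reagan : 1987 = ? : 1997).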
22. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Tracking word meaning shift and entity evolution
● apple_1953 vs apple_2017
● dbr:Apple_1990 more similar to “laptop” vs.
dbr:Apple_2018 more similar to “smartphone”
23. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Temporal analogies
● reagan is to 1987 as ? is to 1997 (clinton)
24. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Controlling the effect of time on similarity (time
sneaks into similarity in a way that may be difficult
to control)
● Should “dbr:Barack Obama” be more similar to
“dbr:Joe Biden” or “dbr:John F. Kennedy”?
25. • Word meanings are constantly evolving, reflecting the continuous
change of the world and the needs of its speakers
For example:
•apple:
fruit → computer → smartphone
•trump:
real estate → television → POTUS
Language Changes across Time…
27. Factoring Time into Representations based on DS
Explicit
● Representation of temporal
words/entities
■ E.g., 90s, 2012, 7AM,
8/9/1943, monday,
tomorrow
Implicit
● Time-dependent representations of
words/entities
■ E.g., amazon_1960 vs. amazon_2018;
dbr:Apple_1990 vs. dbr:Apple_2018
Implicit (words): [Di Carlo&al., AAAI’19]
Explicit (entities): [Bianchi&al., ISWC’18]
28. Factoring Time into Representations Based on DS
● Control time effect in similarity
● Tracking semantic evolution
● Temporal analogies
30. Temporal Word Embeddings
● Temporal word embeddings are vector representations of words during
specific temporal intervals (e.g. the year 2001, the day 3/28/2018)
● They are learned from diachronic text corpora, divided into multiple
temporal slices (e.g., news articles, social posts)
[Figure: corpus slices 1999, 2000, 2001, each yielding a vector clinton,1999, clinton,2000, clinton,2001]
Requires alignment of different vector spaces
31. Temporal word embedding models:
● One vector for each time slice (corpus partition);
● Capture meaning shift: clinton_1981 ≠ clinton_2001;
● Require alignment between models trained on
each temporal slice.
Alignment problem:
Analogy: two cartographers drawing a map starting
from different places and without a compass.
State-of-the-art vs. Temporal Word Embeddings with a Compass (TWEC):
● Pairwise-alignment: train each slice separately and then align them with linear transformations
[Kulkarni2015];
● Joint-alignment: train all the vectors concurrently, enforcing them to be aligned [Yao+2018];
● TWEC (this work): implicit alignment with a compass.
Training Temporal Word Embeddings with a Compass
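The pairwise-alignment strategy can be sketched with orthogonal Procrustes: learn a rotation mapping one slice's space onto another's. This is a minimal numpy sketch of the idea behind linear-transformation alignment (random toy matrices, not the actual pipeline of Kulkarni+2015):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two embedding spaces for the same vocabulary: B is A arbitrarily rotated,
# as happens when two word2vec runs start from different random seeds.
A = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
B = A @ Q

# Orthogonal Procrustes: find the orthogonal R minimizing ||B R - A||_F,
# via the SVD of B^T A.
U, _, Vt = np.linalg.svd(B.T @ A)
R = U @ Vt

print(np.allclose(B @ R, A))  # True: the rotated space is recovered
```

TWEC avoids this post-hoc step entirely: the shared frozen compass keeps all slices in one space during training.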
32. Training Temporal Word Embeddings with a Compass: Intuition
Word2vec comes in two flavors: Skip-gram
and the Continuous Bag-of-Words model (CBOW)
CBOW uses two matrices:
● input matrix
● target matrix
Intuition: fix one matrix while updating the
other matrix
33. Training Temporal Word Embeddings with a
Compass:
1. run CBOW on entire corpus
2. take target matrix (the compass)
3. use the target matrix to initialize the CBOW of
each slice and freeze it
4. each slice is trained separately and aligned
with the compass
Why use TWEC?
● Fast (generalization of CBOW)
● Easy to implement
● Good results with large and small corpora on:
○ Temporal analogical reasoning
○ Held-out tests
Held-out tests:
Log Likelihood:          Posterior Log Probability:
SW2V  -2.66              TW2V  -3.30
SBE   -1.77              OW2V  -3.30
TWEC  -2.69              TWEC  -2.80
DBE   -1.70*             DBE   -3.16

Temporal analogical reasoning (MRR):
       Large Corpus   Small Corpus
TWEC   0.484          0.481
TW2V   0.444          0.143
SW2V   0.283          0.375

Training Temporal Word Embeddings with a Compass: Details
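Steps 1–4 above can be sketched in a few lines. This is a drastically simplified stand-in (single-word contexts, full softmax, a random "compass" instead of one trained on the whole corpus; vocabulary size and pair ids invented), not the actual TWEC implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 20, 8  # toy vocabulary size and embedding dimension

def train_context(pairs, W_context, W_target, lr=0.1, epochs=50):
    """CBOW-style updates in which W_target is never touched (the compass)."""
    for _ in range(epochs):
        for ctx, tgt in pairs:
            h = W_context[ctx]                    # context vector
            scores = W_target @ h
            p = np.exp(scores - scores.max())
            p /= p.sum()
            p[tgt] -= 1.0                         # softmax cross-entropy gradient
            W_context[ctx] = h - lr * (W_target.T @ p)
    return W_context

# Steps 1-2: train once on the whole corpus and keep the target matrix.
# (Here the compass is just randomly initialized for brevity.)
compass = rng.normal(scale=0.1, size=(V, d))

# Steps 3-4: every slice trains its own context matrix against the frozen
# compass, so vectors from different slices land in one aligned space.
slices = {1999: [(0, 1), (0, 2)], 2001: [(0, 3), (0, 4)]}  # toy (context, target) ids
embeddings = {year: train_context(pairs, rng.normal(scale=0.1, size=(V, d)), compass)
              for year, pairs in slices.items()}
```

Because every slice is scored against the same frozen target matrix, no post-hoc alignment step is needed.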
34. Temporal Word Embeddings with a Compass: Example #1
[Figure: 2D projection of clinton,1999, clinton,2001, bush,1999, bush,2001; nearby words include president, senator, hillary, foundation, administration, texas, george, bill]
35. Temporal Word Embeddings with a Compass: Example #2
Each point is the representation of a president in a given year (e.g., bush_2001)
36. Explicit Representation of Temporal
Entities
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding
Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
37. ● First approach to explicitly encode time into entity embeddings (some parallel work in CogSci on the
representation of temporal words such as monday, tomorrow, etc.)
● Lack of control over the time effect in similarity evaluation:
○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur
more frequently
○ E.g., the most similar entities to “Winston Churchill” are his contemporary politicians (e.g., Harold Macmillan)
● Explicit encoding of time periods (year-level) to control the similarity with respect to time
Time & Similarity
39. Textual Descriptions of Time Periods via Events
“The succession of events is an inherent property of our time perception. Memory
is necessary, and the order of these events is fundamental”
Snaider&al. 2012, Cognitive Systems Research
40. Embedding Years from Event Descriptions
A year is represented by the set of entities taking part in the year’s events
The year vector is the average of the entities’ vectors found inside the description
42. Embedding Years from Event Descriptions
Adolf Hitler    4 3 6 2 3
Nazi Germany    5 1 2 9 2
World War II    1 2 8 4 1
43. Embedding Years from Event Descriptions
Adolf Hitler    4 3 6 2 3
Nazi Germany    5 1 2 9 2
World War II    1 2 8 4 1
1941 = AVG ≈ 3.3 2.0 5.3 5.0 2.0
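The averaging step, using the entity vectors from the slide:

```python
import numpy as np

# Entity vectors found in the description of the year's events (slide values):
entity_vectors = np.array([
    [4, 3, 6, 2, 3],   # Adolf Hitler
    [5, 1, 2, 9, 2],   # Nazi Germany
    [1, 2, 8, 4, 1],   # World War II
], dtype=float)

year_1941 = entity_vectors.mean(axis=0)
print(year_1941)   # -> approximately [3.33, 2, 5.33, 5, 2]
```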
44. Embedded Representations vs. Natural Time Flow
[Figure: 2D and 1D PCA projections of year vectors, with 191X years and 201X years clustering at opposite ends]
PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94
Good resemblance of natural time flow!
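The evaluation can be reproduced in miniature: project year vectors onto their first principal component and rank-correlate the projection with the year order. Year vectors below are invented (a smooth drift over time), and Kendall τ is implemented inline:

```python
import numpy as np

def kendall_tau(a, b):
    """Plain O(n^2) Kendall rank correlation between two sequences."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return s / (n * (n - 1) / 2)

# Toy year vectors that drift smoothly over time (invented for illustration).
years = np.arange(1900, 2000)
vecs = np.stack([years * 0.01, np.sin(years * 0.01)], axis=1)

# 1D PCA projection: center, then project onto the top principal component.
centered = vecs - vecs.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ Vt[0]

print(kendall_tau(proj, years))  # ~ +/-1 (sign depends on the SVD direction)
```

A |τ| close to 1 means the 1D projection preserves the natural order of the years, as the slide reports for the real year embeddings.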
45. Towards Time Aware Similarity
Time flattened similarity: reduce the impact of time on the similarity.
E.g., make US presidents similar independently of their temporal context.
Time boosted similarity: boost the impact of time on the similarity.
E.g., make politicians that share temporal contexts more similar.
47. Time Flattened Similarity
Extract the embeddings for the two entities
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
48. Time Flattened Similarity
1999 2003
Find the closest year vectors to the two entity
embeddings (e.g., the entity vector of Barack
Obama is close to the vector of the year
2003).
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
50. Time Flattened Similarity
1999 2003
ψ(e1, e2) = η(e1, e2)
η: cosine similarity
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
51. Time Flattened Similarity
1999 2003
ψ(e1, e2) = η(e1, e2) − η_n(y_1999, y_2003)
η_n: normalized cosine similarity
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
52. Time Flattened Similarity
1999 2003
ψ(e1, e2) = α·η(e1, e2) − (1 − α)·η_n(y_1999, y_2003)
α controls the weight of the time factor
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
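A sketch of time flattened similarity under these definitions. All vectors are invented for illustration, and plain cosine stands in for the normalized η_n:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy year and entity embeddings (invented for illustration):
year_vecs = {1999: np.array([1.0, 0.0, 0.2]),
             2003: np.array([0.9, 0.3, 0.2]),
             1950: np.array([0.0, 1.0, 0.2])}

clinton = np.array([1.0, 0.1, 0.9])   # temporally close to 1999
obama   = np.array([0.8, 0.3, 0.9])   # temporally close to 2003
truman  = np.array([0.1, 1.0, 0.9])   # temporally close to 1950

def closest_year(e):
    return max(year_vecs, key=lambda y: cos(e, year_vecs[y]))

def flattened_sim(e1, e2, alpha=0.5):
    """psi = alpha * sim(entities) - (1 - alpha) * sim(their closest years).
    Plain cosine is used here where the paper uses a normalized version."""
    temporal = cos(year_vecs[closest_year(e1)], year_vecs[closest_year(e2)])
    return alpha * cos(e1, e2) - (1 - alpha) * temporal

# Penalizing a shared time period promotes temporally distant look-alikes:
print(flattened_sim(obama, clinton), flattened_sim(obama, truman))
```

With plain cosine, obama is closer to clinton than to truman; after flattening, the temporally distant truman overtakes clinton, which is the reranking effect shown in the next slides.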
53. Controlling Time Bias: Flattened Similarity
Task: given an input entity, find entities that are similar but far in time. E.g.,
given a president, find past presidents.
Input: Barack Obama → Ford, Coolidge, Hoover, T. Kennedy, Truman
54. Controlling Time Bias: Flattened Similarity
Task: given an input entity, find entities that are similar but far in time. E.g.,
given a president, find past presidents.
Input: Barack Obama → Ford, Coolidge, Hoover, T. Kennedy, Truman (four correct, one wrong)
55. Controlling Time Bias: Qualitative Analysis
The most similar entities to Barack Obama using cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
56. Controlling Time Bias: Qualitative Analysis
Cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time flattened similarity (α = 0.7) reordering the top-100 most similar:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford*, Coolidge*, T. Kennedy*, Hoover* (* = new entries)
57. Controlling Time Bias: Qualitative Analysis
Cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time flattened similarity (α = 0.1) reordering the top-100 most similar:
Ford, Coolidge, Hoover, Truman, Roosevelt, Wilson, E. Roosevelt, Harding, Cleveland, Eisenhower (all new entries)
59. Future Work
● More testing of TWEC with entities
● Beyond time-based slicing: a framework for aspect-based
comparison of distributional models of words/entities
○ Trump_NYT vs. Trump_RussiaToday
○ flat_NYT vs. flat_TheGuardian
● Using TEE as source of “intuitive” knowledge to combine it
with Logic Tensor Networks for reasoning
61. References
Di Carlo, V., Bianchi, F. & Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. AAAI (to appear).
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International
Conference on Machine Learning (pp. 2071-2080).
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances
in neural information processing systems (pp. 2787-2795).
Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018, June). Type Vector Representations from Text: An empirical analysis. In Deep Learning for
Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference.
Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S. (2015, May). Statistically significant detection of linguistic change. In Proceedings of the 24th
International Conference on World Wide Web (pp. 625-635). International World Wide Web Conferences Steering Committee.
Yao, Z., Sun, Y., Ding, W., Rao, N., & Xiong, H. (2018, February). Dynamic word embeddings for evolving semantic discovery. In Proceedings of the
Eleventh ACM International Conference on Web Search and Data Mining (pp. 673-681). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In
Advances in neural information processing systems (pp. 3111-3119).