Time is a crucial factor in distributional models of language and knowledge. For example, tracking word meaning shift and entity evolution has several applications, and time may sneak into similarity as computed with these models in ways that are difficult to control. In this presentation, we discuss two novel approaches to factor time into word and knowledge representations learned from text: explicit, with representations of temporal references (e.g., years, days, etc.), and implicit, with time-dependent representations of words and entities (e.g., amazon_1975 vs. amazon_2012). Finally, since this is an emerging field of research, we will discuss several open topics in this research domain.
FBK, Trento, 10/5/2019
Two Approaches to Factor Time into Word and Entity Representations Learned from Text
1. Two Approaches to Factor Time into
Word and Entity Representations
Learned from Text
Matteo Palmonari and Federico Bianchi
Department of Informatics, Systems and Communication,
University of Milan-Bicocca
Talk@FBK Trento - 10/5/2019
INSID&S Lab
Interaction and Semantics for
Innovation with Data & Services
2. Outline
● Learning Word and Entity Representations from Text
● Factoring Time into Word and Entity Representations
Learned from Text
● Time-dependent Word Representations
● Representation of Temporal Entities and Time-aware
Similarity
● Future Work
4. Knowledge Graphs & Semantics
● Knowledge Graphs:
○ large representations of structured
knowledge
○ < subject, predicate, object >
○ ~1.3 billion triples in DBpedia
○ symbols to refer to entities, types, and
relations
○ types organized in sub-types graphs
● Model-theoretic or rule-based semantics
[Figure: DBpedia-style KG fragment — Barack Obama –birthPlace→ Honolulu, Barack Obama –isMarriedTo→ Michelle Obama, Jay-Z; types: Politician, MusicalArtist, Person, City, Place, Agent, Thing]
5. ‘Traditional’ Semantics: Interpretation and Inference
Intuitive interpretation of symbols (remark: interpretation functions are
a bit more complex than this)
● Barack Obama: a symbol denoting a domain object
● Married to: a symbol representing a relation between pairs of
domain objects
● Politician: a symbol representing a set of domain objects
Interpretation of sentences and inference
● “Barack Obama is married to Michelle Obama” (S) is true if the
objects denoted by Barack Obama and Michelle Obama belong to
the set of married couples
● “All the friends of the husband are also friends of the wife” + S +
“Barack Obama is friend of Jay-Z”
○ “Michelle Obama is friend of Jay-Z”
Symbolic
Knowledge Representation
&
Reasoning
Credit: http://ontogenesis.knowledgeblog.org/1376
6. ‘Traditional’ Semantics: Interpretation and Inference
Difficult to answer other questions:
● Who’s the US president most similar to Barack Obama?
● Which concept is similar to the concept Politician?
● Who’s the equivalent of Barack Obama in France?
7. Distributional Semantics: Meaning from Usage
● “The meaning of a word is its use in the language” (Wittgenstein, 1953)
● “You shall know a word by the company it keeps” (Firth, 1957)
Distributional Hypothesis:
similar words tend to appear in similar contexts
8. Distributional Semantics: Meaning from Usage
(From Lenci & Evert): what’s the meaning of ‘bardiwac’?
‘Bardiwac’ is a heavy red alcoholic beverage made from grape
● He handed her a glass of bardiwac
● Beef dishes are made to complement the bardiwacs
● Nigel staggered to his feet, face flushed from too much bardiwac
● Malbec, one of the lesser-known bardiwac grapes, responds well to
Australia’s sunshine
● I dined on bread and cheese and this excellent bardiwac
● The drinks were delicious: blood-red bardiwac as well as light, sweet
Rhenish
‘Bardiwac’ appears in drinking-related contexts, close to words like ‘glass’ and ‘grape’
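The distributional hypothesis can be made concrete with plain co-occurrence counts. A minimal sketch (toy corpus and window size invented for illustration): words sharing contexts get high cosine similarity.

```python
from collections import Counter
from math import sqrt

# Toy corpus: 'bardiwac' shares contexts with 'wine', not with 'cat'.
sentences = [
    "he drank a glass of bardiwac".split(),
    "he drank a glass of wine".split(),
    "the bardiwac grapes grow well".split(),
    "the wine grapes grow well".split(),
    "the cat sleeps on the mat".split(),
]

def context_vector(word, window=2):
    """Count words co-occurring with `word` within +/- `window` tokens."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok == word:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

print(cosine(context_vector("bardiwac"), context_vector("wine")))  # high
print(cosine(context_vector("bardiwac"), context_vector("cat")))   # low
```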
9. Distributional Semantics of Words with Word2Vec
● Vector representations of words, i.e., word embeddings, are generated from a
text corpus using a neural network [Mikolov+, 2013]
cat
dog
The big black cat eats its food.
My little black cat sleeps all day.
Sometimes my dog eats too much!
● The neural network generates vectors so as to predict a target word given its
context, or, a context given a target word
● Similar words appear in similar contexts and have similar vectors
● Other algorithms to generate word representations exist, e.g., ELMo and BERT
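The "predict a target word given its context" setup can be sketched in a few lines. This is a toy forward pass with untrained random matrices (vocabulary and sizes invented), not a full word2vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "black", "cat", "dog", "eats", "food"]
idx = {w: i for i, w in enumerate(vocab)}
V, d = len(vocab), 4

# Word2vec keeps two matrices: input (context) vectors and target vectors.
W_in = rng.normal(size=(V, d))    # one row per context word
W_out = rng.normal(size=(V, d))   # one row per target word

def predict_target(context_words):
    """Average the context vectors, then score every word as the target."""
    h = np.mean([W_in[idx[w]] for w in context_words], axis=0)
    scores = W_out @ h
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()     # softmax over the vocabulary

p = predict_target(["the", "black", "eats"])
print(vocab[int(np.argmax(p))])    # untrained, so an arbitrary word
```

Training adjusts both matrices so that the softmax puts high probability on the true target; similar contexts then pull words toward similar vectors.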
10. Words vs. Entities
Paris has many meanings … 21 pages of cities named Paris in Wikipedia
KGs provide large inventories of entities to disambiguate names
11. Knowledge Graphs & Semantics
● Knowledge Graphs:
○ large representations of structured
knowledge
○ < subject, predicate, object >
○ ~1.3 billion triples in DBpedia
○ symbols to refer to entities, types, and
relations
○ types organized in sub-types graphs
● Model-theoretic or rule-based semantics
● Vector-based semantics by learning
representations of entities, types and relations
from data
○ TransE [Bordes+,2013], …, ComplEx [Trouillon+,2016]
○ Logic Tensor Networks [Serafini+2016]
○ ...
TEE: a model for representing entities and types
grounded in distributional semantics
12. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
13. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
Link to DBpedia
entities via named
entity linking tools
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
14. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
15. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
16. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Concatenate
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
Wikipedia’s abstracts
17. TEE: Typed Entity Embeddings from Text
“Rome is the capital of
Italy and a special
comune (named
Comune di Roma
Capitale). Rome also
serves as the capital of
the Lazio region.”
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
“dbo:City dbo:Country dbo:Administrative_Region …”
Generate Type Vectors From Text
Generate Entity Vectors From Text
Concatenate
Link to DBpedia
entities via named
entity linking tools
Replace
entities
with their most
specific types
[Bianchi+,2017b]
[Bianchi+, 2018a]
v(Rome) ⊕ v(City) = [1 3 6 3 1 | 9 5 6]
Wikipedia’s abstracts
19. TEE: Usefulness of Typed Entity Embeddings
Entity space: Rome = [1 3 6 3 1], Paris = [5 2 2 2 4], Italy = [1 3 4 9 1]
sim(Rome, Paris) = 0.65
sim(Rome, Italy) = 0.79
Joint space (entity ⊕ type):
sim(City_Rome, City_Paris) = 0.79
sim(City_Rome, Country_Italy) = 0.71
Rome in the joint space is now nearer to Paris than to Italy
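A toy illustration of how concatenating type vectors can reorder similarities. All vectors below are made up for the example (not the actual TEE values):

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy entity vectors (invented for illustration):
rome  = np.array([1.0, 0.0])
italy = np.array([1.0, 0.2])   # co-occurs heavily with Rome -> very similar
paris = np.array([0.6, 0.8])

city    = np.array([1.0, 0.0])  # toy type vectors
country = np.array([0.0, 1.0])

# Entity-only space: Rome looks closer to Italy than to Paris.
print(cos(rome, italy), cos(rome, paris))          # ~0.98 vs ~0.60

# Joint space: concatenate each entity with its most specific type.
rome_t  = np.concatenate([rome, city])
italy_t = np.concatenate([italy, country])
paris_t = np.concatenate([paris, city])
print(cos(rome_t, paris_t), cos(rome_t, italy_t))  # ~0.80 vs ~0.50
```

The shared type vector pulls the two cities together, flipping the ranking, which is the effect the slide reports.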
20. Analogies with Typed Entity Embeddings
● Accuracy for analogies with entities up to 0.92 on datasets used to test
word embeddings and adapted for entities
● Accuracy for analogies with words after disambiguating with entities up to
0.86 vs. 0.80 (best word2vec)
● Analogies with types and other interesting properties of type embeddings
discussed in previous work [Bianchi+2017]
Joint Work with Fabio Massimo Zanzotto, Università degli Studi di Roma 'Tor Vergata'
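Analogies over embeddings are typically answered with the 3CosAdd rule from the word-embedding literature [Mikolov+, 2013]. A toy sketch with hand-built (not trained) vectors:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy, hand-built vectors (invented for illustration):
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([0.1, 1.9]),
    "apple": np.array([1.0, -0.5]),
}

def analogy(a, b, c):
    """a : b = c : ?  via 3CosAdd: nearest neighbor of v(b) - v(a) + v(c)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(emb[w], target))

print(analogy("man", "king", "woman"))  # -> queen
```

The same rule applies unchanged to entity vectors (e.g., dbr:Reagan : 1987 = ? : 1997).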
22. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Tracking word meaning shift and entity evolution
● apple_1953 vs apple_2017
● dbr:Apple_1990 more similar to “laptop” vs.
dbr:Apple_2018 more similar to “smartphone”
23. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Temporal analogies
● reagan is to 1987 as ? is to 1997 (clinton)
24. Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Controlling the effect of time on similarity (time
sneaks into similarity in a way that may be difficult
to control)
● Should “dbr:Barack Obama” be more similar to
“dbr:Joe Biden” or “dbr:John F. Kennedy”?
25. • Word meanings are constantly evolving, reflecting the continuous
change of the world and the needs of its speakers
For example:
•apple:
fruit → computer → smartphone
•trump:
real estate → television → POTUS
Language Changes across Time…
27. Factoring Time into Representations based on DS
Explicit
● Representation of temporal
words/entities
■ E.g., 90s, 2012, 7AM,
8/9/1943, monday,
tomorrow
Implicit
● Time-dependent representations of
words/entities
■ E.g., amazon_1960 vs. amazon_2018;
dbr:Apple_1990 vs. dbr:Apple_2018
Implicit (words): [Di Carlo&al., AAAI’19]
Explicit (entities): [Bianchi&al., ISWC’18]
28. Factoring Time into Representations Based on DS
● Control time effect in similarity
● Tracking semantic evolution
● Temporal analogies
30. Temporal Word Embeddings
● Temporal word embeddings are vector representations of words during
specific temporal intervals (e.g. the year 2001, the day 3/28/2018)
● They are learned from diachronic text corpora, divided into multiple
temporal slices (e.g., news articles, social posts)
[Figure: corpus slices 1999, 2000, 2001, each yielding a vector clinton,1999, clinton,2000, clinton,2001]
Requires alignment of different vector spaces
31. Temporal word embedding models:
● One vector for each time slice (corpus partition);
● Capture meaning shift: clinton_1981 ≠ clinton_2001;
● Require alignment between models trained on
each temporal slice.
Alignment problem:
Analogy: two cartographers drawing a map starting
from different places and without a compass.
State-of-the-art vs. Temporal Word Embeddings with a Compass (TWEC):
● Pairwise-alignment: train each slice separately and then align them with linear transformations
[Kulkarni2015];
● Joint-alignment: train all the vectors concurrently, enforcing them to be aligned [Yao+2018];
● TWEC (this work): implicit alignment with a compass.
Training Temporal Word Embeddings with a Compass
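The pairwise-alignment strategy can be sketched with orthogonal Procrustes: learn a rotation mapping one slice's space onto another's. This is a minimal numpy sketch of the idea behind linear-transformation alignment (random toy matrices, not the actual pipeline of Kulkarni+2015):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two embedding spaces for the same vocabulary: B is A arbitrarily rotated,
# as happens when two word2vec runs start from different random seeds.
A = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
B = A @ Q

# Orthogonal Procrustes: find the orthogonal R minimizing ||B R - A||_F,
# via the SVD of B^T A.
U, _, Vt = np.linalg.svd(B.T @ A)
R = U @ Vt

print(np.allclose(B @ R, A))  # True: the rotated space is recovered
```

TWEC avoids this post-hoc step entirely: the shared frozen compass keeps all slices in one space during training.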
32. Training Temporal Word Embeddings with a Compass: Intuition
Word2vec comes in two flavors: Skip-gram
and the Continuous Bag-of-Words model (CBOW)
CBOW uses two matrices:
● input matrix
● target matrix
Intuition: fix one matrix while updating the
other matrix
33. Training Temporal Word Embeddings with a
Compass:
1. run CBOW on entire corpus
2. take target matrix (the compass)
3. use the target matrix to initialize the CBOW of
each slice and freeze it
4. each slice is trained separately and aligned
with the compass
Why use TWEC?
● Fast (generalization of CBOW)
● Easy to implement
● Good results with large and small corpora on:
○ Temporal analogical reasoning
○ Held-out tests
Held-out tests:
Log Likelihood:          Posterior Log Probability:
SW2V  -2.66              TW2V  -3.30
SBE   -1.77              OW2V  -3.30
TWEC  -2.69              TWEC  -2.80
DBE   -1.70*             DBE   -3.16

Temporal analogical reasoning (MRR):
       Large Corpus   Small Corpus
TWEC   0.484          0.481
TW2V   0.444          0.143
SW2V   0.283          0.375

Training Temporal Word Embeddings with a Compass: Details
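Steps 1–4 above can be sketched in a few lines. This is a drastically simplified stand-in (single-word contexts, full softmax, a random "compass" instead of one trained on the whole corpus; vocabulary size and pair ids invented), not the actual TWEC implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 20, 8  # toy vocabulary size and embedding dimension

def train_context(pairs, W_context, W_target, lr=0.1, epochs=50):
    """CBOW-style updates in which W_target is never touched (the compass)."""
    for _ in range(epochs):
        for ctx, tgt in pairs:
            h = W_context[ctx]                    # context vector
            scores = W_target @ h
            p = np.exp(scores - scores.max())
            p /= p.sum()
            p[tgt] -= 1.0                         # softmax cross-entropy gradient
            W_context[ctx] = h - lr * (W_target.T @ p)
    return W_context

# Steps 1-2: train once on the whole corpus and keep the target matrix.
# (Here the compass is just randomly initialized for brevity.)
compass = rng.normal(scale=0.1, size=(V, d))

# Steps 3-4: every slice trains its own context matrix against the frozen
# compass, so vectors from different slices land in one aligned space.
slices = {1999: [(0, 1), (0, 2)], 2001: [(0, 3), (0, 4)]}  # toy (context, target) ids
embeddings = {year: train_context(pairs, rng.normal(scale=0.1, size=(V, d)), compass)
              for year, pairs in slices.items()}
```

Because every slice is scored against the same frozen target matrix, no post-hoc alignment step is needed.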
34. Temporal Word Embeddings with a Compass: Example #1
[Figure: 2D projection of clinton,1999, clinton,2001, bush,1999, bush,2001; nearby words include president, senator, hillary, foundation, administration, texas, george, bill]
35. Temporal Word Embeddings with a Compass: Example #2
Each point is the representation of a president in a given year (e.g., bush_2001)
36. Explicit Representation of Temporal
Entities
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding
Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
37. ● First approach to explicitly encode time into entity embeddings (some parallel work in CogSci on the
representation of temporal words such as monday, tomorrow, etc.)
● Lack of control over the time effect in similarity evaluation:
○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur
more frequently
○ E.g., the most similar entities to “Winston Churchill” are his contemporary politicians (e.g., Harold Macmillan)
● Explicit encoding of time periods (year-level) to control the similarity with respect to time
Time & Similarity
39. Textual Descriptions of Time Periods via Events
“The succession of events is an inherent property of our time perception. Memory
is necessary, and the order of these events is fundamental”
Snaider&al. 2012, Cognitive Systems Research
40. Embedding Years from Event Descriptions
A year is represented by the set of entities taking part in the year’s events
The year vector is the average of the entities’ vectors found inside the description
42. Embedding Years from Event Descriptions
Adolf Hitler    4 3 6 2 3
Nazi Germany    5 1 2 9 2
World War II    1 2 8 4 1
43. Embedding Years from Event Descriptions
Adolf Hitler    4 3 6 2 3
Nazi Germany    5 1 2 9 2
World War II    1 2 8 4 1
1941 = AVG ≈ 3.3 2.0 5.3 5.0 2.0
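The averaging step, using the entity vectors from the slide:

```python
import numpy as np

# Entity vectors found in the description of the year's events (slide values):
entity_vectors = np.array([
    [4, 3, 6, 2, 3],   # Adolf Hitler
    [5, 1, 2, 9, 2],   # Nazi Germany
    [1, 2, 8, 4, 1],   # World War II
], dtype=float)

year_1941 = entity_vectors.mean(axis=0)
print(year_1941)   # -> approximately [3.33, 2, 5.33, 5, 2]
```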
44. Embedded Representations vs. Natural Time Flow
[Figure: 2D and 1D PCA projections of year vectors, with 191X years and 201X years clustering at opposite ends]
PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94
Good resemblance of natural time flow!
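The evaluation can be reproduced in miniature: project year vectors onto their first principal component and rank-correlate the projection with the year order. Year vectors below are invented (a smooth drift over time), and Kendall τ is implemented inline:

```python
import numpy as np

def kendall_tau(a, b):
    """Plain O(n^2) Kendall rank correlation between two sequences."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return s / (n * (n - 1) / 2)

# Toy year vectors that drift smoothly over time (invented for illustration).
years = np.arange(1900, 2000)
vecs = np.stack([years * 0.01, np.sin(years * 0.01)], axis=1)

# 1D PCA projection: center, then project onto the top principal component.
centered = vecs - vecs.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ Vt[0]

print(kendall_tau(proj, years))  # ~ +/-1 (sign depends on the SVD direction)
```

A |τ| close to 1 means the 1D projection preserves the natural order of the years, as the slide reports for the real year embeddings.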
45. Towards Time Aware Similarity
Time flattened similarity: reduce the impact of time on the similarity.
E.g., make US presidents similar independently of their temporal context.
Time boosted similarity: boost the impact of time on the similarity.
E.g., make politicians that share temporal contexts more similar.
47. Time Flattened Similarity
Extract the embeddings for the two entities
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
48. Time Flattened Similarity
1999 2003
Find the closest year vectors to the two entity
embeddings (e.g., the entity vector of Barack
Obama is close to the vector of the year
2003).
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
50. Time Flattened Similarity
1999 2003
ψ(e1, e2) = η(e1, e2)
η: cosine similarity
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
51. Time Flattened Similarity
1999 2003
ψ(e1, e2) = η(e1, e2) − η_n(y_1999, y_2003)
η_n: normalized cosine similarity
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
52. Time Flattened Similarity
1999 2003
ψ(e1, e2) = α·η(e1, e2) − (1 − α)·η_n(y_1999, y_2003)
α controls the weight of the time factor
What’s the time flattened similarity between
Barack Obama and Bill Clinton?
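A sketch of time flattened similarity under these definitions. All vectors are invented for illustration, and plain cosine stands in for the normalized η_n:

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy year and entity embeddings (invented for illustration):
year_vecs = {1999: np.array([1.0, 0.0, 0.2]),
             2003: np.array([0.9, 0.3, 0.2]),
             1950: np.array([0.0, 1.0, 0.2])}

clinton = np.array([1.0, 0.1, 0.9])   # temporally close to 1999
obama   = np.array([0.8, 0.3, 0.9])   # temporally close to 2003
truman  = np.array([0.1, 1.0, 0.9])   # temporally close to 1950

def closest_year(e):
    return max(year_vecs, key=lambda y: cos(e, year_vecs[y]))

def flattened_sim(e1, e2, alpha=0.5):
    """psi = alpha * sim(entities) - (1 - alpha) * sim(their closest years).
    Plain cosine is used here where the paper uses a normalized version."""
    temporal = cos(year_vecs[closest_year(e1)], year_vecs[closest_year(e2)])
    return alpha * cos(e1, e2) - (1 - alpha) * temporal

# Penalizing a shared time period promotes temporally distant look-alikes:
print(flattened_sim(obama, clinton), flattened_sim(obama, truman))
```

With plain cosine, obama is closer to clinton than to truman; after flattening, the temporally distant truman overtakes clinton, which is the reranking effect shown in the next slides.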
53. Controlling Time Bias: Flattened Similarity
Task: given an input entity, find entities that are similar but far in time. E.g.,
given a president, find past presidents.
Input: Barack Obama → Ford, Coolidge, Hoover, T. Kennedy, Truman
54. Controlling Time Bias: Flattened Similarity
Task: given an input entity, find entities that are similar but far in time. E.g.,
given a president, find past presidents.
Input: Barack Obama → Ford, Coolidge, Hoover, T. Kennedy, Truman (four correct, one wrong)
55. Controlling Time Bias: Qualitative Analysis
The most similar entities to Barack Obama using cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
56. Controlling Time Bias: Qualitative Analysis
Cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time flattened similarity (α = 0.7) reordering the top-100 most similar:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford*, Coolidge*, T. Kennedy*, Hoover* (* = new entries)
57. Controlling Time Bias: Qualitative Analysis
Cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time flattened similarity (α = 0.1) reordering the top-100 most similar:
Ford, Coolidge, Hoover, Truman, Roosevelt, Wilson, E. Roosevelt, Harding, Cleveland, Eisenhower (all new entries)
59. Future Work
● More testing of TWEC with entities
● Beyond time-based slicing: a framework for aspect-based
comparison of distributional models of words/entities
○ Trump_NYT vs. Trump_RussiaToday
○ flat_NYT vs. flat_TheGuardian
● Using TEE as source of “intuitive” knowledge to combine it
with Logic Tensor Networks for reasoning
61. References
Di Carlo, V., Bianchi, F. & Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. AAAI (to appear).
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International
Conference on Machine Learning (pp. 2071-2080).
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances
in neural information processing systems (pp. 2787-2795).
Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018, June). Type Vector Representations from Text: An empirical analysis. In Deep Learning for
Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference.
Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S. (2015, May). Statistically significant detection of linguistic change. In Proceedings of the 24th
International Conference on World Wide Web (pp. 625-635). International World Wide Web Conferences Steering Committee.
Yao, Z., Sun, Y., Ding, W., Rao, N., & Xiong, H. (2018, February). Dynamic word embeddings for evolving semantic discovery. In Proceedings of the
Eleventh ACM International Conference on Web Search and Data Mining (pp. 673-681). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In
Advances in neural information processing systems (pp. 3111-3119).