Two Approaches to Factor Time into
Word and Entity Representations
Learned from Text
Matteo Palmonari and Federico Bianchi
Department of Informatics, Systems and Communication,
University of Milan-Bicocca
Talk@FBK Trento - 10/5/2019
INSID&S Lab
Interaction and Semantics for
Innovation with Data & Services
Outline
● Learning Word and Entity Representations from Text
● Factoring Time into Word and Entity Representations
Learned from Text
● Time-dependent Word Representations
● Representation of Temporal Entities and Time-aware
Similarity
● Future Work
Learning Word and Entity
Representations from Text
Knowledge Graphs & Semantics
● Knowledge Graphs:
○ large representations of structured
knowledge
○ < subject, predicate, object >
○ ~1.3 billion triples in DBpedia
○ symbols to refer to entities, types, and
relations
○ types organized in sub-types graphs
● Model-theoretic or rule-based semantics
(Figure: example knowledge graph with entities Barack Obama, Michelle Obama, Jay-Z, Honolulu; types Politician, Person, City, Musical Artist, Place, Agent, Thing; relations birthPlace, isMarriedTo)
‘Traditional’ Semantics: Interpretation and Inference
Intuitive interpretation of symbols (remark: interpretation functions are
a bit more complex than this)
● Barack Obama: a symbol denoting a domain object
● Married to: a symbol representing a relation between pairs of
domain objects
● Politician: a symbol representing a set of domain objects
Interpretation of sentences and inference
● “Barack Obama is married to Michelle Obama” (S) is true if the
objects denoted by Barack Obama and Michelle Obama belong to
the set of married couples
● “All the friends of the husband are also friends of the wife” + S +
“Barack Obama is friend of Jay-Z”
○ “Michelle Obama is friend of Jay-Z”
Symbolic Knowledge Representation & Reasoning
Credit: http://ontogenesis.knowledgeblog.org/1376
(Figure: Barack Obama, Michelle Obama, Jay-Z)
Difficult to answer other questions:
● Who’s the US president most similar to Barack Obama?
● Which concept is similar to the concept Politician?
● Who’s the equivalent of Barack Obama in France?
Distributional Semantics: Meaning from Usage
● “The meaning of a word is its use in the language” (Wittgenstein, 1953)
● “You shall know a word by the company it keeps” (Firth, 1957)
Distributional Hypothesis:
similar words tend to appear in similar contexts
Distributional Semantics: Meaning from Usage
(From Lenci & Evert): what’s the meaning of ‘bardiwac’?
‘Bardiwac’ is a heavy red alcoholic beverage made from grapes
● He handed her a glass of bardiwac
● Beef dishes are made to complement the bardiwacs
● Nigel staggered to his feet, face flushed from too much bardiwac
● Malbec, one of the lesser-known bardiwac grapes, responds well to
Australia’s sunshine
● I dined on bread and cheese and this excellent bardiwac
● The drinks were delicious: blood-red bardiwac as well as light, sweet
Rhenish
‘Bardiwac’ appears in drinking-related contexts, close to words like ‘glass’ and ‘grape’
Distributional Semantics of Words with Word2Vec
● Vector representations of words, i.e., word embeddings, are generated from a
text corpus using a neural network [Mikolov+, 2013]
The big black cat eats its food.
My little black cat sleeps all day.
Sometimes my dog eats too much!
(Figure: ‘cat’ and ‘dog’ end up close in the vector space)
● The neural network generates vectors so as to predict a target word given its
context, or, a context given a target word
● Similar words appear in similar contexts and have similar vectors
● Other algorithms that generate word representations exist, e.g., ELMo and BERT
(Figure legend: target word vs. context words)
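As a minimal sketch of this idea, the following trains CBOW vectors on the toy sentences above using the gensim library; the corpus size and hyperparameters are illustrative only (real models need far more text):

```python
# A minimal sketch: training word vectors on a toy corpus with gensim.
# Corpus and hyperparameters are illustrative, not those used in the talk.
from gensim.models import Word2Vec

sentences = [
    "the big black cat eats its food".split(),
    "my little black cat sleeps all day".split(),
    "sometimes my dog eats too much".split(),
]

# sg=0 selects CBOW: predict a target word from its context window.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Words appearing in similar contexts end up with similar vectors.
print(model.wv.similarity("cat", "dog"))
```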
Words vs. Entities
Paris has many meanings … 21 pages of cities named Paris in Wikipedia
KGs provide large inventories of entities to disambiguate names
Knowledge Graphs & Semantics
● Vector-based semantics by learning
representations of entities, types and relations
from data
○ TransE [Bordes+2013], …, ComplEx [Trouillon+2016]
○ Logic Tensor Networks [Serafini+2016]
○ ...
TEE: a model for representing entities and types
grounded in distributional semantics
TEE: Typed Entity Embeddings from Text [Bianchi+, 2017b] [Bianchi+, 2018a]
Input: Wikipedia’s abstracts, e.g.:
“Rome is the capital of Italy and a special comune (named Comune di
Roma Capitale). Rome also serves as the capital of the Lazio region.”
1. Link to DBpedia entities via named entity linking tools:
“dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
2. Replace entities with their most specific types:
“dbo:City dbo:Country dbo:Administrative_Region …”
3. Generate entity vectors from the entity-linked text and type vectors from the typed text
4. Concatenate each entity vector with the vector of its most specific type: v(City) ⊕ v(Rome)
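A minimal sketch of the four steps, assuming the entity linking and type replacement have already been run; the toy corpora below stand in for the linked Wikipedia abstracts, and the dimensions are illustrative:

```python
# A sketch of the TEE pipeline on pre-linked toy corpora; not the
# authors' implementation, and the inputs are illustrative.
import numpy as np
from gensim.models import Word2Vec

# Step 1 output: abstracts with surface forms replaced by DBpedia entities.
entity_corpus = [["dbr:Rome", "dbr:Italy", "dbr:Rome", "dbr:Lazio"]]
# Step 2 output: the same corpus with entities replaced by their types.
type_corpus = [["dbo:City", "dbo:Country", "dbo:City",
                "dbo:Administrative_Region"]]

# Step 3: train the two spaces separately.
entity_model = Word2Vec(entity_corpus, vector_size=100, min_count=1)
type_model = Word2Vec(type_corpus, vector_size=100, min_count=1)

# Step 4: a typed entity embedding is the type vector concatenated
# with the entity vector, e.g., v(City) ⊕ v(Rome).
def tee_vector(entity, most_specific_type):
    return np.concatenate([type_model.wv[most_specific_type],
                           entity_model.wv[entity]])

v_rome = tee_vector("dbr:Rome", "dbo:City")
```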
Vector-space Representations of Words vs. Entities
(Figure: 2D projections of the word space and of the entity space)
TEE: Usefulness of Typed Entity Embeddings
(Figure: illustrative 5-dimensional vectors for the entities Rome, Paris and Italy and for the types City and Country, concatenated into typed entity vectors)
sim(Rome, Paris) = 0.65
sim(Rome, Italy) = 0.79
sim(City_Rome, City_Paris) = 0.79
sim(City_Rome, Country_Italy) = 0.71
Rome in the joint space is now nearer to Paris than to Italy
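A small numpy sketch of why concatenation changes the ranking; the vectors are made-up toy values chosen to reproduce the qualitative effect on the slide, not actual embeddings:

```python
# Toy demonstration: sharing a type vector pulls same-typed entities together.
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rome = np.array([1, 3, 6, 3, 1]); paris = np.array([5, 2, 2, 2, 4])
italy = np.array([1, 3, 4, 9, 1])
city = np.array([5, 5, 1, 3, 1]); country = np.array([1, 3, 6, 3, 1])

# Entity space alone: Rome comes out closer to Italy than to Paris.
print(cos(rome, paris), cos(rome, italy))  # ~0.61 vs. ~0.80

# Joint space: the shared type City pulls Rome and Paris together.
print(cos(np.concatenate([city, rome]), np.concatenate([city, paris])))     # ~0.81
print(cos(np.concatenate([city, rome]), np.concatenate([country, italy])))  # ~0.71
```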
Analogies with Typed Entity Embeddings
● Accuracy for analogies with entities up to 0.92 on datasets used to test
word embeddings and adapted for entities
● Accuracy for analogies with words after disambiguating with entities up to
0.86 vs. 0.80 (best word2vec)
● Analogies with types and other interesting properties of type embeddings
discussed in previous work [Bianchi+2017]
Joint Work with Fabio Massimo Zanzotto, Università degli Studi di Roma 'Tor Vergata'
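For reference, such analogies are typically evaluated with the vector-offset test; a sketch with gensim's most_similar, assuming an entity model trained on a full entity-linked corpus (the identifiers are illustrative):

```python
# Vector-offset analogy over entity embeddings: Paris - France + Italy ≈ Rome.
# Assumes `entity_model` was trained on a full entity-linked corpus.
result = entity_model.wv.most_similar(
    positive=["dbr:Paris", "dbr:Italy"],
    negative=["dbr:France"],
    topn=1,
)
print(result)  # expected to rank dbr:Rome first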
Factoring Time into Word and Entity
Representations Learned from Text
Why Care about Time?
● Time is a key factor in word/entity semantics and in the
evaluation of similarity
● Time-sensitive applications:
■ Tracking word meaning shift and entity evolution
● apple_1953 vs. apple_2017
● dbr:Apple_1990 more similar to “laptop” vs.
dbr:Apple_2018 more similar to “smartphone”
■ Temporal analogies
● reagan is to 1987 as ? is to 1997 (clinton)
■ Controlling the effect of time on similarity (time
sneaks into similarity in a way that may be difficult
to control)
● Should “dbr:Barack Obama” be more similar to
“dbr:Joe Biden” or “dbr:John F. Kennedy”?
Language Changes across Time…
• Word meanings are constantly evolving, reflecting the continuous
change of the world and the needs of its speakers. For example:
• apple: fruit → computer → smartphone
• trump: real estate → television → POTUS
Temporal Word Analogies: Examples
(figure)
Factoring Time into Representations Based on DS
Explicit
● Representation of temporal words/entities
■ E.g., 90s, 2012, 7AM, 8/9/1943, monday, tomorrow
Implicit
● Time-dependent representations of words/entities
■ E.g., amazon_1960 vs. amazon_2018; dbr:Apple_1990 vs. dbr:Apple_2018
(Matrix: Words × Entities; Implicit [Di Carlo & al., AAAI’19]; Explicit [Bianchi & al., ISWC’18])
● Control the time effect in similarity
● Tracking semantic evolution
● Temporal analogies
Training Time-dependent Word
Representations with a Compass
Di Carlo, V., Bianchi, F., & Palmonari, M. (2019). Training Temporal Word Embeddings
with a Compass. In AAAI 2019.
Temporal Word Embeddings
● Temporal word embeddings are vector representations of words during
specific temporal intervals (e.g. the year 2001, the day 3/28/2018)
● They are learned from diachronic text corpora, divided into multiple
temporal slices (e.g., news articles, social posts)
(Figure: slices 1999, 2000, 2001, each yielding its own vector: clinton,1999 / clinton,2000 / clinton,2001)
Requires alignment of different vector spaces
Training Temporal Word Embeddings with a Compass
Temporal word embedding models:
● One vector for each time slice (corpus partition);
● Capture meaning shift: clinton_1981 ≠ clinton_2001;
● Require alignment between models trained on each temporal slice.
Alignment problem, by analogy: two cartographers drawing a map,
starting from different places and without a compass.
State of the art vs. Temporal Word Embeddings with a Compass (TWEC):
● Pairwise alignment: train each slice separately, then align the models with linear transformations [Kulkarni+2015];
● Joint alignment: train all the vectors concurrently, enforcing them to be aligned [Yao+2018];
● TWEC (this work): implicit alignment with a compass.
Training Temporal Word Embeddings with a Compass: Intuition
Word2vec comes in two flavors: Skip-gram and the
Continuous Bag-of-Words model (CBOW)
CBOW uses two matrices:
● an input (context) matrix
● a target matrix
Intuition: fix one matrix while updating the
other matrix
Training Temporal Word Embeddings with a Compass: Details
1. Run CBOW on the entire corpus
2. Take the target matrix (the compass)
3. Use the target matrix to initialize the CBOW of
each slice and freeze it
4. Each slice is trained separately and stays aligned
with the compass
Why use TWEC?
● Fast (a generalization of CBOW)
● Easy to implement
● Good results with large and small corpora on:
○ Temporal analogical reasoning
○ Held-out tests
Held-out tests:
  Log likelihood:            SW2V -2.66   SBE -1.77    TWEC -2.69   DBE -1.70*
  Posterior log probability: TW2V -3.30   OW2V -3.30   TWEC -2.80   DBE -3.16
Temporal analogies (MRR):
  Large corpus: TWEC 0.484   TW2V 0.444   SW2V 0.283
  Small corpus: TWEC 0.481   TW2V 0.143   SW2V 0.375
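To make steps 1-4 concrete, here is a self-contained toy CBOW with negative sampling in which the freezing is explicit; this is a didactic sketch of the compass idea, not the authors' released implementation (which extends gensim's CBOW):

```python
# Didactic sketch of TWEC: train the target matrix U once on the whole
# corpus (the compass), then freeze it while each slice trains its own
# input matrix. Toy CBOW with negative sampling; not the released code.
import numpy as np

rng = np.random.default_rng(0)

def train_cbow(corpus, vocab, dim=20, window=2, neg=3, lr=0.05,
               epochs=50, U=None):
    """Train CBOW; if U (the target matrix) is given, keep it frozen."""
    V = rng.normal(scale=0.1, size=(len(vocab), dim))      # input matrix
    freeze = U is not None
    if U is None:
        U = rng.normal(scale=0.1, size=(len(vocab), dim))  # target matrix
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        for sent in corpus:
            ids = [vocab[w] for w in sent]
            for pos, target in enumerate(ids):
                ctx = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
                if not ctx:
                    continue
                h = V[ctx].mean(axis=0)                    # averaged context
                pairs = [(target, 1.0)] + [
                    (int(rng.integers(len(vocab))), 0.0) for _ in range(neg)]
                grad_h = np.zeros(dim)
                for j, label in pairs:
                    g = lr * (sigmoid(U[j] @ h) - label)
                    grad_h += g * U[j]
                    if not freeze:
                        U[j] -= g * h                      # update the compass
                V[ctx] -= grad_h / len(ctx)                # update slice vectors
    return V, U

# Steps 1-2: train on the whole corpus, keep the target matrix as compass.
slice_1999 = [["clinton", "is", "president"], ["bush", "governs", "texas"]]
slice_2001 = [["bush", "is", "president"], ["clinton", "leaves", "office"]]
vocab = {w: i for i, w in enumerate(
    sorted({w for s in slice_1999 + slice_2001 for w in s}))}
_, compass = train_cbow(slice_1999 + slice_2001, vocab)

# Steps 3-4: train each slice against the frozen compass; the resulting
# slice-specific vectors live in the same space and are comparable.
V_1999, _ = train_cbow(slice_1999, vocab, U=compass)
V_2001, _ = train_cbow(slice_2001, vocab, U=compass)
```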
Temporal Word Embeddings with a Compass: Example #1
(Figure: 2D projection of clinton,1999 / clinton,2001 and bush,1999 / bush,2001, together with context words: president, senator, hillary, foundation, administration, texas, george, bill)
Temporal Word Embeddings with a Compass: Example #2
(Figure: each point is the representation of a president in a given year, e.g., bush_2001)
Explicit Representation of Temporal
Entities
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding
Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
Time & Similarity
● First approach to explicitly encode time into entity embeddings (some parallel work in CogSci on the
representation of temporal words such as monday, tomorrow, etc.)
● Lack of control over the time effect in similarity evaluation:
○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur
more frequently
○ E.g., the most similar entities to “Winston Churchill” are his contemporary politicians
● Explicit encoding of time periods (year level) to control similarity with respect to time
(Figure: Winston Churchill and Harold Macmillan)
Textual Descriptions of Time Periods via Events
“The succession of events is an inherent property of our time perception. Memory
is necessary, and the order of these events is fundamental”
Snaider & al. 2012, Cognitive Systems Research
Embedding Years from Event Descriptions
A year is represented by the set of entities taking part in the year’s events.
The year vector is the average of the entities’ vectors found inside the description.
E.g., for 1941 (illustrative 5-dimensional vectors):
Adolf Hitler [4 3 6 2 3]
Nazi Germany [5 1 2 9 2]
World War II [1 2 8 4 1]
AVG → 1941 ≈ [3.3 2.0 5.3 5.0 2.0]
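A minimal sketch of the averaging step with the toy vectors from the slide; in the real system the entity vectors come from the TEE space and the entity list comes from the year's event description:

```python
# Year embedding = average of the vectors of the entities mentioned in
# the textual description of the year's events. Toy values from the slide.
import numpy as np

entity_vectors = {
    "Adolf Hitler": np.array([4, 3, 6, 2, 3], dtype=float),
    "Nazi Germany": np.array([5, 1, 2, 9, 2], dtype=float),
    "World War II": np.array([1, 2, 8, 4, 1], dtype=float),
}

def year_vector(entities_in_description):
    return np.mean([entity_vectors[e] for e in entities_in_description], axis=0)

v_1941 = year_vector(["Adolf Hitler", "Nazi Germany", "World War II"])
print(v_1941)  # [3.33 2.   5.33 5.   2.  ]
```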
Embedded Representations vs. Natural Time Flow
(Figure: 2D and 1D PCA projections of the year vectors, spanning the 191X years to the 201X years)
PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94
Good resemblance of the natural time flow!
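The correlation check can be reproduced along these lines; the random matrix below is only a placeholder for the real year embeddings, on which the slide reports τ = 0.80 and ρ = 0.94:

```python
# Sketch of the evaluation: project year vectors to 1D with PCA and compare
# the induced order with the natural order of years. Placeholder embeddings.
import numpy as np
from scipy.stats import kendalltau, spearmanr
from sklearn.decomposition import PCA

years = np.arange(1910, 2018)
year_vectors = np.random.rand(len(years), 100)  # replace with real embeddings

coords = PCA(n_components=1).fit_transform(year_vectors).ravel()
print(kendalltau(years, coords))  # tau = 0.80 on the real year vectors
print(spearmanr(years, coords))   # rho = 0.94 on the real year vectors
```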
Towards Time-Aware Similarity
Time-flattened similarity: reduce the impact of time on similarity.
E.g., make US presidents similar independently of their temporal context.
Time-boosted similarity: boost the impact of time on similarity.
E.g., make politicians that share temporal contexts more similar.
Time-Flattened Similarity
What’s the time-flattened similarity between Barack Obama and Bill Clinton?
1. Extract the embeddings of the two entities.
2. Find the closest year vector to each entity embedding (e.g., the entity
vector of Bill Clinton is closest to the vector of the year 1999, and that of
Barack Obama to the vector of the year 2003).
3. Combine entity similarity and year similarity:

ψ(e1, e2) = α · η(e1, e2) − (1 − α) · ηn(y1, y2)

where η is the cosine similarity between the entity vectors, ηn is the
normalized cosine similarity between their closest year vectors (here 1999
and 2003), and α controls the weight of the time factor.
Controlling Time Bias: Flattened Similarity
Task: find entities that are similar to a given input entity but far away in time,
e.g., find past presidents given one.
Input: Barack Obama. Retrieved: Ford (correct), Coolidge (correct),
Hoover (correct), Truman (correct), T. Kennedy (wrong).
Controlling Time Bias: Qualitative Analysis
Most similar entities to Barack Obama using cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time-flattened similarity used to reorder the top-100 most similar, alpha = 0.7:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford (new), Coolidge (new), T. Kennedy (new), Hoover (new)
Time-flattened similarity used to reorder the top-100 most similar, alpha = 0.1:
Ford (new), Coolidge (new), Hoover (new), Truman (new), Roosevelt (new), Wilson (new), E. Roosevelt (new), Harding (new), Cleveland (new), Eisenhower (new)
Future Work
● More testing of TWEC with entities
● Beyond time-based slicing: a framework for aspect-based
comparison of distributional models of words/entities
○ Trump_NYT vs. Trump_RussiaToday
○ flat_NYT vs. flat_TheGuardian
● Using TEE as a source of “intuitive” knowledge to combine
with Logic Tensor Networks for reasoning
Ongoing Work: Embeddings + LTN
References
Di Carlo, V., Bianchi, F. & Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. AAAI (to appear).
Bianchi, F., Palmonari, M., & Nozza, D. (2018, October). Towards Encoding Time in Text-Based Entity Embeddings. In International Semantic Web
Conference (pp. 56-71). Springer, Cham.
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International
Conference on Machine Learning (pp. 2071-2080).
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances
in neural information processing systems (pp. 2787-2795).
Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018, June). Type Vector Representations from Text: An Empirical Analysis. In Deep Learning for
Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference.
Kulkarni, V., Al-Rfou, R., Perozzi, B., & Skiena, S. (2015, May). Statistically significant detection of linguistic change. In Proceedings of the 24th
International Conference on World Wide Web (pp. 625-635). International World Wide Web Conferences Steering Committee.
Yao, Z., Sun, Y., Ding, W., Rao, N., & Xiong, H. (2018, February). Dynamic word embeddings for evolving semantic discovery. In Proceedings of the
Eleventh ACM International Conference on Web Search and Data Mining (pp. 673-681). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In
Advances in neural information processing systems (pp. 3111-3119).
Questions?