Towards Encoding Time in Text-Based Entity Embeddings
Federico Bianchi, Matteo Palmonari and Debora Nozza
University of Milano-Bicocca
INSID&S Lab (Interaction and Semantics for Innovation with Data & Services)
MIND Lab (Models in Decision making and data analysis)
International Semantic Web Conference, Monterey, California, 2018
Knowledge Graphs
Large knowledge bases
Entities classified using types
Types organized in sub-types graphs
Binary relationships between entities
Semantics and inference via rules/axioms
Semantic similarity with lexical, topological and other feature-based approaches
[Figure: example knowledge graph. Entities: Kostas Manolas, A.S. Roma, Garry Kasparov, Real Madrid. Relationship: team, between Kostas Manolas and A.S. Roma. Types: Thing, Person, Athlete, Soccer Player, Chess Player, Organisation, Sports Club, Soccer Club.]
Knowledge Graphs Embeddings
Generate vector representations of entities and relationships
[Figure: a KG given as input (Kostas Manolas, team, A.S. Roma) is passed to an embedding algorithm, which generates vector representations of the two entities and of the team relationship.]
Why should we embed?
● Latent components (e.g., → link prediction)
● Features generation (e.g., → entity linking)
● Fast and intuitive way to compute similarity
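As a minimal sketch of the last point, similarity between embedded entities can be computed with cosine similarity. The vectors below are hypothetical toy values, not taken from any trained model, and the helper names are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumption: 5 dimensions, made-up values).
embeddings = {
    "Kostas_Manolas": [2.0, 5.0, 6.0, 2.0, 6.0],
    "A.S._Roma": [4.0, 2.0, 12.0, 5.0, 2.0],
    "Garry_Kasparov": [-3.0, 1.0, 0.5, -4.0, 2.0],
}

def most_similar(name, k=2):
    """Rank all other entities by cosine similarity to `name`."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("Kostas_Manolas"))
```

With real embeddings the same ranking would typically be done with a vectorized library, but the computation is the one shown here.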
From Word Embeddings to Text-based Entity Embeddings
- Word embeddings (e.g., [Mikolov+, 2013])
- Text-based Entity Embeddings
- Text as main source vs. Graph as main source [Bordes+,2013][Trouillon+,2016]
- Typed Entity Embeddings (TEE): use word embedding algorithms on documents where entities and types replace words (see the next slide)
- Pros: good for similarity evaluation
- Cons: no embedding of relations, only of entities
[Figure: word embeddings learned from a corpus (matrices W and C); similar words correspond to similar vectors. Example words: cat, black, eats, dog. Example corpus: "The big black cat eats its food." "My little black cat sleeps all day." "Sometimes my cat eats too much!"]
TEE: Typed Entity Embeddings from Text
Input: Wikipedia’s abstracts, e.g.
“Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
1. Link to DBpedia entities via named entity linking tools: “dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
2. Replace entities with their most specific types: “dbo:City dbo:Country dbo:City dbo:Administrative_Region …”
3. Generate entity vectors from text and type vectors from text
4. Concatenate the entity vector and the type vector: v(Rome) v(City), e.g. 1 3 6 3 19 5 6
[Bianchi+,2017b] [Bianchi+, 2018a]
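The TEE pipeline on this slide can be sketched as follows. This is only an illustration under stated assumptions: the entity linker output and the type lookup are hard-coded, the embedding training step is omitted, and the vectors are made-up stand-ins for what a word embedding algorithm would learn from the entity and type documents.

```python
# Hypothetical output of an entity linker on the Rome abstract.
entity_document = ["dbr:Rome", "dbr:Italy", "dbr:Rome", "dbr:Lazio"]

# Hypothetical lookup of each entity's most specific DBpedia type.
most_specific_type = {
    "dbr:Rome": "dbo:City",
    "dbr:Italy": "dbo:Country",
    "dbr:Lazio": "dbo:Administrative_Region",
}

def to_type_document(entity_doc, type_of):
    """Replace every entity mention with its most specific type."""
    return [type_of[entity] for entity in entity_doc]

type_document = to_type_document(entity_document, most_specific_type)

# Stand-in vectors (assumption): in practice these would be learned by a
# word embedding algorithm run separately on entity and type documents.
entity_vectors = {"dbr:Rome": [1.0, 3.0, 6.0, 3.0]}
type_vectors = {"dbo:City": [19.0, 5.0, 6.0]}

def typed_entity_vector(entity):
    """Concatenate v(entity) with v(type of entity)."""
    return entity_vectors[entity] + type_vectors[most_specific_type[entity]]

print(type_document)
print(typed_entity_vector("dbr:Rome"))
```

Keeping the two embedding spaces separate and concatenating afterwards means the type part of the vector is shared by all entities of the same type.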
Why Time?
● To the best of our knowledge, this is the first approach to explicitly encode time periods into entity embeddings
● We expect time to matter when we evaluate similarity between entities:
○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur
○ E.g., the most similar entities to “Winston Churchill” are his contemporary politicians, such as Harold Macmillan
● In this paper we provide an approach to explicitly encode time, so that the resulting representations can be used to control similarity with respect to time
Textual Descriptions of Time Periods via Events
“The succession of events is an inherent property of our time perception. Memory is necessary, and the order of these events is fundamental”
Snaider et al., 2012, Cognitive Systems Research
Embedding Years from Event Descriptions
A year is represented by the set of entities taking part in the year’s events
The year vector is the average of the vectors of the entities found in the year’s description
Example (1941):
Adolf Hitler: 4 3 6 2 3
Nazi Germany: 5 1 2 9 2
World War II: 1 2 8 4 1
AVG → 1941: 9 2 3 5 5
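A minimal sketch of the averaging step, using the three entity vectors shown on the slide. The vector displayed for 1941 on the slide appears to be illustrative, so the component-wise mean computed here differs from it; the function and variable names are assumptions.

```python
def average_vectors(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

# Entity vectors as displayed on the slide (toy values).
entity_vectors = {
    "Adolf Hitler": [4, 3, 6, 2, 3],
    "Nazi Germany": [5, 1, 2, 9, 2],
    "World War II": [1, 2, 8, 4, 1],
}

# Entities assumed to be found in the description of the events of 1941.
entities_1941 = ["Adolf Hitler", "Nazi Germany", "World War II"]

year_1941 = average_vectors([entity_vectors[name] for name in entities_1941])
print(year_1941)
```

Because the year vector lives in the same space as the entity vectors, years and entities can be compared directly with the same similarity measure.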
Towards Time Aware Similarity
Time flattened similarity: reduces the impact of time on the similarity.
E.g., make US presidents similar independently of their temporal context.
Time boosted similarity: boosts the impact of time on the similarity.
E.g., make politicians that share temporal contexts more similar.
Time Flattened Similarity
What’s the time flattened similarity between Barack Obama and Bill Clinton?
1. Extract the embeddings for the two entities
2. Find the closest year vectors to the two entity embeddings (e.g., the entity vector of Barack Obama is close to the vector of the year 2003, and that of Bill Clinton to the vector of the year 1999)
3. Combine the similarity of the entities with the similarity of their closest years:
$\psi(e_1, e_2) = \alpha \, \eta(e_1, e_2) - (1 - \alpha) \, \eta_n(y_1, y_2)$
where $e_1, e_2$ are the entity vectors (here Barack Obama and Bill Clinton), $y_1, y_2$ are their closest year vectors (here 2003 and 1999), $\eta$ is the cosine similarity, $\eta_n$ is the normalized cosine similarity, and $\alpha$ controls the weight of the time factor
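The time flattened similarity built up on this slide can be sketched as below. The slide does not say how the cosine similarity is normalized, so rescaling from [-1, 1] to [0, 1] is an assumption made here, and all vectors are made-up two-dimensional toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def normalized_cosine(u, v):
    """ASSUMPTION: normalization rescales cosine from [-1, 1] to [0, 1]."""
    return (cosine(u, v) + 1.0) / 2.0

def closest_year(entity_vec, year_vecs):
    """Year whose vector has the highest cosine similarity to the entity."""
    return max(year_vecs, key=lambda year: cosine(entity_vec, year_vecs[year]))

def time_flattened_similarity(e1, e2, year_vecs, alpha):
    """alpha * eta(e1, e2) - (1 - alpha) * eta_n(y1, y2)."""
    y1 = year_vecs[closest_year(e1, year_vecs)]
    y2 = year_vecs[closest_year(e2, year_vecs)]
    return alpha * cosine(e1, e2) - (1.0 - alpha) * normalized_cosine(y1, y2)

# Hypothetical toy vectors, chosen so the closest years match the slide.
year_vecs = {1999: [1.0, 0.0], 2003: [0.8, 0.6]}
obama = [0.7, 0.7]
clinton = [1.0, 0.1]

print(closest_year(obama, year_vecs), closest_year(clinton, year_vecs))
print(time_flattened_similarity(obama, clinton, year_vecs, alpha=0.7))
```

With alpha = 1 the measure reduces to plain cosine similarity; lowering alpha increasingly penalizes pairs whose closest years are similar, which favors entities that are far apart in time.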
Experiments: Research Questions
1. Quality: properties of the year embeddings
2. Similarity and Time:
a. Time Bias in TEE and EE: Effect of time in entity embeddings from text
i. Adherence to Natural Time Order
ii. Clustering WWI and WWII Battles
iii. Relative Ordering of Entities
b. Controlling Time Bias: handling the effect of time
Embedded Representations vs. Natural Time Flow
PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94
Good resemblance of natural time flow!
[Figures: 2D projection (PCA) and 1D projection (PCA) of the year vectors, with the 191X years and the 201X years highlighted]
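As a sketch of how such a rank correlation could be computed, the snippet below implements Kendall's τ (the tau-a variant, assuming no ties) between the natural order of years and a one-dimensional projection. The projection values are hypothetical and are not the PCA coordinates from the experiment.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a between two sequences; assumes no ties."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

years = [1910, 1911, 1912, 1913, 1914, 1915]
# Hypothetical 1D coordinates of the year vectors (assumption).
projection_1d = [-2.1, -1.4, -1.6, 0.3, 1.9, 1.2]

print(kendall_tau(years, projection_1d))
```

A value of 1 would mean the projection orders the years exactly as the calendar does, and a value of -1 the exact reverse.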
Time Bias: Adherence to Natural Time Order
Task: count the number of entities shared by sequences of 2-3 contiguous years vs. the number of entities shared by non-contiguous years (randomly sampled), e.g., 1991-1992 vs. 1934-1992
Dataset: sequences of two and three contiguous years and of non-contiguous years (1931-1991).
Results: contiguous years share a higher number of entities than non-contiguous years.
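The counting task can be sketched with set intersections. The per-year entity sets below are hypothetical placeholders, not the data used in the experiment.

```python
def shared_entities(year_entities, years):
    """Entities that appear in the event descriptions of every given year."""
    return set.intersection(*(year_entities[year] for year in years))

# Hypothetical entity sets per year (assumption, for illustration only).
year_entities = {
    1934: {"Adolf Hitler", "Franklin D. Roosevelt"},
    1991: {"Soviet Union", "Boris Yeltsin", "George H. W. Bush", "Gulf War"},
    1992: {"Boris Yeltsin", "George H. W. Bush", "Bill Clinton"},
}

contiguous = shared_entities(year_entities, [1991, 1992])
non_contiguous = shared_entities(year_entities, [1934, 1992])
print(len(contiguous), len(non_contiguous))
```

Repeating the comparison over many contiguous sequences and randomly sampled non-contiguous ones gives the two counts that the task compares.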
Time Bias: Clustering Battles with EE
Task: classify battles as belonging to WWI or WWII.
Dataset: 152 resource identifiers of WWI (63) and WWII (89) battles from Wikipedia.
Method: K-means clustering (K=2) on the vector representations in the entity embedding space.
Results: 95% accuracy. The centroids of the two groups are close to the WWI years and the WWII years respectively.
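A small sketch of the method, assuming a plain K-means with K=2 and an accuracy computed under the better of the two cluster-to-label mappings. The points are made-up two-dimensional stand-ins for battle embeddings, and the deterministic initialization is a simplification chosen for reproducibility.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(points):
    return [sum(component) / len(points) for component in zip(*points)]

def kmeans_two(points, iterations=20):
    """K-means with K=2, initialized from the first and last point."""
    centroids = [points[0], points[-1]]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min((0, 1), key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = mean(members)
    return assignment, centroids

def clustering_accuracy(assignment, labels):
    """Accuracy under the better of the two cluster-to-label mappings."""
    agree = sum(a == label for a, label in zip(assignment, labels))
    return max(agree, len(labels) - agree) / len(labels)

# Hypothetical battle embeddings; label 0 = WWI, label 1 = WWII.
battles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels = [0, 0, 0, 1, 1, 1]

assignment, centroids = kmeans_two(battles)
print(clustering_accuracy(assignment, labels))
```

The resulting centroids could then be compared with the year vectors, for instance by cosine similarity, to see which years each cluster is closest to.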
Controlling Time Bias: Flattened Similarity
Task: find entities that are similar to a given input entity but far from it in time. E.g., find past presidents given one
Example (input: Barack Obama): Ford, Coolidge, Hoover, T. Kennedy, Truman (four results marked correct, one marked wrong)
Controlling Time Bias: Flattened Similarity
Dataset: US president entities and British prime minister entities (19 and 19)
Method: start with the 6 most recent presidents for each group. For each entity, compute the number of older presidents that appear in the ranked list created by each similarity measure.
Time-flattened similarity reorders the top-100 results from cosine similarity
Algorithms:
● Time-aware Similarity TEE (TATEE), with time-flattened similarity;
● Similarity TEE (STEE) (standard neighborhood with cosine);
● Time-Aware Similarity EE (TAEE), with time-flattened similarity;
● Similarity EE (SEE) (standard neighborhood with cosine);
● Time-flattened similarity Wiki2Vec (Baseline).
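The evaluation measure can be sketched as a simple count over a ranked list. The two rankings below are the Barack Obama lists shown in the qualitative analysis slides; the set of earlier presidents and the top-10 cut-off are assumptions made for illustration, not the experiment's actual configuration.

```python
def older_in_ranking(ranked, older_entities, k=10):
    """Number of older entities among the first k results of a ranking."""
    return sum(1 for name in ranked[:k] if name in older_entities)

# Assumed set of earlier US presidents, using the slide's short names.
earlier_presidents = {
    "Clinton", "Reagan", "G. Bush", "Carter", "Nixon",
    "Ford", "Coolidge", "Hoover", "Truman",
}

# Most similar entities to Barack Obama with cosine similarity in TEE.
cosine_ranking = [
    "Clinton", "Reagan", "G. Bush", "Carter", "Al Gore",
    "Nixon", "J. Kerry", "D. Cheney", "McCain", "Biden",
]

# Same query after time flattened reordering with alpha = 0.7.
flattened_ranking = [
    "Clinton", "Reagan", "G. Bush", "Carter", "Al Gore",
    "Nixon", "Ford", "Coolidge", "T. Kennedy", "Hoover",
]

print(older_in_ranking(cosine_ranking, earlier_presidents))
print(older_in_ranking(flattened_ranking, earlier_presidents))
```

A higher count for the reordered list indicates that the time-flattened measure surfaces more leaders from earlier periods.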
Controlling Time Bias: Flattened Similarity
Results: time-flattened similarity on TATEE appears to give the best results. This is also due to the fact that TATEE considers type representations, so it can easily retrieve entities that share types.
Controlling Time Bias: Qualitative Analysis
The most similar entities to Barack Obama using cosine similarity in TEE:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
Time flattened similarity to reorder the top-100 most similar, alpha = 0.7:
Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford (new), Coolidge (new), T. Kennedy (new), Hoover (new)
Time flattened similarity to reorder the top-100 most similar, alpha = 0.1:
Ford (new), Coolidge (new), Hoover (new), Truman (new), Roosevelt (new), Wilson (new), E. Roosevelt (new), Harding (new), Cleveland (new), Eisenhower (new)
Conclusions and Future Work
Conclusions
● Time can be represented in the vector space using event descriptions
● Time sneaks into entity similarity (time bias)
● Time bias can be controlled by considering explicit representations of time periods
Future Work
● Study the compositionality of time period representations
● Comparison with Doc2Vec
● Improve time-aware similarity measure
● Comparison with other KG embeddings models
References
Snaider, J., McCall, R., & Franklin, S. (2012). Time production and representation in a conceptual and computational cognitive model. Cognitive Systems Research, 13(1), 59-71.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems (pp. 2787-2795).
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International Conference on Machine Learning (pp. 2071-2080).
Tran, N. K., Tran, T., & Niederée, C. (2017, May). Beyond time: Dynamic context-aware entity recommendation. In European Semantic Web Conference (pp. 353-368). Springer, Cham.
Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018). Type vector representations from text: An empirical analysis. In Deep Learning for Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference, Crete.
Bianchi, F., Palmonari, M., & Nozza, D. (2018). Towards encoding time in text-based entity embeddings. In International Semantic Web Conference (to appear), Monterey, California.
Bianchi, F., Palmonari, M., Cremaschi, M., & Fersini, E. (2017, May). Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In European Semantic Web Conference (pp. 120-135). Springer, Cham.
Bianchi, F., & Palmonari, M. (2017). Joint learning of entity and type embeddings for analogical reasoning with entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
Thank you!
Qualitative Evaluation of Time Flattened Similarity
Input: Winston Churchill
Method: Cosine similarity
Harold Macmillan: most similar
Tony Blair: 49th in the list of most similar entities
Gordon Brown: 41st in the list of most similar entities
Method: Time-flattened similarity
Margaret Thatcher: most similar
Tony Blair: 16th in the list of most similar entities
Gordon Brown: 14th in the list of most similar entities

More Related Content

Similar to Towards Encoding Time in Text-Based Entity Embeddings

TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORAcsandit
 
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...Monica Clark
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Richard Ingram
 
Surfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterSurfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterHila Becker
 
Tutorial semantic wikis and applications
Tutorial   semantic wikis and applicationsTutorial   semantic wikis and applications
Tutorial semantic wikis and applicationsMark Greaves
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
CeB - f - s01
CeB - f - s01CeB - f - s01
CeB - f - s01gauvins
 
Relational database
Relational databaseRelational database
Relational databaseSanthiNivas
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisJonathan Stray
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chiBarbara Starr
 
Temporal Case Management 1998
Temporal Case Management  1998Temporal Case Management  1998
Temporal Case Management 1998David Tryon
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus typejins0618
 
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Daniel Katz
 
Session 18, Oegema
Session 18, OegemaSession 18, Oegema
Session 18, Oegemacsrcomm
 
From text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsFrom text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsGraphRM
 

Similar to Towards Encoding Time in Text-Based Entity Embeddings (20)

Data journalism: Data rules, while data rule
Data journalism: Data rules, while data ruleData journalism: Data rules, while data rule
Data journalism: Data rules, while data rule
 
Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"
 
TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORA
 
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012
 
Surfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterSurfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on Twitter
 
Tutorial semantic wikis and applications
Tutorial   semantic wikis and applicationsTutorial   semantic wikis and applications
Tutorial semantic wikis and applications
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
CeB - f - s01
CeB - f - s01CeB - f - s01
CeB - f - s01
 
Relational database
Relational databaseRelational database
Relational database
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
 
Temporal Case Management 1998
Temporal Case Management  1998Temporal Case Management  1998
Temporal Case Management 1998
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus type
 
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
 
Session 18, Oegema
Session 18, OegemaSession 18, Oegema
Session 18, Oegema
 
M21 and RDA
M21 and RDAM21 and RDA
M21 and RDA
 
From text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsFrom text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge Graphs
 

Recently uploaded

Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationmuqadasqasim10
 
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...mikehavy0
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethSamantha Rae Coolbeth
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 

Recently uploaded (20)

Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...
Abortion Clinic in Randfontein +27791653574 Randfontein WhatsApp Abortion Cli...
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 

Towards Encoding Time in Text-Based Entity Embeddings

  • 1. Towards Encoding Time in Text-Based Entity Embeddings Federico Bianchi, Matteo Palmonari and Debora Nozza University of Milano-Bicocca INSID&S Lab Interaction and Semantics for Innovation with Data & Services International Semantic Web Conference, Monterey, California. 2018 MIND Lab Models in Decision making and data analysis
  • 2. Knowledge Graphs Large knowledge bases Entities classified using types Types organized in sub-types graphs Binary relationships between entities Semantics and inference via rules/axioms Semantic similarity with lexical, topological and other feature-based approaches A.S. Roma Kostas Manolas team Soccer Player Soccer Club Athlete Thing Person Sports Club Garry Kasparov Chess Player Real Madrid Organis.
  • 3. Knowledge Graphs Embeddings Generate vector representations of entities and relationships A.S. Roma Kostas Manolas team 2 5 6 2 6 4 2 12 5 2 Kostas Manolas A.S. Roma 4 2 12 5 2 team Given in input a KG Generate vector representations Embedding Algorithm Why should we embed? ● Latent components (e.g., → link prediction) ● Features generation (e.g., → entity linking) ● Fast and intuitive way to compute similarity
  • 4. From Word Embeddings to Text-based Entity Embeddings - Word embeddings (e.g., [Mikolov+, 2013]) - Text-based Entity Embeddings - Text as main source vs. Graph as main source [Bordes+,2013][Trouillon+,2016] - Typed Entity Embeddings (TEE): use word embeddings algorithms on documents where entities and types replace words (next slide :) ) - Pros: good for similarity evaluation - Cons: no embedding of relations, just entity corpus cat black eats dog similar words corresponds to similar vectors C W The big black cat eats its food. My little black cat sleeps all day. Sometimes my cat eats too much!
  • 5. TEE: Typed Entity Embeddings from Text [Bianchi+, 2017b] [Bianchi+, 2018a]. Input: Wikipedia’s abstracts, e.g., “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
  • 6. TEE: Typed Entity Embeddings from Text. Link to DBpedia entities via named entity linking tools: “dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
  • 7. TEE: Typed Entity Embeddings from Text. Replace entities with their most specific types: “dbo:City dbo:Country dbo:City dbo:Administrative_Region …”
  • 8. TEE: Typed Entity Embeddings from Text. Generate entity vectors from the entity sequence and type vectors from the type sequence.
  • 9. TEE: Typed Entity Embeddings from Text. Concatenate the entity vector and the type vector.
  • 10. TEE: Typed Entity Embeddings from Text. [Figure: the concatenated vector, built from v(Rome) and v(City).]
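The preprocessing steps on these slides can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the entity linker output, the `most_specific_type` lookup and the small vectors are invented stand-ins, and the function names are hypothetical rather than taken from the authors' code.

```python
# Toy sketch of the TEE preprocessing steps; all names and values are illustrative.

# Assumed output of an entity linker run over one abstract.
entity_doc = ["dbr:Rome", "dbr:Italy", "dbr:Rome", "dbr:Lazio"]

# Hypothetical lookup giving each entity's most specific type.
most_specific_type = {
    "dbr:Rome": "dbo:City",
    "dbr:Italy": "dbo:Country",
    "dbr:Lazio": "dbo:Administrative_Region",
}


def to_type_doc(entities, type_of):
    """Replace every entity mention with its most specific type."""
    return [type_of[entity] for entity in entities]


def typed_entity_vector(entity, entity_vecs, type_vecs, type_of):
    """Concatenate an entity vector with the vector of the entity's type."""
    return entity_vecs[entity] + type_vecs[type_of[entity]]


type_doc = to_type_doc(entity_doc, most_specific_type)

# Stand-ins for vectors that a word embedding algorithm would learn
# from the entity corpus and from the type corpus.
entity_vecs = {"dbr:Rome": [0.1, 0.3, 0.6]}
type_vecs = {"dbo:City": [0.5, 0.2]}

tee_rome = typed_entity_vector("dbr:Rome", entity_vecs, type_vecs, most_specific_type)
```

In this sketch the typed vector for Rome has the entity dimensions first and the type dimensions second, so two entities that share a type agree on the second part of their vectors.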
  • 11. Why Time? ● To the best of our knowledge, this is the first approach to explicitly encode time periods into entity embeddings ● We expect time to matter when we evaluate similarity between entities: ○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur. E.g., the most similar entities to “Winston Churchill” are his contemporary politicians, such as Harold Macmillan ● In this paper we try to provide an approach to explicitly encode time in such a way that we can use those representations to control the similarity with respect to time
  • 12. Textual Descriptions of Time Periods via Events
  • 13. Textual Descriptions of Time Periods via Events. “The succession of events is an inherent property of our time perception. Memory is necessary, and the order of these events is fundamental” (Snaider et al., 2012, Cognitive Systems Research)
  • 14-15. Embedding Years from Event Descriptions. A year is represented by the set of entities taking part in the year’s events. The year vector is the average of the vectors of the entities found in the year’s description.
  • 16. Embedding Years from Event Descriptions. Example entity vectors: Adolf Hitler [4 3 6 2 3], Nazi Germany [5 1 2 9 2], World War II [1 2 8 4 1].
  • 17. Embedding Years from Event Descriptions. The vector of the year 1941 is the average (AVG) of these entity vectors. [Figure: 1941 shown as 9 2 3 5 5.]
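As a minimal sketch of the averaging step, the snippet below computes a year vector as the component-wise mean of entity vectors. It reuses the digits shown for the three entities on the slide as toy vectors; the helper name `average_vectors` is hypothetical. The digits the slide shows for 1941 look schematic, since the exact mean of these three toy vectors is different.

```python
def average_vectors(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


# Toy entity vectors, using the digits shown on the slide.
entity_vecs = {
    "Adolf Hitler": [4, 3, 6, 2, 3],
    "Nazi Germany": [5, 1, 2, 9, 2],
    "World War II": [1, 2, 8, 4, 1],
}

# Entities found in the event descriptions of 1941 (illustrative subset).
entities_1941 = ["Adolf Hitler", "Nazi Germany", "World War II"]

year_1941 = average_vectors([entity_vecs[name] for name in entities_1941])
```

Because the year vector lives in the same space as the entity vectors, it can be compared with any entity using the same similarity measure.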
  • 18. Towards Time Aware Similarity. Time flattened similarity: reduces the impact of time on the similarity, e.g., makes US presidents similar independently of their temporal context. Time boosted similarity: boosts the impact of time on the similarity, e.g., makes politicians that share temporal contexts more similar.
  • 19. Time Flattened Similarity. What’s the time flattened similarity between Barack Obama and Bill Clinton?
  • 20. Time Flattened Similarity. Extract the embeddings for the two entities.
  • 21. Time Flattened Similarity. Find the closest year vectors to the two entity embeddings (e.g., the entity vector of Barack Obama is close to the vector of the year 2003). In the example the two closest years are 1999 and 2003.
  • 22. Time Flattened Similarity. Define a score $\psi(e_1, e_2)$ for the two entities $e_1$ and $e_2$.
  • 23. Time Flattened Similarity. $\psi(e_1, e_2) = \eta(e_1, e_2)$, where $\eta$ is the cosine similarity.
  • 24. Time Flattened Similarity. $\psi(e_1, e_2) = \eta(e_1, e_2) - \eta_n(y_1, y_2)$, where $\eta_n$ is the normalized cosine similarity and $y_1, y_2$ are the closest years of the two entities (here 1999 and 2003).
  • 25. Time Flattened Similarity. $\psi(e_1, e_2) = \alpha\,\eta(e_1, e_2) - (1 - \alpha)\,\eta_n(y_1, y_2)$, where $\alpha$ controls the weight of the time factor.
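The score built up on these slides, the weighted cosine similarity of the two entities minus the weighted normalized cosine similarity of their closest years, can be sketched as follows. Two things here are assumptions rather than details from the slides: the slides do not say how the cosine is normalized, so this sketch rescales it from [-1, 1] to [0, 1], and the 2-D vectors are invented toy values. Function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity (eta) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def closest_year(entity_vec, year_vecs):
    """Year whose vector has the highest cosine similarity to the entity."""
    return max(year_vecs, key=lambda year: cosine(entity_vec, year_vecs[year]))


def normalized_cosine(u, v):
    # ASSUMPTION: the slides only say "normalized cosine similarity";
    # here the cosine is rescaled from [-1, 1] to [0, 1].
    return (cosine(u, v) + 1.0) / 2.0


def time_flattened(e1, e2, year_vecs, alpha):
    """alpha * eta(e1, e2) - (1 - alpha) * eta_n(y1, y2)."""
    y1 = closest_year(e1, year_vecs)
    y2 = closest_year(e2, year_vecs)
    time_term = normalized_cosine(year_vecs[y1], year_vecs[y2])
    return alpha * cosine(e1, e2) - (1.0 - alpha) * time_term


# Invented 2-D toy vectors, for illustration only.
year_vecs = {1999: [1.0, 0.0], 2003: [0.0, 1.0]}
clinton = [0.9, 0.2]
obama = [0.2, 0.9]

score = time_flattened(clinton, obama, year_vecs, alpha=0.7)
```

With alpha equal to 1 the score reduces to plain cosine similarity; lowering alpha gives more weight to the year term, so pairs whose closest years are similar are penalized more.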
  • 26. Experiments: Research Questions. 1. Quality: properties of the year embeddings. 2. Similarity and Time: (a) Time Bias in TEE and EE, i.e., the effect of time in entity embeddings from text: (i) Adherence to Natural Time Order, (ii) Clustering WWI and WWII Battles, (iii) Relative Ordering of Entities; (b) Controlling Time Bias: handling the effect of time.
  • 27. Embedded Representations vs. Natural Time Flow. PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94. Good resemblance of natural time flow! [Figures: 2D projection (PCA), with the 191X and 201X years marked; 1D projection (PCA).]
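To make the two reported statistics concrete, here is a small sketch of how rank agreement between a 1D projection and the natural order of years could be measured. It assumes the 1D coordinates are already available (the PCA step is not shown), uses invented toy coordinates, and implements the tie-free forms of Kendall's tau and Spearman's rho directly; it does not reproduce the figures reported on the slide.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a for two equal-length sequences without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs


def ranks(values):
    """Zero-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation for sequences without ties."""
    n = len(xs)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Toy data: five years and invented 1D coordinates for their vectors.
years = [1910, 1911, 1912, 1913, 1914]
projection = [-2.0, -1.1, -1.3, 0.4, 1.9]

tau = kendall_tau(years, projection)
rho = spearman_rho(years, projection)
```

Both coefficients equal 1 when the projection orders the years exactly as the calendar does, and drop as more pairs of years are swapped.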
  • 28. Time Bias: Adherence to Natural Time Order. Task: count the number of entities shared by sequences of 2-3 contiguous years vs. the number of entities shared by non-contiguous years, randomly sampled (e.g., 1991-1992 vs. 1934-1992). Dataset: two and three contiguous years and non-contiguous years (1931-1991). Results: contiguous years share a higher number of entities than non-contiguous years.
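The counting task can be illustrated with plain set intersections. This is a sketch on invented placeholder data: the entity identifiers `e1` to `e7` and their assignment to years are made up, and `shared_entities` is a hypothetical helper name.

```python
def shared_entities(years, entities_by_year):
    """Entities that appear in the descriptions of every year in `years`."""
    entity_sets = [entities_by_year[year] for year in years]
    return set.intersection(*entity_sets)


# Placeholder entity sets per year (invented for illustration).
entities_by_year = {
    1934: {"e1", "e2", "e3"},
    1991: {"e4", "e5", "e6"},
    1992: {"e5", "e6", "e7"},
}

contiguous_count = len(shared_entities([1991, 1992], entities_by_year))
non_contiguous_count = len(shared_entities([1934, 1992], entities_by_year))
```

Repeating the comparison over many contiguous sequences and many randomly sampled non-contiguous ones gives the two counts that the task contrasts.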
  • 29. Time Bias: Clustering Battles with EE. Task: classify battles as belonging to WWI or WWII. Dataset: 152 resource identifiers of WWI (63) and WWII (89) battles from Wikipedia. Method: K-means clustering (K=2) on the vector representations in the entity embedding space. Results: 95% accuracy. The centroids of the two groups are close to the WWI years and the WWII years, respectively.
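A rough sketch of this evaluation, written in plain Python: K-means with K=2, followed by an accuracy that tries both ways of mapping clusters to the two wars. The points are invented 2-D toy vectors, the seeding with the first and last point is a simplification chosen to keep the example deterministic, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def two_means(points, iterations=20):
    """Plain K-means with K=2, seeded with the first and last point."""
    centroids = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = mean(members)
    return labels, centroids


def clustering_accuracy(labels, gold):
    """Accuracy under the better of the two cluster-to-class mappings."""
    agree = sum(1 for label, target in zip(labels, gold) if label == target)
    return max(agree, len(gold) - agree) / len(gold)


# Invented toy vectors: three "WWI" battles (gold 0) and two "WWII" battles (gold 1).
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]]
gold = [0, 0, 0, 1, 1]

labels, centroids = two_means(points)
accuracy = clustering_accuracy(labels, gold)
```

The resulting centroids could then be compared with the year vectors, for example with cosine similarity, to check which years each cluster sits closest to.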
  • 30. Controlling Time Bias: Flattened Similarity. Task: find entities that are similar to a given input entity but far from it in time. Input: Barack Obama.
  • 31. Controlling Time Bias: Flattened Similarity. E.g., find past presidents given a president. Retrieved for Barack Obama: Ford, Coolidge, Hoover, T. Kennedy, Truman.
  • 32. Controlling Time Bias: Flattened Similarity. Of the five retrieved entities, four are marked correct and one is marked wrong.
  • 33. Controlling Time Bias: Flattened Similarity. Dataset: US president entities and British Prime Minister entities (19 and 19). Method: start with the 6 most recent presidents for each group. For each entity, compute the number of older presidents that are in the ranked list created by each similarity measure. Time-flattened similarity reorders the top-100 results from cosine similarity. Algorithms: ● Time-Aware Similarity TEE (TATEE), with time-flattened similarity; ● Similarity TEE (STEE) (standard neighborhood with cosine); ● Time-Aware Similarity EE (TAEE), with time-flattened similarity; ● Similarity EE (SEE) (standard neighborhood with cosine); ● Time-flattened similarity Wiki2Vec (Baseline).
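The evaluation loop described here, taking the top results by cosine similarity, reordering them with the time-flattened score and counting how many older presidents appear, might look like the sketch below. All scores are invented for illustration, the cutoffs are shrunk from 100 to 5, and the function names are hypothetical.

```python
def top_k(scores, k):
    """The k highest-scoring candidates, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rerank(candidates, new_scores):
    """Reorder candidates by a second score, best first."""
    return sorted(candidates, key=lambda name: new_scores[name], reverse=True)


def count_relevant(ranked, relevant, n):
    """Number of relevant entities among the first n ranked results."""
    return sum(1 for name in ranked[:n] if name in relevant)


# Invented scores with respect to the input entity (illustration only).
cosine_scores = {"Clinton": 0.90, "Biden": 0.88, "McCain": 0.85, "Ford": 0.70, "Hoover": 0.65}
flattened_scores = {"Clinton": 0.40, "Biden": 0.20, "McCain": 0.22, "Ford": 0.45, "Hoover": 0.43}
older_presidents = {"Clinton", "Ford", "Hoover"}

candidates = top_k(cosine_scores, k=5)
reranked = rerank(candidates, flattened_scores)

baseline_hits = count_relevant(candidates, older_presidents, n=3)
flattened_hits = count_relevant(reranked, older_presidents, n=3)
```

Reranking only a fixed-size candidate list keeps the results topically close to the input entity, while the second score decides which of those candidates surface first.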
  • 34. Controlling Time Bias: Flattened Similarity Results: time-flattened similarity on TATEE seems able to get the best results. This is also due to the fact that TATEE considers type representations and thus it can easily retrieve entities sharing types.
  • 35. Controlling Time Bias: Qualitative Analysis. The most similar entities to Barack Obama using cosine similarity in TEE: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden.
  • 36. Controlling Time Bias: Qualitative Analysis. Time flattened similarity to reorder the top-100 most similar, alpha = 0.7: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford (new), Coolidge (new), T. Kennedy (new), Hoover (new).
  • 37. Controlling Time Bias: Qualitative Analysis. Time flattened similarity to reorder the top-100 most similar, alpha = 0.1: Ford, Coolidge, Hoover, Truman, Roosevelt, Wilson, E. Roosevelt, Harding, Cleveland, Eisenhower (all ten new).
  • 38. Conclusions and Future Work. Conclusions: ● Time can be represented in the vector space using event descriptions ● Time sneaks into entity similarity (time bias) ● Time bias can be controlled by considering explicit representations of time periods. Future Work: ● Study the compositionality of time period representations ● Comparison with Doc2Vec ● Improve the time-aware similarity measure ● Comparison with other KG embedding models
  • 39. References. Snaider, J., McCall, R., & Franklin, S. (2012). Time production and representation in a conceptual and computational cognitive model. Cognitive Systems Research, 13(1), 59-71. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (pp. 2787-2795). Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International Conference on Machine Learning (pp. 2071-2080). Tran, N. K., Tran, T., & Niederée, C. (2017, May). Beyond time: Dynamic context-aware entity recommendation. In European Semantic Web Conference (pp. 353-368). Springer, Cham. Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018). Type vector representations from text: An empirical analysis. In Deep Learning for Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference, Crete. Bianchi, F., Palmonari, M., & Nozza, D. (2018). Towards encoding time in text-based entity embeddings. In International Semantic Web Conference (to appear), Monterey, California.
  • 40. References. Bianchi, F., Palmonari, M., Cremaschi, M., & Fersini, E. (2017, May). Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In European Semantic Web Conference (pp. 120-135). Springer, Cham. Bianchi, F., & Palmonari, M. (2017). Joint learning of entity and type embeddings for analogical reasoning with entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
  • 42. Qualitative Evaluation of Time Flattened Similarity. Method: cosine similarity. Input: Winston Churchill. Most similar: Harold Macmillan. Tony Blair: 49th in the list of most similar entities. Gordon Brown: 41st in the list of most similar entities.
  • 43. Qualitative Evaluation of Time Flattened Similarity. Method: time-flattened similarity. Input: Winston Churchill. Most similar: Margaret Thatcher. Tony Blair: 16th in the list of most similar entities. Gordon Brown: 14th in the list of most similar entities.