Type Vector Representations
from Text: An Empirical Analysis
Federico Bianchi, Mauricio Soto, Matteo Palmonari and Vincenzo Cutrona
Department of Informatics, Systems and Communications
University of Milano-Bicocca
federico.bianchi@disco.unimib.it
Workshop on Deep Learning for Knowledge Graphs
and Semantic Technologies (DL4KGS)
Co-located with ESWC 18, June 2018, Crete, Greece
ESWC, Crete, 4th June 2018
Outline
● Knowledge Graphs
● Scope of this Paper and State-of-the-art
● T2V: Type to Vector
● Experiments
Knowledge Graphs
● Structured representations of knowledge
● Entities are classified using types (i.e., concepts)
● Types are organized in sub-type graphs
[Figure: example knowledge graph. Entities: A.S. Roma, Kostas Manolas, Garry Kasparov, Real Madrid; relation: team; types: Thing, Person, Athlete, Soccer Player, Chess Player, Organisa., Sports Club, Soccer Club.]
Scope of this Paper
● Propose an approach to learn representations of types by
considering text as a different source of information
○ Distributional semantics
○ Embeddings of types in a vector space
○ Mapping to a word2vec learning problem
● Main intuition: building a type similarity measure that encodes
relatedness between types (beyond ontological similarity)
● Empirical evaluation of the properties of text-based type
representations
○ Focus on similarity (relatedness vs ontological similarity)
Vector Representations of Types
Types represented in a vector space:
● Easy and fast evaluation of similarity
[Figure: example vector representations of the types Soccer Club and Person.]
Embeddings for Representing Ontologies
● [Jayawardana+, 2017]
○ Instance-based approach for building word embedding vectors of the instances in a custom ontology (legal domain)
○ Embeddings used to predict the best representative vector for each ontology type (cluster-based approach)
○ Conclusions: type vectors are aggregations of entity embeddings
● [Smaili+, 2018]
○ Distributional hypothesis based embeddings for ontological representation
○ Textual document generated by considering axioms in an ontology as sentences of a text
○ Conclusions: uses the structure of the ontology
State-of-the-Art on Ontological Similarity
● [Rada+, 1989] (path)
○ Shortest path length between concepts
○ Equal path problem: two concepts with the same path length share the same semantic similarity
● [Wu&Palmer, 1994] (wup)
○ Considers the depth of the concepts in the hierarchy (based on the Least Common Subsumer, i.e., the first common ancestor)
○ Equal depth problem: concepts at the same hierarchical level share the same similarity
● [Zhu&Iglesias, 2017] (wpath)
○ Weighted path length to evaluate the similarity between concepts
○ Exploitation of the statistical Information Content (IC) along with the topology
○ IC computed on text corpora and used to assign higher level to more specific entities
● Topologically distant concepts may be highly related (e.g., SoccerPlayer and SoccerClub)
● Not all sibling pairs are similar in the same way (e.g., is a SoccerPlayer equally similar to a Wrestler and to a BasketballPlayer?)
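The two limitations above can be reproduced on a toy hierarchy. The following is only a sketch: the `PARENT` map is a hypothetical fragment (not the DBpedia ontology), the root is assumed to have depth 1, and the conversion of path length into a similarity as 1 / (1 + length) is one common convention rather than the exact formulation used in the cited works.

```python
# Hypothetical toy sub-type hierarchy (child -> parent); not the real DBpedia ontology.
PARENT = {
    "Person": "Thing",
    "Organisation": "Thing",
    "Athlete": "Person",
    "SoccerPlayer": "Athlete",
    "Wrestler": "Athlete",
    "BasketballPlayer": "Athlete",
    "SportsClub": "Organisation",
    "SoccerClub": "SportsClub",
}


def ancestors(t):
    """Return the chain [t, parent(t), ..., root]."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(t):
    """Depth of a type, assuming the root has depth 1."""
    return len(ancestors(t))


def lcs(a, b):
    """Least Common Subsumer: first ancestor of b that is also an ancestor of a."""
    anc_a = set(ancestors(a))
    for node in ancestors(b):
        if node in anc_a:
            return node
    raise ValueError("types do not share a root")


def path_sim(a, b):
    """Similarity from shortest path length (assumed form: 1 / (1 + length))."""
    c = lcs(a, b)
    length = (depth(a) - depth(c)) + (depth(b) - depth(c))
    return 1.0 / (1.0 + length)


def wup_sim(a, b):
    """Wu-Palmer style similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


for pair in [("SoccerPlayer", "Wrestler"),
             ("SoccerPlayer", "BasketballPlayer"),
             ("SoccerPlayer", "SoccerClub")]:
    print(pair, round(path_sim(*pair), 3), round(wup_sim(*pair), 3))
```

In this toy setting both sibling pairs receive identical scores, while SoccerPlayer and SoccerClub receive the lowest score even though they are closely related in practice.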
Similarity vs. Relatedness
Semantic Similarity
● Resemblance (general conceptual term)
● Ex. Settlement and Town
● Issues: equal path problem, depth problem
● Measures based on the ontology topology:
○ path
○ wup (Least Common Subsumer)
○ wpath (Information Content)
Relatedness
● Existence of connections
● Ex. SoccerPlayer and SoccerClub
● Oblivious to the ontology structure
● Measures based on corpus co-occurrence: word embeddings (distributional hypothesis)
○ word2vec
Word2Vec [Mikolov+, 2013]
Well-known algorithm for learning word representations from an input corpus
Distributional hypothesis: similar words appear in similar contexts (word-word co-occurrence)
Example corpus:
The big black cat eats its food.
My little black cat sleeps all day.
Sometimes my cat eats too much!
[Figure: word vectors for cat, black, eats and dog; similar words correspond to similar vectors.]
Type to Vector (T2V): generate distributed representations of types based on type-type co-occurrence.
Two hyperparameters:
● Desired embedding size
● Length of the context window
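To make the role of the context window concrete, the sketch below counts word-word co-occurrences in the three example sentences. It is not word2vec itself: word2vec learns dense vectors from this kind of signal, whereas this code only counts. The function name and the window value of 2 are illustrative assumptions.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (target, context) pairs that fall within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        # Very rough tokenisation, enough for the toy corpus.
        tokens = sentence.lower().replace(".", "").replace("!", "").split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(target, tokens[j])] += 1
    return counts


corpus = [
    "The big black cat eats its food.",
    "My little black cat sleeps all day.",
    "Sometimes my cat eats too much!",
]

counts = cooccurrence_counts(corpus, window=2)
print(counts[("cat", "black")])
print(counts[("cat", "eats")])
```

A larger window makes more distant words count as context, which is the effect of the second hyperparameter listed above.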
T2V: Word2Vec on Annotated Text
“Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
Part of our approach to learn representations of typed entities:
- Bianchi & Palmonari. Joint Learning of Entity and Type Embeddings for Analogical Reasoning with Entities. NL4AI 2017
- Bianchi et al. Towards Encoding Time in Text-Based Entity Embeddings. ISWC 2018 (to appear).
T2V: Word2Vec on Annotated Text
Pipeline: Find entities in text, then Replace Entities With Minimal Types, then Generate Type Vectors (word2vec)
● Entities are found with a Named Entity Linking Service
● Words are removed
● Entities are replaced with their minimal (most specific) type
● The document containing sequences of types is fed to word2vec
Example (Rome passage):
Entities found: Rome, Italy, Rome, Lazio
Minimal types: City, Country, City, Administrative_Region
[Figure: example type vectors for City, Country and Administrative_Region.]
Similarity can be computed with cosine similarity
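The preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary `ENTITY_LEXICON` is a hypothetical stand-in for a real Named Entity Linking service, `MINIMAL_TYPE` is a hand-written lookup for this one passage, and the resulting type sequence would then be passed to a word2vec implementation (not shown here).

```python
import math

# Hypothetical stand-in for a Named Entity Linking service: surface form -> entity.
ENTITY_LEXICON = {"Rome": "Rome", "Italy": "Italy", "Lazio": "Lazio"}

# Hand-written minimal (most specific) type for each entity in the example.
MINIMAL_TYPE = {
    "Rome": "City",
    "Italy": "Country",
    "Lazio": "Administrative_Region",
}


def link_entities(text):
    """Keep only the tokens recognised as entities; all other words are dropped."""
    tokens = [tok.strip('.,()"') for tok in text.split()]
    return [ENTITY_LEXICON[tok] for tok in tokens if tok in ENTITY_LEXICON]


def to_type_sequence(entities):
    """Replace each entity with its minimal type."""
    return [MINIMAL_TYPE[e] for e in entities]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


text = ("Rome is the capital of Italy and a special comune (named Comune di "
        "Roma Capitale). Rome also serves as the capital of the Lazio region.")

entities = link_entities(text)
type_sequence = to_type_sequence(entities)
print(entities)
print(type_sequence)
```

Once type vectors have been learned, `cosine` can be applied to any pair of them to obtain the T2V similarity.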
Empirical Evaluation of T2V Representations
Objective: analyze the properties of the T2V representations, with a focus on similarity
Corpus for T2V training: DBpedia 2016-04 abstracts annotated with DBpedia Spotlight
Experiments:
1) Analogical reasoning (standard method of evaluation for word embeddings)
2) Correlation with topological measures
3) Similarity and depth (depth problem)
4) Similarity and siblings (siblings similarity problem)
5) Type matching (similarity between different categorization systems)
1) Analogical Reasoning
Hypothesis
T2V can support analogical reasoning as word2vec does
Dataset
868 reasonably objective analogies on sports
(e.g., sportPlayer - sportTeam)
Methodology
● Tested two T2V models with 100 and 200 embedding dimensions and a context window of 5
● Word2vec answers with the list of points closest to the result of the analogical operation
● We check whether the correct answer is found in the top-k list (k = 1, 5, 10)
● In the top-k setting, an answer is correct if it appears among the first k items of the ranked list
Example
“Who is the equivalent of a RugbyPlayer that plays in a
RugbyTeam in a BasketballTeam?”
RugbyPlayer : RugbyTeam :: ? : BasketballTeam
Analogical operation: v(dbo:RugbyPlayer) -
v(dbo:RugbyTeam) + v(dbo:BasketballTeam) ≈
v(dbo:BasketballPlayer)
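The analogical operation and the top-k check can be sketched with hand-made vectors. The vectors in `VECTORS` are hypothetical and chosen so that the arithmetic works out exactly; they are not trained T2V embeddings. Excluding the three query types from the candidates is an assumption that follows common word2vec practice.

```python
import math

# Hypothetical toy vectors; dimensions: [rugby, basketball, soccer, player, team].
VECTORS = {
    "dbo:RugbyPlayer":      [1.0, 0.0, 0.0, 1.0, 0.0],
    "dbo:RugbyTeam":        [1.0, 0.0, 0.0, 0.0, 1.0],
    "dbo:BasketballPlayer": [0.0, 1.0, 0.0, 1.0, 0.0],
    "dbo:BasketballTeam":   [0.0, 1.0, 0.0, 0.0, 1.0],
    "dbo:SoccerPlayer":     [0.0, 0.0, 1.0, 1.0, 0.0],
    "dbo:SoccerClub":       [0.0, 0.0, 1.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, k=1):
    """Rank types by similarity to v(a) - v(b) + v(c), excluding a, b and c."""
    query = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [t for t in VECTORS if t not in (a, b, c)]
    ranked = sorted(candidates, key=lambda t: cosine(query, VECTORS[t]), reverse=True)
    return ranked[:k]


def correct_at_k(a, b, c, expected, k):
    """True if the expected type appears among the first k ranked answers."""
    return expected in analogy(a, b, c, k=k)


top = analogy("dbo:RugbyPlayer", "dbo:RugbyTeam", "dbo:BasketballTeam", k=3)
print(top)
```

Averaging `correct_at_k` over a set of analogies gives a precision-at-k style score.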
1) Analogical Reasoning
Model (dimensions, window)   P@1    P@2    P@5
T2V (200, 5)                 0.50   0.85   0.98
T2V (100, 5)                 0.47   0.76   0.93
Outcome
● Correct answer is often found in the first 5 positions
● Linguistic properties are preserved also in T2V
Model used for the
next experiments
2) T2V vs Topological Measures: Correlation
Measure   path   wup    wpath  T2V
path      1.00   0.87   0.94   0.30
wup              1.00   0.93   0.33
wpath                   1.00   0.36
T2V                            1.00
Hypothesis
T2V similarity is orthogonal to topological similarity
Dataset
~15000 pairs of types in DBpedia
Methodology
Pearson Correlation coefficient between T2V similarity and
well-known topological measures
Outcome
T2V similarity and topological similarity are not strongly
correlated
2) T2V vs Topological Measures: Insights
State of the Art
Based on the topology of the ontology
Ex. dbo:Settlement and dbo:Town (high similarity)
Ex. dbo:SoccerPlayer and dbo:SoccerClub (low similarity)
Ex. dbo:Wrestler and dbo:SoccerPlayer (high similarity, siblings)
T2V
Captures the co-occurrences of types in text
Ex. dbo:Settlement and dbo:Town (high similarity)
Ex. dbo:SoccerPlayer and dbo:SoccerClub (high similarity)
Ex. dbo:Wrestler and dbo:SoccerPlayer (low similarity, siblings)
2) T2V vs Topological Measures: Examples
Type 1             Type 2                    Sim - wpath   Sim - T2V
dbo:SoccerPlayer   dbo:SoccerClub            0.17          0.72
dbo:SoccerPlayer   dbo:Wrestler              0.47          0.24
dbo:RailwayLine    dbo:Station               0.44          0.81
dbo:Vein           dbo:Artery                0.70          0.84
dbo:RailwayLine    dbo:PublicTransitSystem   0.11          0.79
dbo:Company        dbo:Airline               0.72          0.30
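As an illustration of the correlation methodology, the Pearson coefficient can be computed on the six example pairs listed in this table. This is only a sketch: six hand-picked examples are not the ~15000-pair dataset, so the value printed here is not expected to match the correlation matrix reported earlier.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Scores copied from the six example rows (wpath vs. T2V).
wpath_scores = [0.17, 0.47, 0.44, 0.70, 0.11, 0.72]
t2v_scores = [0.72, 0.24, 0.81, 0.84, 0.79, 0.30]

r = pearson(wpath_scores, t2v_scores)
print(round(r, 2))
```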
3) Similarity vs. Depth
Hypothesis
Sibling types are pairwise more similar when the types are more specific
(as noticed in topological similarity)
sim(dbo:BasketballPlayer, dbo:SoccerPlayer) > sim(dbo:Person, dbo:Organization)
Dataset
DBpedia ontology
Methodology
● Children Information Distribution (CID)
○ Average pairwise similarity between the children of a type p
● CID vs. relative depth (depth measured relative to the type's path)
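The CID definition above can be written down directly. The sketch assumes cosine similarity as the pairwise measure (the measure used for T2V elsewhere in the talk); the child vectors are hypothetical toy values, not trained embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cid(children_vectors):
    """Average pairwise similarity between the vectors of the children of a type."""
    pairs = list(combinations(children_vectors, 2))
    if not pairs:
        raise ValueError("CID needs at least two children")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical child vectors for a general type and for a more specific type.
general_children = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
specific_children = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0]]

print(round(cid(general_children), 3))
print(round(cid(specific_children), 3))
```

With these toy values the more specific type has the higher CID, which is the pattern the hypothesis predicts.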
3) T2V CID vs. Relative Depth
Outcome
● On average, CID increases
with depth
CID drops here: CID(dbo:Thing)>CID(dbo:Agent)
4) Siblings’ Similarity
Hypothesis
The pairwise similarity for a set of siblings changes from pair to pair
Dataset
31 sibling types from the DBpedia ontology
For each type, we selected its most similar sibling and its least similar sibling according to T2V similarity
(e.g., SoccerPlayer => most similar: RugbyPlayer, least similar: ChessPlayer)
Methodology
We asked 5 users (knowledgeable about semantic web) to answer questions like the
following:
“Do you think a SoccerPlayer is more similar to a RugbyPlayer or a ChessPlayer?”
Potential Biases
● Low number of participants
● Questions were selected using T2V
4) Siblings’ Similarity
Outcome
● Agreement among the users, measured with Gwet's AC1 [Gwet, 2008], is 0.9 (high agreement)
● Given an input type, users choose as their answer the type that T2V also returns as most similar
Examples
Is a Writer more similar to a dbo:Philosopher or a dbo:BusinessPerson?
Is a President more similar to a dbo:PrimeMinister or a dbo:Mayor?
Most challenging question for users
“is a dbo:Skyscraper more similar to a dbo:Hospital or a dbo:Museum?”
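For readers unfamiliar with the agreement statistic, the sketch below computes Gwet's AC1 for several raters, following the multi-rater formulation as commonly stated: observed agreement is the average share of agreeing rater pairs per question, and chance agreement is the sum of pi_k * (1 - pi_k) over categories divided by (q - 1). The ratings are hypothetical, and the labels "most" and "least" are an assumed encoding of the two answer options, not the study's data.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for a list of items, each rated by the same number of raters.

    ratings: list of lists of category labels (one inner list per item).
    categories: all possible labels (at least two).
    """
    n_items = len(ratings)
    q = len(categories)
    observed = 0.0
    prevalence = {c: 0.0 for c in categories}
    for item in ratings:
        r = len(item)
        votes = Counter(item)
        # Share of rater pairs that agree on this item.
        observed += sum(v * (v - 1) for v in votes.values()) / (r * (r - 1))
        for c in categories:
            prevalence[c] += votes.get(c, 0) / r
    observed /= n_items
    chance = sum((p / n_items) * (1 - p / n_items) for p in prevalence.values()) / (q - 1)
    return (observed - chance) / (1 - chance)


# Hypothetical answers of 5 raters to 2 questions.
example = [
    ["most", "most", "most", "most", "most"],
    ["most", "most", "most", "most", "least"],
]
ac1 = gwet_ac1(example, categories=["most", "least"])
print(round(ac1, 3))
```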
5) Type Matching
Hypothesis
T2V can be used for ontology matching provided that two different ontologies are used to classify a common set of instances
Methodology
● Learn representations of types from different ontologies in a shared vector space (100 dimensions, window of 5)
● Replace entities with a type of one of the two ontologies (randomly)
Dataset
● DBpedia 2016-04 and Wikidata 2016-06 (instance of)
[Figure: a shared vector space in which types of different ontologies co-exist. In the Rome passage, each entity is replaced with a type from one of the two ontologies: City (Ontology 1), Country (Ontology 2), City (Ontology 1), Region (Ontology 2).]
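The random replacement step can be sketched as follows. The two lookup tables and the `O1:` / `O2:` prefixes are hypothetical placeholders for the two ontologies' types; the seed is fixed only to make the run repeatable. The mixed sequences would then be fed to word2vec so that types of both ontologies end up in the same space.

```python
import random

# Hypothetical entity -> type lookups for two ontologies.
ONTOLOGY_1 = {"Rome": "O1:City", "Italy": "O1:Country", "Lazio": "O1:Region"}
ONTOLOGY_2 = {"Rome": "O2:City", "Italy": "O2:Country", "Lazio": "O2:Region"}


def mixed_type_sequence(entities, rng):
    """Replace each entity with its type from one of the two ontologies, chosen at random."""
    sequence = []
    for entity in entities:
        ontology = rng.choice((ONTOLOGY_1, ONTOLOGY_2))
        sequence.append(ontology[entity])
    return sequence


entities = ["Rome", "Italy", "Rome", "Lazio"]
rng = random.Random(0)  # fixed seed, for repeatability only
sequence = mixed_type_sequence(entities, rng)
print(sequence)
```

Because the same entity is sometimes written with its type from one ontology and sometimes with its type from the other, equivalent types share contexts and are expected to receive similar vectors.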
5) Type Matching
Wikidata (label)                               DBpedia                  Sim
Q4498974 (ice hockey team)                     HockeyTeam               0.99
Q5107 (continent)                              Continent                0.99
Q17374546* (Australian rules football club)    AustralianFootballTeam   0.99
Q3001412* (horse race)                         HorseRace                0.98
Q4022 (river)                                  River                    0.98
Q46970 (airline)                               Airline                  0.98
Q18127 (record label)                          RecordLabel              0.98
Q13027888* (baseball team)                     BaseballTeam             0.98
Q11424 (film)                                  Film                     0.98
Q1075* (color)                                 Colour                   0.98
Q17156793* (American football team)            AmericanFootballTeam     0.95
Q3146899* (diocese of the Catholic Church)     Diocese                  0.93
Q7944* (earthquake)                            Earthquake               0.91
* not declared equivalent in DBpedia
Outcome
● Types with highest similarity are equivalent classes in the
two ontologies (due to the use in text)
● Found equivalent types not declared as equivalent in
DBpedia
Conclusions and Future Work
Conclusions:
● Similarity with T2V injects relatedness in type similarity measures (from handwritten text corpora)
● T2V exhibits some desired properties (depth, sibling discrimination)
● T2V supports analogical reasoning
● T2V can support ontology matching
Future Work:
● Combine T2V similarity and topological similarities in one measure
● Study the relation between the sub-type relation and the vector representation
● Support ontology matching tasks
● Compare with other methods for vector-based type representations
Thank You
Code and models are publicly available (see the paper for details)
Mail to: federico.bianchi@disco.unimib.it
References
Bianchi, F., Palmonari, M., & Nozza, D. (2018). Towards Encoding Time in Text-Based Entity Embeddings. In International Semantic Web Conference (to appear).
Bianchi, F., & Palmonari, M. (2017). Joint Learning of Entity and Type Embeddings for Analogical Reasoning with Entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29-48.
Zhu, G., & Iglesias, C. A. (2017). Computing semantic similarity of concepts in knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 29(1), 72-85.
Jayawardana, V., Lakmal, D., de Silva, N., Perera, A. S., Sugathadasa, K., & Ayesha, B. (2017). Deriving a representative vector for ontology classes with instance word vector embeddings. In INTECH (pp. 79-84).
Smaili, F. Z., Gao, X., & Hoehndorf, R. (2018). Onto2vec: joint vector-based representation of biological entities and their ontology-based annotations. arXiv preprint arXiv:1802.00864.

Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 

Recently uploaded (20)

(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 

Type Vector Representations from Text. DL4KGS@ESWC 2018

  • 1. Type Vector Representations from Text: An Empirical Analysis Federico Bianchi, Mauricio Soto, Matteo Palmonari and Vincenzo Cutrona Department of Informatics, Systems and Communications University of Milano-Bicocca federico.bianchi@disco.unimib.it Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies (DL4KGS) Co-located with ESWC 18, June 2018, Crete, Greece
  • 2. ESWC, Crete, 4th June 2018 Outline ● Knowledge Graphs ● Scope of this Paper and State-of-the-art ● T2V: Type to Vector ● Experiments
  • 3. Outline ● Knowledge Graphs ● Scope of this Paper and State-of-the-art ● T2V: Type to Vector ● Experiments
  • 4. Knowledge Graphs
      ● Structured representations of knowledge
      ● Entities are classified using types (i.e., concepts)
      ● Types are organized in sub-type graphs
      [Figure: example knowledge graph with the entities A.S. Roma, Kostas Manolas, Garry Kasparov and Real Madrid, a "team" relation, and the types Thing, Person, Athlete, Soccer Player, Chess Player, Organisa., Sports Club and Soccer Club]
  • 5. Outline ● Knowledge Graphs ● Scope of this Paper and State-of-the-art ● T2V: Type to Vector ● Experiments
  • 6. Scope of this Paper
      ● Propose an approach to learn representations of types by considering text as a different source of information
          ○ Distributional semantics
          ○ Embeddings of types in a vector space
          ○ Mapping to a word2vec learning problem
      ● Main intuition: building a type similarity measure that encodes relatedness between types (beyond ontological similarity)
      ● Empirical evaluation of the properties of text-based type representations
          ○ Focus on similarity (relatedness vs ontological similarity)
  • 7. Vector Representations of Types
      Types represented in a vector space:
      ● Easy and fast evaluation of similarity
      [Figure: two example 5-dimensional type vectors, (2, 5, 6, 2, 6) and (4, 2, 12, 5, 2), labelled Soccer Club and Person]
  • 8. Embeddings for Representing Ontologies
      ● [Jayawardana+, 2017]
          ○ Instance-based approach for building word embedding vectors of the instances in a custom ontology (legal domain)
          ○ Embeddings used to predict the best representative vector for each ontology type (cluster-based approach)
          ○ Conclusions: type vectors are aggregations of entity embeddings
      ● [Smaili+, 2018]
          ○ Distributional-hypothesis-based embeddings for ontological representation
          ○ Textual document generated by considering axioms in an ontology as sentences of a text
          ○ Conclusions: uses the structure of the ontology
  • 9. State-of-the-Art on Ontological Similarity
      ● [Rada+, 1989] (path)
          ○ Shortest path length between concepts
          ○ Equal path problem: two concept pairs with the same path length share the same semantic similarity
      ● [Wu&Palmer, 1994] (wup)
          ○ Considers the depth of the concepts (based on the Least Common Subsumer, i.e., the first common ancestor)
          ○ Equal depth problem: concepts at the same hierarchical level share the same similarity
      ● [Zhu&Iglesias, 2017] (wpath)
          ○ Weighted path length to evaluate the similarity between concepts
          ○ Exploitation of the statistical Information Content (IC) along with the topology
          ○ IC computed on text corpora and used to assign higher level to more specific entities
      ● Topologically distant concepts may be highly related (e.g., SoccerPlayer and SoccerClub)
      ● Not all sibling pairs are similar in the same way (e.g., is a SoccerPlayer equally similar to a Wrestler and a BasketballPlayer?)
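The path and wup measures listed above can be illustrated on a small hierarchy. The sketch below is not taken from the slides: the toy hierarchy, the normalisation `1 / (1 + path length)` for the path measure, and the convention that the root has depth 1 are all assumptions made for illustration.

```python
# Toy sub-type hierarchy (child -> parent); an assumption for illustration only.
PARENT = {
    "Person": "Thing",
    "Organisation": "Thing",
    "Athlete": "Person",
    "SportsClub": "Organisation",
    "SoccerPlayer": "Athlete",
    "ChessPlayer": "Athlete",
    "SoccerClub": "SportsClub",
}


def ancestors(t):
    """Return the chain [t, parent(t), ..., root]."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(t):
    """Depth of a type, counting the root as depth 1 (assumed convention)."""
    return len(ancestors(t))


def lcs(a, b):
    """Least Common Subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    for t in ancestors(b):  # walks upward, so the first hit is the deepest
        if t in seen:
            return t
    raise ValueError("types do not share a root")


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    c = depth(lcs(a, b))
    return (depth(a) - c) + (depth(b) - c)


def path_sim(a, b):
    """Path similarity, normalised as 1 / (1 + shortest path length)."""
    return 1.0 / (1.0 + path_length(a, b))


def wup_sim(a, b):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

On this toy tree, `path_sim("SoccerPlayer", "ChessPlayer")` equals `path_sim("Person", "Organisation")`, which is the equal path problem, while wup separates the two pairs because it takes depth into account. Both measures give SoccerPlayer and SoccerClub a low score, since they only meet at the root.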
  • 10. Similarity vs. Relatedness
      Semantic Similarity
      ● Resemblance, general conceptual term
      ● Ex. Settlement and Town
      ● Equal Path problem, Depth problem
      ● Measures based on the ontology topology:
          ○ path
          ○ wup (Least Common Subsumer)
          ○ wpath (Information Content)
      Relatedness
      ● Existence of connections
      ● Ex. SoccerPlayer and SoccerClub
      ● Ontology structure obliviousness
      ● Measures based on corpora co-occurrence
      ● Word Embedding (Distributional Hypothesis):
          ○ word2vec
  • 11. Outline ● Knowledge Graphs ● Scope of this Paper and State-of-the-art ● T2V: Type to Vector ● Experiments
  • 12. Word2Vec [Mikolov+, 2013]
      Well-known algorithm for learning word representations from an input corpus
      Distributional hypothesis: similar words appear in similar contexts (word-word co-occurrence)
      Example sentences:
      ● The big black cat eats its food.
      ● My little black cat sleeps all day.
      ● Sometimes my cat eats too much!
      [Figure: vector space with the words cat, black, eats and dog; similar words correspond to similar vectors]
      Two hyperparameters:
      ● Desired embedding size
      ● Length of the context window
      Type to Vector (T2V): generate distributed representations of types based on type-type co-occurrence.
  • 13. T2V: Word2Vec on Annotated Text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      Part of our approach to learn representations of typed entities:
      - Bianchi & Palmonari. Joint Learning of Entity and Type Embeddings for Analogical Reasoning with Entities. NL4AI 2017
      - Bianchi et al. Towards Encoding Time in Text-Based Entity Embeddings. ISWC 2018 (to appear).
  • 14. T2V: Word2Vec on Annotated Text
      Find entities in text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      ● Entities are found with a Named Entity Linking Service
  • 15. T2V: Word2Vec on Annotated Text
      Find entities in text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      Entities found: Rome, Italy, Rome, Lazio
      ● Entities are found with a Named Entity Linking Service
      ● Words are removed
  • 16. T2V: Word2Vec on Annotated Text
      Find entities in text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      Entities found: Rome, Italy, Rome, Lazio
      Replace Entities With Minimal Types
      Types: City, Country, City, Administrative_Region
      ● Entities are found with a Named Entity Linking Service
      ● Words are removed
      ● Entities are replaced with their minimal (most specific) type
  • 17. T2V: Word2Vec on Annotated Text
      Find entities in text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      Entities found: Rome, Italy, Rome, Lazio
      Replace Entities With Minimal Types
      Types: City, Country, City, Administrative_Region
      ● Entities are found with a Named Entity Linking Service
      ● Words are removed
      ● Entities are replaced with their minimal (most specific) type
      ● The document containing sequences of types is fed to word2vec
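The preprocessing steps above can be sketched as follows. This is only an illustration under stated assumptions: the output of the Named Entity Linking service is represented as a plain list of entity names, and the `MINIMAL_TYPE` lookup table and the function name `to_type_sequence` are hypothetical, not part of the authors' code.

```python
# Hypothetical lookup from linked entity to its minimal (most specific) type.
MINIMAL_TYPE = {
    "Rome": "City",
    "Italy": "Country",
    "Lazio": "Administrative_Region",
}


def to_type_sequence(linked_entities, minimal_type=MINIMAL_TYPE):
    """Replace each linked entity with its minimal type.

    Plain words never appear in `linked_entities`, so they are dropped
    implicitly; entities without a known type are skipped.
    """
    return [minimal_type[e] for e in linked_entities if e in minimal_type]


# Entities as an entity linker might return them for the example sentence,
# in order of appearance (assumed input format).
linked = ["Rome", "Italy", "Rome", "Lazio"]
type_sentence = to_type_sequence(linked)

# One type sequence per document; this list of token lists is the kind of
# input a word2vec implementation expects as its training corpus.
corpus = [type_sentence]
```

Here `type_sentence` is the sequence City, Country, City, Administrative_Region shown on the slide. Training then treats types exactly like words, so the only hyperparameters are the embedding size and the context window.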
  • 18. T2V: Word2Vec on Annotated Text
      Find entities in text
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      Entities found: Rome, Italy, Rome, Lazio
      Replace Entities With Minimal Types
      Types: City, Country, City, Administrative_Region
      Generate Type Vectors (word2vec)
      [Figure: three example 5-dimensional type vectors, (2, 5, 6, 2, 6), (4, 2, 12, 5, 2) and (6, 7, 6, 9, 7), labelled City, Country and Adminis. Region]
      Similarity can be computed with cosine similarity
  • 19. Outline ● Knowledge Graphs ● Scope of this Paper and State of the Art ● T2V: Type to Vector ● Experiments
  • 20. Empirical Evaluation of T2V Representations
      Objective: analyzing the properties of the T2V representations, with a focus on similarity
      Corpus for T2V training: DBpedia 2016-04 abstracts annotated with DBpedia Spotlight
      Experiments:
      1) Analogical reasoning (standard method of evaluation for word embeddings)
      2) Correlation with topological measures
      3) Similarity and depth (depth problem)
      4) Similarity and siblings (siblings similarity problem)
      5) Type matching (similarity between different categorization systems)
  • 21. 1) Analogical Reasoning
      Hypothesis: T2V can support analogical reasoning as word2vec does
      Dataset: 868 reasonably objective analogies on sports (e.g., sportPlayer - sportTeam)
      Methodology
      ● Tested two different T2V models, with 100 and 200 dimensions for the embeddings and a window of 5
      ● Word2vec answers with the list of points closest to the result of the analogical operation
      ● We check if the correct answer is found in the list of top-k (1, 5, 10)
      ● In the top-k setting, an answer is correct if it is in the top k of the ranked list
      Example
      “Who is the equivalent of a RugbyPlayer that plays in a RugbyTeam in a BasketballTeam?”
      RugbyPlayer : RugbyTeam :: ? : BasketballTeam
      Analogical operation: v(dbo:RugbyPlayer) - v(dbo:RugbyTeam) + v(dbo:BasketballTeam) ≈ v(dbo:BasketballPlayer)
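The analogical operation can be sketched with plain Python lists. The vectors below are hand-made toy values chosen so that the player/team offset is the same for both sports; they are not T2V embeddings. Excluding the three query types from the candidates is a common convention in analogy evaluation and is assumed here.

```python
import math

# Hand-made toy vectors (NOT real T2V embeddings): the first two axes encode
# the sport, the third axis encodes player (+1) versus team (-1).
VECTORS = {
    "RugbyPlayer": [1.0, 0.0, 1.0],
    "RugbyTeam": [1.0, 0.0, -1.0],
    "BasketballPlayer": [0.0, 1.0, 1.0],
    "BasketballTeam": [0.0, 1.0, -1.0],
    "City": [0.5, 0.5, 0.0],
    "Wrestler": [-1.0, -1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors, k=5):
    """Rank candidate types by cosine similarity to v(a) - v(b) + v(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Assumed convention: the three query types are not valid answers.
    candidates = [t for t in vectors if t not in (a, b, c)]
    candidates.sort(key=lambda t: cosine(vectors[t], target), reverse=True)
    return candidates[:k]


top = analogy("RugbyPlayer", "RugbyTeam", "BasketballTeam", VECTORS, k=3)
```

With these toy vectors the target is exactly the BasketballPlayer vector, so it is ranked first. With learned embeddings the match is only approximate, which is why the evaluation looks at the top-k answers rather than only the first one.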
  • 22. 1) Analogical Reasoning
                    P@1    P@2    P@5
      T2V (200,5)   0.50   0.85   0.98
      T2V (100,5)   0.47   0.76   0.93
      [Annotation on the slide: "Model used for the next experiments"; the row it marks is not recoverable from the transcript]
      Outcome
      ● Correct answer is often found in the first 5 positions
      ● Linguistic properties are preserved also in T2V
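A P@k score of this kind can be computed as the fraction of analogies whose correct answer appears among the first k ranked answers. The sketch below assumes that reading of P@k; the ranked lists and gold answers are made-up toy data, not the 868-analogy dataset.

```python
def precision_at_k(ranked_answers, gold, k):
    """Fraction of questions whose gold answer is within the top-k answers.

    ranked_answers: one ranked list of candidate types per question
    gold:           the correct type for each question, in the same order
    """
    hits = sum(1 for ranked, g in zip(ranked_answers, gold) if g in ranked[:k])
    return hits / len(gold)


# Toy data (hypothetical): three questions with the same correct answer.
ranked = [
    ["BasketballPlayer", "City", "Wrestler"],
    ["City", "BasketballPlayer", "Wrestler"],
    ["Wrestler", "City", "BasketballPlayer"],
]
gold = ["BasketballPlayer", "BasketballPlayer", "BasketballPlayer"]

p_at_1 = precision_at_k(ranked, gold, 1)
p_at_2 = precision_at_k(ranked, gold, 2)
p_at_3 = precision_at_k(ranked, gold, 3)
```

By construction the score cannot decrease as k grows, which matches the pattern in the table above.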
  • 23. 2) T2V vs Topological Measures: Correlation
      Hypothesis: T2V similarity is orthogonal to topological similarity
      Dataset: ~15000 pairs of types in DBpedia
      Methodology: Pearson correlation coefficient between T2V similarity and well-known topological measures
             path   wup    wpath  T2V
      path   1.00   0.87   0.94   0.30
      wup           1.00   0.93   0.33
      wpath                1.00   0.36
      T2V                         1.00
      Outcome: T2V similarity and topological similarity are not strongly correlated
  • 24. 2) T2V vs Topological Measures: Insights
      State of the Art: based on the topology of the ontology
      ● Ex. dbo:Settlement and dbo:Town (high similarity)
      ● Ex. dbo:SoccerPlayer and dbo:SoccerClub (low similarity)
      ● Ex. dbo:Wrestler and dbo:SoccerPlayer (high similarity, siblings)
      T2V: captures the co-occurrences of types in text
      ● Ex. dbo:Settlement and dbo:Town (high similarity)
      ● Ex. dbo:SoccerPlayer and dbo:SoccerClub (high similarity)
      ● Ex. dbo:Wrestler and dbo:SoccerPlayer (low similarity, siblings)
  • 25. 2) T2V vs Topological Measures: Examples
      Type 1            Type 2                   Sim - wpath  Sim - T2V
      dbo:SoccerPlayer  dbo:SoccerClub           0.17         0.72
      dbo:SoccerPlayer  dbo:Wrestler             0.47         0.24
      dbo:RailwayLine   dbo:Station              0.44         0.81
      dbo:Vein          dbo:Artery               0.70         0.84
      dbo:RailwayLine   dbo:PublicTransitSystem  0.11         0.79
      dbo:Company       dbo:Airline              0.72         0.30
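The Pearson correlation coefficient used in the correlation experiment can be computed as in the sketch below. It is applied here only to the six example pairs from the table above; those pairs were picked to illustrate disagreement between the measures, so the resulting value says nothing about the ~15000-pair dataset. The function name `pearson` is ours.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Similarity scores copied from the six example pairs on the slide.
wpath = [0.17, 0.47, 0.44, 0.70, 0.11, 0.72]
t2v = [0.72, 0.24, 0.81, 0.84, 0.79, 0.30]

r = pearson(wpath, t2v)
```

A value close to 1 would mean the two measures rank pairs in the same way; the low coefficients reported between T2V and the topological measures are what support the claim that T2V captures something different from the ontology topology.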
  • 26. 3) Similarity vs. Depth
      Hypothesis: sibling types are pairwise more similar when types are more specific (as noticed in topological similarity)
      sim(dbo:BasketballPlayer, dbo:SoccerPlayer) > sim(dbo:Person, dbo:Organization)
      Dataset: DBpedia ontology
      Methodology
      ● Children Information Distribution (CID)
          ○ Average pairwise similarity between the children of a type p
      ● CID vs relative depth (relative = to the type path)
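CID, as defined above, is the mean similarity over all unordered pairs of children of a type. A minimal sketch follows; the children list and the similarity values are invented for illustration, and in the experiment the similarity would be the cosine similarity between T2V vectors.

```python
from itertools import combinations

# Hypothetical children of one type and hypothetical pairwise similarities.
CHILDREN = {"Athlete": ["SoccerPlayer", "RugbyPlayer", "ChessPlayer"]}
SIM = {
    frozenset(("SoccerPlayer", "RugbyPlayer")): 0.8,
    frozenset(("SoccerPlayer", "ChessPlayer")): 0.2,
    frozenset(("RugbyPlayer", "ChessPlayer")): 0.5,
}


def sim(a, b):
    """Symmetric lookup in the toy similarity table."""
    return SIM[frozenset((a, b))]


def cid(parent, children=CHILDREN, similarity=sim):
    """Average pairwise similarity between the children of `parent`.

    Returns None when the type has fewer than two children, since no
    pair exists to average over.
    """
    pairs = list(combinations(children.get(parent, []), 2))
    if not pairs:
        return None
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


athlete_cid = cid("Athlete")
```

Computing this value for every type with at least two children, and plotting it against the depth of the type, gives the kind of comparison described in the methodology.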
  • 27. 3) T2V CID vs. Relative Depth
      [Figure: plot of CID against relative depth, annotated "CID drops here: CID(dbo:Thing) > CID(dbo:Agent)"]
      Outcome
      ● On average, CID increases with depth
  • 28. 4) Siblings’ Similarity
      Hypothesis: the pairwise similarity for a set of siblings changes from pair to pair
      Dataset
      ● 31 sibling types from the DBpedia ontology
      ● For each type we selected its most similar sibling and its least similar sibling according to T2V similarity (e.g., SoccerPlayer => most similar RugbyPlayer, least similar ChessPlayer)
      Methodology
      ● We asked 5 users (knowledgeable about the Semantic Web) to answer questions like the following: “Do you think a SoccerPlayer is more similar to a RugbyPlayer or a ChessPlayer?”
      Potential Biases
      ● Low number of participants
      ● Questions were selected using T2V
  • 29. 4) Siblings’ Similarity
      Outcome
      ● Agreement between the users, measured with Gwet's AC1 [Gwet, 2008], is 0.9 (high agreement)
      ● Given an input type, users choose as answer the type that is also returned as most similar by T2V
      Examples
      ● Is a Writer more similar to a dbo:Philosopher or a dbo:BusinessPerson?
      ● Is a President more similar to a dbo:PrimeMinister or a dbo:Mayor?
      Most challenging question for users
      ● “Is a dbo:Skyscraper more similar to a dbo:Hospital or a dbo:Museum?”
  • 30. 5) Type Matching
      Hypothesis: T2V can be used for ontology matching provided that two different ontologies are used to classify a common set of instances
      Methodology
      ● Learn representations of types from different ontologies in a shared vector space (100 dimensions, window of 5)
      ● Replace entities with a type of one of the two ontologies (randomly)
      Dataset
      ● DBpedia 2016-04 and Wikidata 2016-06 (instance of)
      Example
      “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
      City (Ontology 1), Country (Ontology 2), City (Ontology 1), Region (Ontology 2)
      [Figure: example type vectors in the same space, in which types of different ontologies co-exist]
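The random replacement step can be sketched as below. The two lookup tables and their `onto1:` / `onto2:` labels are placeholders, not real DBpedia or Wikidata identifiers, and choosing between the ontologies with equal probability is an assumption; the slide only says the choice is random.

```python
import random

# Placeholder type assignments for the same entities in two ontologies.
ONTOLOGY_1 = {"Rome": "onto1:City", "Italy": "onto1:Country", "Lazio": "onto1:Region"}
ONTOLOGY_2 = {"Rome": "onto2:City", "Italy": "onto2:Country", "Lazio": "onto2:Region"}


def mixed_type_sequence(entities, rng):
    """Replace each entity with its type from a randomly chosen ontology.

    Mixing both vocabularies inside the same sequences makes types from
    the two ontologies share contexts, so word2vec places them in one
    vector space.
    """
    return [rng.choice((ONTOLOGY_1, ONTOLOGY_2))[e] for e in entities]


rng = random.Random(0)  # seeded only to make the sketch reproducible
entities = ["Rome", "Italy", "Rome", "Lazio"]
sequence = mixed_type_sequence(entities, rng)
```

After training on such mixed sequences, candidate matches for a type of one ontology are the types of the other ontology with the highest cosine similarity.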
  • 31. 5) Type Matching
      Wikidata (label)                             DBpedia                 Sim
      Q4498974 (ice hockey team)                   HockeyTeam              0.99
      Q5107 (continent)                            Continent               0.99
      Q17374546* (Australian rules football club)  AustralianFootballTeam  0.99
      Q3001412* (horse race)                       HorseRace               0.98
      Q4022 (river)                                River                   0.98
      Q46970 (airline)                             Airline                 0.98
      Q18127 (record label)                        RecordLabel             0.98
      Q13027888* (baseball team)                   BaseballTeam            0.98
      Q11424 (film)                                Film                    0.98
      Q1075* (color)                               Colour                  0.98
      Q17156793* (American football team)          American Football Team  0.95
      Q3146899* (diocese of the Catholic Church)   Diocese                 0.93
      Q7944* (earthquake)                          Earthquake              0.91
      * not declared equivalent in DBpedia
      Outcome
      ● Types with highest similarity are equivalent classes in the two ontologies (due to the use in text)
      ● Found equivalent types not declared as equivalent in DBpedia
  • 32. Conclusions and Future Work
      Conclusions:
      ● Similarity with T2V injects relatedness in type similarity measures (from handwritten text corpora)
      ● T2V exhibits some desired properties (depth, sibling discrimination)
      ● T2V supports analogical reasoning
      ● T2V can support ontology matching
      Future Work:
      ● Combine T2V similarity and topological similarities in one measure
      ● Study relation between sub-type relation and the vector representation
      ● Support ontology matching tasks
      ● Compare with other methods for vector-based type representations
  • 33. Thank You
      Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies (DL4KGS)
      Co-located with ESWC 18, June 2018, Crete, Greece
      Code and models are publicly available (see the paper for details)
      Mail to: federico.bianchi@disco.unimib.it
  • 34. References
      ● Bianchi, F., Palmonari, M., & Nozza, D. Towards Encoding Time in Text-Based Entity Embeddings. In International Semantic Web Conference, 2018 (to appear).
      ● Bianchi, F., & Palmonari, M. (2017). Joint Learning of Entity and Type Embeddings for Analogical Reasoning with Entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
      ● Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
      ● Kilem Li Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48, 2008.
      ● Ganggao Zhu and Carlos A Iglesias. Computing semantic similarity of concepts in knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 29(1):72–85, 2017.
      ● V. Jayawardana, D. Lakmal, N. de Silva, A. S. Perera, K. Sugathadasa, and B. Ayesha. Deriving a representative vector for ontology classes with instance word vector embeddings. In INTECH, pages 79–84, Aug 2017.
      ● Fatima Zohra Smaili, Xin Gao, and Robert Hoehndorf. Onto2vec: joint vector-based representation of biological entities and their ontology-based annotations. arXiv preprint arXiv:1802.00864, 2018.