Bootstrapping Entity Alignment with Knowledge Graph Embedding
1. Zequn Sun, Wei Hu, Qingheng Zhang and Yuzhong Qu
National Key Laboratory for Novel Software Technology
Nanjing University, China
{zqsun, qhzhang}.nju@gmail.com, {whu, yzqu}@nju.edu.cn
2. Background
- Entity Alignment
  - Find entities in different KGs that refer to the same real-world object
  - Plays a vital role in automatically integrating multiple KGs
- Conventional approaches
  - Compute entity similarities based on entity attributes
  - Are not always effective because of semantic heterogeneity
- Embedding-based approaches
  - Encode KGs into vector spaces
  - Measure entity similarities via entity embeddings
3. Challenges
- Although embedding a single KG has been extensively studied in the past few years, alignment-oriented KG embedding remains largely unexplored.
- Embedding-based entity alignment usually relies on existing entity alignment (prior alignment) as training data. However, the accessible prior alignment usually accounts for only a small proportion of the entities.
4. Framework
- We model entity alignment as a classification problem of using KG2 entities to label KG1 entities.
- To address these two issues, we propose a bootstrapping framework:
(Figure: the bootstrapping framework. KG1 triples, KG2 triples and the prior alignment are combined via parameter swapping into supervised triples, which train alignment-oriented KG embeddings. The alignment predictor then performs alignment labeling, and the likely alignment, after alignment editing, feeds back through parameter swapping.)
5. Parameter Swapping
- We swap aligned entities in their triples to calibrate the embeddings of KG1 and KG2 in a unified vector space.
For an aligned pair (x, y), the supervised triples are:

  T^s_(x,y) = {(y, r, t) | (x, r, t) ∈ T_1} ∪ {(h, r, y) | (h, r, x) ∈ T_1}   (from KG1's triples)
            ∪ {(x, r, t) | (y, r, t) ∈ T_2} ∪ {(h, r, x) | (h, r, y) ∈ T_2}   (from KG2's triples)
- The supervised triples are fed to our KG embedding model as positives.
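As a minimal sketch, the swapping rule above can be written as follows; the entity names and the helper name `swap_parameters` are illustrative, not from the original implementation:

```python
def swap_parameters(kg1_triples, kg2_triples, alignment):
    """Build calibration triples by substituting y for x in KG1's
    triples and x for y in KG2's triples, for each aligned pair (x, y)."""
    supervised = set()
    for x, y in alignment:
        # Replace x with y wherever x appears in KG1's triples.
        for h, r, t in kg1_triples:
            if h == x:
                supervised.add((y, r, t))
            if t == x:
                supervised.add((h, r, y))
        # Replace y with x wherever y appears in KG2's triples.
        for h, r, t in kg2_triples:
            if h == y:
                supervised.add((x, r, t))
            if t == y:
                supervised.add((h, r, x))
    return supervised

kg1 = {("Washington_DC", "capitalOf", "USA")}
kg2 = {("Washington,_D.C.", "capitale", "États-Unis")}
pairs = [("USA", "États-Unis")]
print(swap_parameters(kg1, kg2, pairs))
# contains ("Washington_DC", "capitalOf", "États-Unis")
#      and ("Washington,_D.C.", "capitale", "USA")
```

Feeding these mixed triples to the embedding model forces the aligned entities of both KGs into one vector space.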
7. ε-Truncated Negative Sampling
- Conventional uniform negative sampling
  - The replacer is randomly sampled from all entities, so the negative may be easily distinguished from its original.
  - (Washington DC, capitalOf, USA) → (Tim Berners-Lee, capitalOf, USA)
- ε-Truncated negative sampling
  - The sampling scope is limited to a group of candidates, i.e., the entity's s-nearest neighbors, where s = ⌈(1 − ε)N⌉ and N is the number of entities.
  - (Washington DC, capitalOf, USA) → (New York, capitalOf, USA)
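A minimal sketch of truncated sampling, assuming Euclidean distances between toy embeddings (the function name and the sampling details are illustrative):

```python
import numpy as np

def truncated_negatives(entity, embeddings, epsilon, num_neg, rng):
    """Sample negative replacers from the entity's s-nearest neighbors,
    where s = ceil((1 - epsilon) * N) and N is the number of entities."""
    N = len(embeddings)
    s = int(np.ceil((1.0 - epsilon) * N))
    # Distances from the target entity to every entity.
    dists = np.linalg.norm(embeddings - embeddings[entity], axis=1)
    neighbors = np.argsort(dists)[1:s + 1]  # skip the entity itself
    return rng.choice(neighbors, size=num_neg, replace=True)

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))  # toy embeddings for 100 entities
negs = truncated_negatives(3, emb, epsilon=0.9, num_neg=5, rng=rng)
print(negs)  # indices of near-neighbor entities, never entity 3 itself
```

With ε = 0.9 and N = 100, the replacer is drawn from only the 10 nearest entities, so the corrupted triple stays hard to distinguish from the original.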
8. Likely Alignment Labeling
- We choose to label likely alignment at the t-th iteration by solving the following optimization problem:

  max Σ_{x∈X} Σ_{y∈Y} π(y | x; Θ^(t)) · ψ^(t)(x, y),
  s.t. Σ_{x'∈X} ψ^(t)(x', y) ≤ 1,  Σ_{y'∈Y} ψ^(t)(x, y') ≤ 1,  ∀x, y,

  where ψ^(t)(x, y) ∈ {0, 1} indicates whether x is labeled with y.
- We transform it into max-weighted matching on bipartite graphs.
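To illustrate the one-to-one constraints, here is a simple greedy approximation of the matching, not the exact max-weighted matching solver; `prob` stands in for π(y | x; Θ^(t)) with made-up values, and all names are hypothetical:

```python
def label_alignment(prob, threshold=0.5):
    """Greedily pick the highest-probability (x, y) pairs above a
    threshold, never reusing an entity on either side (one-to-one)."""
    candidates = sorted(
        ((p, x, y) for x, row in prob.items() for y, p in row.items() if p > threshold),
        reverse=True,
    )
    used_x, used_y, labeled = set(), set(), {}
    for p, x, y in candidates:
        if x not in used_x and y not in used_y:  # one-to-one constraints
            labeled[x] = y
            used_x.add(x)
            used_y.add(y)
    return labeled

prob = {
    "x1": {"y1": 0.9, "y2": 0.8},
    "x2": {"y1": 0.7, "y2": 0.6},
}
print(label_alignment(prob))  # {'x1': 'y1', 'x2': 'y2'}
```

Note that x2 cannot take y1 even though 0.7 > 0.6, because y1 is already matched to x1; an exact solver (e.g. the Hungarian algorithm) would maximize the total weight globally.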
(Figure: one-to-one labeling as a bipartite graph between X and Y.)
9. Likely Alignment Editing
- Labeling conflicts arise when accumulating the newly-labeled alignment of different iterations.
  - x is labeled as y at the t-th iteration but as y′ at the (t+1)-th iteration.
- We calculate the following likelihood difference:

  Δ^(t)_(x,y,y′) = π(y | x; Θ^(t)) − π(y′ | x; Θ^(t))

- If Δ^(t)_(x,y,y′) > 0, labeling x as y gives more alignment likelihood, so we choose y to label x; otherwise y′.
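The editing rule reduces to comparing the two likelihoods; a one-line sketch with hypothetical names (`prob_t` plays the role of π(· | x; Θ^(t)) with made-up values):

```python
def edit_alignment(x, y_old, y_new, prob_t):
    """Resolve a labeling conflict via the likelihood difference
    Delta = pi(y_old | x) - pi(y_new | x); keep y_old iff Delta > 0."""
    delta = prob_t[x][y_old] - prob_t[x][y_new]
    return y_old if delta > 0 else y_new

prob_t = {"x1": {"y1": 0.4, "y2": 0.7}}
print(edit_alignment("x1", "y1", "y2", prob_t))  # 'y2'
```

In this toy case the newer label y2 wins because π(y2 | x1) = 0.7 exceeds π(y1 | x1) = 0.4.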
10. Experiments
- Datasets
  - DBP15K: three cross-lingual datasets built from the multilingual versions of DBpedia: DBPZH-EN (Chinese-English), DBPJA-EN (Japanese-English) and DBPFR-EN (French-English). Each dataset contains 15 thousand reference entity alignments.
  - DWY100K: two large-scale datasets extracted from DBpedia, Wikidata and YAGO3, denoted by DBP-WD and DBP-YG. Each dataset has 100 thousand reference entity alignments.
11. Experiments
- Comparative Approaches
  - MTransE [IJCAI 2017] learns a linear transformation between KGs.
  - IPTransE [IJCAI 2017] is an iterative method for entity alignment.
  - JAPE [ISWC 2017] combines relation and attribute embeddings for entity alignment.
- Metrics
  - Hits@k: the percentage of correct alignments ranked in the top k
  - MRR: the average of the reciprocal ranks of the correct results
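Both metrics follow directly from the rank of each correct counterpart; a short sketch with toy ranks:

```python
def hits_at_k(ranks, k):
    """Percentage of correct alignments ranked within the top k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean of the reciprocal ranks of the correct alignments."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 2, 1, 5, 10]  # toy ranks of each entity's true counterpart
print(hits_at_k(ranks, 1))  # 40.0
print(mrr(ranks))           # 0.56
```

Hits@1 counts only exact top-ranked matches, while MRR also rewards near-misses with diminishing credit.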
13. Experiments
- F1-score w.r.t. the Distribution of Relation Triple Numbers
  - We divided the entity links in the testing data into several intervals based on the number of their relation triples.
  - Performance was assessed by the F1-score within each interval.
  - This analysis shows that BootEA achieves promising results on sparse data, indicating its practical use for real KGs.
(Figure: F1-scores of MTransE, IPTransE, JAPE and BootEA over the intervals [1,6), [6,11), [11,16), [16,21) and [21,∞) of relation triple numbers, together with the number of entity alignments within each interval.)
14. Conclusion
- In this paper, we studied embedding-based entity alignment.
  - We introduced a KG embedding model that learns alignment-oriented embeddings across different KGs. It employs an ε-truncated uniform negative sampling method to improve alignment performance.
  - We conducted entity alignment in a bootstrapping process that labels likely alignment as training data and edits the alignment during iterations.
  - Our experimental results showed that the proposed approach significantly outperforms three state-of-the-art embedding-based approaches on three cross-lingual datasets and two new large-scale datasets.
15. Thanks for your attention!
- This work is supported by the National Key R&D Program of China (No. 2018YFB1004300).
- Code and datasets of BootEA are available at https://github.com/nju-websoft/BootEA
- Welcome to my poster (#1425)