Radar Station
Using KG Embeddings for Semantic Table
Interpretation and Entity Disambiguation
Jixiong Liu Viet-Phi Huynh Yoan Chabot Raphaël Troncy
Radar Station-ISWC 2022
26 October 2022
What does this table mean?
Can the machine automatically interpret it?
… … …
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
Context & Motivation
[Figure: Wikidata excerpt describing two rows of the table]
• Traitor (Q7833036): author (P50) Matthew Stover (Q1909623); publication date (P577) 30 July 2002; part of the series (P179) The New Jedi Order (Q2743959)
• Ylesia (Q8053998): author (P50) Walter Jon Williams (Q714485); publication date (P577) 3 September 2002; part of the series (P179) The New Jedi Order (Q2743959)
Semantic Table Interpretation using Knowledge Graphs
… … …
2002 Enemy Lines: Rebel Dream
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
Semantic Table Interpretation – Up to Five Tasks
• Column-Type Annotation (CTA)
• Columns-Property Annotation (CPA)
• Cell-Entity Annotation (CEA) - Our focus
• Table Topic Annotation
• Row-to-Instance
Traitor (literary work):
Q7833036 on Wikidata
author: Matthew Stover
part of the series: The New Jedi Order
publisher: Del Rey Books
publication date: 30 July 2002
media franchise: Star Wars …
Traitor (literary work):
Q21161161 on Wikidata
author: Stephen Daisley
country of origin: Australia
publication date: 2010
language of work or name: English …
Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches:
• Rely on features (e.g. relevance score) provided by a lookup service
• E.g., ADOG [1], BBW [2]
[1] Oliveira, D., d’Aquin, M.: ADOG - Annotating Data with Ontologies and Graphs. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2019)
[2] Shigapov, R., Zumstein, P., Kamlah, J., Oberländer, L., Mechnich, J., Schumm, I.: bbw: Matching CSV to Wikidata via Meta-lookup. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
$$\mathrm{sim} = 1 - \frac{\mathrm{LevenshteinDistance}(s_1, s_2)}{\max(\mathrm{length}(s_1), \mathrm{length}(s_2))}$$
s1: Label of the entity from the KG (string)
s2: Mention from the table cell (string)
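The similarity above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the edit distance is the textbook dynamic-programming version, and returning 1.0 for two empty strings is an assumption not stated on the slide.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(s1: str, s2: str) -> float:
    """sim = 1 - LevenshteinDistance(s1, s2) / max(length(s1), length(s2))."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein_distance(s1, s2) / longest


# s1 is the KG label, s2 the mention found in the table cell.
print(label_similarity("Traitor", "Traitor"))
print(label_similarity("Destiny's Way", "Destiny"))
```

Note that both "Traitor" entities on Wikidata obtain the same similarity with the cell mention, which is why a purely string-based heuristic cannot separate them.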
Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches
• Iterative Disambiguation:
• Use the results of the CEA, CTA, CPA annotation tasks, in order to mutually
reinforce the compatibility between annotations
• Main shortcomings:
• Error propagation
• Background knowledge hidden in
the table is not used (e.g. all books
belong to a series)
• E.g., DAGOBAH [3], MTab [4]
[3] Huynh, V.P., Liu, J., Chabot, Y., Deuzé, F., Labbé, T., Monnin, P., Troncy, R.: DAGOBAH: Table and Graph Contexts for Efficient Semantic Annotation of Tabular Data. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2021)
[4] Nguyen, P., Yamada, I., Kertkeidkachorn, N., Ichise, R., Takeda, H.: MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches
• Iterative Disambiguation
• Usage of Graph Embeddings:
• Use pre-trained graph embeddings for
augmenting information about entities
• Main shortcoming: the embeddings
quality depends on the density of the
graph
• E.g., Efthymiou et al. [5], DAGOBAH-Embeddings [6]
[5] Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings. In: 16th International Semantic Web Conference (ISWC). pp. 260–277. Springer (2017)
[6] Chabot, Y., Labbe, T., Liu, J., Troncy, R.: DAGOBAH: An End-to-End Context-Free Tabular Data Semantic Annotation System. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). pp. 41–48 (2019)
Our Approach: Radar Station
[Figure: Radar Station pipeline. Tables → Annotation System → Candidate Scores → Ambiguity Detection and Context Entities Selection → Ambiguities & Context → Radar Station Disambiguation (also fed with KG Embeddings) → Output]
• Ambiguity Detection: detect potential errors caused by error propagation
• Context Entities Selection: capture more semantic similarities (from the embeddings)
• Radar Station Disambiguation: disambiguate by hybridizing entity scores and embedding distances
Radar Station is a plug-in module for an existing STI system (typically one using iterative disambiguation) that benefits from pre-trained embeddings as data augmentation
Radar Station
Why is this table difficult to interpret?
• The table lacks context, e.g., for the target column (book titles), who are the authors?
• The information about the book series (Star Wars) is not present in the table
• Matching “2002” with “30 July 2002” is not trivial
Traitor (literary work):
Q7833036 on Wikidata
author: Matthew Stover
part of the series: The New Jedi Order
publisher: Del Rey Books
publication date: 30 July 2002
media franchise: Star Wars …
Traitor (literary work):
Q21161161 on Wikidata
author: Stephen Daisley
country of origin: Australia
publication date: 2010
language of work or name: English …
DAGOBAH SL results: 2 candidates with an equal score
MTab results: the correct candidate is ranked 4th
BBW results: no output for this cell
We use DAGOBAH SL as the input system for the examples in this presentation
DAGOBAH SL scores:
{'id': 'Q21161161', 'score': 0.01600}, (Traitor - literary work)
{'id': 'Q7833036', 'score': 0.01600}, (Traitor - literary work)
{'id': 'Q1536329', 'score': 0.01164}, (Traitor - film)
…
MTab scores:
{'id': 'Q2435622', 'score': 0.02546}, (Traitor - television series episode)
{'id': 'Q16746183', 'score': 0.02545}, (Traitor - television series episode)
{'id': 'Q7833042', 'score': 0.024468}, (Traitor - fictional character)
{'id': 'Q7833036', 'score': 0.024467}, (Traitor - literary work)
…
We aim to detect the cell annotations that need to be disambiguated:
We set a tolerance t to select the top candidates, i.e. the candidates whose score is at least t times the best score of the cell
Example:
• If t = 1, Q21161161 and Q7833036 are the top candidates
• If t = 0.7, Q1536329 is also considered among the top candidates (0.01164 > 0.01600 × 0.7 = 0.0112)
Top candidates are ambiguities that we need to disambiguate
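A minimal sketch of this tolerance-based selection, assuming the rule is "keep every candidate whose score is at least t times the best score", which is how the example above reads. The function names are hypothetical and are not taken from the Radar Station code base.

```python
def top_candidates(candidates, t):
    """Keep the candidates whose score is at least t times the best score."""
    if not candidates:
        return []
    best = max(c["score"] for c in candidates)
    return [c for c in candidates if c["score"] >= t * best]


def is_ambiguous(candidates, t):
    """A cell needs disambiguation when more than one top candidate remains."""
    return len(top_candidates(candidates, t)) > 1


# DAGOBAH SL scores shown on the slides for the cells "Traitor" and "Destiny's Way".
traitor = [
    {"id": "Q21161161", "score": 0.01600},
    {"id": "Q7833036", "score": 0.01600},
    {"id": "Q1536329", "score": 0.01164},
]
destinys_way = [
    {"id": "Q5265233", "score": 0.01600},
    {"id": "Q60172766", "score": 0.0102},
    {"id": "Q17010392", "score": 0.0102},
]

print([c["id"] for c in top_candidates(traitor, 1.0)])
print([c["id"] for c in top_candidates(traitor, 0.7)])
print(is_ambiguous(destinys_way, 1.0))
```

Lowering t widens the set of ambiguities, so it trades more chances to recover the correct entity against more cells to disambiguate.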
If only one candidate remains among the top candidates (e.g., the row “Destiny’s Way” with t = 1), we output that entity directly, without running Radar Station.
DAGOBAH SL scores:
…
{'id': 'Q5265233', 'score': 0.01600}, (Destiny’s Way - literary work)
{'id': 'Q60172766', 'score': 0.0102}, (Destiny - literary work)
{'id': 'Q17010392', 'score': 0.0102}, (Destiny - literary work)
…
We aim to build a column-wise representation of the table context from the candidates of the same column.
For a given t (e.g., t = 1), we collect all top candidates of the same column, together with their scores, as the context entities.
…
{'row': 15, 'column': 1,
 'Annotations': [
  {'id': 'Q21161161', 'score': 0.01600}, (Traitor - literary work)
  {'id': 'Q7833036', 'score': 0.01600}, (Traitor - literary work)
  {'id': 'Q1536329', 'score': 0.01164}, (Traitor - film)
  … ]},
{'row': 16, 'column': 1,
 'Annotations': [
  {'id': 'Q5265233', 'score': 0.01600}, (Destiny’s Way - literary work)
  {'id': 'Q60172766', 'score': 0.0102}, (Destiny - literary work)
  {'id': 'Q17010392', 'score': 0.0102}, (Destiny - literary work)
  … ]},
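The context selection can be sketched as a grouping of top candidates by column. The record layout below mirrors the excerpt above; the function name, the tuple layout of the result and the "at least t times the best score" rule are assumptions made for illustration.

```python
from collections import defaultdict


def select_context_entities(annotations, t):
    """Collect, per column, the top candidates (row, entity id, score) of every cell."""
    context = defaultdict(list)
    for cell in annotations:
        candidates = cell["Annotations"]
        if not candidates:
            continue  # nothing to collect for a cell without candidates
        best = max(c["score"] for c in candidates)
        for c in candidates:
            if c["score"] >= t * best:
                context[cell["column"]].append((cell["row"], c["id"], c["score"]))
    return dict(context)


# The two cells shown in the excerpt (rows 15 and 16 of column 1).
annotations = [
    {"row": 15, "column": 1, "Annotations": [
        {"id": "Q21161161", "score": 0.01600},
        {"id": "Q7833036", "score": 0.01600},
        {"id": "Q1536329", "score": 0.01164},
    ]},
    {"row": 16, "column": 1, "Annotations": [
        {"id": "Q5265233", "score": 0.01600},
        {"id": "Q60172766", "score": 0.0102},
        {"id": "Q17010392", "score": 0.0102},
    ]},
]

for column, entities in select_context_entities(annotations, t=1.0).items():
    print(column, entities)
```

Keeping the row index with each context entity makes it easy to later skip the candidates that come from the same cell as the ambiguity being scored.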
Radar Station - Intuition
[Figure: a radar sender and receiver separated by a distance; the signal leaves the sender with an initial power]
• Distance: embeddings distance
• Signal power: scoring from a previous annotation system
[Figure: embedding space showing the context entities Star by Star, Enemy Lines: Rebel Stand, Enemy Lines: Rebel Dream, Destiny’s Way, Ylesia and Dark Journey, together with the two ambiguous candidates Traitor (Q7833036) and Traitor (Q21161161)]
Experiment - Embeddings
We use PyTorch-BigGraph [7] to train the embeddings
Experiment with:
• 2 translational distance models: TransE, RotatE (GraphVite [8] pre-trained embeddings)
• 2 semantic matching models: DistMult, ComplEx
[7] Lerer, A., Wu, L., Shen, J., Lacroix, T., Wehrstedt, L., Bose, A., Peysakhovich, A.: PyTorch-BigGraph: A Large-scale Graph Embedding System. In: Conference on Machine Learning and Systems (MLSys). vol. 1, pp. 120–131 (2019)
[8] Zhu, Z., Xu, S., Tang, J., Qu, M.: GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding. In: The World Wide Web Conference (WWW). pp. 2494–2504 (2019)
Radar Station
Using table context to disambiguate ambiguities
$$F(am_i) = \frac{1}{K} \sum_{j<K} \frac{Sc(e_j)}{\mathrm{distance}(am_i, e_j)}$$
$am_i$: An ambiguity (one of the top candidates previously selected)
$e_j$: A context entity, i.e. a candidate entity for another cell from the same column
$Sc(e_j)$: Score of the context entity $e_j$
[Figure: table context (the candidate annotations of rows 15 and 16 in column 1) plotted with the ambiguities; legend: incorrect candidate, correct candidate, ambiguities, context]
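The scoring function F can be sketched as follows. Several points are assumptions made for the sake of a runnable example: K is taken to be the number of context entities, the distance is Euclidean (the slides only say "embeddings distance"), a small `eps` guards against division by zero, and the 2-D vectors are invented toy values rather than real KG embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def radar_score(ambiguity_vec, context, distance=euclidean, eps=1e-9):
    """F(am_i) = (1/K) * sum_j Sc(e_j) / distance(am_i, e_j).

    context is a list of (embedding, score) pairs, one per context entity.
    """
    if not context:
        return 0.0  # assumption: no context means no evidence
    total = sum(score / max(distance(ambiguity_vec, vec), eps)
                for vec, score in context)
    return total / len(context)


# Toy context: two entities of the same column with their annotation scores.
context = [([0.0, 0.0], 0.016), ([1.0, 0.0], 0.016)]
close_candidate = [0.0, 1.0]  # hypothetical embedding near the context
far_candidate = [3.0, 4.0]    # hypothetical embedding far from the context

print(radar_score(close_candidate, context))
print(radar_score(far_candidate, context))
```

With equal input scores, the candidate whose embedding lies closer to the context entities receives the higher F, which is the behaviour the radar analogy describes: a stronger signal from nearer, higher-scored senders.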
Experiment – Dataset
• Limaye, T2D: Web tables
• 2T_v2: Synthetic tables
• ShortTables: built from T2D (each table keeps only two rows, which reduces the context)

Gold standard   # Tables   Avg. # Rows   Avg. # Cols   # Entities   Ambiguities (t=1)   Ambiguities (t=0.9)
Limaye          437        37            2             5,143        181 (3.52%)         685 (13.31%)
T2D             762        157           5             18,589       2,322 (12.49%)      8,852 (47.62%)
2T_v2           180        1,080         5             661,297      30,686 (4.64%)      86,739 (13.11%)
ShortTables     2,237      2             5             4,474        1,422 (31.78%)      1,822 (40.72%)
Evaluation - Metrics
$$AP = \frac{\#\,\text{Correct candidates in the candidate set of ambiguities}}{\#\,\text{Ambiguities}}$$
$$PA = \frac{\#\,\text{Correct ambiguity disambiguations}}{\#\,\text{Ambiguities}}$$
$$GP = \frac{\#\,\text{Correct annotations}}{\#\,\text{Total labels}}$$
AP: the quality of the ambiguity set
PA: the precision for Radar Station over the ambiguity set
GP: the global precision among all annotations with or without Radar Station
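As a rough illustration of the three metrics, the sketch below computes them over a toy list of cells. It assumes that an "ambiguity" is a cell with more than one top candidate and that AP counts the ambiguous cells whose candidate set contains the gold entity; the record keys and entity identifiers are hypothetical.

```python
def evaluate(cells):
    """Return (AP, PA, GP) for cells with keys 'gold', 'top_candidates', 'predicted'."""
    ambiguous = [c for c in cells if len(c["top_candidates"]) > 1]
    n_amb = len(ambiguous)
    # AP: share of ambiguous cells whose candidate set contains the gold entity.
    ap = sum(c["gold"] in c["top_candidates"] for c in ambiguous) / n_amb if n_amb else 0.0
    # PA: share of ambiguous cells that end up with the correct entity.
    pa = sum(c["predicted"] == c["gold"] for c in ambiguous) / n_amb if n_amb else 0.0
    # GP: share of all labelled cells that end up with the correct entity.
    gp = sum(c["predicted"] == c["gold"] for c in cells) / len(cells) if cells else 0.0
    return ap, pa, gp


# Toy data with placeholder entity identifiers.
cells = [
    {"gold": "A", "top_candidates": ["A", "B"], "predicted": "A"},  # ambiguous, solved
    {"gold": "C", "top_candidates": ["D", "E"], "predicted": "D"},  # gold not in the set
    {"gold": "F", "top_candidates": ["F"], "predicted": "F"},       # not ambiguous
    {"gold": "G", "top_candidates": ["G", "H"], "predicted": "H"},  # ambiguous, missed
]

ap, pa, gp = evaluate(cells)
print(f"AP={ap:.3f} PA={pa:.3f} GP={gp:.3f}")
```

Under this reading, AP is an upper bound on PA, since a cell can only be disambiguated correctly when the gold entity is among its top candidates.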
Evaluation - Improvements on all datasets (regardless of the embeddings type)

Methods                   Limaye          T2D             2T_v2           ShortTable
                          PA     GP       PA     GP       PA     GP       PA     GP
AP (shared by all rows)   0.614           0.332           0.327           0.671
DAGOBAH SL                0.296  0.853    0.180  0.785    0.208  0.870    0.302  0.654
RS+TransE                 0.528  0.872    0.312  0.815    0.230  0.872    0.414  0.673
RS+RotatE                 0.542  0.873    0.312  0.815    0.235  0.872    0.418  0.674
RS+DistMult               0.377  0.860    0.230  0.797    0.213  0.870    0.328  0.659
RS+ComplEx                0.435  0.864    0.233  0.798    0.219  0.870    0.334  0.660

Radar Station evaluation based on DAGOBAH SL scores, t = 0.95.
Evaluation - Improvements for all base input systems

                                      Original Output   Radar Station
Dataset   System       t      AP      PA      GP        PA               GP
Limaye    DAGOBAH SL   0.9    0.653   0.432   0.853     0.578 (+0.146)   0.873 (+0.020)
Limaye    MTab         0.83   0.820   0.705   0.857     0.787 (+0.082)   0.875 (+0.018)
Limaye    BBW          0.65   0.587   0.359   0.563     0.507 (+0.148)   0.597 (+0.034)
T2D       DAGOBAH SL   0.95   0.332   0.180   0.785     0.312 (+0.132)   0.815 (+0.030)
T2D       MTab         0.71   0.385   0.295   0.837     0.346 (+0.051)   0.857 (+0.020)
T2D       BBW          0.65   0.263   0.192   0.364     0.253 (+0.061)   0.382 (+0.018)

Radar Station evaluation on Web tables with DAGOBAH SL, MTab and BBW, using RotatE embeddings
Evaluation - More improvements on Web tables than on synthetic tables
More improvements on Web tables (max. +3 points of GP) than on synthetic tables (max. +0.2 points of GP)
- Synthetic tables lack common themes.

                          Web Tables                      Synthetic Tables
Methods                   Limaye          T2D             2T_v2
                          PA     GP       PA     GP       PA     GP
AP (shared by all rows)   0.614           0.332           0.327
DAGOBAH SL                0.296  0.853    0.180  0.785    0.208  0.870
RS+TransE                 0.528  0.872    0.312  0.815    0.230  0.872
RS+RotatE                 0.542  0.873    0.312  0.815    0.235  0.872
RS+DistMult               0.377  0.860    0.230  0.797    0.213  0.870
RS+ComplEx                0.435  0.864    0.233  0.798    0.219  0.870

Radar Station evaluation based on DAGOBAH SL scores, t = 0.95
Evaluation - No specific improvement under simulated extreme conditions
The contribution of Radar Station is minimal in T2D and ShortTable (max. +3 points of GP)
• More ambiguities (+)
• Less context (-)

Methods                   T2D             ShortTable
                          PA     GP       PA     GP
AP (shared by all rows)   0.332           0.671
DAGOBAH SL                0.180  0.785    0.302  0.654
RS+TransE                 0.312  0.815    0.414  0.673
RS+RotatE                 0.312  0.815    0.418  0.674
RS+DistMult               0.230  0.797    0.328  0.659
RS+ComplEx                0.233  0.798    0.334  0.660

Radar Station evaluation based on DAGOBAH SL scores, t = 0.95
Evaluation - Translational distance models are better
The results are similar for embeddings from the same family
Translational distance models are better than semantic matching models

Limaye, t = 0.95 (AP = 0.614 for all rows)
Class                    System        PA      GP
-                        DAGOBAH SL    0.296   0.853
Translational Distance   RS+TransE     0.528   0.872
Translational Distance   RS+RotatE     0.542   0.873
Semantic Matching        RS+DistMult   0.377   0.860
Semantic Matching        RS+ComplEx    0.435   0.864

Radar Station evaluation based on DAGOBAH SL scores.
[Figure: illustration of the Kappa test between different outputs, t = 0.95]
Conclusion & Future Work
▪ Radar Station is a useful plug-in module for improving cell annotations!
GitHub: https://github.com/Orange-OpenSource/radar-station
Data and Models: https://zenodo.org/record/6522985
& https://zenodo.org/record/6522921
Slides: https://tinyurl.com/radar-station-iswc2022
▪ Future Work:
▪ Handle additional tables (beyond relational tables)
▪ Handle additional context (e.g. table caption, text surrounding the table, etc.)
▪ Downstream tasks (e.g., schema augmentation, data imputation)
Radar Station-ISWC 2022
26

More Related Content

Similar to Radar Station - ISWC 2022.pdf

Representative Previous Work
Representative Previous WorkRepresentative Previous Work
Representative Previous Workbutest
 
Representative Previous Work
Representative Previous WorkRepresentative Previous Work
Representative Previous Workbutest
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
Line Detection on the GPU
Line Detection on the GPU Line Detection on the GPU
Line Detection on the GPU Gernot Ziegler
 
Strengthening support vector classifiers based on fuzzy logic and evolutionar...
Strengthening support vector classifiers based on fuzzy logic and evolutionar...Strengthening support vector classifiers based on fuzzy logic and evolutionar...
Strengthening support vector classifiers based on fuzzy logic and evolutionar...Reza Sadeghi
 
Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Usatyuk Vasiliy
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”inventionjournals
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Beniamino Murgante
 
Dagobahic2020orange
Dagobahic2020orangeDagobahic2020orange
Dagobahic2020orangeJixiongLIU
 
Security of Artificial Intelligence
Security of Artificial IntelligenceSecurity of Artificial Intelligence
Security of Artificial IntelligenceFederico Cerutti
 
Pointcuts and Analysis
Pointcuts and AnalysisPointcuts and Analysis
Pointcuts and AnalysisWiwat Ruengmee
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Pirouz Nourian
 
Digital Watermarking using DWT-SVD
Digital Watermarking using DWT-SVDDigital Watermarking using DWT-SVD
Digital Watermarking using DWT-SVDSurit Datta
 
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...Julián Urbano
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...NAVER Engineering
 
21cm cosmology with machine learning (Review))
21cm cosmology with machine learning (Review))21cm cosmology with machine learning (Review))
21cm cosmology with machine learning (Review))Hayato Shimabukuro
 

Similar to Radar Station - ISWC 2022.pdf (20)

Representative Previous Work
Representative Previous WorkRepresentative Previous Work
Representative Previous Work
 
Representative Previous Work
Representative Previous WorkRepresentative Previous Work
Representative Previous Work
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Line Detection on the GPU
Line Detection on the GPU Line Detection on the GPU
Line Detection on the GPU
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 
Strengthening support vector classifiers based on fuzzy logic and evolutionar...
Strengthening support vector classifiers based on fuzzy logic and evolutionar...Strengthening support vector classifiers based on fuzzy logic and evolutionar...
Strengthening support vector classifiers based on fuzzy logic and evolutionar...
 
Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...Cycle’s topological optimizations and the iterative decoding problem on gener...
Cycle’s topological optimizations and the iterative decoding problem on gener...
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
 
ICRA Nathan Piasco
ICRA Nathan PiascoICRA Nathan Piasco
ICRA Nathan Piasco
 
Dagobahic2020orange
Dagobahic2020orangeDagobahic2020orange
Dagobahic2020orange
 
Security of Artificial Intelligence
Security of Artificial IntelligenceSecurity of Artificial Intelligence
Security of Artificial Intelligence
 
Pointcuts and Analysis
Pointcuts and AnalysisPointcuts and Analysis
Pointcuts and Analysis
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
Point Cloud Processing: Estimating Normal Vectors and Curvature Indicators us...
 
Digital Watermarking using DWT-SVD
Digital Watermarking using DWT-SVDDigital Watermarking using DWT-SVD
Digital Watermarking using DWT-SVD
 
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
How Do Gain and Discount Functions Affect the Correlation between DCG and Use...
 
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applic...
 
AR/SLAM and IoT
AR/SLAM and IoTAR/SLAM and IoT
AR/SLAM and IoT
 
21cm cosmology with machine learning (Review))
21cm cosmology with machine learning (Review))21cm cosmology with machine learning (Review))
21cm cosmology with machine learning (Review))
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Radar Station - ISWC 2022.pdf

  • 1. Radar Station: Using KG Embeddings for Semantic Table Interpretation and Entity Disambiguation. Jixiong Liu, Viet-Phi Huynh, Yoan Chabot, Raphaël Troncy. Radar Station-ISWC 2022, 26 October 2022.
  • 2. Context & Motivation. What does this table mean? Can a machine automatically interpret it? Example table (column headers: … | … | …), rows: 2002 | Enemy Lines: Rebel Stand; 2002 | Traitor; 2002 | Destiny’s Way; 2002 | Ylesia | E-book.
  • 3. Semantic Table Interpretation using Knowledge Graphs. [Figure: cells of the example table linked to Wikidata. Traitor (Q7833036) and Ylesia (Q8053998) are part of the series (P179) The New Jedi Order (Q2743959). Traitor: author (P50) Matthew Stover (Q1909623), publication date (P577) 30 July 2002. Ylesia: author (P50) Walter Jon Williams (Q714485), publication date (P577) 3 September 2002.]
  • 4. Semantic Table Interpretation - Up to Five Tasks
      • Column-Type Annotation (CTA)
      • Columns-Predicate Annotation (CPA)
      • Cell-Entity Annotation (CEA) - our focus
      • Table Topic Annotation
      • Row-to-Instance
    Example table (column headers: … | … | …), rows: 2002 | Enemy Lines: Rebel Dream; 2002 | Enemy Lines: Rebel Stand; 2002 | Traitor; 2002 | Destiny’s Way; 2002 | Ylesia | E-book.
    Wikidata candidates for the cell “Traitor”:
      • Traitor (literary work), Q7833036 on Wikidata. author: Matthew Stover; part of the series: The New Jedi Order; publisher: Del Rey Books; publication date: 30 July 2002; media franchise: Star Wars; …
      • Traitor (literary work), Q21161161 on Wikidata. author: Stephen Daisley; country of origin: Australia; publication date: 2010; language of work or name: English; …
  • 5. Semantic Table Interpretation - Related Work
      • Heuristic-Based Approaches:
          • Rely on features (e.g. relevance score) provided by a lookup service
          • E.g., ADOG [1], BBW [2]
    Similarity: $sim = 1 - \frac{\mathrm{LevenshteinDistance}(s_1, s_2)}{\max(\mathrm{length}(s_1), \mathrm{length}(s_2))}$
      • $s_1$: label of the entity from the KG (string)
      • $s_2$: mention from the table cell (string)
    [1] Oliveira, D., d’Aquin, M.: Adog-annotating data with ontologies and graphs. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2019)
    [2] Shigapov, R., Zumstein, P., Kamlah, J., Oberländer, L., Mechnich, J., Schumm, I., d’Aquin, M.: bbw: Matching CSV to Wikidata via Meta-lookup. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
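The string similarity on slide 5 can be illustrated with a minimal Python sketch. This is not the code used by ADOG or BBW: the function names `levenshtein_distance` and `label_similarity` are hypothetical, and returning 1.0 for two empty strings is an assumption, since the slide does not cover that case.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def label_similarity(s1: str, s2: str) -> float:
    """sim = 1 - LevenshteinDistance(s1, s2) / max(length(s1), length(s2))."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein_distance(s1, s2) / longest
```

For instance, comparing the KG label "Traitor" with a hypothetical misspelled mention "Traitors" gives 1 - 1/8 = 0.875, while identical strings give 1.0.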
  • 6. Semantic Table Interpretation - Related Work
      • Heuristic-Based Approaches
      • Iterative Disambiguation:
          • Use the results of the CEA, CTA and CPA annotation tasks to mutually reinforce the compatibility between annotations
          • Main shortcomings:
              • Error propagation
              • Background knowledge hidden in the table is not used (e.g. all books belong to a series)
          • E.g., DAGOBAH [3], MTab [4]
    [3] Huynh, V.P., Liu, J., Chabot, Y., Deuzé, F., Labbé, T., Monnin, P., Troncy, R.: DAGOBAH: Table and Graph Contexts for Efficient Semantic Annotation of Tabular Data. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2021)
    [4] Nguyen, P., Yamada, I., Kertkeidkachorn, N., Ichise, R., Takeda, H.: Mtab4wikidata at semtab 2020: Tabular data annotation with wikidata. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
  • 7. Semantic Table Interpretation - Related Work
      • Heuristic-Based Approaches
      • Iterative Disambiguation
      • Usage of Graph Embeddings:
          • Use pre-trained graph embeddings to augment the information available about entities
          • Main shortcoming: the quality of the embeddings depends on the density of the graph
          • E.g., Efthymiou et al. [5], DAGOBAH-Embeddings [6]
    [5] Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching web tables with knowledge base entities: from entity lookups to entity embeddings. In: 16th International Semantic Web Conference (ISWC). pp. 260–277. Springer (2017)
    [6] Chabot, Y., Labbe, T., Liu, J., Troncy, R.: DAGOBAH: an end-to-end context-free tabular data semantic annotation system. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). pp. 41–48 (2019)
  • 8. Our Approach: Radar Station
    [Figure: pipeline with the components Tables, Annotation System, Candidate Scores, Ambiguity Detection, Context Entities Selection, Ambiguities & Context, KG Embeddings, Radar Station Disambiguation, Output]
      • Ambiguity Detection: detect potential errors caused by error propagation
      • Context Entities Selection: capture more semantic similarities (from the embeddings)
      • Radar Station Disambiguation: disambiguate by hybridizing entity scores and embeddings distance
    Radar Station is a plug-in module for an existing STI system (typically using iterative disambiguation) that will benefit from pre-trained embeddings as data augmentation.
  • 9. Why is this table difficult to interpret?
      • The table lacks context, e.g., for the target column (book titles), who are the authors?
      • The information about the book series (Star Wars) is not present in the table
      • Matching “2002” with “30 July 2002” is not trivial
    Candidates for the cell “Traitor”: Q7833036 (literary work; author: Matthew Stover; part of the series: The New Jedi Order; publisher: Del Rey Books; publication date: 30 July 2002; media franchise: Star Wars; …) and Q21161161 (literary work; author: Stephen Daisley; country of origin: Australia; publication date: 2010; language of work or name: English; …).
  • 10. Results of existing systems for the cell “Traitor”:
      • DAGOBAH SL results: 2 candidates with an equal score
      • MTab results: the correct candidate is at the 4th rank
      • BBW results: no output for this cell
    We use DAGOBAH SL as the input system to illustrate this presentation.
    DAGOBAH SL scores:
      {'id': 'Q21161161', 'score': 0.01600} (Traitor - literary work)
      {'id': 'Q7833036', 'score': 0.01600} (Traitor - literary work)
      {'id': 'Q1536329', 'score': 0.01164} (Traitor - film)
      …
    MTab scores:
      {'id': 'Q2435622', 'score': 0.02546} (Traitor - television series episode)
      {'id': 'Q16746183', 'score': 0.02545} (Traitor - television series episode)
      {'id': 'Q7833042', 'score': 0.024468} (Traitor - fictional character)
      {'id': 'Q7833036', 'score': 0.024467} (Traitor - literary work)
      …
  • 11. Ambiguity Detection. We aim to detect the cell annotations that need to be disambiguated. We set a tolerance t to select the top candidates. Example, with the DAGOBAH SL scores Q21161161: 0.01600, Q7833036: 0.01600, Q1536329: 0.01164, …:
      • If t = 1, Q21161161 and Q7833036 are the top candidates
  • 12. Ambiguity Detection. We aim to detect the cell annotations that need to be disambiguated. We set a tolerance t to select the top candidates; in the examples, a candidate is kept when its score reaches t times the best score. Example, with the DAGOBAH SL scores Q21161161: 0.01600, Q7833036: 0.01600, Q1536329: 0.01164, …:
      • If t = 1, Q21161161 and Q7833036 are the top candidates
      • If t = 0.7, Q1536329 is also considered among the top candidates (0.01164 > 0.01600 × 0.7 = 0.0112)
    Top candidates are ambiguities that we need to disambiguate.
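The tolerance rule of slides 11 and 12 can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the helper names `top_candidates` and `is_ambiguous` are hypothetical, and the comparison `score >= t * best_score` is an assumption inferred from the two examples on the slide (it must be non-strict for t = 1 to keep the best candidate itself).

```python
def top_candidates(candidates, t):
    """Keep candidates whose score reaches t times the best score.

    Assumption: the tolerance t is applied relative to the highest score.
    """
    if not candidates:
        return []
    threshold = t * max(c["score"] for c in candidates)
    return [c for c in candidates if c["score"] >= threshold]


def is_ambiguous(candidates, t):
    """A cell needs disambiguation when more than one top candidate remains."""
    return len(top_candidates(candidates, t)) > 1


# DAGOBAH SL scores copied from the slides.
traitor = [
    {"id": "Q21161161", "score": 0.01600},
    {"id": "Q7833036", "score": 0.01600},
    {"id": "Q1536329", "score": 0.01164},
]
destinys_way = [
    {"id": "Q5265233", "score": 0.01600},
    {"id": "Q60172766", "score": 0.0102},
    {"id": "Q17010392", "score": 0.0102},
]

print([c["id"] for c in top_candidates(traitor, 1.0)])
print([c["id"] for c in top_candidates(traitor, 0.7)])
print(is_ambiguous(destinys_way, 1.0))
```

With t = 1 only the two equally scored literary works remain for "Traitor"; with t = 0.7 the film is added; "Destiny’s Way" has a single top candidate and is therefore not ambiguous.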
  • 13. Ambiguity Detection. If there is only one candidate among the top candidates (e.g., row “Destiny’s Way” with t = 1), we directly output the entity without Radar Station.
    DAGOBAH SL scores:
      {'id': 'Q5265233', 'score': 0.01600} (Destiny’s Way - literary work)
      {'id': 'Q60172766', 'score': 0.0102} (Destiny - literary work)
      {'id': 'Q17010392', 'score': 0.0102} (Destiny - literary work)
      …
  • 14. Context Entities Selection. We aim to build a column-wise representation of the table context with candidates from the same column: for a given t (e.g., t = 1), collect all top candidates and their scores from the same column as the context entities.
      {'row': 15, 'column': 1, 'Annotations': [
        {'id': 'Q21161161', 'score': 0.01600} (Traitor - literary work)
        {'id': 'Q7833036', 'score': 0.01600} (Traitor - literary work)
        {'id': 'Q1536329', 'score': 0.01164} (Traitor - film)
        … ]},
      {'row': 16, 'column': 1, 'Annotations': [
        {'id': 'Q5265233', 'score': 0.01600} (Destiny’s Way - literary work)
        {'id': 'Q60172766', 'score': 0.0102} (Destiny - literary work)
        {'id': 'Q17010392', 'score': 0.0102} (Destiny - literary work)
        … ]},
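The context selection of slide 14 could look like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the top-candidate rule is the relative threshold `score >= t * best_score` inferred from slide 12, and the cell being disambiguated is excluded from its own context (slide 18 describes a context entity as a candidate for another cell of the same column).

```python
def top_candidates(annotations, t):
    """Assumed tolerance rule: keep scores >= t * best score."""
    if not annotations:
        return []
    threshold = t * max(a["score"] for a in annotations)
    return [a for a in annotations if a["score"] >= threshold]


def context_entities(cells, target_row, column, t):
    """Collect (id, score) of the top candidates of the other cells in a column."""
    context = []
    for cell in cells:
        if cell["column"] != column or cell["row"] == target_row:
            continue  # skip other columns and the target cell itself
        for a in top_candidates(cell["Annotations"], t):
            context.append((a["id"], a["score"]))
    return context


# Annotations copied from the slide (rows 15 and 16 of column 1).
cells = [
    {"row": 15, "column": 1, "Annotations": [
        {"id": "Q21161161", "score": 0.01600},
        {"id": "Q7833036", "score": 0.01600},
        {"id": "Q1536329", "score": 0.01164},
    ]},
    {"row": 16, "column": 1, "Annotations": [
        {"id": "Q5265233", "score": 0.01600},
        {"id": "Q60172766", "score": 0.0102},
        {"id": "Q17010392", "score": 0.0102},
    ]},
]

print(context_entities(cells, target_row=15, column=1, t=1.0))
```

With t = 1, the context of the "Traitor" cell (row 15) built from these two rows is the single entity Q5265233 (Destiny’s Way).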
  • 15. Radar Station - Intuition. [Figure: radar analogy with a sender, a receiver, the initial power and the distance between them. Distance corresponds to the embeddings distance; signal power corresponds to the scoring from a previous annotation system.]
  • 16. Radar Station - Intuition. [Figure: entities plotted in the embedding space: Star by Star, Enemy Lines: Rebel Stand, Enemy Lines: Rebel Dream, Destiny’s Way, Ylesia, Dark Journey, Traitor (Q7833036), Traitor (Q21161161).]
  • 17. Experiment - Embeddings. Leverage PyTorch-BigGraph [7] for training embeddings. Experiment with:
      • 2 translational distance models: TransE, RotatE (GraphVite [8] pre-trained embeddings)
      • 2 semantic matching models: DistMult, ComplEx
    [7] Lerer, A., Wu, L., Shen, J., Lacroix, T., Wehrstedt, L., Bose, A., Peysakhovich, A.: Pytorch-biggraph: A large scale graph embedding system. In: Conference on Machine Learning and Systems (MLSys). vol. 1, pp. 120–131 (2019)
    [8] Zhu, Z., Xu, S., Tang, J., Qu, M.: Graphvite: A high-performance cpu-gpu hybrid system for node embedding. In: The World Wide Web Conference (WWW). pp. 2494–2504 (2019)
  • 18. Radar Station Disambiguation. Using table context to disambiguate ambiguities:
    $F(am_i) = \frac{1}{K} \sum_{j<K} \frac{Sc(e_j)}{\mathrm{distance}(am_i, e_j)}$
      • $am_i$: an ambiguity (one of the top candidates previously selected)
      • $e_j$: a context entity, i.e. a candidate entity for another cell from the same column
      • $Sc(e_j)$: score of the context entity $e_j$
    Table context example:
      • Ambiguities (row 15, column 1): Q21161161, score 0.01600 (incorrect candidate); Q7833036, score 0.01600 (correct candidate)
      • Context (row 16, column 1): Q5265233, score 0.01600; Q60172766, score 0.0102; Q17010392, score 0.0102; …
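The scoring formula of slide 18 can be sketched with toy data. Everything numeric below except the 0.016 scores is hypothetical: the 2-D vectors are invented for illustration and are not real Wikidata embeddings, Euclidean distance is an assumption (the slide does not name the metric), K is assumed to be the number of context entities, the small `eps` guarding against a zero distance is an added safeguard, and choosing the ambiguity with the highest F is assumed rather than stated on the slide.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def radar_score(ambiguity_vec, context, eps=1e-9):
    """F(am_i) = (1/K) * sum_j Sc(e_j) / distance(am_i, e_j).

    context: list of (embedding_vector, score) pairs, one per context entity.
    """
    if not context:
        return 0.0
    total = sum(score / (euclidean(ambiguity_vec, vec) + eps)
                for vec, score in context)
    return total / len(context)


# Hypothetical 2-D embeddings for the two "Traitor" candidates.
ambiguities = {
    "Q7833036": (1.0, 1.0),   # placed close to the context entities
    "Q21161161": (5.0, 5.0),  # placed far from the context entities
}
# Two hypothetical context entities, each with the score 0.016.
context = [((1.0, 2.0), 0.016), ((2.0, 1.0), 0.016)]

scores = {qid: radar_score(vec, context) for qid, vec in ambiguities.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setup the candidate placed near the context entities gets F ≈ 0.016, the distant one F ≈ 0.0032, so the nearby candidate is selected.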
  • 19. Experiment - Dataset
      • Limaye, T2D: Web tables
      • 2T_v2: synthetic tables
      • ShortTables: built from T2D (each table only contains two rows, to reduce context)
    | Gold standard | # Tables | Avg. # Rows | Avg. # Cols | # Entities | Ambiguities (t=1) | Ambiguities (t=0.9) |
    | Limaye | 437 | 37 | 2 | 5,143 | 181 (3.52%) | 685 (13.31%) |
    | T2D | 762 | 157 | 5 | 18,589 | 2,322 (12.49%) | 8,852 (47.62%) |
    | 2T_v2 | 180 | 1,080 | 5 | 661,297 | 30,686 (4.64%) | 86,739 (13.11%) |
    | ShortTables | 2,237 | 2 | 5 | 4,474 | 1,422 (31.78%) | 1,822 (40.72%) |
  • 20. Evaluation - Metrics
    $AP = \frac{\#\,\text{Correct candidates in the candidate set of ambiguities}}{\#\,\text{Ambiguities}}$
    $PA = \frac{\#\,\text{Correct ambiguity disambiguations}}{\#\,\text{Ambiguities}}$
    $GP = \frac{\#\,\text{Correct annotations}}{\#\,\text{Total labels}}$
      • AP: the quality of the ambiguity set
      • PA: the precision for Radar Station over the ambiguity set
      • GP: the global precision among all annotations with or without Radar Station
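The three metrics of slide 20 could be computed as in the following sketch. The data layout and function names are hypothetical, the identifiers `Q_A`, `Q_B` and `Q_C` are placeholders rather than real Wikidata items, returning 0.0 for an empty denominator is an assumption, and AP is read here as the share of ambiguous cells whose candidate set contains the gold entity.

```python
def ap(ambiguities, gold):
    """AP: ambiguous cells whose candidate set contains the gold entity."""
    if not ambiguities:
        return 0.0
    hits = sum(1 for cell, (candidates, _) in ambiguities.items()
               if gold[cell] in candidates)
    return hits / len(ambiguities)


def pa(ambiguities, gold):
    """PA: ambiguous cells whose selected entity is the gold entity."""
    if not ambiguities:
        return 0.0
    hits = sum(1 for cell, (_, chosen) in ambiguities.items()
               if gold[cell] == chosen)
    return hits / len(ambiguities)


def gp(predictions, gold):
    """GP: correct annotations over all gold labels."""
    if not gold:
        return 0.0
    hits = sum(1 for cell, qid in gold.items() if predictions.get(cell) == qid)
    return hits / len(gold)


# Cells are (row, column); each ambiguity maps to (candidate set, chosen entity).
gold = {(15, 1): "Q7833036", (16, 1): "Q5265233", (20, 1): "Q_C"}
ambiguities = {
    (15, 1): (["Q21161161", "Q7833036"], "Q7833036"),
    (20, 1): (["Q_A", "Q_B"], "Q_A"),  # gold entity missing from the set
}
predictions = {(15, 1): "Q7833036", (16, 1): "Q5265233", (20, 1): "Q_A"}

print(ap(ambiguities, gold), pa(ambiguities, gold), gp(predictions, gold))
```

Here AP and PA are both 1/2, because one of the two ambiguity sets misses the gold entity, and GP is 2/3 over the three labelled cells.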
  • 21. Evaluation - Improvements on all datasets (regardless of the embeddings type)
    Radar Station (RS) evaluation based on DAGOBAH SL scores, t = 0.95. AP is reported once per dataset: Limaye 0.614, T2D 0.332, 2T_v2 0.327, ShortTable 0.671.
    | Methods | Limaye PA | Limaye GP | T2D PA | T2D GP | 2T_v2 PA | 2T_v2 GP | ShortTable PA | ShortTable GP |
    | DAGOBAH SL | 0.296 | 0.853 | 0.180 | 0.785 | 0.208 | 0.870 | 0.302 | 0.654 |
    | RS+TransE | 0.528 | 0.872 | 0.312 | 0.815 | 0.230 | 0.872 | 0.414 | 0.673 |
    | RS+RotatE | 0.542 | 0.873 | 0.312 | 0.815 | 0.235 | 0.872 | 0.418 | 0.674 |
    | RS+DistMult | 0.377 | 0.860 | 0.230 | 0.797 | 0.213 | 0.870 | 0.328 | 0.659 |
    | RS+ComplEx | 0.435 | 0.864 | 0.233 | 0.798 | 0.219 | 0.870 | 0.334 | 0.660 |
  • 22. Evaluation - Improvements for all base input systems
    Radar Station evaluation on Web tables with DAGOBAH SL, MTab and BBW, with RotatE.
    | Dataset | System | t | AP | Original output PA | Original output GP | Radar Station PA | Radar Station GP |
    | Limaye | DAGOBAH SL | 0.9 | 0.653 | 0.432 | 0.853 | 0.578 (+0.146) | 0.873 (+0.020) |
    | Limaye | MTab | 0.83 | 0.820 | 0.705 | 0.857 | 0.787 (+0.082) | 0.875 (+0.018) |
    | Limaye | BBW | 0.65 | 0.587 | 0.359 | 0.563 | 0.507 (+0.148) | 0.597 (+0.034) |
    | T2D | DAGOBAH SL | 0.95 | 0.332 | 0.180 | 0.785 | 0.312 (+0.132) | 0.815 (+0.030) |
    | T2D | MTab | 0.71 | 0.385 | 0.295 | 0.837 | 0.346 (+0.051) | 0.857 (+0.020) |
    | T2D | BBW | 0.65 | 0.263 | 0.192 | 0.364 | 0.253 (+0.061) | 0.382 (+0.018) |
  • 23. Evaluation - More improvements on Web tables than on synthetic tables
    Radar Station improves GP more on Web tables (max +3%) than on synthetic tables (max +0.2%): synthetic tables lack common themes. The figures are the DAGOBAH SL, t = 0.95 results of slide 21, grouped into Web tables (Limaye, T2D) and synthetic tables (2T_v2). Best GP per dataset, original vs. Radar Station: Limaye 0.853 vs. 0.873, T2D 0.785 vs. 0.815, 2T_v2 0.870 vs. 0.872.
  • 24. Evaluation - No specific improvement under simulated extreme conditions
    The contribution of Radar Station remains small on T2D and ShortTable (max +3% GP), which combine:
      • More ambiguities (+)
      • Less context (-)
    The figures are the DAGOBAH SL, t = 0.95 results of slide 21 for T2D and ShortTable. Best GP per dataset, original vs. Radar Station: T2D 0.785 vs. 0.815, ShortTable 0.654 vs. 0.674.
  • 25. Evaluation - Translational distance models are better
      • The results are similar for embeddings from the same family
      • Translational distance models are better than semantic matching models
    Radar Station evaluation based on DAGOBAH SL scores on Limaye, t = 0.95, AP = 0.614.
    | Class | System | PA | GP |
    | - | DAGOBAH SL | 0.296 | 0.853 |
    | Translational Distance | RS+TransE | 0.528 | 0.872 |
    | Translational Distance | RS+RotatE | 0.542 | 0.873 |
    | Semantic Matching | RS+DistMult | 0.377 | 0.860 |
    | Semantic Matching | RS+ComplEx | 0.435 | 0.864 |
    [Figure: illustration of the Kappa test between different outputs, t = 0.95.]
  • 26. Conclusion & Future Work
      • Radar Station is a useful plug-in module for improving cell annotations!
          • GitHub: https://github.com/Orange-OpenSource/radar-station
          • Data and Models: https://zenodo.org/record/6522985 & https://zenodo.org/record/6522921
          • Slides: https://tinyurl.com/radar-station-iswc2022
      • Future Work:
          • Handle additional tables (beyond relational tables)
          • Handle additional context (e.g. table caption, text surrounding the table, etc.)
          • Downstream tasks (e.g., schemas augmentation, data imputation)