Radar Station is a cell-entity disambiguation plugin for semantic table interpretation (STI) systems. It uses graph embeddings to enrich the table context so that highly ambiguous cell entities are annotated more accurately.
Radar Station - ISWC 2022.pdf
1. Radar Station: Using KG Embeddings for Semantic Table Interpretation and Entity Disambiguation
Jixiong Liu, Viet-Phi Huynh, Yoan Chabot, Raphaël Troncy
Radar Station, ISWC 2022, 26 October 2022
2. Context & Motivation
What does this table mean? Can a machine automatically interpret it?
… … …
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
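3. Semantic Table Interpretation using Knowledge Graphs
… … …
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
Figure (Wikidata knowledge graph excerpt linked to the table):
• Traitor (Q7833036), part of the series (P179): The New Jedi Order (Q2743959)
• Traitor (Q7833036), author (P50): Matthew Stover (Q1909623)
• Traitor (Q7833036), publication date (P577): 30 July 2002
• Ylesia (Q8053998), part of the series (P179): The New Jedi Order (Q2743959)
• Ylesia (Q8053998), author (P50): Walter Jon Williams (Q714485)
• Ylesia (Q8053998), publication date (P577): 3 September 2002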
4. Semantic Table Interpretation – Up to Five Tasks
… … …
2002 Enemy Lines: Rebel Dream
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
• Column-Type Annotation (CTA)
• Columns-Predicate Annotation (CPA)
• Cell-Entity Annotation (CEA) - Our focus
• Table Topic Annotation
• Row-to-Instance
Traitor (literary work):
Q7833036 on Wikidata
author: Matthew Stover
part of the series: The New Jedi Order
publisher: Del Rey Books
publication date: 30 July 2002
media franchise: Star Wars …
Traitor (literary work):
Q21161161 on Wikidata
author: Stephen Daisley
country of origin: Australia
publication date: 2010
language of work or name: English …
5. Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches:
  • Rely on features (e.g. relevance score) provided by a lookup service
  • E.g., ADOG [1], BBW [2]
$$\mathit{sim} = 1 - \frac{\mathrm{LevenshteinDistance}(s_1, s_2)}{\max(\mathrm{length}(s_1), \mathrm{length}(s_2))}$$
$s_1$: Label of the entity from the KG (string)
$s_2$: Mention from the table cell (string)
[1] Oliveira, D., d’Aquin, M.: ADOG - Annotating Data with Ontologies and Graphs. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2019)
[2] Shigapov, R., Zumstein, P., Kamlah, J., Oberländer, L., Mechnich, J., Schumm, I.: bbw: Matching CSV to Wikidata via Meta-lookup. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
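The similarity formula on this slide can be sketched in a few lines of Python. This is only an illustration of the formula, not code from ADOG or BBW; the function names `levenshtein_distance` and `similarity` are hypothetical, and the convention that two empty strings have similarity 1.0 is an assumption.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Edit distance between s1 and s2 (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so the row buffers stay small.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (c1 != c2)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(s1: str, s2: str) -> float:
    """sim = 1 - LevenshteinDistance(s1, s2) / max(length(s1), length(s2))."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - levenshtein_distance(s1, s2) / longest


# s1 is a KG label, s2 is the mention from the table cell.
exact = similarity("Traitor", "Traitor")
near = similarity("Traitor", "Traitors")
```

Both Wikidata candidates for the cell "Traitor" carry the label "Traitor", so each gets a similarity of 1.0 under this formula. A purely string-based score therefore cannot separate them.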
6. Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches
• Iterative Disambiguation:
  • Use the results of the CEA, CTA, CPA annotation tasks, in order to mutually reinforce the compatibility between annotations
  • Main shortcomings:
    • Error propagation
    • Background knowledge hidden in the table is not used (e.g. all books belong to a series)
  • E.g., DAGOBAH [3], MTab [4]
[3] Huynh, V.P., Liu, J., Chabot, Y., Deuzé, F., Labbé, T., Monnin, P., Troncy, R.: DAGOBAH: Table and Graph Contexts for Efficient Semantic Annotation of Tabular Data. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2021)
[4] Nguyen, P., Yamada, I., Kertkeidkachorn, N., Ichise, R., Takeda, H.: MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) (2020)
7. Interne Orange
Semantic Table Interpretation - Related Work
• Heuristic-Based Approaches
• Iterative Disambiguation
• Usage of Graph Embeddings:
• Use pre-trained graph embeddings for
augmenting information about entities
• Main shortcoming: the embeddings
quality depends on the density of the
graph
• E.g., Vasilis et al [5], DAGOBAH-
Embeddings [6]
[5] Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings. In: 16th International Semantic Web Conference (ISWC). pp. 260–277. Springer (2017)
[6] Chabot, Y., Labbé, T., Liu, J., Troncy, R.: DAGOBAH: An End-to-End Context-Free Tabular Data Semantic Annotation System. In: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). pp. 41–48 (2019)
8. Our Approach: Radar Station
[Figure: pipeline overview. Components: Tables, Annotation System (produces Candidate Scores), Ambiguity Detection, Context Entities Selection (produces Ambiguities & Context), KG Embeddings, Radar Station Disambiguation, Output.]
• Ambiguity Detection: detect potential errors caused by error propagation
• Context Entities Selection: capture more semantic similarities (from the embeddings)
• Radar Station Disambiguation: disambiguate by hybridizing entity scores and embedding distances
Radar Station is a plug-in module for an existing STI system (typically one using iterative disambiguation) that benefits from pre-trained embeddings as data augmentation.
9. Radar Station
… … …
2002 Enemy Lines: Rebel Dream
2002 Enemy Lines: Rebel Stand
2002 Traitor
2002 Destiny’s Way
2002 Ylesia E-book
Why is this table difficult to interpret?
• The table lacks context, e.g., for the target column (book titles), who are the authors?
• The information about the book series (Star Wars) is not present in the table
• Matching “2002” with “30 July 2002” is not trivial
Traitor (Literary work):
Q7833036 on Wikidata
author: Matthew Stover
part of the series: The New Jedi Order
publisher: Del Rey Books
publication date: 30 July 2002
media franchise: Star Wars …
Traitor (Literary Work):
Q21161161 on Wikidata
author: Stephen Daisley
country of origin: Australia
publication date: 2010
language of work or name: English …
10. DAGOBAH SL results: 2 candidates with an equal score
MTab results: the correct candidate is ranked 4th
BBW results: no output for this cell
We use DAGOBAH SL as the input system for the illustrations in this presentation
DAGOBAH SL scores:
{'id': 'Q21161161', 'score': 0.01600}, (Traitor - literary work)
{'id': 'Q7833036', 'score': 0.01600}, (Traitor - literary work)
{'id': 'Q1536329', 'score': 0.01164}, (Traitor - film)
…
MTab scores:
{'id': 'Q2435622', 'score': 0.02546}, (Traitor - television series episode)
{'id': 'Q16746183', 'score': 0.02545}, (Traitor - television series episode)
{'id': 'Q7833042', 'score': 0.024468}, (Traitor - fictional character)
{'id': 'Q7833036', 'score': 0.024467}, (Traitor - literary work)
…
11. Radar Station
We aim to detect the cell annotations that need to be disambiguated
We set a tolerance t to select the top candidates
Example:
• If t = 1, Q21161161 and Q7833036 are top candidates
12. We aim to detect the cell annotations that need to be disambiguated:
We set a tolerance t to select the top candidates
Example:
• If t = 1, Q21161161 and Q7833036 are the top candidates
• If t = 0.7, Q1536329 is also considered among the top candidates (0.01164 > 0.01600 × 0.7 = 0.01120)
Top candidates are ambiguities that we need to disambiguate
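The tolerance rule can be sketched as follows, using the DAGOBAH SL scores shown for the "Traitor" and "Destiny’s Way" cells. This is an illustration under assumptions: the function names are hypothetical, and the comparison is taken to be `score >= t * best_score` so that tied candidates are kept when t = 1.

```python
def select_top_candidates(candidates, t):
    """Keep candidates whose score reaches t times the best score.

    Assumption: the comparison is `score >= t * best`, so that candidates
    tied with the best one are kept when t = 1.
    """
    if not candidates:
        return []
    best = max(c["score"] for c in candidates)
    return [c for c in candidates if c["score"] >= t * best]


def is_ambiguous(candidates, t):
    """A cell needs disambiguation when more than one top candidate remains."""
    return len(select_top_candidates(candidates, t)) > 1


# Scores copied from the slides.
traitor_cell = [
    {"id": "Q21161161", "score": 0.01600},
    {"id": "Q7833036", "score": 0.01600},
    {"id": "Q1536329", "score": 0.01164},
]
destiny_cell = [
    {"id": "Q5265233", "score": 0.01600},
    {"id": "Q60172766", "score": 0.0102},
    {"id": "Q17010392", "score": 0.0102},
]

top_strict = [c["id"] for c in select_top_candidates(traitor_cell, t=1)]
top_loose = [c["id"] for c in select_top_candidates(traitor_cell, t=0.7)]
```

Lowering t widens the ambiguity set: it raises the chance that the correct entity is among the top candidates, at the cost of more candidates to disambiguate.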
13. Radar Station
If the set of top candidates contains a single candidate (e.g., row “Destiny’s Way” with t = 1), we output that entity directly, without running Radar Station.
DAGOBAH SL scores:
…
{'id': 'Q5265233', 'score': 0.01600}, (Destiny’s Way - literary work)
{'id': 'Q60172766', 'score': 0.0102}, (Destiny - literary work)
{'id': 'Q17010392', 'score': 0.0102}, (Destiny - literary work)
… ]},
…
14. Radar Station
We aim to build a column-wise representation of the table context from the candidates of the same column.
For a given t (e.g., t = 1), collect all top candidates of the column, together with their scores, as the context entities.
…
{'row': 15, 'column': 1,
 'Annotations': [
  {'id': 'Q21161161', 'score': 0.01600}, (Traitor - literary work)
  {'id': 'Q7833036', 'score': 0.01600}, (Traitor - literary work)
  {'id': 'Q1536329', 'score': 0.01164}, (Traitor - film) … ]},
{'row': 16, 'column': 1,
 'Annotations': [
  {'id': 'Q5265233', 'score': 0.01600}, (Destiny’s Way - literary work)
  {'id': 'Q60172766', 'score': 0.0102}, (Destiny - literary work)
  {'id': 'Q17010392', 'score': 0.0102}, (Destiny - literary work) … ]},
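A minimal sketch of the context selection step, using the two cells listed above. The function names and the dictionary layout are assumptions made for illustration. Excluding the target cell itself follows the description of a context entity as a candidate for another cell of the same column.

```python
def top_candidates(annotations, t):
    """Candidates whose score reaches t times the best score (assumed rule)."""
    if not annotations:
        return []
    best = max(a["score"] for a in annotations)
    return [a for a in annotations if a["score"] >= t * best]


def select_context_entities(cells, target_row, column, t):
    """Collect the top candidates of all other cells in the same column."""
    context = []
    for cell in cells:
        if cell["column"] != column or cell["row"] == target_row:
            continue  # skip other columns and the cell being disambiguated
        context.extend(top_candidates(cell["annotations"], t))
    return context


# The two cells shown on the slide (scores from DAGOBAH SL).
column_cells = [
    {"row": 15, "column": 1, "annotations": [
        {"id": "Q21161161", "score": 0.01600},
        {"id": "Q7833036", "score": 0.01600},
        {"id": "Q1536329", "score": 0.01164},
    ]},
    {"row": 16, "column": 1, "annotations": [
        {"id": "Q5265233", "score": 0.01600},
        {"id": "Q60172766", "score": 0.0102},
        {"id": "Q17010392", "score": 0.0102},
    ]},
]

context_for_traitor = select_context_entities(column_cells, target_row=15, column=1, t=1)
context_for_destiny = select_context_entities(column_cells, target_row=16, column=1, t=1)
```

With t = 1, the context of the "Traitor" cell contains only Q5265233 from row 16, while both "Traitor" candidates enter the context of the "Destiny’s Way" cell.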
15. Radar Station - Intuition
[Figure: a radar receiver and a sender separated by a distance; the sender emits a signal with an initial power.]
• Distance ↔ embeddings distance
• Signal power ↔ scoring from a previous annotation system
16. Radar Station - Intuition
[Figure: entities plotted in the embedding space. Context entities: Star by Star, Enemy Lines: Rebel Stand, Enemy Lines: Rebel Dream, Destiny’s Way, Ylesia, Dark Journey. Ambiguities: Traitor (Q7833036) and Traitor (Q21161161).]
17. Experiment - Embeddings
Leverage PyTorch-BigGraph [7] for training embeddings
Experiment with:
• 2 translational distance models: TransE, RotatE (GraphVite [8] pre-trained embeddings)
• 2 semantic matching models: DistMult, ComplEx
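To make the two model families concrete, the sketch below writes out one common formulation of each scoring function in plain Python. It does not use the PyTorch-BigGraph or GraphVite APIs; the function names are hypothetical, and the choice of norm (L2 for TransE, sum of moduli for RotatE) is an assumption, since variants exist. In every case a higher score means a more plausible triple (h, r, t).

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE (translational): negative L2 norm of h + r - t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rotate_score(h, phases, t):
    """RotatE (translational): the relation rotates each complex coordinate of h
    by a phase angle; the score is the negative sum of the moduli of h*r - t."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


def distmult_score(h, r, t):
    """DistMult (semantic matching): trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx (semantic matching): real part of sum_i h_i * r_i * conj(t_i)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real
```

Translational models score a triple by a distance in the embedding space, whereas semantic matching models score it by a similarity product.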
[7] Lerer, A., Wu, L., Shen, J., Lacroix, T., Wehrstedt, L., Bose, A., Peysakhovich, A.: PyTorch-BigGraph: A Large-scale Graph Embedding System. In: Conference on Machine Learning and Systems (MLSys). vol. 1, pp. 120–131 (2019)
[8] Zhu, Z., Xu, S., Tang, J., Qu, M.: GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding. In: The World Wide Web Conference (WWW). pp. 2494–2504 (2019)
18. Radar Station
Using the table context to disambiguate ambiguities
• $am_i$: an ambiguity (one of the top candidates previously selected)
• $e_j$: a context entity, i.e., a candidate entity for another cell from the same column
• $Sc(e_j)$: score of the context entity $e_j$
Table context:
…
{'row': 15, 'column': 1,
 'Annotations': [
  {'id': 'Q21161161', 'score': 0.01600},
  {'id': 'Q7833036', 'score': 0.01600} ]},
{'row': 16, 'column': 1,
 'Annotations': [
  {'id': 'Q5265233', 'score': 0.01600},
  {'id': 'Q60172766', 'score': 0.0102},
  {'id': 'Q17010392', 'score': 0.0102}, … ]},
…
[Figure: embedding-space plot showing the ambiguities (one correct candidate, one incorrect candidate) and the context entities.]
$$F(am_i) = \frac{1}{K} \sum_{j<K} \frac{Sc(e_j)}{distance(am_i, e_j)}$$
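The scoring function F can be sketched as below. Several points are assumptions made for illustration: K is taken to be the number of context entities, the distance is Euclidean, a small epsilon guards against a zero distance, and the 2-D vectors are made-up toy coordinates rather than real embeddings of these Wikidata items.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def radar_score(ambiguity_vec, context, eps=1e-12):
    """F(am_i) = (1/K) * sum_j Sc(e_j) / distance(am_i, e_j).

    `context` is a list of (score, vector) pairs; K = len(context) (assumption).
    """
    if not context:
        return 0.0
    total = sum(score / max(euclidean(ambiguity_vec, vec), eps)
                for score, vec in context)
    return total / len(context)


# Toy 2-D coordinates, invented for this example only.
ambiguities = {
    "Q7833036": (0.0, 0.0),
    "Q21161161": (3.0, 4.0),
}
context = [
    (0.016, (0.0, 1.0)),  # a context entity with score 0.016
    (0.016, (1.0, 0.0)),  # another context entity with score 0.016
]

scores = {qid: radar_score(vec, context) for qid, vec in ambiguities.items()}
best = max(scores, key=scores.get)
```

In this toy setting the candidate lying closer to the high-scoring context entities receives the larger F and is selected.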
20. Evaluation - Metrics
$$AP = \frac{\#\ \text{correct candidates in the candidate set of ambiguities}}{\#\ \text{ambiguities}}$$
$$PA = \frac{\#\ \text{correct ambiguity disambiguations}}{\#\ \text{ambiguities}}$$
$$GP = \frac{\#\ \text{correct annotations}}{\#\ \text{total labels}}$$
AP: the quality of the ambiguity set
PA: the precision for Radar Station over the ambiguity set
GP: the global precision among all annotations with or without Radar Station
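One possible reading of the three metrics, sketched in Python. The record layout (`gold`, `top_candidates`, `prediction`) and the toy identifiers are hypothetical. A cell is treated as an ambiguity when it has more than one top candidate, and AP is read as the share of ambiguities whose top-candidate set contains the gold entity; both are assumptions.

```python
def evaluate(cells):
    """Compute AP, PA and GP over a list of annotated cells."""
    ambiguous = [c for c in cells if len(c["top_candidates"]) > 1]
    n_amb = len(ambiguous)
    # AP: how often the gold entity is reachable inside the ambiguity set.
    ap = sum(c["gold"] in c["top_candidates"] for c in ambiguous) / n_amb if n_amb else 0.0
    # PA: how often the final choice is right, over ambiguous cells only.
    pa = sum(c["prediction"] == c["gold"] for c in ambiguous) / n_amb if n_amb else 0.0
    # GP: how often the final choice is right, over all labeled cells.
    gp = sum(c["prediction"] == c["gold"] for c in cells) / len(cells) if cells else 0.0
    return {"AP": ap, "PA": pa, "GP": gp}


# Toy records with placeholder identifiers (not real evaluation data).
toy_cells = [
    {"gold": "A1", "top_candidates": ["A1", "A2"], "prediction": "A1"},
    {"gold": "B1", "top_candidates": ["B2", "B3"], "prediction": "B2"},
    {"gold": "C1", "top_candidates": ["C1"], "prediction": "C1"},
    {"gold": "D1", "top_candidates": ["D2"], "prediction": "D2"},
    {"gold": "E1", "top_candidates": ["E1"], "prediction": "E1"},
]

metrics = evaluate(toy_cells)
```

Under this reading PA can never exceed AP, because an ambiguity can only be resolved correctly if the gold entity is in its candidate set.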
21. Evaluation - Improvements on all datasets (regardless of the embeddings type)
AP (shared by all rows): Limaye 0.614, T2D 0.332, 2T_v2 0.327, ShortTable 0.671
Methods       Limaye         T2D            2T_v2          ShortTable
              PA     GP      PA     GP      PA     GP      PA     GP
DAGOBAH SL    0.296  0.853   0.180  0.785   0.208  0.870   0.302  0.654
RS+TransE     0.528  0.872   0.312  0.815   0.230  0.872   0.414  0.673
RS+RotatE     0.542  0.873   0.312  0.815   0.235  0.872   0.418  0.674
RS+DistMult   0.377  0.860   0.230  0.797   0.213  0.870   0.328  0.659
RS+ComplEx    0.435  0.864   0.233  0.798   0.219  0.870   0.334  0.660
Radar Station evaluation based on DAGOBAH SL scores, t = 0.95.
22. Evaluation - Improvements for all base input systems
Dataset  System      t     AP     Original PA  Original GP  Radar Station PA  Radar Station GP
Limaye   DAGOBAH SL  0.9   0.653  0.432        0.853        0.578 (+0.146)    0.873 (+0.020)
Limaye   MTab        0.83  0.820  0.705        0.857        0.787 (+0.082)    0.875 (+0.018)
Limaye   BBW         0.65  0.587  0.359        0.563        0.507 (+0.148)    0.597 (+0.034)
T2D      DAGOBAH SL  0.95  0.332  0.180        0.785        0.312 (+0.132)    0.815 (+0.030)
T2D      MTab        0.71  0.385  0.295        0.837        0.346 (+0.051)    0.857 (+0.020)
T2D      BBW         0.65  0.263  0.192        0.364        0.253 (+0.061)    0.382 (+0.018)
Radar Station evaluation on Web tables with DAGOBAH SL, MTab and BBW, with RotatE
23. Evaluation - More improvements on Web tables than on synthetic tables
More improvement in GP on Web tables (max +3%) than on synthetic tables (max +0.2%)
• Synthetic tables lack common themes.
AP (shared by all rows): Limaye 0.614, T2D 0.332, 2T_v2 0.327
              Web Tables                    Synthetic Tables
Methods       Limaye         T2D            2T_v2
              PA     GP      PA     GP      PA     GP
DAGOBAH SL    0.296  0.853   0.180  0.785   0.208  0.870
RS+TransE     0.528  0.872   0.312  0.815   0.230  0.872
RS+RotatE     0.542  0.873   0.312  0.815   0.235  0.872
RS+DistMult   0.377  0.860   0.230  0.797   0.213  0.870
RS+ComplEx    0.435  0.864   0.233  0.798   0.219  0.870
Radar Station evaluation based on DAGOBAH SL scores, t = 0.95.
24. Evaluation - No specific improvement under simulated extreme conditions
The contribution of Radar Station is minimal on both T2D and ShortTable (max +3%)
• More ambiguities (+)
• Less context (-)
AP (shared by all rows): T2D 0.332, ShortTable 0.671
Methods       T2D            ShortTable
              PA     GP      PA     GP
DAGOBAH SL    0.180  0.785   0.302  0.654
RS+TransE     0.312  0.815   0.414  0.673
RS+RotatE     0.312  0.815   0.418  0.674
RS+DistMult   0.230  0.797   0.328  0.659
RS+ComplEx    0.233  0.798   0.334  0.660
Radar Station evaluation based on DAGOBAH SL scores, t = 0.95.
25. Evaluation - Translational distance models are better
• The results are similar for embeddings from the same family
• Translational distance models are better than semantic matching models
Limaye, t = 0.95, AP = 0.614 (shared by all rows)
Class                    System        PA     GP
-                        DAGOBAH SL    0.296  0.853
Translational Distance   RS+TransE     0.528  0.872
Translational Distance   RS+RotatE     0.542  0.873
Semantic Matching        RS+DistMult   0.377  0.860
Semantic Matching        RS+ComplEx    0.435  0.864
Radar Station evaluation based on DAGOBAH SL scores.
[Figure: illustration of the Kappa test between different outputs, t = 0.95.]
26. Conclusion & Future Work
▪ Radar Station is a useful plug-in module for improving cell annotations!
GitHub: https://github.com/Orange-OpenSource/radar-station
Data and models: https://zenodo.org/record/6522985 and https://zenodo.org/record/6522921
Slides: https://tinyurl.com/radar-station-iswc2022
▪ Future Work:
▪ Handle additional tables (beyond relational tables)
▪ Handle additional context (e.g. table caption, text surrounding the table, etc.)
▪ Downstream tasks (e.g., schema augmentation, data imputation)