Combining Ontology Matchers via Anomaly Detection

In ontology alignment, there is no single best-performing matching algorithm for every matching problem. Thus, most modern matching systems combine several base matchers and aggregate their results into a final alignment. This combination is often based on simple voting or averaging, or uses existing matching problems to learn a combination policy in a supervised setting. In this paper, we present the COMMAND matching system, an unsupervised method for combining base matchers, which uses anomaly detection to produce an alignment from the results delivered by several base matchers. The basic idea of our approach is that in a large set of potential mapping candidates, the scarce actual mappings should be visible as anomalies against the majority of non-mappings. The approach is evaluated on different OAEI datasets and shows performance competitive with state-of-the-art systems.

Combining Ontology Matchers
via Anomaly Detection
Alexander C. Müller and Heiko Paulheim

10/13/15 Alexander C. Müller, Heiko Paulheim 2
Motivation
• Most high-performing matching systems use multiple matchers
• How to combine multiple matchers into a single result?
• Common approaches (a selection)
– average, maximum, minimum matching score
– voting
– expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)
– supervised learning
• Proposal:
– use anomaly detection as an unsupervised aggregation method
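
The common approaches listed above can be illustrated with a minimal Python sketch for a single candidate pair. The three scores and the voting threshold of 0.5 are made-up illustration values; only the weights 0.4/0.3/0.3 come from the slide.

```python
from statistics import mean

# Hypothetical scores from three base matchers (m1, m2, m3) for one candidate pair.
scores = [0.9, 0.6, 0.3]

def vote(scores, threshold=0.5):
    """Majority vote: accept if more than half of the matchers reach the threshold."""
    return sum(s >= threshold for s in scores) > len(scores) / 2

def weighted(scores, weights=(0.4, 0.3, 0.3)):
    """Expert-modeled weighted sum, using the example weights from the slide."""
    return sum(w * s for w, s in zip(weights, scores))

aggregates = {
    "average": mean(scores),
    "maximum": max(scores),
    "minimum": min(scores),
    "vote": vote(scores),
    "weighted": weighted(scores),
}
```

All of these need either a hand-picked rule or hand-picked weights, which is the gap the unsupervised proposal aims at.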

Idea
• Common definitions of anomaly/outlier detection:
– Outlier or anomaly detection methods are used to find instances “that appear to deviate markedly from other members of the same sample”, i.e.,
– “that appear to be inconsistent with the remainder of the data”
• Rationale:
– for two ontologies with n and m concepts, there are n × m candidates
– the majority are non-matches
– the actual matches are a minority (that differ markedly from the rest)
– so, we should be able to identify them as outliers

Outlier Detection in a Nutshell
• Given a set of instances as feature vectors
– outlier detection assigns an outlier score to each instance
– higher outlier scores ↔ higher degree of outlierness
• Common approaches
– distance based
– density based
– clustering based
– model based
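
As one possible instance of the distance-based family, the sketch below scores each instance by its mean distance to its k nearest neighbours. This is an illustrative choice, not necessarily the detector used in COMMAND; the function name, the toy points, and k=2 are assumptions.

```python
import math

def knn_outlier_scores(vectors, k=2):
    """Score each vector by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy data: four points close together and one far away from them.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.9)]
scores = knn_outlier_scores(points, k=2)

# Index of the instance with the highest outlier score.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

The isolated point receives the highest score, matching the reading that higher scores mean a higher degree of outlierness.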

Aggregating Matchers via Anomaly Detection
• We run a set of base matchers
• Each base matcher score becomes a numerical feature
• Thus, our feature vectors consist of individual matching scores
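
A minimal sketch of how such feature vectors could be built. The three base matchers and the concept labels below are hypothetical stand-ins chosen for illustration; they are not the matchers used in COMMAND.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical base matchers; each maps a pair of labels to a score in [0, 1].
def exact(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0

def string_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

BASE_MATCHERS = [exact, string_similarity, token_jaccard]

def feature_vectors(concepts_a, concepts_b):
    """Return {(a, b): [score per base matcher]} for all n x m candidate pairs."""
    return {
        (a, b): [m(a, b) for m in BASE_MATCHERS]
        for a, b in product(concepts_a, concepts_b)
    }

# Made-up concept labels for two tiny ontologies (n = 3, m = 4).
onto_a = ["Paper", "Author", "Conference Chair"]
onto_b = ["Article", "Author", "Chair", "Review"]
vectors = feature_vectors(onto_a, onto_b)
```

With n = 3 and m = 4 this yields 12 candidate vectors, each with one dimension per base matcher.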

Aggregating Matchers via Anomaly Detection
• Example from the conference dataset
– note: reduced to two dimensions!

COMMAND: Full Pipeline
• Run set of element-based matchers
– find non-correlated subset
• Run set of structure-based matchers on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Perform optional repair step
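
The "removing correlated matchers" step might be sketched as a greedy filter on pairwise Pearson correlation. The 0.9 threshold, the greedy order, and the matcher names and scores are assumptions for illustration; the slides do not state how the non-correlated subset is chosen, and the PCA step is not shown here.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if |r| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical score columns: one value per candidate pair, one column per matcher.
columns = {
    "edit_distance": [0.9, 0.2, 0.1, 0.4],
    "jaro":          [0.8, 0.3, 0.2, 0.5],  # strongly correlated with edit_distance
    "token_overlap": [0.1, 0.9, 0.0, 0.2],
}
selected = drop_correlated(columns)
```

Here the second column adds little information beyond the first and is dropped, while the weakly correlated third column is kept.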

COMMAND: Full Pipeline
• Run set of element-based matchers (28 different ones)
– find non-correlated subset
• Run set of structure-based matchers (five different ones) on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Normalize outlier scores
• Select mapping candidates
• Perform optional repair step
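
The two steps "normalize outlier scores" and "select mapping candidates" could look like the following sketch. Min-max normalization and a fixed threshold of 0.5 are assumptions made here for illustration; the slides do not say which normalization or selection rule COMMAND uses, and the raw scores are made up.

```python
def min_max_normalize(scores):
    """Rescale outlier scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def select_candidates(pairs, scores, threshold=0.5):
    """Keep pairs whose normalized outlier score reaches the threshold."""
    normalized = min_max_normalize(scores)
    return [(p, n) for p, n in zip(pairs, normalized) if n >= threshold]

# Hypothetical raw outlier scores for four candidate pairs.
pairs = [
    ("Paper", "Article"),
    ("Paper", "Review"),
    ("Author", "Author"),
    ("Author", "Chair"),
]
raw = [2.0, 0.5, 3.5, 0.5]
alignment = select_candidates(pairs, raw)
```

After normalization, the outlier score can be read like a confidence value, so the selection step resembles thresholding in a conventional matcher.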

COMMAND: Results
• Good results on biblio benchmark dataset
– up to 67% F-measure
• Median results on conference
– up to 68% F-measure
• Difficulties on anatomy dataset
– only a subset of matchers could be run for scalability reasons
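
For reference, the F-measure reported above is the harmonic mean of precision and recall against a reference alignment. The sketch below computes it for made-up alignments; the pairs are illustrative and unrelated to the OAEI datasets.

```python
def f_measure(alignment, reference):
    """Return (precision, recall, F1) of an alignment against a reference alignment."""
    alignment, reference = set(alignment), set(reference)
    true_positives = len(alignment & reference)
    precision = true_positives / len(alignment) if alignment else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical system output and reference alignment.
found = {("Paper", "Article"), ("Author", "Author"), ("Chair", "Review")}
gold = {("Paper", "Article"), ("Author", "Author"), ("Chair", "Chair"), ("Topic", "Subject")}
precision, recall, f1 = f_measure(found, gold)
```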

Discussion and Conclusion
• Proof of Concept
– Anomaly detection is suitable for matcher aggregation
– non-trivial combination of matcher scores (PCA, outlier score)
– automatic selection of a suitable subset of matchers
• Future work
– address scalability issues
– try more anomaly detection approaches
