1 
INSTANCE MATCHING API 
Ilias Tzortzakakis - Evangelia Daskalaki - Martin Doerr 
Foundation for Research and Technology - Hellas (FORTH), 
Institute of Computer Science (ICS)
2 
Instance Matching 
Instance matching for Linked Data is the process 
of comparing different particulars with the goal 
of recognizing the same real-world entity. 
RDF Data Source A 
Appellation: Thutmose 
Profession: Ruler 
Birth Timespan: 1505-1504 BC 

Same particular? 

RDF Data Source B 
Appellation: Thutmose 
Profession: Sculptor 
Birth Timespan: 1360-1350 BC
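
As an illustration of the comparison on this slide, the following minimal Python sketch checks the two Thutmose records field by field. The `Particular` class, the `conflicts` helper and the convention of encoding BC years as negative integers are assumptions made for this example; they are not taken from IMAPI.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Particular:
    """One described entity from an RDF source (hypothetical structure)."""
    appellation: str
    profession: str
    birth_span: Tuple[int, int]  # (earliest, latest) year; BC years are negative


def spans_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Two closed intervals overlap if the later start is not after the earlier end."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def conflicts(x: Particular, y: Particular) -> List[str]:
    """Return the names of the fields on which the two particulars disagree."""
    found = []
    if x.appellation != y.appellation:
        found.append("appellation")
    if x.profession != y.profession:
        found.append("profession")
    if not spans_overlap(x.birth_span, y.birth_span):
        found.append("birth_span")
    return found


source_a = Particular("Thutmose", "Ruler", (-1505, -1504))
source_b = Particular("Thutmose", "Sculptor", (-1360, -1350))

# Same appellation, but profession and birth time-span disagree.
print(conflicts(source_a, source_b))
```

A shared appellation alone is therefore not enough: the disagreeing profession and the non-overlapping birth time-spans suggest two different real-world entities.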
3 
IMAPI - Instance Matching API 
• IMAPI is a Java API, developed in Java 7, that computes the similarity of instance pairs on user-specified fields between “Source Data” and “Target Data”. 
• Source Data are always a set of RDF files, while Target Data may be another set of RDF files or an online database. Currently the British Museum Collection Database (http://www.britishmuseum.org/research/collection_online/search.aspx) and the CLAROS database (http://www.clarosnet.org/) are supported. 
• Available under an open source software license at https://github.com/isl/IMAPI
4 
IMAPI matching Persons via Man-Made Objects
5 
IMAPI Person Match Example 
British Museum 
CLAROS
6 
IMAPI Process 
Inputs 
• Source: CIDOC RDF files 
• Target: CIDOC RDF files / online triple store 
• User Configuration File 

IM API 
• Clustering Data: clustering data to match them sequentially 
• Calculating Similarities: calculating similarities by using a threshold and weighted averages according to the User Configuration File 

Result Set 
• Matched instances + matching justifications
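
The process on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the IMAPI implementation: the slides do not say how clustering is done, so grouping by the first letter of a name (`first_letter`) is an assumption, as are the function names and the toy character-set similarity (`char_jaccard`).

```python
from collections import defaultdict
from itertools import product


def cluster(instances, key):
    """Group instances by a key so that only instances in the same group are compared."""
    groups = defaultdict(list)
    for inst in instances:
        groups[key(inst)].append(inst)
    return groups


def match(source, target, key, similarity, threshold):
    """Compare source and target cluster by cluster; keep pairs scoring at or above the threshold."""
    src_groups = cluster(source, key)
    tgt_groups = cluster(target, key)
    results = []
    # Only keys present on both sides can produce candidate pairs.
    for k in sorted(src_groups.keys() & tgt_groups.keys()):
        for s, t in product(src_groups[k], tgt_groups[k]):
            score = similarity(s, t)
            if score >= threshold:
                results.append((s, t, score))
    return results


def first_letter(name):
    """Hypothetical clustering key: the lower-cased first character."""
    return name[0].lower()


def char_jaccard(a, b):
    """Toy similarity: Jaccard overlap of the character sets of two names."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb)


source = ["Thutmose", "Amenhotep"]
target = ["Thutmosis", "Ramesses", "Tuthmose"]

for s, t, score in match(source, target, first_letter, char_jaccard, threshold=0.7):
    print(s, t, round(score, 2))
```

Clustering keeps the number of comparisons down: "Amenhotep" and "Ramesses" are never compared with anything here, because no instance on the other side shares their key.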
7 
IMAPI User Configuration File 
Define the source and the target CIDOC-CRM data that will be matched together 
Customize a set of weighted paths that are used in the matching process 

User Configuration File Example 
1) Search for E21_Actors 
2) Return all their rdfs:labels (w=0.5) 
3) Get literals connected to Actors via their P3_has_note predicate (w=0.4) 
4) Get URIs connected to Actors via their P131_is_identified predicate (w=0.9) 
5) … 
6) …. 
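
To make the role of the weights concrete, here is a minimal Python sketch of a weighted average over the three paths listed above. The slides do not show the actual configuration file syntax, so the `WEIGHTED_PATHS` structure is hypothetical, and skipping paths that have no value on either side is an assumption of this sketch rather than documented IMAPI behaviour.

```python
# Hypothetical in-memory form of the weighted paths from the example;
# only the path names and weights come from the slide.
WEIGHTED_PATHS = [
    {"name": "rdfs:label", "weight": 0.5},
    {"name": "P3_has_note", "weight": 0.4},
    {"name": "P131_is_identified", "weight": 0.9},
]


def weighted_average(path_scores, paths=WEIGHTED_PATHS):
    """Combine per-path similarities (each in [0, 1]) into one score.

    path_scores maps a path name to its similarity. Paths without a score
    are skipped (an assumption of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for path in paths:
        score = path_scores.get(path["name"])
        if score is None:
            continue
        numerator += path["weight"] * score
        denominator += path["weight"]
    return numerator / denominator if denominator else 0.0


# Labels identical, notes half similar, identifiers different:
scores = {"rdfs:label": 1.0, "P3_has_note": 0.5, "P131_is_identified": 0.0}

# (0.5*1.0 + 0.4*0.5 + 0.9*0.0) / (0.5 + 0.4 + 0.9) = 0.7 / 1.8
print(round(weighted_average(scores), 3))
```

Because the identifier path carries the largest weight, a mismatch there pulls the combined score down sharply even when the labels agree. The per-path scores can also be kept alongside the result as a justification for the match.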
8 
IMAPI Novelties 
• Captures the Domain Knowledge and Reality Knowledge of the experts through targeted path rules, e.g. a painting is usually painted by one painter and only occasionally by two. 
• Fully customizable for the specific needs of different CIDOC CRM instance matching problems (or those of another rich RDF/OWL ontology), depending on the data included in the databases 
• Uses six different metrics for the comparison of literals: 
  Digram, Trigram, Soundex, Edit Distance (Levenshtein distance), Single Error, Character Frequency 
• Time-Span comparison
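
Two of the literal metrics named above can be sketched in a few lines of Python. The slides do not say how IMAPI scores n-grams or normalises edit distance, so the Dice coefficient over n-gram sets and the division by the longer string's length are assumptions of this sketch.

```python
def ngrams(text, n):
    """Set of character n-grams of the lower-cased text (n=2: digrams, n=3: trigrams)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n):
    """Dice coefficient over the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        # Both strings are shorter than n: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Thutmose", "Tuthmose"))
print(edit_similarity("Thutmose", "Tuthmose"))
print(ngram_similarity("night", "nacht", 2))
```

The metrics react differently to the same spelling variation, which is one reason to combine several of them rather than rely on a single one.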
9 
Future Work 
• Compatibility of geographic areas 
• Broader / narrower term classification, e.g. Painter – Artist 
• Exclusion of comparisons by using e.g. negative weights, blocking data

Editor's Notes

  1. Instance matching for Linked Data is the process of comparing different particulars with the goal of recognizing the same real-world entity. Let’s look at an example: here we see two Persons with the same Appellation (Thutmose) but different Professions (Ruler vs. Sculptor) and Birth Timespans (1505 BC vs. 1360 BC). We can therefore easily infer that these two particulars are not the same real-world entity but different ones. Another example is presented here.