Instance Matching API - Foundation of Research and Technology Hellas, Institute of Computer Science
1. 1
INSTANCE MATCHING API
Ilias Tzortzakakis - Evangelia Daskalaki - Martin Doerr
Foundation of Research and Technology Hellas (FORTH),
Institute of Computer Science (ICS)
2. 2
Instance Matching
Instance matching for Linked Data is the process of comparing different particulars with the goal of recognizing the same real-world entity.
RDF Data Source A
Appellation: Thutmose
Profession: Ruler
Birth Timespan: 1505-1504 BC

Same particular?

RDF Data Source B
Appellation: Thutmose
Profession: Sculptor
Birth Timespan: 1360-1350 BC
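The reasoning behind this comparison can be sketched in Java, the language IMAPI is written in. This is a minimal illustration, not IMAPI's own code: the classes `TimeSpanExample`, `TimeSpan` and `Person` are hypothetical, and BC years are assumed to be stored as negative integers so that an ordinary interval-overlap test applies. The appellations match, but the two birth time-spans share no year, which argues against treating the particulars as the same entity.

```java
public class TimeSpanExample {

    /** A closed interval of years; BC years are stored as negative numbers (1505 BC -> -1505). */
    static final class TimeSpan {
        final int begin;
        final int end;

        TimeSpan(int begin, int end) {
            if (begin > end) {
                throw new IllegalArgumentException("begin must not be after end");
            }
            this.begin = begin;
            this.end = end;
        }

        /** True if the two intervals share at least one year. */
        boolean overlaps(TimeSpan other) {
            return this.begin <= other.end && other.begin <= this.end;
        }
    }

    /** Hypothetical record holding the three fields shown on the slide. */
    static final class Person {
        final String appellation;
        final String profession;
        final TimeSpan birth;

        Person(String appellation, String profession, TimeSpan birth) {
            this.appellation = appellation;
            this.profession = profession;
            this.birth = birth;
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Thutmose", "Ruler", new TimeSpan(-1505, -1504));
        Person b = new Person("Thutmose", "Sculptor", new TimeSpan(-1360, -1350));

        System.out.println("same appellation: " + a.appellation.equals(b.appellation)); // true
        System.out.println("same profession: " + a.profession.equals(b.profession));    // false
        System.out.println("birth spans overlap: " + a.birth.overlaps(b.birth));        // false
    }
}
```

A real matcher would weigh these signals rather than apply hard rules; the sketch only shows why the time-span check is informative here.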
3. 3
IMAPI - Instance Matching API
• The Instance Matching tool is a Java API, developed in Java 7, that retrieves similarities of instance pairs on user-specified fields between “Source Data” and “Target Data”.
• Source Data are always a set of RDF files, while Target Data may be another set of RDF files or an online database. Currently the British Museum Collection Database (http://www.britishmuseum.org/research/collection_online/search.aspx) and the CLAROS database (http://www.clarosnet.org/) are supported.
• Available under an open source software license at https://github.com/isl/IMAPI.
6. 6
IMAPI Process
Source (CIDOC RDF files) + Target (CIDOC RDF files / online triple store) → IM API → Result Set
IM API:
• Clustering Data: clustering the data in order to match them sequentially
• Calculating Similarities: calculating similarities by using thresholds and weighted averages according to the User Configuration File
Result Set: matched instances + matching justifications
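The clustering step can be illustrated with a simple blocking scheme. The slide does not say which criterion IMAPI clusters on, so this sketch assumes that instances are grouped by their CIDOC CRM class and that only source/target pairs from the same cluster are compared; `ClusteringSketch`, `Instance`, `clusterByType` and `candidatePairs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusteringSketch {

    /** Hypothetical minimal instance: a URI plus the class it belongs to. */
    static final class Instance {
        final String uri;
        final String type;

        Instance(String uri, String type) {
            this.uri = uri;
            this.type = type;
        }
    }

    /** Groups instances by class (assumed clustering key), keeping insertion order. */
    static Map<String, List<Instance>> clusterByType(List<Instance> instances) {
        Map<String, List<Instance>> clusters = new LinkedHashMap<>();
        for (Instance i : instances) {
            clusters.computeIfAbsent(i.type, k -> new ArrayList<>()).add(i);
        }
        return clusters;
    }

    /** Builds candidate pairs one cluster at a time; cross-cluster pairs are never generated. */
    static List<String[]> candidatePairs(List<Instance> source, List<Instance> target) {
        Map<String, List<Instance>> targetClusters = clusterByType(target);
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, List<Instance>> e : clusterByType(source).entrySet()) {
            List<Instance> sameType = targetClusters.get(e.getKey());
            if (sameType == null) {
                continue; // no target instance of this class, nothing to compare
            }
            for (Instance s : e.getValue()) {
                for (Instance t : sameType) {
                    pairs.add(new String[] {s.uri, t.uri});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Instance> source = Arrays.asList(
            new Instance("src:actor1", "E21_Person"),
            new Instance("src:object1", "E22_Man-Made_Object"));
        List<Instance> target = Arrays.asList(
            new Instance("tgt:actor7", "E21_Person"),
            new Instance("tgt:actor8", "E21_Person"),
            new Instance("tgt:object3", "E22_Man-Made_Object"));

        // 2 x 3 = 6 pairs without clustering; only 2 + 1 = 3 with it
        System.out.println(candidatePairs(source, target).size()); // 3
    }
}
```

Processing one cluster at a time also keeps memory bounded, which is one plausible reading of "match them sequentially".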
7. 7
IMAPI User Configuration File
• Define the source and the target CIDOC-CRM data that will be matched together
• Customize a set of weighted paths that are used in the matching process
User Configuration File Example:
1) Search for E21_Actors
2) Return all their rdfs:labels (w=0.5)
3) Get literals connected to Actors via their P3_has_note predicate (w=0.4)
4) Get URIs connected to Actors via their P131_is_identified predicate (w=0.9)
5) …
6) ….
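The weights in the example can be combined into a weighted average, with a threshold deciding whether a pair counts as a match, as the process slide describes. This is a sketch under assumptions: `WeightedPathsSketch` and `WeightedPath` are hypothetical, the per-path similarity values and the 0.7 threshold are invented for illustration, and paths with no value on one side are simply skipped, which may differ from IMAPI's actual handling.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedPathsSketch {

    /** One configured path and its weight, mirroring the (w=...) values in the example. */
    static final class WeightedPath {
        final String predicate;
        final double weight;

        WeightedPath(String predicate, double weight) {
            this.predicate = predicate;
            this.weight = weight;
        }
    }

    /**
     * Weighted average of per-path similarities in [0, 1].
     * Paths without a similarity value (data missing on either side) are skipped
     * and do not count towards the denominator: an assumption of this sketch.
     */
    static double weightedAverage(List<WeightedPath> paths, Map<String, Double> similarities) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (WeightedPath p : paths) {
            Double s = similarities.get(p.predicate);
            if (s == null) {
                continue;
            }
            weightedSum += p.weight * s;
            totalWeight += p.weight;
        }
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        List<WeightedPath> config = Arrays.asList(
            new WeightedPath("rdfs:label", 0.5),
            new WeightedPath("P3_has_note", 0.4),
            new WeightedPath("P131_is_identified", 0.9));

        // Hypothetical per-path similarities for one source/target pair
        Map<String, Double> sims = new HashMap<>();
        sims.put("rdfs:label", 0.8);
        sims.put("P3_has_note", 0.2);
        sims.put("P131_is_identified", 1.0);

        double score = weightedAverage(config, sims); // (0.4 + 0.08 + 0.9) / 1.8 ≈ 0.767
        double threshold = 0.7;                        // hypothetical threshold
        System.out.printf("score=%.3f match=%b%n", score, score >= threshold);
    }
}
```

The high weight on the identifier path lets an exact identifier match outweigh a weak note similarity, which is the kind of expert judgment the weights are meant to encode.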
8. 8
IMAPI Novelties
• The system can capture the experts' Domain Knowledge and Reality Knowledge through targeted path rules, e.g. one painting is usually painted by one painter and only occasionally by two.
• Fully customizable to the specific needs of different CIDOC CRM instance matching problems (or those of another rich RDF/OWL ontology), depending on the data contained in the databases
• Uses six different metrics to compare literals: Digram, Trigram, Soundex, Edit Distance (Levenshtein distance), Single Error, Character Frequency
• Time-span comparison
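Four of the literal metrics can be sketched with their textbook definitions. We assume that "Digram" and "Trigram" mean a Dice coefficient over character 2-grams and 3-grams, that edit distance is normalised by the longer string, and that Soundex is the standard American variant; "Single Error" and "Character Frequency" are left out because the slide does not define them. `StringMetricsSketch` and its methods are hypothetical names, not IMAPI's API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StringMetricsSketch {

    /** Edit (Levenshtein) distance: minimum insertions, deletions and substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Distance normalised to a similarity in [0, 1]; two empty strings count as identical. */
    static double levenshteinSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    /** Set of character n-grams (n = 2 for digrams, n = 3 for trigrams). */
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** Dice coefficient 2|A ∩ B| / (|A| + |B|) over the n-gram sets of both strings. */
    static double ngramDice(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0; // strings too short to form any n-gram
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    // American Soundex digit for each letter a..z ('0' = not coded)
    private static final String SOUNDEX_CODES = "01230120022455012623010202";

    /** Four-character Soundex code, e.g. "Robert" -> "R163". */
    static String soundex(String s) {
        StringBuilder out = new StringBuilder();
        char prev = 0;
        for (char ch : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (ch < 'a' || ch > 'z') {
                continue; // ignore non-letters
            }
            char code = SOUNDEX_CODES.charAt(ch - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(ch)); // first letter is kept
            } else if (code != '0' && code != prev) {
                out.append(code);
            }
            // 'h' and 'w' do not separate letters with the same code; vowels do
            if (ch != 'h' && ch != 'w') {
                prev = code;
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(ngramDice("night", "nacht", 2));   // 0.25
        System.out.println(soundex("Thutmose"));              // T352
    }
}
```

Each metric catches a different kind of variation (typos, reordered fragments, phonetic spellings), which is presumably why several are offered side by side.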
9. 9
Future Work
• Compatibility of geographic areas
• Broader / narrower term classification, e.g. Painter – Artist
• Exclusion of comparisons, e.g. by using negative weights or blocking data
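One possible shape for the broader/narrower term check, which this slide lists as future work rather than an existing IMAPI feature: two terms are treated as compatible when one is the same as, or broader than, the other. The `BROADER` map and the `TermHierarchySketch` class are hypothetical; a real implementation would presumably draw on a thesaurus rather than a hard-coded map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TermHierarchySketch {

    // Hypothetical narrower -> broader term map
    static final Map<String, String> BROADER = new HashMap<>();
    static {
        BROADER.put("Painter", "Artist");
        BROADER.put("Sculptor", "Artist");
    }

    /** True if 'ancestor' equals 'term' or is reachable from it by following broader links. */
    static boolean isSameOrBroader(String ancestor, String term) {
        Set<String> seen = new HashSet<>(); // guards against cycles in the map
        for (String t = term; t != null && seen.add(t); t = BROADER.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** Compatible if one term is the same as, or broader than, the other. */
    static boolean compatible(String a, String b) {
        return isSameOrBroader(a, b) || isSameOrBroader(b, a);
    }

    public static void main(String[] args) {
        System.out.println(compatible("Painter", "Artist"));   // true
        System.out.println(compatible("Painter", "Sculptor")); // false
    }
}
```

Under this rule, sibling terms such as Painter and Sculptor stay incompatible, while a record saying only "Artist" would no longer count against one saying "Painter".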
Editor's Notes
Instance matching for Linked Data is the process of comparing different particulars with the goal of recognizing the same real-world entity.
Let’s see some examples:
Here we see two Persons with the same Appellation (Thutmose) but different Professions (Ruler vs. Sculptor) and Birth Timespans (1505 BC vs. 1360 BC). So we can easily infer that these two particulars are not the same real-world entity but two different ones. Another example is presented here.