SlideShare a Scribd company logo
A Statistical and Schema Independent
Approach to Identify Equivalent Properties
on Linked Data
†Kno.e.sis Center
Wright State University
Dayton OH, USA
‡IBM T J Watson Research Center
Yorktown Heights
New York NY, USA
Kalpa Gunaratna†, Krishnaprasad Thirunarayan†, Prateek Jain‡, Amit Sheth†, and Sanjaya Wijeratne†
{kalpa,tkprasad,amit,sanjaya}@knoesis.org, jainpr@us.ibm.com
iSemantics 2013, Graz, Austria
09/06/2013 2
Motivation
Why we need property alignment and it is so
important?
iSemantics 2013
09/06/2013 3
Many datasets. We can query!
iSemantics 2013
09/06/2013 3iSemantics 2013
09/06/2013 3
Same information in different names.
Therefore, data integration for better presentation is required.
iSemantics 2013
Background
Statistical Equivalence of properties
Evaluation
Discussion, interesting facts, and future directions
Conclusion
09/06/2013 4
Roadmap
iSemantics 2013
Existing techniques for property alignment fall into three
categories.
I. Syntactic/dictionary based
– Uses string manipulation techniques, external dictionaries and
lexical databases like WordNet.
II. Schema dependent
– Uses schema information such as, domain and range, definitions.
III. Schema independent
– Uses instance level information for the alignment.
 Our approach falls under schema independent.
09/06/2013 5
Background
iSemantics 2013
Properties capture meaning of triples and hence they are
complex in nature.



09/06/2013 6iSemantics 2013
Properties capture meaning of triples and hence they are
complex in nature.
Syntactic or dictionary based approaches analyze property
names for equivalence. But in LOD, name heterogeneities exist.


09/06/2013 6iSemantics 2013
Properties capture meaning of triples and hence they are
complex in nature.
Syntactic or dictionary based approaches analyze property
names for equivalence. But in LOD, name heterogeneities exist.
Therefore, syntactic or dictionary based approaches have
limited coverage in property alignment.

09/06/2013 6iSemantics 2013
Properties capture meaning of triples and hence they are
complex in nature.
Syntactic or dictionary based approaches analyze property
names for equivalence. But in LOD, name heterogeneities exist.
Therefore, syntactic or dictionary based approaches have
limited coverage in property alignment.
Schema dependent approaches including processing domain
and range, class level tags do not capture semantics of
properties well.
09/06/2013 6iSemantics 2013
Background
Statistical Equivalence of properties
Evaluation
Discussion, interesting facts, and future directions
Conclusion
09/06/2013 7
Roadmap
iSemantics 2013
 Statistical Equivalence is based on analyzing owl:equivalentProperty.
 owl:equivalentProperty - properties that have same property
extensions.









09/06/2013 8
Statistical Equivalence of properties
iSemantics 2013
 Statistical Equivalence is based on analyzing owl:equivalentProperty.
 owl:equivalentProperty - properties that have same property
extensions.
Example 1:
Property P is defined by the triples, { a P b, c P d, e P f }
Property Q is defined by the triples, { a Q b, c Q d, e Q f }
P and Q are owl:equivalentProperty, because they have the same extension,
{ {a,b}, {c,d}, {e,f} }




09/06/2013 8
Statistical Equivalence of properties
iSemantics 2013
 Statistical Equivalence is based on analyzing owl:equivalentProperty.
 owl:equivalentProperty - properties that have same property
extensions.
Example 1:
Property P is defined by the triples, { a P b, c P d, e P f }
Property Q is defined by the triples, { a Q b, c Q d, e Q f }
P and Q are owl:equivalentProperty, because they have the same extension,
{ {a,b}, {c,d}, {e,f} }
Example 2:
Property P is defined by the triples, { a P b, c P d, e P f }
Property Q is defined by the triples, { a Q b, c Q d, e Q h }
Then, P and Q are not owl:equivalentProperty, because their extensions are not the
same. But they provide statistical evidence in support of equivalence.
09/06/2013 8
Statistical Equivalence of properties
iSemantics 2013
Intuition
 Higher rate of subject-object matches in extensions leads to
equivalent properties. In practice, it is hard to have exact same
extensions for matching properties. Because,
– Datasets are incomplete.
– Same instance may be modelled differently in different datasets.
 Therefore, we analyze the property extensions to identify equivalent
properties between datasets.
 We define the following notions. Let the statement below be true
for all the definitions.
S1P1O1 and S2P2O2 be two triples in Dataset D1 and D2 respectively.
09/06/2013 iSemantics 2013 9
Definition 1: Candidate Match
The two properties P1 and P2 are a candidate match iff S1
𝐸𝐶𝑅∗
S2 and O1
𝐸𝐶𝑅∗
O2.
We say two instances are connected by an ECR* link if there is a link path between the
instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference
Relationships such as those formalized using owl:sameAs and skos:exactMatch.






09/06/2013 iSemantics 2013 10
Definition 1: Candidate Match
The two properties P1 and P2 are a candidate match iff S1
𝐸𝐶𝑅∗
S2 and O1
𝐸𝐶𝑅∗
O2.
We say two instances are connected by an ECR* link if there is a link path between the
instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference
Relationships such as those formalized using owl:sameAs and skos:exactMatch.
Example
The two datasets DBpedia(d) and Freebase(f)
d:Arthur Purdy Stout d:place of birth d:New York City
f:Arthur Purdy Stout f:place of death f:New York City


09/06/2013 iSemantics 2013 10
Definition 1: Candidate Match
The two properties P1 and P2 are a candidate match iff S1
𝐸𝐶𝑅∗
S2 and O1
𝐸𝐶𝑅∗
O2.
We say two instances are connected by an ECR* link if there is a link path between the
instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference
Relationships such as those formalized using owl:sameAs and skos:exactMatch.
Example
The two datasets DBpedia(d) and Freebase(f)
d:Arthur Purdy Stout d:place of birth d:New York City
f:Arthur Purdy Stout f:place of death f:New York City
 The above is a candidate match, but not equivalent, because intensions are different
(coincidental match).
 We need further analysis to decide on equivalence.
09/06/2013 iSemantics 2013 10
Match Count μ(P1,P2) – Number of triple pairs for P1 and P2 that participate in
candidate matches.
μ 𝑃1, 𝑃2 = | 𝑆1 𝑃1 𝑂1 ϵ 𝐷1 | ∃ 𝑆2 𝑃2 𝑂2 ϵ 𝐷2 ˄ 𝑆1
𝐸𝐶𝑅∗
𝑆2 ˄ 𝑂1
𝐸𝐶𝑅∗
𝑂2 |
Co-appearance Count λ(P1,P2) – Number of triple pairs for P1 and P2 that have
matching subjects.
λ 𝑃1, 𝑃2 = | 𝑆1 𝑃1 𝑂1 ϵ 𝐷1 | ∃ 𝑆2 𝑃2 𝑂2 ϵ 𝐷2 ˄ 𝑆1
𝐸𝐶𝑅∗
𝑆2 |
Definition 2: Statistically Equivalent Properties
The pair of properties P1 and P2 are statistically equivalent to degree (α, k) iff,
𝐹 =
μ(𝑃1
,𝑃2
)
λ(𝑃1
,𝑃2
)
≥ 𝛼,
Where, μ(P1,P2) ≥ k, and 0 ˂ α ≤ 1, k > 1
09/06/2013 iSemantics 2013 11
09/06/2013 iSemantics 2013 12
Dataset 2Dataset 1
Candidate Matching Algorithm Process
09/06/2013 iSemantics 2013 12
I1
Dataset 2Dataset 1
I1=d1:Willis_Lamb
Candidate Matching Algorithm Process
09/06/2013 iSemantics 2013 12
I1 I2
owl:sameAs
Dataset 2Dataset 1
I1=d1:Willis_Lamb I2 =d2:willis_lamb
Step 1
Candidate Matching Algorithm Process
09/06/2013 iSemantics 2013 12
I1 I2
I2
owl:sameAs
P1=d1:doctoralStudent
P2=d2:education.
academic.advisees
Dataset 2Dataset 1
d2:theodore_harold_maiman
I1=d1:Willis_Lamb I2 =d2:willis_lamb
I1
I1
d1:Theodore_Maiman
triple 1
triple 2
triple 3
triple 4
triple 5
Step 1
Step2
Step2
Candidate Matching Algorithm Process
09/06/2013 iSemantics 2013 12
I1 I2
I2
matching resources
owl:sameAs
P1=d1:doctoralStudent
P2=d2:education.
academic.advisees
Dataset 2Dataset 1
property P1 and property P2 are a candidate match
d2:theodore_harold_maiman
I1=d1:Willis_Lamb I2 =d2:willis_lamb
I1
I1
d1:Theodore_Maiman
triple 1
triple 2
triple 3
triple 4
triple 5
Step 1
Step2
Step2
Step 3
Candidate Matching Algorithm Process
09/06/2013 iSemantics 2013 13
Complexity:
If the average number of properties for an entity is x and for each property, average
number of objects is j. For n subjects, it requires n*j2*x2+2n comparisons. Since n > j,
n > x, and x and j are independent of n, O(n).
Example:
09/06/2013 iSemantics 2013 14
Parallel computation (Map-Reduce implementation)
 Generating candidate matches can be done for each instance
independently. Hence, we implemented the algorithm in Hadoop 1.0.3
framework.
 Generating candidate matches for instances is distributed among mappers
and each mapper outputs μ and λ to the reducer for property pairs.
• Map Phase
– Let the number of subject instances in dataset D1 be X and namespace of dataset
D2 be ns. For each subject i ϵ X, start a mapper job for
GenerateCandidateMatches(i, ns).
– Each mapper outputs (key,value) pairs as (p:q, μ(p,q):λ(p,q)). pϵD1 and qϵD2.
 The reducer collects all μ and λ values and aggregate them for final
analysis.
• Reduce phase
– Collects output from mappers and aggregates μ(p,q) and λ(p,q) for each key p:q.
 The map reduce version on a 14 node cluster was able to achieve a speed
up of 833% compared to the desktop version.
09/06/2013 iSemantics 2013 28
Background
Statistical Equivalence of properties
Evaluation
Discussion, interesting facts, and future directions
Conclusion
09/06/2013 16
Roadmap
iSemantics 2013
Objectives of the evaluation
– Show the effectiveness of the approach in linked datasets
– Compare with existing aligning techniques


09/06/2013 iSemantics 2013 17
Evaluation
Objectives of the evaluation
– Show the effectiveness of the approach in linked datasets
– Compare with existing aligning techniques
We selected 5000 instance samples from DBpedia, Freebase,
LinkedMDB, DBLP L3S , and DBLP RKB Explorer datasets.

09/06/2013 iSemantics 2013 17
Evaluation
Objectives of the evaluation
– Show the effectiveness of the approach in linked datasets
– Compare with existing aligning techniques
We selected 5000 instance samples from DBpedia, Freebase,
LinkedMDB, DBLP L3S , and DBLP RKB Explorer datasets.
These datasets have,
– Complete data for instances in different viewpoints
– Many inter-links
– Complex properties
09/06/2013 iSemantics 2013 17
Evaluation
Experiment details
– α = 0.5 for all experiments (works for LOD) except DBpedia and
Freebase movie alignment where it was 0.7.
09/06/2013 iSemantics 2013 18
Experiment details
– α = 0.5 for all experiments (works for LOD) except DBpedia and
Freebase movie alignment where it was 0.7.
– k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and
Software between DBpedia and Freebase, Film between
LinkedMDB and DBpedia, and article between DBLP datasets.
09/06/2013 iSemantics 2013 18
Experiment details
– α = 0.5 for all experiments (works for LOD) except DBpedia and
Freebase movie alignment where it was 0.7.
– k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and
Software between DBpedia and Freebase, Film between
LinkedMDB and DBpedia, and article between DBLP datasets.
– k can be estimated using the data as follows,
– Set α = 0.5 and k = 2 (lowest positive values).
– Get exact matching property (property names) pairs not identified by
the algorithm and their μ
– Get the average of those μ values
09/06/2013 iSemantics 2013 18
Experiment details
– α = 0.5 for all experiments (works for LOD) except DBpedia and
Freebase movie alignment where it was 0.7.
– k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and
Software between DBpedia and Freebase, Film between
LinkedMDB and DBpedia, and article between DBLP datasets.
– k can be estimated using the data as follows,
– Set α = 0.5 and k = 2 (lowest positive values).
– Get exact matching property (property names) pairs not identified by
the algorithm and their μ
– Get the average of those μ values
– 0.92 for string similarity algorithms.
09/06/2013 iSemantics 2013 18
Experiment details
– α = 0.5 for all experiments (works for LOD) except DBpedia and
Freebase movie alignment where it was 0.7.
– k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and
Software between DBpedia and Freebase, Film between
LinkedMDB and DBpedia, and article between DBLP datasets.
– k can be estimated using the data as follows,
– Set α = 0.5 and k = 2 (lowest positive values).
– Get exact matching property (property names) pairs not identified by
the algorithm and their μ
– Get the average of those μ values
– 0.92 for string similarity algorithms.
– 0.8 for WordNet similarity.
09/06/2013 iSemantics 2013 18
Measure
type
DBpedia –
Freebase
(Person)
DBpedia –
Freebase
(Film)
DBpedia –
Freebase
(Software)
DBpedia –
LinkedMDB
(Film)
DBLP_RKB –
DBLP_L3S
(Article)
Average
Extension
Based
Algorithm
Precision 0.8758 0.9737 0.6478 0.7560 1.0000 0.8427
Recall 0.8089* 0.5138 0.4339 0.8157 1.0000 0.7145
F measure 0.8410* 0.6727 0.5197 0.7848 1.0000 0.7656
WordNet
Similarity
Precision 0.5200 0.8620 0.7619 0.8823 1.0000 0.8052
Recall 0.4140* 0.3472 0.3018 0.3947 0.3333 0.3582
F measure 0.4609* 0.4950 0.4324 0.5454 0.5000 0.4867
Dice
Similarity
Precision 0.8064 0.9666 0.7659 1.0000 0.0000 0.7078
Recall 0.4777* 0.4027 0.3396 0.3421 0.0000 0.3124
F measure 0.6000* 0.5686 0.4705 0.5098 0.0000 0.4298
Jaro
Similarity
Precision 0.6774 0.8809 0.7755 0.9411 0.0000 0.6550
Recall 0.5350* 0.5138 0.3584 0.4210 0.0000 0.3656
F measure 0.5978* 0.6491 0.4903 0.5818 0.0000 0.4638
09/06/2013 iSemantics 2013 19
Alignment results
* Marks estimated values for experiment 1 because of very large comparisons to check manually. Boldface
marks highest result for each experiment.
Example identifications
09/06/2013 iSemantics 2013 20
Property pair
types
Dataset 1 (DBpedia) Dataset 2 (Freebase)
Simple string
similarity matches
db:nationality fb:nationality
db:religion fb:religion
Synonymous
matches
db:occupation fb:profession
db:battles fb:participated_in_conflicts
Complex matches db:screenplay fb:written_by
db:doctoralStudent fb:advisees
Example identifications
09/06/2013 iSemantics 2013 20
Property pair
types
Dataset 1 (DBpedia) Dataset 2 (Freebase)
Simple string
similarity matches
db:nationality fb:nationality
db:religion fb:religion
Synonymous
matches
db:occupation fb:profession
db:battles fb:participated_in_conflicts
Complex matches db:screenplay fb:written_by
db:doctoralStudent fb:advisees
WordNet similarity failed to identify any of these
Background
Statistical Equivalence of properties
Evaluation
Discussion, interesting facts, and future directions
Conclusion
09/06/2013 21
Roadmap
iSemantics 2013
 Our experiment covered multi-domain to multi-domain, multi-
domain to specific domain and specific-domain to specific-domain
dataset property alignment.




09/06/2013 iSemantics 2013 22
Discussion, interesting facts, and future directions
 Our experiment covered multi-domain to multi-domain, multi-
domain to specific domain and specific-domain to specific-domain
dataset property alignment.
 In every experiment, the extension based algorithm outperformed
others (F measure). F measure gain is in the range of 57% to 78%.



09/06/2013 iSemantics 2013 22
Discussion, interesting facts, and future directions
 Our experiment covered multi-domain to multi-domain, multi-
domain to specific domain and specific-domain to specific-domain
dataset property alignment.
 In every experiment, the extension based algorithm outperformed
others (F measure). F measure gain is in the range of 57% to 78%.
 Some properties that are identified are intentionally different, e.g.,
db:distributor vs fb:production_companies.
– This is because many companies produce and also distribute their
films.


09/06/2013 iSemantics 2013 22
Discussion, interesting facts, and future directions
 Our experiment covered multi-domain to multi-domain, multi-
domain to specific domain and specific-domain to specific-domain
dataset property alignment.
 In every experiment, the extension based algorithm outperformed
others (F measure). F measure gain is in the range of 57% to 78%.
 Some properties that are identified are intentionally different, e.g.,
db:distributor vs fb:production_companies.
– This is because many companies produce and also distribute their
films.
 Some identified pairs are incorrect due to errors in data modeling.
– For example, db:issue and fb:children.

09/06/2013 iSemantics 2013 22
Discussion, interesting facts, and future directions
 Our experiment covered multi-domain to multi-domain, multi-
domain to specific domain and specific-domain to specific-domain
dataset property alignment.
 In every experiment, the extension based algorithm outperformed
others (F measure). F measure gain is in the range of 57% to 78%.
 Some properties that are identified are intentionally different, e.g.,
db:distributor vs fb:production_companies.
– This is because many companies produce and also distribute their
films.
 Some identified pairs are incorrect due to errors in data modeling.
– For example, db:issue and fb:children.
 owl:sameAs linking issues in LOD (not linking exact same thing), e.g.,
linking London and Greater London.
– We believe few misused links wont affect the algorithm as it decides
on a match after analyzing many matches for a pair.
09/06/2013 iSemantics 2013 22
Discussion, interesting facts, and future directions
 Less number of interlinks.
– Evolve over time.
– Look for possible other types of ECR links (i.e., rdf:seeAlso).


09/06/2013 iSemantics 2013 23
 Less number of interlinks.
– Evolve over time.
– Look for possible other types of ECR links (i.e., rdf:seeAlso).
 Properties do not have uniform distribution in a dataset.
– Hence, some properties do not have enough matches or appearances.
– This is due to rare classes and domains they belong to.
– We can run the algorithm on instances that these less frequent
properties appear iteratively.

09/06/2013 iSemantics 2013 23
 Less number of interlinks.
– Evolve over time.
– Look for possible other types of ECR links (i.e., rdf:seeAlso).
 Properties do not have uniform distribution in a dataset.
– Hence, some properties do not have enough matches or appearances.
– This is due to rare classes and domains they belong to.
– We can run the algorithm on instances that these less frequent
properties appear iteratively.
Current limitations,
– Requires ECR links
– Requires overlapping datasets
– Object-type properties
– Inability to identify property – sub property relationships
09/06/2013 iSemantics 2013 23
Background
Statistical Equivalence of properties
Evaluation
Discussion, interesting facts, and future directions
Conclusion
09/06/2013 24
Roadmap
iSemantics 2013
We approximate owl:equivalentProperty using Statistical
Equivalence of properties by analyzing property extensions,
which is schema independent.



09/06/2013 iSemantics 2013 25
Conclusion
We approximate owl:equivalentProperty using Statistical
Equivalence of properties by analyzing property extensions,
which is schema independent.
This novel extension based approach works well with
interlinked datasets.


09/06/2013 iSemantics 2013 25
Conclusion
We approximate owl:equivalentProperty using Statistical
Equivalence of properties by analyzing property extensions,
which is schema independent.
This novel extension based approach works well with
interlinked datasets.
The extension based approach outperforms syntax or
dictionary based approaches. F measure gain in the range of
57% - 78%.

09/06/2013 iSemantics 2013 25
Conclusion
We approximate owl:equivalentProperty using Statistical
Equivalence of properties by analyzing property extensions,
which is schema independent.
This novel extension based approach works well with
interlinked datasets.
The extension based approach outperforms syntax or
dictionary based approaches. F measure gain in the range of
57% - 78%.
It requires many comparisons, but can be easily parallelized
evidenced by our Map-Reduce implementation.
09/06/2013 iSemantics 2013 25
Conclusion
26
Thank You
http://knoesis.wright.edu/researchers/kalpa
kalpa@knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Questions ?
09/06/2013 iSemantics 2013

More Related Content

What's hot

Datastructures using c++
Datastructures using c++Datastructures using c++
Datastructures using c++
Gopi Nath
 
Data Structure
Data StructureData Structure
Data Structuresheraz1
 
UNIT I LINEAR DATA STRUCTURES – LIST
UNIT I 	LINEAR DATA STRUCTURES – LIST 	UNIT I 	LINEAR DATA STRUCTURES – LIST
UNIT I LINEAR DATA STRUCTURES – LIST
Kathirvel Ayyaswamy
 
Data structure using c++
Data structure using c++Data structure using c++
Data structure using c++
Prof. Dr. K. Adisesha
 
Data structures using C
Data structures using CData structures using C
Data structures using C
Prof. Dr. K. Adisesha
 
Data Structure
Data StructureData Structure
Data Structure
Karthikeyan A K
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
Majid Abdollahi
 
Incomplete Information in RDF
Incomplete Information in RDFIncomplete Information in RDF
Incomplete Information in RDF
Charalampos (Babis) Nikolaou
 
Exchanging more than Complete Data
Exchanging more than Complete DataExchanging more than Complete Data
Exchanging more than Complete Datanet2-project
 
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
widespreadpromotion
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
Prof Ansari
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structureeShikshak
 
Data Structures (BE)
Data Structures (BE)Data Structures (BE)
Data Structures (BE)
PRABHAHARAN429
 
Data Structures
Data StructuresData Structures
Data Structures
Prof. Dr. K. Adisesha
 
What's next in Julia
What's next in JuliaWhat's next in Julia
What's next in JuliaJiahao Chen
 
R training2
R training2R training2
R training2
Hellen Gakuruh
 
Data structures using c
Data structures using cData structures using c
Data structures using c
Prof. Dr. K. Adisesha
 
Elementary data structure
Elementary data structureElementary data structure
Elementary data structure
Biswajit Mandal
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2
izahn
 
Data Structures Notes 2021
Data Structures Notes 2021Data Structures Notes 2021
Data Structures Notes 2021
Sreedhar Chowdam
 

What's hot (20)

Datastructures using c++
Datastructures using c++Datastructures using c++
Datastructures using c++
 
Data Structure
Data StructureData Structure
Data Structure
 
UNIT I LINEAR DATA STRUCTURES – LIST
UNIT I 	LINEAR DATA STRUCTURES – LIST 	UNIT I 	LINEAR DATA STRUCTURES – LIST
UNIT I LINEAR DATA STRUCTURES – LIST
 
Data structure using c++
Data structure using c++Data structure using c++
Data structure using c++
 
Data structures using C
Data structures using CData structures using C
Data structures using C
 
Data Structure
Data StructureData Structure
Data Structure
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
Incomplete Information in RDF
Incomplete Information in RDFIncomplete Information in RDF
Incomplete Information in RDF
 
Exchanging more than Complete Data
Exchanging more than Complete DataExchanging more than Complete Data
Exchanging more than Complete Data
 
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
2. Linear Data Structure Using Arrays - Data Structures using C++ by Varsha P...
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structure
 
Data Structures (BE)
Data Structures (BE)Data Structures (BE)
Data Structures (BE)
 
Data Structures
Data StructuresData Structures
Data Structures
 
What's next in Julia
What's next in JuliaWhat's next in Julia
What's next in Julia
 
R training2
R training2R training2
R training2
 
Data structures using c
Data structures using cData structures using c
Data structures using c
 
Elementary data structure
Elementary data structureElementary data structure
Elementary data structure
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2
 
Data Structures Notes 2021
Data Structures Notes 2021Data Structures Notes 2021
Data Structures Notes 2021
 

Similar to A Statistical and Schema Independent Approach to Identify Equivalent Properties on Linked Data

Towards Transfer Learning of Link Specifications
Towards Transfer Learning of Link SpecificationsTowards Transfer Learning of Link Specifications
Towards Transfer Learning of Link Specifications
geoknow
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity Propagation
IJECEIAES
 
Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)
Evert Sandye Taasiringan
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanford
Sakthivel C R
 
Spatial Approximate String Keyword content Query processing
Spatial Approximate String Keyword content Query processingSpatial Approximate String Keyword content Query processing
Spatial Approximate String Keyword content Query processing
inventionjournals
 
Top-K Dominating Queries on Incomplete Data with Priorities
Top-K Dominating Queries on Incomplete Data with PrioritiesTop-K Dominating Queries on Incomplete Data with Priorities
Top-K Dominating Queries on Incomplete Data with Priorities
ijtsrd
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
Anshik Bansal
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Editor IJMTER
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.
Mehwish Alam
 
Ldml application - public
Ldml   application - publicLdml   application - public
Ldml application - publicseanscon
 
inteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access FrameworkinteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access Framework
National Inistitute of Informatics (NII), Tokyo, Japann
 
Cs229 notes10
Cs229 notes10Cs229 notes10
Cs229 notes10
VuTran231
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
csandit
 
ML_Unit_1_Part_C
ML_Unit_1_Part_CML_Unit_1_Part_C
ML_Unit_1_Part_C
Srimatre K
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
Prakash Dubey
 
Evaluation of Uncertain Location
Evaluation of Uncertain Location Evaluation of Uncertain Location
Evaluation of Uncertain Location
IOSR Journals
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Benjamin Bengfort
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
Hiroyuki Kuromiya
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
Ankit Tewari
 

Similar to A Statistical and Schema Independent Approach to Identify Equivalent Properties on Linked Data (20)

Towards Transfer Learning of Link Specifications
Towards Transfer Learning of Link SpecificationsTowards Transfer Learning of Link Specifications
Towards Transfer Learning of Link Specifications
 
A Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity PropagationA Preference Model on Adaptive Affinity Propagation
A Preference Model on Adaptive Affinity Propagation
 
Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanford
 
Spatial Approximate String Keyword content Query processing
Spatial Approximate String Keyword content Query processingSpatial Approximate String Keyword content Query processing
Spatial Approximate String Keyword content Query processing
 
Top-K Dominating Queries on Incomplete Data with Priorities
Top-K Dominating Queries on Incomplete Data with PrioritiesTop-K Dominating Queries on Incomplete Data with Priorities
Top-K Dominating Queries on Incomplete Data with Priorities
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.
 
Ldml application - public
Ldml   application - publicLdml   application - public
Ldml application - public
 
inteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access FrameworkinteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access Framework
 
Cs229 notes10
Cs229 notes10Cs229 notes10
Cs229 notes10
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
ML_Unit_1_Part_C
ML_Unit_1_Part_CML_Unit_1_Part_C
ML_Unit_1_Part_C
 
Document ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspaceDocument ranking using qprp with concept of multi dimensional subspace
Document ranking using qprp with concept of multi dimensional subspace
 
Evaluation of Uncertain Location
Evaluation of Uncertain Location Evaluation of Uncertain Location
Evaluation of Uncertain Location
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

A Statistical and Schema Independent Approach to Identify Equivalent Properties on Linked Data

  • 1. A Statistical and Schema Independent Approach to Identify Equivalent Properties on Linked Data †Kno.e.sis Center Wright State University Dayton OH, USA ‡IBM T J Watson Research Center Yorktown Heights New York NY, USA Kalpa Gunaratna†, Krishnaprasad Thirunarayan†, Prateek Jain‡, Amit Sheth†, and Sanjaya Wijeratne† {kalpa,tkprasad,amit,sanjaya}@knoesis.org, jainpr@us.ibm.com iSemantics 2013, Graz, Austria
  • 2. 09/06/2013 2 Motivation Why we need property alignment and it is so important? iSemantics 2013
  • 3. 09/06/2013 3 Many datasets. We can query! iSemantics 2013
  • 5. 09/06/2013 3 Same information in different names. Therefore, data integration for better presentation is required. iSemantics 2013
  • 6. Background Statistical Equivalence of properties Evaluation Discussion, interesting facts, and future directions Conclusion 09/06/2013 4 Roadmap iSemantics 2013
  • 7. Existing techniques for property alignment fall into three categories. I. Syntactic/dictionary based – Uses string manipulation techniques, external dictionaries and lexical databases like WordNet. II. Schema dependent – Uses schema information such as, domain and range, definitions. III. Schema independent – Uses instance level information for the alignment.  Our approach falls under schema independent. 09/06/2013 5 Background iSemantics 2013
  • 8. Properties capture meaning of triples and hence they are complex in nature.    09/06/2013 6iSemantics 2013
  • 9. Properties capture meaning of triples and hence they are complex in nature. Syntactic or dictionary based approaches analyze property names for equivalence. But in LOD, name heterogeneities exist.   09/06/2013 6iSemantics 2013
  • 10. Properties capture meaning of triples and hence they are complex in nature. Syntactic or dictionary based approaches analyze property names for equivalence. But in LOD, name heterogeneities exist. Therefore, syntactic or dictionary based approaches have limited coverage in property alignment.  09/06/2013 6iSemantics 2013
  • 11. Properties capture meaning of triples and hence they are complex in nature. Syntactic or dictionary based approaches analyze property names for equivalence. But in LOD, name heterogeneities exist. Therefore, syntactic or dictionary based approaches have limited coverage in property alignment. Schema dependent approaches including processing domain and range, class level tags do not capture semantics of properties well. 09/06/2013 6iSemantics 2013
  • 12. Background Statistical Equivalence of properties Evaluation Discussion, interesting facts, and future directions Conclusion 09/06/2013 7 Roadmap iSemantics 2013
  • 13.  Statistical Equivalence is based on analyzing owl:equivalentProperty.  owl:equivalentProperty - properties that have same property extensions.          09/06/2013 8 Statistical Equivalence of properties iSemantics 2013
  • 14.  Statistical Equivalence is based on analyzing owl:equivalentProperty.  owl:equivalentProperty - properties that have same property extensions. Example 1: Property P is defined by the triples, { a P b, c P d, e P f } Property Q is defined by the triples, { a Q b, c Q d, e Q f } P and Q are owl:equivalentProperty, because they have the same extension, { {a,b}, {c,d}, {e,f} }     09/06/2013 8 Statistical Equivalence of properties iSemantics 2013
  • 15.  Statistical Equivalence is based on analyzing owl:equivalentProperty.  owl:equivalentProperty - properties that have same property extensions. Example 1: Property P is defined by the triples, { a P b, c P d, e P f } Property Q is defined by the triples, { a Q b, c Q d, e Q f } P and Q are owl:equivalentProperty, because they have the same extension, { {a,b}, {c,d}, {e,f} } Example 2: Property P is defined by the triples, { a P b, c P d, e P f } Property Q is defined by the triples, { a Q b, c Q d, e Q h } Then, P and Q are not owl:equivalentProperty, because their extensions are not the same. But they provide statistical evidence in support of equivalence. 09/06/2013 8 Statistical Equivalence of properties iSemantics 2013
  • 16. Intuition  Higher rate of subject-object matches in extensions leads to equivalent properties. In practice, it is hard to have exact same extensions for matching properties. Because, – Datasets are incomplete. – Same instance may be modelled differently in different datasets.  Therefore, we analyze the property extensions to identify equivalent properties between datasets.  We define the following notions. Let the statement below be true for all the definitions. S1P1O1 and S2P2O2 be two triples in Dataset D1 and D2 respectively. 09/06/2013 iSemantics 2013 9
  • 17. Definition 1: Candidate Match The two properties P1 and P2 are a candidate match iff S1 𝐸𝐶𝑅∗ S2 and O1 𝐸𝐶𝑅∗ O2. We say two instances are connected by an ECR* link if there is a link path between the instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference Relationships such as those formalized using owl:sameAs and skos:exactMatch.       09/06/2013 iSemantics 2013 10
  • 18. Definition 1: Candidate Match The two properties P1 and P2 are a candidate match iff S1 𝐸𝐶𝑅∗ S2 and O1 𝐸𝐶𝑅∗ O2. We say two instances are connected by an ECR* link if there is a link path between the instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference Relationships such as those formalized using owl:sameAs and skos:exactMatch. Example The two datasets DBpedia(d) and Freebase(f) d:Arthur Purdy Stout d:place of birth d:New York City f:Arthur Purdy Stout f:place of death f:New York City   09/06/2013 iSemantics 2013 10
  • 19. Definition 1: Candidate Match The two properties P1 and P2 are a candidate match iff S1 𝐸𝐶𝑅∗ S2 and O1 𝐸𝐶𝑅∗ O2. We say two instances are connected by an ECR* link if there is a link path between the instances using ECR links (* is the Kleene star notation). ECR links are Entity Co-reference Relationships such as those formalized using owl:sameAs and skos:exactMatch. Example The two datasets DBpedia(d) and Freebase(f) d:Arthur Purdy Stout d:place of birth d:New York City f:Arthur Purdy Stout f:place of death f:New York City  The above is a candidate match, but not equivalent, because intensions are different (coincidental match).  We need further analysis to decide on equivalence. 09/06/2013 iSemantics 2013 10
  • 20. Match Count μ(P1,P2) – Number of triple pairs for P1 and P2 that participate in candidate matches. μ 𝑃1, 𝑃2 = | 𝑆1 𝑃1 𝑂1 ϵ 𝐷1 | ∃ 𝑆2 𝑃2 𝑂2 ϵ 𝐷2 ˄ 𝑆1 𝐸𝐶𝑅∗ 𝑆2 ˄ 𝑂1 𝐸𝐶𝑅∗ 𝑂2 | Co-appearance Count λ(P1,P2) – Number of triple pairs for P1 and P2 that have matching subjects. λ 𝑃1, 𝑃2 = | 𝑆1 𝑃1 𝑂1 ϵ 𝐷1 | ∃ 𝑆2 𝑃2 𝑂2 ϵ 𝐷2 ˄ 𝑆1 𝐸𝐶𝑅∗ 𝑆2 | Definition 2: Statistically Equivalent Properties The pair of properties P1 and P2 are statistically equivalent to degree (α, k) iff, 𝐹 = μ(𝑃1 ,𝑃2 ) λ(𝑃1 ,𝑃2 ) ≥ 𝛼, Where, μ(P1,P2) ≥ k, and 0 ˂ α ≤ 1, k > 1 09/06/2013 iSemantics 2013 11
  • 21. 09/06/2013 iSemantics 2013 12 Dataset 2Dataset 1 Candidate Matching Algorithm Process
  • 22. 09/06/2013 iSemantics 2013 12 I1 Dataset 2Dataset 1 I1=d1:Willis_Lamb Candidate Matching Algorithm Process
  • 23. 09/06/2013 iSemantics 2013 12 I1 I2 owl:sameAs Dataset 2Dataset 1 I1=d1:Willis_Lamb I2 =d2:willis_lamb Step 1 Candidate Matching Algorithm Process
  • 24. 09/06/2013 iSemantics 2013 12 I1 I2 I2 owl:sameAs P1=d1:doctoralStudent P2=d2:education. academic.advisees Dataset 2Dataset 1 d2:theodore_harold_maiman I1=d1:Willis_Lamb I2 =d2:willis_lamb I1 I1 d1:Theodore_Maiman triple 1 triple 2 triple 3 triple 4 triple 5 Step 1 Step2 Step2 Candidate Matching Algorithm Process
  • 25. 09/06/2013 iSemantics 2013 12 I1 I2 I2 matching resources owl:sameAs P1=d1:doctoralStudent P2=d2:education. academic.advisees Dataset 2Dataset 1 property P1 and property P2 are a candidate match d2:theodore_harold_maiman I1=d1:Willis_Lamb I2 =d2:willis_lamb I1 I1 d1:Theodore_Maiman triple 1 triple 2 triple 3 triple 4 triple 5 Step 1 Step2 Step2 Step 3 Candidate Matching Algorithm Process
  • 26. 09/06/2013 iSemantics 2013 13 Complexity: If the average number of properties for an entity is x and for each property, average number of objects is j. For n subjects, it requires n*j2*x2+2n comparisons. Since n > j, n > x, and x and j are independent of n, O(n).
  • 28. Parallel computation (Map-Reduce implementation)  Generating candidate matches can be done for each instance independently. Hence, we implemented the algorithm in Hadoop 1.0.3 framework.  Generating candidate matches for instances is distributed among mappers and each mapper outputs μ and λ to the reducer for property pairs. • Map Phase – Let the number of subject instances in dataset D1 be X and namespace of dataset D2 be ns. For each subject i ϵ X, start a mapper job for GenerateCandidateMatches(i, ns). – Each mapper outputs (key,value) pairs as (p:q, μ(p,q):λ(p,q)). pϵD1 and qϵD2.  The reducer collects all μ and λ values and aggregate them for final analysis. • Reduce phase – Collects output from mappers and aggregates μ(p,q) and λ(p,q) for each key p:q.  The map reduce version on a 14 node cluster was able to achieve a speed up of 833% compared to the desktop version. 09/06/2013 iSemantics 2013 28
  • 29. Background Statistical Equivalence of properties Evaluation Discussion, interesting facts, and future directions Conclusion 09/06/2013 16 Roadmap iSemantics 2013
  • 30. Objectives of the evaluation – Show the effectiveness of the approach in linked datasets – Compare with existing aligning techniques   09/06/2013 iSemantics 2013 17 Evaluation
  • 31. Objectives of the evaluation – Show the effectiveness of the approach in linked datasets – Compare with existing aligning techniques We selected 5000 instance samples from DBpedia, Freebase, LinkedMDB, DBLP L3S , and DBLP RKB Explorer datasets.  09/06/2013 iSemantics 2013 17 Evaluation
  • 32. Objectives of the evaluation – Show the effectiveness of the approach in linked datasets – Compare with existing aligning techniques We selected 5000 instance samples from DBpedia, Freebase, LinkedMDB, DBLP L3S , and DBLP RKB Explorer datasets. These datasets have, – Complete data for instances in different viewpoints – Many inter-links – Complex properties 09/06/2013 iSemantics 2013 17 Evaluation
  • 33. Experiment details – α = 0.5 for all experiments (works for LOD) except DBpedia and Freebase movie alignment where it was 0.7. 09/06/2013 iSemantics 2013 18
  • 34. Experiment details – α = 0.5 for all experiments (works for LOD) except DBpedia and Freebase movie alignment where it was 0.7. – k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and Software between DBpedia and Freebase, Film between LinkedMDB and DBpedia, and article between DBLP datasets. 09/06/2013 iSemantics 2013 18
  • 35. Experiment details – α = 0.5 for all experiments (works for LOD) except DBpedia and Freebase movie alignment where it was 0.7. – k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and Software between DBpedia and Freebase, Film between LinkedMDB and DBpedia, and article between DBLP datasets. – k can be estimated using the data as follows, – Set α = 0.5 and k = 2 (lowest positive values). – Get exact matching property (property names) pairs not identified by the algorithm and their μ – Get the average of those μ values 09/06/2013 iSemantics 2013 18
  • 36. Experiment details – α = 0.5 for all experiments (works for LOD) except DBpedia and Freebase movie alignment where it was 0.7. – k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and Software between DBpedia and Freebase, Film between LinkedMDB and DBpedia, and article between DBLP datasets. – k can be estimated using the data as follows, – Set α = 0.5 and k = 2 (lowest positive values). – Get exact matching property (property names) pairs not identified by the algorithm and their μ – Get the average of those μ values – 0.92 for string similarity algorithms. 09/06/2013 iSemantics 2013 18
  • 37. Experiment details – α = 0.5 for all experiments (works for LOD) except DBpedia and Freebase movie alignment where it was 0.7. – k was set as 14, 6, 2, 2, and 2 respectively for Person, Film and Software between DBpedia and Freebase, Film between LinkedMDB and DBpedia, and article between DBLP datasets. – k can be estimated using the data as follows, – Set α = 0.5 and k = 2 (lowest positive values). – Get exact matching property (property names) pairs not identified by the algorithm and their μ – Get the average of those μ values – 0.92 for string similarity algorithms. – 0.8 for WordNet similarity. 09/06/2013 iSemantics 2013 18
  • 38. Measure type DBpedia – Freebase (Person) DBpedia – Freebase (Film) DBpedia – Freebase (Software) DBpedia – LinkedMDB (Film) DBLP_RKB – DBLP_L3S (Article) Average Extension Based Algorithm Precision 0.8758 0.9737 0.6478 0.7560 1.0000 0.8427 Recall 0.8089* 0.5138 0.4339 0.8157 1.0000 0.7145 F measure 0.8410* 0.6727 0.5197 0.7848 1.0000 0.7656 WordNet Similarity Precision 0.5200 0.8620 0.7619 0.8823 1.0000 0.8052 Recall 0.4140* 0.3472 0.3018 0.3947 0.3333 0.3582 F measure 0.4609* 0.4950 0.4324 0.5454 0.5000 0.4867 Dice Similarity Precision 0.8064 0.9666 0.7659 1.0000 0.0000 0.7078 Recall 0.4777* 0.4027 0.3396 0.3421 0.0000 0.3124 F measure 0.6000* 0.5686 0.4705 0.5098 0.0000 0.4298 Jaro Similarity Precision 0.6774 0.8809 0.7755 0.9411 0.0000 0.6550 Recall 0.5350* 0.5138 0.3584 0.4210 0.0000 0.3656 F measure 0.5978* 0.6491 0.4903 0.5818 0.0000 0.4638 09/06/2013 iSemantics 2013 19 Alignment results * Marks estimated values for experiment 1 because of very large comparisons to check manually. Boldface marks highest result for each experiment.
  • 39. Example identifications 09/06/2013 iSemantics 2013 20 Property pair types Dataset 1 (DBpedia) Dataset 2 (Freebase) Simple string similarity matches db:nationality fb:nationality db:religion fb:religion Synonymous matches db:occupation fb:profession db:battles fb:participated_in_conflicts Complex matches db:screenplay fb:written_by db:doctoralStudent fb:advisees
  • 40. Example identifications 09/06/2013 iSemantics 2013 20 Property pair types Dataset 1 (DBpedia) Dataset 2 (Freebase) Simple string similarity matches db:nationality fb:nationality db:religion fb:religion Synonymous matches db:occupation fb:profession db:battles fb:participated_in_conflicts Complex matches db:screenplay fb:written_by db:doctoralStudent fb:advisees WordNet similarity failed to identify any of these
  • 41. Background Statistical Equivalence of properties Evaluation Discussion, interesting facts, and future directions Conclusion 09/06/2013 21 Roadmap iSemantics 2013
  • 42.  Our experiment covered multi-domain to multi-domain, multi- domain to specific domain and specific-domain to specific-domain dataset property alignment.     09/06/2013 iSemantics 2013 22 Discussion, interesting facts, and future directions
  • 43.  Our experiment covered multi-domain to multi-domain, multi- domain to specific domain and specific-domain to specific-domain dataset property alignment.  In every experiment, the extension based algorithm outperformed others (F measure). F measure gain is in the range of 57% to 78%.    09/06/2013 iSemantics 2013 22 Discussion, interesting facts, and future directions
  • 44.  Our experiment covered multi-domain to multi-domain, multi- domain to specific domain and specific-domain to specific-domain dataset property alignment.  In every experiment, the extension based algorithm outperformed others (F measure). F measure gain is in the range of 57% to 78%.  Some properties that are identified are intentionally different, e.g., db:distributor vs fb:production_companies. – This is because many companies produce and also distribute their films.   09/06/2013 iSemantics 2013 22 Discussion, interesting facts, and future directions
  • 45.  Our experiment covered multi-domain to multi-domain, multi- domain to specific domain and specific-domain to specific-domain dataset property alignment.  In every experiment, the extension based algorithm outperformed others (F measure). F measure gain is in the range of 57% to 78%.  Some properties that are identified are intentionally different, e.g., db:distributor vs fb:production_companies. – This is because many companies produce and also distribute their films.  Some identified pairs are incorrect due to errors in data modeling. – For example, db:issue and fb:children.  09/06/2013 iSemantics 2013 22 Discussion, interesting facts, and future directions
  • 46.  Our experiment covered multi-domain to multi-domain, multi- domain to specific domain and specific-domain to specific-domain dataset property alignment.  In every experiment, the extension based algorithm outperformed others (F measure). F measure gain is in the range of 57% to 78%.  Some properties that are identified are intentionally different, e.g., db:distributor vs fb:production_companies. – This is because many companies produce and also distribute their films.  Some identified pairs are incorrect due to errors in data modeling. – For example, db:issue and fb:children.  owl:sameAs linking issues in LOD (not linking exact same thing), e.g., linking London and Greater London. – We believe few misused links wont affect the algorithm as it decides on a match after analyzing many matches for a pair. 09/06/2013 iSemantics 2013 22 Discussion, interesting facts, and future directions
  • 47.  Less number of interlinks. – Evolve over time. – Look for possible other types of ECR links (i.e., rdf:seeAlso).   09/06/2013 iSemantics 2013 23
  • 48.  Less number of interlinks. – Evolve over time. – Look for possible other types of ECR links (i.e., rdf:seeAlso).  Properties do not have uniform distribution in a dataset. – Hence, some properties do not have enough matches or appearances. – This is due to rare classes and domains they belong to. – We can run the algorithm on instances that these less frequent properties appear iteratively.  09/06/2013 iSemantics 2013 23
  • 49.  Less number of interlinks. – Evolve over time. – Look for possible other types of ECR links (i.e., rdf:seeAlso).  Properties do not have uniform distribution in a dataset. – Hence, some properties do not have enough matches or appearances. – This is due to rare classes and domains they belong to. – We can run the algorithm on instances that these less frequent properties appear iteratively. Current limitations, – Requires ECR links – Requires overlapping datasets – Object-type properties – Inability to identify property – sub property relationships 09/06/2013 iSemantics 2013 23
  • 50. Background Statistical Equivalence of properties Evaluation Discussion, interesting facts, and future directions Conclusion 09/06/2013 24 Roadmap iSemantics 2013
  • 51. We approximate owl:equivalentProperty using Statistical Equivalence of properties by analyzing property extensions, which is schema independent.    09/06/2013 iSemantics 2013 25 Conclusion
  • 52. We approximate owl:equivalentProperty using Statistical Equivalence of properties by analyzing property extensions, which is schema independent. This novel extension based approach works well with interlinked datasets.   09/06/2013 iSemantics 2013 25 Conclusion
  • 53. We approximate owl:equivalentProperty using Statistical Equivalence of properties by analyzing property extensions, which is schema independent. This novel extension based approach works well with interlinked datasets. The extension based approach outperforms syntax or dictionary based approaches. F measure gain in the range of 57% - 78%.  09/06/2013 iSemantics 2013 25 Conclusion
  • 54. We approximate owl:equivalentProperty using Statistical Equivalence of properties by analyzing property extensions, which is schema independent. This novel extension based approach works well with interlinked datasets. The extension based approach outperforms syntax or dictionary based approaches. F measure gain in the range of 57% - 78%. It requires many comparisons, but can be easily parallelized evidenced by our Map-Reduce implementation. 09/06/2013 iSemantics 2013 25 Conclusion
  • 55. 26 Thank You http://knoesis.wright.edu/researchers/kalpa kalpa@knoesis.org Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA Questions ? 09/06/2013 iSemantics 2013

Editor's Notes

  1. k in some datasets are low because there are low appearing property pairs as well as popular property pairs. Because of this, sometimes lower k values as 2 produces better results but may be with low confidence.
  2. k in some datasets are low because there are low appearing property pairs as well as popular property pairs. Because of this, sometimes lower k values as 2 produces better results but may be with low confidence.
  3. k in some datasets are low because there are low appearing property pairs as well as popular property pairs. Because of this, sometimes lower k values as 2 produces better results but may be with low confidence.
  4. k in some datasets are low because there are low appearing property pairs as well as popular property pairs. Because of this, sometimes lower k values as 2 produces better results but may be with low confidence.
  5. k in some datasets are low because there are low appearing property pairs as well as popular property pairs. Because of this, sometimes lower k values as 2 produces better results but may be with low confidence.