Using Substitutive Itemset Mining Framework for
Finding Synonymous Properties in Linked Data
Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski
Poznan University of Technology, Poland
August 3rd, 2015
RuleML 2015
Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski ( Poznan University of Technology, Poland )Using Substitutive Itemset Mining Framework for Finding Synonymous Properties in Linked
August 3rd, 2015 RuleML 20
/ 15
Outline
Motivating Scenario
Substitutive Sets Mining
Finding Synonymous Properties with Substitutive Sets Mining
Summary
Motivating scenario 1/3
[Figure: wiki (mappings Wikipedia infoboxes -> DBpedia ontology), with example triples:]
Norah_Jones    dbpedia-prop:origin    Denton,_Texas
Mark_Knopfler  dbpedia-owl:hometown   Gosforth
Peter_Gabriel  dbpedia-owl:hometown   Godalming
Motivating scenario 2/3
DBpedia 2014 ontology has 1310 object and 1725 data properties
Many large Linked Data datasets use relatively lightweight schemas with a
high number of object properties
Motivating scenario 3/3
[Figure: wiki (mappings Wikipedia infoboxes -> DBpedia ontology), with example triples; each subject is typed dbpedia-owl:MusicalArtist and each object dbpedia-owl:PopulatedPlace:]
Norah_Jones    dbpedia-prop:origin    Denton,_Texas
Mark_Knopfler  dbpedia-owl:hometown   Gosforth
Peter_Gabriel  dbpedia-owl:hometown   Godalming
Substitutive Sets Mining Framework
[Figure: framework diagram with the components Transaction DB, Frequent Itemset Mining, and Substitutive Set Generation]
Frequent Itemsets
$I = \{i_1, i_2, \ldots, i_m\}$: a set of items
$D_T = \{t_1, t_2, \ldots, t_n\}$, where $\forall i\; t_i \subseteq I$: a database of transactions
$\mathrm{support}(X) = \dfrac{|\{t \in D_T : X \subseteq t\}|}{|D_T|}$
ID   Items
1    Nachos, Pepsi, Salsa
2    Nachos, Coca-Cola, Salsa
3    Nachos, Coca-Cola
4    Nachos, Pepsi, Salsa
5    Milk, Bread

Frequent Itemset       Support
{Nachos}               80%
{Salsa}                60%
{Coca-Cola}            40%
{Pepsi}                40%
{Nachos, Salsa}        60%
{Nachos, Coca-Cola}    40%
{Nachos, Pepsi}        40%
{Salsa, Pepsi}         40%
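The support values in the table can be reproduced with a short brute-force sketch. This is an illustration only, not the FP-Growth implementation used in the experiments. The function names, the 40% threshold, and the cap at two items per itemset are assumptions chosen so that the output matches the table, which lists no itemsets larger than two.

```python
from itertools import combinations

# Toy transaction database from the slide.
transactions = [
    {"Nachos", "Pepsi", "Salsa"},
    {"Nachos", "Coca-Cola", "Salsa"},
    {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi", "Salsa"},
    {"Milk", "Bread"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def frequent_itemsets(db, min_support, max_size):
    """Brute-force enumeration; adequate for a toy example only."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(candidate, db)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

# Assumed parameters: min_support=0.4 and max_size=2 reproduce the table.
L = frequent_itemsets(transactions, min_support=0.4, max_size=2)
for itemset, s in sorted(L.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
```

Raising `max_size` to 3 would also report {Nachos, Pepsi, Salsa}, which appears in two of the five transactions.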
Covering Set
$CS(i \mid L) = \{X \in L : \{i\} \cup X \in L\}$
$\mathrm{coverage}(i \mid L) = |CS(i \mid L)|$
Frequent Itemset
{Nachos}
{Salsa}
{Coca-Cola}
{Pepsi}
{Nachos, Salsa}
{Nachos, Coca-Cola}
{Nachos, Pepsi}
{Salsa, Pepsi}
i             CS(i)                                  coverage
{Nachos}      {{Salsa}, {Coca-Cola}, {Pepsi}}        3
{Salsa}       {{Nachos}, {Pepsi}}                    2
{Coca-Cola}   {{Nachos}}                             1
{Pepsi}       {{Nachos}, {Salsa}}                    2
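The covering-set definition can be checked against the list of frequent itemsets with a few lines of Python. This is a minimal sketch and the function names are illustrative. It assumes that X must not already contain i, since otherwise {i} itself would trivially satisfy the condition. Under this reading, {Pepsi} belongs to CS(Salsa) because {Salsa, Pepsi} is in L.

```python
# Frequent itemsets L from the slide, stored as frozensets.
L = {frozenset(s) for s in [
    {"Nachos"}, {"Salsa"}, {"Coca-Cola"}, {"Pepsi"},
    {"Nachos", "Salsa"}, {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi"}, {"Salsa", "Pepsi"},
]}

def covering_set(item, frequent):
    """CS(item | L): itemsets X in L such that {item} ∪ X is also in L.

    Assumption: X must not already contain item.
    """
    return {x for x in frequent if item not in x and (x | {item}) in frequent}

def coverage(item, frequent):
    """coverage(item | L): the size of the covering set."""
    return len(covering_set(item, frequent))

for item in ["Nachos", "Salsa", "Coca-Cola", "Pepsi"]:
    cs = covering_set(item, L)
    print(item, sorted(sorted(x) for x in cs), coverage(item, L))
```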
Substitutive Sets
A two-element itemset $\{x, y\}$ is a substitutive itemset if:
$x \in L_1$,
$y \in L_1$,
$\mathrm{support}(\{x\} \cup \{y\}) < \varepsilon$, where $\varepsilon$ is a user-defined threshold
representing the highest amount of noise allowed in the data,
$\dfrac{|CS(x \mid L) \cap CS(y \mid L)|}{\max\{|CS(x \mid L)|, |CS(y \mid L)|\}} \geq \mathit{mincommon}$.
i             CS(i)                                  coverage
{Nachos}      {{Salsa}, {Coca-Cola}, {Pepsi}}        3
{Salsa}       {{Nachos}, {Pepsi}}                    2
{Coca-Cola}   {{Nachos}}                             1
{Pepsi}       {{Nachos}, {Salsa}}                    2
$\dfrac{|CS(\text{Pepsi}) \cap CS(\text{Coca-Cola})|}{\max\{|CS(\text{Pepsi})|, |CS(\text{Coca-Cola})|\}} = \dfrac{1}{2} = 0.5$
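The conditions above can be combined into one predicate, sketched here on the toy data. This is not the code of the RapidMiner operator. The helper names are hypothetical, the covering set is assumed to exclude itemsets that already contain the item, and the values epsilon = 0.1 and mincommon = 0.5 are chosen only for illustration.

```python
# Toy transactions and frequent itemsets from the slides.
transactions = [
    {"Nachos", "Pepsi", "Salsa"},
    {"Nachos", "Coca-Cola", "Salsa"},
    {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi", "Salsa"},
    {"Milk", "Bread"},
]
L = {frozenset(s) for s in [
    {"Nachos"}, {"Salsa"}, {"Coca-Cola"}, {"Pepsi"},
    {"Nachos", "Salsa"}, {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi"}, {"Salsa", "Pepsi"},
]}

def support(itemset, db):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def covering_set(item, frequent):
    """Itemsets X in L (not containing item) with {item} ∪ X also in L."""
    return {x for x in frequent if item not in x and (x | {item}) in frequent}

def common_ratio(x, y, frequent):
    """|CS(x) ∩ CS(y)| / max(|CS(x)|, |CS(y)|), or 0.0 if both are empty."""
    cs_x, cs_y = covering_set(x, frequent), covering_set(y, frequent)
    largest = max(len(cs_x), len(cs_y))
    return len(cs_x & cs_y) / largest if largest else 0.0

def is_substitutive(x, y, db, frequent, epsilon, mincommon):
    """Check the four conditions for {x, y} being a substitutive itemset."""
    if frozenset({x}) not in frequent or frozenset({y}) not in frequent:
        return False  # both items must be frequent on their own
    if support({x, y}, db) >= epsilon:
        return False  # the items co-occur too often to be substitutes
    return common_ratio(x, y, frequent) >= mincommon

print(common_ratio("Pepsi", "Coca-Cola", L))
print(is_substitutive("Pepsi", "Coca-Cola", transactions, L, epsilon=0.1, mincommon=0.5))
print(is_substitutive("Pepsi", "Salsa", transactions, L, epsilon=0.1, mincommon=0.5))
```

Pepsi and Coca-Cola never appear in the same transaction and share the context {Nachos}, so they pass with these thresholds. Pepsi and Salsa are bought together in 40% of the transactions and are rejected by the epsilon condition.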
Create Substitutive Sets RapidMiner operator
Use Case: DBpedia
DBpedia knowledge base version 2014
sets of 3-item transactions {c1, p, c2}, where c1 and c2 are the classes of the
subject s and object o of an RDF triple, and p is the property connecting s and o
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?c1 ?p ?c2
WHERE {
  ?s rdf:type dbpedia-owl:Organisation .
  ?s ?p ?o .
  ?s rdf:type ?c1 .
  ?o rdf:type ?c2 .
  FILTER(?p != dbpedia-owl:wikiPageWikiLink) .
  FILTER(?p != rdf:type) .
  FILTER(?p != dbpedia-owl:wikiPageExternalLink) .
  FILTER(?p != dbpedia-owl:wikiPageID) .
  FILTER(?p != dbpedia-owl:wikiPageInterLanguageLink) .
  FILTER(?p != dbpedia-owl:wikiPageLength) .
  FILTER(?p != dbpedia-owl:wikiPageOutDegree) .
  FILTER(?p != dbpedia-owl:wikiPageRedirects) .
  FILTER(?p != dbpedia-owl:wikiPageRevisionID)
}
Transaction generation
[Figure: two example triples annotated with their roles: subject s (Norah_Jones, Mark_Knopfler) of class c1 dbpedia-owl:MusicalArtist, property p (dbpedia-prop:origin, dbpedia-owl:hometown), and object o (Denton,_Texas, Gosforth) of class c2 dbpedia-owl:PopulatedPlace]
Transactions
{c1 dbpedia-owl:MusicalArtist, dbpedia-owl:hometown, c2 dbpedia-owl:PopulatedPlace}
{c1 dbpedia-owl:MusicalArtist, dbpedia-prop:origin, c2 dbpedia-owl:PopulatedPlace}
...
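The transaction generation step can be sketched as follows. The rows are hypothetical (?c1, ?p, ?c2) query results that mirror the three example triples from the motivating scenario. The helper name and the way the class items are tagged with their role are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (?c1, ?p, ?c2) rows, as they might be returned by the query.
rows = [
    ("dbpedia-owl:MusicalArtist", "dbpedia-prop:origin", "dbpedia-owl:PopulatedPlace"),
    ("dbpedia-owl:MusicalArtist", "dbpedia-owl:hometown", "dbpedia-owl:PopulatedPlace"),
    ("dbpedia-owl:MusicalArtist", "dbpedia-owl:hometown", "dbpedia-owl:PopulatedPlace"),
]

def to_transaction(c1, p, c2):
    """Build a 3-item transaction {c1, p, c2}.

    The classes are tagged with their role (assumed convention), so that a
    class seen as the subject type is a different item from the same class
    seen as the object type.
    """
    return frozenset({f"c1 {c1}", p, f"c2 {c2}"})

transactions = [to_transaction(*row) for row in rows]

# Count identical transactions to see how often each property links the classes.
counts = Counter(transactions)
for transaction, n in counts.items():
    print(n, sorted(transaction))
```

In this sample both properties occur in the same context (the two tagged class items) but never in the same transaction, which is the pattern the substitutive set conditions look for.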
Experimental Setup
FP-Growth: min number of itemsets = 500, max number of retries = 15,
min support = 1.0E-4
Create Substitutive Sets: min support = 1.0E-4, min common = 0.7,
epsilon = 1.0E-5
a sample of 100k results per query
desktop computer with 12 GB RAM and an Intel(R) Core(TM) i5-4570 CPU at
3.20 GHz
a single run of mining substitutive sets (for a single class and 100k
transactions) took several seconds on average (ranging from 2 s to 12 s)
Sample substitutive properties for the class Organisation
Item X                        Item Y                        Common
dbpprop:parentOrganization    dbo:parentOrganisation        1.000
dbpprop:owner                 dbo:owner                     1.000
dbpprop:origin                dbo:hometown                  1.000
dbpprop:headquarters          dbpprop:parentOrganization    1.000
dbpprop:formerAffiliations    dbo:formerBroadcastNetwork    1.000
dbo:product                   dbpprop:products              1.000
dbpprop:keyPeople             dbo:keyPerson                 0.910
dbpprop:commandStructure      dbpprop:branch                0.857
dbo:schoolPatron              dbo:foundedBy                 0.835
dbpprop:notableCommanders     dbo:notableCommander          0.824
dbo:recordLabel               dbpprop:label                 0.803
dbo:headquarter               dbo:locationCountry           0.803
dbpprop:country               dbo:state                     0.753
Summary
Introduced a model for substitutive itemsets mining
Preliminary tests of this model within the task of deduplication of
object properties in an RDF dataset (DBpedia)
Acknowledgements
Foundation for Polish Science under the POMOST programme,
co-financed by the European Union, Regional Development Fund (No
POMOST/2013-7/8) (2013-2015)
EU FP7 ICT-2007.4.4 (No 231519) "e-LICO: An e-Laboratory for
Interdisciplinary Collaborative Research in Data Mining and
Data-Intensive Science" (2009-2012)
Thanks to Ewa Kowalczuk for debugging the RapidMiner plugin

More Related Content

What's hot

Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingVrije Universiteit Amsterdam
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013Luis Daniel Ibáñez
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...Paolo Missier
 
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A CloudScalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud Paolo Missier
 

What's hot (6)

Oshs_9_11_2015
Oshs_9_11_2015Oshs_9_11_2015
Oshs_9_11_2015
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic Programming
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A CloudScalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
 
Streams&amp;io
Streams&amp;ioStreams&amp;io
Streams&amp;io
 

Viewers also liked

Semantic data mining: an ontology based approach
Semantic data mining: an ontology based approachSemantic data mining: an ontology based approach
Semantic data mining: an ontology based approachAgnieszka Ławrynowicz
 
Semantic Meta-Mining of Knowledge Discovery Processes
Semantic Meta-Mining of Knowledge Discovery ProcessesSemantic Meta-Mining of Knowledge Discovery Processes
Semantic Meta-Mining of Knowledge Discovery ProcessesAgnieszka Ławrynowicz
 
Hazardous Situation Ontology Design Pattern
Hazardous Situation Ontology Design Pattern Hazardous Situation Ontology Design Pattern
Hazardous Situation Ontology Design Pattern Agnieszka Ławrynowicz
 
Data Mining OPtimization Ontology and its application to meta-mining of knowl...
Data Mining OPtimization Ontology and its application to meta-mining of knowl...Data Mining OPtimization Ontology and its application to meta-mining of knowl...
Data Mining OPtimization Ontology and its application to meta-mining of knowl...Agnieszka Ławrynowicz
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
 

Viewers also liked (9)

Semantic data mining: an ontology based approach
Semantic data mining: an ontology based approachSemantic data mining: an ontology based approach
Semantic data mining: an ontology based approach
 
Semantic Meta-Mining of Knowledge Discovery Processes
Semantic Meta-Mining of Knowledge Discovery ProcessesSemantic Meta-Mining of Knowledge Discovery Processes
Semantic Meta-Mining of Knowledge Discovery Processes
 
ML Schema: Machine Learning Schema
ML Schema: Machine Learning SchemaML Schema: Machine Learning Schema
ML Schema: Machine Learning Schema
 
Hazardous Situation Ontology Design Pattern
Hazardous Situation Ontology Design Pattern Hazardous Situation Ontology Design Pattern
Hazardous Situation Ontology Design Pattern
 
ZTG 2013 Agnieszka Ławrynowicz
ZTG 2013 Agnieszka ŁawrynowiczZTG 2013 Agnieszka Ławrynowicz
ZTG 2013 Agnieszka Ławrynowicz
 
Data Mining OPtimization Ontology and its application to meta-mining of knowl...
Data Mining OPtimization Ontology and its application to meta-mining of knowl...Data Mining OPtimization Ontology and its application to meta-mining of knowl...
Data Mining OPtimization Ontology and its application to meta-mining of knowl...
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Data science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebookData science and_analytics_for_ordinary_people_ebook
Data science and_analytics_for_ordinary_people_ebook
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 

Similar to Finding Synonymous Properties in Linked Data Using Substitutive Itemset Mining

Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinalDeborah McGuinness
 
Making Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresMaking Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresThomas Gottron
 
Inductive Entity Typing Alignment
Inductive Entity Typing AlignmentInductive Entity Typing Alignment
Inductive Entity Typing AlignmentGiuseppe Rizzo
 
Bio it 2005_rdf_workshop05
Bio it 2005_rdf_workshop05Bio it 2005_rdf_workshop05
Bio it 2005_rdf_workshop05Joanne Luciano
 
Ontology based clustering algorithms
Ontology based clustering algorithmsOntology based clustering algorithms
Ontology based clustering algorithmsIkutwa
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of DataRinke Hoekstra
 
Kid171 chap02 english version
Kid171 chap02 english versionKid171 chap02 english version
Kid171 chap02 english versionFrank S.C. Tseng
 
La résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesLa résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesData2B
 
Linked Open Data (LOD) part 1
Linked Open Data (LOD) part 1Linked Open Data (LOD) part 1
Linked Open Data (LOD) part 1IPLODProject
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...Nikolaos Konstantinou
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Maulik Kamdar
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)Frank van Harmelen
 
Online Relation Alignment for Linked Datasets
Online Relation Alignment for Linked DatasetsOnline Relation Alignment for Linked Datasets
Online Relation Alignment for Linked DatasetsMaria Koutraki
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudDhaval Thakker
 

Similar to Finding Synonymous Properties in Linked Data Using Substitutive Itemset Mining (20)

Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal
 
Making Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresMaking Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index Structures
 
Inductive Entity Typing Alignment
Inductive Entity Typing AlignmentInductive Entity Typing Alignment
Inductive Entity Typing Alignment
 
Bio it 2005_rdf_workshop05
Bio it 2005_rdf_workshop05Bio it 2005_rdf_workshop05
Bio it 2005_rdf_workshop05
 
Ontology based clustering algorithms
Ontology based clustering algorithmsOntology based clustering algorithms
Ontology based clustering algorithms
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
Kid171 chap02 english version
Kid171 chap02 english versionKid171 chap02 english version
Kid171 chap02 english version
 
La résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesLa résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphes
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Linked Open Data (LOD) part 1
Linked Open Data (LOD) part 1Linked Open Data (LOD) part 1
Linked Open Data (LOD) part 1
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...
VisAVis: An Approach to an Intermediate Layer between Ontologies and Relation...
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...
 
ConQueSt
ConQueStConQueSt
ConQueSt
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)
 
Online Relation Alignment for Linked Datasets
Online Relation Alignment for Linked DatasetsOnline Relation Alignment for Linked Datasets
Online Relation Alignment for Linked Datasets
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 

Recently uploaded

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 

Recently uploaded (20)

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 

Finding Synonymous Properties in Linked Data Using Substitutive Itemset Mining

  • 1. Using Substitutive Itemset Mining Framework for Finding Synonymous Properties in Linked Data Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski Poznan University of Technology, Poland August 3rd, 2015 RuleML 2015 Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski ( Poznan University of Technology, Poland )Using Substitutive Itemset Mining Framework for Finding Synonymous Properties in Linked August 3rd, 2015 RuleML 20 / 15
  • 2. Outline Motivating Scenario Substitutive Sets Mining Finding Synonymous Properties with Substitutive Sets Mining Summary Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski ( Poznan University of Technology, Poland )Using Substitutive Itemset Mining Framework for Finding Synonymous Properties in Linked August 3rd, 2015 RuleML 20 / 15
  • 3. Motivating scenario 1/3 Wiki     (mappings     Wikipedia  infoboxes  -­‐>   DBpedia  ontology)   Norah_Jones     Denton,_Texas   dbpedia-­‐prop:origin   Mark_Knopfler   Gosforth   dbpedia-­‐owl:hometown   Peter_Gabriel   Godalming   dbpedia-­‐owl:hometown   Mikolaj Morzy, Agnieszka Lawrynowicz, Mateusz Zozulinski ( Poznan University of Technology, Poland )Using Substitutive Itemset Mining Framework for Finding Synonymous Properties in Linked August 3rd, 2015 RuleML 20 / 15
  • 4. Motivating scenario 2/3. The DBpedia 2014 ontology has 1310 object properties and 1725 data properties. Many large Linked Data datasets use relatively lightweight schemas with a high number of object properties.
  • 5. Motivating scenario 3/3. Wiki (mappings Wikipedia infoboxes -> DBpedia ontology). The same example triples, with each subject typed as dbpedia-owl:MusicalArtist and each object typed as dbpedia-owl:PopulatedPlace: Norah_Jones dbpedia-prop:origin Denton,_Texas; Mark_Knopfler dbpedia-owl:hometown Gosforth; Peter_Gabriel dbpedia-owl:hometown Godalming.
  • 6. Substitutive Sets Mining Framework: Transaction DB -> Frequent Itemset Mining -> Substitutive Set Generation.
  • 7. Frequent Itemsets
    I = {i1, i2, ..., im} is a set of items.
    DT = {t1, t2, ..., tn}, where ∀i ti ⊆ I, is a database of transactions.
    support(X) = |{t ∈ DT : X ⊆ t}| / |DT|

    ID  Items
    1   Nachos, Pepsi, Salsa
    2   Nachos, Coca-Cola, Salsa
    3   Nachos, Coca-Cola
    4   Nachos, Pepsi, Salsa
    5   Milk, Bread

    Frequent itemset      Support
    {Nachos}              80%
    {Salsa}               60%
    {Coca-Cola}           40%
    {Pepsi}               40%
    {Nachos, Salsa}       60%
    {Nachos, Coca-Cola}   40%
    {Nachos, Pepsi}       40%
    {Salsa, Pepsi}        40%
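The support definition on this slide can be checked against the toy database with a short Python sketch. The function names, the brute-force enumeration, the 40% threshold, and the `max_size=2` cap (chosen so the output matches the itemsets the slide lists) are assumptions made for illustration; the experiments reported later in the talk use FP-Growth instead.

```python
from itertools import combinations

# Transaction database from the slide (IDs 1 to 5).
transactions = [
    {"Nachos", "Pepsi", "Salsa"},
    {"Nachos", "Coca-Cola", "Salsa"},
    {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi", "Salsa"},
    {"Milk", "Bread"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def frequent_itemsets(db, min_support, max_size=2):
    """Brute-force enumeration (hypothetical helper; suitable for toy data only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, db)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

# Assumed threshold of 40%, which reproduces the eight itemsets on the slide.
freq = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), f"{s:.0%}")
```

Brute-force enumeration grows exponentially with the number of items, so it is only meant to make the definition concrete.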
  • 8. Covering Set
    CS(i|L) = {X ∈ L : {i} ∪ X ∈ L}
    coverage(i|L) = |CS(i|L)|

    Frequent itemsets L: {Nachos}, {Salsa}, {Coca-Cola}, {Pepsi}, {Nachos, Salsa}, {Nachos, Coca-Cola}, {Nachos, Pepsi}, {Salsa, Pepsi}

    i            CS(i)                               coverage
    {Nachos}     {{Salsa}, {Coca-Cola}, {Pepsi}}     3
    {Salsa}      {{Nachos}}                          1
    {Coca-Cola}  {{Nachos}}                          1
    {Pepsi}      {{Nachos}, {Salsa}}                 2
  • 9. Substitutive Sets
    A two-element itemset {x, y} is a substitutive itemset if:
    x ∈ L1, y ∈ L1,
    support({x} ∪ {y}) < ε, where ε is a user-defined threshold representing the highest amount of noise allowed in the data,
    |CS(x|L) ∩ CS(y|L)| / max{|CS(x|L)|, |CS(y|L)|} ⩾ mincommon.

    i            CS(i)                               coverage
    {Nachos}     {{Salsa}, {Coca-Cola}, {Pepsi}}     3
    {Salsa}      {{Nachos}}                          1
    {Coca-Cola}  {{Nachos}}                          1
    {Pepsi}      {{Nachos}, {Salsa}}                 2

    |CS(Pepsi) ∩ CS(Coca-Cola)| / max{|CS(Pepsi)|, |CS(Coca-Cola)|} = 1/2 = 0.5
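The three conditions above can be sketched in Python using the frequent itemsets listed on the slides. This is a minimal illustration, not the RapidMiner operator: the function names and the ε value in the usage lines are hypothetical, and the covering set is assumed to exclude itemsets that already contain the item itself.

```python
# Frequent itemsets L as listed on the slides.
L = {frozenset(s) for s in [
    {"Nachos"}, {"Salsa"}, {"Coca-Cola"}, {"Pepsi"},
    {"Nachos", "Salsa"}, {"Nachos", "Coca-Cola"},
    {"Nachos", "Pepsi"}, {"Salsa", "Pepsi"},
]}

def covering_set(item, L):
    """CS(i|L): itemsets X in L such that {i} ∪ X is also in L.

    Assumption: X must not already contain i.
    """
    return {X for X in L if item not in X and (X | {item}) in L}

def common_ratio(x, y, L):
    """|CS(x) ∩ CS(y)| / max(|CS(x)|, |CS(y)|), or 0.0 if both are empty."""
    cs_x, cs_y = covering_set(x, L), covering_set(y, L)
    denom = max(len(cs_x), len(cs_y))
    return len(cs_x & cs_y) / denom if denom else 0.0

def is_substitutive(x, y, L, pair_support, epsilon, mincommon):
    """Check the three conditions for the pair {x, y}."""
    return (
        frozenset({x}) in L and frozenset({y}) in L  # both are frequent 1-itemsets
        and pair_support < epsilon                   # they rarely co-occur
        and common_ratio(x, y, L) >= mincommon       # they share enough context
    )

# Pepsi and Coca-Cola never co-occur in the toy database, so their joint support is 0.
print(common_ratio("Pepsi", "Coca-Cola", L))  # 0.5
print(is_substitutive("Pepsi", "Coca-Cola", L,
                      pair_support=0.0, epsilon=0.1, mincommon=0.5))
```

Applied literally to the eight listed itemsets, this sketch returns {{Nachos}, {Pepsi}} for Salsa, since {Salsa, Pepsi} is listed as frequent; the Pepsi/Coca-Cola ratio of 0.5 is unaffected.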
  • 10. Create Substitutive Sets: RapidMiner operator.
  • 11. Use Case: DBpedia
    DBpedia knowledge base, version 2014.
    Sets of 3-item transactions {c1, p, c2}, where c1 and c2 are the classes of the subject and object of an RDF triple, and p is the property connecting subject s and object o.

    # Standard prefix declarations, so the query is self-contained
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

    SELECT ?c1 ?p ?c2
    WHERE {
      ?s rdf:type dbpedia-owl:Organization .
      ?s ?p ?o .
      ?s rdf:type ?c1 .
      ?o rdf:type ?c2 .
      # Exclude rdf:type and wiki page bookkeeping properties
      FILTER(?p != dbpedia-owl:wikiPageWikiLink) .
      FILTER(?p != rdf:type) .
      FILTER(?p != dbpedia-owl:wikiPageExternalLink) .
      FILTER(?p != dbpedia-owl:wikiPageID) .
      FILTER(?p != dbpedia-owl:wikiPageInterLanguageLink) .
      FILTER(?p != dbpedia-owl:wikiPageLength) .
      FILTER(?p != dbpedia-owl:wikiPageOutDegree) .
      FILTER(?p != dbpedia-owl:wikiPageRedirects) .
      FILTER(?p != dbpedia-owl:wikiPageRevisionID)
    }
  • 12. Transaction generation
    Diagram: s = Norah_Jones (c1 = dbpedia-owl:MusicalArtist), p = dbpedia-prop:origin, o = Denton,_Texas (c2 = dbpedia-owl:PopulatedPlace); s = Mark_Knopfler (c1 = dbpedia-owl:MusicalArtist), p = dbpedia-owl:hometown, o = Gosforth (c2 = dbpedia-owl:PopulatedPlace).
    Transactions:
    {c1 dbpedia-owl:MusicalArtist, dbpedia-owl:hometown, c2 dbpedia-owl:PopulatedPlace}
    {c1 dbpedia-owl:MusicalArtist, dbpedia-prop:origin, c2 dbpedia-owl:PopulatedPlace}
    ...
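The mapping from typed triples to 3-item transactions can be sketched in Python as follows. The in-memory `triples` and `types` structures and the `make_transactions` helper are hypothetical stand-ins for the SPARQL result rows; the "c1 " and "c2 " prefixes mirror the slide's notation and keep subject classes distinct from object classes.

```python
# Toy triples (s, p, o) and class assertions taken from the slide's diagram.
triples = [
    ("Norah_Jones", "dbpedia-prop:origin", "Denton,_Texas"),
    ("Mark_Knopfler", "dbpedia-owl:hometown", "Gosforth"),
]
types = {
    "Norah_Jones": ["dbpedia-owl:MusicalArtist"],
    "Mark_Knopfler": ["dbpedia-owl:MusicalArtist"],
    "Denton,_Texas": ["dbpedia-owl:PopulatedPlace"],
    "Gosforth": ["dbpedia-owl:PopulatedPlace"],
}

def make_transactions(triples, types):
    """Build one transaction {c1, p, c2} per (subject class, object class) pair."""
    result = []
    for s, p, o in triples:
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                # Tag classes with their role so the same class can appear
                # as a distinct item on the subject and the object side.
                result.append(frozenset({f"c1 {c1}", p, f"c2 {c2}"}))
    return result

transactions = make_transactions(triples, types)
for t in transactions:
    print(sorted(t))
```

Because both transactions share the same c1 and c2 items and differ only in the property, dbpedia-prop:origin and dbpedia-owl:hometown end up with overlapping covering sets, which is what the substitutive-set test looks for.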
  • 13. Experimental Setup
    FP-Growth: min number of itemsets = 500, max number of retries = 15, min support = 1.0E-4.
    Create Substitutive Sets: min support = 1.0E-4, min common = 0.7, epsilon = 1.0E-5.
    A sample of 100k results for each query.
    Desktop computer with 12 GB RAM and an Intel(R) Core(TM) i5-4570 CPU at 3.20 GHz.
    A single run of mining substitutive sets (for a single class and 100k transactions) took from 2 s to 12 s.
  • 14. Sample substitutive properties for the class Organisation

    Item X                        Item Y                        Common
    dbpprop:parentOrganization    dbo:parentOrganisation        1.000
    dbpprop:owner                 dbo:owner                     1.000
    dbpprop:origin                dbo:hometown                  1.000
    dbpprop:headquarters          dbpprop:parentOrganization    1.000
    dbpprop:formerAffiliations    dbo:formerBroadcastNetwork    1.000
    dbo:product                   dbpprop:products              1.000
    dbpprop:keyPeople             dbo:keyPerson                 0.910
    dbpprop:commandStructure      dbpprop:branch                0.857
    dbo:schoolPatron              dbo:foundedBy                 0.835
    dbpprop:notableCommanders     dbo:notableCommander          0.824
    dbo:recordLabel               dbpprop:label                 0.803
    dbo:headquarter               dbo:locationCountry           0.803
    dbpprop:country               dbo:state                     0.753
  • 15. Summary. Introduced a model for substitutive itemset mining. Ran preliminary tests of this model on the task of deduplicating object properties in an RDF dataset (DBpedia).
  • 16. Acknowledgements
    Foundation for Polish Science under the POMOST programme, cofinanced from the European Union, Regional Development Fund (No POMOST/2013-7/8) (2013-2015).
    EU FP7 ICT-2007.4.4 (No 231519) "e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science" (2009-2012).
    Thanks to Ewa Kowalczuk for debugging the RapidMiner plugin.