A Scalable Approach to Learn Semantic
Models of Structured Sources
Mohsen Taheriyan
Craig Knoblock
Pedro Szekely
Jose Luis Ambite
8th IEEE International Conference on Semantic Computing
Motivation
How to express the intended meaning of data?
Explicit semantics are missing from many structured sources
creator? actor? rightsHolder?
artwork? movie? referenced entity?
Map the Source to the Domain Ontology
Data Source: artworks in the Indianapolis Museum of Art
Domain ontologies:
EDM: Europeana Data Model
SKOS: Simple Knowledge Organization System
FOAF: Friend of a Friend
AAC: American Art Collaborative
ElementsGr2: RDA Group 2 Elements
ORE: Open Archives Initiative Object Reuse and Exchange
DCTerms: Dublin Core Metadata Terms
Semantic Model: a mapping from the source to the concepts and relationships defined by the domain ontologies
Semantic Model
[Figure: semantic model graph. Classes: aac:CulturalHeritageObject, edm:WebResource, skos:Concept, aac:Person, edm:EuropeanaAggregation. Properties: dcterms:title, edm:aggregatedCHO, skos:prefLabel, ElementsGr2:nameOfThePerson, rdf:type, edm:hasResource, dcterms:creator, edm:hasType, dcterms:description]
Key ingredient to automate source discovery, data integration,
and publishing RDF triples
Problem:
How to automatically learn a semantic model for a source?
Main Idea
Sources in the same domain often have similar data
Exploit knowledge of known semantic models to
hypothesize a semantic model for a new source
Previous Approach (ISWC 2013)
Input
• Sample data from new source (S)
• Domain Ontologies (O)
• Known semantic models
Steps
1. Learn semantic types for attributes(S)
2. Construct Graph G=(V,E)
3. Generate mappings between attributes(S) and V
4. Generate and rank semantic models
Output
• A ranked set of semantic models for S
Limitations
• Considers only one semantic type (label) for each attribute
• Does not scale to sources with a large number of attributes
Contributions
• Consider K candidate semantic types per attribute
• A beam search algorithm to score and prune the mappings
Example
New source: Indianapolis Museum of Art
S(title, label, image, type, artist)
Domain ontologies: EDM, SKOS, FOAF, AAC, ElementsGr2, ORE, DCTerms
Known Semantic Models:
S1: Dallas Museum
S1(title, creationDate, name, birthDate, deathDate, type)
S2: The Metropolitan Museum of Art
S2(name, copyright, materials, dimensions, imageUrl)
Goal: Semantic model for source S
[Figures: Semantic model of S1; Semantic model of S2]
Learn Semantic Types
• A CRF-based machine learning technique to learn Semantic Types for each
attribute from its data
• Semantic Type
– Ontology Class: <class_uri>
– Data Property + Domain Class: <class_uri, property_uri>
• Pick top K semantic types according to their confidence values
New source: S(title, label, image, type, artist)
Candidate semantic types for each attribute, with confidence values:
title  <aac:CulturalHeritageObject, dcterms:title> 0.19
       <aac:CulturalHeritageObject, rdfs:label> 0.08
label  <aac:CulturalHeritageObject, dcterms:description> 0.7
       <aac:Person, ElementsGr2:note> 0.03
image  <edm:WebResource> 0.58
       <foaf:Document> 0.41
type   <skos:Concept, skos:prefLabel> 0.82
       <skos:Concept, rdfs:label> 0.15
artist <foaf:Person, foaf:name> 0.27
       <aac:Person, ElementsGr2:nameOfThePerson> 0.19
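The "pick top K" step can be sketched in a few lines of Python. This is only an illustration: the `SemanticType` class and the `top_k` helper are hypothetical names, and the CRF-based learner that produces the confidence values is not reproduced here. The confidence values are copied from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticType:
    """Either an ontology class alone or a <class, data property> pair."""
    class_uri: str
    property_uri: Optional[str] = None  # None when the type is just a class


def top_k(scored_types, k):
    """Return the k (semantic type, confidence) pairs with the highest confidence."""
    return sorted(scored_types, key=lambda pair: pair[1], reverse=True)[:k]


# Candidates for the "image" and "type" attributes, with the slide's confidences.
image_candidates = [
    (SemanticType("foaf:Document"), 0.41),
    (SemanticType("edm:WebResource"), 0.58),
]
type_candidates = [
    (SemanticType("skos:Concept", "rdfs:label"), 0.15),
    (SemanticType("skos:Concept", "skos:prefLabel"), 0.82),
]

best_image = top_k(image_candidates, 1)
best_two_types = top_k(type_candidates, 2)
```

With K = 1 only `edm:WebResource` survives for `image`; with K = 2 both candidates for `type` are kept, ordered by confidence.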
Build Graph G: Add Known Models
Build Graph G: Add Semantic Types
Build Graph G: Expand with Paths from Ontologies
Map Source Attributes to the Graph
New source: S(title, label, image, type, artist)
Candidate semantic types for each attribute:
title  <aac:CulturalHeritageObject, dcterms:title> <aac:CulturalHeritageObject, rdfs:label>
label  <aac:CulturalHeritageObject, dcterms:description> <aac:Person, ElementsGr2:note>
image  <edm:WebResource> <foaf:Document>
type   <skos:Concept, skos:prefLabel> <skos:Concept, rdfs:label>
artist <foaf:Person, foaf:name> <aac:Person, ElementsGr2:nameOfThePerson>
Scalability Issue
• Multiple mappings from attributes(S) to nodes of G
– Each attribute has more than one semantic type
– Multiple matches for each semantic type
• Not feasible to generate all possible mappings
– The size of graph may be large
– The source may have many attributes
• Exponential in terms of number of attributes
– N attributes, M matches for each → M^N mappings
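To make the growth concrete, the brute-force enumeration can be sketched as a Cartesian product over the per-attribute matches. The node ids below are hypothetical placeholders, and `all_mappings` is an illustrative helper rather than part of the presented system.

```python
from itertools import product


def all_mappings(matches_per_attribute):
    """Yield every complete mapping: one matching graph node per attribute."""
    return product(*matches_per_attribute)


# Hypothetical node ids: N = 3 attributes with M = 2 matching nodes each.
matches = [["n1", "n2"], ["n3", "n4"], ["n5", "n6"]]
mappings = list(all_mappings(matches))
print(len(mappings))  # 2 ** 3 = 8
```

Each added attribute multiplies the number of mappings by its number of matches, which is why full enumeration stops being practical for sources with many attributes.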
Prune the Mappings
• Score the partial mappings after mapping each attribute
– Coherence: number of nodes in a mapping that belong to the same component, relative to the number of nodes in the mapping
– Confidence: average confidence of the semantic types in the mapping m
– Score = arithmetic mean of coherence and confidence
• Beam Search
– Keep only the top K mappings after mapping each attribute
• The number of mappings never exceeds a fixed size after mapping each attribute
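A minimal sketch of the scoring and pruning loop follows, under simplifying assumptions: each match contributes a single graph node, `component_of` is a hypothetical lookup from node to the component it came from, and coherence is taken as the share of nodes in the most common component. The function names and the toy input are illustrative and not taken from the authors' implementation.

```python
from collections import Counter

# A match pairs a graph node with the confidence of the semantic type behind it.
# A mapping is a tuple of matches, one per attribute mapped so far.


def coherence(mapping, component_of):
    """Fraction of the mapping's nodes that fall in its most common component."""
    counts = Counter(component_of[node] for node, _ in mapping)
    return max(counts.values()) / len(mapping)


def confidence(mapping):
    """Average semantic-type confidence over the mapping."""
    return sum(conf for _, conf in mapping) / len(mapping)


def score(mapping, component_of):
    """Arithmetic mean of coherence and confidence."""
    return (coherence(mapping, component_of) + confidence(mapping)) / 2


def beam_search(matches_per_attribute, component_of, beam_width):
    """Map attributes one at a time, keeping only the best partial mappings."""
    beam = [()]
    for matches in matches_per_attribute:
        expanded = [partial + (match,) for partial in beam for match in matches]
        expanded.sort(key=lambda m: score(m, component_of), reverse=True)
        beam = expanded[:beam_width]
    return beam


# Hypothetical toy input: two attributes, two candidate nodes each.
component_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
matches_per_attribute = [
    [("A1", 0.9), ("B1", 0.8)],
    [("A2", 0.5), ("B2", 0.55)],
]
beam = beam_search(matches_per_attribute, component_of, beam_width=2)
best_nodes = [node for node, _ in beam[0]]
```

In this toy run the best mapping is (A1, A2): A2 has a lower confidence than B2, but pairing it with A1 keeps both nodes in one component, so coherence outweighs the small confidence loss. The same mean reproduces the scores of the two example mappings in the deck: (0.44 + 0.56) / 2 = 0.5 and (0.66 + 0.55) / 2 = 0.605.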
Score the Mappings
Example Mapping 2
New source: S(title, label, image, type, artist)
Semantic type selected for each attribute, with its confidence:
title  <aac:CulturalHeritageObject, dcterms:title> 0.19
label  <aac:CulturalHeritageObject, dcterms:description> 0.7
image  <edm:WebResource> 0.58
type   <skos:Concept, skos:prefLabel> 0.82
artist <foaf:Person, foaf:name> 0.27
Coherence: 4/9 = 0.44
Confidence: 0.56
Score: 0.5
Example Mapping 1
New source: S(title, label, image, type, artist)
Semantic type selected for each attribute, with its confidence:
title  <aac:CulturalHeritageObject, dcterms:title> 0.19
label  <aac:CulturalHeritageObject, dcterms:description> 0.7
image  <edm:WebResource> 0.58
type   <skos:Concept, skos:prefLabel> 0.82
artist <aac:Person, ElementsGr2:nameOfThePerson> 0.19
Coherence: 6/9 = 0.66
Confidence: 0.55
Score: 0.605
This mapping gets a higher score even though it uses the 2nd-ranked semantic type for artist
Generate Semantic Models
• Select the top M mappings
• Compute a Steiner tree for each mapping
– A minimal tree connecting the nodes of the mapping
• Each tree is a candidate model
• Rank candidate models (Steiner trees)
– Cost
– Score of the corresponding mapping
Steiner Tree
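As an illustration of what "a minimal tree connecting the nodes of a mapping" means, the sketch below uses a simple shortest-path heuristic: start from one terminal and repeatedly attach the closest remaining terminal through its cheapest path. This is a common approximation, not necessarily the algorithm used by the authors, and the graph, node names, and weights are hypothetical.

```python
import heapq


def add_edge(graph, u, v, weight):
    """Insert an undirected weighted edge into an adjacency-dict graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def closest_terminal_path(graph, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree to the nearest target."""
    dist = {node: 0.0 for node in tree_nodes}
    prev = {}
    heap = [(0.0, node) for node in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]  # from a tree node to the target
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    raise ValueError("terminals are not connected")


def approximate_steiner_tree(graph, terminals):
    """Return (edges, total cost) of a tree spanning all terminals."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:]) - tree_nodes
    edges, total = set(), 0.0
    while remaining:
        cost, path = closest_terminal_path(graph, tree_nodes, remaining)
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= set(path)
        total += cost
    return edges, total


# Hypothetical toy graph; the weights are made up for the example.
graph = {}
add_edge(graph, "CulturalHeritageObject", "Person", 1.0)
add_edge(graph, "CulturalHeritageObject", "EuropeanaAggregation", 1.0)
add_edge(graph, "EuropeanaAggregation", "WebResource", 1.0)
add_edge(graph, "CulturalHeritageObject", "WebResource", 5.0)

terminals = ["CulturalHeritageObject", "Person", "WebResource"]
edges, cost = approximate_steiner_tree(graph, terminals)
```

Here `EuropeanaAggregation` is not a terminal, yet it ends up in the tree because routing through it is cheaper than the direct edge to `WebResource`. Such intermediate nodes are what distinguish a Steiner tree from a spanning tree over the terminals alone.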
Evaluation
• Dataset
– 29 museum data sources
– 332 attributes (average 11 per source)
• Domain ontologies
– EDM, SKOS, FOAF, ORE, ElementsGr2, AAC, DCTerms
– 119 classes, 351 properties
• Compute precision and recall between learned models
and correct models
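Here sm is the correct model, sm' is the learned model, and rel(·) is the set of relationships in a model:
\[ \text{precision} = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm')|} \]
How many of the learned relationships are correct?
\[ \text{recall} = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm)|} \]
How many of the correct relationships are learned?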
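The two measures translate directly into set operations. The sketch below assumes each relationship is represented as a (subject class, property, object class) triple. That representation and the example triples are hypothetical and are not taken from the evaluation data.

```python
def precision_recall(correct, learned):
    """Precision and recall of the learned relationships against the correct ones."""
    overlap = len(correct & learned)
    precision = overlap / len(learned) if learned else 0.0
    recall = overlap / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical relationship sets for a correct model (sm) and a learned model (sm').
correct = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("edm:EuropeanaAggregation", "edm:aggregatedCHO", "aac:CulturalHeritageObject"),
    ("edm:EuropeanaAggregation", "edm:hasResource", "edm:WebResource"),
}
learned = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("aac:CulturalHeritageObject", "dcterms:creator", "foaf:Person"),
}

precision, recall = precision_recall(correct, learned)
```

Two of the three learned relationships are correct, so precision is 2/3. Two of the four correct relationships were learned, so recall is 1/2.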
Quality
k = 1 → correct semantic type learned for only 62% of attributes
k = 4 → correct semantic type was among the 4 learned types for 87% of attributes
[Figure: precision and recall (y-axis, 0.3 to 1.0) versus number of known semantic models (x-axis, 0 to 28); series: precision (k=1), recall (k=1), precision (k=4), recall (k=4), precision (correct types), recall (correct types)]
Performance
The previous approach was not able to learn semantic models for
sources with more than 4 attributes within the timeout of 1 hour
Example: S16 with only 5 attributes → 16,633,298 mappings (29*29*29*31*22)
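The S16 figure can be checked with a one-line product. For contrast, the sketch also counts how many partial mappings a beam of width 100 would score, under the simplifying assumption that every kept mapping is expanded with every match of the next attribute. The helper name is hypothetical.

```python
from math import prod

matches_per_attribute = [29, 29, 29, 31, 22]  # match counts quoted for source S16

# Exhaustive enumeration: one mapping per combination of matches.
total_mappings = prod(matches_per_attribute)
print(f"{total_mappings:,}")  # 16,633,298


def mappings_scored_with_beam(counts, beam_width):
    """Count partial mappings scored when the beam is capped after each attribute."""
    beam, scored = 1, 0
    for matches in counts:
        expanded = beam * matches
        scored += expanded
        beam = min(expanded, beam_width)
    return scored


beam_scored = mappings_scored_with_beam(matches_per_attribute, beam_width=100)
print(f"{beam_scored:,}")  # 9,070
```

Under that assumption, capping the beam scores 9,070 partial mappings instead of enumerating more than 16 million complete ones.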
[Figure: time (y-axis, 0 to 60) versus number of attributes (x-axis, 0 to 30); series: Previous Approach, New Approach (Kbeam = 100)]
Related Work
• Schema matching & schema mapping
– iMAP [Dhamankar et al., 2004], Clio [Fagin et al., 2009]
• Mapping databases and spreadsheets to ontologies
– Mapping languages: D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]
– Tools: RDOTE [Vavliakis et al., 2010], RDF123 [Han et al., 2008], XLWrap [Langegger and Wöß, 2009]
– String similarity between column names and ontology terms [Polfliet and Ichise, 2010]
• Understand semantics of Web tables
– Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web [Wang et al., 2012] [Limaye et al., 2010] [Venetis et al., 2011]
• Exploit Linked Open Data (LOD)
– Link the values to the entities in LOD to find the types of the values and their relationships [Muñoz et al., 2013] [Mulwad et al., 2013]
• Learn Semantic Definitions of Online Information Sources [Carman and Knoblock, 2007]
– Learns LAV rules from known sources
– Only learns descriptions that are conjunctive combinations of known descriptions
Future Work
• Scalability with respect to the number of known models
– Create a more compact graph
– Consolidate overlapping segments of known models
• Leverage Linked Open Data (LOD)
– Exploit the relationships between instances
– Improve the accuracy of the learned relations
• Integrate the new approach into Karma
– http://www.isi.edu/integration/karma
– @KarmaSemWeb

A Scalable Approach to Learn Semantic Models of Structured Sources

  • 1. A Scalable Approach to Learn Semantic Models of Structured Sources Mohsen Taheriyan Craig Knoblock Pedro Szekely Jose Luis Ambite 8th IEEE International Conference on Semantic Computing
  • 2. Motivation 1 How to express the intended meaning of data? Explicit semantics is missing in many of the structured sources creator? actor? rightsHolder? artwork? movie? referenced entity?
  • 3. Map the Source to the Domain Ontology 2 EDM: Europeana Data Model SKOS: Simple Knowledge Organization System FOAF: Friend of a Friend AAC: American Art Collaborative ElementsGr2: RDA Group 2 Elements ORE: Open Archive Initiative DCTerms: Dublin Core Metadata Terms Data Source: artworks in the Indianapolis Museum of Art Domain ontologies Semantic Model: a mapping from the source to the concepts and relationships defined by the domain ontologies
  • 5. 4 Problem: How to automatically learn a semantic model for a source
  • 6. Main Idea 5 Sources in the same domain often have similar data Exploit knowledge of known semantic models to hypothesize a semantic model for a new sources
  • 7. Previous Approach (ISWC 2013) 6 Input Learn semantic types for attributes(s) • Sample data from new source (S) • Domain Ontologies (O) • Known semantic models Construct Graph G=(V,E) Generate mappings between attributes(S) and V Generate and rank semantic models 1 2 3 4 Output • A ranked set of semantic models for S
  • 8. Limitations 7 Input Learn semantic types for attributes(s) • Sample data from new source (S) • Domain Ontologies (O) • Known semantic models Construct Graph G=(V,E) Generate mappings between attributes(S) and V Generate and rank semantic models 1 2 3 4 Output • A ranked set of semantic models for S Consider only one semantic type (label) for each attribute Not scalable to sources with a large number of attributes
  • 9. Contributions 8 Input Learn semantic types for attributes(s) • Sample data from new source (S) • Domain Ontologies (O) • Known semantic models Build Graph G=(V,E) Generate mappings between attributes(S) and V Generate and rank semantic models 1 2 3 4 Output • A ranked set of semantic models for S Consider K candidate semantic types per attribute A Beam search algorithm to score and prune the mappings
  • 10. Example 9 New source: Indianapolis Museum of Art EDM SKOS FOAF AAC ElementsGr2ORE DCTermsDomain ontologies: S1(title, creationDate, name, birthDate, deathDate, type) Known Semantic Models: S1: Dallas Museum S2: The Metropolitan Museum of Art S2(name, copyright, materials, dimensions, imageUrl) S(title, label, image, type, artist) Goal: Semantic model for source S Semantic model of S1 Semantic model of S2
  • 11. • Sample data from new source (S) Approach 10 Input Learn semantic types for attributes(s) • Domain Ontologies (O) • Known semantic models Construct Graph G=(V,E) Generate mappings between attributes(S) and V Generate and rank semantic models 1 2 3 4 Output • A ranked set of semantic models for S
  • 12. Learn Semantic Types • A CRF-based machine learning technique to learn Semantic Types for each attribute from its data • Semantic Type – Ontology Class: <class_uri> – Data Property + Domain Class: <class_uri, property_uri> • Pick top K semantic types according to their confidence values 11 New source: S(title, label, image, type, artist) title <aac:CulturalHeritageObject, dcterms:title> 0.19 <aac:CulturalHeritageObject, rdfs:label> 0.08 label <aac:CulturalHeritageObject, dcterms:description> 0.7 <aac:Person, ElementsGr2:note> 0.03 image <edm:WebResource> 0.58 <foaf:Document> 0.41 type <skos:Concept, skos:prefLabel> 0.82 <skos:Concept, rdfs:label> 0.15 name <foaf:Person, foaf:name> 0.27 <aac:Person, ElementsGr2:nameOfThePerson> 0.19
  • 13. • Sample data from new source (S) Approach 12 Input Learn semantic types for attributes(s) • Domain Ontologies (O) • Known semantic models Construct Graph G=(V,E) Generate mappings between attributes(S) and V Generate and rank semantic models 1 2 3 4 Output • A ranked set of semantic models for S
  • 14. Build Graph G: Add Known Models
  • 15. Build Graph G: Add Semantic Types
  • 16. Build Graph G: Expand with Paths from Ontologies
  • 18. Map Source Attributes to the Graph
      • New source: S(title, label, image, type, artist)
          – title: <aac:CulturalHeritageObject, dcterms:title>; <aac:CulturalHeritageObject, rdfs:label>
          – label: <aac:CulturalHeritageObject, dcterms:description>; <aac:Person, ElementsGr2:note>
          – image: <edm:WebResource>; <foaf:Document>
          – type: <skos:Concept, skos:prefLabel>; <skos:Concept, rdfs:label>
          – artist: <foaf:Person, foaf:name>; <aac:Person, ElementsGr2:nameOfThePerson>
  • 21. Scalability Issue
      • Multiple mappings from attributes(S) to nodes of G
          – Each attribute has more than one semantic type
          – Multiple matches for each semantic type
      • Not feasible to generate all possible mappings
          – The graph may be large
          – The source may have many attributes
      • Exponential in the number of attributes
          – N attributes, M matches for each → M^N mappings
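The growth described on this slide can be checked on a toy case. In the sketch below the node names and match counts are made up for illustration; only the counting argument (one choice of node per attribute, so the counts multiply) comes from the slide.

```python
from itertools import product
from math import prod

# Hypothetical candidate graph nodes per attribute (node names are made up).
matches = {
    "title": ["n1", "n2", "n3"],
    "label": ["n4", "n5"],
    "image": ["n6", "n7"],
}

# Every complete mapping picks exactly one node for each attribute.
all_mappings = list(product(*matches.values()))
expected = prod(len(nodes) for nodes in matches.values())  # 3 * 2 * 2

# With M matches for each of N attributes the count is M ** N.
uniform = 4 ** 10  # M = 4, N = 10
```

Even with only 4 matches per attribute, 10 attributes already give more than a million complete mappings, which is why exhaustive enumeration does not scale.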
  • 22. Prune the Mappings
      • Score the partial mappings after mapping each attribute
          – Coherence: fraction of the nodes in a mapping that belong to the same component
          – Confidence: average confidence of the semantic types in the mapping
          – Score = arithmetic mean of coherence and confidence
      • Beam search
          – Keep only the top K mappings after mapping each attribute
      • The number of mappings never exceeds a fixed size after mapping each attribute
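The pruning loop can be sketched as a generic beam search. This is an illustration under stated assumptions, not the authors' code: `beam_search`, the `Mapping` representation, and the toy scoring function are all hypothetical, and a real scoring function would combine the measures listed on the slide.

```python
from typing import Callable, Dict, List, Tuple

Mapping = Tuple[Tuple[str, str], ...]  # ((attribute, node), ...)


def beam_search(
    matches: Dict[str, List[str]],
    score: Callable[[Mapping], float],
    beam_width: int,
) -> List[Mapping]:
    """Extend partial mappings one attribute at a time, keeping the best few."""
    beam: List[Mapping] = [()]
    for attribute, nodes in matches.items():
        # Extend every surviving partial mapping with every match of this attribute.
        extended = [partial + ((attribute, node),) for partial in beam for node in nodes]
        # Keep only the top `beam_width` partial mappings by score.
        extended.sort(key=score, reverse=True)
        beam = extended[:beam_width]
    return beam


# Toy scoring function (an assumption): prefer nodes with small numeric ids.
def toy_score(mapping: Mapping) -> float:
    return -sum(int(node[1:]) for _, node in mapping)


matches = {"title": ["n1", "n2"], "label": ["n3", "n4"], "image": ["n5", "n6"]}
result = beam_search(matches, toy_score, beam_width=2)
```

Because the beam is truncated after every attribute, the number of partial mappings held at any time is bounded by the beam width, at the cost of possibly discarding a mapping that would have scored well once completed.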
  • 23. Score the Mappings
      • New source: S(title, label, image, type, artist)
      • Candidate semantic types (the selected type is shown with its confidence)
          – title: <aac:CulturalHeritageObject, dcterms:title>, 0.19; <aac:CulturalHeritageObject, rdfs:label>
          – label: <aac:CulturalHeritageObject, dcterms:description>, 0.7; <aac:Person, ElementsGr2:note>
          – image: <edm:WebResource>, 0.58; <foaf:Document>
          – type: <skos:Concept, skos:prefLabel>, 0.82; <skos:Concept, rdfs:label>
          – artist: <foaf:Person, foaf:name>, 0.27; <aac:Person, ElementsGr2:nameOfThePerson>
      • Example Mapping 2: Coherence: 4/9 = 0.44; Confidence: 0.56; Score: 0.5
  • 24. Score the Mappings
      • New source: S(title, label, image, type, artist)
      • Candidate semantic types (the selected type is shown with its confidence)
          – title: <aac:CulturalHeritageObject, dcterms:title>, 0.19; <aac:CulturalHeritageObject, rdfs:label>
          – label: <aac:CulturalHeritageObject, dcterms:description>, 0.7; <aac:Person, ElementsGr2:note>
          – image: <edm:WebResource>, 0.58; <foaf:Document>
          – type: <skos:Concept, skos:prefLabel>, 0.82; <skos:Concept, rdfs:label>
          – artist: <foaf:Person, foaf:name>; <aac:Person, ElementsGr2:nameOfThePerson>, 0.19
      • Example Mapping 1: Coherence: 6/9 = 0.66; Confidence: 0.55; Score: 0.605
      • This mapping gets a higher score even though it uses the 2nd-ranked semantic type for artist
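The two example scores can be reproduced with a few lines. The sketch assumes the score is the arithmetic mean of coherence and confidence, which is what the example numbers imply, and it takes the coherence and confidence values as reported on the slides; the function names are hypothetical.

```python
def coherence(nodes_in_same_component: int, total_nodes: int) -> float:
    """Fraction of the mapping's nodes that fall in one component."""
    return nodes_in_same_component / total_nodes


def score(coherence_value: float, confidence_value: float) -> float:
    """Arithmetic mean of coherence and confidence (assumed from the examples)."""
    return (coherence_value + confidence_value) / 2


# Values as reported on the two "Score the Mappings" slides.
mapping_2 = score(0.44, 0.56)   # top-ranked semantic type for every attribute
mapping_1 = score(0.66, 0.55)   # 2nd-ranked semantic type for the last attribute
```

The second-ranked type lowers confidence slightly but places more nodes in the same component, so the more coherent mapping ends up with the higher score.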
  • 26. Generate Semantic Models
      • Select the top M mappings
      • Compute a Steiner tree for each mapping
          – A minimal tree connecting the nodes of the mapping
      • Each tree is a candidate model
      • Rank the candidate models (Steiner trees)
          – Cost
          – Score of the corresponding mapping
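The slide does not say which Steiner tree algorithm is used. As an illustration only, the sketch below swaps in a simple greedy shortest-path heuristic: start from one mapped node and repeatedly attach the closest remaining mapped node via its shortest path. The graph, its edge weights, and all function names are assumptions; the class names are borrowed from the example semantic model.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected graph with positive edge weights
Edge = Tuple[str, str]


def add_edge(graph: Graph, u: str, v: str, weight: float) -> None:
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def path_to_nearest(graph: Graph, tree_nodes: Set[str], targets: Set[str]) -> List[Edge]:
    """Multi-source Dijkstra from the current tree to the closest target node."""
    dist = {node: 0.0 for node in tree_nodes}
    prev: Dict[str, str] = {}
    heap = [(0.0, node) for node in sorted(tree_nodes)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in targets:
            path = []
            while u in prev:  # walk back until a tree node is reached
                path.append((prev[u], u))
                u = prev[u]
            return path
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    raise ValueError("some terminal is not reachable")


def steiner_tree_heuristic(graph: Graph, terminals: List[str]) -> List[Edge]:
    """Greedy approximation: grow a tree by attaching the nearest terminal."""
    tree_nodes = {terminals[0]}
    remaining = set(terminals) - tree_nodes
    edges: List[Edge] = []
    while remaining:
        for u, v in path_to_nearest(graph, tree_nodes, remaining):
            edges.append((u, v))
            tree_nodes.update((u, v))
        remaining -= tree_nodes
    return edges


g: Graph = {}
add_edge(g, "aac:CulturalHeritageObject", "aac:Person", 1.0)
add_edge(g, "aac:CulturalHeritageObject", "skos:Concept", 1.0)
add_edge(g, "edm:EuropeanaAggregation", "aac:CulturalHeritageObject", 1.0)
add_edge(g, "edm:EuropeanaAggregation", "edm:WebResource", 1.0)
add_edge(g, "aac:CulturalHeritageObject", "edm:WebResource", 3.0)  # hypothetical costly link

terminals = ["aac:CulturalHeritageObject", "aac:Person", "skos:Concept", "edm:WebResource"]
tree = steiner_tree_heuristic(g, terminals)
cost = sum(g[u][v] for u, v in tree)
```

In this toy graph the tree reaches edm:WebResource through edm:EuropeanaAggregation, a node that is not itself a terminal, because that route is cheaper than the direct link. The heuristic is an approximation and is not guaranteed to return a minimum-cost tree in general.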
  • 28. Evaluation
      • Dataset
          – 29 museum data sources
          – 332 attributes (average 11 per source)
      • Domain ontologies
          – EDM, SKOS, FOAF, ORE, ElementsGr2, AAC, DCTerms
          – 119 classes, 351 properties
      • Compute precision and recall between learned models and correct models, where sm is the correct model and sm' is the learned model
          – $precision = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm')|}$: how many of the learned relationships are correct?
          – $recall = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm)|}$: how many of the correct relationships are learned?
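The two measures translate directly into set operations. In this sketch a relationship is represented as a (subject class, property, object class) triple, which is an assumption about the representation; `ex:wrongLink` is a made-up property standing in for an incorrectly learned relationship.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject class, property, object class)


def precision_recall(correct: Set[Triple], learned: Set[Triple]) -> Tuple[float, float]:
    """Precision and recall of the learned relationships against the correct ones."""
    common = correct & learned
    precision = len(common) / len(learned) if learned else 0.0
    recall = len(common) / len(correct) if correct else 0.0
    return precision, recall


correct_model = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("edm:EuropeanaAggregation", "edm:aggregatedCHO", "aac:CulturalHeritageObject"),
    ("edm:EuropeanaAggregation", "edm:hasResource", "edm:WebResource"),
}
learned_model = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("aac:CulturalHeritageObject", "ex:wrongLink", "edm:WebResource"),  # hypothetical error
}

precision, recall = precision_recall(correct_model, learned_model)
```

Here two of the three learned relationships are correct and two of the four correct relationships are found, so precision is 2/3 and recall is 1/2.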
  • 29. Quality
      • k = 1: the correct semantic type was learned for only 62% of attributes
      • k = 4: the correct semantic type was among the 4 learned types for 87% of attributes
      • [Figure: precision and recall versus number of known semantic models (0 to 28), plotted for k=1, k=4, and correct semantic types; vertical axis from 0.3 to 1.0]
  • 30. Performance
      • The previous approach could not learn semantic models for sources with more than 4 attributes within the 1-hour timeout
      • Example: S16, with only 5 attributes → 16,633,298 mappings (29*29*29*31*22)
      • [Figure: time versus number of attributes, previous approach and new approach (Kbeam = 100)]
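The mapping count for S16 is just the product of the per-attribute match counts. The sketch below reproduces it and, as a rough comparison, counts how many partial mappings a beam search would score if it pruned to a beam of 100 after each attribute. The counting model in `beam_candidates` is an assumption made for illustration, not a measurement from the evaluation.

```python
from math import prod
from typing import List

matches_per_attribute = [29, 29, 29, 31, 22]  # S16, from the slide

# Exhaustive enumeration: one choice per attribute, so the counts multiply.
unpruned = prod(matches_per_attribute)


def beam_candidates(matches: List[int], beam_width: int) -> int:
    """Upper bound on partial mappings scored when pruning after each attribute."""
    beam, total = 1, 0
    for m in matches:
        expanded = beam * m          # every kept mapping is extended m ways
        total += expanded
        beam = min(expanded, beam_width)
    return total


bounded = beam_candidates(matches_per_attribute, beam_width=100)
```

Under this simple model the pruned search scores a few thousand partial mappings instead of enumerating more than 16 million complete ones.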
  • 31. Related Work
      • Schema matching & schema mapping
          – iMAP [Dhamankar et al., 2004], Clio [Fagin et al., 2009]
      • Mapping databases and spreadsheets to ontologies
          – Mapping languages: D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]
          – Tools: RDOTE [Vavliakis et al., 2010], RDF123 [Han et al., 2008], XLWrap [Langegger and Wöß, 2009]
          – String similarity between column names and ontology terms [Polfliet and Ichise, 2010]
      • Understanding the semantics of Web tables
          – Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web [Wang et al., 2012] [Limaye et al., 2010] [Venetis et al., 2011]
      • Exploiting Linked Open Data (LOD)
          – Link the values to entities in LOD to find the types of the values and their relationships [Muñoz et al., 2013] [Mulwad et al., 2013]
      • Learning semantic definitions of online information sources [Carman, Knoblock, 2007]
          – Learns LAV rules from known sources
          – Only learns descriptions that are conjunctive combinations of known descriptions
  • 32. Future Work
      • Scalability with respect to the number of known models
          – Create a more compact graph
          – Consolidate overlapping segments of known models
      • Leverage Linked Open Data (LOD)
          – Exploit the relationships between instances
          – Improve the accuracy of the learned relations
      • Integrate the new approach in Karma
          – http://www.isi.edu/integration/karma
          – @KarmaSemWeb

Editor's Notes

  1. http://www.metmuseum.org/collection/the-collection-online/search?deptids=1&pg=1&ft=french&od=on&ao=on&noqs=true
     http://americanart.si.edu/collections/search/artwork/results/index.cfm?rows=10&fq=online_media_type%3A%22Images%22&q=Search+by+Artist%2C+Work%2C+or+Keyword&page=1&start=0&x=62&y=5
     http://museum.dma.org:9090/emuseum/view/objects/asitem/2031/0/title-desc?t:state:flow=ff495581-e0d2-4334-bec9-19988f9eeb20
     http://www.mfah.org/art/100-highlights/commemorative-head-king/
     Leverage relationships in known semantic models to hypothesize relationships for new sources
  2. confidence = (0.19+0.7+0.58+0.82+0.27)/5
  3. coherence = number of nodes in same component / total number of nodes in mapping
     confidence = (0.19+0.7+0.58+0.82+0.19)/5
     u (maximum number of nodes in mapping) = 2 * n (#attributes) = 10
     l (minimum number of nodes in mapping) = n + 1 = 6
     size reduction = [u - size(mapping)] / [u - l + 1] = (10 - 8) / (10 - 6 + 1) = 2 / 5 = 0.4
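The size-reduction arithmetic in this note can be written as a small function. The function name and signature are hypothetical; the bounds u = 2n and l = n + 1 and the formula itself are taken from the note.

```python
def size_reduction(mapping_size: int, n_attributes: int) -> float:
    """Size reduction of a mapping, following the formula in the note."""
    upper = 2 * n_attributes   # u: maximum number of nodes in a mapping
    lower = n_attributes + 1   # l: minimum number of nodes in a mapping
    return (upper - mapping_size) / (upper - lower + 1)


# The note's example: 5 attributes and a mapping with 8 nodes.
example = size_reduction(mapping_size=8, n_attributes=5)
```

A mapping that uses the maximum number of nodes gets a size reduction of 0, and the value grows as attributes share more class nodes.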
  4. An et al. [An et al., 2007] generate declarative mapping expressions between two tables with different schemas, starting from element correspondences. They create a graph from the conceptual model (CM) of each schema and then suggest plausible mappings by exploring low-cost Steiner trees that connect those nodes in the CM graph that have attributes participating in element correspondences.
     – Known: Table 1, CM graph 1, marked nodes (nodes in CM1 corresponding to columns), and s-tree1, a semantic tree that expresses the correct semantics of Table 1 by connecting its marked nodes in CM1
     – Known: Table 2, CM graph 2, marked nodes (nodes in CM2 corresponding to columns), and s-tree2, a semantic tree that expresses the correct semantics of Table 2 by connecting its marked nodes in CM2
     – Goal: find a mapping from Table 1 to Table 2 (a subgraph of CM1, called CSG1, to a subgraph of CM2, called CSG2)
     – Method: if CSG2 is known (e.g., it is s-tree2), find the Steiner tree in CM1 connecting the marked nodes, preferring the edges from s-tree1
     – Strong assumption: the semantics of each table (the s-trees) are known
     – Use-case example: Table 1 has 10 columns with a large s-tree, and Table 2 has only 3 columns. This approach finds a minimal tree in CM1 (with maximum overlap with s-tree1) that connects the marked nodes of Table 1 corresponding to the marked nodes of Table 2.

     Schema matching
     – Finds correspondences between elements of the source and target schemas
     – Example: iMAP [Dhamankar et al., 2004]
     Schema mapping
     – Generates declarative mappings expressible as queries in SQL or Datalog
     – Example: Clio [Fagin et al., 2009]
     Semantic annotation of Web services
     – Languages: SAWSDL [Farrell and Lausen, 2007]
     – Tools: SWEET [Maleshkova et al., 2009]
     – Annotate input and output parameters [Heß et al., 2003] [Lerman et al., 2006] [Saquicela et al., 2011]