A Scalable Approach to Learn Semantic Models of Structured Sources

Semantic models of data sources describe the meaning of the data in terms of the concepts and relationships defined by a domain ontology. Building such models is an important step toward integrating data from different sources, where we need to provide the user with a unified view of the underlying sources. In this paper, we present a scalable approach to automatically learn semantic models of a structured data source by exploiting the knowledge of previously modeled sources. Our evaluation shows that the approach generates expressive semantic models with minimal user input, and it is scalable to large ontologies and data sources with many attributes.

  • http://www.metmuseum.org/collection/the-collection-online/search?deptids=1&pg=1&ft=french&od=on&ao=on&noqs=true
    http://americanart.si.edu/collections/search/artwork/results/index.cfm?rows=10&fq=online_media_type%3A%22Images%22&q=Search+by+Artist%2C+Work%2C+or+Keyword&page=1&start=0&x=62&y=5
    http://museum.dma.org:9090/emuseum/view/objects/asitem/2031/0/title-desc?t:state:flow=ff495581-e0d2-4334-bec9-19988f9eeb20
    http://www.mfah.org/art/100-highlights/commemorative-head-king/


    Leverage relationships in known semantic models to hypothesize relationships for new sources

  • confidence = (0.19+0.7+0.58+0.82+0.27)/5
  • coherence = number of nodes in the same component / total number of nodes in the mapping
    confidence = (0.19+0.7+0.58+0.82+0.19)/5
    u (maximum number of nodes in mapping) = 2 * n (#attributes) = 10
    l (minimum number of nodes in mapping) = n+1 = 6
    size reduction = [u - size(mapping)] / [u - l + 1] = (10 - 8) / (10 - 6 + 1) = 2 / 5 = 0.4
  • An et al. [An et al., 2007] generate declarative mapping expressions between two tables with different schemas starting from element correspondences. They create a graph from the conceptual model (CM) of each schema and then suggest plausible mappings by exploring low-cost Steiner trees that connect those nodes in the CM graph that have attributes participating in element correspondences.
    Known: Table 1, CM graph 1, marked nodes (nodes in CM1 corresponding to columns), s-tree1: a semantic tree that expresses the correct semantic of Table1 by connecting its marked nodes in CM1
    Known: Table 2, CM graph 2, marked nodes (nodes in CM2 corresponding to columns), s-tree2: a semantic tree that expresses the correct semantic of Table2 by connecting its marked nodes in CM2
    Goal: Find a mapping from Table1 to Table2 (a subgraph of CM1, called CSG1, to a subgraph of CM2, called CSG2)
    Method: If CSG2 is known (e.g., it is the s-tree2), find the Steiner tree in CM1 connecting marked nodes, preferring the edges from s-tree1

    Strong assumption: we know the semantics of each table (s-trees)
    Use-case example: Table 1 has 10 columns with a large s-tree, and Table 2 has only 3 columns. This approach finds a minimal tree in CM1 (with maximum overlap with s-tree1) that connects the marked nodes of Table 1 corresponding to the marked nodes of Table 2.


    =======================================================================================

    Schema matching
    Finds correspondences between elements of the source and target schemas
    Example: iMAP [Dhamankar et al., 2004]
    Schema mapping
    Generate declarative mappings expressible as queries in SQL or Datalog
    Example: Clio [Fagin et al., 2009]
    Semantic annotation of Web services
    Languages: SAWSDL [Farrell and Lausen, 2007]
    Tools: SWEET [Maleshkova et al., 2009]
    Annotate input and output parameters [Heß et al., 2003] [Lerman et al., 2006] [Saquicela et al., 2011]


  • A Scalable Approach to Learn Semantic Models of Structured Sources

    1. A Scalable Approach to Learn Semantic Models of Structured Sources. Mohsen Taheriyan, Craig Knoblock, Pedro Szekely, Jose Luis Ambite. 8th IEEE International Conference on Semantic Computing
    2. Motivation: How can we express the intended meaning of data? Explicit semantics is missing in many structured sources (creator? actor? rightsHolder? artwork? movie? referenced entity?)
    3. Map the Source to the Domain Ontology
        • Data source: artworks in the Indianapolis Museum of Art
        • Domain ontologies:
          – EDM: Europeana Data Model
          – SKOS: Simple Knowledge Organization System
          – FOAF: Friend of a Friend
          – AAC: American Art Collaborative
          – ElementsGr2: RDA Group 2 Elements
          – ORE: Open Archives Initiative Object Reuse and Exchange
          – DCTerms: Dublin Core Metadata Terms
        • Semantic model: a mapping from the source to the concepts and relationships defined by the domain ontologies
    4. Semantic Model
        [Figure: semantic model of the source, using the classes aac:CulturalHeritageObject, edm:WebResource, skos:Concept, aac:Person, and edm:EuropeanaAggregation and the properties dcterms:title, dcterms:description, dcterms:creator, edm:aggregatedCHO, edm:hasResource, edm:hasType, skos:prefLabel, ElementsGr2:nameOfThePerson, and rdf:type]
        • Key ingredient to automate source discovery, data integration, and publishing RDF triples
    5. Problem: How can we automatically learn a semantic model for a source?
    6. Main Idea
        • Sources in the same domain often have similar data
        • Exploit knowledge of known semantic models to hypothesize a semantic model for a new source
    7. Previous Approach (ISWC 2013)
        • Input: sample data from the new source (S), domain ontologies (O), known semantic models
        • Step 1: Learn semantic types for attributes(S)
        • Step 2: Construct graph G=(V,E)
        • Step 3: Generate mappings between attributes(S) and V
        • Step 4: Generate and rank semantic models
        • Output: a ranked set of semantic models for S
    8. Limitations of the previous approach (same four-step pipeline as slide 7)
        • Considers only one semantic type (label) for each attribute
        • Not scalable to sources with a large number of attributes
    9. Contributions (to the four-step pipeline of slide 7)
        • Step 1: consider K candidate semantic types per attribute
        • Step 3: a beam search algorithm to score and prune the mappings
    10. Example
        • New source: Indianapolis Museum of Art, S(title, label, image, type, artist)
        • Domain ontologies: EDM, SKOS, FOAF, AAC, ElementsGr2, ORE, DCTerms
        • Known semantic models:
          – S1: Dallas Museum, S1(title, creationDate, name, birthDate, deathDate, type)
          – S2: The Metropolitan Museum of Art, S2(name, copyright, materials, dimensions, imageUrl)
        • Goal: a semantic model for source S
        [Figures: semantic model of S1; semantic model of S2]
    11. Approach: overview of the four-step pipeline (as on slide 7), introducing step 1: learn semantic types for attributes(S)
    12. Learn Semantic Types
        • A CRF-based machine learning technique learns semantic types for each attribute from its data
        • Semantic type
          – Ontology class: <class_uri>
          – Data property + domain class: <class_uri, property_uri>
        • Pick the top K semantic types according to their confidence values
        New source: S(title, label, image, type, artist); top two learned types per attribute, with confidence:
          title: <aac:CulturalHeritageObject, dcterms:title> 0.19; <aac:CulturalHeritageObject, rdfs:label> 0.08
          label: <aac:CulturalHeritageObject, dcterms:description> 0.7; <aac:Person, ElementsGr2:note> 0.03
          image: <edm:WebResource> 0.58; <foaf:Document> 0.41
          type: <skos:Concept, skos:prefLabel> 0.82; <skos:Concept, rdfs:label> 0.15
          artist: <foaf:Person, foaf:name> 0.27; <aac:Person, ElementsGr2:nameOfThePerson> 0.19
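The top-K selection on slide 12 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the CRF has already produced a confidence for each candidate semantic type, and the `top_k_types` function and the tuple encoding of semantic types are hypothetical. The confidence values are the ones listed on the slide.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding of a semantic type: (class_uri, property_uri).
# property_uri is None when the type is just an ontology class, e.g. <edm:WebResource>.
SemanticType = Tuple[str, Optional[str]]


def top_k_types(scores: Dict[SemanticType, float], k: int) -> List[Tuple[SemanticType, float]]:
    """Return the k candidate semantic types with the highest confidence, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Confidence values for two attributes of S, copied from the slide.
image_scores: Dict[SemanticType, float] = {
    ("edm:WebResource", None): 0.58,
    ("foaf:Document", None): 0.41,
}
artist_scores: Dict[SemanticType, float] = {
    ("foaf:Person", "foaf:name"): 0.27,
    ("aac:Person", "ElementsGr2:nameOfThePerson"): 0.19,
}

print(top_k_types(artist_scores, k=1))  # [(('foaf:Person', 'foaf:name'), 0.27)]
print(top_k_types(image_scores, k=2))
```

With k = 1 only foaf:name survives for artist, which mirrors the single-type limitation of the previous approach; keeping k > 1 leaves room for the later mapping score to prefer a lower-ranked type.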
    13. Approach: overview of the four-step pipeline (as on slide 7), introducing step 2: construct graph G=(V,E)
    14. Build Graph G: Add Known Models
    15. Build Graph G: Add Semantic Types
    16. Build Graph G: Expand with Paths from Ontologies
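The three construction steps on slides 14 to 16 can be approximated with the hypothetical sketch below. It is deliberately simplified: it keeps one node per class (the slides imply G can hold several nodes of the same class, which is why one semantic type can have multiple matches), encodes relationships as (domain class, property, range class) triples, and treats an ontology path as a single property. The triples in the example, including the `ex:` names, are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

# Assumed encoding of a relationship: (subject class, object property, object class).
Triple = Tuple[str, str, str]


def build_graph(known_models: Iterable[List[Triple]],
                semantic_type_classes: Set[str],
                ontology_properties: Iterable[Triple]) -> Tuple[Set[str], Set[Triple]]:
    """Simplified construction of G=(V,E) with one node per class."""
    nodes: Set[str] = set()
    edges: Set[Triple] = set()
    # Step 1: add the classes and relationships of every known semantic model.
    for model in known_models:
        for subj, prop, obj in model:
            nodes.update((subj, obj))
            edges.add((subj, prop, obj))
    # Step 2: add a node for each class suggested by the new source's semantic types.
    nodes.update(semantic_type_classes)
    # Step 3: expand with ontology relationships between classes already in the graph.
    for domain, prop, range_cls in ontology_properties:
        if domain in nodes and range_cls in nodes:
            edges.add((domain, prop, range_cls))
    return nodes, edges


known = [[("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person")]]
type_classes = {"aac:CulturalHeritageObject", "skos:Concept", "edm:WebResource"}
ontology = [
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("ex:Unrelated", "ex:hypotheticalProperty", "skos:Concept"),  # skipped: domain not in G
]
nodes, edges = build_graph(known, type_classes, ontology)
print(sorted(nodes))
print(sorted(edges))
```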
    17. Approach: overview of the four-step pipeline (as on slide 7), introducing step 3: generate mappings between attributes(S) and V
    18. Map Source Attributes to the Graph
        New source: S(title, label, image, type, artist); candidate semantic types per attribute:
          title: <aac:CulturalHeritageObject, dcterms:title>, <aac:CulturalHeritageObject, rdfs:label>
          label: <aac:CulturalHeritageObject, dcterms:description>, <aac:Person, ElementsGr2:note>
          image: <edm:WebResource>, <foaf:Document>
          type: <skos:Concept, skos:prefLabel>, <skos:Concept, rdfs:label>
          artist: <foaf:Person, foaf:name>, <aac:Person, ElementsGr2:nameOfThePerson>
    19. Map Source Attributes to the Graph (continued)
    20. Map Source Attributes to the Graph (continued)
    21. Scalability Issue
        • Multiple mappings from attributes(S) to nodes of G
          – Each attribute has more than one semantic type
          – Each semantic type can have multiple matches
        • Not feasible to generate all possible mappings
          – The graph may be large
          – The source may have many attributes
        • Exponential in the number of attributes
          – N attributes with M matches each → M^N mappings
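The blow-up described on slide 21 is easy to reproduce on a toy graph. In this hypothetical sketch, an attribute's candidate nodes are all nodes of G whose class matches one of its semantic types, and the exhaustive approach enumerates the Cartesian product of those candidate lists. The node identifiers and the `nodes_by_class` index are invented for illustration.

```python
import itertools
import math
from typing import Dict, List

# Hypothetical index from class URI to the nodes of G carrying that class.
nodes_by_class: Dict[str, List[str]] = {
    "aac:CulturalHeritageObject": ["CHO1", "CHO2"],
    "skos:Concept": ["Concept1"],
    "foaf:Person": ["foafPerson1"],
    "aac:Person": ["aacPerson1", "aacPerson2"],
}

# Classes of the candidate semantic types of three attributes of S.
attribute_type_classes: Dict[str, List[str]] = {
    "title": ["aac:CulturalHeritageObject"],
    "type": ["skos:Concept"],
    "artist": ["foaf:Person", "aac:Person"],
}


def candidate_nodes(type_classes: List[str]) -> List[str]:
    """Every node of G whose class matches one of the attribute's semantic types."""
    return [node for cls in type_classes for node in nodes_by_class.get(cls, [])]


candidates = {attr: candidate_nodes(classes) for attr, classes in attribute_type_classes.items()}
all_mappings = list(itertools.product(*candidates.values()))  # exhaustive enumeration
print(candidates["artist"])  # ['foafPerson1', 'aacPerson1', 'aacPerson2']
print(len(all_mappings), math.prod(len(c) for c in candidates.values()))  # 6 6
```

Three attributes already give 2 × 1 × 3 = 6 complete mappings; with M matches for each of N attributes the product grows as M^N.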
    22. Prune the Mappings
        • Score the partial mappings after mapping each attribute
          – Coherence: number of nodes in a mapping that belong to the same component
          – Confidence: average confidence of the semantic types in the mapping
          – Score = arithmetic mean of coherence and size reduction
        • Beam search
          – Keep only the top K mappings after mapping each attribute
          – The number of mappings never exceeds a fixed size after mapping each attribute
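A minimal sketch of the beam search on slide 22, under stated assumptions: a partial mapping is a tuple of node identifiers (one per attribute mapped so far), and the scoring function is passed in. The toy scorer is a stand-in that measures only a coherence-like quantity (the share of nodes coming from the most common known model); it is not the paper's full score, and the node and component names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A partial mapping: the graph node chosen for each attribute mapped so far.
Mapping = Tuple[str, ...]


def beam_search_mappings(candidates: Sequence[List[str]],
                         score: Callable[[Mapping], float],
                         beam_width: int) -> List[Mapping]:
    """Map attributes one at a time, keeping only the beam_width best partial mappings."""
    beam: List[Mapping] = [()]
    for options in candidates:
        expanded = [partial + (node,) for partial in beam for node in options]
        expanded.sort(key=score, reverse=True)  # stable: ties keep generation order
        beam = expanded[:beam_width]
    return beam


# Hypothetical: which known model (component) each graph node came from.
component: Dict[str, str] = {
    "CHO1": "S1", "CHO2": "S2", "Concept1": "S1", "Person1": "S1", "Person2": "S2",
}


def toy_coherence(mapping: Mapping) -> float:
    """Share of mapped nodes that come from the most common component."""
    counts: Dict[str, int] = {}
    for node in mapping:
        counts[component[node]] = counts.get(component[node], 0) + 1
    return max(counts.values()) / len(mapping)


best = beam_search_mappings(
    [["CHO1", "CHO2"], ["Concept1"], ["Person1", "Person2"]],  # title, type, artist
    toy_coherence,
    beam_width=2,
)
print(best[0])  # ('CHO1', 'Concept1', 'Person1')
```

Because the beam is cut back to `beam_width` after every attribute, the work per step is bounded by `beam_width` times the number of matches for that attribute, instead of growing with the product of all earlier choices.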
    23. Score the Mappings: Example Mapping 2
        New source: S(title, label, image, type, artist); selected semantic type per attribute (other candidate in parentheses):
          title: <aac:CulturalHeritageObject, dcterms:title>, 0.19 (<aac:CulturalHeritageObject, rdfs:label>)
          label: <aac:CulturalHeritageObject, dcterms:description>, 0.7 (<aac:Person, ElementsGr2:note>)
          image: <edm:WebResource>, 0.58 (<foaf:Document>)
          type: <skos:Concept, skos:prefLabel>, 0.82 (<skos:Concept, rdfs:label>)
          artist: <foaf:Person, foaf:name>, 0.27 (<aac:Person, ElementsGr2:nameOfThePerson>)
        Coherence: 4/9 = 0.44; Confidence: (0.19 + 0.7 + 0.58 + 0.82 + 0.27)/5 = 0.51; Score: 0.48
    24. Score the Mappings: Example Mapping 1
        New source: S(title, label, image, type, artist); selected semantic type per attribute (other candidate in parentheses):
          title: <aac:CulturalHeritageObject, dcterms:title>, 0.19 (<aac:CulturalHeritageObject, rdfs:label>)
          label: <aac:CulturalHeritageObject, dcterms:description>, 0.7 (<aac:Person, ElementsGr2:note>)
          image: <edm:WebResource>, 0.58 (<foaf:Document>)
          type: <skos:Concept, skos:prefLabel>, 0.82 (<skos:Concept, rdfs:label>)
          artist: <aac:Person, ElementsGr2:nameOfThePerson>, 0.19 (<foaf:Person, foaf:name>)
        Coherence: 6/9 = 0.67; Confidence: (0.19 + 0.7 + 0.58 + 0.82 + 0.19)/5 = 0.50; Score: 0.58
        This mapping gets a higher score even though it uses the second-ranked semantic type for artist.
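The quantities used to score a mapping can be computed as below, using the formulas from the speaker notes: coherence, confidence as the average type confidence, and size reduction (u − size)/(u − l + 1) with u = 2n and l = n + 1. Slide 22 states the score as the mean of coherence and size reduction, while the worked examples on slides 23 and 24 average coherence and confidence, so `score` here simply averages whichever components it is given. Function names are illustrative.

```python
from typing import Sequence


def confidence(type_confidences: Sequence[float]) -> float:
    """Average confidence of the semantic types chosen in the mapping."""
    return sum(type_confidences) / len(type_confidences)


def coherence(nodes_in_same_component: int, total_nodes: int) -> float:
    """Share of the mapping's nodes that belong to the same component."""
    return nodes_in_same_component / total_nodes


def size_reduction(mapping_size: int, num_attributes: int) -> float:
    """(u - size) / (u - l + 1), with u = 2n (max nodes) and l = n + 1 (min nodes)."""
    u = 2 * num_attributes
    l = num_attributes + 1
    return (u - mapping_size) / (u - l + 1)


def score(*components: float) -> float:
    """Arithmetic mean of the given score components."""
    return sum(components) / len(components)


# Type confidences from slides 23-24; mapping size and n = 5 from the speaker notes.
conf_mapping1 = confidence([0.19, 0.7, 0.58, 0.82, 0.19])  # artist -> aac:Person type
conf_mapping2 = confidence([0.19, 0.7, 0.58, 0.82, 0.27])  # artist -> foaf:Person type
print(round(conf_mapping1, 3), round(conf_mapping2, 3))  # 0.496 0.512
print(size_reduction(mapping_size=8, num_attributes=5))   # 0.4
print(round(score(coherence(6, 9), conf_mapping1), 3))    # 0.581
print(round(score(coherence(4, 9), conf_mapping2), 3))    # 0.478
```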
    25. Approach: overview of the four-step pipeline (as on slide 7), introducing step 4: generate and rank semantic models
    26. Generate Semantic Models
        • Select the top M mappings
        • Compute a Steiner tree for each mapping
          – A minimal tree connecting the nodes of the mapping
        • Each tree is a candidate model
        • Rank the candidate models (Steiner trees) by
          – Cost
          – Score of the corresponding mapping
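Computing an exact minimum Steiner tree is NP-hard, so implementations typically approximate it. The sketch below uses the classic shortest-path heuristic (grow a tree from one terminal and repeatedly attach the nearest remaining terminal by a shortest path) on an undirected weighted graph; the slides do not say which algorithm the authors use. Ranking by cost first and mapping score second is also an assumption, as are the node names and edge weights.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected: node -> {neighbour: edge weight}


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (a, b, weight) edges."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def nearest_path(graph: Graph, tree: Set[str], targets: Set[str]) -> List[str]:
    """Multi-source Dijkstra from all tree nodes; path from the tree to the closest target."""
    dist = {node: 0.0 for node in tree}
    prev: Dict[str, str] = {}
    heap = [(0.0, node) for node in tree]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if node in targets:
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for nbr, w in graph.get(node, {}).items():
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr] = d + w
                prev[nbr] = node
                heapq.heappush(heap, (d + w, nbr))
    raise ValueError("some terminals are unreachable")


def steiner_tree(graph: Graph, terminals: List[str]) -> Tuple[Set[Tuple[str, str]], float]:
    """Shortest-path heuristic for a low-cost tree connecting all terminals."""
    tree_nodes = {terminals[0]}
    tree_edges: Set[Tuple[str, str]] = set()
    remaining = set(terminals) - tree_nodes
    while remaining:
        path = nearest_path(graph, tree_nodes, remaining)
        for a, b in zip(path, path[1:]):
            tree_edges.add((min(a, b), max(a, b)))
        tree_nodes.update(path)
        remaining -= tree_nodes
    cost = sum(graph[a][b] for a, b in tree_edges)
    return tree_edges, cost


def rank_models(graph: Graph,
                mappings: List[Tuple[float, List[str]]]) -> List[Tuple[float, float, List[Tuple[str, str]]]]:
    """Rank candidate models by tree cost, breaking ties with the higher mapping score."""
    models = []
    for mapping_score, terminals in mappings:
        edges, cost = steiner_tree(graph, terminals)
        models.append((cost, -mapping_score, sorted(edges)))
    return sorted(models)


g = undirected([
    ("CHO", "Person", 1.0), ("CHO", "Concept", 1.0),
    ("Aggregation", "CHO", 1.0), ("Aggregation", "WebResource", 1.0),
    ("CHO", "WebResource", 3.0),  # a costlier, ontology-only link (assumed weight)
])
edges, cost = steiner_tree(g, ["CHO", "Person", "Concept", "WebResource"])
print(sorted(edges), cost)  # the 3.0 edge is avoided; cost 4.0
```

Usage: `rank_models(g, [(0.48, ["CHO", "WebResource"]), (0.58, ["CHO", "Person", "Concept"])])` returns both candidates with cost 2.0, placing the one whose mapping scored 0.58 first.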
    27. Steiner Tree
    28. Evaluation
        • Dataset
          – 29 museum data sources
          – 332 attributes (average 11 per source)
        • Domain ontologies
          – EDM, SKOS, FOAF, ORE, ElementsGr2, AAC, DCTerms
          – 119 classes, 351 properties
        • Compute precision and recall between the learned models and the correct models, where sm is the correct model, sm' is the learned model, and rel(·) is the set of relationships in a model:
          $\text{precision} = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm')|}$ (How many of the learned relationships are correct?)
          $\text{recall} = \frac{|rel(sm) \cap rel(sm')|}{|rel(sm)|}$ (How many of the correct relationships are learned?)
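The two measures on slide 28 reduce to set operations over the relationships of the correct and learned models. In this sketch a relationship is assumed to be a (source class, property, target class) triple, and both example models are invented for illustration; they are not taken from the evaluation data.

```python
from typing import Set, Tuple

# Assumed encoding of a relationship: (source class, property, target class).
Relationship = Tuple[str, str, str]


def precision_recall(correct: Set[Relationship], learned: Set[Relationship]) -> Tuple[float, float]:
    """precision = |rel(sm) ∩ rel(sm')| / |rel(sm')|, recall = |rel(sm) ∩ rel(sm')| / |rel(sm)|."""
    overlap = len(correct & learned)
    precision = overlap / len(learned) if learned else 0.0
    recall = overlap / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical models: sm is the correct model, sm_learned the learned one.
sm = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("edm:EuropeanaAggregation", "edm:aggregatedCHO", "aac:CulturalHeritageObject"),
    ("edm:EuropeanaAggregation", "edm:hasResource", "edm:WebResource"),
}
sm_learned = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("aac:CulturalHeritageObject", "edm:hasResource", "edm:WebResource"),  # wrong subject class
}
p, r = precision_recall(sm, sm_learned)
print(round(p, 3), round(r, 3))  # 0.667 0.5
```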
    29. Quality
        • k = 1 → correct semantic type learned for only 62% of attributes
        • k = 4 → correct semantic type among the 4 learned types for 87% of attributes
        [Figure: precision and recall (y-axis, 0.3 to 1.0) vs. number of known semantic models (x-axis, 0 to 28), for k=1, k=4, and correct semantic types]
    30. Performance
        • The previous approach was not able to learn semantic models for sources with more than 4 attributes within a 1-hour timeout
        • Example: S16, with only 5 attributes → 16,633,298 mappings (29*29*29*31*22)
        [Figure: time vs. number of attributes (0 to 30), previous approach vs. new approach (Kbeam = 100)]
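The arithmetic on slide 30 can be checked directly, and the same per-attribute match counts give a rough sense of what beam pruning saves. The beam count below is an illustrative model, not a measurement from the paper: it assumes every kept partial mapping is extended with every match of the next attribute and then pruned back to the beam width. Kbeam = 100 is the value in the figure legend.

```python
import math
from typing import List


def exhaustive_mappings(matches_per_attribute: List[int]) -> int:
    """Number of complete mappings when every combination of matches is enumerated."""
    return math.prod(matches_per_attribute)


def beam_generated_mappings(matches_per_attribute: List[int], beam_width: int) -> int:
    """Partial mappings generated when only beam_width are kept after each attribute."""
    kept, generated_total = 1, 0
    for matches in matches_per_attribute:
        generated = kept * matches          # extend every kept mapping with every match
        generated_total += generated
        kept = min(beam_width, generated)   # prune back to the beam width
    return generated_total


s16 = [29, 29, 29, 31, 22]  # matches per attribute of S16, from the slide
print(exhaustive_mappings(s16))                      # 16633298
print(beam_generated_mappings(s16, beam_width=100))  # 9070
```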
    31. Related Work
        • Schema matching and schema mapping
          – iMAP [Dhamankar et al., 2004], Clio [Fagin et al., 2009]
        • Mapping databases and spreadsheets to ontologies
          – Mapping languages: D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]
          – Tools: RDOTE [Vavliakis et al., 2010], RDF123 [Han et al., 2008], XLWrap [Langegger and Wöß, 2009]
          – String similarity between column names and ontology terms [Polfliet and Ichise, 2010]
        • Understanding the semantics of Web tables
          – Use column headers and cell values to find labels and relations in a database of labels and relations populated from the Web [Wang et al., 2012] [Limaye et al., 2010] [Venetis et al., 2011]
        • Exploiting Linked Open Data (LOD)
          – Link the values to entities in LOD to find the types of the values and their relationships [Muñoz et al., 2013] [Mulwad et al., 2013]
        • Learning Semantic Definitions of Online Information Sources [Carman and Knoblock, 2007]
          – Learns LAV rules from known sources
          – Only learns descriptions that are conjunctive combinations of known descriptions
    32. Future Work
        • Scalability with respect to the number of known models
          – Create a more compact graph
          – Consolidate overlapping segments of known models
        • Leverage Linked Open Data (LOD)
          – Exploit the relationships between instances
          – Improve the accuracy of the learned relations
        • Integrate the new approach into Karma
          – http://www.isi.edu/integration/karma
          – @KarmaSemWeb
