From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge

1,266 views
1,161 views

Published on

Companies, such as Google and Microsoft, are building web-scale linked knowledge bases for the purpose of indexing and searching the Web, but these efforts do not address the problem of building accurate, fine-grained, deep knowledge bases for specific application domains. We are developing an integration framework, called Karma, which supports the rapid, end-to-end construction of such linked knowledge bases. In this talk I will describe machine-learning techniques for mapping new data sources to a domain model and linking the data across sources. I will also present several applications of this technology, including building virtual museums and integrating data sources for peacebuilding.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,266
On SlideShare
0
From Embeds
0
Number of Embeds
25
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Data are linked together.
    There could be millions of relationships about art that would surprise us but we don’t event know.
    When we learn about art, if we also put our focuses on the relationships, we might find interesting stories about things we are familiar with, or things we are not familiar with.
    And therefore, learning itself would be much more interesting.
  • Start from one topic of interest
    Explore other nodes until user generates a path of interest
  • Starts off with default generated text and images/videos
    Users can choose their own images, videos, and text
  • Starts off with default generated text and images/videos
    Users can choose their own images, videos, and text
  • Story player plays back images and videos that user has selected
    Using text to speech technology, story player will narrate information from topic to topic
  • Tokenize values in a given labeled column into pure alphabetic, numeric and symbol tokens
    Extract features from the tokens and the column name and associate them with column’s semantic type
  • From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge

    1. 1. From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge Craig Knoblock University of Southern California
    2. 2. Collaborators • Overall project • Pedro Szekely • Karma Research • Jose Luis Ambite • Yao-Yi Chiang • Shubham Gupta • Dipsy Kapoor • Ramnandan Krishnamurthy • Amol Mittal • Shrikanth Narayanan • Jason Slepicka • Mohsen Taheriyan • Bo Wu • Fengyu Yang • Peacebuilding • Prerna Dwiveldi • Xiaojiae Tan • Hongyan Yun • Virtual museum • Eleanor Fink • Jing Wan • Xuechao Zhang • LOD Stories • Jianliang Chen • Yuting Liu • Dipanwita Maulik • Miel Vander Sande • Linda Wu • Hao Zhang
    3. 3. Outline •Linked Knowledge •Examples of Using Linked Knowledge •Creating Linked Knowledge •Related Work •Discussion •Future Work
    4. 4. The Web Today Crystal Bridges Museum of American Art Dallas Museum of Art I ndianapolis Museum of Art The Met ropolitan Museum of Art Nat ional Port rait Gallery Smit hsonian American Art Museum
    5. 5. Linked Data Today Crystal Bridges Museum of American Art Dallas Museum of Art I ndianapolis Museum of Art The Met ropolitan Museum of Art Nat ional Port rait Gallery Smit hsonian American Art Museum
    6. 6. Linked Data Today Crystal Bridges Museum of American Art Dallas Museum of Art I ndianapolis Museum of Art The Met ropolitan Museum of Art Nat ional Port rait Gallery Smit hsonian American Art Museum ✔ ✖
    7. 7. Linked Data with a Shared Domain Model Crystal Bridges Museum of American Art Dallas Museum of Art I ndianapolis Museum of Art The Met ropolitan Museum of Art Nat ional Port rait Gallery Smit hsonian American Art Museum
    8. 8. Crystal Bridges Museum of American Art Dallas Museum of Art I ndianapolis Museum of Art The Met ropolitan Museum of Art Nat ional Port rait Gallery Smit hsonian American Art Museum Linked Data with a Shared Domain Model
    9. 9. Linked Knowledge (Shared model and fully linked)
    10. 10. Properties of Linked Knowledge Current • No domain model • Sparse links • Not curated • No provenance data • Not current Desired • Shared domain models • Rich links • Curated • Provenance data • Up to date
    11. 11. Outline •Linked Knowledge •Examples of Using Linked Knowledge •Creating Linked Knowledge •Related Work •Discussion •Future Work
    12. 12. American Art Collaborative • Building a virtual museum of American art • Rich, interlinked repository of cultural heritage knowledge • 13 U.S. museums with large collections of American art • Shared domain model • Links across museums • Links to other sites • Already mapped the data from the Smithsonian, Crystal Bridges, and Amon Carter museums of American art
    13. 13. Implemented in Scalar (scalar.usc.edu) Crystal Bridges Museum of American Art Smithsonian American Art Museum Amon Carter Museum of American Art
    14. 14. Andy Warhol Carnegie Institute of Technology Pittsburgh George Washington The Law of Nations Emerich deVattel LOD Stories
    15. 15. Multimedia Story Generated
    16. 16. Peacebuilding Data • Peacebuilding • Identify and implement interventions that can prevent violent conflicts • Today • Lots of data available to identify indicators of possible conflicts • But difficult to find and analyze the data World-Wide Human Geography Data Working Group Strauss CenterUnited Nations World Health Organization
    17. 17. Armed Conflict Events in the MRU
    18. 18. Food Security Indicators from UN
    19. 19. Food Security Analysis United Nations World Health Organization
    20. 20. Goal: Use the data and interconnections to predict conflicts and identify interventions Prevalence of Undernourishment in Sierra Leone Armed Conflict Events Year Percentage
    21. 21. Outline •Linked Knowledge •Examples of Using Linked Knowledge •Creating Linked Knowledge •Related Work •Discussion •Future Work
    22. 22. Steps to Create Linked Knowledge • Map data to a common representation … select ontologies … create a semantic model using the ontologies … generate RDF • Link to external resources … identify the links … curate the links
    23. 23. select ontologies
    24. 24. edm:ProvidedCHO aac:CulturalHeritageObject dcterms:creator ore:Aggregation edm:EuropeanaAggregation crm:E89_Propositional_Object edm:WebResource edm:aggregatedCHO edm:hasView edm:Agent/crm:E39_Actor, foaf:Person aac:Person rdaGr2:placeOfBirth rdaGr2:placeOfDeath edm:Place/crm:E53_Place aac:Place aac:associatedPlace schema:PostalAddress schema:address
    25. 25. edm:ProvidedCHO aac:CulturalHeritageObject skos:Concept skos:Concept edm:hasType skos:narrower skos:prefLabelskos:prefLabel saam:objectId dcterms:date dcterms:provenance dcterms:rights dcterms:subject dcterms:medium dcterms:title dcterms:description dcterms:creator ore:Aggregation edm:EuropeanaAggregation crm:E89_Propositional_Object edm:WebResource edm:aggregatedCHO edm:hasView edm:Agent/crm:E39_Actor, foaf:Person aac:Person skos:altLabel rdaGr2:dateOfDeath rdaGr2:biographicalInformation rdaGr2:placeOfBirth rdaGr2:placeOfDeath rdaGr2:dateAssociated WithThePerson edm:Place/crm:E53_Place aac:Place aac:associatedPlace schema:PostalAddressschema:addressCountry schema:addressLocality schema:addressRegion schema:address skos:prefLabel schema:Country schema:name dcterms:format rdaGr2:dateOfBirth skos:prefLabel saam:objectNumber saam:constituentId dcterms:created
    26. 26. create a semantic model using the ontologies
    27. 27. Semantic Model Describes the source in terms of the concepts and relationships defined by the domain ontology Source object property data property subClassOf Domain Ontology Person Organization Place State name birthdate bornIn worksFor state name phone name livesIn City Event ceo location organizer nearby startDate title isPartOf postalCode Column 1 Column 2 Column 3 Column 4 Column 5 Bill Gates Oct 1955 Microsoft Seattle WA Mark Zuckerberg May 1984 Facebook White Plains NY Larry Page Mar 1973 Google East Lansing MI
    28. 28. Semantic Types Column 1 Column 2 Column 3 Column 4 Column 5 Bill Gates Oct 1955 Microsoft Seattle WA Mark Zuckerberg May 1984 Facebook White Plains NY Larry Page Mar 1973 Google East Lansing MI Person Organization City State name birthdate name namename Person
    29. 29. Relationships Column 1 Column 2 Column 3 Column 4 Column 5 Bill Gates Oct 1955 Microsoft Seattle WA Mark Zuckerberg May 1984 Facebook White Plains NY Larry Page Mar 1973 Google East Lansing MI Person Organization City State name birthdate bornIn worksFor state name name name This semantic model is converted to a semantic description in R2RML
    30. 30. Karma Hierarchical Sources Services Model Karma Tabular Sources Database … Interactive tool for rapidly extracting, cleaning, transforming, integrating, and publishing data [ Knoblock, Szekely, et al. Semi-automatically mapping structured sources into the semantic web. ISWC 2012 ]
    31. 31. Karma in Action
    32. 32. Crystal Bridges RDF Data … cb:SAAMCHO_Robert_Louis_Stevenson_and_His_Wife a saam:SAAMCHO ; dcterms:created "1885" ; dcterms:creator cb:SaamPerson_John_Singer_Sargent ; dcterms:medium "Oil on canvas" ; dcterms:title "Robert Louis Stevenson and His Wife” . cb:SAAMCHO_Room a saam:SAAMCHO ; dcterms:created "2007" ; dcterms:creator cb:SaamPerson_Alison_Elizabeth_Taylor ; dcterms:format "96 x 120 x 96 in.(243.8 x 304.8 x 243.8 cm)" ; dcterms:medium "Wood veneer, pyrography, and shellac" ; dcterms:title "Room” . cb:SaamPerson_John_Singer_Sargent a saam:SaamPerson ; ont0:dateOfBirth "1879", "1885" ; ont0:dateOfDeath "1925" ; skos:prefLabel "John Singer Sargent" .
    33. 33. Approach Domain Ontology Learn Semantic Types Sample Data Construct a Graph Generate Candidate Models Rank Results
    34. 34. Learning Semantic Types • Requirements: • Learn from a small number of examples • Distinguish both string and numeric values • Can be learned quickly and is highly scalable to large numbers of semantic types Person OrganizationCity State name birthdate name namename Person name date city state workplace 1 Fred Collins Oct 1959 Seattle WA Microsoft 2 Tina Peterson May 1980 New York NY Google Domain Ontology
    35. 35. Textual Data Learning Semantic Types • Textual Data • Treat each column of data as a document • Apply TF-IDF Cosine Similarity
    36. 36. Numeric Data Learning Semantic Types (cont.) • Numeric Data: • Apply statistical hypothesis testing to determine which distribution fits best • Apply Kolmogorov-Smirnov Test • Combined Approach: • Classify each set of data first and apply appropriate test • If ambiguous, apply both methods • Top-k suggestions returned based on the confidence scores
    37. 37. Evaluation Combined approach achieves 97% accuracy on the top-4 accuracy Reduced the training time from 110s to 0.45s
    38. 38. Construct a Graph Construct a graph from semantic types and ontology date
    39. 39. Determine Relationships Select minimal tree that connects all semantic types • A customized Steiner tree algorithm [Kou & Markowsky, 1981] Initial Model date
    40. 40. Result in Karma
    41. 41. Refining the Model Correct Model Impose constraints on Steiner Tree Algorithm date
    42. 42. Final Semantic Model
    43. 43. Improved Approach Taheriyan et al., ISWC 2013, ICSC 2014 Domain Ontology Learn Semantic Types C R F Sample Data Construct a Graph Generate Candidate Models Rank Results Known Semantic Models
    44. 44. Results on 17 Geospatial Sources Source Signature #Attributes GED Previous work Current Approach nearestCity(lat, lng, city, state, country) 5 6 1 findRestaurant(zipcode, restaurantName, phone, address) 4 1 0 zipcodesInCity(city, state, postalCode) 3 3 1 parseAddress(address, city, state, zipcode, country) 5 6 1 citiesOfState(state, city) 2 1 0 ocean(lat, lng, name) 3 2 1 postalCodeLookup(zipCode, city, state, country) 4 6 1 country(lat, lng, code, name) 4 2 0 companyCEO(company, name) 2 1 0 personalInfo(firstname, lastname, birthdate, brithCity, birthCountry) 5 4 1 businessInfo(company, phone, homepage, city, country, name) 6 10 8 restaurantChef(restaurant, firstname, lastname) 3 2 1 findSchool(city, state, name, code, homepage, ranking, dean) 7 8 6 employees(organization, firstname, lastname, birthdate) 4 1 2 education(person, hometown, homecountry, school, city, country) 6 9 4 administrativeDistrict(city, province, country) 3 4 1 capital(country, city) 2 2 1 TOTAL 68 68 29 57% improvement GED = Graph Edit Distance
    45. 45. Results on 6 Museum Sources Source Signature #Attributes GED Previous work Current Approach S1(Attribution, BeginDate, EndDate, Title, Dated, Medium, Dimensions) 7 1 0 S2(ObjectID, ObjectTitle, ObjectWorkType, ArtistName, ArtistBirthDate, ArtistDeathDate, ObjectEarliestDate, ObjectRights, ObjectFacetValue1) 9 2 3 S3(death, birth, name) 3 0 0 S4(accessionNumber, artist, creditLine, dimensions, imageURL, materials, relatedArtworksURL, creationDate, provenance, keywordValues) 10 9 6 S5(AccessionNumber, Classification, CreditLine, Date, Description, DimensionsOrphan, WhatValues, Who, image, relatedArtworksValues) 10 9 5 S6(Artist, ArtistBornDate, ArtistDiedDate, Classification, Copyright, CreditLine, Image, KeywordValues, Ref, SitterValues) 10 8 6 TOTAL 49 29 20 31% improvement GED = Graph Edit Distance
    46. 46. Demonstration of Source Modeling
    47. 47. identifying and curating links
    48. 48. Multiple “John Singer Sargent” ima:Person_John_Singer_Sargent a aac-ont:Person ; dct:date "1856-1925" ; foaf:name "John Singer Sargent" . saam:Person_4253 a aac-ont:Person ; aac-ont:associatedPlace saam:SaamPlace_1357324439768t1r13950_0, saam:SaamPlace_1357324439768t1r13951_0 ; saam:constituentId "4253" ; rdaGr2:biographicalInformation “Painter. Sargent traveled …" ; rdaGr2:dateAssociatedWithThePerson "1990-10-1”, "1995-5-8" ; rdaGr2:dateOfBirth "1856-1-12" ; rdaGr2:dateOfDeath "1925-4-15" ; rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ; rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ; foaf:name "John S. Sargent" ; skos:altLabel "John S. Sargent" ; skos:prefLabel "John Singer Sargent" . cb:Person_John_Singer_Sargent a aac-ont:Person ; ont0:dateOfBirth "1879", "1885" ; ont0:dateOfDeath "1925" ; foaf:name "John Singer Sargent" . met:Person_John_Singer_Sargent a aac-ont:Person ; ont0:placeOfResidence "North and Central America", "United States" ; foaf:name "John Singer Sargent" . dallas:Person_John_Singer_Sargent a aac-ont:Person ; ont0:dateOfBirth "1856" ; ont0:dateOfDeath "1925" ; foaf:name "John Singer Sargent" .
    49. 49. Pedro Szekely and Craig KnoblockUniversity of Southern California John Singer Sargent ima:SaamPerson_John_Singer_Sargent a saam:SaamPerson ; dct:date "1856-1925" ; foaf:name "John Singer Sargent" . saam:SaamPerson_4253 a saam:SaamPerson ; saam:associatedPlace saam:SaamPlace_1357324439768t1r13950_0, saam:SaamPlace_1357324439768t1r13951_0 ; saam:constituentId "4253" ; rdaGr2:biographicalInformation “Painter. Sargent traveled …" ; rdaGr2:dateAssociatedWithThePerson "1990-10-1”, "1995-5-8" ; rdaGr2:dateOfBirth "1856-1-12" ; rdaGr2:dateOfDeath "1925-4-15" ; rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ; rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ; skos:altLabel "John S. Sargent" ; skos:prefLabel "John Singer Sargent" . cb:SaamPerson_John_Singer_Sargent a saam:SaamPerson ; ont0:dateOfBirth "1879", "1885" ; ont0:dateOfDeath "1925" ; skos:prefLabel "John Singer Sargent" . met:SaamPerson_John_Singer_Sargent a saam:SaamPerson ; ont0:placeOfResidence "North and Central America", "United States" ; foaf:name "John Singer Sargent" . dallas:SaamPerson_John_Singer_Sargent a saam:SaamPerson ; ont0:dateOfBirth "1856" ; ont0:dateOfDeath "1925" ; foaf:name "John Singer Sargent" .
    50. 50. Interactive Linking in Karma Integrated a linking services that allows a user to reconcile the entities with other sources
    51. 51. Intuition Estimate discrimination power of properties, e.g., of name, birth and death dates birth date death date # of people … … … 1800 1820 147 1800 1821 284 1800 1822 213 … … … every combination of dates Song, D., Heflin, J.: Domain-independent entity coreference for linking ontology instances. ACM Journal of Data and Information Quality (ACM JDIQ) (2012) similar idea to
    52. 52. Linking “John Singer Sargent” saam:Person_4253 owl:sameAs cb:Person_John_Singer_Sargent ; owl:sameAs dallas:Person_John_Singer_Sargent ; owl:sameAs ima:Person_John_Singer_Sargent ; owl:sameAs met:Person_John_Singer_Sargent ; owl:sameAs dbpedia:John_Singer_Sargent ; owl:sameAs nytimes:N49129220686803623753 ; owl:sameAs w-flick:John_Singer_Sargent ; ... .
    53. 53. Evaluation of Automatic Linking SAAM names starting with “A” matched by hand  535 people  176 matches estimate ≈ 30 missing links to DBpedia
    54. 54. Curating Links with Karma
    55. 55. Curating Links with Karma
    56. 56. Related Work • Source modeling • Semantic typing of inputs and outputs [Heß et al., 2003] [Lerman et al., 2006] [Stonebraker et al., 2013] • Schema mapping (e.g., Clio [Fagin et al., 2009]) • Mapping languages (e.g., D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]) • Linking • Tools for linking data across sources (e.g., Silk [Volz et al., 2009], LIMES [Ngomo & Auer]) • Support for missing values and variation in discriminability [Song & Heflin, 2012] • Knowledge graphs • Google Knowledge Graph & Microsoft Santori knowledge repository • Linked Data Integration Framework (LDIF) [Schultz et al, 2012]
    57. 57. Related Work: Cultural Heritage Data • Europeana project (Haslhofer & Isaac, 2011) • 30 million objects from 2,300 institutions • Integrated, but shallow domain model with limited semantics and sparsely linked • Isolated jewels • Amsterdam Museum (Boer et al, 2012) • Museum Finland (Hyvonen et al., 2005) • LODAC museums in Japan (Matsumura et al., 2012) • British Museum (collection.brishmuseum.org) • Detailed models, but few links
    58. 58. Discussion Ability to create linked knowledge • Shared domain models • Karma provides semi-automatic and interactive approach to building the source models • Rich links • Integrated an initial linking capability in Karma – now working on general capability • Curated • Curation approach currently being integrated into Karma • Provenance data • Now store basic provenance info using PROV, but more to be done • Up to date • Source models make it possible to refresh the data, but more to be done
    59. 59. Future Work • End-to-end approach to building linked knowledge in Karma • Learning links from examples • Extraction from text • Visualization tools • Ontology refinement • Transformation learning • Geospatial data support • Support for cloud-based execution
    60. 60. Links http://www.isi.edu/integration/karma/ Papers available at http://www.isi.edu/~knoblock
    61. 61. Acknowledgements • Thanks to our sponsors: • Smithsonian American Art Museum • Center for Global Health and Peacebuilding • National Science Foundation • Intelligence Advanced Research Projects Activity • Air Force Research Laboratory
    62. 62. Thanks!

    ×