NLP Linked Open Data "Is a" Solution

Demonstration of solving the "is a" natural language problem using linked open data, the DBPedia.org RDF store, and Semantic Web technologies.

Published in: Technology, Lifestyle

  1. LINKED OPEN DATA SERVICES
     • Is a “cat” a “mammal”? … true
     • Is a “lizard” a “reptile”? … true
     • Is a “cat” a “reptile”? … false
     • Is a “lizard” an “animal”? … true
     • …
  2. Four types of “IS A” query
     • Shallow query TO rdf:type: Is a “Cat” an “Animal”?
       “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal
     • Deep query THROUGH rdf:type: Is a “Cat” a “Eukaryote”?
       “Cat” -> rdf:type -> http://dbpedia.org/ontology/Animal
       “Animal” -> rdfs:subClassOf -> http://dbpedia.org/ontology/Eukaryote
  3. Four types of “IS A” query
     • Shallow query TO dcterms:subject: Is a “Cat” a “Feline”?
       “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines
     • Deep query THROUGH dcterms:subject: Is a “Cat” a “Felid”?
       “Cat” -> dcterms:subject -> http://dbpedia.org/resource/Category:Felines
       http://dbpedia.org/resource/Category:Felines -> skos:broader -> http://dbpedia.org/resource/Category:Felids
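The four query types differ in two ways. The first is which predicate is followed from the term (rdf:type or dcterms:subject). The second is whether the matching hierarchy (rdfs:subClassOf or skos:broader) is climbed afterwards. Below is a minimal Python sketch of that distinction. It assumes a tiny hand-written triple list with abbreviated prefixes (dbr:, dbo:, dbc:) instead of the live DBpedia.org store. The names TRIPLES, HIERARCHY and is_a are illustrative only.

```python
from collections import deque

# Hypothetical sample mirroring the slide examples (not live DBpedia data).
TRIPLES = [
    ("dbr:Cat", "rdf:type", "dbo:Animal"),
    ("dbo:Animal", "rdfs:subClassOf", "dbo:Eukaryote"),
    ("dbr:Cat", "dcterms:subject", "dbc:Felines"),
    ("dbc:Felines", "skos:broader", "dbc:Felids"),
]

# First-hop predicate -> predicate used to climb its hierarchy.
HIERARCHY = {"rdf:type": "rdfs:subClassOf", "dcterms:subject": "skos:broader"}


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def is_a(term, target, via, deep=False):
    """Shallow (deep=False): one hop TO the target via `via`.
    Deep (deep=True): one hop via `via`, then any number of hierarchy hops
    THROUGH parent nodes; a seen set prevents revisiting nodes."""
    frontier = deque(objects(term, via))
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        if not deep:
            continue
        for parent in objects(node, HIERARCHY[via]):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return False
```

For example, is_a("dbr:Cat", "dbo:Eukaryote", "rdf:type") returns False because the query is shallow. The same call with deep=True returns True.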
  4. CONVERT TRIPLES TO NEW VOCABULARY: Re-link the Linked Data
  5. [Diagram: Original graph from DBPedia.org. A Term links to an Ontology Resource via rdf:type and to a Category Resource via dcterms:subject. Ontology Resources form a structured hierarchy via rdfs:subClassOf; Category Resources (Wikipedia categories) link to broader categories via skos:broader.]
  6. Normalize Labels for Search
     • http://dbpedia.org/resource/United_States -> “united states”
     • http://dbpedia.org/ontology/PopulatedPlace -> “populated place”
     • http://dbpedia.org/class/yago/CountriesBorderingTheAtlanticOcean -> “countries bordering the atlantic ocean”
     • http://dbpedia.org/resource/Category:Former_British_colonies -> “former british colonies”
     [Diagram: Term / Resource -> rdfs:label -> String: Label]
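The slide lists normalized labels but does not say how they were produced. The Python sketch below shows one way to derive the same strings from the URIs. It assumes that a label is the last URI segment with any "Category:" prefix removed, underscores and camelCase split into words, and the result lower-cased. The function normalize_label is hypothetical.

```python
import re
from urllib.parse import unquote


def normalize_label(uri):
    """Derive a lower-case search label from the last segment of a DBpedia URI
    (assumed rule, not the deck's documented method)."""
    local = unquote(uri.rstrip("/").rsplit("/", 1)[-1])  # e.g. "Category:Former_British_colonies"
    local = local.split(":", 1)[-1]                      # drop a "Category:"-style prefix
    local = local.replace("_", " ")                      # underscores -> spaces
    # Insert a space at lower/digit -> upper boundaries: "PopulatedPlace" -> "Populated Place"
    local = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local)
    return " ".join(local.split()).lower()               # collapse whitespace, lower-case
```

Each of the four URIs on the slide maps to the label shown next to it. The rule would need adjustment for YAGO class names that end in numeric synset IDs.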
  7. Update Types to Vulcan Vocabulary (prefix halo-uri: http://halo.vulcan.com/lod/2013/11/isa-vocabulary#)
     • http://dbpedia.org/resource/United_States -> halo-uri:term, halo-uri:term-id
     • http://dbpedia.org/ontology/PopulatedPlace -> halo-uri:rdf-type
     • http://dbpedia.org/resource/Category:Former_British_colonies -> halo-uri:wikipedia-category
     [Diagram: Term / Resource -> rdf:type -> Halo Vocabulary]
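A Python sketch of the retyping step follows. It assumes the Halo type can be chosen from the URI namespace alone. Category URIs map to wikipedia-category. DBpedia ontology and YAGO class URIs map to rdf-type. Everything else maps to term. The names halo_type and retype are hypothetical, and halo-uri:term-id is not modeled here.

```python
HALO = "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed namespace rules; the deck does not state how types were assigned.
CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"
TYPE_PREFIXES = ("http://dbpedia.org/ontology/", "http://dbpedia.org/class/yago/")


def halo_type(uri):
    """Pick a Halo vocabulary type for a DBpedia URI.
    Categories are checked first because they also live under /resource/."""
    if uri.startswith(CATEGORY_PREFIX):
        return HALO + "wikipedia-category"
    if uri.startswith(TYPE_PREFIXES):
        return HALO + "rdf-type"
    return HALO + "term"


def retype(uri):
    """Return the (subject, rdf:type, Halo type) triple for a URI."""
    return (uri, RDF_TYPE, halo_type(uri))
```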
  8. Add “Is A” Connection for Graphs (prefix halo-uri: http://halo.vulcan.com/lod/2013/11/isa-vocabulary#)
     • http://dbpedia.org/resource/United_States halo-uri:isA http://dbpedia.org/ontology/Place
     • http://dbpedia.org/resource/United_States halo-uri:isA http://dbpedia.org/resource/Category:Republic
     [Diagram: Term -> halo-uri:isA -> Type, Category, or Graph]
  9. Add “Is A” Connection for Ontology (prefix halo-uri: http://halo.vulcan.com/lod/2013/11/isa-vocabulary#)
     • http://dbpedia.org/resource/United_States halo-uri:isA http://dbpedia.org/ontology/Place
     [Diagram: the Term is linked by halo-uri:isA into the ontology tree (Country, Populated Place, Place, Thing)]
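One reading of this slide is that a term receives a halo-uri:isA link to its direct type and to every ancestor class, up to Thing. The Python sketch below materializes those links under that assumption. It walks a hand-written child-to-parent map that assumes a single parent per class. The names SUBCLASS_OF and ontology_isa are illustrative.

```python
# Hypothetical fragment of the DBpedia ontology: child class -> parent class.
SUBCLASS_OF = {
    "dbo:Country": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
    "dbo:Place": "owl:Thing",
}


def ontology_isa(term, direct_type):
    """Return halo:isA triples from term to its direct type and each ancestor.
    A seen set stops the walk if the map ever contains a cycle."""
    triples, seen = [], set()
    cls = direct_type
    while cls is not None and cls not in seen:
        seen.add(cls)
        triples.append((term, "halo:isA", cls))
        cls = SUBCLASS_OF.get(cls)  # None once the root is reached
    return triples
```

Precomputing these links moves the deep (THROUGH) lookup into a single stored hop. With the links in place, “is a United States a Place?” reduces to a shallow lookup at query time.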
  10. Add “Is A” Connection for Categories (prefix halo-uri: http://halo.vulcan.com/lod/2013/11/isa-vocabulary#)
     • http://dbpedia.org/resource/United_States halo-uri:isA http://dbpedia.org/resource/Category:Republic
     [Diagram: the Term is linked by halo-uri:isA into the category tree (Countries, Republics). “Is A” connections are not applied higher up the category tree (Political Theories, Philosophies).]
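The slide says that category links stop partway up the tree, but it does not give the cutoff. The Python sketch below assumes a configurable hop limit (max_depth, hypothetical) over skos:broader. It keeps a visited set because real category data can contain cycles. The BROADER map is invented for illustration and is not real Wikipedia data.

```python
from collections import deque

# Hypothetical category fragment: category -> broader categories (with a cycle).
BROADER = {
    "Category:Republics": ["Category:Countries"],
    "Category:Countries": ["Category:Political_theories", "Category:Republics"],
    "Category:Political_theories": ["Category:Philosophies"],
}


def category_isa(term, subjects, max_depth=1):
    """Link term to its dcterms:subject categories (depth 0) and to broader
    categories up to max_depth hops; the seen set prevents looping on cycles."""
    result = []
    seen = set(subjects)
    queue = deque((c, 0) for c in subjects)
    while queue:
        cat, depth = queue.popleft()
        result.append((term, "halo:isA", cat))
        if depth == max_depth:
            continue  # do not apply "is a" higher up the tree
        for parent in BROADER.get(cat, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return result
```

With a large max_depth, the walk in this sample reaches Category:Philosophies. That matches the kind of drift the deck later calls “all paths lead to: Philosophy”, and it is one reason to keep the limit small.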
  11. Search Terms and Context
     [Diagram: new graph with links to DBPedia.org. Each Term has a Halo label and Halo type and is linked by halo-uri:isA to nodes in an ontology graph and a category graph, which carry their own Halo labels and types. The custom graph replaces the original hierarchy.]
  12. [Repeat of slide 5: original graph from DBPedia.org]
  13. EXAMPLE 1: Query TO and THROUGH rdf:type
  14. Query TO and THROUGH rdf:type
     • Is a “cat” a “mammal”?
     • Is a “cat” an “animal”?
     • Is a “lizard” a “reptile”?
     • Is a “lizard” an “animal”?
     [Diagram: Cat halo-uri:isA Mammal and Animal; Lizard halo-uri:isA Reptile and Animal]
  15. Query TO and THROUGH rdf:type
     • Is a “cat” a “reptile”?
     • Is a “cat” an “animal”?
     [Diagram: Cat halo-uri:isA Reptile; Cat halo-uri:isA Animal]
  16. Query TO and THROUGH rdf:type
     • Is a “cat” a “mal”?
     [Diagram: Cat halo-uri:isA Mammal and Cat halo-uri:isA Animal, both marked “?”: the partial label “mal” matches both]
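The “mal” case follows from the unanchored, case-insensitive regex that the deck's SPARQL applies to the domain label. Any label that contains the query string matches. The Python sketch below reproduces the effect with Python's re module and also shows the anchored alternative. Python's regex engine is only an approximation of the SPARQL engine here. The names DOMAIN_LABELS and match_domains are illustrative.

```python
import re

DOMAIN_LABELS = ["mammal", "animal", "reptile"]


def match_domains(query, labels, exact=False):
    """Approximate FILTER(regex(str(?domainLabel), query, 'i')).
    exact=False: unanchored substring match, so 'mal' hits several labels.
    exact=True: anchored with ^...$ so only whole labels match."""
    pattern = re.escape(query)
    if exact:
        pattern = "^" + pattern + "$"
    return [label for label in labels if re.search(pattern, label, re.IGNORECASE)]
```

Here match_domains("mal", DOMAIN_LABELS) returns ["mammal", "animal"], and the anchored form returns an empty list.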
  17. Query TO and THROUGH rdf:type
     • XML: http://halo.vulcan.com:8080/isa/cat/type/animal.xml
     • XML: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.xml
     Sample response:
       <item>
         <id>9c5eebf630d626279fa6acbe1f50c9b9</id>
         <term>cat</term>
         <domain>animal</domain>
         <match>true</match>
         <triples>
           <s>http://dbpedia.org/resource/Cat</s>
           <p>http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA</p>
           <o>http://dbpedia.org/ontology/Animal</o>
           <search>
             <p>http://www.w3.org/2000/01/rdf-schema#label</p>
             <o>animal</o>
           </search>
         </triples>
       </item>
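A client can read the XML response with the standard library. The Python sketch below parses the sample copied from the slide rather than calling the service. It assumes that each additional triple would appear as another <triples> element directly under <item>, but the slide shows only one. The name parse_isa_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<item>
  <id>9c5eebf630d626279fa6acbe1f50c9b9</id>
  <term>cat</term>
  <domain>animal</domain>
  <match>true</match>
  <triples>
    <s>http://dbpedia.org/resource/Cat</s>
    <p>http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA</p>
    <o>http://dbpedia.org/ontology/Animal</o>
    <search>
      <p>http://www.w3.org/2000/01/rdf-schema#label</p>
      <o>animal</o>
    </search>
  </triples>
</item>"""


def parse_isa_xml(text):
    """Return (match, [(s, p, o), ...]) from an "is a" XML response.
    findtext("o") reads only the direct <o> child, not the one inside <search>."""
    root = ET.fromstring(text)
    match = root.findtext("match") == "true"
    triples = [(t.findtext("s"), t.findtext("p"), t.findtext("o"))
               for t in root.findall("triples")]
    return match, triples
```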
  18. Query TO and THROUGH rdf:type
     • JSON: http://halo.vulcan.com:8080/isa/cat/type/animal.json
     • JSON: http://halo.vulcan.com:8080/isa/cat/type-graph/animal.json
     Sample response:
       {
         "id": "9c5eebf630d626279fa6acbe1f50c9b9",
         "term": "cat",
         "domain": "animal",
         "match": true,
         "triples": [{
           "s": "http://dbpedia.org/resource/Cat",
           "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
           "o": "http://dbpedia.org/ontology/Animal",
           "search": {
             "p": "http://www.w3.org/2000/01/rdf-schema#label",
             "o": "animal"
           }
         }]
       }
  19. Query TO and THROUGH rdf:type
     • Halo.Vulcan SPARQL: Is a “cat” a “mammal”?
       PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       SELECT DISTINCT ?p ?o ?domainLabel WHERE {
         GRAPH ?G {
           ?term halo:isA ?o .
           ?term ?p ?o .
           ?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .
           ?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .
           ?o rdf:type halo:rdf-type .
           FILTER (regex(str(?termLabel), '^cat$', 'i')) .
           FILTER (regex(str(?domainLabel), 'mammal', 'i'))
         }
       } LIMIT 100
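The query is a template. Only the term, the domain and the Halo type (rdf-type or wikipedia-category) change between the rdf:type and category examples. The Python sketch below fills that template. It escapes user input twice: first for the regex, then for a single-quoted SPARQL string. Without that escaping, a term containing “.” or “'” would change the meaning of the query or break it. The helper names are hypothetical, and the escaped character set covers the common XPath/SPARQL regex metacharacters.

```python
TEMPLATE = """PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?p ?o ?domainLabel WHERE {{
  GRAPH ?G {{
    ?term halo:isA ?o .
    ?term ?p ?o .
    ?term rdfs:label ?termLabel .
    ?o rdfs:label ?domainLabel .
    ?o rdf:type halo:{kind} .
    FILTER (regex(str(?termLabel), '^{term}$', 'i'))
    FILTER (regex(str(?domainLabel), '{domain}', 'i'))
  }}
}} LIMIT 100"""

# Characters with special meaning in SPARQL (XPath) regular expressions.
REGEX_META = set("\\|.?*+(){}[]^$")


def regex_escape(text):
    """Backslash-escape regex metacharacters."""
    return "".join("\\" + ch if ch in REGEX_META else ch for ch in text)


def sparql_string(text):
    """Escape backslashes and single quotes for a '...' SPARQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_isa_query(term, domain, kind="rdf-type"):
    """Fill the "is a" query template; kind selects the Halo type of ?o."""
    if kind not in ("rdf-type", "wikipedia-category"):
        raise ValueError("kind must be 'rdf-type' or 'wikipedia-category'")
    return TEMPLATE.format(kind=kind,
                           term=sparql_string(regex_escape(term)),
                           domain=sparql_string(regex_escape(domain)))
```

The call build_isa_query("cat", "animal", kind="wikipedia-category") produces the category variant of the same query.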
  20. EXAMPLE 2: Query TO and THROUGH category
  21. Query TO and THROUGH category
     • Is a “cat” an “animal”?
     [Diagram: Cat halo-uri:isA Invasive animal species, Animals described in 1758, Domesticated animals]
  22. Searches are Returned with Triples
       "triples": [{
         "s": "http://dbpedia.org/resource/Cat",
         "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
         "o": "http://dbpedia.org/resource/Category:Invasive_animal_species",
         "search": {
           "p": "http://www.w3.org/2000/01/rdf-schema#label",
           "o": "invasive animal species"
         }
       }, {
         "s": "http://dbpedia.org/resource/Cat",
         "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
         "o": "http://dbpedia.org/resource/Category:Animals_described_in_1758",
         "search": {
           "p": "http://www.w3.org/2000/01/rdf-schema#label",
           "o": "animals described in 1758"
         }
       }]
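The returned triples act as evidence for a match, because each search label shows why a category counted. The Python sketch below extracts the category URI and its label from each triple. It assumes that category responses use the same envelope (term, domain, match, triples) as the rdf:type JSON example, since this slide shows only the triples array. The function name evidence is hypothetical.

```python
import json

SAMPLE = """{"term": "cat", "domain": "animal", "match": true, "triples": [
  {"s": "http://dbpedia.org/resource/Cat",
   "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
   "o": "http://dbpedia.org/resource/Category:Invasive_animal_species",
   "search": {"p": "http://www.w3.org/2000/01/rdf-schema#label", "o": "invasive animal species"}},
  {"s": "http://dbpedia.org/resource/Cat",
   "p": "http://halo.vulcan.com/lod/2013/11/isa-vocabulary#isA",
   "o": "http://dbpedia.org/resource/Category:Animals_described_in_1758",
   "search": {"p": "http://www.w3.org/2000/01/rdf-schema#label", "o": "animals described in 1758"}}]}"""


def evidence(text):
    """Map each matched object URI to the label that satisfied the search;
    return an empty dict when the response reports no match."""
    doc = json.loads(text)
    if not doc.get("match"):
        return {}
    return {t["o"]: t["search"]["o"] for t in doc.get("triples", [])}
```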
  23. Query TO and THROUGH category
     • Halo.Vulcan SPARQL: Is a “cat” an “animal”?
       PREFIX halo: <http://halo.vulcan.com/lod/2013/11/isa-vocabulary#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       SELECT DISTINCT ?p ?o ?domainLabel WHERE {
         GRAPH ?G {
           ?term halo:isA ?o .
           ?term ?p ?o .
           ?term <http://www.w3.org/2000/01/rdf-schema#label> ?termLabel .
           ?o <http://www.w3.org/2000/01/rdf-schema#label> ?domainLabel .
           ?o rdf:type halo:wikipedia-category .
           FILTER (regex(str(?termLabel), '^cat$', 'i')) .
           FILTER (regex(str(?domainLabel), 'animal', 'i'))
         }
       } LIMIT 100
  24. ADDITIONAL SERVICES: More graphs, flexible service points, and unexpected features…
  25. There are 4 graphs in Virtuoso (Conductor: http://halo.vulcan.com:8890/conductor/sparql_graph.vspx)
     • http://halo.vulcan.com:8890/isa/rdf-type: ~87,069 triples
     • http://halo.vulcan.com:8890/isa/rdf-type-graph: ~101,823 triples
     • http://halo.vulcan.com:8890/isa/category-type: ~292,239 triples
     • http://halo.vulcan.com:8890/isa/category-type-graph: ~560,906 triples
     652,677 NORMALIZED TRIPLES ACROSS ALL GRAPHS
  26. “Find All” Service Points
     Why query every instance? Just ask the service for all relations to a term in a given graph.
     • All “is a” matches in rdf-type: http://halo.vulcan.com:8080/isa/cat/type/.xml
     • All “is a” matches in rdf-type graph: http://halo.vulcan.com:8080/isa/cat/type-graph/.xml
     • All “is a” matches in categories: http://halo.vulcan.com:8080/isa/cat/category/.xml
     • All “is a” matches in categories graph: http://halo.vulcan.com:8080/isa/cat/category-graph/.xml
     • All “is a” matches in every graph: http://halo.vulcan.com:8080/isa/cat/.xml
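A single find-all response can answer several “is a” questions on the client, with no further requests. The Python sketch below assumes the response's triples entries follow the JSON shape shown earlier. It treats a domain as matched when the domain appears in any search label, which mirrors the unanchored regex in the deck's SPARQL. The sample data and the name answer_locally are hypothetical.

```python
# Hypothetical find-all result, shaped like the "triples" entries on the slides.
FIND_ALL = [
    {"o": "http://dbpedia.org/ontology/Animal", "search": {"o": "animal"}},
    {"o": "http://dbpedia.org/ontology/Mammal", "search": {"o": "mammal"}},
]


def answer_locally(triples, domains):
    """Answer several "is a" questions from one find-all response.
    A domain matches if it occurs (case-insensitively) in any search label."""
    labels = [t["search"]["o"].lower() for t in triples]
    return {d: any(d.lower() in label for label in labels) for d in domains}
```

The trade-off is one larger response in exchange for fewer round trips. Substring matching keeps the same “mal” ambiguity described for slide 16.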
  27. Unanticipated Features
     • Some “is a” relations need absolute (exact) matching on the domain
       - Add a URL for literal matching and update the SPARQL regex
     • Spaces and special characters must be URL-encoded because each call is a GET request
     • Wikipedia categories are of poor quality:
       - User-defined and often inaccurate
       - Very specific: “Animal species described in 1705”
       - Cyclical: “Republics -> Countries -> United States -> Republics …”
       - All paths lead to: Philosophy
     • Plural categories make literal matching difficult and produce odd “is a” statements: Is a cat a felines?
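The plural-category problem can be reduced with a literal match that also accepts a regular English plural suffix on the category label. With that rule, “feline” matches “felines” while exact matching is kept otherwise. The Python sketch below is a naive illustration, not the deck's method. It handles only the “s” and “es” suffixes, so irregular plurals such as mouse/mice still fail. The name plural_tolerant_match is hypothetical.

```python
import re


def plural_tolerant_match(query, label):
    """Whole-label, case-insensitive match that also accepts the label being
    the query plus a regular plural suffix ('s' or 'es')."""
    pattern = re.escape(query.strip()) + "(?:e?s)?"
    return re.fullmatch(pattern, label.strip(), re.IGNORECASE) is not None
```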
  28. Example Service URLs
     Query rdf-types graph:
     • Domain-range query: http://halo.vulcan.com:8080/isa/cat/type/animal.json
     • Term isa * query: http://halo.vulcan.com:8080/isa/cat/type/.json
     Query rdf-types and parents graph:
     • Domain-range query: http://halo.vulcan.com:8080/isa/lizard/type-graph/reptile.json
     • Term isa * query: http://halo.vulcan.com:8080/isa/lizard/type-graph/.json
     Query category graph:
     • Domain-range query: http://halo.vulcan.com:8080/isa/cat/category/animal.json
     • Term isa * query: http://halo.vulcan.com:8080/isa/cat/category/.json
     Query category and parents graph:
     • Domain-range query: http://halo.vulcan.com:8080/isa/cat/category-graph/animal.json
     • Term isa * query: http://halo.vulcan.com:8080/isa/cat/category-graph/.json
     Query all IsA graphs (all associated entity types, categories, and parents):
     • Domain-range query: http://halo.vulcan.com:8080/isa/cat/animal.json
     • Term isa * query: http://halo.vulcan.com:8080/isa/cat/.json
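The service URLs follow one pattern: base/term/[graph/]domain.format. An empty domain gives the “term isa *” form, and omitting the graph queries every graph. The Python sketch below builds these URLs and percent-encodes the term and the domain, since the calls are GETs. It assumes the server decodes %20 and similar escapes, which the deck does not confirm. The function isa_url is hypothetical.

```python
from urllib.parse import quote

BASE = "http://halo.vulcan.com:8080/isa"
GRAPHS = {None, "type", "type-graph", "category", "category-graph"}


def isa_url(term, domain="", graph=None, fmt="json"):
    """Build an "is a" service URL following the slide's patterns.
    domain="" -> "term isa *" (find all); graph=None -> all IsA graphs."""
    if graph not in GRAPHS:
        raise ValueError(f"unknown graph: {graph}")
    if fmt not in ("json", "xml"):
        raise ValueError(f"unknown format: {fmt}")
    parts = [BASE, quote(term, safe="")]  # encode spaces, slashes, etc.
    if graph:
        parts.append(graph)
    return "/".join(parts) + "/" + quote(domain, safe="") + "." + fmt
```

For example, isa_url("lizard", "reptile", "type-graph") reproduces the rdf-types-and-parents URL listed above.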
