Case acquisition from text: Ontology-based information extraction with SCOOBIE for myCBR

myCBR is a freely available tool for rapid prototyping of similarity-based retrieval applications such as case-based product recommender systems. It provides easy-to-use model generation, data import, similarity modelling, explanation, and testing functionality together with comfortable graphical user interfaces. SCOOBIE is an ontology-based information extraction system that uses symbolic background knowledge to extract information from text. Its extraction results depend on the knowledge fragments already available. In this paper we show how to use SCOOBIE to generate cases from texts. More concretely, we use ontologies of the Web of Data, published as so-called Linked Data, and interlink them with myCBR’s case model. We present a way of formalising a case model as a Linked Data-ready ontology and connecting it with other ontologies of the Web of Data in order to get richer cases.

  1. Competence Center Case-Based Reasoning. CASE ACQUISITION FROM TEXT: ONTOLOGY-BASED INFORMATION EXTRACTION WITH SCOOBIE FOR MYCBR. Thomas Roth-Berghofer, Benjamin Adrian, and Andreas Dengel, German Research Center for Artificial Intelligence (DFKI GmbH). Thursday, 5 August 2010
  2. COMPETENCE CENTER CASE-BASED REASONING (CC CBR): Klaus-Dieter Althoff, Thomas Roth-Berghofer, Armin Stahl. © 2010 DFKI CC CBR
  3. COMPETENCE CENTER CASE-BASED REASONING (CC CBR): Klaus-Dieter Althoff, Thomas Roth-Berghofer, Armin Stahl, Kerstin Bach, Régis Newo
  4. MOTIVATION
  5. MOTIVATION. Diagram labels: Ontologies; Texts; SCOOBIE; Ontology-based Information Extraction; RDF
  6. MOTIVATION. Diagram labels: Ontologies; Texts; SCOOBIE +; Ontology-based Information Extraction; RDF
  7. MOTIVATION. [Figure: Linked Open Data cloud diagram, as of July 2009, showing interlinked datasets such as DBpedia, Freebase, Geonames, DBLP, WordNet, UniProt, and PubMed.] Diagram labels: Ontologies; Texts; SCOOBIE +; Ontology-based Information Extraction; RDF
  9. OVERVIEW • Ontology-based Information Extraction with SCOOBIE • Recap of myCBR • myCBR+SCOOBIE • Outlook and future work
  10. SCOOBIE: ONTOLOGY-BASED INFORMATION EXTRACTION
  11. SCOOBIE. Diagram labels: Ontologies; Texts; Ontology-based Information Extraction; RDF
  12. EXTRACT PLAIN TEXT
  13. EXTRACT TOKENS
  16. RECOGNISE SYMBOLS
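
The slides name the steps (extract plain text, extract tokens, recognise symbols) without showing how they work. As a rough illustration only, and not SCOOBIE's actual implementation, the last two steps can be sketched as a tokeniser followed by a dictionary lookup of ontology labels. The label table, the `tokenize` and `recognise_symbols` helpers, and the greedy longest-match strategy are all assumptions made for this example; the URIs reuse the recipe namespace that appears on the later slides.

```python
import re

# Hypothetical label-to-URI table, standing in for labels taken from an ontology.
LABELS = {
    ("onions",): "http://mycbr-project.net/models/Recipe#Onions",
    ("shallots",): "http://mycbr-project.net/models/Recipe#Shallots",
    ("velveeta", "cheese"): "http://mycbr-project.net/models/Recipe#velveeta_cheese",
}
MAX_LABEL_LEN = max(len(label) for label in LABELS)


def tokenize(text):
    """Split plain text into lower-cased word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recognise_symbols(tokens):
    """Greedy longest-match lookup of token n-grams in the label table.

    Returns (start_index, label_tokens, uri) tuples in text order.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest possible label first, then shorter ones.
        for n in range(min(MAX_LABEL_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LABELS:
                matches.append((i, candidate, LABELS[candidate]))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return matches


if __name__ == "__main__":
    text = "Fry the shallots and onions, then melt the Velveeta cheese."
    for start, label, uri in recognise_symbols(tokenize(text)):
        print(start, " ".join(label), uri)
```

A real system would presumably also handle inflection, ambiguity, and overlapping labels; this sketch only shows the idea of mapping token sequences to ontology URIs.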
  21. RECAP: MOTIVATION FOR DEVELOPING MYCBR • Need for a freely available “out of the box” tool: • compact and easy to use • comfortable graphical user interface for defining case representations, modeling knowledge-intensive similarity measures, and testing retrieval functionality • support for rapid prototyping • adaptable & extendable
  22. ➜ ECCBR 2008. Armin Stahl and Thomas R. Roth-Berghofer. Rapid prototyping of CBR applications with the open source tool myCBR. In Ralph Bergmann and Klaus-Dieter Althoff, editors, Advances in Case-Based Reasoning. Springer Verlag, 2008.
  23. MOTIVATION. [Figure: Linked Open Data cloud diagram, as of July 2009, showing interlinked datasets such as DBpedia, Freebase, Geonames, DBLP, WordNet, UniProt, and PubMed.] Diagram labels: Ontologies; Texts; SCOOBIE +; Ontology-based Information Extraction; RDF
  24. SEMANTIC WEB VISION “The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, May 2001
  25. SEMANTIC WEB VISION “The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” • Web of content • Web pages linked by semantic relations • Machines are able to process contents and links. T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, May 2001
  27. WEB OF DATA • Characteristics: • Expressed in RDF • Identified by URIs • Accessible via HTTP
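
To make the three characteristics concrete, the sketch below shows one common way a client might dereference a resource URI over HTTP and ask for an RDF serialisation through the `Accept` header. This is an illustration, not something taken from the slides: the helper names `build_rdf_request` and `fetch_rdf` are made up, and whether a given server honours the requested media type is an assumption.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request asking for an RDF serialisation of a resource."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri, timeout=10):
    """Dereference the URI and return the response body as text (needs network access)."""
    with urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds and inspects the request; call fetch_rdf() to actually send it.
    request = build_rdf_request("http://dbpedia.org/resource/Onion")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```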
  28. WEB OF TRIPLES
      <rdf:Description rdf:about="http://dbtropes.org/resource/Main/Ratatouille#Remy">
        <does-not-like rdf:resource="http://mycbr-project.net/models/Recipe#velveeta_cheese"/>
      </rdf:Description>
  29. WEB OF TRIPLES • Characteristics: • Expressed in RDF • Identified by URIs • Accessible via HTTP
      <rdf:Description rdf:about="http://dbtropes.org/resource/Main/Ratatouille#Remy">
        <does-not-like rdf:resource="http://mycbr-project.net/models/Recipe#velveeta_cheese"/>
      </rdf:Description>
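
The RDF/XML fragment on this slide encodes a single subject-predicate-object triple. As a minimal sketch of how it can be read back, the example below wraps the fragment in an `rdf:RDF` element and extracts the triple with Python's standard XML parser. The slide does not declare a namespace for `does-not-like`, so the default namespace `http://example.org/vocab#` is a made-up placeholder, and `simple_triples` is a hypothetical helper that handles only this one pattern, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's snippet wrapped in rdf:RDF. The default namespace for
# <does-not-like> is a placeholder; the slide does not declare one.
DOCUMENT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/vocab#">
  <rdf:Description rdf:about="http://dbtropes.org/resource/Main/Ratatouille#Remy">
    <does-not-like rdf:resource="http://mycbr-project.net/models/Recipe#velveeta_cheese"/>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Yield (subject, predicate, object) for rdf:Description elements whose
    property elements carry an rdf:resource attribute."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if not (subject and obj):
                continue
            tag = prop.tag
            if tag.startswith("{"):
                # ElementTree writes tags as "{namespace}local"; join them into a URI.
                namespace, local = tag[1:].split("}", 1)
                predicate = namespace + local
            else:
                predicate = tag
            yield subject, predicate, obj


if __name__ == "__main__":
    for triple in simple_triples(DOCUMENT):
        print(triple)
```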
  31. USING LINKED DATA FOR CASE GENERATION. [Figure: excerpt of the Linked Open Data cloud diagram with a box labelled "Case Model".]
  32. USING LINKED DATA FOR CASE GENERATION. [Figure: excerpt of the Linked Open Data cloud diagram with a box labelled "Case Model".]
      <skos:Concept rdf:about="http://mycbr-project.net/models/Recipe#Shallots">
        <skos:prefLabel>Shallots</skos:prefLabel>
        <rdf:type rdf:resource="ingredients_vegetables"/>
      </skos:Concept>
      <skos:Concept rdf:about="http://mycbr-project.net/models/Recipe#Onions">
        <skos:prefLabel>Onions</skos:prefLabel>
        <rdf:type rdf:resource="ingredients_vegetables"/>
      </skos:Concept>
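
The two concepts on this slide follow a regular pattern: one `skos:Concept` per symbolic value of the case model, with the value as preferred label. A minimal sketch of generating such a description from a list of values is given below. It is not the authors' export code; the function `concepts_to_rdf` is hypothetical, and it assumes that value names can be appended to the model namespace unchanged (no spaces or special characters).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
MODEL_NS = "http://mycbr-project.net/models/Recipe#"  # namespace used on the slides

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)


def concepts_to_rdf(values, type_name):
    """Serialise symbolic attribute values as skos:Concept elements."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for value in values:
        concept = ET.SubElement(root, f"{{{SKOS_NS}}}Concept",
                                {f"{{{RDF_NS}}}about": MODEL_NS + value})
        label = ET.SubElement(concept, f"{{{SKOS_NS}}}prefLabel")
        label.text = value
        # The slide gives the type as a relative reference; it is kept as-is here.
        ET.SubElement(concept, f"{{{RDF_NS}}}type",
                      {f"{{{RDF_NS}}}resource": type_name})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(concepts_to_rdf(["Shallots", "Onions"], "ingredients_vegetables"))
```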
  34. USING LINKED DATA FOR CASE GENERATION. [Figure: excerpt of the Linked Open Data cloud diagram with boxes labelled "Connection Model" and "Case Model".]
  35. USING LINKED DATA FOR CASE GENERATION. [Figure: Linked Open Data cloud diagram, as of July 2009, with the labels "owl:sameAs", "Connection Model", and "Case Model".]
  36. USING LINKED DATA FOR CASE GENERATION. [Figure: Linked Open Data cloud diagram, as of July 2009, with the labels "owl:sameAs", "Connection Model", and "Case Model".]
      <http://mycbr-project.net/models/Recipe#onions> owl:sameAs <http://dbpedia.org/resource/Onion>
      <http://mycbr-project.net/models/Recipe#green_fettuccine> owl:sameAs <http://dbpedia.org/resource/Fettucine>
      <http://mycbr-project.net/models/Recipe#spinach_noodles> owl:sameAs <http://dbpedia.org/resource/Noodle>
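
The identity links on the last slide connect values of the case model to resources in DBpedia. As a hedged sketch of how such links might be used when building cases, the example below reads the three statements into a lookup table and resolves the symbolic values of one case to external URIs. The helper names `load_same_as` and `link_case` are made up for this illustration, the statements are matched with a simple regular expression rather than a full RDF parser, and the sample case values are invented.

```python
import re

# Links as listed on the slide, written with the prefixed name owl:sameAs.
LINKS = """\
<http://mycbr-project.net/models/Recipe#onions> owl:sameAs <http://dbpedia.org/resource/Onion>
<http://mycbr-project.net/models/Recipe#green_fettuccine> owl:sameAs <http://dbpedia.org/resource/Fettucine>
<http://mycbr-project.net/models/Recipe#spinach_noodles> owl:sameAs <http://dbpedia.org/resource/Noodle>
"""

LINK_PATTERN = re.compile(r"<([^>]+)>\s+owl:sameAs\s+<([^>]+)>")


def load_same_as(text):
    """Map each case-model URI to the external URI it is declared identical to."""
    return dict(LINK_PATTERN.findall(text))


def link_case(case_values, same_as,
              model_ns="http://mycbr-project.net/models/Recipe#"):
    """Return {value: external URI or None} for the symbolic values of one case."""
    return {value: same_as.get(model_ns + value) for value in case_values}


if __name__ == "__main__":
    same_as = load_same_as(LINKS)
    # Invented example case: one linked value, one without a link.
    print(link_case(["onions", "velveeta_cheese"], same_as))
```

Values without a link simply map to `None`, so a case can still be built from the case model alone and enriched only where a connection exists.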
