
BOA - Bootstrapping Linked Data

Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass solely a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, an iterative bootstrapping strategy for extracting RDF from unstructured data. The idea behind BOA is to use the Data Web as background knowledge for the extraction of natural language patterns that represent predicates found on the Data Web. These patterns are used to extract instance knowledge from natural language text. This knowledge is finally fed back into the Data Web, therewith closing the loop. We evaluate our approach on two data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first repository of natural language representations of predicates found on the Data Web.

BOA - Bootstrapping Linked Data

  1. Daniel Gerber and Axel-Cyrille Ngonga Ngomo, AKSW, Universität Leipzig
  2. Bootstrapping the Data Web: Motivation
     ๏ Most knowledge bases are extracted from (semi-)structured data
     ๏ Only 15-20% of information is in structured data
     ๏ Semantic Web ⬌ Document Web
     ๏ How can we extract data from the document-oriented Web?
     WeKEx@ISWC - 17.01.2012 - Page 2 http://boa.aksw.org
  3. Idea I (facts as triples on the Data Web)
     dbpedia:Barack_Obama dbpedia-owl:birthPlace dbpedia:Honolulu,_Hawaii
     dbpedia:Barack_Obama dbpedia-owl:spouse dbpedia:Michelle_Obama
     dbpedia:Barack_Obama dbpedia-owl:party dbpedia:Democratic_Party
  4. Idea II (the same facts in natural language text)
     ๏ Barack Obama was born in Honolulu, Hawaii.
     ๏ Barack Hussein Obama is a politician of the Democratic Party.
     ๏ Obama married Michelle Robinson in 1992.
  5. Idea III (the same patterns in sentences about other entities)
     ๏ married: Jackie Bouvier Kennedy Onassis who married John F. Kennedy was tied to the Auchinclosses via her sister's marriage into the Auchincloss family.
     ๏ is a politician of the: Joseph Martin "Joschka" Fischer (born 1948-04-12) is a politician of the German Green Party.
     ๏ was born in: Dietrich's only child, Maria Elisabeth Sieber, was born in Berlin on 13 December 1924.
  6. Related Work
     ๏ ReadTheWeb project: NELL (Never-Ending Language Learner)
     ๏ PROSPERA: Scalable Knowledge Harvesting with High Precision and High Recall
  7. The BOA approach
     [Figure: architecture of the BOA approach with five numbered steps. Labels on the slide: Web; Corpus Extraction (Crawler, Cleaner, Indexer); Corpora; Data Web; SPARQL; Background Knowledge; Knowledge Acquisition; Pattern Search; Pattern Scoring; Filtering; Patterns; RDF Generation; "Use in next iteration".]
  8. Knowledge acquisition
     Query template (square brackets with | denote alternatives):
       SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
       WHERE {
         ?x rdf:type dbpedia-owl:[Organisation|Person|Place] .
         ?x rdfs:label ?xLabel .
         ?y rdfs:label ?yLabel .
         [?y ?prop ?x | ?x ?prop ?y] .
         FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' ) .
         ?prop rdfs:range ?range .
         ?prop rdfs:domain ?domain .
       }
     Example result:
       ?x       http://dbpedia.org/resource/Google
       ?xLabel  "Google"
       ?prop    http://dbpedia.org/ontology/subsidiary
       ?y       http://dbpedia.org/resource/YouTube
       ?yLabel  "Youtube"
       ?domain  http://dbpedia.org/ontology/Company
       ?range   http://dbpedia.org/ontology/Company
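
The query on the knowledge acquisition slide is a template rather than a single valid SPARQL query. A minimal sketch of how such a template could be expanded, assuming one concrete query per class and per triple direction; the function name `build_queries` and the prefix declarations are illustrative assumptions and are not taken from BOA:

```python
from itertools import product

# Prefix declarations are an assumption; the slide omits them.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
"""

# The alternatives written in square brackets on the slide.
CLASSES = ["Organisation", "Person", "Place"]
DIRECTIONS = ["?y ?prop ?x", "?x ?prop ?y"]

# Doubled braces are literal braces for str.format.
TEMPLATE = """\
SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
WHERE {{
  ?x rdf:type dbpedia-owl:{cls} .
  ?x rdfs:label ?xLabel .
  ?y rdfs:label ?yLabel .
  {direction} .
  FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' ) .
  ?prop rdfs:range ?range .
  ?prop rdfs:domain ?domain .
}}"""


def build_queries():
    """Return one concrete query string per (class, direction) combination."""
    return [
        PREFIXES + TEMPLATE.format(cls=cls, direction=direction)
        for cls, direction in product(CLASSES, DIRECTIONS)
    ]


queries = build_queries()
```

Each resulting string could then be sent to a SPARQL endpoint; the slides do not say how BOA issues the queries, so that step is left out here.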
  9. Pattern Search
     (1) Set of entities s and o connected through p
     (2) Find all sentences which contain s and o
     (3) Replace labels with variables (?D?, ?R?)
     BOA pattern: (dbpedia-owl:spouse, “?D? with his wife ?R?”)
     BOA pattern mapping: dbpedia-owl:spouse ↣ { “?D? with his wife ?R?”, “?D? and her husband ?R?”, “?D? and his wife ?R?” }
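
The three pattern search steps can be sketched roughly as follows. This is a simplified illustration that assumes exact label matching on plain strings; the function name `find_patterns` and the example sentence are made up for the sketch, and BOA's actual search over an indexed corpus is not shown on the slides.

```python
import re


def find_patterns(triples, sentences):
    """Collect (predicate, pattern) pairs from sentences that mention both labels.

    triples:   iterable of (subject_label, predicate, object_label)
    sentences: iterable of plain-text sentences
    """
    found = []
    for s_label, predicate, o_label in triples:
        for sentence in sentences:
            # Step 2: the sentence must contain both labels.
            if s_label not in sentence or o_label not in sentence:
                continue
            # Step 3: replace the labels with the variables ?D? and ?R?.
            marked = sentence.replace(s_label, "?D?").replace(o_label, "?R?")
            # Keep the span from the first variable to the next one.
            match = re.search(r"\?[DR]\?.*?\?[DR]\?", marked)
            # Skip spans that start and end with the same variable.
            if match and match.group(0)[:3] != match.group(0)[-3:]:
                found.append((predicate, match.group(0)))
    return found


# Hypothetical input: the triple is from slide 3, the sentences are invented.
triples = [("Barack Obama", "dbpedia-owl:spouse", "Michelle Obama")]
sentences = [
    "Barack Obama and his wife Michelle Obama visited the school.",
    "Michelle Obama gave a speech.",
]
patterns = find_patterns(triples, sentences)
```

Here the subject label is mapped to ?D? (domain) and the object label to ?R? (range), which matches the spouse and subsidiary examples on the slides.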
  10. Pattern Scoring - Support
      A pattern should be used across several triples in the background knowledge.
      subsidiary ↣ “?R? was acquired by ?D?”
      ๏ [Google, DoubleClick] ↣ 2
      ๏ [General Motors, Opel] ↣ 1
      ๏ [Cablevision, Rainbow Media] ↣ 4
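
The counts on the support slide suggest two statistics per pattern: how many distinct entity pairs it connects, and how often it occurs for a single pair. A minimal sketch of these statistics follows. The way `support_score` combines them (a product of logarithms) is only an assumption made for illustration; the slides do not give BOA's formula.

```python
import math
from collections import Counter


def support_stats(occurrences):
    """Return (number of distinct pairs, highest count for a single pair).

    occurrences: list of (subject_label, object_label), one entry per sentence
    in which the pattern connected that pair.
    """
    counts = Counter(occurrences)
    return len(counts), max(counts.values(), default=0)


def support_score(occurrences):
    # Illustrative combination, not confirmed by the slides.
    distinct_pairs, max_per_pair = support_stats(occurrences)
    return math.log1p(distinct_pairs) * math.log1p(max_per_pair)


# Counts taken from the slide for "?R? was acquired by ?D?".
occurrences = (
    [("Google", "DoubleClick")] * 2
    + [("General Motors", "Opel")] * 1
    + [("Cablevision", "Rainbow Media")] * 4
)
stats = support_stats(occurrences)
```

With the slide's numbers, `stats` holds three distinct pairs and a maximum of four occurrences for one pair.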
  11. Pattern Scoring - Specificity
      A pattern should not be used by many pattern mappings.
      ๏ subsidiary: “?D? agreed to buy ?R?”
      ๏ subsidiary: “?R? is a part of ?D?”
      ๏ foundationOrganisation: “?R? is a part of ?D?”
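
One way to express the specificity idea is an inverse-document-frequency style measure over pattern mappings. The sketch below assumes that form; it illustrates the criterion stated on the slide and is not claimed to be BOA's exact definition.

```python
import math


def specificity(pattern, mappings):
    """Score a pattern higher the fewer pattern mappings contain it.

    mappings: dict from predicate to the set of patterns mapped to it.
    """
    used_by = sum(1 for patterns in mappings.values() if pattern in patterns)
    if used_by == 0:
        raise ValueError("pattern does not occur in any mapping")
    return math.log(len(mappings) / used_by)


# The two mappings shown on the slide.
mappings = {
    "subsidiary": {"?D? agreed to buy ?R?", "?R? is a part of ?D?"},
    "foundationOrganisation": {"?R? is a part of ?D?"},
}
specific = specificity("?D? agreed to buy ?R?", mappings)
unspecific = specificity("?R? is a part of ?D?", mappings)
```

Under this assumption, "?R? is a part of ?D?" scores zero because both mappings use it, while "?D? agreed to buy ?R?" scores higher because only subsidiary uses it.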
  12. Pattern Scoring - Typicity
      A pattern should be used to connect entities of the correct type.
      ๏ Hypercom was acquired by Verifone .
      ๏ Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O
      ๏ Maktoob was acquired by Yahoo!
      ๏ Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O
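
The typicity check can be illustrated with the two tagged sentences from the slide. The sketch assumes a deliberately simple measure: the fraction of sentences whose named-entity tags match the types expected for the property. The function names and the measure itself are assumptions; BOA's scoring may be more elaborate.

```python
def entity_tags(tagged_sentence):
    """Return the tags of the named entities in a word_TAG sentence, in order.

    Consecutive tokens with the same tag count as one entity.
    """
    tags = []
    previous = "O"
    for token in tagged_sentence.split():
        _, _, tag = token.rpartition("_")
        if tag != "O" and tag != previous:
            tags.append(tag)
        previous = tag
    return tags


def typicity(tagged_sentences, expected_tags):
    """Fraction of sentences whose entity tags equal the expected tags."""
    if not tagged_sentences:
        return 0.0
    hits = sum(1 for s in tagged_sentences if entity_tags(s) == list(expected_tags))
    return hits / len(tagged_sentences)


# Tagged sentences copied from the slide.
tagged = [
    "Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O",
    "Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O",
]
# Assumption: both arguments of subsidiary should be organisations.
score = typicity(tagged, ("ORG", "ORG"))
```

Only the first sentence has two ORG entities, so half of the occurrences count as typical under this measure.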
  13. RDF Generation
      Pattern: ?D? with his wife ?R?
      Sentence: Pacheco arrived with his wife Leyla Rodriguez Stahl and several...
      NER-tagged: Pacheco_PER arrived_O with_O his_O wife_O Leyla_PER Rodriguez_PER Stahl_PER and_O
      Resulting statements (the slide marks four graph elements as NEW):
        dbpedia:Abel_Pacheco dbpedia-owl:spouse boa:Leyla_Rodriguez_Stahl
        dbpedia:Abel_Pacheco rdf:type dbpedia-owl:Person
        dbpedia:Abel_Pacheco rdfs:label "Abel Pacheco"@en
        boa:Leyla_Rodriguez_Stahl rdf:type dbpedia-owl:Person
        boa:Leyla_Rodriguez_Stahl rdfs:label "Leyla Rodriguez Stahl"@en
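
The RDF generation example can be traced with a short sketch. It assumes that the pattern text is located in the tagged sentence and that the nearest named entity on each side fills ?D? and ?R? (in the slide's sentence, "arrived" sits between "Pacheco" and the pattern text). The `known` lookup that resolves "Pacheco" to dbpedia:Abel_Pacheco is a hypothetical stand-in for entity resolution, which the slides do not describe.

```python
def entity_spans(tagged_sentence):
    """Split a word_TAG sentence into (text, tag) spans, merging entity tokens."""
    spans = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("_")
        if spans and tag != "O" and spans[-1][1] == tag:
            spans[-1] = (spans[-1][0] + " " + word, tag)
        else:
            spans.append((word, tag))
    return spans


def nearest_entity(spans):
    """Return the first span that is a named entity, or None."""
    return next(((text, tag) for text, tag in spans if tag != "O"), None)


def extract_pair(pattern, tagged_sentence):
    """Return (domain_label, range_label) if the pattern text occurs, else None."""
    infix = pattern.replace("?D?", "").replace("?R?", "").split()
    spans = entity_spans(tagged_sentence)
    n = len(infix)
    for i in range(len(spans) - n + 1):
        if [word for word, _ in spans[i:i + n]] != infix:
            continue
        left = nearest_entity(reversed(spans[:i]))
        right = nearest_entity(spans[i + n:])
        if left and right:
            domain_first = pattern.index("?D?") < pattern.index("?R?")
            return (left[0], right[0]) if domain_first else (right[0], left[0])
    return None


def to_uri(label, known):
    # Labels without a known URI get a new URI in the boa: namespace.
    return known.get(label, "boa:" + label.replace(" ", "_"))


sentence = ("Pacheco_PER arrived_O with_O his_O wife_O "
            "Leyla_PER Rodriguez_PER Stahl_PER and_O")
pair = extract_pair("?D? with his wife ?R?", sentence)
known = {"Pacheco": "dbpedia:Abel_Pacheco"}  # hypothetical lookup table
triple = (to_uri(pair[0], known), "dbpedia-owl:spouse", to_uri(pair[1], known))
```

The resulting `triple` corresponds to the spouse statement shown on the slide; the rdf:type and rdfs:label statements would be generated analogously.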
  14. Evaluation I
      Corpus        en-wiki             en-news
      Language      english             english
      Topic         general knowledge   news
      # of lines    44.7M               256.1M
      # of words    1,032.1M            5,068.7M
      [Figure: number of triples per property (riverMouth, musicalArtist, musicalBand, award, writer, almaMater, occupation, formerTeam, deathPlace, birthPlace), and number of triples in which Place, Person and Organisation instances occur as subject or as object. Values shown on the slide: 158697, 551693, 327430, 137990, 72820, 64239; their assignment to the bars cannot be recovered from the transcript.]
  15. Evaluation II
                                en-wiki                       en-news
                                LOC       PER       ORG       LOC       PER       ORG
      Triples extracted         1465      8817      2567      488       903       916
      Triples in DBpedia        138       183       48        52        44        7
      Evaluated triples         100 (8)   100 (1)   100 (1)   100 (1)   100 (7)   100 (0)
      Precision (%)             90.5      97        99        61.5      73.5      91
      New true statements*      1200      8375      2494      268       631       827
      Found pattern mappings    62        72        59        49        70        55
      Found patterns            123k      136k      38k       569k      465k      92k
      Scored patterns           1045      612       241       3832      7294      1077
      * Number of extracted statements not found in DBpedia multiplied by the precision of our approach
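
The footnote on the Evaluation II slide defines the "new true statements" row as the extracted statements not found in DBpedia multiplied by the precision. A quick check of that arithmetic against the slide's figures; the function name is illustrative, and how the slide rounds its values is not stated:

```python
def new_true_statements(extracted, in_dbpedia, precision_percent):
    """(extracted - already in DBpedia) * precision, as defined in the footnote."""
    return (extracted - in_dbpedia) * precision_percent / 100


# (extracted, in DBpedia, precision in %, value reported on the slide)
rows = {
    ("en-wiki", "LOC"): (1465, 138, 90.5, 1200),
    ("en-wiki", "PER"): (8817, 183, 97.0, 8375),
    ("en-wiki", "ORG"): (2567, 48, 99.0, 2494),
    ("en-news", "LOC"): (488, 52, 61.5, 268),
    ("en-news", "PER"): (903, 44, 73.5, 631),
    ("en-news", "ORG"): (916, 7, 91.0, 827),
}

estimates = {
    key: new_true_statements(extracted, known, precision)
    for key, (extracted, known, precision, _) in rows.items()
}

for key, value in estimates.items():
    print(key, round(value, 1), "reported:", rows[key][3])
```

All six reported values agree with the computed ones to within one statement.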
  16. Future Work
      ๏ Iteration 1+
      ๏ Human feedback
      ๏ Pattern generalization
      ๏ Datatype properties
      ๏ Languages/corpora
      ๏ Web services
  17. Conclusion
      ๏ No manually created seed patterns needed
      ๏ 95.5% precision on DBpedia/Wikipedia
      ๏ Output can easily be integrated into the LOD Cloud
      ๏ Library of natural-language representations of formal relations, demo
      ๏ Quasi language-independent (German/Korean)
  18. Thank you! Questions?
      Daniel Gerber, SIMBA@AKSW
      Johannisgasse 26, Room 5-21, 04103 Leipzig, Germany
      http://bis.informatik.uni-leipzig.de/DanielGerber
      http://boa.aksw.org
      http://code.google.com/p/boa
      LOD2 Presentation, 02.09.2010, http://lod2.eu
