Extracting Multilingual Natural-Language Patterns for RDF Predicates

Most knowledge sources on the Data Web were extracted from structured or semi-structured data. Thus, they encompass only a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to extract natural-language patterns that represent predicates found on the Data Web from unstructured data by using background knowledge from the Data Web. These patterns are then used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, thereby closing the loop. The approach followed by BOA is quasi-independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy.
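
The pattern extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the BOA implementation: the function name `find_patterns`, the `max_gap_tokens` cutoff, and the use of plain string matching on labels are all assumptions made for the example. The placeholders `?D?` (subject label) and `?R?` (object label) follow the notation used in the slides.

```python
import re

def find_patterns(pairs, sentences, max_gap_tokens=5):
    """Collect the text between a subject label and an object label.

    pairs: (subject_label, object_label) tuples already known to be
           connected by one predicate in the background knowledge.
    max_gap_tokens: hypothetical cutoff to keep patterns short.
    """
    patterns = []
    for subj, obj in pairs:
        # Try both orders: subject first (?D? ... ?R?) and object first (?R? ... ?D?).
        for first, second, template in (
            (subj, obj, "?D? {} ?R?"),
            (obj, subj, "?R? {} ?D?"),
        ):
            regex = re.compile(
                re.escape(first) + r"\s+(.+?)\s+" + re.escape(second)
            )
            for sentence in sentences:
                match = regex.search(sentence)
                if match and len(match.group(1).split()) <= max_gap_tokens:
                    patterns.append(template.format(match.group(1)))
    return patterns

sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Hussein Obama is a politician of the Democratic Party.",
]
print(find_patterns([("Barack Obama", "Honolulu, Hawaii")], sentences))
# prints ['?D? was born in ?R?']
```

A real system would match surface forms of each entity rather than a single label, which is why the second sentence ("Barack Hussein Obama") is not picked up by this simplified version.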

Extracting Multilingual Natural-Language Patterns for RDF Predicates

  1. BOA: Extracting Multilingual Natural-Language Patterns for RDF Predicates. Daniel Gerber, Axel-Cyrille Ngonga Ngomo. AKSW, Universität Leipzig.
  2. Bootstrapping the Data Web: Motivation
     ๏ Most knowledge bases are extracted from (semi-)structured data
     ๏ Only 15-20 % of information is in structured data
     ๏ Semantic Web ⬌ Document Web
     ๏ How can we extract data from the document-oriented web?
     EKAW - 10.10.2012 - http://boa.aksw.org
  3. Idea I
     dbpedia:Barack_Obama dbpedia-owl:birthPlace dbpedia:Honolulu,_Hawaii
     dbpedia:Barack_Obama dbpedia-owl:spouse dbpedia:Michelle_Obama
     dbpedia:Barack_Obama dbpedia-owl:party dbpedia:Democratic_Party
  4. Idea II
     Barack Obama [was born in] Honolulu, Hawaii.
     Barack Hussein Obama [is a politician of the] Democratic Party.
     Obama [married] Michelle Robinson in 1992.
  5. Idea III
     married: Jackie Bouvier Kennedy Onassis who [married] John F. Kennedy was tied to the Auchinclosses via her sister's marriage into the Auchincloss family.
     is a politician of the: Joseph Martin "Joschka" Fischer (born 1948-04-12) [is a politician of the] German Green Party.
     was born in: Dietrich's only child, Maria Elisabeth Sieber, [was born in] Berlin on 13 December 1924.
  6. The BOA approach
     [Figure: overview of the BOA approach with steps numbered 1 to 8. Labeled components: Data Web, SPARQL, Surface forms, Web, Corpus Extraction Module, Crawler, Cleaner, Indexer, Corpora, Search & Filter, Patterns, Feature Extraction, Neural Network, Filter, Generation.]
  7. Pattern Search
     (1) Set of entities s and o connected through p
     (2) Find all sentences which contain s and o
     (3) Replace labels with variables (?D?, ?R?)
     BOA pattern: “?D? with his wife ?R?”
     BOA pattern mapping: dbpedia-owl:spouse ↣ “?D? with his wife ?R?”, “?D? and her husband ?R?”, “?D? and his wife ?R?”
  8. Feature Extraction - Language Independent
     subsidiary ↣ “?Company was acquired by ?Company”
     Support: pattern should be used across several triples
       ๏ Google - DoubleClick: 2
       ๏ General Motors - Opel: 1
       ๏ Cablevision - Rainbow Media: 4
     Specificity: pattern should not be used by many pattern mappings
       ๏ subsidiary: “?R? is a part of ?D?”
       ๏ foundationOrg: “?R? is a part of ?D?”
     Typicity: pattern should be used to connect entities of correct type
       ๏ Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O
  9. Feature Extraction - Language Dependent
     Intrinsic Information Content Metric
       ๏ dbpedia:subsidiary rdfs:label “subsidiary”@en
       ๏ WordNet
       ๏ ?D? was acquired by ?R?
     ReVerb
       ๏ Open Information Extraction
       ๏ Patterns need to abide by a POS tag sequence
       ๏ Logistic Regression Classifier
  10. BOA Neural Network
      ๏ 200 patterns are manually classified as good (1) or bad (0)
      ๏ up to 18 ReVerb features, depending on language
      [Figure: network with Input Layer, Hidden Layer and Output Layer; labeled inputs include Specificity, IICM and Typicity; value ranges marked [0,1].]
  11. RDF Generation
      Pattern: ?D? with his wife ?R?
      Sentence: Pacheco arrived with his wife Leyla Rodriguez Stahl and several...
      Tagged: Pacheco_PER arrived_O with_O his_O wife_O Leyla_PER Rodriguez_PER Stahl_PER and_O
      Resulting graph (newly generated elements are marked NEW on the slide):
        dbpedia:Abel_Pacheco dbpedia-owl:spouse boa:Leyla_Rodriguez_Stahl
        dbpedia:Abel_Pacheco rdf:type dbpedia-owl:Person
        dbpedia:Abel_Pacheco rdfs:label "Abel Pacheco"@en
        boa:Leyla_Rodriguez_Stahl rdf:type dbpedia-owl:Person
        boa:Leyla_Rodriguez_Stahl rdfs:label "Leyla Rodriguez Stahl"@en
  12. Evaluation I
      Corpus                    en-wiki            en-news   de-wiki            de-news
      Language                  English            English   German             German
      Topic                     general knowledge  news      general knowledge  news
      # of sentences            58M                214.2M    24.6M              112.8M
      # of tokens per sentence  21.4               22.1      17.4               18.3
  13. Evaluation II
      Corpus                  en-wiki  en-news  de-wiki  de-news
      # of pattern mappings   125      44       66       19
      # of patterns           9551     586      7366     109
      # of new triples        78944    22883    10138    883
      # of known triples      1829     798      655      42
      # of found triples      80773    3081     10793    925
      Precision Top-100       92 %     70 %     91 %     74 %
  14. Conclusion
      ๏ No manually created seed patterns needed
      ๏ > 90% precision for German and English dataset
      ๏ High recall through surface forms
      ๏ Output easily integrable in LOD Cloud
      ๏ Library of natural-language representations of formal relations, Demo
  15. BOA Graphical User Interface: http://boa.aksw.org
  16. Thank you! Questions?
      Daniel Gerber
      Augustusplatz 10, Room P616
      04109 Leipzig, Germany
      SIMBA@AKSW
      http://bis.informatik.uni-leipzig.de/DanielGerber
      http://boa.aksw.org
      http://code.google.com/p/boa
      LOD2 Presentation . 02.09.2010 . http://lod2.eu
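
The RDF Generation slide can be illustrated with a small Python sketch that applies one pattern to an NER-tagged sentence. It is a simplified reading of the slide, not BOA's code: the helper names (`parse_tagged`, `entity_spans`, `apply_pattern`) are hypothetical, the pattern is assumed to have the shape `?D? <words> ?R?`, and the nearest entity on each side of the pattern words is taken as subject and object. Mapping the extracted labels to URIs such as dbpedia:Abel_Pacheco is left out.

```python
def parse_tagged(tagged):
    """Split 'Token_TAG' items into (token, tag) pairs."""
    return [tuple(item.rsplit("_", 1)) for item in tagged.split()]

def entity_spans(tokens):
    """Return (start, end, label) for each maximal run of equal non-O tags."""
    spans, start = [], None
    # A trailing sentinel with tag O closes a run that reaches the sentence end.
    for i, (_, tag) in enumerate(tokens + [("", "O")]):
        if start is not None and (tag == "O" or tag != tokens[start][1]):
            spans.append((start, i, " ".join(t for t, _ in tokens[start:i])))
            start = None
        if tag != "O" and start is None:
            start = i
    return spans

def apply_pattern(pattern, predicate, tagged):
    """Return a (subject_label, predicate, object_label) triple or None."""
    tokens = parse_tagged(tagged)
    words = [t for t, _ in tokens]
    spans = entity_spans(tokens)
    # Assumes the pattern has the form '?D? <infix words> ?R?'.
    infix = pattern.replace("?D?", "").replace("?R?", "").split()
    for i in range(len(words) - len(infix) + 1):
        if words[i:i + len(infix)] == infix:
            left = [s for s in spans if s[1] <= i]
            right = [s for s in spans if s[0] >= i + len(infix)]
            if left and right:
                # Nearest entity before and after the matched words.
                return (left[-1][2], predicate, right[0][2])
    return None

tagged = ("Pacheco_PER arrived_O with_O his_O wife_O "
          "Leyla_PER Rodriguez_PER Stahl_PER and_O")
print(apply_pattern("?D? with his wife ?R?", "dbpedia-owl:spouse", tagged))
# prints ('Pacheco', 'dbpedia-owl:spouse', 'Leyla Rodriguez Stahl')
```

Taking the nearest entity rather than requiring strict adjacency is what lets the pattern match here even though "arrived" sits between "Pacheco" and "with his wife".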

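
The Evaluation II figures can be cross-checked with simple arithmetic. The sketch below assumes that the number of found triples is the sum of new and known triples; the slides do not state this, but it holds exactly for three of the four corpora. The dictionary layout and the function name `inconsistent_corpora` are illustrative only.

```python
# Figures copied from the Evaluation II slide as transcribed.
results = {
    "en-wiki": {"new": 78944, "known": 1829, "found": 80773},
    "en-news": {"new": 22883, "known": 798, "found": 3081},
    "de-wiki": {"new": 10138, "known": 655, "found": 10793},
    "de-news": {"new": 883, "known": 42, "found": 925},
}

def inconsistent_corpora(table):
    """List corpora where new + known differs from found (assumed relation)."""
    return [
        corpus
        for corpus, row in table.items()
        if row["new"] + row["known"] != row["found"]
    ]

print(inconsistent_corpora(results))
# prints ['en-news']
```

Under that assumption the en-news column does not add up as transcribed, so one of its three figures may contain a transcription slip; the other columns are internally consistent.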