SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasoning

The creation of new knowledge in the Semantic Web increasingly depends on automatic knowledge enrichment processes such as semi-structural Information Extraction (IE), for example the creation of DBpedia from Wikipedia. To further improve knowledge coverage, IE must also consider unstructured plain natural-language text. SOFIE offers a novel approach to IE that can consistently enrich semantic models from text sources by combining pattern matching, entity disambiguation and reasoning in a propositional-logic approach based on MAX SAT.


  1. SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasoning
     Tobias Wunner
     Unit for Natural Language Processing (UNLP)
     firstname.lastname@deri.org
     Wednesday, 22nd June 2011
     DERI, Reading Group
  2. Based On:
     “SOFIE: A Self-Organizing Framework for Information Extraction”
     Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum
     Published: World Wide Web Conference (WWW), Madrid, 2009
  3. Overview
     Introduction
     SOFIE Model + Rules
     Excursion: Satisfiability
     SOFIE Approach
     Evaluation experiments
     Conclusion
  4. Motivation
     Classical IE on text
       pattern-based: 80%
     Semistructural approach
       Wikipedia infoboxes: 95%
     Idea of the paper: combine both
       use text (hypotheses) + ontology (trusted facts)
  5. Example
     Document1
       Einstein attended secondary school in Germany.
     YAGO ontology
       familyName(AlbertEinstein, Einstein)
       bornIn(AlbertEinstein, Germany)
     New Knowledge
       attendedSchoolIn(AlbertEinstein, Germany)
  6. General Idea
     Express extraction patterns as facts
       patternOcc(“X went to school in Y”, Einstein, Switzerland)
     Rules to understand usage of terms
       patternOcc(Pattern,X,Y) and R(X,Y) ⇒ express(Pattern,R)
     Add restrictions
  7. Contribution
     Unified approach to
       Pattern matching
       Word Sense Disambiguation
       Reasoning
     Large Scale
     On Unstructured Data
  8. Pattern extraction with WICs
     Extract patterns based on ‘interesting’ entities
     Documents
       Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. When Albert was around four, his father gave him a magnetic compass.
       When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there…
     Knowledge Base
       patternOcc(“Einstein was born in Ulm”, Einstein@D1, Ulm@D1) [1]
       patternOcc(“Ulm is in Württemberg, Germany”, Ulm@D1, Germany@D1) [1]
       patternOcc(“Albert .. Switzerland”, Albert@D1, Switzerland@D1) [1]
     WICs (Word in Context)
  9. Grounding
     Test Rules
     How?
       find an instance which satisfies the formulae
     Rules with variables
       bornIn(X,Ulm) ⇒ ¬bornIn(X,Timbuktu)
       studiedIn(X,Ulm)
     Grounded instances (X = Einstein)
       bornIn(Einstein,Ulm) ⇒ ¬bornIn(Einstein,Timbuktu)
       studiedIn(Einstein,Ulm)
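Grounding can be pictured as substituting every known constant for each rule variable. The following is a minimal sketch, assuming rules are plain string templates; the helper name `ground` and the list of constants are hypothetical and only serve the illustration.

```python
from itertools import product

def ground(rule_vars, template, constants):
    """Return one propositional instance of `template` per variable binding."""
    instances = []
    for values in product(constants, repeat=len(rule_vars)):
        binding = dict(zip(rule_vars, values))
        instances.append(template.format(**binding))
    return instances

# Hypothetical constants; a real ontology would supply typed entities.
constants = ["Einstein", "Ulm", "Timbuktu"]
rule = "bornIn({X},Ulm) => not bornIn({X},Timbuktu)"

instances = ground(["X"], rule, constants)
print(instances[0])
```

The number of instances grows as the number of constants raised to the number of variables, which is why grounding only over 'interesting' entities matters in practice.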
  10. Rules (Hypotheses)
      Disambiguation
        disambiguatesAs(Albert@D, AlbertEinstein) [?]
      Expresses a new fact
        expresses(P, livedIn(Einstein,Switzerland)) [?]
      New facts
        CityIn(Ulm,Germany) [?]
  11. New fact rule
      ...with disambiguation
      “Pattern P expresses relation R when the disambiguated WICs of an occurrence of P stand in relation R”
        patternOcc( P, WX, WY ) and
        disambiguatesAs(WX, X) and
        disambiguatesAs(WY, Y) and
        R(X,Y)
        ⇒ express( P, R )
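The new-fact rule can be read as a join between pattern occurrences, disambiguations and known facts. Below is a small sketch of that reading, assuming all three are held in plain Python collections; the sample fact `attendedSchoolIn(AlbertEinstein, Switzerland)` and the function name `supported_expressions` are made up for the example, and the sketch treats every premise as certain instead of weighting it.

```python
def supported_expressions(pattern_occ, disambiguates_as, facts):
    """Collect (pattern, relation) pairs for which the rule body holds."""
    supported = set()
    for pattern, wx, wy in pattern_occ:
        x = disambiguates_as.get(wx)
        y = disambiguates_as.get(wy)
        if x is None or y is None:
            continue  # a WIC without a disambiguation cannot fire the rule
        for relation, a, b in facts:
            if (a, b) == (x, y):
                supported.add((pattern, relation))
    return supported

# Hypothetical inputs in the notation of the slides.
pattern_occ = [("X went to school in Y", "Albert@D1", "Switzerland@D1")]
disambiguates_as = {"Albert@D1": "AlbertEinstein",
                    "Switzerland@D1": "Switzerland"}
facts = {("attendedSchoolIn", "AlbertEinstein", "Switzerland")}

result = supported_expressions(pattern_occ, disambiguates_as, facts)
print(result)
```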
  12. Restrictions
      Disambiguation
        the disambiguation prior should influence the choice of disambiguation
        disambPrior( W, X, N )
        ⇒ disambiguatesAs( W, X )
      N - any disambiguation function, e.g.
        | words(D1) ∩ rel(AlbertEinstein) | / | words(D1) |
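The overlap ratio on this slide is easy to compute once `words(D1)` and `rel(AlbertEinstein)` are available as sets. This is a sketch under that assumption; the word lists below are invented, and `disamb_prior` is a hypothetical helper name.

```python
def disamb_prior(doc_words, entity_context):
    """|words(D) ∩ rel(E)| / |words(D)|, or 0.0 for an empty document."""
    doc = set(doc_words)
    if not doc:
        return 0.0
    return len(doc & set(entity_context)) / len(doc)

# Invented toy data: words of document D1 and context words per entity.
d1 = "einstein was born at ulm in germany".split()
rel_albert = {"einstein", "ulm", "germany", "physics"}
rel_hermann = {"einstein", "munich"}

prior_albert = disamb_prior(d1, rel_albert)    # 3 of 7 words overlap
prior_hermann = disamb_prior(d1, rel_hermann)  # 1 of 7 words overlaps
print(prior_albert, prior_hermann)
```

A higher prior gives the corresponding `disambiguatesAs` hypothesis a larger weight, so the reasoner prefers it unless other clauses outweigh it.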
  13. Restrictions
      Functional restrictions
        R(X,Y) and
        type(R, function) and
        different(Y,Z)
        ⇒ ¬R(X,Z)
      “Albert@D1 born in?”
      Albert@D1 ≠ Albert@D2
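The functional restriction says that a functional relation admits at most one second argument per first argument. A minimal check for violations might look as follows, assuming facts are `(relation, x, y)` triples; the sample facts and the name `functional_conflicts` are illustrative only.

```python
def functional_conflicts(facts, functional_relations):
    """Return (relation, x, y, z) tuples where R(x,y) and R(x,z) clash."""
    first_value = {}
    conflicts = []
    for relation, x, y in facts:
        if relation not in functional_relations:
            continue  # non-functional relations may have many values
        key = (relation, x)
        if key in first_value and first_value[key] != y:
            conflicts.append((relation, x, first_value[key], y))
        else:
            first_value.setdefault(key, y)
    return conflicts

# Illustrative facts: bornIn is functional, livedIn is not.
facts = [
    ("bornIn", "AlbertEinstein", "Ulm"),
    ("bornIn", "AlbertEinstein", "Timbuktu"),
    ("livedIn", "AlbertEinstein", "Ulm"),
    ("livedIn", "AlbertEinstein", "Bern"),
]
conflicts = functional_conflicts(facts, {"bornIn"})
print(conflicts)
```

In SOFIE the restriction is a weighted clause inside the reasoning step, not a hard post-filter like this one.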
  14. SOFIE Rules
      Framework to test the hypotheses
      Question
        “How to satisfy all of them?”
      rules + trusted facts
      Rules
        disambPrior(Albert@D1, AlbertEinstein, 10)
        ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
        disambPrior(Albert@D1, HermannEinstein, 3)
        ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
        patternOcc( P, X, Y ) and
        R(X,Y)
        ⇒ express( P, R )
      Trusted facts
        Country(Germany)
        livedIn(AlbertEinstein,Ulm)
        …
  16. SAT / MAX SAT
      SAT (Satisfiability)
        prove that a formula can be TRUE
        F = (X or Y or Z) and (¬X or Y or Z) and (¬X or ¬Y or ¬Z)
        G = (X or Y) and (¬X or ¬Y) and (X)
        truth table has 2^3 rows
      Complexity Classes
        P: good, example: N^k
        NP: bad, c^N
        e.g. naive algorithm for 100 variables
          2^100 x 10^-10 s per row = 4 x 10^12 y
        Not always: 3SAT in (4/3)^N
        Details: Schöning 2010
      SAT Solver
      MAX SAT
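The naive truth-table algorithm from this slide can be sketched in a few lines. The encoding is an assumption made for the example: a clause is a list of `(variable, polarity)` literals, and `satisfying_assignments` is a hypothetical helper. The running-time estimate assumes 10^-10 seconds per row, the unit under which a figure of about 4 x 10^12 years works out.

```python
from itertools import product

def satisfying_assignments(clauses, variables):
    """Enumerate the whole truth table and keep the satisfying rows."""
    rows = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == pol for v, pol in clause)
               for clause in clauses):
            rows.append(assignment)
    return rows

# F = (X or Y or Z) and (not X or Y or Z) and (not X or not Y or not Z)
F = [[("X", True), ("Y", True), ("Z", True)],
     [("X", False), ("Y", True), ("Z", True)],
     [("X", False), ("Y", False), ("Z", False)]]
# G = (X or Y) and (not X or not Y) and (X)
G = [[("X", True), ("Y", True)],
     [("X", False), ("Y", False)],
     [("X", True)]]

f_rows = satisfying_assignments(F, ["X", "Y", "Z"])
g_rows = satisfying_assignments(G, ["X", "Y"])
print(len(f_rows), g_rows)  # 5 [{'X': True, 'Y': False}]

# Cost of the naive approach for 100 variables at 1e-10 s per row.
years = (2 ** 100) * 1e-10 / (365.25 * 24 * 3600)
print(f"{years:.1e}")  # 4.0e+12
```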
  17. Weighted MAX SAT in SOFIE
      ...back to SOFIE
      this is MAX SAT but with weights
      rules + trusted facts
      Rules
        disambPrior(Albert@D1, AlbertEinstein, 10)
        ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
        disambPrior(Albert@D1, HermannEinstein, 3)
        ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
        patternOcc( P, X, Y ) and
        R(X,Y)
        ⇒ express( P, R )
      Trusted facts
        Country(Germany)
        livedIn(AlbertEinstein,Ulm)
        …
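To make "MAX SAT but with weights" concrete, here is an exhaustive solver for a tiny instance. It is a sketch, not SOFIE's algorithm. The two unit clauses take the weights 10 and 3 from the disambiguation priors on the slide; the third clause (a word cannot denote both entities) and its weight of 100 are assumptions added for the example.

```python
from itertools import product

def weighted_max_sat(clauses, variables):
    """Try every assignment; return the one with maximal satisfied weight."""
    best, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(w for literals, w in clauses
                     if any(assignment[v] == pol for v, pol in literals))
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight

# a = disambiguatesAs(Albert@D1, AlbertEinstein)
# h = disambiguatesAs(Albert@D1, HermannEinstein)
clauses = [
    ([("a", True)], 10),                 # prior 10 from the slide
    ([("h", True)], 3),                  # prior 3 from the slide
    ([("a", False), ("h", False)], 100), # assumed: not both at once
]
best, best_weight = weighted_max_sat(clauses, ["a", "h"])
print(best, best_weight)  # {'a': True, 'h': False} 110
```

Violating the weakest clause (weight 3) is the cheapest way out of the conflict, so Albert@D1 is read as AlbertEinstein.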
  18. Weighted MAX SAT in SOFIE
      Weighted MAX SAT is NP-hard
        impractical to find the optimal solution
        only approximation algorithms
      SAT Solver
      Johnson’s algorithm: 2/3 (approximation guarantee)
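A greedy approximation in the spirit of Johnson's algorithm can be written as the method of conditional expectations: fix variables one at a time, each time choosing the value that keeps the expected satisfied weight of a random completion highest. The sketch below follows that textbook formulation and is not taken from the SOFIE paper; it assumes each variable occurs at most once per clause, and the instance (including the weight 100) is invented.

```python
def expected_weight(clauses, assignment):
    """Expected satisfied weight if unset variables are set by fair coin."""
    total = 0.0
    for literals, weight in clauses:
        unset = 0
        satisfied = False
        for var, pol in literals:
            if var not in assignment:
                unset += 1
            elif assignment[var] == pol:
                satisfied = True
                break
        # An unsatisfied clause with k unset literals survives w.p. 1 - 2^-k.
        total += weight if satisfied else weight * (1 - 0.5 ** unset)
    return total

def greedy_max_sat(clauses, variables):
    """Fix each variable to the value with the larger conditional expectation."""
    assignment = {}
    for var in variables:
        if_true = expected_weight(clauses, {**assignment, var: True})
        if_false = expected_weight(clauses, {**assignment, var: False})
        assignment[var] = if_true >= if_false
    return assignment

clauses = [
    ([("a", True)], 10),
    ([("h", True)], 3),
    ([("a", False), ("h", False)], 100),  # assumed conflict clause
]
result = greedy_max_sat(clauses, ["a", "h"])
print(result, expected_weight(clauses, result))  # {'a': False, 'h': True} 103.0
```

On this instance the greedy choice reaches weight 103 while the optimum is 110, which shows what an approximation guarantee does and does not promise.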
  19. Weighted MAX SAT in SOFIE
      Functional MAX SAT
        Specialized reasoning (support for functional properties)
        Approximation guarantee 1/2
        Propagates dominating unit clauses
        Considers only unit clauses
      Clauses
        A ∨ B [w1]
        A ∨ B [w2]
        B ∨ C [w3]
        C [w4]
      Example
        A ∨ B [10]
        A [10]
        A [30]
        A = true
        30 > 10+10
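The propagation step can be sketched as follows, taking "dominating" to mean that the unit clauses for a literal weigh more than all clauses containing the opposite literal. The example assumes the two 10-weight clauses on the slide contain ¬A (that is, ¬A ∨ B and ¬A), which is what the inequality 30 > 10+10 suggests. This covers only the propagation step, not the full Functional MAX SAT algorithm, and the function names are hypothetical.

```python
def simplify(clauses, var, pol):
    """Drop satisfied clauses; remove the falsified literal from the rest."""
    remaining = []
    for literals, weight in clauses:
        if (var, pol) in literals:
            continue  # clause is satisfied
        rest = literals - {(var, not pol)}
        if rest:  # an emptied clause is falsified and dropped
            remaining.append((rest, weight))
    return remaining

def propagate_dominating_units(clauses):
    """Repeatedly fix literals whose unit weight beats the opposing weight."""
    clauses = [(frozenset(literals), w) for literals, w in clauses]
    assignment = {}
    changed = True
    while changed:
        changed = False
        unit_weight, containing = {}, {}
        for literals, w in clauses:
            for lit in literals:
                containing[lit] = containing.get(lit, 0) + w
            if len(literals) == 1:
                (lit,) = literals
                unit_weight[lit] = unit_weight.get(lit, 0) + w
        for (var, pol), w in unit_weight.items():
            if w > containing.get((var, not pol), 0):
                assignment[var] = pol
                clauses = simplify(clauses, var, pol)
                changed = True
                break
    return assignment, clauses

clauses = [
    ([("A", False), ("B", True)], 10),  # assumed: not A or B
    ([("A", False)], 10),               # assumed: not A
    ([("A", True)], 30),
]
assignment, remaining = propagate_dominating_units(clauses)
print(assignment)  # {'A': True, 'B': True}
```

Once A is fixed, ¬A ∨ B shrinks to the unit clause B, which is then dominating as well because nothing opposes it.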
  20. Controlled experiment
      Corpus from Wikipedia infoboxes
      100 articles
      Semantics are known!
  21. Controlled experiment
      Large-scale: Corpus from Wikipedia articles
      2000 articles
      13 frequent relations from YAGO
      Parsing = 87 min, Reasoning = 77 min
  22. Unstructured text sources
      150 newspaper articles
      relation under test: headquarterOf (functional relation)
      YAGO (modified with relation seeds)
      Parsing 87 min, WeightedMaxSat 77 min
      disambiguated entries (provenance) could be manually assessed
  23. Unstructured text sources
      Large-scale:
        10 biographies for each of 400 US senators
        5 relationships
        Disambiguation was not ideal for YAGO (13 James Watson)
        Parsing 7 h, W-MAX-SAT 9 h
      Results
        4 good
        1 bad (misleading patterns)
  24. MAX SAT can’t do OWL per se (Open World Assumption)
      Reformulate OWL in propositional logic
        OWL → FOL → Skolem Normal Form → Propositional Logic
      Might find OWL-inconsistent ontologies due to the Open World Assumption
      define a student as a subclass “attends some course”
        ⇒ ∀ x, ∃ y: attends(x,y), Course(y) -> Student(x)
        ⇒ ∀ x: attends(x,k), Course(k) -> Student(x); ∃ k
        ⇒ ¬attends(xi, ki) or ¬Course(ki) or Student(xi); k = x1 .. xn
      Inferred Ontology
        { Student(alex), Student(bob),
          Student subClassOf attends some Course,
          attends(alex, SemanticWeb) }
      Details: JMC 2010
  25. Conclusions
      Ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem
      Approximation algorithm with guarantee 1/2
      Works and scales (large corpus + YAGO)
  26. Limitations
      Specialized approximation algorithm
        Accounts for SOFIE rules, NOT OWL
      MAX SAT Restrictions
        ∈ Propositional Logic
        ∉ First-Order Logic
      Ontology population approach (can’t infer new relations)
  27. References
      F. Suchanek et al., SOFIE: A Self-Organizing Framework for Information Extraction, WWW '09: Proceedings of the 18th International Conference on World Wide Web
      John McCrae, Automatic Extraction of Logically Consistent Ontologies from Text, PhD thesis, National Institute of Informatics, Japan, 2009
      Uwe Schöning, Das SAT-Problem, Informatik Spektrum 33(5): 479-483, 2010
      F. Suchanek, Automated Construction and Growth of a Large Ontology, PhD thesis, Saarland University, Saarbrücken, Germany, 2009
