20080919 regular meeting report



  1. TOB: Timely Ontologies for Business Relations <ul><li>Qi Zhang, Fabian M. Suchanek, Lihua Yue, Gerhard Weikum </li></ul><ul><li>11th International Workshop on the Web and Databases (WebDB 2008) </li></ul><ul><li>Reporter: Che-Min Liao </li></ul>2008/9/16
  2. Outline <ul><li>INTRODUCTION </li></ul><ul><li>RELATED WORK </li></ul><ul><li>REAL TIME BUSINESS ONTOLOGY </li></ul><ul><li>RELATION EXTRACTION </li></ul><ul><li>TEMPORAL RELATION INFERENCE </li></ul><ul><li>EVALUATION </li></ul><ul><li>CONCLUSION </li></ul>
  3. INTRODUCTION <ul><li>The web has become a valuable source of facts that can be obtained by information-extraction (IE) methods. </li></ul><ul><li>Unfortunately, state-of-the-art IE systems handle the time aspects of real-world relations inadequately. </li></ul><ul><li>This paper introduces TOB, a toolkit for automatically building time-annotated business ontologies. </li></ul>
  4. TOB <ul><li>An example of time-annotated facts </li></ul><ul><li>Contributions </li></ul><ul><ul><li>Methods for extracting temporal relations. </li></ul></ul><ul><ul><li>A model to represent underspecified time intervals. </li></ul></ul><ul><ul><li>New ways of temporal relation inferencing. </li></ul></ul>
  5. RELATED WORK <ul><li>Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In WWW 2007, ACM Press, pp. 697–706. </li></ul><ul><li>Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents. In SIGKDD 2006. </li></ul><ul><li>D. B. Koen and W. Bender. Time frames: Temporal augmentation of the news. IBM Systems Journal, 39(3&amp;4), pp. 597–616. </li></ul>
  6. REAL TIME BUSINESS ONTOLOGY <ul><li>The ontology model of TOB is based on the YAGO model. </li></ul><ul><ul><li>All objects, such as companies, people, and products, are represented as entities. </li></ul></ul><ul><ul><li>Facts are binary relations between two entities. </li></ul></ul><ul><ul><ul><li>Elvis Presley hasWonPrize Grammy_Award </li></ul></ul></ul><ul><ul><li>Classes (i.e. entity types) and relations are also entities. </li></ul></ul><ul><ul><ul><li>Singer subClassOf person </li></ul></ul></ul><ul><ul><li>Every fact is also an entity and is assigned a fact identifier. </li></ul></ul><ul><ul><ul><li>Google acquired YouTube on October 9, 2006 -> #1 happenedOn 2006-10-09, where #1 = Google acquired YouTube </li></ul></ul></ul>
  7. Time-Enhancement for the Ontology <ul><li>The key way in which TOB goes beyond YAGO is that every fact should have a time interval indicating when the fact holds. </li></ul><ul><li>The time range of a fact is modeled by 4 relations, which bound the start and the end of the fact from both sides. </li></ul><ul><ul><li>Jack Welch isCEO General Electric : [1980,1982,2000,2001] (Jack Welch became CEO of GE at some time between 1980 and 1982, and he retired between 2000 and 2001.) </li></ul></ul>
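
The four-value time range on this slide can be sketched as a small data structure. This is only an illustration: the class name, field names, and helper methods are assumptions made for this example and are not taken from the TOB paper; only the Jack Welch values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """Underspecified interval (field names are illustrative, not from TOB).

    The fact started somewhere in [earliest_start, latest_start]
    and ended somewhere in [earliest_end, latest_end].
    """
    earliest_start: int
    latest_start: int
    earliest_end: int
    latest_end: int

    def __post_init__(self):
        # Reject ranges whose bounds contradict each other.
        if not (self.earliest_start <= self.latest_start
                and self.earliest_end <= self.latest_end
                and self.earliest_start <= self.latest_end):
            raise ValueError("inconsistent time range")

    def certainly_holds_at(self, t: int) -> bool:
        # The fact has started for sure and has not ended for sure.
        return self.latest_start <= t <= self.earliest_end

    def possibly_holds_at(self, t: int) -> bool:
        # The fact may already have started and may not yet have ended.
        return self.earliest_start <= t <= self.latest_end


# Example from the slide: Jack Welch isCEO General Electric
welch_is_ceo = TimeRange(1980, 1982, 2000, 2001)
```

With this reading, `welch_is_ceo.certainly_holds_at(1990)` is true, while 1981 is only a possible year, because the start is known only to lie between 1980 and 1982.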
  8. Work Flow of TOB
  9. RELATION EXTRACTION <ul><li>Pattern-Based Relation Extraction </li></ul><ul><ul><li>It focuses on structured and semi-structured data, the most prominent example being infoboxes in Wikipedia. </li></ul></ul><ul><ul><li>This approach has been used in prior work (YAGO and DBpedia). </li></ul></ul><ul><ul><li>For example (Wikipedia article about Google, in Wikipedia markup language): num_employees = 16,805 ([[Dec 31]] [[2007]]) -> Google hasEmp 16,805 : [2007-12-31,2007-12-31,2007-12-31,2007-12-31] </li></ul></ul><ul><li>Link Grammar Based Relation Extraction </li></ul><ul><ul><li>LEILA (Learning to Extract Information by Linguistic Analysis) </li></ul></ul>
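
The infobox example above can be sketched with a regular expression. This is a minimal illustration under stated assumptions, not TOB's implementation: the pattern, the function name `extract_employees`, and the tuple layout are hypothetical, and only the `num_employees` line and the resulting fact come from the slide.

```python
import re
from datetime import date

MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}

# Hypothetical pattern for one infobox attribute, e.g.
#   num_employees = 16,805 ([[Dec 31]] [[2007]])
EMPLOYEES = re.compile(
    r"num_employees\s*=\s*(?P<count>[\d,]+)\s*"
    r"\(\[\[(?P<month>[A-Za-z]{3}) (?P<day>\d{1,2})\]\] "
    r"\[\[(?P<year>\d{4})\]\]\)"
)


def extract_employees(company, infobox_line):
    """Return (subject, relation, value, time_range) or None if no match."""
    m = EMPLOYEES.search(infobox_line)
    if m is None or m["month"] not in MONTHS:
        return None
    day = date(int(m["year"]), MONTHS[m["month"]], int(m["day"]))
    count = int(m["count"].replace(",", ""))
    # An exact date gives a fully specified range: all four bounds coincide.
    return (company, "hasEmp", count, (day, day, day, day))


fact = extract_employees(
    "Google", "num_employees = 16,805 ([[Dec 31]] [[2007]])")
```

A real extractor would need one such pattern per infobox attribute and would have to tolerate the many date formats that appear in Wikipedia markup.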
  10. E-LEILA <ul><li>It can deal with ternary and quaternary relations to capture facts together with their time intervals. </li></ul><ul><li>The algorithms: </li></ul><ul><ul><li>DateRecognition </li></ul></ul><ul><ul><li>GetVerb </li></ul></ul><ul><ul><li>GetPreposition </li></ul></ul><ul><ul><li>HasLinkage </li></ul></ul><ul><ul><li>GetTimeRelation </li></ul></ul>
  11. Examples <ul><li>Input: </li></ul><ul><li>Microsoft acquired Tellme in May 2007 and acquired Colloquis Inc in October 2006 </li></ul><ul><li>Result of Link Parser: </li></ul><ul><li>Result of LEILA: </li></ul><ul><li>Result of E-LEILA: </li></ul>
  12. TEMPORAL RELATION INFERENCE <ul><li>Often, there is no temporal information in the same record or sentence that contains the primary fact. </li></ul><ul><li>The goal in TOB is to have every fact associated with a time interval. </li></ul><ul><ul><li>Ontology Level Inferencing </li></ul></ul><ul><ul><li>Page Level Inferencing </li></ul></ul>
  13. Ontology Level Inferencing <ul><li>Most entities have a life span in which they participate in events. </li></ul><ul><li>A fact can only hold during the life spans of its arguments. </li></ul><ul><ul><li>If the fact is about two entities e<sub>1</sub> and e<sub>2</sub>, where e<sub>1</sub> has life span T<sub>1</sub> and e<sub>2</sub> has life span T<sub>2</sub>, the time range T of the fact is obtained by applying the time inferencing operator to T<sub>1</sub> and T<sub>2</sub>. The operator is defined for two time ranges (life spans) T<sub>1</sub> = [t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub>, t<sub>4</sub>] and T<sub>2</sub> = [t<sub>1</sub>′, t<sub>2</sub>′, t<sub>3</sub>′, t<sub>4</sub>′]. </li></ul></ul><ul><li>Example </li></ul>
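
The operator's formula is not preserved in this transcript, so the following is only one plausible reading of "a fact can only hold during the life spans of its arguments": a component-wise intersection of the two life spans. The function name and the intersection rule are assumptions for illustration and should not be taken as the paper's definition.

```python
from typing import Optional, Tuple

# (earliest start, latest start, earliest end, latest end)
TimeRange = Tuple[int, int, int, int]


def infer_time_range(t1: TimeRange, t2: TimeRange) -> Optional[TimeRange]:
    """ASSUMED operator: intersect two life spans component-wise.

    The fact cannot start before both entities exist (max of the start
    bounds) and cannot outlast either entity (min of the end bounds).
    """
    a1, a2, a3, a4 = t1
    b1, b2, b3, b4 = t2
    result = (max(a1, b1), max(a2, b2), min(a3, b3), min(a4, b4))
    # If the earliest possible start lies after the latest possible end,
    # the life spans do not overlap and no time range can be inferred.
    if result[0] > result[3]:
        return None
    return result


# Hypothetical life spans, in years
overlap = infer_time_range((1900, 1905, 1990, 1995), (1950, 1950, 2000, 2000))
disjoint = infer_time_range((1900, 1900, 1950, 1950), (1960, 1960, 2000, 2000))
```

Under this assumption the first call yields `(1950, 1950, 1990, 1995)` and the second yields `None`, since the two entities never coexist.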
  14. Page Level Inferencing <ul><li>News pages contain many relative temporal phrases, such as "today", "last Monday", and "this year". </li></ul><ul><ul><li>This makes the extraction of proper time points or time intervals much harder. </li></ul></ul><ul><li>Publication date identification </li></ul><ul><ul><li>URL </li></ul></ul><ul><ul><li>Metadata </li></ul></ul><ul><ul><li>Main Text </li></ul></ul><ul><li>Normalization of relative temporal phrases </li></ul>
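
The two steps on this slide, finding the publication date and resolving relative phrases against it, might look roughly like the sketch below. Everything here is an assumption for illustration: the URL pattern, the function names, the example URL, and the small set of handled phrases are hypothetical, and the slide does not say how TOB implements either step.

```python
import re
from datetime import date, timedelta

# Assumed URL convention: .../YYYY/MM/DD/...
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})/")
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def publication_date_from_url(url):
    """Return the date embedded in the URL path, or None."""
    m = URL_DATE.search(url)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. /2008/13/40/
        return None


def normalize(phrase, published):
    """Map a relative phrase to a (first_day, last_day) pair, or None."""
    p = phrase.strip().lower()
    if p == "today":
        return (published, published)
    if p == "yesterday":
        day = published - timedelta(days=1)
        return (day, day)
    if p == "this year":
        return (date(published.year, 1, 1), date(published.year, 12, 31))
    if p.startswith("last ") and p[5:] in WEEKDAYS:
        # Most recent such weekday strictly before the publication date.
        back = (published.weekday() - WEEKDAYS.index(p[5:])) % 7 or 7
        day = published - timedelta(days=back)
        return (day, day)
    return None  # phrase not covered by this sketch


published = publication_date_from_url(
    "http://news.example.com/2008/09/19/microsoft-deal.html")
last_monday = normalize("last Monday", published)
```

Falling back to metadata or the main text when the URL carries no date would follow the same shape, with a different extractor feeding `normalize`.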
  15. EVALUATION <ul><li>Wikipedia companies </li></ul><ul><ul><li>350 Wikipedia articles about US companies. </li></ul></ul><ul><ul><li>Used to test the pattern-based extraction algorithm. </li></ul></ul><ul><li>Reuters company descriptions </li></ul><ul><ul><li>276 Reuters company description pages. </li></ul></ul><ul><ul><li>Used to test the link-grammar-based relation extraction algorithm. </li></ul></ul><ul><li>Google News pages </li></ul><ul><ul><li>438 news pages from the Google News Archive. </li></ul></ul><ul><ul><li>Used to test the page-level temporal inferencing methods. </li></ul></ul>
  16. Results on pattern-based extraction <ul><li>confidence level = 95% </li></ul><ul><li>margin of error = </li></ul>
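
For readers unfamiliar with the terms on this slide, a margin of error at a 95% confidence level can be computed for a sampled precision estimate as sketched below. This uses the standard normal approximation as an assumption; the transcript does not preserve the slide's values or say which interval the authors used, and the precision and sample size in the example are hypothetical.

```python
import math


def margin_of_error(precision, sample_size, z=1.96):
    """Normal-approximation margin of error for a proportion.

    z = 1.96 corresponds to a 95% confidence level. The paper may use a
    different interval; this formula is an assumption for illustration.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    return z * math.sqrt(precision * (1.0 - precision) / sample_size)


# Hypothetical numbers: 90 of 100 sampled facts judged correct.
moe = margin_of_error(0.90, 100)
```

With these hypothetical numbers the margin is about 0.059, so the precision would be reported as roughly 90% plus or minus 5.9 percentage points.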
  17. Results on link-grammar-based extraction <ul><li>Only for the acquisition relation: </li></ul>
  18. Results on relative temporal normalization
  19. CONCLUSION <ul><li>In contrast to general-purpose information extraction, the authors' methods are specifically geared toward extracting temporal relations from semi-structured and textual Web sources. </li></ul><ul><li>They have developed new ways of temporal relation inferencing for facts without time intervals. </li></ul><ul><li>Their experiments have shown that they can achieve fairly high precision for the extracted information. </li></ul>