A Distributed Architecture System for Recognizing Textual Entailment




  1. 1. A Distributed Architecture System for Recognizing Textual Entailment Adrian Iftene, Alexandra Balahur-Dobrescu, Daniel Matei {adiftene, abalahur, dmatei}@info.uaic.ro "Al. I. Cuza" University, Iasi, Romania, Faculty of Computer Science
  2. 2. Overview <ul><li>Textual Entailment </li></ul><ul><ul><li>Definition </li></ul></ul><ul><ul><li>System presentation </li></ul></ul><ul><ul><li>Results </li></ul></ul><ul><li>Peer-to-Peer Architecture </li></ul><ul><ul><li>Presentation </li></ul></ul><ul><ul><li>Transfer protocol </li></ul></ul><ul><ul><li>Synchronization problem </li></ul></ul><ul><ul><li>Results </li></ul></ul><ul><li>Conclusions </li></ul>
  3. 3. Textual Entailment <ul><li>TE is defined (Dagan et al., 2006) as a directional relation between two text fragments, termed T (text) - the entailing text, and H (hypothesis) - the entailed text. </li></ul><ul><li>It is then said that T entails H if, typically, a human reading T would infer that H is most likely true. </li></ul><ul><li>Example: </li></ul><ul><ul><li>T: The carmine cat devours the mouse in the garden. </li></ul></ul><ul><ul><li>H: The red cat killed the mouse. </li></ul></ul>
  4. 4. RTE Competition <ul><li>Organized by PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) - the European Commission's IST-funded Network of Excellence for Multimodal Interfaces. </li></ul><ul><li>This year, a limited number of longer texts were added. </li></ul><ul><li>2005: 16 groups, 55% average, 70% the best </li></ul><ul><li>2006: 23 groups, 58% average, 75% the best </li></ul><ul><li>2007: 26 groups, 80 % the best, our result 69.13% (third place) </li></ul>
  5. 5. System presentation [Architecture diagram: from the initial data, the Minipar module builds dependency trees for the (T, H) pairs and the LingPipe module extracts their named entities; core modules running on P2P computers combine these with the resources (DIRT, acronyms, background knowledge, WordNet, Wikipedia) to produce the final result]
  6. 6. Tools - LingPipe <ul><li>LingPipe is a suite of Java libraries for the linguistic analysis of human language. The major tools are for: </li></ul><ul><ul><li>Sentence detection </li></ul></ul><ul><ul><li>Part-of-speech tagging </li></ul></ul><ul><ul><li>Named entity recognition </li></ul></ul><ul><ul><li>Coreference resolution </li></ul></ul>Example: Hypothesis from pair 111: Leloir was born in Argentina. <ENAMEX TYPE=&quot;PERSON&quot;> Leloir </ENAMEX> was born in <ENAMEX TYPE=&quot;LOCATION&quot;> Argentina </ENAMEX>.
  7. 7. Tools - MINIPAR <ul><li>MINIPAR transforms the text and the hypothesis into dependency trees </li></ul>Example: Le Beau Serge was directed by Chabrol. ( E0 (() fin C * ) 1 (Le ~ U 3 lex-mod (gov Le Beau Serge)) 2 (Beau ~ U 3 lex-mod (gov Le Beau Serge)) 3 (Serge Le Beau Serge N 5 s (gov direct)) 4 (was be be 5 be (gov direct)) 5 (directed direct V E0 i (gov fin)) E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3)) 6 (by ~ Prep 5 by-subj (gov direct)) 7 (Chabrol ~ N 6 pcomp-n (gov by)) 8 (. ~ U * punc) ) [Diagram: the resulting dependency tree, rooted at direct (V), with branches for Le_Beau_Serge (s, obj), be, and by/Chabrol]
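The dependency-tree structure produced by the parse above can be mirrored in a few lines of code. This is a minimal illustrative sketch (the `DepNode` class and field names are assumptions, not part of the system):

```python
from dataclasses import dataclass, field

@dataclass
class DepNode:
    """Minimal stand-in for a Minipar dependency-tree node:
    surface form, lemma, part of speech, and relation to the governor."""
    word: str
    lemma: str
    pos: str
    relation: str = ""
    children: list = field(default_factory=list)

# "Le Beau Serge was directed by Chabrol." as a tree rooted at the verb
root = DepNode("directed", "direct", "V")
root.children = [
    DepNode("Le Beau Serge", "Le Beau Serge", "N", "s"),
    DepNode("was", "be", "be", "be"),
    DepNode("by", "by", "Prep", "by-subj",
            children=[DepNode("Chabrol", "Chabrol", "N", "pcomp-n")]),
]
```

The (T, H) mapping described later walks hypothesis-tree nodes like these and looks for counterparts in the text tree.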
  8. 8. Resources – DIRT <ul><li>DIRT is both an algorithm and a resulting knowledge collection created by Lin and Pantel </li></ul>Example paraphrases for &quot;X solves Y&quot;: Y is solved by X, X resolves Y, X finds a solution to Y, X tries to solve Y, X deals with Y, Y is resolved by X, … Dependency-path patterns for &quot;Le Beau Serge was directed by Chabrol&quot;: N:s:V<direct>V:by:N, N:obj:V<direct>V:by:N [Diagram: matching the hypothesis dependency path against DIRT paraphrase paths in the text]
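A DIRT-style lookup can be sketched as a table from a dependency path to paraphrase paths with confidence scores. The rule keys and scores below are invented for illustration; the real collection is the Lin and Pantel resource:

```python
# Hypothetical miniature of a DIRT-style paraphrase collection:
# each entry maps a dependency path to equivalent paths with a
# confidence score (paths and scores here are made up).
DIRT_RULES = {
    "N:s:V<solve>V:obj:N": [
        ("N:obj:V<solve>V:by:N", 0.82),   # "Y is solved by X"
        ("N:s:V<resolve>V:obj:N", 0.74),  # "X resolves Y"
    ],
}

def paraphrase_score(h_path, t_path):
    """Confidence that h_path paraphrases t_path:
    1.0 for identical paths, the DIRT score when a rule
    links them, and 0.0 otherwise."""
    if h_path == t_path:
        return 1.0
    for candidate, score in DIRT_RULES.get(h_path, ()):
        if candidate == t_path:
            return score
    return 0.0
```

In the fitness computation described later, this score becomes the local fitness of the mapped verb node.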
  9. 9. Resources – eXtended WordNet <ul><li>For every synonym, we check to see which word appears in the text tree, and select the mapping with the best value according to the values from eXtended WordNet. </li></ul><ul><li>For example, the relation between “relative” and “niece” is made with a score of 0.078652. </li></ul>
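The synonym-selection step can be sketched as follows; the function name and the dictionary layout are assumptions, and only the niece/relative score comes from the slide:

```python
def best_synonym_mapping(h_word, text_words, synonym_scores):
    """Among the synonyms of h_word that actually occur in the text
    tree, pick the one with the highest eXtended-WordNet-style score.
    Returns (synonym, score), or (None, 0.0) when nothing maps."""
    candidates = [(syn, score)
                  for syn, score in synonym_scores.get(h_word, {}).items()
                  if syn in text_words]
    if not candidates:
        return None, 0.0
    return max(candidates, key=lambda pair: pair[1])

# The slide's example: "niece" maps to "relative" with score 0.078652
# (the second synonym and its score are invented for contrast).
scores = {"niece": {"relative": 0.078652, "kinswoman": 0.05}}
```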
  10. 10. Resources - Acronyms <ul><li>The acronyms database helps our system find the relation between an acronym and its expansion: &quot;US - United States&quot; </li></ul>
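The lookup itself is simple; a minimal sketch, where only the US entry comes from the slide and the second entry is invented:

```python
# Hypothetical miniature of the acronyms database.
ACRONYMS = {"US": "United States", "UN": "United Nations"}

def same_entity(a, b):
    """True when the two tokens match directly or one is the
    acronym of the other, so the node mapping gets local fitness 1."""
    return a == b or ACRONYMS.get(a) == b or ACRONYMS.get(b) == a
```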
  11. 11. Background Knowledge - Example <ul><li><pair id=&quot;748&quot; entailment=&quot;YES&quot;> </li></ul><ul><li><T>Argentina President Carlos Menem has ordered an 'immediate' investigation into war crimes allegedly committed by British troops during the 1982 Falklands War.</T> </li></ul><ul><li><H>Argentine demanded an investigation of alleged war crimes during the Falklands War.</H> </li></ul><ul><li></pair> </li></ul>
  12. 12. Resources – Background Knowledge <ul><li>Usually &quot;definition&quot; patterns: </li></ul><ul><li>- verbs like &quot;is&quot;, &quot;define&quot;, &quot;represent&quot;, etc. </li></ul><ul><li>- punctuation context: , &quot; ' () [] : </li></ul><ul><li>- anaphora resolution </li></ul>Extracted relations: Argentine [is] Argentina; Netherlands [is] Dutch; Netherlands [is] Nederlandse; Netherlands [in] Europe; Netherlands [is] Holland; Antilles [in] Netherlands; Chinese [in] China; Los Angeles [in] California; 2 [is] two. Extracted snippets from Wikipedia for &quot;Argentine&quot;: &quot;ar |calling_code = 54 |footnotes = Argentina also has a territorial dispute&quot;; &quot;Argentina, Nación Argentina (Argentine Nation) for many legal purposes), is in the world. Argentina occupies a continental surface area of&quot;; &quot;Argentina national football team&quot;.
  13. 13. Semantic Variability Rules <ul><li>Negation rule – given by terms like “no”, “not”, “never” </li></ul><ul><li>Modal verbs: “ may”, “might”, “cannot”, “should”, “could” </li></ul><ul><li>Certain cases for particle “to” when it precedes: </li></ul><ul><ul><li>a verb: “allow”, “impose”, “galvanize” </li></ul></ul><ul><ul><li>adjective like “necessary”, “compulsory”, “free” </li></ul></ul><ul><ul><li>noun like “attempt”, “trial” </li></ul></ul><ul><li>Influence of context: </li></ul><ul><ul><li>Positive words: “certainly”, “absolutely” </li></ul></ul><ul><ul><li>Negative words: “probably”, “likely” </li></ul></ul>
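A rough sketch of how the negation and modal-verb rules above might be checked over a token span; the function and the bag-of-words approach are illustrative simplifications (the real system attaches each term to the verb it modifies):

```python
NEGATIONS = {"no", "not", "never"}
MODALS = {"may", "might", "cannot", "should", "could"}

def context_polarity(words):
    """Flag a context as negated and/or modal when any of the
    trigger terms from the semantic variability rules occur."""
    lowered = {w.lower() for w in words}
    return {
        "negated": bool(lowered & NEGATIONS),
        "modal": bool(lowered & MODALS),
    }
```

Flags like these feed the negation value used in the global fitness formula.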
  14. 14. Fitness calculation 1 <ul><li>Local Fitness: </li></ul><ul><ul><li>1 for direct mapping, acronyms, background knowledge </li></ul></ul><ul><ul><li>DIRT score </li></ul></ul><ul><ul><li>eXtended WordNet score </li></ul></ul><ul><li>Extended Local Fitness: </li></ul><ul><ul><li>Local Fitness </li></ul></ul><ul><ul><li>Parent Fitness </li></ul></ul><ul><ul><li>Mapping of edge label </li></ul></ul><ul><ul><li>Node Position (left or right) </li></ul></ul>[Diagram: node, father, and edge-label mappings between the text tree and the hypothesis tree]
  15. 15. Fitness calculation 2 <ul><li>Total Fitness </li></ul><ul><li>The Negation Value </li></ul><ul><li>Threshold value = 2.06 </li></ul>
  16. 16. Fitness calculation 3 <ul><li>T: The French railway company SNCF is cooperating in the project. </li></ul><ul><li>H: The French railway company is called SNCF. </li></ul><ul><li>Total_Fitness = (3.125 + 3.125 + 3.125 + 2.5 + 4 + 3.048 + 1.125 + 2.625)/8 = 22.673/8 = 2.834 </li></ul><ul><li>Positive_Verbs_Number = 1/1 = 1 </li></ul><ul><li>GlobalFitness = 1*2.834 + (1–1)*(4–2.834) = 2.834 </li></ul>Per-node scores (initial entity | node fitness | extended local fitness): (the, company, det) | 1 | 3.125; (French, company, nn) | 1 | 3.125; (railway, company, nn) | 1 | 3.125; (company, call, s) | 1 | 2.5; (be, call, be) | 1 | 4; (call, -, -) | 0.096 | 3.048; (company, call, obj) | 1 | 1.125; (SNCF, call, desc) | 1 | 2.625
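The arithmetic of the worked example can be reproduced directly from the formulas on the slide (the function names are mine; the numbers are the slide's):

```python
def total_fitness(extended_local_fitnesses):
    """Average of the extended local fitness values of all
    mapped hypothesis nodes."""
    return sum(extended_local_fitnesses) / len(extended_local_fitnesses)

def global_fitness(positive_verbs_ratio, tf):
    """GlobalFitness = PV*TF + (1 - PV)*(4 - TF): with no negated
    verbs (ratio 1) the score is TF itself; full negation flips
    it to 4 - TF."""
    return positive_verbs_ratio * tf + (1 - positive_verbs_ratio) * (4 - tf)

scores = [3.125, 3.125, 3.125, 2.5, 4, 3.048, 1.125, 2.625]
tf = total_fitness(scores)      # 22.673 / 8 = 2.834125
gf = global_fitness(1.0, tf)    # no negation, so gf == tf
```

Since `gf` exceeds the 2.06 threshold, the pair is classified as entailment (YES).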
  17. 17. Results Per-task accuracy: Run01: IE 0.57, IR 0.69, QA 0.87, SUM 0.635, Global 0.6913; Run02: IE 0.57, IR 0.685, QA 0.865, SUM 0.645, Global 0.6913. Top RTE-3 systems: Language Computer Corporation, USA: 0.8000; LCC Richardson, USA: 0.7225; "Al. I. Cuza" University, Romania: 0.6913; University of Texas, USA: 0.6700; LT-lab, Germany: 0.6687; University of Rome "Tor Vergata", Italy: 0.6675.
  18. 18. Peer-to-Peer Architecture <ul><li>Speed optimization </li></ul><ul><ul><li>P2P architecture, cache mechanism </li></ul></ul><ul><li>Transfer protocol </li></ul><ul><ul><li>Fail-over mechanism </li></ul></ul><ul><li>Ending synchronization </li></ul><ul><ul><li>Quota mechanism </li></ul></ul>[Diagram: the initiator distributes work to core modules (CM) via SMB upload/download, with shared DIRT and acronyms databases]
  19. 19. Transfer protocol [Diagram: SMB header layout within the CIFS protocol]
  20. 20. Synchronization problem <ul><li>Dynamic quota (~ 0.26 s) </li></ul>
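One plausible reading of the dynamic quota is that remaining (T, H) pairs are split among peers in proportion to their observed speed, so that all peers finish at roughly the same time. The sketch below is an assumption about the mechanism, not the system's actual code:

```python
def dynamic_quota(remaining_pairs, peer_speeds):
    """Split remaining_pairs among peers in proportion to their
    estimated throughput (pairs per second), handing any rounding
    remainder to the fastest peer."""
    total_speed = sum(peer_speeds.values())
    quotas = {peer: int(remaining_pairs * speed / total_speed)
              for peer, speed in peer_speeds.items()}
    fastest = max(peer_speeds, key=peer_speeds.get)
    quotas[fastest] += remaining_pairs - sum(quotas.values())
    return quotas
```

For example, a peer three times faster than another receives three quarters of the remaining work.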
  21. 21. Results (No | Run details | Duration): 1 | One computer without caching mechanism | 5:28:45; 2 | One computer with caching mechanism, but with empty cache at start | 2:03:13; 3 | One computer with full cache at start | 0:00:41; 4 | 5 computers with 7 processes | 0:00:06.7
  22. 22. Conclusions <ul><li>The core of our approach is a tree edit distance algorithm (Kouylekov, Magnini, 2005) </li></ul><ul><li>The main idea is to transform the hypothesis using resources like DIRT, WordNet, Wikipedia, and the acronyms database </li></ul><ul><li>To improve speed, we use a P2P architecture and a caching mechanism </li></ul><ul><li>For ending synchronization, we use a dynamic quota </li></ul>
  23. 23. Acknowledgments <ul><li>NLP group of Iasi: </li></ul><ul><ul><li>Supervisor: Prof. Dan Cristea </li></ul></ul><ul><ul><li>Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip </li></ul></ul><ul><li>Anaphora resolution group: </li></ul><ul><ul><li>Iustin Dornescu, Alex Moruz, Gabriela Pavel </li></ul></ul>
  24. 24. <ul><li>THANK YOU! </li></ul>