
Hypothesis Transformation and Semantic Variability Rules Used in RTE


  1. Hypothesis Transformation and Semantic Variability Rules Used in RTE
     Adrian Iftene, Alexandra Balahur-Dobrescu
     adiftene@info.uaic.ro, abalahur@info.uaic.ro
     “Al. I. Cuza” University, Iasi, Romania
     Faculty of Computer Science
  2. Overview
     - System presentation
     - Tools
     - Resources
     - Semantic variability rules
     - Fitness calculation
     - Results
     - Peer-to-Peer architecture
     - Conclusions and Future Work
     Adrian Iftene & Alexandra Balahur-Dobrescu – “Al.I.Cuza” University of Iasi, Romania
  3. System presentation
     [Architecture diagram. Labels: Initial data; Minipar module; Dependency trees for (T, H) pairs; LingPipe module; Named entities for (T, H) pairs; Resources; DIRT; Acronyms; Background knowledge; Wordnet; Wikipedia; Core Module1; Core Module2; Core Module3; P2P Computers; Final result]
  4. Tools - LingPipe
     - LingPipe (http://www.alias-i.com/lingpipe) is a suite of Java libraries for the linguistic analysis of human language. The major tools are for:
       - Sentence
       - Parts of Speech
       - Named Entities
       - Coreference
     Example: Hypothesis from pair 111:
       Leloir was born in Argentina.
       <ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in <ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.
  5. Tools - MINIPAR
     - MINIPAR (Lin, 1998) transforms the text and the hypothesis into dependency trees
     Example: Le Beau Serge was directed by Chabrol.
       (
       E0 (() fin C * )
       1 (Le ~ U 3 lex-mod (gov Le Beau Serge))
       2 (Beau ~ U 3 lex-mod (gov Le Beau Serge))
       3 (Serge Le Beau Serge N 5 s (gov direct))
       4 (was be be 5 be (gov direct))
       5 (directed direct V E0 i (gov fin))
       E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3))
       6 (by ~ Prep 5 by-subj (gov direct))
       7 (Chabrol ~ N 6 pcomp-n (gov by))
       8 (. ~ U * punc)
       )
     [Dependency tree figure. Nodes: direct (V), Le_Beau_Serge (N), be (be), Chabrol (N), Le_Beau_Serge (N), Le (U), Beau (U). Edge labels: s, be, by, obj, lex-mod, lex-mod]
  6. Resources
     - DIRT - Discovery of Inference Rules from Text
     - Extended WordNet
     - Acronyms
     - Background Knowledge
  7. Resources – DIRT
     - DIRT is both an algorithm and a resulting knowledge collection (Lin and Pantel, 2001)
     Example paraphrases for "X solves Y":
       - Y is solved by X
       - X resolves Y
       - X finds a solution to Y
       - X tries to solve Y
       - X deals with Y
       - Y is resolved by X…
     Example: Le Beau Serge was directed by Chabrol
       - N:s:V<direct>V:by:N
       - N:obj:V<direct>V:by:N
       - N:s:V<direct>V:
       - :V<direct>V:by:N
       - :V<direct>V:by:N
       - N:obj:V<direct>V:
  8. Resources – DIRT (cont.)
     Pair 37:
       T: She was transferred again to Navy when the American Civil War began, 1861.
       H: The American Civil War started in 1861.
       H’: The American Civil War began in 1861.
     [Figure: left–left relations similarity. Labels: HypothesisVerb (relation1, relation2); TextVerb (relation1, relation3); Left Subtree; Right Subtree]
  9. Resources – DIRT (cont.)
     Pair 161:
       T: The demonstrators, convoked by the solidarity with Latin America committee, verbally attacked Salvadoran President Alfredo Cristiani.
       H: President Alfredo Cristiani was attacked by demonstrators.
       H’: Demonstrators attacked President Alfredo Cristiani.
     [Figure: left–right relations similarity. Labels: HypothesisVerb (relation1, relation2); TextVerb (relation3, relation1); Left Subtree; Right Subtree]
  10. Resources – eXtended WordNet
     - For every synonym of a hypothesis word, we check whether it appears in the text tree, and we select the mapping with the best score according to eXtended WordNet (http://xwn.hlt.utdallas.edu/downloads.html)
     - For example, the relation between “relative” and “niece” is made with a score of 0.078652.
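
The selection step on this slide can be sketched as a lookup over scored word pairs. This is only an illustration under stated assumptions: the table layout, the function name `best_mapping`, and the second score are hypothetical; only the (“relative”, “niece”) score of 0.078652 comes from the slides.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical score table keyed by (hypothesis word, related word).
# Only the ("relative", "niece") value is taken from the slides.
XWN_SCORES: Dict[Tuple[str, str], float] = {
    ("relative", "niece"): 0.078652,
    ("relative", "person"): 0.01,  # invented value, for illustration only
}


def best_mapping(
    hyp_word: str,
    text_words: Set[str],
    scores: Dict[Tuple[str, str], float] = XWN_SCORES,
) -> Optional[Tuple[str, float]]:
    """Return the related word found in the text with the highest score."""
    candidates = [
        (related, score)
        for (word, related), score in scores.items()
        if word == hyp_word and related in text_words
    ]
    if not candidates:
        return None  # no scored relation reaches any word of the text
    return max(candidates, key=lambda pair: pair[1])
```

Calling `best_mapping("relative", {"niece", "person"})` picks “niece” because its score is the higher of the two candidates present in the text.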
  11. Resources - Acronyms
     - The acronyms database (http://www.acronym-guide.com) lets the program relate an acronym to its meaning, for example “US - United States”
  12. Resources – Background Knowledge
     “Argentine”: extracted snippets from Wikipedia:
       ar |calling_code = 54 |footnotes = Argentina also has a territorial dispute Argentina', , Nación Argentina (Argentine Nation) for many legal purposes), is in the world. Argentina occupies a continental surface area of Argentina national football team
     The patterns used are usually “definition” patterns:
       - verbs like “is”, “define”, “represent”, etc.
       - punctuation context: , “ ‘ () [] :
       - anaphora resolution
     Resulting relation: Argentine [is] Argentina
     Other extracted relations:
       - Netherlands [is] Dutch
       - Netherlands [is] Nederlandse
       - Netherlands [is] Antillen
       - Netherlands [in] Europe
       - Netherlands [is] Holland
       - Antilles [in] Netherlands
       - Chinese [in] China
       - Los Angeles [in] California
       - 2 [is] two
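
A “definition” pattern of the verb kind listed on this slide could look roughly like the sketch below. It is a toy regular expression, not the authors' extraction system: the pattern, the function name `extract_is_relations`, and the sample sentence are assumptions, and it handles only single capitalized words joined by one of the listed verbs.

```python
import re
from typing import List, Tuple

# Assumed toy pattern: <Capitalized word> <definition verb> [also] [called|known as] <Capitalized word>
_DEFINITION = re.compile(
    r"\b([A-Z]\w*) (?:is|defines|represents) (?:also )?(?:called |known as )?([A-Z]\w*)"
)


def extract_is_relations(snippet: str) -> List[Tuple[str, str, str]]:
    """Return (left, 'is', right) triples found by the toy definition pattern."""
    return [(left, "is", right) for left, right in _DEFINITION.findall(snippet)]


def format_relation(triple: Tuple[str, str, str]) -> str:
    """Render a triple in the 'X [is] Y' notation used on the slide."""
    left, relation, right = triple
    return f"{left} [{relation}] {right}"
```

For a hypothetical snippet such as "The Netherlands is also known as Holland." the sketch yields one triple, which `format_relation` renders as `Netherlands [is] Holland`. The punctuation contexts and anaphora resolution mentioned on the slide are not covered.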
  13. Semantic Variability Rules
     - Negation rule: given by terms like “no”, “not”, “never”
     - Modal verbs: “may”, “might”, “cannot”, “should”, “could”
     - Certain cases for the particle “to” when it precedes:
       - a verb like “allow”, “impose”, “galvanize”
       - an adjective like “necessary”, “compulsory”, “free”
       - a noun like “attempt”, “trial”
     - Influence of context:
       - Positive words: “certainly”, “absolutely”
       - Negative words: “probably”, “likely”
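
One simple way to apply the marker lists above is to check the tokens just before each verb. The sketch below is an assumption, not the rules as implemented in the system: the two-token window, the function name `positive_verb_ratio`, and the choice to treat negations, modals, and the “negative” context words alike are all hypothetical, and the special cases for “to” are left out.

```python
from typing import Sequence

# Marker lists copied from the slide.
NEGATION = {"no", "not", "never"}
MODALS = {"may", "might", "cannot", "should", "could"}
NEGATIVE_CONTEXT = {"probably", "likely"}
MARKERS = NEGATION | MODALS | NEGATIVE_CONTEXT


def positive_verb_ratio(
    tokens: Sequence[str], verb_positions: Sequence[int], window: int = 2
) -> float:
    """Fraction of verbs with no marker among the `window` tokens before them."""
    if not verb_positions:
        return 1.0  # assumed convention: no verbs means nothing is negated
    positive = 0
    for pos in verb_positions:
        context = [t.lower() for t in tokens[max(0, pos - window):pos]]
        if not any(t in MARKERS for t in context):
            positive += 1
    return positive / len(verb_positions)
```

For the tokens of "He did not sign the treaty" with the verb at index 3, the marker “not” falls inside the window, so the ratio is 0.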
  14. Fitness calculation 1
     - Local Fitness:
       - 1 for a direct mapping or a mapping through Acronyms or BK
       - DIRT score
       - eXtended WordNet score
     - Extended Local Fitness:
       - Local Fitness
       - Parent Fitness
       - Mapping of edge label
       - Node Position (left or right)
     [Figure: mapping between the hypothesis tree and the text tree. Labels: node mapping, father mapping, edge label mapping]
  15. Fitness calculation 2
     - Total Fitness
     - The Negation Value
     - Threshold value = 2.06
  16. Fitness calculation 3
     - T: The French railway company SNCF is cooperating in the project.
     - H: The French railway company is called SNCF.

     Initial entity         | Node Fitness | Extended local fitness
     (the, company, det)    | 1            | 3.125
     (French, company, nn)  | 1            | 3.125
     (railway, company, nn) | 1            | 3.125
     (company, call, s)     | 1            | 2.5
     (be, call, be)         | 1            | 4
     (call, -, -)           | 0.096        | 3.048
     (company, call, obj)   | 1            | 1.125
     (SNCF, call, desc)     | 1            | 2.625

     - Total_Fitness = (3.125 + 3.125 + 3.125 + 2.5 + 4 + 3.048 + 1.125 + 2.625)/8 = 22.673/8 = 2.834
     - Positive_Verbs_Number = 1/1 = 1
     - GlobalFitness = 1*2.834 + (1 - 1)*(4 - 2.834) = 2.834
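
The arithmetic of this worked example can be replayed in a few lines. The sketch follows the formulas as they appear on the slide; the function names are hypothetical, and reading a global fitness above the 2.06 threshold as "entailment" is an assumption, since the slides do not state the decision rule explicitly.

```python
from typing import Sequence

THRESHOLD = 2.06   # threshold value from the slides
MAX_FITNESS = 4    # the constant 4 used in the GlobalFitness formula

# Extended local fitness values for the SNCF example, in slide order.
EXTENDED_LOCAL_FITNESS = [3.125, 3.125, 3.125, 2.5, 4, 3.048, 1.125, 2.625]


def total_fitness(values: Sequence[float]) -> float:
    """Mean of the extended local fitness values."""
    return sum(values) / len(values)


def global_fitness(negation_value: float, total: float) -> float:
    """GlobalFitness = NV * Total + (1 - NV) * (4 - Total)."""
    return negation_value * total + (1 - negation_value) * (MAX_FITNESS - total)


def is_entailed(fitness: float, threshold: float = THRESHOLD) -> bool:
    """Assumed decision rule: entailment when fitness exceeds the threshold."""
    return fitness > threshold


total = total_fitness(EXTENDED_LOCAL_FITNESS)
overall = global_fitness(1 / 1, total)  # Positive_Verbs_Number = 1/1
```

With a negation value of 1 the second term vanishes and the global fitness equals the total fitness. With a negation value of 0 the formula returns 4 minus the total instead, which would pull a well-matched but negated hypothesis below the threshold.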
  17. Results 1

     Run   | IE   | IR    | QA    | SUM   | Global
     Run01 | 0.57 | 0.69  | 0.87  | 0.635 | 0.6913
     Run02 | 0.57 | 0.685 | 0.865 | 0.645 | 0.6913

     Component relevance:

     System Description | Precision | Relevance
     Without DIRT       | 0.6876    | 0.54 %
     Without WordNet    | 0.6800    | 1.63 %
     Without Acronyms   | 0.6838    | 1.08 %
     Without BK         | 0.6775    | 2.00 %
     Without SVR        | 0.6763    | 2.17 %
     Without NEs        | 0.5758    | 16.71 %
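
The Relevance column appears to be the relative drop in precision against the full system's global value of 0.6913. The slides do not give this formula, so the sketch below treats it as an assumption, one that matches every row of the table after rounding to two decimals; the names are hypothetical.

```python
FULL_SYSTEM_PRECISION = 0.6913  # global precision of the submitted runs

# Precision with one component removed, as listed on the slide.
ABLATED_PRECISION = {
    "DIRT": 0.6876,
    "WordNet": 0.6800,
    "Acronyms": 0.6838,
    "BK": 0.6775,
    "SVR": 0.6763,
    "NEs": 0.5758,
}


def relevance_percent(ablated: float, full: float = FULL_SYSTEM_PRECISION) -> float:
    """Assumed formula: relative precision loss, in percent."""
    return 100.0 * (full - ablated) / full


RELEVANCE = {
    name: round(relevance_percent(precision), 2)
    for name, precision in ABLATED_PRECISION.items()
}
```

Read this way, removing named entities costs about a sixth of the system's precision, while each of the other components accounts for roughly two percent or less.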
  18. Results 2
     - Pilot task: Yes, No/Unknown + answer justification

     System   | Accuracy | F(beta=1/3) | Precision | Recall
     System 1 | 0.569    | 0.595       | 0.547     | 0.805
     System 2 | 0.471    | 0.475       | 0.437     | 0.643

     Table over 12 submitted runs:

            | Accuracy | F(beta=1/3)
     min    | 0.365    | 0.211
     median | 0.471    | 0.475
     max    | 0.731    | 0.753

     Mean [understandability correctness]: [4.1 2.0] [4.3 2.8]* [4.1 1.5] [2.7 1.2] [3.2 1.5] [3.1 1.5]
  19. Peer-to-Peer Architecture
     - Speed optimization
       - P2P architecture, cache mechanism
     - Ending synchronization
       - Quota mechanism
     [Architecture diagram. Labels: Initiator; DIRT db; Acronyms; six CM nodes; SMB upload; SMB download]
  20. Results

     No | Run details                                                       | Duration
     1  | One computer without caching mechanism                            | 5:28:45
     2  | One computer with caching mechanism, but with empty cache at start | 2:03:13
     3  | One computer with full cache at start                             | 0:00:41
     4  | 5 computers with 7 processes                                      | 0:00:06.7
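
To compare the runs in the table above, the durations can be converted to seconds and divided. This is a small convenience sketch; the helper names are hypothetical, and it assumes the durations use the h:mm:ss format.

```python
from typing import Dict

# Durations per run number, copied from the table.
DURATIONS: Dict[int, str] = {
    1: "5:28:45",
    2: "2:03:13",
    3: "0:00:41",
    4: "0:00:06.7",
}


def parse_duration(text: str) -> float:
    """Convert an h:mm:ss (optionally fractional seconds) string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def speedup(baseline_run: int, other_run: int,
            durations: Dict[int, str] = DURATIONS) -> float:
    """How many times faster `other_run` is than `baseline_run`."""
    return parse_duration(durations[baseline_run]) / parse_duration(durations[other_run])
```

For example, `speedup(1, 2)` gives the gain from caching alone on one computer with a cold cache, and `speedup(1, 4)` the gain of the five-computer setup over the uncached single-computer run.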
  21. Conclusions
     - The core of our approach is based on a tree edit distance algorithm (Kouylekov, Magnini, 2005)
     - The main idea is to transform the hypothesis using sources such as DIRT, WordNet, Wikipedia, and an acronyms database
     - Additionally, we built a system to acquire the extra background knowledge and applied complex grammar rules for rephrasing in English
     - At each step, we analyzed the influence of the resources used, and identified and addressed new subproblems
  22. Future work
     - Search for a method to establish more precise values for penalties
     - The multiplication coefficients for the parameters in the extended local fitness
     - Using machine learning to establish the global threshold
     - Inserting the Textual Entailment system as part of a Question Answering system
     - Building a Romanian Textual Entailment System
  23. Acknowledgments
     - Pre-processing: Daniel Matei
     - NLP group of Iasi:
       - Coordinator: Prof. Dan Cristea
       - Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip
     - Anaphora resolution group: Iustin Dornescu, Alex Moruz, Gabriela Pavel
  24. THANK YOU!
