A Distributed Architecture System for Recognizing Textual Entailment
Published in Technology , Education
Transcript

  • 1. A Distributed Architecture System for Recognizing Textual Entailment. Adrian Iftene, Alexandra Balahur-Dobrescu, Daniel Matei, {adiftene, abalahur, dmatei}@info.uaic.ro. “Al. I. Cuza” University, Iasi, Romania, Faculty of Computer Science
  • 2. Overview
    • Textual Entailment
      • Definition
      • System presentation
      • Results
    • Peer-to-Peer Architecture
      • Presentation
      • Transfer protocol
      • Synchronization problem
      • Results
    • Conclusions
  • 3. Textual Entailment
    • TE is defined (Dagan et al., 2006) as a directional relation between two text fragments, termed T (text) - the entailing text, and H (hypothesis) - the entailed text.
    • It is then said that T entails H if, typically, a human reading T would infer that H is most likely true.
    • Example:
      • T: The carmine cat devours the mouse in the garden.
      • H: The red cat killed the mouse.
  • 4. RTE Competition
    • Organized by PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) - the European Commission's IST-funded Network of Excellence for Multimodal Interfaces.
    • This year, a limited number of longer texts were added.
    • 2005: 16 groups, 55% average, 70% the best
    • 2006: 23 groups, 58% average, 75% the best
    • 2007: 26 groups, 80% the best, our result 69.13% (third place)
  • 5. System presentation
    [Architecture diagram. Labels: Initial data; Minipar module (dependency trees for (T, H) pairs); LingPipe module (named entities for (T, H) pairs); Core Module1, Core Module2, Core Module3; P2P Computers; Resources: DIRT, Acronyms, Background knowledge, Wordnet, Wikipedia; Final result]
  • 6. Tools - LingPipe
    • LingPipe is a suite of Java libraries for the linguistic analysis of human language. The major tools are for:
      • Sentence
      • Parts of Speech
      • Named Entities
      • Coreference
    Example: Hypothesis from pair 111: Leloir was born in Argentina.
    <ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in <ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.
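    The annotated hypothesis above can be turned back into (entity, type) pairs with a small regular expression. This is only a sketch in Python of how such ENAMEX markup might be consumed downstream; the function name `extract_entities` is hypothetical and is not part of LingPipe.

```python
import re

# Matches one ENAMEX element and captures its TYPE attribute and inner text.
ENAMEX = re.compile(r'<ENAMEX TYPE="([^"]+)">\s*(.*?)\s*</ENAMEX>')

def extract_entities(annotated):
    """Return (entity text, entity type) pairs found in ENAMEX-annotated text."""
    return [(m.group(2), m.group(1)) for m in ENAMEX.finditer(annotated)]

annotated = ('<ENAMEX TYPE="PERSON"> Leloir </ENAMEX> was born in '
             '<ENAMEX TYPE="LOCATION"> Argentina </ENAMEX>.')
entities = extract_entities(annotated)
print(entities)
```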
  • 7. Tools - MINIPAR
    • MINIPAR transforms the text and the hypothesis into dependency trees
    Example: Le Beau Serge was directed by Chabrol.
      (
      E0 (() fin C *)
      1 (Le ~ U 3 lex-mod (gov Le Beau Serge))
      2 (Beau ~ U 3 lex-mod (gov Le Beau Serge))
      3 (Serge Le Beau Serge N 5 s (gov direct))
      4 (was be be 5 be (gov direct))
      5 (directed direct V E0 i (gov fin))
      E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3))
      6 (by ~ Prep 5 by-subj (gov direct))
      7 (Chabrol ~ N 6 pcomp-n (gov by))
      8 (. ~ U * punc)
      )
    [Dependency tree figure. Nodes: direct (V), Le_Beau_Serge (N), be (be), Chabrol (N), Le_Beau_Serge (N), Le (U), Beau (U). Edge labels: s, be, by, obj, lex-mod, lex-mod]
  • 8. Resources – DIRT
    • DIRT (Discovery of Inference Rules from Text) is both an algorithm and a resulting knowledge collection created by Lin and Pantel
    Example: Le Beau Serge was directed by Chabrol
      Dependency paths: N:s:V<direct>V:by:N; N:obj:V<direct>V:by:N; N:s:V<direct>V:; :V<direct>V:by:N; :V<direct>V:by:N; N:obj:V<direct>V:
    Example of DIRT paraphrases for "X solves Y": Y is solved by X; X resolves Y; X finds a solution to Y; X tries to solve Y; X deals with Y; Y is resolved by X; …
  • 9. Resources – eXtended WordNet
    • For every synonym, we check to see which word appears in the text tree, and select the mapping with the best value according to the values from eXtended WordNet.
    • For example, the relation between “relative” and “niece” is made with a score of 0.078652.
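    The selection step described on this slide can be sketched as follows. This is a minimal Python illustration, not the authors' code: the `scores` table and the function `best_mapping` are hypothetical, and only the 0.078652 value for “relative” and “niece” comes from the slide.

```python
def best_mapping(word, text_words, scores):
    """Pick the text-tree word with the highest similarity score for `word`.

    `scores` maps (word, candidate) pairs to similarity values; pairs that
    are missing from the table are treated as unrelated and skipped.
    """
    candidates = [(scores[(word, w)], w) for w in text_words if (word, w) in scores]
    if not candidates:
        return None
    score, match = max(candidates)
    return match, score

# Hypothetical score table; only the first entry is taken from the slide.
scores = {
    ("relative", "niece"): 0.078652,
    ("relative", "garden"): 0.01,  # made-up value for illustration
}
result = best_mapping("relative", ["niece", "garden", "mouse"], scores)
print(result)
```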
  • 10. Resources - Acronyms
    • The acronym database helps our program find relations between an acronym and its meaning, e.g. “US - United States”
  • 11. Background Knowledge - Example
    • <pair id="748" entailment="YES">
    • <T>Argentina President Carlos Menem has ordered an 'immediate' investigation into war crimes allegedly committed by British troops during the 1982 Falklands War.</T>
    • <H>Argentine demanded an investigation of alleged war crimes during the Falklands War.</H>
    • </pair>
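    A pair in this format can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the pair shown above; the variable names are illustrative, and nothing here is taken from the authors' system.

```python
import xml.etree.ElementTree as ET

# The (T, H) pair from the slide, written as a well-formed XML fragment.
xml_text = """<pair id="748" entailment="YES">
<T>Argentina President Carlos Menem has ordered an 'immediate' investigation into war crimes allegedly committed by British troops during the 1982 Falklands War.</T>
<H>Argentine demanded an investigation of alleged war crimes during the Falklands War.</H>
</pair>"""

pair = ET.fromstring(xml_text)
pair_id = pair.get("id")                      # attribute values are strings
is_entailment = pair.get("entailment") == "YES"
text = pair.findtext("T")
hypothesis = pair.findtext("H")
print(pair_id, is_entailment, hypothesis)
```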
  • 12. Resources – Background Knowledge
    Extracted snippets from Wikipedia for “Argentine”:
      ar |calling_code = 54 |footnotes = Argentina also has a territorial dispute
      Argentina', , Nación Argentina (Argentine Nation) for many legal purposes), is in the world.
      Argentina occupies a continental surface area of
      Argentina national football team
    Resulting relation: Argentine [is] Argentina
    Other extracted relations: Netherlands [is] Dutch; Netherlands [is] Nederlandse; Netherlands [is] Antillen; Netherlands [in] Europe; Netherlands [is] Holland; Antilles [in] Netherlands
    • The patterns used are usually “definition” patterns:
      • verbs like “is”, “define”, “represent”, etc.
      • punctuation context: , “ ‘ () [] :
      • anaphora resolution
    Examples: Chinese [in] China; Los Angeles [in] California; 2 [is] two; Netherlands [is] Holland
  • 13. Semantic Variability Rules
    • Negation rule – given by terms like “no”, “not”, “never”
    • Modal verbs: “ may”, “might”, “cannot”, “should”, “could”
    • Certain cases for particle “to” when it precedes:
      • a verb: “allow”, “impose”, “galvanize”
      • adjective like “necessary”, “compulsory”, “free”
      • noun like “attempt”, “trial”
    • Influence of context:
      • Positive words: “certainly”, “absolutely”
      • Negative words: “probably”, “likely”
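    As a rough illustration of the first two rules, the sketch below flags a verb when a negation term or modal verb appears shortly before it. The word lists come from the slide; the fixed token window and the function `is_verb_weakened` are assumptions made for this example, and the slides do not say how the system locates these terms.

```python
NEGATION_TERMS = {"no", "not", "never"}
MODAL_VERBS = {"may", "might", "cannot", "should", "could"}

def is_verb_weakened(tokens, verb_index, window=2):
    """Return True if a negation term or modal verb occurs within
    `window` tokens before the verb at `verb_index` (assumed heuristic)."""
    start = max(0, verb_index - window)
    context = {t.lower() for t in tokens[start:verb_index]}
    return bool(context & (NEGATION_TERMS | MODAL_VERBS))

negated = is_verb_weakened("The cat did not kill the mouse".split(), 4)
plain = is_verb_weakened("The cat killed the mouse".split(), 2)
print(negated, plain)
```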
  • 14. Fitness calculation 1
    • Local Fitness:
      • 1 for a direct mapping, or for a mapping found through Acronyms or Background Knowledge (BK)
      • DIRT score
      • eXtended WordNet score
    • Extended Local Fitness:
      • Local Fitness
      • Parent Fitness
      • Mapping of edge label
      • Node Position (left or right)
    [Figure: Text tree and Hypothesis tree, with node mapping, father mapping, and edge label mapping between them]
  • 15. Fitness calculation 2
    • Total Fitness
    • The Negation Value
    • Threshold value = 2.06
  • 16. Fitness calculation 3
    • T: The French railway company SNCF is cooperating in the project.
    • H: The French railway company is called SNCF.
    • Total_Fitness = (3.125 + 3.125 + 3.125 + 2.5 + 4 + 3.048 + 1.125 + 2.625)/8 = 22.673/8 = 2.834
    • Positive_Verbs_Number = 1/1 = 1
    • GlobalFitness = 1*2.834 + (1 - 1)*(4 - 2.834) = 2.834
    Initial entity          Node Fitness   Extended local fitness
    (the, company, det)     1              3.125
    (French, company, nn)   1              3.125
    (railway, company, nn)  1              3.125
    (company, call, s)      1              2.5
    (be, call, be)          1              4
    (call, -, -)            0.096          3.048
    (company, call, obj)    1              1.125
    (SNCF, call, desc)      1              2.625
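    The arithmetic on this slide can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the general form of `global_fitness` is inferred from the single worked instance on the slide, and comparing the result against the 2.06 threshold with `>=` to decide entailment is an assumption, since the slides do not state the decision rule explicitly.

```python
# Extended local fitness values for the eight hypothesis nodes (from the slide).
EXTENDED_LOCAL_FITNESS = [3.125, 3.125, 3.125, 2.5, 4, 3.048, 1.125, 2.625]
THRESHOLD = 2.06  # threshold value given on the previous slide

def total_fitness(values):
    """Average of the extended local fitness over all hypothesis nodes."""
    return sum(values) / len(values)

def global_fitness(total, positive_verbs_ratio):
    """Generalisation of the slide's worked example:
    ratio * total + (1 - ratio) * (4 - total)."""
    return positive_verbs_ratio * total + (1 - positive_verbs_ratio) * (4 - total)

total = total_fitness(EXTENDED_LOCAL_FITNESS)
fitness = global_fitness(total, 1 / 1)   # Positive_Verbs_Number = 1/1
entails = fitness >= THRESHOLD           # assumed decision rule
print(round(total, 3), round(fitness, 3), entails)
```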
  • 17. Results
    Run     IE     IR      QA      SUM     Global
    Run01   0.57   0.69    0.87    0.635   0.6913
    Run02   0.57   0.685   0.865   0.645   0.6913
    Comparison with other groups:
    Language Computer Corporation, USA           0.8000
    LCC Richardson, USA                          0.7225
    “Al. I. Cuza” University, Romania            0.6913
    University of Texas, USA                     0.6700
    LT-lab, Germany                              0.6687
    University of Rome “Tor Vergata”, Italy      0.6675
  • 18. Peer-to-Peer Architecture
    • Speed optimization
      • P2P architecture, cache mechanism
    • Transfer protocol
      • Fail-over mechanism
    • Ending synchronization
      • Quota mechanism
    [Architecture diagram. Labels: Initiator; DIRT db; Acronyms; six CM nodes; SMB upload; SMB download]
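    The effect of the cache mechanism can be pictured with a simple memoising wrapper around a slow resource lookup. This is only a sketch: the class `CachedLookup` and the stand-in `slow_lookup` function are hypothetical, and the slides do not describe what exactly is cached or how the cache is implemented.

```python
class CachedLookup:
    """Wrap a slow lookup function and remember its answers."""

    def __init__(self, backend):
        self.backend = backend   # function called on a cache miss
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]

def slow_lookup(verb):
    # Stand-in for an expensive query against a resource such as a database.
    return "relations-for-" + verb

lookup = CachedLookup(slow_lookup)
first = lookup.get("direct")
second = lookup.get("direct")   # served from the cache, no backend call
print(first, lookup.misses)
```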
  • 19. Transfer protocol
    [Figure labels: SMB header; CIFS protocol]
  • 20. Synchronization problem
    • Dynamic quota (~ 0.26 s)
  • 21. Results
    No   Run details                                                          Duration
    1    One computer without caching mechanism                               5:28:45
    2    One computer with caching mechanism, but with empty cache at start   2:03:13
    3    One computer with full cache at start                                0:00:41
    4    5 computers with 7 processes                                         0:00:06.7
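    To make the durations easier to compare, the short Python sketch below converts them to seconds and computes the speed-up of each run relative to run 1. The durations are taken from the slide; the helper `to_seconds` and the dictionary keys are illustrative names.

```python
def to_seconds(duration):
    """Convert an 'H:MM:SS' or 'H:MM:SS.f' string to seconds."""
    hours, minutes, seconds = duration.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Durations reported on the slide, keyed by run number.
durations = {1: "5:28:45", 2: "2:03:13", 3: "0:00:41", 4: "0:00:06.7"}

baseline = to_seconds(durations[1])
speedups = {run: baseline / to_seconds(d) for run, d in durations.items()}
for run, factor in speedups.items():
    print(f"run {run}: {factor:.1f}x faster than run 1")
```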
  • 22. Conclusions
    • The core of our approach is based on a tree edit distance algorithm (Kouylekov, Magnini, 2005)
    • The main idea is to transform the hypothesis using sources such as DIRT, WordNet, Wikipedia, and the acronym database
    • In order to improve the speed we use a P2P architecture and a caching mechanism
    • For ending synchronization we use a dynamic quota
  • 23. Acknowledgments
    • NLP group of Iasi:
      • Supervisor: Prof. Dan Cristea
      • Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip
    • Anaphora resolution group:
      • Iustin Dornescu, Alex Moruz, Gabriela Pavel
  • 24.
    • THANK YOU!