Improving a Question Answering System for Romanian Using Textual Entailment

  1. Improving a Question Answering System for Romanian Using Textual Entailment
     Adrian Iftene, Alexandra Balahur-Dobrescu ({adiftene, abalahur}@info.uaic.ro)
     "Al. I. Cuza" University, Iasi, Romania, Faculty of Computer Science
  2. Overview
     - Textual Entailment
       - Definition
       - System presentation
       - Romanian TE system
         - GATE for Romanian
         - Acronyms database
         - The Background Knowledge
         - Fitness calculation
     - Using the TE system in the QA system
       - Example
       - Results
     - Conclusions & Future Work
  3. Textual Entailment
     - TE is defined (Dagan et al., 2006) as a directional relation between two text fragments, termed T (text), the entailing text, and H (hypothesis), the entailed text.
     - T is said to entail H if, typically, a human reading T would infer that H is most likely true.
     - Example:
       - T: The carmine cat devours the mouse in the garden.
       - H: The red cat killed the mouse.
  4. System presentation
     [Architecture diagram. Labels: Initial data; Minipar module (dependency trees for (T, H) pairs); LingPipe module (named entities for (T, H) pairs); Resources (DIRT, Acronyms, Background knowledge, WordNet, Wikipedia); Core Module 1, Core Module 2, Core Module 3; P2P Computers; Final result.]
  5. GATE for Romanian
     - Instead of LingPipe, we use GATE (General Architecture for Text Engineering).
     - Our TE system uses a combination of Romanian and English lists of NEs, because the Romanian questions from QA@CLEF refer to internationally known writers, authors, and other personalities.
  6. Acronyms database
     - The acronyms database helps our program find relations between an acronym and its meaning, e.g. "US - United States".
     - We automatically extracted a list of acronyms from a collection of Romanian newspaper articles on economics and politics.
     - http://www.abbreviations.com/acronyms/ROMANIAN
  7. The Background Knowledge
     - For NEs in H with no correspondence in T, we use the Romanian Wikipedia instead of the English Wikipedia used in (Iftene, Balahur 2007).
     - Additionally, we use the English Background Knowledge.
     - Example relations:
       - Chinese [in] China
       - II [is] February
       - America [is] United States of America
       - American [in] America
       - Bucharest [in] Romania
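One way to picture the background knowledge is as a list of (source, relation, target) triples. The sketch below is only an illustration: the `Relation` type and the `related_entities` helper are hypothetical names, not part of the authors' system, and only the five relations shown on the slide are included.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One background-knowledge entry (hypothetical representation)."""
    source: str
    kind: str    # "in" or "is", the two relation labels shown on the slide
    target: str


# The five example relations listed on the slide.
BACKGROUND_KNOWLEDGE = [
    Relation("Chinese", "in", "China"),
    Relation("II", "is", "February"),
    Relation("America", "is", "United States of America"),
    Relation("American", "in", "America"),
    Relation("Bucharest", "in", "Romania"),
]


def related_entities(entity: str) -> list[str]:
    """Return the targets directly related to `entity` (one hop only)."""
    return [r.target for r in BACKGROUND_KNOWLEDGE if r.source == entity]


print(related_entities("American"))
```

A named entity from H that is missing in T can then be matched through any of its related entities instead.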
  8. Fitness calculation
     - H: Ernest Hemingway, faimos romancier, nuvelist, realizator de povestiri American, a trăit între anii 1899 şi 1961. (Ernest Hemingway, a famous American novelist and short story writer, lived between 1899 and 1961.)
     - After stop word elimination: { Ernest Hemingway, faimos, romancier, nuvelist, realizator, povestire, American, trăi, an, 1899, 1961 }
     - After WordNet expansion: { Ernest Hemingway, {faimos, celebru, excelent}, {romancier, scriitor}, nuvelist, {realizator, producător, creator, participant}, {povestire, mit, parabolă, naraţiune}, American, {trăi, exista, vieţui}, an, 1899, 1961 }
     - Next step: expansion with the Background Knowledge (BK).
  9. Fitness calculation (cont…)
     - After BK expansion: { Ernest Hemingway, {faimos, celebru, excelent}, {romancier, scriitor}, nuvelist, {realizator, producător, creator, participant, autor}, {povestire, mit, parabolă, naraţiune}, {American, America, US, USA}, {trăi, exista, vieţui}, an, 1899, 1961 }
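The expansion of the hypothesis can be sketched as a dictionary lookup per lemma. This is a minimal illustration, not the authors' implementation: the `SYNONYMS` and `BACKGROUND` tables are toy stand-ins for the Romanian WordNet and the BK, filled only with the entries visible on the slides, and `expand` is a hypothetical helper name.

```python
# Lemmas of H after stop word elimination, copied from the slide.
lemmas = ["Ernest Hemingway", "faimos", "romancier", "nuvelist", "realizator",
          "povestire", "American", "trăi", "an", "1899", "1961"]

# Toy stand-in for WordNet: only the synonyms shown on the slide.
SYNONYMS = {
    "faimos": ["celebru", "excelent"],
    "romancier": ["scriitor"],
    "realizator": ["producător", "creator", "participant"],
    "povestire": ["mit", "parabolă", "naraţiune"],
    "trăi": ["exista", "vieţui"],
}

# Toy stand-in for the BK: only the additions that appear after the BK step.
BACKGROUND = {
    "realizator": ["autor"],
    "American": ["America", "US", "USA"],
}


def expand(lemmas, *tables):
    """Turn each lemma into a group: the lemma plus its alternatives from each table."""
    expanded = []
    for lemma in lemmas:
        group = [lemma]
        for table in tables:
            for alternative in table.get(lemma, []):
                if alternative not in group:
                    group.append(alternative)
        expanded.append(group)
    return expanded


word_sets = expand(lemmas, SYNONYMS, BACKGROUND)
for group in word_sets:
    print(group)
```

Applying the tables in order (synonyms first, then background knowledge) mirrors the order of the steps on the slides.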
  10. Fitness calculation (cont…)

      | No | Word | Positions in Text |
      |----|------|-------------------|
      | 1 | Ernest Hemingway | 1 |
      | 2 | faimos, celebru, excelent | 8, 30 |
      | 3 | romancier, scriitor | 9 |
      | 4 | nuvelist | 10 |
      | 5 | realizator, producător, creator, participant, autor | 11, 31 |
      | 6 | povestire, mit, parabolă, naraţiune | 12 |
      | 7 | American, America, US, USA | 32 |
      | 8 | trăi, exista, vieţui | - |
      | 9 | an | - |
      | 10 | 1899 | 4 |
      | 11 | 1961 | 7 |
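The position lookup behind such a table can be sketched as follows. This is an assumption-laden illustration: the `positions` helper is a hypothetical name, and `text_lemmas` is an invented toy text, not the text used on the slides. The slides do not show how the positions are combined into the fitness value, so that step is left out.

```python
def positions(word_set, text_lemmas):
    """Return the 1-based positions in `text_lemmas` where any member of `word_set` occurs."""
    members = set(word_set)
    return [i for i, lemma in enumerate(text_lemmas, start=1) if lemma in members]


# Invented, already lemmatized toy text (hypothetical, for illustration only).
text_lemmas = ["Ernest Hemingway", "fi", "un", "scriitor", "celebru", "din", "America"]

print(positions(["romancier", "scriitor"], text_lemmas))
print(positions(["American", "America", "US", "USA"], text_lemmas))
print(positions(["an"], text_lemmas))
```

An empty result corresponds to the "-" entries in the table: no member of the word set was found in the text.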
  11. Fitness calculation (cont…)
  12. QA System
  13. Using the TE system in the QA system
      - The goal is to improve the ranking of the possible answers (Person, Location, Date and Organization).
      - How? Select all relevant named entities from the snippets extracted for a question.
      - Substitute them for the variables in the patterns associated with the question.
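  14. Example
      - Question: Ce faimos romancier, nuvelist şi realizator american de povestiri a trăit între anii 1899 şi 1961? (Which famous American novelist and short story writer lived between 1899 and 1961?)
      - Pattern: PERSON, faimos romancier, nuvelist, realizator de povestiri American, a trăit între anii 1899 şi 1961.
      - S1: "Petru Popescu este un romancier, scenarist şi realizator de filme american de origine română. A emigrat în Statele Unite ale Americii în anii 1980, unde s-a impus drept romancier şi autor de scenarii ale unor filme de la Hollywood." (Petru Popescu is a Romanian-born American novelist, screenwriter and filmmaker. He emigrated to the United States in the 1980s, where he established himself as a novelist and an author of screenplays for Hollywood films.)
      - S2: "Americanul Ernest Hemingway (1899-1961), autor de povestiri, nuvelist şi romancier, şi romancierul rus Yuri Olesha (1899-1976) s-au născut la aceeaşi dată." (The American Ernest Hemingway (1899-1961), author of stories, short story writer and novelist, and the Russian novelist Yuri Olesha (1899-1976) were born on the same date.)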
  15. Example (cont…)
      - H2_1: Ernest Hemingway, faimos romancier, nuvelist, realizator de povestiri American, a trăit între anii 1899 şi 1961.
      - H2_2: Yuri Olesha, faimos romancier, nuvelist, realizator de povestiri American, a trăit între anii 1899 şi 1961.
      - Global fitness:
        - for the pair (H2_1, S2): 0.47
        - for the pair (H2_2, S2): 0.44
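The substitution and ranking in this example can be sketched in a few lines. This is a minimal illustration under stated assumptions: `build_hypotheses` is a hypothetical helper, and the fitness values are hard-coded from the slide because the fitness function itself is not shown in the transcript.

```python
PATTERN = ("PERSON, faimos romancier, nuvelist, realizator de povestiri American, "
           "a trăit între anii 1899 şi 1961.")


def build_hypotheses(pattern, candidates, slot="PERSON"):
    """Create one hypothesis per candidate by filling the pattern's variable."""
    return {name: pattern.replace(slot, name, 1) for name in candidates}


# Person named entities found in snippet S2.
hypotheses = build_hypotheses(PATTERN, ["Ernest Hemingway", "Yuri Olesha"])

# Global fitness values reported on the slide for the pairs (H2_1, S2) and (H2_2, S2).
# They are copied, not computed: the fitness function is not given in the transcript.
fitness = {"Ernest Hemingway": 0.47, "Yuri Olesha": 0.44}

# Rank the candidate answers by the fitness of their hypothesis, best first.
ranked = sorted(hypotheses, key=lambda name: fitness[name], reverse=True)
print(ranked)
```

With these values, Ernest Hemingway is ranked above Yuri Olesha, which is the reordering the TE system contributes to the QA system.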
  16. Results
      - TE English system: 69.13%
      - TE Romanian system: 67%
      - The precision of the Romanian QA system grows by 5% when the Romanian TE system is included.
  17. Conclusions & Future Work
      - We presented the steps performed to obtain a Romanian TE system.
      - The Romanian TE system can be included in a Romanian QA system.
      - In the future, we want to perform the same steps for Organization and Date.
  18. Acknowledgments
      - NLP group of Iasi:
        - Supervisor: Prof. Dan Cristea
        - Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip
      - Anaphora resolution group:
        - Iustin Dornescu, Alex Moruz, Gabriela Pavel
  19. THANK YOU!
