Clef2010 QA

  • Swoogle – a semantic web search engine

    1. 1. Question Answering on Romanian, English and French Languages. „Al. I. Cuza” University of Iași, România, Faculty of Computer Science
    2. 2. <ul><li>Introduction </li></ul><ul><li>System components </li></ul><ul><ul><li>Question analysis </li></ul></ul><ul><ul><li>Index creation and information retrieval </li></ul></ul><ul><ul><li>Answer extraction </li></ul></ul><ul><li>Results </li></ul><ul><li>Applications of the QA system </li></ul><ul><ul><li>eLearning </li></ul></ul><ul><ul><li>Robotics </li></ul></ul><ul><ul><li>CriES 2010 </li></ul></ul><ul><li>Conclusions </li></ul>
    3. 3. <ul><li>Our group has participated in CLEF exercises since 2006: </li></ul><ul><ul><li>2006 – Ro–En (English collection) – 9.47% right answers </li></ul></ul><ul><ul><li>2007 – Ro–Ro (Romanian Wikipedia) – 12% </li></ul></ul><ul><ul><li>2008 – Ro–Ro (Romanian Wikipedia) – 31% </li></ul></ul><ul><ul><li>2009 – Ro–Ro, En–En (JRC-Acquis) – 47.2% (48.6%) </li></ul></ul><ul><ul><li>2010 – Ro–Ro, En–En, Fr–Fr (JRC-Acquis, Europarl) – 47.5% (42.5%, 27%) </li></ul></ul>
    4. 4. EUROPARL corpus
    5. 5. <ul><li>Q1: What percentage of people in Italy relies on television for information? </li></ul><ul><li><q q_id="0001" source_lang="EN" target_lang="RO"> </li></ul><ul><li><string>Ce procent al populaţiei din Italia contează pe televiziune pentru a obţine informaţii</string> </li></ul><ul><li><focus>procent</focus> </li></ul><ul><li><verb>contează obţine</verb> </li></ul><ul><li><noun>populaţiei televiziune informaţii</noun> </li></ul><ul><li><nameEntities>Italia</nameEntities> </li></ul><ul><li><luceneQuery>procent~0.7 populaţiei~0.7 Italia^3 (contează^2 conta) televiziune~0.7 obţine informaţii~0.7</luceneQuery> </li></ul><ul><li><questionType>FACTOID</questionType> ~ 40 patterns </li></ul><ul><li><answerType>MEASURE</answerType> ~ 30 patterns </li></ul><ul><li></q> </li></ul>
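The query in the example above suggests a few term-weighting conventions: fuzzy matching (`~0.7`) for the focus and the nouns, a `^3` boost for named entities, and a boosted surface form grouped with its lemma for some verbs. The Python sketch below reproduces that one query string. The function name, the token dictionaries and the rules themselves are assumptions inferred from this single example, not the system's actual implementation.

```python
def build_lucene_query(tokens):
    """Assemble a Lucene query string from analysed question tokens.

    Rules inferred from one slide example (assumption):
      - named entities get a ^3 boost
      - verbs with a known, different lemma become "(form^2 lemma)"
      - focus words and nouns get fuzzy matching ~0.7
      - anything else is passed through unchanged
    """
    parts = []
    for tok in tokens:
        text, role, lemma = tok["text"], tok["role"], tok.get("lemma")
        if role == "entity":
            parts.append(f"{text}^3")
        elif role == "verb" and lemma and lemma != text:
            parts.append(f"({text}^2 {lemma})")
        elif role in ("focus", "noun"):
            parts.append(f"{text}~0.7")
        else:
            parts.append(text)
    return " ".join(parts)


# Tokens of the Romanian question, in question order (hypothetical structure).
question_tokens = [
    {"text": "procent", "role": "focus"},
    {"text": "populaţiei", "role": "noun"},
    {"text": "Italia", "role": "entity"},
    {"text": "contează", "role": "verb", "lemma": "conta"},
    {"text": "televiziune", "role": "noun"},
    {"text": "obţine", "role": "verb"},
    {"text": "informaţii", "role": "noun"},
]

query = build_lucene_query(question_tokens)
print(query)
```

Keeping the tokens in question order makes the generated string easy to compare with the one on the slide.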
    6. 6. <ul><li>We used Lucene to create two indexes, one at paragraph level and one at document level </li></ul><ul><li>For every question, we ran its Lucene query against these indexes with the Lucene search engine to extract a ranked list of snippets as candidate answers </li></ul>
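The idea of indexing the same collection at two granularities can be illustrated with a toy example. The sketch below is plain Python, not Lucene: the scoring is a simple count of query-term occurrences, and all names (`build_indexes`, `search`, the sample documents) are made up for illustration.

```python
from collections import Counter

PUNCT = ".,;:!?()"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def build_indexes(documents):
    """documents: dict doc_id -> list of paragraph strings.

    Returns (paragraph_index, document_index); each maps a unit id
    to a Counter of term frequencies.
    """
    paragraph_index, document_index = {}, {}
    for doc_id, paragraphs in documents.items():
        doc_counts = Counter()
        for i, para in enumerate(paragraphs):
            counts = Counter(tokenize(para))
            paragraph_index[(doc_id, i)] = counts
            doc_counts.update(counts)
        document_index[doc_id] = doc_counts
    return paragraph_index, document_index


def search(index, query, top_k=3):
    """Rank units by total query-term occurrences (toy score, not Lucene's)."""
    terms = tokenize(query)
    scored = []
    for unit_id, counts in index.items():
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((unit_id, score))
    # Highest score first; ties broken by unit id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return scored[:top_k]


docs = {
    "doc1": ["Television is the main source of information in Italy.",
             "Radio is less popular."],
    "doc2": ["The committee discussed fishing quotas."],
}
para_idx, doc_idx = build_indexes(docs)
para_hits = search(para_idx, "television information Italy")
doc_hits = search(doc_idx, "television information Italy")
print(para_hits, doc_hits)
```

The paragraph-level hits give short snippets to extract answers from, while the document-level hits keep the wider context available.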
    7. 7. <ul><li>Answer extraction depends on the Lucene score; in addition, we built special modules to extract answers for questions of type DEFINITION, REASON-PURPOSE, PROCEDURE, OPINION </li></ul><ul><li>Two threshold values </li></ul><ul><ul><li>A higher one – in this case the system returns many NOA (no answer) responses – the number of right answers (RA) suffers, but c@1 is higher </li></ul></ul><ul><ul><li>A lower one – in this case the system returns only a few NOA responses – RA is higher, but c@1 is lower </li></ul></ul>
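The threshold trade-off described above can be sketched as a simple decision rule: return the best candidate only when its score reaches the threshold, otherwise return NOA. The function name, the score scale and the candidate data below are invented for illustration; the slides do not state the actual threshold values.

```python
NOA = "NOA"  # marker for "no answer" (label assumed from the slide)


def choose_answer(candidates, threshold):
    """candidates: list of (answer_text, score) pairs.

    Return the best-scoring answer, or NOA when there are no candidates
    or the best score falls below the threshold.
    """
    if not candidates:
        return NOA
    best_text, best_score = max(candidates, key=lambda c: c[1])
    return best_text if best_score >= threshold else NOA


# Made-up candidates with made-up scores.
candidates = [("about 80%", 0.62), ("in 2009", 0.35)]

high = choose_answer(candidates, threshold=0.7)  # stricter: abstains
low = choose_answer(candidates, threshold=0.3)   # more permissive: answers
print(high, low)
```

A stricter threshold turns uncertain answers into NOA responses, which is the behaviour the higher-threshold run relies on.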
    8. 8. Results (two result columns per language pair)

                              RO-RO          EN-EN          FR-FR
        answered right        95    102      85     78      54     47
        answered wrong        74     93      98     99     124    153
        total answered       169    195     183    177     178    200
        unanswered right       0      0       0      0       0      0
        unanswered wrong       0      0       0      0       0      0
        unanswered empty      31      5      17     23      22      0
        total unanswered      31      5      17     23      22      0
        c@1 measure         0.55   0.42    0.46   0.43    0.30   0.24
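For readers unfamiliar with c@1: as commonly defined for this kind of evaluation, it rewards leaving a question unanswered over answering it wrongly, using c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of right answers, n_U the number of unanswered questions and n the total number of questions. The sketch below applies that definition to the first RO-RO figures (95 right, 31 unanswered, 200 questions in total); the function name is an assumption.

```python
def c_at_1(right, unanswered, total):
    """c@1 = (n_R + n_U * n_R / n) / n.

    Unanswered questions are credited with the accuracy
    observed over the whole question set.
    """
    return (right + unanswered * right / total) / total


# First RO-RO column: 95 right, 31 unanswered, 169 + 31 = 200 questions.
first_ro = c_at_1(right=95, unanswered=31, total=200)
print(round(first_ro, 2))

# With nothing left unanswered, c@1 reduces to plain accuracy.
last_fr = c_at_1(right=47, unanswered=0, total=200)
print(last_fr)
```

Rounded to two decimals, the first value matches the 0.55 reported for that column.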
    9. 9. <ul><li>eLearning – fast answers for ~30% of questions </li></ul><ul><li>Robotics – communication </li></ul><ul><li>CriES 2010 – identify experts on Yahoo! Answers </li></ul>
    10. 11. <ul><li>With Swoogle we extend the knowledge base </li></ul><ul><li>The ontologies returned are then converted to AIML format and saved in the robot’s memory </li></ul>
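As a rough illustration of what converting knowledge to AIML could look like, the sketch below turns a (term, definition) pair into an AIML `<category>` with a `<pattern>` and a `<template>`. How the ontologies are actually mapped is not described in the slides, so the "WHAT IS ..." pattern, the function name and the sample entry are all assumptions.

```python
import xml.etree.ElementTree as ET


def to_aiml_category(term, definition):
    """Build one AIML category answering "WHAT IS <TERM>" (assumed mapping)."""
    category = ET.Element("category")
    pattern = ET.SubElement(category, "pattern")
    pattern.text = f"WHAT IS {term.upper()}"  # AIML patterns are usually uppercase
    template = ET.SubElement(category, "template")
    template.text = definition
    return category


# Hypothetical knowledge-base entry.
aiml = ET.Element("aiml", version="1.0.1")
aiml.append(to_aiml_category("Lucene", "Lucene is a text search library."))

xml_text = ET.tostring(aiml, encoding="unicode")
print(xml_text)
```

Each category is self-contained, so new entries can be appended to the robot's AIML file without touching existing ones.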
    11. 12. [Diagram. Initial Yahoo! Answers collections (en, fr, ge, sp) → eliminate stop words → domain keywords → relevant words for domains. Initial user questions → eliminate stop words → question keywords → relevant words for questions. Both feed a similarity score between questions and domains. Other labels: Initial digraph; Run 0, Run 1, Run 2.]
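The steps named in this diagram (stop-word elimination, keyword extraction, a similarity score between questions and domains) can be sketched in a few lines. The slides do not say which similarity measure was used, so Jaccard overlap is swapped in here as an assumption; the stop-word list, domain texts and function names are likewise invented.

```python
# Tiny illustrative stop-word list (assumption, English only).
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "is",
              "how", "do", "i", "my", "for", "what"}
PUNCT = ".,;:!?()"


def keywords(text):
    """Lowercase the text, strip punctuation and drop stop words."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard overlap of two keyword sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_domains(question, domains):
    """Score every domain against the question, best match first."""
    q = keywords(question)
    scores = {name: jaccard(q, keywords(text)) for name, text in domains.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


domains = {
    "travel": "cheap flights hotel booking visa passport",
    "cooking": "recipe oven bake bread flour",
}
ranking = rank_domains("How do I bake bread in my oven?", domains)
print(ranking)
```

A weighted measure such as TF-IDF cosine similarity would be a natural alternative when domain keyword lists differ greatly in size.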
    12. 13. <ul><li>The UAIC QA system has evolved over time (from 9% in 2006 to 47.5% in 2010) </li></ul><ul><li>The main problem is the quality and quantity of the Romanian resources involved </li></ul><ul><li>At present we are working on using QA components in other applications in order to improve their capabilities </li></ul>
