A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering


Talk given at the FGIR LWA workshop 2007,
Halle (Saale), Germany

Published in: Technology, Education


  1. A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering
     Johannes Leveling
     Intelligent Information and Communication Systems (IICS)
     University of Hagen (FernUniversität in Hagen)
     58084 Hagen, Germany
     johannes.leveling@fernuni-hagen.de
     LWA 2007 Workshop, Halle (Saale), Germany
  2. Outline
     1 IRSAW
     2 QA phases
     3 MIRA
         Embedding of MIRA
         Expected answer types
         TüBa-D/Z annotation
     4 MAVE
     5 Evaluation
     6 Summary and Future Work
  3. IRSAW question answering framework
     [Figure: IRSAW framework. Labelled components: Documents, Document preprocessing, Local Database, Natural language question, Question processing, Produce answer candidates (answer candidate producers: InSicht, QAP, MIRA), Answer validation and selection: MAVE, Answer]
     IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)
  4. Question answering phases
     1 Process document collection
     2 Preprocess question (⇐ Natural language question)
     3 Retrieve text segments
     4 Match document and question representations
     5 Return answer candidates
     6 Merge and validate answer candidates (⇒ Answer)
  5. Embedding of MIRA in IRSAW
     • Employ different modules to produce data streams containing answer candidates:
         • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007)),
         • QAP (Question Answering by Pattern matching, Leveling (2006)), and
         • MIRA (Modified Information Retrieval Approach)
     • Use different methods to produce answer streams to increase recall and robustness
     • Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))
  6. MIRA
     • Shallow question answering
     • Expected answer type (EAT) of question determined by Bayesian classifier: PERSON, SUBSTANCE, ...
     • Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
     • TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
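The slide states only that the EAT is determined by a Bayesian classifier. As a rough illustration of that idea, the following is a minimal multinomial naive Bayes sketch with add-one smoothing. The training questions, the bag-of-words features, and the names `TRAIN`, `train`, and `classify` are assumptions made for this example; they are not taken from the talk and do not describe MIRA's actual feature set.

```python
import math
from collections import Counter, defaultdict

# Toy training questions, invented for illustration (not from the talk).
TRAIN = [
    ("who became the first prime minister", "PERSON"),
    ("who wrote the novel", "PERSON"),
    ("in what year did the war end", "TIME"),
    ("when did the occupation end", "TIME"),
    ("where is the city located", "LOCATION"),
    ("in which country is the river", "LOCATION"),
]


def train(examples):
    """Count class and per-class word frequencies over whitespace tokens."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(question, model):
    """Return the label with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in question.lower().split():
            if word in vocab:  # words unseen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("who was the first president", model))
```

With real data, the classifier would be trained on annotated questions covering all EAT classes; the six toy questions here only show the mechanics.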
  7. Expected answer types (1/3)
     • Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
     • Question (English): Who became the first Prime Minister of Israel in 1948?
     • EAT: PERSON
     • Answer string: David ben Gurion
     • Tag sequence: person-first person-part person-last
  8. Expected answer types (2/3)
     • Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
     • Question (English): In what year did the occupation of Germany officially end?
     • EAT: TIME
     • Answer string: im Jahr 1955 (English: in the year 1955)
     • Tag sequence: prep year num-card
  9. Expected answer types (3/3)
     • Question (German): Wie wird der Ebolavirus übertragen?
     • Question (English): How is the Ebola virus transmitted?
     • EAT: OTHER
     • Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
       (English: Ebola viruses are transmitted through direct physical contact and, on contact with bodily excretions of infected persons, by contact infection or smear infection.)
     • Tag sequence: – (other entity type → answer not found!)
  10. EAT frequency in annotated TüBa-D/Z

      Name class      Corpus frequency
      LOCATION                   8,274
      PERSON                    14,527
      ORGANIZATION               7,148
      TIME                      14,524
      MEASURE                      895
      SUBSTANCE                    293
      OTHER                      2,987
  11. EAT subclass frequency in annotated TüBa-D/Z

      LOCATION subclass    Frequency
      city                     3,717
      country                  1,955
      region                     926
      street                     613
      state                      370
      other                      206
      building                   195
      streetno                   124
      river                       85
      island                      55
      sea                         17
      mountain                    11
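As a quick consistency check on the subclass figures above, the short sketch below sums the LOCATION subclass counts and derives each subclass's share. The counts are copied from the slide; the names `LOCATION_SUBCLASSES`, `total`, and `shares` are chosen for this example only.

```python
# LOCATION subclass counts as given on the slide.
LOCATION_SUBCLASSES = {
    "city": 3717, "country": 1955, "region": 926, "street": 613,
    "state": 370, "other": 206, "building": 195, "streetno": 124,
    "river": 85, "island": 55, "sea": 17, "mountain": 11,
}

total = sum(LOCATION_SUBCLASSES.values())

# Percentage share of each subclass within LOCATION, rounded to one decimal.
shares = {
    name: round(100 * count / total, 1)
    for name, count in LOCATION_SUBCLASSES.items()
}

print(total)           # 8274
print(shares["city"])  # 44.9
```

The subclass counts add up to 8,274, which equals the corpus frequency reported for the LOCATION name class, and cities alone account for roughly 45% of all LOCATION annotations.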
  12. Tagging with subclasses
      Example sentence: Vor 25 Jahren betrat Neil Armstrong als erster Mensch den Mond, doch heute stagniert die bemannte Raumfahrt. (English: 25 years ago Neil Armstrong became the first human to set foot on the Moon, but today manned spaceflight is stagnating.)

      Token        EAT        Subclass
      Vor          TIME       prep
      25           TIME       num-card
      Jahren       TIME       year
      betrat       –
      Neil         PERSON     person-first
      Armstrong    PERSON     person-last
      als          –
      erster       –
      Mensch       –
      den          –
      Mond         LOCATION   other
      ,            –
      doch         –
      heute        TIME       deictic
      stagniert    –
      die          –
      bemannte     –
      Raumfahrt    –
      .            –
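One way such a tagged sentence could be turned into answer candidates is to collect every maximal run of tokens carrying the expected answer type. The sketch below does this for the tokens from the slide. The grouping strategy and the names `TAGGED` and `candidates` are assumptions for illustration; the talk does not spell out how MIRA extracts candidate strings.

```python
from itertools import groupby

# (token, EAT, subclass) triples from the slide; None marks untagged tokens.
TAGGED = [
    ("Vor", "TIME", "prep"), ("25", "TIME", "num-card"),
    ("Jahren", "TIME", "year"), ("betrat", None, None),
    ("Neil", "PERSON", "person-first"), ("Armstrong", "PERSON", "person-last"),
    ("als", None, None), ("erster", None, None), ("Mensch", None, None),
    ("den", None, None), ("Mond", "LOCATION", "other"), (",", None, None),
    ("doch", None, None), ("heute", "TIME", "deictic"),
    ("stagniert", None, None), ("die", None, None), ("bemannte", None, None),
    ("Raumfahrt", None, None), (".", None, None),
]


def candidates(tagged, eat):
    """Return (answer string, subclass sequence) per maximal run tagged `eat`."""
    result = []
    for tag, run in groupby(tagged, key=lambda triple: triple[1]):
        if tag == eat:
            run = list(run)
            answer = " ".join(token for token, _, _ in run)
            subclasses = [sub for _, _, sub in run]
            result.append((answer, subclasses))
    return result


print(candidates(TAGGED, "PERSON"))
print(candidates(TAGGED, "TIME"))
```

For a question with EAT PERSON this yields the single candidate "Neil Armstrong"; for TIME it yields "Vor 25 Jahren" and "heute". A limitation of this simple grouping is that two adjacent entities of the same type would be merged into one candidate; the subclass sequence could be used to split them.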
  13. MAVE - MultiNet-based Answer Verification
      • Validate answer candidates
      • Test logical validity of answer candidate by using
          a) inferences, entailments
          b) heuristic quality indicators (fallback strategy)
      • Select most trusted answer
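The validate-then-select idea can be illustrated with a small sketch that merges identical answers from several streams, prefers logically validated candidates, and falls back to a heuristic score otherwise. Everything here is hypothetical: the `Candidate` fields, the `proved` flag, the `heuristic` score, and the tie-breaking order are assumptions for this example and are not MAVE's actual data model or ranking function.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    stream: str       # producer name, e.g. "InSicht", "QAP", "MIRA"
    proved: bool      # hypothetical flag: logical validation succeeded
    heuristic: float  # hypothetical quality indicator in [0, 1]


def select_answer(cands):
    """Merge equal answers across streams and return the most trusted one."""
    merged = {}
    for c in cands:
        key = c.answer.strip().lower()
        entry = merged.setdefault(
            key,
            {"answer": c.answer, "proved": False, "heuristic": 0.0, "streams": set()},
        )
        entry["proved"] = entry["proved"] or c.proved
        entry["heuristic"] = max(entry["heuristic"], c.heuristic)
        entry["streams"].add(c.stream)
    if not merged:
        return None
    # Validated answers rank first; the heuristic score is the fallback,
    # and support from more streams breaks remaining ties.
    best = max(
        merged.values(),
        key=lambda e: (e["proved"], e["heuristic"], len(e["streams"])),
    )
    return best["answer"]


# Toy candidates, invented for illustration.
toy = [
    Candidate("David ben Gurion", "InSicht", True, 0.4),
    Candidate("Golda Meir", "MIRA", False, 0.9),
    Candidate("david ben gurion", "MIRA", False, 0.7),
]
print(select_answer(toy))
```

In this toy run the validated answer wins even though an unvalidated candidate has a higher heuristic score, which mirrors the slide's ordering of logical validation before the heuristic fallback.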
  14. Evaluation results (1/3)
      Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006

      System     # Candidates   Coverage   # Correct   Precision
      InSicht           1,212    226/600         625       51.6%
      QAP               2,562    114/600       1,190       46.6%
      MIRA             14,946    520/600       1,738       11.6%
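The figures above can be related to each other with a short calculation. The sketch assumes that precision is the number of correct candidates divided by the number of candidates, and that coverage is the fraction of the 600 questions for which a system produced a candidate; the slide does not define either measure, and the names `RESULTS`, `QUESTIONS`, and `metrics` are chosen for this example.

```python
# Counts copied from the slide.
RESULTS = {
    "InSicht": {"candidates": 1212, "covered": 226, "correct": 625},
    "QAP": {"candidates": 2562, "covered": 114, "correct": 1190},
    "MIRA": {"candidates": 14946, "covered": 520, "correct": 1738},
}
QUESTIONS = 600


def metrics(row, questions=QUESTIONS):
    """Return (precision, coverage) in percent under the assumed definitions."""
    precision = 100 * row["correct"] / row["candidates"]
    coverage = 100 * row["covered"] / questions
    return precision, coverage


for system, row in RESULTS.items():
    precision, coverage = metrics(row)
    print(f"{system:8s} precision {precision:5.1f}%  coverage {coverage:5.1f}%")
```

Under these assumptions InSicht and MIRA reproduce the precision values on the slide (51.6% and 11.6%), while QAP comes out at about 46.4% rather than the reported 46.6%, so the slide may rest on slightly different counts or a different definition. Coverage works out to roughly 37.7% for InSicht, 19.0% for QAP, and 86.7% for MIRA.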
  15. Evaluation results (2/3)
      Performance results including answer selection by MAVE based on questions from QA@CLEF data from 2004 to 2006

      Run                        # Correct   # Inexact   # Wrong
      InSicht+Mira+QAP               247.4        15.8     307.8
      InSicht+Mira+QAP (opt.)        305.0        17.0     249.0
  16. Evaluation results (3/3)
      Results for MIRA answer candidates (top-N) for QA@CLEF data from 2003 to 2006

                                 N=50     N=30    N=10     N=5
      # Correct (2006)            798      615     215      95
      # Inexact (2006)             56       53      20      12
      # Wrong (2006)            4,436    3,421   1,360     722
      # Correct (2003–2006)     1,864    1,503     609     263
      # Inexact (2003–2006)       287      248     103      54
      # Wrong (2003–2006)      17,326   14,102   5,694   3,013
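To compare the top-N cut-offs, the counts above can be converted into the share of correct candidates at each N. The sketch assumes that correct, inexact, and wrong candidates together make up all candidates returned at that cut-off; the slide does not state this, and the names `TOP_N` and `correct_share` are illustrative.

```python
# (correct, inexact, wrong) per cut-off, copied from the slide.
TOP_N = {
    "2006": {50: (798, 56, 4436), 30: (615, 53, 3421),
             10: (215, 20, 1360), 5: (95, 12, 722)},
    "2003-2006": {50: (1864, 287, 17326), 30: (1503, 248, 14102),
                  10: (609, 103, 5694), 5: (263, 54, 3013)},
}


def correct_share(correct, inexact, wrong):
    """Fraction of correct candidates among all judged candidates."""
    return correct / (correct + inexact + wrong)


for data_set, rows in TOP_N.items():
    for n, counts in rows.items():
        print(f"{data_set:10s} N={n:<3d} {100 * correct_share(*counts):5.1f}% correct")
```

For the 2006 data this gives about 15.1% correct candidates at N=50 and about 11.5% at N=5, computed from 798 of 5,290 and 95 of 829 candidates respectively.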
  17. Summary and Future Work
      MIRA:
      • Produces a highly recall-oriented answer stream,
      • Covers more questions than the other answer producers in IRSAW, and
      • Returns the largest number of correct answer candidates.
      Future work:
      • Return additional answer support for temporal deictic expressions
      • Support processing list questions
  18. Selected References
      Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.
      Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.
      Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.
