A Fully-automatic Approach to Answer Geographic Queries: GIRSA-WP at GikiP

Talk given at CLEF 2008, Aarhus, Denmark


  1. A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP
     Johannes Leveling, Sven Hartrumpf
     Intelligent Information and Communication Systems (IICS)
     University of Hagen (FernUniversität in Hagen)
     58084 Hagen, Germany
     firstname.lastname@fernuni-hagen.de
  2. Main idea
     InSicht (Hartrumpf, 2005)
       • open-domain QA system
       • based on matching semantic network representations of question and documents
       • supports question decomposition, e.g. temporal or geographical constraints
     + GIRSA (Leveling and Hartrumpf, 2008)
       • textual GIR system
       • supports methods to boost recall, e.g. normalizing location indicators
       • supports methods to boost precision, e.g. metonymy recognition
     = GIRSA-WP (GIRSA for Wikipedia)
       • automatic combination of InSicht and GIRSA
  3. GIRSA-WP
     • applies semantic filter on answer candidates
     • merges results from InSicht and GIRSA by using the maximum score of documents
     • returns list of Wikipedia article names
     • simple multilingual approach: follow German Wikipedia links to articles in English and Portuguese
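
The merging step on this slide (take the maximum score per document across both systems) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `merge_max`, the dictionary representation of result lists, and the example scores are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple


def merge_max(insicht: Dict[str, float], girsa: Dict[str, float]) -> List[Tuple[str, float]]:
    """Merge two result lists, keeping the maximum score per document.

    Both inputs map a Wikipedia article name to a score (hypothetical
    representation). The result is ranked by descending score, with the
    article name as a tie-breaker so the order is deterministic.
    """
    merged = dict(insicht)
    for doc, score in girsa.items():
        # A document found by both systems keeps the higher of its two scores.
        merged[doc] = max(score, merged.get(doc, score))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores, for illustration only.
insicht_results = {"Aargau": 0.9, "Basel-Stadt": 0.2}
girsa_results = {"Basel-Stadt": 0.5, "Schaffhausen": 0.4}
ranking = merge_max(insicht_results, girsa_results)
```

Taking the maximum means a document only needs to score well in one of the two systems to rank highly in the merged list.
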
  4. Semantic filter (1/2)
     • in QA: check expected answer type of answer candidates
     • for GIRSA-WP: check semantic answer types (semantic sort and features, see Helbig (2006))
     • extract the word representing the answer type from topic title and description (the first noun that is not a proper noun)
     • parse these words with WOCADI, a syntactico-semantic parser (includes a disambiguation of words), and find the semantic features corresponding to the extracted words
     • parse the answer candidates (titles of Wikipedia articles) and determine their semantic features
     • test if unification of semantic features succeeds; otherwise, discard the answer candidate
  5. Semantic filter (2/2)
     • Which Swiss cantons border Germany?
       → extracted word: cantons
     • parse result: corresponding concept is canton
       • artificial geographical entity or regional institution
       • legal-person:+, movable:–, etc.
     • answer candidate Cross-Border-Leasing:
       • prototypical-theoretical-concept
       • legal-person:–, movable:–
       → semantic features not unifiable
     • answer candidate Aargau:
       → unifiable semantic features
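
The unification test from the two semantic filter slides can be sketched with the canton example. This is a simplified illustration under stated assumptions, not WOCADI or the authors' implementation: features are modelled as a plain dictionary of booleans (`True` for `+`, `False` for `–`, absent for underspecified), semantic sorts are ignored, and the feature values given for Aargau are assumed because the slide only states that they unify.

```python
from typing import Dict, List, Optional

# Feature name -> True ("+") or False ("-"); a missing key is underspecified.
Features = Dict[str, bool]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Return the combined features, or None if any feature value conflicts."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None  # e.g. legal-person:+ against legal-person:-
        merged[name] = value
    return merged


def semantic_filter(answer_type: Features, candidates: Dict[str, Features]) -> List[str]:
    """Keep only the candidates whose features unify with the answer type."""
    return [title for title, feats in candidates.items()
            if unify(answer_type, feats) is not None]


# Features of the concept "canton" as listed on the slide.
CANTON: Features = {"legal-person": True, "movable": False}

CANDIDATES: Dict[str, Features] = {
    # Values as listed on the slide.
    "Cross-Border-Leasing": {"legal-person": False, "movable": False},
    # Assumed values; the slide only says these features are unifiable.
    "Aargau": {"legal-person": True, "movable": False},
}

kept = semantic_filter(CANTON, CANDIDATES)
```

Here `Cross-Border-Leasing` is discarded because `legal-person` conflicts, while `Aargau` passes the filter.
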
  6. Experiments and results
     • six runs submitted: three with threshold score of 0.01 and varied settings for stemming, location name normalization, and noun decompounding; additional three experiments with threshold score of 0.03
     • 798 (372) answers found
     • 79 correct answers in best run
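
How a score threshold changes the size of the answer list can be sketched as below. The slide does not say how the threshold is applied, so this assumes it is compared against a per-document score and that documents scoring below it are dropped; the function name `apply_threshold` and the example scores are hypothetical.

```python
from typing import Dict, List


def apply_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return article names scoring at least `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked if score >= threshold]


# Hypothetical scores, for illustration only.
scores = {"Aargau": 0.40, "Schaffhausen": 0.02, "Cross-Border-Leasing": 0.005}

loose = apply_threshold(scores, 0.01)   # the lower threshold used in three runs
strict = apply_threshold(scores, 0.03)  # the higher threshold used in three runs
```

Under these assumptions, the higher threshold returns a subset of what the lower one returns.
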
  7. Conclusions (1/2)
     GikiP topics
     • are at least as difficult as QA or GeoCLEF topics
     • aim at a wider range of expected answer types
     • include complex geographic relations (GP2: outside, GP4: on the border), restrictions on measurable properties (GP3: more than, GP13: longer than), and temporal constraints (GP9: Renaissance, GP15: between 1980 and 1990)
     ⇒ new challenge for QA and GIR community
  8. Conclusions (2/2)
     • GIRSA:
       • indexing single sentences was meant to ensure a high precision (but did not work);
       • geographic entities have not been annotated at all in the Wikipedia documents
     • InSicht:
       • important information is given in tables (like inhabitant numbers), but WOCADI ignores these
       • the semantic matching approach is still too strict for the IR oriented parts of GikiP queries (similarly for GeoCLEF)
     ⇒ tasks for future work
  9. Selected References
     Hartrumpf, S. (2005). Question answering using sentence parsing and semantic network matching. In Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004 (edited by Peters, C.; Clough, P.; Gonzalo, J.; Jones, G. J. F.; Kluck, M.; and Magnini, B.), volume 3491 of LNCS, pp. 512–521. Berlin: Springer.
     Helbig, H. (2006). Knowledge Representation and the Semantics of Natural Language. Berlin: Springer.
     Leveling, J. and Hartrumpf, S. (2008). Inferring location names for geographic information retrieval. In Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007 (edited by Peters, C.; Jijkoun, V.; Mandl, T.; Müller, H.; Oard, D. W.; Peñas, A.; Petras, V.; and Santos, D.), volume 5152 of LNCS, pp. 773–780. Berlin: Springer.
