Exploring Term Selection for Geographic Blind Feedback

Talk given at the GIR 2007 workshop,
Lisbon, Portugal

  1. Exploring Term Selection for Geographic Blind Feedback
     Johannes Leveling
     Intelligent Information and Communication Systems (IICS)
     University of Hagen (FernUniversität in Hagen)
     58084 Hagen, Germany
     firstname.lastname@fernuni-hagen.de
     GIR 2007 Workshop, Lisbon, Portugal
  2. Outline
     1 Introduction
     2 Creating a Geographical Knowledge Base
        • GeoNames Data
        • PND Data
     3 Experiments on Geographic Blind Feedback
        • Experimental Settings
        • Results
        • Discussion
     4 Outlook
     References
  3. Blind Feedback
     General idea: improve IR performance by expanding a query.
     1 The original query Qo is processed and an initial ranked result set Ro of documents is obtained
     2 D documents from Ro are selected and presumed to be relevant
     3 T terms from these documents are extracted for relevance feedback
     4 Qo is modified into the final query Qf by merging the extracted terms into the query and possibly re-weighting all terms
     5 The final result set Rf is retrieved with the query Qf
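The five steps above can be sketched as a small Python loop. This is a minimal illustration, not the GIRSA implementation: the toy scoring function (counting query-term occurrences), the frequency-based term extraction, and the names `score`, `retrieve`, `blind_feedback`, and the optional `select` hook are all hypothetical. Re-weighting in step 4 is omitted.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Toy relevance score (assumption): number of query-term occurrences."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def retrieve(query_terms, collection):
    """Rank document ids by descending toy score, ties broken by id."""
    return sorted(collection, key=lambda d: (-score(query_terms, collection[d]), d))

def blind_feedback(q_o, collection, d=5, t=30, select=None):
    # Step 1: process Q_o and obtain the initial ranking R_o
    r_o = retrieve(q_o, collection)
    # Step 2: presume the top D documents of R_o to be relevant
    top_docs = r_o[:d]
    # Step 3: extract up to T new terms, most frequent first
    freq = Counter(term for doc in top_docs for term in collection[doc])
    candidates = [term for term, _ in freq.most_common() if term not in q_o]
    if select is not None:
        # Optional hook for term selection strategies (e.g. geographic filters)
        candidates = [c for c in candidates if select(c, q_o)]
    expansion = candidates[:t]
    # Step 4: merge the extracted terms into Q_f (no re-weighting here)
    q_f = list(q_o) + expansion
    # Step 5: retrieve the final result set R_f with Q_f
    return q_f, retrieve(q_f, collection)
```

With a three-document toy collection, `blind_feedback(["flood"], collection, d=2, t=3)` expands the query with the three most frequent new terms from the two top-ranked documents. The `select` hook is where a geographically informed term filter would plug in.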
  4. Application of Blind Feedback to GIR (1/2)
     • Larson and Gey (2): an improvement on the order of 53% to 72% MAP (mean average precision) was achieved for some monolingual German GIR topics on the GeoCLEF 2006 data (using T = 30, D = 5); no significant improvement for English
     • Gey and Petras (1): "the most improved queries seem to add mostly proper names and word variations and very few irrelevant words that won't distort the search towards another direction" and "blind feedback improves precision, but it seems to do so for only a particular kind of query"
  5. Application of Blind Feedback to GIR (2/2)
     • Blind feedback (BF) is a method originating in (and intended for) ad-hoc retrieval
     → BF does not yet reflect the geographic orientation of GIR
     → novel methods for document and term selection are required, preferably based on geographic knowledge
     → BF does not generally increase performance significantly, even in standard IR
     → application to GIR without adaptations seems questionable
  6. The Geographical Knowledge Base (GKB)
     • Avoid ambiguities for location names; sacrifice coverage (i.e. focus on important places)
     → Create a small geographic knowledge base (GKB) with meronymy relations (part-whole relations)
     • GKB based on two resources:
        • Linking between Wikipedia articles and authority records for persons (PND), and
        • GeoNames data for the largest cities world-wide
  7. GeoNames Data
     • GeoNames provides data for populated places world-wide with more than 1,000, 5,000, or 15,000 inhabitants
     • Entries contain geographic codes for the continent, country, and administrative divisions
     • Data for cities with more than 5,000 inhabitants → meronymy relations for 41,228 entries
     • Names are translated by utilizing the Wikipedia linking between articles in English and German
     • Example: Nuenen is a populated place in North Brabant, in The Netherlands, in Europe
        → meronym(Nuenen, North Brabant)
        → meronym(North Brabant, The Netherlands)
        → meronym(The Netherlands, Europe)
     → A place is important if it is highly populated
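The derivation of meronymy relations from a GeoNames entry can be sketched as walking the chain place, first-level division, country, continent. This assumes the geographic codes have already been resolved to names; the `Place` record, its field names, and the helper `meronyms_from_places` are hypothetical, and the population value in the usage line is illustrative only.

```python
from typing import NamedTuple

class Place(NamedTuple):
    """Simplified GeoNames-style entry (hypothetical fields, codes resolved to names)."""
    name: str
    admin1: str      # first-level administrative division
    country: str
    continent: str
    population: int

def meronyms_from_places(places, min_population=5000):
    """Derive meronym(part, whole) pairs along name < admin1 < country < continent."""
    relations = set()
    for p in places:
        if p.population <= min_population:
            continue  # keep only "important" (more than min_population inhabitants) places
        chain = [p.name, p.admin1, p.country, p.continent]
        # Each adjacent pair in the chain is one direct part-whole relation
        for part, whole in zip(chain, chain[1:]):
            relations.add((part, whole))
    return relations
```

For `Place("Nuenen", "North Brabant", "The Netherlands", "Europe", 22000)` this yields exactly the three relations listed on the slide.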
  8. PND Data
     • Wikipedia articles are linked with authority records for persons from the PND (Personennamendatei)
     • PND contains information such as a person's name, his or her place and date of birth, place and date of death, and profession
     • Specification of a place often encodes meronymy information
     • 152,650 PND entries → 27,734 unique meronymy relations
     • Example: Edsger Wybe Dijkstra was born in Rotterdam, Niederlande/the Netherlands in 1930; died in Nuenen, Niederlande/the Netherlands in 2002
        → meronym(Rotterdam, The Netherlands)
        → meronym(Nuenen, The Netherlands)
     → A place is important if some well-known person was born or died there
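Why many PND entries collapse into far fewer unique relations can be illustrated by collecting birth and death places into a set. The sketch assumes a hypothetical place format "City, Country" (as in "Rotterdam, Niederlande") and a hypothetical German-to-English translation map standing in for the Wikipedia-based name translation; the record keys and function names are illustrative, not the PND schema.

```python
def parse_place(place, translate):
    """Split a 'City, Country' string into a (part, whole) pair, or return None."""
    parts = [s.strip() for s in place.split(",")]
    if len(parts) < 2 or not parts[0] or not parts[-1]:
        return None  # no holonym given
    part, whole = parts[0], parts[-1]
    # Translate the holonym if a translation is known (assumed mapping)
    return part, translate.get(whole, whole)

def meronyms_from_persons(records, translate=None):
    translate = translate or {}
    relations = set()  # a set keeps only unique meronymy relations
    for record in records:
        for key in ("birthplace", "deathplace"):
            pair = parse_place(record.get(key, ""), translate)
            if pair is not None:
                relations.add(pair)
    return relations
```

Two persons born in Rotterdam contribute only one relation meronym(Rotterdam, The Netherlands), which is the deduplication effect behind the drop from 152,650 entries to 27,734 relations.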
  9. Towards Less Ambiguity in Geographic Resources
     GeoNames cities (population > X)
     characteristic            X=1,000    X=5,000    X=15,000
     unique loc. names         124,315     83,680      57,172
     ambiguous loc. names       22,616     13,133       7,551
     senses per loc. name        1.587      1.455       1.345
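The three characteristics in the table can be computed from a list of gazetteer entries, one location name per entry. The slide does not define "senses per loc. name"; this sketch assumes one plausible reading, the number of entries divided by the number of unique names. The function name and the toy input are hypothetical.

```python
from collections import Counter

def ambiguity_stats(entries):
    """entries: one location name per gazetteer entry (each entry = one sense)."""
    senses = Counter(entries)
    unique = len(senses)
    # A name is ambiguous if it refers to more than one entry
    ambiguous = sum(1 for n in senses.values() if n > 1)
    # Assumed definition: total entries per unique name
    per_name = sum(senses.values()) / unique if unique else 0.0
    return unique, ambiguous, per_name
```

Raising the population threshold X removes small places, so fewer names collide and all three numbers shrink, as the table shows.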
  10. The Meronymy Predicate
     Transitive meronymy predicate mero? for two location names:
     $$\mathrm{mero?}(L_1, L_2) := \begin{cases} \text{true} & \text{if } L_1 \text{ is a meronym of } L_2 \\ \text{false} & \text{otherwise} \end{cases}$$
     • Example:
        mero?(Berlin, Germany) returns true
        mero?(Hong Kong, France) returns false
     → Allows term selection in BF based on meronymy information in the GKB
     → Geographic Blind Feedback
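Because mero? is transitive, it must also hold for indirect relations such as mero?(Nuenen, Europe), which is not stored as a single fact. A minimal sketch, assuming the GKB is a set of direct meronym(part, whole) pairs, answers the predicate by a depth-first search up the part-whole hierarchy. The names `make_mero` and `gkb` are hypothetical.

```python
def make_mero(relations):
    """Build a transitive meronymy predicate from direct meronym(part, whole) facts."""
    holonyms = {}
    for part, whole in relations:
        holonyms.setdefault(part, set()).add(whole)

    def mero(l1, l2):
        # Depth-first search upwards from l1 through its holonyms
        stack, seen = [l1], set()
        while stack:
            current = stack.pop()
            for whole in holonyms.get(current, ()):
                if whole == l2:
                    return True
                if whole not in seen:
                    seen.add(whole)
                    stack.append(whole)
        return False

    return mero
```

Usage: with `gkb = {("Berlin", "Germany"), ("Germany", "Europe"), ("Hong Kong", "China")}`, `make_mero(gkb)("Berlin", "Germany")` is true and `make_mero(gkb)("Hong Kong", "France")` is false. The `seen` set keeps the search finite even if the data contains cycles.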
  11. Experimental Setup
     • GeoCLEF documents: 275,000 German newspaper articles from Frankfurter Rundschau, Schweizerische Depeschenagentur, and Der Spiegel from the years 1994 and 1995
     • GeoCLEF topics: 25 topics from 2006 with a title, a short description, and a narrative part
     • GIRSA system: setup similar to previous GIR experiments on GeoCLEF data (4; 3)
  12. Experimental Settings for Retrieval Experiments (D=5)
     L: only location names are selected from the top-ranked documents as blind feedback terms
     M: location names are filtered utilizing the mero? predicate, keeping meronyms of a search term in the original query as BF terms
     H: a location name is filtered from the BF terms if there is an inverse meronymy relation to a search term in the original query (holonym)
     B1: (Baseline) no blind feedback; query terms are associated with static weights
     B2: (Baseline) no blind feedback; bag-of-words query; query terms are not weighted
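The three term selection settings differ only in how candidate feedback terms are filtered. The sketch below assumes a location-name test `is_location` and a meronymy predicate `mero(part, whole)` are supplied by the caller; it also assumes that M and H both start from the location names selected by L, which the slide leaves open for H. All names are hypothetical.

```python
def select_terms(candidates, query_terms, is_location, mero, strategy):
    """Filter blind feedback candidates according to strategy 'L', 'M', or 'H'."""
    # All three strategies consider only location names (assumption for H)
    locations = [t for t in candidates if is_location(t)]
    if strategy == "L":
        # L: keep every location name
        return locations
    if strategy == "M":
        # M: keep location names that are meronyms of some query term
        return [t for t in locations if any(mero(t, q) for q in query_terms)]
    if strategy == "H":
        # H: drop location names that are holonyms of some query term
        return [t for t in locations if not any(mero(q, t) for q in query_terms)]
    raise ValueError(f"unknown strategy: {strategy}")
```

For a query on "Germany", M keeps "Cologne" but not "Europe", while H drops "Europe" (a holonym of Germany) and keeps unrelated places such as "Paris". This makes M the most restrictive of the three filters.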
  13. Results for Retrieval Experiments (1/2)
     [Performance plot: MAP (y-axis, 0.22 to 0.25) against the number of feedback terms T (x-axis, 5 to 40) for the runs B1, L, H, and M]
  14. Results for Retrieval Experiments (2/2)
     Topic      B1     L      H      M      B2
     GC028      0.38   0.24   0.22   0.41   0.28
     GC030 ∗    0.81   0.65   0.66   0.63   0.71
     GC032      0.60   0.62   0.62   0.70   0.49
     GC039      0.00   0.03   0.03   0.01   0.00
     GC044      0.33   0.33   0.33   0.33   0.33
     GC048      0.87   0.89   0.89   0.66   0.85
     MAP        0.24   0.23   0.23   0.24   0.19
     P@5        0.31   0.32   0.31   0.34   0.24
     P@10       0.27   0.24   0.24   0.29   0.21
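The table reports MAP, P@5, and P@10. As a reminder of what these measures compute, here is a sketch using the standard definitions (precision over the top k documents; average precision as the mean of the precision values at each relevant document retrieved, divided by the number of relevant documents). The slides do not say which evaluation tool was used, so this is not necessarily the exact procedure behind the numbers; function names are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For the ranking a, b, c, d, e with relevant documents {a, c, x}, average precision is (1/1 + 2/3) / 3 = 5/9, and P@5 is 2/5. Unretrieved relevant documents (here x) still count in the denominator.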
  15. Discussion of Results
     • MAP did not change considerably when using BF compared to the upper baseline B1 (0.24)
     • The BF strategy M (selecting meronyms) clearly outperforms the second baseline B2 (0.24 vs. 0.19)
     • Precision at five documents was increased (from 0.31/0.24 in the baseline experiments to 0.34 in the M-run)
     • Per-topic comparison of average precision between B1 and M: it increased for nine topics and decreased for three topics in the M-run
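The per-topic comparison between B1 and M amounts to counting topics where the run beats or falls below the baseline. The sketch below applies this to the six topics listed in the results table only, with values copied from that table; over all 25 topics the slide reports nine increases and three decreases, which this subset cannot reproduce. The function name is hypothetical.

```python
# Per-topic values for B1 and M, copied from the six topics in the results table
B1 = {"GC028": 0.38, "GC030": 0.81, "GC032": 0.60, "GC039": 0.00, "GC044": 0.33, "GC048": 0.87}
M = {"GC028": 0.41, "GC030": 0.63, "GC032": 0.70, "GC039": 0.01, "GC044": 0.33, "GC048": 0.66}

def compare(base, run):
    """Return sorted lists of topics where run improves on / falls below base."""
    up = sorted(t for t in base if run[t] > base[t])
    down = sorted(t for t in base if run[t] < base[t])
    return up, down
```

On these six topics, M improves three (GC028, GC032, GC039), loses on two (GC030, GC048), and ties on GC044. Such counts show how unevenly BF affects individual queries even when MAP barely moves.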
  16. Discussion
     • The geographic semantic relation "in" is not used in all topics: seven topics use "near", "in a distance of", "alongside", or "around", and five of these have an average precision of less than 0.03
     • The GKB mostly covers cities and does not include information on rivers, seas, lakes, etc.
     • The initial result set may be difficult to improve. Highest MAP for official monolingual German experiments in GeoCLEF 2006: 0.22 (see (3)); baseline experiment B1: 0.24 MAP
  17. Outlook
     • Focus on finding even more geographically oriented term and document selection criteria
     • Investigate setting the parameters T and D in a flexible way
     • Consider more geographic semantic relations (other than meronymy) in term selection for blind feedback
  18. Selected References
     [1] Fredric C. Gey and Vivien Petras. Berkeley2 at GeoCLEF: Cross-language geographic information retrieval of English and German documents. In Carol Peters, editor, Results of the CLEF 2005 Cross-Language System Evaluation Campaign, Vienna, Austria, 2005.
     [2] Ray Larson and Fredric C. Gey. GeoCLEF text retrieval and manual expansion approaches. In Alessandro Nardi, Carol Peters, and José Luis Vicedo, editors, Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Alicante, Spain, 2006.
     [3] Johannes Leveling and Dirk Veiel. Experiments on the exclusion of metonymic location names from GIR. In Carol Peters, et al., editors, Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, volume 4730 of LNCS, pages 901–904. Springer, Berlin, 2007.
     [4] Johannes Leveling, Sven Hartrumpf, and Dirk Veiel. Using semantic networks for geographic information retrieval. In Carol Peters, et al., editors, Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, volume 4022 of LNCS, pages 977–986. Springer, Berlin, 2006.