The current abundance of electronic documents calls for automatic techniques that help users understand their content and extract useful information. To this end, improving retrieval performance requires going beyond a purely lexical interpretation of user queries and understanding their semantic content and aims. Any digital library would benefit enormously from effective Information Retrieval techniques to offer its users. This paper proposes an approach to Information Retrieval based on matching the domain of discourse of the query with that of the documents in the repository. The association relies on standard general-purpose linguistic resources (WordNet and WordNet Domains) and on a novel similarity assessment technique. Although the work is at a preliminary stage, interesting initial results encourage further extension and improvement of the approach.
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotella, Ferilli, Leuzzi
1. Università degli studi di Bari “Aldo Moro”, Dipartimento di Informatica. A Domain Based Approach to Information Retrieval in Digital Libraries. F. Rotella, S. Ferilli, F. Leuzzi. [email_address], {fabio.leuzzi, rotella.fulvio}@gmail.com. 8th Italian Research Conference on Digital Libraries, Bari, Italy, February 9-10, 2012. L.A.C.A.M. http://lacam.di.uniba.it:8000
10. Conclusions & Future Work
11. Introduction. Some repositories leave responsibility for quality to the authors. + Anybody can produce and distribute documents. = Possibly low average quality of the repository contents. Users are often overwhelmed by documents that only appear suitable for satisfying their information needs.
14. WordNet Domains
18. select the most appropriate one according to the context of the discourse
20. To limit the possibility of including non-discriminative and very general words in the representation of a document
21. Word Sense Disambiguation: Domain Driven. One Domain per Discourse assumption: many uses of a word in a coherent portion of text tend to share the same domain. Prevalent domain identification; extraction of all synsets for each term; extraction of all domains for each synset; choice of the prevalent-domain synset.
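A minimal sketch of one possible reading of the domain-driven disambiguation above. The lexicon is hypothetical toy data (the labels `rock.music`, `rock.stone`, and so on are made up for illustration); a real system would read synsets from WordNet and domain labels from WordNet Domains. Counting each domain once per synset is also an assumption, not necessarily the authors' weighting.

```python
from collections import Counter

# Hypothetical toy lexicon: term -> {synset label -> list of domains}.
# Real data would come from WordNet and WordNet Domains.
TOY_LEXICON = {
    "rock": {"rock.music": ["music"], "rock.stone": ["geology"]},
    "opera": {"opera.drama": ["music", "theatre"]},
    "genre": {"genre.music": ["music"], "genre.kind": ["art"]},
}


def prevalent_domain(terms, lexicon):
    """Return the domain that occurs most often across all synsets of all terms."""
    counts = Counter()
    for term in terms:
        # Extract all synsets for the term, then all domains for each synset.
        for domains in lexicon.get(term, {}).values():
            counts.update(domains)
    return counts.most_common(1)[0][0] if counts else None


def disambiguate(terms, lexicon):
    """For each term, pick the first synset that belongs to the prevalent domain."""
    domain = prevalent_domain(terms, lexicon)
    chosen = {}
    for term in terms:
        for synset, domains in lexicon.get(term, {}).items():
            if domain in domains:
                chosen[term] = synset
                break
    return domain, chosen


domain, chosen = disambiguate(["rock", "opera", "genre"], TOY_LEXICON)
print(domain, chosen)
```

Terms with no synset in the prevalent domain are simply left out of the result here; how the original system handles that case is not stated in the slides.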
27. A Multistrategy Similarity Measure: Considered Relationships. member meronymy: the latter synset is a member meronym of the former; substance meronymy: the latter synset is a substance meronym of the former; part meronymy: the latter synset is a part meronym of the former; similarity: the latter synset is similar in meaning to the former; antonym: specifies an antonymous word; attribute: defines the attribute relation between noun and adjective synset pairs in which the adjective is a value of the noun; additional information: additional information about the first word can be obtained by seeing the second word; part of speech based: specifies two different relations based on the parts of speech involved; participle: the adjective first word is a participle of the verb second word; hypernymy: the latter synset is a hypernym of the former.
39. obtaining the cluster most similar to the involved words
40. User Query Elaboration: Query Results. The best combination is used to obtain the list of clusters ranked by descending relevance, which can be used as the answer to the user's search. The results are then displayed to the user: specifically, the first n sets of documents are shown, where n is the minimum value that yields at least 10 results.
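The cutoff rule above (show the first n sets, with n the smallest value giving at least 10 results) can be sketched as follows. The function name `select_clusters` and the representation of a cluster as a list of document identifiers are assumptions made for illustration.

```python
def select_clusters(ranked_clusters, min_results=10):
    """Return the shortest prefix of the ranked clusters holding at least min_results documents.

    ranked_clusters: list of clusters, already sorted by descending relevance;
    each cluster is a list of document identifiers (hypothetical representation).
    If the whole list holds fewer than min_results documents, everything is returned.
    """
    shown, total = [], 0
    for cluster in ranked_clusters:
        if total >= min_results:
            break
        shown.append(cluster)
        total += len(cluster)
    return shown


# Example: clusters of sizes 4, 3, 5 and 1; the first three reach 12 >= 10 documents.
clusters = [
    ["d1", "d2", "d3", "d4"],
    ["d5", "d6", "d7"],
    ["d8", "d9", "d10", "d11", "d12"],
    ["d13"],
]
print(len(select_clusters(clusters)))  # 3
```

Because whole clusters are shown, the number of displayed results can exceed 10, which would be consistent with result lists of 11 to 15 entries.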
43. synset: 105943300; lemmas: doctrine, philosophical system, philosophy and school of thought; gloss: a belief (or system of beliefs) accepted as authoritative by some group or school;
48. synset: 107026352; lemmas: opera; gloss: a drama set to music, consists of singing with orchestral accompaniment and an orchestral overture and interludes;
49. synset: 107071942; lemmas: genre, music genre, musical genre and musical style; gloss: an expressive style of music;
50. A Preliminary Evaluation: The Quality of Clusters. synset: 107064715; lemmas: rock, rock ’n’ roll, rock and roll, rock music, rock’n’roll and rock-and-roll; gloss: a genre of popular music originating in the 1950s, a blend of black rhythm-and-blues with white country-and-western.
51. A Preliminary Evaluation: Synthesis of Outcomes
# | Query | Outcomes | Precision | Recall
1 | Ornaments and melodies | [1 to 9] music, [10 to 11] religion | 0.82 (1.0) | 0.43 (9/21)
2 | Reincarnation and eternal life | [1 to 9] religion, [10] science | 0.9 (1.0) | 0.39 (9/23)
3 | Traditions and folks | [1 to 4] music, [5 to 6] religion, [7 to 10] music | 0.8 (1.0) | 0.38 (8/21)
4 | Limits of theory of relativity | [1 to 2] science, [3] politics, [4 to 5] religion, [6 to 15] science | 0.8 | 0.44 (12/27)
5 | Capitalism vs communism | [1 to 3] politics, [4] science, [5 to 6] religion, [7 to 11] politics, [12] science, [13] music | 0.61 (0.77) | 0.53 (8/15)
6 | Markets and new economy | [1] politics, [2] music, [3] science, [4 to 8] politics, [9 to 10] religion | 0.6 (0.7) | 0.4 (6/15)
7 | Relationship between democracy and parliament | [1 to 3] politics, [4] science, [5 to 6] politics, [7 to 10] religion | 0.5 (0.6) | 0.33 (5/15)
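As a worked check of the first row, the sketch below recomputes precision and recall from the ranked outcomes. It assumes that the expected domain for query 1 is music and that the recall denominator (21) is the number of relevant documents in the repository, as the fraction 9/21 suggests. The meaning of the parenthesised precision values is not stated in the slides, so they are not reproduced.

```python
def precision_recall(ranked_labels, expected, relevant_total):
    """Precision = hits / retrieved; recall = hits / relevant documents in the repository."""
    hits = sum(1 for label in ranked_labels if label == expected)
    return hits / len(ranked_labels), hits / relevant_total


# Query 1: positions 1 to 9 are music, positions 10 to 11 are religion.
query1 = ["music"] * 9 + ["religion"] * 2
p, r = precision_recall(query1, "music", 21)
print(round(p, 2), round(r, 2))  # 0.82 0.43
```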
55. To explore other approaches for choosing better intensional descriptions of each document