(1) The document describes a semantic model for selective dissemination of information (SDI) in digital libraries using multi-agent and semantic web technologies. (2) Key components include a thesaurus of concepts, user profiles describing interests and preferences, and RSS feeds to generate personalized alerts of recommended documents. (3) The thesaurus is generated from document analysis and conceptualization using techniques like term frequency-inverse document frequency and WordNet to identify concepts and relations.
2. www.sti-innsbruck.at
Basic Idea
– Develop a multi-agent Selective Dissemination of Information (SDI)
platform capable of generating alerts and recommendations of
documents for users, according to their personal profiles
– Apply Semantic Web technologies to achieve more efficient
information management and improve agent-agent and user-agent
communication
SDI Components
• Thesaurus
– Enables organizing the most relevant concepts in a
specific domain, by defining semantic relations
between them.
• User profiles
– Structured representations that contain personal data,
interests and preferences of users.
• RSS feeds
– Used as “current awareness bulletins” to generate
personalized bibliographic alerts
• Recommendation log file
– Each document in the repository has an associated
log file that includes the listing of evaluations
assigned to that resource by different users
Thesaurus
The creation of a thesaurus includes four phases:
• Pre-processing of documents
– Prepare the documents for parameterization by removing the
elements regarded as superfluous, in 3 stages:
• Eliminate all the tags (HTML, XML, etc.)
• Standardize the words in the document, including
removing articles, determiners, auxiliary verbs,
conjunctions, prepositions, …
• Stem all the remaining terms using the WordNet
morphological processor (Morphy)
• Parameterizing the selected terms
– Final terms are quantified by assigning weights obtained by
applying the term frequency-inverse document frequency (tf-idf)
scheme
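The first two phases can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop-word list is a tiny assumed sample, the helper names `preprocess` and `tf_idf` are hypothetical, and the Morphy stemming step is left out so the sketch needs only the standard library (in practice it could be added with a WordNet interface such as NLTK's). The weighting used here is one common tf-idf variant, tf = count / document length and idf = log(N / df); the slides do not say which variant was used.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list (articles, auxiliaries,
# conjunctions, prepositions); a real system would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "been",
    "and", "or", "but", "of", "in", "on", "to", "for", "with", "by",
}


def preprocess(raw):
    """Strip markup tags, lowercase, tokenize, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", raw)          # remove HTML/XML tags
    tokens = re.findall(r"[a-z]+", text.lower())  # keep alphabetic tokens
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    preprocess("<p>The library is a digital library</p>"),
    preprocess("<b>Agents</b> and the library"),
]
weights = tf_idf(docs)
```

A term that occurs in every document (here "library") gets weight 0, while terms specific to one document ("digital", "agents") receive positive weights and are the natural candidates for thesaurus descriptors.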
Thesaurus
• Conceptualizing their lexical stems
– The meanings associated with each term (lemma) are extracted by
searching for it in WordNet, which returns a group of synsets
associated with each word (including hypernyms and hyponyms)
• Generating a lattice or graph that shows the relations between the
identified concepts
– Using formal concept analysis techniques to find relations
among the generated groups, where each node in the graph
represents a descriptor (namely a group of synonymous terms)
– Clustering documents by their terms (and synonyms),
including links to those with which they have any relation (hyponymy or
hypernymy)
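To make the lattice step more concrete, the following sketch enumerates the formal concepts of a small document/descriptor context. It is a naive illustration of formal concept analysis under stated assumptions: the function name `formal_concepts` and the sample context are hypothetical, and the brute-force enumeration over all subsets of documents is exponential, so it only suits toy inputs. The slides do not describe which algorithm the authors used.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate formal concepts of a context {object: set(attributes)}.

    A formal concept is a pair (extent, intent) where the intent is
    exactly the set of attributes shared by all objects in the extent,
    and the extent is exactly the set of objects having all attributes
    in the intent.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent(objs):
        # Attributes common to every object; all attributes if objs is empty.
        sets = [context[o] for o in objs]
        return set.intersection(*sets) if sets else set(all_attrs)

    def extent(attrs):
        # Objects that carry every attribute in attrs.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for size in range(len(objects) + 1):
        for objs in combinations(objects, size):
            shared = intent(objs)
            concepts.add((frozenset(extent(shared)), frozenset(shared)))
    return concepts


# Hypothetical context: documents (objects) and their descriptors (attributes).
context = {
    "doc1": {"library", "agent"},
    "doc2": {"library", "rss"},
    "doc3": {"library"},
}
concepts = formal_concepts(context)
```

Ordering the resulting concepts by inclusion of their extents gives the lattice: the concept with intent {"library"} covers all three documents and sits above the more specific concepts for "agent" and "rss".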
Once the thesaurus is obtained by identifying its terms and the
underlying relations between them, it is represented using the SKOS
vocabulary.
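As an illustration of what a SKOS representation could look like, the sketch below serializes a small set of descriptors to Turtle. The `skos:` terms (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) belong to the standard SKOS vocabulary; everything else is assumed for the example: the helper name `to_skos_turtle`, the `http://example.org/thesaurus/` base URI, the input dictionary layout, and the sample concepts. Labels are assumed to contain no quote characters, since no escaping is done.

```python
def to_skos_turtle(concepts, base="http://example.org/thesaurus/"):
    """Serialize {id: {"pref": str, "alt": [...], "broader": [...]}} as Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",  # hypothetical namespace for the thesaurus
        "",
    ]
    for concept_id, data in concepts.items():
        pref = data["pref"]
        props = ["a skos:Concept", f'skos:prefLabel "{pref}"@en']
        # Synonyms of the descriptor become alternative labels.
        props += [f'skos:altLabel "{alt}"@en' for alt in data.get("alt", [])]
        # Hypernym links become skos:broader relations.
        props += [f"skos:broader ex:{b}" for b in data.get("broader", [])]
        lines.append(f"ex:{concept_id} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


# Hypothetical descriptors: "car" (synonym "automobile") is narrower than "vehicle".
sample = {
    "vehicle": {"pref": "vehicle"},
    "car": {"pref": "car", "alt": ["automobile"], "broader": ["vehicle"]},
}
turtle = to_skos_turtle(sample)
```

Each descriptor (a group of synonymous terms) maps to one `skos:Concept`, with the hypernymy relation carried by `skos:broader`.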
User profiles
• Defined with the Friend of a Friend (FOAF) vocabulary (generated at
registration time)
– Containing personal data, interests and preferences of users
• 2 Parts:
– Public profile: data related to the user's identity and affiliation
– Private profile: user interests and preferences about the topic of
the alerts he or she wishes to receive
• Users must specify keywords and concepts that best define their
information needs
• These keywords are then compared with the concepts in the
thesaurus; if there is an exact match, the entered term is
returned, otherwise the lexically most similar term.
• The returned term is suggested to the user and added to their
preferences if it satisfies the user's expectations.
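The keyword-matching step above can be sketched as follows. The slides do not say how lexical similarity is measured, so this example assumes the ratio computed by Python's standard `difflib` module as a stand-in; the function name `suggest_term` and the sample thesaurus terms are hypothetical.

```python
import difflib


def suggest_term(keyword, thesaurus_terms):
    """Return the exact thesaurus match, else the lexically closest term.

    Returns None when the thesaurus is empty.
    """
    # Compare case-insensitively, but return the term as stored.
    normalized = {term.lower(): term for term in thesaurus_terms}
    key = keyword.strip().lower()
    if key in normalized:
        return normalized[key]
    # Assumption: difflib's similarity ratio stands in for "lexically most
    # similar"; cutoff=0.0 always yields the best available candidate.
    matches = difflib.get_close_matches(key, list(normalized), n=1, cutoff=0.0)
    return normalized[matches[0]] if matches else None


terms = ["digital library", "information retrieval", "semantic web"]
exact = suggest_term("Semantic Web", terms)
closest = suggest_term("semantic webs", terms)
```

In the workflow described above, the returned term would then be shown to the user and stored in the private profile only if the user accepts it.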
References
1. J. M. Morales-del-Castillo: Assistant Professor of Information Science, Library
and Information Science Department, University of Granada, Spain
2. R. Pedraza-Jiménez: Assistant Professor of Information Science, Journalism
and Audiovisual Communication Department, Pompeu Fabra University,
Barcelona, Spain
3. A. A. Ruíz: Full Professor of Information Science, Library and Information
Science Department, University of Granada.
4. E. Peis: Full Professor of Information Science, Library and Information
Science Department, University of Granada.
5. E. Herrera-Viedma: Senior Lecturer in Computer Science, Computer Science
and Artificial Intelligence Department, University of Granada.