1. IIR 2010 - First Italian Information Retrieval Workshop
Padova, 28 Jan 2010
An IR-based approach to tag recommendation
C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro
2. outline
• Background
• Web 2.0 and User-Generated Content
• Collaborative Tagging Systems
• Tag Recommendation
• STaR: Social Tag Recommender System
• Basic assumptions
• Architecture
• Experimental Evaluation
• Conclusions and future work
C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova 2
3. background
•What is a tag?
•Where do we use tags?
•Why do we use tags?
•Why do we need a tag recommender?
•How does a tag recommender work?
4. web 2.0
• Web sites are becoming
more and more social
• Web 2.0 platforms let users
publish self-produced
content
• users can post photos,
videos
• users can express opinions
(e.g. reviews)
• users can annotate
resources
5. social tagging
•Users annotate resources
of interest with free
keywords, called tags
• The act of
collaboratively
annotating resources
with tags produces
a lexical structure
called a folksonomy
6. folksonomies
• The act of collaboratively annotating resources with tags produces a lexical
structure called a folksonomy
• A folksonomy is a set of tags
• Usually represented with a Tag Cloud
• The more often the community uses a tag to describe a resource, the
more likely the tag is to faithfully describe the information
conveyed by the resource
7. social tagging systems
• Advantages
• Information organized in a way that closely follows the user's
mental model
• Effective retrieval, serendipitous browsing
• Disadvantages
• Tag space usually very noisy
• Polysemy, synonymy, level variation
8. social tagging systems
• These problems make it hard to fully exploit
the expressive power of folksonomies
• e.g. searching for the resources annotated with the
tag “Macbook” will exclude the resources
annotated with the tag “MacBookPro”
• Folksonomies can’t be exploited effectively for
retrieving and filtering resources
• Tag Recommenders are increasingly needed
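The “Macbook” example can be reproduced with a few lines of Python. This is only an illustrative sketch: the resource ids, the tag sets, and the `search_by_tag` helper are hypothetical and not part of STaR.

```python
# Hypothetical toy folksonomy: resource id -> set of free-text tags.
annotations = {
    "r1": {"Macbook", "apple"},
    "r2": {"MacBookPro", "laptop"},
    "r3": {"linux"},
}


def search_by_tag(annotations, tag):
    """Return the ids of resources annotated with exactly this tag string."""
    return sorted(rid for rid, tags in annotations.items() if tag in tags)


# Exact string matching misses "r2", although a user searching for
# "Macbook" would probably consider it relevant.
exact_hits = search_by_tag(annotations, "Macbook")
```

Exact matching on free keywords returns only `r1` here, which is the kind of loss a tag recommender tries to reduce by steering users toward consistent tags.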
9. tag recommenders: how do they work?
•A user posts a new resource on a platform
•e.g. a new bookmark on bibsonomy.org
•The resource is analyzed
•A set of (hopefully) relevant tags is produced and filtered
•The user freely chooses the most appropriate tags to annotate
the resource
10. STaR: Social Tag Recommender System
•Basic assumptions
• Resources with similar content should be annotated with
similar tags
•Improved retrieval techniques
• The user's previous tagging activity should be taken into
account
•Increasing the weight of tags already used to annotate
similar resources
11. STaR Architecture
12. STaR: indexing strategy
•Based on Apache Lucene engine
•A Personal Index for each user
•Information on her previously tagged resources
•A Social Index for the whole community
•Information about all the resources previously tagged by the
community
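The two-index layout can be pictured with a small in-memory stand-in. This sketch does not use Lucene; the `TagIndexes` class, its field names, and the sample posts are assumptions made for illustration only.

```python
from collections import defaultdict


class TagIndexes:
    """In-memory stand-in for the Social Index and the per-user Personal Indexes."""

    def __init__(self):
        self.social = []                    # every post from the whole community
        self.personal = defaultdict(list)   # user -> posts tagged by that user

    def add_post(self, user, text, tags):
        """Store a tagged resource in the Social Index and in the user's Personal Index."""
        post = {"user": user, "text": text, "tags": list(tags)}
        self.social.append(post)
        self.personal[user].append(post)


indexes = TagIndexes()
indexes.add_post("alice", "lucene search engine tutorial", ["lucene", "ir"])
indexes.add_post("bob", "okapi bm25 ranking function", ["bm25", "ir"])
indexes.add_post("alice", "tag recommendation survey", ["tagging"])
```

Every post is visible in the Social Index, while each Personal Index holds only its owner's posts, so the same query can be run against both views.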
13. STaR Architecture
14. STaR: retrieval of similar resources
•Given a resource to be tagged
•Both the Personal Index and the Social Index queried
•Lucene Scoring function replaced with the Okapi BM25
implementation
•State-of-the-art retrieval model
•Resources with similarity exceeding a certain threshold
retrieved
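The retrieval step can be sketched in plain Python. This is not the Lucene-based implementation used in STaR: it is a textbook Okapi BM25 scorer with a common non-negative IDF variant, and the parameter values (`k1=1.2`, `b=0.75`), the threshold, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_similar(query_tokens, docs_tokens, threshold):
    """Keep only (document position, score) pairs whose score exceeds the threshold."""
    scores = bm25_scores(query_tokens, docs_tokens)
    return [(i, s) for i, s in enumerate(scores) if s > threshold]


# Hypothetical already-tagged resources and a new resource to be tagged.
docs = [
    "okapi bm25 ranking function".split(),
    "tag recommendation for social bookmarking".split(),
    "cooking pasta recipes".split(),
]
query = "bm25 ranking".split()
scores = bm25_scores(query, docs)
hits = retrieve_similar(query, docs, threshold=0.5)
```

In this toy run only the first document shares terms with the query, so it is the only one that passes the threshold. In STaR the same query would be issued against both the Personal Index and the Social Index.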
15. STaR: Retrieval of Similar Resources
16. STaR Architecture
17. STaR: extraction of candidate tags
• Extraction of tags from the most similar resources retrieved in the
previous step
• Building a set of candidate tags
• Each tag is assigned a score by weighting the normalized occurrence
of the tag with the similarity score returned by Lucene
• Resources retrieved from the Personal Index and from the Social Index
can be given different weights
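One plausible reading of this scoring step is sketched below. The slides do not give the exact formula, so everything here is an assumption: a tag's score within one index is the sum of the similarity scores of the retrieved resources carrying it, divided by the number of resources retrieved from that index, and the two per-index scores are then mixed with the index weights. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def score_tags(retrieved):
    """retrieved: list of (similarity, tags) pairs coming from one index.

    Assumed scoring: similarity-weighted occurrence of each tag,
    normalized by the number of retrieved resources.
    """
    if not retrieved:
        return {}
    totals = defaultdict(float)
    for similarity, tags in retrieved:
        for tag in set(tags):
            totals[tag] += similarity
    return {tag: total / len(retrieved) for tag, total in totals.items()}


def candidate_tags(personal_hits, social_hits, personal_weight=0.5, social_weight=0.5):
    """Merge per-index tag scores using one weight per index, best tag first."""
    personal = score_tags(personal_hits)
    social = score_tags(social_hits)
    combined = {}
    for tag in set(personal) | set(social):
        combined[tag] = (personal_weight * personal.get(tag, 0.0)
                         + social_weight * social.get(tag, 0.0))
    # Sort by descending score, breaking ties alphabetically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results: (similarity score, tags of the retrieved resource).
personal_hits = [(0.9, ["ir", "lucene"]), (0.6, ["ir"])]
social_hits = [(0.8, ["search", "ir"]), (0.4, ["web"])]
ranked = candidate_tags(personal_hits, social_hits,
                        personal_weight=0.7, social_weight=0.3)
```

With a personal weight of 0.7 and a social weight of 0.3, tags the user has already applied to similar resources ("ir", "lucene") rank above tags seen only in the community's posts.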
18. STaR: Tag Extraction Process
19. experimental evaluation
• Goal
• To evaluate the accuracy of STaR using different Lucene scoring functions
(Experiment 1)
• Original vs. BM25
• To evaluate the best combination of weights for resources retrieved from
Personal Index and Social Index (Experiment 2)
• Dataset
• Gathered from Bibsonomy
• 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users
20. results of experiment 1
scoring   resource  precision  recall  f1
original  bookmark  25.26      29.67   27.29
bm25      bookmark  25.62      36.62   30.15
original  BibTeX    14.06      21.45   16.99
bm25      BibTeX    13.72      22.91   17.16
original  overall   16.43      23.58   19.37
bm25      overall   16.45      26.46   20.29
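The f1 column is consistent with the harmonic mean of precision and recall, which the short check below recomputes from the table. The `f1_score` helper and the `rows` layout are illustrative names, not part of the original evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (scoring, resource, precision, recall, reported f1), copied from the table.
rows = [
    ("original", "bookmark", 25.26, 29.67, 27.29),
    ("bm25", "bookmark", 25.62, 36.62, 30.15),
    ("original", "BibTeX", 14.06, 21.45, 16.99),
    ("bm25", "BibTeX", 13.72, 22.91, 17.16),
    ("original", "overall", 16.43, 23.58, 19.37),
    ("bm25", "overall", 16.45, 26.46, 20.29),
]

# Recompute f1 for each row, rounded to two decimals like the table.
recomputed = [round(f1_score(p, r), 2) for _, _, p, r, _ in rows]
```

Comparing `recomputed` with the reported column shows that BM25 leaves precision nearly unchanged and gains mainly on recall, which is what lifts f1.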
21. results of experiment 2
approach         social tag weight  personal tag weight  precision  recall  f1
community-based  1.0                0.0                  34.44      35.89   35.15
user-based       0.0                1.0                  44.73      40.53   42.53
hybrid_1         0.7                0.3                  32.31      38.57   35.16
hybrid_2         0.5                0.5                  32.36      37.55   34.76
hybrid_3         0.3                0.7                  35.47      39.68   37.46
baseline         -                  -                    42.03      13.23   20.13
22. ECML/PKDD Discovery Challenge 2009
•STaR participated in the ECML/PKDD 2009 Discovery Challenge
•The only Italian team
•Sixth place in the task of content-based tag
recommendation (more than 20 participants)
23. conclusions
• Users tend to reuse their own tags to annotate similar resources
• Integrating a more effective scoring function (BM25) improves the recommender's
accuracy
• Robust recommendation model
• Participation in the Discovery Challenge @ ECML-PKDD 09
• Future Work
• Tag extraction from textual content of resources
• Work in progress: 3% improvement in F1-measure on the ECML/PKDD 09 dataset
• Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy
24. http://www.di.uniba.it/~swap/
Thanks for your attention
Cataldo Musto
Ph.D. Student
University of Bari - “Aldo Moro”
Italy
cataldomusto@di.uniba.it