An IR-based approach to Tag Recommendation

IIR 2010 - First Italian Information Retrieval Workshop
Padova, 28 January 2010


An IR-based approach to tag recommendation
C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro

outline
• Background

• Web 2.0 and User-Generated Content

• Collaborative Tagging Systems

• Tag Recommendation

• STaR: Social Tag Recommender System

• Basic assumptions

• Architecture

• Experimental Evaluation

• Conclusions and future work

C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova

background
•What is a tag?

•Where do we use tags?

•Why do we use tags?

•Why do we need a tag recommender?

•How does a tag recommender work?


web 2.0
• Nowadays web sites tend to
be more and more social
• Web 2.0 platforms let users
publish self-produced
content
• users can post photos,
videos
• users can express opinions
(e.g. reviews)
• users can annotate
resources

social tagging
•Users annotate resources
of interest with free
keywords, called tags

• The act of
collaboratively
annotating resources
with tags produces
a lexical structure
called folksonomy


folksonomies
• The act of collaboratively annotating resources with tags produces a lexical
structure called folksonomy
• A folksonomy is a set of tags
• Usually represented with a Tag Cloud

• The more often the community uses a tag to describe a resource, the
more likely it is that the tag faithfully describes the information
conveyed by the resource
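
As a rough illustration of the frequency idea above, the sketch below counts how often each tag is attached to a resource, which is the quantity a tag cloud typically visualizes. The assignment triples and the `tag_counts` helper are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical tag assignments: (user, resource, tag) triples.
assignments = [
    ("alice", "lucene.apache.org", "search"),
    ("bob", "lucene.apache.org", "search"),
    ("carol", "lucene.apache.org", "java"),
    ("bob", "lucene.apache.org", "ir"),
    ("dave", "lucene.apache.org", "search"),
]

def tag_counts(assignments, resource):
    """Count how many times each tag was attached to one resource."""
    return Counter(tag for _, res, tag in assignments if res == resource)

counts = tag_counts(assignments, "lucene.apache.org")
# A tag cloud would render the most frequent tags in a larger font.
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

Here "search" is used by three of the four users, so under the assumption above it is the tag most likely to describe the resource well.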


social tagging systems
• Advantages

• Information organized in a way that closely follows the user's
mental model

• Effective retrieval, serendipitous browsing

• Disadvantages

• Tag space usually very noisy

• Polysemy, synonymy, level variation

social tagging systems
• These problems hinder the full exploitation of the
expressive power of folksonomies
• e.g. searching for the resources annotated with the
tag “Macbook” will exclude the resources
annotated with the tag “MacBookPro”
• Folksonomies can’t be exploited to retrieve and
filter resources effectively
• Tag recommenders are increasingly needed
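
The “Macbook” example can be reproduced with a few lines of Python. This is only a sketch of exact-match tag search; the resource names and the `search_by_tag` helper are made up for illustration.

```python
# Hypothetical annotations: resource -> set of free-text tags.
annotations = {
    "review-1": {"Macbook", "apple"},
    "review-2": {"MacBookPro", "laptop"},
}

def search_by_tag(annotations, tag):
    """Return resources annotated with exactly this tag string."""
    return sorted(r for r, tags in annotations.items() if tag in tags)

hits = search_by_tag(annotations, "Macbook")
print(hits)
```

Only `review-1` is returned: the resource tagged “MacBookPro” is missed even though it is clearly related, which is the kind of noise a tag recommender tries to reduce by steering users toward consistent tags.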

tag recommenders: how do they work?
•A user posts a new resource on a platform

•e.g. a new bookmark on bibsonomy.org

•The resource is analyzed

•A set of (hopefully) relevant tags is produced and filtered

•The user freely chooses the most appropriate tags to annotate
the resource


STaR: Social Tag Recommender System
•Basic assumptions

• Resources with similar content should be annotated with
similar tags

•Improved retrieval techniques

• The user's previous tagging activity should be taken into
account

•Increasing the weight of tags already used to annotate
similar resources

STaR Architecture

STaR: indexing strategy
•Based on Apache Lucene engine

•A Personal Index for each user

•Information on her previously tagged resources

•A Social Index for the whole community

•Information about all the resources previously tagged by the
community
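
The two kinds of index can be pictured with a small in-memory stand-in. This is a sketch only: STaR builds its indexes with Apache Lucene, while the `Indexes` class, its fields and the sample posts below are hypothetical names chosen for illustration.

```python
from collections import defaultdict

class Indexes:
    """Toy stand-in for the two kinds of index (this is not Lucene)."""

    def __init__(self):
        self.personal = defaultdict(list)  # user -> that user's posts
        self.social = []                   # posts from the whole community

    def add_post(self, user, text, tags):
        """Store a tagged resource in the user's index and the shared one."""
        post = {"user": user, "text": text, "tags": set(tags)}
        self.personal[user].append(post)
        self.social.append(post)

idx = Indexes()
idx.add_post("alice", "Okapi BM25 ranking function", ["ir", "ranking"])
idx.add_post("bob", "Apache Lucene in action", ["lucene", "search"])
print(len(idx.personal["alice"]), len(idx.social))
```

Every post lands in exactly one Personal Index and in the Social Index, so a later query can be run against either source separately.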


STaR Architecture

STaR: retrieval of similar resources
•Given a resource to be tagged

•Both the Personal Index and the Social Index queried

•Lucene Scoring function replaced with the Okapi BM25
implementation

•State-of-the-art retrieval model

•Resources with similarity exceeding a certain threshold
retrieved
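
To make the retrieval step more concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring followed by a threshold filter. It is not the Lucene-based implementation used in STaR: the parameter values `k1=1.2` and `b=0.75` are common defaults, the IDF formula is a commonly used non-negative variant, and the documents and `THRESHOLD` value are assumptions made for the example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "tag recommendation for social bookmarking".split(),
    "okapi bm25 ranking function".split(),
    "content based tag recommender".split(),
]
query = "tag recommendation".split()
scores = bm25_scores(query, docs)

THRESHOLD = 0.1  # hypothetical cut-off, not a value from the slides
similar = [i for i, s in enumerate(scores) if s > THRESHOLD]
print(similar)
```

The first document matches both query terms and scores highest, the third matches only "tag", and the second shares no term with the query and is dropped by the threshold.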


STaR: Retrieval of Similar Resources

STaR Architecture

STaR: extraction of candidate tags
• Extraction of tags from the most similar resources retrieved in the
previous step

• Building a set of candidate tags

• Each tag is assigned a score by weighting its normalized occurrence
by the similarity score returned by Lucene

• Different weights can be given to resources retrieved by querying the
Personal Index or the Social Index
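
The slides do not give the exact scoring formula, so the following is only one plausible reading of the step above: each tag accumulates the similarity of the retrieved resources that carry it, divided by the number of retrieved resources, and the scores from the two indexes are then combined with per-index weights. The function names, the sample similarities and the 0.3/0.7 weights are illustrative assumptions.

```python
from collections import defaultdict

def candidate_tags(retrieved):
    """retrieved: list of (similarity, tags) pairs coming from one index.

    Each tag accumulates the similarity of the resources carrying it,
    divided by the number of retrieved resources (normalized occurrence).
    """
    scores = defaultdict(float)
    for sim, tags in retrieved:
        for tag in tags:
            scores[tag] += sim / len(retrieved)
    return scores

def merge(social, personal, w_social=0.3, w_personal=0.7):
    """Weighted combination of the two candidate sets (assumed linear)."""
    tags = set(social) | set(personal)
    return {
        t: w_social * social.get(t, 0.0) + w_personal * personal.get(t, 0.0)
        for t in tags
    }

social_hits = [(0.8, {"ir", "search"}), (0.4, {"search", "web"})]
personal_hits = [(0.6, {"ir"})]

merged = merge(candidate_tags(social_hits), candidate_tags(personal_hits))
ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
print([tag for tag, _ in ranked])
```

With these weights the tag "ir" ranks first because the user has already used it on a similar resource, even though "search" is more frequent in the community results.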


STaR: Tag Extraction Process

experimental evaluation
• Goal

• To evaluate the accuracy of STaR using different Lucene scoring functions
(Experiment 1)

• Original vs. BM25

• To evaluate the best combination of weights for resources retrieved from
Personal Index and Social Index (Experiment 2)

• Dataset

• Gathered from Bibsonomy

• 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users


results of experiment 1
scoring    resource   precision   recall   f1
original   bookmark   25.26       29.67    27.29
bm25       bookmark   25.62       36.62    30.15
original   BibTeX     14.06       21.45    16.99
bm25       BibTeX     13.72       22.91    17.16
original   overall    16.43       23.58    19.37
bm25       overall    16.45       26.46    20.29
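
For readers unfamiliar with the three measures, the sketch below computes set-based precision, recall and F1 for a single post and then checks that one reported row is consistent with F1 being the harmonic mean of the reported precision and recall. The slides do not describe the evaluation protocol in detail, so the per-post computation and the sample tags are assumptions.

```python
def precision_recall_f1(recommended, actual):
    """Set-based precision, recall and F1 for one post."""
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical post: three recommended tags, four tags chosen by the user.
p, r, f = precision_recall_f1(
    ["ir", "search", "web"],
    ["ir", "search", "lucene", "java"],
)
print(round(p, 2), round(r, 2), round(f, 2))

# Consistency check on the bm25/bookmark row: P=25.62, R=36.62.
reported_f1 = 2 * 25.62 * 36.62 / (25.62 + 36.62)
print(round(reported_f1, 2))
```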


results of experiment 2
approach          social tag weight   personal tag weight   precision   recall   f1
community-based   1.0                 0.0                   34.44       35.89    35.15
user-based        0.0                 1.0                   44.73       40.53    42.53
hybrid_1          0.7                 0.3                   32.31       38.57    35.16
hybrid_2          0.5                 0.5                   32.36       37.55    34.76
hybrid_3          0.3                 0.7                   35.47       39.68    37.46
baseline          -                   -                     42.03       13.23    20.13


ECML/PKDD Discovery Challenge 2009

•STaR participated in the ECML/
PKDD 2009 Discovery Challenge

•The only Italian team

•Sixth place in the task of
content-based tag
recommendation (more than 20
participants)

conclusions
• Users tend to reuse their own tags to annotate similar resources

• The integration of a more effective scoring function (BM25) improves the recommender's
accuracy

• Robust recommendation model

• Participation in the Discovery Challenge @ECML-PKDD 09

• Future Work

• Tag extraction from textual content of resources

• Work in progress: 3% improvement in F1-measure on the ECML/PKDD 09 dataset

• Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy


http://www.di.uniba.it/~swap/

Thanks for your attention

Cataldo Musto
Ph.D. Student
University of Bari - “Aldo Moro”
Italy
cataldomusto@di.uniba.it
