An IR-based approach to tag recommendation
 


Presentation at IIR 10 - Padova - "An IR-based approach to tag recommendation"

    Presentation Transcript

    • IIR 2010 - First Italian Information Retrieval Workshop, Padova, 28 Jan 2010
      • An IR-based approach to tag recommendation
      • C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro
    • outline
      • Background
        • Web 2.0 and User-Generated Content
        • Collaborative Tagging Systems
        • Tag Recommendation
      • STaR: Social Tag Recommender System
        • Basic assumptions
        • Architecture
      • Experimental Evaluation
      • Conclusions and future work
      C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova
    • background
      • What is a tag?
      • Where do we use tags?
      • Why do we use tags?
      • Why do we need a tag recommender?
      • How does a tag recommender work?
    • web 2.0
      • Nowadays web sites tend to be more and more social
      • Web 2.0 platforms let users publish self-produced content
        • users can post photos, videos
        • users can express opinions (e.g. reviews)
        • users can annotate resources
    • social tagging
      • Users annotate resources of interest with free keywords, called tags
      • The act of collaboratively annotating resources with tags produces a lexical structure called a folksonomy
    • folksonomies
      • The act of collaboratively annotating resources with tags produces a lexical structure called a folksonomy
      • A folksonomy is a set of tags
      • Usually represented with a Tag Cloud
      • The more a tag is used by the community to describe a resource, the more likely it is to faithfully describe the information conveyed by the resource
    • social tagging systems
      • Advantages
        • Information organized in a way that closely follows the user's mental model
        • Effective retrieval, serendipitous browsing
      • Disadvantages
        • Tag space usually very noisy
        • Polysemy, synonymy, level variation
    • social tagging systems
      • These problems hinder full exploitation of the expressive power of folksonomies
        • e.g. searching for resources annotated with the tag “Macbook” will exclude resources annotated with the tag “MacBookPro”
      • Folksonomies can’t be exploited for retrieving and filtering resources in an effective way
      • Tag Recommenders are increasingly needed
    • tag recommenders: how do they work?
      • A user posts a new resource on a platform
        • e.g. a new bookmark on bibsonomy.org
      • The resource is analyzed
      • A set of (hopefully) relevant tags is produced and filtered
      • The user freely chooses the most appropriate tags to annotate the resource
    • STaR: Social Tag Recommender System
      • Basic assumptions
        • Resources with similar content should be annotated with similar tags
          • Improved retrieval techniques
        • The user's previous tagging activity should be taken into account
          • Increasing the weight of tags already used to annotate similar resources
    • STaR Architecture
    • STaR: indexing strategy
      • Based on the Apache Lucene engine
      • A Personal Index for each user
        • Information on her previously tagged resources
      • A Social Index for the whole community
        • Information about all the resources previously tagged by the community
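The two-index layout described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: STaR itself builds its indexes with Apache Lucene, whereas the `TaggedResource` and `TagIndexes` names below are hypothetical and use plain in-memory Python containers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedResource:
    """Hypothetical record for one tagged post (not STaR's actual schema)."""
    user: str    # who posted the resource
    text: str    # textual content used for similarity matching
    tags: tuple  # tags the user assigned to the resource


class TagIndexes:
    """In-memory stand-in for the Personal and Social indexes."""

    def __init__(self):
        self.social = []                   # every resource tagged by the community
        self.personal = defaultdict(list)  # user -> resources tagged by that user

    def add(self, resource):
        # Each post is visible in the community-wide index and in its owner's index.
        self.social.append(resource)
        self.personal[resource.user].append(resource)

    def personal_index(self, user):
        # Users with no tagging history simply get an empty index.
        return list(self.personal.get(user, []))
```

Keeping a separate per-user view makes it possible to query a user's own tagging history independently of the community's, which is what the later weighting step relies on.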
    • STaR Architecture
    • STaR: retrieval of similar resources
      • Given a resource to be tagged
        • Both the Personal Index and the Social Index are queried
      • Lucene scoring function replaced with the Okapi BM25 implementation
        • State-of-the-art retrieval model
      • Resources with similarity exceeding a certain threshold are retrieved
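As a rough illustration of the retrieval step, the sketch below scores tokenized documents against a query with the textbook Okapi BM25 formula and keeps those above a threshold. It is not STaR's Lucene-based implementation: the function names, the `k1` and `b` defaults, the non-negative IDF variant, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def retrieve_similar(query_tokens, docs_tokens, threshold):
    """Return (document position, score) pairs whose score exceeds the threshold."""
    scores = bm25_scores(query_tokens, docs_tokens)
    return [(i, s) for i, s in enumerate(scores) if s > threshold]
```

In this sketch the same function would be called twice, once on the user's Personal Index and once on the Social Index, so that the two result lists can be weighted separately afterwards.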
    • STaR: Retrieval of Similar Resources
    • STaR Architecture
    • STaR: extraction of candidate tags
      • Extraction of tags from the most similar resources retrieved in the previous step
      • Building a set of candidate tags
        • Each tag is assigned a score by weighting the normalized occurrence of the tag with the similarity score returned by Lucene
        • Possibly different weights for resources retrieved by querying the Personal Index or the Social Index
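The slide does not give the exact scoring formula, so the following is one possible reading, offered as a sketch rather than as the authors' method. It assumes each tag's score within an index is the sum of the similarity scores of the retrieved resources carrying that tag, divided by the number of retrieved resources, and that the two indexes are then combined linearly. The names `score_tags` and `merge_candidates`, the default weights, and `top_k` are hypothetical.

```python
from collections import defaultdict


def score_tags(retrieved):
    """Score tags from one index.

    `retrieved` is a list of (similarity, tags) pairs for the similar resources.
    Assumed formula: sum of similarities of resources carrying the tag,
    normalized by the number of retrieved resources.
    """
    if not retrieved:
        return {}
    totals = defaultdict(float)
    for similarity, tags in retrieved:
        for tag in set(tags):  # count each tag once per resource
            totals[tag] += similarity
    n = len(retrieved)
    return {tag: total / n for tag, total in totals.items()}


def merge_candidates(social, personal, w_social=0.5, w_personal=0.5, top_k=5):
    """Combine Social and Personal index scores with per-index weights."""
    combined = defaultdict(float)
    for tag, score in score_tags(social).items():
        combined[tag] += w_social * score
    for tag, score in score_tags(personal).items():
        combined[tag] += w_personal * score
    # Highest score first; ties broken alphabetically for a stable ranking.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Setting `w_social=1.0, w_personal=0.0` or the reverse gives purely community-based or purely user-based recommendations, while intermediate values give hybrid configurations.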
    • STaR: Tag Extraction Process
    • experimental evaluation
      • Goal
        • To evaluate the accuracy of STaR using different Lucene scoring functions (Experiment 1)
          • Original vs. BM25
        • To evaluate the best combination of weights for resources retrieved from the Personal Index and the Social Index (Experiment 2)
      • Dataset
        • Gathered from Bibsonomy
        • 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users
    • results of experiment 1
      scoring    resource   precision   recall   f1
      original   bookmark   25.26       29.67    27.29
      bm25       bookmark   25.62       36.62    30.15
      original   BibTeX     14.06       21.45    16.99
      bm25       BibTeX     13.72       22.91    17.16
      original   overall    16.43       23.58    19.37
      bm25       overall    16.45       26.46    20.29
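The f1 column in these results is consistent with the standard definition of F1 as the harmonic mean of precision and recall. The small helper below, whose name `f1_score` is chosen here for illustration, recomputes it from the reported precision and recall values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the bookmark rows of experiment 1.
original_bookmark = f1_score(25.26, 29.67)
bm25_bookmark = f1_score(25.62, 36.62)
print(f"original: {original_bookmark:.2f}, bm25: {bm25_bookmark:.2f}")
```

Because F1 is a harmonic mean, the gain in recall obtained with BM25 on bookmarks raises f1 even though precision is almost unchanged.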
    • results of experiment 2
      approach          social tag weight   personal tag weight   precision   recall   f1
      community-based   1.0                 0.0                   34.44       35.89    35.15
      user-based        0.0                 1.0                   44.73       40.53    42.53
      hybrid_1          0.7                 0.3                   32.31       38.57    35.16
      hybrid_2          0.5                 0.5                   32.36       37.55    34.76
      hybrid_3          0.3                 0.7                   35.47       39.68    37.46
      baseline          -                   -                     42.03       13.23    20.13
    • ECML/PKDD Discovery Challenge 2009
      • STaR participated in the ECML/PKDD 2009 Discovery Challenge
      • The only Italian team
      • Sixth place in the task of content-based tag recommendation (more than 20 participants)
      • Figure callout: “We are there”
    • conclusions
      • Users tend to reuse their own tags to annotate similar resources
      • The integration of a more effective scoring function (BM25) improves the recommender accuracy
      • Robust recommendation model
        • Participation in the Discovery Challenge @ECML-PKDD 09
      • Future Work
        • Tag extraction from textual content of resources
          • Work in progress: 3% improvement in f1-measure on the ECML/PKDD 09 dataset
        • Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy
    • Thanks for your attention
      • Cataldo Musto, Ph.D. Student, University of Bari “Aldo Moro”, Italy
      • cataldomusto@di.uniba.it
      • http://www.di.uniba.it/~swap/