IIR 2010 - First Italian Information Retrieval Workshop
                         Padova, 28 Jan 2010




                                                   Semantic Web Access and
                                                   Personalization Research Group
                                                   http://www.di.uniba.it/~swap




An IR-based approach to tag recommendation
C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro
outline
    • Background

         • Web 2.0 and User-Generated Content

         • Collaborative Tagging Systems

         • Tag Recommendation

    • STaR: Social Tag Recommender System

         • Basic assumptions

         • Architecture

    • Experimental Evaluation

    • Conclusions and future work


C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova   2
background
    •What is a tag?

    •Where do we use tags?

    •Why do we use tags?

    •Why do we need a tag recommender?

    •How does a tag recommender work?



web 2.0
       • Nowadays web sites tend to
             be more and more social
       • Web 2.0 platforms let users
             publish self-produced
             content
            • users can post photos,
                  videos
            • users can express opinions
                  (e.g. reviews)
            • users can annotate
                  resources
social tagging
    •Users annotate resources
     of interest with free
     keywords, called tags

       • The act of
             collaboratively
             annotating resources
             with tags produces
             a lexical structure
             called a folksonomy

folksonomies
        •    The act of collaboratively annotating resources with tags produces a lexical
             structure called a folksonomy
            •     A folksonomy is a set of tags
            •     Usually represented with a Tag Cloud




            •     The more a tag is used by the community to describe a resource, the
                  more likely it is to faithfully describe the information
                  conveyed by the resource

social tagging systems
    • Advantages

         • Information organized in a way that closely follows the user's
           mental model

         • Effective retrieval, serendipitous browsing

    • Disadvantages

         • Tag space usually very noisy

         • Polysemy, synonymy, level variation
social tagging systems
     • These problems prevent the expressive power of
           folksonomies from being fully exploited
        • e.g. searching for the resources annotated with the
              tag “Macbook” will exclude the resources
              annotated with the tag “MacBookPro”
     • As a result, folksonomies cannot be exploited effectively
           for retrieving and filtering resources
        • Tag recommenders are increasingly needed
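
The “Macbook” example can be reproduced with a few lines of Python. This is only an illustrative sketch: the resource ids, the tag sets and the `search_by_tag` helper are hypothetical and do not come from the slides. It shows why exact matching on free-form tags misses resources annotated with a close variant.

```python
# Hypothetical toy data: resource id -> set of free-form tags chosen by users.
annotations = {
    "r1": {"Macbook", "apple"},
    "r2": {"MacBookPro", "laptop"},
    "r3": {"macbook", "review"},
}


def search_by_tag(tag, annotations):
    """Return the ids of resources annotated with exactly this tag string."""
    return sorted(rid for rid, tags in annotations.items() if tag in tags)


# Only r1 matches: "MacBookPro" and "macbook" are different strings.
print(search_by_tag("Macbook", annotations))  # ['r1']
```

A recommender that steers users toward tags already in use reduces the number of such variants entering the tag space.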
tag recommenders: how do they work?
    •A user posts a new resource on a platform

         •e.g. a new bookmark on bibsonomy.org

    •The resource is analyzed

    •A set of (hopefully) relevant tags is produced and filtered

    •The user freely chooses the most appropriate tags to annotate
     the resource


STaR: Social Tag Recommender System
    •Basic assumptions

         • Resources with similar content should be annotated with
           similar tags

             •Improved retrieval techniques

         • The user's previous tagging activity should be taken into
           account

             •Increasing the weight of tags already used to annotate
              similar resources
STaR Architecture
STaR: indexing strategy
    •Based on Apache Lucene engine

    •A Personal Index for each user

         •Information on her previously tagged resources

    •A Social Index for the whole community

         •Information about all the resources previously tagged by the
          community
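
The two-index layout can be sketched in plain Python. The slides state that STaR builds its indexes with Apache Lucene; the `TinyIndex` class, the `index_post` helper and the sample posts below are hypothetical stand-ins used only to illustrate the idea, not Lucene's API or the authors' code.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory index; a stand-in for a Lucene index, not Lucene's API."""

    def __init__(self):
        self.docs = {}                    # post id -> (tokens, tags)
        self.postings = defaultdict(set)  # term -> ids of posts containing it

    def add(self, post_id, text, tags):
        tokens = text.lower().split()
        self.docs[post_id] = (tokens, set(tags))
        for token in tokens:
            self.postings[token].add(post_id)


social_index = TinyIndex()                 # one index for the whole community
personal_indexes = defaultdict(TinyIndex)  # one index per user


def index_post(user, post_id, text, tags):
    """Store a tagged resource in the user's Personal Index and in the Social Index."""
    personal_indexes[user].add(post_id, text, tags)
    social_index.add(post_id, text, tags)


index_post("alice", "p1", "Lucene in Action", ["lucene", "search"])
index_post("bob", "p2", "Okapi BM25 ranking function", ["ir", "bm25"])

print(sorted(social_index.docs))               # ['p1', 'p2']
print(sorted(personal_indexes["alice"].docs))  # ['p1']
```

Every post is written twice, so a query can later be run against the user's own history and against the community's history separately.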


STaR Architecture
STaR: retrieval of similar resources
    •Given a resource to be tagged

         •Both the Personal Index and the Social Index queried

         •Lucene Scoring function replaced with the Okapi BM25
          implementation

             •State-of-the-art retrieval model

         •Resources with similarity exceeding a certain threshold
          retrieved
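
As a rough illustration of this step, the sketch below scores a small collection with one common formulation of Okapi BM25 and keeps the resources above a threshold. It is written in plain Python rather than against Lucene. The parameter values `k1=1.2` and `b=0.75`, the IDF variant, the threshold and the sample documents are all assumptions, since the slides give neither the parameters nor the threshold used by STaR.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (id -> token list) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, one of several in use).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturated by k1 and normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def similar_resources(query_tokens, docs, threshold):
    """Keep only the documents whose BM25 score exceeds the threshold."""
    scores = bm25_scores(query_tokens, docs)
    return {doc_id: s for doc_id, s in scores.items() if s > threshold}


docs = {
    "r1": "tag recommendation for social bookmarking".split(),
    "r2": "okapi bm25 ranking function".split(),
    "r3": "content based tag recommender".split(),
}
hits = similar_resources("tag recommendation".split(), docs, threshold=1.0)
print(sorted(hits))  # ['r1']
```

In STaR the same query would be issued against both the Personal Index and the Social Index, giving two lists of similar resources.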

STaR: Retrieval of Similar Resources
STaR Architecture
STaR: extraction of candidate tags
    • Extraction of tags from the most similar resources retrieved in the
      previous step

    • Building a set of candidate tags

    • Each tag is assigned a score by weighting the normalized occurrence
      of the tag with the similarity score returned by Lucene




         • Different weights can be given to resources retrieved by querying the
           Personal Index or the Social Index
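
The exact scoring formula is not recoverable from the slide text, so the sketch below is only one plausible reading of it. Each candidate tag receives the sum of the similarity scores of the retrieved resources carrying it, divided by the number of retrieved resources, and the scores from the two indexes are then mixed with two weights. The function names, the sample hits and the default weights (0.7 personal, 0.3 social) are assumptions made for illustration.

```python
from collections import defaultdict


def tag_scores(hits):
    """hits: list of (similarity, tags) pairs for the retrieved resources.

    Returns tag -> similarity-weighted occurrence, normalized by the number of hits.
    """
    if not hits:
        return {}
    totals = defaultdict(float)
    for similarity, tags in hits:
        for tag in set(tags):
            totals[tag] += similarity
    return {tag: total / len(hits) for tag, total in totals.items()}


def combine(personal_hits, social_hits, w_personal=0.7, w_social=0.3):
    """Mix the tag scores obtained from the Personal Index and the Social Index."""
    personal = tag_scores(personal_hits)
    social = tag_scores(social_hits)
    return {
        tag: w_personal * personal.get(tag, 0.0) + w_social * social.get(tag, 0.0)
        for tag in set(personal) | set(social)
    }


personal_hits = [(2.0, ["ir", "lucene"]), (1.0, ["ir"])]
social_hits = [(4.0, ["search", "ir"])]
ranked = sorted(combine(personal_hits, social_hits).items(), key=lambda kv: -kv[1])
print([tag for tag, _ in ranked])  # ['ir', 'search', 'lucene']
```

Setting one weight to 0 and the other to 1 gives a purely user-based or purely community-based recommender; intermediate values give hybrid configurations.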


STaR: Tag Extraction Process
experimental evaluation
    • Goal

         • To evaluate the accuracy of STaR using different Lucene scoring functions
           (Experiment 1)

             • Original vs. BM25

         • To evaluate the best combination of weights for resources retrieved from
           Personal Index and Social Index (Experiment 2)

    • Dataset

         • Gathered from Bibsonomy

         • 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users

results of experiment 1
      scoring     resource    precision    recall    f1
      original    bookmark    25.26        29.67     27.29
      bm25        bookmark    25.62        36.62     30.15
      original    BibTeX      14.06        21.45     16.99
      bm25        BibTeX      13.72        22.91     17.16
      original    overall     16.43        23.58     19.37
      bm25        overall     16.45        26.46     20.29
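
The f1 column can be cross-checked against precision and recall. The sketch below assumes that f1 is the usual harmonic mean of the two, which the slides do not state explicitly; the figures are copied from the table of Experiment 1.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (scoring, resource): (precision, recall, reported f1), from Experiment 1.
rows = {
    ("original", "bookmark"): (25.26, 29.67, 27.29),
    ("bm25", "bookmark"): (25.62, 36.62, 30.15),
    ("original", "BibTeX"): (14.06, 21.45, 16.99),
    ("bm25", "BibTeX"): (13.72, 22.91, 17.16),
    ("original", "overall"): (16.43, 23.58, 19.37),
    ("bm25", "overall"): (16.45, 26.46, 20.29),
}

for (scoring, resource), (p, r, reported) in rows.items():
    print(f"{scoring:8} {resource:8} computed={f1(p, r):.2f} reported={reported:.2f}")
```

Under that assumption every reported f1 value matches the recomputed one to two decimals, and the gain of BM25 over the original scoring comes almost entirely from recall.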

results of experiment 2
    approach           social tag weight    personal tag weight    precision    recall    f1
    community-based    1.0                  0.0                    34.44        35.89     35.15
    user-based         0.0                  1.0                    44.73        40.53     42.53
    hybrid_1           0.7                  0.3                    32.31        38.57     35.16
    hybrid_2           0.5                  0.5                    32.36        37.55     34.76
    hybrid_3           0.3                  0.7                    35.47        39.68     37.46
    baseline           -                    -                      42.03        13.23     20.13

ECML/PKDD Discovery Challenge 2009



   •STaR participated in the ECML/PKDD 2009 Discovery Challenge

   •The only Italian team

   •Sixth place in the task of content-based tag recommendation
    (more than 20 participants)
conclusions
    • Users tend to reuse their own tags to annotate similar resources

    • The integration of a more effective scoring function (BM25) improves the recommender
      accuracy

    • Robust recommendation model

         • Participation in the Discovery Challenge @ECML-PKDD 09

    • Future Work

         • Tag extraction from textual content of resources

             • Work in progress: 3% improvement in F1-measure on the ECML/PKDD 09 dataset

         • Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy


http://www.di.uniba.it/~swap/

                   Thanks for your attention


     Cataldo Musto
      Ph.D. Student
University of Bari - “Aldo Moro”
              Italy
cataldomusto@di.uniba.it
