Scientific Recommender Systems

             Jan Petertonkoker


            January 12th, 2012




   Scientific Recommender Systems   1
Contents


  1. Motivation (Examples)
  2. Recommender Systems
  3. Categories of Recommender Systems
     3.1 Content-based Recommender: TF-IDF
     3.2 Collaborative Recommender: Apache Mahout
     3.3 Hybrid Recommender: SciPlore
  4. Visualizations (Prototype)
  5. Conclusion




Motivation




                       Example: Amazon




Motivation




                       Example: Twitter



Recommender Systems



                              $u : C \times S \to R$

    C - set of all users
    S - set of all items
    R - totally ordered set, which describes the usefulness of the
    items to the respective user
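
As a minimal sketch of this definition, the utility function can be treated as a lookup from (user, item) pairs to values in R, and recommending means returning the items with the highest utility. The user name, item names, rating values and the helper `top_n` are hypothetical and only serve as illustration.

```python
from typing import Callable, Hashable, Iterable


def top_n(user: Hashable, items: Iterable[Hashable],
          u: Callable[[Hashable, Hashable], float], n: int = 3) -> list:
    """Return the n items s with the highest utility u(user, s)."""
    return sorted(items, key=lambda s: u(user, s), reverse=True)[:n]


# Hypothetical known utilities on a 1-5 scale (R = real numbers).
known = {
    ("alice", "paper_a"): 4.0,
    ("alice", "paper_b"): 2.0,
    ("alice", "paper_c"): 5.0,
}


def u(c, s):
    # Unknown pairs default to 0.0 here (an assumption of this sketch);
    # an actual recommender has to estimate them.
    return known.get((c, s), 0.0)


best = top_n("alice", ["paper_a", "paper_b", "paper_c"], u, n=2)
print(best)
```

In practice u is only known for the items a user has already rated, so the central task is estimating the missing values.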




Categories of Recommender Systems



    content-based: items are recommended that are similar to
    items the user liked in the past
    collaborative: items are recommended that were liked by people
    similar to the user (similar taste/preferences)
    hybrid: a combination of content-based and collaborative
    recommendation approaches




Content-based Recommender Systems


    utility u(c, s) of an item s for user c is estimated from the
    utilities u(c, s_i) that user c already assigned to items
    s_i ∈ S that are similar to item s
    similarity between items is calculated according to their
    attributes
    user and item profiles
    common problems
         limited content analysis
         overspecialization
         new user problem
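
One possible way to turn this into code is a similarity-weighted average of the user's own ratings, sketched below. The attribute vectors, the cosine measure and the weighting scheme are assumptions of this example; the slides do not prescribe a particular estimator.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equally long attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def estimate_content_based(target, rated, profiles):
    """Estimate u(c, target) from the user's own ratings u(c, s_i),
    weighted by the attribute similarity between target and s_i."""
    num = den = 0.0
    for item, rating in rated.items():
        sim = cosine(profiles[target], profiles[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else 0.0


# Hypothetical item profiles (attribute vectors) and ratings of one user.
profiles = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s": [1.0, 0.0]}
rated = {"s1": 5.0, "s2": 1.0}
estimate = estimate_content_based("s", rated, profiles)
print(estimate)
```

The sketch also makes the new user problem visible: with an empty `rated` dictionary there is nothing to average over.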




Content-based Recommender: TF-IDF
     N - total number of documents in the system
     keyword k_i appears in n_i of the documents
     f_{i,j} denotes the number of times a certain keyword k_i
     appears in a document d_j
 Term Frequency
     $TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}$
     maximum in the denominator calculated over the frequencies
     of all keywords k_z that appear in document d_j
 Inverse Document Frequency
     for a keyword k_i: $IDF_i = \log \frac{N}{n_i}$
 TF-IDF
     $w_{i,j} = TF_{i,j} \times IDF_i$
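
The three formulas can be sketched in a few lines of Python. The function name `tf_idf`, the tokenized toy documents and the choice of the natural logarithm are assumptions of this example; the slide does not state the base of the logarithm.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {keyword: w_ij} dict per document."""
    n_docs = len(docs)  # N
    # n_i: number of documents in which keyword k_i appears
    doc_freq = Counter(k for doc in docs for k in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # f_ij for every keyword in document d_j
        if not counts:
            weights.append({})
            continue
        max_f = max(counts.values())  # max_z f_zj
        weights.append({
            k: (f / max_f) * math.log(n_docs / doc_freq[k])
            for k, f in counts.items()
        })
    return weights


# Hypothetical toy corpus of two documents.
docs = [["recommender", "systems", "recommender"], ["systems", "survey"]]
w = tf_idf(docs)
print(w)
```

A keyword that occurs in every document ("systems" here) gets IDF = log(1) = 0 and therefore weight 0, which is the intended effect: it does not help to tell documents apart.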
Collaborative Recommender Systems



    utility u(c, s) of an item s for user c is estimated from the
    utilities u(c_i, s) assigned to s by users c_i ∈ C that are
    similar to user c
    common problems
         new user/item problem
         cold start
         sparsity
         scalability
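
A minimal sketch of this estimate, assuming a similarity-weighted average over the other users who rated the item. The user names, ratings and precomputed similarities are hypothetical, and the aggregation is one common choice rather than the only one.

```python
def estimate_collaborative(user, item, ratings, similarity):
    """Estimate u(user, item) from the ratings u(c_i, item) of other users,
    weighted by their similarity to user."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = similarity(user, other)
        num += sim * their[item]
        den += abs(sim)
    # None: nobody comparable rated the item (sparsity / cold start).
    return num / den if den else None


# Hypothetical ratings {user: {item: rating}} and user similarities.
ratings = {
    "c": {"a": 5.0},
    "c1": {"a": 5.0, "s": 4.0},
    "c2": {"a": 1.0, "s": 2.0},
}
sims = {("c", "c1"): 0.9, ("c", "c2"): 0.1}
estimate = estimate_collaborative("c", "s", ratings, lambda a, b: sims[(a, b)])
print(estimate)
```

The `None` branch shows where the listed problems surface: a new item without ratings cannot be estimated at all.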




Collaborative Recommender: Apache Mahout (1)



    provides a "toolbox" to create collaborative recommender
    systems
    input
        user (long), item (long), preference (double)
        1, 111, 2.5
    data model
        input from different file formats, database
        increase performance with specific data structures
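
To illustrate the input format, the following sketch parses "user, item, preference" rows into a nested dictionary. It is plain Python and not Mahout's own Java API; the function name `load_preferences` and all rows except the first are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_preferences(lines):
    """Parse 'user,item,preference' rows into {user_id: {item_id: preference}}."""
    prefs = defaultdict(dict)
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        user, item, value = row[:3]
        prefs[int(user)][int(item)] = float(value)
    return dict(prefs)


# First row taken from the slide, the others are made up.
sample = io.StringIO("1, 111, 2.5\n1, 112, 4.0\n2, 111, 3.0\n")
prefs = load_preferences(sample)
print(prefs)
```

Keying by integer IDs mirrors the long-typed user and item columns of the input.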




Collaborative Recommender: Apache Mahout (2)
    user-based recommender




    item-based recommender
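
A rough sketch of the item-based variant, again in plain Python rather than Mahout's Java API: item similarity is computed from the rating columns (not from item attributes), and the estimate averages the user's own ratings of similar items. All names and ratings are hypothetical.

```python
import math


def item_vector(item, ratings):
    """Ratings of one item across all users, as {user: rating}."""
    return {u: r[item] for u, r in ratings.items() if item in r}


def item_similarity(a, b, ratings):
    """Cosine similarity over the users who rated both items."""
    va, vb = item_vector(a, ratings), item_vector(b, ratings)
    common = va.keys() & vb.keys()
    dot = sum(va[u] * vb[u] for u in common)
    na = math.sqrt(sum(va[u] ** 2 for u in common))
    nb = math.sqrt(sum(vb[u] ** 2 for u in common))
    return dot / (na * nb) if na and nb else 0.0


def estimate_item_based(user, item, ratings):
    """Similarity-weighted average of the user's ratings of other items."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = item_similarity(item, other, ratings)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# Hypothetical ratings {user: {item: rating}}.
ratings = {
    "u1": {"x": 4.0, "y": 4.0},
    "u2": {"x": 2.0, "y": 2.0},
    "u3": {"x": 5.0},
}
estimate = estimate_item_based("u3", "y", ratings)
print(estimate)
```

The user-based variant swaps the roles: it compares rows (users) instead of columns (items).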




Collaborative Recommender: Apache Mahout (3)


    similarity measures
        Pearson correlation (equals cosine similarity on mean-centered data)
        Euclidean distance
        Spearman correlation
        log-likelihood
        ...
    slope-one recommender
    other experimental recommender implementations
        e.g. cluster-based
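
Some of these measures can be sketched in plain Python as follows. This is not Mahout's implementation; in particular, mapping the Euclidean distance d to a similarity via 1 / (1 + d) is an assumption of this example.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def euclidean_similarity(x, y):
    """Turn the Euclidean distance d into a similarity in (0, 1]."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


# Hypothetical ratings of two users for the same three items.
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(pearson(x, y), cosine(x, y), euclidean_similarity(x, y))
```

Pearson correlation ignores differences in rating scale between two users, whereas the Euclidean measure penalizes them.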




Hybrid Recommender Systems


    combination of content-based and collaborative methods
        separate content-based and collaborative recommender
        systems; their results are combined afterwards, e.g. by a
        linear combination or a voting scheme
        collaborative recommender system with some added aspects of
        content-based methods
        content-based recommender system with some added aspects
        of collaborative methods
        a single recommender system which unifies content-based and
        collaborative methods from the beginning
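
The first variant (two separate recommenders whose results are combined) can be sketched as a weighted sum of both estimates. The function name, the weight `alpha` and the fallback behavior are assumptions of this example.

```python
def weighted_hybrid(content_score, collaborative_score, alpha=0.5):
    """Linear combination of both estimates.

    alpha is the weight of the content-based score; if one recommender
    cannot produce an estimate (None), the other one is used alone.
    """
    if content_score is None:
        return collaborative_score
    if collaborative_score is None:
        return content_score
    return alpha * content_score + (1.0 - alpha) * collaborative_score


print(weighted_hybrid(4.0, 2.0, alpha=0.75))
print(weighted_hybrid(4.0, None))
```

The fallback is one reason hybrids are attractive: a new item without ratings can still be scored by the content-based part.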




Hybrid Recommender: SciPlore




                       SciPlore Overview


Visualizations (Prototype)



     several recommenders based on given database
     visualizations for explaining recommendations




                  Live Presentation


Conclusion


Summary



   utility function
   categories of recommender systems
        content-based
        collaborative
        hybrid
   implementation with Apache Mahout
   possible visualizations








      Questions?




References

    Apache Mahout: Scalable machine learning and data mining.
    http://mahout.apache.org/ - accessed on 6th January 2012
    SciPlore: Exploring Science. http://www.sciplore.org -
    accessed on 6th January 2012
    G Adomavicius and A Tuzhilin. Toward the next generation of
    recommender systems: a survey of the state-of-the-art and
    possible extensions. IEEE Transactions on Knowledge and
    Data Engineering, 17(6):734-749, 2005
    B Gipp, J Beel and C Hentschel. Scienstein: A research paper
    recommender system, volume 301, pages 309-315. IEEE, 2009
    Sean Owen, Robin Anil, Ted Dunning and Ellen Friedman.
    Mahout in Action, 2011
