Mahout becomes a researcher

Kris Jack, PhD
Senior Data Mining Engineer
Overview
➔ What's Mendeley?
➔ Applications of Mahout's Recommender
➔ Under Mahout's Bonnet
➔ Mahout's Research Career so Far
➔ Conclusions
What's Mendeley?
➔ Mendeley is a data platform for researchers
    ➔ We're bringing together researchers and the research that they produce from all over the world
    ➔ We're structuring this data in a machine-readable format
    ➔ We're opening this data up for you to build applications on top of it using our API
    ➔ These applications help researchers to do even better research and become more productive
➔ How are we building our community?
Mendeley provides tools to help users...

...organise their research
    ➔ Reference management
    ➔ Cite-as-you-write
    ➔ Full-text article search
    ➔ Digitalised annotations
...collaborate with one another
    ➔ Research network
    ➔ Professional research groups
...discover new research
    ➔ Mendeley Suggest
    ➔ Personalised article recommendations
    ➔ Weekly batch of 10 recommended articles
    ➔ Collaborative Filtering
    ➔ The more data, the better
1.5 million+ users; the 20 largest user bases:
    University of Cambridge
    Stanford University
    MIT
    University of Michigan
    Harvard University
    University of Oxford
    Sao Paulo University
    Imperial College London
    University of Edinburgh
    Cornell University
    University of California at Berkeley
    RWTH Aachen
    Columbia University
    Georgia Tech
    University of Wisconsin
    UC San Diego
    University of California at LA
    University of Florida
    University of North Carolina

50m research articles
We need a recommender that scales up, coping with our data and future growth
Applications of Mahout's Recommender
Mahout use cases:
➔ Retrieve related items in large collections
    http://www.slideshare.net/kryton/the-data-layer
➔ Discover relevant items that you may have overlooked
    http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
➔ Find love!
    ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm
    http://www.speeddate.com/apps/site/views/mp/technology.php
➔ Mendeley Suggest
    ➔ Discover new research
    ➔ Fill in gaps in your library
    ➔ Your personal advisor
    http://krisjack.blogspot.co.uk/2012/02/your-very-own-personalised-research.html
Under Mahout's Bonnet
Generating recommendations through matrix multiplication
    These are item-based recommendations, as similarity is computed between items, not users

Not convinced? Try reading these...
    Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. Piscataway, NJ, USA.
    http://www.slideshare.net/srowen/collaborative-filtering-at-scale-2
    http://krisjack.blogspot.co.uk/2012/04/under-bonnet-of-mahouts-item-based.html
Input (all user preferences): a Research Articles x Researchers matrix
    Researchers: Turing, Babbage, Einstein, Newton
    Research Articles: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2
    At Mendeley's scale: 1.5M researchers, 50M research articles, 300M preferences
item.RecommenderJob
  1. Prep. pref. matrix (steps 1-3)
  2. Gen. sim. matrix (steps 4-6)
  3. Multiply matrices (steps 7-10)

All User Preferences (item x user): research articles x researchers

Item Similarity (item x item):
                  Comp Sci 1   Comp Sci 2   Physics 1   Physics 2
    Comp Sci 1        2            1            0           0
    Comp Sci 2        1            1            0           0
    Physics 1         0            0            2           2
    Physics 2         0            0            2           2

Item Similarity (item x item)  X  A User's Preferences (Turing, item x user)  =  Recommendations (Turing, item x user)
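The matrix multiplication above can be reproduced on a single machine. The Java sketch below is a minimal illustration under stated assumptions, not Mahout's distributed RecommenderJob: preferences are boolean, similarity is the raw co-occurrence count (how many researchers have both articles), and the individual libraries are hypothetical, since the slide's input matrix did not survive. Assuming Turing has Comp Sci 1, Babbage has both Comp Sci articles, and Einstein and Newton each have both Physics articles reproduces the slide's Item Similarity matrix exactly.

```java
import java.util.Arrays;

// Minimal item-based collaborative filtering sketch (illustration, not Mahout's code).
class ItemBasedSketch {

    // prefs[item][user] = 1 if the user has the item in their library, else 0.
    // Co-occurrence similarity: sim[i][j] = number of users having both i and j.
    static int[][] cooccurrence(int[][] prefs) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[][] sim = new int[items][items];
        for (int i = 0; i < items; i++)
            for (int j = 0; j < items; j++)
                for (int u = 0; u < users; u++)
                    sim[i][j] += prefs[i][u] * prefs[j][u];
        return sim;
    }

    // Scores for one user: sim (item x item) times the user's preference
    // column (item x 1). Items the user already has are left at 0.
    static int[] recommend(int[][] sim, int[][] prefs, int user) {
        int items = prefs.length;
        int[] scores = new int[items];
        for (int i = 0; i < items; i++) {
            if (prefs[i][user] != 0) continue; // don't recommend owned items
            for (int j = 0; j < items; j++)
                scores[i] += sim[i][j] * prefs[j][user];
        }
        return scores;
    }

    public static void main(String[] args) {
        // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2
        // Columns: Turing, Babbage, Einstein, Newton (hypothetical libraries)
        int[][] prefs = {
            {1, 1, 0, 0},
            {0, 1, 0, 0},
            {0, 0, 1, 1},
            {0, 0, 1, 1},
        };
        int[][] sim = cooccurrence(prefs);
        System.out.println(Arrays.deepToString(sim));
        // [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
        System.out.println(Arrays.toString(recommend(sim, prefs, 0)));
        // [0, 1, 0, 0]: recommend Comp Sci 2 to Turing
    }
}
```

The triple loop is O(items² × users), which is fine for four articles but hopeless for 50M articles and 1.5M researchers; that is the work the RecommenderJob spreads across its MapReduce steps.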
Running on Amazon's Elastic MapReduce
    On-demand use and easy to cost
Mahout's Research Career so Far
Mendeley Suggest
Mahout's Performance
(Chart: Normalised Amazon Hours, 0 to 7K, against No. Good Recommendations/10, 0 to 3; quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good)
➔ Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔ Cust. item-based: 2.4K hours, 1.5 good recommendations/10; -4.1K hours (63%)
Reducing processing time and cost
➔ Mahout's recommender is already efficient
    ➔ but your data may have unusual properties
➔ We got improvements by:
    ➔ tuning Hadoop's mapper and reducer allocation over the 10 steps in the RecommenderJob
    ➔ using an appropriate partitioner
Task Allocation   37 hours to complete
    1 reducer allocated, despite having 48 available...
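Task Allocation

Allocating more reducers on a per job basis

                // Request numReducers reduce tasks for this job (Hadoop 1.x
                // property name; job.setNumReduceTasks(numReducers) is equivalent)
                job.getConfiguration().setInt(
                    "mapred.reduce.tasks",
                    numReducers);

Allocating more mappers on a per job basis

                // Cap the input split size in bytes (Hadoop 1.x property name):
                // smaller splits mean more splits, and so more map tasks
                job.getConfiguration().set(
                    "mapred.max.split.size",
                    String.valueOf(splitSize));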
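Why does a smaller `mapred.max.split.size` give more mappers? Hadoop's `FileInputFormat` sizes each split as `max(minSize, min(maxSize, blockSize))` and keeps cutting full splits while more than 1.1 split sizes remain. The dependency-free Java sketch below approximates that rule for a single, splittable file; the 500MB input, 64MB block size and 16MB cap are hypothetical values, and real jobs also depend on the number of files and on compression.

```java
// Approximates how many input splits (and so map tasks) a FileInputFormat-style
// splitter creates for ONE splittable file. Illustration only, not Hadoop's code.
class SplitEstimate {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int numSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long size = splitSize(blockSize, minSize, maxSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) splits++; // leftover bytes form the final split
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 500 * mb;  // hypothetical input size
        long block = 64 * mb;  // assumed HDFS block size
        // No cap on split size: splits follow the block size.
        System.out.println(numSplits(file, block, 1, Long.MAX_VALUE)); // 8
        // Capping splits at 16MB yields more, smaller splits.
        System.out.println(numSplits(file, block, 1, 16 * mb));        // 32
    }
}
```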
Task Allocation   37 hours → 14 hours to complete
    From 1 → 40 reducers
Partitioners   14 hours to complete
    Partition sizes ranged from ~50KB to ~500MB
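  // Sample the input keys to estimate their distribution
  InputSampler.Sampler<IntWritable, Text> sampler =
      new InputSampler.RandomSampler<IntWritable, Text>(...);
  // Write partition boundaries derived from the sample to the partition file
  InputSampler.writePartitionFile(conf, sampler);
  // Send each key to the reducer whose key range contains it
  conf.setPartitionerClass(TotalOrderPartitioner.class);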




http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
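The idea behind sampling plus `TotalOrderPartitioner` can be shown without Hadoop: sort a sample of the keys, pick `numPartitions - 1` evenly spaced cut points, and send each key to the range that contains it, so every reducer gets a similar share. The Java sketch below is a simplified illustration of sample-based range partitioning (integer keys, no handling of duplicate cut points, a made-up uniform sample), not Hadoop's implementation.

```java
import java.util.Arrays;

// Simplified sample-based range partitioning (illustration only, not Hadoop's code).
// Assumes distinct cut points; duplicates would leave some partitions empty.
class RangePartitionSketch {
    private final int[] cutPoints; // sorted boundaries between partitions

    // Derive numPartitions - 1 cut points from a sample of keys.
    RangePartitionSketch(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        float step = sorted.length / (float) numPartitions;
        cutPoints = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = Math.min(Math.round(step * i), sorted.length - 1);
            cutPoints[i - 1] = sorted[idx];
        }
    }

    // Partition index = number of cut points <= key, found by binary search.
    int getPartition(int key) {
        int pos = Arrays.binarySearch(cutPoints, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] sample = new int[100]; // hypothetical sample: keys 0..99
        for (int i = 0; i < sample.length; i++) sample[i] = i;
        RangePartitionSketch p = new RangePartitionSketch(sample, 4);
        int[] counts = new int[4];
        for (int key = 0; key < 100; key++) counts[p.getPartition(key)]++;
        System.out.println(Arrays.toString(counts)); // [25, 25, 25, 25]
    }
}
```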
Partitioners   14 hours → 2 hours to complete
    Evenly distributed
item.RecommenderJob
  1. Prep. pref. matrix (steps 1-3)
  2. Gen. sim. matrix (steps 4-6)
  3. Multiply matrices (steps 7-10)
(Diagram: the item-based product, Item Similarity (item x item) X A User's Preferences (item x user) = Recommendations (item x user), shown alongside a User Similarity (user x user) matrix over Researchers.)
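A user-based recommender swaps the item x item similarity matrix for a user x user one: Recommendations (item x user) = Preferences (item x user) X User Similarity (user x user). The Java sketch below is hypothetical and is not the custom implementation discussed in this talk. It scores pairs of users by how many articles they share (a co-occurrence similarity chosen for simplicity) and assumes made-up libraries: Turing has Comp Sci 1, Babbage has both Comp Sci articles, and Einstein and Newton each have both Physics articles.

```java
import java.util.Arrays;

// Minimal user-based collaborative filtering sketch (hypothetical, not Mendeley's code).
class UserBasedSketch {

    // prefs[item][user] = 1 if the user has the item.
    // userSim[u][v] = number of items that users u and v share.
    static int[][] userSimilarity(int[][] prefs) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[][] sim = new int[users][users];
        for (int u = 0; u < users; u++)
            for (int v = 0; v < users; v++)
                for (int i = 0; i < items; i++)
                    sim[u][v] += prefs[i][u] * prefs[i][v];
        return sim;
    }

    // Scores = prefs (item x user) times the target user's similarity column
    // (user x 1): each user's library is weighted by their similarity.
    static int[] recommend(int[][] prefs, int[][] userSim, int user) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[] scores = new int[items];
        for (int i = 0; i < items; i++) {
            if (prefs[i][user] != 0) continue; // skip items already owned
            for (int v = 0; v < users; v++)
                scores[i] += prefs[i][v] * userSim[v][user];
        }
        return scores;
    }

    public static void main(String[] args) {
        // Rows: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2
        // Columns: Turing, Babbage, Einstein, Newton (hypothetical libraries)
        int[][] prefs = {
            {1, 1, 0, 0},
            {0, 1, 0, 0},
            {0, 0, 1, 1},
            {0, 0, 1, 1},
        };
        int[][] userSim = userSimilarity(prefs);
        System.out.println(Arrays.deepToString(userSim));
        // [[1, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
        System.out.println(Arrays.toString(recommend(prefs, userSim, 0)));
        // [0, 1, 0, 0]: recommend Comp Sci 2 to Turing
    }
}
```

On this toy data it recommends Comp Sci 2 to Turing through his overlap with Babbage.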
Mahout's Performance
(Chart: Normalised Amazon Hours, 0 to 7K, against No. Good Recommendations/10, 0 to 3; quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good)
➔ Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔ Cust. item-based: 2.4K hours, 1.5 good recommendations/10; -4.1K hours (63%)
➔ Orig. user-based: 1K hours, 2.5 good recommendations/10; versus Cust. item-based, -1.4K hours (58%) and +1 good recommendation (67%)
➔ Cust. user-based: 0.3K hours, 2.5 good recommendations/10; versus Orig. user-based, -0.7K hours (70%)
➔ Overall, Orig. item-based → Cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
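The percentages on the chart are relative changes from the start of each arrow. A quick Java check, using the rounded values read off the chart (the talk's own figures may differ slightly if they were computed from unrounded hours):

```java
// Recomputes the chart's relative changes from its (rounded) data points.
class ChartDeltas {
    // Relative change from 'from' to 'to', rounded to a whole percentage.
    static long pctChange(double from, double to) {
        return Math.round(100.0 * (to - from) / from);
    }

    public static void main(String[] args) {
        // Normalised Amazon hours (thousands) and good recommendations per 10
        double origItemHours = 6.5, custItemHours = 2.4;
        double origUserHours = 1.0, custUserHours = 0.3;
        double itemGood = 1.5, userGood = 2.5;

        System.out.println(pctChange(origItemHours, custItemHours)); // -63
        System.out.println(pctChange(custItemHours, origUserHours)); // -58
        System.out.println(pctChange(origUserHours, custUserHours)); // -70
        System.out.println(pctChange(origItemHours, custUserHours)); // -95
        System.out.println(pctChange(itemGood, userGood));           // 67
    }
}
```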
Conclusions
➔ Mahout is doing a great job of powering Mendeley Suggest
    ➔ Large-scale data set
    ➔ Excellent for batch processing requirements
➔ We'll soon be contributing our user-based implementation to Mahout
    ➔ User-based can outperform item-based
    ➔ Makes Mahout's offering more rounded
➔ Save resources and money by understanding your data
    ➔ Help Hadoop with task allocation if necessary
    ➔ Partition your data appropriately
We're Hiring!
➔ Hadoop Data Architect
    ➔ design a coherent data model across the company
    ➔ take ownership of our data
    ➔ hands-on Hadoop administration
➔ Marie Curie Senior Research Fellow
    ➔ ensure that Mendeley’s research catalogue is of high quality
    ➔ research and development opportunity
➔ £500 finder's fee if you refer someone we hire
➔ http://www.mendeley.com/careers/
www.mendeley.com

More Related Content

Similar to Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley

Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011Lynch & Dirks  - Platforms for Open Research - Charleston Conference 2011
Lynch & Dirks - Platforms for Open Research - Charleston Conference 2011Lee Dirks
 
DataScience Meeting I - Cloud Elephants and Witches: A Big Data Tale from Men...
DataScience Meeting I - Cloud Elephants and Witches: A Big Data Tale from Men...DataScience Meeting I - Cloud Elephants and Witches: A Big Data Tale from Men...
DataScience Meeting I - Cloud Elephants and Witches: A Big Data Tale from Men...datascience_at
 
Using Linked Data as the basis for Learning Resource Recommendation
Using Linked Data as the basis for Learning Resource RecommendationUsing Linked Data as the basis for Learning Resource Recommendation
Using Linked Data as the basis for Learning Resource RecommendationChris Clarke
 
Cloud Elephants and Witches: A Big Data Tale from Mendeley
Cloud Elephants and Witches: A Big Data Tale from MendeleyCloud Elephants and Witches: A Big Data Tale from Mendeley
Cloud Elephants and Witches: A Big Data Tale from MendeleyKris Jack
 
Teaching with Technology Institute Training
Teaching with Technology Institute TrainingTeaching with Technology Institute Training
Teaching with Technology Institute TrainingEmily Puckett Rodgers
 
Wiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School PkuWiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School Pkuguest8ed46d
 
Wiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School PkuWiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School Pkuwiser pku
 
Mendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleMendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleKris Jack
 
Effective Literature Searching 2011
Effective Literature Searching 2011Effective Literature Searching 2011
Effective Literature Searching 2011Middlesex University
 
Prizing Open and Enhancing Research Corpora for Language Teaching
Prizing Open and Enhancing Research Corpora for Language TeachingPrizing Open and Enhancing Research Corpora for Language Teaching
Prizing Open and Enhancing Research Corpora for Language TeachingAlannah Fitzgerald
 
Towards a Cloud Library
Towards a Cloud LibraryTowards a Cloud Library
Towards a Cloud LibraryRachel Frick
 
Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Virtual Research Networks : Towards Research 2.0
Virtual Research Networks : Towards Research 2.0Guus van den Brekel
 
P2Pvalue Directory: A collaborative resource to map common-based peer produc...
P2Pvalue Directory:  A collaborative resource to map common-based peer produc...P2Pvalue Directory:  A collaborative resource to map common-based peer produc...
P2Pvalue Directory: A collaborative resource to map common-based peer produc...P2Pvalue
 
Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Jeanne Kitchens
 
21stcenturye learningslideshare
21stcenturye learningslideshare21stcenturye learningslideshare
21stcenturye learningslidesharetsimatsima
 
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...
Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ...Lillian Rigling
 
2collab London Online web2.0 after the buzz
2collab London Online web2.0 after the buzz2collab London Online web2.0 after the buzz
2collab London Online web2.0 after the buzzf kersten
 


More from Kris Jack

Modern Perspectives on Recommender Systems and their Applications in Mendeley
Machine Learning @ Mendeley
Mendeley’s Research Catalogue: building it, opening it up and making it even ...
Mendeley Suggest: What will you read next?
Mendeley Suggest: Engineering a Personalised Article Recommender System
Scientific Article Recommendation with Mahout
improving explicit preference entry by visualising data similarities
Etude de la pertinence de critères de recherche en recherche d'informations s...
A Computational Model of Staged Language Acquisition
From Syllables to Syntax: Investigating Staged Linguistic Development through...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
Mendeley, putting data into the hands of researchers
Recommendation Engines for Scientific Literature
 


Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley

  • 1. Mahout becomes a researcher. Kris Jack, PhD, Senior Data Mining Engineer
  • 2. Overview ➔ What's Mendeley? ➔ Applications of Mahout's Recommender ➔ Under Mahout's Bonnet ➔ Mahout's Research Career so Far ➔ Conclusions
  • 4. Mendeley is a data platform for researchers ➔ We're bringing together researchers and the research that they produce from all over the world ➔ We're structuring this data in a machine-readable format ➔ We're opening this data up for you to build applications on top of it using our API ➔ These applications help researchers to do even better research and become more productive ➔ How are we building our community?
  • 5. Mendeley provides tools to help users... ...organise their research ➔ Reference management ➔ Cite-as-you-write ➔ Full-text article search ➔ Digitalised annotations
  • 6. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ➔ Research network ➔ Professional research groups
  • 7. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research ➔ Mendeley Suggest ➔ Personalised article recommendations ➔ Weekly batch of 10 recommended articles ➔ Collaborative filtering ➔ The more data, the better
  • 8. 1.5 million+ users and 50m research articles. The 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  • 9. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research. We need a recommender that scales up, coping with our data and future growth.
  • 13. Mahout use cases: ➔ Retrieve related items in large collections http://www.slideshare.net/kryton/the-data-layer
  • 14. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
  • 15. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm http://www.speeddate.com/apps/site/views/mp/technology.php
  • 16. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm ➔ Mendeley Suggest ➔ Discover new research ➔ Fill in gaps in your library ➔ Your personal advisor http://krisjack.blogspot.co.uk/2012/02/your-very-own-personalised-research.html
  • 17. Under Mahout's Bonnet
  • 18. Generating recommendations through matrix multiplication. These are item-based recommendations, because similarity is computed between items, not users. Not convinced? Try reading these... Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. Piscataway, NJ, USA. http://www.slideshare.net/srowen/collaborative-filtering-at-scale-2 http://krisjack.blogspot.co.uk/2012/04/under-bonnet-of-mahouts-item-based.html
  • 19. Input (all user preferences): researchers (Turing, Babbage, Einstein, Newton) × research articles (Comp Sci 1, Comp Sci 2, Physics 1, Physics 2)
  • 20. Input (all user preferences) at Mendeley's scale: 1.5M researchers × 50M research articles, with 300M preferences
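The figures on slide 20 imply a very sparse input. Here is a quick back-of-the-envelope check that uses only the three numbers on the slide:

```latex
\text{density} = \frac{3\times10^{8}}{(1.5\times10^{6})\,(5\times10^{7})}
               = \frac{3\times10^{8}}{7.5\times10^{13}} = 4\times10^{-6},
\qquad
\frac{3\times10^{8}\ \text{prefs}}{1.5\times10^{6}\ \text{users}} = 200\ \text{prefs per user on average}
```

Only about 0.0004% of the user × article matrix is filled. A dense representation of the preference matrix would therefore be impractical at this scale.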
  • 21. item.RecommenderJob: 1. Prepare preference matrix (steps 1-3) 2. Generate similarity matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Input: all user preferences (item × user), research articles × researchers
  • 22. item.RecommenderJob (as above), now also showing a user's preferences (item × user), e.g. Turing's
  • 23. item.RecommenderJob (as above), now also showing the item similarity matrix (item × item), rows [2 1 0 0; 1 1 0 0; 0 0 2 2; 0 0 2 2], next to a user's preferences (item × user) for Turing
  • 24. Item similarity (item × item) generated from the input (all user preferences):
                    Comp Sci 1   Comp Sci 2   Physics 1   Physics 2
        Comp Sci 1       2            1            0           0
        Comp Sci 2       1            1            0           0
        Physics 1        0            0            2           2
        Physics 2        0            0            2           2
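Slide 24's matrix can be read as co-occurrence counts. Each cell counts the researchers whose libraries contain both articles, so the diagonal gives each article's popularity. The minimal Java sketch below computes S = A^T A for a binary user × item matrix A. The four libraries are hypothetical, since the slides do not list them; they were chosen so that the output reproduces slide 24. Co-occurrence is also only one of several similarity measures a real job might use.

```java
import java.util.Arrays;

/** Sketch: item x item co-occurrence (S = A^T A) from a binary user x item matrix A. */
public final class ItemCooccurrence {

    /** prefs[u][i] is 1 if user u has article i in their library, else 0. */
    static int[][] cooccurrence(int[][] prefs) {
        int numItems = prefs[0].length;
        int[][] sim = new int[numItems][numItems];
        for (int[] user : prefs) {                  // each user adds 1 to every pair they hold
            for (int i = 0; i < numItems; i++) {
                if (user[i] == 0) continue;
                for (int j = 0; j < numItems; j++) {
                    sim[i][j] += user[j];           // user holds i; counts if they also hold j
                }
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Columns: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2 (hypothetical libraries)
        int[][] prefs = {
            {1, 1, 0, 0}, // Turing
            {1, 0, 0, 0}, // Babbage
            {0, 0, 1, 1}, // Einstein
            {0, 0, 1, 1}, // Newton
        };
        for (int[] row : cooccurrence(prefs)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running it prints the four rows shown on slide 24: [2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2].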
  • 25. item.RecommenderJob (1. prepare preference matrix, steps 1-3; 2. generate similarity matrix, steps 4-6; 3. multiply matrices, steps 7-10): Item similarity (item × item), rows [2 1 0 0; 1 1 0 0; 0 0 2 2; 0 0 2 2], × A user's preferences (item × user), e.g. Turing's, = Recommendations (item × user)
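The multiplication on slide 25 produces one score per article for a given user. The minimal sketch below hard-codes slide 24's similarity matrix and assumes a hypothetical library for Babbage that contains only Comp Sci 1. It then drops articles the user already owns and keeps the top N, in the spirit of Mendeley Suggest's weekly batch of 10. The filtering and ranking steps are illustrative assumptions, not Mahout's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: item-based scores as similarity matrix x user preference vector. */
public final class ItemBasedScores {

    /** scores = S x p, i.e. one column of the recommendation matrix. */
    static double[] score(double[][] sim, double[] userPrefs) {
        int n = sim.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                scores[i] += sim[i][j] * userPrefs[j];
            }
        }
        return scores;
    }

    /** Indices of up to n highest-scoring items that the user does not already own. */
    static List<Integer> topN(double[] scores, double[] userPrefs, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (userPrefs[i] == 0 && scores[i] > 0) {   // skip owned and unrelated items
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        String[] articles = {"Comp Sci 1", "Comp Sci 2", "Physics 1", "Physics 2"};
        double[][] sim = {{2, 1, 0, 0}, {1, 1, 0, 0}, {0, 0, 2, 2}, {0, 0, 2, 2}}; // slide 24
        double[] babbage = {1, 0, 0, 0}; // hypothetical: owns Comp Sci 1 only
        double[] scores = score(sim, babbage);
        for (int i : topN(scores, babbage, 10)) {
            System.out.println(articles[i] + " (score " + scores[i] + ")");
        }
    }
}
```

For Babbage the raw scores are [2, 1, 0, 0]. After removing Comp Sci 1, which he already has, Comp Sci 2 is the single recommendation.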
  • 26. Running on Amazon's Elastic MapReduce: on-demand use and easy to cost
  • 27. Mahout's Research Career so Far
  • 29. Mahout's performance: normalised Amazon hours (y-axis) against no. of good recommendations out of 10 (x-axis)
  • 30. Mahout's performance, divided into four quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good
  • 33. Mahout's performance chart: normalised Amazon hours (0 to 7K) against no. of good recommendations out of 10 (0 to 3), with quadrants Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good
  • 34. As above, plotting Orig. item-based at 6.5K hours, 1.5 good recommendations
  • 35. As above, adding Cust. item-based at 2.4K hours, 1.5 good recommendations
  • 36. As above, annotating the drop from Orig. to Cust. item-based: -4.1K hours (63%)
  • 37. Reducing processing time and cost ➔ Mahout's recommender is already efficient ➔ but your data may have unusual properties ➔ We got improvements by: ➔ tuning Hadoop's mapper and reducer allocation over the 10 steps in the RecommenderJob ➔ using an appropriate partitioner
  • 38. Task allocation: 37 hours to complete; only 1 reducer was allocated, despite 48 being available...
  • 39. Task allocation. Allocating more reducers on a per-job basis: job.getConfiguration().setInt("mapred.reduce.tasks", numReducers); Allocating more mappers on a per-job basis: job.getConfiguration().set("mapred.max.split.size", String.valueOf(splitSize));
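Slide 39 sets the reducer count directly, but the mapper count can only be influenced through the split size. The helper below is hypothetical. It picks a value for "mapred.max.split.size" that targets roughly a desired number of mappers. It assumes the new-API FileInputFormat rule splitSize = max(minSize, min(maxSize, blockSize)) and ignores the small slop Hadoop allows on each file's last split. The 500 MB input and 64 MB block size are illustrative, not Mendeley's figures.

```java
/** Hypothetical helper for sizing "mapred.max.split.size" (single input file assumed). */
public final class SplitSizing {

    /** Smallest max split size that yields at most desiredMappers splits. */
    static long maxSplitSizeFor(long inputBytes, int desiredMappers) {
        if (inputBytes <= 0 || desiredMappers <= 0) {
            throw new IllegalArgumentException("inputBytes and desiredMappers must be positive");
        }
        return (inputBytes + desiredMappers - 1) / desiredMappers; // ceiling division
    }

    /** Assumed split-size rule: max(minSize, min(maxSize, blockSize)). */
    static long effectiveSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Approximate number of splits (and hence map tasks), ignoring the last-split slop. */
    static long approxSplits(long inputBytes, long splitSize) {
        return (inputBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long inputBytes = 500L * 1024 * 1024; // hypothetical 500 MB intermediate file
        long blockSize = 64L * 1024 * 1024;   // hypothetical 64 MB HDFS block
        long maxSplit = maxSplitSizeFor(inputBytes, 40);
        long split = effectiveSplitSize(blockSize, 1, maxSplit);
        System.out.println("default: ~" + approxSplits(inputBytes, blockSize) + " mappers");
        System.out.println("mapred.max.split.size = " + maxSplit
                + " -> ~" + approxSplits(inputBytes, split) + " mappers");
    }
}
```

Under these assumptions the file would get about 8 mappers by default (one per block). Capping the split size at 13107200 bytes raises that to about 40. Because the maximum split size only lowers the effective size below the block size, this knob can increase the mapper count but cannot reduce it.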
  • 40. Task allocation: from 37 hours to 14 hours to complete, by going from 1 → 40 reducers
  • 41. Partitioners: 14 hours to complete
  • 42. Partitioners: 14 hours to complete, with reducer inputs ranging from ~50KB to ~500MB
  • 43. InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(...); InputSampler.writePartitionFile(conf, sampler); conf.setPartitionerClass(TotalOrderPartitioner.class); http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
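The idea behind slide 43 can be shown without Hadoop: sample the keys, sort the sample, pick evenly spaced split points, and route each key to its range by binary search. The sketch below is a stand-alone re-implementation of that idea, not Hadoop's InputSampler or TotalOrderPartitioner. For contrast it includes the modulo-of-hash rule used by Hadoop's default HashPartitioner. It also uses a hypothetical key set in which every key is a multiple of the reducer count, so hashing sends all keys to a single reducer.

```java
import java.util.Arrays;
import java.util.Random;

/** Stand-alone sketch of sampled range partitioning; not Hadoop's implementation. */
public final class SampledRangePartitioner {

    private final int[] splitPoints; // sorted; numPartitions - 1 boundaries

    SampledRangePartitioner(int[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        splitPoints = new int[numPartitions - 1];
        for (int p = 1; p < numPartitions; p++) {
            // evenly spaced order statistics of the sample become range boundaries
            splitPoints[p - 1] = sorted[(int) ((long) p * sorted.length / numPartitions)];
        }
    }

    /** Partition = number of split points <= key (keys on a boundary go to the upper range). */
    int getPartition(int key) {
        int lo = 0, hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (splitPoints[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Modulo-of-hash rule as in Hadoop's default HashPartitioner (Integer.hashCode(k) == k). */
    static int hashPartition(int key, int numPartitions) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 40;
        int[] keys = new int[40_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * numPartitions; // hypothetical worst case for hash partitioning
        }
        Random rnd = new Random(42);
        int[] sample = new int[400]; // ~1% random sample, drawn with replacement
        for (int i = 0; i < sample.length; i++) {
            sample[i] = keys[rnd.nextInt(keys.length)];
        }
        SampledRangePartitioner ranges = new SampledRangePartitioner(sample, numPartitions);
        int[] hashCounts = new int[numPartitions];
        int[] rangeCounts = new int[numPartitions];
        for (int k : keys) {
            hashCounts[hashPartition(k, numPartitions)]++;
            rangeCounts[ranges.getPartition(k)]++;
        }
        System.out.println("hash:  " + Arrays.toString(hashCounts));
        System.out.println("range: " + Arrays.toString(rangeCounts));
    }
}
```

With this key set, hashing puts all 40,000 keys on reducer 0. The sampled ranges spread them across all 40 reducers, roughly evenly.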
  • 44. Partitioners: from 14 hours to 2 hours to complete, with data evenly distributed across reducers
  • 47. The item.RecommenderJob diagram revisited for a user-based approach: a user similarity matrix (user × user) over researchers takes the place of the item similarity matrix (item × item), and is multiplied with the user preferences (item × user) to give recommendations (item × user)
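Slide 47 swaps item-item similarity for user-user similarity. The minimal sketch below applies the user-based idea to the same toy data. Each article is scored by summing, over the other researchers, their similarity to the target user times whether they own that article. The similarity measure (number of shared articles) and the four libraries are hypothetical. The slides do not describe how Mendeley's customised user-based job measures similarity.

```java
/** Sketch: user-based scores from user x user co-occurrence similarity. */
public final class UserBasedScores {

    /** sim[v] = number of articles user v shares with user u (self-similarity left at 0). */
    static int[] userSimilarities(int[][] prefs, int u) {
        int[] sim = new int[prefs.length];
        for (int v = 0; v < prefs.length; v++) {
            if (v == u) continue;
            for (int i = 0; i < prefs[u].length; i++) {
                sim[v] += prefs[u][i] * prefs[v][i];
            }
        }
        return sim;
    }

    /** scores[i] = sum over other users v of sim(u, v) * prefs[v][i], for articles u lacks. */
    static int[] score(int[][] prefs, int u) {
        int[] sim = userSimilarities(prefs, u);
        int[] scores = new int[prefs[u].length];
        for (int v = 0; v < prefs.length; v++) {
            if (sim[v] == 0) continue;                 // unrelated users contribute nothing
            for (int i = 0; i < scores.length; i++) {
                if (prefs[u][i] == 0) {
                    scores[i] += sim[v] * prefs[v][i];
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String[] articles = {"Comp Sci 1", "Comp Sci 2", "Physics 1", "Physics 2"};
        // Rows: Turing, Babbage, Einstein, Newton (hypothetical libraries)
        int[][] prefs = {{1, 1, 0, 0}, {1, 0, 0, 0}, {0, 0, 1, 1}, {0, 0, 1, 1}};
        int[] s = score(prefs, 1); // Babbage
        for (int i = 0; i < s.length; i++) {
            if (s[i] > 0) System.out.println(articles[i] + ": " + s[i]);
        }
    }
}
```

Babbage shares Comp Sci 1 with Turing, so Turing's other article, Comp Sci 2, is recommended with a score of 1. On this toy data the user-based and item-based sketches agree. The slides' chart, though, reports different recommendation quality for the two approaches on Mendeley's real data.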
  • 48. Mahout's performance chart (normalised Amazon hours against good recommendations out of 10), adding Orig. user-based at 1K hours, 2.5 good recommendations, alongside Orig. item-based (6.5K, 1.5) and Cust. item-based (2.4K, 1.5)
  • 49. As above, annotating Orig. user-based against Cust. item-based: +1 good recommendation (67%) and -1.4K hours (58%)
  • 50. As above, adding Cust. user-based at 0.3K hours, 2.5 good recommendations
  • 51. As above, annotating the customisation savings: -4.1K hours (63%) for item-based and -0.7K hours (70%) for user-based
  • 52. As above, annotating the overall change from Orig. item-based to Cust. user-based: +1 good recommendation (67%) and -6.2K hours (95%)
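The percentages on slides 36 and 49 to 52 are each relative to the starting value of the comparison. A quick check, using the coordinates read from the chart (hours in thousands, good recommendations out of 10):

```java
/** Checks the relative changes annotated on Mahout's performance chart. */
public final class ChartDeltas {

    /** Size of the change from before to after, as a rounded percentage of before. */
    static long percentChange(double before, double after) {
        return Math.round(Math.abs(after - before) / before * 100);
    }

    public static void main(String[] args) {
        System.out.println("Orig. -> Cust. item-based hours (6.5K -> 2.4K): " + percentChange(6.5, 2.4) + "%");                // 63%
        System.out.println("Cust. item-based -> Orig. user-based hours (2.4K -> 1K): " + percentChange(2.4, 1.0) + "%");        // 58%
        System.out.println("Orig. -> Cust. user-based hours (1K -> 0.3K): " + percentChange(1.0, 0.3) + "%");                  // 70%
        System.out.println("Orig. item-based -> Cust. user-based hours (6.5K -> 0.3K): " + percentChange(6.5, 0.3) + "%");     // 95%
        System.out.println("Good recommendations (1.5 -> 2.5): " + percentChange(1.5, 2.5) + "%");                           // 67%
    }
}
```

All five values match the annotations on the slides.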
  • 54. Conclusions ➔ Mahout is doing a great job of powering Mendeley Suggest ➔ Large-scale data set ➔ Excellent for batch processing requirements ➔ We'll soon be feeding our user-based implementation into Mahout ➔ User-based can outperform item-based ➔ Makes Mahout's offering more rounded ➔ Save resources and money by understanding your data ➔ Help Hadoop with task allocation if necessary ➔ Partition your data appropriately
  • 55. We're Hiring! ➔ Hadoop Data Architect ➔ design a coherent data model across the company ➔ take ownership of our data ➔ hands on Hadoop administration ➔ Marie Curie Senior Research Fellow ➔ ensure that Mendeley’s research catalogue is of high quality ➔ research and development opportunity ➔ £500 Finder's Fee if you find someone who we hire ➔ http://www.mendeley.com/careers/