Mendeley Suggest: Engineering a Personalised Article Recommender System
I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September, 2012.

This presentation describes how we have been making use of Mahout to power Mendeley Suggest. First, it includes some results from tuning Mahout's recommender on AWS and the cost vs. precision tradeoff. Then it concludes with details on how to make use of other big data technologies and AWS in order to put a serving layer in place.

Acknowledgement to Daniel Jones for making the slides for the serving layer part of the presentation.

Presentation Transcript

  • Mendeley Suggest: Engineering a Personalised Article Recommender System. Kris Jack, PhD, Chief Data Scientist. https://twitter.com/_krisjack
  • Overview
    ➔ What's Mendeley?
    ➔ What's Mendeley Suggest?
    ➔ Computation Layer
    ➔ Serving Layer
      ➔ Architecture
      ➔ Technologies
      ➔ Deployment
    ➔ Conclusions
  • What's Mendeley?
  • ➔ Mendeley is a platform that connects researchers, research data and apps (Mendeley Open API)
    ➔ Startup company with ~20 R&D engineers
  • What's Mendeley Suggest?
  • Use Case
    ➔ Good researchers are on top of their game
    ➔ Difficult with the amount of research being produced
    ➔ There must be a technology that can help
    ➔ Help researchers by recommending relevant research
  • Mendeley Suggest
  • Computation Layer
  • Running on Amazon's Elastic MapReduce: on-demand use and easy to cost
  • Computation Layer: Mahout's Performance (1.5M Users, 50M Articles)
    [Plot: Normalised Amazon Hours (y-axis, 0 to 7K) against No. Good Recommendations/10 (x-axis, 0 to 3); quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good]
    ➔ Orig. item-based: 6.5K hours, 1.5 good recommendations/10
    ➔ Cust. item-based: 2.4K hours, 1.5 good recommendations/10 (-4.1K hours, 63%; partitioners, MR allocation)
    ➔ Orig. user-based: 1K hours, 2.5 good recommendations/10 (-1.4K hours, 58%; +1 good recommendation, 67%)
    ➔ Cust. user-based: 0.3K hours, 2.5 good recommendations/10 (-0.7K hours, 70%)
    ➔ Orig. item-based to cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
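The percentages on the performance slides follow directly from the four plotted points. A minimal Java sketch of that arithmetic, assuming the (hours, good recommendations) pairs as read off the plot; the class and method names are illustrative and do not come from the presentation:

```java
// Reproduces the savings quoted on the slides from the plotted points.
// Names are hypothetical; only the numbers come from the presentation.
final class MahoutTuningSavings {

    /** Percentage reduction from before to after, rounded to the nearest integer. */
    static long percentReduction(double before, double after) {
        return Math.round(100.0 * (before - after) / before);
    }

    /** Percentage increase from before to after, rounded to the nearest integer. */
    static long percentIncrease(double before, double after) {
        return Math.round(100.0 * (after - before) / before);
    }

    public static void main(String[] args) {
        // Normalised Amazon hours, in thousands.
        double origItem = 6.5, custItem = 2.4, origUser = 1.0, custUser = 0.3;
        // Good recommendations out of 10.
        double itemQuality = 1.5, userQuality = 2.5;

        System.out.println("Item-based tuning saves %: " + percentReduction(origItem, custItem));
        System.out.println("Cust. item-based to orig. user-based saves %: " + percentReduction(custItem, origUser));
        System.out.println("User-based tuning saves %: " + percentReduction(origUser, custUser));
        System.out.println("Overall saving %: " + percentReduction(origItem, custUser));
        System.out.println("Quality gain %: " + percentIncrease(itemQuality, userQuality));
    }
}
```

Rounding to the nearest integer matches the way the slides quote the figures.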
  • Mahout as the Computation Layer
    ➔ Out of the box, didn't work so well for us
    ➔ Needed to understand Hadoop better
    ➔ Contributed patch back to community (user-user)
    ➔ Next step, the serving layer...
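For readers unfamiliar with the user-based approach compared above, the idea can be sketched in a few lines of plain Java. This is an in-memory illustration only: it is neither Mahout's implementation nor Mendeley's code, and the choice of Tanimoto similarity over boolean library data, along with every name below, is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy user-based collaborative filtering over user libraries (sets of article ids).
final class UserBasedSketch {

    /** Tanimoto (Jaccard) similarity: shared articles divided by all distinct articles. */
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        int union = a.size() + b.size() - shared.size();
        return union == 0 ? 0.0 : (double) shared.size() / union;
    }

    /** Ranks articles the user lacks by the summed similarity of the users who hold them. */
    static List<String> recommend(String user, Map<String, Set<String>> libraries, int howMany) {
        Set<String> own = libraries.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : libraries.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double similarity = tanimoto(own, other.getValue());
            if (similarity == 0.0) continue; // no overlap, no evidence
            for (String article : other.getValue()) {
                if (!own.contains(article)) scores.merge(article, similarity, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> libraries = new HashMap<>();
        libraries.put("alice", new HashSet<>(List.of("a1", "a2")));
        libraries.put("bob", new HashSet<>(List.of("a1", "a2", "a3")));
        libraries.put("carol", new HashSet<>(List.of("a2", "a4")));
        System.out.println(recommend("alice", libraries, 10));
    }
}
```

At the scale quoted in the talk (1.5M users, 50M articles) the pairwise comparison has to be distributed, which is where Hadoop and the tuning described above come in.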
  • Serving Layer
  • Architecture [diagram]
    ➔ Computation Layer: Mendeley Hadoop Cluster (labels: User Libraries, Cascading, Map Reduce)
    ➔ Serving Layer: AWS (labels: Elastic Beanstalk ×3, DynamoDB)
  • Technologies
    ➔ Spring dependency injection framework
      ➔ Context-wide integration testing is easy, including pre-loading of test data
      ➔ Allows other Spring features (cache, security, messaging)
    ➔ Spring MVC 3.2.M1
      ➔ Annotated controllers, type conversion for free
      ➔ Asynchronous Servlet 3.0 supports thread parking
    ➔ AlternatorDB
      ➔ In-memory DynamoDB implementation for testing
  • Technologies
    [Class diagram: Recommendation<K>, LongRecommendation, UuidRecommendation, GroupRecommendation, PersonRecommendation, DocumentRecommendation]
    ➔ Build once, employ in several use cases
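The class names on this slide suggest a generic base type specialised by key type and then by domain. A minimal Java sketch of how such a hierarchy might look; only the class names come from the slide, while the inheritance arrows, the key type chosen for each domain class, the fields and the `Recommendations.best` helper are all assumptions made for illustration:

```java
import java.util.List;
import java.util.UUID;

// Generic base: an item identified by a key of type K, with a relevance score.
abstract class Recommendation<K> {
    private final K itemId;
    private final double score;

    protected Recommendation(K itemId, double score) {
        this.itemId = itemId;
        this.score = score;
    }

    public K getItemId() { return itemId; }
    public double getScore() { return score; }
}

// Key-type specialisations (names from the slide).
class LongRecommendation extends Recommendation<Long> {
    public LongRecommendation(long itemId, double score) { super(itemId, score); }
}

class UuidRecommendation extends Recommendation<UUID> {
    public UuidRecommendation(UUID itemId, double score) { super(itemId, score); }
}

// Domain specialisations. Which key type each one uses is a guess.
class GroupRecommendation extends LongRecommendation {
    public GroupRecommendation(long groupId, double score) { super(groupId, score); }
}

class PersonRecommendation extends LongRecommendation {
    public PersonRecommendation(long personId, double score) { super(personId, score); }
}

class DocumentRecommendation extends UuidRecommendation {
    public DocumentRecommendation(UUID documentId, double score) { super(documentId, score); }
}

// Hypothetical helper written once against the base type and reused for every domain.
final class Recommendations {
    static <K> Recommendation<K> best(List<? extends Recommendation<K>> candidates) {
        Recommendation<K> top = null;
        for (Recommendation<K> candidate : candidates) {
            if (top == null || candidate.getScore() > top.getScore()) top = candidate;
        }
        return top; // null when the list is empty
    }
}
```

Code written against `Recommendation<K>`, such as the helper above, works unchanged for groups, people and documents, which is one reading of "build once, employ in several use cases".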
  • Deployment
    ➔ AWS ElasticBeanstalk
      ➔ Managed, auto-scaling, health-checking .war container
    ➔ Jenkins continuous integration (CI) server
    ➔ Maven build tool (useful dependency management)
    ➔ beanstalk-maven-plugin (push a button to deploy)
      ➔ Deploys to ElasticBeanstalk
      ➔ Replaces existing application version if required
      ➔ Zero downtime updates (tested at ~300ms)
      ➔ Triggered by Jenkins
  • Putting it all together... $$$
    ➔ Real-time article recommendations for 2 million users
    ➔ 20 requests per second
    ➔ $65.84/month
      ➔ $34.24 ElasticBeanstalk
      ➔ $28.17 DynamoDB
      ➔ $2.76 bandwidth
    ➔ $30 to update the computation layer periodically
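To put the monthly serving bill in per-user terms, the figures above can be worked through in integer cents. This is a rough sketch with hypothetical class and method names; it takes the quoted $65.84 total as given and keeps it separate from the three itemised lines, which add up to $65.17:

```java
// Per-user serving cost from the figures on the slide, in cents to avoid rounding drift.
final class ServingCost {
    static final long ELASTIC_BEANSTALK_CENTS = 3424;
    static final long DYNAMODB_CENTS = 2817;
    static final long BANDWIDTH_CENTS = 276;
    static final long QUOTED_TOTAL_CENTS = 6584; // total as quoted on the slide
    static final long USERS = 2_000_000;

    /** Sum of the three itemised lines. */
    static long itemisedCents() {
        return ELASTIC_BEANSTALK_CENTS + DYNAMODB_CENTS + BANDWIDTH_CENTS;
    }

    /** Quoted monthly serving total divided by the number of users served. */
    static double centsPerUserPerMonth() {
        return (double) QUOTED_TOTAL_CENTS / USERS;
    }

    public static void main(String[] args) {
        System.out.println("Itemised lines, cents: " + itemisedCents());
        System.out.println("Quoted total, cents: " + QUOTED_TOTAL_CENTS);
        System.out.println("Cents per user per month: " + centsPerUserPerMonth());
    }
}
```

The $30 computation-layer update is left out because the slide does not say how often it runs.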
  • Conclusions
  • Conclusions
    ➔ Mendeley Suggest is a personalised article recommender
    ➔ Built by a small team for big data
    ➔ Uses Mahout as the computation layer
      ➔ Needs some love out of the box
    ➔ Serves from AWS
      ➔ Reduces maintenance costs and is reliable
    ➔ Intend to release Mendeley Suggest to all users this year
  • We're Hiring!
    ➔ Data Scientist
      ➔ apply recommender technologies to Mendeley's data
      ➔ work on improving the quality of Mendeley's research catalogue
      ➔ starting in first quarter of 2013
      ➔ 6-month secondment at KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
    ➔ http://www.mendeley.com/careers/
  • www.mendeley.com