Mendeley Suggest: Engineering a Personalised Article Recommender System

I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September, 2012.

This presentation describes how we use Mahout to power Mendeley Suggest. It first presents results from tuning Mahout's recommenders on AWS, including the trade-off between cost and precision, and then explains how we used other big data technologies and AWS to put a serving layer in place.

Thanks to Daniel Jones for making the slides on the serving layer.

Transcript

  • 1. Mendeley Suggest: Engineering a Personalised Article Recommender System. Kris Jack, PhD, Chief Data Scientist, https://twitter.com/_krisjack
  • 2. Overview ➔ What's Mendeley? ➔ What's Mendeley Suggest? ➔ Computation Layer ➔ Serving Layer (Architecture, Technologies, Deployment) ➔ Conclusions
  • 3. What's Mendeley?
  • 4. ➔ Mendeley is a platform that connects researchers, research data and apps (figure: Mendeley Open API)
  • 5. ➔ Mendeley is a platform that connects researchers, research data and apps (figure: Mendeley Open API) ➔ Startup company with ~20 R&D engineers
  • 6. What's Mendeley Suggest?
  • 7. Use Case ➔ Good researchers are on top of their game ➔ That is difficult given the amount of research being produced ➔ There must be a technology that can help ➔ Help researchers by recommending relevant research
  • 8. Mendeley Suggest
  • 9. Computation Layer
  • 10. Mendeley Suggest
  • 11. Mendeley Suggest
  • 12. Mendeley Suggest
  • 13. Running on Amazon's Elastic MapReduce ➔ On-demand use and easy to cost
  • 14-27. Computation Layer: tuning Mahout's recommenders (1.5M users, 50M articles). Chart of cost (normalised Amazon hours, 0 to 7K) against performance (no. of good recommendations out of 10, 0 to 3), divided into four quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good. The slides add one configuration at a time:
    ➔ Orig. item-based: 6.5K hours, 1.5 good recommendations
    ➔ Cust. item-based (annotated: partitioners, MR allocation): 2.4K hours, 1.5 good recommendations; -4.1K hours (63%)
    ➔ Orig. user-based: 1K hours, 2.5 good recommendations; -1.4K hours (58%) and +1 good recommendation (67%) compared with cust. item-based
    ➔ Cust. user-based: 0.3K hours, 2.5 good recommendations; -0.7K hours (70%) compared with orig. user-based
    ➔ Overall, from orig. item-based to cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
  • 28. Mahout as the Computation Layer ➔ Out of the box, it didn't work so well for us ➔ Needed to understand Hadoop better ➔ Contributed a patch back to the community (user-user) ➔ Next step, the serving layer...
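The percentages on the chart slides are relative changes between successive configurations. The Python snippet below recomputes them from the plotted points as a quick consistency check; the point values are read off the slides, while the function names and the rounding to whole percentages are illustrative choices, not part of the talk.

```python
# Points read off the chart slides: (normalised Amazon hours, good recommendations out of 10).
POINTS = {
    "orig_item": (6500, 1.5),
    "cust_item": (2400, 1.5),
    "orig_user": (1000, 2.5),
    "cust_user": (300, 2.5),
}


def relative_change(before: float, after: float) -> tuple:
    """Return (absolute change, relative change as a whole-number percentage of `before`)."""
    delta = after - before
    return delta, round(abs(delta) / before * 100)


def compare(a: str, b: str) -> dict:
    """Compare configuration `b` against configuration `a` on cost and on quality."""
    (hours_a, good_a), (hours_b, good_b) = POINTS[a], POINTS[b]
    return {
        "hours": relative_change(hours_a, hours_b),
        "good_recs": relative_change(good_a, good_b),
    }


if __name__ == "__main__":
    for a, b in [("orig_item", "cust_item"), ("cust_item", "orig_user"),
                 ("orig_user", "cust_user"), ("orig_item", "cust_user")]:
        print(f"{a} -> {b}: {compare(a, b)}")
```

Each pair reproduces the figure annotated on the slides, e.g. orig. item-based to cust. user-based gives -6.2K hours (95%) for +1 good recommendation (67%).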
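Slide 28 mentions a user-user patch contributed back to Mahout but does not describe the algorithm. As a generic illustration of user-based collaborative filtering on implicit library data (which articles each user has saved), here is a minimal sketch. It is not Mahout's implementation or Mendeley's patch: the Jaccard (Tanimoto) similarity, the neighbourhood size `k` and the scoring rule (summing neighbour similarities) are assumptions chosen for clarity.

```python
from collections import defaultdict
from heapq import nlargest


def jaccard(a: set, b: set) -> float:
    """Jaccard (Tanimoto) similarity between two users' sets of saved articles."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_user_based(libraries: dict, user: str, k: int = 20, n: int = 10) -> list:
    """Top-n unseen articles for `user`, scored by summed similarity of its k nearest users."""
    mine = libraries[user]
    # Rank every other user by similarity to `user` and keep the k most similar.
    neighbours = nlargest(
        k,
        ((jaccard(mine, items), other) for other, items in libraries.items() if other != user),
    )
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim == 0.0:
            continue  # no overlap: this neighbour carries no signal
        for article in libraries[other] - mine:
            scores[article] += sim
    # Sort by descending score, breaking ties by article id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:n]]


if __name__ == "__main__":
    libraries = {
        "alice": {"a1", "a2", "a3"},
        "bob": {"a1", "a2", "a4"},
        "carol": {"a3", "a5"},
        "dave": {"a6"},
    }
    print(recommend_user_based(libraries, "alice"))
```

With this toy input, Alice's nearest users are Bob (similarity 2/4 = 0.5) and Carol (1/4 = 0.25), so `a4` is ranked ahead of `a5`; Dave shares nothing with Alice and contributes nothing. An item-based recommender would instead compare articles by the users who saved them.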
  • 29. Serving Layer
  • 30. Architecture (diagram labels: Mendeley Hadoop Cluster; User Cascading Libraries; Computation Layer)
  • 31. Architecture (diagram labels: AWS; Elastic Beanstalk (×3); DynamoDB; Serving Layer; Mendeley Hadoop Cluster; User Map Reduce Libraries; Computation Layer)
  • 32. Technologies ➔ Spring dependency injection framework ➔ Context-wide integration testing is easy, including pre-loading of test data ➔ Allows other Spring features (cache, security, messaging) ➔ Spring MVC 3.2.M1 ➔ Annotated controllers, type conversion for free ➔ Asynchronous Servlet 3.0 supports thread parking ➔ AlternatorDB ➔ In-memory DynamoDB implementation for testing
  • 33. Technologies: Recommendation<K> type hierarchy (LongRecommendation, UuidRecommendation, GroupRecommendation, PersonRecommendation, DocumentRecommendation) ➔ Build once, employ in several use cases
  • 34. Deployment ➔ AWS Elastic Beanstalk ➔ Managed, auto-scaling, health-checking .war container ➔ Jenkins continuous integration (CI) server ➔ Maven build tool (useful dependency management) ➔ beanstalk-maven-plugin (push a button to deploy) ➔ Deploys to Elastic Beanstalk ➔ Replaces existing application version if required ➔ Zero-downtime updates (tested at ~300ms) ➔ Triggered by Jenkins
  • 35. Putting it all together... $$$ ➔ Real-time article recommendations for 2 million users ➔ 20 requests per second ➔ $65.84/month ➔ $34.24 Elastic Beanstalk ➔ $28.17 DynamoDB ➔ $2.76 bandwidth ➔ $30 to update the computation layer periodically
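The architecture slides separate a batch computation layer (Hadoop) from a serving layer (Elastic Beanstalk and DynamoDB), and slide 32 mentions AlternatorDB as an in-memory DynamoDB for tests. The sketch below shows that split in miniature: precomputed recommendations are bulk-loaded into a key-value store keyed by user, and serving is a single lookup. Everything here is a hypothetical stand-in: the `userID<TAB>[itemID:score,...]` input format is an assumption modelled on Mahout's text output, and `RecommendationStore` and the dictionary-backed `InMemoryStore` are illustrative; a real deployment would write to DynamoDB through the AWS SDK.

```python
from typing import Iterable, Protocol


class RecommendationStore(Protocol):
    """Hypothetical key-value interface the serving layer depends on."""

    def put(self, user_id: str, items: list) -> None: ...

    def get(self, user_id: str) -> list: ...


class InMemoryStore:
    """Dictionary-backed stand-in for a DynamoDB table, for tests only."""

    def __init__(self) -> None:
        self._table = {}

    def put(self, user_id: str, items: list) -> None:
        self._table[user_id] = items

    def get(self, user_id: str) -> list:
        return self._table.get(user_id, [])


def parse_line(line: str) -> tuple:
    """Parse an assumed 'userID<TAB>[item:score,item:score]' line into (user_id, [(item, score)])."""
    user_id, payload = line.rstrip("\n").split("\t", 1)
    body = payload.strip()[1:-1]  # drop the surrounding brackets
    items = []
    for pair in filter(None, body.split(",")):
        item_id, score = pair.split(":")
        items.append((item_id, float(score)))
    return user_id, items


def load(lines: Iterable[str], store: RecommendationStore) -> int:
    """Bulk-load computation-layer output into the store; returns the number of users written."""
    count = 0
    for line in lines:
        if line.strip():  # skip blank lines
            user_id, items = parse_line(line)
            store.put(user_id, items)
            count += 1
    return count


def serve(store: RecommendationStore, user_id: str, n: int = 10) -> list:
    """Serving-layer lookup: at most n article ids, highest score first."""
    ranked = sorted(store.get(user_id), key=lambda kv: -kv[1])
    return [item_id for item_id, _ in ranked[:n]]
```

Because `serve` only depends on the `RecommendationStore` interface, tests can inject `InMemoryStore` while production code injects a DynamoDB-backed implementation, which is the same role dependency injection and AlternatorDB play on slide 32.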
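Slide 33 shows a `Recommendation<K>` type parameterised by its key type, with variants such as `LongRecommendation` and `UuidRecommendation`, so shared logic is written once and reused across use cases. A rough Python analogue using `typing.Generic` follows. The `key` and `score` fields and the `top_n` helper are hypothetical; the slide does not say how `GroupRecommendation`, `PersonRecommendation` and `DocumentRecommendation` fit into the hierarchy, so they are left out.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass(frozen=True)
class Recommendation(Generic[K]):
    """A recommended entity identified by a key of type K, with a relevance score (fields are hypothetical)."""
    key: K
    score: float


@dataclass(frozen=True)
class LongRecommendation(Recommendation[int]):
    """Key is a numeric id (Python int standing in for Java's long)."""


@dataclass(frozen=True)
class UuidRecommendation(Recommendation[UUID]):
    """Key is a UUID."""


def top_n(recs: List[Recommendation[K]], n: int) -> List[K]:
    """Written once against Recommendation[K]; works for any key type."""
    return [r.key for r in sorted(recs, key=lambda r: r.score, reverse=True)[:n]]
```

For example, `top_n([LongRecommendation(1, 0.2), LongRecommendation(2, 0.9)], 1)` returns `[2]`, and the same function accepts a list of `UuidRecommendation` values unchanged.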
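The serving costs on slide 35 can be normalised per request. Assuming a 30-day month and a constant 20 requests per second, that is 51.84 million requests a month, or roughly $1.27 per million requests at the quoted $65.84. The three itemised lines add up to $65.17, $0.67 less than the quoted total; the slide does not say what accounts for the difference. The periodic $30 computation-layer update is excluded.

```python
# Monthly figures (USD) from the "Putting it all together" slide.
ELASTIC_BEANSTALK = 34.24
DYNAMODB = 28.17
BANDWIDTH = 2.76
QUOTED_TOTAL = 65.84
REQUESTS_PER_SECOND = 20
DAYS_PER_MONTH = 30  # assumption: the slide does not state the month length


def requests_per_month(rps: float, days: int = DAYS_PER_MONTH) -> float:
    """Request volume for a month of constant load."""
    return rps * 60 * 60 * 24 * days


def cost_per_million(monthly_cost: float, rps: float) -> float:
    """Monthly serving cost divided by monthly request volume, scaled to one million requests."""
    return monthly_cost / requests_per_month(rps) * 1_000_000


# Sum of the itemised lines, rounded to cents to hide floating-point noise.
itemised = round(ELASTIC_BEANSTALK + DYNAMODB + BANDWIDTH, 2)

if __name__ == "__main__":
    print(f"requests/month: {requests_per_month(REQUESTS_PER_SECOND):,.0f}")
    print(f"cost per million requests: ${cost_per_million(QUOTED_TOTAL, REQUESTS_PER_SECOND):.2f}")
    print(f"itemised sum: ${itemised:.2f} vs quoted ${QUOTED_TOTAL:.2f}")
```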
  • 36. Conclusions
  • 37. Conclusions ➔ Mendeley Suggest is a personalised article recommender ➔ Built by a small team for big data ➔ Uses Mahout as the computation layer ➔ Needs some love out of the box ➔ Serves from AWS ➔ Reduces maintenance costs and is reliable ➔ We intend to release Mendeley Suggest to all users this year
  • 38. We're Hiring! ➔ Data Scientist ➔ Apply recommender technologies to Mendeley's data ➔ Work on improving the quality of Mendeley's research catalogue ➔ Starting in the first quarter of 2013 ➔ 6-month secondment at the KNOW Center, TU Graz, Austria, as part of the EC FP7 TEAM project (http://team-project.tugraz.at/) ➔ http://www.mendeley.com/careers/
  • 39. www.mendeley.com
