Mendeley Suggest: Engineering a Personalised Article Recommender System

I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September, 2012.

This presentation describes how we use Mahout to power Mendeley Suggest. It first presents results from tuning Mahout's recommender on AWS, including the cost vs. precision tradeoff, and then explains how we used other big data technologies and AWS to put a serving layer in place.

Acknowledgement to Daniel Jones for making the slides for the serving layer part of the presentation.
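
The cost figures quoted in the slides can be re-derived with a little arithmetic. The following is a minimal sketch in Java, not code from the talk: the class and method names (`CostFigures`, `percentChange`, `sumCents`, `requestsPerMonth`) are made up for illustration, the inputs are the values as they appear on the slides, and the request count assumes the quoted 20 requests per second is sustained around the clock for a 30-day month, which the slides do not state.

```java
import java.util.Locale;

// Illustrative only: re-derives figures quoted on the slides.
// Class and method names are hypothetical, not from the talk.
final class CostFigures {

    // Relative change from `from` to `to`, in percent (negative means a reduction).
    static double percentChange(double from, double to) {
        return (to - from) / from * 100.0;
    }

    // Sums amounts given in cents, avoiding floating-point rounding on currency.
    static long sumCents(long... cents) {
        long total = 0;
        for (long c : cents) {
            total += c;
        }
        return total;
    }

    // Requests served in `days` days, ASSUMING the rate is sustained 24 hours a day.
    static long requestsPerMonth(long requestsPerSecond, int days) {
        return requestsPerSecond * 60L * 60L * 24L * days;
    }

    // Prints the absolute and relative change between two values.
    private static void report(String label, double from, double to) {
        System.out.printf(Locale.ROOT, "%-40s %+.1f (%+.0f%%)%n",
                label, to - from, percentChange(from, to));
    }

    public static void main(String[] args) {
        // Computation layer: thousands of normalised Amazon hours.
        report("Orig. -> cust. item-based", 6.5, 2.4);
        report("Cust. item-based -> orig. user-based", 2.4, 1.0);
        report("Orig. -> cust. user-based", 1.0, 0.3);
        report("Orig. item-based -> cust. user-based", 6.5, 0.3);
        // Quality: good recommendations out of 10, item-based vs. user-based.
        report("Good recommendations, item -> user", 1.5, 2.5);

        // Serving layer: itemised monthly costs from the slides, in cents.
        long itemised = sumCents(3424, 2817, 276);
        System.out.printf(Locale.ROOT, "Itemised serving cost: $%.2f%n", itemised / 100.0);
        System.out.println("Requests in a 30-day month at 20/s: " + requestsPerMonth(20, 30));
    }
}
```

The relative changes come out at the percentages annotated on the chart (63%, 58%, 70%, 95% and 67%). The three itemised serving costs add up to $65.17, which is $0.67 less than the $65.84/month headline; the slides do not say where the difference comes from.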

  1. Mendeley Suggest: Engineering a Personalised Article Recommender System
      Kris Jack, PhD
      Chief Data Scientist
      https://twitter.com/_krisjack
  2. Overview
      ➔ What's Mendeley?
      ➔ What's Mendeley Suggest?
      ➔ Computation Layer
      ➔ Serving Layer
          ➔ Architecture
          ➔ Technologies
          ➔ Deployment
      ➔ Conclusions
  3. What's Mendeley?
  4. ➔ Mendeley is a platform that connects researchers, research data and apps
      [Diagram label: Mendeley Open API]
  5. ➔ Mendeley is a platform that connects researchers, research data and apps
      [Diagram label: Mendeley Open API]
      ➔ Startup company with ~20 R&D engineers
  6. What's Mendeley Suggest?
  7. Use Case
      ➔ Good researchers are on top of their game
      ➔ Difficult with the amount being produced
      ➔ There must be a technology that can help
      ➔ Help researchers by recommending relevant research
  8. Mendeley Suggest
  9. Computation Layer
  10. Mendeley Suggest
  11. Mendeley Suggest
  12. Mendeley Suggest
  13. Running on Amazon's Elastic Map Reduce
      On demand use and easy to cost
  14-27. Computation Layer: Mahout's Performance (1.5M Users, 50M Articles)
      [Chart built up over slides 14 to 27. x-axis: No. Good Recommendations/10 (0 to 3); y-axis: Normalised Amazon Hours (0 to 7K). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]
      ➔ Orig. item-based: 6.5K hours, 1.5 good recommendations
      ➔ Cust. item-based: 2.4K hours, 1.5 good recommendations
          ➔ -4.1K hours (63%) vs. orig. item-based; labelled "Partitioners" and "MR allocation"
      ➔ Orig. user-based: 1K hours, 2.5 good recommendations
          ➔ -1.4K hours (58%) and +1 good recommendation (67%) vs. cust. item-based
      ➔ Cust. user-based: 0.3K hours, 2.5 good recommendations
          ➔ -0.7K hours (70%) vs. orig. user-based
      ➔ Overall, orig. item-based to cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
  28. Mahout as the Computation Layer
      ➔ Out of the box, didn't work so well for us
      ➔ Needed to understand Hadoop better
      ➔ Contributed patch back to community (user-user)
      ➔ Next step, the serving layer...
  29. Serving Layer
  30. Architecture
      [Diagram. Computation Layer: Mendeley User Libraries, Hadoop Cluster, Cascading]
  31. Architecture
      [Diagram. Serving Layer (AWS): three Elastic Beanstalk boxes, DynamoDB. Computation Layer: Mendeley User Libraries, Hadoop Cluster, Map Reduce]
  32. Technologies
      ➔ Spring dependency injection framework
          ➔ Context-wide integration testing is easy, including pre-loading of test data
          ➔ Allows other Spring features (cache, security, messaging)
      ➔ Spring MVC 3.2.M1
          ➔ Annotated controllers, type conversion for free
          ➔ Asynchronous Servlet 3.0 supports thread parking
      ➔ AlternatorDB
          ➔ In-memory DynamoDB implementation for testing
  33. Technologies
      [Class diagram: Recommendation<K>, LongRecommendation, UuidRecommendation, GroupRecommendation, PersonRecommendation, DocumentRecommendation]
      ➔ Build once, employ in several use cases
  34. Deployment
      ➔ AWS Elastic Beanstalk
          ➔ Managed, auto-scaling, health-checking .war container
      ➔ Jenkins continuous integration (CI) server
      ➔ Maven build tool (useful dependency management)
      ➔ beanstalk-maven-plugin (push a button to deploy)
          ➔ Deploys to Elastic Beanstalk
          ➔ Replaces existing application version if required
          ➔ Zero downtime updates (tested at ~300ms)
          ➔ Triggered by Jenkins
  35. Putting it all together... $$$
      ➔ Real-time article recommendations for 2 million users
      ➔ 20 requests per second
      ➔ $65.84/month
          ➔ $34.24 Elastic Beanstalk
          ➔ $28.17 DynamoDB
          ➔ $2.76 bandwidth
      ➔ $30 to update the computation layer periodically
  36. Conclusions
  37. Conclusions
      ➔ Mendeley Suggest is a personalised article recommender
      ➔ Built by small team for big data
      ➔ Uses Mahout as computation layer
          ➔ Needs some love out of the box
      ➔ Serves from AWS
          ➔ Reduces maintenance costs and is reliable
      ➔ Intend to release Mendeley Suggest to all users this year
  38. We're Hiring!
      ➔ Data Scientist
          ➔ apply recommender technologies to Mendeley's data
          ➔ work on improving the quality of Mendeley's research catalogue
          ➔ starting in first quarter of 2013
          ➔ 6 month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
      ➔ http://www.mendeley.com/careers/
  39. www.mendeley.com
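
Slide 33 names a generic `Recommendation<K>` type alongside `LongRecommendation`, `UuidRecommendation`, `GroupRecommendation`, `PersonRecommendation` and `DocumentRecommendation`, under the note "build once, employ in several use cases". The slide text shows neither fields nor the inheritance arrows, so the sketch below is only one plausible reading and not Mendeley's code. It assumes that `K` is the identifier type of the recommended item, that each recommendation carries a score, that documents and groups are keyed by UUID, and that people are keyed by a numeric ID. The outer class `RecommendationSketch` and the helper `topN` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch: field names, key types and inheritance are assumptions,
// chosen only to illustrate a key-generic recommendation type.
final class RecommendationSketch {

    // Generic base: an item identifier of type K plus a relevance score.
    static class Recommendation<K> {
        private final K itemId;
        private final double score;

        public Recommendation(K itemId, double score) {
            this.itemId = itemId;
            this.score = score;
        }

        public K getItemId() { return itemId; }
        public double getScore() { return score; }
    }

    // Key-type specialisations.
    static class LongRecommendation extends Recommendation<Long> {
        public LongRecommendation(Long itemId, double score) { super(itemId, score); }
    }

    static class UuidRecommendation extends Recommendation<UUID> {
        public UuidRecommendation(UUID itemId, double score) { super(itemId, score); }
    }

    // Use-case specialisations (the choice of key type per use case is an assumption).
    static class DocumentRecommendation extends UuidRecommendation {
        public DocumentRecommendation(UUID itemId, double score) { super(itemId, score); }
    }

    static class GroupRecommendation extends UuidRecommendation {
        public GroupRecommendation(UUID itemId, double score) { super(itemId, score); }
    }

    static class PersonRecommendation extends LongRecommendation {
        public PersonRecommendation(Long itemId, double score) { super(itemId, score); }
    }

    // Written once against the base type, usable for every use case:
    // returns the n highest-scoring recommendations, best first.
    static <R extends Recommendation<?>> List<R> topN(List<R> recs, int n) {
        List<R> sorted = new ArrayList<>(recs);
        sorted.sort(Comparator.comparingDouble((R r) -> r.getScore()).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<PersonRecommendation> people = new ArrayList<>();
        people.add(new PersonRecommendation(1L, 0.2));
        people.add(new PersonRecommendation(2L, 0.9));
        people.add(new PersonRecommendation(3L, 0.5));
        for (PersonRecommendation p : topN(people, 2)) {
            System.out.println(p.getItemId() + " -> " + p.getScore());
        }
    }
}
```

Under these assumptions, code such as `topN` is written once against the base type and reused unchanged for documents, groups and people, which is one way to read "build once, employ in several use cases".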