Mendeley Suggest: Engineering a Personalised Article Recommender System

Kris Jack
Chief Data Scientist
Sep. 13, 2012

  1. Mendeley Suggest: Engineering a Personalised Article Recommender System Kris Jack, PhD Chief Data Scientist https://twitter.com/_krisjack
  2. Overview ➔ What's Mendeley? ➔ What's Mendeley Suggest? ➔ Computation Layer ➔ Serving Layer ➔ Architecture ➔ Technologies ➔ Deployment ➔ Conclusions
  3. What's Mendeley?
  4-5. Mendeley is a platform that connects researchers, research data and apps (diagram label: Mendeley Open API) ➔ Startup company with ~20 R&D engineers
  6. What's Mendeley Suggest?
  7. Use Case ➔ Good researchers are on top of their game ➔ Difficult with the amount being produced ➔ There must be a technology that can help ➔ Help researchers by recommending relevant research
  8. Mendeley Suggest
  9. Computation Layer
  10. Mendeley Suggest
  11. Mendeley Suggest
  12. Mendeley Suggest
  13. Running on Amazon's Elastic Map Reduce ➔ On-demand use and easy to cost
  14-27. Computation Layer: Mahout's performance on 1.5M users and 50M articles (chart built up over slides 14 to 27; x-axis: no. of good recommendations out of 10, ticks 0 to 3; y-axis: normalised Amazon hours, ticks 0 to 7K; quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good) ➔ Orig. item-based: 6.5K hours, 1.5 good recommendations ➔ Cust. item-based: 2.4K hours, 1.5 good recommendations (-4.1K hours, 63%; partitioners, MR allocation) ➔ Orig. user-based: 1K hours, 2.5 good recommendations (-1.4K hours, 58%, and +1 good recommendation, 67%, relative to cust. item-based) ➔ Cust. user-based: 0.3K hours, 2.5 good recommendations (-0.7K hours, 70%) ➔ Overall, orig. item-based to cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
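The percentages annotated on the chart can be checked from the plotted points. The following is a minimal sketch of that arithmetic; the class and method names are made up for illustration, and the input figures are the ones read off the slides (normalised Amazon hours in thousands, good recommendations out of 10).

```java
public class ChartArithmetic {

    /** Percentage change from 'before' to 'after', rounded to the nearest whole percent. */
    public static long percentChange(double before, double after) {
        return Math.round(100.0 * (after - before) / before);
    }

    public static void main(String[] args) {
        // Cost figures from the slides, in thousands of normalised Amazon hours.
        double origItem = 6.5, custItem = 2.4, origUser = 1.0, custUser = 0.3;

        System.out.println("orig. item -> cust. item: " + percentChange(origItem, custItem) + "%");
        System.out.println("cust. item -> orig. user: " + percentChange(custItem, origUser) + "%");
        System.out.println("orig. user -> cust. user: " + percentChange(origUser, custUser) + "%");
        System.out.println("orig. item -> cust. user: " + percentChange(origItem, custUser) + "%");

        // Quality figures from the slides: good recommendations out of 10.
        System.out.println("quality 1.5 -> 2.5: +" + percentChange(1.5, 2.5) + "%");
    }
}
```

Each result matches the figure shown on the corresponding slide: 63%, 58%, 70% and 95% cost reductions, and a 67% gain in good recommendations.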
  28. Mahout as the Computation Layer ➔ Out of the box, didn't work so well for us ➔ Needed to understand Hadoop better ➔ Contributed patch back to community (user-user) ➔ Next step, the serving layer...
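To make the user-based idea concrete, here is a small, plain-Java sketch of user-based collaborative filtering over users' article libraries. It is not Mahout's API or Mendeley's customised implementation: the class name, the Jaccard similarity and the scoring rule are all assumptions chosen for brevity, and a production system would run this as distributed jobs rather than in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UserBasedSketch {

    /** Jaccard similarity between two libraries (sets of article ids). Assumed measure. */
    public static double jaccard(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Long> shared = new HashSet<>(a);
        shared.retainAll(b);
        int union = a.size() + b.size() - shared.size();
        return (double) shared.size() / union;
    }

    /** Scores unseen articles by the summed similarity of the users who hold them. */
    public static List<Long> recommend(long user, Map<Long, Set<Long>> libraries, int howMany) {
        Set<Long> own = libraries.get(user);
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> other : libraries.entrySet()) {
            if (other.getKey() == user) continue;
            double similarity = jaccard(own, other.getValue());
            if (similarity == 0.0) continue;
            for (Long article : other.getValue()) {
                if (!own.contains(article)) scores.merge(article, similarity, Double::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by article id for a stable order.
        ranked.sort(Comparator.comparingDouble((Long a) -> -scores.get(a))
                .thenComparing(Comparator.<Long>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    /** Toy data: user id -> article ids in that user's library (hypothetical). */
    public static Map<Long, Set<Long>> sampleLibraries() {
        Map<Long, Set<Long>> libraries = new HashMap<>();
        libraries.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));
        libraries.put(2L, new HashSet<>(Arrays.asList(10L, 11L, 12L)));
        libraries.put(3L, new HashSet<>(Arrays.asList(10L, 13L)));
        libraries.put(4L, new HashSet<>(Arrays.asList(20L)));
        return libraries;
    }

    public static void main(String[] args) {
        System.out.println(recommend(1L, sampleLibraries(), 2)); // prints [12, 13]
    }
}
```

User 2 shares two of three articles with user 1, so article 12 outranks article 13, which comes from the less similar user 3.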
  29. Serving Layer
  30. Architecture (diagram; labels: Mendeley, Hadoop Cluster, User Libraries, Cascading, Computation Layer)
  31. Architecture (diagram; labels: AWS, Elastic Beanstalk (three times), DynamoDB, Serving Layer, Mendeley, Hadoop Cluster, User Libraries, Map Reduce, Computation Layer)
  32. Technologies ➔ Spring dependency injection framework ➔ Context-wide integration testing is easy, including pre-loading of test data ➔ Allows other Spring features (cache, security, messaging) ➔ Spring MVC 3.2.M1 ➔ Annotated controllers, type conversion 'for free' ➔ Asynchronous Servlet 3.0 supports thread 'parking' ➔ AlternatorDB ➔ In-memory DynamoDB implementation for testing
  33. Technologies ➔ Class diagram: Recommendation<K>, LongRecommendation, UuidRecommendation, GroupRecommendation, PersonRecommendation, DocumentRecommendation ➔ Build once, employ in several use cases
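A generic base type is one way to get the "build once, employ in several use cases" effect. The sketch below is illustrative only: the class names come from the slide, but the fields, the `score` attribute, the `bestId` helper and the choice of parent for `DocumentRecommendation` are assumptions, since the transcript does not show how the types relate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class RecommendationSketch {

    /** Base type: an item key of type K plus a relevance score (the score field is hypothetical). */
    public static abstract class Recommendation<K> {
        private final K id;
        private final double score;

        protected Recommendation(K id, double score) {
            this.id = id;
            this.score = score;
        }

        public K getId() { return id; }
        public double getScore() { return score; }
    }

    public static class LongRecommendation extends Recommendation<Long> {
        public LongRecommendation(Long id, double score) { super(id, score); }
    }

    public static class UuidRecommendation extends Recommendation<UUID> {
        public UuidRecommendation(UUID id, double score) { super(id, score); }
    }

    /** Assumption: documents are keyed by UUID; the slide does not state each type's parent. */
    public static class DocumentRecommendation extends UuidRecommendation {
        public DocumentRecommendation(UUID id, double score) { super(id, score); }
    }

    /** Written once against the base type, reusable for any key type. */
    public static <K> K bestId(List<? extends Recommendation<K>> recommendations) {
        Recommendation<K> best = null;
        for (Recommendation<K> candidate : recommendations) {
            if (best == null || candidate.getScore() > best.getScore()) best = candidate;
        }
        return best == null ? null : best.getId();
    }

    public static void main(String[] args) {
        List<LongRecommendation> byLong = Arrays.asList(
                new LongRecommendation(1L, 0.2), new LongRecommendation(2L, 0.9));
        System.out.println(bestId(byLong)); // prints 2
    }
}
```

The same `bestId` helper works unchanged for UUID-keyed documents, which is the reuse the slide points at.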
  34. Deployment ➔ AWS ElasticBeanstalk ➔ Managed, auto-scaling, health-checking .war container ➔ Jenkins continuous integration (CI) server ➔ Maven build tool (useful dependency management) ➔ beanstalk-maven-plugin (push a button to deploy) ➔ Deploys to ElasticBeanstalk ➔ Replaces existing application version if required ➔ 'Zero downtime' updates (tested at ~300ms) ➔ Triggered by Jenkins
  35. Putting it all together... $$$ ➔ Real-time article recommendations for 2 million users ➔ 20 requests per second ➔ $65.84/month ➔ $34.24 ElasticBeanstalk ➔ $28.17 DynamoDB ➔ $2.76 bandwidth ➔ $30 to update the computation layer periodically
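The serving cost can be put on a per-request basis with some quick arithmetic. This sketch assumes the 20 requests per second is sustained around the clock over a 30-day month, which the slide does not state (it may describe peak load), so the per-request figure is only a rough lower bound on cost per request. Note that the three itemised lines add up to $65.17, so the sketch uses the stated $65.84 total for the per-request calculation.

```java
public class ServingCostSketch {

    /** Sum of the three itemised monthly costs from the slide, in cents to avoid rounding noise. */
    public static long itemisedCents() {
        return 3424 + 2817 + 276; // ElasticBeanstalk + DynamoDB + bandwidth
    }

    /** Requests served in a month at a constant rate (assumed sustained load). */
    public static long requestsPerMonth(int requestsPerSecond, int daysInMonth) {
        return (long) requestsPerSecond * 86_400L * daysInMonth;
    }

    /** Monthly cost divided by traffic, expressed per million requests. */
    public static double dollarsPerMillionRequests(double monthlyDollars, long requests) {
        return monthlyDollars / (requests / 1_000_000.0);
    }

    public static void main(String[] args) {
        long requests = requestsPerMonth(20, 30);
        System.out.println("Itemised total: $" + itemisedCents() / 100.0);
        System.out.println("Requests per month: " + requests);
        System.out.printf("Dollars per million requests: %.2f%n",
                dollarsPerMillionRequests(65.84, requests));
    }
}
```

Under these assumptions the serving layer handles about 51.8 million requests a month at roughly $1.27 per million requests.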
  36. Conclusions
  37. Conclusions ➔ Mendeley Suggest is a personalised article recommender ➔ Built by small team for big data ➔ Uses Mahout as computation layer ➔ Needs some love out of the box ➔ Serves from AWS ➔ Reduces maintenance costs and is reliable ➔ Intend to release Mendeley Suggest to all users this year
  38. We're Hiring! ➔ Data Scientist ➔ apply recommender technologies to Mendeley's data ➔ work on improving the quality of Mendeley's research catalogue ➔ starting in first quarter of 2013 ➔ 6 month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/) ➔ http://www.mendeley.com/careers/
  39. www.mendeley.com