Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley

I gave this presentation as part of the Big Data Week Conferences in London, 25th April, 2012.

Mendeley Suggest is a research article recommendation system powered by Mahout. This presentation explores how Mahout's distributed recommender works and how well it performs when applied to the problem of recommending research to Mendeley users. Based on experimentation, some tips are provided on how to speed Mahout up by tuning it to the characteristics of the training data set. A new recommendation algorithm is also presented that implements user-based collaborative filtering, complementing Mahout's existing item-based collaborative filtering algorithm. The user-based implementation will soon be contributed back to the Mahout community.

Published in: Education, Technology

Transcript

  • 1. Mahout becomes a researcher. Kris Jack, PhD, Senior Data Mining Engineer
  • 2. Overview ➔ What's Mendeley? ➔ Applications of Mahout's Recommender ➔ Under Mahout's Bonnet ➔ Mahout's Research Career so Far ➔ Conclusions
  • 3. What's Mendeley?
  • 4. ➔ Mendeley is a data platform for researchers ➔ We're bringing together researchers and the research that they produce from all over the world ➔ We're structuring this data in a machine-readable format ➔ We're opening this data up for you to build applications on top of it using our API ➔ These applications help researchers to do even better research and become more productive ➔ How are we building our community?
  • 5. Mendeley provides tools to help users... ...organise their research ➔ Reference management ➔ Cite-as-you-write ➔ Full-text article search ➔ Digitalised annotations
  • 6. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ➔ Research network ➔ Professional research groups
  • 7. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research ➔ Mendeley Suggest ➔ Personalised article recommendations ➔ Weekly batch of 10 recommended articles ➔ Collaborative Filtering ➔ The more data, the better
  • 8. 1.5 million+ users; 50m research articles; the 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  • 9. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research. We need a recommender that scales up, coping with our data and future growth
  • 10. Applications of Mahout's Recommender
  • 11. Mahout use cases: ➔ Retrieve related items in large collections http://www.slideshare.net/kryton/the-data-layer
  • 12. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/
  • 13. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm http://www.speeddate.com/apps/site/views/mp/technology.php
  • 14. Mahout use cases: ➔ Retrieve related items in large collections ➔ Discover relevant items that you may have overlooked ➔ Find love! ➔ Mahout implements collaborative filtering, a surprisingly powerful algorithm ➔ Mendeley Suggest ➔ Discover new research ➔ Fill in gaps in your library ➔ Your personal advisor http://krisjack.blogspot.co.uk/2012/02/your-very-own-personalised-research.html
  • 15. Under Mahout's Bonnet
  • 16. Generating recommendations through matrix multiplication. This is item-based recommendation, as similarity is based on items, not users. Not convinced? Try reading these... Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. Piscataway, NJ, USA. http://www.slideshare.net/srowen/collaborative-filtering-at-scale-2 http://krisjack.blogspot.co.uk/2012/04/under-bonnet-of-mahouts-item-based.html
  • 17. Input (all user preferences): a matrix of Research Articles (Comp Sci 1, Comp Sci 2, Physics 1, Physics 2) by Researchers (Turing, Babbage, Einstein, Newton)
  • 18. Input (all user preferences): the same matrix at scale, with 1.5M researchers, 50M research articles and 300M prefs
  • 19. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user), Research Articles by Researchers
  • 20. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user); A User's Preferences (item x user), Research Articles for Turing
  • 21. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user); Item Similarity (item x item), Research Articles by Research Articles, with rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2; A User's Preferences (item x user) for Turing
  • 22. Input (all user preferences) and the resulting item similarity matrix, Research Articles by Research Articles (rows and columns: Comp Sci 1, Comp Sci 2, Physics 1, Physics 2): Comp Sci 1: 2 1 0 0; Comp Sci 2: 1 1 0 0; Physics 1: 0 0 2 2; Physics 2: 0 0 2 2
  • 23. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user); Item Similarity (item x item), rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2, X A User's Preferences (item x user) for Turing = Recommendations (item x user) for Turing
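The multiplication on slides 21 to 23 can be sketched in plain Java. This is a minimal, single-machine illustration, not Mahout's distributed RecommenderJob. The transcript does not preserve the preference matrix itself, so the 0/1 values below are an assumption chosen to reproduce the item similarity matrix shown on slide 22 (here similarity is a simple co-occurrence count). The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ItemBasedSketch {
    /** sim[i][j] = number of users who hold both item i and item j (prefs is item x user, 0/1). */
    static int[][] itemSimilarity(int[][] prefs) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[][] sim = new int[items][items];
        for (int i = 0; i < items; i++) {
            for (int j = 0; j < items; j++) {
                for (int u = 0; u < users; u++) {
                    sim[i][j] += prefs[i][u] * prefs[j][u];
                }
            }
        }
        return sim;
    }

    /** Item similarity (item x item) multiplied by one user's preference column. */
    static int[] scores(int[][] sim, int[][] prefs, int user) {
        int items = prefs.length;
        int[] out = new int[items];
        for (int i = 0; i < items; i++) {
            for (int j = 0; j < items; j++) {
                out[i] += sim[i][j] * prefs[j][user];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed preferences; columns are Turing, Babbage, Einstein, Newton.
        int[][] prefs = {
            {1, 1, 0, 0}, // Comp Sci 1
            {0, 1, 0, 0}, // Comp Sci 2
            {0, 0, 1, 1}, // Physics 1
            {0, 0, 1, 1}, // Physics 2
        };
        int[][] sim = itemSimilarity(prefs);
        System.out.println(Arrays.deepToString(sim));               // [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
        System.out.println(Arrays.toString(scores(sim, prefs, 0))); // Turing: [2, 1, 0, 0]
    }
}
```

Under these assumed preferences Turing already holds Comp Sci 1, so the highest-scoring unseen article is Comp Sci 2.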
  • 24. Running on Amazon's Elastic MapReduce. On-demand use and easy to cost
  • 25. Mahout's Research Career so Far
  • 26. Mendeley Suggest
  • 27. Mahout's Performance. Chart of Normalised Amazon Hours against No. Good Recommendations/10
  • 28-30. Mahout's Performance. Chart of Normalised Amazon Hours against No. Good Recommendations/10, with quadrants Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good
  • 31. Mahout's Performance. Same chart with axis ranges: Normalised Amazon Hours from 0 to 7K, No. Good Recommendations/10 from 0 to 3
  • 32. Mahout's Performance. Same chart. Orig. item-based: 6.5K hours, 1.5 good recommendations
  • 33. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5
  • 34. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Change: -4.1K hours (63%)
  • 35. Reducing processing time and cost ➔ Mahout's recommender is already efficient ➔ but your data may have unusual properties ➔ We got improvements by: ➔ tuning Hadoop's mapper and reducer allocation over the 10 steps in the RecommenderJob ➔ using an appropriate partitioner
  • 36. Task Allocation. 37 hours to complete. 1 reducer allocated, despite having 48 available...
  • 37. Task Allocation. Allocating more reducers on a per job basis: job.getConfiguration().setInt("mapred.reduce.tasks", numMappers); Allocating more mappers on a per job basis: job.getConfiguration().set("mapred.max.split.size", String.valueOf(splitSize));
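The slide sets `mapred.max.split.size` but does not say how `splitSize` was chosen. A minimal sketch of one way to derive it, assuming file-based input where the number of map tasks is roughly the input size divided by the split size: pick a target mapper count and round up. The helper names and the 10 GiB input are hypothetical, and the real task count also depends on file boundaries, the block size and the input format.

```java
final class SplitSizeSketch {
    /** Split size in bytes that yields at most targetMappers splits for inputBytes (ceiling division). */
    static long splitSizeFor(long inputBytes, int targetMappers) {
        if (inputBytes <= 0 || targetMappers <= 0) {
            throw new IllegalArgumentException("inputBytes and targetMappers must be positive");
        }
        return (inputBytes + targetMappers - 1) / targetMappers;
    }

    /** Approximate number of splits produced by a given split size. */
    static long splitsFor(long inputBytes, long splitSize) {
        return (inputBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long inputBytes = 10L * 1024 * 1024 * 1024; // hypothetical 10 GiB of input
        long splitSize = splitSizeFor(inputBytes, 40);
        // The value that would be passed as String.valueOf(splitSize) on slide 37.
        System.out.println("mapred.max.split.size=" + splitSize
                + " gives about " + splitsFor(inputBytes, splitSize) + " mappers");
    }
}
```

Because Hadoop caps a split at the smaller of the maximum split size and the block size, this setting only increases the mapper count when the computed value is below the block size.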
  • 38. Task Allocation. 37 hours to complete ➔ 14 hours. From 1 → 40 reducers
  • 39. Partitioners. 14 hours to complete
  • 40. Partitioners. 14 hours to complete. ~50KB vs. ~500MB
  • 41. InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(...); InputSampler.writePartitionFile(conf, sampler); conf.setPartitionerClass(TotalOrderPartitioner.class); http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
  • 42. Partitioners. 14 hours to complete ➔ 2 hours. Evenly distributed
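The idea behind sampling keys and then partitioning by range, as on slide 41, can be illustrated without Hadoop. The sketch below is not Hadoop's implementation: it takes a sample of keys, picks evenly spaced cut points from the sorted sample, and routes each key to the range it falls in. The skewed key set (all multiples of 4, which a modulo-style hash sends to a single partition out of 4) is a made-up example and not the cause reported in the talk. All names are hypothetical.

```java
import java.util.Arrays;

final class RangePartitionSketch {
    /** Picks numPartitions - 1 cut points at evenly spaced positions of the sorted sample. */
    static int[] cutPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] cuts = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return cuts;
    }

    /** Range partition: index of the first cut point greater than key. */
    static int partitionFor(int key, int[] cuts) {
        int p = 0;
        while (p < cuts.length && key >= cuts[p]) {
            p++;
        }
        return p;
    }

    /** Number of keys landing in each range partition. */
    static int[] partitionSizes(int[] keys, int[] cuts) {
        int[] sizes = new int[cuts.length + 1];
        for (int key : keys) {
            sizes[partitionFor(key, cuts)]++;
        }
        return sizes;
    }

    /** Number of keys landing in each partition under key % numPartitions (non-negative keys). */
    static int[] hashSizes(int[] keys, int numPartitions) {
        int[] sizes = new int[numPartitions];
        for (int key : keys) {
            sizes[key % numPartitions]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 4 * i; // hypothetical skew: every key is a multiple of 4
        }
        int[] sample = new int[100];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = keys[10 * i]; // deterministic stand-in for a random sample
        }
        int[] cuts = cutPoints(sample, 4);
        System.out.println("hash:  " + Arrays.toString(hashSizes(keys, 4)));         // [1000, 0, 0, 0]
        System.out.println("range: " + Arrays.toString(partitionSizes(keys, cuts))); // [250, 250, 250, 250]
    }
}
```

Range partitioning is only as even as the sample is representative, which is why the sampler runs over the real input before the job starts.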
  • 43. Mahout's Performance. Chart of Normalised Amazon Hours (0 to 7K) against No. Good Recommendations/10 (0 to 3), with quadrants Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Change: -4.1K hours (63%)
  • 44. item.RecommenderJob: 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user); Item Similarity (item x item), rows 2 1 0 0 / 1 1 0 0 / 0 0 2 2 / 0 0 2 2, X A User's Preferences (item x user) for Turing = Recommendations (item x user) for Turing
  • 45. The same diagram with item.RecommenderJob relabelled "user": 1. Prep. pref. matrix (steps 1-3) 2. Gen. sim. matrix (steps 4-6) 3. Multiply matrices (steps 7-10). Shown: All User Preferences (item x user); Item Similarity (item x item) relabelled User Similarity (user x user), with Researchers in place of Research Articles; A User's Preferences (item x user) for Turing; Recommendations (item x user) for Turing
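The slides do not show the user-based algorithm itself, so the following is only a guess at its shape: it mirrors the item-based multiplication but builds a user x user similarity matrix and multiplies the full preference matrix by one user's similarity column. Co-occurrence counts stand in for whatever similarity measure was actually used, the preference values are assumed toy data for the four researchers on slide 17, and all names are hypothetical.

```java
import java.util.Arrays;

final class UserBasedSketch {
    /** sim[u][v] = number of items that users u and v both hold (prefs is item x user, 0/1). */
    static int[][] userSimilarity(int[][] prefs) {
        int items = prefs.length;
        int users = prefs[0].length;
        int[][] sim = new int[users][users];
        for (int u = 0; u < users; u++) {
            for (int v = 0; v < users; v++) {
                for (int i = 0; i < items; i++) {
                    sim[u][v] += prefs[i][u] * prefs[i][v];
                }
            }
        }
        return sim;
    }

    /** All preferences (item x user) multiplied by one user's column of the similarity matrix. */
    static int[] scores(int[][] prefs, int[][] userSim, int user) {
        int[] out = new int[prefs.length];
        for (int i = 0; i < prefs.length; i++) {
            for (int v = 0; v < userSim.length; v++) {
                out[i] += prefs[i][v] * userSim[v][user];
            }
        }
        return out;
    }

    /** Index of the highest-scoring item the user does not already hold, or -1 if there is none. */
    static int bestUnseen(int[] scores, int[][] prefs, int user) {
        int best = -1;
        for (int i = 0; i < scores.length; i++) {
            if (prefs[i][user] == 0 && scores[i] > 0 && (best < 0 || scores[i] > scores[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed preferences; columns are Turing, Babbage, Einstein, Newton.
        int[][] prefs = {
            {1, 1, 0, 0}, // Comp Sci 1
            {0, 1, 0, 0}, // Comp Sci 2
            {0, 0, 1, 1}, // Physics 1
            {0, 0, 1, 1}, // Physics 2
        };
        int[][] sim = userSimilarity(prefs);
        int[] turing = scores(prefs, sim, 0);
        System.out.println(Arrays.toString(turing));       // [2, 1, 0, 0]
        System.out.println(bestUnseen(turing, prefs, 0));  // 1, i.e. Comp Sci 2
    }
}
```

With 1.5M users and 50M articles the user x user matrix is far smaller than the item x item one, which may be part of why a user-based job could be cheaper to run, though the slides do not state the reason.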
  • 46. Mahout's Performance. Chart of Normalised Amazon Hours (0 to 7K) against No. Good Recommendations/10 (0 to 3), with quadrants Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Orig. user-based ➔ 1K, 2.5
  • 47. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Orig. user-based ➔ 1K, 2.5. Change from cust. item-based to orig. user-based: -1.4K hours (58%), +1 good recommendation (67%)
  • 48. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Orig. user-based ➔ 1K, 2.5. Cust. user-based ➔ 0.3K, 2.5
  • 49. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5, a change of -4.1K hours (63%). Orig. user-based ➔ 1K, 2.5. Cust. user-based ➔ 0.3K, 2.5, a change of -0.7K hours (70%)
  • 50. Mahout's Performance. Same chart. Orig. item-based: 6.5K, 1.5. Cust. item-based ➔ 2.4K, 1.5. Orig. user-based ➔ 1K, 2.5. Cust. user-based ➔ 0.3K, 2.5. Change from orig. item-based to cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
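The percentages annotated on these charts follow from the plotted points. As a check, each change is taken relative to the starting value, with hours in thousands of normalised Amazon hours and quality in good recommendations out of 10:

```latex
\begin{align*}
\text{orig.\ item-based} \to \text{cust.\ item-based:} \quad & \frac{6.5 - 2.4}{6.5} = \frac{4.1}{6.5} \approx 63\% \text{ fewer hours} \\
\text{cust.\ item-based} \to \text{orig.\ user-based:} \quad & \frac{2.4 - 1.0}{2.4} = \frac{1.4}{2.4} \approx 58\% \text{ fewer hours} \\
\text{orig.\ user-based} \to \text{cust.\ user-based:} \quad & \frac{1.0 - 0.3}{1.0} = 70\% \text{ fewer hours} \\
\text{orig.\ item-based} \to \text{cust.\ user-based:} \quad & \frac{6.5 - 0.3}{6.5} = \frac{6.2}{6.5} \approx 95\% \text{ fewer hours} \\
\text{item-based} \to \text{user-based quality:} \quad & \frac{2.5 - 1.5}{1.5} = \frac{1}{1.5} \approx 67\% \text{ more good recommendations}
\end{align*}
```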
  • 51. Conclusions
  • 52. Conclusions ➔ Mahout is doing a great job of powering Mendeley Suggest ➔ Large scale data set ➔ Excellent for batch processing requirements ➔ We'll soon be feeding our user-based implementation into Mahout ➔ User-based can outperform item-based ➔ Makes Mahout's offering more rounded ➔ Save resources and money by understanding your data ➔ Help Hadoop with task allocation if necessary ➔ Partition your data appropriately
  • 53. We're Hiring! ➔ Hadoop Data Architect ➔ design a coherent data model across the company ➔ take ownership of our data ➔ hands-on Hadoop administration ➔ Marie Curie Senior Research Fellow ➔ ensure that Mendeley's research catalogue is of high quality ➔ research and development opportunity ➔ £500 Finder's Fee if you find someone who we hire ➔ http://www.mendeley.com/careers/
  • 54. www.mendeley.com
