Collaborative Filtering in Map/Reduce

Published in: Technology


  1. 1. Collaborative Filtering in Map/Reduce Ole-Martin Mørk - Open AdExchange Tuesday 14 September 2010
  2. 2. Vision • Learn that Map/Reduce is simple • Learn that Map/Reduce may be powerful • Collaborative Filtering is fun!
  3. 3. Agenda • Map/Reduce • Collaborative Filtering • Collaborative Filtering with Map/Reduce • Amazon Elastic MapReduce
  4. 4. Map/Reduce
  5. 5. Map/Reduce • Very scalable algorithm • Inspired by map and reduce from functional programming • Everything is based on key/value pairs
  6. 6. 6 phases • Reader • Map • Partition • Comparison • Reduce • Writer
  7. 7. 6 phases • Reader • Map • Partition • Comparison • Reduce • Writer
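
The six phases listed on these slides can be sketched as one small in-memory pipeline. This is only an illustration under stated assumptions, not how Hadoop is implemented: the object `SixPhases`, the method `run`, and all parameter names are hypothetical, and the returned sequence stands in for the Writer phase.

```scala
object SixPhases {
  // Runs one in-memory job: Reader -> Map -> Partition -> Comparison -> Reduce -> Writer.
  // All names here are illustrative; a real framework distributes these steps.
  def run[K1, V1, K2: Ordering, V2, V3](
      reader: () => Seq[(K1, V1)],
      mapper: (K1, V1) => Seq[(K2, V2)],
      reducer: (K2, Seq[V2]) => (K2, V3),
      numReducers: Int
  ): Seq[(K2, V3)] = {
    // Reader + Map: every input pair yields zero or more intermediate pairs.
    val mapped: Seq[(K2, V2)] = reader().flatMap { case (k, v) => mapper(k, v) }
    // Partition: decide which reducer receives each key.
    val partitions: Map[Int, Seq[(K2, V2)]] =
      mapped.groupBy { case (k, _) => (k.hashCode & Int.MaxValue) % numReducers }
    // Comparison + Reduce: inside each partition, sort and group by key, then reduce.
    val reduced: Seq[(K2, V3)] = partitions.toSeq.sortBy(_._1).flatMap { case (_, pairs) =>
      val grouped = pairs.groupBy(_._1).toSeq.sortBy(_._1)
      grouped.map { case (k, kvs) => reducer(k, kvs.map(_._2)) }
    }
    // Writer: here the caller simply receives the result.
    reduced
  }
}
```

Passing a reader that yields numbered lines, a mapper that emits `(word, 1)`, and a reducer that sums the values turns this into the word count used in the following slides.
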
  8. 8. Map
  9. 9. functional map List("hello", "dude").map { x => x.substring(0, 1) }
  10. 10. Map/Reduce map • Input is key/value • Output is key/value
  11. 11. Simple Example, Map • Count occurrences of words in a document • Input is: <linenumber>, <content of line> • For each word on the line, the output is <word>, <count>
  12. 12. Map
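
The map step of the word-count example might look roughly like this in plain Scala. It is a sketch only: `WordCountMap` is a hypothetical name, and lower-casing and splitting on non-word characters are assumptions the slides do not specify.

```scala
object WordCountMap {
  // Map: <lineNumber, content of line> -> one <word, 1> pair per word on the line.
  // The line number is part of the input key but is not needed for counting.
  def map(lineNumber: Long, line: String): Seq[(String, Int)] =
    line.toLowerCase
      .split("\\W+")        // assumption: words are separated by non-word characters
      .toSeq
      .filter(_.nonEmpty)   // a leading separator would otherwise yield an empty word
      .map(word => (word, 1))
}
```

Each word is emitted with a count of 1; the summing is left to the reduce step.
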
  13. 13. Reduce
  14. 14. functional reduce val sum=List(32,40,23).reduceLeft{_+_}
  15. 15. Map/Reduce reduce • Input is key/list of values • Output is key/value
  16. 16. Simple Example, Reduce • Reduce input is <word, counts> • For each value we increase the count • Output is <word>, <sum of counts>
  17. 17. Reduce
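
The reduce step of the word-count example can be sketched the same way. `WordCountReduce` and `shuffle` are hypothetical names; `shuffle` only stands in for the grouping the framework performs between map and reduce.

```scala
object WordCountReduce {
  // Stand-in for the framework's grouping step: collect all counts emitted per word.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => (word, kvs.map(_._2)) }

  // Reduce: <word, counts> -> <word, sum of counts>.
  def reduce(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.foldLeft(0)(_ + _))
}
```

Feeding the mapper output `("hello", 1), ("dude", 1), ("hello", 1)` through `shuffle` and then `reduce` gives one total per word.
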
  18. 18. Collaborative Filtering
  19. 19. Amazon
  20. 20. Last.fm
  21. 21. Sceneami.com
  22. 22. User based • Useful when we have • Small number of users • High correlation between users • Data that changes often
  23. 23. Item based • Useful for big sites like Amazon etc. • Small overlap between users • Mostly static data
  24. 24. Euclidean Distance Rating Match Min drømmeapplikasjon Match Rating Pattern Matching in Scala
  25. 25. Euclidean Distance • Alf's presentations: 1, 25, 56, 57, 58, 98 (6) • Kari's presentations: 2, 25, 98, 99 (4) • Equal presentations: 25 and 98 (2) • Unmatched presentations: (6-2) + (4-2) = 6 • Distance score: 1/(1+sqrt(6)) = 0.29
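
The arithmetic on this slide can be checked with a few lines of Scala. This is a minimal sketch: `Similarity.score` is a hypothetical name, and each user is assumed to be represented as a set of presentation ids.

```scala
object Similarity {
  // Score = 1 / (1 + sqrt(unmatched)), where unmatched counts the presentations
  // that only one of the two users has chosen.
  def score(a: Set[Int], b: Set[Int]): Double = {
    val matched = (a intersect b).size
    val unmatched = (a.size - matched) + (b.size - matched)
    1.0 / (1.0 + math.sqrt(unmatched.toDouble))
  }
}
```

With Alf's and Kari's presentations from the slide, two presentations match and six do not, which gives a score of about 0.29. Identical sets score 1.0.
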
  26. 26. Recommended sessions • Me: 1, 2, 5, 6, 7 • Kate (0.31): 5, 6, 8, 9 • Paul (0.41): 1, 2, 4, 5, 6 • Mary (0.31): 1, 5, 8, 9
  27. 27. Recommended sessions • Me: 1, 2, 5, 6, 7 • Kate (0.31): 5, 6, 8, 9 • Paul (0.41): 1, 2, 4, 5, 6 • Mary (0.31): 1, 5, 8, 9 • Recommended: 8 (0.62)
  28. 28. Recommended sessions • Me: 1, 2, 5, 6, 7 • Kate (0.31): 5, 6, 8, 9 • Paul (0.41): 1, 2, 4, 5, 6 • Mary (0.31): 1, 5, 8, 9 • Recommended: 8 (0.62), 9 (0.62)
  29. 29. Recommended sessions • Me: 1, 2, 5, 6, 7 • Kate (0.31): 5, 6, 8, 9 • Paul (0.41): 1, 2, 4, 5, 6 • Mary (0.31): 1, 5, 8, 9 • Recommended: 8 (0.62), 9 (0.62), 4 (0.41)
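
The numbers on these slides are consistent with a simple rule: for every session I have not picked, add up the similarity scores of the users who did pick it. The sketch below assumes that rule. `Recommender.recommend` is a hypothetical name, and ordering ties by session number is an arbitrary choice.

```scala
object Recommender {
  // For each session not in `mine`, sum the scores of the neighbours who chose it.
  // Result is ordered by total score (highest first), then by session number.
  def recommend(mine: Set[Int], neighbours: Seq[(Double, Set[Int])]): Seq[(Int, Double)] = {
    val scored = for {
      (score, sessions) <- neighbours
      session <- sessions.toSeq if !mine.contains(session)
    } yield (session, score)
    scored.groupBy(_._1).toSeq
      .map { case (session, hits) => (session, hits.map(_._2).sum) }
      .sortWith((x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1))
  }
}
```

With the data from the slide, sessions 8 and 9 each collect 0.31 from Kate and 0.31 from Mary, and session 4 collects 0.41 from Paul.
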
  30. 30. Demo
  31. 31. More Map/Reduce
  32. 32. Several iterations Iteration 1 Iteration 2 Iteration 3
  33. 33. Several iterations Iteration 1 Iteration 2 Iteration 3
  34. 34. Partitioning Paul Mary Kate Lea Jeff Ali Ali Jeff Lea Kate Paul Mary Reducer Reducer
  35. 35. Comparison Pres 1 Pres 2 Paul Lea Ali Jeff Mary Kate Paul Kate Pres 1 Ali Pres 2 Jeff Lea Mary Reducer Reducer
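
The partitioning and comparison slides show keys being split across two reducers and ordered within each. A rough sketch of that idea follows. The hash formula matches Hadoop's default `HashPartitioner`, while `Partitioning`, `partition`, and `assign` are hypothetical names, and sorting stands in for the comparison phase.

```scala
object Partitioning {
  // Non-negative hash of the key modulo the number of reducers.
  def partition(key: String, numReducers: Int): Int =
    (key.hashCode & Int.MaxValue) % numReducers

  // Assign every key to a reducer, then sort the keys each reducer receives.
  def assign(keys: Seq[String], numReducers: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => partition(k, numReducers))
      .map { case (reducer, ks) => (reducer, ks.sorted) }
}
```

Which reducer a given name lands on depends on its hash code, so the exact split may differ from the one drawn on the slide. What matters is that the same key always goes to the same reducer.
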
  36. 36. Guidelines • Never access external sources during computation. • Your functions should be small and fast • You might not have all the data available
  37. 37. Hadoop • Hadoop reuses objects, so remember to clone them if you plan to keep them. • You can read and write all objects implementing hadoop.WritableComparable • write(DataOutput) • readFields(DataInput) • compareTo(Object)
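
To show what implementing the three listed methods involves without needing Hadoop on the classpath, the sketch below uses a hypothetical stand-in trait, `WritableLike`, that mirrors them. `UserPair` and `UserPairDemo` are also made-up names. With Hadoop available, one would implement `org.apache.hadoop.io.WritableComparable` instead.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInput, DataInputStream, DataOutput, DataOutputStream}

// Hypothetical stand-in mirroring write, readFields and compareTo.
trait WritableLike[T] extends Comparable[T] {
  def write(out: DataOutput): Unit
  def readFields(in: DataInput): Unit
}

// A pair of user names that can serialize itself and be ordered.
class UserPair(var first: String, var second: String) extends WritableLike[UserPair] {
  def this() = this("", "") // no-arg constructor so an empty instance can be filled in

  def write(out: DataOutput): Unit = { out.writeUTF(first); out.writeUTF(second) }

  def readFields(in: DataInput): Unit = { first = in.readUTF(); second = in.readUTF() }

  def compareTo(other: UserPair): Int = {
    val c = first.compareTo(other.first)
    if (c != 0) c else second.compareTo(other.second)
  }

  // Explicit copy: if the framework reuses this instance, keep a copy, not a reference.
  def copy(): UserPair = new UserPair(first, second)
}

object UserPairDemo {
  // Serialize to bytes and read back into a fresh instance.
  def roundTrip(p: UserPair): UserPair = {
    val bytes = new ByteArrayOutputStream()
    p.write(new DataOutputStream(bytes))
    val restored = new UserPair()
    restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    restored
  }
}
```

The `copy()` method reflects the first bullet on the slide: because `readFields` overwrites the fields of an existing object, anything kept across calls should be a copy.
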
  38. 38. Collaborative Filtering, the Map/Reduce way
  39. 39. Overview • Create an application that recommends JavaZone presentations. • Overall goal: Scalable performance • 4 iterations • Reading input from text file
  40. 40. Iteration 1 • Map input: <user>, <presentations> • Map output: <presentation>, <user> • Reduce output: <presentation>, <userList>
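
Iteration 1 inverts the input so that each presentation lists its users. An in-memory sketch of that step follows. It is not the talk's Hadoop code: `Iteration1Sketch` is a hypothetical name, and sorting the user list is only there to make the output deterministic.

```scala
object Iteration1Sketch {
  // Map: <user, presentations> -> one <presentation, user> pair per chosen presentation.
  def map(user: String, presentations: Seq[Int]): Seq[(Int, String)] =
    presentations.map(p => (p, user))

  // Reduce: <presentation, users> -> <presentation, userList>.
  def reduce(presentation: Int, users: Seq[String]): (Int, Seq[String]) =
    (presentation, users.sorted)

  // Glue code standing in for the framework: map, group by key, reduce.
  def run(input: Map[String, Seq[Int]]): Map[Int, Seq[String]] =
    input.toSeq
      .flatMap { case (user, ps) => map(user, ps) }
      .groupBy(_._1)
      .map { case (p, pairs) => reduce(p, pairs.map(_._2)) }
}
```

For Alf and Kari from the earlier slide, presentations 25 and 98 end up with both users in their list, and every other presentation with one.
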
  41. 41. Iteration 2 • Map input: <presentation>, <userList> • Map output: <user>, <userList> • Reduce input: <user>, <list of userList> • Reduce output: <userTuplet>, <match count>
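
Iteration 2 counts how many presentations each pair of users shares. The sketch below is one possible reading of the slide. `Iteration2Sketch` is a hypothetical name, and emitting each pair only once (with the alphabetically smaller name first) is an assumption, chosen because the next iteration emits the reversed pair itself.

```scala
object Iteration2Sketch {
  // Map: <presentation, userList> -> <user, userList> for every user in the list.
  def map(presentation: Int, users: Seq[String]): Seq[(String, Seq[String])] =
    users.map(u => (u, users))

  // Reduce: <user, list of userList> -> <(user, other), match count>.
  // Only pairs where user sorts before other are emitted (assumption, see lead-in).
  def reduce(user: String, userLists: Seq[Seq[String]]): Seq[((String, String), Int)] =
    userLists.flatten
      .filter(other => user.compareTo(other) < 0)
      .groupBy(identity).toSeq
      .map { case (other, hits) => ((user, other), hits.size) }

  // Glue code standing in for the framework: map, group by key, reduce.
  def run(input: Map[Int, Seq[String]]): Map[(String, String), Int] =
    input.toSeq
      .flatMap { case (p, us) => map(p, us) }
      .groupBy(_._1).toSeq
      .flatMap { case (u, pairs) => reduce(u, pairs.map(_._2)) }
      .toMap
}
```

Each time another user appears in one of my user lists we share one presentation, so counting occurrences gives the match count.
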
  42. 42. Iteration 3 • Map input: <userTuplet>, <match count> • Map output: <userTuplet>, <diff> • Map output: <userTuplet reversed>, <diff> • Reduce output: <user>, <similaruser>
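
The slide does not say how `<diff>` is computed. The sketch below assumes it is the number of unmatched presentations from the Euclidean Distance slide. That requires knowing how many presentations each user chose, so `totals` is passed in as hypothetical side data. `Iteration3Sketch` and the shape of the reduce output are likewise assumptions.

```scala
object Iteration3Sketch {
  // Map: <(a, b), match count> -> <(a, b), diff> and <(b, a), diff>.
  // diff = presentations chosen by only one of the two users (assumed definition).
  def map(pair: (String, String), matches: Int,
          totals: Map[String, Int]): Seq[((String, String), Int)] = {
    val (a, b) = pair
    val diff = (totals(a) - matches) + (totals(b) - matches)
    Seq(((a, b), diff), ((b, a), diff))
  }

  // Reduce: <(user, other), diff> -> <user, (other, similarity score)>.
  def reduce(pair: (String, String), diff: Int): (String, (String, Double)) =
    (pair._1, (pair._2, 1.0 / (1.0 + math.sqrt(diff.toDouble))))
}
```

Emitting the reversed pair means both Alf and Kari end up with the other as a similar user, each with the same score.
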
  43. 43. Iteration 4 • Map input: <user>, <similaruser> • Map output: <user>, <presentation with score> • Reduce output: <user>, <presentations>
  44. 44. Demo
  45. 45. Map/Reduce on EC2
  46. 46. Elastic Map/Reduce • Same code • Same input • Different configuration
  47. 47. Upload files s3cmd put oax-jz10:jar/oax-jz10.jar target/ oax.jz10.jar s3cmd.rb put oax-jz10:input/data.txt data.txt
  48. 48. Create job flow elastic-mapreduce --create --alive --log-uri s3n://oax-jz10/log
  49. 49. Register iterations elastic-mapreduce --jobflow j-1NLAIW45QUN4B --jar s3n://oax-jz10/jar/oax-jz10.jar --arg com.openadex.pres.iterations.Iteration1 --arg s3n://oax-jz10/input --arg s3n://oax-jz10/output1
  50. 50. Download output s3cmd.rb get oax-jz10:output4/part-00000 out
  51. 51. Demo
  52. 52. Summary • Map/Reduce may be simple • Map/Reduce can be really powerful • Collaborative filtering is fun :-)
  53. 53.
  54. 54. Thank you Ole-Martin Mørk olemartin@gmail.com twitter.com/olemartin del.icio.us/olemartin/jz10 All images are licensed with Creative Commons. See http://bit.ly/mr-photos for details.
