Mahout in Action, Part 1. Yasmine M. Gaber, 28 February 2013
Agenda    Meet Apache Mahout    Part 1: Recommendation    Part 2: Clustering    Part 3: Classification
Meet Apache Mahout  It is an open source machine learning libraryfrom Apache    It is scalable    It is a Java library...
Famous Engines  Recommender engines: Amazon.com Netflix Dating sites like Líbímseti Social networking sites like Face...
Recommendations
Recommender Input    A preference consists of a user ID and an item    ID, user’s preference for the item    It is .csv ...
Create Recommender
Recommender Evaluation    Average difference vs Root-mean-square
Mahout RecommenderEvaluator
Precision and Recall
RecommenderIRStatsEvaluator
Representing Recommender Data    Preference object    −   new GenericPreference(123, 456, 3.0f)    Preference Array
Representing Recommender Data    Preference Array    FastByIDMap and FastIDSet
In-memory DataModels    GenericDataModel    File-based data    Refreshable components    Database-based data
Coping without preference values
Coping without preference values
User-based Recommender    The algorithmfor every item i that u has no preference for yet for every other user v that has ...
Recommender Components    Data model, implemented via DataModel    User-user similarity metric, implemented via    UserS...
GenericUserBasedRecommender
User Neighborhoods    Fixed-size neighborhoods    Threshold-based neighborhood
similarity metrics    Pearson correlation–based similarity    −   It is a number between –1 and 1 that measures        th...
similarity metrics    Euclidean distance similarity    −   1 / (1+euclidean distance)    Cosine measure similarity    − ...
Item-based recommendation    The algorithmfor every item i that u has no preference for yet for every item j that u has a...
GenericItemBasedRecommender
Slope-one recommender    The algorithmfor every item i the user u expresses no preference for for every item j that user ...
Taking Recommender to Production
User-based recommenders
Thank You               Contact at:Email: Yasmine.Gaber@espace.com.egTwitter: Twitter.com/yasmine_mohamed
Part one of a presentation about the Apache Mahout system. It is based on http://my.safaribooksonline.com/9781935182689/

Mahout part1

  1. Mahout in Action, Part 1. Yasmine M. Gaber, 28 February 2013
  2. Agenda: Meet Apache Mahout; Part 1: Recommendation; Part 2: Clustering; Part 3: Classification
  3. Meet Apache Mahout: It is an open source machine learning library from Apache. It is scalable. It is a Java library. It can be used with Hadoop to deal with large-scale data.
  4. Famous Engines. Recommender engines: Amazon.com, Netflix, dating sites like Líbímseti, social networking sites like Facebook. Clustering engines: Google News, search engines like Clusty. Classification engines: spam email filters, Google's Picasa, optical character recognition software, Apple's Genius feature in iTunes.
  5. Recommendations
  6. Recommender Input: A preference consists of a user ID, an item ID, and a value for the user's preference for the item. The input is a .csv file.
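A minimal sketch of reading such input, assuming each line has the form `userID,itemID,preference`. The class name `PreferenceCsv` and the sample rows are hypothetical, and this is plain Java for illustration rather than Mahout's own file loader.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: parses "userID,itemID,preference" lines.
final class PreferenceCsv {

    /** Returns userID -> (itemID -> preference value), skipping blank lines. */
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.trim().split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected userID,itemID,preference: " + line);
            }
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Made-up sample rows: two users, two items.
        List<String> sample = Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,2.0");
        System.out.println(parse(sample));
    }
}
```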
  7. Create Recommender
  8. Recommender Evaluation: Average difference vs. root-mean-square
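The two scores named on this slide can be sketched as follows. This is an illustrative plain-Java version, not Mahout's evaluator classes; the class name `EvaluationMetrics` and the sample numbers are assumptions. Both scores compare estimated preferences against actual ones, and lower is better; the root-mean-square score penalizes large misses more heavily.

```java
// Hypothetical helper illustrating the two scores; not Mahout's evaluator API.
final class EvaluationMetrics {

    /** Mean of |estimated - actual| over all test preferences. */
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        check(actual, estimated);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the mean squared difference. */
    static double rootMeanSquare(double[] actual, double[] estimated) {
        check(actual, estimated);
        double sumSq = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = estimated[i] - actual[i];
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / actual.length);
    }

    private static void check(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 4.0};     // made-up true preferences
        double[] estimated = {3.5, 2.0, 5.0};  // made-up estimates
        System.out.println(averageAbsoluteDifference(actual, estimated));
        System.out.println(rootMeanSquare(actual, estimated));
    }
}
```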
  9. Mahout RecommenderEvaluator
  10. Precision and Recall
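As a rough illustration of the two measures for a top-N recommendation list: precision is the share of recommended items that are relevant, and recall is the share of relevant items that were recommended. The class name `PrecisionRecall` is hypothetical, the code assumes the recommended list has no duplicates, and it does not reproduce how Mahout decides which items count as relevant.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; assumes the recommended list contains no duplicate IDs.
final class PrecisionRecall {

    /** Fraction of recommended items that are relevant (0 if nothing was recommended). */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of relevant items that were recommended (0 if nothing is relevant). */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }

    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long item : recommended) {
            if (relevant.contains(item)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 4L, 9L));
        System.out.println(precision(recommended, relevant));
        System.out.println(recall(recommended, relevant));
    }
}
```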
  11. RecommenderIRStatsEvaluator
  12. Representing Recommender Data: Preference object, e.g. new GenericPreference(123, 456, 3.0f); PreferenceArray
  13. Representing Recommender Data: PreferenceArray; FastByIDMap and FastIDSet
  14. In-memory DataModels: GenericDataModel; file-based data; refreshable components; database-based data
  15. Coping without preference values
  16. Coping without preference values
  17. User-based Recommender. The algorithm:
        for every item i that u has no preference for yet
          for every other user v that has a preference for i
            compute a similarity s between u and v
            incorporate v's preference for i, weighted by s, into a running average
        return the top items, ranked by weighted average
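The algorithm on this slide can be sketched in plain Java as below. This is an illustration under stated assumptions, not Mahout's implementation: the class name `UserBasedSketch` is hypothetical, the similarity is passed in as a function (the bundled one is 1 / (1 + Euclidean distance) over co-rated items), users with non-positive similarity are skipped, and the loops run over users first and items second, which produces the same weighted averages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of user-based recommendation; no neighborhood pruning.
final class UserBasedSketch {

    /** 1 / (1 + Euclidean distance) over items both users rated; 0 if no overlap. */
    static double euclideanSimilarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0.0;
        int overlap = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double d = e.getValue() - other;
                sumSq += d * d;
                overlap++;
            }
        }
        return overlap == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** Returns up to howMany item IDs for user u, highest estimated preference first. */
    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, long u, int howMany,
            ToDoubleBiFunction<Map<Long, Double>, Map<Long, Double>> similarity) {
        Map<Long, Double> own = prefs.get(u);
        if (own == null) {
            throw new IllegalArgumentException("Unknown user: " + u);
        }
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> other : prefs.entrySet()) {
            if (other.getKey() == u) {
                continue; // skip u itself
            }
            double s = similarity.applyAsDouble(own, other.getValue());
            if (s <= 0.0) {
                continue; // assumption: ignore dissimilar or non-overlapping users
            }
            for (Map.Entry<Long, Double> p : other.getValue().entrySet()) {
                if (own.containsKey(p.getKey())) {
                    continue; // u already has a preference for this item
                }
                weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                weightTotal.merge(p.getKey(), s, Double::sum);
            }
        }
        Map<Long, Double> estimate = new HashMap<>();
        for (Long item : weightedSum.keySet()) {
            estimate.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        List<Long> ranked = new ArrayList<>(estimate.keySet());
        ranked.sort((x, y) -> Double.compare(estimate.get(y), estimate.get(x)));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

A call such as `UserBasedSketch.recommend(prefs, 1L, 2, UserBasedSketch::euclideanSimilarity)` returns the two best unseen items for user 1, given a `prefs` map of userID to (itemID to preference).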
  18. Recommender Components: Data model, implemented via DataModel. User-user similarity metric, implemented via UserSimilarity. User neighborhood definition, implemented via UserNeighborhood. Recommender engine, implemented via a Recommender (here,
  19. GenericUserBasedRecommender
  20. User Neighborhoods: Fixed-size neighborhoods; threshold-based neighborhoods
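The difference between the two neighborhood definitions can be shown with a small plain-Java sketch. It assumes the similarities between the target user and every other user are already computed and held in a map; the class name `NeighborhoodSketch` and the sample values are hypothetical, and this is not Mahout's UserNeighborhood code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: picks neighbors from precomputed similarities to one user.
final class NeighborhoodSketch {

    /** Fixed-size: the n most similar users, however low their similarity is. */
    static List<Long> nearestN(Map<Long, Double> similarityToUser, int n) {
        List<Long> users = new ArrayList<>(similarityToUser.keySet());
        users.sort((a, b) -> Double.compare(similarityToUser.get(b), similarityToUser.get(a)));
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    /** Threshold-based: every user at or above the threshold, however many there are. */
    static SortedSet<Long> atLeast(Map<Long, Double> similarityToUser, double threshold) {
        SortedSet<Long> users = new TreeSet<>();
        for (Map.Entry<Long, Double> e : similarityToUser.entrySet()) {
            if (e.getValue() >= threshold) {
                users.add(e.getKey());
            }
        }
        return users;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = new HashMap<>(); // made-up similarities
        sims.put(2L, 0.9);
        sims.put(3L, 0.2);
        sims.put(4L, 0.75);
        sims.put(5L, -0.3);
        System.out.println(nearestN(sims, 3));
        System.out.println(atLeast(sims, 0.7));
    }
}
```

A fixed size always yields neighbors but may include weakly similar users; a threshold guarantees quality but may yield an empty neighborhood.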
  21. Similarity metrics: Pearson correlation–based similarity. It is a number between –1 and 1 that measures the tendency of two series of numbers, paired up one-to-one, to move together. Problems: (1) It doesn't take into account the number of items on which two users' preferences overlap, which is probably a weakness in the context of recommender engines. (2) If two users overlap on only one item, no correlation can be computed because of how the computation is defined.
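A minimal sketch of the Pearson correlation, assuming the two arrays hold the preferences of two users for the same co-rated items in the same order. The class name `PearsonSketch` is hypothetical and this is the textbook formula, not Mahout's implementation. It also shows the second problem above: with a single overlapping item both variances are zero, so the result is undefined (returned here as NaN).

```java
// Hypothetical helper: textbook Pearson correlation over paired values.
final class PearsonSketch {

    /** Returns a value in [-1, 1], or NaN when the correlation is undefined. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Series must be paired one-to-one");
        }
        int n = x.length;
        if (n == 0) {
            return Double.NaN;
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denominator = Math.sqrt(varX * varY);
        // Zero variance (for example a single overlapping item) leaves it undefined.
        return denominator == 0.0 ? Double.NaN : cov / denominator;
    }

    public static void main(String[] args) {
        System.out.println(correlation(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        System.out.println(correlation(new double[] {4}, new double[] {5}));
    }
}
```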
  22. Similarity metrics: Euclidean distance similarity: 1 / (1 + Euclidean distance). Cosine measure similarity: a value between –1 and 1. Tanimoto coefficient similarity: the ratio of the size of the intersection to the size of the union of two users' preferred items.
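The three measures on this slide can be sketched as below. These are the standard formulas written in plain Java for illustration; the class name `SimilaritySketch` is hypothetical, the arrays are assumed to hold two users' preferences for the same items in the same order, and Mahout's own classes may handle details such as missing values differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper with the three formulas from the slide.
final class SimilaritySketch {

    /** 1 / (1 + Euclidean distance): 1 for identical preferences, approaching 0 as they diverge. */
    static double euclideanSimilarity(double[] x, double[] y) {
        double sumSq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sumSq += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    /** Cosine of the angle between the two preference vectors; NaN for a zero vector. */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        double denominator = Math.sqrt(normX * normY);
        return denominator == 0.0 ? Double.NaN : dot / denominator;
    }

    /** |intersection| / |union| of the two users' preferred items; ignores preference values. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }
}
```

Because the Tanimoto coefficient looks only at which items each user has expressed a preference for, it remains usable when no preference values are available.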
  23. Item-based recommendation. The algorithm:
        for every item i that u has no preference for yet
          for every item j that u has a preference for
            compute a similarity s between i and j
            add u's preference for j, weighted by s, to a running average
        return the top items, ranked by weighted average
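A plain-Java sketch of the loop above, following the slide's structure directly. The class name `ItemBasedSketch` is hypothetical, the item-item similarity is supplied by the caller as a function of two item IDs, and skipping non-positive similarities is an assumption of this sketch rather than something the slide states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of item-based recommendation for a single user.
final class ItemBasedSketch {

    /** Returns up to howMany unseen item IDs, highest estimated preference first. */
    static List<Long> recommend(Map<Long, Double> userPrefs, Set<Long> allItems, int howMany,
            ToDoubleBiFunction<Long, Long> itemSimilarity) {
        Map<Long, Double> estimate = new HashMap<>();
        for (Long i : allItems) {
            if (userPrefs.containsKey(i)) {
                continue; // u already has a preference for i
            }
            double weightedSum = 0.0;
            double weightTotal = 0.0;
            for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
                double s = itemSimilarity.applyAsDouble(i, pref.getKey());
                if (s <= 0.0) {
                    continue; // assumption: ignore dissimilar items
                }
                weightedSum += s * pref.getValue();
                weightTotal += s;
            }
            if (weightTotal > 0.0) {
                estimate.put(i, weightedSum / weightTotal);
            }
        }
        List<Long> ranked = new ArrayList<>(estimate.keySet());
        ranked.sort((x, y) -> Double.compare(estimate.get(y), estimate.get(x)));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Unlike the user-based loop, the inner loop here runs over the user's own preferences, so its cost depends on how many items the user has rated rather than on the number of users.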
  24. GenericItemBasedRecommender
  25. Slope-one recommender. The algorithm:
        for every item i the user u expresses no preference for
          for every item j that user u expresses a preference for
            find the average preference difference between j and i
            add this diff to u's preference value for j
            add this to a running average
        return the top items, ranked by these averages
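The estimation step of the slope-one algorithm can be sketched as follows. This is an unweighted illustration in plain Java, not Mahout's implementation: the class name `SlopeOneSketch` is hypothetical, the average differences are recomputed on every call instead of being precomputed, and the final ranking of items by estimate is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of unweighted slope-one preference estimation.
final class SlopeOneSketch {

    /** Average of (preference for i - preference for j) over users who rated both; NaN if none. */
    static double averageDiff(Map<Long, Map<Long, Double>> prefs, long i, long j) {
        double sum = 0.0;
        int count = 0;
        for (Map<Long, Double> p : prefs.values()) {
            Double prefI = p.get(i);
            Double prefJ = p.get(j);
            if (prefI != null && prefJ != null) {
                sum += prefI - prefJ;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Estimated preference of user u for item i; NaN if no estimate is possible. */
    static double estimate(Map<Long, Map<Long, Double>> prefs, long u, long i) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> pref : prefs.get(u).entrySet()) {
            double diff = averageDiff(prefs, i, pref.getKey());
            if (!Double.isNaN(diff)) {
                sum += pref.getValue() + diff; // u's preference for j plus the average diff
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>(); // made-up data
        prefs.computeIfAbsent(1L, k -> new HashMap<>()).put(101L, 4.0);
        prefs.get(1L).put(102L, 2.0);
        prefs.computeIfAbsent(2L, k -> new HashMap<>()).put(101L, 5.0);
        prefs.get(2L).put(103L, 4.0);
        System.out.println(estimate(prefs, 1L, 103L));
    }
}
```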
  26. Taking Recommender to Production
  27. User-based recommenders
  28. Thank You. Contact at: Email: Yasmine.Gaber@espace.com.eg; Twitter: Twitter.com/yasmine_mohamed
