Webinar Presentation: Building a Big Data Recommendation Engine
Relevant recommendations play a major role in a positive buyer experience and have become a critical tool for online retailers, banks, insurance providers, and most other industries.

The Big Data platform provides a scalable, flexible, and cost-effective infrastructure for building a recommendation engine that can substantially improve on traditional psychographic and demographic profiling and "old world" targeted marketing.

Caserta Concepts and Datameer held a webinar to share how to integrate unstructured log data with traditional data warehouse data on Hadoop to build a robust Big Data Recommendation Engine.

Speakers:
Elliott Cordo, Principal Consultant, Caserta Concepts
Adam Gugliciello, Solutions Engineer, Datameer

For more information, visit http://www.casertaconcepts.com/ or http://www.datameer.com/

Transcript

  • 1. Big Data Analytics
    • Focused Expertise: Data Warehouse Design, Business Intelligence, Big Data Analytics, Search / Relevance, Infographics
    • Industries Served: Healthcare / Insurance, Financial Services, Retail / eCommerce, Digital Media / Marketing, K-12 / Higher Education
    • 445 Park Ave, New York, NY | 1-855-755-2246 | info@casertaconcepts.com
  • 2. Recommendations
    • Your customers expect them
    • Good recommendations make life easier
    • They help customers find information, products, and services they might not have thought of
    • What makes a good recommendation?
      • Relevant but not obvious
      • A sense of "surprise"
    [Figure: a "SOLD!!" 23” LED TV, with candidate recommendations 24” LED TV, 25” LED TV, 23” LED TV vs. Blu-Ray, Home Theater, HDMI Cables]
  • 3. Where can recommendation engines be found?
    • In a wide variety of industries and applications:
      • Travel
      • Service industry
      • Music / online radio
      • TV and video
      • Online publications
      • Retail
      • ...and countless others
    Our use case: movie ratings!
  • 4. Our Goal
    • Create a powerful, scalable recommendation engine with minimal development
    • Make recommendations to users instantaneously, as they browse movie titles
    • Recommendations must be in the context of the movie the user is currently viewing
    [Figure caption: "OOPS! Too much surprise!"]
  • 5. How do we hope to accomplish this?
    • Hadoop: distributed file system and processing platform
    • Mahout: collection of machine learning libraries
    • We will leverage two algorithms:
      • Item Similarity: how similar a particular movie is to other movies, based on usage
      • Item-Based Recommender: predicts an individual's preference based on their peers' ratings
    • Both algorithms require only a simple dataset of three fields: "User ID", "Item ID", "Rating"
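Both algorithms need only these three fields, so the input fits in a very simple structure. The minimal Python sketch below assumes comma-separated `user_id,item_id,rating` lines, similar to the plain-text preference files Mahout's file-based data model reads. It loads them into a per-user dictionary. The function name `load_preferences` and the sample ratings are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_preferences(text):
    """Parse 'user_id,item_id,rating' lines into {user_id: {item_id: rating}}.

    Hypothetical helper: the comma-separated layout is an assumption.
    """
    prefs = defaultdict(dict)
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        user_id, item_id, rating = int(row[0]), int(row[1]), float(row[2])
        prefs[user_id][item_id] = rating
    return dict(prefs)


# Made-up sample ratings (user 572, movies 7 and 100)
sample = "572,7,5\n572,100,4\n573,7,3\n"
print(load_preferences(sample))  # {572: {7: 5.0, 100: 4.0}, 573: {7: 3.0}}
```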
  • 6. Item Similarity: Context, Content Filtering
    "People who liked this movie liked these as well"
    • Item Similarity builds a matrix of items to other items and calculates their similarity (based on user ratings)
    • The most similar items are then output as a list:
      • Item ID, Similar Item ID, Similarity Score
      • Items with the highest score are most similar
    • In this example, users who liked "Twelve Monkeys" (7) also liked "Fargo" (100):
        7 100 0.690951001800917
        7 50 0.653299445638532
        7 117 0.643701303640083
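The slide does not say which similarity measure was used, and Mahout offers several. As an illustration only, the sketch below treats each item as a vector of user ratings and computes cosine similarity between items. It then emits the `Item ID, Similar Item ID, Similarity Score` rows described above. The helper names and toy ratings are hypothetical. In the webinar this step ran in Mahout on Hadoop, not in plain Python.

```python
import math
from collections import defaultdict


def item_similarities(prefs):
    """Cosine similarity between item pairs, given {user: {item: rating}}.

    Cosine is an illustrative choice; the webinar does not name its measure.
    """
    # Invert to {item: {user: rating}} so each item becomes a vector over users.
    by_item = defaultdict(dict)
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item[item][user] = rating
    norms = {i: math.sqrt(sum(r * r for r in v.values())) for i, v in by_item.items()}
    sims = {}
    items = sorted(by_item)
    for n, a in enumerate(items):
        for b in items[n + 1:]:
            common = by_item[a].keys() & by_item[b].keys()
            if not common:  # no user rated both items: no similarity recorded
                continue
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[(a, b)] = sims[(b, a)] = dot / (norms[a] * norms[b])
    return sims


def most_similar(sims, item_id, n=3):
    """Return (item_id, similar_item_id, score) rows, highest score first."""
    rows = [(a, b, s) for (a, b), s in sims.items() if a == item_id]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


# Made-up ratings: users 1 and 2 rate movies 7 and 100 the same way.
prefs = {1: {7: 5, 100: 5, 50: 1}, 2: {7: 4, 100: 4, 50: 2}, 3: {50: 5, 117: 4}}
sims = item_similarities(prefs)
print([(a, b, round(s, 3)) for a, b, s in most_similar(sims, 7)])
# [(7, 100, 1.0), (7, 50, 0.371)]
```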
  • 7. Item-Based: Peer, Collaborative Filtering
    "People with similar taste to you liked these movies"
    • Item-Based takes the Item Similarity matrix and weights it by "peer" user preferences
    • Essentially, it determines the best movie critics for you to follow
    • The items with the highest recommendation scores are then output as tuples:
      • User ID [Item ID1:Score, …, Item IDn:Score]
      • Items with the highest recommendation score are the most relevant to this user
    • For user "Johny Sisklebert" (572), the two most highly recommended movies are "Seven" and "Donnie Brasco":
        572 [11:5.0,293:4.70718,8:4.688335,273:4.687676,427:4.685926,234:4.683155,168:4.669672,89:4.66959,4:4.65515]
        573 [487:4.54397,1203:4.5291,616:4.51644,605:4.49344,709:4.3406,502:4.33706,152:4.32263,503:4.20515,432:4.26455,611:4.22019]
        574 [1:5.0,902:5.0,546:5.0,13:5.0,534:5.0,533:5.0,531:5.0,1082:5.0,1631:5.0,515:5.0]
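A common way to turn item similarities into per-user scores is a weighted average. An unrated item's score is the similarity-weighted average of the user's ratings on similar items. The sketch below shows that classic item-based formula and is not a description of Mahout's exact internals. `recommend`, `format_row`, and the similarity values are hypothetical. The IDs borrow Seven (11) and Babe (8) from the example, but the numbers are made up.

```python
def recommend(prefs, sims, user_id, n=10):
    """Score items the user has not rated; return the top-n (item, score) pairs.

    Each score is the similarity-weighted average of the user's own ratings.
    """
    rated = prefs[user_id]
    # Candidates: unrated items with a known similarity to something the user rated.
    candidates = {b for (a, b) in sims if a in rated and b not in rated}
    scores = {}
    for item in candidates:
        num = den = 0.0
        for other, rating in rated.items():
            s = sims.get((item, other))
            if s is None:
                continue
            num += s * rating
            den += abs(s)
        if den > 0:
            scores[item] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


def format_row(user_id, recs):
    """Render recommendations in the 'user [item:score,...]' layout shown above."""
    return f"{user_id} [" + ",".join(f"{i}:{s:g}" for i, s in recs) + "]"


# Hypothetical, symmetric similarity values; user 572's ratings are made up.
sims = {(11, 7): 0.6, (7, 11): 0.6, (11, 100): 0.4, (100, 11): 0.4,
        (8, 7): 0.5, (7, 8): 0.5, (8, 100): 0.5, (100, 8): 0.5}
prefs = {572: {7: 5.0, 100: 4.0}}
print(format_row(572, recommend(prefs, sims, 572)))  # 572 [11:4.6,8:4.5]
```

Normalizing by the sum of absolute similarities keeps predicted scores on the same scale as the ratings. That is why the output resembles the 1 to 5 range of the rows above.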
  • 8. Recommendation Store
    • Serving recommendations needs to be instantaneous: we need a database!
    • The core of this solution is two reference tables:
        Rec_Item_Similarity        Rec_User_Item_Base
        Item_ID                    User_ID
        Similar_Item               Item_ID
        Similarity_Score           Recommendation_Score
    • When called to make recommendations, we query our store:
      • Rec_Item_Similarity, based on the Item_ID the user is viewing
      • Rec_User_Item_Base, based on their User_ID
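The slide does not name the serving database. As a stand-in, the sketch below uses Python's built-in `sqlite3` module to create the two reference tables, each with an index on its lookup key, and runs the two queries described. The column names follow the slide, and the rows are raw scores from the two previous slides. The helper functions `similar_items` and `peer_recommendations` are hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for whatever serving database is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rec_Item_Similarity (
    Item_ID INTEGER, Similar_Item INTEGER, Similarity_Score REAL);
CREATE INDEX idx_sim_item ON Rec_Item_Similarity (Item_ID);
CREATE TABLE Rec_User_Item_Base (
    User_ID INTEGER, Item_ID INTEGER, Recommendation_Score REAL);
CREATE INDEX idx_base_user ON Rec_User_Item_Base (User_ID);
""")
conn.executemany(
    "INSERT INTO Rec_Item_Similarity VALUES (?, ?, ?)",
    [(7, 100, 0.690951001800917), (7, 50, 0.653299445638532),
     (7, 117, 0.643701303640083)])
conn.executemany(
    "INSERT INTO Rec_User_Item_Base VALUES (?, ?, ?)",
    [(572, 11, 5.0), (572, 293, 4.70718)])


def similar_items(item_id, limit=10):
    """Items most similar to the one being viewed, best first."""
    return conn.execute(
        "SELECT Similar_Item, Similarity_Score FROM Rec_Item_Similarity "
        "WHERE Item_ID = ? ORDER BY Similarity_Score DESC LIMIT ?",
        (item_id, limit)).fetchall()


def peer_recommendations(user_id, limit=10):
    """Precomputed item-based recommendations for one user, best first."""
    return conn.execute(
        "SELECT Item_ID, Recommendation_Score FROM Rec_User_Item_Base "
        "WHERE User_ID = ? ORDER BY Recommendation_Score DESC LIMIT ?",
        (user_id, limit)).fetchall()


print(similar_items(7, limit=2))    # [(100, 0.690951001800917), (50, 0.653299445638532)]
print(peer_recommendations(572))    # [(11, 5.0), (293, 4.70718)]
```

Because both lookups are single-key index reads against precomputed scores, the heavy Hadoop/Mahout work stays offline. Serving only pays for a cheap query.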
  • 9. Delivering Recommendations
    So if Johny is viewing "12 Monkeys", we query our recommendation store and present the results:

    Item Similarity
    Movie                       Raw Score  Score
    Fargo                       0.691      1.000
    Star Wars                   0.653      0.946
    Rock, The                   0.644      0.932
    Pulp Fiction                0.628      0.909
    Return of the Jedi          0.627      0.908
    Independence Day            0.618      0.894
    Willy Wonka                 0.603      0.872
    Mission: Impossible         0.597      0.864
    Silence of the Lambs, The   0.596      0.863
    Star Trek: First Contact    0.594      0.859
    Raiders of the Lost Ark     0.584      0.845
    Terminator, The             0.574      0.831
    Blade Runner                0.571      0.826
    Usual Suspects, The         0.569      0.823
    Seven (Se7en)               0.569      0.823

    Item-Based (Peer): peers like these movies
    Movie                       Raw Score  Score
    Seven                       5.000      1.000
    Donnie Brasco               4.707      0.941
    Babe                        4.688      0.938
    Heat                        4.688      0.938
    To Kill a Mockingbird       4.686      0.937
    Jaws                        4.683      0.937
    Monty Python, Holy Grail    4.670      0.934
    Blade Runner                4.670      0.934
    Get Shorty                  4.655      0.931

    Top 10 Recommendations (Best Recommendations)
    Movie                       Score
    Seven (Se7en)               1.823
    Blade Runner                1.760
    Fargo                       1.000
    Star Wars                   0.946
    Donnie Brasco               0.941
    Babe                        0.938
    Heat                        0.938
    To Kill a Mockingbird       0.937
    Jaws                        0.937
    Monty Python, Holy Grail    0.934

    Each Score is the raw score divided by the highest raw score in its list. A movie's Top 10 score is the sum of its Scores from both lists (e.g., Seven: 0.823 + 1.000 = 1.823).
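The scoring on this slide takes two steps: divide each list by its top raw score, then add up a movie's normalized scores across lists. The sketch below assumes that reading of the numbers, which matches Seven at 0.823 + 1.000 = 1.823. `normalize` and `combine` are hypothetical names, and the inputs are a subset of the slide's rounded raw scores.

```python
def normalize(scores):
    """Scale raw scores so the best item in the list scores 1.0."""
    top = max(scores.values())
    return {item: s / top for item, s in scores.items()}


def combine(*score_lists, n=10):
    """Sum each movie's normalized scores across lists; return top-n, best first."""
    total = {}
    for scores in score_lists:
        for item, s in normalize(scores).items():
            total[item] = total.get(item, 0.0) + s
    ranked = sorted(total.items(), key=lambda kv: kv[1], reverse=True)
    return [(item, round(s, 3)) for item, s in ranked[:n]]


# Subsets of the two lists above, using the slide's rounded raw scores.
item_similarity = {"Fargo": 0.691, "Star Wars": 0.653,
                   "Blade Runner": 0.571, "Seven (Se7en)": 0.569}
item_based = {"Seven (Se7en)": 5.000, "Donnie Brasco": 4.707,
              "Babe": 4.688, "Blade Runner": 4.670}
print(combine(item_similarity, item_based, n=5))
# [('Seven (Se7en)', 1.823), ('Blade Runner', 1.76), ('Fargo', 1.0),
#  ('Star Wars', 0.945), ('Donnie Brasco', 0.941)]
```

Star Wars comes out at 0.945 here rather than the slide's 0.946. This sketch starts from the rounded raw scores (0.653 / 0.691), not the full-precision similarity values.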
  • 10. From Good to Great Recommendations
    • Note that the first 5 recommendations look pretty good… but the 6th result would have been "Babe", the children's movie. OOPS!
    • Tuning the algorithms might help: parameter changes, similarity measures
    • How else can we make it better?
      1. Delivery filters
      2. Introduce additional algorithms, such as K-Means or Fuzzy K-Means
  • 11. Delivery Scoring and Filters
    Apply assumptions to control the results of collaborative filtering:
    • One or more categories must match
    • Only children's movies will be recommended for children's movies

    Movie                  Action  Adventure  Childrens  Comedy  Crime  Drama  Film-Noir  Horror  Romance  Sci-Fi  Thriller
    Twelve Monkeys         0       0          0          0       0      1      0          0       0        1       0
    Babe                   0       0          1          1       0      1      0          0       0        0       0
    Seven (Se7en)          0       0          0          0       1      1      0          0       0        0       1
    Star Wars              1       1          0          0       0      0      0          0       1        1       0
    Blade Runner           0       0          0          0       0      0      1          0       0        1       0
    Fargo                  0       0          0          0       1      1      0          0       0        0       1
    Willy Wonka            0       1          1          1       0      0      0          0       0        0       0
    Monty Python           0       0          0          1       0      0      0          0       0        0       0
    Jaws                   1       0          0          0       0      0      0          1       0        0       0
    Heat                   1       0          0          0       1      0      0          0       0        0       1
    Donnie Brasco          0       0          0          0       1      1      0          0       0        0       0
    To Kill a Mockingbird  0       0          0          0       0      1      0          0       0        0       0

    Similar logic could be applied to promote more favorable options:
    • New releases
    • Retail case: items that are on sale or overstocked
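The two filter rules can be expressed as set operations over the category table above. One interpretive assumption: the children's rule is applied in both directions, so children's titles are only ever paired with children's titles. That reading is what keeps "Babe" out of the "Twelve Monkeys" results; the slide's wording could also be read one-way. `GENRES` and `passes_filters` are hypothetical names.

```python
# Categories per movie, transcribed from the table above.
GENRES = {
    "Twelve Monkeys": {"Drama", "Sci-Fi"},
    "Babe": {"Childrens", "Comedy", "Drama"},
    "Seven (Se7en)": {"Crime", "Drama", "Thriller"},
    "Star Wars": {"Action", "Adventure", "Romance", "Sci-Fi"},
    "Blade Runner": {"Film-Noir", "Sci-Fi"},
    "Fargo": {"Crime", "Drama", "Thriller"},
    "Willy Wonka": {"Adventure", "Childrens", "Comedy"},
    "Monty Python": {"Comedy"},
    "Jaws": {"Action", "Horror"},
    "Heat": {"Action", "Crime", "Thriller"},
    "Donnie Brasco": {"Crime", "Drama"},
    "To Kill a Mockingbird": {"Drama"},
}


def passes_filters(viewing, candidate, genres=GENRES):
    """Apply the delivery filters to one candidate for the movie being viewed."""
    v, c = genres[viewing], genres[candidate]
    if not v & c:  # rule 1: at least one category must match
        return False
    # Rule 2 (read symmetrically here): children's titles pair only with each other.
    return ("Childrens" in v) == ("Childrens" in c)


top10 = ["Seven (Se7en)", "Blade Runner", "Fargo", "Star Wars", "Donnie Brasco",
         "Babe", "Heat", "To Kill a Mockingbird", "Jaws", "Monty Python"]
print([m for m in top10 if passes_filters("Twelve Monkeys", m)])
# ['Seven (Se7en)', 'Blade Runner', 'Fargo', 'Star Wars', 'Donnie Brasco',
#  'To Kill a Mockingbird']
```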
  • 12. Additional Algorithm: K-Means
    "These movies are similar based on their attributes"
    • Treats items as coordinates
    • Places a number of random "centroids" and assigns each item to its nearest centroid
    • Moves each centroid to the average location of its assigned items
    • The process repeats until the assignments stop changing
    We would use the major attributes of the movie to create coordinate points:
    • Categories
    • Actors
    • Director
    • Synopsis text
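The loop described above is the classic Lloyd's k-means procedure. The pure-Python sketch below follows those steps, using random initial centroids drawn from the data and Euclidean distance. In the setting described here, Mahout's k-means clustering would presumably do this at scale instead. The usage example encodes only the genre columns from the category table as coordinates. Encoding actors, directors, or synopsis text would need extra feature engineering that is not shown.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means on equal-length numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Place k "random" centroids by sampling k of the items themselves.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every item to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: done
            break
        labels = new_labels
        # Move each centroid to the average location of its items.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Genre columns (Action ... Thriller) from the category table, used as coordinates.
movies = {
    "Twelve Monkeys": (0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0),
    "Blade Runner":   (0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0),
    "Babe":           (0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
    "Willy Wonka":    (0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
}
_, labels = kmeans(list(movies.values()), k=2)
print(dict(zip(movies, labels)))
```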
  • 13. Integrating K-Means into the Process
    • Movies recommended by more than one algorithm are the most highly rated
    [Figure: K-Means (similar items), Item-Based, and Item Similarity results, with their overlap labeled "Best Recommendations"]
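The slide states the rule but not how to break ties. A minimal sketch, assuming each algorithm returns an ordered list of titles: count how many algorithms recommended each movie, then break ties by the best position any algorithm gave it. The function name and the K-Means list in the example are hypothetical. The other two lists are shortened, order-preserving subsets of the lists on the Delivering Recommendations slide.

```python
from collections import Counter


def agreement_rank(*rec_lists):
    """Rank movies by how many algorithms recommended them (most votes first),
    breaking ties by the best position any single algorithm gave the movie."""
    votes = Counter()
    best_pos = {}
    for recs in rec_lists:
        for pos, movie in enumerate(dict.fromkeys(recs)):  # ignore repeats within a list
            votes[movie] += 1
            best_pos[movie] = min(pos, best_pos.get(movie, pos))
    return sorted(votes, key=lambda m: (-votes[m], best_pos[m]))


item_similarity = ["Fargo", "Star Wars", "Blade Runner", "Seven (Se7en)"]
item_based = ["Seven (Se7en)", "Donnie Brasco", "Blade Runner"]
kmeans_cluster = ["Blade Runner", "Star Wars"]  # hypothetical K-Means output
print(agreement_rank(item_similarity, item_based, kmeans_cluster))
# ['Blade Runner', 'Seven (Se7en)', 'Star Wars', 'Fargo', 'Donnie Brasco']
```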
  • 14. Summary
    • Mahout and Hadoop can provide a relatively low-cost and extremely scalable platform for recommendations
    • Mahout offers a rich collection of established machine learning algorithms, reducing development effort
    • A good recommendation system combines collaborative and content filtering algorithms
    elliott@casertaconcepts.com