Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Personalized recommendations drive business, helping people find the products they want, the news they need, and the music they didn't know they would love. Despite the obvious advantages, many companies either don't have recommendations or don't leverage their data to make good ones. Too many recommendation engines are black-box algorithms that are hard to change or don't scale well. Using the same recommendation techniques as used at StubHub, Viacom, and AP, this technical webinar will show you how to load your data from MongoDB into Hadoop, generate recommendations, and then put those recommendations into MongoDB, ready to serve end-users. This webinar will prepare you to build a custom recommender for your company that is highly scalable, easy to understand, and built on open-source technology.

K Young: About the speaker

K Young is the CEO of Mortar Data. Mortar serves data scientists and engineers with a service that makes creating and operating high-scale data pipelines easy. Mortar contributes to several open source projects including Pig, Luigi, and the Mongo-Hadoop connector. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a Computer Science degree from Rice University.

Published in: Technology

Transcript

  • 1. K Young - CEO, Mortar Recommendation Engines with MongoDB and Hadoop
  • 2. Recommendation Engine Recommendation engines automatically recommend the "right" items for each user. • Retail • Music • Videos • Dating • Etc… WHAT IS IT
  • 3. EXAMPLES Recommendation Engine LinkedIn: 50% of new connections come from "People You May Know" Netflix: 75% of content is viewed because of a recommendation Amazon: 35% of sales are driven by recommendations
  • 4. THAT’S ME K Young
  • 5. FOR THIS WEBINAR Agenda 1. Recommendation Engines 2. Hadoop 3. Demo: Build a Recommendation Engine 4. Your Recommendation Engine 5. Q&A
  • 6. Recommendation Engine NOW GENERALLY AVAILABLE • Open source, free • Very flexible • Massively Scalable • 100% Customizable • Tested and proven
  • 7. Recommendation Engine Technical implementation of how humans make recommendations. Using: • past behavior • similar users • content metadata • outside signals, e.g. Instagram HOW DO THEY WORK?
  • 8. Recommendation Engine USER INTERACTIONS: SIGNALS
  • 9. Recommendation Engine ITEM-ITEM RECOMMENDATIONS
  • 10. Recommendation Engine USER-ITEM RECOMMENDATIONS
  • 11. WHERE DO RECOMMENDATIONS APPEAR? Recommendation Engine Landing page Product page Cart Push email Etc.
  • 12. Recommendation Engine Predictions based on macro-trends, e.g. trending on Twitter Numeric predictions, e.g. price elasticity WHAT IT ISN’T
  • 13. A WARNING Recommendation Engine Recommendation engines are famously hard to launch because they touch: engineering, finance, product, executive. How to succeed: 1) speedy implementation (target 1 week) 2) engine flexibility 3) gradual roll-out 4) visible KPI-impact
  • 14. RAPID OVERVIEW Hadoop Platform for distributed data processing. Strengths: • Can scale up to thousands of computers • Widely used • Very broadly applicable • Free, open Problem: • Difficult to use for complex problems
  • 15. ON HADOOP Pig Less code Compiles to native Hadoop code Popular (LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)
  • 16. BRIEF, EXPRESSIVE LIKE PROCEDURAL SQL Pig (thanks: twitter hadoop world presentation)
  • 17. FOR SERIOUS The Same Script, In MapReduce
  • 18. MOTIVATIONS MongoDB + Pig Data storage and data processing are often separate concerns Hadoop is built for scalable processing of large datasets
  • 19. SIMILAR PHILOSOPHY MongoDB, Pig Poly-structured data • MongoDB: stores data, regardless of structure • Pig: reads data, regardless of structure
  • 20. SIMILAR PHILOSOPHY MongoDB Hadoop Connector Open source connector for Hadoop (and family) to read from and write to MongoDB. (Links at end).
  • 21. Build a recommendation engine ENOUGH PREAMBLE, NOW IT’S… Demo Time!
  • 22. Build a recommendation engine DEMO AGENDA 1) Intro to Mortar 2) Download recommendation code 3) Hook up the demo implementation (last.fm) 4) Generate recommendations at scale 5) View recommendations
  • 23. Build a recommendation engine DEMO Use Mortar for demo Free to use Open, code runs anywhere Complete tutorial online (link at end)
  • 24. Mortar ONLINE TUTORIAL
  • 25. Mortar FAST INTRO
  • 26. Mortar FAST INTRO Data science lacks a way to organize, test, deploy, and collaborate with code. So: • One-button code deployment, powered by Github • Award-winning job monitoring and visualization • Realtime log collection and error analysis • Free local development with one-click installation
  • 27. >mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415 Sending request to register project: mortar_webinar_20140415... done Status: Success! Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.
  • 28. DEFINITIONS Recommendation Engine Users: Someone interacting with your items and generating events that you capture Items: The things you are recommending: videos, articles, products, etc. Signal: A user-item interaction with a weighting that tells us the relative value of the interaction.
  • 29. Recommendation Engine USER INTERACTIONS: SIGNALS
  • 30. STEPS Recommendation Engine Steps in a recommendation engine: • Load your data • Generate your signals • Call code to generate recommendations • Store your recommendations Not covered today: • Serve your recommendations • Track KPI-impact
  • 31. DEMO Recommendation Engine 17.5MM documents of 360K users’ top played artists. Provided by Last.fm at http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html Used a Pig job to load a MongoLab database with the data.
  • 32. >db.lastfm_plays.find() { "user" : "faf…a60", "num_plays" : 67, "artist_name" : "beastie boys" } { "user" : "faf0…a60", "num_plays" : 66, "artist_name" : "the beatles" } { "user" : "faf0…a60", "num_plays" : 65, "artist_name" : "the smashing pumpkins" }
  • 33. DEMO: LOAD THE DATA Recommendation Engine First step: Load our listening data.
  • 34. %default DB 'mongo_webinar' %default PLAYS_COLLECTION 'lastfm_plays' raw_input = load '$CONN/$DB.$PLAYS_COLLECTION' using com.mongodb.hadoop.pig.MongoLoader(' user:chararray, artist_name:chararray, num_plays:int '); Pig code
  • 35. DEMO: GENERATE SIGNALS Recommendation Engine Now that we have our data loaded we need to extract: user, item, signal.
  • 36. user_signals = foreach raw_input generate user, artist_name as item, num_plays as weight:int; Pig code
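For readers who do not know Pig, the projection in slide 36 can be sketched in plain Python. This is only an illustration of the (user, item, weight) shape the deck describes: the `Signal` type, the `to_signals` helper, and the user id `u1` are hypothetical names introduced here, not part of Mortar's code or the Last.fm data.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One user-item interaction with a relative weight (hypothetical type)."""
    user: str
    item: str
    weight: int


def to_signals(docs):
    """Mirror the Pig foreach: artist_name becomes item, num_plays becomes weight."""
    return [
        Signal(doc["user"], doc["artist_name"], int(doc["num_plays"]))
        for doc in docs
    ]


# Hypothetical documents shaped like the lastfm_plays collection in slide 32.
docs = [
    {"user": "u1", "num_plays": 67, "artist_name": "beastie boys"},
    {"user": "u1", "num_plays": 66, "artist_name": "the beatles"},
]
signals = to_signals(docs)
```

Each resulting tuple carries exactly the three fields that the later recommendation steps expect.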
  • 37. DEMO: CALL MORTAR Recommendation Engine Now that the data is in the correct format we’ll call the mortar algorithms for generating item-item and user-item recommendations.
  • 38. item_item_recs = recsys__GetItemItemRecommendations(user_signals); user_item_recs = recsys__GetUserItemRecommendations(user_signals, item_item_recs); Pig code
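The deck does not show what the `recsys__` macros do internally, so the following is not Mortar's algorithm. It is a minimal sketch of one common approach, assumed here for illustration: count how many users interacted with both items of a pair (co-occurrence), rank each item's neighbours by that count, then recommend to each user the neighbours of items they already have. Weights are ignored for brevity, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_item_recs(signals, top_n=2):
    """Rank, for each item, the items most often seen with it across users."""
    items_by_user = defaultdict(set)
    for user, item, _weight in signals:
        items_by_user[user].add(item)

    # co_counts[a][b] = number of users who interacted with both a and b.
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    recs = []
    for item_a, neighbours in co_counts.items():
        # Highest count first; ties broken alphabetically for determinism.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (item_b, _count) in enumerate(ranked[:top_n], start=1):
            recs.append({"item_A": item_a, "rank": rank, "item_B": item_b})
    return recs


def user_item_recs(signals, ii_recs, top_n=2):
    """Suggest neighbours of a user's items, skipping items they already have."""
    seen = defaultdict(set)
    for user, item, _weight in signals:
        seen[user].add(item)

    neighbours = defaultdict(list)
    for rec in ii_recs:
        neighbours[rec["item_A"]].append((rec["rank"], rec["item_B"]))

    out = {}
    for user, items in seen.items():
        best = {}  # candidate item -> best (lowest) rank seen so far
        for item in items:
            for rank, candidate in neighbours[item]:
                if candidate not in items:
                    best[candidate] = min(rank, best.get(candidate, rank))
        ranked = sorted(best.items(), key=lambda kv: (kv[1], kv[0]))
        out[user] = [candidate for candidate, _rank in ranked[:top_n]]
    return out


# Hypothetical signals: (user, item, weight).
signals = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 2), ("u2", "b", 1), ("u2", "c", 4),
    ("u3", "b", 1), ("u3", "c", 1),
]
ii = item_item_recs(signals)
ui = user_item_recs(signals, ii)
```

The item-item output uses the same `item_A`, `rank`, `item_B` field names that appear in the stored results later in the deck, which keeps the sketch easy to compare with the real collection.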
  • 39. DEMO: STORE OUR RESULTS Recommendation Engine Now that we have our results let’s store them back to MongoDB for use by our application.
  • 40. %default II_COLLECTION 'item_item_recs' %default UI_COLLECTION 'user_item_recs' store item_item_recs into '$CONN/$DB.$II_COLLECTION' using com.mongodb.hadoop.pig.MongoInsertStorage('',''); store user_item_recs into '$CONN/$DB.$UI_COLLECTION' using com.mongodb.hadoop.pig.MongoInsertStorage('',''); Pig code
  • 41. DEMO: RUN IT! Recommendation Engine Now we’re going to use Mortar to start and manage a Hadoop cluster to run our recommender.
  • 42. >mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10 Taking code snapshot... done Sending code snapshot to Mortar... done Requesting job execution... done job_id: 534462bea22f3803fd9cacca Job status can be viewed on the web at: https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca
  • 43. >db.item_item_recs.find() { "item_A":"yo-yo ma", "rank":1, "item_B":"natalie clein" } { "item_A":"miley cyrus", "rank":1, "item_B":"miley cyrus and billy ray cyrus" } { "item_A":"dimmu borgir", "rank":1, "item_B":"ad inferna" }
  • 44. EVALUATING YOUR RESULTS Your Recommendation Engine At first, use your domain knowledge to determine whether recommendations are sensible. Mortar provides a recommendation browser.
  • 45. EVALUATING YOUR RESULTS Your Recommendation Engine Optionally get detailed recommendations.
  • 46. item_item_recs = recsys__GetItemItemRecommendationsDetailed(user_signals); Pig code
  • 47. EVALUATING YOUR RESULTS Your Recommendation Engine Later, run A/B tests with your recommendations to see how they improve the metrics you care about. Usually not multivariate. Usually no training set is possible.
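The deck does not say how users are split for an A/B test. One widely used option, assumed here, is deterministic hash bucketing, which also supports the gradual roll-out recommended in slide 13. The function name, experiment label, and share parameter below are all hypothetical.

```python
import hashlib


def ab_bucket(user_id, experiment="recs_v1", treatment_share=0.5):
    """Assign a user to 'treatment' or 'control', stably across calls."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 32 bits of the hash to a fraction in [0, 1).
    fraction = int(digest[:8], 16) / 0x100000000
    return "treatment" if fraction < treatment_share else "control"


# Raising treatment_share over time rolls the recommender out gradually.
bucket = ab_bucket("u1", treatment_share=0.1)
```

Because the assignment depends only on the experiment label and the user id, a user sees the same variant on every visit without any stored state.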
  • 48. CUSTOMIZING Your Recommendation Engine To make customization easier Mortar has help documentation and code covering more than a dozen common cases: • Removing bots from your signal data • Removing out-of-stock items • Boosting popular items • Adding categories to your items • Cold start • Greater discovery and variety
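Mortar's documented recipes are not reproduced in the deck, so the following only illustrates the first case, removing bots, with a simple heuristic that is assumed here: drop every signal from users whose event count is implausibly high. The function name and threshold are hypothetical and would need tuning against real traffic.

```python
from collections import Counter


def remove_bots(signals, max_events_per_user=1000):
    """Drop all signals from users with more events than the threshold."""
    events = Counter(user for user, _item, _weight in signals)
    return [s for s in signals if events[s[0]] <= max_events_per_user]


# Hypothetical signals: "crawler" generates far more events than "u1".
signals = [("u1", "a", 1), ("u1", "b", 1)] + [
    ("crawler", f"item{i}", 1) for i in range(50)
]
clean = remove_bots(signals, max_events_per_user=10)
```

Filtering before signal aggregation keeps a single noisy account from dominating the co-occurrence counts of every item it touched.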
  • 49. PRODUCTION QUESTIONS Your Recommendation Engine How do you read your MongoDB? 1) Read backup files from S3 2) Connect to secondary nodes 3) Connect to primary nodes 4) Connect to dedicated analytics nodes 5) Turn file-system snapshot backups into BSON
  • 50. PRODUCTION QUESTIONS Your Recommendation Engine How do you release new recommendations while serving the old ones? API Flip between live and offline database Also enables rollback
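The flip described in slide 50 can be sketched with two in-memory stores standing in for the live and offline databases. This is an assumption-laden illustration: the class, the method names, and the "blue"/"green" labels are hypothetical, and a real deployment would point an API at two MongoDB databases or collections instead of dictionaries.

```python
class RecommendationStore:
    """Serve from one store while the other is rebuilt, then swap."""

    def __init__(self):
        self._collections = {"blue": {}, "green": {}}
        self._live = "blue"

    @property
    def offline(self):
        """Name of the store that is not currently serving traffic."""
        return "green" if self._live == "blue" else "blue"

    def load_offline(self, recs):
        """Replace the offline store with freshly generated recommendations."""
        self._collections[self.offline] = dict(recs)

    def flip(self):
        """Make the offline store live; calling it again rolls back."""
        self._live = self.offline

    def get(self, user):
        """Read recommendations for a user from the live store only."""
        return self._collections[self._live].get(user, [])


store = RecommendationStore()
store.load_offline({"u1": ["a"]})   # not visible yet
store.flip()                        # new recommendations go live
store.load_offline({"u1": ["b"]})   # next batch, still offline
```

Since the previous batch stays intact in the other store until the next load, a second `flip()` restores it, which is the rollback property the slide mentions.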
  • 51. WE DISCUSSED Summary What a recommendation engine is How Hadoop works with MongoDB Set up a demo recommendation engine How to connect your data Touched on advanced techniques Steered away from potholes Resources for next steps
  • 52. help.mortardata.com/recommenders answers.mortardata.com @kky @mortardata
