Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Building a Recommendation Engine Using Diverse Features by Divyanshu Vats

1,690 views

Published on

Spark Summit East Talk

Published in: Data & Analytics

Building a Recommendation Engine Using Diverse Features by Divyanshu Vats

  1. 1. BUILDING A RECOMMENDATION ENGINE USING DIVERSE FEATURES Divyanshu Vats Sailthru
  2. 2. Joint work with Max Sperlich (Sailthru) David Glueck (Sailthru) Alex Gaudio (Alluvium) Jeremy Stanley (Instacart)
  3. 3. Customer Retention Cloud
  4. 4. users items
  5. 5. users items Buy?
  6. 6. users items Buy? Matrix Factorization model = ALS.trainImplicit(data) model.predict(user_id, item_id)
  7. 7. users items Buy? Matrix Factorization model = ALS.trainImplicit(data) model.predict(user_id, item_id) Other Methods - Item-Item Similarity - Content Based - Nearest neighbor based methods
  8. 8. users items Buy? Matrix Factorization model = ALS.trainImplicit(data) model.predict(user_id, item_id) Other Methods - Item-Item Similarity - Content Based - Nearest neighbor based methods Open source scalable implementations Easily Incorporate additional features?
  9. 9. Additional Features?
  10. 10. Additional Features? title image description item location browser # clicks user
  11. 11. Additional Features? title image description item location browser # clicks user user-item - purchases - item views
  12. 12. Goal: Framework to easily integrate user, item, and user-item interactions for meaningful recommendations
  13. 13. prediction interval response
  14. 14. prediction interval response 1features 0
  15. 15. prediction interval response user item user-item interactions user-item features 1 0
  16. 16. prediction interval response Can use a combination of collaborative filtering algorithms! user item 1 0 user-item interactions user-item features
  17. 17. So far…. users items Buy? - Easily integrate user, item, and user-item features - Recommendation → Binary classification - Next: Implementation and use of Spark!
  18. 18. Structured Tables: e.g. url views, purchases
  19. 19. Structured Tables: e.g. url views, purchases item-item similarity filtering table filtering table filtering table filtering table
  20. 20. Structured Tables: e.g. url views, purchases filtering table filtering table filtering table filtering table Extract features: user, item, user-item features Model Building, Validation, Reporting item-item similarity Recommendations
  21. 21. Know Thy Partitions! data.select(‘user_id’, ‘url’) data.groupBy(‘user_id’) data.count() data.filter(count > 10)
  22. 22. Know Thy Partitions! data.select(‘user_id’, ‘url’) data.groupBy(‘user_id’) data.count() data.filter(count > 10) Inefficient!
  23. 23. Know Thy Partitions! partition by user_id data.select(‘user_id’, ‘url’) data.groupBy(‘user_id’) data.count() data.filter(count > 10)
  24. 24. Know Thy Partitions! data.select(‘user_id’, ‘url’) data.mapPartitions(filter_url) da partition by user_id
  25. 25. Break up application into small chunks! users = data.select(‘user_id’, ‘url’) .mapPartitions(filter_url) .collect() BigData.filter(lambda x: x.user_id in users)
  26. 26. Break up application into small chunks! users = data.select(‘user_id’, ‘url’) .mapPartitions(filter_url) .collect() BigData.filter(lambda x: x.user_id in users)
  27. 27. Break up application into small chunks! users = data.select(‘user_id’, ‘url’) .mapPartitions(filter_url) .collect() BigData.filter(lambda x: x.user_id in users)
  28. 28. Mesos + Scheduler + Docker + Spark A B C D E F - Carefully define applications and state a dependency graph - Manage graph using: github.com/sailthru/stolos
  29. 29. Mesos + Scheduler + Docker + Spark A B C D E F - Carefully define applications and state a dependency graph - Manage graph using: github.com/sailthru/stolos
  30. 30. Item-Item Similarities users items P(Buy) = 0.002
  31. 31. users items P(Buy) = 0.002 500 items 1000 items > 5000 items Item-Item Similarities
  32. 32. users items P(Buy) = 0.002 RowMatrix.computeSimilarites item-item similarities Item-Item Similarities
  33. 33. users items P(Buy) = 0.002 RowMatrix.computeSimilarites item-item similarities filter and score items for each user Item-Item Similarities
  34. 34. cluster of users and items
  35. 35. To Summarize.. users items Buy? - Recommendations using diverse features - Carefully designed spark apps connected using a graph - Future work: - More spark apps + spark-mesos cluster manager
  36. 36. THANK YOU. email: dvats@sailthru.com

×