
Big Data @ Hippo - GetTogether 2014


Analysis of Hippo's own data with machine learning - actionable insights, recommendation engine

Published in: Technology


  1. Big Data @ Hippo. Hippo GetTogether 2014 - Trouw. Frank van Lankvelt. follow the Hippo trail.
  2. Co-occurrence: Relating Attributes
  3. Scary Math
  4. Contingency Table (documents A and B)

                 A        not A      total
     B           x        20 - x     20
     not B       40 - x   140 + x    180
     total       40       160        200

     Of 200 visitors in total, 40 visited A and 20 visited B; x is the number of visitors of both A and B. P(x >= 8) ≈ 3%.
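The tail probability on the contingency-table slide can be checked with a short sketch. It assumes a hypergeometric model (fixed row and column totals, visits to A and B independent); the slide does not name the test that was used, and the function name `overlap_tail` is only illustrative.

```python
from math import comb

def overlap_tail(total, visitors_a, visitors_b, min_overlap):
    """P(x >= min_overlap) for the 'A and B' cell under independence.

    Assumption: hypergeometric model with fixed margins; the slide
    does not say which test produced its figure.
    """
    denom = comb(total, visitors_b)
    upper = min(visitors_a, visitors_b)
    tail = sum(
        comb(visitors_a, k) * comb(total - visitors_a, visitors_b - k)
        for k in range(min_overlap, upper + 1)
    )
    return tail / denom

# Margins taken from the slide: 200 visitors, 40 saw A, 20 saw B.
p = overlap_tail(total=200, visitors_a=40, visitors_b=20, min_overlap=8)
print(f"P(x >= 8) = {p:.3f}")
```

With the slide's margins the result lands in the low single-digit percent range, in line with the rough ≈ 3% quoted.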
  5. Co-occurrence Insights (attribute / URL relatedness). Insight: page visits in the partner section are highly cohesive and stand out from the regular ‘.com’ visitor cluster, which suggests that visitors looking for a partner go through every single page and probably can’t find what they’re looking for. Action: Hippo suggests improving navigation, search or filtering. Figure labels: find partner, /fr, .com, .org, generic, release notes.
  6. Recommendations: which documents are interesting for ME?

     user - item (rating), collaborative filtering: find docs co-occurring with visited documents. Users: Alice, Bob, Charlie. Ratings as listed: Star Wars 3 4; Finding Nemo 3 4; Sound of Music 5 1 2.

     content (meta) data: find docs similar to visited documents.

                        genre       stars
     Star Wars          sci-fi      Portman
     Finding Nemo       animation   DeGeneres
     Sound of Music     musical     Andrews
  7. Implementation: combine in search index (Recommendation Query). Content-based: (meta) data. Collaborative Filtering: co-occurrence.
  9. Recommended For You
     1. Collect ID of viewed content
     2. Calculate co-occurrences
     3. Index, along with content IDs of co-viewed documents
     4. Search with recent IDs, similarity
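The "Recommended For You" steps can be sketched in a few lines. This is an illustration only: it keeps the co-occurrence counts in an in-memory dictionary rather than in a search index as the talk describes, and the names `co_occurrences`, `recommend` and the `doc-*` IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def co_occurrences(sessions):
    """Count, per content ID, how often each other ID was viewed by the same visitor."""
    counts = defaultdict(lambda: defaultdict(int))
    for viewed in sessions:
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, recent_ids, limit=3):
    """Rank not-yet-seen IDs by summed co-occurrence with recently viewed IDs."""
    scores = defaultdict(int)
    for seen in recent_ids:
        for other, n in counts.get(seen, {}).items():
            if other not in recent_ids:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))[:limit]

# Hypothetical logs: each entry lists the content IDs one visitor viewed.
sessions = [
    ["doc-1", "doc-2", "doc-3"],
    ["doc-1", "doc-2"],
    ["doc-2", "doc-4"],
]
counts = co_occurrences(sessions)
print(recommend(counts, ["doc-1"]))
```

In a search-index setup, the inner dictionary for each document would instead be stored as an indexed field of co-viewed IDs, and `recommend` would become a query on the visitor's recent IDs.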
  10. Patterns: Beyond Co-occurrence
  11. Patterns in the Data: customers that buy diapers often buy beer as well (young dads rewarding themselves?)
  12. Itemsets, Rules. Find the patterns (association rule mining):
      1. sets of items that are bought together (support): $P(\text{beer}, \text{diapers}) > 1\%$
      2. subsets that are good predictors (lift): $\frac{P(\text{beer}, \text{diapers})}{P(\text{beer})\,P(\text{diapers})} > 4$
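Support and lift for the beer-and-diapers example can be computed directly from a list of transactions. The sketch below uses invented baskets purely for illustration; the helper names `support` and `lift` are not from the talk.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)

def lift(transactions, a, b):
    """P(a, b) / (P(a) * P(b)); above 1 means a and b co-occur more than chance."""
    joint = support(transactions, [a, b])
    return joint / (support(transactions, [a]) * support(transactions, [b]))

# Hypothetical baskets, invented for this example only.
baskets = [
    ["beer", "diapers"],
    ["beer", "diapers", "chips"],
    ["milk"],
    ["bread", "milk"],
    ["bread"],
    ["chips", "milk"],
    ["bread", "chips"],
    ["milk", "bread", "chips"],
]
s = support(baskets, ["beer", "diapers"])
l = lift(baskets, "beer", "diapers")
print(f"support={s:.3f} lift={l:.2f}")
```

An itemset would then be kept when its support and lift both clear the chosen thresholds (1% and 4 on the slide).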
  13. http://www.onehippo.com/en/thankyou - Thank You. Beer? Diapers? Conversions!!!
  14. http://www.onehippo.com/en/thankyou: will a visitor go there? P(conversion | request log). What are the relevant “signals”? Which configuration performs best?
  15. Patterns For Conversion (correlations). Single item: referrer www.google.com. Pattern/itemset: visited demo, 2014 week 4.
  16. Scary Data Structure
  17. Finding Frequent Itemsets
      1. Build Frequent Prefix Tree (FPGrowth)
      2. Extract patterns relevant for conversion (using contingencies)
  18. Pattern Contingency Table

                                 converted    not converted
      pattern matches
      pattern does not match

      ● converted: visited /thankyou
      ● sample pattern: visited demo, in 2014 week 4
  19. Sub-Pattern Filtering. Problem: when pattern (A, B, C) is relevant, the sub-patterns (A), (B), (C), (A, B), (A, C), (B, C) (likely) also match, e.g. when C is meta-data on page B. Solution: test for independence using contingency!
  20. Actionable Insights? The found itemsets are quite numerous and seem to contain a lot of redundancy. But they are certainly interesting, e.g. for a periodic evaluation.
  21. Personalization: Putting Patterns to Use
  22. Naive A/B Testing. The naive solution:
      ● route some traffic to the alternative configuration: A (old config) 80%, B (new config) 20%
      ● run for some time
      ● see if B has relatively more conversions
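A minimal sketch of the naive 80/20 split, assuming visitors carry a string ID and are routed by hashing that ID; the slide does not say how traffic is actually routed. `assign_config` and `conversion_rates` are hypothetical helper names.

```python
import hashlib

def assign_config(visitor_id, share_b=0.20):
    """Deterministically route about `share_b` of visitors to config B."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "B" if bucket < share_b else "A"

def conversion_rates(visits):
    """visits: iterable of (config, converted) pairs -> {config: conversion rate}."""
    totals, hits = {}, {}
    for config, converted in visits:
        totals[config] = totals.get(config, 0) + 1
        hits[config] = hits.get(config, 0) + int(converted)
    return {config: hits[config] / totals[config] for config in totals}

# Hypothetical visitor IDs; real IDs would come from the request log.
assignments = [assign_config(f"visitor-{i}") for i in range(10_000)]
print(assignments.count("B") / len(assignments))
```

Hashing keeps a returning visitor on the same configuration without storing any state, but the comparison of the two rates is still left to a human.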
  23. Problems With Naive Solution
      ● if B is drastically worse, 20% of traffic is LOST
      ● marketer must regularly check and decide
      ● when has a new config PROVEN itself?
      ● number of concurrent experiments is LOW
      ● no user context
  24. Scary Math
  25. Predict Conversion. Conversion rate depends on context: x = the patterns, w = the “weights”, ϕ = cdf of the normal distribution.
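The symbols listed on the "Predict Conversion" slide (patterns x, weights w, normal cdf ϕ) are consistent with a probit-style model, P(conversion | x) = ϕ(w · x). The formula itself is not in the transcript, so the form below is an assumption, and the weights, bias and function names are invented for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conversion_probability(x, w, bias=0.0):
    """Assumed probit form: cdf(w . x + bias), with x a 0/1 pattern-match vector."""
    score = bias + sum(wi * xi for wi, xi in zip(w, x))
    return normal_cdf(score)

# Hypothetical weights for three patterns; not taken from the talk.
weights = [1.2, 0.4, -0.3]
visitor = [1, 0, 1]  # this visitor matches the first and third pattern
prob = conversion_probability(visitor, weights, bias=-2.0)
print(f"{prob:.3f}")
```

A positive weight raises the predicted conversion rate when its pattern matches, a negative weight lowers it, and a weight near zero marks a pattern that carries little signal.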
  26. Experimental Setup. Split data set (.org + .com):
      1. training set: 189660 visitors, 435 conversions
      2. test set: 27013 visitors, 40 conversions
  27. Can We Predict Conversion? 1260 itemsets. ROC curve: TPR versus FPR. At a false positive rate of 10%: 96% true positive rate.
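Reading a "TPR at 10% FPR" figure off a ROC curve amounts to sweeping a threshold over the model scores. The sketch below does this on invented scores and labels; `tpr_at_fpr` is a hypothetical helper and assumes both classes are present in the data.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.10):
    """Highest true positive rate reachable while keeping FPR <= max_fpr."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Consume all examples sharing a score so ties form one threshold.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break
        best = tp / positives
    return best

# Invented scores (higher = more likely to convert) and true outcomes.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.45,
          0.40, 0.30, 0.25, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels))
```

With only 40 conversions in the test set, each converted visitor moves the true positive rate by 2.5 percentage points, so a figure like 96% is a coarse estimate.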
  28. Towards Actionable Insights. Use Automatic Relevance Determination to prune the patterns (optimize the prior). Figure labels: σ, μ, relevant, irrelevant, weights (w).
  29. Top 20 Patterns For Conversion referer.go.onehippo.com .pathInfo./resources/whitepapers/forrester-market-overview-web-content-management-systems.html .pathInfo./resources/whitepapers/cms---a-critical-solution-for-todays-ecommerce.html .pathInfo./resources/whitepapers/hippo-cms-for-the-enterprise.html .pathInfo./resources/whitepapers/web-content-management-in-the-cloud.html .collectorData.channel.One Hippo English Site .collectorData.audience.terms. referer.www.onehippo.com .collectorData.categories.terms.cms .pathInfo./mobile-cms .collectorData.channel.One Hippo English Site .pathInfo./ressourcen/demo .pathInfo./resources/videos/hippo-cms-grand-tour.html .collectorData.channel.One Hippo English Site .collectorData.audience.terms. .collectorData.categories.terms.cms .pathInfo./ressources/demo .pathInfo./what_to_buy/compare.html referer.www.cmswire.com .pathInfo./resources/demo .collectorData.categories.terms.mobile .pathInfo./resources/whitepapers/understanding-hippo-cms-7-software-architecture.html .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html .collectorData.channel.One Hippo English Site referer.www.google.nl referer.www.onehippo.com .pathInfo./resources/videos/a-quick-overview-of-hippo-cms-in-just-under-3-minutes.html .collectorData.categories.terms.repository .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html .collectorData.categories.terms. .collectorData.categories.terms.relevance
  30. Actionable Insights! We can find a small model that can be used for human interpretation and automated personalization.
  31. Product Challenge: KISS. The number of parameters should be minimal.
  32. Parameters. Recommendations: 1 hyper-param. Personalization: idem. NICE!
  33. Questions?
