follow the Hippo trail
Hippo GetTogether 2014
Big Data @ Hippo
Hippo GetTogether 2014 - Trouw
Frank van Lankvelt
Co-occurrence
Relating Attributes
Scary Math
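Contingency Table
Documents A, B; x = number of visitors of both A and B

            A         not A      total
B           x         20 - x     20
not B       40 - x    140 + x    180
total       40        160        200

Row total 20: visitors of B. Column total 40: visitors of A. Grand total 200: total # visitors.
If A and B were visited independently, the expected overlap would be 40 * 20 / 200 = 4.
P(x >= 8) ≈ 3%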
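The tail probability on this slide can be checked with a short calculation. The sketch below assumes the overlap x follows a hypergeometric distribution when A and B are unrelated (the slide does not name the test it uses); the function name and arguments are illustrative only.

```python
from math import comb

def overlap_tail_probability(total, visitors_a, visitors_b, observed_overlap):
    """P(X >= observed_overlap) if visits to A and B are independent.

    Assumption: X is hypergeometric, i.e. the visitors of B are a random
    draw of size visitors_b from all visitors, visitors_a of whom saw A.
    """
    denominator = comb(total, visitors_b)
    upper = min(visitors_a, visitors_b)
    numerator = sum(
        comb(visitors_a, k) * comb(total - visitors_a, visitors_b - k)
        for k in range(observed_overlap, upper + 1)
    )
    return numerator / denominator

# Numbers taken from the contingency table on the slide.
expected_overlap = 40 * 20 / 200
p_value = overlap_tail_probability(total=200, visitors_a=40, visitors_b=20, observed_overlap=8)
print(f"expected overlap: {expected_overlap}, P(x >= 8) = {p_value:.3f}")
```

Under this assumption the result should land in the low single-digit percent range, in line with the ≈ 3% quoted on the slide. A small tail probability means an overlap of 8 or more is unlikely to be chance, so A and B can be treated as related.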
Co-occurrence Insights
Insight: a high cohesion of page visits in the partner section
standing out from the regular ‘.com’ visitor cluster suggests that
visitors looking for a partner go through every single page and
probably can’t find what they’re looking for.
Action: Hippo suggests improving navigation, search or filtering.
[Figure: attribute / URL relatedness graph; labelled clusters: find partner, /fr, .com, .org, generic, release notes]
Recommendations
user - item (rating): collaborative filtering
Ratings by Alice, Bob, Charlie
Star Wars: 3, 4
Finding Nemo: 3, 4
Sound of Music: 5, 1, 2

content (meta) data
                 genre       stars
Star Wars        sci-fi      Portman
Finding Nemo     animation   DeGeneres
Sound of Music   musical     Andrews

Which documents are interesting for ME?
content (meta) data: find docs similar to visited documents
collaborative filtering: find docs co-occurring with visited documents
Implementation
combine in search index:
Recommendation Query
Content-based:
(meta) data
Collaborative Filtering:
co-occurrence
Recommended For You
1. Collect ID of viewed content
2. Calculate co-occurrences
3. Index, along with content, IDs of co-viewed documents
4. Search with recent IDs, similarity
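The four steps can be sketched in a few lines. This is only an in-memory illustration: the slides describe storing the co-viewed IDs in a search index, which is replaced here by a plain dictionary, and the session data, function names and scoring rule are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_occurrence_index(sessions):
    """Steps 1-3: map each document ID to counts of co-viewed documents."""
    index = defaultdict(Counter)
    for viewed in sessions:
        # Count each unordered pair once per visitor.
        for a, b in combinations(sorted(set(viewed)), 2):
            index[a][b] += 1
            index[b][a] += 1
    return index

def recommend(index, recent_ids, limit=3):
    """Step 4: rank documents by how often they co-occur with recent views."""
    scores = Counter()
    for doc_id in recent_ids:
        scores.update(index.get(doc_id, {}))
    # Never recommend what the visitor has just seen.
    for doc_id in recent_ids:
        scores.pop(doc_id, None)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:limit]]

# Hypothetical viewing histories, one list of document IDs per visitor.
sessions = [
    ["home", "demo", "pricing"],
    ["home", "demo"],
    ["demo", "pricing", "partners"],
    ["home", "partners"],
]
index = co_occurrence_index(sessions)
recommendations = recommend(index, recent_ids=["demo"])
print(recommendations)
```

In a search-index setup the co-viewed IDs would be an extra field on each document, and the query would combine that field with content similarity rather than rank on raw counts alone.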
Patterns
Beyond Co-occurrence
Patterns in the Data
customers that buy diapers often buy beer as well
(young dads rewarding themselves?)
Itemsets Rules
Find the patterns (association rule mining):
1. sets of items that are bought together
   P(beer, diapers) > 1% (support)
2. subsets that are good predictors
   P(beer, diapers) / (P(beer) P(diapers)) > 4 (lift)
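Support and lift are easy to compute directly from a list of transactions. The baskets below are made up for illustration; only the two thresholds (1% support, lift above 4) come from the slide.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def lift(transactions, a, b):
    """P(a, b) / (P(a) * P(b)); 1.0 means a and b are independent."""
    joint = support(transactions, [a, b])
    return joint / (support(transactions, [a]) * support(transactions, [b]))

# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"milk"},
    {"bread", "milk"},
    {"bread"},
    {"chips", "milk"},
    {"bread", "chips"},
    {"milk", "bread", "chips"},
    {"chips"},
]
pair_support = support(baskets, ["beer", "diapers"])
pair_lift = lift(baskets, "beer", "diapers")
is_interesting = pair_support > 0.01 and pair_lift > 4
print(pair_support, pair_lift, is_interesting)
```

Support filters out rare combinations, while lift measures how much more often the items occur together than independence would predict.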
http://www.onehippo.com/en/thankyou - Thank You
Beer? Diapers? Conversions!!!
http://www.onehippo.com/en/thankyou
will a visitor go there?
P(conversion|request log)
what are the relevant “signals”?
which configuration performs best?
Patterns For Conversion
single item: referrer www.google.com
pattern/itemset: visited demo, in 2014 week 4
correlations
Scary Data Structure
Finding Frequent Itemsets
1. Build Frequent Prefix Tree (FPGrowth)
2. Extract patterns relevant for conversion (using contingencies)
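Step 1 can be illustrated with a small prefix tree builder. This sketch covers only the tree construction as it is usually described for FP-Growth (frequent items only, ordered by descending frequency); the mining step over conditional trees is omitted, and the item names and `min_count` value are invented.

```python
from collections import Counter

class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_prefix_tree(transactions, min_count):
    """Insert each transaction into a prefix tree, keeping only frequent
    items and ordering them by descending frequency (ties by name)."""
    frequency = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, n in frequency.items() if n >= min_count}
    root = Node()
    for t in transactions:
        path = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-frequency[item], item),
        )
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def paths(node, prefix=()):
    """Yield (prefix, count) for every node below the root."""
    for child in node.children.values():
        current = prefix + (child.item,)
        yield current, child.count
        yield from paths(child, current)

# Hypothetical request-log items per visitor.
visits = [
    ["referer:google", "visited:demo", "week:4"],
    ["referer:google", "visited:demo"],
    ["referer:google", "week:4"],
    ["visited:demo", "week:4"],
    ["referer:direct"],
]
tree = build_prefix_tree(visits, min_count=2)
prefix_counts = dict(paths(tree))
print(prefix_counts)
```

Note that a node's count is the number of transactions sharing that prefix, not the support of the itemset; FP-Growth recovers true supports by combining branches that end in the same item.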
Pattern Contingency Table
Rows: pattern matches / pattern does not match
Columns: converted / not converted
converted: visited /thankyou
sample pattern: visited demo, in 2014 week 4
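A pattern contingency table can be filled by a single pass over the visitors. The sketch below also computes a chi-square statistic as one possible independence test; the slides do not say which test was used, and the visitor data and item labels are hypothetical.

```python
def contingency(visitors, pattern, converted_item="visited:/thankyou"):
    """2x2 counts: rows = pattern matches / does not match,
    columns = converted / not converted."""
    pattern = set(pattern)
    table = [[0, 0], [0, 0]]
    for items in visitors:
        items = set(items)
        row = 0 if pattern <= items else 1
        col = 0 if converted_item in items else 1
        table[row][col] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator

# Hypothetical visitors, each described by the items in their request log.
visitors = [
    {"visited:demo", "week:2014-04", "visited:/thankyou"},
    {"visited:demo", "week:2014-04", "visited:/thankyou"},
    {"visited:demo", "week:2014-04"},
    {"visited:demo"},
    {"week:2014-04"},
    {"referer:google"},
    {"referer:google"},
    {"referer:google", "visited:/thankyou"},
]
table = contingency(visitors, pattern={"visited:demo", "week:2014-04"})
statistic = chi_square(table)
print(table, round(statistic, 3))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to significance at the 5% level. For counts as small as in this toy example an exact test would be the safer choice.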
Sub-Pattern Filtering
Problem:
When pattern (A, B, C) is relevant, the sub-patterns (A), (B), (C), (A, B), (A, C), (B, C) (likely) also match, e.g. when C is meta-data on page B.
Solution:
Test for independence using a contingency table!
Actionable Insights?
The found itemsets are quite numerous and
seem to contain a lot of redundancy.
But they are certainly interesting, e.g. for a
periodic evaluation.
Personalization
Putting Patterns to Use
Naive A/B Testing
The naive solution:
route some traffic to alternative configuration
A (old config): 80%
B (new config): 20%
run for some time
see if B has relatively more conversions
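The final step, checking whether B has relatively more conversions, is often done with a two-proportion z-test. That choice of test is an assumption here, and the visitor and conversion counts are invented; only the 80/20 split mirrors the slide.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def compare_variants(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return both conversion rates, the z score and a one-sided p-value
    for the hypothesis that B converts better than A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 1.0 - normal_cdf(z)
    return rate_a, rate_b, z, p_value

# Hypothetical outcome of an 80% / 20% split over 10,000 visitors.
rate_a, rate_b, z, p_value = compare_variants(
    visitors_a=8000, conversions_a=160, visitors_b=2000, conversions_b=56
)
print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z = {z:.2f}  p = {p_value:.3f}")
```

Someone still has to decide how long to run the experiment and when to look at the p-value, which is one of the weaknesses of this setup.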
Problems With Naive Solution
if B is drastically worse,
20% of traffic is LOST
marketer must regularly check and decide
when has a new config PROVEN itself?
number of concurrent experiments is LOW
no user context
Scary Math
Predict Conversion
Conversion rate depends on context:
x: the patterns
w: the “weights”
ϕ: cdf of normal dist.
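The formula itself did not survive on this slide. The three symbols suggest a probit model of the form P(conversion | x) = ϕ(w · x), and the sketch below works under that assumption. The bias term, the weights and the pattern names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conversion_probability(weights, bias, matched_patterns):
    """Assumed probit model: x is a 0/1 vector over patterns, so w . x is
    the sum of the weights of the patterns the visitor matches."""
    activation = bias + sum(weights.get(p, 0.0) for p in matched_patterns)
    return normal_cdf(activation)

# Hypothetical weights; a strongly negative bias encodes a low base rate.
weights = {"visited:demo": 0.9, "referer:google": 0.2, "week:2014-04": 0.1}
bias = -2.5

baseline = conversion_probability(weights, bias, [])
with_demo = conversion_probability(weights, bias, ["visited:demo", "week:2014-04"])
print(f"baseline: {baseline:.4f}, after demo visit: {with_demo:.4f}")
```

Patterns without a weight contribute nothing, so pruning a pattern amounts to setting its weight to zero.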
Experimental Setup
Split data set (.org + .com)
1. training set: 189660 visitors, 435 conversions
2. test set: 27013 visitors, 40 conversions
Can We Predict Conversion?
1260 itemsets
[Figure: ROC curve, TPR versus FPR]
@ false positive rate 10%: 96% true positive rate
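A figure such as "TPR at 10% FPR" is read off the ROC curve by sweeping a threshold over the predicted scores. The sketch below shows one way to compute it; the scores and labels are toy values, not the data behind the slide.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate reachable while keeping FPR <= max_fpr.

    Assumes `labels` contains at least one positive and one negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_tpr = 0.0
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every visitor sharing this score so ties are handled together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        if false_pos / negatives > max_fpr:
            break
        best_tpr = true_pos / positives
    return best_tpr

# Toy predictions: 2 converting visitors among 12.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tpr_10 = tpr_at_fpr(scores, labels, max_fpr=0.10)
tpr_0 = tpr_at_fpr(scores, labels, max_fpr=0.0)
print(tpr_10, tpr_0)
```

With only 40 conversions in the test set, each converting visitor moves the true positive rate by 2.5 percentage points, so the reported 96% carries noticeable uncertainty.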
Towards Actionable Insights
Use Automatic Relevance Determination (ARD) to prune the patterns (optimize the prior)
[Figure: prior over the weights (w); labels: σ, μ, relevant, irrelevant]
Top 20 Patterns For Conversion
referer.go.onehippo.com
.pathInfo./resources/whitepapers/forrester-market-overview-web-content-management-systems.html
.pathInfo./resources/whitepapers/cms---a-critical-solution-for-todays-ecommerce.html
.pathInfo./resources/whitepapers/hippo-cms-for-the-enterprise.html
.pathInfo./resources/whitepapers/web-content-management-in-the-cloud.html
.collectorData.channel.One Hippo English Site
.collectorData.audience.terms.
referer.www.onehippo.com
.collectorData.categories.terms.cms
.pathInfo./mobile-cms
.collectorData.channel.One Hippo English Site
.pathInfo./ressourcen/demo
.pathInfo./resources/videos/hippo-cms-grand-tour.html
.collectorData.channel.One Hippo English Site
.collectorData.audience.terms.
.collectorData.categories.terms.cms
.pathInfo./ressources/demo
.pathInfo./what_to_buy/compare.html
referer.www.cmswire.com
.pathInfo./resources/demo
.collectorData.categories.terms.mobile
.pathInfo./resources/whitepapers/understanding-hippo-cms-7-software-architecture.html
.pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
.collectorData.channel.One Hippo English Site
referer.www.google.nl
referer.www.onehippo.com .pathInfo./resources/videos/a-quick-overview-of-hippo-cms-in-just-under-3-minutes.html
.collectorData.categories.terms.repository
.pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
.collectorData.categories.terms.
.collectorData.categories.terms.relevance
Actionable Insights!
We can find a small model that can be used for human interpretation and automated personalization.
Product Challenge
KISS
# parameters should be minimal
Parameters
Recommendations: 1 hyper-param
Personalization: same (1 hyper-param)
NICE!
Questions?
