Personalized Search
Building a prototype to infer the user's interest
Tom Burgmans
Technology Product Owner Search
Wolters Kluwer
April 28th 2022
Make search better through Personalization… a controversial topic
• If you know the users, you could improve their search
• There is too little traffic to build user profiles
• Our competitors claim to have it too…
• Our users don’t need it, they work on different cases every day
• Our customers expect our search to learn what they need
• Our search is already personal… via subscription filters
• This could destroy the user’s trust
Wikipedia says:
Personalized search is web based search results that are tailored
specifically to an individual's interests by incorporating information
about the individual beyond the specific query provided.
What could make search “personal”
Search
bonus depreciation
Activity
• Past queries
• Document interactions
• Saved preferences
• Filter actions
Demographics
• Nationality
• Gender
• Age
• Profession
Context
• Location
• Date/time
• Device
• Weather
Social
• Network connections
• Loyalty
• Shared information
Factors to infer the user’s interest
What could make search “personal” (this PoC)
Search
bonus depreciation
Activity
• Past queries
• Document interactions
• Saved preferences
• Filter actions
Context
• Location
• Date/time
• Device
• Weather
Demographics
• Nationality
• Gender
• Age
• Profession
Social
• Network connections
• Loyalty
• Shared information
Factors to infer the user’s interest
Hypothesis
The basis for Personalized Search is a Recommendation Engine built on an index of
user activity, which makes it possible to infer the user’s interest via collaborative filtering.
(Figure: the user’s likes, the likes of users with similar interests, the likes of everybody else, and the resulting recommended items.)
Collaborative filtering & anomaly detection
• Find users with similar interests
• Recommend items with an unusually high presence in this group compared to the rest
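The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's implementation: the Jaccard similarity, the `min_sim` threshold, and the "foreground share minus background share" score are all choices made for this sketch, and the user and document names are made up.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Overlap between two users' liked-item sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, likes: dict, min_sim: float = 0.3) -> list:
    """Rank items that are over-represented among users similar to `target`."""
    mine = likes[target]
    others = {u: items for u, items in likes.items() if u != target}
    # Foreground: users whose interests overlap enough with the target user.
    foreground = {u for u, items in others.items() if jaccard(mine, items) >= min_sim}
    background = set(others) - foreground
    if not foreground:
        return []
    fg_counts = Counter(i for u in foreground for i in others[u])
    bg_counts = Counter(i for u in background for i in others[u])
    scored = []
    for item, fg in fg_counts.items():
        if item in mine:
            continue  # the user already knows this item
        fg_share = fg / len(foreground)
        bg_share = bg_counts[item] / len(background) if background else 0.0
        # Keep only items that are more common in the foreground than in the rest.
        if fg_share > bg_share:
            scored.append((item, fg_share - bg_share))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical example data.
likes = {
    "ann": {"doc1", "doc2"},
    "bob": {"doc1", "doc2", "doc3"},
    "cat": {"doc1", "doc2", "doc3", "doc4"},
    "dan": {"doc7", "doc8"},
    "eve": {"doc4", "doc8"},
}
print(recommend("ann", likes))
```

In this toy data, doc4 is not recommended to "ann" because it is just as common among the other users as among the similar ones, which is the point of comparing the group with the rest.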
(Figure: a user’s likes next to the likes of users with similar interests.)
Collaborative filtering & selecting the foreground
• Find only users with very similar interests
• Balance between similarity and volume
(Figure: users ordered from very similar to not so similar; the most similar users form the foreground, the rest the background.)
What is the ideal time window to infer the user’s interest?
• year
• “session”
• last N actions
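The shortest window, the last N actions, could be kept as a fixed-size sliding window. The sketch below is an assumption about how such a window might be tracked, not the prototype's code; the class name `RecentActions` and the use of practice-area labels as the recorded signal are hypothetical.

```python
from collections import Counter, deque


class RecentActions:
    """Keeps only the last `n` actions of a user and summarizes them."""

    def __init__(self, n: int = 5):
        # A deque with maxlen silently drops the oldest entry when full.
        self._actions = deque(maxlen=n)

    def record(self, practice_area: str) -> None:
        self._actions.append(practice_area)

    def interest(self) -> list:
        """Practice areas in the window, most frequent first."""
        return Counter(self._actions).most_common()


window = RecentActions(n=3)
for area in ["tax", "tax", "labor", "ip", "ip"]:
    window.record(area)
print(window.interest())  # the two early "tax" actions have left the window
```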
Example: extremely temporary personalization (figure with steps 1 to 5)
How to apply the inferred user’s interest?
• as a boost: query + personal recommendations → boost → results
• as a filter: query + personal recommendations → filter → results
• as query suggestions: query + personal recommendations → autocomplete
• as a separate result set: query → results; personal recommendations → results
• as alerts: personal recommendations → alerts
• as interesting items: personal recommendations → results
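For the boost and filter variants, one way to picture the difference is in the request parameters sent to Solr. The sketch below assumes an eDisMax query, where `bq` adds a boost query and `fq` restricts the result set; the field name `practice_area`, the helper name, and the weight are hypothetical and not taken from the prototype.

```python
from urllib.parse import urlencode


def personalized_params(query, practice_areas, mode="boost", weight=2.0):
    """Build Solr request parameters that apply recommended practice areas.

    Values are inserted as-is; a real implementation would need to escape them.
    """
    params = [("q", query), ("defType", "edismax")]
    if not practice_areas:
        return params  # nothing inferred, fall back to the plain query
    clause = " OR ".join(f'practice_area:"{pa}"' for pa in practice_areas)
    if mode == "boost":
        # Matching documents rank higher, but nothing is excluded.
        params.append(("bq", f"({clause})^{weight}"))
    elif mode == "filter":
        # Only matching documents remain in the result set.
        params.append(("fq", clause))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return params


print(urlencode(personalized_params("bonus depreciation", ["tax"])))
print(urlencode(personalized_params("bonus depreciation", ["tax"], mode="filter")))
```

A boost is the more forgiving option: a wrong inference only reorders results, while a filter based on a wrong inference hides documents from the user.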
Goals of this PoC
• Technology: How to build it? What can we do on our existing technology stack?
• Need: In what cases to apply it? Where is the need for personalization? What form is needed?
• Data: What data do we need? Define ‘additional’ data sources; cleaning of noise
• Verification: How to measure success? Can we test/tune it offline? How to A/B test online?
Finding a technology for anomaly detection
JLH score in Elastic = (fg_percentage - bg_percentage) * (fg_percentage / bg_percentage)
foreground background
Significant Terms
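As a worked example of the JLH formula above, here is a minimal Python version. It assumes that fg_percentage and bg_percentage are the shares of foreground and background documents containing the term, computed here as fractions between 0 and 1; the guard that returns 0 when the term is not more frequent in the foreground is a choice made for this sketch.

```python
def jlh_score(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """(fg_percentage - bg_percentage) * (fg_percentage / bg_percentage)."""
    fg_pct = fg_count / fg_size
    bg_pct = bg_count / bg_size
    # Sketch-level guard: no score for terms that are absent from the
    # background or not over-represented in the foreground.
    if bg_pct == 0 or fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# Hypothetical counts: a term in 30 of 100 foreground documents
# and in 100 of 10,000 background documents.
print(jlh_score(30, 100, 100, 10_000))
```

The score combines the absolute change (0.30 versus 0.01) with the relative change (30 times more frequent), so it favors terms that are both common in the foreground and rare elsewhere.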
Solr's SignificantTerms score = (log(freq_foreground)+1) * (log( (numDocs + 1)/(freq_background + 1) ) + 1)
foreground background
Significant Terms
(SignificantTermsStream.java)
Stream source
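The Solr formula above translates directly to Python. The function and argument names are chosen for this sketch; the logarithm is taken as the natural logarithm, and freq_foreground is assumed to be at least 1.

```python
import math


def solr_significant_terms_score(freq_foreground: int, freq_background: int, num_docs: int) -> float:
    """(log(freq_foreground) + 1) * (log((numDocs + 1) / (freq_background + 1)) + 1)."""
    foreground_part = math.log(freq_foreground) + 1.0
    rarity_part = math.log((num_docs + 1) / (freq_background + 1)) + 1.0
    return foreground_part * rarity_part


# Hypothetical counts: same foreground frequency, different background frequency.
rare = solr_significant_terms_score(30, 100, 10_000)
common = solr_significant_terms_score(30, 5_000, 10_000)
print(rare, common)
```

The second factor behaves like an inverse document frequency: with the same foreground frequency, the term that is rarer in the background gets the higher score.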
Z-score = (fg_count - fg_size * bg_prob) / sqrt(fg_size * bg_prob * (1 - bg_prob))
where bg_prob = bg_count / bg_size and fg_prob = fg_count / fg_size
foreground background
relatedness() JSON Facet function, a.k.a. Semantic Knowledge Graph
(RelatednessAgg.java)
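A small Python version of the z-score makes the formula easier to check by hand. It stops at the raw z-score; Solr's documentation describes the value returned by relatedness() as normalized to a range between -1 and 1, a step that is not reproduced here. The zero-denominator guard is an addition of this sketch.

```python
import math


def relatedness_z(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """How far fg_count deviates from what the background rate would predict."""
    bg_prob = bg_count / bg_size
    expected = fg_size * bg_prob  # expected foreground count under the background rate
    spread = math.sqrt(fg_size * bg_prob * (1 - bg_prob))
    if spread == 0:
        return 0.0  # sketch-level guard: term in none or all background documents
    return (fg_count - expected) / spread


# Hypothetical counts: 30 of 100 foreground docs versus 1,000 of 10,000 background docs.
print(relatedness_z(30, 100, 1_000, 10_000))
```

With a background rate of 10%, 10 of the 100 foreground documents would be expected to contain the term; seeing 30 is 20 above that, measured against a spread of 3.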
Selecting similar users
Wrapping mlt into mltplus
• MLT: the standard Solr query parser that finds all similar documents
• MLTPLUS: a home-made query parser that keeps only the documents that are very similar
(Figure: users ordered from very similar to not so similar.)
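How MLTPLUS decides what counts as "very similar" is not described on the slide. One possible rule, purely an assumption for illustration, is to keep only hits whose similarity score is close to the best score, with a minimum number of hits as a floor so that the foreground does not become too small. The function name and both parameters are hypothetical.

```python
def keep_very_similar(hits, min_ratio=0.8, min_hits=2):
    """Keep hits scoring at least `min_ratio` of the top score.

    `hits` is a list of (id, score) pairs, for example from a
    more-like-this query. If too few hits pass the cut, fall back to the
    top `min_hits` so there is still enough volume to compare against.
    """
    if not hits:
        return []
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    top_score = ranked[0][1]
    kept = [hit for hit in ranked if hit[1] >= min_ratio * top_score]
    if len(kept) < min_hits:
        kept = ranked[:min_hits]
    return kept


# Hypothetical similarity scores for four users.
hits = [("u3", 5.0), ("u1", 10.0), ("u4", 1.0), ("u2", 9.0)]
print(keep_very_similar(hits))
print(keep_very_similar(hits, min_ratio=0.95, min_hits=3))
```

The two parameters express the trade-off between similarity and volume: a high `min_ratio` gives a purer foreground, a higher `min_hits` gives more data to detect over-represented items.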
Design recommendation engine
Source: Wolters Kluwer Navigator 2021 customer usage data.
DWH usage data:
• userid (encrypted)
• Time stamp
• behavior:
  • # of doc views
  • Viewed documents
  • Publications
  • doc types
  • practice areas
  • The strongly liked documents
    • Favorites
    • Prints
    • Downloads
    • email
  • Queries
  • The applied filters
Recommendation engine on 88,208 users (a year’s activity per user)
• recommends documents, publications, practice areas, document types and queries
• given a user’s year
Recommendation engine on 2,582,900 sessions (activity per session)
• recommends documents, publications, practice areas, document types and queries
• given a user’s (previous) session, or given a user’s last N actions
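To make the idea of an index of user activity concrete, the sketch below groups raw usage events into one record per session. The event format, the field names, and the `SessionActivity` class are assumptions made for this illustration; the actual DWH schema and index layout are not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SessionActivity:
    """One indexable record: everything a user did in a single session."""
    user_id: str
    session_id: str
    documents: set = field(default_factory=set)
    practice_areas: set = field(default_factory=set)
    queries: list = field(default_factory=list)


def build_session_index(events):
    """Group usage events (dicts) by session id."""
    index = {}
    for event in events:
        key = event["session_id"]
        if key not in index:
            index[key] = SessionActivity(event["user_id"], key)
        record = index[key]
        if event["type"] == "view":
            record.documents.add(event["doc_id"])
            record.practice_areas.add(event["practice_area"])
        elif event["type"] == "query":
            record.queries.append(event["text"])
    return index


# Hypothetical usage events.
events = [
    {"user_id": "u1", "session_id": "s1", "type": "query", "text": "bonus depreciation"},
    {"user_id": "u1", "session_id": "s1", "type": "view", "doc_id": "d42", "practice_area": "tax"},
    {"user_id": "u2", "session_id": "s2", "type": "view", "doc_id": "d7", "practice_area": "labor"},
]
for record in build_session_index(events).values():
    print(record)
```

The same grouping keyed on user id instead of session id would give the per-user, year-long variant.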
Personalization prototype
Components: Demo app, Personalization Service, Recommendation engine (88,208 users; 2,582,900 sessions)
Inputs:
• user’s (last) year ID: long-term personalization
• user’s (last) session ID: short-term personalization
• user’s current actions: very short-term personalization
• content of personal folder
Recommended:
• documents
• publications
• practice areas
• document types
• queries
What is the need for personalization based on a year of history?
How consistent is the user’s interest through the year?
• 32% of users: > 20 actions / year
• 68% of users: < 20 actions / year
Of the 32% most active users, 66% have (relatively) consistently viewed documents from the same 2 practice areas.
So for 21% of all users (66% of 32%), recommending practice areas (PAs) based on the past year’s activity might make sense.
What is the need for personalization based on the last session?
For 42% of the users, the previous session has a 0.9-1.0 cosine overlap with the current one w.r.t. PA interest.
This rises to 66% when the time between sessions gets shorter.
How much does the last session predict the interest of the next one?
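The cosine overlap between two sessions can be computed on their practice-area counts. The sketch below assumes each session is summarized as a count of document views per practice area; that representation and the example numbers are assumptions for illustration, not the data behind the percentages above.

```python
import math
from collections import Counter


def cosine_overlap(a: Counter, b: Counter) -> float:
    """Cosine similarity of two practice-area count vectors (0.0 to 1.0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty session overlaps with nothing
    return dot / (norm_a * norm_b)


# Hypothetical sessions: same mix of interests, different amount of activity.
previous = Counter({"tax": 3, "labor": 1})
current = Counter({"tax": 6, "labor": 2})
unrelated = Counter({"ip": 4})
print(cosine_overlap(previous, current))
print(cosine_overlap(previous, unrelated))
```

Cosine overlap ignores how long a session was and only compares the mix of practice areas, which is why a short session can still fully match a long one.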
How to measure success?
• Offline:
  • Manual tests by a domain expert
  • Build a model that predicts the likelihood that the previous session predicts the interest of the current one.
  • Replay user sessions and guesstimate how much shorter they could have been if personalization had been applied.
• Online:
  • Measure clicks on explicit recommendations
  • A/B test (in case recommendations are applied as boosts/filters)
Lessons learned
Quality logs needed
The accuracy of the recommendations depends on the purity/completeness of the usage logs. Preferred: front-end usage logging.
We’re not done yet
Improvement opportunities:
- Use front-end logs
- Better data cleaning
- Query normalization
- Add more signals like filters & autocomplete clicks
- Testing strategies
Challenge: testing
Offline testing of personalization is hard, but slightly easier for the short-term variant. Online testing takes time.
What works best
Short-term personalization is applicable to more users than long-term personalization (may differ per business case).
Yes we can
We can indeed infer the user’s interest via collaborative filtering of user actions, using Solr and using usage data from a large Wolters Kluwer product.
