Click-through filter is a relatively simple, well-constrained and flexible method for improving query returns using clickstream data. This presentation gives a brief overview of what it does, including some evidence of its effectiveness.
Method detail described at http://www.slideshare.net/pontneo/click-through-filter
Analysis & demo (MySQL implementation using medical clickstream data) at http://www.slideshare.net/pontneo/click-through-filterprototyperesultsv2
Solr implementation described at http://www.slideshare.net/pontneo/better-search-implementation-of-click-through-filter-as-a-query-parser-plugin-for-apache-solr-lucene
If you're interested in testing the filter on your site or clickstream data, feel free to contact me or leave a comment.
2. What is {!ctf}?
A relatively simple way to make query returns more efficient, dynamic
and intelligent
Developed to improve results for a medical research database
based in SW UK - Trip (www.tripdatabase.com)
Uses the flow of people through content to help
• organise, extend & filter the items in query returns
• share & develop collective intelligence
• adapt naturally as things change
⇒ Implicit self-organisation without complex data, models or processing
3. {!ctf} stands for click-through filter
Click data
• reflects intelligent choices
• integrates community activity
• adds useful dimensions to content
For example, counting clicks on items and integrating over time
results in “Most Viewed” sections often seen on websites
These reflect collective behaviour and enable a dynamic connection
between users and content
Here’s one for “Rheumatology” documents on Trip varying in response
to click traffic at the beginning of 2015 …
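As a rough illustration of "counting clicks and integrating over time", the sketch below ranks items by time-decayed click counts. The exponential decay, the `half_life_days` parameter and the function name `most_viewed` are assumptions made for this example; the slides do not say how Trip integrates clicks over time.

```python
from collections import defaultdict

def most_viewed(clicks, now, half_life_days=30.0, top_n=5):
    """Rank items by click counts integrated over time.

    clicks: iterable of (item_id, day) pairs, with day as a number
            (for example, days since some fixed origin).
    Each click is weighted by an exponential decay (an assumption),
    so recent clicks count more than old ones.
    """
    scores = defaultdict(float)
    for item, day in clicks:
        age = now - day
        if age < 0:
            continue  # skip clicks dated after "now"
        scores[item] += 0.5 ** (age / half_life_days)
    # Highest score first; ties broken by item id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical click log: "a" was clicked long ago, "b" and "c" recently
clicks = [("a", 0), ("a", 1), ("b", 59), ("b", 60), ("c", 60)]
ranking = most_viewed(clicks, now=60, half_life_days=30.0)
print([item for item, _ in ranking])
```

With a shorter half-life the list responds faster to changes in click traffic, at the cost of being noisier.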
4. Click data also identify the paths users take between items (sometimes
referred to as clickstream)
⇒ Lots of user click traffic connecting items
⇒ Forming a dynamic, bottom-up “Knowledge Map”
⇒ Useful source of time dependent intelligence …
For example, of all clicks in Trip search returns:
• 25% - connect items in the same return
• 70% - connect to items not in the same return
• 15% - user modifies search query
• 55% - intersecting (i.e. related) searches
• 5% - from single click sessions
• 95% - from multiple click sessions (Ñ = 7)
{!ctf} extends this idea
Traffic between items can be recorded, counted, integrated over time, filtered etc.
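A minimal sketch of recording and counting traffic between items, assuming each session is available as an ordered list of clicked item ids. The session format and the name `count_transitions` are assumptions for illustration, not the schema used on Trip.

```python
from collections import Counter

def count_transitions(sessions):
    """Count directed traffic between consecutively clicked items.

    sessions: iterable of lists of item ids, in click order.
    Single click sessions contain no transitions and add nothing.
    """
    traffic = Counter()
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore repeat clicks on the same item
                traffic[(src, dst)] += 1
    return traffic

# Hypothetical sessions
sessions = [["a", "b", "c"], ["a", "b"], ["d"], ["b", "c", "a"]]
traffic = count_transitions(sessions)
print(traffic.most_common())
```

The resulting counts could then be decayed over time or filtered (for example by user group) before use.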
5. Applications: recommendations
These connections can usefully extend queries
For example, {!ctf} personal recommendations use
• a list of items a user has visited
• & movement of other users to/from these items
to identify and rank interesting related items the user hasn’t visited
⇒ Responsive list of intelligent recommendations without
complex models, data or processing …
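The idea can be sketched as follows, assuming traffic is held as a mapping from (source, destination) pairs to click counts. The scoring formula is an assumption: the slides describe a specificity parameter cg that reduces the ranking of widely connected, generally popular items, but do not give its exact form, so dividing by total traffic raised to the power `cg` is only one plausible choice.

```python
from collections import defaultdict

def recommend(visited, traffic, cg=0.0, top_n=10):
    """Rank unvisited items by click traffic to/from a user's visited items.

    visited: item ids the user has already visited.
    traffic: dict mapping (src, dst) -> click count.
    cg:      hypothetical specificity exponent; 0 ranks by raw traffic,
             larger values penalise items with a lot of traffic overall.
    """
    visited = set(visited)
    total = defaultdict(float)   # all traffic touching each item
    linked = defaultdict(float)  # traffic between a candidate and visited items
    for (src, dst), n in traffic.items():
        total[src] += n
        total[dst] += n
        if src in visited and dst not in visited:
            linked[dst] += n
        elif dst in visited and src not in visited:
            linked[src] += n
    scores = {item: n / (total[item] ** cg) for item, n in linked.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical traffic: "pop" is generally popular, "x" is specifically linked
traffic = {("a", "x"): 3, ("x", "a"): 1, ("a", "pop"): 8,
           ("pop", "q"): 95, ("b", "x"): 2}
print(recommend(["a", "b"], traffic, cg=0.0))
print(recommend(["a", "b"], traffic, cg=1.0))
```

In this toy data the generally popular item ranks first at `cg=0.0`, while the more specifically connected item ranks first at `cg=1.0`.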
6. How good are {!ctf} recommendations?
{!ctf} on the famous Movielens 1M dataset produces comparable precision, recall and ranking to more complex methods (see Lu, L. et al. (2012). Recommender Systems. Physics Reports, 519(1), pp.1-49)
[Figure: Mean Average Precision (y-axis, 0 to 0.7) against cg, the {!ctf} specificity param (x-axis, 0 to 1), by source of recommendations: all traffic; to/from high ratings only; high ratings, same demographic; most popular items]
Mean Average Precision of top 100 recommendations calculated from each user’s 5 highest rated items, for some different components of traffic and at different values of cg. Cg reduces the ranking of widely connected, generally popular items, boosting more specifically connected recommendations.
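For readers unfamiliar with the metric, here is a minimal sketch of Mean Average Precision over top-k lists. Several normalisation conventions exist; dividing by min(number of relevant items, k) is an assumption here, as is the definition of "relevant", since the slides do not state which items were counted as hits.

```python
def average_precision(ranked, relevant, k=100):
    """Average precision of the top-k ranked items against a relevant set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / min(len(relevant), k)

def mean_average_precision(cases, k=100):
    """Mean of average precision over (ranked, relevant) pairs, one per user."""
    scores = [average_precision(ranked, relevant, k) for ranked, relevant in cases]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical recommendation lists for two users
cases = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
print(mean_average_precision(cases))
```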
7. How good are {!ctf} recommendations?
BUT, high precision means boring & pointless recommendations
For movies, the “interesting” recommendations are those connected by enough (but not too much)
click traffic. Hence, lower precision can produce better results. An active feedback loop helps
dynamic, shared interest communities find the right balance themselves.
[Figure: fraction of top 100 recommendations (y-axis, 0 to 1) against cg, the {!ctf} specificity param (x-axis, 0 to 1), split into: below average rating; stuff user already knows; obscure; interesting]
8. Applications: search
Click data helps to improve search efficiency
Because click rates diminish extremely quickly down a list, there are large improvements from getting the top of a search return “right”
The following {!ctf} search
• boosts recently more visited items
• injects items strongly related to returned items by lots of
recent click traffic (i.e. “recommendations”)
• responds to changing (in this case, unfiltered) click traffic
over time
This helps improve both precision and recall, again without complex
models, data or processing …
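The two mechanisms listed above (boosting recently visited items and injecting strongly related ones) can be sketched as below. This is not the Solr plugin's actual scoring: the function name `ctf_search`, the `boost` and `inject` weights and the way scores are combined are all assumptions chosen to keep the example short.

```python
def ctf_search(text_scores, recent_clicks, traffic, boost=0.5, inject=0.5, top_n=10):
    """Re-rank a text match return using recent click data.

    text_scores:   dict item -> text match score for the query.
    recent_clicks: dict item -> recent (for example time-decayed) click count.
    traffic:       dict (src, dst) -> recent click traffic between items.
    boost, inject: hypothetical weights for the two mechanisms.
    """
    # 1. Boost returned items that were visited more often recently
    scores = {item: score * (1.0 + boost * recent_clicks.get(item, 0.0))
              for item, score in text_scores.items()}
    # 2. Inject items outside the return that share traffic with returned items
    injected = {}
    for (src, dst), n in traffic.items():
        for inside, outside in ((src, dst), (dst, src)):
            if inside in text_scores and outside not in text_scores:
                injected[outside] = (injected.get(outside, 0.0)
                                     + inject * n * text_scores[inside])
    scores.update(injected)  # injected items are never in text_scores
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical data: d3 does not match the text query but is linked to d2
text_scores = {"d1": 1.0, "d2": 0.8}
recent_clicks = {"d2": 2.0}
traffic = {("d2", "d3"): 3, ("d4", "d5"): 10}
results = ctf_search(text_scores, recent_clicks, traffic)
print([item for item, _ in results])
```

Because the click inputs change over time, the same query can return a different ordering from one day to the next.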
9. How good is {!ctf} search?
On Trip data, 1.5 to 4+ times more efficient than the underlying text match algorithm (simple, unboosted TF-IDF on titles)
Overall average on Trip (before feedback & at various {!ctf} settings) is 1.83 and does not fall below 1.
See http://www.slideshare.net/pontneo/click-through-filterprototyperesultsv2 for details
These dynamics are not noise … simply a direct reflection of the active interests in the relevant part of the information space
10. A simple illustration: Ebola in 2014
[Figure: “Ebola” interest on Google Trends (../News/Health/Infectious Diseases), 01/01/2014 to 05/02/2015]
11. A simple illustration: Ebola in 2014
[Figure: clicks on “Ebola” documents in Trip (a tiny signal in the total Trip traffic), 01/01/2014 to 05/02/2015]
12. A simple illustration: Ebola in 2014
[Figure: “Ebola” documents in top 10 {!ctf} search results for “hemorrhagic fever” on Trip (sum relevance), 01/01/2014 to 05/02/2015; over 60% as “recommendations”]
13. Conclusions
Modifying query responses using click data ⇒
• clear efficiency improvements
• intelligent, responsive content delivery
• efficient knowledge sharing
Low level, high frequency collaborative method that “naturally” brings users and information together at the right level
Forms an implicit self-organising feedback loop that results in continual evolution of responses & communities without complex data or methods
For more detail see “Click-Through Filter” e.g. http://www.slideshare.net/pontneo/better-search-implementation-of-click-through-filter-as-a-query-parser-plugin-for-apache-solr-lucene