Clustering as presented at UX Poland 2013

Categorizing Your Search Queries to Improve Findability
Ravi Mynampaty
Copyright © President & Fellows of Harvard College.

About this talk…
• Case study on how we are improving search and browse by performing clustering exercises on search query data
• Not rocket science
• High-level overview
• You can follow this method, adding your own insights and tweaks
• You can kick this off at work next week

Inspired by…
• Chapters 8 & 9
• The power of incrementalism

What is clustering?
A process for organizing and analyzing search log data that:
• Is repeatable, low-cost, scalable, and simple
• Yields actionable results
• Supports constant, incremental improvement to search

What’s clustering good for?
• Ensures results for high-frequency queries
• Improves metadata and taxonomy
• Informs and validates decision making in site IA
• Informs editorial/curatorial activities
• Provides feedback for search suggestions
o Autosuggest, synonym lists, no-hits page suggestions
But more on this later…

So how do I cluster search queries?
A simple set of steps:
1. Create query report
2. Cluster queries
3. Determine # of queries to analyze
4. Analyze clusters
5. Draw conclusions and ACT

Step 1: Create a query report
We started with the site with the most traffic
• Upper-bound limit
• One year’s data by quarter
• Cut off tail at frequency < 10

HBS Working Knowledge FY12 Usage Snapshot
Overall Traffic
Page Views: 6,439,485
Visits: 3,635,746
Unique visitors: 2,734,620
On-site searches: 174,425
Views per Visit: 1.77
Local Search visit rate: 5%
Organic Search visit rate: 46%
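The Step 1 report can be sketched in a few lines of Python. This is a minimal sketch, not the team's tooling: it assumes the search log has already been exported as (date, query) pairs, buckets by calendar quarter (a fiscal year such as FY12 would need an offset), and applies the cutoff per quarter by dropping queries seen fewer than 10 times. The names quarter_of and query_report, and the sample log, are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

def quarter_of(day: date) -> str:
    """Label a date by calendar quarter, e.g. '2012-Q2' (calendar quarters are an assumption)."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"

def query_report(log, min_freq=10):
    """Count queries per quarter and drop the tail below min_freq.

    `log` is an iterable of (date, query) pairs, a hypothetical export format.
    Returns {quarter: [(query, count), ...]} with the most frequent queries first.
    """
    per_quarter = defaultdict(Counter)
    for day, query in log:
        per_quarter[quarter_of(day)][query] += 1
    return {
        quarter: [(q, n) for q, n in counts.most_common() if n >= min_freq]
        for quarter, counts in per_quarter.items()
    }

# Hypothetical sample log.
sample = (
    [(date(2012, 1, 5), "innovation")] * 12
    + [(date(2012, 2, 1), "ipad")] * 3
    + [(date(2012, 4, 2), "leadership")] * 10
)
print(query_report(sample))
# {'2012-Q1': [('innovation', 12)], '2012-Q2': [('leadership', 10)]}
```

Here "ipad" falls below the cutoff and disappears from the report, which is exactly the long tail the slide trims.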

Step 2 (cont’d): Three levels of clustering
Level | Method | Example
Narrow | Simple normalization | Eliminate grammatical, spelling, typo, and punctuation differences
Mid-level | Group by subject | management, finance, decision making
Broad | Group by facet | topic, name, date, content type
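Narrow clustering amounts to computing a normalization key and merging the counts of queries that share it. The sketch below handles only case, punctuation, and spacing; grammatical variants and typos need extra rules or human review. The function names and sample counts are hypothetical.

```python
import re
from collections import Counter

def normalize(query: str) -> str:
    """Narrow clustering key: ignore case, punctuation, and extra spaces."""
    q = query.casefold()
    q = re.sub(r"[^\w\s]", " ", q)  # replace punctuation with spaces
    return " ".join(q.split())      # collapse runs of whitespace

def narrow_clusters(counts):
    """Merge the frequencies of queries that share a normalized form."""
    merged = Counter()
    for query, n in counts.items():
        merged[normalize(query)] += n
    return merged

print(narrow_clusters({"Balanced Scorecard": 10, "balanced-scorecard": 3, "BALANCED scorecard?": 2}))
# Counter({'balanced scorecard': 15})
```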

Step 2 (cont’d): Levels → Tasks Enabled
Tasks: improve your base for query analysis; ensure representation of major clusters on your site; improve metadata/index/taxonomy; improve search suggestions
Narrow (simple): X X X
Mid-level (group by subject): X X X
Broad (group by facet): X X

Step 2 (cont’d): Narrow Clustering Example
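Narrow clustering also folds spelling variants and typos into a single query. The sketch below swaps in Python's difflib similarity ratio as a stand-in for the human judgment the talk relies on: queries are visited from most to least frequent, and each one joins the most similar, more popular query if the ratio clears an assumed cutoff of 0.9. The "negotiation" and "ethics" counts come from the WK top-cluster table; the misspelled counts are invented. Treat the output as candidates to review, because short queries that differ by a letter or two can clear the cutoff even when they mean different things.

```python
import difflib

def merge_typos(counts, cutoff=0.9):
    """Fold likely misspellings into a more frequent, similar query.

    difflib's similarity ratio is a swapped-in stand-in for human review;
    the 0.9 cutoff is an assumption to tune on your own data.
    """
    heads = []   # queries kept as cluster heads, most frequent first
    merged = {}
    for query, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        match = difflib.get_close_matches(query, heads, n=1, cutoff=cutoff)
        head = match[0] if match else query
        if not match:
            heads.append(query)
        merged[head] = merged.get(head, 0) + n
    return merged

print(merge_typos({"negotiation": 470, "negotation": 12, "negociation": 5, "ethics": 448}))
# {'negotiation': 487, 'ethics': 448}
```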

Step 2 (cont’d): Mid-level Example
Cluster brand
branding 245
brand 160
brand management 73
consumer branding 57
global brand 32
service brands 24
brand image retail bank 17
employer branding 16
brand management professional services 16
global branding 13
b2b branding 13
importance of branding 12
brand 2002 12
brand equity 11
brand image 11
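Adding up the slide's numbers shows what the cluster is worth: the 15 queries above total 712 searches. The sketch below assumes a simple, hypothetical keyword rule (a query joins a subject if it contains one of that subject's keywords); in practice the grouping is a human call, and plain substring matching will also catch unrelated words that happen to contain a keyword.

```python
from collections import defaultdict

# Query frequencies from the "brand" cluster above.
BRAND_QUERIES = {
    "branding": 245, "brand": 160, "brand management": 73,
    "consumer branding": 57, "global brand": 32, "service brands": 24,
    "brand image retail bank": 17, "employer branding": 16,
    "brand management professional services": 16, "global branding": 13,
    "b2b branding": 13, "importance of branding": 12, "brand 2002": 12,
    "brand equity": 11, "brand image": 11,
}

def group_by_subject(counts, subjects):
    """Assign each query to the first subject whose keyword appears in it.

    `subjects` maps a cluster label to trigger keywords (a hypothetical rule
    standing in for human judgment); unmatched queries go to "unclustered".
    """
    clusters = defaultdict(lambda: {"total": 0, "queries": []})
    for query, n in counts.items():
        label = next(
            (name for name, keywords in subjects.items()
             if any(k in query for k in keywords)),
            "unclustered",
        )
        clusters[label]["total"] += n
        clusters[label]["queries"].append(query)
    return dict(clusters)

brand = group_by_subject(BRAND_QUERIES, {"brand": ["brand"]})["brand"]
print(brand["total"], len(brand["queries"]))  # 712 15
```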

Step 2 (cont’d): Broad Clustering Example

Step 2 (cont’d): List of facets we used
Facet | Example
content type | case studies, cases, working papers, articles, newspaper
date | 2011, world in 2030
demographic characteristics | women, Gen Y, gender, baby boomers
event | economic crisis
format | podcast, video
geographic area | india, japan, mount everest
industry | global wine industry
job type/role | independent director, entrepreneur, ceo, phd economist
organization name | ikea, zara, toyota
person name | michael porter, kanter, sebenius
product name / brand name | ipad
product/commodity | coffee, wine, cement
topic | covers the majority of keywords
work | faculty work, e.g. publication name, title of a case
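Facet tagging can start from a small lexicon seeded with the examples in the table. The sketch below is an assumption, not the team's tagger: whole-word lexicon matches assign facets, a four-digit year (1900 to 2099) assigns date, and anything unmatched falls back to topic, which the table notes covers most keywords. FACET_LEXICON and facets_for are hypothetical, and queries are assumed to be normalized already.

```python
import re

# Hypothetical lexicon seeded with example terms from the facet table.
FACET_LEXICON = {
    "content type": {"case studies", "cases", "working papers", "articles"},
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
    "product/commodity": {"coffee", "wine", "cement"},
}
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")  # assumed rule for the date facet

def facets_for(query, lexicon=FACET_LEXICON, default="topic"):
    """Return the sorted facets a normalized query belongs to."""
    padded = f" {query.lower()} "  # pad so terms only match as whole words
    hits = {facet for facet, terms in lexicon.items()
            if any(f" {term} " in padded for term in terms)}
    if YEAR.search(query):
        hits.add("date")
    return sorted(hits) or [default]

for q in ["toyota japan", "world in 2030", "balanced scorecard"]:
    print(q, "->", facets_for(q))
# toyota japan -> ['geographic area', 'organization name']
# world in 2030 -> ['date']
# balanced scorecard -> ['topic']
```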

Step 3: Choose #clusters to analyze
Number of clusters analyzed | Analyze top hits | Improve metadata/taxonomy/index | Supply search suggestions
50 | X | |
150 | X | X |
300+ | X | X | X

A small number of clusters can cover much of your data
Number of top clusters | % of total queries
Top 20 clusters | 14
Top 30 clusters | 18
Top 50 clusters | 26
Top 100 clusters | 37
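Coverage figures like these are a running sum over clusters sorted by size. A minimal sketch, assuming you already have a total per cluster; to get a share of all queries, as the table does, the input must also include unclustered queries (each as its own cluster) so the denominator is complete. The function name and toy numbers are hypothetical.

```python
def coverage(cluster_totals, top_ns=(20, 30, 50, 100)):
    """Percent of all query volume captured by the N largest clusters."""
    sizes = sorted(cluster_totals.values(), reverse=True)
    grand_total = sum(sizes)
    return {n: round(100 * sum(sizes[:n]) / grand_total, 1) for n in top_ns}

# Toy totals, for illustration only.
print(coverage({"innovation": 50, "leadership": 30, "ipad": 20}, top_ns=(1, 2)))
# {1: 50.0, 2: 80.0}
```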

Now you have your clusters…
What do you do with them?
TAKE ACTION!

Analyze Top (“Short Head”) Clusters
Clustering has created a condensed and reliable list of your top search queries:
• Are they what you thought they would be?
• Does the information on your site accurately represent the top searches?
• Are you fulfilling user needs?

Use your clusters: Improve Site Navigation
Examine the short head of clusters:
• For each cluster, add up the frequencies of its queries
• Reorder clusters by cumulative frequency, descending
• Ensure top clusters are accounted for in your navigation
• Use cluster topics as browse/navigation headers/footers for your website
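The first two steps are a group-and-sort, and the third is a set difference against your navigation labels. A minimal sketch with hypothetical inputs: query_counts maps queries to frequencies, cluster_of maps each query to its cluster label, and the navigation labels are invented for the example.

```python
from collections import Counter

def rank_clusters(query_counts, cluster_of):
    """Sum query frequencies per cluster; return (cluster, total) pairs, largest first."""
    totals = Counter()
    for query, n in query_counts.items():
        # Queries with no cluster assignment are kept as their own cluster.
        totals[cluster_of.get(query, query)] += n
    return totals.most_common()

def missing_from_nav(ranked, nav_labels, top=10):
    """Return top clusters that have no navigation label (case-insensitive match)."""
    labels = {label.lower() for label in nav_labels}
    return [cluster for cluster, _ in ranked[:top] if cluster.lower() not in labels]

ranked = rank_clusters(
    {"bsc": 40, "balanced scorecard": 60, "ethics": 30, "leadership": 80},
    {"bsc": "balanced scorecard"},
)
print(ranked)  # [('balanced scorecard', 100), ('leadership', 80), ('ethics', 30)]
print(missing_from_nav(ranked, ["Leadership", "Innovation"]))  # ['balanced scorecard', 'ethics']
```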

Working Knowledge (WK) Top Clusters
Cluster Frequency
innovation 867
balanced scorecard 794
leadership 570
cases 545
social media 508
negotiation 470
knowledge management 457
ethics 448
apple 430
corporate social responsibility 398

Use your clusters: Improve Taxonomy
• Missing categories in browse taxonomy:
o “Balanced Scorecard”
o “Ethics”
o “Social media”
• Second-level topics in the WK context

Mid-level clustering:
Informs editorial/curatorial activities
• “Featured Topics”
o What topics to highlight this week/month/year
o News items to focus on
o What research guides to create
o How to formulate queries for the topics

How about improving search?
• The clustered list provides synonyms for the taxonomy
• Requires human judgment and standards/guidelines for synonyms – in our case, synonyms are exact
• Map each synonym set to one “like term” in the search engine
Example:
Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
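Mapping variants to one preferred term is a reverse lookup. The sketch below takes the Balanced Scorecard example and rewrites whole queries to the preferred term; it assumes exact, whole-query synonyms (in line with the "synonyms are exact" policy) and ignores case and surrounding spaces. build_lookup and canonical are hypothetical names, and how the mapping is loaded into a particular search engine depends on that engine and is not shown.

```python
# Synonym ring from the example above: {preferred term: [variants]}.
SYNONYMS = {
    "Balanced Scorecard": ["BSC", "Balanced score card", "kaplan and norton"],
}

def build_lookup(synonyms):
    """Invert {preferred: [variants]} into a case-insensitive {variant: preferred} map."""
    lookup = {}
    for preferred, variants in synonyms.items():
        for term in [preferred, *variants]:
            lookup[term.casefold()] = preferred
    return lookup

def canonical(query, lookup):
    """Rewrite a whole query to its preferred term; unknown queries pass through unchanged."""
    return lookup.get(query.strip().casefold(), query)

lookup = build_lookup(SYNONYMS)
print(canonical("bsc", lookup))                  # Balanced Scorecard
print(canonical(" Kaplan and Norton ", lookup))  # Balanced Scorecard
print(canonical("ethics", lookup))               # ethics
```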

Use your clusters: Improve no-hits page
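One way clusters could feed a no-hits page is by suggesting nearby cluster labels. The sketch below is a hypothetical approach, not one described in the talk: it offers cluster labels spelled like the failed query (difflib with an assumed cutoff of 0.8) and otherwise falls back to the most popular clusters. The input reuses the WK top-cluster list; no_hits_suggestions is an illustrative name.

```python
import difflib

# Top clusters and frequencies from the WK table above.
WK_TOP = [
    ("innovation", 867), ("balanced scorecard", 794), ("leadership", 570),
    ("cases", 545), ("social media", 508), ("negotiation", 470),
    ("knowledge management", 457), ("ethics", 448), ("apple", 430),
    ("corporate social responsibility", 398),
]

def no_hits_suggestions(query, ranked_clusters, k=3, cutoff=0.8):
    """Suggest cluster labels for a query that returned no results."""
    labels = [label for label, _ in ranked_clusters]
    close = difflib.get_close_matches(query.lower(), labels, n=k, cutoff=cutoff)
    return close or labels[:k]  # fall back to the most popular clusters

print(no_hits_suggestions("inovation", WK_TOP))  # ['innovation']
print(no_hits_suggestions("xyzzy", WK_TOP))      # ['innovation', 'balanced scorecard', 'leadership']
```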

Time Commitment
• 2 hours to 2 weeks
• Variables include:
o What kind of information you want to gather
o How broad or narrow you want your clusters
o How many queries you analyze
• In our case, ~2 person-weeks

Results vs. Time Invested
Time invested | Analyze top clusters | Update taxonomy | Create new metadata | Determine new search suggestions
2 hours | X | X | |
6 hours | X | X | X |
One week | X | X | X | X

Next Steps: Autosuggest
• Your top clusters probably account for a large percentage of what people are looking for
o Use them to establish/supplement autosuggest!
Example: suggestions for “innovation”
o innovation and leadership
o disruptive innovation
o innovation management
o open innovation
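A cluster's member queries, ranked by frequency, can seed the suggestion list. The sketch assumes suggestions are cluster queries that contain the typed text (not only those that start with it, since "disruptive innovation" and "open innovation" appear in the example), ordered by frequency. The frequencies are made up for illustration, and suggest is a hypothetical name.

```python
def suggest(typed, cluster_queries, k=4):
    """Return up to k cluster queries that contain the typed text, most frequent first."""
    needle = typed.strip().lower()
    hits = [(q, n) for q, n in cluster_queries.items()
            if needle in q.lower() and q.lower() != needle]  # skip an exact repeat of the input
    hits.sort(key=lambda qn: (-qn[1], qn[0]))  # frequency descending, then alphabetical
    return [q for q, _ in hits[:k]]

# Hypothetical frequencies for queries in the "innovation" cluster.
innovation_cluster = {
    "innovation": 300, "innovation and leadership": 120,
    "disruptive innovation": 90, "innovation management": 55,
    "open innovation": 40,
}
print(suggest("innovation", innovation_cluster))
# ['innovation and leadership', 'disruptive innovation', 'innovation management', 'open innovation']
```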

Next Steps: New Access Structures
• We needed an obvious way to search podcasts
o Put in best bets for now
• Many people are searching for article titles
o Considering a simple interface/approach for field-specific search on select fields, e.g. “title”
• Consider adding other facets to the browse taxonomy where we have entities tagged
o “company name”, “job type/class”, etc.

Summary
• Establish a plan/process, but be willing to tweak it as you go
• Keep it very simple
• Play with your data – the more we played, the better we understood which benefits each level of clustering and effort could deliver
• Tune the process/results
o Build staging/working prototypes
o Repeat the process on other sites

Thank you! And remember…TAKE ACTION!
Kropla drąży skałę! (A drop of water wears away the rock.)
Questions?
searchguy@hbs.edu
@ravimynampaty
http://www.slideshare.net/mynampaty/
