Copyright © President & Fellows of Harvard College.
Ravi Mynampaty
Categorizing Your Search Queries to Improve Findability
About this talk…
• Case study on how we are improving search and browse by performing clustering exercises on search query data
• Not rocket science
• High-level overview
• You can follow this method, with your own insights and tweaks
• You can kick this off next week at your work
Inspired by…
• Chapters 8 & 9
• The power of incrementalism
What is clustering?
A process for organizing and analyzing search log data that:
• Is repeatable, low-cost, scalable, simple
• Yields actionable results
• Supports constant incremental improvement to search
What’s clustering good for?
• Ensure results for high-frequency queries
• Improve metadata and taxonomy
• Inform and validate decision making in site IA
• Inform editorial/curatorial activities
• Provide feedback for search suggestions
o Autosuggest, synonym lists, no-hits page suggestions
• But more on this later...
So how do I cluster search queries?
A simple set of steps
1. Create query report
2. Cluster queries
3. Determine # queries to analyze
4. Analyze clusters
5. Draw conclusions and ACT
Step 1: Create a query report
We started with the site with the most traffic
• Upper-bound limit
• One year’s data by quarter
• Cut off tail at frequency < 10
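A minimal sketch of Step 1 in Python, assuming the search log has already been reduced to a plain list of raw query strings. The function name `build_query_report` and the sample data are illustrative assumptions; only the cutoff at frequency < 10 comes from the slide.

```python
from collections import Counter

def build_query_report(queries, min_frequency=10):
    """Count each distinct query and drop the long tail below min_frequency."""
    counts = Counter(queries)
    # Keep queries with frequency >= min_frequency, most frequent first.
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical log sample: two frequent queries and one tail query.
sample_log = ["innovation"] * 12 + ["ethics"] * 10 + ["brand 2002"] * 3
report = build_query_report(sample_log)
```

Here `report` keeps "innovation" and "ethics" and drops the query seen only three times. Running the same function once per quarter would give the "by quarter" view.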
HBS Working Knowledge FY12 Use Snapshot
Overall Traffic
Page Views: 6,439,485
Visits: 3,635,746
Unique visitors: 2,734,620
On-site searches: 174,425
Views per Visit: 1.77
Local Search visit rate: 5%
Organic Search visit rate: 46%
Step 2: Cluster the queries
Step 2 (cont’d): Three levels of clustering
Level | Method | Example
Narrow | Simple normalization | Eliminate grammatical, spelling, typos, and punctuation differences
Mid-level | Group by subject | management, finance, decision making
Broad | Group by facet | topic, name, date, content type
Step 2 (cont’d): Levels → Tasks Enabled
Level | Improve your base for query analysis | Ensure representation of major clusters on your site | Improve Metadata/Index/Taxonomy | Improve Search Suggestions
Narrow (simple): X X X
Mid-level (group by subject): X X X
Broad (group by facet): X X
Step 2 (cont’d): Narrow Clustering Example
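Narrow clustering can be sketched in a few lines of Python. This is one possible approach, not the tooling used for the case study: the helper names `normalize` and `narrow_cluster` and the sample counts are assumptions. It handles only case, punctuation, and whitespace differences; spelling and grammatical variants still need a human pass or a synonym list.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    query = query.lower()
    query = re.sub(r"[^\w\s]", " ", query)
    return " ".join(query.split())

def narrow_cluster(report):
    """Merge the counts of queries that normalize to the same string."""
    merged = Counter()
    for query, count in report:
        merged[normalize(query)] += count
    return merged

# Hypothetical counts; the variants differ only in case and punctuation.
raw_report = [
    ("Balanced Scorecard", 40),
    ("balanced scorecard", 30),
    ("balanced-scorecard.", 5),
]
narrow = narrow_cluster(raw_report)
```

All three variants collapse into the single key "balanced scorecard" with their counts added together.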
Step 2 (cont’d): Mid-level Example
Cluster brand
branding 245
brand 160
brand management 73
consumer branding 57
global brand 32
service brands 24
brand image retail bank 17
employer branding 16
brand management professional services 16
global branding 13
b2b branding 13
importance of branding 12
brand 2002 12
brand equity 11
brand image 11
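A rough sketch of mid-level clustering in Python, using the queries and frequencies listed above. The substring-matching rule, the `SUBJECT_STEMS` map, and the helper `subject_of` are assumptions for illustration; in practice a person decides which queries belong to a subject.

```python
# Queries and frequencies as listed in the "brand" cluster above.
brand_queries = {
    "branding": 245, "brand": 160, "brand management": 73,
    "consumer branding": 57, "global brand": 32, "service brands": 24,
    "brand image retail bank": 17, "employer branding": 16,
    "brand management professional services": 16, "global branding": 13,
    "b2b branding": 13, "importance of branding": 12, "brand 2002": 12,
    "brand equity": 11, "brand image": 11,
}

# Hypothetical subject -> substring stems; a real map would be curated by hand.
SUBJECT_STEMS = {"brand": ["brand"]}

def subject_of(query):
    """Return the first subject whose stem occurs in the query, else None."""
    for subject, stems in SUBJECT_STEMS.items():
        if any(stem in query for stem in stems):
            return subject
    return None

# Add up the frequency of every query assigned to each subject.
cluster_totals = {}
for query, count in brand_queries.items():
    subject = subject_of(query)
    if subject is not None:
        cluster_totals[subject] = cluster_totals.get(subject, 0) + count
```

For the fifteen queries shown, the "brand" cluster adds up to 712 searches.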
Step 2 (cont’d): Broad Clustering Example
Step 2 (cont’d): List of facets we used
Facet | Example
content type | case studies, cases, working papers, articles, newspaper
date | 2011, world in 2030
demographic characteristics | women, Gen Y, gender, baby boomers
event | economic crisis
format | podcast, video
geographic area | india, japan, mount everest
industry | global wine industry
job type/role | independent director, entrepreneur, ceo, phd economist
organization name | ikea, zara, toyota
person name | michael porter, kanter, sebenius
product name / brand name | ipad
product/commodity | coffee, wine, cement
topic | this covers the majority of keywords
work | faculty work, ex: publication name, title of a case
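Broad clustering by facet could be approximated with a lookup table, as in the Python sketch below. The term lists are copied from a few rows of the facet table; the whole-word matching rule, the helper `facets_of`, and the use of "topic" as a fallback are assumptions, and a real list would be far longer and maintained by hand.

```python
# Facet -> example terms from the facet table; deliberately incomplete.
FACET_TERMS = {
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
    "product/commodity": {"coffee", "wine", "cement"},
}

def facets_of(query):
    """Return the sorted facets whose terms appear as whole words in the query."""
    padded = f" {query.lower()} "
    matched = {
        facet
        for facet, terms in FACET_TERMS.items()
        if any(f" {term} " in padded for term in terms)
    }
    # Assumption: anything unmatched falls back to "topic",
    # which the table says covers the majority of keywords.
    return sorted(matched) or ["topic"]
```

A query can carry more than one facet: `facets_of("michael porter video")` matches both "format" and "person name".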
Step 3: Choose #clusters to analyze
Number of Clusters Analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions
50 | X | |
150 | X | X |
300+ | X | X | X
Small # Clusters can cover a lot of your data
Number of top clusters % Total Queries
Top 20 clusters 14
Top 30 clusters 18
Top 50 clusters 26
Top 100 clusters 37
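The coverage percentages above are a simple ratio: the summed frequency of the top N clusters divided by the total number of queries. A small Python sketch, with a hypothetical function name and made-up totals (not the Working Knowledge data):

```python
def coverage_of_top(cluster_totals, total_queries, top_n):
    """Percent of all queries accounted for by the top_n largest clusters."""
    largest = sorted(cluster_totals.values(), reverse=True)[:top_n]
    return 100 * sum(largest) / total_queries

# Hypothetical cluster totals for illustration only.
example_totals = {"a": 50, "b": 30, "c": 15, "d": 5}
top_two = coverage_of_top(example_totals, total_queries=200, top_n=2)
```

Computing this for a few values of N shows how quickly coverage flattens out, which helps decide how many clusters are worth analyzing.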
Now you have your clusters…
What do you do with them?
TAKE ACTION!
Analyze Top (“Short Head”) Clusters
Clustering has created a condensed and reliable list of your top search queries
• Are they what you thought they would be?
• Does the information on your site accurately represent the top searches?
• Are you fulfilling user needs?
Use your clusters: Improve Site Navigation
Examine the short-head of clusters, basically:
• For each cluster, add up the frequencies of queries
• Reorder clusters by cumulative frequency descending
• Ensure top clusters are accounted for in your navigation
• Use cluster topics as browse/navigation headers/footers for your website
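The steps above can be sketched in Python as follows. The function names, the sample clusters, and the navigation labels are all hypothetical; the exact-label comparison is a simplification, since a real check would need human judgment about which navigation item covers which cluster.

```python
def rank_clusters(clusters):
    """Sum query frequencies per cluster and sort by total, descending."""
    totals = {name: sum(freqs.values()) for name, freqs in clusters.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def missing_from_navigation(ranked, nav_labels, top_n):
    """Return the top_n cluster names that have no matching navigation label."""
    labels = {label.lower() for label in nav_labels}
    return [name for name, _ in ranked[:top_n] if name.lower() not in labels]

# Hypothetical clusters and navigation labels for illustration only.
clusters = {
    "ethics": {"ethics": 30, "business ethics": 12},
    "leadership": {"leadership": 50, "leader": 9},
    "negotiation": {"negotiation": 20},
}
ranked = rank_clusters(clusters)
gaps = missing_from_navigation(
    ranked, nav_labels=["Leadership", "Negotiation"], top_n=3
)
```

In this made-up example, "ethics" is a top cluster with no navigation label, so it is reported as a gap.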
WK Top Clusters
Cluster Frequency
innovation 867
balanced scorecard 794
leadership 570
cases 545
social media 508
negotiation 470
knowledge management 457
ethics 448
apple 430
corporate social responsibility 398
Use your clusters: Improve Taxonomy
• Missing categories in browse taxonomy
• “Balanced Scorecard”
• “Ethics”
• “Social media”
• Second-level topics in the WK context
Mid-level clustering:
Informs editorial/curatorial activities
• “Featured Topics”
o What topics to highlight this week/month/year
o News items to focus on
o What research guides to create
o How to formulate queries for the topics
How about improving search?
• Clustered list provides synonyms for taxonomy
• Requires human judgment and standards/guidelines for synonyms (in our case, synonyms are exact)
• Map to one "like term" in the search engine
Example:
Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
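The mapping to one "like term" can be pictured as a lookup table. The Python sketch below uses the example above; the names `SYNONYMS` and `preferred_term` are hypothetical, and the actual synonym format depends on the search engine, which the talk does not specify.

```python
# Variant -> preferred term, taken from the Balanced Scorecard example.
SYNONYMS = {
    "balanced scorecard": "Balanced Scorecard",
    "bsc": "Balanced Scorecard",
    "balanced score card": "Balanced Scorecard",
    "kaplan and norton": "Balanced Scorecard",
}

def preferred_term(query):
    """Return the preferred term for an exact, case-insensitive match.

    Queries with no entry are returned unchanged.
    """
    return SYNONYMS.get(query.strip().lower(), query)
```

Matching is exact on purpose, in line with "synonyms are exact": `preferred_term("BSC")` maps to "Balanced Scorecard", while a partial query such as "scorecard" is left alone.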
Use your clusters: Improve no-hits page
Time Commitment
• 2 hours to 2 weeks
• Variables include:
o What kind of information you want to gather
o How broad or narrow you want your clusters
o How many queries you analyze
• In our case ~2 person-weeks
Results vs. Time Invested
Time Invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions
2 Hours | X | X | |
6 Hours | X | X | X |
One Week | X | X | X | X
Next Steps: Autosuggest
• Your top clusters probably make up a large percentage of what people are looking for
o Use them to establish/supplement auto-suggest!
Example: suggestions for “innovation”
o innovation and leadership
o disruptive innovation
o innovation management
o open innovation
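One way to seed auto-suggest from a cluster is to offer its most frequent member queries that contain what the user has typed. The Python sketch below assumes this simple substring rule; the function name `suggest` and all frequencies are hypothetical, and only the query strings come from the example above.

```python
def suggest(typed, query_counts, limit=4):
    """Return up to `limit` queries containing the typed text, most frequent first."""
    typed = typed.strip().lower()
    # Skip the typed text itself; only longer queries are useful suggestions.
    matches = [(q, n) for q, n in query_counts.items() if typed in q and q != typed]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [q for q, _ in matches[:limit]]

# Hypothetical frequencies; the query strings come from the slide's example.
innovation_cluster = {
    "innovation": 300,
    "innovation and leadership": 40,
    "disruptive innovation": 35,
    "innovation management": 30,
    "open innovation": 25,
    "leadership": 200,
}
suggestions = suggest("innovation", innovation_cluster)
```

With these made-up counts, typing "innovation" returns the four suggestions in the order shown on the slide.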
Next Steps: New Access Structures
• Needed an obvious way to search podcasts
o Put in best bets for now
• A lot of people searching for article titles
o Considering simple interface/approach for select field-specific search, e.g. “title”
• Consider adding other facets to browse taxonomy where we have entities tagged
o “company name”, “job type/class”, etc.
Summary
• Establish a plan/process, but be willing to tweak as you go
• Keep it very simple
• Play with your data: the more we played, the better we understood what benefits could be realized by levels of clustering and effort
• Tuning process/results
o Build staging/working prototypes
o Repeat process on other sites
Thank you! And remember…TAKE ACTION!
Kropla drąży skałę! (Polish proverb: “Constant dripping wears away the stone.”)
Questions?
searchguy@hbs.edu
@ravimynampaty
http://www.slideshare.net/mynampaty/

Clustering as presented at UX Poland 2013
