Clustering Search Query Log Data to Improve Search

Sophy Bishop & Ravi Mynampaty

                                     Copyright © President & Fellows of Harvard College.
Agenda

     Background
     Five W’s of Clustering
      •   What, why, who, how, when
     Is it really repeatable?
     Questions
About Information Management Services (IMS)

   Service areas: Analytics, Lifecycle Mgmt., Metadata Mgmt., Taxonomy Dev., Search
   At the center of the diagram:
      - Standards
      - Best Practices
      - User Needs
      - Service Models
Inspired by…

Chapters 8 & 9
About this talk…

   Case study on how we are improving search and
    browse by performing clustering exercises on our
    search query data
   Not rocket science
   High-level overview
   You can follow this method, with your own insights and
    tweaks

   You can kick this off next week at your work
What is clustering?

A process for organizing and analyzing search log
data that:
   Is repeatable, low-cost, scalable, simple

   Yields actionable results
   Supports constant incremental improvement
    to search
What’s clustering good for?

   Ensure results for high frequency queries

   Improve Metadata and Taxonomy

   Inform and validate decision making in site IA

   Inform editorial/curatorial activities

   Provide Feedback for Search Suggestions
      o   Autosuggest, synonym lists, no-hits page
          suggestions
   But more on this later...
So how do I cluster search queries?

A simple set of steps (shown as a cycle):
   1. Create query report
   2. Cluster queries
   3. Determine # queries to analyze
   4. Analyze clusters
   5. Draw conclusions and ACT
Step 1: Create a query report

We started with the site with the most traffic
  • Upper-bound limit
  • One year’s data by quarter
  • Cut off tail at frequency < 10
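
A minimal sketch of how such a query report could be produced, assuming the raw log is available as a plain list of query strings (one entry per search). The function name `build_query_report` and the sample log are hypothetical; only the cutoff of 10 comes from the slide.

```python
from collections import Counter

def build_query_report(raw_queries, min_frequency=10):
    """Count each distinct query and drop the long tail below min_frequency."""
    counts = Counter(raw_queries)
    # most_common() returns (query, count) pairs sorted by count, descending
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical log extract, not real Working Knowledge data.
log = ["innovation"] * 12 + ["leadership"] * 10 + ["zara case"] * 3
report = build_query_report(log)
print(report)  # [('innovation', 12), ('leadership', 10)]
```

Running this once per quarter would give the four reports that make up one year's data.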
HBS Working Knowledge FY12 Use Snapshot

Overall Traffic
   Page Views:             6,439,485
   Visits:                 3,635,746
   Unique visitors:        2,734,620
   On-site searches:          174,425
   Views per Visit:              1.77
   Local Search visit rate:        5%
   Organic Search visit rate:     46%
Step 2: Cluster the queries
Step 2 (cont’d): Three levels of clustering
Level       Method                 Example
Narrow      Simple normalization   Eliminate differences in grammar, spelling, typos, and punctuation
Mid-level   Group by subject       management, finance, decision making
Broad       Group by facet         topic, name, date, content type
Step 2 (cont’d): Levels → Tasks Enabled

Level                          Improve your base    Ensure representation of      Improve Metadata/   Improve Search
                               for query analysis   major clusters on your site   Index/Taxonomy      Suggestions
Narrow (simple)                        X                                                  X                 X
Mid-level (group by subject)                                    X                         X                 X
Broad (group by facet)                                          X                         X
Step 2 (cont’d): Narrow Clustering Example
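
A minimal sketch of narrow clustering, assuming a query report of (query, frequency) pairs. It merges variants that differ only in case, punctuation, or spacing; spelling and typo correction are not attempted here and would still need a dictionary or human review. The helper names and the counts are hypothetical.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    q = query.lower()
    q = re.sub(r"[^\w\s]", " ", q)
    return re.sub(r"\s+", " ", q).strip()

def narrow_cluster(report):
    """Merge the frequencies of queries that normalize to the same string."""
    merged = Counter()
    for query, freq in report:
        merged[normalize(query)] += freq
    return merged.most_common()

# Hypothetical counts for illustration only.
raw_report = [
    ("balanced scorecard", 40), ("Balanced Scorecard", 30),
    ("balanced-scorecard", 5), ("ethics", 20), ("Ethics.", 10),
]
narrow = narrow_cluster(raw_report)
print(narrow)  # [('balanced scorecard', 75), ('ethics', 30)]
```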
Step 2 (cont’d): Mid-level Example
Cluster                         brand
branding                                245
brand                                   160
brand management                         73
consumer branding                        57
global brand                             32
service brands                           24
brand image retail bank                  17
employer branding                        16
brand management professional services  16
global branding                         13
b2b branding                            13
importance of branding                  12
brand 2002                              12
brand equity                            11
brand image                             11
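
The slides do not say how queries were assigned to a subject. As one possible sketch, the rows above can be grouped by a word stem and their frequencies summed; stem matching is an assumption here, and the results would still need human review. `group_by_subject` and the `stems` mapping are hypothetical names.

```python
from collections import defaultdict

# Query frequencies copied from the "brand" cluster table above.
brand_queries = [
    ("branding", 245), ("brand", 160), ("brand management", 73),
    ("consumer branding", 57), ("global brand", 32), ("service brands", 24),
    ("brand image retail bank", 17), ("employer branding", 16),
    ("brand management professional services", 16), ("global branding", 13),
    ("b2b branding", 13), ("importance of branding", 12), ("brand 2002", 12),
    ("brand equity", 11), ("brand image", 11),
]

def group_by_subject(report, stems):
    """Assign each query to the first subject whose stem starts one of its words."""
    clusters = defaultdict(list)
    for query, freq in report:
        words = query.split()
        subject = next(
            (name for name, stem in stems.items()
             if any(word.startswith(stem) for word in words)),
            "unclustered",
        )
        clusters[subject].append((query, freq))
    return clusters

clusters = group_by_subject(brand_queries, {"brand": "brand"})
brand_total = sum(freq for _, freq in clusters["brand"])
print(len(clusters["brand"]), brand_total)  # 15 712
```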
Step 2 (cont’d): Mid-level Example
[Chart: bar chart of frequencies, y-axis 0 to 350, bars descending from 333 to 10; labels "customer" and "brand"]
Step 2 (cont’d): Broad Clustering Example
Step 2 (cont’d): List of facets we used
Facet                           Example
content type                    case studies, cases, working papers, articles, newspaper
date                            2011, world in 2030
demographic characteristics     women, Gen Y, gender, baby boomers
event                           economic crisis
format                          podcast, video
geographic area                 india, japan, mount everest
industry                        global wine industry
job type/role                   independent director, entrepreneur, ceo, phd economist
organization name               ikea, zara, toyota
person name                     michael porter, kanter, sebenius
product name / brand name       ipad
product/commodity               coffee, wine, cement
topic                           this covers the majority of keywords
work                            faculty work, ex: publication name, title of a case
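
One way to sketch broad clustering is a lookup against example lists per facet. The lists below reuse a few examples from the table; exact matching and the fallback to "topic" for anything unmatched are assumptions made for illustration, and `facet_of` is a hypothetical helper.

```python
# Example values per facet, taken from the table above (subset only).
FACET_EXAMPLES = {
    "content type": {"case studies", "cases", "working papers", "articles", "newspaper"},
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
    "product/commodity": {"coffee", "wine", "cement"},
}

def facet_of(query):
    """Return the facet whose example list contains the query, else 'topic'."""
    q = " ".join(query.lower().split())
    for facet, examples in FACET_EXAMPLES.items():
        if q in examples:
            return facet
    # Assumption: unmatched queries default to "topic", since the table
    # notes that topic covers the majority of keywords.
    return "topic"

print(facet_of("Zara"))         # organization name
print(facet_of("podcast"))      # format
print(facet_of("negotiation"))  # topic
```

In practice the example lists would grow as a person reviews the unmatched queries.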
Step 3: Choose #clusters to analyze
Number of Clusters   Analyze Top Hits   Improve Metadata/   Supply Search
Analyzed                                Taxonomy/Index      Suggestions
50                          X
150                         X                  X
300+                        X                  X                  X
Small # Clusters can cover a lot of your data

  Number of top clusters     % Total Queries

Top 20 clusters                    14

Top 30 clusters                    18

Top 50 clusters                    26

Top 100 clusters                   37
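
The coverage figures above can be computed from per-cluster totals. This is a sketch under the assumption that each cluster's total frequency is already known; the `coverage` function and the sample totals are hypothetical and do not reproduce the Working Knowledge numbers.

```python
def coverage(cluster_totals, top_n):
    """Percent of all query volume covered by the top_n largest clusters."""
    ranked = sorted(cluster_totals, reverse=True)
    total = sum(ranked)
    return 100.0 * sum(ranked[:top_n]) / total

# Hypothetical cluster totals for illustration only.
totals = [500, 300, 100, 50, 30, 20]
print(round(coverage(totals, 2)))  # 80
```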
Now you have your clusters…

What do you do with them?



           TAKE ACTION!
Analyze Top (“Short Head”) Clusters

Clustering has created a condensed and reliable
list of your top search queries
   Are they what you thought they would be?
   Does the information on your site accurately
    represent the top searches?
   Are you fulfilling user needs?
Use your clusters: Improve Site Navigation


Examine the short-head of clusters, basically:
   For each cluster, add up the frequencies
    of queries
   Reorder clusters by cumulative frequency
    descending
   Ensure top clusters are accounted for in your
    navigation
   Use cluster topics as browse/navigation
    headers/footers for your website
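
The first two steps above (sum frequencies per cluster, then reorder by cumulative frequency) can be sketched as follows, assuming each query has already been assigned a cluster. The cluster names and counts are made up for illustration, and `rank_clusters` is a hypothetical helper.

```python
from collections import Counter

def rank_clusters(assignments):
    """Sum query frequencies per cluster and sort clusters by total, descending.

    assignments: iterable of (cluster, query, frequency) tuples.
    """
    totals = Counter()
    for cluster, _query, freq in assignments:
        totals[cluster] += freq
    return totals.most_common()

# Hypothetical assignments, not taken from the Working Knowledge logs.
assignments = [
    ("pricing", "pricing", 120),
    ("mergers", "mergers", 90),
    ("pricing", "pricing strategy", 45),
    ("mergers", "m&a", 30),
    ("pricing", "price war", 15),
]
ranked = rank_clusters(assignments)
print(ranked)  # [('pricing', 180), ('mergers', 120)]
```

The head of the ranked list is what would be compared against the site navigation.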
WK Top Clusters
Cluster                           Frequency
innovation                        867

balanced scorecard                794

leadership                        570
cases                             545

social media                      508

negotiation                       470

knowledge management              457
ethics                            448

apple                             430
corporate social responsibility   398
Use your clusters: Improve Taxonomy

•   Missing categories in browse taxonomy
    •   "Balanced Scorecard"
    •   “Ethics”
    •   “Social media”

•   Second-level topics in the WK context
Mid-level clustering:
Informs editorial /curatorial activities
   “Featured Topics”
     o  What topics to highlight this week/month/year
     o  News items to focus on
     o  What research guides to create
     o  How to formulate queries for the topics
Use your clusters: Improve Synonym Handling

   Clustered list provides synonyms for taxonomy
   Requires human judgment and
    standards/guidelines for synonyms – in our
    case, synonyms are exact
   Map to one "like term" in the search engine

    Example:
      Balanced Scorecard, BSC, Balanced score card
      kaplan and norton -> Balanced Scorecard
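
A minimal sketch of the "map to one like term" idea, using the example above. A plain dictionary stands in for whatever synonym mechanism the search engine provides; `SYNONYMS` and `preferred_term` are hypothetical names, and the variant list still has to come from human review of the clusters.

```python
# Variant -> preferred term, taken from the example above.
SYNONYMS = {
    "bsc": "balanced scorecard",
    "balanced score card": "balanced scorecard",
    "kaplan and norton": "balanced scorecard",
}

def preferred_term(query):
    """Return the preferred term for a query, or the cleaned query itself."""
    q = " ".join(query.lower().split())  # lowercase and collapse whitespace
    return SYNONYMS.get(q, q)

print(preferred_term("BSC"))                # balanced scorecard
print(preferred_term("Kaplan and Norton"))  # balanced scorecard
print(preferred_term("ethics"))             # ethics
```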
Use your clusters: Improve no-hits page
Time Commitment
•   2 hours to 2 weeks

•    Variables include:
    •   What kind of information you want to gather
    •   How broad or narrow you want your clusters
    •   How many queries you analyze

•   In our case ~2 person-weeks
    •   We had Sophy Bishop
    •   Intern, MSLIS student
Results vs. Time Invested

           Analyze top   Update     Create New   Determine
           clusters      Taxonomy   Metadata     New Search
                                                 Suggestions

2 Hours         X            X



6 Hours         X            X           X



One Week        X            X           X            X
Next Steps: Autosuggest
   Your top clusters probably make up a large
    percentage of what people are looking for
      o Use them to establish/supplement
         auto-suggest!

    Example: suggestions for “innovation”
      o   innovation and leadership
      o   disruptive innovation
      o   innovation management
      o   open innovation
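
As a rough sketch, suggestions like these could be drawn from the clustered query list by picking the most frequent queries that contain the typed term. The query strings come from the example above; the frequencies are hypothetical, and `suggest` is an illustrative helper rather than any search engine's autosuggest API.

```python
def suggest(term, query_counts, limit=4):
    """Return up to `limit` queries containing `term`, most frequent first."""
    term = term.lower()
    matches = [(q, n) for q, n in query_counts.items() if term in q and q != term]
    # Sort by frequency descending, then alphabetically for stable ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical frequencies for illustration only.
counts = {
    "innovation": 300,
    "innovation and leadership": 40,
    "disruptive innovation": 35,
    "innovation management": 30,
    "open innovation": 25,
    "leadership": 50,
}
suggestions = suggest("innovation", counts)
print(suggestions)
```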
Next Steps: New Access Structures

   Needed an obvious way to search podcasts
    o   Put in best bets for now
   A lot of people searching for article titles
    o   Considering simple interface/approach for select
        field-specific search, e.g. “title”
   Consider adding other facets to browse
    taxonomy where we have entities tagged
    o   “company name”, “job type/class”, etc.
Next Steps

   SEO Input
    o   Advise authors to use top cluster terms in Titles,
        Abstracts, Keywords
    o   Report on clusters in our monthly analytics reports
        to faculty (“Top search topics/subjects in May 2012
        were…”; “Searchers found your works with the
        following queries”)

   Repeat process on other sites/content
Summary
   Establish a plan/process, but be willing to tweak
    as you go

   Keep it very simple.
   Play with your data – the more we played, the better
    we understood what benefits could be realized by
    levels of clustering and effort
   Tuning process/results
     o Build staging/working prototypes
     o Repeat process on other sites

   TAKE ACTION!
Thank you!



               Questions?


       sophybishop@gmail.com @sophreads

      searchguy@hbs.edu @ravimynampaty

 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 

Clustering Search Log Data

  • 1. Clustering Search Query Log Data to Improve Search
      Sophy Bishop & Ravi Mynampaty
      Copyright © President & Fellows of Harvard College.
  • 2. Agenda
      • Background
      • Five W’s of Clustering
          o What, why, who, how, when
      • Is it really repeatable?
      • Questions
  • 3. About Information Management Services (IMS)
      [Diagram] Analytics, Metadata Mgmt., Taxonomy Dev., Search, and Lifecycle Mgmt., arranged around:
      • Standards
      • Best Practices
      • User Needs
      • Service Models
  • 5. About this talk…
      • Case study on how we are improving search and browse by performing clustering exercises on your search query data
      • Not rocket science
      • High-level overview
      • You can follow this method, with your own insights and tweaks
      • You can kick this off next week at your work
  • 6. What is clustering?
      A process for organizing and analyzing search log data that:
      • Is repeatable, low-cost, scalable, simple
      • Yields actionable results
      • Supports constant incremental improvement to search
  • 7. What’s clustering good for?
      • Ensure results for high-frequency queries
      • Improve metadata and taxonomy
      • Inform and validate decision making in site IA
      • Inform editorial/curatorial activities
      • Provide feedback for search suggestions
          o Autosuggest, synonym lists, no-hits page suggestions
      • But more on this later...
  • 8. So how do I cluster search queries?
      A simple set of steps
      [Cycle diagram] Create query report; Cluster queries; Determine # queries to analyze; Analyze clusters; Draw conclusions and ACT
  • 9. Step 1: Create a query report
      We started with the site with the most traffic
      • Upper-bound limit
      • One year’s data by quarter
      • Cut off tail at frequency < 10
  • 10. Step 1: Create a query report
      HBS Working Knowledge FY12 Use Snapshot: Overall Traffic
      • Page Views: 6,439,485
      • Visits: 3,635,746
      • Unique visitors: 2,734,620
      • On-site searches: 174,425
      • Views per Visit: 1.77
      • Local Search visit rate: 5%
      • Organic Search visit rate: 46%
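A minimal sketch of Step 1, assuming the search log has already been reduced to a plain list of query strings. The function name `build_query_report` and the sample log are hypothetical; the only detail taken from the slides is the cutoff that drops the tail at frequency < 10.

```python
from collections import Counter

def build_query_report(raw_queries, min_frequency=10):
    """Count each query string and drop the long tail below min_frequency."""
    counts = Counter(q.strip() for q in raw_queries if q.strip())
    # Keep only queries seen at least min_frequency times, most frequent first.
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical log lines, for illustration only.
log = ["innovation"] * 12 + ["ethics"] * 10 + ["ipad case"] * 3
report = build_query_report(log)
print(report)  # [('innovation', 12), ('ethics', 10)]
```

Running this once per quarter would give the "one year's data by quarter" view; how the log is split by date depends on the analytics tool and is not shown here.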
  • 11. Step 2: Cluster the queries
  • 12. Step 2 (cont’d): Three levels of clustering
      Level      Method                Example
      Narrow     Simple normalization  Eliminate grammatical, spelling, typos, and punctuation differences
      Mid-level  Group by subject      management, finance, decision making
      Broad      Group by facet        topic, name, date, content type
  • 13. Step 2 (cont’d): Levels → Tasks Enabled
      Tasks: Improve your base for query analysis; Ensure representation of major clusters on your site; Improve Metadata/Index/Taxonomy; Improve Search Suggestions
      Narrow (simple): X X X
      Mid-level (group by subject): X X X
      Broad (group by facet): X X
  • 14. Step 2 (cont’d): Narrow Clustering Example
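One way narrow clustering could be approximated in code, as a sketch only. The helper names `normalize` and `narrow_cluster` and the sample rows are hypothetical, and this version covers only case, punctuation, and whitespace differences. Spelling and grammatical variants would still need a spell-checker, a stemmer, or manual review.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, drop punctuation, and collapse whitespace."""
    q = query.lower()
    q = re.sub(r"[^\w\s]", " ", q)  # each punctuation character becomes a space
    return re.sub(r"\s+", " ", q).strip()

def narrow_cluster(report):
    """Merge the frequencies of queries that normalize to the same string."""
    merged = Counter()
    for query, freq in report:
        merged[normalize(query)] += freq
    return merged

# Hypothetical report rows, for illustration only.
sample = [("Balanced Scorecard", 40), ("balanced  scorecard", 12), ("balanced scorecard.", 10)]
print(narrow_cluster(sample))  # Counter({'balanced scorecard': 62})
```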
  • 15. Step 2 (cont’d): Mid-level Example
      Cluster: brand
      Query                                       Frequency
      branding                                    245
      brand                                       160
      brand management                            73
      consumer branding                           57
      global brand                                32
      service brands                              24
      brand image retail bank                     17
      employer branding                           16
      brand management professional services      16
      global branding                             13
      b2b branding                                13
      importance of branding                      12
      brand 2002                                  12
      brand equity                                11
      brand image                                 11
  • 17. Step 2 (cont’d): Mid-level Example
      [Bar chart: query frequencies in the “customer” cluster]
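The deck does not say how queries were assigned to subjects, and it may well have been done by hand. As an illustrative assumption, the sketch below uses plain substring matching on a subject term; `group_by_subject` and the "business ethics" row are hypothetical, while the four "brand" rows reuse frequencies from the example above.

```python
def group_by_subject(report, subject_terms):
    """Assign each query to the first subject whose term it contains."""
    clusters = {subject: [] for subject in subject_terms}
    unassigned = []
    for query, freq in report:
        for subject, term in subject_terms.items():
            if term in query.lower():
                clusters[subject].append((query, freq))
                break
        else:
            # No subject term matched; leave the query for manual review.
            unassigned.append((query, freq))
    return clusters, unassigned

report = [("branding", 245), ("brand", 160), ("brand management", 73),
          ("consumer branding", 57), ("business ethics", 30)]
clusters, leftover = group_by_subject(report, {"brand": "brand"})
print(sum(freq for _, freq in clusters["brand"]))  # 535
print(leftover)  # [('business ethics', 30)]
```

Substring matching will over-match in places (for example, "brand" also matches "brandy"), so a human pass over each cluster is still needed.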
  • 18. Step 2 (cont’d): Broad Clustering Example
  • 19. Step 2 (cont’d): List of facets we used
      Facet                               Example
      content type                        case studies, cases, working papers, articles, newspaper
      date                                2011, world in 2030
      demographic characteristics         women, Gen Y, gender, baby boomers
      event                               economic crisis
      format                              podcast, video
      geographic area                     india, japan, mount everest
      industry                            global wine industry
      job type/role                       independent director, entrepreneur, ceo, phd economist
      organization name                   ikea, zara, toyota
      person name                         michael porter, kanter, sebenius
      product name / brand name           ipad
      product/commodity                   coffee, wine, cement
      topic                               this covers the majority of keywords
      publication name, title of a work   faculty work, ex: case
  • 20. Step 3: Choose # clusters to analyze
      Tasks: Analyze Top Hits; Improve Metadata/Taxonomy/Index; Supply Search Suggestions
      Number of clusters analyzed:
      50: X
      150: X X
      300+: X X X
  • 21. Small # Clusters can cover a lot of your data
      Number of top clusters   % Total Queries
      Top 20 clusters          14
      Top 30 clusters          18
      Top 50 clusters          26
      Top 100 clusters         37
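A coverage figure like the ones above can be computed as below. This is a sketch that assumes `cluster_totals` accounts for every query in the report, including unclustered ones; the function name and sample totals are hypothetical.

```python
def coverage_of_top_clusters(cluster_totals, top_n):
    """Percent of all query volume accounted for by the top_n largest clusters."""
    ranked = sorted(cluster_totals.values(), reverse=True)
    total = sum(ranked)
    return 100.0 * sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical cluster totals, for illustration only.
totals = {"innovation": 50, "leadership": 30, "ethics": 15, "other": 5}
print(coverage_of_top_clusters(totals, 2))  # 80.0
```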
  • 22. Now you have your clusters… What do you do with them? TAKE ACTION!
  • 23. Analyze Top (“Short Head”) Clusters
      Clustering has created a condensed and reliable list of your top search queries
      • Are they what you thought they would be?
      • Does the information on your site accurately represent the top searches?
      • Are you fulfilling user needs?
  • 24. Use your clusters: Improve Site Navigation
      Examine the short-head of clusters, basically:
      • For each cluster, add up the frequencies of queries
      • Reorder clusters by cumulative frequency descending
      • Ensure top clusters are accounted for in your navigation
      • Use cluster topics as browse/navigation headers/footers for your website
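The first three steps above can be sketched as follows, assuming clusters are held as a mapping from cluster name to (query, frequency) pairs and navigation labels are a simple list of strings. All names and numbers here are hypothetical.

```python
def rank_clusters(clusters):
    """Sum query frequencies per cluster and sort clusters by total, descending."""
    totals = {name: sum(freq for _, freq in queries)
              for name, queries in clusters.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def missing_from_navigation(ranked, nav_labels, top_n=10):
    """Return top clusters with no matching navigation label (case-insensitive)."""
    labels = {label.lower() for label in nav_labels}
    return [name for name, _ in ranked[:top_n] if name.lower() not in labels]

# Hypothetical clusters and navigation labels, for illustration only.
clusters = {
    "ethics": [("ethics", 30), ("business ethics", 10)],
    "innovation": [("innovation", 50), ("open innovation", 20)],
}
ranked = rank_clusters(clusters)
print(ranked)  # [('innovation', 70), ('ethics', 40)]
print(missing_from_navigation(ranked, ["Innovation", "Leadership"]))  # ['ethics']
```

Exact label matching is the simplest possible check; a real site would likely need a person to judge whether a cluster is covered under a differently worded heading.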
  • 25. WK Top Clusters
      Cluster                           Frequency
      innovation                        867
      balanced scorecard                794
      leadership                        570
      cases                             545
      social media                      508
      negotiation                       470
      knowledge management              457
      ethics                            448
      apple                             430
      corporate social responsibility   398
  • 26. Use your clusters: Improve Taxonomy
      • Missing categories in browse taxonomy
          o “Balanced Scorecard”
          o “Ethics”
          o “Social media”
      • Second-level topics in the WK context
  • 29. Mid-level clustering: Informs editorial/curatorial activities
      • “Featured Topics”
          o What topics to highlight this week/month/year
          o News items to focus on
          o What research guides to create
          o How to formulate queries for the topics
  • 30. Use your clusters: Improve Synonym Handling
      • Clustered list provides synonyms for taxonomy
      • Requires human judgment and standards/guidelines for synonyms – in our case, synonyms are exact
      • Map to one "like term" in the search engine
      Example: Balanced Scorecard, BSC, Balanced score card kaplan and norton -> Balanced Scorecard
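Mapping exact synonyms to one "like term" can be sketched as a lookup table. The table contents below come from the example above; the names `SYNONYMS` and `to_like_term` are hypothetical, and a real search engine would take this list in its own synonym-file format, which is not shown here.

```python
# Variant (lowercased) -> the single "like term" it should map to.
SYNONYMS = {
    "balanced scorecard": "Balanced Scorecard",
    "bsc": "Balanced Scorecard",
    "balanced score card": "Balanced Scorecard",
}

def to_like_term(query, synonyms=SYNONYMS):
    """Return the like term for an exact, case-insensitive match; else the query unchanged."""
    return synonyms.get(query.strip().lower(), query)

print(to_like_term("BSC"))         # Balanced Scorecard
print(to_like_term("leadership"))  # leadership
```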
  • 31. Use your clusters: Improve no-hits page
  • 32. Time Commitment
      • 2 hours to 2 weeks
      • Variables include:
          o What kind of information you want to gather
          o How broad or narrow you want your clusters
          o How many queries you analyze
      • In our case ~2 person-weeks
      • We had Sophy Bishop
          o Intern, MSLIS student
  • 33. Results vs. Time Invested
      Results: Analyze top clusters; Update Taxonomy; Create New Metadata; Determine New Search Suggestions
      Time invested:
      2 Hours: X X
      6 Hours: X X X
      One Week: X X X X
  • 34. Next Steps: Autosuggest
      • Your top clusters probably make up a large percentage of what people are looking for
          o Use them to establish/supplement auto-suggest!
      Example: suggestions for “innovation”
          o innovation and leadership
          o disruptive innovation
          o innovation management
          o open innovation
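A rough sketch of how clustered queries could seed auto-suggest, assuming a simple "contains the typed text" match ranked by frequency. The query strings come from the example above; the frequencies and the function name `suggest` are made up for illustration.

```python
def suggest(typed, query_freq, limit=4):
    """Return up to `limit` known queries containing the typed text, most frequent first."""
    needle = typed.strip().lower()
    if not needle:
        return []
    # Skip the query that is identical to what the user already typed.
    matches = [(q, n) for q, n in query_freq.items()
               if needle in q.lower() and q.lower() != needle]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Frequencies are hypothetical.
freq = {"innovation": 300, "innovation and leadership": 40, "disruptive innovation": 35,
        "innovation management": 30, "open innovation": 25, "leadership": 200}
print(suggest("innovation", freq))
```

With these made-up frequencies the call returns the four suggestions listed above, in the same order.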
  • 35. Next Steps: New Access Structures
      • Needed an obvious way to search podcasts
          o Put in best bets for now
      • A lot of people searching for article titles
          o Considering simple interface/approach for select field-specific search, e.g. “title”
      • Consider adding other facets to browse taxonomy where we have entities tagged
          o “company name”, “job type/class”, etc.
  • 36. Next Steps
      • SEO Optimization Input
          o Advise authors to use top cluster terms in Titles, Abstracts, Keywords
          o Report on clusters in our monthly analytics reports to faculty (“Top search topics/subjects in May 2012 were…”; “Searchers found your works with following queries”)
      • Repeat process on other sites/content
  • 37. Summary
      • Established plan/process, but be willing to tweak as you go
      • Keep it very simple.
      • Play with your data – the more we played, the better we understood what benefits could be realized by levels of clustering and effort
      • Tuning process/results
          o Build staging/working prototypes
          o Repeat process on other sites
      • TAKE ACTION!
  • 38. Thank you! Questions? sophybishop@gmail.com @sophreads searchguy@hbs.edu @ravimynampaty