Clustering as presented at UX Poland 2013
Published in: Technology, Business
Transcript

  • 1. Copyright © President & Fellows of Harvard College. Ravi Mynampaty. Categorizing Your Search Queries to Improve Findability
  • 2. About this talk…
      • Case study on how we are improving search and browse by performing clustering exercises on search query data
      • Not rocket science
      • High-level overview
      • You can follow this method, with your own insights and tweaks
      • You can kick this off next week at your work
  • 3. Inspired by…
      • Chapters 8 & 9
      • The power of incrementalism
  • 4. What is clustering?
      A process for organizing and analyzing search log data that:
      • Is repeatable, low-cost, scalable, simple
      • Yields actionable results
      • Supports constant incremental improvement to search
  • 5. What’s clustering good for?
      • Ensure results for high frequency queries
      • Improve Metadata and Taxonomy
      • Inform and validate decision making in site IA
      • Informs editorial/curatorial activities
      • Provides Feedback for Search Suggestions
          o Autosuggest, synonym lists, no-hits page suggestions
      • But more on this later...
  • 6. So how do I cluster search queries? A simple set of steps:
      • Create query report
      • Cluster queries
      • Determine # queries to analyze
      • Analyze clusters
      • Draw conclusions and ACT
  • 7. Step 1: Create a query report
      We started with the site with the most traffic
      • Upper-bound limit
      • One year’s data by quarter
      • Cut off tail at frequency < 10
  • 9. Step 1: Create a query report
      We started with the site with the most traffic
      • Upper-bound limit
      • One year’s data by quarter
      • Cut off tail at frequency < 10
      HBS Working Knowledge FY12 Use Snapshot: Overall Traffic
      • Page Views: 6,439,485
      • Visits: 3,635,746
      • Unique visitors: 2,734,620
      • On-site searches: 174,425
      • Views per Visit: 1.77
      • Local Search visit rate: 5%
      • Organic Search visit rate: 46%
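A minimal sketch of Step 1, assuming the search log is already available as a plain list of raw query strings (the slides do not say how the report was produced). The function name `build_query_report` and the sample log are hypothetical; only the cutoff of 10 comes from the slide.

```python
from collections import Counter

def build_query_report(raw_queries, min_frequency=10):
    """Count each distinct query and drop the long tail below min_frequency."""
    counts = Counter(raw_queries)
    # Keep only queries seen at least min_frequency times, most frequent first.
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical log: three distinct queries with made-up frequencies.
sample_log = ["innovation"] * 12 + ["leadership"] * 10 + ["zara case"] * 3
report = build_query_report(sample_log)
print(report)
```

Here "zara case" falls below the cutoff and is left out of the report, while the two more frequent queries are kept.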
  • 10. Step 2: Cluster the queries
  • 11. Step 2 (cont’d): Three levels of clustering
      Level | Method | Example
      Narrow | Simple normalization | Eliminate grammatical, spelling, typos, and punctuation differences
      Mid-level | Group by subject | management, finance, decision making
      Broad | Group by facet | topic, name, date, content type
  • 12. Step 2 (cont’d): Levels → Tasks Enabled
      Columns: Level | Improve your base for query analysis | Ensure representation of major clusters on your site | Improve Metadata/Index/Taxonomy | Improve Search Suggestions
      Narrow (simple): X X X
      Mid-level (group by subject): X X X
      Broad (group by facet): X X
  • 13. Step 2 (cont’d): Narrow Clustering Example
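As an illustration of narrow clustering, here is a sketch that merges queries differing only in case, punctuation, or spacing. The helper names and sample rows are hypothetical, and the sketch deliberately does not attempt spelling or grammatical variants, which would need a stemmer, a spell checker, or manual review.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    q = query.lower()
    q = re.sub(r"[^\w\s]", " ", q)
    return " ".join(q.split())

def narrow_cluster(report):
    """Merge the frequencies of queries that normalize to the same string."""
    merged = Counter()
    for query, freq in report:
        merged[normalize(query)] += freq
    return merged

# Hypothetical report rows: (query, frequency).
rows = [("Balanced Scorecard", 40), ("balanced  scorecard", 12), ("balanced scorecard?", 5)]
clusters = narrow_cluster(rows)
print(clusters)
```

All three sample rows collapse into a single "balanced scorecard" entry whose frequency is the sum of the variants.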
  • 14. Step 2 (cont’d): Mid-level Example
      Cluster: brand
      branding | 245
      brand | 160
      brand management | 73
      consumer branding | 57
      global brand | 32
      service brands | 24
      brand image retail bank | 17
      employer branding | 16
      brand management professional services | 16
      global branding | 13
      b2b branding | 13
      importance of branding | 12
      brand 2002 | 12
      brand equity | 11
      brand image | 11
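The slides do not say how queries were assigned to a subject. One possible sketch, assuming a hand-maintained list of word stems per subject (the `SUBJECT_STEMS` mapping and the "leadership" row are made up for illustration):

```python
from collections import defaultdict

# Hypothetical, hand-maintained mapping from subject to word stems.
SUBJECT_STEMS = {"brand": ["brand"], "leadership": ["leader"]}

def assign_subject(query):
    """Return the first subject whose stem starts any word in the query, else None."""
    words = query.lower().split()
    for subject, stems in SUBJECT_STEMS.items():
        if any(w.startswith(s) for w in words for s in stems):
            return subject
    return None

def mid_level_clusters(report):
    """Group (query, frequency) rows by assigned subject."""
    clusters = defaultdict(list)
    for query, freq in report:
        clusters[assign_subject(query)].append((query, freq))
    return clusters

# Three rows from the "brand" cluster on the slide, plus one made-up row.
rows = [("branding", 245), ("brand", 160), ("global brand", 32), ("team leadership", 9)]
clusters = mid_level_clusters(rows)
brand_total = sum(freq for _, freq in clusters["brand"])
print(brand_total)
```

Queries that match no stem land under `None`, which gives a natural worklist for the human review that this kind of grouping still needs.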
  • 15. Step 2 (cont’d): Broad Clustering Example
  • 16. Step 2 (cont’d): List of facets we used
      Facet | Example
      content type | case studies, cases, working papers, articles, newspaper
      date | 2011, world in 2030
      demographic characteristics | women, Gen Y, gender, baby boomers
      event | economic crisis
      format | podcast, video
      geographic area | india, japan, mount everest
      industry | global wine industry
      job type/role | independent director, entrepreneur, ceo, phd economist
      organization name | ikea, zara, toyota
      person name | michael porter, kanter, sebenius
      product name / brand name | ipad
      product/commodity | coffee, wine, cement
      topic | this covers the majority of keywords
      work | faculty work, ex: publication name, title of a case
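A rough sketch of broad clustering, assuming facets are assigned by looking up known example terms. The `FACET_TERMS` lookup reuses a few of the slide's examples; treating everything unmatched as "topic" is an assumption based on the slide's note that topic covers the majority of keywords.

```python
# Hypothetical lookup built from a few of the slide's facet examples.
FACET_TERMS = {
    "format": {"podcast", "video"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
    "product/commodity": {"coffee", "wine", "cement"},
}

def facets_for(query):
    """Return the sorted facets whose terms appear as whole words in the query."""
    padded = " " + " ".join(query.lower().split()) + " "
    found = {facet for facet, terms in FACET_TERMS.items()
             if any(" " + term + " " in padded for term in terms)}
    # Assumption: anything not matched by a known term is treated as a topic query.
    return sorted(found) or ["topic"]

print(facets_for("Michael Porter podcast"))
print(facets_for("innovation"))
```

A query can carry more than one facet, so the function returns a list rather than a single label.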
  • 17. Step 3: Choose # clusters to analyze
      Number of Clusters Analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions
      50 | X | - | -
      150 | X | X | -
      300+ | X | X | X
  • 18. Small # Clusters can cover a lot of your data
      Number of top clusters | % Total Queries
      Top 20 clusters | 14
      Top 30 clusters | 18
      Top 50 clusters | 26
      Top 100 clusters | 37
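The coverage figures above can be computed along these lines. This is a sketch with made-up numbers, assuming the percentage is the summed frequency of the top N clusters divided by the total number of queries in the report; the slides do not spell out the denominator.

```python
def coverage_of_top(cluster_frequencies, top_n, total_queries):
    """Percentage of all queries accounted for by the top_n largest clusters."""
    top = sorted(cluster_frequencies, reverse=True)[:top_n]
    return 100.0 * sum(top) / total_queries

# Made-up cluster sizes and total; not the Working Knowledge data.
sizes = [500, 300, 200, 100, 50, 50]
total = 2000
print(coverage_of_top(sizes, 2, total))
print(coverage_of_top(sizes, 3, total))
```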
  • 19. Now you have your clusters… What do you do with them? TAKE ACTION!
  • 20. Analyze Top (“Short Head”) Clusters
      Clustering has created a condensed and reliable list of your top search queries
      • Are they what you thought they would be?
      • Does the information on your site accurately represent the top searches?
      • Are you fulfilling user needs?
  • 21. Use your clusters: Improve Site Navigation
      Examine the short-head of clusters, basically:
      • For each cluster, add up the frequencies of queries
      • Reorder clusters by cumulative frequency descending
      • Ensure top clusters are accounted for in your navigation
      • Use cluster topics as browse/navigation headers/footers for your website
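The first two steps on this slide (sum the frequencies per cluster, then sort descending) can be sketched as follows. The cluster contents are made up, and `rank_clusters` is a hypothetical helper name.

```python
def rank_clusters(clusters):
    """Sum query frequencies per cluster and sort by total, descending."""
    totals = {name: sum(freq for _, freq in rows) for name, rows in clusters.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up clusters: name -> list of (query, frequency).
clusters = {
    "ethics": [("ethics", 30), ("business ethics", 15)],
    "innovation": [("innovation", 50), ("open innovation", 20)],
}
ranked = rank_clusters(clusters)
print(ranked)
```

The resulting ordered list is what you would then compare against the site's navigation by hand.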
  • 22. WK Top Clusters
      Cluster | Frequency
      innovation | 867
      balanced scorecard | 794
      leadership | 570
      cases | 545
      social media | 508
      negotiation | 470
      knowledge management | 457
      ethics | 448
      apple | 430
      corporate social responsibility | 398
  • 23. Use your clusters: Improve Taxonomy
      • Missing categories in browse taxonomy
          o “Balanced Scorecard”
          o “Ethics”
          o “Social media”
      • Second-level topics in the WK context
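Finding categories missing from the browse taxonomy amounts to a set difference between top cluster names and taxonomy terms. In this sketch the cluster names are taken from the WK Top Clusters slide, but the taxonomy list is invented for illustration; the real comparison would use the site's actual taxonomy.

```python
def missing_from_taxonomy(top_clusters, taxonomy_terms):
    """Return top cluster names with no case-insensitive match in the taxonomy, in order."""
    known = {t.lower() for t in taxonomy_terms}
    return [c for c in top_clusters if c.lower() not in known]

# Cluster names from the WK Top Clusters slide; the taxonomy list is made up.
top = ["innovation", "balanced scorecard", "leadership", "ethics", "social media"]
taxonomy = ["Innovation", "Leadership", "Negotiation"]
missing = missing_from_taxonomy(top, taxonomy)
print(missing)
```

Exact matching is the simplest choice; a real taxonomy check would probably also need the synonym handling discussed later in the talk.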
  • 27. Mid-level clustering: Informs editorial/curatorial activities
      • “Featured Topics”
          o What topics to highlight this week/month/year
          o News items to focus on
          o What research guides to create
          o How to formulate queries for the topics
  • 28. How about improving search?
      • Clustered list provides synonyms for taxonomy
      • Requires human judgment and standards/guidelines for synonyms – in our case, synonyms are exact
      • Map to one "like term" in the search engine
      Example: Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
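A sketch of the "map to one like term" idea, using the variants from the slide's example. The dictionary-based lookup is an assumption about how such a mapping might be prototyped; real search engines have their own synonym configuration formats, which are not shown here.

```python
# Variants from the slide's example, mapped to the one "like term".
SYNONYMS = {
    "bsc": "balanced scorecard",
    "balanced score card": "balanced scorecard",
    "kaplan and norton": "balanced scorecard",
}

def to_like_term(query):
    """Map an exact (case-insensitive) synonym to its preferred term, else return the cleaned query."""
    key = " ".join(query.lower().split())
    return SYNONYMS.get(key, key)

print(to_like_term("BSC"))
print(to_like_term("bsc software"))
```

Because matching is exact, "bsc software" is left alone, which mirrors the slide's point that synonyms are exact in this case.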
  • 29. Use your clusters: Improve no-hits page
  • 30. Time Commitment
      • 2 hours to 2 weeks
      • Variables include:
          o What kind of information you want to gather
          o How broad or narrow you want your clusters
          o How many queries you analyze
      • In our case ~2 person-weeks
  • 31. Results vs. Time Invested
      Time Invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions
      2 Hours | X | X | - | -
      6 Hours | X | X | X | -
      One Week | X | X | X | X
  • 32. Next Steps: Autosuggest
      • Your top clusters probably make up a large percentage of what people are looking for
          o Use them to establish/supplement auto-suggest!
      Example: suggestions for “innovation”
          o innovation and leadership
          o disruptive innovation
          o innovation management
          o open innovation
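One way to prototype autosuggest from clustered queries is to match the typed prefix against the words of each query and rank by frequency. This is only a sketch: the queries come from the slide's "innovation" example, the frequencies are made up, and `suggest` is a hypothetical helper.

```python
def suggest(prefix, cluster_queries, limit=5):
    """Return the most frequent queries containing a word that starts with prefix."""
    p = prefix.lower().strip()
    if not p:
        return []
    hits = [(q, n) for q, n in cluster_queries
            if any(word.startswith(p) for word in q.lower().split())]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [q for q, _ in hits[:limit]]

# Queries from the slide's "innovation" example; the frequencies are made up.
queries = [("innovation and leadership", 40), ("disruptive innovation", 35),
           ("innovation management", 30), ("open innovation", 25), ("brand equity", 11)]
print(suggest("innov", queries))
```

Matching on any word rather than only the start of the query lets "disruptive innovation" and "open innovation" surface for the prefix "innov".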
  • 33. Next Steps: New Access Structures
      • Needed an obvious way to search podcasts
          o Put in best bets for now
      • A lot of people searching for article titles
          o Considering simple interface/approach for select field-specific search, e.g. “title”
      • Consider adding other facets to browse taxonomy where we have entities tagged
          o “company name”, “job type/class”, etc.
  • 34. Summary
      • Established plan/process, but be willing to tweak as you go
      • Keep it very simple.
      • Play with your data – the more we played, the better we understood what benefits could be realized by levels of clustering and effort
      • Tuning process/results
          o Build staging/working prototypes
          o Repeat process on other sites
  • 35. Thank you! And remember… TAKE ACTION!
      Kropla drąży skałę! (“A drop hollows out the rock.”)
      Questions?
      searchguy@hbs.edu
      @ravimynampaty
      http://www.slideshare.net/mynampaty/
