Clustering as presented at UX Poland 2013

Published in: Technology, Business

  1. Copyright © President & Fellows of Harvard College. Ravi Mynampaty. Categorizing Your Search Queries to Improve Findability
  2. About this talk…
      • Case study on how we are improving search and browse by performing clustering exercises on search query data
      • Not rocket science
      • High-level overview
      • You can follow this method, with your own insights and tweaks
      • You can kick this off next week at your work
  3. Inspired by…
      • Chapters 8 & 9
      • The power of incrementalism
  4. What is clustering?
      A process for organizing and analyzing search log data that:
      • Is repeatable, low-cost, scalable, simple
      • Yields actionable results
      • Supports constant incremental improvement to search
  5. What’s clustering good for?
      • Ensure results for high-frequency queries
      • Improve metadata and taxonomy
      • Inform and validate decision making in site IA
      • Inform editorial/curatorial activities
      • Provide feedback for search suggestions
        o Autosuggest, synonym lists, no-hits page suggestions
      • But more on this later...
  6. So how do I cluster search queries?
      A simple set of steps:
      • Create query report
      • Cluster queries
      • Determine # queries to analyze
      • Analyze clusters
      • Draw conclusions and ACT
  7. Step 1: Create a query report
      We started with the site with the most traffic
      • Upper-bound limit
      • One year’s data by quarter
      • Cut off tail at frequency < 10
  9. Step 1: Create a query report
      HBS Working Knowledge FY12 Use Snapshot: Overall Traffic
      • Page Views: 6,439,485
      • Visits: 3,635,746
      • Unique visitors: 2,734,620
      • On-site searches: 174,425
      • Views per Visit: 1.77
      • Local Search visit rate: 5%
      • Organic Search visit rate: 46%
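Step 1 could be sketched roughly as follows. This is a minimal illustration, not the tooling used for the talk: it assumes the search log has already been exported as a plain list of query strings, and the names `build_query_report` and `min_frequency` are made up for the example. Only the cut-off of 10 comes from the slides.

```python
from collections import Counter

def build_query_report(log_queries, min_frequency=10):
    """Count each distinct query and drop the long tail.

    log_queries: iterable of raw query strings (assumed export format).
    min_frequency: queries seen fewer times than this are cut off.
    """
    counts = Counter(log_queries)
    # most_common() returns (query, count) pairs, highest count first.
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical log: three distinct queries, one of them in the tail.
sample_log = ["innovation"] * 12 + ["leadership"] * 10 + ["mount everest"] * 3
report = build_query_report(sample_log)
print(report)
```

Running the same function once per quarter, on that quarter's slice of the log, would give the by-quarter view the slide mentions.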
  10. Step 2: Cluster the queries
  11. Step 2 (cont’d): Three levels of clustering

      | Level | Method | Example |
      |---|---|---|
      | Narrow | Simple normalization | Eliminate grammatical, spelling, typo, and punctuation differences |
      | Mid-level | Group by subject | management, finance, decision making |
      | Broad | Group by facet | topic, name, date, content type |
  12. Step 2 (cont’d): Levels → Tasks Enabled
      Tasks (table columns): Improve your base for query analysis; Ensure representation of major clusters on your site; Improve Metadata/Index/Taxonomy; Improve Search Suggestions

      | Level | Tasks marked |
      |---|---|
      | Narrow (simple) | X X X |
      | Mid-level (group by subject) | X X X |
      | Broad (group by facet) | X X |
  13. Step 2 (cont’d): Narrow Clustering Example
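Narrow clustering (simple normalization) might look something like the sketch below. It only handles case, punctuation, and whitespace differences; spelling and grammatical variants would still need a human pass or a spell-checker, which is not shown. The function names and the sample data are assumptions made for illustration.

```python
import re
from collections import defaultdict

def normalize(query):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    q = query.lower()
    q = re.sub(r"[^\w\s]", " ", q)      # punctuation becomes a space
    return re.sub(r"\s+", " ", q).strip()

def narrow_cluster(report):
    """Merge (query, frequency) rows whose normalized forms are identical."""
    merged = defaultdict(int)
    for query, freq in report:
        merged[normalize(query)] += freq
    # Highest combined frequency first; ties broken alphabetically.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical report rows
rows = [("Balanced Scorecard", 5), ("balanced scorecard.", 3),
        ("  BALANCED   scorecard", 2), ("ethics", 4)]
print(narrow_cluster(rows))
```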
  14. Step 2 (cont’d): Mid-level Example
      Cluster: brand

      | Query | Frequency |
      |---|---|
      | branding | 245 |
      | brand | 160 |
      | brand management | 73 |
      | consumer branding | 57 |
      | global brand | 32 |
      | service brands | 24 |
      | brand image retail bank | 17 |
      | employer branding | 16 |
      | brand management professional services | 16 |
      | global branding | 13 |
      | b2b branding | 13 |
      | importance of branding | 12 |
      | brand 2002 | 12 |
      | brand equity | 11 |
      | brand image | 11 |
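The slides do not say how queries were assigned to a subject, so the following is just one possible way to get a first pass automatically: match query words against a hand-maintained list of word stems. `SUBJECT_STEMS` and both function names are hypothetical, and the output would still need human review.

```python
from collections import defaultdict

# Hypothetical, hand-maintained mapping of subject -> word stems.
SUBJECT_STEMS = {"brand": ("brand",), "leadership": ("leader",)}

def assign_subject(query, subject_stems=SUBJECT_STEMS):
    """Return the first subject with a stem that starts any word of the query."""
    tokens = query.lower().split()
    for subject, stems in subject_stems.items():
        if any(tok.startswith(stem) for tok in tokens for stem in stems):
            return subject
    return None  # left unclustered for manual review

def mid_level_cluster(report, subject_stems=SUBJECT_STEMS):
    """Group (query, frequency) rows by assigned subject."""
    clusters = defaultdict(list)
    for query, freq in report:
        clusters[assign_subject(query, subject_stems)].append((query, freq))
    return dict(clusters)

# First four rows of the "brand" cluster from the slide, plus one other query.
rows = [("branding", 245), ("brand", 160), ("brand management", 73),
        ("consumer branding", 57), ("negotiation", 470)]
clusters = mid_level_cluster(rows)
print(sum(freq for _, freq in clusters["brand"]))
```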
  15. Step 2 (cont’d): Broad Clustering Example
  16. Step 2 (cont’d): List of facets we used

      | Facet | Example |
      |---|---|
      | content type | case studies, cases, working papers, articles, newspaper |
      | date | 2011, world in 2030 |
      | demographic characteristics | women, Gen Y, gender, baby boomers |
      | event | economic crisis |
      | format | podcast, video |
      | geographic area | india, japan, mount everest |
      | industry | global wine industry |
      | job type/role | independent director, entrepreneur, ceo, phd economist |
      | organization name | ikea, zara, toyota |
      | person name | michael porter, kanter, sebenius |
      | product name / brand name | ipad |
      | product/commodity | coffee, wine, cement |
      | topic | this covers the majority of keywords |
      | work | faculty work, ex: publication name, title of a case |
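Broad clustering by facet could be approximated with a lookup of known terms, as in the sketch below. The term sets are seeded with a few examples from the facet table; the year pattern and the fall-through to "topic" are assumptions of this sketch (the table only says that topic covers the majority of keywords), and `FACET_TERMS` and `assign_facet` are illustrative names.

```python
import re

# Seed terms taken from the facet table; a real list would be far longer.
FACET_TERMS = {
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
}
# Assumption: a four-digit year anywhere in the query signals the date facet.
YEAR = re.compile(r"\b(19|20)\d{2}\b")

def assign_facet(query):
    """Return a facet label for a query, defaulting to 'topic'."""
    q = " ".join(query.lower().split())
    for facet, terms in FACET_TERMS.items():
        if q in terms:
            return facet
    if YEAR.search(q):
        return "date"
    return "topic"

for q in ["Zara", "world in 2030", "negotiation"]:
    print(q, "->", assign_facet(q))
```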
  17. Step 3: Choose # clusters to analyze

      | Number of Clusters Analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions |
      |---|---|---|---|
      | 50 | X | | |
      | 150 | X | X | |
      | 300+ | X | X | X |
  18. Small # Clusters can cover a lot of your data

      | Number of top clusters | % Total Queries |
      |---|---|
      | Top 20 clusters | 14 |
      | Top 30 clusters | 18 |
      | Top 50 clusters | 26 |
      | Top 100 clusters | 37 |
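A coverage figure like the ones in the table above can be computed as the share of all searches that fall into the top N clusters. The sketch below assumes you already have a total frequency per cluster and the overall number of searches in the period; the function name and the sample numbers are made up.

```python
def coverage_of_top_clusters(cluster_totals, total_queries, top_n):
    """Percent of all searches accounted for by the top_n largest clusters.

    cluster_totals: dict of cluster name -> summed query frequency.
    total_queries: all searches in the period, including the long tail.
    """
    ranked = sorted(cluster_totals.values(), reverse=True)
    return 100.0 * sum(ranked[:top_n]) / total_queries

# Hypothetical numbers, not the Working Knowledge data.
totals = {"a": 50, "b": 30, "c": 20}
print(coverage_of_top_clusters(totals, total_queries=200, top_n=2))
```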
  19. Now you have your clusters… What do you do with them? TAKE ACTION!
  20. Analyze Top (“Short Head”) Clusters
      Clustering has created a condensed and reliable list of your top search queries
      • Are they what you thought they would be?
      • Does the information on your site accurately represent the top searches?
      • Are you fulfilling user needs?
  21. Use your clusters: Improve Site Navigation
      Examine the short head of clusters, basically:
      • For each cluster, add up the frequencies of queries
      • Reorder clusters by cumulative frequency, descending
      • Ensure top clusters are accounted for in your navigation
      • Use cluster topics as browse/navigation headers/footers for your website
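The roll-up and the navigation check described on this slide might be sketched like this. The input shape (cluster name mapped to its member queries) and the function names are assumptions; the navigation labels in the example are invented.

```python
def rank_clusters(clusters):
    """Sum query frequencies per cluster and sort by total, descending.

    clusters: dict of cluster name -> list of (query, frequency).
    """
    totals = {name: sum(freq for _, freq in members)
              for name, members in clusters.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def missing_from_navigation(ranked, nav_labels, top_n=10):
    """List top clusters that have no matching navigation label."""
    nav = {label.lower() for label in nav_labels}
    return [name for name, _ in ranked[:top_n] if name.lower() not in nav]

# Hypothetical clusters and navigation labels.
clusters = {
    "ethics": [("ethics", 5), ("business ethics", 2)],
    "innovation": [("innovation", 9)],
}
ranked = rank_clusters(clusters)
print(ranked)
print(missing_from_navigation(ranked, ["Innovation", "Leadership"]))
```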
  22. WK Top Clusters

      | Cluster | Frequency |
      |---|---|
      | innovation | 867 |
      | balanced scorecard | 794 |
      | leadership | 570 |
      | cases | 545 |
      | social media | 508 |
      | negotiation | 470 |
      | knowledge management | 457 |
      | ethics | 448 |
      | apple | 430 |
      | corporate social responsibility | 398 |
  23. Use your clusters: Improve Taxonomy
      • Missing categories in browse taxonomy
        o “Balanced Scorecard”
        o “Ethics”
        o “Social media”
      • Second-level topics in the WK context
  27. Mid-level clustering: Informs editorial/curatorial activities
      • “Featured Topics”
        o What topics to highlight this week/month/year
        o News items to focus on
        o What research guides to create
        o How to formulate queries for the topics
  28. How about improving search?
      • Clustered list provides synonyms for taxonomy
      • Requires human judgment and standards/guidelines for synonyms – in our case, synonyms are exact
      • Map to one "like term" in the search engine
      Example: Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
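The "map to one like term" idea can be illustrated with an exact-match lookup, as below. The entries come from the slide's example; how a given search engine actually stores synonyms varies by product, so treat `SYNONYMS` and `to_like_term` as stand-ins rather than any engine's configuration format.

```python
# Exact synonyms from the slide's example, keyed by normalized query text.
SYNONYMS = {
    "balanced scorecard": "Balanced Scorecard",
    "bsc": "Balanced Scorecard",
    "balanced score card": "Balanced Scorecard",
    "kaplan and norton": "Balanced Scorecard",
}

def to_like_term(query, synonyms=SYNONYMS):
    """Return the like term for an exact synonym, else the query unchanged."""
    key = " ".join(query.lower().split())
    return synonyms.get(key, query)

print(to_like_term("BSC"))
print(to_like_term("balanced scorecard software"))
```

Because matching is exact, a longer query that merely contains a synonym is left alone, which is in line with the slide's point that synonyms need human judgment.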
  29. Use your clusters: Improve no-hits page
  30. Time Commitment
      • 2 hours to 2 weeks
      • Variables include:
        o What kind of information you want to gather
        o How broad or narrow you want your clusters
        o How many queries you analyze
      • In our case ~2 person-weeks
  31. Results vs. Time Invested

      | Time Invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions |
      |---|---|---|---|---|
      | 2 Hours | X | X | | |
      | 6 Hours | X | X | X | |
      | One Week | X | X | X | X |
  32. Next Steps: Autosuggest
      • Your top clusters probably make up a large percentage of what people are looking for
        o Use them to establish/supplement auto-suggest!
      Example: suggestions for “innovation”
        o innovation and leadership
        o disruptive innovation
        o innovation management
        o open innovation
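One simple way to seed auto-suggest from the clustered queries is to offer the most frequent queries that contain what the user has typed at the start of a word. This is a sketch under that assumption, not how any particular search product does it; the frequencies below are invented, and only the four suggestion strings come from the slide.

```python
def suggest(typed, query_frequencies, limit=4):
    """Suggest frequent queries in which `typed` begins at a word boundary."""
    needle = " ".join(typed.lower().split())
    if not needle:
        return []
    matches = [
        (q, n) for q, n in query_frequencies.items()
        # Padding with a space makes the match start at a word boundary.
        if q != needle and (" " + needle) in (" " + q)
    ]
    matches.sort(key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical frequencies for queries in and around the innovation cluster.
freqs = {
    "innovation": 300, "innovation and leadership": 40,
    "disruptive innovation": 35, "innovation management": 30,
    "open innovation": 25, "leadership": 200,
}
print(suggest("innovation", freqs))
```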
  33. Next Steps: New Access Structures
      • Needed an obvious way to search podcasts
        o Put in best bets for now
      • A lot of people searching for article titles
        o Considering simple interface/approach for select field-specific search, e.g. “title”
      • Consider adding other facets to browse taxonomy where we have entities tagged
        o “company name”, “job type/class”, etc.
  34. Summary
      • Establish a plan/process, but be willing to tweak as you go
      • Keep it very simple.
      • Play with your data – the more we played, the better we understood what benefits could be realized by levels of clustering and effort
      • Tuning process/results
        o Build staging/working prototypes
        o Repeat process on other sites
  35. Thank you! And remember… TAKE ACTION!
      Kropla drąży skałę! (“A drop of water hollows out the rock.”)
      Questions?
      searchguy@hbs.edu
      @ravimynampaty
      http://www.slideshare.net/mynampaty/
