
Search Analytics: Diagnosing what ails your site



  1. Search Analytics: Diagnosing what ails your site
     Michigan UPA, Ann Arbor, Michigan, January 17, 2007
     Louis Rosenfeld
  2. About Me
     - Information architecture (IA) consultant; formerly president of Argus Associates
     - Publisher and founder, Rosenfeld Media
     - Background in librarianship/information science; consult for Fortune 500s
     - Co-author, Information Architecture for the World Wide Web (3rd edition, 11/06)
     - Co-founder, Information Architecture Institute and UXnet
  3. AOL Searcher #4417749
     - Interests:
       - 60 single men
       - aameetings in georgia
       - plastic surgeons in gwinnett county
       - applying to west point
       - bipolar
       - panic disorders
       - yerba mate
       - shedless dogs
       - movies for dogs
       - new zealand real estate
     - Thelma Arnold:
       - 62-year-old widow
       - Lilburn, GA resident
     NY Times, August 9, 2006: "A Face Is Exposed for AOL Searcher No. 4417749"
  4. Our Inadvertent Search Analytics Education, courtesy AOL
     650,000 searchers; 21,000,000 queries
  5. Anatomy of a Search Log (from Google Search Appliance)
     Critical elements: IP address, time/date stamp, query, and number of results.
     XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
     XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
     XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
     Full legend and more examples here:
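Log lines like the ones above can be parsed with a short script. A minimal Python sketch follows; the field layout (status, bytes, result count, search time after the request string) is assumed from the sample lines, and the function name is illustrative:

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed field layout, based on the sample lines above:
# IP - - [timestamp] "GET /search?...&q=QUERY... HTTP/1.1" status bytes results secs
LOG_RE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"GET (?P<path>\S+) HTTP/1\.1" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_line(line):
    """Extract the critical elements: IP, timestamp, query, result count."""
    m = LOG_RE.match(line)
    if not m:
        return None
    # The query lives in the q= parameter of the request path;
    # parse_qs decodes '+' to a space.
    qs = parse_qs(urlparse(m.group("path")).query)
    return {
        "ip": m.group("ip"),
        "timestamp": m.group("ts"),
        "query": qs.get("q", [""])[0],
        "results": int(m.group("results")),
    }
```

Running such a parser over the three lines above would surface the misspelled query "lincense plate" (0 results) followed two seconds later by the corrected "license plate" (146 results) from the same IP.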
  6. Sample Query Analysis Report
     Download template here:
  7. The Head, the Long Tail, and the Interesting Stuff in Between
     Sorting queries by frequency yields a Zipf distribution.
     Can we improve performance for the most popular queries?
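The head/tail split is straightforward to compute once queries are extracted from the log. A minimal Python sketch (function name and normalization are illustrative) showing how a handful of head queries can account for most of the total search volume:

```python
from collections import Counter

def query_frequency_report(queries, top_n=3):
    """Sort unique queries by frequency and report the head's share of volume.

    In a Zipf-like distribution, the few most popular queries
    account for a disproportionate share of all searches.
    """
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    head = counts.most_common(top_n)          # [(query, count), ...]
    head_share = sum(n for _, n in head) / total
    return head, head_share
```

Tuning results for just the head queries therefore improves a large fraction of all search sessions.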
  8. Querying your Queries: Some basic questions (1/2)
     - What are the most common unique queries?
     - Do any interesting patterns emerge from analyzing these common queries?
     - When common queries are searched, are the results the ones your users should be seeing?
     - Which common queries retrieve zero results?
     - Which common queries retrieve a large number of results, say 100 or more?
  9. Querying your Queries: Some basic questions (2/2)
     - Which common queries retrieve results that don't get clicked through?
     - Which page is the top source (referrer) for each common query?
     - How many click-throughs does each common query generate?
     - Which result is most frequently clicked through for each common query?
     - What's the average query length (number of terms, number of characters)?
     - Which URLs are users searching for?
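Several of these questions reduce to simple aggregation once the log is parsed. A minimal Python sketch, assuming each record is a dict with hypothetical `query`, `results`, and `clicks` fields (the `clicks` field would come from joining the search log with a clickthrough log):

```python
def diagnose(records):
    """Flag three trouble categories from parsed search-log records.

    records: iterable of dicts with assumed keys
    'query', 'results' (hit count), and 'clicks' (clickthroughs).
    """
    # Queries that retrieve nothing at all.
    zero_hits = sorted({r["query"] for r in records if r["results"] == 0})
    # Queries that retrieve results nobody clicks on.
    no_click = sorted({r["query"] for r in records
                       if r["results"] > 0 and r["clicks"] == 0})
    # Queries that retrieve an unhelpfully large result set.
    too_many = sorted({r["query"] for r in records if r["results"] >= 100})
    return {"zero_hits": zero_hits, "no_click": no_click, "too_many": too_many}
```

Each bucket points to a different fix: zero-hit queries suggest missing content or metadata, unclicked results suggest poor relevance or titling, and oversized result sets suggest the need for narrowing aids.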
  10. Tune your Questions: Broad to specific
     Netflix asks:
     - Which movies are most frequently searched?
     - Which of them are most frequently clicked through?
     - Which of them are least frequently added to the queue (and why)?
     Examples:
     - "OO7" versus "007"
     - Porn-related queries (not carried by Netflix)
     - "yoga": not stocking enough titles, or not indexing enough record content?
  11. SA as Diagnostic Tool: What can you fix or improve?
     - User research
     - Interface design: search entry interface, search results
     - Retrieval algorithm modification
     - Navigation design
     - Metadata development
     - Content development
  12. User Research: What do they want?…
     - SA is a true expression of users' information needs, which are often surprising (e.g., SKU numbers at L.L. Bean; URLs at IBM)
     - Provides context by displaying aspects of single search sessions
  13. User Research: …who wants it?…
     What can you learn from knowing these things?
     - What specific segments want, determined by:
       - Security clearance
       - IP address
       - Job function
       - Account information
     - Which pages they initiate searches from
  14. User Research: …and when do they want it?
     - Time-based variation (and clustered queries)
     - By hour, by day, by season
     - Helps determine "best bets" and "guide" development
  15. Search Entry Interface Design: "The Box" or something else?
     - SA identifies "dead end" points (e.g., 0 hits, 2,000 hits) where assistance could be added (e.g., revise search, browsing alternative)
     - The syntax of queries informs the selection of search features to expose (e.g., use of Boolean operators, fielded searching)
  16. Search Results Interface Design: Which results where?
     - The #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
     From SLI Systems
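A quick way to surface this kind of positional anomaly is to tally clicks per result rank and flag any rank that outperforms a higher-placed one. A minimal Python sketch; the data shapes and function names are illustrative:

```python
from collections import Counter

def clicks_by_position(click_log):
    """click_log: iterable of (query, clicked_rank) pairs.
    Returns a Counter of clicks per result position."""
    return Counter(rank for _, rank in click_log)

def position_anomalies(counts, page_size=10):
    """Ranks that receive more clicks than some rank placed above them.

    The slide's observation (users skipping to the last result on the
    page, so #10 beats #6-9) shows up as rank 10 in this list.
    """
    return [r for r in range(2, page_size + 1)
            if any(counts.get(r, 0) > counts.get(earlier, 0)
                   for earlier in range(1, r))]
```

An anomalous last position suggests promoting that result, or reconsidering how many results are shown per page.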
  17. Search Results Interface Design: How to sort results?
     - The Financial Times has found that users often include dates in their queries
     - Obvious but effective improvement: allow users to sort by date
  18. Search System: What to change?
     - Identify new functionality: the Financial Times added spell checking
     - Retrieval algorithm modifications:
       - Deloitte and Barnes & Noble use SA to demonstrate that basic improvements (e.g., best bets) are insufficient
       - The Financial Times weights company names higher
  19. Navigation: Any improvements?
     - Michigan State University builds its A-Z index automatically based on frequent queries
  20. Navigation: Where does it fail?
     - Track and study the pages (excluding the main page) where search is initiated
       - Are there obvious issues that would cause a "dead end"?
       - Are there user studies that could test/validate problems on these pages?
     - Sandia Labs analyzes the most requested documents to test content independently of site structure; the results are used to improve that structure
  21. Metadata Development: How do users express their needs?
     - SA provides a sense of tone: how users' needs are expressed
       - Jargon (e.g., "cancer" vs. "oncology," "lorry" vs. "truck," acronyms)
       - Length (e.g., number of terms per query)
       - Syntax (e.g., Boolean, natural language, keyword)
  22. Metadata Development: Which metadata values?
     - SA helps in the creation of controlled vocabularies
     - Query terms are fodder for metadata values (e.g., "cell phone," "JFK" vs. "John Kennedy," "country music"), especially for determining preferred terms
     - Works with tools that cluster synonyms, enabling concept searching and thesaurus development
  23. Metadata Development: Which metadata attributes?
     - SA helps in the creation of vocabularies
     - Simple cluster analysis can detect metadata attributes (e.g., "product," "person," "topic")
     - Look for variations between the short head and the long tail (on the Deloitte intranet, "known-item" queries are common; research topics are infrequent)
  24. Content Development: Do we have the right content?
     - SA identifies content that can't be found (0 results)
     - Does the content exist? If so, there are wording, metadata, or spidering problems
     - If not, why not?
  25. Content Development: Are we featuring the right stuff?
     - Clickthrough tracking helps determine which results should rise to the top (example: SLI Systems)
     - Also suggests which "best bets" to develop to address common queries
  26. Organizational Impact: Educational opportunities
     - SA is a way to "reverse engineer" how your site performs in order to:
       - Sensitize the organization to analytics, specifically related to findability
       - Sensitize content owners/authors to the benefits of good practices around content titling, tagging, and navigational placement
  27. Organizational Impact: Rethinking how you do things
     - The Financial Times learns about breaking stories from its logs by monitoring spikes in company and individual names and comparing them with current coverage
     - A discrepancy signals a possible breaking story; a reporter is assigned to follow up
     - Next step? Assign reporters to "beats" that emerge from SA
  28. SA as User Research Method: Sleeper, but no panacea
     - Benefits:
       - Non-intrusive
       - Inexpensive and (usually) accessible
       - Large volume of "real" data
       - Represents actual usage patterns
     - Drawbacks:
       - Provides an incomplete picture of usage: was the user satisfied at session's end?
       - Difficult to analyze: where are the commercial tools?
     - Ultimately an excellent complement to qualitative methods (e.g., task analysis, field studies)
  29. SA Headaches: What gets in the way?
     - Lack of time
     - Few useful tools for parsing logs and generating reports
     - Tension between those who want to perform SA and those who "own" the data (chiefly IT)
     - Ignorance of the method
     - The hard work and/or boredom of doing the analysis
     From a summer 2006 survey (134 responses)
  30. Please Share Your SA Knowledge: Visit our "book in progress" site
     - Search Analytics for Your Site: Conversations with Your Customers, by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2007)
     - Site URL:
     - Feed URL:
     - The site contains:
       - Reading list
       - Survey results
       - Perl script for parsing logs
       - Log samples
       - Report templates
       - …and more
  31. Contact Information
     Louis Rosenfeld LLC
     902 Miller Avenue
     Ann Arbor, Michigan 48103 USA
     [email_address]
     +1.734.302.3323 voice
     +1.734.661.1655 fax