Search Analytics: Diagnosing what ails your site Michigan UPA Ann Arbor, Michigan January 17, 2007 Louis Rosenfeld www.louisrosenfeld.com www.rosenfeldmedia.com/books/searchanalytics
About Me Information architecture (IA) consultant; formerly president Argus Associates Publisher and founder, Rosenfeld Media (www.rosenfeldmedia.com) Background in librarianship/information science; consult for Fortune 500s Co-author,  Information Architecture for the World Wide Web  (3rd edition 11/06) Co-founder,  Information Architecture Institute  (www.iainstitute.org) and  UXnet  (www.uxnet.org)
AOL Searcher #4417749 Interests 60 single men aameetings in georgia plastic surgeons in gwinnett county applying to west point bipolar panic disorders yerba mate shedless dogs movies for dogs new zealand real estate Thelma Arnold 62-year old widow Lilburn, GA resident NY Times , August 9, 2006:  “A Face Is Exposed for AOL Searcher No. 4417749”
Our Inadvertent Search Analytics Education, courtesy AOL  http://www.aolsearchdatabase.com 650,000 searchers 21,000,000 queries
Anatomy of a Search Log (from Google Search Appliance) Critical elements (shown in bold on the slide): IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
Full legend and more examples here: http://www.rosenfeldmedia.com/books/searchanalytics/blog/log_sample_google_appliance/
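A minimal Python sketch of pulling the four critical elements out of one such entry. This is not the Perl script offered on the book site. The field order after the request (status, bytes, number of results, elapsed time) is an assumption inferred from the sample entries, and the sample line in the code is shortened for readability.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed layout, inferred from the sample entries:
# ip - - [timestamp] "GET url HTTP/x.y" status bytes num_results elapsed
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<results>\d+) (?P<elapsed>[\d.]+)$'
)


def parse_log_line(line):
    """Return IP, timestamp, query, and result count, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # parse_qs decodes "+" and %-escapes in the query string
    params = parse_qs(urlparse(match.group("url")).query)
    return {
        "ip": match.group("ip"),
        "timestamp": match.group("timestamp"),
        "query": params.get("q", [""])[0],
        "results": int(match.group("results")),
    }


# Shortened version of the first sample entry (most URL parameters dropped)
sample = (
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] '
    '"GET /search?access=p&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" '
    '200 971 0 0.02'
)
record = parse_log_line(sample)
print(record)
```

Lines that do not fit the assumed layout return None, so they can be counted and inspected rather than silently dropped.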
Sample Query Analysis Report Download template here: http://www.rosenfeldmedia.com/books/searchanalytics/blog/free_ms_excel_template_for_ana/
The Head, the Long Tail, and the Interesting Stuff in Between Sorting queries by frequency results in a Zipf Distribution Can we improve performance for the most popular queries?
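A rough Python sketch of the frequency sort behind the head/long-tail view. The normalization rule (lowercase, collapsed whitespace), the function names, and the sample queries are illustrative assumptions, not taken from the talk.

```python
from collections import Counter


def rank_queries(queries):
    """Count queries after light normalization, most frequent first."""
    normalized = [" ".join(q.lower().split()) for q in queries]
    return Counter(normalized).most_common()


def head_share(ranked, top_n):
    """Fraction of all searches accounted for by the top_n unique queries."""
    total = sum(count for _, count in ranked)
    head = sum(count for _, count in ranked[:top_n])
    return head / total if total else 0.0


# Hypothetical queries for illustration only
queries = [
    "license plate", "License  Plate", "jobs", "jobs", "jobs",
    "campus map", "parking permit",
]
ranked = rank_queries(queries)
for query, count in ranked:
    print(f"{count:3d}  {query}")
print(f"Top 2 queries cover {head_share(ranked, 2):.0%} of searches")
```

In a Zipf-like distribution, a small `top_n` covers a large share of all searches, which is why tuning the head first tends to pay off.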
Querying your Queries:  Some basic questions  1/2 What are the most common unique queries? Do any interesting patterns emerge from analyzing these common queries? When common queries are searched, are the results the ones your users  should  be seeing? Which common queries retrieve zero results?  Which common queries retrieve a large number of results, say 100 or more?
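Two of these questions (zero results, very large result sets) can be sketched in Python as follows. The record shape (a dict with "query" and "results" keys) is an assumption, and the threshold of 100 simply echoes the "say 100 or more" above.

```python
from collections import Counter


def flag_common_queries(records, top_n=10, large_threshold=100):
    """Among the top_n most common queries, list those with zero results
    and those with large_threshold or more results."""
    counts = Counter(r["query"] for r in records)
    common = [query for query, _ in counts.most_common(top_n)]
    # Keep the most recently logged result count for each query
    latest = {r["query"]: r["results"] for r in records}
    zero = [q for q in common if latest[q] == 0]
    large = [q for q in common if latest[q] >= large_threshold]
    return zero, large


# Result counts taken from the sample log entries; repetition is made up
records = [
    {"query": "lincense plate", "results": 0},
    {"query": "license plate", "results": 146},
    {"query": "license plate", "results": 146},
    {"query": "regional transportation governance commission", "results": 62},
]
zero, large = flag_common_queries(records)
print("Zero results:", zero)
print("Large result sets:", large)
```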
Querying your Queries:  Some basic questions  2/2 Which common queries retrieve results that don’t get clicked through?  What page is the top source (referrer) per common query? What is the number of click-throughs per common query?  Which result is most frequently clicked-through per common query? What’s the average query length (number of terms, number of characters)? Which URLs are users searching for?
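The last two questions lend themselves to a short Python sketch. The URL test is a deliberately crude heuristic and the sample queries are hypothetical. Click-through questions need click data that the sample search log does not contain, so they are left out here.

```python
import re

# Crude heuristic: starts like a URL, or contains a common domain suffix
URL_LIKE = re.compile(
    r"^(https?://|www\.)|\.(com|org|net|edu|gov)\b", re.IGNORECASE
)


def query_length_stats(queries):
    """Return (average number of terms, average number of characters)."""
    if not queries:
        return 0.0, 0.0
    avg_terms = sum(len(q.split()) for q in queries) / len(queries)
    avg_chars = sum(len(q) for q in queries) / len(queries)
    return avg_terms, avg_chars


def url_like_queries(queries):
    """Return the queries that look like URLs rather than search terms."""
    return [q for q in queries if URL_LIKE.search(q.strip())]


# Hypothetical queries for illustration only
queries = ["license plate", "www.example.com", "jobs"]
avg_terms, avg_chars = query_length_stats(queries)
print(f"Average terms: {avg_terms:.2f}, average characters: {avg_chars:.2f}")
print("URL-like queries:", url_like_queries(queries))
```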
Tune your Questions: Broad to specific Netflix asks: Which movies most frequently searched? Which of them most frequently clicked through? Which of them  least  frequently added to queue (and why)?  Examples:  “OO7” versus “007” Porn-related (not carried by Netflix) “yoga”:  not stocking enough?  or not indexing enough record content?
SA as Diagnostic Tool:  What can you fix or improve? User Research Interface Design:  search entry interface, search results Retrieval Algorithm Modification Navigation Design Metadata Development Content Development
User Research: What do they want?… SA is a true expression of users’ information needs (often surprising:  e.g., SKU numbers at LL Bean; URLs at IBM) Provides context by displaying aspects of single search sessions
User Research: …who wants it?… What can you learn from knowing these things? What specific segments want; determined by: Security clearance IP address Job function Account information Which pages they initiate searches from
User Research: …and when do they want it? Time-based variation (and clustered queries) By hour, by day, by season Helps determine “best bets” and “guide” development
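A small Python sketch of the by-hour view, assuming timestamps in the format shown in the Google Search Appliance sample and records shaped as dicts with a "timestamp" key (both assumptions). Grouping by day or season works the same way with a different key.

```python
from collections import Counter
from datetime import datetime

# Format of the bracketed time/date stamp in the sample log
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def queries_by_hour(records):
    """Count searches per hour of day (local to the logged UTC offset)."""
    hours = Counter()
    for record in records:
        stamp = datetime.strptime(record["timestamp"], TIMESTAMP_FORMAT)
        hours[stamp.hour] += 1
    return dict(sorted(hours.items()))


# Timestamps from the three sample log entries
records = [
    {"timestamp": "10/Jul/2006:10:25:46 -0800"},
    {"timestamp": "10/Jul/2006:10:25:48 -0800"},
    {"timestamp": "10/Jul/2006:10:24:38 -0800"},
]
hourly = queries_by_hour(records)
print(hourly)
```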
Search Entry Interface Design: “The Box” or something else? SA identifies “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added (e.g., revise search, browsing alternative)  Syntax of queries informs selection of search features to expose (e.g., use of Boolean operators, fielded searching) … OR…
Search Results Interface Design: Which results where? #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page) From SLI Systems (www.sli-systems.com)
Search Results Interface Design: How to sort results? Financial Times  has found that users often include dates in their queries Obvious but effective improvement:  Allow users to sort by date
Search System: What to change? Identify new functionality:  Financial Times  added spell checking Retrieval algorithm modifications: Deloitte, Barnes & Noble use SA to demonstrate that basic improvements (e.g., Best Bets) are insufficient Financial Times  weights company names higher
Navigation: Any improvements? Michigan State University builds A-Z index automatically based on frequent queries
Navigation: Where does it fail? Track and study pages (excluding main page) where search is initiated Are there obvious issues that would cause a “dead end”?  Are there user studies that could test/validate problems on these pages? Sandia Labs analyzes most requested documents to test content independent of site structure; results used to improve structure
Metadata Development: How do users express their needs? SA provides a sense of  tone:  how users’ needs are expressed  Jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms) Length (e.g., number of terms/query) Syntax (e.g., Boolean, natural language, keyword)
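One way to get a first read on query syntax is a rule-of-thumb classifier, sketched here in Python. The three labels follow the examples above; the rules themselves (uppercase operators, leading question words) are assumptions and will misclassify some queries.

```python
import re

# Treat only uppercase AND/OR/NOT as operators, to avoid flagging
# ordinary phrases such as "bed and breakfast"
BOOLEAN_OPERATOR = re.compile(r"\b(AND|OR|NOT)\b")
QUESTION_WORDS = {
    "how", "what", "where", "when", "why", "who", "which",
    "can", "do", "does", "is",
}


def classify_syntax(query):
    """Label a query as 'boolean', 'natural language', or 'keyword'."""
    if BOOLEAN_OPERATOR.search(query):
        return "boolean"
    words = query.lower().split()
    if words and (words[0] in QUESTION_WORDS or query.strip().endswith("?")):
        return "natural language"
    return "keyword"


# Hypothetical queries for illustration only
for query in ["cancer AND treatment",
              "how do I renew my license plate",
              "license plate"]:
    print(f"{classify_syntax(query):17s} {query}")
```

Tallying the labels across a log gives a rough idea of which search features (Boolean operators, fielded searching) users already try to use.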
Metadata Development: Which metadata values? SA helps in the creation of controlled vocabularies Terms are fodder for metadata values (e.g., “cell phone,” “JFK” vs. “John Kennedy,” “country music”), especially for determining preferred terms Works with tools that cluster synonyms (example from www.behaviortracking.com), enabling concept searching and thesaurus development
Metadata Development: Which metadata attributes? SA helps in the creation of vocabularies Simple cluster analysis can detect metadata attributes (e.g., “product,” “person,” “topic”) Look for variations between short head and long tail (Deloitte intranet:  “known-item”  queries are  common;  research topics  are infrequent) [Chart labels: known-item queries; research queries]
Content Development: Do we have the right content? SA identifies content that can’t be found (0 results) Does the content exist?  If so, there are wording, metadata, or spidering problems If not, why not? www.behaviortracking.com
Content Development: Are we featuring the right stuff? Clickthrough tracking helps determine which results should rise to the top (example:  SLI Systems) Also suggests which “best bets” to develop to address common queries
Organizational Impact: Educational opportunities SA is a way to “reverse engineer” how your site performs in order to: Sensitize organization to analytics, specifically related to findability Sensitize content owners/authors to benefits of good practices around content titling, tagging, and navigational placement
Organizational Impact: Rethinking how you do things Financial Times  learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage Discrepancy = possible breaking story; reporter is assigned to follow up Next step?  Assign reporters to “beats” that emerge from SA
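The spike-versus-coverage comparison might be sketched in Python like this. It is not a description of the Financial Times' actual system: the thresholds, the function name, and the company names are all hypothetical.

```python
def find_spikes(today_counts, baseline_counts, covered_terms,
                ratio=3.0, min_count=5):
    """Return terms searched far more than usual that current coverage
    does not mention (a possible breaking story)."""
    spikes = []
    for term, count in today_counts.items():
        # Use a floor of 1 so terms with no history can still be compared
        baseline = max(baseline_counts.get(term, 0), 1)
        is_spike = count >= min_count and count >= ratio * baseline
        if is_spike and term not in covered_terms:
            spikes.append(term)
    return sorted(spikes)


# Hypothetical data: searches today, typical daily searches, covered names
today = {"acme corp": 40, "globex": 12, "initech": 6}
baseline = {"acme corp": 5, "globex": 10, "initech": 1}
covered = {"initech"}

candidates = find_spikes(today, baseline, covered)
print("Possible breaking stories:", candidates)
```

Here "initech" also spikes but is already covered, so only the uncovered spike is reported for follow-up.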
SA as User Research Method:  Sleeper, but no panacea Benefits Non-intrusive Inexpensive and (usually) accessible Large volume of “real” data Represents actual usage patterns Drawbacks Provides an incomplete picture of usage:  was user satisfied at session’s end? Difficult to analyze:  where are the commercial tools? Ultimately an excellent  complement  to qualitative methods (e.g., task analysis, field studies)
SA Headaches: What gets in the way? Lack of time Few useful tools for parsing logs, generating reports Tension between those who want to perform SA and those who “own” the data (chiefly IT) Ignorance of the method Hard work and/or boredom of doing analysis  From summer 2006 survey (134 responses)  www.rosenfeldmedia.com/books/searchanalytics/blog/search_analytics_survey_result/
Please Share Your SA Knowledge: Visit our “book in progress” site Search Analytics for Your Site:  Conversations with your Customers  by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2007) Site URL:  www.rosenfeldmedia.com/books/searchanalytics/ Feed URL:  feeds.rosenfeldmedia.com/searchanalytics/ Site contains: Reading list Survey results Perl script for  parsing logs Log samples Report templates … and more
Contact Information Louis Rosenfeld LLC 902 Miller Avenue Ann Arbor, Michigan  48103  USA [email_address] www.louisrosenfeld.com +1.734.302.3323 voice +1.734.661.1655 fax
