Search Analytics: Diagnosing what ails your site


Lou Rosenfeld's presentation on search analytics, given at Web Manager University, October 27, 2006.


1. Search Analytics: Diagnosing what ails your site
   Web Manager University, September 27, 2006
   Louis Rosenfeld
2. About me
   - Information architecture (IA) consultant; formerly president of Argus Associates
   - Publisher and founder, Rosenfeld Media
   - Background in librarianship/information science; consults for Fortune 500s
   - Co-author, Information Architecture for the World Wide Web (3rd edition out this fall)
   - Co-founder, Information Architecture Institute and UXnet
3. AOL Searcher #4417749
   - Interests
     - 60 single men
     - aameetings in georgia
     - plastic surgeons in gwinnett county
     - applying to west point
     - bipolar
     - panic disorders
     - yerba mate
     - shedless dogs
     - movies for dogs
     - new zealand real estate
   - Thelma Arnold
     - 62-year-old widow
     - Lilburn, GA resident
   NY Times, August 9, 2006: "A Face Is Exposed for AOL Searcher No. 4417749"
4. Our Inadvertent Search Analytics Education, courtesy AOL
   - 650,000 searchers
   - 21,000,000 queries
5. Analyze This:
   - Keywords: focis; 0; 11/26/04 12:57 PM; XXX.XXX.XXX.2
   - Keywords: focus; 167; 11/26/04 12:59 PM; XXX.XXX.XXX.2
   - Keywords: focus pricing; 12; 11/26/04 1:02 PM; XXX.XXX.XXX.2
   - Keywords: discounts for college students; 0; 11/26/04 3:35 PM; XXX.XXX.XXX.59
   - Keywords: student discounts; 3; 11/26/04 3:35 PM; XXX.XXX.XXX.59
   - Keywords: ford or mercury; 500; 11/26/04 3:35 PM; XXX.XXX.XXX.126
   - Keywords: (ford or mercury) and dealers; 73; 11/26/04 3:36 PM; XXX.XXX.XXX.126
   - Keywords: lorry; 0; 11/26/04 3:36 PM; XXX.XXX.XXX.36
   - Keywords: "safety ratings"; 3; 11/26/04 3:36 PM; XXX.XXX.XXX.55
   - Keywords: safety; 389; 11/26/04 3:36 PM; XXX.XXX.XXX.55
   - Keywords: seatbelts; 2; 11/26/04 3:37 PM; XXX.XXX.XXX.55
   - Keywords: seat belts; 33; 11/26/04 3:37 PM; XXX.XXX.XXX.55
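   Each line in the excerpt above pairs a query with a hit count, a timestamp, and a masked IP address. As a minimal illustration (not part of the original deck), a short Python sketch like the following could turn such lines into structured records for the analyses that follow; the field layout is an assumption based on this sample, and real search logs differ by engine.

   ```python
   from datetime import datetime

   def parse_line(line):
       """Turn one 'Keywords: ...' line into a dict, or None if it doesn't match."""
       if not line.startswith("Keywords:"):
           return None
       body = line[len("Keywords:"):].strip()
       parts = [p.strip() for p in body.split(";")]
       if len(parts) != 4:
           return None
       query, hits, timestamp, ip = parts
       return {
           "query": query.lower(),
           "hits": int(hits),                                        # result count
           "when": datetime.strptime(timestamp, "%m/%d/%y %I:%M %p"),
           "ip": ip,                                                 # masked in the sample
       }

   def parse_log(path):
       """Parse a whole log file, skipping lines that don't match the pattern."""
       with open(path, encoding="utf-8") as f:
           return [rec for rec in (parse_line(line) for line in f) if rec]
   ```

   Even this much structure makes session stories visible: the misspelled "focis" with 0 hits at 12:57 PM is corrected to "focus" (167 hits) two minutes later from the same address.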
6. The Head, the Long Tail, and the Interesting Stuff in Between
   - Sorting queries by frequency results in a Zipf distribution
   - Can we improve performance for the most popular queries?
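   One way to see that Zipf shape in your own logs is to count distinct queries, rank them, and check how much of total search volume the short head covers. This is a minimal sketch, assuming the record format from the hypothetical parsing example above; the head size is arbitrary.

   ```python
   from collections import Counter

   def query_distribution(records, head_size=100):
       """Rank queries by frequency and report how much volume the head covers."""
       counts = Counter(rec["query"] for rec in records)
       ranked = counts.most_common()          # rank vs. frequency: the Zipf curve
       total = sum(counts.values())
       head = sum(freq for _, freq in ranked[:head_size])
       print(f"{len(ranked)} distinct queries, {total} searches in total")
       print(f"top {head_size} queries cover {head / total:.0%} of search volume")
       return ranked
   ```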
7. User Research: What do they want?…
   - SA is a true expression of users' information needs (often surprising: e.g., SKU numbers at LL Bean; URLs at IBM)
   - Provides context by displaying aspects of single search sessions
8. User Research: …who wants it?…
   - What can you learn from knowing these things?
     - What specific segments want, determined by:
       - Security clearance
       - IP address
       - Job function
       - Account information
     - Which pages they initiate searches from
9. User Research: …and when do they want it?
   - Time-based variation (and clustered queries)
   - By hour, by day, by season
   - Helps determine "best bets" and "guide" development
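   Once queries carry timestamps, by-hour or by-day bucketing is straightforward. A minimal sketch, again assuming the parsed records from the earlier example:

   ```python
   from collections import Counter

   def volume_by_hour(records, query=None):
       """Count searches per hour of day, optionally restricted to one query."""
       hours = Counter(
           rec["when"].hour
           for rec in records
           if query is None or rec["query"] == query
       )
       for hour in range(24):
           print(f"{hour:02d}:00  {hours.get(hour, 0)}")
   ```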
10. Search Entry Interface Design: "The Box" or something else?
    - SA identifies "dead end" points (e.g., 0 hits, 2,000 hits) where assistance could be added (e.g., revise search, browsing alternative)
    - Syntax of queries informs selection of search features to expose (e.g., use of Boolean operators, fielded searching)
11. Search Results Interface Design: Which results where?
    - The #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
    - From SLI Systems
12. Search Results Interface Design: How to sort results?
    - The Financial Times has found that users often include dates in their queries
    - Obvious but effective improvement: allow users to sort by date
13. Navigation: Any improvements?
    - Michigan State University builds its A-Z index automatically based on frequent queries
14. Navigation: Where does it fail?
    - Track and study pages (excluding the main page) where search is initiated
      - Are there obvious issues that would cause a "dead end"?
      - Are there user studies that could test/validate problems on these pages?
    - Sandia Labs analyzes most-requested documents to test content independent of site structure; results are used to improve the structure
15. Search System: What to change?
    - Identify new functionality: the Financial Times added spell checking
    - Retrieval algorithm modifications:
      - Deloitte and Barnes & Noble use SA to demonstrate that basic improvements (e.g., Best Bets) are insufficient
      - The Financial Times weights company names higher
16. Metadata Development: How do users express their needs?
    - SA provides a sense of tone: how users' needs are expressed
      - Jargon (e.g., "cancer" vs. "oncology," "lorry" vs. "truck," acronyms)
      - Length (e.g., number of terms per query)
      - Syntax (e.g., Boolean, natural language, keyword)
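    Length and syntax are easy to quantify. The sketch below assumes the parsed records from the earlier example and uses deliberately crude detection rules (the words and/or/not for Boolean syntax, quotation marks for phrases) purely as an illustration.

    ```python
    def profile_queries(records):
        """Report average query length and rough syntax usage."""
        lengths = [len(rec["query"].split()) for rec in records]
        boolean = sum(
            1 for rec in records
            if any(op in rec["query"].split() for op in ("and", "or", "not"))
        )
        phrases = sum(
            1 for rec in records
            if '"' in rec["query"] or "\u201c" in rec["query"]   # straight or curly quotes
        )
        n = len(records)
        print(f"average terms per query: {sum(lengths) / n:.1f}")
        print(f"queries using Boolean operators: {boolean / n:.0%}")
        print(f"queries using quoted phrases: {phrases / n:.0%}")
    ```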
17. Metadata Development: Which metadata values?
    - SA helps in the creation of controlled vocabularies
    - Terms are fodder for metadata values (e.g., "cell phone," "JFK" vs. "John Kennedy," "country music"), especially for determining preferred terms
    - Works with tools that cluster synonyms, enabling concept searching and thesaurus development
18. Metadata Development: Which metadata attributes?
    - SA helps in the creation of vocabularies
    - Simple cluster analysis can detect metadata attributes (e.g., "product," "person," "topic")
    - Look for variations between the short head and the long tail (Deloitte intranet: "known-item" queries are common; research topics are infrequent)
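    One rough way to surface that head-vs.-tail variation is to compare average query length in the two groups, on the assumption that known-item queries tend to be shorter than research queries. This is a sketch only; the 1% head cutoff and the length proxy are both assumptions.

    ```python
    from collections import Counter

    def head_vs_tail(records, head_fraction=0.01):
        """Compare average query length in the short head vs. the long tail."""
        ranked = Counter(rec["query"] for rec in records).most_common()
        cutoff = max(1, int(len(ranked) * head_fraction))
        head, tail = ranked[:cutoff], ranked[cutoff:]

        def avg_terms(group):
            return sum(len(query.split()) for query, _ in group) / len(group)

        print(f"head ({len(head)} queries): {avg_terms(head):.1f} terms on average")
        if tail:
            print(f"tail ({len(tail)} queries): {avg_terms(tail):.1f} terms on average")
    ```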
19. Content Development: Do we have the right content?
    - SA identifies content that can't be found (0 results)
    - Does the content exist? If so, there are wording, metadata, or spidering problems
    - If not, why not?
20. Content Development: Are we featuring the right stuff?
    - Clickthrough tracking helps determine which results should rise to the top (example: SLI Systems)
    - Also suggests which "best bets" to develop to address common queries
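    Assuming clickthrough logging that records which result a searcher clicked for each query (data not present in the earlier log sample), a sketch like this could nominate best-bet candidates; the pair format and threshold are illustrative assumptions.

    ```python
    from collections import Counter, defaultdict

    def suggest_best_bets(clicks, min_clicks=50):
        """clicks is an iterable of (query, clicked_url) pairs from clickthrough logs."""
        by_query = defaultdict(Counter)
        for query, url in clicks:
            by_query[query.lower()][url] += 1
        for query, urls in sorted(by_query.items()):
            url, n = urls.most_common(1)[0]        # most-clicked result for this query
            if n >= min_clicks:
                print(f"best-bet candidate for '{query}': {url} ({n} clicks)")
    ```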
21. Organizational Impact: Educational opportunities
    - SA is a way to "reverse engineer" how your site performs in order to:
      - Sensitize the organization to analytics, specifically related to findability
      - Sensitize content owners/authors to the benefits of good practices around content titling, tagging, and navigational placement
22. Organizational Impact: Rethinking how you do things
    - The Financial Times learns about breaking stories from its logs by monitoring spikes in company names and individuals' names and comparing them with current coverage
    - Discrepancy = possible breaking story; a reporter is assigned to follow up
    - Next step? Assign reporters to "beats" that emerge from SA
23. The Ideal SA Report, 1/2 (from Avi Rappoport)
    - # of searches for each week/month/quarter/year
    - Top 1% of queries (clustered by stem if possible)
    - Top 10% of no-match queries
    - Top 10% of low-match queries (one to four hits, or more depending on site size)
    - # of empty searches
    - Changes in these over the last week/month/quarter/year
    - Correlation of those changes with changes in the site, search engine, and company profile
24. The Ideal SA Report, 2/2 (from Avi Rappoport)
    - Queries showing significant increases
    - Patterns in less-frequent queries: names? places? web site addresses?
    - Top pages retrieved in search results and the queries that retrieved them
    - Queries that retrieved the best/most important pages
    - For search zones, create reports for each zone (this will have a significant impact on no-match data)
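    A few of the items above fall straight out of the parsed records from the earlier sketch. The snippet below is a hedged illustration that computes total searches, empty searches, the top 1% of distinct queries, and the most frequent zero-result queries; period filtering and trend comparison are left out.

    ```python
    from collections import Counter

    def basic_report(records):
        """Totals, empty searches, top 1% of queries, and top zero-result queries."""
        counts = Counter(rec["query"] for rec in records)
        no_match = Counter(rec["query"] for rec in records if rec["hits"] == 0)
        empty = sum(1 for rec in records if not rec["query"].strip())
        top_n = max(1, len(counts) // 100)       # top 1% of distinct queries
        print(f"total searches: {len(records)}")
        print(f"empty searches: {empty}")
        print("top queries:", counts.most_common(top_n))
        print("top zero-result queries:", no_match.most_common(10))
    ```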
25. SA as User Research Method: Sleeper, but no panacea
    - Benefits
      - Non-intrusive
      - Inexpensive and (usually) accessible
      - Large volume of "real" data
      - Represents actual usage patterns
    - Drawbacks
      - Provides an incomplete picture of usage: was the user satisfied at the session's end?
      - Difficult to analyze: where are the commercial tools?
    - Ultimately an excellent complement to qualitative methods (e.g., task analysis, field studies)
26. SA Headaches: What gets in the way?
    - Lack of time
    - Few useful tools for parsing logs and generating reports
    - Tension between those who want to perform SA and those who "own" the data (chiefly IT)
    - Ignorance of the method
    - The hard work and/or boredom of doing the analysis
    (From a summer 2007 survey, 134 responses)
27. Please Share Your SA Knowledge: Visit our "book in progress" site
    - Site URL:
    - Feed URL:
    - The site contains:
      - Reading list
      - Survey results
      - Perl script for parsing logs
      - Log samples
      - … and more
28. Contact Information
    - Louis Rosenfeld LLC
    - 902 Miller Avenue
    - Ann Arbor, Michigan 48103 USA
    - [email_address]
    - +1.734.302.3323 voice
    - +1.734.661.1655 fax