Search Analytics: Powerful diagnostics for your site


Presented by Lou Rosenfeld and Rich Wiggins to Seth Earley's Search Solutions Jumpstart Conference Series, November 3, 2006.

Published in: Technology, Education

  1. Search Analytics: Powerful diagnostics for your site
     Search Solutions Jumpstart Conference Call Series, November 3, 2006
     Louis Rosenfeld
  2. About Me
     • Information architecture (IA) consultant; formerly president, Argus Associates
     • Publisher and founder, Rosenfeld Media
     • Background in librarianship/information science; consult for Fortune 500s
     • Co-author, Information Architecture for the World Wide Web (3rd edition out this fall)
     • Co-founder, Information Architecture Institute and UXnet
  3. Anatomy of a Search Log (from Google Search Appliance)
     • Critical elements: IP address, time/date stamp, query, and # of results:
     • XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
     • XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
     • XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
     Full legend and more examples here:
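The four critical elements can be pulled out of a log line like the samples above with a short script. This is only a sketch: the names `LOG_PATTERN` and `parse_search_log_line` are made up for illustration, and it assumes the four trailing numbers are HTTP status, response size, number of results, and elapsed seconds (the slide confirms only that the number of results is present).

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout: ip - - [timestamp] "GET url protocol" status size results seconds
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) [^"]*" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<results>\d+) (?P<seconds>[\d.]+)$'
)


def parse_search_log_line(line):
    """Return the IP, timestamp, query, and result count, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # parse_qs decodes "+" and %XX escapes in the query string.
    params = parse_qs(urlsplit(match["url"]).query)
    return {
        "ip": match["ip"],
        "time": datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z"),
        "query": params.get("q", [""])[0],
        "results": int(match["results"]),
    }
```

Under that assumption, the first sample line is the misspelled query "lincense plate" with 0 results, and the second, two seconds later from the same IP address, is the corrected "license plate" with 146 results.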
  4. The Head, the Long Tail, and the Interesting Stuff in Between
     Sorting queries by frequency results in a Zipf distribution.
     Can we improve performance for the most popular queries?
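Sorting queries by frequency, as described above, is a small counting exercise. The sketch below assumes the queries have already been extracted from the log as plain strings; `rank_queries` and `head_share` are illustrative names, and lowercasing is an assumed normalization step.

```python
from collections import Counter


def rank_queries(queries):
    """Return (query, count) pairs, most frequent first, ignoring case and blanks."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return counts.most_common()


def head_share(ranked, top_n):
    """Fraction of all searches accounted for by the top_n most frequent queries."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    return sum(count for _, count in ranked[:top_n]) / total
```

A large `head_share` for a small `top_n` would suggest that tuning a short list of popular queries covers a big portion of search traffic.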
  5. SA as Diagnostic Tool: What can you fix or improve?
     • User Research
     • Interface Design: search entry interface, search results
     • Retrieval Algorithm Modification
     • Navigation Design
     • Metadata Development
     • Content Development
  6. User Research: What do they want?…
     • SA is a true expression of users’ information needs (often surprising: e.g., SKU numbers at LL Bean; URLs at IBM)
     • Provides context by displaying aspects of single search sessions
  7. User Research: …who wants it?…
     • What can you learn from knowing these things?
       • What specific segments want; determined by:
         • Security clearance
         • IP address
         • Job function
         • Account information
       • Which pages they initiate searches from
  8. User Research: …and when do they want it?
     • Time-based variation (and clustered queries)
     • By hour, by day, by season
     • Helps determine “best bets” and “guide” development
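Time-based variation can be explored by bucketing queries by hour. This is a minimal sketch, assuming each record is a `(datetime, query)` pair already parsed from the log; `queries_by_hour` is an illustrative name.

```python
from collections import Counter, defaultdict
from datetime import datetime


def queries_by_hour(records):
    """Map each hour of the day (0-23) to a Counter of the queries seen in it."""
    by_hour = defaultdict(Counter)
    for when, query in records:
        by_hour[when.hour][query.strip().lower()] += 1
    return by_hour
```

Swapping `when.hour` for `when.weekday()` or `when.month` would give the by-day or by-season view in the same way.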
  9. Search Entry Interface Design: “The Box” or something else?
     • SA identifies “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added (e.g., revise search, browsing alternative)
     • Syntax of queries informs selection of search features to expose (e.g., use of Boolean operators, fielded searching)
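Dead-end points such as 0 hits or 2000 hits can be flagged directly from the result counts. The sketch below assumes records are `(query, results)` pairs; the function names are illustrative, and the 2000-hit cutoff is simply the slide's example figure rather than a recommended value.

```python
def classify_result_count(results, too_many=2000):
    """Label a result count as a dead end or as 'ok'."""
    if results == 0:
        return "no results"
    if results >= too_many:
        return "too many results"
    return "ok"


def dead_end_queries(records, too_many=2000):
    """Group queries by the kind of dead end they hit."""
    dead_ends = {"no results": [], "too many results": []}
    for query, results in records:
        label = classify_result_count(results, too_many)
        if label != "ok":
            dead_ends[label].append(query)
    return dead_ends
```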
  10. Search Results Interface Design: Which results where?
     • #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
     From SLI Systems
  11. Search Results Interface Design: How to sort results?
     • Financial Times has found that users often include dates in their queries
     • Obvious but effective improvement: Allow users to sort by date
  12. Search System: What to change?
     • Identify new functionality: Financial Times added spell checking
     • Retrieval algorithm modifications:
       • Deloitte, Barnes & Noble use SA to demonstrate that basic improvements (e.g., Best Bets) are insufficient
       • Financial Times weights company names higher
  13. Navigation: Any improvements?
     • Michigan State University builds A-Z index automatically based on frequent queries
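One way to derive an A-Z index from frequent queries is sketched below. It is not Michigan State University's actual method, which the slide does not describe; `build_az_index` and the `min_count` cutoff are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_az_index(queries, min_count=2):
    """Group queries seen at least min_count times under their first letter."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    index = defaultdict(list)
    for query, count in counts.items():
        # Skip rare queries and those that do not start with a letter.
        if count >= min_count and query[0].isalpha():
            index[query[0].upper()].append(query)
    return {letter: sorted(terms) for letter, terms in sorted(index.items())}
```

In practice each index entry would still need a human-chosen target page; the counts only suggest which terms deserve an entry.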
  14. Navigation: Where does it fail?
     • Track and study pages (excluding main page) where search is initiated
       • Are there obvious issues that would cause a “dead end”?
       • Are there user studies that could test/validate problems on these pages?
     • Sandia Labs analyzes most requested documents to test content independent of site structure; results used to improve structure
  15. Metadata Development: How do users express their needs?
     • SA provides a sense of tone: how users’ needs are expressed
       • Jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms)
       • Length (e.g., number of terms/query)
       • Syntax (e.g., Boolean, natural language, keyword)
  16. Metadata Development: Which metadata values?
     • SA helps in the creation of controlled vocabularies
     • Terms are fodder for metadata values (e.g., “cell phone,” “JFK” vs. “John Kennedy,” “country music”), especially for determining preferred terms
     • Works with tools that cluster synonyms, enabling concept searching and thesaurus development
  17. Metadata Development: Which metadata attributes?
     • SA helps in the creation of vocabularies
     • Simple cluster analysis can detect metadata attributes (e.g., “product,” “person,” “topic”)
     • Look for variations between short head and long tail (Deloitte intranet: “known-item” queries are common; research topics are infrequent)
     (Chart labels: known-item queries, research queries)
  18. Content Development: Do we have the right content?
     • SA identifies content that can’t be found (0 results)
     • Does the content exist? If so, there are wording, metadata, or spidering problems
     • If not, why not?
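Listing the most frequent zero-result queries gives a starting point for the questions above. A minimal sketch, assuming records are `(query, results)` pairs; `top_zero_result_queries` is an illustrative name.

```python
from collections import Counter


def top_zero_result_queries(records, limit=10):
    """Return the most frequent queries that retrieved nothing."""
    counts = Counter(
        query.strip().lower()
        for query, results in records
        if results == 0 and query.strip()
    )
    return counts.most_common(limit)
```

Each query on the list still has to be checked by hand: either the content exists and is not being matched, or it is missing altogether.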
  19. Content Development: Are we featuring the right stuff?
     • Clickthrough tracking helps determine which results should rise to the top (example: SLI Systems)
     • Also suggests which “best bets” to develop to address common queries
  20. Organizational Impact: Educational opportunities
     • SA is a way to “reverse engineer” how your site performs in order to:
       • Sensitize organization to analytics, specifically related to findability
       • Sensitize content owners/authors to benefits of good practices around content titling, tagging, and navigational placement
  21. Organizational Impact: Rethinking how you do things
     • Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage
     • Discrepancy = possible breaking story; reporter is assigned to follow up
     • Next step? Assign reporters to “beats” that emerge from SA
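Spike monitoring of this kind can be approximated by comparing current query counts with a baseline period. The slide does not say how the Financial Times does it; the sketch below is one simple ratio test, and `find_spikes`, `min_ratio`, and `min_count` are assumptions for illustration.

```python
def find_spikes(baseline_counts, current_counts, min_ratio=3.0, min_count=5):
    """Return (term, current, baseline) for terms searched far more than usual."""
    spikes = []
    for term, current in current_counts.items():
        if current < min_count:
            continue  # too few searches to call it a spike
        baseline = baseline_counts.get(term, 0)
        # Add 1 to the baseline so previously unseen terms do not divide by zero.
        if current / (baseline + 1) >= min_ratio:
            spikes.append((term, current, baseline))
    return sorted(spikes, key=lambda spike: spike[1], reverse=True)
```

The flagged terms would then be compared with current coverage by a person, as the slide describes.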
  22. The Ideal SLA report 1/2 (Avi Rappoport)
     • # searches for each week/month/quarter/year
     • Top 1% of queries (cluster by stem if possible)
     • Top 10% of no-matches queries
     • Top 10% of low-matches queries? (one to four hits, or more depending on site size)
     • # empty searches
     • Changes in these over the last week/month/quarter/year
     • Changes’ correlation to changes in the site, search engine, company profile
  23. The Ideal SLA report 2/2 (Avi Rappoport)
     • Queries showing significant increases
     • Patterns in less-frequent queries: names? places? web site addresses?
     • Top pages retrieved in search results and the queries that retrieved them
     • Queries that retrieved the best/most important pages
     • For search zones, create reports for each zone (will have significant impact on no-matches data)
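Several of the counts in the two report slides above can be computed in one pass over the log. This sketch covers only the total searches, empty searches, top 1% of queries, and top 10% of no-match queries; it assumes records are a list of `(query, results)` pairs for one reporting period, and `top_fraction` and `summarize` are illustrative names.

```python
import math
from collections import Counter


def top_fraction(counter, fraction):
    """Return the most frequent entries making up the given fraction of distinct keys."""
    if not counter:
        return []
    keep = max(1, math.ceil(len(counter) * fraction))
    return counter.most_common(keep)


def summarize(records):
    """Build a small report from a list of (query, results) pairs."""
    all_queries = Counter()
    no_match = Counter()
    empty = 0
    for query, results in records:
        normalized = query.strip().lower()
        if not normalized:
            empty += 1  # search submitted with nothing in the box
            continue
        all_queries[normalized] += 1
        if results == 0:
            no_match[normalized] += 1
    return {
        "searches": len(records),
        "empty_searches": empty,
        "top_queries": top_fraction(all_queries, 0.01),
        "top_no_match": top_fraction(no_match, 0.10),
    }
```

Running `summarize` once per week, month, or quarter and comparing the outputs would give the change-over-time items on the list.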
  24. SA as User Research Method: Sleeper, but no panacea
     • Benefits
       • Non-intrusive
       • Inexpensive and (usually) accessible
       • Large volume of “real” data
       • Represents actual usage patterns
     • Drawbacks
       • Provides an incomplete picture of usage: was user satisfied at session’s end?
       • Difficult to analyze: where are the commercial tools?
     • Ultimately an excellent complement to qualitative methods (e.g., task analysis, field studies)
  25. SA Headaches: What gets in the way?
     • Lack of time
     • Few useful tools for parsing logs, generating reports
     • Tension between those who want to perform SA and those who “own” the data (chiefly IT)
     • Ignorance of the method
     • Hard work and/or boredom of doing analysis
     • From summer 2006 survey (134 responses)
  26. Please Share Your SA Knowledge: Visit our “book in progress” site
     • Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2007)
     • Site URL:
     • Feed URL:
     • Site contains:
       • Reading list
       • Survey results
       • Perl script for parsing logs
       • Log samples
       • … and more
  27. Contact Information
     • Louis Rosenfeld LLC
     • 902 Miller Avenue
     • Ann Arbor, Michigan 48103 USA
     • [email_address]
     • +1.734.302.3323 voice
     • +1.734.661.1655 fax