Search Analytics for Fun and Profit An Event Apart Chicago, Illinois August 27, 2007 Lou Rosenfeld www.rosenfeldmedia.com
Search Analytics for Fun and Profit

Lou Rosenfeld's presentation on local site search analytics; An Event Apart Chicago, August 27, 2007.

Published in: Economy & Finance, Technology
  • Search Analytics for Fun and Profit

    1. Search Analytics for Fun and Profit
       An Event Apart, Chicago, Illinois; August 27, 2007
       Lou Rosenfeld, www.rosenfeldmedia.com
    2. Who I Am
         • Information architecture consultant to Fortune 500s
         • Publisher and founder, Rosenfeld Media
         • Blog at www.louisrosenfeld.com
         • Co-author, Information Architecture for the World Wide Web (3rd ed., 2006; O’Reilly)
         • New book: Search Analytics for Your Site: Conversations with your customers (2008; Rosenfeld Media): www.rosenfeldmedia.com/books/searchanalytics
    3. Anatomy of a Search Log (from Google Search Appliance)
         • Critical elements (in pink on the slide): IP address, time/date stamp, query, and # of results:
         • XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
         • XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
         • XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
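As a rough illustration of pulling those four elements out of a log line, here is a minimal Python sketch. It assumes the layout of the sample lines on the slide, and it assumes the last two numeric fields are the result count and the elapsed time in seconds (the slide only identifies the result count). The names `LOG_LINE` and `parse_search_log_line` are hypothetical, not part of any Google Search Appliance tooling.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout: ip - - [timestamp] "METHOD url PROTOCOL" status bytes results elapsed
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<stamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<results>\d+) (?P<elapsed>[\d.]+)$'
)


def parse_search_log_line(line):
    """Return IP, timestamp, query, and result count, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # parse_qs decodes "+" and percent-escapes in the query string.
    params = parse_qs(urlsplit(match["url"]).query)
    return {
        "ip": match["ip"],
        "time": datetime.strptime(match["stamp"], "%d/%b/%Y:%H:%M:%S %z"),
        "query": params.get("q", [""])[0],
        "results": int(match["results"]),
    }
```

Applied to the first sample line, this would yield the misspelled query "lincense plate" with 0 results, which is exactly the kind of record the later slides suggest reviewing.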
    4. The Zipf Curve: Short Head, Middle Torso, Long Tail
    5. Keep It In Proportion
         • 7218 campus map
         • 5859 map
         • 5184 im west
         • 4320 library
         • 3745 study abroad
         • 3690 schedule of courses
         • 3584 bookstore
         • 3575 spartantrak
         • 3229 angel
         • 3204 cata
    6. What’s the Sweet Spot?

       Rank   Cumul. %   Count   Query
          1       1.40    7218   campus map
         14      10.53    2464   housing
         42      20.18    1351   webenroll
         98      30.01     650   computer center
        221      40.05     295   msu union
        500      50.02     124   hotels
       7877      80.00       7   department of surgery
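A table like the one above can be derived from raw query counts. The following is a minimal sketch, assuming the counts are already available as a dict mapping query to count; `cumulative_share` is a hypothetical helper name, and the toy numbers in the usage note are invented rather than taken from the MSU data.

```python
def cumulative_share(query_counts):
    """Rank queries by count and report the cumulative share of all searches.

    Returns a list of (rank, query, count, cumulative_percent) tuples.
    """
    ranked = sorted(query_counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ranked)
    rows = []
    running = 0
    for rank, (query, count) in enumerate(ranked, start=1):
        running += count
        rows.append((rank, query, count, round(100.0 * running / total, 2)))
    return rows
```

For example, `cumulative_share({"a": 50, "b": 30, "c": 20})` ranks "a" first with a cumulative share of 50.0, then "b" at 80.0, then "c" at 100.0. Reading down the cumulative column shows how few distinct queries are needed to cover a given share of all searches.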
    7. Topical Patterns and Seasonal Changes
    8. Where will you Capture Search Queries?
         • The search logs that your search engine naturally captures and maintains as searches take place
         • Search keywords or phrases that your users execute, that you capture into your own local database
         • Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, Visual Sciences, Ultraseek, Google Appliance, etc.)
    9. Querying your Queries: Getting started
         • What are the most frequent unique queries?
         • Are frequent queries retrieving quality results?
         • Click-through rates per frequent query?
         • Most frequently clicked result per query?
         • Which frequent queries retrieve zero results?
         • What are the referrer pages for frequent queries?
         • Which queries retrieve popular documents?
         • What interesting patterns emerge in general?
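Two of the starter questions above (most frequent unique queries, and frequent queries with zero results) can be answered with a few lines of Python. This is only a sketch: it assumes each search has already been reduced to a dict with hypothetical `query` and `results` keys, and the helper names are invented for illustration.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count as one query."""
    return " ".join(query.lower().split())


def top_queries(records, n=10):
    """Most frequent unique queries, as (query, count) pairs."""
    counts = Counter(normalize(r["query"]) for r in records)
    return counts.most_common(n)


def zero_result_queries(records, n=10):
    """Most frequent queries among searches that returned no results."""
    counts = Counter(normalize(r["query"]) for r in records if r["results"] == 0)
    return counts.most_common(n)
```

Normalizing before counting is a design choice: without it, "Campus Map" and "campus  map" would be reported as separate queries and the short head would look flatter than it is.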
    10. Tune your Questions: From generic to specific
         • Netflix asks
             • Which movies most frequently searched?
             • Which of them most frequently clicked through?
             • Which of them least frequently added to queue?
    11. Diagnose This: Fixing and improving the UX
         • User Research
         • Content Development
         • Interface Design: search entry interface, search results
         • Retrieval Algorithm Modification
         • Navigation Design
         • Metadata Development
    12. User Research: What do they want?…
         • SA is a true expression of users’ information needs (often surprising: e.g., SKU #s at clothing retailer; URLs at IBM)
         • Provides context by displaying aspects of single search sessions
    13. User Research: …what else do they want?…
         • BBC provides reports to determine other terms searched within same session (tracked by cookies)
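A "what else did they search for in the same session" report can be approximated as follows. This is a sketch under assumptions, not a description of the BBC's system: it assumes each search event is a `(session_id, query)` pair, where the session id might come from a cookie, and `related_queries` is a hypothetical name.

```python
from collections import Counter, defaultdict


def related_queries(events, target, n=5):
    """Count other queries issued in the same sessions as `target`.

    `events` is an iterable of (session_id, query) pairs.
    """
    sessions = defaultdict(set)
    for session_id, query in events:
        sessions[session_id].add(query.lower().strip())

    target = target.lower().strip()
    related = Counter()
    for queries in sessions.values():
        if target in queries:
            # Each co-occurring query counts once per session.
            related.update(queries - {target})
    return related.most_common(n)
```

With invented events where "news" and "weather" share two sessions and "news" and "sport" share one, `related_queries(events, "news")` would list "weather" ahead of "sport".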
    14. User Research: …who wants it?…
         • Specific segments’ needs as determined by:
             • Security clearance
             • IP address
             • Job function
             • Account information
             • Alternatively, you may be able to extrapolate segments directly from SA
         • Pages they initiate searches from
    15. User Research: …who wants it?…
         • BBC’s top queries report from children’s section of site
    16. User Research: …and when do they want it?
         • Time-based variation (and clustered queries) from MSU
         • By hour, by day, by season
         • Helps determine “best bets” development
         • Also can help tune main page and other editorial content
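Breaking queries down by hour, day, or season amounts to grouping on a time key. The sketch below assumes each search event is a `(datetime, query)` pair; `bucket_queries` and its `key` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_queries(events, key=lambda when: when.hour):
    """Group query counts by a time bucket.

    `events` is an iterable of (datetime, query) pairs. `key` maps a datetime
    to a bucket, e.g. the hour (default), `when.weekday()`, or `when.month`.
    """
    buckets = defaultdict(Counter)
    for when, query in events:
        buckets[key(when)][query.lower().strip()] += 1
    return buckets
```

For instance, `bucket_queries(events)[10].most_common(5)` would give the top five queries issued between 10:00 and 10:59, and passing `key=lambda when: when.month` would give a rough seasonal view.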
    17. Content Development: Do we have the right content?
         • Analyze 0 result queries
         • Does the content exist?
         • If so, there are titling, wording, metadata, or indexing problems
         • If not, why not?
        From www.behaviortracking.com
    18. Content Development: Are we featuring the right stuff?
         • Track clickthroughs to determine which results should rise to the top (example: SLI Systems)
         • Also suggests which “best bets” to develop to address common queries
         • BBC removes navigation pages from search results
    19. Search Entry Interface Design: “The Box” or something else?
         • Identify “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added
         • Query syntax helps you select search features to expose (e.g., use of Boolean operators)
    20. Search Results Interface Design: Which results where?
         • #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
        From SLI Systems (www.sli-systems.com)
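To check whether a similar position effect shows up in your own data, click-through share per result position can be tallied as below. This is a sketch that assumes you have logged the 1-based position of each clicked result; `clicks_by_position` is a hypothetical name, and nothing here reproduces SLI Systems' analysis.

```python
from collections import Counter


def clicks_by_position(clicks, page_size=10):
    """Share of first-page clicks that landed on each result position.

    `clicks` is an iterable of 1-based result positions; positions beyond
    the first page are ignored.
    """
    counts = Counter(pos for pos in clicks if 1 <= pos <= page_size)
    total = sum(counts.values())
    return {
        pos: (counts[pos] / total if total else 0.0)
        for pos in range(1, page_size + 1)
    }
```

Comparing the share at position 10 with the shares at positions 6 through 9 would show whether the last result on the page gets a similar bump on your site.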
    21. Search Results Interface Design: How to sort results?
         • Financial Times has found that users often include dates in their queries
         • Obvious but effective improvement: allow users to sort by date
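One way to check how often your own users include dates in their queries is a simple pattern match. The sketch below is an assumption-laden heuristic, not the Financial Times' method: it flags four-digit years, numeric dates, and English month names, and it will produce some false positives (for example the verb "may"). `DATE_HINT`, `has_date`, and `share_with_dates` are hypothetical names.

```python
import re

MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]
# Match full month names and their three-letter abbreviations as whole words.
MONTH_PATTERN = "|".join(f"{name}|{name[:3]}" for name in MONTHS)

DATE_HINT = re.compile(
    r"\b(?:19|20)\d{2}\b"                        # four-digit year
    r"|\b\d{1,2}[/-]\d{1,2}(?:[/-]\d{2,4})?\b"   # numeric dates such as 10/07 or 10-07-2006
    rf"|\b(?:{MONTH_PATTERN})\b",                # month names
    re.IGNORECASE,
)


def has_date(query):
    """True if the query appears to contain a date expression."""
    return DATE_HINT.search(query) is not None


def share_with_dates(queries):
    """Fraction of queries that appear to contain a date expression."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(has_date(q) for q in queries) / len(queries)
```

A noticeable share of date-bearing queries would support exposing a sort-by-date option, as the slide suggests.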
    22. Search System: What to change?
         • Add functionality: Financial Times added spell checking
         • Retrieval algorithm modifications
             • Financial Times weights company names higher
             • Netflix determines better weighting for unique terms and phrases
         • Deloitte, Barnes & Noble, Vanguard demonstrate that basic improvements (e.g., Best Bets) are insufficient (and justify increased $$$)
    23. Navigation: Any improvements?
         • Michigan State University builds A-Z index automatically based on frequent queries
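The idea of generating an A-Z index from frequent queries can be sketched in a few lines. This is not MSU's implementation, which the slides do not describe; it simply assumes a list of raw query strings and a hypothetical frequency threshold `min_count`, and leaves out the step of linking each entry to a destination page.

```python
from collections import Counter, defaultdict


def build_az_index(queries, min_count=2):
    """Group frequent queries under their first letter.

    Queries seen fewer than `min_count` times, or not starting with a
    letter, are left out.
    """
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    index = defaultdict(list)
    for term, count in counts.items():
        if count >= min_count and term[:1].isalpha():
            index[term[0].upper()].append(term)
    return {letter: sorted(terms) for letter, terms in sorted(index.items())}
```

In practice the threshold would be set by looking at where the short head ends, and a human would still need to review the resulting entries before publishing them.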
    24. Navigation: Where does it fail?
         • Track and study pages (excluding main page) where search is initiated
             • What do they search? (e.g., acronyms, jargon)
             • Are there other issues that would cause a “dead end”? (e.g., tagging and titling problems)
             • Are there user studies that could test/validate problems on these pages? (e.g., “Where did you want to go next?”)
    25. Metadata Development: How do searchers express their needs?
         • Tone and jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms)
         • Syntax (e.g., Boolean, natural language, keyword)
         • Length (e.g., number of terms/query; Long Tail queries longer and more complex than Short Head)
         • Everything we know from analyzing folksonomic tags applies here, and vice versa
    26. Metadata Development: Which values and attributes?
         • Uncover hierarchy and identify
             • Metadata values (e.g., mobile vs. cell)
             • Metadata attributes (e.g., genre, region)
             • Content types (e.g., spec, price sheet)
         • SA combines with AI tools for clustering, enabling concept searching and thesaurus development
    27. Metadata Development: Leveraging differences in the curve
         • Variations in information needs emerge between Short Head and Long Tail
         • Example: Deloitte intranet’s “known-item” queries are common; research topics are infrequent
        Figure labels: known-item queries; research queries
    28. Organizational Impact: Educational opportunities
         • “Reverse engineer” performance problems
             • Vanguard
                 • Tests “best” results for common queries
                 • Determines why these results aren’t retrieved or clicked-through
                 • Demonstrates problem and solutions to content owners/authors benefits
             • Sandia Labs does same, only with top results that are losing rank in search results pages
    29. Organizational Impact: Reexamining assumptions
         • Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage
         • Discrepancy = possible breaking story; reporter is assigned to follow up
         • Next step? Assign reporters to “beats” that emerge from SA
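The spike-versus-coverage comparison described above can be sketched as a simple threshold check. The slides do not say how the Financial Times detects spikes, so everything here is an assumption: `today` and `baseline` are dicts of query counts for the current period and a typical period, `covered` is the set of names already in current coverage, and `ratio` and `min_count` are hypothetical tuning knobs.

```python
def find_spikes(today, baseline, covered, ratio=3.0, min_count=5):
    """Return uncovered terms whose count jumped well above their usual level.

    A term is flagged when it was searched at least `min_count` times today
    and at least `ratio` times its baseline count (a baseline below 1 is
    treated as 1 so new terms can still be flagged).
    """
    spikes = []
    for term, count in today.items():
        usual = max(baseline.get(term, 0), 1)
        if count >= min_count and count >= ratio * usual and term not in covered:
            spikes.append(term)
    # Largest spikes first.
    return sorted(spikes, key=lambda term: today[term], reverse=True)
```

With invented counts where one company name jumps from 5 to 40 searches and is not in current coverage, that name would be the one handed to a reporter for follow-up.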
    30. SA as User Research Method: Sleeper, but no panacea
         • Benefits
             • Non-intrusive
             • Inexpensive and (usually) accessible
             • Large volume of “real” data
             • Represents actual usage patterns
         • Drawbacks
             • Provides an incomplete picture of usage: was user satisfied at session’s end?
             • Difficult to analyze: where are the commercial tools?
         • Complements qualitative methods (e.g., persona development, task analysis, field studies)
    31. SA Headaches: What gets in the way?
         • Problems*
             • Lack of time
             • Few useful tools for parsing logs, generating reports
             • Tension between those who want to perform SA and those who “own” the data (chiefly IT)
             • Ignorance of the method
             • Hard work and/or boredom of doing analysis
         • Most of these are going away…
         • * From summer 2006 survey (134 responses), available at book site.
    32. Please Share Your SA Knowledge: Visit our book in progress site
         • Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2008)
         • Site URL: www.rosenfeldmedia.com/books/searchanalytics/
         • Feed URL: feeds.rosenfeldmedia.com/searchanalytics/
    33. Contact Information
         • Louis Rosenfeld
         • Rosenfeld Media, LLC
         • 705 Carroll Street, #2L
         • Brooklyn, NY 11215 USA
         • +1.718.306.9396
         • [email_address]
         • www.louisrosenfeld.com
         • www.rosenfeldmedia.com
