Slideshow transcript
Slide 1: Using Search Analytics to Diagnose What’s Ailing your Information Architecture ASIS&T IA Summit Las Vegas, Nevada March 24, 2007 Rich Wiggins & Lou Rosenfeld www.rosenfeldmedia.com/books/searchanalytics 1 www.rosenfeldmedia.com
Slide 2: Trying to Fit into the IA Summit 2007 Theme… Rich Information, Rich Interaction, Rich Relationships, Rich Wiggins
Slide 3: Thesis. By analyzing search logs, you engage in a conversation with your customers. At best, it’s a two-way conversation: your users tell you what they seek; you tune your search engine (and your site) to give them what they seek the most. If you’re not analyzing your search logs, then you aren’t listening to your customers. Search is too important to leave in the hands of robots.
Slide 4: The Wonderful Things Search Engines Do. Help harness massive amounts of content: thousands, millions, billions of URLs. Cut across barriers: document structure; topical structure; institutional structure, silos.
Slide 5: The Horrible Things that Search Engines Do. Confuse low-value content with vital content. And point to obsolete content. And draft, internal, duplicative content. Rank leaf pages ahead of starting points. Rank popular or personal pages ahead of official content.
Slide 6: MSU Keywords: Accidental Thesaurus. Circa 1999, MSU’s local AltaVista stopped scaling. Search for “human resources” and you get a resume for a student in the HR program. We had to do something. We asked AltaVista for a way to goose the real HR site to the top of the hit list. They didn’t deliver. So we rolled our own Best Bets service and called it MSU Keywords. And it worked!
Slide 7: Methodology. Study the most popular unique searches. Map each to the appropriate URL: “human resources” -> hr.msu.edu; “campus map” -> www.msu.edu/maps. Watch the results: user complaints go down; so do content provider complaints. Continue to watch, learn, and act.
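The mapping step on this slide can be sketched as a small lookup table. This is only an illustration, not the MSU Keywords implementation: the names `BEST_BETS`, `normalize`, and `best_bet` are hypothetical, and the only data used are the two mappings shown on the slide.

```python
from typing import Optional

# Hypothetical Best Bets table: normalized query -> hand-picked URL.
# The two entries are the examples from the slide.
BEST_BETS = {
    "human resources": "hr.msu.edu",
    "campus map": "www.msu.edu/maps",
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(query.lower().split())


def best_bet(query: str) -> Optional[str]:
    """Return the editor-chosen URL for a query, or None if no bet exists."""
    return BEST_BETS.get(normalize(query))
```

A search front end could call `best_bet()` first and show the returned URL above the engine's regular results, falling back to the engine alone when it returns `None`.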
Slide 8: Google Has Trained ’Em to Search First
• Top 10 searches, www.msu.edu, Jan 2007
• “map” is a top search even with a map logo on the home page
• MSU Usability Center, testing 2006 redesign, ordered testers to stay away from the search box
• Nielsen 50% theory may underestimate
Count   Unique Query
7218    campus map
5859    map
5184    im west
4320    library
3745    study abroad
3690    schedule of courses
3584    bookstore
3575    spartantrak
3229    angel
3204    cata
Slide 9: The Zipf Curve: Short Head, Torso, and Long Tail
Slide 10: Keep It In Proportion
7218    campus map
5859    map
5184    im west
4320    library
3745    study abroad
3690    schedule of courses
3584    bookstore
3575    spartantrak
3229    angel
3204    cata
Slide 11: Find the Sweet Spot; Avoid Diminishing Returns
Rank    Cumulative Percent    Count    Query
1       1.40                  7218     campus map
14      10.53                 2464     housing
42      20.18                 1351     webenroll
98      30.01                 650      computer center
221     40.05                 295      msu union
500     50.02                 124      hotels
7877    80.00                 7        department of surgery
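A table like the one on this slide can be derived from per-query counts. The following is a minimal sketch, assuming the counts are already tallied in a `collections.Counter`; the function names `cumulative_share` and `rank_needed` are made up for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[int, str, int, float]  # (rank, query, count, cumulative percent)


def cumulative_share(counts: Counter) -> List[Row]:
    """Rank queries by frequency and attach the running share of all searches."""
    total = sum(counts.values())
    rows: List[Row] = []
    running = 0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        running += count
        rows.append((rank, query, count, round(100 * running / total, 2)))
    return rows


def rank_needed(rows: List[Row], target_percent: float) -> Optional[int]:
    """Smallest rank whose cumulative percent reaches the target, if any."""
    for rank, _query, _count, cumulative in rows:
        if cumulative >= target_percent:
            return rank
    return None
```

`rank_needed(rows, 50)` answers the "sweet spot" question directly: how many distinct queries would need attention to cover half of all searches.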
Slide 12: Look for Topical Patterns and Seasonal Changes
Slide 13: Does Best Bets Apply to Everyone? Walter Underwood, former chief architect of Ultraseek: Instead of Best Bets, Get a Better Search Engine. Best Bets requires human labor: commitment of time and attention … so do good search engine implementations.
Slide 14: We Didn’t Start the Fire; credit to: Vilfredo Pareto, circa 1890: “the law of the vital few” (simplified as the “80-20 rule”). George Kingsley Zipf, Harvard, circa 1932: counting the words used in Joyce’s Ulysses, “the” is more common than “no” or “Dublin”. Bradford’s Law of Scattering, circa 1934: a small number of journals accounts for a large percent of all important papers; cited, most importantly, by the pricing model of Elsevier for leading scientific journals. Many other best bet pioneers: Microsoft, Raytheon, BBC, ESPN, AOL.
Slide 15: Where Will You Capture Search Queries? (1) The search logs that your search engine naturally captures and maintains as searches take place. (2) Search keywords or phrases that your users execute and that you capture into your own local database. (3) Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, WebSideStory, Ultraseek, Google Appliance, etc.).
Slide 16: Anatomy of a Search Log (from Google Search Appliance). Critical elements (bold on the original slide): IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
Full legend and more examples available from book site.
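The four critical elements can be pulled out of an entry like those above with a regular expression plus the standard URL parser. This is a sketch under stated assumptions, not the book site's parsing script: the meaning of the trailing fields (status, bytes, number of results, elapsed seconds) is inferred from the samples, where the misspelled query shows 0 and the corrected one shows 146, and `LOG_PATTERN` and `parse_entry` are hypothetical names.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed layout, inferred from the sample entries:
# ip - - [timestamp] "GET url HTTP/x.y" status bytes results seconds
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<results>\d+) (?P<seconds>[\d.]+)$'
)


def parse_entry(line):
    """Return the IP, timestamp, query, and result count, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    params = parse_qs(urlsplit(match["url"]).query)
    return {
        "ip": match["ip"],
        "timestamp": match["timestamp"],
        "query": params.get("q", [""])[0],
        "results": int(match["results"]),
    }
```

Returning `None` for lines that do not match lets a caller skip non-search requests without raising.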
Slide 17: Sample Query Analysis Report. Excel template available from book site.
Slide 18: Querying your Queries: Some basic questions 1/2. 1. What are the most common unique queries? 2. Do any interesting patterns emerge from analyzing these common queries? 3. When common queries are searched, are the results the ones your users should be seeing? 4. Which common queries retrieve zero results? 5. Which common queries retrieve a large number of results, say 100 or more?
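Questions 1, 4, and 5 lend themselves to a short tally. The sketch below assumes each log record has already been reduced to a `(query, results)` pair; `summarize` is a hypothetical helper, and the threshold of 100 is the one suggested on the slide.

```python
from collections import Counter


def summarize(records, top_n=10, many=100):
    """Return (top queries, common zero-result queries, common large-result queries)."""
    counts = Counter()
    last_results = {}
    for query, results in records:
        key = " ".join(query.lower().split())  # fold case and extra spaces
        counts[key] += 1
        last_results[key] = results  # keep the most recent result count seen
    top = counts.most_common(top_n)
    zero = [q for q, _ in top if last_results[q] == 0]
    large = [q for q, _ in top if last_results[q] >= many]
    return top, zero, large
```

Keeping only the most recent result count per query is a simplification; a fuller report might average the counts or flag queries whose counts change over time.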
Slide 19: Querying your Queries: Some basic questions 2/2. 1. Which common queries retrieve results that don’t get clicked through? 2. What page is the top source (referrer) per common query? 3. What is the number of click-throughs per common query? 4. Which result is most frequently clicked through per common query? 5. What’s the average query length (number of terms, number of characters)? 6. Which URLs are users searching for?
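Questions 5 and 6 need only the query strings. The following is a rough sketch: `query_length_stats` and `url_queries` are hypothetical names, and the `URL_LIKE` pattern is a deliberately crude assumption about what a URL typed into a search box looks like.

```python
import re

# Crude heuristic (an assumption): starts with http(s):// or www.,
# or contains a common top-level domain.
URL_LIKE = re.compile(r"^(https?://|www\.)|\.(com|edu|org|net|gov)\b", re.IGNORECASE)


def query_length_stats(queries):
    """Return (average number of terms, average number of characters)."""
    if not queries:
        return (0.0, 0.0)
    terms = sum(len(q.split()) for q in queries)
    chars = sum(len(q.strip()) for q in queries)
    return (terms / len(queries), chars / len(queries))


def url_queries(queries):
    """Return the queries that look like URLs rather than keywords."""
    return [q for q in queries if URL_LIKE.search(q.strip())]
```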
Slide 20: Tune your Questions: Broad to specific. Netflix asks: 1. Which movies are most frequently searched? 2. Which of them are most frequently clicked through? 3. Which of them are least frequently added to queue (and why)? Examples: “OO7” versus “007”; porn-related (not carried by Netflix); “yoga”: not stocking enough? or not indexing enough record content?
Slide 21: SA as Diagnostic Tool: What can you fix or improve? User Research; Interface Design (search entry interface, search results); Retrieval Algorithm Modification; Navigation Design; Metadata Development; Content Development.
Slide 22: User Research: What do they want?… SA is a true expression of users’ information needs (often surprising: e.g., SKU numbers at LL Bean; URLs at IBM). Provides context by displaying aspects of single search sessions.
Slide 23: User Research: …who wants it?… What can you learn from knowing these things? What specific segments want, determined by: security clearance; IP address; job function; account information; which pages they initiate searches from.
Slide 24: User Research: …and when do they want it? Time-based variation (and clustered queries) • By hour, by day, by season • Helps determine “best bets” and “guide” development
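Hourly variation can be tallied straight from log timestamps. This sketch assumes timestamps in the `10/Jul/2006:10:25:46 -0800` form seen in the Google Search Appliance sample, and records reduced to `(timestamp, query)` pairs; `queries_by_hour` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def queries_by_hour(records):
    """Group query counts by hour of day (0-23) from (timestamp, query) pairs."""
    by_hour = defaultdict(Counter)
    for timestamp, query in records:
        # "10/Jul/2006:10:25:46 -0800" -> the hour is the second ':'-separated field.
        hour = int(timestamp.split(":")[1])
        by_hour[hour][query] += 1
    return by_hour
```

The same grouping works by day or by season if the key is taken from the date part of the timestamp instead of the hour.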
Slide 25: Search Entry Interface Design: “The Box” or something else? SA identifies “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added (e.g., revise search, browsing alternative). Syntax of queries informs selection of search features to expose (e.g., use of Boolean operators, fielded searching).
Slide 26: Search Results Interface Design: Which results where? The #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page). From SLI Systems (www.sli-systems.com).
Slide 27: Search Results Interface Design: How to sort results? Financial Times has found that users often include dates in their queries. Obvious but effective improvement: allow users to sort by date.
Slide 28: Search System: What to change? Identify new functionality: Financial Times added spell checking. Retrieval algorithm modifications: Deloitte and Barnes & Noble use SA to demonstrate that basic improvements (e.g., Best Bets) are insufficient; Financial Times weights company names higher.
Slide 29: Navigation: Any improvements? Michigan State University builds A-Z index automatically based on frequent queries.
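One way such an index could be generated is to keep the frequent queries and group them by first letter. The slide does not describe how MSU actually does it, so this is purely an illustrative sketch; `build_az_index` and its `min_count` threshold are assumptions.

```python
from collections import Counter, defaultdict


def build_az_index(counts, min_count=100):
    """Group frequent queries under their first letter, sorted A to Z."""
    index = defaultdict(list)
    for query, count in counts.items():
        # Skip rare queries and anything that does not start with a letter.
        if count >= min_count and query[:1].isalpha():
            index[query[0].upper()].append(query)
    return {letter: sorted(entries) for letter, entries in sorted(index.items())}
```

In practice each entry would also need a target URL, which a Best Bets mapping could supply.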
Slide 30: Navigation: Where does it fail? Track and study pages (excluding main page) where search is initiated. Are there obvious issues that would cause a “dead end”? Are there user studies that could test/validate problems on these pages? Sandia Labs analyzes most requested documents to test content independent of site structure; results used to improve structure.
Slide 31: Metadata Development: How do users express their needs? SA provides a sense of tone: how users’ needs are expressed. Jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms). Length (e.g., number of terms/query). Syntax (e.g., Boolean, natural language, keyword).
Slide 32: Metadata Development: Which metadata values? SA helps in the creation of controlled vocabularies • Terms are fodder for metadata values (e.g., “cell phone,” “JFK” vs. “John Kennedy,” “country music”), especially for determining preferred terms • Works with tools that cluster synonyms (example from www.behaviortracking.com), enabling concept searching and thesaurus development
Slide 33: Metadata Development: Which metadata attributes? SA helps in the creation of vocabularies • Simple cluster analysis can detect metadata attributes (e.g., “product,” “person,” “topic”) • Look for variations between short head and long tail (Deloitte intranet: “known-item” queries are common; research topic queries are infrequent)
Slide 34: Content Development: Do we have the right content? SA identifies content that can’t be found (0 results). Does the content exist? If so, there are wording, metadata, or spidering problems. If not, why not? (www.behaviortracking.com)
Slide 35: Content Development: Are we featuring the right stuff? Clickthrough tracking helps determine which results should rise to the top (example: SLI Systems). Also suggests which “best bets” to develop to address common queries.
Slide 36: Organizational Impact: Educational opportunities. SA is a way to “reverse engineer” how your site performs in order to: sensitize the organization to analytics, specifically related to findability; sensitize content owners/authors to the benefits of good practices around content titling, tagging, and navigational placement.
Slide 37: Organizational Impact: Rethinking how you do things. Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage. Discrepancy = possible breaking story; reporter is assigned to follow up. Next step? Assign reporters to “beats” that emerge from SA.
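The spike-versus-coverage comparison can be sketched as follows. Nothing here reflects how the Financial Times actually implements it: `find_spikes`, the `ratio` and `min_count` thresholds, and the idea of a per-term baseline count are all assumptions for illustration.

```python
def find_spikes(today, baseline, covered, ratio=3.0, min_count=5):
    """Return terms searched far more than usual that current coverage lacks.

    today / baseline: dicts mapping a term to its search count.
    covered: set of terms already present in current coverage.
    """
    spikes = []
    for term, count in today.items():
        usual = max(baseline.get(term, 0), 1)  # avoid a zero baseline
        if count >= min_count and count >= ratio * usual and term not in covered:
            spikes.append(term)
    return sorted(spikes)
```

Each returned term is a discrepancy in the slide's sense: heavy search interest with no matching coverage.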
Slide 38: SA as User Research Method: Sleeper, but no panacea. Benefits: non-intrusive; inexpensive and (usually) accessible; large volume of “real” data; represents actual usage patterns. Drawbacks: provides an incomplete picture of usage (was the user satisfied at session’s end?); difficult to analyze (where are the commercial tools?). Ultimately an excellent complement to qualitative methods (e.g., task analysis, field studies).
Slide 39: SA Headaches: What gets in the way? Lack of time. Few useful tools for parsing logs, generating reports. Tension between those who want to perform SA and those who “own” the data (chiefly IT). Ignorance of the method. Hard work and/or boredom of doing analysis. From summer 2006 survey (134 responses), available at book site.
Slide 40: Please Share Your SA Knowledge: Visit our “book in progress” site. Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2007). Site URL: www.rosenfeldmedia.com/books/searchanalytics/ Feed URL: feeds.rosenfeldmedia.com/searchanalytics/ Site contains: • Reading list • Survey results • Perl script for parsing logs • Log samples • Report templates • …and more
Slide 41: Contact Information. Rich Wiggins: wiggins@msu.edu. Louis Rosenfeld: lou@louisrosenfeld.com. http://rosenfeldmedia.com/books/searchanalytics

