Slideshow transcript
Slide 1: Search Analytics for Fun and Profit. An Event Apart, Chicago, Illinois, August 27, 2007. Lou Rosenfeld, www.rosenfeldmedia.com
Slide 2: Who I Am. Information architecture consultant to Fortune 500s; publisher and founder, Rosenfeld Media; blog at www.louisrosenfeld.com; co-author, Information Architecture for the World Wide Web (3rd ed., 2006; O’Reilly); new book: Search Analytics for Your Site: Conversations with your customers (2008; Rosenfeld Media): www.rosenfeldmedia.com/books/searchanalytics
Slide 3: Anatomy of a Search Log (from Google Search Appliance). Critical elements (highlighted in pink on the slide): IP address, time/date stamp, query, and number of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
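Pulling the four critical elements out of lines like these takes only a regular expression and the standard library. This is a minimal sketch, assuming (from the sample lines, not from Google documentation) that the fields after the quoted request are HTTP status, response bytes, result count, and elapsed seconds; `parse_line` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed field order, inferred from the sample lines: IP, "- -", [timestamp],
# quoted GET request, HTTP status, response bytes, result count, seconds.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)


def parse_line(line):
    """Return (ip, timestamp, query, result_count), or None if the line doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    params = parse_qs(urlsplit(m.group("url")).query)
    # parse_qs decodes "+" as a space, so q=license+plate becomes "license plate"
    query = params.get("q", [""])[0]
    return m.group("ip"), m.group("ts"), query, int(m.group("results"))
```

Applied to the first two sample lines, this shows "lincense plate" returning 0 results, followed two seconds later by "license plate" (with spell=1 in the request) returning 146.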
Slide 4: The Zipf Curve: Short Head, Middle Torso, Long Tail
Slide 5: Keep It In Proportion. 7218 campus map; 5859 map; 5184 im west; 4320 library; 3745 study abroad; 3690 schedule of courses; 3584 bookstore; 3575 spartantrak; 3229 angel; 3204 cata
Slide 6: What’s the Sweet Spot?
Rank   Cumul. %   Count   Query
1      1.40       7218    campus map
14     10.53      2464    housing
42     20.18      1351    webenroll
98     30.01      650     computer center
221    40.05      295     msu union
500    50.02      124     hotels
7877   80.00      7       department of surgery
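The table shows the top 500 unique queries covering about half of all searches. A rank/cumulative-share table like it can be recomputed from raw query strings; this sketch assumes the queries have already been extracted from the log into a list, and `cumulative_table` and `rank_covering` are hypothetical helper names.

```python
from collections import Counter


def cumulative_table(queries):
    """Rows of (rank, cumulative %, count, query), most frequent query first."""
    counts = Counter(queries)
    total = sum(counts.values())
    running = 0
    rows = []
    # most_common() with no argument returns every query, highest count first
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        running += count
        rows.append((rank, round(100 * running / total, 2), count, query))
    return rows


def rank_covering(rows, pct):
    """Smallest rank whose cumulative share reaches pct (e.g. 50.0), or None."""
    for rank, cumulative, _, _ in rows:
        if cumulative >= pct:
            return rank
    return None
```

`rank_covering(rows, 50.0)` answers the slide's question directly: how many distinct queries you must attend to in order to cover half of all search activity.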
Slide 7: Topical Patterns and Seasonal Changes
Slide 8: Where will you Capture Search Queries? 1. The search logs that your search engine naturally captures and maintains as searches take place 2. Search keywords or phrases that your users execute, which you capture into your own local database 3. Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, Visual Sciences, Ultraseek, Google Appliance, etc.)
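Option 2, capturing queries into your own database, might look like the following sketch using Python's built-in sqlite3 module. The `search_log` table, its columns, and the lowercase/whitespace normalization are assumptions for illustration, not a schema used by any product named on the slide.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) a hypothetical search_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_log ("
        " ts TEXT NOT NULL,"             # ISO 8601 UTC timestamp
        " query TEXT NOT NULL,"
        " result_count INTEGER NOT NULL,"
        " referrer TEXT)"                # page the search was launched from
    )
    return conn


def record_search(conn, query, result_count, referrer=None, ts=None):
    """Store one search; call this from the code that runs the search."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    # Normalize case and whitespace so "Campus  Map" and "campus map" count together
    normalized = " ".join(query.lower().split())
    conn.execute(
        "INSERT INTO search_log (ts, query, result_count, referrer) VALUES (?, ?, ?, ?)",
        (ts, normalized, result_count, referrer),
    )
    conn.commit()
```

Keeping the result count and referrer alongside the query makes the questions on the next slide answerable with plain SQL.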
Slide 9: Querying your Queries: Getting started. 1. What are the most frequent unique queries? 2. Are frequent queries retrieving quality results? 3. What are the click-through rates per frequent query? 4. What is the most frequently clicked result per query? 5. Which frequent queries retrieve zero results? 6. What are the referrer pages for frequent queries? 7. Which queries retrieve popular documents? 8. What interesting patterns emerge in general?
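Several of these questions reduce to counting. A minimal sketch for questions 1 and 5, plus the click-through question, assuming each search has been reduced to a hypothetical `(query, result_count, clicked)` tuple:

```python
from collections import Counter


def top_queries(records, n=10):
    """Question 1: the n most frequent unique queries."""
    return Counter(query for query, _, _ in records).most_common(n)


def click_through_rates(records):
    """Share of each query's searches that led to at least one click."""
    searches, clicks = Counter(), Counter()
    for query, _, clicked in records:
        searches[query] += 1
        clicks[query] += int(clicked)
    return {query: clicks[query] / searches[query] for query in searches}


def zero_result_queries(records, n=10):
    """Question 5: the most frequent queries that retrieved nothing."""
    return Counter(query for query, hits, _ in records if hits == 0).most_common(n)
```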
Slide 10: Tune your Questions: From generic to specific. Netflix asks: 1. Which movies are most frequently searched? 2. Which of them are most frequently clicked through? 3. Which of them are least frequently added to the queue?
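Netflix's questions form a funnel: searched, then clicked, then queued. This sketch assumes a hypothetical event stream of `(title, action)` pairs where action is "search", "click", or "queue"; it illustrates the funnel idea and is not Netflix's actual reporting.

```python
from collections import Counter, defaultdict


def funnel(events):
    """Per title: searches, clicks, queue additions, and queue additions per click."""
    counts = defaultdict(Counter)
    for title, action in events:  # action is "search", "click", or "queue"
        counts[title][action] += 1
    report = {}
    for title, c in counts.items():
        clicks = c["click"]
        report[title] = {
            "searched": c["search"],
            "clicked": clicks,
            "queued": c["queue"],
            "queue_rate": c["queue"] / clicks if clicks else 0.0,
        }
    return report


def least_queued(events, n=3):
    """Question 3: clicked-through titles with the lowest queue additions per click."""
    report = funnel(events)
    clicked = [title for title in report if report[title]["clicked"]]
    return sorted(clicked, key=lambda title: report[title]["queue_rate"])[:n]
```

Titles that are searched and clicked but rarely queued are the interesting ones: users found what they asked for and still didn't want it.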
Slide 11: Diagnose This: Fixing and improving the UX. 1. User Research 2. Content Development 3. Interface Design: search entry interface, search results 4. Retrieval Algorithm Modification 5. Navigation Design 6. Metadata Development
Slide 12: User Research: What do they want?… Search analytics (SA) is a true expression of users’ information needs (often surprising: e.g., SKU numbers at a clothing retailer; URLs at IBM). It provides context by displaying aspects of single search sessions.
Slide 13: User Research: …what else do they want?… BBC provides reports to determine other terms searched within the same session (tracked by cookies).
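A basic version of this "what else did they search for" report needs only a session identifier per search. This sketch assumes hypothetical `(session_id, query)` pairs, with the session ID taken from a cookie as the slide describes; it is not the BBC's implementation.

```python
from collections import Counter, defaultdict


def also_searched(records, target):
    """Other queries seen in the same sessions as `target`, most common first."""
    sessions = defaultdict(set)
    for session_id, query in records:
        sessions[session_id].add(query)
    related = Counter()
    for queries in sessions.values():
        if target in queries:
            # Each session counts once per related query, however often it repeats
            related.update(queries - {target})
    return related.most_common()
```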
Slide 14: User Research: …who wants it?… Specific segments’ needs as determined by security clearance, IP address, job function, or account information. Alternatively, you may be able to extrapolate segments directly from SA, e.g., from the pages where users initiate searches.
Slide 15: User Research: …who wants it?… BBC’s top queries report from the children’s section of the site
Slide 16: User Research: …and when do they want it? Time-based variation (and clustered queries) from MSU, by hour, by day, and by season. Helps determine “best bets” development; can also help tune the main page and other editorial content.
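Time-based variation for a single query can be tabulated by bucketing timestamps. A minimal sketch, assuming searches are available as hypothetical `(query, datetime)` pairs; month is used here only as a rough stand-in for season.

```python
from collections import Counter
from datetime import datetime

# How to bucket a timestamp; "month" is only a rough stand-in for season.
BUCKETS = {
    "hour": lambda ts: ts.hour,          # 0-23
    "weekday": lambda ts: ts.weekday(),  # 0 = Monday
    "month": lambda ts: ts.month,        # 1-12
}


def volume_by(records, query, unit="hour"):
    """Count searches for `query` per bucket; records are (query, datetime) pairs."""
    bucket = BUCKETS[unit]
    return Counter(bucket(ts) for q, ts in records if q == query)


# Hypothetical sample: 10 July 2006 was a Monday.
SAMPLE = [
    ("schedule of courses", datetime(2006, 7, 10, 9, 5)),
    ("schedule of courses", datetime(2006, 7, 10, 9, 40)),
    ("schedule of courses", datetime(2006, 7, 11, 14, 0)),
    ("campus map", datetime(2006, 7, 10, 9, 15)),
]
```

For example, `volume_by(SAMPLE, "schedule of courses")` gives two searches at 9:00 and one at 14:00, the kind of peak that could justify a time-sensitive best bet.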
Slide 17: Content Development: Do we have the right content? Analyze 0-result queries. Does the content exist? If so, there are titling, wording, metadata, or indexing problems. If not, why not? From
Slide 18: Content Development: Are we featuring the right stuff? Track clickthroughs to determine which results should rise to the top (example: SLI Systems). This also suggests which “best bets” to develop to address common queries. BBC removes navigation pages from search.
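One simple way to surface best-bet candidates from clickthroughs is to flag queries whose clicks concentrate on a single URL. This sketch assumes hypothetical `(query, clicked_url)` pairs, and the 0.5 concentration threshold is an arbitrary assumption; it is not SLI Systems' method.

```python
from collections import Counter, defaultdict


def best_bet_candidates(clicks, min_share=0.5):
    """Map each query to its most-clicked URL when that URL draws at least
    `min_share` of the query's clicks; clicks are (query, url) pairs."""
    by_query = defaultdict(Counter)
    for query, url in clicks:
        by_query[query][url] += 1
    candidates = {}
    for query, urls in by_query.items():
        url, n = urls.most_common(1)[0]
        share = n / sum(urls.values())
        if share >= min_share:
            candidates[query] = (url, round(share, 2))
    return candidates
```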
Slide 19: Search Entry Interface Design: “The Box” or something else? Identify “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added. Query syntax helps you select search features to expose (e.g., use of Boolean operators).
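Both checks on this slide can be automated. In this sketch the dead-end thresholds come from the slide's examples, while the pattern treating uppercase AND/OR/NOT, +/- word prefixes, and quoted phrases as "advanced syntax" is an assumption; adjust it to what your engine actually supports.

```python
import re

# Dead-end thresholds taken from the slide's examples; tune them per site.
TOO_FEW, TOO_MANY = 0, 2000
# Uppercase Boolean operators, +/- prefixes on a word, or a quoted phrase.
ADVANCED = re.compile(r'\b(?:AND|OR|NOT)\b|(?:^|\s)[+-]\w|"')


def dead_end(hits):
    """Label a search by its result count."""
    if hits <= TOO_FEW:
        return "no results"
    if hits >= TOO_MANY:
        return "too many results"
    return "ok"


def uses_advanced_syntax(query):
    return ADVANCED.search(query) is not None


def advanced_share(queries):
    """Fraction of queries using any advanced syntax."""
    return sum(map(uses_advanced_syntax, queries)) / len(queries) if queries else 0.0
```

If `advanced_share` stays near zero, exposing Boolean controls in the search box is probably not worth the clutter.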
Slide 20: Search Results Interface Design: Which results where? The #10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page). From SLI Systems (www.sli-systems.com)
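A click distribution by result position makes anomalies like the #10 bump visible. This sketch assumes you log the 1-based position of each clicked result on the first page; it only counts clicks and says nothing about why they land where they do.

```python
from collections import Counter


def clicks_by_position(click_positions, page_size=10):
    """Fraction of first-page clicks landing on each position 1..page_size."""
    # Positions beyond the first page (or invalid ones) are ignored
    counts = Counter(p for p in click_positions if 1 <= p <= page_size)
    total = sum(counts.values())
    return {p: counts[p] / total if total else 0.0 for p in range(1, page_size + 1)}
```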
Slide 21: Search Results Interface Design: How to sort results? Financial Times has found that users often include dates in their queries. Obvious but effective improvement: allow users to sort by date.
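Measuring how often queries contain dates is a quick way to test whether a sort-by-date option would pay off on your own site. This sketch treats a four-digit year (19xx/20xx) or an English month name as a "date"; that heuristic is an assumption, and "may" in particular will produce false positives.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|september|"
          "october|november|december|jan|feb|mar|apr|jun|jul|aug|sep|sept|oct|nov|dec")
# A year such as 2006, or a month name as a whole word (case-insensitive)
DATE_HINT = re.compile(rf"\b(?:19|20)\d{{2}}\b|\b(?:{MONTHS})\b", re.IGNORECASE)


def mentions_date(query):
    return DATE_HINT.search(query) is not None


def date_query_share(queries):
    """Fraction of queries that appear to mention a date."""
    return sum(map(mentions_date, queries)) / len(queries) if queries else 0.0
```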
Slide 22: Search System: What to change? Add functionality: Financial Times added spell checking. Retrieval algorithm modifications: Financial Times weights company names higher; Netflix determines better weighting for unique terms and phrases. Deloitte, Barnes & Noble, and Vanguard demonstrate that basic improvements (e.g., Best Bets) are insufficient (and justify increased $$$).
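A bare-bones "did you mean" can be built from the standard library's `difflib.get_close_matches`, checked against a vocabulary of terms from your index. This is a swapped-in technique for illustration, not how the Financial Times implemented spell checking; the vocabulary and the 0.8 cutoff are assumptions.

```python
import difflib


def suggest(query, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary word, if close enough.
    Returns the suggested query, or None when nothing would change."""
    words = []
    for word in query.lower().split():
        if word in vocabulary:
            words.append(word)
            continue
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        words.append(match[0] if match else word)
    suggestion = " ".join(words)
    return suggestion if suggestion != query.lower() else None
```

With a vocabulary containing "license" and "plate", `suggest("lincense plate", vocabulary)` returns "license plate", the same correction visible in the Slide 3 log.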
Slide 23: Navigation: Any improvements? Michigan State University builds its A-Z index automatically based on frequent queries.
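Generating an A-Z index from frequent queries comes down to filtering by frequency and grouping by first letter. A minimal sketch; the `min_count` cutoff is a hypothetical parameter, and a real index would still need human review to map each term to a destination page.

```python
from collections import Counter, defaultdict


def az_index(queries, min_count=2):
    """Group queries seen at least `min_count` times under their first letter."""
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    index = defaultdict(list)
    for query, n in counts.items():
        # Skip rare queries and ones that don't start with a letter
        if n >= min_count and query[:1].isalpha():
            index[query[0].upper()].append(query)
    return {letter: sorted(terms) for letter, terms in sorted(index.items())}
```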
Slide 24: Navigation: Where does it fail? Track and study pages (excluding the main page) where search is initiated. What do users search for there (e.g., acronyms, jargon)? Are there other issues that would cause a “dead end” (e.g., tagging and titling problems)? Are there user studies that could test/validate problems on these pages (e.g., “Where did you want to go next?”)?
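Finding the pages where navigation fails means grouping searches by the page they were launched from. This sketch assumes hypothetical `(referrer_path, query)` pairs, and the default home-page paths are placeholders to replace with your own.

```python
from collections import Counter, defaultdict


def search_origins(records, home_paths=("/", "/index.html")):
    """Rank non-home pages by searches launched from them, with each page's top queries."""
    per_page = defaultdict(Counter)
    for referrer, query in records:
        # Skip the main page and searches with no referrer
        if referrer and referrer not in home_paths:
            per_page[referrer][query] += 1
    ranked = sorted(per_page.items(), key=lambda item: sum(item[1].values()), reverse=True)
    return [(page, sum(c.values()), c.most_common(3)) for page, c in ranked]
```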
Slide 25: Metadata Development: How do searchers express their needs? Tone and jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms); syntax (e.g., Boolean, natural language, keyword); length (e.g., number of terms per query; Long Tail queries are longer and more complex than Short Head queries). Everything we know from analyzing folksonomic tags applies here, and vice versa.
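The claim that Long Tail queries run longer than Short Head queries can be checked against your own logs. In this sketch the Short Head is assumed to be the top `head_size` unique queries by frequency, a hypothetical cutoff; each unique query is counted once.

```python
from collections import Counter
from statistics import mean


def avg_terms(queries):
    """Mean number of whitespace-separated terms per query."""
    return mean(len(q.split()) for q in queries) if queries else 0.0


def length_by_segment(queries, head_size=100):
    """Average terms per unique query: Short Head (top `head_size`) vs. the rest."""
    ranked = [q for q, _ in Counter(queries).most_common()]
    return avg_terms(ranked[:head_size]), avg_terms(ranked[head_size:])
```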
Slide 26: Metadata Development: Which values and attributes? Uncover hierarchy and identify metadata values (e.g., mobile vs. cell), metadata attributes (e.g., genre, region), and content types (e.g., spec, price sheet). SA combines with AI tools for clustering, enabling concept searching and thesaurus development.
Slide 27: Metadata Development: Leveraging differences in the curve. Variations in information needs emerge between the Short Head and the Long Tail. Example: on Deloitte’s intranet, “known-item” queries are common, while research topics are infrequent.
Slide 28: Organizational Impact: Educational opportunities. “Reverse engineer” performance problems. Vanguard tests the “best” results for common queries, determines why these results aren’t retrieved or clicked through, and demonstrates problems, solutions, and benefits to content owners/authors. Sandia Labs does the same, only with top results that are losing rank in search results pages.
Slide 29: Organizational Impact: Reexamining assumptions. Financial Times learns about breaking stories from its logs by monitoring spikes in company names and individuals’ names and comparing them with its current coverage. Discrepancy = possible breaking story; a reporter is assigned to follow up. Next step? Assign reporters to “beats” that emerge from SA.
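Spike monitoring of this kind can start as a simple comparison of today's count against a trailing baseline. This sketch flags names whose count exceeds the historical mean by `threshold` standard deviations; the statistic, the thresholds, and the company names are assumptions, not the Financial Times' method, and comparing flagged names with current coverage is left to editors.

```python
from statistics import mean, pstdev


def spikes(daily_counts, today, min_history=7, threshold=3.0):
    """daily_counts: name -> list of past daily search counts; today: name -> count.
    Returns (name, today's count, historical mean) for spiking names, biggest first."""
    flagged = []
    for name, history in daily_counts.items():
        if len(history) < min_history:
            continue
        mu, sigma = mean(history), pstdev(history)
        count = today.get(name, 0)
        # Floor sigma at 1 so a perfectly flat history doesn't flag tiny bumps
        if count > mu + threshold * max(sigma, 1.0):
            flagged.append((name, count, round(mu, 1)))
    return sorted(flagged, key=lambda f: f[1], reverse=True)
```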
Slide 30: SA as User Research Method: Sleeper, but no panacea. Benefits: non-intrusive; inexpensive and (usually) accessible; large volume of “real” data; represents actual usage patterns. Drawbacks: provides an incomplete picture of usage (was the user satisfied at the session’s end?); difficult to analyze (where are the commercial tools?). Complements qualitative methods (e.g., persona development, task analysis, field studies).
Slide 31: SA Headaches: What gets in the way? Problems*: lack of time; few useful tools for parsing logs and generating reports; tension between those who want to perform SA and those who “own” the data (chiefly IT); ignorance of the method; hard work and/or boredom of doing analysis. Most of these are going away… *From summer 2006 survey (134 responses), available at book site.
Slide 32: Please Share Your SA Knowledge: Visit our book-in-progress site. Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2008). Site URL: www.rosenfeldmedia.com/books/searchanalytics/ Feed URL: feeds.rosenfeldmedia.com/searchanalytics/
Slide 33: Contact Information. Louis Rosenfeld, Rosenfeld Media, LLC, 705 Carroll Street, #2L, Brooklyn, NY 11215 USA; +1.718.306.9396; lou@louisrosenfeld.com; www.louisrosenfeld.com; www.rosenfeldmedia.com


