Workshop presented at Webdagene 2013 (http://webdagene.no/en/) September 9, 2013; UX Lisbon (http://www.ux-lx.com), May 12, 2011; UX Hong Kong (http://www.uxhongkong.com/), February 17, 2011.
2. Hello, my name is Lou
www.louisrosenfeld.com | www.rosenfeldmedia.com
3. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
5. No, let’s look at the real data
Critical elements in bold: IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2011:10:25:46 -0800]
"GET /search?access=p&entqr=0
&output=xml_no_dtd&sort=date%3AD%3AL
%3Ad1&ud=1&site=AllSites&ie=UTF-8
&client=www&oe=UTF-8&proxystylesheet=www&
q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1"
200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2011:10:25:48 -0800]
"GET /search?access=p&entqr=0
&output=xml_no_dtd&sort=date%3AD%3AL
%3Ad1&ie=UTF-8&client=www&
q=license+plate&ud=1&site=AllSites
&spell=1&oe=UTF-8&proxystylesheet=www&
ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
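The two log entries above can be mined programmatically. A minimal sketch, assuming an Apache-style access log like the one shown (the regex and the sample line are illustrative, not any particular engine's exact format):

```python
# Pull the critical elements -- IP, timestamp, query, result count --
# out of one search-log line.
import re
from urllib.parse import parse_qs, urlparse

LOG_RE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) HTTP/1\.\d" '
    r'(?P<status>\d+) \d+ (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_line(line):
    """Return (ip, timestamp, query, result_count), or None if unparseable."""
    m = LOG_RE.search(line)
    if not m:
        return None
    params = parse_qs(urlparse(m.group("path")).query)  # decodes '+' to space
    query = params.get("q", [""])[0]
    return m.group("ip"), m.group("ts"), query, int(m.group("results"))

line = ('XXX.XXX.X.104 - - [10/Jul/2011:10:25:46 -0800] '
        '"GET /search?q=lincense+plate&site=AllSites HTTP/1.1" 200 971 0 0.02')
print(parse_line(line))
```

Note that the misspelled query survives intact — that zero-result "lincense plate" is exactly the kind of signal SSA looks for.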
6. No, let’s look at the real data
What are users searching?
7. No, let’s look at the real data
How often are users failing?
11. A little goes a long way
A handful of queries/tasks/ways to navigate/features/documents meet the needs of your most important audiences
12. A little goes a long way
Not all queries are distributed equally
14. A little goes a long way
Nor do they diminish gradually
16. A little goes a long way
80/20 rule isn’t quite accurate
25. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
26. Exercise 1 (pattern analysis)
Work in pairs
• Each pair should have a laptop with
Microsoft Excel
• Laptop platform (Mac, PC) doesn’t matter
Download data files: 2005-October.xls
Refer to exercise sheet
No right answers
Have fun!
27. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
32. Start with basic SSA data: queries and query frequency
Percent: volume of search activity for a unique query during a particular time period
Cumulative Percent: running sum of percentages
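Both columns fall out of a simple loop over the query-frequency table. A sketch with made-up counts:

```python
# Compute Percent and Cumulative Percent for a query-frequency table.
from collections import Counter

counts = Counter({"forms": 120, "401k": 90, "beneficiary": 60, "amt": 30})
total = sum(counts.values())

running = 0.0
for query, n in counts.most_common():
    pct = 100 * n / total          # share of all search activity
    running += pct                 # running sum of percentages
    print(f"{query:12s} {n:4d} {pct:6.2f}% {running:7.2f}%")
```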
35. Tease out common content types
Took an hour to...
• Analyze top 50 queries (20% of all search activity)
• Ask and iterate: “what kind of content would users be
looking for when they searched these terms?”
• Add cumulative percentages
Result: prioritized list of potential content types
#1) application: 11.77%
#2) reference: 10.5%
#3) instructions: 8.6%
#4) main/navigation pages: 5.91%
#5) contact info: 5.79%
#6) news/announcements: 4.27%
36. Clear content types lead to better contextual navigation
artist descriptions
album reviews
album pages
artist bios
discography
TV listings
46. Session data suggest progression and context
search session patterns
1. solar energy
2. how solar energy works
search session patterns
1. solar energy
2. energy
47. Session data suggest progression and context
search session patterns
1. solar energy
2. solar energy charts
48. Session data suggest progression and context
search session patterns
1. solar energy
2. explain solar energy
49. Session data suggest progression and context
search session patterns
1. solar energy
2. solar energy news
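Session patterns like these come from grouping raw log records by visitor and time. A sketch, assuming time-sorted (ip, timestamp, query) rows and a common 30-minute inactivity cutoff; the records are made-up sample data:

```python
# Group search-log records into sessions, then print each session's
# query progression.
from datetime import datetime, timedelta

CUTOFF = timedelta(minutes=30)

records = [  # assumed sorted by timestamp
    ("1.2.3.4", datetime(2011, 7, 10, 10, 25), "solar energy"),
    ("1.2.3.4", datetime(2011, 7, 10, 10, 27), "how solar energy works"),
    ("5.6.7.8", datetime(2011, 7, 10, 11, 0), "solar energy"),
    ("5.6.7.8", datetime(2011, 7, 10, 11, 2), "energy"),
]

def sessionize(records):
    sessions, open_by_ip = [], {}
    for ip, ts, query in records:
        entry = open_by_ip.get(ip)
        if entry and ts - entry["last"] <= CUTOFF:
            entry["queries"].append(query)            # session continues
            entry["last"] = ts
        else:
            entry = {"last": ts, "queries": [query]}  # new session
            open_by_ip[ip] = entry
            sessions.append(entry["queries"])
    return sessions

for session in sessionize(records):
    print(" -> ".join(session))
```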
62. Why analyze queries by audience?
Fortify your personas with data
Learn about differences between audiences
• Open University “Enquirers”: 16 of 25 queries
are for subjects not taught at OU
• Open University Students: search for course
codes, topics dealing with completing program
Determine what’s commonly important to all
audiences (these queries better work well)
64. Save the brand by killing jargon
Jargon related to online education: FlexEd, COD, College on Demand
Marketing’s solution: expensive campaign to educate public (via posters, brochures)
Result: content relabeled, money saved
query rank   query
#22          online*
#101         COD
#259         College on Demand
#389         FlexTrack
*“online” is part of 213 queries
65. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
66. Exercise 2 (longitudinal analysis)
Work in pairs
• Each pair should have a laptop with
Microsoft Excel
• Laptop platform (Mac, PC) doesn’t matter
Download data files: 2006-February.xls +
2006-June.xls
Refer to exercise sheet
No right answers
Have fun!
67. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
85. Failed business goals?
Developing custom metrics
Netflix asks
1. Which movies most frequently searched? (query count)
2. Which of them most frequently clicked through? (MDP views)
3. Which of them least frequently added to queue? (queue adds)
96.
1. Choose a content type (e.g., events)
2. Ask: “Where should users go from here?”
3. Analyze the frequent queries from this content type
from aiga.org
99. Sandia National Labs
• Regularly record which documents came up
at position #1 for 50 most frequent queries
• If and when that top document falls out of
position #1, document's owner is alerted
• Result: healthy dialogue (often about
following policies and procedures and their
value)
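The Sandia-style check can be scripted as a simple diff against a recorded baseline. A sketch under stated assumptions: `top_result()` and the baseline table are hypothetical stand-ins for your search engine and content inventory, not Sandia's actual tooling:

```python
# Compare today's #1 result for each frequent query against a baseline,
# and flag any change so the document's owner can be alerted.

baseline = {  # hypothetical: query -> document that was #1 when recorded
    "travel policy": "/docs/travel.pdf",
    "badges": "/security/badges",
}

def top_result(query):
    # Hypothetical stand-in: call your search engine here.
    return {"travel policy": "/docs/travel-2013.pdf",
            "badges": "/security/badges"}[query]

def check(baseline):
    """Return (query, expected, actual) for every query whose #1 changed."""
    alerts = []
    for query, expected in baseline.items():
        actual = top_result(query)
        if actual != expected:
            alerts.append((query, expected, actual))
    return alerts

for query, was, now in check(baseline):
    print(f"ALERT: top result for {query!r} changed: {was} -> {now}")
```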
103. Shaping the Financial Times’ editorial agenda
FT compares these
• Spiking queries for proper nouns (i.e., people and companies)
• Recent editorial coverage of people and companies
Discrepancy?
• Breaking story?!
• Let the editors know!
104. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
105. Avoiding a disaster at Vanguard
Vanguard used SSA to help benchmark existing search engine’s performance and help select new engine
New search engine “performed” poorly
But IT needed convincing to delay launch
Information Architect & Dev Team Meeting
“Search seems to have a few problems…” “Nah. Where’s the proof? You can’t tell for sure.”
106. What to do?
Test performance of most frequent queries
Measure using original two sets of metrics
1. relevance: how reliably the search engine returns the best matches first
2. precision: proportion of relevant and irrelevant results clustered at the top of the list
107. Relevance: 5 metrics (queries tested have “best” result)
Mean: average distance from the top
Median: less sensitive to outliers, but not useful once at least half are ranked #1
Count – Below 1st: how often is the best target something other than 1st?
Count – Below 5th: how often is the best target outside the critical area?
Count – Below 10th: how often is the best target beyond the first page?
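All five metrics reduce to arithmetic over one list: the rank at which each tested query's known-best result appeared. A sketch with made-up ranks:

```python
# The five relevance metrics, computed from the rank of each tested
# query's "best" result (1 = top of the results list).
from statistics import mean, median

best_ranks = [1, 1, 1, 2, 3, 6, 12]  # one rank per tested query

metrics = {
    "mean": mean(best_ranks),                     # average distance from top
    "median": median(best_ranks),                 # less sensitive to outliers
    "below 1st": sum(r > 1 for r in best_ranks),  # not the #1 result
    "below 5th": sum(r > 5 for r in best_ranks),  # outside the critical area
    "below 10th": sum(r > 10 for r in best_ranks),# beyond the first page
}
print(metrics)
```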
108. Relevance: 5 metrics
OK!
109. Relevance: 5 metrics
Hmmm...
110. Relevance: 5 metrics
Uh oh
111. Precision: rating scale
Evaluate frequent queries’ top search results on this scale
• r / Relevant: Based on the information the user provided, the page’s ranking is completely relevant
• n / Near: The page is not a perfect match, but it’s clearly reasonable for it to be ranked highly
• m / Misplaced: You can see why the search engine returned it, but it should not be ranked highly
• i / Irrelevant: The result has no apparent relationship to the user’s search
112. Precision: three metrics
Metrics based on degrees of permissiveness
1. strict: only counts completely relevant results
2. loose: counts relevant and near results
3. permissive: counts relevant, near, and misplaced results
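Given the r/n/m/i ratings from the previous slide, the three precision scores differ only in which ratings they count. A sketch over a made-up top-10 result set:

```python
# Strict / loose / permissive precision over one query's rated results.
ratings = ["r", "n", "r", "m", "i", "r", "n", "i", "m", "i"]  # made-up top 10

def precision(ratings, counted):
    """Share of results whose rating is in the counted set."""
    return sum(r in counted for r in ratings) / len(ratings)

strict = precision(ratings, {"r"})            # only completely relevant
loose = precision(ratings, {"r", "n"})        # relevant and near
permissive = precision(ratings, {"r", "n", "m"})  # plus misplaced
print(strict, loose, permissive)
```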
113. Putting it all together:
old engine (target) and new
Note: low relevance and high precision scores are optimal
More on Vanguard case study: http://bit.ly/D3B8c
114. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
116. Search Metrics: general examples
(Lee Romero, blog.leeromero.org)
• Total searches for a given time period
• Total distinct search terms for a given time period
• Total distinct words for a given time period
• Average words per search
• Top searches for a given time period
• Top searches over time
• Not found searches
• Error searches
• Ratio of searches performed each reporting period to the
number of visits for that same time period
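Several of these general metrics can be computed directly from a plain list of query strings for one reporting period. A sketch with made-up data:

```python
# A few of the general metrics: total searches, distinct terms,
# distinct words, and average words per search.
queries = ["license plate", "forms", "license plate",
           "travel policy", "forms", "forms"]

total_searches = len(queries)
distinct_terms = len(set(queries))
distinct_words = len({w for q in queries for w in q.split()})
avg_words = sum(len(q.split()) for q in queries) / total_searches
top_search = max(set(queries), key=queries.count)

print(total_searches, distinct_terms, distinct_words, avg_words, top_search)
```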
117. Search Metrics: search engine tuning
(Jeannine Bartlett, earley.com)
Do users not find what they want because the search engine
and its ranking and relevance algorithms have not been
adequately tuned?
Example Benchmarks and Metrics
• # of valid queries returning no results / total unique queries
• Relative % search results per data source
• Relative % click throughs per data source
• Pass/fail % for queries using stemming
• Pass/fail % for queries with misspellings
• Precision measures of “seed” documents sent through the tagging process
118. Search Metrics: query entry
(Jeannine Bartlett, earley.com)
Do users not find what they want because the UI for
expressing search terms is inadequate or unintuitive?
Example Benchmarks and Metrics
• % queries in the bottom set of the Zipf Curve (flat vs. hockey-stick
distribution)
• % queries with no click throughs
• % queries using syntactic metadata filtering (date, author, source,
document type, geography, etc.)
• % queries using Boolean search grammar
• % queries using type-ahead against taxonomy terms and synonyms
• % queries using faceted semantic refinement
• % pages from which search is available
119. Search Metrics: result sets
(Jeannine Bartlett, earley.com)
Do users not find what they want because the UI for
visualizing result sets is inadequate or unintuitive?
Example Benchmarks and Metrics
• % queries utilizing multiple results views
• % queries with drill-down through clusters
• % queries using iterative syntactic metadata filtering (date range,
sorting, type or source inclusion/exclusion, etc.)
• % queries suggesting broader/narrower terms
• % queries suggesting “Best Bets” or “See Also”
• % queries using iterative semantic term filtering, inclusion or
exclusion
120. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
121. Things to do today
1. Set up SSA in Google Analytics
2. Query your queries
3. Start developing a site report card
4. Start incorporating SSA into your user research program
122. Turn on SSA in Google Analytics
Set up GA for your site if you haven’t already
Then teach it to parse and capture your
search engine’s queries (not set by default)
References
• http://is.gd/cR0qr
• http://is.gd/cR0qP
123. Seed your analysis by querying your queries
Starter questions
1. What are the most frequent unique queries?
2. Are frequent queries retrieving quality results?
3. Click-through rates per frequent query?
4. Most frequently clicked result per query?
5. Which frequent queries retrieve zero results?
6. What are the referrer pages for frequent queries?
7. Which queries retrieve popular documents?
8. What interesting patterns emerge in general?
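Two of the starter questions can be answered in a few lines once your log is a list of (query, result_count) pairs. A sketch with made-up data:

```python
# Starter questions 1 and 5: most frequent unique queries, and
# frequent queries that retrieve zero results.
from collections import Counter

log = [("forms", 146), ("forms", 146), ("lincense plate", 0),
       ("401k", 12), ("cod", 0), ("forms", 146)]

freq = Counter(query for query, _ in log)
print("Most frequent unique queries:", freq.most_common(2))   # question 1
zero_results = sorted({query for query, n in log if n == 0})
print("Zero-result queries:", zero_results)                   # question 5
```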
128. Agenda
1. The basics of Site Search Analytics (SSA)
2. Exercise 1 (pattern analysis)
3. Things you can do with SSA
4. Exercise 2 (longitudinal analysis)
5. More things you can do with SSA
6. A case study
7. More on metrics
8. Things you can do today
9. Discussion
130. Long tail queries: longer, more complex (from Vanguard)
Short head: common queries
beneficiary form
401(k)
beneficiary
career
forms
amt
money market
location
loans
calculator
Long tail: rare queries
403(b)(7) account asset transfer authorization
automatic investing
wire transfer instructions
adoption agreement
international wire transfers
socially responsible investing
Vanguard tax identification number
IRA Asset Transfer form
fdic insured account
early withdrawal penalties
131. Now on sale
Search Analytics for Your Site: Conversations with Your Customers
by Louis Rosenfeld (Rosenfeld Media, 2011)
www.rosenfeldmedia.com
Use code WEBDAGENE2013 for 20% off all Rosenfeld Media books
We get two major things out of this data: SESSIONS and FREQUENT QUERIES
Your brain on data: what will it do?
Amazing drawing by Eva-Lotta Lamm: www.evalotta.net
Personas: http://www.uie.com/images/blog/YahooExamplePersona.gif
Table: From Jarrett, Quesenbery, Stirling, and Allen’s report “Search Behaviour at OU,” April 6, 2007.
Examples:
“OO7” versus “007”
Porn-related (not carried by Netflix)
“yoga”: not stocking enough? Or not indexing enough record content? Some other problem?