SIKM Leaders July 2012 - Understanding your Search Log (pekadad)
Presentation used for the SIKM Leaders call for July 2012. Covers the challenges of the long tail of your search log and some ideas for grappling with them.
SIKM Leaders July 2012 - Understanding your Search Log
1. Search analytics – Understanding the long tail
SIKM Leaders July 2012
Lee Romero
blog.leeromero.org
July 2012
2. About me
My background and early career are both in software engineering. I've worked in the knowledge management field for the last 12+ years, almost all of it in the technology of KM.
I've worked with various search solutions for the last 7-8 years, and spent most of that time trying to figure out how to measure their usefulness and improve them in any way I can.
I've spoken at both Enterprise Search Summit and Taxonomy Boot Camp twice.
My writings on search analytics have been featured by a number of experts in the field, including Lou Rosenfeld and Avi Rappoport.
3. Search Analytics
Definition: Search analytics is the field of analyzing and aggregating usage statistics of your search solution to understand user behavior and to improve the experience.
Some search analytics work focuses on SEO / SEM activities (for internet searches). The focus here is enterprise search, so we will primarily be concerned with improving the user experience.
Further, I will primarily focus here on keyword search and understanding the user language found in search logs.
Always remember: analytics without action does not have much value.
5. Understanding your search log
For enterprise search solutions [1], the "80-20" rule is not true.
The language variability is very high in a couple of ways (covered in the next few slides).
Yet having a good understanding of the language, frequency, and commonality in your search log is critical to being able to make sustainable improvements to your search.
The remainder of this presentation first provides some evidence supporting my claim and then will cover some ideas and research into this problem.
[1] This does not seem to apply equally to e-commerce solutions.
6. Some facts about search terms
There's an anecdote that goes something like, "80% of your searches are from 20% of your search terms."
• Equivalently, some will say that you can make significant impact by paying attention to a few of your most common terms (you can, but in limited ways)
Fact: in enterprise search solutions the curve is much shallower.
[Chart: the inverted power curve for two different solutions I'm currently working with]
In the second case, it takes 13% of terms to cover 50% of searches, and that is over 7,000 distinct terms in a typical month!
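The coverage statistic above (what fraction of distinct terms you need to cover a given share of searches) can be computed directly from a search log once it has been reduced to per-term counts. A minimal sketch (the toy log and term names are invented for illustration):

```python
from collections import Counter

def coverage_stats(term_counts, target=0.5):
    """Return the fraction of distinct terms needed to cover
    `target` fraction of all searches."""
    total = sum(term_counts.values())
    covered = 0
    # Walk terms from most to least frequent, accumulating searches.
    for i, (term, count) in enumerate(term_counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return i / len(term_counts)  # fraction of distinct terms used
    return 1.0

# Toy log: a steep "80-20" curve hits 50% coverage with very few terms;
# a shallow enterprise curve needs far more.
log = Counter({"vpn": 50, "timesheet": 30, "expense report": 10,
               "badge": 5, "parking": 3, "cafeteria": 2})
print(coverage_stats(log))  # 1 of 6 distinct terms covers half the searches
```

Running this monthly over a real log is how you would verify how shallow (or steep) your own solution's curve is.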
7. Some facts about search terms: part 2
Another myth: a large percent of searches repeat over and over again.
Fact: on enterprise search solutions, there is surprisingly little commonality month-to-month.
Over a recent six-month period, which saw a total of ~289K distinct search terms, only 11% of terms occurred in more than 1 month!

# of months   # terms    % of searches
1             257,665    89.2%
2              17,994     6.2%
3               5,790     2.0%
4               2,900     1.0%
5               2,019     0.7%
6               2,340     0.8%
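The distribution in the table above (how many terms appear in exactly k of the months studied) can be derived from per-month term sets. A sketch, with invented example terms:

```python
from collections import Counter

def month_spread(monthly_logs):
    """Given one set of distinct search terms per month, report how many
    terms appeared in exactly k months, for each k."""
    months_per_term = Counter()
    for terms in monthly_logs:
        for term in terms:
            months_per_term[term] += 1
    distribution = Counter(months_per_term.values())
    return dict(sorted(distribution.items()))

logs = [{"vpn", "badge"}, {"vpn", "payroll"}, {"vpn", "badge", "w2"}]
print(month_spread(logs))  # → {1: 2, 2: 1, 3: 1}
```

Weighting each bucket by its search counts, rather than distinct terms, gives the "% of searches" column.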
8. Some facts about search terms: part 3
Another myth: a good percentage of your search terms will repeat in sequential periods.
Fact: There is much more churn even month-to-month than you might expect. In the period studied, only about 13% of terms repeated from one month to the next (covering about 36% of searches).
9. What to do with your search log?
The summary of the previous slides:
• It is hard to understand a decent percentage of terms within a given time period (month)!
• If you could do that, the problem during the next time period isn't that much easier!
The next sections describe a couple of research projects I've been working on to tackle these issues.
11. Categorizing your users' language
Given the challenges previously laid out, using the search log to understand user needs seems very challenging.
Beyond the first several dozen terms, it is hard to understand what users are looking for.
• And those several dozen terms cover a vanishingly small percentage of all searches!
However, it would be very useful to understand your users' information needs if we could somehow understand the entirety of the search log.
How do we handle this? Categorize the search terms!
12. Categorizing your users' language, p2
So we need to categorize search terms to really be able to understand our users' information needs.
To do this, we face two challenges:
1. What categorization scheme should we use?
2. How do we apply categorization in a repeatable, scalable and manageable way?
For the first challenge, I would recommend you use your taxonomy (you do have one, right?).
The second challenge is a bit more difficult but is addressed later in this deck.
13. Categories to use
Proposal: Start with your own taxonomy and its vocabularies as the categories into which search terms are grouped.
Some searches will not fit into any of these categories, so you can anticipate the need to add further categories.
As an aside, this exercise actually provides a great measurement tool for your taxonomy:
• You can quantitatively assess the percent of your users' language that is classifiable with your taxonomy
• A number you may wish to drive up over time (through evolution of your taxonomy)
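The taxonomy-coverage measurement suggested here is straightforward once terms have been run through categorization. A sketch, weighting by search frequency (an assumption; you could equally count distinct terms), with invented example data:

```python
def taxonomy_coverage(term_counts, categorized_terms):
    """Percent of searches whose term was classifiable with the taxonomy."""
    total = sum(term_counts.values())
    covered = sum(count for term, count in term_counts.items()
                  if term in categorized_terms)
    return 100.0 * covered / total

counts = {"vpn": 60, "timesheet": 30, "obscure query": 10}
print(taxonomy_coverage(counts, {"vpn", "timesheet"}))  # → 90.0
```

Tracking this number month over month is one concrete way to drive taxonomy evolution.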
14. Automating categorization
Now we turn to the hairier challenge: how can we categorize search terms?
To describe the problem, we have:
1. A set of categories, which may be hierarchically related (most taxonomies are)
2. A set of search terms, as entered by users, that need to be assigned to those categories
[Diagram: a column of user search terms on one side, a set of (possibly hierarchical) categories on the other, with a question mark over how to map terms to categories]
15. Automating categorization, p2
The proposed solution is based on a couple of concepts:
1. You can think of this categorization problem as search!
2. You are taking each search term and searching in an index in which the potential search results are categories!
Question: What is the "body" of what you are searching?
Answer: Previously-categorized search terms!
Using this approach, you can consider the set of previously-categorized search terms as a corpus against which to search.
• You can apply all of the same heuristics to this search as any search:
• Word matching (not string matching)
• Stemming
• Relevancy (word ordering, proximity, # of matches, etc.)
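The categorization-as-search idea can be sketched with a tiny index: each category's "document" is the bag of words from its previously-categorized terms, and an incoming term is searched against it. The suffix-stripping stemmer and word-overlap score below are crude stand-ins for whatever stemming and relevancy a real search engine provides, and the categories and terms are invented:

```python
def stem(word):
    """Toy stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(categorized):
    """categorized: {category: [previously categorized terms]}.
    Index each category by the stemmed words of its terms."""
    return {cat: {stem(w) for t in terms for w in t.lower().split()}
            for cat, terms in categorized.items()}

def match(term, index):
    """Return (category, score) pairs ranked by word overlap."""
    words = {stem(w) for w in term.lower().split()}
    scored = [(cat, len(words & vocab) / len(words))
              for cat, vocab in index.items()]
    return sorted((cs for cs in scored if cs[1] > 0),
                  key=lambda cs: cs[1], reverse=True)

index = build_index({
    "Networking": ["vpn access", "wireless setup"],
    "HR": ["payroll schedule", "vacation policy"],
})
print(match("vpn setup guide", index))  # Networking ranks first (2 of 3 words)
```

Word matching (not string matching), stemming, and a relevancy score each map onto a line of this sketch.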
16. Automating categorization, p3
Here's a depiction of this solution:
[Diagram: sets of previously-categorized terms, the categories, and the new search terms all feed a matching process, shown as a red oval. The matching process takes as input the search terms to be categorized, along with the set of categories and previously-matched search terms, and produces as output a set of categories associated with the new search terms.]
17. Automating categorization, p4: Bootstrapping
This approach depends on matching to previously-categorized terms.
• Every time you categorize a new search term, you expand the set of categorized terms, enabling more matches in the future
Bootstrapping: You can take the names of the categories (the terms in your taxonomy) as the first set of "categorized search terms".
• This allows you to start with no search terms having been categorized at all
• You run a first round of matching against the categories to find first-level matches
• Take those that seem like "good" matches and pull those into the set of categorized search terms for a second iteration, etc.
• Using this in initial testing resulted in 10% of distinct terms from a month being associated with at least one category
Another aspect: Any manual categorization of common search terms will add to the success of categorization.
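The bootstrapping loop described above can be sketched end to end: seed the categorized set with the taxonomy's own category names, then iterate, promoting confident matches so later rounds can match more terms. The word-overlap score and threshold are toy stand-ins for a real relevancy measure, and the taxonomy and terms are invented:

```python
def score(term, seed_terms):
    """Best word-overlap between `term` and any seed phrase (toy relevancy)."""
    words = set(term.lower().split())
    best = 0.0
    for seed in seed_terms:
        overlap = words & set(seed.lower().split())
        best = max(best, len(overlap) / len(words))
    return best

def bootstrap(taxonomy, uncategorized, threshold=0.5, rounds=3):
    """taxonomy: {category: [category-name phrases]} used as the seed set."""
    categorized = {cat: list(names) for cat, names in taxonomy.items()}
    remaining = set(uncategorized)
    for _ in range(rounds):
        newly = {}
        for term in remaining:
            # Assign the term to its best-scoring category, if above threshold.
            best_cat, best = max(
                ((cat, score(term, seeds)) for cat, seeds in categorized.items()),
                key=lambda cs: cs[1])
            if best >= threshold:
                newly[term] = best_cat
        if not newly:
            break  # no new matches; a human would review or extend here
        for term, cat in newly.items():
            categorized[cat].append(term)  # grow the seed set for the next round
            remaining.discard(term)
    return categorized, remaining

taxonomy = {"Payroll": ["payroll"], "Travel": ["travel policy"]}
terms = ["payroll calendar", "payroll calendar 2012", "travel booking"]
cats, leftover = bootstrap(taxonomy, terms)
```

Note how "payroll calendar 2012" only matches in the second round, once "payroll calendar" has itself been promoted into the seed set; that is the bootstrapping effect.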
18. Automating categorization, p5: Iterative
[Diagram: the matching process repeated in rounds; each round's new categorizations join the previously-categorized terms that feed the next round's matching of the remaining search terms]
19. Automating categorization, p5: Iterative
This approach also needs to be applied iteratively
• You start with a set of categorized search terms and a new set of
(uncategorized) search terms
• You then apply this matching to the uncategorized search terms, getting a set
of newly-categorized search terms (with some measure of probability of
“correctness” of the match, i.e., relevancy)
• You pull in the newly-categorized search terms and run the matching process
again
• Each time, as you expand the set of categorized search terms (from a
previous match), you increase the possibility of more matches (in
subsequent matches)
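The loop described above can be sketched in Python as follows (a simplified illustration; `match_fn` and the threshold stand in for the real relevancy measure):

```python
def categorize_iteratively(categorized, uncategorized, match_fn, threshold=0.4):
    # Repeat the matching pass until a round produces no new categorizations
    remaining = set(uncategorized)
    while remaining:
        newly = {}
        for term in remaining:
            for known, category in categorized.items():
                if match_fn(term, known) >= threshold:  # relevancy gate
                    newly[term] = category
                    break
        if not newly:
            break
        categorized.update(newly)   # a larger corpus for the next round
        remaining -= set(newly)
    return categorized
```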
20. Automating categorization, p6: Iterative
It will be beneficial to have a human review the set of matches for
each iteration and determine if they are accurate enough
• The measurement of relevancy is intended to do this but would likely only be
partially successful
Over time, using this process, you build up a larger and larger set of
categorized search terms
• This makes it more likely in future iterations that more terms will be
categorizable
21. Automating categorization, p7: No matches
There will always be search terms that do not get matched.
• This may be because the terminology used does not match
• This may be because there are no categories in the global taxonomy that
would be useful for categorization
The first issue would require a human to recognize the association
(thus, categorizing the term and then enabling matches on future
uses of that term)
The second issue would require adding in new categories (not part
of the global taxonomy)
• And then categorizing the term into the newly-added category(ies)
22. Summary
With this approach, we can take a set of search terms at any time
and categorize them (partially) automatically
• Over time, the accuracy of the matching will improve through human review-
and-approval of matches
We then are able to relate these information needs to a variety of
other pieces of data:
• Volume of content available to users – significant mismatches can highlight
need for new content
• Rating of content in these categories – can highlight that a particular area of
interest has content but it isn’t quality content
• Downloads of content in these categories – could highlight navigational
issues (e.g., when a category is much more highly represented in search
than in downloads)
This does not require directly working with end-users and is scalable
23. Additional benefits: Measuring your taxonomy
As mentioned earlier, part of the challenge will be that there will be
terms that do not match the starting categories (i.e., the global
taxonomy)
This actually highlights some valuable insight obtainable from this:
• We can identify gaps in our taxonomy (terms requiring new categories)
• We can identify areas of our taxonomy where we have many search terms
associated with a taxonomy term and consider if we need to either add or
split search terms in order to better match our users’ real language
• We can identify areas of the taxonomy that are of little use in terms of the
language used by our users
24. Additional benefits: Linguistic statistics
Word counts – independent of term usage, what are the most common individual words?

Word        Distinct Terms   Searches
management  3128             8283
sap         1931             3873
strategy    1414             3728
business    1558             3599
it          1343             2992
process     1515             2920
data        1264             2899
project     1249             2823
model       1296             2791
plan        987              2170
Word networks – we can understand the inter-relationships between individual words (which pairs occur commonly together, which words occur commonly for a given word)
These are not as much about information needs as about understanding the language
users use (so this insight can help shape categorization)
These are also very useful to prioritize your efforts in reviewing your search logs
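As a rough illustration (not the original tooling), both the word counts and the word-network pairs can be derived from a search log in a few lines of Python:

```python
from collections import Counter
from itertools import combinations

def word_stats(search_log):
    # search_log: iterable of (search_term, number_of_searches) pairs
    distinct = Counter()   # how many distinct terms contain each word
    weighted = Counter()   # word frequency weighted by search volume
    pairs = Counter()      # co-occurring word pairs, for the word network
    for term, searches in search_log:
        words = sorted(set(term.lower().split()))
        for w in words:
            distinct[w] += 1
            weighted[w] += searches
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
    return distinct, weighted, pairs
```

`Counter.most_common()` on each of these then yields exactly the kinds of rankings shown in the table above.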
25. Additional benefits: Comparing to your content space
With the statistics described in the previous slide, you could conceivably compare them to the same analysis applied to your
“content space”
For example, derive the statistics for the titles of content available in
your search
• Do you find significant differences? This could represent differences in the
names people apply to things and what they expect to use to find the content
Another interesting angle is to use other controlled lists as the
matched terms in a category
• People names (applying this, we found that about 8% of terms match a person’s name)
• Client names
27. The Problem
Search sucks!
Yes, the common refrain from many users – “search doesn’t return
what I’m looking for” or “I can never find what I’m looking for”
There are many tools available to improve the users’ experience,
including:
• Improving the UI
• Improving the content included
• Manipulating settings in the engine to modify relevancy
calculations, possibly even the engine itself
The challenge for many of these is, once you make a change, how
do you know it has improved the results?
28. A solution?
One way to assess the impact is to have a set of users perform
either a set of pre-defined searches or a set of their own searches
and then evaluate the quality of results
The challenge with this is that it is very labor-intensive, can take a long time in calendar terms, and is hard to do iteratively.
An alternative could be to automate this evaluation!
It is important to keep in mind that this is not about the relevancy of
the results or determining whether the engine is returning the
“right” items
• It’s about assessing the user-perceived quality of a set of
results given a set of criteria for a search
29. Automating evaluation
The idea is to automate some of the analysis of the quality of the
result set by examining properties of the result set
This approach attempts to perform a simple test similar to what a
human user would do in scanning a set of search results
• It uses the data returned by the search engine and displayed on
the first page of results
• It does not do a “deep” review of content
30. The approach
The algorithm takes the following approach:
• For each search term, it executes the query against the search
engine and retrieves the results
‒For each individual result, it calculates a quality score from 0.0 to
1.0 (a higher score implies the result looks like a better result)
‒The individual scores for a search term’s set of results are
averaged to get a single score for that search term
• In addition, the current POC outputs data in a tabular format
including most of the individual elements returned by the search
engine along with the derived score
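A minimal Python sketch of this driver loop; `run_query` and `score_result` are hypothetical stand-ins for the engine client and the per-result scoring function:

```python
def score_search_term(term, run_query, score_result, page_size=10):
    # run_query and score_result are hypothetical stand-ins for your
    # search engine client and per-result quality function
    results = run_query(term)
    first_page = results[:page_size]     # only what the user sees first
    if not first_page:
        return 0.0                       # no results at all: worst score
    scores = [score_result(term, r) for r in first_page]
    return sum(scores) / len(scores)     # one averaged score per term
```

Running this over a list of terms and writing each term's score plus the raw engine fields to a row gives the tabular output the POC produces.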
31. What are we looking at in assessing quality?
Facets that influence quality
• Focusing primarily on user-visible aspects:
‒ First page
‒ Result set size
‒ Snippet
‒ Title
‒ Age
‒ Uniqueness of title
32. What are we looking at in assessing quality?
Factors that influence quality
• Only examining the first page of results
• Similarity / dissimilarity of keywords to title
• Similarity / dissimilarity of keywords to excerpt
• Uniqueness of titles within the result set (just first page)
• Size of total result set
• Age of results
• Looking for specific “known” targets
• (one “cheat”) Presence of keywords in “concepts” identified by the engine
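As an illustration only, a per-result score could combine these factors as a weighted sum; the factor formulas and weights below are assumptions for the sketch, not the POC's actual values:

```python
import re

def _overlap(keywords, text):
    # Fraction of the search keywords that appear in the given text
    kws = set(re.findall(r"[a-z0-9]+", keywords.lower()))
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(kws & words) / len(kws) if kws else 0.0

def result_quality(keywords, title, snippet, age_days, total_results,
                   weights=(0.35, 0.35, 0.15, 0.15)):
    # Illustrative factors and weights, not the original POC's values
    w_title, w_snippet, w_age, w_size = weights
    age_factor = max(0.0, 1.0 - age_days / 365.0)    # newer looks better
    size_factor = min(1.0, total_results / 10.0)     # tiny result sets hurt
    return (w_title * _overlap(keywords, title)
            + w_snippet * _overlap(keywords, snippet)
            + w_age * age_factor
            + w_size * size_factor)
```

Keeping the weights as a parameter is what later makes it possible to tune the automated score against human judgments.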
33. What are we looking at in assessing quality?
Others that may be explored
• Balance across sources of content (does it match overall ratio?)
• Ratings of individual results
• Web domain of content (following an internet expectation that “some sources are
better than others”)
• Match of terms could be altered to consider synonyms
• Examining taxonomy values
‒ Could apply matching to taxonomy values?
‒ Could be a “bonus” to items that have taxonomy?
• May want to make weights (e.g., impact of age) consider source or class of
content
• Currently, in our search engine, best bets are automatically included.
‒ Would prefer to have them not included to see where they end up organically.
• Also, in our search engine, the exact order on a page has not been replicated, so we can’t include the exact result order as a factor
34. Validating the approach
Does this reflect how a human user would perceive the quality?
• This idea seems reasonable, but do we really have a way to determine if it is valid?
‒Or, do we run the risk that this would lead to “local maxima” for the factors measured without meaningfully improving the user’s experience?
• So far, I have 2 independent ways to assess this
‒Comparing the results of this against a human assessment
‒Comparing the results of this against other factors that have been
used as indicators of quality in the past
35. Validating the approach, p2
Comparing against a human assessment
• One of our on-going operations in GCKM is to review the quality of
results for a very small number of terms
‒The chart below takes the output of the most recent such review for a subset of our “super search terms” and compares it against the programmatically calculated quality
‒There is at least a correlation between the automated score (the Y axis) and the manual score (the X axis)

[Scatter chart: Automated Score (Y) vs. Manual Score (X), with trend line y = 0.2781x + 0.3826, R² = 0.5803]
36. Validating the approach, p3
Comparing against searches/term
• Within our search program, we use the ratio of searches per visit for a term as an indicator of the quality of the results
‒The more pages of results a user looks at for a term, the harder it is for the user to find what they are looking for
‒The following chart compares searches/visit (X-axis) against the automated quality score (Y-axis)
‒Again, we can see that there is a correlation, though perhaps not as strong as that against the manual review

[Scatter chart: automated quality score (Y) vs. searches/visit (X), with trend line y = -0.6857x + 55.234, R² = 0.5225]
37. Validating the approach, p4
Summing up
• At this point, I am confident that the quality assessment we are producing automatically reflects the user’s general experience.
‒On individual items, it can vary significantly, but in aggregate it appears to be valid
‒I have not yet dug into this, but the automation allows the weight of each factor to be adjusted, so it may be possible to bring the automated score closer still to the “real” quality of the results
38. Additional benefits of this tool
Better analysis
• Given that this utility can output data in a spreadsheet format, this
presents some other capabilities
‒Estimate total “search impressions” for specific targets
• Analyze “search impressions” vs. usage
‒Analyze spread of returned results across sources
‒Analyze quality along a variety of dimensions (source,
taxonomy values, etc.)
‒Comparing result sets between terms that should show similar results
• E.g., how similar are the results really for two synonyms?
‒Also, comparing result sets along a temporal dimension
• How much change is there from one month (week) to the next?
‒Analyzing factors by depth into the “long tail”
‒Evaluating the quality of results for auto-complete terms
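For the result-set comparisons above (synonyms, month-over-month change), a simple Jaccard similarity over document identifiers is one way to quantify overlap; a minimal sketch:

```python
def result_overlap(results_a, results_b):
    # Jaccard similarity of two result sets (e.g., lists of document URLs):
    # 1.0 means identical sets, 0.0 means no documents in common
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0          # two empty result sets are trivially identical
    return len(a & b) / len(a | b)
```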
39. Quality of results split by taxonomy on the content
Better analysis - examples
• Quality of results averaged over the service area assigned to
content
[Bar chart: “Quality by Service Area of content” – average quality score by service area (Enterprise Applications, Human Capital (Consulting), Outsourcing, Strategy & Operations, Technology Integration), ranging roughly 33–38, with the overall average marked for comparison]
40. Quality of results by depth into the “long tail”
Better analysis - examples
• A chart of the quality of the result pages by how far into the long
tail a search term is
[Chart: “Quality by Depth into the ‘long tail’” – quality score vs. rank of the search term, from the head out to ~17,500; quality declines slowly with depth, power-law fit y = 55.685x^-0.14, R² = 0.5253]
41. Quality over time – comparing before and after an upgrade
Better analysis - examples
• This chart shows the # of terms by their change in quality through
an upgrade of our search engine – overall change was +2%!
[Histogram: “Change in Quality through an upgrade” – number of terms bucketed by percent change in quality, from roughly -46% to +81%; bars left of zero got worse, bars to the right improved]
42. And, finally
For more about search analytics, I would highly recommend:
• “Search Analytics for your Site” by Lou Rosenfeld
• www.searchtools.com – edited by Avi Rappoport
Also, you can find my own writings on search analytics (along with a
variety of other KM topics) on my blog:
• blog.leeromero.org