SIKM Leaders July 2012 - Understanding your Search Log


  1. 1. Search analytics – Understanding the long tail SIKM Leaders July 2012 Lee Romero July 2012
  2. 2. About me My background and early career are both in software engineering. I've worked in the knowledge management field for the last 12+ years, almost all of it in the technology of KM. I’ve worked with various search solutions for the last 7-8 years and spent most of that time trying to figure out how to measure their usefulness and improve them in any way I can. I’ve spoken at both Enterprise Search Summit and Taxonomy Boot Camp twice. My writings on search analytics have been featured by a number of experts in the field, including Lou Rosenfeld and Avi Rappoport.
  3. 3. Search Analytics Definition: Search analytics is the field of analyzing and aggregating usage statistics of your search solution to understand user behavior and to improve the experience. Some search analytics are focused on SEO / SEM activities (for internet searches). The focus here is on enterprise search, so I will concentrate on improving the user experience. Further, I will primarily focus on keyword search and on understanding the user language found in search logs. Always remember: analytics without action does not have much value.
  4. 4. The challenge of your search log
  5. 5. Understanding your search log For enterprise search solutions, the “80-20” rule is not true. (This does not seem to apply equally to e-commerce solutions.) The language variability is very high in a couple of ways (covered in the next few slides). Yet having a good understanding of the language, frequency and commonality in your search log is critical to being able to make sustainable improvements to your search. The remainder of this presentation first provides some evidence supporting my claim and then covers some ideas and research into this problem.
  6. 6. Some facts about search terms There’s an anecdote that goes something like, “80% of your searches are from 20% of your search terms.” • Equivalently, some will say that you can make significant impact by paying attention to a few of your most common terms (you can, but in limited ways). Fact: in enterprise search solutions the curve is much shallower. [Chart: the inverted power curve for two different solutions I’m currently working with.] In the second case, it takes 13% of terms to cover 50% of searches, and that is over 7000 distinct terms in a typical month!
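The "13% of terms to cover 50% of searches" figure can be reproduced from any raw search log. The following is a minimal sketch, assuming the log is available as a plain list with one entry per search; the function name and the sample data are made up for illustration and are not from the presentation.

```python
from collections import Counter


def terms_needed_for_coverage(search_log, target=0.5):
    """Return (n_terms, share_of_distinct_terms) needed to cover `target` of all searches.

    `search_log` is an iterable of raw search terms, one entry per search.
    """
    counts = Counter(term.strip().lower() for term in search_log)
    total = sum(counts.values())
    covered = 0
    # Walk the terms from most to least frequent until the target share is reached.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= target * total:
            return rank, rank / len(counts)
    return 0, 0.0  # empty log


# Hypothetical sample: 10 searches over 7 distinct terms.
log = ["benefits", "benefits", "benefits", "sap", "sap",
       "travel policy", "org chart", "expense report", "vpn", "timesheet"]
n_terms, share = terms_needed_for_coverage(log)
print(n_terms, round(share, 3))
```

Running this for several values of `target` gives the points of the coverage curve for a given month.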
  7. 7. Some facts about search terms: part 2 Another myth: a large percent of searches repeat over and over again. Fact: on enterprise search solutions, there is surprisingly little commonality month-to-month. Over a recent six month period, which saw a total of ~289K distinct search terms, only 11% of terms occurred in more than 1 month!
       # of months   # terms   % of terms
       1             257665    89.2%
       2             17994     6.2%
       3             5790      2.0%
       4             2900      1.0%
       5             2019      0.7%
       6             2340      0.8%
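A table like the one on this slide can be derived by counting, for each distinct term, the number of months in which it appears. This is a sketch only, assuming the logs are grouped by month in a dictionary; the month keys and terms are invented for the example.

```python
from collections import Counter


def months_seen_distribution(terms_by_month):
    """Map number-of-months -> count of distinct terms seen in exactly that many months.

    `terms_by_month` is a dict of month label -> iterable of search terms.
    """
    months_per_term = Counter()
    for terms in terms_by_month.values():
        # Count each term at most once per month, however often it was searched.
        for term in {t.strip().lower() for t in terms}:
            months_per_term[term] += 1
    return dict(Counter(months_per_term.values()))


# Hypothetical three-month sample.
sample = {
    "2012-01": ["sap", "vpn", "sap"],
    "2012-02": ["sap", "payroll"],
    "2012-03": ["SAP", "vpn"],
}
distribution = months_seen_distribution(sample)
print(sorted(distribution.items()))
```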
  8. 8. Some facts about search terms: part 3 Another myth: a good percentage of your search terms will repeat in sequential periods. Fact: There is much more churn even month-to-month than you might expect. In the period covered by the chart on this slide, only about 13% of terms repeated from one month to the next (covering about 36% of searches).
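Month-to-month churn can be measured along the same lines. The sketch below assumes that "repeated" means a term in the current month also appeared in the previous month, and that both shares are taken relative to the current month; the slide does not state which denominator was used, so treat that as an assumption.

```python
from collections import Counter


def repeat_rate(prev_month, this_month):
    """Return (share_of_terms, share_of_searches) in `this_month` repeated from `prev_month`."""
    prev = {t.strip().lower() for t in prev_month}
    cur = Counter(t.strip().lower() for t in this_month)
    if not cur:
        return 0.0, 0.0
    repeated = [t for t in cur if t in prev]
    term_share = len(repeated) / len(cur)
    search_share = sum(cur[t] for t in repeated) / sum(cur.values())
    return term_share, search_share


# Hypothetical data: one of four distinct terms repeats, but it carries half the searches.
previous = ["sap", "vpn", "benefits"]
current = ["sap", "sap", "sap", "payroll", "org chart", "holiday"]
term_share, search_share = repeat_rate(previous, current)
print(term_share, search_share)
```

As in the slide, the share of searches covered is typically higher than the share of terms, because the repeating terms tend to be the common ones.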
  9. 9. What to do with your search log? The summary of the previous slides: • It is hard to understand a decent percentage of terms within a given time period (month)! • If you could do that, the problem during the next time period isn’t that much easier! The next sections describe a couple of research projects I’ve been working on to tackle these issues 9
  10. 10. Understanding your users’ information needs
  11. 11. Categorizing your users’ language Given the challenges previously laid out, using the search log to understand user needs seems very challenging Beyond the first several dozen terms, it is hard to understand what users are looking for • And those several dozen terms cover a vanishingly small percentage of all searches! However, it would be very useful to understand your users’ information needs if we could somehow understand the entirety of the search log How do we handle this? Categorize the search terms! 11
  12. 12. Categorizing your users’ language, p2 So we need to categorize search terms to really be able to understand our users’ information needs. To do this, we face two challenges 1. What categorization scheme should we use? 2. How do we apply categorization in a repeatable, scalable and manageable way? For the first challenge, I would recommend you use your taxonomy (you do have one, right?) The second challenge is a bit more difficult but is addressed later in this deck 12
  13. 13. Categories to use Proposal: Start with your own taxonomy and its vocabularies as the categories into which search terms are grouped Some searches will not fit into any of these categories, so you can anticipate the need to add further categories As an aside, this exercise actually provides a great measurement tool for your taxonomy • You can quantitatively assess the percent of your users’ language that is classifiable with your taxonomy • A number you may wish to drive up over time (through evolution of your taxonomy) 13
  14. 14. Automating categorization Now we turn to the hairier challenge: how can we categorize search terms? To describe the problem, we have: 1. A set of categories, which may be hierarchically related (most taxonomies are) 2. A set of search terms, as entered by users, that need to be assigned to those categories [Diagram: search terms and categories, with the mapping between them unknown (“?”).]
  15. 15. Automating categorization, p2 The proposed solution is based on a couple of concepts: 1. You can think of this categorization problem as search! 2. You are taking each search term and searching in an index in which the potential search results are categories! Question: What is the “body” of what you are searching? Answer: Previously-categorized search terms! Using this approach, you can consider the set of previously-categorized search terms as a corpus against which to search • You can apply all of the same heuristics to this search as any search: • Word matching (not string matching) • Stemming • Relevancy (word ordering, proximity, # of matches, etc.)
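To make the "categorization as search" idea concrete, here is a minimal sketch of matching a new search term against previously categorized terms. It is not the presenter's implementation: the plural-stripping stem, the Jaccard word-overlap score used as a stand-in for relevancy, and all names and sample data are assumptions chosen for brevity.

```python
import re


def words(term):
    """Lowercase, split into words, and apply a deliberately crude plural-stripping stem."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return {t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens}


def rank_categories(term, categorized):
    """Rank categories for `term` by word overlap with previously categorized terms.

    `categorized` maps a previously categorized search term to its category.
    Returns a list of (category, score) pairs, best score first.
    """
    query = words(term)
    best = {}
    for known_term, category in categorized.items():
        known = words(known_term)
        if not query or not known:
            continue
        # Jaccard similarity on word sets: word matching, not string matching.
        score = len(query & known) / len(query | known)
        if score > 0:
            best[category] = max(score, best.get(category, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical previously categorized terms.
categorized = {
    "project management": "Project Management",
    "sap implementation": "Technology",
    "change management plan": "Human Capital",
}
ranked = rank_categories("Project plans", categorized)
print(ranked)
```

A real search engine would add proper stemming, word order and proximity to the score; the point here is only that the "documents" being searched are earlier search terms and the "results" are their categories.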
  16. 16. Automating categorization, p3 Here’s a depiction of this solution. [Diagram: the search terms to be categorized and the categories, each with its previously categorized terms, feeding into a matching process.] The red oval in the diagram represents the “matching” process: it takes as input the search terms to be categorized and the set of categories along with previously-matched search terms, and produces as output a set of categories associated with the new search terms.
  17. 17. Automating categorization, p4: Bootstrapping This approach depends on matching to previously-categorized terms • Every time you categorize a new search term, you expand the set of categorized terms, enabling more matches in the future Bootstrapping: You can take the names of the categories (the terms in your taxonomy) as the first set of “categorized search terms” • This allows you to start with no search terms having been categorized at all • You run a first round of matching against the categories to find first-level matches • Take those that seem like “good” matches and pull those into the set of categorized search terms for a second iteration, etc. • Using this in initial testing resulted in 10% of distinct terms from a month being associated with at least one category Another aspect: Any manual categorization of common search terms will add to the success of categorization 17
  18. 18. Automating categorization, p5: Iterative [Diagram: the matching process repeated over several rounds, with each round’s new categorizations added to the previously categorized terms.]
  19. 19. Automating categorization, p5: Iterative This approach also needs to be applied iteratively • You start with a set of categorized search terms and a new set of (uncategorized) search terms • You then apply this matching to the uncategorized search terms, getting a set of newly-categorized search terms (with some measure of probability of “correctness” of the match, i.e., relevancy) • You pull in the newly-categorized search terms and run the matching process again • Each time, as you expand the set of categorized search terms (from a previous match), you increase the possibility of more matches (in subsequent matches) 19
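The bootstrapping and iteration described on these slides can be sketched as a loop that starts from the category names alone and repeats the matching until no new terms are categorized. Everything specific here is an assumption for illustration: the word-overlap score, the `min_score` threshold, the `max_rounds` cap, and the sample taxonomy and terms. The human review step is only marked by a comment.

```python
import re


def words(term):
    return set(re.findall(r"[a-z0-9]+", term.lower()))


def categorize_iteratively(category_names, search_terms, min_score=0.5, max_rounds=10):
    """Return (categorized, uncategorized) after repeated rounds of matching.

    `categorized` maps each term to the set of categories assigned to it.
    """
    # Bootstrap: the category names themselves are the first "categorized search terms".
    categorized = {name.lower(): {name} for name in category_names}
    pending = {t.lower() for t in search_terms} - set(categorized)
    for _ in range(max_rounds):
        newly = {}
        for term in pending:
            q = words(term)
            for known, cats in categorized.items():
                k = words(known)
                if q and k and len(q & k) / len(q | k) >= min_score:
                    newly.setdefault(term, set()).update(cats)
        if not newly:
            break
        # In practice, a human would review `newly` here before accepting it.
        categorized.update(newly)
        pending -= set(newly)
    return categorized, pending


# Hypothetical taxonomy and search terms.
categories = ["Project Management", "Human Capital"]
terms = ["project management plan", "management plan", "human capital strategy", "vpn"]
categorized, uncategorized = categorize_iteratively(categories, terms)
print(categorized["management plan"], uncategorized)
```

In this sample, "management plan" does not match any category name in the first round; it is only picked up in the second round, via the newly categorized "project management plan". That is the effect the iteration is meant to produce, and it is also why unreviewed matches can compound errors from one round to the next.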
  20. 20. Automating categorization, p6: Iterative It will be beneficial to have a human review the set of matches for each iteration and determine if they are accurate enough • The measurement of relevancy is intended to do this but would likely only be partially successful Over time, using this process, you build up a larger and larger set of categorized search terms • This makes it more likely in future iterations that more terms will be categorizable 20
  21. 21. Automating categorization, p7: No matches There will always be search terms that do not get matched. • This may be because the terminology used does not match • This may be because there are no categories in the global taxonomy that would be useful for categorization The first issue would require a human to recognize the association (thus, categorizing the term and then enabling matches on future uses of that term) The second issue would require adding in new categories (not part of the global taxonomy) • And then categorizing the term into the newly-added category(ies) 21
  22. 22. Summary With this approach, we can take a set of search terms at any time and categorize them (partially) automatically • Over time, the accuracy of the matching will improve through human review- and-approval of matches We then are able to relate these information needs to a variety of other pieces of data: • Volume of content available to users – significant mismatches can highlight need for new content • Rating of content in these categories – can highlight that a particular area of interest has content but it isn’t quality content • Downloads of content in these categories – could highlight navigational issues (e.g., when a category is much more highly represented in search than in downloads) This does not require directly working with end-users and is scalable 22
  23. 23. Additional benefits: Measuring your taxonomy As mentioned earlier, part of the challenge will be that there will be terms that do not match the starting categories (i.e., the global taxonomy) This actually highlights some valuable insight obtainable from this: • We can identify gaps in our taxonomy (terms requiring new categories) • We can identify areas of our taxonomy where we have many search terms associated with a taxonomy term and consider if we need to either add or split search terms in order to better match our users’ real language • We can identify areas of the taxonomy that are of little use in terms of the language used by our users 23
  24. 24. Additional benefits: Linguistic statistics Word counts: independent of term usage, what are the most common individual words?
       Word         Distinct Terms   Searches
       management   3128             8283
       sap          1931             3873
       strategy     1414             3728
       business     1558             3599
       it           1343             2992
       process      1515             2920
       data         1264             2899
       project      1249             2823
       model        1296             2791
       plan         987              2170
     Word networks: we can understand the inter-relationships between individual words (which pairs occur commonly together, which words occur commonly for a given word). These are not as much about information needs as about understanding the language users use (so this insight can help shape categorization). These are also very useful to prioritize your efforts in reviewing your search logs.
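Both statistics on this slide can be computed directly from a table of search terms and their search counts. The sketch below assumes such a table is available as a dictionary; the function names and sample counts are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations


def word_stats(term_counts):
    """Return {word: (distinct_terms, searches)} from a dict of term -> search count."""
    distinct = Counter()
    searches = Counter()
    for term, n in term_counts.items():
        for w in set(re.findall(r"[a-z0-9]+", term.lower())):
            distinct[w] += 1   # one more distinct term contains this word
            searches[w] += n   # all searches for that term contain this word
    return {w: (distinct[w], searches[w]) for w in distinct}


def pair_counts(term_counts):
    """Count searches in which two words occur together (a simple "word network")."""
    pairs = Counter()
    for term, n in term_counts.items():
        ws = sorted(set(re.findall(r"[a-z0-9]+", term.lower())))
        for a, b in combinations(ws, 2):
            pairs[(a, b)] += n
    return pairs


# Hypothetical term counts for one month.
term_counts = {"project management": 5, "change management": 3, "project plan": 2}
stats = word_stats(term_counts)
pairs = pair_counts(term_counts)
print(stats["management"], pairs[("management", "project")])
```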
  25. 25. Additional benefits: Comparing to your content space With the statistics described in the previous slide, you could conceivably compare them with the same analysis applied to your “content space”. For example, derive the statistics for the titles of content available in your search • Do you find significant differences? This could represent differences between the names people apply to things and what they expect to use to find the content. Another interesting angle is to use other controlled lists as the matched terms in a category • People names (I applied this and found about 8% of terms match a person’s name) • Client names
  26. 26. Understanding the quality of your users’ experience
  27. 27. The Problem Search sucks! Yes, the common refrain from many users – “search doesn’t return what I’m looking for” or “I can never find what I’m looking for” There are many tools available to improve the users’ experience, including: • Improving the UI • Improving the content included • Manipulating settings in the engine to modify relevancy calculations, possibly even the engine itself The challenge for many of these is, once you make a change, how do you know it has improved the results? 27
  28. 28. A solution? One way to assess the impact is to have a set of users perform either a set of pre-defined searches or a set of their own searches and then evaluate the quality of results The challenge with this is that it is very labor intensive, can take a long calendar time and is hard to do iteratively. An alternative could be to automate this evaluation! It is important to keep in mind that this is not about the relevancy of the results or determining whether the engine is returning the “right” items • It’s about assessing the user-perceived quality of a set of results given a set of criteria for a search 28
  29. 29. Automating evaluation The idea is to automate some of the analysis of the quality of the result set by examining properties of the result set This approach attempts to perform a simple test similar to what a human user would do in scanning a set of search results • It uses the data returned by the search engine and displayed on the first page of results • It does not do a “deep” review of content 29
  30. 30. The approach The algorithm takes the following approach: • For each search term, it executes the query against the search engine and retrieves the results ‒For each individual result, it calculates a quality score from 0.0 to 1.0 (a higher score implies the result looks like a better result) ‒The individual scores for a search term’s set of results are averaged to get a single score for that search term • In addition, the current POC outputs data in a tabular format including most of the individual elements returned by the search engine along with the derived score 30
  31. 31. What are we looking at in assessing quality? Facets that influence quality • Focusing primarily on user-visible aspects [Diagram labels: First page, Result set size, Snippet, Title, Age, Uniqueness of title]
  32. 32. What are we looking at in assessing quality? Factors that influence quality • Only examining the first page of results • Similarity / dissimilarity of keywords to title • Similarity / dissimilarity of keywords to excerpt • Uniqueness of titles within the result set (just first page) • Size of total result set • Age of results • Looking for specific “known” targets • (one “cheat”) Presence of keywords in “concepts” identified by engine 32
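The scoring approach (a 0.0 to 1.0 score per result, averaged over the first page) can be sketched with a few of the listed factors. This is not the POC described in the slides: the `Result` fields, the four factors chosen, the weights and the `max_age_days` cutoff are all assumptions made for the example, and the results are hard-coded rather than fetched from a search engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    snippet: str
    age_days: int


def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _overlap(query, text):
    """Share of the query's words that appear in `text`."""
    q = _words(query)
    return len(q & _words(text)) / len(q) if q else 0.0


def result_score(query, result, titles, max_age_days=730):
    """Score one result from 0.0 to 1.0 using assumed weights for four factors."""
    title_match = _overlap(query, result.title)
    snippet_match = _overlap(query, result.snippet)
    unique = 1.0 if titles.count(result.title) == 1 else 0.0
    freshness = max(0.0, 1.0 - result.age_days / max_age_days)
    return 0.4 * title_match + 0.3 * snippet_match + 0.15 * unique + 0.15 * freshness


def term_score(query, first_page):
    """Average the per-result scores over the first page of results."""
    if not first_page:
        return 0.0
    titles = [r.title for r in first_page]
    return sum(result_score(query, r, titles) for r in first_page) / len(first_page)


# Hypothetical first page for the query "travel policy".
page = [
    Result("Travel Policy 2012", "Company travel policy and expense rules", 0),
    Result("Cafeteria menu", "Lunch options this week", 730),
]
score = term_score("travel policy", page)
print(round(score, 3))
```

Keeping the weights as explicit numbers makes it straightforward to tune them later against a human assessment of the same result pages.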
  33. 33. What are we looking at in assessing quality? Others that may be explored • Balance across sources of content (does it match overall ratio?) • Ratings of individual results • Web domain of content (following an internet expectation that “some sources are better than others”) • Match of terms could be altered to consider synonyms • Examining taxonomy values ‒ Could apply matching to taxonomy values? ‒ Could be a “bonus” to items that have taxonomy? • May want to make weights (e.g., impact of age) consider source or class of content • Currently, in our search engine, best bets are automatically included. ‒ Would prefer to have them not included to see where they end up organically. • Also, in our search engine, the exact order on a page has not been replicated so we can’t include the exact order as a factor 33
  34. 34. Validating the approach Does this reflect how a human user would perceive the quality? • This idea seems reasonable, but do we really have a way to determine if it is valid? ‒ Or do we run the risk that this would lead to “local maximums” for the factors measured without meaningfully improving the user’s experience? • So far, I have 2 independent ways to assess this ‒ Comparing the results of this against a human assessment ‒ Comparing the results of this against other factors that have been used as indicators of quality in the past
  35. 35. Validating the approach, p2 Comparing against a human assessment • One of our on-going operations in GCKM is to review the quality of results for a very small number of terms ‒ The chart on this slide takes the output of the most recent such review for a subset of our “super search terms” and compares it against the programmatically calculated quality ‒ There is at least a correlation between the automated score (the Y axis) and the manual score (the X axis) [Chart: automated score vs. manual score, with linear fit y = 0.2781x + 0.3826, R² = 0.5803.]
  36. 36. Validating the approach, p3 Comparing against searches per visit • Within our search program, we use the ratio of searches per visit for a term as an indicator of the quality of the results ‒ The more pages of results a user looks at for a term, the harder it is for the user to find what they are looking for ‒ The chart on this slide displays a comparison between searches/visit (X-axis) and the automated quality score (Y-axis) ‒ Again, we can see that there is a correlation, though perhaps not as strong as in the comparison with the manual review [Chart: quality vs. searches/visit, with linear fit y = -0.6857x + 55.234, R² = 0.5225.]
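The linear fits and R² values quoted on these validation slides are the kind of output an ordinary least-squares fit produces. A minimal sketch of that calculation follows, assuming paired lists of scores; the data is invented and perfectly linear, purely to show the mechanics, and does not reproduce the numbers on the slides.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical manual scores (x) and automated scores (y).
manual = [0.0, 1.0, 2.0, 3.0]
automated = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r_squared = linear_fit(manual, automated)
print(slope, intercept, r_squared)
```

An R² in the 0.5 to 0.6 range, as reported on the slides, means the fitted line explains a bit more than half of the variance in the automated score, which is consistent with the remark that individual items can vary significantly while the aggregate holds up.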
  37. 37. Validating the approach, p4 Summing up • At this point, I am confident that the quality assessment we are producing automatically is reflecting the user’s general experience. ‒On individual items, it can vary significantly but in aggregate it appears to be valid ‒I have not yet dug into this but the automation enables the weights of each factor to be adjusted and it’s possible that we can get the automated score closer still to the “real” quality of results through adjusting weights 37
  38. 38. Additional benefits of this tool Better analysis • Given that this utility can output data in a spreadsheet format, this presents some other capabilities ‒Estimate total “search impressions” for specific targets • Analyze “search impressions” vs. usage ‒Analyze spread of returned results across sources ‒Analyze quality along a variety of dimensions (source, taxonomy values, etc.) ‒Comparing results sets between terms that should show similar results • E.g., how similar are the results really for two synonyms? ‒Also, comparing result sets along a temporal dimension • How much change is there from one month (week) to the next? ‒Analyzing factors by depth into the “long tail” ‒Evaluating the quality of results for auto-complete terms 38
  39. 39. Quality of results split by taxonomy on the content Better analysis - examples • Quality of results averaged over the service area assigned to content [Chart: “Quality by Service Area of content”, showing average quality for each service area (Enterprise Applications, Human Capital, Outsourcing, Strategy & Operations, Technology Integration) against the overall average; vertical axis from 33.0 to 38.0.]
  40. 40. Quality of results by depth into the “long tail” Better analysis - examples • A chart of the quality of the result pages by how far into the long tail a search term is [Chart: “Quality by Depth into the long tail”, quality score against depth into the long tail (0 to 17,500), with power trend line y = 55.685x^-0.14, R² = 0.5253.]
  41. 41. Quality over time: comparing before and after an upgrade Better analysis - examples • This chart shows the # of terms by their change in quality through an upgrade of our search engine; overall change was +2%! [Chart: “Change in Quality through an upgrade”, a histogram of the number of terms by percentage change in quality, from -46% (worse) to +81% (better).]
  42. 42. And, finally For more about search analytics, I would highly recommend: • “Search Analytics for your Site” by Lou Rosenfeld • – edited by Avi Rappoport Also, you can find my own writings on search analytics (along with a variety of other KM topics) on my blog: •