Enterprise search information search behaviour

Some practitioners state that users in an enterprise search deployment enter far fewer words in a search query (1.5 on average) than on the Internet (3.0 on average), and infer that this is one of the causes of poor outcomes. This short article argues that this enterprise search user behaviour, rather than being a cause, is actually a symptom of factors related to the enterprise environment, including corpus sizes and search query parsing algorithms. User search behaviour (agency) may develop as a result of corpus size and query parsing algorithms (structure), which would explain some of the differences in search query length between Internet search engines, site search and enterprise search deployments. These structures may act as a constraint in many enterprises, with user behaviour adapting to them. This shift in thinking may enable more effective interventions and solution design.

Published in: Business

Short enterprise search queries: Are users really to blame?
Paul H Cleverley, Nov 2017

Internet Search

The mean number of words used in Internet search queries appears to have grown over time. Some studies show that between 1996 and 1998 the mean number of words used in a search query was 1.2 to 1.5, but that by 1999-2004 it had grown to 2.5 to 2.6 [1]. Recent studies of Google (2011) place the mean number of words in a query at 3.32 (or 3.08 for a range of Internet search engines) [2]. There will clearly be differences arising from user knowledge and task difficulty [3], including between lookup/known item search (where there is a right answer) and exploratory search tasks, so these data are a generalized, smoothed average.

[1] http://erichorvitz.com/queryrefine.pdf
[2] http://www.sciencedirect.com/science/article/pii/S0920548911000808
[3] http://onlinelibrary.wiley.com/doi/10.1002/pra2.2016.14505301063/pdf
Explanations in the literature for this increase over time have focused on several areas.

Firstly, information volumes and competing items. As the web grows, the hypothesis is that more refinement is required to locate what people are looking for. Google indexed 17 million websites in 2000. The figure is over 1 billion today, with a particularly sharp increase in the past 5 years, during which the volume has tripled [4]. This may have prompted some improvements in search literacy out of necessity.

Secondly, semantic search [5], where Internet search engines use more advanced algorithms to parse queries, matching user intent to information and moving away from classic keyword Boolean queries. Internet search users have probably found that adding more terms (through to full questions) helps them find what they need.

Thirdly, the increasing use of voice as a search method, although the evidence is conflicting. It ranges from voice having negligible impact on the overall average number of words used [6] to studies showing that voice searches on mobile devices (3.4 words) are around a word longer than typed searches on mobile devices (2.23 words) [7]. However, the explosive growth in voice search (particularly to service ‘local’ queries) is without question, as is the likelihood that Conversational Interfaces (CI) will gain in adoption. Some marketing companies go as far as stating that by 2020, 50% of all searches will be voice [8]. Regardless of the hype, voice is likely to have a profound impact on ‘search’ in the years to come, although these search lengths are still some way from the 11-14 word length typically used in human to human questions [9].

Finally, autocomplete/autosuggest [10] features (including crowdsourced previous queries from the community, thesaurus and n-gram type statistical suggestions/priming from rules and content) may also have encouraged users to select more words. For example, a query today using Google for ‘Samsung’ will prompt a suggestion of ‘Samsung Galaxy S8’.
Today in 2017, it has been reported that 8% of all Google search queries are questions [11] (‘What..’, ‘How..’, ‘When..’, ‘Why..’, ‘Who..’ etc.). Data from Google Trends [12] indicates this has more than doubled since 2008, after remaining flat between 2004 and 2008. There is some evidence that for Internet search engines it was as low as 1% in 2001 [13]. The increase may be associated with the autosuggest features first introduced in Google during 2008. However, whilst autosuggest may have influenced increases in the number of words used in search queries since 2008, it cannot explain previous rises.

In site search, practitioners have also suggested that support websites receive more questions than entertainment websites [14], illustrating a variability of intent that will not be picked up by overall averages. For example, ‘verticals’ such as retail that utilize autosuggest have much lower averages than Internet search (1.5 [15] to 1.65 [16] words per search query).

[4] http://www.internetlivestats.com/total-number-of-websites/
[5] https://searchenginewatch.com/sew/opinion/2411478/longer-search-queries-are-becoming-the-norm-what-it-means-for-seo
[6] https://neilpatel.com/blog/type-no-more-how-voice-search-is-going-to-impact-the-seo-landscape/
[7] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.205.8735&rep=rep1&type=pdf
[8] https://www.branded3.com/blog/voice-search/
[9] http://singularitybookreviews.com/john-smart-2013-exponential-increase-in-search-query-length/
[10] https://uxmag.com/articles/designing-search-as-you-type-suggestions
[11] https://moz.com/blog/state-of-searcher-behavior-revealed
[12] https://trends.google.co.uk/trends/explore?q=What,How,When,Why,Who&geo=,,,,&date=all,all,all,all,all#TIMESERIES
[13] http://eprints.qut.edu.au/4756/1/4756.pdf
[14] https://www.quora.com/What-percent-of-search-queries-are-phrased-as-questions

Enterprise Search

In the enterprise, the current practitioner view is that users input far fewer words in a search query than on the Internet. It continues to be reported in 2017 that an average of 1.5 words is typically used in an enterprise search query. This appears not to have changed over time, with a figure of 1.4 reported in 2005 [17]. There is some variation, with figures ranging from 1.81 [18] quoted in 2013 (up from 1.72 in 2003 for the same site search) to 1.54 in an enterprise search deployment in a large multinational in 2008 [19]. One hypothesis is that more specific terminology, corporate acronyms and the need for precision steer enterprise search behaviour towards shorter queries than Internet search.

There is potential for misunderstanding, with some reports quoting the modal average [20] [21] rather than the mean, or the average length across unique queries rather than across overall query volume (popularity).

Practitioners often assert that enterprise users should add more words to their enterprise search queries [22], and short queries are often put forward as one of the key causes of poor enterprise search results. There are two lines of evidence that may challenge this assertion.

Impact of information volumes in the enterprise

In a study of a large enterprise search deployment (Fig 1) with more than 10 million queries, a medium to strong correlation was found between corpus size and search query length.
Fig 1 – Average number of words in a search query compared to search index size (information volume) in an enterprise search deployment over four years

[15] https://www.statista.com/statistics/744854/retail-site-search-word-length-search/
[16] https://www.statista.com/statistics/744854/retail-site-search-word-length-search/
[17] http://onlinelibrary.wiley.com/doi/10.1002/meet.1450420115/full
[18] http://www.nowpublishers.com/article/Details/INR-053
[19] http://dougcornelius.com/2008/09/enterprise-search-at-procter-gamble
[20] http://trace.tennessee.edu/cgi/viewcontent.cgi?article=1669&context=utk_gradthes
[21] https://moz.com/blog/state-of-searcher-behavior-revealed
[22] https://openair.rgu.ac.uk/handle/10059/2403
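The distinction drawn above between the modal and the mean average, and between unique queries and overall query volume, can be made concrete with a small sketch. It assumes a search log is available as a plain list of query strings and that words are separated by whitespace; the function names and the sample log are hypothetical and are not taken from any of the studies cited.

```python
from statistics import mean, mode


def word_count(query: str) -> int:
    """Count whitespace-separated terms in a query (an assumed tokenisation)."""
    return len(query.split())


def length_summary(log: list[str]) -> dict[str, float]:
    """Summarise query length four ways for a list of raw query strings."""
    by_volume = [word_count(q) for q in log]       # every submitted query counts
    by_unique = [word_count(q) for q in set(log)]  # each distinct query counts once
    return {
        "mean_by_volume": mean(by_volume),
        "mode_by_volume": mode(by_volume),
        "mean_unique": mean(by_unique),
        "mode_unique": mode(by_unique),
    }


# Hypothetical log: two popular one-word queries plus three longer, rarer ones.
sample_log = (
    ["hr"] * 5
    + ["expenses"] * 3
    + ["travel policy", "pump maintenance schedule", "health and safety standard"]
)

summary = length_summary(sample_log)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up log the mode is one word on either basis, while the mean is noticeably higher over unique queries than over total volume, so a single quoted "average" can differ a good deal depending on which of these is meant.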
During a four year period, as the content volume grew from 30 million to 220 million items, the average number of words used in a search query increased from 1.83 to 2.65 (an increase of 45%). This may show adaptation of user search behaviour within a deployment.

Correlation is of course not causation. Changes in search algorithms (see next section) and in the user interface (e.g. the search box [23]) are among the other possible explanations for this trend. However, during the four years in the enterprise search deployment shown in Figure 1, there were no dramatic changes to either the search algorithms or the user interface functionality. Voice activated search is not relevant in this deployment at this time.

Another explanation is the possibility that user behaviour from the Google habitus (adding more words to a search query) was carried over into the enterprise. However, analysis of some random months from the enterprise search log indicates that questions (‘What..’, ‘How..’, ‘When..’, ‘Why..’, ‘Who..’ etc.) make up < 0.005% of queries and that this has remained relatively unchanged over the four years, which offers little support for this explanation. Another explanation could be changes in search task behaviour over time (lookup/known item versus exploratory) or the addition of new user communities with different needs, although there is no evidence for this.

The influence of autocomplete/autosuggest techniques (at least for frequent searches) has been hypothesized to increase the number of words selected by a user for search queries in the enterprise [24]. However, these techniques were always present in the enterprise deployment studied (Fig 1). Also, fortuitously (in this case), every 18 to 24 months the previous search suggestions were cleared out due to issues copying them into different server/cloud service environments. The increase in the number of words used in a search query shown in Figure 1 therefore cannot be attributed to an increasing build-up of ‘search suggestions’ driven by a growing population of previous searches.
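The two log measurements used in the argument above (the relative growth in query length and the share of queries phrased as questions) can be sketched as follows. This is a minimal illustration under stated assumptions: a query is treated as a question only when its first whitespace-separated term is one of the five question words named in the text, the function names are hypothetical, and the sample log is invented. Only the figures 1.83 and 2.65 come from the article.

```python
# The five question words listed in the text; the article's "etc." is not expanded here.
QUESTION_WORDS = ("what", "how", "when", "why", "who")


def is_question(query: str) -> bool:
    """True if the first term of the query is one of the question words."""
    terms = query.lower().split()
    return bool(terms) and terms[0] in QUESTION_WORDS


def question_share(log: list[str]) -> float:
    """Percentage of queries in the log that start with a question word."""
    if not log:
        return 0.0
    return 100.0 * sum(is_question(q) for q in log) / len(log)


def relative_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Figures reported in the text: average query length rose from 1.83 to 2.65 words.
growth = relative_increase(1.83, 2.65)
print(f"query length growth: {growth:.1f}%")

# Hypothetical log. Note that "whatsapp policy" is not counted as a question,
# because the whole first term is compared rather than a prefix.
sample_log = [
    "how to claim expenses",
    "hr",
    "Who is my safety rep",
    "whatsapp policy",
    "pump maintenance",
]
print(f"question share: {question_share(sample_log):.1f}%")
```

The growth figure works out at roughly 44.8%, which is consistent with the 45% quoted above. Matching on the whole first term is a deliberately conservative choice, and a production log analysis would probably need a fuller definition of what counts as a question.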
Some environments [19] with ‘small’ corpus sizes (<1 million items) report that as many as 16% of all queries lead to ‘no results’. This is unlikely to encourage users to use more words, as experiences and outcomes can shape behaviour.

Laboratory style control conditions are hard (virtually impossible) to apply in real world situations over several years in order to isolate independent variables. However, for the reasons discussed above and using inference to the best explanation, it is postulated that the trend (Fig 1) of an increasing number of words used in a search query in the enterprise is probably influenced by information volume factors. This is the effect of ‘structure’ on ‘agency’.

Search query parsing algorithms in the enterprise

Much of the sophistication in search query parsing utilized by Internet search engines does not appear to have made its way into many (most?) enterprise search deployments [21]. Although thesauri are increasingly used by organizations in their enterprise search deployments (to address areas like synonyms), the base technology for many deployments may still be fundamentally rudimentary Boolean keyword search.

[23] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.875.9558&rep=rep1&type=pdf
[24] http://journals.sagepub.com/doi/abs/10.1177/0165551514554522
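The difference between rudimentary Boolean keyword search and a more forgiving parser can be sketched in a few lines. This is an illustration only, not a description of any particular enterprise search product: matching is done on lower-cased whitespace tokens, the function names and the `max_dropped` parameter are hypothetical, and the document text is invented. The query is the example this article goes on to discuss.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens (an assumed, deliberately naive analyser)."""
    return set(text.lower().split())


def matches_strict_and(query: str, document: str) -> bool:
    """Rigid Boolean AND: every query term must occur in the document."""
    return tokens(query) <= tokens(document)


def matches_with_term_dropping(query: str, document: str, max_dropped: int = 1) -> bool:
    """Relaxed match: up to `max_dropped` query terms may be absent."""
    missing = tokens(query) - tokens(document)
    return len(missing) <= max_dropped


# Hypothetical document that contains every query term except 'frameworks'.
document = "PGT health and safety standard for site operations"

long_query = "PGT health and safety standard frameworks"
short_query = "PGT safety standard"

print(matches_strict_and(long_query, document))          # longer query misses under AND
print(matches_strict_and(short_query, document))         # shorter query still matches
print(matches_with_term_dropping(long_query, document))  # one dropped term is tolerated
```

Under the strict AND the longer query fails to retrieve the document while the shorter one succeeds, which is the kind of experience that could teach users to keep queries short. Real engines that relax matching usually do so through ranking rather than a hard threshold, so the `max_dropped` cut-off here is a simplification.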
For example, take a user making the query ‘PGT health and safety standard frameworks’. In many deployments they could be effectively ‘penalized’ if the most pertinent web page or document does not contain the last word ‘frameworks’ (but has all the others). A rigid Boolean query (logical AND between terms), without any query term dropping or probabilistic statistical matching techniques, could mean that longer search queries are actually detrimental. It is therefore postulated that search query parsing sophistication may also influence the number of words used in search queries within the enterprise. Users may adapt.

Summary

It may be time to move away from the often re-quoted and recycled metric that people enter only 1.5 words on average in a search query in enterprise search deployments. Whilst this might be true for some specific site search/retail verticals, the picture in larger enterprise search deployments may be quite different, or at least heterogeneous.

Some practitioners often blame users for not adding ‘enough’ words when making a search query. In certain situations there may be evidence to support this, and no doubt search literacy in enterprises can be significantly improved. However, there is evidence that when corpus index information volumes get bigger, users adapt and the number of words used in search queries increases.

Many organizations are making progress in improving their search query parsing to handle more ‘natural language’ type queries and in stimulating (priming) such queries through autosuggest manipulation. As well as rule based methods, the literature (and technology marketplace) is peppered with Machine Learning / Artificial Intelligence / Cognitive approaches, whether they utilize conversational interfaces or not. If these approaches are ultimately successful, it could be inferred that search logs will show the artefacts of this success: changes in user behaviour, with more words used in search queries.
It may be possible to benchmark such changes in query length, and this presents an area for research.

The differences between enterprise search and Internet search are well documented [25] [26]. Viewing the generally smaller number of words entered in search queries in the enterprise (compared to the Internet) as a symptom rather than a cause of sub-optimal outcomes may help deepen understanding and enable more effective solution design and interventions. With the growth of ‘big data’ and the rate of technological progress, it is highly likely that, one way or another, search behaviour is set for some major changes in the next few years as people adapt to new structures and new ‘norms’.

Acknowledgements

Appreciation to the following for their comments: Udo Kruschwitz (University of Essex), Paul Clough (University of Sheffield), Ed Dale (EY), Martin Baumgartel (Verizon Wireless), Martin White (IntranetFocus) and Charlie Hull (Flax).

[25] https://dl.acm.org/citation.cfm?id=1012297
[26] http://irsg.bcs.org/informer/2012/01/delivering-successful-search-within-the-enteprise/
