
Natural Language Processing and Search Intent Understanding C3 Conductor 2019 Dawn Anderson

This talk looks at the ways in which search engines are evolving to better understand linguistic nuance in natural language processing and searcher intent.



  1. “What Happens in Vagueness Stays in Vagueness” Dawn Anderson
  2. If I came into your hardware store… Image attribution: Acabashi [CC BY-SA 4.0]
  3. And asked for “fork handles”
  4. It kind of sounds like “four candles”
  5. What about if I said “Got any O’s”?
  6. It reads like O’s, but it sounds like other things as well, such as… 0 (zero)s… for the gate
  7. Homophones – examples: ‘four candles’ and ‘fork handles’
  8. What about if, in the very next sentence, I asked “Got any P’s?”
  9. You’d presume I meant P’s
  10. Because of the context of the previous question
  11. Not… garden peas
  12. The Two Ronnies (British comedians) – Name Droppers, The Confusing Library, Four Candles, Crossed Lines, Mastermind
  13. Almost every other word in the English language has multiple meanings
  14. “The meaning of a word is its use in the language” (Ludwig Wittgenstein, 1953) Image attribution: Moritz Nähr [Public domain]
  15. The most important thing to remember is I have a Pomeranian called Bert
  16. Disclaimer: I am NOT a data scientist
  17. But I will be talking about some concepts covering: Data Science, Information Retrieval, Algorithms, Linguistics, Information Architecture, Library Science, Category Theory
  18. Since… these are all areas connected to how search engines (try to) find the right information, for the right informational need, at the right time, for the right user
  19. ‘Information retrieval’ in web search: to extract informational resources to meet a search engine user’s information need at time of query.
  20. Let us first take a very simplistic look at how we know search engines work
  21. It’s just like gathering & organizing books in a library system, or using an old card index system
  22. But instead we are taking words (or phrases) and recording where they live
  23. EXAMPLE Inverted Index: Text to Doc ID Mapping
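A minimal sketch of the inverted index idea above, in Python. The documents and terms are invented for illustration; real engines add tokenisation, stemming, ranking and much more:

```python
# Minimal inverted index sketch: map each term to the set of
# document IDs ("word homes") it appears in.
from collections import defaultdict

docs = {
    1: "four candles for the mantelpiece",
    2: "fork handles and garden forks",
    3: "candles and candle holders",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# Query: intersect posting sets to find documents containing all terms
def search(query):
    postings = [index[t] for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

print(search("candles"))        # [1, 3]
print(search("fork handles"))   # [2]
```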
  24. And then picking one (or some) of these ‘word homes’ (documents) to meet a query
  25. The hard part is knowing how to choose the right documents, in the right order, at the right time
  26. Since ‘relevance’ to one user is not ‘relevance’ to another
  27. For some queries there can be only one answer
  28. And even… zero-query queries – the user is the query
  29. Where the user might be looking for a restaurant whilst travelling at 60 mph on a highway
  30. So… just what is the right information need, for the right user, at the right time?
  31. It Depends…
  32. Who? What? When? Where? Why?
  33. Relevance matching to query requires: understanding the meaning of words in content & query (What?); understanding the meaning of a word’s context in content & query (What?); understanding the user’s context (Who / Where / When / Why?); understanding collaboration (past queries / popularity / reinforcement / learning to rank)
  34. Matching ‘content’ with ‘intent’ requires increasing precision
  35. A lot of content is kind of unfocused
  36. Each document (page) is largely just a ‘stream of words’
  37. Every day there are huge volumes of new indexable data
  38. Image credit: Paul Newbury
  39. Since every single Tweet is a new web page
  40. Many websites (and webpages) are not logically organized at all. Unstructured data is voluminous; filled with irrelevance; lacks focus; riddled with nuance; lots of meaningless text and further ambiguating jabber
  41. Most text-filled web pages could be considered unstructured, noisy data. Blog == Blah Blah
  42. Structured versus unstructured data • Structured data has a high degree of organization • Readily searchable by simple search engine algorithms or known query languages (e.g. SQL) • Logically organized • Often stored in a relational database
  43. When we compare them with highly organized relational database systems
  44. A form of structured (& semi-structured) data – Entities, Knowledge Graphs, Knowledge Bases & Knowledge Repositories
  45. “Entities help to bridge the gap between structured and unstructured data” (Krisztian Balog, ECIR 2019 Keynote)
  46. Author of Entity-Oriented Search – free on Open Access
  47. Using structured data is an obvious way to disambiguate in both content & query understanding
  48. Two things (entities) are similar if they have a not-so-distant common ancestor
  49. ‘Is A’ Hierarchies
  50. Knowledge Graphs use triples (subject, predicate, object)
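The (subject, predicate, object) triples above can be sketched as a toy Python structure. The entities and ‘is_a’ relations here are illustrative, not taken from any real knowledge graph:

```python
# A tiny knowledge-graph sketch using (subject, predicate, object) triples.
triples = [
    ("Jaguar",     "is_a", "Cat"),
    ("Cat",        "is_a", "Mammal"),
    ("Pomeranian", "is_a", "Dog"),
    ("Dog",        "is_a", "Mammal"),
]

def ancestors(entity):
    """Walk 'is_a' edges upward to collect an entity's ancestors."""
    found = []
    frontier = [entity]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == "is_a":
                found.append(o)
                frontier.append(o)
    return found

# Two entities are similar if they share a not-so-distant common ancestor:
common = set(ancestors("Jaguar")) & set(ancestors("Pomeranian"))
print(common)  # {'Mammal'}
```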
  51. ‘Is A’ concepts in entities & their relationships can be mapped & categorised
  52. A well-organized website can resemble a knowledge graph
  53. Since a website is NOT all unstructured data, even before structured data markup: it can have a hierarchy; it can have weighted sections; it can have metadata; it (often) has a tree-like structure
  54. As long as there is understanding of notions of categorical ‘inheritance’
  55. Categories & subcategories
  56. Semi-structured data • The hierarchical nature of a website • Tree structure • Well sectioned, including clear containers and meta headings • An ontology map between semi-structured and structured
  57. Internal linking can be as much about ontology mapping as crawl optimisation
  58. And many pages lack the things that emphasise important topics and structure
  59. Ontology-Driven Natural Language Processing. Image credit: IBM
  60. But even named entities can be polysemic
  61. Did you mean? • Amadeus Mozart (composer) • Mozart Street • Mozart Cafe
  62. And verbally… who (what) are you talking about? “Lyndsey Doyle” or “linseed oil”?
  63. And not everyone or everything is mapped to the knowledge graph
  64. On their own, single words have no semantic meaning
  65. Even if we understand the entity (thing) itself, we need to understand the word’s context
  66. Semantic context matters • He kicked the bucket • I have yet to cross that off my bucket list • The bucket was filled with water
  67. Unfortunately… when things lack topical focus and relevance
  68. How can search engines fill in the gaps between named entities?
  69. When they can’t even tell the difference between Pomeranians and pancakes
  70. They need ‘text cohesion’. Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning. Without surrounding words, the word ‘bucket’ could mean anything in a sentence
  71. If I said to you… “I’ve got a new Jaguar”
  72. “It’s in the garage” (sidenote: this is not my garage)
  73. You probably wouldn’t expect to see this
  74. Because garage and car go together
  75. The ‘jaguar’ (cat) is the odd one out
  76. Garage, car and jaguar ‘co-occur’ in common language together – ‘garage’ added context to ‘jaguar’ the ‘car’
  77. But if we understood a topic is about felines, we might be more confident of a jaguar ‘cat’
  78. “You shall know a word by the company it keeps” (John Rupert Firth, 1957)
  79. Natural Language Disambiguation
  80. Probabilistic ‘Guesstimation’
  81. Teaching machines to understand which words live near each other in context
  82. Then we can disambiguate through co-occurrence
  83. Using ‘Distributional Similarity’ (Relatedness)
  84. Nearest Neighbours (Similarity) Evaluations: KNN – K-Nearest-Neighbour
  85. Two words are similar if they co-occur with similar words
  86. Two words are similar if they occur in a given grammatical relation with the same words: harvest, peel, eat, slice
  87. First-level relatedness – words that appear together in the same sentence
  88. Second-level relatedness – words that co-occur with the same ‘other’ words
  89. Coast and shore example: coast and shore have a similar meaning; they co-occur in first- and second-level relatedness documents in a collection; they would receive a high similarity score
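The coast/shore idea can be sketched with toy co-occurrence vectors and cosine similarity. The corpus is invented and tiny; real systems learn from very large collections:

```python
# Sketch of distributional similarity: two words are similar if they
# co-occur with similar words.
from collections import Counter
from math import sqrt

sentences = [
    "the boat sailed along the coast",
    "the boat sailed along the shore",
    "waves crashed on the coast",
    "waves crashed on the shore",
    "the jaguar slept in the garage",
]

def context_vector(word):
    """Count the words that appear in the same sentence as `word`."""
    counts = Counter()
    for s in sentences:
        tokens = s.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 'coast' and 'shore' keep identical company, so they score highly;
# 'jaguar' keeps different company.
print(cosine(context_vector("coast"), context_vector("shore")))   # ≈ 1.0
print(cosine(context_vector("coast"), context_vector("jaguar")))  # much lower
```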
  90. Language models are trained on very large text corpora or collections (loads of words) to learn distributional similarity
  91. Vector representations of words (word vectors)
  92. Models learn the weights of the similarity and relatedness distances
  93. An important part of this is ‘Part of Speech’ (POS) tagging
  94. Continuous Bag of Words (CBoW) method, or Skip-gram (the opposite of CBoW): taking a continuous bag of words, use a context window of size n (n-gram) to ascertain which words are similar or related, using distances in vector space to create vector models and word embeddings
  95. A moving word ‘context window’
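The moving context window can be sketched as the extraction of (target, context) training pairs, as used by skip-gram. The window size and sentence here are illustrative:

```python
# Sketch of a moving context window (2 words each side), the raw
# material for CBoW / skip-gram training pairs.
tokens = "you shall know a word by the company it keeps".split()
window = 2

pairs = []
for i, target in enumerate(tokens):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            pairs.append((target, tokens[j]))  # (target, context) skip-gram pair

print(pairs[:4])
# [('you', 'shall'), ('you', 'know'), ('shall', 'you'), ('shall', 'know')]
```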
  96. And build vector space models for word embeddings: king − man + woman = queen
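The famous analogy can be illustrated with hand-made toy vectors. Real embeddings are learned, have hundreds of dimensions, and the match is the *nearest* vector rather than an exact one:

```python
# Toy integer "embeddings" (dimensions: royalty-ness, maleness, femaleness).
king  = [9, 8, 1]
man   = [1, 8, 1]
woman = [1, 1, 9]
queen = [9, 1, 9]

# king − man + woman lands on the 'queen' vector.
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)  # [9, 1, 9]
```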
  97. TensorFlow (tool) & e.g. Word2Vec or GloVe (language models)
  98. Layers Everywhere
  99. Concept2Vec: ontological concepts
  100. Google’s Topic Layer is a new layer in the Knowledge Graph
  101. Example: Microsoft Concept Distribution Layer
  102. Past language models (e.g. Word2Vec & GloVe) built context-free word embeddings
  103. Did you mean “bank”? Or did you mean “bank”?
  104. Most language models are uni-directional. They can traverse the word’s context window from only left to right, or only right to left – in one direction, but not both at the same time. (Example source text: “Writing a list of random sentences is harder than I initially thought it would be”)
  105. They can only look at the words in the context window before, and not the words in the rest of the sentence, nor the sentence to follow
  106. Often the next sentence REALLY matters
  107. I remember the last words my Grandpa said before he kicked the bucket… “How far do you reckon I could kick this bucket?”
  108. Meet BERT
  109. Not the Pomeranian BERT
  110. BERT (Bidirectional Encoder Representations from Transformers)
  111. BERT is different. BERT uses bi-directional language modelling – the FIRST to do this. BERT can see both the left-hand and the right-hand side of the target word. (Example source text: “Writing a list of random sentences is harder than I initially thought it would be”)
  112. BERT has been open sourced by Google AI
  113. Google’s move to open source BERT may change natural language processing forever
  114. BERT uses ‘Transformers’ & ‘Masked Language Modelling’
  115. Masked language modelling stops the target word from seeing itself
  116. BERT can see the WHOLE sentence on either side of a word (contextual language modelling) and all of the words almost at once
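Masked language modelling can be sketched as: hide a token, then score candidate fills using context on BOTH sides of the gap. The corpus, candidates and counting below are a toy stand-in for the deep bidirectional Transformer BERT actually uses:

```python
# Toy masked-language-modelling sketch.
corpus = [
    "he kicked the bucket yesterday",
    "she kicked the bucket too",
    "fill the bucket with water",
    "he kicked the ball hard",
]

def score(left, candidate, right):
    """Count corpus occurrences of candidate with this left AND right neighbour."""
    hits = 0
    for s in corpus:
        toks = s.split()
        for i, t in enumerate(toks):
            if (t == candidate and i > 0 and i + 1 < len(toks)
                    and toks[i - 1] == left and toks[i + 1] == right):
                hits += 1
    return hits

# The target word is masked so it cannot see itself:
masked = ["he", "kicked", "the", "[MASK]", "yesterday"]
candidates = ["bucket", "ball", "water"]
best = max(candidates, key=lambda c: score("the", c, "yesterday"))
print(best)  # bucket
```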
  117. BERT has been pre-trained on a lot of words… on the whole of the English Wikipedia (2,500 million words)
  118. BERT can identify which sentence likely comes next from two choices
  119. The ML & NLP community are very excited about BERT
  120. Vanilla BERT provides a pre-trained starting-point layer for neural networks in machine learning & diverse natural language tasks
  121. Everybody wants to ‘Build-a-BERT’. Now there are loads of algorithms with BERT
  122. Whilst BERT has been pre-trained on Wikipedia, it is fine-tuned on question-and-answer datasets
  124. Researchers compete over Natural Language Understanding with e.g. SQuAD (Stanford Question Answering Dataset)
  125. BERT now even beats the human reasoning benchmark on SQuAD
  126. Not to be outdone – Microsoft also extends on BERT with MT-DNN
  127. In GLUE – it’s humans, MT-DNN, then BERT
  128. It’s not just words in content that need to be disambiguated though
  129. How can search engines understand intents?
  130. Query classifications – there are some we know of already
  131. We need to understand how queries have been classified by search engines
  132. Google’s Quality Raters’ Guidelines simplify & extend these: Know query == informational; Website query == navigational; Do query == transactional; Visit-in-person == local intent
  133. There are also several types of queries too (Krisztian Balog, ECIR, 2019): keyword queries (normal keyword queries); keyword++ queries (faceted / filtered queries); zero-query queries (the user is the query); natural language queries; structured queries (e.g. SQL)
  134. ‘Dresses’ is clearly classified as a transactional query
  135. Whilst keyword research tools are useful... the SERPs tell us some secrets on ‘initial intent’ detection
  136. But if I searched for “fork handles”
  137. Would I mean “handles for forks”?
  138. The organic results are NOT for fork handles
  139. The Two Ronnies – ‘Four Candles’
  140. No high organic ranking candles or forks
  141. Apart from… eBay selling an actual fork handle in position 8
  142. Almost completely ‘informational & video’ results (not transactional)
  143. The overwhelming ‘intent’ was detected
  144. Even in voice search & assistant
  145. Temporal dynamic intent (‘burstiness’) is a huge factor for intent
  146. At certain times far more intents will be transactional
  147. “dresses”, “shoes”, “bags” really means “buy dresses”, “buy shoes”, “buy bags”, “dress sales”, “shoe sales”
  148. And sometimes queries spike for reasons only a particular audience would understand (temporal queries)
  149. Sometimes it is other events which trigger unexpected queries
  150. [Four candles] interest over time
  151. [Fork handles] interest over time
  152. Often intents can be modelled according to predicted intent shifts
  153. Google Trends will only show interest, not intent
  154. The exact same queries have different intent at different times & different locations
  155. Let’s take the query [Easter]
  156. What did you really mean when you searched for ‘Easter’? When did you search → what you mostly meant: a few weeks before Easter → “When is Easter?”; a few days before Easter → “Things to do at Easter”; during Easter → “What is the meaning of Easter?” (Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.)
  157. “Easter” query intent shift
  158. Predicting the future with web dynamics: “The journey to predict the future” – Kira Radinsky at TEDxHiriya
  159. This is ‘query intent shift’
  160. “When users’ information needs change over time, the ranking of results should also change to accommodate these needs.” (Radinsky, 2013)
  161. Your ranking flux might well be shifting query intents at scale
  162. The passage of time adds new meaning sometimes too
  163. Another great ‘Ronnies’ sketch, BTW
  164. The rise and fall of the Blackberry?
  165. In query understanding, sometimes users don’t know what they want
  166. Sometimes the searcher query is a ‘cold start’ query
  167. Broad queries might call for result diversification due to lack of intent detection
  168. Search engines may return a blend of results to match these: freshness, serendipity, novelty, diversity
  169. The searcher has to click around to provide feedback on their intent, or reformulate the query by entering something else (‘query refinement’)
  170. To then deliver sequential queries with greater intent understanding
  171. Query refinement says… “Your move next”
  172. Sometimes there are not enough precise results either
  173. And result precision is not possible
  174. And this can increase recall due to query expansion or relaxation
  175. Precision versus recall in search results
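Precision and recall can be illustrated with a toy result set. The document IDs and relevance labels are invented for the example:

```python
# Precision vs recall sketch for a single query's result set.
retrieved = {"doc1", "doc2", "doc3", "doc4"}   # what the engine returned
relevant  = {"doc2", "doc4", "doc7"}           # what actually meets the need

true_positives = retrieved & relevant
precision = len(true_positives) / len(retrieved)   # 2/4 = 0.5
recall    = len(true_positives) / len(relevant)    # 2/3 ≈ 0.67

print(precision, recall)
# Query expansion tends to grow `retrieved`: recall rises,
# but precision often falls.
```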
  176. The intent tied to the page type matters too
  177. Different features matter more to users depending on the domain: news (freshness); jobs (salary, job title, location); restaurants (location, cuisine); shopping (price)
  178. In theory… a consolidated page should rank higher… but…
  179. Who? What? When? Where? Why?
  180. Mixing ‘intent’ on target pages can be like oil and water
  181. So watch out for random informational blurb on ecommerce pages
  182. Watch out for both topical & intent drift
  183. And watch out you don’t lose a featured snippet by changing intent
  184. Focus & disambiguate
  185. One oar on topic – the other on intent
  186. To keep the boat going straight
  187. But wait… understanding a word’s context more is NOT understanding ‘the whole context’
  188. Where the user is truly ‘the query’
  189. Since humans are unique individuals
  190. Truly PERSONAL AI is not possible without a PERSONAL KNOWLEDGE GRAPH (Krisztian Balog, ECIR 2019)
  191. Assistant + Home + Discover + Search App + Desktop
  192. A recent Microsoft personal knowledge graph patent
  193. Semantic query understanding example. Source & image attribution: NTENT
  194. That is a whole different ‘kettle of fish’
  195. And that is for another time…
  196. In the meantime… remember…
  197. Sources, references, further reading
  198. • Balog, K., 2018. Entity-Oriented Search. Springer (Open Access). [Accessed 06 May 2019]. • Boyd-Graber, J., Hu, Y. and Mimno, D., 2017. Applications of topic models. Foundations and Trends in Information Retrieval, 11(2-3), pp.143-296. • ECIR 2019 Proceedings. [Online]. [Accessed 06 May 2019]. • Gabrilovich, E. and Markovitch, S., 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI (Vol. 7, pp. 1606-1611). • Hakkani-Tur, D., Tur, G., Li, X. and Li, Q., Microsoft Technology Licensing LLC, 2017. Personal knowledge graph population from declarative user utterances. U.S. Patent Application 14/809,243. • Lim, Y.J., Linn, J., Liang, Y., Steinebach, C., Lu, W.L., Kim, D.H., Kunz, J., Koepnick, L. and Yang, M., Google LLC, 2018. Predicting intent of a search for a particular context. U.S. Patent Application 15/598,580. • Lotfi, A., Bouchachia, H., Gegov, A., Langensiepen, C. and McGinnity, M., 2018. Advances in Computational Intelligence Systems.
  199. • Lohar, P., Ganguly, D., Afli, H., Way, A. and Jones, G.J., 2016. FaDA: Fast document aligner using word embedding. The Prague Bulletin of Mathematical Linguistics, 106(1), pp.169-179. • McDonald, R., Brokos, G.I. and Androutsopoulos, I., 2018. Deep relevance ranking using enhanced document-query interactions. arXiv preprint arXiv:1809.01682. • NTENT, 2019. Query Understanding. [Online]. [Accessed 09 May 2019]. • Plank, B. Keynote – Natural Language Processing. • Radinsky, K. TEDx talk. • Radinsky, K., 2012. Learning to predict the future using Web knowledge and dynamics. In ACM SIGIR Forum (Vol. 46, No. 2, pp. 114-115). ACM.
  200. • Sherkat, E. and Milios, E.E., 2017. Vector embedding of Wikipedia concepts and entities. In International Conference on Applications of Natural Language to Information Systems (pp. 418-428). Springer, Cham. • Syed, U., Slivkins, A. and Mishra, N., 2009. Adapting to the shifting intent of search queries. In Advances in Neural Information Processing Systems (pp. 1829-1837).
  204. • Semantic similarity and relatedness as scaffolding for natural language processing. • gensim: models.word2vec – Word2vec embeddings, 2019. [Online]. [Accessed 09 May 2019].