The User is The Query: The Rise of Predictive Proactive Search


Dawn Anderson's slides from TechSEO Boost 2019

Published in: Marketing

  1. 1. Dawn Anderson | @dawnieando | #TechSEOBoost | @CatalystSEM | THANK YOU TO THIS YEAR’S SPONSORS | The User is The Query: The Rise of Predictive Proactive Search | Dawn Anderson, Bertey
  2. 2. The rise of predictive, proactive search TechSEO Boost 2019 The User is The Query
  3. 3. “Today you are you! That is truer than true! There is no one alive who is you-er than you!” (Dr Seuss)
  4. 4. Said Dr Seuss… and Google
  5. 5. When introducing Google Feed (now Discover)
  6. 6. Today’s Topic: The User is the Query
  7. 7. Dawn Anderson | @dawnieando | #TechSEOBoost
  8. 8. Also… Meet Bert and Ted
  9. 9. There’s a problem with queries, content & users too
  10. 10. “In 1998 the web consisted of just 25 million pages…” (Ben Gomes, Google, 2018)
  11. 11. “… That’s roughly the equivalent number of those in a small library” (Ben Gomes, Google, 2018)
  12. 12. In 2019… we know the web is huge… billions of web pages (Netcraft, 2019)
  13. 13. App usage is huge too: by 2018 the App Store had 20 million registered developers (TechCrunch, 2018)
  14. 14. 42% of the global population use social media (Emarsys, 2019)
  15. 15. We are competing with programmatic solutions spraying content & information EVERYWHERE
  16. 16. Over-choice: Too much choice often has negative impacts
  17. 17. Almost 98% of visits are people window shopping. Average ecommerce conversion: +/- 2% (Monetate)
  18. 18. Despite this… users are still seeking even more information
  19. 19. The number of Google searches increases year on year (Internet Live Stats, 2018, curation from various sources)
  20. 20. 15% of queries every day are new (Google)
  21. 21. Humans forage (like bears) all over the place seeking information… we are informavores
  22. 22. Researching ALL THE THINGS… before making final decisions
  23. 23. We have become very good at filtering out things which are NOT interesting enough (8 second filter)
  24. 24. It’s NOT a short attention span thing
  25. 25. Otherwise we would not binge on ‘Stranger Things’
  26. 26. This is cognitive load management & information filtering
  27. 27. AT THE SAME TIME words are problematic. Ambiguous… polysemous… synonymous
  28. 28. Words often have multiple meanings. For example, “like” can be any of 5 possible parts of speech (POS)
  29. 29. Spoken word can be worse. Like “four candles” and “fork handles”
  30. 30. Which does not bode well for the likes of conversational search
  31. 31. In query understanding sometimes users don’t know what they want either
  32. 32. Sometimes exactly the same users express an information need in a different way
  33. 33. Sometimes different users use lots of different ways to mean exactly the same thing
  34. 34. ‘The Vocabulary Problem’ • Furnas, G.W., Landauer, T.K., Gomez, L.M. and Dumais, S.T., 1987. The vocabulary problem in human-system communication. Communications of the ACM, 30(11), pp.964-971.
  35. 35. One of the inventors of ‘Latent Semantic Indexing’, created to solve ‘The Vocabulary Problem’ whilst researching at Bellcore (1990)
  36. 36. BTW… No-one said LSI was used by Google (aside)
  37. 37. Sometimes the searcher query is a ‘cold start’ query
  38. 38. Broad or cold start queries might call for result diversification due to lack of intent detection
  39. 39. Search engines may return a broad blend of results to match these queries: Freshness, Serendipity, Novelty, Diversity
  40. 40. AKA Result Diversification
  41. 41. The searcher has to click around to provide feedback on their intent or reformulate the query by entering something else (‘query refinement’)
  42. 42. To then deliver sequential queries with greater intent understanding
  43. 43. Human in the loop
  44. 44. Query refinement says… “Your move next”
  45. 45. A kind of ‘probability-driven fork in the road’ (Sadikov et al, 2010, ‘Clustering Query Refinements by User Intent’)
  46. 46. BUT a word’s meaning & user intent / context combined are still very hard for search engines to understand
  47. 47. Despite assistance from Google’s BERT & progress in NLP
  48. 48. GLUE Benchmark Leaderboard
  49. 49. SuperGLUE Benchmarks
  50. 50. Stanford Question Answering Dataset (SQuAD) 2.0 • Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
  51. 51. MS MARCO Paper • Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R. and Deng, L., 2016. MS MARCO: A Human-Generated MAchine Reading COmprehension Dataset.
  52. 52. The exact same queries have different intent at different times & different locations
  53. 53. What did you really mean when you searched for ‘Easter’? When you searched, and what you mostly meant: a few weeks before Easter: ‘When is Easter?’; a few days before Easter: ‘Things to do at Easter’; during Easter: ‘What is the meaning of Easter?’ • Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.
  54. 54. Modeling & Predicting Behavioural Dynamics on The Web (Radinsky et al, 2012)
  55. 55. “When users’ information needs change over time, the ranking of results should also change to accommodate these needs.” (Radinsky, 2013)
  56. 56. This is ‘Query Intent Shift’
  57. 57. The intent of queries changes over time
  58. 58. The passage of time adds new meaning to queries sometimes too
  59. 59. The rise and fall of the Blackberry?
  60. 60. ‘iPhone’ – Query Example (Google Quality Raters Guidelines)
  61. 61. Temporal Dynamic Intent (Burstiness) is a huge factor for intent
  62. 62. At certain times far more intents will be transactional
  63. 63. “dresses”, “shoes”, “bags” really means “buy dresses”, “buy shoes”, “buy bags”, “dress sales”, “shoe sales”
  64. 64. And sometimes temporal queries spike for reasons only a particular audience would understand
  65. 65. [Four candles] + [fork handles] interest over time
  66. 66. Sometimes it is other events which trigger unexpected queries
  67. 67. Your ranking flux might well be shifting query intents at scale
  68. 68. What a nightmare queries are
  69. 69. Maybe It’s Time For A Change?
  70. 70. Enter… The Next 20 Years of Search
  71. 71. Hmm… That sounds big Google… This is HUGE
  72. 72. Three FUNDAMENTAL shifts in search
  73. 73. Fundamental: “forming a necessary base or core; of central importance.”
  74. 74. Three Fundamental Shifts • The shift from answers to journeys • The shift from queries to queryless • The shift from text to visual information
  75. 75. The shift from text to more visual information
  76. 76. This feels like a huge UX / accessibility shift… Hoorah
  77. 77. Images are much easier to mentally consume than text & audio
  78. 78. Images & video engage… Images & video entertain… Images & video provoke emotion
  79. 79. Photography app usage had a 210% increase between 2016 and 2018 according to App Annie
  80. 80. People spend on average 2.6x more time on pages with video
  81. 81. Image search is curation. Totally different to text-based search
  82. 82. This is cognitive load management & information filtering
  83. 83. Go nuts with quality images & video
  84. 84. The shift from queries to queryless
  85. 85. “Queries Are Difficult To Understand in Isolation” (Susan Dumais, Microsoft Research, 2016)
  86. 86. “Easier if we can model: who is asking, what they have done in the past, where they are, when it is, etc.” (Susan Dumais, CIKM, 2016)
  87. 87. Better still… what about predicting the user’s informational needs to proactively make suggestions?
  88. 88. QueryLess: Next Gen Proactive Search And Recommender Engines (2016)
  89. 89. “Nevertheless, as the world is becoming more mobile-centric, this old-fashioned query-driven search scenario and click-based evaluation mechanism can no longer catch up with the rapid evolution of user demand on mobile devices.” (Song and Guo, 2016, Microsoft Research)
  90. 90. “Therefore, a more user-friendly, mobile-centric and scenario driven search paradigm that requires minimal user inputs is ready to come out” (Song and Guo, 2016, Microsoft Research)
  91. 91. It kind of sounds like Google Discover
  92. 92. At last announcement Google Discover had 800 million users (May, 2018)
  93. 93. It’s now on the mobile home page. It knows you… and the things you do… where you’ve been… where you’re going
  94. 94. “In many cases predicting informational needs removes the need for the query & reactive search engine” (Song & Guo, 2016)
  95. 95. Zero-Query Queries – No Query Required
  96. 96. Google’s Recommender Systems
  97. 97. QueryLess: Next Gen Proactive Search And Recommender Engines
  98. 98. Google Scholar is now a Recommender System Too
  99. 99. YouTube is a Recommender System
  100. 100. YouTube Feedback Controls are ‘The Human in The Loop’
  101. 101. Reinforcement learning thrives on rewards (implicit feedback)
  102. 102. Contextual Bandit Algorithms
  103. 103. The User (needs) is ‘The Query’
  104. 104. The shift from answers to journeys
  105. 105. An information need is rarely a task with a single finite item
  106. 106. It’s more like a series of little chunks (sub-tasks)
  107. 107. People are creatures of habit it seems
  108. 108. “Patterns were spotted about repetitive task driven search behaviours – predictable” (Song & Guo, 2016)
  109. 109. Tasks & timelines go hand in hand… it seems
  110. 110. “Predictable task timeline patterns are more prevalent on mobile devices” (Song & Guo, 2016)
  111. 111. For example, ‘checking the stock market’ every morning if you’re interested in stocks and shares
  112. 112. Mobile Device Sensors (14 sensors or more): proximity sensors, GPS sensor, ambient light sensor, accelerometer, compass, gyroscope, back-illuminated sensor
  113. 113. Many tasks & intents can be modelled according to predicted patterns
  114. 114. Personalising Search via Interests & Activities: 2005 paper awarded the 2017 SIGIR Test of Time Award. Cited 1029 times to date. • Teevan, J., Dumais, S.T. and Horvitz, E., 2005, August. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 449-456). ACM.
  115. 115. Google Discover looks to be focusing on hobbies, interests, news and social activities
  116. 116. Very Recent Microsoft Research
  117. 117. The Ideal is Personalisation • Not easy to achieve fully • Sparsity of data • Privacy concerns • Broken sequences
  118. 118. In the absence of personalization… collaborative filtering
  119. 119. There are other people nearly like you
  120. 120. You (and me) are unique… but may be similar
  121. 121. Matrix Factorisation (Netflix Recommendation System) + Matrix Factorisation (WALS Algorithm, TensorFlow)
  122. 122. TensorFlow Matrix Factorisation
  123. 123. Based on users liking the same things (with hidden common preferences)
  124. 124. Those sharing similar interests likely share other hidden interests too (i.e. the system does not know of them yet)
  125. 125. Google Discover ‘Topics’
  126. 126. Modelling cohorts
  127. 127. Understand the user, understand their cohort… Understand other similar informational needs
  128. 128. Progressive personalisation
  129. 129. The two sides of Assistant will both be proactive • Provide answers / search: Conversation Search • Help with activities / tasks: Conversation Actions
  130. 130. Extend Actions on Google using Machine Learning
  131. 131. Understand your customers to assist with AI. [Diagram: a perceived information need broken down into tasks, each made up of micro-tasks.] We can identify the user’s probable top tasks & subtasks. Identify their needs & what info they need along the way
  132. 132. Tell us about the tasks, order and steps involved in booking a hotel
  133. 133. Many built-in intents & many ‘coming soon’
  134. 134. Connecting Tasks Across Devices & Applications
  135. 135. Multi-platforming • Switching between search and video • Between search and a recommender system
  136. 136. Connections Between Things
  137. 137. Building a Personal Knowledge Graph
  138. 138. A Recent Microsoft Personal Knowledge Graph Patent
  139. 139. This is ‘Task-driven’ Search & Recommender Systems
  140. 140. Where the user is truly ‘the query’
  141. 141. Toward a Personal Knowledge Graph
  142. 142. Truly PERSONAL AI is not possible without a PERSONAL KNOWLEDGE GRAPH (Krisztian Balog, ECIR 2019)
  143. 143. But where will users be reached?
  144. 144. By 2022 PCs will account for only 19 percent of IP traffic (Comscore, 2019)
  145. 145. Interest over time for Google Home & Amazon Alexa
  146. 146. Assistant + Home + Discover + Search App + Desktop + Location Tracker + Calendar + Gmail + YouTube
  147. 147. In your car
  148. 148. In your console
  149. 149. Carriers for Recommender Systems
  150. 150. Toward An Audience of One
  151. 151. What Can SEOs Do About This?
  152. 152. Realise… your ranking tools are mostly wrong
  153. 153. Think CRM for SEO
  154. 154. Identify interests & affinity groups
  155. 155. Map every single informational need sub-task you can think of to the sections of a model like the RACE model
  156. 156. Build task timeline clusters
  157. 157. Map & cluster ‘Related’ content by task & temporal type. Categories are too broad, and topics may be too
  158. 158. Continually improve and update solid URL seasonal & temporal content
  159. 159. Contextual Order Matters
  160. 160. Continually improve and update solid URL evergreen content
  161. 161. Map content clearly to tasks and task timelines
  162. 162. Identify predictable patterns of user behavior
  163. 163. Understand the shared preferences, learn the hidden preferences
  164. 164. • Go big on evergreen content & keep updated • Optimise images well – think curation / collections • Map user journeys to content plans • Optimise video well – enhance with markup / transcription • Get personal – keep refining segments / personas • Identify & cluster content around task timelines • Use relatedness across content, tasks & temporality
  165. 165. Bias and reproducibility are a challenge
  166. 166. Reproducibility problems in research & RecSys (very high)
  167. 167. Bias on the web and recommender systems
  168. 168. Bias Considerations • Presentation Bias • Programming Bias • Audience Manipulated Bias (e.g. fake reviews) • Machine Learning / AI Bias (Black box algorithms) • Matthew’s Law • Zipfian Distribution of Web Content
  169. 169. NoBIAS Project
  170. 170. Spotify add novelty items to home page to avoid biased personalisation
  171. 171. Do yourself a favour and follow Mounia Lalmas @mounialalmas
  172. 172. And this polar bear
  173. 173. The QueryLess change will not come overnight … things move slowly
  174. 174. References
  175. 175. • Broder, A., 2002, September. A taxonomy of web search. In ACM SIGIR Forum (Vol. 36, No. 2, pp. 3-10). ACM. • Chuklin, A., Severyn, A., Trippas, J., Alfonseca, E., Silen, H. and Spina, D., 2018. Prosody Modifications for Question-Answering in Voice-Only Settings. arXiv preprint arXiv:1806.03957. • HigherVisibility. 2018. How Popular is Voice Search? | HigherVisibility. [ONLINE] Available at: • Filippova, K., Alfonseca, E., Colmenares, C.A., Kaiser, L. and Vinyals, O., 2015. Sentence compression by deletion with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 360-368). • Filippova, K. and Alfonseca, E., 2015. Fast k-best sentence compression. arXiv preprint arXiv:1510.08418. • Google Developers. 2018. Content-based Actions | Actions on Google | Google Developers. [ONLINE] Available at: actions/. [Accessed 18 June 2018]
  176. 176. References • Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16. • Sadikov, E., Madhavan, J. and Halevy, A., Google LLC, 2013. Clustering query refinements by inferred user intent. U.S. Patent 8,423,538. • Official Google Webmaster Central Blog. 2019. Official Google Webmaster Central Blog: Rolling out mobile-first indexing. [ONLINE] Available at: mobile-first-indexing.html. [Accessed 25 September 2019]. • Zhou, S., Cheng, K. and Men, L., 2017, April. The survey of large-scale query classification. In AIP Conference Proceedings (Vol. 1834, No. 1, p. 040045). AIP Publishing.
  177. 177. References • Search Engine Land. 2019. Starting July 1, all new sites will be indexed using Google's mobile-first indexing - Search Engine Land. [ONLINE] Available at: indexed-using-googles-mobile-first-indexing-317490. [Accessed 25 September 2019]. • Teevan, J., Dumais, S.T. and Horvitz, E., 2005, August. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 449-456). ACM. • Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R. and Deng, L., 2016. MS MARCO: A Human-Generated MAchine Reading COmprehension Dataset.
  178. 178. Keep in touch @dawnieando
  179. 179. Thanks for Viewing the Slideshare! – Watch the Recording: Or Contact us today to discover how Catalyst can deliver unparalleled SEO results for your business.
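
Slides 118 to 124 describe collaborative filtering: people who like the same things probably share other interests the system has not seen yet. The sketch below is one minimal way to illustrate that idea in Python. It uses a simple neighbour-based approach with cosine similarity, not the matrix factorisation (WALS) approach named on slides 121 and 122, and it is not how Google Discover or any other product is implemented. The user names, topics, scores and the `cosine` and `recommend` helpers are all hypothetical.

```python
from math import sqrt

# Hypothetical interest scores (1 = weak, 5 = strong); all names are made up.
ratings = {
    "ana":  {"seo": 5, "nlp": 4, "photography": 1},
    "ben":  {"seo": 5, "nlp": 5, "recsys": 4},
    "cara": {"photography": 5, "travel": 4, "seo": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse score dicts (missing topic = 0)."""
    dot = sum(score * b[topic] for topic, score in a.items() if topic in b)
    norm_a = sqrt(sum(score ** 2 for score in a.values()))
    norm_b = sqrt(sum(score ** 2 for score in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank topics the user has not scored by a similarity-weighted sum
    of the scores given by other users."""
    own = ratings[user]
    totals = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(own, theirs)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for topic, score in theirs.items():
            if topic not in own:
                totals[topic] = totals.get(topic, 0.0) + sim * score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("ana", ratings))
```

In this toy data "ana" overlaps most with "ben", so ben's extra topic ranks above the topic that only "cara" follows. That is the "hidden common preferences" intuition from slide 123 at very small scale.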