quest is in the name: question answering systems

Question answering systems (QAS) are machines that directly answer a user's question. They are quickly becoming a new model for search engines as technology enables higher-precision answers. This talk covers QAS basics, Google research from the past year, and what we can do as search professionals for our sites' strategic direction.

Published in: Marketing

  1. 1. | #searchlove | @alexisKsanders quest is in the word.
  2. 2. | #searchlove | @alexisKsanders the foundation of science, math, philosophy, and intellectual exploration boils down to one thing… ironically, teaching programs struggle with inquiry-based learning, falling back on fact-based memorization – read for full thoughts: “The value of asking questions” by keith g. kozminski
  3. 3. | #searchlove | @alexisKsanders the (humble) question… questions are genuinely the building blocks of learning, the root of search, inquiry-based learning dates back to socrates
  4. 4. | #searchlove | @alexisKsanders …and their answers. the quality of our information becomes more important as the quantity increases
  5. 5. | #searchlove | @alexisKsanders which makes the process of answering questions fascinating!
  6. 6. | #searchlove | @alexisKsanders now (more than ever), as we have a massive information source. according to howtogeek.com, it is estimated that google held 15 exabytes (1 exabyte = 10^12 MB, a trillion MB)
  7. 7. | #searchlove | @alexisKsanders the internet gives us: ultimate diversity
  8. 8. | #searchlove | @alexisKsanders map of known universe and the internet https://www.cfa.harvard.edu/news/2011-16 https://internet-map.net/
  9. 9. | #searchlove | @alexisKsanders we have access to info w/: different opinions, points of view, backgrounds, countries, etc.
  10. 10. | #searchlove | @alexisKsanders our ability to get answers is limited only by our imagination. [image labels: the box, our potential]
  11. 11. | #searchlove | @alexisKsanders (and ability to ask the right questions)
  12. 12. | #searchlove | @alexisKsanders despite its beauty, the internet suffers from its: • size • low barrier to entry
  13. 13. | #searchlove | @alexisKsanders leading to: info overload, incorrect, incomplete, (ironically) ignorance, etc. so... many....i words. sculpture by alicia martin
  14. 14. | #searchlove | @alexisKsanders to find anything useful at all, we needed to filter w/ a machine (b/c time and computational speed), although we are better at comprehending and processing natural language questions (for now…) 01110101 01110011 01100101 01101100 01100101 01110011 01110011 useful.
  15. 15. | #searchlove | @alexisKsanders thus, we have the rise of information retrieval.
  16. 16. | #searchlove | @alexisKsanders information retrieval systems: an automated process that responds to a query by examining documents and returning relevant information, sorted. definition of IR from Modern Information Retrieval – Baeza-Yates and Ribeiro-Neto (1999)
  17. 17. | #searchlove | @alexisKsanders this implies that an optimal information retrieval system returns all relevant documents in a prioritized order. “searching health information in question-answering systems” maria-dolores olvera-lobo and juncal gutierrez-artacho (2013) Meadows 1993
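A minimal sketch of the definition on slides 16 and 17: take a query, examine the documents, and return the relevant ones in prioritized order. The scoring (term frequency weighted by a smoothed inverse document frequency), the whitespace tokenizer, and the names `tokenize` and `rank` are illustrative assumptions, not how any particular search engine works.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (score, document) pairs for relevant documents, best match first."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in doc_tokens:
        doc_freq.update(set(tokens))

    results = []
    for doc, tokens in zip(documents, doc_tokens):
        term_freq = Counter(tokens)
        # Rare terms count for more than terms that appear in most documents.
        score = sum(
            term_freq[term] * math.log(1 + n_docs / doc_freq[term])
            for term in tokenize(query)
            if term in term_freq
        )
        if score > 0:  # documents sharing no term with the query are not relevant
            results.append((score, doc))

    results.sort(key=lambda pair: pair[0], reverse=True)
    return results
```

Calling `rank("lunar samples", docs)` on a small list of strings returns only the documents that share a term with the query, ordered by score. Note that the output is still a list of documents the user has to read, which is the limitation the next slides discuss.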
  18. 18. | #searchlove | @alexisKsanders however, this implies: • users want to see webpages • users will evaluate them • the process is unidirectional (i.e., not interactive) • query & page share the same language “searching health information in question-answering systems” maria-dolores olvera-lobo and juncal gutierrez-artacho (2013)
  19. 19. | #searchlove | @alexisKsanders in reality: • users want fast answers (to fact-based questions) • users choose the higher results on the first page • search is haunted by confirmation bias CTR by industry study: https://twitter.com/AlexisKSanders/status/1001544770089553920
  20. 20. | #searchlove | @alexisKsanders when put under pressure, we either get diamonds… or crushed. (yay evolution)
  21. 21. | #searchlove | @alexisKsanders a natural evolutionary improvement: a machine that directly answers questions
  22. 22. | #searchlove | @alexisKsanders a.k.a., question and answering systems
  23. 23. | #searchlove | @alexisKsanders this isn’t a new idea (+58 years young…) the legacy starts with roots in the socratic method; however, automated q/a from databases started with BASEBALL (’61) and LUNAR (’72), which answered closed-domain questions relating to baseball and to lunar samples from the apollo mission, respectively.
  24. 24. | #searchlove | @alexisKsanders so, what is a question & answering (QA) system?
  25. 25. | #searchlove | @alexisKsanders QA is a computer science discipline within the fields of information retrieval & NLP which is concerned with building systems that automatically answer questions posed by humans in a natural language. - wikipedia https://en.wikipedia.org/wiki/Question_answering
  26. 26. | #searchlove | @alexisKsanders an interactive human-computer process that encompasses: • understanding users’ informational needs • typically expressed in a natural language query • retrieving relevant documents, data, or knowledge • extracting, qualifying, & prioritizing available answers • presenting, explaining responses in an effective manner definition from mark maybury, new directions in question answering (2004)
  27. 27. | #searchlove | @alexisKsanders layman’s terms: computer(s) answering human questions (in layman’s terms) recursive much?
  28. 28. | #searchlove | @alexisKsanders visual information from mark maybury, new directions in question answering (2004) • NLP: question/document analysis, information extraction, language generation, discourse analysis (i.e., ways in which language is used) • IR: query formulation, document retrieval, document analysis, id’ing relevant docs, ordering docs, relevancy feedback • human-computer interaction: user modelling, user preferences, displays, user interaction • Q/A
  29. 29. | #searchlove | @alexisKsanders QA process (at a very high level), over www + sources: • processing query: query decomposition, syntactic & semantic parsing, question analysis, translation, classification, expansion, matching, query reformulation • information retrieval: document analysis, retrieval, id’ing relevant documents, ordering, relevancy feedback • answer processing: answer analysis, id’ing candidates, extraction, validation, evaluation (rank) • answer display: representation
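A toy sketch of three of the stages named on slide 29 (processing query, information retrieval, answer processing), chained into one function. Everything here is an assumption made for illustration: the stopword list, the substring matching, the sentence splitter, and the function names. Real systems use parsing, classification, and learned ranking at each step.

```python
import re

# Assumed, tiny stopword list; real systems use far richer question analysis.
STOPWORDS = {"what", "who", "when", "where", "is", "the", "a", "of", "did", "was", "in"}


def process_query(question):
    """Processing query: reduce the question to content keywords."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def retrieve(keywords, documents):
    """Information retrieval: keep documents that match a keyword, best first."""
    scored = [(sum(k in doc.lower() for k in keywords), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]


def process_answer(keywords, ranked_docs):
    """Answer processing: extract the single sentence that matches the most keywords."""
    best, best_score = None, 0
    for doc in ranked_docs:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = sum(k in sentence.lower() for k in keywords)
            if score > best_score:
                best, best_score = sentence, score
    return best


def answer(question, documents):
    """Run the three stages in order and return one sentence (or None)."""
    keywords = process_query(question)
    return process_answer(keywords, retrieve(keywords, documents))
```

The difference from plain retrieval is the last stage: the function returns one extracted sentence instead of a ranked list of documents.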
  30. 30. | #searchlove | @alexisKsanders the challenge is that people & machines don’t process information in the same way… an oldie, but a goodie…  https://www.youtube.com/watch?v=gn4nRCC9TwQ
  31. 31. | #searchlove | @alexisKsanders types of QA problems: factoid, temporal, spatial, definitional, descriptional, biographical, opinionoid, multimedia / multimodal, multilingual. visual information from mark maybury, new directions in question answering (2004)
  32. 32. | #searchlove | @alexisKsanders common concepts discussed w/in research: • open domain (ask anything) vs. closed domain (ask specific area of qs) • deletion (answer by chopping existing lexical structures, like paraphrasing) vs. generative (making brand new content, generating sentences) • quadrant labels: smart-machine, very hard, hard, rules-based • examples: mitsuku, worbot, watson, drqa, pizzabot, eagli, baseball, lunar, etc. visual and concept: https://chatbotslife.com/ultimate-guide-to-leveraging-nlp-machine-learning-for-you-chatbot-531ff2dd870c
  33. 33. | #searchlove | @alexisKsanders why care: well, search engines care, a lot…
  34. 34. | #searchlove | @alexisKsanders we can see the seeds of this research: • featured snippets • PAA (people also ask) • voice
  35. 35. | #searchlove | @alexisKsanders shout out to bing for: multi-perspective answers chatbots integrated with SERPs (for seattle restaurants)
  36. 36. | #searchlove | @alexisKsanders end goal: a system that can respond to any question. a mix of humans’ natural language processing and a machine’s processing power.
  37. 37. | #searchlove | @alexisKsanders “QAS are becoming a model for the future of web search.” “Question answering systems: a review on present developments, challenges and trends” lorena kodra and elina kajo mece (computer engineering polytechnic university of Tirana) - 2017
  38. 38. | #searchlove | @alexisKsanders there’s a ton of: research, datasets, and competitions being actively worked on around QAS.
  39. 39. | #searchlove | @alexisKsanders “sentence compression by deletion with LSTMs” – '15 the goal of sentence compression is to generate a shorter paraphrase of a sentence. the deletion approach is a standard (i.e., not reformulating words). “Sentence Compression by Deletion with LSTMs” Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals (2015)
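A small sketch of what "compression by deletion" means in practice: the short version is built only by dropping tokens from the original, never by rewording or reordering. The keep/delete mask below is written by hand as an assumption for illustration; in the paper a trained LSTM makes the deletion decisions. The function names are hypothetical.

```python
def compress_by_deletion(tokens, keep_mask):
    """Keep tokens whose mask entry is truthy; everything else is deleted."""
    if len(tokens) != len(keep_mask):
        raise ValueError("need exactly one keep/delete label per token")
    return [tok for tok, keep in zip(tokens, keep_mask) if keep]


def is_deletion_compression(original, compressed):
    """True if `compressed` can be produced from `original` by deleting tokens only."""
    remaining = iter(original)
    # `tok in remaining` consumes the iterator, so token order must be preserved.
    return all(tok in remaining for tok in compressed)


sentence = "the agent quickly learned to ask much better questions".split()
mask = [1, 1, 0, 1, 1, 1, 0, 1, 1]  # hand-written labels, one per token
short = compress_by_deletion(sentence, mask)
print(" ".join(short))
```

The second function is a quick way to check that an output really is a deletion-only compression: a reworded or reordered sentence fails the check.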
  40. 40. | #searchlove | @alexisKsanders tl;dr: google team introduced an evaluation scheme for generative models for text (i.e., a way to grade machines, when they use their own words). Eval all, trust a few, do wrong to none: Comparing sentence generation models Ondrej Cıfka, Aliaksei Severyn, Enrique Alfonseca, Katja Filippova (2018)
  41. 41. | #searchlove | @alexisKsanders example sentence compressions “Sentence Compression by Deletion with LSTMs” Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals (2015)
  42. 42. | #searchlove | @alexisKsanders not gwen-gwen! “Sentence Compression by Deletion with LSTMs” Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals (2015) there were of course difficulties…
  43. 43. | #searchlove | @alexisKsanders one more (just for fun) sidebar: apparently nose telescopes actually exist… “Sentence Compression by Deletion with LSTMs” Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals (2015)
  44. 44. | #searchlove | @alexisKsanders results: • outperformed baseline • indicate a compression model (which is not given syntactic information explicitly in the form of features) may demonstrate competitive performance • some difficulty due to quotes, commas, dense script, important context “Sentence Compression by Deletion with LSTMs” Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals (2015)
  45. 45. | #searchlove | @alexisKsanders “searchQA: a new Q&A dataset augmented with context from a search engine” – '17 launched searchQA (dataset of Jeopardy! questions) w/140k q-a pairs w/snippets “analyzing language learned by an active question answering agent” by buck, bulian, ciaramita, gajewski, gesmundo, houlsby, wang (2018)
  46. 46. | #searchlove | @alexisKsanders “identifying well-formed natural language questions” - '18 attempt to id well-formed natural-language questions with 25k qs classified as well-formed or not well-formed. “identifying well-formed natural language questions” by manaal faruqui and dipanjan das – Google AI 2018
  47. 47. | #searchlove | @alexisKsanders achievement: 70.7% accuracy. errors resulted from deep semantics and syntax (e.g., [what is the history of dirk bikes?] vs. dirt) “identifying well-formed natural language questions” by manaal faruqui and dipanjan das – Google AI 2018
  48. 48. | #searchlove | @alexisKsanders “ask the right questions” – '17/18 proposes a new framework to improve QA: active question answering (AQA). “ask the right questions: active question reformulation with reinforcement learning” by buck, bulian, ciaramita, gajewski, gesmundo, houlsby, wang (2018)
  49. 49. | #searchlove | @alexisKsanders inspired by humans and our ability to ask the right questions.
  50. 50. | #searchlove | @alexisKsanders it improves answers by reformulating questions. well formed q = easy, poorly formed q = hard “ask the right questions: active question reformulation with reinforcement learning” by buck, bulian, ciaramita, gajewski, gesmundo, houlsby, wang (2018)
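A rough sketch of the idea of improving answers by reformulating questions: ask the original question plus a few rewrites, then keep whichever answer the QA system is most confident about. This is only an illustration under loud assumptions. The reformulator below is a single hand-written rule and `toy_qa_system` is a stub with made-up confidence scores and placeholder answers, whereas the paper trains its reformulator with reinforcement learning against a real QA system.

```python
def active_qa(question, reformulate, qa_system):
    """Return (question_asked, answer, confidence) for the most confident attempt."""
    candidates = [question] + list(reformulate(question))
    attempts = [(q,) + tuple(qa_system(q)) for q in candidates]
    return max(attempts, key=lambda attempt: attempt[2])


def toy_reformulate(question):
    """Hypothetical rule: wrap a keyword-style query in a well-formed question."""
    return [f"what is the {question}?"]


def toy_qa_system(question):
    """Stub QA system (an assumption): it only copes with well-formed questions."""
    well_formed = question.startswith(("what", "who", "when", "where")) and question.endswith("?")
    if well_formed:
        return ("<answer found>", 0.9)  # placeholder answer, made-up confidence
    return ("<no answer>", 0.1)


best = active_qa("history of dirt bikes", toy_reformulate, toy_qa_system)
print(best[0])
```

The calling code never needs to know how the QA system works; it only sees answers and scores, which is what makes the reformulation step something that can be learned from feedback.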
  51. 51. | #searchlove | @alexisKsanders how: evaluated against dataset of jeopardy! questions (which are convoluted by design) “ask the right questions: active question reformulation with reinforcement learning” by buck, bulian, ciaramita, gajewski, gesmundo, houlsby, wang (2018)
  52. 52. | #searchlove | @alexisKsanders results: • approach = effective • agent able to learn non-trivial information • suggests that machine comprehension tasks involve “mostly pattern matching and relevant modelling” (i.e., it’s not comprehending) “ask the right questions: active question reformulation with reinforcement learning” by buck, bulian, ciaramita, gajewski, gesmundo, houlsby, wang (2018)
  53. 53. | #searchlove | @alexisKsanders “adversarial examples for evaluating reading comprehension systems” - '17 • it’s unclear how much a reading comprehension system understands language • suggests it’s not capable of significant understanding “adversarial examples for evaluating reading comprehension systems” Robin Jia, Percy Liang, CS department 2017
  54. 54. | #searchlove | @alexisKsanders and of course there’s the work on the stanford question answering dataset (SQuAD): 150k questions posed by crowdworkers on a set of wikipedia articles. squad is a reading comprehension dataset - https://rajpurkar.github.io/SQuAD-explorer/
  55. 55. | #searchlove | @alexisKsanders if you have a squad, you want BERT on it…. bidirectional encoder representations from transformers (not him ) BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. - https://github.com/google-research/bert
  56. 56. | #searchlove | @alexisKsanders look at that date… every week has groundbreaking results… someone should create an ernie to start competing with bert…
  57. 57. | #searchlove | @alexisKsanders google ai blog - jan '19 intro’ed a new db of 300k q-a pairs, "natural questions" https://ai.googleblog.com/2019/01/natural-questions-new-corpus-and.html https://ai.google.com/research/NaturalQuestions/ https://ai.google.com/research/NaturalQuestions/visualization 300k natural questions
  58. 58. | #searchlove | @alexisKsanders there’s also a comp. guess which model is first… https://ai.google.com/research/NaturalQuestions/competition
  59. 59. | #searchlove | @alexisKsanders https://ai.google.com/research/NaturalQuestions/ what an overachiever…
  60. 60. | #searchlove | @alexisKsanders so, what do we do about it? flowchart labels: have a problem? (y/n) • can u do sth about it? (y/n) • don’t worry about it • do it. • sleep, enjoy hobbies, live life, etc.
  61. 61. | #searchlove | @alexisKsanders well, obvi: o strive for first place, o in a manner that supports long-term stability, o focus on building a loyal base, o enjoy the ride.
  62. 62. | #searchlove | @alexisKsanders how do we strive for first place?
  63. 63. | #searchlove | @alexisKsanders we return to the SEO model: crawl, render, index, rank, connect (technical, content, signaling). for additional context see: https://moz.com/blog/seo-cyborg
  64. 64. | #searchlove | @alexisKsanders get a checklist at: moz.com/blog/seo-cyborg
  65. 65. | #searchlove | @alexisKsanders +focus on strategic content and experiences
  66. 66. | #searchlove | @alexisKsanders G is probably going to own these (eventually): • featured snippet: factoid, temporal, descriptional, definitional, biographical • local features: spatial • image search: images • YouTube: video
  67. 67. | #searchlove | @alexisKsanders probably, they’ll also continue to go after transactional opportunities (expanding what they’re already doing with booking in hotels, flights, and entertainment)
  68. 68. | #searchlove | @alexisKsanders what are best bets? o brand questions (they’re yours) o niche, expertise questions o opinionoid o video o interactive experiences o seamless user experiences* *see checklist on seamlessness: https://searchengineland.com/2019-in-search-find-your-seamlessness-309844
  69. 69. | #searchlove | @alexisKsanders a final note:
  70. 70. | #searchlove | @alexisKsanders even though we’re not at a point where machines return our answers, the general public acts as if we are. shout out to ian madrigal for making these hearings somewhat bearable… https://twitter.com/iansmadrig/status/1072532767492182024 (cough) (cough)
  71. 71. | #searchlove | @alexisKsanders we see this behavior in the CTR on top results. https://twitter.com/AlexisKSanders/status/1001544770089553920/
  72. 72. | #searchlove | @alexisKsanders we understand that search engines are just returning the most relevant document for the query,
  73. 73. | #searchlove | @alexisKsanders that the response is determined (in part) by the question, well, what is it G?
  74. 74. | #searchlove | @alexisKsanders and that (even though it’s extraordinarily impressive) search is not perfect. https://www.seroundtable.com/google-pyramids-are-85-years-old-26839.html
  75. 75. | #searchlove | @alexisKsanders with the power of knowledge (of the internet) comes responsibilities.
  76. 76. | #searchlove | @alexisKsanders suggested list of our responsibilities as educated internet denizens: □ being a gateway for quality information □ attempting to be aware of our own biases □ validating sources (making a good faith effort to) □ educating others (on search’s fallibility & discerning fact from fiction) □ not being a troll (remembering that people are on the other end) □ reporting (and escalating) egregious errors □ emphasizing credibility and security w/clients
  77. 77. | #searchlove | @alexisKsanders recap: • questions contribute to answers • QAS are a potential strategic direction for search engines • established what SEOs can do • our responsibility as internet citizens It’s been a pleasure getting to know you. thank you for your time and attention!
  78. 78. | #searchlove | @alexisKsanders fin.
  79. 79. | #searchlove | @alexisKsanders after a long sissy-sophie day at her favorite froyo loc sophia's aunt waiting for santa to arrive in town
  80. 80. | #searchlove | @alexisKsanders merkle’s seo partners
  81. 81. | #searchlove | @alexisKsanders thank you for participating! @AlexisKSanders /in/alexissanders
  82. 82. | #searchlove | @alexisKsanders “deal or no deal? end-to-end learning for negotiation dialogues” – '17 trained end-to-end model for negotiation (i.e., machine had to learn linguistic and reasoning skill) “deal or no deal? end-to-end learning for negotiation dialogues” mike lewis, denis yarats, yann n. dauphin, devi parikh, dhruv batra (2017)
  83. 83. | #searchlove | @alexisKsanders negotiation requires complex communications and reasoning skills. “deal or no deal? end-to-end learning for negotiation dialogues” mike lewis, denis yarats, yann n. dauphin, devi parikh, dhruv batra (2017)
  84. 84. | #searchlove | @alexisKsanders results: • agents demonstrated compromise, holding out, and deception w/o human design • can be improved in self-play (practicing on negotiating with computers first) “deal or no deal? end-to-end learning for negotiation dialogues” mike lewis, denis yarats, yann n. dauphin, devi parikh, dhruv batra (2017)
  85. 85. | #searchlove | @alexisKsanders and ultimately… “the answer determines the success of the question-answering system.” “when the answer comes into question in question-answering: survey and open issues” - 2011
