
Google BERT and Family and the Natural Language Understanding Leaderboard Race


Natural language understanding and word sense disambiguation remain prevailing challenges for both the spoken and the written word. Natural language understanding attempts to untangle the 'hot mess' of words that sits between the more structured data in content, but the challenge is not trivial, since there is so much polysemy in language. Recent developments in machine learning have produced significant leaps forward in understanding context more clearly (and therefore user intent and informational need at the time of a query). Here we will explore these developments and some of their implementations, and seek to understand what this means for search strategists and the brands they support, both now and into the future.



  1. 1. #pubcon Google BERT & Family & The Natural Language Understanding Leaderboard Race Presented by: Dawn Anderson @BeBertey
  2. 2. #pubcon The Problem with Words
  3. 3. #pubcon
  4. 4. #pubcon Words are problematic. Ambiguous… polysemous… synonymous
  5. 5. #pubcon Ambiguity and Polysemy Almost every other word in the English language has multiple meanings
  6. 6. #pubcon In spoken word it is even worse because of homophones and prosody
  7. 7. #pubcon Like “four candles” and “fork handles”
  8. 8. #pubcon Which does not bode well for conversational search into the future
  9. 9. #pubcon Today’s Topic: Current Search Engine Solutions For Dealing with the Problem of Words
  10. 10. #pubcon MS MARCO
  11. 11. #pubcon Meet Bertey & Tedward
  12. 12. #pubcon Word’s Context • ”The meaning of a word is its use in a language” (Ludwig Wittgenstein, Philosopher, 1953) • Image attribution: Moritz Nähr [Public domain]
  13. 13. #pubcon Word’s Context Changes As A Sentence Evolves • The meaning of a word changes (literally) as a sentence develops • Due to the multiple parts of speech a word could take in a given piece of content
  14. 14. #pubcon Like “like” Using the Stanford Part of Speech Tagger online we can see, in just this short sentence alone, that the word “like” is tagged as two separate parts of speech http://nlp.stanford.edu:8080/parser/index.jsp
  15. 15. #pubcon Like “like” For example: The word ”like” has several possible parts of speech (including ‘verb’, ‘noun’, ‘adjective’) POS = Part of Speech
  16. 16. #pubcon An important part of this is ‘Part of Speech’ (POS) tagging
  17. 17. #pubcon Chunking and Tokenization
  18. 18. #pubcon Natural language understanding is NOT structured data
  19. 19. #pubcon Structured data helps to disambiguate but what about the ‘hot mess’ in between?
  20. 20. #pubcon Part of Speech Tagging (POS)
  21. 21. #pubcon Example Part of Speech Tagging (POS) • Pubcon -> NNP (proper noun, singular) • is -> VBZ (verb, 3rd person singular, present) • a -> DT (determiner) • great -> JJ (adjective) • conference -> NN (noun, singular)
  22. 22. #pubcon Popular POS (Part of Speech) Taggers • Penn Treebank Tagger -> 36 different part of speech tags • CLAWS 7 (C7) Tagset -> 146 different part of speech tags • Brown Corpus Tagger -> 81 different part of speech tags
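Not part of the deck, but a minimal sketch of Penn Treebank-style tagging with NLTK's default tagger (assuming NLTK and its tagger data are installed); the commented output shows the tags the example slide above lists for "Pubcon is a great conference".

```python
import nltk

nltk.download("punkt")                        # tokeniser models
nltk.download("averaged_perceptron_tagger")   # Penn Treebank-style POS tagger

tokens = nltk.word_tokenize("Pubcon is a great conference")
print(nltk.pos_tag(tokens))
# Expected (roughly): [('Pubcon', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
#                      ('great', 'JJ'), ('conference', 'NN')]
```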
  23. 23. #pubcon Pronouns are problematic too
  24. 24. #pubcon Computer programs easily lose track of who is who. “I’m confused… Here… Have some flowers instead”
  25. 25. #pubcon Named Entity Recognition is NOT Named Entity Disambiguation
  26. 26. #pubcon
  27. 27. #pubcon
  28. 28. #pubcon Ontology Driven Natural Language Processing Image credit: IBM https://www.ibm.com/developerworks/community/blogs/nlp/entry/ontology_driven_nlp
  29. 29. #pubcon But even named entities can be polysemic
  30. 30. #pubcon Did you mean? • Amadeus Mozart (composer) • Mozart Street • Mozart Cafe
  31. 31. #pubcon AND VERBALLY…WHO (WHAT) ARE YOU TALKING ABOUT? ”LYNDSEY DOYLE” OR ”LINSEED OIL”?
  32. 32. #pubcon AND NOT EVERYONE OR THING IS MAPPED TO THE KNOWLEDGE GRAPH
  33. 33. #pubcon
  34. 34. #pubcon
  35. 35. #pubcon EVEN IF WE UNDERSTAND THE ENTITY (THING) ITSELF WE NEED TO UNDERSTAND WORD’S CONTEXT
  36. 36. #pubcon Semantic context matters • He kicked the bucket • I have yet to cross that off my bucket list • The bucket was filled with water
  37. 37. #pubcon How can search engines fill in the gaps between named entities?
  38. 38. #pubcon When they can’t even tell the difference between Pomeranians and pancakes
  39. 39. #pubcon They need ‘Text cohesion’ Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning. Without surrounding words the word bucket could mean anything in a sentence
  40. 40. #pubcon Word’s Company “You shall know a word by the company it keeps” (John Rupert Firth, Linguist,1957) Image Attribution: Wikimedia Commons Public Domain
  41. 41. #pubcon Words That Live Together Are Strongly Connected • Co-occurrence • Co-occurrence provides context • Co-occurrence changes word’s meaning • Words that share similar neighbours are also strongly connected • Similarity & relatedness
  42. 42. #pubcon Natural Language Disambiguation
  43. 43. #pubcon Natural Language Recognition is NOT Understanding • Natural language understanding requires understanding of context and common sense reasoning. VERY challenging for machines, but largely straightforward for humans.
  44. 44. #pubcon Language models are trained on very large text corpora or collections (loads of words) to learn distributional similarity
  45. 45. #pubcon Vector representations of words (Word Vectors)
  46. 46. #pubcon And build vector space models for word embeddings king - man + woman = queen
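A hedged sketch of that classic analogy test, using gensim's downloader API with pre-trained GloVe vectors (the package name "glove-wiki-gigaword-100" is an assumption about what is available locally):

```python
import gensim.downloader as api

# Pre-trained GloVe vectors (downloaded on first use)
vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' is typically the top result
```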
  47. 47. #pubcon A Moving Word ‘Context Window’
  48. 48. #pubcon Typical window size might be 5 [Image: the sentence “Writing a list of random sentences is harder than I initially thought it would be” shown with a moving context window of 11 words (5 left and 5 right of the moving target word)]
  49. 49. #pubcon Example context window size 3 Source text: “The quick brown fox jumps over the lazy dog” • Target “the” -> training samples (the, quick) (the, brown) (the, fox) • Target “quick” -> training samples (quick, the) (quick, brown) (quick, fox) (quick, jumps) • And so on for each word in the sentence
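A minimal, library-free sketch (not from the deck) of how those (target, context) training pairs can be generated from a sentence with a window of 3 words on each side:

```python
def skipgram_pairs(sentence, window=3):
    """Yield (target, context) pairs from a window of `window` words each side."""
    tokens = sentence.lower().split()
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (target, tokens[j])

pairs = list(skipgram_pairs("The quick brown fox jumps over the lazy dog"))
print(pairs[:3])   # [('the', 'quick'), ('the', 'brown'), ('the', 'fox')]
```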
  50. 50. #pubcon A Moving Word ‘Context Window’
  51. 51. #pubcon TensorFlow (tool) & e.g. Word2Vec or GloVe (language models)
  52. 52. #pubcon Continuous Bag of Words (CBoW) (method) or Skip-gram (the opposite of CBoW) Continuous Bag of Words: take a continuous bag of words with no inherent context and use a context window of size n (an n-gram) to ascertain which words are similar or related, using Euclidean distances to create vector models and word embeddings
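A sketch of training both variants with gensim (4.x API, where sg=0 selects CBoW and sg=1 selects skip-gram); the tiny two-sentence corpus here is only illustrative:

```python
from gensim.models import Word2Vec

corpus = [
    "the quick brown fox jumps over the lazy dog".split(),
    "writing a list of random sentences is harder than i thought".split(),
]

# sg=0 -> Continuous Bag of Words (predict the target word from its context)
# sg=1 -> skip-gram (predict the context words from the target word)
cbow = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=0)
skip = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=1)

print(cbow.wv.most_similar("fox", topn=3))
```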
  53. 53. #pubcon Models learn the weights of the similarity and relatedness distances
  54. 54. #pubcon Layers Everywhere
  55. 55. Concept2Vec
  56. 56. #pubcon Google’s Topic Layer is a new Layer in the Knowledge Graph
  57. 57. #pubcon EXAMPLE MICROSOFT CONCEPT DISTRIBUTION LAYER
  58. 58. #pubcon PAST LANGUAGE MODELS (E.G. WORD2VEC & GLOVE) BUILT CONTEXT-FREE WORD EMBEDDINGS
  59. 59. #pubcon Most language modellers are uni-directional [Image: the sentence “Writing a list of random sentences is harder than I initially thought it would be” with the context window moving in a single direction] They can traverse the word’s context window only from left to right or from right to left. Only in one direction, but not both at the same time
  60. 60. #pubcon They can only look at words in the context window before and not the words in the rest of the sentence. Nor sentence to follow next
  61. 61. #pubcon OFTEN THE NEXT SENTENCE REALLY MATTERS
  62. 62. #pubcon I Remember When My Grandad Kicked The Bucket BERT is able to understand the NEXT sentence The NEXT sentence here provides the context
  63. 63. #pubcon “How far do you reckon I could kick this bucket?”
  64. 64. #pubcon Did you mean “bank”? Or did you mean “bank”?
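One way to make the “bank” ambiguity concrete is to compare the contextual vectors BERT produces for the same word in different sentences; a hedged sketch using Hugging Face transformers and PyTorch (the helper function and example sentences are illustrative, not from the deck):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return BERT's contextual vector for the token 'bank' in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (tokens, 768)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs["input_ids"][0].tolist().index(bank_id)
    return hidden[position]

river = bank_vector("He sat on the bank of the river.")
money = bank_vector("She deposited the cheque at the bank.")
loan  = bank_vector("The bank approved my loan application.")

cos = torch.nn.functional.cosine_similarity
print(cos(money, loan, dim=0).item())   # same sense: higher similarity
print(cos(money, river, dim=0).item())  # different senses: lower similarity
```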
  65. 65. #pubcon NER Example • E.g. Sentence: “Taylor Swift will launch her new album in Apple Music.” • NER result: “Taylor[B-PER] Swift[I-PER] will[O] launch[O] her[O] new[O] album[O] in[O] Apple[B-ORG] Music[I-ORG].[O]” • PS: [O] means outside any entity; [B-PER]/[I-PER] mark a person name; [B-ORG]/[I-ORG] mark an organisation name. Source: https://medium.com/@yingbiao/ner-with-bert-in-action-936ff275bc73
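That kind of tagging can be reproduced with a BERT model fine-tuned for NER; a sketch using the transformers pipeline (the checkpoint name and the aggregation_strategy argument, available in recent transformers versions, are assumptions):

```python
from transformers import pipeline

# Assumed checkpoint: a BERT model fine-tuned for NER on CoNLL-2003
ner = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",   # merge B-/I- word pieces into whole entities
)

for entity in ner("Taylor Swift will launch her new album in Apple Music."):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
# Expected roughly: 'Taylor Swift' PER ..., 'Apple Music' ORG ...
```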
  66. 66. #pubcon Meet BERT
  67. 67. #pubcon Not the pomeranian BERT
  68. 68. #pubcon BERT (Bidirectional Encoder Representations from Transformers)
  69. 69. #pubcon Transformers (Attention simultaneously)
  70. 70. #pubcon 11 NLP Tasks • BERT advances the state of the art (SOTA) on 11 NLP tasks
  71. 71. #pubcon BERT is different. BERT uses bi-directional language modelling. The FIRST to do this [Image: the sentence “Writing a list of random sentences is harder than I initially thought it would be” shown in full around the target word] BERT can see both the left and the right hand side of the target word
  72. 72. #pubcon BERT HAS BEEN OPEN SOURCED BY GOOGLE AI
  73. 73. #pubcon Google’s move to open source BERT may change natural language processing forever
  74. 74. #pubcon Bert uses ‘Transformers’ & ’Masked Language Modelling’
  75. 75. #pubcon Masked Language Modelling Stops The Target Word From Seeing Itself
  76. 76. #pubcon BERT can see the WHOLE sentence on either side of a word (contextual language modelling) and all of the words almost at once
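Masked language modelling can be probed directly with the transformers fill-mask pipeline; a minimal sketch (the example sentence is ours, not from the deck), where the words on both sides of [MASK] drive the prediction:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("He filled the [MASK] with water and carried it to the garden."):
    print(prediction["token_str"], round(prediction["score"], 3))
# 'bucket' is typically among the top predictions
```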
  77. 77. #pubcon BERT has been pre-trained on a lot of words … on the whole of the English Wikipedia (2,500 million words)
  78. 78. #pubcon Previously Uni-Directional Previously, all language models were uni-directional, so they could only move the context window in one direction A moving window of ‘n’ words (either left or right of a target word) to understand a word’s context
  79. 79. #pubcon Google BERT Paper • Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  80. 80. #pubcon BERT can identify which sentence likely comes next from two choices
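A hedged sketch of next sentence prediction with BertForNextSentencePrediction (the two candidate continuations are illustrative); label index 0 corresponds to "is the next sentence":

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

first = "I remember when my grandad kicked the bucket."
candidates = [
    "The whole family went to the funeral.",
    "Penguins are flightless birds found in the southern hemisphere.",
]

for candidate in candidates:
    inputs = tokenizer(first, candidate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 0 = "candidate follows the first sentence", index 1 = "it does not"
    prob_is_next = torch.softmax(logits, dim=1)[0, 0].item()
    print(round(prob_is_next, 3), candidate)
```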
  81. 81. #pubcon THE ML & NLP COMMUNITY ARE VERY EXCITED ABOUT BERT
  82. 82. #pubcon EVERYBODY WANTS TO ‘BUILD-A- BERT. NOW THERE ARE LOADS OF ALGORITHMS WITH BERT
  83. 83. #pubcon VANILLA BERT PROVIDES A PRE-TRAINED STARTING POINT LAYER FOR NEURAL NETWORKS IN MACHINE LEARNING & DIVERSE NATURAL LANGUAGE TASKS
  84. 84. #pubcon Whilst BERT has been pre-trained on Wikipedia it is fine-tuned on question and answer datasets
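A sketch of question answering with a BERT checkpoint fine-tuned on SQuAD via the transformers pipeline (the model name and the toy context paragraph are assumptions):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

context = (
    "BERT was open sourced by Google AI in 2018. It was pre-trained on the "
    "English Wikipedia and then fine-tuned on question and answer datasets "
    "such as SQuAD."
)
result = qa(question="What was BERT fine-tuned on?", context=context)
print(result["answer"], round(result["score"], 3))
```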
  85. 85. #pubcon Andrei Broder’s Call to Arms in Assistive AI
  86. 86. #pubcon Researchers compete over Natural Language Understanding with e.g. SQuAD (Stanford Question Answering Dataset)
  87. 87. #pubcon BERT Has Dramatically Accelerated NLU
  88. 88. #pubcon BERT now even beats the human reasoning benchmark on SQuAD
  89. 89. #pubcon Not to be outdone – Microsoft also extends on BERT with MT-DNN
  90. 90. #pubcon RoBERTa from Facebook
  91. 91. #pubcon In GLUE – It’s Humans, MT-DNN, then BERT
  92. 92. #pubcon Glue Benchmark Leaderboard
  93. 93. #pubcon SuperGlue Benchmark
  94. 94. #pubcon Stanford Question Answering Dataset
  95. 95. #pubcon Includes Adversarial Questions: Making Sure Machines Know What They Don’t Know
  96. 96. #pubcon MS MARCO
  97. 97. #pubcon MS MARCO: A Human Generated MAchine Reading Comprehension Dataset • Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R. and Deng, L., 2016. MS MARCO: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
  98. 98. #pubcon Real Bing Questions Feed MS MARCO From real Bing anonymized queries
  99. 99. #pubcon Teaching Machines Commonsense Zellers, R., Bisk, Y., Schwartz, R. and Choi, Y., 2018. Swag: A large- scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.
  100. 100. #pubcon BERT Has Grown • Further iterations have grown in size, so that the models are arguably so large they are inefficient and unscalable
  101. 101. #pubcon FastBERT
  102. 102. #pubcon ALBERT BERT’s successor from Google Joint work between Google Research & Toyota Technological Institute
  103. 103. #pubcon HuggingFace
  104. 104. #pubcon DistilBERT (distilled BERT)
  105. 105. #pubcon VideoBERT
  106. 106. #pubcon BLACK BOX ALGORITHMS
  107. 107. #pubcon Algorithmic Bias Concerns • Ricardo Baeza-Yates’ work – Bias on the Web • NoBIAS Project • IBM initiatives to prevent bias • BERT does not know why it makes decisions • BERT is considered a ‘black box’ algorithm • Programmatic bias is a concern • The Algorithmic Justice League is active
  108. 108. #pubcon Keep in Touch •@dawnieando •@BeBertey
  109. 109. #pubcon And Remember…
  110. 110. #pubcon
  111. 111. #pubcon References • Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250. • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
