
Google BERT - What SEOs and Marketers Need to Know



Google BERT is many things, including the name of a Google Search algorithm update. There is a lot of confusion about what Google BERT is, where it came from and what SEOs and marketers need to do about it (if anything). Here we look at the problems Google seeks to solve by introducing BERT and explore the background to natural language processing and computational linguistics.

Published in: Marketing


  1. Google BERT: What SEOs and Marketers Need to Know (@dawnieando)
  2. Here’s another Bert, my Pomeranian, & I am Dawn Anderson, Managing Director of Bertey
  3. October 2019 - Welcome To Search, BERT: • A Google algorithmic update • Google announced BERT to the organic search world in a VERY geeky way • Mentions that 15% of the queries seen every day are new • Touches on ‘The Vocabulary Problem’ (many ways of querying the same thing)
  4. Fundamentally… Google BERT is: • Probably the biggest improvement in search EVER (Google called it “one of the biggest leaps forward in the history of Search”) • The biggest change in search in five years, since RankBrain
  5. So, just what is the Google BERT update? In layman’s terms, it can be used to help Google better understand the context of words in search queries & content
  6. The bottom-line search announcement: • Used globally in all languages on featured snippets • BERT to impact rankings for 1 in 10 queries • Initially for English-language queries in the US
  7. Dec 2019 – BERT expands internationally: • Over 70 languages • Still only impacts 10% of queries despite the considerable expansion • Still used on all featured snippets globally
  8. Why Are Just 10% of Google Queries Impacted? • BERT deals with ambiguity & ‘nuance’ in queries & content • Unlikely to impact short queries • More likely to impact conversational queries • Unlikely to impact branded queries
  9. SEOs React: • The SEO community is abuzz • BERT is a big deal • Likened to ‘RankBrain’ in some of the ‘interesting’ interpretations • Some confusion around ‘what BERT is and what it means for search’
  10. BERT in Geek Speak: • A neural network-based technique for natural language processing pre-training • An acronym for Bidirectional Encoder Representations from Transformers
  11. Let’s Visit The B – E – R – T Explanations Later: • Bi-directional • Encoder • Representations from • Transformers
  12. Important: BERT is Many Things: • Search algorithm update • Open-source pre-trained model / framework for natural language understanding • Academic research paper • Evolving tool for computational linguistics efficiency • Beginning of MANY BERT’ish language models
  13. So What’s The Backstory? • Where did BERT come from? • Where did the need for BERT arise? • The impact of BERT for SEO & beyond? • What next?
  14. BERT started as a research paper in 2018: • Academic paper • Research project by Devlin et al. • Published in October 2018, a year before the update • ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’
  15. BERT Open Sourced in 2018: • Open sourced, so anyone can build a BERT • BERT very quickly created a sea-change leap forward in natural language understanding in information retrieval • Provided a pre-trained language model which required only fine-tuning
  16. BERT Has Been Pre-Trained On Many Words: the whole of the English Wikipedia (about 2,500 million words) & the BooksCorpus (about 800 million words) combined, over 3,300 million words in total
  17. The machine learning community got very excited about BERT: Vanilla BERT provides a pre-trained starting-point layer for neural networks in machine learning & diverse natural language tasks
  18. BERT Can Be Fine-Tuned in A Short Space of Time: • BERT is fine-tuned on a variety of downstream NLP tasks, including question-and-answer datasets
  19. BERT Saves Researchers Time AND Money: • Vanilla BERT can be used ‘out of the box’ or fine-tuned • Provides a great starting point & saves huge amounts of time & money • Those who wish to can build upon and improve BERT
  20. Since 2018, major tech companies have extended BERT: • Microsoft – MT-DNN • Facebook – RoBERTa • XLNet • Baidu – ERNIE • Lots of other contenders
  21. Training Datasets Like MS MARCO Are Used To Fine-Tune With Question & Answer Datasets
  22. Real Bing Questions Feed MS MARCO: the Microsoft Machine Reading Comprehension dataset, real Bing user queries for NLU research
  23. …And Leaderboards: You think SEOs are competitive? ML engineers are more so • GLUE • SuperGLUE • MS MARCO • SQuAD
  24. SuperGLUE was created because GLUE got too easy: progress was phenomenal, with many new SOTAs (state-of-the-art results)
  25. What Purpose Does BERT Serve & How? Language models like BERT help machines understand the nuance of words in context and the cohesion of the surrounding text
  26. What is Natural Language Understanding? • Dates back over 60 years, to Alan Turing’s 1950 paper proposing the Turing Test • Aims at understanding the way words fit together with structure and meaning • NLU is connected to the field of linguistics (computational linguistics) • Over time, computational linguistics increasingly overflows into a growing online web of content
  27. Natural Language Recognition is NOT Understanding: • Natural language understanding requires: • Words’ context • Common sense reasoning
  28. Humans ‘Naturally’ Understand Context: humans mostly understand nuance and jargon from multiple meanings in the written and spoken word because of ‘context’
  29. But Words Can Be VERY Problematic for Machines & Sometimes Even for Humans: • Synonymous • Polysemous • Homonymous
  30. Single Words Have No Meaning: “The meaning of a word is its use in the language” (Ludwig Wittgenstein, philosopher, 1953). Image attribution: Moritz Nähr (public domain)
  31. An Example of a Word’s Meaning Changing: in the sentence “I like that he is like that”, the word ‘like’ is both a: • (VBP) verb, non-3rd person singular present • (IN) preposition or subordinating conjunction. Tagged: • I -> PRP • like -> VBP • that -> IN • he -> PRP • is -> VBZ • like -> IN • that -> DT
  32. Linguists Tag ‘Parts of Speech’
  33. Words Are ‘Parts of Speech’ When Combined: e.g. verbs, nouns, adjectives • Penn Treebank tagger -> 36 different parts of speech • CLAWS7 (C7) -> 146 different parts of speech • Brown Corpus tagger -> 81 different parts of speech
  34. The Meaning of The Word ‘Bucket’ Changes: • He kicked the bucket • I have yet to tick that off my bucket list • The bucket was filled with water
  35. Words Need ‘Text Cohesion’: • The ‘glue’ which adds meaning • May historically be ‘stop words’ • Surrounding words can change ‘intent’ • They add ‘context’
  36. Ambiguity Is Problematic: “Ambiguity is the greatest bottleneck to computational knowledge acquisition, the killer problem of all natural language processing.” (Stephen Clark, formerly of Cambridge University & now a full-time research scientist with Google DeepMind)
  37. Synonymous (Synonyms): • Words with a similar meaning to something else • Example: humorous, comical, hilarious and hysterical are ALL synonyms of funny
  38. Ambiguity & Polysemy: • Ambiguity is at a sentence level • Polysemous words are arguably the most problematic due to their ‘nuanced’ nature
  39. Polysemous (Polysemy): • Words, usually with the same root, that have multiple meanings • Example: “run” has 396 Oxford English Dictionary definitions
  40. Over 40% of English words are polysemous (McCarthy, 1997; Durkin & Manning, 1989)
  41. Homonyms: • Words spelt the same but with very different ‘root’ meanings • Example: pen (writing implement), pen (pig pen) • Example: rose (stood up / ascended), rose (flower) • Example: bark (dog sound), bark (tree bark)
  42. Homophones – Difficult To Disambiguate Verbally: spelt differently, with VERY different meanings, but sound exactly the same • Draft, draught • Dual, duel • Made, maid • For, fore, four • To, too, two • There, their • Where, wear, were
  43. Worse When Words are Joined Together: “fork handles” and “four candles” are very difficult to disambiguate in the spoken word
  44. Much Comedy Comes From ‘Play on Words’: “Did you want four candles or fork handles?”
  45. Which Does Not Bode Well For Voice Search
  46. Language Has Natural Patterns & Phenomena: examples • Zipfian distribution • Firthian linguistics • Treebanks • Language can be tied back to mathematical spaces & algorithms
  47. Example: Zipfian Distribution (Power Law): • The frequency of any word in a collection is inversely proportional to its rank in the frequency table • Holds approximately for word frequencies in almost any large body of text • Image: rank-frequency plot of 30 Wikipedias
  48. To illustrate Zipfian distribution (most used words), rank, word and frequency of use in a corpus relative to the top word: 1 the (1) • 2 be (1/2) • 3 to (1/3) • 4 of (1/4) • 5 and (1/5) • 6 a (1/6) • 7 in (1/7) • 8 that (1/8) • 9 have (1/9) • 10 I (1/10)
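The rank-frequency rule on the slide can be written as f(r) ≈ f(1)/r, where r is a word’s rank. As a minimal Python sketch, assuming simple lower-cased whitespace tokenisation and a tiny made-up sample sentence (not a real corpus), the hypothetical helpers below rank the words by count and print the Zipf prediction alongside each observed count:

```python
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first."""
    # Assumption: lower-cased whitespace tokenisation; real pipelines tokenise more carefully.
    counts = Counter(text.lower().split())
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def zipf_expected(top_count: int, n: int) -> list[float]:
    """Zipf's law prediction: the word at rank r occurs about top_count / r times."""
    return [top_count / rank for rank in range(1, n + 1)]


if __name__ == "__main__":
    # A tiny made-up sample: far too small to fit Zipf's law closely.
    sample = "the cat sat on the mat and the dog sat on the log"
    table = rank_frequency(sample)
    predicted = zipf_expected(table[0][2], len(table))
    for (rank, word, count), zipf in zip(table, predicted):
        print(f"{rank:>2} {word:<4} observed={count} zipf={zipf:.2f}")
```

On a large corpus the observed and predicted columns track each other roughly; on a toy sentence like this one they diverge quickly.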
  49. Firthian Linguistics – One Such Phenomenon is Co-occurrence: “You shall know a word by the company it keeps” (Firth, 1957)
  50. Distributional Relatedness & Firthian Linguistics: words with similar meanings tend to live near each other in a body of text. Words’ ‘nearness’ can be measured in mathematical vector spaces – a context vector is a word’s ‘company’
  51. Co-occurrence, Similarity & Relatedness: • Language models are trained on large bodies of text to learn ‘distributional similarity’ (co-occurrence)
  52. Context Vectors & Word Embeddings: • And build vector space models for word embeddings • Models learn the weights of similarity & relatedness distances
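The idea of a context vector as a word’s ‘company’ can be shown with plain co-occurrence counts. This is a deliberately simplified count-based sketch, not how BERT learns its representations: it assumes a hypothetical window of two words either side and a made-up toy corpus, builds a co-occurrence vector for each word, and compares two words with cosine similarity.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Count how often each word appears within `window` words of every other word."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus, made up for illustration.
corpus = [
    "the dog barked at the postman",
    "the puppy barked at the cat",
    "she paid the cheque into the bank",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["dog"], vecs["puppy"]), 2))   # same company, so high similarity
print(round(cosine(vecs["dog"], vecs["cheque"]), 2))  # only 'the' is shared, so lower
```

‘dog’ and ‘puppy’ never appear in the same sentence, yet they end up with matching context vectors because they keep the same company. Every occurrence of a word is pooled into a single vector here, which is the context-free limitation the following slides describe.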
  53. Context-Free Word Embeddings: • Past models have been context-free embeddings • They lacked the ‘text cohesion’ necessary to understand a word in context
  54. Remember ‘Bucket’ Without Text Cohesion? • He kicked the bucket • I have yet to tick that off my bucket list • The bucket was filled with water
  55. Words’ Context Still Needed Gaps Filling: • Past models used context-free embeddings • A moving ‘context window’ was used to gain words’ context
  56. But Even Then, True Context Needs Both Sides of a Word: • Past models were ‘uni-directional’ • The context window moved from left to right or from right to left
  57. They Didn’t Look At Words On Either Side Simultaneously
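The difference between a one-directional context window and seeing both sides of a word at once can be sketched in a few lines of Python. The `context` helper below is hypothetical (not from any BERT library): it simply returns the words visible around ‘bucket’ under each strategy, using the three ‘bucket’ sentences from the earlier slides and an assumed window of three words.

```python
def context(tokens: list[str], i: int, window: int, direction: str) -> list[str]:
    """Return the context words visible around tokens[i].

    direction: "left" (a left-to-right model), "right" (a right-to-left model)
    or "both" (bidirectional, as in BERT's masked-word pre-training).
    """
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    if direction == "left":
        return left
    if direction == "right":
        return right
    if direction == "both":
        return left + right
    raise ValueError(f"unknown direction: {direction!r}")


sentences = [
    "he kicked the bucket",
    "i have yet to tick that off my bucket list",
    "the bucket was filled with water",
]
for sentence in sentences:
    tokens = sentence.split()
    i = tokens.index("bucket")
    print(repr(sentence))
    for direction in ("left", "right", "both"):
        print(f"  {direction:>5}: {context(tokens, i, 3, direction)}")
```

For ‘the bucket was filled with water’, a left-to-right window sees only ‘the’; the words that reveal the literal sense (‘filled with water’) only become visible when the right-hand side is included too.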
  58. So What About That B – E – R – T Explanation? • Bi-directional • Encoder • Representations from • Transformers
  59. Bi-Directional is The B in BERT: • BERT can see a word’s context on both sides of the word in a context window
  60. What About Encoder Representations? • Encoder representations relate to the input and output process of words’ context & embeddings
  61. What About The Transformer Part? • The Transformer is a big deal • Derived from a 2017 paper called ‘Attention Is All You Need’ (Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017)
  62. Transformer & Attention: works out how important words are to each other in a given context & focuses attention accordingly
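‘Attention Is All You Need’ defines scaled dot-product attention as softmax(QKᵀ/√d_k)V: each word’s query vector is scored against every word’s key vector, the scores become weights that sum to 1, and the output is a weighted mix of the value vectors. The pure-Python sketch below implements that formula for a few made-up 2-dimensional vectors; a real Transformer learns separate projections for Q, K and V, uses many attention heads and far more dimensions.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(keys[0])
    outputs, weight_rows = [], []
    for q in queries:
        # How strongly this word's query matches every word's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[dim] for w, v in zip(weights, values)) for dim in range(len(values[0]))]
        outputs.append(out)
        weight_rows.append(weights)
    return outputs, weight_rows


# Hypothetical 2-d vectors; a real Transformer derives Q, K and V from learned projections.
words = ["deposit", "cheque", "bank"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.6]]
_, weights = attention(vectors, vectors, vectors)
for word, w in zip(words, weights[2]):
    print(f"attention from 'bank' to {word!r}: {w:.2f}")
```

Dividing by √d_k stops the dot products growing with the vector size, which would otherwise push the softmax towards putting almost all its weight on a single word.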
  63. This Technology Provides Words’ Context: • Bi-directional • Encoder • Representations from • Transformers
  64. River Bank or Financial Bank? By identifying ‘cheque’ or ‘deposit’ in the company of ‘bank’, BERT can disambiguate it from a ‘river’ bank
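BERT learns this kind of disambiguation from data rather than from hand-written rules, but the underlying intuition, that neighbouring words select the sense, can be illustrated with a much older technique: a simplified Lesk-style overlap count. In this sketch the sense ‘signatures’ are hypothetical word lists made up for illustration, not taken from any dictionary.

```python
import re

# Hypothetical sense "signatures": words assumed to keep company with each sense of "bank".
SENSES = {
    "financial bank": {"cheque", "deposit", "account", "loan", "money", "cash"},
    "river bank": {"river", "water", "fishing", "grass", "mud", "stream"},
}


def disambiguate(sentence: str, senses: dict[str, set[str]] = SENSES) -> str:
    """Simplified Lesk-style heuristic: pick the sense whose signature overlaps the sentence most."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    overlaps = {sense: len(context & signature) for sense, signature in senses.items()}
    # Ties (including zero overlap) fall back to the first sense listed.
    return max(overlaps, key=overlaps.get)


print(disambiguate("I paid a cheque and a cash deposit into the bank."))  # financial bank
print(disambiguate("We sat on the bank of the river, fishing."))          # river bank
```

The weakness is that the signatures are fixed lists written by hand; BERT instead produces a different representation of ‘bank’ depending on whatever words surround it.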
  65. So Where is BERT’s Value in Google Search? • Named entity determination • Textual entailment (next sentence prediction) • Coreference resolution • Question answering • Word sense disambiguation • Automatic summarization • Polysemy resolution
  66. BERT and Disambiguating Nuance: BERT recognizes that the word ‘to’ makes all the difference to the intent of the query (as in Google’s own example, “2019 brazil traveler to usa need a visa”)
  67. BERT and Disambiguating Nuance: BERT recognizes the meaning and importance of the ambiguous word ‘stand’ in the context of the query (as in Google’s own example, “do estheticians stand a lot at work”)
  68. BERT and Intent Understanding: • A single word can change the whole intent of a query • Conversational queries particularly so • The ‘stop words’ are actually part of text cohesion • Historically, ‘stop words’ were often ignored • The next sentence matters
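Why ignoring ‘stop words’ loses intent is easy to demonstrate. The sketch below uses a small, hypothetical stop-word list and a bag-of-words view of a query (word order discarded), of the kind older keyword matching relied on; once the stop words are filtered out, two queries with opposite intent become identical.

```python
# Hypothetical stop-word list for illustration; real lists vary between systems.
STOP_WORDS = {"a", "an", "the", "to", "from", "for", "of", "in", "on", "at"}


def keywords(query: str, drop_stop_words: bool = True) -> frozenset[str]:
    """Bag-of-words view of a query: lower-cased, order ignored, stop words optionally dropped."""
    tokens = query.lower().split()
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return frozenset(tokens)


q1, q2 = "flights to london", "flights from london"
print(keywords(q1) == keywords(q2))  # True: 'to' and 'from' are discarded, so the intent is lost
print(keywords(q1, drop_stop_words=False) == keywords(q2, drop_stop_words=False))  # False
```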
  69. Next Sentence Prediction (Textual Entailment): often the next sentence REALLY matters. Example: “I remember what my Grandad said just before he kicked the bucket.”
  70. Not What You Expected? “How far do you reckon I can kick this bucket?”
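In the original BERT pre-training, next sentence prediction used sentence pairs where, 50% of the time, sentence B is the sentence that actually follows A (labelled IsNext) and, 50% of the time, it is a random sentence from the corpus (labelled NotNext). A minimal sketch of building such pairs, assuming a toy corpus that reuses the ‘bucket’ sentences and a hypothetical helper named `make_nsp_pairs`; as a simplification, it never picks A itself or A’s true successor as a negative.

```python
import random


def make_nsp_pairs(sentences: list[str], seed: int = 0) -> list[tuple[str, str, str]]:
    """Build (sentence_a, sentence_b, label) pairs for next sentence prediction.

    Roughly half the pairs use the sentence that really follows ("IsNext");
    the rest pair sentence A with another sentence from the corpus ("NotNext").
    """
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample negatives")
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    pairs = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            pairs.append((a, sentences[i + 1], "IsNext"))
        else:
            # Simplification: never pick sentence A itself or its true successor as a negative.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((a, rng.choice(candidates), "NotNext"))
    return pairs


corpus = [
    "I remember what my Grandad said just before he kicked the bucket.",
    "How far do you reckon I can kick this bucket?",
    "The bucket was filled with water.",
    "BERT was announced for Google Search in October 2019.",
]
for a, b, label in make_nsp_pairs(corpus):
    print(f"{label:<7} | {a} -> {b}")
```

A model trained on pairs like these learns whether B plausibly continues A, which is exactly the expectation the ‘kick this bucket’ slide plays with.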
  71. BERT Probably Doesn’t Resemble The Original BERT Paper: • There have been lots of improvements upon BERT by others • Google has likely improved dramatically on BERT too • There were some issues with next sentence prediction • Facebook built RoBERTa, which drops the next sentence prediction objective
  72. Featured Snippets, Knowledge Graph & Web Page Extraction Together: • Named entity determination • Coreference resolution • Question answering • Word sense disambiguation • Automatic summarization • Polysemy resolution
  73. BERT and International SEO: Expect Big Things • BERT has gone from monolingual to multilingual • Other language-specific BERTs are being built • The Transformer was trained on international translations • Language has transferable phenomena
  74. BERT & International SEO: • deepset – German BERT • CamemBERT – French BERT • AlBERTo – Italian BERT • RobBERT – Dutch RoBERTa model
  75. BERT and Conversational Search: Expect Big Things • The challenges of Pygmalion • Conversational search can now ‘scale’ • BERT takes away some of the human labelling effort necessary • Next sentence prediction could impact assistants and clarifying questions
  76. Semantic Heterogeneity Issues in Entity-Oriented Search (Semantic Search): • Helps with anaphora & cataphora resolution (resolving pronouns to entities) • Helps with coreference resolution • Helps with named entity determination • Next sentence prediction could impact assistants and clarifying questions
  77. Bing has been BERTing since April 2019: • Impacts ALL Bing queries globally
  78. Why Can’t You Optimize for BERT? • It’s supposed to be natural • In the same way you can’t optimize for RankBrain, you can’t optimize for BERT • BERT is a tool / learning process in search for disambiguation & contextual understanding of words • BERT is a ‘black-box’ algorithm
  79. Black Box Algorithms & BERTology: • Black-box algorithm • Hugging Face coined the phrase BERTology • Now a field of study exploring why BERT makes the choices it does • Some concerns over bias & responsible AI
  80. Utilising Co-Occurrence Strategically: Employ Relatedness • Cluster together content and interlink well on topic & nuance • Avoid ‘too-similar’ competing categories; merge them • Consider not just the content in the page but the content in the linked pages & sections • Consider the content of the ‘whole domain’, as everything contributes to co-occurrence • Be extra vigilant when ‘pruning’
  81. Categorisation & Subcategorisation Are King: • Employ strong conceptual logic in your site architecture • Be careful with random blogs • If you must ‘tag’, tag thoughtfully
  82. Anyone Can Use BERT – BERT is a Tool: anyone can build a BERT to train their own language processing system for a variety of natural language understanding downstream tasks. Fine-tuning can be carried out in a short time. BERT represents a union of data science and SEO
  83. How Could BERT Be Harnessed For Efficiency in SEO? A Few Examples: • Automatic categorization & subcategorization of content • Automatic generation of meta descriptions • Automatic summarization of extracts & teasers • Categorising user-generated content / posts, probably better than humans
  84. SEOs Are Getting Busy With BERTishness: • J R Oakes - @jroakes • Hamlet Batista - @hamletbatista • Andrea Volpini - @cyberandy • Gefen Hermesh - @ghermesh
  85. Efficiency is Also A Focus: • DistilBERT (Hugging Face) • ALBERT (Google) • FastBERT
  86. ALBERT – BERT’s Successor: • Original BERT was computationally expensive to run • ALBERT stands for A Lite BERT • Increased efficiency • ALBERT is BERT’s natural successor • ALBERT is much leaner whilst providing similar results • Joint research work between Google & the Toyota Technological Institute at Chicago
  87. Reformer (Google) – Transformer’s Successor: understands words’ context from the perspective of a ‘whole novel’. les-ai-language-model-reformer-can-process-the-entirety-of-novels/
  88. BERT Was Just The Start: growth has been huge in the natural language processing community – current SuperGLUE leaderboard • Google T5 is winning • Even more advanced technology • Transfer learning • Expect big things
  89. SEE YOU AT THE NEXT SMX!