
BERT Explained: What You Need to Know About Google’s New Algorithm


Google’s newest algorithmic update, BERT, helps Google understand natural language better, particularly in conversational search.

BERT (which stands for Bidirectional Encoder Representations from Transformers) will impact around 10% of queries. It will also impact organic rankings and featured snippets. So this is no small change!

But did you know that BERT is not just any algorithmic update, but also a research paper and machine learning natural language processing framework?

In fact, in the year before its rollout, BERT set off a frenetic storm of activity in production search.

In this presentation, Dawn Anderson of Bertey explains exactly what Google’s BERT is all about.

You’ll learn:
– What BERT really is and how it works.
– How BERT will impact search.
– Whether you can (or should try to) optimize your content for BERT.


  1. BERT Explained: What you need to know about Google's new algorithm – Dawn Anderson #SEJThinktank @dawnieando
  2. About Me #SEJThinktank @dawnieando
  3. Also… Meet Bert and Tedward #SEJThinktank @dawnieando
  4. We're talking about BERT in Search today #SEJThinktank @dawnieando
  5. What BERT really is #SEJThinktank @dawnieando
  6. Important: BERT is many things #SEJThinktank @dawnieando
  7. BERT is a Google search algorithm 'ingredient' / tool / framework called 'Google BERT' #SEJThinktank @dawnieando
  8. BERT is also an open source research project & academic paper #SEJThinktank @dawnieando
  9. Google BERT Paper • Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. #SEJThinktank @dawnieando
  10. BERT (Bidirectional Encoder Representations from Transformers) #SEJThinktank @dawnieando
  11. Probably… most mentions of BERT online are NOT about THE Google BERT update #SEJThinktank @dawnieando
  12. BERT Has Dramatically Accelerated NLU (Natural Language Understanding) #SEJThinktank @dawnieando
  13. Google's move to open source BERT has probably changed natural language processing forever #SEJThinktank @dawnieando
  14. THE ML & NLP COMMUNITY ARE VERY EXCITED ABOUT BERT #SEJThinktank @dawnieando
  15. BERT has been pre-trained on a lot of words … on the whole of the English Wikipedia (2,500 million words) #SEJThinktank @dawnieando
  16. VANILLA BERT PROVIDES A PRE-TRAINED STARTING POINT LAYER FOR NEURAL NETWORKS IN MACHINE LEARNING & DIVERSE NATURAL LANGUAGE TASKS #SEJThinktank @dawnieando
  17. EVERYBODY WANTS TO 'BUILD-A-BERT'. NOW THERE ARE LOADS OF ALGORITHMS WITH BERT #SEJThinktank @dawnieando
  18. Whilst BERT has been pre-trained on Wikipedia, it is fine-tuned on 'question and answer datasets'
  19. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset • Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250. #SEJThinktank @dawnieando
  20. MS MARCO #SEJThinktank @dawnieando
  21. Real Bing Questions Feed MS MARCO – from real anonymized Bing queries #SEJThinktank @dawnieando
  22. Researchers compete over Natural Language Understanding with e.g. SQuAD (Stanford Question Answering Dataset) #SEJThinktank @dawnieando
  23. BERT now even beats the human reasoning benchmark on SQuAD #SEJThinktank @dawnieando
  24. Lots of the major AI companies are also building BERT versions #SEJThinktank @dawnieando
  25. Microsoft extends on BERT with MT-DNN #SEJThinktank @dawnieando
  26. RoBERTa from Facebook #SEJThinktank @dawnieando
  27. SuperGLUE Benchmark was created because GLUE became too easy #SEJThinktank @dawnieando
  28. What challenges does BERT help to solve? #SEJThinktank @dawnieando
  29. The Problem with Words #SEJThinktank @dawnieando
  30. #SEJThinktank @dawnieando
  31. Words are problematic. Ambiguous… polysemous… synonymous #SEJThinktank @dawnieando
  32. Ambiguity and Polysemy • Almost every other word in the English language has multiple meanings #SEJThinktank @dawnieando
  33. In spoken word it is even worse because of homophones and prosody #SEJThinktank @dawnieando
  34. Like "four candles" and "fork handles" #SEJThinktank @dawnieando
  35. Which does not bode well for conversational search into the future #SEJThinktank @dawnieando
  36. Word's Context • "The meaning of a word is its use in a language" (Ludwig Wittgenstein, Philosopher, 1953) • Image attribution: Moritz Nähr [Public domain] #SEJThinktank @dawnieando
  37. Word's Context Changes As A Sentence Evolves • The meaning of a word changes (literally) as a sentence develops • Due to the multiple parts of speech a word could be in a given context #SEJThinktank @dawnieando
  38. Like "like" – We can see, in just this short sentence alone, using the Stanford Part of Speech Tagger online, that the word "like" is considered to be 2 separate parts of speech (see the POS-tagging sketch after the transcript) http://nlp.stanford.edu:8080/parser/index.jsp #SEJThinktank @dawnieando
  39. Like "like" • For example: the word "like" has several possible parts of speech (including 'verb', 'noun', 'adjective') • POS = Part of Speech #SEJThinktank @dawnieando
  40. Natural Language Recognition is NOT Understanding • Natural language understanding requires understanding of context and common sense reasoning. VERY challenging for machines, but largely straightforward for humans. #SEJThinktank @dawnieando
  41. Natural language understanding is NOT structured data #SEJThinktank @dawnieando
  42. Structured data helps to disambiguate but what about the 'hot mess' in between? #SEJThinktank @dawnieando
  43. AND NOT EVERYONE (OR EVERYTHING) IS MAPPED TO THE KNOWLEDGE GRAPH #SEJThinktank @dawnieando
  44. #SEJThinktank @dawnieando
  45. Ontology Driven Natural Language Processing – Image credit: IBM #SEJThinktank @dawnieando
  46. How can search engines fill in the gaps between named entities? #SEJThinktank @dawnieando
  47. Natural Language Disambiguation #SEJThinktank @dawnieando
  48. Word's Company – "You shall know a word by the company it keeps" (John Rupert Firth, Linguist, 1957) – Image attribution: Wikimedia Commons Public Domain #SEJThinktank @dawnieando
  49. Words That Live Together Are Strongly Connected • Co-occurrence • Co-occurrence provides context • Co-occurrence changes a word's meaning • Words that share similar neighbours are also strongly connected • Similarity & relatedness #SEJThinktank @dawnieando
  50. Language models are trained on very large text corpora or collections (loads of words) to learn distributional similarity #SEJThinktank @dawnieando
  51. Vector representations of words (Word Vectors)
  52. And build vector space models for word embeddings: king - man + woman = queen (see the word-vector sketch after the transcript)
  53. Models learn the weights of the similarity and relatedness distances #SEJThinktank @dawnieando
  54. EVEN IF WE UNDERSTAND THE ENTITY (THING) ITSELF WE NEED TO UNDERSTAND THE WORD'S CONTEXT #SEJThinktank @dawnieando
  55. #SEJThinktank @dawnieando
  56. They need 'Text cohesion' – Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning. Without surrounding words, the word "bucket" could mean anything in a sentence #SEJThinktank @dawnieando
  57. Semantic context matters • He kicked the bucket • I have yet to cross that off my bucket list • The bucket was filled with water #SEJThinktank @dawnieando
  58. An important part of this is 'Part of Speech' (POS) tagging #SEJThinktank @dawnieando
  59. Chunking and Tokenization #SEJThinktank @dawnieando
  60. Example Part of Speech (POS) Tagging #SEJThinktank @dawnieando
  61. How BERT works #SEJThinktank @dawnieando
  62. PAST LANGUAGE MODELS (E.G. WORD2VEC & GLOVE) BUILT CONTEXT-FREE WORD EMBEDDINGS #SEJThinktank @dawnieando
  63. BERT provides 'context' #SEJThinktank @dawnieando
  64. BERT has been pre-trained on a lot of words … on the whole of the English Wikipedia (2,500 million words) #SEJThinktank @dawnieando
  65. B -> Bi-directional #SEJThinktank @dawnieando
  66. A Moving Word 'Context Window' #SEJThinktank @dawnieando
  67. Example context window (size 3) – Source text: "The quick brown fox jumps over the lazy dog" – Training samples: (the, quick) (the, brown) (the, fox); (quick, the) (quick, brown) (quick, fox) (quick, jumps); etcetera (see the context-window sketch after the transcript) #SEJThinktank @dawnieando
  68. Previously Uni-Directional • Previously all language models were uni-directional, so they could only move the context window in one direction • A moving window of 'n' words (either left or right of a target word) to understand the word's context #SEJThinktank @dawnieando
  69. Most language modellers are uni-directional – Source text: "Writing a list of random sentences is harder than I initially thought it would be" – They can traverse over the word's context window from only left to right or right to left. Only in one direction, but not both at the same time #SEJThinktank @dawnieando
  70. BERT is different. BERT uses bi-directional language modelling. The FIRST to do this – Source text: "Writing a list of random sentences is harder than I initially thought it would be" – BERT can see both the left and the right hand side of the target word #SEJThinktank @dawnieando
  71. BERT can see the WHOLE sentence on either side of a word (contextual language modelling) and all of the words almost at once #SEJThinktank @dawnieando
  72. Did you mean "bank"? Or did you mean "bank"? #SEJThinktank @dawnieando
  73. ER -> Encoder Representations #SEJThinktank @dawnieando
  74. T -> Transformers #SEJThinktank @dawnieando
  75. BERT uses 'Transformers' & 'Masked Language Modelling' (see the masked-language-modelling sketch after the transcript) #SEJThinktank @dawnieando
  76. Masked Language Modelling Stops The Target Word From Seeing Itself #SEJThinktank @dawnieando
  77. Transformers (Attention simultaneously) #SEJThinktank @dawnieando
  78. Attention is all you need #SEJThinktank @dawnieando
  79. Types of natural language tasks BERT helps with #SEJThinktank @dawnieando
  80. Things Like: • Named entity determination • Textual entailment (next sentence prediction) • Coreference resolution • Question answering • Word sense disambiguation • Automatic summarization • Polysemy resolution #SEJThinktank @dawnieando
  81. 11 NLP Tasks – BERT advanced the state of the art (SOTA) on 11 NLP tasks #SEJThinktank @dawnieando
  82. Polysemy & Homonyms #SEJThinktank @dawnieando
  83. Coreference resolution #SEJThinktank @dawnieando
  84. Pronouns can be problematic #SEJThinktank @dawnieando
  85. Computer programs lose track of who is who easily – I'm confused… Here… Have some flowers instead #SEJThinktank @dawnieando
  86. Anaphora & Cataphora #SEJThinktank @dawnieando
  87. Named Entity Determination #SEJThinktank @dawnieando
  88. Named Entity Recognition is NOT Named Entity Disambiguation (see the NER sketch after the transcript) #SEJThinktank @dawnieando
  89. Named entities can be polysemic #SEJThinktank @dawnieando
  90. Did you mean? • Amadeus Mozart (composer) • Mozart Street • Mozart Cafe #SEJThinktank @dawnieando
  91. AND VERBALLY… WHO (WHAT) ARE YOU TALKING ABOUT? "LYNDSEY DOYLE" OR "LINSEED OIL"? #SEJThinktank @dawnieando
  92. #SEJThinktank @dawnieando
  93. #SEJThinktank @dawnieando
  94. Textual Entailment (Next sentence prediction) #SEJThinktank @dawnieando
  95. BERT can identify which sentence likely comes next from two choices (see the next-sentence-prediction sketch after the transcript) #SEJThinktank @dawnieando
  96. OFTEN THE NEXT SENTENCE REALLY MATTERS #SEJThinktank @dawnieando
  97. I Remember When My Grandad Kicked The Bucket • BERT is able to understand the NEXT sentence • The NEXT sentence here provides the context #SEJThinktank @dawnieando
  98. "How far do you reckon I could kick this bucket?" #SEJThinktank @dawnieando
  99. How BERT will impact search #SEJThinktank @dawnieando
  100. BERT will help Google to better understand human language #SEJThinktank @dawnieando
  101. More able to scale conversational search (Pygmalion alternative) #SEJThinktank @dawnieando
  102. Expect big leaps for international SEO #SEJThinktank @dawnieando
  103. Google will be better able to understand 'contextual nuance' & ambiguous queries #SEJThinktank @dawnieando
  104. Should you try to (or can you) optimize your content for BERT? #SEJThinktank @dawnieando
  105. PROBABLY NOT #SEJThinktank @dawnieando
  106. 'Bertology' – 'The study of why BERT does things' (Hugging Face) #SEJThinktank @dawnieando
  107. BLACK BOX ALGORITHMS #SEJThinktank @dawnieando
  108. Layers Everywhere #SEJThinktank @dawnieando
  109. BERT by name but not by nature #SEJThinktank @dawnieando
  110. More efficient BERTs have been developed • DistilBERT • ALBERT • Fast BERT #SEJThinktank @dawnieando
  111. ALBERT • BERT's successor from Google • Joint work between Google Research & Toyota Technological Institute #SEJThinktank @dawnieando
  112. Algorithmic Bias Concerns • Ricardo Baeza-Yates' work – Bias on the Web • NoBIAS Project • IBM initiatives to prevent bias • BERT does not know why it makes decisions • BERT is considered a 'black box' algorithm • Programmatic bias is a concern • The Algorithmic Justice League is active #SEJThinktank @dawnieando
  113. #SEJThinktank @dawnieando
  114. Keep in Touch @dawnieando @BeBertey #SEJThinktank @dawnieando
  115. References • Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250. • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). #SEJThinktank @dawnieando
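
Code sketches referenced in the transcript

The part-of-speech example on slides 38–39 uses the Stanford POS Tagger. Here is a minimal sketch of the same idea, assuming NLTK is installed (the Stanford tool from the slide is not required): the surface word "like" receives different tags depending on its role in the sentence.

```python
import nltk

# Resource names can differ slightly across NLTK versions
# (newer releases use "punkt_tab" / "averaged_perceptron_tagger_eng").
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I like places like this")
print(nltk.pos_tag(tokens))
# "like" is typically tagged once as a verb (VBP) and once as a preposition (IN),
# i.e. the same word plays two different parts of speech in one short sentence.
```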
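Slide 52 states the classic word-embedding relation "king - man + woman = queen". A minimal sketch, assuming gensim and its downloadable pretrained GloVe vectors (the deck does not name a specific model):

```python
import gensim.downloader as api

# Pretrained static word embeddings (a sizeable download on first use).
vectors = api.load("glove-wiki-gigaword-100")

# Vector arithmetic: start at "king", subtract the "man" direction, add "woman".
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" is usually the nearest word, showing how embeddings encode
# relations between words as directions in vector space.
```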
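Slide 67's table of training samples comes from a moving context window. A minimal sketch that reproduces those (target, context) pairs; the window size of 3 and whitespace tokenisation are simplifying assumptions:

```python
def context_pairs(text, window=3):
    """Generate (target, context) pairs from a moving context window."""
    tokens = text.lower().split()
    pairs = []
    for i, target in enumerate(tokens):
        # Look at up to `window` words on either side of the target word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(context_pairs("The quick brown fox jumps over the lazy dog")[:7])
# [('the', 'quick'), ('the', 'brown'), ('the', 'fox'),
#  ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ('quick', 'jumps')]
```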
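Slides 75–76 describe masked language modelling: the target word is hidden so it cannot "see itself", and BERT predicts it from the words on both sides. A minimal sketch using the open-source bert-base-uncased checkpoint via Hugging Face transformers (an assumption; this is not Google's production system):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("He deposited the cheque at the [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
# "bank" ranks at or near the top here; rewrite the surrounding words to be about
# rivers and the predictions shift to the other sense, which is exactly the
# bi-directional context the deck is describing.
```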
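Slide 88 distinguishes named entity recognition from named entity disambiguation. A minimal recognition-only sketch, assuming spaCy and its small English model are installed (python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We had coffee at the Mozart Cafe on Mozart Street before the Mozart concert.")

# Recognition labels the spans, but nothing here decides WHICH Mozart is meant
# (composer, street, or cafe); that is the separate disambiguation problem.
print([(ent.text, ent.label_) for ent in doc.ents])
```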
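Slide 95 says BERT can identify which sentence likely comes next, and slides 97–98 show why the next sentence matters for "kicked the bucket". A minimal sketch of the next-sentence-prediction head via Hugging Face transformers (an assumption; the deck shows no code):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "I remember when my grandad kicked the bucket."
sentence_b = "How far do you reckon I could kick this bucket?"
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 scores "B follows A", index 1 scores "B does not follow A".
print(torch.softmax(logits, dim=1))
```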
