Disambiguating Equiprobability in SEO Dawn Anderson Friends of Search 2020

Connecting the probable dots in content and data can significantly improve your search strategy. Ambiguity in SEO comes in many forms, going beyond content and into entities and locations. This talk touches on some of the areas where ambiguity can impact and hinder your performance.

Published in: Marketing

  1. Connecting The Probable Dots: Disambiguating strategies for entity-oriented search and natural language understanding. Dawn Anderson @dawnieando #FOS20
  2. The probability of rolling an even number, or of rolling an odd number, is 1 in 2
  3. This is ‘Equiprobability’ - Both odds and evens are equiprobable outcomes
  4. “Assigns equal probabilities to outcomes when they are judged to be equipossible” (Wikipedia)
  5. In URLs, content, locations & entities equiprobability can be problematic
  6. This could be in The Web of Documents (content) or The Web of Data (entities, including locations)
  7. Where two or more domain assets, surface forms, or entity determinations are considered equal, a representative is ‘the chosen one’ (canonical)
  8. Equiprobability is like ‘See-saw SEO’
  9. Nearest Neighbours
  10. All the pluralities are clustered together (neighbours) and one is picked
  11. Laplace’s Principle of Indifference
  12. Where there is equipossibility there is a need for further confirmations beyond the content itself
  13. One must beat the other
  14. Sometimes ‘bits’ are picked from multiple ‘equiprobable outcomes’
  15. Today’s Topic: Disambiguating equiprobability for improved search performance
  16. WHO IS DAWN ANDERSON: @dawnieando; dawn.anderson@move-it-marketing.co.uk; move-it-marketing.co.uk; linkedin.com/in/msdawnanderson; Meet Bert & Tedward; @dawnieando #FOS20
  17. In an ideal world ‘precision’ and ‘recall’ would be perfect… but… it’s NOT
  18. So search engines must also work off probability determination
  19. Using ‘The Law of Large Numbers’ (Educated guesses) – Progressively learning
  20. Google Uses (some) Machine Learning for Crawling, Indexing & Ranking
  21. On the ‘long tail’ there’s often very little, or nothing in it
  22. Low to no page-rank pages (but that is also now just one of very many things)
  23. See-saw SEO
  24. When the canonical balance tips from equiprobability slightly
  25. Your ranking flux might well be transferring equiprobability
  26. Several types of equiprobability ambiguity may cause your project to perform less well overall in search
  27. Different types of ambiguity can impact SEO: Exact duplicate content; Near duplicate content; Natural language ambiguity; Generational cruft based ambiguity; Machine learned ambiguity ‘lag’; Dot-to-dot ambiguity; Semantic heterogeneity; Location based ambiguity
  28. Exact duplicate content
  29. Google probably chooses the one with highest ‘probability’: Most linked to internally; Most linked to externally; Included in XML sitemap; HTTPS; Prettier URLs; Important parent
  30. Maybe one or several of many ‘importance’ factors
  31. Google works this out VERY quickly
  32. Near-duplicate content
  33. It starts off well enough… but over time
  34. Google realizes these pages are ‘mostly’ the same as others
  35. Some reasons: Many matching shingles; Quilting; ‘Borrowed’ content at scale; Feeds; Big Boilerplate / little main value; Data driven sites with little value
  36. Sometimes these are sites trying to make a URL footprint well beyond their contribution to a positive ‘Network Effect’
  37. Spoofing the ‘Network Effect’
  38. After some crawling & sampling Google puts the pieces of the puzzle together
  39. And begins to exclude pages from the index
  40. Exclusion still happens… A LOT
  41. I ran a little Twitter competition with only empathy as the prize
  42. Some of the GSC ‘exclusion’ numbers were huge… millions
  43. And even over… one billion
  44. Crawl demand probably follows the Pareto Principle
  45. Probably 20% of the URLs satisfy 80% of the demand & contribute well to ‘The Network Effect’
  46. Google will likely use mostly ‘Sampling’ on those heavy ‘excluded URLs’
  47. Looking for small samples of content & URL patterns… and… ‘Discovered, not crawled’ will be high
  48. ‘Discovered, not crawled’ in GSC is Google saying…
  49. They know what’s down that site section or URL parameter path
  50. The verdict is in
  51. Those URLs will be ‘starved’ of crawl
  52. However many times you submit and inspect
  53. Natural language ambiguity
  54. For machines words are problematic. Ambiguous… polysemous… synonymous
  55. In spoken word it is even worse because of homophones and prosody
  56. Like “four candles” and “fork handles”
  57. Word’s context (company) helps enormously
  58. Words that are related live near each other in mathematical spaces
  59. “You shall know a word by the company it keeps” (Firth, 1957)
  60. Relatedness: First level relatedness; Second level relatedness
  61. Co-occurrence in content helps A LOT: In the page itself (1st level); In the interconnected pages (2nd level); In the subcategorization (2nd level); In the site sections (2nd level); In the external relationships (2nd level); In the domain ontology as a whole (2nd level)
  62. Informational content adds to the contextual 2nd level relatedness of transactional content (diagram labels: Transactional content; Informational content)
  63. If you can help with ‘informational needs’ you can ‘probably’ help with transactional needs: Add value in rich informational needs; Contextual value passes throughout the whole site
  64. Adding Contextual Value Does NOT Mean: Adding loads of content below ecommerce pages; Adding ‘LSI keywords’ in ecommerce pages
  65. High topical relatedness & context vectors: Internal linking; Not spam
  66. Utilise conceptual ‘nearest neighbours’ well
  67. Utilise ‘Overflow SEO’ as demand & content naturally grows
  68. Subcategories are one of the Kings
  69. Merge content which is ‘too’ conceptually very similar with no separate demand
  70. Google may realise you can ‘probably’ help with transactional queries
  71. Dot-to-dot ambiguity
  72. Who knows what this dot to dot puzzle is creating?
  73. Absolutely no-one
  74. If you chop away or migrate half of your website then equiprobability will likely be a problem
  75. A half drawn website is not the same as a fully drawn previous website?
  76. Although you may rank ‘marginally’ better for fewer things: Your domain ‘topic’ became overall more about some topics & less about other topics
  77. Informational content adds to the contextual 2nd level relatedness of transactional content (diagram labels: Informational content; Transactional content)
  78. Prune, Merge, Improve, Archive: 01 Prune – Only ballast; 02 Merge – Highly semantically related content; 03 Improve – Evergreen pages worth the effort; 04 Archive – Older pieces of temporal content (like a library)
  79. Avoid removing or hiding user generated content (except spam)
  80. Cruft based ambiguity
  81. Generational cruft contributes to equiprobability issues
  82. Just a few examples… legacy is a problem: 301 redirect chains; Inconsistent 301s; No 301s at all; Canonical points to a 301; Canonical points to a no-index; No-index points to a canonical
  83. 301 Should Mean The Resource Moved
  84. You’re saying things (tokens (words), topics, concepts, entities) moved so they can be re-filed
  85. But often there is little or no match at all
  86. Borderline or true ‘soft 404’
  87. Also, any legacy canonicals are now defunct by the redirect status
  88. Cruft & migrations are like a game of rugby as the ball (signals) is passed between URLs
  89. A sign of this is a temporarily wrong target ranking
  90. One URL is going up. One is going down as signals pass from one to the other over time – See-saw SEO
  91. Machine learned ambiguity ‘lag’
  92. ALL of the above ==
  93. Crawl Budget Woes / Learned Quality Patterns
  94. Your reputation precedes you – You are being judged on the past
  95. Your quality & crawl will be ‘machine learned’ by ‘Large Numbers’ over time
  96. Probably you’ll struggle to get enough crawl for Google to catch up
  97. You’ve probably got URLs which have not been crawled for years
  98. Search engines realise there is no ‘demand’ for your ‘poopy pages’
  99. Your performance will be based on what is indexed
  100. Fear not… Small WordPress site
  101. You need to lure a Grumpy Googlebot with tasty quality content morsels
  102. You MUST substantively change the ratio of poor quality to high quality in the site as a whole & add more value OVERALL
  103. Regain the crawl and get more demand
  104. Watch for patterns (clues) in GSC Coverage & take a demand-driven approach
  105. In transactional pages identify valuable content, features & attributes your audience wants
  106. Pro-tip: Adding moooaaarrr words to transactional pages does not always add much value
  107. Plus… There is often a mathematically natural ordered pattern to things ranked by frequency
  108. Like word frequencies, many types of ‘ordered’ popularities will have a Zipfian Distribution
  109. Where the frequency of the item ranked n is inversely proportional to its frequency table rank – 1/n
  110. Zipfian Distribution occurs in many other rankings unrelated to language: Population of cities in a country; Corporation sizes; Income rankings; No. of people watching same TV channel
  111. Nurture internal link graph popularity that mirrors ‘real life popularity’
  112. Popularity & Zipfian Distributions: Is it really popular in ‘real life’? Does it follow ‘The Network Effect’?
  113. Or are you manipulating it for things you want to rank for (which aren’t really popular at all)?
  114. Building genuine topical hubs (Hub & spoke) (Strongly connected components)
  115. Google puts the pieces of the puzzle together slowly but increases with quality improvements
  116. Equipossibility goes well beyond duplicate content too
  117. Semantic heterogeneity
  118. Mis-matching data types or equivalencies in data tables from same or other domains
  119. Linked Data has been around for a long time
  120. There have historically been several ways to implement linked data
  121. Some Types of Semantic Web Technologies & Their Markups: RDF; SPARQL; OWL; SKOS; RDFa; JSON-LD; Microdata
  122. Several linked-data types & web sources come together causing equipossibility
  123. Implementation Inconsistencies Prevail (ed)
  124. The Knowledge Graph & Its Data Sources
  125. The majority of Google’s entity types reference the Wikipedia URI or the Knowledge Graph MID
  126. But Now Anything Structured Might Be Used
  127. We already know conversation search fills gaps via web information
  128. The web is a ‘Data Lake’
  129. Public Datasets (e.g. Kaggle) > 27k datasets
  130. Enter Data Search by Google AI
  131. Disambiguating Data Sets (is hard)
  132. I asked for some examples on Twitter
  133. Example: Philosophers’ Dates of Birth different in Wikidata & DBpedia
  134. The community responded
  135. ‘Mashed-up’ Knowledge Graphs (Two Don Cherry’s and his dog’s name - ‘Blue’)
  136. Image Search Got Don Cherry’s Dog Right
  137. Wrong price included in images from other sites
  138. Images from Direct Competitors
  139. Edgar’s Were Comical
  140. Van Diesel – Death – To Be Advised??
  141. Does An Ice Cube Have Parents?
  142. Teenage Mutant Ninja Turtles paint works of art
  143. Wikipedia Made Their Point Well
  144. A six legged horse?
  145. Some Reputable Knowledge Repositories are Just Plain ‘Wrong’
  146. “Simply saying that a horse has five legs doesn't make it true – calling a horse's tail a leg does not make it one.” (Wikipedia, 2019)
  147. Equivalency between data sources is sought
  148. Efforts Are Underway to Resolve: Since 2011 major search engines have been focusing on Schema commonly
  149. Google Deprecating Data-vocabulary Schema
  150. Your Breadcrumbs Markup May Be Impacted
  151. Entities often have many surface forms
  152. Schema:SameAs and OWL:SameAs
  153. SameAs Seems To Be Effective
  154. BE consistent in ALL mentions of your brand name & ALL THINGS GENERALLY
  155. Utilise ‘known’ (popular) public datasets & knowledge repositories
  156. Avoid conflict with ‘known’ (popular) public datasets & ‘knowledge repositories’
  157. Realise well known knowledge repositories may be wrong
  158. But they may conceptually still be ‘right’ – They will PROBABLY win
  159. Location based ambiguity
  160. Semantic Heterogeneity is a problem locally
  161. Entity Address (location) is a VERY Big Problem with entity-oriented search: street number; Locality - city or town; street/route name, if detected; postal code, if detected; country, if detected; broad_region - administrative area, such as the state, if detected; Narrow_region - smaller administrative area, such as county, if detected
  162. Narrow_Region is VERY Problematic: Inconsistent data from knowledge repositories; Inconsistent data from websites; Inconsistent data from searchers; HUGELY SPLIT EQUIPROBABILITY
  163. Which County is Bristol in?
  164. Bristol is BOTH a county and a city
  165. In Scotland and Wales it’s worse (They work off a different regional division system)
  166. Google Does Not Understand the Word ‘Historic’
  167. What about counties that don’t even exist any more?
  168. At least this one mentions London as well as Middlesex
  169. Everyone... Is confused
  170. Older humans probably keep ambiguous county + town combinations alive
  171. In location focused pages adding a postcode could be much more valuable than adding lots of words
  172. Contextual relatedness in local spaces is proximity-based
  173. AreaServed Schema & Internal linking by Proximity (e.g. lat (x) & long (y)) is powerful
  174. Google sometimes puts the pieces of the puzzle together sporadically
  175. Align ALL the dots
  176. Remember: Consistency is one of the Kings
  177. Leave no room for ambiguity
  178. Keep in Touch: @dawnieando; @BeBertey
  179. Some Take Aways
  180. Go • Watch for ‘impact’ patterns to maximise upon in GSC ‘coverage’ reports; Optimise • Be careful you don’t prune away second level relatedness signals; Map • Consider Pareto’s Law when identifying issues to fix in GSC coverage; Optimise • Check for ‘wrong page ranking’ due to cruft & lag passing signals; Get • On location sites consider internal linking on proximity; Identify • Add value to improve on ‘The Network Effect’ in attributes & informational content; Use • Make it obvious you can ‘probably’ help with transactional needs through other content
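
Slide 35 names "many matching shingles" as one reason pages end up being treated as near duplicates. The idea can be sketched as below. This is only an illustration under stated assumptions: the shingle size of three words, the use of Jaccard similarity, the function names and the sample page texts are all hypothetical choices, and nothing here describes how Google actually measures duplication.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical page texts: two location variants and one unrelated guide.
page_a = "red widgets for sale in london with free delivery"
page_b = "red widgets for sale in bristol with free delivery"
page_c = "how to choose the right widget for your garden"

sim_ab = jaccard(shingles(page_a), shingles(page_b))  # location variants
sim_ac = jaccard(shingles(page_a), shingles(page_c))  # unrelated content

print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

Swapping a single place name still leaves the two location pages sharing a large fraction of their shingles, while the informational page shares none. That is the kind of pattern the "Big Boilerplate / little main value" bullet on the same slide warns about.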

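Slides 107 to 109 describe a Zipfian pattern in which an item's frequency is inversely proportional to its rank (1/n). A minimal sketch of what that implies for how demand concentrates at the top is below. The 100-item list, the exponent of 1 and the function name are assumptions made for illustration, not figures from the talk.

```python
def zipf_shares(n_items, s=1.0):
    """Share of the total for each rank 1..n_items, with weight 1 / rank**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed example: 100 URLs ranked by popularity, pure 1/n weighting.
shares = zipf_shares(100)

# How much of the total falls on the top 20% of ranks.
top_20_share = sum(shares[:20])

print(f"rank 1 share: {shares[0]:.3f}")
print(f"rank 2 share: {shares[1]:.3f}")
print(f"top 20 ranks: {top_20_share:.3f}")
```

Under these assumptions the top 20 ranks take roughly 69% of the total. That is in the same spirit as the 80/20 split suggested on slide 45, though not identical to it, and real popularity data will rarely follow 1/n exactly.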