
You Don't Know SEO

Mike King examines the state of the SEO industry and talks through how knowing information retrieval will improve our understanding of Google. This talk debuted at MozCon.


  1. 1. Download! http://bit.ly/ydkseo
  2. 2. inspired by @getify… This talk is inspired by a series of books called “You Don’t Know JavaScript” by Kyle Simpson
  3. 3. We open to our hero pondering the ramifications of search.
  4. 4. The Industry is so often wrong about how search works.
  5. 5. We can’t really listen to webmaster trends analysts anymore
  6. 6. At this moment our hero began to miss Matt Cutts.
  7. 7. THE HIERARCHY OF SEARCH: SEO → Google → Information Retrieval. We spend most of our time at the SEO level of abstraction.
  8. 8. WHO LIVES WHERE? SEO: us and the Webmaster Trends Analysts. Google: search engineers. Information retrieval: computer scientists.
  9. 9. We act more like users than the search engineers that we are.
  10. 10. VOCABULARY IS A KEY DISCONNECT: scoring functions, query processing, query understanding, evaluation, click logs, document statistics, ranking adjustment, retrieval models, relevance feedback. Sometimes Google is quite explicit, just in terminology that we don’t know or understand.
  11. 11. BEING EFFECTIVE AT MODERN SEO IS ABOUT SIX THINGS: mobile-first (accessibility driven by the mobile user experience), structured data (positioning content to be extracted easily), speed (making the experience as fast as possible), content (hyper-targeted and optimized for specific user contexts), authority (building credibility for your content), and integrated search (making sure the different parts of the SERP work together).
  12. 12. You don’t know information retrieval
  13. 13. THE BASIC FORM OF AN INFORMATION RETRIEVAL SYSTEM (diagram: the user and visual interface, query operations and text operations, indexing and the text index, then searching and ranking)
  14. 14. Indexing Process per @tcmg
  15. 15. Google adheres very closely to HTTP response codes.
  16. 16. Do you know about the 304 response code?
  17. 17. Use the 304 response code to manage your crawl allocation.
  18. 18. According to Botify’s recent study, you’re not using 304s
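     A minimal sketch of what honoring If-Modified-Since and returning a 304 can look like at the application level, assuming a Python/Flask app with a hypothetical last-modified lookup (the route, dates, and body are illustrative):

     from datetime import datetime, timezone
     from email.utils import format_datetime, parsedate_to_datetime

     from flask import Flask, request

     app = Flask(__name__)

     # Hypothetical lookup: when this URL's content last changed.
     LAST_MODIFIED = {"/example-article": datetime(2018, 1, 15, tzinfo=timezone.utc)}

     @app.route("/example-article")
     def example_article():
         last_modified = LAST_MODIFIED["/example-article"]
         ims = request.headers.get("If-Modified-Since")
         if ims and parsedate_to_datetime(ims) >= last_modified:
             # Nothing changed since the crawler's last visit: answer with an
             # empty 304 instead of re-sending the full body.
             return "", 304
         response = app.make_response("<html>...full page body...</html>")
         response.headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
         return response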
  19. 19. Little known fact: you can request a crawl increase here after big changes
  20. 20. Quick Notes on JavaScript and SEO.
  21. 21. THE WEB RENDERING SERVICE uses Chrome 41. It doesn’t support everything the latest Chrome does.
  22. 22. For auditing JavaScript sites, look at the hash in Screaming Frog.
  23. 23. If you want to get fancy, crawl how Google crawls. https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921
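     The hash comparison is easy to reproduce outside of a crawler. The sketch below hashes the raw HTML versus the rendered DOM; it uses Playwright for Python as a stand-in for the Puppeteer approach linked above (the URL is a placeholder, and a modern headless Chromium supports more than the Chrome 41 renderer from slide 21):

     import hashlib

     import requests
     from playwright.sync_api import sync_playwright

     URL = "https://example.com/some-page"  # placeholder URL

     # Hash of the raw HTML, roughly what a non-rendering crawler sees.
     raw_html = requests.get(URL, timeout=30).text
     raw_hash = hashlib.md5(raw_html.encode("utf-8")).hexdigest()

     # Hash of the DOM after JavaScript runs, roughly what a rendering crawler sees.
     with sync_playwright() as p:
         browser = p.chromium.launch()
         page = browser.new_page()
         page.goto(URL, wait_until="networkidle")
         rendered_hash = hashlib.md5(page.content().encode("utf-8")).hexdigest()
         browser.close()

     # If the hashes differ, JavaScript is materially changing the page content.
     print(raw_hash, rendered_hash, raw_hash == rendered_hash)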
  24. 24. You can even watch a Googler teach you to scrape Google with it.
  25. 25. @justinrbriggs has a good step by step on how to do this. https://www.briggsby.com/auditing-javascript-for-seo/
  26. 26. Dynamic serving can still have pitfalls. Progressive enhancement is still the only way to be sure.
  27. 27. The missing piece: processing. Content must be processed before it can be indexed.
  28. 28. Feature Extraction: Google extracts features of the page to inform indexing.
  29. 29. Google’s Version. Source: Multi-stage query processing system and method for use with tokenspace repository
  30. 30. THE INDEX IS NOT ONE THING
  31. 31. INVERTED INDEX: the “index” is really just a list of keywords, document IDs and their scores…
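     As a toy illustration (the terms, document IDs, and scores are made up), an inverted index is just a postings structure you can intersect:

     # Each term maps to the documents that contain it plus a per-document score.
     inverted_index = {
         "technical": {"doc_12": 0.82, "doc_47": 0.31},
         "seo":       {"doc_12": 0.91, "doc_47": 0.66, "doc_98": 0.12},
     }

     # Retrieval for the query "technical seo" becomes set intersection plus score
     # math over postings, not a scan of full documents.
     candidates = set(inverted_index["technical"]) & set(inverted_index["seo"])
     print(candidates)  # {'doc_12', 'doc_47'}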
  32. 32. Doc Server Structure The Docserver is probably what you’re actually thinking of
  33. 33. The cache is the latest version of the page from the doc server.
  34. 34. This is how the “Document Scoring based on document content update” capability can work
  35. 35. THE INDEX IS NOT TWO THINGS
  36. 36. Search is Powered by 200+ microservices
  37. 37. How ranking basically works
  38. 38. 10 Second edit of Paul Haahr & Jeff Dean videos
  39. 39. RETRIEVAL MODELS, AKA RANKING ALGOS (diagram: an information need becomes a query model q, a document becomes a document model d, and a retrieval function R(q, d) produces the ranked output)
  40. 40. Basic Rank Scoring Summation of all terms
  41. 41. Basic Rank Scoring Weight of query term
  42. 42. Basic Rank Scoring Document Term Weight
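     Slides 40-42 label the pieces of the classic vector-space form of that score. As an equation (standard IR notation, not Google's):

     \mathrm{score}(q, d) = \sum_{t \in q} w_{t,q} \, w_{t,d}

     The sum runs over every term t in the query; w_{t,q} is the weight of the term in the query and w_{t,d} is its weight in the document.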
  43. 43. Ranking Models
  44. 44. There are many Ranking Models
  45. 45. Bing’s rank scoring Model Just in case anyone cares…
  46. 46. Bing’s rank scoring Model As in Why The F**K are people using bing!
  47. 47. OH SNAP!
  48. 48. Google’s Scoring Functions: Google attempts multiple different “scoring functions”. Source: Framework for evaluating web search scoring functions
  49. 49. Post-retrieval Adjustment: Google often adjusts rankings before presenting them. Source: Phrase-based indexing in an information retrieval system
  50. 50. Query processing Source: Multi-stage query processing system and method for use with tokenspace repository
  51. 51. Query Adjustment – Hummingbird? Google may also revise your query internally to serve better results. Source: Framework for evaluating web search scoring functions
  52. 52. Multi-modal search Google uses implicit signals as well as the explicit query to show you what ranks.
  53. 53. User Modeling Google builds models of you based on your search history.
  54. 54. Your search history is stored next to additional data that Google derives
  55. 55. Queries become Entities Google breaks your queries down into entities first before determining what the result set is. This allows them to use the context of these entities to show more relevant results
  56. 56. 32 Word Query Limit This indicates that the inverted index does not store results for higher than a 32-gram.
  57. 57. So with all this complexity, how can we beat the bots?
  58. 58. By far one of the greatest blog posts ever written on seo.
  59. 59. Keyword Usage: as you might imagine, leveraging text statistics to inform rankings indicates that you might want to use the keywords on the page. This is where having the opportunity to rank begins.
  60. 60. Tokenization Search Engines break paragraphs into sentences and sentences into “tokens” or individual words. This better positions content for statistical analysis
  61. 61. N-Grams: n-grams are phrases of length n
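     A minimal sketch of both ideas (a whitespace tokenizer is used for illustration; real systems also handle punctuation, casing, and language-specific rules):

     def tokenize(text):
         # Naive tokenizer: lowercase and split on whitespace.
         return text.lower().split()

     def ngrams(tokens, n):
         # Sliding window of n consecutive tokens.
         return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

     tokens = tokenize("Search engines break sentences into tokens")
     print(ngrams(tokens, 2))
     # [('search', 'engines'), ('engines', 'break'), ('break', 'sentences'), ...]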
  62. 62. Google made their entire n-gram dataset available in 2006
  63. 63. The n-gram viewer is this concept used on books
  64. 64. Synonyms & Close Variants Statistical relevance is computed using both synonyms and close variants of words.
  65. 65. Stemming Search Engines break words into their stems to better determine relationships and relevance
  66. 66. Lemmatization: similar to stemming, lemmatization is the grouping of inflected forms of the same word or idea. For example, “better” and “good” have the same lemma.
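     A quick sketch of the difference using NLTK (assumes pip install nltk and nltk.download("wordnet"); the example mirrors the “better”/“good” case above):

     from nltk.stem import PorterStemmer, WordNetLemmatizer

     # Stemming chops words down to a crude root form.
     stemmer = PorterStemmer()
     print(stemmer.stem("running"), stemmer.stem("rankings"))  # run rank

     # Lemmatization maps inflected forms to a dictionary lemma.
     lemmatizer = WordNetLemmatizer()
     print(lemmatizer.lemmatize("better", pos="a"))  # good (same lemma as "good")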
  67. 67. Semantic Distance & Term Relationships Search engines look for how physically close words are together to better determine their relationship in statistical models.
  68. 68. Zipf’s Law: Zipf’s law says that word frequencies in a corpus (or document set) follow a predictable distribution, with a word’s frequency roughly inversely proportional to its rank.
  69. 69. Zipf’s Law applied: when run on an actual dataset, Zipf’s law tends to hold up in high rankings.
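     A rough way to eyeball this on your own corpus (corpus.txt is a hypothetical plain-text dump; under Zipf's law, rank times frequency stays roughly constant for the top terms):

     from collections import Counter

     tokens = open("corpus.txt", encoding="utf-8").read().lower().split()
     for rank, (word, freq) in enumerate(Counter(tokens).most_common(20), start=1):
         print(rank, word, freq, rank * freq)  # the last column should be roughly flat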
  70. 70. Latent Semantic Analysis Latent Semantic analysis represents content as a matrix to determine relationships
  71. 71. Tf-Idf Term frequency, inverse document frequency (TF-IDF) identifies the importance of keywords with respect to other keywords in documents in a corpus.
  72. 72. Ryte has a great guide on tf*IDF https://en.ryte.com/lp/tf-idf/
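     A minimal scikit-learn sketch of the same idea (the three documents stand in for the pages ranking for your keyword):

     from sklearn.feature_extraction.text import TfidfVectorizer

     docs = [
         "technical seo for large sites",
         "content strategy for large sites",
         "technical seo and information retrieval",
     ]

     vectorizer = TfidfVectorizer()
     matrix = vectorizer.fit_transform(docs)

     # Terms that appear in most documents ("for", "sites") get a low idf;
     # distinctive terms ("retrieval") score high where they appear.
     # (Use get_feature_names() on scikit-learn versions older than 1.0.)
     print(dict(zip(vectorizer.get_feature_names_out(), matrix.toarray()[2].round(2))))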
  73. 73. PHRASE-BASED INDEXING & CO-OCCURRENCE: Google specifically looks for keywords that co-occur with a target keyword in a document set. Usage of those keywords is an indication of relevance for subsequent documents.
  74. 74. Phrase-based indexing
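     A rough sketch of the co-occurrence idea: count which terms show up alongside a target keyword across a set of top-ranking documents (the token lists here are made up):

     from collections import Counter

     target = "seo"
     docs = [
         ["technical", "seo", "audit", "crawl"],
         ["seo", "crawl", "budget", "logs"],
         ["technical", "seo", "crawl", "rendering"],
     ]

     co_occurrence = Counter()
     for tokens in docs:
         if target in tokens:
             co_occurrence.update(t for t in set(tokens) if t != target)

     print(co_occurrence.most_common(3))  # e.g. [('crawl', 3), ('technical', 2), ...]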
  75. 75. Heaps’ Law on Vocab Growth: Heaps’ law indicates that vocabulary within a corpus grows at a predictable rate
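     The usual formulation (standard IR notation, not from the deck):

     V(n) = K n^{\beta}

     where V(n) is the number of distinct terms seen after n total tokens, and K and beta are corpus-dependent constants (beta is typically around 0.4-0.6 for English text).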
  76. 76. Entity Salience Source: Techniques for automatically identifying salient entities in documents
  77. 77. Hidden Markov Model: Hidden Markov Models allow search engines to extract implicit entities
  78. 78. PAGE SEGMENTATION: modern search engines assess the prominence of content and review the visual hierarchy to determine how valuable the content is to the page.
  79. 79. This All Boils down to fulfilling the expectations of users and search engines
  80. 80. Unfortunately, this is a place where most seo tools are very far behind the curve.
  81. 81. You can use Knime to really dig into text analytics https://www.knime.com/software
  82. 82. They have great examples of how to use KNIME’s text processing plugin https://www.knime.com/term-coocurrence-heatmap-example
  83. 83. 1. Put your domain into SEMrush
  84. 84. Integrated Search is the way to influence this 2. See what you don’t rank well for in Organic Research
  85. 85. Integrated Search is the way to influence this 3. Pick a keyword
  86. 86. 4. Look at the SERP
  87. 87. Really Google?
  88. 88. This is the page that ranks #86?!
  89. 89. 5. See what else could rank for our site.
  90. 90. Enter Ryte’s Content Success
  91. 91. 6. Put in the keyword and compare tf-idf to ranking pages
  92. 92. You can see the corpus they pulled for review
  93. 93. 7. Put the copy in. See what co-occurring keywords are missing and make edits.
  94. 94. Searchmetrics content optimization https://www.searchmetrics.com/ content/optimization/ Searchmetrics’ tool also accounts for entities and readability.
  95. 95. Does Google Use Dwell Time?
  96. 96. Information Retrieval Evaluation Measures: CTR and Session Success Rate are best practices for “evaluating” the performance of rank scoring models. Google isn’t lying, just being specific.
  97. 97. Does Google Use CTR?
  98. 98. QUERY & Click LOGS
  99. 99. What’s in a Query Log Source: Systems and methods for generating statistics from search engine query logs
  100. 100. QUERY LOGS ARE NOISY
  101. 101. The Time-Based Ranking patent says they do
  102. 102. Machine Learning Response Variable
  103. 103. Integrated Search is the way to influence this
  104. 104. Compelling meta descriptions
  105. 105. Integrated Search is all about combining data from both channels: AdWords, Search Console, SEMrush, Analytics, business rules, and the keyword portfolio…
  106. 106. TIE TOGETHER ANYTHING RELEVANT ON THE KEYWORD LEVEL: keyword portfolio data, Search Console data, landing page analytics data, SEMrush data, AdWords data, Searchmetrics data, CRM data
  107. 107. DEVELOP A SERIES OF BUSINESS RULES if (Organic Impressions + Paid Impressions) < .3(Search Volume) and LP Conv% > Avg LP Conv% then Increase Bids. If (LP Conv% > Baseline Conversion Rate) & Keyword Difficulty < 50 then Increase SEO Effort If (LP Conv% < Baseline Conversion Rate) and QS < Avg QS then Improve Landing Page If Organic Ranking > 10 and Paid Ranking < 1 and LP Conv% > Baseline Conversion Rate then Increase SEO Effort If Organic Landing Page != Paid Landing Page and Paid Landing Page has a high QS then optimize your internal linking structure IF Need State = Action and LP Conv% < Avg LP Conv% then Improve Landing Page
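     A sketch of what the first two rules look like once the keyword-level data is joined (field names are illustrative, not from any particular tool):

     def recommend(row, avg_lp_conv):
         actions = []
         total_impressions = row["organic_impressions"] + row["paid_impressions"]
         # Rule 1: under-exposed keyword with an above-average landing page -> bid up.
         if total_impressions < 0.3 * row["search_volume"] and row["lp_conv"] > avg_lp_conv:
             actions.append("Increase bids")
         # Rule 2: landing page beats baseline and the keyword isn't too hard -> more SEO effort.
         if row["lp_conv"] > row["baseline_conv"] and row["keyword_difficulty"] < 50:
             actions.append("Increase SEO effort")
         return actions

     row = {"organic_impressions": 1200, "paid_impressions": 300, "search_volume": 8000,
            "lp_conv": 0.051, "baseline_conv": 0.03, "keyword_difficulty": 38}
     print(recommend(row, avg_lp_conv=0.04))  # ['Increase bids', 'Increase SEO effort']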
  108. 108. Structured Data is not just this.
  109. 109. What kinda serp feature is that?
  110. 110. That’s just a table, but the relationships are clear.
  111. 111. Embrace all relevant semantic structural opportunities even if they don’t show something in the SERPs
  112. 112. PageRank is still the measure, but I wonder if we ever truly understood it.
  113. 113. Ranking search results based on anchors
  114. 114. Google generates quotes from the linking page indicating that they want parity on both sides.
  115. 115. Internal link building is one of the most valuable things you can do. Especially on large sites.
  116. 116. More internal links lead to more crawl activity too
  117. 117. Compute your internal PageRank in R
  118. 118. Or get you an SEO tool that can do it.
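     The slide points at an R approach; here is an equivalent sketch in Python with networkx, assuming edges.csv is a two-column export of internal links (source URL, target URL), e.g. from a crawler's inlinks report:

     import csv

     import networkx as nx

     graph = nx.DiGraph()
     with open("edges.csv", newline="") as f:
         for source, target in csv.reader(f):
             graph.add_edge(source, target)

     internal_pagerank = nx.pagerank(graph, alpha=0.85)
     top = sorted(internal_pagerank.items(), key=lambda kv: kv[1], reverse=True)[:20]
     for url, score in top:
         print(round(score, 5), url)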
  119. 119. Here’s an algorithmic approach to generating the best internal linking structure (sketched below): perform text analysis at publish to extract features; pull pages from the database with related features; place links in copy based on n-grams; be sure to limit the number of links per page; be sure there is only one link to a given page.
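     A loose sketch of those steps, assuming you already have n-gram sets per page (the matching and anchor choice here are simplifications):

     def suggest_internal_links(new_page_ngrams, candidate_pages, max_links=5):
         # candidate_pages: {url: set of n-grams}; new_page_ngrams: set of n-grams.
         scored = []
         for url, page_ngrams in candidate_pages.items():
             overlap = new_page_ngrams & page_ngrams
             if overlap:
                 scored.append((len(overlap), url, overlap))
         scored.sort(reverse=True)  # most related pages first

         links, linked = [], set()
         for _, url, overlap in scored:
             if url in linked or len(links) >= max_links:
                 continue  # cap links per page, one link per target
             links.append((url, sorted(overlap)[0]))  # use a shared n-gram as the anchor
             linked.add(url)
         return links

     print(suggest_internal_links(
         {"technical seo", "crawl budget", "log files"},
         {"/technical-seo-guide": {"technical seo", "crawl budget"},
          "/log-file-analysis": {"log files"}},
     ))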
  120. 120. You Don’t Know SEO Implementation: you don’t know how implementation happens
  121. 121. Clients can’t always “just” fix something. And our expectations don’t match the effort required in some cases.
  122. 122. Http Header Changes http headers can be governed by the server or the application
  123. 123. Redirects redirects can be governed by the server or the application
  124. 124. Hreflang Managing hreflang requires a data structure of url relationships
  125. 125. Internal Linking Structure Often Requires sitewide adjustments
  126. 126. Xml Sitemaps Generating sitemaps requires a data structure of urls and their features.
  127. 127. Google has aggressive speed expectations. Speed your site up! Of course, Google uses speed in its scoring function.
  128. 128. Botify’s findings indicate big sites get crawled more when they are fast
  129. 129. Critical rendering Path We look to enforce it via the critical rendering path
  130. 130. These recommendations, taken at face value, will break sites.
  131. 131. Actionable & optimal You don’t know how to be actionable and optimal
  132. 132. 1. Segment your crawl – different page types are governed by different mechanisms.
  133. 133. 2. Segment your Rankings. Focus in on key opportunities.
  134. 134. 3. Be painfully Specific.
  135. 135. Be like @Richardbaxter
  136. 136. Check out the Code Coverage report to see what code isn’t being used and delete it from your pages. https://www.portent.com/blog/user-experience/code-coverage-page-speed.htm (h/t @portentint)
  137. 137. Is This Your Recommendation? “Remove unused JavaScript and CSS from all pages to enhance the page speed”
  138. 138. That’s not actionable or OPtimal
  139. 139. Why? Many of those scripts are hosted libraries such as Facebook Connect, Google Analytics or jQuery. Hosting those libraries locally and removing items will take forever and won’t support forward compatibility.
  140. 140. Try this instead Consider removing or otherwise refactoring lines 49-56 in the suchandsuch.js (or page type A) because they are not currently being used and no functionality or other code is dependent upon them.
  141. 141. Is This Your Recommendation? “Implement HTTP/2 for faster site performance”
  142. 142. That’s not actionable or OPtimal
  143. 143. Why? Depending on the server version and environment, the client may not currently support HTTP/2. If their server does support it, they may not know where to start.
  144. 144. Try this instead Based on your site’s HTTP headers, you’re running NGINX vX.XX. We recommend adjusting your HTTPS server configuration in your .conf file to include the following.
  145. 145. Is This Your Recommendation? “You have broken pages throughout the site; we recommend updating those URLs to return 301 redirects and we have prepared a list of 1:1 relationships for redirection in your .htaccess file.”
  146. 146. That’s not actionable or OPtimal
  147. 147. Why? #1 I see a lot of recommendations that automatically assume Apache servers. (breakdown of server tech on the right) NGINX and IIS don’t have .htaccess files.
  148. 148. Why? #2 1:1 rules are suboptimal. Always create RegEx-driven rules for redirects to minimize TTFB of every page throughout the site.
  149. 149. Try this instead Based on your site’s HTTP headers, you’re running NGINX vX.XX. We recommend adjusting your HTTPS server configuration in your .conf file to include the following code.
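     To make the RegEx-driven rules point concrete, here is an illustrative pattern set in Python; the same patterns translate to NGINX rewrite or Apache RedirectMatch directives (the URL structures are hypothetical):

     import re

     REDIRECT_RULES = [
         # (pattern, replacement): one rule covers thousands of dated blog URLs.
         (re.compile(r"^/blog/\d{4}/\d{2}/(?P<slug>[^/]+)/?$"), r"/blog/\g<slug>"),
         (re.compile(r"^/products/(?P<sku>[^/]+)\.html$"), r"/p/\g<sku>"),
     ]

     def resolve(path):
         for pattern, replacement in REDIRECT_RULES:
             if pattern.match(path):
                 return pattern.sub(replacement, path)
         return None  # no redirect needed

     print(resolve("/blog/2017/06/you-dont-know-seo/"))  # /blog/you-dont-know-seo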
  150. 150. Is This Your Recommendation? “You have links to redirects throughout the site; we recommend updating those links to the final destination URLs. Here is a list of URLs with links to redirects.”
  151. 151. That’s not actionable or OPtimal
  152. 152. Why? What are they going to do? Run Find and Replace on their entire site?
  153. 153. Try this instead Crawl the site and keep track of all of the final destination URLs. Prepare a spreadsheet or database table and instruct the client to update these links on the database level. Alternatively, spec out a simple crawler they can run on a daily basis to crawl their site and update their links.
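     A minimal sketch of that spreadsheet-building step (urls.txt is a hypothetical list of link targets pulled from the crawl; some servers mishandle HEAD, so fall back to GET if needed):

     import csv

     import requests

     with open("urls.txt") as f, open("final_destinations.csv", "w", newline="") as out:
         writer = csv.writer(out)
         writer.writerow(["current_link_target", "final_url", "status"])
         for url in (line.strip() for line in f if line.strip()):
             # allow_redirects follows the whole chain; response.url is the final destination.
             response = requests.head(url, allow_redirects=True, timeout=15)
             writer.writerow([url, response.url, response.status_code])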
  154. 154. Is This Your Recommendation? “Google has recently increased the meta description from 155-160 characters to ~320. You should rewrite your meta descriptions to take more advantage of the space.”
  155. 155. Why? What are they going to write? Which pages will they do it on? How are they going to populate the meta description? How can they capitalize on all the space? How will they scale?
  156. 156. That’s not actionable or OPtimal
  157. 157. Try this instead We recommend using structured data from each page template to generate keyword-relevant meta descriptions that are click-worthy. The following are schemas to implement per page type. We recommend prioritizing the following 5,000 URLs for implementation because they have higher crawl frequency. We will measure the impact of this recommendation using CTR and clicks from Google Search Console as well as traffic and conversion performance from Google Analytics.
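     A toy sketch of template-level generation from structured data (the field names assume a product-style schema and are illustrative; swap in whatever your page templates expose):

     TEMPLATE = ("{name} - {offer_count} options from {low_price} {currency}. "
                 "Rated {rating}/5 by {review_count} buyers.")

     def meta_description(product, max_length=320):
         text = TEMPLATE.format(**product)
         return text if len(text) <= max_length else text[: max_length - 3] + "..."

     print(meta_description({
         "name": "Trail Running Shoes", "offer_count": 42, "low_price": "59.99",
         "currency": "USD", "rating": 4.6, "review_count": 318,
     }))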
  158. 158. Software testing: you don’t know the software testing use case for SEO
  159. 159. Software Testing Types
  160. 160. Automated Testing Happens when Devs Push Code
  161. 161. Software Testing process
  162. 162. HERE ARE SOME EXAMPLES TO GET YOU STARTED
       SEO01 – Presence of Meta Descriptions (Unit Test). Steps: check for the presence of the meta description tag in the HTML. Test data: page template code. Expected: all URLs should have a meta description. Actual: Product Detail Page is missing meta description. Result: PASS
       SEO02 – Viable Internal Links (Functional Test). Steps: 1. Render pages 2. Open all internal links 3. Review response codes. Test data: crawled URL data. Expected: all links return a 200 response code. Actual: many links to redirects and 404s. Result: FAIL
       SEO03 – Average Page Speed Less than 2 Seconds (Functional/Integration Test). Steps: 1. Render pages 2. Capture page speed 3. Determine average page speed per page type. Test data: render all page types from URL list. Expected: all page types return an average load time of 2 seconds. Actual: Homepage takes 5 seconds to load. Result: FAIL
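     As a sketch, the SEO01 case above could run as an automated test whenever devs push code (pytest plus requests and BeautifulSoup; the URLs are placeholders for a per-template sample):

     import pytest
     import requests
     from bs4 import BeautifulSoup

     TEMPLATE_URLS = [
         "https://example.com/",
         "https://example.com/category/widgets/",
         "https://example.com/product/123",
     ]

     @pytest.mark.parametrize("url", TEMPLATE_URLS)
     def test_meta_description_present(url):
         html = requests.get(url, timeout=30).text
         tag = BeautifulSoup(html, "html.parser").find("meta", attrs={"name": "description"})
         assert tag is not None and tag.get("content", "").strip(), \
             f"{url} is missing a meta description"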
  163. 163. You may need an internal crawler Consider a serverless crawler with a seed set of pages that can be worked into the push queue.
  164. 164. Split testing: you don’t know how to split test for SEO
  165. 165. @Distilled has been pioneering A/B Testing as a service for seo.
  166. 166. Mitigate the Risk: we ran an A/B test before doing a several-million-page duplicate content consolidation.
  167. 167. Develop a Hypothesis: “By changing ____ into ____, I can get better or more rankings and increase traffic.”
  168. 168. Bucket Pages: create two A groups and one variant (B) group of pages. The number of pages should be statistically significant.
  169. 169. Make sure the pages you run your experiment on have crawl activity
  170. 170. Benchmark performance Benchmark Rankings, traffic, and conversions
  171. 171. Launch Experiment: get it started. We let these run for 30 days or so.
  172. 172. Analyze your performance.
  173. 173. CausalImpact https://google.github.io/CausalImpact/CausalImpact.html
  174. 174. Prophet by Facebook for forecasting the impact of changes. https://facebook.github.io/prophet/
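     A minimal Prophet sketch: fit on pre-change organic sessions, forecast the post-change window, and compare the forecast to what actually happened (sessions.csv is a hypothetical export with columns ds and y; the package was named fbprophet in older releases):

     import pandas as pd
     from prophet import Prophet

     history = pd.read_csv("sessions.csv")  # pre-change daily sessions: columns ds, y

     model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
     model.fit(history)

     future = model.make_future_dataframe(periods=30)  # 30 days after the change
     forecast = model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
     print(forecast.tail(30))  # compare against observed post-change sessions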
  175. 175. Distilled Free Tool: run your own tests at https://www.distilled.net/diy-splittester/
  176. 176. Odn.distilled.net
  177. 177. How to Become a Dramatically Better SEO: 5 things you should do to be ready for what Google throws your way
  178. 178. Commit to doing any three things you’ve learned from this conference
  179. 179. Learn more about information retrieval https://ciir.cs.umass.edu/irbook/
  180. 180. Set up a web server
  181. 181. Build a new website in HTML, CSS and JavaScript
  182. 182. Make it rank for 5 competitive keywords
  183. 183. The Credits
  184. 184. I’m #ZorasDad
  185. 185. IPULLRANK.COM @iPullRank. We Do These Things: Content Strategy, SEO, Paid Media, Machine Learning, Marketing Automation, Measurement & Optimization
  186. 186. Shoutout to @Randfish
  187. 187. ‘Nuff said.
  188. 188. http://bit.ly/ydkseo Mike King Founder & Managing Director @iPullRank mike@ipullrank.com
  189. 189. Post-credits scene
  190. 190. Use Buzzsumo to find the most popular content on a site and use it to inform your outreach email
  191. 191. Use a text summarizer so you don’t have to read the entire thing. https://www.tools4noobs.com/summarize/
  192. 192. Use a chatbot to automatically overcome objections in your outreach process
  193. 193. Create a chatbot with Dialogflow (previously API.AI) https://dialogflow.com/
  194. 194. Set up your responses based on key objections that people have
  195. 195. Connect your chatbot to your email via zapier to automatically overcome objections
