2. #SMX #23A @PatrickStox
• Technical SEO for IBM - Opinions expressed are my own and not
those of IBM.
• I write, mainly for Search Engine Land
• I speak at some conferences like this one, Pubcon, TechSEO Boost
• Organizer for the Raleigh SEO Meetup (most successful in US)
• We also run a conference, the Raleigh SEO Conference
• Also the Beer & SEO Meetup (because beer)
• 2017 US Search Awards Judge, 2017 UK Search Awards Judge, 2018
Interactive Marketing Awards Judge
Who is Patrick Stox?
8. #SMX #23A @PatrickStox
Share these #SMXInsights on your social channels!
#SMXInsights
If you had a big drop that wasn’t seasonal, a
measurement error, or an algorithm update,
then something changed. Dig deep to find out
what.
10. #SMX #23A @PatrickStox
Internet Archive Wayback Machine: http://archive.org/web/
You can even get archived versions of your robots.txt file
Crawl Comparisons
Change Monitoring
How To See What Changed?
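As a rough illustration of change monitoring, the sketch below compares two robots.txt snapshots and lists only the lines that were added or removed. It assumes you have already saved the archived and current file contents as text; the function name `diff_robots` and the sample rules are hypothetical.

```python
import difflib


def diff_robots(old_text: str, new_text: str) -> list[str]:
    """Return the added (+) and removed (-) lines between two snapshots."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="archived",
            tofile="current",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two lines are the ---/+++ file headers; @@ lines mark hunks.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical snapshots: the newer file blocks an extra folder.
old = "User-agent: *\nDisallow: /admin/\n"
new = "User-agent: *\nDisallow: /admin/\nDisallow: /blog/\n"
changes = diff_robots(old, new)
print(changes)
```

The same comparison works for any text you can snapshot over time, such as the rendered HTML of a key template.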
11. #SMX #23A @PatrickStox
Don’t assume there’s just one system.
Because of routing rules and reverse
proxies, multiple systems and even
infrastructures may be involved and the
transition can appear seamless.
In what system did something change?
12. #SMX #23A @PatrickStox
Ahrefs Site Explorer > Best by links > filter by 404
Fixing redirects is usually one of the easiest wins of an SEO
campaign. Old /index pages are often missed because they usually
require special rules.
Redirects
13. #SMX #23A @PatrickStox
Try previous versions of pages if you know them or check
archive.org and see if and how they are redirecting. Sometimes
they may be redirecting, but not to the preferred location.
https://searchengineland.com/fixing-historical-redirects-using-wayback-machine-apis-257628
Redirects
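One way to get a list of previous URLs to test is the Wayback Machine CDX API. The sketch below only builds the query URL, so it makes no network request. The helper name `wayback_cdx_url` and the parameter choices are assumptions for illustration, not the method from the linked article.

```python
from urllib.parse import urlencode


def wayback_cdx_url(domain: str, limit: int = 1000) -> str:
    """Build a CDX query listing URLs the Wayback Machine has seen for a domain."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "output": "json",
        "fl": "original",            # only return the original URL field
        "collapse": "urlkey",        # one row per unique URL
        "filter": "statuscode:200",  # pages that once resolved normally
        "limit": str(limit),
    }
    return "http://web.archive.org/cdx/search/cdx?" + urlencode(params)


# Hypothetical domain for illustration.
query_url = wayback_cdx_url("example.com", limit=50)
print(query_url)
```

Each URL returned by that query can then be requested on the live site to see if and where it redirects.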
14. #SMX #23A @PatrickStox
Make sure any domains you
had redirected are still
registered and if they had a
security certificate that it
hasn’t expired.
Redirects from previous domains
15. #SMX #23A @PatrickStox
Figure out where your redirects are firing
DNS level
Middleware
CDN level
Server level (for Apache, .htaccess or the server
config)
HTTP header response
Language based (PHP, JS, meta refresh, etc)
16. #SMX #23A @PatrickStox
200 – OK
301 – Permanent Redirect (as long as it’s in place)
302 – Temporary Redirect (may keep indexing at original URL)
307 – these days mostly served from the browser cache. The server may
actually be sending a 302 or a 301 (check in private / incognito)
404 – Not Found
410 – Gone
418 – I’m a teapot
50x – different errors
Status Code
17. #SMX #23A @PatrickStox
Don’t blindly trust status codes. An error page can show a 200 status
just as easily as a correct page can show a 404.
Check all the hops. Chrome Dev Tools, Ayima Redirect Path, Link
Redirect Trace
Status Code
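To make "check all the hops" concrete, here is a minimal sketch that follows a redirect chain one hop at a time and records the status code at each step. The `fetch` callable is an assumption: it stands in for whatever HTTP client you use, configured not to follow redirects automatically. The responses below are made up so the example runs offline.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 10) -> list[tuple[str, int]]:
    """Record (url, status) for every hop until a non-redirect or a loop."""
    hops = []
    seen = set()
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        url = urljoin(url, location)  # Location may be a relative path
        if url in seen:  # redirect loop, stop here
            break
    return hops


# Made-up responses standing in for a real site.
fake_responses = {
    "http://example.com/old": (301, "https://example.com/old"),
    "https://example.com/old": (302, "/new"),
    "https://example.com/new": (200, None),
}
chain = trace_redirects("http://example.com/old", lambda u: fake_responses[u])
for hop_url, hop_status in chain:
    print(hop_status, hop_url)
```

A chain like this one, with a 302 in the middle of otherwise permanent redirects, is the kind of detail that only shows up when every hop is inspected.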
18. #SMX #23A @PatrickStox
External
Any removed, nofollowed?
Tools: Ahrefs, Moz, Majestic, SEMrush
Internal
Related posts removed, pages deleted, nofollow added to
links or pages?
Links
19. #SMX #23A @PatrickStox
Check on-page elements (canonical, meta robots, pagination,
hreflang)
Robots.txt – check which folders are being blocked, also look for
noindex in robots.txt (not officially supported). If anything is
blocked from crawling, Google can’t see the content and can’t see
on-page elements they need to consolidate signals.
Meta robots values: noindex, nofollow, none (none doesn’t mean
there isn’t one, it is the same as noindex, nofollow)
Anything Blocked or Noindexed?
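A quick way to check which URLs a robots.txt file blocks, and whether it contains unofficial noindex lines, is sketched below using Python's standard `urllib.robotparser`. Note the assumptions: the standard-library parser uses simple prefix matching and does not replicate Google's wildcard handling, and the function name `audit_robots` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, urls: list[str], agent: str = "Googlebot"):
    """Return (URLs blocked for the agent, any 'Noindex:' lines found)."""
    lines = robots_txt.splitlines()
    parser = RobotFileParser()
    parser.parse(lines)
    blocked = [u for u in urls if not parser.can_fetch(agent, u)]
    # Noindex in robots.txt is not officially supported, so flag it for review.
    noindex_rules = [
        ln.strip() for ln in lines if ln.strip().lower().startswith("noindex:")
    ]
    return blocked, noindex_rules


# Hypothetical robots.txt and URLs for illustration.
sample = "User-agent: *\nDisallow: /private/\nNoindex: /old/\n"
blocked, noindex_rules = audit_robots(
    sample,
    ["https://example.com/private/page", "https://example.com/blog/"],
)
print(blocked)
print(noindex_rules)
```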
20. #SMX #23A @PatrickStox
Did any new sets of tags appear that might conflict with others?
Could have additional tags because of a theme change or
plugin/module added.
For instance, if you have 2 robots meta tags that are index and one
that is noindex, Google will likely obey the noindex.
Multiple Tags
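The sketch below collects every robots meta tag on a page and combines them, assuming the most restrictive directive wins. That is an approximation of the "Google will likely obey the noindex" behavior described above, not a statement of how Google resolves conflicts. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.values.append(attr.get("content") or "")


def effective_robots(html: str) -> set[str]:
    """Merge all robots meta tags, letting restrictive directives win."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    directives = set()
    for value in collector.values:
        for token in value.lower().split(","):
            token = token.strip()
            if token == "none":  # "none" is shorthand for noindex, nofollow
                directives.update({"noindex", "nofollow"})
            elif token:
                directives.add(token)
    # Assumption: a restrictive directive overrides its permissive counterpart.
    if "noindex" in directives:
        directives.discard("index")
    if "nofollow" in directives:
        directives.discard("follow")
    return directives


# Two "index" tags and one "noindex" tag, as in the example above.
page = (
    '<head><meta name="robots" content="index, follow">'
    '<meta name="robots" content="index">'
    '<meta name="ROBOTS" content="noindex"></head>'
)
print(effective_robots(page))
```

More than one entry in `collector.values` is itself worth investigating, since it usually means a theme, plugin, or module is adding its own tag.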
21. #SMX #23A @PatrickStox
Canonical – <head>, HTTP Header. Also send signals: preferred
version in GSC, redirects, sitemap
Noindex – <head>, HTTP Header, robots.txt (unofficially)
Hreflang – <head>, HTTP Header, sitemap
Tags in Multiple Locations
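Because these signals can also arrive in HTTP headers, it helps to read them from the response as well as from the HTML. The following is a simplified sketch that pulls canonical, hreflang, and noindex signals out of a headers dictionary. It assumes one `Link` and one `X-Robots-Tag` header, no commas inside URLs, and no user-agent prefixes in `X-Robots-Tag`. The function name and sample headers are hypothetical.

```python
import re

# Matches <target> followed by its parameters, up to the next comma.
LINK_RE = re.compile(r"<([^>]+)>([^,]*)")


def signals_from_headers(headers: dict[str, str]) -> dict:
    """Extract canonical, hreflang, and noindex signals from HTTP headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    canonical = None
    hreflang = {}
    for target, params in LINK_RE.findall(lowered.get("link", "")):
        params = params.lower()
        if re.search(r'rel\s*=\s*"?canonical', params):
            canonical = target
        lang = re.search(r'hreflang\s*=\s*"?([a-z0-9-]+)', params)
        if lang:
            hreflang[lang.group(1)] = target
    robots = [
        t.strip()
        for t in lowered.get("x-robots-tag", "").lower().split(",")
        if t.strip()
    ]
    noindex = "noindex" in robots or "none" in robots
    return {"canonical": canonical, "hreflang": hreflang, "noindex": noindex}


# Hypothetical response headers for illustration.
sample_headers = {
    "Link": '<https://example.com/page>; rel="canonical", '
            '<https://example.com/fr/page>; rel="alternate"; hreflang="fr"',
    "X-Robots-Tag": "noindex, nofollow",
}
signals = signals_from_headers(sample_headers)
print(signals)
```

Comparing this output with what the `<head>` and the sitemap say is a quick way to spot conflicting signals.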
22. #SMX #23A @PatrickStox
A tag may not show in the source if it is added using JavaScript. You
should see it in the rendered DOM (use Inspect). Google will likely
not see these on the first pass, but after the page is sent to the
renderer these would be picked up.
Example: nofollow injected on outbound links will probably be
counted as follow initially, then counted as nofollow later after it’s
run through the WRS (Web Rendering Service).
Tags Injected
23. #SMX #23A @PatrickStox
URL Parameter settings – make sure if you set these up that they
are doing what you want them to.
URL Removal Tool – Did anyone remove URLs?
Disavow file – Did someone disavow any links that may have been
helping?
Google Search Console
24. #SMX #23A @PatrickStox
Sometimes scripts, iframes, or anything else not coded correctly
can break the <head> section early. You will not see this with
view-source, but you may see it in the rendered DOM (Document Object
Model) using Inspect or Inspect Element.
Broken <head>
25. #SMX #23A @PatrickStox
This can be especially true with JS
frameworks where you might be
serving different things to different
systems or older versions of your
files may be cached. What a user
sees and even the location they are
looking at might be different from
what a search engine is seeing.
Google sees something different
26. #SMX #23A @PatrickStox
Use Fetch and Render in Google Search Console
Desktop: Rich Results Tool https://search.google.com/test/rich-results
Mobile: Google Mobile Friendly Test
https://search.google.com/test/mobile-friendly - renders a page
with smartphone Googlebot. It does have the processed DOM
(source code) and a debug mode (see page loading issues and
JavaScript Console).
What Google Sees
27. #SMX #23A @PatrickStox
Google’s Mobile Friendly Test
https://search.google.com/test/mobile-friendly
is currently the best way to see what Google
sees for a client-side rendered JS website. It
shows the processed DOM and has a JavaScript
Console.
28. #SMX #23A @PatrickStox
Blocking crawling in robots.txt means nothing on the page gets
seen and nothing gets consolidated.
A noindex on a page will break hreflang tags; so will redirects and
canonical tags that point to a page other than the one specified.
Things don’t always Work Together
29. #SMX #23A @PatrickStox
Noindexing a page that has its canonical set to another page: Google
isn’t necessarily consistent with this one. They see you’re trying to
set a preferred version with the noindex, so they may drop one
version of the page, still count that page for part of the set and
ignore noindex, or in rare cases pass the noindex value to both
pages.
*A lot of things can go wrong. Many times a lot of signals add up.
Things don’t always Work Together
30. #SMX #23A @PatrickStox
Add &filter=0 to the end of the URL for your Google Search.
google.com/more-stuff-here&filter=0
This removes filters like domain clustering and shows when there
are multiple pages on your website eligible for a query, which may
indicate that they should be combined.
Another Page Showing
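If you run this check often, the URL can be built programmatically. This is a small sketch using the standard library. The function name `google_search_url` and the sample query are hypothetical, and it only constructs the URL, without sending any request.

```python
from urllib.parse import urlencode


def google_search_url(query: str, unfiltered: bool = True) -> str:
    """Build a Google search URL, optionally appending filter=0."""
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"  # asks Google not to apply filters such as clustering
    return "https://www.google.com/search?" + urlencode(params)


# Hypothetical query for illustration.
search_url = google_search_url("technical seo audit")
print(search_url)
```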
31. #SMX #23A @PatrickStox
A site:domain.com search can reveal a wealth of knowledge about
a website. I would be looking for pages that are indexed in ways I
wouldn’t expect, such as with parameters, pages in site sections I
may not know about, and any issues with pages being indexed that
shouldn’t be (like a dev server).
Site:domain.com
32. #SMX #23A @PatrickStox
A single term can show you relevant pages on your website related
to that term which can give you internal link or content
consolidation opportunities. It also shows if you’re eligible for a
featured snippet that may not show.
If you use a phrase instead of a keyword, this can be used to check
if content is being picked up by Google, which is handy on websites
that are JavaScript-driven. If it shows other websites with the same
content, it may indicate that content is being copied.
Site:domain.com “text from your site”
33. #SMX #23A @PatrickStox
Shows you Google’s cache of the page. This is typically a snapshot
of the HTML and should not be used to diagnose JavaScript
websites. If it shows a different page or a different domain or a
different language, it indicates there may be some issues around
consolidating indexing signals, duplicate content, or crawling.
Cache:https://www.domain.com/page
34. #SMX #23A @PatrickStox
This lets you know if a page is indexed and how it is indexed.
Multiple pages can be in the same set, like multiple records of
different URLs grouped together.
Most of the time, Google will return the version shown, but it can
sometimes pull one of the other pages like when searching from a
different country.
Info:https://www.domain.com/page
35. #SMX #23A @PatrickStox
You have to be careful with these.
Google crawls mostly from the US
and may be shown the wrong
content, and then funky things
can happen.
Rules For User-Agents Or Autoredirecting
The homepage is missing.
36. #SMX #23A @PatrickStox
Info: shows the French page is being treated as the US page.
Rules For User-Agents Or Autoredirecting
37. #SMX #23A @PatrickStox
The cached version of the page shows the English content.
Rules For User-Agents Or Autoredirecting
38. #SMX #23A @PatrickStox
There’s some kind of JS that’s redirecting the cache of Coursera to
an error page.
Rules For User-Agents Or Autoredirecting