A Crash Course in Technical SEO from Patrick Stox - Beer & SEO Meetup May 2019
The document provides an overview of technical SEO best practices. It discusses on-page SEO elements like titles, meta descriptions, headings, images and URLs. It also covers technical aspects like sitemaps, indexing APIs, robots.txt files, redirects and canonical tags. The document recommends prioritizing content, links and proper indexing as the most important ranking factors. It also lists various tools for SEO audits, monitoring and troubleshooting technical issues.
An overview of the presentation focusing on technical SEO by Patrick Stox from IBM.
Discusses the importance of title tags, meta descriptions, headings, and images in on-page SEO.
Meta descriptions summarize pages for search engines; they are not a ranking factor, and Google may rewrite them.
Importance of headings (H1-H6) for structuring content for better readability and skimming.
Emphasizes optimizing images through text content, alt attributes, and sensible naming practices.
The significance of URLs in identification and discovery, along with structures for internal linking.
Highlights main ranking factors such as indexing, signal consolidation, content relevancy, and matching user intent.
Clarification on crawling vs indexing and their importance to search engine optimization.
Details how search engines like Googlebot crawl sites and the concept of crawl budget.
Explanation of the robots.txt file role in controlling crawler access to website content.
Sitemaps facilitate crawling and indexing by search engines, detailing structure and links.
Robots directives define crawling and indexing rules affecting how search engines see pages.
Discusses issues with conflicting instructions and common HTTP response status codes.
Utilizes canonical tags to resolve duplicate content issues and manage the preferred version of pages.
Strategies for dealing with duplicate content, including canonical tags and redirects.
List of tools for SEO monitoring, site speed testing, and troubleshooting methods.
Insights on caching issues, the indexing process, and troubleshooting indexed pages.
Redirects facilitate page navigation for users and search engines, ensuring proper URL routing.
Importance of site speed, platforms for monitoring, and the impact on user analytics.
Introduction to AMP and its contribution to optimized loading speed for mobile users.
Discussion on schema markup benefits for SERP features, including rich snippets.
Embedding jump links for quick navigation to sections on a page for better user experience.
The transition to HTTPS and its significance for security and SEO benefits.
Managing mixed content issues during HTTPS migration through content security policies.
Challenges and strategies for optimizing JavaScript frameworks for search engines.
Overview of pagination's effectiveness and best practices for current SEO needs.
Focus on creating fewer and stronger pages for effective SEO management.
Using log files for insights into user interaction and site performance.
Discussion on middleware systems and their influence on SEO strategies.
Examining serverless SEO practices and their relevance in modern site optimization.
How hreflang tags address internationalization challenges in SEO.
Insight into keyword cannibalization and strategies to manage it in SEO.
The significance of internal links in enhancing website SEO performance.
On Page
The main thing is content, but since this is technical SEO:
● Title
● Meta Description
● Headings
● Images
● URL
Most of these should align with
the content and what the page is
about.
Title
“A <title> tag tells both users and search engines what the topic of a particular page is.”
Say what the page is about
There’s no actual character
limit
Google may overwrite if not
relevant or too long
Desktop and Mobile lengths
are different
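For example, a minimal, hypothetical title tag for this deck's page:
<title>A Crash Course in Technical SEO | Patrick Stox</title>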
Meta Description
“A page's description meta tag gives Google and other search engines a summary of what the page is about.”
Not a ranking factor
Google can ignore this one
too and write you a new one
No real limit on this either
Chance to sell yourself or
fulfill their needs
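A sketch of the tag in the <head> (the content text here is made up):
<meta name="description" content="A crash course in technical SEO covering titles, headings, images, robots.txt, sitemaps, and indexing.">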
Headings
H1 to H6
Break up content into a structure and allow for easier skimming of sections
Order doesn’t matter, can
use h1 after h2
Can use multiple h1 tags
Can use h2 or h3 as the
highest, it’s okay really
Describe the section
Makes pages skimmable
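A hypothetical outline showing headings used as section structure:
<h1>Technical SEO Crash Course</h1>
<h2>On-Page Elements</h2>
<h3>Titles and Meta Descriptions</h3>
<h3>Headings and Images</h3>
<h2>Crawling and Indexing</h2>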
Images
Google has a real focus on images at the moment
The image index is separate
from the index for pages
Allow crawling of images
Have an image sitemap
If you have a lot of good
images, opening up
copyright to allow people to
use the images with
attribution = links!
What Matters For Image Optimization
Text around the image
Content of the image
Image name (minimal
impact)
Alt attribute
Webpage title + description
SEOs have a lot of debates around image best practices; mine are:
You can’t use an alt
attribute on a background
image
You shouldn’t use an alt
attribute for an irrelevant
image or one used for
spacing
You shouldn’t add your
keyword if the image is not
relevant
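A sketch of these rules in markup (file names and alt text are made up):
<!-- Relevant content image: descriptive alt text -->
<img src="/images/googlebot-crawling-diagram.png" alt="Diagram of Googlebot crawling pages through links">
<!-- Decorative/spacer image: empty alt, no keywords -->
<img src="/images/divider.png" alt="">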
URL
The address of a web page, i.e.
https://patrickstox.com
Mainly used as an identifier
Keywords in URL are a tiny
ranking factor
Up to 2083 characters (IE
cutoff)
Structure matters for
discovery/crawling but not
ranking
Keep URL structures sensible and logical.
The structure can change internal linking (links from one page on your site to another), which can impact results.
• Is it indexed, and how is it indexed?
• How are signals consolidated (content, internal and external links, redirects)? Fewer, stronger pages.
• Content (talk about the right things, the things people are searching for, the things top pages are ranking for, the things top pages are talking about, expand to domain/topic level opportunities, what's driving value to competitors). Every person, every time.
• Intent – to me this is as simple as matching popular search terms. Google tries to show a mix (fractured intent) of popular searches in their results, and generally top pages talk about what people search for. Meet user needs and fulfill your promises.
Search Engines are Constantly Crawling the Web
Googlebot
Bingbot
They find pages by following links,
through sitemaps, submissions
(Google Search Console, Mobile
Friendly Test), Indexing API
Crawl budget = # of pages they’ll
crawl on your site, but note they
prioritize them based on a lot of
things like how often the pages
update and how strong the pages are
Googlebot
Uses a long viewport: 431x731, 12,140 pixels high for mobile, and 768x1024, 9,307 pixels high for desktop
https://codeseo.io/console-log-hacking-for-googlebot/
Is being updated to current Chrome
Heavily caches files
Runs with a sped up clock
Robots.txt
Tells crawlers where they can and can't go
Multiple robots.txt files may need to be looked at when troubleshooting. Each domain and subdomain can have its own.
Max size is 500 KB
You can search archive.org for
older versions of robots.txt files
Blocking things causes lots of
issues with canonicalization
User-agent — specifies which robot.
Disallow — suggests the robots not crawl this area.
Allow — allows robots to crawl this area.
Crawl-delay — tells robots to wait a certain number of
seconds before continuing the crawl.
Sitemap — specifies the sitemap location.
Noindex — tells Google to remove pages from the index.
(Not officially supported)
# — comments out a line so it will not be read.
* — match any text.
$ — the URL must end here.
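Putting a few of these together, a hypothetical robots.txt (the paths are made up):
# Keep all crawlers out of internal search results
User-agent: *
Disallow: /search/

# Keep Googlebot out of PDFs anywhere on the site
User-agent: Googlebot
Disallow: /*.pdf$

Sitemap: https://example.com/sitemap.xml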
Sitemap
“A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to more intelligently crawl your site.”
Not Required
Manually run sitemaps are almost
impossible to maintain
Most common is xml, but can be txt
Can be cross-domain if both sites
verified in GSC
Limit 50MB (uncompressed) and
50,000 URLs, can create a sitemap
index (file that points to other
sitemaps) if more are needed
Pages, video, images, news
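A minimal XML sitemap sketch (the URLs are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2019-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/technical-seo/</loc>
  </url>
</urlset>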
Sitemap 2
Used as a canonicalization
signal
Should be only resolved
URLs
Can include hreflang here (internationalization); see the sketch below
Can filter to see how pages
are being treated in the
Index Coverage report in
Google Search Console
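A sketch of hreflang annotations inside a sitemap entry (URLs are placeholders; the <urlset> element also needs xmlns:xhtml="http://www.w3.org/1999/xhtml"):
<url>
  <loc>https://example.com/en/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/" />
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/" />
</url>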
Indexing APIs
API to notify search engines of updates; may replace sitemaps in the future
https://developers.google.com/search/apis/indexing-api/v3/quickstart
https://www.bing.com/webmaster/help/submit-urls-to-bing-62f2860a
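From Google's quickstart, a publish request looks roughly like this (the URL is a placeholder; OAuth authentication is omitted):
POST https://indexing.googleapis.com/v3/urlNotifications:publish
{
  "url": "https://example.com/some-page/",
  "type": "URL_UPDATED"
}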
Robots Directive
Controls crawling, indexing, and a few other things
Good bots will listen to these
Can be in the <head> section or in the server response headers
<meta name="googlebot"
content="noindex" />
X-Robots-Tag: noindex
Google will take the most restrictive
setting
all — No restrictions for indexing or serving. Note: this directive is the default value and has no effect if explicitly listed.
noindex — Do not show this page in search results and do not show a "Cached" link in search results.
nofollow — Do not follow the links on this page.
none — Equivalent to noindex, nofollow.
noarchive — Do not show a "Cached" link in search results.
nosnippet — Do not show a text snippet or video preview in the search results for this page. A static thumbnail (if available) will still be visible.
notranslate — Do not offer translation of this page in search results.
noimageindex — Do not index images on this page.
unavailable_after: [RFC-850 date/time] — Do not show this page in search results after the specified date/time. The date/time must be specified in the RFC 850 format.
Conflicting Signals
A page that's marked noindex but is blocked by robots.txt can still be indexed, because crawlers never get to see the noindex
Lots of different signals add up and
sometimes search engines have to
make a decision. Some things for
them are directives and others are
more like suggestions. They will
typically lean towards more
restrictive statements.
Server Responses
HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes: informational responses, successful responses, redirects, client errors, and server errors
200 - OK
301 - permanent redirect (heavily cached)
302 - temporary redirect
307 - usually used for HSTS and cached by the browser; mostly a 301 or 302 behind this
308 - similar to 301 but no switching from POST to GET
404 - Not Found
410 - Gone (drops pages faster than a 404)
418 - I'm a teapot
5xx - Server error
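For example, the raw response for a permanent redirect looks roughly like this (URLs are placeholders):
HTTP/1.1 301 Moved Permanently
Location: https://example.com/new-page/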
Canonical Tag
The canonical tag helps solve duplicate content issues by setting the preferred version of a page and passing signals such as links to the preferred version.
The tag helps consolidate duplicate
content caused by issues such as:
HTTP and HTTPS
www and non-www
parameters and faceted navigation
session IDs
trailing slashes
index/default pages
alternate page versions such as m. or
AMP pages or print versions
Canonical Tag 2
Can be ignored
Google prefers https over http,
shorter URL over longer, and usually
will ignore canonical tags if enough
other signals are stronger like
potentially with hreflang (used for
internationalization)
Wouldn’t be needed in an ideal world
Don't work in the <body> section
Can be in the <head> or in the HTTP response header
<link rel="canonical" href="https://example.com/" />
HTTP/1.1 200 OK
Link: <https://example.com/>; rel="canonical"
If the canonical is thrown into the <body> section, it's usually because something like an iframe was injected into the <head> section and caused the browser to end the <head> early.
To troubleshoot this, you can use DOM breakpoints to step through what might have caused this: https://developers.google.com/web/updates/2015/05/view-and-change-your-dom-breakpoints
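A sketch of the failure mode (the markup is hypothetical):
<head>
  <title>Example</title>
  <iframe src="https://example.com/widget"></iframe> <!-- not valid in <head>; the parser ends <head> here -->
  <link rel="canonical" href="https://example.com/" /> <!-- now parsed as part of <body> and ignored -->
</head>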
Canonicalization
Determining which page to show in the index, which pages will share signals
Canonical tags
Hreflang
Duplicates
Sitemap URLs
Internal Links
Redirects
Duplicate Content
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
“Don't be afraid of duplicate content.
It's the internet, it happens. Google
search does not penalize for duplicate
content.” - Gary Illyes, Google
Webmaster Trends Analyst
Duplicates are typically filtered or
folded together in Google’s index in
what are called Clusters. These
typically share signals like links.
Duplicate Content 2
HTTP and HTTPS
www and non-www
Parameters and faceted navigation
Session IDs
Trailing slashes
Index pages
Alternate page versions such as m. or
AMP pages or print
Dev/hosting environments
Pagination
Scrapers, copying, syndication
Country/language versions
Solutions:
Do nothing and hope Google gets it right.
Canonical tags.
Redirects.
Tell Google how to handle URL parameters in GSC.
Rel="alternate". Used to consolidate alternate versions of a page, such as mobile or various country/language pages. With country/language in particular, hreflang is used to show the correct country/language page in the search results (see the sketch below).
Follow syndication best practices: typically canonical back if you can, or at least link back to the original source.
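A sketch of hreflang link elements in the <head> (URLs are placeholders; each version of a page should list all versions, including itself):
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/" />
<link rel="alternate" hreflang="de" href="https://example.com/de/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />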
Tools
Crawlers: DeepCrawl, Botify, Oncrawl, Ryte, Screaming Frog, Sitebulb, ContentKing
Change Monitoring: ContentKing,
PageModified, Little Warden
Chrome:
Ayima Redirect Path or Link Redirect
Trace - shows header responses and
redirect paths
Tag Assistant (GTM), Tealium Tools
Dev Tools
Tools
See what Google sees:
Mobile Friendly Test (Mobile)
https://search.google.com/test/mobile-friendly
Search Console URL Inspector
https://search.google.com/search-console/inspect
Rich Results Test (Desktop)
https://search.google.com/test/rich-results
Tools give data, but you need insights. These are common suggestions that waste time:
Page size - Google says keep it under a couple hundred megabytes
Too many links on a page - Google says a few thousand at most
https://support.google.com/webmasters/answer/35769?hl=en&ref_topic=6001981
Missing sitemap
Missing robots.txt
No custom 404 page
Title tags too long/short
Invalid HTML
Word Count
Keyword Density
Readability
Favicon
Social tags
Multiple h1 tags
<a id="whatever-you-want">What you want to link to.</a>
<a href="#whatever-you-want">Something about the section you’re linking to.</a>
That anchor is all that’s needed.