4. @RyanJones
60% of all Google searches are mobile.
20% of ecommerce starts on mobile.
29% of Americans would give up sex before giving up their phone.
People check their phone 80 times/day (iPhone; 76 on Android)
5. @RyanJones
Why We Search Isn’t Changing
Search is about Verbs. No, not just the words on a page.
Search is about helping users accomplish a task. “Do” something.
Photo Credit: Shutterstock.com
7. @RyanJones
The Future of SEO will be different
Source Article: https://www.blog.google/products/search/improving-search-next-20-years/
8. @RyanJones
Crawling (Googlebot)
• Discovery (Finding URLs)
• Robots.txt, response time, crawl schedule, etc.
Indexing (Caffeine, WRS)
• Extracting content and other ranking signals
• Paragraph vectors, B+ trees, canonical, JavaScript, etc.
Retrieval (relevance)
• How relevant is this to my query?
• Query Understanding, entities, synonyms, etc.
Ranking (Post Retrieval)
• Order by pagerank desc limit 10
• Penalties, Speed, HTTPS, Mobile Friendly, etc.
How a Search Engine Works
Note: Google doesn’t separate these. This is for illustration.
9. @RyanJones
Crawling, Indexing, Rendering, Oh My!
Diagram: Crawler, Index, and Renderer (Chrome 41), with arrows labeled “Initial Crawl”, “URLs”, “Renders”, and “Finds New content (JS)”.
URLs found after rendering must be re-crawled.
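The loop in this diagram can be sketched as a toy simulation. Nothing here reflects how Google actually implements it: the `PAGES` data, the URLs, and the `crawl` function are hypothetical, "rendering" is faked as a lookup of links that only exist after JavaScript runs, and the real delay between crawling and rendering is not modeled.

```python
from collections import deque

# Hypothetical site: each URL lists the links visible in the raw HTML
# and the links that only appear after JavaScript rendering.
PAGES = {
    "/": {"html_links": ["/about"], "js_links": ["/products"]},
    "/about": {"html_links": [], "js_links": []},
    "/products": {"html_links": ["/products/1"], "js_links": []},
    "/products/1": {"html_links": [], "js_links": []},
}

def crawl(start):
    """Return (url, how_it_was_found) pairs in discovery order."""
    frontier = deque([(start, "seed")])
    seen = {start}
    discovered = []
    while frontier:
        url, source = frontier.popleft()
        discovered.append((url, source))
        page = PAGES[url]
        # Initial crawl: only links present in the raw HTML are visible.
        for link in page["html_links"]:
            if link not in seen:
                seen.add(link)
                frontier.append((link, "crawl"))
        # Rendering step: URLs found here go back into the crawl queue.
        for link in page["js_links"]:
            if link not in seen:
                seen.add(link)
                frontier.append((link, "render"))
    return discovered

result = crawl("/")
for url, source in result:
    print(f"{url:12} found via {source}")
```

In this sketch `/products` is only reachable through the rendering step, so everything beneath it depends on the page being rendered first.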
11. @RyanJones
What The Crawler Sees
The first step of Crawling is to discover what pages exist on the web.
• Start with known pages in the index
• Discover new links on those pages
• Augment with sitemap data
The Crawler…
• Can’t crawl pages blocked with robots.txt
• Can only crawl pages accessible to anonymous users (no cookies, login, etc.)
• Respects canonicals, alternate, hreflangs, etc.
• Doesn’t see JavaScript content. This is only seen after the page is rendered.
Tip: Use Search Console to request crawls of pages, and check crawl stats in the index coverage report
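To check the robots.txt rule from the list above yourself, Python's standard library ships a parser. This is a minimal sketch: the robots.txt content, the example.com URLs, and the `is_crawlable` helper are made up for illustration, and Python's parser is not guaranteed to match Googlebot's own matching rules in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory rules, no network call

def is_crawlable(url, user_agent="Googlebot"):
    """True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

print(is_crawlable("https://example.com/blog/post"))
print(is_crawlable("https://example.com/private/report"))
```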
12. @RyanJones
Problems with Rendering
If your content is lazy loaded this way, or requires a user action,
search engines will not see it.
The renderer does not:
• Click
• Hover
• Scroll
• Focus
• MouseOver
• Etc…
13. @RyanJones
An “index” is just a list.
Keyword               Score   Document ID (the web page)
Koe Wetzel            5       282016
Josh Abbott           9       146
William Clark Green   7       7849
Casey Donahew         8       648
Robert Earl Keen      10      65467
Turnpike Troubadours  2       38
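The "just a list" idea maps naturally onto a dictionary from keyword to scored document IDs. This is a sketch using the rows from the table above; the `build_index` and `lookup` names and the lowercase normalization are assumptions for illustration, not a description of any real search engine's storage.

```python
from collections import defaultdict

# Rows from the slide: (keyword, score, document ID).
ROWS = [
    ("koe wetzel", 5, 282016),
    ("josh abbott", 9, 146),
    ("william clark green", 7, 7849),
    ("casey donahew", 8, 648),
    ("robert earl keen", 10, 65467),
    ("turnpike troubadours", 2, 38),
]

def build_index(rows):
    """Map each keyword to its (score, doc_id) postings, best score first."""
    index = defaultdict(list)
    for keyword, score, doc_id in rows:
        index[keyword].append((score, doc_id))
    for postings in index.values():
        postings.sort(reverse=True)
    return index

index = build_index(ROWS)

def lookup(keyword):
    """Return matching document IDs, or an empty list for unknown keywords."""
    return [doc_id for _score, doc_id in index.get(keyword.lower(), [])]

print(lookup("Robert Earl Keen"))
print(lookup("not in the index"))
```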
14. @RyanJones
“The Index” is more like “the indices”
There are multiple indexes.
There are also multiple features/stages of indexing. This includes things like:
• Tokenization
• Sentence segmenting
• Spell checking
• Entities
• Natural language processing
15. @RyanJones
Indexing
We typically think of indexing in tables.
But it can also be done with vectors.
Keyword   Occurrence
The       1557
Road      98
Goes      72
On        435
Forever   17
Kind of like a word cloud but with math.
16. @RyanJones
Tokens & N-Grams
If you haven’t climbed up to Enchanted Rock, Drank a cold Shiner down in Luckenbach, Taken your baby to the River Walk, Then you ain’t seen My Texas yet.
Tokens (Unigrams): If, you, haven’t, climbed, up, to, enchanted, rock, drank, a, cold, shiner, down, in, luckenbach, taken, your, baby, to, the, river, walk, then, you, ain’t, seen, my, Texas, yet
Bigrams: If you, you haven’t, haven’t climbed, climbed up, up to, to enchanted, enchanted rock, rock drank, drank a, a cold, cold shiner, shiner down, down in, in luckenbach…
Trigrams: If you haven’t, you haven’t climbed, haven’t climbed up, climbed up to, up to enchanted, to enchanted rock, drank a cold, a cold shiner, cold shiner down, down in luckenbach, taken your baby, your baby to, baby to the, to the river, the river walk, river walk then, walk then you, then you ain’t, you ain’t seen, my Texas yet
Note: I skipped a few bigrams and trigrams here, but you get the point – I hope.
Note 2: Lyrics, Josh Abbott Band – My Texas
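Generating these by hand gets old fast, so here is a minimal sketch of the same idea in code. The `tokenize` and `ngrams` helpers are illustrative names, and the tokenizer is a deliberate simplification (lowercase everything, keep letters and apostrophes); real indexing pipelines do far more.

```python
import re

LYRIC = ("If you haven't climbed up to Enchanted Rock, "
         "Drank a cold Shiner down in Luckenbach")

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Slide a window of n tokens across the list, one token at a time."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize(LYRIC)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(tokens)
print(bigrams[:3])
print(trigrams[:2])
```

A list of k tokens always yields k - n + 1 n-grams, which is why no n-gram is skipped when the window does the work.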
17. @RyanJones
Indexing
Zipf's law states that given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table.
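As a rough illustration, the sketch below compares the counts from the earlier "Keyword / Occurrence" table with what an idealized Zipf curve would predict. The `zipf_expected` helper and the exponent of exactly 1 are assumptions, and five words is far too small a sample to follow the law closely, so expect a loose fit.

```python
def zipf_expected(top_frequency, rank):
    """Frequency an ideal Zipf curve (exponent 1) predicts at a given rank."""
    return top_frequency / rank

# Counts from the "Keyword / Occurrence" table, sorted most frequent first.
observed = [("the", 1557), ("on", 435), ("road", 98), ("goes", 72), ("forever", 17)]

top = observed[0][1]
for rank, (word, count) in enumerate(observed, start=1):
    expected = zipf_expected(top, rank)
    print(f"{rank}  {word:8} observed={count:5}  ideal={expected:7.1f}")
```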
18. @RyanJones
A Quick TF-IDF Rant
IDF is a quick way of seeing which words offer little value (the, of, and, or).
TF-IDF adjusts this based on the frequency of the words used.
You’ll notice there’s no actual subtraction in TF-IDF.
All of this stuff has to do with indexing and relevance – NOT RANKING.
19. @RyanJones
More on TF-IDF
Totals across all three documents:
Word      Term frequency   Document frequency
The       44               3
road      6                1
goes      11               2
on        16               3
forever   4                2
Term counts per document:
Word      Doc 1   Doc 2   Doc 3
The       17      24      3
road      6       0       0
goes      3       0       8
on        5       7       4
forever   0       3       1
idf:
Word      idf
The       0
road      1.098
goes      0.405
on        0
forever   0.405
tf-idf (term count × idf):
Word      Doc 1   Doc 2   Doc 3
The       0       0       0
road      6.588   0       0
goes      1.215   0       3.24
on        0       0       0
forever   0       1.215   0.405
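The numbers in these tables can be reproduced in a few lines. This is a sketch under two assumptions inferred from the slide's values rather than stated on it: idf is the natural log of (number of documents / document frequency), and tf is the raw term count. The slide appears to truncate idf to three decimals before multiplying, so results such as road in Doc 1 differ slightly in the last digits.

```python
import math

# Term counts per document, from the slide: word -> [Doc 1, Doc 2, Doc 3].
COUNTS = {
    "the": [17, 24, 3],
    "road": [6, 0, 0],
    "goes": [3, 0, 8],
    "on": [5, 7, 4],
    "forever": [0, 3, 1],
}
N_DOCS = 3

def idf(counts_per_doc):
    """Natural log of (total documents / documents containing the word)."""
    document_frequency = sum(1 for count in counts_per_doc if count > 0)
    return math.log(N_DOCS / document_frequency)

idf_scores = {word: idf(counts) for word, counts in COUNTS.items()}
tfidf = {
    word: [tf * idf_scores[word] for tf in counts]
    for word, counts in COUNTS.items()
}

for word in COUNTS:
    row = "  ".join(f"{value:6.3f}" for value in tfidf[word])
    print(f"{word:8} idf={idf_scores[word]:.3f}  {row}")
```

Words that appear in every document ("the", "on") get an idf of zero, which wipes out their score no matter how often they occur.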
20. @RyanJones
A Real-World Example
Word        Doc 1   Doc 2   Doc 3
car         0.88    0.09    0.58
auto        0.10    0.71    0
insurance   0       0.71    0.70
best        0.46    0       0.41
Data here is based on the Reuters-RCV1 collection included with my Introduction to Information Retrieval textbook.
Best Car: doc 1
Car Insurance: doc 3
Best Insurance: doc 3
Auto Insurance: doc 2
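One simple way to arrive at these four answers is to add up each document's weights for the words in the query and pick the highest total. The slide does not state its scoring method, so summing is an assumption here (it does reproduce all four results), and `best_document` is an illustrative name.

```python
# Term weights from the slide: word -> {document: weight}.
WEIGHTS = {
    "car":       {"doc1": 0.88, "doc2": 0.09, "doc3": 0.58},
    "auto":      {"doc1": 0.10, "doc2": 0.71, "doc3": 0.0},
    "insurance": {"doc1": 0.0,  "doc2": 0.71, "doc3": 0.70},
    "best":      {"doc1": 0.46, "doc2": 0.0,  "doc3": 0.41},
}

def best_document(query):
    """Sum per-document weights over the query terms; return the top document."""
    scores = {}
    for term in query.lower().split():
        for doc, weight in WEIGHTS.get(term, {}).items():
            scores[doc] = scores.get(doc, 0.0) + weight
    if not scores:
        return None  # none of the query terms are in the table
    return max(scores, key=scores.get)

for query in ["best car", "car insurance", "best insurance", "auto insurance"]:
    print(query, "->", best_document(query))
```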
22. @RyanJones
Mobile Indexing Vs Mobile Friendly
Indexing
What the crawler sees.
Content on the page.
Crawlable links
Markup
Disavow/robots/etc
Mobile Friendly
Viewport
Font sizes
Content scale
Links/Buttons clickable
No overlays / popups
Redirect/canonical/alternate
Page speed
24. @RyanJones
Retrieval
Explicit Signals:
• “What the user thinks they want”.
• The user’s actual query
• Search operators
• Language used
Implicit Signals:
• “What the user needs”.
• Searcher Intent.
• Query Type
  • Informational
  • Transactional
  • Navigational
• Synonyms
• Result Type (images, web, etc.)
This is basically RankBrain at work
Image Source @dannysullivan
https://twitter.com/dannysullivan/status/1044274915388481537
25. @RyanJones
What Makes Up Ranking?
• RankBrain: Understanding the search query
• Panda: Content Quality
• Penguin: Link Spam
• Pigeon: Local Spam
• Pirate: Copyright Infringement
• Top Heavy: Ads
• Mobile Friendly
• Core Factors:
  • PageRank
  • On-site signals
  • Authority
26. @RyanJones
PageRank
“Domain authority” is NOT a thing Google uses (at least not the way we think of it).
At a high level, the PageRank of a page is the sum, over every page that links to it, of that page’s PageRank divided by the number of links on that page.
The actual calculation uses a damping factor (d), usually around 0.85, to simulate users randomly leaving the website for another site.
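The description above can be turned into a short iterative calculation. This is a sketch of the commonly published normalized form, PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), not Google's production code. The three-page `GRAPH` is hypothetical, and the function assumes every page has at least one outbound link and every link target is itself a key in the graph.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each linking page passes on its rank divided by its outbound link count.
            inbound = sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) / n + d * inbound
        ranks = new_ranks
    return ranks

# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

ranks = pagerank(GRAPH)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C ends up on top because it collects links from both other pages, while B only receives half of A's vote.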
27. @RyanJones
What Is a Core Algorithm Change?
Read my article on SEJ about the core algorithm update: https://www.searchenginejournal.com/what-is-a-google-broad-core-algorithm-update/264261
1. PageRank
2. Title Tag
3. H1
4. Bold Text
5. Internal Anchors
6. Speed
7. HTTPS
8. Linking to wtfseo.com
1. PageRank
2. H1
3. HTTPS
4. Internal Anchors
5. Speed
6. Title Tag
7. Bold Text
8. Linking to wtfseo.com
Think of a core algorithm change as Google “shuffling” the order and importance of their hundreds of ranking factors.
In reality, it’s likely more than this, e.g. changing the decay value in the PageRank calculation, changing the retrieval method, or changing how synonyms are weighted or word vectors are calculated.
28. @RyanJones
Quality Raters & Algorithm Changes
Quality Raters rate algorithm changes against each other. They’re part of how Google tests algorithm changes.
Quality Raters DO NOT:
• Penalize your website
• Affect your site’s ranking / relevance
29. @RyanJones
Penalties.
I made this image ~10 years ago. I miss Matt.
It’s 👏 Not 👏 a 👏 penalty 👏 every 👏 time 👏 rankings 👏 drop. 👏
Think of penalties (manual and algorithmic) as applied AFTER the retrieval and ranking phase.
30. @RyanJones
Questions:
• When Will This Roll Out?
  • NOW! Some sites have migrated. Check Search Console for notifications.
• What about “hidden content”?
  • If it’s good for mobile user experience, it shouldn’t hurt you.
• Can I test my site?
  • Use Search Console’s “fetch and render” tool.
• Will there be 2 indexes?
  • NO. It’s one index with mobile and desktop signals.
• What about Links?
  • Make sure those canonical/alternate tags are in place.
31. @RyanJones
Who to follow / where to learn more.
ME! Follow Me! @RyanJones.
(seriously, what did you expect here?)
But also, buy these books