The New Content SEO
FLOQ - Amanda King
Sydney SEO Conference
14 April 2023
What we’ll talk about
1. A quick refresher
2. Have keywords ever actually been a thing Google used?
3. How Google reads content may not be what you think
4. So what do we do about all this?
5. Who tf am I?
A quick refresher
A brief refresher on how Google crawls the Internet
It’s three separate stages (crawl, index and serve), with sub-processes for scoring and ranking.
Content analysis happens in the indexing engine; content relevancy is assessed in the serving engine.
While this is an old patent (2011), the fundamentals still apply for the purposes of this refresher.
Source: https://patents.google.com/patent/US8572075B1/, retrieved 22 Mar 2023
https://developers.google.com/search/docs/fundamentals/how-search-works
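To make the three stages concrete, here is a deliberately naive sketch. The in-memory `WEB` dictionary, the example.com URLs and the count-the-matching-terms scorer are all assumptions for illustration. Google’s serving engine layers many ranking systems on top, and this is not how any of them work.

```python
from collections import defaultdict

# Hypothetical "web": URL -> page text. A real crawler fetches pages over HTTP.
WEB = {
    "https://example.com/tea": "green tea and black tea brewing guide",
    "https://example.com/coffee": "coffee brewing guide for beginners",
    "https://example.com/travel": "travel tips for tea lovers in japan",
}

def crawl(seed_urls):
    """Stage 1: fetch documents (here, just look them up in WEB)."""
    return {url: WEB[url] for url in seed_urls if url in WEB}

def index(pages):
    """Stage 2: analyse content into an inverted index of term -> set of URLs."""
    inverted = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            inverted[term].add(url)
    return inverted

def serve(inverted, query):
    """Stage 3: score each URL by how many query terms it contains, then rank."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in inverted.get(term, ()):
            scores[url] += 1
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(serve(index(crawl(WEB.keys())), "tea brewing"))
```

The rest of this deck argues that Google’s real serving stage goes far beyond this kind of term matching.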
Google’s ranking engine works in systems
● Query Deserves Freshness is a system
● Helpful Content is a system
● MUM & BERT are systems
○ “Bidirectional Encoder Representations from Transformers (BERT) is an AI system Google uses that allows us to understand how combinations of words express different meanings and intent.”
https://developers.google.com/search/docs/appearance/ranking-systems-guide
Have keywords ever actually been a thing Google used?
While Google is a machine, it moved fundamentally beyond keywords at least as far back as 2015.
Why hasn’t SEO?
Queries very quickly become entities
“[...] identifying queries in query data; determining, in each of the queries, (i) an entity-descriptive portion that refers to an entity and (ii) a suffix; determining a count of a number of times the one or more queries were submitted”
- patent granted in 2015, submitted in 2012
Source: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/
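The claim language can be sketched as follows: split each query into an entity-descriptive portion and a suffix, then count how often each suffix is submitted. The `KNOWN_ENTITIES` set, the longest-prefix matching rule and the sample query log are hypothetical; the quoted claim does not say how the entity-descriptive portion is found.

```python
from collections import Counter

# Hypothetical set of known entity names; a real system would draw on a
# knowledge base rather than a hard-coded set.
KNOWN_ENTITIES = {"sydney opera house", "sydney", "earl grey"}

def split_query(query):
    """Return (entity_portion, suffix) using the longest known entity prefix."""
    words = query.lower().split()
    # Try the longest prefix first so "sydney opera house" beats "sydney".
    for end in range(len(words), 0, -1):
        prefix = " ".join(words[:end])
        if prefix in KNOWN_ENTITIES:
            return prefix, " ".join(words[end:])
    return None, query.lower()

def count_suffixes(queries):
    """Count how many times each suffix was submitted, for queries containing an entity."""
    counts = Counter()
    for q in queries:
        entity, suffix = split_query(q)
        if entity is not None and suffix:
            counts[suffix] += 1
    return counts

log = ["Sydney Opera House tickets", "Sydney weather", "Earl Grey caffeine",
       "sydney opera house tours", "Sydney tickets"]
print(split_query("Sydney Opera House tickets"))  # ('sydney opera house', 'tickets')
print(count_suffixes(log))
```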
Google acknowledges that query-only matching is pretty terrible.
“Direct ‘Boolean’ matching of query terms has well known limitations, and in particular does not identify documents that do not have the query terms, but have related words [...] The problem here is that conventional systems index documents based on individual terms, rather than on concepts. Concepts are often expressed in phrases [...] Accordingly, there is a need for an information retrieval system and methodology that can comprehensively identify phrases in a large scale corpus, index documents according to phrases, search and rank documents in accordance with their phrases, and provide additional clustering and descriptive information about the documents. [...]”
- Information retrieval system for archiving multiple document versions, granted 2017 (link)
So it decided to make its search engine concept- and phrase-based.
“The system is adapted to identify phrases that have sufficiently frequent and/or distinguished usage in the document collection to indicate that they are ‘valid’ or ‘good’ phrases [...] The system is further adapted to identify phrases that are related to each other, based on a phrase's ability to predict the presence of other phrases in a document.”
- Information retrieval system for archiving multiple document versions, granted 2017 (link)
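The “sufficiently frequent usage” half of that description can be approximated by counting, for each two-word phrase, how many documents contain it. Bigram-only phrases, the `min_docs` threshold and the sample documents are assumptions; the patent’s own criteria (including “distinguished usage”) are richer than a single frequency cut-off.

```python
from collections import Counter

def bigrams(text):
    """Set of two-word phrases in a document (a set, so each counts once per document)."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

def good_phrases(documents, min_docs=2):
    """Keep phrases that appear in at least `min_docs` documents (hypothetical threshold)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(bigrams(doc))
    return {phrase for phrase, df in doc_freq.items() if df >= min_docs}

docs = [
    "loose leaf tea tastes better than tea bags",
    "how to brew loose leaf tea",
    "tea bags are convenient",
]
print(sorted(good_phrases(docs)))  # ['leaf tea', 'loose leaf', 'tea bags']
```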
“Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea.”
Source: https://blog.google/products/search/how-ai-powers-great-search-results/
MUM takes this a step further
● About 1,000 times more powerful than BERT
● Trained across 75 languages for greater context
● Understands information across different types of media (video, text, etc.)
https://blog.google/products/search/introducing-mum/
How Google reads content may not be what you think
Step 1
Indexing
Indexing is the stage where content is analysed, so how does Google do it?
BERT is a technique for natural language processing (NLP) pre-training. So how does natural language processing work once it has a corpus of data?
Source: https://blog.google/products/search/search-language-understanding-bert/
So the broad-strokes steps in the indexing process are:
1. Parsing: tokenisation, parts of speech, stemming (for Google, lemmatisation)
2. Topic modelling: entity detection, relation detection
3. Understanding
4. On to the next engine: ranking
Is there anything in this process that even looks like “keywords”?
Parsing is intrinsically categorisation
● Semantic distance
● Keyword-seed affinity
● Category-seed affinity
● Category-seed affinity to threshold
https://patents.google.com/patent/US11106712B2; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
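One common way to make “semantic distance” and “affinity to a threshold” concrete is cosine similarity between vectors, with distance taken as 1 minus similarity. This is a swapped-in textbook technique, not necessarily what the patent describes; the three-dimensional vectors, the tea category seed and the 0.9 threshold are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_distance(a, b):
    """Distance as 1 minus cosine similarity: 0.0 for identical directions."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical 3-dimensional term vectors; real embeddings have hundreds of dimensions.
VECTORS = {
    "green tea": [0.9, 0.1, 0.0],
    "matcha": [0.8, 0.2, 0.1],
    "espresso": [0.1, 0.9, 0.0],
}
TEA_SEED = [1.0, 0.0, 0.0]  # hypothetical seed vector for a "tea" category

def in_category(term, seed=TEA_SEED, threshold=0.9):
    """A term joins the category only if its affinity to the seed reaches the threshold."""
    return cosine_similarity(VECTORS[term], seed) >= threshold

print([t for t in VECTORS if in_category(t)])  # ['green tea', 'matcha']
```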
How natural language processing usually works: tokenization and subwords
Source: https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html
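Classic WordPiece splits each word greedily, taking the longest vocabulary piece that matches at each position and marking word-internal pieces with `##`. The sketch below implements that textbook greedy longest-match-first loop; the linked post describes a faster algorithm for the same kind of tokenisation, and the tiny `VOCAB` is hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenisation of a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        tokens.append(match)
        start = end
    return tokens

VOCAB = {"un", "##aff", "##able", "run", "##ning", "##s"}
print(wordpiece_tokenize("unaffable", VOCAB))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("running", VOCAB))    # ['run', '##ning']
```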
This gets broken down even further
● N-grams: important for finding the primary concepts of a sentence by identifying and excluding stop words
● “Running”, “runs” and “ran” share the same base: “run”
https://patents.google.com/patent/US8423350B1/
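Stop-word removal, lemma lookup and n-gram extraction chain together naturally. The stop-word list and lemma table below are hypothetical stand-ins; production lemmatisers rely on full dictionaries and morphological analysis rather than a hand-written mapping.

```python
# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "for", "in"}
LEMMAS = {"running": "run", "runs": "run", "ran": "run", "shoes": "shoe"}

def normalise(text):
    """Lower-case, strip basic punctuation, drop stop words, then map words to lemmas."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    return [LEMMAS.get(w, w) for w in content]

def ngrams(tokens, n):
    """All contiguous n-word sequences from the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = normalise("The best running shoes for a marathon")
print(tokens)             # ['best', 'run', 'shoe', 'marathon']
print(ngrams(tokens, 2))  # ['best run', 'run shoe', 'shoe marathon']
```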
Google does a lot of things when detecting entities and relationships
● Identifying aspects to define entities based on popularity and diversity, granted in 2011 (link)
● Finding the entity associated with a query before returning a result, using input from human quality raters to confirm objective facts associated with an entity, granted in 2015 (link)
● Understanding the context of the query, entity and related answer you’re searching for, granted in 2019 (link)
● Understanding user-generated content signals in relation to a webpage, granted in 2022 (link)
● Understanding the best way to present an entity on a results page, granted in 2016 (link)
● Managing and identifying disambiguation in entities, granted in 2016 (link)
● Building entities through a co-occurrence “methodology based on phrases” and storing lower-information-gain documents in a secondary index, granted in 2020 (link)
● Understanding context from previous query results and behaviour, granted in 2016 (link)
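Entity disambiguation can be illustrated with a simplified Lesk-style overlap count: pick the candidate entity whose descriptive words best match the words around the mention. This is a swapped-in classroom technique, not the method of any patent listed above, and the two Jaguar candidates are hypothetical.

```python
# Hypothetical candidate entities, each described by a small bag of context words.
CANDIDATES = {
    "Jaguar (animal)": {"cat", "rainforest", "predator", "prey", "species"},
    "Jaguar (car maker)": {"car", "engine", "luxury", "model", "drive"},
}

def disambiguate(mention_context, candidates=CANDIDATES):
    """Return the candidate whose context words overlap most with the text around the mention.

    Ties go to the candidate listed first, because max() keeps the first maximum.
    """
    context = {w.strip(".,!?").lower() for w in mention_context.split()}
    return max(candidates, key=lambda name: len(candidates[name] & context))

print(disambiguate("The jaguar is a big cat that hunts prey in the rainforest."))  # Jaguar (animal)
print(disambiguate("The new Jaguar model has a powerful engine."))                 # Jaguar (car maker)
```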
Step 2
Scoring
In their own description of their ranking & scoring engine, Google offers 5 buckets:
● Meaning
● Relevance
● Quality
● Usability
● Context
Scoring is all those 200+ factors we talk about…
Google has cited everything from internal links, external links, pogo-sticking, “user behaviour”, proximity of the query terms to each other, context, attributes, and more.
Just a few of the patents related to scoring:
● Evaluating quality based on neighbor features (link)
● Entity confidence (link)
● Search operation adjustment and re-scoring (link)
● Evaluating website properties by partitioning user feedback (link)
● Providing result-based query suggestions (link)
● Multi-process scoring (link)
● Blocking spam blog posts with a “low link-based score” (link)
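Of the factors cited, proximity of the query terms to each other is the easiest to show in code. This sketch finds the shortest window of a page’s tokens containing every query term and turns it into a score between 0 and 1. The scoring formula (distinct query terms divided by window length) is an assumption for illustration, not a documented Google factor.

```python
def min_window(tokens, query_terms):
    """Length of the shortest run of tokens containing every query term, or None if impossible."""
    needed = set(query_terms)
    if not needed:
        return None
    counts = {}
    covered = 0  # distinct query terms inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # While the window covers every term, record it and shrink from the left.
        while covered == len(needed):
            size = right - left + 1
            if best is None or size < best:
                best = size
            out = tokens[left]
            if out in needed:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1
            left += 1
    return best

def proximity_score(text, query):
    """1.0 when the query terms sit side by side, smaller as they drift apart, 0.0 if any is missing."""
    tokens = text.lower().split()
    terms = query.lower().split()
    window = min_window(tokens, terms)
    return 0.0 if window is None else len(set(terms)) / window

print(proximity_score("green tea is a kind of tea", "green tea"))  # 1.0
print(proximity_score("green is a colour and tea is a drink", "green tea"))
```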
It actually looks like they have a classification engine for entities as well
This patent was filed in 2010 and granted in 2014, and is likely a basis for the Knowledge Graph (US8838587B1).
https://patents.google.com/patent/US8838587B1/en
There’s more than one document scoring function, the functions are weighted, and this has been the case since the beginning
“...link structure may be unavailable, unreliable, or limited in scope, thus, limiting the value of using PageRank in ascertaining the relative quality of some documents.” (circa 2005)
https://patents.google.com/patent/US7962462B1/en
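A weighted combination of several scoring functions looks like this in its simplest, linear form. The signal names, weights and pages are hypothetical; Google has not published how its functions are combined, and a linear sum is only one possible design.

```python
# Hypothetical signal names and weights, for illustration only.
WEIGHTS = {"relevance": 0.5, "link_score": 0.3, "quality": 0.2}

def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of a document's signal scores; a missing signal contributes 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())

docs = {
    "page-a": {"relevance": 0.9, "link_score": 0.2, "quality": 0.8},
    "page-b": {"relevance": 0.6, "link_score": 0.9, "quality": 0.4},
    # No link data at all: the "unavailable" link structure case from the quote.
    "page-c": {"relevance": 0.8},
}
ranking = sorted(docs, key=lambda d: combined_score(docs[d]), reverse=True)
print(ranking)  # ['page-a', 'page-b', 'page-c']
```

Because a missing signal simply contributes nothing, a page with no usable link data is judged on its other functions alone, which is the fragility the quote describes for relying on PageRank by itself.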
How Google ranks content
● Based on aggregate historical behaviour from similar searches (application)
● Based on external links (link)
● Based on your own previous searches (link)
● Based on whether or not it should provide the answer directly via the Knowledge Graph (link)
● Phrase- and entity-based co-occurrence threshold scores (link)
● Understanding intent based on contextual information (link)
Helpful Content Update & Information Gain Score (granted Jun 2022)
● The information gain score might be personal to you and the results you’ve already seen
● Featured snippets may differ from one search to another, based on the information gain score of your second search
● Pre-training an ML model on a first set of data shown to users in aggregate, getting an information gain score, and using that to generate new results in SERPs.
https://patents.google.com/patent/US20200349181A1/en
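The idea that the score may be personal to you and the results you’ve already seen can be sketched with a set-based proxy: score each candidate page by the share of its concepts the user has not met yet, then re-rank. The patent describes a trained ML model, not this overlap count; the concept sets and page names are hypothetical.

```python
def novelty(candidate_concepts, seen_concepts):
    """Share of a candidate page's concepts that the user has not already seen."""
    if not candidate_concepts:
        return 0.0
    return len(candidate_concepts - seen_concepts) / len(candidate_concepts)

def rerank(candidates, already_seen):
    """Order candidate URLs by novelty relative to every page the user has already viewed."""
    seen = set().union(*already_seen)
    return sorted(candidates, key=lambda url: novelty(candidates[url], seen), reverse=True)

already_seen = [{"green tea", "caffeine"}, {"green tea", "antioxidants"}]
candidates = {
    "page-x": {"green tea", "caffeine", "antioxidants"},
    "page-y": {"green tea", "brewing temperature", "matcha"},
    "page-z": {"caffeine", "sleep"},
}
print(rerank(candidates, already_seen))  # ['page-y', 'page-z', 'page-x']
```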
What is “information gain”?
“Information gain, as the ratio of actual co-occurrence rate to expected co-occurrence rate, is one such prediction measure. Two phrases are related where the prediction measure exceeds a predetermined threshold. In that case, the second phrase has significant information gain with respect to the first phrase.”
- Phrase-based searching in an information retrieval system, granted 2009 (link)
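The quoted definition translates directly into code. With N documents, the actual co-occurrence rate of phrases a and b is (documents containing both) / N, the expected rate is the product of their individual rates, and information gain is actual divided by expected (the same quantity often called lift). The toy documents and the 1.5 threshold are assumptions; the quote only says “a predetermined threshold”.

```python
def information_gain(docs, a, b):
    """Actual co-occurrence rate of phrases a and b divided by the rate expected if independent."""
    n = len(docs)
    if n == 0:
        return 0.0
    rate_a = sum(1 for d in docs if a in d) / n
    rate_b = sum(1 for d in docs if b in d) / n
    actual = sum(1 for d in docs if a in d and b in d) / n
    expected = rate_a * rate_b
    return actual / expected if expected else 0.0

def related(docs, a, b, threshold=1.5):
    """Phrases count as related when information gain exceeds the (hypothetical) threshold."""
    return information_gain(docs, a, b) > threshold

# Each document is reduced to the set of phrases it contains (toy data).
docs = [
    {"green tea", "matcha"},
    {"green tea", "matcha"},
    {"coffee"},
    {"coffee", "espresso"},
]
print(information_gain(docs, "green tea", "matcha"))  # 2.0
print(related(docs, "green tea", "coffee"))           # False
```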
So, basically, it’s quantifying to what degree you talk about all the topics Google sees as related to your main subject.
If information gain plays such a strong role in which content Google chooses to show, why do so few people talk about it?
https://patents.google.com/patent/US7962462B1/en
So what do we do about all this?
When was the last time you did a full content inventory?
What I mean when I say content inventory
https://www.portent.com/onetrick
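A content inventory usually starts with a list of every URL on the site. One low-effort starting point, assuming the site publishes a standard XML sitemap, is to parse it into rows and count pages per top-level folder. The sample sitemap below is invented, and a real inventory would likely carry more columns than URL and last-modified date.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def inventory_from_sitemap(xml_text):
    """Turn a sitemap's <url> entries into rows: URL plus last-modified date (or None)."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        rows.append({"url": loc, "lastmod": lastmod or None})
    return rows

def section_counts(rows):
    """Count pages per top-level folder, e.g. everything under /blog/."""
    counts = Counter()
    for row in rows:
        first = urlparse(row["url"]).path.strip("/").split("/")[0]
        counts[first or "(root)"] += 1
    return counts

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2023-03-01</lastmod></url>
  <url><loc>https://example.com/blog/green-tea</loc><lastmod>2021-06-15</lastmod></url>
  <url><loc>https://example.com/blog/matcha</loc></url>
</urlset>"""

rows = inventory_from_sitemap(SITEMAP)
print(rows[2])  # {'url': 'https://example.com/blog/matcha', 'lastmod': None}
print(section_counts(rows))
```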
Redo keyword research and overlay entities
● Pull content for at least the top 10 search results ranking for your target keyword
● Dump them into Diffbot (https://demo.nl.diffbot.com/) or the Natural Language AI demo (https://cloud.google.com/natural-language)
● Note the entities and their salience
● Run your target page through the same tool
● Understand the differences
● Update your content accordingly
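Once you have noted entities and salience scores for the top results and for your page, the comparison is simple arithmetic. The sketch assumes you have transcribed each page’s results into an `{entity: salience}` dictionary by hand; the numbers below are made up, and the 50% `min_share` cut-off is an arbitrary choice.

```python
def salience_gaps(competitors, target, min_share=0.5):
    """Entities used by at least `min_share` of competitor pages but absent from the target.

    Returns {entity: mean competitor salience}, highest salience first.
    """
    if not competitors:
        return {}
    totals, mentions = {}, {}
    for page in competitors:
        for entity, salience in page.items():
            totals[entity] = totals.get(entity, 0.0) + salience
            mentions[entity] = mentions.get(entity, 0) + 1
    gaps = {
        entity: totals[entity] / count
        for entity, count in mentions.items()
        if count / len(competitors) >= min_share and entity not in target
    }
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical salience scores (0 to 1) noted from an entity-analysis tool.
competitors = [
    {"green tea": 0.6, "caffeine": 0.2, "antioxidants": 0.1},
    {"green tea": 0.5, "antioxidants": 0.3, "matcha": 0.1},
    {"green tea": 0.7, "caffeine": 0.1},
]
target = {"green tea": 0.8, "matcha": 0.1}
print(salience_gaps(competitors, target))
```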
Start with keyword research, then find co-occurring terms
● Pull content for at least the top 10 search results ranking for your target keyword
● Use a TF-IDF calculator to reverse-engineer the topic correlation (Ryte has a paid one)
● Note the terms included
● Run your target page through the same calculator
● Understand the differences
● Update your content accordingly
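TF-IDF weighs a term by how often it appears in a page, discounted by how many pages use it. This sketch uses a textbook variant (term frequency normalised by page length, idf = log(N / df)) and surfaces high-weight competitor terms your page never uses. Ryte’s calculator may use a different formula, and the sample pages are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """One {term: weight} dict per document: (count / doc length) * log(N / doc frequency)."""
    tokenised = [doc.lower().split() for doc in documents]
    n = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def missing_terms(competitor_pages, target_page, top_n=5):
    """Highest summed competitor weights for terms the target page never uses.

    The target page is included in the corpus so that idf reflects every page compared.
    """
    weights = tf_idf(competitor_pages + [target_page])
    target_terms = set(target_page.lower().split())
    summed = Counter()
    for w in weights[:-1]:  # competitor pages only
        summed.update(w)
    return [t for t, _ in summed.most_common() if t not in target_terms][:top_n]

competitors = ["matcha whisk bamboo", "matcha powder grade", "matcha whisk ceremony"]
print(missing_terms(competitors, "matcha latte recipe"))
```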
Break old content habits
● FAQs on product pages
● Consolidate super-granularly targeted blog articles
● Think outside the blog folder: the semantic relationship can carry through to the directory structure of the website as well
● Internal linking can be a secret weapon
● Fit content to purpose: not everything needs a 3,000-word in-depth article
Measure what really matters to the business: traffic and revenue from organic.
Who tf am I?
Amanda King is a human
● Over a decade in the SEO industry
● Travelled to 40+ countries
● Business- and product-focussed
● Knows CRO, data and UX
● Always open to learning something new
● Slightly obsessed with tea
Thank you
Amanda King
t. @amandaecking
i. @floq.co / @amandaecking
w. floq.co

Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 

How Google Really Analyzes Content for Search

  • 1. The New Content SEO FLOQ - Amanda King Sydney SEO Conference 14 April 2023
  • 2. The New Content SEO What we’ll talk about 1. A quick refresher 2. Have keywords ever actually been a thing Google used? 3. How Google reads content may not be what you think 4. So what do we do about all this? 5. Who tf am I?
  • 3.
  • 5. A brief refresher on how Google crawls the Internet. It’s three separate stages (crawl, index, serve), with sub-processes for scoring and ranking. Content analysis happens in the indexing engine; content relevancy is assessed in the serving engine. While this is an old patent (2011), the fundamentals still apply for the purposes of this refresher. Source: https://patents.google.com/patent/US8572075B1/, retrieved 22 Mar 2023 https://developers.google.com/search/docs/fundamentals/how-search-works
  • 6. Google's ranking engine works in systems ● Query Deserves Freshness is a system ● Helpful Content is a system ● MUM & BERT are systems ○ “Bidirectional Encoder Representations from Transformers (BERT) is an AI system Google uses that allows us to understand how combinations of words express different meanings and intent.” https://developers.google.com/search/docs/appearance/ranking-systems-guide
  • 7. Have keywords ever actually been a thing Google used?
  • 8. While Google is a machine, it’s moved fundamentally beyond keywords…and has since at least 2015.
  • 10. Queries very quickly become entities “[...]identifying queries in query data; determining, in each of the queries, (i) an entity-descriptive portion that refers to an entity and (ii) a suffix; determining a count of a number of times the one or more queries were submitted” - patent granted in 2015, submitted in 2012 Source: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/
  • 11. Google acknowledges query-only based matching is pretty terrible. “Direct “Boolean” matching of query terms has well known limitations, and in particular does not identify documents that do not have the query terms, but have related words [...]The problem here is that conventional systems index documents based on individual terms, rather than on concepts. Concepts are often expressed in phrases [...] Accordingly, there is a need for an information retrieval system and methodology that can comprehensively identify phrases in a large scale corpus, index documents according to phrases, search and rank documents in accordance with their phrases, and provide additional clustering and descriptive information about the documents. [...]” - Information retrieval system for archiving multiple document versions, granted 2017 (link)
  • 12. So it decided to make its search engine concept- and phrase-based. “The system is adapted to identify phrases that have sufficiently frequent and/or distinguished usage in the document collection to indicate that they are “valid” or “good” phrases [...]The system is further adapted to identify phrases that are related to each other, based on a phrase's ability to predict the presence of other phrases in a document.” - Information retrieval system for archiving multiple document versions, granted 2017 (link)
  • 13. “Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea.” Source: https://blog.google/products/search/how-ai-powers-great-search-results/
  • 14. MUM takes this a step further ● About 1,000 times more powerful than BERT ● Trained across 75 languages for greater context ● Recognises this across different types of media (video, text, etc) https://blog.google/products/search/introducing-mum/
  • 15. How Google reads content may not be what you think
  • 16. Step 1 Indexing Indexing is the stage where content is analysed, so how does Google do it?
  • 17. BERT is a technique for natural language processing (NLP) pre-training. So how does natural language processing work once it has a corpus of data? Source: https://blog.google/products/search/search-language-understanding-bert/
  • 18. Is there anything in this process that even looks like “keywords”?
  • 19. So the broad-strokes steps in the indexation process are: 1. Parsing: tokenisation, parts of speech, stemming (for Google, lemmatization) 2. Topic modelling: entity detection, relation detection 3. Understanding 4. Onto the next engine, ranking
  • 20. Parsing is intrinsically categorisation ● Semantic distance ● Keyword-seed affinity ● Category-seed affinity ● Category-seed affinity to threshold https://patents.google.com/patent/US11106712B2; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
  • 21. How natural language processing usually works: tokenization and subwords Source: https://ai.googleblog.com/2021/12/a-fast-wordpiece-tokenization-system.html
  • 22. This gets broken down even further ● N-grams: important for finding the primary concepts of the sentence by identifying and excluding stop words ● “Running”, “runs” and “ran” share the same base: “run” https://patents.google.com/patent/US8423350B1/
  • 23. Google does a lot of things when detecting entities and relationships ● Identifying aspects to define entities based on popularity and diversity, granted in 2011 (link) ● Finding the entity associated with a query before returning a result, using input from human quality raters to confirm objective fact associated with an entity, granted in 2015 (link) ● Understanding the context of the query, entity and related answer you’re searching for, granted in 2019 (link) ● Aims to understand user generated content signals in relation to a webpage, granted in 2022 (link)
  • 24. Google does a lot of things when detecting entities and relationships ● Understanding the best way to present an entity in a results page, granted in 2016 (link) ● Managing and identifying disambiguation in entities, granted in 2016 (link) ● Build entities through co-occurring ”methodology based on phrases” and store lower information gain documents in a secondary index, granted in 2020 (link) ● Understanding context from previous query results and behaviour, granted in 2016 (link)
  • 25. Step 2 Scoring In their own description of their ranking & scoring engine, Google offers 5 buckets: ● Meaning ● Relevance ● Quality ● Usability ● Context
  • 26. Scoring is all those 200+ factors we talk about… Google has cited everything from internal links, external links, pogo sticking, “user behaviour”, proximity of the query terms to each other, context, attributes, and more. Just a few of the patents related to scoring: ● Evaluating quality based on neighbor features (link) ● Entity confidence (link) ● Search operation adjustment and re-scoring (link) ● Evaluating website properties by partitioning user feedback (link) ● Providing result-based query suggestions (link) ● Multi-process scoring (link) ● Block spam blog posts with “low link-based score” (link)
  • 27. It actually looks like they have a classification engine for entities as well This patent was filed in 2010, granted in 2014. Likely a basis for the Knowledge Graph. (US8838587B1) https://patents.google.com/patent/US8838587B1/en
  • 28. “...link structure may be unavailable, unreliable, or limited in scope, thus, limiting the value of using PageRank in ascertaining the relative quality of some documents.” (circa 2005) https://patents.google.com/patent/US7962462B1/en
  • 29. There is more than one document scoring function. The functions are weighted, and have been since the beginning.
  • 30. How Google ranks content ● Based on historical behaviour from similar searches in aggregate (application) ● Based on external links (link) ● Based on your own previous searches (link) ● Based on whether or not it should directly provide the answer via Knowledge Graph (link) ● Phrase- and entity-based co-occurrence threshold scores (link) ● Understanding intent based on contextual information (link)
  • 31. Helpful Content Update & Information Gain Score (granted Jun 2022) ● The information gain score might be personal to you and the results you’ve already seen ● Featured snippets may differ from one search to another based on the information gain score of your second search ● Pre-training an ML model on a first set of data shown to users in aggregate, getting an information gain score, and using that to generate new results in SERPs. https://patents.google.com/patent/US20200349181A1/en
  • 32. What is “information gain”? “Information gain, as the ratio of actual co-occurrence rate to expected co-occurrence rate, is one such prediction measure. Two phrases are related where the prediction measure exceeds a predetermined threshold. In that case, the second phrase has significant information gain with respect to the first phrase.” - Phrase-based searching in an information retrieval system, granted 2009 (link)
  • 33. So, basically, it’s quantifying to what degree you talk about all the topics Google sees as related to your main subject.
  • 34. If information gain plays such a strong role in which content Google chooses to show, why do so few folks talk about it? https://patents.google.com/patent/US7962462B1/en
  • 35. So what do we do about all this?
  • 36. When was the last time you did a full content inventory?
  • 37. What I mean when I say content inventory https://www.portent.com/onetrick
  • 38. Redo keyword research and overlay entities ● Pull content for at least the top 10 search results ranking for your target keyword ● Dump them into Diffbot (https://demo.nl.diffbot.com/) or the Natural Language AI demo (https://cloud.google.com/natural-language) ● Note the entities and salience ● Run your target page ● Understand the differences ● Update your content accordingly
  • 39. Start with keyword research, find co-occurring terms ● Pull content for at least the top 10 search results ranking for your target keyword ● Look at TF-IDF calculators to reverse-engineer the topic correlation (Ryte has a paid one) ● Note the terms included ● Run your target page ● Understand the differences ● Update your content accordingly
  • 40. Break old content habits ● FAQ on product pages ● Consolidate super-granularly targeted blog articles ● Think outside the blog folder: the semantic relationship can carry through to the directory structure of the website as well ● Internal linking can be a secret weapon ● Fit content to purpose: not everything needs a 3,000-word in-depth article
  • 41. Measure what really matters to the business — traffic and revenue from organic.
  • 42. Who tf am I?
  • 43. Amanda King is a human ● Over a decade in the SEO industry ● Traveled to 40+ countries ● Business- and product-focussed ● Knows CRO, Data, UX ● Always open to learning something new ● Slightly obsessed with tea
  • 44. Thank you Amanda King t. @amandaecking i. @floq.co / @amandaecking w. floq.co
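Slide 21 points to Google's Fast WordPiece post. The following is a minimal Python sketch of what subword tokenisation does, using the classic greedy longest-match-first WordPiece algorithm. This is the simple quadratic version, not the faster linear-time algorithm the post describes. The function names and the toy vocabulary are hypothetical and invented for this example. Real BERT vocabularies hold roughly 30,000 pieces.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of one word (illustrative helper).

    Pieces after the first carry the '##' continuation prefix, as in BERT vocabularies.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix of the remainder is known, so the whole word becomes unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Lower-case, split on whitespace, then WordPiece each word (illustrative helper)."""
    out = []
    for word in text.lower().split():
        out.extend(wordpiece_tokenize(word, vocab))
    return out


# Hypothetical toy vocabulary, invented for illustration only.
VOCAB = {"search", "un", "##aff", "##able", "affable", "##ing", "run", "##n"}
print(tokenize("Searching unaffable running", VOCAB))
```

Unknown words fall back to a single `[UNK]` token. Words outside the vocabulary can still be represented, because they are split into known pieces wherever possible.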
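Slide 22 describes a normalisation step: drop stop words, collapse inflected forms to a shared base, then build n-grams. The following is a minimal Python sketch of that step. The stop-word list and lemma table are tiny hand-made assumptions, and the helper names are hypothetical. A real pipeline would use a trained lemmatiser rather than a lookup table, and nothing here reflects Google's actual lists.

```python
# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "on", "for"}
LEMMAS = {"running": "run", "runs": "run", "ran": "run", "shoes": "shoe"}


def normalise(text: str) -> list[str]:
    """Lower-case, strip edge punctuation, drop stop words, map words to their base form."""
    words = [w.strip(".,!?;:\"'") for w in text.lower().split()]
    return [LEMMAS.get(w, w) for w in words if w and w not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous windows of n tokens (empty if there are fewer than n tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = normalise("The best shoes for running in the rain.")
print(tokens)
print(ngrams(tokens, 2))
```

Because stop words are removed before the windows are built, the bigrams pair up content words ("shoe", "run") that were not adjacent in the raw sentence.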
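The definition quoted on slide 32 says information gain is actual co-occurrence over expected co-occurrence. That ratio can be computed directly, as in the following sketch.

It rests on several assumptions:
- "Expected" means the rate you would see if the two phrases appeared independently, so it is the product of their document frequencies.
- Each document is treated as a set of phrases.
- The threshold of 1.5 is hypothetical. Note 15 only says the threshold sits above 1.0.

This illustrates the phrase-based definition from the 2009 patent, not the per-user score in the 2022 patent on slide 31. The function names and corpus are invented for the example.

```python
from itertools import combinations


def information_gain(docs: list[set[str]], a: str, b: str) -> float:
    """Actual co-occurrence rate of a and b divided by the rate expected under independence."""
    if not docs:
        return 0.0
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    expected = p_a * p_b
    return p_ab / expected if expected else 0.0


def related_pairs(docs: list[set[str]], phrases: list[str], threshold: float = 1.5):
    """Phrase pairs whose information gain exceeds the (assumed) threshold."""
    pairs = []
    for a, b in combinations(phrases, 2):
        gain = information_gain(docs, a, b)
        if gain > threshold:
            pairs.append((a, b, round(gain, 2)))
    return pairs


# Hypothetical toy corpus: each document reduced to the set of phrases it contains.
DOCS = [
    {"espresso", "grinder", "crema"},
    {"espresso", "crema", "tamper"},
    {"espresso", "grinder"},
    {"tea", "kettle"},
    {"tea", "kettle", "grinder"},
    {"tea", "teapot"},
]
PHRASES = ["espresso", "crema", "grinder", "tea", "kettle"]
print(related_pairs(DOCS, PHRASES))
```

"grinder" appears alongside both coffee and tea phrases. Its ratios therefore stay near 1.0 (no better than chance), and it does not clear the threshold with any other phrase.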
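Slide 39's TF-IDF gap analysis can be sketched in a few lines of Python.

The sketch makes these assumptions, all of them hypothetical:
- Pages are already tokenised into word lists.
- IDF is computed over a background corpus plus the competitor pages, using the smoothed form ln((1 + N) / (1 + df)) + 1, as scikit-learn does by default.
- A "gap" is a term with high average TF-IDF across the top-ranking pages that the target page never uses.

Commercial calculators such as Ryte's may weight terms differently.

```python
import math
from collections import Counter


def tf_idf(page: list[str], df: Counter, n_docs: int) -> dict[str, float]:
    """Term frequency within the page times smoothed inverse document frequency."""
    counts = Counter(page)
    return {
        term: (count / len(page)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in counts.items()
    }


def term_gaps(competitors, target, background, top_k=3):
    """Highest average-TF-IDF competitor terms that never appear on the target page."""
    corpus = background + competitors
    df = Counter(term for doc in corpus for term in set(doc))
    avg = Counter()
    for page in competitors:
        for term, weight in tf_idf(page, df, len(corpus)).items():
            avg[term] += weight / len(competitors)
    used = set(target)
    return [term for term, _ in avg.most_common() if term not in used][:top_k]


# Hypothetical tokenised pages, invented for illustration.
COMPETITORS = [
    ["espresso", "grind", "size", "crema", "espresso"],
    ["espresso", "grind", "tamp", "crema"],
    ["espresso", "grind", "water", "temperature"],
]
TARGET = ["espresso", "machine", "espresso", "price"]
BACKGROUND = [
    ["espresso", "machine", "price"],
    ["water", "temperature", "weather"],
    ["price", "size", "shipping"],
]
print(term_gaps(COMPETITORS, TARGET, BACKGROUND))
```

The background corpus matters. If IDF were computed over the top 10 pages alone, any term every competitor uses would get zero weight, yet those are exactly the terms a searcher would expect to see.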

Editor's Notes

  1. This is a lot of information and I don’t have all the answers. There are a lot of patents, and I’ve done a lot of patent diving, so if things get dry, I apologise. You can do a shot every time I say “system” or “entity”.
  2. https://status.search.google.com/ Crawling, indexing, ranking, serving
  3. I may
  4. Google is vector based: If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x.
  5. Originally applied for in 2005, granted in 2010: https://patents.google.com/patent/US7702618B1/en (Google really started to become popular in 2000). This is essentially a discussion of how they would build their knowledge graph. Indexing system: 1) identification of phrases and related phrases, 2) indexing of documents with respect to phrases, 3) generation and maintenance of a phrase-based taxonomy. A co-occurrence matrix for the good phrases is maintained.
  6. If search x goes to document a, and document a also contains term b, term b will be added to a list of associated topics for search x. The third stage of the indexing operation is to prune the good phrase list using a predictive measure derived from the co-occurrence matrix. Unlike existing systems, which use predetermined or hand-selected phrases, the good phrase list reflects phrases that are actually being used in the corpus. Further, since the above process of crawling and indexing is repeated periodically as new documents are added to the document collection, the indexing system automatically detects new phrases as they enter the lexicon. The next step is to determine which related phrases together form a cluster of related phrases. A cluster is a set of related phrases in which each phrase has high information gain with respect to at least one other phrase. In one embodiment, clusters are identified as follows. “First, rather than a strictly—and often arbitrarily—defined hierarchy of topics and concepts, this approach recognizes that topics, as indicated by related phrases, form a complex graph of relationships, where some phrases are related to many other phrases, and some phrases have a more limited scope, and where the relationships can be mutual (each phrase predicts the other phrase) or one-directional (one phrase predicts the other, but not vice versa). The result is that clusters can be characterized “local” to each good phrase, and some clusters will then overlap by having one or more common related phrases.” “The indexing of documents by phrases and use of the clustering information provides yet another advantage of the indexing system, which is the ability to determine the topics that a document is about based on the related phrase information.”
  7. There’s also PaLM, CALM and LaMDA (one Google engineer even claimed LaMDA was sentient).
  8. This is where content analysis is included
  9. BERT comes in during the topic modelling phase; it’s not the entirety of the indexation process. Define corpus: the documents on the internet they can crawl.
  10. Remember natural language processing is not unique to Google. There are entire fields dedicated to it, it’s an entire branch of AI and computational linguistics.
  11. The semantic distance between words can be estimated as the number of vertices that connect the two words.
  12. Tokenisation is essentially converting a sentence into “tokens”, turning an unstructured string into elements that can be understood by machine learning. Google's Fast WordPiece algorithm, which speeds up the WordPiece tokenisation used by BERT, found shortcuts through precomputed matching and skipping, making tokenisation about 5x faster than previous implementations.
  13. Popularity score - search history frequency, click through rate, dwell time; diversity score is based on how similar the unranked document is to already ranked documents.
  14. Based on historical behaviour from similar searches in aggregate (application): “The system may also comprise a profile database that stores profiles associated with specific remote devices for use by the results ranker in ordering the categories. In addition, the system may comprise a relevance filter that stores data about other search queries received from other remote devices, the data including distributions of previously determined correlations between the other search queries and one or more different categories of information.” Based on your own previous searches (link): how quickly you went from choosing one result to another; whether or not you go back to the same source multiple times over time; whether you choose a particular result more than the general population; your declared demographics; your declared location (link); whether you’ve made a bunch of the same types of searches (weather in Britain, weather in Spain), “sibling scores” (link). Whether or not it should directly provide the answer via Knowledge Graph (link); whether or not it should have a zero result with a quick fact (link); whether or not text or another presentation of information makes sense (link); whether or not to return a “card”, like for movies showing at a particular theatre (link).
  15. Raising the threshold over 1.0 serves to reduce the possibility that two otherwise unrelated phrases co-occur more than randomly predicted.
  16. Don’t have the answer for you there, I just like posing rhetorical questions.
  17. This process is manual, but hopefully before the end of the financial year I’ll have a more automated process you can steal. What is entity salience? Entity salience refers to the prominence of an entity within the content. Entity research and entity salience tell you what people who are ranking are talking about; co-occurring terms tell you what Google is expecting folks to talk about. Sometimes there’s a gap.
  18. Google uses TF-IDF to assign terms to an entity, amongst many other things. https://patents.google.com/patent/US8589399B1/en So why don’t we use TF-IDF to reverse-engineer that? This isn’t about keyword density.
  19. Adding FAQ (ongoing): leading indicators are strong, with product pages at 83% more traffic YoY than the overall product category in organic (-1.7% v -10% YoY). Blog consolidation: redirected about 60% of blog content and maintained traffic parity with overall organic traffic to the website, a win for the business (less overhead). Thinking outside the blog folder: Optus saw a 24% uplift in conversion when content was a part of the user journey.
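Note 11 treats semantic distance as a path length through a graph of words. That idea can be sketched in Python with a breadth-first search. The graph edges and helper names are hypothetical, invented for illustration. This version counts edges (hops) on the shortest path. Counting the vertices along the path instead only shifts each distance by one.

```python
from collections import deque
from typing import Optional


def build_graph(edges):
    """Undirected adjacency sets from (word, word) pairs (illustrative helper)."""
    graph: dict[str, set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def semantic_distance(graph, a: str, b: str) -> Optional[int]:
    """Hops on the shortest path between two words via breadth-first search; None if unconnected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical word graph, invented for this example.
GRAPH = build_graph([
    ("coffee", "espresso"),
    ("espresso", "crema"),
    ("coffee", "tea"),
    ("tea", "kettle"),
])
print(semantic_distance(GRAPH, "crema", "kettle"))
```

Breadth-first search visits words in order of hop count. The first time the target word is reached is therefore the shortest path.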
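Note 13 defines the diversity score by "how similar the unranked document is to already ranked documents". That score can be illustrated with a greedy Python re-ranker that blends popularity with diversity. This is essentially Maximal Marginal Relevance, with popularity standing in for query relevance.

The following are all assumptions for illustration, not details from the patent:
- the Jaccard similarity over term sets
- the 50/50 weighting
- the function names
- the sample documents

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity_score(candidate: set[str], ranked: list[set[str]]) -> float:
    """1 minus the candidate's highest similarity to any already-ranked document."""
    if not ranked:
        return 1.0
    return 1.0 - max(jaccard(candidate, doc) for doc in ranked)


def rerank(candidates: dict[str, tuple[float, set[str]]], k: int, weight: float = 0.5) -> list[str]:
    """Greedily pick k documents, scoring each remaining one as
    weight * popularity + (1 - weight) * diversity from those already picked."""
    remaining = dict(candidates)
    picked: list[str] = []
    while remaining and len(picked) < k:
        picked_terms = [candidates[name][1] for name in picked]

        def blended(name: str) -> float:
            popularity, terms = remaining[name]
            return weight * popularity + (1 - weight) * diversity_score(terms, picked_terms)

        best = max(remaining, key=blended)
        picked.append(best)
        del remaining[best]
    return picked


# Hypothetical popularity scores and term sets, invented for illustration.
CANDIDATES = {
    "a": (0.9, {"espresso", "crema", "grinder"}),
    "b": (0.85, {"espresso", "crema", "grinder", "tamp"}),
    "c": (0.6, {"tea", "kettle"}),
}
print(rerank(CANDIDATES, k=2))
```

With the default weighting, "c" is picked second because "b" largely repeats "a". Setting `weight=1.0` ignores diversity and falls back to pure popularity order ("a", then "b").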