How to create your own search quality evaluation algorithms
Richard Lawrence
Sanity.io
@richlawre
Who the hell is this guy anyway?
● Principal SEO at Sanity
● Sanity is a headless CMS and more!
● Doing a Data Science degree in my spare time
Onto some context
The ‘helpful content update’ might have been a bit of a damp squib…
…but Google is always working towards ranking helpful content more highly
So wouldn’t it be great to know if your content is helping your audience - at scale?
The search rater guidelines hold the key
167-page document that says what good looks like!
Google says it doesn’t directly use the ratings in its ranking algorithms
“We use responses from Raters to evaluate changes, but they don’t directly impact how our search results are ranked.”
bit.ly/ratings-answer
But it likely uses the rated content to help find features of what ‘good’ looks like
Similar methods have been used for years in various areas - like counterfeit notes
Features are found that best separate authentic and counterfeit notes
[Chart: axes ‘Distance between edge & watermark’ and ‘Width of shaded area’; groups: Counterfeit, Authentic]
Features for high vs. low quality content will likely be more complex
In 2019, Bing confirmed this is how it works
bit.ly/bing-confirmation
With 90% of its algorithms being ML-based
bit.ly/bing-features
Plus it revealed its process
bit.ly/bing-process
So how can we harness this as an industry?
We can try to create our own!
What we need to do
1. Label the content
2. Create a ‘Needs Met’ algorithm
3. Create a ‘Page Quality’ algorithm
Labelling the content
Get a representative sample of searches
448 million search queries
bit.ly/448-million
Here’s how to play around with the file
bit.ly/large-file
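A file of this size is unlikely to fit comfortably in memory, so one option is to stream it and keep a uniform random sample as you go. The sketch below uses reservoir sampling and assumes one query per line; the function name `sample_queries` and the in-memory stand-in for the file are hypothetical and not taken from the talk.

```python
import io
import random
from typing import Iterable, List, Optional


def sample_queries(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k queries chosen uniformly at random from a stream.

    Uses reservoir sampling (Algorithm R), so the stream is read once
    and only k queries are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    seen = 0
    for line in lines:
        query = line.strip()
        if not query:
            continue  # skip blank lines
        seen += 1
        if len(reservoir) < k:
            reservoir.append(query)
        else:
            # Replace an existing entry with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = query
    return reservoir


# Stand-in for an open file handle (assumption: one query per line).
fake_file = io.StringIO("car insurance\ncheap flights\n\nweather london\nheadless cms\n")
sample = sample_queries(fake_file, k=2, seed=42)
print(sample)
```

With a real file you would pass the open file object in place of `fake_file`, since iterating over it yields one line at a time.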
Then gather the top 20 rankings for each sample query
Likely an available feature of your favourite rank tracking software
Use some search raters to rate the content
Collect labels
Choose provider
Create guidelines
Must not be identical to Google’s…
Needs Met & Page Quality
2 search raters with 3rd called in for disagreements
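The "2 raters with a 3rd for disagreements" rule can be sketched as below. This is only one possible reading: it assumes the third rating settles the label when it agrees with one of the first two, and that a three-way split is sent back for review. The function name `resolve_rating` is hypothetical.

```python
from collections import Counter
from typing import Callable, Optional


def resolve_rating(first: str, second: str, ask_third: Callable[[], str]) -> Optional[str]:
    """Combine two ratings, calling in a third rater only on disagreement.

    Returns the agreed label, or None when all three raters differ
    (assumption: such items go back for manual review).
    """
    if first == second:
        return first  # no third rater needed
    third = ask_third()
    label, votes = Counter([first, second, third]).most_common(1)[0]
    return label if votes >= 2 else None


print(resolve_rating("Good", "Good", lambda: "Poor"))       # both agree
print(resolve_rating("Good", "Poor", lambda: "Poor"))       # third rater breaks the tie
print(resolve_rating("Good", "Poor", lambda: "Excellent"))  # no majority
```

Passing the third rater as a callable means that cost is only incurred for the items where the first two disagree.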
Creating a Needs Met algorithm
This measures how well content fulfils search intent
Features will mainly relate to relevance and structure
GPT language models are perfect for this
The open source option: GPT-J
GPT-3 became cheaper in September too
We need to create a pattern for GPT-J to learn
Content:
<h1>Compare car insurance quotes</h1>
<p>It's quick and easy to compare car insurance
and find cheaper cover – we just need a few
details about you and your vehicle.</p>
Target query: car insurance
Needs Met rating: Good
It will then rate new content
Content:
<h1>Car insurance</h1>
<p>From theft to write-offs and even lost keys,
you'll be covered with us. Here's what you'll like
about our comprehensive cover </p>
Target query: car insurance
Needs Met rating: ?????
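The pattern on these two slides can be generated from labelled rows with a small helper. A minimal sketch, assuming the three-field layout shown above; `format_example` is a hypothetical name, and leaving the rating blank is one way to represent the "?????" the model is asked to fill in.

```python
from typing import Optional


def format_example(content: str, query: str, rating: Optional[str] = None) -> str:
    """Render one example in the Content / Target query / Needs Met rating pattern.

    With a rating, the result is a labelled training example.
    Without one, the text stops after the label name so the model completes it.
    """
    rating_text = f" {rating}" if rating else ""
    return (
        f"Content:\n{content}\n"
        f"Target query: {query}\n"
        f"Needs Met rating:{rating_text}"
    )


labelled = format_example(
    "<h1>Compare car insurance quotes</h1>", "car insurance", "Good"
)
unlabelled = format_example("<h1>Car insurance</h1>", "car insurance")
print(labelled)
print()
print(unlabelled)
```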
We need to scrape content from each page to give to the language model - with the rating
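As an illustration of the scraping step, the sketch below pulls only headings and paragraphs out of a page using Python's standard-library `html.parser`. It assumes the HTML has already been fetched and that `<h1>` and `<p>` are the elements worth keeping, as in the example slides; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class HeadingAndParagraphExtractor(HTMLParser):
    """Collect the text of <h1> and <p> elements, ignoring everything else."""

    KEEP = {"h1", "p"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._tag: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is captured too.
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buffer).split())  # normalise whitespace
            if text:
                self.parts.append(f"<{tag}>{text}</{tag}>")
            self._tag = None


def extract_content(html: str) -> str:
    parser = HeadingAndParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    "<html><body><nav>Home</nav><h1>Car insurance</h1>"
    "<p>From theft to <b>write-offs</b>, you'll be covered.</p></body></html>"
)
print(extract_content(page))
```

Real pages are messier than this example, so a production scraper would likely need rules for which elements count as content.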
Then use this info to fine-tune GPT-J
bit.ly/finetune-gptj
You can also use existing services
NLP Cloud, Forefront.ai
NLP Cloud also became cheaper!
Validate performance with a test set
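Holding back a test set means setting aside some labelled examples that the model never sees during training. A minimal sketch, assuming a simple random split; the 20% test fraction and the function name are illustrative choices, not figures from the talk.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle labelled rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```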
Judge performance with a Confusion Matrix
                    Prediction: Correct   Prediction: Wrong
Actual: Correct     True positive         False negative
Actual: Wrong       False positive        True negative
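The four cells can be counted directly from the test set by comparing each predicted rating with the rater's label. A minimal sketch, assuming a two-class setup where one rating (here "Good") is treated as the positive class; the example labels are made up for illustration.

```python
from typing import Dict, Sequence


def confusion_matrix(
    actual: Sequence[str], predicted: Sequence[str], positive: str
) -> Dict[str, int]:
    """Count true/false positives and negatives for one positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


# Made-up labels, for illustration only.
actual = ["Good", "Good", "Poor", "Poor", "Good"]
predicted = ["Good", "Poor", "Poor", "Good", "Good"]
cm = confusion_matrix(actual, predicted, positive="Good")
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(cm, round(precision, 2), round(recall, 2))
```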
Few-shot learning can help improve performance
Prompt:
Example 1
Rating: Excellent
Example 2
Rating: Poor
Example 3
Rating: ????
GPT-J output: Good
As can explaining to the model what it needs to do!
Consider the content to rate. Rate it according to how well it fits the search query.
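Putting the instruction and the few-shot examples together might look like the sketch below. It assumes the Content / Target query / Needs Met rating pattern from the earlier slides, with blank lines between examples; `build_few_shot_prompt` is a hypothetical helper, and sending the prompt to a model is left out.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Consider the content to rate. "
    "Rate it according to how well it fits the search query."
)


def build_few_shot_prompt(
    examples: List[Tuple[str, str, str]], content: str, query: str
) -> str:
    """Build a prompt: instruction, rated examples, then the item to rate.

    Each example is a (content, query, rating) tuple.
    """
    blocks = [INSTRUCTION]
    for ex_content, ex_query, ex_rating in examples:
        blocks.append(
            f"Content:\n{ex_content}\nTarget query: {ex_query}\n"
            f"Needs Met rating: {ex_rating}"
        )
    # The final block ends at the label so the model supplies the rating.
    blocks.append(f"Content:\n{content}\nTarget query: {query}\nNeeds Met rating:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    [
        ("<h1>Compare car insurance quotes</h1>", "car insurance", "Excellent"),
        ("<h1>Our office dog</h1>", "car insurance", "Poor"),
    ],
    "<h1>Car insurance</h1>",
    "car insurance",
)
print(prompt)
```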
We’ve done this for you within Sanity Studio
And lots of other great features
Contact us for more info about the beta for these features:
bit.ly/sanity-beta
This isn’t perfect of course - though still very useful
● Only text content
● Useful indication only
● Great at scale
Creating a Page Quality algorithm
This is much more difficult!
It measures how well a page achieves its purpose
This is about quality of content, independent of search queries
So features can relate to a large number of areas!
‘Main Content’ vs ‘Supplementary Content’: Amount of Main Content, Position of Main Content
Website background information: Depth of ‘about’ info, Wikipedia presence
And you have to work out how to measure them
Amount of Main Content: Length of Main Content area, Number of words in Main Content
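As one example of turning a feature into a number, the word count of the Main Content could be measured as below. This sketch makes a big assumption: that the Main Content sits inside a `<main>` element, which many pages will not honour. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MainContentWordCounter(HTMLParser):
    """Count words that appear inside the main-content element."""

    def __init__(self, main_tag: str = "main") -> None:
        super().__init__()
        self.main_tag = main_tag  # assumption: this tag wraps the Main Content
        self.depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.main_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.main_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.words += len(data.split())


def count_main_content_words(html: str) -> int:
    counter = MainContentWordCounter()
    counter.feed(html)
    counter.close()
    return counter.words


page = (
    "<body><nav>Home About</nav>"
    "<main><h1>Car insurance</h1><p>Compare quotes in minutes.</p></main>"
    "<footer>Legal</footer></body>"
)
print(count_main_content_words(page))
```

Measuring the visual length of the Main Content area would need a rendered page, so it is not covered here.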
It becomes a huge multivariate challenge
Page     Length of MC area   ‘About us’ word count   Clicks to ‘About us’
Page 1   17cm                500                     2
Page 2   20cm                300                     1
Page 3   15cm                1000                    2
Page 4   25cm                750                     3
Then we need to find features that best separate the groups
[Chart: axes ‘Number of words in ‘About’ section’ and ‘Length of ‘Main Content’ area’; groups: High quality, Low quality]
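A simple way to get a feel for which single features separate the groups is a Fisher-style score: the squared gap between the group means divided by the spread within the groups. The sketch below is an illustration only, not the method used in the talk, and the page values and labels are made-up toy data.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(high: Sequence[float], low: Sequence[float]) -> float:
    """(difference in group means)^2 / (sum of within-group variances)."""
    spread = pvariance(high) + pvariance(low)
    if spread == 0:
        return float("inf") if mean(high) != mean(low) else 0.0
    return (mean(high) - mean(low)) ** 2 / spread


def rank_features(pages: List[dict], features: List[str]) -> Dict[str, float]:
    """Score each feature and return them best-separating first."""
    scores = {}
    for feature in features:
        high = [p[feature] for p in pages if p["label"] == "high"]
        low = [p[feature] for p in pages if p["label"] == "low"]
        scores[feature] = fisher_score(high, low)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy data: hypothetical values and labels, for illustration only.
toy_pages = [
    {"about_words": 900, "mc_length_cm": 20, "label": "high"},
    {"about_words": 1100, "mc_length_cm": 16, "label": "high"},
    {"about_words": 300, "mc_length_cm": 18, "label": "low"},
    {"about_words": 500, "mc_length_cm": 22, "label": "low"},
]
ranking = rank_features(toy_pages, ["about_words", "mc_length_cm"])
print(ranking)
```

A score like this looks at one feature at a time, so it will miss features that only separate the groups in combination.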
But with a large number of features!
This can be explored with a number of potential models
● Linear Discriminant Analysis
● Random Forest
● Neural Network
This is a huge challenge!
● Which features?
● How to measure them?
● Which model?
The work is ongoing here!
Let’s sum up
Google likely uses its raters to gather labelled data on content quality
It will then likely use that to find features of ‘good’ and ‘bad’ content
And create algorithms to distinguish between the two
You can do the same!
Get your own labelled content and create your own scoring algorithms
We have created a ‘Needs Met’ score within Sanity Studio
So that you can get an indication of content calibre directly in your publishing workflow
Contact us to get more info about the beta here:
bit.ly/sanity-beta
Richard Lawrence
Principal SEO at Sanity.io

Creating Search Quality Algorithms - Richard Lawrence - BrightonSEO.pdf

  • 1. How to create your own search quality evaluation algorithms Richard Lawrence Sanity.io @richlawre
  • 2. @richlawre ● Principal SEO at Sanity Who the hell is this guy anyway?
  • 3. Who the hell is this guy anyway? @richlawre ● Sanity is a headless CMS and more!
  • 4. @richlawre ● Doing a Data Science degree in my spare time Who the hell is this guy anyway?
  • 6. The ‘helpful content update’ might have been a bit of a damp squib… @richlawre
  • 7. …but Google is always working towards ranking helpful content more highly @richlawre
  • 8. So wouldn’t it be great to know if your content is helping your audience - at scale? @richlawre
  • 9. The search rater guidelines hold the key @richlawre 167 page document that says what good looks like!
  • 10. Google says it doesn’t directly use the ratings in its ranking algorithms “We use responses from Raters to evaluate changes, but they don’t directly impact how our search results are ranked.” bit.ly/ratings-answer @richlawre
  • 11. But it will use the rated content to help find features of what ‘good’ looks like @richlawre
  • 12. Similar methods have been used for years in various areas - like counterfeit notes @richlawre
  • 13. Features are found that best separate authentic and counterfeit notes Distance between edge & watermark Width of shaded area Counterfeit Authentic @richlawre
  • 14. Features for high vs. low quality content will likely be more complex @richlawre
  • 15. Bing confirmed this is how it works in 2019 bit.ly/bing-confirmation @richlawre
  • 16. With 90% of its algorithms being ML based @richlawre bit.ly/bing-features
  • 17. Plus it revealed its process @richlawre bit.ly/bing-process
  • 18. So how can we harness this as an industry? @richlawre
  • 19. We can try to create our own! @richlawre
  • 20. 1. Label the content 2. Create a ‘Needs Met’ algorithm 3. Create a ‘Page Quality’ algorithm What we need to do @richlawre
  • 22. Get a representative sample of searches 448 million search queries bit.ly/448-million @richlawre
  • 23. Here’s how to play around with the file @richlawre bit.ly/large-file
  • 24. Then gather the top 20 rankings for each sample query Likely available feature of your favourite rank tracking software @richlawre
  • 25. Use some search raters to rate the content Collect labels Choose provider Create guidelines Must not be identical to Google’s… Needs Met & Page Quality 2 search raters with 3rd called in for disagreements @richlawre
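The two-rater process on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's tooling: the function name `label_pages`, the `tie_breaker` callback and the assumption that the third rater's label is final are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def label_pages(
    ratings: Dict[str, Tuple[str, str]],
    tie_breaker: Callable[[str], str],
) -> Tuple[Dict[str, str], float]:
    """Combine two raters' labels per URL into one final label.

    ratings maps a URL to (rater A's label, rater B's label).
    tie_breaker stands in for the third rater and is only called
    when the first two disagree (assumption: its label is final).
    Returns the final labels and the share of URLs the first two
    raters agreed on.
    """
    labels: Dict[str, str] = {}
    disagreements = 0
    for url, (label_a, label_b) in ratings.items():
        if label_a == label_b:
            labels[url] = label_a
        else:
            disagreements += 1
            labels[url] = tie_breaker(url)
    agreement_rate = 1 - disagreements / len(ratings) if ratings else 0.0
    return labels, agreement_rate
```

Tracking the agreement rate is worth the extra line: if the first two raters disagree often, the guidelines probably need tightening before the labels are used for training.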
  • 26. Creating a Needs Met algorithm @richlawre
  • 27. This measures fulfilling search intent Features will mainly be relating to relevance and structure @richlawre
  • 28. GPT language models are perfect for this The open source option @richlawre
  • 29. GPT-3 became cheaper in September too @richlawre
  • 30. We need to create a pattern for GPT-J to learn Content: <h1>Compare car insurance quotes</h1> <p>It's quick and easy to compare car insurance and find cheaper cover – we just need a few details about you and your vehicle.</p> Target query: car insurance Needs Met rating: Good @richlawre
  • 31. It will then rate new content Content: <h1>Car insurance</h1> <p>From theft to write-offs and even lost keys, you'll be covered with us. Here's what you'll like about our comprehensive cover </p> Target query: car insurance Needs Met rating: ????? @richlawre
  • 32. We need to scrape content from each page to give to the language model - with the rating @richlawre
  • 33. Then use this info to train GPT-J @richlawre bit.ly/finetune-gptj
  • 34. You can also use existing services @richlawre NLP Cloud Forefront.ai
  • 35. NLP Cloud also became cheaper! @richlawre
  • 36. Validate performance with a test set @richlawre
  • 37. Judge performance with a Confusion Matrix @richlawre
        Actual \ Prediction | Correct | Wrong
        Correct | True positive | False negative
        Wrong | False positive | True negative
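As a rough sketch of the confusion matrix idea, the counts can be tallied from a held-out test set with the standard library alone. Needs Met ratings have several levels, so this example assumes they are collapsed to a binary "positive label vs everything else" view. That simplification, and the names `confusion_counts` and `precision_recall`, are illustrative assumptions rather than anything from the talk.

```python
from collections import Counter
from typing import Sequence, Tuple


def confusion_counts(
    actual: Sequence[str], predicted: Sequence[str], positive: str = "Good"
) -> Counter:
    """Count true/false positives and negatives for one positive label."""
    counts: Counter = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["tp"] += 1  # rated positive, predicted positive
        elif a == positive:
            counts["fn"] += 1  # rated positive, model missed it
        elif p == positive:
            counts["fp"] += 1  # rated negative, model said positive
        else:
            counts["tn"] += 1  # rated negative, predicted negative
    return counts


def precision_recall(counts: Counter) -> Tuple[float, float]:
    """Derive precision and recall, returning 0.0 where undefined."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall
```

Here `actual` would hold the human raters' labels for the test set and `predicted` the model's ratings for the same pages, in the same order.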
  • 38. Few shot learning can help improve performance @richlawre Prompt Example 1 Rating: Excellent Example 2 Rating: Poor Example 3 Rating: ???? GPT-J Good
  • 39. As can explaining to the model what it needs to do! @richlawre Consider the content to rate. Rate it according to how well it fits the search query.
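Putting the pattern from the earlier slides together with the few-shot and instruction ideas, a prompt could be assembled along these lines. This is only a sketch: the helper names `format_example` and `build_prompt` are hypothetical, the exact layout a given model responds to best is something to test, and no model call is shown.

```python
from typing import Iterable, Tuple

# Instruction text taken from the slide (grammar tidied).
INSTRUCTION = (
    "Consider the content to rate. "
    "Rate it according to how well it fits the search query."
)


def format_example(content: str, query: str, rating: str = "") -> str:
    """Render one Content / Target query / Needs Met rating block.

    Leaving rating empty produces the open-ended block that the
    language model is expected to complete.
    """
    block = (
        f"Content: {content}\n"
        f"Target query: {query}\n"
        f"Needs Met rating: {rating}"
    )
    return block.rstrip()


def build_prompt(
    examples: Iterable[Tuple[str, str, str]], content: str, query: str
) -> str:
    """Join the instruction, rated examples and the unrated page."""
    parts = [INSTRUCTION]
    parts += [format_example(c, q, r) for c, q, r in examples]
    parts.append(format_example(content, query))
    return "\n\n".join(parts)
```

The resulting string ends with an unfilled `Needs Met rating:` line, so whatever the model generates next is read as its rating for the new page.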
  • 40. We’ve done this for you within Sanity Studio @richlawre
  • 41. And lots of other great features @richlawre
  • 42. Contact us for more info about the beta for these features: bit.ly/sanity-beta @richlawre
  • 43. This isn’t perfect of course - though still very useful @richlawre ● Only text content ● Useful indication only ● Great at scale
  • 44. Creating a Page Quality algorithm @richlawre
  • 45. This is much more difficult! @richlawre
  • 46. It measures how well a page achieves its purpose @richlawre This is about quality of content, independent of search queries
  • 47. So features can relate to a large number of areas! @richlawre ‘Main Content’ vs ‘Supplementary Content’ Website background information Amount of Main Content Position of Main Content Depth of ‘about’ info Wikipedia presence
  • 48. And you have to work out how to measure them @richlawre Amount of Main Content Length of Main Content area Number of words in Main Content
  • 49. It becomes a huge multivariate challenge @richlawre
        Page | Length of MC area | ‘About us’ word count | Clicks to ‘About us’
        Page 1 | 17cm | 500 | 2
        Page 2 | 20cm | 300 | 1
        Page 3 | 15cm | 1000 | 2
        Page 4 | 25cm | 750 | 3
  • 50. Then we need to find features that best separate the groups Number of words in ‘About’ section Length of ‘Main Content’ area High quality Low quality @richlawre
  • 51. But with a large number of features! @richlawre
  • 52. This can be explored with a number of potential models @richlawre Linear Discriminant Analysis
  • 53. @richlawre This can be explored with a number of potential models Random Forest
  • 54. @richlawre This can be explored with a number of potential models Neural Network
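Before reaching for any of these models, a quick way to see which single features separate the groups is a Fisher score: the squared gap between the group means divided by the summed group variances. It is the one-feature version of the idea behind Linear Discriminant Analysis. The sketch below is an illustration under assumptions: the function names and the dictionary layout (`label` holding `"high"` or `"low"`) are hypothetical, and it ignores interactions between features, which is where the models above come in.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def fisher_score(high: Sequence[float], low: Sequence[float]) -> float:
    """Squared difference of group means over summed group variances."""
    spread = pvariance(high) + pvariance(low)
    gap = (mean(high) - mean(low)) ** 2
    if spread == 0:
        # No within-group variation: perfectly separated or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(
    pages: List[Dict], features: Sequence[str]
) -> List[Tuple[str, float]]:
    """Score each feature and return them best-separating first.

    Each page dict needs a "label" of "high" or "low" plus one
    numeric value per feature name.
    """
    scores = []
    for name in features:
        high = [p[name] for p in pages if p["label"] == "high"]
        low = [p[name] for p in pages if p["label"] == "low"]
        scores.append((name, fisher_score(high, low)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A feature with a high score is a good candidate to keep; features near zero add little on their own, though they may still matter in combination.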
  • 55. This is a huge challenge! @richlawre
  • 57. How to measure them? @richlawre
  • 59. The work is ongoing here! @richlawre
  • 61. Google likely uses its raters to gather labelled data on content quality @richlawre
  • 62. It will then likely use that to find features of ‘good’ and ‘bad’ content @richlawre
  • 63. And creates algorithms to distinguish between the two @richlawre
  • 64. You can do the same! @richlawre
  • 65. Get your own labelled content and create your own scoring algorithms @richlawre
  • 66. We have created a ‘Needs Met’ score within Sanity Studio @richlawre
  • 67. So that you can get an indication of content calibre directly in your publishing workflow @richlawre
  • 68. Contact us to get more info about the beta here: bit.ly/sanity-beta @richlawre
  • 69. Richard Lawrence Principal at Sanity.io @richlawre