Improving Content Quality at Scale
Presented by: Jake Bohall
Marketing for a Better World
@jakebohall
Why are you here?
This talk is for you!
Why am I here?
Jake Bohall
Co-Founder at Hive Digital, an award-winning agency amplifying the impact of companies changing the world
#family #techseo #education #puzzles #music #food
Specialties:
- Quickly identifying marketing inefficiencies
- Developing/improving processes for in-house teams
- Solving complex SEO issues that require cross-department coordination
- Learning from mistakes…
DISCLAIMER…
I’m Not That Technical
The SkAI is falling!
Image credit: https://www.craiyon.com/
If your product/service/site/etc. sucks, ranking #1 doesn’t fix it.
You just disappoint more people.
Image Credit | https://brianaripley.com/
Prevention
Cure
3 takeaways from this session:
1. Search Engines Are “People” Too (Awareness, Interest, Desire, Action)
2. Complex Problems Don’t Require Complex Solutions
3. Use AI to Assist with Content Validation
Hierarchy of SEO
Crawling / Discovery
Technical SEO - Ensure search engines can understand your content and index it for users to see
Content / Relevance
Content SEO - Create the link between search intent and conversion through targeted messaging
Experience / Vitals
UX SEO - Ensure a safe, stable, and frustration-free interaction with website content
Links / Authority
Offsite SEO - Establish brand dominance for core-to-business topic visibility
1. Search Engines Are People Too
Awareness, Interest, Desire, Action
Googlebot as a User
AIDA Model Applied to Search Engines
Awareness - Discovered / Crawled
Ensure search engines can find your content
Interest - Presented / Indexed
Create content that is unique and brings value
Desire - Recommended / Ranked
Users are searching for what you offer, content meets intent, has authority
Action - Clicked
Titles & Descriptions are compelling to users
Typical Content Audit Discoveries
1. Old Seasonal Content - /best-x-2015
2. Spam - /get-your-gucci, /buy-me, /buy-me-2
3. Outdated Info - /v1-pricing, /submit-directories
4. Brokenness - /broke-template, /broke-embed
5. Management - Forces inventory and guides future
6. Opportunity - /calculator, /data
2. Complex Problems Don’t Require Complex Solutions
Do a “Conversion Influence” Audit!
Analytics segment - pages with conversion in session
Quickly identify pages with more/less influence on conversions… and # of visitors impacted
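A rough sketch of what a "conversion influence" report could look like outside an analytics UI. It assumes session data has been exported as a list of records; the field names (`visitor`, `pages`, `converted`) and the function name are hypothetical, not part of any analytics product.

```python
from collections import defaultdict


def conversion_influence(sessions):
    """Per page: sessions that included it, share of those sessions
    that converted, and number of distinct visitors touched."""
    stats = defaultdict(lambda: {"sessions": 0, "converting": 0, "visitors": set()})
    for session in sessions:
        # Count each page once per session, even if it was viewed twice.
        for page in set(session["pages"]):
            row = stats[page]
            row["sessions"] += 1
            row["converting"] += int(session["converted"])
            row["visitors"].add(session["visitor"])

    report = [
        {
            "page": page,
            "sessions": row["sessions"],
            "visitors": len(row["visitors"]),
            "conversion_rate": row["converting"] / row["sessions"],
        }
        for page, row in stats.items()
    ]
    # Highest influence first; ties broken by reach, then by URL.
    report.sort(key=lambda r: (-r["conversion_rate"], -r["visitors"], r["page"]))
    return report
```

Pages at the bottom of the report with very few visitors are candidates for a closer look, not automatic deletions.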
Can We Delete Content?
Are You This Person?
1. Tons of pages
2. Garbage content
3. Keyword variants in the URL
Challenge: User Generated Content Site
● Users can upload their images to the site
● Users tag the images.
● There is no moderation on tags.
● Tag pages drive the largest amount of traffic
5,000,000 Pages
sail boat, sailboot, sailboats, boat sails, boat sailing, sailing boats, sail boats, sail boating, sailboating, sail-boats, sail-boating, sail boat, a sail boat, 2 sail boats, 1 sail boat, sail boat ocean ….
Hotfix: Triage Step 1
● Prep: Update DB to create canonical and noindex flags
● Run all tags through CPC tools - noindex all tags with $0 CPC
● Used Aspell to find mappings of one term to another for simple misspelling issues
Only 150k tags impacted :(
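The triage step can be sketched roughly as below. This is not the Aspell workflow from the project: Python's standard `difflib.get_close_matches` stands in for a spell checker, and the function name, the `cutoff` value, and the input shapes are assumptions for illustration.

```python
from difflib import get_close_matches


def triage_tags(tags, cpc_by_tag, dictionary, cutoff=0.85):
    """Return {tag: {"noindex": bool, "canonical": str or None}}.

    noindex   - True when the tag has no CPC value ($0).
    canonical - closest known term for a likely misspelling, else None.
    """
    known = set(dictionary)
    flags = {}
    for tag in tags:
        canonical = None
        if tag not in known:
            # Stand-in for Aspell: best fuzzy match above the cutoff.
            matches = get_close_matches(tag, dictionary, n=1, cutoff=cutoff)
            canonical = matches[0] if matches else None
        flags[tag] = {
            "noindex": cpc_by_tag.get(tag, 0.0) == 0.0,
            "canonical": canonical,
        }
    return flags
```

Simple spelling fixes only catch near-identical strings, which is consistent with the small share of tags this step touched.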
What can we use?
● Predicted Traffic Volume (keyword+product)
● Unique Visitors -- Last year
● Tag Count -- Weighting
● Porter Stemming -- Root Words
● Lemmatization -- twinword.com/api/lemmatizer.php
● Jaro-Winkler Scores -- Edit distance
● Keyword Planner Grouping -- Same data
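As an illustration of one signal from the list above, here is a minimal pure-Python Jaro-Winkler sketch. Jaro-Winkler is a similarity score between 0 and 1 (higher means closer), with a bonus for a shared prefix of up to four characters. The function names are illustrative, and how the scores were thresholded in the project is not stated in the slides.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a common prefix (max 4 characters)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)
```

Tags such as "sailboat" and "sailboot" score close to 1, while unrelated tags score much lower.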
Jaccard Indexing -- Product overlap
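A small sketch of the Jaccard index applied to product overlap: two tags that are attached to mostly the same items are likely variants of each other. The data shape (a dict of tag to set of product IDs), the function names, and the 0.5 threshold are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_tags(products_by_tag, threshold=0.5):
    """Tag pairs whose product sets overlap at or above the threshold."""
    pairs = []
    for t1, t2 in combinations(sorted(products_by_tag), 2):
        score = jaccard(products_by_tag[t1], products_by_tag[t2])
        if score >= threshold:
            pairs.append((t1, t2, score))
    # Strongest overlap first.
    return sorted(pairs, key=lambda pair: -pair[2])
```

Comparing every pair is quadratic in the number of tags, so at millions of tags this would need to be limited to candidate pairs (for example, tags that already share a stem).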
K-means Clustering -- Visual groupings
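For readers unfamiliar with k-means, a minimal pure-Python sketch follows. The slides do not say which features were clustered, so the numeric points here are hypothetical feature vectors per tag, and the deterministic start (first `k` points as initial centroids) is a simplification chosen to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic k-means. Returns (centroids, assignments).

    points      - list of equal-length numeric tuples
    assignments - cluster index for each point, in input order
    """
    centroids = [tuple(p) for p in points[:k]]  # simplistic deterministic init
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments
```

In practice a library implementation with better initialization would be the usual choice; the point here is only to show what the groupings are built from.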
Wikipedia Searches -- Redirects & Disambiguation
Create Buckets for Action
● Safe - Do nothing
● Map to Something Else - Redirect
● Delete
Create a heuristic for each bucket…
Impact:
This group (491k) represented 5.3m of actual traffic in the past year
Criteria:
- Matches Wikipedia
- Has traffic
- Has CPC value
- Is centroid
Action:
- These terms should be linked directly to their results page
- No change in usage
Impact:
This group (254k) represented 145k of actual traffic in the past year
Criteria:
- Almost a good tag
- High volume, but not a centroid or wiki destination
Action:
- Tags will remain in the database; any internal links will display the tag text as anchor
- The internal link will point to the corrected tag ID
- Results page canonical to corrected ID
Impact:
This group (384k) represented 68k of actual traffic in the past year
Criteria:
- No match to Wikipedia or vector
- Traffic volume without product volume
- Frequent usage
Action:
- These tags should be left in place for the artwork
- Print onto the artwork page
- Do not link to or create any search results/tags pages
Impact:
This group (114k) provided 150 tracked visitors in the past year
Criteria:
- No volume
- Wikipedia match or vector match is a good tag
Action:
- These tags need to be replaced on each piece of artwork with the corrected tag
- Original tag removed from all artwork/circulation
- Tag should not be displayed anywhere
Impact:
This group (76k) did not provide any tracked traffic in the past year
Criteria:
- No traffic
- Has CPC value
- Positive indicators
Action:
- These tags should not display, but are not considered bad...
- Essentially content-only tags, but with less confidence
Impact:
This group (3.93m) did not provide any tracked traffic in the past year
Criteria:
- No traffic
- No mappings
- Low volume usage
Action:
- These tags simply need to be destroyed
- They represent unmappable tags without traffic potential
Get more details:
https://moz.com/blog/tag-sprawl
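The six groups above can be read as one ordered heuristic. The sketch below is an interpretation, not the project's actual rules: the bucket names, the `Tag` fields, the `FREQUENT_USAGE` threshold, and the handling of edge cases the slides do not cover are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

FREQUENT_USAGE = 50  # hypothetical cut-off for "frequent usage"


@dataclass
class Tag:
    name: str
    visits_last_year: int = 0
    cpc: float = 0.0
    usage_count: int = 0                  # how many items carry the tag
    is_centroid: bool = False
    matches_wikipedia: bool = False
    corrected_tag: Optional[str] = None   # better tag this one maps to, if any


def bucket(tag):
    """Assign a tag to one of six action buckets, checked in order."""
    has_traffic = tag.visits_last_year > 0
    if tag.matches_wikipedia and has_traffic and tag.cpc > 0 and tag.is_centroid:
        return "keep"          # link directly to its results page
    if tag.corrected_tag and has_traffic:
        return "canonicalize"  # keep the text, point link and canonical at the corrected tag
    if tag.corrected_tag:
        return "replace"       # swap in the corrected tag, remove the original
    if has_traffic and tag.usage_count >= FREQUENT_USAGE:
        return "display_only"  # print on the item page, no tag page
    if tag.cpc > 0 or has_traffic:
        return "hide"          # assumption: any leftover positive signal means hide, not delete
    return "delete"            # no traffic, no mapping, no value
```

Ordering matters: the safest outcomes are checked first, so a tag is only deleted when nothing argues for keeping it.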
Eliminate misleading/confusing content
SEO Win - Improve Crawl Efficiency
With fewer pages to traverse, full crawls cycle faster, decreasing time to discovery
Image Credit | Verisign Blog
Percentage of URLs crawled at each page level improves.
Improvement in crawl budget has allowed Google to crawl deeper into our site.
[Charts: Before, After]
SEO Win - Consolidate Authority
Fewer pages means stronger weighting on remaining pages and decreased keyword cannibalization
Image Credit | Botify
Increase in the number of active URLs on Google indicates that a larger number of our pages are receiving traffic than before. This supports our efforts toward increasing authority and greater trust across the site as a whole.
[Charts: Before, After]
SEO Win - Minimize Risk from Algorithm Updates
Removing low-quality content improves overall site quality metrics + experience
3. You Can Use AI to Assist with Content Validation
Don’t Forget
1. Search Engines Are “People” Too (Awareness, Interest, Desire, Action)
2. Complex Problems Don’t Require Complex Solutions
3. You Can Use AI to Assist with Content Validation
How can we help you?
The Hive Digital Mission
Hive Digital empowers globally responsible companies, organizations, and
individuals by amplifying their message through innovative digital marketing.
We use integrity, transparency and humility to promote a nurturing
environment, which translates to growth for our clients and our company.
We hope you’ll let us be a part of your change in the world.
HiveDigital.com @HiveMarketing 800-650-0820

Improving Content Quality at Scale and with AI - Pubcon Austin 2023
