In this session, we discuss a site with large-scale content quality issues caused by user-generated content (UGC) and the process for developing and implementing a solution. Facing the need to review over 5 million pages, we had to develop a process that didn't require manual review, didn't break site functionality that relied on these pages, and didn't sacrifice valuable traffic. This session walks you through the creative brainstorming, and plenty of trial and error, that ultimately led to a novel and scalable fix that could be implemented quickly. We then look at how AI is changing the way we approach these types of problems.
5. #family #techseo #education #puzzles #music #food
Co-Founder at Hive Digital, an award-winning agency amplifying the impact of companies changing the world
Jake Bohall
@jakebohall
Specialties:
- Quickly identifying marketing inefficiencies
- Developing/Improving processes for in-house teams
- Solving complex SEO issues that require cross-department coordination
- Learning from mistakes…
10. Search Engines Are “People” Too
Awareness
Interest
Desire
Action
1
3 takeaways from this session:
11. Complex Problems Don’t Require Complex Solutions
3 takeaways from this session:
2
12. Use AI to Assist with Content Validation
3
3 takeaways from this session:
13. @jakebohall
Hierarchy of SEO
● Crawling / Discovery: Technical SEO - Ensure search engines can understand your content and index it for users to see
● Content / Relevance: Content SEO - Create the link between search intent and conversion through targeted messaging
● Experience / Vitals: UX SEO - Ensure a safe, stable, and frustration-free interaction with website content
● Links / Authority: Offsite SEO - Establish brand dominance for core-to-business topic visibility
14. Search Engines are People Too
Awareness
Interest
Desire
Action
1
17. @jakebohall
AIDA Model Applied to Search Engines
● Awareness (Discovered / Crawled): Ensure search engines can find your content
● Interest (Presented / Indexed): Create content that is unique and brings value
● Desire (Recommended / Ranked): Users are searching for what you offer, content meets intent, has authority
● Action (Clicked): Titles & Descriptions are compelling to users
28. @jakebohall
Are You This Person?
1. Tons of Pages
2. Garbage Content
3. Keyword Variants in the URL
29. Challenge: User Generated Content Site
● Users can upload their images to the site
● Users tag the images.
● There is no moderation on tags.
● Tag pages drive the largest amount of traffic
31. Hotfix: Triage Step 1
● Prep: Update DB to create canonical and noindex flags
● Run all tags through CPC tools - noindex all tags with a $0 CPC
● Used Aspell to find mappings of one term to another for simple misspelling issues
Only 150k tags impacted :(
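The triage step can be sketched roughly as follows. This is a minimal illustration, not the code used on the project: the `Tag` fields, the `triage` function, and the 0.85 cutoff are hypothetical, and Python's standard-library `difflib` stands in for Aspell's spelling suggestions.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    text: str
    cpc: float                       # cost-per-click from a keyword tool (assumed field)
    noindex: bool = False
    canonical: Optional[str] = None  # corrected tag this one maps to, if any


def triage(tags, dictionary, cutoff=0.85):
    """Flag $0 CPC tags as noindex and map near-miss spellings to known words."""
    for tag in tags:
        if tag.cpc == 0:
            tag.noindex = True
        if tag.text not in dictionary:
            # Stand-in for Aspell: take the closest dictionary word, if close enough.
            match = difflib.get_close_matches(tag.text, dictionary, n=1, cutoff=cutoff)
            if match:
                tag.canonical = match[0]
    return tags


tags = [Tag("sunset", 1.20), Tag("sunsett", 0.0), Tag("zzqx", 0.0)]
for t in triage(tags, dictionary=["sunset", "mountain", "ocean"]):
    print(t.text, t.noindex, t.canonical)
```

A simple spelling pass like this only catches near-miss variants of known words, which is consistent with the limited number of tags the hotfix reached.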
32. What can we use?
● Predicted Traffic Volume (keyword+product)
● Unique Visitors -- Last year
● Tag Count -- Weighting
● Porter Stemming -- Root Words
● Lemmatization -- twinword.com/api/lemmatizer.php
● Jaro-Winkler Scores -- Edit distance
● Keyword Planner Grouping -- Same data
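As an illustration of one of these signals, here is a small pure-Python sketch of the Jaro-Winkler score. The function names and the 0.1 prefix scale (the value commonly used) are choices made for this example; the deck does not say which implementation was used.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = 0
    transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Because the score rewards shared prefixes, it tends to rate tag variants that differ near the end of the word (plurals, trailing typos) as closer than variants that differ at the start.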
37. Create Buckets for Action
● Safe - Do nothing
● Map to Something Else - Redirect
● Delete
Create a heuristic for each bucket…
38. Impact:
This group (491k) represented 5.3m of actual traffic in the past year
Criteria:
- Matches Wikipedia
- Has Traffic
- Has CPC Value
- Is Centroid
Action:
- These terms should be linked directly to their results page
- No change in usage
39. Impact:
This group (254k) represented 145k of actual traffic in the past year
Criteria:
- Almost a good tag
- High volume, but not a centroid or Wikipedia destination
Action:
- Tags will remain in the database; any internal links will display the tag text as anchor text
- The internal link will point to the corrected tag ID
- Results page canonical to corrected ID
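The action for this group could be rendered along these lines. This is a sketch under assumptions: the `/tags/<id>` URL pattern, the `example.com` base URL, and the function names are hypothetical and are not taken from the site in question.

```python
from html import escape
from typing import Optional

BASE_URL = "https://example.com"  # hypothetical site root


def resolve(tag_id: int, corrected_id: Optional[int]) -> int:
    """Use the corrected tag ID when one exists, else the tag's own ID."""
    return corrected_id if corrected_id is not None else tag_id


def tag_link(text: str, tag_id: int, corrected_id: Optional[int] = None) -> str:
    """Internal link: keeps the user's tag text but points at the corrected tag."""
    return f'<a href="/tags/{resolve(tag_id, corrected_id)}">{escape(text)}</a>'


def canonical_tag(tag_id: int, corrected_id: Optional[int] = None) -> str:
    """Canonical element for the results page of a tag."""
    target = resolve(tag_id, corrected_id)
    return f'<link rel="canonical" href="{BASE_URL}/tags/{target}">'


print(tag_link("sunsetts", tag_id=42, corrected_id=7))
print(canonical_tag(tag_id=42, corrected_id=7))
```

The user-facing text stays as the uploader wrote it, while both the link target and the canonical point at the corrected tag.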
40. Impact:
This group (384k) represented 68k of actual traffic in the past year
Criteria:
- No match to Wikipedia or vector
- Traffic volume without product volume
- Frequent usage
Action:
- These tags should be left in place for the artwork
- Print onto the artwork page
- Do not link to or create any search results/tags pages
41. Impact:
This group (114k) provided 150 tracked visitors in the past year
Criteria:
- No volume
- Wikipedia match or Vector match is a good tag
Action:
- These tags need to be replaced on each piece of artwork with the corrected tag
- Original tag removed from all artwork/circulation
- Tag should not be displayed anywhere.
42. Impact:
This group (76k) did not provide any tracked traffic in the past year
Criteria:
- No traffic
- Has CPC value
- Positive Indicators
Action:
These tags should not display, but are not considered bad...
Essentially content-only tags, but these have less confidence
43. Impact:
This group (3.93m) did not provide any tracked traffic in the past year
Criteria:
- No traffic
- No mappings
- Low volume usage
Action:
These tags simply need to be destroyed.
They represent unmappable tags without traffic potential
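The six groups on slides 38 to 43 can be read as one decision procedure. The sketch below is a simplified interpretation, not the heuristic actually used: the bucket names, the `TagSignals` fields, the order of the checks, and the `frequent_usage` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bucket(Enum):
    KEEP = "keep and link"                   # slide 38
    REDIRECT = "link to corrected tag"       # slide 39
    DISPLAY_ONLY = "display, no page"        # slide 40
    REPLACE = "replace with corrected tag"   # slide 41
    HIDE = "keep hidden"                     # slide 42
    DELETE = "delete"                        # slide 43


@dataclass
class TagSignals:
    matches_wikipedia: bool
    is_centroid: bool
    visitors: int                    # tracked visitors in the past year
    cpc: float
    usage_count: int                 # how many items carry this tag
    mapped_to: Optional[str] = None  # corrected tag, if a mapping was found


def classify(t: TagSignals, frequent_usage: int = 50) -> Bucket:
    """Assign a tag to one action bucket (assumed ordering of checks)."""
    if t.matches_wikipedia and t.is_centroid and t.visitors > 0 and t.cpc > 0:
        return Bucket.KEEP
    if t.mapped_to is not None:
        # A mapping exists: keep the page alive only if it earns traffic.
        return Bucket.REDIRECT if t.visitors > 0 else Bucket.REPLACE
    if t.visitors > 0 and t.usage_count >= frequent_usage:
        return Bucket.DISPLAY_ONLY
    if t.cpc > 0:
        return Bucket.HIDE
    return Bucket.DELETE


print(classify(TagSignals(False, False, 0, 0.0, 1)).value)  # delete
```

Checking the strongest positive signals first means a tag only falls through to deletion when it has no traffic, no mapping, and no commercial value.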
45. @jakebohall
SEO Win - Improve Crawl Efficiency
With fewer pages to traverse, full crawls cycle faster, decreasing time to discovery
Image Credit | Verisign Blog
Eliminate misleading/confusing content
46. @jakebohall
Percentage of URLs crawled at each page level improves.
Improvement in crawl budget has allowed Google to crawl deeper into our site.
Before After
47. @jakebohall
SEO Win - Consolidate Authority
Fewer pages means stronger weighting on remaining pages and decreased keyword cannibalization
Image Credit | Botify
Eliminate misleading/confusing content
48. @jakebohall
The increase in the number of active URLs on Google indicates that more of our pages are receiving traffic than before. This supports our efforts toward increasing authority and greater trust across the site as a whole.
After
Before
66. The Hive Digital Mission
Hive Digital empowers globally responsible companies, organizations, and
individuals by amplifying their message through innovative digital marketing.
We use integrity, transparency and humility to promote a nurturing
environment, which translates to growth for our clients and our company.
We hope you’ll let us be a part of your change in the world.
HiveDigital.com @HiveMarketing 800-650-0820