How to Use Your Products to Subcategorise Your Website with Python
Lee Foot | Search Solved
@LeeFootSEO
@LeeFootSEO | #BrightonSEO
About Me
Ten years' experience as a technical SEO.
Founded SearchSolved.co.uk three years ago, which focuses on eCommerce and enterprise SEO.
Last year we won the Drum Marketing Awards in the Retail and eCommerce category for search.
Learning Python for 12 months.
What is Python?
Python is a high-level programming language which is perfect for automating repetitive tasks.
Very popular in the data science community.
Becoming very popular with technical SEOs, especially for data blending and automation.
Agenda for Today
The benefits of subcategorisation
What you’ll need
The process
The script output
Limitations
Benefits of Subcategorisation
Subcategorisation is one of the most effective ways to win more traffic from search engines, yet it is often underutilised or not used to full effect.
SOFAS
This sofa category contains many types of sofas listed in a single category.
LEATHER SOFAS | VELVET SOFAS | LOUNGE SOFAS
Grouping each product type into subcategories would better align them to search demand.
Creating three new subcategories would target an additional 21,000+ searches a month* (+19,000, +1,200 and +150, as labelled on the slide).
*Source: ahrefs.com
This method can produce a lot of additional traffic for any eCommerce site.
It’s great for users too!
There is a problem though...
Current methods to find this opportunity are slow, manual and labour-intensive.
It usually involves using keyword data to eyeball the opportunity.
Low Hanging Fruit
That approach is enough to catch the obvious opportunities, but it leaves a lot on the table and doesn’t scale.
Sometimes keyword data can suggest an opportunity when there aren’t enough products to support a new subcategory.
We realised there must be a better way to do this.
We wrote a Python script to automate the process and do the hard work for us!
The products suggest the categories for us!
Example product names in the SOFAS category:
Leather Buttoned Sofa
Mid Century Leather Sofa
Tetbury Leather Sofa - Black
Hardwick Leather Sofa
Tetbury Leather Sofa - Tan
By clustering the product names together, our script was able to find opportunities for new categories such as LEATHER SOFAS.
Total Opportunity – Cox & Cox
New Subcategories: 185
Search Volume: 1,400,000
In testing we ran the script on Homebase and found the opportunity to create 1,650 subcategories with over 13,000,000 estimated monthly searches.
This would take a LONG time to do manually! (Assuming you could work as efficiently as a computer!)
At the end of this talk I’m going to share this script with instructions so you can use it on your own websites.
The Mission
Automatically create new subcategories by clustering product names together.
The Method
We’ll be using Python and the NLTK library to generate hundreds of thousands of n-gram combinations from product names.
What Are N-grams?
N-grams are sequences of n adjacent words or letters.
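To make the definition concrete, here is a minimal sketch of word n-gram generation. The talk's script uses NLTK for this step; the helpers `word_ngrams` and `all_ngrams` below are hypothetical, dependency-free stand-ins that assume simple whitespace tokenisation, not the script's own code.

```python
def word_ngrams(text, n):
    """Return every run of n adjacent words in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def all_ngrams(text, min_n=1, max_n=3):
    """Return the n-grams of every length from min_n to max_n."""
    result = []
    for n in range(min_n, max_n + 1):
        result.extend(word_ngrams(text, n))
    return result


bigrams = word_ngrams("Mid Century Leather Sofa", 2)
print(bigrams)  # ['mid century', 'century leather', 'leather sofa']
```

Running `all_ngrams` over every product name on a large site is what produces the very large candidate list discussed in the talk.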
The Challenge
Using this method to generate so many n-grams will create a lot of nonsensical phrases in the process.
The goal is to keep only relevant keywords with commercial value and discard the rest.
At a high level, our solution to this problem is to check the keywords for search volume and CPC data.
If they have neither search volume nor CPC data, those keywords will be discarded before the final output.
Ideally without burning through a ton of API credits.
Examples of n-grams the script will generate from clustering product names:
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
Only one of these suggestions has commercial value.
Our goal is to programmatically discard the nonsensical ones and keep any with commercial value.
So let’s check for search volume!
aa alkaline (20)
aa alkaline batteries (80)
aa alkaline batteries command (0)
aa alkaline batteries command adjustables (0)
aa alkaline batteries command adjustables self (0)
Everything shown in red on the slide (the n-grams with zero volume) will be discarded automatically because it has no search volume.
Checking n-grams for keyword volume does a lot of the hard work, but it’s not perfect.
To deal with this we have included configurable pre- and post-filtering options, for example:
Keep Longest Word Fragment = True
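The search volume check described above can be sketched as follows. The volumes are the ones shown on the slide; the CPC values are not given in the talk, so they are set to zero here as an assumption, and `has_commercial_value` is a hypothetical helper name.

```python
# Volumes come from the slide; CPC values are placeholders (assumption).
keyword_metrics = {
    "aa alkaline": {"volume": 20, "cpc": 0.0},
    "aa alkaline batteries": {"volume": 80, "cpc": 0.0},
    "aa alkaline batteries command": {"volume": 0, "cpc": 0.0},
    "aa alkaline batteries command adjustables": {"volume": 0, "cpc": 0.0},
    "aa alkaline batteries command adjustables self": {"volume": 0, "cpc": 0.0},
}


def has_commercial_value(metrics):
    """Keep a keyword if it has either search volume or CPC data."""
    return metrics["volume"] > 0 or metrics["cpc"] > 0


kept = [kw for kw, m in keyword_metrics.items() if has_commercial_value(m)]
print(kept)  # ['aa alkaline', 'aa alkaline batteries']
```

Two keywords survive even though only one is a sensible category name, which is why further filtering is still needed.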
Available Pre-filtering Options (Saving API Credits)
Match to a Minimum # of Products
Use Search Console Data
Enabling filtering options reduces API credit spend by around 95%.
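As an illustration of the "Match to a Minimum # of Products" pre-filter, the sketch below counts how many products each n-gram appears in and keeps only those that reach a threshold, so fewer keywords are sent to the keyword API. The function names, the tokenisation and the n-gram length limit are assumptions for illustration; the product names are the sofa examples from earlier in the talk.

```python
from collections import Counter

product_names = [
    "Leather Buttoned Sofa",
    "Mid Century Leather Sofa",
    "Tetbury Leather Sofa - Black",
    "Hardwick Leather Sofa",
    "Tetbury Leather Sofa - Tan",
]


def ngrams_in(name, max_n=3):
    """Return the set of 1..max_n word n-grams found in one product name."""
    # Drop punctuation-only tokens such as the "-" before a colour.
    words = [w for w in name.lower().split() if w.isalnum()]
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def prefilter(names, min_products=3):
    """Keep n-grams that occur in at least min_products product names."""
    counts = Counter()
    for name in names:
        counts.update(ngrams_in(name))  # a set, so each product counts once
    return {gram: c for gram, c in counts.items() if c >= min_products}


candidates = prefilter(product_names)
print(sorted(candidates.items()))
# [('leather', 5), ('leather sofa', 4), ('sofa', 5)]
```

Only the surviving candidates would then need a search volume lookup.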
Available Post-filtering Options (Reducing QA Time)
Keep Longest Word Fragment
Set Minimum Search Volume / CPC
Fuzzy Matching to Existing Categories
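The talk does not spell out how these options work internally, so the following is only one plausible reading. It assumes "Keep Longest Word Fragment" means dropping a keyword when a longer keyword contains it as a whole phrase, and that the minimum thresholds are simple cut-offs. The function names are hypothetical; the sample metrics are taken from the output table later in the talk.

```python
def keep_longest_fragments(keywords):
    """Drop any keyword that appears as a whole phrase inside a longer one."""
    kept = []
    for kw in keywords:
        padded = f" {kw} "
        contained = any(
            kw != other and padded in f" {other} " for other in keywords
        )
        if not contained:
            kept.append(kw)
    return kept


def apply_minimums(metrics, min_volume=0, min_cpc=0.0):
    """Keep keywords meeting both the volume and the CPC threshold."""
    return {
        kw: m
        for kw, m in metrics.items()
        if m["volume"] >= min_volume and m["cpc"] >= min_cpc
    }


longest = keep_longest_fragments(["aa alkaline", "aa alkaline batteries"])
print(longest)  # ['aa alkaline batteries']

sample = {
    "wooden swing sets": {"volume": 8100, "cpc": 0.97},
    "wooden climbing frames": {"volume": 90, "cpc": 0.78},
    "trampoline accessory kits": {"volume": 70, "cpc": 0.26},
}
print(list(apply_minimums(sample, min_volume=100)))  # ['wooden swing sets']
```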
Getting Started
You Will Need
Screaming Frog – to crawl the site
Keywords Everywhere API – to check search volume ($10 for 100,000 credits)
Python with the following libraries installed:
NLTK – used to create n-gram word combinations
PolyFuzz – to match keywords to existing categories
Breaking Down the Process
Crawl, Cluster, Filter & Review
Crawl
Crawl the site using Screaming Frog with two custom extractions. These identify which pages are products and which pages are categories.
The extractions can be anything, as long as each extractor is unique to its page type.
For product pages, that’s usually the price; for category pages, it’s usually a sort parameter.
Once the crawl has finished, export all_Inlinks.csv and Internal_html.csv.
Exporting inlinks allows subcategory suggestions to be associated with their parent categories automatically.
Parent category: Trampolines
Auto-suggested subcategories: junior trampolines, trampoline accessory kits, trampoline covers
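A rough sketch of how inlinks can tie products back to their parent categories. The two in-memory CSVs are simplified stand-ins for the Screaming Frog exports. The column names (`Address`, `Page Type`, `Source`, `Destination`), the single `Page Type` column standing in for the two custom extractions, and the product URLs are all assumptions for illustration, not the real export layout.

```python
import csv
import io

# Hypothetical stand-in for the internal HTML export with page types resolved.
internal_html = io.StringIO(
    "Address,Page Type\n"
    "/outdoor-toys/trampolines.list,category\n"
    "/p/junior-trampoline-6ft,product\n"
    "/p/trampoline-cover-8ft,product\n"
)

# Hypothetical stand-in for the inlinks export: one row per internal link.
all_inlinks = io.StringIO(
    "Source,Destination\n"
    "/outdoor-toys/trampolines.list,/p/junior-trampoline-6ft\n"
    "/outdoor-toys/trampolines.list,/p/trampoline-cover-8ft\n"
    "/p/junior-trampoline-6ft,/p/trampoline-cover-8ft\n"
)

page_type = {
    row["Address"]: row["Page Type"] for row in csv.DictReader(internal_html)
}

# A product's parent categories are the category pages that link to it.
parents = {}
for link in csv.DictReader(all_inlinks):
    src, dest = link["Source"], link["Destination"]
    if page_type.get(src) == "category" and page_type.get(dest) == "product":
        parents.setdefault(dest, set()).add(src)

print(parents["/p/junior-trampoline-6ft"])
# {'/outdoor-toys/trampolines.list'}
```

Links between two products are ignored, so only category-to-product links define the parent relationship.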
The .csv exports are read into Python and processed with the Natural Language Toolkit (NLTK) library.
Cluster
Product names are clustered together using n-grams to generate new keyword phrases.
Products are clustered category by category (so if a product lives in two categories, it’ll be clustered twice).
Keyword
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
aa alkaline batteries command adjustables self adhesive
aa alkaline batteries duracell
aa alkaline batteries duracell optimum
aa alkaline batteries duracell optimum aa
aa alkaline batteries duracell optimum aa batteries
aa alkaline batteries duracell plus
aa alkaline batteries duracell plus battery
aa alkaline batteries duracell plus battery pack
aa alkaline batteries duracell plus lr
aa alkaline batteries duracell plus lr aa
aa alkaline batteries duracell specialty
aa alkaline batteries duracell specialty alkaline
aa alkaline batteries duracell specialty alkaline button
aa alkaline batteries energizer
aa alkaline batteries energizer maxplus
aa alkaline batteries energizer maxplus aa
aa alkaline batteries energizer maxplus aa batteries
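The category-by-category behaviour can be sketched like this. The category names and product names are hypothetical examples, and counting bigrams per category is an assumed simplification of whatever the script does internally.

```python
from collections import Counter, defaultdict

# Hypothetical (category, product name) pairs.
# The Energizer product sits in two categories, so it is clustered twice.
category_products = [
    ("batteries", "Duracell Plus AA Alkaline Batteries"),
    ("batteries", "Energizer MaxPlus AA Alkaline Batteries"),
    ("electrical", "Energizer MaxPlus AA Alkaline Batteries"),
]


def bigrams(name):
    """Return the set of adjacent two-word phrases in a product name."""
    words = name.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}


# One Counter per category: n-gram -> number of products containing it.
counts = defaultdict(Counter)
for category, name in category_products:
    counts[category].update(bigrams(name))

print(counts["batteries"]["aa alkaline"])   # 2
print(counts["electrical"]["aa alkaline"])  # 1
```

Keeping a separate count per category is what lets each suggestion be reported against the parent category it came from.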
Filtering
Clustering creates many irrelevant keywords which will need to be filtered.
We started by generating over half a million n-grams (597,664) using existing products on wilko.com.
Around 34,100 were matched to a minimum of three products and the rest discarded.
Just under 9,000 keywords (8,969) remained after deduplication. These were then checked for search volume.
The final output contained 1,883 subcategorisation opportunities ready to QA.
99.68% of all keywords were discarded before the final output! Essentially, we brute forced the opportunity.
Typical Script Output
Total Subcategories Generated: 597,664
Matched to Min of 3 Products: 34,088
Remaining after de-duplication: 8,969
Subcategories with Search Volume: 1,883
Total Volume: 8,023,629
Discarded: 99.68% of Keywords!
Completed in: 16.15 Minutes
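The discard percentage follows directly from the figures quoted in the talk, as this short check shows. The cost estimate at the end assumes one API credit per keyword checked, which the talk does not state.

```python
# Funnel figures quoted in the talk for wilko.com.
generated = 597_664
matched_min_products = 34_088
after_dedup = 8_969
with_search_volume = 1_883

discarded_pct = (1 - with_search_volume / generated) * 100
print(f"Discarded: {discarded_pct:.2f}%")  # Discarded: 99.68%

# Assumption: one credit per keyword, at $10 per 100,000 credits.
cost_per_credit = 10 / 100_000
lookup_cost = after_dedup * cost_per_credit
print(f"Checking {after_dedup:,} keywords: ${lookup_cost:.2f}")
# Checking 8,969 keywords: $0.90
```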
Quality Review
The final shortlisted n-grams are now ready for the QA process.
Output
Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
/outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
/outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
/outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
/outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
/outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
/outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
/outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
/outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
/outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
/outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
/outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
/outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
/outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
/outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
/outdoor-toys/trampolines.list | trampoline covers | 2,900 | 0.19 | 4 | 78% | trampolines
All of these subcategory suggestions were created automatically!
Subcategory suggestions are neatly tied back to their parent category. For example, double swing sets and single swing sets have been placed in the Garden Swings parent category automatically.
Search volume and CPC data are included in the output.
It also shows the number of products available to populate the new categories.
Suggested categories with high search demand but low inventory can signal that it could be time to expand the range to tap into the demand. The slide highlights water tables (27,100 monthly searches) as a low inventory, high demand example.
All category suggestions are fuzzy matched against existing categories.
Suggestions which closely match existing categories (including plurals and words out of order) are removed automatically!
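The fuzzy matching step above can be approximated with the standard library. The talk uses PolyFuzz for this; the sketch below swaps in `difflib.SequenceMatcher`, so its scores will not match the Similarity column in the output. The normalisation rule, the 0.9 threshold and the function names are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalise(phrase):
    """Lowercase, strip a trailing 's' from longer words, and sort the words."""
    words = [
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in phrase.lower().split()
    ]
    return " ".join(sorted(words))


def closest_category(suggestion, existing):
    """Return (best_match, similarity between 0 and 1)."""
    scored = [
        (SequenceMatcher(None, normalise(suggestion), normalise(cat)).ratio(), cat)
        for cat in existing
    ]
    score, best = max(scored)
    return best, score


existing_categories = ["trampolines", "garden swings", "outdoor kitchens"]
suggestions = ["junior trampolines", "swings garden", "trampoline"]

threshold = 0.9  # hypothetical cut-off for "already exists"
new_only = [
    s for s in suggestions
    if closest_category(s, existing_categories)[1] < threshold
]
print(new_only)  # ['junior trampolines']
```

Sorting the words handles suggestions that differ only in word order, and the crude plural stripping handles simple singular and plural pairs.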
Limitations and Considerations
The output is only as good as the product naming conventions. If product names are short or non-descriptive, that will affect the final output.
The script will output keywords in the singular form, whereas categories will be pluralised because they contain more than a single product.
A small amount of clean-up will be needed to change keywords from singular to plural.
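Part of that clean-up could be scripted. The helper below is a deliberately naive sketch, not part of the shared script: `pluralise_last_word` is a hypothetical name, and it only applies a few common English spelling rules to the last word, so the results still need a human check.

```python
def pluralise_last_word(keyword):
    """Naively pluralise the final word of a keyword phrase."""
    head, _, last = keyword.rpartition(" ")
    if last.endswith("s"):
        plural = last  # already plural, or a word ending in "s"
    elif last.endswith(("ch", "sh", "x", "z")):
        plural = last + "es"
    elif last.endswith("y") and len(last) > 1 and last[-2] not in "aeiou":
        plural = last[:-1] + "ies"
    else:
        plural = last + "s"
    return f"{head} {plural}".strip()


for kw in ["pro stunt scooter", "outdoor play kitchen", "water tables"]:
    print(pluralise_last_word(kw))
# pro stunt scooters
# outdoor play kitchens
# water tables
```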
Automation
This script can be automated on a VPS in conjunction with an automated crawl setup.
Perhaps client work can be road-mapped every three months, with the output automatically sent by email or to a Slack channel.
Remixes and Mashups
I’d love to see some remixes, mashups and improvements to the script.
Just make sure you tag me in anything you make!
Running Through Some ‘Whys’
Question: Why use Screaming Frog and not build a dedicated crawler?
Answer: Convenience, speed and familiarity with an industry-standard tool. It meant I could concentrate on the script output from the start.
Question: Do I need to set custom extractions in Screaming Frog?
Answer: It was the simplest way to standardise the script to work with any eCommerce website.
Question: Where can I download this script?
Answer: You can find the full script with instructions on our website: SearchSolved.co.uk/python-subcats
Getting Started with Python
Don’t Wait 🐍🔥
There is an awesome community of SEOs online who are passionate about Python. If you’re thinking about getting started, come and join us!
Python Resources
YouTube Channels
Corey Schafer
Data School
Socratica
MIT Introduction to Computer Science & Python
Apps
SoloLearn (Android / iPhone)
Books
Automate the Boring Stuff
Python SEOs to Follow on Twitter
@GregBernhardt4
@DataChaz
@OritSiMu
@DanielHereMe
@SEOPythonistas
@rvtheverett
@vdrweb
@LeeFootSEO 😃
Thank You For Your Attention!
Feel free to DM me any questions or contact me through our website.

How to Automatically Subcategorise Your Website Automatically With Python

  • 1.
    How to UseYour Products to Subcategorise Your Website with Python Lee Foot | Search Solved @LeeFootSEO @LeeFootSEO | #BrightonSEO
  • 2.
    About Me Ten Years Experience asa Technical SEO @LeeFootSEO | #BrightonSEO
  • 3.
    Founded SearchSolved.co.uk three years ago whichfocuses on eCommerce & enterprise SEO @LeeFootSEO | #BrightonSEO
  • 4.
    Last Year WeWon the Drum Marketing Awards in the Retail and eCommerce category for search @LeeFootSEO | #BrightonSEO
  • 5.
  • 6.
  • 7.
    What is Python? Pythonis a high level programming language which is perfect for automating repetitive tasks
  • 8.
    What is Python? Pythonis a high level programming language which is perfect for automating repetitive tasks Very popular in the data science community
  • 9.
    What is Python? Pythonis a high level programming language which is perfect for automating repetitive tasks Very popular in the data science community Becoming very popular with technical SEOs Especially for data blending and automation
  • 10.
  • 11.
    Agenda for Today The benefitsof subcategorisation @LeeFootSEO | #BrightonSEO
  • 12.
    The benefits of subcategorisation Whatyou’ll need @LeeFootSEO | #BrightonSEO Agenda for Today
  • 13.
    The benefits of subcategorisation Whatyou’ll need The process @LeeFootSEO | #BrightonSEO Agenda for Today
  • 14.
    The benefits of subcategorisation Whatyou’ll need The process The script output @LeeFootSEO | #BrightonSEO Agenda for Today
  • 15.
    The benefits of subcategorisation Whatyou’ll need The process The script output Limitations @LeeFootSEO | #BrightonSEO Agenda for Today
  • 16.
  • 17.
    Subcategorisation is one ofthe most effective ways to win more traffic from search engines
  • 18.
    Yet it isoften under utilised or not used to full effect
  • 19.
    SOFAS This sofa category contains a manytypes of sofas listed in a single category
  • 20.
    LEATHER SOFAS VELVET SOFAS LOUNGESOFAS SOFAS Grouping each product type into subcategories would better align them to search demand
  • 21.
    LEATHER SOFAS VELVET SOFAS LOUNGESOFAS SOFAS Creating three new subcategories would create an additional 21,000+ searches a month* +19,000 +1,200 +150 *source ahrefs.com
  • 22.
    LEATHER SOFAS VELVET SOFAS LOUNGESOFAS SOFAS This method will produce a lot of additional traffic for any eCommerce site +19,000 +1,200 +150
  • 23.
  • 24.
    There is a problem though.. @LeeFootSEO | #BrightonSEO
  • 25.
    Current methods to findthis opportunity are slow, manual & labour-intensive @LeeFootSEO | #BrightonSEO
  • 26.
    It usually involves using keyworddata to eyeball the opportunity. @LeeFootSEO | #BrightonSEO
  • 27.
    Low Hanging Fruit @LeeFootSEO| #Brighton enough to catch the obvious opportunitie s but leaves a lot on the table and doesn’t scale.
  • 28.
    @LeeFootSEO | #BrightonSEO Sometimes keyworddata can suggest an opportunity – when there aren’t enough products to support a new subcategory
  • 29.
    We realised their must bea better way to do this
  • 30.
    We wrote aPython script to automate the process and do the hard work for us! @LeeFootSEO | #BrightonSEO
  • 31.
    LEATHER SOFAS VELVET SOFAS LOUNGESOFAS SOFAS The products suggest the categorie s for us! +19,000 +1,200 +150 Leather Buttoned Sofa Mid Century Leather Sofa Tetbury Leather Sofa - Black Hardwick Leather Sofa Tetbury Leather Sofa - Tan @LeeFootSEO | #BrightonSEO
  • 32.
    LEATHER SOFAS VELVET SOFAS LOUNGESOFAS SOFAS By clustering the product names together our script was able to find opportunities for new categories +19,000 +1,200 +150 Leather Buttoned Sofa Mid Century Leather Sofa Tetbury Leather Sofa - Black Hardwick Leather Sofa Tetbury Leather Sofa - Tan @LeeFootSEO | #BrightonSEO
  • 33.
    Total Opportunity –Cox & Cox New Subcategories: 185 Search Volume: 1,400,000 @LeeFoot@SEO | #BrightonSEO
  • 34.
    In testing weran the script on Homebase and found opportunity to create 1,650 subcategories with over 13,000,000 estimated monthly searches @LeeFootSEO | #BrightonSEO
  • 35.
    This would take aLONG time to do manually! (Assuming you could work as efficiently as a computer!) @LeeFootSEO | #BrightonSEO
  • 36.
    At the endof this talk I’m going to share this script with instructions so you can use it on your own Websites @LeeFootSEO | #BrightonSEO
  • 37.
  • 38.
    The Mission @LeeFootSEO | #BrightonSEO Automatically createnew subcategories by clustering product names together
  • 39.
    The Method We’ll be using Pythonand the NLTK library to generate hundreds of thousands of N- gram combinations from product names @LeeFootSEO | #BrightonSEO
  • 40.
    What Are N- grams? N-grams are combinationsof adjacent words or letters of n-length @LeeFootSEO | #BrightonSEO
  • 41.
    The Challenge Using this method togenerate so many n-grams will create a lot of non-sensical words in the process @LeeFootSEO | #BrightonSEO
  • 42.
    The Challenge The goal isto keep only relevant keywords with commercial value and discard the@LeeFootSEO | #BrightonSEO
  • 43.
    The Challeng e At a highlevel our solution to this problem is to check the keywords for search volume & CPC data @LeeFootSEO | #BrightonSEO
  • 44.
    The Challeng e If they have neitherSearch Volume or CPC data then those keywords will be discarded before the final output @LeeFootSEO | #BrightonSEO
  • 45.
    The Challeng e Ideally without burning through a ton ofAPI credits in the @LeeFootSEO | #BrightonSEO
  • 46.
    aa alkaline aa alkalinebatteries aa alkaline batteries command aa alkaline batteries command adjustables aa alkaline batteries command adjustables self @LeeFootSEO | #BrightonSEO Examples of N-Grams the Script will Generate from clustering product nam
  • 47.
    @LeeFootSEO | #BrightonSEO Onlyone of these suggestions has commercial value aa alkaline aa alkaline batteries aa alkaline batteries command aa alkaline batteries command adjustables aa alkaline batteries command adjustables self
  • 48.
    @LeeFootSEO | #BrightonSEO Ourgoal is to programmatically discard the non-sensical ones and keep any with commercial value aa alkaline aa alkaline batteries aa alkaline batteries command aa alkaline batteries command adjustables aa alkaline batteries command adjustables self
  • 49.
    @LeeFootSEO | #BrightonSEO SoLet’s Check for Search Volume! aa alkaline(20) aa alkaline batteries(80) aa alkaline batteries command(0) aa alkaline batteries command adjustables(0) aa alkaline batteries command adjustables self(0)
  • 50.
    Everything is Redwill be discarded automatically because they have no search volume aa alkaline (20) aa alkaline batteries (80) aa alkaline batteries command aa alkaline batteries command adjustables aa alkaline batteries command adjustables self @LeeFootSEO | #BrightonSEO
  • 51.
    Checking n-grams forkeyword volume does a lot of the hard work but it’s not perfect aa alkaline (20) aa alkaline batteries (80) @LeeFootSEO | #BrightonSEO
  • 52.
    To deal withthis we have included pre and post configurable filtering options aa alkaline (20) aa alkaline batteries (80) @LeeFootSEO | #BrightonSEO Keep Longest Word Fragment = True
  • 53.
    Available Pre-filtering Options (Saving API Credits)
  • 54.
    Available Pre-filtering Options (Saving API Credits)
    Match to a Minimum # of Products
  • 55.
    Available Pre-filtering Options (Saving API Credits)
    Match to a Minimum # of Products
    Use Search Console Data
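The "Match to a Minimum # of Products" pre-filter could be sketched roughly as follows. This is an assumption about how such a filter might work, not the script's actual code: the product names are made up, and `ngrams_for` and `filter_by_product_count` are hypothetical helper names.

```python
from collections import Counter


def ngrams_for(name, min_n=2, max_n=3):
    """Return the set of word n-grams found in one product name (hypothetical helper)."""
    tokens = name.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(min_n, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def filter_by_product_count(product_names, min_products=3):
    """Keep n-grams that occur in at least min_products different product names."""
    counts = Counter()
    for name in product_names:
        # A set is used so that each product counts once per n-gram.
        counts.update(ngrams_for(name))
    return {gram: n for gram, n in counts.items() if n >= min_products}


# Made-up product names, for illustration only.
products = [
    "Duracell Plus AA Alkaline Batteries",
    "Energizer Max AA Alkaline Batteries",
    "Duracell Optimum AA Alkaline Batteries",
    "Command Adjustables Self Adhesive Hooks",
]
survivors = filter_by_product_count(products, min_products=3)
print(survivors)
```

Because this filter runs before any keyword lookups, every n-gram it removes is one fewer API credit spent.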
  • 56.
    Enabling Filtering Options Reduces API Credit Spend by Around 95%
  • 57.
    Available Post-filtering Options (Reducing QA Time)
  • 58.
    Available Post-filtering Options (Reducing QA Time)
    Keep Longest Word Fragment
  • 59.
    Available Post-filtering Options (Reducing QA Time)
    Keep Longest Word Fragment
    Set Minimum Search Volume / CPC
  • 60.
    Available Post-filtering Options (Reducing QA Time)
    Keep Longest Word Fragment
    Set Minimum Search Volume / CPC
    Fuzzy Matching to Existing Categories
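Two of the post-filters above could look something like the sketch below. The function names, the default thresholds, and the choice to require both the volume and the CPC minimum are assumptions made for illustration; the keyword figures are taken from the output table shown later in the talk.

```python
def keep_longest_fragment(keywords):
    """Drop any keyword that appears, as whole words, inside a longer keyword."""
    kept = []
    for kw in keywords:
        is_fragment = any(
            kw != other and f" {kw} " in f" {other} " for other in keywords
        )
        if not is_fragment:
            kept.append(kw)
    return kept


def meets_minimums(row, min_volume=100, min_cpc=0.25):
    """Assumed rule: a suggestion must clear both thresholds to be kept."""
    return row["volume"] >= min_volume and row["cpc"] >= min_cpc


longest = keep_longest_fragment(["aa alkaline", "aa alkaline batteries"])
print(longest)

rows = [
    {"keyword": "rope ladders", "volume": 2400, "cpc": 0.28},
    {"keyword": "wooden climbing frames", "volume": 90, "cpc": 0.78},
    {"keyword": "trampoline accessory kits", "volume": 70, "cpc": 0.26},
]
shortlist = [row["keyword"] for row in rows if meets_minimums(row)]
print(shortlist)
```

With "Keep Longest Word Fragment" enabled, "aa alkaline" is dropped in favour of "aa alkaline batteries", which matches the example on the earlier slide.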
  • 61.
  • 62.
  • 63.
    You Will Need
    Screaming Frog – To crawl the site
  • 64.
    You Will Need
    Screaming Frog – To crawl the site
    Keywords Everywhere API – To check search volume ($10 for 100,000 credits)
  • 65.
    You Will Need
    Screaming Frog – To crawl the site
    Keywords Everywhere API – To check search volume ($10 for 100,000 credits)
    Python with the following libraries imported
  • 66.
    You Will Need
    Screaming Frog – To crawl the site
    Keywords Everywhere API – To check search volume ($10 for 100,000 credits)
    Python with the following libraries imported
    NLTK – Used to create n-gram word combinations
  • 67.
    You Will Need
    Screaming Frog – To crawl the site
    Keywords Everywhere API – To check search volume ($10 for 100,000 credits)
    Python with the following libraries imported
    NLTK – Used to create n-gram word combinations
    PolyFuzz – To match KWs to existing categories
  • 68.
  • 69.
    Crawl, Cluster, Filter & Review
  • 70.
    Crawl: Crawl the site using Screaming Frog with two custom extractions
  • 71.
    Crawl: This is to identify which pages are products
  • 72.
  • 73.
    Crawl: The extractions can be anything, as long as the extractor is unique to each page type.
  • 74.
    Crawl: For product pages, that’s usually the price, and for category pages, it’s usually a sort parameter
  • 75.
    Crawl: Once the crawls have finished, just export all_Inlinks.csv and Internal_html.csv
  • 76.
    Exporting inlinks allows for subcategory suggestions to be associated with their parent categories automatically:
    Trampolines (Parent Category)
    junior trampolines (Auto Suggested)
    trampoline accessory kits (Auto Suggested)
    trampoline covers (Auto Suggested)
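To make the role of the two exports concrete, here is a rough sketch that classifies pages from the custom extractions and then uses inlinks to group products under their parent category. Everything specific in it is an assumption: the column names (`Address`, `Title 1`, `price 1`, `sort 1`, `Source`, `Destination`), the example URLs, and the helper names are hypothetical, and the real exports will have many more columns.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the two exports; column names and URLs are assumptions.
INTERNAL_HTML = """Address,Title 1,price 1,sort 1
https://example.com/trampolines,Trampolines,,Sort by
https://example.com/p/8ft-junior-trampoline,8ft Junior Trampoline,99.00,
https://example.com/p/trampoline-cover,Trampoline Cover,15.00,
"""
INLINKS = """Source,Destination
https://example.com/trampolines,https://example.com/p/8ft-junior-trampoline
https://example.com/trampolines,https://example.com/p/trampoline-cover
https://example.com/p/trampoline-cover,https://example.com/trampolines
"""


def classify_pages(internal_csv):
    """Split pages into products and categories using the two custom extractions."""
    products, categories = {}, set()
    for row in csv.DictReader(internal_csv):
        if row["price 1"].strip():      # a price was extracted: product page
            products[row["Address"]] = row["Title 1"]
        elif row["sort 1"].strip():     # a sort control was extracted: category page
            categories.add(row["Address"])
    return products, categories


def products_by_category(inlinks_csv, products, categories):
    """Group product names under every category page that links to them."""
    grouped = defaultdict(list)
    for row in csv.DictReader(inlinks_csv):
        source, destination = row["Source"], row["Destination"]
        if source in categories and destination in products:
            grouped[source].append(products[destination])
    return dict(grouped)


products, categories = classify_pages(io.StringIO(INTERNAL_HTML))
grouped = products_by_category(io.StringIO(INLINKS), products, categories)
print(grouped)
```

Swapping the `io.StringIO` objects for open file handles on the real exports would be the natural next step, once the column names have been checked against the actual files.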
  • 77.
    The .csv exports are read into Python and processed with the Natural Language Toolkit (NLTK) library.
  • 78.
    Cluster: Product names are clustered together using n-grams to generate new words
    Keyword
    aa alkaline
    aa alkaline batteries
    aa alkaline batteries command
    aa alkaline batteries command adjustables
    aa alkaline batteries command adjustables self
    aa alkaline batteries command adjustables self adhesive
    aa alkaline batteries duracell
    aa alkaline batteries duracell optimum
    aa alkaline batteries duracell optimum aa
    aa alkaline batteries duracell optimum aa batteries
    aa alkaline batteries duracell plus
    aa alkaline batteries duracell plus battery
    aa alkaline batteries duracell plus battery pack
    aa alkaline batteries duracell plus lr
    aa alkaline batteries duracell plus lr aa
    aa alkaline batteries duracell specialty
    aa alkaline batteries duracell specialty alkaline
    aa alkaline batteries duracell specialty alkaline button
    aa alkaline batteries energizer
    aa alkaline batteries energizer maxplus
    aa alkaline batteries energizer maxplus aa
    aa alkaline batteries energizer maxplus aa batteries
  • 79.
    Cluster: Products are clustered category by category (so if a product lives in two categories, it’ll be clustered twice)
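A hedged sketch of the clustering step: word n-grams are generated from each product name, one category at a time. `nltk.util.ngrams` is the NLTK helper for this; the fallback keeps the example runnable if NLTK is not installed. The n-gram sizes, the category paths, the product name, and the function names are all assumptions for illustration.

```python
try:
    from nltk.util import ngrams  # NLTK is the library named in the talk
except ImportError:
    def ngrams(tokens, n):
        """Fallback that yields the same n-gram tuples for a list of tokens."""
        return zip(*(tokens[i:] for i in range(n)))


def product_ngrams(name, min_n=2, max_n=3):
    """Return every word n-gram of size min_n..max_n from one product name."""
    tokens = name.lower().split()
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(" ".join(gram) for gram in ngrams(tokens, n))
    return grams


def cluster_by_category(names_by_category, min_n=2, max_n=3):
    """Build a sorted, de-duplicated n-gram list for each category separately."""
    return {
        category: sorted(
            {g for name in names for g in product_ngrams(name, min_n, max_n)}
        )
        for category, names in names_by_category.items()
    }


# Hypothetical catalogue: the same product lives in two categories.
catalogue = {
    "/batteries/": ["Duracell Plus AA Alkaline Batteries"],
    "/offers/": ["Duracell Plus AA Alkaline Batteries"],
}
clusters = cluster_by_category(catalogue)
print(clusters["/batteries/"])
```

Because the loop runs per category, the product above contributes its n-grams to both categories, which is the "clustered twice" behaviour the slide mentions.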
  • 80.
    Filtering: Clustering creates many irrelevant keywords which will need to be filtered
  • 81.
    Filtering: We started by generating over half a million n-grams using existing products on wilko.com
    597,664
  • 82.
    Filtering: 34,000 were matched to a minimum of three products and the rest discarded
    597,664 → 34,100
  • 83.
    Filtering: Just under 9,000 keywords remained after deduplication. These were then checked for search volume
    597,664 → 34,100 → 8,969
  • 84.
    Filtering: The final output contained 1,883 subcategorisation opportunities ready to QA
    597,664 → 34,100 → 8,969 → 1,883
  • 85.
    Filtering: 99.68% of all keywords were discarded before the final output! Essentially, we brute forced the opportunity
    597,664 → 34,100 → 8,969 → 1,883
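The discard rate follows directly from the funnel figures on these slides. The short check below reproduces it; the stage labels and variable names are just for illustration.

```python
# Funnel figures as shown on the slides.
funnel = [
    ("n-grams generated", 597_664),
    ("matched to at least three products", 34_100),
    ("remaining after de-duplication", 8_969),
    ("with search volume", 1_883),
]

total = funnel[0][1]
for label, count in funnel:
    print(f"{label}: {count:,} ({count / total:.2%} of all n-grams)")

# Share of the original n-grams that never reached the final output.
discarded_share = 1 - funnel[-1][1] / total
print(f"Discarded: {discarded_share:.2%}")
```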
  • 86.
    Typical Script Output
    Total Subcategories Generated: 597,6
    Matched to Min of 3 Products: 34,088
    Remaining after de-duplication: 8,969
    Subcategories with Search Volume: 1,8
    Total Volume: 8,023,629
    Discarded: 99.68% of Keywords!
    Completed in: 16.15 Minutes
  • 87.
    Quality Review: The final shortlisted n-grams are now ready for the QA process
  • 88.
    Output
    Parent Category | Suggested Subcategory | Vol | CPC | Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
    /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
    /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
    /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
    /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
    /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
    /outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
  • 89.
    All of these subcategory suggestions were created automatically!
    Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooters | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchens | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
     | planter tables | 5,400 | | | |
  • 90.
    Subcategory suggestions are neatly tied back to their parent category
    Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames/ | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames/ | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings/ | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings/ | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings/ | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys/ | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
  • 91.
    Double Swing Sets and Single Swing Sets have been placed in the Garden Swings parent category automatically!
    Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames/ | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames/ | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings/ | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings/ | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings/ | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys/ | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
  • 92.
    Search volume and CPC data is included in the output!
    Parent Category | Suggested Subcategory | Volume | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
    /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
    /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
    /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
    /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
    /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
    /outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
    /outdoor-toys/trampolines.list | trampoline covers | 2,900 | 0.19 | 4 | 78% | trampolines
  • 93.
    It also shows the number of products available to populate the new categories!
    Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
    /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
    /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
    /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 3 | 59% | 6 seater tables
    /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
    /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
  • 94.
    Low Inventory, High Demand
    Suggested categories with high search demand, but low inventory can signal that it could be time to expand the range to tap into the demand…
    Parent Category | Suggested Subcategory | Volume | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 4 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 3 | 38% | cavity wall
    /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 4 | 76% | planters
    /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
    /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 3 | 59% | 6 seater table
    /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
    /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
    /outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
  • 95.
    All category suggestions are fuzzy matched against existing categories. Categories which closely match existing categories (including plurals and words out of order) are removed automatically!
    Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
    /outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
    /outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
    /outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
    /outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
    /outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
    /outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
    /outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
    /outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
    /outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
    /outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
    /outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
    /outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
    /outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
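The talk uses PolyFuzz for this matching step. The sketch below swaps in `difflib` from the Python standard library so that it runs without extra dependencies; it is not the script's implementation. The normalisation rule (strip a trailing "s", sort the words), the 0.9 threshold, and the function names are all assumptions chosen to illustrate how plurals and out-of-order words could be caught.

```python
from difflib import SequenceMatcher


def normalise(phrase):
    """Lower-case, strip a trailing 's' from longer words, and sort the words."""
    words = [
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in phrase.lower().split()
    ]
    return " ".join(sorted(words))


def closest_match(suggestion, existing_categories):
    """Return the most similar existing category and its similarity (0 to 1)."""
    scored = [
        (
            category,
            SequenceMatcher(None, normalise(suggestion), normalise(category)).ratio(),
        )
        for category in existing_categories
    ]
    return max(scored, key=lambda pair: pair[1])


def drop_near_duplicates(suggestions, existing_categories, threshold=0.9):
    """Keep only suggestions that are not too similar to an existing category."""
    return [
        s for s in suggestions
        if closest_match(s, existing_categories)[1] < threshold
    ]


existing = ["garden swings", "trampolines"]
suggestions = ["swings garden", "trampoline", "rope ladders"]
remaining = drop_near_duplicates(suggestions, existing)
print(remaining)
```

Here "swings garden" (words out of order) and "trampoline" (singular of an existing category) are removed, while "rope ladders" is kept as a genuinely new suggestion.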
  • 96.
    Limitations and Considerations: The output is only as good as the naming conventions. If product names are short or non-descriptive then that’ll affect the final output.
  • 97.
    Limitations and Considerations: The script will output keywords in the singular form, whereas categories will be pluralised because they contain more than a single product
  • 98.
    Limitations and Considerations: A small amount of clean-up will be needed to change the keywords from singular to plural
  • 99.
    Automation: This script can be automated on a VPS in conjunction with an automated crawl setup.
  • 100.
    Automation: Perhaps client work can be road mapped every three months with the output automatically sent as an email or to a Slack channel
  • 101.
    Remixes and Mashups: I’d love to see some remixes, mashups and improvements to the script. Just make sure you tag me in anything you make!
  • 102.
  • 103.
    WHY: Why use Screaming Frog and not build a dedicated crawler?
  • 104.
    WHY: Convenience, speed and familiarity with an industry standard tool. It meant I could concentrate on the script output from the start
  • 105.
    Question: Do I need to set custom extractions in Screaming Frog?
  • 106.
    Answer: It was the simplest way to standardise the script to work with any eCommerce Website.
  • 107.
    Question: Where Can I Download This Script?
  • 108.
    You can find the full script with instructions on our Website: SearchSolved.co.uk/python-subcats
  • 109.
  • 110.
    Don’t Wait 🐍🔥 There is an awesome community of SEOs online who are passionate about Python. If you’re thinking about getting started, come and join us!
  • 111.
    Python Resources
    YouTube Channels: Corey Schafer, Data School, Socratica, MIT Introduction to Computer Science & Python
    Apps: Solo Learn (Android / iPhone)
    Books: Automate the Boring Stuff
  • 112.
    Python SEOs to follow on Twitter: @GregBernhardt4 @DataChaz @OritSiMu @DanielHereMe @SEOPythonistas @rvtheverett @vdrweb @LeeFootSEO 😃
  • 113.
    Thank You For Your Attention! Feel free to DM me any questions or contact me through our Website.

Editor's Notes

  • #6 and since then my productivity has gone through the roof and it’s gotten to the point where I’m not even sure how I did my job without it before!
  • #24 Talk about internal search mapping,
  • #47 Examples of n-grams generated. Not the highest value of categories – but useful to get the idea across
  • #48 Examples of n-grams generated. Not the highest value of categories – but useful to get the idea across
  • #49 I know
  • #50 I know
  • #97 In other words
  • #98 Tried to account for this in the past, by adding an ‘s’ to the final output – but there are too many edge cases. ‘es’ words and the like
  • #109 I’ll tweet the link out at the end as well
  • #113 I’ll tweet this out at the end too. Great community of Python enthusiasts and professionals online. If you want to get started – don’t wait! Make things and dive