Fixing the Indexability Challenge:
A Data-Based Framework
Areej AbuAli
#MozCon 2019
slideshare.net/areejabuali
@areej_abuali
I’m here to talk to you about a framework
that I came up with to fix a client’s
indexability challenge
But more importantly…
(And I only realised this after I’d finished preparing this talk)
I’m also here to talk to you about how
“Technical Problems are People Problems”
I first met this client back in 2017
They’re a job aggregator site
Their website was struggling
89% YoY decrease in organic visibility
[Chart: organic visibility trend, y-axis 0–12,000]
Organic Traffic – Y1: X, Y2: Y, Change: -46%
They barely ranked…
[Chart: Competitor Organic Visibility (0–600,000): Client vs indeed.co.uk, monster.co.uk, reed.co.uk]
And their site was so massive that we struggled to crawl it
To make this work, we need to fix the
fundamentals first
We need to fix their tech
So this talk is about my 18-month
relationship with this client
It’s about what worked, what didn’t work
and what I would have done differently
The Initial Findings
Links Tech Content
I started working on a comprehensive audit
I ended up with a 70-page document
There was a total of 50 recommendations
Some of the main things included were…
72% of backlinks came from only
3 referring domains
Their on-page content was full of
duplication and missing the basics
There were NO sitemaps
Canonical tags were not set up correctly
And their internal linking structure
was a nightmare
Every recommendation was outlined using a
traffic light system
And every section was split into
Problem, Effect & Solution
We had a half-day audit handover meeting
where I walked them through all
of our recommendations
Everyone was in good spirits
Yet I couldn’t help but feel it
was not enough…
Something was missing
Everything I recommended up till
now was solid
But I had a gut feeling that due to the
nature of their site…
These recommendations wouldn’t
quite cut it
I had to go back to the drawing board
The Supplementary Findings
Also Known As…
The Findings I Should’ve Found The First Time Round But Didn’t So I’m Choosing To Call It Supplementary Findings To Sound Like An Expert.
They’re a job aggregator site – in essence, they’re a
job search engine
That means that every single search
conducted could, potentially, be
crawled and/or indexed
if it wasn’t built right
That’s equivalent to an infinite number
of potential searches!
Which part was their site
not getting right?
I knew that it was impossible to crawl
The one time I tried to fully crawl the site,
it returned over 2.5 million URLs
And I could only crawl it by excluding
massive sections
And if their pages couldn’t be crawled,
then they would never be
indexed properly…
And they would never rank
So there were three problems that
needed to be fixed
Crawling Indexing Ranking
It was apparent that there were no rules in
place to help direct robots
Unique URLs were created, using up all
possible filter combinations
This can create a potentially unlimited
number of URLs
And all the pages looked exactly the
same – they just had a list of jobs!
Google was wasting crawl budget by
crawling duplicate thin pages and
attempting to index them
The ‘Aha!’ Moment
It was apparent that there were no rules in
place to help direct robots
I needed to create a customised framework
that instructs bots on what to do and
what not to do
The job aggregator industry seemed to be
doing one of two things
Limiting indexable pages → miss out on ranking opportunity
Not limiting indexable pages → wind up with reduced link equity
My framework was going to do neither
Instead it would use search volume data
to determine which pages are
valuable to search engines
So how exactly would this work?
It starts off by passing
the search query
through a keyword
combination script
This script outputs different combinations
for the search conducted
It does that by changing the order of
keywords to see all possible combinations
Digital Marketing Manager London
London Digital Marketing Manager
Digital Marketing London Manager
4! = 24 combinations
These combinations will increase based
on the search query and filters applied
Even though Google will regard most
of these combinations as the same…
The script helps avoid creating duplicate pages
that are just different orderings
of the same query
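To make that concrete, here is a minimal sketch (in Python, my own illustration rather than the client's actual script) of a keyword-combination script that permutes the words of a query:

    from itertools import permutations

    def keyword_combinations(query):
        """Return every possible ordering of the words in a search query."""
        words = query.split()
        return [" ".join(p) for p in permutations(words)]

    combos = keyword_combinations("Digital Marketing Manager London")
    print(len(combos))   # 4 words -> 4! = 24 combinations
    print(combos[:3])    # the first few orderings of the same query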
It then searches the
database to see
if this job is available
If (Job ≠ Available)
Load a page stating so
and no-index it
If (Job = Available)
Search the keyword
database for all
keyword combinations
from the script
Fetch search volume
data for these
keyword combinations
For search volume data, we recommended
using keywordtool.io API
If (SearchVolume > 50)
A page is then created using the highest-SV keyword combination, and that page is both crawlable and indexable
If (SearchVolume < 50)
Load page for users
but no-index it
Your search volume cut-off can be updated
at any time and should be based on what makes
sense for your industry
There is always the possibility of
errors occurring
What if there’s a tie?
(Several keyword combinations have the
same search volume)
Create an indexed crawlable page based
on the keyword used in the
user search query
What if the API was down?
(And you’re unable to generate search volume data)
Load a no-index page for usability
and don’t store the query in the database
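Pulling those rules together, the decision logic might look something like the sketch below. This is my own illustration: the function and field names are placeholders, and the search-volume lookup (the deck suggests the keywordtool.io API) is assumed to happen before this function is called, passing None when the API is down.

    SV_CUTOFF = 50  # search-volume threshold; tune it to your industry

    def decide_indexing(query, job_available, volumes):
        """Decide how the results page for `query` should be served.

        `volumes` maps each keyword combination to its search volume,
        or is None when the search-volume API was unavailable.
        Returns which combination to build the page around, whether to
        noindex it, and whether to store the query in the database.
        """
        if not job_available:            # If (Job ≠ Available)
            return {"page_for": query, "noindex": True, "store_query": True}
        if volumes is None:              # API down: serve for usability only
            return {"page_for": query, "noindex": True, "store_query": False}
        top_volume = max(volumes.values())
        if top_volume <= SV_CUTOFF:      # low search volume: users only
            return {"page_for": query, "noindex": True, "store_query": True}
        winners = [k for k, v in volumes.items() if v == top_volume]
        # Tie: several combinations share the top volume, so fall back to
        # the combination the user actually searched for
        chosen = query if len(winners) > 1 else winners[0]
        return {"page_for": chosen, "noindex": False, "store_query": True}

    # Example: the reordered combination has the higher search volume
    print(decide_indexing("london digital marketing manager", True,
                          {"london digital marketing manager": 90,
                           "digital marketing manager london": 720}))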
As for their internal linking structure…
This was the status of their header
This is what we recommended
We also provided internal linking
recommendations for their footer and
job advertisement pages
And an exact breakdown of their filter
system on job search pages
As for their sitemaps – they
had none
We recommended creating and
splitting them up
Blog Sitemap
Job Advertisements Sitemap
Job Results Sitemap
Ancillary Sitemap
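For illustration only (the file names and domain are my own assumptions, not the client's), the four sitemaps could then be tied together with a sitemap index along these lines:

    # Sketch: generate a sitemap index that points at the four split sitemaps.
    SITEMAPS = [
        "https://www.example.com/sitemap-blog.xml",
        "https://www.example.com/sitemap-job-advertisements.xml",
        "https://www.example.com/sitemap-job-results.xml",
        "https://www.example.com/sitemap-ancillary.xml",
    ]

    def sitemap_index(sitemap_urls):
        """Return a sitemap index XML document listing each child sitemap."""
        entries = "\n".join(f"  <sitemap><loc>{url}</loc></sitemap>"
                            for url in sitemap_urls)
        return ('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                f'{entries}\n'
                '</sitemapindex>\n')

    print(sitemap_index(SITEMAPS))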
The final step was to help them sort out
their content
Even if they only indexed high-search-volume
pages, their content was very thin
Their chances of ranking would
still be minimal
They had the same H1, Title Tag & Meta
Description on every indexed
filter page
Most competitors automatically generate
optimised meta tags
They needed to do the same for
indexable filter pages
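A simple template along these lines would do it. The wording is purely illustrative (my assumption, not the client's copy); the point is that every indexable filter page gets its own H1, title tag and meta description:

    def filter_page_meta(job_title, location, job_count):
        """Generate unique meta tags for an indexable job-search filter page."""
        return {
            "h1": f"{job_title} Jobs in {location}",
            "title": f"{job_title} Jobs in {location} | {job_count} live vacancies",
            "meta_description": (f"Browse {job_count} {job_title} jobs in {location}. "
                                 "Apply today and find your next role."),
        }

    print(filter_page_meta("Digital Marketing Manager", "London", 37))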
Other than a company page and a
handful of blog posts, there were
no core content pages
So we performed in-depth keyword research
and opportunity analysis to see what
content generates traffic
And we provided a content audit
and strategy to go with it
My ‘Aha!’ Moment felt complete
This was the piece of the puzzle
that was missing
Four Months Later
The client confirmed that they had
implemented everything
The first thing that caught my attention
was that their site went from over
2.5M crawled pages to 20K
Which initially felt like good news…
Until I realised that their traffic had
declined…
Remember this?
Organic Traffic – Y1: X, Y2: Y, Change: -46%
Organic Traffic – Y1: X, Y2: Y, Y3: Z, Change: -86%
The Mini Audit Findings
Also Known As…
The Findings I’m Rushing To Find In A Panic To Prove That They Haven’t Implemented My Recommendations Accurately, Hence I Am Still An Expert.
I went through the original list of
50 recommended actions
At that point, there were 29 that had
not been implemented
I also discovered ten new issues that were
affecting their indexability
Google was choosing to index only
20% of the URLs submitted in the sitemap
Googlebot will visit their site less often
if their indexability is not kept in check
The client explained that they’d added
canonical tags and felt that was enough
They’re relying on Googlebot to:
1) Crawl the pages
2) Find the canonical tags
3) Then choose to ignore them
Canonical tags are simply hints for bots
Google might decide to ignore the
tags and pick other pages to index
Almost 80K pages had been indexed
despite not being submitted
via the sitemap
And some pages were included in the
sitemap but not indexed
For example, /job/finance-manager-liverpool was submitted in the sitemap but not indexed.
Instead, Google indexed /job/finance-manager-liverpool?index=57271b7c~4&utm_campaign=job-detail&utm_source=search-result&utm_medium=website
Because that parameterised page was the one discovered via internal links
Over 5K similar pages with parameters
were getting indexed
Your main goal is to maximise crawl budget
You cannot use canonical tags as a
sticking plaster to fix that
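One practical way to keep those parameterised URLs out of crawlers' paths is to strip tracking parameters whenever internal links are generated, so the clean URL is the one bots discover. This is my own suggestion rather than something prescribed in the deck, and the parameter names are taken from the example URL above:

    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    # Parameters that exist only for analytics; assumed safe to drop from internal links.
    TRACKING_PARAMS = {"index", "utm_campaign", "utm_source", "utm_medium"}

    def clean_internal_link(url):
        """Strip tracking parameters so internal links point at the clean URL."""
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(kept)))

    print(clean_internal_link(
        "/job/finance-manager-liverpool?index=57271b7c~4"
        "&utm_campaign=job-detail&utm_source=search-result&utm_medium=website"
    ))  # -> /job/finance-manager-liverpool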
This implementation was still incomplete
I had to change my way of conveying
this message
I put a stop to the endless stream of emails
and scheduled a face-to-face meeting
We reviewed each and every single
remaining task and discussed them in detail
Behold the wonders of Google Sheets!
We re-prioritised tasks and set
estimated completion dates
It was not an easy meeting but
it felt productive
Where are we now?
On a personal level, I discovered that I
suffer from Imposter Syndrome
This was my constant state of mind
I was working closely with the CTO
throughout this project
I felt he didn’t trust me or my knowledge
So, what would I have done differently?
If I could go back in time, I would realise
what the actual problem was
All technical problems are people problems
The SEO recommendations were solid
Getting them implemented was the hard part
As a tech SEO, the most you can do
is to influence priorities
You have no control
In this instance, I didn’t manage to
persuade him to implement
the recommendations
I also learned that the way I’d been
doing SEO audits was plain wrong
I always focused on delivering a set
of comprehensive actions
Instead, maybe I should just deliver
a SINGLE recommendation
And once that’s implemented…
Then, and only then, will I recommend
another
And maybe I shouldn’t recommend
Nice-To-Do’s…
Until there are only Nice-To-Do’s
left to do
Because they are simply a distraction from
the main problem
This talk does not have a happy ending…
It is not a successful case study…
There’s no upward visibility graph and
page one rankings for me to show off…
This talk is about real life
It’s about a framework I created to fix
indexability issues – one that I’m proud of
and know in my gut *works*
So I wanted to share it with you
Because I can see this applied across
many sites in plenty of industries
And I’d love for YOU to implement it
So I’m going to share my full
methodology with you
slideshare.net/areejabuali
bit.ly/mozcon-areej
And just remember…
Getting the basics right is so fundamental
If you can do nothing else,
just do the tech.
Areej AbuAli
Slides: slideshare.net/areejabuali
Tweet things @areej_abuali
[MozCon 2019] How do you turn an unwieldy 2.5 million-URL website into a manageable and indexable site of just 20,000 pages? Areej will share the methodology and takeaways used to restructure a job aggregator site which, like many large websites, had huge problems with indexability and the rules used to direct robot crawl. This talk will tackle tough crawling and indexing issues, diving into the case study with flow charts to explain the full approach and how to implement it.
