In this talk, David will tell the story of Trainline’s multi-year crawlability project, code-named Black Widow. Find out why millions and millions of pages were deleted, the results that followed and what we learnt along the way.
4. 4 November 2021
@SearchLDN 4
David Lewis
Senior Technical SEO Manager @ Trainline
Little over 3.5 years at Trainline, 6.5 years in SEO
Advocate of personal development
Keen meditator
Guitar player
Big sports fan
https://www.linkedin.com/in/david-lewis-48462751/
5. About Trainline
Trainline is the leading independent rail and coach travel platform selling rail and coach tickets to millions of travellers worldwide. Via our highly rated website and mobile app, people can seamlessly search, book and manage their journeys all in one place. We bring together millions of routes, fares and journey times from more than 270 rail and coach carriers across 45 countries. We offer our customers the best price for their journey and smart, real-time travel information on the go. Our aim is to make rail and coach travel easier and more accessible, encouraging people to make more environmentally sustainable travel choices.
i. Based on transactions from weeks commencing 06/10/2019 to 23/11/2019
ii. iOS rating from UK AppStore as of 29 November 2019
iii. Trainline data for November 2019 (incl. UK consumer number Web/App and EU Web/App)
iv. Average monthly visits for year ended 29 February 2020
v. UK consumer mobile app transactions for the year ended 29 February 2020
Trainline is the world’s leading independent rail and coach travel platform
We sell tickets on behalf of more than 270 rail and coach companies and are adding more all the time
We sell train and coach tickets worldwide, helping our customers travel in and across 45 countries in Europe and the rest of the world
We sell tickets to people living in more than 175 countries (i) in 14 languages
Our customers can book in 10 currencies and payment methods include Apple Pay, PayPal, SOFORT & iDEAL
We have built Europe’s leading train and coach app – with a 4.9/5* rating (ii)
There are more than 32.4m cumulative downloads of our app (iii)
Our platforms host more than 90m visits per month (iv)
76% of our transactions are mobile (v)
Our team numbers more than 600 people and more than 40 nationalities, including 300+ travel tech specialists and engineers
>160 train companies
>110 coach companies
34. 86% of all deleted pages were de-indexed in 6 weeks
[Chart: Indexed vs. De-indexed - Orphaned Pages. Y-axis: percentage indexed (0% to 100%); x-axis: date. Series: % indexed, % de-indexed. Annotations: sitemaps removed from GSC; first sitemap data in GSC.]
36. Googlebot crawl share for valuable pages improved from 64% to a peak of 87%
[Chart: Googlebot Crawl Share Over Time. Y-axis: Googlebot crawl share percentage (0% to 100%); x-axis: date, monthly from Jan-19 to Sep-21. Annotations: Stage 1: Orphaning and Internal Linking; Stage 2: 410s.]
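Crawl share can be read as the fraction of Googlebot requests that land on valuable pages. The talk does not say how Trainline classified pages or processed its logs, so the following is only a minimal Python sketch: the `crawl_share` function, the `is_route_page` rule and the sample paths are all hypothetical.

```python
from typing import Callable, Iterable


def crawl_share(paths: Iterable[str], is_valuable: Callable[[str], bool]) -> float:
    """Return the fraction of crawled paths that count as valuable (0.0 to 1.0)."""
    total = 0
    valuable = 0
    for path in paths:
        total += 1
        if is_valuable(path):
            valuable += 1
    # Avoid dividing by zero when the log sample is empty.
    return valuable / total if total else 0.0


# Hypothetical classification rule, for illustration only.
def is_route_page(path: str) -> bool:
    return path.startswith("/routes/")


# Hypothetical paths requested by Googlebot, e.g. extracted from server logs.
sample_paths = ["/routes/a-to-b", "/routes/c-to-d", "/old/page-1", "/old/page-2"]
share = crawl_share(sample_paths, is_route_page)
print(f"Crawl share for valuable pages: {share:.0%}")
```

Tracking this ratio per month is one way to produce a trend line like the one on this slide.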
37. We’ve observed a 238% increase in URLs crawled by Googlebot and Botify on a monthly basis from where we started
[Diagram: URL counts for April 2020, November 2020 and August 2021. Values shown for April 2020 and November 2020: 2 million, 1.7 million, 1.7 million, 800k, 2.6 million, 1.7 million URLs. Values shown for August 2021: 900k, 2.7 million, 1.7 million URLs.]
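The 238% figure is consistent with one reading of the numbers on this slide: growth from 800k URLs to 2.7 million URLs. That pairing is an assumption, since the slide text does not state which two values were compared. Under that assumption, the arithmetic can be sketched in Python as:

```python
def percent_increase(before: float, after: float) -> float:
    """Return the percentage increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100


# Assumed pairing: 800k URLs at the start, 2.7 million URLs at the end.
increase = percent_increase(800_000, 2_700_000)
print(f"{increase:.1f}% increase")
```

This gives 237.5%, which matches the quoted 238% after rounding.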
38. For previously orphaned URLs that were internally linked to for the first time, we improved their average position by 2, jumping from P8 to P6
41. Sitemaps
It’s not enough to just remove sitemaps from GSC to prevent crawling; you must delete the files too
https://www.searchenginejournal.com/google-deleting-a-sitemap-wont-stop-us-crawling-your-site/409108/
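One way to confirm that sitemap files are really gone, rather than only removed from GSC, is to request each sitemap URL and check the HTTP status. The sketch below is a hedged illustration, not the process used at Trainline: the function names and the example.com URLs are hypothetical, and the demo uses made-up statuses so it runs without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

# Statuses that indicate the file is no longer served.
GONE_STATUSES = {404, 410}


def http_status(url: str) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return error.code


def sitemaps_still_served(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the sitemap URLs that do not respond with 404 or 410."""
    return [url for url in urls if fetch_status(url) not in GONE_STATUSES]


# Offline demo with hypothetical URLs and made-up statuses.
fake_statuses = {
    "https://example.com/sitemap-routes.xml": 200,
    "https://example.com/sitemap-old.xml": 410,
}
print(sitemaps_still_served(fake_statuses, fake_statuses.__getitem__))
```

In real use, calling `sitemaps_still_served(urls)` with the default `http_status` fetcher would issue live requests. Note that `urlopen` follows redirects, so a redirected sitemap is reported by its final status.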
42. Sitemaps
Or so we thought… Deleting the sitemap files still isn’t enough, because of Google’s long-term memory
https://www.searchenginejournal.com/googlebot-404/239325/
43. 410s – de-indexation
We expected the 410s to be de-indexed within 48 hours, but didn’t account for the time it would take Google to crawl through all the URLs
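For context, a 410 (Gone) response tells a crawler that a page was removed deliberately, but the crawler only learns this when it next requests the URL. The following is a minimal sketch of serving 410 for deleted paths with Python's standard `http.server`. It is not Trainline's implementation, and `DELETED_PREFIXES`, `status_for` and `run` are hypothetical names.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URL prefixes for pages that were deleted on purpose.
DELETED_PREFIXES = ("/old-routes/",)


def status_for(path: str) -> HTTPStatus:
    """Return 410 Gone for deliberately deleted paths, 200 OK otherwise."""
    if path.startswith(DELETED_PREFIXES):
        return HTTPStatus.GONE
    return HTTPStatus.OK


class DeletedPagesHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = status_for(self.path)
        body = b"Gone\n" if status is HTTPStatus.GONE else b"OK\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Start a local demo server; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), DeletedPagesHandler).serve_forever()


print(int(status_for("/old-routes/a-to-b")), int(status_for("/routes/a-to-b")))
```

Calling `run()` starts the server locally, after which a request to a path under `/old-routes/` returns 410.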
Set the scale – number of pages, route combinations
Crawl rate limit = how fast/slow are the pages loading? Is the crawler running into errors? Is there a limit set in GSC?
Crawl demand = how popular are these pages? How fresh/stale are these pages?
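The notes above suggest a rough way to reason about how long a crawler takes to revisit a large set of URLs: divide the URL count by the number of those URLs crawled per day. The sketch below uses made-up numbers (neither figure comes from the talk) and ignores crawl demand, which in practice means unpopular or stale pages may wait longer.

```python
import math


def days_to_recrawl(url_count: int, urls_crawled_per_day: int) -> int:
    """Estimate the days needed to crawl every URL once at a constant daily rate."""
    if urls_crawled_per_day <= 0:
        raise ValueError("urls_crawled_per_day must be positive")
    return math.ceil(url_count / urls_crawled_per_day)


# Hypothetical scale: 10 million URLs, 250,000 of them crawled per day.
print(days_to_recrawl(10_000_000, 250_000), "days")
```

Even at a steady rate, a site with millions of URLs would take weeks rather than hours to be fully recrawled under these assumptions.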
One-liner on what Botify does and why it’s important
Mention shared goals with devs
How did we get buy-in
First two columns are customer led
Fine for prerecord
Display differently for in-person