You deleted how many pages? 130M and here's why - brightonSEO Autumn 2021
Sep. 1, 2021
In this talk, David will tell the story of Trainline’s multi-year crawlability project, code-named Black Widow. Find out why millions and millions of pages were deleted, the results that followed and what we learnt along the way.
4. 10 September 2021
#brightonSEO 4
David Lewis
Senior Technical SEO Manager @ Trainline
3.5 years at Trainline, 6 years in SEO
Advocate of personal development
Keen meditator
Guitar player
Big sports fan
https://www.linkedin.com/in/david-lewis-48462751/
5. About Trainline
Trainline is the leading independent rail and coach travel platform selling rail and coach tickets to millions of travellers worldwide. Via our highly rated website and mobile app, people can seamlessly search, book and manage their journeys all in one place. We bring together millions of routes, fares and journey times from more than 270 rail and coach carriers across 45 countries. We offer our customers the best price for their journey and smart, real time travel information on the go. Our aim is to make rail and coach travel easier and more accessible, encouraging people to make more environmentally sustainable travel choices.
i. Based on transactions from weeks commencing 06/10/2019 to 23/11/2019
ii. iOS rating from UK AppStore as of 29 November 2019
iii. Trainline data for November 2019 (incl. UK consumer number Web/App and EU Web/App)
iv. Average monthly visits for year ended 29 February 2020
v. UK consumer mobile app transactions for the year ended 29 February 2020
Trainline is the world’s leading independent rail and coach travel platform
We sell tickets on behalf of more than 270 rail and coach companies and are adding more all the time
We sell train and coach tickets worldwide, helping our customers travel in and across 45 countries in Europe and the rest of the world
We sell tickets to people living in more than 175 countries (i) in 14 languages
Our customers can book in 10 currencies and payment methods include Apple Pay, PayPal, SOFORT & iDEAL
We have built Europe’s leading train and coach app, with a 4.9/5* rating (ii)
There are more than 32.4m cumulative downloads of our app (iii)
Our platforms host more than 90m visits per month (iv)
76% of our transactions are mobile (v)
Our team numbers more than 600 people and more than 40 nationalities, including 300+ travel tech specialists and engineers
>160 train companies
>110 coach companies
36. Googlebot crawl share for valuable pages improved from 64% to 87%
[Chart: Googlebot Crawl Share Over Time. Y-axis: Googlebot crawl share percentage (0% to 100%). X-axis: date (Jan-19 to Jul-21). Annotations: Stage 1: Orphaning and Internal Linking; Stage 2: 410s]
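A crawl-share figure like the one charted here can be read as the fraction of Googlebot requests that land on pages the site considers valuable. The talk does not say how Trainline computed it, so the following is only a minimal sketch under assumptions: the `crawl_share` helper, the `/train-times/` rule for what counts as valuable, and the sample hits are all hypothetical.

```python
from collections import Counter


def crawl_share(hits, is_valuable):
    """Return the fraction of crawled URLs that the predicate marks as valuable."""
    counts = Counter(bool(is_valuable(url)) for url in hits)
    total = counts[True] + counts[False]
    if total == 0:
        return 0.0  # no Googlebot hits in the period
    return counts[True] / total


def is_valuable(url):
    # Hypothetical rule: only URLs under /train-times/ are worth crawling.
    return url.startswith("/train-times/")


# Hypothetical Googlebot requests, e.g. paths extracted from server logs.
googlebot_hits = [
    "/train-times/london-to-leeds",
    "/train-times/york-to-bath",
    "/old-page?id=1",
    "/train-times/london-to-leeds",
]

share = crawl_share(googlebot_hits, is_valuable)
print(f"{share:.0%}")  # 3 of the 4 hits are valuable
```

Tracking this number per month gives a series comparable in shape to the chart, although the definition of a valuable page is a site-specific decision.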
37. 3X more URLs are now crawled by Googlebot and Botify on a monthly basis than when we started
[Figure: URL counts for April 2020, November 2020 and July 2021; the labels for individual values did not survive extraction. Values in extracted order, April 2020 and November 2020: 2 million, 1.7 million, 1.7 million, 800k, 2.6 million, 1.7 million. July 2021: 1 million, 2.5 million, 1.7 million]
38. For previously orphaned URLs that were internally linked to for the first time, we improved their average position by 2, jumping from P8 to P6
41. Sitemaps
It’s not enough to just remove sitemaps from GSC to prevent crawling; you must delete the files too
https://www.searchenginejournal.com/google-deleting-a-sitemap-wont-stop-us-crawling-your-site/409108/
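One way to confirm that retired sitemap files are really gone, rather than only removed from GSC, is to check what the server still returns for them. The sketch below is an assumption-laden illustration, not the method used in the talk: `fetch_status` and `still_served` are hypothetical helpers, the `example.com` URLs are placeholders, and the statuses are hard-coded so the example runs without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for a URL (performs a real network request)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return error.code


def still_served(statuses):
    """Return the URLs whose status is neither 404 nor 410, sorted."""
    return sorted(url for url, status in statuses.items() if status not in (404, 410))


# Placeholder data. In practice this could be built with
# {url: fetch_status(url) for url in retired_sitemap_urls}.
statuses = {
    "https://example.com/sitemap-old-1.xml": 200,
    "https://example.com/sitemap-old-2.xml": 404,
    "https://example.com/sitemap-old-3.xml": 410,
}

print(still_served(statuses))
```

Any URL this reports is a sitemap file that a crawler can still fetch. Note that `urlopen` follows redirects, so a retired sitemap that redirects to a live page would also show up as still served.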
42. Sitemaps
Or so we thought… Deleting the sitemap files still isn’t enough, because of Google’s long-term memory
https://www.searchenginejournal.com/googlebot-404/239325/
43. 410s – de-indexation
We expected the 410s to be de-indexed within 48 hours, but didn’t account for the time it would take Google to crawl through all the URLs
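The gap between the 48-hour expectation and reality comes down to simple arithmetic: a URL can only be de-indexed after Google has recrawled it and seen the 410. As a rough, best-case sketch, the estimate below divides the number of removed URLs by a daily crawl rate. The 130M figure is taken from the talk title as an upper bound on removed URLs; the crawl rate of 1,000,000 URLs per day is purely hypothetical and is not a figure from the talk.

```python
import math


def days_to_recrawl(url_count, crawls_per_day):
    """Best-case days for a crawler to visit every URL once at a fixed daily rate."""
    if crawls_per_day <= 0:
        raise ValueError("crawls_per_day must be positive")
    return math.ceil(url_count / crawls_per_day)


removed_urls = 130_000_000        # from the talk title (upper bound)
assumed_daily_crawls = 1_000_000  # hypothetical rate, not from the talk

print(days_to_recrawl(removed_urls, assumed_daily_crawls))  # 130
```

Even under this optimistic assumption, where every crawl hits a removed URL that has not been revisited yet, the process takes months rather than days.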