Large Site SEO Architecture - #BrightonSEO 2015

An overview of the challenges of large site SEO architecture and a case for a new pattern of developing the web: "Destination Oriented Architecture". It is followed by the proposed measurement framework of "Destination to Crap Ratios" and a set of technical examples of applying these ideas.

Published in: Marketing

  1. @earnedMarketing EPISODE WASTE
  2. Tomas Vaitulevicius (@earnedMarketing), Head of Digital Marketing @ JustPark. [Two charts, both labelled "SEO Traffic": one with an axis from 0 to 2,000,000, one with an axis from 0 to 25,000,000]
  3. Big thanks for help in putting this content together: Richard Baxter (@richardbaxter), Benjamin Johnson (@d00berry), Dean Rowe (+DeanRoweSEO)
  4. SEO ARCHITECTURE
  5. Devising an SEO architecture follows a very similar set of steps at websites of all sizes: Search Demand, Topic Coverage, Top of Class Content, Dedicated Pages, Flat Prioritised Linking, Monitoring
  6. But at large ones there are a number of other SEO complications that need to be dealt with. This deck focuses primarily on Waste (of crawl budget, Google index and internal link equity). [Diagram repeats the steps: Search Demand, Topic Coverage, Top of Class Content, Dedicated Pages, Flat Prioritised Linking, Monitoring]
  7. SEO TOOLKIT (Small Sites). Another difference between small and large site SEO architecture is that the basic SEO tools… Nofollow: keep Googlebot away from parts of site. Robots.txt: keep Googlebot away from parts of site. Noindex: get parts of site out of Google’s index. Canonical: solve duplicate content issues.
  8. SEO TOOLKIT (Small Sites) vs SEO BAND AIDS (Large Sites). …become pretty damaging at scale. Nofollow: on small sites, keep Googlebot away from parts of site; on large sites, burn internal link equity. Robots.txt: on small sites, keep Googlebot away from parts of site; on large sites, burn internal link equity & block inbound link equity. Noindex: on small sites, get parts of site out of Google’s index; on large sites, waste crawl budget and burn 15% of link equity. Canonical: on small sites, solve duplicate content issues; on large sites, waste crawl budget, burn 15% of link equity & add uncertainty.
  9. This matters because PageRank is still the foundation of Google’s crawl and indexation
  10. Let’s say we have a small website. [Diagram: Home, Category, Search, Listing]
  11. With 1 unit of PageRank arriving at the homepage and cascading down through links. [Diagram: Home, Category, Search, Listing; values shown: 1, 0.85, 0.72, 0.61]
  12. If we add a couple of links patched with the SEO band-aids (nofollow or Robots.txt Disallow), we’ll make half of the link equity of Category and Search pages evaporate from our site. [Diagram: Home, Category, Search, Listing plus two dead-ends; values shown: 1, 0.85, 0.36, 0.15]
  13. Making the listing page 75% weaker. Inefficiencies like these are killing large site SEO, as pages with little PageRank don’t get crawled and indexed and so won’t get any traffic. [Same diagram as the previous slide, with the Listing page marked -75%]
  14. IT IS HARD! Huge amounts of waste and the damaging effects of SEO Band Aids make Large Site SEO Architecture pretty d*mn hard
  15. But we found inspiration in the new technology of Single Page Applications for a new approach to SEO architecture, one which fixes the problems rather than patching them up. [Diagram labels: Page Oriented Architecture, Destination Oriented Architecture, Single Page Application]
  16. PAGES vs DESTINATIONS. In Destination Oriented Architecture we want to identify the canonical pages/URLs that represent real Destinations targeting SEO Topics and “kill” all of the other publicly available URLs. Example URLs, with the "page" / "page?" label shown next to each on the slide: justpark.com/london/ (page), justpark.com/london/?page=2 (page?), justpark.com/london/?sort=price (page?), justpark.com/listing-1/ (page), justpark.com/listing-1/photos (page?), justpark.com/listing-1/save (page?), justpark.com/listing-1/enquire (page?), justpark.com/listing-1/book (page?), justpark.com/forgot-password (page?)
  17. 1 SEO Topic = 1 Destination. 1 Destination = 1 SEO Topic. No SEO Topic = No Destination. We want to have as many distinct Destinations as we have SEO Topics we’re targeting, with all the supplementary content and functionality living within these Destinations
  18. DESTINATION TO CRAP RATIOS. We use Destination to Crap ratios to gauge how well we’re doing on the journey to a fully Destination Oriented Architecture (they’re also helpful in getting buy-in from the different stakeholders, as no one wants to think of their platform as being 80% crap or waste). [Chart: Crawl, Index, Internal links and Usage, each on a 0% to 100% scale]
  19. METHODOLOGY. Crawl: the split of Googlebot crawl hits in your access logs between (exact) destination URLs and everything else. Index: all of your destinations should be in the sitemaps you submit to Google Search Console, so the Index ratio is Indexed Destinations (GSC > Crawl > Sitemaps) vs the remainder, i.e. Total Indexed (GSC > Google Index > Index Status) minus Indexed Destinations. Internal Links: all internal links from a web crawl (Screaming Frog, Deep Crawl, etc.), split between the ones pointing to (exact) destination URLs and everything else. Usage: page views of your users (web analytics), split between (exact) destination URLs and everything else.
  20. REAL WORLD EXAMPLES
  21. SUPPLEMENTARY CONTENT. Examples: rightmove.co.uk/fees.html?listing_id=165467654, justpark.com/parking-spaces/…/callout-snippet/. Fixes: with JS off, host the content within a relevant destination and link with in-page anchors; JS span trigger for a preloaded or AJAX lightbox. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  22. SESSION PARAMETERS. Example: rightmove.co.uk/property/London.html/svr/2124;jsessionid=9BE1415794CEDC5590B1FA11B8817DE0. Fixes: exclude for bots; move to cookies; go stateless (in extreme circumstances carrying the state in hidden fields of a POST form). Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  23. ALTERNATIVE URLs. Example: http://ww.just-park.co.uk/uk/parking/London >> https://www.justpark.com/uk/parking/london/. Fixes: catch-all 301 redirects in server config, covering http <> https; non-www <> www and unrecognised subdomains; upper case > lower case; no trailing slash <> trailing slash. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  24. FORWARDING PARAMETERS. Examples: instagram.com/accounts/login/?next=%2Fabout…, distilled.net/store/profile/login/?next=/resources/. Fixes: hash parameter for JS; HTTP Referrer; cookies / LocalStorage; lightbox login form. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  25. TRACKING PARAMETERS. Examples: ufc.com/fightweek?utm_campaign=Intl+Fight+…, ted.com/?utm_medium=email&utm_source=Oxford…. Fixes: special URL tracking redirect loop; hash (#) parameter based traffic source tracking. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  26. FUNCTIONAL PARAMETERS. Example: justpark.com/london/…/garden-car-park/?start_date=2015-08-16&end_date=2015-08-16&start_time=…. Fixes: omit on default; server session (but better not); hash parameter for JS; cookies / LocalStorage. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  27. PRINT VERSION. Examples: worldbank.org/…/modules/economic/gnp/print.html, rightmove.co.uk/…/print.html?listingId=47812940. Fix: print stylesheet. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  28. LOGGED-IN FUNCTIONALITY. Examples: justpark.com/…/book/?listing_id=148685&…, rightmove.co.uk/addtoshortlist.html?listing_id=478129. Fixes: logged-out version links to /login#forwarding=xxx; logged-out version uses a JS span to trigger a login lightbox; POST to the product URL; AJAX for logged-in users. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  29. SEARCH PAGINATION & FILTERS. Examples: justpark.com/uk/parking/brighton/?page=2, rightmove.co.uk/…/London.html?sortType=1. Fixes: in-page-only AJAX manipulations; cookies; hash parameters and on-load AJAX processing. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  30. DYNAMIC SEARCH URLS. Examples: gumtree.com/search?q=car&tq=%7B%22i%22%3A..., ebay.co.uk/sch/i.html?_nkw=car&_from=R40&_tr…. Fixes: canonicalising redirects; JS search form pointing to the canonical URL; AJAX search with pushState canonical URLs (SPA). Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  31. SITE-WIDE LINKS (HEADER / FOOTER). Examples: rightmove.co.uk/…/terms-of-use-and-privacy-policy, justpark.com/uk/airport-parking/. Fixes: JS span triggered AJAX lightbox; shortlisting only relevant resources by page type (homepage, search, etc.); merging multiple site-wide-linked pieces into a single location with hash deep links. Problems: Crawl Waste, Littered Index, Duplicate Content, Thin Content, Wasted Internal Link Equity, Scattered Inbound Link Equity.
  32. And, please, crawl your sites to make sure you’re not linking to URLs that redirect or canonicalise to other URLs!
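
The PageRank figures on slides 11 to 13 can be reproduced with a few lines of arithmetic. The sketch below is not from the deck: it assumes a simplified one-way model in which each page passes on 0.85 of its PageRank, split evenly across its outgoing links, and it assumes the pages form the chain Home > Category > Search > Listing (the order that matches the numbers shown). The page names, the `cascade` helper and the dead-end labels are all illustrative.

```python
DAMPING = 0.85  # assumed share of a page's PageRank passed on through its links


def cascade(outlinks, order, entry=1.0):
    """Push PageRank one way down an acyclic link graph.

    outlinks maps a page to the pages it links to. order lists the pages we
    keep score for, parents before children. Link targets missing from order
    (nofollowed or disallowed links) swallow their share of the equity.
    """
    scores = {page: 0.0 for page in order}
    scores[order[0]] = entry
    for page in order:
        targets = outlinks.get(page, [])
        if not targets:
            continue
        share = DAMPING * scores[page] / len(targets)
        for target in targets:
            if target in scores:
                scores[target] += share
    return scores


PAGES = ["home", "category", "search", "listing"]

# Slide 11: every link leads to a real page.
clean = cascade(
    {"home": ["category"], "category": ["search"], "search": ["listing"]},
    PAGES,
)

# Slide 12: Category and Search each gain one band-aided (dead-end) link.
patched = cascade(
    {
        "home": ["category"],
        "category": ["search", "dead-end-1"],
        "search": ["listing", "dead-end-2"],
    },
    PAGES,
)

loss = 1 - patched["listing"] / clean["listing"]

print({page: round(score, 2) for page, score in clean.items()})
print({page: round(score, 2) for page, score in patched.items()})
print(f"listing page loses {loss:.0%}")
```

Under these assumptions the listing page ends up with about 0.61 in the clean graph and about 0.15 in the patched one, which is where the 75% drop comes from: two dead-end links each halve the equity flowing onwards.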
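
The methodology on slide 19 boils down to counting exact matches against a list of destination URLs. A minimal sketch, assuming you have already extracted the URLs from your access logs, web crawl or analytics export: the function names, the sample hits and the index counts are made up for illustration, and treating only justpark.com/london/ and justpark.com/listing-1/ as destinations is an assumption based on slide 16.

```python
def destination_share(urls, destinations):
    """Fraction of urls that exactly match a destination URL.

    Works for any of the list-based ratios: Googlebot crawl hits,
    internal link targets, or user page views.
    """
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(1 for url in urls if url in destinations) / len(urls)


def index_share(indexed_destinations, total_indexed):
    """Fraction of the index made up of destinations.

    indexed_destinations: indexed count reported for the submitted sitemaps.
    total_indexed: total indexed count reported for the site.
    The crap part is total_indexed minus indexed_destinations.
    """
    if total_indexed <= 0:
        return 0.0
    return indexed_destinations / total_indexed


# Hypothetical destination list and Googlebot hits.
DESTINATIONS = {"justpark.com/london/", "justpark.com/listing-1/"}

googlebot_hits = [
    "justpark.com/london/",
    "justpark.com/london/?page=2",
    "justpark.com/london/?sort=price",
    "justpark.com/listing-1/",
    "justpark.com/listing-1/photos",
]

crawl_ratio = destination_share(googlebot_hits, DESTINATIONS)
index_ratio = index_share(indexed_destinations=200, total_indexed=1000)

print(f"crawl: {crawl_ratio:.0%} destination, {1 - crawl_ratio:.0%} crap")
print(f"index: {index_ratio:.0%} destination, {1 - index_ratio:.0%} crap")
```

The match is deliberately exact, as the slide specifies: a destination URL with a parameter appended counts as crap.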
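
The catch-all redirect rules on slide 23 live in server config in the deck; the Python below only sketches the logic such rules implement, so the behaviour can be checked before it is translated into config. It assumes every host variant is answered by the same server, that all destination paths are lower case and end in a trailing slash, and that `www.justpark.com` is the single canonical host. The function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.justpark.com"  # assumed single canonical host


def canonical_url(url):
    """Map any alternative URL to its one canonical form.

    Forces https, the canonical host (covers non-www and unrecognised
    subdomains), a lower-case path and a trailing slash. The query string
    is kept as-is; the fragment never reaches the server, so it is dropped.
    """
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    if not path.endswith("/"):
        path += "/"
    return urlunsplit(("https", CANONICAL_HOST, path, parts.query, ""))


def redirect_for(url):
    """Return (301, target) if the URL needs redirecting, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


print(redirect_for("http://ww.just-park.co.uk/uk/parking/London"))
print(redirect_for("https://www.justpark.com/uk/parking/london/"))
```

Computing the final target in one step means a request is redirected once rather than hopping through a chain of separate scheme, host, case and slash redirects.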
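
Slide 25 suggests carrying traffic source tracking in the hash (#) rather than in the query string, so that tracked links do not create extra crawlable URLs. One way to picture it, as a rough sketch only: rewrite a tracked URL so that its `utm_` parameters move into the fragment. The helper name and the example.com URL are hypothetical, and reading the values back out of the hash in the analytics setup is outside this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIX = "utm_"  # assumed naming convention for tracking parameters


def move_tracking_to_hash(url):
    """Move utm_* query parameters into the URL fragment.

    Non-tracking parameters stay in the query string. If the URL carries no
    tracking parameters, any existing fragment is left untouched.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracking = [(k, v) for k, v in pairs if k.startswith(TRACKING_PREFIX)]
    rest = [(k, v) for k, v in pairs if not k.startswith(TRACKING_PREFIX)]
    fragment = urlencode(tracking) if tracking else parts.fragment
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(rest), fragment)
    )


tracked = "https://example.com/page/?utm_medium=email&utm_source=newsletter&id=7"
print(move_tracking_to_hash(tracked))
```

Because the fragment is never sent to the server, every tracked variant of a link resolves to the same URL as far as crawling and indexing are concerned.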
