OnCrawl Afterwork - How to fix index bloating issues?

Last October 25th, we organized an OnCrawl Afterwork in New York with Vincent Malischewski, SEO Manager at Purch.
He talked about index bloating, one of the most common SEO problems websites face today. It happens whenever Google indexes pages that should not be indexed, and it can affect almost any website, for example through pagination issues or when blog categories, tags, and archives are left open to indexing. In his talk, Vincent shows how to find and fix index bloating, with actionable use cases and key takeaways.

  1. 1. Speaking today: Vincent Malischewski, SEO Manager; Laura Bony, VP Sales North America
  2. 2. We help e-commerce & online media make better SEO decisions and grow their revenues by providing access to the most advanced technical SEO platform.
  3. 3. 2 founders • $4M raised in 2018 • 30 OnCrawlers • 2 labs involved • 100B logs / month • 250M URLs / week • Used in 66 countries • Best SEO Tool 2 years in a row & 7 others
  4. 4. 800+ clients trust us for their daily SEO audits
  5. 5. What is Index Bloating? When a website has pages in the search engine’s index that should not be indexed. Examples: • Automated pages: internal search pages, pagination, archives, author pages, etc. • Low-quality content: outdated, thin, UGC, etc.
  6. 6. Why is it an issue? “From our point of view, our quality algorithms do look at the website overall, so they do look at everything that’s indexed.” (Webmaster Trends Analyst @Google)
  7. 7. Index Bloating @Purch (and @ any other company that generates a lot of content) Some of our websites are 15+ years old. There is A TON of outdated content. in 2002 Yet Google is still crawling, processing and indexing all this data. Don’t get me started on the forums…
  8. 8. Our solution: archiving - Still reachable (status code: 200) - Noindex instruction so that Google cleans the page out of its index - No longer internally linked: related links, hardcoded links, internal search
  9. 9. Other methods we’re using (even #3 ಠ_ಠ) Depending on your website, business or technical limitations, you can: • Remove the content (status code: 404 or 410; 410 will be slightly faster) • Re-arrange: merge or redirect to make stronger pages (preferred solution) • Do nothing and hope for the best (not recommended)
  10. 10. Identifying the content: This method really depends on your website. Because of the mass of content and the number of brands we’re dealing with, we used a simple rule for archiving: - Remove pages with no SEO visits over the past X months. OnCrawl will easily provide you with this data via the Data Explorer, using the logs or GA data (I’d advise using GA if there are not too many pages).
  11. 11. Identifying the content: You can identify indexed pages with a crawl. Below: the “indexability breakdown” report (segmentation is important)
  12. 12. Results: Index (Site #1): 51k → 31k
  13. 13. Results: Crawl. The log analysis will mainly show you whether Google acknowledged the change.
  14. 14. Results: Index (Site #2): 40k → 25k
  15. 15. Results: Crawl. The log analysis will mainly show you whether Google acknowledged the change.
  16. 16. Results: Index (Site #3): 11k → 6k
  17. 17. Results: Crawl
  18. 18. Results: Traffic. The traffic impact of SEO work (especially technical SEO) doesn’t show overnight. The goal is mostly to stay on the right side of the algorithm when major Google updates occur.
  19. 19. Watch our product tour
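
The archiving rule from slide 10 (remove pages with no SEO visits over the past X months) can be sketched in a few lines of Python. This is only an illustration, not the process used at Purch and not an OnCrawl API: the function name `archive_candidates`, the input shape (a mapping from URL to the date of its last organic visit, as you might export from logs or GA), the 30-day approximation of a month, and the sample URLs and dates are all assumptions made up for the example.

```python
from datetime import date, timedelta


def archive_candidates(last_seo_visit, today, months):
    """Return URLs with no SEO visit in the past `months` months.

    last_seo_visit maps each URL to the date of its most recent organic
    visit, or None if no visit was ever recorded (hypothetical input shape).
    """
    # Assumption: a month is approximated as 30 days.
    cutoff = today - timedelta(days=30 * months)
    return sorted(
        url
        for url, last in last_seo_visit.items()
        if last is None or last < cutoff
    )


# Made-up sample data, standing in for an export from logs or GA.
pages = {
    "/reviews/new-laptop": date(2018, 10, 1),
    "/news/old-story-2002": date(2016, 3, 14),
    "/forum/thread-123": None,
}

candidates = archive_candidates(pages, today=date(2018, 10, 25), months=12)
print(candidates)
```

With X set to 12 months, the recently visited review stays live, while the old news story and the never-visited forum thread come out as candidates for archiving (still returning 200, with a noindex instruction, and removed from internal links). Running the selection per segment, rather than across the whole site at once, keeps it in line with the note on slide 11 that segmentation is important.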