
Behemoth SEO: Search Strategy for Huge Websites

Talk on SEO for aggregation websites such as comparison search engines, marketplaces, and classifieds platforms, covering the Panda Diet, internal linking, and more.


Behemoth SEO: Search Strategy for Huge Websites

  1. Behemoth SEO: Search Strategy for Huge Websites @pip_net – Download slides: clk.me/behemoth
  2. Philipp Klöckner, Angel Investor & Advisor, @pip_net (career timeline: 2005, 2010, 2015, 2019)
  3. Behemoth SEO: Search Strategy for Huge Websites @pip_net
  4. Most Behemoths Are Aggregation Websites with 1M+ Pages. Vertical Search Engines: • e.g. Comparison Shopping Engines (CSEs) and Meta-Search Engines • Scraping and aggregating price/fare and product information • Partly relying on affiliate data and feeds. Classifieds: • Real Estate, Cars, Jobs, Holiday Rentals, General Classifieds • Aggregating user-generated or previously published offers/ads • Content usually expires after a certain timeframe. Marketplaces: • Aggregating supply (product/service feeds) and demand at the same time • Suppliers often have several points of sale and syndicate data. Social Networks & Forums: • Vast amounts of user-generated content • Insufficient control over quality and information architecture. Most of these are "intermediaries" doing "search" and implicitly violate Google's guidelines.
  5. Advantages & Challenges of Aggregators. Advantages: • Aggregation attracts demand (users) through superior availability, assortment (choice) and competition (price) • High degree of automation • Both market sides may create lots of content, data and value • Extremely scalable and capital efficient • Consequently build network effects and moats over time… • …and become hyper-profitable and well defensible. Challenges: • Automation potentially creates billions of documents • Quality of content/inventory is extremely diverse • The Panda/Core algorithm sparked a structural decline of the whole sector • Google positions its own verticals above SERPs • Aggregators may potentially violate several Google guidelines: duplicate content (internal/external), thin content, affiliate content, indexable search.
  6. Common problem areas: thin affiliate content, duplicate content, excessive page growth, Medic, Panda, SERP-in-SERP, thin & empty pages.
  7. Useful Advice For Very Big Websites
  8. But It Has Gotten A Lot Better Recently… "…there's some really good stuff here. But there's also some really shady or iffy stuff here as well… and we don't know like how we should treat things over all. That might be the case." @JohnMu
  9. Comparison Search has been in Structural Decline for the Past Decade (chart annotation: Panda 1.0)
  10. "YOU HAVE STOLEN MY DREAMS AND MY CHILDHOOD WITH YOUR EMPTY
  11. Navigating an aggregation website through Panda
  12. PANDA HUGGER
  13. Comparison Search has been in Structural Decline for the Past Decade (chart annotation: Panda 1.0)
  14. Well… Everyone but Two Players: Idealo.de, Ladenzeile.de
  15. Classical Search Engine Optimisation Framework. Content: • Inventory • Text • Rich Media • Video • Advice • Structured Data • Tools & Apps • Interactive Content. Popularity: • Links • Mentions • Brand Search • Comp. Brand Search • Direct Type-Ins • Sharing • All available signals. Technical SEO: • Internal Linking • URL Design • Indexing • Heading Tags • Hreflang Setup • Structured Data • HTTPS/HTTP2
  16. Search Engine Optimisation Post-Panda (2011)*. Content: • Inventory • Text • Rich Media • Video • Advice • Structured Data • Tools & Apps • Interactive Content • … Popularity: • Links • Mentions • Brand Search • Comp. Brand Search • Direct Type-Ins • Sharing • All available signals. Technical SEO: • Internal Linking • URL Design • Indexing • Heading Tags • Hreflang Setup • Structured Data • HTTPS/HTTP2. User Experience: • Bounce Rate • Back To SERP • Dwell Time • Retention • Trust • Search Journey • Satisfaction of Intent; plus PageSpeed. (* Panda: major 2011 Google update, named after engineer Navneet Panda)
  17. Search Engine Optimisation Today (2019): Content, Popularity, Tech SEO, User Experience
  18. The Future of Search Engine Optimisation: C, P, T, User Experience
  19. http://clk.me/smx19
  20. Focus Areas of Concern for Huge Websites. Content: • Inventory • Text • Rich Media • Video • Advice • Structured Data • Tools & Apps • Interactive Content • … Popularity: • Links • Mentions • Brand Search • Comp. Brand Search • Direct Type-Ins • Sharing • All available signals. Technical SEO: • Internal Linking • URL Design • Indexing • Heading Tags • Hreflang Setup • Structured Data • HTTPS/HTTP2. User Experience: • Bounce Rate • Back To SERP • Dwell Time • Retention • Trust • Search Journey • Satisfaction of Intent; plus PageSpeed. (Panda: major 2011 Google update, named after engineer Navneet Panda)
  21. Today we'll learn: 1. Index Management 2. Crawl Budget Optimisation with Internal Linking 3. Making Users Happy! 4. Practice with Case Studies
  22. Theory: Typical Page Quality (Qp) over Number of Pages (np). Chart: page types ordered from highest to lowest quality – Homepage, Category, Category+Brand, Faceted Search, Thin Catalogue (low inventory), Duplicate Content page, "no results" page (x-axis: number of pages up to ~400,000; y-axis: useful → mediocre → lowest). Page Quality (Qp) can be defined as content richness, engagement, ultimately how useful the page is to the user – but also its revenue potential. PROBLEM: Since Panda (2011) this structure has become toxic.
  23. TIME FOR A PANDA DIET!
  24. Panda Diet: Let's cut some crap! Apply a quality threshold ("mediocre and better"): the ~320,000 pages below it get NOINDEX, the ~80,000 above it stay in the INDEX. Average quality jumps, and rankings follow. Page Quality (Qp) can be defined as content richness, engagement, ultimately how useful the page is to the user – but also its revenue potential.
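A minimal sketch of the threshold logic on this slide, assuming you already have a per-URL quality score (for example a blend of engagement and revenue signals); the example URLs, scores and threshold value are hypothetical:

```python
# Hypothetical sketch: split an inventory of pages into INDEX / NOINDEX
# buckets by a quality threshold and show the uplift in average quality.
from statistics import mean

# quality score per URL, 0.0 (useless) .. 1.0 (highest) -- assumed to exist
pages = {
    "/": 0.95,
    "/laptops/": 0.80,
    "/laptops/acme/": 0.55,
    "/laptops/acme/?color=red&size=13": 0.20,
    "/search?q=lptop": 0.05,          # typo search, no results
}

QUALITY_THRESHOLD = 0.40              # "mediocre and better" stays indexed

index = {u: q for u, q in pages.items() if q >= QUALITY_THRESHOLD}
noindex = {u: q for u, q in pages.items() if q < QUALITY_THRESHOLD}

print(f"index {len(index)} pages, noindex {len(noindex)} pages")
print(f"average quality before: {mean(pages.values()):.2f}")
print(f"average quality after:  {mean(index.values()):.2f}")
```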
  25. Identifying Low Quality Pages by Page-Type. Easy NOINDEX targets: • "no results" pages • few-results pages (set an item threshold) • single review pages and other low-quality UGC • bulk product pages • any duplicate pages • faceted search without search demand • out-of-stock pages • expired offers/ads • parameters, etc. If your site has more indexed pages than things on sale – you're doing it wrong!
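The "easy targets" above can be expressed as a simple rule table. This is only a sketch; the field names and the item threshold are assumptions that would come from your own catalogue data:

```python
# Hypothetical rule-based "easy NOINDEX" classifier for one page record.
MIN_RESULTS = 3  # assumed item threshold for thin result pages

def index_directive(page: dict) -> str:
    """Return 'noindex' for obvious low-quality page types, else 'index'."""
    if page.get("result_count", 0) == 0:           # "no results" page
        return "noindex"
    if page.get("result_count", 0) < MIN_RESULTS:  # thin result page
        return "noindex"
    if page.get("is_duplicate") or page.get("is_expired"):
        return "noindex"
    if page.get("is_facet") and not page.get("has_search_demand"):
        return "noindex"
    if page.get("out_of_stock"):
        return "noindex"
    return "index"

print(index_directive({"result_count": 2, "is_facet": True}))             # noindex
print(index_directive({"result_count": 40, "has_search_demand": True}))   # index
```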
  26. ME DOING THE PANDA DIET
  27. Identifying Low Quality Pages: Data-Driven Approach. Data to support page quality decisions: • revenue distribution on landing pages (Google Analytics) • engagement and commercial metrics per page-type • conversion rate related to inventory count • demand data (search volume, PPC traffic, navigational traffic) • "indexation gap" (sitemaps, submitted vs. indexed) • crawling activity (server logs). Hint: consider using de-indexing sitemaps to accelerate the Panda Diet.
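A hedged sketch of the data-driven variant: join a (hypothetical) export of organic landing-page metrics with crawl-log hits, flag URLs that never earned a visit, revenue or a crawl, and write them into a temporary "de-indexing" sitemap so the freshly added noindex directives get recrawled faster. File name, column names and thresholds are assumptions:

```python
# Sketch: flag low-value URLs from analytics/log exports and emit a
# temporary sitemap of (already noindexed) URLs to speed up recrawling.
import csv
import xml.etree.ElementTree as ET

def low_value_urls(metrics_csv: str) -> list[str]:
    """Columns assumed: url, sessions_12m, revenue_12m, crawl_hits_30d."""
    flagged = []
    with open(metrics_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            if (int(row["sessions_12m"]) == 0
                    and float(row["revenue_12m"]) == 0.0
                    and int(row["crawl_hits_30d"]) == 0):
                flagged.append(row["url"])
    return flagged

def write_deindex_sitemap(urls: list[str], path: str = "deindex-sitemap.xml"):
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

# urls = low_value_urls("landing_page_metrics.csv")  # hypothetical export
# write_deindex_sitemap(urls)
```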
  28. Theory: Typical Page Quality (Qp) over Number of Pages (np). Truth is: this curve doesn't look like this…
  29. Theory: Typical Page Quality (Qp) over Number of Pages (np). Truth is: this curve doesn't look like this… BUT more like THIS!
  30. Theory: ACTUAL Page Quality (Qp) over Number of Pages (np). …ACTUALLY, it looks like THIS!
  31. Theory: ACTUAL Page Quality (Qp) over Number of Pages (np). These pages typically: • never saw a visit, nor any conversions (GA organic landing pages) • aren't crawled any longer, as Google won't rank them anyway (server logs) • are not being considered for indexation (GSC sitemaps monitor). While 100% of your revenue is here!
  32. A Proper Cut: Extreme Panda Diet
  33. The Result of Removing 997 out of 1,000 Pages
  34. Dev Fuckup
  35. How To Deal With Duplicate Content. Reliable solutions: 1. Avoid it! Internally and externally (double serving, affiliate content, syndication) 2. Identify it! (Ryte reports, "quotation searches", HTML Improvements in GSC, etc.) 3. Rewrite or enrich content 4. NOINDEX 5. Enforce the canonical URL via 301 (lookup, fix, truncate – "Canonical for Adults") (http://example.com/landing/?page=2&affID=anet ==301==> https://www.example.com/landing/). Post & Pray solutions (these might or might not work perfectly): 1. Canonical tag 2. GSC parameter handling 3. Robots.txt
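As a sketch of option 5 (enforcing the canonical via 301), a normalisation function that forces the https/www host and strips tracking parameters. The parameter blacklist and canonical host are assumptions, and whether pagination parameters can be dropped as well (as in the slide's example) depends on your own URL design:

```python
# Sketch: compute the canonical form of a request URL; if it differs,
# the web server / framework should answer with a 301 to the result.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

STRIP_PARAMS = {"affid", "utm_source", "utm_medium", "utm_campaign", "ref"}  # assumed
CANONICAL_HOST = "www.example.com"                                           # assumed

def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in STRIP_PARAMS]
    return urlunsplit(("https", CANONICAL_HOST, parts.path or "/",
                       urlencode(query), ""))

requested = "http://example.com/landing/?page=2&affID=anet"
target = canonical_url(requested)
if target != requested:
    print(f"301 {requested} -> {target}")  # here: https://www.example.com/landing/?page=2
```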
  36. Bot Recognition (Switch): serve a crawling-friendly website to bots and the fully functional website to users. Tip: surf Amazon side-by-side as Googlebot vs. a real user.
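A sketch of the recognition half of such a switch. Matching the user agent alone is easy to spoof, so Google documents verifying crawlers via reverse plus forward DNS; the helper below goes beyond what the slide shows and assumes you can afford (and would cache) the DNS lookups:

```python
# Sketch: decide whether a request comes from (verified) Googlebot so the
# crawl-friendly variant of the page can be served.
import socket

def is_verified_googlebot(user_agent: str, ip: str) -> bool:
    if "googlebot" not in user_agent.lower():
        return False
    try:
        host = socket.gethostbyaddr(ip)[0]               # reverse DNS
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]    # forward-confirm
    except OSError:
        return False

# is_verified_googlebot("Mozilla/5.0 ... Googlebot/2.1 ...", "66.249.66.1")
```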
  37. If Noindex: Consequently "Orphanize" Pages (diagram: Home, One, Two, Three)
  38. If Noindex: Consequently "Orphanize" Pages (diagram: Home, One, Two, Three – NOINDEX)
  39. If Noindex: Consequently "Orphanize" Pages. Viable solutions for link removal: • Nofollow • Dynamic serving ("cloaking") • Client-side JS • PRG pattern • Forms/buttons
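One of the listed options, the PRG (POST-Redirect-GET) pattern, sketched with Flask as a stand-in framework: facet links are rendered as POST forms, the handler redirects to the filtered listing, and since crawlers don't submit forms the noindexed facet URLs stay effectively orphaned. Route and parameter names are hypothetical:

```python
# Sketch: PRG pattern -- facet filters are submitted via POST, so Googlebot
# never discovers the filtered URLs as plain <a href> links.
from flask import Flask, redirect, request, url_for

app = Flask(__name__)

@app.route("/category/<slug>")
def category(slug):
    color = request.args.get("color")
    return f"Listing for {slug}" + (f", filtered by {color}" if color else "")

@app.route("/filter", methods=["POST"])
def apply_filter():
    # 303 "See Other": browsers follow it with GET; crawlers that do not
    # submit forms never reach this endpoint, so the facet URL stays hidden.
    params = {k: v for k, v in request.form.items() if k != "slug" and v}
    return redirect(url_for("category", slug=request.form["slug"], **params),
                    code=303)
```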
  40. Get Rid Of Pagination (Entirely). Pagination best practice: • Pagination is a stupid offline concept • More items, fewer pages, fewer problems • Users like comprehensive pages (A/B test it) • NOINDEX pagination if possible • Remove links to those pages • No pagination pages – no problems • Make sure discovery remains intact. (No one, ever…)
  41. This useless shit… gone (for bots at least): social profile links, locale selector. Keep these on your homepage or About Us page, but not on every page. (If they are important for the user, why are they in the footer?)
  42. Product Detail Pages
  43. Even Product/Offer Detail Pages Might Be Low-Quality (chart: ~0.1% of pages)
  44. Case Study: How to identify the least valuable pages?
      1. Out-of-stock handling (OoS pages generate lots of HTML pages and poor UX): if OoS for good, 301 to the most similar page (parent category), or 410 if there is no alternative; if potentially restocked, keep the page alive (200) and offer a restock alert and/or alternatives.
      2. Faceted search (filters) & indexable site search: set a minimum item threshold to define a "good" search result page that doesn't look like a SERP; build clusters where possible (typos, plurals, refined queries, entities); apply quality thresholds (dwell time, bounce rate, conversion) to SERP-in-SERP pages (indexed internal search).
      3. Pagination: show more items per page (3x more items = 1/3 of the pages); the best solution for pagination is no pagination.
      4. PDP (product detail page) reduction: get better at understanding shelf huggers and bestsellers using your data; advanced: predict page performance with machine learning (OEM, price, category, attributes, etc.); merge variants into master products (sizes, patterns, colors, etc.).
      5. Reviews & FAQ: use overview pages for reviews & questions; don't index single pieces of content.
      6. Don't build a self-fulfilling prophecy: allow triggers for re-indexation (PPC traffic, navigational demand, etc.).
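A sketch of the out-of-stock branch from point 1 above; the status codes follow the slide (301 to the closest parent if the product is gone for good, 410 if there is no alternative, 200 plus restock alert if it may come back), while the product fields are hypothetical:

```python
# Sketch: choose the HTTP response for a product page based on stock state.
from typing import Optional, Tuple

def oos_response(product: dict) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target_or_None)."""
    if product["in_stock"]:
        return 200, None                         # nothing to do
    if product.get("restock_expected"):
        return 200, None                         # keep page, show restock alert
    parent = product.get("parent_category_url")  # most similar page
    if parent:
        return 301, parent                       # gone for good -> redirect
    return 410, None                             # gone, no alternative

print(oos_response({"in_stock": False, "parent_category_url": "/laptops/"}))
# -> (301, '/laptops/')
```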
  45. Internal Search Makes Inventory Accessible – Million $ Mistake
  46. Internal Search Makes Inventory Accessible
  47. Put Your Site Search In A Prominent Place!
  48. Case Study: How to identify the least valuable pages?
  49. Pinterest: Dupe Content Clusterfuck – https://www.pinterest.com/pin/554083560398205192/ https://www.pinterest.de/pin/554083560398205192/ https://www.pinterest.at/pin/554083560398205192/ https://www.pinterest.fr/pin/554083560398205192/ https://www.pinterest.es/pin/554083560398205192/ https://www.pinterest.pt/pin/554083560398205192/ https://www.pinterest.se/pin/554083560398205192/ https://www.pinterest.dk/pin/554083560398205192/ https://www.pinterest.no/pin/554083560398205192/ https://www.pinterest.ch/pin/554083560398205192/ https://www.pinterest.ie/pin/554083560398205192/ https://www.pinterest.id/pin/554083560398205192/ https://www.pinterest.it/pin/554083560398205192/ https://www.pinterest.ru/pin/554083560398205192/ + 2 dozen more locales…
  50. Pinterest: Internationalization
  51. RE-PINS – Adding Insult To Injury! https://www.pinterest.de/pin/241013017546674029/ and https://www.pinterest.de/pin/243475923592500876/ – both INDEXABLE, and each re-pin gets a new URL!
  52. Pinterest: Millions of Dead Files (Boards, Pins, Home, Fave Places, My Style – INDEXABLE)
  53. Quick Reminder (10 Years Ago…): 2009!
  54. Master of Soft 404s
  55. Case Study: How to identify the least valuable pages? 1. Facebook Index Coverage: accessibility vs. page quality 2. Inactive/empty groups, pages, users, places 3. Privacy-aware users (or create an incentive to share publicly to improve landing page value) 4. Use engagement as a quality metric for post URLs (doesn't get much better than this) 5. Marketplace (see Advanced Panda Diet) 6. Expired events 7. …
  56. Case Study: How to identify the least valuable pages? 1. Facebook Index Coverage: accessibility vs. page quality 2. Inactive/empty groups, pages, users, places 3. Privacy-aware users (or create an incentive to share publicly to improve landing page value) 4. Use engagement as a quality metric for post URLs (doesn't get much better than this) 5. Marketplace (see Advanced Panda Diet) 6. Expired events 7. …
  57. Crawling Efficiency & Internal Linking (links from GSC or crawling tools)
  58. Crawling Efficiency & Internal Linking
  59. Balance: Algorithmic Internal Linking for 1,000 Pages. First tier – top 10 (this class of pages gets 1,000 links each): 1. New York 2. London 3. Paris 4. Rome 5. Amsterdam 6. Milan 7. Barcelona 8. Prague 9. Dublin 10. Berlin. Second tier – random 10 out of the top 100 (this class gets 100 links each): 1. Munich 2. Warsaw 3. Madrid 4. Copenhagen 5. Stockholm 6. San Francisco 7. Toronto 8. Hamburg 9. Rio de Janeiro 10. Cairo. Third tier – random 10 out of the top 1,000 (this class gets 10 links each): 1. Seattle 2. Marrakesh 3. Sofia 4. Wroclaw 5. Helsinki 6. Vancouver 7. Hanover 8. Marseille 9. Alicante 10. Edinburgh. • Shuffle containers 2+3, but keep them static per page • Include a relevance score/silos/topical proximity to improve UX
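A sketch of the tiering above: every page always links the global top 10, plus a random 10 out of the top 100 and a random 10 out of the top 1,000, seeded with the page's own URL so the selection is shuffled across pages but stays static per page, as the slide recommends. The ranked list itself is assumed to come from your own popularity data:

```python
# Sketch: build the three link containers for one page, with a per-page
# deterministic shuffle for tiers 2 and 3.
import random

def link_containers(page_url: str, ranked_pages: list[str]):
    """ranked_pages: all link targets, best first (e.g. by popularity)."""
    rng = random.Random(page_url)                   # stable seed per page
    tier1 = ranked_pages[:10]                       # every page links the top 10
    tier2 = rng.sample(ranked_pages[10:100], 10)    # 10 random out of the top 100
    tier3 = rng.sample(ranked_pages[100:1000], 10)  # 10 random out of the top 1,000
    return tier1, tier2, tier3

cities = [f"/city-{i:04d}/" for i in range(1000)]   # stand-in ranked list
t1, t2, t3 = link_containers("/city-0042/", cities)
```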
  60. Fix Internal Linking Using Bestseller Lists: 1. Standard sorting: popularity 2. Dynamic bestseller lists for prioritization 3. "New Arrivals" for discovery 4. Related products for completeness 5. Breadcrumb for bottom-up prioritization 6. Prioritization via sitemap: ask Santa about it!
  61. SEO Efficiency™ – the key to extremely big websites: trim them for efficiency! (chart annotations: 100x, 2200x)
  62. THANK YOU!
  63. Frequently Asked Questions. Q: How is this not cloaking? A: 1. It doesn't alter the user experience 2. It only makes Google's job easier 3. Take a look at Amazon, bro. Q: I'm afraid I could lose all my long-tail revenue. *mimimi* A: 1. There's usually no data confirming the long tail 2. Rankings are usually not lost but substituted by superior pages 3. Google actually prefers pages with good UX over the most specific result (Hummingbird/RankBrain instead of a perfect title string match). Q: Should I remove all those pages in one drastic move? Wouldn't Google see that as a weakness? A: It's always a good time to do the right thing! Q: Should I really dynamically switch/flap index directives? A: I think you should. See above. Q: How does Googlebot discover my content without pagination? A: If you need pagination for discovery, you've got bigger fish to fry. Seriously…
  64. What to remember… 1. We've been doing this for 10 years now (pre-Panda) and it has never backfired 2. This is most important if your website has more than 100,000 pages 3. Index bloat: millions of indexed HTML documents are not an asset but a liability – indexing everything is inefficient by definition 4. 80% (actually 95%) of your website is usually dead weight, and it's pulling down your best pages 5. Analyse your potential with an organic landing page report 6. There's no black and white, but a reasonable amount of grey, which should be defined by data 7. Non-transactional content is (most likely) overrated (Inventory = Content)
