Panda Diet for Overweight Websites

How to keep junk pages off of your site, and remove the ones that are there, so you can avoid Google Panda.

Published in: Marketing
  • I’m from San Francisco. Glassdoor is the world’s largest user-generated content community site for jobs, companies and salaries, with over 80 million pages of jobs and user-generated content. About.com and Ask.com are both top 50 web properties in the US by traffic, and they are two of the largest online publishers in the world. Ask.com has question-and-answer content written by users and editors. About.com has articles written by experts. Each site has about 10 million pages indexed in Google.
  • It’s harder to control quality. 100 pages: You know what’s on each page. 100,000 pages: No one is checking them all. 100,000,000 pages: Would you know if 1 million of them were junk?
  • PageRank: Overall site PR helps new and existing pages. Interlinking: If you have more pages, you can get more relevant links between pages. Economies of scale: Managing larger sites is more efficient. Brand: Users prefer familiar brands in search results.
  • “No results” pages: When your site has faceted navigation, some pages have no data (e.g., no products in this category, no reviews for this restaurant, no salaries for this company). URL-based duplicates: Multiple URLs return the same content. Content-based duplicates: If you have lots of content, sometimes the same topic comes up again. Multiple versions of site, multiple countries: Duplication between versions? Empty pages in some versions?
  • Every company on Glassdoor, times every city it is located in, times salaries, reviews, or interviews, times job titles: tens of millions of pages with no results.
  • At one of the companies I worked at, we found the worst-performing 5% of pages, and we hired a team of editors to fix them.
  • Eliminate duplicate titles: Find pages with the same title (Webmaster Tools). Same or overlapping content? Canonicalize the worse one to the better one. Different content? Merge them into one content page.
  • We created a search engine index of all our pages using Solr, an open source search engine platform.
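Comparing every page against every other page does not scale, which is why an index helps: each page is scored only against pages that share terms with it. The sketch below illustrates that idea with a plain in-memory inverted index and Jaccard similarity over title words. It is a stand-in for illustration only, not the Solr setup described in the talk, and it does not use any Solr API. The function names, the 0.5 threshold, and the "Best pizza in Munich" title are assumptions; the other titles come from the slides.

```python
import re
from collections import defaultdict


def tokenize(title):
    """Lowercase a title and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def build_index(pages):
    """Map each token to the set of page ids whose title contains it."""
    index = defaultdict(set)
    for page_id, title in pages.items():
        for token in tokenize(title):
            index[token].add(page_id)
    return index


def similar_pages(page_id, pages, index, threshold=0.5):
    """Return (other_id, score) pairs at or above threshold, best first.

    Only pages sharing at least one token with the query page are scored,
    so unrelated pages are never compared.
    """
    query = tokenize(pages[page_id])
    candidates = set()
    for token in query:
        candidates |= index[token]
    candidates.discard(page_id)

    scored = []
    for other in candidates:
        tokens = tokenize(pages[other])
        score = len(query & tokens) / len(query | tokens)  # Jaccard similarity
        if score >= threshold:
            scored.append((other, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical page ids; the last title is invented for contrast.
pages = {
    "a": "How to tie a tie",
    "b": "How to tie a neck tie",
    "c": "How to tie a Windsor knot",
    "d": "Best pizza in Munich",
}
index = build_index(pages)
matches = similar_pages("a", pages, index)
```

Here page "d" is never scored because it shares no words with the query title. A real search engine would also weight rare terms more heavily than common ones such as "a" or "to", so its scores would differ from these.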
  • Panda Diet for Overweight Websites

    1. Confidential and Proprietary © Glassdoor, Inc. 2008-2013. The Panda Diet for Big, Fat, Overweight Websites. Ehren Reilly | Glassdoor.com. SMX München, March 2014
    3. Bigger isn’t always better
         • Big and strong and lean?
         • …or fat?
    4. Sometimes, bigger is better
         • PageRank
         • Interlinking
         • Economies of scale
         • Brand
    5. When You’re Big, It’s Easy to Get Overweight (chart labels: Pages Indexed (Webmaster Tools), SEO Visibility (SearchMetrics))
    6. Overweight Sites Are Food for the Panda (chart labels: Pages Indexed, % Useful)
    7. How Big Sites Get Fat With Junk Pages
         • “No results” pages
         • URL-based duplicates
         • Content topic repetition
         • Multiple versions of site, multiple countries
    8. Is Google Sending Traffic To Your Junk Pages?
         • Panda looks at all the pages of your site (not just the good ones).
         • Junk pages drive down your overall score.
         • Pre-Panda: “Send me any traffic to any page, it can’t hurt!”
         • Post-Panda: “Don’t send traffic to my junk pages, because that will ruin my average.”
         • How do you get Google to stop sending traffic to your junk pages?
    9. The Panda Diet
    10. Panda Diet: 1. "noindex" Pages with No Content
    11. Panda Diet: 1. "noindex" Pages with No Content
    12. Panda Diet: 1. "noindex" Pages with No Content
    13. Panda Diet: 1. "noindex" Pages with No Content
         Benefits of noindex,follow:
         • Still get credit for links to these pages.
         • Users can still access these pages via navigation.
         • Google won’t send users to these pages.
         Why not canonical? Sometimes you can’t figure out in real time which other page is the most relevant.
         <meta name="robots" content="noindex,follow">
    14. Panda Diet: 2. If no one ever visits a page, remove it
         • If no one ever visits a page, it’s because: (A) no one wants that information, or (B) Google doesn’t think that page is a good result for any user queries.
         • If you have a page with no visitors, do you really need that page?
         • If a page has no value, then remove, canonicalize or noindex it.
    15. Panda Diet: 3. Identify your pages with the highest bounce rate. Fix them. Too expensive to improve all of your content? Only fix the worst pages.
    16. Panda Diet: 4. Only One Page Per Unique Title
    17. Panda Diet: 4. Only One Page Per Unique Title
    18. Panda Diet: 5. Only One Page Per Topic
    19. Panda Diet: 5. Only One Page Per Topic
    20. Panda Diet: 5. Only One Page Per Topic. How to automate detection of similar articles: For 1,000,000 pages, which pairs of pages are very similar? This is the all-pairs problem: comparing every pair of items in a set of 1 million items requires n(n-1)/2 comparisons, or roughly 500 billion.
    21. Panda Diet: 5. Only One Page Per Topic. Create a search engine index (Solr).
         How to tie a tie
         • How to tie a tie for a suit (0.92)
         • How to tie a tie in a Windsor knot (0.82)
         • How to tie a tie step by step (0.97)
         • How to tie a neck tie (0.90)
         • How to tie a Windsor knot (0.65)
    22. Case Study: Successful Panda Diet
         Before
         • 12 million pages of article content.
         • 95% of URLs get fewer than 3 visits per year.
         • Panda problem.
         Project
         • Remove “no content” pages (3 million)
         • Merge duplicate title pages (80,000)
         • Merge similar topic pages using a Solr search index (2 million)
         • Remove pages with fewer than 3 visits in the prior 12 months (5.5 million)
         After
         • 1 million good quality pages remained.
         • Noindexed or merged 11 million pages: 2% loss in traffic in the first 30 days.
         • Panda problem went away: traffic increased 22% in 60 days and 118% in 120 days.
    23. Case Study: Successful Panda Diet
    24. Conclusion
         • Bigger isn’t better.
         • Don’t try to get bigger, try to be more useful for more users.
         • As your site grows and you add new features, stay lean.
         • If your site gets overweight, put it on a diet.
    25. Thank You! Ehren Reilly, ehren.reilly@glassdoor.com, @ehrenreilly "noindex"
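
Steps 1, 2, and 4 of the diet above amount to a triage pass over page metadata. The following is a minimal sketch of such a pass, not the process used at any of the companies mentioned. The `Page` fields, the action labels, the URLs, and the rule of keeping the first page seen for each title are all assumptions made for illustration. The slides leave the choice between removing, canonicalizing, and noindexing a low-traffic page open; this sketch simply labels it "remove". The threshold of 3 visits and the robots meta tag come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    result_count: int      # items of real content shown on the page
    visits_last_year: int  # visits in the prior 12 months


def triage(pages, min_visits=3):
    """Assign each URL one action: 'noindex', 'remove', 'merge', or 'keep'."""
    actions = {}
    seen_titles = {}
    for page in pages:
        if page.result_count == 0:
            # Step 1: pages with no content are noindexed.
            actions[page.url] = "noindex"
        elif page.visits_last_year < min_visits:
            # Step 2: pages that almost no one visits are removal candidates.
            actions[page.url] = "remove"
        elif page.title in seen_titles:
            # Step 4: a later page with an already-seen title is a merge candidate.
            actions[page.url] = "merge"
        else:
            seen_titles[page.title] = page.url
            actions[page.url] = "keep"
    return actions


def robots_meta(action):
    """Return the robots meta tag for a noindexed page, else an empty string."""
    if action == "noindex":
        return '<meta name="robots" content="noindex,follow">'
    return ""


# Hypothetical pages for illustration.
example_pages = [
    Page("/acme/salaries", "Acme salaries", 0, 50),
    Page("/acme/reviews", "Acme reviews", 12, 1),
    Page("/acme/interviews", "Acme interviews", 8, 40),
    Page("/acme/interviews-2", "Acme interviews", 5, 9),
]
actions = triage(example_pages)
```

In practice the merge decision would compare the two pages' content and point the weaker page at the stronger one rather than relying on encounter order.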
