Panda Diet for Overweight Websites

How to keep junk pages off of your site, and remove the ones that are there, so you can avoid Google Panda.

Speaker notes
  • - I’m from San Francisco.
    - Glassdoor is the world’s largest user-generated content community site for jobs, companies and salaries, with over 80 million pages of jobs and user-generated content.
    - About.com and Ask.com are both top-50 web properties in the US by traffic, and two of the largest online publishers in the world. Ask.com has question-and-answer content written by users and editors. About.com has articles written by experts. Each site has about 10 million pages indexed in Google.
  • It’s harder to control quality. 100 pages: you know what’s on each page. 100,000 pages: no one is checking them all. 100,000,000 pages: would you know if 1 million of them were junk?
  • PageRank: overall site PR helps new and existing pages. Interlinking: if you have more pages, you can get more relevant links between pages. Economies of scale: managing larger sites is more efficient. Brand: users prefer familiar brands in search results.
  • “No results” pages: when your site has faceted navigation, some pages have no data (e.g., no products in this category, no reviews for this restaurant, no salaries for this company). URL-based duplicates: multiple URLs return the same content. Content-based duplicates: if you have lots of content, sometimes the same topic comes up again. Multiple versions of the site, multiple countries: duplication between versions? Empty pages in some versions?
  • Every company on Glassdoor, times every city it’s located in, times salaries, reviews, or interviews, times job titles: tens of millions of pages with no results.
  • At one of the companies I worked at, we found the worst-performing 5% of pages, and we hired a team of editors to fix them.
  • Eliminate duplicate titles. Find pages with the same title (Webmaster Tools). Same or overlapping content? Canonicalize the worse one to the better one. Different content? Merge them into one content page.
  • We created a search engine index of all our pages using Solr, an open source search engine platform.
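The notes above list URL-based duplicates (multiple URLs that return the same content) as one way big sites put on weight. Below is a minimal Python sketch for finding them in a list of crawled URLs. It assumes that none of the following change what a page shows: host-name case, a trailing slash, parameter order, the fragment, and the parameters in `IGNORED_PARAMS`. That set is a hypothetical placeholder, because which parameters are safe to drop is site-specific.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url: str) -> str:
    """Map URL variants that (by assumption) serve the same content to one key."""
    parts = urlsplit(url)
    # Drop ignored parameters and sort the rest so parameter order doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Treat "/Salaries/" and "/Salaries" as the same path; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are case-insensitive; the fragment is never sent to the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def find_url_duplicates(urls):
    """Return {normalized_url: [variants]} for keys with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    crawl = [
        "https://Example.com/Salaries/?utm_source=newsletter",
        "https://example.com/Salaries",
        "https://example.com/Reviews?sort=new&page=2",
        "https://example.com/Reviews?page=2&sort=new",
        "https://example.com/Interviews",
    ]
    for key, variants in find_url_duplicates(crawl).items():
        print(key, "<-", variants)
```

Each group is a candidate for a single canonical URL. The path is deliberately left case-sensitive, because servers may treat `/Salaries` and `/salaries` as different pages.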

Transcript

  • 1. Confidential and Proprietary © Glassdoor, Inc. 2008-2013. The Panda Diet for Big, Fat, Overweight Websites. Ehren Reilly | Glassdoor.com. SMX München, March 2014.
  • 2.
  • 3. Bigger isn’t always better
    - Big and strong and lean?
    - …or fat?
  • 4. Sometimes, bigger is better
    - PageRank
    - Interlinking
    - Economies of scale
    - Brand
  • 5. When You’re Big, It’s Easy to Get Overweight [Charts: Pages Indexed (Webmaster Tools); SEO Visibility (SearchMetrics)]
  • 6. Overweight Sites Are Food for the Panda [Chart: Pages Indexed vs. % Useful]
  • 7. How Big Sites Get Fat With Junk Pages
    - “No results” pages
    - URL-based duplicates
    - Content topic repetition
    - Multiple versions of site, multiple countries
  • 8. Is Google Sending Traffic To Your Junk Pages?
    - Panda looks at all the pages of your site (not just the good ones).
    - Junk pages drive down your overall score.
    - Pre-Panda: “Send me any traffic to any page, it can’t hurt!”
    - Post-Panda: “Don’t send traffic to my junk pages, because that will ruin my average.”
    - How do you get Google to stop sending traffic to your junk pages?
  • 9. The Panda Diet
  • 10. Panda Diet: 1. "noindex" Pages with No Content
  • 11. Panda Diet: 1. "noindex" Pages with No Content
  • 12. Panda Diet: 1. "noindex" Pages with No Content
  • 13. Panda Diet: 1. "noindex" Pages with No Content
    Benefits of noindex,follow:
    - Still get credit for links to these pages.
    - Users can still access these pages via navigation.
    - Google won’t send users to these pages.
    Why not canonical? Sometimes you can’t figure out in real time which is the most relevant other page.
    <meta name="robots" content="noindex,follow">
  • 14. Panda Diet: 2. If no one ever visits a page, remove it
    - If no one ever visits a page, it’s because either:
      A. No one wants that information, or
      B. Google doesn’t think that page is a good result for any user queries.
    - If you have a page with no visitors, do you really need that page?
    - If a page has no value, then remove, canonicalize, or noindex it.
  • 15. Panda Diet: 3. Identify your pages with the highest bounce rate. Fix them.
    - Too expensive to improve all of your content? Only fix the worst pages.
  • 16. Panda Diet: 4. Only One Page Per Unique Title
  • 17. Panda Diet: 4. Only One Page Per Unique Title
  • 18. Panda Diet: 5. Only One Page Per Topic
  • 19. Panda Diet: 5. Only One Page Per Topic
  • 20. Panda Diet: 5. Only One Page Per Topic
    How to automate detection of similar articles: for 1,000,000 pages, which pairs of pages are very similar?
    All Pairs Problem: comparing every pair of items in a set of 1 million items requires hundreds of billions of comparisons.
  • 21. Panda Diet: 5. Only One Page Per Topic
    Create a search engine index (Solr).
    How to tie a tie
    - How to tie a tie for a suit (0.92)
    - How to tie a tie in a Windsor knot (0.82)
    - How to tie a tie step by step (0.97)
    - How to tie a neck tie (0.90)
    - How to tie a Windsor knot (0.65)
  • 22. Case Study: Successful Panda Diet
    Before
    - 12 million pages of article content.
    - 95% of URLs get <3 visits per year.
    - Panda problem.
    Project
    - Remove “no content” pages (3 million).
    - Merge duplicate-title pages (80,000).
    - Merge similar-topic pages using a Solr search index (2 million).
    - Remove pages with <3 visits in the prior 12 months (5.5 million).
    After
    - 1 million good-quality pages remained.
    - Noindexed or merged 11 million pages:
      - 2% loss in traffic in the first 30 days.
    - Panda problem went away:
      - Traffic up 22% in 60 days.
      - Traffic up 118% in 120 days.
  • 23. Case Study: Successful Panda Diet
  • 24. Conclusion
    - Bigger isn’t better.
    - Don’t try to get bigger; try to be more useful for more users.
    - As your site grows and you add new features, stay lean.
    - If your site gets overweight, put it on a diet.
  • 25. Thank You! Ehren Reilly, ehren.reilly@glassdoor.com, @ehrenreilly
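Slide 13 recommends `noindex,follow` for pages with no content. Below is a minimal Python sketch of making that decision at render time. It assumes the page template knows how many results a faceted page has. `robots_meta` and `min_results` are hypothetical names, not part of any framework.

```python
def robots_meta(result_count: int, min_results: int = 1) -> str:
    """Return the robots meta tag for a faceted page (hypothetical helper).

    Empty pages stay reachable through navigation and still pass link credit
    ("follow"), but ask search engines to keep them out of the index ("noindex").
    """
    if result_count < min_results:
        return '<meta name="robots" content="noindex,follow">'
    # Pages with real content get no restriction.
    return '<meta name="robots" content="index,follow">'


# Example: a company/city/job-title page with no salaries yet, then one with data.
print(robots_meta(0))
print(robots_meta(12))
```

Search engines assume `index,follow` when no robots meta tag is present, so the second branch could equally emit nothing. The explicit tag just makes the decision visible in the page source.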
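Slide 14, and the case study on slide 22, prune pages that almost no one visits (fewer than 3 visits in the prior 12 months). Below is a minimal sketch of building that list. It assumes you can export per-URL visit counts for the last 12 months from your analytics tool. The `Page` record, `pages_to_prune`, and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    visits_last_12_months: int  # assumed to come from an analytics export


def pages_to_prune(pages, min_visits=3):
    """URLs below the visit threshold: candidates to remove, canonicalize, or noindex."""
    return [page.url for page in pages if page.visits_last_12_months < min_visits]


pages = [
    Page("/company/acme/salaries", 0),
    Page("/company/acme/reviews", 2),
    Page("/company/acme/interviews", 3),
    Page("/company/acme", 1200),
]
print(pages_to_prune(pages))  # ['/company/acme/salaries', '/company/acme/reviews']
```

A page published last month has not had 12 months to earn visits. You may therefore want to exclude recently created pages before applying the threshold.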
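Slide 15, and the speaker note about fixing the worst-performing 5% of pages, suggest spending editorial effort only where it matters most. Below is a minimal sketch of picking that worklist, assuming per-URL session and bounce counts from analytics. The 5% fraction comes from the note. `min_sessions` is a hypothetical guard that stops pages with a handful of visits from topping the list by chance.

```python
import math


def worst_pages(stats, fraction=0.05, min_sessions=50):
    """Return the worst `fraction` of pages by bounce rate, worst first.

    stats maps URL -> (sessions, bounces). Pages with fewer than
    `min_sessions` sessions are skipped because their rate is too noisy.
    """
    rates = {
        url: bounces / sessions
        for url, (sessions, bounces) in stats.items()
        if sessions >= min_sessions
    }
    if not rates:
        return []
    count = max(1, math.ceil(len(rates) * fraction))
    return sorted(rates, key=rates.get, reverse=True)[:count]


stats = {f"/article/{i}": (100, i) for i in range(40)}  # bounce rates 0% .. 39%
stats["/article/tiny"] = (3, 3)  # 100% bounce, but only 3 sessions, so skipped
print(worst_pages(stats))
```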
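Slides 16 and 17, with the matching speaker note, call for only one page per unique title. Find pages that share a title, then canonicalize the worse page to the better one or merge them. Below is a minimal sketch of the grouping step, assuming each page's title and visit count are available. It uses visits as a hypothetical stand-in for "better". Deciding between canonicalizing and merging still needs a human or a content comparison.

```python
from collections import defaultdict


def title_key(title: str) -> str:
    """Normalize case and whitespace so near-identical titles collide."""
    return " ".join(title.lower().split())


def canonical_targets(pages):
    """pages: iterable of (url, title, visits).

    Returns {duplicate_url: canonical_url}. The canonical page in each
    same-title group is the one with the most visits (an assumed proxy for quality).
    """
    groups = defaultdict(list)
    for url, title, visits in pages:
        groups[title_key(title)].append((visits, url))

    targets = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # unique title, nothing to do
        # Most visits first; ties fall back to URL order so the result is deterministic.
        members.sort(key=lambda m: (-m[0], m[1]))
        best_url = members[0][1]
        for _, url in members[1:]:
            targets[url] = best_url
    return targets


pages = [
    ("/articles/tie-a-tie", "How to Tie a Tie", 5400),
    ("/articles/tie-a-tie-2", "How to tie a  tie", 35),
    ("/articles/windsor-knot", "How to Tie a Windsor Knot", 900),
]
print(canonical_targets(pages))
```

Each key in the result would then point at its value through a `rel="canonical"` link, or be merged into it.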
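Here is slide 20's all-pairs estimate worked out. With $n$ pages there are $\binom{n}{2}$ unordered pairs, so for one million pages:

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\binom{10^{6}}{2} = \frac{10^{6}\,(10^{6}-1)}{2} = 499{,}999{,}500{,}000 \approx 5 \times 10^{11}
```

The count grows quadratically, so doubling the number of pages roughly quadruples the work.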
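Slide 21 sidesteps the all-pairs problem with a search index. It indexes every page in Solr and queries the index with each page's title, so only pages that share terms are ever scored. The sketch below is a swapped-in, in-memory stand-in for that idea, not Solr. It uses a small inverted index over title words and a hypothetical stop-word list. It scores pairs with Jaccard similarity on word sets instead of Solr's relevance scores, so its numbers will not match the slide's.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; very common words would otherwise make every
# title a candidate for every other title.
STOP_WORDS = {"a", "an", "and", "by", "for", "how", "in", "of", "the", "to"}


def words(title):
    """Lowercase word set of a title, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def similar_pairs(titles, threshold=0.5):
    """Return (title_a, title_b, jaccard) for pairs at or above `threshold`."""
    word_sets = [words(t) for t in titles]

    # Inverted index: word -> ids of titles containing it.
    index = defaultdict(set)
    for i, ws in enumerate(word_sets):
        for w in ws:
            index[w].add(i)

    pairs = []
    for i, ws in enumerate(word_sets):
        # Candidates are only the titles sharing at least one word with title i.
        candidates = set().union(*(index[w] for w in ws))
        for j in sorted(candidates):
            if j <= i:
                continue  # score each unordered pair once (and skip i itself)
            score = len(ws & word_sets[j]) / len(ws | word_sets[j])
            if score >= threshold:
                pairs.append((titles[i], titles[j], round(score, 2)))
    return pairs


titles = [
    "How to tie a tie",
    "How to tie a neck tie",
    "How to tie a Windsor knot",
    "Best pizza in Munich",
]
for a, b, score in similar_pairs(titles):
    print(f"{score:.2f}  {a!r} ~ {b!r}")
```

At a million pages, the posting list for a common word can still be large. A search engine such as Solr helps by weighting rare terms more heavily and returning only the top matches per query, rather than scoring every candidate.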