Confidential and Proprietary © Glassdoor, Inc. 2008-2013
The Panda Diet for Big, Fat, Overweight Websites
Ehren Reilly | Glassdoor.com
SMX München
March, 2014
Bigger isn’t always better
• Big and strong and lean?
• …or fat?
Sometimes, bigger is better
• PageRank
• Interlinking
• Economies of scale
• Brand
When You’re Big, It’s Easy to Get Overweight
Pages Indexed (Webmaster Tools)
SEO Visibility (SearchMetrics)
Overweight Sites Are Food for the Panda
PAGES INDEXED % USEFUL
How Big Sites Get Fat With Junk Pages
• “No results” pages
• URL-based duplicates
• Content topic repetition
• Multiple versions of site, multiple countries
Is Google Sending Traffic To Your Junk Pages?
• Panda looks at all the pages of your site (not just the good ones).
• Junk pages drive down your overall score.
• Pre-Panda: “Send me any traffic to any page, it can’t hurt!”
• Post-Panda: “Don’t send traffic to my junk pages, because that will ruin my average.”
• How do you get Google to stop sending traffic to your junk pages?
The Panda Diet
Panda Diet: 1. "noindex" Pages with No Content
Benefits of noindex,follow
• Still get credit for links to these pages.
• Users can still access these pages via navigation.
• Google won’t send users to these pages.
Why not Canonical?
Sometimes you can’t figure out in real time which is the most relevant other page.
<meta name="robots" content="noindex,follow">
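A minimal sketch of how a page template might decide when to emit this tag, assuming the server knows how many results the page will show at render time. The helper name `robots_meta_tag` and the `result_count` parameter are hypothetical, not part of any framework.

```python
def robots_meta_tag(result_count: int) -> str:
    """Return the robots meta tag for a listing page (hypothetical helper)."""
    if result_count == 0:
        # Empty page: ask crawlers not to index it, but still follow its links.
        return '<meta name="robots" content="noindex,follow">'
    # Pages with content need no tag, because indexing is the default.
    return ""


# Example: a "no results" page versus a page with 12 results.
print(robots_meta_tag(0))
print(repr(robots_meta_tag(12)))
```

Because the decision depends only on the page's own content, it can be made in real time without knowing which other page would be the right canonical target.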
Panda Diet: 2. If no one ever visits a page, remove it
• If no one ever visits a page, it’s because:
  A. No one wants that information, or
  B. Google doesn’t think that page is a good result for any user queries.
• If you have a page with no visitors, do you really need that page?
• If a page has no value, then remove, canonicalize, or noindex it.
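One way to find these pages, sketched under the assumption that you have a full list of site URLs and an analytics export of visits per URL over the prior 12 months. The function name, the input shapes, and the example URLs are illustrative. The default threshold of 3 visits mirrors the cutoff used in the case study.

```python
def low_traffic_urls(all_urls, visits_by_url, min_visits=3):
    """Return URLs with fewer than `min_visits` visits.

    URLs missing from the analytics export are treated as having zero visits.
    """
    return [url for url in all_urls if visits_by_url.get(url, 0) < min_visits]


# Illustrative data only.
all_urls = ["/popular", "/rarely-seen", "/never-seen"]
visits_by_url = {"/popular": 5400, "/rarely-seen": 2}
print(low_traffic_urls(all_urls, visits_by_url))
```

Starting from the full URL list rather than from the analytics export matters here: pages with zero visits usually do not appear in analytics reports at all.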
Panda Diet: 3. Identify your pages with the highest bounce rate. Fix them.
Too expensive to improve all of your content?
Only fix the worst pages.
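A rough sketch of picking the worst pages, assuming an analytics export of (URL, sessions, bounces) rows. The 5% default follows the "worst-performing 5%" figure in the speaker notes. The `min_sessions` cutoff is an added assumption, there so that pages with only a handful of sessions do not dominate the list.

```python
import math


def worst_bounce_pages(stats, fraction=0.05, min_sessions=100):
    """Return the worst `fraction` of pages by bounce rate.

    stats: iterable of (url, sessions, bounces) tuples.
    Pages with fewer than `min_sessions` sessions are skipped as too noisy.
    """
    floor = max(min_sessions, 1)  # also guards against division by zero
    ranked = [
        (bounces / sessions, url)
        for url, sessions, bounces in stats
        if sessions >= floor
    ]
    ranked.sort(reverse=True)  # highest bounce rate first
    keep = math.ceil(len(ranked) * fraction)
    return [url for _, url in ranked[:keep]]


# Illustrative data only.
stats = [("/a", 200, 180), ("/b", 200, 50), ("/c", 200, 150), ("/d", 10, 10)]
print(worst_bounce_pages(stats, fraction=0.5))
```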
Panda Diet: 4. Only One Page Per Unique Title
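A small sketch of how duplicate titles could be found, assuming you can export (URL, title) pairs from a crawl or a Webmaster Tools report. The function name and the normalization rule (lowercase, collapsed whitespace) are assumptions for illustration.

```python
from collections import defaultdict


def duplicate_title_groups(pages):
    """Group URLs by normalized title and keep titles used by two or more URLs.

    pages: iterable of (url, title) tuples.
    """
    groups = defaultdict(list)
    for url, title in pages:
        key = " ".join(title.lower().split())  # ignore case and extra spaces
        groups[key].append(url)
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


# Illustrative data only.
pages = [
    ("/a", "How to Tie a Tie"),
    ("/b", "how to tie  a tie"),
    ("/c", "How to Tie a Windsor Knot"),
]
print(duplicate_title_groups(pages))
```

Each resulting group is a candidate for review: canonicalize the weaker page to the stronger one if the content overlaps, or merge the pages if the content differs.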
Panda Diet: 5. Only One Page Per Topic
How to automate detection of similar articles:
For 1,000,000 pages, which pairs of pages are very similar?
All Pairs Problem
Comparing every pair of items in a set of 1 million items requires about 500 billion comparisons (n(n-1)/2).
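To put a number on the all-pairs cost, the count of unordered pairs among n items is n(n-1)/2. A quick check in Python:

```python
import math


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return math.comb(n, 2)


print(pair_count(1_000_000))  # 499999500000
```

The count grows quadratically, so doubling the number of pages roughly quadruples the number of comparisons.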
Create a search engine index (Solr)
How to tie a tie
• How to tie a tie for a suit (0.92)
• How to tie a tie in a Windsor knot (0.82)
• How to tie a tie step by step (0.97)
• How to tie a neck tie (0.90)
• How to tie a Windsor knot (0.65)
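The idea behind using a search index can be illustrated without Solr. The toy sketch below is a stand-in, not Solr and not the method from the talk: it builds a small inverted index over title words and scores only those pairs that share at least one word, using Jaccard similarity. The function names and the threshold are assumptions, and the scores will not match the Solr relevance scores shown above.

```python
from collections import defaultdict


def tokens(title):
    """Lowercased set of words in a title."""
    return set(title.lower().split())


def similar_titles(titles, threshold=0.5):
    """Return (i, j, score) for title pairs with Jaccard similarity >= threshold.

    Only pairs that share at least one word are ever scored, which is what
    lets an index avoid comparing every pair.
    """
    token_sets = [tokens(t) for t in titles]
    index = defaultdict(set)  # word -> ids of earlier titles containing it
    results = []
    for i, toks in enumerate(token_sets):
        candidates = set()
        for tok in toks:
            candidates |= index[tok]
        for j in sorted(candidates):
            other = token_sets[j]
            score = len(toks & other) / len(toks | other)
            if score >= threshold:
                results.append((j, i, score))
        for tok in toks:
            index[tok].add(i)
    return results


titles = ["How to tie a tie", "How to tie a neck tie", "Best pizza in Munich"]
print(similar_titles(titles))
```

In this toy version, very common words such as "how" or "to" still produce large candidate sets. A real search engine reduces their influence with term weighting, which is one reason to use an existing platform rather than a hand-rolled index.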
Case Study: Successful Panda Diet
Before
• 12 million pages of article content.
• 95% of URLs get <3 visits per year.
• Panda problem
Project
• Remove “no content” pages (3 million)
• Merge duplicate title pages (80,000)
• Merge similar topic pages using a Solr search index (2 million)
• Remove pages with <3 visits in prior 12 months (5.5 million)
After
• 1 million good quality pages remained.
• Noindexed or merged 11 million pages
  – 2% loss in traffic in first 30 days
• Panda problem went away
  – Increase in traffic 22% in 60 days
  – Increase in traffic 118% in 120 days
Conclusion
• Bigger isn’t better.
• Don’t try to get bigger, try to be more useful for more users.
• As your site grows and you add new features, stay lean.
• If your site gets overweight, put it on a diet.
Thank You!
Ehren Reilly
ehren.reilly@glassdoor.com
@ehrenreilly

Editor's Notes

  • #3 I’m from San Francisco. Glassdoor is the world’s largest user-generated content community site for jobs, companies and salaries, with over 80 million pages of jobs and user-generated content. About.com and Ask.com are both top 50 web properties in the US, by traffic. They are two of the largest online publishers in the world. Ask.com has question-and-answer content written by users and editors. About.com has articles written by experts. Each site has about 10 million pages indexed in Google.
  • #4 It’s harder to control quality. 100 pages: You know what’s on each page. 100,000 pages: No one is checking them all. 100,000,000 pages: Would you know if 1 million of them were junk?
  • #5 PageRank: Overall site PR helps new and existing pages. Interlinking: If you have more pages, you can get more relevant links between pages. Economies of scale: Managing larger sites is more efficient. Brand: Users prefer familiar brands in search results.
  • #8 “No results” pages: When your site has faceted navigation, some pages have no data (e.g., no products in this category, no reviews for this restaurant, no salaries for this company). URL-based duplicates: Multiple URLs return the same content. Content-based duplicates: If you have lots of content, sometimes the same topic comes up again. Multiple versions of site, multiple countries: Duplication between versions? Empty pages in some versions?
  • #11 Every company on Glassdoor, times every city they’re located in, times salaries, reviews, or interviews, times job titles: tens of millions of pages with no results.
  • #12 Every company on Glassdoor, times every city they’re located in, times salaries, reviews, or interviews, times job titles: tens of millions of pages with no results.
  • #16 At one of the companies I worked at, we found the worst-performing 5% of pages, and we hired a team of editors to fix them.
  • #18 Eliminate duplicate titles. Find pages with the same title (Webmaster Tools). Same/overlapping content? Canonicalize the worse one to the better one. Different content? Merge them into one content page.
  • #22 We created a search engine index of all our pages using Solr, an open source search engine platform.