  • Not just talking about the affiliate penalty, but the big challenge with data feeds is to differentiate. Typically we’re looking to make the site unique, and as a consequence get unique content too.
  • Two causes of the affiliate penalty. Algorithms and people.
  • This is the classical model of unique content. People think of unique content in terms of absolutes.
  • However, the truth is much more complicated. Unique content is a sliding scale. You can have unique content duplicated across your own site, or you can mash up public information. Pages can be combinations of unique and non-unique content.
  • User generated content is awesome. But hard to get hold of without a community.
  • Ways of getting content from “users”.
  • Ways of getting content from users on other sites. Or mashing up content from sites behind login walls etc.
  • A springboard of ideas for building your own tools and mashups.
  • Content generated by users, but they didn’t imagine it would make it onto the site. E.g. “queries used to find this page”.
  • Talking about the conversation you had with Chewie – how it’s not just about unique content, but about
  • Example keyword on the left pulled from an Amazon feed. Keyword on the right is what people search for. Perform keyword research intelligently and group/theme your keywords so that the products you’re fed match up with what people search for.
  • You can be unique, by having non-unique content and displaying it in valuable ways.
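The note about grouping keywords so that feed products match what people search for can be sketched as below. This is only a rough illustration, assuming a simple word-overlap score; the function names, keyword list and product titles are hypothetical and not from the talk.

```python
from collections import defaultdict

def tokens(text):
    """Lowercase a phrase and split it into a set of words."""
    return set(text.lower().split())

def best_keyword(product_title, keywords):
    """Return the researched keyword sharing the most words with the title,
    or None when no keyword shares a word. Assumes keywords is non-empty."""
    title_tokens = tokens(product_title)
    score, keyword = max((len(tokens(k) & title_tokens), k) for k in keywords)
    return keyword if score > 0 else None

def group_by_theme(product_titles, keywords):
    """Map each keyword theme to the feed products that match it."""
    groups = defaultdict(list)
    for title in product_titles:
        groups[best_keyword(title, keywords)].append(title)
    return dict(groups)

# Hypothetical keyword research output and feed titles.
KEYWORDS = ["usb memory stick", "external hard drive", "sd card"]
PRODUCTS = [
    "Kingston USB memory stick 4gb",
    "Seagate external hard drive 1tb",
    "Garden hose 15m",
]
themes = group_by_theme(PRODUCTS, KEYWORDS)
```

Products that land under the `None` key match no researched theme, which is a hint that either the keyword research or the feed needs another look.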


  • 1. Data Feed SEO
    A4uexpo London, October 2010
    Will Critchlow
  • 2. Data Feeds Are Not Unique
  • 3. The “Affiliate” Penalty
  • 4. Unique Content Matrix
    Site Strength
  • 5.
  • 6. Case Study
    “Welcome visitor, please find our selection of [insert product] below, we have [number of products] items. We think you’ll like them!”
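One way to read this case study is that a fill-in-the-blanks intro leaves every category page nearly identical. The sketch below fills a template modelled on the quote and compares two pages with Python's standard `difflib`. The product names, counts and the use of `SequenceMatcher` as a similarity measure are assumptions made for illustration; this is not how a search engine scores duplication.

```python
from difflib import SequenceMatcher

# Template modelled on the slide's quote; placeholders renamed for str.format.
TEMPLATE = (
    "Welcome visitor, please find our selection of {product} below, "
    "we have {count} items. We think you'll like them!"
)

def intro(product, count):
    """Fill the boilerplate intro for one category page."""
    return TEMPLATE.format(product=product, count=count)

def similarity(a, b):
    """Rough character-level similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical categories.
usb_page = intro("USB sticks", 42)
shed_page = intro("garden sheds", 17)
score = similarity(usb_page, shed_page)
```

Even though the two pages sell unrelated products, most of the characters are shared, so the score comes out high.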
  • 7. User Generated Content
  • 8. “User” Generated Content
  • 9. User “Generated” Content
  • 10. Building quick & dirty SEO Tools: A Cheat Sheet & Inspiration
    APIs (more on programmable web)
    AdWords – Keywords
    Alchemy – Structured data & text
    Bing – Search, news, spelling
    Evri – Sentiment and popularity
    Face.com – Face detection
    Facebook – Social graph
    Google Analytics – Visitor data
    Hostip – Geo data
    LinkedIn – Professional data
    Pingdom – Website uptime
    Postrank (1, 2, 3) – real-time & influence
    Rapleaf – Social media profiles
    Twitter – Real time and social
    ... And of course:
    Linkscape – Links
    YQL – Yahoo! Query Language
    select * from html where url="<url>" and xpath="<xpath>"
    select * from html where url="<url>"
    select * from feed where url="<url>"
    select * from search.web where query="<query>"
    Crawlers / Scrapers
    Google App Engine
    Amazon Web Services
    Human Touch
    Amazon Mechanical Turk
    Smartsheet (interface to Mechanical Turk)
    Since Python is the language of Google App Engine, here is how you can use YQL easily within Python:
    Download source – extract to yql folder within your application
    import yql
    y = yql.Public()
    result = y.execute("<yql query>")
    XPath (more examples)
    /foo – the element ‘foo’
    //bar – all elements ‘bar’
    foo/bar – all bar elements that are children of foo
    foo//bar – bar arbitrary levels below foo
    foo/*/bar – bar grandchildren of foo
    foo/* – all child elements of foo
    foo/@bar – bar attribute on foo
    foo[@bar] – foo elements that have a bar attribute
    foo[@bar='baz'] – foo elements whose bar attribute equals baz
    Data (more on infochimps)
    Data.gov – US government data
    Data.gov.uk – UK government data
    Delicious list – from Peter Skomoroch
    Google Public Data - Directory
    Guardian – content and data
    World Bank – finance, health, etc.
    80legs – prepackaged crawl data
    By Will Critchlow, www.distilled.co.uk. First published: www.seomoz.org
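The XPath patterns in the cheat sheet can be tried locally before they go into a YQL query. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and evaluates paths relative to an element. The sample document is hypothetical and simply reuses the cheat sheet's foo/bar names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document reusing the cheat sheet's foo/bar names.
SAMPLE = """
<root>
  <foo bar="baz"><bar>child</bar><mid><bar>grandchild</bar></mid></foo>
  <foo><bar>other</bar></foo>
</root>
"""

root = ET.fromstring(SAMPLE)

def texts(path):
    """Return the text of every element matching path, relative to <root>."""
    return [element.text for element in root.findall(path)]

children = texts("foo/bar")          # bar elements that are children of foo
descendants = texts("foo//bar")      # bar at any depth below foo
grandchildren = texts("foo/*/bar")   # bar grandchildren of foo
with_attribute = root.findall("foo[@bar]")        # foo that has a bar attribute
with_value = root.findall("foo[@bar='baz']")      # foo whose bar equals baz

# ElementTree cannot select attributes with foo/@bar, so read them with .get().
attribute_values = [foo.get("bar") for foo in with_attribute]
```

A full XPath engine would also accept `foo/@bar` directly; the `.get()` call is a workaround specific to ElementTree.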
  • 11. User Generated “Content”
    • External search queries
    • 12. Internal search queries
    • 13. Tags
    • 14. Testimonials
    • 15. FAQs/Support emails
  • Tracking # of Reviews
    _gaq.push(['_setCustomVar',
          1,                     // This custom var is set to slot #1.
          'Number of Reviews',   // The top level name for the variable
          '1',                   // The Number of Reviews
          3                      // Page level variable
    ]);
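The slide hardcodes a review count of '1'. On a real page the count would presumably be filled in server-side, so here is a minimal sketch of rendering that call from Python. The helper name `review_count_snippet` is hypothetical, and the sketch assumes the page already loads the classic ga.js `_gaq` queue.

```python
import json

def review_count_snippet(review_count, slot=1):
    """Render a ga.js custom-variable call carrying a page's review count.

    The value is sent as a string, and scope 3 marks it as page level,
    matching the slide's example.
    """
    args = ["_setCustomVar", slot, "Number of Reviews", str(review_count), 3]
    # json.dumps produces a valid JavaScript array literal with safe quoting.
    return "_gaq.push(%s);" % json.dumps(args)

snippet = review_count_snippet(12)
```

The resulting string would be placed in the page's tracking script before the pageview call is pushed.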
  • 16. Context Is Key
    Google News: Google likes alternative facts
    Lyrics: Never considered duplicate content
    Context is key
    Look to stand out from your competitors
    “Use a source of content that’s not unique, but that no-one else in your space is using”
  • 17. Manipulate & Clean Your Data
    “Kingston DataTraveler 101 USB flash drive - 4 GB – Cyan”
    “Kingston USB memory stick 4gb”
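The step from the first title (as supplied in a feed) to the second (as people search) could be sketched roughly as follows. The synonym and drop lists are hypothetical, hand-maintained rules chosen to fit this single example; a real feed would need its own lists built from keyword research.

```python
import re

# Hypothetical, hand-maintained rules; a real feed would need its own lists.
SYNONYMS = {"flash drive": "memory stick"}      # feed wording -> search wording
DROP_TOKENS = {"datatraveler", "101", "cyan"}   # model names, colours, etc.

def clean_title(title):
    """Rewrite a feed product title towards the phrase people search for."""
    text = title.lower()
    # Treat spaced hyphens and en dashes as plain separators.
    text = re.sub(r"\s[-\u2013]\s", " ", text)
    # Join a number to its storage unit: "4 gb" becomes "4gb".
    text = re.sub(r"(\d+)\s*(kb|mb|gb|tb)\b", r"\1\2", text)
    for feed_phrase, search_phrase in SYNONYMS.items():
        text = text.replace(feed_phrase, search_phrase)
    kept = [token for token in text.split() if token not in DROP_TOKENS]
    return " ".join(kept)

cleaned = clean_title(
    "Kingston DataTraveler 101 USB flash drive - 4 GB \u2013 Cyan"
)
```

The output is lowercased; restoring brand capitalisation is left out of the sketch.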
  • 18. Of Course, Links Always Win
  • 19. Manual Reviews – aka “Hand Jobs”
    Check out the quality rater guidelines
    “Add value to users”
    These are subjective!!
  • 20. Resources
    • http://www.seomoz.org/blog/whiteboard-friday-flat-site-architecture
    • 21. http://seogadget.co.uk/solving-site-architecture-issues/
    • 22. http://www.seomoz.org/blog/api-and-dataset-cheatsheet-building-quick-dirty-tools
    • 23. http://www.mozenda.com
    • 24. http://www.seomoz.org/blog/leveraging-mechanical-turk-odesk-elance-craigslist-for-seo
    • 25. http://www.seochat.com/c/a/Google-Optimization-Help/Googles-Quality-Rater-Guidelines-Leaked/
    • 26. http://www.flickr.com/photos/rosaydani/77371897/
  • Thanks!
  • 27. Will Critchlow