The Risks and Rewards of Data Scraping for SEO 21 10 09



Jason Woodford was invited to join a panel discussion and presentation today on data scraping by our esteemed legal friends at DMH Stallard, “the business people who happen to be lawyers”.

I joined speakers from Sentor, who introduced the audience to their data scraping monitoring service, Assassin, as well as their customer, who explained how they have overcome and now manage the data scraping issues they face as a business.

I was uncomfortably placed on the side of the “scrapers”, whereas Sentor and Yell were defending the “scrapees”, with Frank Jennings of DMH Stallard adjudicating. It’s clear there are strong arguments for and against scraping...

Published in: Business, Technology
1 Comment
  • Great info on SEO as well as the legality of scraping.



  • Environment in Business, a Lexis Nexis specialist B2B publication. The problem: the need to drive relevant traffic in order to drive subscriptions.
  • Use absolute URLs in your links. Include the full path instead of relative URLs (/page.php). Use internal linking strategically. In the first couple of paragraphs of your content, try to find a place where it makes sense to link to another page of your site, using relevant anchor text similar to the keywords you want to rank for. Make sure each content headline is a link. Turn the headline of the web page into a link to that page. Add a copyright notice and a link to your site in the RSS feed. Most website scrapers just use your RSS feed to scrape content. They won’t realise that when they post the contents of your feed to their own site, they will be giving you a link with keywords in the anchor text, along with information saying the copyright for the content belongs to you and your site.
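The absolute-URL and RSS attribution advice in the note above could be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: `SITE_URL`, `SITE_NAME`, `absolutise_links` and `add_attribution` are hypothetical names, and the simple `href="..."` pattern assumes well-formed, double-quoted attributes.

```python
import re
from html import escape
from urllib.parse import urljoin

SITE_URL = "https://www.example.com/"  # hypothetical site root (assumption)
SITE_NAME = "Example Site"             # hypothetical site name (assumption)

# Matches double-quoted href attributes only; a real feed may need an HTML parser.
HREF_RE = re.compile(r'href="([^"]+)"')


def absolutise_links(html, base_url=SITE_URL):
    """Rewrite every href in `html` as an absolute URL based on `base_url`."""
    return HREF_RE.sub(
        lambda m: 'href="%s"' % urljoin(base_url, m.group(1)), html
    )


def add_attribution(item_html, item_url, title):
    """Return feed item HTML with absolute links and a copyright footer
    that links back to the original article using its headline as anchor text."""
    footer = '<p>Copyright {site}. Originally published as <a href="{url}">{title}</a>.</p>'.format(
        site=escape(SITE_NAME), url=escape(item_url), title=escape(title)
    )
    return absolutise_links(item_html) + footer
```

If a scraper republishes the feed item unchanged, every internal link and the footer link still point at the original site, which is the effect the note describes.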
  • The Risks and Rewards of Data Scraping for SEO 21 10 09

    1. The Risks and Rewards of Data Scraping for SEO. SiteVisibility: Search & Digital Marketing Experts. We think beyond the click™
    2. SiteVisibility, the Search Marketing division of AI Digital
       • SiteVisibility is a top 20, rapidly growing, award-winning digital & search engine marketing agency
       • Founded in 2002, 20 employees and £1.1m revenue in 2008
       • Renowned for our search marketing expertise
       • The number 1 marketing podcast on iTunes
       • The leader in SEO performance models
       • One of the top 20 marketing blogs in the UK
       • Pioneering ISO standards for search marketing
       • Investing 2% of our revenue in Search Marketing R&D
    3. Maximise volume & minimise the cost of leads
    4. So what is SEO and why is it important?
    5. So, why is SEO important?
    6. What does SEO rely on? 8 SEO Basics
       • Findability: keywords need to be in meta titles, headings, content, URLs AND in hyperlinks linking BACK to a particular URL
       • Indexability: e.g. duplicate content, suffixes and sitemaps?
       • The other 6 are Accessibility, Usability, Sharability, Linkability, Convertability and Trackability
       • Google likes original, high-quality, keyword-rich content from high-authority sites...
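As an illustration of the findability point on the slide above, the following sketch checks whether a keyword appears in a page's title, headings and URL. It is only an assumption-laden example: `findability_report` is a hypothetical helper, only `h1` to `h3` are treated as headings, and the URL check assumes hyphen-separated slugs.

```python
from html.parser import HTMLParser


class _TitleAndHeadings(HTMLParser):
    """Collects the text of <title> and of <h1>-<h3> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag currently being captured, if any
        self.title = ""
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            if tag != "title":
                self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headings[-1] += data


def findability_report(html, url, keyword):
    """Report where `keyword` appears: in the title, a heading, or the URL."""
    parser = _TitleAndHeadings()
    parser.feed(html)
    kw = keyword.lower()
    return {
        "title": kw in parser.title.lower(),
        "headings": any(kw in h.lower() for h in parser.headings),
        # Assumption: URL slugs use hyphens where the keyword has spaces.
        "url": kw.replace(" ", "-") in url.lower(),
    }
```

A check like this covers only part of the findability list; anchor text in inbound links would need data from other sites.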
    7. Scraping - legal or illegal?
       • Data scraping from public data repositories is very common and in most cases legal.
       • However, if your purpose is to steal site Y's content so you can put it on your site and benefit from it, then that is classed as copyright infringement
       • Scraping on this basis is illegal and…
       • Violates the Digital Millennium Copyright Act
       • It can often hurt search engine rankings of websites (bad for search engine optimisation, SEO)
    8. What Action Can I Take?
       • Option 1: Report them to Google & their ISP and / or take legal action
         - Could cost you time & money
       • Option 2: Deal with it
         - There are some technical mitigations
       • Option 3: Think ahead…
         - Set up your website to take advantage of these scrapers and gain some SEO benefit.
    9. Why is Data Scraping a risk for SEO?
       For the “Scraper”
       • Duplicate content but at least it’s content
       • Less authority as a producer of original content
       For the “Scrapee”
       • Google does not like duplicate content so you could be penalised for:
       • Affecting query data
       • Falsifying the number of “real” impressions for advertisers
       • Your authority as the original content source is in question
    10. Dealing with it: some technical mitigation
       Ultimately, if data and content are accessible online, anyone (or any machine) can copy them and create a new database. Although this practice would be illegal in the UK, it is a known risk to all data publishers.
       • Monitor your web analytics for scraping
       • IP lock-out, which restricts any IP to a maximum number of accesses per hour before blocking the IP or requiring a “captcha”
       • Use “captcha” forms instead of allowing extractable email addresses
       • Block the IP addresses of all of your known competitors
       • Generally scraping is done via patterns on the pages. If we use random page generators then scraping becomes difficult.
       • Use a Flash layer to display the final data so that it cannot be scraped, whilst making sure you provide for SEO in the design
    11. Thinking ahead: make it work for SEO
       • Use absolute URLs in your links
       • Use internal linking strategically
       • Make sure each content headline is a link
       • Add a copyright notice and a link to your site in the RSS feed
       • Get some extra juice in Technorati.
    12. And our advice...
       • Recognise there is an SEO Risk / Opportunity
       • Decide on your approach
         - Go legal OR
         - Make it difficult OR
         - Make it work for you OR
         - All of the above
       • Constantly monitor the situation and develop / refine your approach as part of your online strategy
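The monitoring and IP lock-out mitigations listed in the slides above could be sketched along these lines. This is a minimal in-memory illustration under stated assumptions: `HourlyIpLimiter` and `suspicious_ips` are hypothetical names, the threshold of 1000 requests per hour is arbitrary, and timestamps are plain seconds supplied by the caller (for example, parsed from access logs).

```python
from collections import defaultdict, deque


class HourlyIpLimiter:
    """Sliding-window limiter: at most `max_per_hour` requests per IP per window."""

    def __init__(self, max_per_hour=1000, window_seconds=3600):
        self.max_per_hour = max_per_hour
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        """Return True if the request is allowed, False if the IP should be
        blocked or shown a captcha. `now` is a timestamp in seconds."""
        hits = self._hits[ip]
        # Drop requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_per_hour:
            return False
        hits.append(now)
        return True


def suspicious_ips(requests, limiter):
    """Given (ip, timestamp) pairs in time order, return the set of IPs
    that exceeded the limiter's threshold at least once."""
    flagged = set()
    for ip, ts in requests:
        if not limiter.allow(ip, ts):
            flagged.add(ip)
    return flagged
```

The same class can serve both purposes: called per live request it acts as the lock-out, and replayed over historical log entries it acts as a crude scraping monitor. A production setup would likely need shared storage and handling for proxies, which this sketch leaves out.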
    13. The Risks and Rewards of Data Scraping for SEO. SiteVisibility: Search & Digital Marketing Experts. We think beyond the click™
    14. Some legal & helpful uses of scraping
       • Market research & business intelligence
       • Data mining your competitor's website to compare prices, products offered, business partners acquired and other critical data.
       • Reputation management! What if you were alerted to every good or bad comment made about your company or product on a blog, forum or website, and could respond with corrections or enhancements before misinformation was spread around the Internet?
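The reputation management idea in the last slide could look something like the sketch below once page text has already been collected. It is an assumption-heavy illustration: `find_mentions` is a hypothetical helper, the pages are supplied as `(url, text)` pairs, and fetching them (and checking each site's terms of use) is outside its scope.

```python
import re


def find_mentions(pages, brand, context=40):
    """Return (url, snippet) pairs for every case-insensitive mention of
    `brand` in the given (url, text) pages, with `context` characters of
    surrounding text on each side."""
    pattern = re.compile(re.escape(brand), re.IGNORECASE)
    alerts = []
    for url, text in pages:
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            alerts.append((url, text[start:end].strip()))
    return alerts
```

Each alert carries enough surrounding text for a person to judge whether the comment is good or bad and whether a response is needed.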