The Risks and Rewards of Data Scraping for SEO, 21 10 09
Jason Woodford was invited to join a panel discussion and presentation today on data scraping by our esteemed legal friends at DMH Stallard, “the business people who happen to be lawyers”.

I joined speakers from Sentor who introduced the audience to their data scraping monitoring service called Assassin as well as their customer www.yell.com who explained how they have overcome and now manage the data scraping issues they face as a business.

I was uncomfortably placed on the side of the “scrapers”, whereas Sentor and Yell were defending the “scrapees”, with Frank Jennings of DMH Stallard adjudicating. It’s clear there are strong arguments for and against scraping.

  • Great info on SEO as well as the legality of scraping.

    Kyle
    www.mozenda.com
  • Environment in Business, a Lexis Nexis specialist B2B publication. The problem: the need to drive relevant traffic in order to drive subscriptions.
  • Use absolute URLs in your links. Include the full path (http://www.yoursite.com/page.php) instead of relative URLs (/page.php).
    Use internal linking strategically. In the first couple of paragraphs of your content, try to find a place where it makes sense and link to another page of your site, using relevant anchor text similar to the keywords you want to rank for.
    Make sure each content headline is a link. Turn the headline of the web page into a link to that page.
    Add a copyright notice and a link to your site in the RSS feed. Most website scrapers just use your RSS feed to scrape content. They won’t realise that when they post the contents of your feed to their own site, they will be giving you a link with keywords in the anchor text, along with information saying the copyright for the content belongs to you and your site.

Transcript

  • The Risks and Rewards of Data Scraping for SEO. SiteVisibility: Search & Digital Marketing Experts. We think beyond the click™
  • SiteVisibility, the Search Marketing division of AI Digital
    • SiteVisibility is a top 20, rapidly growing, award-winning digital & search engine marketing agency
    • Founded in 2002, 20 employees and £1.1m revenue in 2008
    • Renowned for our search marketing expertise
    • the number 1 marketing podcast on iTunes
    • the leader in SEO performance models
    • one of the top 20 marketing blogs in the UK
    • pioneering ISO standards for search marketing
    • Investing 2% of our revenue in Search Marketing R&D
  • Maximise volume & minimise the cost of leads
  • So what is SEO and why is it important?
  • So, why is SEO important?
  • What does SEO rely on? 8 SEO Basics
    Findability: keywords need to be in meta titles, headings, content, URLs AND in the hyperlinks linking BACK to a particular URL
    Indexability: e.g. duplicate content, suffixes and sitemaps
    The other 6 are Accessibility, Usability, Shareability, Linkability, Convertibility and Trackability
    Google likes original, high quality, keyword rich content from high authority sites.
  • Scraping - legal or illegal?
    Data scraping from public data repositories is very common and in most cases legal.
    However, if your purpose is to steal site Y's content so you can put it on your own site and benefit from it, then that is classed as copyright infringement.
    Scraping on this basis is illegal and:
    Violates the US Digital Millennium Copyright Act
    Can often hurt the search engine rankings of websites (bad for search engine optimisation, SEO)
  • What Action Can I Take?
    Option 1 – Report them to Google & their ISP and / or take legal action
    - Could cost you time & money
    Option 2 – Deal with it
    - There are some technical mitigations
    Option 3 – Think ahead…
    - Set up your website to take advantage of these scrapers and gain some SEO benefit.
  • Why is Data Scraping a risk for SEO?
    For the “Scraper”
    • Duplicate content but at least it’s content
    • Less authority as a producer of original content
    For the “Scrapee”
    • Google does not like duplicate content so you could be penalised for:
    • Affecting query data
    • Falsifying the number of “real” impressions for advertisers
    • Your authority as the original content source is in question
  • Dealing with it – some technical mitigation
    Ultimately, if data and content are accessible online, any person or machine can copy them and create a new database. Although this practice would be illegal in the UK, it is a known risk to all data publishers.
    • Monitor your web analytics for scraping
    • IP lock-out, which restricts any IP address to a maximum number of requests per hour before blocking it or requiring a “captcha”
    • Use “captcha” forms instead of allowing extractable email addresses
    • Block the IP addresses of all of your known competitors
    • Generally scraping is done via patterns on the pages. If we use random page generators then scraping becomes difficult.
    • Use a Flash layer to display the final data so that it cannot be scraped, whilst making sure you provide for SEO in the design
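
    The IP lock-out idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism any of the speakers use: the class name HourlyIpLimiter, the default limit of 100 requests and the in-memory storage are all hypothetical choices, and a real site would need shared storage across servers.

```python
import time
from collections import defaultdict, deque


class HourlyIpLimiter:
    """Allow each IP address at most max_requests within a sliding time window.

    Hypothetical helper for illustration; the limit and window are assumptions.
    """

    def __init__(self, max_requests=100, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if the request may proceed.

        False means the caller should block the IP or show a captcha.
        """
        now = time.time() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = HourlyIpLimiter(max_requests=3)
# Four requests in quick succession from one address: the fourth is refused.
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
```

    A sliding window is used here rather than a fixed clock hour so that a scraper cannot double its allowance by straddling the hour boundary.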
  • Thinking ahead – make it work for SEO
    • Use absolute URLs in your links
    • Use internal linking strategically
    • Make sure each content headline is a link
    • Add copyright notice and a link to your site in the RSS feed
    • Get some extra juice in Technorati.
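
    As a rough sketch of the first, third and fourth tips, the Python below rewrites relative links to absolute ones, turns the headline into a link and appends a copyright notice before the content goes into a feed item. The function names, the SITE_ROOT placeholder and the footer wording are assumptions made for illustration; the regular expression only handles double-quoted href attributes, so a production version would be better served by a proper HTML parser.

```python
import re
from html import escape
from urllib.parse import urljoin

SITE_ROOT = "http://www.yoursite.com/"  # placeholder domain, an assumption

HREF_PATTERN = re.compile(r'href="([^"]*)"')


def absolutise_links(html, base_url=SITE_ROOT):
    """Rewrite every double-quoted href so it carries the full URL."""
    def fix(match):
        return 'href="{}"'.format(urljoin(base_url, match.group(1)))
    return HREF_PATTERN.sub(fix, html)


def add_feed_footer(item_html, title, page_url, year, owner):
    """Link the headline to its page and append a copyright notice with a link back."""
    safe_title = escape(title)
    headline = '<h2><a href="{}">{}</a></h2>'.format(page_url, safe_title)
    footer = (
        '<p>&copy; {} {}. Originally published at '
        '<a href="{}">{}</a>.</p>'
    ).format(year, escape(owner), page_url, safe_title)
    return headline + absolutise_links(item_html) + footer


feed_item = add_feed_footer(
    '<p>See our <a href="/page.php">guide</a>.</p>',
    title="Data scraping and SEO",
    page_url="http://www.yoursite.com/post.php",
    year=2009,
    owner="Your Site",
)
```

    If a scraper republishes feed_item unchanged, every link in it, including the headline and the footer, still points back to the original site.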
  • And our advice...
    Recognise there is an SEO Risk / Opportunity
    Decide on your approach
    Go legal OR
    Make it difficult OR
    Make it work for you OR
    All of the above
    Constantly monitor the situation and develop / refine your approach as part of your online strategy
  • The Risks and Rewards of Data Scraping for SEO. SiteVisibility: Search & Digital Marketing Experts. We think beyond the click™
  • Some legal & helpful uses of scraping
    • Market research & business intelligence
    • Data mining your competitor's website to compare prices, products offered, business partners acquired and other critical data.
    • Reputation management: what if you were alerted to every good or bad comment made about your company or product on a blog, forum or website, and could respond with corrections or enhancements before misinformation spread around the Internet?