Webscraping for journalists

From a presentation I gave at the Canadian Association of Journalists (CAJ) on how journalists can learn to webscrape. Most of the presentation consisted of real-time demos, which are not included in this PPT deck.

“A little Wget magic”
Webscraping for journalists
CAJ, May 13, 2011

Webscraping
Using software that simulates a web browser to download large quantities of information from a web site.

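The live demos are not part of this deck, so the following is only a rough illustration of the definition above, not the presenter's method. It is a minimal Python sketch that downloads a list of pages while sending a browser-like `User-Agent` header and pausing between requests. The names `fetch`, `download_all` and `BROWSER_UA`, the `page_0001.html` file naming and the two-second default delay are all hypothetical choices; only the standard library (`urllib.request`, `pathlib`, `time`) is used.

```python
import time
import urllib.request
from pathlib import Path

# Hypothetical User-Agent string: a browser-style client name, so the server
# treats the script roughly like a web browser.
BROWSER_UA = "Mozilla/5.0 (compatible; newsroom-scraper/0.1)"


def fetch(url: str, user_agent: str = BROWSER_UA, timeout: float = 30.0) -> bytes:
    """Download one URL and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_all(urls, out_dir: str, delay: float = 2.0) -> list:
    """Save each URL to out_dir as page_0001.html, page_0002.html, ...

    Waits `delay` seconds between requests so the scraper does not
    flood the publisher's server. Returns the list of saved paths.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        if i > 1:
            time.sleep(delay)  # pause before every request after the first
        path = target / f"page_{i:04d}.html"
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

For example, `download_all([f"https://example.com/records?page={n}" for n in range(1, 4)], "downloads")` would save three pages of a hypothetical paginated listing into a `downloads` folder. Generating the page URLs in a loop is what replaces clicking through each page by hand.
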
Why webscrape?
  • Assemble your own copy of online data
  • Save time pointing-and-clicking

Why webscrape?
  • Data publishers (governments) want you to access data on their terms

Is it legal?
Yes. But.
Do it ethically.
Watch for robots.txt

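Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`, so checking a site's rules can be built into a scraper before it downloads anything. In this sketch `can_fetch()`, `crawl_delay()`, `set_url()`, `read()` and `parse()` are real `RobotFileParser` methods; the helper names `parse_robots`, `load_robots` and `polite_plan`, the `newsroom-scraper` agent name and the two-second fallback delay are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical agent name; use the User-Agent your scraper actually sends.
USER_AGENT = "newsroom-scraper"


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text you already have."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def load_robots(site: str) -> RobotFileParser:
    """Fetch and parse <site>/robots.txt over the network, e.g. site="https://example.com"."""
    parser = RobotFileParser()
    parser.set_url(site.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def polite_plan(parser, urls, user_agent=USER_AGENT, default_delay=2.0):
    """Split URLs into allowed and blocked lists and choose a delay between requests.

    Uses the site's Crawl-delay if it sets one, otherwise default_delay.
    """
    allowed, blocked = [], []
    for url in urls:
        (allowed if parser.can_fetch(user_agent, url) else blocked).append(url)
    delay = parser.crawl_delay(user_agent)
    return allowed, blocked, float(delay) if delay is not None else default_delay
```

Only the URLs in the `allowed` list would then be handed to the downloader, spaced out by the returned delay.
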
Tools for scraping
  • DownThemAll(2)
  • APIs
  • Wget
  • Custom scripts

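For the "custom scripts" option, a common first job is turning an HTML table (for example, one on a page already saved with Wget or DownThemAll) into a CSV file a spreadsheet can open. The sketch below does that with the standard library's `html.parser` and `csv` modules. `TableScraper` and `table_to_csv` are hypothetical names, and the sketch assumes the data sits in plain `<tr>`/`<td>` markup: it does not handle nested tables or `colspan`.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into rows by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being filled, if any
        self._cell = None # text fragments of the open cell, if any

    def _flush_cell(self):
        # Close the open cell, collapsing runs of whitespace to single spaces.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._flush_cell()
        if self._row:  # skip rows with no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()     # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return out.getvalue()
```

Applied to a saved page, e.g. `table_to_csv(Path("page_0001.html").read_text(encoding="utf-8"))` with `pathlib.Path`, it returns CSV text ready to write to disk. Real pages often hold several tables, so in practice you would narrow the parser to the one you want.
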
