Scraping in 20 mins
Transcript

  • 1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • 3. Function (Parameters)
  • 4. Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
  • 5. ("string", index)
  • 6. Tip: search for documentation
  • 7. Tip: search for structure around data
  • 9. //div[starts-with(@class, 'jobWrap')]
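The XPath on slide 9 selects every div whose class attribute begins with "jobWrap". As a rough sketch of the same selection in Python: the standard library's ElementTree supports only a limited XPath subset without starts-with(), so the prefix test is done with str.startswith instead. The page fragment and the helper name divs_with_class_prefix are invented for illustration and are not from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment invented for this example.
PAGE = """<html><body>
  <div class="jobWrap odd"><h2>Reporter</h2></div>
  <div class="sidebar"><h2>Adverts</h2></div>
  <div class="jobWrapFeatured"><h2>Data editor</h2></div>
</body></html>"""


def divs_with_class_prefix(markup, prefix):
    """Return every <div> whose class attribute starts with prefix."""
    root = ET.fromstring(markup)
    # root.iter("div") walks all <div> elements in document order;
    # the starts-with test is applied in Python, not in XPath.
    return [div for div in root.iter("div")
            if div.get("class", "").startswith(prefix)]


job_titles = [div.find("h2").text
              for div in divs_with_class_prefix(PAGE, "jobWrap")]
print(job_titles)
```

Both "jobWrap odd" and "jobWrapFeatured" match because only the start of the class value is compared, which is why the slide uses starts-with() instead of an exact match.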
  • 10. bit.ly/nrwscraper2
  • 11. excelnotes.posterous.com: /tag/importxml and /tag/importhtml
  • 13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • 14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • 15. Things to know • Libraries • Functions • Variables • Lists or arrays ['Bob', 'Jane'] • Index • String, integer, float • If/Else • For loops • Operators
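The terms on slide 15 can all be seen in a few lines of Python. This is only an illustrative sketch: the function double and every variable except the ['Bob', 'Jane'] list are made up for the example.

```python
import math  # a library: ready-made code you bring in with "import"


def double(number):      # a function with one parameter
    return number * 2    # "*" is an operator


names = ['Bob', 'Jane']  # a list, stored in a variable
first = names[0]         # an index: counting starts at 0
count = 2                # an integer
price = 2.5              # a float
title = "Scraping"       # a string

# If/else: choose between two branches.
if count > 1:
    verdict = "more than one"
else:
    verdict = "one or fewer"

# For loop: run the indented line once per item in the list.
for name in names:
    print(name)

print(first, double(count), math.floor(price), title, verdict)
```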
  • 16. Following the data • From String (URL) -> • Variable (html) -> • Variable (root) -> • Variable containing a list (tds) -> • Variable (td)
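The chain on slide 16 can be sketched with the Python standard library. This is a minimal illustration under some assumptions: the URL and the page content are hypothetical, the page is canned so the example runs offline, and ElementTree stands in for whatever parser the linked tutorial uses. Real pages are rarely well-formed XML, so a more forgiving HTML parser is normally needed.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download a page and return its HTML as a string (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# String (URL): hypothetical address, not taken from the slides.
url = "http://example.com/table.html"

# Variable (html): a canned page so the sketch runs without a network;
# in a live scraper this line would be: html = fetch(url)
html = ("<html><body><table><tr>"
        "<td>Duarte</td><td>Sihl</td>"
        "</tr></table></body></html>")

root = ET.fromstring(html)    # Variable (root): the parsed document
tds = root.findall(".//td")   # Variable containing a list (tds)
for td in tds:                # Variable (td): one table cell at a time
    print(td.text)

cells = [td.text for td in tds]
```

Each step only changes the shape of the same data: text address, raw page text, parsed tree, list of cells, single cell.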
  • 17. Looping through a list • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul'] • for td in tds: • The first time, td = 'Duarte' • The second time, td = 'Sihl' • Then td = 'Franzi' • Then td = 'Paul' • Then it has finished the loop!
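The loop on slide 17 can be run as written once it is in Python's lowercase syntax. The seen list is an addition for this sketch, there only to record what td held on each pass.

```python
tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']

seen = []  # added for illustration: records each value td takes
for td in tds:
    # On each pass, td holds the next item from the list.
    print("td =", td)
    seen.append(td)

# Execution reaches this line only after the last item has been handled.
print("Finished the loop!")
```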
  • 19. Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist