Scraping in 20 mins

  1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  2. (no text)
  3. Function (Parameters)
  4. Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
  5. ("string", index)
  6. Tip: search for documentation
  7. Tip: search for structure around data
  8. (no text)
  9. //div[starts-with(@class, 'jobWrap')]
  10. bit.ly/nrwscraper2
  11. excelnotes.posterous.com/tag/importxml, excelnotes.posterous.com/tag/importhtml
  12. (no text)
  13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  15. Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
  16. Following the data
      • From String (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
  17. Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
  18. (no text)
  19. Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist
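
The chain in slides 16 and 17 (html -> root -> tds -> td) can be sketched in Python as below. This is only an illustration under stated assumptions, not the code from the talk: it uses the standard library's html.parser rather than whatever libraries the ScraperWiki tutorials rely on, and the sample page, the CellCollector class and the extract_cells function are all hypothetical names made up for this example. The HTML is an inline string so the sketch runs without fetching any URL.

```python
from html.parser import HTMLParser

# Hypothetical sample page, standing in for the HTML that downloading
# a real URL would return (no network access happens in this sketch).
html = """
<table>
  <tr><td>Duarte</td><td>Sihl</td></tr>
  <tr><td>Franzi</td><td>Paul</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.tds = []        # one string per <td> seen so far
        self._in_td = False  # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.tds.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_td:
            self.tds[-1] += data


def extract_cells(html):
    """html (string) -> root (parsed page) -> list of cell texts."""
    root = CellCollector()
    root.feed(html)
    return [td.strip() for td in root.tds]


tds = extract_cells(html)

# The loop from slide 17: td takes each value in tds in turn,
# and the loop ends once the list is exhausted.
for td in tds:
    print(td)
```

Each pass through the loop assigns the next item of tds to td, so the four names print one per line in the order they appear in the table.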