Scraping in 20 mins


Transcript

  • 1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • 3. Function (Parameters)
  • 4. Function (Parameters) • =SUM(A2:A50) • =AVERAGE(B2:B300) • =COUNTIF(A10:A3000,"Smith")
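
The "function, then parameters in brackets" shape on slide 4 carries over from spreadsheets to Python. The sketch below is only a rough analogue: the lists `column_a`, `column_b` and `names` are made-up stand-ins for cell ranges and do not come from the talk.

```python
# Hypothetical stand-ins for spreadsheet ranges such as A2:A50.
column_a = [3, 5, 8]
column_b = [10.0, 20.0, 60.0]
names = ["Smith", "Jones", "Smith"]

total = sum(column_a)                    # roughly =SUM(A2:A50)
average = sum(column_b) / len(column_b)  # roughly =AVERAGE(B2:B300)
smiths = names.count("Smith")            # roughly =COUNTIF(A10:A3000,"Smith")

print(total, average, smiths)
```

In each line the function name comes first and the parameters it works on sit inside the brackets, just as in the formulas.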
  • 5. ("string", index)
  • 6. Tip: search for documentation
  • 7. Tip: search for structure around data
  • 9. //div[starts-with(@class, 'jobWrap')]
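
The XPath on slide 9 selects every div whose class attribute begins with "jobWrap". The sketch below shows the same selection in Python's standard library. The page fragment is invented for illustration, since the real job listing page is not reproduced in the slides. ElementTree's limited XPath has no starts-with(), so the filtering is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment standing in for a job listing page.
PAGE = """
<html><body>
  <div class="jobWrap featured"><a href="/jobs/1">Reporter</a></div>
  <div class="sidebar">Advert</div>
  <div class="jobWrapAlt"><a href="/jobs/2">Sub-editor</a></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Keep only the divs whose class attribute starts with "jobWrap".
job_divs = [div for div in root.iter("div")
            if div.get("class", "").startswith("jobWrap")]

# Pull the link text out of each matching div.
titles = [div.find("a").text for div in job_divs]
print(titles)
```

Matching on the start of the class name catches variants such as "jobWrap featured" and "jobWrapAlt" while skipping unrelated divs like the sidebar.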
  • 10. bit.ly/nrwscraper2
  • 11. excelnotes.posterous.com/tag/importxml and excelnotes.posterous.com/tag/importhtml
  • 13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • 14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • 15. Things to know • Libraries • Functions • Variables • Lists or arrays ['Bob', 'Jane'] • Index • String, integer, float • If/Else • For loops • Operators
  • 16. Following the data • From string (URL) -> • Variable (html) -> • Variable (root) -> • Variable containing a list (tds) -> • Variable (td)
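
The chain on slide 16 can be sketched in Python as below. The variable names follow the slide, but everything else is an assumption: `fetch` is a hypothetical stand-in that returns canned markup instead of downloading anything, the URL is a placeholder, and the standard library's ElementTree is used here in place of whatever library the tutorial itself relies on.

```python
import xml.etree.ElementTree as ET

def fetch(url):
    """Hypothetical stand-in for a download; returns canned markup."""
    return "<table><tr><td>Duarte</td><td>Sihl</td></tr></table>"

url = "http://example.com/table"  # string: placeholder address
html = fetch(url)                  # variable: page source as one string
root = ET.fromstring(html)         # variable: parsed document tree
tds = root.findall(".//td")        # variable containing a list of cells

cells = []
for td in tds:                     # variable: one cell at a time
    cells.append(td.text)

print(cells)
```

Each step takes the previous variable as its input, which is why it helps to trace the data from the URL string down to a single cell.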
  • 17. Looping through a list • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul'] • for td in tds • The first time, td = Duarte • The second time, td = Sihl • Then td = Franzi • Then td = Paul • Then the loop has finished!
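
The loop on slide 17 written out as runnable Python, as a minimal sketch. The `seen` list is an addition for illustration only, recording the order in which `td` takes each value.

```python
tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']

seen = []  # illustrative extra: records each value td takes
for td in tds:
    # On each pass, td holds the next item from the list.
    print(td)
    seen.append(td)

# This line runs only once the list is exhausted.
print("Finished the loop")
```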
  • 19. Leanpub.com/scrapingforjournalists • @paulbradshaw • onlinejournalismblog.com • helpmeinvestigate.com • slideshare.net/onlinejournalist • linkedin.com/in/onlinejournalist
