Scraping in 20 mins

Transcript

  • 1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • 2.
  • 3. Function (Parameters)
  • 4. Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
  • 5. ("string", index)
  • 6. Tip: search for documentation
  • 7. Tip: search for structure around data
  • 8.
  • 9. //div[starts-with(@class, 'jobWrap')]
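The XPath on slide 9 selects every div whose class attribute begins with jobWrap. Below is a minimal Python sketch of the same selection, assuming a made-up HTML fragment (the sample markup and job titles are hypothetical, not from the talk). Python's built-in xml.etree.ElementTree does not support the starts-with() function, so the predicate is emulated here with str.startswith; a full XPath engine could evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a real jobs page.
SAMPLE = """
<html><body>
  <div class="jobWrap odd">Reporter</div>
  <div class="jobWrapFeatured">Data editor</div>
  <div class="sidebar">Advert</div>
</body></html>
"""


def divs_with_class_prefix(markup, prefix):
    """Return the text of every <div> whose class attribute starts with prefix."""
    root = ET.fromstring(markup.strip())
    return [
        div.text
        for div in root.iter("div")
        if div.get("class", "").startswith(prefix)
    ]


# Emulates //div[starts-with(@class, 'jobWrap')]
jobs = divs_with_class_prefix(SAMPLE, "jobWrap")
print(jobs)
```

Matching on the start of the class value catches both "jobWrap odd" and "jobWrapFeatured", which an exact match on the class would miss.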
  • 10. bit.ly/nrwscraper2
  • 11. excelnotes.posterous.com/tag/importxml and excelnotes.posterous.com/tag/importhtml
  • 12.
  • 13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • 14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • 15. Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
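Each item on slide 15 can be seen in a few lines of Python. This is an illustrative sketch only; the function name average and all the values are made up for the example.

```python
import math  # a library: ready-made code you import before using it


def average(numbers):  # a function that takes one parameter
    """Return the mean of a list of numbers."""
    return sum(numbers) / len(numbers)


names = ["Bob", "Jane"]  # a variable holding a list (array)
first = names[0]         # an index: position 0 is the first item
count = 2                # an integer
ratio = 0.5              # a float
label = "reporters"      # a string

# Operators combine (+) or compare (>) values.
total = count + 3

# If/else picks one of two branches.
if total > 4:
    size = "big"
else:
    size = "small"

# A for loop visits each item in the list in turn.
for name in names:
    print(name, len(name))

print(first, total, size, ratio, label)
print(average([1, 2, 3]), math.floor(2.7))
```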
  • 16. Following the data
      • From String (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
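The chain on slide 16 can be sketched in Python. This version assumes a hard-coded HTML string in place of a real download, so that it runs offline. The URL, the table contents, and the use of the standard library's xml.etree.ElementTree are illustrative assumptions, not the code from the ScraperWiki tutorial.

```python
import xml.etree.ElementTree as ET

# String (URL): where the page would come from.
# Hypothetical address; it is never fetched in this sketch.
url = "http://example.com/table.html"

# Variable (html): the page source as one long string.
# A real scraper would download this from url; here it is hard-coded.
html = (
    "<html><body><table><tr>"
    "<td>Duarte</td><td>Sihl</td>"
    "</tr></table></body></html>"
)

# Variable (root): the string parsed into a tree of elements.
root = ET.fromstring(html)

# Variable containing a list (tds): every table cell found in the tree.
tds = list(root.iter("td"))

# Variable (td): one cell at a time, taken from the list.
cells = []
for td in tds:
    cells.append(td.text)

print(cells)
```

Each step turns the previous value into something more specific: an address, then raw text, then a parsed tree, then the list of cells, then a single cell.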
  • 17. Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds:
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
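The loop on slide 17 runs as written once it is typed as Python. A minimal sketch, in which the seen list is an addition for illustration so the order of visits can be checked:

```python
tds = ["Duarte", "Sihl", "Franzi", "Paul"]

seen = []  # hypothetical helper list recording each value td takes
for td in tds:
    # On each pass, td holds the next item from the list.
    seen.append(td)
    print(td)

# Execution reaches this line only after the last item has been visited.
print("finished the loop")
```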
  • 18.
  • 19. Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist
