Scraping in 20 mins

Transcript

  • 1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • 2.
  • 3. Function (Parameters)
  • 4. Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
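
The spreadsheet formulas on this slide follow a name-then-parameters pattern that carries over to Python. A minimal sketch of the same three calculations, with made-up values standing in for the spreadsheet ranges (the variable names are hypothetical, not from the slides):

```python
from statistics import mean

# Hypothetical data standing in for the spreadsheet ranges on the slide.
amounts = [10, 20, 30, 40]
surnames = ["Smith", "Jones", "Smith", "Patel"]

total = sum(amounts)               # like =SUM(A2:A50)
average = mean(amounts)            # like =AVERAGE(B2:B300)
smiths = surnames.count("Smith")   # like =COUNTIF(A10:A3000,"Smith"); a list method

print(total, average, smiths)
```

In each case the name says what to do and the parameters in parentheses say what to do it to.
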
  • 5. ("string", index)
  • 6. Tip: search for documentation
  • 7. Tip: search for structure around data
  • 8.
  • 9. //div[starts-with(@class, 'jobWrap')]
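
The XPath on this slide selects every div element whose class attribute begins with jobWrap. A minimal sketch of that same matching rule in Python, using only the standard library; the sample markup and the JobDivFinder name are hypothetical, and only the jobWrap prefix comes from the slide:

```python
from html.parser import HTMLParser

# Made-up markup for illustration; only the 'jobWrap' prefix comes from the slide.
SAMPLE_HTML = """
<div class="jobWrap odd"><a href="/job/1">Reporter</a></div>
<div class="sidebar">Not a job</div>
<div class="jobWrapFeatured"><a href="/job/2">Editor</a></div>
"""


class JobDivFinder(HTMLParser):
    """Collect the class attribute of every div whose class starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        class_value = dict(attrs).get("class") or ""
        # Same test as starts-with(@class, 'jobWrap') in the XPath.
        if class_value.startswith(self.prefix):
            self.matches.append(class_value)


finder = JobDivFinder("jobWrap")
finder.feed(SAMPLE_HTML)
print(finder.matches)
```

Matching on the start of the class value is useful when a site appends extra words to the class name, as the two matching divs in the sample do.
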
  • 10. bit.ly/nrwscraper2
  • 11. excelnotes.posterous.com/tag/importxml and excelnotes.posterous.com/tag/importhtml
  • 12.
  • 13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • 14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • 15. Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
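
The terms on this slide can all be seen in a few lines of Python. This is an illustrative sketch only; the describe function and the variable names are hypothetical, and just the ['Bob', 'Jane'] list comes from the slide:

```python
import math  # a library: code someone else wrote that you import


def describe(value):
    """A function: takes a parameter and returns a result."""
    if isinstance(value, str):      # if/else chooses between branches
        return "string"
    elif isinstance(value, int):
        return "integer"
    else:
        return "float"


names = ['Bob', 'Jane']             # a variable holding a list
first = names[0]                    # index 0 is the first item
count = len(names) + 1              # an operator (+) applied to integers
half = count / 2                    # division gives a float

for item in [first, count, half]:   # a for loop visits each item in turn
    print(item, describe(item))

print(math.floor(half))             # calling a function from the library
```
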
  • 16. Following the data
      • From string (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
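
The chain on this slide, from URL to html to root to tds to td, can be sketched with Python's standard library. This is a sketch under assumptions, not the scraper from the talk: the URL is hypothetical and never fetched, a small well-formed sample is inlined in place of a downloaded page, and the standard XML parser stands in for whatever HTML parser a real scraper would use.

```python
import xml.etree.ElementTree as ET

# Assumption: a real scraper would download the page from the URL; a small,
# well-formed sample is inlined here so the sketch runs offline.
url = "http://example.com/table.html"   # string (hypothetical, never fetched)
html = """<html><body><table>
<tr><td>Duarte</td><td>Sihl</td></tr>
<tr><td>Franzi</td><td>Paul</td></tr>
</table></body></html>"""               # variable (html): the page source

root = ET.fromstring(html)              # variable (root): the parsed document
tds = list(root.iter("td"))             # variable containing a list (tds)

for td in tds:                          # variable (td): one cell at a time
    print(td.text)
```

ElementTree only accepts well-formed markup, so real-world HTML usually needs a more forgiving HTML parser at the root step.
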
  • 17. Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds:
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
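
The loop described on this slide can be run as written. A minimal sketch; the visited list is a hypothetical addition to record the order in which items are seen:

```python
tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']

visited = []
for td in tds:
    # td takes each value in order: Duarte, Sihl, Franzi, then Paul.
    visited.append(td)
    print(td)

# Reaching this line means the loop has finished.
print("Finished after", len(visited), "items")
```
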
  • 18.
  • 19. Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist