Scraping in 20 mins

Usage Rights: CC Attribution-NonCommercial License


Scraping in 20 mins Presentation Transcript

  • 1. Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • 3. Function (Parameters)
  • 4. Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
  • 5. ("string", index)
  • 6. Tip: search for documentation
  • 7. Tip: search for structure around data
  • 9. //div[starts-with(@class, 'jobWrap')]
  • 10. bit.ly/nrwscraper2
  • 11. excelnotes.posterous.com /tag/importxml /tag/importhtml
  • 13. https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • 14. https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • 15. Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays: ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
  • 16. Following the data
      • From string (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
  • 17. Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds:
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
  • 19. Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist
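
The XPath expression on slide 9 selects every div whose class attribute begins with "jobWrap". The sketch below imitates that selection using only Python's standard library, because the standard library's own XPath support does not include starts-with(). The sample HTML and the PrefixDivCollector class are invented for illustration, and the sketch is simplified: a div nested inside a matching div is treated as part of the outer match rather than as a match of its own.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: only the 'jobWrap' prefix comes from the slide.
SAMPLE_HTML = """
<div class="jobWrap featured"><h2>Reporter</h2></div>
<div class="sidebar"><h2>Adverts</h2></div>
<div class="jobWrapAlt"><h2>Sub-editor</h2></div>
"""


class PrefixDivCollector(HTMLParser):
    """Collects the text inside each <div> whose class starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.depth = 0      # nesting depth of <div>s inside the current match
        self.matches = []   # one text string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # Already inside a match: just track the nesting.
            self.depth += 1
        elif (dict(attrs).get("class") or "").startswith(self.prefix):
            # Same test as starts-with(@class, 'jobWrap') in the XPath.
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.matches[-1] += data.strip()


collector = PrefixDivCollector("jobWrap")
collector.feed(SAMPLE_HTML)
job_titles = collector.matches
print(job_titles)
```

The "sidebar" div is skipped because its class does not begin with the prefix, which is the point of the tip on slide 7: find the structure that wraps the data you want.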
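
The "Things to know" on slide 15 can all be seen in a few lines of Python. This is a minimal sketch with made-up values and a hypothetical describe() function, not code from the talk.

```python
import math  # a library: ready-made code brought in with import


def describe(value):
    """A function: takes a parameter and returns a result."""
    # If/else: choose a label depending on the type of the value.
    if isinstance(value, str):
        return "string"
    elif isinstance(value, int):
        return "integer"
    else:
        return "float"


names = ['Bob', 'Jane']   # a list, stored in a variable
first = names[0]          # an index: counting starts at 0, so this is 'Bob'
count = 2 + 3             # an operator (+) producing an integer
average = count / 2       # division produces a float
rounded_down = math.floor(average)  # calling a function from the library

labels = []
for item in [first, count, average]:  # a for loop over a list
    labels.append(describe(item))

print(labels)
```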
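
Slide 16's chain of variables (URL, html, root, tds, td) can be followed in a short script. The sketch below keeps the slide's variable names but makes two substitutions that should be read as assumptions: the page is hard-coded rather than downloaded, and it is parsed with the standard library's xml.etree.ElementTree, which only works here because the sample markup is well formed. The ScraperWiki tutorial linked on slide 14 may use different libraries.

```python
import xml.etree.ElementTree as ET

# A string holding the address. Placeholder only: nothing is fetched here.
url = "http://example.com/table.html"

# html: the page source as one long string.
# A real scraper would download this from url; the sample is made up.
html = """
<html><body><table>
  <tr><td>Duarte</td><td>Sihl</td></tr>
  <tr><td>Franzi</td><td>Paul</td></tr>
</table></body></html>
""".strip()

# root: the string parsed into a tree that can be searched.
root = ET.fromstring(html)

# tds: a variable containing a list of every <td> element in the tree.
tds = root.findall(".//td")

# td: one element from the list at a time.
cells = []
for td in tds:
    cells.append(td.text)

print(cells)
```

Each step hands its result to the next, so when a scraper fails it helps to print the variable at each stage and see where the data stops looking right.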
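
The loop described on slide 17 looks like this as runnable Python. The list and the variable names td and tds come from the slide; the seen list is added only so the order of visits can be checked afterwards.

```python
tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']

seen = []  # records each value td takes, in order
for td in tds:
    # The first time round td is 'Duarte', the second time 'Sihl', and so on.
    print(td)
    seen.append(td)

# Execution only reaches this line once every item has been visited.
print("Finished the loop!")
```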