Scraping in 20 mins


Usage Rights: CC Attribution-NonCommercial License


Scraping in 20 mins: Presentation Transcript

  • Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • Function (Parameters)
  • Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
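The same function(parameters) pattern carries over from spreadsheets to code. A minimal Python sketch, assuming made-up sample values in place of the spreadsheet ranges (the data and variable names are illustrative and do not come from the slides):

```python
# Sample data standing in for spreadsheet columns (made up for illustration).
amounts = [120, 80, 100]                 # plays the role of a range such as A2:A50
surnames = ["Smith", "Jones", "Smith"]   # plays the role of A10:A3000

# Function name first, then the parameters in brackets, as in =SUM(A2:A50).
total = sum(amounts)                     # comparable to =SUM(...)
mean = sum(amounts) / len(amounts)       # comparable to =AVERAGE(...)
smiths = surnames.count("Smith")         # comparable to =COUNTIF(..., "Smith")

print(total, mean, smiths)
```

With this sample data the script prints `300 100.0 2`.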
  • ("string", index)
  • Tip: search for documentation
  • Tip: search for structure around data
  • //div[starts-with(@class, 'jobWrap')]
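To make clear what that XPath expression selects, here is a rough sketch that applies the same rule (every div whose class attribute begins with "jobWrap") using only Python's standard library. The `JobWrapFinder` class and the sample HTML are hypothetical, written for this illustration; it is not an XPath engine, only the same test spelled out by hand.

```python
from html.parser import HTMLParser


class JobWrapFinder(HTMLParser):
    """Collects the class attribute of every <div> whose class starts with 'jobWrap'."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":                         # the //div part of the expression
            return
        css_class = dict(attrs).get("class") or ""
        if css_class.startswith("jobWrap"):      # the starts-with(@class, ...) part
            self.matches.append(css_class)


# Made-up markup: two matching divs, one non-matching div, one wrong tag.
sample_html = """
<div class="jobWrap odd"><a href="/job/1">Reporter</a></div>
<div class="jobWrapFeatured"><a href="/job/2">Editor</a></div>
<div class="sidebar">Not a job</div>
<span class="jobWrap">Wrong tag</span>
"""

finder = JobWrapFinder()
finder.feed(sample_html)
print(finder.matches)
```

The first two divs match; the sidebar div fails the starts-with test and the span fails the tag test.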
  • bit.ly/nrwscraper2
  • excelnotes.posterous.com/tag/importxml and excelnotes.posterous.com/tag/importhtml
  • https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays: ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
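Each term on the "Things to know" slide can be seen in a few lines of Python. This is an illustrative sketch only: the `average` function and all the values are made up, apart from the ['Bob', 'Jane'] list taken from the slide.

```python
import math                              # library: ready-made code you can import


def average(numbers):                    # function: a named, reusable piece of code
    return sum(numbers) / len(numbers)


names = ["Bob", "Jane"]                  # variable holding a list (array) of strings
first = names[0]                         # index: counting starts at 0, so this is "Bob"

minutes = 20                             # integer: a whole number
half = minutes / 2                       # float: a number with a decimal point (10.0)

for name in names:                       # for loop: runs once per item in the list
    if name == first:                    # if/else, using the comparison operator ==
        print(name + " is first")        # the + operator joins two strings
    else:
        print(name + " is not first")

print(math.floor(average([1, 2, 4])))    # 7 / 3 rounded down by a library function
```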
  • Following the data
      • From string (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
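The chain on the "Following the data" slide can be traced in a short script. This sketch keeps the slide's variable names but makes two substitutions so that it runs on its own: an inline HTML string stands in for a fetched page, and the standard library's XML parser stands in for a scraping library. The URL and the table contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# String (URL): shown for orientation only; this hypothetical address is never fetched.
url = "http://example.com/table.html"

# Variable (html): the page source as one long string (inline stand-in).
html = "<html><body><table><tr><td>Duarte</td><td>Sihl</td></tr></table></body></html>"

# Variable (root): the string parsed into a tree that can be searched.
root = ET.fromstring(html)

# Variable containing a list (tds): every <td> element found in the tree.
tds = root.findall(".//td")

# Variable (td): one element at a time, taken from the list.
for td in tds:
    print(td.text)
```

The XML parser only accepts well-formed markup, so it works on this tidy sample; real pages are often messier and usually need a more tolerant HTML parser.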
  • Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds:
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
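The loop described on this slide, written out as runnable Python. The list comes from the slide; `enumerate` and the `pass_number` name are additions here, used only to show which pass of the loop is running.

```python
tds = ["Duarte", "Sihl", "Franzi", "Paul"]

# On each pass, td takes the next value from the list.
for pass_number, td in enumerate(tds, start=1):
    print(pass_number, td)

# Execution reaches this line only after the last item has been handled.
print("Finished the loop")
```

It prints one numbered line per name, from `1 Duarte` to `4 Paul`, and then `Finished the loop`.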
  • Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist