Scraping in 20 mins

Usage Rights

CC Attribution-NonCommercial License

Presentation Transcript

  • Scraping in 20 mins. Paul Bradshaw. Leanpub.com/scrapingforjournalists. Friday, 13 July 2012
  • Function (Parameters)
  • Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
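
The spreadsheet formulas on this slide follow the same function(parameters) pattern that Python uses. A minimal sketch of the parallel, assuming some made-up sample values in place of the cell ranges (the data and variable names are illustrative, not from the slides):

```python
from statistics import mean

# Hypothetical sample data standing in for the spreadsheet ranges.
amounts = [10, 20, 30, 40]                # plays the role of A2:A50
surnames = ["Smith", "Jones", "Smith"]    # plays the role of A10:A3000

# Each call is a function name followed by its parameters in brackets.
total = sum(amounts)                      # like =SUM(A2:A50)
average = mean(amounts)                   # like =AVERAGE(B2:B300)
smith_count = surnames.count("Smith")     # like =COUNTIF(A10:A3000,"Smith")

print(total, average, smith_count)
```
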
  • ("string", index)
  • Tip: search for documentation
  • Tip: search for structure around data
  • //div[starts-with(@class, 'jobWrap')]
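
The XPath expression on this slide selects every div whose class attribute begins with jobWrap. Python's standard-library ElementTree does not support the starts-with() function, so the sketch below reproduces the same test with str.startswith instead. The HTML snippet is invented for illustration and is deliberately well-formed; a real page would probably need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed snippet (an assumption, not taken from the slides).
page = """
<html><body>
  <div class="jobWrap featured">Reporter</div>
  <div class="jobWrapper">Data editor</div>
  <div class="sidebar">Advert</div>
</body></html>
"""

root = ET.fromstring(page)

# Same idea as //div[starts-with(@class, 'jobWrap')]:
# visit every div and keep those whose class attribute begins with "jobWrap".
jobs = [
    div.text
    for div in root.iter("div")
    if div.get("class", "").startswith("jobWrap")
]

print(jobs)
```

Matching on the start of the class name picks up both "jobWrap featured" and "jobWrapper", which is why this kind of test is handy when a site adds extra class names to the elements that wrap the data.
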
  • bit.ly/nrwscraper2
  • excelnotes.posterous.com (/tag/importxml, /tag/importhtml)
  • https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • Things to know
      • Libraries
      • Functions
      • Variables
      • Lists or arrays: ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
  • Following the data
      • From string (URL) ->
      • Variable (html) ->
      • Variable (root) ->
      • Variable containing a list (tds) ->
      • Variable (td)
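
The chain on this slide can be sketched in Python using only the standard library. The slides do not show the scraper's code, so this is an illustration under assumptions: the URL is hypothetical and never fetched, and an inline string stands in for the downloaded page so the example runs offline.

```python
import xml.etree.ElementTree as ET

# String (URL): a hypothetical address, shown only to mark the starting point.
url = "http://example.com/table.html"

# Variable (html): in a live scraper this text would be downloaded from url.
html = "<html><body><table><tr><td>Duarte</td><td>Sihl</td></tr></table></body></html>"

# Variable (root): the parsed document tree.
root = ET.fromstring(html)

# Variable containing a list (tds): every <td> element in the tree.
tds = list(root.iter("td"))

# Variable (td): one table cell at a time.
for td in tds:
    print(td.text)
```
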
  • Looping through a list
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
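
Written out as runnable Python, the loop described on this slide might look like the sketch below. The seen list is an addition for illustration, recording the value td takes on each pass.

```python
tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']

seen = []  # illustrative helper: the values td takes, in order
for td in tds:
    # On each pass, td holds the next item from the list.
    seen.append(td)
    print("td =", td)

# Execution reaches this line once every item has been visited.
print("Finished the loop")
```
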
  • Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist