Scraping in 20 mins
 

Usage Rights: CC Attribution-NonCommercial License


Presentation Transcript

  • Scraping in 20 mins, Paul Bradshaw, Leanpub.com/scrapingforjournalists (Friday, 13 July 2012)
  • Function (Parameters): =SUM(A2:A50), =AVERAGE(B2:B300), =COUNTIF(A10:A3000,"Smith")
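The function(parameters) pattern in these spreadsheet formulas carries over directly to code. A minimal Python sketch of the same three calls follows; the sample values and the names `values` and `surnames` are made up for illustration and stand in for the cell ranges.

```python
from statistics import mean

# Hypothetical sample data standing in for spreadsheet ranges
values = [4, 8, 15, 16, 23, 42]
surnames = ["Smith", "Jones", "Smith", "Patel"]

total = sum(values)                    # like =SUM(A2:A50)
average = mean(values)                 # like =AVERAGE(B2:B300)
smith_count = surnames.count("Smith")  # like =COUNTIF(A10:A3000,"Smith")

print(total, average, smith_count)
```

In each case the function name comes first and the parameters go inside the brackets, just as in the formulas.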
  • ("string", index)
  • Tip: search for documentation
  • Tip: search for structure around data
  • //div[starts-with(@class, 'jobWrap')]
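The XPath expression above selects every div whose class attribute begins with "jobWrap". The sketch below shows the same selection in Python using only the standard library. ElementTree's XPath subset does not include starts-with(), so the test is applied in Python instead, which is a substitute technique rather than the XPath query itself. The HTML fragment and its class names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the class names are made up for illustration
html = """
<body>
  <div class="jobWrap odd"><a href="/jobs/1">Reporter</a></div>
  <div class="sidebar">Adverts</div>
  <div class="jobWrapFeatured"><a href="/jobs/2">Data editor</a></div>
</body>
"""

root = ET.fromstring(html)

# ElementTree's XPath subset has no starts-with(), so select every div
# and apply the same "class begins with jobWrap" test in Python.
job_divs = [
    div for div in root.iter("div")
    if div.get("class", "").startswith("jobWrap")
]

job_titles = [div.find("a").text for div in job_divs]
print(job_titles)
```

Matching on the start of the class name catches variants such as "jobWrap odd" while skipping unrelated divs like the sidebar.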
  • bit.ly/nrwscraper2
  • excelnotes.posterous.com: /tag/importxml, /tag/importhtml
  • https://scraperwiki.com/scrapers/basic_twitter_scraper/
  • https://scraperwiki.com/docs/python/tutorials/ - Screen Scraper 2
  • Things to know:
      • Libraries
      • Functions
      • Variables
      • Lists or arrays: ['Bob', 'Jane']
      • Index
      • String, integer, float
      • If/Else
      • For loops
      • Operators
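Each item in the list above can be seen in a few lines of Python. This is only an illustrative sketch: apart from the ['Bob', 'Jane'] list, the names and values (`area`, `count`, `price`, `label`) are made up.

```python
import math  # library: ready-made code you import

def area(radius):                 # function with one parameter
    return math.pi * radius ** 2  # operators: * and **

names = ["Bob", "Jane"]  # list (array) of strings
first = names[0]         # index: counting starts at 0
count = 2                # integer
price = 9.5              # float

if count > 1:            # if/else
    label = "several"
else:
    label = "one"

lengths = []
for name in names:       # for loop: runs once per item in the list
    lengths.append(len(name))

print(first, label, lengths)
```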
  • Following the data: String (URL) -> Variable (html) -> Variable (root) -> Variable containing a list (tds) -> Variable (td)
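The chain of variables above can be sketched in Python as follows. This is a stand-in under stated assumptions, not the code from the linked tutorial: it uses only the standard library, the URL is hypothetical and never fetched, and a short hard-coded string takes the place of the downloaded page.

```python
import xml.etree.ElementTree as ET

# String (URL): hypothetical address, not fetched in this sketch
url = "http://example.com/table.html"

# Variable (html): in a real scraper this would be the downloaded page;
# a small stand-in string keeps the sketch self-contained
html = "<table><tr><td>Duarte</td><td>Sihl</td></tr></table>"

# Variable (root): the parsed document tree
root = ET.fromstring(html)

# Variable containing a list (tds): every table cell in the tree
tds = list(root.iter("td"))

# Variable (td): one cell at a time
cell_texts = []
for td in tds:
    cell_texts.append(td.text)

print(cell_texts)
```

Each step hands its result to the next, so when a scraper breaks it helps to print each variable in turn and see where the data stops looking right.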
  • Looping through a list:
      • tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
      • for td in tds:
      • The first time, td = Duarte
      • The second time, td = Sihl
      • Then td = Franzi
      • Then td = Paul
      • Then it has finished the loop!
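The walkthrough above corresponds to this short runnable Python loop. The `seen` list is an addition for illustration, recording the value `td` takes on each pass.

```python
tds = ["Duarte", "Sihl", "Franzi", "Paul"]

seen = []
for td in tds:
    # td holds one item per pass: Duarte, then Sihl, then Franzi, then Paul
    seen.append(td)
    print(td)

# The loop ends once every item has been visited
print("Finished the loop")
```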
  • Leanpub.com/scrapingforjournalists, @paulbradshaw, onlinejournalismblog.com, helpmeinvestigate.com, slideshare.net/onlinejournalist, linkedin.com/in/onlinejournalist