Scraping in 60 minutes

Presentation at Data Harvest 2014

Published in: Education, Technology

Transcript

  • 1. Paul Bradshaw Leanpub.com/scrapingforjournalists* Scraping in 60 mins Saturday, 10 May 14
  • 2. https://www.youtube.com/watch?v=Efr-VEkwWoM
  • 3–11. [no text on slide]
  • 12. Function (arguments) (aka parameters)
  • 13. Function (arguments): =SUM(A2:A50); =AVERAGE(B2:B300); =COUNTIF(A10:A3000,"Smith")
  • 14. Function (parameters): =SUM(range of cells to be summed); =AVERAGE(range of cells to be averaged); =COUNTIF(range of cells to be counted, what to count)
  • 15. ("string", index)
  • 16. Tip: search for documentation
  • 17. Variable
  • 18. Variables
  • 19. Jargon checklist: function, arguments, parameters, string, index, variable, documentation
  • 20. Vote: =importXML or Python?
  • 21. Another function?
  • 22. Search for documentation! https://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
  • 23. Query (XPath)
  • 24. XPath is a path through XML (or HTML):
        <table> = //table
        <table><tr> = //table//tr
        <table><tr><td> = //table//tr//td
  • 25. Search for documentation! http://www.w3schools.com/XPath/xpath_syntax.asp
  • 26. Tip: search for structure around data. http://www.w4mpjobs.org/SearchJobs.aspx?search=alljobs
  • 27. http:// www.w4mpjobs.org/ SearchJobs.aspx? http://www.w4mpjobs.org/SearchJobs.aspx?search=alljobs
  • 28. [no text on slide]
  • 29. "//div[@class='leftcolumn']"
  • 30. //div[starts-with(@class, 'jobWrap')]
  • 31. A crib sheet:
  • 32. Chrome extension:
  • 33. [no text on slide]
  • 34. #!/usr/bin/env python
        import scraperwiki
        Annotations: 'This is a Python script' (shebang); import the Scraperwiki library
  • 35. #!/usr/bin/env python
        import scraperwiki
        html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
        print html
        Annotation: Function (argument)
  • 36. #!/usr/bin/env python
        import scraperwiki
        html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
        print html
        Annotation: Comes from Scraperwiki library (check documentation)
  • 37. #!/usr/bin/env python
        import scraperwiki
        html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
        print html
        Annotations: Variable (assigned with = sign); statement used to show variable
  • 38. #!/usr/bin/env python
        import scraperwiki
        html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
        print html
  • 39. Jargon checklist: library, shebang, list
  • 40. Paul Bradshaw, Leanpub.com/scrapingforjournalists. Thank you.
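
Slides 24, 29 and 30 show XPath queries in isolation. As a rough sketch of how the same queries could be tried in Python, the block below runs them with the standard library's xml.etree.ElementTree against a small made-up HTML fragment. The markup, job titles and table values are invented for illustration and are not taken from w4mpjobs.org. ElementTree supports only a subset of XPath and has no starts-with(), so that query is approximated here with an ordinary Python string check; a fuller XPath engine such as lxml would accept the slide's expression as written.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<html><body>
  <div class="leftcolumn">
    <div class="jobWrap odd"><a href="/job/1">Researcher</a></div>
    <div class="jobWrap even"><a href="/job/2">Caseworker</a></div>
    <div class="footer">Contact us</div>
  </div>
  <table>
    <tr><td>Name</td><td>Score</td></tr>
    <tr><td>Example</td><td>42</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(SAMPLE_HTML)

# Slide 24: //table//tr//td (ElementTree wants a leading "." for a relative path).
cells = [td.text for td in root.findall(".//table//tr//td")]

# Slide 29: //div[@class='leftcolumn'] matches the exact class value.
left_columns = root.findall(".//div[@class='leftcolumn']")

# Slide 30: //div[starts-with(@class, 'jobWrap')] approximated in plain Python,
# because ElementTree's XPath subset has no starts-with() function.
job_divs = [d for d in root.iter("div") if d.get("class", "").startswith("jobWrap")]
job_titles = [d.find("a").text for d in job_divs]

print(cells)
print(len(left_columns))
print(job_titles)
```

The starts-with pattern is useful when a class name carries a varying suffix (here "jobWrap odd" and "jobWrap even"), which an exact @class match would miss.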

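The spreadsheet ideas on slides 12 to 19 (function, arguments, string, index, variable) carry over to Python more or less directly. A minimal sketch using only built-in functions; the sample data and variable names are invented for illustration:

```python
# Invented sample data standing in for spreadsheet columns A and B.
surnames = ["Smith", "Jones", "Smith", "Patel"]  # a variable holding a list of strings
amounts = [10, 20, 30, 40]                       # a variable holding a list of numbers

total = sum(amounts)                    # like =SUM(B2:B5): function(argument)
average = sum(amounts) / len(amounts)   # like =AVERAGE(B2:B5)
smiths = surnames.count("Smith")        # like =COUNTIF(A2:A5,"Smith")

second = surnames[1]      # index 1 is the second item, because Python counts from 0
initial = surnames[0][0]  # an index also works on a string: first character of "Smith"

print(total, average, smiths, second, initial)
```

As on slide 16, the quickest way to learn what arguments a function such as sum or count accepts is to search for its documentation.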