Scraping in 60 minutes (CIJ Summer School 2019)

Workshop at the Centre for Investigative Journalism Summer School, July 2019, introducing useful tools for scraping database search results and Twitter.

  1. Scraping in 60 mins. Paul Bradshaw, Leanpub.com/scrapingforjournalists
  2. How do you scrape? Aron Pilhofer, News Rewired
  3. Scraping tools. WYSIWYG tools: OutWit Hub, Apify. Browser extensions: Web Scraper, Grepsr. Google Sheets' =IMPORT functions. Workbench Data, IFTTT, OpenRefine. Morph.io
  4. OutWit Hub
  5. Chrome extensions
  6. Edit column > Add column by fetching URLs…
  7. https://ifttt.com/channels
  8. https://apify.com/apify/google-search-scraper
  9. https://app.workbenchdata.com/workflows/
  10. app.workbenchdata.com/workflows/22852, /22850, /25739
  11. https://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
  12. Robots.txt: http://www.tcij.org/robots.txt
  13. Legal considerations: database rights, data copyright, terms & conditions
  14. https://moveplanner.zoopla.co.uk/terms-and-conditions
  15. Treat like any source: build in TGTBT (too good to be true) checks; seek second sources; seek right of reply/confirmation; data is just a lead
  16. http://www.storybench.org/to-scrape-or-not-to-scrape-the-technical-and-ethical-challenges-of-collecting-data-off-the-web/
  17. Does it have an API? https://www.mediawiki.org/wiki/API:Main_page
  18. https://github.com/BBC-Data-Unit/music-festivals
  19. Thank you. Paul Bradshaw, Leanpub.com/scrapingforjournalists
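
The robots.txt slide points at a file that tells automated clients which paths a site would prefer they leave alone. As a rough illustration only (not something shown in the workshop), the check can be sketched with Python's standard-library urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT, the user agent name "example-scraper", the example.org URLs and the helper names build_parser and may_fetch are all assumptions made up for this sketch; they are not the contents of any real site's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded or pasted in."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str,
              user_agent: str = "example-scraper") -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A path the sample rules do not mention.
print(may_fetch(parser, "https://example.org/search?q=test"))

# A path under the disallowed /private/ prefix.
print(may_fetch(parser, "https://example.org/private/report.html"))

# The requested pause between requests, if the file declares one.
print(parser.crawl_delay("example-scraper"))
```

To check a live site instead of pasted text, RobotFileParser also offers set_url() followed by read(), which downloads the file before can_fetch() is called. A permissive robots.txt does not settle the legal questions listed in the slides (database rights, copyright, terms and conditions), so this check is a first step rather than a clearance.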
