
Capturing tweets with twarc



Sean Volke (@snailx), Online Resources Specialist Librarian, on local collecting. Presented as part of Digital collecting for NSW public library staff, 27 May 2019.


  1. Sean Volke, Online Resources Specialist Librarian, State Library of NSW (@snailx), 27 May 2019
  2. Tools for capturing tweets
     • Vizie: ongoing, large captures; developed by Data61/CSIRO
     • twarc: retrospective captures; developed by Documenting the Now
  4. The Rise of Eggboy
     • Saturday 16 March 2019: a teenager smashed an egg on the head of federal Senator Fraser Anning
     • The story made headlines around the world, including the ABC, SMH, Junkee, the Washington Post and the Daily Mail
     • It happened on a weekend, so it wasn't added to Vizie until Monday
     • twarc uses the Twitter Search API, which returns tweets from roughly the previous week, so the weekend's tweets could still be captured retrospectively
  5. Running a search (twarc writes one JSON-encoded tweet per line):
     twarc search eggboy > eggboy.jsonl
  6. File sizes and line counts (image)
  7. Dehydrating (keeping only the tweet IDs) and rehydrating (fetching the full tweets again from the API):
     twarc dehydrate eggboy.jsonl > eggboy_IDs.txt
     twarc hydrate eggboy_IDs.txt > eggboy_restored.jsonl
  8. Making a word cloud:
     wordcloud.py eggboy.jsonl > eggboy_wordcloud.html
  9. twarc utilities
     • wall.py: creates an HTML wall of tweets
     • network.py: creates a network dataset for visualisation tools
     • geojson.py: generates GeoJSON from geotagged tweets
     • filter_date.py: filters tweets by date
     • gender.py: estimates user gender (a rough guess)
  10. Links
     • twarc: https://github.com/DocNow/twarc
     • Webrecorder: https://webrecorder.io/
     • youtube-dl: https://github.com/ytdl-org/youtube-dl
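
The timing point on slide 4 reduces to simple date arithmetic. This is a minimal sketch that assumes a fixed seven-day look-back for the standard Search API behind twarc's search command. The real cutoff is only approximate, and `still_searchable` is a hypothetical helper rather than part of twarc.

```python
from datetime import date, timedelta

# Assumption: the standard Twitter Search API only reaches back about seven
# days. The real limit is approximate, so treat this as a rule of thumb.
SEARCH_WINDOW_DAYS = 7


def still_searchable(event_day: date, capture_day: date,
                     window_days: int = SEARCH_WINDOW_DAYS) -> bool:
    """Hypothetical helper: True if event_day lies inside the look-back
    window that ends on capture_day."""
    earliest = capture_day - timedelta(days=window_days)
    return earliest <= event_day <= capture_day


if __name__ == "__main__":
    egg_day = date(2019, 3, 16)  # Saturday of the Eggboy incident
    print(still_searchable(egg_day, date(2019, 3, 18)))  # Monday after: True
    print(still_searchable(egg_day, date(2019, 3, 25)))  # nine days on: False
```

Under this assumption, a capture run on the Monday still covered the Saturday. A capture run more than a week later would not.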
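
Because the search on slide 5 writes one JSON-encoded tweet per line, the capture can be processed with nothing more than Python's standard library. This is a minimal sketch that assumes Twitter API v1.1 tweet JSON, as twarc produced in 2019. `read_tweets` and `count_retweets` are hypothetical helpers.

```python
import json
from typing import Iterator, Tuple


def read_tweets(path: str) -> Iterator[dict]:
    """Yield one tweet (a dict) per non-blank line of a .jsonl capture."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def count_retweets(path: str) -> Tuple[int, int]:
    """Return (originals, retweets). In v1.1 tweet JSON a retweet carries
    the tweet it retweets under the 'retweeted_status' key."""
    originals = retweets = 0
    for tweet in read_tweets(path):
        if "retweeted_status" in tweet:
            retweets += 1
        else:
            originals += 1
    return originals, retweets
```

For example, `count_retweets("eggboy.jsonl")` would show how much of the conversation was retweets rather than original posts. Reading line by line keeps memory use flat even for large captures.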
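
Slide 6 is an image comparing file sizes and line counts, presumably for the full capture and the dehydrated ID list. The same figures can be gathered in Python. This is a sketch only: `capture_stats` and the script usage below are hypothetical, and `ls -l` and `wc -l` give equivalent numbers at the shell.

```python
import os
import sys
from typing import Tuple


def capture_stats(path: str) -> Tuple[int, int]:
    """Return (size_in_bytes, non_blank_line_count) for a capture file.

    In a twarc .jsonl file each non-blank line is one tweet; in a dehydrated
    ID file each non-blank line is one tweet ID.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        lines = sum(1 for line in f if line.strip())
    return size, lines


if __name__ == "__main__":
    # Usage: python capture_stats.py eggboy.jsonl eggboy_IDs.txt
    for name in sys.argv[1:]:
        size, lines = capture_stats(name)
        print(f"{name}: {size:,} bytes, {lines:,} lines")
```

If the two files hold the same tweets, their line counts should match while the ID file is far smaller. That size difference is why dehydrated ID lists are convenient to store and share.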
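
The dehydrate step on slide 7 keeps only each tweet's ID, and plain Python can approximate it. The sketch also adds a check that is not a twarc feature: it lists the IDs that did not come back on hydration. A missing ID typically means the tweet was deleted or made private after the original search. Both function names are hypothetical, and the sketch assumes v1.1 tweet JSON with an `id_str` field.

```python
import json
from typing import Set


def dehydrate(jsonl_path: str, ids_path: str) -> int:
    """Write one tweet ID per line, similar in spirit to `twarc dehydrate`.
    Returns the number of IDs written."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(json.loads(line)["id_str"] + "\n")
                count += 1
    return count


def missing_after_hydration(ids_path: str, restored_path: str) -> Set[str]:
    """IDs in the ID list that are absent from the rehydrated capture."""
    with open(ids_path, encoding="utf-8") as f:
        wanted = {line.strip() for line in f if line.strip()}
    with open(restored_path, encoding="utf-8") as f:
        restored = {json.loads(line)["id_str"] for line in f if line.strip()}
    return wanted - restored
```

Comparing eggboy_IDs.txt with eggboy_restored.jsonl this way shows how much of a capture can no longer be recovered from the API alone.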
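
The wordcloud.py utility on slide 8 produces an HTML page. The sketch below does not reproduce that utility; it only illustrates the word counting that any word cloud rests on. The stopword list, the tokeniser and the `word_counts` function are assumptions made for illustration.

```python
import json
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "rt", "the", "to"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, keeping leading # or @


def word_counts(path: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent words across all tweets in a .jsonl file."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            tweet = json.loads(line)
            # Extended-mode tweets store their text in 'full_text'; fall back to 'text'.
            text = tweet.get("full_text") or tweet.get("text", "")
            text = URL_RE.sub(" ", text.lower())  # drop links before tokenising
            counts.update(w for w in TOKEN_RE.findall(text) if w not in STOPWORDS)
    return counts.most_common(top)
```

Keeping `#` and `@` on tokens lets hashtags and mentions count separately from plain words, so `#eggboy` and `eggboy` appear as two entries.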
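
Slide 9 lists filter_date.py among twarc's utilities. Rather than guess at that script's options, the sketch below shows the underlying idea in plain Python. `filter_by_date` is a hypothetical name, and the sketch assumes the v1.1 `created_at` timestamp format.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator

# Twitter API v1.1 timestamp format, e.g. "Sat Mar 16 06:04:03 +0000 2019".
# %a and %b assume the default C/English locale.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def filter_by_date(lines: Iterable[str], start: datetime,
                   end: datetime) -> Iterator[str]:
    """Yield raw JSONL lines whose created_at falls in [start, end).

    start and end must be timezone-aware, because %z makes the parsed
    timestamps aware and Python refuses to compare aware with naive datetimes.
    """
    for line in lines:
        if not line.strip():
            continue
        created = datetime.strptime(json.loads(line)["created_at"],
                                    CREATED_AT_FORMAT)
        if start <= created < end:
            yield line
```

For example, you could pass an open eggboy.jsonl with start and end set to midnight UTC on 16 and 18 March 2019. The function would then keep only the weekend's tweets, and you could write them straight to a new file with `writelines`.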
