Academic excellence for business and the professions
#mla15 #s398
Butterfly Hunt:
On Collecting #mla14 Tweets
Dr Ernesto Priego
Centre for Information Science
@ernestopriego #citylis
Ernesto.Priego.1@city.ac.uk
Quickly…
•  1. Twitter data collection: “butterfly
hunt”, citizen archiving
•  2. How we collected and what we
collected
•  3. #mla14 Twitter data summary
•  4. #mla15 so far
•  5. Why Twitter data collection matters
Male Monarch butterfly, CC-BY SA Captain-tucker, Wikimedia Commons
Conference Tweets Collection as Butterfly Hunt
“They would flutter toward a blossom, hover over it. My butterfly net
upraised, I stood waiting only for the spell that the flowers seemed to
cast on the pair of wings to have finished its work, when all of a sudden
the delicate body would glide off sideways with a gentle buffeting of the
air, to cast its shadow – motionless as before – over another flower,
which just as suddenly it would leave without touching.” (51)
Walter Benjamin at the Bibliothèque Nationale de Paris, April 1937,
photo by Gisèle Freund
Conference Tweet Collection as Citizen Archiving
•  Citizen archivists are “the first responders
of history… arriving early on the scene to
gather, capture, describe and preserve
ephemeral artifacts of interest and helping
to ensure they survive over time to share
with the future.”
•  “Almost any collection becomes interesting
once you get enough stuff in one place.”
-Butch Lazorchak, The Signal. Digital
Preservation. Library of Congress, May 8,
2013
http://blogs.loc.gov/digitalpreservation/2013/05/ian-mackaye-and-citizen-archiving/
Screenshot from the Fugazi Live Series of a Notorious Dallas Concert in 1990.
Data Collection; Data Sharing
•  TAGS – Twitter Archiving Google Spreadsheet by Martin Hawksey
(@mhawksey)
•  Not as easy as just letting it run automatically: the devil is in the details
•  Issues: API Limits, Spreadsheet Settings and Limits, Different Time
Zones, Spam, Character Encoding, Languages, Duplication, Bots,
Automated Tweets
•  Automated methods and manual methods are required
•  Online collaboration – Chris Zarate and I, across time zones,
countries, institutions
•  Shared on open access repositories:
•  http://bit.ly/MLA14TwitterArchive (see also
http://cdrs.columbia.edu/cdrsmain/2014/02/mla14-twitter-archive-added-to-academic-commons/ )
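Some of the issues listed on this slide (duplication, different time zones, automated accounts) can be illustrated with a small clean-up pass over an exported spreadsheet. The following is only a sketch, not the procedure used for the #MLA14 archive: the column names `id_str`, `from_user` and `created_at`, the timestamp layout, and the hand-made `blocked_users` list are all assumptions about what an export might look like.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed column names; a real export may label these differently.
ID_COL, USER_COL, TIME_COL = "id_str", "from_user", "created_at"
# Assumed timestamp layout, e.g. "Thu Jan 09 15:04:05 +0000 2014".
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def load_csv(text):
    """Parse CSV text (already decoded) into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows, blocked_users=frozenset()):
    """Drop duplicate tweet IDs and blocked accounts; add a UTC timestamp."""
    seen = set()
    cleaned = []
    for row in rows:
        tweet_id = row[ID_COL].strip()
        if tweet_id in seen:
            continue  # same tweet collected more than once
        if row[USER_COL].strip().lower() in blocked_users:
            continue  # account flagged by hand as spam or bot
        seen.add(tweet_id)
        stamp = datetime.strptime(row[TIME_COL], TIME_FORMAT)
        # Normalise every timestamp to UTC so rows from different zones compare.
        cleaned.append(dict(row, created_utc=stamp.astimezone(timezone.utc).isoformat()))
    return cleaned
```

The block list is deliberately passed in by hand: as the slide notes, automated methods alone are not enough, and deciding which accounts count as spam or bots remains a manual judgement.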
	
#MLA14 Tweet Activity on Conference Days (9-12 January 2014)
#MLA14 Conference Days Summary
#MLA14 Conference Days Participation
#MLA14 Most Frequent Word*:
digital: 1,474 occurrences
(humanities: 1,152)
* In 9-12 January 2014 dataset, obtained with Words in the Entire Corpus, Voyant Tools
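The counts on this slide came from Voyant Tools. For readers who want to try a comparable tally on their own tweet texts, here is a rough stand-in in Python. It is not Voyant's method: the tokenisation rule and the tiny stop list below are assumptions made for illustration, so numbers produced this way would not necessarily match the figures above.

```python
import re
from collections import Counter

# Small illustrative stop list; Voyant Tools uses its own, which differs.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "rt", "is", "at", "on"}
URL_PATTERN = re.compile(r"https?://\S+")
# Keep hashtags and mentions as single tokens (an assumption, not Voyant's rule).
TOKEN_PATTERN = re.compile(r"[#@]?\w+")


def word_counts(tweets):
    """Count lowercase tokens across all tweet texts, skipping URLs and stop words."""
    counts = Counter()
    for text in tweets:
        text = URL_PATTERN.sub(" ", text.lower())
        counts.update(t for t in TOKEN_PATTERN.findall(text) if t not in STOP_WORDS)
    return counts
```

Calling `word_counts(texts).most_common(10)` on a list of tweet texts returns the ten most frequent tokens with their counts.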
  
Top Ten #MLA14 user_lang*
*From 19 different user_lang options in dataset
  
#MLA14 Tweets and Geolocation
•  Only 2% of Tweets had geolocation enabled
•  Of 484 Tweets with coordinates in dataset, 464 were unique locations;
most from Chicago metro area
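Figures like these (tweets with coordinates, unique locations, share of the whole dataset) can be derived from an archive with a few lines of Python. This is a sketch under assumptions: the column name `geo_coordinates` and the idea that it is left empty when a tweet has no location are guesses about the export format, not a description of the #MLA14 dataset.

```python
def geolocation_summary(rows, geo_col="geo_coordinates"):
    """Return (tweets with coordinates, unique coordinate values, share of all tweets)."""
    # Treat a missing or blank cell as "no geolocation".
    coords = [(row.get(geo_col) or "").strip() for row in rows]
    coords = [c for c in coords if c]
    share = len(coords) / len(rows) if rows else 0.0
    return len(coords), len(set(coords)), share
```

Here "unique" means distinct coordinate strings; two tweets sent a few metres apart would still count as two locations.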
  	
  
	
Why Does It Matter?
•  Historical record of scholarly social media participation and an
increasingly important manifestation of MLA Convention activity
•  Twitter is experienced “live”, as it happens, but Web and Mobile
clients do not yet allow long-term archiving of hashtag activity or
easy manipulation of it
•  Tweets as ephemera; preservation for future analysis needed
•  Twitter may or may not last; it may change significantly; Twitter API
rules and behaviour change very often and without warning
•  Archiving and analysing Twitter evidence (data) may offer insights
into scholarly behaviour; disciplinary, thematic, linguistic and
socio-cultural trends over time; and conference sentiment/feedback. It can
also support the mapping of scholarly social networks, etc.
Thank You!
References
•  Priego, Ernesto, and Zarate, Chris. “#MLA14 Twitter Archive, 9-12 January 2014”. Dataset. City
Research Online. 2014. Web.
http://openaccess.city.ac.uk/3083/ Accessed 7 January 2015.
•  Priego, Ernesto. “#MLA14: A First Look (I)”. Far Away, Yet Close. MLA Commons. 16 January 2014.
Web. http://remoteparticipation.commons.mla.org/2014/01/16/mla14-a-first-look/ Accessed 7
January 2015 (four parts).
•  Priego, Ernesto. “Some Thoughts on Why You Would Like to Archive and Share [Small] Twitter
Data Sets”. 28 May 2014. Web.
https://epriego.wordpress.com/2014/05/28/some-thoughts-why-you-would-like-to-archive-and-share-twitter-small-data/
Accessed 7 January 2015.
•  Download this presentation:
http://www.slideshare.net/epriego/butterfly-hunt-on-collecting-mla14-tweets
