What Your Tweets Tell Us About You, Speaker Notes
• Introduce paper title
• Ask people to interact, comment, and respond to our questions during the presentation using #tweetprivacy
• Credits
  o Charlesworth, whose Digital Lives Report was one of the few papers that provided any analysis and guidance in the area of social media archiving.

Interest in social media data is multidisciplinary, resulting in conflicting views regarding the ethical management of captured datasets. Curators will be required to navigate these conflicting views as they work to provide appropriate mechanisms for access to and reuse of these data. We hope to encourage researchers and library, archive, or repository staff to engage in a cross-disciplinary conversation about the privacy issues (as well as the host of other issues) inherent in using social media as a primary source for research.

We’re going to show you a clip from Laila Sakr’s presentation at the Tech@State Data Visualization Conference in Washington DC. The clip provides a good example of how researchers are using Twitter and other social media data. [Play Clip]

There are two key things I want to point out:
1. Long-term archiving of this data and other curatorial issues like value, authenticity, and significant properties are absent from this talk, which is not surprising. They were also absent from many of the papers we read that utilized Twitter data. This demonstrates that researchers at this point emphasize collection and analysis rather than preservation.
2. Sakr makes sure to say that she is downloading only the publicly available tweets using the Search API, and she notes how this could affect her sample and its validity. She is not talking about it in terms of privacy issues, which further illustrates that the focus is on analysis rather than on privacy or ethics.

We’d like to take an informal poll similar to last night’s poll of the audience’s willingness to have their genome sequenced.
Who among those of you who use Twitter as a communication tool is completely fine with having your tweets, profile information, images, and location data downloaded, analyzed, archived, and preserved? Of those of you with your hands raised, how many have tweeted something of a more personal nature that you might not want archived? And who here is actively involved with the collection of Twitter data? Any social media data? [What do you do with it? Tweet here]

The reason I ask is that we found, through our work with the HyperCities Egypt Twitter data, that the question of whether a data source like Twitter raises privacy concerns is essentially a research ethics issue, one that varies depending on the role and/or subject background of the researcher and how they view the context of the data creation. (Refer to Confounding Relationships to point out the various roles.)
So, our central thesis is that perceptions of privacy in social media platforms are formed by disciplinary culture, the capabilities and constraints of the platform, and the community norms of the platform itself. Does analyzing a person’s Tweets constitute researching a human subject? Or are Tweets a creative text that requires proper citation and credit to the authors or tweeters? Or are Tweets part of the open public record? Social scientists tend to view the data as human subjects research, while humanists tend to view the data as a form of publication. These very different ways of viewing the data require different methods for dealing with privacy.

We feel it is important to state that social media data are not homogeneous; each platform has its own constraints on the creation and inclusion of content, as well as constraints on how users may engage in the space and on their expectations and norms of interaction. Our case study focuses on Twitter. So while we provide a general framework for assessing privacy issues with social media, it must be understood that, because of the uniqueness of Twitter’s Privacy Policies, Terms of Service, and Developer Rules of the Road, the analysis and interpretation are not necessarily generalizable to other platforms, such as Facebook. As with many data curation activities, some facets can be generalized, while others may be platform or subject specific. Part of determining the curation needs of social media data will be to determine these boundaries.

What can we learn about you from Twitter? [Show different visualizations, then tweet map, tweet image] Depending on how the data are visualized, we can learn about you as an individual, about your internet relations, about you as part of a huge collective, or nothing about you as an individual (R-Shief image). Some visualizations will enable better anonymization than others.
However, the underlying dataset used to generate the visualizations will still contain, if your account is unprotected, your name, location, photos, and anything else you decide to share in your timeline. So if you include other personal information, such as an email address, we can find that out about you. But what else can we find out about you? [Show the Alyaa Gad slide, then the Google Search] Thanks to the power of search engines like Google, we can get a lot more information, which may be collected and archived as well.

Our Case Study, or what I like to call “we’ve got tweets, now what?”

Todd Presner, a UCLA faculty member, and two researchers collected a subset of the overall Twitter data available. He asked the library to archive it. Before we could do anything with it, we had to assess what he had collected. The HyperCities team used the Twitter Search API to pull data based on a location parameter (within 200 km of the center of Cairo), a time period (January 30, 2011 through February 24, 2011), AND one of three hashtags (#jan25 OR #egypt OR #tahrir). They downloaded approximately 420,000 public Tweets during the initial phase of this analysis and continue to supply their site with live feeds.
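
For anyone assessing a collection like this, it can help to restate the three selection criteria as an explicit check over records already in hand. The sketch below is illustrative only: it is not the HyperCities team's code and it does not call the Twitter API. The `Tweet` record, the function names, and the coordinates used for the center of Cairo are all assumptions made for the example, and it assumes each record carries coordinates, which real Tweets often do not.

```python
from dataclasses import dataclass
from datetime import date, datetime
from math import asin, cos, radians, sin, sqrt

# Assumption: approximate coordinates for the center of Cairo.
# The notes do not say which point the team used.
CAIRO_LAT, CAIRO_LON = 30.0444, 31.2357
RADIUS_KM = 200.0
START, END = date(2011, 1, 30), date(2011, 2, 24)  # treated as inclusive
HASHTAGS = {"#jan25", "#egypt", "#tahrir"}


@dataclass
class Tweet:
    """Hypothetical minimal record; real Tweet metadata has many more fields."""
    text: str
    lat: float
    lon: float
    created_at: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def matches_criteria(tweet):
    """True if the tweet meets the location AND time AND hashtag criteria."""
    in_radius = haversine_km(CAIRO_LAT, CAIRO_LON, tweet.lat, tweet.lon) <= RADIUS_KM
    in_period = START <= tweet.created_at.date() <= END
    # Case-insensitive match on whole tokens, ignoring trailing punctuation.
    words = {w.lower().rstrip(".,!?:;") for w in tweet.text.split()}
    has_hashtag = bool(words & HASHTAGS)
    return in_radius and in_period and has_hashtag
```

A curator could run a check of this kind over a deposited dataset to confirm that what was collected matches what the researchers say they collected, before making any decisions about access or anonymization.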