
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Introduction to Computational Social Science

Seventh lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).

Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta

Published in: Data & Analytics

  1. 1. ETHICAL & LEGAL ISSUES IN COMPUTATIONAL SOCIAL SCIENCE. LECTURE 7, 21.9.2015. INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01). LAURI ELORANTA
  2. 2. • LECTURE 1: Introduction to Computational Social Science [DONE] • Tuesday 01.09. 16:00 – 18:00, U35, Seminar room 114 • LECTURE 2: Basics of Computation and Modeling [DONE] • Wednesday 02.09. 16:00 – 18:00, U35, Seminar room 113 • LECTURE 3: Big Data and Information Extraction [DONE] • Monday 07.09. 16:00 – 18:00, U35, Seminar room 114 • LECTURE 4: Network Analysis [DONE] • Monday 14.09. 16:00 – 18:00, U35, Seminar room 114 • LECTURE 5: Complex Systems [DONE] • Tuesday 15.09. 16:00 – 18:00, U35, Seminar room 114 • LECTURE 6: Simulation in Social Science [DONE] • Wednesday 16.09. 16:00 – 18:00, U35, Seminar room 113 • LECTURE 7: Ethical and Legal Issues in CSS [TODAY] • Monday 21.09. 16:00 – 18:00, U35, Seminar room 114 • LECTURE 8: Summary • Tuesday 22.09. 17:00 – 19:00, U35, Seminar room 114 LECTURE SCHEDULE
  3. 3. • PART 1: BIG DATA IS PROBLEMATIC • Ethics • Access • Privacy • PART 2: LEGAL ISSUES IN CSS LECTURE 7 OVERVIEW
  4. 4. BIG DATA & BIG PROBLEMS
  5. 5. • Big promises vs. big challenges • The research subjects are humans • The massive amounts of data are gathered from human-based interactions • This underlines challenges in: • Research Ethics • Privacy • Transparency • Trust • Research Method: how to design and conduct research in an ethical manner? • Access to data • Who owns the research data? • Do you have access to research data? • Purpose of Research (agenda) COMPUTATIONAL SOCIAL SCIENCE IS PROBLEMATIC
  6. 6. King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science. 11 February 2011: Vol. 331 no. 6018 pp. 719-721.
  7. 7. Diagram labels: Big Data; Democratic Society; Understanding of Knowledge; Privacy (vs. surveillance)
  8. 8. 1. Big data changes the definition of knowledge 2. Claims to big data objectivity and accuracy are misleading 3. Bigger data are not always better data 4. Taken out of context, big data loses its meaning 5. Accessibility does not make big data research ethical 6. Limited access to big data creates new digital divides CRITICAL QUESTIONS FOR BIG DATA (BOYD & CRAWFORD 2012)
  9. 9. • “Big Data has emerged a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community. ‘Change the instruments, and you will change the entire social theory that goes with them’, Latour (2009) reminds us.” (Boyd & Crawford 2012) • “Rather, it is a profound change at the levels of epistemology and ethics. Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality.” • Do numbers really speak for themselves? • The inherent bias of the tools and technologies! 1. BIG DATA CHANGES THE DEFINITION OF KNOWLEDGE (Boyd & Crawford 2012)
  10. 10. • “In reality, working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth” • There is a risk that big data widens the division between “subjective” qualitative research and “objective” quantitative research • Processing and analyzing big data involves many subjective steps that are not always recognized as subjective • How data is cleaned • What methods of analysis are used and how • How results are interpreted • The reliability of data sets? • Errors in data sets • Transparency on how the data set was collected is typically very limited! • Biases and limitations of the data set 2. BIG DATA IS NOT THAT OBJECTIVE (Boyd & Crawford 2012)
  11. 11. • Just because big data presents us with large quantities of data does not mean that methodological issues are no longer relevant. Understanding the sample, for example, is more important now than ever. • Validity • Reliability • Fit for research question? • A good example of sample limitations and bias is Twitter data • Does not represent “all people” even though millions of people might be included in the data set • No visibility into how the sample in the data set was selected • Size does not equal representativeness • Restricted access to the Twitter firehose, gardenhose, etc. 3. BIGGER DATA ARE NOT ALWAYS BETTER DATA (Boyd & Crawford 2012)
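The point that size does not equal representativeness can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population, the 30% opinion share, and the platform inclusion probabilities (`p_if_true`, `p_if_false`) are hypothetical numbers chosen for illustration, not figures from the lecture or from Boyd & Crawford.

```python
import random

random.seed(0)

# Hypothetical population: about 30% of people hold some opinion A.
POPULATION_SIZE = 200_000
population = [random.random() < 0.30 for _ in range(POPULATION_SIZE)]


def platform_sample(people, p_if_true=0.6, p_if_false=0.2):
    """Keep each person with a probability that depends on their opinion.

    Assumption for illustration: holders of opinion A are three times as
    likely to be active on the platform the data is collected from.
    """
    return [x for x in people
            if random.random() < (p_if_true if x else p_if_false)]


def share(sample):
    """Fraction of people in the sample who hold opinion A."""
    return sum(sample) / len(sample)


big_biased = platform_sample(population)          # very large, self-selected
small_random = random.sample(population, 1_000)   # small, randomly drawn
true_share = share(population)

print(f"population share of A:          {true_share:.3f}")
print(f"platform sample (n={len(big_biased)}): {share(big_biased):.3f}")
print(f"random sample   (n={len(small_random)}):  {share(small_random):.3f}")
```

Under these assumptions the platform sample is many times larger than the random sample, yet its estimate stays far from the population share, because collecting more data does not remove selection bias.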
  12. 12. • Data-related tools and methods might not be transferable from context to context • E.g. the Facebook graph might mean something within Facebook, but it is hardly a full representation of a person's real-life social network • Activity and intensity in a social media context might not have the same meaning in real life • Big data is not generic data about social interactions in general, but specific to the source it is collected from 4. TAKEN OUT OF CONTEXT, BIG DATA LOSES ITS MEANING (Boyd & Crawford 2012)
  13. 13. • “[W]hat is the status of so-called ‘public’ data on social media sites? Can it simply be used, without requesting permission? What constitutes best ethical practice for researchers? Privacy campaigners already see this as a key battleground where better privacy protections are needed. The difficulty is that privacy breaches are hard to make specific – is there damage done at the time? What about 20 years hence? ‘Any data on human subjects inevitably raise privacy issues, and the real risks of abuse of such data are difficult to quantify’ (Nature, cited in Berry 2011).” • Open access to data does not mean that the research is automatically ethical. • Understanding of the processes of mining and anonymizing Big Data is typically limited: true accountability requires critical thinking even in cases where an ethics board has granted access for research • Significant questions in relation to control and power: researchers have the tools and the access, while social media users as a whole do not. 5. JUST BECAUSE IT IS ACCESSIBLE DOES NOT MAKE IT ETHICAL (Boyd & Crawford 2012)
  14. 14. • “But who gets access? For what purposes? In what contexts? And with what constraints? While the explosion of research using data sets from social media sources would suggest that access is straightforward, it is anything but.” • Only social media companies have full access to the data; an average scholar does not. • Access to data typically costs money, which creates uneven opportunities for research • Top-tier universities are in a better position • Skills required for accessing data are restricted to those with a computational background • This can also be seen as a gendered division • Limited access creates a huge bias in relation to the questions asked • Who gets to decide the purposes big data is used for? 6. LIMITED ACCESS TO BIG DATA CREATES NEW DIGITAL DIVIDES (Boyd & Crawford 2012)
  15. 15. • Current ethical protocols are not adequate for the types of digital social research increasingly being conducted. • Information generated by users of social media platforms and services cannot be considered equivalent to conventional types of offline information collected by social researchers. • Challenges according to Neuhaus & Webmoor (2012): 1. Change in the enactment of the participant and researcher relationship (a computer-mediated setting) 2. The number of individuals in one research data set has skyrocketed, but so have the privacy and accountability risks 3. Problems of identity in relation to research “participants” and “research data”: what roles do these actors actually play? 4. Collected data may reveal users' identities after remixing with other data points, even when the original research data set was anonymized 5. Peer review and accountability might be at stake because nowadays a single researcher has access to millions and millions of data points previously accessible only to teams of researchers. BIG DATA RESEARCH SETUP CHALLENGES (Neuhaus & Webmoor 2012)
  16. 16. • Neuhaus and Webmoor (2012) propose agile ethics for big data research: • Researchers and institutions should accept the fact that this kind of large-scale data mining still involves human subjects. • Logging of research activities and big data collection • As a contract between researchers and participants is not possible, we need to place data generation on more of an equal footing with final outputs; to think of it in terms of authorship. • Taking responsibility for the data sets • Agile ethics is more an attitude, or a mode of engagement and sensibility for good practice, as opposed to a formal list of procedures and protocols • Flexibility is integral to agile research: considering case by case • An agile ethics makes the counterintuitive move to increased openness and transparency; to expose ourselves equally with those wrapped up in our projects. AGILE ETHICS IN BIG DATA RESEARCH (Neuhaus & Webmoor 2012)
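One way to read the "logging of research activities and big data collection" item is as an append-only record of every collection step. The sketch below is an assumption about what such a log could look like, not a procedure from Neuhaus & Webmoor: the field names, the helper functions `make_log_entry` and `append_entry`, and the example source and query are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_log_entry(action, source, query, payload, purpose):
    """Describe one collection step without storing the raw data itself."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "source": source,
        "query": query,
        "purpose": purpose,
        "n_bytes": len(payload),
        # The hash lets others check later which exact data was collected.
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def append_entry(log_lines, entry):
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    log_lines.append(json.dumps(entry, sort_keys=True))


log_lines = []  # in a real project this would be an append-only file
payload = b'{"posts": ["example post 1", "example post 2"]}'
entry = make_log_entry(
    action="collect",
    source="example-platform API (hypothetical)",
    query="hashtag:election",
    payload=payload,
    purpose="pilot study on public discussion",
)
append_entry(log_lines, entry)
print(log_lines[0])
```

Recording the purpose and a content hash alongside each step is one possible way to make a single researcher's handling of a large data set reviewable by others, in the spirit of the openness and transparency the slide calls for.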
  17. 17. • Power is inherently relational between the following stakeholders: • Big Data Collectors: decide which data is collected and stored, and for how long. Decide who gets access. • Big Data Utilizers: use and redefine the use of data. Can be both collector & utilizer. Determine new behaviour by imposing new social rules or manipulating social processes. • Big Data Generators: • Natural actors that generate massive amounts of new data voluntarily, involuntarily, knowingly, unknowingly… • Artificial actors • Physical phenomena • In this power network, ethical decision making is no longer an agency-based activity but a relational, network-based one NEW POWER DISTRIBUTION & NETWORKED ETHICS (Zwitter 2014)
  18. 18. • “Big data poses big privacy risks. The harvesting of large sets of personal data and the use of state of the art analytics implicate growing privacy concerns. Protecting privacy will become harder as information is multiplied and shared ever more widely among multiple parties around the world.” (Tene & Polonetsky 2014) • Big data threatens privacy and democracy • Incremental effect: the growing potential for user identification with more and more data • Automated decision making based on data, and questions of discrimination and the narrowing of choice • Predictive analysis based on sensitive individual information • Lack of access and exclusion: only a few benefit from big data and have access to it in vast amounts • Problems with research ethics • Chilling effects of the surveillance society as people change their behaviour based on the notion of 24/7 monitoring BIG CONCERNS ON PRIVACY (Tene & Polonetsky 2014)
  19. 19. • A key thing to consider in any computational social science study is how to protect the privacy of the individuals and groups that are research subjects • Research data needs to be anonymized in some way • Unfortunately this can be quite hard: in data sets with many data points, the data can be connected to an individual even when it has been anonymized • Group privacy is also a critical issue: although the individual-level data might be non-personal, the aggregated group-level data might reveal something “private” about the group PROTECTING THE PRIVACY OF THE RESEARCH SUBJECT (Zwitter 2014)
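Why anonymized data can still be connected to an individual, and why group-level data can reveal something private, can be sketched with a small check. The example uses k-anonymity, a common measure that is not mentioned in the lecture and is swapped in here for illustration; the records, field names, and helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical "anonymized" records: names are removed, but
# quasi-identifiers (postcode, birth year, gender) remain.
records = [
    {"postcode": "00100", "birth_year": 1985, "gender": "F", "topic": "health"},
    {"postcode": "00100", "birth_year": 1985, "gender": "F", "topic": "politics"},
    {"postcode": "00100", "birth_year": 1990, "gender": "M", "topic": "sports"},
    {"postcode": "00530", "birth_year": 1972, "gender": "M", "topic": "health"},
    {"postcode": "00530", "birth_year": 1972, "gender": "M", "topic": "health"},
]
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def group_key(row, keys):
    return tuple(row[k] for k in keys)


def k_anonymity(rows, keys):
    """Size of the smallest group sharing the same quasi-identifier values."""
    return min(Counter(group_key(r, keys) for r in rows).values())


def unique_rows(rows, keys):
    """Rows whose quasi-identifier combination occurs only once."""
    sizes = Counter(group_key(r, keys) for r in rows)
    return [r for r in rows if sizes[group_key(r, keys)] == 1]


def homogeneous_groups(rows, keys, sensitive):
    """Groups of two or more rows that all share one sensitive value."""
    values, counts = defaultdict(set), Counter()
    for r in rows:
        values[group_key(r, keys)].add(r[sensitive])
        counts[group_key(r, keys)] += 1
    return {g: next(iter(v)) for g, v in values.items()
            if counts[g] > 1 and len(v) == 1}


print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
print("re-identifiable:", unique_rows(records, QUASI_IDENTIFIERS))
print("group-level leak:", homogeneous_groups(records, QUASI_IDENTIFIERS, "topic"))
```

A row with a unique combination of quasi-identifiers can be matched against any outside data set that carries the same fields, which is the re-identification risk described above. A group whose members all share the same sensitive value leaks that value for every member even though no single row is unique, which is one concrete form of the group privacy problem.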
  20. 20. PRISONERS ARCHITECTURE FOR HANDLING RESEARCH SUBJECT PRIVACY (HUTTON & HENDERSON 2012)
  21. 21. LEGAL ISSUES IN COMPUTATIONAL SOCIAL SCIENCE
  22. 22. 1. Legalities and rights concerning the normal use of software, services and data: What have the research subjects agreed to? 2. Legalities and rights concerning the research use of software, services and data: What is allowed for research and what have you agreed to as a researcher? 3. Legalities and rights concerning the distribution of your own work (code + data): How can I distribute this in a way that benefits society the most? THREE LEGAL AREAS TO UNDERSTAND
  23. 23. • Databases and software are typically protected by copyright (or similar rights), and their use is regulated through database and software licenses. • Protection for databases varies from country to country. The European Union has a special database right that protects each database for 15 years. • For normal copyright the term of protection is the lifetime of the author plus 70 years. This applies to all software. • In order to use a database or software, a license for the use is needed: • Agree to the terms of service • Agree to the license RIGHTS & LICENSES
  24. 24. • EULA: End-user license agreement. Typically used with distributed and installed software and apps. May also include requests for permission to collect and process end-user data. (Wikipedia 2015, End-user license agreement) • Terms of service: “The Terms-of-Service Agreement, is mainly used for legal purposes, by websites and internet service providers, that store a user's personal data, such as e-commerce and social networking services. A legitimate terms-of-service agreement, is legally binding, and may be subject to change.” (Wikipedia 2015, Terms of service) END USER AGREEMENTS
  25. 25. • User rights and responsibilities • Proper or expected usage; potential misuse • Accountability for online actions, behavior, and conduct • Privacy policy outlining the use of personal data • Payment details such as membership or subscription fees, etc. • Opt-out policy describing procedure for account termination, if available • Disclaimer/Limitation of Liability clarifying the site's legal liability for damages incurred by users • User notification upon modification of terms, if offered ITEMS IN A TYPICAL TERMS OF SERVICE (Wikipedia 2015, Terms of service)
  26. 26. CASE INSTAGRAM (Image from: http://www.thevine.com.au/life/tech/instagram-your-photos-of-cats-are-worth-money-updated-20121219-243481/)
  27. 27. • Terms of service also govern what one is able to do with the service as a user. • In many cases a researcher is a user in this respect: thus terms of service may define what and how one is able to research • E.g. is web scraping allowed? • E.g. how much information is the user able to get via an API? • As the researcher needs to agree to the terms of service to conduct the research, there might be legal consequences if the service terms are breached • It is therefore highly important to read and understand the legal agreements in relation to one's research TERMS OF SERVICE GOVERN ALSO USE OF DATA (& RESEARCH)
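A partial, machine-readable answer to the question "is web scraping allowed?" is a site's robots.txt file, which Python's standard `urllib.robotparser` module can evaluate. The sketch below parses a hypothetical robots.txt given as a string; the rules, the user agent name `research-bot`, and the example.org URLs are assumptions for illustration. A robots.txt file is not the terms of service, so passing this check does not replace reading the legal agreements.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real study would read the site's own
# file, for example with RobotFileParser.set_url(...) followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

AGENT = "research-bot"  # hypothetical user agent name for the study's crawler

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url):
    """True if the parsed robots.txt rules permit AGENT to fetch the URL."""
    return parser.can_fetch(AGENT, url)


for url in ("https://example.org/public/page",
            "https://example.org/private/data"):
    print(url, "->", "allowed" if is_allowed(url) else "disallowed")

# Requested pause between requests in seconds, or None if not specified.
print("crawl delay:", parser.crawl_delay(AGENT))
```

Checking each URL before fetching it, and honouring the requested delay, keeps a crawler within the limits the site publishes, while API rate limits and the terms of service themselves still need to be read separately.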
  28. 28. • When using open source software and/or sharing your code, it is important to understand under which software license this is done • There are differences between open source licenses, with big implications (in general, all allow modification, copying and distribution free of license costs). • Two major types of open source software licenses: 1. Permissive free software licenses 2. Copyleft licenses • In addition there is the Creative Commons (CC) license family, which is more general and extends to many areas other than software. Open databases are typically licensed under CC, or CC0 public domain. • The Open Knowledge Foundation is also promoting the Open Database License (ODbL) OPEN SOURCE LICENSES
  29. 29. • Give rights to use, modify and distribute the software and do not limit the potential further use of the software. • Permissive: the further distribution of the software may or may not be free of charge • Gives permission to do anything freely • Typically requires crediting the original authors • Can be seen as “the academic” license. The best-known versions are the MIT and Berkeley (BSD) licenses • MIT License • BSD License PERMISSIVE OPEN SOURCE LICENSES
  30. 30. • Copyleft is the practice of offering people the right to freely distribute copies and modified versions of a work with the stipulation that the same rights be preserved in derivative works down the line. (Wikipedia, Copyleft) • Software based on copyleft software automatically falls under the copyleft license. (It can be seen as contagious in this sense) • The best-known copyleft licenses are the GNU GPL and its versions COPYLEFT OPEN SOURCE LICENSES
  31. 31. • “Works in the public domain are those whose intellectual property rights have expired, have been forfeited, or are inapplicable. Examples include the works of Shakespeare and Beethoven, most of the early silent films, the formulae of Newtonian physics, Serpent encryption algorithm and powered flight.” (Wikipedia 2015, Public Domain) • Placing works in the public domain can be quite hard: some countries may even prohibit any attempt by copyright owners to surrender rights automatically conferred by law. • An alternative way: issue a license which irrevocably grants as many rights as possible to the general public, for example the CC0 license from Creative Commons PUBLIC DOMAIN (Wikipedia 2015, Public Domain)
  32. 32. • “The Data Protection Directive (officially Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data) is a European Union directive adopted in 1995 which regulates the processing of personal data within the European Union. It is an important component of EU privacy and human rights law. On 25 January 2012, the European Commission unveiled a draft European General Data Protection Regulation that will supersede the Data Protection Directive.” (Wikipedia 2015, Data Protection Directive) • Governs the processing and transfer of personal data • Basis for the right to be forgotten (recognized by the EU Court of Justice in 2014) • The U.S. has no single data protection law, and legislation is on an ad hoc basis EUROPEAN DATA PROTECTION DIRECTIVE
  33. 33. • Read Instagram’s latest Terms of Use, Privacy Policy and API Terms of Use: https://instagram.com/about/legal/terms/ • What implications do the terms have in relation to potential research that uses Instagram pictures as research data? LECTURE ASSIGNMENT 1
  34. 34. • Watch the following video on big data & privacy: • https://www.youtube.com/watch?v=H_pqhMO3ZSY • Read the following articles on ethics, surveillance and big data: • Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253. • Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861. LECTURE ASSIGNMENT 2
  35. 35. • Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679. • Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253. • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest Law Review. • Neuhaus, F., & Webmoor, T. (2012). Agile ethics for massified research and visualization. Information, Communication & Society, 15(1), 43-65. • Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861. LECTURE 7 READING
  36. 36. • Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679. • Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2), 2053951714559253. • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest Law Review. • Bollier, D., & Firestone, C. M. (2010). The promise and peril of big data (p. 56). Washington, DC, USA: Aspen Institute, Communications and Society Program. • Tene, O., & Polonetsky, J. (2012). Big data for all: Privacy and user control in the age of analytics. Nw. J. Tech. & Intell. Prop., 11, xxvii. • Neuhaus, F., & Webmoor, T. (2012). Agile ethics for massified research and visualization. Information, Communication & Society, 15(1), 43-65. • Lyon, D. (2014). Surveillance, Snowden, and big data: capacities, consequences, critique. Big Data & Society, 1(2), 2053951714541861. • Hutton, L., & Henderson, T. (2013). An architecture for ethical and privacy-sensitive social network experiments. ACM SIGMETRICS Performance Evaluation Review, 40(4), 90-95. REFERENCES
  37. 37. Thank You! Questions and comments? twitter: @laurieloranta
