
Data Ethics for Mathematicians


This is my attempt at an introduction to data ethics for mathematicians. Mathematicians increasingly need to deal with these kinds of issues, but we don't have the tradition of ethics training that other disciplines have.

I welcome comments on how to improve these slides. Did I miss any salient points? Do you want to offer a different perspective on any of these? Do you want to offer any counterpoints? (Please e-mail me directly with comments and suggestions.)

Eventually, I hope to develop these slides further into an article for a venue aimed at mathematical scientists, and of course I would love to have knowledgeable coauthors who can offer a different perspective from mine.

Published in: Education


  1. More generally: A discussion of ethics for data, research, and publishing. Mason A. Porter (@masonporter), Department of Mathematics, UCLA
  2. § It’s important. § People need to be able to replicate our work. § Making sure their own code is correct § Natural self-correction in science (and ability to understand precisely every choice we make in our work) § Not traditionally part of mathematical training, but increasingly we are using social data — including potentially personal data — in our research
  3. § We use a lot more real data nowadays, and in particular this includes a lot of human (and animal) data. § Much less a part of the research (and thus training) tradition in mathematics than in other disciplines § Other disciplines have thought a lot more about ethics than mathematics § In many cases, unfortunately, because they’ve messed up the ethics historically, sometimes substantially, and we need to learn from the best practices they’ve developed "Look, lady. Just because my grandfather didn't rape the environment and exploit the workers doesn't make me a peasant. And it's not that he didn't want to rape the environment and exploit the workers; I'm sure he did. It's just that as a barber, he didn't have that much opportunity." – Roger Cobb [Steve Martin], All of Me (1984) Thanks to Peter Mucha for the quote suggestion (and an excuse to allude to this movie)
  4. § Be honest and fair (obviously) § Design ethically thoughtful research § Explain your decisions to others § [Points 2 and 3 taken from slides by Matt Salganik]
  5. FOUR PRINCIPLES § Respect for persons § (Note: Animal research also has thorny ethical issues!) § Beneficence § Justice § Respect for Law and Public Interest How do you balance these four principles?
  6. § Always be honest about your work
  7. § If you are working with personal data, you need to check with your Institutional Review Board (IRB) to ensure that you are doing the work in an ethical way. § They may tell you that you don’t need to submit a formal application, or they may tell you that you do. Let them know briefly what data you have access to (or plan to acquire, and how) and what you plan to do with it. § Different IRBs of course can rule differently. § Rules differ in different countries § Human data versus animal data § In these slides, I have human data in mind, but animal data and its acquisition of course also have major ethical considerations. § Look through UCLA’s website for the Office of the Human Research Protection Program (OHRPP): http://ora.research.ucla.edu/ohrpp/Pages/OHRPPHome.aspx § “IRB is a floor, not a ceiling” (from Matt Salganik’s slides)
  8. § A well-known, heavily-used set of courses: https://www.citiprogram.org/index.cfm?pageID=86 § I found this from a link from UCLA’s OHRPP website. § Several years ago, I did some IRB training. (When preparing these slides, I couldn’t find the specific online course I took.) In addition to helping you think about issues, if something does go wrong, you do (from a practical point of view) want to be able to say that you have appropriate ethics training. § Note: The training required/expected/available differs substantially across countries. § Example: From my experience, my impression is that the UK appears to be less stringent about human data than the US, but it appears to be more stringent about non-human animals.
  9. § The more potential your research has to violate personal privacy, the more potential benefit for humanity its outcome needs to have.
  10. § Informed Consent § Understanding and managing informational risk § Privacy § Making decisions in the face of uncertainty § Other notes § Put yourself in everyone else’s shoes § Think of research ethics as continuous, not discrete (sliding scale) Bullet points from Matt Salganik’s slides
  11. § You must provide sufficient (and precise) detail for people to be able to replicate your work! § Try to include it in your papers, but people are human, so if somebody e-mails you to ask for a clarification, a copy of code (even if poorly commented), or something else, you should respond and send it to them, provided it’s something that you have the right to send.
  12. § To the extent possible, you should publish your data and usable (and well-commented) code along with your work. § There can be tension between these ideals and issues of personal privacy, nondisclosure agreements, and so on. § If using synthetic data, publish the code to generate the data and the generated examples that you used in your paper. § Supplementary material for the paper on the journal website, GitHub, Figshare, and other venues § Likely relevant for literally all of you § E.g., if you are doing any numerical computations at all, this is desirable § E.g., adjacency matrices for graphs in a definition–theorem–proof paper are also useful for readers (though the level of necessity depends on how large the graphs are) § Admission: I have been trying to get better about this over the years. I am very good about responding to e-mail queries, and my goal (though there exist practical considerations) is to be precise about all of my steps and to put as much online as feasible.
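To make the synthetic-data point concrete, here is a minimal Python sketch (my own illustration; the function and parameter names are invented, not from these slides). Fixing the random seed and publishing the generator script lets readers regenerate exactly the instance used in a paper.

```python
# Illustrative sketch (hypothetical names): reproducible synthetic network data.
import random

def generate_synthetic_graph(n_nodes=50, edge_prob=0.1, seed=42):
    """Return an Erdos-Renyi-style edge list. The fixed seed makes the
    generated example exactly reproducible by anyone who reruns the script."""
    rng = random.Random(seed)  # a local RNG, independent of global state
    return [
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < edge_prob
    ]

# Rerunning with the same seed recovers the identical data set.
assert generate_synthetic_graph() == generate_synthetic_graph()
```

Posting such a script together with the emitted edge list (e.g., as supplementary material or on GitHub) serves both goals above: readers can inspect the exact data and regenerate it from scratch.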
  13. § For empirical data, if you have permission to post something (e.g., does the data “belong” to somebody else?) and it doesn’t invade privacy, you should post it because that promotes good science.
  14. § Alternative name: “replication crisis” § https://en.wikipedia.org/wiki/Replication_crisis Take a look, e.g., at the work of Victoria Stodden: http://web.stanford.edu/~vcs/
  15. § Be explicit about anything you did, so that others can know what choices you made and evaluate whether they think it is the best procedure for your analysis § E.g., sampling biases change properties of data § There are many reasons that one makes choices, so it’s not that you shouldn’t make them, but it’s part of your scientific procedure, so tell people exactly what you did so they know exactly what these choices were. (They may want to make different choices.) § “Manipulating” is a loaded word; here I mean it in a neutral way (i.e., “changes”), rather than in a negative one.
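As a toy illustration of how sampling biases change properties of data (my own example, not from the slides): sampling nodes by following edges overrepresents high-degree nodes, so a sampled mean degree can differ substantially from the true mean degree. Stating which sampling procedure you used is exactly the kind of choice readers need to know about.

```python
# Toy example (invented graph): edge-based sampling inflates the mean degree,
# because each node is reached with probability proportional to its degree.
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def true_mean_degree(edges):
    """Average degree over all nodes (uniform node sampling)."""
    deg = degrees(edges)
    return sum(deg.values()) / len(deg)

def edge_sampled_mean_degree(edges):
    """Average degree of nodes reached by picking edge endpoints,
    which weights each node by its degree."""
    deg = degrees(edges)
    endpoints = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    return sum(endpoints) / len(endpoints)

# A small star-like graph: node 0 is connected to everybody else.
edges = [(0, i) for i in range(1, 6)] + [(1, 2)]
```

On this graph the true mean degree is 2.0, but the edge-sampled estimate is 3.0: the two procedures measure genuinely different quantities, which is why the sampling choice must be reported explicitly.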
  16. § When are things actually “anonymous”? § Is “full” anonymization even possible?
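One way to see why “full” anonymization is questionable (a toy sketch with fabricated records; the helper name is mine, not from the slides): even after names are removed, a combination of quasi-identifiers such as ZIP code, birth year, and gender can be unique within a table, and a unique row can potentially be linked to an outside data set.

```python
# Toy sketch (fabricated records): how many "anonymized" rows are unique
# on their quasi-identifiers, and hence potentially re-identifiable?
from collections import Counter

def reidentifiable_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs
    exactly once in the table (a crude linkage-risk measure)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

records = [
    {"zip": "90095", "birth_year": 1980, "gender": "F"},
    {"zip": "90095", "birth_year": 1980, "gender": "F"},
    {"zip": "90024", "birth_year": 1975, "gender": "M"},
    {"zip": "90210", "birth_year": 1990, "gender": "F"},
]
```

With all three quasi-identifiers, half of these invented records are unique (and thus linkable); dropping fields reduces the risk but also the data’s utility, which is the tension the question above alludes to.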
  17. Slide from Matt Salganik
  18. Slide from Matt Salganik
  19. § https://en.wikipedia.org/wiki/Netflix_Prize#Cancelled_sequel
  20. § Acknowledge all sources of data § Include precise details of how you got the data and how somebody else can get it (e.g., whom do they contact?), especially if there is a reason that you are unable to post the data itself § Be generous when acknowledging people in papers: useful discussions, ideas, etc. § Be fair and appropriate when discussing work by authors in past papers § You are standing on the shoulders of giants. :) Give credit where it is due. § Difference between somebody “showing” something in a past paper versus “reporting” it. The former is a statement of verifying validity; the latter is a historical fact (assuming what you write is accurate).
  21. § There can be complications in posting data to the public, no matter how well-intentioned. § This is a great data set to advance several avenues of research in network science, and my goal is for people to be able to do that. § Learning the hard way § Urgently arranging a phone meeting with the head of Facebook’s Data Science team § An important learning experience for me § A small chapter in the long story of data privacy § A blog entry that is very critical of me (though this differs from my side of the story): http://www.michaelzimmer.org/2011/02/15/facebook-data-of-1-2-million-users-from-2005-released/ § Led to my learning much more about these issues (though under very stressful circumstances), a page about research using human data in Oxford’s Mathematical Institute, etc. § https://www.maths.ox.ac.uk/members/policies/data-protection/research-using-data-involving-humans
  22. § Research in collaboration with companies or government: What is it OK to include in a publication or post online? § Tension between open data and personal privacy § Terms-of-service agreements and nondisclosure agreements § In what sense can you replicate work if you can’t post everything? § “Softer” replication: do you observe similar phenomena in circumstances that have some similarities but are not the same? § E.g., human behavior in different social networks
  23. § See, e.g., the discussion around this paper: http://science.sciencemag.org/content/early/2015/05/06/science.aaa1160.full § Eytan Bakshy, Solomon Messing, & Lada Adamic, Exposure to ideologically diverse news and opinion on Facebook, Science, 2015 § They can’t tell us Facebook’s sampling algorithm, so how are we as scientists going to go about “replicating” their work? § Note: Do their insights apply to other online social networks? One should be able to do a weaker form of replication such that the most interesting qualitative results are not merely a property of specifics on Facebook § Also: What about this work being public versus being entirely within Facebook and us never seeing any of it?
  24. § A. D. I. Kramer, J. E. Guillory, and J. T. Hancock. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America, 111(24):8788–8790, 2014 § Look up articles on this one § Experiments on Facebook with changes in people’s feeds § Also: What about this work being public versus being entirely within Facebook and us never seeing any of it? Note: Academic researchers have IRBs that need to approve a study before it starts, whereas Facebook has a publication review board to approve publication of a study after it’s already been done. Thus, we know that this study occurred because FB concluded that it could be published. We don’t know what is done with our data by FB and other companies when it doesn’t get published.
  25. Should academic researchers and companies follow the same rules?
  26. § You can apply this comment generally to “data science” if you like, though the property of connectivity in networks provides substantial additional issues beyond just data science (and “Big Data”, etc.).
  27. § Short essay by Johan Ugander (Management Science & Engineering, Stanford): https://medium.com/@jugander/truth-lies-and-an-ethics-of-personalization-e4ccfa7f2b84#.rzap3hm70 § As an example, he discusses “Cambridge Analytica, identified by the NY Times as the hired guns behind Trump’s online targeting.” § Alexander Nix (CEO of CA) gave the following example in a video. Quoting Ugander’s essay: “if you own a private beach, he notes, you’d have more success keeping people off your beach by putting up a “Warning: sharks beyond this point” sign vs. a “private property” sign. The problem is: he recommends this strategy — and personalized versions of it — without any consideration to whether there actually are any sharks, advocating “behavioral communication” that is completely detached from any truth about reality. In fewer words: crafting lies, and then targeting them.”
  28. § http://callingbullshit.org § Full title: “Calling Bullshit in the Age of Big Data” § A course designed by Carl Bergstrom and Jevin West (University of Washington) § Excellent syllabus and reading materials § Various parts of it relate to ethics, and they also have a unit directly about ethics: http://callingbullshit.org/syllabus.html#Ethics
  29. § Targeted advertising (different trailers for people of different races) for the movie "Straight Outta Compton": http://www.businessinsider.com/why-straight-outta-compton-had-different-trailers-for-people-of-different-races § Different levels of prior familiarity with gangsta rap pioneers N.W.A (Ice Cube, Dr. Dre, etc.) § Papers by Arvind Narayanan and collaborators, including: § http://senglehardt.com/papers/ccs16_online_tracking.pdf § https://5harad.com/papers/twivacy.pdf § J. Su et al., “De-anonymizing Web Browsing Data with Social Networks”, 2016 § C. Kanich et al., “Spamalytics: An Empirical Analysis of Spam Marketing Conversion” (2008): § http://www.umiacs.umd.edu/~tdumitra/courses/ENEE757/Fall14/papers/Kanich08.pdf § B. Markines et al., “Social spam detection” (2009): § http://dl.acm.org/citation.cfm?doid=1531914.1531924
  30. § “Tastes, Ties, and Time” Facebook data set § One discussion about the controversy associated with this data set: http://www.chronicle.com/article/Harvards-Privacy-Meltdown/128166/ § Research by Sinan Aral and collaborators on manipulation of voting on social media sites § One discussion: https://techcrunch.com/2013/08/11/reddit-science-herd/
  31. § Mathematicians are relatively new to using human data, and we don’t yet have the ethics training to help us deal with the thorny issues § Learn from the best practices (and past mistakes) of other disciplines § As in those other disciplines, mathematicians should be getting ethics training § Read about — and think about and discuss — various controversies and other studies. We all may set our bars in different places, but we need to do it conscientiously. § It’s a sliding bar: the more potential for invasion of personal privacy, the more valuable the potential outcome has to be for humanity § IRB approval is only a lower bound
  32. § While I have more training and experience with these issues than most mathematicians, I am very much an amateur on data ethics compared to people from the social and human sciences, for whom this is a standard part of the training from the beginning of their education. § With this in mind, please contact me with any suggestions on these slides. Did I miss any salient points? Do you disagree with any of the discussed points? Are there any other studies that are especially crucial to bring up? § Eventually, I hope to develop these slides further into an article for a venue in the mathematical sciences. Let me know if you are interested in being involved in writing this article.
  33. § Several suggestions for resources from Johan Ugander § Several comments on my slides and suggestions for resources from Peter Mucha § Website from Matt Salganik’s class on Computational Social Science (Fall 2016): http://www.princeton.edu/~mjs3/soc596_f2016/ § I drew some material and ideas from his slides on ethics § It would be pretty ironic if I plagiarized these slides, wouldn’t it?
