Ethical challenges for online social science research: Networks, rentals and confessionals

Presentation at the 5th International Conference on eSocial Science. Part of a workshop on the law and ethics of eSocial Science research. It outlines three domains I am currently researching and some of the ethical issues I have encountered including reporting on a third party (Facebook), deception (craigslist) and information access (

  • 1. Ethical challenges for online social science research: Networks, Rentals and Confessionals Bernie Hogan Research Fellow, Oxford Internet Institute NCeSS - 5th International Conference on e-Social Science June 24, 2009. Cologne, Germany Wednesday, June 24, 2009 1
  • 2. Three unethical studies? • Facebook network research • Craigslist audit study • Grouphug confessional site
  • 4. What are the techniques? • Spidering - Technically fussy, often considered inappropriate by data controller • API - Technically restrictive, gives false sense of data ownership (See Facebook Developer Terms of Use Section 2.A.6) • Datadump - Facebook gives you the data • Someone else’s application - May not give data, but only a picture. • Handcoding - Spidering for masochists
  • 5. Who gets the data? • Golder, S., Wilkinson, D. M., and Huberman, B. A. (2007). Rhythms of social interaction: Messaging within a massive online network. In 3rd International Conference on Communities and Technologies, East Lansing, MI. Springer. • Traud, A., Kelsic, E., Mucha, P., and Porter, M. (2008). Community structure in online collegiate networks. Working paper. • Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., and Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4):330–342.
  • 6. But isn’t it anonymous? No. • Backstrom, L., Dwork, C., and Kleinberg, J. (2007). Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th International Conference on World Wide Web, pages 181–190. ACM New York, NY, USA. • Direct attack needs ~ sqrt(log(n)) nodes. • Narayanan, A. and Shmatikov, V. (2009). De-anonymizing social networks. Forthcoming: IEEE C&S. • Starting with even less and matching to an existing network can identify over 90% of the network accurately.
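To give a sense of scale for the sqrt(log(n)) figure on slide 6, the sketch below evaluates it for a few network sizes. The base-2 logarithm and the constant factor of 1 are assumptions made purely for illustration (the slide gives only the order of growth), and `attack_nodes` is a hypothetical helper name.

```python
import math

def attack_nodes(n: int) -> float:
    """Illustrative sqrt(log2(n)); the log base and constant are assumptions."""
    return math.sqrt(math.log2(n))

# Evaluate the expression for a few hypothetical network sizes.
for n in (10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}: roughly {attack_nodes(n):.1f} nodes")
```

Under these assumptions the figure stays in the single digits even for a network of a hundred million nodes, which is the point of the slide: very few nodes are needed relative to the size of the network.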
  • 7. Or simply use this guy Zimmer, Michael. 2009. “But the Data is Already Public”: On the Ethics of Research in Facebook. 8th International Conference of Computer Ethics: Philosophical Enquiry. Corfu, Greece.
  • 8. The only anonymous network is one where you don’t know the network structure. This is unrealistic.
  • 9. So what’s the precedent? • Personal networks with informed consent. • Name generators have historically asked individuals to report data on their friends. • They jump through an ethical loophole vis-a-vis the fact that this is recall data. • Information networks, however, permit not only data created by an individual, but also friend-of-a-friend data that is merely accessible to, not created by, the respondent.
  • 10. Facebook properties enable you to report on your friends to a third party. (Diagram labels: Respondent, Friend 1, ?, Friend 2)
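The distinction drawn on slides 9 and 10 can be made concrete with a toy ego network. This is a minimal sketch with made-up names, not data or code from the study: ties that involve the respondent are ones the respondent created, while a tie between two friends is only visible to the respondent. Treating the "?" in the diagram as the tie between Friend 1 and Friend 2 is an assumption.

```python
# Toy ego network as an undirected edge list (all names are hypothetical).
EGO = "respondent"
edges = [
    ("respondent", "friend_1"),
    ("respondent", "friend_2"),
    ("friend_1", "friend_2"),  # a tie the respondent can see but did not create
]

def split_ties(ego, edge_list):
    """Separate ties the ego is part of from ties among the ego's friends."""
    own = [e for e in edge_list if ego in e]
    third_party = [e for e in edge_list if ego not in e]
    return own, third_party

own_ties, friend_ties = split_ties(EGO, edges)
print("created by respondent:", own_ties)
print("merely accessible:", friend_ties)
```

Only the second list raises the consent question discussed here, since neither party to those ties is the person who agreed to take part.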
  • 13. Methods • This is a University of Toronto ethics board-approved audit study. • We selected Craigslist, a highly popular free online classifieds site. • From March to June 2007 we selected approximately 10 new ads each day for inclusion in the study. • Each landlord was emailed 5 messages. Each message paired one of five ethnicities, randomly assigned, with one of five message bodies. Each experiment used one gender only.
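The random assignment described on slide 13 might look something like the sketch below. It assumes that each landlord receives every ethnicity and every message body exactly once (a random pairing), which the slide implies but does not state outright. The ethnicity labels come from slide 18; the message bodies are placeholders because the actual texts are not given, and `assign_messages` is a hypothetical name.

```python
import random

ETHNICITIES = ["Arab", "Black", "SE Asian", "Caucasian", "Jewish"]  # labels from slide 18
MESSAGE_BODIES = [f"body_{i}" for i in range(1, 6)]  # placeholders, real texts not given

def assign_messages(rng: random.Random) -> list[tuple[str, str]]:
    """Pair each ethnicity with one message body in a random order."""
    bodies = MESSAGE_BODIES[:]  # copy so the module-level list is untouched
    rng.shuffle(bodies)
    return list(zip(ETHNICITIES, bodies))

rng = random.Random(2007)  # fixed seed so the sketch is repeatable
for ethnicity, body in assign_messages(rng):
    print(ethnicity, "->", body)
```

Shuffling one list and zipping it against the other keeps the design balanced within each landlord, so differences in response rates cannot be attributed to a particular message body always travelling with a particular name.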
  • 14. 1. Price and number of bedrooms almost always in header. 2. Masked email address. 3. Well-formed date. 4. PostingID - key to linking data. 5. Link to well-formed Google map, or failing that, nearest intersection.
  • 15. • We send messages out one day after the posting (rather than immediately) at short regular intervals. The parameters can be tuned. • Jitter means that messages are sent at a random time within "5" minutes of the specified time. Makes batches of messages look more realistic. • By default we alternate between male and female names. • This window shows the five name / message combinations that will be sent out.
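The scheduling behaviour described on slide 15 can be sketched roughly as follows. This is not the study's tool: the function name, the 30-minute interval, and the reading of "within 5 minutes" as plus or minus 5 minutes are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta

def schedule(posted_at, n_messages=5, interval_min=30, jitter_min=5, rng=None):
    """Return send times starting one day after the posting, spaced at a
    regular interval, each shifted by a random offset of up to jitter_min
    minutes in either direction (interval and jitter values are assumed)."""
    rng = rng or random.Random()
    start = posted_at + timedelta(days=1)
    times = []
    for i in range(n_messages):
        base = start + timedelta(minutes=i * interval_min)
        offset = rng.uniform(-jitter_min, jitter_min)
        times.append(base + timedelta(minutes=offset))
    return times

# Example: an ad posted at noon gets five messages the following afternoon.
for t in schedule(datetime(2007, 3, 1, 12, 0), rng=random.Random(0)):
    print(t.strftime("%Y-%m-%d %H:%M:%S"))
```

Keeping the interval larger than twice the jitter preserves the order of the messages while still avoiding send times that fall on suspiciously round numbers.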
  • 16. Date. Email address. 1 of 5 different message bodies. 1 of 5 female Arabic names. Secret posting ID: ddhfegjfb = 337546951
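The single example on slide 16 (ddhfegjfb = 337546951) is consistent with a digit-to-letter substitution in which a = 0, b = 1, and so on up to j = 9. The sketch below implements that mapping. It is inferred from this one example only, so the scheme actually used in the study may differ, and the function names are hypothetical.

```python
LETTERS = "abcdefghij"  # assumed mapping: a=0, b=1, ..., j=9

def encode_id(posting_id: int) -> str:
    """Replace each decimal digit of the posting ID with its letter."""
    return "".join(LETTERS[int(d)] for d in str(posting_id))

def decode_id(token: str) -> int:
    """Recover the numeric posting ID from its letter form."""
    return int("".join(str(LETTERS.index(ch)) for ch in token))

print(encode_id(337546951))    # ddhfegjfb
print(decode_id("ddhfegjfb"))  # 337546951
```

A token like this can sit unobtrusively in a message and still let a reply be linked back to the ad that prompted it.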
  • 17. Map of rentals in Greater Toronto Area: geographic distribution of rental ads (97% showing)
  • 18. Ranked responses for names by ethnicity and gender • We ranked each of the 50 names from 1 (least responses) to 50 (most responses). • The table shows the sum of the ranks for all 5 names used in each ethnicity-gender combination.

                     Male   Female
        Total         519      756
        Arab           31      113
        Black          97      129
        SE Asian       88      179
        Caucasian     146      164
        Jewish        157      171
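The figures on slide 18 can be checked with a little arithmetic. The sketch below uses only the numbers on the slide; the variable names are illustrative. Ranks 1 to 50 sum to 1275, so if names were responded to equally, each group of five names would have an expected rank sum of 127.5.

```python
# Rank sums from slide 18 as (male, female) per ethnicity.
rank_sums = {
    "Arab": (31, 113),
    "Black": (97, 129),
    "SE Asian": (88, 179),
    "Caucasian": (146, 164),
    "Jewish": (157, 171),
}

total_male = sum(m for m, _ in rank_sums.values())
total_female = sum(f for _, f in rank_sums.values())

# The ten cells must add up to the sum of the ranks 1..50.
assert total_male + total_female == 50 * 51 // 2

expected_per_cell = (50 * 51 // 2) / 10  # rank sum if no group differed
mean_rank = {k: (m / 5, f / 5) for k, (m, f) in rank_sums.items()}

print("column totals:", total_male, total_female)
print("expected rank sum per cell:", expected_per_cell)
for group, (m, f) in mean_rank.items():
    print(f"{group:<10} mean rank  male {m:5.1f}  female {f:5.1f}")
```

The column totals match the 519 and 756 shown in the table, which confirms that the first row is a total rather than a separate group.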
  • 19. Issues • Racism is often difficult to assess through direct questioning. • Deception in this study is necessary. • There is no direct personal harm, and no direct manipulation.
  • 21. Online confessional site • What constitutes anonymity? • Grouphug is a website of approximately one million posts (approximately 95% unique). • It does not store IP addresses, actively discourages quoting other posts, and encodes the entries in non-sequential strings (timestamps exist but are hidden).
  • 22. Nothing here to see... (catch 22)
  • 23. Ok, here are some examples • “I am so happy that I can confess again. I don't even care about seeing my confessions on here, it's just the feeling of getting it off your chest and sending it away!” (136158003) • “I pee in the shower because I hate everyone I live with.” (255678370)
  • 24. Some worse examples • “I paid my friend 200 dollars to do over 400 pages of homework for the year, so that i can ditch school as much as i want, while lying to my mother and saying im still going to school” (194778021) • “I have HPV, its a std. I have known about it for 7 years, but that has not stopped me from having sex with 9 people with out a condom. 4 of the girls where married. I have never told anyone about my std. I have no idea how many people are infected because of me, it keeps me up at night.” (275447713)
  • 25. So... • Do we ignore anonymous confessionals as too toxic, or treat them as insight to the id? • Can we even analyze this data or merely view it as passive bystanders? Are there legal implications, especially dealing with data designed to resist tracking? What is my responsibility if I can do nothing to follow up (or even confirm the veracity of the statement)?
  • 26. Summary • Facebook - the ethics of capturing someone else’s relationships is ambiguous. The network I see is not mine - it is what I am allowed to see. I defer to Facebook’s terms of use. • Craigslist - the ethics of understanding racism as it actually operates online is problematic. I defer to utilitarian arguments and approval from the ethics board. • Grouphug - the ethics of viewing and storing, let alone analyzing, confessionals is ambiguous. How can we assure no personally identifying information without looking for it? How can we anonymize a million entries?
  • 27. Opportunities • We can get unprecedented access to society in the wild. • But is this fair? Is it justified? • How close to ‘the social good’ must one be to justify this work?
  • 28. Thank You Bernie Hogan