Data Ethics, Data
Privacy, and Support
Systems
Bonnie Tijerina
Researcher, Data & Society
Librarian
Convener, ER&L and D4D
bonnie@datasociety.net
@bonlth
http://bit.ly/ethicsindata
Data & Fairness
The Future of
Labor in a Data-
Centric Society
Enabling
Connected
Learning
Ethics in “Big
Data” Research
Intelligence and
Autonomy
Data, Human
Rights & Human
Security
•Part 1: Data Ethics Amongst
Technical Researchers
•Part 2: Privacy & Research Data
Use and Reuse
•Part 3: Support System &
infrastructure Discussion
Part 1: Data Ethics Amongst
Technical Researchers
Ethics in the News
Big Data Ethics Support
Systems and Networks
Bonnie Tijerina, danah boyd & Emily F. Keller
Data & Society Research Institute
Big Data
(CC BY-NC 2.0-licensed photo by janneke staaks.)
Ethical Concerns
•Data Collection
•Data Storage
•Data Sharing, Reuse, Replicability
•Re-identification and Consent
•Unknown and Emerging
Data Collection
•Web scraping
•TOS violations
•Secondary reuse
Data Storage
•3rd Party Storage
•Improper Storage
•Data Security
Sharing, Reuse, Replicability
•Questions about ‘Public Data’
•Non-intrusive research
•Value of Data Reuse
•Frustration when attempting to reuse data
Re-identification and Consent
•Googling quotes
•The game of reidentification
•Balancing concrete data and privacy
Unknown and Emerging
“The thing that worries me more is that as we
develop new methods and new techniques and
new types of research, new potentials for harm,
new ethical questions are coming up for which we
don’t have many examples yet and people are not
aware - the unknown ‘gotchas’ that I worry about
people falling into.”
Reframing Ethics
Definitions
• Violations
• Trustworthy
• Morality
• Conscience
• Mistakes
• Transparency
• Responsibility
• Protection
• Integrity
• Anonymity
• Accountability
• Responsibility
• Teaching
• Consent
• Objectivity
• Security
RDM Services in Libraries
• Triage Services
• Support of copyright and IP data issues
• Consultations on data-related skills
• Resource for questions on storage, data sharing, metadata
preparation
• Discipline-specific data liaisons
Part 2: Privacy & Research
Data Reuse
Privacy in Research Data
•Advancements in data science & data
exchange & social changes … meet scholarly
communication
•Increased publication of data & data deposit
mandates
•Privacy relevant in STM but concerns also
present in HSS
Initial Issues
• Data does not contain PII but could if data set is combined with
related data sets
• Data that is historical in nature contains PII
• Data from back-end usage of systems and policies regarding
tracking of researcher behavior
• Security of data replication and interaction with other systems
• Privacy metadata about the data set
More Issues
• When is privacy about a community, not just an individual?
• Replication is not being evaluated
• Reuse restrictions
• Longitudinal consent
• Ethical but not legal responsibilities
• Library repositories will not take PII
• “Privacy is a footnote in the policy world and OA Movement. It’s
being left to the local community”
Outputs/Outcomes
• High level framework
• Outline of situations where principles would be
applied
• Identification of key areas of variance in privacy
laws worldwide
• Set of technical metadata
• Bibliography
Part 3: Infrastructure
Discussion
Ethics in Research Data Project
How can infrastructures that exist around
technical researchers support the
emerging ethical issues in data
research?
Formal Structures
CC BY-NC 2.0-licensed image by Aaron Parecki
•IRB
•DMPs
•Requirements
On-Campus Workshop
Exercise 1. Alice Exercise
• How will Alice know how to write a DMP for her work? Where will she see what others have written?
• Who will teach her about how to navigate IRB? How will she learn how to create an appropriate protocol?
• How will Alice store and secure her data? Are there services on campus that she can turn to that will support her? What will she be
expected to manage?
• How will she know which private services to use in her work? (e.g., Dropbox, AWS, Azure, etc.) Who helps her figure out the
security and terms of service issues that might arise?
• Who will help her negotiate a contract with ImageCompany?
• Who will help her learn how to navigate Amazon Mechanical Turk’s protocols and processes?
• Who can help her deal with best (technical and social) practices in scraping data or navigating data collection on Amazon
Mechanical Turk?
• Who will help her understand how to assess the limitations and biases in her data sets?
• How can Alice get help with technical issues she’s encountered? (Consider statistics questions, machine learning questions, etc.
Imagine what happens when her advisor is toobusy, lacks the expertise, or she’s too afraid to ask.)
• Should her processes change depending on the social implications of her project? What if she’s
consulting for ImageCompany as part of getting access to the data?
• What are the questions Alice hasn’t even thought of yet?
Exercise 2. Data Clinic Model
• Who would the Data Clinic support? Students? Faculty? What disciplines? What would be needed to support
computer science faculty vs. humanities students?
• What range of problems could the Data Clinic help with? Imagine technical, logistical, and security-related
support services. How much of this overlaps with the Stats Clinic?
• Who would staff the Data Clinic? What is the role of faculty, students, and support services? What kind of hires
would be needed?
• What kind of funding support would be needed for this to work at your university? Who would pay for the
services? What are the economic hurdles for this?
• What kind of organizational support would be needed to make this work at your university? Bottom up versus
top town? Who would need to buy-in? What are the organizational barriers to making this happen?
• What kind of technical and structural support would be needed? (e.g. secure data storage, centralized
company contracts) Who currently provides those services?
• What would this model not help with?
• What will get in the way of a Data Clinic working at your university?
Interlocutor
(photo source)
Video
http://datasociety.net/output/supporting-
ethics-in-data-research/
Teaching Ethics
• Apprenticeship
• Required click-through training
• Embedded ethical questioning
• Lecture series
• Case studies
• For-credit courses
Expanded Role for Libraries
•Data Sharing
•Proper use of information, copyright, and
licensing
•Education & Advocacy for the Open Movement
THANK YOU.
Bonnie Tijerina
bonnie@datasociety.net
bonnietijerina.com
@bonlth
http://bit.ly/ethicsindata

ERN-Data-Ethics.pptx

  • 1.
    Data Ethics, Data Privacy,and Support Systems
  • 2.
    Bonnie Tijerina Researcher, Data& Society Librarian Convener, ER&L and D4D bonnie@datasociety.net @bonlth http://bit.ly/ethicsindata
  • 3.
    Data & Fairness TheFuture of Labor in a Data- Centric Society Enabling Connected Learning Ethics in “Big Data” Research Intelligence and Autonomy Data, Human Rights & Human Security
  • 4.
    •Part 1: DataEthics Amongst Technical Researchers •Part 2: Privacy & Research Data Use and Reuse •Part 3: Support System & infrastructure Discussion
  • 5.
    Part 1: DataEthics Amongst Technical Researchers
  • 6.
  • 7.
    Big Data EthicsSupport Systems and Networks Bonnie Tijerina, danah boyd & Emily F. Keller Data & Society Research Institute
  • 8.
    Big Data (CC BY-NC2.0-licensed photo by janneke staaks.)
  • 9.
    Ethical Concerns •Data Collection •DataStorage •Data Sharing, Reuse, Replicability •Re-identification and Consent •Unknown and Emerging
  • 10.
    Data Collection •Web scraping •TOSviolations •Secondary reuse
  • 11.
    Data Storage •3rd PartyStorage •Improper Storage •Data Security
  • 12.
    Sharing, Reuse, Replicability •Questionsabout ‘Public Data’ •Non-intrusive research •Value of Data Reuse •Frustration when attempting to reuse data
  • 13.
    Re-identification and Consent •Googlingquotes •The game of reidentification •Balancing concrete data and privacy
  • 14.
    Unknown and Emerging “Thething that worries me more is that as we develop new methods and new techniques and new types of research, new potentials for harm, new ethical questions are coming up for which we don’t have many examples yet and people are not aware - the unknown ‘gotchas’ that I worry about people falling into.”
  • 15.
    Reframing Ethics Definitions • Violations •Trustworthy • Morality • Conscience • Mistakes • Transparency • Responsibility • Protection • Integrity • Anonymity • Accountability • Responsibility • Teaching • Consent • Objectivity • Security
  • 16.
    RDM Services inLibraries • Triage Services • Support of copyright and IP data issues • Consultations on data-related skills • Resource for questions on storage, data sharing, metadata preparation • Discipline-specific data liaisons
  • 17.
    Part 2: Privacy& Research Data Reuse
  • 18.
    Privacy in ResearchData •Advancements in data science & data exchange & social changes … meet scholarly communication •Increased publication of data & data deposit mandates •Privacy relevant in STM but concerns also present in HSS
  • 19.
    Initial Issues • Datadoes not contain PII but could if data set is combined with related data sets • Data that is historical in nature contains PII • Data from back-end usage of systems and policies regarding tracking of researcher behavior • Security of data replication and interaction with other systems • Privacy metadata about the data set
  • 21.
    More Issues • Whenis privacy about a community, not just an individual? • Replication is not being evaluated • Reuse restrictions • Longitudinal consent • Ethical but not legal responsibilities • Library repositories will not take PII • “Privacy is a footnote in the policy world and OA Movement. It’s being left to the local community”
  • 22.
    Outputs/Outcomes • High levelframework • Outline of situations where principles would be applied • Identification of key areas of variance in privacy laws worldwide • Set of technical metadata • Bibliography
  • 23.
  • 24.
    Ethics in ResearchData Project How can infrastructures that exist around technical researchers support the emerging ethical issues in data research?
  • 25.
    Formal Structures CC BY-NC2.0-licensed image by Aaron Parecki •IRB •DMPs •Requirements
  • 26.
  • 27.
    Exercise 1. AliceExercise • How will Alice know how to write a DMP for her work? Where will she see what others have written? • Who will teach her about how to navigate IRB? How will she learn how to create an appropriate protocol? • How will Alice store and secure her data? Are there services on campus that she can turn to that will support her? What will she be expected to manage? • How will she know which private services to use in her work? (e.g., Dropbox, AWS, Azure, etc.) Who helps her figure out the security and terms of service issues that might arise? • Who will help her negotiate a contract with ImageCompany? • Who will help her learn how to navigate Amazon Mechanical Turk’s protocols and processes? • Who can help her deal with best (technical and social) practices in scraping data or navigating data collection on Amazon Mechanical Turk? • Who will help her understand how to assess the limitations and biases in her data sets? • How can Alice get help with technical issues she’s encountered? (Consider statistics questions, machine learning questions, etc. Imagine what happens when her advisor is toobusy, lacks the expertise, or she’s too afraid to ask.) • Should her processes change depending on the social implications of her project? What if she’s consulting for ImageCompany as part of getting access to the data? • What are the questions Alice hasn’t even thought of yet?
  • 28.
    Exercise 2. DataClinic Model • Who would the Data Clinic support? Students? Faculty? What disciplines? What would be needed to support computer science faculty vs. humanities students? • What range of problems could the Data Clinic help with? Imagine technical, logistical, and security-related support services. How much of this overlaps with the Stats Clinic? • Who would staff the Data Clinic? What is the role of faculty, students, and support services? What kind of hires would be needed? • What kind of funding support would be needed for this to work at your university? Who would pay for the services? What are the economic hurdles for this? • What kind of organizational support would be needed to make this work at your university? Bottom up versus top town? Who would need to buy-in? What are the organizational barriers to making this happen? • What kind of technical and structural support would be needed? (e.g. secure data storage, centralized company contracts) Who currently provides those services? • What would this model not help with? • What will get in the way of a Data Clinic working at your university?
  • 29.
  • 30.
  • 32.
    Teaching Ethics • Apprenticeship •Required click-through training • Embedded ethical questioning • Lecture series • Case studies • For-credit courses
  • 33.
    Expanded Role forLibraries •Data Sharing •Proper use of information, copyright, and licensing •Education & Advocacy for the Open Movement
  • 34.