Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Part of “Beyond metadata: Supporting non-standardized documentation to facilitate data reuse”
Sarah Pickle, CLIR/DLF Social Science Data Curation Fellow, Penn State University Libraries
19–22. What do we need to keep in mind?
Ethical treatment of human subjects and approvals for data use agreements
Secure technology for data transport, storage, access, and use
Efficient use of data in secure environment
Greater good of sharing data
24–27. Preliminary recommendations
Ethical treatment of human subjects and approvals for data use agreements
disclosure risks • risky links • informed consent language • cultural context
Secure technology for data transport, storage, access, and use
IT specs • data use agreement
Efficient use of data in secure environment
(specific!) deidentification activities
Greater good of sharing data
pre-written appeal to IRB
Editor's Notes
This is how we used to think about data sharing.
This image is from the cover of a book published in 1985.
But, if we’re honest with ourselves, it’s not that far off from how we still share sensitive data today. In fact, if we just replace the floppy with a DVD…
Yep—that’s pretty much what it looks like. When sensitive data are actually shared, DVDs with those data are passed around from agency to researcher or between researchers and stored in locked boxes in secure data rooms. Access and use are restricted.
Because of the privacy concerns around sensitive data, there are miles of red tape between those hands.
Social scientists rely heavily on secondary data for their research. Those who collect data themselves want to—and may well have to—share their work, but they don’t know how to, because much of their data includes private information about their research subjects.
But I’m starting to think that some additional documentation can help with giving and getting access to these restricted data.
So this presentation will share a preliminary framework for thinking about what additional documentation might be needed in order to help enable access to and reuse of restricted data. I’ll also offer a handful of suggestions for fulfilling that need.
This is preliminary and open to feedback, new ideas, etc.!
---------------------
Let’s start from the beginning.
By “restricted data,” I mean data that could simply be sensitive or could possibly even cause harm to people or property.
Despite the sensitive nature of some social science data, we have a greater chance of being able to share restricted data from the social sciences than we do data from other fields that are bound by, say, HIPAA regulations (e.g., biomedicine) or export control (e.g., engineering).
Sensitive data in the social sciences that are restricted are typically identifiable information. And since social scientists are often funded by NSF, NIH, and the like, they’re also bound by requirements to share their data. But that’s hard to do, given the need to protect the privacy of study participants.
Where are we now with sharing restricted social science data? There’s currently a great deal of work being done at a select number of academic campuses to facilitate access to restricted-use social science data that are provided by federal agencies and by organizations like ICPSR and NORC.
Developing physical and virtual data enclaves in which restricted data can be securely and safely stored and used (besides the population research centers out there, this is happening at places like Cornell, Emory, Johns Hopkins, UVa, Wisconsin, Rutgers)
Improving processes for getting restricted data use agreements—that is, contracts for secondary data use—signed. (At Penn State, three different units can currently sign data use agreements, and it’s unclear which one PIs should go to under which circumstances; nor are the offices coordinated to ensure that the security requirements negotiated in one are consistent with those in the others.)
------------------------
But really, the focus of this presentation is on us in this room.
How can WE—data managers or curators at universities or in research organizations—serve as the providers of sensitive data rather than just facilitators? Sure, facilitation is hard, but I want to ask how we can play a role akin to that of the Department of Education, which publishes the data from the National Center for Education Statistics? How can we provide access to the data collected by our researchers, rather than only facilitating access to data that belong to other entities?
In a few places, we already see the university stepping up as a provider of restricted data: e.g., Health and Retirement Study & Michigan Center on the Demography of Aging (Michigan), Add Health (UNC-CH)
But I want to address what folks like us are more likely to encounter: the myriad data sets that are much smaller than those from Pop Centers that are being created by our social scientists all the time.
Two Penn State examples: Jenny Trinitapoli’s data on religion and health in Africa; the Tremin Research Program on Women’s Health, a longitudinal study of women’s reproductive health
These are data that need to be shared—whose greatest contributions to research are in those sensitive data that would be obscured in an anonymized public version of these files—but their PIs don’t know how to share them because those researchers don’t have an entire center or program dedicated to sharing their data and they don’t have the money to take advantage of services like those provided by ICPSR
I’ll briefly mention that advice is emerging on how to plan early for sharing human subjects data—Elizabeth Buchanan at UW-Stout is a leading figure in this conversation; speaks to how researchers might prepare their informed consent language for easier reuse.
But while we can be proactive and try to get in on the ground floor of a project, at this stage, we’re still more likely to have a PI approach us at the end of a project, asking for help sharing her data.
So what can we do with the data that fall in our laps? How can we share these data responsibly? Answer so far:
Let’s create public-use files of these data: we have pretty good guidance for how to do that (the report from ANDS, as the most recent example, but also from the UK Data Archive).
While an anonymized, public dataset may be sufficient to help address some research questions, anonymization can obscure the information that is likely to be most useful to other researchers. (Ex: being able to drill down to smaller geographic areas in order to speak to distinctions between neighborhoods. Allows for more nuanced investigations.) These research questions may well only be addressed through access to the restricted files.
But it can be such a pain to actually get ahold of and use these data for all the reasons I’ve already mentioned—finding secure spaces to work; getting contracts signed; but also convincing IRB it’s necessary to work with these data. (Reference proposed Jisc study http://bit.ly/1CUZQP6 )
Now I’d like to try to provide a framework for how we might help document restricted social science data in order to facilitate access and reuse. So that we can be the data providers, not just facilitators.
Caveat again: just some preliminary ideas here and many may seem obvious. I’d love to hear your reactions.
If we, data managers/curators/repository staff, want to help provide this kind of first-level access to the many sensitive datasets on our campuses, we can’t just lock dozens of DVDs or external hard drives in a closet only we have the key to and then see what comes at us.
It’d be great if this was all it took to share these data.
But really, there are a ton of detours between these PIs.
So, we have to think strategically about who and what is responsible for all those detours and ask how we can prepare for them.
Whom do we need to talk to in order to navigate this crazy route?
Original PI, who knows what they gathered and how they gathered it
Secondary PI, who knows how they’ll use the data and the risk of disclosure when they use the data, given their knowledge of the field
On-campus policy officers and contract negotiators: IRB (human subjects), Office of Sponsored Programs. Anyone who determines under what conditions data may and can be shared and used.
On-campus IT (security folks), who need to talk with those policy makers and enforcers in order to ensure compliance and consistency in securing the data.
Finally, we also need to involve advocates at a higher level, e.g., NSF program officers, who say that sharing these data is important. If we don’t have them behind us, it’ll be hard to motivate our local policy makers and enforcers to take this risk on.
Once we understand the concerns of all these parties and navigate this tricky route between PIs…
…MAYBE we can actually share this stuff.
-------------------------------------
When we first get all these people involved in this kind of conversation, that conversation is often limited to what we can’t do. IRB, Risk Management, OSP: their goal is to mitigate risk and protect subjects; it’s understandable.
But what happens, as a result? What generally happens to a restricted dataset at the end of a project? Best case scenarios: it stays on lockdown, accessible only to the research team; an anonymized file is produced and shared—a public-use dataset.
With the exception of those big, federally-supported datasets mentioned earlier or those that end up in ICPSR or another national data archive (b/c they have a lot of money to pay ICPSR or another archive to curate them), there is rarely ever any useful information available about the datasets on lockdown or anything useful about the restricted versions of public datasets.
Rare even to have metadata records so that potential users could know these resources exist. Honestly, though, that would be one good place to start: creating tombstone metadata records in our institutional repositories that don’t link to the restricted dataset, but at least provide contact information for the gatekeeper. This record could also contain a field specifying the embargo periods for the data and trigger dates for public releases.
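One way to picture such a tombstone record is as a minimal machine-readable stub. Everything below is a sketch: the field names, the contact address, and the dates are invented placeholders, not drawn from any metadata standard.

```python
import json

# A minimal "tombstone" record for a restricted dataset: it describes the
# data and names a gatekeeper but deliberately links to no files. All field
# names and values here are illustrative placeholders, not a standard.
tombstone = {
    "title": "Example longitudinal health survey (restricted-use files)",
    "creator": "Example PI, Example University",
    "description": ("Survey data restricted because they contain "
                    "identifiable human-subjects information."),
    "access": {
        "status": "restricted",
        "gatekeeper_contact": "dataservices@example.edu",
        "instructions": "Contact the gatekeeper to begin a data use agreement.",
    },
    "embargo": {
        "restricted_until": "2025-01-01",        # embargo period for the data
        "public_release_trigger": "2025-01-01",  # date a public file appears
    },
}

# The record can be serialized for deposit in an institutional repository.
print(json.dumps(tombstone, indent=2))
```

The point is that the record is discoverable and harvestable even though the dataset itself stays behind the gatekeeper.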
But that’s just a start. Once researchers know about these restricted datasets, how then can we provide access to them in a secure and responsible way?
If we keep pushing and asking how we can share instead of why we can’t, we’ll need to turn to the policies that govern all this. So, when a researcher wants to use a restricted dataset, she might run into a handful of challenges.
Here are some things we’ll need to keep in mind.
First, we need to keep in mind the ethical treatment of human subjects by the secondary data user and approvals for their data use agreements.
One challenge is that the IRB has trouble making a determination about the risks to human subjects in a dataset that the IRB can’t see. And there’s a parallel challenge in trying to evaluate the risks that might arise when dataset A is linked to dataset B if the Board can’t examine one or both of those datasets.
IRB has trouble knowing what exactly participants have consented to w/r/t how the data might be used in a study other than the one they were direct participants in.
Still on the topic of general research ethics: when trying to protect sensitive data, we often do so at the risk of eliding important contextual information, which in turn could lead to misuse or misappropriation of the data. This is an especially grave risk when personal human subject information is involved.
These ethical matters also motivate the regulations included in data use agreements. What’s more, the issue of DUAs bleeds into technology requirements for ensuring the safe use and storage of these data.
DUAs force—or should force—us to think about the technology used to transport, store, get access to, and work on the restricted data. They’re really about how to do all that safely and securely.
But the groups on our campuses that sign DUAs—OSP/Risk Management/Purchasing—often have trouble interpreting what security protections need to be in place for storing and using the data, who can implement those protections, and who is ultimately responsible, should there be a breach.
Another consideration is that researchers need to be able to make efficient use of the data in that secure environment. There will certainly be limitations on who can access the data once acquired, and there may also be limits on how much time those who can work with the data are allowed to do so.
So, for example: Before the researcher herself gets access, we need to think about how her programmers/graduate students can prepare code for a restricted-use file using only a public-use copy of the dataset
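As a sketch of that workflow: if the public and restricted files share a layout, analysis code can be written and debugged against the public copy, then run unchanged inside the enclave by pointing it at the restricted path. The file names and column name below are hypothetical.

```python
import csv

def mean_of(path, column):
    """Mean of a numeric column in a CSV file; rows with blanks are skipped."""
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row[column].strip():
                values.append(float(row[column]))
    return sum(values) / len(values)

# Graduate students develop and test against the public-use copy...
#   mean_of("public_use.csv", "age")
# ...and the PI later runs the identical call on the restricted file
# inside the enclave:
#   mean_of("restricted_use.csv", "age")
```

Keeping the analysis parameterized by file path is what makes the limited enclave time count.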
Finally, when we run up against these challenges and feel helpless, we should keep in mind how providing access to the data could potentially benefit the greater good and let that motivate us to keep pressing on.
(To be honest, this is an overarching issue to keep in mind as we think about our need to share restricted data; the reminder comes from an NSF program officer.) Assuming that practically any sensitive data these days—even “anonymized” data—are reidentifiable, which may well be the case, how damaging would it actually be to the subjects for these data to be shared in a controlled way? And is that risk outweighed by the greater beneficence or justice these data can help deliver? The Belmont Report—which addresses ethical principles and guidelines for the protection of human subjects—does say that the latter two tenets, beneficence and justice, must weigh just as heavily as respect for persons when we balance the basic ethical principles of research.
With these considerations in mind, what can we do? What additional documentation can we—as data providers at relatively small scale—make available in order to help address those challenges?
Here are some untested recommendations.
With respect to the first point, we can provide detailed documentation about: which variables individually or in combination pose disclosure risks; why they do; and how those risks might change given the secondary user’s areas of expertise or other considerations.
Disclose potential risky linking with other datasets
Include copies of informed consent language
Provide even more cultural context, for example, for the data in order to try to prevent misuse and misappropriation of them
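For instance, per-variable risk documentation might take a shape like the following; the variables, field names, and treatments are all invented for illustration and follow no existing standard.

```python
# Hypothetical per-variable disclosure-risk notes for a restricted dataset,
# sketching what "documenting disclosure risks" could look like in a
# machine-readable form a curator or IRB could scan quickly.
risk_notes = {
    "zip_code": {
        "risk": "high",
        "reason": "Combined with birth_date, can narrow subjects to a few people.",
        "risky_combinations": [("zip_code", "birth_date")],
        "public_file_treatment": "truncated to first 3 digits",
    },
    "interview_transcript": {
        "risk": "high",
        "reason": "Free text may contain names, places, and cultural details.",
        "risky_combinations": [],
        "public_file_treatment": "excluded from the public file",
    },
}

# A quick summary a curator might hand to the IRB or a secondary PI:
for var, note in risk_notes.items():
    print(f"{var}: {note['risk']} risk; public file: {note['public_file_treatment']}")
```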
On the second point, we can draft flexible data use agreements that could be provided as an appendix to the dataset; they’d detail IT specifications for securely transporting, storing, providing access to, and using the data, but could be adjusted depending on the secondary PI’s institutional context.
Enabling efficient use of the data ties in with ideas already mentioned in the context of human subjects review: information on precisely how variable names have been changed in, say, creating the public file of a restricted dataset, and how the coding in the restricted file differs from the public one.
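A hypothetical variable crosswalk can capture exactly those name and coding differences; everything below (variable names, coding notes) is invented for illustration.

```python
# Hypothetical crosswalk between a public-use file and its restricted
# counterpart, recording renames and coding differences so code written
# against one can be translated to the other.
crosswalk = {
    # public_name: (restricted_name, coding difference)
    "region": ("county_fips", "public file collapses counties into 4 regions"),
    "age_group": ("age", "public file bins exact age into 5-year groups"),
}

def restricted_name(public_name):
    """Look up the restricted-file variable behind a public-file variable."""
    return crosswalk[public_name][0]
```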
Finally, we can help promote use of the data as a contribution to the greater good: we can provide language helping to articulate how the IRB, for example, might want to weigh the three tenets of ethical research when making its decision about whether a PI should be approved to use the restricted dataset.
-----------------
This is where I am now as I try to tease out the different challenges we face in sharing those restricted data dropped in our laps: it largely boils down to concerns related to human subjects protections and technology. It would certainly be easier to address those concerns before the data are gathered—before the original study goes to the IRB—but since that’s still rarely the case, I hope I’ve sketched out at least a framework for how we can approach the question of sharing restricted data collected by our local researchers.