Presentation given to the California Center for Population Research on principles of research ethics, data management for protection of privacy and confidentiality, and applying for access to restricted data in social science research.
2. Goals for today’s presentation
• Discuss basic professional standards for
researchers
• Discuss ways to manage data to protect
privacy and confidentiality
• Discuss how to use restricted-access data
5/19/2014 CCPR - May 19, 2014 2
Thanks to Shira Shafir, UCLA School of
Public Health, for many of the slides
3. Professional Standards for Researchers
• "For individuals, research integrity is an aspect
of moral character and experience. It involves
above all a commitment to intellectual
honesty and personal responsibility for one’s
actions and to a range of practices that
characterize responsible research conduct."
(Taken from the National Academies of
Sciences Report)
Source: Slide shared by Shira Shafir, UCLA School of Public Health
4. Professional Standards for Researchers
• The best ethical practices produce the best research.
• These practices include:
– Proficiency and fairness in peer review;
– Accuracy and fairness in representing contributions to research
proposals and reports;
– Collegiality in interactions, communications and sharing of
resources;
– Honesty and fairness in proposing, performing, and reporting
research;
– Disclosure of conflicts of interest;
– Protection of human subjects in the conduct of research.
Source: Slide shared by Shira Shafir, UCLA School of Public Health
5. Collaborative Research
• Today, advances in research are rarely
made by single investigators.
• Typically, collaboration allows the
investigative team to ask powerful new
questions, the answers to which would be
otherwise unattainable.
Source: Slide shared by Shira Shafir, UCLA School of Public Health
6. Conflicts of Interest
• “Conflict of Interest” is a legal term that
encompasses a wide spectrum of behaviors or
actions involving personal gain or financial
interest.
• A conflict of interest exists when an individual
exploits, or appears to exploit, his or her
position for personal gain or for the profit of a
member of his or her immediate family or
household.
Source: Slide shared by Shira Shafir, UCLA School of Public Health
7. Misconduct in Research
• Errors involving deception comprise the most
serious category of errors.
• These types of errors involve:
– Violating the privacy and confidentiality of research subjects
– Making up data or results (fabrication)
– Changing or misreporting data or results (falsification)
– Using the ideas or words of another without giving
appropriate credit (plagiarism)
• When in doubt, the government provides 26 excellent
guidelines: http://ori.hhs.gov/plagiarism-0
Source: Slide shared by Shira Shafir, UCLA School of Public Health
8. Manage data to protect respondents
• Assess disclosure risk: “A disclosure risk occurs if
an unacceptably narrow estimation of a
respondent’s confidential information is possible or
if exact disclosure is possible with a high level of
confidence.” http://neon.vb.cbs.nl/casc/Glossary.htm
• Procedures
– Informed consent
– Remove identifiers
– Data manipulation
– Restrict access
9. Informed consent
• “Informed consent is a process of communication
between a subject and researcher to enable the person
to decide voluntarily whether to participate in a study.
Human subjects involved in a project must participate
willingly and be adequately informed about the
research. The informed consent must include a
statement describing how the confidentiality of subject
records will be maintained. However, it also is
important that informed consent be written in a way
that does not unduly limit an investigator's discretion
to share data with the research community.”
Source: ICPSR web site on Confidentiality
10. Remove identifiers
• Direct identifiers. These are variables that point explicitly to particular
individuals or units. Examples include:
– Names
– Addresses, including ZIP and other postal codes
– Telephone numbers, including area codes
– Social Security numbers
– Other linkable numbers such as driver's license numbers, certification numbers, etc.
• Indirect identifiers. These are variables that can be problematic as they
may be used together or in conjunction with other information to
identify individual respondents. Examples include:
– Detailed geographic information (e.g., state, county, province, or census tract of residence)
– Organizations to which the respondent belongs
– Educational institutions (from which the respondent graduated and year of graduation)
– Detailed occupational titles
– Place where respondent grew up
– Exact dates of events (birth, death, marriage, divorce)
– Detailed income
– Offices or posts held by respondent
Source: ICPSR web site on Confidentiality
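The two kinds of identifiers call for different handling, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure prescribed by ICPSR: the column names (`name`, `ssn`, and so on) and both helper functions are hypothetical, and records are assumed to be plain dictionaries. The first function drops direct identifiers outright; the second counts records whose combination of indirect identifiers is unique in the file, a rough signal of which cases may need further protection.

```python
from collections import Counter

# Hypothetical column names for direct identifiers; adjust to the actual codebook.
DIRECT_IDENTIFIERS = {"name", "address", "zip_code", "phone", "ssn", "drivers_license"}


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields removed."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


def count_unique_combinations(records, indirect_fields):
    """Count records whose combination of indirect identifiers occurs only once."""
    combos = Counter(
        tuple(record.get(field) for field in indirect_fields) for record in records
    )
    return sum(1 for count in combos.values() if count == 1)


# Made-up example records, for illustration only.
records = [
    {"name": "A", "ssn": "1", "age_group": "30-39", "county": "X"},
    {"name": "B", "ssn": "2", "age_group": "30-39", "county": "X"},
    {"name": "C", "ssn": "3", "age_group": "40-49", "county": "Y"},
]
cleaned = strip_direct_identifiers(records)
unique_cases = count_unique_combinations(cleaned, ["age_group", "county"])
```

A nonzero count does not by itself mean a respondent can be identified, but it flags combinations worth recoding or collapsing before release.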
11. Data Manipulation
• Recoding -- can include converting dates to time intervals, exact dates
of birth to age groups, detailed geographic codes to broader levels of
geography, and income to income ranges or categories.
• Removal -- eliminating the variable from the dataset entirely.
• Top-coding -- restricting the upper range of a variable.
• Collapsing and/or combining variables -- combining values of a single
variable or merging data recorded in two or more variables into a new
summary variable.
• Sampling -- rather than providing all of the original data, releasing a
random sample of sufficient size to yield reasonable inferences.
• Swapping -- matching unique cases on the indirect identifier, then
exchanging the values of key variables between the cases. This retains
the analytic utility and covariate structure of the dataset while
protecting subject confidentiality. Swapping is a service that archives
may offer to limit disclosure risk. (For more in-depth discussion of this
technique, see O’Rourke, 2003 and 2006.)
• Disturbing -- adding random variation or stochastic error to the variable.
This retains the statistical properties between the variable and its
covariates, while preventing someone from using the variable as a
means for linking records.
Source: ICPSR Guide to Social Science Data Preparation and Archiving
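Several of the techniques above are simple enough to sketch in Python. The following is a minimal illustration, not the method used by ICPSR or any archive: the function names, the ten-year age bands, the cap value, and the Gaussian noise model are all assumptions chosen for the example. It covers recoding, top-coding, disturbing, and sampling; swapping is left out because it depends on case-matching rules that the source does not spell out.

```python
import random


def recode_age(age, width=10):
    """Recode an exact age into a band such as '30-39' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value, cap):
    """Top-coding: report any value above the cap as the cap itself."""
    return min(value, cap)


def disturb(value, sd, rng):
    """Disturbing: add zero-mean Gaussian noise (one possible noise model)."""
    return value + rng.gauss(0, sd)


def release_sample(records, fraction, rng):
    """Sampling: release a simple random sample instead of the full file."""
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


# Illustrative usage with made-up values and a seeded generator for repeatability.
rng = random.Random(2014)
age_band = recode_age(34)
capped_income = top_code(250_000, cap=150_000)
noisy_income = disturb(52_000, sd=1_000, rng=rng)
subset = release_sample(list(range(10)), fraction=0.5, rng=rng)
```

In practice the band width, cap, and noise level would be chosen by weighing disclosure risk against the analytic value of the variable.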
12. Restrict Access
• “Restricted-use data are distributed in cases
when removing potentially identifying
information would significantly impair the
analytic potential of the data. In other cases,
data contain highly sensitive personal
information and cannot be shared as a public-
use file. In these cases, ICPSR provides access
to a restricted-use version that retains the
confidential data but requires controlled
conditions for accessing them.”
Source: ICPSR Web site on confidentiality
13. Ways to restrict access
ICPSR has established several mechanisms by which restricted-use data can be
distributed:
• Secure online analysis (publicly available): This option provides immediate
access to restricted-use data behind an analytic interface that has
programmable disclosure protection.
• Secure online analysis (password protected): This option provides analysis of
restricted-use data behind an interface with programmable disclosure
protection for selected users. With this option, users may have to submit an
application to access the data, or they may be part of a defined group, such as
a research group.
• Restricted Use Data Agreement: With this option, users submit a request to
access the data, and after approval, download the data using a single-use
password or receive the data on CD-ROM.
• Virtual Data Enclave (VDE): The VDE is a secure, online environment via which
approved users analyze restricted-use data using several software options
available within the VDE, such as SAS, Stata, and SUDAAN.
• Physical Data Enclave: For highly restricted data, ICPSR has a physical enclave,
which requires that approved users be on site at ICPSR to use the data. Data
use in the physical data enclave is monitored by ICPSR staff.
Source: ICPSR Web site on confidentiality
14. Apply for Restricted Data Access
• Application for Restricted-Use Data. This includes information about the
Investigator and the research project that requires access to the restricted-use
data. The application may require the current CVs of all researchers who will be
working on the project.
• Confidential Data Security Plan. The fundamental goals of this plan are to
ensure that the restricted-use data are securely stored at the institution and
accessible only to the people listed in the request. Some data security plans
pose questions that ask the Investigator to describe in detail how this
responsibility will be met; the questions and answers together make up the
confidential data security plan. Other plans present fixed terms to which the
requester must agree.
• Restricted Data Use Agreement. This is a legal agreement between the
University of Michigan and the Investigator's institution specifying the terms of
the use of the restricted-use data.
• Supplemental Agreement with Research Staff Form. This identifies every
person other than the Investigator who will have access to the restricted-use
data. New research staff added in the course of a project must be added to and
sign this form before they can access the data.
• Pledge of Confidentiality. The Investigator and all research staff must sign this
pledge before they can access the data.
Source: ICPSR web site on restricted data access
15. Information needed to apply
• Name, department, and title of the Investigator
• Description of the proposed research that supports
the need to access restricted-use data
• Information on data formats needed and data-
storage technology
• Approval or exemption for the research project from
the Institutional Review Board of the Investigator's
organization (for some restricted-use data)
• A Restricted Data Use Agreement signed by the
Investigator and a legal representative from the
Investigator's institution
• Other required information as specified in the
Restricted Data Use Agreement
Source: ICPSR web site on restricted data access
16. Add Health example
http://www.cpc.unc.edu/projects/addhealth/data/restricteduse/security
17. National Center for Health Statistics - example
http://www.cdc.gov/rdc/
18. General Social Survey - example
http://publicdata.norc.org:41000/gss/documents//OTHR/ObtainingGSSSensitiveDataFiles.pdf
19. Issues affecting access
• Changes to the “Common Rule” – aka “The Federal Policy for the
Protection of Human Subjects” adopted by a number of federal
agencies in 1991.
• Other uses of private data – Biosense
http://www.cdc.gov/biosense/index.html
• Legislation – “Senators Intend to Amend Federal Student Privacy Law”
• Non-academic uses of personal data -- Top U.S. Retailers to Share
Data in Fight on Cybercrime
20. How can the Data Archive help?
• Review data and documentation to be
deposited
• Advise on:
– Data protection plan
– Data management plan
– Applying for restricted data access
– Identifying appropriate UCLA administrators to
handle agreements
21. Contact Us
• Social Sciences Data Archive
• 1120-H Rolfe Hall
• libbie@ucla.edu
• 310-825-0716