Managing Confidential Information in Research


Workshop Presented at IQSS

Published in: Technology, Education
  • This work, Managing Confidential Information in Research, by Micah Altman, is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • Institutions may give "limited assurance" of Common Rule compliance, covering only funded research. Most give "general assurance".
  • Managing Confidential Information in Research

    1. Managing Confidential Information in Research
       Micah Altman, Senior Research Scientist, Institute for Quantitative Social Science, Harvard University
    2. Personally identifiable private information is surprisingly common
        Includes information from a variety of sources, such as: research data, even if you aren't the original collector; student "records" such as e-mail and grades; logs from web servers and other systems
        Lots of things are potentially identifying:
          Under some federal laws: IP addresses, dates, zip codes, …
          Birth date + zip code + gender uniquely identify ~87% of people in the U.S. [Sweeney 2002]
          With date and place of birth, the first five digits of a social security number (SSN) can be guessed > 60% of the time (and the whole number in under 10 tries, for a significant minority of people) [Acquisti & Gross 2009]
          Analysis of writing style or eclectic tastes has been used to identify individuals
        Tables, graphs, and maps can also reveal identifiable information [Brownstein et al. 2006, NEJM 355(16)]
        [Micah Altman, 3/10/2011]
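The Sweeney-style claim above can be checked mechanically on any dataset: count how many records are unique on the combination of quasi-identifiers. A minimal sketch in Python; the records and the percentage it prints are an invented toy sample, not figures from the slide's sources:

```python
from collections import Counter

# Hypothetical records: (birth_date, zipcode, gender) quasi-identifier tuples.
records = [
    ("1961-01-01", "02145", "M"),
    ("1961-02-02", "02138", "M"),
    ("1972-11-11", "94043", "M"),
    ("1972-11-11", "94043", "M"),  # shares its quasi-identifiers with the row above
    ("1972-03-25", "94041", "F"),
]

counts = Counter(records)
unique = [r for r in records if counts[r] == 1]
fraction_unique = len(unique) / len(records)
print(f"{fraction_unique:.0%} of records are unique on (birthdate, zip, gender)")
```

Any record that is unique on these three columns can potentially be re-identified by linking to an external dataset (a voter file, a public profile) that carries the same columns plus a name.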
    3. IQSS (and affiliates) offer you support across all stages of your quantitative research:
        Research design, including design of surveys and selection of statistical methods
        Primary and secondary data collection, including the collection of geospatial and survey data
        Data management, including storage, cataloging, permanent archiving, and distribution
        Data analysis, including statistical consulting, GIS consulting, and high-performance research computing
    4. The IQSS grants administration team helps with every aspect of the grant process. Contact us when you are planning your proposal.
        Assisting in identifying research funding opportunities
        Consulting on writing proposals
        Assisting IQSS affiliates with:
          preparation, review, and submission of all grant applications ("pre-award support")
          management of their sponsored research portfolio ("post-award support")
          interpreting sponsor policies
          coordinating with FAS Research Administration and the Central Office for Sponsored Programs
        And, of course, supporting seminars like this!
    5. Goals for course
        Overview of key areas
        Identify key concepts & issues
        Summarize Harvard policies, procedures, resources
        Establish framework for action
        Provide connection to resources, literature
    6. Outline
        [Preliminaries]
        Law, policy, ethics
        Research methods, design, management
        Information security (storage, transmission, use)
        Disclosure limitation
        [Additional resources & summary of recommendations]
    7. Steps to Manage Confidential Research Data
        Identify potentially sensitive information in planning
          Identify legal requirements, institutional requirements, data use agreements
          Consider obtaining a certificate of confidentiality
          Plan for IRB review
        Reduce sensitivity of collected data in design
        Separate sensitive information in collection
        Encrypt sensitive information in transit
        Desensitize information in processing
          Removing names and other direct identifiers
          Suppressing, aggregating, or perturbing indirect identifiers
        Protect sensitive information in systems
          Use systems that are controlled, securely configured, and audited
          Ensure people are authenticated, authorized, licensed
        Review sensitive information before dissemination
          Review disclosure risk
          Apply non-statistical disclosure limitation
          Apply statistical disclosure limitation
          Review past releases and publicly available data
          Check for changes in the law
          Require a use agreement
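The "separate sensitive information in collection" and "remove direct identifiers" steps are often implemented by pseudonymizing at intake: replace each name with a random code, and keep the code-to-name linkage in a separate file under stronger protection. A hedged sketch; the respondent names, field layout, and code length are invented for illustration:

```python
import secrets

# Hypothetical respondent records; "name" is a direct identifier.
responses = [
    {"name": "A. Jones", "answer": 3},
    {"name": "B. Smith", "answer": 5},
]

linkage_key = {}   # to be stored separately, under stronger protection
deidentified = []
for rec in responses:
    code = secrets.token_hex(4)        # random pseudonym, not derived from the name
    linkage_key[code] = rec["name"]
    deidentified.append({"id": code, "answer": rec["answer"]})

# The analysis file carries only pseudonyms and responses.
print(deidentified)
```

The pseudonym must be random rather than a hash of the name: a hash of a small identifier space (names, SSNs) can be reversed by brute force.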
    8. Law, Policy & Ethics
        Ethical obligations
        Laws
        Fun and games 
        Harvard policies
        [Summary]
    9. Confidentiality & Research Ethics: the Belmont Principles
        Respect for persons
          individuals should be treated as autonomous agents
          persons with diminished autonomy are entitled to protection
          implies "informed consent"
          implies respect for confidentiality and privacy
        Beneficence
          research must have individual and/or societal benefit to justify risks
          implies minimizing the risk/benefit ratio
    10. Scientific & Societal Benefits of Data Sharing
        Increases replicability of research
          Journal publication policies may apply
        Increases scientific impact of research: follow-up studies, extensions, citations
        Public interest in data produced by public funders
          Funder policies may apply
        Public interest in data that supports public policy
          FOIA and state FOI laws may apply
        Open data facilitates transparent government, scientific collaboration, scientific verification, new forms of science, participation in science, hands-on education, and continuity of research
        Sources: Fienberg et al. 1985; ICSU 2004; Nature 2009
    11. Sources of Confidentiality Restrictions for University Research Data
        Overlapping laws
        Different laws apply to different cases
        All affiliates are subject to university policy
        (Not included: EU directive, foreign laws, classified data, …)
    12. 45 CFR 46 [Overview]: "The Common Rule"
        Governs human subject research with federal funds / at federally funded institutions
        Establishes rules for conduct of research
        Establishes confidentiality and consent requirements for identified private data
        However, some information may be required to be disclosed under state and federal laws (e.g., in cases of child abuse)
        Delegates procedural decisions to Institutional Review Boards (IRBs)
    13. HIPAA [Overview]: Health Insurance Portability and Accountability Act
        Protects personal health care information held by "covered entities"
        Detailed technical protection requirements
        Provides the clearest legal standards for dissemination
        Provides a "safe harbor"
        Has become accepted practice for dissemination in other areas where laws are less clear
        HITECH Act of 2009 extends HIPAA
          Extends coverage to associated entities of covered entities
          Additional technical safeguards
          Adds a breach reporting requirement
        HIPAA provides three dissemination options …
    14. Dissemination under HIPAA [option 1]
        "Safe harbor": remove 18 identifiers
          [Personal identifiers] names; Social Security #s; personal account #s; certificate/license #s; full-face photos (and comparable images); biometric IDs; medical record #s; any other unique identifying number, characteristic, or code
          [Asset identifiers] fax #s; phone #s; vehicle #s; personal URLs; IP addresses; e-mail addresses; device IDs and serial numbers
          [Quasi identifiers] dates smaller than a year (and ages > 89 collapsed into one category); geographic subdivisions smaller than a state (except the first 3 digits of zip code, if the unit > 20,000 people)
        And: the entity does not have actual knowledge [direct and clear awareness] that it would be possible to use the remaining information, alone or in combination with other information, to identify the subject
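Two of the safe-harbor generalizations above (3-digit zip prefixes, dates no finer than a year, ages over 89 collapsed into one bin) can be expressed directly in code. This sketch is illustrative only: it covers just a few of the 18 identifier classes, the record shown is invented, and the 3-digit zip prefix is permitted only when the corresponding area holds more than 20,000 people:

```python
def generalize(record):
    """Apply a few HIPAA-style safe-harbor generalizations.
    Illustrative only; the rule lists 18 identifier classes, not just these."""
    out = dict(record)
    out.pop("name", None)                         # drop a direct identifier
    out["zip3"] = record["zip"][:3] + "**"        # keep only the 3-digit zip prefix
    del out["zip"]
    out["birth_year"] = record["birth_date"][:4]  # no dates finer than a year
    del out["birth_date"]
    if record["age"] > 89:                        # ages over 89 collapse to one bin
        out["age"] = "90+"
    return out

rec = {"name": "C. Jones", "zip": "02145", "birth_date": "1931-05-07", "age": 93}
print(generalize(rec))
```

Even after these steps, release still requires the "no actual knowledge" condition from the slide: generalization alone does not guarantee that the remaining columns cannot be linked back to a person.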
    15. Dissemination under HIPAA [option 2]
        "Limited dataset": leave some quasi-identifiers
          Remove personal and asset identifiers
          Permitted dates: dates of birth, death, service; years
          Permitted geographic subdivisions: town, city, state, zip code
        And: require access control and a data use agreement
    16. Dissemination under HIPAA [option 3]
        "Qualified statistician": statistical determination
        Have a qualified statistician determine, using generally accepted statistical and scientific principles and methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information
        Important caveats
          Methods and results of the analysis must be documented
          No bright line for "qualified"; the text of the rule is: "a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable" [Section 164.514(b)(1)]
          No clear definitions of "generally accepted", "very small", or "reasonably available information" … however, there are references in the Federal Register to statistical publications to be used as "starting points"
    17. FERPA: Family Educational Rights and Privacy Act
        Applies to schools that receive federal (D.O.E.) funding
        Restricts use of student (not employee) information
        Establishes:
          right to privacy of educational records
          right to inspect and correct records (with appeal to the federal government)
          definition of public "directory" information
          right to block access to public "directory" information, and to other records
        Educational records include identified information about a student, maintained by the institution; but not employee records, some medical and law-enforcement records, or records solely in the possession of and for use by the creator (e.g., unpublished instructor notes)
        Personally identifiable information includes:
          direct identifiers
          indirect (quasi) identifiers
          indirectly linkable identifiers
          "information requested by a person who the educational agency or institution reasonably believes knows the identity of the student to whom the education record relates"
    18. MA 201 CMR 17: Standards for the Protection of Personal Information
        Strongest U.S. general privacy protection law
        Has been delayed/modified repeatedly
        Requires reporting of breaches
          if data is not encrypted
          or if the encryption key is released in conjunction with the data
        Requires specific technical protections: firewalls; encryption of data transmitted in public; anti-virus software; software updates
    19. Inconsistencies in Requirements and Definitions
        Inconsistent definitions of "personally identifiable"
        Inconsistent definitions of sensitive information
        De-identification requirements do not always jibe with statistical realities
        FERPA - coverage: students in educational institutions; identification criteria: direct, indirect, linked, "bad intent" (!); sensitivity criteria: any non-directory information; management requirements: directory opt-out, [implied] good practice
        HIPAA - coverage: medical information in "covered entities"; identification criteria: direct, indirect, linked; sensitivity criteria: any medical information; management requirements: consent, specific technical safeguards, breach reporting
        Common Rule - coverage: living persons in research by funded institutions; identification criteria: direct, indirect, linked; sensitivity criteria: private information, based on harm; management requirements: consent, [implied] risk minimization
        MA 201 CMR 17 - coverage: Mass. residents; identification criteria: direct; sensitivity criteria: financial, state, federal identifiers; management requirements: specific technical safeguards, breach reporting
    20. Third Party Requirements
        Licensing requirements
        Intellectual property requirements
        Federal/state law and/or policy requirements
          State protection of personal information laws
          Freedom of information laws (FOIA & state FOI)
          State mandatory abuse/neglect notification laws
        And … think ahead to publisher requirements: replication requirements, IP requirements
        Examples
          NSF requires that data from funded research be shared
          NIH requires a data sharing plan for large projects
          Wellcome Trust requires a data sharing plan
          Many leading journals require data sharing
    21. (Some) More Laws & Standards
        California laws
          Lots of rules; apply to any data about California residents
          Privacy policy, disclosure, and reporting requirements
        EU Directive 95/46/EC (data protection directive)
          Provides for notice, limits on purpose of use, consent, security, disclosure, access, accountability
          Forbids transfer of data to entities in countries not compliant with the directive
          The U.S. is not compliant, but organizations can certify compliance with the FTC; no auditing/enforcement (!); substantial criticism of this arrangement
        Payment Card Industry (PCI) Security Standards
          Governs treatment of credit card numbers
          Requires reports, audits, fines; detailed technical measures; large penalties
          Not a law, but helps define good practice; Nevada law mandates PCI standards
        Sarbanes-Oxley (aka SOX, aka SARBOX): Corporate and Auditing Accountability and Responsibility Act of 2002
          Applies to U.S. public company boards, management, and public accounting firms; rarely applies to research in universities
          Detailed technical controls over information systems
          Section 404 requires annual assessment of organizational internal controls, but does not specify details of controls
        Classified data
          Separate and complex rules and requirements; the University does not accept classified data
          But it may have "Controlled But Unclassified" data: a vaguely defined, mostly government-produced area; penalties unclear
        And … export-controlled information, under ITAR and EAR
          Export controls may include technologies, software, and documentation/design documents
        … and over 1100 international human subjects laws …
        FISMA: Federal Information Security Management Act (FISMA), Public Law (P.L.) 107-347
          Starting to be applied to NIH-sponsored research
    22. Predicted Legal Changes for 2011 …
        Caselaw
          "Personal privacy" does not apply to information about corporations (a corporation is not a "person" for this purpose): FCC v. AT&T (2011)
        Scheduled
          EU "cookie privacy" directive 2009/136/EC goes into effect
          Proposed updates to EU information privacy directives
        Very likely
          New information privacy laws in selected states in 2011
        Likely
          Increased federal regulation of internet privacy
    23. What's wrong with this picture?
        Name      SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
        A. Jones  12341  01011961   02145    M       Raspberry           0
        B. Jones  12342  02021961   02138    M       Pistachio           0
        C. Jones  12343  11111972   94043    M       Chocolate           0
        D. Jones  12344  12121972   94043    M       Hazelnut            0
        E. Jones  12345  03251972   94041    F       Lemon               0
        F. Jones  12346  03251972   02127    F       Lemon               1
        G. Jones  12347  08081989   02138    F       Peach               1
        H. Smith  12348  01011973   63200    F       Lime                2
        I. Smith  12349  02021973   63300    M       Mango               4
        J. Smith  12350  02021973   63400    M       Coconut             16
        K. Smith  12351  03031974   64500    M       Frog                32
        L. Smith  12352  04041974   64600    M       Vanilla             64
        M. Smith  12353  04041974   64700    F       Pumpkin             128
        N. Smith  12354  04041974   64800    F       Allergic            256
    24. What's wrong with this picture?
        Repeats the table from the previous slide, with the columns labeled by role: Name and SSN are (sensitive) direct identifiers; Birthdate, Zipcode, and Gender are private (quasi) identifiers; # of crimes committed is sensitive private information
        Callouts on the slide: "Mass resident" (02xxx zip codes), "Californian" (94xxx zip codes), "Twins, separated at birth?", "FERPA too?", "Unexpected response?"
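The slide's point can be quantified as k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. Running the check on the table's own (Birthdate, Zipcode, Gender) columns shows that every row is unique on them (k = 1), so dropping Name and SSN alone would not de-identify the table:

```python
from collections import Counter

# Quasi-identifiers from the slide's example table: (birthdate, zipcode, gender).
rows = [
    ("01011961", "02145", "M"), ("02021961", "02138", "M"),
    ("11111972", "94043", "M"), ("12121972", "94043", "M"),
    ("03251972", "94041", "F"), ("03251972", "02127", "F"),
    ("08081989", "02138", "F"), ("01011973", "63200", "F"),
    ("02021973", "63300", "M"), ("02021973", "63400", "M"),
    ("03031974", "64500", "M"), ("04041974", "64600", "M"),
    ("04041974", "64700", "F"), ("04041974", "64800", "F"),
]

k = min(Counter(rows).values())   # k-anonymity of the table on these columns
print(k)                          # every row is unique, so k == 1
```

Raising k requires the suppression, aggregation, or perturbation steps listed earlier (e.g., truncating zip codes or collapsing birth dates to years).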
    25. Harvard: Enterprise Security Policy (HEISP)
        Storing High Risk Confidential Information (HRCI)
          Must not be stored on an individual user computer or portable storage device
          Must be stored on "target computers" or in secure locked containers
        Confidential information on Harvard computing devices
          Confidential information must be protected
          Confidential information on portable devices must be encrypted; laptops must have encryption (some schools require whole-disk encryption)
          Systems must be scanned annually
          Cannot save confidential information on computers directly accessible from the internet or open Harvard networks
        Human subject information
          All research on human subjects must be approved by the IRB
          All proposals must include a data management plan
        Personally identifiable medical information (PIMI)
          "Covered entities" at Harvard are subject to HIPAA requirements directly
          PIMI is to be treated as HRCI throughout the university
        Obtaining confidential information requires approval
          Employees who have access must annually agree to confidentiality agreements
          Access to lists and databases of Harvard University ID numbers is restricted
        All confidential information must be encrypted when transported across any network
        Public directories must adhere to privacy preferences established by individuals
          Each school must provide training
          Registrars have developed a common definition of FERPA directory information
          Must adhere to student requests to block their directory information, per FERPA
        Identifying users with access to confidential information
          System owners must be able to identify users that have access to confidential information
          Strong passwords; no account/password sharing
          Inhibit password guessing with logging and lockouts
          Limit application availability with timeouts
          Limit user access to confidential information based on business need
        Accepting payment cards: restricted to procedures outlined in the HU Credit Card Merchant Handbook
        [More on next page …]
    26. HEISP, Part 2
        Physical environment
          All digital/non-digital media must be properly protected
          Computers must be physically secure
          Automatic logging must be consistent with written policies
        Vendor contracts
          Require approval by the security officer
          Include OGC contract rider
        Computer operator
          Computer must be regularly updated and operated securely
          Only necessary applications installed
          Annually certify compliance with university policies
        Computer setup
          Must filter malicious traffic
          Web-based surveys must have protections in place
        "Target" systems and controllers
          Private address space; locally firewalled
          Annual vulnerability scanning
        Network take-down
          Network managers run vulnerability scans
          May take computers off the network
        Service resumption
          Must have a service resumption plan if loss of confidential data is a substantial business risk
        Incident response policy
          Disposition and destruction of records
          Acquisition/use by unauthorized persons must be reported to OGC
        Interacting with legal authorities: always refer to OGC, unless an imminent health/safety risk requires otherwise
    27. Harvard: Research Data Security Policy (HRDSP)
        Sensitivity of research data is based on potential harm if disclosed:
          Level 5 = "extremely sensitive"
          Level 4 = "very sensitive" ~= HRCI
          Level 3 = "sensitive" ~= HCI
          Level 2 = "benign" ~= good computer hygiene
          Level 1 = anonymous and not business confidential
        Required protections based on sensitivity
          Level 5: entirely disconnected from network ("bubble security")
          Level 4: protections as per HRCI
          Level 3: protections as per HCI
          Level 2: good computer hygiene
        Designates procedures for treatment of external data use agreements [next section]
          Legally binding
          Can be both very detailed and not supported by Harvard security procedures
          Investigators should not sign these; forward them to OSP
        Designates responsibilities for IRB, investigator, OSP, IT, security officers
    28. Harvard: Researcher Responsibilities
        … for knowing the rules
        … for identifying potentially confidential information in all forms (digital/analogue; on-line/off-line)
        … for notifying recipients of their responsibility to protect confidentiality
        … for obtaining IRB approval for any human subjects research
        … for following an IRB-approved plan
        … for obtaining OSP approval of restricted data use agreements with providers, even if no money is involved
        … and for proper storage, access, transmission, disposal
        Confidentiality is not an "IT problem"
    29. Harvard: Staff Personnel Manual
        Protect Harvard information and systems
        Keep your own information in PeopleSoft up to date
        Comply with copyrights and the DMCA
        Comply with Harvard systems policies and procedures
        All information produced at work is Harvard property
        Attach only approved devices to the Harvard network
        [Staff_Personnel_Manual/Section2/Privacy.shtml]
    30. Key Concepts & Issues Review
        Privacy: control over the extent and circumstances of sharing
        Confidentiality: treatment of private, sensitive information
        Sensitive information: information that would cause harm if disclosed and linked to an individual
        Personally/individually identifiable information: private information directly or indirectly linked to an identifiable individual
        Human subject: a living person who is interacted with to obtain research data, or whose private identifiable information is included in research data
        Research: systematic investigation designed to develop or contribute to generalizable knowledge
        "Common Rule": law governing funded human subjects research
        HIPAA: law governing use of personal health information in covered and associated entities
        MA 201 CMR 17: law governing use of certain personal identifiers for Massachusetts residents
    31. Checklist: Identify Requirements
        Check if research includes …
          Interaction with humans  Common Rule & HEISP/HRDSP apply
        Check if data used includes identified …
          Student records  FERPA & HEISP/HRDSP apply
          State, federal, financial IDs  state law & HEISP/HRDSP apply
          Medical/health information  HIPAA (likely) & HEISP/HRDSP apply
          Human subjects & private info  Common Rule & HEISP/HRDSP apply
        Check for other requirements/restrictions on data dissemination:
          Data provider restrictions and university approvals thereof
          Open data requirements and norms
          University information policy
    32. Resources
        E.A. Bankert & R.J. Amdur, 2006, Institutional Review Board: Management and Function, Jones and Bartlett Publishers
        P. Ohm, "Broken Promises of Privacy", SSRN working paper
        D.J. Mazur, 2007, Evaluating the Science and Ethics of Research on Humans, Johns Hopkins University Press
        IRB: Ethics & Human Research [journal], Hastings Press
        Journal of Empirical Research on Human Research Ethics, University of California Press
        201 CMR 17 text; FERPA website; HIPAA website; Common Rule website; state laws
        Harvard Enterprise Information Security Policy / Research Data Security Policy
        Harvard Institutional Review Board
        Harvard FAS policies and procedures
        IQSS policies and procedures
    33. Research Design, Methods, Management
        Reducing risk
          Sensitivity of information
          Partitioning
          Decreasing identification
        Managing confidentiality and dissemination
        [Summary]
    34. Trade-offs
        Anonymity vs. research utility
        Sensitivity vs. research utility
        (Anonymity * Sensitivity) vs. research costs/efforts
    35. Types of Sensitive Information
        Information is sensitive if, once disclosed, there is a "significant" likelihood of harm
        The IRB literature suggests possible categories of harm:
          loss of insurability
          loss of employability
          criminal liability
          psychological harm
          social harm to a vulnerable group
          reputational harm
          emotional harm
          dignitary harm
          physical harm: risk of disease, injury, or death
    36. Levels of Sensitivity
        No widely accepted scale
        Publicly available data is not sensitive under the Common Rule
        The Common Rule anchors the scale at "minimal risk": "if disclosed, the probability and magnitude of harm or discomfort anticipated are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests"
        Harvard Research Data Security Policy
          Level 5: extremely sensitive information about individually identifiable people. Information that if exposed poses significant risk of serious harm. Includes information posing serious risk of criminal liability, serious psychological harm or other significant injury, loss of insurability or employability, or significant social harm to an individual or group.
          Level 4: very sensitive information about individually identifiable people. Information that if exposed poses a non-minimal risk of moderate harm. Includes civil liability, moderate psychological harm, or material social harm to individuals or groups; medical records not classified as Level 5; sensitive-but-unclassified national security information; and financial identifiers (as per HRCI standards).
          Level 3: sensitive information about individually identifiable people. Information that if disclosed poses a significant risk of minor harm. Includes information that would reasonably be expected to damage reputation or cause embarrassment, and FERPA records.
          Level 2: benign information about individually identifiable people. Information that would not be considered harmful, but as to which a subject has been promised confidentiality.
          Level 1: de-identified information about people, and information not about people
    37. IRB Review Scope
        IRB approval is needed for:
          all federally funded research
          any research involving "human subjects" at (almost all) institutions receiving federal funding (any organization operating under a general "federal-wide assurance")
          all human subjects research at Harvard
        Human subject: an individual about whom an investigator (whether professional or student) conducting research obtains
          (1) data through intervention or interaction with a living individual, or
          (2) identifiable private information about living individuals
    38. Research Not Requiring IRB Approval
        Non-research: not generalizable knowledge & systematic inquiry
        Non-funded: institution receives no federal funds for research
        Not human subjects:
          no living people described
          observation only AND no private identifiable information is obtained
        Human subjects, but "exempt" under 45 CFR 46:
          use of existing, publicly available data
          use of existing non-public data, if individuals cannot be directly or indirectly identified
          research conducted in educational settings, involving normal educational practices
          taste & food quality evaluation
          federal program evaluation approved by agency head
          observational, survey, test & interview research on public officials and candidates (in their formal capacity, or not identified)
        Caution: not all "exempt" research is exempt …
          some research on prisoners and children is not exemptable
          some universities require review of "exempt" research
          Harvard requires review of all human subject research
        See: decisioncharts.htm
    39. IRBs and Confidential Information
        IRBs review consent procedures and documentation
        IRBs may review data management plans
          may require procedures to minimize risk of disclosure
          may require procedures to minimize harm resulting from disclosure
        IRBs make the determination of sensitivity of information, i.e., the potential harm resulting from disclosure
        IRBs make the determination regarding whether data is de-identified for "public use" [see NHRPAC, "Recommendations on Public Use Data Files"]
    40. Harvard IRB Approval
        The Harvard Institutional Review Board (IRB) must approve all human subjects research at Harvard prior to data collection or use
        Research involves human subjects if:
          there is any interaction or intervention with living humans; or
          identifiable private data about living humans is used
        Some examples of human subject research in social science: surveys; behavioral experiments; educational tests and evaluations; analysis of identified private data collected from people (your e-mail inbox, logs of web-browsing activity, Facebook activity, eBay bids …)
        The IRB will:
          assess the research protocol
          identify whether research is exempt from further review and management
          identify the sensitivity level of the data
    41. Harvard Responsibilities
    42. 42. HRDSP Responsibilities
    - Researchers are responsible for disclosing to the IRB, and for following the IRB-approved plan
    - The IRB is responsible for ensuring the adequacy of investigators' plans, and for granting (lawful) variances from security requirements justified by research needs
    - IT is responsible for assisting with the identification of the security level, and for assisting in the implementation of security protections
    - The Security Officer/CIO may review IT facilities and approve (give written designation) that they meet the protections for a given level
    43. 43. Valuation of Private Information Is Uncertain
    - Privacy valuations are often inconsistent
      - Framing effects: ordering, endowment effect, possibly others
      - Non-normal/non-uniform distribution of valuations
      - One study: < 10% of subjects would give up $2 of a $12 gift card to buy anonymity of purchases [Acquisti & Loewenstein 2009]
    - The cost-benefit of information security may not be optimal for users [Herley 2009]
      - E.g., the loss from all phishing attacks is 100x less than the time spent avoiding them
      - Note, however, weaknesses in this analysis:
        - Only loss of time is modeled -- no valuation of privacy is made
        - Institutional costs are not included -- only personal costs
        - Very simplified model -- not calibrated through surveys, etc.
    - Repeated surveys of students show they tend to disclose a lot, e.g.:
      - > 80% of students sampled in several studies had public Facebook pages with birthdays, home town, and other private information
      - This information can easily be used to link to other databases!
      - Disclosure of extensive information on sexual orientation, private cell numbers, drinking habits, etc. is not uncommon [see Kolek & Saunders 2008]
    - Emerging markets for privacy?
      - Micropayments for disclosures
    44. 44. Reducing Risk in Data Collection
    - Avoid collecting sensitive information unless it is required by the research design, method, or hypothesis
      - Unnecessary sensitive information → not minimal risk
      - Reducing sensitivity → higher participation, greater honesty
    - Collect sensitive information in private settings
      - Reduces risk of disclosure
      - Increases participation
    - Reduce sensitivity through indirect measures
      - Less sensitive proxies, e.g. the Implicit Association Test [Greenwald et al. 1998]
      - Unfolding brackets
      - Group response collection
      - Randomized response technique [Warner 1965]
      - Item count/unmatched count/list experiment technique
    45. 45. Managing Sensitive Data Collection
    - Separate: sensitive measures, (quasi-)identifiers, and other measures
    - If possible, avoid storing identifiers with measures:
      - Collect identifying information beforehand
      - Assign opaque subject identifiers
    - For sensitive data:
      - Collect online directly (with appropriate protections); or
      - Encrypt collection devices/media (laptops, USB keys, etc.)
    - For very/extremely sensitive data:
      - Collect with oversight directly; then
      - Store on an encrypted device; and
      - Transfer to a secure server as soon as feasible
    46. 46. Randomized Response Technique
    [Diagram: the subject rolls a die. If the roll is > 2, the subject answers the sensitive question truthfully; otherwise the subject simply says "YES". Only the response is recorded.]
    - Variations:
      - Ask two different questions
      - Item counts with sensitive and non-sensitive items -- eliminates subject randomization
      - Regression analysis methods
    47. 47. Our Table
                                                   Less (?) Sensitive  ---------------->
    Name      SSN    Birthdate  Zipcode  Gender    Favorite Ice Cream  Treat?  # Acts*
    A. Jones  12341  01011961   02145    M         Raspberry           0       0
    B. Jones  12342  02021961   02138    M         Pistachio           1       20
    C. Jones  12343  11111972   94043    M         Chocolate           0       0
    D. Jones  12344  12121972   94043    M         Hazelnut            1       12
    E. Jones  12345  03251972   94041    F         Lemon               0       0
    F. Jones  12346  03251972   02127    F         Lemon               1       7
    G. Jones  12347  08081989   02138    F         Peach               0       1
    H. Smith  12348  01011973   63200    F         Lime                1       17
    I. Smith  12349  02021973   63300    M         Mango               0       4
    J. Smith  12350  02021973   63400    M         Coconut             1       18
    K. Smith  12351  03031974   64500    M         Frog                0       32
    L. Smith  12352  04041974   64600    M         Vanilla             1       65
    M. Smith  12353  04041974   64700    F         Pumpkin             0       128
    N. Smith  12354  04041974   64800    F         Allergic            1       256
    * Acts = crimes if treatment = 0; crimes + acts of generosity if treatment = 1
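Before releasing a table like the one above, it is worth checking how identifying the quasi-identifier columns (birthdate, zipcode, gender) are on their own. A minimal sketch of this check, using illustrative values rather than the actual table, computes the k-anonymity of the quasi-identifiers: if any combination appears only once, that record is unique and potentially re-identifiable by linkage.

```python
from collections import Counter

# Illustrative (birthdate, zipcode, gender) quasi-identifier tuples,
# loosely modeled on the example table above.
rows = [
    ("01011961", "02145", "M"),
    ("02021961", "02138", "M"),
    ("11111972", "94043", "M"),
    ("12121972", "94043", "M"),
    ("03251972", "94041", "F"),
    ("03251972", "02127", "F"),
]

# k-anonymity: every quasi-identifier combination must appear at
# least k times in the data; k == 1 means some record is unique.
counts = Counter(rows)
k = min(counts.values())
unique = [qi for qi, n in counts.items() if n == 1]
print(f"k = {k}; {len(unique)} of {len(rows)} rows are unique")
```

Here every row is unique (k = 1), which illustrates why birthdate + zipcode + gender is so identifying in practice.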
    48. 48. Randomized Response -- Pros and Cons
    - Pros
      - Can substantially reduce risks of disclosure
      - Can increase response rate
      - Can decrease mis-reporting
    - Warning!
      - None of the randomized models uses a formal measure of disclosure limitation
      - Some would clearly violate measures (such as differential privacy) we'll see in section 4
      - Do not use as a replacement for disclosure limitation
    - Other issues
      - Loss of statistical efficiency (if compliance would otherwise be the same)
      - Complicates data analysis, especially model-based analysis
      - Leaving randomization up to the subject can be unreliable
      - May provide less confidentiality protection if:
        - Randomization is incomplete
        - Records of randomization assignment are kept
        - Lists of responses overlap across questions
        - The sensitive question response is large enough to dominate the overall response
        - Non-sensitive question responses are extremely predictable, or publicly observable
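The die-rolling design on slide 46 can be analyzed with a simple moment estimator. A sketch, assuming the forced-response variant where a roll of 1 or 2 (probability 1/3) forces a "yes" and any other roll elicits a truthful answer: the observed "yes" rate is p_forced + (1 - p_forced) x pi, which can be inverted for the true prevalence pi.

```python
def rrt_prevalence(yes_count: int, n: int, p_forced_yes: float = 1/3) -> float:
    """Estimate true prevalence pi under a forced-response design:
    with probability p_forced_yes the subject says "yes" regardless
    of the truth; otherwise the subject answers truthfully.

    Observed P(yes) = p_forced_yes + (1 - p_forced_yes) * pi,
    so pi = (P(yes) - p_forced_yes) / (1 - p_forced_yes).
    """
    p_yes = yes_count / n
    pi = (p_yes - p_forced_yes) / (1 - p_forced_yes)
    return max(0.0, min(1.0, pi))  # clamp out sampling noise

# E.g., 450 "yes" responses out of 900: observed rate 0.5,
# implied true prevalence 0.25.
est = rrt_prevalence(450, 900)
print(round(est, 3))
```

This also makes the efficiency loss noted above concrete: the estimator's variance is inflated relative to a direct question, since only about two-thirds of responses carry information about pi.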
    49. 49. Partitioning Information
    - Reduces risk in information management
    - Partition information based on sensitivity:
      - Identifying information
      - Descriptive information
      - Sensitive information
      - Other information
    - Segregate:
      - Storage of information
      - Access regimes
      - Data collection channels
      - Data transmission channels
    - Plan to segregate as early as feasible in data collection and processing
    - Link segregated information with artificial keys
    50. 50. Partitioned Table
    Identified:
    Name      SSN    Birthdate  Zipcode  Gender  LINK
    A. Jones  12341  01011961   02145    M       1401
    B. Jones  12342  02021961   02138    M       283
    C. Jones  12343  11111972   94043    M       8979
    D. Jones  12344  12121972   94043    M       7023
    E. Jones  12345  03251972   94041    F       1498
    F. Jones  12346  03251972   02127    F       1036
    G. Jones  12347  08081989   02138    F       3864
    H. Smith  12348  01011973   63200    F       2124
    I. Smith  12349  02021973   63300    M       4339
    Not identified:
    LINK  Favorite Ice Cream  Treat  # Acts
    1401  Raspberry           0      0
    283   Pistachio           1      20
    8979  Chocolate           0      0
    7023  Hazelnut            1      12
    1498  Lemon               0      0
    1036  Lemon               1      7
    3864  Peach               0      1
    2124  Lime                1      17
    4339  Mango               0      4
    6629  Coconut             1      18
    9091  Frog                0      32
    9918  Vanilla             1      65
    4749  Pumpkin             0      128
    8197  Allergic            1      256
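The split shown on slide 50 is mechanical enough to sketch in code. The following is an illustrative example (field names are invented, not from the slides) that partitions records into an identified table and a de-identified measures table, joined only by an artificial random link key generated with a cryptographically strong source:

```python
import secrets

def partition(records):
    """Split records into an identified table and a de-identified
    measures table, linked only by an artificial random key.
    The identified table (with its LINK column) must be stored
    and access-controlled separately from the measures table."""
    identified, measures = [], []
    for r in records:
        link = secrets.token_hex(8)  # CSPRNG-backed artificial key
        identified.append({"name": r["name"], "ssn": r["ssn"], "link": link})
        measures.append({"link": link, "sensitive": r["sensitive"]})
    return identified, measures

ids, data = partition([{"name": "A. Jones", "ssn": "12341", "sensitive": 0}])
# The measures table carries no direct identifiers, only the link key:
print(sorted(data[0].keys()))
```

Note that `secrets` is used rather than `random`: as the next slide explains, ordinary pseudo-random generators produce predictable sequences and are not sufficient for linking keys.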
    51. 51. Choosing Linking Keys
    - Entirely randomized keys
      - Most resistant to relinking
      - The mapping from original IDs to random keys is highly sensitive
      - Must keep, and be able to access, the mapping in order to add new identified data
      - Most computer-generated random numbers are not sufficient by themselves -- most are PSEUDO-random: predictable sequences
      - Use a cryptographically secure PRNG (Blum Blum Shub, or AES or another block cipher in counter mode); OR
      - Use real random numbers (e.g., from physical sources); OR
      - Use a PRNG with a real random seed to randomize the order of the table, then another to generate the IDs for the randomly ordered table
    - Encryption
      - More troublesome to compute
      - Same IDs + same key + same "salt" produce the same values → facilitates merging
      - IDs can be recovered if the key is exposed or cracked, or the algorithm is weak
    - Cryptographic hash, e.g. SHA-256
      - Security is well understood; tools are available to compute it
      - Same IDs produce the same hashes → easier to merge new identified data
      - IDs cannot be recovered from the hash, because the hash loses information
      - IDs can be confirmed if identifying information is known or guessable
    - Cryptographic hash + secret key
      - Security is well understood; tools are available to compute it
      - Same IDs produce the same hashes → easier to merge new identified data
      - IDs cannot be recovered from the hash, because the hash loses information
      - IDs cannot be confirmed unless the key is also known
    - Do not choose arbitrary mathematical functions of other identifiers!
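The last option above (hash + secret key) is typically implemented as an HMAC. A minimal sketch using the Python standard library: the same identifier always maps to the same pseudonym (so new identified data can be merged), but without the key the pseudonym can be neither reversed nor confirmed by guessing identifiers.

```python
import hashlib
import hmac
import secrets

# A secret key, generated once and stored separately from the data
# (e.g., with the identified table, under stricter access controls).
key = secrets.token_bytes(32)

def pseudonym(identifier: str) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier. Deterministic for
    a fixed key, so records can be merged; infeasible to reverse or
    to confirm by brute-forcing identifiers without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

p1 = pseudonym("123-45-6789")
p2 = pseudonym("123-45-6789")
assert p1 == p2      # stable mapping supports merging new data
assert len(p1) == 64  # 256-bit digest, hex-encoded
```

By contrast, an unkeyed `hashlib.sha256` over an SSN would fall into the "cryptographic hash" case above: with only ~10^9 possible SSNs, an attacker can confirm guesses by exhaustive hashing.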
    52. 52. Anonymous Data Collection: Pros & Cons
    - Pros
      - Presumption that data is not identifiable
      - May increase participation
      - May increase honesty
    - Cons
      - Barrier to follow-up and longitudinal studies
      - Can conflict with quality control and validation
      - Data may still be indirectly identifiable if respondent descriptive information is collected
      - Linking data to other sources of information may have large research benefits
    53. 53. Anonymous Data Collection Methods
    - A trusted third party intermediates
    - The respondent initiates re-contacts
    - No identifying information is recorded
    - Use IDs randomized to subjects; destroy the mapping
    54. 54. Remote Data Collection Challenges
    - Where a network connection is readily available, it is easy to transfer data as collected, or to enter it on a remote system
      - Encrypted network file transfer (e.g. SFTP, part of SSH)
      - Encrypted/tunneled network file system (e.g. ExpanDrive)
    - Where the network connection is less reliable but bandwidth is high:
      - Whole-disk-encrypted laptop
      - Plus encrypted cloud backup solutions: CrashPlan, BackBlaze, SpiderOak
    - Small data, short term:
      - Encrypted USB keys (e.g. with IronKey, TrueCrypt, PGP)
    - Foreign travel:
      - Be aware of U.S. EAR export restrictions; use commercial or widely available open encryption software only. Do not use bespoke software.
      - Be aware of country import restrictions (as of 2008): Burma, Belarus, China, Hungary, Iran, Israel, Morocco, Russia, Saudi Arabia, Tunisia, Ukraine
      - Encrypt data if possible, but don't break foreign laws. Check with the Department of State.
    55. 55. Online/Electronic Data Collection Challenges
    - IP addresses are identifiers
      - IP addresses can be logged automatically by the host, even if not intended by the researcher
      - IP addresses can trivially be observed as data is collected
      - Partial IP addresses can be used for probabilistic geographical identification at sub-zipcode levels
    - Cookies may be identifiers
      - Cookies provide a way to link data more easily
      - May or may not explicitly identify the subject
    - Jurisdiction
      - Data collected from subjects in other states/countries could subject you to the laws of those jurisdictions
      - Jurisdiction may depend on the residency of the subject, the availability of the data collection instrument in the jurisdiction, or explicit data collection efforts within the jurisdiction
    - Vendors
      - A vendor could retain IP addresses, identifying cookies, etc., even if not intended by the researcher
    - Recommendations
      - Use only vendors that certify compliance with your confidentiality policy
      - Do not retain IP numbers if data is being collected anonymously
      - Use SSL/TLS encryption unless data is non-sensitive and anonymous
      - Tools exist for anonymizing IP addresses in system/network logs
    - Harvard policy
      - Recommendations as above
      - Plus: do not use or display Level 4+ data in web surveys
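One common approach to the IP-address problem, if full addresses must not be retained, is to coarsen each address before it is logged. A sketch using the standard library `ipaddress` module (the prefix lengths are illustrative conventions, not a Harvard requirement): keep only the /24 network for IPv4, or the /48 for IPv6, and zero out the host bits.

```python
import ipaddress

def anonymize_ip(addr: str) -> str:
    """Coarsen an IP address before logging: keep the /24 network
    for IPv4 (or /48 for IPv6) and zero the host bits, so the
    stored value no longer pinpoints an individual host."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(net.network_address)

print(anonymize_ip("192.0.2.41"))  # prints "192.0.2.0"
```

Note that truncation reduces, but does not eliminate, identifiability: a /24 still supports the coarse geolocation mentioned above, so truly anonymous collection should avoid retaining IP-derived values at all.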
    56. 56. Certificates of Confidentiality
    - Issued by DHHS agencies such as NIH, CDC, and FDA
    - Protect against many types of forced legal disclosure of confidential information
    - May not protect against all state disclosure laws
    - Do not protect against voluntary disclosures by the researcher or research institutions
    57. 57. Confidentiality & Consent
    - Best practice is to describe in the consent form:
      - Practices in place to protect confidentiality
      - Plans for making the data available -- to whom, under what circumstances, and the rationale
      - Limitations on confidentiality (e.g. limits to a certificate of confidentiality under state law, planned voluntary disclosure)
    - The consent form should be consistent with your:
      - Data management plan
      - Data sharing plans and requirements
    - It is not generally best practice to promise:
      - Unlimited confidentiality
      - Destruction of all data
      - Restriction of all data to the original researchers
    58. 58. Data Management Plan
    - When is one required?
      - Any NIH request over $500K
      - All NSF proposals after 12/31/2010
      - NIJ
      - Wellcome Trust
      - Any proposal where the collected data will be a resource beyond the project
    - Safeguarding data during collection
      - Documentation
      - Backup and recovery
      - Review
    - Treatment of confidential information -- overview:
      - Separation of identifying and sensitive information
      - Obtain a certificate of confidentiality and other legal safeguards
      - De-identification and public use files
    - Dissemination
      - Archiving commitment (include a letter of support)
      - Archiving timeline
      - Access procedures
      - Documentation
      - User vetting, tracking, and support
    - One size does not fit all projects.
    59. 59. Data Management Plan Outline
    - Data description
      - Nature of data {generated, observed, or experimental information; samples; publications; physical collections; software; models}
      - Scale of data
    - Existing data [if applicable]
      - Description of existing data relevant to the project
      - Plans for integration with data collection
      - Added value of collection; need to collect/create new data
    - Formats
      - Generation and dissemination formats, with procedural justification
      - Storage format, with archival justification
    - Metadata and documentation
      - Metadata to be provided
      - Metadata standards used
      - Treatment of field notes and collection records
      - Planned documentation and supporting materials
      - Quality assurance procedures for metadata and documentation
    - Data organization [if complex]
      - File organization
      - Naming conventions
    - Quality assurance [if not described in the main proposal]
      - Procedures for ensuring data quality in collection, and expected measurement error
      - Cleaning and editing procedures
      - Validation methods
    - Storage, backup, replication, and versioning
      - Facilities; methods; procedures; frequency
      - Replication; version management; recovery guarantees
    - Security
      - Procedural controls
      - Technical controls
      - Confidentiality concerns
      - Access control rules
      - Restrictions on use
    - Budget
      - Cost of preparing data and documentation
      - Cost of permanent archiving
    - Intellectual property rights
      - Entities who hold property rights
      - Types of IP rights in the data
      - Protections provided
      - Dispute resolution process
    - Legal requirements
      - Provider requirements and plans to meet them
      - Institutional requirements and plans to meet them
    - Access and sharing
      - Plans for depositing in an existing public database
      - Access procedures; embargo periods; access charges
      - Timeframe for access; technical access methods; restrictions on access
    - Audience
      - Potential secondary users
      - Potential scope or scale of use
      - Reasons not to share or reuse
    - Archiving and preservation
      - Requirements for data destruction, if applicable
      - Procedures for long-term preservation
      - Institution responsible for long-term costs of data preservation
      - Succession plans for data should the archiving entity go out of existence
    - Ethics and privacy
      - Informed consent
      - Protection of privacy
      - Other ethical issues
    - Adherence
      - When adherence to the data management plan will be checked or demonstrated
      - Who is responsible for managing data in the project
      - Who is responsible for checking adherence to the data management plan
    - Responsibility
      - Individual or project team role responsible for data management
    60. 60. IQSS Data Management Services
    - The Henry A. Murray Research Archive
      - Harvard's endowed permanent data archive
      - Assists in developing data management plans
      - Can provide cataloging assistance for public release of data
      - Disseminates data through the IQSS Dataverse Network
    - The IQSS Dataverse Network
      - Standard data management plan for public, small data
      - Provides easy virtual archiving and dissemination
      - Data is catalogued and controlled by you
      - You theme and brand your virtual archive
      - Universally searchable and citable
      - Automatically provides data formatting and statistical analysis on-line
    61. 61. Data Management Plan Examples (Summaries)
    Example 1
    The proposed research will involve a small sample (fewer than 20 subjects) with Williams syndrome, recruited from clinical facilities in the New York City area. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects, given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.
    Example 2
    The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner-city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
    Example 3
    This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts. User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties.
    (From NIH)
    62. 62. External Data Usage Agreements
    - Between provider and individual
      - Careful -- you're liable
      - Harvard will not help if you sign
      - University review and OSP signature are strongly recommended
    - Between provider and University
      - The University is liable
      - Requires a University Approved Signer: OSP
    - Avoid nonstandard protections whenever possible
      - DUAs can impose very specific and detailed requirements
      - Compatible in spirit does not imply compatibility in legal practice
      - Use University policies/procedures as a …
    Sample language describing controls on confidential information:
    "Harvard University has developed extensive technical and administrative procedures to be used with all identified personal information and other confidential information. The University classifies this form of information internally as 'Harvard Confidential Information' -- or HCI. Any use of HCI at Harvard includes the following safeguards:
    - Systems security. Any system used to store HCI is subject to a checklist of technical and procedural security measures, including: the operating system and applications must be patched to current security levels, a host-based firewall is enabled, and anti-virus software is enabled with a current definitions file.
    - Server security. Any server used to distribute HCI to other systems (e.g. by providing a remote file system), or otherwise offering login access, must employ additional security measures, including: connection through a private network only; limits on the length of idle sessions; limits on incorrect password attempts; and additional logging and monitoring.
    - Access restriction. An individual is allowed to access HCI only if there is a specific need for access. All access to HCI is over physically controlled and/or encrypted channels.
    - Disposal processes, including secure file erasure and document destruction.
    - Encryption. HCI must be strongly encrypted whenever it is transmitted across a public network, stored on a laptop, or stored on a portable device such as a flash drive or on portable media.
    This is only a brief summary; the full University security policy, and a more detailed checklist used to verify systems compliance, are available online. These safeguards are applied consistently throughout the University, and we believe that these requirements offer stringent protection for the requested data. These requirements will be applied in addition to any others required by a specific data use agreement."
    [Micah Altman, 3/10/2011]