Ellen Wright Clayton, MD, JD
Bradley A. Malin, PhD
Vanderbilt University
• The risk of re-identification has been overcalled in the NPRM
• Inadequate attention has been paid to the importance of penalties for those who seek to re-identify people
• “it is acknowledged that a time when investigators will be able [sic] readily ascertain the identity of individuals from their genetic information may not be far away.”
• Some re-identification attempts have succeeded to date
• But there is no epidemic of re-identification
• As of July 1, 2015, NIH has approved 20,178 Data Access Requests and, of these Requests, has identified and managed 27 policy compliance violations. The number of compliance violations represents 0.1% of the approved Data Access Requests, and the violations are categorized as being related to data submission, research use or data access, data security, and the publication embargo period.
• https://gds.nih.gov/20ComplianceStatistics_dbGap.html
• What matters is what is probable, not what is possible
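The "0.1%" figure can be checked from the two counts quoted above: 27 violations out of 20,178 approved requests is roughly 0.13%, or about one violation per 750 approved requests. A minimal Python check, using only those two NIH counts:

```python
# Counts reported by NIH for dbGaP as of July 1, 2015 (quoted on the slide).
approved_requests = 20_178
violations = 27

violation_rate = violations / approved_requests
requests_per_violation = approved_requests / violations

print(f"Violation rate: {violation_rate:.2%}")                         # 0.13%
print(f"About 1 violation per {requests_per_violation:.0f} requests")  # 747
```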
• The people and institutions that hold the data and want/need to share them
• The individuals who want to re-identify the people from whom the data came
  ▪ “White hat”
  ▪ Harm to individuals and institutions
  ▪ Selling identified data
• The individuals to whom the data relate
• Safe Harbor (SH) Game
  ▪ Defender shares data according to federal policy
• Basic Game
  ▪ Defender shares data to maximize overall payoff
• SH-Friendly
  ▪ Defender constrains strategy space to disclose no greater detail than SH
• No Attack
  ▪ Defender constrains strategy space so that the attacker has no incentive to attack
Wan et al., PLoS One, 2015
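The four games differ mainly in which sharing strategies the defender may choose from. The toy Python sketch below treats them as a leader-follower (Stackelberg-style) game: the publisher picks a strategy, and a rational attacker then attacks only if the expected payoff is positive. It is a hypothetical illustration, not the model or data of Wan et al.: the strategy names, every payoff number, and the reading of the No Attack game as "only strategies the attacker will not attack" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A hypothetical data-sharing policy the publisher (defender) could adopt."""
    name: str
    sh_compliant: bool  # discloses no greater detail than Safe Harbor
    benefit: float      # hypothetical publisher benefit per record shared
    p_reid: float       # hypothetical chance an attack re-identifies a record


# Every number here is invented for illustration; none comes from Wan et al.
ATTACKER_GAIN = 10.0   # attacker's gain per re-identified record
ATTACK_COST = 4.0      # attacker's cost per record to mount an attack
PUBLISHER_LOSS = 20.0  # publisher's loss per re-identified record

STRATEGIES = [
    Strategy("full detail", False, 25.0, 0.60),
    Strategy("partial generalization", False, 10.0, 0.30),
    Strategy("Safe Harbor", True, 8.0, 0.20),
    Strategy("heavy generalization", True, 5.0, 0.05),
]


def attacker_attacks(s: Strategy) -> bool:
    """Follower's best response: attack only if the expected payoff is positive."""
    return s.p_reid * ATTACKER_GAIN - ATTACK_COST > 0


def publisher_payoff(s: Strategy) -> float:
    """Leader's payoff, anticipating the attacker's best response."""
    expected_loss = s.p_reid * PUBLISHER_LOSS if attacker_attacks(s) else 0.0
    return s.benefit - expected_loss


def solve(candidates: list[Strategy]) -> Strategy:
    """Pick the publisher's best strategy from a (possibly constrained) space."""
    return max(candidates, key=publisher_payoff)


# Each game differs only in the strategy space the publisher may choose from.
GAMES = {
    "SH": [s for s in STRATEGIES if s.name == "Safe Harbor"],
    "Basic": STRATEGIES,
    "SH-Friendly": [s for s in STRATEGIES if s.sh_compliant],
    "No Attack": [s for s in STRATEGIES if not attacker_attacks(s)],
}

for game, candidates in GAMES.items():
    choice = solve(candidates)
    print(f"{game:<12} {choice.name:<24} payoff={publisher_payoff(choice):5.1f} "
          f"attacked={attacker_attacks(choice)}")
```

With these invented numbers, the unconstrained Basic game picks the most detailed release and tolerates an attack (publisher payoff 13.0), while the No Attack constraint gives up some payoff (10.0) to remove the attacker's incentive; both beat the SH and SH-Friendly outcomes (8.0) in this toy setup.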
[Figure: Average Payoff Per Record for the Attacker and the Publisher in the SH, Basic, SH-Friendly, and No Attack games]
● $1200: Benefit per record
● $300: Cost per violation
● $4: Access cost per record
● ~30,000 Census records
• Obtaining external data
• Linking data
• Penalties for attack
• Violating data use agreements – to date, penalties have consisted of restricting access to data for a brief period of time
  ▪ Precision Medicine Initiative calls on Congress to enact penalties for misuse
• Penalties/damages for injuring individuals
  ▪ Many of these are not within the control of individuals
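The role of penalties can be illustrated with a simple rational-attacker calculation: an attack is worthwhile only if the expected gain exceeds the access cost plus the expected penalty. The Python sketch below is a hypothetical linear deterrence model, not the model used in the talk; the $4 access cost echoes the figure slide, while the gain, success rate, and chance of being caught are invented.

```python
def expected_attack_payoff(gain: float, p_success: float, access_cost: float,
                           p_caught: float, penalty: float) -> float:
    """Attacker's expected payoff per targeted record (hypothetical linear model)."""
    return p_success * gain - access_cost - p_caught * penalty


def deterrent_penalty(gain: float, p_success: float, access_cost: float,
                      p_caught: float) -> float:
    """Smallest penalty at which attacking no longer has a positive expected payoff."""
    if p_caught <= 0:
        raise ValueError("no finite penalty deters an attacker who is never caught")
    return max(0.0, (p_success * gain - access_cost) / p_caught)


# Invented inputs: $100 gain per re-identified record, 50% success rate,
# $4 access cost per record, 25% chance the attacker is caught and penalized.
GAIN, P_SUCCESS, ACCESS_COST, P_CAUGHT = 100.0, 0.5, 4.0, 0.25

threshold = deterrent_penalty(GAIN, P_SUCCESS, ACCESS_COST, P_CAUGHT)
print(f"Deterrent penalty: ${threshold:.2f} per record")  # $184.00
for penalty in (0.0, 100.0, threshold):
    payoff = expected_attack_payoff(GAIN, P_SUCCESS, ACCESS_COST, P_CAUGHT, penalty)
    print(f"penalty ${penalty:7.2f} -> expected payoff ${payoff:6.2f}")
```

In this toy model, halving the chance of being caught doubles the penalty needed to deter an attack (from $184 to $368 per record), which is one way to see why weak or rarely enforced sanctions provide little protection.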
• Data sharing is desirable
• Just because re-identification is possible at times does not mean that it is probable
• Risk always exists but can be mitigated
• Governance and security are important
• Accountability for those who seek to re-identify is a crucial but at present insufficiently robust strategy for mitigating risk
• Yevgeniy Vorobeychik
• Murat Kantarcioglu
• Zhiyu Wan
• Weiyi Xia
• Dan Roden
• R01HG006844
• U01HG008672
• U01HG008701

Ellen Wright Clayton, "Modeling Risk to Privacy in Genomics Research"
