Add Health Network Data Challenges:
IRB and Security Issues
Kathleen Mullan Harris
University of North Carolina at Chapel Hill
Social Networks and Health Workshop
May 13, 2019
Topics to cover
• Design of Add Health Study
• Unique network data
• Main IRB issues
• Security risks in data dissemination and user access
• Data security protocols
• Data sharing plan
• Successes and ongoing challenges
Add Health
• National Longitudinal Study of Adolescent to
Adult Health
• On-going program project that began in 1994
funded by NIH.
• Developed in response to a congressional
mandate to fund a study of adolescent health.
• Mandate to understand the role of social
environments in which adolescents live.
Initial Goal:
Putting the Individual
Into Context
Key Features of Add Health Design
• Nationally representative study that explores the cause
of health and health-related behaviors of adolescents
and their outcomes in adulthood.
• Multi-survey, multi-wave inter-disciplinary design.
• Direct measurement of the social contexts of adolescent
life and their effects on health and health behavior.
• Unprecedented racial and ethnic diversity and
genetically informed sibling samples.
• Extensive biomarker collection across all waves.
Sampling Structure
Disabled SampleSaturation
Samples
from 16 Schools
Main Sample 200/Community
Ethnic Samples
Genetic
Samples
High Educ
Black
Puerto Rican
Chinese
Identical Twins Full SibsFraternal Twins
Unrelated Pairs
in Same HH
Half Sibs
Sampling Frame of Adolescents and Parents N = 100,000+ (100 to 4,000 per pair of schools)
School Sampling Frame = QED
H
Feeder
HS HS HS HS
Feeder Feeder Feeder Feeder
Cuban
HS
SCHOOL
FAMILYNEIGHBORHOOD
PEER
Cultural & Policy Environment
Add Health Contextual Model
Unique Network Data
• Nominations of 5 best male and 5 best female friends
during in-school administration;
• Nomination of sexual and romantic partners in Wave I in-
home interview;
• Nomination of best friend in Wave I and Wave II in-home
interview;
• Nomination of 5 best male and female friends, sexual and
romantic partners in Wave I in-home interview in saturated
samples (N~3,000).
• 1,500 couple sample at Wave III.
• Contact with [potential] Wave I (1995) friends at Wave III
(2001-02) (asked of 7th and 8th graders from Wave I).
Wave I Friendship nominations
• Version A: [For R’s asked to nominate up to 5 male
and 5 female friends.]
• First, please tell me the name of your 5 best male
friends, starting with your best male friend. (If R is
female, add: If you have a boyfriend, list him first. If
not, begin with your best male friend.)
• Version B: [For R’s asked to nominate 1 male and 1
female friends.]
• The next questions are about your friends. First,
please think of your best male friend. What is his
name?
Source: Moody, 2001, American Journal of Sociology 107: 679-716
Race/Ethnicity N %
Mexico 1,767 8.5
Cuba 508 2.5
Central-South America 647 3.1
Puerto Rico 570 2.8
China 341 1.7
Philippines 643 3.1
Other Asia 601 2.9
Black (Africa/Afro-Caribbean) 4,601 22.2
Non-Hispanic White (Eur/Canada) 10,760 52.0
Native American (non-Hispanic) 248 1.2
Total N 20,686 100.0
Race and Ethnic Diversity in Add Health
Missing on race/ethnicity=59
Family Structure of Adolescents
Add Health 1995
N %
2 biological parents 10,339 53.3
2 adoptive parents 403 0.7
Bio Mom/ Step Dad 2,756 13.6
Bio Dad/ Step Mom 591 2.6
Single Mom 4,520 20.4
Single Dad 637 3.1
Surrogate parent(s) 1,499 6.3
Total 20,745 100.0
Domingue et al. 2018. The social genome of friends and schoolmates
in the National Longitudinal Study of Adolescent to Adult Health.
Main IRB Issues
• Protecting confidentiality of respondents’ identity;
• Protecting the confidentiality of friendship and
partner nominations;
• Minimizing deductive disclosure risks.
• All 3 of these issues combined with wide
dissemination of highly sensitive data.
Strong Motivation for Data Sharing
• Add Health very expensive;
• National representation;
• Design provided unprecedented and unique opportunities
for research that no other study allowed;
• Multi-disciplinary design;
• Omnibus study with comprehensive coverage of health
and health behavior in adolescence;
• Longitudinal study to help sort out cause and effect.
• Add Health policy: No proprietary period for investigators.
Special Data Security Concerns to Address
in Add Health Data Sharing Plan
• Contextual nature of design;
• Third-party nominations of friends;
• Third-party nominations of romantic and sex partners:
– No consent from nominated partners
• Geocodes of adolescent’s home address;
• Sensitive nature of data, especially during adolescence;
• With each additional wave of longitudinal data and new
data sources added, data security risks increase.
Pledge of Confidentiality to
Add Health Participants
Add Health Wave I Consent form for Adolescent,
1994-95.
• “I understand that my statements and answers will
be completely protected so that no one will be able
to connect my answers to me. My answers will not
be given to my parents/guardians or to any
unauthorized person by project staff.”
Two Major Security Risks to Privacy and
Confidentiality of Add Health Respondents
• Direct disclosure
– link between name and questionnaire information.
• Deductive disclosure
– Discerning a respondent’s identity and their responses
through use of known characteristics of that person;
– Add Health especially high risk of deductive disclosure
because of the clustered study design;
– Almost 400,000 people knew of the participation of at
least one, if not more, the adolescents in Add Health at
Wave I.
For Example
• In more than 90,000 cases in in-school
administration, takes only 5 variables to distinguish
as few as 7 respondents:
• Number of respondents = 90,118
– Gender = female (44,482)
– Age = 16 (8,250)
– Race = Asian (570)
– Does not live with father figure (104)
– Participates in chorus (7)
Data Security Procedures
• Add Health Data Management Security Plan:
• Protects against breach of confidentiality from both
direct and deductive disclosure
• Protects from subpoena (or hostile sharing)
• Also have a certificate of confidentiality
Protections from Direct Disclosure
• Based on the separation of identities and data.
• Names and addresses of respondents reside with a
security manager (i.e., honest broker) outside U.S.
• Identifying information only in U.S. during data
collection.
• Multiple identification numbers used for each wave
of data collection.
• Respondent names never connected to data.
Wave IV IDs
• F4ID – Wave IV Field ID
• B4ID – Wave IV Biospecimen ID
• C4ID – Wave IV Cortisol ID (pretest only)
• H4ID – Wave IV Household ID
• M4ID – Wave IV Military ID
• Q4ID – Wave IV Questionnaire ID
• S4ID –School ID used at Wave IV
• AID (altered ID)- analysis file identification number that
links data from all waves of data collection
Protections from Deductive Disclosure
• Tiered data dissemination plan:
– Differs by level of risk of deductive disclosure;
– Requirements for access and use of data differ by the
level of risk;
– Add Health users must meet these requirements and
assurances for safeguarding Add Health data.
• Users also required to abide by procedures for the
dissemination of research results.
• Access to all data via restricted data use contract.
Four tier data dissemination
according to disclosure risks
• Public-use data (no friendship or romantic pairs
linkages)
• Restricted-use data (no romantic pairs, HIV)
• High-security restricted data
• Secure data facility for analyzing high school
transcript data and for using geocodes to link
contextual data.
• [Secure server for access to genome-wide data.]
Add Health Dissemination Plan
Data Users
• User requirements to protect from deductive disclosure:
– Pledge of confidentiality
– Monitoring of data use
– Store data securely
– Deletion of temporary data analysis files every six
months.
– Security of printed information
– Password protected screen saver, set to activate after
3 minutes of idle time.
– Access data only from approved locations.
Wave II
1996
(88.6%)
Wave III
2001-2002
(77.4%)
School
Admin
144
Adolescents
in grades 7-12
20,745
Adolescents
in grades 8-12
14,738
Adults
Aged 24-32
15,701
Wave IV
2008
(80.3%)
Wave I
1994-1995
(79%)
In-School
Administration
Survey
Administration
Students
90,118
Partners
1,507
Parent
17,670
Young Adults
Aged 18-26
15,197
School
Admin
128
IIV Study
~100
Wave V
2016-18
Adults
Aged 32-42
Target: 12,000
Social, Behavioral, and Biological Linkages Across the Life Course
Parent
3,001
IIV Study
~100
Adolescence
Transition to
Adulthood
Young Adulthood Adulthood
Wave I-II (Ages 12-20) Wave III (Ages 18-26) Wave IV (Ages 24-32) Wave V (Ages 32-42)
Embedded genetic sample of ~3,000 pairs
Physical development
Height, weight Height, weight Height, weight, waist Height, weight, waist
STI tests (urine) Metabolic Metabolic
HIV test (saliva) Immune function Immune function
Genetic
(buccal cell DNA)
Inflammation Inflammation
Cardiovascular Cardiovascular
Genetic
(buccal cell DNA)
Genetic
(whole blood)
Medications Medications
Renal
Biological Data Across Waves
Social and Biological Longitudinal Data in Add Health
Adolescence Adulthood
Wave I-II Wave III Wave IV Wave V
(12-20) (18-26) (24-32) (32-42)
Social environmental data:
school college college work
family family family family
romantic rel romantic rel romantic rel romantic rel
neighborhood neighborhood neighborhood neighborhood
community community community community
peer peer
Biological data:
Biological resemblance to siblings in household on 3,000 pairs
height height ht, wt, waist, BMI ht, wt, waist, BMI
weight weight, BMI BP, pulse BP, pulse
BMI STI test results immune immune
HIV test results inflammation inflammation
DNA diabetes diabetes
DNA kidney disease
GWAS mRNA, DNAm
Original Data Sharing Plan able to
accommodate collection of new sensitive data
• STD and HIV results (on both members of
couple sample)
• Links to military and other administrative data
• Biomarkers
• Genetic data
• Longitudinal parent data
Wave III Questions on Friends
This section is administered only to respondents who were
in grades 7 or 8 at Wave I.
Moody algorithm lists of 10 potential friends displayed
on laptop screen:
• Please indicate which of the following people who went
to <SCHOOL> are your friends now or were your friends
when you were in school with them.
– Current relationship with <FNAME>
– How often see <FNAME>
– How often you contact <FNAME>
– <FNAME> one of your closest friends
– Friendship beginning and end dates
Other Data Security
• Keep up with technological advances in data
access, hacking, and intruding
• SANS Institute
– Organization that provides computer security training and
certification
– Continuing training and updating security systems on 20
top threats to data security
– Add Health staff data security specialist
• Procedures for PI, study administrators, survey
field staff and interviewers, data manager, and
data users
Dissemination Successes
• 3,500 peer-reviewed articles in more than 800 different
disciplinary journals
– New article based on Add Health published every
day of the work week
• 200 books, book chapters, and reports; 4,000
conference presentations
• 900 master’s theses and doctoral dissertations
• Over 1,000 independent grants awarded using Add
Health data
• Over 200,000 citations to Add Health research
Over 50,000 Add Health Users
Around the World
Add Health Users in the US
Ongoing Data Security Challenges
• Time-consuming
• Multiple staff demands and skills
• Resort to custom-approach with users to set up security
plans at IT-poor institutions
• Educate
– IRBs
– Users
– Investigators
– Never-ending with rapidly changing technology and
methods of data intrusion.
Ongoing Data Security Challenges
• Genetic Data
– Candidate and SNP data
– Genome-wide data
• Other genetic and biological data
– Microbiome
– Gene expression data
• Administrative records
– Social security
– Birth, death record
Ancillary Studies
• Researchers may propose ancillary studies to add
data to Add Health;
– No access to respondents;
– Geographic, political, environmental (via geocodes);
– Biological and genetic data (via biological specimen
archive);
– Review ancillary study proposals for scientific merit,
value and burden to Add Health, scarcity and amount
requested of specimen archive.
– Cover all costs, including Add Health costs to check
and merge data via Data Security Manager outside
the U.S. and disclosure analysis;
Ancillary Studies
• Add Health policies:
– Ancillary investigators must check and clean
data unlinked to Add Health longitudinal data;
– Contextual data has no proprietary period
• The day we provide the data to the Ancillary Study PI
is the day we distribute to community of Add Health
users.
– Biological and genetic data have a 1-year
proprietary period.
Add Health Info
• Website: www.cpc.unc.edu/addhealth
• Description of data access and dissemination:
http://www.cpc.unc.edu/projects/addhealth/data
• Restricted data use contract:
http://www.cpc.unc.edu/projects/addhealth/data/contract
Wave IV Consent Form (excerpt)
• Add Health has always attempted to protect privacy to the greatest extent
possible. Your answers, measurements, and test results will be held in strict
privacy by the Add Health and RTI project staff and not given to unauthorized
persons. Extensive security procedures are in place to make sure that
participants’ answers are not linked to their names. Your name will not be
reported with your responses or test results; these data are labeled only with an
ID number. Similarly, no names or other identifying information will be stored
with the DNA or blood drops. Only an ID number will identify the samples. This
ID number is different from the one associated with your interview responses.
Researchers and technicians at the laboratories that will conduct the tests do
not have access to your identifying information.
• After data collection and cleaning are completed, your identifying information will
be electronically transmitted to the Add Health Security Manager in Canada, and
your interview answers without identifying information will be sent to the Add
Health project staff at the University of North Carolina at Chapel Hill. Storing
different types of information in different places helps keep your identity
confidential.
Add Health Co-Funders
• National Institute of Child Health and Human Development
• National Cancer Institute*
• National Center for Health Statistics, Centers for Disease Control and Prevention, DHHS
• National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, DHHS*
• National Center for Minority Health and Health Disparities*
• National Institute of Allergy and Infectious Diseases*
• National Institute of Deafness and Other Communication Disorders*
• National Institute of General Medical Sciences
• National Institute of Mental Health
• National Institute of Nursing Research*
• National Institute on Aging*
• National Institute on Alcohol Abuse and Alcoholism*
• National Institute on Drug Abuse*
• National Science Foundation*
• Office of AIDS Research, NIH*
• Office of the Assistant Secretary for Planning and Evaluation, DHHS*
• Office of Behavioral and Social Sciences Research, NIH*
• Office of the Director, NIH
• Office of Minority Health, Centers for Disease Control and Prevention, DHHS
• Office of Minority Health, Office of Public Health and Science, DHHS
• Office of Population Affairs, DHHS*
• Office of Research on Women's Health, NIH*
*Wave IV co-funders
Wave V co-funders

01 Add Health Network Data Challenges: IRB and Security Issues

  • 1.
    Add Health NetworkData Challenges: IRB and Security Issues Kathleen Mullan Harris University of North Carolina at Chapel Hill Social Networks and Health Workshop May 13, 2019
  • 2.
    Topics to cover •Design of Add Health Study • Unique network data • Main IRB issues • Security risks in data dissemination and user access • Data security protocols • Data sharing plan • Successes and ongoing challenges
  • 3.
    Add Health • NationalLongitudinal Study of Adolescent to Adult Health • On-going program project that began in 1994 funded by NIH. • Developed in response to a congressional mandate to fund a study of adolescent health. • Mandate to understand the role of social environments in which adolescents live.
  • 4.
    Initial Goal: Putting theIndividual Into Context
  • 5.
    Key Features ofAdd Health Design • Nationally representative study that explores the cause of health and health-related behaviors of adolescents and their outcomes in adulthood. • Multi-survey, multi-wave inter-disciplinary design. • Direct measurement of the social contexts of adolescent life and their effects on health and health behavior. • Unprecedented racial and ethnic diversity and genetically informed sibling samples. • Extensive biomarker collection across all waves.
  • 6.
    Sampling Structure Disabled SampleSaturation Samples from16 Schools Main Sample 200/Community Ethnic Samples Genetic Samples High Educ Black Puerto Rican Chinese Identical Twins Full SibsFraternal Twins Unrelated Pairs in Same HH Half Sibs Sampling Frame of Adolescents and Parents N = 100,000+ (100 to 4,000 per pair of schools) School Sampling Frame = QED H Feeder HS HS HS HS Feeder Feeder Feeder Feeder Cuban HS
  • 7.
    SCHOOL FAMILYNEIGHBORHOOD PEER Cultural & PolicyEnvironment Add Health Contextual Model
  • 8.
    Unique Network Data •Nominations of 5 best male and 5 best female friends during in-school administration; • Nomination of sexual and romantic partners in Wave I in- home interview; • Nomination of best friend in Wave I and Wave II in-home interview; • Nomination of 5 best male and female friends, sexual and romantic partners in Wave I in-home interview in saturated samples (N~3,000). • 1,500 couple sample at Wave III. • Contact with [potential] Wave I (1995) friends at Wave III (2001-02) (asked of 7th and 8th graders from Wave I).
  • 9.
    Wave I Friendshipnominations • Version A: [For R’s asked to nominate up to 5 male and 5 female friends.] • First, please tell me the name of your 5 best male friends, starting with your best male friend. (If R is female, add: If you have a boyfriend, list him first. If not, begin with your best male friend.) • Version B: [For R’s asked to nominate 1 male and 1 female friends.] • The next questions are about your friends. First, please think of your best male friend. What is his name?
  • 10.
    Source: Moody, 2001,American Journal of Sociology 107: 679-716
  • 11.
    Race/Ethnicity N % Mexico1,767 8.5 Cuba 508 2.5 Central-South America 647 3.1 Puerto Rico 570 2.8 China 341 1.7 Philippines 643 3.1 Other Asia 601 2.9 Black (Africa/Afro-Caribbean) 4,601 22.2 Non-Hispanic White (Eur/Canada) 10,760 52.0 Native American (non-Hispanic) 248 1.2 Total N 20,686 100.0 Race and Ethnic Diversity in Add Health Missing on race/ethnicity=59
  • 12.
    Family Structure ofAdolescents Add Health 1995 N % 2 biological parents 10,339 53.3 2 adoptive parents 403 0.7 Bio Mom/ Step Dad 2,756 13.6 Bio Dad/ Step Mom 591 2.6 Single Mom 4,520 20.4 Single Dad 637 3.1 Surrogate parent(s) 1,499 6.3 Total 20,745 100.0
  • 13.
    Domingue et al.2018. The social genome of friends and schoolmates in the National Longitudinal Study of Adolescent to Adult Health.
  • 14.
    Main IRB Issues •Protecting confidentiality of respondents’ identity; • Protecting the confidentiality of friendship and partner nominations; • Minimizing deductive disclosure risks. • All 3 of these issues combined with wide dissemination of highly sensitive data.
  • 15.
    Strong Motivation forData Sharing • Add Health very expensive; • National representation; • Design provided unprecedented and unique opportunities for research that no other study allowed; • Multi-disciplinary design; • Omnibus study with comprehensive coverage of health and health behavior in adolescence; • Longitudinal study to help sort out cause and effect. • Add Health policy: No proprietary period for investigators.
  • 16.
    Special Data SecurityConcerns to Address in Add Health Data Sharing Plan • Contextual nature of design; • Third-party nominations of friends; • Third-party nominations of romantic and sex partners: – No consent from nominated partners • Geocodes of adolescent’s home address; • Sensitive nature of data, especially during adolescence; • With each additional wave of longitudinal data and new data sources added, data security risks increase.
  • 17.
    Pledge of Confidentialityto Add Health Participants Add Health Wave I Consent form for Adolescent, 1994-95. • “I understand that my statements and answers will be completely protected so that no one will be able to connect my answers to me. My answers will not be given to my parents/guardians or to any unauthorized person by project staff.”
  • 18.
    Two Major SecurityRisks to Privacy and Confidentiality of Add Health Respondents • Direct disclosure – link between name and questionnaire information. • Deductive disclosure – Discerning a respondent’s identity and their responses through use of known characteristics of that person; – Add Health especially high risk of deductive disclosure because of the clustered study design; – Almost 400,000 people knew of the participation of at least one, if not more, the adolescents in Add Health at Wave I.
  • 19.
    For Example • Inmore than 90,000 cases in in-school administration, takes only 5 variables to distinguish as few as 7 respondents: • Number of respondents = 90,118 – Gender = female (44,482) – Age = 16 (8,250) – Race = Asian (570) – Does not live with father figure (104) – Participates in chorus (7)
  • 20.
    Data Security Procedures •Add Health Data Management Security Plan: • Protects against breach of confidentiality from both direct and deductive disclosure • Protects from subpoena (or hostile sharing) • Also have a certificate of confidentiality
  • 21.
    Protections from DirectDisclosure • Based on the separation of identities and data. • Names and addresses of respondents reside with a security manager (i.e., honest broker) outside U.S. • Identifying information only in U.S. during data collection. • Multiple identification numbers used for each wave of data collection. • Respondent names never connected to data.
  • 22.
    Wave IV IDs •F4ID – Wave IV Field ID • B4ID – Wave IV Biospecimen ID • C4ID – Wave IV Cortisol ID (pretest only) • H4ID – Wave IV Household ID • M4ID – Wave IV Military ID • Q4ID – Wave IV Questionnaire ID • S4ID –School ID used at Wave IV • AID (altered ID)- analysis file identification number that links data from all waves of data collection
  • 23.
    Protections from DeductiveDisclosure • Tiered data dissemination plan: – Differs by level of risk of deductive disclosure; – Requirements for access and use of data differ by the level of risk; – Add Health users must meet these requirements and assurances for safeguarding Add Health data. • Users also required to abide by procedures for the dissemination of research results. • Access to all data via restricted data use contract.
  • 24.
    Four tier datadissemination according to disclosure risks • Public-use data (no friendship or romantic pairs linkages) • Restricted-use data (no romantic pairs, HIV) • High-security restricted data • Secure data facility for analyzing high school transcript data and for using geocodes to link contextual data. • [Secure server for access to genome-wide data.]
  • 25.
  • 26.
    Data Users • Userrequirements to protect from deductive disclosure: – Pledge of confidentiality – Monitoring of data use – Store data securely – Deletion of temporary data analysis files every six months. – Security of printed information – Password protected screen saver, set to activate after 3 minutes of idle time. – Access data only from approved locations.
  • 27.
    Wave II 1996 (88.6%) Wave III 2001-2002 (77.4%) School Admin 144 Adolescents ingrades 7-12 20,745 Adolescents in grades 8-12 14,738 Adults Aged 24-32 15,701 Wave IV 2008 (80.3%) Wave I 1994-1995 (79%) In-School Administration Survey Administration Students 90,118 Partners 1,507 Parent 17,670 Young Adults Aged 18-26 15,197 School Admin 128 IIV Study ~100 Wave V 2016-18 Adults Aged 32-42 Target: 12,000 Social, Behavioral, and Biological Linkages Across the Life Course Parent 3,001 IIV Study ~100
  • 28.
    Adolescence Transition to Adulthood Young AdulthoodAdulthood Wave I-II (Ages 12-20) Wave III (Ages 18-26) Wave IV (Ages 24-32) Wave V (Ages 32-42) Embedded genetic sample of ~3,000 pairs Physical development Height, weight Height, weight Height, weight, waist Height, weight, waist STI tests (urine) Metabolic Metabolic HIV test (saliva) Immune function Immune function Genetic (buccal cell DNA) Inflammation Inflammation Cardiovascular Cardiovascular Genetic (buccal cell DNA) Genetic (whole blood) Medications Medications Renal Biological Data Across Waves
  • 29.
    Social and BiologicalLongitudinal Data in Add Health Adolescence Adulthood Wave I-II Wave III Wave IV Wave V (12-20) (18-26) (24-32) (32-42) Social environmental data: school college college work family family family family romantic rel romantic rel romantic rel romantic rel neighborhood neighborhood neighborhood neighborhood community community community community peer peer Biological data: Biological resemblance to siblings in household on 3,000 pairs height height ht, wt, waist, BMI ht, wt, waist, BMI weight weight, BMI BP, pulse BP, pulse BMI STI test results immune immune HIV test results inflammation inflammation DNA diabetes diabetes DNA kidney disease GWAS mRNA, DNAm
  • 30.
    Original Data SharingPlan able to accommodate collection of new sensitive data • STD and HIV results (on both members of couple sample) • Links to military and other administrative data • Biomarkers • Genetic data • Longitudinal parent data
  • 31.
    Wave III Questionson Friends This section is administered only to respondents who were in grades 7 or 8 at Wave I. Moody algorithm lists of 10 potential friends displayed on laptop screen: • Please indicate which of the following people who went to <SCHOOL> are your friends now or were your friends when you were in school with them. – Current relationship with <FNAME> – How often see <FNAME> – How often you contact <FNAME> – <FNAME> one of your closest friends – Friendship beginning and end dates
  • 32.
    Other Data Security •Keep up with technological advances in data access, hacking, and intruding • SANS Institute – Organization that provides computer security training and certification – Continuing training and updating security systems on 20 top threats to data security – Add Health staff data security specialist • Procedures for PI, study administrators, survey field staff and interviewers, data manager, and data users
  • 33.
    Dissemination Successes • 3,500peer-reviewed articles in more than 800 different disciplinary journals – New article based on Add Health published every day of the work week • 200 books, book chapters, and reports; 4,000 conference presentations • 900 master’s theses and doctoral dissertations • Over 1,000 independent grants awarded using Add Health data • Over 200,000 citations to Add Health research
  • 34.
    Over 50,000 AddHealth Users Around the World
  • 35.
  • 36.
    Ongoing Data SecurityChallenges • Time-consuming • Multiple staff demands and skills • Resort to custom-approach with users to set up security plans at IT-poor institutions • Educate – IRBs – Users – Investigators – Never-ending with rapidly changing technology and methods of data intrusion.
  • 37.
    Ongoing Data SecurityChallenges • Genetic Data – Candidate and SNP data – Genome-wide data • Other genetic and biological data – Microbiome – Gene expression data • Administrative records – Social security – Birth, death record
  • 38.
    Ancillary Studies • Researchersmay propose ancillary studies to add data to Add Health; – No access to respondents; – Geographic, political, environmental (via geocodes); – Biological and genetic data (via biological specimen archive); – Review ancillary study proposals for scientific merit, value and burden to Add Health, scarcity and amount requested of specimen archive. – Cover all costs, including Add Health costs to check and merge data via Data Security Manager outside the U.S. and disclosure analysis;
  • 39.
    Ancillary Studies • AddHealth policies: – Ancillary investigators must check and clean data unlinked to Add Health longitudinal data; – Contextual data has no proprietary period • The day we provide the data to the Ancillary Study PI is the day we distribute to community of Add Health users. – Biological and genetic data have a 1-year proprietary period.
  • 40.
    Add Health Info •Website: www.cpc.unc.edu/addhealth • Description of data access and dissemination: http://www.cpc.unc.edu/projects/addhealth/data • Restricted data use contract: http://www.cpc.unc.edu/projects/addhealth/data/contract
  • 41.
    Wave IV ConsentForm (excerpt) • Add Health has always attempted to protect privacy to the greatest extent possible. Your answers, measurements, and test results will be held in strict privacy by the Add Health and RTI project staff and not given to unauthorized persons. Extensive security procedures are in place to make sure that participants’ answers are not linked to their names. Your name will not be reported with your responses or test results; these data are labeled only with an ID number. Similarly, no names or other identifying information will be stored with the DNA or blood drops. Only an ID number will identify the samples. This ID number is different from the one associated with your interview responses. Researchers and technicians at the laboratories that will conduct the tests do not have access to your identifying information. • After data collection and cleaning are completed, your identifying information will be electronically transmitted to the Add Health Security Manager in Canada, and your interview answers without identifying information will be sent to the Add Health project staff at the University of North Carolina at Chapel Hill. Storing different types of information in different places helps keep your identity confidential.
  • 42.
    Add Health Co-Funders •National Institute of Child Health and Human Development • National Cancer Institute* • National Center for Health Statistics, Centers for Disease Control and Prevention, DHHS • National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, DHHS* • National Center for Minority Health and Health Disparities* • National Institute of Allergy and Infectious Diseases* • National Institute of Deafness and Other Communication Disorders* • National Institute of General Medical Sciences • National Institute of Mental Health • National Institute of Nursing Research* • National Institute on Aging* • National Institute on Alcohol Abuse and Alcoholism* • National Institute on Drug Abuse* • National Science Foundation* • Office of AIDS Research, NIH* • Office of the Assistant Secretary for Planning and Evaluation, DHHS* • Office of Behavioral and Social Sciences Research, NIH* • Office of the Director, NIH • Office of Minority Health, Centers for Disease Control and Prevention, DHHS • Office of Minority Health, Office of Public Health and Science, DHHS • Office of Population Affairs, DHHS* • Office of Research on Women's Health, NIH* *Wave IV co-funders Wave V co-funders