Bio it 2014-published
- 1. © 2013 New York Genome Center 1NYGC PRIVILEGED & CONFIDENTIAL
Privacy, Regulatory and
Security Requirements in
a Collaborative Clinical
Genomics Environment
TOBY BLOOM, PH.D.
BIO-IT WORLD
APRIL 29, 2014
- 2.
NYGC OVERVIEW
Independent, non-profit research organization
Founded as a collaboration of 12 NYC medical institutions
Focused on clinical genomics
Expecting to handle PHI, HIPAA regulations, FISMA-moderate security from the
beginning.
Merging many kinds of data
The center’s mission is to save lives by creating an unprecedented collaboration of
technology, science and medicine.
- 3.
MEMBER INSTITUTIONS
- 4.
NEW YORK BIOMEDICAL COMMUNITY
Fostering Collaboration
Enhancing efficiencies
Promoting advances in medicine faster
Sharing data is essential!!
- 5.
HOW DO SECURITY, PRIVACY &
REGULATIONS AFFECT OUR
MISSION?
- 6.
MY DEFINITIONS
Privacy:
Ensuring that information that anyone considers
personal and would not want known by others is
protected
Security
The means by which we constrain access to data, so
that private data is protected from access by
unauthorized individuals, and is not changed, removed,
or made unavailable by unauthorized individuals.
Regulations
Laws and governmental or organization rules that
govern how data may be accessed and used.
ALL OF THESE IMPACT SHARING OF DATA
- 7.
DATA SHARING AND AGGREGATION
ARE CRITICAL
Complex diseases may need huge numbers of
samples to gain statistical power
Sequencing more patients when enough
sequence exists for a new study is a waste of
resources and precious research funding
In rare diseases, you may not ever see the
same thing twice
……..
- 8.
RISKS OF SHARING YOUR GENOMIC DATA
SHOULDN’T BE UNDERESTIMATED EITHER
GINA does not protect against denial of
disability coverage, life insurance, or long-term
care insurance based on genetic information!
For you or your family members!!!!
Some people can afford not to worry about
those issues
But for some, it’s critical!
Does sharing only for research projects, not
publicly, reduce this risk sufficiently?
- 9.
AN EXAMPLE: NYC CLINICAL DATA
RESEARCH NETWORK
“Both the opportunity and the anxiety are
pretty electrifying,” Francis S. Collins, director
of the National Institutes of Health, said in an
interview.
- 10.
NYC CLINICAL DATA RESEARCH
NETWORK
FUNDED by PCORI
Individual Researchers
- 11.
NYC-CDRN GOALS
Collect de-identified data from all patients
from all of the member health systems
2.5-6.5 Million patient records
De-duplicated across health systems
Expect the first 2.5M records (with incomplete
data) by August 1, assuming legal approvals
Available for retrospective studies
Available for cohort identification
Will eventually host prospective studies as well
Proposal promised connections to genomic data
- 12.
THE DETAILS
Expect to have at least 2.4 million patient records
by August
Currently have 2M “dummy” records
Waiting for the legalities….
De-duplicated across health systems!
NYGC provides de-identified information only
But we receive “limited data sets” under HIPAA
Healthix and Bronx RHIO (trusted brokers) have
identifying information but no health data
What are we permitted to do with this data?
What are the privacy, security, regulatory
issues?
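The trusted-broker arrangement on this slide can be pictured as follows: the broker holds identifying fields and derives a stable pseudonymous id from them with a keyed hash, so the same patient seen at two health systems maps to one id, while the data side only ever receives that id plus clinical fields. This is a minimal sketch under stated assumptions, not a description of how Healthix or Bronx RHIO actually operate. The key, the field names, and the `pseudonym` and `deduplicate` helpers are all hypothetical, and real patient matching needs far more than exact normalized comparison.

```python
import hashlib
import hmac

# Hypothetical secret held only by the broker; it is never shared with the data side.
BROKER_KEY = b"example-broker-secret"

def pseudonym(first: str, last: str, dob: str) -> str:
    """Derive a stable pseudonymous id from normalized identifying fields."""
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hmac.new(BROKER_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def deduplicate(records):
    """Group de-identified clinical records that share a pseudonymous id."""
    merged = {}
    for pid, clinical in records:
        merged.setdefault(pid, []).append(clinical)
    return merged

# Broker side: the same (made-up) patient registered at two health systems.
id_a = pseudonym("Jane", "Doe", "1970-01-01")
id_b = pseudonym(" jane ", "DOE", "1970-01-01")
id_c = pseudonym("John", "Roe", "1982-05-17")

# Data side: only pseudonymous ids and clinical fields arrive.
warehouse = deduplicate([
    (id_a, {"system_proxy": "X1", "dx": "E11.9"}),
    (id_b, {"system_proxy": "X2", "dx": "I10"}),
    (id_c, {"system_proxy": "X1", "dx": "J45.909"}),
])
```

Because the key stays with the broker, the party holding health data cannot recompute the id from a name, which is the separation the slide describes.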
- 13.
PRIVACY: AT WHAT LEVEL CAN WE
GUARANTEE THIS?
Patients are “fully de-identified” in any data we
make available (according to HIPAA standards)
Is that really true?
One physician tells me that 3 consecutive phosphate
readings are fully identifying
Providers do not want to be identified, and we will
keep NO provider information
Plan was to provide proxy ids for health systems
Allowing comparisons, but not identification
But patient 3-digit zip codes are permitted by HIPAA in NY
And that will identify the hospital!!!!
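The worry on this slide, that fields which are each permitted can still single out a patient or a hospital in combination, is commonly probed by counting how many records share each combination of quasi-identifiers. The sketch below does that on made-up records. The field names and the threshold `k` are assumptions for illustration, and passing such a check is not the same as meeting HIPAA's de-identification standard.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up de-identified records; zip3 is the 3-digit zip code prefix.
records = [
    {"zip3": "100", "birth_year": 1950, "sex": "F"},
    {"zip3": "100", "birth_year": 1950, "sex": "F"},
    {"zip3": "100", "birth_year": 1950, "sex": "F"},
    {"zip3": "104", "birth_year": 1950, "sex": "F"},
    {"zip3": "104", "birth_year": 1987, "sex": "M"},
]

# Combinations seen fewer than 3 times are candidates for re-identification.
risky = small_groups(records, ["zip3", "birth_year", "sex"], k=3)
```

The same counting applies to the hospital question: if a 3-digit zip maps to a single health system in the data, the proxy id for that system no longer hides anything.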
- 14.
REGULATORY
Lawyers do not agree on what constitutes
re-identification under HIPAA
I can identify cohorts for prospective studies
from the collected data.
Can I give those anonymized ids back to the
hospital they came from to ask that the patients be
contacted for consent to participate in the study?
Or does that constitute knowingly using
de-identified data for re-identification purposes,
even though I will never see the patient identity?
- 15.
CLINICAL GENOMICS
Many more challenges
Identifiable information
Many types of data
Electronic Health Records
Genomic Data
Personally reported data
Device data
Image data
Current Auto-Immune Disease Project uses most
of these and more
- 16.
LINKING TO OTHER DATA
Prospective studies with additional (possibly
identifiable) data collection
Linking to genomic data
Linking to personal device data, patient-provided
data, etc.
How do we isolate identifiable information
from the de-identified data, to prevent
re-identification, and still allow the data to be
linked for studies with appropriate consents?
A security question!!!!
- 17.
HOW DO WE CONNECT THIS TO
GENOMIC DATA?
Genomic data does not fall under HIPAA – yet
But it is considered “identifying information”
Does accessing genomic data and the
de-identified patient data by matching
anonymized ids constitute re-identification of
the de-identified data?
We may need to keep a new copy (consented)
of the same data for each project.
- 18.
PCORI: A MIX OF PRIVACY,
REGULATIONS AND SECURITY ISSUES
Are we using the data in acceptable ways
without explicit patient consent?
Are we meeting HIPAA regulations around
re-identification and use of limited data sets?
Do we have adequate security around data
transfers and access control from external
networks (e.g., PCORnet)?
- 19.
MAINTAINING A GENOMIC DATA
WAREHOUSE
- 20.
NYGC’S GOAL IS TO ENABLE DATA
SHARING!
Collecting yet more data
Maintaining a catalog of data hosted by
collaborators
Security for multi-tenancy models also!
Secure transmission of data among
collaborators
Maintaining our own data securely
- 21.
DATA SECURITY IS VERY GRANULAR
Protecting researchers from themselves
Ensure protection of unpublished data
IRB approvals and informed consents limit who can use data
Researchers don’t always understand the details
Project-level access control works initially
But data sharing agreements can allow access to only some
samples in a project for secondary use
Check boxes on informed consents are a big culprit
And sample-level security is insufficient because owners of
data may allow the same samples to be used in multiple
studies
But preclude researchers in one study from seeing results of
others
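The argument on this slide is that neither a project-level nor a sample-level permission is enough: access has to be expressed per (study, sample) pair, and results have to be scoped to the study that produced them. A minimal sketch of that model follows. The `AccessPolicy` class, its method names, and the study and sample ids are hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    # (study, sample) pairs approved by consent or a data sharing agreement
    approved: set = field(default_factory=set)
    # user -> set of studies the user belongs to
    membership: dict = field(default_factory=dict)

    def approve(self, study, sample):
        self.approved.add((study, sample))

    def join(self, user, study):
        self.membership.setdefault(user, set()).add(study)

    def can_read_sample(self, user, study, sample):
        """A user reads a sample only through a study approved for that sample."""
        in_study = study in self.membership.get(user, set())
        return in_study and (study, sample) in self.approved

    def can_read_result(self, user, result_study):
        """Results are visible only to members of the study that produced them."""
        return result_study in self.membership.get(user, set())

policy = AccessPolicy()
policy.approve("study_1", "S001")
policy.approve("study_1", "S002")
policy.approve("study_2", "S001")  # same sample reused; S002 is not shared
policy.join("alice", "study_1")
policy.join("bob", "study_2")
```

Here `bob` can read sample `S001` through `study_2`, cannot read `S002`, and cannot see any `study_1` results even though both studies use `S001`.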
- 22.
OPTIONS FOR ACCESS CONTROL
Force all access through a catalog
Doesn’t work for methods requiring file paths
Users hate it
FUSE file systems
User-space virtual file system
Too slow
Linux access control
Doesn’t work with NFS V3
NFS allows only 16 groups per user
That limits everyone to 16 project-sample combinations
And it doesn’t work with databases!!
May well need cell-level access within databases
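The arithmetic behind the 16-group limit can be made concrete. If each project-sample combination is modeled as one Unix group, a user needs one group membership per combination they may read, and NFS requests authenticated with AUTH_SYS credentials carry at most 16 group ids. The sketch below counts the groups a hypothetical user would need. The grant names and the one-group-per-combination mapping are assumptions for illustration.

```python
NFS_AUTH_SYS_MAX_GROUPS = 16  # AUTH_SYS credentials carry at most 16 group ids

def groups_needed(grants):
    """One Unix group per distinct (project, sample_set) combination."""
    return {f"{project}__{sample_set}" for project, sample_set in grants}

def fits_in_nfs(grants, limit=NFS_AUTH_SYS_MAX_GROUPS):
    """True if the user's grants can be expressed within the group limit."""
    return len(groups_needed(grants)) <= limit

# Hypothetical user: 5 projects, each with 4 separately consented sample subsets.
grants = [(f"proj{p}", f"subset{s}") for p in range(5) for s in range(4)]
needed = len(groups_needed(grants))
```

With 5 projects and 4 subsets each, the user needs 20 groups, which already exceeds the limit.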
- 23.
SECURITY OF GENOMIC DATA
Supporting prospective studies means maintaining
identifiable data
As does storing genomic data – connected or not
Our infrastructure is FISMA-moderate compliant
Is this sufficient?
BAM files are too big to encrypt at rest and still
access in pipelines!!
Hardware assisted encryption still takes 3 hours to
decrypt a BAM file
Encrypted disk may be sufficient, but is expensive
at the very least
Can’t follow standard HIPAA/HITECH suggestions
- 24.
EDGE SECURITY
Edge Security
We’re FISMA moderate compliant
We’ve passed pharma security audits
We’ve passed independent security audits
We regularly do penetration testing
We monitor logs
Is this sufficient?
We’ll never be entirely sure
- 25.
THE BALANCING ACT!
Collaboration vs. Restrictions
- 26.
ACKNOWLEDGEMENTS
PCORI
Rainu Kaushal (Cornell, PCORI PI)
George Hripcsak (Columbia)
Parsa Mirhaji (Montefiore)
Alex Low (Cornell)
Tom Check (Healthix)
Tom Campion (Cornell)
Deborah Ascheim (Mt Sinai)
Many others
Rockefeller
Mayu Frank
Dana Orange
NYGC
Cristyn Kells
Dorian Leary
Uday Evani
Nina Lapchyk
Shailu Gargeya
Chris Black
Scott Collins
Jen Baldwin
Bob Darnell
Cornell Tech
Deborah Estrin
Funded In Part by the Patient-Centered Outcomes Research Institute