Accelerate responsible clinical trials data sharing while safeguarding participant privacy

Published in: Technology, Business

  1. 1. WEBINAR: Accelerate Responsible Clinical Trials Data Sharing While Safeguarding Participant Privacy. Privacy Analytics: www.privacyanalytics.ca | 855.686.4781 | info@privacyanalytics.ca | 251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6
  2. 2. Presenters: Chris Wright, Vice President, Marketing and Today’s Moderator, Privacy Analytics, Inc.; Dr. Khaled El Emam, CEO and founder of Privacy Analytics, Inc. © 2014 Privacy Analytics, Inc.
  3. 3. Presenter: Chris Wright, Vice President, Marketing and Today’s Moderator, Privacy Analytics, Inc.
  4. 4. Some Housekeeping: 1. Please be sure to mute your phones. 2. We’ll have a Q&A after the webinar; please type your questions in the dialogue box to your right. 3. We’re giving away copies of our book, Anonymizing Health Data; please click the link below to fill out the form. We’ll send the presentation to everyone after the webinar. http://info.privacyanalytics.ca/anonymizinghealthcaredata.html
  5. 5. Agenda: 1. Overview of Privacy Analytics 2. Background on clinical trials transparency 3. Special considerations when anonymizing clinical trials data 4. A risk-based methodology for data anonymization
  6. 6. About Privacy Analytics: For organizations that want to safeguard and enable their data for secondary use: • Software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information • Integrated capabilities to anonymize structured and unstructured data from multiple sources • Peer-reviewed methodologies and value-added services that certify data as de-identified using the expert statistical method under HIPAA
  7. 7. Presenter: Dr. Khaled El Emam, CEO and founder of Privacy Analytics, Inc.
  8. 8. Agenda: 1. Overview of Privacy Analytics 2. Background on clinical trials transparency 3. Special considerations when anonymizing clinical trials data 4. A risk-based methodology for data anonymization
  9. 9. Industry Principles
  10. 10. • 30 April 2013: Final advice to the European Medicines Agency from the clinical trial advisory group on protecting patient confidentiality • 24 June 2013: Publication and access to clinical trials data (draft policy) • 14 May 2014: Finalisation of EMA policy on publication of and access to clinical trial data • 12 June 2014: European Medicines Agency agrees policy on publication of clinical trial data with more user-friendly amendments
  11. 11. “Adequately de-identified data should be made available for wide access”
  12. 12. © 2014 Privacy Analytics, Inc. 12
  13. 13. What About the FDA?
  14. 14. Direct & Quasi-identifiers. Examples of direct identifiers: name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, VID, license plate number, email address, photograph, biometrics, SSN, SIN, device number, clinical trial record number. Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates, number of children, high level diagnoses and procedures.
  15. 15. Anonymization Landscape
  16. 16. De-identification Standards
  17. 17. HIPAA Safe Harbor Method. Safe Harbor direct identifiers and quasi-identifiers: 1. Names 2. ZIP Codes (except the first three digits) 3. All elements of dates (except year) 4. Telephone numbers 5. Fax numbers 6. Electronic mail addresses 7. Social security numbers 8. Medical record numbers 9. Health plan beneficiary numbers 10. Account numbers 11. Certificate/license numbers 12. Vehicle identifiers and serial numbers, including license plate numbers 13. Device identifiers and serial numbers 14. Web Universal Resource Locators (URLs) 15. Internet Protocol (IP) address numbers 16. Biometric identifiers, including finger and voice prints 17. Full face photographic images and any comparable images 18. Any other unique identifying number, characteristic, or code
  18. 18. Safe Harbor Implementations - I
  19. 19. Safe Harbor Implementations - II
  20. 20. Expert Determination (Statistical) Method: • A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable: I. Applying such principles and methods, determines that the risk is “very small” that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and II. Documents the methods and results of the analysis that justify such determination
  21. 21. Section Takeaways. Current Status: • European regulators are moving in the direction of requiring clinical trials data release • In two stages: redacted CSRs and then data • Industry is already taking the initiative to develop mechanisms for data sharing • There is a dearth of good standards to address privacy concerns
  22. 22. Agenda: 1. Overview of Privacy Analytics 2. Background on clinical trials transparency 3. Special considerations when anonymizing clinical trials data 4. A risk-based methodology for data anonymization
  23. 23. Anonymization Approaches: • Microdata release: individual-level participant data (IPD) is provided to data recipients as flat files (CSV or SAS) or database files – Microdata can be public or available through controlled access • Online portal: data recipients can access IPD through a portal and perform their analysis through the portal only – No raw data download allowed (different control mechanisms used) – Online portal registration can be public or through a qualification process
  24. 24. No Zero Risk
  25. 25. Anonymizing Portal Access: • Is it necessary to anonymize data if it is on a portal? – There are three types of attack: • Deliberate attack by recipient – manage that risk through contracts and audit trails • Data breach – managed by manufacturer through portal controls • Inadvertent re-identification – could happen if the data recipient lives in the same geography as some of the participants – It is inadvertent disclosure risk that needs to be managed in a portal – anonymization is still needed
  26. 26. Rare Diseases: • Clinical trials on participants with rare diseases have very small cohorts – can that data be anonymized? • This depends on a number of factors: – Whether the trial participants represent a fraction of all patients in the relevant geographies with the disease – Whether the rare disease is visible or not – Whether an adversary would know if someone has a rare disease – Whether a portal is used or not • It should not be taken for granted that it is not possible to anonymize rare disease trials
  27. 27. Data Quality Balance
  28. 28. Replicating Results: • Disclosed data should replicate the results of any published studies from the clinical trial • This imposes a stringent standard on any anonymization techniques that are used • It would be challenging for a manufacturer if it were not possible to replicate the results from published studies
  29. 29. What to Expect When Anonymizing: • With sophisticated anonymization techniques, the anonymized data analysis will replicate the conclusions but not necessarily the exact values • With basic anonymization techniques, the conclusions may not be replicated
  30. 30. Anonymizing Dates: • Can convert all dates to intervals from enrollment • However, if the enrollment period was short, then reversing the intervals to a range of possible enrollment dates may be plausible – That risk should be measured rather than assumed – Will depend on whether geography is also known • Date shifting is another scheme, which allows the disclosure of precise dates and can still provide assurances about re-identification risk
  31. 31. Anonymizing Patient Locations: • Most clinical trials do not collect that information for analysis purposes • However, if that information is needed, then geo-clustering of ZIP/postal codes is a good technique for protecting location information • It maintains geospatial specificity
  32. 32. Poor Selection of Pseudonyms
  33. 33. Releasing Site Details: • Replacing the site name with an ID may not always be effective • The highest recruiting sites are likely knowable from clinicaltrials.gov or equivalent registries • A frequency analysis on the data would reveal which site was the highest recruiting (especially if country information is provided) • The risk is from geoproxy attacks – many participants will seek care in facilities close to where they live • For a nontrivial percentage of participants, it may be possible to predict their residence location with some accuracy
  34. 34. Public IPD? • Public IPD will be challenging to anonymize adequately while ensuring exact replication of published results • Public IPD is still useful with that caveat – may be good for summary statistics and the investigation of basic relationships • Therefore this should not be discounted • Needs to be augmented with other data release methods that would allow the disclosure of more detailed data
  35. 35. Data Release Strategy: • Strategy 1: – When a data request is received, the data set is anonymized to specifically meet the data request – Must be repeated for all data requests • Strategy 2: – Create one anonymized data set for each trial; the same complete anonymized data set is released irrespective of the data request – Much more cost effective, but probably provides more data than is needed
  36. 36. The Importance of Governance: • More than just technical approaches are needed • Governance necessary for: – Tracking data users – Stigmatizing analytics reviews – Audits where necessary – Review of anonymization practices – Monitoring legislative and regulatory environment
  37. 37. Section Takeaways. Special Considerations: • Multiple approaches to releasing IPD • Challenges releasing high quality public IPD • Sophisticated anonymization techniques are needed to ensure data quality • Governance also needed (as well as technical approaches). Current Status: • European regulators are moving in the direction of requiring clinical trials data release • In two stages: redacted CSRs and then data • Industry is already taking the initiative to develop mechanisms for data sharing • There is a dearth of good standards to address privacy concerns
  38. 38. Agenda: 1. Overview of Privacy Analytics 2. Background on clinical trials transparency 3. Special considerations when anonymizing clinical trials data 4. A risk-based methodology for data anonymization
  39. 39. Identifiability Spectrum (figure: a scale from “Little De-identification” to “Significant De-identification”, labelled 5, 20, 3, 2, 10, 8, 11, 16). A range of operational precedents exists, based on the situational context of the data’s use and the available mitigating controls that protect it.
  40. 40. Re-identification Risk: Example
      Column groups: direct identifiers (Name, Telephone No.); indirect identifiers (Sex, Year of Birth); sensitive variables (Lab Test, Lab Result); other (Pay Delay)
      ID  Name               Telephone No.   Sex  Year of Birth  Lab Test                    Lab Result  Pay Delay
      1   John Smith         (412) 668-5468  M    1959           Albumin, Serum              4.8         37
      2   Alan Smith         (413) 822-5074  M    1969           Creatine Kinase             86          36
      3   Alice Brown        (416) 886-5314  F    1955           Alkaline Phosphatase        66          52
      4   Hercules Green     (613) 763-5254  M    1959           Bilirubin                   <0          36
      5   Alicia Freds       (613) 586-6222  F    1942           BUN/Creatinine Ratio        17          82
      6   Gill Stringer      (954) 699-5423  F    1975           Calcium, Serum              9.2         34
      7   Marie Kirkpatrick  (416) 786-6212  F    1966           Free Thyroxine Index        2.7         23
      8   Leslie Hall        (905) 668-6581  F    1987           Globulin, Total             3.5         9
      9   Douglas Henry      (416) 423-5965  M    1959           B-type Natriuretic peptide  134         38
      10  Fred Thompson      (416) 421-7719  M    1967           Creatine Kinase             80          21
      Callout (3): two quasi-identifiers (Sex, Year of Birth) match in three records within the dataset (IDs 1, 4 and 9: M, 1959)
  41. 41. Spectrum of Identifiability (figure: a scale from “Little De-identification” to “Significant De-identification”, labelled 5, 20, 3, 2, 10, 8, 11, 16). Leading research organizations apply these precedents to data release for secondary purposes. We’ve embedded these precedents into our software, PARAT CORE.
  42. 42. Managing Re-identification Risk
  43. 43. Complexity Stifles Time to Insight: “… removing patient identifiers and formatting all data sets [ ..] can take up to six months.” (Roche, description of their clinical trials data sharing process for research requests) … and the volume of clinical trials data releases will continue to grow rapidly
  44. 44. Automating Anonymization
  45. 45. Reduce Complexity: Accelerate Data Releases. A scalable set of packaged capabilities that enables the release of anonymized data for analysis quickly, securely and cost-effectively: Automate, Audit, Analyze
  46. 46. Creating Expertise to Govern Data Releases: • Course on risk-based anonymization (2-day): on-site or remote • Exam on body of knowledge and work through case studies • Maintaining knowledge over time through continuous education • Coaching on two data sets • Requires automated support to operationalize
  47. 47. Post-marketing Surveillance. Wanted to anonymize data on 535,595 patients from general practices. Longitudinal data needed to be used for ongoing and on-demand analytics. Challenges: • Significant size of the data set: held more than five years of clinical, prescription, laboratory, scheduling and billing data of patients • Numerous release requests from more than 2664 clinics and 5850 physicians. Analytic Outcomes: de-identified data to analyze: • Post-marketing surveillance of adverse events • Public health surveillance • Prescription pattern analysis • Health services analysis
  48. 48. GI Protocol: • Two-arm protocol; GI events after taking NSAIDs with and without a PPI
  49. 49. Chlamydia Protocol: • Females 14-24 years old inclusive who were tested, and tested positive, for Chlamydia in the previous 12 months
  50. 50. Section Takeaways. Methodology & Software: • A risk-based methodology can be used to release high quality IPD • The process can be automated to accelerate data release, reduce costs, ensure consistency, and provide a defensible result • Can develop internal expertise or outsource the whole data release process. Special Considerations: • Multiple approaches to releasing IPD • Challenges releasing high quality public IPD • Sophisticated anonymization techniques are needed to ensure data quality • Governance also needed (as well as technical approaches). Current Status: • European regulators are moving in the direction of requiring clinical trials data release • In two stages: redacted CSRs and then data • Industry is already taking the initiative to develop mechanisms for data sharing • There is a dearth of good standards to address privacy concerns
  51. 51. Balancing Privacy with Data Utility: The Analytic Benefits of our Approach. 1. Data Quality: ensuring de-identified data has analytic usefulness by minimizing the amount of distortion while still ensuring that re-identification risk is very small. 2. Analytic Granularity: allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated. 3. Depth of Insight: enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types.
  52. 52. Final Thoughts. We’re giving away copies of our Anonymizing Health Data: http://info.privacyanalytics.ca/anonymizinghealthcaredata.html Anonymization Survey: http://surveys.ronin.com/wix/p1834 200753.aspx?src=1 July 14-16: Health Analytics Expo and Symposium, Chicago, IL. Also, contact me to learn more at cwright@privacyanalytics.ca. We can set up a personalized demo or have a discussion on your current anonymization needs. Just drop me a line.
  53. 53. Question and Answer
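
Editorial sketch for slide 40 (Re-identification Risk: Example). The slide shows three records sharing the same two quasi-identifiers. One common way to quantify this, assumed here because the slides do not state a formula, is to group records into equivalence classes on the quasi-identifiers and take 1 / class size as each record's re-identification probability. The function names and the decade generalization step are illustrative and are not taken from the presentation or from the PARAT software.

```python
from collections import Counter

# (id, sex, year_of_birth) copied from the slide 40 example table.
records = [
    (1, "M", 1959), (2, "M", 1969), (3, "F", 1955), (4, "M", 1959),
    (5, "F", 1942), (6, "F", 1975), (7, "F", 1966), (8, "F", 1987),
    (9, "M", 1959), (10, "M", 1967),
]


def equivalence_class_sizes(rows):
    """Map each record id to the number of records sharing its quasi-identifiers."""
    counts = Counter((sex, yob) for _, sex, yob in rows)
    return {rid: counts[(sex, yob)] for rid, sex, yob in rows}


def average_risk(rows):
    """Mean of 1 / class size over all records (an assumed risk metric)."""
    sizes = equivalence_class_sizes(rows)
    return sum(1.0 / size for size in sizes.values()) / len(sizes)


def generalize_to_decade(rows):
    """Hypothetical generalization: replace year of birth with its decade."""
    return [(rid, sex, yob - yob % 10) for rid, sex, yob in rows]


sizes = equivalence_class_sizes(records)
decade_sizes = equivalence_class_sizes(generalize_to_decade(records))

print("class size for ID 1:", sizes[1])
print("average risk, raw:", average_risk(records))
print("average risk, decades:", average_risk(generalize_to_decade(records)))
```

On this ten-row table, IDs 1, 4 and 9 fall in a class of size 3 while every other record is unique, so generalizing the year of birth lowers the average risk only slightly. A real assessment would also weigh the release context and the controls in place, as the slides on the identifiability spectrum point out.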

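
Editorial sketch for slide 30 (Anonymizing Dates). The slide names two schemes: converting dates to intervals from enrollment, and date shifting. The code below illustrates both under stated assumptions: a single random offset per participant, a shift that only moves dates backwards, and a 90-day maximum are choices made for this example, not parameters given in the presentation.

```python
import random
from datetime import date, timedelta


def days_from_enrollment(enrollment, event_dates):
    """Replace each event date with the number of days since enrollment."""
    return [(d - enrollment).days for d in event_dates]


def shift_dates(event_dates, max_shift_days, rng):
    """Shift all of one participant's dates back by the same random offset.

    Using one offset per participant keeps the gaps between that
    participant's events intact (an assumption of this sketch).
    """
    offset = timedelta(days=rng.randint(1, max_shift_days))
    return [d - offset for d in event_dates]


# Hypothetical participant: enrollment date followed by two visits.
enrollment = date(2013, 3, 1)
visits = [date(2013, 3, 1), date(2013, 3, 15), date(2013, 6, 1)]

intervals = days_from_enrollment(enrollment, visits)
shifted = shift_dates(visits, max_shift_days=90, rng=random.Random(0))

print(intervals)
print(shifted)
```

Both schemes preserve the spacing between a participant's events, which is usually what the analysis needs. As the slide notes, whether enrollment dates could still be inferred (for example when the enrollment period was short) is a risk to measure rather than assume.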