Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

How to Get More Granular Data Analysis by Automating Statistical De-identification
(View webinar by visiting this link: https://vimeo.com/90976799)

Leveraging healthcare data for secondary use can have a positive impact on the efficacy of healthcare service delivery, patient care and population health analysis. And yet, traditional approaches to anonymization limit the ability of statisticians and analysts to gain more granular insight into data sets used for secondary purposes.

Part one of this webinar series examines how organizations can automate statistical de-identification to optimize the application of business intelligence and advanced analytics, enabling more granular-level analysis of anonymized data sets.

Privacy, compliance and data analytics professionals will learn how their organizations can:

  • Apply risk-based approaches to anonymize data based on the situational and governing principles of its use;
  • Automate statistical de-identification of data sets to maximize the application of business intelligence and predictive software insights; and,
  • Understand how other organizations have applied statistical de-identification to data, with real-world examples and demos.

Published in: Healthcare, Technology, Business


  1. WEBINAR: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data. How to Get More Granular Analysis by Automating Statistical De-identification. www.privacyanalytics.ca | 855.686.4781 | info@privacyanalytics.ca | 251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6
  2. © 2014 Privacy Analytics, Inc. Presenters:
       • Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.
       • Chris Wright, Vice President, Marketing, and today's moderator, Privacy Analytics, Inc.
       • Grant Middleton, Solution Architect, Privacy Analytics, Inc.
  3. Agenda
       1. Analytics Meets Privacy: Balancing Healthcare Imperatives with the Pressing Need to Know More
       2. Incorporating Risk-based Approaches to Anonymization
       3. Gaining Analytic Utility and Value from Anonymized Data Sets: A Case Study
       4. Demonstrating the Application of Business Intelligence and Predictive Modelling to Anonymized Data
       5. Summary
       6. Question and Answer
  4. Privacy Analytics: For organizations that want to safeguard and enable their data for secondary use …
       • Software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information
       • Integrated capabilities to anonymize structured and unstructured data from multiple sources
       • Peer-reviewed methodologies and value-added services that certify data as de-identified using the expert statistical method under HIPAA
  5. Creating Value with Analytics … To control healthcare costs and drive greater efficiencies, organizations will need to become more rigorous in their management, analysis and governance of data and its privacy. Source: McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013
  6. Which is Difficult in an Analytic Ecosystem (diagram not reproduced). Source: Misha Paval, Computer & Information Science and Engineering Directorate, Information & Intelligent Systems Division, National Science Foundation, Webinar, January 2012
  7. Reconciling Analytic Imperatives with Privacy. Diagram labels: Population Health; Regulation; Comparative Benchmarking; Releasing Data; Detecting Fraud; Monetizing Data; Compliance; Accelerating Research; Data Complexity; Re-identification Risk; Post-marketing Surveillance; Data Breach; Marketing; Reputation; Ethics
  8. Bridging Analytics and Privacy: Leveraging Data for Secondary Use. Secondary use of health data applies outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing and other business applications.
  9. Anonymizing Data for Analytic Utility. To allow for richer analysis of anonymized data within the well-defined regulatory and internal privacy protocols of our customers …
       1. Identify and classify variables in the data
       2. Mask direct identifiers
       3. Determine the threshold for de-identification
       4. De-identify indirect identifiers
       5. Reporting and certification
     Diagram labels: Analytic Utility; Privacy Governance; Current Masking Software; Greater Analytic Utility
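Steps 1 and 2 above could look roughly like the following in Python. This is a minimal sketch, not the vendor's implementation: the column names, the classification table and the salted-hash pseudonym scheme are all assumptions made for illustration, and a truncated hash on its own should not be read as adequate protection.

```python
import hashlib

# Step 1 (hypothetical): classify each column. The column names echo the
# sample record on the slides; the classification itself is an assumption.
VARIABLE_CLASSES = {
    "name": "direct",
    "email": "direct",
    "mrn": "direct",
    "birth_date": "indirect",
    "zip_code": "indirect",
    "income": "other",
}


def mask_direct_identifiers(record, salt):
    """Step 2: replace every direct identifier with a salted-hash pseudonym.

    Indirect identifiers and other fields are passed through unchanged;
    they are handled later, in the de-identification step.
    """
    masked = {}
    for column, value in record.items():
        if VARIABLE_CLASSES.get(column) == "direct":
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[column] = digest[:12]  # shortened pseudonym, illustration only
        else:
            masked[column] = value
    return masked


record = {
    "name": "Chris Wright",
    "email": "cwright@",
    "mrn": "123",
    "birth_date": "1978-01-15",
    "zip_code": "12345",
    "income": 82000,
}
masked = mask_direct_identifiers(record, salt="example-salt")
print(masked)
```

Using the same salt for every record keeps pseudonyms consistent across tables, so masked records can still be joined; that is a design choice of this sketch rather than something the slides specify.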
  10. Gaining Richer Analytic Value: Statistical de-identification allows for richer data analysis. The slide compares the Safe Harbor Method (Data Masking) with Expert Determination (Statistical De-identification) on primary structured and unstructured data (internal structured data, external structured data, and EMR data and notes). Sample values shown:
       • Original record: MRN: 123; cwright@; Chris Wright; Born Jan 15, 1978; Zip code: 12345; Income = $82,000; Plan # 54678
       • Original EMR data and notes at last PCP visit: Admission date: 08/05/2012; Discharge date: 08/07/2012
       • De-identified record: MRN: 589; rwong@; Robert Wong; Born Sept 18, 1978; Zip code: 12346; Income = $82,000; Plan # 65123
       • De-identified EMR data and notes at last PCP visit: Admission date: 08/08/2012; Discharge date: 08/10/2012
  12. Presenter: Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.
  13. Statistical De-identification Method: Managing the Risk of Re-identification. De-identification process:
       1. Set Risk Threshold: Based on the characteristics of the data recipient, the data, and precedents, a quantitative threshold is set.
       2. Measure Risk: Based on plausible re-identification attacks, appropriate metrics are selected and used to measure actual re-identification risk from the data.
       3. Apply Transformations: If the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.
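The three-step process above can be sketched as a loop: set a threshold, measure risk, and transform until the measurement meets the threshold. The sketch below is only an illustration under stated assumptions. It measures risk as 1 divided by the size of the smallest group of rows that share the same quasi-identifier values, and its only transformation is truncating ZIP codes. The slides do not say which metrics or transformations the product uses, suppression is not shown, and all function names and sample rows are hypothetical.

```python
from collections import Counter


def max_risk(rows, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of rows that share
    the same values on every quasi-identifier."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return 1.0 / min(groups.values())


def generalize_zip(rows, digits_kept):
    """Keep the first `digits_kept` digits of each ZIP code, star out the rest."""
    return [
        {**row, "zip": row["zip"][:digits_kept] + "*" * (len(row["zip"]) - digits_kept)}
        for row in rows
    ]


def deidentify(rows, quasi_identifiers, threshold):
    """Generalize ZIP codes one digit at a time until risk meets the threshold."""
    digits_kept = 5
    current = rows
    while max_risk(current, quasi_identifiers) > threshold and digits_kept > 0:
        digits_kept -= 1
        current = generalize_zip(rows, digits_kept)
    return current, max_risk(current, quasi_identifiers)


rows = [
    {"zip": "12345", "birth_year": 1978},
    {"zip": "12346", "birth_year": 1978},
    {"zip": "12347", "birth_year": 1978},
    {"zip": "12399", "birth_year": 1978},
]
released, risk = deidentify(rows, ["zip", "birth_year"], threshold=0.5)
print(released, risk)
```

In this toy data set every row starts out unique, so the loop keeps coarsening the ZIP code until all four rows fall into one group. A real tool would choose among many candidate generalizations to lose as little analytic detail as possible.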
  14. Enabling Analytics to Use Anonymized Data: Managing the Risk of Re-identification is our Starting Point. We measure the risk of re-identification along a spectrum of identifiability that takes into account an individual’s data, the mitigating controls that protect it and how the data will be used and governed for secondary purposes. Diagram labels: Individual; Individual’s Data; Mitigating Controls; Analytic Purpose; Protect Individual Privacy; Gain Analytic Value
  15. Re-identification Risk: Example. Two matching quasi-identifiers in three rows.
  16. Identifiability Spectrum and Secondary Use: Range of Operational Precedents. Re-identification risk thresholds are established precedents used by leading research organizations. These thresholds are based on the situational context and mitigating controls associated with a data set’s use for secondary purposes. Data is anonymized based on whether indirect identifiers can be matched within a given cell size. Scale from "Identifiable Information" to "De-identified Information"; values shown: 2, 3, 5, 8, 10, 11, 16, 20
  17. Identifiability Spectrum and Secondary Use: Range of Operational Precedents. Leading research organizations use established precedents as re-identification risk thresholds, depending on how they assess the risk of disclosure. As such, they use a wide variety of operational precedents to trigger the application of anonymization techniques. What we have done is capture and automate them. Scale from "Little De-identification" to "Significant De-identification"; values shown: 2, 3, 5, 8, 10, 11, 16, 20
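One way to read the precedents on the two slides above: if each number on the scale is a minimum cell size k (the smallest number of records allowed to share the same indirect identifiers), then the equivalent maximum probability of re-identification is 1/k. The short sketch below makes that conversion. Treating the plotted numbers as cell sizes, and the 1/k relationship itself, are assumptions that the slides do not state explicitly.

```python
# Numbers shown on the identifiability scale, assumed here to be minimum cell sizes.
CELL_SIZES = [2, 3, 5, 8, 10, 11, 16, 20]


def max_risk_for_cell_size(k):
    """Maximum re-identification probability when every cell holds at least k records."""
    if k < 1:
        raise ValueError("cell size must be at least 1")
    return 1.0 / k


thresholds = {k: max_risk_for_cell_size(k) for k in CELL_SIZES}
for k, risk in thresholds.items():
    print(f"cell size {k:>2} -> maximum risk {risk:.3f}")
```

Under this reading, a larger cell size means a stricter threshold, which fits the direction of the scale from little to significant de-identification.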
  18. Measuring Re-identification Risk
  20. Case Study: EMR Software Vendor. Post-marketing and Public Health Surveillance
     The vendor wanted to anonymize data on 535,595 patients from general practices. The longitudinal data needed to be usable for ongoing and on-demand analytics.
     Challenges:
       • Large, complex data set holding more than five years of clinical, prescription, laboratory, scheduling and billing data on patients
       • Numerous release requests from 2,664 clinics and 5,850 physicians
       • Data complexity: 820 columns in 73 tables
     Analytic outcomes. De-identified data was used to analyze:
       • Post-marketing surveillance of adverse events
       • Public health surveillance
       • Prescription pattern analysis
       • Health services analysis
  21. Assessing Analytic Value: Date Shifting. Length of service (LOS) was compared before and after date shifting was performed on this EMR data. We examined whether the date shifting associated with anonymization lengthens or shortens the LOS for patients. Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse
  22. What We Discovered … Length of service (LOS) was essentially the same before and after statistical de-identification. The mean difference in LOS before and after date shifting is 0.2 days. The expected LOS follows a normal distribution for the 90% of patients shown in this diagram, which makes date shifting seem like a perfectly natural process. Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse
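To see why date shifting can leave length of service largely intact, consider the simplest variant: every date belonging to one patient is moved by the same random offset. The sketch below assumes that variant, and its function names, sample dates and 30-day window are hypothetical. It preserves LOS exactly, whereas the slides report a small mean difference of 0.2 days, which suggests the method used in the case study also perturbs the intervals between dates.

```python
import random
from datetime import date, timedelta


def shift_patient_dates(visits, max_shift_days, rng):
    """Shift every (admission, discharge) pair for one patient by the same
    random number of days, so intervals between dates are unchanged."""
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [(admit + offset, discharge + offset) for admit, discharge in visits]


def length_of_service_days(visits):
    """Days between admission and discharge for each visit."""
    return [(discharge - admit).days for admit, discharge in visits]


rng = random.Random(0)  # seeded only to make the example repeatable
visits = [
    (date(2012, 8, 5), date(2012, 8, 7)),
    (date(2012, 9, 1), date(2012, 9, 4)),
]
shifted = shift_patient_dates(visits, max_shift_days=30, rng=rng)

print(length_of_service_days(visits))
print(length_of_service_days(shifted))
```

Because the offset is drawn once per patient rather than once per date, the order of events and the gaps between visits are kept as well, which is what longitudinal analyses depend on.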
  23. Summary
       • Most analyses performed on clinical databases use descriptive statistics and cross-tabulations
       • Anonymization meets the requirements of these techniques while maintaining the essential analytic utility of the original data
       • Data quality should be evaluated statistically rather than deterministically, by comparing the data set before and after anonymization
       • In short, anonymization allowed this vendor to fully leverage their data for secondary purposes, all within a reasonable range of optimal utility and value
  24. Presenter: Grant Middleton, Solution Architect, Privacy Analytics, Inc.
  26. PARAT: Stronger Safeguards. Richer Analysis. Integrated Solution. Providing organizations with a scalable set of capabilities to automate the anonymization of structured data
       • Evaluate data quality for analysis after de-identification
       • Simulate attacks to determine levels of risk associated with the re-identification of personal information
       • Configure re-identification risk threshold settings directly from Privacy Analytics’ online Risk Assessment application
       • Determine enterprise policies for data sharing using risk-based methodologies for assessing re-identification
       • Automate data sharing agreements and certifications that confirm risks are “very small” for re-identification
  27. PARAT + BI
  28. Presenter: Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.
  29. PARAT + Predictive
  30. Predictive Modelling and Anonymized Data: Before anonymization (model diagram not reproduced). A simple model for a complex problem: predicting the number of days until the next visit. With more variables, and more data prep, you can get a much more accurate model.
  31. Predictive Modelling and Anonymized Data: After anonymization (model diagram not reproduced). But the point is that the results are almost identical. Date shifting, with randomized intervals, allows us to develop predictive models that give us the same answers.
  32. Predictive Modelling and Anonymized Data
       • Predictive modelling, done right, is challenging. You need a rich source of data to begin with. Then you need to clean and format it, so that you have quality data to work with.
       • Anonymization, done right, can provide you with a rich source of data. The data cleaning and formatting will still be there, but no more than before.
       • Predictive modelling can produce the same results, before and after anonymization. Put the time and effort into anonymization so you have quality data to work with.
  34. Balancing Privacy with Data Utility: The Analytic Benefits of a Statistical De-identification Method
       1. Data Quality: Ensuring de-identified data remains analytically useful by minimizing the amount of distortion while still ensuring that re-identification risk is very small
       2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated
       3. Depth of Insight: Enabling analysis of the total patient health experience by compiling a complete picture of this experience from multiple data sources and types
  35. Upcoming Events
       • April 16-17, Healthcare Business Intelligence Forum, Washington, D.C.
       • April 23, Noon EST, The Second Part of Webinar Series: Fear and Loathing Data Monetization
       • May 21-22, e-Health Initiative, Washington, D.C.
       • Take the Anonymization Survey: http://surveys.ronin.com/wix/p1834200753.aspx?src=1
  36. Resources
     If you’d like to learn more, we’re offering Chapter 13 of Anonymizing Health Data free of charge. It provides greater detail on the case study presented today.
       • Learn proven methods for anonymizing health data to share meaningful datasets, without exposing patient identity
       • Leading experts walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of data sets
     Drop me a line if you’d like a copy of the chapter: cwright@privacyanalytics.ca. Also, contact me to learn more: we can set up a personalized demo or have a discussion on your current anonymization needs.
