
Improving Healthcare Outcomes By Leveraging Data for Secondary Use

Healthcare organizations are responding to meaningful use and accountable care initiatives that focus on increasing the quality of care, improving patient safety and reducing costs. Analyzing patient-level data in electronic health records for secondary use is critical to driving these initiatives successfully.

Privacy, compliance, and data analytics professionals will learn how their organizations can:

- Comply with HIPAA-based requirements for anonymizing unstructured data for secondary use;
- Discover personal information in text-based formats and apply risk-based rules to their de-identification;
- De-identify and mask unstructured data in text and XML formats; and,
- Determine risk metrics associated with anonymization and its quality for analysis.

To listen to this presentation, please visit https://vimeo.com/80921698/settings

Transcript of "Improving Healthcare Outcomes By Leveraging Data for Secondary Use"

  1. Improving Healthcare Outcomes by Leveraging Unstructured Data for Secondary Use
     How Organizations Can Anonymize Unstructured Data in Electronic Health Records
     www.privacyanalytics.ca | 855.686.4781 | info@privacyanalytics.ca
  2. Our Presenters
     Dr. Khaled El Emam is the founder and CEO of Privacy Analytics Inc.
     Chris Wright is Privacy Analytics’ VP Marketing, and will be your moderator today.
  3. Agenda
     • Privacy Analytics – Overview
     • The Proliferation of Unstructured Data in Healthcare
     • The Role of Unstructured Data in Improving Healthcare Outcomes
     • Key Steps to Anonymize Unstructured Data
       – How to Mitigate the Risk of Re-identification
       – What Are Your Organization’s Compliance Considerations
       – Anonymization Test Case
     • DEMO
     • Summary
     • Question and Answer
  4. Privacy Analytics - Overview
     For organizations that want to safeguard and enable their personal information for secondary use …
     • Purpose-built software that automates the de-identification and masking of data, using a risk-based approach to anonymize personal information in compliance with HIPAA requirements
     • Integrated capabilities to anonymize structured and unstructured data from multiple sources
     • Peer-reviewed methodologies and value-added services that certify data for secondary use
  5. Secondary Use for Healthcare Data
     Definition: Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing, and other business applications, including strictly commercial activities. [1]
     [1] Definition sourced from the white paper “Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper”, J Am Med Inform Assoc 2007;14:1-9, doi:10.1197/jamia.M2273
  6. Our Customers
     State of Louisiana Department of Preventative Medicine
  7. The Changing Healthcare Data Landscape
     Richer levels of aggregated data, but increasingly granular with the view of capturing the totality of patient information, experiences and interactions
     Source: McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013
  8. The Proliferation of Unstructured Data
     According to IBM, Ovum and other researchers, 80-90 percent of all medical data today is unstructured ... and that volume is doubling every five years. [1]
     • Electronic health records, where personal information resides in XML as free-form text and needs to be anonymized for analysis
     • Medical devices, where unstructured data or free-form text from machine “dumps” (e.g., x-ray machines or CAT scans) is sent to one or more databases for analysis
     • Online forums, where patients or providers discuss their conditions or cases, requiring anonymization to facilitate sentiment analysis and other forms of information analysis
     [1] http://ovum.com/2012/05/11/unlocking-the-potential-of-unstructured-medical-data/
  9. Growth of Unstructured Data in EHRs
     • 3X increase for EHR systems with basic clinician notes since 2008
     • 11X increase in the adoption of comprehensive EHR systems since 2008
     Source: The Office of the National Coordinator for Health Information Technology, ONC Data Brief, March 2013
  10. Creating the Conditions for an Analytic Pipeline
      EHR unstructured data is rich with insight, but requires another step to optimize its utility for secondary use and derive actionable insight.
      Creating an Analytic Pipeline for Unstructured Data for Secondary Use: Unstructured data’s utility for action is shaped by the relative degree of compliance, risk and anonymization applied.
      [Diagram: EHR unstructured data (discharge summaries, physician notes, XML code, text fields, comments, scanned docs) becomes anonymized data, which feeds reporting and advanced analytics]
  11. Balancing Privacy and Utility for Secondary Use
      1 Data Quality: Ensuring de-identified data has analytic usefulness by determining its relative risk associated with its disclosure, sharing and re-sale
      2 Analytic Granularity: Allowing users to configure de-identification for aggregated and micro-level analysis of patient-level data without compromising privacy and costly breaches
      3 Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types
  12. Agenda
      • Privacy Analytics – Overview
      • The Proliferation of Unstructured Data in Healthcare
      • The Role of Unstructured Data in Improving Healthcare Outcomes
      • Key Steps to Anonymize Unstructured Data
        – How to Mitigate the Risk of Re-identification
        – What Are Your Organization’s Compliance Considerations
        – Anonymization Test Case
      • DEMO
      • Summary
      • Question and Answer
  13. Types of Text
      • Files (text files, XML files, other formats that can be converted to text such as Word and PDF)
      • Text fields in a database (e.g., notes and comments fields)
  14. Example Text Information
      Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm. Patient address: shipwreckkelly@gmail.com
  15. Detecting Personal Information
      Direct Identifiers          Indirect Identifiers
      First name                  City
      Middle name                 State
      Last name                   Country
      Street                      ZIP Code
      PO Box                      Postal Code
      Email address               Organization (facility) name
      IP address                  Age
      Phone number                Date
      ID (e.g., SSN and CC)
  16. Detecting Personal Information
      Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in July of 2002. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm. Patient address: shipwreckkelly@gmail.com
  17. Anonymizing the Information
      • Redact: “*****”
      • Redact & Tag: <Firstname index=1/>
      • Randomize and Replace: replaces the value with a randomly generated value
      • Special generalization rules for dates and ZIP Codes
  18. Redact and Tag
      Ms. <Lastname index=1/>, admitted <Date index=1/>, is an <Age index=1/> woman with a history of diverticulitis who was found to have colon cancer on colonoscopy, which was performed in <Date index=2/>. An invasive moderately differentiated adenocarcinoma was noted in the transverse colon at 80 cm. Patient address: <Email index=1/>
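The redact-and-tag output on slide 18 can be approximated with a small sketch. This is an illustration only, not how PARAT works: the regular expressions, the `redact_and_tag` function and the `surnames` list are all assumptions made for this example, and real detection needs far broader coverage than three patterns and a name list.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Illustrative patterns only (assumptions for this sketch, not the vendor's rules).
PATTERNS = [
    ("Email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("Date", re.compile(
        rf"\b\d{{2}}/\d{{2}}/\d{{4}}\b|\b(?:{MONTHS})\s+(?:of\s+)?\d{{4}}\b")),
    ("Age", re.compile(r"\b\d{1,3}-year-old\b")),
]


def redact_and_tag(text, surnames=()):
    """Replace detected values with <Tag index=N/> placeholders.

    `surnames` is a hypothetical dictionary of last names to look for.
    Within one call, the same value always receives the same index.
    """
    patterns = list(PATTERNS)
    if surnames:
        alternation = "|".join(re.escape(name) for name in surnames)
        patterns.append(("Lastname", re.compile(rf"\b(?:{alternation})\b")))

    for tag, pattern in patterns:
        seen = {}  # detected value -> index, kept separately per tag type

        def replace(match, tag=tag, seen=seen):
            index = seen.setdefault(match.group(0), len(seen) + 1)
            return f"<{tag} index={index}/>"

        text = pattern.sub(replace, text)
    return text


note = ("Ms. Semenza, admitted 07/19/2002, is an 84-year-old woman. "
        "Colonoscopy was performed in July of 2002. "
        "Patient address: shipwreckkelly@gmail.com")
print(redact_and_tag(note, surnames=["Semenza"]))
```

Keeping one index counter per tag type is what makes `<Date index=1/>` and `<Date index=2/>` distinguishable after redaction, while a repeated value would reuse its earlier index.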
  19. De-identification Standards
  20. Performance Standards
                                Direct Identifiers   Indirect Identifiers
      Recall                    > 95%                Risk-Based Threshold
      Precision                 > 80%                > 70%

      Example on i2b2 Data Set
                                Direct Identifiers   Indirect Identifiers
      Recall (all-or-nothing)   0.95 – 1.0           0.78 – 1.0
      Precision                 0.93 – 1.0           0.8 – 1.0
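For readers unfamiliar with the two metrics on slide 20, the following minimal sketch computes standard recall and precision over sets of annotated identifiers. The `recall_precision` function and the `(start, end, type)` span representation are assumptions made for illustration. The slide does not define its "all-or-nothing" recall variant, so that variant is not reproduced here.

```python
def recall_precision(gold, predicted):
    """Return (recall, precision) for detected identifiers.

    gold:      identifiers a human annotator marked, e.g. (start, end, type) tuples
    predicted: identifiers the detection step flagged, in the same representation
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Recall: share of real identifiers that were found (misses leak personal data).
    recall = true_positives / len(gold) if gold else 1.0
    # Precision: share of flagged items that were real (false alarms cost data utility).
    precision = true_positives / len(predicted) if predicted else 1.0
    return recall, precision


gold = {(4, 11, "Lastname"), (22, 32, "Date"), (40, 51, "Age"), (90, 114, "Email")}
predicted = {(4, 11, "Lastname"), (22, 32, "Date"), (40, 51, "Age"),
             (60, 74, "Lastname"), (120, 125, "Date")}
recall, precision = recall_precision(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this made-up example three of the four annotated identifiers are found (recall 0.75) and three of the five flagged spans are correct (precision 0.60), so it would fall short of both thresholds the slide lists for direct identifiers.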
  21. Referential Integrity
  22. PARAT Software
      Providing organizations with a robust, scalable set of capabilities to anonymize structured and unstructured data
      • Use a standard and configurable dictionary (gazetteer) to enable faster and more accurate discovery
      • Automate integration with different data sources and applications
      • Match masked personal information to the corresponding anonymized unstructured text data
      • Tag-based indexing to ensure personal information (e.g., the name Chris) is replaced consistently throughout the database
      • Modular architecture for optimal extensibility
      Stronger Safeguards. Richer Analysis. Integrated.
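The idea of replacing a value such as the name Chris consistently across records can be sketched as a shared lookup table. This is a hypothetical illustration, not PARAT's implementation: the `ConsistentReplacer` class, its surrogate pool and the seed are all invented for this example.

```python
import random


class ConsistentReplacer:
    """Map each original value to one surrogate and reuse it everywhere."""

    def __init__(self, surrogates, seed=0):
        # Shuffle a private copy so assignments do not follow the input order.
        self._pool = list(surrogates)
        random.Random(seed).shuffle(self._pool)
        self._mapping = {}

    def replace(self, value):
        if value not in self._mapping:
            if not self._pool:
                raise ValueError("surrogate pool exhausted")
            # Each surrogate is handed out once, so distinct originals stay distinct.
            self._mapping[value] = self._pool.pop()
        return self._mapping[value]


replacer = ConsistentReplacer(["Alex", "Sam", "Jordan"])
structured_field = replacer.replace("Chris")   # e.g. a name column in a table
free_text_mention = replacer.replace("Chris")  # the same name found in a note
print(structured_field == free_text_mention)
```

Sharing one mapping between the structured fields and the free text is what keeps the two views of a record linkable after anonymization. The mapping itself is sensitive and would need to be protected or discarded.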
  23. PARAT 5.3
  24. Summary
      • EHRs represent a rich and growing source of unstructured data for secondary use
      • Anonymization needs to be understood as a critical step in creating a reporting and analytic pipeline that optimizes data utility and is compliant with legal requirements
      • Defensible, compliant anonymization of free-form text is possible
      • Anonymization can be completed across unstructured and structured data to attain consistency
      • This can scale to large volumes of data and flat files
  25. Learn More …
      • Let us know if you’d like to learn more. We have experts available for either a demo, or a 30-minute workshop to better understand your anonymization needs for structured and unstructured data. You can reach us at info@privacyanalytics.ca
      • We also have several events upcoming:
        – December 5: Privacy by Design User Forum @ Fairmont Royal York, Toronto, Ontario
        – December 6: Twin Cities Health Privacy Summit @ Mayo Clinic, Minneapolis, Minnesota
        – December 11-12: Health Data Summit, NAHDO 28th Annual Conference, Denver, Colorado
  26. Question and Answer