This document provides an overview of challenges related to deidentifying and masking data. It begins with a disclaimer and then lists topics to be covered, including capturing requirements, definitions and terminology, and data governance roles and responsibilities. Definitions of protected health information and personally identifiable information are given. The document discusses Idaho data breach laws and notification requirements. Techniques for data masking like substitution, shuffling, and encryption are defined. Links to resources on deidentification, data masking, and data privacy are provided.
1. Dogs and Masks:
The Challenges of Deidentifying
and Masking Data
Sandy Dunn, CISO, Blue Cross of Idaho
August 2, 2018
*** Disclaimer ***
The views and opinions in this presentation are my own and do not represent the views or endorsement of my
employer, Blue Cross of Idaho. All the information is publicly available.
3. Last Presentation Summary
My job as CISO
Data is the New Oil
Leverage similar historical problems
Don’t do security stuff without looking at the problem holistically
Data governance roles and responsibilities
CISO
5. Capturing Requirements
HIPAA PHI: List of 18 Identifiers
1. Names
2. All geographical subdivisions smaller than a State
3. All elements of dates (except year) for dates directly related to an individual, including birth date,
admission date, discharge date, date of death
4. Phone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full-face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
(note this does not mean the unique code assigned by the investigator to code the data)
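A few of the identifiers above (phone numbers, e-mail addresses, Social Security numbers, IP addresses) have recognizable formats, so a first-pass scan of free text can be sketched with regular expressions. This is a minimal illustration only: the patterns, the `PATTERNS` table, and the `redact` function are assumptions made for this example, they cover US-style formats only, and they will miss names, dates, and most of the other identifiers on the list.

```python
import re

# Illustrative patterns only (assumed for this sketch); real PHI detection
# needs far broader coverage than four regular expressions.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text):
    """Replace each matched identifier with a bracketed label such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


if __name__ == "__main__":
    sample = "Call 208-555-0100 or mail jane.doe@example.com, SSN 123-45-6789"
    print(redact(sample))
```

Pattern matching like this is useful for finding candidates to review, not as proof that a data set is deidentified.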
6. State Data Breach
Federal laws related to cybersecurity are sector-specific, meaning
they apply only to a particular industry such as financial or healthcare.
7. Idaho Data Breach Laws:
Notification Requirements and Penalties
Idaho state law requires businesses to notify affected individuals of a breach as soon as possible, unless a
“good-faith, reasonable, and prompt” investigation reveals that the personal information has not and
will not be misused.
This law also applies to businesses that maintain personal data for another entity.
Businesses that fail to notify can be fined up to $25,000 per breach.
Definition of protected information: a combination of (1) name or other identifying info, PLUS (2) one or
more of these "data" elements: SSN; driver's license number; or account number, credit card number,
debit card number if accompanied by PIN, password, or access codes.
Notification is required only for breaches that “materially compromise the security, confidentiality, or integrity
of” PI.
Notification can be written, by phone, or electronic.
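The "name PLUS data element" definition above is a simple boolean rule, which can be sketched as a check over a record. This is only one reading of the slide's summary: the field names (`name`, `other_identifier`, `ssn`, `drivers_license`, `account_number`, `credit_card`, `debit_card`, `access_code`) are hypothetical, and the sketch assumes the PIN/password/access-code condition applies to all three financial numbers.

```python
def is_protected_information(record):
    """Return True if a record matches the slide's two-part definition.

    `record` is a dict using hypothetical field names chosen for this sketch.
    """
    # Part 1: a name or other identifying information.
    has_identity = bool(record.get("name") or record.get("other_identifier"))

    # Part 2: at least one sensitive data element.
    has_ssn = bool(record.get("ssn"))
    has_license = bool(record.get("drivers_license"))
    # Assumption: a financial number only counts when an access code
    # (PIN, password, or similar) accompanies it.
    has_financial_number = bool(
        record.get("account_number")
        or record.get("credit_card")
        or record.get("debit_card")
    )
    has_financial = has_financial_number and bool(record.get("access_code"))

    return has_identity and (has_ssn or has_license or has_financial)
```

For example, a record with a name and a credit card number but no access code would not meet the definition under this reading, while the same record with a PIN would.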
9. Terms
Data masking, or data obfuscation, is the process of hiding original data with random or altered characters so that
the resulting data is untraceable to the original.
• Static: data tables are loaded to a separate environment, and data masking rules are applied to stable (inactive) data. Used for dev / test.
• On-the-fly: data is transferred from environment to environment without touching a disk on its way. The same technique as
"Dynamic Data Masking" is applied, but one record at a time. Most useful for CI/CD environments. It sends small subsets of masked testing data from
production to development / test.
• Dynamic: masking happens at runtime, on demand. It is attribute-based and policy-driven.
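To make "attribute-based and policy-driven" concrete, here is a minimal sketch of dynamic masking applied at read time. The stored record is never changed; what the caller sees depends on a policy keyed by field name. The `POLICY` table, the role names, and the functions `mask_value` and `read_record` are all hypothetical, invented for this illustration rather than taken from any product.

```python
# Hypothetical policy: field name -> roles allowed to see the clear value.
# Fields not listed in the policy are treated as non-sensitive.
POLICY = {
    "ssn": {"claims_analyst"},
    "date_of_birth": {"claims_analyst", "actuary"},
}


def mask_value(value):
    """Hide a value by replacing every character with an asterisk."""
    return "*" * len(str(value))


def read_record(record, role):
    """Return a view of `record` masked according to POLICY for `role`."""
    view = {}
    for field, value in record.items():
        allowed_roles = POLICY.get(field)
        if allowed_roles is None or role in allowed_roles:
            view[field] = value  # caller may see the clear value
        else:
            view[field] = mask_value(value)  # masked on demand, at runtime
    return view


if __name__ == "__main__":
    stored = {"name": "Pat", "ssn": "123-45-6789", "date_of_birth": "1980-01-02"}
    print(read_record(stored, "actuary"))
```

Because masking happens per request, the same stored row can serve several roles without maintaining separate masked copies, which is the main contrast with static masking.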
Techniques
• Substitution: another authentic-looking value is substituted for the existing value.
• Shuffling: similar to substitution, but the substitution set is derived from the same column of data that is being masked. In very
simple terms, the data is randomly shuffled within the column.
• Number and date variance: if the overall data set needs to retain demographic and actuarial data integrity, applying a random numeric
variance of +/- 120 days to date fields preserves the date distribution but still prevents traceability back to a known entity based on their
known actual date of birth or a known date value of whatever record is being masked.
• Encryption: a key is used to grant visibility to the data.
• Masking out: character scrambling or masking out of certain fields.
• Synthetic or hypothetical data: completely made-up data.
https://en.wikipedia.org/wiki/Data_masking
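Three of the techniques above (shuffling, date variance, and masking out) are small enough to sketch with the Python standard library. The function names and parameters here (`shuffle_column`, `vary_date`, `mask_out`, `max_days`, `visible`) are assumptions made for this illustration; a seeded `random.Random` is used only to make the demo repeatable and is not a cryptographically secure source of randomness.

```python
import random
from datetime import date, timedelta


def shuffle_column(values, rng):
    """Shuffling: reorder the values of one column among its own rows."""
    shuffled = list(values)  # copy so the original column is untouched
    rng.shuffle(shuffled)
    return shuffled


def vary_date(original, rng, max_days=120):
    """Date variance: shift a date by a random offset within +/- max_days."""
    return original + timedelta(days=rng.randint(-max_days, max_days))


def mask_out(value, visible=4, mask_char="X"):
    """Masking out: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)  # too short to reveal anything
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed for a repeatable demo only
    print(shuffle_column(["Boise", "Nampa", "Meridian", "Eagle"], rng))
    print(vary_date(date(1980, 6, 15), rng))
    print(mask_out("123-45-6789"))
```

Note that shuffling keeps every real value in the column, so it protects the link between a value and its row rather than the values themselves; small or highly unique columns may still be traceable.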
10. Discussion Topics
How do we get started in driving the importance of Data Security throughout the company?
What does leadership need to do to drive Data Security effectiveness and ensure that Data Security is moving forward?
What is the most important Data Security item we should focus on today?
How do you recommend setting up and managing system access?
What is your process to identify, track and classify data?
How do you work around “Shadow IT” when it comes to Data Security?
Network Segmentation
License issues
Structured vs Unstructured
Information Classification
12. Data Governance
Business Owner
Legal / Compliance / Enterprise Risk
Data Governance
Cybersecurity
Data Stewardship
Identify data roles & responsibility
Define Requirements SME Audit / Enforce
Structured / Unstructured
Own process / workflow
Requirements How Find / Enforce
Data Classification
Public
Restricted
Confidential
Do Define Monitor use Enforce
Implement Controls
Data Quality Only Good Data Enforce Requirements How
Data Management
Building the full data lifecycle
Do Requirements How Protect
13. Links to Tools and Papers
NISTIR 8053 De-Identification of Personal Information https://nvlpubs.nist.gov/nistpubs/ir/2015/nist.ir.8053.pdf
HiTrust De-Identification Framework https://ecfsapi.fcc.gov/file/60001569792.pdf
A Beginner's Guide to Data Masking - Imperva HTTP://www.poer.ro/wp-content/uploads/2018/01/Camouflage_Data_Masking_Beginners.pdf
Practical Implications of Sharing Data: A Primer on Data Privacy, Anonymization, and De-Identification
https://support.sas.com/resources/papers/proceedings15/1884-2015.pdf
Securing Sensitive Data in Databases & Datalakes Using Cirro Data Puppy
https://s3.amazonaws.com/cirro.com/downloads/cirro-data-migrator-whitepaper.pdf