With the rise of the Internet of Things, Big Data and Open Data, data privacy is increasingly important to organizations. Data de-identification is a process to remove identifying information from a data set. This presentation will provide a gentle introduction to data de-identification, anonymization and the reverse process of re-identification.
This presentation content is for educational and informational purposes only.
BACKGROUND
➔ Assessment
➔ Treatment
➔ Financing
RISK MANAGEMENT
DE-IDENTIFICATION
De-identification is a process which removes the
association (personal information) between a
subject (person) and another entity (data set).
WHAT IS DE-IDENTIFICATION?
DE-IDENTIFICATION
WHAT IS DE-IDENTIFICATION?
➔ Risk Treatment
◆ Controls: De-identification, ...
DE-IDENTIFICATION
EXAMPLE
Name          Birth Date   Postal Code  Ice Cream
Bob Smith     Jan 1, 1957  K1A 0B1      Chocolate Chip
Alice Wilson  Mar 3, 1963  B1K 1A0      Vanilla
...           ...          ...          ...
Direct Identifier: Name
Indirect (Quasi) Identifiers: Birth Date, Postal Code
DE-IDENTIFICATION
EXAMPLE
ID     Name          Birth Date   Postal Code  Ice Cream
47562  Bob Smith     Jan 1, 1957  K1A 0B1      Chocolate Chip
17236  Alice Wilson  Mar 3, 1963  B1K 1A0      Vanilla
...    ...           ...          ...          ...

ID     Ice Cream
47562  Chocolate Chip
17236  Vanilla
...    ...
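The split shown in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `de_identify`, the field names, and the choice of random 5-digit IDs are hypothetical and are not taken from the presentation.

```python
import secrets

# Sample rows taken from the example slide.
records = [
    {"name": "Bob Smith", "birth_date": "Jan 1, 1957",
     "postal_code": "K1A 0B1", "ice_cream": "Chocolate Chip"},
    {"name": "Alice Wilson", "birth_date": "Mar 3, 1963",
     "postal_code": "B1K 1A0", "ice_cream": "Vanilla"},
]

# Direct identifier (name) plus quasi-identifiers (birth date, postal code).
IDENTIFIERS = ("name", "birth_date", "postal_code")


def de_identify(rows):
    """Split rows into an identity table and a released table linked by a random ID."""
    identity_table, released_table = [], []
    used_ids = set()
    for row in rows:
        # Draw a random 5-digit ID that carries no information about the person.
        new_id = secrets.randbelow(90000) + 10000
        while new_id in used_ids:
            new_id = secrets.randbelow(90000) + 10000
        used_ids.add(new_id)
        identity_table.append({"id": new_id, **{k: row[k] for k in IDENTIFIERS}})
        released_table.append({"id": new_id, "ice_cream": row["ice_cream"]})
    return identity_table, released_table


identity_table, released_table = de_identify(records)
```

Only `released_table` would be shared. Anyone holding `identity_table` can still reverse the process, which is why that table needs to be kept separately and protected.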
DE-IDENTIFICATION
How about images? Video?
CHALLENGES
DE-IDENTIFICATION
➔ Goals
◆ Reduce Risk
◆ Maximize Data Use
WHY IS IT IMPORTANT?
DE-IDENTIFICATION
TECHNIQUES
➔ Suppression
➔ Variation / Noise
➔ Swapping
➔ Masking
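As a rough illustration of the four techniques listed above, the sketch below applies each one to the example rows. The helper names (`suppress`, `add_noise`, `swap`, `mask`), the `birth_year` field, and the parameter values are assumptions made for this example; real projects usually tune these choices to the data and the risk being treated.

```python
import random


def suppress(rows, field):
    """Suppression: drop a field entirely."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


def add_noise(rows, field, spread, rng):
    """Variation / noise: shift a numeric value by a random amount within +/- spread."""
    return [{**row, field: row[field] + rng.randint(-spread, spread)} for row in rows]


def swap(rows, field, rng):
    """Swapping: shuffle one field's values across rows, keeping the same set of values."""
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


def mask(rows, field, keep):
    """Masking: keep the first `keep` characters and replace the rest with '*'."""
    return [{**row, field: row[field][:keep] + "*" * (len(row[field]) - keep)}
            for row in rows]


# Example rows; birth_year is a simplification of the birth dates on the slides.
rows = [
    {"name": "Bob Smith", "birth_year": 1957,
     "postal_code": "K1A 0B1", "ice_cream": "Chocolate Chip"},
    {"name": "Alice Wilson", "birth_year": 1963,
     "postal_code": "B1K 1A0", "ice_cream": "Vanilla"},
]

rng = random.Random(0)  # fixed seed only so the demonstration is repeatable
treated = suppress(rows, "name")
treated = add_noise(treated, "birth_year", 2, rng)
treated = mask(treated, "postal_code", 3)  # "K1A 0B1" becomes "K1A****"
swapped = swap(rows, "ice_cream", rng)
```

Each technique trades some data utility for lower risk, which ties back to the two goals of reducing risk while maximizing data use.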
RE-IDENTIFICATION
Re-identification is a process that reassociates a
subject with the original entity in order to determine
the identity of the subject.
WHAT IS RE-IDENTIFICATION?
RE-IDENTIFICATION
EXAMPLE
Name          Birth Date   Postal Code  Ice Cream
Bob Smith     Jan 1, 1957  K1A 0B1      Chocolate Chip
Alice Wilson  Mar 3, 1963  B1K 1A0      Vanilla
...           ...          ...          ...
RE-IDENTIFICATION
LINKAGE
De-identified data: Ice Cream
Shared fields: Birth Date, Postal Code, ...
Secondary Source: Name, Telephone, ...
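A linkage attack of this kind can be sketched as a join on the shared quasi-identifiers. The code below is a minimal illustration, not a description of any specific incident: the `link` function, the field names, the telephone numbers, and the third person in the secondary source are all hypothetical.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("birth_date", "postal_code")

# Released data: the name is removed but the quasi-identifiers are still present.
released = [
    {"birth_date": "Jan 1, 1957", "postal_code": "K1A 0B1", "ice_cream": "Chocolate Chip"},
    {"birth_date": "Mar 3, 1963", "postal_code": "B1K 1A0", "ice_cream": "Vanilla"},
]

# Hypothetical secondary source, such as a public directory (placeholder values).
secondary = [
    {"name": "Bob Smith", "telephone": "555-0100",
     "birth_date": "Jan 1, 1957", "postal_code": "K1A 0B1"},
    {"name": "Alice Wilson", "telephone": "555-0101",
     "birth_date": "Mar 3, 1963", "postal_code": "B1K 1A0"},
    {"name": "Carol Jones", "telephone": "555-0102",
     "birth_date": "Jul 7, 1970", "postal_code": "M5V 2T6"},
]


def link(released_rows, secondary_rows, keys=QUASI_IDENTIFIERS):
    """Re-identify released rows whose quasi-identifiers match exactly one person."""
    index = defaultdict(list)
    for person in secondary_rows:
        index[tuple(person[k] for k in keys)].append(person)

    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match reveals the identity
            matches.append({"name": candidates[0]["name"],
                            "ice_cream": row["ice_cream"]})
    return matches


reidentified = link(released, secondary)
```

The attack only succeeds when a combination of quasi-identifiers is unique in the secondary source, which is why removing direct identifiers alone is often not enough.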
RE-IDENTIFICATION
➔ Pattern
◆ Account Numbers
◆ Licence Plates
◆ ...
BRUTE FORCE
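When a value follows a known pattern, a masked version of it can sometimes be recovered by trying every possibility. The sketch below assumes a hypothetical scheme in which 5-digit account numbers are replaced by their unsalted SHA-256 hash; the function names and the scheme itself are illustrative assumptions, not something described in the presentation.

```python
import hashlib


def pseudonymize(account_number: str) -> str:
    """Hypothetical masking scheme: unsalted SHA-256 of the account number."""
    return hashlib.sha256(account_number.encode("ascii")).hexdigest()


def brute_force(digest: str, digits: int = 5):
    """Try every number that fits the known pattern until the hash matches."""
    for candidate in range(10 ** digits):
        guess = f"{candidate:0{digits}d}"  # zero-padded to the pattern length
        if pseudonymize(guess) == digest:
            return guess
    return None  # the value did not fit the assumed pattern


released_value = pseudonymize("47562")
recovered = brute_force(released_value)  # recovers "47562"
```

With only 100,000 possible values the search finishes almost instantly. The smaller and more predictable the pattern, the less protection this kind of masking provides.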
RE-IDENTIFICATION
➔ 1997 - Massachusetts Governor's medical records
➔ 2006 - AOL search data
➔ 2014 - New York City taxi trip data
HISTORICAL EVENTS
ANONYMIZATION
Anonymization is a process which is irreversible and
inhibits the reassociation of the subject to the
original entity.
WHAT IS ANONYMIZATION?
ANONYMIZATION
EXAMPLE
ID     Name          Birth Date   Postal Code
47562  Bob Smith     Jan 1, 1957  K1A 0B1
17236  Alice Wilson  Mar 3, 1963  B1K 1A0

ID     Ice Cream
47562  Chocolate Chip
17236  Vanilla
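One way to read this example is that anonymization destroys the link between the two tables. The sketch below illustrates that reading under stated assumptions: the `anonymize` function and the field names are hypothetical, and the code stands in for securely destroying the identity table, including any copies or backups.

```python
identity_table = [
    {"id": 47562, "name": "Bob Smith",
     "birth_date": "Jan 1, 1957", "postal_code": "K1A 0B1"},
    {"id": 17236, "name": "Alice Wilson",
     "birth_date": "Mar 3, 1963", "postal_code": "B1K 1A0"},
]
released_table = [
    {"id": 47562, "ice_cream": "Chocolate Chip"},
    {"id": 17236, "ice_cream": "Vanilla"},
]


def anonymize(identities, released_rows):
    """Break the link for good: discard the identity table and drop the IDs."""
    identities.clear()  # stands in for securely destroying the mapping
    return [{k: v for k, v in row.items() if k != "id"} for row in released_rows]


anonymous_rows = anonymize(identity_table, released_table)
```

This only approximates irreversibility. If the released rows still contained quasi-identifiers, linkage with a secondary source could remain possible.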
RISK MANAGEMENT
➔ Audits
➔ Agreements
◆ Data Use Agreement (DUA)
➔ Policies & Procedures
➔ Education & Training
➔ Limits on Use / Collection
➔ Security
MORE CONTROLS