2. Table of contents
• The Goal
• What is delimited data?
• Various computational systems for protecting delimited data
MinGen
Datafly System
μ-Argus System
k-Similar Algorithm
Scrub System
• References
3. The Goal
Explore computational techniques that release information
in such a way that the identity of any individual or entity
contained in the data cannot be recognized, while the data
remain practically useful.
4. What is delimited data?
• Data whose fields are separated by a delimiter, such as a
comma (,) or a tab.
• Commonly used for hospital records, office records, etc.
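As a minimal illustration, assuming a small set of comma-delimited hospital records (the field names and values below are hypothetical, not taken from the slides), Python's standard `csv` module can read delimited data into one dictionary per record. Reading every field as text also preserves leading zeros in values such as ZIP codes.

```python
import csv
import io

# Hypothetical comma-delimited hospital records, invented for illustration.
RAW = """zip,birth_date,sex,diagnosis
02138,1965-07-02,F,hypertension
02139,1965-07-14,M,asthma
02141,1971-03-30,F,chest pain
"""


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


records = read_delimited(RAW)
for row in records:
    print(row["zip"], row["sex"], row["diagnosis"])

# The same reader handles tab-delimited data by changing the delimiter.
tsv = RAW.replace(",", "\t")
assert read_delimited(tsv, delimiter="\t") == records
```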
5. Computational systems for maintaining privacy when disclosing person-specific information
• MinGen: uses generalization and suppression as disclosure limitation techniques.
• Datafly System: generalizes values based on a profile of the data recipient at the time of disclosure.
• μ-Argus System: a system somewhat similar to Datafly that is becoming a European standard for disclosing public-use data.
• k-Similar algorithm: finds optimal results such that the data are minimally distorted yet adequately protected.
• Scrub System: locates and suppresses or replaces personally identifying information in letters, notes, and other textual documents.
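Generalization and suppression, the techniques these systems build on, can be sketched in a few lines. The hierarchies below (masking trailing ZIP code digits, coarsening a birth date to month or year) and the helper names `generalize_zip`, `generalize_birth_date`, and `suppress` are hypothetical illustrations, not code from MinGen or any of the other systems.

```python
# Hypothetical generalization hierarchies, invented for illustration only.
SUPPRESSED = "*"


def generalize_zip(zip_code, level):
    """Mask the last `level` digits of a ZIP code with '*'.

    Level 0 leaves the value unchanged; level len(zip_code) hides it entirely.
    """
    if not 0 <= level <= len(zip_code):
        raise ValueError("level out of range")
    return zip_code[: len(zip_code) - level] + "*" * level


def generalize_birth_date(iso_date, level):
    """Level 0: YYYY-MM-DD, 1: YYYY-MM, 2: YYYY, 3: suppressed."""
    if not 0 <= level <= 3:
        raise ValueError("level out of range")
    year, month, _day = iso_date.split("-")
    return [iso_date, f"{year}-{month}", year, SUPPRESSED][level]


def suppress(_value):
    """Suppression withholds the value altogether."""
    return SUPPRESSED


print(generalize_zip("02138", 2))              # 021**
print(generalize_birth_date("1965-07-02", 2))  # 1965
print(suppress("F"))                           # *
```

Each higher level is less specific than the one before it, which is what lets a system trade precision for anonymity one step at a time.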
7. Datafly System
• Maintains anonymity in released data by
automatically substituting, generalizing, and
suppressing information as appropriate.
• Decisions are made at the attribute and tuple level at
the time of database access.
• Role-based approach.
• The end result is a subset of the original database that
provides minimal linking and matching of data,
because each tuple matches as many people as the data
holder specifies.
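The requirement that every released tuple match at least as many people as the data holder specifies is usually called k-anonymity. A minimal check, assuming rows stored as Python dictionaries and a hypothetical quasi-identifier of ZIP code, birth year, and sex, counts how many rows share each combination of quasi-identifier values.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """True if every combination of quasi-identifier values occurs in >= k rows."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(c >= k for c in counts.values())


# Hypothetical, already generalized release used only for illustration.
release = [
    {"zip": "021**", "birth_year": "1965", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "021**", "birth_year": "1965", "sex": "F", "diagnosis": "asthma"},
    {"zip": "021**", "birth_year": "1971", "sex": "M", "diagnosis": "chest pain"},
    {"zip": "021**", "birth_year": "1971", "sex": "M", "diagnosis": "flu"},
]
QI = ("zip", "birth_year", "sex")
print(is_k_anonymous(release, QI, 2))  # True: each QI combination appears twice
print(is_k_anonymous(release, QI, 3))  # False
```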
8. Datafly System
• The user sets an anonymity level.
• The Datafly System iteratively computes
increasingly less specific versions of the values
for an attribute until the desired
anonymity level is attained.
• The iterative process ends when each combination of
values assigned across a group of attributes (the
quasi-identifier) is shared by at least k tuples.
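The loop described above can be sketched as a simplified Datafly-style heuristic. The sketch assumes that on each pass the attribute with the most distinct values is generalized one level, and that once no more than k tuples remain in groups smaller than k, those outlier tuples are suppressed. These rules, the `HIERARCHY` table, and the function name `datafly` are assumptions for illustration, not the Datafly System's actual implementation.

```python
from collections import Counter

# Hypothetical generalization hierarchies: HIERARCHY[attr][level] maps a value
# to its generalized form at that level; the last level suppresses the value.
HIERARCHY = {
    "zip": [lambda z: z, lambda z: z[:4] + "*", lambda z: z[:3] + "**", lambda z: "*****"],
    "birth_date": [lambda d: d, lambda d: d[:7], lambda d: d[:4], lambda d: "*"],
    "sex": [lambda s: s, lambda s: "*"],
}


def datafly(rows, quasi_identifier, k):
    """Simplified Datafly-style heuristic (an illustrative sketch only)."""
    levels = {a: 0 for a in quasi_identifier}

    def qi_key(row):
        return tuple(row[a] for a in quasi_identifier)

    while True:
        # Apply the current generalization level of every quasi-identifier attribute.
        current = [
            {**row, **{a: HIERARCHY[a][levels[a]](row[a]) for a in quasi_identifier}}
            for row in rows
        ]
        counts = Counter(qi_key(r) for r in current)
        outliers = sum(c for c in counts.values() if c < k)
        candidates = [a for a in quasi_identifier if levels[a] < len(HIERARCHY[a]) - 1]
        if outliers <= k or not candidates:
            # Suppress the remaining outlier tuples and stop.
            return [r for r in current if counts[qi_key(r)] >= k], levels
        # Among attributes that can still be generalized, pick the one with
        # the most distinct values and generalize it one more level.
        attr = max(candidates, key=lambda a: len({r[a] for r in current}))
        levels[attr] += 1


rows = [
    {"zip": "02138", "birth_date": "1965-07-02", "sex": "F"},
    {"zip": "02139", "birth_date": "1965-07-14", "sex": "F"},
    {"zip": "02141", "birth_date": "1971-03-30", "sex": "M"},
    {"zip": "02142", "birth_date": "1971-03-09", "sex": "M"},
    {"zip": "02138", "birth_date": "1980-01-01", "sex": "F"},
]
released, levels = datafly(rows, ("zip", "birth_date", "sex"), k=2)
print(levels)         # {'zip': 1, 'birth_date': 1, 'sex': 0}
print(len(released))  # 4: the 1980 tuple was suppressed
```

On these sample rows the sketch generalizes birth date to year-month, then ZIP code to four digits, and suppresses the single tuple that is still unique.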
9. Datafly System
• Output table: its attributes and
tuples correspond to the
anonymity level specified by the
data holder.
• In the example output, the anonymity level is 0.7.
10. μ-Argus System
• Provides protection by enforcing a k requirement on the
values found in a quasi-identifier.
• The data holder provides a value of k and specifies which
attributes are sensitive by assigning each attribute a
value between 0 and 3, denoting "not identifying,"
"most identifying," "more identifying," and
"identifying," respectively.
• The program identifies rare, and therefore unsafe,
combinations by testing some 2- or 3-combinations of the
attributes declared to be sensitive.
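A simplified version of this test can be sketched as below. It assumes rows stored as dictionaries and, for simplicity, tests every 2- and 3-combination of attributes whose identifying level is above 0, whereas μ-Argus tests only some combinations chosen according to those levels. The function name `unsafe_combinations` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unsafe_combinations(rows, identifying_level, k):
    """Return (attributes, values, count) for 2- and 3-attribute value
    combinations that occur in fewer than k rows.

    identifying_level maps each attribute to 0..3; attributes at level 0
    ("not identifying") are skipped. Hypothetical simplification: every
    2- and 3-combination of the remaining attributes is tested.
    """
    sensitive = sorted(a for a, level in identifying_level.items() if level > 0)
    unsafe = []
    for size in (2, 3):
        for attrs in combinations(sensitive, size):
            counts = Counter(tuple(row[a] for a in attrs) for row in rows)
            unsafe.extend((attrs, values, n) for values, n in counts.items() if n < k)
    return unsafe


rows = [
    {"zip": "02138", "birth_year": "1965", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": "1965", "sex": "F", "diagnosis": "flu"},
    {"zip": "02138", "birth_year": "1965", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": "1965", "sex": "M", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": "1965", "sex": "F", "diagnosis": "asthma"},
]
levels = {"zip": 1, "birth_year": 2, "sex": 3, "diagnosis": 0}
for attrs, values, n in unsafe_combinations(rows, levels, k=2):
    print(attrs, values, n)
```

With k = 2, all three unsafe combinations reported involve the fifth row, the only one with ZIP code 02139.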
11. μ-Argus System
• Unsafe combinations are eliminated by generalizing
attributes within the combination and by local cell
suppression.
• Rather than removing entire tuples when one or more
attributes contain outlier information, as is done in the
Datafly System, the μ-Argus System simply suppresses,
or blanks out, the outlier values at the cell level.
• The resulting data typically contain all the tuples and
attributes of the original data, though values may be
missing in some cell locations.
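The difference between tuple suppression and local cell suppression can be sketched as follows. In this hypothetical helper, `suppress_cells`, the caller names the single cell to blank out in each unsafe row; how μ-Argus actually chooses which cells to suppress is not shown here, so that choice is an assumption of the sketch.

```python
from collections import Counter


def suppress_cells(rows, attrs, k, target):
    """Local cell suppression (illustrative sketch): keep every row, but blank
    out the `target` cell (set it to None) in rows whose values over `attrs`
    occur fewer than k times."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    released = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        if counts[tuple(row[a] for a in attrs)] < k:
            new_row[target] = None
        released.append(new_row)
    return released


# Hypothetical sample data.
rows = [
    {"zip": "02138", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "sex": "F", "diagnosis": "asthma"},
]
for row in suppress_cells(rows, ("zip", "sex"), k=2, target="zip"):
    print(row)
```

All three tuples are still released; only the ZIP code of the third one is missing, whereas removing the whole outlier tuple would have dropped its diagnosis as well.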
13. k-Similar Algorithm
• In the released data, no combination of values across the
quasi-identifier is shared by fewer than k tuples.
• Given the correctness of the k-Similar clustering
algorithm, k-map protection is assured.
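The requirement in the first bullet is the k-anonymity condition. Written out, assuming RT denotes the released table (treated as a multiset of tuples), QI its quasi-identifier, and t[QI] the projection of tuple t onto the quasi-identifier attributes (notation chosen here, not taken from the slides):

```latex
\forall\, t \in RT:\quad \bigl|\{\, t' \in RT \;:\; t'[QI] = t[QI] \,\}\bigr| \;\ge\; k
```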
14. Scrub System
• Provides a methodology for removing
personally identifying information from medical writings
so that the integrity of the information remains intact
while the identity of the person remains confidential.
• This process is called scrubbing.
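To give a rough feel for scrubbing, the sketch below replaces a few kinds of identifiers in free text with category placeholders. It is a deliberately naive stand-in that assumes three hypothetical regular-expression patterns (phone numbers, numeric dates, titled surnames); it is not how the Scrub System detects identifiers, and it would miss many identifiers that a real system must catch.

```python
import re

# Hypothetical patterns for illustration; far simpler than the Scrub System.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]


def scrub(text):
    """Replace matched identifying substrings with category placeholders."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


note = "Dr. Smith saw Mrs. Jones on 3/14/1998; call 617-555-0123 for follow-up."
print(scrub(note))  # [NAME] saw [NAME] on [DATE]; call [PHONE] for follow-up.
```

The clinical content of the note ("saw", "for follow-up") survives, which is the point of scrubbing: remove who, keep what.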
15. References
• Sweeney, Latanya. "Foundations of privacy protection from a computer science perspective." In Proceedings, Joint Statistical Meeting, AAAS, Indianapolis, IN, 2000.
Editor's Notes
Generalizes values within attributes as needed, and removes extreme outlier information from the released data.