Privacy is a major issue these days. On the legal side, the EU Data Protection Directive is being revised, and Google was defeated in an EU court and forced to erase data links upon users' requests.
However, various technical problems remain to be solved, even if we limit ourselves to anonymization or k-anonymity. In these slides, we describe three of these problems.
Problems in Technology to Use Anonymized Personal Data
1. Problems in Technology to Use Anonymized Personal Data
Hiroshi Nakagawa
Information Technology Center
The University of Tokyo
2. Current Situation around Privacy
The OECD guidelines will be revised, and one of the important points is:
A right to be forgotten
Google was defeated in an EU court and agreed to erase links to personal data upon consumers' requests.
In Japan, too, Google was defeated and erases links to personal data upon consumers' requests.
This is a legal issue, but it involves some technical issues.
3. Current Situation around Privacy
EU Data Protection Directive Regulation
2014/3/12
Notice and consent may not work with big data (Schoenberger)
Accountability of the database provider
Notice and consent should work (Cavoukian)
Putting you in control
Data protection first, not an afterthought
Privacy Data Ecosystem Trust Network
Technical Issues
4. Current Situation around Privacy
The EU does not deem the Japanese personal data law adequate by EU standards, and prohibits exporting EU citizens' personal data to Japan.
The Japanese government is moving towards a revision of the Japanese personal data protection law. One of the purposes is to obtain adequacy.
Personal data can be transferred to a third party without consent if the risk of re-identification is reduced.
Technical issue
5. • OECD guideline revision, EU data protection
regulation,….
• A right to be forgotten:
– When you no longer want your data to be
processed and there are no legitimate grounds for
retaining it, the data will be deleted.
– This is about empowering individuals, not about
restricting freedom of the press.
– Legally, the balance between these two is the issue
• Easier access to your own data:
– A much more technical issue
6. The three technical problems of anonymized data are then:
◆ When does a DB without personal IDs work as an anonymized DB?
◆ Can a data source person access or erase his/her own data in an anonymized DB without personal IDs?
◆ Does anonymization have side effects?
7. Part 1
◆ When does a DB without personal IDs work as an anonymized DB?
“Anonymize” means deleting personal IDs, and possibly applying something like k-anonymity.
Here, personal data consists of (ID, Quasi ID, other data (including sensitive data)).
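As a rough illustration of what "something like k-anonymity" means for records of the form (ID, Quasi ID, other data), the sketch below checks whether every combination of quasi-ID values is shared by at least k records. The field names and the sample records are hypothetical and are not taken from the slides.

```python
from collections import Counter

def smallest_group(records, qid_fields):
    """Size of the smallest group of records sharing the same QID values (0 if empty)."""
    groups = Counter(tuple(r[f] for f in qid_fields) for r in records)
    return min(groups.values(), default=0)

def is_k_anonymous(records, qid_fields, k):
    """True if every QID combination appears in at least k records."""
    return smallest_group(records, qid_fields) >= k

# Hypothetical records with the personal ID already deleted.
records = [
    {"age": "30s", "sex": "M", "ward": "Bunkyo", "disease": "flu"},
    {"age": "30s", "sex": "M", "ward": "Bunkyo", "disease": "obesity"},
    {"age": "30s", "sex": "M", "ward": "Bunkyo", "disease": "diabetes"},
    {"age": "40s", "sex": "F", "ward": "Minato", "disease": "flu"},
]
QID = ("age", "sex", "ward")
```

Here the first three records form a group of size 3, while the fourth record is alone in its group, so the whole set is not even 2-anonymous.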
8. ◆ When does anonymity work?
Classic categorization
• (ID, Quasi ID, other data)
• Quasi ID (QID): address, age, sex, etc.

                                         | No QIDs          | QIDs
Whose data is stored in DB is unknown    | Unknown & no QID | Unknown & QID
Whose data is stored in DB is known      | Known & no QID   | Known & QID
9. New Categories
– Suppose that the personal ID, such as the name, is deleted.
• Known DB: whether a specified person’s personal data is stored in the DB is definitely known.
• P Known DB: whether a specified person’s personal data is stored in the DB is probabilistically known.
• Unknown DB: whether a specified person’s personal data is stored in the DB is not known.
– This categorization has not received enough attention.
10. Known, Probabilistically Known (P Known), Unknown
• If some outsider is able to observe the personal data gathering process, then the observed person’s personal data is known to be stored in the DB.
  Examples: using a train boarding pass, or buying wine at a liquor shop.
• A Known DB is a DB that consists of observable personal actions.
• If someone opts out of a Known DB, it becomes P Known.
• A P Known DB is built from personal data sampled from the original DB.
  We only know probabilistically whether a specified person’s data is stored in the DB.
11. Known/P Known: sampling and k-anonymity
• To protect private data in personal data from third parties:
– (1) Transfer a DB of randomly sampled data, or statistics of the whole Known DB, to the third party
– (2) Transfer a k-anonymized DB to the third party
[Figure: the whole Known DB is turned into (1) a sampled DB by random sampling (= P Known), or (2) a k-anonymized DB by k-anonymization (= Known)]
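Option (1) can be sketched as follows, assuming uniform random sampling without replacement. The function name `sample_db`, the record labels and the 10% rate are illustrative assumptions, not values from the slides.

```python
import random

def sample_db(records, rate, seed=None):
    """Return a uniformly random subset holding round(rate * len(records)) records."""
    rng = random.Random(seed)
    n = round(rate * len(records))
    return rng.sample(records, n)

# Hypothetical Known DB: every person in it was observed being recorded.
known_db = [f"person-{i}" for i in range(1000)]
sampled_db = sample_db(known_db, rate=0.1, seed=0)

# An outsider who saw "person-42" being recorded only knows that this record
# is in sampled_db with probability equal to the sampling rate.
membership_probability = len(sampled_db) / len(known_db)
```

The lower the sampling rate, the weaker the outsider's knowledge about membership, at the cost of a smaller DB for the third party.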
12. Other personal data makes things worse,
because other personal data can be used as a Quasi ID.
Two aspects:
Traditional view: QID + personal data whose gathering process is not observed by other people.
Current view: QID + personal data whose gathering process can be observed by other people. It is problematic to transfer this type of data to a third party even without ID and QID.
13. When does an anonymized DB work?
Columns: (a) no ID & no quasi ID; (b) no ID but some quasi IDs.

Whose data is stored in the DB is unknown (Unknown DB)
  (a) Not personal data.
  (b) Unknown & QID: k-anonymity works.
Whose data is stored in the DB is probabilistically known (P Known DB), such as a sampled DB
  (a) P Known & no QID: the risk depends on the sampling rate.
  (b) P Known & QID: k-anonymity may work. The risk depends both on the sampling rate and on the granularity of the QID, such as the data gathering frequency.
Whose data is stored in the DB is known (Known DB)
  (a) Known & no QID: if a personal history of locations is used as PID, k-anonymity degrades the value of the data too much.
  (b) Known & QID: quite risky.
14. Summary
If the personal data gathering action can be observed by other people, k-anonymity severely degrades the value of the data.
If the personal data gathering action cannot be observed by other people:
– with no QID, k-anonymity is not needed;
– with QIDs included, k-anonymity of the QIDs may work.
15. Part 2
◆ Can a data source person access or erase his/her own data in an anonymized DB without personal IDs?
16. Traditional view: QID + unobserved personal data
Original record: ID (name) | QID (address, age, sex) | Sensitive data (disease, …) | Other data
The record is split into:
  Pseudonym table: ID (name) | pseudonym (a123x)
  Data table: pseudonym (a123x) | QID (address, age, sex) | Sensitive data (disease, …) | Other data
Matching the data table against another DB that includes ID and QID may make it possible to link sensitive data and ID, even without the pseudonym.
17. Access request from
To keep privacy stricter, the pseudonym is changed frequently. Access is still possible through the pseudonym database.
Original DB: ID (name, etc.) | Other personal data
The DB is split into:
  Pseudonym table: ID (name, etc.) | Pseudonym (e.g. A123B). This table is strictly controlled.
  Data table: Pseudonym (e.g. A123B) | Other personal data. Data mining is done only on this data, so it is safe.
If access is required, the DB manager connects the ID and the other personal data through the pseudonym table.
Example with frequently changed pseudonyms:
  Pseudonym table: ID (name, etc.) | Pseu:A123B4, Pseu:C1263B, Pseu:X91234, Pseu:Z12345
  Data table:
    Pseu:A123B4 | Other personal data: 1
    Pseu:C1263B | Other personal data: 2
    Pseu:X91234 | Other personal data: 3
    Pseu:Z12345 | Other personal data: 4
No k-anonymity cases
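The split into a strictly controlled pseudonym table and a data table can be sketched as below. This is a minimal illustration under assumptions: pseudonyms are random hex strings from Python's `secrets` module, a fresh pseudonym is issued per record to mimic frequent changes, and the names and data values are hypothetical.

```python
import secrets

def split_db(records):
    """Split (name, data) records into a pseudonym table and a data table."""
    pseudonym_table = {}   # pseudonym -> name; strictly controlled by the DB manager
    data_table = {}        # pseudonym -> other personal data; used for data mining
    for name, data in records:
        pseudonym = secrets.token_hex(4)       # fresh pseudonym for every record
        while pseudonym in pseudonym_table:    # retry on an unlikely collision
            pseudonym = secrets.token_hex(4)
        pseudonym_table[pseudonym] = name
        data_table[pseudonym] = data
    return pseudonym_table, data_table

def access(name, pseudonym_table, data_table):
    """DB manager side: collect all data stored under any pseudonym of `name`."""
    return [data_table[p] for p, n in pseudonym_table.items() if n == name]

records = [("Alice", "data 1"), ("Alice", "data 2"), ("Bob", "data 3")]
pseudonym_table, data_table = split_db(records)
```

Only `data_table` would leave the DB manager; an access request is answered by joining it back through `pseudonym_table`.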
18. DB manager
What is distributed to third parties is the DB without IDs, but…
When this person requests access to his data, the DB manager requests the data for these four pseudonyms. The third party then realizes that these four records belong to the same person!
Pseudonym table (not transferred to anyone outside): ID (name) | pseudo:A123B4, pseudo:C1263B, pseudo:X91234, pseudo:Z12345
Data table (third parties receive only this part of the DB):
  pseudo:A123B4 | Personal data: 1
  pseudo:C1263B | Personal data: 2
  pseudo:X91234 | Personal data: 3
  pseudo:Z12345 | Personal data: 4
To remedy this situation, the DB manager adds many other unrelated persons’ pseudonyms to the request.
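One way to picture this remedy for access requests is a request padded with decoys, sketched below. The four real pseudonyms come from the slide; the pool of other pseudonyms, the number of decoys and the function name `padded_request` are assumptions made only for illustration.

```python
import random

def padded_request(real_pseudonyms, all_pseudonyms, n_decoys, seed=None):
    """Mix the requester's pseudonyms with decoys drawn from other people's pseudonyms."""
    rng = random.Random(seed)
    others = sorted(set(all_pseudonyms) - set(real_pseudonyms))
    request = list(real_pseudonyms) + rng.sample(others, n_decoys)
    rng.shuffle(request)   # hide which entries are the real ones
    return request

real = ["A123B4", "C1263B", "X91234", "Z12345"]
everyone = real + [f"P{i:05d}" for i in range(100)]   # hypothetical other pseudonyms
request = padded_request(real, everyone, n_decoys=12, seed=1)
```

The third party sees 16 pseudonyms in one request and cannot directly tell which four belong together; the DB manager discards the decoy rows after the answer arrives.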
19. But rectification and erasure requests are more difficult,
because, obviously, adding unrelated persons’ pseudonyms does not work.
In the erasure case, if the third party is malicious, we do not have any protection method that works.
20. Access is possible in k-anonymity
DB manager A holds:
  ID    | Pseudo
  Bob   | a12
  Bill  | b23
  Chris | c34

  Pseudo | QID | Sensitive
  a12    | xxx | flu
  b23    | xxx | obesity
  c34    | xxx | diabetes
Service provider B has received the 3-anonymized data from A.
① Bob sends a request for access to DB manager A.
② A requests access to the personal data about (a12, b23, c34) from B.
③ B returns the 3 persons’ sensitive data.
④ A shows Bob the data corresponding to him, that is, a12’s data.
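The access flow can be sketched as follows, using the pseudonyms and sensitive values from the slide. The function names and the dictionary layout are assumptions made for illustration.

```python
# DB manager A: private pseudonym table (ID -> pseudonym), values from the slide.
pseudonym_of = {"Bob": "a12", "Bill": "b23", "Chris": "c34"}

# Service provider B: 3-anonymized data; all three rows share the same QID.
provider_db = {
    "a12": {"qid": "xxx", "sensitive": "flu"},
    "b23": {"qid": "xxx", "sensitive": "obesity"},
    "c34": {"qid": "xxx", "sensitive": "diabetes"},
}

def provider_lookup(pseudonyms):
    """B's side: return the rows for the requested pseudonyms."""
    return {p: provider_db[p] for p in pseudonyms}

def handle_access_request(person):
    """A's side: ask B for the whole group, then show only the requester's row."""
    group = sorted(pseudonym_of.values())   # request all k pseudonyms of the group
    rows = provider_lookup(group)           # B returns the k persons' data
    return rows[pseudonym_of[person]]       # A filters down to the requester's row
```

Because A always asks for the whole group, B learns nothing about which of the three persons made the request.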
21. An erasure request for a k-anonymized DB causes trouble
The DB manager, who makes the 2-anonymity DB, holds:
  ID   | pseudo
  Bob  | b23
  Bill | c34

  pseudo | QID | sensitive
  b23    | xxx | High blood pressure
  c34    | xxx | Cancer
① Request for erasure: “It’s Bill. Erase my data.”
② The DB manager requests the third party, who has only the 2-anonymized DB, to erase c34’s data.
2-anonymity collapses. 1-anonymity? No kidding! Rebuild k-anonymity? Oh, no!
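The collapse can be reproduced in a few lines, using the two rows from the slide. The helper `anonymity_level` is a hypothetical name; it measures the size of the smallest group of rows that share a QID.

```python
from collections import Counter

def anonymity_level(db):
    """Size of the smallest group of rows sharing the same QID (0 for an empty DB)."""
    return min(Counter(row["qid"] for row in db.values()).values(), default=0)

third_party_db = {
    "b23": {"qid": "xxx", "sensitive": "High blood pressure"},
    "c34": {"qid": "xxx", "sensitive": "Cancer"},
}

before = anonymity_level(third_party_db)
del third_party_db["c34"]   # Bill's erasure request, forwarded as "erase c34"
after = anonymity_level(third_party_db)
```

After the deletion the remaining row is alone in its QID group, so the DB held by the third party is no longer 2-anonymous.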
22. Three solutions
• Erasing one person’s data collapses k-anonymity.
Solution 1: k-anonymize the DB again. But this consumes too much time and requires distributing the new k-anonymized DB. Too costly! (X)
Solution 2: If one person’s data is erased, erase the data of all k persons in the group together. Seemingly OK, but this degrades the quality of the DB and the accuracy of data mining on it.
Solution 3: Use (k+α)-anonymity beforehand; the DB is then still k-anonymous after erasing α persons’ data. Probably OK. However, if α is not small, the quality of the (k+α)-anonymous DB is degraded.
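Solutions 2 and 3 can be compared with a small sketch. The DB below is hypothetical: two QID groups of three rows each, which corresponds to k = 2 and α = 1. The function names are assumptions as well.

```python
from collections import defaultdict

def group_by_qid(db):
    """Map each QID value to the list of pseudonyms that share it."""
    groups = defaultdict(list)
    for pseudonym, row in db.items():
        groups[row["qid"]].append(pseudonym)
    return groups

def erase_whole_group(db, pseudonym):
    """Solution 2: erase every row that shares the erased person's QID."""
    qid = db[pseudonym]["qid"]
    return {p: r for p, r in db.items() if r["qid"] != qid}

def still_k_anonymous_after(db, erased, k):
    """Solution 3 check: does every remaining group still hold at least k rows?"""
    remaining = {p: r for p, r in db.items() if p not in erased}
    return all(len(g) >= k for g in group_by_qid(remaining).values())

# Hypothetical DB built with (k + alpha)-anonymity, k = 2 and alpha = 1.
db = {f"p{i}": {"qid": "g1" if i < 3 else "g2"} for i in range(6)}
```

With this DB, one erasure in a group keeps 2-anonymity, a second erasure in the same group breaks it, and Solution 2 removes three rows to serve a single request.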
24. Side effect of k-anonymity
Original data:
  name | age | gen | Address (number, street name, ward name) | Location at some time
  Alex | 35  | M   | 101 Hongo, Bunkyo                        | consumer finance: K
  Bill | 30  | M   | 120 Yushima, Bunkyo                      | University T
  Ken  | 33  | M   | 312 Yayoi, Bunkyo                        | University T
  Paul | 39  | M   | 421 Sendagi, Bunkyo                      | Hospital Y
After 4-anonymization:
  Name (anonym) | age | gen | address | Location at some time
  Alex          | 30  | M   | Bunkyo  | consumer finance: K
  Bill          | 30  | M   | Bunkyo  | University T
  Ken           | 30  | M   | Bunkyo  | University T
  Paul          | 30  | M   | Bunkyo  | Hospital Y
Alex, Bill, Ken and Paul are no longer regarded as distinct persons, so all four are suspected of visiting consumer finance K (which suggests they are not doing well financially).
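The 4-anonymization of this table can be sketched as a simple generalization step: ages are rounded down to the decade and addresses are reduced to the ward name. The `generalize` function and the way suspects are counted are assumptions for illustration; the data values are those of the slide.

```python
def generalize(record):
    """Coarsen the QIDs as in the slide: age -> decade, address -> ward name only."""
    return {
        "age": record["age"] // 10 * 10,
        "gen": record["gen"],
        "address": record["address"].split(", ")[-1],
        "location": record["location"],
    }

original = [
    {"name": "Alex", "age": 35, "gen": "M", "address": "101 Hongo, Bunkyo", "location": "consumer finance: K"},
    {"name": "Bill", "age": 30, "gen": "M", "address": "120 Yushima, Bunkyo", "location": "University T"},
    {"name": "Ken", "age": 33, "gen": "M", "address": "312 Yayoi, Bunkyo", "location": "University T"},
    {"name": "Paul", "age": 39, "gen": "M", "address": "421 Sendagi, Bunkyo", "location": "Hospital Y"},
]

anonymized = [generalize(r) for r in original]
qids = {(r["age"], r["gen"], r["address"]) for r in anonymized}

# All rows are indistinguishable by QID, so if any row shows a visit to
# consumer finance K, every person in the group becomes a suspect.
visited_k = any(r["location"] == "consumer finance: K" for r in anonymized)
suspects = len(anonymized) if visited_k else 0
```

Only one of the four actually visited consumer finance K, yet the generalization spreads the suspicion over the whole group.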
25. Location k-anonymization can trigger false light
[Figure: a k-anonymized area with k persons in it, containing consumer finance shop C]
This student is looking for a job now. If he is suspected of going to a consumer finance shop, it harms his job search.
False light
26. False light triggered by location k-anonymization is remedied by dividing shop C into 4 areas
[Figure: a k-anonymized area with k persons in it, and consumer finance shop C]
27. Only one person among all k persons in a k-anonymized area is at consumer finance shop C
Suspecting a particular person of being at shop C is not reasonable
[Figure: consumer finance shop C and four k-anonymized areas, each with k persons in it; false light]
28. [Figure: horizontal axis: (# of persons at shop C)/k, from 0 to 1; vertical axis: subjective probability of suspecting that a person went to shop C (suspecting something wrong), up to 1; curves: expected damage estimated by the third person, and money needed for precaution]
This region is almost free from false light. The problem is how to select k so as to stay within this region!
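The quantity on the horizontal axis, (# of persons at shop C)/k, is easy to compute. The sketch below also searches for the smallest k that keeps this ratio under a threshold; the threshold itself is a hypothetical parameter, since the slides do not say where the false-light-free region ends.

```python
def fraction_at_shop(persons_at_shop, k):
    """Ratio (# of persons at shop C) / k for one k-anonymized area."""
    if k <= 0 or not 0 <= persons_at_shop <= k:
        raise ValueError("need k > 0 and 0 <= persons_at_shop <= k")
    return persons_at_shop / k

def smallest_k(persons_at_shop, threshold):
    """Smallest k whose ratio does not exceed `threshold` (a hypothetical bound)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    k = max(persons_at_shop, 1)
    while fraction_at_shop(persons_at_shop, k) > threshold:
        k += 1
    return k
```

For example, with one person at shop C and a threshold of 0.25, the area needs at least 4 persons; with two persons at the shop it needs at least 8.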
29. Summary
• k-anonymity has a side effect, the so-called false light.
• For k-anonymity of location, the side effect is reduced by reorganizing the k-anonymized areas.