Problems in Technology to Use
Anonymized Personal Data
Hiroshi Nakagawa
Information Technology Center
The University of Tokyo
Current Situation around Privacy
The OECD guidelines are being revised, and one of the important points is:
A right to be forgotten
Google lost in the EU court and agreed to erase its links to personal data upon consumers’ request.
In Japan, too, Google lost and erases its links to personal data upon consumers’ request.
This is a legal issue, but it involves some technical issues.
Current Situation around Privacy
• EU Data Protection Directive → Regulation (2014/3/12)
• Notice and consent may not work for big data (Schoenberger)
– Accountability of the database provider
• Notice and consent should work (Cavoukian)
– Putting you in control
– Data protection first, not an afterthought
• Privacy data ecosystem → Trust network
Technical Issues
Current Situation around Privacy
The EU does not deem Japanese personal data law adequate by EU standards, and prohibits exporting EU citizens’ personal data to Japan.
The Japanese government is moving towards a revision of the Japanese personal data protection law. One of its purposes is to obtain adequacy.
Personal data can be transferred to a third party without consent if the risk of re-identification is reduced.
Technical Issue
• OECD guideline revision, EU data protection regulation, …
• A right to be forgotten:
– When you no longer want your data to be
processed and there are no legitimate grounds for
retaining it, the data will be deleted.
– This is about empowering individuals, not about
restricting freedom of the press.
– Legally, the balance between these two is the issue.
• Easier access to your own data:
– A much more technical issue
Three technical problems of anonymized data:
◆ When does a DB without personal IDs work as an anonymized DB?
◆ Can the person who is the data source access or erase his/her own data in an anonymized DB without personal IDs?
◆ Does anonymization have side effects?
Part 1
◆ When does a DB without personal IDs work as an anonymized DB?
“Anonymize” means deleting the personal ID and possibly applying something like k-anonymity.
Here, personal data consists of (ID, quasi-ID, other data (including sensitive data)).
◆ When does anonymity work?
Classic categorization
• Personal data: (ID, quasi-ID, other data)
• Quasi-ID (QID): address, age, sex, etc.

Whose data is stored in the DB | No QIDs | QIDs
Unknown | Unknown & no QID | Unknown & QID
Known | Known & no QID | Known & QID
New Categories
– Suppose that the personal ID, such as the name, is deleted.
• Known DB: Whether a specified person’s personal data is stored in the DB is definitely known.
• P Known DB: Whether a specified person’s personal data is stored in the DB is probabilistically known.
• Unknown DB: Whether a specified person’s personal data is stored in the DB is not known.
– This categorization has not received enough attention.
Known, Probabilistically Known (P Known), Unknown
• Some outsider is able to observe the personal data gathering process.
→ The observed person’s personal data is then known to be stored in the DB.
– Examples: using a train boarding pass, or buying wine at a liquor shop.
• A Known DB is a DB that consists of observable personal actions.
• If someone opts out of a Known DB, it becomes P Known.
• A P Known DB is built with personal data sampled from the original DB.
– We know only probabilistically whether a specified person’s data is stored in the DB.
Known/P Known: sampling and k-anonymity
• To protect private data in personal data from a third party:
– (1) Transfer a DB of randomly sampled data, or statistics of the whole Known DB, to the third party
– (2) Transfer a k-anonymized DB to the third party
[Figure: the whole Known DB → (1) randomly sampled data: sampled DB = P Known; (2) k-anonymize: k-anonymized DB = Known]
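As a rough illustration of the two transfer options above, the following Python sketch draws a random sample from a Known DB (which yields a P Known DB) and checks whether a table is k-anonymous over its quasi-IDs. The record layout, the field names, and the helper names `sample_db` and `is_k_anonymous` are assumptions made for this example; they do not come from the slides.

```python
import random
from collections import Counter

def sample_db(records, rate, seed=0):
    """Option (1): keep each record independently with probability `rate`.

    An outsider who saw a person's data being gathered then knows only
    probabilistically whether that person's record is in the result.
    """
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]

def is_k_anonymous(records, qid_fields, k):
    """Option (2) check: every quasi-ID combination occurs at least k times."""
    counts = Counter(tuple(r[f] for f in qid_fields) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical Known DB with the personal ID already deleted.
known_db = [
    {"age": "30s", "ward": "Bunkyo", "item": "wine"},
    {"age": "30s", "ward": "Bunkyo", "item": "beer"},
    {"age": "40s", "ward": "Bunkyo", "item": "wine"},
    {"age": "40s", "ward": "Bunkyo", "item": "sake"},
]

p_known_db = sample_db(known_db, rate=0.5)
two_anonymous = is_k_anonymous(known_db, ["age", "ward"], k=2)
three_anonymous = is_k_anonymous(known_db, ["age", "ward"], k=3)
```

In this toy table each quasi-ID combination appears twice, so the check passes for k = 2 and fails for k = 3.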
Other personal data makes things worse
Because other personal data can be used as a quasi-ID.
Two aspects
Traditional view: QID + personal data whose gathering process is not observed by other people
Current view: QID + personal data whose gathering process can be observed by other people
→ It is problematic to transfer this type of data to a third party even without the ID and QID.
When does an anonymized DB work?
Columns: (a) no ID and no quasi-ID; (b) no ID but some quasi-IDs
• Unknown DB: whose data is stored in the DB is unknown
– (a) Unknown & no QID: not personal data.
– (b) Unknown & QID: k-anonymity works.
• P Known DB: whose data is stored in the DB is probabilistically known, such as a sampled DB
– (a) P Known & no QID: the risk depends on the sampling rate.
– (b) P Known & QID: k-anonymity may work. The risk depends both on the sampling rate and on the granularity of the QID, such as the data gathering frequency.
• Known DB: whose data is stored in the DB is known
– (a) Known & no QID: if the personal history of location is used as a PID, k-anonymity degrades the value of the data too much.
– (b) Known & QID: quite risky.
Summary
If the personal data gathering action can be observed by other people, k-anonymity severely degrades the value of the data.
If the personal data gathering action cannot be observed by other people:
– when no QID is included, k-anonymity is not needed;
– when a QID is included, k-anonymity of the QID may work.
Part 2
◆ Can the person who is the data source access or erase his/her own data in an anonymized DB without personal IDs?
Traditional view: QID + unobserved personal data

Original DB:
ID | QID | Sensitive data | Other data
name | address, age, sex | disease | …

The original DB is split into two tables.

Pseudonym table:
ID | Pseudonym
name | a123x

Pseudonymized DB:
Pseudonym | QID | Sensitive data | Other data
a123x | address, age, sex | disease | …

Another DB that includes the ID and QID: matching these two DBs may make it possible to link the sensitive data to the ID even without the pseudonym.
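The linking risk described above can be sketched as a join on the quasi-ID. This is a minimal Python illustration only; the two toy tables, their field names, and the function `link_by_qid` are hypothetical.

```python
def link_by_qid(pseudonymized_db, other_db, qid_fields):
    """Join two tables on quasi-ID values and return (name, sensitive) pairs.

    Only unique matches are reported: if exactly one person in `other_db`
    shares a record's quasi-ID values, the pseudonym gives no protection.
    """
    linked = []
    for rec in pseudonymized_db:
        key = tuple(rec[f] for f in qid_fields)
        matches = [o for o in other_db
                   if tuple(o[f] for f in qid_fields) == key]
        if len(matches) == 1:
            linked.append((matches[0]["name"], rec["sensitive"]))
    return linked

# Pseudonymized DB handed to a third party (no names).
pseudonymized_db = [
    {"pseudonym": "a123x", "address": "Hongo", "age": 35, "sex": "M",
     "sensitive": "flu"},
    {"pseudonym": "b456y", "address": "Yayoi", "age": 33, "sex": "M",
     "sensitive": "diabetes"},
]
# Some other DB that contains both the ID and the QID.
other_db = [
    {"name": "Alex", "address": "Hongo", "age": 35, "sex": "M"},
    {"name": "Ken", "address": "Yayoi", "age": 33, "sex": "M"},
    {"name": "Paul", "address": "Sendagi", "age": 39, "sex": "M"},
]

linked = link_by_qid(pseudonymized_db, other_db, ["address", "age", "sex"])
```

Here both pseudonymized records match exactly one named person, so both sensitive values are re-identified without ever touching the pseudonym table.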
Access request from
To keep privacy stricter, the pseudonym is changed frequently. Access is still possible with the pseudonym database.
Original DB: ID (name, etc.) | Other personal data
Pseudonym table: ID (name, etc.) | Pseudonym (e.g. A123B)
– This table is strictly controlled.
Pseudonymized DB: Pseudonym (e.g. A123B) | Other personal data
– Data mining is done only on this data → safe
If access is required, the DB manager connects the ID and the other personal data through the pseudonym table.
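The split into a strictly controlled pseudonym table and a pseudonymized DB might be sketched as follows. This is only one possible arrangement; the class `PseudonymTable`, the functions `store` and `access`, and the use of random hex strings as pseudonyms are assumptions made for this example.

```python
import secrets

class PseudonymTable:
    """Strictly controlled mapping between IDs and their pseudonyms."""

    def __init__(self):
        self._owner = {}  # pseudonym -> ID
        self._by_id = {}  # ID -> pseudonyms issued so far, in order

    def new_pseudonym(self, person_id):
        """Issue a fresh, unused pseudonym for every stored record."""
        pseudonym = secrets.token_hex(4)
        while pseudonym in self._owner:
            pseudonym = secrets.token_hex(4)
        self._owner[pseudonym] = person_id
        self._by_id.setdefault(person_id, []).append(pseudonym)
        return pseudonym

    def pseudonyms_of(self, person_id):
        return list(self._by_id.get(person_id, []))

table = PseudonymTable()
analysis_db = {}  # pseudonym -> other personal data; no ID stored here

def store(person_id, data):
    analysis_db[table.new_pseudonym(person_id)] = data

def access(person_id):
    """Answer an access request by going through the pseudonym table."""
    return [analysis_db[p] for p in table.pseudonyms_of(person_id)]

store("Bob", "record 1")
store("Bill", "record 2")
store("Bob", "record 3")  # Bob's second record gets a different pseudonym
bobs_records = access("Bob")
```

Data mining would touch only `analysis_db`; the link back to a person exists only inside `table`.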
No k-anonymity cases
What is distributed to third parties is the DB without IDs, but…
Pseudonym table (kept by the DB manager; not transferred to anyone outside):
ID (name, etc.) | Pseu:A123B4, Pseu:C1263B, Pseu:X91234, Pseu:Z12345
Pseudonymized DB (after the split; third parties receive only this part of the DB):
Pseu:A123B4 | Other personal data: 1
Pseu:C1263B | Other personal data: 2
Pseu:X91234 | Other personal data: 3
Pseu:Z12345 | Other personal data: 4
When this person requests access to his data, the DB manager asks the third party for these four pseudonyms. The third party then realizes that these four records are the same person’s data!
To remedy this situation, the DB manager adds many other unrelated persons’ pseudonyms to the request.
But rectification and erasure requests are more difficult, because, obviously, adding unrelated persons’ pseudonyms does not work.
In the erasure case, if the third party is malicious, we do not have any protection method that works.
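The remedy for access requests, mixing the requester's pseudonyms with unrelated ones, can be sketched in Python as below. The function names `padded_request` and `third_party_lookup`, the number of decoys, and the four extra pseudonyms in the toy DB are hypothetical.

```python
import random

def padded_request(own_pseudonyms, all_pseudonyms, n_decoys, seed=0):
    """Mix the requester's pseudonyms with decoys drawn from other people."""
    rng = random.Random(seed)
    others = sorted(set(all_pseudonyms) - set(own_pseudonyms))
    decoys = rng.sample(others, min(n_decoys, len(others)))
    request = list(own_pseudonyms) + decoys
    rng.shuffle(request)  # hide which entries are the real ones
    return request

def third_party_lookup(third_party_db, request):
    """What the third party returns: one record per requested pseudonym."""
    return {p: third_party_db[p] for p in request}

third_party_db = {
    "A123B4": "data 1", "C1263B": "data 2", "X91234": "data 3",
    "Z12345": "data 4",
    # Records of unrelated persons (made up for this example).
    "Q55501": "data 5", "R77802": "data 6", "S99003": "data 7",
    "T11204": "data 8",
}
own = ["A123B4", "C1263B", "X91234", "Z12345"]  # known only to the DB manager

request = padded_request(own, third_party_db.keys(), n_decoys=4)
reply = third_party_lookup(third_party_db, request)
# The DB manager discards the decoys before answering the person.
answer = [reply[p] for p in own]
```

The same padding cannot be reused for erasure, since erasing the decoys would delete unrelated persons' data.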
Access is possible in k-anonymity
DB manager A holds:
ID | Pseudo
Bob | a12
Bill | b23
Chris | c34
Service provider B, who received 3-anonymized data from A, holds:
Pseudo | QID | Sensitive
a12 | xxx | flu
b23 | xxx | obesity
c34 | xxx | diabetes
① Bob sends a request for access to A.
② A sends B a request for access to the personal data about (a12, b23, c34).
③ B returns the 3 persons’ sensitive data.
④ A shows Bob the data corresponding to him = a12’s data.
An erasure request for a k-anonymized DB causes trouble
DB manager, who makes the 2-anonymized DB, holds:
ID | Pseudo
Bob | b23
Bill | c34
Third party, who has only the 2-anonymized DB, holds:
Pseudo | QID | Sensitive
b23 | xxx | high blood pressure
c34 | xxx | cancer
① Request for erasure: “It’s Bill. Erase my data.”
② The DB manager requests the third party to erase c34’s data.
→ 2-anonymity collapses. 1-anonymity? No kidding!
Re-build k-anonymity? Oh, no!
Three solutions
• Erasing one person’s data collapses k-anonymity.
– Solution 1: k-anonymize the DB again. This consumes too much time, and the new k-anonymized DB must be distributed again: too costly. ✗
– Solution 2: Erase the data of all k persons together if one of them is erased. Seemingly OK, but it degrades the quality of the DB and the accuracy of data mining on the DB.
– Solution 3: Use (k+α)-anonymity beforehand, so that the DB is still k-anonymous after erasing up to α persons’ data. Probably OK. However, if α is not small, the quality of the (k+α)-anonymized DB is degraded.
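Solutions 2 and 3 can be compared with a small Python sketch. It assumes the DB is already generalized so that each record carries a group label standing in for its quasi-ID value; the helper names and the toy data are hypothetical.

```python
from collections import defaultdict

def group_by_qid(records):
    """Group pseudonyms by their (already generalized) quasi-ID value."""
    groups = defaultdict(list)
    for pseudonym, qid in records.items():
        groups[qid].append(pseudonym)
    return groups

def min_group_size(records):
    """The largest k for which the DB is k-anonymous (0 if empty)."""
    return min((len(g) for g in group_by_qid(records).values()), default=0)

def erase_whole_group(records, pseudonym):
    """Solution 2: erase everyone sharing the erased person's quasi-ID."""
    qid = records[pseudonym]
    return {p: q for p, q in records.items() if q != qid}

def erase_one(records, pseudonym):
    """Solution 3: erase only the requester; relies on the k+alpha slack."""
    return {p: q for p, q in records.items() if p != pseudonym}

k, alpha = 2, 1
# Hypothetical DB built with k + alpha = 3 records per quasi-ID group.
db = {"a12": "g1", "b23": "g1", "c34": "g1",
      "d45": "g2", "e56": "g2", "f67": "g2"}

after_solution3 = erase_one(db, "c34")
after_solution2 = erase_whole_group(db, "c34")
```

With this toy DB, Solution 3 keeps five records and stays 2-anonymous, while Solution 2 keeps anonymity at the price of discarding a whole group.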
Part 3
◆ Does anonymization have side effects?
k-anonymity of Location and False Light
Original data:
name | age | gen | address (number, street name, ward name) | location at some time
Alex | 35 | M | 101 Hongo, Bunkyo | consumer finance: K
Bill | 30 | M | 120 Yushima, Bunkyo | University T
Ken | 33 | M | 312 Yayoi, Bunkyo | University T
Paul | 39 | M | 421 Sendagi, Bunkyo | Hospital Y

After 4-anonymization:
name (anonym) | age | gen | address | location at some time
Alex | 30 | M | Bunkyo | consumer finance: K
Bill | 30 | M | Bunkyo | University T
Ken | 30 | M | Bunkyo | University T
Paul | 30 | M | Bunkyo | Hospital Y

Alex, Bill, Ken and Paul are no longer regarded as distinct persons, so all four are suspected of having visited consumer finance K (which suggests financial trouble).
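The 4-anonymization above and the resulting false light can be reproduced with a short Python sketch. The generalization rule used here (age rounded down to its decade, address reduced to the ward name) is an assumption that happens to match the table; the slides do not state the rule.

```python
from collections import defaultdict

people = [
    {"name": "Alex", "age": 35, "gen": "M", "address": "101 Hongo, Bunkyo",
     "location": "consumer finance: K"},
    {"name": "Bill", "age": 30, "gen": "M", "address": "120 Yushima, Bunkyo",
     "location": "University T"},
    {"name": "Ken", "age": 33, "gen": "M", "address": "312 Yayoi, Bunkyo",
     "location": "University T"},
    {"name": "Paul", "age": 39, "gen": "M", "address": "421 Sendagi, Bunkyo",
     "location": "Hospital Y"},
]

def generalize(person):
    """Coarsen the quasi-IDs: age to its decade, address to the ward name."""
    decade = person["age"] // 10 * 10
    ward = person["address"].split(", ")[-1]
    return (decade, person["gen"], ward)

def locations_per_group(records):
    """Locations an observer can attribute to each indistinguishable group."""
    groups = defaultdict(set)
    for r in records:
        groups[generalize(r)].add(r["location"])
    return dict(groups)

groups = locations_per_group(people)
# Everyone whose group contains the sensitive location falls under suspicion.
suspected = [p["name"] for p in people
             if "consumer finance: K" in groups[generalize(p)]]
```

All four persons collapse into the single group (30, "M", "Bunkyo"), so the one visit to consumer finance K casts suspicion on all of them.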
Side effect of k-anonymity
Location k-anonymization can trigger false light
[Figure: a k-anonymized area with k persons in it, containing consumer finance shop C]
This student is seeking a job now. If he is suspected of going to a consumer finance shop, it harms his job search.
False Light
False light triggered by location k-anonymization is remedied by dividing shop C into 4 areas
[Figure: shop C divided among four k-anonymized areas, each with k persons in it]
Only one person among all k persons in a k-anonymized area is at consumer finance shop C
→ Suspecting a given person of being at shop C is not reasonable
False Light
[Figure: plot against (# of persons at shop C)/k, ranging from 0 to 1, showing the subjective probability of suspecting something wrong (that a person went to shop C), the expected damage estimated by the third person, and the money needed for precaution]
The area marked in the figure is almost free from false light. The problem is how to select k so that we stay confined to this area!
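The quantity (# of persons at shop C)/k can be computed per area, as in the Python sketch below. It only evaluates the ratio; how an observer's subjective probability depends on it is left to the figure. The function name `shop_ratio` and all counts are made up for illustration.

```python
def shop_ratio(area, shop="C"):
    """(# of persons at `shop`) / k for one k-anonymized area.

    `area` is the list of locations of the k persons in the area.
    """
    k = len(area)
    return sum(1 for loc in area if loc == shop) / k

# Hypothetical single area with k = 5: four of its persons are at shop C.
one_area = ["C", "C", "C", "C", "home"]

# Hypothetical reorganization into 4 areas of k = 5 persons each,
# so that each area contains only one person at shop C.
four_areas = [
    ["C", "home", "office", "station", "park"],
    ["C", "home", "home", "office", "school"],
    ["C", "office", "office", "station", "cafe"],
    ["C", "home", "station", "park", "school"],
]

before = shop_ratio(one_area)
after = [shop_ratio(a) for a in four_areas]
```

With these made-up counts the ratio drops from 0.8 to 0.2 in every area, which is the direction the reorganization aims for.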
Summary
• k-anonymity has a side effect, the so-called false light.
• For k-anonymity of location, the side effect is reduced by reorganizing the k-anonymized areas.

Problems in Technology to Use Anonymized Personal Data

  • 1. Problems in Technology to Use Anonymized Personal Data Hiroshi Nakagawa Information Technology Center The University of Tokyo
  • 2. OECD guideline will be revised, and one of the important point is: A right to be forgotten Google is defeated in EU Court, and agrees to erase its personal data link upon consumers’ request. In Japan, Google is defeated and erases its personal data link upon consumers’ request. Legal issue but involve some technical Issue Current Situation around Privacy
  • 3. Current Situation around Privacy  EU Data Protection Directive  Regulation 2014/3/12  Notice and Consent may not work in Bigdata (Schoenberger)  Accountability of Database provider  Notice and Consent should work (Cavoukian)  Putting you in control  Data protection first, not an afterthought  Privacy Data Ecosystem  Trust Network Technical Issues
  • 4. Current Situation around Privacy EU does not deem Japanese personal data law is adequate for EU standard, and prohibits to export EU citizen’s personal data to Japan. Japanese government is moving towards revision of Japanese personal data protection law. One of the purpose is to get the adequacy. Personal data can be transferred to the third party without consent if risk of re-identification is reduced. Technical Issue
  • 5. • OECD guideline revision, EU data protection regulation,…. • A right to be forgotten: – When you no longer want your data to be processed and there are no legitimate grounds for retaining it, the data will be deleted. – This is about empowering individuals, not about restricting freedom of the press. – Legally, balance of these two is issue • Easier access to your own data: – Much more technical issue
  • 6. ◆When DB without personal ID works as anonymized DB? ◆Data source person can accessed or erased his/her own data in anonymized DB without personal ID ? ◆ Does anonymization have side effect? Then three of the technical problems of anonymized data are:
  • 7. Part 1 ◆When DB without personal ID works as anonymized DB? “Anonymize” means deleting personal ID and maybe something like k-anonymity Here, personal data consists of (ID, Quasi ID, Other date(including sensitive data).
  • 8. ◆When anonymity works? Classic categorizations • (ID, Quasi ID, other data). • Quasi ID(address, age, sex, etc.) No QIDs QIDs Whose data is stored in DB is unknown Unknown & no QID Unknown & QID Whose data is stored in DB is known Known & no QID Known & QID
  • 9. New Categories – Suppose that personal ID, such as name is deleted • Known DB: Whether a specified person’s personal data is stored in DB is definitely known. • P Known DB: Whether a specified person’s personal data is stored in DB is probabilistically known. • Unknown DB: Whether a specified person’s personal data is stored in DB is not known. – These categorization has not got enough attention.
  • 10. Known, Probabilistically Known (P Known), Unknown • Some outsider is able to observe the personal data gathering process.   then observed person’s personal data is known to be stored in DB  Such as using train boarding pass or buying wine at a liquor shop.  Known DB is the DB consists observable personal action  If some one opt-out from “known DB”, it becomes P Known.  P Known DB is built with sampled personal data from the original DB.  We only know probabilistically whether a specified person’s data is stored in the DB
  • 11. Known/P Known: sampling and k-anonymity • To protect private data in personal data from the third party: – (1) Transfer a DB of randomly sampled data, or statistics of the whole Known DB, to the third party. – (2) Transfer a k-anonymized DB to the third party. • Diagram: the whole Known DB yields (1) a sampled DB of randomly sampled data = P Known, or (2) a k-anonymized DB = Known.
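Option (1) on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_db`, the `rate` parameter, and the use of uniform sampling without replacement are not from the slides. Under that assumption, an outsider who knows a person is in the whole Known DB can only say that the person is in the transferred DB with probability equal to the sampling rate.

```python
import random

def sample_db(known_db, rate, seed=None):
    """Return a uniformly sampled subset of known_db (hypothetical helper).

    With uniform sampling without replacement, each record is included
    with probability n / len(known_db), where n is the sample size.
    """
    rng = random.Random(seed)
    n = round(len(known_db) * rate)
    return rng.sample(known_db, n)

# Illustrative Known DB: 100 records identified only by a row number.
known_db = list(range(100))
p_known_db = sample_db(known_db, rate=0.1, seed=0)
inclusion_probability = len(p_known_db) / len(known_db)
```

A lower sampling rate lowers the inclusion probability but also leaves less data for the third party, which is the trade-off the later table refers to as "the risk depends on sampling rate".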
  • 12. Other personal data makes things worse, because other personal data can be used as a quasi ID. Two aspects: • Traditional view: QID + personal data whose gathering process is not observed by other people. • Current view: QID + personal data whose gathering process can be observed by other people. It is problematic to transfer this type of data to a third party even without ID and QID.
  • 13. When does an anonymized DB work? Table (columns: no ID & no quasi ID | no ID but some quasi IDs): – Whose data is stored in the DB is unknown (Unknown DB): not personal data | Unknown & QID: k-anonymity works. – Whose data is stored in the DB is probabilistically known (P Known DB), such as a sampled DB: P Known & no QID: the risk depends on the sampling rate | P Known & QID: k-anonymity may work; the risk depends both on the sampling rate and on the granularity of the QID, such as the data gathering frequency. – Whose data is stored in the DB is known (Known DB): Known & no QID: if the personal history of location is used as PID, k-anonymity degrades the value of the data too much | Known & QID: quite risky.
  • 14. Summary If personal data gathering action can be observed by other people, k-anonymity severely degrades the value of data. If personal data gathering action can not be observed by other people, in no QID case, k-anonymity is not needed In case of QID included, k-anonymity of QID may work.
  • 15. Part 2 ◆ Can the data source person access or erase his/her own data in an anonymized DB without personal IDs?
  • 16. Traditional view: QID + unobserved personal data • Original table: (ID, QID, sensitive data, other data) = (name; address, age, sex; disease; …). • It is split into: – Pseudonym table: (ID, pseudonym) = (name, a123x). – Data table: (pseudonym, QID, sensitive data, other data) = (a123x; address, age, sex; disease; …). • Matching the data table against another DB that includes ID and QID may make it possible to link the sensitive data to the ID even without the pseudonym table.
  • 17. Access request (no k-anonymity cases) • To keep privacy stricter, the pseudonym is changed frequently, but access is still possible through the pseudonym table. • The original table (ID (name, etc.), other personal data) is split into: – Pseudonym table: (ID (name, etc.), pseudonym (e.g. A123B)). This table is strictly controlled. – Data table: (pseudonym (e.g. A123B), other personal data). Data mining is done only on this table, so it is safe. • If access is required, the DB manager connects the ID and the other personal data through the pseudonym table. • Example: one ID (name, etc.) maps to Pseu:A123B4, Pseu:C1263B, Pseu:X91234, Pseu:Z12345; the data table holds (Pseu:A123B4, other personal data 1), (Pseu:C1263B, other personal data 2), (Pseu:X91234, other personal data 3), (Pseu:Z12345, other personal data 4).
  • 18. What is distributed to third parties is the DB without ID, but… • DB manager’s pseudonym table (not transferred to anyone outside): ID (name) maps to pseudo:A123B4, pseudo:C1263B, pseudo:X91234, pseudo:Z12345. • Third parties receive only this part of the DB: (pseudo:A123B4, personal data 1), (pseudo:C1263B, personal data 2), (pseudo:X91234, personal data 3), (pseudo:Z12345, personal data 4). • When this person requests access to his data, the DB manager requests the data for these four pseudonyms. The third party then realizes that these four records belong to the same person! • To remedy this situation, the DB manager adds many other unrelated persons’ pseudonyms to the request.
  • 19. But rectification and erasure requests are more difficult, because, obviously, adding unrelated persons’ pseudonyms does not work. In the erasure case, if the third party is malicious, we have no protection method that works.
  • 20. Access is possible in k-anonymity • DB manager A holds the pseudonym table (ID, pseudo): (Bob, a12), (Bill, b23), (Chris, c34). • Service provider B, who received 3-anonymized data from A, holds (pseudo, QID, sensitive): (a12, xxx, flu), (b23, xxx, obesity), (c34, xxx, diabetes). • Steps: ① Bob sends a request for access to A. ② A requests access to the personal data of (a12, b23, c34) from B. ③ B returns the 3 persons’ sensitive data. ④ A shows Bob the data corresponding to him, i.e. a12’s data.
  • 21. An erasure request for a k-anonymized DB causes trouble • The DB manager, who makes the 2-anonymity DB, holds the pseudonym table (ID, pseudo): (Bob, b23), (Bill, c34). • The third party has only the 2-anonymized DB (pseudo, QID, sensitive): (b23, xxx, high blood pressure), (c34, xxx, cancer). • ① Request for erasure from Bill to the DB manager: “It’s Bill. Erase my data.” ② The DB manager requests the third party to erase c34’s data. • 2-anonymity collapses. 1-anonymity? No kidding! Rebuild k-anonymity? Oh, no!
  • 22. Three solutions • Erasing one person’s data collapses k-anonymity. • Solution 1: k-anonymize the DB again. This takes too much time and requires distributing the new k-anonymized DB: too costly (✗). • Solution 2: Erase all k persons’ data together if one of them is erased. Seemingly OK, but it degrades the quality of the DB and the accuracy of data mining on it. • Solution 3: If we use (k+α)-anonymity beforehand, the DB is still k-anonymous after erasing α persons’ data. Probably OK. However, if α is not small, the quality of the (k+α)-anonymized DB is degraded.
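Solutions 2 and 3 can be compared with a small sketch. The record layout, the helper names, and the sample values are assumptions made for illustration only (the slides do not give an implementation); the DB below is anonymized with k = 2 and α = 1, so each QID group starts with 3 records.

```python
from collections import Counter

def erase_one(records, pseudonym):
    """Solution 3 step: remove only the record with the given pseudonym."""
    return [r for r in records if r["pseudo"] != pseudonym]

def erase_group(records, pseudonym):
    """Solution 2: remove every record sharing the QID of the requested record."""
    qids = {r["qid"] for r in records if r["pseudo"] == pseudonym}
    return [r for r in records if r["qid"] not in qids]

def min_group_size(records):
    """Smallest number of records sharing one QID value (0 for an empty DB)."""
    counts = Counter(r["qid"] for r in records)
    return min(counts.values(), default=0)

k, alpha = 2, 1
db = [  # (k + alpha)-anonymous: 3 records share the QID "xxx"
    {"pseudo": "a12", "qid": "xxx", "sensitive": "flu"},
    {"pseudo": "b23", "qid": "xxx", "sensitive": "high blood pressure"},
    {"pseudo": "c34", "qid": "xxx", "sensitive": "cancer"},
]

after_first = erase_one(db, "c34")            # still k-anonymous
after_second = erase_one(after_first, "b23")  # alpha exhausted, k-anonymity lost
after_group = erase_group(db, "c34")          # whole group gone
```

In this toy DB the first erasure is absorbed by the α margin, the second one breaks 2-anonymity, and Solution 2 keeps the remaining DB k-anonymous only by discarding the entire group.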
  • 23. Part 3 ◆ Does anonymization have side effects? k-anonymity of location and false light
  • 24. Side effect of k-anonymity • Original table (name, age, gen, address (number, street name, ward name), location at some time): (Alex, 35, M, 101 Hongo, Bunkyo, consumer finance K), (Bill, 30, M, 120 Yushima, Bunkyo, University T), (Ken, 33, M, 312 Yayoi, Bunkyo, University T), (Paul, 39, M, 421 Sendagi, Bunkyo, Hospital Y). • After 4-anonymization (name (anonymized), age, gen, address, location at some time): (Alex, 30s, M, Bunkyo, consumer finance K), (Bill, 30s, M, Bunkyo, University T), (Ken, 30s, M, Bunkyo, University T), (Paul, 30s, M, Bunkyo, Hospital Y). • Alex, Bill, Ken and Paul are no longer regarded as distinct persons, so all four are suspected of having visited consumer finance K (implying that they are not doing well financially).
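The table on this slide can be replayed in code. The sketch below is an assumption-laden illustration: it generalizes age to a decade and the address to its ward name (one plausible reading of the slide's 4-anonymization), and the helper names `generalize` and `suspects` are made up for this example.

```python
from collections import defaultdict

rows = [
    {"name": "Alex", "age": 35, "gen": "M", "address": "101 Hongo, Bunkyo", "location": "consumer finance K"},
    {"name": "Bill", "age": 30, "gen": "M", "address": "120 Yushima, Bunkyo", "location": "University T"},
    {"name": "Ken", "age": 33, "gen": "M", "address": "312 Yayoi, Bunkyo", "location": "University T"},
    {"name": "Paul", "age": 39, "gen": "M", "address": "421 Sendagi, Bunkyo", "location": "Hospital Y"},
]

def generalize(row):
    """Generalize the QIDs: age to its decade, address to the ward name only."""
    decade = row["age"] // 10 * 10
    ward = row["address"].rsplit(",", 1)[-1].strip()
    return (f"{decade}s", row["gen"], ward)

# Group the persons who become indistinguishable after generalization.
groups = defaultdict(list)
for row in rows:
    groups[generalize(row)].append(row)

def suspects(groups, location):
    """Everyone sharing a QID group with a record at the given location."""
    found = set()
    for members in groups.values():
        if any(m["location"] == location for m in members):
            found.update(m["name"] for m in members)
    return found

suspected = suspects(groups, "consumer finance K")
```

All four persons fall into one group, so an observer who only sees the generalized QIDs cannot exclude any of them from the visit to consumer finance K, even though only Alex went there.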
  • 25. Location k-anonymization can trigger false light • Diagram: a k-anonymized area with k persons in it, containing consumer finance shop C. • This student is looking for a job now. If he is suspected of going to a consumer finance shop, it hurts his job-hunting activity: false light.
  • 26. The false light triggered by location k-anonymization is remedied by dividing shop C among 4 k-anonymized areas • Diagram: a k-anonymized area with k persons in it, and consumer finance shop C.
  • 27. Only one person among the k persons in each k-anonymized area is at consumer finance shop C, so suspecting a particular person of being at shop C is not reasonable. • Diagram: consumer finance shop C shared among four k-anonymized areas, each with k persons in it.
  • 28. [Figure: x-axis: (# of persons at shop C)/k, from 0 to 1; y-axis: subjective probability of suspecting that something is wrong, i.e. that a person went to shop C, up to 1; curves: expected damage as estimated by the third person, and the money needed for precaution.] The marked area is almost free from false light. The problem is how to select k so as to stay confined to this area!
  • 29. Summary • k-anonymity has a side effect, so-called false light. • For k-anonymity of location, the side effect is reduced by reorganizing the k-anonymized areas.
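The quantity on the x-axis of the last figure, (# of persons at shop C)/k, is easy to compute per k-anonymized area. The sketch below only computes that ratio before and after reorganizing the areas; the slides give no formula for the subjective probability or the expected damage, so none is modeled here. The area contents and the function names are hypothetical.

```python
def shop_ratio(area, shop):
    """Fraction of the persons in one k-anonymized area who are at the shop."""
    if not area:
        raise ValueError("a k-anonymized area must contain at least one person")
    return sum(1 for place in area if place == shop) / len(area)

def worst_ratio(areas, shop):
    """Largest shop ratio over all areas: the area most exposed to false light."""
    return max(shop_ratio(area, shop) for area in areas)

# Before: one area (k = 4) with three shop visitors and one passer-by.
before = [["C", "C", "C", "street"]]

# After: the same three visitors are spread over three areas (k = 4 each),
# each padded with persons from the neighborhood who are not at the shop.
after = [
    ["C", "street", "cafe", "station"],
    ["C", "street", "park", "office"],
    ["C", "school", "cafe", "office"],
]

ratio_before = worst_ratio(before, "C")
ratio_after = worst_ratio(after, "C")
```

In the first layout the passer-by shares an area in which most people are at shop C, which is the false-light situation; after reorganization only one person in four is at the shop in any area, so suspecting a particular person becomes far less reasonable.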