22. Definition: Privacy and Confidentiality
• Privacy
• Privacy encompasses not only the famous ‘right to be left alone’ or
keeping one’s personal matters and relationships secret, but also
the ability to share information selectively but not publicly.
• Confidentiality
• Confidentiality is preserving authorized restrictions on information
access and disclosure, including means for protecting personal
privacy and proprietary information.
30. The Importance of Activity in the Tails
• The latest data indicate that more than 20 percent of all
personal health care spending in 2009 ($275 billion) was on
behalf of just 1 percent of the population.
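To make the concentration concrete, the figures above can be turned into a per-person ratio. This is a minimal sketch that uses only the numbers on the slide; the variable names are illustrative. Because the source says "more than 20 percent", the ratio is a lower bound and the implied total is an upper bound.

```python
# Figures taken from the slide: 1% of the population, 20% of spending, $275 billion.
top_population_share = 0.01
top_spending_share = 0.20
top_spending_billion = 275.0

# Average spending per person, in units of "share of spending per share of population".
per_capita_top = top_spending_share / top_population_share
per_capita_rest = (1 - top_spending_share) / (1 - top_population_share)

# How many times more is spent on a person in the top 1% than on anyone else.
concentration_ratio = per_capita_top / per_capita_rest

# Total personal health care spending implied by "$275 billion is 20%".
implied_total_billion = top_spending_billion / top_spending_share

print(round(concentration_ratio, 2), round(implied_total_billion))
```

Under these assumptions, spending on a person in the top 1 percent is roughly 25 times the spending on a person in the remaining 99 percent, which is why an analysis that ignores the tail misses a large part of the total.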
35. Knowledge is Power
• “Big Data” has great potential to benefit society. At the same
time, its availability creates significant potential for mistaken,
misguided or malevolent uses of personal information.
• “The conundrum for the law is to provide space for big data to
fulfill its potential for social benefit, while protecting citizens
adequately from related individual and social harms. Current
privacy law evolved to address different concerns and must
be adapted to confront big data’s challenges.”
36. Conventional (Pre-Big-Data-Era) Datasets
• As long as care was taken in managing PII, privacy leaks
caused by joining datasets could be prevented
• PII (Personally Identifiable Information)
• Any information about an individual maintained by an agency,
including (1) any information that can be used to distinguish or trace
an individual’s identity, such as name, social security number, date
and place of birth, mother’s maiden name, or biometric records; and
(2) any other information that is linked or linkable to an individual,
such as medical, educational, financial, and employment information.
• In Japan
• Insurance number, passport number, name, address, My Number (in recent years)
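The pre-big-data approach described above amounts to removing direct identifiers before records are shared or joined. The following is a minimal sketch of that step; the field names and the contents of `DIRECT_IDENTIFIERS` are hypothetical and chosen only to mirror the examples on the slide.

```python
# Hypothetical field names mirroring the identifiers listed on the slide.
DIRECT_IDENTIFIERS = {
    "name", "address", "passport_number", "insurance_number", "my_number",
}

def strip_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of `record` without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}

# Made-up records for illustration only.
records = [
    {"name": "A. Example", "address": "1 Sample St", "my_number": "000000000000",
     "age": 34, "diagnosis": "flu"},
    {"name": "B. Example", "address": "2 Sample St", "passport_number": "XX0000000",
     "age": 61, "diagnosis": "asthma"},
]

released = [strip_direct_identifiers(r) for r in records]
print(released)
```

This only blocks joins that rely on the removed fields. Attributes that stay in the released records, such as `age` here, can still act as join keys.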
37. Data Bias
• Risk of choosing a dataset that
does not correctly correspond
to the research question
• Risk that no control group
has been set up
• “Similarly, overreliance on, say,
Twitter data, in targeting
resources after hurricanes can
lead to misallocation of
resources towards young,
Internet-savvy people with cell
phones and away from elderly
or impoverished
neighbourhoods”
https://azanaerunawano5to4.hatenablog.com/entry/2015/09/03/101948
40. Statistical Disclosure Control Techniques
• Statistical Disclosure Control
• Concepts and methods that ensure the confidentiality of micro and
aggregated data that are to be published. It is a methodology used to
design statistical outputs in a way that someone with access to that
output cannot relate a known individual (or other responding unit) to
an element in the output.
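One common disclosure control step for aggregated outputs is small-cell suppression: counts below a minimum size are withheld so that a rare category cannot be tied to a known individual. The sketch below is illustrative, not the method of any particular agency; the function name `suppressed_counts` and the threshold of 5 are assumptions.

```python
from collections import Counter

def suppressed_counts(values, threshold=5):
    """Tabulate `values`; replace any count below `threshold` with None."""
    counts = Counter(values)
    return {
        category: (n if n >= threshold else None)
        for category, n in counts.items()
    }

# Made-up respondent regions for illustration only.
regions = ["north"] * 12 + ["south"] * 7 + ["east"] * 2

table = suppressed_counts(regions)
print(table)  # {'north': 12, 'south': 7, 'east': None}
```

This covers primary suppression only. If row or column totals are published alongside the table, a suppressed cell can sometimes be recovered by subtraction, so additional cells may need to be withheld as well.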
42. Research Data Centers
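• Provide specific datasets in a
SaaS format
• Individual researchers have
no need to hold the data
locally
• Only masked or processed
data can be obtained
• Limited in Japan
• In Europe, RISIS is a
representative example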
43. Is It Possible to Anonymize Big Data?
• “It is also nearly impossible to anonymize data. Big Data are
often structured in such a way that essentially everyone in
the file is unique, either because so many variables exist or
because they are so frequent or geographically detailed, that
they make it easy to reidentify individual patterns.”
• “There are no data stewards controlling access to individual
data. Data are often so interconnected (think social media
network data) that one person’s action can disclose
information about another person without that person even
knowing that their data are being accessed.”
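The first point above, that records become unique once enough variables are combined, can be checked directly on a dataset. This is a minimal sketch with made-up records; the function name `unique_fraction` and the field names are hypothetical.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of values on `keys` occurs only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Made-up records for illustration only.
records = [
    {"zip": "100", "birth_year": 1980, "sex": "F"},
    {"zip": "100", "birth_year": 1980, "sex": "M"},
    {"zip": "100", "birth_year": 1975, "sex": "F"},
    {"zip": "200", "birth_year": 1980, "sex": "F"},
]

print(unique_fraction(records, ["sex"]))                       # 0.25
print(unique_fraction(records, ["zip", "birth_year", "sex"]))  # 1.0
```

No single field identifies many people here, but the three fields together single out every record, even though none of them is a name or an ID number.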
46. How Can Individuals’ Data Be Protected?
• “Rather than attempt to deanonymize medical records, for
instance, an attacker (or commercial actor) might instead
infer a rule that relates a string of more easily observable or
accessible indicators to a specific medical condition,
rendering large populations vulnerable to such inferences
even in the absence of PII. Ironically, this is often the very
thing about big data that generate the most excitement: the
capability to detect subtle correlations and draw actionable
inferences. But it is this same feature that renders the
traditional protections afforded by anonymity (again, more
accurately, pseudonymity) much less effective.”
47. How Can Individuals’ Data Be Protected?
(cont.)
• The value of anonymity inheres not in namelessness, and
not even in the extension of the previous value of
namelessness to all uniquely identifying information, but
instead to something we called “reachability,” the possibility
of knocking on your door, hauling you out of bed, calling your
phone number, threatening you with sanction, holding you
accountable – with or without access to identifying
information.
49. Legal and Ethical Framework
• “Most data are housed no longer in statistical agencies,
with well-defined rules of conduct, but in businesses or
administrative agencies. In addition, since digital data can be
alive forever, ownership could be claimed by yet-to-be-born
relatives whose personal privacy could be threatened by
release of information about blood relations.”
• “Traditional regulatory tools for managing privacy, notice, and
consent have failed to provide a viable market mechanism
allowing a form of self-regulation governing industry data
collection”
50. Legal and Ethical Framework (cont.)
• (1) Rules take into account the varying levels of inherent risk
to individuals across different data sets
• (2) traditional definitions of PII need to be rethought
• (3) regulation has a role in creating and policing walls
between data sets
• (4) those analyzing big data must be reminded, with a
frequency in proportion to the sensitivity of the data, that
they are dealing with people
• (5) the ethics of big data research must be an open topic for
continual reassessment.
79. References (for today’s lecture)
• Chapter 11: Privacy and Confidentiality in Big Data and
Social Science, Stefan Bender, Deutsche Bundesbank, Ron S.
Jarmin, US Census Bureau, Frauke Kreuter, University of
Maryland, Julia Lane, NYU