Prepared for
MIT Libraries Program on Information Research
Brown Bag Talk
September 2013

Managing Confidential Information –
Trends and Approaches
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
Standard Disclaimer
These opinions are my own, they are not the opinions
of MIT, Brookings, any of the project funders, nor (with
the exception of co-authored previously published
work) my collaborators
Secondary disclaimer:

“It’s tough to make predictions, especially about the
future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill,

Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi,
Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle,
George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White,
etc.

Information Privacy Across the Research Lifecycle
Collaborators & Co-Conspirators
• Privacy Tools for Sharing Research Data
Team
(Salil Vadhan, P.I.)
http://privacytools.seas.harvard.edu/people
• Research Support
Supported in part by NSF grant CNS-1237235

Related Work
Main Project:
• Privacy Tools for Sharing Research Data
http://privacytools.seas.harvard.edu/

Related publications:
• Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D.,
Laevart, C., et al. (2011). Communicating Science and Engineering Data in the
Information Age. Computer Science and Telecommunications. National
Academies Press
• Vadhan, S. , et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human
Subjects Research Protections”. Available from:
http://dataprivacylab.org/projects/irb/Vadhan.pdf
• Altman, M. (2012). “Mitigating Threats To Data Quality Throughout the Curation
Lifecycle.” In G. Marciano, C. Lee, & H. Bowden (Eds.), Curating For Quality.
datacuration.web.unc.edu

These slides & most reprints available from:
informatics.mit.edu
Level Setting

Identifying Information Is Common
• Includes information from a variety of sources, such as…
– Research data, even if you aren’t the original
collector
– Student “records” such as e-mail, grades
– Logs from web-servers, other systems

• Lots of things are potentially identifying:
– Under some federal laws: IP addresses, dates,
zipcodes, …
– Birth date + zipcode + gender uniquely identify ~87%
of people in the U.S.
[Sweeney 2002]
Try it: http://aboutmyinfo.org/index.html
– With date and place of birth, can guess first five
digits of social security number (SSN) > 60% of the
time. (Can guess the whole thing in under 10 tries,
for a significant minority of people.) [Acquisti & Gross 2009]
– Analysis of writing style or eclectic tastes has been
used to identify individuals

• Tables, graphs and maps can also reveal identifiable information
[Brownstein, et al., 2006, NEJM 355(16)]

Some Sources of Confidentiality Restrictions for
University Held Research and Education Information

• Overlapping laws
• Different laws
apply to different
cases
• Additional data
usage agreements
and license terms
apply

Different Requirements and Definitions

| | FERPA | HIPAA | Common Rule | MA 201 CMR 17 |
| --- | --- | --- | --- | --- |
| Coverage | Students in educational institutions | Medical information in “Covered Entities” | Living persons in research by funded institutions | Mass. residents |
| Identification Criteria | Direct; Indirect; Linked; Bad intent (!) | Direct; Indirect; Linked | Direct; Indirect; Linked | Direct |
| Sensitivity Criteria | Any non-directory information | Any medical information | Private information – based on harm | Financial, State, Federal identifiers |
| Management Requirements | Directory opt-out; [Implied] good practice | Consent; Specific technical safeguards; Breach notification | Consent; [Implied] risk minimization | Specific technical safeguards; Breach notification |

Recognized Benefits of Data Sharing
• Pioneering NRC report [Fienberg, et al. 1985] on
data sharing recommended:
– Sharing data should be a regular practice.
– Investigators should share their data by the time of
publication of initial major results of analyses of the
data except in compelling circumstances.
– Data relevant to public policy should be shared as
quickly and widely as possible.
– Plans for data sharing should be an integral part of a
research plan whenever data sharing is feasible.

• Numerous subsequent reports recommend data
sharing.
Private Information & Information Services
• Recommendations
• Annotations & Tagging
• Class discussion forum

• Social Highlighting
Access Control Model
[Diagram components: Client; Credentials; Authentication; Authorization; Access Control; Resource; Request/Response; Auditing; Log]

Resource Control Model
[Diagram component: External Auditor]
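
The components named in the access control diagram can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `AccessController`, the in-memory dictionaries, the fixed demo salt, and the iteration count are all hypothetical and are not taken from the slides. It shows a request passing through authentication, then authorization, with every outcome written to a log that an auditor could later review.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def _digest(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative only.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


@dataclass
class AccessController:
    credentials: dict = field(default_factory=dict)  # user -> (salt, digest)
    permissions: dict = field(default_factory=dict)  # (user, resource) -> set of actions
    log: list = field(default_factory=list)          # audit trail of every request

    def add_user(self, user: str, password: str, salt: bytes = b"demo-salt") -> None:
        # A real system would use a random per-user salt; fixed here for the demo.
        self.credentials[user] = (salt, _digest(password, salt))

    def grant(self, user: str, resource: str, action: str) -> None:
        self.permissions.setdefault((user, resource), set()).add(action)

    def authenticate(self, user: str, password: str) -> bool:
        if user not in self.credentials:
            return False
        salt, stored = self.credentials[user]
        return hmac.compare_digest(stored, _digest(password, salt))

    def authorize(self, user: str, resource: str, action: str) -> bool:
        return action in self.permissions.get((user, resource), set())

    def request(self, user: str, password: str, resource: str, action: str) -> bool:
        # Authentication first, then authorization; every outcome is logged.
        if not self.authenticate(user, password):
            outcome = "denied: authentication"
        elif not self.authorize(user, resource, action):
            outcome = "denied: authorization"
        else:
            outcome = "granted"
        self.log.append((user, resource, action, outcome))
        return outcome == "granted"


controller = AccessController()
controller.add_user("alice", "correct horse")
controller.grant("alice", "survey.csv", "read")
print(controller.request("alice", "correct horse", "survey.csv", "read"))
print(controller.request("alice", "wrong password", "survey.csv", "read"))
print(controller.log)
```

Keeping the log separate from the decision logic is what lets an external party audit access after the fact.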

Disclosure Limitation
Data Input/Output Model
[Diagram: DATA flows to Published Outputs, which include:]
• Contingency table
• “The correlation between X and Y was large and statistically significant”
• Summary statistics
• Information visualization
• Public use sample microdata (masked rows such as “* Jones, *, * 1961, 021*” and “* Jones, *, * 1972, 9404*”)
Example

Exemplar: Social Media Analysis

| Attribute Type | Examples |
| --- | --- |
| Data: Structure | Network |
| Data: Attribute Types | Continuous/Discrete; Scale: ratio/interval/ordinal/nominal |
| Data: Performance Characteristics | 10M-1B observations; sample from stream of continuously updated corpus; dozens of dimensions/measures |
| Measurement: Unit of Observation | Individuals; Interactions |
| Measurement: Measurement type | Observational |
| Measurement: Performance characteristic | High volume; complex network structure; sparsity; systematic and sparse metadata |
| Management Constraints | License; Replication |
| Analysis methods | Bespoke algorithms (clustering); nonlinear optimization; Bayesian methods |
| Desired Outputs | Summary scalars (model coefficients); summary table; static/interactive visualization |
More Information
• Grimmer, Justin, and Gary King. "General purpose computer-assisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650.
• King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012.
• Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721.
What’s wrong with this picture?

| Name | SSN | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed |
| --- | --- | --- | --- | --- | --- | --- |
| A. Jones | 12341 | 01011961 | 02145 | M | Raspberry | 0 |
| B. Jones | 12342 | 02021961 | 02138 | M | Pistachio | 0 |
| C. Jones | 12343 | 11111972 | 94043 | M | Chocolate | 0 |
| D. Jones | 12344 | 12121972 | 94043 | M | Hazelnut | 0 |
| E. Jones | 12345 | 03251972 | 94041 | F | Lemon | 0 |
| F. Jones | 12346 | 03251972 | 02127 | F | Lemon | 1 |
| G. Jones | 12347 | 08081989 | 02138 | F | Peach | 1 |
| H. Smith | 12348 | 01011973 | 63200 | F | Lime | 2 |
| I. Smith | 12349 | 02021973 | 63300 | M | Mango | 4 |
| J. Smith | 12350 | 02021973 | 63400 | M | Coconut | 16 |
| K. Smith | 12351 | 03031974 | 64500 | M | Frog | 32 |
| L. Smith | 12352 | 04041974 | 64600 | M | Vanilla | 64 |
| M. Smith | 12353 | 04041974 | 64700 | F | Pumpkin | 128 |
| N. SmithJones | 12354 | 04041974 | 64800 | F | Allergic | 256 |
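
One way to see why the table above is a problem even without the Name and SSN columns is to count how many rows are unique on the remaining quasi-identifiers. The sketch below is an illustration only: the helper name `unique_fraction` is hypothetical, and the rows are copied from the slide's toy table, so the result says nothing about real populations.

```python
from collections import Counter

# (name, birthdate MMDDYYYY, zipcode, gender), copied from the slide's example table.
ROWS = [
    ("A. Jones", "01011961", "02145", "M"),
    ("B. Jones", "02021961", "02138", "M"),
    ("C. Jones", "11111972", "94043", "M"),
    ("D. Jones", "12121972", "94043", "M"),
    ("E. Jones", "03251972", "94041", "F"),
    ("F. Jones", "03251972", "02127", "F"),
    ("G. Jones", "08081989", "02138", "F"),
    ("H. Smith", "01011973", "63200", "F"),
    ("I. Smith", "02021973", "63300", "M"),
    ("J. Smith", "02021973", "63400", "M"),
    ("K. Smith", "03031974", "64500", "M"),
    ("L. Smith", "04041974", "64600", "M"),
    ("M. Smith", "04041974", "64700", "F"),
    ("N. SmithJones", "04041974", "64800", "F"),
]


def unique_fraction(rows, key):
    """Fraction of rows whose key(row) value is shared with no other row."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] == 1) / len(rows)


# Birthdate + zipcode + gender: the combination discussed earlier in the talk.
full = unique_fraction(ROWS, lambda r: (r[1], r[2], r[3]))
# Dropping zipcode leaves fewer unique rows, but many are still singled out.
partial = unique_fraction(ROWS, lambda r: (r[1], r[3]))

print(f"unique on birthdate+zipcode+gender: {full:.0%}")
print(f"unique on birthdate+gender only:    {partial:.0%}")
```

Removing direct identifiers alone therefore leaves every row in this toy table distinguishable to anyone who knows the three remaining attributes.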

What’s wrong with this picture?

Column annotations on the slide: HIPAA & MA Identifier; Identifier & Sensitive; HIPAA Identifier; HIPAA Identifier; Sensitive; Indirect Identifier
Row annotations on the slide: Mass resident; Californian; Twins, separated at birth?; FERPA too?; Unexpected Response?

| Name | SSN | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed |
| --- | --- | --- | --- | --- | --- | --- |
| A. Jones | 12341 | 01011961 | 02145 | M | Raspberry | 0 |
| B. Jones | 12342 | 02021961 | 02138 | M | Pistachio | 0 |
| C. Jones | 12343 | 11111972 | 94043 | M | Chocolate | 0 |
| D. Jones | 12344 | 12121972 | 94043 | M | Hazelnut | 0 |
| E. Jones | 12345 | 03251972 | 94041 | F | Lemon | 0 |
| F. Jones | 12346 | 03251972 | 02127 | F | Lemon | 1 |
| G. Jones | 12347 | 08081989 | 02138 | F | Peach | 1 |
| H. Smith | 12348 | 01011973 | 63200 | F | Lime | 2 |
| I. Smith | 12349 | 02021973 | 63300 | M | Mango | 4 |
| J. Smith | 12350 | 02021973 | 63400 | M | Coconut | 16 |
| K. Smith | 12351 | 03031974 | 64500 | M | Frog | 32 |
| L. Smith | 12352 | 04041974 | 64600 | M | Vanilla | 64 |
| M. Smith | 12353 | 04041974 | 64700 | F | Pumpkin | 128 |
| N. Smith | 12354 | 04041974 | 64800 | F | Allergic | 256 |
Help, help, I’m being suppressed…

Technique labels on the slide: Synthetic; Var; Global Recode; Local Suppression; Aggregation + Perturbation

| Name | SSN | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed |
| --- | --- | --- | --- | --- | --- | --- |
| [Name 1] | 12341 | *1961 | 021* | M | Raspberry | .1 |
| [Name 2] | 12342 | *1961 | 021* | M | Pistachio | -.1 |
| [Name 3] | 12343 | *1972 | 940* | M | Chocolate | 0 |
| [Name 4] | 12344 | *1972 | 940* | M | Hazelnut | 0 |
| [Name 5] | 12345 | *1972 | 940* | F | Lemon | .6 |
| [Name 6] | 12346 | *1972 | 021* | F | Lemon | .6 |
| [Name 7] | 12347 | *1989 | 021* | * | Peach | 64.6 |
| [Name 8] | 12348 | *1973 | 632* | F | Lime | 3 |
| [Name 9] | 12349 | *1973 | 633* | M | Mango | 3 |
| [Name 10] | 12350 | *1973 | 634* | M | Coconut | 37.2 |
| [Name 11] | 12351 | *1974 | 645* | M | * | 37.2 |
| [Name 12] | 12352 | *1974 | 646* | M | Vanilla | 37.2 |
| [Name 13] | 12353 | *1974 | 647* | F | * | 64.4 |
| [Name 14] | 12354 | *1974 | 648* | F | Allergic | 256 |
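
Two of the techniques named on this slide, global recoding and local suppression, are simple string operations. The sketch below is a minimal illustration, assuming birthdates are stored as MMDDYYYY strings as in the example table; the function names and the `released` record are hypothetical, and the rule for deciding which cells to suppress is left to the caller.

```python
def recode_birthdate(mmddyyyy: str) -> str:
    """Global recode: keep only the year of an MMDDYYYY date."""
    return "*" + mmddyyyy[-4:]


def recode_zip(zipcode: str, keep: int = 3) -> str:
    """Global recode: keep only the first `keep` digits of a zipcode."""
    return zipcode[:keep] + "*"


def suppress(value: str, should_suppress: bool) -> str:
    """Local suppression: blank out one cell in one record."""
    return "*" if should_suppress else value


# First row of the original table, recoded the way the slide shows it.
record = {"birthdate": "01011961", "zipcode": "02145", "gender": "M"}
released = {
    "birthdate": recode_birthdate(record["birthdate"]),
    "zipcode": recode_zip(record["zipcode"]),
    "gender": suppress(record["gender"], should_suppress=False),
}
print(released)
```

Global recoding applies the same coarsening to every record, while local suppression targets only the cells that would otherwise single someone out.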

k-anonymous – but not protected

Annotations on the slide: Row; Additional background; Sort Order/Structure; Homogeneity
[Sidebar graphic: Law, policy, ethics; Research design …; Information security; Disclosure limitation]

| Name | SSN | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed |
| --- | --- | --- | --- | --- | --- | --- |
| * Jones | * | * 1961 | 021* | M | Raspberry | 0 |
| * Jones | * | * 1961 | 021* | M | Pistachio | 0 |
| * Jones | * | * 1972 | 9404* | * | Chocolate | 0 |
| * Jones | * | * 1972 | 9404* | * | Hazelnut | 0 |
| * Jones | * | * 1972 | 9404* | * | Lemon | 0 |
| * Jones | * | * | 021* | F | Lemon | 1 |
| * Jones | * | * | 021* | F | Peach | 1 |
| * Smith | * | * 1973 | 63* | * | Lime | 2 |
| * Smith | * | * 1973 | 63* | * | Mango | 4 |
| * Smith | * | * 1973 | 63* | * | Coconut | 16 |
| * Smith | * | * 1974 | 64* | M | Frog | 32 |
| * Smith | * | * 1974 | 64* | M | Vanilla | 64 |
| * Smith | * | 04041974 | 64* | F | Pumpkin | 128 |
| * Smith | * | 04041974 | 64* | F | Allergic | 256 |
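
The slide's point, that a table can be k-anonymous and still leak the sensitive value, can be checked mechanically. The sketch below assumes that birthdate, zipcode, and gender are the quasi-identifiers and that the number of crimes is the sensitive attribute; the function names are hypothetical and the rows are transcribed from the masked table on this slide.

```python
from collections import defaultdict

# (birthdate, zipcode, gender, crimes) from the masked table; "*" marks suppressed values.
ROWS = [
    ("* 1961", "021*", "M", 0),
    ("* 1961", "021*", "M", 0),
    ("* 1972", "9404*", "*", 0),
    ("* 1972", "9404*", "*", 0),
    ("* 1972", "9404*", "*", 0),
    ("*", "021*", "F", 1),
    ("*", "021*", "F", 1),
    ("* 1973", "63*", "*", 2),
    ("* 1973", "63*", "*", 4),
    ("* 1973", "63*", "*", 16),
    ("* 1974", "64*", "M", 32),
    ("* 1974", "64*", "M", 64),
    ("04041974", "64*", "F", 128),
    ("04041974", "64*", "F", 256),
]


def equivalence_classes(rows, n_quasi=3):
    """Group sensitive values by the tuple of quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[row[:n_quasi]].append(row[n_quasi])
    return classes


def k_anonymity(rows):
    """Size of the smallest equivalence class."""
    return min(len(values) for values in equivalence_classes(rows).values())


def homogeneous_classes(rows):
    """Classes in which every record shares the same sensitive value."""
    return {
        quasi: values[0]
        for quasi, values in equivalence_classes(rows).items()
        if len(set(values)) == 1
    }


print("k =", k_anonymity(ROWS))
for quasi, crimes in homogeneous_classes(ROWS).items():
    print(quasi, "-> every member has crimes =", crimes)
```

Anyone known to fall in a homogeneous class has their sensitive value disclosed, even though no single row can be attributed to them.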
Climate

Commercial Data Breaches
• Data from 100 million individuals exposed this year…
• Only a portion of breaches are reported
• Difficult to trace impacts… but an estimated 8.3M identity thefts in 2005

Source: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/
Cloud computing risks
• Cloud computing decouples
physical and computing
infrastructure
• Increasingly used for core-IT,
research computing, data
collection, storage, and
analysis
• Confidentiality issues
– Auditing and compliance
– Access and commingling of
data
– Location of data and services
and legal jurisdiction
– Vulnerabilities of network
communication using single
well-known key
– Vulnerability of key storage
Legal & Cultural Challenges
• EU right to be forgotten;
French “le droit à l'oubli”;
California social media privacy act
• Consumer privacy bill of rights;
Do not track;
Privacy Icons
• Evolving case law on locational privacy
• Public records, mug shots, and revenge porn
• State-level action on privacy regulation
• Attitudes towards sharing; surveillance
New Data – New Challenges
• How to limit disclosure without completely destroying utility?
– The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [Narayanan and Shmatikov 2008]
– The “GIS Problem”: fine geo-spatial-temporal data are impossible to mask when correlated with external data [Zimmerman 2008]
– The “Facebook Problem”: masked network data can be re-identified if only a few nodes are controlled [Backstrom, et al. 2007]
– The “Blog Problem”: pseudonymous communication can be linked through textual analysis [Tomkins et al. 2004]
[For more examples see Vadhan, et al. 2010]
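
The linkage idea behind the “Netflix Problem” can be illustrated with a toy example. This is a deliberately simplified sketch, not the scoring method of the cited paper: it uses plain Jaccard overlap between item sets, and all pseudonyms, names, and item labels are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two item sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link(released: dict, auxiliary: dict, threshold: float = 0.5) -> dict:
    """Map each released pseudonym to its best-matching auxiliary identity.

    Assumes `auxiliary` is non-empty; matches below `threshold` are dropped.
    """
    matches = {}
    for pseudonym, items in released.items():
        best_name, best_score = max(
            ((name, jaccard(items, known)) for name, known in auxiliary.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            matches[pseudonym] = best_name
    return matches


# "Anonymized" release: pseudonym -> items rated (hypothetical data).
released = {
    "user_17": {"film_a", "film_b", "film_q"},
    "user_42": {"film_c", "film_d"},
}
# Public auxiliary data: real name -> items discussed elsewhere (hypothetical data).
auxiliary = {
    "alice": {"film_a", "film_q"},
    "bob": {"film_c", "film_d", "film_z"},
}
print(link(released, auxiliary))
```

Because sparse records rarely overlap by chance, even a partial match on a handful of items can be enough to single out one pseudonym.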

Source: [Calabrese 2008; Real Time Rome Project 2007]
Weather

Possible Legal/Regulatory Changes for 2013-15
• Likely
– New information privacy laws in selected states
– Increased open data requirements from federal funders
– Adoption of data availability requirements by increasing numbers of journals

Research

Traditional approaches are failing
• Modal traditional approach:
– removing subjects’ names
– storing descriptive information in a locked filing cabinet
– publishing summary tables
– (sometimes) releasing a public use version that suppresses and recodes descriptive information

• Problems
– law is changing – requirements are becoming more
complex
– research computing is moving towards the cloud, other
distributed storage
– researchers are using new forms of data that create new
privacy issues
– advances in the formal analysis of disclosure risk imply the
impracticality of “de-identification” as required by law
Privacy Tools for Sharing Research Data
A National Science Foundation Secure and Trustworthy Cyberspace Project
Supported by award #1237235

• Differentially Private Algorithms Shield Individuals in Databases
• The Dataverse Network will Distribute and Manage Confidential Databases
• Policy Tools Guide Information Management Across the Research Lifecycle
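
As a rough illustration of what a differentially private algorithm does, the sketch below adds Laplace noise to a counting query. It is a textbook mechanism chosen for illustration, not the project's implementation; the function names are hypothetical, and the data are the "# of crimes committed" values from the example table earlier in the deck.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of values satisfying `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# "# of crimes committed" column from the example table.
crimes = [0, 0, 0, 0, 0, 1, 1, 2, 4, 16, 32, 64, 128, 256]
rng = random.Random(0)  # seeded only so the demo is repeatable
noisy = private_count(crimes, lambda c: c >= 1, epsilon=0.5, rng=rng)
print(f"noisy count of people with at least one crime: {noisy:.2f}")
```

Smaller values of epsilon add more noise and give stronger protection; larger values give answers closer to the true count.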
Approaches
• Policy
– Legal Reforms
– Information Accountability
– Economic rights
– Information transparency
– Privacy Nudges
– Privacy Icons
• Cryptography
– Multiparty computation
– Zero knowledge protocols
– Functional encryption
– Homomorphic encryption
• Statistics
– Synthetic data
– Reidentification risk
– K-anonymity; homogeneity
– Differential privacy
• Information Lifecycle & Infrastructure
– Open consent
– Metadata frameworks
– Information accountability
– Policy aware filesystems
– Data Vaults
– Secure data enclave
– Standardized Data Use Agreements
• Examples named on the slide: Aboutmydata.com; iRODS; Project VRM
Recent Work –
Economics & Public Policy Research/Outreach
• March 2013 – Dwork & Vadhan lead roundtable in Differential Privacy and Law and Policy (conference), Cardozo Law School
• March 2013 – Altman provided oral comments (recorded) on Public Workshop on Revisions to the Common Rule, National Academies, on limits of HIPAA approach to privacy.
• May 2013 – Altman & Crosas submitted written testimony to Public Access to Federally-Supported Research and Development Data, National Academies; including approaches to management of privacy for data sharing.
• June 2013 – Dwork, Sweeney, & Vadhan invited & participated in Privacy Law Scholars Conference, George Washington Law School/Berkeley Law School
• June 2013 – Yiling Chen, Stephen Chong, Ian Kash, Tal Moran, and Salil Vadhan. “Truthful Mechanisms for Agents that Value Privacy”, Proceedings of the 14th ACM Conference on Electronic Commerce (EC), June 2013.
• September – Integrating Approaches to Privacy across the Research Lifecycle Workshop
• In Progress – Rewrite and expansion of Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”, proposing framework for integrating modern privacy concepts into Human Subjects protections.
Information Life Cycle Model
[Diagram, lifecycle stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use (Scientometric, Education, Scientific, Policy); Long-term access]
[Diagram, other labels: Research methods; Statistical / Computational Frameworks; Data Management Systems; Legal / Policy Frameworks]

Example: Stakeholder Concerns Across Lifecycle

| Stakeholder | Concerns | Legal Issues |
| --- | --- | --- |
| Research Consumers: Readers; Secondary researcher | Replicate and extend; Secondary analysis; Link research | Fair Use |
| Research Publishers: Print publishers; Research archives | Replicable research; Promote use of their publications; Protect publisher IP; Avoid third party IP/Privacy issues | Copyright; Licensing |
| Project Personnel: Investigators; Research staff | Replicable research; Publish; Promote use of publications; Track use | Copyright |
| Research sponsors: Home institution; Funding sources | Replicable research; Policy relevance; Accessibility of research; Protect IP; Avoid third party IP/Privacy issues | Licensing; Freedom of Information; Copyright |
| Research sources: Research subjects; Owners of subject material; Owners of supplementary data | Privacy; Confidentiality; Intellectual Property | Licensing; Copyright; DMCA; Informed Consent; Privacy; Trade secrets |

[Diagram label: Information Transfer]

Modeling Features

| Features | Characteristics |
| --- | --- |
| Data | Structure; Source; Unit of observation; Attribute types; Dimensionality; Number of observations; homogeneity; frequency of updates; quality characteristics |
| Analytic Results | Form of output; analysis methodology; analysis/inferential goal; utility/loss/quality |
| Disclosure scenario | Source of threat; areas of vulnerability; attacker objectives, background knowledge, capability; Breach criteria/disclosure concept |
| Stakeholders | Stakeholder types; capacities; trust relationships; budgets |
| Lifecycle characteristics | Lifecycle stages controlled/in scope; policies used; stakeholders involved at each stage |
| Current privacy management approach | Regulation/policy; legal controls; statistical/computational disclosure methods; information security controls |

Legal/Policy Frameworks
[Diagram, region headings: Intellectual Property; Contract; Access Rights; Confidentiality; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …)]
[Diagram, labels:]
• Contract; Click-Wrap TOU; License; Trade Secret
• Copyright; Patent; Trademark; Database Rights; Attribution; Moral Rights; Fair Use; DMCA
• Journal Replication Requirements; Funder Open Access; FOIA; State FOI Laws
• Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; CIPSEA; State Privacy Laws
• Privacy Torts (Invasion, Defamation); Rights of Publicity
• Classified; Sensitive but Unclassified; Export Restrictions; EAR; ITAR

Risk Assessment
• [NIST 800-100, simplification of NIST 800-30]

Information Security Control Selection Process
[Diagram, steps shown: System Analysis; Threat Modeling; Vulnerability Identification; Analysis (likelihood, impact, mitigating controls); Institute Selected Controls; Testing and Auditing]
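
The analysis step, which weighs likelihood against impact, is often done with a simple qualitative matrix. The sketch below is one common convention and not something prescribed by the slide: the three-level scale, the multiplication rule, and the example threats are all assumptions made for illustration.

```python
# Qualitative levels mapped to numbers; the scale itself is an assumption.
LEVELS = {"low": 1, "moderate": 2, "high": 3}


def risk_score(likelihood: str, impact: str) -> int:
    """Combine likelihood and impact into a single ordinal score."""
    return LEVELS[likelihood] * LEVELS[impact]


def rank_threats(threats):
    """Sort (name, likelihood, impact) tuples from highest to lowest risk."""
    return sorted(threats, key=lambda t: risk_score(t[1], t[2]), reverse=True)


# Hypothetical threats for a research data project.
THREATS = [
    ("lost unencrypted laptop", "moderate", "high"),
    ("denial of service on model server", "moderate", "low"),
    ("breach of authentication layer", "low", "high"),
]

ranked = rank_threats(THREATS)
for name, likelihood, impact in ranked:
    print(f"{risk_score(likelihood, impact)}  {name}")
```

The ranking is only a starting point for choosing controls; mitigating controls already in place would lower the likelihood or impact assigned to a threat.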

Systems Policy Research questions deriving from
Information Lifecycle Analysis
• Infrastructure requirements analysis
– Data acquisition, storage, dissemination
– Identification, authorization, authentication
– Metadata, protocols
• System design: potential implementation cost of interactive privacy:
– Information security – hardening
– Information security – certification & auditing
– Model server development, provisioning, maintenance, reliability, availability
• System design: information security tradeoffs of interactive privacy mechanisms:
– Availability risks: denial of service attack
– Availability/integrity risks: privacy budget exhaustion attacks
– Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
– Secrecy/privacy: breach of authentication/authorization layer
• System design: optimizing privacy & utility across lifecycle
– When does limiting disclosive data collection dominate methods at the data analysis stage?
– When do restricted virtual data enclaves + public synthetic data dominate interactive mechanisms?
• System design: Information use/reuse
– Support of scientific analysis use cases (model diagnostics, exploratory data analysis, integration of external data) within interactive privacy systems.
– Align informational assumptions across stages & incorporating informative priors?
– Requirements for scientific replication/verification of results produced by model servers?
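
The “privacy budget exhaustion” risk listed above is easiest to see with a small accountant. The sketch below assumes simple sequential composition, where the epsilons of answered queries add up; the class and exception names are hypothetical and do not describe any particular model server.

```python
class BudgetExhausted(Exception):
    """Raised when a query would push total spending past the budget."""


class PrivacyBudget:
    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return max(self.total - self.spent, 0.0)

    def charge(self, epsilon: float) -> None:
        """Record the cost of one query, or refuse it if the budget is gone."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted(f"need {epsilon}, only {self.remaining:.3f} left")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
answered = 0
for _ in range(5):  # one client issuing repeated queries
    try:
        budget.charge(0.4)
        answered += 1
    except BudgetExhausted as err:
        print("refused:", err)
print(f"answered {answered} queries; remaining epsilon {budget.remaining:.1f}")
```

With a single shared budget, one client can use up the allowance and deny everyone else further answers, which is why the slide treats exhaustion as an availability risk.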
Legal Policy Research questions deriving from
Information Lifecycle Analysis
• Legal requirements across lifecycle stages
• Legal instruments
-- capturing scientific privacy concepts in legal
instruments consistently across lifecycle
– service level agreements
– consent terms
– deposit agreement
– data usage agreements
– Regulatory language
Public Policy Research Questions
• Where does market fail for sharing confidential research
data?
– What market conditions are theoretically violated?
– What is the empirical evidence of the degree of violation?
– How does the degree of violation vary by policy context & use case?

• Policy equilibria
– What are contribution and privacy equilibria for data sharing
under different privacy concepts?

• Interventions
– How do proposed interventions (e.g. advise & consent;
“privacy icons”, uniform regulations, breach notification,
information accountability, anonymization ) correspond to
sources of market failures?
Beyond Legal Research -- Market Theory
• Conditions on markets [See, e.g., Posner 1978]
– No political/legal distortions
– Common knowledge
– No barriers to entry
• Conditions on exchange [See e.g., Benisch, Kelley, Sadeh, & Cranor 2011; McDonald & Cranor 2010]
– No transaction costs
– No information asymmetries
• Conditions on agents [See e.g. Acquisti 2010; Tsai, Egelman, Cranor & Acquisti 2010]
– Perfect rationality
– Self-interested
– Infinitely many agents
– Stable preferences
• Conditions on equilibrium valuation
– Pareto optimality vs. economic surplus
– Ignorability of distributional concern
• Conditions on goods
– Consumptive goods
– Excludable goods
– Decreasing returns to scale
– Transferability
– No externalities
Bibliography (Selected)

• L. Willenborg and T. D. Waal. Elements of Statistical Disclosure Control, volume 155 of Lecture Notes in Statistics. Springer Verlag, New York, NY, 2001.
• Higgins, Sarah. "The DCC curation lifecycle model." International Journal of Digital Curation 3.1 (2008): 134-140. www.dcc.ac.uk/resources/curation-lifecycle-model
• ESSNET, Handbook on Statistical Disclosure Control. 2011. neon.vb.cbs.nl/casc/SDC_Handbook.pdf
• Fung, Benjamin, et al. "Privacy-preserving data publishing: A survey of recent developments." ACM Computing Surveys (CSUR) 42.4 (2010): 14.
• Altman, M. (2012). “Mitigating Threats To Data Quality Throughout the Curation Lifecycle.” In G. Marciano, C. Lee, & H. Bowden (Eds.), Curating For Quality. datacuration.web.unc.edu
Information Privacy Across the Research
Lifecycle
Managing Confidential Information – Trends and Approaches

More Related Content

What's hot

July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...Micah Altman
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPMicah Altman
 
Confidential data management_key_concepts
Confidential data management_key_conceptsConfidential data management_key_concepts
Confidential data management_key_conceptsMicah Altman
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyMicah Altman
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESMicah Altman
 
Redistricting and Voting Technology
Redistricting and Voting TechnologyRedistricting and Voting Technology
Redistricting and Voting TechnologyMicah Altman
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSMicah Altman
 
Data Sharing & Data Citation
Data Sharing & Data CitationData Sharing & Data Citation
Data Sharing & Data CitationMicah Altman
 
Data Citation Rewards and Incentives
 Data Citation Rewards and Incentives Data Citation Rewards and Incentives
Data Citation Rewards and IncentivesMicah Altman
 
Best Practices for Sharing Economics Data
Best Practices for Sharing Economics DataBest Practices for Sharing Economics Data
Best Practices for Sharing Economics DataMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Emerging Data Citation Infrastructure
Emerging Data Citation InfrastructureEmerging Data Citation Infrastructure
Emerging Data Citation InfrastructureMicah Altman
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Meeting Federal Research Requirements
Meeting Federal Research RequirementsMeeting Federal Research Requirements
Meeting Federal Research RequirementsICPSR
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014
Instructional Data Sets from Q-step Launch Event (Univ of Exeter) 3-20-2014ICPSR
 

What's hot (20)

July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTP
 
Confidential data management_key_concepts
Confidential data management_key_conceptsConfidential data management_key_concepts
Confidential data management_key_concepts
 
Comments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data PrivacyComments to FTC on Mobile Data Privacy
Comments to FTC on Mobile Data Privacy
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
Redistricting and Voting Technology
Redistricting and Voting TechnologyRedistricting and Voting Technology
Redistricting and Voting Technology
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
Managing Confidential Information – Trends and Approaches

  • 1. Prepared for MIT Libraries Program on Information Research Brown Bag Talk September 2013 Managing Confidential Information – Trends and Approaches Dr. Micah Altman <escience@mit.edu> Director of Research, MIT Libraries
  • 2. Standard Disclaimer These opinions are my own, they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators Secondary disclaimer: “It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc. Information Privacy Across the Research Lifecycle
  • 3. Collaborators & Co-Conspirators • Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.) http://privacytools.seas.harvard.edu/people • Research Support: Supported in part by NSF grant CNS-1237235
  • 4. Related Work
      Main Project:
      • Privacy Tools for Sharing Research Data, http://privacytools.seas.harvard.edu/
      Related publications:
      • Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. (2011). Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press.
      • Vadhan, S., et al. (2010). “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”. Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf
      • Altman, M. (2012). “Mitigating Threats To Data Quality Throughout the Curation Lifecycle”. In G. Marciano, C. Lee, & H. Bowden (Eds.), Curating For Quality. datacuration.web.unc.edu
      These slides & most reprints available from: informatics.mit.edu
  • 5. Level Setting
  • 6. Identifying Information Is Common
      • Includes information from a variety of sources, such as…
        – Research data, even if you aren’t the original collector
        – Student “records” such as e-mail, grades
        – Logs from web-servers, other systems
      • Lots of things are potentially identifying:
        – Under some federal laws: IP addresses, dates, zipcodes, …
        – Birth date + zipcode + gender uniquely identify ~87% of people in the U.S. [Sweeney 2002] Try it: http://aboutmyinfo.org/index.html
        – With date and place of birth, can guess the first five digits of a social security number (SSN) > 60% of the time. (Can guess the whole thing in under 10 tries, for a significant minority of people.) [Acquisti & Gross 2009]
        – Analysis of writing style or eclectic tastes has been used to identify individuals
      • Tables, graphs and maps can also reveal identifiable information [Brownstein et al. 2006, NEJM 355(16)]
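The point that a few ordinary attributes can single people out can be illustrated with a minimal sketch. It assumes a list of records held as dictionaries and measures what share of them have a unique combination of chosen "quasi-identifier" fields. The function name, field names, and toy records are hypothetical and are not taken from the talk or from the cited studies.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    # Build one key per record from the selected fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy records, invented for illustration only.
records = [
    {"birth_date": "1961-03-02", "zip": "02139", "gender": "F"},
    {"birth_date": "1961-03-02", "zip": "02139", "gender": "M"},
    {"birth_date": "1972-07-15", "zip": "94041", "gender": "F"},
    {"birth_date": "1972-07-15", "zip": "94041", "gender": "F"},
]

all_three = unique_fraction(records, ["birth_date", "zip", "gender"])
zip_only = unique_fraction(records, ["zip"])
```

In this toy data no record is unique on zipcode alone, but half of the records become unique once birth date and gender are added, which is the same effect the slide describes at population scale.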
  • 7. Some Sources of Confidentiality Restrictions for University Held Research and Education Information • Overlapping laws • Different laws apply to different cases • Additional data usage agreements and license terms apply
  • 8. Different Requirements and Definitions
      Coverage – FERPA: Students in educational institutions | HIPAA: Medical information in “Covered Entities” | Common Rule: Living persons in research by funded institutions | MA 201 CMR 17: Mass. residents
      Identification Criteria – FERPA: Direct, Indirect, Linked, Bad intent (!) | HIPAA: Direct, Indirect, Linked | Common Rule: Direct, Indirect, Linked | MA 201 CMR 17: Direct
      Sensitivity Criteria – FERPA: Any non-directory information | HIPAA: Any medical information | Common Rule: Private information, based on harm | MA 201 CMR 17: Financial, State, Federal identifiers
      Management Requirements – FERPA: Directory opt-out, [Implied] good practice | HIPAA: Consent, Specific technical safeguards, Breach notification | Common Rule: Consent, [Implied] risk minimization | MA 201 CMR 17: Specific technical safeguards, Breach notification
  • 9. * * 2010 Information Privacy Across the Research Lifecycle
  • 10. Recognized Benefits of Data Sharing • Pioneering NRC report [Fienberg, et al. 1985] on data sharing recommended: – Sharing data should be a regular practice. – Investigators should share their data by the time of publication of initial major results of analyses of the data except in compelling circumstances. – Data relevant to public policy should be shared as quickly and widely as possible. – Plans for data sharing should be an integral part of a research plan whenever data sharing is feasible. • Numerous subsequent reports recommend data sharing.
  • 11. Private Information & Information Services • Recommendations • Annotations & Tagging • Class discussion forum • Social Highlighting
  • 12. Access Control Model [diagram labels: Access Control; Resource; Auditing; Client; Authorization; Credentials; Authentication; Request/Response; Log; Resource Control Model; External Auditor]
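The access control model on this slide names three cooperating functions: authentication (checking credentials), authorization (checking what the client may do with a resource), and auditing (logging each request and its outcome). A minimal sketch of how these pieces might fit together follows. The class name `AccessController`, the token scheme, and the log format are all illustrative assumptions and are not taken from the talk.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class AccessController:
    """Hypothetical request handler: authenticate, authorize, then audit."""

    tokens: dict            # user -> secret token (ASCII strings in this sketch)
    permissions: dict       # user -> set of (resource, action) pairs
    audit_log: list = field(default_factory=list)

    def authenticate(self, user, token):
        expected = self.tokens.get(user)
        # compare_digest avoids leaking information through comparison timing.
        return expected is not None and hmac.compare_digest(expected, token)

    def authorize(self, user, resource, action):
        return (resource, action) in self.permissions.get(user, set())

    def request(self, user, token, resource, action):
        if not self.authenticate(user, token):
            decision = "denied:authentication"
        elif not self.authorize(user, resource, action):
            decision = "denied:authorization"
        else:
            decision = "granted"
        # Every request is logged, including denials, so an auditor can review it.
        self.audit_log.append((user, resource, action, decision))
        return decision == "granted"


controller = AccessController(
    tokens={"alice": "s3cret"},
    permissions={"alice": {("survey.csv", "read")}},
)
```

In a real deployment the log would live outside the controlled system so that an external auditor can trust it; the in-memory list here only illustrates the flow.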
  • 13. Disclosure Limitation Data InputOutput Model Contingency table “The correlation between X and Y was large and statistically significant” Summary statistics DATA Information Visualization DATA * Jones * * 1961 021* * Jones * * 1961 021* * Jones * * 1972 9404* * Jones * * 1972 9404* * Jones * * 1972 9404* Public use sample microdata Published Outputs Information Privacy Across the Research Lifecycle
  • 14. Example Information Privacy Across the Research Lifecycle
  • 15. Exemplar: Social Media Analysis
      • Data: Structure – network
      • Data: Attribute Types – Continuous/Discrete; Scale: ratio/interval/ordinal/nominal
      • Data: Performance Characteristics – 10M-1B observations; sample from stream of continuously updated corpus; dozens of dimensions/measures
      • Measurement: Unit of Observation – Individuals; Interactions
      • Measurement: Measurement type – Observational
      • Measurement: Performance characteristic – High volume; complex network structure; sparsity; systematic and sparse metadata
      • Management Constraints – License; Replication
      • Analysis methods – Bespoke algorithms (clustering); nonlinear optimization; Bayesian methods
      • Desired Outputs – Summary scalars (model coefficients); summary table; static/interactive visualization
      • More Information
        – Grimmer, Justin, and Gary King. "General purpose computer-assisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650.
        – King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012.
        – Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721.
  • 16. What’s wrong with this picture?
      Name           SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
      A. Jones       12341  01011961   02145    M       Raspberry           0
      B. Jones       12342  02021961   02138    M       Pistachio           0
      C. Jones       12343  11111972   94043    M       Chocolate           0
      D. Jones       12344  12121972   94043    M       Hazelnut            0
      E. Jones       12345  03251972   94041    F       Lemon               0
      F. Jones       12346  03251972   02127    F       Lemon               1
      G. Jones       12347  08081989   02138    F       Peach               1
      H. Smith       12348  01011973   63200    F       Lime                2
      I. Smith       12349  02021973   63300    M       Mango               4
      J. Smith       12350  02021973   63400    M       Coconut             16
      K. Smith       12351  03031974   64500    M       Frog                32
      L. Smith       12352  04041974   64600    M       Vanilla             64
      M. Smith       12353  04041974   64700    F       Pumpkin             128
      N. SmithJones  12354  04041974   64800    F       Allergic            256
  • 17. What’s wrong with this picture?
      Annotations: HIPAA & MA Identifier; Identifier & Sensitive; HIPAA Identifier; HIPAA Identifier; Sensitive; Indirect Identifier; Mass resident; Californian; Twins, separated at birth?; FERPA too?; Unexpected Response?
      Name      SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
      A. Jones  12341  01011961   02145    M       Raspberry           0
      B. Jones  12342  02021961   02138    M       Pistachio           0
      C. Jones  12343  11111972   94043    M       Chocolate           0
      D. Jones  12344  12121972   94043    M       Hazelnut            0
      E. Jones  12345  03251972   94041    F       Lemon               0
      F. Jones  12346  03251972   02127    F       Lemon               1
      G. Jones  12347  08081989   02138    F       Peach               1
      H. Smith  12348  01011973   63200    F       Lime                2
      I. Smith  12349  02021973   63300    M       Mango               4
      J. Smith  12350  02021973   63400    M       Coconut             16
      K. Smith  12351  03031974   64500    M       Frog                32
      L. Smith  12352  04041974   64600    M       Vanilla             64
      M. Smith  12353  04041974   64700    F       Pumpkin             128
      N. Smith  12354  04041974   64800    F       Allergic            256
      v. 23 (7/18/2013) Managing Confidential Data
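One way to see why birthdate, zipcode and gender count as identifiers even after names and SSNs are removed is to measure how many records are unique on those fields alone. The sketch below does this for the toy table on the slide; the function names are illustrative choices, not part of the talk.

```python
from collections import Counter

# (name, birthdate as MMDDYYYY, zipcode, gender), copied from the slide's toy table.
RECORDS = [
    ("A. Jones", "01011961", "02145", "M"),
    ("B. Jones", "02021961", "02138", "M"),
    ("C. Jones", "11111972", "94043", "M"),
    ("D. Jones", "12121972", "94043", "M"),
    ("E. Jones", "03251972", "94041", "F"),
    ("F. Jones", "03251972", "02127", "F"),
    ("G. Jones", "08081989", "02138", "F"),
    ("H. Smith", "01011973", "63200", "F"),
    ("I. Smith", "02021973", "63300", "M"),
    ("J. Smith", "02021973", "63400", "M"),
    ("K. Smith", "03031974", "64500", "M"),
    ("L. Smith", "04041974", "64600", "M"),
    ("M. Smith", "04041974", "64700", "F"),
    ("N. Smith", "04041974", "64800", "F"),
]


def unique_fraction(records, key):
    """Share of records whose key value is held by no other record."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def by_birthdate(record):
    return record[1]


def by_quasi_identifiers(record):
    # Birthdate, zipcode and gender together; the name is ignored.
    return record[1:]


print(unique_fraction(RECORDS, by_birthdate))          # birthdate alone
print(unique_fraction(RECORDS, by_quasi_identifiers))  # birthdate + zipcode + gender
```

In this toy table half of the records are unique on birthdate alone, and every record is unique once zipcode and gender are added, which mirrors the population-level pattern reported by Sweeney.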
  • 18. Help, help, I’m being suppressed…
      Annotations: Synthetic Var; Global Recode; Local Suppression; Aggregation + Perturbation; Row
      Name       SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
      [Name 1]   12341  *1961      021*     M       Raspberry           .1
      [Name 2]   12342  *1961      021*     M       Pistachio           -.1
      [Name 3]   12343  *1972      940*     M       Chocolate           0
      [Name 4]   12344  *1972      940*     M       Hazelnut            0
      [Name 5]   12345  *1972      940*     F       Lemon               .6
      [Name 6]   12346  *1972      021*     F       Lemon               .6
      [Name 7]   12347  *1989      021*     *       Peach               64.6
      [Name 8]   12348  *1973      632*     F       Lime                3
      [Name 9]   12349  *1973      633*     M       Mango               3
      [Name 10]  12350  *1973      634*     M       Coconut             37.2
      [Name 11]  12351  *1974      645*     M       *                   37.2
      [Name 12]  12352  *1974      646*     M       Vanilla             37.2
      [Name 13]  12353  *1974      647*     F       *                   64.4
      [Name 14]  12354  *1974      648*     F       Allergic            256
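The "global recode" step on this slide coarsens every birthdate to a year and every zipcode to a three-digit prefix. A rough sketch of that one step, applied to the original toy records, is below. The helper names are hypothetical, and the other techniques named on the slide (synthetic variables, local suppression, aggregation and perturbation) are not implemented here.

```python
from collections import Counter


def global_recode(birthdate, zipcode):
    """Coarsen MMDDYYYY to '*YYYY' and a 5-digit zipcode to its 3-digit prefix."""
    return "*" + birthdate[-4:], zipcode[:3] + "*"


# (birthdate, zipcode, gender) from the slide's toy table.
RECORDS = [
    ("01011961", "02145", "M"), ("02021961", "02138", "M"),
    ("11111972", "94043", "M"), ("12121972", "94043", "M"),
    ("03251972", "94041", "F"), ("03251972", "02127", "F"),
    ("08081989", "02138", "F"), ("01011973", "63200", "F"),
    ("02021973", "63300", "M"), ("02021973", "63400", "M"),
    ("03031974", "64500", "M"), ("04041974", "64600", "M"),
    ("04041974", "64700", "F"), ("04041974", "64800", "F"),
]


def class_sizes(records):
    """Count records sharing each recoded (birth year, zip prefix, gender) triple."""
    return Counter(global_recode(b, z) + (g,) for b, z, g in records)


def smallest_class(records):
    """The k in k-anonymity for the recoded quasi-identifiers."""
    return min(class_sizes(records).values())
```

On these records the recode merges only a few rows: most recoded combinations still describe a single person, so recoding alone leaves k at 1 and further suppression or coarsening is needed.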
  • 19. k-anonymous – but not protected
      Annotations: Additional background; Sort Order/Structure; Homogeneity
      Name     SSN  Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
      * Jones  *    * 1961     021*     M       Raspberry           0
      * Jones  *    * 1961     021*     M       Pistachio           0
      * Jones  *    * 1972     9404*    *       Chocolate           0
      * Jones  *    * 1972     9404*    *       Hazelnut            0
      * Jones  *    * 1972     9404*    *       Lemon               0
      * Jones  *    *          021*     F       Lemon               1
      * Jones  *    *          021*     F       Peach               1
      * Smith  *    * 1973     63*      *       Lime                2
      * Smith  *    * 1973     63*      *       Mango               4
      * Smith  *    * 1973     63*      *       Coconut             16
      * Smith  *    * 1974     64*      M       Frog                32
      * Smith  *    * 1974     64*      M       Vanilla             64
      * Smith  *    04041974   64*      F       Pumpkin             128
      * Smith  *    04041974   64*      F       Allergic            256
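The homogeneity problem flagged on this slide can be checked mechanically: a table may be k-anonymous on its quasi-identifiers and still disclose the sensitive value whenever everyone in a group shares it. The sketch below uses rows reconstructed from the flattened slide text (the reconstruction is an assumption), with birthdate, zipcode and gender as quasi-identifiers and the crime count as the sensitive attribute.

```python
from collections import defaultdict

# (birthdate, zipcode, gender, crimes), as reconstructed from the slide.
ROWS = [
    ("1961", "021*", "M", 0), ("1961", "021*", "M", 0),
    ("1972", "9404*", "*", 0), ("1972", "9404*", "*", 0), ("1972", "9404*", "*", 0),
    ("*", "021*", "F", 1), ("*", "021*", "F", 1),
    ("1973", "63*", "*", 2), ("1973", "63*", "*", 4), ("1973", "63*", "*", 16),
    ("1974", "64*", "M", 32), ("1974", "64*", "M", 64),
    ("04041974", "64*", "F", 128), ("04041974", "64*", "F", 256),
]


def equivalence_classes(rows):
    """Group sensitive values by their quasi-identifier tuple."""
    classes = defaultdict(list)
    for *quasi, sensitive in rows:
        classes[tuple(quasi)].append(sensitive)
    return classes


def k_anonymity(rows):
    """Size of the smallest group sharing the same quasi-identifiers."""
    return min(len(values) for values in equivalence_classes(rows).values())


def homogeneous_classes(rows):
    """Groups in which every member has the same sensitive value."""
    return {
        quasi: values[0]
        for quasi, values in equivalence_classes(rows).items()
        if len(set(values)) == 1
    }


print(k_anonymity(ROWS))
print(homogeneous_classes(ROWS))
```

Here every group has at least two members, yet in three of the six groups all members share one crime count, so anyone who can place a person in such a group learns that value without identifying the exact row.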
  • 20. Climate Information Privacy Across the Research Lifecycle
  • 21. Commercial Data Breaches • Data from 100 million individuals exposed this year… • Only a portion of breaches are reported • Difficult to trace impacts… but estimated 8.3M identity thefts in 2005 Source: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/
  • 22. Cloud computing risks • Cloud computing decouples physical and computing infrastructure • Increasingly used for core-IT, research computing, data collection, storage, and analysis • Confidentiality issues – Auditing and compliance – Access and commingling of data – Location of data and services and legal jurisdiction – Vulnerabilities of network communication using single well-known key – Vulnerability of key storage
  • 23. Legal & Cultural Challenges • EU right to be forgotten; French “le droit à l'oubli”; California social media privacy act • Consumer privacy bill of rights; Do not track; Privacy Icons • Evolving case law on locational privacy • Public records, mug shots, and revenge porn • State-level action on privacy regulation • Attitudes towards sharing; surveillance
  • 24. New Data – New Challenges • How to limit disclosure without completely destroying utility? – The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [Narayanan and Shmatikov 2008] – The “GIS Problem”: fine geo-spatial-temporal data are impossible to mask when correlated with external data [Zimmerman 2008] – The “Facebook Problem”: it is possible to identify masked network data if only a few nodes are controlled [Backstrom, et al. 2007] – The “Blog Problem”: pseudonymous communication can be linked through textual analysis [Tomkins, et al. 2004] [For more examples see Vadhan, et al. 2010] Source: [Calabrese 2008; Real Time Rome Project 2007]
  • 25. Weather Information Privacy Across the Research Lifecycle
  • 26. Possible Legal/Regulatory Changes for 2013-15 • Likely – New information privacy laws in selected states – Increased open data requirements from federal funders – Adoption of data availability requirements by increasing numbers of journals
  • 27. Information Privacy Across the Research Lifecycle
  • 28. Research Information Privacy Across the Research Lifecycle
  • 29. Traditional approaches are failing • Modal traditional approach: – removing subjects’ names – storing descriptive information in a locked filing cabinet – publishing summary tables – (sometimes) releasing a public use version that suppressed and recoded descriptive information • Problems – law is changing – requirements are becoming more complex – research computing is moving towards the cloud, other distributed storage – researchers are using new forms of data that create new privacy issues – advances in the formal analysis of disclosure risk imply the impracticality of “de-identification” as required by law
  • 30. Privacy Tools for Sharing Research Data A National Science Foundation Secure and Trustworthy Cyberspace Project Supported by award #1237235 • Differentially Private Algorithms Shield Individuals in Databases • The Dataverse Network will Distribute and Manage Confidential Databases • Policy tools Guide Information Management Across the Research Lifecycle
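To make "differentially private algorithms" concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is a textbook illustration under simple assumptions (a count has sensitivity 1, so noise with scale 1/epsilon suffices), not the project's implementation, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching predicate.

    Adding or removing one person changes a count by at most 1, so Laplace
    noise with scale 1/epsilon gives epsilon-differential privacy for this
    single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Example: how many of 100 synthetic ages are under 30?
ages = list(range(100))
noisy = private_count(ages, lambda age: age < 30, epsilon=0.5)
```

Smaller values of epsilon add more noise and give stronger protection; each additional query against the same data spends more of the overall privacy budget.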
  • 31. Approaches • Policy – Legal Reforms – Information Accountability – Economic rights – Information transparency – Privacy Nudges – Privacy Icons • Cryptography – Multiparty computation – Zero knowledge protocols – Functional encryption – Homomorphic encryption • Statistics – Synthetic data – Reidentification risk – K-anonymity; homogeneity – Differential privacy • Information Lifecycle & Infrastructure – Open consent – Metadata frameworks – Information accountability – Policy aware filesystems – Data Vaults – Secure data enclave – Standardized Data Use Agreements • Examples: Aboutmydata.com; IRODs; Project VRM
  • 32. Recent Work – Economics & Public Policy Research/Outreach • March 2013 – Dwork & Vadhan led roundtable in Differential Privacy and Law and Policy (conference), Cardozo Law School • March 2013 – Altman provided oral comments (recorded) on Public Workshop on Revisions to the Common Rule, National Academies, on limits of HIPAA approach to privacy. • May 2013 – Altman & Crosas submitted written testimony to Public Access to Federally-Supported Research and Development Data, National Academies; including approaches to management of privacy for data sharing. • June 2013 – Dwork, Sweeney, & Vadhan invited & participated in Privacy Law Scholars Conference, George Washington Law School/Berkeley Law School • June 2013 – Yiling Chen, Stephen Chong, Ian Kash, Tal Moran, and Salil Vadhan. “Truthful Mechanisms for Agents that Value Privacy”, Proceedings of the 14th ACM Conference on Electronic Commerce (EC), June 2013. • September – Integrating Approaches to Privacy across the Research Lifecycle Workshop • In Progress – Rewrite and expansion of Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”, proposing framework for integrating modern privacy concepts into Human Subjects protections.
  • 33. Information Life Cycle Model [diagram labels: Long-term access; Creation/Collection; Re-use (Scientometric, Education, Scientific, Policy); Storage/Ingest; Research methods; Statistical / Computational Frameworks; Data Management Systems; External dissemination/publication; Analysis; Legal / Policy Frameworks; Processing; Internal Sharing]
  • 34. Example: Stakeholder Concerns Across Lifecycle Legal Issues Stakeholder Concerns Research Consumers - Readers - Secondary researcher Replicate and extend Secondary analysis Link research Research Publishers - Print publishers - Research archives Replicable research Promote use of their publications Protect publisher IP Avoid third party IP/Privacy Issues Copyright Licensing Project Personnel: - Investigators - Research Staff Replicable Research Publish Promote use of Publications Track use Copyright Research sponsors: - Home institution - Funding sources Replicable Research Policy Relevance Accessibility of Research Protect IP Avoid third party IP/Privacy Issues Privacy Research sources: Confidentiality - Research Subjects. Intellectual Property - Owners of subject material - Owners of supplementary data Information Transfer Information Privacy Across the Research Lifecycle Fair Use Licensing Freedom of Information Copyright Licensing Copyright DMCA Informed Consent Privacy Trade secrets
  • 35. Modeling Features
      • Data – Structure; Source; Unit of observation; Attribute types; Dimensionality; Number of observations; homogeneity; frequency of updates; quality characteristics
      • Analytic Results – Form of output; analysis methodology; analysis/inferential goal; utility/loss/quality
      • Disclosure scenario – Source of threat; areas of vulnerability; attacker objectives, background knowledge, capability; Breach criteria/disclosure concept
      • Stakeholders – Stakeholder types; capacities; trust relationships; budgets
      • Lifecycle characteristics – Lifecycle stages controlled/in scope; policies used; stakeholders involved at each stage
      • Current privacy management approach – Regulation/policy; legal controls; statistical/computational disclosure methods; information security controls
  • 36. Legal/Policy Frameworks Intellectual Property Contract Trade Secret Contract Intellectual Attribution Moral Rights Patent Click-Wrap TOU License Database Rights Journal Replication Requirements FOIA State FOI Laws Funder Open Access Fair Use DMCA Trademark Common Rule 45 CFR 46 Copyright Rights of Publicity HIPAA EU Privacy Directive FERPA (Invasion, Defamation) CIPSEA Potentially Harmful State Privacy Laws Classified Access Rights Sensitive but Unclassified Privacy Torts Export Restrictions (Archeological Sites, Endangered Species, Animal Testing, …) EAR ITAR Confidentiality
  • 37. Risk Assessment • [NIST 800-100, simplification of NIST 800-30] Threat Modeling Analysis - likelihood - impact - mitigating controls System Analysis Vulnerability Identification Institute Selected Controls Testing and Auditing Information Security Control Selection Process
  • 38. Systems Policy Research questions deriving from Information Lifecycle Analysis • Infrastructure requirements analysis – Data acquisition, storage, dissemination – Identification, authorization, authentication – Metadata, protocols • System design: potential implementation cost of interactive privacy: – Information security -- hardening – Information security -- certification & auditing – Model server development, provisioning, maintenance, reliability, availability • System design: information security tradeoffs of interactive privacy mechanisms: – Availability risks: denial of service attack – Availability/integrity risks: privacy budget exhaustion attacks – Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks) – Secrecy/privacy: breach of authentication/authorization layer • System design: optimizing privacy & utility across lifecycle – When does limiting disclosive data collection dominate methods at the data analysis stage? – When do restricted virtual data enclaves + public synthetic data dominate interactive mechanisms? • System design: Information use/reuse – Support of scientific analysis use cases (model diagnostics, exploratory data analysis, integration of external data) within interactive privacy systems. – Align informational assumptions across stages & incorporate informative priors? – Requirements for scientific replication/verification of results produced by model servers?
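The "privacy budget exhaustion attack" listed on this slide arises because an interactive privacy system must stop answering once its cumulative epsilon is spent, so a single client can deny service to everyone else by using it up. The sketch below illustrates the idea with a shared budget and a per-client variant. Both classes are hypothetical and assume basic sequential composition, where the epsilons of successive queries simply add.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)

    def try_spend(self, epsilon):
        """Charge epsilon for one query; refuse if the budget cannot cover it."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            return False
        self.spent += epsilon
        return True


class PerClientBudgets:
    """Gives each client its own cap so one client cannot exhaust the others'."""

    def __init__(self, per_client_epsilon):
        self.per_client_epsilon = per_client_epsilon
        self._budgets = {}

    def try_spend(self, client, epsilon):
        budget = self._budgets.setdefault(
            client, PrivacyBudget(self.per_client_epsilon)
        )
        return budget.try_spend(epsilon)
```

Per-client caps trade one risk for another: if clients collude and pool their answers, the total privacy loss is bounded only by the sum of the caps, so the number of admitted clients has to be limited as well.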
  • 39. Legal Policy Research questions deriving from Information Lifecycle Analysis • Legal requirements across lifecycle stages • Legal instruments -- capturing scientific privacy concepts in legal instruments consistently across lifecycle – service level agreements – consent terms – deposit agreement – data usage agreements – Regulatory language
  • 40. Public Policy Research Questions • Where does the market fail for sharing confidential research data? – What market conditions are theoretically violated? – What is the empirical evidence of the degree of violation? – How does the degree of violation vary by policy context & use case? • Policy equilibria – What are contribution and privacy equilibria for data sharing under different privacy concepts? • Interventions – How do proposed interventions (e.g. advise & consent; “privacy icons”; uniform regulations; breach notification; information accountability; anonymization) correspond to sources of market failures?
  • 41. Beyond Legal Research -- Market Theory • Conditions on markets – No political/legal distortions [See, e.g., Posner 1978] – Common knowledge – No barriers to entry • Conditions on exchange [See e.g., Benisch, Kelley, Sadeh, & Cranor 2011; McDonald & Cranor 2010] – No transaction costs – No information asymmetries • Conditions on agents [See e.g. Acquisti 2010; Tsai, Egelman, Cranor & Acquisti 2010] – Perfect rationality – Self-interested – Infinitely many agents – Stable preferences • Conditions on equilibrium valuation – Pareto optimality vs. economic surplus – Ignorability of distributional concern • Conditions on goods – Consumptive goods – Excludable goods – Decreasing returns to scale – Transferability – No externalities
  • 42. Bibliography (Selected) • L. Willenborg and T. de Waal. Elements of Statistical Disclosure Control, volume 155 of Lecture Notes in Statistics. Springer Verlag, New York, NY, 2001. • Higgins, Sarah. "The DCC curation lifecycle model." International Journal of Digital Curation 3.1 (2008): 134-140. www.dcc.ac.uk/resources/curationlifecycle-model • ESSNET, Handbook on Statistical Disclosure Control. 2011. neon.vb.cbs.nl/casc/SDC_Handbook.pdf • Fung, Benjamin, et al. "Privacy-preserving data publishing: A survey of recent developments." ACM Computing Surveys (CSUR) 42.4 (2010): 14. • Altman, M. (2012). “Mitigating Threats To Data Quality Throughout the Curation Lifecycle.” In G. Marciano, C. Lee, & H. Bowden (Eds.), Curating For Quality. datacuration.web.unc.edu

Editor's Notes

  1. Personal information is ubiquitous, and it is becoming increasingly easy to link information to individuals. Laws, regulations and policies governing information privacy are complex, but most intervene through either access control or anonymization at the time of data publication. Trends in information collection and management (cloud storage, "big" data, and debates about the right to limit access to published but personal information) complicate data management and make traditional approaches to managing confidential data decreasingly effective. This session, presented as part of the Program on Information Science seminar series, examines trends in information privacy. The session also discusses emerging approaches and research around managing confidential research information throughout its lifecycle. This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. Other image source: wikimedia commons