Context, Causality, and Information Flow:
Implications for Privacy Engineering, Security,
and Data Economics
A presentation of doctoral dissertation research by
Sebastian Benthall
UC Berkeley School of Information
Outline
● Motivation
● Overview of Projects
● Project #1: Contextual Integrity through the Lens of Computer Science
● Disciplinary Bridge
● Project #2: Origin Privacy: Causality and Data Protection
● Project #3: Data Games and the Value of Information
● Concluding remarks
THIS IS A FUN INTERACTIVE TALK:
An image slide means it is time to ask a question.
I’ll answer one question per picture.
Motivation
Four modalities regulating cyberspace. (Lessig, 2009)
[Diagram, built up across several slides: the four modalities (Social Norms, Market, Law, Technology), with the Tech Industry and Public Regulation overlaid on them. Each corner speaks its own language: a² + b² = c² on the technical side, "Words, words, words…" on the legal and philosophical side, with CS/Statistics/EE, Law, Social Philosophy, and Economics as the corresponding disciplines. Where they meet: #?!$?!@?!*?!]
We can’t ignore this problem.
Overview
Project #1: "Contextual Integrity through the Lens of Computer Science"
(bridging Social Norms and Technology)
...a typical I School dissertation...
Surveys the use of Contextual Integrity, a theory of privacy norms, in Computer Science.
Identifies theoretical gaps in CI and opportunities for innovation in privacy CS.
Project #2: "Origin Privacy: Causality and Data Protection"
(bridging Law and Technology)
...a typical I School dissertation...
Identifies information flow restrictions in law based on both semantics and origin.
Resolves this ambiguity through theoretical contribution: situated information flow.
Shows application of this contribution to information security in embedded systems.
Project #3: "Data Games and the Value of Information"
(bridging Market and Law)
...a typical I School dissertation...
Invents data economics to fill gap in economic theory.
Theoretical contribution: data games, mechanism design for information flow.
Answers: what is the value of information? Demonstrates with several examples.
...a typical I School dissertation...
problem solved
:-p
Project #1:
Contextual Integrity
through the Lens of
Computer Science
(bridging Social Norms and Technology)
Contextual Integrity through the Lens of Computer Science
Sebastian Benthall
Seda Gürses
Helen Nissenbaum
A presentation of S. Benthall, S. Gürses and H. Nissenbaum. Contextual Integrity through the Lens of Computer
Science. Foundations and Trends in Privacy and Security, vol. 2, no. 1, pp. 1–69, 2017
Included as Chapter 2 of Sebastian Benthall’s doctoral dissertation, titled
“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data
Economics”
Project Goals
● Characterize different ways various CS efforts have interpreted and applied
Contextual Integrity (CI);
● Identify gaps in both contextual integrity and its technical projection that this body
of work reveals;
● Distill insights from these applications in order to facilitate future applications of
contextual integrity in privacy research and design.
“Making CI more actionable for computer science and computer scientists.”
Background: What is Contextual Integrity?
A social philosophy of privacy developed by Helen Nissenbaum.
● Privacy is appropriate information flow.
● Appropriateness depends on social context;
social contexts have information norms.
● Norms are adapted to societal values, contextual purposes, and individual ends.
● Norms are structured with five parameters:
○ (1) Sender, (2) Receiver, (3) Subject, (4) Attribute, (5) Transmission Principle
Example: In the health care context, there is a norm that when (1) a patient gives
information about (3) their (4) health to (2) a doctor, that information is treated
confidentially.
Background: Context in computing and policy
● Contextual Integrity:
○ Privacy as appropriate information flow according to contextual norms. First paper: 2004.
○ Uptake in computer science since 2006.
● Context in ubiquitous computing
○ An earlier computer science research tradition, pioneered by e.g. Dey (2001), is also concerned
with privacy.
○ “Context” refers to a situation: facts about the user, computer, environment. Location, identity,
state…
○ Dourish (2004) publishes a critique, arguing for interactional (not representational) context in
UbiComp.
● Context in policy
○ Excitement about privacy (FTC, White House, WEF) as respect for context motivates computer
science interest in Contextual Integrity...
○ … but within CS, multiple traditions are blended together.
Study: research method
● Developed analytic template based on research questions.
● Searched for CS papers that claim to be using CI. (We found 20)
● Applied analytic template systematically to each paper.
● Used results to derive answers to each research question.
A systematic review of computer science literature using Contextual Integrity.
Study: research questions
● RQ1. For what kind of problems and solutions do computer scientists use CI?
○ Particular subfields of CS.
● RQ2. How have the authors dealt with the conceptual aspects of CI?
○ Social contexts, norms with specific parameters…
● RQ3. How have the authors dealt with the normative aspects of CI?
○ Norms are derived from social contexts, which are adaptations of a differentiated society.
● RQ4. Do the researchers expand on CI?
○ Where do CS researchers need to fill gaps or add to CI to make concrete systems work?
Results: RQ1 Architecture
CS researchers used CI across a few classes of technical architecture.
● User interfaces and experiences. These focus on an individual user’s activity
and preferences, rather than social norms.
● Infrastructure. Catering to a large set of users and diverse applications.
○ Social platforms. Technology that spans multiple social contexts.
○ Technical platforms. Technology that mediates many other technologies. What about
the operators of these platforms?
○ Formal models. Frameworks to be used in design, but without implementation details.
● Decentralization. Decentralized architectures mirror complexity of society
itself. An interesting area for future research.
Findings: RQ1 Architecture
Theoretical Gaps:
- “Modular Contextual Integrity”,
faceting CI and giving guidelines
for design and research at specific
levels of the technical stack
- Specific guidance for
infrastructure design
Calls to Action:
- Be explicit about how a system is
situated among other actors
(operators, moderators, etc.)
- Develop formal models that
connect user preferences with
contextual norms
Results: RQ2 What did they mean by context?
CS researchers had varying understandings of ‘context’; e.g. sphere vs. situation.
● Substantiality. Abstract: hospitals in general. Concrete: Mount Sinai Beth Israel hospital.
● Domain. Social: a classroom with a teacher and students is a social context. Technical: a language education mobile app.
● Valence. Normative: a conference Code of Conduct is an account of norms inherent in a context. Descriptive: a list of attendees, keynote speakers, and program committee members is a description of the context.
● Stability (Dourish, ‘04). Representational: the Oval Office in the White House is an easily represented context. Interactional: a flash mob is an interactional context.
Findings: RQ2 Contexts
Theoretical Gaps:
- CI needs an account of how social
spheres connect to sociotechnical
situations
- What about interactional
contexts?
Calls to Action:
- Specifically address how ‘context’
is used, and when technology
bridges two or more meanings of
the term
- Detail flows of information to
third parties; what context is
that?
Results: RQ3 Source of Normativity
CI is specific about where norms come from: social adaptation to ends, purposes,
and values within differentiated spheres of society.
CS papers have not adopted this source of normativity entirely. Instead, they use:
● Compliance and Policy. System is designed to comply with existing laws and
policies.
● Threats. System is designed with a Threat Model, typical of security research.
● User preferences and expectations. Individual user preferences and/or
expectations are solicited.
● Engagement. Users interact with system to determine norms dynamically.
Findings: RQ3 Normativity
Theoretical Gaps:
- Connect CI’s metaethical theory
with concrete sources of
normativity familiar to CS
- Spheres to threats?
- Spheres to user expectations?
- Spheres to the law?
Calls to Action:
- Measuring norms, not
expectations
- Supporting user engagement
around identifying norms
- Technical solutions for handling
conflicts over norms
Results: RQ4 Expanding CI
● Technological adaptation to changing social conditions.
● Technology operating in multiple contexts at once, or addressing context clash,
where activities in different contexts interact.
● Addressing the temporality and duration of information, and its effect on
privacy.
● User decision making with respect to privacy and information flow controls.
Findings: RQ4 Expanding CI
Theoretical Gaps:
- Develop account of normative
change and adaptation
- Address the questions around
multiple interacting contexts
- Address time: duration of
information, forgetting, etc.
- What about user choice?
Calls to Action:
- More modeling CI from
information theory, information
flow security
- CI and differential privacy?
Disciplinary
Bridge
Bridge: Themes from Project #1
Contextual Integrity needs to be expanded...
● ...to account for social and technological platforms that span multiple social
spheres, perhaps by introducing an “operator” context.
● ...to account for more of the meanings of “context” that range from abstract
social spheres to concrete sociotechnical situations.
● ...for clarity on how social norms form to reflect ends, purposes, and values in
society, and the relationship between these norms and the law.
● ...to address the challenging cases where multiple social contexts collide or
clash.
Bridge: Dealing with context collisions
What society wants from privacy: tidy, separate spheres (Professional, Personal; Med, Edu, Fin).
What we get: life and technology make things complicated; the spheres overlap and collide.
The problem with information semantics
Contextual Integrity says there are five parameters of an information norm:
Sender, Receiver, Subject, Attribute, and Transmission Principle.
[Patient, Doctor, Patient, Health, Confidentiality]
But... information topics are indeterminate. E.g.:
What does information mean?
● In CI, information gets its meaning from its context: how actors in roles
normatively communicate with each other.
○ The meaning of information and the contextual practices are mutually constitutive.
● When information flows in a new way (between situations), that information
gets new meanings.
○ E.g. When your relatives see a Facebook post intended for friends, they can make
judgments about you that were not part of the intended meaning.
● Technical context collapse is challenging not because it violates norms, but
because it is beyond our social understanding: it creates information flows
with new social meanings that may be disruptive to social life.
What does “information” mean?
We have gotten this far without a definition or theory.
No wonder things are so #?!$?!@?!*?!.
What does “information” mean?
According to Dretske (1981) (epistemology, philosopher of mind)
building on Shannon (1948), information is a naturalistic and causal
property:
Information that P is the message/signal needed for a suitably equipped
observer to learn P, due to the nomic associations of the signal with P.
Nomic means “law-like”, as in scientific law.
The red light carries the information that the train is coming because
(lawfully, regularly) the light is red if and only if the train is coming.
In the following projects we will update Dretske’s theory.
Using insights from statistics and computer science, we
will arrive at a specific formal concept of
situated information flow
for cross-disciplinary use.
Bayesian Networks
Bayesian Networks (BN) are a formalism for
representing the relationship between random events.
A BN has:
● A directed, acyclic graph of nodes, representing random variables, connected
by edges
● A conditional probability distribution (CPD) for each node, which is the
probability distribution of its random variable, conditional on its parents.
Together, these define a joint probability distribution over all the random variables,
with some important independence relations qualitatively inferable from the graph.
[Diagram: an example Bayesian network with nodes A, B, C, D, E.]
What is information flow, really?
Pearl’s (2000) system for understanding causality is widely
acknowledged and applied in statistics, philosophy, machine learning,
cognitive psychology, social science research methods, …
Events are part of a causal
structure represented as a
directed acyclic graph.
This structure determines the
conditional dependency
of events on each other.
[Diagram: a causal DAG with nodes Recession, Earthquake, Burglary, and Alarm.]
What is information flow, really?
The alarm carries information about earthquakes, burglaries, and
recessions. (Topics are indeterminate).
In this model, the recession and earthquakes are conditionally independent:
I(Recession; Earthquake) = 0
(they carry no information about each other; they have no mutual information).
What is information flow, really?
The alarm carries information about earthquakes, burglaries, and
recessions. (Topics are indeterminate).
In this model, the recession and earthquakes are conditionally dependent
if we know the alarm has gone off.
Information flow: a unified model
1. Privacy is appropriate information flow.
(Nissenbaum)
2. Information flow is a message or signal from which
something can be learned because of nomic association.
(Dretske)
3. The nomic associations are the conditional
dependencies derived from causal structure. (Pearl)
The meaning of data is a function of the processes that
generated it, and their context.
Causality
Causality
What makes these causal models is Pearl’s do-calculus: an intervention on an event
severs the links from its parent nodes.
An intervention can be made by anything exogenous to the model.
[Diagram: the same DAG with a do() intervention applied to one node, severing its incoming edges.]
Causality
“But this isn’t causality!
What about Rubin, treatment effects, randomized controlled experiments, ….”
- an economist in the audience
Pearlian causation fits how we experience and reason about causality (e.g., Sloman).
Interventionist causation has support from philosophers (e.g., Woodward).
It is compatible with other methods of causal inference and model fitting (e.g.,
Gelman).
It is used widely in social sciences like demography and sociology (e.g., Elwert).
It is the consensus view. We should use it!
situated information flow
is a causal flow between events
in the context of other causal relations.
Project #2:
Origin Privacy:
Causality and Data
Protection
(bridging Law and Technology)
Origin Privacy: Causality and Data Protection
Sebastian Benthall
Anupam Datta
Michael Tschantz
A technical report.
Included as Chapter 4 of Sebastian Benthall’s doctoral dissertation, titled
“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data
Economics”
Origin Privacy: Highlights
● Policy motivations:
○ Origin and topic information flow restrictions in the law
○ Bounded information processing systems
● Use the theory of situated information flow
● Model a system embedded in its environment
● This uncovers an interesting class of security threats due to a confusion
between caused and embedded inputs.
(There’s a lot more in the chapter…)
Origin Privacy: Policy requirements: HIPAA
The HIPAA Privacy Rule defines psychotherapy notes as
notes recorded by a health care provider who is a mental health professional
documenting or analyzing the contents of … [a] counseling session
Psychotherapy notes are more protected than other protected health information,
intended only for use by the therapist.
These restrictions are tied to the provenance of the information: the counseling
session.
Origin Privacy: Policy requirements: GLBA
The Privacy Rule protects a consumer's "nonpublic personal information" (NPI).
● any information an individual gives you to get a financial product or service (e.g.,
name, address, income)
● any information you get about an individual from a transaction involving your
financial product(s) or service(s) (for example, the fact that an individual is your
consumer or customer),
● any information you get about an individual in connection with providing a financial
product or service (for example, information from court records or from a consumer
report).
There are origin requirements, but also more general subject requirements.
(from FTC.gov)
Claim: policies use origin-based and meaning-based information flow restrictions in an
ambiguous way because:
(1) real information flow is situated
(2) the causal context determines:
- The origin
- The nomic associations (meaning)
Policy ambiguity problem solved!
[Diagram: a causal chain linking Pregnancy, Purchases, and Advertisements.]
Policy requirements: PCI DSS
“The PCI DSS security requirements apply to all system components included in or
connected to the cardholder data environment. The cardholder data environment
(CDE) is comprised of people, processes and technologies that store, process, or
transmit cardholder data or sensitive authentication data.”
PCI DSS determines the domain in which it applies in terms of the physical
connections between components.
Could PCI DSS be enforced or complied with if it applied to system components
unconnected from the CDE?
Embedded Causal System (ECS) model
[A diagram built up across several slides: a System with high and low inputs (S_H, S_L) and high and low
outputs (A_H, A_L), embedded in an Environment with its own nodes (E_i, E_L, E_H).
The recurring question: is this system secure? Is A_L independent of S_H?
Considered in isolation, the system appears secure (✔); once the environment’s causal links to the
system’s inputs are modeled, the check can fail (✗); applying do() interventions to the inputs changes
the analysis again.]
The point
● Probabilistic modeling of situated information flows can express both an
information processing system and its environment.
● Different security properties can be mapped onto this model and onto different
model conditions (interventions, observations).
● This gives us a fine-grained way to do compliance engineering.
Project #3:
Data Games and the
Value of Information
(bridging Market and Law)
The story so far...
● Social expectations of privacy may be expressed as norms of information flow
indexed to social spheres (Contextual Integrity)...
● … but our situation today is messy; our contexts collide because of our technical
infrastructure.
● Our complex situation means that we have lost control over what our
information means. Topics (part of the structure of norms) are indeterminate.
● Moving forward, we should scaffold our theory of privacy with situated
information flow, and build up to normative theory.
Data Games and the Value of Information
Goals (narrow version):
● Address the problem raised by Chapter 1 about how to model and design for
cross-context information flows in infrastructure…
● ...using the insights about situated information flow…
● ...to understand the economic impact of data protection laws.
U.S. Data Protection Laws
● Intellectual property laws
○ Since Feist v. Rural (1991), data (facts) are not protected by copyright.
○ Samuelson (2000) argues intellectual property won’t work for privacy because property is
alienable, but privacy rights aren’t.
● Confidentiality and sectoral privacy laws. HIPAA, GLBA, FERPA,
attorney-client privilege.
○ All tied to specific sectors or spheres.
○ They do reduce information flow outside of the situations where they apply.
○ But they do not regulate “the gaps”
● FTC notice-and-consent self-regulatory standard
○ Paradox: the more technically and legally detailed the notices, the more ignorant the consent!
○ Nobody thinks this is working.
E.U. General Data Protection Regulation
● It’s based on privacy rights. (Compare with IP)
○ Sort of like a property right, but different.
● Omnibus. It covers all the cases. (Compare with sectoral laws)
● New obligations protect the rights:
○ Data minimization says don’t keep or process data without an agreed-upon reason.
○ Also some general exceptions to data protection, which may erode the protections...
● Consent is given to particular purposes of use.
○ A purpose is less complicated than legalese or a technical data flow, so better notices?
Purpose-binding in the GDPR is reminiscent of Contextual Integrity, but is based on
rights, not norms.
Law and economics for data?
● There is an important legal tradition of law and economics, using economic
theory to inform legal judgments.
● Do we have one for the data economy?
● To better design data protection policy (and antitrust? etc.), we need economic
theory that captures the economic impact on everyone involved (data
processors, data subjects, and others)...
Data Games and the Value of Information
Goals (real agenda):
● Using the insights about situated information flow…
● … develop a new tool, data games, for understanding the value of information ...
● … to start a new field of inquiry, data economics, that can better understand the
foundational principles of the information economy!
“Surely, that has been done before,” you say.
Contextualism in privacy economics
● “The Economics of Privacy,” by Acquisti, Taylor, and Wagman (2016) surveys
the existing privacy literature.
● They judge that economics can only ever deal with privacy in a contextually
specific way.
● While this sounds nice, it runs into the same problem as CI! Namely, …
● We know the most important practice in the data economy is data reuse, i.e., use
of data collected in one context for another!
Something new!
● We need a new data economics! Really!
● We need a way to model the outcomes of creating and destroying information
flows between strategic actors.
● The difference in outcome for each actor is the value of information.
We need a new tool to measure the value of information.
We will start with situated information flow
and add features for game theory
and mechanism design.
We can call this new tool a
data game
Multi-Agent Influence Diagrams (MAIDs)
MAIDs: Bayesian Networks + game theory. (Daphne Koller & Brian Milch, ‘03)
They have a set of agents and a directed acyclic graph with three kinds of nodes:
Chance variables. Random variables with a CPD conditioned on their
parents in the graph.
Decision variables. They have an associated player. They
do not have a CPD (this is chosen by the player later).
Utility variables. These have a CPD and an associated player. They
must not have any descendants.
[Diagram: a small MAID with nodes X, Y, Z.]
Multi-Agent Influence Diagrams (MAIDs)
To “play” a MAID:
1) Each player simultaneously chooses a CPD for
every decision node they control. This is their
strategy profile, σ.
2) The strategies turn the MAID into a Bayesian
Network, which is sampled.
3) The sum of the sampled values of each player’s
utility variables is awarded as that player’s payoff.
[Diagram: a MAID with nodes W, X1, Z1, X2, Z2, Y.]
Multi-Agent Influence Diagrams (MAIDs)
[Diagram: the MAID (nodes W, X1, Z1, X2, Z2, Y) plus a strategy profile σ yields an ordinary Bayesian
network over the same nodes.]
Data Games: Optional Edge
An optional edge (dotted arrow) implies the
diagram represents two different games, an open
and a closed case.
Note that the edge is an information flow.
We can now reason systematically about
consequences of allowing an information flow.
(This is mechanism design).
[Diagram: the same MAID, now with one dotted (optional) edge.]
Let’s look at two data games
for economic contexts that are well-understood
Example: Principal/Agent
[Diagram: a data game with nodes V and B and utility nodes Up and Ua.]
Earliest privacy economics argument (?) by Richard Posner (1981):
Employers depend on information about potential hires (V) to make efficient
decisions (B).
More privacy means less information means less efficiency.
Example: Principal/Agent
E(U)         | Open                        | Closed
Principal    | (E(X | X > w) - w) P[X > w] | (x - w)[E(X) > w]
Agent        | w P[X > w]                  | w[E(X) > w]
Agent, x > w | w                           | w[E(X) > w]
Agent, x < w | 0                           | w[E(X) > w]

(Here [E(X) > w] is an indicator: 1 if the condition holds, 0 otherwise.)
Example: Price discrimination
[Diagram: a data game with nodes V, R, B and utility nodes Uf and Uc.]
An important economic use of personal information is price differentiation (Shapiro
and Varian, 1998).
The firm chooses its price (R) with or without knowledge of the consumer’s demand
(V). The consumer chooses whether or not to buy (B) after getting the price.
Example: Price discrimination
E(U)             | Open  | Closed
Firm             | x - ϵ | z* P[X > z*]
Consumer         | ϵ     | (x - z*) P[X > z*]
Consumer, x > z* | ϵ     | x - z*
Consumer, x < z* | ϵ     | 0

where z* = argmax_z E[z P(z < x)].
Let’s look at two data games
for economic situations that have not yet been studied
Example: Expert Services
[Diagram: a data game with nodes W, C, R, A, V and utility nodes Uf and Uc.]
The expert knows some specialized facts
about the world (W) (e.g., medicine, law, the
web).
The client’s personal character traits (C)
determine the value of each of m actions.
The expert, who may or may not know the
character traits, provides a recommendation
(R). The consumer takes an action (A).
The incentives of the expert and the client
are aligned (no conflicts of interest).
Example: Expert Services
[Diagram: the same game; the node R is highlighted (R!).]
First observation:
If the domain of R allows for enough bits (in
the Shannon sense) for the expert to encode
all the information in W, then personalization
is irrelevant.
This demonstrates a fundamental link
between Shannon information theory and
data economics: information bottlenecks
matter.
Example: Expert Services
[Diagram: the same game, with an open edge C -> R.]
Second observation:
If the domain of R is narrow compared to the
expert knowledge W, then personalization
(an open edge C -> R) does improve outcomes
for the expert and client.
The value of a personalized service is
efficient dissemination of information
through a small channel.
Example: Context Collision
[Diagram: a data game spanning two contexts, with nodes V′, B′, Up, Uc on one side and W, D, R, B, Uf, Uc on the other.]
Cross-context flows: the point
Data is valuable not as a good, but as a strategic resource.
It’s not consumed; it is part of the structure of the game of social and economic
relations itself.
Market externalities are the rule, not the exception.
Traditional theories of market equilibrium, industrial organization,
etc. are not going to cut it. We need to start the field of data
economics.
Tactical vs. Strategic Flows
● When considering an optional information flow, we can compare the
equilibrium outcomes of the open and closed cases. Call this the strategic
consequences of the information flow.
● We can also consider the outcome of opening the flow, while keeping the
closed equilibrium strategy for all players except the recipient of the
information. Call this the tactical consequences of the flow.
We can use this to make sensitive distinctions about, e.g. the effects of a data breach
vs. the chilling effects of ongoing surveillance. A data economics for cybersecurity?
Thank you.
