1. Graph Cluster Randomization:
Network Exposure to Multiple Universes
Authors:
Johan Ugander, Cornell University
Brian Karrer, Facebook
Lars Backstrom, Facebook
Jon Kleinberg, Cornell University
Presented by:
Subhashis Hazarika,
Ohio State University
2. Motivation
• Goal: estimate the “average effect” of a treatment on a sample when the
treatment of an individual spills over to neighboring individuals via an
underlying social network.
• A/B testing is the standard approach for estimating the “average effect”
of a treatment on a sample population.
• But A/B testing does not account for social interference among the
individuals being treated.
17-10-2013
3. A/B testing
• Assumption: SUTVA (stable unit treatment value assumption)
• New page A: treatment group; individuals respond independently
• Default page B: control group; individuals respond independently
• Universe A and Universe B are treated as two separate parallel universes.
4. Proposed Solution
Graph Cluster Randomization
– Formulate average treatment and network exposure w.r.t. graph-theoretic
conditions
– Apply graph cluster randomization algorithms to the formulated model
– Come up with an unbiased estimator, namely the Horvitz-Thompson estimator,
with an upper bound on the estimator variance that is linear in the
degrees of the graph.
5. Average Treatment
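• Given by the Aronow and Samii formulation, which does not assume SUTVA.
• Let $z \in \{0,1\}^n$ be the treatment assignment vector.
• Let $Y_i(z)$ be the potential outcome of user i under the treatment
assignment vector z.
• Then the avg. treatment effect between the all-treatment universe (z = 1)
and the all-control universe (z = 0) is given by:
$$\tau(\mathbf{1}, \mathbf{0}) = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(\mathbf{1}) - Y_i(\mathbf{0}) \right]$$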
6. Network Exposure
• User i is “network exposed to a treatment” under an assignment vector z
if i’s response under z is the same as i’s response under the all-treatment
assignment vector 1.
• The experiment can therefore use the following exposure conditions:
o Full exposure
o Absolute k exposure
o Fractional q exposure
7. Graph Cluster Randomization
• At a high level GCR is a technique in which the graph is partitioned into
clusters and then randomization between treatment and control is
performed at cluster level.
• To work out whether a vertex is network exposed, we only need to know how
the set of clusters intersects the local graph structure near that vertex.
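The cluster-level randomization described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the input format (a dict from vertex to cluster id), the function name, and the treatment probability `p` are all assumptions made for illustration.

```python
import random

def graph_cluster_randomization(clusters, p=0.5, seed=None):
    """Draw treatment (1) or control (0) once per cluster, then copy to vertices.

    clusters: dict mapping vertex -> cluster id (hypothetical input format).
    p: probability that a whole cluster is assigned to treatment.
    """
    rng = random.Random(seed)
    # dict.fromkeys keeps each cluster id once, in first-seen order.
    cluster_ids = list(dict.fromkeys(clusters.values()))
    cluster_z = {c: 1 if rng.random() < p else 0 for c in cluster_ids}
    # Every vertex inherits the assignment of its cluster.
    return {v: cluster_z[c] for v, c in clusters.items()}

# Toy partition: vertices 0-1 in cluster "a", 2-3 in "b", 4 in "c".
clusters = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
z = graph_cluster_randomization(clusters, p=0.5, seed=7)
```

Because the coin is flipped per cluster, vertices in the same cluster always share an assignment, which is what makes it more likely that a vertex and its neighbors land in the same universe.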
8. Exposure Models
• An exposure condition of an individual determines how they experience the
intervention, in conjunction with how the world experiences the
intervention.
• Let $\sigma_i^x$ be the set of all assignment vectors z for which i
experiences outcome x; this set is the exposure condition for i.
• An exposure model for user i is a set of exposure conditions that completely
partitions the possible assignment vectors z.
• Here we are interested only in $\sigma_i^1$ (network exposure to treatment)
and $\sigma_i^0$ (network exposure to control).
9. Exposure Conditions
• Neighborhood exposure (local exposure conditions):
– Full neighborhood exposure
– Absolute k-neighborhood exposure
– Fractional q-neighborhood exposure
• Core exposure (global dependency):
– Component exposure
– Absolute k-core exposure
– Fractional q-core exposure
Note: the assignment vectors of a core exposure condition are entirely
contained in those of the associated neighborhood exposure condition.
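The three neighborhood exposure conditions can be written as simple predicates. The sketch below assumes the usual reading of these conditions: vertex i itself receives the treatment condition, and all of its neighbors (full), at least k of them (absolute), or at least a fraction q of them (fractional) receive it too. The adjacency-dict graph format and the function names are assumptions made for illustration.

```python
def full_neighborhood_exposed(graph, z, i, treatment=1):
    """i and all of i's neighbors receive the given treatment condition."""
    return z[i] == treatment and all(z[j] == treatment for j in graph[i])

def absolute_k_exposed(graph, z, i, k, treatment=1):
    """i and at least k of i's neighbors receive the given condition."""
    matching = sum(1 for j in graph[i] if z[j] == treatment)
    return z[i] == treatment and matching >= k

def fractional_q_exposed(graph, z, i, q, treatment=1):
    """i and at least a fraction q of i's neighbors receive the condition."""
    matching = sum(1 for j in graph[i] if z[j] == treatment)
    return z[i] == treatment and matching >= q * len(graph[i])

# Path graph 0 - 1 - 2 - 3 with vertices 0, 1, 2 treated and 3 in control.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
z = {0: 1, 1: 1, 2: 1, 3: 0}
```

In this toy example vertex 1 is fully exposed to treatment, while vertex 2 is not (its neighbor 3 is in control) but still satisfies absolute 1-neighborhood exposure and fractional 0.5-neighborhood exposure.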
10. Randomization and Estimation
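Select the assignment vector at random: let Z be a random vector taking values
in $\{0,1\}^n$.
$\Pr(Z = z)$ is the distribution of Z.
$\Pr(Z \in \sigma_i^1)$ is the probability of network exposure to treatment, and
$\Pr(Z \in \sigma_i^0)$ the probability of network exposure to control, where
$\sigma_i^1$ and $\sigma_i^0$ are the exposure conditions of user i.
Therefore the avg. treatment effect is estimated by the Horvitz-Thompson estimator,
$$\hat{\tau}(Z) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^1]}{\Pr(Z \in \sigma_i^1)} - \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^0]}{\Pr(Z \in \sigma_i^0)} \right]$$
The expectation of this estimator over Z gives the actual avg. treatment effect.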
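As a rough illustration of the Horvitz-Thompson estimator, the sketch below weights each observed outcome by the inverse of its exposure probability: outcomes of users network exposed to treatment are added, outcomes of users network exposed to control are subtracted, and the total is divided by n. The function name, argument layout, and toy numbers are assumptions made for illustration, and the exposure probabilities are taken as given.

```python
def horvitz_thompson(outcomes, exposed_treat, exposed_ctrl, pr_treat, pr_ctrl):
    """Inverse-probability-weighted estimate of the average treatment effect.

    outcomes[i]      observed response of user i under the realized assignment
    exposed_treat[i] True if user i is network exposed to treatment
    exposed_ctrl[i]  True if user i is network exposed to control
    pr_treat[i]      probability that i is network exposed to treatment (> 0)
    pr_ctrl[i]       probability that i is network exposed to control (> 0)
    """
    n = len(outcomes)
    total = 0.0
    for i in range(n):
        if exposed_treat[i]:
            total += outcomes[i] / pr_treat[i]
        if exposed_ctrl[i]:
            total -= outcomes[i] / pr_ctrl[i]
    # Users exposed to neither condition contribute nothing to the sum.
    return total / n

# Toy data: four users, every exposure probability equal to 0.5.
outcomes = [2.0, 1.0, 3.0, 1.0]
exposed_treat = [True, False, True, False]
exposed_ctrl = [False, True, False, False]
estimate = horvitz_thompson(outcomes, exposed_treat, exposed_ctrl,
                            [0.5] * 4, [0.5] * 4)
```

Here the estimate is (2/0.5 + 3/0.5 - 1/0.5) / 4 = 2.0. Small exposure probabilities in the denominators are what inflate the variance of this estimator.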
11. Exposure Probabilities
Model: full neighborhood exposure + independent vertex randomization (each
vertex treated independently with probability p)
– Probability of exposure to treatment will be $p^{d_i+1}$
– Probability of exposure to control will be $(1-p)^{d_i+1}$
– The exposure probability for a high-degree vertex is exponentially small in
its degree $d_i$, which dramatically increases the variance of the HT estimator.
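A quick numeric check makes the exponential decay concrete. Under full neighborhood exposure with independent vertex randomization, a vertex of degree d is exposed to treatment only if it and all d neighbors are treated, so with per-vertex treatment probability p the probability is p^(d+1), and (1-p)^(d+1) for control. The function name and the choice p = 0.5 are assumptions made for illustration.

```python
def full_exposure_probabilities(degree, p):
    """Return (Pr[exposed to treatment], Pr[exposed to control]) for one vertex
    under full neighborhood exposure and independent vertex randomization."""
    return p ** (degree + 1), (1 - p) ** (degree + 1)

# Exposure probabilities for vertices of increasing degree, with p = 0.5.
probs = {d: full_exposure_probabilities(d, 0.5) for d in (1, 9, 99)}
```

With p = 0.5 a degree-1 vertex is exposed to treatment with probability 0.25, a degree-9 vertex with probability 1/1024, and a degree-99 vertex with probability 2^-100. Since the HT estimator divides by these probabilities, high-degree vertices dominate its variance.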
13. Exposure Probabilities
• This model has an upper bound given by [expression missing from the slide text].
• This also gives an upper bound on the core exposure probabilities, given
by the following proposition.
16. Estimator Variance
• Thus we achieve an O(1/n) bound on the variance, but only when the maximum
degree is bounded.
• The variance can grow exponentially with the degree.
• Hence the authors introduce a restricted-growth condition on the graph under
which a suitable clustering keeps the variance growth linear, rather than
exponential, in the degree.
18. Variance in Restricted-Growth Graph
• Consider a single-cycle graph (k = 1) on n vertices with basic cluster size c = 2
• For c = 2
• For c >= 2
20. Clustering Restricted-Growth Graph
• Using a 3-net for the shortest-path metric of graph G:
– Initially all vertices are unmarked.
– While there are unmarked vertices, in step j find an arbitrary unmarked
vertex v, select v to be vertex v_j, and mark all vertices in B_2(v_j), the
set of vertices within 2 hops of v_j.
– Suppose k such vertices are defined, and let S = {v_1, v_2, ..., v_k}.
– For every vertex w of G, assign w to the closest vertex v_i belonging to S,
breaking ties consistently.
– For every v_j, let C_j be the set of all vertices assigned to v_j.
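The 3-net clustering steps above can be sketched with breadth-first search. This is a minimal sketch under stated assumptions: the graph is an unweighted adjacency dict, the "arbitrary" unmarked vertex is simply the first one in iteration order, and ties are broken in favor of the earliest-chosen center. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source, max_depth=None):
    """Hop distances from source, optionally truncated at max_depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if max_depth is not None and dist[u] == max_depth:
            continue  # do not expand beyond the requested radius
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def three_net_clustering(graph):
    """Pick centers pairwise more than 2 hops apart, then assign every
    vertex to its closest center (ties go to the earliest center)."""
    marked, centers = set(), []
    for v in graph:
        if v in marked:
            continue
        centers.append(v)                                    # v becomes v_j
        marked.update(bfs_distances(graph, v, max_depth=2))  # mark B_2(v_j)
    best = {}  # vertex -> (distance, index of its chosen center)
    for idx, c in enumerate(centers):
        for w, d in bfs_distances(graph, c).items():
            if w not in best or d < best[w][0]:  # strict '<' keeps earlier center on ties
                best[w] = (d, idx)
    clusters = [[] for _ in centers]
    for w in graph:
        clusters[best[w][1]].append(w)
    return centers, clusters

# Cycle on 9 vertices: vertex i is adjacent to i-1 and i+1 (mod 9).
cycle = {i: [(i - 1) % 9, (i + 1) % 9] for i in range(9)}
centers, clusters = three_net_clustering(cycle)
```

On the 9-cycle this picks centers 0, 3, and 6 and yields three clusters of three contiguous vertices each. Every vertex lies within 2 hops of some center, and no two centers are within 2 hops of each other.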