Graph cluster randomization

Graph Cluster Randomization:
Network Exposure to Multiple Universes
Authors:
• Johan Ugander, Cornell University
• Brian Karrer, Facebook
• Lars Backstrom, Facebook
• Jon Kleinberg, Cornell University

Presented by:
Subhashis Hazarika,
Ohio State University

Motivation
• Goal: estimate the “average effect” of a treatment on a population when treating an
individual spills over to neighboring individuals through an underlying social network.
• A/B testing is the standard approach for estimating the “average effect” of a
treatment on a sample population.
• But A/B testing does not account for social interference among the individuals
being treated.

17-10-2013

A/B testing
• Assumption: SUTVA (stable unit treatment value assumption)
• New page A: treatment group; individuals respond independently
• Default page B: control group; individuals respond independently
• Universe A and Universe B are treated as two separate parallel universes.

Proposed Solution
Graph Cluster Randomization
– Formulate the average treatment effect and network exposure in terms of
graph-theoretic conditions
– Apply graph cluster randomization algorithms to the formulated model
– Obtain an unbiased estimator, the Horvitz-Thompson estimator, with an upper
bound on the estimator variance that is linear in the degrees of the graph.

Average Treatment
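• Follows the formulation of Aronow and Samii, which does not assume SUTVA.
• Let $z \in \{0,1\}^n$ be the treatment assignment vector.
• Let $Y_i(z)$ be the potential outcome of user i under the treatment
assignment vector z.
• Then the average treatment effect between the all-treatment universe z = 1 and
the all-control universe z = 0 is given by:
$$\tau(\mathbf{1}, \mathbf{0}) = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(z = \mathbf{1}) - Y_i(z = \mathbf{0}) \right]$$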
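As a minimal sketch of this definition, the snippet below computes the average treatment effect between the all-treated and all-control universes on a toy path graph. The outcome function `outcome` (a direct effect plus a spillover from treated neighbors), its parameters `direct` and `spill`, and the adjacency dict `adj` are assumptions made purely for illustration; they do not come from the paper.

```python
def outcome(i, z, adj, direct=1.0, spill=0.5):
    """Hypothetical potential outcome Y_i(z): direct effect plus neighbor spillover."""
    neighbors = adj[i]
    treated_frac = sum(z[j] for j in neighbors) / len(neighbors) if neighbors else 0.0
    return direct * z[i] + spill * treated_frac


def average_treatment_effect(adj):
    """tau = (1/n) * sum_i [Y_i(z = all ones) - Y_i(z = all zeros)]."""
    ones = {i: 1 for i in adj}
    zeros = {i: 0 for i in adj}
    diffs = [outcome(i, ones, adj) - outcome(i, zeros, adj) for i in adj]
    return sum(diffs) / len(adj)


# Toy path graph 0 - 1 - 2 - 3 (illustrative only).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
tau = average_treatment_effect(adj)
```

In this toy model every vertex gains the direct effect plus the full spillover when everyone is treated, so the quantity being estimated includes the network effect that an SUTVA-based analysis would miss.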

Network Exposure
• User i is “network exposed” to a treatment under assignment vector z if i’s
response under z is the same as i’s response under the all-treatment assignment vector z = 1.
• Candidate exposure conditions for the experiment:
o Full exposure
o Absolute k exposure
o Fractional q exposure

Graph Cluster Randomization
• At a high level, GCR partitions the graph into clusters and then randomizes
between treatment and control at the cluster level.
• To compute a vertex’s exposure probability, we only need to know how the set of
clusters intersects the local graph structure near that vertex.
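A minimal sketch of cluster-level randomization, assuming the clusters are already given as lists of vertex ids. The function name `cluster_randomize`, the treatment probability `p`, and the toy `clusters` are illustrative choices, not taken from the paper.

```python
import random


def cluster_randomize(clusters, p=0.5, rng=None):
    """Treat each cluster with probability p; every vertex inherits its cluster's arm."""
    rng = rng or random.Random()
    z = {}
    for members in clusters:
        arm = 1 if rng.random() < p else 0  # one coin flip per cluster
        for v in members:
            z[v] = arm
    return z


# Toy partition of six vertices into three clusters (illustrative only).
clusters = [[0, 1, 2], [3, 4], [5]]
z = cluster_randomize(clusters, p=0.5, rng=random.Random(0))
```

Because the coin is flipped once per cluster, neighbors that share a cluster always land in the same arm, which is what raises the chance that a vertex is network exposed.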

Exposure Models
• An individual’s exposure condition determines how they experience the
intervention in full conjunction with how the world experiences the
intervention.
• Let $\sigma_i^x$ be the set of all assignment vectors z for which i experiences
outcome x; this set is the exposure condition for i.
• An exposure model for user i is a set of exposure conditions that completely
partitions the possible assignment vectors z.
• Here we are interested only in $\sigma_i^1$ and $\sigma_i^0$, the exposure
conditions containing z = 1 and z = 0 respectively.

Exposure Conditions
• Neighborhood exposure (local exposure conditions):
o Full neighborhood exposure
o Absolute k-neighborhood exposure
o Fractional q-neighborhood exposure

• Core exposure (global dependency):
o Component exposure
o Absolute k-core exposure
o Fractional q-core exposure

Note: the assignment vectors satisfying a core exposure condition are entirely
contained in those satisfying the associated neighborhood exposure condition.
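The three neighborhood exposure conditions can be sketched as simple predicates. This assumes the definitions used in the original paper: vertex i is exposed to treatment if i itself is treated and all of its neighbors (full), at least k of its neighbors (absolute k), or at least a fraction q of its neighbors (fractional q) are treated. The function names and the toy star graph are illustrative assumptions.

```python
def full_exposure(i, z, adj):
    """i and all of i's neighbors are treated."""
    return z[i] == 1 and all(z[j] == 1 for j in adj[i])


def absolute_k_exposure(i, z, adj, k):
    """i is treated and at least k neighbors are treated."""
    return z[i] == 1 and sum(z[j] for j in adj[i]) >= k


def fractional_q_exposure(i, z, adj, q):
    """i is treated and at least a fraction q of its neighbors are treated."""
    return z[i] == 1 and sum(z[j] for j in adj[i]) >= q * len(adj[i])


# Toy star graph: center 0 with leaves 1, 2, 3; leaf 3 is in control.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
z = {0: 1, 1: 1, 2: 1, 3: 0}

center_full = full_exposure(0, z, adj)
center_abs2 = absolute_k_exposure(0, z, adj, k=2)
center_frac_half = fractional_q_exposure(0, z, adj, q=0.5)
center_frac_three_quarters = fractional_q_exposure(0, z, adj, q=0.75)
```

The control-side conditions are symmetric: swap the roles of 1 and 0 in the assignment vector.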

Randomization and Estimation
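Select the assignment vector Z at random from $\{0,1\}^n$.
$\Pr(Z = z)$ is the distribution of Z; it defines the randomization scheme.
$\Pr(Z \in \sigma_i^1)$ is the probability that user i is network exposed to treatment, and
$\Pr(Z \in \sigma_i^0)$ the probability that i is network exposed to control, where $\sigma_i^1$
($\sigma_i^0$) is the set of assignments under which i is exposed to treatment (control).
The average treatment effect is then estimated by the Horvitz-Thompson estimator:
$$\hat{\tau}(Z) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^1]}{\Pr(Z \in \sigma_i^1)} - \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^0]}{\Pr(Z \in \sigma_i^0)} \right)$$
Its expectation over Z equals the actual average treatment effect, provided all
exposure probabilities are positive.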
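A minimal sketch of the Horvitz-Thompson estimate for one realized assignment: each observed outcome is weighted by the inverse of the probability that its unit ended up network exposed to treatment (added) or to control (subtracted), and units exposed to neither contribute nothing. The function name, argument layout, and toy numbers are assumptions for illustration.

```python
def horvitz_thompson(outcomes, exposed_t, exposed_c, pr_t, pr_c):
    """Inverse-probability-weighted difference between exposed-to-treatment
    and exposed-to-control units, averaged over all n units."""
    n = len(outcomes)
    total = 0.0
    for i in range(n):
        if exposed_t[i]:
            total += outcomes[i] / pr_t[i]
        if exposed_c[i]:
            total -= outcomes[i] / pr_c[i]
    return total / n


# Toy data: unit 0 is exposed to treatment, unit 1 to control, units 2 and 3 to neither.
outcomes = [2.0, 1.0, 3.0, 0.5]
exposed_t = [True, False, False, False]
exposed_c = [False, True, False, False]
pr_t = [0.25, 0.25, 0.25, 0.25]
pr_c = [0.25, 0.25, 0.25, 0.25]

tau_hat = horvitz_thompson(outcomes, exposed_t, exposed_c, pr_t, pr_c)
```

Small exposure probabilities produce large weights, which is why the variance of this estimator is so sensitive to how exposure probabilities scale with degree.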

Exposure Probabilities
Model: full neighborhood exposure + independent vertex randomization (each vertex
treated independently with probability p)
– Probability of exposure to treatment: $\Pr(Z \in \sigma_i^1) = p^{d_i + 1}$

– Probability of exposure to control: $\Pr(Z \in \sigma_i^0) = (1 - p)^{d_i + 1}$

– The exposure probability for a high-degree vertex is exponentially small in its degree
$d_i$, which dramatically increases the variance of the HT estimator.
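A short sketch of how quickly these probabilities shrink, assuming each vertex is treated independently with probability `p` so that a vertex of degree d is fully exposed only when it and all d neighbors land in the same arm. The function name and the sample degrees are illustrative.

```python
def full_exposure_probs(degree, p=0.5):
    """Return (Pr[exposed to treatment], Pr[exposed to control]) for a vertex
    of the given degree under independent vertex randomization."""
    pr_treatment = p ** (degree + 1)
    pr_control = (1 - p) ** (degree + 1)
    return pr_treatment, pr_control


# Exposure probabilities for a few sample degrees at p = 0.5.
table = {d: full_exposure_probs(d) for d in (1, 5, 10, 20)}
```

The inverse of each probability is the weight the HT estimator assigns to that vertex, so the weights grow exponentially with degree.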

For absolute and fractional neighborhood models we have the following
probabilities.
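As a simplified illustration only, the sketch below computes these probabilities under independent vertex randomization with probability `p` (an assumption; it is not the cluster-level scheme). In that case the number of treated neighbors of a degree-d vertex is binomial, so the exposure probability is p times a binomial tail. The function names are hypothetical.

```python
from math import ceil, comb


def absolute_k_prob(d, k, p=0.5):
    """Pr[vertex treated and at least k of its d neighbors treated]."""
    tail = sum(comb(d, j) * p**j * (1 - p) ** (d - j) for j in range(k, d + 1))
    return p * tail


def fractional_q_prob(d, q, p=0.5):
    """Pr[vertex treated and at least a fraction q of its d neighbors treated].
    Note: ceil(q * d) uses floating point, so q * d very close to an integer
    may round up unexpectedly."""
    return absolute_k_prob(d, ceil(q * d), p)


pr_abs = absolute_k_prob(2, 1, 0.5)
pr_frac = fractional_q_prob(4, 0.5, 0.5)
```

Setting k = d recovers the full neighborhood case, p^(d+1).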

• This model has an upper bound given by

.

• This also gives an upper bound on the core exposure probabilities, given
by the following proposition.

Estimator Variance

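The variance of the effect estimator $\hat{\tau}(Z) = \hat{Y}^1(Z) - \hat{Y}^0(Z)$, where
$\hat{Y}^x(Z)$ is the HT estimate of the mean outcome under exposure condition x, is given by:
$$\mathrm{Var}[\hat{\tau}(Z)] = \mathrm{Var}[\hat{Y}^1(Z)] + \mathrm{Var}[\hat{Y}^0(Z)] - 2\,\mathrm{Cov}[\hat{Y}^1(Z), \hat{Y}^0(Z)]$$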

Final variance:

Estimator Variance
Final co-variance:

Estimator Variance
• Thus we achieve an O(1/n) bound on the variance, but only when the maximum
degree is bounded.
• The variance can grow exponentially with the degree.
• Hence the authors introduce a condition on the graph (restricted growth) under
which a suitable clustering keeps the variance from growing exponentially: the
bound becomes linear in the degrees.

Restricted-Growth Graph

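• Let $B_r(v)$ be the set of vertices within r hops of a vertex v.
• G is a restricted-growth graph with growth constant κ if
$|B_{r+1}(v)| \le \kappa\,|B_r(v)|$ for all vertices v and all r > 0.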
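A small sketch for inspecting ball growth on a concrete graph, assuming the paper's restricted-growth condition that each ball of radius r + 1 is at most κ times the size of the ball of radius r (for r > 0). The helper names `ball_sizes` and `growth_constant`, the cutoff `max_r`, and the 12-vertex cycle are illustrative assumptions; checking radii only up to `max_r` gives an empirical lower bound on κ, not a proof.

```python
from collections import deque


def ball_sizes(adj, v, max_r):
    """Return [|B_0(v)|, |B_1(v)|, ..., |B_max_r(v)|] via breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == max_r:
            continue  # do not expand beyond the largest radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    sizes = [0] * (max_r + 1)
    for d in dist.values():
        sizes[d] += 1
    for r in range(1, max_r + 1):  # turn per-distance counts into cumulative ball sizes
        sizes[r] += sizes[r - 1]
    return sizes


def growth_constant(adj, max_r):
    """Largest observed ratio |B_{r+1}(v)| / |B_r(v)| over all v and 1 <= r < max_r."""
    worst = 0.0
    for v in adj:
        sizes = ball_sizes(adj, v, max_r)
        for r in range(1, max_r):
            worst = max(worst, sizes[r + 1] / sizes[r])
    return worst


# Cycle on 12 vertices (illustrative only).
n = 12
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
sizes_at_0 = ball_sizes(cycle, 0, 3)
kappa_lower_bound = growth_constant(cycle, 3)
```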

Variance in Restricted-Growth Graph
• Consider a graph consisting of a single cycle (k=1) of n vertices, with basic cluster size c=2

• For c = 2
• For c ≥ 2

Clustering Restricted-Growth Graph
• Use a 3-net for the shortest-path metric of graph G:
o Initially all vertices are unmarked.
o While there are unmarked vertices, in step j pick an arbitrary unmarked vertex v,
set $v_j = v$, and mark all vertices in $B_2(v_j)$.
o Suppose k such vertices are selected, and let $S = \{v_1, v_2, \ldots, v_k\}$.
o Assign every vertex w of G to the closest vertex $v_i \in S$, breaking ties
consistently.
o For every $v_j$, let $C_j$ be the set of all vertices assigned to $v_j$; the sets
$C_1, \ldots, C_k$ are the clusters.
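The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: "arbitrary unmarked vertex" is taken to be the smallest unmarked label, and ties in the assignment step go to the earliest-selected center. The function names and the 9-vertex cycle are assumptions.

```python
from collections import deque


def bfs_distances(adj, source, max_r=None):
    """Hop distances from source, optionally truncated at radius max_r."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if max_r is not None and dist[u] == max_r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def three_net_clustering(adj):
    """Pick centers pairwise more than 2 hops apart, then assign each vertex
    to its closest center (ties go to the earliest-selected center)."""
    marked, centers = set(), []
    for v in sorted(adj):  # "arbitrary" choice: smallest unmarked label
        if v not in marked:
            centers.append(v)
            marked.update(bfs_distances(adj, v, max_r=2))  # mark B_2(v)
    best = {}  # vertex -> (distance, index of center)
    for idx, c in enumerate(centers):
        for w, d in bfs_distances(adj, c).items():
            if w not in best or d < best[w][0]:
                best[w] = (d, idx)
    clusters = {c: [] for c in centers}
    for w, (_, idx) in best.items():
        clusters[centers[idx]].append(w)
    return clusters


# Cycle on 9 vertices (illustrative only).
n = 9
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
clusters = three_net_clustering(cycle)
```

Every vertex is either a center or within 2 hops of one, so each vertex receives a cluster, and the resulting clusters can be fed directly into cluster-level randomization.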

Variance Bounds
