Modeling networks: regression with additive and
multiplicative effects
Alexander Volfovsky
Department of Statistical Science, Duke
May 25 2017
Health Networks
Why model networks?
Interested in understanding the formation of relationships
Applied fields: sociology, economics, biology, epidemiology
Fundamental theory questions:
What assumptions are made for different network models?
What models work when the assumptions fail?
How to develop fail-safes to overcome these problems?
Where to apply these?
Causal inference
Link prediction
Some context: Facebook
Facebook wants to change its ad algorithm.
Can’t do it on the whole graph
Need “total network effect”
Source: Wikimedia
How do they solve it?
Interested in estimating
$$\frac{1}{N}\sum_{i=1}^{N}\left[Y_i(\text{all treated}) - Y_i(\text{all controls})\right]$$
“At a high level, graph cluster randomization is a technique in
which the graph is partitioned into a set of clusters, and then
randomization between treatment and control is performed at
the cluster level.”
Where can we find clusters?
Observable information (e.g. same school)
Unobservable information (“social space”)
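The quoted idea of randomizing at the cluster level can be sketched in a few lines of base R. This is only an illustration under assumed inputs: the cluster labels are hypothetical, and nothing here reflects how Facebook implements it.

```r
# Sketch of graph cluster randomization (assumed setup, not Facebook's implementation).
set.seed(1)

# Hypothetical cluster labels for 12 nodes, e.g. from observable information such as school.
cluster <- rep(1:4, each = 3)

# Randomize at the cluster level: each cluster is treated with probability 1/2.
cluster_ids <- sort(unique(cluster))
cluster_treat <- rbinom(length(cluster_ids), size = 1, prob = 0.5)
names(cluster_treat) <- cluster_ids

# Every node inherits the assignment of its cluster.
node_treat <- unname(cluster_treat[as.character(cluster)])

# Within a cluster, all nodes share one assignment.
same_within <- tapply(node_treat, cluster, function(w) length(unique(w)) == 1)
```

When clusters come from unobservable information, the labels would first have to be estimated, for example from a latent space model.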
Some context: (im)migration
Want to know how regime change affects population.
Politicians during election years care about direct effects.
Source: http://openscience.alpine-geckos.at/courses/social-network-analyses/empirical-network-analysis/
Some more context
Studying tram traffic in Vienna
Source: kurier.at
And one more
Studying taxi rides in Porto
442 taxis
1.7 million rides with (x, y) coordinates at 15 second intervals.
Project into a 100-dimensional latent space.
Learn hidden interpretable patterns...
Source: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 18(14), 1-45.
Relational data: common examples and goals
Changes in exports from year to year
[Figure: two scatterplots of 26 countries (Australia, Austria, Brazil, Canada, China, China Hong Kong SAR, Finland, France, Germany, Greece, Indonesia, Ireland, Italy, Japan, Malaysia, Mexico, Netherlands, New Zealand, Norway, Rep. of Korea, Spain, Switzerland, Thailand, Turkey, United Kingdom, USA). Left panel: first vs. second eigenvector of $\hat{R}_{row}$. Right panel: first vs. second eigenvector of $\hat{R}_{col}$.]
Network regression problems $y_{ij} = x_{ij}\beta + \epsilon_{ij}$ frequently assume independence of the $\epsilon_{ij}$.
Estimating β in network regression
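For $Y = \langle X, \beta \rangle + E$ we have
OLS (assume no dependence among $\epsilon_{ij}$):
$$\hat\beta^{(ols)} = (\mathrm{mat}(X)^T \mathrm{mat}(X))^{-1} \mathrm{mat}(X)^T \mathrm{vec}(Y)$$
Oracle GLS (assume dependence among $\epsilon_{ij}$):
$$\hat\beta^{(gls)} = (\mathrm{mat}(X)^T \Sigma^{-1} \mathrm{mat}(X))^{-1} \mathrm{mat}(X)^T \Sigma^{-1} \mathrm{vec}(Y)$$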
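The two estimators can be compared on simulated data. The sketch below assumes a small vectorized design and a made-up, known covariance with decaying correlation; `X`, `Sigma` and `beta` are illustrative choices, not quantities from the talk.

```r
# Sketch: OLS versus oracle GLS on vectorized dyadic data (simulated; all values are assumptions).
set.seed(1)
n <- 6                       # number of nodes
m <- n * n                   # length of vec(Y); the diagonal is kept for simplicity
X <- cbind(1, rnorm(m))      # plays the role of mat(X): intercept plus one dyadic covariate
beta <- c(1, 2)

# A hypothetical known covariance with correlation decaying in the index distance.
Sigma <- 0.7^abs(outer(1:m, 1:m, "-"))
L <- t(chol(Sigma))          # lower triangular factor, Sigma = L %*% t(L)
y <- drop(X %*% beta + L %*% rnorm(m))   # plays the role of vec(Y)

# OLS: (X'X)^{-1} X'y
beta_ols <- solve(crossprod(X), crossprod(X, y))

# Oracle GLS: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sinv <- solve(Sigma)
beta_gls <- solve(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% y)

# GLS is OLS after whitening with L^{-1}; the two routes should agree.
Xw <- forwardsolve(L, X)
yw <- forwardsolve(L, y)
beta_gls_whitened <- solve(crossprod(Xw), crossprod(Xw, yw))
```

In practice $\Sigma$ is unknown, which is why the GLS estimator here is called an oracle.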
Network models
The data
There are n actors/nodes labeled 1, ..., n.
Y is a sociomatrix: $y_{ij}$ is a dyadic relationship between node i and node j.
$y_{ii}$ is frequently undefined.
Covariates:
node specific: $x_i$
dyad specific: $x_{ij}$
Social relations model
Goal: describe the variability in Y .
Sender effects describe sociability.
Receiver effects describe popularity.
Capture this in the Social Relations Model (SRM)
$$y_{ij} = a_i + b_j + \epsilon_{ij}$$
Almost an ANOVA, but we want to relate $a_i$ to $b_i$ since the senders and receivers are from the same set.
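$$y_{ij} = \mu + a_i + b_j + \epsilon_{ij}$$
$$(a_i, b_i) \overset{iid}{\sim} N(0, \Sigma_{ab}), \qquad (\epsilon_{ij}, \epsilon_{ji}) \overset{iid}{\sim} N(0, \Sigma_e)$$
$\Sigma_{ab} = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix}$ describes sender/receiver variability and within person similarity.
$\Sigma_e = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ describes within dyad correlation.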
Variability
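$$\mathrm{var}(y_{ij}) = \sigma_a^2 + \sigma_b^2 + \sigma^2 \quad (i \ne j, \text{ since } a_i \text{ and } b_j \text{ are independent})$$
$$\mathrm{cov}(y_{ij}, y_{ik}) = \sigma_a^2$$
$$\mathrm{cov}(y_{ij}, y_{kj}) = \sigma_b^2$$
$$\mathrm{cov}(y_{ij}, y_{jk}) = \sigma_{ab}$$
$$\mathrm{cov}(y_{ij}, y_{ji}) = 2\sigma_{ab} + \rho\sigma^2$$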
How hard is it to fit this model?
fit_SRM <- ame(Y)
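Before fitting, the second-order structure of the SRM can be checked by Monte Carlo. The sketch below simulates the model with arbitrary, assumed parameter values and compares empirical and theoretical covariances; the variance target uses the fact that $a_i$ and $b_j$ are independent for $i \ne j$.

```r
# Monte Carlo check of the SRM covariances (parameter values are arbitrary assumptions).
set.seed(1)
n <- 3; reps <- 20000
s2a <- 1.0; s2b <- 0.5; sab <- 0.3   # entries of Sigma_ab
s2e <- 1.0; rho <- 0.4               # Sigma_e = s2e * [[1, rho], [rho, 1]]

Lab <- t(chol(matrix(c(s2a, sab, sab, s2b), 2, 2)))
Le  <- t(chol(s2e * matrix(c(1, rho, rho, 1), 2, 2)))

sim_one <- function() {
  ab <- Lab %*% matrix(rnorm(2 * n), 2, n)   # column i holds (a_i, b_i)
  a <- ab[1, ]; b <- ab[2, ]
  E <- matrix(0, n, n)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    e <- Le %*% rnorm(2)                      # (eps_ij, eps_ji)
    E[i, j] <- e[1]; E[j, i] <- e[2]
  }
  Y <- outer(a, b, "+") + E                   # y_ij = a_i + b_j + eps_ij (mu = 0)
  c(y12 = Y[1, 2], y13 = Y[1, 3], y32 = Y[3, 2], y23 = Y[2, 3], y21 = Y[2, 1])
}

S <- cov(t(replicate(reps, sim_one())))
emp <- c(var_y12 = S["y12", "y12"], same_sender = S["y12", "y13"],
         same_receiver = S["y12", "y32"], chain = S["y12", "y23"],
         reciprocal = S["y12", "y21"])
theory <- c(s2a + s2b + s2e, s2a, s2b, sab, 2 * sab + rho * s2e)
round(rbind(empirical = emp, theory = theory), 2)
```

The two rows of the printed table should agree up to simulation error.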
Pictures that pop up
These show how well the Markov chain is mixing and provide goodness-of-fit information.
Source: Hoff (2015). arXiv:1506.08237
Goodness of fit
Posterior predictive distributions.
sd.rowmean: standard deviation of row means of Y .
sd.colmean: standard deviation of column means of Y .
dyad.dep: correlation between vectorized $Y$ and vectorized $Y^T$
triad.dep:
$$\frac{\sum_{i,j,k} e_{ij} e_{jk} e_{ki}}{\#\{\text{triangles on } n \text{ nodes}\} \cdot \mathrm{Var}(\mathrm{vec}(Y))^{3/2}}$$
Source: Hoff (2015). arXiv:1506.08237
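The first three statistics are easy to compute by hand. The sketch below does so in base R on a toy sociomatrix; it follows the verbal definitions above and is not claimed to be the package's own implementation.

```r
# Sketch of three goodness-of-fit statistics computed directly in base R.
# The toy sociomatrix is an assumption; the diagonal is NA because y_ii is undefined.
Y <- matrix(c(NA, 1, 0, 1,
              1, NA, 0, 0,
              0, 1, NA, 1,
              1, 0, 1, NA), 4, 4, byrow = TRUE)

sd_rowmean <- sd(rowMeans(Y, na.rm = TRUE))              # spread of sender averages
sd_colmean <- sd(colMeans(Y, na.rm = TRUE))              # spread of receiver averages
dyad_dep   <- cor(c(Y), c(t(Y)), use = "complete.obs")   # cor of vec(Y) and vec(Y^T)

c(sd.rowmean = sd_rowmean, sd.colmean = sd_colmean, dyad.dep = dyad_dep)
```

In a posterior predictive check, the same statistics would be computed on simulated networks and compared with the observed values.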
Incorporating covariates
Imagine you have some covariates and want to fit
$$y_{ij} = \beta_d^T x_{d,ij} + \beta_r^T x_{r,i} + \beta_c^T x_{c,j} + a_i + b_j + \epsilon_{ij}$$
$x_{d,ij}$ are dyad specific covariates.
$x_{r,i}$ are row (sender) covariates.
$x_{c,i}$ are column (receiver) covariates.
Frequently $x_{r,i} = x_{c,i} = x_i$
When does this not make sense?
(Example: popularity is affected by athletic success, but
sociability is not)
How hard is it to fit this model?
fit_SRRM <- ame(Y, Xd=Xd,Xr=Xr,Xc=Xc)
Parsing the input
fit_SRRM <- ame(Y,
Xdyad=Xd, #n x n x pd array of covariates
Xrow=Xr, #n x pr matrix of nodal row covariates
Xcol=Xc #n x pc matrix of nodal column covariates
)
$Xr_{i,p}$ is the value of the pth row covariate for node i.
$Xd_{i,j,p}$ is the value of the pth dyadic covariate in the direction of i to j.
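The input shapes described above can be assembled as follows. This is a sketch on toy data; the covariate names (`age`, `athlete`, `age_diff`, `both_athlete`) are made up for illustration.

```r
# Sketch: assembling covariate inputs with the shapes described above (toy data, names are assumptions).
n <- 5
Xn <- cbind(age = c(14, 15, 14, 16, 15),     # n x 2 matrix of nodal covariates
            athlete = c(1, 0, 0, 1, 1))

# n x n x 2 array of dyadic covariates; slice p holds the p-th dyadic covariate.
Xd <- array(NA_real_, dim = c(n, n, 2),
            dimnames = list(NULL, NULL, c("age_diff", "both_athlete")))
Xd[, , "age_diff"]     <- abs(outer(Xn[, "age"], Xn[, "age"], "-"))
Xd[, , "both_athlete"] <- outer(Xn[, "athlete"], Xn[, "athlete"])

Xd[2, 4, "age_diff"]   # value of the first dyadic covariate in the direction 2 -> 4
```

These objects could then be passed along as `ame(Y, Xdyad = Xd, Xrow = Xn, Xcol = Xn)`.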
Back to basics
Can you get rid of the dependencies in the model?
fit_rm<-ame(Y,Xd=Xd,Xr=Xn,Xc=Xn,
rvar=FALSE, #should you fit row random effects?
cvar=FALSE, #should you fit column random effects?
dcor=FALSE #should you fit a dyadic correlation?
)
Note that summary will output:
Variance parameters:
pmean psd
va 0.000 0.000
cab 0.000 0.000
vb 0.000 0.000
rho 0.000 0.000
ve 0.229 0.011
So what’s missing here?
We have a lot of leftover variability.
Common themes in network analysis:
Homophily: similar people connect to each other
Stochastic equivalence: similar people act similarly
Which is which?
Left: homophily; Right: stochastic equivalence
What are good models for this?
Source: Hoff (2008). NIPS
Introducing multiplicative effects
SR(R)M can represent second-order dependencies very well.
Has a hard time capturing “triadic” behavior.
Homophily: create dyadic covariates $x_{d,ij} = x_i x_j$
Generally this can be represented by
$$x_{r,i}^T B x_{c,j} = \sum_k \sum_l b_{kl} x_{r,ik} x_{c,jl}$$
This is linear in the covariates and so can be baked into the amen framework.
Sometimes there is excess correlation to account for.
This suggests a multiplicative effects model:
$$y_{ij} = \beta_d^T x_{d,ij} + \beta_r^T x_{r,i} + \beta_c^T x_{c,j} + a_i + b_j + u_i^T v_j + \epsilon_{ij}$$
Fitting these models and beyond
fit_ame2<-ame(Y,Xd,Xn,Xn,
R=2 #dimension of the multiplicative effect
)
Source: Hoff (2015). arXiv:1506.08237
What happened here?
Why do multiplicative effects help triadic behavior?
Triadic measure is related to transitivity (at least for binary
data).
Turns out homophily can capture transitivity...
$$y_{ij} = \beta_d^T x_{d,ij} + \beta_r^T x_{r,i} + \beta_c^T x_{c,j} + a_i + b_j + u_i^T v_j + \epsilon_{ij}$$
$u_i$ is information about the sender, $v_j$ is information about the receiver
if $u_i \approx v_j$ then $u_i^T v_j > 0$...
if $u_i \approx u_j$ then there is some stochastic equivalence...
Let's generalize: ordinal models
Imagine a binary (probit) model:
$$y_{ij} = 1(z_{ij} > 0), \qquad z_{ij} = \mu + a_i + b_j + \epsilon_{ij}$$
Looks like the SRM on the latent scale.
fit_SRM<-ame(Y,
model="bin" #lots of model options here
)
If we go to the iid setup this is just an Erdős-Rényi model:
fit_SRG<-ame(Y,model="bin",
rvar=FALSE,cvar=FALSE,dcor=FALSE)
Even more general
Consider the following generative model:
$$z_{ij} = u_i^T D v_j + \epsilon_{ij}$$
$$y_{ij} = g(z_{ij})$$
$u_i$ are latent factors describing i as a sender
$v_j$ are latent factors describing j as a receiver
$D$ is a matrix of factor weights
$g$ is an increasing function mapping the latent space to the observed space.
(Some choices of $g$: normal, $g(z) = z$; binomial, $g(z) = 1(z \ge 0)$.)
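The generative model can be simulated directly. The sketch below draws latent factors, forms $Z$, and applies the two example link functions; the dimensions, weights and noise scale are assumptions chosen only for illustration.

```r
# Sketch: simulating from z_ij = u_i^T D v_j + eps_ij, y_ij = g(z_ij) (all values are assumptions).
set.seed(1)
n <- 8; R <- 2
U <- matrix(rnorm(n * R), n, R)    # row i is u_i (sender factors)
V <- matrix(rnorm(n * R), n, R)    # row j is v_j (receiver factors)
D <- diag(c(2, 1))                 # factor weights

Z <- U %*% D %*% t(V) + matrix(rnorm(n * n), n, n)
diag(Z) <- NA                      # y_ii is undefined

g_normal <- function(z) z                   # normal: g(z) = z
g_binary <- function(z) 1 * (z >= 0)        # binary outcome: g(z) = 1(z >= 0)

Y_nrm <- g_normal(Z)
Y_bin <- g_binary(Z)
```

Both outcomes share the same latent matrix, so they differ only in how much of $Z$ is observed.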
This works for symmetric matrices too!
Imagine that yij = yji then the model looks like:
$$z_{ij} = u_i^T \Lambda u_j + \epsilon_{ij}$$
$$y_{ij} = g(z_{ij})$$
$u_i \approx u_j$ represents stochastic equivalence
$\Lambda$ is a matrix of eigenvalues:
positive $\lambda_i$ imply homophily, negative ones imply heterophily.
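A one-dimensional, noise-free toy case shows how the sign of an eigenvalue acts. The latent positions and the value 1.5 below are assumptions picked for illustration.

```r
# Sketch: sign of lambda in z_ij = lambda * u_i * u_j for a one-dimensional latent factor (toy values).
u <- c(1, 1, -1, -1)                 # nodes 1,2 share a position; nodes 3,4 share the opposite one
Z_pos <- outer(u, u) * 1.5           # lambda = 1.5 > 0
Z_neg <- outer(u, u) * (-1.5)        # lambda = -1.5 < 0

Z_pos[1, 2]; Z_pos[1, 3]   # the similar pair gets the larger value (homophily)
Z_neg[1, 2]; Z_neg[1, 3]   # the similar pair gets the smaller value (heterophily)
```

Since $g$ is increasing, a larger $z_{ij}$ translates into a stronger or more likely tie.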
What is this latent space?
Problem 1: need to select a dimension R.
This is hard... sometimes there is some intuition.
Problem 2: should the latent positions be interpreted?
Unclear — maybe think of the distances in this space...
Problem 3: what about my favorite other models like
stochastic blockmodels?
These are just a subclass of models! For example, the
stochastic blockmodel has discrete support for the latent
positions.
What is this latent space?
All quotes from Hoff et al. (2002)
A subset of individuals in the population with a large number
of social ties between them may be indicative of a group of
individuals who have nearby positions in this space of
characteristics, or social space.
Various concepts of social space have been discussed by
McFarland and Brown (1973) and Faust (1988).
In the context of this article, social space refers to a space of
unobserved latent characteristics that represent potential
transitive tendencies in network relations.
A probability measure over these unobserved characteristics
induces a model in which the presence of a tie between two
individuals is dependent on the presence of other ties.
(Tiny portion of the) literature
Nowicki, Krzysztof, and Tom A. B. Snijders. "Estimation and prediction for stochastic blockstructures." Journal of the American Statistical Association 96, no. 455 (2001): 1077-1087.
Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock. "Latent space approaches to social network analysis." Journal of the American Statistical Association 97, no. 460 (2002): 1090-1098.
Hoff, Peter. "Modeling homophily and stochastic equivalence in symmetric relational data." In Advances in Neural Information Processing Systems, pp. 657-664. 2008.
Airoldi, Edoardo M., David M. Blei, Stephen E. Fienberg, and Eric P. Xing. "Mixed membership stochastic blockmodels." Journal of Machine Learning Research 9, no. Sep (2008): 1981-2014.
Hoff, Peter, Bailey Fosdick, Alex Volfovsky, and Katherine Stovel. "Likelihoods for fixed rank nomination networks." Network Science 1, no. 3 (2013): 253-277.
Hoff, Peter D. "Dyadic data analysis with amen." arXiv preprint arXiv:1506.08237 (2015).
ame(Y, Xdyad=NULL, Xrow=NULL, Xcol=NULL,
rvar = !(model=="rrl") , cvar = TRUE, dcor = !symmetric,
nvar = TRUE, R = 0, model="nrm",
intercept=!is.element(model,c("rrl","ord")),
symmetric=FALSE,
odmax=rep(max(apply(Y>0,1,sum,na.rm=TRUE)),nrow(Y)), ...)
Y: an n x n square relational matrix of relations.
Xdyad: an n x n x pd array of covariates
Xrow: an n x pr matrix of nodal row covariates
Xcol: an n x pc matrix of nodal column covariates
rvar: logical: fit row random effects (asymmetric case)?
cvar: logical: fit column random effects (asymmetric case)?
dcor: logical: fit a dyadic correlation (asymmetric case)?
nvar: logical: fit nodal random effects (symmetric case)?
R: int: dimension of the multiplicative effects (can be 0)
model: char: one of "nrm","bin","ord","cbin","frn","rrl"
odmax: a scalar integer or vector of length n giving the
maximum number of nominations that each node may make
What’s in the ...?
seed = 1, nscan = 10000, burn = 500, odens = 25,
plot=TRUE, print = TRUE, gof=TRUE
seed: random seed
nscan: number of iterations of the Markov chain
(beyond burn-in)
burn: burn in for the Markov chain
odens: output density for the Markov chain
plot: logical: plot results while running?
print: logical: print results while running?
gof: logical: calculate goodness of fit statistics?
An AddHealth Example
Social network data
Datasets: PROSPER, NSCR, AddHealth
Relate network characteristics to
individual-level behavior
Literature: ERGM, latent variable models
Assumptions:
Data is fully observed
The support is the set of all
sociomatrices
In practice:
Ranked data
Censored observations
A type of likelihood that accommodates the ranked and censored
nature of data from Fixed Rank Nomination (FRN) surveys and
allows for estimation of regression effects.
Data collection examples
PROmoting School Community-University Partnerships to
Enhance Resilience (PROSPER): “Who are your best and
closest friends in your grade?”
National Longitudinal Study of Adolescent to Adult Health
(AddHealth): “Your male friends. List your closest male
friends. List your best male friend first, then your next best
friend, and so on.”
Notation
$Z = \{z_{ij} : i \ne j\}$ is a sociomatrix of ordinal relationships
$z_{ij} > z_{ik}$ denotes person i preferring person j to person k
$$Z = \begin{pmatrix} - & z_{12} & \cdots & z_{1n} \\ z_{21} & - & & \\ \vdots & & - & \\ z_{n1} & & & - \end{pmatrix}$$
Instead of $Z$ we observe a sociomatrix $Y = \{y_{ij} : i \ne j\}$
Different sampling schemes define different maps between $Y$ and $Z$ (set relations between $y_{ij}$ and $z_{ij}$).
Statistical model $\{p(Z \mid \theta) : \theta \in \Theta\}$ assists in analysis
Fixed rank nominations
$F(Y)$ is the set of $Z$ satisfying:
$y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
$y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$
$y_{ij} > 0 \Rightarrow z_{ij} > 0$
Not part of $F(Y)$: $y_{ij} = 0 \Rightarrow z_{ij} < 0$
m = maximal number of nominations, $d_i$ = individual outdegree
Differentiates between different ranks
Captures censoring in the data
Example with outdegree below the maximum:
j:     1  2  3  4  5  6  7  8  9  10
y_ij:  4  3  2  1  0  0  0  0  0  0
z_ij:  z_i1 > z_i2 > z_i3 > z_i4 > 0 > z_ij for j = 5, ..., 10
Example with outdegree equal to the maximum:
j:     1  2  3  4  5  6  7  8  9  10
y_ij:  5  4  3  2  1  0  0  0  0  0
z_ij:  z_i1 > z_i2 > z_i3 > z_i4 > z_i5 > z_ij for j = 6, ..., 10, with unknown sign (?)
Rank
$R(Y)$ is the set of $Z$ satisfying only:
$y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
Not part of $R(Y)$: the conditions $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$; $y_{ij} > 0 \Rightarrow z_{ij} > 0$; $y_{ij} = 0 \Rightarrow z_{ij} < 0$
Valid but not fully informative: $F(Y) \subset R(Y)$
Cannot estimate row ("sender") specific effects
Example with four nominations:
j:     1  2  3  4  5  6  7  8  9  10
y_ij:  4  3  2  1  0  0  0  0  0  0
z_ij:  z_i1 > z_i2 > z_i3 > z_i4 > z_ij for j = 5, ..., 10, with unknown sign (?)
Example with five nominations:
j:     1  2  3  4  5  6  7  8  9  10
y_ij:  5  4  3  2  1  0  0  0  0  0
z_ij:  z_i1 > z_i2 > z_i3 > z_i4 > z_i5 > z_ij for j = 6, ..., 10, with unknown sign (?)
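Binary
  $y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
  $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$
  $y_{ij} > 0 \Rightarrow z_{ij} > 0$    } $B(Y)$
  $y_{ij} = 0 \Rightarrow z_{ij} < 0$    } $B(Y)$
(The last two conditions define $B(Y)$; the diagram shows $F(Y)$, $R(Y)$ and $B(Y)$.)
Neither fully informative nor valid!
  Discards information on the ranks
  Ignores the censoring on the outdegrees
  In particular: $F(Y) \not\subset B(Y)$
Example rows (row $i$, columns $j = 1, \dots, 10$):
  j:    1   2   3   4   5   6   7   8   9   10
  y_i:  4   3   2   1   0   0   0   0   0   0
  z_i:  >0  >0  >0  >0  <0  <0  <0  <0  <0  <0
  y_i:  5   4   3   2   1   0   0   0   0   0
  z_i:  >0  >0  >0  >0  >0  <0  <0  <0  <0  <0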
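The three sets can be compared on the example rows with a few lines of R. This is only a sketch: the function names `in_R`, `in_F` and `in_B` and the latent values `z4`, `z5` are made up for illustration, and the checks work on a single row with no missing entries.

```r
# Row-level membership checks for one sender i.
# y: observed scores (0 = not nominated); z: latent values; m: nomination cap.
in_R <- function(y, z) {
  higher <- outer(y, y, ">")            # higher[j, k] is TRUE when y_j > y_k
  all(outer(z, z, ">")[higher])         # then z_j > z_k must hold
}
in_F <- function(y, z, m) {
  d <- sum(y > 0)                       # observed outdegree d_i
  in_R(y, z) && all(z[y > 0] > 0) && (d >= m || all(z[y == 0] <= 0))
}
in_B <- function(y, z) all(z[y > 0] > 0) && all(z[y == 0] < 0)

m <- 5
y4 <- c(4, 3, 2, 1, 0, 0, 0, 0, 0, 0)   # d_i = 4 < m
z4 <- c(2, 1.5, 1, 0.5, -1, -1, -1, -1, -1, -1)
y5 <- c(5, 4, 3, 2, 1, 0, 0, 0, 0, 0)   # d_i = 5 = m (censored row)
z5 <- c(2.5, 2, 1.5, 1, 0.5, 0.3, -1, -1, -1, -1)

c(F = in_F(y4, z4, m), R = in_R(y4, z4), B = in_B(y4, z4))
c(F = in_F(y4, z4 + 10, m), R = in_R(y4, z4 + 10))   # shifted row
c(F = in_F(y5, z5, m), B = in_B(y5, z5))             # censored row
```

The shifted row stays in R(Y) but leaves F(Y), which is the sense in which the rank likelihood is valid but not fully informative. The censored row is in F(Y) but not in B(Y), because an unranked friend may still have a positive latent value when d_i = m.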
Bayesian Estimation for Fixed Rank Nominations
Model: $Z \sim p(Z \mid \theta)$, $\theta \in \Theta$
Data: $Z \in F(Y)$
Likelihood:
  $L_F(\theta : Y) = \Pr(Z \in F(Y) \mid \theta) = \int_{F(Y)} dP(Z \mid \theta)$
Estimation: Given $p(\theta)$, $p(\theta \mid Z \in F(Y))$ can be approximated by a Gibbs sampler.
Simulate $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij}, Z \in F(Y))$:
1. $y_{ij} > 0$: $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij}) 1_{z_{ij} \in (a,b)}$ where $a = \max(z_{ik} : y_{ik} < y_{ij})$ and $b = \min(z_{ik} : y_{ik} > y_{ij})$.
2. $y_{ij} = 0$ and $d_i < m$: $z_{ij} \sim p(z_{ij} \mid Z_{-ij}, \theta) 1_{z_{ij} \le 0}$.
3. $y_{ij} = 0$ and $d_i = m$: $z_{ij} \sim p(z_{ij} \mid Z_{-ij}, \theta) 1_{z_{ij} \le \min(z_{ik} : y_{ik} > 0)}$.
Allows for imputation of missing $y_{ij}$
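The three cases of the $z_{ij}$ update can be sketched in R as follows. Several things here are assumptions for illustration only: the helper names `rtnorm`, `frn_bounds` and `gibbs_row` are made up, the unconstrained conditional $p(z_{ij} \mid \theta, Z_{-ij})$ is taken to be normal with mean `mu[j]` and standard deviation 1 (ignoring dyadic correlation), missing $y_{ij}$ are not handled, and the lower bound for ranked entries is additionally floored at 0 so that the draw respects $y_{ij} > 0 \Rightarrow z_{ij} > 0$.

```r
# Truncated normal draw on (lo, hi) by inverting the CDF.
rtnorm <- function(mu, sd, lo, hi) {
  qnorm(runif(1, pnorm(lo, mu, sd), pnorm(hi, mu, sd)), mu, sd)
}

# Interval allowed for z[j] given the other entries of row i.
# y: scores in row i (0 = not nominated); z: current latent values; m: cap.
frn_bounds <- function(y, z, j, m) {
  others <- seq_along(y) != j
  d <- sum(y > 0)                           # observed outdegree d_i
  lower_set <- z[others & y < y[j]]
  upper_set <- z[others & y > y[j]]
  lo <- if (length(lower_set)) max(lower_set) else -Inf
  hi <- if (length(upper_set)) min(upper_set) else Inf
  if (y[j] > 0) lo <- max(lo, 0)            # case 1, floored at 0 (assumption)
  if (y[j] == 0 && d < m) hi <- min(hi, 0)  # case 2
  # case 3 (y[j] == 0, d == m): hi is already min(z_ik : y_ik > 0)
  c(lo, hi)
}

# One Gibbs sweep over the entries of row i, in random order.
gibbs_row <- function(y, z, mu, m, sd = 1) {
  for (j in sample(seq_along(y))) {
    b <- frn_bounds(y, z, j, m)
    z[j] <- rtnorm(mu[j], sd, b[1], b[2])
  }
  z
}

set.seed(1)
y <- c(5, 4, 3, 2, 1, 0, 0, 0, 0, 0)               # d_i = m = 5
z <- c(2.5, 2, 1.5, 1, 0.5, -1, -1, -1, -1, -1)    # a starting point in F(Y)
z_new <- gibbs_row(y, z, mu = rep(0, 10), m = 5)
```

Because every draw is restricted to the interval implied by the current values of the other entries, the row stays inside F(Y) after each update.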
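Simulations
We generated $Z$ from the following Social Relations Model (Warner, Kenny and Stoto (1979)):
  $z_{ij} = \beta^t x_{ij} + a_i + b_j + \epsilon_{ij}$
  $(a_i, b_i)^t \overset{iid}{\sim} \text{normal}\left(0, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right)$
  $(\epsilon_{ij}, \epsilon_{ji})^t \overset{iid}{\sim} \text{normal}\left(0, \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}\right)$
Mean model: $\beta^t x_{ij} = \beta_0 + \beta_r x_{ir} + \beta_c x_{jc} + \beta_{d1} x_{ij1} + \beta_{d2} x_{ij2}$
  $x_{ir}, x_{jc}$: individual level variables
  $x_{ij1}$: pair specific variable
  $x_{ij2}$: co-membership in a group
  $\beta_r = \beta_c = \beta_{d1} = \beta_{d2} = 1$ and $\beta_0 = -3.26$
  $x_{ir}, x_{ic}, x_{ij1} \overset{iid}{\sim} N(0, 1)$, $x_{ij2} = s_i s_j / 0.42$ for $s_i \overset{iid}{\sim} \text{binary}(1/2)$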
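A data-generating sketch for this setup, in base R. The helper names `rbvn` and `frn_row` are made up for illustration, and the scoring rule (the top friend receives score $d_i$, down to 1 for the last nominated friend, with at most m nominations among those with $z_{ij} > 0$) is read off the example rows on the earlier slides rather than stated on this one.

```r
set.seed(3)
n <- 100; m <- 5
beta <- c(b0 = -3.26, r = 1, c = 1, d1 = 1, d2 = 1)

# k bivariate normal draws with unit variances and correlation rho
rbvn <- function(k, rho) {
  u <- rnorm(k); v <- rnorm(k)
  cbind(u, rho * u + sqrt(1 - rho^2) * v)
}

ab  <- rbvn(n, 0.5)                       # (a_i, b_i), correlation 0.5
xr  <- rnorm(n); xc <- rnorm(n)           # nodal covariates
xd1 <- matrix(rnorm(n * n), n, n)         # pair specific covariate
s   <- rbinom(n, 1, 0.5)
xd2 <- outer(s, s) / 0.42                 # co-membership covariate

# Dyadic errors: (e_ij, e_ji) correlated at 0.9
E  <- matrix(0, n, n)
up <- which(upper.tri(E), arr.ind = TRUE)
e  <- rbvn(nrow(up), 0.9)
E[up] <- e[, 1]
E[up[, 2:1]] <- e[, 2]                    # mirrored positions (j, i)

Z <- beta["b0"] + beta["r"] * outer(xr, rep(1, n)) +
  beta["c"] * outer(rep(1, n), xc) + beta["d1"] * xd1 + beta["d2"] * xd2 +
  outer(ab[, 1], ab[, 2], "+") + E
diag(Z) <- NA                             # self-ties are undefined

# Fixed rank nomination censoring of one row of Z
frn_row <- function(z, m) {
  y <- rep(0, length(z))
  pos <- which(!is.na(z) & z > 0)
  top <- head(pos[order(z[pos], decreasing = TRUE)], m)  # at most m friends
  y[top] <- rev(seq_along(top))           # best friend gets the highest score
  y[is.na(z)] <- NA
  y
}
Y <- t(apply(Z, 1, frn_row, m = m))
```

Rows with more than m positive latent values are the censored ones: their observed outdegree is exactly m.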
Simulations - Censoring
8 simulations for each m ∈ {5, 15} with 100 nodes each
[Figure: confidence intervals by simulation (1-8) in panels for $\beta_r$, $\beta_c$, $\beta_{d1}$ and $\beta_{d2}$, with one column of panels for m = 5 and one for m = 15.]
Confidence intervals under the three different likelihoods for column and an iid dyadic variable. The groups of three CIs are based on binary, FRN and rank likelihoods from left to right.
Simulations - Censoring
[Figure: confidence intervals by simulation (1-8) in panels for $\beta_r$, $\beta_c$ and $\beta_{d1}$, for m = 5 and m = 15.]
Rank likelihood cannot estimate row effects
  $Z \in R(Y) \iff Z + c1^t \in R(Y) \ \forall c \in \mathbb{R}^n$
Binary likelihood poorly estimates row effects
  Large amount of censoring
  ⇒ Heterogeneity of censored outdegrees is low
  ⇒ Regression coefficients estimated too low
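The invariance behind "rank likelihood cannot estimate row effects" can be checked numerically. The sketch below uses an arbitrary small matrix and made-up names (`c_shift`, `row_ranks`): adding a row-specific constant $c_i$ to every entry of row $i$ leaves all within-row orderings unchanged, while the row means move by exactly $c_i$.

```r
set.seed(2)
n <- 6
Z <- matrix(rnorm(n * n), n, n)
c_shift <- rnorm(n, sd = 5)                 # arbitrary row constants c
Z_shift <- Z + outer(c_shift, rep(1, n))    # Z + c 1^t

# Within-row ranks are all that R(Y) constrains
row_ranks <- function(Z) t(apply(Z, 1, rank))

identical(row_ranks(Z), row_ranks(Z_shift))   # same orderings in every row
rowMeans(Z_shift) - rowMeans(Z)               # shifted by c_shift
```

Since both matrices satisfy exactly the same rank constraints, the data cannot distinguish between them, and any row ("sender") effect is absorbed into c.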
Simulations - Censoring
[Figure: confidence intervals by simulation (1-8) in panels for $\beta_{d1}$ and $\beta_{d2}$, for m = 5 and m = 15.]
Recall: $x_{ij2} \propto s_i s_j$, an indicator of co-membership in a group
Ignore the censoring
⇒ Binary likelihood underestimates row variability
⇒ Underestimate the variability in $x_{ij2}$
Simulations - information in the ranks
Let $C(Y)$ be the set of values of $Z$ for which the following are true:
  $y_{ij} > 0 \Rightarrow z_{ij} > 0$
  $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$
  $\min\{z_{ij} : y_{ij} > 0\} \ge \max\{z_{ij} : y_{ij} = 0\}$
We refer to $L_C(\theta : Y) = \Pr(Z \in C(Y) \mid \theta)$ as the censored binary likelihood.
  Recognizes censoring but ignores information in the ranks
  Performs similarly to FRN in the previous study
  Less precise than FRN when m is large
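A row-level check of the three conditions defining C(Y) makes the "ignores information in the ranks" point concrete. The function name `in_C` and the latent values below are made up for illustration, and the row is assumed to have no missing entries.

```r
# Is one row of latent values z consistent with C(Y)?
# y: observed scores (0 = not nominated); m: nomination cap.
in_C <- function(y, z, m) {
  ranked   <- z[y > 0]
  unranked <- z[y == 0]
  ok_sign  <- all(ranked > 0)
  ok_cens  <- length(ranked) >= m || all(unranked <= 0)
  ok_group <- length(ranked) == 0 || length(unranked) == 0 ||
    min(ranked) >= max(unranked)
  ok_sign && ok_cens && ok_group
}

y <- c(5, 4, 3, 2, 1, 0, 0, 0, 0, 0); m <- 5
z_scrambled <- c(0.5, 1, 1.5, 2, 2.5, 0.3, -1, -1, -1, -1)  # ranks reversed
z_bad       <- c(0.5, 1, 1.5, 2, 2.5, 0.7, -1, -1, -1, -1)  # unranked too high
in_C(y, z_scrambled, m)
in_C(y, z_bad, m)
```

The first row is accepted even though its ordering contradicts the observed ranks, since C(Y) only separates nominated from non-nominated individuals. The second row is rejected because a non-nominated individual sits above a nominated one.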
Simulations - information in the ranks
Same setup as before, but average uncensored outdegree is m
[Figure: posterior concentration around true parameter values. Relative concentration around the true value is plotted against m, with one curve each for $\beta_r$, $\beta_c$, $\beta_{d1}$ and $\beta_{d2}$. The plotted quantity is the average of $E[(\beta - \beta^*)^2 \mid F(S)] / E[(\beta - \beta^*)^2 \mid C(S)]$ across eight simulated datasets for each m ∈ {5, 15, 30, 50}.]
As the censored binomial likelihood recognizes the censoring in the data, we expect it to provide parameter estimates that do not have the biases of the binomial likelihood estimators. On the other hand, $L_C$ ignores the information in the ranks of the scored individuals, and so we might expect it to provide less precise estimates than the FRN likelihood.
  $\beta_r$: row
  $\beta_c$: column
  $\beta_{d1}$: continuous dyad
  $\beta_{d2}$: co-membership
Relative concentration around true value of each parameter:
  Measured by $E[(\beta - 1)^2 \mid F(Y)] / E[(\beta - 1)^2 \mid C(Y)]$ for each $\beta$
When $m \ll n$, most of the information is found by considering ranked/unranked individuals as groups rather than the relative ordering of the ranked individuals.
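Given posterior draws of a coefficient under the two likelihoods, the relative concentration measure is a ratio of two Monte Carlo averages. In the sketch below the function name `rel_conc` is made up, and the draws are synthetic placeholders generated only to exercise the function; they are not results from the study.

```r
# Ratio of posterior mean squared deviations from the true value.
# draws_F: posterior draws of beta under the FRN likelihood
# draws_C: posterior draws of beta under the censored binary likelihood
rel_conc <- function(draws_F, draws_C, truth = 1) {
  mean((draws_F - truth)^2) / mean((draws_C - truth)^2)
}

set.seed(4)
draws_F <- rnorm(2000, mean = 1, sd = 0.10)   # placeholder draws, not results
draws_C <- rnorm(2000, mean = 1, sd = 0.20)   # placeholder draws, not results
rel_conc(draws_F, draws_C)
```

Values below 1 indicate that the FRN posterior is more concentrated around the true value than the censored binary posterior, and values near 1 indicate that the ranks add little.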
AddHealth Data - Results
[Figure: confidence intervals for $\beta$ in six panels: intercept; rsmoke, rdrink, rgpa; csmoke, cdrink, cgpa; dsmoke, ddrink, dgpa; dacad, darts, dsport, dcivic; dgrade, drace.]
646 females were asked to rank up to 5 female friends
Mean model with row, column and dyadic effects for smoking, drinking and GPA, as well as dyadic effects for co-membership in activities and grade, and a similarity-in-race measure.
The CIs are based on binary, FRN and rank likelihoods.

08 Inference for Networks – DYAD Model Overview (2017)

  • 1.
    Modeling networks: regressionwith additive and multiplicative effects Alexander Volfovsky Department of Statistical Science, Duke May 25 2017 May 25, 2017 Health Networks
  • 2.
    Why model networks? Interestedin understanding the formation of relationships 1
  • 3.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology 1
  • 4.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: 1
  • 5.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? 1
  • 6.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? 1
  • 7.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? How to develop fail-safes to overcome these problems? 1
  • 8.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? How to develop fail-safes to overcome these problems? Where to apply these? 1
  • 9.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? How to develop fail-safes to overcome these problems? Where to apply these? Causal inference 1
  • 10.
    Why model networks? Interestedin understanding the formation of relationships Applied fields: sociology, economics, biology, epidemiology Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? How to develop fail-safes to overcome these problems? Where to apply these? Causal inference Link prediction 1
  • 11.
    Some context: Facebook Facebookwants to change its’ ad algorithm. 2 Source: Wikimedia
  • 12.
    Some context: Facebook Facebookwants to change its’ ad algorithm. Can’t do it on the whole graph 2 Source: Wikimedia
  • 13.
    Some context: Facebook Facebookwants to change its’ ad algorithm. Can’t do it on the whole graph Need “total network effect” 2 Source: Wikimedia
  • 14.
    How do theysolve it? Interested in estimating 1 N N i=1 [Yi (all treated) − Yi (all controls)] “At a high level, graph cluster randomization is a technique in which the graph is partitioned into a set of clusters, and then randomization between treatment and control is performed at the cluster level.” Where can we find clusters? Observable information (e.g. same school) Unobservable information (“social space”) 3
  • 15.
    Some context: (im)migration Wantto know how regime change affects population. Politicians during election years care about direct effects. 4 Source: http://openscience.alpine-geckos.at/courses/social-network- analyses/empirical-network-analysis/
  • 16.
    Some more context Studyingtram traffic in Vienna 5 Source: kurier.at
  • 17.
    And one more Studyingtaxi rides in Porto 442 taxis 1.7 million rides with (x, y) coordinates at 15 second intervals. 6 Source: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 18(14), 1-45.
  • 18.
    And one more Studyingtaxi rides in Porto Project into a 100 dimensional latent space. Learn hidden interpretable patterns... 7 Source: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 18(14), 1-45.
  • 19.
    Relational data: commonexamples and goals Changes in exports from year to year −0.30 −0.20 −0.10 −0.4−0.20.00.20.4 first eigenvector of R^ row secondeigenvectorofR^ row Australia Austria Brazil Canada China China, Hong Kong SAR Finland France Germany GreeceIndonesia Ireland Italy Japan Malaysia Mexico Netherlands New Zealand Norway Rep. of Korea Spain Switzerland Thailand Turkey United Kingdom USA −0.25 −0.15 −0.05 0.05 −0.3−0.10.10.3 first eigenvector of R^ col secondeigenvectorofR^ col Australia Austria Brazil Canada China China, Hong Kong SAR Finland France Germany Greece Indonesia Ireland Italy Japan Malaysia Mexico Netherlands New ZealandNorway Rep. of Korea Spain Switzerland Thailand Turkey United Kingdom USA Network regression problems yij = xij β + ij frequently assume independence of the ij 8
  • 20.
    Estimating β innetwork regression −0.30 −0.20 −0.10 −0.4−0.20.00.20.4 first eigenvector of R^ row secondeigenvectorofR^ row Australia Austria Brazil Canada China China, Hong Kong SAR Finland France Germany GreeceIndonesia Ireland Italy Japan Malaysia Mexico Netherlands New Zealand Norway Rep. of Korea Spain Switzerland Thailand Turkey United Kingdom USA −0.25 −0.15 −0.05 0.05 −0.3−0.10.10.3 first eigenvector of R^ col secondeigenvectorofR^ col Australia Austria Brazil Canada China China, Hong Kong SAR Finland France Germany Greece Indonesia Ireland Italy Japan Malaysia Mexico Netherlands New ZealandNorway Rep. of Korea Spain Switzerland Thailand Turkey United Kingdom USA For Y =< X, β > +E we have OLS (assume no dependence among ij ): ˆβ(ols) = (mat(X)t mat(X))−1 mat(X)t vec(Y ) Oracle GLS (assume dependence among ij ): ˆβ(gls) = (mat(X)t (Σ−1 )mat(X))−1 mat(X)t (Σ−1 )vec(Y ) 9
  • 21.
    Network models The data Thereare n actors/nodes labeled 1, . . . , n Y is a sociomatrix: yij is a dyadic relationship between node i and node j. yii frequently undefined. Covariates: node specific: xi dyad specific: xij
  • 22.
    Social relations model Goal:describe the variability in Y . Sender effects describe sociability. Receiver effects describe popularity. Capture this in the Social Relations Model (SRM) yij = ai + bj + ij Almost an ANOVA — want to relate ai to bi since the senders/receivers are from the same set.
  • 23.
    Social relations model yij=µ + ai + bj + ij (ai , bi ) iid ∼N(0, Σab) ( ij , ji ) iid ∼N(0, Σe) Σab = σ2 a σab σab σ2 b describes sender/receiver variability and within person similarity. Σe = σ2 1 ρ ρ 1 describes within dyad correlation. 12
  • 24.
    Variability var(yij ) =σ2 a+ 2σab + σ2 b + σ2 cov(yij , yik) =σ2 a cov(yij , ukj ) =σ2 b cov(yij , yjk) =σab cov(yij , yji ) =2σab + ρσ2 How hard is it to fit this model? fit_SRM <- ame(Y) 13
  • 25.
    Pictures that popup These help capture how well the Markov Chain is mixing and goodness of fit information. 14 Source: Hoff (2015). arXiv:1506.08237
  • 26.
    Goodness of fit Posteriorpredictive distributions. sd.rowmean: standard deviation of row means of Y . sd.colmean: standard deviation of column means of Y . dyad.dep: correlation between vectorized Y and vectorized Y t triad.dep: i jk eij ejkeki #triangle on n nodes Var(vec(Y ))3/2 15 Source: Hoff (2015). arXiv:1506.08237
  • 27.
    Incorporating covariates Imagine youhave some covariates and want to fit yij = βt d xd,ij + βt r xr,i + βt cxc,j + ai + bj + ij xd,ij are dyad specific covariates. xr,i are row (sender) covariates. xc,i are column (receiver) covariates. Frequently xr,i = xc,i = xi When does this not make sense? (Example: popularity is affected by athletic success, but sociability is not) How hard is it to fit this model? fit_SRRM <- ame(Y, Xd=Xd,Xr=Xr,Xc=Xc) 16
  • 28.
    Parsing the input fit_SRRM<- ame(Y, Xdyad=Xd, #n x n x pd array of covariates Xrow=Xr, #n x pr matrix of nodal row covariates Xcol=Xc #n x pc matrix of nodal column covariates ) Xri,p is the value of the pth row covariate for node i. Xdi,j,p is the value of the pth dyadic covariate in the direction of i to j.
  • 29.
    Back to basics Canyou get rid of the dependencies in the model? fit_rm<-ame(Y,Xd=Xd,Xr=Xn,Xc=Xn, rvar=FALSE, #should you fit row random effects? cvar=FALSE, #should you fit column random effects? dcor=FALSE #should you fit a dyadic correlation? ) Note that summary will output: Variance parameters: pmean psd va 0.000 0.000 cab 0.000 0.000 vb 0.000 0.000 rho 0.000 0.000 ve 0.229 0.011 18
  • 30.
    So what’s missinghere? We have a lot of left over variability. Common themes in network analysis: Homophily: similar people connect to each other Stochastic equivalence: similar people act similarly 19
  • 31.
    Which is which? Source:Hoff (2008). NIPS
  • 32.
    Which is which? Left:homophily; Right: stochastic equivalence What are good models for this? Source: Hoff (2008). NIPS
  • 33.
    Introducing multiplicative effects SR(R)Mcan represent second-order dependencies very well. Has a hard time capturing “triadic” behavior. Homophily: create dyadic covariates xd,ij = xi xj Generally this can be represented by xt ri Bxj,i = k l bkl xr,ikxc,jl This is linear in the covariates and so can be baked into the amen framework. Sometimes there is excess correlation to account. This suggests a multiplicative effects model: yij = βt d xd,ij + βt r xr,i + βt cxc,j + ai + bj + ut i vj + ij 21
  • 34.
    Fitting these modelsand beyond fit_ame2<-ame(Y,Xd,Xn,Xn, R=2 #dimension of the multiplicative effect ) 22 Source: Hoff (2015). arXiv:1506.08237
  • 35.
    What happened here? Whydo multiplicative effects help triadic behavior? Triadic measure is related to transitivity (at least for binary data). Turns out homophily can capture transitivity... yij = βt d xd,ij + βt r xr,i + βt cxc,j + ai + bj + ut i vj + ij ui is information about the sender, vj is information about the receiver if ui ≈ vj then ut i vj > 0... if ui ≈ uj then there is some stochastic equivalence...
  • 36.
    Lets generalize: ordinalmodels Imagine a binary (probit) model: yij = 1zij >0 zij = µ + ai + bj + ij Looks like the SRM on the latent scale. fit_SRM<-ame(Y, model="bin" #lots of model options here ) If we go to the iid set up this is just an Erdos-Renyi model: fit_SRG<-ame(Y,model="bin", rvar=FALSE,cvar=FALSE,dcor=FALSE) 24
  • 37.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) 25
  • 38.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) ui are latent factors describing i as a sender
  • 39.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) ui are latent factors describing i as a sender vj are latent factors describing j as a receiver
  • 40.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) ui are latent factors describing i as a sender vj are latent factors describing j as a receiver D is a matrix of factor weights
  • 41.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) ui are latent factors describing i as a sender vj are latent factors describing j as a receiver D is a matrix of factor weights g is an increasing function mapping the latent space to the observed space.
  • 42.
    Even more general Considerthe following generative model: zij = ut i Dvj + ij yij = g(zij ) ui are latent factors describing i as a sender vj are latent factors describing j as a receiver D is a matrix of factor weights g is an increasing function mapping the latent space to the observed space. (Some gs... Normal: g(z) = z, binomial: g(z) = 1z≥0) 25
  • 43.
    This works forsymmetric matrices too! Imagine that yij = yji then the model looks like: zij = ui Λuj + ij yij = g(zij ) 26
  • 44.
    This works forsymmetric matrices too! Imagine that yij = yji then the model looks like: zij = ui Λuj + ij yij = g(zij ) ui ≈ uj represents stochastic equivalence
  • 45.
    This works forsymmetric matrices too! Imagine that yij = yji then the model looks like: zij = ui Λuj + ij yij = g(zij ) ui ≈ uj represents stochastic equivalence Λ is a matrix of eigenvalues:
  • 46.
    This works forsymmetric matrices too! Imagine that yij = yji then the model looks like: zij = ui Λuj + ij yij = g(zij ) ui ≈ uj represents stochastic equivalence Λ is a matrix of eigenvalues: positive λi imply homophily, negative ones imply heterophily. 26
  • 47.
    What is thislatent space? Problem 1: need to select a dimension R. 27
  • 48.
    What is thislatent space? Problem 1: need to select a dimension R. This is hard... sometimes there is some intuition. 27
  • 49.
    What is thislatent space? Problem 1: need to select a dimension R. This is hard... sometimes there is some intuition. Problem 2: should the latent positions be interpreted? 27
  • 50.
    What is thislatent space? Problem 1: need to select a dimension R. This is hard... sometimes there is some intuition. Problem 2: should the latent positions be interpreted? Unclear — maybe think of the distances in this space... 27
  • 51.
    What is thislatent space? Problem 1: need to select a dimension R. This is hard... sometimes there is some intuition. Problem 2: should the latent positions be interpreted? Unclear — maybe think of the distances in this space... Problem 3: what about my favorite other models like stochastic blockmodels?
  • 52.
    What is thislatent space? Problem 1: need to select a dimension R. This is hard... sometimes there is some intuition. Problem 2: should the latent positions be interpreted? Unclear — maybe think of the distances in this space... Problem 3: what about my favorite other models like stochastic blockmodels? These are just a subclass of models! For example, the stochastic blockmodel has discrete support for the latent positions.
  • 53.
    What is thislatent space? All quotes from Hoff, et al 2002 A subset of individuals in the population with a large number of social ties between them may be indicative of a group of individuals who have nearby positions in this space of characteristics, or social space. Various concepts of social space have been discussed by McFarland and Brown (1973) and Faust (1988). In the context of this article, social space refers to a space of unobserved latent characteristics that represent potential transitive tendencies in network relations. A probability measure over these unobserved characteristics induces a model in which the presence of a tie between two individuals is dependent on the presence of other ties.
  • 54.
    (Tiny portion ofthe) literature Nowicki, Krzysztof, and Tom A. B. Snijders. ”Estimation and prediction for stochastic blockstructures.” Journal of the American Statistical Association 96, no. 455 (2001): 1077-1087. Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock. ”Latent space approaches to social network analysis.” Journal of the american Statistical association 97, no. 460 (2002): 1090-1098. Hoff, Peter. ”Modeling homophily and stochastic equivalence in symmetric relational data.” In Advances in Neural Information Processing Systems, pp. 657-664. 2008. Airoldi, Edoardo M., David M. Blei, Stephen E. Fienberg, and Eric P. Xing. ”Mixed membership stochastic blockmodels.” Journal of Machine Learning Research 9, no. Sep (2008): 1981-2014. Hoff, Peter, Bailey Fosdick, Alex Volfovsky, and Katherine Stovel. ”Likelihoods for fixed rank nomination networks.” Network Science 1, no. 03 (2013): 253-277. Hoff, Peter D. ”Dyadic data analysis with amen.” arXiv preprint arXiv:1506.08237 (2015).
  • 55.
    ame(Y, Xdyad=NULL, Xrow=NULL,Xcol=NULL, rvar = !(model=="rrl") , cvar = TRUE, dcor = !symmetric, nvar = TRUE, R = 0, model="nrm", intercept=!is.element(model,c("rrl","ord")), symmetric=FALSE, odmax=rep(max(apply(Y>0,1,sum,na.rm=TRUE)),nrow(Y)), ...) Y: an n x n square relational matrix of relations. Xdyad: an n x n x pd array of covariates Xrow: an n x pr matrix of nodal row covariates Xcol: an n x pc matrix of nodal column covariates rvar: logical: fit row random effects (asymmetric case)? cvar: logical: fit column random effects (asymmetric case)? dcor: logical: fit a dyadic correlation (asymmetric case)? nvar: logical: fit nodal random effects (symmetric case)? R: int: dimension of the multiplicative effects (can be 0) model: char: one of "nrm","bin","ord","cbin","frn","rrl" odmax: a scalar integer or vector of length n giving the maximum number of nominations that each node may make
  • 56.
    What’s in the...? seed = 1, nscan = 10000, burn = 500, odens = 25, plot=TRUE, print = TRUE, gof=TRUE seed: random seed nscan: number of iterations of the Markov chain (beyond burn-in) burn: burn in for the Markov chain odens: output density for the Markov chain plot: logical: plot results while running? print: logical: print results while running? gof: logical: calculate goodness of fit statistics?
  • 57.
  • 58.
    Social network data
    Datasets: PROSPER, NSCR, AddHealth
    Relate network characteristics to individual-level behavior
    Literature: ERGM, latent variable models
    Assumptions:
      Data is fully observed
      The support is the set of all sociomatrices
    In practice:
      Ranked data
      Censored observations
    [Figure: proportion (0.00 to 0.20) plotted against the values 9, 10, 11, 12.]
    A type of likelihood that accommodates the ranked and censored nature of data from Fixed Rank Nomination (FRN) surveys and allows for estimation of regression effects.
  • 64.
    Data collection examples
    PROmoting School-Community-University Partnerships to Enhance Resilience (PROSPER): “Who are your best and closest friends in your grade?”
    National Longitudinal Study of Adolescent to Adult Health (AddHealth): “Your male friends. List your closest male friends. List your best male friend first, then your next best friend, and so on.”
  • 65.
    Notation
    $Z = \{z_{ij} : i \neq j\}$ is a sociomatrix of ordinal relationships
    $z_{ij} > z_{ik}$ denotes person $i$ preferring person $j$ to person $k$
    $$Z = \begin{pmatrix} - & z_{12} & \cdots & z_{1n} \\ z_{21} & - & & \vdots \\ \vdots & & \ddots & \\ z_{n1} & \cdots & & - \end{pmatrix}$$
    Instead of $Z$ we observe a sociomatrix $Y = \{y_{ij} : i \neq j\}$
    Different sampling schemes define different maps between $Y$ and $Z$ (set relations between $y_{ij}$ and $z_{ij}$).
    Statistical model $\{p(Z \mid \theta) : \theta \in \Theta\}$ assists in analysis
  • 70.
    Fixed rank nominations
    Set relations between $y_{ij}$ and $z_{ij}$ (the brace on the slide marks the first three as defining $F(Y)$):
      $y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
      $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$
      $y_{ij} > 0 \Rightarrow z_{ij} > 0$
      $y_{ij} = 0 \Rightarrow z_{ij} < 0$ (not part of $F(Y)$)
    $m$ = maximal number of nominations, $d_i$ = individual outdegree
    Differentiates between different ranks
    Captures censoring in the data
    Example, $d_i = 4 < m$:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  4  3  2  1  0  0  0  0  0  0
      z_i:  $z_{i1} > z_{i2} > z_{i3} > z_{i4} > 0 > z_{ij}$ for each unranked $j = 5, \dots, 10$
    Example, $d_i = 5 = m$:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  5  4  3  2  1  0  0  0  0  0
      z_i:  $z_{i1} > z_{i2} > z_{i3} > z_{i4} > z_{i5} > z_{ij}$ for each unranked $j = 6, \dots, 10$, whose sign is not determined ("?")
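The three relations that define F(Y) can be checked one sender at a time. The sketch below is a minimal Python illustration, not code from the talk or from the amen package. The function name `in_frn_set` is hypothetical, and each row is assumed to hold only the entries for the other n - 1 people (the diagonal is left out).

```python
def in_frn_set(y_row, z_row, m):
    """Return True if z_row satisfies the three F(Y) relations for one sender.

    y_row: observed scores (0 = not nominated, larger = preferred).
    z_row: latent values, same length as y_row.
    m:     maximal number of nominations allowed by the survey.
    """
    n = len(y_row)
    d_i = sum(1 for y in y_row if y > 0)  # observed outdegree of the sender
    for j in range(n):
        # Nominated friends have positive latent values.
        if y_row[j] > 0 and not z_row[j] > 0:
            return False
        # Uncensored sender: anyone left unranked has a non-positive value.
        if y_row[j] == 0 and d_i < m and not z_row[j] <= 0:
            return False
        # A higher score implies a strictly larger latent value.
        for k in range(n):
            if y_row[j] > y_row[k] and not z_row[j] > z_row[k]:
                return False
    return True


# The two rows from the slide, with m = 5 assumed for illustration.
y_uncensored = [4, 3, 2, 1, 0, 0, 0, 0, 0, 0]
y_censored = [5, 4, 3, 2, 1, 0, 0, 0, 0, 0]
z_a = [2.0, 1.5, 1.0, 0.5, -0.1, -0.2, -0.3, -1.0, -2.0, -3.0]
z_b = [2.5, 2.0, 1.5, 1.0, 0.5, 0.2, -0.2, -0.3, -1.0, -2.0]
print(in_frn_set(y_uncensored, z_a, 5))  # True
print(in_frn_set(y_censored, z_b, 5))    # True: a positive unranked value is allowed when d_i = m
```

In the censored row the sixth entry is positive yet unranked, which the relations permit only because the sender already used all m nominations.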
  • 77.
    Rank
    Set relations between $y_{ij}$ and $z_{ij}$ (the brace on the slide marks only the first as defining $R(Y)$):
      $y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
      $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$ (not part of $R(Y)$)
      $y_{ij} > 0 \Rightarrow z_{ij} > 0$ (not part of $R(Y)$)
      $y_{ij} = 0 \Rightarrow z_{ij} < 0$ (not part of $R(Y)$)
    Valid but not fully informative: $F(Y) \subset R(Y)$
    Cannot estimate row (“sender”) specific effects
    Example with four nominations:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  4  3  2  1  0  0  0  0  0  0
      z_i:  $z_{i1} > z_{i2} > z_{i3} > z_{i4} > z_{ij}$ for each unranked $j = 5, \dots, 10$, whose sign is not determined ("?")
    Example with five nominations:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  5  4  3  2  1  0  0  0  0  0
      z_i:  $z_{i1} > z_{i2} > z_{i3} > z_{i4} > z_{i5} > z_{ij}$ for each unranked $j = 6, \dots, 10$, whose sign is not determined ("?")
  • 85.
    Binary
    Set relations between $y_{ij}$ and $z_{ij}$ (the brace on the slide marks the last two as defining $B(Y)$):
      $y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$ (not part of $B(Y)$)
      $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$ (not part of $B(Y)$)
      $y_{ij} > 0 \Rightarrow z_{ij} > 0$
      $y_{ij} = 0 \Rightarrow z_{ij} < 0$
    Neither fully informative nor valid!
      Discards information on the ranks
      Ignores the censoring on the outdegrees
      In particular: $F(Y) \not\subset B(Y)$
    Example with four nominations:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  4  3  2  1  0  0  0  0  0  0
      z_i:  $z_{ij} > 0$ for $j = 1, \dots, 4$ and $z_{ij} < 0$ for $j = 5, \dots, 10$
    Example with five nominations:
      j:    1  2  3  4  5  6  7  8  9  10
      y_i:  5  4  3  2  1  0  0  0  0  0
      z_i:  $z_{ij} > 0$ for $j = 1, \dots, 5$ and $z_{ij} < 0$ for $j = 6, \dots, 10$
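To make the two criticisms of the binary relations concrete, the sketch below checks only the sign conditions that define B(Y). It is an illustrative Python fragment with a hypothetical function name (`in_binary_set`), not part of any package. Each row is assumed to exclude the diagonal entry.

```python
def in_binary_set(y_row, z_row):
    """Return True if z_row satisfies the two B(Y) relations for one sender:
    nominated (y > 0) means z > 0, not nominated (y == 0) means z < 0."""
    return all((z > 0) if y > 0 else (z < 0) for y, z in zip(y_row, z_row))


y = [5, 4, 3, 2, 1, 0, 0, 0, 0, 0]
ordered = [2.5, 2.0, 1.5, 1.0, 0.5, -0.1, -0.2, -0.3, -1.0, -2.0]
scrambled = [0.5, 1.0, 1.5, 2.0, 2.5, -0.1, -0.2, -0.3, -1.0, -2.0]
censored = [2.5, 2.0, 1.5, 1.0, 0.5, 0.2, -0.2, -0.3, -1.0, -2.0]

print(in_binary_set(y, ordered))    # True
print(in_binary_set(y, scrambled))  # True: the order of the ranked friends is ignored
print(in_binary_set(y, censored))   # False: a positive unranked value is ruled out
```

The third row is the interesting one. When the sender has used all m nominations, the fixed rank nomination relations only require an unranked value to lie below the ranked ones, so a small positive value is possible there, while the binary relations exclude it.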
  • 92.
    Bayesian Estimation for Fixed Rank Nominations
    Model: $Z \sim p(Z \mid \theta)$, $\theta \in \Theta$
    Data: $Z \in F(Y)$
    Likelihood: $L_F(\theta : Y) = \Pr(Z \in F(Y) \mid \theta) = \int_{F(Y)} dP(Z \mid \theta)$
    Estimation: Given $p(\theta)$, $p(\theta \mid Z \in F(Y))$ can be approximated by a Gibbs sampler.
    Simulate $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij}, Z \in F(Y))$:
      1. $y_{ij} > 0$: $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij}) \, 1_{z_{ij} \in (a, b)}$ where $a = \max(z_{ik} : y_{ik} < y_{ij})$ and $b = \min(z_{ik} : y_{ik} > y_{ij})$.
      2. $y_{ij} = 0$ and $d_i < m$: $z_{ij} \sim p(z_{ij} \mid Z_{-ij}, \theta) \, 1_{z_{ij} \le 0}$.
      3. $y_{ij} = 0$ and $d_i = m$: $z_{ij} \sim p(z_{ij} \mid Z_{-ij}, \theta) \, 1_{z_{ij} \le \min(z_{ik} : y_{ik} > 0)}$.
    Allows for imputation of missing $y_{ij}$
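The three cases of the z_ij update amount to computing an interval (a, b) and drawing from the full conditional restricted to it. The sketch below shows one way this could look in Python, under two assumptions that are not on the slide: the full conditional p(z_ij | θ, Z_-ij) is taken to be normal with a known mean and standard deviation, and in case 1 the lower bound is floored at 0 so that a nominated friend keeps z_ij > 0. The function names are hypothetical and the rows exclude the diagonal entry.

```python
import math
import random
from statistics import NormalDist


def frn_bounds(y_row, z_row, j, m):
    """Interval (a, b) allowed for z_ij by the FRN relations, other entries fixed."""
    others = [k for k in range(len(y_row)) if k != j]
    d_i = sum(1 for y in y_row if y > 0)  # observed outdegree of the sender
    if y_row[j] > 0:
        # Case 1: between the entries scored below and above j.
        below = [z_row[k] for k in others if y_row[k] < y_row[j]]
        above = [z_row[k] for k in others if y_row[k] > y_row[j]]
        # The 0.0 floor is an added assumption that keeps z_ij > 0.
        return max(below + [0.0]), min(above + [math.inf])
    if d_i < m:
        # Case 2: uncensored sender, an unranked z_ij is at most 0.
        return -math.inf, 0.0
    # Case 3: censored sender, an unranked z_ij lies below every ranked one.
    ranked = [z_row[k] for k in others if y_row[k] > 0]
    return -math.inf, min(ranked, default=math.inf)


def sample_truncated_normal(mean, sd, a, b, rng):
    """Inverse-CDF draw from a normal(mean, sd) restricted to (a, b)."""
    dist = NormalDist(mean, sd)
    lo, hi = dist.cdf(a), dist.cdf(b)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return dist.inv_cdf(u)


rng = random.Random(0)
y_row = [4, 3, 2, 1, 0, 0]
z_row = [2.0, 1.5, 1.0, 0.5, -0.3, -1.0]
a, b = frn_bounds(y_row, z_row, 1, 5)  # second-ranked friend: (1.0, 2.0)
z_row[1] = sample_truncated_normal(0.8, 1.0, a, b, rng)
```

A full sampler would cycle such draws over all pairs (i, j) and alternate them with updates of θ, which is omitted here. The inverse-CDF draw can lose accuracy when (a, b) lies far in a tail of the normal distribution.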
  • 98.
    Simulations
    We generated $Z$ from the following Social Relations Model (Warner, Kenny and Stoto (1979)):
      $z_{ij} = \beta^T x_{ij} + a_i + b_j + \epsilon_{ij}$
      $\begin{pmatrix} a_i \\ b_i \end{pmatrix} \overset{iid}{\sim} \text{normal}\left(0, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right)$
      $\begin{pmatrix} \epsilon_{ij} \\ \epsilon_{ji} \end{pmatrix} \overset{iid}{\sim} \text{normal}\left(0, \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}\right)$
    Mean model: $\beta^T x_{ij} = \beta_0 + \beta_r x_{ir} + \beta_c x_{jc} + \beta_{d1} x_{ij1} + \beta_{d2} x_{ij2}$
      $x_{ir}$, $x_{jc}$: individual level variables
      $x_{ij1}$: pair specific variable
      $x_{ij2}$: co-membership in a group
    $\beta_r = \beta_c = \beta_{d1} = \beta_{d2} = 1$ and $\beta_0 = -3.26$
    $x_{ir}, x_{ic}, x_{ij1} \overset{iid}{\sim} N(0, 1)$
    $x_{ij2} = s_i s_j / .42$ for $s_i \overset{iid}{\sim} \text{binary}(1/2)$
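A rough Python sketch of this data-generating process is given below, using only the standard library. Several details are assumptions made for illustration rather than taken from the slides: the pair-specific covariate x_ij1 is drawn separately for each ordered pair, s_i takes values in {0, 1}, and observed scores are assigned so that the top-ranked friend gets the largest score (only the ordering of the scores matters for the relations). The function names are hypothetical.

```python
import math
import random


def correlated_pair(rho, rng):
    """Two standard normal draws with correlation rho."""
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return u, rho * u + math.sqrt(1.0 - rho * rho) * v


def simulate_srm(n, rng, beta0=-3.26):
    """Draw an n x n latent matrix Z with all slope coefficients equal to 1."""
    ab = [correlated_pair(0.5, rng) for _ in range(n)]  # (a_i, b_i)
    x_r = [rng.gauss(0.0, 1.0) for _ in range(n)]       # row covariate
    x_c = [rng.gauss(0.0, 1.0) for _ in range(n)]       # column covariate
    s = [rng.randint(0, 1) for _ in range(n)]           # group membership
    Z = [[0.0] * n for _ in range(n)]                   # diagonal is unused
    for i in range(n):
        for j in range(i + 1, n):
            e_ij, e_ji = correlated_pair(0.9, rng)      # dyadic errors
            x2 = s[i] * s[j] / 0.42                     # co-membership
            x1_ij, x1_ji = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            Z[i][j] = beta0 + x_r[i] + x_c[j] + x1_ij + x2 + ab[i][0] + ab[j][1] + e_ij
            Z[j][i] = beta0 + x_r[j] + x_c[i] + x1_ji + x2 + ab[j][0] + ab[i][1] + e_ji
    return Z


def frn_censor(Z, m):
    """Observed scores: each sender ranks at most m of the people with z > 0."""
    n = len(Z)
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        positive = [j for j in range(n) if j != i and Z[i][j] > 0]
        ranked = sorted(positive, key=lambda j: Z[i][j], reverse=True)[:m]
        for position, j in enumerate(ranked):
            Y[i][j] = len(ranked) - position  # best friend gets the top score
    return Y


rng = random.Random(1)
Z = simulate_srm(30, rng)
Y = frn_censor(Z, 5)
```

Senders with more than m positive latent values are the censored ones: some of their unranked entries have z > 0 even though y = 0.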
  • 101.
    Simulations - Censoring
    8 simulations for each m ∈ {5, 15} with 100 nodes each
    [Figure: interval estimates for βr, βc, βd1 and βd2 (one panel each) across simulations 1 to 8, shown separately for m = 5 and m = 15.]
    Confidence intervals under the three different likelihoods for column and an iid dyadic variable. The groups of three CIs are based on binary, FRN and rank likelihoods from left to right.
  • 102.
    Simulations - Censoring
    [Figure: interval estimates for βr, βc and βd1 across simulations 1 to 8, shown separately for m = 5 and m = 15.]
    Rank likelihood cannot estimate row effects
      $Z \in R(Y) \iff Z + c\mathbf{1}^T \in R(Y) \;\; \forall c \in \mathbb{R}^n$
    Binary likelihood poorly estimates row effects
      Large amount of censoring
      ⇒ Heterogeneity of censored outdegrees is low
      ⇒ Regression coefficients estimated too low
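The invariance that prevents the rank likelihood from identifying row effects can be checked numerically. The sketch below is illustrative Python with hypothetical function names: it tests the single relation that defines R(Y) and then adds a different constant to every row of Z, which is what adding c1^T does. Diagonal entries are ignored.

```python
def in_rank_set(Y, Z):
    """True if y_ij > y_ik implies z_ij > z_ik for every sender i."""
    n = len(Y)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i == j or i == k:
                    continue
                if Y[i][j] > Y[i][k] and not Z[i][j] > Z[i][k]:
                    return False
    return True


def shift_rows(Z, c):
    """Add c[i] to every entry of row i, i.e. form Z + c 1^T."""
    return [[z + c[i] for z in row] for i, row in enumerate(Z)]


Y = [[0, 2, 1], [1, 0, 0], [0, 1, 0]]
Z = [[0.0, 1.0, 0.3], [0.4, 0.0, -0.2], [-1.0, 0.5, 0.0]]
print(in_rank_set(Y, Z))                                  # True
print(in_rank_set(Y, shift_rows(Z, [-5.0, 3.0, 10.0])))   # True
```

Because any row shift leaves membership in R(Y) unchanged, the data carry no information under this likelihood about terms that are constant within a row, such as a_i or a row covariate effect.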
  • 107.
    Simulations - Censoring
    [Figure: interval estimates for βd1 and βd2 across simulations 1 to 8, shown separately for m = 5 and m = 15.]
    Recall: $x_{ij2} \propto s_i s_j$, an indicator of co-membership in a group
    Ignore the censoring
    ⇒ Binary likelihood underestimates row variability
    ⇒ Underestimate the variability in $x_{ij2}$
  • 111.
    Simulations - information in the ranks
    Let $C(Y)$ be the set of values for which the following is true:
      $y_{ij} > 0 \Rightarrow z_{ij} > 0$
      $y_{ij} = 0$ and $d_i < m \Rightarrow z_{ij} \le 0$
      $\min\{z_{ij} : y_{ij} > 0\} \ge \max\{z_{ij} : y_{ij} = 0\}$
    We refer to $L_C(\theta : Y) = \Pr(Z \in C(Y) \mid \theta)$ as the censored binary likelihood.
    Recognizes censoring but ignores information in the ranks
    Performs similarly to FRN in the previous study
    Less precise than FRN when m is big
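The censored binary set can be checked in the same row-by-row way as the other sets. The Python sketch below is illustrative only: the function name is hypothetical, the min/max condition is assumed to apply within each sender's row, and rows exclude the diagonal entry.

```python
def in_censored_binary_set(y_row, z_row, m):
    """True if z_row satisfies the three C(Y) relations for one sender."""
    ranked = [z for y, z in zip(y_row, z_row) if y > 0]
    unranked = [z for y, z in zip(y_row, z_row) if y == 0]
    d_i = len(ranked)  # observed outdegree
    # Nominated friends have positive latent values.
    if any(z <= 0 for z in ranked):
        return False
    # Uncensored sender: unranked values are at most 0.
    if d_i < m and any(z > 0 for z in unranked):
        return False
    # Every ranked value is at least as large as every unranked value.
    if ranked and unranked and min(ranked) < max(unranked):
        return False
    return True


y = [5, 4, 3, 2, 1, 0, 0]
print(in_censored_binary_set(y, [1.0, 2.0, 3.0, 4.0, 5.0, 0.5, -1.0], 5))  # True
print(in_censored_binary_set(y, [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, -1.0], 5))  # False
```

In the first call the ranked values are in the opposite order to the scores and the row is still accepted, which is the sense in which this set ignores the information in the ranks while still allowing a positive unranked value for a censored sender.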
  • 113.
    Simulations - information in the ranks
    Same setup as before, but average uncensored outdegree is m
    [Figure: relative concentration around true value (vertical axis, 0.2 to 1.4) against m (horizontal axis, 10 to 50), one curve each for βr (row), βc (column), βd1 (continuous dyad) and βd2 (co-membership). Caption: Posterior concentration around true parameter values. The average of $E[(\beta - \beta^*)^2 \mid F(S)] / E[(\beta - \beta^*)^2 \mid C(S)]$ across eight simulated datasets for each m ∈ {5, 15, 30, 50}.]
    Relative concentration around true value of each parameter:
      Measured by $E[(\beta - 1)^2 \mid F(Y)] / E[(\beta - 1)^2 \mid C(Y)]$ for each β
    When $m \ll n$, most of the information is found by considering ranked/unranked individuals as groups rather than the relative ordering of the ranked individuals.
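Given posterior draws of a coefficient from two separate fits, the ratio on this slide can be approximated by replacing each posterior expectation with an average over draws. The Python sketch below assumes such draws are already available as plain lists. The function names and the toy numbers are hypothetical, not results from the talk.

```python
def mean_squared_deviation(draws, true_value):
    """Monte Carlo estimate of E[(beta - true_value)^2] from posterior draws."""
    return sum((b - true_value) ** 2 for b in draws) / len(draws)


def relative_concentration(frn_draws, censored_binary_draws, true_value=1.0):
    """Ratio of mean squared deviations, FRN fit over censored binary fit."""
    numerator = mean_squared_deviation(frn_draws, true_value)
    denominator = mean_squared_deviation(censored_binary_draws, true_value)
    return numerator / denominator


# Toy draws for illustration only.
frn_draws = [0.9, 1.1, 1.0, 0.95]
cb_draws = [0.8, 1.2, 1.0, 0.9]
ratio = relative_concentration(frn_draws, cb_draws)
```

A ratio below 1 means the draws from the FRN fit sit closer to the true value than those from the censored binary fit. A ratio near 1 means the ranks added little.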
  • 115.
    AddHealth Data - Results
    [Figure: interval estimates of β in six panels: intercept; rsmoke, rdrink, rgpa; csmoke, cdrink, cgpa; dsmoke, ddrink, dgpa; dacad, darts, dsport, dcivic; dgrade, drace.]
    646 females were asked to rank up to 5 female friends
    Mean model with row, column and dyadic effects for smoking, drinking and gpa as well as dyadic effects for co-membership in activities and grade, and a similarity-in-race measure.
    The CIs are based on binary, FRN and rank likelihoods.