adams – 1
Gathering Social Network Data
Freeman, Linton C. 2000. "See you in the Funny Papers: Cartoons and Social Networks." Connections 23(1):32-42.
Figure 15.
The remaining figures all deal with applications of various sorts. Figures 16,
17 and 18 deal with the issue of searching through a network. Figure 16 is
yet another Sally Forth strip by Greg Howard. It is concerned with the use
of social networks to find a job (Granovetter, 1974).
[Figure: example network among nodes A–H. Tie types: Like, Know, Exert Authority Over, Talk to, Pound. Tie valence: Positive, Negative.]
(a uni-modal, multiplex, directed, valenced network)
adams – 2
jimi adams
Associate Professor
University of Colorado Denver
Department of Health & Behavioral Sciences
DNAC Workshop on Social Networks & Health:
Gathering Social Network Data
adams – 3
This visualization is of the literature on PMTCT in
AIDS & JAIDS 1988-2008. See more information:
adams, jimi & Ryan Light. 2014. "Mapping
Interdisciplinary Fields: Efficiencies, Gaps &
Redundancies in HIV/AIDS Research." PLoS One
9(12):e115092
This session’s aims:
1. Network “Sampling”
2. Network Measurement
§ “The Boundary Specification Problem”
3. Platforms for Data Collection (overview)
4. A brief aside on Ethics in SNA
5. Some Assessments of SN Data Quality
Overview
adams – 4
Principles not Recipes
§ Apply to range of data collection strategies
§ NOT only for surveys
§ Qual & Quant data have most of the same
considerations outlined here
§ Passive (e.g., archival, observational, etc.)
approaches as well as active
§ Many differing opinions on solutions w/in
identified domains
§ common rules of thumb, not “best practices”
→ feel free to ask questions, e.g., “how would
this apply to ___ study?”
An Aside before we get going…
Field testing a network survey of religious
leaders near Rumphi, Malawi (2005).
adams – 5
Sampling Measurement Modes Ethics Assessment
§ When studying this network, how much of it
are we interested in capturing?
§ Dyadic
§ directly connected node pairs
§ Ego Network
§ focal node and all
directly connected alters
§ Sub-groups
§ Attribute based groups
§ Ego + n-steps
§ “link tracing” 1, 2, 3, n
§ “Complete”
§ Node-based boundaries +
ALL ties within
§ Which of these aims you’re after will help
determine how you gather network data, in
terms of:
§ Sampling
§ Measurement
Network “Sampling”
Butts CT, Acton RM, Marcum CS.
“Interorganizational Collaboration in the
Hurricane Katrina Response.”
Journal of Social Structure 2012;13(1).
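The levels of capture listed above (dyads, ego networks, ego + n-steps, "complete") can be illustrated with a minimal sketch. The edge list, node names, and helper functions below are hypothetical and not from the slides; ties are treated as undirected when tracing steps, which is an assumption of this example.

```python
from collections import deque

# Hypothetical tie list (illustrative names only).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("F", "G")]

def neighbors(edge_list):
    """Build an undirected adjacency map from a list of ties."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def n_step_nodes(edge_list, ego, n):
    """Return every node within n steps of ego (ego included), by breadth-first search."""
    adj = neighbors(edge_list)
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # do not trace beyond n steps
        for alter in adj.get(node, ()):
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return set(dist)

def induced_ties(edge_list, nodes):
    """Keep only ties whose two endpoints both fall inside the node set."""
    return [(u, v) for u, v in edge_list if u in nodes and v in nodes]

ego_net = n_step_nodes(edges, "A", 1)    # focal node + directly connected alters
two_step = n_step_nodes(edges, "A", 2)   # "link tracing" out to 2 steps
ego_ties = induced_ties(edges, ego_net)  # ties observed among those nodes
```

With n = 1 this returns the ego network; larger n approximates the "ego + n-steps" designs, and applying `induced_ties` to a boundary-defined node set corresponds to the "complete" case.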
adams – 6
§ Data – Respondent (ego) & the people they are
connected to (alters)
§ Key aims – # & characteristics of ego’s alters
§ e.g., GSS “Important Matters” Name Generator:
§ “From time to time, most people discuss important
matters with other people. Looking back over the last
six months -- who are the people with whom you
discussed matters important to you? Just tell me their
first names or initials.”
§ Popular Examples –
§ MDICP (fertility & HIV/AIDS “discussion partners”),
NHSLS (sexual partnering & social support),
Indianapolis Network Mental Health Study, many
many more
Collecting Network Data
1 - Local / Ego Network Designs
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
adams – 7
Coleman JS, Katz E, Menzel H. The
Diffusion of an Innovation Among
Physicians. Sociometry 1957;20(4):253.
Collecting Network Data
2 – “Complete” Network Designs
1. Identify “complete” population
boundary
2. Enumerate all relationships
within
§ How “complete” is all?
§ Popular Examples –
§ Add Health, Framingham
Heart Study, PROSPER,
CKM Physicians, increasing
number of others
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
adams – 8
Especially common for hard to
identify/find populations.
1. Start with a sample of “seeds.”
2. Follow Ego-Network Design
for eliciting SN data from
some sample of “seeds.”
3. Sample some proportion of
nominated alters to recruit as
next wave of respondents
§ Random Walks
§ Strong Ties
§ Census, etc…
4. Repeat
§ Popular Examples –
§ “Project 90”, lots of others
especially in disease tracing
Collecting Network Data
3 – Partial Network Designs
(Image Source:
https://www.math.umass.edu/~gile/kgilepurdue2012.pdf)
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
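The four steps of the partial (link-tracing) design can be sketched as a loop. This is a minimal illustration under stated assumptions, not the procedure of any named study: the `nominations` dictionary stands in for ego-network interviews, and the function name, the `fraction` parameter, and all person labels are hypothetical.

```python
import random

def link_trace_sample(nominations, seeds, waves, fraction, rng):
    """Sketch of a link-tracing design.

    nominations: dict mapping each person to the alters they would name
                 (a stand-in for an ego-network interview).
    seeds:       initial respondents.
    waves:       number of recruitment waves after the seeds.
    fraction:    share of nominated, not-yet-interviewed alters recruited
                 into the next wave (1.0 is a census of nominated alters).
    rng:         a random.Random instance, so that runs are reproducible.
    """
    interviewed = set()
    observed_ties = []
    current = list(seeds)
    for _ in range(waves + 1):
        nominated = set()
        for ego in current:
            if ego in interviewed:
                continue
            interviewed.add(ego)
            for alter in nominations.get(ego, []):
                observed_ties.append((ego, alter))
                nominated.add(alter)
        # Candidates for the next wave: nominated alters not yet interviewed.
        candidates = sorted(nominated - interviewed)
        k = round(fraction * len(candidates))
        current = rng.sample(candidates, k)
    return interviewed, observed_ties

# Hypothetical nomination data.
nominations = {"s1": ["a", "b"], "a": ["c"], "b": ["s1"], "c": ["d"], "d": []}
interviewed, ties = link_trace_sample(
    nominations, ["s1"], waves=2, fraction=1.0, rng=random.Random(0)
)
```

In this toy run "d" is nominated but never interviewed, which is the typical edge of a partial design: the last wave's alters are observed only through others' reports.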
adams – 9
1. Local/Ego Network Data (Population-based Sampling)
§ Data – Respondent & people they are connected to
§ Key aims – # & characteristics of respondents’ ties
3. Partial Network Data (Network-based Sampling)
§ Data – Some tracing to reach (contacts of) contacts
§ Key aims – Contact Tracing, especially for
Hidden/Unknown Population Studies
2. “Complete” Network Data (Census)
§ Data – All actors & ties within a boundary
§ Key aims – Define population, enumerate ties w/in
boundary (not |complete|)
Network “Sampling”
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
adams – 10
… 1987 GSS -- is the following (taken from the GSS codebook):
127. From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials. IF LESS THAN 5 NAMES MENTIONED, PROBE, Anyone else? ONLY RECORD FIRST 5 NAMES.
NAME1________________________________
NAME2________________________________
NAME3________________________________
NAME4________________________________
NAME5________________________________
The question was followed by this coding
scheme, turned into the GSS variable labeled
“Numgiven”:
128. INTERVIEWER CHECK: HOW MANY
NAMES WERE MENTIONED?
[answer] [code]
0 0
1 1
2 2
3 3
4 4
5 5
6+ 6
The GSS
“Personal Networks”
Name Generator
Marsden PV. 1987. Core Discussion Networks of Americans. American Sociological Review 52:122-131.
Network Measurement
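The interviewer check shown above maps the number of names mentioned onto the "Numgiven" codes 0 through 6, with 6 standing for "6+". A minimal sketch of that coding follows; the function names and the respondent counts are hypothetical.

```python
def code_numgiven(names_mentioned):
    """Code the interviewer check: 0-5 are recorded as-is, 6 or more as 6."""
    if names_mentioned < 0:
        raise ValueError("count of names cannot be negative")
    return min(names_mentioned, 6)

def isolation_rate(codes):
    """Share of respondents coded 0 (no discussion partners named)."""
    return sum(1 for c in codes if c == 0) / len(codes)

# Hypothetical counts of names mentioned by eight respondents.
raw_counts = [0, 3, 5, 9, 1, 0, 6, 2]
codes = [code_numgiven(n) for n in raw_counts]
share_isolated = isolation_rate(codes)
```

Because details are recorded for only the first 5 names and the count is top-coded at 6, anything beyond that is invisible in the data, which matters for the later discussion of caps.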
adams – 11
§ Age: size drops with age,
§ kin/nonkin varies by age
§ Education: Increases size (& nonkin)
§ Race/ethnicity: Whites have larger nets
§ Heterogeneity varies by race/ethnicity
§ Sex: Women have more kin
§ Size of Place: Urbanites cite more non-kin
“Important Matter”
Ego Networks in the GSS
Subgroup Size Differences
McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over
Two Decades. American Sociological Review 2006;71:353-375
adams – 12
[Figure: bar chart comparing 1985 and 2004 across network sizes 0, 1, 2, 3, 4, 5, 6+; y-axis ticks 0 to 30]
“Important Matter”
Ego Networks in the GSS
Estimating Isolation
McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over
Two Decades. American Sociological Review 2006;71:353-375
adams – 13
“Important Matter”
Ego Networks in the GSS
Isolation
McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over
Two Decades. American Sociological Review 2006;71:353-375
adams – 14
“Important Matter”
Ego Networks in the GSS
Isolation
McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over
Two Decades. American Sociological Review 2006;71:353-375
adams – 15
How large is an “average person’s social network”?
§ Pool & Kochen (1967) – Mathematical model
§ 500
§ Robin Dunbar (1992) – Model from primate interactions and
projecting to humans based on difference in neo-cortex size.
“Replicated” via a study of Christmas cards sent.
§ 150
§ Bernard & Killworth (2001) – Empirical inquiry via “Network
Scale-up Method”
§ Mean – 291; Median 231
§ Facebook “friends” (2015)
§ Mean – 338; median 200
Ego Network Data Collection
Dunbar’s Number
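As a rough illustration of the idea behind the “Network Scale-up Method” mentioned above, the sketch below uses a simple proportional estimator: if a respondent knows m people across subpopulations whose total size is e, and the whole population has size N, personal network size is estimated as N × m / e. This form is an assumption of the example rather than something stated on the slide, and all numbers are invented.

```python
def scale_up_network_size(known_counts, subpop_sizes, total_population):
    """Proportional scale-up estimate of one respondent's network size.

    known_counts:     how many people the respondent knows in each subpopulation.
    subpop_sizes:     the (externally known) size of each subpopulation.
    total_population: size of the overall population.
    """
    if len(known_counts) != len(subpop_sizes):
        raise ValueError("one known-count is needed per subpopulation")
    total_known = sum(known_counts)
    total_size = sum(subpop_sizes)
    return total_population * total_known / total_size

# Hypothetical respondent: knows 2, 1 and 3 people in three subpopulations.
estimate = scale_up_network_size([2, 1, 3], [20000, 10000, 30000], 1000000)
```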
adams – 16
Collecting Network Data
2 – “Complete” Network Designs
§ National Longitudinal Study of
Adolescent Health (Add Health)
§ 100+ schools, 90k+ students
§ Friendship
§ Full Roster
§ Up to 5 male & female friends
§ Followed up with relationship strength
questions
§ Romantic partners - last 18
months
§ Nang Rong Kinship Networks
§ Study of demographic &
environmental change in 51
villages
§ Full Population – household
kinship rosters & marriage
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
adams – 17
2 - Follow Name Generators w/ (set of) Name Interpreters
Collecting Network Data
1 - Local / Ego Network Designs
Image Source: Tom Valente
adams – 18
§ Age: size drops with age,
§ kin/nonkin varies by age
§ Education: Increases size (& nonkin)
§ Race/ethnicity: Whites have larger nets
§ Heterogeneity varies by race/ethnicity (IQV=Whites
.03, Blacks .13, Hispanics .22)
§ Sex: Women have more kin
§ Size of Place: Urbanites cite more non-kin
“Important Matter”
Ego Networks in the GSS
Alter Subgroup Differences
McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over
Two Decades. American Sociological Review 2006;71:353-375
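The heterogeneity figures above are reported as IQV (index of qualitative variation). A minimal sketch of one common form, IQV = K/(K−1) × (1 − Σ p_k²), follows; whether the cited study used exactly this form and which categories it used are assumptions here, and the alter lists are hypothetical.

```python
from collections import Counter

def iqv(categories, k=None):
    """Index of qualitative variation for a list of category labels.

    k is the number of possible categories; if omitted, the number of
    categories actually observed is used. Returns 0.0 when k < 2.
    """
    counts = Counter(categories)
    n = sum(counts.values())
    k = k if k is not None else len(counts)
    if n == 0 or k < 2:
        return 0.0
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return (k / (k - 1)) * (1 - sum_sq)

# Hypothetical alter attributes for two egos.
homogeneous = iqv(["w", "w", "w", "w"], k=3)  # all alters in one category
even_split = iqv(["a", "b"], k=2)             # alters split evenly over 2 categories
```

Values run from 0 (all alters in one category) to 1 (alters spread evenly across all K categories).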
adams – 19
3 – Ego Estimates Relationships Among Alters
Collecting Network Data
1 - Local / Ego Network Designs
Anglewicz, Philip, jimi adams, Francis Obare, Susan C. Watkins, & Hans-Peter Kohler. 2009. “The Malawi Diffusion & Ideational
Change Project 2004-06: Data collection, data quality & analyses of attrition.” Demographic Research 20(21): 503-540
adams – 20
Collecting Network Data
1 - Local / Ego Network Designs
Freeman LC. Visualizing Social Networks. Journal of
Social Structure 2000;1(1).
Hogan B, Carrasco JA, Wellman B. Visualizing Personal
Networks: Working with Participant-Aided Sociograms.
Field Methods 2007;19(2):116-144.
adams – 21
http://networkcanvas.com/
adams – 22
2 - Follow Name Generators w/ (set of) Name Interpreters
Collecting Network Data
1 - Local / Ego Network Designs
http://networkcanvas.com/
adams – 23
Collecting Network Data
Considerations
How many name generators to use / relations to gather?
Borgatti SP, Mehra A, Brass DJ, Labianca G. Network Analysis in the Social Sciences. Science 2009;323:892-895.
[Figure: Borgatti et al. 2009, Fig. 3. A typology of ties studied in social network analysis.]
Similarities: Location (e.g., same spatial and temporal space); Membership (e.g., same clubs, same events, etc.); Attribute (e.g., same gender, same attitude, etc.)
Social Relations: Kinship (e.g., mother of, sibling of); Other role (e.g., friend of, boss of, student of, competitor of); Affective (e.g., likes, hates, etc.); Cognitive (e.g., knows, knows about, sees as happy, etc.)
Interactions: e.g., sex with, talked to, advice to, helped, harmed, etc.
Flows: e.g., information, beliefs, personnel, resources, etc.
Marin A, Hampton KN. Simplifying the Personal Network Name Generator: Alternatives to Traditional Multiple and Single Name Generators. Field Methods
2007; 19(2):163-193.
adams – 24
What do you want to capture?
Collecting Network Data
Considerations
Eagle N, et al. Inferring Friendship Network Structure by Using Mobile Phone
Data. PNAS 2009; 106(36):15274-15278.
adams, j. Distant Friends, Close Strangers? Inferring Friendships from
Behavior. PNAS 2010; 107(9):e29-30.
adams – 25
To Cap or Not to Cap # of Alters?
§ Common practice to elicit top N ties (e.g., N = 3-6), differing
practice on whether caps are implicit/explicit
§ can introduce ceiling and/or floor effects
§ can alter individual & network statistics:
§ introduced biases can be differentially associated with
network attributes (e.g., those w/ higher degree)
Collecting Network Data
Considerations
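The ceiling effect of capping can be seen in a small sketch. It assumes a cap simply keeps the first N alters named; the data, cap value, and function names are hypothetical. Note that only the high-degree respondent loses ties, which is the differential bias flagged above.

```python
def apply_cap(nominations, cap):
    """Keep only the first `cap` alters each respondent names."""
    return {ego: alters[:cap] for ego, alters in nominations.items()}

def mean_out_degree(nominations):
    """Average number of alters named per respondent."""
    return sum(len(alters) for alters in nominations.values()) / len(nominations)

# Hypothetical uncapped nominations.
nominations = {
    "r1": ["a", "b"],
    "r2": ["a", "b", "c", "d", "e", "f", "g"],
    "r3": ["b", "c", "d"],
}
capped = apply_cap(nominations, 5)

# Ties lost per respondent because of the cap.
lost = {ego: len(nominations[ego]) - len(capped[ego]) for ego in nominations}
uncapped_mean = mean_out_degree(nominations)
capped_mean = mean_out_degree(capped)
```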
adams – 26
§ Specificity vs. General relationship estimates – timing,
denominators (e.g., MDICP’s “AIDS discussion partners”)
Bell DC. Partner Naming and Forgetting: Recall of Network Members. Social Networks 2007; 29(2):279-299.
§ Free recall vs. Roster-based – cognitive difficulty/ease, rosters
can inflate nominations, requirements for instrument prep
Eagle DE, Proeschold-Bell RJ. Methodological Considerations in the Use of Name Generators and Interpreters. Social Networks 2015; 40:75-83.
§ Binary vs. Valued vs. Nested (adapted from Steve Borgatti)
§ Who have you seen regularly in the past six months?
§ How often do you see named alters? a – once a year; b – once a month; c – once a week, d – daily
§ Who do you see at least once a year? Of those,…once a month? Of those…once a week? Of those…daily?
§ In longitudinal studies, priming from previous responses –
under-/over-estimate stability?
Brewer DD. Forgetting in the Recall-Based Elicitation of Person and Social Networks. Social Networks 2000; 22:29-43.
§ Breadth vs. Depth tradeoff – structure vs. content of ties, timing
limitations impact network data collection more than other types
Paik A, Sanchagrin K. Social Isolation in America: An Artifact. American Sociological Review 2013; 78(3):339-360.
§ Pre-test, Pre-test, Pre-test
Collecting Network Data
Considerations
adams – 27
The “Boundary Specification” Problem
§ “[S]ystem specification is probably a more serious issue for
network analysis than for much survey analysis.” (Laumann et al 1992: 63)
§ Simultaneously about node & tie level inclusion.
§ Approaches to BSP:
§ Realist – defined by the (mutual) subjective
perceptions of actors.
§ Nominalist – exogenously identified conceptual population
boundary.
§ BSP is salient for all network sampling designs (not just a
question for complete network studies, not just surveys, etc.).
Collecting Network Data
Sampling Boundaries
Laumann EO, Marsden PV, Prensky D. “The Boundary Specification Problem in Network Analysis.” In: Freeman LC,
White DR, Romney AK, (eds.) Research Methods in Social Network Analysis. Transaction Publishers; 1994.
adams – 28
Collecting Network Data
BSP – Why “Complete” vs. Complete
Lazega E, et al. Effects of Competition on Collective Learning in Advice
Networks. Social Networks 2016; 47:1-14.
Limitations in the BSP
§ Porous boundaries
§ Collaborations outside the lab, advice
outside the organization (Lazega)
§ Sexual partners off the island (Helleringer)
§ Add Health, some schools more bounded
than others (e.g., Sunshine vs. Jefferson)
§ “Off Roster”
§ Respondents – cannot (easily) be linked to
their received nominations
§ Nominations – may not be matched across
multiple namings
→ Can artificially
§ Inflate number of nodes in the network
§ Reduce observed local network
characteristics (e.g., reciprocity)
§ Alter network-level statistics
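A toy example of the “off roster” problem: when one nomination cannot be matched to an existing ID, the network gains a spurious node and loses a reciprocated tie. The names, the `unmatched_1` label, and the helper functions are hypothetical.

```python
def reciprocity(ties):
    """Share of directed ties whose reverse tie is also present."""
    tie_set = set(ties)
    if not tie_set:
        return 0.0
    mutual = sum(1 for (u, v) in tie_set if (v, u) in tie_set)
    return mutual / len(tie_set)

def node_count(ties):
    """Number of distinct nodes appearing in a tie list."""
    return len({node for tie in ties for node in tie})

# Hypothetical true network: two fully reciprocated friendships.
true_ties = [("ann", "bob"), ("bob", "ann"), ("bob", "cy"), ("cy", "bob")]

# Same reports, but bob's nomination of ann was written in off roster
# and could not be matched to ann's ID.
observed_ties = [("ann", "bob"), ("bob", "unmatched_1"), ("bob", "cy"), ("cy", "bob")]

true_recip = reciprocity(true_ties)
observed_recip = reciprocity(observed_ties)
```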
adams – 29
Collecting Network Data
2 – “Complete” Network Designs
[Figure: “True Network” and “Observed Network” panels; nodes 1–7 labeled Ego, M, Out, or Un. Accompanying matrix, in which N marks an unobserved cell:]
1 1 1 1 1 1
0 1 0 0 0 1
1 1 1 0 0 N
0 N 1 1 0 N
1 N 0 1 0 N
0 N 0 1 0 1
N N N N N N
(Image Source: James Moody)
Why is “complete” in quotes?
adams – 30
Collecting Network Data
2 – “Complete” Network Designs
[Figure: “True Network” and “Observed Network” panels; nodes 1–6 labeled M or Un. Accompanying matrix, in which N marks an unobserved cell:]
1 1 1 1 1
N 0 0 0 0
1 1 1 0 0
1 0 0 1 0
1 0 0 1 0
N 0 0 0 0
(Image Source: James Moody)
Why is “complete” in quotes?
adams – 31
§ Respondent Driven Sampling - A combination of snowball
sampling and model-based weighting that accounts for the
non-random sampling design, allowing population-level
estimates for hidden/hard-to-reach populations
(Heckathorn 1997)
§ Sampling over networks
§ Not sampling of networks
§ Rarely allows for SNA
§ Gile & Handcock 2011
§ Dombrowski et al. 2011
§ Also – see updates
§ Goel & Salganik 2010
Collecting Network Data
An Aside about RDS
aka – Why aren’t we talking more about it?
Rudolph AE, Crawford ND, Latkin C, Fowler JH, Fuller CM. 2013. “Individual and Neighborhood
Correlates of Membership in Drug Using Networks with Higher Prevalence of HIV in New York City.”
Annals of Epidemiology 23(5):267-274.
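To make “sampling over networks” with weighting concrete, here is a minimal sketch of inverse-degree weighting, the idea behind the estimator often called RDS-II (Volz & Heckathorn 2008). That estimator is swapped in for illustration and is not named on the slide; the data and function name are hypothetical.

```python
def rds_ii_proportion(has_trait, degrees):
    """Inverse-degree-weighted estimate of the share of a population with a trait.

    has_trait: list of booleans, one per recruited respondent.
    degrees:   each respondent's self-reported network size (must be >= 1).
    """
    if len(has_trait) != len(degrees):
        raise ValueError("one degree is needed per respondent")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent must report a degree of at least 1")
    # Well-connected people are more likely to be recruited, so they get less weight.
    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_with_trait / sum(weights)

# Hypothetical sample of three recruits.
has_trait = [True, False, False]
degrees = [2, 4, 4]
weighted = rds_ii_proportion(has_trait, degrees)
naive = sum(has_trait) / len(has_trait)
```

The output is a population proportion, not a network: nothing here recovers ties among respondents, which is the point of “rarely allows for SNA.”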
adams – 32
Design: Ego | Partial | Complete
Relative Difficulty of Process: Low | High (hard to gauge success without prior knowledge of data aims) | Medium (response rate, biases)
Inference: Given assumptions hold, same as for independent samples | Harder, often bootstrap based | Alternative modeling accounting for dependencies
Respondent Burden: High | Medium | Medium-Low
Ease of Node Matching: NA | Built in? | Depends on pre-collection prep & knowledge
Researcher Control: High-Moderate | Low | Moderate
Network “Sampling” – Tradeoffs
Morris M. Network Epidemiology: A Handbook for Survey Design and Data Collection. Oxford University Press; 2004.
adams – 33
Davis A, Gardner BB, Gardner MR. Deep South. The University of Chicago Press; 1941
Bipartite / 2-Mode Networks
Original DGG Network
Actor x Actor
Projection
Event x Event
Projection
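The two projections of a 2-mode network can be computed from the actor-by-event membership matrix A: the actor x actor projection is A·Aᵀ (shared events) and the event x event projection is Aᵀ·A (shared actors). The tiny matrix below is hypothetical, not the Davis, Gardner & Gardner data.

```python
# Hypothetical membership matrix: rows are actors, columns are events.
actors = ["a1", "a2", "a3"]
events = ["e1", "e2"]
membership = [
    [1, 0],  # a1 attended e1
    [1, 1],  # a2 attended e1 and e2
    [0, 1],  # a3 attended e2
]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(row) for row in zip(*matrix)]

def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Cell (i, j) counts events that actors i and j both attended.
actor_by_actor = matmul(membership, transpose(membership))
# Cell (i, j) counts actors who attended both events i and j.
event_by_event = matmul(transpose(membership), membership)
```

The diagonals hold each actor's number of events and each event's number of attendees; analyses of the projections usually set them aside.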
adams – 34
SN Data Collection Strategies
Cognitive Social Structures
David Krackhardt (1987)
recognized the salience of people
within a studied network having
perceptions of the network’s
structure
§ For some questions those
perceptions are as (more) salient
for understanding group dynamics
§ Reorientation of numerous
network “informant accuracy”
studies
§ Expensive!
§ Can be adapted for “external
observers” (Cairns & Cairns)
Gest DS, Moody J, Rulison K. “Density or Distinction? The Roles of Data Structure and Group Detection Methods in
Describing Adolescent Peer Groups.” Journal of Social Structure 2007;8(2).
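In a cognitive social structure design every perceiver reports on every possible tie, giving a perceiver x sender x receiver array. The sketch below shows two ways such data are often collapsed into a single network: a locally aggregated structure (using only the two parties' own reports) and a consensus structure (using a share of all perceivers). The array, the 0.5 threshold, and the function names are assumptions for illustration.

```python
def locally_aggregated(css, rule="intersection"):
    """Tie i->j is kept based on i's and j's own reports of i->j.

    rule="intersection": both must report it; rule="union": either one.
    """
    n = len(css)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sender_says = css[i][i][j]    # perceiver i's view of i->j
            receiver_says = css[j][i][j]  # perceiver j's view of i->j
            if rule == "intersection":
                out[i][j] = int(bool(sender_says and receiver_says))
            else:
                out[i][j] = int(bool(sender_says or receiver_says))
    return out

def consensus(css, threshold=0.5):
    """Tie i->j is kept if more than `threshold` of all perceivers report it."""
    n = len(css)
    return [
        [int(i != j and sum(css[k][i][j] for k in range(n)) / n > threshold)
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical data: css[k][i][j] = 1 if perceiver k says i sends a tie to j.
css = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],  # perceiver 0
    [[0, 1, 1], [0, 0, 0], [0, 0, 0]],  # perceiver 1
    [[0, 1, 0], [0, 0, 0], [1, 0, 0]],  # perceiver 2
]
las_intersection = locally_aggregated(css, "intersection")
las_union = locally_aggregated(css, "union")
consensus_net = consensus(css)
```

The n x n x n array is also why the slide flags this design as expensive: respondent burden grows with the square of group size.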
adams – 35
Non-Survey Methods
Observational
Torrens PM, Griffin WA. Exploring the Micro-Social Geography of Children’s Interactions in Pre-School: A Long-Term
Observational Study and Analysis Using Geographic Information Technologies. Environment & Behavior 2012; 45(5):584-614.
Schaefer DR, et al. Fundamental Principles of Network Formation among Preschool Children.
Social Networks 2010; 32:61-71.
adams – 36
Enhanced-Survey Methods
ACASI & Other Self-Administered
Kreager DA, et al. Toward a Criminology of Inmate Networks. Justice Quarterly 2016; 33(6):1000-1028.
adams – 37
Multi-Mode Methods
Eagle DE, Proeschold-Bell RJ. Methodological Considerations in the Use of
Name Generators and Interpreters. Social Networks 2015; 40:75-83.
adams – 38
Non-Survey Methods
Archival Sources
[Figure: Centralization (Betweenness): 2009−2012. Betweenness centralization (y-axis ticks 0.1 to 0.3) by year (2009, 2010, 2011, 2012) for seven cities: Austin, Dallas, EP, Houston, Lubbock, RGV, SA]
Loresto FL. Exploring the Use of Social Network Analysis on Physician Networks Created from Medicare Data through
Studying the Use of Minimally Invasive Breast Biopsy Among Physicians: Descriptions, Regressions, and Network Models.
Dissertation – Clinical Science, UTMB. 2018
adams – 39
SNA Data Repositories
§ Lin Freeman’s collection of “classic” datasets
§ http://moreno.ss.uci.edu/data.html
§ Katya Ognyanova’s collection of collections
§ http://kateto.net/2016/05/network-datasets/
§ Mark Newman’s mish-mash collection
§ http://www-personal.umich.edu/~mejn/netdata/
§ Stanford Large Network Dataset Collection
§ https://snap.stanford.edu/data/
§ ICPSR (not network specific, but a good general
resource)
§ http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp
adams – 40
Ethics in SNA
adams – 41
No blanket Approach; aspects to consider:
§ Informed Consent & Deductive Disclosure
§ Anonymity v. Confidentiality
§ Who benefits? Who bears burdens?
§ Tradeoffs
§ Descriptive vs. Predictive Utility
§ Aggregate vs. Individual level predictions
§ E.g., describing structural (in-)efficiencies in an organization
vs. who to hire/fire/promote within it
Ethics in SNA
adams – 42
Ethics, 2-mode data and recent events
A (mostly) tongue-in-cheek analysis
adams – 43
Evaluating Social Network Data
Freeman, Linton C. 2000. "See you in the Funny Papers: Cartoons and Social Networks." Connections 23(1):32-42.
adams – 44
[Figure: % NumGiv = 0 (y-axis ticks 0 to 25) for 1985, 2004, and 2010]
“Important Matter” Isolation –
Ego Networks in the GSS
Fischer CS. Still Connected: Family and Friends in America since 1970. Russell Sage Foundation; 2011.
See also: Paik A, Sanchagrin K. Social
Isolation in America: An Artifact.
American Sociological Review 2013;
78(3): 339-360.
adams – 45
§ GSS “Important Matters” ego networks – Isolation ~20%
§ About half are “true isolates”
§ Other half are “uninteresteds” (have contacts, nothing
important to talk about)
§ What are “important matters”? (via ~cognitive interviewing)
§ Broad domains – Community Issues, News & Economy, Kids &
Education, Politics & Elections, Life & Health, Relationships, Money &
House, Ideology & Religion, Work
§ “Trivial” (?) topics - where to get a haircut, “cloning headless frogs”
§ Does social capital value hinge on this determination?
§ Hard to determine objectively (see next slide)
SN Data Collection Strategies
What do the Data Represent?
Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters:
Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
adams – 46
SN Data Collection Strategies
What do the Data Represent?
Who talks about what, with whom?
Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters:
Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
adams – 47
SN Data Collection Strategies
What do the Data Represent?
Who talks about what, with whom?
Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters:
Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
adams – 48
SN Data Collection Strategies
What do the Data Represent?
Brashears ME. 2014. “’Trivial’ Topics and Rich Ties: The Relationship between Discussion Topic, Alter Role, and Resource
Availability Using the ‘Important Matters’ Name Generator.” Sociological Science 1:493-511.
adams – 49
SN Data Collection Strategies
Reliability
adams j, Moody J. “To Tell the Truth? Measuring Concordance in Multiply-Reported Network Data.”
Social Networks 2007;29:44-58.
adams – 50
SN Data Collection Strategies
Reliability
R says tie exists, do C/A agree?
C/A says tie exists, does R agree?
adams j, Moody J. “To Tell the Truth? Measuring Concordance in Multiply-Reported Network Data.”
Social Networks 2007;29:44-58.
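The two questions above are directional: agreement depends on whose reports form the denominator. A minimal sketch, treating each tie as an unordered pair and using hypothetical reports (this is not the concordance measure from the cited paper):

```python
def agreement(reported_by_a, reported_by_b):
    """Of the ties reporter A says exist, the share reporter B also reports."""
    a, b = set(reported_by_a), set(reported_by_b)
    if not a:
        return float("nan")  # undefined when A reports no ties
    return len(a & b) / len(a)

def tie(u, v):
    """Represent an undirected tie so that (u, v) and (v, u) compare equal."""
    return frozenset((u, v))

# Hypothetical reports about the same respondent "r".
respondent_reports = [tie("r", "x"), tie("r", "y"), tie("r", "z")]
other_reports = [tie("x", "r"), tie("r", "w")]

r_confirmed_by_other = agreement(respondent_reports, other_reports)
other_confirmed_by_r = agreement(other_reports, respondent_reports)
```

Here 1 of the respondent's 3 ties is confirmed, while 1 of the other reporter's 2 ties is confirmed, so the two rates differ even though the overlap is the same single tie.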
adams – 51
SN Data Collection Strategies
Reliability
Helleringer S, et al. The Reliability of Sexual Partnership Histories: Implications for the
Measurement of Partnership Concurrency During Surveys. AIDS 2011; 25(4):503.
Concordance is often found to be
relatively low (BKS)
§ Recency & timing
specificity/anchoring improve
reliability
§ Ongoing > dissolved
§ Marital > nonmarital
§ Single partner > multiple
partners
§ Long term > short term
§ Gender differences
Duration of overlap between concurrent partnerships
An association between partnership duration and the
probability of reporting a partnership implies that when
compared to the complete (concordant) scenario, the
estimated average overlap of relationships is biased
upward (downward) in self-reported survey data on
sexual partnerships. We formally prove this claim in
supplementary Appendix A1; http://links.lww.com/
QAD/A108. Numerical examples indicate that the size
of the bias is greatest for respondents with at least one
marital relation, and when the duration of a respondent’s
concurrent partnerships is highly heterogeneous (e.g. two
long-term partnerships and one short-term partnership).
Discussion
In this study, we used sociocentric network data to assess the interpartner reliability of partnership histories collected during surveys of sexual behaviors. We found very low reliability in reports of nonmarital partnerships, …
[Excerpt from AIDS 2011, Vol 25 No 4, p. 508]
[Fig. 2. Proportion of nonmarital sexual partnerships concordantly or discordantly reported by a respondent and … by respondent gender and partnership duration/timing. Categories: Concordant; Concordant but disagree re: date; Resp only; Partner only. Panels: ongoing relations (short, long) and dissolved relations (recent, distant), for men and women. Bars are stacked to sum up to 100%.]
[Fig. 3. Prevalence of partnership concurrency at the time of the survey according to three different scenarios (concordant scenario, self-reports, complete scenario), for ever married, never married, and all respondents.]
adams – 52
Dealing with Contested Reports
An W, Schramski S. Analysis of Contested Reports in Exchange Networks based on Actors’ Credibility. Social Networks 2015; 40:25-33
adams – 53
SN Data Collection Strategies
Reliability
§ reducible: contained triadic closure (Figure 1a)
§ irreducible: contained no closed triads (Figure 1b)
§ strong: if relations were described using kin labels
(e.g., parent/child, brother/sister)
§ weak: if relations were described using non-kin
recreational labels (e.g., friend, group member)
Table 2 | Regression models predicting performance, accuracy, coverage, relationship accuracy, and Erroneously Close …
wordspan, overguess, and number of incorrect ties. Generalized Linear Mixed Model predicting all five dependent variable…
Model Number: 1 | 2 | 3 | 4 | 5
Model Type: OLS | OLS | OLS | OLS | OLS
Fitting Stage: Full | Full | Full | Full | Trimmed
DV: Performance | Accuracy | Coverage | Relationship Accuracy | Relationship Accuracy
Reducible: 0.128 ± 0.042 (p < 0.002) | 0.144 ± 0.035 (p < .001) | 0.126 ± 0.034 (p < 0.001) | 0.089 ± 0.037 (p < 0.016) | 0.096 ± 0.026 (p < 0.001)
Strong: −0.087 ± 0.042 (p < 0.039) | −0.073 ± 0.035 (p < 0.038) | −0.067 ± 0.034 (p < 0.051) | −0.011 ± 0.037 (p < 0.776) | −0.007 ± 0.026 (p < 0.797)
Reducible*Strong (four entries): 0.171 ± 0.060 (p < 0.005) | 0.113 ± 0.05 (p < 0.026) | 0.129 ± 0.049 (p < 0.009) | 0.004 ± 0.053 (p < 0.935)
Overguess (two entries): −0.163 ± 0.038 (p < .001) | 0.252 ± 0.037 (p < 0.001)
Timespent: 0.001 ± 0.00005 (p < 0.001) | 0.0004 ± 0.00004 (p < 0.001) | 0.0004 ± 0.00004 (p < 0.001) | 0.0001 ± 0.00004 (p < 0.014) | 0.0001 ± 0.00004 (p < 0.011)
Word Span (four entries): 0.042 ± 0.015 (p < 0.005) | 0.036 ± 0.012 (p < .004) | 0.036 ± 0.012 (p < 0.003) | 0.020 ± 0.013 (p < 0.126)
Brashears ME. Humans use Compression Heuristics to Improve the Recall of Social Networks.
Scientific Reports 2013; 3:1513.
adams – 54
§ How commonly is network data likely to be missing?
Grippa F, Gloor PA. You are Who Remembers You: Detecting Leadership through Accuracy of Recall. Social Networks 2009; 31(4): 255-261.
Wang DJ, Shi X, McFarland DA, Leskovec J. Measurement Error in Network Data: A Re-Classification. Social Networks 2012; 34(4):396-409.
Yenigün D, et al. Omission and Commission Errors in Network Cognition and Network Estimation using ROC Curve. Social Networks 2017; 50:26-34.
§ Diagnosing the biases due to missing network data:
Borgatti SP, Carley KM, Krackhardt D. On the Robustness of Centrality Measures under Conditions of Imperfect Data. Social Networks 2006; 28:124-136.
Gonzalez-Bailon S, et al. Assessing the Bias in Samples of Large Online Networks. Social Networks 2014; 38:16-27.
Kossinets G. Effects of Missing Data on Social Networks. Social Networks 2006; 28(3): 247-268.
Smith JA, Moody J. Structural Effects of Network Sampling Coverage I: Nodes Missing at Random. Social Networks 2013; 35:652-668.
Smith JA, Moody J, Morgan JH. Network Sampling Coverage II: The Effect of non-Random Missing Data on Network Measurement. Social Networks 2017;
48:78-99.
§ Strategies for dealing with missing data:
Handcock MS, Gile KJ. Modeling Social Networks from Sampled Data. Annals of Applied Statistics 2010; 4:5-25.
Huisman M. Imputation of Missing Network Data: Some Simple Procedures. Journal of Social Structure 2009; 10.
Koskinen JH, et al. Bayesian Analysis for Partially Observed Network Data, Missing Ties, Attributes, and Actors. Social Networks 2013; 35:514-527.
Neal JW. "Kracking" the Missing Data Problem: Applying Krackhardt's Cognitive Social Structures to School-Based Social Networks. Sociology of
Education 2008; 81: 140-162.
§ Punchline – it’s a bear, and the field has yet to reach consensus on what to
do about it.
Missing Data
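One way to get a feel for the "diagnosing the biases" literature above is a small simulation: take a known network, delete a random share of nodes, and check how well observed degree tracks true degree among the nodes that remain. The sketch below is only an illustration under stated assumptions. The random graph, the network size, the missingness levels, and all function names are hypothetical, and it does not reproduce the procedure of any paper cited on this slide.

```python
import random


def random_graph(n, p, rng):
    """Undirected random graph as a dict: node -> set of neighbours (hypothetical test bed)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def drop_nodes(adj, frac_missing, rng):
    """Remove a random share of nodes and all their ties (nodes missing at random)."""
    n_drop = round(frac_missing * len(adj))
    missing = set(rng.sample(sorted(adj), n_drop))
    return {v: nbrs - missing for v, nbrs in adj.items() if v not in missing}


def pearson(x, y):
    """Plain Pearson correlation; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def degree_agreement(true_adj, obs_adj):
    """Correlation of true vs. observed degree over the nodes that were observed."""
    nodes = sorted(obs_adj)
    true_deg = [len(true_adj[v]) for v in nodes]
    obs_deg = [len(obs_adj[v]) for v in nodes]
    return pearson(true_deg, obs_deg)


if __name__ == "__main__":
    rng = random.Random(42)              # fixed seed so runs are repeatable
    truth = random_graph(200, 0.05, rng)
    for frac in (0.1, 0.3, 0.5):         # illustrative missingness levels
        observed = drop_nodes(truth, frac, rng)
        print(frac, round(degree_agreement(truth, observed), 3))
```

Swapping `drop_nodes` for a non-random rule (e.g., dropping high-degree nodes first) would be one way to explore the non-random missingness case, again only as a toy exercise.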
Apparent Contradictions don’t Necessarily indicate Poor Data
PNAS 2000; 97(22): 12385-12388.
Questions?
Also, be on the lookout for a QASS book on this
topic, hopefully, early next year.

01 Network Data Collection

  • 1.
    adams – 1 GatheringSocial Network Data Freeman, Linton C. 2000. "See you in the Funny Papers: Cartoons and Social Networks." Connections 23(1):32-42. Figure 15. The remaining figures all deal with applications of various sorts. Figures 16, 17 and 18 deal with the issue of searching through a network. Figure 16 is yet another Sally Forth strip by Greg Howard. It is concerned with the use of social networks to find a job (Granovetter, 1974). Like Know Exert Authority Over Talk to Pound Positive Negative A B C D EF G H (a uni-modal, multiplex, directed, valenced, network)
  • 2.
    adams – 2 jimiadams Associate Professor University of Colorado Denver Department of Health & Behavioral Sciences DNAC Workshop on Social Networks & Health: Gathering Social Network Data
  • 3.
    adams – 3 Thisvisualization is of the literature on PMTCT in AIDS & JAIDS 1988-2008. See more information: adams, jimi & Ryan Light. 2014. "Mapping Interdisciplinary Fields: Efficiencies, Gaps & Redundancies in HIV/AIDS Research." PLoS One 9(12):e115092 This session’s aims: 1. Network “Sampling” 2. Network Measurement § “The Boundary Specification Problem” 3. Platforms for Data Collection (overview) 4. A brief aside on Ethics in SNA 5. Some Assessments of SN Data Quality Overview
  • 4.
    adams – 4 Principlesnot Recipes § Apply to range of data collection strategies § NOT only for surveys § Qual & Quant data have most of the same considerations outlined here § Passive (e.g., archival, observational, etc.) approaches as well as active § Many differing opinions on solutions w/in identified domains § common rules of thumb, not “best practices” à feel free to ask questions e.g., “how would this apply to ___ study?” An Aside before we get going… Field testing a network survey of religious leaders near Rumphi, Malawi (2005).
  • 5.
    adams – 5 SamplingMeasurement Modes Ethics Assessment § When studying this network, how much of it are we interested in capturing? § Dyadic § directly connected node pairs § Ego Network § focal node and all directly connected alters § Sub-groups § Attribute based groups § Ego + n-steps § “link tracing” 1, 2, 3, n § “Complete” § Node-based boundaries + ALL ties within § Which of these aims you’re after will help determine how you gather network data, in terms of: § Sampling § Measurement Network “Sampling” Butts CT, Acton RM, Marcum CS. “Interorganizational Collaboration in the Hurricane Katrina Response.” Journal of Social Structure 2012;13(1).
  • 6.
    adams – 6 SamplingMeasurement Modes Ethics Assessment § Data – Respondent (ego) & the people they are connected to (alters) § Key aims – # & characteristics of ego’s alters § e.g., GSS “Important Matters” Name Generator: § “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.” § Popular Examples – § MDICP (fertility & HIV/AIDS “discussion partners”), NHSLS (sexual partnering & social support), Indianapolis Network Mental Health Study, many many more Collecting Network Data 1 - Local / Ego Network Designs 1 Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004.
  • 7.
    adams – 7 SamplingMeasurement Modes Ethics Assessment Coleman JS, Katz E, Menzel H. The Diffusion of an Innovation Among Physicians. Sociometry 1957;20(4):253. Collecting Network Data 2 – “Complete” Network Designs 2 1. Identify “complete” population boundary 2. Enumerate all relationships within § How “complete” is all? § Popular Examples – § Add Health, Framingham Heart Study, PROSPER, CKM Physicians, increasing number of others Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004.
  • 8.
    adams – 8 SamplingMeasurement Modes Ethics Assessment Especially common for hard to identify/find populations. 1. Start with a sample of “seeds.” 2. Follow Ego-Network Design for eliciting SN data from some sample of “seeds.” 3. Sample some proportion of nominated alters to recruit as next wave of respondents § Random Walks § Strong Ties § Census, etc… 4. Repeat § Popular Examples – § “Project 90”, lots of others especially in disease tracing Collecting Network Data 3 – Partial Network Designs 3 (Image Source: https://www.math.umass.edu/~gile/kgilepurdue2012.pdf) Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004.
  • 9.
    adams – 9 SamplingMeasurement Modes Ethics Assessment 1. Local/Ego Network Data (Population-based Sampling) § Data – Respondent & people they are connected to § Key aims – # & characteristics of respondents’ ties 3. Partial Network Data (Network-based Sampling) § Data – Some tracing to reach (contacts of) contacts § Key aims – Contact Tracing, especially for Hidden/Unknown Population Studies 2. “Complete” Network Data (Census) § Data – All actors & ties within a boundary § Key aims – Define population, enumerate ties w/in boundary (not |complete|) Network “Sampling” 1 2 3 Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004.
  • 10.
    adams – 10 SamplingMeasurement Modes Ethics Assessment1987 GSS—is the following (taken from the GSS codebook): 127. From time to time, most people discuss impor- tant matters with other people. Looking back over the last six months—who are the people with whom you discussed matters important to you? Just tell me their first names or initials. IF LESS THAN 5 NAMES MENTIONED, PROBE, Anyone else? ONLY RECORD FIRST 5 NAMES. NAME1________________________________ NAME2________________________________ NAME3________________________________ NAME4________________________________ NAME5________________________________ The question was followed by this coding scheme, turned into the GSS variable labeled “Numgiven”: 128. INTERVIEWER CHECK: HOW MANY NAMES WERE MENTIONED? [answer] [code] 0 0 1 1 2 2 3 3 4 4 5 5 +6+ 6 (1) the scale nearly three- stunning and (2) most othe did not chang same period. presents some ics—that A declined from scale of chan than the contr colleagues (2 THE SCALE What sociolo and social ne 1 McPherso weighting issue substantive co 2 Moreover, ative correlatio ment appear o controlled for ences (on criti Wed, 05 Aug 2009 03:09:09 The GSS “Personal Networks” Name Generator Marsden PV. 1987. Core Discussion Networks of Americans. American Sociological Review 52:122-131. Network Measurement
  • 11.
    adams – 11 SamplingMeasurement Modes Ethics Assessment § Age: size drops with age, § kin/nonkin varies by age § Education: Increases size (& nonkin) § Race/ethnicity: Whites have larger nets § Heterogeneity varies by race/ethnicity § Sex: Women have more kin § Size of Place: Urbanites cite more non-kin “Important Matter” Ego Networks in the GSS Subgroup Size Differences McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over Two Decades. American Sociological Review 2006;71:353-375
  • 12.
    adams – 12 SamplingMeasurement Modes Ethics Assessment 0 5 10 15 20 25 30 0 1 2 3 4 5 6+ 1985 2004 “Important Matter” Ego Networks in the GSS Estimating Isolation McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over Two Decades. American Sociological Review 2006;71:353-375
  • 13.
    adams – 13 SamplingMeasurement Modes Ethics Assessment“Important Matter” Ego Networks in the GSS Isolation McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over Two Decades. American Sociological Review 2006;71:353-375
  • 14.
    adams – 14 SamplingMeasurement Modes Ethics Assessment“Important Matter” Ego Networks in the GSS Isolation McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over Two Decades. American Sociological Review 2006;71:353-375
  • 15.
    adams – 15 SamplingMeasurement Modes Ethics Assessment How large is an “average person’s social network”? § Pool & Kochen (1967) – Mathematical model § 500 § Robin Dunbar (1992) – Model from primate interactions and projecting to humans based on difference in neo-cortex size. “Replicated” via a study of Christmas cards sent. § 150 § Bernard & Killworth (2001) – Empirical inquiry via “Network Scale-up Method” § Mean – 291; Median 231 § Facebook “friends” (2015) § Mean – 338; median 200 Ego Network Data Collection Dunbar’s Number
  • 16.
    adams – 16 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data 2 – “Complete” Network Designs 2 § National Longitudinal Study of Adolescent Health (Add Health) § 100+ schools, 90k+ students § Friendship § Full Roster § Up to 5 male & female friends § Followed up with relationship strength questions § Romantic partners - last 18 months § Nang Rong Kinship Networks § Study of demographic & environmental change in 51 villages § Full Population – household kinship rosters & marriage Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004.
  • 17.
    adams – 17 SamplingMeasurement Modes Ethics Assessment 1 2 - Follow Name Generators w/ (set of) Name Interpreters Collecting Network Data 1 - Local / Ego Network Designs Image Source: Tom Valente
  • 18.
    adams – 18 SamplingMeasurement Modes Ethics Assessment § Age: size drops with age, § kin/nonkin varies by age § Education: Increases size (& nonkin) § Race/ethnicity: Whites have larger nets § Heterogeneity varies by race/ethnicity (IQV=Whites .03, Blacks .13, Hispanics .22) § Sex: Women have more kin § Size of Place: Urbanites cite more non-kin “Important Matter” Ego Networks in the GSS Alter Subgroup Differences 1 McPherson M, Smith-Lovin L, Brashears ME. Social Isolation in America: Changes in Core Discussion Networks over Two Decades. American Sociological Review 2006;71:353-375
  • 19.
    adams – 19 SamplingMeasurement Modes Ethics Assessment 3 – Ego Estimates Relationships Among Alters Collecting Network Data 1 - Local / Ego Network Designs 1 Anglewicz, Philip, jimi adams, Francis Obare, Susan C. Watkins, & Hans-Peter Kohler. 2009. “The Malawi Diffusion & Ideational Change Project 2004-06: Data collection, data quality & analyses of attrition.” Demographic Research 20(21): 503-540
  • 20.
    adams – 20 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data 1 - Local / Ego Network Designs Freeman LC. Visualizing Social Networks. Journal of Social Structure 2000;1(1). Hogan B, Carrrasco JA, Wellman B. Visualizing Personal Networks: Working with Participant-Aided Sociograms. Field Methods 2007;19(2):116-144.
  • 21.
    adams – 21 SamplingMeasurement Modes Ethics Assessment http://networkcanvas.com/
  • 22.
    adams – 22 SamplingMeasurement Modes Ethics Assessment 2 - Follow Name Generators w/ (set of) Name Interpreters Collecting Network Data 1 - Local / Ego Network Designs http://networkcanvas.com/
  • 23.
    adams – 23 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data Considerations How many name generators to use / relations to gather? Borgatti SP, Mehra A, Brass DJ, Labianca G. Network Analysis in the Social Sciences. Science 2009;323:892-895. future characteristics depend in part on its posi- tion in the network structure. Whereas traditional socialresearchexplainedanindividual’soutcomes or characteristics as a function of other character- istics of the same individual (e.g., income as a functionofeducationandgender),socialnetwork researchers look to the individual’s social environ- ment for explanations, whether through influence formation of network ties and, more generally, to predict a host of network properties, such as the clusteredness of networks or the distributions of node centrality. In the social sciences, most work of this type has been conducted at the dyadic level to examine such questions as: What is the basis of friendship ties? How do firms pick alli- ance partners? A host of explanations have been linked indiv influence (34 Theoretic common m quences of so ofdirecttrans this is a phys rial resources Similarities Location e.g., Same spatial and temporal space e.g., Same clubs Same events etc. e.g., Same gender Same attitude etc. Membership Interactions e.g., Sex with Talked to Advice to Helped Harmed etc. Flows e.g., Information Beliefs Personnel Resources etc. Attribute Social Relations Kinship e.g., Mother of Sibling of e.g., Friend of Boss of Student of Competitor of e.g., Likes Hates etc. Other role Affective e.g., Knows Knows about Sees as happy etc. Cognitive Fig. 3. A typology of ties studied in social network analysis. VIEW Marin A, Hampton KN Simplifying the Personal Network Name Generator: Alternatives to Traditional Multiple and Single Name Generators." Field Methods 2007; 19(2):163-193.
  • 24.
    adams – 24 SamplingMeasurement Modes Ethics Assessment What do you want to capture? Collecting Network Data Considerations Eagle N, et al. Inferring Friendship Network Structure by Using Mobile Phone Data. PNAS 2009; 106(36):15274-15278. adams, j. Distant Friends, Close Strangers? Inferring Friendships from Behavior. PNAS 2010; 107(9):e29-30.
  • 25.
    adams – 25 SamplingMeasurement Modes Ethics Assessment To Cap or Not to Cap # of Alters? § Common practice to elicit top N ties (e.g., N = 3-6), differing practice on whether caps are implicit/explicit § can introduce ceiling and/or floor effects § can alter individual & network statistics: § introduced biases can be differentially associated with network attributes (e.g., those w/ higher degree) Collecting Network Data Considerations
  • 26.
    adams – 26 SamplingMeasurement Modes Ethics Assessment § Specificity vs. General relationship estimates – timing, denominators (e.g., MDICP’s “AIDS discussion partners”) Bell DC. Partner Naming and Forgetting: Recall of Network Members. Social Networks 2007; 29(2):279-299. § Free recall vs. Roster-based – cognitive difficulty/ease, rosters can inflate nominations, requirements for instrument prep Eagle DE, Proeschold-Bell RJ. Methodological Considerations in the Use of Name Generators and Interpreters. Social Networks 2015; 40:75-83. § Binary vs. Valued vs. Nested (adapted from Steve Borgatti) § Who have you seen regularly in the past six months? § How often do you see named alters? a – once a year; b – once a month; c – once a week, d – daily § Who do you see at least once a year? Of those,…once a month? Of those…once a week? Of those…daily? § In longitudinal studies, priming from previous responses – under-/over-estimate stability? Brewer DD. Forgetting in the Recall-Based Elicitation of Person and Social Networks. Social Networks 2000; 22:29-43. § Breadth vs. Depth tradeoff – structure vs. content of ties, timing limitations impact network data collection more than other types Paik A, Sanchagrin K Social Isolation in America: An Artifact. American Sociological Review 2013; 78(3):339-360. § Pre-test, Pre-test, Pre-test Collecting Network Data Considerations
  • 27.
    adams – 27 SamplingMeasurement Modes Ethics Assessment The “Boundary Specification” Problem § “[S]ystem specification is probably a more serious issue for network analysis than for much survey analysis.” (Laumann et al 1992: 63) § Simultaneously about node & tie level inclusion. § Approaches to BSP: § Realist – defined by the (mutual) subjective perceptions of actors. § Nominalist – exogenously identified conceptual population boundary. § BSP is salient for all network sampling designs (not just a question for complete network studies, not just surveys, etc.). Collecting Network Data Sampling Boundaries Laumann EO, Marsden PV, Prensky D. “The Boundary Specification Problem in Network Analysis.” In: Freeman LC, White DR, Romney AK, (eds.) Research Methods in Social Network Analysis. Transaction Publishers; 1994.
  • 28.
    adams – 28 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data BSP – Why “Complete” vs. Complete Lazega E, et al. Effects of Competition on Collective Learning in Advice Networks. Social Networks 2016; 47:1-14. Limitations in the BSP § Porous boundaries § Collaborations outside the lab, advice outside the organization (Lazega) § Sexual partners off the island (Helleringer) § Add Health, some schools more bounded than others (e.g., Sunshine vs. Jefferson) § “Off Roster” § Respondents – cannot (easily) be linked to their received nominations § Nominations – may not be matched across multiple namings à Can artificially § Inflate number of nodes in the network § Reduce observed local network characteristics (e.g., reciprocity) § Alter network-level statistics
  • 29.
    adams – 29 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data 2 – “Complete” Network Designs Ego M M M M Out Un True Network Observed Network (Image Source: James Moody) 1 1 1 1 1 1 0 1 0 0 0 1 1 1 1 0 0 N 0 N 1 1 0 N 1 N 0 1 0 N 0 N 0 1 0 1 N N N N N N 1 6 7 5 4 3 2 2 Ego M M M M Out Un Out7 7 1 6 5 4 3 Un 2Un 2 2 Why is “complete” in quotes?
  • 30.
    adams – 30 SamplingMeasurement Modes Ethics AssessmentCollecting Network Data 2 – “Complete” Network Designs (Image Source: James Moody) 2 1 1 1 1 1 N 0 0 0 0 1 1 1 0 0 1 0 0 1 0 1 0 0 1 0 N 0 0 0 0 M M M M M Un True Network M M M M M Un Observed Network Un Un Un 1 6 5 4 3 2 1 6 5 4 3 2 1 1 1 Why is “complete” in quotes?
  • 31.
    adams – 31 SamplingMeasurement Modes Ethics Assessment § Respondent Driven Sampling - A combination of snowball sample and mathematical model weighting to account for non- random sampling designs to allow for the estimation of population level estimation for hidden/hard-to-reach populations (Heckathorn 1997) § Sampling over networks § Not sampling of networks § Rarely allows for SNA § Gile & Handcock 2011 § Dombrowski et al. 2011 § Also – see updates § Goel & Salganik 2010 Collecting Network Data An Aside about RDS aka – Why aren’t we talking more about it? Rudolph AE, Crawford ND, Latkin C, Fowler JH, Fuller CM. 2013. “Individual and Neighborhood Correlates of Membership in Drug Using Networks with Higher Prevalence of HIV in New York City.” Annals of Epidemiology 23(5):267-274.
  • 32.
    adams – 32 SamplingMeasurement Modes Ethics Assessment Ego Partial Complete Relative Difficulty of Process Low High (hard to gauge success without prior knowledge of data aims) Medium (response rate, biases) Inference Given assumptions hold, same as for independent samples Harder, often boot- strap based Alternative modeling accounting for dependencies Respondent Burden High Medium Medium-Low Ease of Node Matching NA Built in? Depends on pre- collection prep & knowledge Researcher Control High-Moderate Low Moderate Network “Sampling” – Tradeoffs 1 2 3 Morris M. Network Epidemiology: A Handbook for survey design and Data Collection. Oxford University Press; 2004. High-Moderate Low Moderate Low High (hard to gauge success without prior knowledge of data aims) Medium (response rate, biases) Given assumptions hold, same as for independent samples Harder, often boot- strap based Alternative modeling accounting for dependencies High Medium Medium-Low NA Built in? Depends on prep, approach & prior knowledge
  • 33.
    adams – 33 SamplingMeasurement Modes Ethics Assessment Davis A, Gardner BB, Gardner MR. Deep South. The University of Chicago Press; 1941 Bipartite / 2-Mode Networks Original DGG Network Actor x Actor Projection Event x Event Projection
  • 34.
    adams – 34 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies Cognitive Social Structures David Krackhardt (1987) recognized the salience of people within a studied network having perceptions of the network’s structure § For some questions those perceptions are as (more) salient for understanding group dynamics § Reorientation of numerous network “informant accuracy” studies § Expensive! § Can be adapted for “external observers” (Cairns & Cairns) Gest DS, Moody J, Rulison K. “Density or Distinction? The Roles of Data Structure and Group Detection Methods in Describing Adolescent Peer Groups.” Journal of Social Structure 2007;8(2).
  • 35.
    adams – 35 SamplingMeasurement Modes Ethics AssessmentNon-Survey Methods Observational Torrens PM, Griffin WA. Exploring the Micro-Social Geography of Children’s Interactions in Pre-School: A Long-Term Observational Study and Analysis Using Geographic Information Technologies. Environment & Behavior 2012; 45(5):584-614. Schaefer DR, et al. Fundamental Principles of Network Formation among Preschool Children. Social Networks 2010; 32:61-71.
  • 36.
    adams – 36 SamplingMeasurement Modes Ethics AssessmentEnhanced-Survey Methods ACASI & Other Self-Administered Kreager DA, et al. Toward a Criminology of Inmate Networks. Justice Quarterly 2016; 33(6):1000-1028.
  • 37.
    adams – 37 SamplingMeasurement Modes Ethics AssessmentMulti-Mode Methods Eagle DE, Proeschold-Bell RJ. Methodological Considerations in the Use of Name Generators and Interpreters. Social Networks 2015; 40:75-83.
  • 38.
    adams – 38 SamplingMeasurement Modes Ethics AssessmentNon-Survey Methods Archival Sources ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● 0.06 0.18 0.05 0.1 0.02 0.09 0.04 0.06 0.19 0.07 0.16 0.03 0.14 0.05 0.05 0.3 0.08 0.09 0.05 0.1 0.05 0.27 0.15 0.06 0.08 0.03 0.070.07 0.1 0.2 0.3 2009 2010 2011 2012 year Centralization(Betweenness) city ●a ●a ●a ●a ●a ●a ●a Austin Dallas EP Houston Lubbock RGV SA Centralization (Betweenness): 2009−2012 Loresto FL. Exploring the Use of Social Network Analysis on Physician Networks Created from Medicare Data through Studying the Use of Minimally Invasive Breast Biopsy Among Physicians: Descriptions, Regressions, and Network Models. Dissertation – Clinical Science, UTMB. 2018
  • 39.
    adams – 39 SNAData Repositories § Lin Freeman’s collection of “classic” datasets § http://moreno.ss.uci.edu/data.html § Katya Ognyanova’s collection of collections § http://kateto.net/2016/05/network-datasets/ § Mark Newman’s mish-mash collection § http://www-personal.umich.edu/~mejn/netdata/ § Stanford Large Network Dataset Collection § https://snap.stanford.edu/data/ § ICPSR (not network specific, but a good general resource) § http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp
  • 40.
    adams – 40 SamplingMeasurement Modes Ethics AssessmentEthics in SNA
  • 41.
    adams – 41 SamplingMeasurement Modes Ethics Assessment No blanket Approach; aspects to consider: § Informed Consent & Deductive Disclosure § Anonymity v. Confidentiality § Who benefits? Who bears burdens? § Tradeoffs § Descriptive vs. Predictive Utility § Aggregate vs. Individual level predictions § E.g., escribing structural (in-)efficiencies in an organization vs. who to hire/fire/promote within it Ethics in SNA
  • 42.
    adams – 42 SamplingMeasurement Modes Ethics AssessmentEthics, 2-mode data and recent events A (mostly) tongue-in-cheek analysis
  • 43.
    adams – 43 EvaluatingSocial Network Data Freeman, Linton C. 2000. "See you in the Funny Papers: Cartoons and Social Networks." Connections 23(1):32-42.
  • 44.
    adams – 44 SamplingMeasurement Modes Ethics Assessment 0 5 10 15 20 25 1985 2004 % NumGiv = 0 2010 “Important Matter” Isolation – Ego Networks in the GSS Fischer CS. Still Connected: Family and Friends in America since 1970. Russell Sage Foundation; 2011. See also: Paik A, Sanchagrin K. Social Isolation in America: An Artifact. American Sociological Review 2013; 78(3): 339-360.
  • 45.
    adams – 45 SamplingMeasurement Modes Ethics Assessment § GSS “Important Matters” ego networks – Isolation ~20% § About half are “true isolates” § Other half are “uninteresteds” (have contacts, nothing important to talk about) § What are “important matters”? (via ~cognitive interviewing) § Broad domains – Community Issues, News & Economy, Kids & Education, Politics & Elections, Life & Health, Relationships, Money & House, Ideology & Religion, Work § “Trivial” (?) topics - where to get a haircut, “cloning headless frogs” § Does social capital value hinge on this determination? § Hard to determine objectively (see next slide) SN Data Collection Strategies What do the Data Represent? Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters: Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
  • 46.
    adams – 46 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies What do the Data Represent? Who talks about what, with whom? Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters: Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
  • 47.
    adams – 47 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies What do the Data Represent? Who talks about what, with whom? Bearman P, Parigi P. 2004. “Cloning Headless Frogs and Other Important Matters: Conversation Topics and Network Structure.” Social Forces 83(2):535-557.
  • 48.
    adams – 48 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies What do the Data Represent? Brashears ME. 2014. “’Trivial’ Topics and Rich Ties: The Relationship between Discussion Topic, Alter Role, and Resource Availability Using the ‘Important Matters’ Name Generator.” Sociological Science 1:493-511.
  • 49.
    adams – 49 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies Reliability J. Moody / Social Networks 29 (2007) 44–58 adams j, Moody J. “To Tell the Truth? Measuring Concordance in Multiply-Reported Network Data.” Social Networks 2007;29:44-58.
  • 50.
    adams – 50 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies Reliability R says tie exists, do C/A agree? C/A says tie exists, does R agree? adams j, Moody J. “To Tell the Truth? Measuring Concordance in Multiply-Reported Network Data.” Social Networks 2007;29:44-58.
  • 51.
    adams – 51 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies Reliability Helleringer S, et al. The Reliability of Sexual Partnership Histories: Implications for the Measurement of Partnership Concurrency During Surveys. AIDS 25(4):503. Concordance is often found to be relatively low (BKS) § Recency & timing specificity/anchoring improve reliability § Ongoing > dissolved § Marital > nonmarital § Single partner > multiple partners § Long term > short term § Gender differences Duration of overlap between concurrent partnerships An association between partnership duration and the probability of reporting a partnership implies that when compared to the complete (concordant) scenario, the estimated average overlap of relationships is biased upward (downward) in self-reported survey data on sexual partnerships. We formally prove this claim in supplementary Appendix A1; http://links.lww.com/ QAD/A108. Numerical examples indicate that the size of the bias is greatest for respondents with at least one marital relation, and when the duration of a respondent’s concurrent partnerships is highly heterogeneous (e.g. two long-term partnerships and one short-term partnership). Discussion In this study, we used sociocentric network data to assess the interpartner reliability of partnership histories collected during surveys of sexual behaviors. We found very low reliability in reports of nonmarital partnerships, likely large and of unknown direction. Am we found no partnership concurrency in (concordant reports), and very low levels of according to self-reported data. On the othe all women and close to 20% of never marrie concurrent partnerships according to o scenario, which includes reports made by a partner(s). This is an important finding in apparent discrepancy between qualitative s indicated that concurrent partnerships may among women in sub-Saharan Africa quantitative surveys having documented ve of concurrency among women [12,32]. 
Th thus, be attributed to the poor quality of su concurrent partnerships. Among men, we also found significantly hi concurrent partnerships in our complete s was, however, not true for never married m there were no differences between self- complete data. Because the reliability o reports is much lower in dissolved 508 AIDS 2011, Vol 25 No 4 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Short ShortLong Long Recent RecentDistant Distant Men Women Men Women Ongoing relations Concordant Concordant but disagree re: date Resp only Partner only Dissolved relations Fig. 2. Proportion of nonmarital sexual partnerships concordantly or discordantly reported by a respondent and h by respondent gender and partnership duration/timing. ‘Concordant’ refers to partnerships reported by both partner but disagree re: date’ refers to partnerships reported by both partners, but one partner reported the partnership as ong the other reported it as dissolved’; ‘resp only’ refers to relationships only reported by the respondent; ‘partner relationships only reported by the partner. p[SR] ¼ 100 À ’Partner only’; p[OR] ¼ 100 À ‘Concordant’. The bars rep number of relationships in which respondents were involved according either to their own self-reports or to the partners. Bars are stacked to sum up to 100%. Ever marriedNever marriedAll respondents (b) (c) 10 0 Ever marriedNever marriedAll respondents Concordant scenario Self−reports Complete scenario 50 40 30 20 10 0 Ever MarriedNever MarriedAll respondents Women:self−reports Women: Complete scenario Men: self−reports Men: complete scenario 50 40 30 20 10 0 Fig. 3. Prevalence of partnership concurrency at the time of the survey according to three different scenarios. The first scenario (‘concordant scenario’) includes only partnerships
  • 52.
    adams – 52 SamplingMeasurement Modes Ethics AssessmentDealing with Contested Reports An W, Schramski S. Analysis of Contested Reports in Exchange Networks based on Actors’ Credibility. Social Networks 2015; 40:25-33
  • 53.
    adams – 53 SamplingMeasurement Modes Ethics AssessmentSN Data Collection Strategies Reliability § reducible: contained triadic closure (Figure 1a) § irreducible: contained no closed triads (Figure 1b) § strong: if relations were described using kin labels (e.g., parent/child, brother/sister) § weak: if relations were described using non-kin recreational labels (e.g., friend, group member) Table 2 | Regression models predicting performance, accuracy, coverage, relationship accuracy, and Erroneously Close wordspan, overguess, and number of incorrect ties. Generalized Linear Mixed Model predicting all five dependent variable Model Number 1 2 3 4 5 Model Type OLS OLS OLS OLS OLS Fitting Stage Full Full Full Full Trimmed DV Performance Accuracy Coverage Relationship Accuracy Relationship Accuracy Reducible 0.128 6 0.042 0.144 6 0.035 0.126 6 0.034 0.089 6 0.037 0.096 6 0.026 (p , 0.002) (p , .001) (p , 0.001) (p , 0.016) (p , 0.001) Strong 20.087 6 0.042 20.073 6 0.035 20.067 6 0.034 20.011 6 0.037 20.007 6 0.026 (p , 0.039) (p , 0.038) (p , 0.051) (p , 0.776) (p , 0.797) Reducible* Strong 0.171 6 0.060 0.113 6 0.05 0.129 6 0.049 0.004 6 0.053 (p , 0.005) (p , 0.026) (p , 0.009) (p , 0.935) Overguess 20.163 6 0.038 0.252 6 0.037 (p , .001) (p , 0.001) Timespent 0.001 6 0.00005 0.0004 6 0.00004 0.0004 6 0.00004 0.0001 6 0.00004 0.0001 6 0.00004 (p , 0.001) (p , 0.001) (p , 0.001) (p , 0.014) (p , 0.011) Word Span 0.042 6 0.015 0.036 6 0.012 0.036 6 0.012 0.020 6 0.013 (p , 0.005) (p , .004) (p , 0.003) (p , 0.126)Brashears ME. Humans use Compression Heuristics to Improve the Recall of Social Networks Nature Human Behavior 2013; 3:1513.
  • 54.
    adams – 54
    Sampling Measurement Modes Ethics Assessment
    Missing Data
    § How commonly is network data likely to be missing?
    Grippa F, Gloor PA. You are Who Remembers You: Detecting Leadership through Accuracy of Recall. Social Networks 2009; 31(4):255-261.
    Wang DJ, Shi X, McFarland DA, Leskovec J. Measurement Error in Network Data: A Re-Classification. Social Networks 2012; 34(4):396-409.
    Yenigün D, et al. Omission and Commission Errors in Network Cognition and Network Estimation using ROC Curve. Social Networks 2017; 50:26-34.
    § Diagnosing the biases due to missing network data:
    Borgatti SP, Carley KM, Krackhardt D. On the Robustness of Centrality Measures under Conditions of Imperfect Data. Social Networks 2006; 28:124-136.
    Gonzalez-Bailon S, et al. Assessing the Bias in Samples of Large Online Networks. Social Networks 2014; 38:16-27.
    Kossinets G. Effects of Missing Data on Social Networks. Social Networks 2006; 28(3):247-268.
    Smith JA, Moody J. Structural Effects of Network Sampling Coverage I: Nodes Missing at Random. Social Networks 2013; 35:652-668.
    Smith JA, Moody J, Morgan JH. Network Sampling Coverage II: The Effect of non-Random Missing Data on Network Measurement. Social Networks 2017; 48:78-99.
    § Strategies for dealing with missing data:
    Handcock MS, Gile KJ. Modeling Social Networks from Sampled Data. Annals of Applied Statistics 2010; 4:5-25.
    Huisman M. Imputation of Missing Network Data: Some Simple Procedures. Journal of Social Structure 2009; 10.
    Koskinen JH, et al. Bayesian Analysis for Partially Observed Network Data, Missing Ties, Attributes, and Actors. Social Networks 2013; 35:514-527.
    Neal JW. "Kracking" the Missing Data Problem: Applying Krackhardt's Cognitive Social Structures to School-Based Social Networks. Sociology of Education 2008; 81:140-162.
    § Punchline – it’s a bear, and there is not yet consensus on what to do about it.
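One common way to diagnose bias from missing data is to delete nodes from a known network and see how well a measure computed on what remains tracks the true values. The sketch below is a toy illustration of that idea using degree centrality and nodes missing at random. The network, the missingness shares, the seed, and all function names are assumptions made for this example; it does not reproduce the procedure of any paper listed above.

```python
import random


def degree(nodes, edges):
    """Undirected degree of each node, counting only ties among `nodes`."""
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        if a in deg and b in deg:
            deg[a] += 1
            deg[b] += 1
    return deg


def drop_nodes_at_random(nodes, edges, share_missing, rng):
    """Remove a random share of nodes and every tie that touches them."""
    k = round(len(nodes) * share_missing)
    missing = set(rng.sample(sorted(nodes), k))
    kept = [n for n in nodes if n not in missing]
    kept_edges = [(a, b) for a, b in edges
                  if a not in missing and b not in missing]
    return kept, kept_edges


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


# Toy network (hypothetical): a 20-node ring plus a hub at node 0.
nodes = list(range(20))
edges = [(i, (i + 1) % 20) for i in nodes] + [(0, i) for i in range(2, 11)]
true_deg = degree(nodes, edges)

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {}
for share in (0.0, 0.1, 0.3):
    kept, kept_edges = drop_nodes_at_random(nodes, edges, share, rng)
    obs_deg = degree(kept, kept_edges)
    # Compare true and observed degree among the nodes that remain.
    r = pearson([true_deg[n] for n in kept], [obs_deg[n] for n in kept])
    results[share] = {"n_kept": len(kept), "r": r, "observed": obs_deg}
    print(f"missing={share:.0%}  nodes kept={len(kept)}  r={r:.3f}")
```

A fuller diagnostic would repeat each missingness level many times, vary the missingness mechanism, and look at measures beyond degree.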
  • 55.
    adams – 55
    Sampling Measurement Modes Ethics Assessment
    Apparent Contradictions don’t Necessarily Indicate Poor Data
    PNAS 2000; 97(22):12385-12388.
  • 56.
    adams – 56
    Questions?
    Also, be on the lookout for a QASS book on this topic, hopefully early next year.