Respondent Driven Sampling &
Network Sampling with Memory
(time permitting…)
M. Giovanna Merli
Sanford School of Public Policy &
Duke Population Research Institute (DUPRI)
Duke University
Funding Acknowledgements
• RDS Data Collection in China (2009-2010)
– “Place-RDS Comparison Study”
• USAID under the terms of cooperative agreements GPO-A-00-03-00003-00 and
GPO-A-00-09-00003-0 (Weir, PI)
• China National Center for STD Control (Chen, PI)
• Duke CFAR AI064518 (Merli, PI)
– “Partnership for Social Science Research on HIV/AIDS in China”
• NICHD R24 HD056670 (Henderson, PI)
• RDS Data Analyses and Simulations (2011-2015)
– “Using Multiple Data Sources to Improve RDS Estimation”
• NICHD R01HD068523 (Merli, PI)
• NSM Data Collection in Tanzania
– PFirst Award/DGHI (Merli, PI)
Problems with the study of hidden
populations
Female sex workers, men who have sex with men, injecting drug users,
homeless, undocumented migrants are hidden populations
For these populations we typically want to:
• Obtain accurate and precise estimates of disease prevalence
• Discern impact on larger population health dynamics
• Identify gaps in HIV/STD prevention
Collecting data from hidden populations to infer population representation is
difficult because of the absence of a sampling frame – their members are hard
to identify
– Stigma
– Non response
– Lack of trust
– Rarity
Problems with the study of hidden
populations
• Convenience samples, clinic-based inquiries,
and sampling frames with limited coverage
(e.g. venue based sampling) lack basis for
inferring representation
Respondent Driven Sampling (RDS)
Heckathorn 1997, 2002; Salganik and Heckathorn 2004;
Volz and Heckathorn 2008
• Most popular solution to
problems of sampling
hidden populations
– 450+ studies
– 624+ papers, 10k+ citations
– Over $185 million from NIH
• Compare to “ego centric”
– 167 studies funded
– $42 million since 1990
How RDS works
• RDS primarily used to estimate population proportions of binary
nodal covariates (e.g. gender, infection status, tier of sex work, etc.)
• Leverages social network of respondents to recruit other
respondents
• Chain referral / peer recruitment / link tracing sampling strategy
– “Seed” participants (selected by convenience) receive coupons (2)
– Recruit 2-3 new participants each
– Each new respondent given 2-3 coupons to recruit others
– Recruitment incentives for participating and for successful recruitment
– No one participates more than once
– Process continues until desired sample size is obtained
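The recruitment steps above can be sketched as a small simulation. This is a minimal illustration, not the study's simulation code: the ring-shaped toy network, the function names `ring_network` and `simulate_rds`, and the uniform choice among eligible contacts are all assumptions made for the example.

```python
import random
from collections import deque

def ring_network(n, k=2):
    """Hypothetical toy network: each node is tied to its k nearest neighbours on each side."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

def simulate_rds(adjacency, seeds, coupons=3, target_n=50, rng=None):
    """Coupon-based chain referral in which no one participates more than once."""
    rng = rng or random.Random(0)
    sampled = list(seeds)                    # respondents in order of enrolment
    in_sample = set(seeds)
    recruiter_of = {s: None for s in seeds}  # seeds have no recruiter
    queue = deque(seeds)                     # respondents still holding coupons
    while queue and len(sampled) < target_n:
        recruiter = queue.popleft()
        # Assumption: recruits are chosen uniformly among contacts not yet sampled.
        eligible = [v for v in sorted(adjacency[recruiter]) if v not in in_sample]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sampled) >= target_n:
                break
            in_sample.add(recruit)
            sampled.append(recruit)
            recruiter_of[recruit] = recruiter
            queue.append(recruit)
    return sampled, recruiter_of

network = ring_network(200)
sampled, recruiter_of = simulate_rds(network, seeds=[0, 100])
print(len(sampled), "respondents recruited from", 2, "seeds")
```

The `recruiter_of` mapping records who handed a coupon to whom, which is the information RDS estimators later rely on.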
Problems with estimation in link tracing
sampling designs of hidden populations
• Sampling frame
unavailable
• Sample inclusion
probabilities are not
known (hence sampling
weights unknown)
• Researchers have limited
control of the sampling
process
• Seed respondents not
chosen at random
RDS solution
• Sampling probabilities computed under an approximation of
the true sampling process
– RDS assumes non-seed participants are Sampled with Probability
Proportional to self-reported degree – (SPPD)
– Provable in a random walk on most graphs of interest
– Sampling probabilities approximated by degree, hence sampling
weight = 1/degree
• Weighting/estimation can yield asymptotically unbiased
estimates of the population mean
• SPPD assumption underpins much of RDS estimation claims
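The random-walk result behind SPPD can be checked in one line. The sketch below assumes a connected, undirected network with $|E|$ ties and a walk that moves to a uniformly chosen contact at each step (sampling with replacement); the symbols $\pi$, $P$ and $|E|$ are notation introduced here, not taken from the slides.

```latex
% Transition probabilities of the walk and the candidate stationary distribution
P_{ij} = \begin{cases} 1/d_i & \text{if } i \text{ and } j \text{ are tied} \\ 0 & \text{otherwise} \end{cases}
\qquad
\pi_j = \frac{d_j}{2|E|}

% Stationarity check: one step of the walk leaves \pi unchanged
\sum_i \pi_i P_{ij}
  = \sum_{i \,:\, i \sim j} \frac{d_i}{2|E|} \cdot \frac{1}{d_i}
  = \frac{d_j}{2|E|}
  = \pi_j
```

Inclusion probability is therefore proportional to degree in the long run, which is why the sampling weight is taken as $1/d_i$.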
RDS estimators
Estimator, proportion equation, and notes:
• Naïve: $\hat{p} = n^{-1} \sum_{i \in \chi} x_i$
– $x_i$ is the value of the focal variable for respondent $i$; $n$ is the sample size
• RDS1-SH: $\hat{p} = S_{0,1} d_0 \left( S_{0,1} d_0 + S_{1,0} d_1 \right)^{-1}$
– $S_{a,b}$ is the estimated proportion of recruitments from group $a$ to $b$; $d_a$ is the estimated average degree in each group (Salganik and Heckathorn 2004)
• RDS1-LEN: $\hat{p} = S^{ego}_{0,1} d_0 \left( S^{ego}_{0,1} d_0 + S^{ego}_{1,0} d_1 \right)^{-1}$
– $S^{ego}_{a,b}$ is the estimated proportion of network ties from group $a$ to $b$ based on ego network reports (Lu 2013)
• RDS2-VH: $\hat{p} = \left( \sum_{i \in \chi} x_i d_i^{-1} \right) \left( \sum_{i \in \chi} d_i^{-1} \right)^{-1}$
– $d_i^{-1}$ is the inverse of self-reported degree for person $i$ (Volz and Heckathorn 2008)
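A minimal sketch of three of these estimators follows, written directly from the formulas above. The function names and the toy inputs are assumptions for illustration; the RDS1-SH function takes the recruitment proportions and group mean degrees as given rather than estimating them from data.

```python
def naive_estimate(x):
    """Unweighted sample proportion of a binary variable x."""
    return sum(x) / len(x)

def rds2_vh_estimate(x, degree):
    """RDS2-VH: weight each respondent by the inverse of self-reported degree."""
    inverse_degree = [1.0 / d for d in degree]
    weighted_sum = sum(xi * w for xi, w in zip(x, inverse_degree))
    return weighted_sum / sum(inverse_degree)

def rds1_sh_estimate(s01, s10, d0, d1):
    """RDS1-SH: s01 and s10 are cross-group recruitment proportions,
    d0 and d1 are the estimated average degrees of groups 0 and 1."""
    return (s01 * d0) / (s01 * d0 + s10 * d1)

# Toy data (hypothetical): group-1 members report lower degree,
# so inverse-degree weighting raises their estimated share.
x = [1, 0, 1, 0]
degree = [2, 4, 2, 4]
print(naive_estimate(x))            # unweighted proportion
print(rds2_vh_estimate(x, degree))  # degree-weighted proportion
```

In the toy data the naïve proportion is one half, while RDS2-VH moves toward the low-degree group because those respondents are assumed to have been less likely to be reached.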
In RDS, all approximations are subject to critical
assumptions that are often not met in the field
• About the unobserved sample recruitment process (most crucial)
– Respondent gives a coupon to a friend
– Respondents recruit new participants non-preferentially from amongst their
social contacts (each friend has an equal chance of being picked)
– The initial set of respondents (“seeds”) are drawn with random probabilities
– Respondents report their number of ties accurately (how many people you
know that are members of the population of interest?)
• About the social network structure
– Rapid mixing: The chain referral process converges very quickly to the
stationary distribution of a random walk (i.e. node selection probabilities are
independent of sample starting point)
– Connectedness: The target population must be connected by a network that
consists of a single component
– Network size: Network must be sufficiently large (sampling fraction small) that
sampling without replacement can be treated as if it is equivalent to sampling
with replacement
Prior evaluations of RDS
• Comparison of RDS estimates to known parameters of non-
hidden populations
– (Wejnert 2009; Wejnert & Heckathorn 2008; McCreesh et al. 2012)
• Test effects of violating RDS assumptions about social
network structure on synthetic populations
– (Gile & Handcock 2010; Thomas & Gile 2011; Lu et al. 2011)
• Examine effects of network structure in multiple empirical
settings with theoretical/ideal RDS samples
– (Goel & Salganik 2010; Mouw & Verdery 2012; Verdery, Mouw et al. 2015)
• Use full information on participants’ recruitment behavior to
evaluate non-preferential recruitment assumption
– (Yamanis, Merli, Neely et al. Sociological Methods and Research 2013)
RDS evaluation in the context of
Female Sex Workers in Liuzhou, China
• Evaluate SPPD assumption and
population coverage (Merli, Moody, Smith et
al., 2015 Social Science and Medicine)
• Evaluate performance of RDS
estimators (Verdery, Merli, Moody et al., 2015
Epidemiology)
• Propose RDS data collection
innovation to improve estimator
performance (Verdery, Merli, Moody, In
Progress)
• Evaluations with a simulation
approach grounded in empirical data
from a hidden population of FSWs in
China (Liuzhou, Guangxi Province)
(Weir, Merli, Li et al. 2012, Sexually Transmitted
Infections)
Data
• Two sources
– RDS: 583 FSWs (Oct. 2009 – Feb. 2010) (about 8% of total
FSW population in Liuzhou)
– PLACE (venue based sampling approach): 161 FSWs (Nov.
2009 – Mar. 2010)
• Same target population and inclusion definition
– Women who reside in Liuzhou who exchanged sex for money in last 4 weeks
• Same geographic area and similar time period
• Same measurement of key variables
– Test for biomarker of lifetime exposure to syphilis and core questionnaire
• Same face-to-face interview and common applicant pool for interviewers
• Rare to have two concurrent surveys in same population!
Description of the Liuzhou RDS sample
Tier of sex work | Venues where clients are solicited | RDS (N = 576)
High | Karaoke bars, star hotels, discos, night clubs | 250
Middle | Hair salons, saunas, massage parlors, foot cleaning/massage, bathhouses | 268
Low | Streets, parks, other public spaces | 27
Non-venue based | Telephone, text, internet, private referrals | 31
Fisher and Merli 2014, Network Science.
Approach, part 1
• Construct “population social network” from data
collected in RDS and PLACE
– Used new methodologies for estimating social network
parameters and simulating population network
• Use Case Control Logistic Regression to estimate homophily
parameters from the RDS data (Smith, SM 2012)
• Use Exponential Random Graph Modeling to generate full
network from local structural features (ERGM; Handcock et al., JOSS 2008)
– Tested various sensitivities about the means by which
this population social network is constructed
• (which data source, venue size estimates, and assumptions
about geographic distribution of social network ties)
“Population social network”
Generate “population characteristics”
based on PLACE survey estimates
Add “population social network”
based on RDS survey estimates
Approach, part 2
• Simulate RDS chains over “population social
network” (1000 per recruitment scenario)
– Scenarios vary according to different sample
recruitment assumptions
• Seeding of the chain
• Recruitment patterns
– How much does the ideal case (random seeding
and random recruitment) diverge from actual RDS
seeding and recruitment matched to the Liuzhou
FSW data?
Results:
Violation of SPPD assumption
• Compared individual degree to
the proportion of times an
individual was sampled across
the simulated chains
– Very high correlation when
seeds and referrals are random
– SPPD assumption increasingly
violated when seeds & referrals
are matched to the actual data
– Over-recruitment of middle tier
sex workers drives the result
• For more:
– Merli, Moody, Smith et al.,
Social Science & Medicine,
2015
[Figure: three panels, r = 0.82, r = 0.96, r = 0.97]
Merli, Moody, Smith et al., SSM, 2015
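The comparison of degree with sampling frequency can be illustrated on a toy network. This is a rough sketch under strong assumptions, not the published simulation: it uses one long with-replacement walk instead of 1000 coupon chains, a hypothetical 30-node network, and an invented preference (weight 5 for every third node) to stand in for preferential referral.

```python
import random

def build_network():
    """Hypothetical 30-node network: a ring plus chords so that degree varies."""
    n = 30
    adjacency = {i: set() for i in range(n)}

    def tie(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)

    for i in range(n):
        tie(i, (i + 1) % n)
    for j in (5, 10, 15, 20, 25):
        tie(0, j)
    for j in (7, 13, 19):
        tie(1, j)
    return {i: sorted(nbrs) for i, nbrs in adjacency.items()}

def visit_shares(adjacency, weight, steps, rng):
    """Share of walk steps spent at each node; each referral is drawn with the given weights."""
    counts = {i: 0 for i in adjacency}
    current = rng.choice(sorted(adjacency))  # random seed node
    for _ in range(steps):
        neighbors = adjacency[current]
        current = rng.choices(neighbors, weights=[weight[j] for j in neighbors])[0]
        counts[current] += 1
    return {i: c / steps for i, c in counts.items()}

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

adjacency = build_network()
nodes = sorted(adjacency)
degree = [len(adjacency[i]) for i in nodes]
uniform = {i: 1 for i in nodes}                          # non-preferential referral
preferred = {i: 5 if i % 3 == 0 else 1 for i in nodes}   # assumed preference, illustration only

random_shares = visit_shares(adjacency, uniform, 100_000, random.Random(1))
biased_shares = visit_shares(adjacency, preferred, 100_000, random.Random(1))
print("r, random referral:      ", round(pearson(degree, [random_shares[i] for i in nodes]), 2))
print("r, preferential referral:", round(pearson(degree, [biased_shares[i] for i in nodes]), 2))
```

With uniform referral the visit shares track degree closely; once referral favors a subgroup, the correlation with degree drops, which is the pattern the SPPD check is designed to detect.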
Distribution of RDS2-VH proportion estimates
(low/middle tier) across seeding and recruitment
scenarios
Verdery, Merli, Moody et al. 2015, Epidemiology
Variability of estimates: Design effects
(ratio of variance in RDS estimates to variance in estimates from same size SRS)
• DE very large, but not out of line with findings of prior work (Goel
and Salganik 2010)
• Large Design Effects imply that much larger sample sizes would
be required to reach level of precision currently assumed from
RDS samples typically in the hundreds
• CDC recommends RDS sample sizes in the hundreds for public
health surveillance – IMPLICATIONS: Not sufficient power to
identify changes in behaviors or disease prevalence
Tier | DemDem | DemRan | RanRan
Middle Tier | 6.18 | 19.60 | 28.20
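The design effect defined above, and what it implies for sample size, can be computed as follows. This is a sketch under assumptions: the SRS variance uses the with-replacement formula p(1 − p)/n for a proportion, the function names are hypothetical, and pairing the tabled design effects with n = 576 is purely illustrative since the simulated sample size is not stated here.

```python
from statistics import variance

def design_effect(estimates, true_p, n):
    """Variance of simulated RDS estimates divided by the variance
    of a simple random sample proportion of the same size n."""
    srs_variance = true_p * (1 - true_p) / n
    return variance(estimates) / srs_variance

def effective_sample_size(n, de):
    """Size of a simple random sample with the same precision as n RDS respondents."""
    return n / de

# Toy estimates from three hypothetical simulated chains of size 100.
print(design_effect([0.4, 0.5, 0.6], true_p=0.5, n=100))

# Illustrative only: effective size if the tabled design effects applied to n = 576.
for label, de in [("DemDem", 6.18), ("DemRan", 19.60), ("RanRan", 28.20)]:
    print(label, round(effective_sample_size(576, de), 1))
```

Dividing by the design effect makes the precision loss concrete: a sample in the hundreds can carry the information of only a few dozen independent draws.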
Discussion
• Seeding and recruitment scenarios
– Matching on seeds not critical
– Matching on recruitment patterns has a larger
effect, exacerbates biases but reduces design
effects
• Problematic because recruitment behavior seems harder to control than
seed selection
Estimator performance
• Estimator development
– Only one (RDS1-LEN) works
markedly better than
others
• Robust to preferential
recruitment by taking into
account respondents’ ego-
network composition
– BUT unusable for most
(unobservable)
characteristics we care
about
– Still problems with variance
estimation
Verdery, Merli, Moody et al. 2015, Epidemiology
Distributions of estimates of proportions in low
tiers of sex work by estimator (recruitment and
seeds matched to the Liuzhou FSW data)
Recent innovation: IP-RDS
(Verdery, Merli, Moody, In Progress)
• What can be done to improve the performance of RDS
estimates while retaining the method’s desirable peer-
driven sample recruitment properties?
• Modify RDS data collection process
• Apply antithetic variate mean estimator to data
• Results from simulations: Improved estimation
performance
New data collection protocol
IP-RDS
• Incentivize respondents to invert their
preferences when choosing new respondents,
i.e. respondents are asked to invert their
recruitment preferences on the recruitment
biasing variable (e.g. tier of sex work)
“Random recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 3/9, 3/9, 3/9]
“Preferential recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 4/9, 4/9, 1/9]
“Inverse-Preferential recruitment”
[Figure: nodes A, B, C, D; recruitment probabilities 2/9, 2/9, 5/9]
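The three sets of recruitment probabilities are related by simple arithmetic. The sketch below assumes that "inverting" a preference means reflecting each probability around the uniform value 1/k; that rule is an assumption which happens to reproduce the numbers shown, not a statement of how the IP-RDS protocol is defined. The function name is hypothetical.

```python
from fractions import Fraction

def invert_preferences(probs):
    """Reflect each recruitment probability around the uniform value 1/k (assumed rule)."""
    uniform = Fraction(1, len(probs))
    inverted = [2 * uniform - p for p in probs]
    if any(p < 0 for p in inverted):
        raise ValueError("preference too strong to reflect into a valid probability")
    return inverted

preferential = [Fraction(4, 9), Fraction(4, 9), Fraction(1, 9)]
inverse = invert_preferences(preferential)
average = [(p + q) / 2 for p, q in zip(preferential, inverse)]

print([str(p) for p in inverse])   # inverse-preferential probabilities
print([str(p) for p in average])   # average of the two recruitment patterns
```

Under this reading, averaging the preferential and inverse-preferential patterns recovers uniform recruitment, which is the intuition behind pairing the two in an antithetic estimator.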
Antithetic variate mean estimator
• $\hat{\mu}_{AV} = \frac{\sum_{i \in m_1} y_i}{2} + \frac{\sum_{i \in m_2} y_i}{2}$, where
$y_i$ is the value of the focal variable for the $i$-th
respondent,
$m_1$ is the count of recruitments by members of
one group of the recruitment biasing variable
(e.g. tier of sex work), and $m_2$ is the count of
recruitments by members of the other group
Distributions of estimates of proportions in low/mid tiers of sex work
by estimator (naïve mean, RDS2-VH, AV-IP_RDS) and level of biased
recruitment behavior (absolute difference in recruitment probabilities
conditional on attribute of targeted peer)
Discussion of IP-RDS
• Simple change to RDS protocol
– May or may not require financial incentives for
targeted recruitment (empirical question)
• Outperforms conventional estimators
– Gains in bias reduction comparable to RDS1-LEN
estimator
• Tested on more networks (similar results)
• BUT … not yet field tested
Network Sampling with Memory
• Mouw and Verdery 2012, Sociological
Methodology
• Collects network data
• Introduces researcher’s control over the
sampling process
• Directs the recruitment process to more
efficiently explore the network (avoiding
bottlenecks)
How does NSM work?
• Recruitment starts with a few seed respondents
• Network roster data collected from respondents about
minimally identifying information of their network members
(last name and last four digits of cell phone number) to
connect nodes in the network (up to 10 network members per
respondent)
• NSM sampling algorithm selects up to 3 nominated network
members per respondent and asks respondents for full contact
information on these
• Process proceeds iteratively to recruit new waves of
respondents
Network data collection
How does NSM work?
• NSM sampling algorithm uses two sampling
modes, List and Search
• List mode
– keeps a list, L, of all nominated network members
– samples with replacement from L
– even sampling of new nodes -- new nodes sampled at
the same cumulative sampling rate as earlier nodes
– as list of sampled nodes approaches the full population
network, NSM sample converges to simple random
sampling
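List mode can be sketched in a few lines. This is a deliberately simplified illustration, not the NSM algorithm of Mouw and Verdery: it only keeps the list L of nominated members and draws from it with replacement, and it omits the even-sampling adjustment and Search mode. The function name, the toy network, and the rule that a repeat draw triggers no new interview are assumptions.

```python
import random

def list_mode_sample(adjacency, seed, draws, max_roster=10, rng=None):
    """Draw with replacement from the growing list L of nominated network members."""
    rng = rng or random.Random(0)
    nominated = [seed]        # the list L; each person appears once
    on_list = {seed}
    interviewed = []          # people in order of first interview
    interviewed_set = set()
    draw_log = []             # every draw, including repeats
    for _ in range(draws):
        person = rng.choice(nominated)
        draw_log.append(person)
        if person in interviewed_set:
            continue          # repeat draw: roster already collected
        interviewed.append(person)
        interviewed_set.add(person)
        # Roster: up to max_roster network members are added to L.
        for alter in sorted(adjacency[person])[:max_roster]:
            if alter not in on_list:
                on_list.add(alter)
                nominated.append(alter)
    return draw_log, interviewed, nominated

# Hypothetical toy network: 40 people, each tied to the two nearest on either side.
toy = {i: {(i - 2) % 40, (i - 1) % 40, (i + 1) % 40, (i + 2) % 40} for i in range(40)}
draw_log, interviewed, nominated = list_mode_sample(toy, seed=0, draws=60)
print(len(interviewed), "interviewed;", len(nominated), "on the list L")
```

Because draws come from the whole list and not only from the most recent respondent's contacts, the researcher, not the respondent, decides who is approached next.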
How does NSM work?
• Search mode—look for “bridge” nodes to
unexplored parts of the network. Start in
search mode, then switch to list mode.
Simulation results
• Test NSM vs. RDS using 162 university and school
networks from Facebook and Add Health
• Size of networks ranges from 300 to 16,500 nodes
• Estimate % white (Add Health) and % first year students
(Facebook)
• Start from a randomly selected student, repeat 500
times for each network
• Calculate bias, design effects and mean absolute bias
• Test (162 networks): design effect (DE) is 1.16 for NSM vs. 77.38 for RDS
Is it feasible?
• Is it feasible to collect network data on hidden
populations?
• 2010 NSIT (Network Survey of Immigration and
Transnationalism) (Mouw, PI)
• CAHS (Chinese in Africa Health Survey) (Merli, PI)
• Cost effectiveness of gains in precision
NSM field applications
Network Survey of Immigration and
Transnationalism (NSIT)
Mouw et al. 2014. Social Problems;
Verdery et al. 2016. Social Networks
Chinese in Africa Health Survey (CAHS)
Merli, Verdery, Mouw, Li 2016. Migration Studies
[Figure legend: red = RDU, blue = Mexico, green = Houston; small nodes = nominated, large nodes = sampled]
Network of Chinese migrants in Dar es Salaam
sampled by NSM, size = probability of selecting
next node
Key challenge: Getting referrals from
respondents
• NSIT required recontacting respondents to get
contact information on alters
• CAHS -- “forward” sampling variant (FNSM)—
more practical
– Asked for contact information on a small number
of alters at each interview (selected by NSM
algorithm)
NSM -- Future directions
• NIH R21 grant to test NSM among Chinese
immigrants in RDU (Merli, Mouw, Verdery,
Moody, Keister, Sanders)
– Pilot various approaches to get referrals from
respondents
– Evaluate NSM against ACS
– Test multiple modes of data collection (in-person,
telephone, web)
KrushnaDarade1
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 

Recently uploaded (20)

NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 

09 Respondent Driven Sampling and Network Sampling with Memory

  • 1. Respondent Driven Sampling & Network Sampling with Memory (time permitting…) M. Giovanna Merli Sanford School of Public Policy & Duke Population Research Institute (DUPRI) Duke University
  • 2. Funding Acknowledgements • RDS Data Collection in China (2009-2010) – “Place-RDS Comparison Study” • USAID under the terms of cooperative agreements GPO-A-00-03-00003-00 and GPO-A-00-09-00003-0 (Weir, PI) • China National Center for STD Control (Chen, PI) • Duke CFAR AI064518 (Merli, PI) – “Partnership for Social Science Research on HIV/AIDS in China” • NICHD R24 HD056670 (Henderson, PI) • RDS Data Analyses and Simulations (2011-2015) – “Using Multiple Data Sources to Improve RDS Estimation” • NICHD R01HD068523 (Merli, PI) • NSM Data Collection in Tanzania – PFirst Award/DGHI (Merli, PI)
  • 3. Problems with the study of hidden populations Female sex workers, men who have sex with men, injecting drug users, the homeless, and undocumented migrants are hidden populations. For these populations we typically want to: • Obtain accurate and precise estimates of disease prevalence • Discern impact on larger population health dynamics • Identify gaps in HIV/STD prevention Collecting data from hidden populations to infer population representation is difficult because of the absence of a sampling frame: their members are hard to identify – Stigma – Non-response – Lack of trust – Rarity
  • 4. Problems with the study of hidden populations • Convenience samples, clinic-based inquiries, and sampling frames with limited coverage (e.g. venue based sampling) lack basis for inferring representation
  • 5. Respondent Driven Sampling (RDS) Heckathorn 1997, 2002; Salganik and Heckathorn 2004; Volz and Heckathorn 2008 • Most popular solution to problems of sampling hidden populations – 450+ studies – 624+ papers, 10k+ citations – Over $185 million from NIH • Compare to “ego centric” – 167 studies funded – $42 million since 1990
  • 6. How RDS works • RDS primarily used to estimate population proportions of binary nodal covariates (e.g. gender, infection status, tier of sex work, etc.) • Leverages social network of respondents to recruit other respondents • Chain referral / peer recruitment / link tracing sampling strategy – “Seed” participants (selected by convenience) receive coupons (2) – Recruit 2-3 new participants each – Each new respondent given 2-3 coupons to recruit others – Recruitment incentives for participating and for successful recruitment – No one participates more than once – Process continues until desired sample size is obtained
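The recruitment steps on this slide can be illustrated with a minimal simulation sketch. It is not the protocol used in any study cited here: the function name `simulate_rds`, the adjacency-dict representation of the network, and the assumptions that every coupon handed to an eligible contact is redeemed and that recruits are picked uniformly at random are all hypothetical simplifications.

```python
import random
from collections import deque


def simulate_rds(network, seeds, coupons=2, target_n=100, seed=0):
    """Simulate chain-referral recruitment over a known network.

    network: dict mapping each person to a list of their contacts
             (hypothetical input; real RDS never observes this).
    Returns the ordered sample and a recruiter lookup (None for seeds).
    """
    rng = random.Random(seed)
    sampled = list(seeds)
    seen = set(seeds)                      # no one participates more than once
    recruiter_of = {s: None for s in seeds}
    queue = deque(seeds)                   # respondents still holding coupons

    while queue and len(sampled) < target_n:
        recruiter = queue.popleft()
        # Assumption: non-preferential choice among contacts not yet sampled.
        eligible = [v for v in network[recruiter] if v not in seen]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sampled) >= target_n:   # stop at the desired sample size
                break
            seen.add(recruit)
            sampled.append(recruit)
            recruiter_of[recruit] = recruiter
            queue.append(recruit)
    return sampled, recruiter_of
```

Calling `simulate_rds(network, seeds=[0], coupons=2, target_n=10)` returns the recruitment order and who recruited whom, which is the information an RDS coupon-tracking system records.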
  • 15. Problems with estimation in link tracing sampling designs of hidden populations • Sampling frame unavailable • Sample inclusion probabilities are not known (hence sampling weights unknown) • Researchers have limited control of the sampling process • Seed respondents not chosen at random
  • 16. RDS solution • Sampling probabilities computed under an approximation of the true sampling process – RDS assumes non-seed participants are Sampled with Probability Proportional to self-reported Degree (SPPD) – Provable in a random walk on most graphs of interest – Sampling probabilities approximated by degree, hence sampling weight = 1/degree • Weighting/estimation can yield asymptotically unbiased estimates of the population mean • SPPD assumption underpins much of RDS estimation claims
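The random-walk claim behind SPPD can be checked numerically. The sketch below is illustrative only: the six-node graph and the function names are invented for the example, and the walk samples with replacement, unlike a real RDS chain. On a connected, non-bipartite undirected graph, the long-run share of visits to each node should approach its degree divided by the sum of all degrees.

```python
import random

# Hypothetical toy network: undirected, connected, and containing a triangle.
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2, 4],
    4: [3, 5],
    5: [4],
}


def random_walk_visit_shares(graph, steps, seed=0):
    """Share of steps a simple random walk spends at each node."""
    rng = random.Random(seed)
    node = rng.choice(sorted(graph))
    counts = {v: 0 for v in graph}
    for _ in range(steps):
        node = rng.choice(graph[node])   # move to a uniformly chosen contact
        counts[node] += 1
    return {v: c / steps for v, c in counts.items()}


def degree_shares(graph):
    """Stationary distribution of the walk: degree / sum of degrees."""
    total = sum(len(nbrs) for nbrs in graph.values())
    return {v: len(nbrs) / total for v, nbrs in graph.items()}
```

Comparing `random_walk_visit_shares(GRAPH, 200_000)` with `degree_shares(GRAPH)` node by node shows why a weight of 1/degree is the natural correction under this idealized process.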
  • 17. RDS estimators (estimator: proportion equation; notes)
    – Naïve: $\hat{p} = n^{-1} \sum_{i \in \chi} x_i$; $x_i$ is the value of the focal variable for respondent $i$; $n$ is the sample size
    – RDS1-SH: $\hat{p} = S_{0,1} d_0 \left( S_{0,1} d_0 + S_{1,0} d_1 \right)^{-1}$; $S_{a,b}$ is the estimated proportion of recruitments from group $a$ to $b$; $d_a$ is the estimated average degree in each group (Salganik and Heckathorn 2004)
    – RDS1-LEN: $\hat{p} = S^{ego}_{0,1} d_0 \left( S^{ego}_{0,1} d_0 + S^{ego}_{1,0} d_1 \right)^{-1}$; $S^{ego}_{a,b}$ is the estimated proportion of network ties from group $a$ to $b$ based on ego network reports (Lu 2013)
    – RDS2-VH: $\hat{p} = \sum_{i \in \chi} x_i d_i^{-1} \left( \sum_{i \in \chi} d_i^{-1} \right)^{-1}$; $d_i^{-1}$ is the inverse of self-reported degree for person $i$ (Volz and Heckathorn 2008)
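The estimators on this slide translate directly into a few lines of code. The sketch below is a plain transcription of the naïve, RDS1-SH, and RDS2-VH formulas with hypothetical function names; it takes the recruitment proportions and group mean degrees as given inputs rather than estimating them, and it does not cover variance estimation.

```python
def naive_estimate(x):
    """Unweighted sample proportion: sum of x_i divided by n."""
    return sum(x) / len(x)


def rds1_sh_estimate(s01, s10, d0, d1):
    """RDS1-SH: S01*d0 / (S01*d0 + S10*d1).

    s01, s10: proportions of recruitments from group 0 to 1 and 1 to 0.
    d0, d1:   estimated average degree in group 0 and group 1.
    """
    return (s01 * d0) / (s01 * d0 + s10 * d1)


def rds2_vh_estimate(x, degrees):
    """RDS2-VH: inverse-degree weighted mean of the focal variable."""
    if any(d <= 0 for d in degrees):
        raise ValueError("self-reported degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(xi * w for xi, w in zip(x, weights)) / sum(weights)
```

For example, with `x = [1, 0, 1, 0]` and `degrees = [2, 4, 4, 1]`, the naïve estimate is 0.5 while the RDS2-VH estimate is 0.375, because the low-degree respondent with `x = 0` carries the largest weight.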
  • 18. In RDS, all approximations are subject to critical assumptions that are often not met in the field • About the unobserved sample recruitment process (most crucial) – Respondent gives a coupon to a friend – Respondents recruit new participants non-preferentially from amongst their social contacts (each friend has an equal chance of being picked) – The initial set of respondents (“seeds”) are drawn with random probabilities – Respondents report their number of ties accurately (how many people you know that are members of the population of interest?) • About the social network structure – Rapid mixing: The chain referral process converges very quickly to the stationary distribution of a random walk (i.e. node selection probabilities are independent of sample starting point) – Connectedness: The target population must be connected by a network that consists of a single component – Network size: Network must be sufficiently large (sampling fraction small) that sampling without replacement can be treated as if it is equivalent to sampling with replacement
  • 19. Prior evaluations of RDS • Comparison of RDS estimates to known parameters of non-hidden populations – (Wejnert 2009; Wejnert & Heckathorn 2008; McCreesh et al. 2012) • Test effects of violating RDS assumptions about social network structure on synthetic populations – (Gile & Handcock 2010; Thomas & Gile 2011; Lu et al. 2011) • Examine effects of network structure in multiple empirical settings with theoretical/ideal RDS samples – (Goel & Salganik 2010; Mouw & Verdery 2012; Verdery, Mouw et al. 2015) • Use full information on participants’ recruitment behavior to evaluate non-preferential recruitment assumption – (Yamanis, Merli, Neely et al. Sociological Methods and Research 2013)
  • 20. RDS evaluation in the context of Female Sex Workers in Liuzhou, China • Evaluate SPPD assumption and population coverage (Merli, Moody, Smith et al., 2015 Social Science and Medicine) • Evaluate performance of RDS estimators (Verdery, Merli, Moody et al., 2015 Epidemiology) • Propose RDS data collection innovation to improve estimator performance (Verdery, Merli, Moody, In Progress) • Evaluations with a simulation approach grounded in empirical data from a hidden population of FSWs in China (Liuzhou, Guangxi Province) (Weir, Merli, Li et al. 2012, Sexually Transmitted Infections)
  • 21. Data • Two sources – RDS: 583 FSWs (Oct. 2009 – Feb. 2010) (about 8% of total FSW population in Liuzhou) – PLACE (venue based sampling approach): 161 FSWs (Nov. 2009 – Mar. 2010) • Same target population and inclusion definition – Women who reside in Liuzhou who exchanged sex for money in last 4 weeks • Same geographic area and similar time period • Same measurement of key variables – Test for biomarker of lifetime exposure to syphilis and core questionnaire • Same face-to-face interview and common applicant pool for interviewers • Rare to have two concurrent surveys in same population!
  • 22. Description of the Liuzhou RDS sample (N = 576), by tier of sex work, venues where clients are solicited, and number of RDS respondents (Fisher and Merli 2014, Network Science)
    – High (karaoke bars, star hotels, discos, night clubs): 250
    – Middle (hair salons, saunas, massage parlors, foot cleaning/massage, bathhouses): 268
    – Low (streets, parks, other public spaces): 27
    – Non-venue based (telephone, text, internet, private referrals): 31
  • 23. Approach, part 1 • Construct “population social network” from data collected in RDS and PLACE – Used new methodologies for estimating social network parameters and simulating population network • Use Case Control Logistic Regression to estimate homophily parameters from the RDS data (Smith, SM 2012) • Use Exponential Random Graph Modeling to generate full network from local structural features (ERGM; Handcock et al., JOSS 2008) – Tested various sensitivities about the means by which this population social network is constructed • (which data source, venue size estimates, and assumptions about geographic distribution of social network ties)
  • 24. “Population social network” • Generate “population characteristics” based on PLACE survey estimates • Add “population social network” based on RDS survey estimates
  • 25. Approach, part 2 • Simulate RDS chains over “population social network” (1000 per recruitment scenario) – Scenarios vary according to different sample recruitment assumptions • Seeding of the chain • Recruitment patterns – How much does the ideal case (random seeding and random recruitment) diverge from actual RDS seeding and recruitment matched to the Liuzhou FSW data?
  • 26. Results: Violation of SPPD assumption • Compared individual degree to the proportion of times an individual was sampled across the simulated chains – Very high correlation when seeds and referrals are random – SPPD assumption increasingly violated when seeds & referrals are matched to the actual data – Over-recruitment of middle tier sex workers drives the result • For more: – Merli, Moody, Smith et al., Social Science & Medicine, 2015 • [Figure: degree versus sampling frequency by scenario; panel correlations r=0.82, r=0.96, r=0.97]
  • 27. Distribution of RDS2-VH proportion estimates (low/middle tier) across seeding and recruitment scenarios (Verdery, Merli, Moody et al. 2015, Epidemiology)
  • 28. Variability of estimates: Design effects (ratio of variance in RDS estimates to variance in estimates from same size SRS) • DE very large, but not out of line with findings of prior work (Goel and Salganik 2010) • Large Design Effects imply that much larger sample sizes would be required to reach level of precision currently assumed from RDS samples typically in the hundreds • CDC recommends RDS sample sizes in the hundreds for public health surveillance – IMPLICATIONS: Not sufficient power to identify changes in behaviors or disease prevalence • Design effects, middle tier, by scenario: DemDem 6.18; DemRan 19.60; RanRan 28.20
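The design effect defined on this slide can be computed from a set of simulated estimates as follows. This is a minimal sketch with hypothetical function names; it assumes a binary focal variable, so that the variance of a simple random sample (SRS) proportion of size n is p(1 - p)/n, and it uses the sample variance of the simulated estimates as the numerator.

```python
from statistics import variance


def srs_variance(p, n):
    """Variance of a proportion estimated from an SRS of size n."""
    return p * (1.0 - p) / n


def design_effect(estimates, p, n):
    """Variance of the simulated RDS estimates relative to same-size SRS."""
    return variance(estimates) / srs_variance(p, n)


def effective_sample_size(n, de):
    """SRS size that would give the same precision as n RDS respondents."""
    return n / de
```

As an illustration, a design effect of 28.20 (the RanRan value reported on the slide) applied to a hypothetical sample of 500 respondents gives an effective sample size of roughly 18, which is the sense in which samples in the hundreds lack power.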
  • 29. Discussion • Seeding and recruitment scenarios – Matching on seeds not critical – Matching on recruitment patterns has a larger effect: it exacerbates biases but reduces design effects • Problematic because recruitment seems harder to control than seed selection
  • 30. Estimator performance • Estimator development – Only one (RDS1-LEN) works markedly better than others • Robust to preferential recruitment by taking into account respondents’ ego-network composition – BUT unusable for most (unobservable) characteristics we care about – Still problems with variance estimation • [Figure: Distributions of estimates of proportions in low tiers of sex work by estimator (recruitment and seeds matched to the Liuzhou FSW data); Verdery, Merli, Moody et al. 2015, Epidemiology]
  • 31. Recent innovation: IP-RDS (Verdery, Merli, Moody, In Progress) • What can be done to improve the performance of RDS estimates while retaining the method’s desirable peer-driven sample recruitment properties? • Modify RDS data collection process • Apply antithetic variate mean estimator to data • Results from simulations: Improved estimation performance
  • 32. New data collection protocol IP-RDS • Incentivize respondents to invert their preferences when choosing new respondents, i.e. respondents are asked to invert their recruitment preferences on the recruitment biasing variable (e.g. tier of sex work)
  • 36. Antithetic variate mean estimator • $\mu_{AV} = \frac{\sum_{i \in m_1} y_i}{2} + \frac{\sum_{i \in m_2} y_i}{2}$, where $y_i$ is the value of the focal variable for the $i$-th respondent, $m_1$ is the count of recruitments by members of one group of the recruitment biasing variable (e.g. tier of sex work), and $m_2$ is the count of recruitments by members of the other group
  • 37. Distributions of estimates of proportions in low/mid tiers of sex work by estimator (naïve mean, RDS2-VH, AV-IP_RDS) and level of biased recruitment behavior (absolute difference in recruitment probabilities conditional on attribute of targeted peer)
  • 38. Discussion of IP-RDS • Simple change to RDS protocol – May or may not require financial incentives for targeted recruitment (empirical question) • Outperforms conventional estimators – Gains in bias reduction comparable to RDS1-LEN estimator • Tested on more networks (similar results) • BUT not yet field tested
  • 39. Network Sampling with Memory • Mouw and Verdery 2012, Sociological Methodology • Collects network data • Introduces researcher’s control over the sampling process • Directs the recruitment process to more efficiently explore the network (avoiding bottlenecks)
  • 40. How does NSM work? • Recruitment starts with a few seed respondents • Network roster data collected from respondents about minimally identifying information of their network members (last name and last four digits of cell phone number) to connect nodes in the network (up to 10 network members per respondent) • NSM sampling algorithm selects up to 3 nominated network members per respondent and asks respondents for full contact information on these • Process proceeds iteratively to recruit new waves of respondents
  • 42. How does NSM work? • NSM sampling algorithm uses two sampling modes, List and Search • List mode – keeps a list, L, of all nominated network members – samples with replacement from L – even sampling of new nodes -- new nodes sampled at the same cumulative sampling rate as earlier nodes – as list of sampled nodes approaches the full population network, NSM sample converges to simple random sampling
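The list-mode bookkeeping described on this slide can be sketched in a few lines. This is not the published NSM algorithm: it omits search mode and the even-sampling adjustment for newly added nodes, and the function name, the adjacency-dict input, and the choice to store each nominee on the list only once are assumptions made for illustration. It only shows the core loop of keeping a list L of nominated members and drawing from it with replacement.

```python
import random


def nsm_list_mode(network, seeds, n_draws, max_nominations=10, seed=0):
    """Simplified list-mode sampling over a known network (illustration only).

    network: dict mapping each person to the contacts they would nominate.
    Returns the sequence of draws, the set of interviewed people, and L.
    """
    rng = random.Random(seed)
    roster = []            # the list L of all nominated network members
    on_roster = set()
    interviewed = set()
    draws = []

    def interview(node):
        """Collect the roster of a respondent not yet interviewed."""
        if node in interviewed:
            return
        interviewed.add(node)
        for alter in network[node][:max_nominations]:
            if alter not in on_roster:      # assumption: one entry per person
                on_roster.add(alter)
                roster.append(alter)

    for s in seeds:
        interview(s)

    for _ in range(n_draws):
        if not roster:
            break
        node = rng.choice(roster)           # sample with replacement from L
        draws.append(node)
        interview(node)                      # new nominees extend L
    return draws, interviewed, roster
```

As L grows to cover the whole connected network, each draw becomes a uniform draw from the population, which is the intuition behind the slide's statement that the sample converges to simple random sampling.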
  • 43. How does NSM work? • Search mode—look for “bridge” nodes to unexplored parts of the network. Start in search mode, then switch to list mode.
  • 44. Simulation results • Test NSM vs. RDS using 162 university and school networks from Facebook and Add Health • Size of networks ranges from 300 to 16,500 nodes • Estimate % white (Add Health) and % first year students (Facebook) • Start from a randomly selected student, repeat 500 times for each network • Calculate bias, design effects and mean absolute bias • Across the 162 test networks, DE is 1.16 for NSM vs 77.38 for RDS
  • 45. Is it feasible? • Is it feasible to collect network data on hidden populations? • 2010 NSIT (Network Survey of Immigration and Transnationalism) (Mouw, PI) • CAHS (Chinese in Africa Health Survey) (Merli, PI) • Cost effectiveness of gains in precision
  • 46. NSM field applications • Network Survey of Immigration and Transnationalism (NSIT): Mouw et al. 2014, Social Problems; Verdery et al. 2016, Social Networks – [Figure legend: Red: RDU; Blue: Mexico; Green: Houston; Small: Nominated; Large: Sampled] • Chinese in Africa Health Survey (CAHS): Merli, Verdery, Mouw, Li 2016, Migration Studies – [Figure: Network of Chinese migrants in Dar es Salaam sampled by NSM; size = probability of selecting next node]
  • 47. Key challenge: Getting referrals from respondents • NSIT required recontacting respondents to get contact information on alters • CAHS used a “forward” sampling variant (FNSM), which is more practical – Asked for contact information on a small number of alters at each interview (selected by NSM algorithm)
  • 48. NSM -- Future directions • NIH R21 grant to test NSM among Chinese immigrants in RDU (Merli, Mouw, Verdery, Moody, Keister, Sanders) – Pilot various approaches to get referrals from respondents – Evaluate NSM against ACS – Test multiple modes of data collection (in-person, telephone, web)