RDS and network sampling methods aim to sample hidden populations for which traditional sampling frames do not exist. The document discusses issues with sampling hidden populations and evaluates Respondent Driven Sampling (RDS) and a new method called Network Sampling with Memory (NSM). It finds that RDS estimates can be biased when its assumptions are violated. A new data collection method called Inverse Preferential RDS (IP-RDS) and the NSM method show promise in improving estimation through modifications to the sampling process and collection of network data. Field testing is still needed to validate these innovative approaches.
06 Network Study Design: Ethical Considerations and Safeguards (dnac)
This document outlines ethical considerations and safeguards for social network study design. It discusses principles from the Belmont Report including respect for persons, beneficence, and justice. Key risks in social network research are deductive disclosure, outing people, and legal or privacy risks from relational data. Mitigation strategies include data agreements, restricting access to identifying data, training researchers, and communicating clearly with IRBs. The document emphasizes that social network studies require safeguarding participant and alter privacy.
12 Network Experiments and Interventions: Studying Information Diffusion and ... (dnac)
This document summarizes research on studying information diffusion and collective action through network experiments and interventions. The research aims to identify optimal strategies for information dissemination for public policy by comparing the effectiveness of different dissemination methods, including using phone/IVR, government representatives, and social network seeds. It also examines how an individual's decision to participate is influenced by information and participation within their social network, and whether there are threshold or free-riding effects. The proposed experiments will randomize information dissemination methods and incentives for individuals and networks to participate in community activities across villages in India. Network and individual participation data will be collected through surveys to analyze the impact of social networks and information on collective action.
This document provides an overview of community detection in networks. It begins with an introduction to the concept of communities and their usefulness in network analysis. It then discusses two main approaches to calculating communities - descriptive methods like modularity, and generative methods like stochastic block models. The document notes that community detection is an active area of research, with opportunities to extend current methods. It provides several examples of community detection applications and acknowledges contributions from other researchers in the field.
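As a rough illustration of the descriptive approach mentioned above, the sketch below computes a modularity score for a given partition. It assumes an undirected, unweighted graph with no self-loops and uses the standard Newman-Girvan definition; the function name `modularity` and the toy graph are hypothetical and are not taken from the summarized slides.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition (hypothetical helper, standard definition).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (share of internal edges - expected share)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than placing every node in one group, which is the intuition behind using modularity to compare candidate partitions.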
13 An Introduction to Stochastic Actor-Oriented Models (aka SIENA) (dnac)
This document provides an introduction to Stochastic Actor-Oriented Models (SAOMs), also known as SIENA models. It discusses when SAOMs are appropriate to use, provides an overview of the general SAOM form, and covers key components like the network and behavior objective functions and rate functions. The presentation also outlines how SAOMs are estimated and fitted to data, provides an empirical example, and discusses extensions. SAOMs model how networks and behaviors change over time as actors make micro-level decisions to maximize their objective functions.
This document provides an overview of ego network analysis. It defines ego networks as consisting of a focal individual (ego) and the people they are connected to (alters). Various measures of ego network composition, structure, and properties can be analyzed, such as size, density, and homophily. These measures provide insight into an individual's social support and influence, and can be used to study health-related questions by examining the characteristics and behaviors present in one's social network. Ego network data is relatively easy to collect and can offer information about both individuals and inferred properties of broader social networks.
01 Introduction to Networks Methods and Measures (dnac)
This document provides an introduction to social network analysis. It discusses how networks matter through two fundamental mechanisms: connections and positions. Connections refer to the flow of things through networks, viewing networks as pipes. Positions refer to relational patterns and networks capturing role behavior, viewing networks as roles. The document also covers basic network data structures including nodes, edges, directed/undirected ties, binary/valued ties, and different levels of analysis such as ego networks and complete networks. It provides examples of one-mode and two-mode network data.
Respondent-Driven Sampling (RDS) is a method for sampling hidden populations by leveraging social networks. RDS begins with a small number of initial participants (seeds) who are given coupons to recruit a limited number of people from their social networks. Those recruits then recruit others from their own networks. The process continues in chains of referrals until the target sample size is reached. RDS aims to correct for biases through statistical weighting based on network properties like degree. Over 500 studies have used RDS across diverse fields. While RDS has enabled new insights, its assumptions of random recruitment and unbiased seeds are often violated. Estimates of sampling variance from RDS also tend to be problematic. Nonetheless, R
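The degree-based weighting mentioned in this summary can be sketched as follows. This is a minimal example of an inverse-degree weighted proportion, the form usually associated with the Volz-Heckathorn (RDS-II) estimator; the summary does not say which estimator the slides use, so treat the choice, the function name `rds_ii_estimate`, and the toy data as assumptions.

```python
def rds_ii_estimate(outcomes, degrees):
    """Inverse-degree weighted proportion (hypothetical helper).

    outcomes: 0/1 indicator of the trait for each sampled respondent.
    degrees: self-reported network size of each respondent (must be > 0).
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    # Well-connected people are more likely to be recruited, so they are
    # down-weighted in proportion to their degree.
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Toy sample: the two respondents with the trait are also the best connected.
outcomes = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]

naive = sum(outcomes) / len(outcomes)
weighted = rds_ii_estimate(outcomes, degrees)
```

In the toy sample the unweighted proportion is one half, while the weighted estimate is much lower because the high-degree respondents are assumed to be over-represented in the recruitment chains.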
This document discusses considerations for collecting social network data. It addresses network sampling approaches including ego network designs, complete network designs, and partial network designs. It also covers network measurement including name generators and interpreters. Additional topics include the number of name generators to use, whether to cap the number of alters elicited, specificity of relationship questions, and binary versus valued versus nested response options. The document aims to provide an overview of key issues to consider when gathering social network data.
This document discusses considerations for collecting social network data through surveys. It addresses research design elements like defining the relevant population boundaries and sampling approaches. For surveys specifically, it covers informed consent, name generator questions to identify social ties, response formats, and balancing depth of network detail collected versus sample size. The key challenges are defining the theoretical population of interest, collecting a sufficiently large and representative network sample, and designing survey questions that accurately capture social ties within time and resource constraints.
The document discusses different types of network experiments and interventions. It describes (1) assigning roommates randomly to manipulate networks and assess peer effects, (2) using natural experiments to manipulate exposure over existing networks, and (3) interventions that use networks to effect change. Specifically, it covers exogenous network experiments that randomly assign relationships, issues with experimental assignment, and four types of interventions: targeting individuals, segmentation, induction, and alteration.
The document discusses network diffusion and peer influence. It covers compartmental models of diffusion, how network structure affects diffusion through factors like distance, clustering, and highly connected nodes. Simulation studies show networks with shorter path distances, more independent paths between nodes, and higher clustering coefficients diffuse ideas and behaviors more quickly. The regression analysis finds these network structural characteristics strongly predict a network's relative diffusion ratio compared to random networks.
This document summarizes a study that used a stochastic actor-oriented model to analyze data from a randomized controlled trial in Tanzania. The trial examined how social networks influenced HIV testing rates among young men. Survey data on men's friendship networks and HIV testing behaviors were collected at three time points. The model estimated the effects of descriptive and injunctive social norms within friendship networks and across camps on changes in men's HIV testing from the second to third time points, while accounting for selection effects. The results provide insight into how social influence spreads within networks and impacts health behaviors over time.
This document provides an overview of different measures for analyzing social networks, including centrality measures, connectivity and cohesion measures, and roles. It discusses centrality measures like degree, closeness, betweenness, and eigenvector centrality for individual nodes. It also covers whole network measures like degree distribution, density, and centralization. The document describes local connectivity and cohesion measures including reciprocity, triad census, transitivity, and clustering coefficients. It discusses how these measures can be applied and interpreted for one-mode projections of two-mode networks.
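A few of the node-level and whole-network measures listed above can be sketched in plain Python. The example assumes a connected, undirected graph with at least two nodes and no self-loops; all function names are hypothetical, and the normalizations shown are common textbook choices rather than ones confirmed by the slides.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the summed shortest-path distance to all others."""
    n = len(adj)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        scores[source] = (n - 1) / sum(dist.values())
    return scores


def density(adj):
    """Observed ties divided by the n * (n - 1) / 2 possible ties."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


# Toy star network: node 0 is tied to each of nodes 1, 2, and 3.
star = build_adjacency([(0, 1), (0, 2), (0, 3)])
deg = degree_centrality(star)
clo = closeness_centrality(star)
dens = density(star)
```

In the star, the hub has the maximum score on both centrality measures while each leaf scores lower, even though the network as a whole is only half as dense as a complete graph.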
10 More than a Pretty Picture: Visual Thinking in Network Studies (dnac)
Visualization has been important in network science since its beginnings to make invisible structures visible. While metrics can describe networks, visualizations allow researchers to see relationships and patterns across multiple dimensions that numbers alone cannot reveal. Effective network visualizations communicate insights that would be difficult to understand otherwise, by depicting global patterns and local details simultaneously in a way that builds intuition about the network's structure and generating processes. However, challenges include lack of consistent display frameworks, integrating too much multidimensional information, and issues of scale for large and dynamic networks.
This document discusses ego network analysis and its advantages over sociocentric network analysis. It begins with an overview of ego networks and sociocentric networks. Ego networks have several practical advantages, including flexibility in data collection, broader inference potential, and the ability to examine overlapping social circles. However, ego networks also have disadvantages like inability to measure reciprocated ties and map broader social structure. The document then reviews common measures used in ego network analysis, including measures of network size, tie strength, composition, and homophily. It provides examples of how to operationalize these concepts.
Networks provide connections and positions that influence health outcomes. Social network analysis examines relationships between actors to understand how networks impact behavior. Networks matter through both connectionist mechanisms like diffusion, and positional mechanisms like social roles. Network data can be analyzed at different levels from individual ego networks to global networks, and can involve one or multiple types of relationships between nodes. Social network data is commonly represented through matrices and lists to encode network structure and allow computational analysis.
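The matrix and list representations mentioned above can be illustrated with a small sketch. The helper names and the three-node example are hypothetical; ties are treated as binary, and undirected by default.

```python
def edge_list_to_matrix(nodes, edges, directed=False):
    """Binary adjacency matrix (list of rows) from an edge list."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            # An undirected tie is recorded in both cells.
            matrix[index[v]][index[u]] = 1
    return matrix


def matrix_to_adjacency_list(nodes, matrix):
    """Adjacency list: each node mapped to the nodes it sends ties to."""
    return {nodes[i]: [nodes[j] for j, cell in enumerate(row) if cell]
            for i, row in enumerate(matrix)}


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]

matrix = edge_list_to_matrix(nodes, edges)
adj_list = matrix_to_adjacency_list(nodes, matrix)
```

An edge list is compact for sparse networks, while the matrix form makes symmetry and missing ties easy to see; both encode the same structure here.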
This document provides an overview of Respondent-Driven Sampling (RDS), a method for sampling hidden populations using their social networks. RDS begins by selecting initial participants, called "seeds", who each recruit a small number of new participants into the study. Those new participants can then recruit others, creating a chain-referral sample. The document discusses how RDS works, its applications across many hidden populations, and some of its promises and pitfalls, including high sampling variance requiring large sample sizes compared to simple random sampling. It also reviews recent progress on estimating sampling variance from RDS studies.
This document discusses different types of network experiments and interventions. It describes (1) using roommate assignments to make social connections exogenous, assessing peer effects on outcomes like GPA. It also discusses (2) natural experiments that manipulate exposure over existing networks, like popularity or voter turnout. Finally, it outlines (3) different types of network interventions, including targeting influential individuals, segmenting groups, inducing new connections, and altering network structure. The conclusion is that evidence from these experiments shows peer influence is real and we can now focus on how to leverage networks most effectively.
This document discusses Network Canvas, software for collecting social network data developed by Michelle Birkett and others. It is being used in the RADAR study to collect longitudinal network data from over 1000 young men who have sex with men in Chicago. The software was designed to be intuitive for participants to select, position, and manipulate nodes representing their social connections. It aims to capture complex network and attribute data across multiple time points. The document discusses the project workflow, comparisons to other network data collection methods, evaluation plans, and sustainability efforts through workshops and community involvement.
I. The document discusses ego networks and how they can be used to study personal networks and relationships. Ego networks combine traditional survey data with network data by collecting information about respondents (egos) and their social ties (alters).
II. Ego network data can be used to examine the effects of network structure and alter characteristics on outcomes of interest. It can also provide insights into diffusion processes within personal networks.
III. While ego network data is useful for studying local network phenomena, global network data is needed to analyze higher-level structural effects, mechanisms of tie formation, and diffusion across an entire network. Statistical techniques like randomization and the Quadratic Assignment Procedure are used to analyze ego and global network data.
Ego network analysis measures relationships between an individual (ego) and their social contacts (alters). Common measures include degree (number of alters), tie strength, multiplexity (overlap in tie functions), and alter attributes like composition, similarity to ego, and heterogeneity. Measures of relationships between alters, like density and structural holes, provide information on network constraints and opportunities. Proper data management is required to store ego, alter, and alter-alter relationships.
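Three of the measures named above (degree, density among alters, and similarity to ego) can be sketched for a single ego as follows. The data layout, the function name `ego_measures`, and the toy respondent are assumptions for illustration; density here counts only alter-alter ties and excludes ego's own ties.

```python
def ego_measures(ego_attr, alters, alter_ties):
    """Size, alter-alter density, and share of alters matching ego.

    ego_attr: ego's value on a categorical attribute.
    alters: dict mapping alter id -> that alter's value on the attribute.
    alter_ties: list of (a, b) undirected ties reported among alters.
    """
    size = len(alters)
    possible = size * (size - 1) / 2
    # Deduplicate ties reported in both directions and drop self-ties.
    unique_ties = {frozenset(t) for t in alter_ties if t[0] != t[1]}
    density = len(unique_ties) / possible if possible > 0 else 0.0
    same = sum(1 for attr in alters.values() if attr == ego_attr)
    prop_same = same / size if size > 0 else 0.0
    return {"size": size, "density": density, "prop_same": prop_same}


# Toy respondent with four named alters and two ties among them.
alters = {"a": "F", "b": "M", "c": "F", "d": "F"}
alter_ties = [("a", "b"), ("b", "c"), ("b", "a")]  # last tie is a repeat

result = ego_measures("F", alters, alter_ties)
```

Storing ego attributes, alter attributes, and alter-alter ties as three separate structures mirrors the data management point made in the summary.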
The document discusses network diffusion and peer influence. It begins by defining diffusion and compartment models used to model disease spread. It then discusses how network structure, including topology, timing of connections, and structural transmission, can impact diffusion. Simulation is proposed to test how network features like distance, clustering, redundancy, and high-degree nodes influence spread. The relationships between contact networks, exposure networks based on timing, and actual transmission networks are also introduced.
This document summarizes three types of field experiments related to social networks:
1) Peer effects experiments examine whether individual j influences the behaviors or outcomes of individual i. Examples test whether encouraging individual i to vote or buy a product also influences their friend j.
2) Network formation experiments study what factors affect whether individual i forms a network tie with individual j. Examples test how anonymity, search costs, and interactions affect network tie formation.
3) Designing networks experiments evaluate which network structures maximize outcomes at the network level. Examples design peer groups and seed farmers to test how network structure impacts behavior diffusion.
This study investigated the reciprocal relationships between social network characteristics, social support, and mental health among older adults in the United States. The study found reciprocal associations between social support and depressive symptoms, as well as between social support and certain measures of social network structure. While social support was protective of depression, depression could undermine received support over time. The strongest link between social networks and depression was indirect, through levels of social support. Future research should focus on social support as an important pathway through which social networks impact mental health.
This document discusses diffusion and peer influence through networks. It begins by defining diffusion and compartment models used to model disease spread. It then discusses how network structure, including topology, timing of connections, and clustering, can impact diffusion compared to random mixing. Key network features that influence diffusion speed and reach include distance between actors, number of alternate paths, presence of highly connected "star" nodes, and assortative mixing. The document concludes by exploring how different degree distributions in emergent low-density networks can impact the formation of large connected components.
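The random-mixing baseline that network structure is compared against can be sketched with a simple compartmental model. The example below is a deterministic, discrete-time SIR (susceptible, infected, recovered) model; the summary does not say which compartments the slides use, so the SIR form, the function name `simulate_sir`, and all parameter values are assumptions.

```python
def simulate_sir(population, initial_infected, beta, gamma, steps):
    """Discrete-time SIR under random mixing (hypothetical helper).

    beta: transmission rate per step; gamma: recovery share per step (0..1).
    Returns a list of (susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        # Under random mixing, new infections depend only on the
        # compartment sizes, not on who is connected to whom.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(population=1000, initial_infected=10,
                       beta=0.3, gamma=0.1, steps=100)
peak_infected = max(i for _, i, _ in history)
```

A network-based simulation would replace the `s * i / population` term with transmission along observed ties, which is where features such as distance, alternate paths, and high-degree nodes start to matter.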
Graph and language embeddings were used to analyze user data from Reddit to predict whether authors would post in the SuicideWatch subreddit. Metapath2vec was used to generate graph embeddings from subreddit and author relationships. Doc2vec was used to generate document embeddings based on language similarity between submissions and subreddits. Combining the graph and document embeddings in a logistic regression achieved 90% accuracy in predicting SuicideWatch posters, reducing both false positives and false negatives compared to using the embeddings separately. Next steps proposed using the embeddings to better understand similarities between related subreddits and predict risk factors in posts.
AAPOR - comparing found data from social media and made data from surveys (Cliff Lampe)
This presentation was for the 2014 AAPOR conference, and deals with specific components of how "big data" from social media is different from data acquired through surveys.
APLIC 2014 - Social Observatories Coordinating Network (APLICwebmaster)
An NSF project looks to define social science research for the 21st century. The major objective of the Social Observatories Coordinating Network (SOCN) is to continue exploring ideas about the potential form and functioning of such a network of social observatories and to actively engage individuals and groups across the SBE research community in this process.
This document summarizes research on how public health delivery systems in New York State responded to changes in cancer screening policies and the implementation of the Affordable Care Act. The research analyzed network data from 36 organizations delivering cancer screenings to the uninsured over 4 time periods from 2015 to 2017. Key findings include:
- Partners and relationships decreased initially after the ACA but then stabilized, indicating a period of adjustment. Clinical partners and information sharing increased while community partners decreased.
- Referrals within partnerships declined significantly over time, suggesting relationships took time to develop or were not maintained.
- Changes indicate the delivery systems experienced churn as organizations adjusted their partnerships in response to policy changes, but the systems did not reach a new
2) Network formation experiments study what factors affect whether individual i forms a network tie with individual j. Examples test how anonymity, search costs, and interactions affect network tie formation.
3) Designing networks experiments evaluate which network structures maximize outcomes at the network level. Examples design peer groups and seed farmers to test how network structure impacts behavior diffusion.
This study investigated the reciprocal relationships between social network characteristics, social support, and mental health among older adults in the United States. The study found reciprocal associations between social support and depressive symptoms, as well as between social support and certain measures of social network structure. While social support was protective of depression, depression could undermine received support over time. The strongest link between social networks and depression was indirect, through levels of social support. Future research should focus on social support as an important pathway through which social networks impact mental health.
This document discusses diffusion and peer influence through networks. It begins by defining diffusion and compartment models used to model disease spread. It then discusses how network structure, including topology, timing of connections, and clustering, can impact diffusion compared to random mixing. Key network features that influence diffusion speed and reach include distance between actors, number of alternate paths, presence of highly connected "star" nodes, and assortative mixing. The document concludes by exploring how different degree distributions in emergent low-density networks can impact the formation of large connected components.
Graph and language embeddings were used to analyze user data from Reddit to predict whether authors would post in the SuicideWatch subreddit. Metapath2vec was used to generate graph embeddings from subreddit and author relationships. Doc2vec was used to generate document embeddings based on language similarity between submissions and subreddits. Combining the graph and document embeddings in a logistic regression achieved 90% accuracy in predicting SuicideWatch posters, reducing both false positives and false negatives compared to using the embeddings separately. Next steps proposed using the embeddings to better understand similarities between related subreddits and predict risk factors in posts.
09 Respondent Driven Sampling and Network Sampling with Memory
1. Respondent Driven Sampling &
Network Sampling with Memory
(time permitting…)
M. Giovanna Merli
Sanford School of Public Policy &
Duke Population Research Institute (DUPRI)
Duke University
2. Funding Acknowledgements
• RDS Data Collection in China (2009-2010)
– “Place-RDS Comparison Study”
• USAID under the terms of cooperative agreements GPO-A-00-03-00003-00 and
GPO-A-00-09-00003-0 (Weir, PI)
• China National Center for STD Control (Chen, PI)
• Duke CFAR AI064518 (Merli, PI)
– “Partnership for Social Science Research on HIV/AIDS in China”
• NICHD R24 HD056670 (Henderson, PI)
• RDS Data Analyses and Simulations (2011-2015)
– “Using Multiple Data Sources to Improve RDS Estimation”
• NICHD R01HD068523 (Merli, PI)
• NSM Data Collection in Tanzania
– PFirst Award/DGHI (Merli, PI)
3. Problems with the study of hidden
populations
Female sex workers, men who have sex with men, injecting drug users,
the homeless, and undocumented migrants are examples of hidden populations
For these populations we typically want to:
• Obtain accurate and precise estimates of disease prevalence
• Discern impact on larger population health dynamics
• Identify gaps in HIV/STD prevention
Collecting data from hidden populations that support inference about the whole
population is difficult because no sampling frame exists: their members are hard
to identify
– Stigma
– Non response
– Lack of trust
– Rarity
4. Problems with the study of hidden
populations
• Convenience samples, clinic-based inquiries,
and sampling frames with limited coverage
(e.g. venue-based sampling) lack a basis for
inferring representativeness
5. Respondent Driven Sampling (RDS)
Heckathorn 1997, 2002; Salganik and Heckathorn 2004;
Volz and Heckathorn 2008
• Most popular solution to
problems of sampling
hidden populations
– 450+ studies
– 624+ papers, 10k+ citations
– Over $185 million from NIH
• Compare to “ego centric”
– 167 studies funded
– $42 million since 1990
6. How RDS works
• RDS primarily used to estimate population proportions of binary
nodal covariates (e.g. gender, infection status, tier of sex work, etc.)
• Leverages social network of respondents to recruit other
respondents
• Chain referral / peer recruitment / link tracing sampling strategy
– “Seed” participants (selected by convenience) receive coupons (2)
– Recruit 2-3 new participants each
– Each new respondent given 2-3 coupons to recruit others
– Recruitment incentives for participating and for successful recruitment
– No one participates more than once
– Process continues until desired sample size is obtained
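The recruitment steps above can be sketched as a small simulation. This is a minimal illustration, not the procedure used in any of the studies cited here: the function name `simulate_rds`, the adjacency-list input, and the toy network are all hypothetical, and recruiters are assumed to hand coupons to a random subset of contacts who have not yet participated.

```python
import random
from collections import deque


def simulate_rds(graph, seeds, coupons=2, target_n=10, rng=None):
    """Simulate chain-referral recruitment on an undirected graph.

    graph: dict mapping each node to a list of its neighbours (hypothetical input).
    Returns a list of (recruit, recruiter) pairs; seeds have recruiter None.
    """
    rng = rng or random.Random(0)
    sampled = set(seeds)                      # no one participates more than once
    records = [(s, None) for s in seeds]
    queue = deque(seeds)                      # respondents who still hold coupons
    while queue and len(records) < target_n:
        recruiter = queue.popleft()
        # Only contacts who have not yet participated can redeem a coupon.
        eligible = [v for v in graph[recruiter] if v not in sampled]
        rng.shuffle(eligible)                 # assumed: non-preferential choice
        for recruit in eligible[:coupons]:
            if len(records) >= target_n:
                break
            sampled.add(recruit)
            records.append((recruit, recruiter))
            queue.append(recruit)
    return records


# Toy network (assumption): 12 people on a ring, each also tied to the
# person directly opposite, so every node has degree 3.
toy_graph = {i: [(i - 1) % 12, (i + 1) % 12, (i + 6) % 12] for i in range(12)}
sample = simulate_rds(toy_graph, seeds=[0], coupons=2, target_n=8)
for recruit, recruiter in sample:
    print(recruit, "recruited by", recruiter)
```

The recruiter is stored alongside each recruit because the RDS1-type estimators need to know who recruited whom.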
15. Problems with estimation in link tracing
sampling designs of hidden populations
• Sampling frame
unavailable
• Sample inclusion
probabilities are not
known (hence sampling
weights unknown)
• Researchers have limited
control of the sampling
process
• Seed respondents not
chosen at random
16. RDS solution
• Sampling probabilities computed under an approximation of
the true sampling process
– RDS assumes non-seed participants are sampled with probability
proportional to self-reported degree (SPPD)
– Provable in a random walk on most graphs of interest
– Sampling probabilities approximated by degree, hence sampling
weight = 1/degree
• Weighting/estimation can yield asymptotically unbiased
estimates of the population mean
• SPPD assumption underpins much of RDS estimation claims
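The claim that a random walk visits nodes in proportion to their degree can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the four-node network is made up, the walk is with replacement on a connected, non-bipartite undirected graph, and `stationary_distribution` is a hypothetical helper that simply iterates the walk's transition rule.

```python
def stationary_distribution(graph, start, steps=2000):
    """Propagate the probability mass of a simple random walk for `steps` steps.

    graph: dict mapping each node to a list of neighbours (undirected).
    start: node where the walk begins (plays the role of a seed).
    """
    prob = {v: 0.0 for v in graph}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in graph}
        for v, neighbours in graph.items():
            share = prob[v] / len(neighbours)   # move to a uniformly chosen neighbour
            for u in neighbours:
                nxt[u] += share
        prob = nxt
    return prob


# Toy network (assumption): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
total_degree = sum(len(nbrs) for nbrs in toy_graph.values())

pi = stationary_distribution(toy_graph, start=3)
for v in sorted(toy_graph):
    degree_share = len(toy_graph[v]) / total_degree
    print(v, round(pi[v], 6), round(degree_share, 6))
```

In this toy case the long-run visit probabilities match degree divided by total degree regardless of the starting node, which is the motivation for using 1/degree as the sampling weight.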
17. RDS estimators
Estimator, proportion equation, and notes:
• Naïve: $\hat{p} = \left(\sum_{i \in \chi} x_i\right) n^{-1}$
– $x_i$ is the value of the focal variable for respondent $i$; $n$ is the sample size
• RDS1-SH: $\hat{p} = S_{0,1}\, d_0 \left(S_{0,1}\, d_0 + S_{1,0}\, d_1\right)^{-1}$
– $S_{a,b}$ is the estimated proportion of recruitments from group $a$ to $b$; $d_a$ is the estimated average degree in each group (Salganik and Heckathorn 2004)
• RDS1-LEN: $\hat{p} = S^{ego}_{0,1}\, d_0 \left(S^{ego}_{0,1}\, d_0 + S^{ego}_{1,0}\, d_1\right)^{-1}$
– $S^{ego}_{a,b}$ is the estimated proportion of network ties from group $a$ to $b$ based on ego network reports (Lu 2013)
• RDS2-VH: $\hat{p} = \left(\sum_{i \in \chi} x_i\, d_i^{-1}\right) \left(\sum_{i \in \chi} d_i^{-1}\right)^{-1}$
– $d_i^{-1}$ is the inverse of self-reported degree for person $i$ (Volz and Heckathorn 2008)
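As a rough illustration of how these estimators are computed, the following sketch implements the naïve mean, RDS2-VH, and the RDS1 ratio form. The function names and toy data are assumptions made for the example; the RDS1 function takes the transition proportions and group average degrees as given inputs and does not show how those quantities are themselves estimated.

```python
def naive_estimate(x):
    """Unweighted sample proportion of the binary focal variable."""
    return sum(x) / len(x)

def rds2_vh_estimate(x, d):
    """RDS2-VH: inverse-degree weighted proportion.

    x -- focal variable values (0/1), one per respondent
    d -- self-reported degrees, same order as x (all > 0)
    """
    numerator = sum(xi / di for xi, di in zip(x, d))
    denominator = sum(1 / di for di in d)
    return numerator / denominator

def rds1_estimate(s01, s10, d0, d1):
    """RDS1 ratio form: proportion in group 1.

    s01, s10 -- estimated proportions of recruitments (RDS1-SH) or of
                ego-network ties (RDS1-LEN) from group 0 to 1 and 1 to 0
    d0, d1   -- estimated average degrees of groups 0 and 1
    """
    return s01 * d0 / (s01 * d0 + s10 * d1)

# Toy data (hypothetical): group 1 members report lower degree than group 0 members.
x = [1, 0, 1, 0]
d = [2, 4, 2, 4]
p_naive = naive_estimate(x)        # 0.5
p_vh = rds2_vh_estimate(x, d)      # 2/3: low-degree respondents are weighted up
p_rds1 = rds1_estimate(s01=0.2, s10=0.4, d0=4, d1=2)  # 0.5
```

In the toy data, RDS2-VH moves the estimate above the naïve value because group 1 members have lower degree and are therefore assumed to be under-sampled.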
18. In RDS, all approximations are subject to critical
assumptions that are often not met in the field
• About the unobserved sample recruitment process (most crucial)
– Respondent gives a coupon to a friend
– Respondents recruit new participants non-preferentially from amongst their
social contacts (each friend has an equal chance of being picked)
– The initial set of respondents (“seeds”) are drawn with random probabilities
– Respondents report their number of ties accurately (how many people you
know that are members of the population of interest?)
• About the social network structure
– Rapid mixing: The chain referral process converges very quickly to the
stationary distribution of a random walk (i.e. node selection probabilities are
independent of sample starting point)
– Connectedness: The target population must be connected by a network that
consists of a single component
– Network size: Network must be sufficiently large (sampling fraction small) that
sampling without replacement can be treated as if it is equivalent to sampling
with replacement
19. Prior evaluations of RDS
• Comparison of RDS estimates to known parameters of non-
hidden populations
– (Wejnert 2009; Wejnert & Heckathorn 2008; McCreesh et al. 2012)
• Test effects of violating RDS assumptions about social
network structure on synthetic populations
– (Gile & Handcock 2010; Thomas & Gile 2011; Lu et al. 2011)
• Examine effects of network structure in multiple empirical
settings with theoretical/ideal RDS samples
– (Goel & Salganik 2010; Mouw & Verdery 2012; Verdery, Mouw et al. 2015)
• Use full information on participants’ recruitment behavior to
evaluate non-preferential recruitment assumption
– (Yamanis, Merli, Neely et al. Sociological Methods and Research 2013)
20. RDS evaluation in the context of
Female Sex Workers in Liuzhou, China
• Evaluate SPPD assumption and
population coverage (Merli, Moody, Smith et
al., 2015 Social Science and Medicine)
• Evaluate performance of RDS
estimators (Verdery, Merli, Moody et al., 2015
Epidemiology)
• Propose RDS data collection
innovation to improve estimator
performance (Verdery, Merli, Moody, In
Progress)
• Evaluations with a simulation
approach grounded in empirical data
from a hidden population of FSWs in
China (Liuzhou, Guangxi Province)
(Weir, Merli, Li et al. 2012, Sexually Transmitted
Infections)
21. Data
• Two sources
– RDS: 583 FSWs (Oct. 2009 – Feb. 2010) (about 8% of total
FSW population in Liuzhou)
– PLACE (venue based sampling approach): 161 FSWs (Nov.
2009 – Mar. 2010)
• Same target population and inclusion definition
– Women who reside in Liuzhou who exchanged sex for money in last 4 weeks
• Same geographic area and similar time period
• Same measurement of key variables
– Test for biomarker of lifetime exposure to syphilis and core questionnaire
• Same face-to-face interview and common applicant pool for interviewers
• Rare to have two concurrent surveys in same population!
22. Description of the Liuzhou RDS sample
Tier of sex work | Venues where clients are solicited | RDS (N = 576)
High | Karaoke bars, star hotels, discos, night clubs | 250
Middle | Hair salons, saunas, massage parlors, foot cleaning/massage, bathhouses | 268
Low | Streets, parks, other public spaces | 27
Non-venue based | Telephone, text, internet, private referrals | 31
Fisher and Merli 2014, Network Science.
23. Approach, part 1
• Construct “population social network” from data
collected in RDS and PLACE
– Used new methodologies for estimating social network
parameters and simulating population network
• Use Case Control Logistic Regression to estimate homophily
parameters from the RDS data (Smith, SM 2012)
• Use Exponential Random Graph Modeling to generate full
network from local structural features (ERGM; Handcock et al., JOSS 2008)
– Tested various sensitivities about the means by which
this population social network is constructed
• (which data source, venue size estimates, and assumptions
about geographic distribution of social network ties)
24. “Population social network”
Generate “population characteristics”
based on PLACE survey estimates
Add “population social network”
based on RDS survey estimates
25. Approach, part 2
• Simulate RDS chains over “population social
network” (1000 per recruitment scenario)
– Scenarios vary according to different sample
recruitment assumptions
• Seeding of the chain
• Recruitment patterns
– How much does the ideal case (random seeding
and random recruitment) diverge from actual RDS
seeding and recruitment matched to the Liuzhou
FSW data?
26. Results:
Violation of SPPD assumption
• Compared individual degree to
the proportion of times an
individual was sampled across
the simulated chains
– Very high correlation when
seeds and referrals are random
– SPPD assumption increasingly
violated when seeds & referrals
are matched to the actual data
– Over-recruitment of middle tier
sex workers drives the result
• For more:
– Merli, Moody, Smith et al.,
Social Science & Medicine,
2015
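The comparison described above (individual degree against the share of simulated chains in which each individual appears) can be sketched as follows. The function names, the list-of-lists representation of chains, and the toy numbers are assumptions for illustration only and are not taken from the cited paper.

```python
from collections import Counter
from math import sqrt

def inclusion_proportions(chains, nodes):
    """Share of simulated chains in which each node was sampled."""
    counts = Counter()
    for chain in chains:
        counts.update(set(chain))  # count each node at most once per chain
    return {v: counts[v] / len(chains) for v in nodes}

def pearson_r(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Toy input (hypothetical): three simulated chains over a four-node star network.
degree = {0: 1, 1: 3, 2: 1, 3: 1}
chains = [[0, 1], [1, 2], [1, 3]]

nodes = sorted(degree)
props = inclusion_proportions(chains, nodes)
r = pearson_r([degree[v] for v in nodes], [props[v] for v in nodes])
```

A correlation near 1 indicates that sampling is close to proportional to degree, as SPPD assumes; lower values signal departures from that assumption.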
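[Figure: individual degree vs. proportion of times sampled across simulated chains; panel correlations r = 0.82, r = 0.96, r = 0.97. Merli, Moody, Smith et al., SSM, 2015]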
27. Distribution of RDS2-VH proportion estimates
(low/middle tier) across seeding and recruitment
scenarios
Verdery, Merli, Moody et al. 2015, Epidemiology
28. Variability of estimates: Design effects
(ratio of variance in RDS estimates to variance in estimates from same size SRS)
• DE very large, but not out of line with findings of prior work (Goel
and Salganik 2010)
• Large Design Effects imply that much larger sample sizes would
be required to reach level of precision currently assumed from
RDS samples typically in the hundreds
• CDC recommends RDS sample sizes in the hundreds for public
health surveillance – IMPLICATIONS: Not sufficient power to
identify changes in behaviors or disease prevalence
Design effects by seeding and recruitment scenario:
Tier | DemDem | DemRan | RanRan
Middle Tier | 6.18 | 19.60 | 28.20
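To make the design-effect definition concrete, here is a minimal sketch. It assumes the simple-random-sample variance of a proportion is p(1 - p)/n (ignoring any finite population correction) and that the variance of the RDS estimator is taken across repeated simulated samples; the function names and the sample size of 500 are hypothetical.

```python
from statistics import variance

def design_effect(rds_estimates, true_p, n):
    """Ratio of RDS estimator variance to SRS variance at the same sample size.

    rds_estimates -- proportion estimates from repeated simulated RDS samples
    true_p        -- population proportion (known in a simulation)
    n             -- size of each sample
    """
    var_rds = variance(rds_estimates)   # spread across simulated samples
    var_srs = true_p * (1 - true_p) / n  # assumed SRS benchmark, no FPC
    return var_rds / var_srs

def effective_sample_size(n, de):
    """SRS sample size that would give the same precision as n RDS respondents."""
    return n / de

de_toy = design_effect([0.4, 0.5, 0.6], true_p=0.5, n=100)  # about 4.0

# With the middle-tier design effects reported above and a hypothetical n of 500:
ess_low_de = effective_sample_size(500, 6.18)    # about 80.9
ess_high_de = effective_sample_size(500, 28.20)  # about 17.7
```

Read this way, a design effect of 28.20 means that 500 RDS respondents carry roughly the precision of 18 respondents from a simple random sample.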
29. Discussion
• Seeding and recruitment scenarios
– Matching on seeds not critical
– Matching on recruitment patterns has a larger
effect, exacerbates biases but reduces design
effects
• Problematic because recruitment patterns seem harder to control than
seed selection
30. Estimator performance
• Estimator development
– Only one (RDS1-LEN) works
markedly better than
others
• Robust to preferential
recruitment by taking into
account respondents’ ego-
network composition
– BUT unusable for most
(unobservable)
characteristics we care
about
– Still problems with variance
estimation
Verdery, Merli, Moody et al. 2015, Epidemiology
Distributions of estimates of proportions in low
tiers of sex work by estimator (recruitment and
seeds matched to the Liuzhou FSW data)
31. Recent innovation: IP-RDS
(Verdery, Merli, Moody, In Progress)
• What can be done to improve the performance of RDS
estimates while retaining the method’s desirable peer-
driven sample recruitment properties?
• Modify RDS data collection process
• Apply antithetic variate mean estimator to data
• Results from simulations: Improved estimation
performance
32. New data collection protocol
IP-RDS
• Incentivize respondents to invert their
preferences when choosing new respondents,
i.e. respondents are asked to invert their
recruitment preferences on the recruitment
biasing variable (e.g. tier of sex work)
36. Antithetic variate mean estimator
• $\mu_{AV} = \frac{\sum_{i \in m_1} y_i}{2} + \frac{\sum_{i \in m_2} y_i}{2}$, where
$y_i$ is the value of the focal variable for the $i$-th
respondent
$m_1$ is the count of recruitments by members of
one group of the recruitment biasing variable
(e.g. tier of sex work), and $m_2$ is the count of
recruitments by members of the other group
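One plausible reading of this estimator is sketched below. It rests on an assumption that the slide does not spell out: each of the two terms is taken to be the mean of the focal variable among respondents recruited by one group, so that the estimator is the simple average of two group-specific means. The function name and the toy data are hypothetical, and this should not be read as the authors' exact implementation.

```python
def antithetic_variate_mean(y, recruiter_group):
    """Average of two recruit means, one per recruiter group.

    y               -- focal variable values of recruited respondents
    recruiter_group -- group (0 or 1) of each respondent's recruiter,
                       on the recruitment biasing variable
    ASSUMPTION: each term is normalised by the number of recruitments
    made by that group; this normalisation is an interpretation.
    """
    by_group0 = [yi for yi, g in zip(y, recruiter_group) if g == 0]
    by_group1 = [yi for yi, g in zip(y, recruiter_group) if g == 1]
    if not by_group0 or not by_group1:
        raise ValueError("need recruitments from both recruiter groups")
    mean0 = sum(by_group0) / len(by_group0)
    mean1 = sum(by_group1) / len(by_group1)
    return mean0 / 2 + mean1 / 2

# Toy data (hypothetical): three recruits by each recruiter group.
y = [1, 1, 0, 0, 1, 0]
recruiter_group = [0, 0, 0, 1, 1, 1]
mu_av = antithetic_variate_mean(y, recruiter_group)  # (2/3)/2 + (1/3)/2 = 0.5
```

The intuition is that if the two recruiter groups err in opposite directions under inverted recruitment preferences, averaging their recruit means lets those errors partly cancel.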
37. Distributions of estimates of proportions in low/mid tiers of sex work
by estimator (naïve mean, RDS2-VH, AV-IP_RDS) and level of biased
recruitment behavior (absolute difference in recruitment probabilities
conditional on attribute of targeted peer)
38. Discussion of IP-RDS
• Simple change to RDS protocol
– May or may not require financial incentives for
targeted recruitment (empirical question)
• Outperforms conventional estimators
– Gains in bias reduction comparable to RDS1-LEN
estimator
• Tested on more networks (similar results)
• BUT …Not yet field tested
39. Network Sampling with Memory
• Mouw and Verdery 2012, Sociological
Methodology
• Collects network data
• Introduces researcher’s control over the
sampling process
• Directs the recruitment process to more
efficiently explore the network (avoiding
bottlenecks)
40. How does NSM work?
• Recruitment starts with a few seed respondents
• Network roster data collected from respondents about
minimally identifying information of their network members
(last name and last four digits of cell phone number) to
connect nodes in the network (up to 10 network members per
respondent)
• NSM sampling algorithm selects up to 3 nominated network
members per respondent and asks respondents for full contact
information on these
• Process proceeds iteratively to recruit new waves of
respondents
42. How does NSM work?
• NSM sampling algorithm uses two sampling
modes, List and Search
• List mode
– keeps a list, L, of all nominated network members
– samples with replacement from L
– even sampling of new nodes -- new nodes sampled at
the same cumulative sampling rate as earlier nodes
– as list of sampled nodes approaches the full population
network, NSM sample converges to simple random
sampling
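The core of list mode can be illustrated with a deliberately simplified sketch. It keeps a list of nominated network members and draws from it uniformly with replacement, interviewing a drawn member the first time they are selected. It does not implement the even-sampling adjustment or search mode, and the function name, input format, and toy nominations are assumptions for the example rather than the published algorithm.

```python
import random

def list_mode_sample(nominations, seeds, n_draws, max_alters=10, rng=None):
    """Simplified list-mode sampling (illustrative only).

    nominations -- dict: person -> list of network members they would nominate
    seeds       -- initial respondents
    n_draws     -- number of draws made from the list L
    max_alters  -- roster cap per respondent (10 in the protocol described above)
    Returns (draws, interviewed): every draw made, and the distinct
    respondents interviewed, in order.
    """
    rng = rng or random.Random()
    roster = []          # the list L of all nominated network members
    on_roster = set()
    interviewed = []
    interviewed_set = set()

    def interview(person):
        interviewed.append(person)
        interviewed_set.add(person)
        for alter in nominations.get(person, [])[:max_alters]:
            if alter not in on_roster:
                on_roster.add(alter)
                roster.append(alter)

    for s in seeds:
        if s not in interviewed_set:
            interview(s)

    draws = []
    for _ in range(n_draws):
        if not roster:
            break
        pick = rng.choice(roster)  # sampling with replacement from L
        draws.append(pick)
        if pick not in interviewed_set:
            interview(pick)        # new interview extends the list
    return draws, interviewed

# Toy nominations (hypothetical).
toy_nominations = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
draws, interviewed = list_mode_sample(toy_nominations, seeds=["a"], n_draws=20,
                                      rng=random.Random(7))
```

Because every draw comes from the accumulated list rather than from the current respondent's contacts alone, the sample is less tied to the local neighbourhood of the most recent recruit.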
43. How does NSM work?
• Search mode—look for “bridge” nodes to
unexplored parts of the network. Start in
search mode, then switch to list mode.
44. Simulation results
• Test NSM vs. RDS using 162 university and school
networks from Facebook and Add Health
• Size of networks ranges from 300 to 16,500 nodes
• Estimate % white (Add Health) and % first year students
(Facebook)
• Start from a randomly selected student, repeat 500
times for each network
• Calculate bias, design effects and mean absolute bias
• Across the 162 test networks, the design effect (DE) is 1.16 for NSM vs. 77.38 for RDS
45. Is it feasible?
• Is it feasible to collect network data on hidden
populations?
• 2010 NSIT (Network Survey of Immigration and
Transnationalism) (Mouw, PI)
• CAHS (Chinese in Africa Health Survey) (Merli, PI)
• Cost effectiveness of gains in precision
46. NSM field applications
Network Survey of Immigration and
Transnationalism (NSIT)
Mouw et al. 2014. Social Problems;
Verdery et al. 2016. Social Networks
Chinese in Africa Health Survey (CAHS)
Merli, Verdery, Mouw, Li 2016. Migration Studies
[Figure legend: node color: red = RDU, blue = Mexico, green = Houston; node size: small = nominated, large = sampled]
[Figure: Network of Chinese migrants in Dar es Salaam sampled by NSM; node size = probability of selecting next node]
47. Key challenge: Getting referrals from
respondents
• NSIT required recontacting respondents to get
contact information on alters
• CAHS -- “forward” sampling variant (FNSM)—
more practical
– Asked for contact information on a small number
of alters at each interview (selected by NSM
algorithm)
48. NSM -- Future directions
• NIH R21 grant to test NSM among Chinese
immigrants in RDU (Merli, Mouw, Verdery,
Moody, Keister, Sanders)
– Pilot various approaches to get referrals from
respondents
– Evaluate NSM against ACS
– Test multiple modes of data collection (in-person,
telephone, web)