This document provides an overview of survey sampling techniques. It discusses why samples are used instead of censuses, defines key terms like population and sample, and describes different probability and non-probability sampling methods. As sample size increases, the sample mean tends to converge on the population mean (the law of large numbers), and the central limit theorem states that the distribution of sample means approaches a normal distribution centered on the population mean. There are sampling errors from using a sample instead of a census, as well as non-sampling errors from things like measurement error. Steps in sampling include defining the population, finding a sampling frame, drawing a random sample, and ensuring it is representative and unbiased.
2. Survey Research
• Employs questionnaires
and interviews to ask
people to provide
information about
themselves - their
attitudes, beliefs,
opinions, demographics,
behaviors, etc...
3. Why not a Census
Samples are used instead of collecting data from the
entire population (that is, conducting a census) because
sampling
• is less costly
• can be completed in less time
• is the only choice when it is not possible to measure
the entire population
4. Definitions
• Sample is a subset of a
population
• Sampling is the process of
choosing members of a
population to be included in the
sample
• Census is a sample consisting
of the entire population
6. Central Limit Theorem
• The sample mean estimates the
population mean; over many samples,
sample means are approximately
normally distributed around it
• Standard error
• The extent to which the
sample will be like the
population
• Probability theory says
that about 95% of sample
means fall within 2 standard
errors of the population mean
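The idea on this slide can be checked with a small simulation. The sketch below is illustrative only: the skewed population, the sample size of 100, and the number of repeated samples are all assumptions, not values from the slides. It draws many simple random samples, compares the spread of their means with the theoretical standard error (population SD divided by the square root of n), and counts how many sample means land within 1.96 standard errors of the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population: 100,000 values from an exponential distribution.
population = [random.expovariate(1 / 50) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_means(n, draws=2_000):
    """Return the means of `draws` simple random samples of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(draws)]


n = 100
means = sample_means(n)
standard_error = pop_sd / n ** 0.5      # theoretical standard error of the mean
observed_se = statistics.stdev(means)   # spread of the simulated sample means

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - pop_mean) <= 1.96 * standard_error for m in means) / len(means)

print(f"population mean : {pop_mean:.2f}")
print(f"theoretical SE  : {standard_error:.2f}")
print(f"observed SE     : {observed_se:.2f}")
print(f"within 1.96 SE  : {within:.1%}")
```

Even though the population itself is far from normal, the share of sample means inside the 1.96 standard error band should come out close to 95%.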
7. Sampling
and non-sampling errors
• Sampling Error
• Coverage error reflects the extent to which your frame
misses your target
• Non-sampling error results from selection bias or
measurement error (or other sources)
9. Steps in sampling
1. Define population
2. Find sampling frame
• List of elements from which the sample is going to
be selected
3. Draw sample randomly.
11. Probability Sampling
• Population is known
• Avoid researchers’ conscious or unconscious biases
• Random sample from a list containing the names of
everyone in the population
• To select a set of elements from a population in such a
way that accurately portrays the total population
• Each element has a known, nonzero chance of selection
(an equal chance in a simple random sample)
12. Selection Biases
• Convenience sample: easiest to reach available units;
may not reflect harder to reach or non-responding units
• Judgment or subjective sample selected by a researcher
• Multiplicity listings in the sampling frame (example:
multiple phone numbers)
• Substitution: if a person is not available, ask a family
member or select another available person
• Allowing sample to consist entirely of volunteers
13. Sampling frame weaknesses
• Undercoverage is failure to include all units from the
target population in the sampling frame (example:
people without land telephone lines, unlisted
numbers; gated communities; prisons; hospitals;
internet surveys)
14. Types of Probability Samples
• Simple random sample
• Systematic sample
• Stratified random sample
• Cluster sample
15. Simple Random Sample
• All individuals in
population have an equal
probability of being
sampled
• Units are selected at
random
16. Systematic Sample
• Starting point is selected from a
list of population units using a
random number
• That unit, and every k-th unit on
the list thereafter are chosen to
form the sample
• Sample elements are equally
spaced in the list
• When units in the population list
are mixed (random order), then
systematic sample behaves much
like SRS
• When population list is not
mixed, then systematic sample
may not be representative of the
population (e.g., order bias)
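The systematic procedure described on this slide can be sketched in a few lines. This is a minimal illustration, not a production routine: the function name `systematic_sample` and the roster of 1,000 unit IDs are hypothetical, and the interval k is taken as the list length divided by the sample size, rounded down.

```python
import random


def systematic_sample(units, n, rng=random):
    """Pick a random start in the first interval, then every k-th unit after it."""
    k = len(units) // n          # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds list length")
    start = rng.randrange(k)     # random start within the first k units (0-based)
    return units[start::k][:n]   # equally spaced units, trimmed to n


roster = list(range(1, 1001))    # hypothetical list of 1,000 unit IDs
sample = systematic_sample(roster, 100, random.Random(7))
print(sample[:5])
```

If the roster is sorted in a way that is related to the outcome being studied, shuffling it first makes the result behave like a simple random sample.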
17. Stratified Random Sampling
• Population is divided into groups
called strata
• SRS is selected from each stratum
• Strata are often subgroups of interest
such as males or females, regions of
the country, companies of certain size
• Examples: majors, ethnic group
• Random sample within strata
• Oversample to get a good
representation of small groups
18. Cluster sampling
• Used when you can’t identify
individuals
• Randomly sample clusters of
people in identifiable groups (e.g.,
classes in a school, churches,
families, etc.)
• Collect data from all people within
the clusters that are sampled
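A one-stage cluster sample as described on this slide might look like the sketch below. The school with 16 classes of 25 students, and the helper name `cluster_sample`, are assumptions made for illustration. The random draw happens at the cluster level, and then every member of each chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical school: 16 classes of 25 students each.
classes = {
    f"class_{i:02d}": [f"c{i:02d}_s{j:02d}" for j in range(25)]
    for i in range(16)
}

chosen, members = cluster_sample(classes, 3, random.Random(1))
print(chosen)
print(len(members))  # 3 classes x 25 students = 75
```

Note that no list of individual students is needed before sampling; only the list of classes is.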
19. Sampling
• Number name
• Random number
• Alphabetical list
(systematic)
• Divide the population in
strata
• 600 men & 400 women
• 100 = n
• 600 X .10 and 400 X .10
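The arithmetic on this slide is proportional allocation: each stratum contributes the same fraction (here 10%) of its members. The sketch below assumes the two strata are 600 men and 400 women (the label of the second stratum is an assumption) and a total sample of n = 100; the function name and the generated ID lists are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum = stratum size x (n / N), rounded to a whole number."""
    total = sum(stratum_sizes.values())
    return {name: round(size * n / total) for name, size in stratum_sizes.items()}


strata = {"men": 600, "women": 400}   # assumed stratum labels and sizes
allocation = proportional_allocation(strata, 100)
print(allocation)  # {'men': 60, 'women': 40}

# Draw a simple random sample of the allocated size within each stratum.
rng = random.Random(3)
frame = {name: [f"{name}_{i}" for i in range(size)] for name, size in strata.items()}
sample = {name: rng.sample(frame[name], allocation[name]) for name in strata}
print({name: len(ids) for name, ids in sample.items()})
```

With stratum sizes that do not divide evenly, the rounded allocations may not sum exactly to n and would need a small adjustment.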
21. Nonprobability Sampling
• A type of sampling procedure in which one cannot
specify the probability that any member of the
population will be included in the sample
• Accidental or convenience sample
• Volunteers
• Introduces biases – big problem when people select
themselves to be part of the survey (return a
magazine survey, for example)
22. Snowball sample
• Hard to reach or identify population
• Identify a few people who recommend other people
25. Exemplary Surveys
• Detroit Area Survey (DAS)
• European Social Survey
(ESS)
• General Social Survey (GSS)
• International Social Survey
Programme (ISSP)
• Los Angeles Family and
Neighborhood Survey (LA
FANS)
• National Longitudinal Study
of Adolescent Health (Add
Health)
• Panel Study of Income
Dynamics (PSID)
• Phoenix Area Social Survey
(PASS)
• Project on Human
Development in Chicago
Neighborhoods (PHDCN)
• World Values Survey (WVS)
27. Study design
• Cross-sectional
• Longitudinal
• Over time
• Trend
• Different samples at different points
• Panel
• Same sample over different points
28. Response Rate
• Why people don’t respond
• Participants may never receive the invitation to participate
• Some people simply refuse
• Some people are physically or mentally unable to do so
• Why people do respond
• Costs vs. rewards of participating
• Individual interest in research project
• Previous participation in research
29. Item nonresponse
• Eyes skip items while reading
• Interviewers can either foster pro-response or anti-response
behavior in participants
• Topic of the item (more sensitive items are responded to less
often)
• Survey structure
• Difficulty of the item
• Institutional requirements and policies
• Personal reasons
32. Questionnaire Development
• Question order
• Design
• Group items with like anchors together
• Keep like questions and contexts together
• Place very sensitive questions toward the middle &
end
34. Pilot Test
• Use actual survey population members
• Anticipate survey context
• Test parts of the survey
• Determining a pilot sample size
• Ask questions after someone completes the survey
• See how the data falls
35. Evaluate Questions
• Simple and direct
• Avoid double-barreled
• Avoid loaded questions
• Avoid negative wording
36. Survey Exercise
• Book exercises
• Evaluate and adjust the survey based on your
understanding of a quality questionnaire
37. Errors
• A scale to measure a latent variable
• Recall bias
• Social desirability
• Neutral option
• Forced-choice answers
Development of questionnaire design, sampling methods, and data collection methods… Survey research has typically focused on the measurement of attitudes and opinions… if that is what you are interested in… then survey research is the best methodological approach. Measuring intelligence was the first effort at question wording.
However, you do not need inferential statistics if you are not making inferences to the population. A breakthrough in survey research came with probability sampling…
Research uses data from a sample to make inferences about a population
Every member has an equal chance of being selected.
The standard deviation measures deviation from the mean. Thus, the larger the deviations from the mean, the larger the SD; the smaller the deviations, the smaller the SD. For a given sample size, the larger the SD, the less accurately our sample reflects the population. We have to depend upon probability theory. If the sample is distributed the same way as the population, your sample should represent the population… statistics summarize the population. The degree of confidence we place in the responses people give us. Convention tells us we want a 95% level of confidence that our sample represents the population.
Errors from your estimation procedure are of two types… to what extent the population mean differs from the sample mean… Allows us to use statistics as summaries of the population. Non-sampling error: happiness scale… know something about the items’ measurement error. Sampling error: some people are missed in the sampling process, or have too low or too high a chance of being selected.
N = 1,500 to represent the nation for precision… 400. Reduce random error. In a probability sample, it depends upon three factors. Larger sample sizes provide more accurate estimates of the characteristics of the population. Confidence interval: specifies where the population value probably lies. The confidence level is the probability that the population value lies within the confidence interval. Example: 95% confident that between 35 and 45 percent of all voters favor candidate A. Greater diversity equates to greater chance for error.
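To see why sample sizes such as 400 and 1,500 come up, it may help to compute the margin of error for a proportion. The sketch below uses the usual normal-approximation formula for a simple random sample, z × sqrt(p(1 − p)/n), with z = 1.96 for a 95% confidence level. The function names are hypothetical, and p = 0.5 is chosen only because it gives the widest interval.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion (SRS)."""
    return z * math.sqrt(p * (1 - p) / n)


def confidence_interval(p, n, z=1.96):
    """Lower and upper bounds of the interval around a sample proportion p."""
    moe = margin_of_error(p, n, z)
    return p - moe, p + moe


for n in (400, 1500):
    print(f"n={n}: +/- {margin_of_error(0.5, n) * 100:.1f} percentage points")
# n=400: +/- 4.9 percentage points
# n=1500: +/- 2.5 percentage points
```

Roughly quadrupling the sample size only halves the margin of error, which is why national surveys rarely go far beyond a few thousand respondents.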
Who are you trying to measure? How will the potential sample members be identified and selected? How will you recruit your sample? How well a sample represents a population depends on the sampling frame: the list from which people are sampled. When a list was obtained is an important factor in designing a study. Examples: organizational employees, newspapers, professional associations. Easier lists are chosen over harder ones (Babbie, p. 208). Everyone has an equal chance of being included.
Two types of sampling… statistics applied to probability samples
Avoid researchers’ conscious or unconscious biases… all elements should have an equal chance of being selected. Statistical analysis rests on the normal distribution. Each element has a known probability of being sampled. Drawing a random sample from a known population. You have to know the population that you are sampling. If you don’t, as with a nonprobability sample, you cannot use statistics to analyze the data because you are not generalizing to the population.
Sampling protects us from our biases and ensures our statistics are accurate. You can always use them, but if your sample is not representative, your statistics are meaningless. The sample should closely reflect the population; otherwise there is sampling bias, as when some members are given no chance of being selected. Sampling bias is removed when all members get an equal chance of being selected. A sample could be inherently biased: people on the street; a list of phone numbers where some people have more than one number; the person called is not available, but the wife will speak for the husband; opening the phone line on the news so people call in (e.g., funding for the fire department). These are nonprobabilistic samples.
Phone book does not contain all people who are registered to vote. People who do not have phones.
Once the sampling frame is established, you can select the sample. The sample of individuals must contain the same variations as the entire population… representativeness…
Put a bunch of pieces of paper in a bag and mix them up - most important for stats… Operationally, drawing a simple random sample requires a numbered list of the population. 8 units are randomly sampled from the survey population of 400. Advantage: Simplicity. Disadvantage: It may not be as accurate as stratified sampling or as cheap as cluster sampling. For simplicity, assume that each person in the population appears once and only once. If there were 8,500 people on a list, and the goal was to select a simple random sample of 100, the procedure would be straightforward. People on the list would be numbered from 1 to 8,500. Then a computer, a table of random numbers, or some other generator of random numbers would be used to produce 100 different numbers within the same range. The individuals corresponding to the 100 numbers chosen would constitute a simple random sample of that population of 8,500. If the list is in a computerized data file, randomizing the ordering of the list, then choosing the first 100 people on the reordered list, would produce an equivalent result.
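Both versions of the procedure for the list of 8,500 can be sketched with Python's standard `random` module. The seed and variable names below are assumptions for illustration; the first approach draws 100 distinct numbers directly, and the second shuffles the whole list and keeps the first 100.

```python
import random

rng = random.Random(2024)          # fixed seed so the sketch is repeatable
population_ids = range(1, 8501)    # people numbered 1 to 8,500

# Approach 1: generate 100 different numbers within the range.
sample_ids = rng.sample(population_ids, 100)

# Approach 2: randomize the order of the list, then take the first 100.
shuffled = list(population_ids)
rng.shuffle(shuffled)
first_100 = shuffled[:100]

print(sorted(sample_ids)[:5])
print(sorted(first_100)[:5])
```

The two approaches select different people on any given run, but each gives every person on the list the same chance of being chosen.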
In the following diagram, every 10th unit is sampled. Can fall prey to the order of the list… You are dividing the elements by k. Your random start number will pick a particular group. For a population of 1, 2, 3, 4, 5, 6, 7, select every 2nd element. If my random start is 2: 2, 4, 6… if 1: 1, 3, 5, 7… You can only select one group. Quite popular because, as a statistician… design a sampling plan… women who came from Japan while their husbands worked at a manufacturing company… mental health… Selected a list from the clinic. Only an employee of the clinic can see the patient list. I have to design a sampling procedure that can be implemented by the clinic. These people are not statisticians. I have to give a simple procedure. Tell them the random start… they tell me how many patients there are and I tell them 100 people. Its behavior resembles an SRS. Watch for order bias if the outcome of the study is related to how the list was made (such as alphabetical).
Advantage: Accuracy. Disadvantage: Requires prior information about the population being sampled. What if the sample is not homogeneous? You could first organize by college class (e.g., freshmen) and then randomly select within each class. Divide into groups (males and females, income, level of education, geography) and draw a random sample from each stratum. You need to stratify to ensure that you have enough cases to analyze (e.g., Indians). You have the option to oversample to get the cases of minority groups in the survey. People can be in only one stratum: strata are mutually exclusive.
In the following diagram, 3 of the 16 available clusters were randomly sampled. Household: interview the entire household rather than one father, one mother. Aggregate units into larger sampling units. Households… select a sample of clusters. With households, you can collect data from every member in the household… or select clusters and then select a sample within each cluster. Analysis methods that are appropriate to clusters… telephone wire distance. How do you know your estimate is right? You select randomly 5 tables in the cafeteria, and then you interview every person sitting at each of these five tables. Cluster sample; probabilistic.
To avoid order bias. Population of 1,000; sample 10% of the population. If a cluster is a single participant and five clusters are drawn… sample of 5.
Samples of convenience: worst research, we are using one today; members that are accessible. Volunteers have certain characteristics.
Prostitutes. You need to access networks. People recommending other people in waves. Not a probability sample
Based on the researcher’s judgment about which ones are most useful or representative. These are also known as convenience samples. You don’t know what population they represent. Technically, you cannot use stats. You are violating randomness. People do it all the time.
Each of the surveys below has a web presence and you can find the web site of the study itself and usually other web sources that are informative about the survey. For summary descriptions and additional information about downloading data, you can also use ICPSR. http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp
Credible, personalized, interesting, contact info (sponsor). $2-$5 incentives increase response rates; gift cards. Postcard, postcard follow-ups, survey with self-addressed envelope.
Cross-sectional means it is just a snapshot… Longitudinal: study the same people at two or more points in time. Allows observation of change.
Rates are notoriously low… very difficult to work with humans. Response rate: percentage of those sampled that complete the survey. A low response rate may indicate sampling bias problems. Usually lower for mail surveys, higher for telephone interviews.
Inadequate comprehension. Unwillingness to disclose information. Survey structure: people often do not respond to open-ended questions; people often skip items not relevant to them and end up skipping others relevant to them at the same time. Personal reasons: older and less educated people are less likely to respond; reluctant participants are more likely to skip questions.
Only a couple open-ended questions
The first question shows what the purpose of the survey is. White space. Matrix.
Understanding of terms, range of experiences, scale development. Think aloud as they answer questions. Expert panel is the least expensive method and the most critical.
What is a pilot test? n = 30
I tell my students that I want no talking and then pass out a survey about internet usage (download it here). Every question on the survey is either double barreled, leading, biased, or has response options that make no sense or overlap. As a class we go through each question picking it apart. We then formulate new questions that don't violate any of the basic survey design rules.
Multiple questions about one concept… overlapping items increase accuracy. Time: remember times over a month… times during a day… how many times do you check Facebook? No opinion, Don’t Know.