Research Methods
Sampling and Generalizability
The Basics of Sampling
Population:
o Entire set of individuals or other entities to which study findings are generalized
o Does not necessarily have to be people
Population might also refer to:
Adults living in a geographical area (e.g. city, state), or working in
a given organization
Set of countries, corporations, government agencies, events, etc.
“All….” and countable
o Examples:
“All political science / sociology majors at MSU”
“All high schools in Kentucky”
o Important that population be carefully and fully defined and that it be relevant to
the research question being asked
Again, remember McDonald and Popkin’s work on the “Vanishing Voter”
Question: How did they re-conceptualize the traditional meaning
of a “population”
o Went from Voting Age Population (VAP) to Voting
Eligible Population (VEP)
It is normally infeasible to interview the entire population of anything about anything
o The U.S. census experiences an undercounting problem in many cities
It is better to select a few members of the population for further inquiry
o Known as a sample
Sample
o Any subset of units collected in some way from a population
o Data we use to actually test theories
Quality / Precision / Reliability of Sample based on:
Overall sample size
How members are chosen to be in sample
o Will discuss this more later
Population or Sample?
Which should we really use?
Advantages of Sampling
o Time and Money
Disadvantages of Sampling
o Information based on sample is usually less accurate or more subject to error than
is information collected from a population
Some studies do not lend themselves to sampling
o Case Studies: Involve detailed examination of just one or a few units
Sometimes you really don’t have a choice…
o At the end of the day, decision is usually made on practical grounds—due to time,
money, and other “research costs”
Example: State of the Union vs. State of the State Addresses
Show Stateline.org
Cannot analyze population of these gubernatorial speeches because
complete dataset does not exist before 2003
o Developing a better dataset—to get a little closer to
population—might end up being my dissertation
Fundamental Concepts
As we know, social scientists are mainly interested in certain characteristics about
populations such as differences between individuals, groups, societal relationships, etc.
Population Parameter
o Characteristics about a population that can be quantified as a number
Examples: Proportion, Mean/Average, etc.
Population Proportion = P
Population Mean = μ (Greek mu)
Estimator
o Numerically estimates the value of population characteristic, or population
parameter
Sample Statistic
o An estimator of a population parameter derived from a population sample
Examples:
Sample Proportion = p
Sample Mean = Y-bar
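The relationship between parameters and sample statistics can be sketched in a few lines of Python. This is only an illustration: the population of ages is made up, and the helper name sample_proportion is hypothetical, not something from the course materials.

```python
import random
from statistics import mean

def sample_proportion(units, predicate):
    """Share of units for which predicate is true (P for a population, p for a sample)."""
    return sum(1 for u in units if predicate(u)) / len(units)

# Hypothetical population: ages of 1,000 residents (made-up data).
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(1000)]

# Population parameters (normally unknown to the researcher).
mu = mean(population)
P = sample_proportion(population, lambda age: age >= 65)

# Sample statistics computed from a random sample of 100 elements.
sample = rng.sample(population, 100)
y_bar = mean(sample)
p = sample_proportion(sample, lambda age: age >= 65)

print(f"mu = {mu:.2f}, Y-bar = {y_bar:.2f}")
print(f"P = {P:.3f}, p = {p:.3f}")
```

Y-bar and p will usually land near, but not exactly on, mu and P. That gap is the sampling error the rest of these notes are concerned with.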
Element
o Not hydrogen or helium…
o We know this better as a unit of analysis
o A single occurrence, realization, or instance of the objects or entities being
studied
Examples: Individuals, States, Cities, Countries, Political Speeches, Wars
Stratum
o We will discuss this more shortly—but for now, know that a population can be
stratified—or subdivided or broken up into groups of similar elements—before a
sample is drawn
o Each stratum is a subgroup of a population that shares one or more characteristics
Examples:
MSU students stratified by class, major, or GPA
o Latin Graduation Honors are a type of stratification
Cum Laude, Magna Cum Laude, Summa Cum
Laude
Teten stratified State of the Union addresses into:
o Founding Period
Washington to John Quincy Adams (1790 to 1825)
o Traditional Period
JQA to Taft (1825-1913)
o Modern Period
Wilson to Present (1913-)
Stratification based on word length of the
address
Sampling Frame
o Particular population from which sample is actually drawn
o Closer sampling frame is to population of interest or theoretical population, the
better off you are
Example: Mall
If you interview every nth person entering Fayette Mall about who
they are going to vote for in November, you are not going to get
entire population unless Anthony Davis is in Food Court signing
autographs and then everyone in Lexington will go to Mall
o Remember the Literary Digest Poll?
In 1936, the presidential election was between Republican Alfred Landon
(Kansas Governor) and FDR
o Literary Digest predicted that Landon would win, 55% to FDR’s 41%
In actuality, Landon only carried Maine and Vermont and won
a whopping eight electoral votes (1.5% of total)
Why did this happen?
o Comes down to polling techniques
Magazine sent out 10 million poll ballots and got a 24%
response rate (2.4 million respondents)
o However, they only surveyed their readership
Group with disposable income (because they could still afford
a subscription during Great Depression)
o Used two other lists for surveying:
Registered automobile users
Telephone users
Question: While this could probably be considered
statistically significant today, what was the big problem
in 1936?
These groups had high incomes, which made it
much more likely that they would vote for
Republican candidate
o Sampling frame ended up oversampling wealthy and GOP
Were not factoring in Great Depression and fact that Hoover had done next to
nothing to help
Lots of poor people voted, and they overwhelmingly voted for
FDR
o If sampling frame is incomplete or inappropriate, then sample bias will occur
Sample will be unrepresentative of the population, and inaccurate
conclusions may result
Sample bias may also be caused by a biased selection of elements, even if
frame is complete and accurate
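A small sketch of how an incomplete frame produces sample bias. The electorate below is made up for illustration and is NOT the real 1936 data; the function name share_for is likewise hypothetical. Even a complete census of this frame misses the population value, because low-income voters never had a chance to be selected.

```python
# Made-up electorate, NOT the real 1936 figures: (income group, preferred candidate).
population = (
    [("high", "Landon")] * 210 + [("high", "FDR")] * 90
    + [("low", "Landon")] * 210 + [("low", "FDR")] * 490
)

def share_for(voters, candidate):
    """Proportion of voters who prefer the given candidate."""
    return sum(1 for _, choice in voters if choice == candidate) / len(voters)

# Incomplete sampling frame: only high-income voters appear on the mailing lists.
frame = [voter for voter in population if voter[0] == "high"]

true_share = share_for(population, "Landon")   # 420 of 1,000 voters
frame_share = share_for(frame, "Landon")       # 210 of 300 voters

print(f"Landon share in population:     {true_share:.0%}")
print(f"Landon share in sampling frame: {frame_share:.0%}")
```

No sample size, however large, fixes this: drawing more respondents from the same frame only estimates the frame's value more precisely.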
Sampling Unit
o Entity listed in a sampling frame
Can be thought of like an element
Types of Samples
Basic differences made between different sampling types due to how data is collected
Probability Sample
o Sample for which each element in the total population has a known probability of
being included in the sample
Can calculate how accurately sample reflects population from which it is
drawn
Nonprobability Sample
o Sample in which each element in the total population has an unknown probability
of being selected
Without knowing probability, you cannot use statistical theory to make
inferences about population
PROBABILITY SAMPLES
Simple Random Sample (SRS)
Each element and combination of elements has an equal chance of being selected
o What has to happen for this to occur?
A list of all elements in the population must be available
A method for selecting those elements must be used that ensures each
element has equal chance of being selected
While seemingly simple, drawing a true SRS can be difficult
Class Activity:
o Write down list of random numbers for 30 seconds
o How “random” are the lists?
Example: Vietnam Draft
As Vietnam War continued and opposition to national policies
grew (LBJ), need to make draft process fairer so that all men—not
just poor and minorities—would have chance to serve
Because you could not go out and pick men at random, the
Selective Service began lottery system
Likelihood that man would be drafted was determined randomly
by writing every day of year on individual slips of paper, placing
slips in separate capsules, and putting all capsules in a barrel
o VIDEO: “The Draft Lottery—Vietnam War”
Selective Service estimated that anyone w/ number higher than 200
would not be called; process seemed fair
However, people found negative correlation between day of birth
and draft number
o If you were born in later months of year, you had more
chance to serve than people born in early months
Capsules were probably not mixed well
One way to get around this issue is to assign number to each element in sampling frame,
and then use a random number generator or random numbers table
o Simply a list of random numbers
o Suppose we had list of all 500 MSU political science majors, and we wanted to
randomly sample 10 to ask their thoughts about our program
We would have to number each person: 1, 2, 3…
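Then we start at random place in random numbers table and start selecting
numbers (if same number comes up twice, or number is larger than 500, we ignore it)
Drawing 463, 335, 658, 618, 161, 543… would give #463, #335, #161… as subjects
(658, 618, and 543 are skipped because no major has those numbers)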
Since SRS only requires list of population members, we could use it to survey members
of Congress, all countries in world, or cities with more than 50,000 people
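In practice a random number generator replaces the printed table. A minimal sketch using Python's standard library, assuming the 500 majors are simply numbered 1 to 500 (the function name and the seed value are illustrative choices):

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements so that every element has an equal chance of selection."""
    rng = random.Random(seed)      # a fixed seed only makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement, so no repeats

# Number each of the 500 political science majors: 1, 2, 3, ...
majors = list(range(1, 501))
chosen = simple_random_sample(majors, 10, seed=2024)
print(sorted(chosen))
```

Because random.sample draws without replacement from the numbered list, the two problems handled by hand with a table (repeated numbers and numbers outside the list) cannot occur.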
Systematic Sample
Elements selected from list at predetermined intervals
o May be easier than random number generator but still requires list of target
population
In a systematic sample, every Kth element on list is selected
o K is number that will result in desired number of elements being chosen
K = Sampling Interval or “skip” between elements
K = Population Size (N) / Sample size (n)
o Example: If you had 25 people and wanted to sample five, K = 25 / 5 = 5, so you
could sample #2, 7, 12, 17, and 22
Useful when dealing with long list of population elements (e.g. all SC justices)
Often used in product testing
o Example: Working at JIF plant, job is to check that lids screwed on
Would make sense to simply sample every 5th lid
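The sampling-interval rule can be sketched as follows. The function name is hypothetical, and choosing the starting point at random within the first interval is an assumption on my part (a common convention, not something stated in these notes).

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every Kth element of frame, where K = N // n is the sampling interval."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the length of the list")
    if start is None:
        # Assumed convention: random starting position within the first interval.
        start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

people = list(range(1, 26))                    # 25 people, numbered 1 to 25
print(systematic_sample(people, 5, start=1))   # [2, 7, 12, 17, 22]
```

Here start=1 is a zero-based position, so the first person selected is #2, matching the 25-person example.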
Stratified Sample
Probability sample where elements sharing one or more characteristics are grouped
o Elements are selected from each group in proportion to group’s representation in
total population
Two Main Types:
o Proportionate Sample
Stratified sample where each stratum is represented in proportion to its size in
population
Example: Imagine that there were 500 members in Congress, with
six parties:
o Blue Party – 100 members (sample 20)
o Red Party – 100 members (sample 20)
o Green Party – 50 members (sample 10)
o White Party – 150 members (sample 30)
o Brown Party – 50 members (sample 10)
o Black Party – 50 members (sample 10)
Say we wanted to sample 100 of these members on
an upcoming policy issue
First have to calculate sampling fraction
100 / 500 = 1/5, so sample one-fifth of each party (refer to sample counts above)
Helps issue of SRS, where all 100 might come from White Party
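The sampling-fraction arithmetic for the six-party example can be checked with a short sketch. The function name is hypothetical, and rounding is an assumption for cases where a stratum size times the fraction is not a whole number (it is not needed for these figures).

```python
from fractions import Fraction

def proportionate_allocation(strata_sizes, n):
    """Give each stratum a share of the n sample slots equal to its share of the population."""
    total = sum(strata_sizes.values())
    sampling_fraction = Fraction(n, total)     # e.g. 100 / 500 = 1/5
    # Rounding is an assumption for non-whole results; totals may then differ slightly from n.
    return {name: round(size * sampling_fraction) for name, size in strata_sizes.items()}

parties = {"Blue": 100, "Red": 100, "Green": 50, "White": 150, "Brown": 50, "Black": 50}
counts = proportionate_allocation(parties, 100)
print(counts)
```

Members would then be drawn at random within each party up to its count, which guarantees that no single party can supply the whole sample.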
o Disproportionate Sample
Stratified sample where each stratum is not represented in proportion to its
size in population
Example: Thinking about differences between racial groups in
US—number in sample for one race might be too small to make
valid inferences
o Sample disproportionately to get enough of that race
Issue of Weighting
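To see why weighting matters, consider a sketch with two made-up strata sampled in equal numbers even though one is nine times larger in the population. All figures and names here are hypothetical; the weights used are simply each stratum's share of the population.

```python
# Hypothetical strata: population size and a small sample of 0/1 responses for each.
strata = {
    "A": {"population_size": 900, "sample_values": [1, 0, 1, 1]},
    "B": {"population_size": 100, "sample_values": [0, 0, 1, 0]},
}

def unweighted_mean(strata):
    """Pool all responses, ignoring that the strata were sampled at different rates."""
    values = [v for s in strata.values() for v in s["sample_values"]]
    return sum(values) / len(values)

def weighted_mean(strata):
    """Weight each stratum's sample mean by that stratum's share of the population."""
    total = sum(s["population_size"] for s in strata.values())
    estimate = 0.0
    for s in strata.values():
        stratum_mean = sum(s["sample_values"]) / len(s["sample_values"])
        estimate += (s["population_size"] / total) * stratum_mean
    return estimate

print(f"unweighted: {unweighted_mean(strata):.2f}")
print(f"weighted:   {weighted_mean(strata):.2f}")
```

The pooled figure treats the oversampled small stratum as if it were half the population. The weighted figure restores each stratum to its actual share.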
Cluster Samples
Probability sample in which sampling frame initially consists of clusters of elements
Groups / clusters of elements are identified and listed as sampling units
o Within each sampling unit, certain elements are identified and sampled
Happens a lot when dealing with public opinion polling
o Step 1: Get Murray map and identify city blocks
This becomes sampling frame
o Step 2: Sample (either randomly or systematically) smaller number of blocks
o Step 3: Go to selected blocks and list all houses on block
o Step 4: Sample list of households to actually interview
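The four steps above can be sketched as a two-stage draw. The map of blocks and households below is invented, and the function name and parameters are hypothetical. Both stages use simple random selection here, though Step 2 could equally be systematic.

```python
import random

def two_stage_cluster_sample(blocks, n_blocks, n_households, seed=None):
    """Stage 1: sample blocks (clusters). Stage 2: sample households within each chosen block."""
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)       # Step 2
    sample = {}
    for block in chosen_blocks:
        households = blocks[block]                             # Step 3: list houses on block
        take = min(n_households, len(households))
        sample[block] = rng.sample(households, take)           # Step 4
    return sample

# Step 1 (hypothetical map): 20 city blocks with 12 households each.
blocks = {
    f"Block {b}": [f"Block {b}, House {h}" for h in range(1, 13)]
    for b in range(1, 21)
}
interviews = two_stage_cluster_sample(blocks, n_blocks=4, n_households=3, seed=7)
for block, houses in interviews.items():
    print(block, houses)
```

Only the four selected blocks ever need a household list, which is the practical appeal of the design.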
Advantages of Cluster Sampling
o Allows researchers to get around problem of acquiring list of elements in target
population
o Reduces fieldwork costs for public opinion surveys (people closer together)
Disadvantage of Cluster Sampling
o Greater level of imprecision
Error arises at each stage of cluster sample
Example: Sample of city blocks will not necessarily be
representative of all city blocks
Systematic, stratified, and cluster samples are often more practical to draw than SRS
NONPROBABILITY SAMPLES
Nonprobability Sample
o Sample in which each element in the total population has an unknown probability
of being selected
o Probability samples are preferred because they tend to represent the population more
accurately, and thus yield estimates closer to the population values
Purposive Sample
Researcher exercises considerable discretion over what observations to study
Goal: To study a diverse and usually limited number of observations
Example: Fenno and Home Style
o Describes behavior of 18 incumbent representatives in Congress
Convenience Sample
Elements are included because they are convenient or easy for a researcher to study
o Example: Studying those State of the State Addresses found on Stateline.org
Used for exploratory research or when target population is impossible to define / locate
Quota Sample
Sample in which elements are sampled in proportion to their representation in population
Similar to proportionate stratified sampling, but elements in quota sample are NOT
chosen in reasoned or probabilistic manner
o Chosen in convenience fashion until each type of element (quota) has been
reached
Leads to biased and inaccurate measures of target population
Example: 1948 Gallup Poll used quota sampling and predicted that Thomas Dewey,
Republican governor of New York, would beat incumbent President Harry S Truman
(Truman won)
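A sketch of the quota mechanic, assuming an interviewer takes people in whatever order they happen to come along. The names, groups, and function are hypothetical.

```python
def quota_sample(stream, quotas, key):
    """Accept people in arrival order until each group's quota is filled (no random selection)."""
    remaining = dict(quotas)
    chosen = []
    for person in stream:
        group = key(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):    # every quota reached, stop interviewing
            break
    return chosen

# Hypothetical passersby as (name, group), in the order they walk past.
passersby = [("Ann", "F"), ("Bob", "M"), ("Cat", "F"), ("Dan", "M"),
             ("Eve", "F"), ("Fay", "F"), ("Gus", "M")]
picked = quota_sample(passersby, {"F": 2, "M": 2}, key=lambda p: p[1])
print(picked)
```

The group totals come out right, but who fills each quota depends entirely on convenience, so selection probabilities are unknown and bias can enter.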
Snowball Sample
Initial respondents are used to identify others who might qualify for inclusion into
sample
o Asked to provide names for further surveying / interviewing
Useful when trying to study members in a typically elusive population:
o Draft Dodgers
o Political Protestors
o Drug Users
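The referral process can be sketched as a walk through a network of who names whom. The respondents R1 to R5 and their referrals are invented, and the function name and size cap are assumptions for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Interview the initial respondents, then the people they name, until max_size is reached."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for named in referrals.get(person, []):
            if named not in seen:          # do not recruit the same person twice
                seen.add(named)
                queue.append(named)
    return sampled

# Hypothetical referral network: each respondent lists the people they name.
referrals = {"R1": ["R2", "R3"], "R2": ["R4"], "R3": ["R4", "R5"], "R4": [], "R5": ["R1"]}
wave = snowball_sample(referrals, ["R1"], max_size=4)
print(wave)   # ['R1', 'R2', 'R3', 'R4']
```

Everyone in the sample traces back to the initial respondents, so people with no ties to them can never be reached.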