
Learning Objectives
• Identify the sampling method and its potential limitations.
As mentioned in the introduction to this section, we will begin with the first stage of data production—
sampling. Our discussion will be framed around the following examples:
#1
Suppose you want to determine the musical preferences of all students at
your university, based on a sample of students. Here are some examples of
the many possible ways to pursue this problem.
Post a music-lovers' survey on a university internet bulletin board, asking
students to vote for their favorite type of music.
This is an example of a volunteer sample, where individuals have selected
themselves to be included. Such a sample is almost guaranteed to be biased.
In general, volunteer samples tend to be composed of individuals who have a
particularly strong opinion about an issue, and are looking for an opportunity
to voice it. Whether the values obtained from such a sample overstate or
understate the truth, and by how much, cannot be determined. As a result,
data obtained from a volunteer sample are of little use when you
think about the "Big Picture": the sampled individuals provide
information only about themselves, and we cannot generalize to any
larger group.
Comment: It should be mentioned that in some cases volunteer samples are the only
ethical way to obtain a sample. In medical studies, for example, in which new
treatments are tested, subjects must choose to participate by signing a consent form
that highlights the potential risks and benefits. As we will discuss in the next module,
a volunteer sample is not so problematic in a study conducted for the purpose of
comparing several treatments.
#2
Stand outside the Student Union, across from the Fine Arts Building, and ask
students passing by to respond to your question about musical preference.
This is an example of a convenience sample, where individuals happen to
be at the right time and place to suit the schedule of the researcher. In
general, convenience sampling produces a biased sample. In this case, the
proximity to the Fine Arts Building might result in a disproportionate number
of students favoring classical music. A convenience sample may also be
susceptible to bias because certain types of individuals are more likely to be
selected than others. In the extreme, some convenience samples are
designed in such a way that certain individuals have no chance at all of being
selected, as in the next example.
It should be mentioned that, depending on the variable being studied,
a convenience sample may still produce reliable data. If, for
example, the variable of interest is number of siblings rather than musical
preference, then the proximity to the Fine Arts Building should not produce any
bias.
#3
Ask your professors for email rosters of all the students in your classes.
Randomly sample some addresses, and email those students with your
question about musical preference.
Here is a case where the sampling frame (the list of potential individuals to
be sampled) does not match the population of interest. The population of
interest consists of all students at the university, whereas the sampling frame
consists of only your classmates. This discrepancy may introduce bias.
For example, students with similar majors will tend to take the
same classes as you, and their musical preferences may differ somewhat
from those of the general population of students. It is always best to
have the sampling frame match the population as closely as possible.
#4
Obtain a student directory with email addresses of all the university's
students, and send the music poll to every 50th name on the list.
This is called systematic sampling. It may not be subject to any clear bias,
but it is not as safe as taking a random sample.
If individuals are sampled completely at random, and without replacement,
then each group of a given size is just as likely to be selected as any other
group of that size. This is called a simple random sample (SRS). In
contrast, if the directory is alphabetical, a systematic sample would almost
never include two sibling students: siblings usually share a last name, so they
sit close together on the list, far fewer than 50 names apart. In a simple
random sample, a pair of siblings would have just as much of a chance of both
being selected as any other pair of students. Therefore, there may be subtle
sources of bias in using a systematic sampling plan.
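The contrast between the two plans can be sketched in a few lines of Python. This is only an illustration: the directory of 1,000 placeholder names, the function names, and the random starting point are assumptions, not part of the example above.

```python
import random

# Hypothetical alphabetized directory; the names are placeholders.
directory = [f"student_{i:04d}" for i in range(1000)]

def systematic_sample(names, step, rng):
    """Take every `step`-th name, beginning at a random offset in [0, step)."""
    start = rng.randrange(step)
    return names[start::step]

def simple_random_sample(names, n, rng):
    """Draw n names at random, without replacement."""
    return rng.sample(names, n)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
sys_sample = systematic_sample(directory, 50, rng)
srs = simple_random_sample(directory, len(sys_sample), rng)

# Positions of the systematically sampled names are always 50 apart,
# so two neighbours on the list can never both be chosen.
positions = [directory.index(name) for name in sys_sample]
gaps = {b - a for a, b in zip(positions, positions[1:])}
print(len(sys_sample), gaps)
```

In the systematic sample the only gap that ever occurs is 50, whereas the simple random sample places no such restriction on which pairs of names can appear together.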
#5
Obtain a student directory with email addresses of all the university's
students, and send your music poll to a simple random sample of students.
As long as all of the students respond, then the sample is not subject to
any bias. It should succeed in being representative of the population of
interest.
But what if only 40% of those selected email you back with their vote?
The results of this poll would not necessarily be representative of the
population, because of the potential problems associated with volunteer
response. Since individuals are not compelled to respond, often a relatively
small subset take the trouble to participate. Volunteer response is not as
problematic as a volunteer sample (presented in example 1 above). However,
there is still a danger that those who do respond are different from those who
don't, with respect to the variable of interest. An improvement would be to
follow up with a second email, asking politely for students' cooperation. This
may boost the response rate, resulting in a sample that is fairly representative
of the entire population of interest. It may be the best that you can do, under
the circumstances. Nonresponse is still an issue, but at least you have
managed to reduce its impact on your results.
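To see why nonresponse matters even when the sample itself is random, here is a small simulation in Python. Everything in it is invented for illustration: the population of 10,000 students, the "weekly listening hours" variable, and especially the assumed response model, in which keener listeners are more likely to reply.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly listening hours for 10,000 students.
population = [rng.uniform(0, 20) for _ in range(10_000)]

# Simple random sample of 2,000 students who receive the poll.
sample = rng.sample(population, 2_000)

def responds(hours, rng):
    """Assumed response model: keener listeners are more likely to reply."""
    return rng.random() < 0.1 + 0.03 * hours

respondents = [h for h in sample if responds(h, rng)]

def mean(values):
    return sum(values) / len(values)

response_rate = len(respondents) / len(sample)
print(f"response rate: {response_rate:.0%}")
print(f"mean hours, everyone sampled: {mean(sample):.1f}")
print(f"mean hours, respondents only: {mean(respondents):.1f}")
```

Under this assumed model the respondents' average is noticeably higher than the average for everyone who was sampled, even though the original sample was a simple random sample. If response did not depend on the variable of interest, the two averages would be close.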
Sampling, Bias, and Generalizability
There are many ways to select a sample, but they do not all produce equally trustworthy results.
Each kind of sampling carries its own level of bias and its own limits on generalization.
A researcher posts a survey on the Internet and asks people to respond. When
people are reached in this manner and select themselves, the result is a volunteer sample:
biased and not generalizable.
So far we've discussed several sampling plans, and determined that a simple random sample is the only
one we discussed that is not subject to any bias.
A simple random sample is the easiest way to base a selection on randomness. There are other, more
sophisticated, sampling techniques that utilize randomness that are often preferable in real-life
circumstances. Any plan that relies on random selection is called a probability sampling plan (or
technique). The following three probability sampling plans are among the most commonly used:
• Simple Random Sampling is, as the name suggests, the simplest
probability sampling plan. It is equivalent to “selecting names out of
a hat.” Each individual has the same chance of being selected.
• Cluster Sampling: This sampling technique is used when our
population is naturally divided into groups (which we call clusters).
For example, all the students in a university are divided into majors;
all the nurses in a certain city are divided into hospitals; all
registered voters are divided into precincts (election districts). In
cluster sampling, we take a random sample of clusters, and use all
the individuals within the selected clusters as our sample. For
example, to get a sample of high-school seniors from a certain city,
you choose three high schools at random from among all high
schools in that city. Then use all the high school seniors in the three
selected high schools as your sample.
• Stratified Sampling: Stratified sampling is used when our
population is naturally divided into sub-populations, each of which we
call a stratum (plural: strata). For example, all the students in a certain
college are divided by gender or by year in college. All the
registered voters in a certain city are divided by race. In stratified
sampling, we choose a simple random sample from each stratum,
and our sample consists of all these simple random samples put
together. For example, to get a random sample of high-school
seniors from a certain city, we choose a random sample of 25
seniors from each of the high schools in that city. Our sample
consists of all these samples put together.

Each probability sampling plan, if applied correctly, is not subject to any bias, and thus will produce
samples that represent the population from which they were drawn well.
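The three plans can be sketched in Python using the high-school seniors example. The city below (10 schools with 200 seniors each), the sample sizes for the simple random sample, and all function names are assumptions made for illustration only.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical city: 10 high schools with 200 seniors each.
schools = {
    f"school_{s}": [f"school_{s}_senior_{i}" for i in range(200)]
    for s in range(10)
}
all_seniors = [name for roster in schools.values() for name in roster]

def simple_random_sample(individuals, n, rng):
    """Every individual is drawn from one pool, without replacement."""
    return rng.sample(individuals, n)

def cluster_sample(groups, n_clusters, rng):
    """Choose whole groups at random and keep everyone in them."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return [name for g in chosen for name in groups[g]]

def stratified_sample(groups, n_per_stratum, rng):
    """Take a simple random sample from each group and combine them."""
    return [name for roster in groups.values()
            for name in rng.sample(roster, n_per_stratum)]

def schools_represented(sample):
    """Set of schools that contribute at least one senior to the sample."""
    return {name.split("_senior_")[0] for name in sample}

srs = simple_random_sample(all_seniors, 250, rng)
cluster = cluster_sample(schools, 3, rng)      # three whole schools
stratified = stratified_sample(schools, 25, rng)  # 25 seniors per school

for label, s in [("SRS", srs), ("cluster", cluster), ("stratified", stratified)]:
    print(label, len(s), "seniors from", len(schools_represented(s)), "schools")
```

The cluster sample always comes from exactly three schools, and the stratified sample always includes every school, while the simple random sample makes no guarantee either way.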
Comment: Cluster vs. Stratified
Students sometimes get confused about the difference between cluster sampling and stratified sampling.
Even though both methods start out with the population somehow divided into groups, the two methods
are very different. In cluster sampling, we take a random sample of whole groups of individuals, while in
stratified sampling we take a simple random sample from each group. For example, say we want to
conduct a study on the sleeping habits of undergraduate students at a certain university and need to
obtain a sample. The students are naturally divided by majors, and let's say that in this university there
are 40 different majors. In cluster sampling, we would randomly choose, say, five majors (groups) out of
the 40, and use all the students in these five majors as our sample. In stratified sampling, we would
obtain a random sample of, say, 10 students from each of the 40 majors (groups), and use the 400
chosen students as the sample. In this example, stratified sampling is clearly the better choice: a student's
major might affect the student's sleeping habits, so we would like to make sure that
we have representatives from all the different majors. We’ll stress this point again following the example
and activity.
Example
Suppose you would like to study the job satisfaction of hospital nurses in a
certain city based on a sample. Besides taking a simple random sample, here
are two additional ways to obtain such a sample.
1. Suppose that the city has 10 hospitals. Choose one of the 10 hospitals at
random and interview all the nurses in that hospital regarding their job
satisfaction. This is an example of cluster sampling, in which the hospitals are
the clusters.
2. Choose a random sample of 50 nurses from each of the 10 hospitals and
interview these 50 × 10 = 500 nurses regarding their job satisfaction. This is an
example of stratified sampling, in which each hospital is a stratum.
Did I Get This
Question 1
What sampling technique is being used in this scenario?
Voters are selected at random from an alphabetical list of all registered
voters.
Cluster sampling
Simple random sampling
Stratified sampling
Systematic sampling
Cluster or Stratified—which one is better?
Let’s revisit the job satisfaction of hospital nurses example and discuss the pros and cons of
the two sampling plans presented. It is certainly much easier to conduct the study using
the cluster sample: all interviews take place in one hospital, whereas the stratified sample
requires interviews in 10 different hospitals. However, the hospital that a nurse
works in probably has a direct impact on their job satisfaction. In that sense, getting data from just
one hospital might provide biased results. Here it is important to have representation
from all the city hospitals, and therefore the stratified sample is preferable. On the other hand,
suppose that instead of job satisfaction, our study focuses on the age or weight of hospital nurses.
In this case, it is probably not as crucial to get representation from the different hospitals, so the
more easily obtained cluster sample might be preferable.
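The trade-off can be illustrated with a small simulation in Python. The data are invented: we assume 10 hospitals with 200 nurses each, and we deliberately give each hospital its own typical satisfaction level so that the hospital matters. The two sampling plans are then repeated 200 times each to see how much their estimates of the city-wide average move around.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical city: 10 hospitals, 200 nurses each. Satisfaction scores are
# invented, and each hospital is given its own typical level on purpose.
hospital_levels = [3.0 + 0.5 * h for h in range(10)]
hospitals = [[rng.gauss(level, 1.0) for _ in range(200)]
             for level in hospital_levels]
true_mean = statistics.mean(score for ward in hospitals for score in ward)

def cluster_estimate(rng):
    """Pick one hospital at random and average all of its nurses."""
    return statistics.mean(rng.choice(hospitals))

def stratified_estimate(rng):
    """Average a random sample of 50 nurses from every hospital."""
    picked = [s for ward in hospitals for s in rng.sample(ward, 50)]
    return statistics.mean(picked)

cluster_runs = [cluster_estimate(rng) for _ in range(200)]
stratified_runs = [stratified_estimate(rng) for _ in range(200)]

print(f"true mean: {true_mean:.2f}")
print(f"spread of cluster estimates:    {statistics.stdev(cluster_runs):.2f}")
print(f"spread of stratified estimates: {statistics.stdev(stratified_runs):.2f}")
```

When the hospital strongly affects the variable, as assumed here, the one-hospital cluster estimate depends heavily on which hospital happens to be chosen, while the stratified estimate stays close to the true average. If the hospital levels were all set equal, the two plans would behave much more alike.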
Comment:
Another commonly used sampling technique is multistage sampling, which is essentially a “complex
form” of sampling. When conducting cluster sampling, it might be unrealistic, or too expensive, to
sample all the individuals in the chosen clusters. In cases like this, it would make sense to have another
stage of sampling, in which you choose a sample from each of the randomly selected clusters. This is
why we use the term multistage sampling.
For example, say you would like to study the exercise habits of college students in the state of California.
You might choose eight colleges (clusters) at random, but you are certainly not going to use all the
students in these eight colleges as your sample. It is simply not realistic to conduct your study that way.
Instead, you move on to the second stage of your sampling plan. Here you choose a random sample of
100 males and a random sample of 100 females from each of the eight colleges you selected in stage 1.
So in total you have 8 × (100 + 100) = 1,600 college students in your sample.
In this case, stage 1 was a cluster sample of eight colleges. Stage 2 was a stratified sample within each
college, with gender defining the strata.
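The two stages of this example can be sketched in Python as follows. The sampling frame is hypothetical: we assume 40 colleges, each with 1,500 male and 1,500 female students, and the identifiers are placeholders.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 40 colleges, each with 1,500 male and 1,500 female students.
colleges = {
    f"college_{c}": {
        "male": [f"college_{c}_m{i}" for i in range(1500)],
        "female": [f"college_{c}_f{i}" for i in range(1500)],
    }
    for c in range(40)
}

# Stage 1 (cluster): choose 8 colleges at random.
stage1 = rng.sample(sorted(colleges), 8)

# Stage 2 (stratified): within each chosen college, sample 100 per gender.
multistage_sample = [
    student
    for college in stage1
    for stratum in ("male", "female")
    for student in rng.sample(colleges[college][stratum], 100)
]

print(len(multistage_sample))  # 8 * (100 + 100) = 1600
```

Only the eight colleges chosen in stage 1 need a student list for stage 2, which is what makes the plan manageable in practice.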
Multistage sampling can have more than two stages. For example, to obtain a random sample of
physicians in the United States, you choose 10 states at random (stage 1, cluster). From each state you
choose at random eight hospitals (stage 2, cluster). Finally, from each hospital, you choose five
physicians from each sub-specialty (stage 3, stratified).
Did I Get This
An insurance industry research foundation wants to study the quality of care given to all patients at risk
for heart disease in the United States. Since not all those at risk seek treatment, the foundation randomly
selects 3,500 claims only from among those at-risk patients who were actually treated for chest pain. The
foundation obtains this sample in several stages. First, the foundation identifies five large companies that
represent a broad cross-section of patients, chooses two of the five at random, and gains access to the
claims of all of those two companies' patients. The two companies' claims are classified (depending on their origin)
into seven geographical regions (California, Florida, Great Lakes, Midwest, Northeast, Southern, and
Southwest), and within each region, five counties are selected that represent a continuum spanning rural,
suburban, and urban populations. (In total, then, patients from 35 counties are included.)
Within each county (and for each company), claims of 25 male and 25 female patients treated for chest
pain are randomly selected for the study, for a total of 3,500 patients.
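The total of 3,500 can be checked by multiplying through the stages. A small Python sketch (the variable names are ours, chosen only for illustration):

```python
# Stage counts taken from the description above.
companies = 2                              # chosen at random from five
regions = 7
counties_per_region = 5
claims_per_county_per_company = 25 + 25    # 25 male + 25 female patients

counties = regions * counties_per_region
total_claims = companies * counties * claims_per_county_per_company
print(counties, total_claims)  # 35 3500
```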
In this study:
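Question 4
The population is:
a diverse group of U.S. patients treated for chest pain.
all U.S. patients at risk for heart disease.
a subset of 3,500 from a diverse group of U.S. patients treated for chest pain.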
Question 5
The sampling frame is:
Question 6
The sample is:
For each stage in the multistage sampling plan of this study, identify the sampling technique that was
used:
Question 7
1) The research foundation identifies five large companies that represent a
broad cross-section of patients, chooses two of the five at random, and gains
access to the claims of all the companies' patients.
Cluster sampling
Stratified sampling
Question 8
2) The two companies' claims are classified (depending on their origin)
according to seven geographical regions, and within each region, the
sampling continues.
Cluster sampling
Stratified sampling
Question 9
3) From each region, five representative counties are selected. (In total, all
the claims originating from 35 counties are examined.)
Cluster sampling
Stratified sampling
Question 10
4) Within each county (and for each company), claims of 25 male and 25
female patients are randomly selected.
Cluster sampling
Stratified sampling

Types of Sampling
Some common types of sampling include the simple random sample, cluster sample,
stratified sample, and multistage sample. Each has its own properties.
Simple Random Sampling: each individual has the same chance of being selected.
Comment: Sample size
So far, we have made no mention of sample size. Our first priority is to make sure the sample is
representative of the population, by using some form of probability sampling plan. Next, we focus on
sample size. To get a more precise idea of what values are taken by the variable of interest for the entire
population, a larger sample does a better job than a smaller one.
We will discuss the issue of sample size in more detail in the Inference unit. We will see how changes in
the sample size affect the conclusions we can draw about the population.
Example
Suppose hospital administrators would like to find out how the staff would
rate the quality of food in the hospital cafeteria. Which of the four sampling
plans below would be best?
1. The person responsible for polling stands outside the cafeteria door and
asks the next 5 staff members who come out to give the food a rating on a
scale of 1 to 10.
2. The person responsible for polling stands outside the cafeteria door and
asks the next 50 staff members who come out to give the food a rating on a
scale of 1 to 10.
3. The person responsible for polling takes a random sample of 5 staff
members from the list of all those employed at the hospital. The pollster asks
them to rate the cafeteria food on a scale of 1 to 10.
4. The person responsible for polling takes a random sample of 50 staff
members from the list of all those employed at the hospital. The pollster asks
them to rate the cafeteria food on a scale of 1 to 10.
Plans 1 and 2 would be biased in favor of higher ratings, since staff members
with unfavorable opinions about cafeteria food would be likely to eat
elsewhere. Plan 3, since it is random, would be unbiased. However, with such
a small sample, you run the risk of including people who provide unusually
low or unusually high ratings. In other words, the average rating could vary
quite a bit depending on who happens to be included in that small sample.
Plan 4 would be best because the participants have been chosen at random to
avoid bias and the larger sample size provides more information about the
opinions of all hospital staff members.
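The difference between Plans 3 and 4 can be illustrated with a short simulation in Python. The staff list and its ratings are invented (2,000 employees with ratings drawn uniformly from 1 to 10), and each plan is repeated 1,000 times to show how much the average rating varies from one random sample to the next.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical staff list: 2,000 employees with invented ratings from 1 to 10.
staff_ratings = [rng.randint(1, 10) for _ in range(2000)]

def sample_mean(n, rng):
    """Mean rating from a simple random sample of n staff members."""
    return statistics.mean(rng.sample(staff_ratings, n))

means_n5 = [sample_mean(5, rng) for _ in range(1000)]
means_n50 = [sample_mean(50, rng) for _ in range(1000)]

print(f"population mean: {statistics.mean(staff_ratings):.2f}")
print(f"spread of sample means, n=5:  {statistics.stdev(means_n5):.2f}")
print(f"spread of sample means, n=50: {statistics.stdev(means_n50):.2f}")
```

Both plans are unbiased, so their sample means center on the population mean, but the means from samples of 5 are scattered much more widely than those from samples of 50.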
Example
Suppose a student enrolled in a statistics course is required to complete and
turn in several hundred homework problems throughout the semester. The
teaching assistant responsible for grading suggests the following plan to the
course professor: instead of grading all problems for each student, he will
grade a random sample of problems. He first offers to grade a random
sample of just three problems for each student. This offer is not well-received
by the professor. The professor fears that such a small sample may not
provide a very precise estimate of a student's overall homework performance.
Students are particularly concerned that the random selection may happen to
include one or two problems on which they performed poorly, thereby
lowering their grade. The next offer, to grade a random sample of 25
problems for each student, is deemed acceptable by both the professor and
the students.
Comment
In practice, we are confronted with many trade-offs in statistics. A larger sample is more informative
about the population, but it is also more costly in terms of time and money. Researchers must make an
effort to keep their costs down. However, they must still obtain a sample that is large enough to allow
them to report fairly precise results.
Let's Summarize
In statistics, our goal is to use information obtained from a sample to draw conclusions about the
population of interest. The first step in this process is to obtain a sample of individuals that is truly
representative of the population. If this step is not carried out properly, then the sample is subject
to bias, a systematic tendency to misrepresent the variables of interest in the population.

Bias is almost guaranteed if a volunteer sample is used. If the individuals select themselves for the
study, they are often different in an important way from the individuals who did not volunteer.
A convenience sample, chosen because individuals were in the right place at the right time to suit the
researcher, is generally also biased. It may be different from the general population in a subtle, but
important way. However, for certain variables of interest, a convenience sample may still be fairly
representative.
The sampling frame of individuals from whom the sample is selected should match the population of
interest. Bias may result if parts of the population are systematically excluded.
Systematic sampling takes an organized (but not random) approach to the selection process. For
example, one can pick every 50th name on a list, or the first product to come off the production line each
hour. Just as with convenience sampling, there may be subtle sources of bias in such a plan, or it may be
adequate for the purpose at hand.
Most studies are subject to some degree of nonresponse. Nonresponse refers to individuals who do not
go along with the researchers' intention to include them in a study. If there are too many
nonrespondents, and they are different from respondents in an important way, then the sample turns out to
be biased.
In general, bias may be eliminated (in theory), or at least greatly reduced (in practice), if researchers
implement a probability sampling plan that utilizes randomness.
The most basic probability sampling plan is a simple random sample, in which every group of
individuals has the same chance of being selected as every other group of the same size. This is achieved
by sampling at random and without replacement. In a cluster sample, groups of individuals (such as
all people in the same household) are randomly selected, and all members of each
selected group participate in the study. A stratified sample divides the population into groups called
strata before selecting study participants at random from within each of those groups. Multistage
sampling makes the sampling process more manageable by working down from a large population to
successively smaller groups within the population. It takes advantage of stratifying along the way, and
sometimes finishes up with a cluster sample or a simple random sample.
Simple random sampling, cluster sampling, stratified sampling, and multistage sampling all
utilize randomness and therefore produce unbiased samples.
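The three basic plans described above can be sketched in R as follows. The data frame students, with 400 students living in 10 residence halls of 40 each, is a hypothetical example and not the course dataset.

```r
set.seed(42)  # only so the sketch can be repeated

# Hypothetical population: 400 students in 10 residence halls of 40 each
students <- data.frame(
  id   = 1:400,
  hall = rep(1:10, each = 40)
)

# Simple random sample: 40 students, drawn without replacement
srs <- students[sample(nrow(students), 40), ]

# Cluster sample: pick 2 halls at random and keep everyone in them
chosen_halls <- sample(unique(students$hall), 2)
cluster <- students[students$hall %in% chosen_halls, ]

# Stratified sample: 4 students drawn at random from every hall
strat_ids <- unlist(lapply(split(students$id, students$hall),
                           function(ids) sample(ids, 4)))
stratified <- students[students$id %in% strat_ids, ]

nrow(srs); nrow(cluster); nrow(stratified)
```

Note the difference in where the randomness is applied: the cluster sample randomizes which halls are chosen, while the stratified sample uses every hall and randomizes which students are chosen within each one.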
Assuming the various sources of bias have been avoided, researchers can learn more about the variables
of interest for the population by taking larger samples. The "extreme" (meaning, the largest possible
sample) would be to study every single individual in the population (the goal of a census). However, in
practice, such a design is rarely feasible. Instead, researchers must try to obtain the largest sample that
fits in their budget (in terms of both time and money). They must take great care that the sample is truly
representative of the population of interest.
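The benefit of a larger sample can be illustrated with a short simulation. This is a sketch under assumed values: a hypothetical population of 10,000 students in which 30% prefer a given type of music, and sample sizes of 25 and 400 chosen only for contrast.

```r
set.seed(7)  # only so the sketch can be repeated

# Hypothetical population of 10,000 students; 30% prefer a given type of music
pop <- rep(c(TRUE, FALSE), times = c(3000, 7000))

# Spread of the sample proportion across many simple random samples of size n
spread_of_estimates <- function(n, reps = 1000) {
  estimates <- replicate(reps, mean(sample(pop, n)))
  sd(estimates)
}

small <- spread_of_estimates(25)
large <- spread_of_estimates(400)
c(n25 = small, n400 = large)
```

The estimates from the larger samples cluster more tightly around the population value. Sample size reduces this sample-to-sample variability only; it does not remove bias caused by a poor sampling plan.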
Did I Get This
"Fill in the blank" questions: for each scenario below, complete the sentence "The sampling method in
this case is ____, which produces a sample that is ____."
Choices for the sampling method: stratified sampling, voluntary response sampling, simple random
sampling, cluster sampling, convenience sampling.
Choices for the resulting sample: biased, unbiased.

Question 1
In order to assess how the students in his Intro Stats class feel about the length of the homework
assignments, Professor Meyer collects data from the first 15 students who happen to attend office hours.

Question 2
In order to assess how the students in his Intro Stats class feel about the length of the homework
assignments, Professor Meyer chooses a random sample of five students from each of the course’s 12 lab
sections.

Question 3
In order to assess how the students in his Intro Stats class feel about the length of the homework
assignments, Professor Meyer posts a survey on the course’s blackboard page and collects data from the
students who chose to complete the survey.

Question 4
In order to assess how the students in his Intro Stats class feel about the length of the homework
assignments, Professor Meyer chooses at random two of the course’s 12 lab sections and collects data
from all 50 students who are registered in those two sections.

Question 5
In order to assess how the students in his Intro Stats class feel about the length of the homework
assignments, Professor Meyer chooses and collects data from a random sample of 30 students from the
entire class.
Learning Objectives
The purpose of this activity is to show you how a simple random sample produces a sample that is not
subject to any bias. It is thus representative of the population from which it was selected. Also, we'll see
how a nonrandom sample can produce some sources of bias.
Background

Consider the population of all students at a large university taking introductory statistics courses (1,129
students taking statistics for business, social sciences, or natural sciences).
Suppose we are interested in the values of four specific variables for this population: handedness (right-
handed or left-handed), sex, SAT Verbal score, and age. If we were unable to determine the values of
those variables for the entire population, we could still take a random sample from that
population. We can then use the sample summaries as estimates for population summaries. Would the
random sample provide unbiased estimates for the population values?
Next, what if instead of taking a random sample, we sampled the 192 students who happen to be
enrolled in the business statistics course? First we will intuit, then check, if they would be a
representative sample with respect to each of the four variables: handedness, sex, SAT Verbal score, and
age. It may be helpful for you to know that, at this university, all students have comparable options in
terms of when they take introductory statistics. You should also know that women, on the whole, tend to
do somewhat better than men on the verbal portion of the SAT. In addition, business is a major that
tends to interest males more than females.
To summarize the goals for this activity, we will:
A. Verify that the distributions of the variables handedness, sex, SAT Verbal score, and age are roughly
the same for the random sample as they are for the population.
B. Intuit if the distributions of each variable in the (nonrandom) sample would be roughly the same as
those for the population. Intuit if there is a reason to expect any of the variables to be biased.
C. Check our intuition by comparing the distributions of each of the four variables for the sample of
business students with those for the population. We will also determine whether they are roughly the
same or if the sample values for any of the variables appear to be biased.
Our dataset contains data on the entire population of 1,129 students. The group of students includes
students taking introductory statistics who are majoring in the natural and social sciences, as well as
business majors.
R Instructions
To open R with the dataset preloaded, right-click here and choose "Save Target As" to
download the file to your computer. Then find the downloaded file and double-click it
to open it in R.
The data have been loaded into the variable "population." Enter the command

population
to see the data.
The dataset includes the following variables:
• Course: natural science, social science, or business
• Handed: right-handed or left-handed
• Sex: female or male
• Verbal: SAT Verbal scores up to 800
• Age: in years
First, we will take a simple random sample of the data. For the sake of consistency,
we will make the random sample the same size (192) as the nonrandom sample of
business statistics students that will be examined later.
To do this in R, copy the following command:
# Draw 192 row numbers at random, without replacement, and keep those rows
random_sample <- population[sample(nrow(population), 192), ]
random_sample
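Because the sample is drawn at random, each run of the command produces a different set of 192 students. The sketch below shows one way to make a draw repeatable and to check it. It uses a stand-in data frame named demo_pop so that the real "population" data are not overwritten; the course labels and the seed value are assumptions made for illustration.

```r
# Stand-in for the course dataset (hypothetical labels; the real data
# frame "population" is loaded from the downloaded file)
demo_pop <- data.frame(
  Course = rep(c("Business", "Natural", "Social"), times = c(192, 400, 537))
)

set.seed(2024)                       # fixes the random draw so it can be repeated
rows <- sample(nrow(demo_pop), 192)  # 192 distinct row numbers, no replacement
demo_sample <- demo_pop[rows, , drop = FALSE]

nrow(demo_sample)     # 192
anyDuplicated(rows)   # 0: no row was drawn twice
```

Sampling without replacement is the default behavior of sample(), which is why no student can appear twice in the random sample.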
A. Now we will determine whether the four variables' behavior for the random sample is comparable to
their behavior for the population.
1. To compare the proportion of right-handed students in the sample to those in the population, create
two pie charts. One is for handedness in the population and the other is for handedness in the random
sample.
R Instructions
To do this in R, copy the commands:

# Percentage in each category, for the sample and for the population
random_sample_percent = 100*summary(random_sample$Handed)/length(random_sample$Handed); random_sample_percent
pop_percent = 100*summary(population$Handed)/length(population$Handed); pop_percent
# Show the next two graphs side by side
par(mfrow=c(1,2))
pie(pop_percent, labels=paste(c("left=","right="), round(pop_percent,0), "%"), main="Population")
pie(random_sample_percent, labels=paste(c("left=","right="), round(random_sample_percent,0), "%"), main="Random Sample")
Note: To display two graphs at once, R requires the "par" command. Here, par(mfrow=c(1,2)) tells R to
place the next two graphs side by side in a single row.
Consider the distributions to be comparable if the sample proportion comes within about 5% of the
population proportion. Does it? (Use the text box in the first Learn By Doing exercise below to record
your answer.)
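The comparison can also be made numerically rather than by eye. The sketch below reads "within about 5%" as 5 percentage points, which is an assumption, and uses hypothetical stand-in data (demo_pop and demo_sample) in place of the course dataset.

```r
set.seed(11)  # only so the sketch can be repeated

# Hypothetical stand-in data; the real values come from "population"
demo_pop <- data.frame(
  Handed = factor(sample(c("left", "right"), 1129, replace = TRUE,
                         prob = c(0.1, 0.9)))
)
demo_sample <- demo_pop[sample(nrow(demo_pop), 192), , drop = FALSE]

# Percentage of observations in each category
pct <- function(x) 100 * prop.table(table(x))

pop_pct    <- pct(demo_pop$Handed)
sample_pct <- pct(demo_sample$Handed)

# Gap in percentage points, and whether it is within about 5
diff_points <- abs(sample_pct["right"] - pop_pct["right"])
diff_points
diff_points <= 5
```

The same helper works for any categorical variable, so it can be reused for the comparison of sex by changing the column and the category name.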
2. To compare the proportion of female students in the sample to the proportion in the population, create
two pie charts. One is for sex in the population and the other for sex in the random sample.
R Instructions
Use the commands above, but replace "$Handed" with "$Sex", "left" with "female", and "right" with "male".
Consider the distributions to be comparable if the sample proportion comes within about 5% of the
population proportion. Does it? (Use the text box in the first Learn By Doing exercise below to record
your answers.)
3. Create two descriptive statistics summary tables—one for SAT Verbal score in the population and one
for SAT Verbal score in the sample.
R Instructions
To do this in R, copy the following commands:
summary(population$Verbal)
summary(random_sample$Verbal)
Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on means to make a
comparison. Consider the distributions to be comparable if the sample mean SAT Verbal score comes
within about 10 points of the population mean. Does it? (Use the text box in the first Learn By Doing
exercise below to record your answers.)
4. Create two more descriptive statistics summary tables—one for age in the population and one for age
in the sample.
R Instructions
To do this in R, just substitute "$Age" for "$Verbal" in the commands above.
Since age tends to follow a right-skewed distribution, you should focus on medians to make a
comparison. Consider the distributions to be comparable if the sample median age comes within about 0.5
years of the population median. Does it?
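The two numeric comparisons (means for SAT Verbal, medians for age) can be sketched as follows. The data are hypothetical stand-ins generated only for illustration; the thresholds of 10 points and 0.5 years are the ones stated in the exercises.

```r
set.seed(5)  # only so the sketch can be repeated

# Hypothetical stand-in data (not the course dataset)
demo_pop <- data.frame(
  Verbal = pmin(800, pmax(200, round(rnorm(1129, mean = 590, sd = 70)))),
  Age    = 18 + round(rexp(1129, rate = 0.5))
)
demo_sample <- demo_pop[sample(nrow(demo_pop), 192), ]

# Roughly symmetric variable: compare means (threshold: 10 points)
mean_gap <- abs(mean(demo_sample$Verbal) - mean(demo_pop$Verbal))
mean_gap <= 10

# Right-skewed variable: compare medians (threshold: 0.5 years)
median_gap <- abs(median(demo_sample$Age) - median(demo_pop$Age))
median_gap <= 0.5
```

The median is used for age because a few much older students pull the mean upward in a right-skewed distribution, while the median is barely affected by them.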
Learn by Doing
Question 1
Place your findings from exercises 1 through 4 above in the text box below.
B.
Learn by Doing
Question 2

For each of the variables—Handed, Sex, Verbal, and Age—decide whether or
not you believe the sample of business statistics students should be fairly
representative of the larger population of all students in introductory statistics
courses.
C. How representative is the (nonrandom) sample of students in the business statistics course, in
actuality? In order to answer this question, we will need to extract this group from the population.
R Instructions
To do this with R, execute the following command in R:
business = population[population$Course=="Business",];business
Next, we explore whether the four variables' behavior for the (nonrandom) sample of business statistics
students is comparable to their behavior for the population:
R Instructions

To do this in R, just use the commands above, but replace the variable "random_sample" with "business".
In the pie chart commands, replace "Random Sample" with "Business Students".
1. To compare the proportion of right-handed students in the sample to those in the
population, create 2 pie charts, one for handedness in the population and one for
handedness in the sample of business statistics students.
Consider the distributions to be comparable if the sample proportion comes within
about 5% of the population proportion. Does it? (Use the text box in the Learn By
Doing exercise below to record your answers.)
2. To compare the proportion of female students in the sample to the proportion in the
population, create 2 pie charts (using the instructions above), one for sex in the
population and one for sex in the sample of business statistics students.
Consider the distributions to be comparable if the sample proportion comes within
about 5% of the population proportion. Does it? (Use the text box in the Learn By
Doing exercise below to record your answers.)
3. Create 2 tables of descriptive statistics (using the instructions above)—one for SAT
Verbal score in the population and one for SAT Verbal score in the sample of business
statistics students.
Since SAT scores tend to follow a normal (symmetric) distribution, you can focus on
means to make a comparison. Consider the distributions to be comparable if the
sample mean SAT Verbal score comes within about 10 points of the population mean.
Does it? (Use the text box in the Learn By Doing exercise below to record your
answers.)

4. Create 2 tables of descriptive statistics (using the instructions above)—one for age
in the population and one for age in the sample of business statistics students.
Since age tends to follow a right-skewed distribution, you should focus on medians to
make a comparison. Consider the distributions to be comparable if the sample median
age comes within about 0.5 years of the population median. Does it? (Use the text box
in the Learn By Doing exercise below to record your answers.)
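For the categorical variables, the population and the business students can also be compared in a single table. The sketch below uses hypothetical counts, deliberately built so that the business course has a smaller share of females; the real figures must come from the "population" and "business" data.

```r
# Hypothetical stand-in data, built so that the business course skews male
demo_pop <- data.frame(
  Course = rep(c("Business", "Other"), times = c(192, 937)),
  Sex    = c(rep(c("female", "male"), times = c(70, 122)),   # business
             rep(c("female", "male"), times = c(560, 377)))  # other courses
)
demo_business <- demo_pop[demo_pop$Course == "Business", ]

# Percentage in each category, returned as a plain named vector
pct <- function(x) {
  counts <- table(x)
  setNames(round(100 * as.vector(counts) / sum(counts), 1), names(counts))
}

# One row per group, one column per category
comparison <- rbind(Population = pct(demo_pop$Sex),
                    Business   = pct(demo_business$Sex))
comparison

# Gap in percentage points for the share of females
comparison["Population", "female"] - comparison["Business", "female"]
```

With these made-up counts the gap is far larger than 5 percentage points, which is the kind of result that would indicate the nonrandom sample is not representative for that variable.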
Learn by Doing
Question 3
Place your findings from exercises 1 through 4 above in the text box below.
Module Checkpoint
This checkpoint will test your understanding of key learning objectives from the material you just learned.
In this short module we learned various techniques by which one can choose a sample of
individuals from an entire population to collect data from. This is seemingly a simple step in the
big picture of statistics. However, it turns out that it has a crucial effect on the conclusions we
can draw from the sample about the entire population (i.e., inference).
Generally speaking, a probability sampling plan (such as a simple random sample, cluster, or
stratified sampling) will result in a nonbiased sample, which can be safely used to make
inferences. Moreover, the inferential procedures that we will learn later in this course assume
that the sample was chosen at random. That being said, other (nonrandom) sampling techniques
are available, and sometimes using them is the best we can do. It is important to be aware of the
types of bias that they introduce, and therefore of the limitations of the conclusions
that can be drawn from the resulting samples.
Before You Continue
Before you move on, take a few minutes to evaluate and then rate your understanding of the
learning objectives covered in this section. Your responses are not graded; however, they are
available for your course instructor or mentor to review.
You may also highlight any concepts that are still unclear to you or that you would like to
explore further. This information may help to shape discussions with the person monitoring your
progress, such as a course instructor or mentor as well as other students.