Introduction
Introduction to Populations and Samples
It would take too long and cost too much money to test the quality of every piece of cereal made at a factory. Instead, a small sample of each batch is tested.
Wouldn't it be great if we could ask everyone in the world their opinion on a topic? What if we could have every person take a psychological test of interest so we can assemble the most accurate data? How can we make sure that we include every man, woman, child, race, ethnicity, socioeconomic status, class, religion, occupation, or other demographic of interest in any study we conduct? We want to make sure that the data we collect is as good as we can get under the given circumstances. Because we cannot include everyone of interest in a study, we must make sure our sample, or the group of those who participate in our study, is as close to "looking" like the population, or the entire collection of people of interest, as possible.
Consider this example. You are doing a study on the differences between men and women regarding their ability to follow directions. If you collected data from all males and all females in the world (which would be the entire population, because sex is our main variable of interest), you would get an extremely accurate result. However, it would be unrealistic, time-consuming, and costly to collect this data. You could, however, take a sample of males and females and study them. If you choose a good sample, the results of your study can yield an accurate representation of the population.
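To make the idea of a sample standing in for a population concrete, here is a minimal Python sketch. The population of scores is entirely made up for illustration (it is not data from any study), and simple random sampling is only one of the methods the lesson goes on to describe.

```python
import random
import statistics

# Hypothetical population: 100,000 made-up "direction-following" scores from 0 to 99.
population = [i % 100 for i in range(100_000)]

# Draw a simple random sample of 1,000 individuals without replacement.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Difference:      {abs(sample_mean - population_mean):.2f}")
```

With a sample of this size the two means should land close together, although the exact gap depends on which individuals happen to be drawn.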
Collecting a sample that closely resembles the population we are interested in is an important component of conducting research. Much consideration must be given to the individuals you want to choose for your sample and how to ensure that your sample represents the population. By choosing a good sample, we can make certain assumptions about the population, just as if we had selected everyone in that population. This is the focus of sampling: to select an appropriate cross-section of the population that will accurately represent the entire population.
In the following lesson you will learn how to sample a population using a range of sampling methods. Be sure to pay specific attention to the advantages and disadvantages of each method and when each is most useful.
Applying Knowledge of Populations and Samples
Populations and Samples in Ashford Courses
You will need to understand sample and population in a range of graduate courses, including those with a focus on psychological or organizational assessment and testing, measurement, research methods, and statistics. In these courses you will need to be able to identify and describe the population of interest, how a sample was obtained, and the sampling methods used. These topics are important in understanding how assessment or test results can be used or interpreted based on population norms, and how to conduct a study that does not suffer from sampling biases or errors. In addition, having knowledge and skills in this area will help you better understand and evaluate the methods, results, and discussion sections of the research literature you may be asked to evaluate for various courses.
Populations and Samples in Graduate Research
The topics of sample and population are relevant to every research study. When conducting research it is important to determine and evaluate the relevant population and appropriate sampling methods that will provide a representative sample. By doing this, you will increase the likelihood that the results can be used to accurately describe the entire population. Sampling methods tend to be a source of criticism or the basis for various limitations in research studies, as we can never achieve the perfect sample, either in size or characteristics. Therefore, it is important for you to acknowledge these limitations in any research results.
Populations and Samples in the Professional World
In order to apply research findings in the professional world, we must be able to evaluate the sample from which the results were obtained, how the sample represents the population, and how the sample relates to the individuals we are interested in. In addition, understanding the sample upon which a test (e.g., screening, intelligence) was developed is important, because this informs whether it is legal and/or ethical, for example, to apply the test when making employment decisions.
Introduction
Introduction to Variables and Measurement
A physician collects data on qualitative and quantitative variables as he works with a company to develop a corporate health and wellness plan.
Variables are what you are interested in looking at in a study. A variable is something you plan to observe, manipulate, test, record, or evaluate. As researchers we need to know how to describe variables accurately in our communications with others, and how to understand the impact a study's variables have on the statistics we run and conclusions we come to.
If you are studying the impact of training on the leadership skills of female chief executive officers (CEOs), then your variables would be training and leadership skills, because those are the elements you are most interested in and what you will be manipulating (training) and measuring (leadership skills). The population of interest is female CEOs, and you will be collecting data from a sample of these individuals using a method described in Lesson 1, Populations and Samples. Characteristics of this sample, such as gender and title, are not variables because these traits are shared among all participants in the study (all are female and all are CEOs) and are not being manipulated or measured in any way.
One of the most basic ways to describe variables is in terms of their qualitative or quantitative properties. Qualitative variables classify items by category. For example, sex, ranking of students in a graduating class, race, top five favorite foods, and eye color are variables that fit into categories; they measure the nature of something, and not a specific numerical value.
Quantitative variables classify items using numerical values, with which we can perform a range of mathematical functions. For example, number of items correct or incorrect, weight, time, psychological test scores, and temperature are variables that measure the numerical value of something.
Throughout this lesson you will learn more about how to describe variables, including identifying the scale of measurement that applies to a variable of interest. You may still be wondering, why do we need to learn about scales of measurement, or any other characteristic of a variable for that matter? To reiterate, being able to identify the various characteristics of variables helps to identify the statistics you can calculate and use to interpret your findings. For example, it would not make sense to take a mean or average for the variable of sex, which is defined by one of two categories: male or female. Sex is a qualitative variable, and is an example of a nominal scale of measurement. There is no number by which we can represent this variable; each person falls into one of the two categories. This is only one example of the importance that a variable's characteristics can have on a study. This lesson will continue to explore these topics and help you create the foundation needed to describe variables accurately.
Applying Knowledge of Variables and Measurement
Variables and Measurement in Ashford Courses
As a student at Ashford University, you will need to understand variables and their characteristics in a range of graduate courses, including those with a focus on psychological or organizational assessment and testing, measurement, research methods, and statistics. In these courses you will need to identify variables, describe their characteristics, and evaluate the statistical tests used and the results obtained as they relate to those variables. In addition, having knowledge and skills in this area will help you better understand the literature review, hypotheses, methods, results, and discussion sections of the research literature you may be asked to evaluate for various courses.
Variables and Measurement in Graduate Research
Variables are the primary focus of a research study. When conducting research it is important to determine and evaluate the variables of interest so you can accurately focus the research study, provide an overview of the variables' characteristics, and determine the appropriate statistical tests to run and results to evaluate. Errors in identifying and describing variables can lead the researcher to use inappropriate statistical tests or results that should not be used in the context of those variables. Errors can also cause the researcher to obtain inaccurate results and draw incorrect conclusions regarding the study's hypotheses.
Variables and Measurement in the Professional World
In the professional world it is important to understand the nature of variables in order to describe and measure them accurately before displaying related data. A variable's characteristics will be of prime importance when determining which mathematical or statistical analyses to perform on the data obtained for that variable. This is relevant in a variety of situations, ranging from determining the impact of a training program on employee performance, to customer satisfaction levels, to depression levels after administering a new therapy program. Knowing the scale of measurement, for example, will lead the professional to perform certain analyses and present the results to others in a manner that makes sense for that variable. It is unethical and unprofessional to misrepresent data using inappropriate analyses or inaccurate variable descriptions.
Tutorial
Introduction to Variables and Data
When researchers conduct a statistical study, they try to discern the relationship between different characteristics, or variables, of the population they are studying. For example, in a study on the incidence of post-traumatic stress disorder (PTSD) among Iraq war veterans, researchers may collect data on each veteran's sex, age, years on active duty, highest educational degree, and current mental well-being. A variable is a value that varies over time, space, or from individual to individual in a study. A constant, on the other hand, is a value that remains the same throughout the study. For example, in this study, whether or not each individual in the study served in Iraq is a constant. The value is yes for every individual in the study. A data point is one variable value from one sampled individual. Data refers to the collection of data points from all of the individuals sampled.
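The distinction between a variable and a constant can be checked mechanically: a field whose value is identical for every sampled individual behaves as a constant in that study. The sketch below assumes a small set of made-up records; the field names and values are hypothetical and are not taken from any real PTSD study.

```python
# Hypothetical records for three sampled veterans (all values are made up).
records = [
    {"sex": "F", "age": 34, "years_active": 6, "served_in_iraq": "yes"},
    {"sex": "M", "age": 41, "years_active": 12, "served_in_iraq": "yes"},
    {"sex": "M", "age": 29, "years_active": 4, "served_in_iraq": "yes"},
]

def split_constants_and_variables(rows):
    """Return (constants, variables) as sorted lists of field names."""
    constants, variables = [], []
    for field in rows[0]:
        distinct_values = {row[field] for row in rows}
        # One distinct value across all rows means the field never varies.
        (constants if len(distinct_values) == 1 else variables).append(field)
    return sorted(constants), sorted(variables)

constants, variables = split_constants_and_variables(records)
print("Constants:", constants)
print("Variables:", variables)
```

Each individual entry such as `records[0]["age"]` is a data point, and the full list of records is the data.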
There are two types of variables and data: qualitative and quantitative.
Qualitative Data and Quantitative Data
Qualitative Data
Qualitative data are the result of categorizing or describing attributes of a population that are not counted or measured. Hair color, blood type, ethnic group, the car a person drives, and the street where a person lives are examples of qualitative variables. Qualitative data are generally described by words or letters. For example, hair color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+, O–, or B+. In the case of a study on PTSD among Iraq war veterans, whether or not the veteran has been diagnosed with PTSD would be a qualitative variable. Qualitative data are also known as categorical data.
Quantitative Data
Quantitative data are the result of counting or measuring attributes of a population. Quantitative data are always described using numbers and are usually the data of choice (where applicable) because there are many methods available for analyzing the data. Amount of money, pulse rate, weight, number of people living in a town, and the number of students who take statistics classes are examples of quantitative variables. Age and medication dosage are examples of quantitative data.
Discrete vs. Continuous Variables
There are two types of quantitative data: discrete and continuous. Data that are the result of counting are called discrete data. These data are whole number values. For example, the number of people in a town is an example of discrete data. There can be 1,286 people or 1,287 people, but not 1,286.3 people in the town.
Data that are the result of measuring are called continuous data (assuming that we can measure precisely). A person's weight and the temperature of the air are both examples of continuous data.
Confusing Issues
In some cases it is not immediately apparent whether a variable is qualitative or quantitative.
Sometimes researchers assign numbers to qualitative values. For example, in a data table you might see a column labeled "race," but instead of descriptions like "white," "black," or "Hispanic," the column would include the numbers 1, 2, and 3. The researcher has assigned a number to each race simply to make the data easier to work with. It is important to note, however, that the data are still qualitative. The actual value of each number is meaningless.
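One way to see why such code numbers carry no numerical meaning is to relabel the categories and watch what changes. The sketch below uses made-up responses and an arbitrary second coding scheme (both are assumptions for illustration): the "average code" shifts with the labeling, while the category counts do not.

```python
from collections import Counter
import statistics

# Hypothetical coded column: 1 = "white", 2 = "black", 3 = "Hispanic".
labels = {1: "white", 2: "black", 3: "Hispanic"}
codes = [1, 1, 2, 3, 3, 3]

# An equally valid, arbitrary relabeling of the same three categories.
recode = {1: 2, 2: 3, 3: 1}
relabels = {recode[code]: label for code, label in labels.items()}
recoded = [recode[code] for code in codes]

# The "mean code" depends entirely on which numbers were chosen as labels.
mean_original = statistics.mean(codes)
mean_recoded = statistics.mean(recoded)

# The category counts are identical under both labelings.
counts_original = Counter(labels[code] for code in codes)
counts_recoded = Counter(relabels[code] for code in recoded)

print("Mean of original codes:", round(mean_original, 3))
print("Mean of recoded values:", round(mean_recoded, 3))
print("Counts unchanged:", counts_original == counts_recoded)
```

Because the mean changes when nothing about the people has changed, it describes the coding scheme rather than the sample.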
In other cases, a variable can be qualitative or quantitative depending on how precisely it is described. For example, age can be described in years, months, and days, in which case it is quantitative. But it can also be described qualitatively, with relative descriptions like "young," "middle-aged," and "old." Similarly, it is possible to collect data on height using terms like "short," "average," and "tall." As you can see, although these descriptions are not inaccurate, they are not very precise and are quite subjective. Quantitative data, on the other hand, offer specific information. Unlike qualitative data, quantitative data can be analyzed more easily and with more statistical tools (as we will see in other lessons). For example, you can calculate the average age of war veterans if you know their precise ages in years; you cannot calculate the average of "young," "middle-aged," and "old."
When data can be described quantitatively, it should be. Sometimes it is useful to supplement those data with a qualitative description as well.
Independent and Dependent Variables
In some cases, variables are described not by their own intrinsic properties, but instead by how they are used. In many studies, researchers do not try to describe a sample simply by one piece of data (e.g., cholesterol level); rather, they try to analyze how various data points relate to each other (e.g., how diet relates to cholesterol level). A researcher who wants to know how variable x and variable y are related will manipulate one of the variables and not the other. The independent variable is the variable that is manipulated, while the dependent variable is the variable that is observed.
For example, an experimenter might compare how effective four types of antidepressants are at relieving depression. In this case, the independent variable is the type of antidepressant, while the dependent variable is the extent of relief from depression.
In many cases, there is really no "manipulated" variable, but simply two observed variables (for example, height and weight). However, if a researcher is trying to determine the relationship between a person's height and weight, he or she would consider one to be the independent variable and the other to be the dependent variable.
We will go into more detail about independent and dependent variables in Lesson 9.
Scales of Measurement
If you look at a data table or a set of graphs, you may notice that data for different variables are described in different ways. For example, the data table below includes possible data from a study on the incidence of PTSD among Iraq war veterans.
ID number | Home state | Highest educational degree | Body temperature at rest (°F) | Resting pulse rate (beats per minute) | Diagnosed with PTSD? (0 = no diagnosis; 1 = yes)
1 | Texas | High school diploma | 98.0 | 54 | 0
2 | New York | MS | 98.9 | 58 | 1
3 | Maryland | BS | 98.6 | 63 | 1
4 | Texas | PhD | 97.4 | 55 | 0
5 | California | High school diploma | 99.3 | 62 | 1
6 | Michigan | GED | 98.7 | 62 | 0
Different types of variables are measured in different ways and are described using certain scales of measurement (also called levels of measurement).
Nominal Scales
Data on a nominal scale are categorized responses. Many types of qualitative data are described using nominal scales; some examples are gender, handedness, favorite color, and religion. In the table above, home state and PTSD diagnosis are described on nominal scales. Note that although the data on PTSD diagnoses are displayed as numbers, these numbers represent qualitative attributes. They are not meaningful numerical values and are thus still considered to be on a nominal scale.
Limitations of the Nominal Scale
A key characteristic of nominal scales is that they do not imply any ordering among the responses. For example, when classifying veterans according to home state, we would not rank the states. Responses are merely categorized. Because data on a nominal scale are organized in simple categories, it is also not possible to analyze them using many statistical tools. For example, we can't calculate the "average" state of the sample of war veterans (even if we assigned each state a number). We can, however, use the data to calculate frequencies, percentages, and proportions (e.g., the percentage of Iraq war veterans who reside in Maryland).
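The frequencies, proportions, and percentages mentioned here can be computed directly from the home-state column of the data table above. The following is a minimal sketch in Python; the variable names are illustrative choices, not part of the lesson.

```python
from collections import Counter

# Home-state column for the six veterans in the data table above.
home_states = ["Texas", "New York", "Maryland", "Texas", "California", "Michigan"]

frequencies = Counter(home_states)  # how many veterans per state
n = len(home_states)

for state, count in frequencies.most_common():
    proportion = count / n
    print(f"{state:<12} count={count}  proportion={proportion:.3f}  percent={100 * proportion:.1f}%")

# Percentage of sampled veterans who reside in Maryland (1 of 6).
maryland_percent = 100 * frequencies["Maryland"] / n

# The most frequent category (the mode) is a summary that suits nominal data.
mode_state = frequencies.most_common(1)[0][0]
```

Note that the code never sorts or averages the states themselves; it only counts how often each category occurs.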
Ordinal Scales
A more useful scale for qualitative data, where possible, is an ordinal scale. A psychologist screening people for depression might ask them to specify their feelings about life in general as either "very dissatisfied," "somewhat dissatisfied," "neither satisfied nor dissatisfied," "somewhat satisfied," or "very satisfied." This is an example of an ordinal scale. Like the nominal scale, ordinal scales generally use words rather than numbers, and many qualitative variables are described using ordinal scales. But unlike the nominal scale, the values on an ordinal scale can be ordered (in this case ranging from least to most satisfied). Describing data using an ordinal scale allows a researcher to rank responses rather than simply categorize them.
Other examples of ordinal variables include military ranks and rankings in a race or contest (1st, 2nd, 3rd). In the table above, highest educational degree is given on an ordinal scale.
Limitations of the Ordinal Scale
It is important to note that the difference between two levels of an ordinal scale cannot be assumed to be the same as the difference between two other levels. In our satisfaction scale, for example, the difference between the responses "very dissatisfied" and "somewhat dissatisfied" cannot be compared to the difference between "somewhat dissatisfied" and "somewhat satisfied." Nothing in this measurement procedure allows us to determine whether the two differences reflect the same difference in psychological satisfaction. Similarly, the difference between BS and MS and the difference between MS and PhD are not necessarily the same, and there is no way to indicate those on an ordinal scale. Statisticians express this point by saying that the differences between adjacent scale values do not necessarily represent equal intervals on the underlying scale giving rise to the measurements. (In our case, the underlying scale is the true feeling of satisfaction, which we are trying to measure.)
What if the researcher had measured satisfaction by asking consumers to indicate their level of satisfaction by choosing a number from 1 to 4? Would the difference between the responses of 1 and 2 necessarily reflect the same difference in satisfaction as the difference between the responses 2 and 3? The answer is no. Changing the response format to numbers does not change the meaning of the scale. We still are in no position to assert that the mental step from 1 to 2, for example, is the same as the mental step from 3 to 4.
As with the nominal scale, there are not many statistical tools we can use to analyze ordinal data. For example, we cannot calculate average satisfaction or average educational degree. But what if those values are on a numerical ordinal scale? Does it make sense to compute the average of numbers measured on an ordinal scale? This is a difficult question, and one that statisticians have debated for decades. The prevailing (but by no means unanimous) opinion is that for almost all practical situations, the average of an ordinal variable is a meaningful statistic. However, there are extreme situations in which computing the average of an ordinal variable can be misleading. You can explore these types of situations in David Lane's measurement simulation in the Practice section of this lesson.
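As a rough illustration of one such extreme situation, the sketch below compares two made-up groups of responses on a five-level satisfaction scale. The responses and the alternative "stretched" coding are assumptions chosen for illustration; this is not David Lane's simulation. Both codings preserve the order of the levels, yet they disagree about which group has the higher mean, while the medians agree.

```python
import statistics

# Hypothetical responses coded 1 (very dissatisfied) to 5 (very satisfied).
group_a = [1, 1, 5]
group_b = [3, 3, 3]

# An order-preserving recoding: the ranking of levels is the same,
# but the top level is spaced much further from the others.
stretch = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

def summarize(values):
    """Return (mean, median) for a list of coded responses."""
    return statistics.mean(values), statistics.median(values)

mean_a, median_a = summarize(group_a)
mean_b, median_b = summarize(group_b)
mean_a2, median_a2 = summarize([stretch[v] for v in group_a])
mean_b2, median_b2 = summarize([stretch[v] for v in group_b])

print("Original coding:  mean A < mean B?", mean_a < mean_b)
print("Stretched coding: mean A < mean B?", mean_a2 < mean_b2)
print("Median A < median B under both codings?",
      median_a < median_b and median_a2 < median_b2)
```

Since an ordinal scale only fixes the order of the levels, either coding is defensible, which is why a conclusion that depends on the choice of coding deserves caution.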
Interval Scales
Interval scales are numerical scales in which equal intervals are interpreted the same throughout. Interval scales are used for a number of types of quantitative data. As an example, consider the Fahrenheit temperature scale, which is expressed in degrees (°F). The interval between 30°F and 40°F represents the same temperature difference as the interval between 80°F and 90°F. This is because each 10-degree interval has the same physical meaning (in terms of the kinetic energy of molecules). Dates are also expressed on an interval scale. The difference between two successive days, for example, is the same regardless of the days chosen.
Limitations of Interval Scales
Interval scales are not perfect, however. In particular, they do not have a true zero point, even if one of the scaled values happens to carry the name "zero." The Fahrenheit temperature scale illustrates this issue. Zero degrees Fahrenheit does not represent the complete absence of temperature (the absence of any molecular kinetic energy). In reality, the label 0°F is applied to this temperature for quite accidental reasons connected to the history of temperature measurement. This is also true for dates: The year "zero" is quite arbitrary and does not represent the absence of time. Similarly, 0° longitude does not represent "no longitude," but rather an arbitrary north-south measurement on the earth's surface.
Because an interval scale has no true zero point, it does not make sense to compute ratios of values on an interval scale. For example, the ratio of 40°F to 20°F does not mean the same thing as the ratio of 100°F to 50°F, even though both equal 2 numerically; no interesting physical property is preserved across the two ratios. After all, if the "zero" label were applied at the temperature that the Fahrenheit scale happens to label as 10 degrees, the two ratios would instead be 30° to 10° and 90° to 40°, which are no longer the same! For this reason, it does not make sense to say that 80°F is "twice as hot" as 40°F. Such a claim would depend on an arbitrary decision about where to "start" the temperature scale, namely, what temperature to call "zero" (whereas the claim is intended to make a more fundamental assertion about the underlying physical reality).
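The arithmetic behind this argument is easy to check. The sketch below re-expresses the same four temperatures on a scale whose zero sits at 10°F; the helper name `shift_zero` is made up for this illustration. Ratios change under the shift, while differences do not.

```python
def shift_zero(temps_f, new_zero_f):
    """Re-express Fahrenheit readings on a scale whose zero sits at new_zero_f."""
    return [t - new_zero_f for t in temps_f]

pair_1 = [40, 20]
pair_2 = [100, 50]

# Ratios using the ordinary Fahrenheit labels: both come out to 2.
ratio_1 = pair_1[0] / pair_1[1]
ratio_2 = pair_2[0] / pair_2[1]

# Move the "zero" label to 10 degrees Fahrenheit and recompute.
shifted_1 = shift_zero(pair_1, 10)  # [30, 10]
shifted_2 = shift_zero(pair_2, 10)  # [90, 40]
shifted_ratio_1 = shifted_1[0] / shifted_1[1]  # 3.0
shifted_ratio_2 = shifted_2[0] / shifted_2[1]  # 2.25

# Differences are unaffected by where zero is placed.
diff_before = pair_1[0] - pair_1[1]
diff_after = shifted_1[0] - shifted_1[1]

print("Ratios before shift:", ratio_1, ratio_2)
print("Ratios after shift: ", shifted_ratio_1, shifted_ratio_2)
print("Difference before and after:", diff_before, diff_after)
```

This is the sense in which interval scales support statements about differences but not statements such as "twice as hot."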
Because the data are quantitative and because the distance between values is set and understood, it is possible to perform statistical analyses on data on the interval scale. We can, for example, calculate an average temperature of seawater or the average birth year for a group of people.
Ratio Scales
The ratio scale is the most informative scale of measurement. It
is an interval scale with the additional property that its zero pos
ition indicatesthe absence of the quantity being measured. You
can think of a ratio scale as the three earlier scales rolled into o
ne. Like a nominal scale, itprovides a name or category for each
object (the numbers serve as labels). Like an ordinal scale, the
objects are ordered (in terms of theordering of the numbers). Li
ke an interval scale, the difference between two places on the sc
ale has the same meaning regardless of the twopoints chosen. In
addition, the same ratio at two places on the scale also carries t
he same meaning.
An example of a ratio scale is the amount of money you have in your pocket right now (25 cents, 55 cents, etc.). Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true zero point: If you have zero money, this implies the absence of money. Since money has a true zero point, it makes sense to say that someone with 50 cents has twice as much money as someone with 25 cents, or that Bill Gates has a billion times more money than you do.
As with the interval scale, statistical analyses can be performed on data measured on a ratio scale.
Likert Scales
Many questionnaires use a type of scale called a Likert scale to gauge how people feel about particular issues. Typical responses for an item on the Likert scale are:
1. strongly disagree
2. disagree
3. neither agree nor disagree
4. agree
5. strongly agree
Similar rating scales are used frequently in psychological research. For example, experimental subjects may be asked to rate their level of pain, how much they like a consumer product, or their confidence in an answer to a test question.
Are Likert scales ordinal or interval? Researchers disagree, and it depends on the study. Certainly, if no effort has been made to make sure that the difference between any two successive ratings on the scale is constant, then the scale is ordinal and not interval. But sometimes researchers attempt to construct the study such that the differences between ratings are approximately equal. This type of scale is thus sometimes referred to as an "approximately interval" scale. Researchers will perform numerical statistical analyses on these data. It is important, however, to be careful when collecting and interpreting the data. Whether the data should be considered ordinal or interval can be extremely subjective, and it is often inappropriate to consider psychological measurement scales as either interval or ratio.
Summary of Measurement Scales
Scale | Description | Type of data | Examples | Pros | Analyze numerically?
Nominal | data described by name only | qualitative | shape, country, gender | allows data to be categorized | in a limited way (frequencies, percentages, proportions)
Ordinal | data that can be ranked in meaningful order | qualitative or quantitative | rank, position in a race | allows ranking of data | in a limited way if data are qualitative (frequencies, percentages, proportions); can sometimes be analyzed more thoroughly if the data are quantitative
Interval | data whose numbers indicate a set fixed difference between intervals, with an arbitrary rather than an absolute zero point | quantitative | temperature, date, sea level, longitude | provides information on the absolute numerical difference between two data points | yes
Ratio | interval data with an absolute zero point | quantitative | age, height, elapsed time | provides information on the ratio of values of two data points; tells where data are in relation to absolute zero measurement | yes
Introduction
Introduction to Charts and Graphs in Statistics
In the 1780s, Scottish economist and engineer William Playfair, founder of graphical methods in statistics, invented the line and bar graphs.
It is said that "a picture is worth a thousand words." We can say the same of graphs, which are pictorial representations of data. When a data set is represented in this way, patterns become apparent, and we can begin to see what the data might be telling us. In addition, graphs can help to categorize or group together data points, which can help us draw conclusions regarding the data set. Summarizing and categorizing data sets is a task aided by various graphing techniques. There are many different ways we can represent data, such as pie charts, bar charts, line graphs, stem plots, or histograms. Choosing the most appropriate graph depends on the information we want to include and how we can display that information most accurately.
When choosing an appropriate graph type, it is important to consider the variables that will be displayed. For categorical variables (which are nominal or ordinal in scale), pie charts and bar charts are more appropriate graphing techniques for displaying frequencies or percentages for each category of that variable. For example, if we wanted to display the breakdown of various racial groups in a sample, we would likely use a pie chart or bar chart to display the percentages or frequencies for these categories. The same could be said regarding other categorical variables, such as sex, age groups, rankings, and multiple-choice question responses.
For numerical variables (which are interval or ratio in scale), complex graphing techniques such as line charts or histograms may be more appropriate. For example, if we wanted to display the average weekly quiz scores during a 13-week course, a histogram would likely be a more appropriate way to represent these data. Other numerical variables that may call for these types of graphs include time, medication dosages, physical measurements (height, weight, temperature, etc.), and test scores.
In this lesson you will learn how to create various types of graphs and how to determine which type of graph is most appropriate for a particular data set. Pay special attention to what the graph is telling you about the data, whether the graph is being used to demonstrate a bias, and how different graphs provide specific information about the data set.
Using Charts and Graphs in Statistics
Charts and Graphs in Ashford Courses
You will need to understand graphs in a range of graduate courses, including those with a focus on psychological or organizational assessment and testing, measurement, research methods, and statistics. However, because graphs are commonly used to condense and present data findings, you may find graphs throughout textbooks and readings for various courses. If you are required to present findings from a literature review, or present an argument, you may need to create and/or interpret graphs in order to provide data and conclusions to others. In addition, having knowledge and skills in this area will help you better understand the review, results, and discussion sections of the research literature you may be asked to evaluate for various courses.
Charts and Graphs in Graduate Research
Graphs are typically used in some way to present the results of a research study. When conducting research it is important to determine and evaluate the appropriate graphs needed for all of the variables of interest for a particular study in order to provide an overview of the data in a concise and accurate manner. Graphs can range from tables that include the data collected or descriptive statistics of variables, to histograms of study results.
Charts and Graphs in the Professional World
Graphs are commonly used as a means of communicating information in the workplace, because they provide a quick and effective look at data results without requiring extensive expertise in math or statistics. In the professional world it is important to understand how graphs are created, when to use them, how to accurately portray data to others, and how to interpret graphs in the field. In addition, it is important that data are presented in a way that avoids bias or skewing of the results.
Tutorial
Data Tables
Suppose a group of researchers collects data for a study on anxiety among working mothers of infants and toddlers. How should the data be presented so that the researchers can analyze them easily and share them effectively with others? No one can work with a bunch of information scribbled on various pieces of paper. Data must be organized before they can be analyzed.
Researchers should think about how to display data before they begin collecting it. The first thing to do is create a data table. A useful data table is organized with a header row at the top (with each variable represented in a separate column), and an identifier column at the far left (with each individual in the study represented in a separate row). An example of a data table is shown below.
Table 3.1: Anxiety among Working Mothers - General Data
Sample ID | Age (years) | Marital status [1] | Number of children under 1 year of age | Number of children aged 1–3 | Average number of hours worked per week [2] | Job description | Type of day care [3] | Rating on survey item: I get upset easily or feel panicky. [4]
1 | 38.4 | M | 1 | 1 | 60 | financial executive | 5 | 4
2 | 26.2 | M | 2 | 0 | 40 | cashier | 1 | 3
3 | 34.4 | S | 1 | 1 | 40 | university professor | 4 | 2
4 | 42.8 | M | 1 | 0 | 30 | physician | 2 | 2
5 | 35.0 | S | N/A | N/A | 50 | physician | N/A | 3
[1] Marital status: M = married; S = single; D = divorced or separated from father
[2] Includes work done for job at home and on weekend
[3] Type of day care: 1 = in home with relative; 2 = in home with non-relative; 3 = private home day care; 4 = day care center at workplace; 5 = day care center not at workplace; N/A = control group
[4] Anxiety rating: 1 = a little of the time; 2 = some of the time; 3 = a good part of the time; 4 = most of the time
The table allows us to see the data collected for each person in the sample. Each row provides the data for all variables for a single individual sampled. Each column displays data from all individuals sampled for a single variable. We can easily view data collected for a single individual ("Sample #1 is 38 years old and works 60 hours per week.") or compare values of a particular variable ("The women sampled range in age from 26 to 42.").
Note that in a good table,
· Variables are clearly identified.
· Units are included for each variable (e.g., years).
· Rating systems and abbreviations are included in the headers or as a separate key.
One of the nice things about a table is that you can not only find data easily, but you can also sort it easily. We could, for example, sort the data in the table above by age, marital status, or any other variable. Having the data in a table also allows us to perform statistical calculations on the data fairly easily (the exact methods depend on the program you are using).
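As one illustration of sorting and comparing values in a data table, here is a minimal Python sketch. It holds three columns of Table 3.1 as a list of records, one per individual; the key names (sample_id, age, hours_per_week) are hypothetical labels chosen for this sketch, not part of the study.

```python
# Three columns of Table 3.1, one dictionary per row (individual).
# The key names are assumptions made for this illustration.
rows = [
    {"sample_id": 1, "age": 38.4, "hours_per_week": 60},
    {"sample_id": 2, "age": 26.2, "hours_per_week": 40},
    {"sample_id": 3, "age": 34.4, "hours_per_week": 40},
    {"sample_id": 4, "age": 42.8, "hours_per_week": 30},
    {"sample_id": 5, "age": 35.0, "hours_per_week": 50},
]

# Sort the table by one variable (age), youngest first.
by_age = sorted(rows, key=lambda row: row["age"])
ids_by_age = [row["sample_id"] for row in by_age]
print(ids_by_age)            # [2, 3, 5, 1, 4]

# Compare the values of a single variable across all individuals.
ages = [row["age"] for row in rows]
print(min(ages), max(ages))  # 26.2 42.8
```

Spreadsheet and statistics programs offer the same operations through menus or built-in functions; the idea of one row per individual and one column per variable carries over unchanged.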
Summary Data Tables
A good study requires a sufficiently large sample, which may include hundreds or thousands of individuals. In this case, it can be useful to create a number of additional tables that summarize the data in the original table. One useful summary table is a frequency table. Rather than display every single data point, we display the quantity of each value. Table 3.2 includes data that could have been derived from the data in Table 3.1.
Table 3.2: Anxiety among Working Mothers - Frequency of Anxiety Ratings
Anxiety rating | Frequency (number of responses) | Relative frequency (number of responses ÷ total number of responses) | Percentage (out of total responses)
1 | 93 | 0.06 | 6%
2 | 453 | 0.28 | 28%
3 | 782 | 0.49 | 49%
4 | 267 | 0.17 | 17%
TOTAL | 1595 | 1.00 | 100%
In the table above, frequency is the number of responses with a particular value. For example, 453 participants in the study reported an anxiety rating of 2. Relative frequency is the number of responses with a particular value (e.g., "anxiety rating 2") divided by the total number of responses. Relative frequency is a proportion relating the number of participants having a particular variable value to the total number of participants. In this case 453 participants out of 1595 total, or 0.28 of the total, reported an anxiety rating of 2. Percentage is simply relative frequency multiplied by 100. Relative frequency is useful when comparing two sets of data that do not have the same number of total values. For example, if we were to compare this set to another researcher's set of data, it would probably be more informative to say that 6% of participants reported an anxiety rating of 1 than to say that 93 reported an anxiety rating of 1.
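The relative frequencies and percentages can be reproduced from the raw frequencies with a few lines of Python. This is a minimal sketch using the counts from Table 3.2; the function name summarize is a hypothetical choice for this illustration.

```python
# Frequencies of each anxiety rating, taken from Table 3.2.
frequencies = {1: 93, 2: 453, 3: 782, 4: 267}

def summarize(freqs):
    """Map each value to (frequency, relative frequency, percentage)."""
    total = sum(freqs.values())
    return {
        value: (count, round(count / total, 2), round(100 * count / total))
        for value, count in freqs.items()
    }

total = sum(frequencies.values())
summary = summarize(frequencies)
print(total)  # 1595
for rating, (count, relative, percent) in summary.items():
    print(rating, count, relative, f"{percent}%")
# 1 93 0.06 6%
# 2 453 0.28 28%
# 3 782 0.49 49%
# 4 267 0.17 17%
```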
For data that are quantitative and continuous, it may be useful to summarize them even further by grouping or binning the values into ranges. For example, if we were to create a frequency table for age (in years), it would probably not be useful to include one row for each age. That would mean a lot of rows, many of which would likely have a low frequency. Instead, we indicate the number of individuals in certain age ranges, called intervals or bins.
Table 3.3: Anxiety among Working Mothers - Age Frequency
Age (in years) | Frequency | Relative frequency | Percentage
16–20 | 47 | 0.03 | 3%
21–25 | 190 | 0.12 | 12%
26–30 | 404 | 0.25 | 25%
31–35 | 487 | 0.31 | 31%
36–40 | 358 | 0.22 | 22%
41–45 | 82 | 0.05 | 5%
46–50 | 25 | 0.02 | 2%
51–55 | 2 | 0.00 | 0%
Total | 1595 | 1.00 | 100%
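Grouping ages into bins like those above can be sketched in Python as follows. This is an illustration only: the function name age_bin is hypothetical, and it assumes that fractional ages are truncated to completed years before binning, which the lesson does not specify. The five ages are the ones shown in Table 3.1.

```python
from collections import Counter

def age_bin(age, start=16, width=5):
    """Label of the 5-year interval containing an age.

    Assumption for this sketch: ages are truncated to completed years.
    """
    years = int(age)
    lower = start + ((years - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Ages of the five individuals shown in Table 3.1.
ages = [38.4, 26.2, 34.4, 42.8, 35.0]

bin_counts = Counter(age_bin(age) for age in ages)
for label in sorted(bin_counts):
    print(label, bin_counts[label])
# 26-30 1
# 31-35 2
# 36-40 1
# 41-45 1
```

With only five individuals the bins are nearly empty; the same function applied to the full sample would yield counts like those in the table.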
Charts and Graphs
Once data are organized, they can be analyzed. There are numerous calculations and statistical operations we can perform, but the first thing to do is get a general idea of what the data "look like." We do this by using graphs. Statisticians often graph data first in order to get a picture of the data. Then, they can use more formal tools to interpret the data.
A statistical graph is a tool that helps you learn about the shape or distribution of a sample. A graph can present data more effectively than a simple list of numbers; a graph shows where data are clustered and where they are more scattered. We can also easily see maximum values, minimum values, and outliers (values that are very different from the rest). Graphs also allow us to see trends and compare facts and figures quickly. This can be difficult, if not impossible, using a table with thousands of data points.
Some types of graphs that we can use to summarize and organize data are stem-and-leaf plots (stemplots), bar graphs, pie charts, and histograms.
Stem-and-Leaf Plots (Stemplots)
One simple graph, the stem-and-leaf plot (or stemplot), is useful when the data sets are small. To create the plot, first identify the stem and leaf of each piece of data. The leaf is usually the last digit of the number; the stem is the rest of the number. For example, the number 23 has stem 2 and leaf 3. The number 432 has stem 43 and leaf 2. The number 5,432 has stem 543 and leaf 2. The decimal 9.3 has stem 9 and leaf 3.
Next, write the stems in a vertical line from smallest to largest. Then draw a vertical line to the right of the stems. Finally, write the leaves in increasing order next to their corresponding stem. (The stem and leaves should make sense for the data set. Look at the range of points and see how it best makes sense to divide the stem and leaves, and then group the data.)
For example, the scores for a final exam are as follows:
12; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100
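The stemplot looks like this:
Stem | Leaf
1 | 2
2 |
3 |
4 |
5 | 3 5 5
6 | 1 3 7 8 8 9 9
7 | 2 3 4 8
8 | 0 3 8 8 8
9 | 0 2 4 4 4 4 6
10 | 0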
The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight of the 28 scores listed, or approximately 29%, were 90 or above; on a typical grading scale this represents a fairly high number of "A"s. Notice that in the stemplot "0" does not indicate a lack of data, but rather a value (an exam score). The lack of data for a particular stem is indicated by no values in the leaf column.
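The construction steps can also be carried out programmatically. The sketch below is one possible way to do it, assuming whole-number data, a leaf equal to the last digit, and one output line for every stem between the smallest and largest (so that empty stems remain visible). The function name stemplot is a hypothetical choice.

```python
from collections import defaultdict

# The exam scores listed above.
scores = [12, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78,
          80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

def stemplot(values):
    """Return one text line per stem, with leaves in increasing order."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leaf = last digit, stem = the rest
        leaves[stem].append(str(leaf))
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # A stem with no data gets an empty leaf column.
        row = " ".join(leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

plot_lines = stemplot(scores)
print("\n".join(plot_lines))
```

The first line printed is the stem 1 with the single leaf 2, and the last is the stem 10 with the leaf 0, which stands for the score of 100.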
The stemplot is a quick way to graph and gives a succinct picture of the data. You want to look for an overall pattern and any outliers. An outlier, or extreme value, is a piece of data that does not fit well with the rest of the data. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (e.g., writing 50 instead of 500), while others may indicate something unusual. It takes some background information to explain outliers. In the example above, 12 is an outlier.
Bar Graphs
Another type of graph that is useful for specific data values is a bar graph. Bar graphs display data in separate bars. Bar graphs can be used for qualitative or quantitative variables, and the bars can be vertical or horizontal. The figure below shows a frequency table with its corresponding graph. Frequencies are represented by the heights of the bars.
Table 3.4: Anxiety among Working Mothers
Anxiety rating | Frequency
1 | 93
2 | 453
3 | 782
4 | 267
Figure 3.1
The same data could also be presented in terms of relative frequency or percentages.
Figure 3.2
Figure 3.3
Pie Graphs
Another way to display proportions is using a pie graph, or pie chart. In a pie graph, proportions are shown as pieces of a circular "pie." The entire pie is equal to 100%.
Figure 3.4
It is important to note that you can use a pie graph only for proportions, and those proportions must add up to 100%. You cannot use a pie chart to compare two variables, for data that overlap, or for data that don't represent the entire sample or population in a study. For example, say a data set includes annual deaths from malaria and from HIV/AIDS.
Table 3.5: Deaths from Infectious Diseases
Disease | Percent of all deaths
Malaria | 2.23
HIV/AIDS | 4.87
We could create a pie graph to display these data. Although it does give us a visual idea of the difference in deaths due to the two diseases (HIV/AIDS kills roughly twice as many people as does malaria), the graph is misleading because it suggests that just two diseases account for all deaths. Unless we change the title of the pie graph to "Cause of Death of People Who Die of Either Malaria or HIV/AIDS," the graph is not appropriate for the study.
Figure 3.5
A bar graph, however, would be quite useful and would not be
misleading.
Figure 3.6
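The requirement that pie graph proportions add up to 100% can be checked before a chart is drawn. The following is a minimal sketch; the function name covers_whole and the tolerance of half a percentage point (to allow for rounded percentages) are assumptions made for this illustration.

```python
def covers_whole(percentages, tolerance=0.5):
    """True if the percentages add up to about 100 (allowing for rounding)."""
    return abs(sum(percentages.values()) - 100) <= tolerance

# Percent of all deaths, from Table 3.5.
deaths = {"Malaria": 2.23, "HIV/AIDS": 4.87}

# Percentages of anxiety ratings, from Table 3.2.
anxiety = {1: 6, 2: 28, 3: 49, 4: 17}

print(covers_whole(deaths))   # False: the two diseases cover only about 7.1%
print(covers_whole(anxiety))  # True: the four ratings cover the whole sample
```

A result of False suggests that a bar graph is the safer choice, as in the malaria and HIV/AIDS example.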
Histograms
Bar graphs can be used to represent qualitative or quantitative data. When the data are quantitative and use an interval or ratio scale, they can be displayed in a special type of bar graph called a histogram. A histogram is a bar graph of the frequencies of numerical values of a sampled population.
A histogram consists of contiguous columns (columns without spaces between them). The horizontal axis is labeled with the variable being measured (for instance, distance from your home to school). The vertical axis is labeled either "frequency" or "relative frequency." Again, frequency is just the count of data points with a particular value. Relative frequency is really the same as proportion or percentage, and is equal to the frequency divided by the total number of data points. For example, in the table below, the relative frequency of mothers sampled that were aged 16–20 was 47/1595 = 0.03. Relative frequencies should always add up to 1.0, apart from small rounding differences.
Table 3.6: Study of Anxiety in Working Mothers
Age | Frequency (number of mothers) | Relative frequency (number of mothers in each bin divided by total number of mothers)
16–20 | 47 | 0.03
21–25 | 190 | 0.12
26–30 | 404 | 0.25
31–35 | 487 | 0.31
36–40 | 358 | 0.22
41–45 | 82 | 0.05
46–50 | 25 | 0.02
51–55 | 2 | 0.00
Total | 1595 | 1.00
Two histograms of the data are shown below.
Figure 3.7
Figure 3.8
Notice that the graphs have the same shape whether we plot absolute frequency, relative frequency, or percentage. Absolute frequency is commonly used when the data set is small; relative frequency is used when the data set is large or when we want to compare several distributions. For example, if we had another set of data showing the distribution of ages of non-working mothers in the study, unless the number of mothers in each set was the same, we would probably want to use relative frequency rather than absolute frequency. We can use a histogram to see the shape of the data distribution. As we will see in other lessons, we can also use it to estimate certain statistics, such as the mean, or average.
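The reason the shape does not change is that relative frequency is just frequency divided by a constant, so every bar shrinks by the same factor. The sketch below illustrates this with the age frequencies from Table 3.6, drawing a rough text histogram; the function name bar_lengths and the 40-character bar width are arbitrary choices for this illustration.

```python
# Age-bin labels and frequencies from Table 3.6.
labels = ["16-20", "21-25", "26-30", "31-35",
          "36-40", "41-45", "46-50", "51-55"]
frequencies = [47, 190, 404, 487, 358, 82, 25, 2]

total = sum(frequencies)
relative = [count / total for count in frequencies]

def bar_lengths(values, width=40):
    """Scale values so that the tallest bar is `width` characters long."""
    tallest = max(values)
    return [round(width * value / tallest) for value in values]

absolute_bars = bar_lengths(frequencies)
relative_bars = bar_lengths(relative)

# Both versions produce identical bars, so one drawing serves for both.
print(absolute_bars == relative_bars)  # True
for label, length in zip(labels, absolute_bars):
    print(f"{label} {'#' * length}")
```

Only the numbers on the vertical axis differ between the two versions; the picture itself is the same.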
One advantage of a histogram is that it can readily display large data sets. A rule of thumb is to use a histogram when the data set consists of 100 values or more.
Bin Widths
There is more to be said about the widths of the class intervals, sometimes called bin widths. Your choice of bin width determines the number of class intervals. This decision, along with the choice of starting point for the first interval, affects the shape of the histogram. This is important because the shape affects how much information can be seen from the graph alone and how people interpret the graph.
For example, the graph in Figure 3.9 has a narrow bin width, or class interval, of 1 month. The number of class intervals is 44: one for each month. In this case, the bin width is so narrow that it is hard to see any pattern in the data.
Figure 3.9
In Figure 3.10, however, the same data are plotted in 8 bins, each having a bin width of 6 months. The wider bin width makes the data easier to plot because there are fewer bins to plot. We can also see a pattern in the data that was not apparent in Figure 3.9.
Figure 3.10
If the bins are too wide, however, the data become harder to analyze. In Figure 3.11, the same data are plotted again, but with a bin width of 30 months. As you can see, it is impossible to get an idea of the real distribution of data using this width.
Figure 3.11
The best thing to do is experiment with different widths, and to choose a histogram according to how well it communicates the shape of the distribution.
When choosing a bin width, remember that shifting the intervals
can also affect the appearance of the data.
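Experimenting with bin widths and starting points is easy to do in code. The sketch below counts values per bin for several choices. The data are hypothetical values in months, invented for this illustration (they are not the data behind Figures 3.9 to 3.11), and the function name count_by_bin is likewise an assumption.

```python
def count_by_bin(values, width, start=0):
    """Count values per bin; each bin covers [lower, lower + width)."""
    counts = {}
    for value in values:
        lower = start + ((value - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))  # bins with no data are omitted

# Hypothetical times in months, invented for this sketch.
months = [3, 8, 8, 10, 12, 13, 15, 15, 17, 21, 22, 24, 27, 37, 40, 44]

# Very narrow bins: almost every value sits in its own bin.
print(len(count_by_bin(months, width=1)))  # 14

# Moderate bins: a peak around 12 to 17 months becomes visible.
print(count_by_bin(months, width=6))
# {0: 1, 6: 3, 12: 5, 18: 2, 24: 2, 36: 2, 42: 1}

# Very wide bins: nearly all detail is lost.
print(count_by_bin(months, width=30))
# {0: 13, 30: 3}

# Same width, shifted starting point: the peak flattens out.
print(count_by_bin(months, width=6, start=3))
# {3: 3, 9: 3, 15: 3, 21: 3, 27: 1, 33: 1, 39: 2}
```

The last two calls use the same width of 6 but different starting points, and the resulting counts suggest quite different shapes for the same data.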
Summary of Charts and Graphs
· Data tables are used to collect, display, and sort data.
· Summary data tables include frequencies and percentages that summarize large sets of data.
· Graphs allow a researcher to quickly see trends, clusters, and maximum and minimum data values.
· Stem-and-leaf plots (or stemplots) provide a graphical representation of the frequency of values in small data sets.
· Bar graphs have vertical or horizontal columns to represent values of qualitative or quantitative variables.
· Pie graphs are used to depict parts of a whole.
· Histograms represent the frequencies or relative frequencies of quantitative variables.
· The look of a histogram is highly influenced by bin width. Graphs with bins that are too narrow or too wide can be difficult to interpret.
Tutorial
Mean
The mean of a set of data is also known as the average. It is the sum of the values of all of the data points divided by the number of data points. For example, to calculate the mean selling price of 50 houses, add the 50 prices together and divide by 50:
mean = (sum of all values in the sample) / (number of values in the sample)
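As a quick illustration of the calculation, here is a minimal Python sketch. The five house prices are hypothetical values invented for this example, and the function name mean is our own choice.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical selling prices of five houses, invented for this sketch.
prices = [250_000, 310_000, 275_000, 199_000, 420_000]

print(sum(prices))   # 1454000
print(mean(prices))  # 290800.0
```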
Example 1
The data below indicate the number of months that 40 AIDS patients lived after beginning treatment with a new antibody drug. What is the mean survival time of this sample of patients?
10
37
17
8
13
27
12
24
14
8
15
44
16
40
15
18
21
22
22
25
11
Median
The median of a set of data is the middle data point. If you put all the data points in order, the median value is the value of the point in the middle. If there are an odd number of data points, the median is the value of the point exactly in the middle. If there are an even number of data points, the median is the average of the two points in the middle. For example, to find the median price of 50 houses, we would put all of the prices in order, from lowest to highest. We would then look at the middle point. In this case, since 50 is an even number, we would take the 25th and 26th prices, and find their average.
median = middle value, or average of the two middle values in the sample
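The odd and even cases can be handled in a few lines of Python. This is a minimal sketch; the function name median and the small example lists are our own illustrative choices.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of points: the single point exactly in the middle.
        return ordered[middle]
    # Even number of points: average of the two points in the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5

# With 50 values, the median averages the 25th and 26th values in order.
print(median(list(range(1, 51))))  # 25.5
```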
Example 2
What is the median survival time of the sample of AIDS patients described in Example 1?