SURVEY USAGE AND
FINDING CORRELATIONS
Survey and Correlation Research Designs
https://my.visme.co/render/1454630688/www.erau.edu
Slide 1 Transcript
Surveys and correlations are usually found in quantitative
approaches, although some specific forms may be in
qualitative designs, as well. In this module, types of surveys,
development of items used, and overall survey design and
administration will be the subject of interest. The second
component of the module is about correlation analysis. Included
in both descriptive and inferential statistics, correlations
describe the relationship between two paired variables. Surveys
and correlations are regarded as nonexperimental research
and will serve to cover that category among the three research
designs introduced earlier in the course.
Survey Designs
Determine incidence,
frequency, distributed
characteristics
Opinions, attitudes,
previous experience
Self report
Sample taken from
larger population
Census taken from
smaller population
Summarized as percentages,
frequencies, indices
Particular point in time,
like a snapshot
Categories of surveys
Instrumentation
(interviews, questionnaires)
Span of time
cross-sectional
longitudinal
(trend, cohort, panel,
follow-up)
Slide 3 Transcript
A survey is a study designed to determine the incidence,
frequency, and distribution of certain characteristics in a
population. Some of the characteristics might include opinions,
attitudes, or previous experiences using self-report responses.
Surveys are either sample surveys or census surveys. Usually, a
sample is taken from a population; this kind of survey is often
called a descriptive or normative survey. Census surveys are usually
conducted when a population is relatively small and readily
accessible. Once the responses are tabulated they are
summarized in percentages, frequencies, or indices. Surveys
often are depicted as snapshots, or moments captured in a
particular frame of time. There are other types of surveys,
e.g., geological or inventory types, but here we are focusing on
those done with humans. Surveys can be categorized by
instrumentation or by span of time involved. Instrumentation
designs include interviews or questionnaires either of which
might be conducted orally or in writing. The span of time
needed to complete a survey might be cross-sectional for
information at a single period in time to compare variables, or
longitudinal with multiple data collection points over an
extended period to examine changes. A key aspect, then, is the
number of times the survey is administered. Cross-sectional
surveys have the advantage of providing data relatively quickly.
However, cross-sectional surveys do not provide a broad
perspective to inform decisions about reliably changing
systems, or to understand trends over time.
Longitudinal surveys, though, require an extended period of
time to study an issue. Longitudinal surveys are categorized
into four types – trend surveys address a particular trait or
characteristic, cohort surveys involve a population selected at a
particular time, panel surveys follow the same individuals
throughout the study, and follow-up surveys address
change in a previously studied population.
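As a small illustration of the tabulation step described above, the
following Python sketch counts the answers to one closed-ended item and
reports each option as a frequency and a percentage. The function name
`summarize_responses` and the sample answers are hypothetical, made up
for illustration only; they do not come from the lecture.

```python
from collections import Counter


def summarize_responses(responses):
    """Return {option: (frequency, percentage)} for one closed-ended item."""
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter(responses)
    # Percentage is the share of all responses, rounded to one decimal place.
    return {option: (n, round(100 * n / total, 1)) for option, n in counts.items()}


# Hypothetical answers to a single survey item (illustrative data only).
answers = ["agree", "agree", "neutral", "disagree", "agree"]
summary = summarize_responses(answers)
for option, (freq, pct) in summary.items():
    print(f"{option}: n = {freq} ({pct}%)")
```

A census survey and a sample survey would be tabulated the same way;
the difference lies in who was asked, not in how the answers are
summarized.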
Types of Survey Items
Existing surveys
or items
Forms of
surveys
Questionnaires (scales)
Interviews
Structured
surveys
(closed-ended)
Limits responses
available
Common with interviews,
but with some open
responses
Unstructured
surveys
(open-ended)
Difficulties with clarity and
categories
Partial structure
Slide 5 Transcript
Because survey researchers often seek information that is not
already available, they usually need to develop an appropriate
instrument, i.e., questions. When a valid and reliable
instrument already exists, researchers can use it, but what is
more important is that the survey contains the appropriate
questions. Surveys generally take one of two forms:
questionnaires, in which a written collection of questions is
put to a particular group of participants, or interviews, in
which a person-to-person question-and-answer session takes
place between a researcher and a participant. For
questionnaires, no item should be included that is not directly
related to the topic. Scales are often used in questionnaires,
e.g., semantic differential or Likert, or ranked items, e.g., in
order of importance. Most survey items are structured, which is
to say they are closed-ended items that limit the response
options available. Interviews are more likely to use structured
questions but allow for open-ended responses. An unstructured
item allows participants complete freedom in their responses,
and although such items are simpler to construct, they present
several challenges for the researcher. Sometimes respondents
will not take the time to answer these items (especially with a
questionnaire), or they may give unclear answers. Also, scoring
these items is more difficult and time consuming than with
closed-ended items. This is due, in part, to having to
categorize the answers and develop a procedure to decide which
category each response belongs in. A variation of the
open-ended item is the partially open-ended one, which
restricts some answers but has items or choices where the
participant can respond more freely. In interviews, which are
more likely to use open-ended items, participants can wander
briefly from the topic or include multiple responses, which
presents difficult decisions about how to categorize the
statements.
Rules for Writing Surveys
Questionnaires
Attractive, brief, and easy
Every item related to topic
Provide point of reference
For questionnaires or
interviews
Avoid leading or sensitive questions
Avoid multiple questions in same
item (double-barreled)
Avoid patterns in questions
Range in scales must be appropriate
Observe neutral terms
Slide 7 Transcript
Questionnaires should be attractive, brief, and easy to respond
to. For questionnaires, no item should be included that is not
directly related to the topic. Structured items work best if the
participant only needs to circle the answer, or check it on the
computer, rather than write down a response. Try to provide a
point of reference, like a comparison to something, that guides
participants in answering the questions. Whether using
questionnaires or interviews, avoid leading questions that might
suggest one response is more appropriate than another. You also
should avoid sensitive questions that the participant may not
answer honestly. And don’t ask a question that assumes a fact
that is not necessarily true. Also avoid asking questions that
have multiple responses built in, known as the double-barreled
item. You would also want to avoid questions that begin to
establish a pattern in their answers; this is referred to as
the response set pitfall. One way to offset this tendency is to
reverse the scale on some items; however, doing so can also
increase the possibility of inaccurate responses by
participants. Where you have several rating scales, they should
be consistent in their range and numbering, since participants
don’t necessarily agree about what various points along a scale
mean and you can, at least, keep their responses in a
consistent pattern. With a Likert-type scale, you will often
see a range of five points, which allows for a neutral point.
However, limiting responses to four choices forces a respondent
to move more to one side than the other. Scales that extend to
7 or 10 points offer more variability, but it can be very
difficult to validate the smaller differences and their
significance. When constructing survey items, it is best to use
neutral terms with regard to gender, race, and other sensitive
matters, unless, of course, these are objectives of the survey.
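When the scale is reversed on some items to offset a response set,
those items are commonly recoded before scoring so that every item
points in the same direction. The sketch below shows one way this
might be done in Python, assuming a 1-to-5 Likert-type scale; the
recoding step itself, the function name `reverse_score`, and the
sample ratings are assumptions for illustration and are not taken
from the lecture.

```python
def reverse_score(score, low=1, high=5):
    """Flip a rating on a low..high scale (1<->5 and 2<->4 on a 5-point scale)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    # The midpoint maps to itself, so the neutral point is preserved.
    return low + high - score


# Hypothetical raw ratings on a reverse-worded item (illustrative data only).
raw = [5, 4, 2, 1, 3]
recoded = [reverse_score(s) for s in raw]
print(recoded)
```

Keeping the range consistent across items, as recommended above,
means a single pair of `low` and `high` values works for the whole
questionnaire.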
Survey Administration
Have a plan for
capturing responses
Ways to administer
Records
Mail
Telephone
Group or individual
Electronic
Field test
Adjust, schedule, consent,
administer
Mailing may include inaccurate
addresses
Computerized surveys
Confidentiality and anonymity
not assured
Survey return rates can
be challenging
Tele-interviews invite
a few concerns
Slide 9 Transcript
Before administering a survey, the researcher must have a clear
plan for how to capture the responses and other data. Using a
computer or smartphone with a checklist or spreadsheet is
one way. Having an electronic template provides flexibility and
the opportunity to gather information on individual
participants or situations. Other technology, like Cybertracker,
can follow a participant into situations the researcher can’t
observe directly. There are several ways to administer surveys,
e.g., using records, mailing the survey, doing telephone
surveys, conducting group or individual interviews, and, of
course, electronically. First, though, a field test or pretest
should be conducted with a few volunteers to find weak spots in
the questions. Once adjustments are made, the sample or group
is identified using procedures covered earlier, and times are
arranged. The next step is to obtain informed consent to
participate. Often, as part of this process, the researcher may
first meet with participants in person or by video to explain
the purpose of the survey. A questionnaire might be accessed at
a particular website, which is especially relevant for selected
samples where access to the questionnaire needs to be limited.
When mailing, several issues may arise, like inaccurate
addresses, misdeliveries, or uncertainty about who actually
responded to the survey. More open surveys can use a link to a
service like Survey Monkey, although it is difficult to control
who is actually completing the questionnaire, and it also is
difficult to screen out those who may not meet the profile for
your desired type of participant. One of the problems with
sending out questionnaires electronically is in providing
confidentiality or anonymity. Maximizing the return rate for a
questionnaire is important, and strategies to improve the rate
might include choosing a convenient time, making a good first
impression when recipients see the questionnaire, and
motivating potential responders. Sometimes, an advance letter
can be sent indicating they will be receiving a survey, or a
nominal payment may be offered. A survey of records uses a
different source than the other surveys discussed here because
records are nonreactive and do not involve a direct response
from a person. Some of the problems with records surveys are
that the information may be incomplete, yearly comparisons may
not be compatible, or the purpose of the records may be
unrelated to the purpose of the survey. Interviews can also be
conducted by telephone or through an electronic process, like
Skype. Teleinterviews, though, present some potential
artifacts, or contaminating influences, when cameras are used
or sound quality is poor.
Correlational Research Designs
Survey data often used with correlational designs
Considers extent of differences in one variable with another
Correlation = one variable increases > predicts another
Quantitative data are required
Often investigate a major, complex variable
High correlation prompts further causal-comparative studies
Higher the correlation, more accurate the predictions
Slide 11 Transcript
Survey data are often used in correlational research designs. A
correlational study considers to what extent differences in one
variable or characteristic are associated with differences in
other characteristics or variables. A correlation exists if, when
one variable increases, another predictably increases or
decreases. In this way, knowing the value of one variable allows
for prediction of the value of another variable. The degree of
relation is expressed as a correlation coefficient. To perform a
correlational study, quantitative data are required. Keep in
mind, this may apply within nearly any type of research design.
Correlational research is sometimes regarded as a type of
descriptive research because it describes an existing condition.
Correlational studies typically investigate several variables
believed to be related to a major, complex variable, something
like achievement. Variables found to be highly related are
studied further in causal-comparative or experimental studies to
determine the nature of the relations. We have heard many times
that a high correlation between two variables does not
imply that one causes the other. But even though correlational
relations are not cause-effect ones, they still permit
prediction. The higher the correlation, the more accurate the
predictions become. There is a temptation for some researchers
to correlate all manner of variables just to see what might
turn up, but this is strongly discouraged and is very
inefficient.
Relationship Between Variables
Correlation coefficient =
+1.00 to -1.00
High positive correlation
> high relationship
with the other
Indicates size and
direction of relationship
High negative correlation
> low relationship
with the other
Less than .35 is weak
Between .35 to .65 is
moderate
Above .65 is strong
Correlation coefficient >
shared variance
between variables
Slide 13 Transcript
When two variables are correlated, the result is a correlation
coefficient, which is a decimal ranging from +1.00 to -1.00. The
coefficient indicates the size and direction of the relationship.
A coefficient near +1.00 represents a strong relationship in a
positive direction. In other words, with a high positive
correlation, a high value on one variable is likely to go with a
high value on the other variable. If the coefficient is near
zero, the variables are not related. Now, if there is a high
negative value, approaching -1.00 for instance, it means there
is a strong relation, and that when the value on one variable is
high, the value on the other variable will be low. As a guide,
correlation coefficients can be interpreted in bands, e.g., less
than .35 is weak, between .35 and .65 is moderate, and greater
than .65 is strong. A coefficient less than .50 is generally
considered useless for prediction. Those above .60 are adequate
for group prediction purposes and those above .80 for
individual predictions. Many beginning researchers mistakenly
think a correlation coefficient of, say, .50 means that two
variables are 50% related. This is incorrect. The square of the
correlation coefficient indicates the amount of variance shared
by the variables, so a coefficient of .50 means the variables
share 25% of their variance. Common variance, also called shared
variance, indicates the extent to which variables vary in a
systematic way. Shared variance is the variation in one variable
that is attributable to its tendency to vary with another
variable. So, the more the common variance, the higher the
correlation coefficient. If two variables are perfectly related,
the variability of one set of values is fully shared with the
variability in the other set of values.
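The interpretation guide above can be sketched in a few lines of
Python: square the coefficient to get shared variance, and use its
absolute size to pick a band. The function names are hypothetical,
and treating exactly .35 and .65 as "moderate" is an assumption, since
the lecture does not say which band the boundary values belong to.

```python
def shared_variance(r):
    """Proportion of variance two variables share: the square of r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r ** 2


def strength_band(r):
    """Label r with the guide bands; the sign shows direction, not strength."""
    size = abs(r)
    if size < 0.35:
        return "weak"
    if size <= 0.65:  # assumption: boundary values count as moderate
        return "moderate"
    return "strong"


# Hypothetical coefficients chosen to land in each band.
for r in (0.20, 0.50, -0.80):
    pct = 100 * shared_variance(r)
    print(f"r = {r:+.2f}: {strength_band(r)}, shared variance = {pct:.0f}%")
```

Note that a coefficient of -.80 is labeled strong: the negative sign
only says the variables move in opposite directions.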
Limitations and Interpretation
Interpretation depends
on how the coefficient is
used
Prediction study
Accuracy in predictions
Hypothesis study
Statistical significance
Not importance, but likelihood
of chance
Relative to sample size (df)
Not about cause and effect
Techniques to determine
correlation coefficient
Pearson r
(ratio or interval data)
Spearman rho
(rank or ordinal data)
Phi coefficient
(categorical data)
To avoid possibility of
underestimation
Eta coefficient if curvilinear
Low reliability yields
attenuation
Restricted range of values
Slide 15 Transcript
Interpretation of a correlation coefficient depends on how it is
used. In a prediction study, the correlation coefficient value is
important for accurate predictions. In a study to examine
hypothesized relations, a correlation coefficient is interpreted in
terms of its statistical significance. Statistical significance
refers to the probability that the results would have occurred
just by chance. As always, keep in mind that significance does
not mean importance; rather, it indicates the probability of the
results compared with chance occurrence. Another aspect of
significance testing with correlations is that it is computed
relative to the sample size. To demonstrate a significant
relation, the correlation coefficient for a small sample must
be higher than that for a larger sample. The difference is due
to the larger degrees of freedom for the larger sample. As
mentioned before, it is essential to keep in mind that
correlation is about relation, not cause and effect. While it can
be tempting to conclude that high correlation values establish
cause and effect, they merely suggest it and open the way
to experimental designs to confirm whether true causation
exists.
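To make the point about sample size concrete, here is a minimal
Python sketch using the standard t test for a Pearson correlation,
t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The
lecture does not name a specific test, so the choice of this formula,
the function name `t_for_r`, and the example coefficient of .40 are
assumptions made for illustration.

```python
import math


def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero.

    Computed as r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.40  # hypothetical coefficient, identical in both samples
for n in (10, 100):
    print(f"n = {n:3d}, df = {n - 2:2d}, t = {t_for_r(r, n):.2f}")
```

The same coefficient gives a t of about 1.23 with 10 participants and
about 4.32 with 100. Against two-tailed .05 critical values of roughly
2.31 (df = 8) and 1.98 (df = 98), only the larger sample would be
judged significant.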
The most common technique for computing a correlation
coefficient is the product moment correlation coefficient,
usually referred to as the Pearson r, which is used when both
variables are expressed as continuous, i.e., ratio or interval,
data. The Pearson r results in the most precise estimate of
correlation. If the data for at least one variable are rank or
ordinal data, the more appropriate technique is the rank
difference correlation, known as the Spearman rho. The Spearman
rho is useful with a small number of participants and is
relatively easy to compute. Used less often is the phi
coefficient, which applies when both variables are expressed
categorically as dichotomies, e.g., by gender or political
affiliation. Even so, there are different approaches depending
on whether such dichotomies are natural or artificial, i.e.,
created by operationally defining a midpoint and categorizing
values as falling above or below it. Most correlational
techniques are based on the assumption of a linear
relationship, one in which a change in one variable corresponds
with a change in the other. However, some relationships are
curvilinear, and an eta coefficient is appropriate; otherwise,
computations may show that little or no relationship exists.
There are other factors that may lead to underestimation.
Attenuation reduces the values and occurs where the measures
being correlated have low reliability. A second limitation
might occur if there is a restricted range of values in the
data. This is because the more variability in each set of
values, the higher the correlation coefficient is likely to be.
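The two most common techniques named above can be sketched in plain
Python. Here the Spearman rho is computed as a Pearson r on ranks,
which is equivalent to the rank difference approach when tied values
receive the average of their ranks; that implementation choice, the
function names, and the sample data are assumptions for illustration
rather than the lecture's own procedure.

```python
import math


def pearson_r(x, y):
    """Product moment correlation for paired interval or ratio data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Rank correlation for ordinal data: Pearson r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

With these values the Spearman rho is 1.00 because the rank orders
match exactly, while the Pearson r is about .98 because the points do
not fall on a straight line.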
Correlation, Regression, and Prediction
Linear regression
Simple
(one IV predicts for the DV)
Multiple (two or more IV
predict the DV)
Scatter Plot
Shows direction and strength
of correlation
Points plotted on axes for x
(predictor v.) and y
(criterion v.)
Line of best fit
as Y = bX + a
use equations for b (slope)
and a (intercept)
calculated using sum of least
squares
Slide 17 Transcript
A regression equation examines how accurately one or more
variables enable predictions to be made regarding values of
another. A simple linear regression generates an equation in
which a single independent variable yields a prediction for the
dependent variable. A multiple regression yields an equation in
which two or more independent variables are used to
predict the dependent variable. Covered previously was how
correlation indicates the direction of a relationship (positive
or negative) between variables and the strength of the
relationship as expressed in the correlation coefficient. Both
of these, direction and strength, can be shown in a scatter
plot, or diagram, that summarizes how the factors are related.
Data points are plotted on x and y axes. Then, the regression
line is the best-fitting straight line, the one that minimizes
the distance that all data points fall from it. The extent to
which two factors are related is shown by how far the points
fall from the regression line. The issue of variability for
interpreting correlation coefficients was explained earlier. A
scatter plot and regression line are visual depictions of this
variability. With linear regression, first it is necessary to
identify the predictor variable, which is the one we know and
is plotted on the x axis, and the criterion variable, the one
we don’t know, which is plotted on the y axis. Then, a bit of
geometry is needed to complete the equation. There are many
expressions for the equation of the line of best fit, and the
one we use here is Y = bX + a. Y is the criterion variable, X is
the predictor variable, b is the slope of the regression line,
and a is where the line crosses the y axis (the intercept), or
how high up the y axis it goes. The best-fitting line is
calculated using the least squares method with the means and
standard deviations of the x and y values. For any potential
line, the vertical distance between the line and each point is
squared, and these squares are summed, so that the line with
the least sum of squares is selected as the best fit. The
formulas for b and a are derived using calculus, and the
derivation is beyond the scope of what is covered here. So, it
really is not something you can eyeball and draw to precision,
although you would probably be very close.
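Although the derivation is left out, the resulting least squares
formulas for Y = bX + a are short enough to sketch. The Python below
uses the standard closed-form results: b is the sum of cross-products
of deviations divided by the sum of squared deviations in X (the same
value as r times the ratio of the standard deviations), and a is the
mean of Y minus b times the mean of X. The function names and the
study-hours data are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares slope b and intercept a for the line Y = bX + a."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the predictor has no variability")
    b = sxy / sxx            # slope of the regression line
    a = mean_y - b * mean_x  # intercept: where the line crosses the y axis
    return b, a


def predict(b, a, x_value):
    """Predicted criterion value for a known predictor value."""
    return b * x_value + a


# Hypothetical data: hours studied (predictor) and quiz score (criterion).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
b, a = fit_line(hours, scores)
print(f"Y = {b:.2f}X + {a:.2f}")
print(f"Predicted score for 6 hours: {predict(b, a, 6):.2f}")
```

For these values the fitted line is Y = 0.60X + 2.20, so six hours of
study predicts a score of about 5.8.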
Validity and Reliability Effects
Correlation depends on
how well variables are
measured
Poor reliability or validity
diminishes relations
Extreme values affect the
means
Reminder:
Correlation does not
indicate causation
Does not describe influence
or enhancement
Assists in highlighting areas
for further research
Experimental designs
needed to establish
cause-effect
Slide 19 Transcript
A statistical correlation between variables depends on how well
those variables have been measured. In quantitative research
designs, if the instruments used have poor validity
and reliability, the likelihood of finding any real relationship
is dim. One thing is certain: you can discover substantial
correlations between characteristics only if you can
measure both variables with a reasonable level of validity and
reliability. One of the threats to regression equations is the
presence of extreme values, since they affect the means
and the calculation of the line of best fit. As mentioned
before, correlation, either positive or negative, does not
necessarily indicate causation. This means that a correlation
should not be described as influencing or enhancing something,
since that implies a cause and effect relationship, which is
something a correlation cannot provide. That remains the
province of experimental designs. So, a correlation is like a
signpost: it should lead you to wonder about the underlying
reason for the relationship and stimulate further research that
is more directed toward determining cause and effect for the
association.
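The threat posed by extreme values can be seen in a short Python
sketch that fits a least squares slope twice, once without and once
with a single extreme point. The data and the function name `slope`
are hypothetical and serve only to illustrate the effect described
above.

```python
def slope(x, y):
    """Least squares slope b of the line of best fit Y = bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical paired values (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_without = slope(x, y)
# Add one extreme observation and refit; both means shift toward it.
b_with = slope(x + [6], y + [20])

print(f"slope without the extreme value: {b_without:.2f}")
print(f"slope with the extreme value:    {b_with:.2f}")
```

One added point moves the slope from 0.60 to about 2.63, which is why
extreme values deserve a look on the scatter plot before a regression
line is trusted.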
So, we have learned about two ways to conduct nonexperimental
research designs using the tools of surveys and
correlation. That ends this presentation and I hope you have a
fine week going forward.
5.2 Lecture: Survey Usage and Finding Correlations
View the presentation and listen to the explanations offered.
When completed, reflect on the presentation and write a brief
statement that describes what you found to be an important
aspect of the information and how that might help you with your
research process.
· Must demonstrate understanding of the task.
· Must be able to illustrate critical thinking.
· Must demonstrate the ability to express an opinion on the
covered material constructively.
· Grading will reflect whether the assignment has been
completed satisfactorily.