Correlational Research involves collecting data to determine whether and to what degree a relation exists between two or more variables. The degree of relation is expressed as a correlation coefficient. It is sometimes treated as a type of descriptive research, primarily because it describes an existing condition. A correlational study may be designed to determine relationships among variables (relationship study) or to use the relationships to make predictions (prediction study). A correlational study can also be used to help determine what variables will be examined in a causal-comparative or experimental study. (From page 216 - 218 Textbook)
Correlational research involves collecting data to determine whether, and to what degree, a relationship exists between two or more quantifiable variables. The degree of the relation is expressed as a correlation coefficient. If two variables are related, scores within a certain range on one variable are associated with scores within a certain range on another variable. For example, intelligence and academic achievement are related; individuals with high scores on intelligence tests tend to have high grade point averages, and individuals with low scores on intelligence tests tend to have low grade point averages. (Gay, Lorraine R.; Mills, Geoffrey E.; Airasian, Peter W. (2011-09-13). Educational Research: Competencies for Analysis and Applications (10th Edition), p. 204. Pearson. Kindle Edition.)
What is the purpose of the correlational study? So glad you asked! It might be to determine relations among variables, as in a relationship study, or it might be to make predictions, as in a prediction study. If you are in the educational realm, a major complex variable, achievement, is investigated in correlational research!
Variables that are not highly related can be dropped. Variables that are highly related can be examined more closely to determine the nature of the relations. A correlational relationship is not a cause-effect relationship, but it does allow for prediction.
This example, recreated from the book, shows how IQ and GPA are related. As you can see, Iggie, the first student listed, has an IQ of 85 and a GPA of 1.0. The last student on the list, Jane, has an IQ of 140 with a 3.8 GPA. A relation like this allows college admissions offices to make a ballpark prediction of what a student's GPA will look like in college. At the bottom of the table, you will see each relation expressed as a correlation coefficient, a decimal number ranging from -1.00 to +1.00. The coefficient shows both the size and the direction of the relation between the variables. A coefficient near +1.00 indicates a strong positive relation: a person with a high score on one variable will most likely have a high score on the other, and a person with a low score on one variable will most likely have a low score on the other. If the coefficient is near 0, the variables are most likely not related, as you can see in the middle column with weight and IQ. A coefficient near -1.00 also indicates a strong relation, but in the negative (inverse) direction: high scores on one variable go with low scores on the other, as with IQ and errors. P. 206 Textbook
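To make the arithmetic behind a correlation coefficient concrete, here is a minimal Python sketch of Pearson r applied to the scores listed in Table 8.1. The function name `pearson_r` is our own; it divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. Computed from the rows exactly as they appear in the table, the coefficients come out at about +0.97, +0.07, and -0.89. The first two differ from the printed +.95 and +.13, which may reflect differences between the recreated rows and the textbook's data, but all three fall in the same strong, weak-or-none, and strong bands.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r between two equal-length lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Scores copied from Table 8.1 (hypothetical textbook data)
iq = [85, 90, 100, 110, 120, 130, 135, 140]
gpa = [1.0, 1.2, 2.4, 2.2, 2.8, 3.4, 3.2, 3.8]
weight = [156, 140, 120, 116, 160, 110, 140, 166]
errors = [16, 10, 8, 5, 9, 3, 2, 1]

r_gpa = pearson_r(iq, gpa)
r_weight = pearson_r(iq, weight)
r_errors = pearson_r(iq, errors)

for label, r in [("IQ:GPA", r_gpa), ("IQ:Weight", r_weight), ("IQ:Errors", r_errors)]:
    print(f"{label:<10} r = {r:+.2f}")
```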
This table is recreated from p. 206 in the text and shows one way to interpret a correlation coefficient. How large a coefficient must be to be useful depends on the purpose for which it was computed, so that purpose should be taken into consideration when interpreting the data.
Beginning researchers mistakenly think a correlation coefficient of 0.50 means that two variables are 50% related. In research, the square of the correlation coefficient indicates the amount of variance shared by the variables. This common variance is also called shared variance. Technically, common variance is the variation in one variable that is attributable to its tendency to vary with the other variable. Isn't that just the coolest statement? It just means that if there is little or no common variance, there is little or no relation, and the more common variance there is, the stronger the relation. So a coefficient of 0.50 corresponds to only 25% common variance. P. 207
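The squaring step can be sketched in a few lines of Python. The helper name `common_variance` is hypothetical; it only squares the coefficient. Note that a negative coefficient shares exactly as much variance as a positive one of the same size, because squaring removes the sign.

```python
def common_variance(r):
    """Proportion of variance shared by two variables: the squared correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    return r ** 2


for r in (0.50, 0.80, -0.80, 0.00, 1.00):
    shared = common_variance(r)
    # The remainder (1 - r squared) is the unexplained variance
    print(f"r = {r:+.2f}: {shared:.0%} common variance, {1 - shared:.0%} unexplained")
```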
In the correlational research process, a study must start with problem selection. For you Trekkie fans, the variables chosen must be "logical." And for you Einstein fans, use a theoretical or experiential basis for selecting them.
A minimum sample size of about 30 participants is generally needed; fewer than 30 should not be used. Larger samples are needed when the instruments have low validity or reliability, because errors of measurement weaken the observed relation. If the instruments do not accurately measure the intended variables, the correlation coefficient will not accurately indicate the degree of relation.
The design for correlational research is not complicated! Two or more scores are obtained from each participant, one for each variable of interest. Then the paired scores are correlated! The result is expressed as a correlation coefficient that indicates the degree of relation between the two variables.
(Read Slide!) To reflect a true relationship, a correlation coefficient must be statistically significant; that is, it must be unlikely to have occurred by chance. Significant does not mean important; it refers to the probability that the result occurred by chance. Statistical significance also depends on sample size: with a small sample, a coefficient must be larger to be significant than with a large sample. The larger the sample, the more closely its correlation is likely to reflect the correlation in the population.
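One standard way to test whether a Pearson r differs from zero is to convert it to a t statistic, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. This is a general statistical formula rather than one quoted from the textbook, and the sample sizes below are hypothetical. The sketch illustrates the sample-size point above: the same coefficient (r = .30) produces a larger t as the sample grows, so it is more likely to exceed the critical value found in a t table.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for a perfect correlation")
    return r * sqrt((n - 2) / (1 - r ** 2))


# The same coefficient tested with samples of different (hypothetical) sizes
for n in (10, 30, 100):
    print(f"r = 0.30, n = {n:3d}: t = {t_for_r(0.30, n):.2f} with df = {n - 2}")
```

Compare each t against the critical value for its degrees of freedom; the larger the sample, the smaller the coefficient that can reach significance.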
There are two basic types of correlational research, with many subgroups that branch off of them. A relationship study might examine how academic achievement is related to socioeconomic status. A prediction study can use variables to predict which students will succeed in college.
Read the Slide! Researchers may want to know whether hyperactivity has something to do with motivation, or whether parental punishment has something to do with self-concept. Because experimental studies are time-consuming and expensive, relationship studies help researchers find the related variables that are worth examining in causal-comparative and experimental studies. Relationship studies also provide information about variables to control for in causal-comparative and experimental research. This simply means that if researchers can IDENTIFY variables related to a performance, they can REMOVE the influence of those variables so the effect of the independent variable will be clear.
In data collection, you must first identify the variables to be related. A textbook example is the relation between academic achievement and socioeconomic status. The more correlation coefficients you compute, the more likely you are to find relations that occur by chance alone, so it is best to keep the number of variables small. Next, you have to identify an appropriate population to sample. The data for all variables are then collected within a fairly short period of time, in one session or in several. Scores on one variable can be correlated with scores on another variable, or scores on several variables can each be correlated with the variable of primary interest.
Once the scores for one variable (or several variables) have been collected, they are correlated with scores for another variable of primary interest. The appropriate method of computing correlation coefficients depends on the type of data represented by the variables. Pearson r is the most common technique. It is used when both variables are expressed as continuous data (interval or ratio data), and it is the technique used most in education. (Isn't it amazing that Pearson is a big name in textbooks?) It is a very precise method. Spearman rho is another technique used in relationship studies. It is used when at least one variable's data are expressed as ranks (ordinal data). Rank data are obtained by ranking participants from the highest score (rank 1) to the lowest (a rank equal to the number of participants in the study). If participants tie, each receives the average of the ranks they would have occupied. Other methods with specific coefficients can also be used; for example, the phi coefficient is used when both variables are dichotomous, such as men and women, members of two political parties, smokers and nonsmokers, or graduates and dropouts. There is a table on page 211 of the text that goes into greater detail about other specific methods for analyzing and interpreting relationship studies. Each has its own niche in this type of analysis.
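Here is a minimal Python sketch of two of these techniques, with function names of our own choosing. Spearman rho is computed as a Pearson r on ranks, where rank 1 goes to the highest score and tied scores share the average of their ranks, as described above. The phi coefficient is computed from a 2 x 2 table of counts for two dichotomous variables using the standard formula (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)). All scores and counts in the usage lines are hypothetical.

```python
from math import sqrt


def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across every score tied with the score at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho as Pearson r computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table [[a, b], [c, d]] of two dichotomous variables."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical essay scores and interview ratings for five participants
essay = [88, 75, 75, 60, 95]
interview = [9, 7, 8, 5, 10]
rho = spearman_rho(essay, interview)

# Hypothetical 2 x 2 counts (for example, smoker status by graduation status)
phi = phi_coefficient(30, 10, 10, 30)

print(f"ranks of essay scores: {average_ranks(essay)}")
print(f"Spearman rho = {rho:.2f}, phi = {phi:.2f}")
```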
Some correlational techniques, including Pearson r, assume a linear relation, in which an increase in one variable is associated with a corresponding increase (or decrease) in the other. When two variables share a nonlinear, or curvilinear, relation, a coefficient that assumes a linear relation can be misleading, even meaningless. In a curvilinear relation, high values of one variable correspond to high values of the other variable at certain points and to low values of the other variable at other points, as seen in the relation between test performance and the amount of anxiety. Using a technique designed for one type of relation on the other type can throw the results way off. Inaccurate estimates can also result from ATTENUATION, the reduction of a correlation coefficient that occurs when the measures have low reliability. A restricted range of scores can also cause underestimation of the true relation between two variables.
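The sketch below illustrates two of these problems under stated assumptions. First, it builds hypothetical inverted-U data (performance rises with anxiety, peaks, then falls) and shows that Pearson r comes out at essentially zero even though the two variables are perfectly related along a curve. Second, it applies the classical correction for attenuation, r_xy / sqrt(r_xx * r_yy), a standard psychometric formula that is not quoted from this text; the observed coefficient and reliabilities are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical inverted-U data: performance peaks at a moderate level of anxiety
anxiety = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [20 - (a - 5) ** 2 for a in anxiety]
r_curve = pearson_r(anxiety, performance)
print(f"Pearson r for the curvilinear data: {r_curve:.2f}")  # near zero despite a perfect curved relation


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores, given each measure's reliability."""
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical: observed r = .40, reliabilities of .64 and .81
r_corrected = correct_for_attenuation(0.40, 0.64, 0.81)
print(f"Corrected r: {r_corrected:.2f}")
```

The correction is an estimate only; with very low reliabilities it can even exceed 1.00, which is a sign that the measures themselves need attention.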
In a prediction study, if two variables are highly related, scores on one variable can be used to predict scores on the other variable. High school grades can be used to predict college grades. Likewise, teacher scores on certification tests can be used to predict a principal's evaluation of the teacher's performance. The variables used to make the prediction are called predictors, and the complex variable being predicted is called the criterion. Prediction studies can facilitate decision making about individuals. Data collection requires participants to provide the desired data and to be available to the researcher. If the criterion were "success on the job," "success" would need to be defined. The difference between data collection in relationship and prediction studies is that in relationship studies the variables are gathered within a short period of time, while in prediction studies the predictor variables are obtained earlier than the criterion variable. Once a prediction equation has been established, it should be tried with a new group to determine how well it holds up compared with the first group. Predictions are usually less accurate for the new group than for the original group; this is called shrinkage. Shrinkage can happen when the original results reflect chance relations or one-of-a-kind circumstances that will not occur again in any other group. For this reason, the prediction should be cross-validated with at least one other group, and predictor variables that are no longer related to the criterion measure should be removed.
In data analysis and interpretation of prediction studies, each predictor variable must correlate with the criterion variable. There are two types of prediction studies: a single prediction study, which has a single predictor variable, and a multiple prediction study, which includes more than one predictor variable. Both are based on a prediction equation. In the single-variable prediction equation, Y = a + bX, Y is the predicted criterion score for an individual, X is the individual's score on the predictor variable, a is a constant calculated from the scores of all participants, and b is a coefficient that indicates the contribution of the predictor variable to the criterion variable. A high school GPA can be used to predict a college GPA in this way. A combination of variables can give a more accurate prediction than a single variable; the resulting equation is called a multiple regression equation, or multiple prediction equation. Two or more variables that individually predict a criterion will, in combination, generally give a more accurate prediction. An intervening variable is one that cannot be directly observed or controlled but can influence the link between predictor and criterion. Prediction is also less accurate when the predictor and criterion variables are not reliable. Because predictions are imperfect, predicted scores are reported as a range, using a statistic called the standard error of estimate. The coefficient of determination indicates the common variance shared by the predictor and the criterion variables. It is the square of the correlation between the predictor and the criterion. P. 214 text.
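A minimal Python sketch of the single-variable prediction equation Y = a + bX follows. It assumes the usual least-squares estimates (b = sum of cross-products of deviations / sum of squared predictor deviations, a = mean of Y - b * mean of X); the text says only that a and b are calculated from all participants' scores. Because no high school/college GPA data are given, the IQ (predictor) and GPA (criterion) scores from Table 8.1 stand in purely as an illustration. The sketch also computes the standard error of estimate (with n - 2 degrees of freedom) and the coefficient of determination. All function names are hypothetical.

```python
from math import sqrt


def fit_prediction_equation(x, y):
    """Least-squares constants a and b for the single-predictor equation Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # contribution of the predictor to the criterion
    a = my - b * mx    # constant computed from all participants' scores
    return a, b


def residuals(x, y, a, b):
    """Actual criterion scores minus predicted criterion scores."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def standard_error_of_estimate(x, y, a, b):
    sse = sum(e ** 2 for e in residuals(x, y, a, b))
    return sqrt(sse / (len(x) - 2))


def coefficient_of_determination(x, y, a, b):
    """Proportion of criterion variance explained; equals r squared for one predictor."""
    my = sum(y) / len(y)
    sse = sum(e ** 2 for e in residuals(x, y, a, b))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


# Illustration only: IQ and GPA scores as listed in Table 8.1
iq = [85, 90, 100, 110, 120, 130, 135, 140]
gpa = [1.0, 1.2, 2.4, 2.2, 2.8, 3.4, 3.2, 3.8]

a, b = fit_prediction_equation(iq, gpa)
see = standard_error_of_estimate(iq, gpa, a, b)
r2 = coefficient_of_determination(iq, gpa, a, b)
predicted = a + b * 115

print(f"Y = {a:.2f} + {b:.4f}X, coefficient of determination = {r2:.2f}")
print(f"Predicted GPA for IQ 115: {predicted:.2f}, give or take about {see:.2f}")
```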
An FYI on Correlational Data: Multiple regression uses continuous predictor variables to predict a continuous criterion variable. Discriminant function analysis uses continuous predictor variables to predict a categorical criterion variable, such as group membership. Canonical analysis extends multiple regression analysis: it produces a correlation based on a group of predictor variables and a group of criterion variables (like multiple scores in SAT, GPA, and teachers' ratings in relation to achievement). Path analysis shows the relations and patterns among a number of variables, showing the pathways by which they connect to each other. See Figure 8.3, p. 215 in the textbook. Structural equation modeling is a sophisticated and powerful extension of path analysis. It is also known as LISREL, after a computer program used to perform the analysis. This model clarifies the direct and indirect interrelations among variables relative to a given variable, and it also provides theoretical validity and statistical precision in the model diagram it produces. Lastly, factor analysis takes a large number of variables and reduces them to a smaller number of clusters called factors. It is carried out by computer, which correlates the variables and derives factors by finding groups of variables that are highly correlated with one another but have low correlations with other variables.
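Multiple regression, the first technique in the list above, can be sketched for the two-predictor case by solving the least-squares normal equations directly. This is an illustrative sketch, not how the textbook or any statistics package is implemented; real analyses would use statistical software. The function name and the data are hypothetical, and the scores were constructed so that Y = 1 + 2*X1 + 0.5*X2 exactly, which lets us check that the function recovers those weights.

```python
def fit_two_predictor_equation(x1, x2, y):
    """Least-squares constants for Y = a + b1*X1 + b2*X2 (two predictors)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of deviations
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly correlated; weights are not unique")
    # Solve [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y] by Cramer's rule
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical scores built so that Y = 1 + 2*X1 + 0.5*X2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [4.0, 5.5, 9.0, 10.5, 13.5]

a, b1, b2 = fit_two_predictor_equation(x1, x2, y)
print(f"Y = {a:.2f} + {b1:.2f}X1 + {b2:.2f}X2")
print(f"Predicted Y for X1 = 6, X2 = 4: {a + b1 * 6 + b2 * 4:.2f}")
```

With real data the predictors are usually correlated with each other, which is why each weight reflects a predictor's contribution over and above the other predictor rather than its simple correlation with the criterion.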
Correlational Research
By Sheila Wilson,
Bonnie Dompierre, and
What is Correlational Research?
The degree of relation
is expressed as a correlation coefficient.
P. 216 Textbook
These are not cause and effect relationships!
What is the process of
Low intelligence test score = low grade point average.
High intelligence test score = high grade point average.
Example: Intelligence and academic achievement are related. Individuals with
high scores on intelligence tests tend to have high grade point averages, while
individuals with low scores on intelligence tests tend to have low grade point averages.
What is the purpose of correlational research?
*It might be used in a relationship study to determine
relations among variables, such as
IQ and GPA; IQ and Weight; IQ and Errors (Textbook); or
Living together and divorce rates; or
Internet usage and depression.
*It might be used in a prediction study to make
predictions, using the results of the examples above to
make predictions about GPA, Weight, Errors, Divorce
rates, and Internet usage.
Education Participants: A major complex variable,
achievement, is investigated in correlational research!
Variables that are not highly
related can be dropped.
Variables that are highly related
can be examined more closely to
determine the nature of the relations.
TABLE 8.1 • Hypothetical sets of data illustrating a strong
positive relation between two variables, no relation,
and a strong negative relation.
               Strong Positive     No Relation       Strong Negative
               Relation                              Relation
               IQ     GPA          IQ     Weight     IQ     Errors
1. Iggie       85     1.0          85     156        85     16
2. Hermie      90     1.2          90     140        90     10
3. Fifi        100    2.4          100    120        100    8
4. Teenie      110    2.2          110    116        110    5
5. Tiny        120    2.8          120    160        120    9
6. Tillie      130    3.4          130    110        130    3
7. Millie      135    3.2          135    140        135    2
8. Jane        140    3.8          140    166        140    1
Correlation    r = +.95            r = +.13          r = -.89
A way to interpret correlation coefficients
Coefficient                  Relation Between Variables   Example (Table 8.1)
Between +0.35 and -0.35      Weak or none                 IQ:Weight = +0.13
Between +0.35 and +0.65      Moderate
Between -0.35 and -0.65      Moderate
Between +0.65 and +1.00      Strong                       IQ:GPA = +0.95
Between -1.00 and -0.65      Strong                       IQ:Errors = -0.89
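The ranges in this table can be expressed as a small hypothetical Python helper. The table does not say which band a value of exactly 0.35 or 0.65 (positive or negative) belongs to, so the boundary handling below is an assumption. As the notes on p. 206 stress, these ranges are guidelines, and how a coefficient should be interpreted also depends on the purpose of the study.

```python
def interpret(r):
    """Label a correlation coefficient with the strength and direction ranges from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    size = abs(r)
    # Assumption: exactly 0.35 counts as moderate and exactly 0.65 counts as strong
    if size < 0.35:
        strength = "weak or none"
    elif size < 0.65:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction


for label, r in [("IQ:Weight", 0.13), ("IQ:GPA", 0.95), ("IQ:Errors", -0.89)]:
    print(label, interpret(r))
```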
Common new researcher mistakes, and the
Shared (common) variance is the variation in one variable that is attributable to its tendency to vary with the other.
Choose logical variables.
Use a theoretical basis.
To determine common variance, square the correlation coefficient:
0.80: (0.80)² = 0.64, or 64% common variance.
0.00: (0.00)² = 0.00, or 0% common variance.
1.00: (1.00)² = 1.00, or 100% common variance.
0.50: (0.50)² = 0.25, or only 25% common variance; the other 75% of the variance is unexplained variance.
Statistical significance is the probability that the results
could have occurred due to chance.
It’s about the math:
Types of Correlational Research
Relationship Study – A researcher attempts to gain
insight into variables that are related to a complex variable.
Example: Academic achievement vs. socioeconomic status
Prediction Study – A researcher uses scores on one variable
to predict scores on another, highly related variable.
Example: High GPA and IQ as predictors of success in college.
In a Relationship Study
• Researchers gain insight into variables or factors that
are related to a complex variable.
• In Educational Research:
a) academic achievement
First: Identify the variables to be related.
Ex: Academic Achievement vs. Socioeconomic Status
Next, identify appropriate population to sample.
Pearson r is the most common technique.
Spearman rho is a rank correlation coefficient.
Others: phi coefficient (gender based, political
affiliation, smoking status, educational status),
Kendall’s tau, Biserial, Point biserial, Tetrachoric,
Intraclass, and Correlation ratio, or eta.
Correlational Techniques in
Prediction Studies and
Definition: An attempt to determine which of a number of variables are most highly related
to the criterion variable.
Data collection requires participants to provide the desired data and to be available to the researcher.
Shrinkage occurs when a prediction equation is less accurate for a second group than for the first group.
Cross-validation with at least one other group should occur in case one-of-a-kind circumstances in the first group produced the results.
Data Analysis and Interpretation
of Prediction Studies
Each predictor variable must correlate with the criterion variable.
Two types of Prediction studies occur: single
prediction, and multiple prediction.
Single Variable prediction equation: Y= a + bX
Prediction and Relationship studies are similar, in
that they can be formulated for each of a number of
subgroups as well as for the total group.
• Many sophisticated statistical analyses are based on correlational data:
• Multiple Regression
• Discriminant Function Analysis
• Canonical Analysis
• Path Analysis
• Structural Equation Modeling, AKA LISREL
• Factor Analysis
Houston, we have a problem…
Problems in interpreting Correlational Coefficients:
The proper method of computing the correlation may not have been used.
Relations may be underestimated if reliabilities are low.
Invalid variables produce meaningless results.
The range of scores could be too narrow or too broad.
Large samples may show correlations that are
statistically significant but unimportant.