Correlations and t scores (2)


Published on

Help and Guide Study for Students Struggling with basic research concepts

Published in: Education
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Sat Scores, Age
  • Household size, annual income
  • Correlations and t scores (2)

    1. 1. Correlations and t Scores Dr. Pedro L. Martinez Introduction to Educational Research and Assessment
    2. 2. Populations and Samples  Population - all the individuals of interest in a particular study (characteristic that describes a population is called a parameter)  Sample - set of individuals selected from a population, usually intended to represent the population (characteristic that describes a sample is called a statistic)  Understanding the relationship between the population and a sample from that population is always important in understanding the use of inferential statistics and why formulas may differ for each of them why don’t we just study the population?
    3. 3. Populations and Samples  Population - all the individuals of interest in a particular study (characteristic that describes a population is called a parameter)  Sample - set of individuals selected from a population, usually intended to represent the population characteristic that describes a sample is called a statistic)  Understanding the relationship between the population and a sample from that population.  If we are mainly interested in the population, why don’t we just study the population?
    4. 4. Variables and Data  Variable - characteristic or condition that changes or has different values for different individuals.  Data - measurements or observations (plural)  Data set - collection of measurements or observation  Datum - single measurement (also called score or raw score)
    5. 5. Inferential Statistics  Inferential Statistics - techniques that allow us to study samples and then make generalizations about the populations from which they were selected.  z scores helped us to make comparisons between different distributions by standardization  Sampling error - discrepancy that exists between a sample statistic and the corresponding population parameter
    6. 6. Research Methods  Correlational Method - two different variables are observed to determine whether there is a relationship between them  Experimental Method - one variable is manipulated while another variable is observed and measured. Used to establish cause-and-effect relationship between variables
    7. 7. Correlational Designs  Represents the strength of the relationship between two variables  e.g., # of hours of studying and test grades  e.g., motivation and persistence  e.g., ability to delay gratification as a child and success in college  Scatter plot  Correlation coefficient (“r”) ranges from -1 to +1  e.g., r = +.34  e.g., r = -.52
    8. 8. Interpreting Correlations  Positive correlation  increase in studying associated with increase in tests grade scores  Negative correlation  increase in studying associated with decrease in test grade scores  No correlation (0 correlation)  Variables are not related  Correlation may exist but it might be associated with another variable (e.g. persistence and number of hour studying a subject)
    9. 9. Correlations: Positive, Negative, and None
    10. 10. Correlation ≠Correlation ≠ CausationCausation
    11. 11. Why can’t we infer causality?  Reverse-Causality Problem X Y→ or Y X ????← Is there a relationship between exposure to violent films and aggression?
    12. 12. Why can’t we infer causality?  Reverse-Causality Problem X Y→ or Y X ????←  Third-variable problem A X→ and A Y→ e.g., ice cream sales and violence (r = +.29) VERY IMPORTANT FOR
    13. 13. Explaining Correlations: Three Possibilities
    14. 14. Not Causality It is essential that you remember this point: A nonexperimental research study could be seriously flawed if you are interested in concluding that an observed relationship is a causal relationship.  That's because "observing a relationship between two variables is not sufficient grounds for concluding that the relationship is a causal relationship." (Remember this important point!)
    15. 15. The Three Necessary Conditions for Cause-and-Effect Relationships  It is essential that your remember that researchers must establish three conditions if they are to make a defensible conclusion that changes in variable A cause changes in variable B. Here are the conditions (which have been agreed upon by reserachers in a summary table:
    16. 16. Three Necessary Conditions for Causation Condition #1 Condition #2 Condition #3 Variable A and Variable B must be related Proper time sequence must be established (temporal antecedent condition) The relationship between Variables A & B must not be through a confounding extraneous variable (the third variable and the lack of an alternative explanation condition )
    17. 17. Applying the Three Necessary Conditions for Causation  in Nonexperimental Research  Non-experimental research is much weaker than strong and quasi experimental research for making justified judgments about cause and effect.  It is, however, quite easy to establish condition 1 in non-experimental research—just see if the variables are related For example, Are the variables correlated? or Is there a difference between the means?  It is much more difficult to establish conditions 2 and 3 (especially 3).
    18. 18. Designs Used to Avoid Pitfalls  When attempting to establish condition 3, researchers use logic and theory (e.g., make a list of extraneous variables that you want to measure in your research study), control techniques (such as statistical control and matching), and design approaches (such as using a longitudinal design rather than a cross-sectional design).
    19. 19. Real Life Example • Here is one more example of controlling for a variable: There is a relationship between gender and income in the United States. In particular, men earn more money than women. Perhaps this relationship would disappear if we controlled for the amount of education people had. What do you think? To test this alternative explanation (i.e., it is due not to gender but to education) you could examine the average income levels of males and females ate each of the levels of education (i.e., to see if males and females who have equal amounts of education differ in income levels). If gender and income are still related (i.e., if men earn more money than women at each level of education) then you would conclude make this conclusion: “After controlling for education, there is still a relationship between gender and income.” And, by the way, that is exactly what happens if you examine the real data (actually the relationship becomes a little smaller but there is still a relationship). Can you think of any additional variables you would like to control for? That is, are there any other variables that you think will eliminate the relationship between gender and income?
    20. 20. Other conditions…  When attempting to establish condition 2, researchers use logic and theory (e.g., we know that biological sex occurs before achievement on a math test) a design approach that is used for this condition is a longitudinal research because it establishes proper time order.  Condition 3 is a serious problem in nonexperimental research because it is always possible that an observed relationship is "spurious" (i.e., due to some confounding extraneous variable or "third variable").
    21. 21. Advantages of Correlational Methods  Allow assessment of behavior as it occurs in people’s everyday lives  Allow study of variables that can’t be studied in experimental designs (e.g. smoking, cancer)  Establishes that a relationship between 2 variables exists  One very serious disadvantage  CORRELATION IS NOT CAUSATION!
    22. 22. Experimental Method  Cornerstone of psychological research.  Used to examine cause-and-effect relationships.  Two essential characteristics:  Researcher has control over the experimental procedures to make sure that everything (but the variable being manipulated) stays the same.  Researcher manipulates one variable by changing its value from one level to another. A second variable is observed (measured) to determine whether the manipulation causes changes to occur  Participants are randomly assigned to different treatment conditions (they cannot self select).
    23. 23. Participant and Environmental Variables  Participant Variables - things like gender, age, and intelligence. Vary from one individual to another  Environmental Variables - lighting, time of day, weather. Must be the same across conditions
    24. 24. Random Sampling vs. Random Assignment  Random Sampling  Selecting Ps to be in study so that everyone in population has an equal chance of being in the study.  Representative samples  Generalization  Is it possible?  Random Assignment  Assigning Ps (who are already in study) to the different conditions so that each P as equal chance of being in any of the conditions.  Equalizes the conditions of experiment so that it is unlikely that conditions differ because of pre- existing differences  Required for inferences of causality.
    25. 25. Variables  Independent Variable  variable that we expect causes an outcome  the antecedent event  variable that the experimenter can control and manipulate  Dependent Variable  the “effect”  the outcome variable  it’s value depends on the changes introduced by the IV
    26. 26. IVs and Conditions  Must have at least two conditions (also called “levels”) of the IV in order to demonstrate that the IV has an effect on the DV. Otherwise, it wouldn’t be a ‘variable’.  Experimental condition (IV present) vs. control condition (IV not present)  Those in control condition receive no treatment or receive neutral, placebo treatment. Provides baseline for comparison with experimental condition.  Example  interested in mood and helping  experimental group – told they received “A” or “F”  control group – does not grade feedback
    27. 27. Laboratory Experiments  Conducted in settings in which:  The environment can be controlled. E.g., temperature, amount of light in room  The participants can be carefully studied. E.g., Ps remain in the same seat
    28. 28. Field Experiments  Conducted in real-world settings.  Advantage: People are more likely to act naturally.  Disadvantage: Experimenter has less control (“quasi-experiments”).  Quasi-independent variable - independent variable in nonexperimental study. Typically something the experimenter cannot manipulate such as gender or smoking
    29. 29. Field Experiments  Conducted in real-world settings.  Advantage: People are more likely to act naturally.  Disadvantage: Experimenter has less control (“quasi-experiments”).  Quasi-independent variable - independent variable in nonexperimental study. Typically something the experimenter cannot manipulate such as gender or smoking
    30. 30. Researchers are interested in influences on self-esteem. Specifically, researchers want to assess how performing a difficult task under pressure influences college students’ self-esteem. Ps are given a set of anagrams to solve. Half are randomly assigned to receive very easy anagrams, and half are given difficult ones. Crossed with this, half are randomly assigned to be given 10 minutes to complete the anagrams, and half are given 30 minutes to complete the task. After completing as many of the anagrams as they can, Ps are given a Q’aire labeled “Thoughts and Feelings Questionnaire” that is really a measure of self-esteem.  Constructs  IV1: task difficulty  IV2: pressure  DV: self-esteem  Operational  IV1: easy vs. hard  IV2: 10 vs. 30 minutes  DV: score on Q’aire
    31. 31. Discrete and Continuous Variables  Discrete Variable - consists of separate, indivisible categories. No values can exist between two neighboring categories.  One choice in a five point scale (1,2, 3, 4,5).  Continuous Variable - infinite number of possible values that fall between any two observed values. Divisible into an infinite number of fractional parts  Low Normal High  Agree, strongly agree, disagree, strongly disagree
    32. 32. Scales of Measurement  Nominal Scale  Ordinal Scale  Interval Scale  Ratio Scale
    33. 33. Nominal Scales  Nominal Nominal scales are the lowest scales of measurement. Numbers are assigned to categories as "names". Which number is assigned to which category is completely arbitrary. Therefore, the only number property of the nominal scale of measurement is identity. The number gives us the identity of the category assigned. The only mathematical operation we can perform with nominal data is to count. Classifying people according to gender is a common application of a nominal scale. In the example below, the number "1" is assigned to "male" and the number "2" is assigned to "female". We can just as easily assign the number "1" to "female" and "2" to male. The purpose of the number is merely to name the characteristic or give it "identity". 
    34. 34. Nominal Scale As we can see from the graphs, changing the number assigned to "male" and "female" does not have any impact on the data -- we still have the same number of men and women in the data set.
    35. 35. Ordinal Scale Ordinal scales have the property of magnitude as well as identity. The numbers represent a quality being measured (identity) and can tell us whether a case has more of the quality measured or less of the quality measured than another case (magnitude). The distance between scale points is not equal. Ranked preferences are presented as an example of ordinal scales encountered in everyday life. We also address the concept of unequal distance between scale points.
    36. 36. Coke, Pepsi or Sprite  Ranked Preferences We are often interested in preferences for different tastes, especially if we are planning a party. Let's say that we asked the three students pictured below to rank their preferences for four different sodas. We usually rank our strongest preference as "1". With four sodas, our lowest preference would be "4". For each soda, we assign a rank that tells us the order (magnitude) of the preference for that particular soda (identity). The number simply tells us that we prefer one soda over another, not "how much" more we prefer the soda. Changing the numbers changes the meaning of the preferences.
    37. 37. Interval Scales  Interval scales have the properties of:  identity  magnitude  equal distance  The equal distance between scale points allows us to know how many units greater than, or less than, one case is from another on the measured characteristic. So, we can always be confident that the meaning of the distance between 25 and 35 is the same as the distance between 65 and 75. Interval scales DO NOT have a true zero point; the number "0" is arbitrary.
    38. 38. IQ Scores are examples of an Interval Scale
    39. 39. Ratio Scales  Ratio scales of measurement have all of the properties of the abstract number system.  identity  magnitude  equal distance  absolute/true zero  These properties allow us to apply all of the possible mathematical operations (addition, subtraction, multiplication, and division) in data analysis. The absolute/true zero allows us to know how many times greater one case is than another. Scales with an absolute zero and equal interval are
    40. 40. Where do t scores come in?  We studied single variables such as Central Tendency measures.  However, most researchers look for relationships between variables ( at least two)  The foundation to understanding the relationship between them is the correlation coefficient.
    41. 41. More than just one correlation  The most widely used in education is Pearson Product Moment correlation coefficient.  1) When do we use it?  2) What does it tell us?  1.We use coefficients to see how variables are related to one another.  2. We use Pearson PMC when the variables being examined are ratio or interval variables (continous variables)
    42. 42. Examples  1. How much time in studying is related to your exam scores?  Assumptions for questions in your research:  A. The more time spent in studying yields higher scores in the exam.  B. Students will do well in an exam if they  Spend more time studying  C. Some students will not do well in the exam despite the hours spent studying.
    43. 43. Consider other possibilities  The reason for studying longer hours is because students do not understand the material.  Students will do well in the exam regardless of the time spent studying. Are the above exceptions to my rule?
    44. 44. How do we address this dilemma?  Two fundamental characteristics of correlations are:  A) Direction- + --  B) Strength r=.80 (is this low, moderate or high  Going back to our example, if student study more, then what is the direction of the correlation? What if they spend less time studying?
    45. 45. What happens when…  Students spend more time studying and the scores go down. Why?  The least time studying the scores go higher. Why?  So what is the direction in these cases and how do we describe this in terms of the direction of my correlation?  Draw a simple scattergram showing each of the above examples.
    46. 46. Which is which?
    47. 47. Strength (Magnitude) of Correlations  Correlations may range from -1.0 to +1.0 . A correlation of ) indicates no relationship.  The closer the correlation to -1.0 or +1.0 the stronger the relationship  Perfect correlations are never found. Instead we find:  -.20 and +.20=weak  above 2.0 to .50=moderate  Above .50 to .70= strong  Some correlations despite their magnitude do not yield any value. However, weak correlations may be of great significance.
    48. 48. Examine the following Example  r = is the symbol for correlations Student Hours Spent in Studying (X) Exam Score (Y) Student 1 5 80 Student 2 6 85 Student 3 7 70 Student 4 8 90 Student 5 9 85 *Scores must be paired because you will convert scores to z scores in order to subtract them from the mean and dividing by the standard deviation.
    49. 49. Consider the Following Scenarios  One set of scores in variable x is associated with high scores in variable y.  One set of low scores in variable x is associated with a set of low scores in variable y  When two factors are positive then you get a positive product.  When two factors are negative you get a positive product.  Explain what these statements mean with
    50. 50. What happens when…  High scores in one variable are associated with low scores in the other variable? Do not ascribe more to the association of the variables!  Correlation simply means that variations in the scores of one variable correspond to the variation of the scores of another variable.
    51. 51. Are correlations always linear? A curvilinear relationship begin positive then it turns negative. There are some explanations for this when it happens. Consider anxiety before a test and how it may help to improve the scores. However, too much of it can cause negative scores. This is alos a sign of a weak relationship.
    52. 52. Correlations can be calculated in various ways: X Values Y Values 5 80 6 85 7 70 8 90 9 85 X Value Y Value X*Y X*X(X2 ) Y*Y (Y2 ) 5 80 5 * 8 = 40 5 * 5 = 25 80 * 80 = 640 6 85 6 * 85 = 510 6 * 6 = 36 85 * 85 = 665 7 70 7 * 70 = 490 7 * 7 = 49 70 * 70 = 490 8 90 8 * 90 = 720 8 * 8 = 64 90 *90 = 810 9 85 9 * 85 =765 9 * 9 =81 85 * 85 = 665 Step 1: Find ΣX, ΣY, ΣXY, ΣX2 , ΣY2 . ΣX = ΣY = ΣXY = ΣX2 = ΣY2 = Step2: Now, Substitute in the above formula given. Correlation(r) =[ NΣXY - (ΣX)(ΣY) / Sqrt([NΣX2 - (ΣX)2 ][NΣY2 - (ΣY)2 ])]
    53. 53. Review language for formulas Hinton page 259
    54. 54. This is another way This means: •Calculate everybody's Z score. •Multiply your Zx by your Zy. •Add up these pairs for everyone. •Divide by the number of people or observations. So if we multiply your Z scores together, sum all these pairs and we get a positive sum (positive scores for X multiplied by positive scores for Y and negative scores for X multiplied by negative scores for Y) - we have a positivecorrelation) If you have a negative relationship you will have thesumof positive Zxs times negative Zys and vice versa. Add these up for a negative sum. If you have no correlation, then you get equal numbers of positive Zx times negative Zy, positive Zx times positive Zy, negative Zx times negative Zy, and negative Zx times positive Zy. Add these up and you get zero.
    55. 55. Correlations may be used when… In the simple case of correlational research you have one quantitative IV (e.g., level of motivation) and one quantitative DV (performance on math test).  · The researcher checks to see if the observed correlation is statistically significant (i.e., not due to chance) using the "t-test for correlation coefficients" (it tells you if the relationship is statistically significant.  · Remember that the commonly used correlation coefficient (i.e., the Pearson correlation) only detects linear relationships.
    56. 56. Let’s go back to our example Student No hours spent Studying Exam Score Joyce 0 95 Ashley 2 95 Jeff 4 100 Sean 7 95 Pedro 10 100 What can we conclude from the above examples?
    57. 57. We want to know… 1. Is there a correlation? 2. How strong and what the direction of the correlation? 3. Is it statistically significant? 4. Is there a possibility of another extraneous factor that gave us a false correlation?
    58. 58. If you were testing the null hypothesis…  This where t test comes along.  If your Ho= =0 (rho, population correlationƿ coefficient)  If your H1= 0ƿ≠  The t distribution is used to test whether a correlation coefficient is statistically significant.  The t test just like z scores consist on a ratio where t= r-ƿ  sr
    59. 59. t tests t= r-ƿ sr  r is the sample correlation coefficient  is the population correlation coefficientǷ  Sr is the standard error of the sample correlation coefficient  We do not need to calculate the standard error of the sample because algebraic equation allows us to use this formula: 2 2
    60. 60. Find a t value  Go back to your formula where t=(.25) √ 100-2  1-.25 2  You can skip themath now and weget : t =2.56, df= 98 Using appendix B we find that a t score of 2.56 has a probability between 01 and .02 of occurring by chance. If the researcher has used and alpha value <.05 then the null hypothesis is rejected and you can conclude that the correlation is significant by accepting the alternative hypothesis. Thus r=.25 , t98,=2.56, < .05ƿ
    61. 61. You will learn more about alpha values.  The alpha value is the probability of making a Type I Error . In a Hypothesis Test a Type I error occurs when statistically unlikely test results lead to the incorrect conclusion that the null hypothesis should be rejected. The alpha value is conventionally designated by the symbol alpha ( ).α  The alpha value is also called the 'level of significance' and is selected based on the importance of the test, a value of 0.05 would be common, with 0.01 for tests that are critical, and even 0.001 in some cases.  A hypothesis test involves comparing the calculated p-value with the previously agreed alpha value. If the p- value is less than the alpha value the alternative hypothesis will be accepted.
    62. 62. The formula for calculating the t value is  t=(r) √ N-2 1-r2 Consider the following example: I randomly select 100 people living in different latitudes to measure whether the number of hours expose to sunlight result in a seasonal affective disorder (SAD) measured by a mood scale of 1-10 where 1=“very sad” and 10=“very happy”. Suppose that I have computed a Pearson Correlation wit my data and I find that there is the above variables provided me with a .25 correlation. I want to know if this is statistically significant. So what do I do?
    63. 63. Summary: A picture is worth a thousand words.
    64. 64. My Easy Way…  ht tp://   d/templates/student_resources/worksho ps/stat_workshp/scale/scale_08.html  ht tp://   d/templates/student_resources/worksho ps/stat_workshp/scale/scale_08.html