Correlation, Regression & T-test 
Prepared By: Dr. Kumara Thevan a/l 
Krishnan
Introduction 
A relationship between two or more numerical (quantitative) variables can be investigated using the techniques of correlation and regression analysis. 

- Correlation is a statistical method used to determine whether a linear relationship between variables exists. 

- Simple linear regression is a statistical method used to describe the nature of the relationship between two variables.
Definition 
Scatterplot (or scatter diagram) is a graph in which the paired (x,y) sample data are plotted with a horizontal x axis and a vertical y axis. 
! 
Each individual (x,y) pair is plotted as a 
single point.
Definition 
Correlation 
! 
exists between two variables 
when one of them is related to 
the other in some way
Example 
Open SPSS. 
Data: weight height biometry male 2012.sav 
Graph => Legacy Dialogs => Scatter plot 
Height (X); Weight (Y)
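For readers working outside SPSS, a minimal Python sketch of the same scatterplot is shown below; the column names height and weight are placeholders, not the actual variable names in the .sav file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data; in practice read the .sav file, e.g. with pd.read_spss(...)
df = pd.DataFrame({"height": [160, 165, 170, 175, 180, 185],
                   "weight": [55, 61, 66, 72, 78, 85]})

plt.scatter(df["height"], df["weight"])   # each (x, y) pair plotted as a single point
plt.xlabel("Height (X)")
plt.ylabel("Weight (Y)")
plt.show()
```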
BMI test 
Category                                  BMI range (kg/m²)
Very severely underweight                 less than 15
Severely underweight                      from 15.0 to 16.0
Underweight                               from 16.0 to 18.5
Normal (healthy weight)                   from 18.5 to 25
Overweight                                from 25 to 30
Obese Class I (Moderately obese)          from 30 to 35
Obese Class II (Severely obese)           from 35 to 40
Obese Class III (Very severely obese)     over 40
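As a small illustration of how the table above can be applied, here is a hypothetical Python helper; the thresholds come directly from the table, and BMI itself is weight in kg divided by height in metres squared.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify a person using the BMI ranges (kg/m^2) from the table above."""
    bmi = weight_kg / height_m ** 2
    if bmi < 15:
        return "Very severely underweight"
    if bmi < 16:
        return "Severely underweight"
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal (healthy weight)"
    if bmi < 30:
        return "Overweight"
    if bmi < 35:
        return "Obese Class I (Moderately obese)"
    if bmi < 40:
        return "Obese Class II (Severely obese)"
    return "Obese Class III (Very severely obese)"


print(bmi_category(70, 1.75))  # BMI about 22.9 -> "Normal (healthy weight)"
```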
Normality test? 
• The Kolmogorov-Smirnov and Shapiro-Wilk tests. 
• These compare the scores in the sample to a normally distributed set of scores with the same mean and s.d. 
• If p > 0.05, the test is non-significant, which tells us that the distribution of the sample is not significantly different from a normal distribution. 
• If the test is significant (p < 0.05), the distribution in question is significantly different from a normal distribution (i.e. non-normal).
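A minimal Python sketch of the same idea, assuming a numeric sample in a NumPy array (the data here are simulated, not taken from the .sav file):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=70, scale=10, size=100)   # simulated stand-in for e.g. weight

# Shapiro-Wilk test
w_stat, p_sw = stats.shapiro(sample)

# Kolmogorov-Smirnov test against a normal distribution with the sample mean and s.d.
ks_stat, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

print(f"Shapiro-Wilk p = {p_sw:.3f}, K-S p = {p_ks:.3f}")
# p > 0.05 in both tests -> the sample is not significantly different from normal
```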
What can you see?
Positive Linear Correlation 
[Scatterplots: (a) positive, (b) strong positive, (c) perfect positive linear correlation]
Negative Linear Correlation 
[Scatterplots: (d) negative, (e) strong negative, (f) perfect negative linear correlation]
What can you see?
Test 1 
- Draw a scatterplot using the data in ExamAnxiety.sav 
- Exam performance (%) – y axis 
- Exam anxiety – x axis 
- Color – place gender here 
- Results? 

- Try a 3D plot – 3 variables
Bivariate correlation 
• Having taken a preliminary glance at the 
data, we can proceed to conduct the 
correlation analysis.
Definition 
! 
Linear Correlation Coefficient r 
measures the strength of the linear relationship between paired x- and y-quantitative values in a sample
No Linear Correlation 
[Scatterplots: (g) no correlation, (h) nonlinear correlation]
Definition 
! 
Linear Correlation Coefficient r 
is sometimes referred to as the 
Pearson product moment correlation 
coefficient
Notation for the 
Linear Correlation Coefficient 
n      number of pairs of data presented. 
Σ      denotes the addition of the items indicated. 
Σx     denotes the sum of all x values. 
Σx²    indicates that each x score should be squared and then those squares added. 
(Σx)²  indicates that the x scores should be added and the total then squared. 
Σxy    indicates that each x score should first be multiplied by its corresponding y score; after obtaining all such products, find their sum. 
r      represents the linear correlation coefficient for a sample 
ρ      represents the linear correlation coefficient for a population
Definition 
Linear Correlation Coefficient r 
r = [nΣxy − (Σx)(Σy)] / ( √[n(Σx²) − (Σx)²] · √[n(Σy²) − (Σy)²] )
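The formula can be checked by hand in Python; the numbers below are made up for illustration, and scipy.stats.pearsonr gives the same coefficient (plus a p-value).

```python
import numpy as np
from scipy import stats

# Hypothetical paired sample (x, y)
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.1, 5.9, 6.8, 9.2, 11.5])

n = len(x)
numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
denominator = np.sqrt(n * np.sum(x**2) - np.sum(x)**2) * np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
r_manual = numerator / denominator

r_scipy, p_value = stats.pearsonr(x, y)          # same r, with a significance test
print(round(r_manual, 4), round(r_scipy, 4))     # the two values agree
```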
Test 2 
• Run the correlation analysis 
ExamAnxiety.sav 
• Assumption – Data is normally distributed
Results 
Correlations (N = 103 for every pair) 

Pearson Correlation          Exam Performance (%)   Exam Anxiety   Time Spent Revising
Exam Performance (%)         1                      -.441          .397
Exam Anxiety                 -.441                  1              -.709
Time Spent Revising          .397                   -.709          1

Sig. (1-tailed)
Exam Performance (%)         .                      .000           .000
Exam Anxiety                 .000                   .              .000
Time Spent Revising          .000                   .000           .

**. Correlation is significant at the 0.01 level (1-tailed).
Interpretation 
• Exam performance is positively related to 
the amount of time spent revising, with a 
coefficient of r= 0.397, which is also 
significant at p< 0.01. 
! 
• Exam anxiety appears to be negatively 
related to the time spent revising (r= 
-0.709, p< 0.01)
Interpretation 
• Each variable is perfectly correlated with 
itself (r=1). 
! 
• Exam performance is negatively related to exam anxiety, with a Pearson correlation coefficient of r = -0.441, and there is a less than 0.01 probability that a correlation coefficient this big would have occurred by chance in a sample of 103 people.
In layman's terms 
• As exam anxiety goes up, exam marks go down 
• As revision time goes up, exam marks go up 
• As revision time goes up, exam anxiety goes down
Hands on 
• Is there a linear association between 
weight and heart girth in this herd of cows? 
• Weight was measured in kg and heart girth 
in cm on 10 cows 
! 
! 
! 
• Assume data is normally distributed
• The sample correlation coefficient is 0.704. The P value is 0.012, which is less than 0.05. The conclusion is that a correlation exists in the population. 
Correlations (N = 10) 

Pearson Correlation      Weight   Girth
Weight                   1        .704
Girth                    .704     1

Sig. (1-tailed)
Weight                   .        .012
Girth                    .012     .

*. Correlation is significant at the 0.05 level (1-tailed).
Using R² for interpretation 
(correlation coefficient)² = coefficient of determination, R² 

R² is a measure of the amount of variability in one variable that is explained by the other.
Example 
Correlations (N = 103 for every pair) 

Pearson Correlation          Exam Performance (%)   Exam Anxiety   Time Spent Revising
Exam Performance (%)         1                      -.441          .397
Exam Anxiety                 -.441                  1              -.709
Time Spent Revising          .397                   -.709          1

Sig. (1-tailed)
Exam Performance (%)         .                      .000           .000
Exam Anxiety                 .000                   .              .000
Time Spent Revising          .000                   .000           .

**. Correlation is significant at the 0.01 level (1-tailed).
Example 
Exam anxiety and exam performance 
• (correlation coefficient)² = coefficient of determination, R² 

R² = (-0.441)² = 0.194 

• As a percentage: 0.194 × 100 = 19.4%
• Although exam anxiety was correlated with exam performance, it can account for only 19.4% of the variation in exam scores. 

• The remaining 80.6% of the variability is accounted for by other variables, such as differences in ability, level of preparation, and so on.
Hands on 
Subject Age, x Pressure, y 
A 43 128 
B 48 120 
C 56 135 
D 61 143 
E 67 141 
F 70 152 
Compute the value of the correlation coefficient for these data. 
Is there enough statistical evidence that this relationship did not occur by chance?
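If you want to check your hand calculation, a short Python sketch using the six (age, pressure) pairs above is given here; the values it prints can be compared with the SPSS output on the next slide.

```python
import numpy as np
from scipy import stats

age      = np.array([43, 48, 56, 61, 67, 70], dtype=float)
pressure = np.array([128, 120, 135, 143, 141, 152], dtype=float)

r, p_two_tailed = stats.pearsonr(age, pressure)
print(f"r = {r:.3f}, p (2-tailed) = {p_two_tailed:.3f}")  # compare with the Correlations table
print(f"R^2 = {r ** 2:.3f}")
```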
Correlations (N = 6) 

Pearson Correlation      Age     Pressure
Age                      1       .897
Pressure                 .897    1

Sig. (2-tailed)
Age                      .       .015
Pressure                 .015    .

*. Correlation is significant at the 0.05 level (2-tailed). 
R² = ?
Regression 
Correlation does not tell us about the predictive power of variables. 
! 
In regression analysis we fit a predictive 
model to our data and use that model to 
predict values of the dependent variable 
from one or more independent variables.
Independent V. Dependent 
Independent variable: 
• Intentionally manipulated 
• Controlled 
• Varies at a known rate 
• Cause 
Dependent variable: 
• Intentionally left alone 
• Measured 
• Varies at an unknown rate 
• Effect
• Simple regression seeks to predict an 
outcome variable from a single predictor 
variable whereas multiple regression 
seeks to predict an outcome from several 
predictors. 
! 
Outcome_i = (Model_i) + error_i 
Y_i = (b0 + b1·x_i) + e_i
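A minimal sketch of fitting this model in Python, with made-up x and y values; scipy.stats.linregress returns the intercept b0 and slope b1 of the least-squares line described on the next slides.

```python
import numpy as np
from scipy import stats

# Hypothetical predictor (x) and outcome (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

fit = stats.linregress(x, y)
b0, b1 = fit.intercept, fit.slope

y_hat = b0 + b1 * x          # model predictions: Y_i = b0 + b1 * x_i
errors = y - y_hat           # e_i = observed - predicted (the residuals)

print(f"Y = {b0:.2f} + {b1:.2f} * x,  R^2 = {fit.rvalue ** 2:.3f}")
```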
Least squares 
Least squares is a method of finding the line 
that best fits the data. 
! 
This “line of best fit” is found by 
ascertaining which line, of all of the 
possible lines that could be drawn, results 
in the least amount of difference between 
the observed data points and the line.
The vertical lines (dashed) represent the differences (or residuals) 
between the line and the actual data
• “The best fit line” – there will be small 
differences between the values predicted by the 
line and the data that were actually observed. 
! 
• Our interest is in the vertical differences between the line and the actual data, because we are using the line to predict values of Y from values of the X-variable. 

• Some data fall above or below the line, indicating that there is a difference between the model fitted to these data and the data collected.
• These differences are called “residuals”. 
• If we simply added the residuals, the positive and negative ones would cancel each other out. 

How do we avoid this? 

• Square the differences before adding them up. 
• If the squared differences are large, the line is not representative of the data; if the squared differences are small, the line is representative.
Total sum of squares, SST 
SST uses the differences between the observed data and the mean value of Y
• The sum of squared differences (SS) can be 
calculated for any line that is fitted to some 
data; the “goodness of fit” of each line can 
then be compared by looking at the sum of 
squares for each. 
! 
• The method of least squares works by selecting the line that has the lowest sum of squared differences (so it chooses the line that best represents the observed data). 

• This “line of best fit” is known as a regression line.
Residual sum of squares, SSR 
SSR uses the differences between the observed data and the regression line
Model sum of squares, SSM 
SSM uses the differences between the mean value of Y and the regression line
F-ratio 
F = MSM / MSR 

MSM (mean square for the model) 
= SSM / (number of variables in the model)
F-ratio 
F = MSM / MSR 

MSR (mean square for the residuals) 
= SSR / (number of observations − number of parameters being estimated)
F- ratio 
• a good model should have a large F-ratio 
(greater than 1 at least)
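Putting the two previous slides together, here is a small sketch of the F-ratio calculation, applied to the sums of squares from the Record1.sav ANOVA table shown later (200 cases, one predictor).

```python
def f_ratio(ss_m: float, ss_r: float, n_obs: int, n_vars: int) -> float:
    ms_m = ss_m / n_vars                 # mean square for the model
    ms_r = ss_r / (n_obs - n_vars - 1)   # mean square for the residuals
    return ms_m / ms_r                   # parameters estimated = slopes + intercept

# SS_M and SS_R taken from the simple-regression ANOVA table shown later
print(round(f_ratio(433687.833, 862264.167, n_obs=200, n_vars=1), 2))  # about 99.59
```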
Test 1 
Open the sample data – Record1.sav 
Graph => Scatterplot 

Analyze => Regression
Model Summary 
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .578   .335       .331                65.991
a. Predictors: (Constant), Advertising Budget (thousands of pounds)
Interpretation 
• The value of R² is 0.335, which tells us that advertising expenditure can account for 33.5% of the variation in record sales. 

• This means that 66.5% of the variation in record sales cannot be explained by advertising alone.
The F ratio is 99.587, which is significant at p < 0.001 (because the value in the column labelled Sig. is less than 0.001). 

This result tells us there is less than a 0.1% chance that an F ratio this large would happen by chance alone. Overall, the regression model predicts record sales significantly well.
Multiple regression 
• Open data ; Record2.sav
Results 
Descriptive Statistics 
                                           Mean     Std. Deviation   N
Record Sales (thousands)                   193.20   80.699           200
Advertising Budget (thousands of pounds)   614.41   485.655          200
No. of plays on Radio 1 per week           27.50    12.270           200
Attractiveness of Band                     6.77     1.395            200
Correlations (N = 200 for every pair) 

Pearson Correlation                Record Sales   Advertising Budget   Plays on Radio 1   Attractiveness of Band
Record Sales (thousands)           1.000          .578                 .599               .326
Advertising Budget                 .578           1.000                .102               .081
No. of plays on Radio 1 per week   .599           .102                 1.000              .182
Attractiveness of Band             .326           .081                 .182               1.000

Sig. (1-tailed)
Record Sales (thousands)           .              .000                 .000               .000
Advertising Budget                 .000           .                    .076               .128
No. of plays on Radio 1 per week   .000           .076                 .                  .005
Attractiveness of Band             .000           .128                 .005               .
Model Summary (simple regression) 
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .578   .335       .331                65.991
a. Predictors: (Constant), Advertising Budget (thousands of pounds) 

Model Summary (multiple regression) 
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .815   .665       .660                47.087                       .665              129.498    3     196   .000            1.950
a. Predictors: (Constant), Attractiveness of Band, Advertising Budget (thousands of pounds), No. of plays on Radio 1 per week 
b. Dependent Variable: Record Sales (thousands)
ANOVA (simple regression) 
Model        Sum of Squares   df    Mean Square   F        Sig.
Regression   433687.833       1     433687.833    99.587   .000
Residual     862264.167       198   4354.870
Total        1295952.000      199
a. Predictors: (Constant), Advertising Budget (thousands of pounds) 
b. Dependent Variable: Record Sales (thousands) 

ANOVA (multiple regression) 
Model        Sum of Squares   df    Mean Square   F         Sig.
Regression   861377.418       3     287125.806    129.498   .000
Residual     434574.582       196   2217.217
Total        1295952.000      199
a. Predictors: (Constant), Attractiveness of Band, Advertising Budget (thousands of pounds), No. of plays on Radio 1 per week 
b. Dependent Variable: Record Sales (thousands)
Hands on 
• Open file softdrinks.sav 
! 
• Do multiple regression analysis 
! 
• Y – dependent – delivery time
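If you want to try the same analysis outside SPSS, here is a sketch of ordinary least squares with NumPy; the numbers are made-up stand-ins for the cases, distance and delivery-time variables in softdrinks.sav, so only the method (not the output) corresponds to the results that follow.

```python
import numpy as np

# Made-up stand-in values for the three variables in softdrinks.sav
cases    = np.array([7, 3, 3, 4, 6, 7, 2, 7, 5, 4], dtype=float)
distance = np.array([560, 220, 340, 80, 150, 330, 110, 210, 605, 255], dtype=float)
time     = np.array([16.7, 11.5, 12.0, 14.9, 13.8, 18.1, 8.0, 17.8, 21.5, 13.5])

# Design matrix with an intercept column: time = b0 + b1*cases + b2*distance
X = np.column_stack([np.ones_like(cases), cases, distance])
coef, *_ = np.linalg.lstsq(X, time, rcond=None)

fitted = X @ coef
ss_res = np.sum((time - fitted) ** 2)           # residual sum of squares
ss_tot = np.sum((time - time.mean()) ** 2)      # total sum of squares
print("b0, b1, b2 =", np.round(coef, 3), " R^2 =", round(1 - ss_res / ss_tot, 3))
```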
Results 
Model Summary 
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .980   .960       .956                3.25947
a. Predictors: (Constant), distance, cases 

ANOVA 
Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   5550.811         2    2775.405      261.235   .000
Residual     233.732          22   10.624
Total        5784.543         24
a. Predictors: (Constant), distance, cases 
b. Dependent Variable: time
Coefficients 
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1   (Constant)   2.341    1.097                                            2.135   .044
    cases        1.616    .171                 .716                        9.464   .000
    distance     .014     .004                 .301                        3.981   .001
a. Dependent Variable: time
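One way to read the B column is as a prediction equation, time = 2.341 + 1.616·cases + 0.014·distance. A tiny sketch follows; the example inputs are arbitrary, and the units depend on how the variables were recorded in the file.

```python
def predicted_delivery_time(cases: float, distance: float) -> float:
    # Coefficients taken from the B column of the table above
    return 2.341 + 1.616 * cases + 0.014 * distance

print(round(predicted_delivery_time(10, 500), 2))  # 2.341 + 16.16 + 7.0 = 25.5
```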
T-test 
• Testing differences between means 
! 
• Dependent means t-test: used when there are 
two experimental conditions and the same 
participants took part in both conditions of 
the experiment. 
! 
• Independent means t-test: used when there 
are two experimental conditions and different 
participants were assigned to each condition.
Dependent t-test 
• 12 spider-phobes who were exposed to a picture of a 
spider (picture) and on a separate occasion a real live 
tarantula (real). Their anxiety was measured in each 
condition (half of the participants were exposed to the 
picture before the real spider while the other half were 
exposed to the real spider first). 
• Which situation caused more anxiety? 
! 
! 
! 
! 
• Open spiderRM.sav
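A Python sketch of the same paired design; the anxiety scores below are illustrative values chosen to reproduce the summary statistics in the output that follows, not a claim about the exact contents of spiderRM.sav.

```python
import numpy as np
from scipy import stats

# Illustrative anxiety scores for the same 12 participants in both conditions
picture = np.array([30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50])
real    = np.array([40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39])

t_stat, p_two_tailed = stats.ttest_rel(picture, real)   # dependent (paired) t-test
print(f"t({len(picture) - 1}) = {t_stat:.2f}, p = {p_two_tailed:.3f}")
# Compare with the Paired Samples Test output on the next slides
```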
Results 
Paired Samples Statistics 
                              Mean    N    Std. Deviation   Std. Error Mean
Pair 1   Picture of Spider    40.00   12   9.293            2.683
         Real Spider          47.00   12   11.029           3.184

Paired Samples Correlations 
                                        N    Correlation   Sig.
Pair 1   Picture of Spider &
         Real Spider                    12   .545          .067

r = 0.545, not significantly correlated, p > 0.05
Paired Samples Test 
Paired Differences (Picture of Spider − Real Spider)
Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-7.000   9.807            2.831             -13.231        -.769          -2.473   11   .031

The t-value is negative; this tells us that the picture condition had a smaller mean than the real tarantula, so the real spider led to greater anxiety than the picture. 

Conclusion: exposure to a real spider caused significantly more reported anxiety in spider-phobes than exposure to a picture (t(11) = -2.47, p < 0.05).
Hands on 
• All students who enroll in a certain memory course are given a pretest before the course begins. At the completion of the course they take a post-test; their scores are listed here. Verify the results shown on the output by calculating the values, and assume normality (a sketch for checking them follows the table). 
Std 1 2 3 4 5 6 7 8 9 10 
Before 93 86 72 54 92 65 80 81 62 73 
After 98 92 80 62 91 78 89 78 71 80
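The before/after scores above can be checked directly with a paired t-test in Python (a minimal sketch, assuming normality as the exercise states):

```python
import numpy as np
from scipy import stats

before = np.array([93, 86, 72, 54, 92, 65, 80, 81, 62, 73])
after  = np.array([98, 92, 80, 62, 91, 78, 89, 78, 71, 80])

t_stat, p_two_tailed = stats.ttest_rel(before, after)   # paired (dependent) t-test
print(f"mean difference = {np.mean(after - before):.1f}")
print(f"t({len(before) - 1}) = {t_stat:.2f}, p = {p_two_tailed:.4f}")
```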
Independent t-test 
• We have 12 spider-phobes who were exposed to a picture of a spider and 12 different spider-phobes who were exposed to a real live tarantula. Anxiety levels were measured in both groups. 

• Open spiderBG.sav
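The corresponding Python call is scipy.stats.ttest_ind; the scores below are the same illustrative values used for the paired example, now treated as two separate groups of 12 participants.

```python
import numpy as np
from scipy import stats

picture_group = np.array([30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50])
real_group    = np.array([40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39])

# Independent t-test assuming equal variances (cf. Levene's test in the output below)
t_stat, p_two_tailed = stats.ttest_ind(picture_group, real_group, equal_var=True)
df = len(picture_group) + len(real_group) - 2
print(f"t({df}) = {t_stat:.2f}, p = {p_two_tailed:.3f}")
```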
Group Statistics 
          Spider or Picture?   N    Mean    Std. Deviation   Std. Error Mean
Anxiety   Picture              12   40.00   9.293            2.683
          Real Spider          12   47.00   11.029           3.184

Independent Samples Test 
                                      Levene's Test for Equality of Variances   t-test for Equality of Means
                                      F       Sig.                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Anxiety   Equal variances assumed     .782    .386                              -1.681   22       .107              -7.000            4.163                   -15.634        1.634
          Equal variances not assumed                                           -1.681   21.385   .107              -7.000            4.163                   -15.649        1.649
Thank you
