This document provides an overview of regression analysis, including linear and multiple regression. It defines regression analysis as predicting an outcome variable from one or more predictor variables. Linear regression predicts the outcome from one predictor, while multiple regression uses two or more predictors. The document outlines the key assumptions of both linear and multiple regression models and provides examples of how they can be applied in various domains like medicine, biology, management, and education.
A researcher, in attempting to run a regression model, noticed a negative beta sign for an explanatory variable when s/he was expecting a positive sign based on theoretical considerations. What advice would you give to the researcher as to what is going on, and what specific diagnostics would you look at? Explain conceptually and statistically the different ways you can correct for this problem.
Reason
One of the most common and important reasons for such a situation is the existence of multicollinearity. Multicollinearity can arise if some of the independent variables are highly correlated with each other, or with another variable that is not in the model.
Multicollinearity also has other symptoms, such as:
· Large variance for regression coefficients
· Non-significant individual coefficients while the general model is significant
· Change of marginal contributions depending on the variables in the model
· Large correlation coefficients in the correlation matrix of variables
It should, however, be noted that the overall model can preserve its predictive ability; it is only the explanatory power of the individual coefficients that is lost.
Before turning to the solutions and measures the researcher can take, it is wise to take a step back and consider the underlying reason for the multicollinearity. An extreme case, where two variables are identical, gives the best understanding of the problem.
In this case we are trying to define y as a function of x1 and x2 while in reality x1 = x2. Therefore any linear combination b1*x1 + b2*x2 is replaceable by infinitely many other linear combinations (e.g., (b1 + c)*x1 + (b2 - c)*x2 for any constant c).
It is easy to see that while y is predicted correctly in all these instances, the individual coefficients for x1 and x2 are meaningless.
Diagnosis
One of the most common diagnostics for multicollinearity is the variance inflation factor (VIF):

VIF_j = 1 / (1 - R_j^2)

where R_j^2 is the coefficient of multiple determination from the regression of X_j on the other explanatory variables.
The variance inflation factor therefore measures how much the variance of each coefficient is inflated. When R_j^2 equals zero, VIF equals 1, which suggests zero multicollinearity. A common heuristic is that any VIF larger than 10 is alarming and indicates that a case of strong multicollinearity exists.
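As a concrete illustration of the VIF definition above, here is a minimal numpy sketch (the function name and the simulated predictors are made up for illustration) that computes each VIF_j by regressing column j on the remaining columns:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of
    determination from regressing column j on all the other columns
    (with an intercept)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()              # R_j^2
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Two nearly collinear predictors plus an unrelated one (made-up data)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # almost identical to x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))   # x1 and x2 get very large VIFs; x3 stays near 1
```

The near-duplicate pair x1, x2 produces VIFs far above the heuristic threshold of 10, while the independent x3 stays close to 1.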
Solutions
There are a few solutions for the multicollinearity problem:
1- Ignoring the problem completely is possible in cases where we only care about the final model fit and prediction capability, rather than the individual coefficients and explanatory power.
2- Removing some of the correlated variables from the model. This can be justified by arguing that the effect of a removed variable is still captured by the highly correlated variables that are kept in the model.
3- Principal component analysis (or any orthogonal transformation) can reduce the factors to a few orthogonal components with no collinearity; however, we should note that interpreting the variables after a PC transformation is difficult.
4- For cases where we intend to keep all the variables in the model without any major transformation, ridge regression can be used; it adds a penalty that shrinks the coefficient estimates, trading a small bias for a large reduction in their variance.
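As a sketch of how ridge regression stabilizes coefficients under multicollinearity, here is a minimal closed-form implementation in numpy (the data and penalty value are made up for illustration):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: beta = (X'X + lam*I)^(-1) X'y.
    The penalty lam shrinks correlated coefficients toward each
    other instead of letting them take large offsetting values."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)    # near-duplicate predictor
X = np.column_stack([x1, x2])
y = 2 * x1 + rng.normal(scale=0.1, size=100)  # true effect is on x1 only

print(ridge(X, y, lam=0.0))   # OLS (lam = 0): estimates can be unstable
print(ridge(X, y, lam=10.0))  # ridge: the two coefficients share the effect
```

With the penalty, both coefficients move toward roughly equal values whose sum stays near the true combined effect of 2, which is exactly the stabilization the solution above describes.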
5. Pearson Correlation - measures the strength of the relationship between variables
Independent Samples t-Test - used to compare two sample means from unrelated groups (different people providing scores for each group)
Paired Samples t-Test - compares the means of two variables for a single group
7. CORRELATION (non-modelling approach)
Is there a significant correlation/relationship between variable 1 and variable 2?
EXAMPLE: Does students' time spent studying significantly correlate with their exam performance?
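This kind of question can be answered numerically with Pearson's r; a minimal sketch with hypothetical study-time data:

```python
import numpy as np

# Hypothetical data: hours spent studying vs. exam score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 60, 68, 70, 75, 79])

# Pearson correlation coefficient between the two variables
r = np.corrcoef(hours, score)[0, 1]
print(round(r, 3))   # strong positive correlation, close to 1
```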
11. REGRESSION ANALYSIS (modelling approach)
EXAMPLE: Does students' locus of hope significantly predict/affect/influence social well-being?
Social Well-Being = a + b (Locus of Hope), where a = constant, b = regression coefficient
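A minimal sketch of fitting this a + b form by least squares, using hypothetical scores (note that numpy's polyfit returns the slope first):

```python
import numpy as np

# Hypothetical scale scores: outcome = a + b * predictor
locus_of_hope = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
well_being    = np.array([2.8, 3.1, 3.3, 3.8, 4.0, 4.4])

# Degree-1 fit: returns [slope b, constant a]
b, a = np.polyfit(locus_of_hope, well_being, deg=1)
print(f"Well-Being = {a:.2f} + {b:.2f} * Locus of Hope")
```

A positive b here would be read as locus of hope predicting higher social well-being, in the sense of the model above.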
17. In contrast to correlation analysis, which does not indicate directionality of effects, regression analysis assumes that the independent variable has an effect on the dependent variable.
20. CAUSAL ANALYSIS
MEDICINE: Does body weight have an influence on the blood cholesterol level?
BIOLOGY: Does the oxygen level in water stimulate plant growth?
MANAGEMENT: Does customer satisfaction influence loyalty?
21. CAUSAL ANALYSIS
PSYCHOLOGY: Is anxiety influenced by personality traits?
EDUCATION: Does NCAE mathematical ability significantly predict college admission test results?
22. FORECAST VALUES
MEDICINE: With X cigarettes smoked per day, the life expectancy is Y years.
BIOLOGY: With five additional weeks of sunshine, the sugar concentration in vine grapes will rise by X%.
EDUCATION: With X rating on COT, IPCR rating will be Y.
23. PREDICTING TRENDS
MEDICINE: By how many years does the life expectancy decrease for every additional pound overweight?
BIOLOGY: With every additional week of sunshine, the sugar concentration in vine grapes will rise by Y%.
EDUCATION: How does expectancy-value towards STEM affect academic performance?
24. Regression analysis is a way of predicting an outcome variable (criterion) from one predictor variable (linear regression) or several predictor variables (multiple regression).
25. We fit a model to our data and use it to predict values of the dependent variable from one or more independent variables.
26. METHOD OF LEAST SQUARES
The goal is to determine the line of best fit, which is also called the least squares regression line.
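The least squares line can be computed directly from its defining formulas; a minimal sketch with made-up data:

```python
import numpy as np

def least_squares_line(x, y):
    """Intercept a and slope b of the least squares regression line:
    b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2),  a = ȳ - b * x̄,
    which minimize the sum of squared residuals."""
    xbar, ybar = x.mean(), y.mean()
    b = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    a = ybar - b * xbar
    return a, b

# Made-up points scattered around the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b = least_squares_line(x, y)
print(a, b)   # a ≈ 0.14, b ≈ 1.96
```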
29. LINEAR REGRESSION
• Next step after correlation
• It is used when we want to predict the value of a variable based on the value of another variable
• The variable we want to predict is called the Dependent Variable (Outcome Variable), while the variable we are using to predict the other variable's value is called the Independent Variable (Predictor Variable).
30. EXAMPLE
You could use linear regression to understand
whether exam performance can be predicted based on revision time
whether cigarette consumption can be predicted based on smoking duration
31. SEVEN ASSUMPTIONS FOR LINEAR REGRESSION
1. The dependent variable should be measured at the continuous level
2. The independent variable should also be measured at the continuous level
3. There needs to be a linear relationship between the two variables
4. There should be no significant outliers
5. You should have independence of observations
6. The data needs to show homoscedasticity
7. The residuals of the regression line should be approximately normally distributed
32. Assumption #1
The dependent variable should be measured at the continuous level, i.e., as either an interval variable or a ratio variable.
Examples of continuous variables: time (measured in hours), intelligence (measured using IQ scores), exam performance (measured from 0 to 100), weight (measured in kilograms)
35. Assumption #4
There should be no significant outliers.
OUTLIERS: an observed data point with a dependent variable value very different from the value predicted by the regression equation; a point on a scatterplot that is far away from the regression line, indicating that it has a large residual
37. Assumption #6
The data needs to show homoscedasticity.
HOMOSCEDASTICITY: the variances along the line of best fit remain similar as you move along the line
39. EXAMPLE
A salesperson for a large car brand wants to determine whether there is a relationship between an individual's income and the price they pay for a car. As such, the individual's "income" is the independent variable and the "price" they pay for a car is the dependent variable. The salesperson wants to use this information to determine which cars to offer potential customers in new areas where average income is known.
40. MULTIPLE REGRESSION
• Extension of simple linear regression
• It is used when we want to predict the value of a variable based on the values of two or more other variables
• The variable we want to predict is called the Dependent Variable (Outcome Variable), while the variables we are using to predict the value of the dependent variable are called the Independent Variables (Predictor Variables).
41. EXAMPLE
You could use multiple regression to understand
whether exam performance can be predicted based on revision time, test anxiety, lecture attendance, and gender
whether daily cigarette consumption can be predicted based on smoking duration, age when started smoking, smoker type, income, and gender.
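A minimal sketch of such a multiple regression with simulated exam data (the variable names and coefficients are made up), solving for the coefficients by least squares:

```python
import numpy as np

# Simulated data: exam score predicted from revision time and attendance
rng = np.random.default_rng(42)
n = 50
revision = rng.uniform(0, 20, n)       # hours of revision
attendance = rng.uniform(50, 100, n)   # percent of lectures attended
score = 30 + 2.0 * revision + 0.3 * attendance + rng.normal(0, 2, n)

# Design matrix with an intercept column; fit by least squares
X = np.column_stack([np.ones(n), revision, attendance])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print(coef)   # approximately [30, 2.0, 0.3]
```

The recovered coefficients are read just as in simple linear regression: holding attendance fixed, each extra hour of revision adds about 2 points to the predicted score.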
42. EIGHT ASSUMPTIONS FOR MULTIPLE REGRESSION
1. The dependent variable should be measured at the continuous level
2. There are two or more independent variables, which can be either continuous (i.e. interval or ratio) or categorical (ordinal or nominal)
3. You should have independence of observations
4. There needs to be a linear relationship between the dependent variable and each of the independent variables
5. There should be no significant outliers
6. The residuals should be approximately normally distributed
7. The data needs to show homoscedasticity
8. Data must not show multicollinearity.
43. Assumption #8
Data must not show multicollinearity.
MULTICOLLINEARITY: occurs when you have two or more independent variables that are highly correlated with each other. This leads to a problem with understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating a multiple regression model.
44. EXAMPLE
A health researcher wants to be able to predict individuals' fitness and health. Normally, this procedure requires expensive laboratory equipment and necessitates that an individual exercise to their maximum (i.e., until they can no longer continue exercising due to physical exhaustion). This can put off individuals who are not very active/fit and those who might be at higher risk of ill health (e.g., older unfit subjects). For these reasons, it has been desirable to find a way of predicting an individual's fitness and health based on attributes that can be measured more easily and cheaply. To this end, a researcher recruited 100 participants to perform a maximal test, but also recorded their "age", "weight", "heart rate", and "gender".
45. REFERENCES
The Basics of Regression Analysis by Starr Clyde L. Sebial, PhD
https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php