This document discusses regression analysis techniques. Regression analysis is used to model the relationship between a dependent variable (Y) and one or more independent variables (X1, X2, etc). Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables. The key assumptions of linear regression are outlined. Methods for estimating regression coefficients using least squares and testing the significance of regression coefficients and the overall regression model are also described. An example application involving modeling personal pollutant exposure (Y) based on hours outdoors (X1) and home pollutant levels (X2) is provided.
To get a copy of the slides for free Email me at: japhethmuthama@gmail.com
You can also support my PhD studies by donating a 1 dollar to my PayPal.
PayPal ID is japhethmuthama@gmail.com
1. Dr. Pritpal Singh
Sr. Statistician
Department of Plant Breeding and Genetics
Regression Analysis
2. Regression analysis is used for modeling the relationship between a single variable Y, called the response, output or dependent variable (the effect), and one or more predictor, input, independent or explanatory variables X1, X2, ..., Xp (the causes).
When p = 1, it is called simple regression:
$$Y = \beta_0 + \beta_1 X_1 + e$$
When p > 1, it is called multiple regression. The functional form of the multiple linear regression model is
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + e$$
where p is the number of so-called "independent" variables, or "regressors", and e is the random error.
The statistical technique of estimating or predicting the unknown value of a dependent variable from the known values of the independent variables is called regression analysis.
Regression Analysis
3. Assumptions of linear regression
Normality: the values of the dependent variable are normally distributed for any value of the independent variable. Equivalently, the errors follow a normal distribution with mean zero and variance σ².
Linearity: there is a linear relationship between the dependent and independent variables.
Independence: the observations are randomly and independently selected.
Homoscedasticity: the variation in the values of the dependent variable is the same (equal) for any value of the independent variable.
No multicollinearity: there is no exact correlation between the independent variables.
4. Linear Regression
The linear regression line is the line that gives the best estimate or prediction of the dependent variable (Y) for any given value of the independent variable (X).
Regression line of Y on X1:
$$Y = \beta_0 + \beta_1 X_1 + e$$
β0 is the intercept that the regression line makes with the Y-axis,
β1 is the slope of the line, i.e. the regression coefficient of Y on X1 (it represents the increase or decrease in the value of Y corresponding to a unit increase in the value of X1),
and e_i is the random error, i.e. the effect of some unknown factors.
Least Squares Estimates
The values of β0 and β1 are estimated by the method of least squares, such that the sum of squares of the deviations of the observed values of the dependent variable from the corresponding estimated values based on the regression function is a minimum:
$$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \text{minimum}$$
The resulting estimates are
$$\hat{\beta}_1 = \frac{\sum y x_1}{\sum x_1^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1$$
where the lowercase letters y = Y − Ȳ and x1 = X1 − X̄1 denote deviations from the sample means.
5. Least Square Estimates
The values of β0 and β1 are estimated by the method of least squares such that the residual (error) sum of squares is a minimum.
$$Y = \beta_0 + \beta_1 X_1 + e$$
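As a minimal sketch of the least squares estimates for simple regression, the following Python computes the slope and intercept from sums of deviations. It borrows the Y and X1 values from the toluene example given later in this deck; the variable names and the `sse` helper are illustrative assumptions, not part of the original slides.

```python
# Y = toluene personal exposure, X1 = hours spent outdoors (data from the deck's example).
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]

n = len(Y)
y_bar = sum(Y) / n
x_bar = sum(X1) / n

# Sums of squares and cross-products of deviations from the means.
s_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(Y, X1))
s_xx = sum((xi - x_bar) ** 2 for xi in X1)

b1 = s_yx / s_xx           # slope estimate
b0 = y_bar - b1 * x_bar    # intercept estimate


def sse(a, b):
    """Residual sum of squares for the line Y = a + b * X1."""
    return sum((yi - (a + b * xi)) ** 2 for yi, xi in zip(Y, X1))


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {sse(b0, b1):.4f}")
```

Moving either coefficient away from its estimate, for example `sse(b0 + 0.1, b1)`, gives a larger residual sum of squares, which is what "least squares" means here.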
7. Testing the Significance of Overall Regression
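H0: Overall regression is not significant
H1: Overall regression is significant
There are two alternative methods to test this hypothesis. The first is ANOVA:
Sources of variation   df     SS         MS                  F-ratio
Regression             1      SSReg(1)   MSR = SSReg(1)/1    F = MSR/MSE ~ F(1, n-2)
Error                  n-2    SSE        MSE = SSE/(n-2)
Total                  n-1    TSS
where
$$\mathrm{SSReg}(1) = \hat{\beta}_1 \sum y x_1, \qquad \mathrm{SSE} = \sum e^2, \qquad \mathrm{TSS} = \sum y^2, \qquad \mathrm{MSE} = \hat{\sigma}_u^2$$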
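The ANOVA table for simple regression can be sketched in a few lines of Python. This is an illustrative calculation only: it uses the Y and X1 values from the toluene example in this deck, and the helper name `dev_cross` is an assumption introduced here.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
n = len(Y)


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1 = dev_cross(Y, X1)
s_11 = dev_cross(X1, X1)
b1 = s_y1 / s_11

tss = dev_cross(Y, Y)     # total sum of squares
ss_reg = b1 * s_y1        # SSReg(1)
ss_err = tss - ss_reg     # SSE
msr = ss_reg / 1
mse = ss_err / (n - 2)
f_ratio = msr / mse       # compare with the tabulated F(1, n-2) value

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{1:>4}{ss_reg:>12.3f}{msr:>12.3f}{f_ratio:>10.3f}")
print(f"{'Error':<12}{n - 2:>4}{ss_err:>12.3f}{mse:>12.3f}")
print(f"{'Total':<12}{n - 1:>4}{tss:>12.3f}")
```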
8. Testing the Significance of Overall Regression
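Coefficient of Determination (R²)
H0: Overall regression is not significant
H1: Overall regression is significant
$$R^2_{Y.X_1} = \frac{\mathrm{SSReg}(1)}{\mathrm{TSS}} = \frac{\hat{\beta}_1 \sum y x_1}{\sum y^2}$$
With p regressors, the test statistic is
$$F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} \sim F_{p,\, n-p-1}$$
and for simple regression (p = 1) it becomes
$$F = \frac{R^2 / 1}{(1 - R^2)/(n - 2)} \sim F_{1,\, n-2}$$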
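The R²-based F statistic is the same quantity as the ANOVA F-ratio. A short derivation, using only the definitions R² = SSReg/TSS and SSE = TSS − SSReg, makes the link explicit:

```latex
\begin{aligned}
F &= \frac{\mathrm{MSR}}{\mathrm{MSE}}
   = \frac{\mathrm{SSReg}/p}{\mathrm{SSE}/(n-p-1)} \\
  &= \frac{(\mathrm{SSReg}/\mathrm{TSS})/p}{(\mathrm{SSE}/\mathrm{TSS})/(n-p-1)}
   \qquad \text{(divide numerator and denominator by TSS)} \\
  &= \frac{R^2/p}{(1-R^2)/(n-p-1)}
   \qquad \text{since } \mathrm{SSE}/\mathrm{TSS} = 1 - R^2 .
\end{aligned}
```

So the two methods always lead to the same decision about H0.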
9. Test of Significance of Regression Coefficients
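H0: β1 = 0, i.e. the regression coefficient is not significant (no linear relationship)
H1: β1 ≠ 0, i.e. the regression coefficient is significant (a linear relationship exists)
$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-2}$$
where
$$\mathrm{SE}(\hat{\beta}_1) = \sqrt{V(\hat{\beta}_1)}, \qquad V(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum x_1^2}$$
$$\hat{\sigma}_u^2 = \frac{\sum e^2}{n-2} = \frac{\sum y^2 - \hat{\beta}_1 \sum y x_1}{n-2} = \frac{\mathrm{TSS} - \mathrm{SSReg}(1)}{n-2}$$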
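A minimal sketch of this t test in Python follows, again using the Y and X1 values from the toluene example in this deck. The names (`dev_cross`, `t_stat` and so on) are illustrative assumptions. In simple regression the squared t statistic equals the ANOVA F-ratio, which the last lines compute as a cross-check.

```python
import math

Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
n = len(Y)


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1 = dev_cross(Y, X1)
s_11 = dev_cross(X1, X1)
tss = dev_cross(Y, Y)

b1 = s_y1 / s_11
ss_reg = b1 * s_y1
sigma2_u = (tss - ss_reg) / (n - 2)   # estimated error variance
var_b1 = sigma2_u / s_11              # V(b1)
se_b1 = math.sqrt(var_b1)             # SE(b1)
t_stat = (b1 - 0) / se_b1             # compare with the tabulated t(n-2) value

f_ratio = ss_reg / sigma2_u           # ANOVA F with 1 and n-2 df
print(f"b1 = {b1:.4f}, SE = {se_b1:.4f}, t = {t_stat:.4f}, t^2 = {t_stat**2:.4f}, F = {f_ratio:.4f}")
```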
14. Example: Personal exposure to pollutants is influenced by various outdoor and indoor sources.
The aim of this study was to evaluate the exposure of the citizens to toluene. This variation
among monitoring campaigns might largely be explained by differences in climate
parameters, namely wind speed, humidity and amount of sunlight. Passive air samplers were
used to monitor volunteers, their homes and various urban sites for ten days, excluding
exposure from active smoking. Three variables were selected: Y = toluene personal exposure concentration (toluene is a widespread aromatic hydrocarbon), in µg/m³; X1 = hours spent outdoors; X2 = toluene home levels, in µg/m³. The data are given below.
(a) Fit the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$.
(b) Test $H_0: \beta_i = 0$ vs $H_1: \beta_i \neq 0$ for i = 1, 2.
(c) Measure the overall strength of the linear relationship and test its significance.
(d) Compare the explained variability of the full model with that of the reduced model, i.e. when only X1 is in the regression.
S. No.   1   2   3   4   5   6   7   8   9  10
Y       59  53  58  70  66  53  56  71  50  91
X1       5   9   6  14   9   7  13  11   5  17
X2      31  35  35  34  40  50  36  34  45  42
15. Fitting of Multiple Regression
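$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$$
$$\hat{\beta}_1 = \frac{\sum y x_1 \sum x_2^2 - \sum y x_2 \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
$$\hat{\beta}_2 = \frac{\sum y x_2 \sum x_1^2 - \sum y x_1 \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$$
Here lowercase letters (y, x1, x2) denote deviations of Y, X1 and X2 from their sample means.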
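The two-regressor formulas can be sketched directly in Python, applied to the toluene example data from this deck. The function names `dev_cross` and `fit_two` are hypothetical helpers introduced for illustration; the residual check at the end is an added sanity test, not part of the slides.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
X2 = [31, 35, 35, 34, 40, 50, 36, 34, 45, 42]


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def fit_two(y, x1, x2):
    """Least squares estimates (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 + e."""
    s_y1, s_y2 = dev_cross(y, x1), dev_cross(y, x2)
    s_11, s_22, s_12 = dev_cross(x1, x1), dev_cross(x2, x2), dev_cross(x1, x2)
    d = s_11 * s_22 - s_12 ** 2   # common denominator
    b1 = (s_y1 * s_22 - s_y2 * s_12) / d
    b2 = (s_y2 * s_11 - s_y1 * s_12) / d
    n = len(y)
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return b0, b1, b2


b0, b1, b2 = fit_two(Y, X1, X2)
residuals = [yi - (b0 + b1 * u + b2 * v) for yi, u, v in zip(Y, X1, X2)]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```

At the least squares solution the residuals sum to zero and are uncorrelated with each regressor, so a residual sum far from zero would indicate an arithmetic slip.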
16. Test of Significance of Regression Coefficients
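H0: β1 = 0, i.e. the regression coefficient is not significant (no linear relationship)
H1: β1 ≠ 0, i.e. the regression coefficient is significant (a linear relationship exists)
$$t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-3}$$
where
$$\mathrm{SE}(\hat{\beta}_1) = \sqrt{V(\hat{\beta}_1)}, \qquad V(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
$$\hat{\sigma}_u^2 = \frac{\sum e^2}{n-3} = \frac{\sum y^2 - \hat{\beta}_1 \sum y x_1 - \hat{\beta}_2 \sum y x_2}{n-3} = \frac{\mathrm{TSS} - \mathrm{SSReg}(2)}{n-3}$$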
17. Test of Significance of Regression Coefficients
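H0: β2 = 0, i.e. the regression coefficient is not significant (no linear relationship)
H1: β2 ≠ 0, i.e. the regression coefficient is significant (a linear relationship exists)
$$t = \frac{\hat{\beta}_2 - 0}{\mathrm{SE}(\hat{\beta}_2)} \sim t_{n-3}$$
where
$$\mathrm{SE}(\hat{\beta}_2) = \sqrt{V(\hat{\beta}_2)}, \qquad V(\hat{\beta}_2) = \frac{\hat{\sigma}_u^2 \sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
$$\hat{\sigma}_u^2 = \frac{\sum e^2}{n-3} = \frac{\sum y^2 - \hat{\beta}_1 \sum y x_1 - \hat{\beta}_2 \sum y x_2}{n-3} = \frac{\mathrm{TSS} - \mathrm{SSReg}(2)}{n-3}$$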
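The t tests for both coefficients of the two-regressor model can be sketched together, using the toluene example data from this deck. All names here (`dev_cross`, `t_b1`, `t_b2`) are illustrative assumptions rather than anything defined in the slides.

```python
import math

Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
X2 = [31, 35, 35, 34, 40, 50, 36, 34, 45, 42]
n = len(Y)


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1, s_y2 = dev_cross(Y, X1), dev_cross(Y, X2)
s_11, s_22, s_12 = dev_cross(X1, X1), dev_cross(X2, X2), dev_cross(X1, X2)
tss = dev_cross(Y, Y)
d = s_11 * s_22 - s_12 ** 2

b1 = (s_y1 * s_22 - s_y2 * s_12) / d
b2 = (s_y2 * s_11 - s_y1 * s_12) / d

ss_reg2 = b1 * s_y1 + b2 * s_y2           # SSReg(2)
sigma2_u = (tss - ss_reg2) / (n - 3)      # estimated error variance

se_b1 = math.sqrt(sigma2_u * s_22 / d)    # SE(b1)
se_b2 = math.sqrt(sigma2_u * s_11 / d)    # SE(b2)
t_b1 = b1 / se_b1                         # compare |t| with the tabulated t(n-3) value
t_b2 = b2 / se_b2

print(f"b1 = {b1:.4f}, SE = {se_b1:.4f}, t = {t_b1:.4f}")
print(f"b2 = {b2:.4f}, SE = {se_b2:.4f}, t = {t_b2:.4f}")
```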
18. Testing the Significance of Overall Regression
H0: Overall regression is not significant
H1: Overall regression is significant
There are two alternative methods to test this hypothesis. The first is ANOVA:
Sources of variation   df     SS         MS                  F-ratio
Regression             2      SSReg(2)   MSR = SSReg(2)/2    F = MSR/MSE ~ F(2, n-3)
Error                  n-3    SSE        MSE = SSE/(n-3)
Total                  n-1    TSS
where
$$\mathrm{SSReg}(2) = \hat{\beta}_1 \sum y x_1 + \hat{\beta}_2 \sum y x_2, \qquad \mathrm{SSE} = \sum e^2, \qquad \mathrm{TSS} = \sum y^2, \qquad \mathrm{MSE} = \hat{\sigma}_u^2$$
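As a sketch under the same assumptions as the slides' formulas, the ANOVA table for the two-regressor model can be computed for the toluene example data in this deck. The helper `dev_cross` and the variable names are illustrative, not taken from the original.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
X2 = [31, 35, 35, 34, 40, 50, 36, 34, 45, 42]
n, p = len(Y), 2


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1, s_y2 = dev_cross(Y, X1), dev_cross(Y, X2)
s_11, s_22, s_12 = dev_cross(X1, X1), dev_cross(X2, X2), dev_cross(X1, X2)
d = s_11 * s_22 - s_12 ** 2
b1 = (s_y1 * s_22 - s_y2 * s_12) / d
b2 = (s_y2 * s_11 - s_y1 * s_12) / d

tss = dev_cross(Y, Y)
ss_reg = b1 * s_y1 + b2 * s_y2    # SSReg(2)
ss_err = tss - ss_reg             # SSE
msr = ss_reg / p
mse = ss_err / (n - p - 1)
f_ratio = msr / mse               # compare with the tabulated F(2, n-3) value

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{p:>4}{ss_reg:>12.3f}{msr:>12.3f}{f_ratio:>10.3f}")
print(f"{'Error':<12}{n - p - 1:>4}{ss_err:>12.3f}{mse:>12.3f}")
print(f"{'Total':<12}{n - 1:>4}{tss:>12.3f}")
```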
19. Testing the Significance of Overall Regression
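H0: Overall regression is not significant
H1: Overall regression is significant
There are two alternative methods to test this hypothesis. The second is the coefficient of determination (R²):
$$R^2_{Y.X_1 X_2} = \frac{\mathrm{SSReg}(2)}{\mathrm{TSS}} = \frac{\hat{\beta}_1 \sum y x_1 + \hat{\beta}_2 \sum y x_2}{\sum y^2}$$
With p regressors, the test statistic is
$$F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} \sim F_{p,\, n-p-1}$$
and for two regressors (p = 2) it becomes
$$F = \frac{R^2 / 2}{(1 - R^2)/(n - 3)} \sim F_{2,\, n-3}$$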
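A short Python sketch of the R² route for the two-regressor model, using the toluene example data from this deck, is given below. The names are illustrative assumptions; the final lines confirm numerically that the R²-based F matches the ANOVA F-ratio.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
X2 = [31, 35, 35, 34, 40, 50, 36, 34, 45, 42]
n, p = len(Y), 2


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1, s_y2 = dev_cross(Y, X1), dev_cross(Y, X2)
s_11, s_22, s_12 = dev_cross(X1, X1), dev_cross(X2, X2), dev_cross(X1, X2)
d = s_11 * s_22 - s_12 ** 2
b1 = (s_y1 * s_22 - s_y2 * s_12) / d
b2 = (s_y2 * s_11 - s_y1 * s_12) / d

tss = dev_cross(Y, Y)
ss_reg = b1 * s_y1 + b2 * s_y2

r2 = ss_reg / tss                                       # coefficient of determination
f_from_r2 = (r2 / p) / ((1 - r2) / (n - p - 1))         # F via R^2
f_from_anova = (ss_reg / p) / ((tss - ss_reg) / (n - p - 1))

print(f"R^2 = {r2:.4f}")
print(f"F (from R^2) = {f_from_r2:.4f}, F (from ANOVA) = {f_from_anova:.4f}")
```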
20. Improvement With the additional variable
H0: The new variable X2 has not improved the regression
H1: The new variable X2 has improved the regression
Sources of variation   df        SS                    MS       F-ratio
X1                     m = 1     SSReg(1)
X1, X2                 p = 2     SSReg(2)
X2 / X1                p-m = 1   SSReg(2) - SSReg(1)   MSReg    F = MSReg/MSE(2) ~ F(1, n-3)
Error                  n-3       SSE(2)                MSE(2)
Total                  n-1       TSS
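The extra-sum-of-squares test for adding X2 to a model that already contains X1 can be sketched as follows, with the toluene example data from this deck. The helper and variable names are assumptions made for illustration.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
X2 = [31, 35, 35, 34, 40, 50, 36, 34, 45, 42]
n = len(Y)


def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


s_y1, s_y2 = dev_cross(Y, X1), dev_cross(Y, X2)
s_11, s_22, s_12 = dev_cross(X1, X1), dev_cross(X2, X2), dev_cross(X1, X2)
tss = dev_cross(Y, Y)

# Reduced model (X1 only): SSReg(1) = b1 * sum(y*x1) with b1 = s_y1 / s_11.
ss_reg1 = s_y1 ** 2 / s_11

# Full model (X1 and X2).
d = s_11 * s_22 - s_12 ** 2
b1 = (s_y1 * s_22 - s_y2 * s_12) / d
b2 = (s_y2 * s_11 - s_y1 * s_12) / d
ss_reg2 = b1 * s_y1 + b2 * s_y2

mse2 = (tss - ss_reg2) / (n - 3)       # MSE of the full model
ms_extra = (ss_reg2 - ss_reg1) / 1     # extra SS due to X2 given X1, on 1 df
f_partial = ms_extra / mse2            # compare with the tabulated F(1, n-3) value

print(f"SSReg(1) = {ss_reg1:.3f}, SSReg(2) = {ss_reg2:.3f}")
print(f"Extra SS = {ss_reg2 - ss_reg1:.3f}, partial F = {f_partial:.4f}")
```

With a single added variable, this partial F equals the square of the t statistic for that variable's coefficient in the full model.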
23. Residuals
A linear regression model is not always appropriate for the data. You can assess the
appropriateness of the model by examining residuals and outliers.
Residuals
The difference between the observed value of the dependent variable (y) and the predicted
value (ŷ) is called the residual (e). Each data point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
Residual Plots
A residual plot is a graph that shows the residuals on the vertical axis and the predicted
value of Y on the horizontal axis. If the points in a residual plot are randomly dispersed
around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a
non-linear model is more appropriate.
The residual plots show three typical patterns. The first plot shows a random pattern,
indicating a good fit for a linear model. The other plot patterns are non-random (U-shaped
and inverted U), suggesting a better fit for a non-linear model.
[Figure: residual plots; surviving panel title: Random Pattern]
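As a minimal sketch of how residuals are obtained and checked, the following Python fits the simple regression of Y on X1 for the toluene example data in this deck and lists each fitted value with its residual. These pairs are the coordinates a residual plot would show. Variable names are illustrative assumptions.

```python
Y = [59, 53, 58, 70, 66, 53, 56, 71, 50, 91]
X1 = [5, 9, 6, 14, 9, 7, 13, 11, 5, 17]
n = len(Y)

y_bar = sum(Y) / n
x_bar = sum(X1) / n
b1 = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(Y, X1)) / sum(
    (xi - x_bar) ** 2 for xi in X1
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in X1]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]   # e = y - y_hat

# Horizontal axis: fitted value; vertical axis: residual.
for fi, ei in sorted(zip(fitted, residuals)):
    print(f"fitted = {fi:7.2f}   residual = {ei:7.2f}")

print(f"sum of residuals  = {sum(residuals):.2e}")
print(f"mean of residuals = {sum(residuals) / n:.2e}")
```

The sum and mean print as zero up to floating-point rounding, which holds for any least squares fit that includes an intercept.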
24. Quadratic Regression
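$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$$
β0 = Y intercept
β1 = linear effect on Y
β2 = curvilinear effect on Y
εi = random error in Y for the ith observation
The fitted quadratic curve $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_1^2$ reaches its maximum at
$$X_1 = -\frac{\hat{\beta}_1}{2\hat{\beta}_2}$$
This turning point is a maximum when $\hat{\beta}_2 < 0$ and a minimum when $\hat{\beta}_2 > 0$.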
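A quadratic regression can be fitted with the same two-regressor least squares formulas by treating X and X² as the two regressors. The sketch below does this in Python. The data are hypothetical and noise-free, generated from Y = 2 + 4X − 0.5X² purely for illustration (the slides give no quadratic data set), and the function names are assumptions.

```python
def dev_cross(a, b):
    """Sum of cross-products of deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def fit_quadratic(x, y):
    """Fit Y = b0 + b1*X + b2*X^2 using X and X^2 as two regressors."""
    x_sq = [xi ** 2 for xi in x]
    s_y1, s_y2 = dev_cross(y, x), dev_cross(y, x_sq)
    s_11, s_22, s_12 = dev_cross(x, x), dev_cross(x_sq, x_sq), dev_cross(x, x_sq)
    d = s_11 * s_22 - s_12 ** 2
    b1 = (s_y1 * s_22 - s_y2 * s_12) / d
    b2 = (s_y2 * s_11 - s_y1 * s_12) / d
    n = len(x)
    b0 = sum(y) / n - b1 * sum(x) / n - b2 * sum(x_sq) / n
    return b0, b1, b2


# Hypothetical illustration data: exact values of 2 + 4X - 0.5X^2 for X = 0..8.
x = [float(v) for v in range(9)]
y = [2 + 4 * v - 0.5 * v ** 2 for v in x]

b0, b1, b2 = fit_quadratic(x, y)
turning_point = -b1 / (2 * b2)   # a maximum here, because b2 < 0

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"turning point at X = {turning_point:.4f}")
```

Because the illustration data lie exactly on a parabola, the fit recovers the generating coefficients and places the maximum at X = 4; with real, noisy data the estimates would only approximate the underlying curve.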