Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Simple Linear Regression
What Is Covered
• Terminologies
• Introduction & Example
• Standard input/tuning parameters & Sample UI
• Sample output UI
• Interpretation of Output
• Limitations
• Business use cases
Terminologies
• Predictors and target variable:
• The target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable
• A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable
• Correlation:
• Correlation is a statistical measure that indicates the extent to which two variables fluctuate together
• Upper & lower N% confidence intervals:
• A confidence interval is a range of values, computed from the sample, that is expected to contain the true value of the quantity being estimated at the stated N% confidence level. Informally: "I am N% confident that the true value of the number I am approximating lies within this range."
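As a rough illustration of the correlation measure described above, here is a minimal Python sketch of Pearson's correlation coefficient. The deck does not name a specific correlation formula, so choosing Pearson's r (the usual choice for two numeric variables) is an assumption, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences.

    Returns a value in [-1, 1]: near +1 the variables rise together,
    near -1 one falls as the other rises, near 0 there is no linear relation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-variation of the two variables around their means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Spread of each variable around its own mean
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Usage: perfectly rising and perfectly falling toy data
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0
```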
Terminologies
• Intercept / constant term 𝜷0:
• The intercept is the expected value of Y when all Xi = 0
• In other words, 𝛽0 is the value of Y predicted by the model when all Xi = 0
• Coefficients 𝜷𝒊:
• A coefficient is interpreted as the expected change in Y corresponding to a one-unit change in Xi, with any other predictors held constant
• Error term 𝜺𝒊:
• It represents the part of Yi that the model does not explain
• It is the difference between the observed value of Yi and the value of Yi predicted by the model
• Standard error of coefficient:
• It measures the precision of the estimate of the coefficient
• In other words, the smaller the standard error, the more precise the estimate
Where Yi is the dependent variable
Xi is the independent variable
Terminologies
• T statistic:
• Dividing a coefficient by its standard error gives the t statistic, which is used to calculate the p-value
• Degrees of freedom:
• Degrees of freedom is N - K, where N is the number of observations and K is the number of parameters estimated (K = 2 in simple linear regression: the intercept and one coefficient)
• Significance level / alpha level:
• It is the threshold probability against which the p-value is compared; the corresponding confidence level is 1 - 𝛼
• Lower values of alpha mean higher confidence. For example, if 𝛼 = 0.1, confidence = 100 - (𝛼 * 100) = 90%
• P value:
• If the p-value associated with a predictor's t statistic is less than the alpha level, the relationship between that predictor and the dependent variable is statistically significant at that level
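To make the chain from coefficient to p-value concrete, the sketch below computes t = coefficient / standard error, the degrees of freedom N - K, and a two-sided p-value. A t table or a statistics library would normally supply the p-value; to stay within Python's standard library, this sketch integrates the Student t density numerically with Simpson's rule, which is an illustrative choice rather than what any particular tool does. The coefficient and standard error in the usage lines are made-up numbers, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) for T ~ t(df), via Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps is even, as Simpson's rule requires
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = area * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical numbers for illustration
coefficient, std_error = 1.5, 0.6
n_obs, n_params = 25, 2            # simple regression: intercept + one slope
df = n_obs - n_params              # degrees of freedom N - K
t_stat = coefficient / std_error   # about 2.5
alpha = 0.10
confidence = 100 - alpha * 100     # confidence level in %
p = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p:.4f}, "
      f"significant at {confidence:.0f}% confidence: {p < alpha}")
```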
Types of Linear regression analysis
• Depending on the number of independent variables/predictors in the analysis, it is classified into two types:
• Simple linear regression:
• When there is only one dependent variable and one independent variable/predictor: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Multiple linear regression:
• When there is only one dependent variable but multiple independent variables/predictors: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$
• Where
• Yi is the dependent variable
• Xi is an independent variable (X1i, ..., Xki in multiple regression)
• 𝛽0 is the intercept
• 𝛽𝑖 is a coefficient
• 𝜀𝑖 is the error term
Introduction: Simple linear regression
Objective:
It is a statistical technique that explores the relationship between one independent variable (X) and one dependent variable (Y)
Benefit:
The regression model output helps identify whether the independent variable/predictor X has any relationship with the dependent variable Y and, if so, what the nature/direction of that relationship is (i.e. positive/negative)
Model:
The simple linear regression model equation takes the form Yi = 𝛽0 + 𝛽1 Xi + 𝜀𝑖, as shown in the image on the right
Example: Simple linear regression
Let's get the simple linear regression output for independent variable X (Temperature) and target variable Y (Yield) as shown below:
Input data
Temperature Yield
50 112
53 118
54 128
55 121
56 125
59 136
62 144
65 142
67 149
71 161
72 167
74 168
75 162
76 171
79 175
80 182
82 180
85 183
87 188
90 200
93 194
94 206
95 207
97 210
100 219
Output
Regression Statistics
R Square 0.98
             Coefficient  P-value  Lower 95%  Upper 95%
Intercept    13.33        0.00268  5.13       21.52
Temperature  2.04         0.00138  1.93       2.15
• The model is a good fit, as R square > 0.7
• The p-value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield and has a significant relationship with Yield
• With a one-unit increase in Temperature, Yield increases by about 2.04 units
• The lower and upper 95% values give the range within which the true value of each coefficient is expected to lie with 95% confidence
• For example, the coefficient of Temperature lies between 1.93 and 2.15 with 95% confidence (a 5% risk of error)
Note: The intercept is not an important statistic for checking the relationship between X & Y
Standard input/tuning parameters & Sample UI
Step 1: Select the predictor
o Temperature
o Yield
o Pressure range
Step 2: Select the dependent variable
o Temperature
o Yield
o Pressure range
Step 3: Tuning parameters
o Step size = 1
o Number of iterations = 100
By default, these parameters should be set to the values mentioned
Step 4: Display the output window containing the following:
o Model summary
o Line fit plot
o Normal probability plot
o Residual versus Fit plot
Note: Categorical predictors should be auto-detected & converted to binary variables before applying regression
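The note above does not say how categorical predictors are detected or converted. One common approach, sketched below purely as an assumption, is dummy (indicator) coding: a column is treated as categorical if any value is non-numeric, and each category except a baseline becomes its own 0/1 column. Dropping the baseline avoids a set of indicators that is perfectly collinear with the intercept. The helper names and the "Pressure range" values are hypothetical.

```python
def is_categorical(values):
    """Treat a column as categorical if any value is not a number."""
    return any(not isinstance(v, (int, float)) for v in values)


def to_binary_columns(rows, column):
    """Replace a categorical column with 0/1 indicator columns.

    The alphabetically first category is the baseline and gets no column,
    so rows in that category are all zeros in the new indicators.
    """
    categories = sorted({row[column] for row in rows})
    others = categories[1:]
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in others:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows with a categorical "Pressure range" column
rows = [
    {"Pressure range": "low", "Yield": 120},
    {"Pressure range": "high", "Yield": 150},
    {"Pressure range": "medium", "Yield": 135},
]
if is_categorical([r["Pressure range"] for r in rows]):
    rows = to_binary_columns(rows, "Pressure range")
print(rows)
```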
Sample output: 1. Model Summary
Regression Statistics
Multiple R 0.99
R Square 0.98
             Coefficient  P-value  Lower 95%  Upper 95%
Intercept    13.33        0.00268  5.13       21.52
Temperature  2.04         0.00138  1.93       2.15
• Multiple R: It depicts the strength of the correlation between X & Y. In simple linear regression it equals the absolute value of the correlation coefficient between X and Y, so it lies between 0 and 1; the closer this value is to 1, the higher the correlation
• R square: It shows the goodness of fit of the model. It lies between 0 and 1, and the closer this value is to 1, the better the model
P-value:
o It is used to evaluate whether the corresponding predictor X has any significant impact on the target variable Y
o As the p-value for temperature is < 0.05, temperature has a significant relationship with Yield
Coefficient:
o It shows the magnitude as well as the direction of the impact of predictor X (temperature in this case) on the target variable Y
o For example, in this case, with a one-unit increase in temperature there is a 2.04-unit increase in Yield
o The value of the temperature coefficient lies between 1.93 and 2.15 with 95% confidence
Check Interpretation section for more details
Sample output: 2. Plots
[Line fit plot: fitted line y^ = 17 + 2x, R2 = 0.75]
Line fit plot is used to check the assumption of linearity between X & Y
Normal probability plot is used to check the assumption of normality & to detect outliers
Residual plot is used to check the assumption of equal error variances & to detect outliers
Check Interpretation section for more details
Interpretation of Important Model Summary Statistics
Multiple R (correlation r between X and Y):
• R >= 0.7 represents a strong positive correlation between X and Y
• 0.4 <= R < 0.7 represents a weak positive correlation between X and Y
• -0.4 < R < 0.4 represents a negligible/no correlation between X and Y
• -0.7 < R <= -0.4 represents a weak negative correlation between X and Y
• R <= -0.7 represents a strong negative correlation between X and Y
• The Multiple R reported in the model summary is the absolute value of r; the direction of the relationship is given by the sign of the coefficient
R Square:
• R square > 0.7 represents a very good model, i.e. the model is able to explain more than 70% of the variability in Y
• R square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to get a better-fitting model
P value:
• At a 95% confidence threshold, if the p-value for a predictor X is < 0.05, then X is a significant/important predictor
• At a 95% confidence threshold, if the p-value for a predictor X is > 0.05, then X is an insignificant/unimportant predictor, i.e. it doesn't have a significant relationship with target variable Y
Coefficients:
• It indicates by how much the output variable changes with a one-unit change in X
• For example, if the coefficient of X is 2, then Y increases by 2 units with a one-unit increase in X
• If the coefficient of X is -2, then Y decreases by 2 units with a one-unit increase in X
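The rules of thumb above can be encoded directly. The sketch below is a hypothetical helper (its name and message wording are not from any product) that applies the same thresholds, with alpha defaulting to 0.05 for the 95% confidence threshold; `r` is the signed correlation between X and Y.

```python
def correlation_strength(r):
    """Classify a signed correlation r using the thresholds listed above."""
    if r >= 0.7:
        return "strong positive"
    if r >= 0.4:
        return "weak positive"
    if r > -0.4:
        return "negligible/no"
    if r > -0.7:
        return "weak negative"
    return "strong negative"


def interpret(r, r_square, p_value, coefficient, alpha=0.05):
    """Return plain-language readings of the main model summary statistics."""
    return [
        f"{correlation_strength(r)} correlation between X and Y",
        "good fit" if r_square > 0.7 else "poor fit: check linearity and normality",
        "X is a significant predictor" if p_value < alpha
        else "X is not a significant predictor",
        f"Y changes by {coefficient:+} units for each one-unit increase in X",
    ]


# Values from the sample output above
for line in interpret(r=0.99, r_square=0.98, p_value=0.00138, coefficient=2.04):
    print(line)
```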
Interpretation of plots: Line Fit plot
This plot shows the relationship between X (predictor) & Y (target variable), with Y on the y axis and X on the x axis
As shown in figure 1 on the right, as temperature increases, so does Yield; hence there is a linear relationship between X and Y, and simple linear regression is applicable to this data
The fitted regression line and regression equation are shown in the plot itself, along with the model R square value, to describe how well the model fits the data and whether there is a linear relationship between X and Y
If R square is low (< 0.7) and the plot doesn't display linearity, as shown in figures 2 & 3 on the right, then a linear regression model is not applicable and a different model should be considered to predict Y
[Figure 1: line fit plot, y^ = 17 + 2x, R2 = 0.75; Figure 2: R2 = 0.5; Figure 3: R2 = 0.4]
Interpretation of plots: Normal Probability plot
This plots the percentile vs. the target/dependent variable (Y)
It is used to check the assumptions of linearity and normality in the data and also to detect outliers
It can be helpful to add a trend line to see whether the data fits a straight line
The plot in figure 1 shows that the pattern of dots lies close to a straight line; this suggests that the data is approximately normally distributed and that there are no obvious outliers
Examples of non-normal data are shown in figures 2 & 3 on the right, and an example of outliers is shown in figure 4
[Figures 1 to 4: normal probability plots (figure 1: approximately normal data; figures 2 & 3: non-normal data; figure 4: outliers)]
Interpretation of plots: Residual versus Fit plot
It is a scatter plot of residuals on the Y axis and predicted (fitted) values on the X axis
It is used to detect unequal error variances and outliers
Here are the characteristics of a well-behaved residual vs. fits plot:
The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around the 0 line, as shown in figure 1. This suggests that the variances of the error terms are equal
No one residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers
For example, the red data point in figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation
Plots shown in figures 2 & 3 depict unequal error variances, which is not desirable for linear regression analysis
[Figures 1 to 3: residual versus fit plots (figure 1: well-behaved band with one outlier in red; figures 2 & 3: unequal error variances)]
Limitations
• Simple linear regression is limited to predicting numeric output, i.e. the dependent variable has to be numeric in nature
• As a common rule of thumb, the sample size should be at least 50 + 8m, where m is the number of predictors
• Hence, in the case of simple linear regression, the minimum sample size should be 50 + 8(1) = 58
• It handles only two variables: one predictor and one dependent variable. Usually, however, more than one predictor is correlated with the dependent variable, and this can't be analyzed through simple linear regression
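The sample-size rule above is simple arithmetic; the tiny sketch below (hypothetical helper names) makes it reusable and, as a usage example, checks the 25-row input table from the example against it.

```python
def min_sample_size(num_predictors):
    """Rule-of-thumb minimum number of observations: 50 + 8m."""
    return 50 + 8 * num_predictors


def has_enough_rows(num_rows, num_predictors):
    """True if num_rows meets the 50 + 8m rule of thumb."""
    return num_rows >= min_sample_size(num_predictors)


print(min_sample_size(1))       # 58 for simple linear regression
print(has_enough_rows(25, 1))   # False for the 25-row example table
```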
Limitations
Target/dependent variable should be normally distributed (strictly, the model assumes that the error terms are normally distributed)
A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve, as shown in figure 1 on the right
Outliers in the data can affect the analysis; hence outliers need to be removed
Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2 on the right
These extreme values/outliers can be replaced with the 1st or 99th percentile values
[Figure 1: normal (bell-shaped) distribution; Figure 2: outliers]
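Replacing extreme values with the 1st or 99th percentile (often called capping or winsorizing) can be sketched as follows. Percentiles can be computed in several ways; this sketch assumes the `"inclusive"` interpolation method of Python's `statistics.quantiles`, and the helper name and data are hypothetical.

```python
import statistics


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Replace values outside the [lower_pct, upper_pct] percentiles with those percentiles."""
    # n=100 returns the 99 cut points P1..P99;
    # method="inclusive" interpolates within the observed range of the data
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical data: 1..99 plus one extreme value
data = list(range(1, 100)) + [1000]
capped = cap_outliers(data)
print(min(capped), max(capped))
```

Capping keeps every row (only the extreme values change), which matters when the sample is already close to the minimum size discussed above.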
Business use case 1
• Business problem:
• An ecommerce company wants to measure the impact of product price on product sales
• Input data:
• Predictor/independent variable: product price data for last year
• Dependent variable: product sales data for last year
• Business benefit:
• The product sales manager will learn how much, and in what direction, product price affects product sales
• Decisions on changing the product price can be made with more confidence according to the sales target for that particular product
Business use case 2
• Business problem:
• An agricultural production firm wants to predict the impact of the amount of rainfall on the yield of a particular crop
• Input data:
• Predictor/independent variable: amount of rainfall during the monsoon months last year
• Dependent variable: crop production data during the monsoon months last year
• Business benefit:
• The firm can predict the yield of a particular crop based on the amount of rainfall this year, and can plan alternative crop arrangements and other contingencies if the rainfall is not adequate to achieve the desired/targeted crop production
Example: Simple linear regression
Consider the data obtained from a chemical process where the yield (Yi) of the process is thought to be related to the reaction temperature (Xi) (see the table on the right)
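STEP 1: Obtain the estimates 𝜷0 and 𝜷1 in the equation Yi = 𝜷0 + 𝜷1 Xi + 𝜺𝒊 using the following equations:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Where
ȳ is the mean of all the observed values of the dependent variable, calculated using $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
x̄ is the mean of all values of the predictor variable, calculated using $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$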
Example: Simple linear regression
Calculating 𝜷0 and 𝜷1:
Once 𝜷0 and 𝜷1 are known, the fitted regression line can be written as:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Where ŷ is the predicted value based on the fitted regression model
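The Step 1 formulas translate directly into code. The sketch below applies them to a small made-up data set (not the chemical-process table) so the result is easy to check by hand: the points lie exactly on y = 1 + 2x, so the estimates should be 𝜷1 = 2 and 𝜷0 = 1. The function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # beta1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    # beta0 = y_bar - beta1 * x_bar
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


beta0, beta1 = fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"y^ = {beta0:g} + {beta1:g}x")  # y^ = 1 + 2x
```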
STEP 2: Obtain values of ŷ for each observation using the fitted regression line from Step 1: ŷ = 17 + 2x
Also compute the corresponding error terms using the equation 𝜺𝒊 = yi - ŷi, as shown below:
Predicted values corresponding to each observation:
ŷ1 = 17 + 2x1 = 17 + 2*50 = 117
ŷ2 = 17 + 2x2 = 17 + 2*53 = 123
...
ŷ25 = 17 + 2x25 = 17 + 2*100 = 217
Error values corresponding to each predicted value:
𝜺1 = y1 - ŷ1 = 122 - 117 = 5
𝜺2 = y2 - ŷ2 = 118 - 123 = -5
...
𝜺25 = y25 - ŷ25 = 219 - 217 = 2
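Step 2's arithmetic can be reproduced for the three observations shown in this worked example (x = 50, 53 and 100 with observed yields 122, 118 and 219), using the rounded fitted line ŷ = 17 + 2x. The `predict` helper is a hypothetical name for illustration.

```python
def predict(x, beta0=17, beta1=2):
    """Rounded fitted line from Step 1: y^ = 17 + 2x."""
    return beta0 + beta1 * x


# (temperature x_i, observed yield y_i) for observations 1, 2 and 25 above
observations = [(50, 122), (53, 118), (100, 219)]

for x, y in observations:
    y_hat = predict(x)
    error = y - y_hat  # error term e_i = y_i - y^_i
    print(f"x={x}: y^={y_hat}, error={error}")
```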
Example: Simple Linear Regression
STEP 3: Obtain the significance value (p-value) to understand whether there exists a relationship between the predictor and the dependent variable, i.e. temperature and yield in this case
To get the p-value, we need the t statistic, the degrees of freedom and the significance level (𝛼), which can be obtained as follows:
1. Calculate the standard error for 𝜷1: $se(\hat{\beta}_1) = \sqrt{\dfrac{\sum_{i=1}^{n} \hat{\varepsilon}_i^{2} / (n-2)}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}}$
2. Calculate the t statistic: $t_0 = \hat{\beta}_1 / se(\hat{\beta}_1)$, with n - 2 degrees of freedom
3. Calculate the p-value: $p = 2\,[1 - P(T < |t_0|)]$, where P(T < t0) is obtained from a t table
Assuming that the desired significance level is 0.1 (i.e. a 90% confidence threshold), since the p-value < 0.1 here, there is a statistically significant relationship between the Temperature and Yield variables.
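A sketch of Step 3 on the example input table (the 25 Temperature/Yield rows listed earlier): compute se(𝜷1), t0 and the degrees of freedom, then compare |t0| with the t-table critical value for a two-sided test at 𝛼 = 0.1 with 23 degrees of freedom, 1.714. Checking |t0| against the critical value is equivalent to checking p < 𝛼. The hard-coded critical value and the choice of data are assumptions of this sketch.

```python
import math

temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_y = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
           171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(yield_y) / n
s_xx = sum((x - x_bar) ** 2 for x in temperature)
beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, yield_y)) / s_xx
beta0 = y_bar - beta1 * x_bar

# 1. Standard error of beta1 from the residual sum of squares
sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(temperature, yield_y))
df = n - 2
se_beta1 = math.sqrt(sse / df / s_xx)

# 2. t statistic
t0 = beta1 / se_beta1

# 3. Compare with the t-table critical value (two-sided, alpha = 0.1, df = 23)
T_CRIT = 1.714
significant = abs(t0) > T_CRIT
print(f"se={se_beta1:.4f}, t0={t0:.2f}, df={df}, significant at alpha=0.1: {significant}")
```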
Example: Simple Linear Regression
STEP 4: Calculate the measure of model accuracy: Coefficient of Determination (R2)
This metric shows what percentage of the variability in Y (the dependent variable: Yield in this case) can be explained/predicted by the fitted model:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
• Model accuracy must be checked before any inferences are made
• The closer the value of R2 is to 1, the better the fitted model
• In this case it is 0.98, indicating that 98% of the variability in Yield is explained by the fitted model. Thus, the model fits the data very well
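Step 4 as code, again on the example input table: R2 is one minus the ratio of the residual (unexplained) sum of squares to the total sum of squares. This is a sketch with a hypothetical function name; for the 25 rows listed at the start of the example it should land close to the 0.98 reported above.

```python
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_y = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
           171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def r_squared(x, y):
    """Coefficient of determination for a least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta0 = y_bar - beta1 * x_bar
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                            # total
    return 1 - sse / sst


r2 = r_squared(temperature, yield_y)
print(f"R2 = {r2:.3f} ({r2:.1%} of the variability in Yield is explained)")
```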
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
Smarten Augmented Analytics
 

More from Smarten Augmented Analytics (20)

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – Smarten
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – Smarten
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – Smarten
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - Smarten
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - Smarten
 
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenPredictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - Smarten
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - Smarten
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - Smarten
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
 

Recently uploaded

A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 

Recently uploaded (20)

A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 

What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. What Is Covered: Terminologies; Introduction & Example; Standard input/tuning parameters & Sample UI; Sample output UI; Interpretation of Output; Limitations; Business use cases
  • 4. Terminologies • Predictors and target variable: • The target variable, usually denoted by Y, is the variable being predicted; it is also called the dependent variable, output variable, response variable or outcome variable • A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable • Correlation: • Correlation is a statistical measure of the extent to which two variables fluctuate together • Upper & lower N% confidence intervals: • A confidence interval is a statistical way of saying, "I am pretty sure the true value of the number I am approximating is within this range," with N% confidence. More precisely, the interval is computed from the sample by a procedure that captures the true value in N% of repeated samples
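Correlation is most often quantified with the Pearson correlation coefficient, which ranges from −1 (perfectly opposite movement) to 1 (perfectly aligned movement). A minimal sketch in plain Python; the function name `pearson_r` is illustrative and not part of any Smarten API:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable around its own mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfectly aligned)
```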
  • 5. Terminologies • Intercept / constant term $\beta_0$: • The intercept is the expected value of Y when all $X_i = 0$ • In other words, $\beta_0$ is the value the model predicts for Y when every $X_i$ is 0; it is not the minimum value of Y, and it has a practical meaning only if $X_i = 0$ lies within the range of the data • Coefficients $\beta_i$: • A coefficient is interpreted as the expected change in Y corresponding to a one-unit increase in $X_i$ • Error term $\varepsilon_i$: • It represents the part of $Y_i$ that the model does not explain • It is the difference between the observed value of $Y_i$ and the value given by the regression line; its estimate from the data, the residual, is $y_i - \hat{y}_i$ • Standard error of coefficient: • It measures the precision of the estimate of the coefficient • In other words, the smaller the standard error, the more precise the estimate. Where $Y_i$ is the dependent variable and $X_i$ is the independent variable
  • 6. Terminologies • T statistic: • Dividing a coefficient by its standard error gives the t statistic, which is used to calculate the p-value for testing whether the coefficient differs from zero • Degrees of freedom: • Degrees of freedom = N − K, where N is the number of observations and K is the number of parameters estimated (K = 2 in simple linear regression: $\beta_0$ and $\beta_1$) • Significance level / alpha level: • It sets the level of confidence at which you want to test the results; alpha is the probability of concluding that a relation exists when in fact it does not • Lower values of alpha mean higher confidence. For example, if $\alpha = 0.1$, confidence = 100 − ($\alpha$ × 100) = 90% • P value: • If the p-value associated with the t statistic is less than the alpha level, the relation between the corresponding predictor and the dependent variable is statistically significant at that level
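The path from t statistic and degrees of freedom to a p-value can be sketched without a t table by integrating the Student's t density numerically. The sketch below uses only the standard library and Simpson's rule; the function names, the 2000-step default and the example coefficient and standard error are assumptions for illustration. In practice a statistics library such as SciPy (`scipy.stats.t`) would normally be used instead.

```python
import math


def t_pdf(u: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_stat|), integrating the density over [0, |t_stat|] (Simpson's rule)."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * central_area)


# Hypothetical example: coefficient 2.04 with standard error 0.05,
# N = 25 observations and K = 2 estimated parameters
t_stat = 2.04 / 0.05
df = 25 - 2
alpha = 0.1  # significance level, i.e. a 90% confidence threshold
p = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.1f}, df = {df}, p = {p:.3g}, significant: {p < alpha}")
```

The decision rule is the one stated above: the relation is called significant when the p-value falls below alpha.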
  • 7. Types of linear regression analysis • Depending on the number of independent variables/predictors in the analysis, linear regression is classified into two types: • Simple linear regression: • One dependent variable and one independent variable/predictor: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ • Multiple linear regression: • One dependent variable but multiple independent variables/predictors: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$ • Where • $Y_i$ is the dependent variable • $X_i$ is the independent variable • $\beta_0$ is the intercept • $\beta_i$ is a coefficient • $\varepsilon_i$ is the error term
  • 8. Introduction: Simple linear regression • Objective: It is a statistical technique that explores the relationship between one independent variable (X) and one dependent variable (Y) • Benefit: The regression output helps identify whether the independent variable/predictor X has any relationship with the dependent variable Y and, if so, the nature and direction (positive or negative) of the relationship between the two • Model: The simple linear regression model takes the form $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  • 9. Example: Simple linear regression • Let's get the simple linear regression output for independent variable X (Temperature) and target variable Y (Yield) as shown below • Input data (Temperature, Yield): (50, 112), (53, 118), (54, 128), (55, 121), (56, 125), (59, 136), (62, 144), (65, 142), (67, 149), (71, 161), (72, 167), (74, 168), (75, 162), (76, 171), (79, 175), (80, 182), (82, 180), (85, 183), (87, 188), (90, 200), (93, 194), (94, 206), (95, 207), (97, 210), (100, 219) • Output: Regression statistics: R Square = 0.98. Coefficients (coefficient; p-value; lower 95%; upper 95%): Intercept: 13.33; 0.00268; 5.13; 21.52. Temperature: 2.04; 0.00138; 1.93; 2.15 • By the rule of thumb used in this series, the model is a good fit, as R Square > 0.7 • The p-value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield and has a significant relation with Yield • With a one-unit increase in Temperature, Yield increases by about 2.04 units • The true values of the coefficients lie within the range given by the lower and upper 95% limits with 95% confidence • For example, the coefficient of Temperature lies between 1.93 and 2.15 with 95% confidence (5% chance of error) • Note: the intercept is not an important statistic for checking the relation between X and Y
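The coefficients and 95% limits in this output can be reproduced with the textbook least-squares formulas. A minimal sketch in plain Python using the Temperature/Yield pairs listed above; the critical value 2.069 (t distribution, 95% two-sided, 25 − 2 = 23 degrees of freedom) is taken from a t table, the helper name is hypothetical, and p-values are not computed here.

```python
import math

# Temperature (X) and Yield (Y) pairs from the example table
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def simple_regression_summary(xs, ys, t_crit):
    """Least-squares coefficients, R Square and confidence limits (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = syy - b1 * sxy   # residual (unexplained) sum of squares
    mse = sse / (n - 2)    # n - 2 degrees of freedom
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))
    return {
        "r_square": 1 - sse / syy,
        "intercept": (b0, b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "slope": (b1, b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


summary = simple_regression_summary(temperature, yield_, t_crit=2.069)
for name in ("intercept", "slope"):
    estimate, lower, upper = summary[name]
    print(f"{name}: {estimate:.2f}  lower 95%: {lower:.2f}  upper 95%: {upper:.2f}")
print(f"R Square: {summary['r_square']:.3f}")
```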
  • 10. Standard input/tuning parameters & Sample UI • Step 1: Select the predictor (options in the sample UI: Temperature, Yield, Pressure range) • Step 2: Select the dependent variable (options: Temperature, Yield, Pressure range) • Step 3: Set the tuning parameters; by default they should be set to step size = 1 and number of iterations = 100 (these apply when the coefficients are found iteratively, e.g. by gradient descent; ordinary least squares can also be solved directly in closed form) • Step 4: Display the output window containing the following: o Model summary o Line fit plot o Normal probability plot o Residual versus fit plot • Note: Categorical predictors should be auto-detected and converted to binary (dummy) variables before applying regression
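Step size and number of iterations only matter if the fit is found iteratively. A minimal sketch of batch gradient descent with those two parameters, assuming the predictor is standardized first so that a step size of 1 is stable; the function and parameter names are hypothetical and do not describe how Smarten implements the fit.

```python
import statistics


def fit_by_gradient_descent(xs, ys, step_size=1.0, iterations=100):
    """Fit y = b0 + b1 * x by batch gradient descent (hypothetical helper)."""
    mean_x = statistics.fmean(xs)
    sd_x = statistics.pstdev(xs)
    # Standardize X (mean 0, population std 1) so a step size of 1 is stable
    zs = [(x - mean_x) / sd_x for x in xs]
    n = len(xs)
    a0, a1 = 0.0, 0.0  # intercept and slope on the standardized scale
    for _ in range(iterations):
        residuals = [y - (a0 + a1 * z) for z, y in zip(zs, ys)]
        # Gradient of the loss (1 / 2n) * sum(residual ** 2)
        grad_a0 = -sum(residuals) / n
        grad_a1 = -sum(r * z for r, z in zip(residuals, zs)) / n
        a0 -= step_size * grad_a0
        a1 -= step_size * grad_a1
    # Map back to the original scale: y = a0 + a1 * (x - mean_x) / sd_x
    b1 = a1 / sd_x
    b0 = a0 - b1 * mean_x
    return b0, b1


print(fit_by_gradient_descent([1, 2, 3, 4], [3, 5, 7, 9]))
```

With a standardized predictor the loss surface is perfectly round, so the iterations settle on the least-squares solution almost immediately; on the raw Temperature scale (values 50 to 100) a step size of 1 would diverge.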
  • 11. Sample output: 1. Model summary • Regression statistics: Multiple R = 0.99, R Square = 0.98 • Coefficients (coefficient; p-value; lower 95%; upper 95%): Intercept: 13.33; 0.00268; 5.13; 21.52. Temperature: 2.04; 0.00138; 1.93; 2.15 • Multiple R: It depicts the strength of the correlation between X and Y; in simple linear regression it equals the absolute value of the correlation coefficient, and the closer it is to 1, the stronger the correlation • R Square: It shows the goodness of fit of the model. It lies between 0 and 1, and the closer it is to 1, the better the model • P-value: o It is used to evaluate whether the corresponding predictor X has a significant impact on the target variable Y o As the p-value for Temperature is < 0.05, Temperature has a significant relation with Yield • The value of the Temperature coefficient lies between 1.93 and 2.15 with 95% confidence • Coefficient: o It shows the magnitude as well as the direction of the impact of predictor X (Temperature in this case) on the target variable Y o For example, in this case, with a one-unit increase in Temperature there is a 2.04-unit increase in Yield (an additive increase, not a doubling) • Check the Interpretation section for more details
  • 12. Sample output: 2. Plots [Figure: line fit plot annotated with $\hat{y} = 17 + 2x$ and $R^2 = 0.75$; normal probability plot; residual plot] • The line fit plot is used to check the assumption of linearity between X and Y • The normal probability plot is used to check the assumption of normality and to detect outliers • The residual plot is used to check the assumption of equal error variances and to detect outliers • Check the Interpretation section for more details
  • 13. Interpretation of important model summary statistics • Multiple R (read here as the signed correlation coefficient R between X and Y; the bands below are rules of thumb): • R > 0.7 represents a strong positive correlation between X and Y • 0.4 ≤ R ≤ 0.7 represents a weak positive correlation between X and Y • −0.4 < R < 0.4 represents a negligible/no correlation between X and Y • −0.7 ≤ R ≤ −0.4 represents a weak negative correlation between X and Y • R < −0.7 represents a strong negative correlation between X and Y • R Square: • R Square > 0.7 represents a very good model, i.e. the model is able to explain more than 70% of the variability in Y • R Square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to improve the fit • P value: • At the 95% confidence threshold, if the p-value for a predictor X is < 0.05, X is a significant/important predictor • At the 95% confidence threshold, if the p-value for a predictor X is > 0.05, X is an insignificant/unimportant predictor, i.e. it does not have a significant relation with the target variable Y • Coefficients: • A coefficient indicates by how much the output variable changes with a one-unit change in X • For example, if the coefficient of X is 2, Y increases by 2 units with a one-unit increase in X • If the coefficient of X is −2, Y decreases by 2 units with a one-unit increase in X
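The rule-of-thumb correlation bands (strong above 0.7, weak from 0.4 to 0.7, negligible between −0.4 and 0.4, mirrored for negative values) and the additive reading of a coefficient can be written down directly. A small sketch; the labels and function names are illustrative only:

```python
def describe_correlation(r: float) -> str:
    """Label a signed correlation coefficient using rule-of-thumb bands."""
    if r > 0.7:
        return "strong positive"
    if r >= 0.4:
        return "weak positive"
    if r > -0.4:
        return "negligible"
    if r >= -0.7:
        return "weak negative"
    return "strong negative"


def predicted_change(coefficient: float, delta_x: float) -> float:
    """Additive change in predicted Y when X changes by delta_x units."""
    return coefficient * delta_x


print(describe_correlation(0.99), predicted_change(2.04, 1))
```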
  • 14. Interpretation of plots: Line fit plot • This plot shows the relationship between X (predictor) and Y (target variable), with Y on the y axis and X on the x axis • As shown in Figure 1, as Temperature increases, so does Yield; hence there is a linear relationship between X and Y, and simple linear regression is applicable to this data • The fitted regression line and regression equation are shown in the plot itself, along with the model's R Square value, to describe how well the model fits the data and whether there is a linear relation between X and Y • If R Square is low (< 0.7) and the points do not display linearity, as in Figures 2 and 3, a linear regression model is not applicable and a different model should be considered to predict Y [Figure 1: $\hat{y} = 17 + 2x$, $R^2 = 0.75$; Figure 2: $R^2 = 0.5$; Figure 3: $R^2 = 0.4$]
  • 15. Interpretation of plots: Normal probability plot • This plots percentiles against the target/dependent variable (Y) • It is used to check the assumption of normality in the data and to detect outliers; since the normality assumption of linear regression strictly concerns the error terms, the same plot drawn for the residuals is also informative • Adding a trend line can help show whether the data fit a straight line • The plot in Figure 1 shows the dots lying close to a straight line; therefore, the data are approximately normally distributed and there are no outliers • Examples of non-normal data are shown in Figures 2 and 3, and an example of outliers is shown in Figure 4 [Figures 1 to 4]
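The coordinates of a normal probability plot can be computed with the standard library's `statistics.NormalDist`. A minimal sketch: each sorted value is paired with the standard normal quantile of its plotting position, and the correlation of those pairs summarizes how straight the plot is. The plotting position (i − 0.5)/n is one common convention and is an assumption here, as are the function names and the sample residuals.

```python
import math
from statistics import NormalDist


def normal_probability_points(values):
    """Pair each sorted value with the standard normal quantile of its rank."""
    n = len(values)
    ordered = sorted(values)
    # Plotting position (i - 0.5) / n, one common convention
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Correlation of the plotted points; values close to 1 suggest a straight line."""
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    mean_q = sum(qs) / len(qs)
    mean_v = sum(vs) / len(vs)
    sqv = sum((q - mean_q) * (v - mean_v) for q, v in zip(qs, vs))
    sqq = sum((q - mean_q) ** 2 for q in qs)
    svv = sum((v - mean_v) ** 2 for v in vs)
    return sqv / math.sqrt(sqq * svv)


# Hypothetical residuals: one large value bends the plot away from a line
residuals = [-1.2, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 6.0]
print(round(straightness(normal_probability_points(residuals)), 3))
```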
  • 16. Interpretation of plots: Residual versus fit plot • It is a scatter plot with residuals on the Y axis and predicted (fitted) values on the X axis • It is used to detect unequal error variances and outliers • A well-behaved residual versus fit plot has these characteristics: • The residuals "bounce randomly" around the 0 line and roughly form a "horizontal band" around it, as shown in Figure 1. This suggests that the variances of the error terms are equal • No single residual "stands out" from the basic random pattern of residuals. This suggests that there are no outliers • For example, the red data point in Figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation • The plots in Figures 2 and 3 depict unequal error variances, which is not desirable for linear regression analysis [Figures 1 to 3]
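The values behind a residual versus fit plot, plus a simple outlier flag, can be computed as below using the Temperature/Yield pairs from the example table. Flagging residuals larger than 3 residual standard errors is a common rule of thumb adopted here as an assumption; the function name is hypothetical, and the plotting itself (e.g. with matplotlib) is left out.

```python
import math

# Temperature (X) and Yield (Y) pairs from the example table
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def residual_diagnostics(xs, ys, threshold=3.0):
    """Least-squares fit, then fitted values, residuals and flagged outliers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    flagged = [(f, r) for f, r in zip(fitted, residuals) if abs(r) / s > threshold]
    return fitted, residuals, flagged


fitted, residuals, flagged = residual_diagnostics(temperature, yield_)
# Each (fitted, residual) pair is one point of the residual versus fit plot
print(len(flagged), "residual(s) beyond 3 standard errors")
```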
  • 17. Limitations • Simple linear regression is limited to predicting numeric output, i.e. the dependent variable has to be numeric • As a rule of thumb, the minimum sample size should be 50 + 8m, where m is the number of predictors • Hence, for simple linear regression, the minimum sample size is 50 + 8(1) = 58 • It handles only two variables, one predictor and one dependent variable, whereas usually more than one predictor is correlated with the dependent variable; these cannot be analyzed together through simple linear regression
  • 18. Limitations • The target/dependent variable should be normally distributed • A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme; it looks like a bell curve, as shown in Figure 1 • Outliers in the data can distort the analysis, hence outliers need to be removed • Outliers are observations lying outside the overall pattern of the distribution, as shown in Figure 2 • These extreme values/outliers can be replaced with the 1st or 99th percentile values [Figure 1: normal distribution; Figure 2: outliers]
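Replacing extreme values with the 1st or 99th percentile (often called capping or winsorizing) can be sketched with the standard library's `statistics.quantiles`. The percentile method is an assumption: the "inclusive" method is chosen so the caps stay within the observed range. The function name and the sample data are hypothetical.

```python
import statistics


def cap_outliers(values):
    """Clip values below the 1st percentile or above the 99th percentile."""
    # n=100 returns the 99 cut points P1..P99
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[0], cuts[98]
    return [min(max(v, low), high) for v in values]


# Hypothetical data: 1..99 plus one extreme value
data = list(range(1, 100)) + [10000]
capped = cap_outliers(data)
print(min(capped), max(capped))
```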
  • 19. Business use case 1 • Business problem: • An e-commerce company wants to measure the impact of product price on product sales • Input data: • Predictor/independent variable: product price data for last year • Dependent variable: product sales data for last year • Business benefit: • The product sales manager learns how much, and in what direction, product price affects product sales • Decisions on product price changes can be made with more confidence according to the sales target for that particular product
  • 20. Business use case 2 • Business problem: • An agricultural production firm wants to predict the impact of the amount of rainfall on the yield of a particular crop • Input data: • Predictor/independent variable: amount of rainfall during the monsoon months last year • Dependent variable: crop production data for the monsoon months last year • Business benefit: • The firm can predict the yield of a particular crop based on this year's rainfall and, if rainfall is not adequate to reach the desired/targeted crop production, plan alternative crop arrangements and other contingencies
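  • 21. Example: Simple linear regression • Consider data obtained from a chemical process where the yield ($y_i$) of the process is thought to be related to the reaction temperature ($x_i$) (see the table) • STEP 1: Obtain the estimates $\hat\beta_0$ and $\hat\beta_1$ in the equation $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ using the following equations: $\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ • Where $\bar{y}$ is the mean of all the observed values of the dependent variable, calculated as $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, and $\bar{x}$ is the mean of all values of the predictor variable, calculated as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$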
  • 22. Example : Simple linear regression  Calculating 𝜷0 and 𝜷1 : Once 𝜷0 and 𝜷1 are known, the fitted regression line can be written as: Where y^ is the predicted value based on the fitted regression model
  • 23. Example: Simple linear regression • STEP 2: Obtain the value of $\hat{y}$ for each observation using the fitted regression line from Step 1, $\hat{y} = 17 + 2x$, and compute the corresponding error terms (residuals) as $\hat\varepsilon_i = y_i - \hat{y}_i$ • Predicted values corresponding to each observation: $\hat{y}_1 = 17 + 2x_1 = 17 + 2 \times 50 = 117$, $\hat{y}_2 = 17 + 2x_2 = 17 + 2 \times 53 = 123$, …, $\hat{y}_{25} = 17 + 2x_{25} = 17 + 2 \times 100 = 217$ • Error values corresponding to each predicted value: $\hat\varepsilon_1 = y_1 - \hat{y}_1 = 122 - 117 = 5$, $\hat\varepsilon_2 = y_2 - \hat{y}_2 = 118 - 123 = -5$, …, $\hat\varepsilon_{25} = y_{25} - \hat{y}_{25} = 219 - 217 = 2$
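Step 2 is plain arithmetic over all 25 observations. This sketch assumes the same worked-example data as Step 1 (the earlier example table with $y_1 = 122$) and uses the rounded line $\hat{y} = 17 + 2x$:

```python
# Assumed worked-example data: earlier table with y1 = 122
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
     76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

# Predicted values from the rounded fitted line y_hat = 17 + 2x
predicted = [17 + 2 * xi for xi in x]
# Residuals: observed minus predicted
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(predicted[0], predicted[1], predicted[-1])  # 117 123 217
print(residuals[0], residuals[1], residuals[-1])  # 5 -5 2
```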
  • 24. Example: Simple linear regression • STEP 3: Obtain the significance value (p-value) to understand whether there is a relation between the predictor and the dependent variable, i.e. Temperature and Yield in this case • To get the p-value, we need the t statistic, the degrees of freedom and the significance level ($\alpha$), which can be obtained as follows: 1. Calculate the standard error of $\hat\beta_1$: $se(\hat\beta_1) = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 / (n-2)}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$ 2. Calculate the t statistic: $t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)}$, with $n - 2$ degrees of freedom 3. Calculate the p-value: $p = 2\,\bigl(1 - P(T < |t_0|)\bigr)$, where $P(T < |t_0|)$ is obtained from a t table • Assuming the desired significance level is 0.1 (i.e. a 90% confidence threshold), since the p-value is < 0.1 here, there is a relation between the Temperature and Yield variables
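Step 3 can be sketched as follows, again assuming the worked-example data is the earlier example table with $y_1 = 122$. Instead of computing the p-value itself, it uses the equivalent t-table check: at $\alpha = 0.1$ (two-sided) with 23 degrees of freedom, p < 0.1 exactly when $|t_0|$ exceeds $t_{0.95,\,23} = 1.714$.

```python
import math

# Assumed worked-example data: earlier table with y1 = 122
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
     76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# 1. Standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / sxx)
# 2. t statistic with n - 2 degrees of freedom
t0 = b1 / se_b1
df = n - 2
# 3. Two-sided test at alpha = 0.1 via the t-table value t(0.95, 23)
T_CRITICAL = 1.714
print(f"se = {se_b1:.4f}, t0 = {t0:.2f}, df = {df}, significant: {abs(t0) > T_CRITICAL}")
```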
  • 25. Example: Simple linear regression • STEP 4: Calculate the measure of model accuracy (goodness of fit), the coefficient of determination: $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ • This metric shows what percentage of the variability in Y (the dependent variable, Yield in this case) can be explained/predicted by the fitted model • Before any inferences are made, the model's fit must be checked • The closer $R^2$ is to 1, the better the fitted model • In this case it is 0.98, indicating that 98% of the variability in Yield is explained by the fitted model. Thus, the model fits the data very well
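Step 4 follows directly from the residual and total sums of squares. This sketch assumes the same worked-example data as the previous steps (the earlier example table with $y_1 = 122$):

```python
# Assumed worked-example data: earlier table with y1 = 122
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
     76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ss_tot = sum((yi - y_bar) ** 2 for yi in y)                        # total variability
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.4f}")
```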
  • 26. Want to learn more? Get in touch with us at support@Smarten.com and check out the Learning section on Smarten.com. June 2018