Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Simple Linear Regression
What Is Covered
• Terminologies
• Introduction & Example
• Standard input/tuning parameters & Sample UI
• Sample output UI
• Interpretation of Output
• Limitations
• Business use cases
Terminologies
• Predictors and target variable:
• The target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent, output, response or outcome variable
• A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable
• Correlation:
• Correlation is a statistical measure of the extent to which two variables fluctuate together
• Upper & lower N% confidence intervals:
• A confidence interval is a range, computed from the sample, that is expected to contain the true value of the quantity being estimated with N% confidence. Informally: "I am pretty sure the true value of the number I am approximating lies within this range"
• Intercept / constant term 𝜷0:
• The intercept is the expected value of Y when all Xi = 0
• In other words, 𝛽0 is the value the fitted line predicts for Y at Xi = 0; it is not a minimum of Y
• Coefficients 𝜷𝒊:
• A coefficient is interpreted as the expected change in Yi for a one unit change in Xi
• Error term 𝜺𝒊:
• It represents the part of Yi that the model does not explain
• It is the difference between the observed value of Yi and the predicted value of Yi
• Standard error of coefficient:
• It measures the precision of the estimate of the coefficient
• The smaller the standard error, the more precise the estimate
Where Yi is the dependent variable and Xi is the independent variable
• T statistic:
• Dividing the coefficient by its standard error gives the t statistic, which is used to calculate the P value
• Degrees of freedom:
• Degrees of freedom are N - K, where N is the number of observations and K is the number of parameters estimated (K = 2 in simple linear regression: 𝛽0 and 𝛽1)
• Significance level / alpha level:
• It is the probability threshold (𝛼) against which the results are tested; the matching confidence level is 100 - (𝛼*100)
• Lower values of alpha mean higher confidence. For example, if 𝛼 = 0.1, confidence = 100 - (0.1*100) = 90%
• P value:
• If the p-value associated with the t statistic is less than the alpha level, the data give evidence of a relation between the corresponding predictor and the dependent variable
Types of Linear regression analysis
• Depending on the number of independent variables/predictors in analysis, it is classified into two types :
• Simple linear regression:
• When there is only one dependent and one independent variable/predictor
• Multiple linear regression :
• When there is only one dependent variable but multiple independent variables/predictors
• Where
• Yi is dependent variable
• Xi is independent variable
• 𝛽0 is intercept
• 𝛽𝑖 is coefficient
• 𝜀𝑖 is the error term
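Written out, the two model forms that these symbols refer to look as follows. The simple form matches the equation given in the introduction; the multiple form with k predictors is the standard generalisation, and its indexing (X_{1i}, ..., X_{ki}) is an assumed notation rather than something taken from the slides.

```latex
% Simple linear regression: one predictor
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

% Multiple linear regression: k predictors (indexing is an assumed notation)
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i
```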
Introduction : Simple linear regression
Objective :
It is a statistical technique that explores the relationship between one independent variable (X) and one dependent variable (Y)
Benefit :
The regression model output helps identify whether the independent variable/predictor X has any relationship with the dependent variable Y and, if so, the nature/direction of that relationship (i.e. positive or negative)
Model :
The simple linear regression model equation takes the form Yi = 𝛽0 + 𝛽1 Xi + 𝜀𝑖
Example: Simple linear regression
Input data
Temperature   Yield
50            112
53            118
54            128
55            121
56            125
59            136
62            144
65            142
67            149
71            161
72            167
74            168
75            162
76            171
79            175
80            182
82            180
85            183
87            188
90            200
93            194
94            206
95            207
97            210
100           219
Output
Regression Statistics
R Square      0.98
              Coefficients   P-value   Lower 95%   Upper 95%
Intercept     13.33          0.00268   5.13        21.52
Temperature   2.04           0.00138   1.93        2.15
Model is a good fit as R square > 0.7
• P value for Temperature is < 0.05
• Hence Temperature is an important factor for predicting Yield and has a significant relation with Yield
• With a one unit increase in Temperature, Yield increases by about 2.04 units (the Temperature coefficient); it does not double
• The value of each coefficient is expected to lie within the range given under Lower 95% and Upper 95%
• For example, the coefficient of Temperature lies between 1.93 and 2.15 with 95% confidence (5% chance of error)
The table above is the simple linear regression output for independent variable X (Temperature) and target variable Y (Yield)
Note : The intercept is not an important statistic for checking the relation between X & Y
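The coefficients in the output table can be reproduced from the input data with ordinary least squares. The following is a minimal sketch in plain Python; the function name `fit_simple_linear_regression` and the variable names are illustrative choices, not part of any product UI. It recovers an intercept of about 13.33 and a Temperature coefficient of about 2.04.

```python
# Temperature (X) and Yield (Y) values copied from the input data table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) from an ordinary least squares fit."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_simple_linear_regression(temperature, yield_)
print(f"Intercept   = {intercept:.2f}")
print(f"Temperature = {slope:.2f}")
print(f"R Square    = {r_squared:.3f}")
```

Reading the slope as "units of Yield per unit of Temperature" follows directly from `slope = sxy / sxx`: it is a rate of change, not a multiplier.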
Standard input/tuning parameters & Sample UI
Step 1 : Select the predictor
o Temperature
o Yield
o Pressure range
Step 2 : Select the dependent variable
o Temperature
o Yield
o Pressure range
Step 3 : Set the tuning parameters (by default these parameters should be set with the values mentioned)
o Step size = 1
o Number of Iterations = 100
Step 4 : Display the output window containing the following :
o Model summary
o Line fit plot
o Normal probability plot
o Residual versus Fit plot
Note : Categorical predictors should be auto-detected & converted to binary variables before applying regression
Sample output : 1. Model Summary
Regression Statistics
Multiple R    0.99
R Square      0.98
• Multiple R : It depicts the correlation between X & Y; the closer this value is to ±1, the higher the correlation
• R square : It shows the goodness of fit of the model. It lies between 0 and 1, and the closer this value is to 1, the better the model
              Coefficients   P-value   Lower 95%   Upper 95%
Intercept     13.33          0.00268   5.13        21.52
Temperature   2.04           0.00138   1.93        2.15
Coefficient:
o It shows the magnitude as well as the direction of the impact of predictor X (temperature in this case) on the target variable Y
o For example, in this case, a one unit increase in temperature is associated with a 2.04 unit increase in Yield
P-value :
o It is used to evaluate whether the corresponding predictor X has any significant impact on the target variable Y
o As the p-value for temperature is < 0.05, temperature has a significant relation with Yield
Lower 95% / Upper 95% :
o The value of the temperature coefficient lies between 1.93 and 2.15 with 95% confidence
Check Interpretation section for more details
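With a single predictor, Multiple R is the absolute value of the Pearson correlation between X and Y, and R square is its square. The sketch below checks this on the Temperature/Yield table from the example; `pearson_r` is a hypothetical helper written for illustration, not a function of any particular tool.

```python
import math

# Temperature (X) and Yield (Y) values from the example's input data table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


multiple_r = abs(pearson_r(temperature, yield_))
r_square = multiple_r ** 2
print(f"Multiple R = {multiple_r:.2f}")
print(f"R Square   = {r_square:.3f}")
```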
Sample output : 2. Plots
[Line fit plot: y^ = 17 + 2x, R2 = 0.75]
Line fit plot is used to check the assumption of
linearity between X & Y
Normal Probability plot is used to check the
assumption of normality & to detect outliers
Residual plot is used to check the assumption
of equal error variances & outliers
Check Interpretation section for more details
Interpretation of Important Model Summary Statistics
Multiple R :
• R > 0.7 represents a strong positive correlation between X and Y
• 0.4 <= R <= 0.7 represents a weak positive correlation between X and Y
• -0.4 < R < 0.4 represents a negligible/no correlation between X and Y
• -0.7 <= R <= -0.4 represents a weak negative correlation between X and Y
• R < -0.7 represents a strong negative correlation between X and Y
R Square :
• R square > 0.7 represents a very good model, i.e. the model is able to explain more than 70% of the variability in Y
• R square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to obtain a better fit
P value :
• At the 95% confidence threshold, if the p-value for a predictor X is < 0.05 then X is a significant/important predictor
• At the 95% confidence threshold, if the p-value for a predictor X is >= 0.05 then X is an insignificant/unimportant predictor, i.e. it doesn't have a significant relation with the target variable Y
Coefficients :
• It indicates by how much the output variable will change with a one unit change in X
• For example, if the coefficient of X is 2 then Y will increase by 2 units with a one unit increase in X
• If the coefficient of X is -2 then Y will decrease by 2 units with a one unit increase in X
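These rules of thumb can be expressed as a few small functions. This is only a sketch: the function names are hypothetical, the thresholds are the 0.4, 0.7 and 0.05 cut-offs quoted in this section, and the correlation bands are assumed to be read symmetrically around zero.

```python
def describe_correlation(r):
    """Map Multiple R to a strength band (bands assumed symmetric around 0)."""
    if r > 0.7:
        return "strong positive"
    if r >= 0.4:
        return "weak positive"
    if r > -0.4:
        return "negligible"
    if r >= -0.7:
        return "weak negative"
    return "strong negative"


def describe_fit(r_square):
    """Apply the R square > 0.7 rule of thumb."""
    if r_square > 0.7:
        return "good fit"
    return "poor fit: check linearity and normality"


def is_significant(p_value, alpha=0.05):
    """A predictor is treated as significant when its p-value is below alpha."""
    return p_value < alpha


def predicted_change(coefficient, delta_x=1.0):
    """Expected change in Y (in units of Y) for a change of delta_x in X."""
    return coefficient * delta_x


print(describe_correlation(0.99))   # strong positive
print(describe_fit(0.98))           # good fit
print(is_significant(0.00138))      # True
print(predicted_change(-2, 3))      # -6
```

Note that `predicted_change` adds or subtracts units of Y; a coefficient of 2 means "plus 2 units per unit of X", not "twice as large".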
Interpretation of plots : Line Fit plot
This plot shows the relationship between X (predictor) and Y (target variable), with Y on the y axis and X on the x axis
As shown in figure 1, as temperature increases, so does the Yield; hence there is a linear relationship between X and Y and simple linear regression is applicable to this data
The fitted regression line and regression equation are shown in the plot itself, along with the model R square value, to describe how well the model fits the data and whether there is a linear relation between X and Y
If R square is low (< 0.7) and the line doesn't display linearity, as in figures 2 & 3, then a linear regression model is not applicable and a different model should be considered to predict Y
Figure 1 : y^ = 17 + 2x, R2 = 0.75
Figure 2 : R2 = 0.5
Figure 3 : R2 = 0.4
Interpretation of plots : Normal Probability plot
This plots the percentile vs. the target/dependent variable (Y)
It is used to check the assumptions of linearity and normality in the data and also to detect outliers
It can be helpful to add a trend line to see whether the data fit a straight line
The plot in figure 1 shows that the pattern of dots lies close to a straight line; therefore, the data are approximately normally distributed and there are no outliers
Examples of non-normal data are shown in figures 2 & 3, and an example of outliers is shown in figure 4
[Figures 1 to 4 : normal probability plots]
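The points of such a plot can be computed without a charting library. The sketch below makes two assumptions that the slides do not state: the plotting position of the i-th sorted value is taken as (i - 0.5)/n, and each percentile is converted to a standard normal quantile so that normal data fall on a straight line. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

# Yield values from the example's input data table.
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def normal_probability_points(values):
    """Pair each sorted value with the normal quantile of its plotting position."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n  # assumed plotting position, strictly in (0, 1)
        points.append((NormalDist().inv_cdf(percentile), value))
    return points


def straightness(points):
    """Correlation between quantiles and values; close to 1 suggests a straight line."""
    n = len(points)
    q_mean = sum(q for q, _ in points) / n
    v_mean = sum(v for _, v in points) / n
    sqq = sum((q - q_mean) ** 2 for q, _ in points)
    svv = sum((v - v_mean) ** 2 for _, v in points)
    sqv = sum((q - q_mean) * (v - v_mean) for q, v in points)
    return sqv / math.sqrt(sqq * svv)


points = normal_probability_points(yield_)
print(f"Straightness of the plot: {straightness(points):.3f}")
```

A single number like `straightness` is only a rough stand-in for looking at the plot; isolated points far from the trend line still need visual inspection.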
Interpretation of plots : Residual versus Fit plot
It is the scatter plot of residuals on the Y axis and predicted (fitted) values on the X axis
It is used to detect unequal error variances and outliers
Here are the characteristics of a well-behaved residual vs. fits plot :
The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around the 0 line, as shown in figure 1. This suggests that the variances of the error terms are equal
No one residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers
For example, the red data point in figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation
Plots shown in figures 2 & 3 depict unequal error variances, which is not desirable for linear regression analysis
[Figures 1 to 3 : residual versus fit plots]
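The data behind a residual versus fit plot are just the fitted values and their residuals. The sketch below uses the example's Temperature/Yield table and the reported coefficients (13.33 and 2.04). The "more than 3 residual standard deviations from zero" rule in `flag_outliers` is an assumed convention for illustration; the slides do not specify a numeric cut-off.

```python
# Temperature (X) and Yield (Y) values from the example's input data table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def residuals_vs_fitted(x, y, intercept, slope):
    """Return (fitted, residuals), where residual = observed - fitted."""
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


def flag_outliers(residuals, threshold=3.0):
    """Indices of residuals beyond threshold residual standard deviations (assumed rule)."""
    n = len(residuals)
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]


fitted, residuals = residuals_vs_fitted(temperature, yield_, 13.33, 2.04)
mean_residual = sum(residuals) / len(residuals)
print(f"Mean residual: {mean_residual:.2f}")
print(f"Flagged outliers: {flag_outliers(residuals)}")
```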
Limitations
• Simple linear regression is limited to predicting numeric output, i.e. the dependent variable has to be numeric in nature
• Minimum sample size should be > 50+8m, where m is the number of predictors
• Hence in case of simple linear regression, the sample size should be greater than 50+8(1) = 58
• It handles only two variables : one predictor and one dependent variable. Usually more than one predictor is correlated with the dependent variable, and that situation can't be analyzed through simple linear regression
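The sample size rule is simple arithmetic. A one-line helper (hypothetical name) makes the threshold explicit for any number of predictors:

```python
def minimum_sample_size(num_predictors):
    """Threshold from the 50 + 8m rule of thumb; the sample should exceed it."""
    return 50 + 8 * num_predictors


threshold = minimum_sample_size(1)
print(f"Simple linear regression needs more than {threshold} observations")
```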
Target/dependent variable should be normally distributed
A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve, as shown in figure 1
Outliers in data can affect the analysis, hence outliers need to be removed or replaced
Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2
These extreme values/outliers can be replaced with the 1st or 99th percentile values
[Figure 1 : bell curve of a normal distribution. Figure 2 : distribution with outliers marked]
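Replacing extreme values with the 1st or 99th percentile can be sketched as below. The percentile convention used here (linear interpolation between sorted values) is one common choice and is an assumption, as are the function names.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between sorted values (assumed convention)."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    rank = (len(sorted_values) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Replace values below/above the given percentiles with those percentile values."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Illustrative data: 1..100 plus one extreme value.
values = list(range(1, 101)) + [1000]
capped = cap_outliers(values)
print(min(capped), max(capped))
```

Capping keeps the number of observations unchanged, which matters when the sample is already close to the minimum size.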
Business use case 1
• Business problem :
• An ecommerce company wants to measure the impact of product price on product
sales
• Input data:
• Predictor/independent variable is product price data for last year
• Dependent variable is product sales data for last year
• Business benefit:
• The product sales manager will learn how much, and in what direction, the product price impacts product sales
• Decisions on product price changes can be made with more confidence according to the sales target for that particular product
Business use case 2
• Business problem :
• An agriculture production firm wants to predict the impact of the amount of rainfall on the yield of a particular crop
• Input data:
• Predictor/independent variable : Amount of rainfall during monsoon months last year
• Dependent variable : Crop production data during monsoon months last year
• Business benefit:
• The firm can predict the yield of a particular crop based on the amount of rainfall this year, and can plan alternative crop arrangements and other contingencies to reach the desired/targeted crop production if the rainfall is not adequate
Example : Simple linear regression
Consider the data obtained from a chemical process where the yield (Yi) of the process is thought to be related to the reaction temperature (Xi) (see the table on the right)
STEP 1 : Obtain the estimates 𝜷0 and 𝜷1 in the equation Yi = 𝜷0 + 𝜷1 Xi + 𝜺𝒊 using the following equations :
$\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\beta_0 = \bar{y} - \beta_1 \bar{x}$
Where
$\bar{y}$ is the mean of all the observed values of the dependent variable, calculated using $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
$\bar{x}$ is the mean of all values of the predictor variable, calculated using $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example : Simple linear regression
Calculating 𝜷0 and 𝜷1 from the data gives approximately 𝜷0 = 17 and 𝜷1 = 2
Once 𝜷0 and 𝜷1 are known, the fitted regression line can be written as:
$\hat{y} = \beta_0 + \beta_1 x$
Where y^ is the predicted value based on the fitted regression model
STEP 2 : Obtain values of y^ for each observation using the regression line fit equation obtained in Step 1 : y^ = 17 + 2x
Also compute the corresponding error terms using the equation 𝜺𝒊 = yi - yi^ as shown below:
Predicted values corresponding to each observation :
y1^ = 17 + 2 x1 = 17 + 2*50 = 117
y2^ = 17 + 2 x2 = 17 + 2*53 = 123
y25^ = 17 + 2 x25 = 17 + 2*100 = 217
Error values corresponding to each predicted value :
𝜺1 = y1 - y1^ = 122 - 117 = 5
𝜺2 = y2 - y2^ = 118 - 123 = -5
𝜺25 = y25 - y25^ = 219 - 217 = 2
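Steps 1 and 2 can be followed in code. This sketch assumes the temperature column is the one from the earlier input data table and that the yield column is the same except for the first observation, which is taken as 122 because that is the value used in the error calculation for 𝜺1. Under that assumption the estimates come out close to 17 and 2.

```python
# Temperature values from the earlier input data table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
# Assumption: same yield column as the earlier table, except y1 = 122,
# the value used in the worked error calculation.
yield_ = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(yield_) / n

# STEP 1: least squares estimates.
beta_1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, yield_))
          / sum((x - x_bar) ** 2 for x in temperature))
beta_0 = y_bar - beta_1 * x_bar
print(f"beta_0 = {beta_0:.4f}, beta_1 = {beta_1:.4f}")

# STEP 2: predictions and errors with the rounded line y^ = 17 + 2x.
b0, b1 = round(beta_0), round(beta_1)
predicted = [b0 + b1 * x for x in temperature]
errors = [y - y_hat for y, y_hat in zip(yield_, predicted)]
print(f"y1^ = {predicted[0]}, error 1 = {errors[0]}")
print(f"y25^ = {predicted[-1]}, error 25 = {errors[-1]}")
```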
Example : Simple Linear Regression
STEP 3 : Obtain the significance value (p value) to understand whether there exists a relation between predictor and dependent variable, i.e. temperature and yield in this case
To get the P value, we need the t statistic, the degrees of freedom and the significance level (𝛼), which can be obtained as follows:
1. Calculate the standard error for 𝜷1 : $se(\beta_1) = \sqrt{\dfrac{\sum_{i=1}^{n} \varepsilon_i^2 / (n-2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
2. Calculate the t statistic : $t_0 = \beta_1 / se(\beta_1)$
3. Calculate the P value : $p = 2 \times (1 - P(T < |t_0|))$, where P(T < |t0|) is obtained from the t table with n - 2 degrees of freedom
Assuming that the desired significance level is 0.1 (i.e. 90% confidence threshold), since P value < 0.1 here, there exists a relation between the Temperature and Yield variables.
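The three calculations in Step 3 can be sketched with the standard library only. Assumptions: the data are the temperature/yield values of the worked example (first yield taken as 122, as in the error calculation), the standard error uses n - 2 degrees of freedom, and instead of a t table the tail probability is obtained by numerically integrating the Student t density. The helper names are hypothetical.

```python
import math

temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
# Assumption: first yield is 122, as used in the worked error calculation.
yield_ = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def two_sided_p_value(t0, df, steps=20000):
    """p = 2 * P(T > |t0|), by midpoint integration after substituting u = |t0| / s."""
    a = abs(t0)
    if a == 0:
        return 1.0
    h = 1.0 / steps
    tail = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        tail += t_pdf(a / s, df) * a / (s * s) * h
    return min(1.0, 2.0 * tail)


n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(yield_) / n
sxx = sum((x - x_bar) ** 2 for x in temperature)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, yield_))
beta_1 = sxy / sxx
beta_0 = y_bar - beta_1 * x_bar

# 1. Standard error of beta_1 from the sum of squared errors.
sse = sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(temperature, yield_))
df = n - 2
se_beta_1 = math.sqrt((sse / df) / sxx)
# 2. t statistic.
t0 = beta_1 / se_beta_1
# 3. Two-sided p value, compared with the chosen significance level.
p_value = two_sided_p_value(t0, df)
alpha = 0.1
print(f"se = {se_beta_1:.4f}, t0 = {t0:.2f}, significant = {p_value < alpha}")
```

With a t statistic this large the p value is far below any usual alpha, so the conclusion does not depend on whether 0.1 or 0.05 is chosen.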
Example : Simple Linear Regression
STEP 4 : Calculate the measure of model accuracy : Coefficient of Determination (R2)
This metric shows what percentage of the variability in Y (dependent variable : Yield in this case) can be explained/predicted by the fitted model
• Before any inferences are drawn, model accuracy must be checked
• The closer the value of R2 is to 1, the better the fitted model
• In this case it is 0.98, indicating that 98% of the variability in Yield is explained by the fitted model. Thus, the model fits the data very well
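R2 can be computed as one minus the ratio of unexplained to total variation, R2 = 1 - SSE/SST. The sketch below uses the worked example's data, again assuming the first yield value is 122 as in the error calculation, and it lands on roughly 0.98.

```python
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
# Assumption: first yield is 122, as used in the worked error calculation.
yield_ = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(yield_) / n
beta_1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, yield_))
          / sum((x - x_bar) ** 2 for x in temperature))
beta_0 = y_bar - beta_1 * x_bar

# SSE: variation left unexplained by the fitted line.
sse = sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(temperature, yield_))
# SST: total variation of Y around its mean.
sst = sum((y - y_bar) ** 2 for y in yield_)

r_squared = 1 - sse / sst
print(f"R2 = {r_squared:.2f}")
```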
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

More Related Content

What's hot

Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysisnadiazaheer
 
ML - Simple Linear Regression
ML - Simple Linear RegressionML - Simple Linear Regression
ML - Simple Linear RegressionAndrew Ferlitsch
 
Stat 130 chi-square goodnes-of-fit test
Stat 130   chi-square goodnes-of-fit testStat 130   chi-square goodnes-of-fit test
Stat 130 chi-square goodnes-of-fit testAldrin Lozano
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Dr Athar Khan
 
Regression analysis made easy
Regression analysis made easyRegression analysis made easy
Regression analysis made easyWeam Banjar
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tearsAnkit Sharma
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regressionJames Neill
 
Regression analysis
Regression analysisRegression analysis
Regression analysisRavi shankar
 
hypothesis testing
hypothesis testinghypothesis testing
hypothesis testingilona50
 
Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Haoran Zhang
 
Statistical Estimation and Testing Lecture Notes.pdf
Statistical Estimation and Testing Lecture Notes.pdfStatistical Estimation and Testing Lecture Notes.pdf
Statistical Estimation and Testing Lecture Notes.pdfDr. Tushar J Bhatt
 
Regression analysis
Regression analysisRegression analysis
Regression analysissaba khan
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 

What's hot (20)

Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Regression
RegressionRegression
Regression
 
ML - Simple Linear Regression
ML - Simple Linear RegressionML - Simple Linear Regression
ML - Simple Linear Regression
 
Stat 130 chi-square goodnes-of-fit test
Stat 130   chi-square goodnes-of-fit testStat 130   chi-square goodnes-of-fit test
Stat 130 chi-square goodnes-of-fit test
 
Linear regression theory
Linear regression theoryLinear regression theory
Linear regression theory
 
Point estimation
Point estimationPoint estimation
Point estimation
 
Binary OR Binomial logistic regression
Binary OR Binomial logistic regression Binary OR Binomial logistic regression
Binary OR Binomial logistic regression
 
Regression analysis made easy
Regression analysis made easyRegression analysis made easy
Regression analysis made easy
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tears
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
hypothesis testing
hypothesis testinghypothesis testing
hypothesis testing
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Simple Linear Regression (simplified)
Simple Linear Regression (simplified)Simple Linear Regression (simplified)
Simple Linear Regression (simplified)
 
Normal distribution
Normal distributionNormal distribution
Normal distribution
 
Statistical Estimation and Testing Lecture Notes.pdf
Statistical Estimation and Testing Lecture Notes.pdfStatistical Estimation and Testing Lecture Notes.pdf
Statistical Estimation and Testing Lecture Notes.pdf
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 

Similar to What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?

What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
Regression &amp; correlation coefficient
Regression &amp; correlation coefficientRegression &amp; correlation coefficient
Regression &amp; correlation coefficientMuhamamdZiaSamad
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?Smarten Augmented Analytics
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisShailendra Tomar
 
Correlation & Regression Analysis using SPSS
Correlation & Regression Analysis  using SPSSCorrelation & Regression Analysis  using SPSS
Correlation & Regression Analysis using SPSSParag Shah
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxAnusuya123
 
Linear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsLinear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsBabasab Patil
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxkrunal soni
 
IBM401 Lecture 5
IBM401 Lecture 5IBM401 Lecture 5
IBM401 Lecture 5saark
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...Smarten Augmented Analytics
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxVitalis Adongo
 
Stat 1163 -correlation and regression
Stat 1163 -correlation and regressionStat 1163 -correlation and regression
Stat 1163 -correlation and regressionKhulna University
 

Similar to What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data? (20)

What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
Regression &amp; correlation coefficient
Regression &amp; correlation coefficientRegression &amp; correlation coefficient
Regression &amp; correlation coefficient
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression Analysis
 
Ders 2 ols .ppt
Ders 2 ols .pptDers 2 ols .ppt
Ders 2 ols .ppt
 
Statistical analysis in SPSS_
Statistical analysis in SPSS_ Statistical analysis in SPSS_
Statistical analysis in SPSS_
 
Correlation & Regression Analysis using SPSS
Correlation & Regression Analysis  using SPSSCorrelation & Regression Analysis  using SPSS
Correlation & Regression Analysis using SPSS
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
12 rhl gta
12 rhl gta12 rhl gta
12 rhl gta
 
Linear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec domsLinear regression and correlation analysis ppt @ bec doms
Linear regression and correlation analysis ppt @ bec doms
 
Correlation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptxCorrelation _ Regression Analysis statistics.pptx
Correlation _ Regression Analysis statistics.pptx
 
Simple egression.pptx
Simple egression.pptxSimple egression.pptx
Simple egression.pptx
 
Simple Linear Regression.pptx
Simple Linear Regression.pptxSimple Linear Regression.pptx
Simple Linear Regression.pptx
 
IBM401 Lecture 5
IBM401 Lecture 5IBM401 Lecture 5
IBM401 Lecture 5
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
 
12943625.ppt
12943625.ppt12943625.ppt
12943625.ppt
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
 
Quantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA ProgramQuantitative Methods - Level II - CFA Program
Quantitative Methods - Level II - CFA Program
 
Stat 1163 -correlation and regression
Stat 1163 -correlation and regressionStat 1163 -correlation and regression
Stat 1163 -correlation and regression
 

More from Smarten Augmented Analytics

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenSmarten Augmented Analytics
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...Smarten Augmented Analytics
 
What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?What Is Random Forest Classification And How Can It Help Your Business?
What Is Random Forest Classification And How Can It Help Your Business?Smarten Augmented Analytics
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values Smarten Augmented Analytics
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...Smarten Augmented Analytics
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenSmarten Augmented Analytics
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenSmarten Augmented Analytics
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenSmarten Augmented Analytics
 
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenPredictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenSmarten Augmented Analytics
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenSmarten Augmented Analytics
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?Smarten Augmented Analytics
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?Smarten Augmented Analytics
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...Smarten Augmented Analytics
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...Smarten Augmented Analytics
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...Smarten Augmented Analytics
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?Smarten Augmented Analytics
 

What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. What Is Covered: Terminologies; Introduction & Example; Standard input/tuning parameters & Sample UI; Sample output UI; Interpretation of Output; Limitations; Business use cases
  • 4. Terminologies • Predictors and target variable: • The target variable, usually denoted by Y, is the variable being predicted; it is also called the dependent, output, response or outcome variable • A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable • Correlation: • Correlation is a statistical measure that indicates the extent to which two variables fluctuate together • Upper & lower N% confidence intervals: • A confidence interval is a statistical way of saying, "I am pretty sure the true value of the number I am approximating is within this range, with N% confidence"
  • 5. Terminologies • Intercept / constant term 𝜷0: • The intercept is the expected value of Y when all Xi = 0 • In other words, 𝛽0 is the value the model predicts for Y when every predictor is zero; it is not necessarily the minimum value of Y • Coefficients 𝜷𝒊: • A coefficient is interpreted as the expected change in Yi corresponding to a one-unit change in Xi • Error term 𝜺𝒊: • It represents the margin of error within a model • It is the difference between the observed value of Yi and the predicted value of Yi • Standard error of coefficient: • It measures the precision of the estimate of the coefficient • The smaller the standard error, the more precise the estimate • Here Yi is the dependent variable and Xi is the independent variable
  • 6. Terminologies • T statistic: • Dividing the coefficient by its standard error gives the t statistic, which is used to calculate the P value • Degrees of freedom: • Degrees of freedom are N - K, where N is the number of observations and K is the number of parameters used to calculate the estimate • Significance level / alpha level: • It sets the confidence level at which you want to test the results • Lower values of alpha mean higher confidence. For example, if 𝛼 = 0.1, confidence = 100 - (𝛼 * 100) = 90% • P value: • If the p-value associated with the t statistic is less than the alpha level, the relation between the corresponding predictor and the dependent variable is statistically significant
  • 7. Types of linear regression analysis • Depending on the number of independent variables/predictors in the analysis, linear regression is classified into two types: • Simple linear regression: • One dependent variable and one independent variable/predictor • Multiple linear regression: • One dependent variable but multiple independent variables/predictors • In the model equations: • Yi is the dependent variable • Xi is the independent variable • 𝛽0 is the intercept • 𝛽𝑖 is the coefficient • 𝜀𝑖 is the error term
  • 8. Introduction: Simple linear regression. Objective: It is a statistical technique that explores the relationship between one independent variable (X) and one dependent variable (Y). Benefit: The regression model output helps identify whether the independent variable/predictor X has any relationship with the dependent variable Y and, if so, the nature/direction of that relationship (i.e. positive or negative). Model: The simple linear regression model equation takes the form Yi = 𝛽0 + 𝛽1 Xi + 𝜀𝑖
  • 9. Example: Simple linear regression. Let's get the simple linear regression output for independent variable X (Temperature) and target variable Y (Yield).
    Input data as (Temperature, Yield) pairs: (50, 112), (53, 118), (54, 128), (55, 121), (56, 125), (59, 136), (62, 144), (65, 142), (67, 149), (71, 161), (72, 167), (74, 168), (75, 162), (76, 171), (79, 175), (80, 182), (82, 180), (85, 183), (87, 188), (90, 200), (93, 194), (94, 206), (95, 207), (97, 210), (100, 219)
    Output, regression statistics: R Square = 0.98
    Output, coefficients:
      Intercept: coefficient 13.33, P-value 0.00268, lower 95% 5.13, upper 95% 21.52
      Temperature: coefficient 2.04, P-value 0.00138, lower 95% 1.93, upper 95% 2.15
    Interpretation: • The model is a good fit as R square > 0.7 • The P value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield and has a significant relation with Yield • With a one-unit increase in Temperature, Yield increases by about 2.04 units • The coefficient values are expected to lie within the lower and upper 95% limits; for example, the coefficient of Temperature lies between 1.93 and 2.15 with 95% confidence (5% chance of error) • Note: the intercept is not an important statistic for checking the relation between X & Y
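The coefficients in the output above can be reproduced from the input data with the ordinary least-squares formulas. The following is a minimal Python sketch using only the standard library; the function name `fit_line` and the variable names are illustrative choices, not part of any Smarten product.

```python
# Temperature/Yield pairs from the input data on this slide.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(temperature, yield_)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Run on the 25 pairs listed on this slide, the sketch gives an intercept of about 13.33 and a slope of about 2.04, in line with the coefficients reported in the output.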
  • 10. Standard input/tuning parameters & Sample UI Select the predictor Temperature Yield Pressure range Step 1 Select the dependent variable Temperature Yield Pressure range Step 3 Step size =1 Number of Iterations = 100 Step 2 Display the output window containing following : o Model summary o Line fit plot o Normal probability plot o Residual versus Fit plot Step 4 Note : Categorical predictors should be auto detected & converted to binary variables before applying regression By default these parameters should be set with the values mentioned
  • 11. Sample output: 1. Model summary. Regression statistics: Multiple R = 0.99, R Square = 0.98. Coefficients: Intercept: coefficient 13.33, P-value 0.00268, lower 95% 5.13, upper 95% 21.52; Temperature: coefficient 2.04, P-value 0.00138, lower 95% 1.93, upper 95% 2.15. Multiple R: It depicts the correlation between X & Y; the closer this value is to ±1, the higher the correlation. R square: It shows the goodness of fit of the model. It lies between 0 and 1, and the closer this value is to 1, the better the model. Coefficient: o It shows the magnitude as well as the direction of the impact of predictor X (temperature in this case) on the target variable Y o For example, in this case, a one-unit increase in temperature corresponds to a 2.04-unit increase in Yield. P-value: o It is used to evaluate whether the corresponding predictor X has any significant impact on the target variable Y o As the p-value for temperature is < 0.05, temperature has a significant relation with Yield; the value of the temperature coefficient lies between 1.93 and 2.15 with 95% confidence. Check the Interpretation section for more details
  • 12. Sample output: 2. Plots. The line fit plot (shown with fitted line y^ = 17 + 2x and R2 = 0.75) is used to check the assumption of linearity between X & Y. The normal probability plot is used to check the assumption of normality and to detect outliers. The residual plot is used to check the assumption of equal error variances and to detect outliers. Check the Interpretation section for more details
  • 13. Interpretation of important model summary statistics. Multiple R: • R > 0.7 represents a strong positive correlation between X and Y • 0.4 <= R < 0.7 represents a weak positive correlation between X and Y • -0.4 < R < 0.4 represents negligible/no correlation between X and Y • -0.7 < R <= -0.4 represents a weak negative correlation between X and Y • R < -0.7 represents a strong negative correlation between X and Y. R square: • R square > 0.7 represents a very good model, i.e. the model explains more than 70% of the variability in Y • R square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to improve the fit. P value: • At the 95% confidence threshold, if the p-value for a predictor X is < 0.05 then X is a significant/important predictor • At the 95% confidence threshold, if the p-value for a predictor X is > 0.05 then X is an insignificant/unimportant predictor, i.e. it doesn't have a significant relation with target variable Y. Coefficients: • A coefficient indicates by how much the output variable changes with a one-unit change in X • For example, if the coefficient of X is 2 then Y increases by 2 units with a one-unit increase in X • If the coefficient of X is -2 then Y decreases by 2 units with a one-unit increase in X
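The interpretation rules on this slide can be written as two small helper functions. This is only a sketch: the function names are hypothetical, and treating a value of exactly ±0.7 as "strong" is an assumption, since the slide leaves that boundary unassigned.

```python
def describe_correlation(r):
    """Label Multiple R using the 0.4 and 0.7 thresholds from the slide."""
    strength = abs(r)
    if strength >= 0.7:      # assumption: exactly 0.7 counts as strong
        label = "strong"
    elif strength >= 0.4:
        label = "weak"
    else:
        return "negligible or no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


def is_significant(p_value, alpha=0.05):
    """A predictor is treated as significant when its p-value is below alpha."""
    return p_value < alpha


print(describe_correlation(0.99))   # Multiple R from the sample output
print(is_significant(0.00138))      # p-value reported for Temperature
```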
  • 14. Interpretation of plots: Line fit plot. This plot shows the relationship between X (predictor) and Y (target variable), with Y on the y axis and X on the x axis. As shown in figure 1, as temperature increases, so does the Yield; hence there is a linear relationship between X and Y, and simple linear regression is applicable to this data. The fitted regression line and the regression equation are shown in the plot itself, along with the model R square value, to describe how well the model fits the data and whether there is a linear relation between X and Y. If R square is low (< 0.7) and the plot doesn't display linearity, as in figures 2 & 3, then a linear regression model is not applicable and a different model should be considered to predict Y. Figure 1: y^ = 17 + 2x, R2 = 0.75. Figure 2: R2 = 0.5. Figure 3: R2 = 0.4
  • 15. Interpretation of plots: Normal probability plot. This plots the percentile against the target/dependent variable (Y). It is used to check the assumptions of linearity and normality in the data and also to detect outliers. It can be helpful to add a trend line to see whether the data fits a straight line. In figure 1 the pattern of dots lies close to a straight line; therefore, the data is normally distributed and there are no outliers. Examples of non-normal data are shown in figures 2 & 3, and an example of outliers is shown in figure 4
  • 16. Interpretation of plots: Residual versus fit plot. It is the scatter plot of residuals on the Y axis and predicted (fitted) values on the X axis. It is used to detect unequal error variances and outliers. Here are the characteristics of a well-behaved residual vs. fits plot: The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around the 0 line, as shown in figure 1. This suggests that the variances of the error terms are equal. No single residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers. For example, the red data point in figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation. The plots shown in figures 2 & 3 depict unequal error variances, which is not desirable for linear regression analysis
  • 17. Limitations • Simple linear regression is limited to predicting numeric output, i.e. the dependent variable has to be numeric in nature • The minimum sample size should be greater than 50 + 8m, where m is the number of predictors • Hence, in the case of simple linear regression, the sample size should be greater than 50 + 8(1) = 58 • It handles only two variables: one predictor and one dependent variable. Usually more than one predictor is correlated with the dependent variable, and that cannot be analyzed through simple linear regression
  • 18. Limitations • The target/dependent variable should be normally distributed. A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It looks like a bell curve, as shown in figure 1 • Outliers in the data can affect the analysis, hence outliers need to be removed. Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2 • These extreme values/outliers can be replaced with the 1st or 99th percentile values
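Replacing extreme values with the 1st or 99th percentile, as suggested on this slide, could look like the sketch below. The function name `clip_to_percentiles` and the sample data are hypothetical, and the "inclusive" percentile method is one reasonable choice among several.

```python
import statistics


def clip_to_percentiles(values, lower_pct=1, upper_pct=99):
    """Replace values outside the given percentiles with the percentile values."""
    # With n=100, statistics.quantiles returns the 99 percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical data: the values 1..100 plus one extreme observation.
data = list(range(1, 101)) + [1000]
clipped = clip_to_percentiles(data)
print(max(data), "->", max(clipped))
```

The extreme observation is pulled back to the 99th percentile rather than dropped, so the sample size is unchanged.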
  • 19. Business use case 1 • Business problem: • An ecommerce company wants to measure the impact of product price on product sales • Input data: • Predictor/independent variable: product price data for last year • Dependent variable: product sales data for last year • Business benefit: • The product sales manager will learn how much, and in what direction, the product price impacts product sales • Decisions on product price changes can be made with more confidence according to the sales target for that particular product
  • 20. Business use case 2 • Business problem: • An agriculture production firm wants to predict the impact of the amount of rainfall on the yield of a particular crop • Input data: • Predictor/independent variable: amount of rainfall during the monsoon months last year • Dependent variable: crop production data during the monsoon months last year • Business benefit: • The firm can predict the yield of a particular crop based on the amount of rainfall this year and, if the rainfall is not adequate, can plan alternative crop arrangements and other contingencies in order to reach the desired/targeted crop production
  • 21. Example: Simple linear regression. Consider the data obtained from a chemical process where the yield (Yi) of the process is thought to be related to the reaction temperature (Xi). STEP 1: Obtain the estimates 𝜷0 and 𝜷1 in the equation Yi = 𝜷0 + 𝜷1 Xi + 𝜺𝒊 using the least-squares equations 𝜷1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and 𝜷0 = ȳ - 𝜷1 x̄, where ȳ = (Σ yi)/n is the mean of all the observed values of the dependent variable and x̄ = (Σ xi)/n is the mean of all values of the predictor variable
  • 22. Example: Simple linear regression. Calculating 𝜷0 and 𝜷1: once 𝜷0 and 𝜷1 are known, the fitted regression line can be written as y^ = 𝜷0 + 𝜷1 x, where y^ is the predicted value based on the fitted regression model
  • 23. Example: Simple linear regression. STEP 2: Obtain the value of y^ for each observation using the regression line fit equation obtained in Step 1: y^ = 17 + 2x. Also compute the corresponding error terms using the equation 𝜺𝒊 = yi - yi^. Predicted values corresponding to each observation: y1^ = 17 + 2 x1 = 17 + 2*50 = 117; y2^ = 17 + 2 x2 = 17 + 2*53 = 123; …; y25^ = 17 + 2 x25 = 17 + 2*100 = 217. Error values corresponding to each predicted value: 𝜺1 = y1 - y1^ = 122 - 117 = 5; 𝜺2 = y2 - y2^ = 118 - 123 = -5; …; 𝜺25 = y25 - y25^ = 219 - 217 = 2
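Step 2 is plain arithmetic, and a short sketch makes the sign convention explicit: the error term is the observed value minus the predicted value. The function name `predict` is illustrative; the intercept 17 and slope 2 are the rounded values from the fitted line on this slide.

```python
def predict(x, intercept=17, slope=2):
    """Predicted yield from the fitted line y^ = 17 + 2x."""
    return intercept + slope * x


# (temperature, observed yield) for observations 2 and 25 of the example.
observations = {2: (53, 118), 25: (100, 219)}

for index, (x, y) in observations.items():
    y_hat = predict(x)
    residual = y - y_hat  # error term: observed minus predicted
    print(f"observation {index}: predicted = {y_hat}, error = {residual}")
```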
  • 24. Example: Simple linear regression. STEP 3: Obtain the significance value (p value) to understand whether there exists a relation between the predictor and the dependent variable, i.e. temperature and yield in this case. To get the P value, we need the t statistic, the degrees of freedom and the significance level (𝛼), which can be obtained as follows: 1. Calculate the standard error for 𝜷1. 2. Calculate the t statistic. 3. Calculate the P value; P(T < t0) is obtained from a t table. Assuming that the desired significance level is 0.1 (i.e. a 90% confidence threshold), since the P value < 0.1 here, there exists a relation between the Temperature and Yield variables
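The three sub-steps can be sketched in Python with the standard library. Several things here are assumptions rather than content from the slides: the standard error is computed with the usual formula sqrt(MSE / Sxx), the data is the Temperature/Yield table from slide 9 (for which the fitted slope is about 2.04), and instead of computing an exact p-value the t statistic is compared with a two-sided critical value of 1.714 read from a t table for 𝛼 = 0.1 and 23 degrees of freedom.

```python
import math

# Temperature/Yield pairs from the input data table on slide 9.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(yield_) / n
sxx = sum((x - x_bar) ** 2 for x in temperature)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, yield_))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual variance with N - K degrees of freedom (K = 2 estimated parameters).
dof = n - 2
residuals = [y - (intercept + slope * x) for x, y in zip(temperature, yield_)]
mse = sum(e ** 2 for e in residuals) / dof

se_slope = math.sqrt(mse / sxx)   # 1. standard error of the slope
t_stat = slope / se_slope         # 2. t statistic = coefficient / standard error

# 3. Assumption: critical value hard-coded from a t table (alpha = 0.1, 23 dof).
T_CRITICAL = 1.714
significant = abs(t_stat) > T_CRITICAL
print(f"se = {se_slope:.4f}, t = {t_stat:.2f}, significant = {significant}")
```

A t statistic far above the critical value corresponds to a p-value far below 𝛼, which is the same conclusion as comparing the p-value with 𝛼 directly.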
  • 25. Example: Simple linear regression. STEP 4: Calculate the measure of model accuracy, the coefficient of determination (R2). This metric shows what percentage of the variability in Y (the dependent variable, Yield in this case) can be explained/predicted by the fitted model. Before any inferences are drawn, model accuracy must be checked. The closer the value of R2 is to 1, the better the fitted model. In this case it is 0.98, indicating that 98% of the variability in Yield is explained by the fitted model; thus the model fits the data very well
  • 26. Want to learn more? Get in touch with us at support@Smarten.com and do check out the Learning section on Smarten.com. June 2018
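R2 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is an illustration under that standard definition, using the Temperature/Yield table from slide 9; the function name `r_squared` is hypothetical.

```python
# Temperature/Yield pairs from the input data table on slide 9.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_ = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
          171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Unexplained variation (around the line) vs. total variation (around the mean).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


r2 = r_squared(temperature, yield_)
print(f"R2 = {r2:.3f}")
```

For this data the result falls between 0.98 and 0.99, in line with the value of 0.98 quoted on the slide.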