Simple Linear Regression is a statistical technique that attempts to explore the relationship between one independent variable (X) and one dependent variable (Y). The Simple Linear Regression technique is not suitable for datasets where more than one variable/predictor exists.
What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?
1. Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
4. Terminologies
• Predictors and target variable:
• The target variable, usually denoted by Y, is the variable being predicted; it is also called the dependent, output, response or outcome variable
• A predictor, usually denoted by X and sometimes called an independent or explanatory variable, is a variable used to predict the target variable
• Correlation:
• Correlation is a statistical measure of the extent to which two variables fluctuate together
• Upper & lower N% confidence intervals:
• A confidence interval is a statistical way of saying, "I am pretty sure the true value of a number I am approximating is within this range," with N% confidence
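As a rough illustration of correlation, the sketch below computes a correlation coefficient for the first five Temperature/Yield pairs from the worked example later in this deck. The slides do not name a specific measure, so the Pearson coefficient is an assumption here, and `pearson_r` is an illustrative helper name rather than part of any tool described in the deck.

```python
import math

# First five Temperature/Yield pairs from the worked example in this deck.
temperature = [50, 53, 54, 55, 56]
crop_yield = [112, 118, 128, 121, 125]


def pearson_r(xs, ys):
    """Pearson correlation: how strongly xs and ys move together, in [-1, 1]."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, crop_yield)
print(f"correlation = {r:.2f}")
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means little linear association.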
5. Terminologies
• Intercept / constant term 𝜷0 :
• The intercept is the expected value of Y when all Xi = 0
• In other words, 𝛽0 is the value the model predicts for Y when all Xi = 0, i.e. where the fitted line crosses the Y axis
• Coefficients 𝜷𝒊 :
• A coefficient is interpreted as the expected change in Yi corresponding to a one-unit change in Xi
• Error term 𝜺𝒊 :
• It represents the part of Yi that the model does not explain
• It is the difference between the observed value of Yi and the predicted value of Yi
• Standard error of coefficient :
• It is used to measure the precision of the estimate of the coefficient
• In other words, the smaller the standard error, the more precise the estimate
Where Yi is the dependent variable and Xi is the independent variable
6. Terminologies
• T statistic:
• Dividing the coefficient by its standard error gives the t statistic, which is used to calculate the P value
• Degrees of freedom:
• Degrees of freedom is N-K, where N is the number of observations and K is the number of parameters estimated
• Significance level / alpha level:
• It is the probability of concluding that a relationship exists when in fact there is none; it sets the level of confidence at which you want to test the results
• Lower values of alpha mean higher confidence. For example, if 𝛼 = 0.1, confidence = 100 - (𝛼*100) = 90%
• P value :
• If the p-value associated with the t statistic is less than the alpha level, the relationship between the corresponding predictor and the dependent variable is statistically significant
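The arithmetic behind these terms can be sketched in a few lines. The function names are illustrative, N = 25 and K = 2 correspond to the 25-row Temperature/Yield example with an intercept and one slope, and the standard error of 0.05 is a hypothetical value chosen only to show the division; it does not appear in the deck.

```python
def confidence_percent(alpha):
    """Confidence level implied by a significance level: 100 - (alpha * 100)."""
    return 100 - alpha * 100


def degrees_of_freedom(n_observations, n_parameters):
    """Degrees of freedom: N - K."""
    return n_observations - n_parameters


def t_statistic(coefficient, standard_error):
    """t statistic: the coefficient divided by its standard error."""
    return coefficient / standard_error


confidence = confidence_percent(0.1)   # alpha = 0.1, as in the slide's example
df = degrees_of_freedom(25, 2)         # 25 observations, intercept + one slope
t = t_statistic(2.04, 0.05)            # 0.05 is a hypothetical standard error

print(confidence, df, round(t, 1))
```

Turning the t statistic into a p-value additionally requires the Student's t distribution with `df` degrees of freedom, which a statistics package would normally supply.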
7. Types of Linear regression analysis
• Depending on the number of independent variables/predictors, linear regression analysis is classified into two types:
• Simple linear regression:
• When there is one dependent variable and only one independent variable/predictor
• Multiple linear regression:
• When there is one dependent variable but multiple independent variables/predictors
• In the simple case the model is Yi = 𝛽0 + 𝛽1 Xi + 𝜀𝑖, where
• Yi is the dependent variable
• Xi is the independent variable
• 𝛽0 is the intercept
• 𝛽𝑖 is the coefficient
• 𝜀𝑖 is the error term
8. Introduction: Simple linear regression
Objective:
It is a statistical technique that attempts to explore the relationship between one independent variable (X) and one dependent variable (Y)
Benefit:
The regression model output helps identify whether the independent variable/predictor X has any relationship with the dependent variable Y and, if so, the nature/direction of that relationship (i.e. positive/negative)
Model:
The simple linear regression model equation takes the form Yi = 𝛽0 + 𝛽1 Xi + 𝜀𝑖
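The slide gives the form of the model but not how 𝛽0 and 𝛽1 are estimated from data. Assuming the usual ordinary least squares approach (an assumption, since the deck does not name the estimation method), the estimates that minimise the sum of squared errors are:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```

Here X̄ and Ȳ are the sample means of X and Y, and n is the number of observations. The fitted value for observation i is then the estimated intercept plus the estimated slope times Xi.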
9. Example: Simple linear regression
Temperature  Yield
50           112
53           118
54           128
55           121
56           125
59           136
62           144
65           142
67           149
71           161
72           167
74           168
75           162
76           171
79           175
80           182
82           180
85           183
87           188
90           200
93           194
94           206
95           207
97           210
100          219
Let's get the simple linear regression output for independent variable X (Temperature) and target variable Y (Yield):
Regression Statistics
R Square     0.98

             Coefficients  P-value  Lower 95%  Upper 95%
Intercept    13.33         0.00268  5.13       21.52
Temperature  2.04          0.00138  1.93       2.15

• The model is a good fit, as R square > 0.7
• The P value for Temperature is < 0.05; hence Temperature is an important factor for predicting Yield and has a significant relation with Yield
• With a one-unit increase in Temperature, Yield increases by about 2 units (2.04)
• The value of each coefficient is expected to lie within the range given under Lower 95% and Upper 95%
• For example, the coefficient of Temperature lies between 1.93 and 2.15 with 95% confidence (5% chance of error)
Note: The intercept is not an important statistic for checking the relation between X & Y
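The coefficients and R square in this example can be checked with a short script. This is a minimal sketch that assumes ordinary least squares on the 25 Temperature/Yield rows of the example; the helper names `fit_line` and `r_squared` are illustrative and are not taken from the deck or from any particular tool.

```python
# Temperature/Yield observations from the example table.
data = [
    (50, 112), (53, 118), (54, 128), (55, 121), (56, 125),
    (59, 136), (62, 144), (65, 142), (67, 149), (71, 161),
    (72, 167), (74, 168), (75, 162), (76, 171), (79, 175),
    (80, 182), (82, 180), (85, 183), (87, 188), (90, 200),
    (93, 194), (94, 206), (95, 207), (97, 210), (100, 219),
]
xs = [x for x, _ in data]
ys = [y for _, y in data]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} r_squared={r2:.3f}")
```

The slope and intercept should agree with the Temperature and Intercept coefficients in the output table to two decimal places, and R square should land close to the reported 0.98.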
10. Standard input/tuning parameters & Sample UI
Step 1: Select the predictor (options shown: Temperature, Yield, Pressure range)
Step 2: Select the dependent variable (options shown: Temperature, Yield, Pressure range)
Step 3: Set the tuning parameters: Step size = 1, Number of Iterations = 100. By default these parameters should be set with the values mentioned
Step 4: Display the output window containing the following:
o Model summary
o Line fit plot
o Normal probability plot
o Residual versus Fit plot
Note: Categorical predictors should be auto detected & converted to binary variables before applying regression
11. Sample output : 1. Model Summary
Regression Statistics
Multiple R 0.99
R Square 0.98
             Coefficients  P-value  Lower 95%  Upper 95%
Intercept    13.33         0.00268  5.13       21.52
Temperature  2.04          0.00138  1.93       2.15
Multiple R : It depicts the correlation between X & Y; the closer this value is to ±1, the higher the correlation
R square : It shows the goodness of fit of the model. It lies between 0 and 1, and the closer this value is to 1, the better the model
Coefficient:
o It shows the magnitude as well as the direction of impact of predictor X (temperature in this case) on the target variable Y
o For example, in this case, with one unit increase in temperature there is a 2.04 unit increase in Yield
o The value of the temperature coefficient lies between 1.93 and 2.15 with 95% confidence
P-value :
o It is used to evaluate whether the corresponding predictor X has any significant impact on the target variable Y
o As the p-value for temperature is < 0.05 (highlighted in yellow in the table above), temperature has a significant relation with Yield
Check Interpretation section for more details
12. Sample output : 2. Plots
Line fit plot (sample plot annotated with ŷ = 17 + 2x, R² = 0.75) is used to check the assumption of linearity between X & Y
Normal Probability plot is used to check the assumption of normality & to detect outliers
Residual plot is used to check the assumption of equal error variances & to detect outliers
Check Interpretation section for more details
13. Interpretation of Important Model Summary Statistics
Multiple R :
• R > 0.7 represents a strong positive correlation between X and Y
• 0.4 <= R <= 0.7 represents a weak positive correlation between X and Y
• -0.4 < R < 0.4 represents a negligible/no correlation between X and Y
• -0.7 <= R <= -0.4 represents a weak negative correlation between X and Y
• R < -0.7 represents a strong negative correlation between X and Y
R Square :
• R square > 0.7 represents a very good model i.e. the model is able to explain more than 70% of the variability in Y
• R square between 0 and 0.7 represents a model that does not fit well; the assumptions of normality and linearity should be checked to obtain a better fit
P value :
• At 95% confidence threshold, if the p-value for a predictor X is < 0.05 then X is a significant/important predictor
• At 95% confidence threshold, if the p-value for a predictor X is > 0.05 then X is an insignificant/unimportant predictor i.e. it doesn’t have a significant relation with target variable Y
Coefficients :
• It indicates by how much the output variable will change with one unit change in X
• For example, if the coefficient of X is 2 then Y will increase by 2 units with one unit increase in X
• If the coefficient of X is -2 then Y will decrease by 2 units with one unit increase in X
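The interpretation rules on this slide can be written as a few small helper functions. This is only a sketch: the function names are hypothetical, and the treatment of the exact boundary values (|R| = 0.7 counted as weak, |R| = 0.4 counted as weak) is an assumption where the slide leaves the boundary open.

```python
def describe_correlation(r):
    """Map Multiple R to the qualitative bands used on this slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    strength = abs(r)
    if strength < 0.4:
        return "negligible"
    direction = "positive" if r > 0 else "negative"
    # Assumption: exactly 0.7 falls in the weak band.
    label = "strong" if strength > 0.7 else "weak"
    return f"{label} {direction}"


def is_significant(p_value, alpha=0.05):
    """True when the predictor is significant at the 1 - alpha confidence threshold."""
    return p_value < alpha


def predicted_change(coefficient, delta_x=1.0):
    """Change in Y (in units of Y) for a change of delta_x units in X."""
    return coefficient * delta_x


print(describe_correlation(0.99))   # strong positive
print(is_significant(0.00138))      # True
print(predicted_change(2.04))       # 2.04
```

Note that `predicted_change` returns an additive change: a coefficient of 2.04 means Yield rises by 2.04 units per unit of Temperature, not that it doubles.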
14. Interpretation of plots : Line Fit plot
This plot shows the relationship between X (predictor) & Y (target variable), with Y on the y axis and X on the x axis
As shown in figure 1 on the right, Yield increases steadily as temperature increases; hence there is a linear relationship between X and Y and simple linear regression is applicable to this data
The fitted regression line and regression equation are shown in the plot itself, along with the model R square value, to describe how well the model fits the data and whether there is a linear relation between X and Y or not
If R square is low (< 0.7) and the points do not follow a straight line, as shown in figures 2 & 3 on the right, then a linear regression model is not applicable and a different model should be considered to predict Y
Figure 1 : line fit plot annotated with ŷ = 17 + 2x, R² = 0.75
Figures 2 & 3 : line fit plots with R² = 0.5 and R² = 0.4
15. Interpretation of plots : Normal Probability plot
This plots the percentile vs. the target/dependent variable (Y)
It is used to check the assumptions of linearity and normality in the data and also to detect outliers
It can be helpful to add a trend line to see whether the data fits a straight line
The plot in figure 1 shows that the pattern of dots lies close to a straight line; therefore, the data is normally distributed and there are no outliers
Examples of non normal data are shown in figures 2 & 3 on the right, and an example of outliers is shown in figure 4
Figures 1 to 4 : normal probability plots
16. Interpretation of plots : Residual versus Fit plot
It is the scatter plot of residuals on the Y axis and predicted (fitted) values on the X axis
It is used to detect unequal error variances and outliers
Here are the characteristics of a well-behaved residual vs. fits plot :
• The residuals should "bounce randomly" around the 0 line and should roughly form a "horizontal band" around the 0 line as shown in figure 1. This suggests that the variances of the error terms are equal
• No single residual should "stand out" from the basic random pattern of residuals. This suggests that there are no outliers
• For example, the red data point in figure 1 is an outlier; such outliers should be removed from the data before proceeding with model interpretation
Plots shown in figures 2 & 3 depict unequal error variances, which is not desirable for linear regression analysis
Figures 1 to 3 : residual versus fit plots
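The points behind a residual versus fit plot can be computed directly. The sketch below assumes the Temperature/Yield table from slide 9 and a least-squares fit; the cutoff of 3 residual standard errors used to flag a residual that "stands out" is an illustrative assumption, not a rule taken from the slides.

```python
import math

# Temperature (X) and Yield (Y) from the slide 9 table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_values = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
                171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_mean = sum(temperature) / n
y_mean = sum(yield_values) / n
s_xx = sum((x - x_mean) ** 2 for x in temperature)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(temperature, yield_values))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Fitted values go on the x axis of the plot, residuals on the y axis.
fitted = [intercept + slope * x for x in temperature]
residuals = [y - f for y, f in zip(yield_values, fitted)]

# Residual standard error with n - 2 degrees of freedom.
residual_se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed cutoff: flag residuals more than 3 standard errors away from zero.
flagged = [(x, round(e, 2)) for x, e in zip(temperature, residuals)
           if abs(e) > 3 * residual_se]

for f, e in zip(fitted, residuals):
    print(f"fitted={f:7.2f}  residual={e:6.2f}")
print("Possible outliers:", flagged)
```

For a least-squares fit with an intercept the residuals sum to zero, which is why a well-behaved plot is centred on the 0 line.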
17. Limitations
• Simple linear regression is limited to predicting numeric output i.e. the dependent variable has to be numeric in nature
• Minimum sample size should be > 50+8m where m is the number of predictors
• Hence in the case of simple linear regression, the sample size should exceed 50+8(1) = 58
• It handles only two variables : one predictor and one dependent variable. Usually more than one predictor is correlated with the dependent variable, and such cases can’t be analyzed through simple linear regression
18. Limitations
• Target/dependent variable should be normally distributed
• A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme. It will look like a bell curve as shown in figure 1 on the right
• Outliers in data can affect the analysis, hence outliers need to be removed
• Outliers are the observations lying outside the overall pattern of the distribution as shown in figure 2 on the right
• These extreme values/outliers can be replaced with 1st or 99th percentile values
Figure 1 : normal distribution (bell curve)
Figure 2 : distribution with outliers marked
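Replacing extreme values with the 1st or 99th percentile can be sketched as follows. The helper names and the sample data are hypothetical, and the linear-interpolation definition of a percentile is one common convention chosen here as an assumption; statistical tools may use slightly different definitions.

```python
import math


def percentile(sorted_values, pct):
    """Linearly interpolated percentile (0 to 100) of an already sorted list."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return sorted_values[lower]
    weight = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * weight


def cap_outliers(values, low_pct=1.0, high_pct=99.0):
    """Replace values below/above the given percentiles with those percentile values."""
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical sample: 99 ordinary readings plus one extreme value.
sample = [float(v) for v in range(1, 100)] + [10_000.0]
capped = cap_outliers(sample)
print("largest value before:", max(sample), "after:", max(capped))
```

The number of observations is unchanged; only the extreme values are pulled in to the percentile limits.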
19. Business use case 1
• Business problem :
• An ecommerce company wants to measure the impact of product price on product
sales
• Input data:
• Predictor/independent variable is product price data for last year
• Dependent variable is product sales data for last year
• Business benefit:
• Product sales manager will get to know how much, and in what direction, the product price impacts product sales
• Decision on product price alteration can be made with more confidence according to
the sales target for that particular product
20. Business use case 2
• Business problem :
• An agriculture production firm wants to predict the impact of amount of rainfall on yield of
particular crop
• Input data:
• Predictor/independent variable : Amount of rainfall during monsoon months last year
• Dependent variable : Crop production data during monsoon months last year
• Business benefit:
• An agriculture firm can predict the yield of a particular crop based on the amount of rainfall this year and, if the rainfall is not adequate, can plan alternative crop arrangements and other contingencies in order to reach the desired/targeted crop production
21. Example : Simple linear regression
Consider the data obtained from a chemical process where the yield (Yi) of the process is thought to be related to the reaction temperature (Xi) (see the table on the right)
STEP 1 : Obtain the estimates 𝜷0 and 𝜷1 in the equation Yi = 𝜷0 + 𝜷1 Xi + 𝜺𝒊 using the following equations :
Where
ȳ is the mean of all the observed values of the dependent variable
x̄ is the mean of all values of the predictor variable
ȳ is calculated using ȳ = (y1 + y2 + ... + yn) / n
x̄ is calculated using x̄ = (x1 + x2 + ... + xn) / n
22. Example : Simple linear regression
Calculating 𝜷0 and 𝜷1 :
Once 𝜷0 and 𝜷1 are known, the fitted regression line can be written as : ŷ = 17 + 2x
Where ŷ is the predicted value based on the fitted regression model
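For reference, the textbook least-squares estimates that go with the definitions of x̄ and ȳ in Step 1 can be written as below. This is the standard form and is assumed, not confirmed, to match the equations shown on the original slides.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\]
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2} ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
\[
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
\]
```

The slope is the ratio of the co-variation of X and Y to the variation of X, and the intercept is chosen so that the fitted line passes through the point (x̄, ȳ).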
23. Example : Simple Linear Regression
STEP 2 : Obtain values of ŷ for each observation using the regression line fit equation obtained in Step 1 : ŷ = 17 + 2x
Also compute the corresponding error terms using the equation 𝜺i = yi - ŷi as shown below:
Predicted values corresponding to each observation :
ŷ1 = 17 + 2 x1 = 17 + 2*50 = 117
ŷ2 = 17 + 2 x2 = 17 + 2*53 = 123
ŷ25 = 17 + 2 x25 = 17 + 2*100 = 217
Error values corresponding to each predicted value :
𝜺1 = y1 - ŷ1 = 122 - 117 = 5
𝜺2 = y2 - ŷ2 = 118 - 123 = -5
𝜺25 = y25 - ŷ25 = 219 - 217 = 2
24. Example : Simple Linear Regression
STEP 3 : Obtain the significance value (p value) to understand whether there exists a relation between predictor and dependent variable i.e. temperature and yield in this case
To get the P value, we need the T statistic, degrees of freedom and significance level (𝛼), which can be obtained as follows:
1. Calculate standard error for 𝜷1
2. Calculate t statistic
3. Calculate P value : P(T<t0) is obtained from the t table
Assuming that the desired significance level is 0.1 (i.e. 90% confidence threshold), since P value < 0.1 here, there exists a relation between the Temperature and Yield variables.
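The three calculations in Step 3 can be sketched with the standard library alone. The sketch assumes the slide 9 Temperature/Yield table and the usual formulas se(𝜷1) = sqrt(MSE / Sxx) and t0 = 𝜷1 / se(𝜷1) with n - 2 degrees of freedom. Instead of computing the p value itself, it compares t0 with tabulated two-sided critical values for 23 degrees of freedom (1.714 for 𝛼 = 0.10 and 2.069 for 𝛼 = 0.05), which leads to the same accept/reject decision.

```python
import math

# Temperature (X) and Yield (Y) from the slide 9 table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_values = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
                171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_mean = sum(temperature) / n
y_mean = sum(yield_values) / n
s_xx = sum((x - x_mean) ** 2 for x in temperature)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(temperature, yield_values))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# 1. Standard error of the slope: sqrt(MSE / Sxx), with MSE = SSE / (n - 2).
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(temperature, yield_values))
mse = sse / (n - 2)
se_slope = math.sqrt(mse / s_xx)

# 2. t statistic for the hypothesis that the slope is zero.
t0 = slope / se_slope

# 3. Decision via tabulated two-sided critical values for n - 2 = 23 df.
T_CRIT_ALPHA_010 = 1.714
T_CRIT_ALPHA_005 = 2.069
significant_at_10pct = abs(t0) > T_CRIT_ALPHA_010

# 95% confidence interval for the slope.
ci_low = slope - T_CRIT_ALPHA_005 * se_slope
ci_high = slope + T_CRIT_ALPHA_005 * se_slope

print(f"se = {se_slope:.4f}, t0 = {t0:.2f}, significant at 10%: {significant_at_10pct}")
print(f"95% CI for the Temperature coefficient: {ci_low:.2f} to {ci_high:.2f}")
```

The interval should agree with the Lower 95% and Upper 95% values reported for Temperature in the model summary (1.93 and 2.15).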
25. Example: Simple Linear Regression
STEP 4 : Calculate the measure of model accuracy : Coefficient of Determination (R²)
This metric shows what percentage of the variability in Y (dependent variable : Yield in this case) can be explained/predicted by the fitted model
Before any inferences are drawn, model accuracy must be checked
The closer the value of R² is to 1, the better the fitted model
In this case it is 0.98, indicating that 98% of the variability in Yield is explained by the fitted model. Thus, the model fits the data very well
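A minimal sketch of the R² calculation, assuming the slide 9 Temperature/Yield table and the usual definition R² = 1 - SSE/SST, where SSE is the sum of squared errors of the fitted line and SST is the total sum of squares of Y around its mean. Variable names are illustrative.

```python
import math

# Temperature (X) and Yield (Y) from the slide 9 table.
temperature = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75,
               76, 79, 80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
yield_values = [112, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
                171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

n = len(temperature)
x_mean = sum(temperature) / n
y_mean = sum(yield_values) / n
s_xx = sum((x - x_mean) ** 2 for x in temperature)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(temperature, yield_values))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Total variability of Y and the part left unexplained by the fitted line.
sst = sum((y - y_mean) ** 2 for y in yield_values)
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(temperature, yield_values))

r_squared = 1 - sse / sst
multiple_r = math.sqrt(r_squared)  # equals |correlation| with a single predictor

print(f"R square = {r_squared:.3f}, Multiple R = {multiple_r:.3f}")
```

With a single predictor, Multiple R is simply the absolute value of the correlation between X and Y, which is why it is the square root of R².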
26. Want to Learn More?
Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com
June 2018