Linear Regression is one of the most basic and fundamental algorithms used in machine learning.

- 2. What is Linear Regression? Linear regression is a supervised learning technique used in machine learning. Its main task is to determine the relationship between an independent variable and a dependent variable, both of which are continuous. In other words, it is used to predict a continuous quantity.
- 3. When do we use Linear Regression? Linear regression has many practical uses. Most applications fall into one of two broad categories: if the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of response and explanatory variable values. It is also used in trend and sales analysis, and for risk assessment in the financial services and insurance domains.
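As a minimal sketch of the prediction use case, the following fits a line to a small hypothetical data set (advertising spend vs. sales, both invented for illustration) using scikit-learn and forecasts an unseen value:

```python
# Fitting a simple predictive linear model on hypothetical sales data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Advertising spend (independent) vs. sales (dependent), both continuous.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # fitted slope and intercept
print(model.predict([[6.0]]))            # forecast for an unseen spend value
```

Once fitted, the same model object can score new observations, which is how the forecasting and error-reduction applications above are typically realized.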
- 4. Regression Line In statistics, you can calculate a regression line for two variables if their scatterplot shows a linear pattern and the correlation between the variables is very strong (for example, r = 0.98). A regression line is simply a single line that best fits the data (in terms of having the smallest overall distance from the line to the points). Statisticians call this technique for finding the best-fitting line a simple linear regression analysis using the least squares method.
- 5. Determination of slope by Least Squares The "least squares" method is a form of mathematical regression analysis used to determine the line of best fit for a set of data, providing a visual demonstration of the relationship between the data points. Each data point represents the relationship between a known independent variable and an unknown dependent variable. To find the best-fit line (regression line) y = mx + c, the slope m and y-intercept c must be determined. The slope is obtained with the formula m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), and the intercept with c = (Σy − mΣx) / n.
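The slope and intercept formulas above can be computed directly from the running sums; a small sketch on toy data points (chosen so the answer is exact):

```python
# Least squares slope m and intercept c for the line y = m*x + c.
def least_squares_fit(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # c = (Σy − mΣx) / n
    c = (sum_y - m * sum_x) / n
    return m, c

m, c = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 2x + 1
print(m, c)  # → 2.0 1.0
```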
- 6. R-squared method R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0–100% scale.
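R-squared is computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares about the mean. A small sketch with made-up observations and predictions:

```python
# R-squared = 1 - (residual sum of squares) / (total sum of squares).
def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - ss_res / ss_tot

ys = [3, 5, 7, 9]
preds = [2.8, 5.1, 7.2, 8.9]  # hypothetical model predictions
print(r_squared(ys, preds))
```

Here the predictions track the observations closely, so R-squared comes out near 1 (near 100% of the variance explained).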
- 7. Standard error of estimate The standard error of the estimate is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined as σ_est = √( Σ(Y − Y′)² / N ), where Y is an observed value, Y′ is the corresponding predicted value, and N is the number of data points.
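This definition translates directly into code; a sketch reusing the same hypothetical observations and predictions:

```python
import math

# Standard error of the estimate: sqrt(sum of squared prediction errors / N).
def std_error_estimate(ys, preds):
    n = len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return math.sqrt(sse / n)

print(std_error_estimate([3, 5, 7, 9], [2.8, 5.1, 7.2, 8.9]))
```

Smaller values mean the points sit closer to the regression line, i.e. more accurate predictions.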
- 8. Advantages Linear regression performs well when the relationship between the variables is approximately linear. We can use it to find the nature of the relationship among the variables. Linear regression is easy to implement and interpret, and very efficient to train. Linear regression is prone to over-fitting, but this can be avoided using dimensionality reduction techniques, regularization (L1 and L2), and cross-validation.
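As a sketch of the regularization point, the following fits an L2-regularized (ridge) model with scikit-learn on synthetic data (the coefficients and noise level are invented for illustration); the alpha parameter controls the penalty strength:

```python
# Ridge (L2) regularization sketch: penalizing large coefficients reduces overfitting.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_coefs = np.array([1.5, 0.0, -2.0, 0.0, 0.5])  # hypothetical ground truth
y = X @ true_coefs + rng.normal(scale=0.1, size=20)

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)  # coefficients shrunk toward zero relative to plain OLS
```

Lasso (L1) works the same way via `sklearn.linear_model.Lasso`, and tends to drive irrelevant coefficients exactly to zero.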
- 9. Disadvantages The main limitation of linear regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data rarely follow a straight line, so the assumed straight-line relationship between the dependent and independent variables is often incorrect. Prone to noise and overfitting: if there are fewer observations than features, linear regression should not be used, as it may overfit by fitting the noise in the data. Prone to outliers: linear regression is very sensitive to outliers (anomalies), so outliers should be analyzed and removed before applying linear regression to the dataset. Prone to multicollinearity: multicollinearity should be removed (using dimensionality reduction techniques) before applying linear regression, because the model assumes there is no relationship among the independent variables.
- 10. Thank you!!