MODULE 3
Supervised ML – Regression
Shiwani Gupta
Use case
Simple Linear
Gradient Descent
Evaluation Metric
Multiple Linear, Polynomial
Regularization
USE CASES
 A hospital may be interested in finding how the total cost of a patient varies with the severity of the disease.
 Insurance companies would like to understand the association between healthcare cost and ageing.
 An organization may be interested in finding the relationship between revenue generated from a product and
features such as price, promotional amount spent, competitor’s price of a similar product, etc.
 Restaurants would like to know the relationship between customer waiting time after placing the order and the
revenue generated.
 E-commerce companies like Amazon, BigBasket, Flipkart, etc. would like to understand the relationship
between revenue generated and features like number of customer visits to the portal, number of clicks on products,
number of items on sale, average discount percentage, etc.
 Banks and other financial institutions would like to understand the impact of variables such as
unemployment rate, marital status, bank balance, etc. on the percentage of Non Performing Assets.
2
ISSUES
Outlier
Multicollinearity
Underfitting, Overfitting
3
LINEAR REGRESSION
 Linear Regression is a Supervised Machine Learning algorithm for
predictive modelling.
 It tries to find out the best linear relationship that describes the data you have
(Scatter Plot).
 It assumes that there exists a linear relationship between a dependent
variable (usually called y) and independent variable(s) (usually called X).
 The value of the dependent / response / outcome variable of a Linear
Regression model is a continuous value / quantitative in nature i.e. real
numbers.
 Linear Regression model represents linear relationship between a dependent
variable and independent / predictor / explanatory variable(s) via a sloped
straight line.
 The sloped straight line representing the linear relationship that fits the given
data best is called a Regression Line / Best Fit Line.
 Based on the number of independent variables, there are two types of Linear
Regression: Simple Linear Regression (one independent variable) and Multiple Linear Regression (two or more).
4
INFERENCE ABOUT THE REGRESSION MODEL
 When a scatter plot shows a linear relationship between a quantitative explanatory variable x and a
quantitative response variable y, we can use the least square line fitted to the data to predict y for a given
value of x.
 We think of the least square line we calculated from the sample as an estimate of a regression line for the
population.
 Just as the sample mean is an estimate of the population mean µ.
 We will write the population regression line as μy = β0 + β1x
 The numbers β0 and β1 are parameters that describe the population.
 We will write the least-squares line fitted to sample data as ŷ = b0 + b1x
 This notation reminds us that the intercept b0 of the fitted line estimates the intercept β0 of the
population line, and the slope b1 estimates the slope β1, respectively.
5
SIMPLE LINEAR REGRESSION
 A statistical method to summarize and study the functional relationship between two continuous variables.
 The relationship may be linear or nonlinear (e.g. population growth over time).
 The dependent variable depends only on a single independent variable.
 The form of the model is: y = β0 + β1X, e.g. V = I*R, Circumference = 2*pi*r, C = (F − 32)*5/9, etc.
 y is a dependent variable.
 X is an independent variable.
 β0 and β1 are the regression coefficients.
 β0 is the intercept or bias that fixes the offset of the line. It is the average y value when X = 0.
 β1 is the slope or weight that specifies the factor by which X has an impact on y.
The values of the regression parameters β0 and β1 are not known.
We estimate them from data.
6
Deterministic Relationship
Stochastic Relationship
7
REGRESSION LINE
 We will write an estimated regression line based on sample data as ŷ = b0 + b1x
 The Least Squares Method gives us the “best” estimated line for our set of sample data.
 The method of least squares chooses the values for b0 and b1 to minimize the Sum of Squared Errors
SSE = Σi=1..n (yi − ŷi)² = Σi=1..n (yi − b0 − b1xi)²
 Using calculus, we obtain the estimating formulas:
b1 = Σi=1..n (xi − x̄)(yi − ȳ) / Σi=1..n (xi − x̄)² = [n Σxiyi − (Σxi)(Σyi)] / [n Σxi² − (Σxi)²]
b0 = ȳ − b1x̄
The fitted regression line can be used to estimate y for a given value of x.
MSE = (1/n) Σi=1..n (yi − ŷi)²   MAE = (1/n) Σi=1..n |yi − ŷi|
8
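The estimating formulas above map directly onto code. Below is a minimal NumPy sketch (not part of the original slides) applied to the height/weight sample used on slides 9 and 10; it should reproduce the hand-computed b1 ≈ 0.67461 and b0 ≈ −38.455.

```python
# A minimal NumPy sketch of the closed-form least-squares estimators above.
import numpy as np

x = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131], dtype=float)  # height
y = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48], dtype=float)            # weight
n = len(x)

b1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x ** 2) - x.sum() ** 2)
b0 = y.mean() - b1 * x.mean()          # b0 = ybar - b1 * xbar

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)         # Sum of Squared Errors
mse = np.mean((y - y_hat) ** 2)        # Mean Squared Error
mae = np.mean(np.abs(y - y_hat))       # Mean Absolute Error
print(b1, b0)                          # ~0.67461 and ~-38.455, as on slide 10
```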
STEPS TO ESTABLISH A LINEAR RELATION
 Gather sample of observed height and corresponding weight.
 Create relationship model.
 Find coefficients from model and establish mathematical equation.
 Get a summary of the model to compute the average prediction error … the Residual.
 Predict weight
height weight
151 63
174 81
138 56
186 91
128 47
136 57
179 76
163 72
152 62
131 48
Q = Σi=1..n (yi − b0 − b1xi)²
We want to penalize the points which are
farther from the regression line much more
than the points which lie close to the line.
9
STRENGTH OF LINEAR ASSOCIATION: PEARSON COEFFICIENT
height (x) weight (y) x-xmean y-ymean (x-xmean)*(y-ymean) (x-xmean)*(x-xmean) xy x2 y2
151 63 -2.8 -2.3 6.44 7.84 9513 22801 3969
174 81 20.2 15.7 317.14 408.04 14094 30276 6561
138 56 -15.8 -9.3 146.94 249.64 7728 19044 3136
186 91 32.2 25.7 827.54 1036.84 16926 34596 8281
128 47 -25.8 -18.3 472.14 665.64 6016 16384 2209
136 57 -17.8 -8.3 147.74 316.84 7752 18496 3249
179 76 25.2 10.7 269.64 635.04 13604 32041 5776
163 72 9.2 6.7 61.64 84.64 11736 26569 5184
152 62 -1.8 -3.3 5.94 3.24 9424 23104 3844
131 48 -22.8 -17.3 394.44 519.84 6288 17161 2304
xmean = 153.8, ymean = 65.3; mean of (x-xmean)*(y-ymean) = 264.96; mean of (x-xmean)*(x-xmean) = 392.76
sum(xy) = 103081, sum(x2) = 240472, sum(y2) = 44513
sum(x) = 1538, sum(y) = 653; (sum(x))*(sum(y)) = 1004314; (sum(x))2 = 2365444; (sum(y))2 = 426409
n*sum(xy) = 1030810, n*sum(x2) = 2404720, n*sum(y2) = 445130
b1 = 264.96/392.76 = 0.67461; b0 = 65.3 − 0.67461(153.8) = −38.455
For x = 70: y = b1x + b0 = 8.767644363
r = 0.97713; r lies in [−1, 1] and captures magnitude as well as direction.
10
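As a cross-check of the hand computation above, here is a short NumPy sketch (illustrative, not from the slides) that evaluates the Pearson formula and compares it with NumPy's built-in:

```python
# Pearson coefficient r for the height/weight table above.
import numpy as np

height = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131], dtype=float)
weight = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48], dtype=float)
n = len(height)

num = n * np.sum(height * weight) - height.sum() * weight.sum()
den = np.sqrt((n * np.sum(height ** 2) - height.sum() ** 2) *
              (n * np.sum(weight ** 2) - weight.sum() ** 2))
print(num / den)                            # r ~ 0.97713, as on the slide
print(np.corrcoef(height, weight)[0, 1])    # NumPy's built-in agrees
```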
GRADIENT DESCENT
 used to minimize the cost function Q
 STEPS:
1. Random initialization for θ1 and θ0.
2. Measure how the cost function changes with a change in its parameters by computing the partial derivatives of the
cost function w.r.t. the parameters θ₀, θ₁, … , θₙ.
3. After computing the derivatives, update the parameters: θj := θj − α ∂/∂θj Q(θ0,θ1) for j = 0, 1, where the learning
rate α is a positive number that sets the step size of each update.
4. Repeat the simultaneous update of θ1 and θ0 until convergence.
 If α is too small, training takes too much time; if α is too large, it may fail to converge.
11
GRADIENT DESCENT FOR UNIVARIATE LINEAR REGRESSION
 Hypothesis hθ(x)=θ0+θ1x
 Cost Function J(θ0,θ1) = (1/2m) ∑i=1..m (hθ(x(i)) − y(i))²
 Gradient Descent to minimize cost function for Linear Regression model
 Compute derivative for j=0, j=1
12
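A minimal sketch of these update rules in Python follows; the synthetic dataset, learning rate, and iteration count are illustrative choices, not from the slides.

```python
# Batch gradient descent for univariate linear regression h(x) = theta0 + theta1*x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 4 + 3 * x + rng.normal(0, 0.1, 100)        # true line: theta0=4, theta1=3

theta0, theta1 = 0.0, 0.0                      # step 1: initialize parameters
alpha, iters = 0.5, 5000                       # learning rate and iteration budget
for _ in range(iters):
    err = theta0 + theta1 * x - y              # h_theta(x^(i)) - y^(i) for all i
    grad0 = err.mean()                         # dJ/dtheta0 = (1/m) * sum(err)
    grad1 = (err * x).mean()                   # dJ/dtheta1 = (1/m) * sum(err * x)
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous update
print(theta0, theta1)                          # should approach 4 and 3
```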
MODEL EVALUATION
Dispersion of the observed variable around its mean vs. how well our line fits the data: the total variability
of the data is equal to the variability explained by the regression line plus the unexplained variability,
known as error.
13
COEFFICIENT OF DETERMINATION
 Recall that SST measures the total variations in yi when no account of the independent
variable x is taken.
 SSE measures the variation in the yi when a regression model with the independent variable x
is used.
 A natural measure of the effect of x in reducing the variation in y can be defined as:
R2 is called the coefficient of determination / goodness of fit.
 0  SSE  SST, it follows that:
 We may interpret R2 as the proportionate reduction of total variability in y associated with the
use of the independent variable x.
 The larger is R2, the more is the total variation of y reduced by including the variable x in the
model.
R² = (SST − SSE)/SST = SSR/SST = 1 − SSE/SST
0 ≤ R² ≤ 1
14
COEFFICIENT OF DETERMINATION
If all the observations fall on the fitted regression line, SSE = 0 and R2 = 1.
If the slope of the fitted regression line is b1 = 0, so that ŷi = ȳ, then SSE = SST and R2 = 0.
The closer R2 is to 1, the greater is said to be the degree of linear association
between x and y.
The square root of R2 is called the coefficient of correlation (r).
r = √R²
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²] = [n Σxy − (Σx)(Σy)] / √{[n Σx² − (Σx)²] · [n Σy² − (Σy)²]}
15
MODEL EVALUATION : R-SQUARED
height (cm) weight (kg) ypredicted SSE = (y-ypred)2 SST = (y-ymean)2 SSR = (ypred-ymean)2
151 63 63.4111 0.16901143 5.29 3.56790543
174 81 78.9271 4.29674858 246.49 185.698945
138 56 54.6412 1.84639179 86.49 113.610444
186 91 87.0225 15.8208245 660.49 471.865268
128 47 47.8951 0.80116821 334.89 302.93124
136 57 53.292 13.7495606 68.89 144.193025
179 76 82.3002 39.692394 114.49 289.00646
163 72 71.5064 0.24361134 44.89 38.5197733
152 62 64.0857 4.35022792 10.89 1.47447592
131 48 49.9189 3.68221559 299.29 236.57793
ymean = 65.3; sum(SSE) = 82.652154; sum(SST) = 1872.1; sum(SSR) = 1787.44547
R2 = SSR/SST = 0.95478
R² measures the proportion of the variation in your dependent variable explained by all your
independent variables in the model; R² lies in [0, 1].
16
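The table above can be reproduced with a few lines of NumPy; this sketch (illustrative, reusing the rounded slide-10 coefficients) computes SSE, SST, SSR, and R²:

```python
# SSE, SST, SSR and R^2 for the fitted height/weight line.
import numpy as np

height = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131], dtype=float)
weight = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48], dtype=float)

b1, b0 = 0.67461, -38.455                      # coefficients from slide 10
y_pred = b0 + b1 * height

sse = np.sum((weight - y_pred) ** 2)           # unexplained variation, ~82.65
sst = np.sum((weight - weight.mean()) ** 2)    # total variation, 1872.1
ssr = np.sum((y_pred - weight.mean()) ** 2)    # explained variation, ~1787.4
print(ssr / sst)                               # R^2 ~ 0.95478, as in the table
```

Because b0 and b1 are rounded, SSE + SSR differs slightly from SST here; with exact least-squares estimates the identity SST = SSR + SSE holds.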
ESTIMATION OF MEAN RESPONSE
 The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table:
 From the table, the least squares estimates of the regression coefficients are:
y x
1250 41
1380 54
1425 63
1425 54
1450 48
1300 46
1400 62
1510 61
1575 64
1650 71
From the table: n = 10, Σx = 564, Σy = 14365, Σx² = 32604, Σxy = 818755
b1 = [n Σxy − (Σx)(Σy)] / [n Σx² − (Σx)²] = [10(818755) − (564)(14365)] / [10(32604) − (564)²] = 10.8
b0 = ȳ − b1x̄ = 1436.5 − 10.8(56.4) = 828
The estimated regression function is: ŷ = 828 + 10.8x, i.e. Sales = 828 + 10.8 × Expenditure.
This means that if weekly advertising expenditure is increased by $1 we would expect the weekly sales to increase by $10.8.
Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
For example, if the advertising expenditure is $50, then the estimated sales is: Sales = 828 + 10.8(50) = 1368.
This is called the point estimate (forecast) of the mean response (sales).
17
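A quick numeric check of this worked example (a sketch, not part of the slides), using the closed-form estimators from slide 8:

```python
# Verify b1 = 10.8, b0 = 828 and the forecast at x = 50 for the advertising data.
import numpy as np

sales = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)
spend = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
n = len(spend)

b1 = (n * np.sum(spend * sales) - spend.sum() * sales.sum()) / \
     (n * np.sum(spend ** 2) - spend.sum() ** 2)
b0 = sales.mean() - b1 * spend.mean()
print(round(b1, 1), round(b0))   # 10.8 and 828
print(b0 + b1 * 50)              # point estimate of mean sales at x = 50, ~1368
```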
EXAMPLE: SOLVE
• The primary goal of Quantitative Analysis is to use current information about a
phenomenon to predict its future behavior.
• Current information is usually in the form of data.
• In a simple case, when the data forms a set of pairs of numbers, we may interpret
them as representing the observed values of an independent (or predictor) variable
X and a dependent (or response) variable y.
• The goal of the analyst who studies the data is to find a functional relation
between the response variable y and the predictor variable x.
lot size Man-hours
30 73
20 50
60 128
80 170
40 87
50 108
60 135
30 69
70 148
60 132
[Scatter plot: Man-Hours (y-axis) versus Lot size (x-axis), showing the statistical relation between lot size and man-hours, y = f(x)]
18
EXAMPLE: RETAIL SALES AND FLOOR SPACE
 It is customary in retail operations to assess the performance of stores partly in terms of their annual
sales relative to their floor area (square feet).
 We might expect sales to increase linearly as stores get larger, with of course individual variation among
the stores of same size.
 The regression model for a population of stores says that SALES = β0 + β1·AREA + ε
 The slope β1 is a rate of change: it is the expected increase in annual sales associated with each additional
square foot of floor space.
 The intercept β0 is needed to describe the line but has no statistical importance because no stores have
area close to zero.
 Floor space does not completely determine sales. The term ε in the model accounts for differences among
individual stores with the same floor space. A store’s location, for example, is important.
 Residual: the difference between the observed value yi and the corresponding fitted value ŷi, i.e. ei = yi − ŷi
 Residuals are highly useful for studying whether a given regression model is appropriate for the
data at hand.
19
ANALYSIS OF RESIDUAL
 To examine whether the regression model is appropriate for the data being analyzed, we can check residual plots.
 Residual plots are:
 A scatterplot of the residuals
 Plot residuals against the fitted values.
 Plot residuals against the independent variable.
 Plot residuals over time if the data are chronological.
 The residuals should have no systematic pattern. Eg. The residual plot below shows a scatter of the points with no
individual observations or systematic change as x increases.
[Degree Days residual plot: Residuals (y-axis) versus Degree Days (x-axis), an unstructured scatter around zero]
20
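Here is a sketch of how such a residual plot can be produced with matplotlib, reusing the lot-size/man-hours data from slide 18; the plotting choices are illustrative.

```python
# Residuals-versus-fitted plot for a least-squares line.
import numpy as np
import matplotlib.pyplot as plt

lot_size = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
man_hours = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

b1, b0 = np.polyfit(lot_size, man_hours, 1)    # least-squares line of degree 1
fitted = b0 + b1 * lot_size
residuals = man_hours - fitted                 # e_i = y_i - y_hat_i

plt.scatter(fitted, residuals)                 # residuals vs fitted values
plt.axhline(0, linestyle="--")                 # reference line at zero
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()                                     # look for systematic patterns
```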
RESIDUAL PLOTS
The points in this residual plot follow a curved pattern, so a straight line fits
poorly.
The points in this plot show more spread for larger values of the
explanatory variable x, so prediction will be less accurate when x is
large.
21
EXAMPLE: DO WAGES RISE WITH EXPERIENCE?
 Many factors affect the wages of
workers: the industry they work in,
their type of job, their education,
their experience, and changes in
general levels of wages. We will
look at a sample of 59 married
women who hold customer service
jobs in Indiana banks. The table
gives their weekly wages at a
specific point in time, along with
their Length of Service with their
employer, in months. The size of
the place of work is recorded
simply as “large” (100 or more
workers) or “small.” Because
industry, job type, and the time of
measurement are the same for all 59
subjects, we expect to see a clear
relationship between wages and
length of service.
22
EXAMPLE: DO WAGES RISE WITH EXPERIENCE?
From previous table we have:
The least squares estimates of the regression coefficients are:
n = 59, Σx = 4159, Σx² = 451031, Σy = 23069, Σy² = 9460467, Σxy = 1719376
b1 = [n Σxy − (Σx)(Σy)] / [n Σx² − (Σx)²]
b0 = ȳ − b1x̄
SSE = Σ(yi − ŷi)²   SST = Σ(yi − ȳ)²   SSR = Σ(ŷi − ȳ)²
23
USING THE REGRESSION LINE
 One of the most common reasons to fit a line to data is to predict the response to a particular value of
the explanatory variable.
 In our example, the least squares line for predicting the weekly earnings for female bank customer
service workers from their length of service is ŷ = 349.4 + 0.5905x
 For a length of service of 125 months, our least-squares regression equation gives
ŷ = 349.4 + (0.5905)(125) ≈ $423 per week
The measure of variation in the data around the fitted regression line:
SSE = 36124.76, SST = 128552.5
If SST = 0, all observations are the same (no variability).
The greater SST is, the greater the variation among the y values.
SSR = SST − SSE = 128552.5 − 36124.76 = 92427.74
SSR is the variation among the predicted responses. The predicted responses lie on the least-squares
line. They show how y moves in response to x.
The larger SSR is relative to SST, the greater the role of the regression line in explaining the total
variability in the y observations.
R2 = SSR/SST = 0.719
This indicates that most of the variability in weekly wages can be explained by the relation between
length of service and wages.
24
MULTIPLE LINEAR REGRESSION
 The dependent variable depends on more than one independent variable.
 The form of the model is: y = b0 + b1x1 + b2x2 + b3x3 + …… + bnxn
 The relationship may be linear or nonlinear.
 Here,
 y is a dependent variable.
 x1, x2, …., xn are independent variables.
 b0, b1,…, bn are the regression coefficients.
 bj (1<=j<=n) is the slope or weight that specifies the factor by which Xj has an impact on Y.
25
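A minimal scikit-learn sketch of this model follows; the feature matrix X and target y below are synthetic stand-ins, not data from the slides.

```python
# Multiple linear regression with scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                                   # x1, x2, x3
y = 5 + 2 * X[:, 0] - 1 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 200)

model = LinearRegression().fit(X, y)
print(model.intercept_)        # estimate of b0, ~5
print(model.coef_)             # estimates of b1..b3, ~[2, -1, 0.5]
```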
POLYNOMIAL REGRESSION
 y = b0 + b1x1 + b2x1² + b3x1³ + …… + bnx1ⁿ
 Special case of Multiple Linear Regression.
 We add some polynomial terms to the Multiple Linear regression equation to convert it into Polynomial Regression.
 Linear model with some modification in order to increase the accuracy.
 Training data is of non-linear nature.
 In Polynomial regression, the original features are converted into Polynomial features of required degree (2,3,..,n) and
then modeled using a Linear model.
 If we apply a linear model to a linear dataset, it gives a good result, but if we apply the same model without
any modification to a non-linear dataset, it will produce poor predictions: the loss function will increase,
the error rate will be high, and the accuracy will decrease.
 A Polynomial Regression algorithm is also called Polynomial Linear Regression because linearity refers to
the coefficients, not the variables: the model is a linear combination of the (polynomial) features.
26
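A sketch of this feature-expansion idea with scikit-learn's PolynomialFeatures; the degree and the synthetic data are illustrative assumptions.

```python
# Polynomial regression as a linear model over polynomial features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1 + 2 * x.ravel() - 0.5 * x.ravel() ** 2 + rng.normal(0, 0.2, 100)

# x is expanded to [x, x^2] and then fitted with an ordinary linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[1.5]]))   # prediction at x = 1.5
```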
REGULARIZATION
 To avoid overfitting of training data and hence enhance generalization performance.
 Since model tries to capture noise, that doesn’t represent true properties of data.
 Regularization is a form of regression that constraints/ regularizes/ shrinks coefficient estimates towards zero.
 In the fitted relation Y ≈ β0 + β1X1 + … + βpXp, the β values are the coefficient estimates for the
different variables or predictors (X).
 The coefficients are chosen so as to minimize the loss function (the residual sum of squares);
regularization adds a penalty on the size of the coefficients to this loss.
27
MULTICOLLINEARITY
• Eg. a person’s height and weight, age and sales price of a car, or years of education
and annual income
• Doesn’t affect Decision Trees; kNN is affected
• Causes
• Insufficient data
• Dummy variables
• Including a variable in the regression that is actually a combination of two
other variables
• Identification: correlation > 0.4, or a Variance Inflation Factor (VIF) score > 5, indicates high correlation
• Solutions
• Feature selection
• PCA
• More data
• Ridge regression reduces the magnitude of model coefficients
28
RIDGE REGULARIZATION (L2 NORM)
 Used when data suffers from multicollinearity
 RSS is modified by adding the shrinkage quantity λ Σj βj², where λ (the tuning parameter) decides how much we want
to penalize the flexibility of our model. The intercept β0 is not penalized; it is a measure of the mean value of the
response when xi1 = xi2 = … = xip = 0.
 If we want to minimize the above function, these coefficients need to be small.
 When λ = 0, the penalty term has no effect, and the estimates produced by ridge regression will be equal to least squares.
 However, as λ→∞, the impact of the shrinkage penalty grows, and ridge regression coefficient estimates will approach
zero.
 Note: we need to standardize the predictors or bring the predictors to the same scale before performing ridge regression.
 Disadvantage: reduced model interpretability, since all predictors are retained in the model.
29
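A sketch of ridge regression in scikit-learn (λ is called alpha there), standardizing the predictors first as noted above; the data are synthetic.

```python
# Ridge regression: coefficients shrink toward zero as the penalty grows.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, 100)   # two deliberately collinear columns
y = 3 * X[:, 0] + rng.normal(0, 0.1, 100)

for alpha in [0.01, 1.0, 100.0]:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y)
    print(alpha, model.named_steps["ridge"].coef_)  # magnitudes fall as alpha grows
```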
LASSO REGULARIZATION (L1 NORM)
 Least Absolute Shrinkage and Selection Operator
 This variation differs from ridge regression only in how it penalizes high coefficients:
 it uses |βj| (the modulus) instead of the square of βj as its penalty.
 Lasso method also performs variable selection and is said to yield sparse models.
30
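A companion sketch for lasso on synthetic data, illustrating the variable-selection effect: with an L1 penalty, some coefficients are driven exactly to zero.

```python
# Lasso regression: sparse coefficient estimates on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 0.1, 100)   # only 2 features matter

model = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(model.named_steps["lasso"].coef_)   # most entries should come out exactly 0
```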
RIDGE LASSO COMPARISON
 Ridge Regression can be thought of as solving an equation, where summation of squares of coefficients is less than or equal to
s. And Lasso can be thought of as an equation where summation of modulus of coefficients is less than or equal to s. Here, s is a
constant that exists for each value of shrinkage factor λ. These equations are also referred to as constraint functions.
 Consider there are 2 parameters in a given problem. Then according to above formulation:
 Ridge regression is expressed by β1² + β2² ≤ s. This implies that ridge regression coefficients have the smallest RSS (loss
function) for all points that lie within the circle given by β1² + β2² ≤ s.
 For Lasso, the equation becomes |β1| + |β2| ≤ s. This implies that lasso coefficients have the smallest RSS (loss function) for all
points that lie within the diamond given by |β1| + |β2| ≤ s.
Image shows the constraint functions (green areas), for Lasso (left) and Ridge regression (right), along with contours for RSS
(red ellipse). The black point denotes that the least square error is minimized at that point and as we can see that it increases
quadratically as we move away from it and the regularization term is minimized at the origin where all the parameters are
zero
Since Ridge Regression has a circular constraint with no sharp points,
this intersection will not generally occur on an axis, and so ridge
regression coefficient estimates will be exclusively non-zero.
However, Lasso constraint has corners at each of the axes, and so the
ellipse will often intersect the constraint region at an axis. When this
occurs, one of the coefficients will equal zero.
31
BENEFIT
 Regularization significantly reduces the variance of model, without substantial increase in its bias.
 The tuning parameter λ, controls the impact on bias and variance.
 As the value of λ rises, it reduces the value of coefficients and thus reducing the variance.
 Up to a point, this increase in λ is beneficial as it is only reducing the variance (hence avoiding
overfitting), without losing any important properties in the data.
 But after a certain value, the model starts losing important properties, giving rise to bias in the model and
thus underfitting. Therefore, the value of λ should be carefully selected.
32
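One common way to select λ carefully is cross-validation over a grid of candidate values. A sketch with scikit-learn's RidgeCV follows; the grid and data are illustrative.

```python
# Selecting the tuning parameter by cross-validation with RidgeCV.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 150)

alphas = np.logspace(-3, 3, 25)   # candidate values of the tuning parameter
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas)).fit(X, y)
print(model.named_steps["ridgecv"].alpha_)   # the cross-validated choice
```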
SUMMATIVE ASSESSMENT
3 Consider the following dataset showing relationship
between food intake (lb) of cows and milk yield (lb).
Estimate the parameters for the linear regression model
for the dataset:
Food (lb) Milk Yield (lb)
4 3.0
6 5.5
10 6.5
12 9.0
4 Fit a Linear Regression model for the following relation
between mother’s Estriol level and birth weight of child
for the following data:
Estriol (mg/24 hr) Birth weight (g/100)
1 1
2 1
3 2
4 2
5 4
5 Create a relationship model for the given data to find the
relationship between height and weight of students. Compute
the Karl Pearson coefficient and Coefficient of determination.
REFER SLIDE 9
6 State benefits of regularization for avoiding overfitting
in Linear Regression. State mathematical formulation of
Regularization.
7 Explain steps of Gradient Descent Algorithm.
33
1. The rent of a property is related to its area. Given the area in square feet and the rent, find the
relationship between area and rent using the concept of linear regression. Also predict the rent for a
property of 790 square feet.
2. The marks obtained by a student depend on his/her study time. Given the study time in minutes
and marks out of 2000, find the relationship between study time and marks using the concept of Linear
Regression. Also predict the marks for a student who studied for 790 minutes.
Area (ft2) Rent (inr)
360 520
1070 1600
630 1000
890 850
940 1350
500 490
Study Time (min.) Marks obtained
350 520
1070 1600
630 1000
890 850
940 1350
500 490
SUMMATIVE ASSESSMENT
34
8. Use the method of Least Square using Regression to predict the final exam grade of a
student who received 86 on mid term exam.
x (midterm) y (final exam)
65 175
67 133
71 185
71 163
66 126
75 198
67 153
70 163
71 159
69 151
9. Create a relationship model for the given data to find the
relationship between height and weight of students.
Height (inches) Weight (pounds)
72 200
68 165
69 160
71 163
66 126
RESOURCES
 https://www.youtube.com/watch?v=Rb8MnMEJTI4&list=PLIeGtxpvyG-KE0M1r5cjbC_7Q_dVlKVq4&index=1
 https://www.youtube.com/watch?v=ls3XKoGntXg&list=PLIeGtxpvyG-KE0M1r5cjbC_7Q_dVlKVq4&index=3
 https://www.youtube.com/watch?v=E5RjzSK0fvY
 https://www.youtube.com/watch?v=NF5_btOaCig&list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU&index=5
 https://www.youtube.com/watch?v=5Z9OIYA8He8&list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU&index=9
 https://www.youtube.com/watch?v=Xm2C_gTAl8c
 https://www.geeksforgeeks.org/mathematical-explanation-for-linear-regression-working/
 https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/
 https://www.youtube.com/playlist?list=PLIeGtxpvyG-IqjoU8IiF0Yu1WtxNq_4z-
 https://365datascience.com/r-squared/
35

More Related Content

What's hot

Boyre Moore Algorithm | Computer Science
Boyre Moore Algorithm | Computer ScienceBoyre Moore Algorithm | Computer Science
Boyre Moore Algorithm | Computer ScienceTransweb Global Inc
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2Srinivasan R
 
L03 ai - knowledge representation using logic
L03 ai - knowledge representation using logicL03 ai - knowledge representation using logic
L03 ai - knowledge representation using logicManjula V
 
Finite automata-for-lexical-analysis
Finite automata-for-lexical-analysisFinite automata-for-lexical-analysis
Finite automata-for-lexical-analysisDattatray Gandhmal
 
5.2 primitive recursive functions
5.2 primitive recursive functions5.2 primitive recursive functions
5.2 primitive recursive functionsSampath Kumar S
 
Minmax Algorithm In Artificial Intelligence slides
Minmax Algorithm In Artificial Intelligence slidesMinmax Algorithm In Artificial Intelligence slides
Minmax Algorithm In Artificial Intelligence slidesSamiaAziz4
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationMohammed Bennamoun
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESsuthi
 
Turing machine implementation
Turing machine implementationTuring machine implementation
Turing machine implementationSinaRostami7
 
Kalman filter - Applications in Image processing
Kalman filter - Applications in Image processingKalman filter - Applications in Image processing
Kalman filter - Applications in Image processingRavi Teja
 

What's hot (20)

Turing Machine
Turing MachineTuring Machine
Turing Machine
 
Control Systems
Control SystemsControl Systems
Control Systems
 
Lecture: Automata
Lecture: AutomataLecture: Automata
Lecture: Automata
 
Boyre Moore Algorithm | Computer Science
Boyre Moore Algorithm | Computer ScienceBoyre Moore Algorithm | Computer Science
Boyre Moore Algorithm | Computer Science
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2
 
L03 ai - knowledge representation using logic
L03 ai - knowledge representation using logicL03 ai - knowledge representation using logic
L03 ai - knowledge representation using logic
 
Turing machine
Turing machineTuring machine
Turing machine
 
4.1 turing machines
4.1 turing machines4.1 turing machines
4.1 turing machines
 
Planning
PlanningPlanning
Planning
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 
Finite automata-for-lexical-analysis
Finite automata-for-lexical-analysisFinite automata-for-lexical-analysis
Finite automata-for-lexical-analysis
 
5.2 primitive recursive functions
5.2 primitive recursive functions5.2 primitive recursive functions
5.2 primitive recursive functions
 
Minmax Algorithm In Artificial Intelligence slides
Minmax Algorithm In Artificial Intelligence slidesMinmax Algorithm In Artificial Intelligence slides
Minmax Algorithm In Artificial Intelligence slides
 
L3 cfg
L3 cfgL3 cfg
L3 cfg
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computation
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTES
 
FInite Automata
FInite AutomataFInite Automata
FInite Automata
 
Turing machine implementation
Turing machine implementationTuring machine implementation
Turing machine implementation
 
Kalman filter - Applications in Image processing
Kalman filter - Applications in Image processingKalman filter - Applications in Image processing
Kalman filter - Applications in Image processing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to ML Module 3.pdf

Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regressionnszakir
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Neeraj Bhandari
 
Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).pptMuhammadAftab89
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.pptRidaIrfan10
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.pptkrunal soni
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.pptMoinPasha12
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Sciencessuser71ac73
 
Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Japheth Muthama
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JMJapheth Muthama
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regressionMaria Theresa
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysisRabin BK
 
Regression analysis algorithm
Regression analysis algorithm Regression analysis algorithm
Regression analysis algorithm Sammer Qader
 

Similar to ML Module 3.pdf (20)

Chapter5
Chapter5Chapter5
Chapter5
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
 
Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Regression
RegressionRegression
Regression
 
Regression
RegressionRegression
Regression
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
Chapter05
Chapter05Chapter05
Chapter05
 
Chap5 correlation
Chap5 correlationChap5 correlation
Chap5 correlation
 
Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Regression Analysis by Muthama JM
Regression Analysis by Muthama JM
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JM
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Statistics-Regression analysis
Statistics-Regression analysisStatistics-Regression analysis
Statistics-Regression analysis
 
Regression analysis algorithm
Regression analysis algorithm Regression analysis algorithm
Regression analysis algorithm
 

More from Shiwani Gupta

module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfShiwani Gupta
 
module5_backtrackingnbranchnbound_2022.pdf
module5_backtrackingnbranchnbound_2022.pdfmodule5_backtrackingnbranchnbound_2022.pdf
module5_backtrackingnbranchnbound_2022.pdfShiwani Gupta
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfShiwani Gupta
 
module3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdfmodule3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdfShiwani Gupta
 
module2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdfmodule2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdfShiwani Gupta
 
module1_Introductiontoalgorithms_2022.pdf
module1_Introductiontoalgorithms_2022.pdfmodule1_Introductiontoalgorithms_2022.pdf
module1_Introductiontoalgorithms_2022.pdfShiwani Gupta
 
ML MODULE 1_slideshare.pdf
ML MODULE 1_slideshare.pdfML MODULE 1_slideshare.pdf
ML MODULE 1_slideshare.pdfShiwani Gupta
 
Functionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleFunctionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleShiwani Gupta
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoningShiwani Gupta
 

More from Shiwani Gupta (20)

ML MODULE 6.pdf
ML MODULE 6.pdfML MODULE 6.pdf
ML MODULE 6.pdf
 
ML MODULE 5.pdf
ML MODULE 5.pdfML MODULE 5.pdf
ML MODULE 5.pdf
 
ML MODULE 4.pdf
ML MODULE 4.pdfML MODULE 4.pdf
ML MODULE 4.pdf
 
module6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdfmodule6_stringmatchingalgorithm_2022.pdf
module6_stringmatchingalgorithm_2022.pdf
 
module5_backtrackingnbranchnbound_2022.pdf
module5_backtrackingnbranchnbound_2022.pdfmodule5_backtrackingnbranchnbound_2022.pdf
module5_backtrackingnbranchnbound_2022.pdf
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdf
 
module3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdfmodule3_Greedymethod_2022.pdf
module3_Greedymethod_2022.pdf
 
module2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdfmodule2_dIVIDEncONQUER_2022.pdf
module2_dIVIDEncONQUER_2022.pdf
 
module1_Introductiontoalgorithms_2022.pdf
module1_Introductiontoalgorithms_2022.pdfmodule1_Introductiontoalgorithms_2022.pdf
module1_Introductiontoalgorithms_2022.pdf
 
ML MODULE 1_slideshare.pdf
ML MODULE 1_slideshare.pdfML MODULE 1_slideshare.pdf
ML MODULE 1_slideshare.pdf
 
ML MODULE 2.pdf
ML MODULE 2.pdfML MODULE 2.pdf
ML MODULE 2.pdf
 
Problem formulation
Problem formulationProblem formulation
Problem formulation
 
Simplex method
Simplex methodSimplex method
Simplex method
 
Functionsandpigeonholeprinciple
FunctionsandpigeonholeprincipleFunctionsandpigeonholeprinciple
Functionsandpigeonholeprinciple
 
Relations
RelationsRelations
Relations
 
Logic
LogicLogic
Logic
 
Set theory
Set theorySet theory
Set theory
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
 
Introduction to ai
Introduction to aiIntroduction to ai
Introduction to ai
 
Planning Agent
Planning AgentPlanning Agent
Planning Agent
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 

ML Module 3.pdf

  • 1. MODULE 3 Supervised ML – Regression Shiwani Gupta Use case Simple Linear Gradient Descent Evaluation Metric Multi Linear, Polynomial Regularization
  • 2. USE CASES  A hospital may be interested in finding how total cost of a patient varies with severity of disease.  Insurance companies would like to understand association between healthcare cost and ageing.  An organization may be interested in finding relationship between revenue generated from a product and features such as price, promotional amount spent, competitor’s price of similar product, etc.  Restaurants would like to know relationship between customer waiting time after placing the order and the revenue generated.  E-commerce companies like: Amazon, BigBasket, Flipkart , etc. would like to understand the relationship between revenue generated and features like no. of customer visit to portal, no. of clicks on products, no. of items on sale, av. discount percentage etc.  Bank and other financial institutions would like to understand the impact of variables such as unemployment rate, marital status, bank balance, etc. on percentage of Non Performing Assets, etc. 2
  • 4. LINEAR REGRESSION  Linear Regression is a Supervised Machine Learning algorithm for predictive modelling.  It tries to find out the best linear relationship that describes the data you have (Scatter Plot).  It assumes that there exists a linear relationship between a dependent variable (usually called y) and independent variable(s) (usually called X).  The value of the dependent / response / outcome variable of a Linear Regression model is a continuous value / quantitative in nature i.e. real numbers.  Linear Regression model represents linear relationship between a dependent variable and independent / predictor / explanatory variable(s) via a sloped straight line.  The sloped straight line representing the linear relationship that fits the given data best is called a Regression Line / Best Fit Line.  Based on the number of independent variables, there are two types of Linear Regression 4
  • 5. INFERENCE ABOUT THE REGRESSION MODEL  When a scatter plot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least square line fitted to the data to predict y for a given value of x.  We think of the least square line we calculated from the sample as an estimate of a regression line for the population.  Just as the sample mean is an estimate of the population mean µ.  We will write the population regression line as  The numbers and are parameters that describe the population.  We will write the least-squares line fitted to sample data as  This notation reminds us that the intercept b0 of the fitted line estimates the intercept 0 of the population line, and the slope b1 estimates the slope 1 respectively. x x 1 0    0  1  x b b 1 0  5
  • 6. SIMPLE LINEAR REGRESSION  A statistical method to summarize and study the functional relationship b/w 2 cont. variables.  May be linear or nonlinear (eg. Population growth over time)  The dependent variable depends only on a single independent variable.  The form of the model is: y = β0 + β1X eg. V=I*R, Circumference = 2*pi*r, C=(F-32)*5/9, etc.  y is a dependent variable.  X is an independent variable.  β0 and β1 are the regression coefficients.  β0 is the intercept or the bias that fixes the offset to a line. It is the av. y value for Xmean = 0  β1 is the slope or weight that specifies the factor by which X has an impact on y. The values of the regression parameters 0, and 1 are not known. We estimate them from data. 6
  • 8. REGRESSION LINE  We will write an estimated regression line based on sample data as  Least Squares Method give us the “best” estimated line for our set of sample data.  The method of least squares chooses the values for b0 and b1 to minimize the Sum of Squared Errors  Using calculus, we obtain the estimating formulas: x b b y 1 0 ˆ     2 1 1 0 1 2 ) ˆ (          n i n i i i x b b y y y SSE                      n i n i i i n i n i n i i i i i n i i n i i i x x n y x y x n x x y y x x b 1 1 2 2 1 1 1 1 2 1 1 ) ( ) ( ) )( ( x b y b 1 0   Fitted regression line can be used to estimate y for a given value of x. 𝑀𝑆𝐸 = (1/𝑛) 𝑖=1 𝑛 (𝑦𝑖 − 𝑦𝑖)2 𝑀𝐴𝐸 = (1/𝑛) 𝑖=1 𝑛 |𝑦𝑖 − 𝑦 | 8
  • 9. STEPS TO ESTABLISH A LINEAR RELATION  Gather sample of observed height and corresponding weight.  Create relationship model.  Find coefficients from model and establish mathematical equation.  Get summary of model to compute av. prediction error … Residual.  Predict weight height weight 151 63 174 81 138 56 186 91 128 47 136 57 179 76 163 72 152 62 131 48   2 1 1 0      n i x b b y Q We want to penalize the points which are farther from the regression line much more than the points which lie close to the line. 9
  • 10. STRENGTH OF LINEAR ASSOCIATION: PEARSON COEFFICIENT height (x) weight (y) x-xmean y-ymean (x-xmean)*(y-ymean) (x-xmean)*(x-xmean) xy x2 y2 151 63 -2.8 -2.3 6.44 7.84 9513 22801 3969 174 81 20.2 15.7 317.14 408.04 14094 30276 6561 138 56 -15.8 -9.3 146.94 249.64 7728 19044 3136 186 91 32.2 25.7 827.54 1036.84 16926 34596 8281 128 47 -25.8 -18.3 472.14 665.64 6016 16384 2209 136 57 -17.8 -8.3 147.74 316.84 7752 18496 3249 179 76 25.2 10.7 269.64 635.04 13604 32041 5776 163 72 9.2 6.7 61.64 84.64 11736 26569 5184 152 62 -1.8 -3.3 5.94 3.24 9424 23104 3844 131 48 -22.8 -17.3 394.44 519.84 6288 17161 2304 xmean ymean sum(xy) sum(x2 ) sum(y2 ) 153.8 65.3 264.96 392.76 103081 240472 44513 sum(x) sum(y) (sum(y))*(sum(y)) n*sum(xy) n*sum(x2 ) n*sum(y2 ) 1538 653 1004314 2365444 426409 1030810 2404720 445130 b1 = 0.67461 b0 = -38.455 y = b1x+b0 = 8.767644363 r = 0.97713 r = [-1,1] Magnitude as well as direction x=70 10
  • 11. GRADIENT DESCENT  used to minimize the cost function Q  STEPS: 1. Random initialization for θ1 and θ0. 2. Measure how the cost function changes with change in it’s parameters by computing the partial derivatives of cost function w.r.t to the parameters θ₀, θ₁, … , θₙ. 3. After computing derivative, update parameters θj: = θj−α ∂/∂θj Q(θ0,θ1) for j=0,1 where α, learning rate, a positive no. and a step to update parameters. 4. Repeat process of Simultaneous update of θ1 and θ0, until convergence.  α too small, too much time, α too large, failure to converge. 11
  • 12. GRADIENT DESCENT FOR UNIVARIATE LINEAR REGRESSION  Hypothesis hθ(x)=θ0+θ1x  Cost Function J(θ0,θ1)=(1/2*m) * ∑i=1 m(hθ(x(i))−y(i))2  Gradient Descent to minimize cost function for Linear Regression model  Compute derivative for j=0, j=1 12
  • 13. Dispersion of observed variable around mean How well our line fits data total variability of the data is equal to the variability explained by the regression line plus the unexplained variability, known as error. MODEL EVALUATION 13
  • 14. COEFFICIENT OF DETERMINATION  Recall that SST measures the total variations in yi when no account of the independent variable x is taken.  SSE measures the variation in the yi when a regression model with the independent variable x is used.  A natural measure of the effect of x in reducing the variation in y can be defined as: R2 is called the coefficient of determination / goodness of fit.  0  SSE  SST, it follows that:  We may interpret R2 as the proportionate reduction of total variability in y associated with the use of the independent variable x.  The larger is R2, the more is the total variation of y reduced by including the variable x in the model. SST SSE SST SSR SST SSE SST R      1 2 𝟎 ≤ 𝑅2 ≤ 𝟏 14
  • 15. COEFFICIENT OF DETERMINATION If all the observations fall on the fitted regression line, SSE = 0 and R2 = 1. If the slope of the fitted regression line b1 = 0 so that , SSE=SST and R2 = 0. The closer R2 is to 1, the greater is said to be the degree of linear association between x and y. The square root of R2 is called the coefficient of correlation (r). y yi  ˆ 2 R r                    2 2 2 2 2 2 ) ( ) ( ) ( ) ( ) )( ( y y n x x n y x xy n r y y x x y y x x r 15
  • 16. MODEL EVALUATION : R-SQUARED height (cm) weight (kg) ypredicted SSE = (y-ypred)2 SST = (y-ymean)2 SSR = (ypred-ymean)2 151 63 63.4111 0.16901143 5.29 3.56790543 174 81 78.9271 4.29674858 246.49 185.698945 138 56 54.6412 1.84639179 86.49 113.610444 186 91 87.0225 15.8208245 660.49 471.865268 128 47 47.8951 0.80116821 334.89 302.93124 136 57 53.292 13.7495606 68.89 144.193025 179 76 82.3002 39.692394 114.49 289.00646 163 72 71.5064 0.24361134 44.89 38.5197733 152 62 64.0857 4.35022792 10.89 1.47447592 131 48 49.9189 3.68221559 299.29 236.57793 ymean sum sum sum 65.3 82.652154 1872.1 1787.44547 R2 = SSR/SST = 0.95478 measures the proportion of the variation in your dependent variable explained by all your independent variables in the model R2 = [0,1] 16
  • 17. ESTIMATION OF MEAN RESPONSE  The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table:  From the table, the least squares estimates of the regression coefficients are: y x 1250 41 1380 54 1425 63 1425 54 1450 48 1300 46 1400 62 1510 61 1575 64 1650 71          818755 14365 32604 564 10 2 xy y x x n 8 . 10 ) 564 ( ) 32604 ( 10 ) 14365 )( 564 ( ) 818755 ( 10 ) ( 2 2 2 1             x x n y x xy n b 828 ) 4 . 56 ( 8 . 10 5 . 1436 0    b The estimated regression function is: This means that if weekly advertising expenditure is increased by $1 we would expect the weekly sales to increase by $10.8. Fitted values for the sample data are obtained by substituting the x value into the estimated regression function. For example, if the advertising expenditure is $50, then the estimated Sales is: This is called the point estimate (forecast) of the mean response (sales). e Expenditur 8 . 10 828 Sales 10.8x 828 ŷ     1368 ) 50 ( 8 . 10 828    Sales x b y b 1 0   17
  • 18. EXAMPLE: SOLVE • The primary goal of Quantitative Analysis is to use current information about a phenomenon to predict its future behavior. • Current information is usually in the form of data. • In a simple case, when the data forms a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor) variable X and a dependent (or response) variable y. • The goal of the analyst who studies the data is to find a functional relation between the response variable y and the predictor variable x. lot size Man-hours 30 73 20 50 60 128 80 170 40 87 50 108 60 135 30 69 70 148 60 132 0 20 40 60 80 100 120 140 160 180 0 10 20 30 40 50 60 70 80 90 Man-Hour Lot size Statistical relation between Lot size and Man-Hour ) (x f y  18
  • 19. EXAMPLE: RETAIL SALES AND FLOOR SPACE  It is customary in retail operations to assess the performance of stores partly in terms of their annual sales relative to their floor area (square feet).  We might expect sales to increase linearly as stores get larger, with of course individual variation among the stores of same size.  The regression model for a population of stores says that SALES = 0 + 1 AREA +   The slope 1 is rate of change: it is the expected increase in annual sales associated with each additional square foot of floor space.  The intercept 0 is needed to describe the line but has no statistical importance because no stores have area close to zero.  Floor space does not completely determine sales. The term  in the model accounts for difference among individual stores with the same floor space. A store’s location, for example, is important.  Residual: The difference between the observed value yi and the corresponding fitted value  Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand. i i i y y e ˆ   i ŷ 19
  • 20. ANALYSIS OF RESIDUAL  To examine whether the regression model is appropriate for the data being analyzed, we can check residual plots.  Residual plots are:  A scatterplot of the residuals  Plot residuals against the fitted values.  Plot residuals against the independent variable.  Plot residuals over time if the data are chronological.  The residuals should have no systematic pattern. Eg. The residual plot below shows a scatter of the points with no individual observations or systematic change as x increases. -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 0 20 40 60 Residuals Degree Days Degree Days Residual Plot 20
• 21. RESIDUAL PLOTS
The points in the first residual plot follow a curved pattern, so a straight line fits the data poorly.
The points in the second plot show more spread for larger values of the explanatory variable x, so predictions will be less accurate when x is large. 21
• 22. EXAMPLE: DO WAGES RISE WITH EXPERIENCE?
Many factors affect the wages of workers: the industry they work in, their type of job, their education, their experience, and changes in general levels of wages. We will look at a sample of 59 married women who hold customer service jobs in Indiana banks. The table gives their weekly wages at a specific point in time, as well as their length of service with their employer, in months. The size of the place of work is recorded simply as "large" (100 or more workers) or "small." Because industry, job type, and the time of measurement are the same for all 59 subjects, we expect to see a clear relationship between wages and length of service. 22
• 23. EXAMPLE: DO WAGES RISE WITH EXPERIENCE?
From the previous table we have: n = 59, Σx = 4159, Σx² = 451031, Σy = 23069, Σy² = 9460467, Σxy = 1719376.
The least squares estimates of the regression coefficients are:
b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
b0 = ȳ − b1·x̄
The sums of squares used to assess the fit are:
SSE = Σ(yi − ŷi)²  (error sum of squares)
SST = Σ(yi − ȳ)²  (total sum of squares)
SSR = Σ(ŷi − ȳ)²  (regression sum of squares)
23
• 24. USING THE REGRESSION LINE
One of the most common reasons to fit a line to data is to predict the response to a particular value of the explanatory variable.
In our example, the least squares line for predicting the weekly earnings of female bank customer service workers from their length of service is ŷ = 349.4 + 0.5905x.
For a length of service of 125 months, the least-squares regression equation gives ŷ = 349.4 + (0.5905)(125) ≈ $423 per week.
The measures of variation in the data around the fitted regression line (computed here for the advertising example of slide 17): SSE = 36124.76 and SST = 128552.5. If SST = 0, all observations are the same (no variability). The greater SST is, the greater the variation among the y values.
SSR = SST − SSE = 128552.5 − 36124.76 = 92427.74.
SSR is the variation among the predicted responses. The predicted responses lie on the least-squares line; they show how y moves in response to x. The larger SSR is relative to SST, the greater the role of the regression line in explaining the total variability in the y observations. Here R² = SSR/SST = 0.719, which indicates that most of the variability in weekly sales can be explained by the relation between weekly advertising expenditure and weekly sales. 24
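A short sketch (my own code) that computes SSE, SST, SSR, and R² for the advertising data of slide 17 and recovers the values quoted above up to rounding:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)     # variation left unexplained by the line
sst = np.sum((y - y.mean()) ** 2)  # total variation in y
ssr = sst - sse                    # variation explained by the regression
print(sse, sst, ssr, ssr / sst)    # R^2 comes out near 0.719
```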
• 25. MULTIPLE LINEAR REGRESSION
The dependent variable depends on more than one independent variable.
The form of the model is: y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
The relationship may be linear or nonlinear.
Here,
y is the dependent variable.
x1, x2, …, xn are independent variables.
b0, b1, …, bn are the regression coefficients.
bj (1 ≤ j ≤ n) is the slope or weight that specifies the factor by which xj has an impact on y. A fitting sketch follows below. 25
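A minimal scikit-learn sketch of fitting this model; the two-predictor data below are invented purely for demonstration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictors (e.g. price, promotion spend) and a response.
X = np.array([[10, 1.0], [12, 0.5], [8, 2.0], [15, 0.2], [9, 1.5], [11, 0.8]])
y = np.array([120, 105, 150, 90, 140, 112])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # b0 and [b1, b2]
print(model.predict([[10, 1.2]]))     # prediction for a new observation
```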
• 26. POLYNOMIAL REGRESSION
y = b0 + b1x1 + b2x1² + b3x1³ + … + bnx1ⁿ
A special case of Multiple Linear Regression: we add polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
It is a linear model with some modification in order to increase the accuracy, used when the training data is of a non-linear nature.
In Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modeled using a linear model.
If we apply a linear model to a linear dataset, it gives a good result; but if we apply the same model without any modification to a non-linear dataset, it produces poor predictions: the loss function increases, the error rate is high, and the accuracy decreases.
Polynomial Regression is also called Polynomial Linear Regression because linearity refers not to the variables but to the coefficients, which enter the model in a linear fashion. 26
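A sketch of the feature-expansion idea using scikit-learn's PolynomialFeatures; the data are made up and roughly quadratic by construction:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical non-linear data: y grows roughly quadratically with x.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float).reshape(-1, 1)
y = np.array([1.2, 4.1, 9.3, 15.8, 25.2, 35.9])

# Expand x into [x, x^2], then fit an ordinary linear model on the new features.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[7.0]])))  # prediction at x = 7
```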
• 27. REGULARIZATION
Used to avoid overfitting of the training data and hence enhance generalization performance.
Without it, the model tries to capture noise that doesn't represent true properties of the data.
Regularization is a form of regression that constrains/regularizes/shrinks the coefficient estimates towards zero.
Y represents the learned relation and β represents the coefficient estimates for the different variables or predictors (X).
The coefficients are chosen so as to minimize the loss function, i.e. the residual sum of squares RSS = Σi (yi − β0 − Σj βj·xij)², plus a penalty on the coefficients. 27
• 28. MULTICOLLINEARITY
• Multicollinearity arises when predictors are strongly correlated with each other, e.g. a person's height and weight, the age and sales price of a car, or years of education and annual income.
• It doesn't affect Decision Trees; kNN is affected.
• Causes:
- Insufficient data.
- Dummy variables.
- Including a variable in the regression that is actually a combination of two other variables.
• Identification: a pairwise correlation > 0.4 or a Variance Inflation Factor (VIF) score > 5 indicates high correlation (see the VIF sketch below).
• Solutions:
- Feature selection.
- PCA.
- More data.
- Ridge regression, which reduces the magnitude of the model coefficients. 28
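A sketch of computing VIF scores with statsmodels; the three-column data are invented, with x3 deliberately built as a near-combination of x1 and x2 so its VIF comes out high:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors; x3 is nearly a linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + rng.normal(scale=0.1, size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Add an intercept column, then report VIF per predictor; VIF > 5 flags trouble.
Xc = sm.add_constant(X)
for i, col in enumerate(Xc.columns):
    if col != "const":
        print(col, variance_inflation_factor(Xc.values, i))
```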
• 29. RIDGE REGULARIZATION (L2 NORM)
Used when the data suffer from multicollinearity.
The RSS is modified by adding a shrinkage penalty, λΣj βj², where λ (the tuning parameter) decides how much we want to penalize the flexibility of our model. The intercept β0 is not penalized: it is a measure of the mean value of the response when xi1 = xi2 = … = xip = 0.
If we want to minimize the resulting objective, RSS + λΣj βj², the coefficients need to be small.
When λ = 0, the penalty term has no effect, and the estimates produced by ridge regression equal the least squares estimates.
However, as λ→∞, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates approach zero.
Note: we need to standardize the predictors, i.e. bring them to the same scale, before performing ridge regression.
Disadvantage: reduced model interpretability, since ridge shrinks coefficients towards zero but never sets any of them exactly to zero, so all predictors remain in the model. 29
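A minimal scikit-learn sketch (my own, with invented collinear data) that standardizes first, as the slide advises, and then fits a ridge model; sklearn's `alpha` plays the role of λ:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical collinear predictors: the second column nearly duplicates the first.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=50)])
y = 3 * x1 + rng.normal(scale=0.5, size=50)

# Standardize, then shrink with an L2 penalty.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.named_steps["ridge"].coef_)  # both coefficients shrunk, neither exactly zero
```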
• 30. LASSO REGULARIZATION (L1 NORM)
Least Absolute Shrinkage and Selection Operator.
This variation differs from ridge regression only in the penalty term: it uses |βj| (the modulus) instead of the square of βj, so the penalty is λΣj |βj|.
Because this penalty can force coefficients to exactly zero, the Lasso method also performs variable selection and is said to yield sparse models. 30
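A companion sketch showing the sparsity effect; the data are invented so that only the first of four predictors actually drives y, and lasso zeroes out the rest:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical data: only the first of four predictors matters.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = 5 * X[:, 0] + rng.normal(scale=0.5, size=80)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.5))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)  # irrelevant coefficients driven exactly to zero
```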
• 31. RIDGE LASSO COMPARISON
Ridge regression can be thought of as solving an equation where the summation of squares of the coefficients is less than or equal to s, and Lasso as an equation where the summation of the moduli of the coefficients is less than or equal to s. Here, s is a constant that exists for each value of the shrinkage factor λ. These equations are also referred to as constraint functions.
Consider a problem with 2 parameters. Then, according to the above formulation:
Ridge regression is expressed by β1² + β2² ≤ s. This implies that the ridge regression coefficients have the smallest RSS (loss function) of all points that lie within the circle given by β1² + β2² ≤ s.
For Lasso, the equation becomes |β1| + |β2| ≤ s. This implies that the lasso coefficients have the smallest RSS (loss function) of all points that lie within the diamond given by |β1| + |β2| ≤ s.
[Image: the constraint regions (green areas) for Lasso (left) and Ridge regression (right), along with contours of the RSS (red ellipses). The black point marks the least squares solution; the RSS increases quadratically as we move away from it, while the regularization term is minimized at the origin, where all parameters are zero.]
Since the ridge constraint is circular, with no sharp points, the intersection will not generally occur on an axis, so the ridge regression coefficient estimates will generally all be non-zero. However, the Lasso constraint has corners at each of the axes, so the ellipse will often intersect the constraint region at an axis. When this occurs, one of the coefficients equals zero. 31
• 32. BENEFIT
Regularization significantly reduces the variance of the model without a substantial increase in its bias.
The tuning parameter λ controls this trade-off between bias and variance.
As the value of λ rises, it reduces the magnitude of the coefficients and thus reduces the variance.
Up to a point, this increase in λ is beneficial, since it only reduces the variance (hence avoiding overfitting) without losing any important properties in the data (see the sketch below).
But beyond a certain value, the model starts losing important properties, giving rise to bias in the model and thus underfitting. Therefore, the value of λ should be carefully selected. 32
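A small sketch (invented data) that sweeps λ (sklearn's `alpha`) and prints the fitted coefficients, making the progressive shrinkage visible:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical data; watch coefficient magnitudes shrink as alpha grows.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=60)

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y)
    print(alpha, np.round(model.named_steps["ridge"].coef_, 3))
```

In practice λ is usually chosen by cross-validation; scikit-learn's RidgeCV and LassoCV automate that search.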
• 33. SUMMATIVE ASSESSMENT
1. The rent of a property is related to its area. Given the area in square feet and the rent (INR) below, find the relationship between area and rent using the concept of Linear Regression. Also predict the rent for a property of 790 square feet.
Area (ft²): 360, 1070, 630, 890, 940, 500
Rent (INR): 520, 1600, 1000, 850, 1350, 490
2. The marks obtained by a student depend on his/her study time. Given the study time in minutes and marks out of 2000 below, find the relationship between study time and marks using the concept of Linear Regression. Also predict the marks for a student who studied for 790 minutes.
Study time (min.): 350, 1070, 630, 890, 940, 500
Marks obtained: 520, 1600, 1000, 850, 1350, 490
3. Consider the following dataset showing the relationship between food intake (lb) of cows and milk yield (lb). Estimate the parameters of the linear regression model for the dataset:
Food (lb): 4, 6, 10, 12
Milk yield (lb): 3.0, 5.5, 6.5, 9.0
4. Fit a Linear Regression model for the relation between mother's estriol level and birth weight of the child for the following data:
Estriol (mg/24 hr): 1, 2, 3, 4, 5
Birth weight (g/100): 1, 1, 2, 2, 4
5. Create a relationship model for the given data to find the relationship between height and weight of students. Compute the Karl Pearson coefficient and the coefficient of determination. REFER SLIDE 9
6. State the benefits of regularization for avoiding overfitting in Linear Regression. State the mathematical formulation of regularization.
7. Explain the steps of the Gradient Descent algorithm. 33
• 34. SUMMATIVE ASSESSMENT
8. Use the method of least squares regression to predict the final exam grade of a student who received 86 on the midterm exam.
x (midterm): 65, 67, 71, 71, 66, 75, 67, 70, 71, 69
y (final exam): 175, 133, 185, 163, 126, 198, 153, 163, 159, 151
9. Create a relationship model for the given data to find the relationship between height and weight of students.
Height (inches): 72, 68, 69, 71, 66
Weight (pounds): 200, 165, 160, 163, 126
34
  • 35. RESOURCES  https://www.youtube.com/watch?v=Rb8MnMEJTI4&list=PLIeGtxpvyG-KE0M1r5cjbC_7Q_dVlKVq4&index=1  https://www.youtube.com/watch?v=ls3XKoGntXg&list=PLIeGtxpvyG-KE0M1r5cjbC_7Q_dVlKVq4&index=3  https://www.youtube.com/watch?v=E5RjzSK0fvY  https://www.youtube.com/watch?v=NF5_btOaCig&list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU&index=5  https://www.youtube.com/watch?v=5Z9OIYA8He8&list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU&index=9  https://www.youtube.com/watch?v=Xm2C_gTAl8c  https://www.geeksforgeeks.org/mathematical-explanation-for-linear-regression-working/  https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/  https://www.youtube.com/playlist?list=PLIeGtxpvyG-IqjoU8IiF0Yu1WtxNq_4z-  https://365datascience.com/r-squared/ 35