2. What is a Math/Stats Model?
1. Often Describes a Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
RV College
of
Engineering
Go, change the world
3. Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height
– Metric Formula: $\mathrm{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$
– Non-metric Formula: $\mathrm{BMI} = \dfrac{\text{weight (lb)} \times 703}{[\text{height (in)}]^2}$
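As a minimal sketch of the deterministic idea, the two BMI formulas can be written as plain functions: the same inputs always give the same output, with no error term. The function names and the 70 kg, 1.75 m person are hypothetical, chosen only for illustration.

```python
KG_PER_POUND = 0.45359237   # exact definition of the pound in kilograms
M_PER_INCH = 0.0254         # exact definition of the inch in meters


def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """BMI from weight in pounds and height in inches (703 is a rounded factor)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 70 kg and 1.75 m, converted to pounds and inches
metric = bmi_metric(70, 1.75)
imperial = bmi_imperial(70 / KG_PER_POUND, 1.75 / M_PER_INCH)
print(round(metric, 2), round(imperial, 2))
```

The two results agree closely but not exactly, because 703 is a rounded unit-conversion factor.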
4. Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic Blood Pressure (SBP) of Newborns Is 6 Times the Age in Days + Random Error
• $\mathrm{SBP} = 6 \times \text{age (days)} + \varepsilon$
• Random Error May Be Due to Factors Other Than Age in Days (e.g. Birthweight)
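The two components can be illustrated with a small simulation. This is a sketch only: the slides do not give the distribution of the random error, so a normal distribution with a hypothetical standard deviation (`ERROR_SD`) is assumed here, and the function names are made up for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

ERROR_SD = 3.0  # hypothetical spread of the random error; not given in the slides


def deterministic_sbp(age_days):
    """Deterministic component: 6 times the age in days."""
    return 6 * age_days


def simulated_sbp(age_days):
    """Deterministic component plus an assumed normally distributed error."""
    return deterministic_sbp(age_days) + random.gauss(0, ERROR_SD)


# Same age, same deterministic part, but a different simulated value each draw
for age in [1, 2, 3, 4, 5]:
    print(age, deterministic_sbp(age), round(simulated_sbp(age), 1))
```

Averaged over many newborns of the same age, the simulated values settle around the deterministic component, which is what the model treats as the expected SBP.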
6. Regression Models
• Relationship between one dependent variable
and explanatory variable(s)
• Use equation to set up relationship
• Numerical Dependent (Response) Variable
• 1 or More Numerical or Categorical Independent
(Explanatory) Variables
• Used Mainly for Prediction & Estimation
7. Regression Modeling Steps
1. Hypothesize Deterministic Component
• Estimate Unknown Parameters
2. Specify Probability Distribution of Random Error Term
• Estimate Standard Deviation of Error
3. Evaluate the Fitted Model
4. Use Model for Prediction & Estimation
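The four steps can be sketched in plain Python using the car age and speed data from the implementation example in these slides. The straight-line form, the `n - 2` divisor for the error standard deviation, and the choice of R squared as the evaluation measure are standard for simple linear regression but are assumptions here, since this slide does not spell them out.

```python
import math

# Car data from the implementation example in these slides: x = age, y = speed
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
n = len(x)

# Step 1: hypothesize y = b0 + b1 * x and estimate b0, b1 by least squares
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Step 2: estimate the standard deviation of the error from the residuals
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))  # n - 2 because two parameters were estimated

# Step 3: evaluate the fitted model with the coefficient of determination
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Step 4: use the model for prediction (speed of a car of age 10)
predicted_speed = b0 + b1 * 10

print(b0, b1, s, r_squared, predicted_speed)
```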
19. Linear Regression Model
• 1. Relationship Between Variables Is a Linear Function
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
• $Y_i$: Dependent (Response) Variable
• $X_i$: Independent (Explanatory) Variable
• $\beta_0$: Population Y-Intercept
• $\beta_1$: Population Slope
• $\varepsilon_i$: Random Error
21. Scatter plot
• 1. Plot of All (Xi, Yi) Pairs
• 2. Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X, both axes from 0 to 60]
22. How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X, both axes from 0 to 60]
23. How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X, both axes from 0 to 60. Slope changed, intercept unchanged]
24. How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X, both axes from 0 to 60. Slope unchanged, intercept changed]
25. How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X, both axes from 0 to 60. Slope changed, intercept changed]
26. Least Squares
• 1. ‘Best Fit’ Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative Ones
27. Least Squares
• 1. ‘Best Fit’ Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative Ones. So Square the Errors!
28. Least Squares
• 1. ‘Best Fit’ Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative Ones. So Square the Errors!
• 2. Least Squares (LS) Minimizes the Sum of the Squared Differences (Errors), Abbreviated SSE
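A short sketch of why the errors are squared, using the estriol and birthweight pairs from the parameter estimation example in these slides. It compares a flat line through the mean of Y with the least-squares line reported later in the slides (intercept -0.1, slope 0.7). The helper names are made up for this illustration.

```python
# Estriol (x) and birthweight (y) pairs from the parameter estimation example
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]


def residuals(intercept, slope):
    """Actual minus predicted Y for every data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sse(intercept, slope):
    """Sum of squared errors for the line with this intercept and slope."""
    return sum(e ** 2 for e in residuals(intercept, slope))


flat_line = (2.0, 0.0)   # horizontal line through the mean of y
ls_line = (-0.1, 0.7)    # least-squares line reported in the slides

for name, (b0, b1) in [("flat", flat_line), ("least squares", ls_line)]:
    print(name, round(sum(residuals(b0, b1)), 10), round(sse(b0, b1), 4))
```

For both lines the raw differences sum to zero (up to floating-point rounding), so that criterion cannot tell them apart. The SSE does: it is 6 for the flat line and 1.1 for the least-squares line.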
41. Interpretation of Coefficients
• 1. Slope ($\hat{\beta}_1$)
– Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
• If $\hat{\beta}_1 = 2$, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
42. Interpretation of Coefficients
• 1. Slope ($\hat{\beta}_1$)
– Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
• If $\hat{\beta}_1 = 2$, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
• 2. Y-Intercept ($\hat{\beta}_0$)
– Average Value of Y When X = 0
• If $\hat{\beta}_0 = 4$, then Average Y Is Expected to Be 4 When X Is 0
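The two interpretations can be checked numerically with the slide's example values (estimated intercept 4, estimated slope 2). The `predict` helper is a hypothetical name used only for this sketch.

```python
B0_HAT = 4   # estimated intercept from the slide's example
B1_HAT = 2   # estimated slope from the slide's example


def predict(x):
    """Estimated Y for a given X on the fitted line."""
    return B0_HAT + B1_HAT * x


print(predict(0))               # estimated Y at X = 0 equals the intercept
print(predict(6) - predict(5))  # change for a 1-unit increase in X equals the slope
```

The second line prints the same difference for any pair of X values one unit apart, which is what makes the slope a per-unit rate of change.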
43. Parameter Estimation Example-1
• Obstetrics: What is the relationship between mother's estriol (hormone) level and birthweight, using the following data?
Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4
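One way to estimate the parameters for this data is the standard closed-form least-squares solution: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept follows from the means. The sketch below assumes that approach; the variable names are illustrative.

```python
estriol = [1, 2, 3, 4, 5]        # X, in mg/24h
birthweight = [1, 1, 2, 2, 4]    # Y, in g/1000

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(birthweight) / n

# Sum of cross-deviations and sum of squared deviations of X
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, birthweight))
sxx = sum((x - x_bar) ** 2 for x in estriol)

slope = sxy / sxx                  # estimate of beta_1
intercept = y_bar - slope * x_bar  # estimate of beta_0

print(round(slope, 2), round(intercept, 2))
```

This gives a slope of 0.7 and an intercept of -0.1, the values interpreted in the coefficient interpretation solution for this example.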
48. Coefficient Interpretation Solution
• 1. Slope ($\hat{\beta}_1$)
– Birthweight (Y) Is Expected to Increase by 0.7 Units (g/1000) for Each 1 mg/24h Increase in Estriol (X)
49. Coefficient Interpretation Solution
• 1. Slope ($\hat{\beta}_1$)
– Birthweight (Y) Is Expected to Increase by 0.7 Units (g/1000) for Each 1 mg/24h Increase in Estriol (X)
• 2. Intercept ($\hat{\beta}_0$)
– Average Birthweight (Y) Is -0.10 Units When Estriol Level (X) Is 0
• Difficult to explain
• The birthweight should always be positive
55. Coefficient Interpretation
• 1. Slope ($\hat{\beta}_1$)
– Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food Intake (X)
56. Coefficient Interpretation
• 1. Slope ($\hat{\beta}_1$)
– Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food Intake (X)
• 2. Y-Intercept ($\hat{\beta}_0$)
– Average Milk Yield (Y) Is Expected to Be 0.8 lb. When Food Intake (X) Is 0
57. Implementation
• Import the packages and classes you need.
• Provide data to work with and, if needed, apply appropriate transformations.
• Create a regression model and fit it with existing data.
• Check the results of model fitting to know whether the model is
satisfactory.
• Apply the model for predictions.
58. Speeds of Cars: x = Age, y = Speed
import matplotlib.pyplot as plt
from scipy import stats

# x: age of each car, y: its recorded speed
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit the least-squares line y = slope * x + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Predicted speed for a car of age x
    return slope * x + intercept

# Predicted speed for every observed age
mymodel = list(map(myfunc, x))

# Scatter plot of the data with the fitted line on top
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
59. Predicting the Speed of a Car of Age 10
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit the least-squares line y = slope * x + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Predicted speed for a car of age x
    return slope * x + intercept

# Use the fitted line to predict the speed at age 10
speed = myfunc(10)
print(speed)
_____________________________________________________________________
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
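The slide lists this second data set without any accompanying code. One plausible use, assumed here rather than stated in the slides, is to check the correlation coefficient before trusting a fitted line for prediction. The sketch below computes Pearson's r by hand; `pearson_r` is a hypothetical helper name.

```python
import math

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(r)
```

For this data r is very close to 0, so a straight line explains almost none of the variation in y and predictions from it would not be reliable.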
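60. Implementation with scikit-learn
import numpy as np
from sklearn.linear_model import LinearRegression

# scikit-learn expects x as a 2-D array (one column per explanatory variable)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
print(x)
print(y)

# Fit the model and report the coefficient of determination (R squared)
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)
print('intercept:', model.intercept_)
print('slope:', model.coef_)

# Variant kept commented out: with a 2-D y, intercept_ and coef_ are returned as arrays
# new_model = LinearRegression().fit(x, y.reshape((-1, 1)))
# print('intercept:', new_model.intercept_)
# print('slope:', new_model.coef_)

# Predictions from the fitted model
y_pred = model.predict(x)
print('predicted response:', y_pred, sep='\n')

# The same predictions computed by hand; the result has shape (6, 1)
y_pred = model.intercept_ + model.coef_ * x
print('predicted response:', y_pred, sep='\n')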
90. Adjusted R²
• R² Never Decreases When a New X Variable Is Added to the Model (Disadvantage When Comparing Models)
• Solution: Adjusted R²
– Each additional variable reduces adjusted R², unless SSE goes down enough to compensate
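A minimal sketch of the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory variables. The slides do not state the formula, so it is supplied here as the standard definition, and the R² values below are hypothetical numbers chosen for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: a third variable raises R^2 only from 0.800 to 0.801
before = adjusted_r2(0.800, n=20, p=2)
after = adjusted_r2(0.801, n=20, p=3)
print(round(before, 4), round(after, 4))
```

R² went up slightly, but adjusted R² went down, because the small reduction in unexplained variation did not offset the penalty for the extra variable.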
91. Points Discussed:
1. Coefficient of determination (R squared)
2. Adjusted R squared
3. Mean squared error (MSE)
4. Variance
5. Correlation coefficient
6. Intercept and slope
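As a recap sketch, each listed quantity can be computed by hand for the six-point data set used in the scikit-learn example in these slides. Two conventions are assumptions here: MSE is taken as SSE divided by n (some texts divide by n - 2), and the variance shown is the sample variance of y.

```python
import math

# Data from the scikit-learn example in these slides
x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Intercept and slope of the least-squares line
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Sum of squared errors around the fitted line
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)                            # correlation coefficient
r_squared = 1 - sse / syy                                 # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)   # one explanatory variable
mse = sse / n                                             # assumed convention: divide by n
variance_y = syy / (n - 1)                                # sample variance of y

print(intercept, slope, r, r_squared, adj_r_squared, mse, variance_y)
```

With a single explanatory variable, R squared equals the square of the correlation coefficient, which the printed values confirm.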