Lec04.pdf

ssuserbad56d

Linear regression

Data & Analytics

What is a Math/Stats Model
Describe relationships between variables
Deterministic models (no randomness)
Probabilistic models (with randomness)

Deterministic Models
Hypothesize exact relationships
Suitable when prediction error is negligible
Example: Gravity force: $F = G m_1 m_2 / d^2$

Probabilistic Models
Hypothesize two components of the relationship
Deterministic
Random error
Example: Systolic blood pressure of newborns: p = 6d + ϵ
Random error may be due to other factors (e.g. birth weight)
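
The split into a deterministic part and a random error can be illustrated with a small simulation. This is only a sketch: the slide does not define d or the error distribution, so the code treats d as a plain numeric predictor and assumes a Gaussian error with a hypothetical standard deviation `noise_sd`.

```python
import random

def blood_pressure(d, noise_sd=2.0, rng=random):
    """Simulate p = 6d + error. The Gaussian error and noise_sd are assumptions."""
    deterministic = 6 * d                 # deterministic component from the slide
    error = rng.gauss(0.0, noise_sd)      # random error term (assumed Gaussian)
    return deterministic + error

rng = random.Random(0)                    # fixed seed so the run is repeatable
deterministic_only = 6 * 3
samples = [blood_pressure(3, rng=rng) for _ in range(5)]

# The same d gives one deterministic value but a different p on every draw.
print("deterministic part:", deterministic_only)
print("simulated p values:", [round(p, 1) for p in samples])
```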

Regression Model
Model relationship between one dependent variable and one or several explanatory variable(s)
bug = α * code size + β * prior bugs + γ * changes + ϵ
Used mainly for prediction and estimation

Regression Modeling Steps
1. Hypothesize deterministic component
2. Specify probability distribution of random error term
3. Evaluate fitted model
4. Use model for prediction and estimation

Model Specification
Specifying the deterministic component
1. Define the dependent variable and the independent variable(s)
2. Hypothesize nature of relationship
Functional form (e.g. linear or non‑linear)
Expected Effects (i.e., signs of coefficients)
Interactions between variables

Linear Regression Model
Relationship between variables is a linear function
Y = aX + b + ϵ

Estimating Parameters
Compute model parameters that best fit data

Example: Cow's food intake and milk
Food (lb) Milk yield (lb)
4 3.0
6 5.5
10 6.5
12 9.0

Least Squares Method
Best fit means minimizing the sum of squared errors (SSE)
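
As a minimal sketch, the least squares fit for the cow example can be computed with the standard closed-form solution for a single predictor: slope = Sxy / Sxx and intercept = mean(Y) − slope × mean(X). The function names `fit_line` and `sse` are illustrative and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared errors for Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                # slope
    b = mean_y - a * mean_x      # intercept
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared errors of the line Y = aX + b on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

food = [4, 6, 10, 12]            # food intake (lb), from the example table
milk = [3.0, 5.5, 6.5, 9.0]      # milk yield (lb), from the example table

a, b = fit_line(food, milk)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}, SSE = {sse(food, milk, a, b):.2f}")
# prints: slope a = 0.65, intercept b = 0.80, SSE = 1.60
```

Any other line gives a larger SSE on these four points. For instance, `sse(food, milk, 0.7, 0.8)` exceeds the fitted value.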

Interpretation of Coefficients
Slope: Estimated Y changes by a (the fitted slope) for each 1-unit increase in X
Intercept: Average value of Y when X = 0, i.e., the fitted b
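
To make the interpretation concrete, the sketch below uses a = 0.65 and b = 0.80, the values a least squares fit gives on the four cow observations from the example. The helper name `predict` is illustrative.

```python
a, b = 0.65, 0.80    # least squares slope and intercept for the cow example

def predict(x):
    """Predicted milk yield (lb) for a food intake of x lb."""
    return a * x + b

change = predict(9) - predict(8)   # effect of one extra lb of food
at_zero = predict(0)               # intercept: prediction at X = 0

print(f"+1 lb food -> {change:.2f} lb more milk")
print(f"predicted yield at 0 lb food = {at_zero:.2f} lb")
```

The observed food intakes range from 4 to 12 lb, so the value at X = 0 is an extrapolation and should be read with caution.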

Explanatory and predictive power
$R^2$ is a measure of goodness-of-fit, i.e., how well the model fits all of the training data
$R^2 = 1 - \frac{\mathrm{var}(\hat{Y} - Y)}{\mathrm{var}(Y)}$, where $Y$ is the actual dependent variable and $\hat{Y}$ is the fitted value
$R^2$ is also called a measure of explanatory power, i.e., how well the model explains the data it is trained on
Predictive power indicates how well the model predicts new data (data not used for training, also called testing data)
$\mathrm{MAE} = \mathrm{mean}(|\bar{Y} - Y|)$, where $Y$ is the actual dependent variable and $\bar{Y}$ is the value predicted on the testing data
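
A short sketch of both measures, following the definitions above. The training data are the cow observations from the example, with a = 0.65 and b = 0.80 from a least squares fit to them. The two testing observations are hypothetical values invented for illustration and are not part of the slides.

```python
from statistics import mean, pvariance

def r_squared(actual, fitted):
    """R^2 = 1 - var(fitted - actual) / var(actual), computed on training data."""
    residuals = [f - y for f, y in zip(fitted, actual)]
    return 1 - pvariance(residuals) / pvariance(actual)

def mean_absolute_error(actual, predicted):
    """MAE = mean(|predicted - actual|), computed on testing data."""
    return mean([abs(p - y) for p, y in zip(predicted, actual)])

a, b = 0.65, 0.80                         # least squares fit to the cow data

train_food = [4, 6, 10, 12]
train_milk = [3.0, 5.5, 6.5, 9.0]
fitted = [a * x + b for x in train_food]
r2 = r_squared(train_milk, fitted)

test_food = [5, 8]                        # hypothetical testing data
test_milk = [4.0, 6.5]                    # hypothetical testing data
predicted = [a * x + b for x in test_food]
mae = mean_absolute_error(test_milk, predicted)

print(f"R^2 on training data = {r2:.3f}")
print(f"MAE on (hypothetical) testing data = {mae:.3f} lb")
```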

Cross‑validation
Used to estimate predictive power when only a single dataset is available:
1. Divide dataset into two subsets: training and testing data
2. Train the model on training data and make prediction for testing data
3. Repeat many times
4. Compute the final mean absolute error
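
The four steps above can be sketched as repeated random train/test splits. The slides do not fix the split size, the number of repetitions, or the random seed, so `test_size`, `n_repeats`, and `seed` are assumptions, as is the function name `cross_validated_mae`. The data are the cow observations from the example.

```python
import random

def fit_line(xs, ys):
    """Least squares slope and intercept for Y = aX + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def cross_validated_mae(xs, ys, n_repeats=200, test_size=1, seed=0):
    """Mean absolute error over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    errors = []
    for _ in range(n_repeats):                        # step 3: repeat many times
        rng.shuffle(indices)                          # step 1: random split
        test_idx, train_idx = indices[:test_size], indices[test_size:]
        a, b = fit_line([xs[i] for i in train_idx],   # step 2: train ...
                        [ys[i] for i in train_idx])
        errors.extend(abs((a * xs[i] + b) - ys[i])    # ... and predict test data
                      for i in test_idx)
    return sum(errors) / len(errors)                  # step 4: final MAE

food = [4, 6, 10, 12]
milk = [3.0, 5.5, 6.5, 9.0]
cv_mae = cross_validated_mae(food, milk)
print(f"cross-validated MAE: {cv_mae:.2f} lb")
```

With only four observations the estimate is very rough. The same function applies unchanged to larger datasets, where a bigger `test_size` becomes practical.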
