IPS
Introduction to Probability & Statistics
Lecture - 20
Prof. Pritam Ranjan
Regression
In statistics, regression analysis is a statistical process for
estimating the relationships between variables.
Pritam Ranjan / OM&QT 1
Motivating problem
1. Name: Name of cereal
2. mfr: Manufacturer of cereal (e.g., Kelloggs, Post, Quaker Oats)
3. type: cold or hot
4. calories: calories per serving
5. protein: grams of protein
6. fat: grams of fat
7. sodium: milligrams of sodium
8. fiber: grams of dietary fiber
9. carbo: grams of complex carbohydrates
10. sugars: grams of sugars
11. potass: milligrams of potassium
12. vitamins: vitamins and minerals % of FDA recommended
13. shelf: display shelf (1, 2, or 3, counting from the floor)
14. weight: weight in ounces of one serving
15. cups: number of cups in one serving
16. rating: a rating of the cereals
Simpler problem
[Figure: scatter plot of Quiz 1 and Midterm scores. Section G: red circles; Section H: blue triangles.]
Regression setup
• Variable types
Y - Response variable / Dependent variable
X - Predictor variable / Independent variable
Regression types
• Multiple regression model – One Y, multiple {X1, X2, ..., Xp}
• Simple regression model – One Y, one predictor X
• Simple linear regression model
• Non-linear regression model
  ◦ Spline regression
  ◦ Deep learning models
  ◦ Gaussian regression
  ◦ Non-parametric regression (smoothing)
• Generalized linear regression model
• Multivariate regression model
Simple linear regression (SLR) model
[Figure: scatter plot of Quiz 1 and Midterm scores.]
SLR- model statement
• Paired data: $(X_i, Y_i)$, $i = 1, 2, \dots, n$
• Model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \dots, n, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
OR
$$E(Y \mid X = X_i) = \beta_0 + \beta_1 X_i$$
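A minimal sketch of what the model statement says, assuming hypothetical parameter values (β0 = 25, β1 = 0.6, σ = 5) and made-up X values; none of these numbers come from the lecture. Each Y is the value on the line plus an independent normal error.

```python
import random

def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i iid N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical inputs: ten X values and invented parameter values.
x = [20, 30, 40, 50, 60, 70, 80, 90, 95, 100]
y = simulate_slr(x, beta0=25.0, beta1=0.6, sigma=5.0)

for xi, yi in zip(x, y):
    # E(Y | X = xi) is the point on the line; Y scatters around it.
    print(f"X = {xi:3d}   E(Y|X) = {25.0 + 0.6 * xi:5.1f}   Y = {yi:6.2f}")
```

Setting `sigma=0.0` in this sketch returns exactly the points on the line, which is the second form of the model, E(Y | X = X_i).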
SLR - model assumptions
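• Errors have mean zero
$$E(\varepsilon_i) = 0$$
• Errors have constant variance
$$\mathrm{Var}(\varepsilon_i) = \sigma^2$$
• Errors are independent / uncorrelated
$$\mathrm{Corr}(\varepsilon_i, \varepsilon_j) = 0, \quad i \neq j$$
• Relationship between Y and X is linear
$$E(Y \mid X = X_i) = \beta_0 + \beta_1 X_i$$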
Simple linear regression
• Model fitting
⇒ estimating parameters
• Intuitively: what should the fitted line look like?
Simple linear regression
• Model fitting
→ how to estimate the parameters?
SLR - model fitting
• Ordinary least squares (OLS) method
$$\text{minimize } Q = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
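To see the criterion at work, here is a rough sketch that evaluates Q on a coarse grid of candidate lines and keeps the smallest value. The five data points and the grid limits are hypothetical, and a grid search is used only for illustration; it is not how OLS is computed in practice.

```python
def q(beta0, beta1, xs, ys):
    """Sum of squared errors Q for the candidate line beta0 + beta1 * x."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Coarse grid: beta0 in {0.0, 0.1, ..., 4.0}, beta1 in {0.00, 0.05, ..., 1.00}.
candidates = [(i / 10, j / 20) for i in range(41) for j in range(21)]
best_b0, best_b1 = min(candidates, key=lambda b: q(b[0], b[1], xs, ys))

print(f"best grid point: b0 = {best_b0:.2f}, b1 = {best_b1:.2f}, "
      f"Q = {q(best_b0, best_b1, xs, ys):.2f}")
# prints "best grid point: b0 = 2.20, b1 = 0.60, Q = 2.40"
```

For this toy data the grid happens to contain the exact least-squares line, so the search lands on it; in general a grid only approximates the minimiser.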
SLR - parameter estimates
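• Intercept
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
• Slope
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}), \quad S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
• Error variance (maximum likelihood estimate, biased)
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$
• Error variance (unbiased estimate, the version used for inference)
$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$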
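The estimates above can be sketched as a small function. The name `fit_slr` and the toy data are hypothetical; the formulas are the closed-form ones from this slide, with both the 1/n and the 1/(n − 2) variance estimates returned.

```python
def fit_slr(xs, ys):
    """Return (b0, b1, sigma2_mle, sigma2_unbiased) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                 # slope
    b0 = y_bar - b1 * x_bar          # intercept
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / n, sse / (n - 2)

# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1, s2_mle, s2_unbiased = fit_slr(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, "
      f"sigma2 (1/n) = {s2_mle:.2f}, sigma2 (1/(n-2)) = {s2_unbiased:.2f}")
```

The 1/(n − 2) version is always the larger of the two, and the gap matters most for small n.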
Example
• A pharmaceutical manufacturer wants to determine the concentration of a key
component of cough medicine that may be used without the drug’s causing
adverse side effects. As part of the analysis, a random sample of 45 patients is
administered doses of varying concentration (X), and the severity of side effects
(Y) is measured. Find the fitted regression line.
$\bar{x} = 88.9$, $\bar{y} = 165.3$, $S_{xx} = 2133.9$, $S_{xy} = 4502.53$, $S_{yy} = 12500$.
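One way to work the example, using only the summary statistics given above and the slope and intercept formulas from the parameter-estimates slide. The variable names are illustrative.

```python
# Summary statistics from the example.
x_bar, y_bar = 88.9, 165.3
s_xx, s_xy = 2133.9, 4502.53

b1 = s_xy / s_xx            # slope: S_xy / S_xx
b0 = y_bar - b1 * x_bar     # intercept: y_bar - b1 * x_bar

print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")
# prints "fitted line: y_hat = -22.28 + 2.11 x"
```

So each additional unit of concentration is associated with an estimated increase of about 2.11 in the severity score.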
Inference
SLR - important distributions
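• Error variable
$$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$
• Conditional distribution of response variable (NOT iid)
$$Y \mid (X = X_i) \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$$
• Slope parameter
$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$$
• Variance parameter
$$\frac{(n - 2)\hat{\sigma}^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2}{\sigma^2} \sim \chi^2_{(n-2)}$$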
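The sampling distribution of the slope can be checked informally by simulation. This is a sketch under assumed values (fixed design X = 1, ..., 20, β0 = 2, β1 = 0.5, σ = 3, 5000 replications), none of which come from the lecture; the simulated mean and variance of the slope estimates should land close to β1 and σ²/S_xx.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope S_xy / S_xx."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

rng = random.Random(1)                    # seeded for reproducibility
xs = [float(x) for x in range(1, 21)]     # hypothetical fixed design
beta0, beta1, sigma = 2.0, 0.5, 3.0       # hypothetical true parameters
x_bar = sum(xs) / len(xs)
s_xx = sum((x - x_bar) ** 2 for x in xs)

slopes = []
for _ in range(5000):
    # New errors each replication; the X values stay fixed.
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    slopes.append(ols_slope(xs, ys))

mc_mean = statistics.mean(slopes)
mc_var = statistics.variance(slopes)
theory_var = sigma ** 2 / s_xx
print(f"mean of slopes     = {mc_mean:.4f} (theory {beta1})")
print(f"variance of slopes = {mc_var:.5f} (theory {theory_var:.5f})")
```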
SLR - parameter of interest
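• Slope parameter
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$
$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$$
$$\beta_1 \in \left( \hat{\beta}_1 - t_{n-2,\alpha/2} \frac{\hat{\sigma}}{S_x}, \ \hat{\beta}_1 + t_{n-2,\alpha/2} \frac{\hat{\sigma}}{S_x} \right)$$
where $S_x = \sqrt{S_{xx}}$, so that $\hat{\sigma}/S_x$ is the estimated standard error of $\hat{\beta}_1$.
Reject $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ if
$$\left| \frac{\hat{\beta}_1 - 0}{\hat{\sigma}/S_x} \right| \geq t_{n-2,\alpha/2}$$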
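A sketch of the interval and the test applied to the cough-medicine example (n = 45). It assumes S_x = sqrt(S_xx), uses the standard identity SSE = S_yy − S_xy²/S_xx, and takes the critical value t_{43, 0.025} ≈ 2.017 from a t table; that table value is an assumption typed in by hand, not computed here.

```python
import math

n = 45
s_xx, s_xy, s_yy = 2133.9, 4502.53, 12500.0
t_crit = 2.017  # assumed table value for t_{n-2, alpha/2} with n = 45, alpha = 0.05

b1 = s_xy / s_xx
sse = s_yy - s_xy ** 2 / s_xx            # SSE from summary statistics
sigma_hat = math.sqrt(sse / (n - 2))     # unbiased error-variance estimate
se_b1 = sigma_hat / math.sqrt(s_xx)      # sigma_hat / S_x

ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = (b1 - 0.0) / se_b1
reject = abs(t_stat) >= t_crit

print(f"b1 = {b1:.3f}, se = {se_b1:.4f}")
print(f"95% CI for beta1: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.2f}, reject H0: {reject}")
```

With these numbers the t statistic is roughly 11.7, far beyond the critical value, so H0: β1 = 0 is rejected at the 5% level.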
SLR - parameter of interest
• Slope parameter: $\beta_1 = 0$ vs. $\beta_1 \neq 0$
Prediction
SLR - Prediction
• Two quantities of interest
(1) Estimate the regression line
$$E[Y \mid X = X_0] = \beta_0 + \beta_1 X_0$$
(2) Predict a future observation
$$Y \mid (X = X_{new}) = \beta_0 + \beta_1 X_{new} + \varepsilon_{new}$$
SLR - Prediction
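• Confidence interval - interval estimate for the regression line, i.e. the mean response $E[Y \mid X = X_0]$
[95% confidence that the true regression line at $X_0$ lies in this band]
• Prediction interval - interval estimate for a new observation $Y$ at $X = X_{new}$
[95% chance that the new observation lies in this band; wider than the confidence interval because it also includes the error $\varepsilon_{new}$]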
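The slides do not give the interval formulas, so the sketch below uses the standard textbook ones: standard error σ̂·sqrt(1/n + (x0 − x̄)²/S_xx) for the mean response and σ̂·sqrt(1 + 1/n + (x0 − x̄)²/S_xx) for a new observation. It reuses the cough-medicine summary statistics; the point x0 = 90 and the table value t_{43, 0.025} ≈ 2.017 are assumptions.

```python
import math

n = 45
x_bar, y_bar = 88.9, 165.3
s_xx, s_xy, s_yy = 2133.9, 4502.53, 12500.0
t_crit = 2.017   # assumed table value for t_{43, 0.025}
x0 = 90.0        # hypothetical concentration at which to predict

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sigma_hat = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

y_hat = b0 + b1 * x0
h = 1.0 / n + (x0 - x_bar) ** 2 / s_xx      # grows as x0 moves away from x_bar
se_mean = sigma_hat * math.sqrt(h)          # uncertainty in the line at x0
se_pred = sigma_hat * math.sqrt(1.0 + h)    # adds the new observation's error

conf_int = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

print(f"y_hat at x0 = {x0}: {y_hat:.2f}")
print(f"95% confidence interval: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% prediction interval: ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

Both intervals are centred on the same fitted value; the prediction interval is much wider because the extra 1 under the square root does not shrink as n grows.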
Model assessment
SLR - ANOVA
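• Total variability - Sum of squares total
$$SSTo = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad df = n - 1$$
• Unexplained variability - Sum of squares error
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad df = n - k$$
• Variability explained by the model - Sum of squares regression
$$SSReg = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad df = k - 1$$
Here $k$ is the number of regression parameters ($k = 2$ for SLR: $\beta_0$ and $\beta_1$), and $SSTo = SSE + SSReg$.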
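The decomposition can be verified numerically. This is a small sketch on hypothetical toy data, with k = 2 parameters as in SLR; the three sums of squares are computed directly from their definitions.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n, k = len(xs), 2   # k = number of regression parameters (beta0, beta1)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

ssto = sum((y - y_bar) ** 2 for y in ys)                 # total
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # unexplained
ssreg = sum((f - y_bar) ** 2 for f in fitted)            # explained

print(f"SSTo  = {ssto:.2f}  (df = {n - 1})")
print(f"SSE   = {sse:.2f}  (df = {n - k})")
print(f"SSReg = {ssreg:.2f}  (df = {k - 1})")
```

For this data SSTo = 6.0 splits into SSE = 2.4 and SSReg = 3.6, and the degrees of freedom add up the same way: 4 = 3 + 1.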
SLR - GOF ($R^2$)
• Goodness of fit measures – $R^2$ should be close to 1
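The slide does not spell out the formula, so this sketch assumes the usual definition R² = SSReg / SSTo = 1 − SSE / SSTo, which in SLR reduces to S_xy² / (S_xx · S_yy). It is applied to the cough-medicine summary statistics.

```python
s_xx, s_xy, s_yy = 2133.9, 4502.53, 12500.0

ssto = s_yy                      # total sum of squares
ssreg = s_xy ** 2 / s_xx         # explained by the fitted line
sse = ssto - ssreg               # left unexplained

r2 = ssreg / ssto
print(f"R^2 = {r2:.3f}")
# prints "R^2 = 0.760"
```

Under this definition about 76% of the variability in side-effect severity is explained by concentration in that example.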
Model Diagnostic
SLR - diagnostic check
• Is the relation between X and Y linear?
• Is the error variance constant?
• Are the errors random?
• Are the errors normal?
• Are there any outliers?
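These checks are usually made with residual plots (residuals against fitted values, a normal Q-Q plot). As a rough numeric companion, the sketch below computes residuals, flags points whose residual exceeds 2σ̂ in absolute value, and reports the Durbin-Watson statistic, where values near 2 suggest no first-order serial correlation. The function name, the cutoff of 2, and the data (with one planted outlier at X = 8) are all hypothetical.

```python
import math

def residual_checks(xs, ys, cutoff=2.0):
    """Return (residuals, indices of large residuals, Durbin-Watson statistic)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    sse = sum(e * e for e in resid)
    sigma_hat = math.sqrt(sse / (n - 2))
    # Crude scaling by sigma_hat only; leverage is ignored in this sketch.
    flagged = [i for i, e in enumerate(resid) if abs(e / sigma_hat) > cutoff]
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse
    return resid, flagged, dw

# Hypothetical data: roughly linear, with an outlier planted at index 7.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 25.0, 18.1, 20.0]
resid, flagged, dw = residual_checks(xs, ys)
print("flagged indices:", flagged)
print(f"Durbin-Watson: {dw:.2f}")
```

A flagged point is a prompt to look at the data, not a licence to delete it; a single outlier also distorts the fitted line and the other residuals.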
Violations !!
Violation of Assumptions
• f(·) is non-linear (transformation /non-linear regression)
• Remove outliers (be super careful)
• Non-normal data (CI and testing via Bootstrapping)
• Non-constant variance (transformation / different model)
• εi's are correlated with each other
• Non-zero correlation between ε and X?
• Randomness in X is also present.
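For the non-normal case, one possible bootstrap scheme is to resample (X, Y) pairs with replacement and take percentiles of the refitted slopes. The lecture does not say which bootstrap variant is intended, so the pairs bootstrap, the function names, the data, and the 2000 resamples below are all assumptions.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope S_xy / S_xx."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope from a pairs bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:   # slope undefined if all resampled X coincide
            continue
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.3, 3.8, 6.9, 7.7, 10.4, 11.6, 15.2, 15.9, 18.3, 19.7]
lo, hi = bootstrap_slope_ci(xs, ys)
print(f"slope estimate: {ols_slope(xs, ys):.3f}")
print(f"95% bootstrap CI: ({lo:.3f}, {hi:.3f})")
```

Because the interval comes from the empirical distribution of refitted slopes, it does not rely on the normal-error assumption behind the t interval.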
Next
• End-term
• Good luck !!