MULTIPLE LINEAR REGRESSION
Avjinder Singh Kaler and Kristi Mai
We will look at a method for analyzing a linear relationship involving
more than two variables.
We focus on these key elements:
1. Finding the multiple regression equation.
2. The values of the adjusted $R^2$ and the p-value as measures of
how well the multiple regression equation fits the sample data.
• Multiple Regression Equation – given a collection of sample data with several ($k$ many) explanatory variables, the regression equation that algebraically describes the relationship between the response variable $y$ and two or more explanatory variables $x_1, x_2, \ldots, x_k$ is:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
• We are now using more than one explanatory variable to predict a response variable.
• In practice, you need large amounts of data to use several predictor/explanatory variables.
* Guideline: your sample size should be at least 10 times the number of $x$ variables. *
• Multiple Regression Line – the graph of the multiple regression equation
• This multiple regression line still fits the sample points best according to the least-squares property
• Visualization – multiple scatterplots of each pair $(x_k, y)$ of quantitative data can still be helpful in determining whether there is a relationship between two variables.
• These scatterplots can be created one at a time. However, it is common to visualize all the pairs of variables within one plot. This is often called a pairs plot, pairwise scatterplot, or scatterplot matrix.
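As a numeric companion to the scatterplot matrix (not a replacement, since Pearson correlations capture only linear association), the sketch below computes every pairwise correlation at once. It assumes NumPy is available; the function name `correlation_matrix` and the toy columns are hypothetical.

```python
import numpy as np

def correlation_matrix(*columns):
    """Return the matrix of pairwise Pearson correlations between the given columns.

    Entry [i, j] is the correlation between column i and column j, i.e. the
    numeric counterpart of panel (i, j) in a scatterplot matrix.
    """
    data = np.vstack([np.asarray(col, dtype=float) for col in columns])
    return np.corrcoef(data)  # each row of `data` is treated as one variable

# Hypothetical toy columns: y rises with x1 and tends to fall with x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [9, 7, 8, 4, 3, 1]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(np.round(correlation_matrix(x1, x2, y), 3))
```

The diagonal is always 1 (each variable with itself), and the matrix is symmetric, just as the panels above and below the diagonal of a scatterplot matrix mirror each other.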
Population parameter vs. sample statistic:
• Population equation (parameters): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$
• Sample equation (statistics): $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
Note:
• $\hat{y}$ is the predicted value of $y$
• $k$ is the number of predictor variables (also called independent variables or $x$ variables)
• Requirements for Regression:
1. The sample data are a simple random sample of quantitative data.
2. Each of the pairs of data $(x_k, y)$ has a bivariate normal distribution (recall this definition).
3. The random errors associated with the regression equation (i.e., the residuals) are independent and normally distributed with a mean of 0 and a standard deviation $\sigma$.
• Formulas for $b_k$:
• Statistical software will be used to calculate the individual coefficient estimates $b_k$.
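For reference, the least-squares estimates that such software reports have a standard closed form in matrix notation. This assumes the design matrix $X$ has a leading column of ones (for $b_0$) followed by one column per predictor, where $x_{ij}$ is the value of predictor $j$ for observation $i$, and that $X^\top X$ is invertible:

```latex
\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{pmatrix}
           = (X^\top X)^{-1} X^\top \mathbf{y},
\qquad
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}
```

In practice, software usually solves this with a numerically stabler factorization rather than forming the inverse explicitly.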
Guidelines for finding the best multiple regression equation:
1. Use common sense and practical considerations to include or exclude variables
2. Consider the p-value for the test of overall model significance
• Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one $\beta_i \neq 0$ (for $i = 1, \ldots, k$)
• Test statistic: $F = \dfrac{MS(\text{Regression})}{MS(\text{Error})}$
• This test produces an ANOVA table with a p-value that expresses the overall statistical significance of the model
3. Consider equations with high adjusted $R^2$ values
• $R$ is the multiple correlation coefficient that describes the correlation between the observed $y$ values and the predicted $\hat{y}$ values
• $R^2$ is the multiple coefficient of determination and measures how well the multiple regression equation fits the sample data
• Problem: this measure of model "fitness" never decreases as more variables are included; it usually increases (sometimes only by a very small amount) even when the most recently added predictor variable is not significant
• Adjusted $R^2$ is the multiple coefficient of determination modified to account for the number of variables in the model and the sample size
4. Consider equations with the fewest predictor/explanatory variables if the models being compared are nearly equivalent in terms of significance and fit (i.e., p-value and adjusted $R^2$)
• This is known as the "Law of Parsimony"
• We are looking for the simplest yet most informative model
• Individual t-tests of particular regression parameters may help select the correct model and eliminate insignificant explanatory variables
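The quantities used in guidelines 2 to 4 (the overall $F$ statistic, $R^2$, adjusted $R^2$, and the individual $t$ statistics) can all be computed from one least-squares fit. The sketch below assumes NumPy is available and uses the standard formulas $F = MS(\text{Regression})/MS(\text{Error})$ and $R^2_{adj} = 1 - \frac{n-1}{n-(k+1)}(1-R^2)$; the function name `fit_summary` and the toy data are hypothetical, not the output of any particular package.

```python
import numpy as np

def fit_summary(X, y):
    """Least-squares fit plus the statistics used to compare candidate models.

    X is an (n, k) array of predictor values, y a length-n response.
    Returns coefficients (intercept first), overall F, its degrees of freedom,
    R^2, adjusted R^2, and a t statistic for each coefficient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])          # column of ones for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes sum of squared residuals
    y_hat = design @ coef

    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)
    ss_regression = ss_total - ss_error                # valid because an intercept is included
    df_reg, df_err = k, n - k - 1
    ms_error = ss_error / df_err                       # also the estimate of sigma^2

    f_stat = (ss_regression / df_reg) / ms_error       # F = MS(Regression) / MS(Error)
    r2 = 1 - ss_error / ss_total
    adj_r2 = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

    cov = ms_error * np.linalg.inv(design.T @ design)  # covariance of the estimates
    t_stats = coef / np.sqrt(np.diag(cov))             # each has n - k - 1 df
    return {"coef": coef, "F": f_stat, "df": (df_reg, df_err),
            "r2": r2, "adj_r2": adj_r2, "t": t_stats}

# Hypothetical toy data with one predictor (k = 1).
summary = fit_summary([[1], [2], [3], [4], [5]], [2, 4, 5, 4, 5])
# coef ≈ [2.2, 0.6], F ≈ 4.5, R^2 = 0.6, adjusted R^2 ≈ 0.467
print(summary["coef"], summary["F"], summary["r2"], summary["adj_r2"])
```

If SciPy is installed, the p-values that a display such as StatCrunch reports correspond to `scipy.stats.f.sf(F, df_reg, df_err)` for the overall test and `2 * scipy.stats.t.sf(abs(t), df_err)` for each coefficient. Comparing `adj_r2` across candidate predictor sets, and preferring the smaller set when the values are nearly equal, is one way to apply guidelines 3 and 4.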
Notice: If the regression equation does not appear to be useful for predictions, the best predicted value of a $y$ variable is still its point estimate [i.e., the sample mean of the $y$ variable would be the best predicted value for that variable].
• Identify the response and potential explanatory variables by constructing a scatterplot matrix
• Create a multiple regression model
• Perform the appropriate tests of the following:
• Overall model significance (the ANOVA, i.e., the $F$ test)
• Individual variable significance ($t$ tests)
• In addition, find the following:
• The adjusted $R^2$ value, to assess the predictive power of the model
• Perform a residual analysis to verify that the requirements for linear regression have been satisfied:
1. Construct a residual plot and verify that there is no pattern (other than a straight-line pattern) and also verify that the residual plot does not become thicker or thinner
• Examples are shown below:
2. Use a histogram, normal quantile plot, or Shapiro-Wilk test of normality to confirm that the values of the residuals have a distribution that is approximately normal
• Normal Quantile Plot (aka QQ Plot) (three examples follow)
• Shapiro-Wilk Normality Test
• This will help you assess the normality of a given set of data (in this case, the normality of the residuals) when the visual examination of the QQ plot and/or the histogram of the data seems unclear and leaves you stumped!
• Hypotheses:
$H_0$: The data come from a normal distribution
$H_1$: The data do not appear to come from a normal distribution
Normal: The histogram of IQ scores is close to bell-shaped, suggesting that the IQ scores are from a normal distribution. The normal quantile plot shows points that are reasonably close to a straight-line pattern. It is safe to assume that these IQ scores are from a normally distributed population.
Uniform: The histogram shows data having a uniform distribution. The corresponding normal quantile plot suggests that the points are not normally distributed, because they show a systematic pattern that is not a straight-line pattern. These sample values are not from a population having a normal distribution.
Skewed: The histogram of the amounts of rainfall in Boston for every Monday during one year is skewed, not bell-shaped. The corresponding normal quantile plot shows points that are not at all close to a straight-line pattern. These rainfall amounts are not from a population having a normal distribution.
The table to the right includes a random sample of heights of mothers, fathers, and their daughters (based on data from the National Health and Nutrition Examination Survey).
Find the multiple regression equation in which the response ($y$) variable is the height of a daughter and the predictor ($x$) variables are the height of the mother and the height of the father.
The StatCrunch results are shown here:
From the display, we see that the multiple regression equation is:
$\text{Daughter} = 7.5 + 0.707\,\text{Mother} + 0.164\,\text{Father}$
We could write this equation as:
$\hat{y} = 7.5 + 0.707x_1 + 0.164x_2$
where $\hat{y}$ is the predicted height of a daughter, $x_1$ is the height of the mother, and $x_2$ is the height of the father.
The preceding technology display shows the adjusted coefficient of determination as R-Sq(adj) = 63.7%.
When we compare this multiple regression equation to others, it is better to use the adjusted $R^2$ of 63.7% than the unadjusted $R^2$.
Based on StatCrunch, the p-value is less than 0.0001, indicating that the multiple regression equation has good overall significance and is usable for predictions.
That is, it makes sense to predict the heights of daughters based on the heights of mothers and fathers.
The p-value results from a test of the null hypothesis that $\beta_1 = \beta_2 = 0$, and rejection of this hypothesis indicates that the equation is effective in predicting the heights of daughters.
Data Set 2 in Appendix B includes the age, foot length, shoe print length,
shoe size, and height for each of 40 different subjects.
Using those sample data, find the regression equation that is the best for
predicting height.
The table on the next slide includes key results from the combinations of
the five predictor variables.
Using critical thinking and statistical analysis:
1. Delete the variable age.
2. Delete the variable shoe size, because it is really a rounded form of foot length.
3. For the remaining variables of foot length and shoe print length, select foot length because its adjusted $R^2$ of 0.7014 is greater than 0.6520 for shoe print length.
4. Although it appears that only foot length is best, we note that criminals usually wear shoes, so shoe print lengths are more likely to be found than foot lengths.
Hence, the final regression equation includes only foot length:
$\hat{y} = b_0 + b_1 x_1$
where $b_0$ is the intercept and $b_1$ is the coefficient corresponding to the $x_1$ variable (foot length).
The methods of the above section (Multiple Linear Regression) rely on variables that are continuous in nature. Many times we are interested in dichotomous, or binary, variables.
These variables have only two possible categorical outcomes, such as male/female, success/failure, dead/alive, etc.
Indicator or dummy variables are artificial variables that can be used to specify the categories of a binary variable, such as 0 = male/1 = female.
If an indicator variable is included in the regression model as a predictor/explanatory variable, the methods we have are appropriate.
HOWEVER, can we handle a situation where the variable we are trying to predict is categorical and/or binary? Notice that this is a different situation.
But, YES!!
The data in the table also include the dummy variable of sex (coded as 0 = female and 1 = male).
Given that a mother is 63 inches tall and a father is 69 inches tall, find the regression equation and use it to predict the height of a daughter and a son.
Using technology, we get the regression equation:
$\text{Height of Child} = 25.6 + 0.377(\text{Height of Mother}) + 0.195(\text{Height of Father}) + 4.15(\text{Sex})$
We substitute 0 for the sex variable, 63 for the mother, and 69 for the father, and predict that the daughter will be 62.8 inches tall.
We substitute 1 for the sex variable, 63 for the mother, and 69 for the father, and predict that the son will be 67 inches tall.
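The substitution above is simple arithmetic; the sketch below reproduces it and makes the role of the dummy variable explicit. The coefficients are copied from the equation above (as rounded in the display); the function names `encode_sex` and `predict_child_height` are hypothetical.

```python
# Coefficients copied from the regression equation above (rounded as displayed).
INTERCEPT = 25.6
B_MOTHER = 0.377
B_FATHER = 0.195
B_SEX = 4.15

def encode_sex(label):
    """Dummy-code sex the way the example does: 0 = female, 1 = male."""
    return {"female": 0, "male": 1}[label]

def predict_child_height(mother, father, sex_label):
    """Predicted child height (inches) from the parents' heights (inches)."""
    return (INTERCEPT + B_MOTHER * mother + B_FATHER * father
            + B_SEX * encode_sex(sex_label))

daughter = predict_child_height(63, 69, "female")
son = predict_child_height(63, 69, "male")
print(round(daughter, 1), round(son, 1))  # 62.8 67.0
```

Because the dummy variable enters only as an additive term, the predicted son is always taller than the predicted daughter by exactly the sex coefficient (4.15 inches), whatever the parents' heights.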

Β 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
Β 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
Β 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
Β 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
Β 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
Β 

Recently uploaded (20)

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Β 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Β 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Β 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Β 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Β 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Β 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Β 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
Β 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
Β 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
Β 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
Β 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Β 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Β 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Β 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Β 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Β 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Β 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Β 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Β 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Β 

Multiple linear regression

  • 2. We will look at a method for analyzing a linear relationship involving more than two variables. We focus on these key elements: 1. Finding the multiple regression equation. 2. The values of the adjusted $R^2$ and the p-value as measures of how well the multiple regression equation fits the sample data.
  • 3. • Multiple Regression Equation: given a collection of sample data with several ($k$-many) explanatory variables, the multiple regression equation algebraically describes the relationship between the response variable $y$ and two or more explanatory variables $x_1, x_2, \ldots, x_k$: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$ • We now use more than one explanatory variable to predict a response variable. • In practice, you need a large amount of data to use several predictor/explanatory variables. *Guideline: your sample size should be at least 10 times the number of $x$ variables.* • Multiple Regression Line: the graph of the multiple regression equation (with two or more predictors it is a plane or hyperplane rather than a line). • This multiple regression line still fits the sample points best according to the least-squares property.
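The least-squares property can be made concrete with a short sketch. The Python below is not the StatCrunch procedure used in these slides; it is a minimal illustration, assuming a small data set with no collinear predictors, that solves the normal equations $(X^\top X)\,b = X^\top y$ by Gaussian elimination. The function name `fit_least_squares` and the toy data (generated from $y = 1 + 2x_1 + 3x_2$ with no noise) are hypothetical. Statistical packages typically use more numerically stable methods, such as a QR decomposition.

```python
def fit_least_squares(xs, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared residuals.

    xs: one row per observation, each row holding the k predictor values.
    y:  the observed responses, in the same order as xs.
    """
    # Design matrix: a leading 1 in every row carries the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in xs]
    p = len(X[0])  # p = k + 1 coefficients
    # Build the normal equations (X^T X) b = X^T y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in X) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear: X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution gives the coefficients.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (aug[i][p] - sum(aug[i][j] * b[j] for j in range(i + 1, p))) / aug[i][i]
    return b


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
xs = [[0, 0], [1, 0], [2, 1], [0, 1], [1, 2]]
y = [1, 3, 8, 4, 9]
print([round(b, 6) for b in fit_least_squares(xs, y)])
```

Because the toy points lie exactly on a plane, the recovered coefficients are 1, 2, and 3 up to rounding error; with real data the residuals are nonzero and the fit is the plane that minimizes their squared sum.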
  • 4. • Visualization: scatterplots of each pair $(x_k, y)$ of quantitative data can still help determine whether there is a relationship between two variables. • These scatterplots can be created one at a time; however, it is common to display all pairs of variables in one plot, often called a pairs plot, pairwise scatterplot, or scatterplot matrix.
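A scatterplot matrix needs a plotting library, but a numeric companion is easy to sketch: the Pearson correlation $r$ for every pair of variables, laid out in the same grid as the plots. This is a swapped-in summary, not a replacement for looking at the plots, because $r$ only measures linear association and can hide curvature or outliers. The function names and the data below are hypothetical.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def correlation_matrix(columns):
    """columns maps a variable name to its values; returns r for every pair,
    arranged like the cells of a scatterplot matrix."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: one response y and two candidate predictors.
data = {"y": [3, 5, 7, 9], "x1": [1, 2, 3, 4], "x2": [4, 1, 3, 2]}
for row_name, row in correlation_matrix(data).items():
    print(row_name, {k: round(v, 3) for k, v in row.items()})
```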
  • 5. Population parameter vs. sample statistic:
    Equation (population parameter): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$
    Equation (sample statistic): $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
    Note: • $\hat{y}$ is the predicted value of $y$ • $k$ is the number of predictor variables (also called independent variables or $x$ variables)
  • 6. • Requirements for Regression: 1. The sample data are a simple random sample of quantitative data. 2. Each of the pairs of data $(x_k, y)$ has a bivariate normal distribution (recall this definition). 3. The random errors associated with the regression equation (i.e., the residuals) are independent and normally distributed with a mean of 0 and a standard deviation $\sigma$. • Formulas for $b_k$: statistical software will be used to calculate the individual coefficient estimates $b_k$.
  • 7. 1. Use common sense and practical considerations to include or exclude variables. 2. Consider the p-value for the test of overall model significance. • Hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$; $H_1$: at least one $\beta_i \neq 0$ • Test statistic: $F = \dfrac{MS(\text{Regression})}{MS(\text{Error})}$ • This produces an ANOVA table with a p-value that expresses the overall statistical significance of the model.
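The overall $F$ statistic can be computed directly from the sums of squares in the ANOVA table: $SS(\text{Regression}) = \sum(\hat{y} - \bar{y})^2$ with $k$ degrees of freedom and $SS(\text{Error}) = \sum(y - \hat{y})^2$ with $n - k - 1$ degrees of freedom. The sketch below assumes the fitted values come from a least-squares fit that includes an intercept; the function name and the toy numbers are hypothetical. The toy fit has a single predictor ($\hat{y} = 1.5 + 0.8x$ for $x = 1, 2, 3, 4$) so it can be checked by hand, but the same formula applies for any $k$. Python's standard library has no $F$ distribution, so the p-value itself would come from software or an $F$ table.

```python
def overall_f_test(y, y_hat, k):
    """Return (F, df_regression, df_error) for H0: beta_1 = ... = beta_k = 0.

    y: observed responses; y_hat: least-squares fitted values (model with intercept);
    k: number of predictor variables in the model.
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    df_regression, df_error = k, n - k - 1
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return ms_regression / ms_error, df_regression, df_error


# Hypothetical toy data: y observed at x = 1, 2, 3, 4; fitted line y_hat = 1.5 + 0.8x.
y = [2, 3, 5, 4]
y_hat = [1.5 + 0.8 * x for x in [1, 2, 3, 4]]
f_stat, df1, df2 = overall_f_test(y, y_hat, k=1)
print(round(f_stat, 4), df1, df2)
```

Here $SS(\text{Regression}) = 3.2$ and $SS(\text{Error}) = 1.8$, so $F = (3.2/1)/(1.8/2) \approx 3.556$ with $(1, 2)$ degrees of freedom.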
  • 8. 3. Consider equations with high adjusted $R^2$ values. • $R$ is the multiple correlation coefficient, which describes the correlation between the observed $y$ values and the predicted $\hat{y}$ values. • $R^2$ is the multiple coefficient of determination and measures how well the multiple regression equation fits the sample data. • Problem: $R^2$ never decreases as more variables are included; after a point it typically rises only slightly or not at all, no matter how significant the most recently added predictor variable may be. • Adjusted $R^2$ is the multiple coefficient of determination modified to account for the number of variables in the model and the sample size.
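The adjustment can be written as $R^2_{\text{adj}} = 1 - \dfrac{n-1}{n-(k+1)}\,(1 - R^2)$, where $n$ is the sample size and $k$ is the number of predictors. The sketch below computes both measures; the function names are hypothetical, and it uses a small toy fit ($\hat{y} = 1.5 + 0.8x$ for $x = 1, 2, 3, 4$). The last line uses an assumed $R^2$ of 0.65 for a two-predictor model to show the penalty: $R^2$ rises slightly, yet adjusted $R^2$ falls.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS(Error) / SS(Total)."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - ss_error / ss_total


def adjusted_r_squared(r2, n, k):
    """R^2 adjusted for sample size n and number of predictors k."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


# Hypothetical toy fit with one predictor.
y = [2, 3, 5, 4]
y_hat = [1.5 + 0.8 * x for x in [1, 2, 3, 4]]
r2 = r_squared(y, y_hat)  # 1 - 1.8/5 = 0.64
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, k=1), 4))
# Assumed R^2 = 0.65 after adding a second, weak predictor.
print(round(adjusted_r_squared(0.65, n=4, k=2), 4))
```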
  • 9. 4. If the models being compared are nearly equivalent in terms of significance and fit (i.e., p-value and adjusted $R^2$), choose the equation with the fewest predictor/explanatory variables. • This is known as the “Law of Parsimony.” • We are looking for the simplest yet most informative model. • Individual t-tests of particular regression parameters may help select the correct model and eliminate insignificant explanatory variables. Notice: If the regression equation does not appear to be useful for predictions, the best predicted value of a $y$ variable is still its point estimate (i.e., the sample mean $\bar{y}$ is the best predicted value for that variable).
  • 10. • Identify the response and potential explanatory variables by constructing a scatterplot matrix. • Create a multiple regression model. • Perform the appropriate tests of: • overall model significance (the ANOVA, i.e., the $F$ test) • individual variable significance ($t$ tests) • In addition, find the adjusted $R^2$ value to assess the predictive power of the model.
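For the individual $t$ tests, each coefficient is divided by its standard error, $t_j = b_j / SE(b_j)$ with $n - k - 1$ degrees of freedom, where $SE(b_j) = \sqrt{MS(\text{Error}) \cdot [(X^\top X)^{-1}]_{jj}}$. The sketch below computes those quantities in plain Python, assuming a small data set with no collinear predictors; the function names and toy data are hypothetical, and p-values would still come from software or a $t$ table. With a single predictor, $t^2$ for the slope equals the overall $F$ statistic, which gives a quick sanity check.

```python
import math


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def coefficient_t_stats(xs, y):
    """Return lists (b, se, t) for the coefficients b0, b1, ..., bk."""
    X = [[1.0] + [float(v) for v in row] for row in xs]
    n, p = len(X), len(X[0])  # p = k + 1
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    ms_error = sum(e * e for e in residuals) / (n - p)  # df = n - k - 1
    se = [math.sqrt(ms_error * xtx_inv[j][j]) for j in range(p)]
    t = [bj / sj for bj, sj in zip(b, se)]
    return b, se, t


# Hypothetical toy data with one predictor.
b, se, t = coefficient_t_stats([[1], [2], [3], [4]], [2, 3, 5, 4])
print([round(v, 4) for v in b], [round(v, 4) for v in t])
```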
  • 11. • Perform a residual analysis to verify that the requirements for linear regression have been satisfied: 1. Construct a residual plot and verify that there is no pattern (other than a straight-line pattern) and that the residual plot does not become thicker or thinner. • Examples are shown below:
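The residual plot itself is a visual check, but a rough numeric companion for the "thicker or thinner" pattern can be sketched: sort the points by fitted value, split them into a lower and an upper half, and compare the mean squared residual in each half. This is an informal heuristic, loosely similar in spirit to the Goldfeld-Quandt test but not that test, and not something the slides prescribe; the function name and the data are hypothetical.

```python
def residual_spread_ratio(fitted, residuals):
    """Informal check for a fan-shaped residual plot.

    Orders the points by fitted value, splits them into lower and upper halves
    (the middle point is dropped when n is odd), and returns
    mean squared residual (upper half) / mean squared residual (lower half).
    A ratio far from 1 hints that the spread grows or shrinks; this is a
    heuristic, not a formal test. Needs at least 2 points.
    """
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]

    def mean_square(es):
        return sum(e * e for e in es) / len(es)

    return mean_square(upper) / mean_square(lower)


# Hypothetical residuals whose spread widens as the fitted values grow.
fitted = [1, 2, 3, 4, 5, 6]
residuals = [0.1, -0.1, 0.1, -0.5, 0.5, -0.5]
print(round(residual_spread_ratio(fitted, residuals), 2))
```

A ratio near 1 is consistent with constant spread; the plot should still be inspected, since curvature in the residuals would not show up in this number.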
  • 12. 2. Use a histogram, a normal quantile plot, or the Shapiro-Wilk test of normality to confirm that the residuals have a distribution that is approximately normal. • Normal Quantile Plot (aka QQ Plot) *Examples on the next 3 slides* • Shapiro-Wilk Normality Test • This helps you assess the normality of a given set of data (in this case, the residuals) when visual examination of the QQ plot and/or the histogram is unclear and leaves you stumped! • Hypotheses: $H_0$: The data come from a normal distribution. $H_1$: The data do not appear to come from a normal distribution.
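The coordinates of a normal quantile plot can be computed as follows, using the common convention (assumed here) that the $i$-th smallest of $n$ values is paired with the $z$ score whose cumulative area is $(2i-1)/(2n)$. As a hypothetical numeric summary of "close to a straight line," the sketch also reports the correlation between the $z$ scores and the sorted values. For the Shapiro-Wilk test itself, SciPy provides `scipy.stats.shapiro`, which returns the test statistic and p-value; the plain-Python code below does not implement that test, and its function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_quantile_points(values):
    """Return (z, x) pairs for a normal quantile (QQ) plot.

    The i-th smallest value (i = 1..n) is paired with the z score whose
    cumulative area under the standard normal curve is (2i - 1) / (2n).
    """
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((2 * i - 1) / (2 * n)), x) for i, x in enumerate(xs, start=1)]


def straightness(points):
    """Correlation between z and x; values near 1 mean a near-straight-line pattern."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mz, mx = sum(zs) / len(zs), sum(xs) / len(xs)
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / math.sqrt(szz * sxx)


# Hypothetical residuals from some fitted model.
residuals = [0.4, -1.2, 0.3, 0.9, -0.5, 0.1, -0.2, 1.1]
points = normal_quantile_points(residuals)
for z, x in points:
    print(f"{z:6.3f}  {x:5.2f}")
print("straightness r =", round(straightness(points), 3))
```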
  • 13. Normal: The histogram of IQ scores is close to bell-shaped, which suggests that the IQ scores are from a normal distribution. The normal quantile plot shows points that are reasonably close to a straight-line pattern. It is safe to assume that these IQ scores are from a normally distributed population.
  • 14. Uniform: Histogram of data having a uniform distribution. The corresponding normal quantile plot suggests that the points are not normally distributed because the points show a systematic pattern that is not a straight-line pattern. These sample values are not from a population having a normal distribution.
  • 15. Skewed: Histogram of the amounts of rainfall in Boston for every Monday during one year. The shape of the histogram is skewed, not bell-shaped. The corresponding normal quantile plot shows points that are not at all close to a straight-line pattern. These rainfall amounts are not from a population having a normal distribution.
  • 16. The table to the right includes a random sample of heights of mothers, fathers, and their daughters (based on data from the National Health and Nutrition Examination Survey). Find the multiple regression equation in which the response ($y$) variable is the height of a daughter and the predictor ($x$) variables are the height of the mother and the height of the father.
  • 17. The StatCrunch results are shown here. From the display, we see that the multiple regression equation is: $\text{Daughter} = 7.5 + 0.707\,\text{Mother} + 0.164\,\text{Father}$ We could write this equation as $\hat{y} = 7.5 + 0.707x_1 + 0.164x_2$, where $\hat{y}$ is the predicted height of a daughter, $x_1$ is the height of the mother, and $x_2$ is the height of the father.
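Using the fitted equation is just arithmetic. The helper below hard-codes the coefficients from the StatCrunch display; its name is hypothetical, and the inputs (a 63-inch mother and a 69-inch father) are borrowed from the dummy-variable example later in these slides.

```python
def predict_daughter_height(mother, father):
    """Predicted daughter height (inches) from y_hat = 7.5 + 0.707*x1 + 0.164*x2."""
    return 7.5 + 0.707 * mother + 0.164 * father


# 7.5 + 0.707*63 + 0.164*69 = 7.5 + 44.541 + 11.316 = 63.357
print(round(predict_daughter_height(63, 69), 1))
```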
  • 18. The preceding technology display shows the adjusted coefficient of determination as R-Sq(adj) = 63.7%. When we compare this multiple regression equation to others, it is better to use the adjusted $R^2$ of 63.7% than the unadjusted $R^2$.
  • 19. Based on StatCrunch, the p-value is less than 0.0001, indicating that the multiple regression equation has good overall significance and is usable for predictions. That is, it makes sense to predict the heights of daughters based on the heights of mothers and fathers. The p-value results from a test of the null hypothesis that $\beta_1 = \beta_2 = 0$, and rejection of this hypothesis indicates the equation is effective in predicting the heights of daughters.
  • 20. Data Set 2 in Appendix B includes the age, foot length, shoe print length, shoe size, and height for each of 40 different subjects. Using those sample data, find the regression equation that is best for predicting height. The table on the next slide includes key results from the combinations of the four predictor variables.
  • 29. Using critical thinking and statistical analysis: 1. Delete the variable age. 2. Delete the variable shoe size, because it is really a rounded form of foot length. 3. For the remaining variables of foot length and shoe print length, select foot length because its adjusted $R^2$ of 0.7014 is greater than 0.6520 for shoe print length. 4. Although it appears that only foot length is best, we note that criminals usually wear shoes, so shoe print lengths are more likely to be found than foot lengths. Based on the adjusted $R^2$ comparison, the final regression equation includes only foot length: $\hat{y} = b_0 + b_1 x_1$, where $b_0$ is the intercept and $b_1$ is the coefficient of the $x_1$ variable (foot length).
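The comparison behind steps 1 to 3 (fitting every combination of candidate predictors and ranking the models by adjusted $R^2$) can be sketched as below. This is not the procedure or software used on the slides, and it uses hypothetical toy data rather than Data Set 2; ties in adjusted $R^2$ are broken in favor of fewer predictors, following the Law of Parsimony. The function names are hypothetical, and the code assumes no collinear predictors and more observations than predictors plus one.

```python
from itertools import combinations


def solve_normal_equations(X, y):
    """Least-squares coefficients for design matrix X (first column all 1s)."""
    p = len(X[0])
    aug = [[sum(r[i] * r[j] for r in X) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (aug[i][p] - sum(aug[i][j] * b[j] for j in range(i + 1, p))) / aug[i][i]
    return b


def rank_subsets(columns, y):
    """Fit every non-empty subset of predictors; return (adjusted R^2, subset)
    pairs sorted best first, preferring fewer predictors on ties."""
    n = len(y)
    y_bar = sum(y) / n
    ss_total = sum((v - y_bar) ** 2 for v in y)
    results = []
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            X = [[1.0] + [float(columns[name][i]) for name in subset] for i in range(n)]
            b = solve_normal_equations(X, y)
            ss_error = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
                           for row, yi in zip(X, y))
            r2 = 1 - ss_error / ss_total
            adj_r2 = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)
            results.append((adj_r2, subset))
    results.sort(key=lambda item: (-item[0], len(item[1])))
    return results


# Hypothetical toy data: y is generated from x1; x2 plays no part in generating y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, -1, 1, -1, 1, -1]
y = [2 + 3 * a + e for a, e in zip(x1, [0.1, -0.2, 0.15, 0.0, -0.1, 0.05])]
for adj_r2, subset in rank_subsets({"x1": x1, "x2": x2}, y):
    print(subset, round(adj_r2, 4))
```

The ranking only informs the choice; as steps 2 and 4 show, judgment about what each variable means still matters.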
  • 30. The methods of the above section (Multiple Linear Regression) rely on variables that are continuous in nature. Often, however, we are interested in dichotomous or binary variables, which have only two possible categorical outcomes, such as male/female, success/failure, or dead/alive. Indicator (dummy) variables are artificial variables that specify the categories of a binary variable, such as 0 = male / 1 = female. If an indicator variable is included in the regression model as a predictor/explanatory variable, the methods we have are appropriate. HOWEVER, can we handle a situation in which the variable we are trying to predict is categorical and/or binary? Notice that this is a different situation. But, YES!!
  • 31. The data in the table also includes the dummy variable of sex (coded as 0 = female and 1 = male). Given that a mother is 63 inches tall and a father is 69 inches tall, find the regression equation and use it to predict the height of a daughter and a son.
  • 32. Using technology, we get the regression equation: $\text{Height of Child} = 25.6 + 0.377\,(\text{Height of Mother}) + 0.195\,(\text{Height of Father}) + 4.15\,(\text{sex})$ We substitute 0 for the sex variable, 63 for the mother, and 69 for the father, and predict that the daughter will be 62.8 inches tall. We substitute 1 for the sex variable, 63 for the mother, and 69 for the father, and predict that the son will be 67 inches tall.
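The substitution can be checked in a few lines. The sketch below hard-codes the coefficients from the technology output and the slide's coding (0 = female, 1 = male); the function and dictionary names are hypothetical.

```python
# Indicator coding used on the slide: 0 = female (daughter), 1 = male (son).
SEX_CODE = {"female": 0, "male": 1}


def predict_child_height(mother, father, sex):
    """Predicted child height (inches) from
    25.6 + 0.377*mother + 0.195*father + 4.15*sex."""
    return 25.6 + 0.377 * mother + 0.195 * father + 4.15 * SEX_CODE[sex]


for sex in ("female", "male"):
    print(sex, round(predict_child_height(63, 69, sex), 1))
```

Because sex enters the model as a 0/1 indicator, the two predictions differ by exactly the sex coefficient, 4.15 inches, for any fixed pair of parent heights.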