This document discusses multiple linear regression analysis. It begins by defining a multiple regression equation that describes the relationship between a response variable and two or more explanatory variables. It notes that multiple regression allows prediction of a response using more than one predictor variable. The document outlines key elements of multiple regression including visualization of relationships, statistical significance testing, and evaluating model fit. It provides examples of interpreting multiple regression output and using the technique to predict outcomes.
Introduces and explains the use of multiple linear regression, a multivariate correlational statistical technique. For more info, see the lecture page at http://goo.gl/CeBsv. See also the slides for the MLR II lecture http://www.slideshare.net/jtneill/multiple-linear-regression-ii
Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more other variables.
It introduces the reader to the basic concepts behind regression, a key advanced analytics technique. It describes simple and multiple linear regression in detail and notes some limitations of linear regression. Logistic regression is touched upon but not explored in depth in this presentation.
2. We will look at a method for analyzing a linear relationship involving more than two variables.
We focus on these key elements:
1. Finding the multiple regression equation.
2. The values of adjusted R² and the p-value as measures of how well the multiple regression equation fits the sample data.
3. • Multiple Regression Equation – given a collection of sample data with several (k many) explanatory variables, the regression equation that algebraically describes the relationship between the response variable y and two or more explanatory variables x1, x2, …, xk is:
ŷ = b0 + b1x1 + b2x2 + ⋯ + bkxk
• We are now using more than one explanatory variable to predict a response variable.
• In practice, you need large amounts of data to use several predictor/explanatory variables.
*Guideline: your sample size should be at least 10 times the number of x variables.*
• Multiple Regression Line – the graph of the multiple regression equation.
• This multiple regression line still fits the sample points best according to the least-squares property.
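As a minimal sketch (not from the slides), the multiple regression equation above is just a linear combination of the predictors. The coefficients and inputs here are hypothetical placeholders, not fitted values:

```python
# Evaluate y-hat = b0 + b1*x1 + ... + bk*xk for one observation.
# All numbers below are illustrative, not estimated from data.
def predict(b0, coeffs, xs):
    """Predicted response for one observation."""
    return b0 + sum(b * x for b, x in zip(coeffs, xs))

y_hat = predict(2.0, [0.5, -1.25], [4.0, 2.0])  # 2.0 + 2.0 - 2.5
print(y_hat)  # 1.5
```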
4. • Visualization – multiple scatterplots of each pair (xi, y) of quantitative data can still be helpful in determining whether there is a relationship between two variables.
• These scatterplots can be created one at a time. However, it is common to visualize all the pairs of variables within one plot. This is often called a pairs plot, pairwise scatterplot, or scatterplot matrix.
5. Population parameter (equation): y = β0 + β1x1 + β2x2 + ⋯ + βkxk
Sample statistic (equation): ŷ = b0 + b1x1 + b2x2 + ⋯ + bkxk
Note:
• ŷ is the predicted value of y
• k is the number of predictor variables (also called independent variables or x variables)
6. • Requirements for Regression:
1. The sample data is a simple random sample of quantitative data.
2. Each of the pairs of data (xi, y) has a bivariate normal distribution (recall this definition).
3. Random errors associated with the regression equation (i.e. residuals) are independent and normally distributed with a mean of 0 and a standard deviation σ.
• Formulas for bi:
• Statistical software will be used to calculate the individual coefficient estimates, bi.
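The slides use statistical software for the coefficient estimates; as a sketch of what that software computes, here is a least-squares fit with NumPy on synthetic data whose true coefficients are known, so we can check that they are recovered (the data and coefficients are invented for illustration):

```python
import numpy as np

# Synthetic data: y = 3.0 + 1.5*x1 - 2.0*x2 + small noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(0, 0.1, n)

# Design matrix with a leading column of 1s for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||Xb - y||^2

print(b)  # approximately [3.0, 1.5, -2.0]
```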
7. 1. Use common sense and practical considerations to include or exclude variables.
2. Consider the p-value for the test of overall model significance.
• Hypotheses:
H0: β1 = β2 = ⋯ = βk = 0
H1: At least one βi ≠ 0
• Test statistic: F = MS(Regression) / MS(Error)
• This results in an ANOVA table with a p-value that expresses the overall statistical significance of the model.
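The overall F statistic can be sketched directly from its definition, F = [SS(Regression)/k] / [SS(Error)/(n − k − 1)], on synthetic data (invented here for illustration) where the predictors genuinely matter:

```python
import numpy as np

# Synthetic data with a real linear relationship, so F should be large.
rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # SS(Regression)
ss_err = np.sum((y - y_hat) ** 2)          # SS(Error)
F = (ss_reg / k) / (ss_err / (n - k - 1))
print(F)  # a large F corresponds to a small p-value in the ANOVA table
```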
8. 3. Consider equations with high adjusted R² values.
• R is the multiple correlation coefficient that describes the correlation between the observed y values and the predicted ŷ values.
• R² is the multiple coefficient of determination and measures how well the multiple regression equation fits the sample data.
• Problem: this measure of model "fitness" increases as more variables are included, until it can rise no more (or only by a very small amount) no matter how significant the most recently added predictor variable may be.
• Adjusted R² is the multiple coefficient of determination modified to account for the number of variables in the model and the sample size.
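A short sketch of the adjustment: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), which penalizes extra predictors. The observed and predicted values below are made up for illustration:

```python
import numpy as np

def adjusted_r2(y, y_hat, k):
    """Return (R^2, adjusted R^2) for k predictor variables."""
    n = len(y)
    ss_err = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_err / ss_tot
    return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative observed vs. predicted values from a hypothetical fit.
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
y_hat = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
r2, adj = adjusted_r2(y, y_hat, k=2)
print(r2, adj)  # adjusted R^2 is never larger than R^2
```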
9. 4. Consider equations with the fewest predictor/explanatory variables when the models being compared are nearly equivalent in terms of significance and fit (i.e. p-value and adjusted R²).
• This is known as the "Law of Parsimony."
• We are looking for the simplest yet most informative model.
• Individual t-tests of particular regression parameters may help select the correct model and eliminate insignificant explanatory variables.
Notice: if the regression equation does not appear to be useful for predictions, the best predicted value of the y variable is still its point estimate (i.e. the sample mean of the y variable would be the best predicted value for that variable).
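The individual t-tests mentioned above can be sketched with the standard OLS result tj = bj / SE(bj), where SE(bj) = √(s²·[(XᵀX)⁻¹]jj). The synthetic data below is invented so that one predictor matters and the other is pure noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)            # pure noise: true beta2 = 0
y = 2.0 + 1.0 * x1 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])          # estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se
print(t)  # |t| for x1 is large; |t| for x2 should be much smaller
```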
10. • Identify the response and potential explanatory variables by constructing a scatterplot matrix.
• Create a multiple regression model.
• Perform the appropriate tests of the following:
• Overall model significance (the ANOVA, i.e. the F test)
• Individual variable significance (t tests)
• In addition, find the following:
• The adjusted R² value, to assess the predictive power of the model.
11. • Perform a residual analysis to verify that the requirements for linear regression have been satisfied:
1. Construct a residual plot and verify that there is no pattern (other than a straight-line pattern), and also verify that the residual plot does not become thicker or thinner.
• Examples are shown below:
12. 2. Use a histogram, normal quantile plot, or Shapiro-Wilk test of normality to confirm that the values of the residuals have a distribution that is approximately normal.
• Normal Quantile Plot (aka QQ plot) *examples on the next 3 slides*
• Shapiro-Wilk normality test
• This will help you assess the normality of a given set of data (in this case, the normality of the residuals) when visual examination of the QQ plot and/or the histogram of the data seems unclear and leaves you stumped!
• Hypotheses:
H0: The data come from a normally distributed population.
H1: The data do not come from a normally distributed population.
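A minimal sketch of running a Shapiro-Wilk test on residuals, assuming SciPy is available (`scipy.stats.shapiro` returns the W statistic and the p-value). The residuals here are simulated stand-ins, not from any model in the slides:

```python
import numpy as np
from scipy import stats  # assumes SciPy is installed

# Simulated residuals drawn from a normal distribution for illustration.
rng = np.random.default_rng(3)
residuals = rng.normal(0, 1, 100)

w_stat, p_value = stats.shapiro(residuals)
print(w_stat, p_value)
# W near 1 and a large p-value mean we fail to reject H0 (normality);
# a small p-value would suggest the residuals are not normal.
```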
13. Normal: The histogram of IQ scores is close to bell-shaped, which suggests that the IQ scores are from a normal distribution. The normal quantile plot shows points that are reasonably close to a straight-line pattern. It is safe to assume that these IQ scores are from a normally distributed population.
14. Uniform: Histogram of data having a uniform distribution. The corresponding
normal quantile plot suggests that the points are not normally distributed because
the points show a systematic pattern that is not a straight-line pattern. These
sample values are not from a population having a normal distribution.
15. Skewed: Histogram of the amounts of rainfall in Boston for every Monday during
one year. The shape of the histogram is skewed, not bell-shaped. The
corresponding normal quantile plot shows points that are not at all close to a
straight-line pattern. These rainfall amounts are not from a population having a
normal distribution.
16. The table to the right includes a random
sample of heights of mothers, fathers, and their
daughters (based on data from the National
Health and Nutrition Examination).
Find the multiple regression equation in which
the response (y) variable is the height of a
daughter and the predictor (x) variables are
the height of the mother and height of the
father.
17. The StatCrunch results are shown here:
From the display, we see that the multiple regression equation is:
Daughter = 7.5 + 0.707(Mother) + 0.164(Father)
We could write this equation as:
ŷ = 7.5 + 0.707x1 + 0.164x2
where ŷ is the predicted height of a daughter, x1 is the height of the mother, and x2 is the height of the father.
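As a quick check, the fitted equation from this slide can be evaluated in code. The input heights below (a 63-inch mother and a 69-inch father) are illustrative values chosen for this example:

```python
# Fitted equation from the slide:
# y-hat = 7.5 + 0.707*(mother's height) + 0.164*(father's height)
def predict_daughter(mother, father):
    return 7.5 + 0.707 * mother + 0.164 * father

print(round(predict_daughter(63, 69), 1))  # 63.4 inches
```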
18. The preceding technology display shows the adjusted coefficient of determination as R-Sq(adj) = 63.7%.
When we compare this multiple regression equation to others, it is better to use the adjusted R² of 63.7%.
19. Based on StatCrunch, the p-value is less than 0.0001, indicating that the
multiple regression equation has good overall significance and is usable
for predictions.
That is, it makes sense to predict the heights of daughters based on heights
of mothers and fathers.
The p-value results from a test of the null hypothesis that β1 = β2 = 0, and rejection of this hypothesis indicates the equation is effective in predicting the heights of daughters.
20. Data Set 2 in Appendix B includes the age, foot length, shoe print length,
shoe size, and height for each of 40 different subjects.
Using those sample data, find the regression equation that is the best for
predicting height.
The table on the next slide includes key results from the combinations of
the five predictor variables.
21.–28. [These slides present the table of key results for the combinations of the five predictor variables.]
29. Using critical thinking and statistical analysis:
1. Delete the variable age.
2. Delete the variable shoe size, because it is really a rounded form of foot length.
3. For the remaining variables of foot length and shoe print length, select foot length because its adjusted R² of 0.7014 is greater than 0.6520 for shoe print length.
4. Although it appears that foot length alone is best, we note that criminals usually wear shoes, so shoe print lengths are more likely to be found than foot lengths.
Hence, the final regression equation including only foot length is:
ŷ = β0 + β1x1
where β0 is the intercept and β1 is the coefficient corresponding to the x1 variable (foot length).
30. The methods of the above section (Multiple Linear Regression) rely on variables
that are continuous in nature. Many times we are interested in dichotomous or
binary variables.
These variables have only two possible categorical outcomes such as
male/female, success/failure, dead/alive, etc.
Indicator or dummy variables are artificial variables that can be used to specify
the categories of the binary variable such as 0=male/1=female.
If an indicator variable is included in the regression model as a
predictor/explanatory variable, the methods we have are appropriate.
HOWEVER, can we handle a situation where the variable we are trying to predict is categorical and/or binary? Notice that this is a different situation.
But, YES!!
31. The data in the table also includes
the dummy variable of sex (coded
as 0 = female and 1 = male).
Given that a mother is 63 inches tall
and a father is 69 inches tall, find the
regression equation and use it to
predict the height of a daughter and
a son.
32. Using technology, we get the regression equation:
Height of Child = 25.6 + 0.377(Height of Mother) + 0.195(Height of Father) + 4.15(Sex)
We substitute in 0 for the sex variable, 63 for the mother, and 69 for the
father, and predict the daughter will be 62.8 inches tall.
We substitute in 1 for the sex variable, 63 for the mother, and 69 for the
father, and predict the son will be 67 inches tall.
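The two substitutions above can be reproduced directly from the fitted equation on this slide, with sex coded 0 = female and 1 = male:

```python
# Fitted equation from the slide, with the dummy variable sex
# coded as 0 = female, 1 = male.
def predict_child(mother, father, sex):
    return 25.6 + 0.377 * mother + 0.195 * father + 4.15 * sex

daughter = predict_child(63, 69, 0)
son = predict_child(63, 69, 1)
print(round(daughter, 1), round(son, 1))  # 62.8 and 67.0, matching the slide
```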