We present an overview of regression analysis and its theoretical construct, then provide a graphical representation before performing simple and multiple regression analysis step by step using SPSS (audio files accompany the tutorial).
With a constant β0 but a different β1, the slope β1 gives the direction of the line (up, horizontal, or down). With a different β0 but a constant β1, the lines all run in the same direction but sit on different parts of the graph (different intercepts).
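To make the two cases concrete, here is a minimal Python sketch that evaluates a few straight lines y = β0 + β1·x. The intercepts and slopes are hypothetical values chosen only for illustration; they do not come from the tutorial's data.

```python
# Hypothetical intercepts (b0) and slopes (b1), chosen only to illustrate the idea.
def line(b0, b1, x):
    """Value of the straight line y = b0 + b1 * x at x."""
    return b0 + b1 * x

xs = [0, 5, 10]

# Same intercept (b0 = 2), different slopes: the lines start at the same point
# but point in different directions (up, horizontal, down).
same_b0 = {b1: [line(2, b1, x) for x in xs] for b1 in (1, 0, -1)}

# Same slope (b1 = 1), different intercepts: parallel lines at different heights.
same_b1 = {b0: [line(b0, 1, x) for x in xs] for b0 in (0, 2, 4)}

for b1, ys in same_b0.items():
    print(f"b0=2, b1={b1:+d}: {ys}")
for b0, ys in same_b1.items():
    print(f"b0={b0}, b1=+1: {ys}")
```

Lines that share β0 fan out from the same starting value, while lines that share β1 keep a constant vertical gap equal to the difference in their intercepts.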
Each beta tells us about the relationship between its predictor and the outcome.
Click File > Open > Data to open the data document. We are looking at Female Life Expectancy (column 6) and the country's Literacy Rate (column 8).
Before beginning any statistical procedure, it is always a good idea to run a scatter plot. First go to Graphs > Legacy Dialogs > Scatter/Dot and click Define.
Then add Female Life Expectancy on the Y axis and Literacy Rate on the X axis, and click OK.
There is a strong uphill pattern, with a big group of countries on the right that have high literacy rates and high average female life expectancy. Since we are going to run the regression numerically anyway, we might as well add a graphical regression line.
Double-click the graph to open the Chart Editor, then click the fit-line button (see the pink arrow) to get the line and the R². Close the Chart Editor; both windows will close, leaving only the graph with the line and R². R² measures how closely the data fit the line. An R² of 75% means that if we know the percentage of people who can read, we can account for (predict) 75 percent of the variance in female life expectancy.
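The fit line in the Chart Editor is an ordinary least squares line. As a rough sketch of the quantities involved (not SPSS's own code), the Python below fits y = b0 + b1·x and reports R² as the squared Pearson correlation. The function name `fit_line` is hypothetical, and the literacy and life expectancy numbers are made up for illustration; they are not the tutorial's data set.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept: the line passes through (x_bar, y_bar)
    r_squared = sxy ** 2 / (sxx * syy)   # squared Pearson correlation
    return b0, b1, r_squared

# Hypothetical (literacy %, female life expectancy) pairs, NOT the tutorial's SPSS data.
literacy = [20, 35, 50, 65, 80, 95]
life_exp = [48, 52, 60, 63, 71, 76]
b0, b1, r2 = fit_line(literacy, life_exp)
print(f"intercept={b0:.2f}, slope={b1:.3f}, R^2={r2:.3f}")
```

With a single predictor, the squared correlation equals the R² that the Chart Editor displays for the fit line.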
From this output, we see that there is one dependent variable and one independent variable. The Model Summary shows how well this particular regression predicts the outcome variable. It reports the R² we saw earlier, 75% (rounded up from .747), which is very good; again, the closer to 1, the better. The analysis of variance, also known as ANOVA, shows how well the model (the slope and the intercept) fits the data. Again, it is a very good fit: the F value is 313 and the significance value is less than .001.
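The F value in the ANOVA table compares the variance the model explains with the variance it leaves unexplained. A minimal sketch, assuming the standard overall F test for a model with k predictors and n observations. The sample size is not given in the output described above, so the numbers in the usage lines are hypothetical. Turning F into the Sig. value requires the F distribution, which the Python standard library does not provide, so that step is left out.

```python
def f_statistic(ss_model, ss_resid, n, k):
    """Overall regression F: model mean square divided by residual mean square.

    ss_model: model sum of squares, ss_resid: residual sum of squares,
    n: number of observations, k: number of predictors.
    """
    df_model, df_resid = k, n - k - 1
    return (ss_model / df_model) / (ss_resid / df_resid)

def f_from_r_squared(r_squared, n, k):
    """The same F written in terms of R^2 (R^2 = SSM / SST and 1 - R^2 = SSR / SST)."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Hypothetical example: SST = 100 split into SSM = 75 and SSR = 25, n = 20, one predictor.
print(f_statistic(75, 25, n=20, k=1))
print(f_from_r_squared(0.75, n=20, k=1))
```

Both functions give the same F because dividing SSM and SSR by SST cancels out of the ratio.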
In the Coefficients table, the constant is the intercept of the regression line, 38.5: if the predictor were zero, women would have an average life expectancy of 38.5 years. The coefficient for the percentage of people who read is the slope: for every percentage-point increase in literacy, you can expect an increase of .4 (4/10) of a year in women's life expectancy, which is fairly large.
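Plugging the reported coefficients into the fitted line gives predictions directly. A short sketch using the rounded values quoted above (intercept 38.5, slope .4); the function name is hypothetical. Because the coefficients are rounded, the predictions are approximate, and a literacy rate of 0% may lie outside the range of countries in the data.

```python
# Coefficients as reported (rounded) in the simple regression output above.
INTERCEPT = 38.5  # predicted female life expectancy in years at 0% literacy
SLOPE = 0.4       # additional years per percentage point of literacy

def predicted_life_expectancy(literacy_pct):
    """Predicted female life expectancy (years) for a literacy rate given in percent."""
    return INTERCEPT + SLOPE * literacy_pct

for pct in (0, 50, 100):
    print(f"literacy {pct:3d}% -> {predicted_life_expectancy(pct):.1f} years")
```

For example, a country at 50% literacy gets 38.5 + 0.4 × 50 = 58.5 years, and every 10 extra percentage points add 4 years.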
For multiple linear regression, we want to look at how four variables TOGETHER predict life expectancy for women. Go to Analyze > Regression > Linear to run the multiple linear regression. Since we have already run a scatter plot on two variables, we don't have to run another one. We'll just add the additional independent variables: literacy rate (the independent variable from before), GDP, daily caloric intake, and birth rate per 1,000, as predictor variables of female life expectancy, and click OK.
The text at the top of the output is the command syntax for this specific run of the regression. In the Variables Entered/Removed table, note the multiple independent variables entered and the single dependent variable (female life expectancy).
The Model Summary shows that these variables predict life expectancy VERY WELL. The capital R measures the association of all the predictor variables together with the outcome; its maximum value is 1, and here it is .912. Squared, it gives .832, so 83 percent of the variance in female average life expectancy can be predicted by these four variables. An especially important section is the Coefficients table. The constant, or Y intercept, is the predicted value when all the predictor variables are zero: an average life span of 43.7 years. Holding the other predictors constant, each percentage-point increase in literacy brings an increase of .226 years in average life expectancy, and each additional daily calorie adds .006 years to female average life expectancy. The exception is GDP, which is no longer significant once it is combined with the other predictors. Because the predictors work in combination with each other, it is probably better to use the entire model to predict life expectancy.
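Two quick checks on the numbers quoted above, in a small Python sketch: squaring the multiple R reproduces the reported R², and each coefficient gives the change in predicted life expectancy for a change in that predictor while the other predictors are held constant. The 10-point literacy change and the 200-calorie change are hypothetical amounts chosen for illustration.

```python
# Values as reported in the multiple regression output described above.
multiple_r = 0.912
r_squared = multiple_r ** 2  # about 0.8317, which rounds to the reported .832
print(f"R squared: {r_squared:.3f}")

# Coefficients from the Coefficients table (other predictors held constant).
b_literacy = 0.226  # years per percentage point of literacy
b_calories = 0.006  # years per additional daily calorie

# Hypothetical changes chosen for illustration.
literacy_change = 10   # percentage points
calorie_change = 200   # daily calories
print(f"+{literacy_change} points literacy: {literacy_change * b_literacy:.2f} years")
print(f"+{calorie_change} calories: {calorie_change * b_calories:.2f} years")
```

Because the reported R is itself rounded to three decimals, its square matches the reported R² only to about three decimals.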
In simple regression we saw that literacy is an important variable in female life expectancy. In multiple linear regression we see that additional variables, in combination with others, can turn out to be key predictors that bring the data closer to the fitted model and give a more accurate prediction of the outcome.
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis, CMGT 587A, University of Southern California, Al Arizmendez/Cathryn Lottier
What is Regression Analysis?
The Regression Method, more commonly referred to as Regression Analysis, is the assessment of the relationship between a dependent variable and one or more independent variables. It involves techniques for measuring or analyzing multiple variables and their relationships. The technique is used to analyze variables, with at least one dependent variable (often y) and one or more independent variables (often x), to understand a phenomenon, make predictions, and/or test hypotheses.
Assumptions Underlying the Method
The validity of regression analysis depends on four assumptions:
Linearity: the relationship between the dependent and independent variables is linear (a straight line describes how the mean of Y changes with X)
Independence: the errors are independent, with no serial correlation (each random value of Y is assumed to be independent of every other value of Y)
Constant variance: the data values are scattered around the regression line to the same extent at every value of the predictors
Normality: the errors (and therefore the values of Y at each value of X) are normally distributed
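These assumptions are usually checked on the residuals of the fitted model. The Python below is an informal sketch, not a description of SPSS's own diagnostics: the Durbin-Watson statistic addresses independence (values near 2 suggest no first-order serial correlation), and the ratio of residual spread in the upper and lower halves of x is a crude constant-variance check. The function names, the toy data, and the fitted line y = 2x are hypothetical. Normality is usually judged from a histogram or normal probability plot of the residuals and is not coded here.

```python
from statistics import mean, pstdev

def residuals(xs, ys, b0, b1):
    """Observed minus fitted values for the line y = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def durbin_watson(res):
    """Sum of squared successive differences over the sum of squared residuals.

    Ranges from 0 to 4; values near 2 suggest no first-order serial correlation.
    Residuals must be in observation (e.g. time) order.
    """
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den

def spread_ratio(xs, res):
    """Crude constant-variance check: residual SD for the upper half of x over the lower half.

    Values far from 1 hint that the scatter changes along x.
    """
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return pstdev(high) / pstdev(low)

# Hypothetical data and a hypothetical fitted line y = 0 + 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
res = residuals(xs, ys, b0=0.0, b1=2.0)
print("mean residual:", round(mean(res), 3))
print("Durbin-Watson:", round(durbin_watson(res), 2))
print("spread ratio:", round(spread_ratio(xs, res), 2))
```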
When can you use Regression Analysis?
Regression analysis is used to make predictions, so virtually anyone can use it. Some reasons you may want to use regression analysis are:
To model a phenomenon to understand it better in order to make decisions
To model a phenomenon to understand it better in order to predict its values in other places or times (later in these slides you will see an example of this, in which we forecast album sales)
To test a hypothesis, noting that regression analysis produces an estimate, not an exact account of the data (we will show an example of this later in the slides with our test of life expectancy vs. literacy rates)
Diving a Little Deeper…
Linear regression analysis begins by positing the general form of the relationship in the following model (shown here with one predictor):
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
More simply put: $\text{outcome}_i = (b_0 + b_1 x_i) + \text{error}_i$
where $Y$ is the dependent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $X_1$ is the independent variable. $\varepsilon$ is the residual term (a.k.a. the random error term), which expresses the composite of all the other individual differences that are not explicitly identified in the model: a reminder that the model will never be perfect.
What does that really mean?
The equation means that the outcome can be predicted from a model plus some error associated with that prediction ($\varepsilon_i$). The outcome variable, $y_i$, is predicted using a predictor variable ($x_i$) and a parameter ($b_1$) associated with that predictor. $b_1$ is the slope of the line: it gives the direction and strength of the relationship or effect. $b_0$ tells us the value of the outcome when the predictor is 0 (the intercept). Together, the betas tell us the shape of the model and what it looks like.
Explanation of R Squared
R² allows one to assess how well the model fits. If you take the difference between each observed value of the outcome and the mean of the outcome, square all of these differences, and add them up, the result is the total sum of squares (SST). When the best-fitting (least-squares) model is fitted to the data, the differences between the observed data points and the values predicted by the regression line can also be squared and summed; this is the sum of squared residuals (SSR). The difference between SST and SSR is the model sum of squares, $\mathrm{SSM} = \mathrm{SST} - \mathrm{SSR}$. R² is the model sum of squares divided by the total sum of squares, $R^2 = \mathrm{SSM} / \mathrm{SST}$, and describes how well the regression line fits. An R² near 1 indicates that a regression line fits the data well, while an R² closer to 0 indicates that a regression line does not fit the data very well.
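The sums of squares can be computed directly from observed and predicted values. A minimal sketch; the function name is hypothetical, and the three-point example uses made-up data whose least-squares line is ŷ = 1 + 0.5x, giving predictions 1.5, 2.0, and 2.5. The decomposition SSM = SST - SSR holds for a least-squares fit that includes an intercept.

```python
from statistics import mean

def sums_of_squares(ys, fitted):
    """Return (SST, SSR, SSM, R^2) for observed values ys and model predictions fitted."""
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)              # total: observed vs. mean
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual: observed vs. predicted
    ssm = sst - ssr                                      # model sum of squares
    return sst, ssr, ssm, ssm / sst

# Hypothetical data: x = 1, 2, 3 and y = 1, 3, 2; least-squares line y = 1 + 0.5x.
ys = [1, 3, 2]
fitted = [1 + 0.5 * x for x in (1, 2, 3)]
sst, ssr, ssm, r2 = sums_of_squares(ys, fitted)
print(f"SST={sst}, SSR={ssr}, SSM={ssm}, R^2={r2}")  # R^2 is 0.25 here
```

Here the line explains only a quarter of the total variation, so R² is far from 1 and the fit is poor.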
Example of Regression Analysis
Regression analysis can be used to forecast the trend of album sales (shown on the y-axis) in relation to the advertising budget (shown on the x-axis).
Adding Another Variable to the Equation
Now, taking it one step further, we add the amount of radio play to the equation. This turns it into multiple regression analysis: with more predictors, the regression line becomes a regression plane (a 3D model). It looks more complicated, but the principles remain the same as in simple linear regression.
Explanation of Multiple Regression Analysis
Multiple regression analysis is often referred to as OLS (ordinary least squares) regression. "multiple regression can establish whether a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R2)" (Garson, 2012, p. 10). It can also determine the relative predictive importance of the independent variables (by comparing regression weights, also known as beta weights).
Multiple Regression Analysis
While the formula for simple linear regression analysis looks like this:
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
multiple regression analysis looks more like this:
$Y_i = (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_n X_{ni}) + \varepsilon_i$
This shows that the principles are the same as in linear regression; there are just more predictors!
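Multiple regression estimates all the betas at once by least squares. The Python sketch below solves the normal equations (X'X)b = X'y with Gauss-Jordan elimination. It is a teaching illustration only (statistical packages use more numerically stable methods), the function name `fit_multiple` is hypothetical, and the album-sales style data are made up, generated from known betas so the recovered values can be checked.

```python
def fit_multiple(rows, ys):
    """OLS fit of y = b0 + b1*x1 + ... + bn*xn via the normal equations (X'X) b = X'y.

    rows: one tuple of predictor values per observation. Returns [b0, b1, ..., bn].
    """
    X = [[1.0, *map(float, r)] for r in rows]  # prepend the intercept column
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * ys[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # A is now diagonal, so each coefficient is its right-hand side over its pivot.
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical observations: (radio play, advertising budget) -> album sales,
# generated from sales = 10 + 3 * radio + 0.5 * advertising with no error term.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
sales = [10 + 3 * radio + 0.5 * adv for radio, adv in rows]
print([round(b, 6) for b in fit_multiple(rows, sales)])
```

Because the toy data contain no error term, the sketch should recover b0 = 10, b1 = 3 (radio play), and b2 = 0.5 (advertising budget) up to floating-point rounding.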
Talking About the Betas
The betas tell us the relationship between a particular predictor and the outcome, and they also define the shape of the plane. In this instance, b0 represents where the plane hits the y-axis (the value of the outcome when both predictors are zero), b1 represents the slope of the side associated with radio play, and b2 represents the slope of the side associated with advertising budget. This extends to more dimensions, with each of the predictors helping to define the shape.
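In symbols, for the two-predictor album sales example (X1 = radio play, X2 = advertising budget), the betas are the partial slopes of the fitted plane: each is the change in the predicted outcome for a one-unit change in its predictor with the other predictor held fixed, and b0 is the height of the plane when both predictors are zero.

```latex
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2, \qquad
\frac{\partial \hat{Y}}{\partial X_1} = b_1, \qquad
\frac{\partial \hat{Y}}{\partial X_2} = b_2, \qquad
\hat{Y}\big|_{X_1 = X_2 = 0} = b_0
```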
Simple Linear Regression with SPSS
Dependent variable: Life Expectancy of Females
Independent variable: Literacy of country in percent