ders 3.2 Unit root testing section 2.pptx — Ergin Akalpler
The document provides information about several theoretical probability distributions including the normal, t, and chi-square distributions. It discusses their key properties and formulas. For the normal distribution, it covers the empirical rule, skewness, kurtosis, and how to calculate z-scores. Examples are given for finding areas under the normal curve and performing hypothesis tests using the t and chi-square distributions.
The document provides information about several theoretical probability distributions including the normal, t, and chi-square distributions. It discusses key properties such as the mean, standard deviation, and shape of the normal distribution curve. Examples are given to demonstrate how to calculate areas under the normal distribution curve and find z-scores. The t-distribution is introduced as similar to the normal but used for smaller sample sizes. The chi-square distribution is defined as used for hypothesis testing involving categorical data.
This document discusses the normal distribution curve, also called the bell curve or normal curve. It describes several key properties of the normal distribution including that it is symmetrical around the mean, the area under the curve sums to 1, and most values cluster around the mean. The normal distribution is important because many natural phenomena and psychological variables follow this pattern. Statistical tests often assume a normal distribution of data, and the empirical rule can be used to determine what percentage of values fall within a given number of standard deviations from the mean for a normal distribution. The document provides guidance on checking if a dataset follows a normal distribution.
1. The standard deviation is a measure of how spread out numbers are from the average value.
2. It is calculated by taking the square root of the variance, which is the average of the squared differences from the mean.
3. When only a sample of data is available rather than the entire population, the sample standard deviation is estimated using N-1 in the denominator rather than N to reduce bias, though some bias still remains for small samples.
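The N versus N − 1 distinction can be sketched in a few lines with Python's standard statistics module (the data values below are made up for illustration):

```python
import statistics

# Hypothetical data set
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: squared deviations averaged over N
pop_sd = statistics.pstdev(data)    # sqrt(32 / 8) = 2.0

# Sample standard deviation: divide by N - 1 instead, reducing bias
samp_sd = statistics.stdev(data)    # sqrt(32 / 7) ≈ 2.138

print(pop_sd, samp_sd)
```

On the same numbers, the sample formula always gives a slightly larger result than the population formula, reflecting the N − 1 correction.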
MSL 5080, Methods of Analysis for Business Operations 1.docx — madlynplamondon
MSL 5080, Methods of Analysis for Business Operations 1
Course Learning Outcomes for Unit III
Upon completion of this unit, students should be able to:
2. Distinguish between the approaches to determining probability.
3. Contrast the major differences between the normal distribution and the exponential and Poisson
distributions.
Reading Assignment
Chapter 2: Probability Concepts and Applications, pp. 32–48
Unit Lesson
Mathematical truths provide us with several useful means of estimating what will happen based on factors that are
given or researched. After becoming familiar with the idea of probability, one can see how mathematics makes
applications in government and business possible.
Probability Distributions
To look at probability distributions, one should first define a random variable: an unknown quantity whose value
is determined by chance and expressed as a number. Discrete random variables take on a limited, countable set of
values, while continuous random variables have an infinite range of possible values and can take on any value at
all, including fractions and decimals (Render, Stair, Hanna, & Hale, 2015).
One true tendency is that events observed over a group of trials cluster around a middle value, which occurs most
often and carries the highest probability. The probabilities then taper off to one or both sides, so values far
below the middle (or at zero) and far above the middle are increasingly unlikely. This middle point is called the
mean or expected value E(X):
E(X) = Σ_{i=1}^{n} X_i P(X_i)

where X_i is the i-th value of the random variable, and the summation sign Σ with limits i = 1 and n means you are
adding all n possible values (Render et al., 2015).
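As a short sketch of that sum (the values and probabilities below are hypothetical, chosen in the spirit of the paint-can example):

```python
# Hypothetical discrete distribution: cans of paint sold in a day
values = [0, 1, 2, 3, 4]
probs  = [0.05, 0.15, 0.30, 0.30, 0.20]   # must sum to 1

# E(X) = sum over i = 1..n of X_i * P(X_i)
expected = sum(x * p for x, p in zip(values, probs))

print(expected)  # 2.45
```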
The sum of these events can be shown as graphs. If the random variable has a discrete probability
distribution (e.g., cans of paint that can be sold in a day), then the graph of events may look like this:
UNIT III STUDY GUIDE
Binomial and Normal Distributions
The bar heights show the probability P(X) along the y-axis for each discrete value of X along the x-axis; because
the variable is discrete, there are no fractional values (no half-cans of paint).
The variance (σ²) is the spread of the distribution of events in a probability distribution (Render et al., 2015).
The variance is interesting because a small variance indicates that the event value will most likely be near
the mean most of the time, while a large variance shows that the mean is not a very reliable guide to
what the event values will be, since the spread of possible values is wide.
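Using a hypothetical discrete distribution, the variance can be computed directly from its definition, σ² = Σ (X_i − E(X))² P(X_i) — a sketch with made-up numbers:

```python
# Hypothetical discrete distribution
values = [0, 1, 2, 3, 4]
probs  = [0.05, 0.15, 0.30, 0.30, 0.20]

# Mean (expected value): E(X) = sum of X_i * P(X_i)
mean = sum(x * p for x, p in zip(values, probs))          # 2.45

# Variance: sigma^2 = sum of (X_i - E(X))^2 * P(X_i)
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

print(variance)  # 1.2475
```

A variance of about 1.25 against a mean of 2.45 suggests typical values stay fairly close to the mean; a much larger variance would make the mean a weaker guide.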
This document discusses hypothesis testing using z- and t-tests. It begins by introducing key concepts like sampling distributions and the central limit theorem. It explains that as sample size increases, the sampling distribution of the mean approaches a normal distribution, even if the population is not normally distributed. It then provides an example to illustrate these concepts using a small population. The document discusses how the central limit theorem can be used to determine if a sampling distribution is approximately normal. It also explains that the rule of needing a sample size of 30 refers to approximating the t-distribution with the normal distribution, not the sampling distribution itself. Finally, it works through an example problem using a sampling distribution to solve a hypothesis test with a z-score.
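A small simulation makes the central limit theorem concrete. This sketch (standard library only; the exponential population and the sample sizes are arbitrary choices) draws repeated samples from a clearly non-normal population and checks that the sample means cluster around the population mean with spread near σ/√n:

```python
import random
import statistics

random.seed(42)  # reproducible

# Skewed, non-normal population: exponential with mean 1 and sigma 1
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Build the sampling distribution of the mean for n = 30
means = [sample_mean(30) for _ in range(5000)]

# CLT: centered at the population mean (1.0),
# with standard deviation near sigma / sqrt(n) = 1 / sqrt(30) ≈ 0.183
print(statistics.mean(means), statistics.stdev(means))
```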
Transportation and logistics modeling 2 — karim sal3awi
This document discusses statistical concepts like variability, random experiments, descriptive statistics, probability distributions, and statistical data analysis. It provides examples of different probability distributions like binomial, Poisson, normal, exponential, and Weibull distributions. It also discusses the four basic steps of statistical data analysis: defining the problem, collecting the data, analyzing the data, and reporting results. Methods like hypothesis testing are discussed as part of data analysis.
lesson 3.1 Unit root testing section 1.pptx — Ergin Akalpler
The document discusses key concepts related to the normal distribution, including its properties, formula, and uses. Some key points:
- The normal distribution is a bell-shaped curve that is symmetric around the mean. Many natural phenomena approximate it.
- It is defined by two parameters: the mean and standard deviation. Approximately 68% of values fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
- The normal distribution follows a specific formula involving the mean, standard deviation, and z-scores.
- Other concepts discussed include skewness, kurtosis, and the t-distribution and how it resembles the normal distribution.
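The 68/95/99.7 figures from the empirical rule can be verified numerically from the normal CDF, which Python exposes through the error function (a sketch, standard library only):

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return erf(k / sqrt(2))

print(within_k_sigma(1))  # ≈ 0.6827
print(within_k_sigma(2))  # ≈ 0.9545
print(within_k_sigma(3))  # ≈ 0.9973
```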
This document provides instructions for creating and interpreting stem-and-leaf plots. It explains that stem-and-leaf plots organize data into "stems" and "leaves" to display the shape, center, and spread of a distribution similar to a histogram. The document includes three examples of constructing stem-and-leaf plots from different data sets and discusses how to interpret the plots, including identifying the median. It also briefly describes how to create frequency tables from raw data.
This document provides an overview of key concepts in inferential statistics, including distributions, the normal distribution, the central limit theorem, estimators and estimates, confidence intervals, the Student's t-distribution, and formulas for calculating confidence intervals. It defines key terms and concepts, provides examples to illustrate statistical distributions and properties, and outlines the general formulas used to construct confidence intervals for different sampling situations.
Descriptive statistics are used to summarize and describe characteristics of a data set. It includes measures of central tendency like mean, median, and mode, measures of variability like range and standard deviation, and the distribution of data through histograms. Inferential statistics are used to generalize results from a sample to the population it represents through estimation of population parameters and hypothesis testing. Correlation and regression analysis are used to study relationships between two or more variables.
The document discusses estimation and confidence intervals. It explains the central limit theorem, which states that sample means will approximate a normal distribution as long as sample sizes are sufficiently large. This allows constructing confidence intervals for a population mean using z-scores. The document provides formulas for calculating confidence intervals using point estimates from sample data and outlines how to interpret the resulting confidence intervals. It notes that when the population standard deviation is unknown, a t-distribution can be used if sample sizes are large enough.
This document discusses statistical estimation and confidence intervals. It begins with an overview of the central limit theorem, which states that as sample size increases, the sampling distribution of the sample means will approximate a normal distribution. It then covers how to construct confidence intervals to estimate population parameters like the mean and proportion when the population standard deviation is both known and unknown. The document explains how the t-distribution is used when the population standard deviation is unknown and the sample size is small. It provides examples of how to calculate confidence intervals and determine sample sizes needed based on the central limit theorem.
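A minimal sketch of the known-σ confidence interval (the data, σ = 4, and the 95% critical value z = 1.96 are all assumptions for illustration):

```python
import statistics
from math import sqrt

# Hypothetical sample; population standard deviation assumed known
sample = [49.1, 52.3, 50.8, 47.9, 51.5, 50.2, 48.8, 53.0]
sigma = 4.0
n = len(sample)

xbar = statistics.mean(sample)        # point estimate of the mean
z = 1.96                              # critical z for 95% confidence
margin = z * sigma / sqrt(n)          # margin of error

ci = (xbar - margin, xbar + margin)
print(ci)
```

When σ is unknown and the sample is small, the same structure applies with the sample standard deviation and a t critical value in place of σ and z.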
This learning activity sheet aims to help students understand normal random variables and their characteristics. It contains activities that illustrate key properties of the normal distribution through histograms and probability graphs. A normal distribution is bell-shaped and symmetrical, with the mean, median and mode located at the center. It can be used to describe many real-world variables that are approximately normally distributed, even if not perfectly so. The activities guide students to observe these properties and correctly identify the mean, median, mode and other characteristics of normal distributions.
This document discusses continuous probability distributions, including the normal and exponential distributions. It defines a continuous random variable and probability density function. The key properties of the normal distribution are described, such as its bell-shaped, symmetric curve defined by the mean and standard deviation. Examples demonstrate how to generate random normal variables and calculate probabilities using R commands. The exponential distribution is also introduced, which has a positively skewed density curve where the mean and standard deviation are equal.
This document discusses statistical inference concepts including parameter estimation, hypothesis testing, sampling distributions, and confidence intervals. It provides examples of how to calculate point estimates, construct sampling distributions for sample means and proportions, and determine confidence intervals for population parameters using normal and t-distributions. The key concepts of statistical inference covered include parameter vs statistic, point vs interval estimation, properties of sampling distributions, and the components and calculation of confidence intervals.
This document provides summaries of key concepts related to statistical hypothesis testing and confidence intervals involving t-distributions. It discusses when to use a t-distribution versus a standard normal distribution, specifically when the population standard deviation is unknown and must be estimated from a sample. It provides examples of hypothesis tests and confidence intervals for a single population mean when the sample size is small as well as examples involving paired data. Key formulas are presented for t-tests, confidence intervals, and the t-distribution.
The document discusses measurement uncertainties and how to report experimental results with associated uncertainties. It begins by explaining that any measurement without a statement of uncertainty is of limited usefulness. There are two main sources of uncertainties - systematic errors which produce consistently high or low results, and random uncertainties which produce about half high and half low results. Random uncertainties can be characterized statistically using concepts like the normal distribution and standard deviation. When reporting a measurement, both the best value (e.g. the mean) and its uncertainty (e.g. the standard deviation of multiple trials) should be provided. For derived quantities, the uncertainties in input measurements must be propagated to determine the overall uncertainty in the result.
1) The document discusses density curves and normal distributions, which are important mathematical models for describing the overall pattern of data. A density curve describes the distribution of a large number of observations.
2) It specifically covers the normal distribution and some of its key properties, including that about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
3) The document shows how to work with normal distributions using techniques like standardizing data, finding areas under the normal curve using the standard normal table, and assessing normality with a normal quantile plot.
The document discusses the normal probability distribution and related concepts. It defines continuous random variables and probability density functions, and describes how the normal distribution is characterized by its mean and standard deviation. It also explains how to standardize a normal random variable to the standard normal distribution in order to use normal probability tables and find probabilities of intervals. Key steps covered include transforming values to z-scores and looking up the corresponding areas in the standard normal distribution table.
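The standardization step can be sketched in a couple of lines (μ = 100 and σ = 15 are hypothetical; the area comes from the error function rather than a printed table):

```python
from math import erf, sqrt

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
x = 130.0

z = (x - mu) / sigma                        # z-score: 2.0
area_below = 0.5 * (1 + erf(z / sqrt(2)))   # P(X < x), ≈ 0.9772

print(z, area_below)
```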
EVT can model extreme events with low probabilities of occurrence. This paper describes EVT techniques and applies them to model extreme daily returns of IBM, Ford, and Nortel stock prices. Specifically, it shows that the Generalized Pareto Distribution (GPD) can appropriately model extreme daily losses. The GPD is estimated from the data and used to calculate Value-at-Risk and Expected Shortfall risk measures for the stock returns.
- Univariate normal distribution describes the distribution of a single random variable and is characterized by its bell-shaped curve. The mean, median, and mode are equal and located at the center. Approximately 68% of the data falls within one standard deviation of the mean.
- Multivariate normal distribution describes the joint distribution of multiple random variables. It generalizes the univariate normal distribution to multiple dimensions. The variables have a consistent relationship that can be modeled as a covariance matrix.
- Examples of data that may follow a normal distribution include heights, test scores, measurement errors, and stock price changes over time. Normal distributions are widely used in statistics
Chapter 5 part 1 — The Sampling Distribution of a Sample Mean — nszakir
Mathematics, Statistics, Population Distribution vs. Sampling Distribution, The Mean and Standard Deviation of the Sample Mean, Sampling Distribution of a Sample Mean, Central Limit Theorem
This document summarizes three measures of central tendency (mean, median, mode) and dispersion (range, standard deviation). It provides examples and explanations of how to calculate each measure. It also discusses the normal distribution curve and how it is used to describe variation in natural and industrial processes. The central limit theorem is introduced, stating that as sample size increases, the distribution of sample means approaches a normal distribution, regardless of the population distribution.
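The key relationships here (the mean of the sample means equals μ, and their standard deviation equals σ/√n) can be checked exactly by enumerating every possible sample from a tiny population — a sketch with an arbitrary five-value population and samples of size n = 2 drawn with replacement:

```python
import itertools
import statistics
from math import sqrt

population = [1, 2, 3, 4, 5]
mu = statistics.mean(population)        # 3
sigma = statistics.pstdev(population)   # sqrt(2)

# Every ordered sample of size n = 2, drawn with replacement
sample_means = [statistics.mean(s)
                for s in itertools.product(population, repeat=2)]

mean_of_means = statistics.mean(sample_means)   # equals mu
sd_of_means = statistics.pstdev(sample_means)   # equals sigma / sqrt(2)

print(mean_of_means, sd_of_means)
```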
1. Primary sources 2. Secondary sources 3. La Malinche 4. Bacon’s .docx — vannagoforth
1. Primary sources
2. Secondary sources
3. La Malinche
4. Bacon’s rebellion
5. Robert Carter III
6. Mesoamerica
7. Middle Passage
8. Indentured servitude
9. The Jefferson-Hemings Controversy
10. Triangular trade
11. Saint Dominique Revolt
12. Syncretism
13. Olaudah Equiano
14. Christopher Columbus
15. Columbian Moment
16. Hernan Cortes
17. Florentine Codex
18. Master Narrative of American History
19. Reconquista
20. The Paradox of Slavery
21. Indian Removal Act 1830
22. Trail of Tears
23. Treaty of Guadalupe Hidalgo
24. Niños Heroes (Heroic Children)
25. Antonio López de Santa Anna y Pérez de Lebrón
26. The Royal Africa Company
27. John Locke
28. St. Patrick’s Battalion
29. Chilam Balam
30. Popol Vuh
31. El requerimiento (The Requirement)
32. Manifest Destiny
33. Moses and Stephen F. Austin
34. Colonialism
35. Colonial Legacy
1. Prepare an outline, an introduction, and a summary .docx — vannagoforth
The document instructs the reader to prepare a 4 page double spaced report on an attached article, including an outline, introduction, and summary, and to prepare 4 PowerPoint slides summarizing the report.
Similar to Non-Normally Distributed Errors In Regression Diagnostics.docx
- Examples of data that may follow a normal distribution include heights, test scores, measurement errors, and stock price changes over time. Normal distributions are widely used in statistics
Chapter 5 part1- The Sampling Distribution of a Sample Meannszakir
Mathematics, Statistics, Population Distribution vs. Sampling Distribution, The Mean and Standard Deviation of the Sample Mean, Sampling Distribution of a Sample Mean, Central Limit Theorem
This document summarizes three measures of central tendency (mean, median, mode) and dispersion (range, standard deviation). It provides examples and explanations of how to calculate each measure. It also discusses the normal distribution curve and how it is used to describe variation in natural and industrial processes. The central limit theorem is introduced, stating that as sample size increases, the distribution of sample means approaches a normal distribution, regardless of the population distribution.
Non-Normally Distributed Errors

The assumption of normally distributed errors is almost always arbitrary. Nevertheless, the central-limit theorem assures that under very broad conditions inference based on the least-squares estimators is approximately valid in all but small samples. Why, then, should we be concerned about non-normal errors?

First, although the validity of least-squares estimation is robust—as stated, the levels of tests and confidence intervals are approximately correct in large samples even when the assumption of normality is violated—the method is not robust in efficiency: The least-squares estimator is maximally efficient among unbiased estimators when the errors are normal. For some types of error distributions, however, particularly those with heavy tails, the efficiency of least-squares estimation decreases markedly. In these cases, the least-squares estimator becomes much less efficient than alternatives (e.g., so-called robust estimators, or least squares augmented by diagnostics). To a substantial extent, heavy-tailed error distributions are problematic because they give rise to outliers, a problem that I addressed in the previous chapter.
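The efficiency point can be checked with a small simulation. This is a sketch of my own, not an example from the text: since the least-squares fit is a conditional mean, comparing the sampling variability of the mean and the median under normal and heavy-tailed errors (here a t distribution with 3 degrees of freedom stands in for a heavy-tailed error law) illustrates the same trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 4000

def sampling_sd(estimator, sampler):
    """Empirical standard deviation of an estimator over repeated samples."""
    return np.std([estimator(sampler(n)) for _ in range(reps)])

# Normal errors: the mean (what least squares estimates) is maximally efficient.
sd_mean_norm = sampling_sd(np.mean, rng.standard_normal)
sd_med_norm = sampling_sd(np.median, rng.standard_normal)

# Heavy-tailed errors (t with 3 df): the mean loses badly to the median.
sd_mean_t = sampling_sd(np.mean, lambda n: rng.standard_t(3, n))
sd_med_t = sampling_sd(np.median, lambda n: rng.standard_t(3, n))

print(sd_mean_norm < sd_med_norm)  # mean wins under normality
print(sd_mean_t > sd_med_t)        # median wins under heavy tails
```

Under normality the median needs roughly π/2 ≈ 1.57 times as many observations to match the mean's precision; under the t(3) errors the ordering reverses, which is the sense in which least squares is "not robust in efficiency."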
A commonly quoted justification of least-squares estimation—called the Gauss-Markov theorem—states that the least-squares coefficients are the most efficient unbiased estimators that are linear functions of the observations yi. This result depends on the assumptions of linearity, constant error variance, and independence, but does not require normality (see, e.g., Fox, 1984, pp. 42–43). Although the restriction to linear estimators produces simple sampling properties, it is not compelling in light of the vulnerability of least squares to heavy-tailed error distributions.
Second, highly skewed error distributions, aside from their propensity to generate outliers in the direction of the skew, compromise the interpretation of the least-squares fit. This fit is, after all, a conditional mean (of y given the xs), and the mean is not a good measure of the center of a highly skewed distribution. Consequently, we may prefer to transform the data to produce a symmetric error distribution.
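As a quick illustration of such a transformation (a sketch with simulated data, not an example from the text; the `skewness` helper is my own), a log transform removes the asymmetry of a right-skewed lognormal sample:

```python
import numpy as np

def skewness(x):
    """Standardized third moment: 0 for a symmetric distribution."""
    x = np.asarray(x, float)
    m = x.mean()
    return float(((x - m) ** 3).mean() / x.std() ** 3)

rng = np.random.default_rng(2)
y = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # strongly right-skewed

print(skewness(y) > 1.0)               # pronounced positive skew before transforming
print(abs(skewness(np.log(y))) < 0.2)  # roughly symmetric afterward
```

The same idea applies to regression residuals: if their distribution is heavily skewed, a power or log transformation of y may yield errors for which the conditional mean is again a sensible summary.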
Finally, a multimodal error distribution suggests the omission of one or more qualitative variables that divide the data naturally into groups. An examination of the distribution of residuals may therefore motivate respecification of the model.
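A small simulation (my own sketch, not from the text) shows how omitting a two-group indicator leaves bimodal residuals, and how including it restores a unimodal error distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
group = rng.integers(0, 2, n)                       # the omitted qualitative variable
y = np.where(group == 1, 3.0, -3.0) + 0.5 * rng.standard_normal(n)

resid = y - y.mean()  # residuals from the misspecified (intercept-only) fit
# Bimodal residuals: two clumps near +/-3 and almost nothing near zero.
print(np.mean(np.abs(resid) < 1.0) < 0.05)                # hole in the middle
print(np.mean(np.abs(np.abs(resid) - 3.0) < 1.0) > 0.9)   # mass at the two modes

# Including the group indicator restores unimodal residuals.
fitted = np.where(group == 1, y[group == 1].mean(), y[group == 0].mean())
print(np.std(y - fitted) < 1.0)                           # near the true error sd of 0.5
```

The two clumps in the residual distribution are exactly the kind of pattern that should prompt a search for an omitted grouping variable.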
Although there are tests for non-normal errors, I shall describe here instead graphical methods for examining the distribution of the residuals (but see Chapter 9). These methods are more useful for pinpointing the character of a problem and for suggesting solutions.

Normal Quantile-Comparison Plot of Residuals

One such graphical display is the quantile-comparison plot, which permits us to compare visually the cumulative distribution of an independent random sample—here of studentized residuals—to a cumulative reference distribution—the unit-normal distribution. Note that approximations are implied, because the studentized residuals are t distributed and dependent, but generally the distortion is negligible, at least for moderate-sized to large samples.

(Source: SAGE Research Methods, © 1991 SAGE Publications, Ltd. All Rights Reserved.)
To construct the quantile-comparison plot:
1.
Arrange the studentized residuals in ascending order: t(1), t(2), …, t(n). By convention, the ith smallest studentized residual, t(i), has gi = (i − 1/2)/n proportion
of the data below it. This convention
avoids cumulative proportions of zero and one by (in effect)
counting half of each observation
below and half above its recorded value. Cumulative
proportions of zero and one would be
problematic because the normal distribution, to which we wish
to compare the distribution of the
residuals, never quite reaches cumulative probabilities of zero
or one.
2.
Find the quantile of the unit-normal distribution that
corresponds to a cumulative probability of gi
— that is, the value zi from Z ∼ N(0, 1) for which Pr(Z < zi) =
gi.
3.
Plot the t(i) against the zi.
If the ti were drawn from a unit-normal distribution, then,
within the bounds of sampling error, t(i) = zi.
Consequently, we expect to find an approximately linear plot
with zero intercept and unit slope, a line that can
be placed on the plot for comparison. Nonlinearity in the plot,
in contrast, is symptomatic of non-normality.
It is sometimes advantageous to adjust the fitted line for the observed center and spread of the residuals. To understand how the adjustment may be accomplished, suppose more generally that a variable X is normally distributed with mean μ and variance σ². Then, for an ordered sample of values, approximately x(i) = μ + σzi, where zi is defined as before. In applications, we need to estimate μ and σ, preferably robustly, because the usual estimators—the sample mean and standard deviation—are markedly affected by extreme values. Generally effective choices are the median of x to estimate μ and (Q3 − Q1)/1.349 to estimate σ, where Q1 and Q3 are, respectively, the first and third quartiles of x: The median and quartiles are not sensitive to outliers. Note that 1.349 is the number of standard deviations separating the quartiles of a normal distribution. Applied to the studentized residuals, we have the fitted line t̂(i) = median(t) + {[Q3(t) − Q1(t)]/1.349} × zi. The normal quantile-comparison plots in this monograph employ the more general procedure.
Several illustrative normal-probability plots for simulated data
are shown in Figure 5.1. In parts a and b of
the figure, independent samples of size n = 25 and n = 100,
respectively, were drawn from a unit-normal
distribution. In parts c and d, samples of size n = 100 were drawn from the highly positively skewed χ₄² distribution and the heavy-tailed t₂ distribution, respectively.
Note how the skew and heavy tails show up as
departures from linearity in the normal quantile-comparison
plots. Outliers are discernible as unusually large
or small values in comparison with corresponding normal
quantiles.
Judging departures from normality can be assisted by plotting information about sampling variation. If the studentized residuals were drawn independently from a unit-normal distribution, then

SE(t(i)) ≈ [1/ϕ(zi)] × √[gi(1 − gi)/n]

where ϕ(zi) is the probability density (i.e., the “height”) of the unit-normal distribution at Z = zi. Thus, zi ± 2 × SE(t(i)) gives a rough 95% confidence interval around the fitted line t̂(i) = zi in the quantile-comparison plot. If the slope of the fitted line is taken as σ̂ = (Q3 − Q1)/1.349 rather than 1, then the estimated standard error may be multiplied by σ̂. As an alternative to computing standard errors, Atkinson (1985) has suggested a computationally intensive simulation procedure that does not treat the studentized residuals as independent and normally distributed.
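The pointwise confidence envelope can be computed directly from the standard-error formula above (my own sketch, assuming that formula; the multiplier argument rescales the band when the robust slope is used):

```python
import numpy as np
from scipy import stats

def qq_band(n, mult=1.0):
    """Rough z_i ± 2*SE(t_(i)) limits around the line t-hat_(i) = z_i,
    with SE(t_(i)) = sqrt(g_i (1 - g_i) / n) / phi(z_i)."""
    g = (np.arange(1, n + 1) - 0.5) / n
    z = stats.norm.ppf(g)
    se = mult * np.sqrt(g * (1 - g) / n) / stats.norm.pdf(z)
    return z, z - 2 * se, z + 2 * se

z, lower, upper = qq_band(100)
```

The band is widest in the tails, where order statistics are most variable, so extreme points must stray farther before they signal trouble.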
Figure 5.1. Illustrative normal quantile-comparison plots. (a) For a sample of n = 25 from N(0, 1). (b) For a sample of n = 100 from N(0, 1). (c) For a sample of n = 100 from the positively skewed χ₄². (d) For a sample of n = 100 from the heavy-tailed t₂.
Figure 5.2 shows a normal quantile-comparison plot for the
studentized residuals from Duncan's regression of
rated prestige on occupational income and education levels. The
plot includes a fitted line with two-standard-
error limits. Note that the residual distribution is reasonably
well behaved.
Figure 5.2. Normal quantile-comparison plot for the studentized
residuals from the regression of
occupational prestige on income and education. The plot shows
a fitted line, based on the median
and quartiles of the ts, and approximate ±2SE limits around the
line.
Histograms of Residuals
A strength of the normal quantile-comparison plot is that it
retains high resolution in the tails of the distribution,
where problems often manifest themselves. A weakness of the
display, however, is that it does not convey a
good overall sense of the shape of the distribution of the
residuals. For example, multiple modes are difficult
to discern in a quantile-comparison plot.
Histograms (frequency bar graphs), in contrast, have poor
resolution in the tails or wherever data are
sparse, but do a good job of conveying general distributional
information. The arbitrary class boundaries,
arbitrary intervals, and roughness of histograms sometimes
produce misleading impressions of the data,
however. These problems can partly be addressed by smoothing
the histogram (see Silverman, 1986, or Fox,
1990). Generally, I prefer to employ stem-and-leaf displays—a
type of histogram (Tukey, 1977) that records
the numerical data values directly in the bars of the graph—for
small samples (say n < 100), smoothed
histograms for moderate-sized samples (say 100 ≤ n ≤ 1,000),
and histograms with relatively narrow bars for
large samples (say n > 1,000).
Figure 5.3. Stem-and-leaf display of studentized residuals from
the regression of occupational
prestige on income and education.
A stem-and-leaf display of studentized residuals from the
Duncan regression is shown in Figure 5.3. The display reveals nothing of note: There is a single mode, the distribution appears reasonably symmetric, and
there are no obvious outliers, although the largest value (3.1) is
somewhat separated from the next-largest
value (2.0).
Each data value in the stem-and-leaf display is broken into two
parts: The leading digits comprise the stem;
the first trailing digit forms the leaf; and the remaining trailing
digits are discarded, thus truncating rather
than rounding the data value. (Truncation makes it simpler to
locate values in a list or table.) For studentized
residuals, it is usually sensible to make this break at the decimal
point. For example, for the residuals shown
in Figure 5.3: 0.3039 → 0|3; 3.1345 → 3|1; and −0.4981 → −0|4. Note that each stem digit appears twice, implicitly producing bins of width 0.5. Stems marked with asterisks (e.g., 1*) take leaves 0–4; stems marked with periods (e.g., 1.) take leaves 5–9. (For more
information about stem-and-leaf displays, see,
e.g., Velleman and Hoaglin [1981] or Fox [1990].)
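The truncation rule for building the display can be sketched in a few lines of Python (an illustrative toy, not from the monograph; for brevity it omits the half-stem split marked * and . in the text):

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Crude stem-and-leaf display: break each value at the decimal
    point, truncating (not rounding) to one leaf digit, so that
    0.3039 -> 0|3, 3.1345 -> 3|1, and -0.4981 -> -0|4."""
    bins = defaultdict(list)
    for v in sorted(values):
        leaf = int(abs(v) * 10) % 10            # first trailing digit
        stem = int(abs(v))                      # leading digits, truncated
        key = ('-' if v < 0 else '') + str(stem)
        bins[key].append(str(leaf))
    return [f"{k:>3} | {''.join(ls)}" for k, ls in bins.items()]

display = stem_and_leaf([0.3039, 3.1345, -0.4981, 0.41, 1.27])
```

Each output row is one stem; the run of leaf digits doubles as the bar of the histogram while preserving the data values themselves.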
The family of power transformations of y is given by y' = y^p. Typically, p = −2, −1, −1/2, 1/2, 2, or 3, although sometimes other powers and roots are considered. Note that p = 1 represents no transformation. In place of the 0th power, which would be useless because y^0 = 1 regardless of the value of y, we take y' = log y, usually using base 2 or 10 for the log function. Because logs to different bases differ only by a constant factor, we can select the base for convenience of interpretation. Using the log transformation as a “zeroth power” is reasonable, because the closer p gets to zero, the more y^p looks like the log function (formally, lim(p→0) [(y^p − 1)/p] = loge y, where the log to the base e ≈ 2.718 is the so-called “natural” logarithm). Finally, for negative powers, we take y' = −y^p, preserving the order of the y values, which would otherwise be reversed.
As we move away from p = 1 in either direction, the
transformations get stronger, as illustrated in Figure 5.4.
The effect of some of these transformations is shown in Table
5.1a. Transformations “up the ladder” of powers
and roots (a term borrowed from Tukey, 1977)—that is, toward
y2—serve differentially to spread out large
values of y relative to small ones; transformations “down the
ladder”—toward log y—have the opposite effect.
To correct a positive skew (as in Table 5.1b), it is therefore
necessary to move down the ladder; to correct a
negative skew (Table 5.1c), which is less common in
applications, move up the ladder.
I have implicitly assumed that all data values are positive, a
condition that must hold for power transformations
to maintain order. In practice, negative values can be eliminated
prior to transformation by adding a small
constant, sometimes called a “start,” to the data. Likewise, for
power transformations to be effective, the ratio
of the largest to the smallest data value must be sufficiently
large; otherwise the transformation will be too
nearly linear. A small ratio can be dealt with by using a
negative start.
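The ladder of powers with a start can be made concrete in a short sketch (illustrative only; the function, names, and data are my own, not from the monograph):

```python
import numpy as np

def power_transform(y, p, start=0.0):
    """Ladder-of-powers transform y' = y**p after adding a 'start'.
    p == 0 is treated as log (base 2 here, for interpretability);
    negative powers are negated to preserve the order of the y values."""
    y = np.asarray(y, dtype=float) + start
    if np.any(y <= 0):
        raise ValueError("all values must be positive after the start")
    if p == 0:
        return np.log2(y)
    out = y ** p
    return -out if p < 0 else out

skewed = np.array([1.0, 2.0, 4.0, 8.0, 64.0])   # positively skewed
pulled_in = power_transform(skewed, 0)           # move down the ladder
```

Moving down the ladder (here to the log) pulls in the long right tail, exactly the correction for a positive skew described above.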
In the specific context of regression analysis, a skewed error
distribution, revealed by examining the
distribution of the residuals, can often be corrected by
transforming the dependent variable. Although more
sophisticated approaches are available (see, e.g., Chapter 9), a
good transformation can be located by trial
and error.
Dependent variables that are bounded below, and hence that
tend to be positively skewed, often respond
well to transformations down the ladder of powers. Power
transformations usually do not work well, however,
when many values stack up against the boundary, a situation
termed truncation or censoring (see, e.g., Tobin
[1958] for a treatment of “limited” dependent variables in
regression). As well, data that are bounded both
above and below—such as proportions and percentages—
generally require another approach. For example
the logit or “log odds” transformation given by y' = log[y/(l -
y)], often works well for proportions.
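The logit transformation is one line of code (a minimal sketch, my own illustration); note its symmetry about p = 0.5, which is what lets it spread both tails of a doubly bounded variable:

```python
import numpy as np

def logit(p):
    """Log-odds transform for proportions strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

vals = logit(np.array([0.1, 0.5, 0.9]))   # symmetric about p = 0.5
```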
TABLE 5.1 Correcting Skews by Power Transformations
Transforming variables in a regression analysis raises issues of
interpretation. I address these issues briefly
at the end of Chapter 7.
http://dx.doi.org/10.4135/9781412985604.n5
Nonconstant Error Variance
In: Regression Diagnostics
By: John Fox
Pub. Date: 2011
Access Date: October 16, 2019
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9780803939714
Online ISBN: 9781412985604
DOI: https://dx.doi.org/10.4135/9781412985604
Print pages: 49-53
grows larger, or there may be a systematic relationship between
error variance and a particular x. The former
situation can be detected by plotting residuals against fitted
values, and the latter by plotting residuals against
each x. It is worth noting that plotting residuals against y (as
opposed to ŷ) is generally unsatisfactory. The plot
will be tilted: There is a built-in linear correlation between e
and y, because y = ŷ + e; in fact, the correlation
between y and e is r(y, e) = √(1 − R²). In contrast, the least-squares fit
insures that r(ŷ, e) = 0, producing a plot
that is much easier to examine for evidence of nonconstant
spread.
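The built-in correlation between e and y, and the zero correlation between e and ŷ, are easy to verify numerically (a self-contained simulation of my own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])      # design with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
e = y - yhat

r_yhat_e = np.corrcoef(yhat, e)[0, 1]     # zero by construction
r_ye = np.corrcoef(y, e)[0, 1]            # the built-in "tilt"
R2 = 1 - e.var() / y.var()
```

With an intercept in the model, e is exactly orthogonal to ŷ, and r(y, e) reproduces √(1 − R²) to machine precision.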
Because the least-squares residuals have unequal variances even
when the errors have constant variance,
I prefer plotting the studentized residuals against fitted values.
Finally, a pattern of changing spread is often
more easily discerned in a plot of |ti| or ti² versus ŷ, perhaps
augmented by a lowess scatterplot smooth (see
Appendix A6.1); smoothing this plot is particularly useful when
the sample size is very large or the distribution
of ŷ is very uneven. An example appears in Figure 6.2.
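The |ti|-versus-ŷ diagnostic can be sketched with simulated data. The text uses a lowess smooth; as a crude stand-in that needs no extra libraries, the sketch below uses a running mean (my own simplification, not the author's method):

```python
import numpy as np

def running_mean_smooth(x, y, window=75):
    """Crude stand-in for a lowess smooth: a running mean of y over
    a window of neighbors in x-order (illustrative only)."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    half = window // 2
    sm = np.array([ys[max(0, i - half): i + half + 1].mean()
                   for i in range(len(ys))])
    return xs, sm

rng = np.random.default_rng(2)
yhat = np.sort(rng.uniform(0, 10, 300))   # stand-in fitted values
t = rng.normal(scale=0.5 + 0.2 * yhat)    # spread grows with y-hat
xs, sm = running_mean_smooth(yhat, np.abs(t))
```

A rising smooth of |t| against ŷ, as here, is the signature of error variance increasing with the level of the response.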
An illustrative plot of studentized residuals against fitted values
is shown in Figure 6.1a. In Figure 6.1b,
studentized residuals are plotted against log2(3 + ŷ); by
correcting the positive skew in these ŷ values,
the second plot makes it easier to discern the tendency of the
residual spread to increase with ŷ. The
data for this example are drawn from work by Ornstein (1976)
on interlocking directorates among the 248
largest Canadian firms. The number of interlocking directorate
and executive positions maintained by the
corporations is regressed on corporate assets (square-root
transformed to make the relationship linear; see
Chapter 7); 9 dummy variables representing 10 industrial
classes, with heavy manufacturing serving as the
baseline category; and 3 dummy variables representing 4
nations of control, with Canada as the baseline
category. The results of the regression are given in the left-hand
columns of Table 6.1; the results shown on
the right of the table are discussed below. Note that part of the
tendency for the residual scatter to increase
with ŷ is due to the lower bound of 0 for y: Because e = y - ŷ,
the smallest possible residual corresponding to
a particular ŷ value is e = 0 - ŷ = -ŷ.
Figure 6.1. Plots of studentized residuals versus fitted values
for Ornstein's interlocking-directorate
regression. (a) t versus ŷ. (b) t versus log2(3 + ŷ). The log
transformation serves to reduce the skew
of the fitted values, making the increasing residual spread easier
to discern.
Correcting Nonconstant Error Variance
Transformations frequently serve to correct a tendency of the
error variance to increase or, less commonly,
decrease with the magnitude of the dependent variable: Move y
down the ladder of powers and roots if
the residual scatter broadens with the fitted values; move y up
the ladder if the residual scatter narrows.
An effective transformation may be selected by trial and error
(but see Chapter 9 for an analytic method of
selecting a variance-stabilizing transformation).
TABLE 6.1 Regression of Number of Interlocking Directorate
and Executive Positions Maintained by
248 Major Canadian Corporations on Corporate Assets, Sector,
and Nation of Control
If the error variance is proportional to a particular x, or if the
pattern of V(εi) is otherwise known up to
a constant of proportionality, then an alternative to
transformation of y is weighted-least-squares (WLS)
estimation (see Appendix A6.2). It also is possible to correct
the estimated standard errors of the least-
squares coefficients for heteroscedasticity: A method proposed
by White (1980) is described in Appendix
A6.3. An advantage of this approach is that knowledge of the
pattern of nonconstant error variance (e.g.,
increased variance with the level of y or with an x) is not
required. If, however, the heteroscedasticity problem
is severe, and the corrected standard errors therefore are
substantially larger than those produced by the
usual formula, then discovering the pattern of nonconstant
variance and correcting for it—by a transformation
or WLS estimation—offers the possibility of more efficient
estimation. In any event, unequal error variances
are worth correcting only when the problem is extreme—where,
for example, the spread of the errors varies
by a factor of about three or more (i.e., the error variance varies
by a factor of about 10 or more; see Appendix
A6.4).
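When the variance pattern is known up to a constant of proportionality, WLS amounts to least squares on rescaled data. A minimal sketch under that assumption (simulated data and function names are my own):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i (y_i - x_i'b)^2,
    appropriate when V(eps_i) is known up to proportionality as 1/w_i."""
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2 + 3 * x + rng.normal(scale=x)   # error SD proportional to x
b_wls = wls(X, y, w=1 / x**2)         # weights inverse to error variance
```

Multiplying each row by √wi makes the transformed errors homoscedastic, so ordinary least squares on the rescaled data is efficient.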
Figure 6.2. Plot of absolute studentized residuals versus fitted
values for the square-root transformed
interlocking-directorate data. The line on the graph is a lowess
smooth using f = 0.5 and 2 robustness
iterations.
For Ornstein's interlocking-directorate regression, for example,
a square-root transformation appears to
correct the dependence of the residual spread on the level of the
dependent variable. A plot of |ti| versus ŷi
for the transformed data is given in Figure 6.2, and the
regression results appear in the right-hand columns of
Table 6.1. The lowess smooth in Figure 6.2 (see Appendix
A6.1) shows little change in the average absolute
studentized residuals as the fitted values increase.
The coefficients for the original and transformed regressions in
Table 6.1 cannot be compared directly,
because the scale of the dependent variable has been altered. It
is clear, however, that assets retain their
positive effect and that the nations of control maintain their
ranks. The sectoral ranks are also similar
across the two analyses, although not identical. In comparing
the two sets of results, recall that the baseline
categories for the sets of dummy regressors—Canada and heavy
manufacturing—implicitly have coefficients
of zero.
Transforming y also changes the shape of the error distribution
and alters the shape of the regression of y
on the xs. It is frequently the case that producing constant
residual variation through a transformation also
makes the distribution of the residuals more symmetric. At
times, eliminating nonconstant spread also makes
the relationship of y to the xs more nearly linear (see the next
chapter). These by-products are not necessary
consequences of correcting nonconstant error variance,
however, and it is particularly important to check data
for non-linearity following a transformation of y. Of course,
because there generally is no reason to suppose
that the regression is linear prior to transforming y, we should
check for nonlinearity even when y is not
transformed.
Finally, nonconstant residual spread sometimes is evidence for
the omission of important effects from the
model. Suppose, for example, that there is an omitted
categorical independent variable, such as regional
location, that interacts with assets in affecting interlocks; in
particular, the assets slope, although positive in
every region, is steeper in some regions than in others. Then the
omission of region and its interactions with
assets could produce a fan-shaped residual plot even if the
errors from the correct model have constant
spread. The detection of this type of specification error
therefore requires substantive insight into the process
generating the data and cannot rely on diagnostics alone.
http://dx.doi.org/10.4135/9781412985604.n6
Nonlinearity
Nonlinearity therefore implies that the model fails
to capture the systematic pattern of relationship between the
dependent and independent variables. For
example, a partial relationship specified to be linear may be
nonlinear, or two independent variables specified
to have additive partial effects may interact in determining y.
Nevertheless, the fitted model is frequently
a useful approximation even if the regression surface E(y) is not
precisely captured. In other instances,
however, the model can be extremely misleading.
The regression surface is generally high dimensional, even after
accounting for regressors (such as
polynomial terms, dummy variables, and interactions) that are
functions of a smaller number of fundamental
independent variables. As in the case of nonconstant error
variance, therefore, it is necessary to focus on
particular patterns of departure from linearity. The graphical
diagnostics discussed in this chapter represent
two-dimensional views of the higher-dimensional point-cloud of observations {yi, x1i, …, xki}. With modern
computer graphics, the ideas here can usefully be extended to
three dimensions, permitting, for example, the
detection of two-way interactions between independent
variables (Monette, 1990).
Residual and Partial-Residual Plots
Although it is useful in multiple regression to plot y against
each x, these plots do not tell the whole story—and
can be misleading—because our interest centers on the partial
relationship between y and each x, controlling
for the other xs, not on the marginal relationship between y and
a single x. Residual-based plots are
consequently more relevant in this context.
Plotting residuals or studentized residuals against each x,
perhaps augmented by a lowess smooth (see
Appendix A6.1), is helpful for detecting departures from
linearity. As Figure 7.1 illustrates, however, residual
plots cannot distinguish between monotone (i.e., strictly
increasing or decreasing) and nonmonotone (e.g.,
falling and then rising) nonlinearity. The distinction between
monotone and non-monotone nonlinearity is lost
in the residual plots because the least-squares fit ensures that
the residuals are linearly uncorrelated with
each x. The distinction is important, because, as I shall explain
below, monotone nonlinearity frequently can
be corrected by simple transformations. In Figure 7.1, for example, case a might be modeled by y = β0 + β1x² + ε, whereas case b cannot be linearized by a power transformation of x and might instead be dealt with by a quadratic specification, y = β0 + β1x + β2x² + ε. (Case b could, however, be accommodated by a more complex transformation of x: y = β0 + β1(x − α)² + ε; I shall not pursue this approach here.)
Figure 7.1. Scatterplots (a and b) and corresponding residual
plots (a' and b') in simple regression.
The residual plots do not distinguish between (a) a nonlinear
but monotone relationship, and (b) a
nonlinear, nonmonotone relationship.
In contrast to simple residual plots, partial-regression plots,
introduced in Chapter 4 for detecting influential
data, can reveal nonlinearity and suggest whether a relationship
is monotone. These plots are not always
useful for locating a transformation, however: The partial-
regression plot adjusts xj for the other xs, but it is the
unadjusted xj that is transformed in respecifying the model.
Partial-residual plots, also called component-plus-
residual plots, are often an effective alternative. Partial-residual
plots are not as suitable as partial-regression
plots for revealing leverage and influence.
Define the partial residual for the jth regressor as

e(j) = e + bjxj

In words, add back the linear component of the partial relationship between y and xj to the least-squares residuals, which may include an unmodeled nonlinear component. Then plot e(j) versus xj. By construction, the multiple-regression coefficient bj is the slope of the simple linear regression of e(j) on xj, but nonlinearity should be apparent in the plot as well. Again, a lowess smooth may help in interpreting the plot.
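The construction can be checked numerically; in this sketch (simulated data, names my own) x1 enters the true model quadratically, and the partial residual for x1 recovers both the fitted slope and the curvature:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.uniform(0, 4, n)
x2 = rng.uniform(0, 4, n)
y = 1 + x1**2 + 2 * x2 + rng.normal(scale=0.5, size=n)  # x1 is nonlinear

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

e_partial = e + b[1] * x1                # partial residual for x1
slope = np.polyfit(x1, e_partial, 1)[0]  # equals b[1] by construction
```

Plotting e_partial against x1 (ideally with a smooth) would reveal the quadratic pattern that the raw residual plot alone cannot attribute to x1.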
The partial-residual plots in Figure 7.2 are for a regression of
the rated prestige P of 102 Canadian
occupations (from Pineo and Porter, 1967) on the average
education (E) in years, average income (I) in
dollars, and percentage of women (W) in the occupations in
1971. (Related results appear in Fox and
Suschnigg [1989]; cf. Duncan's regression for similar U.S. data
reported in Chapter 4.) A lowess smooth is
shown in each plot. The results of the regression are as follows:
P̂ = −6.79 + 4.19E + 0.00131I − 0.00891W
      (3.24)  (0.39)   (0.00028)    (0.0304)
R² = 0.80    s = 7.85
Note that the magnitudes of the regression coefficients should
not be compared, because the independent
variables are measured in different units: In particular, the unit
for income is small—the dollar—and that for
education is comparatively large—the year. Interpreting the
regression coefficients in light of the units of the
corresponding independent variables, the education and income
coefficients are both substantial, whereas
the coefficient for percentage of women is very small.
There is apparent monotone nonlinearity in the partial-residual
plots for education and, much more strongly,
income (Figure 7.2, parts a and b); there also is a small apparent
tendency for occupations with intermediate
percentages of women to have lower prestige, controlling for
income and educational levels (Figure 7.2c).
To my eye, the patterns in the partial-residual plots for
education and percentage of women are not easily
discernible without the lowess smooth: The departure from
linearity is not great. The nonlinear patterns for
income and percentage of women are simple: In the first case,
the lowess curve opens downwards; in the
second case, it opens upwards. For education, however, the
direction of curvature changes, producing a more
complex nonlinear pattern.
Figure 7.2. Partial-residual plots for the regression of the rated
prestige of 102 Canadian occupations
on 1971 occupational characteristics: (a) education, (b) income,
and (c) percentage of women. The
observation index is plotted for each point. In each graph, the
linear least-squares fit (broken line) and
the lowess smooth (solid line for f = 0.5 with 2 robustness
iterations) are shown.
SOURCE: Data taken from B. Blishen, W. Carroll, and C.
Moore, personal communication; Census of Canada
(Statistics Canada, 1971, Part 6, pp. 19.1–19.21); Pineo and
Porter (1967).
Mallows (1986) has suggested a variation on the partial-residual
plot that sometimes reveals nonlinearity
more clearly: First, add a quadratic term in xj to the model, which becomes

y = b0 + b1x1 + … + bjxj + bjjxj² + … + bkxk + e

Then, after fitting the model, form the “augmented” partial residual

e′(j) = e + bjxj + bjjxj²
Note that in general bj differs from the regression coefficient
for xj in the original model, which does not include
the squared term. Finally, plot e'(j) versus xj.
Transformations for Linearity
To consider how power transformations can serve to linearize a monotone nonlinear relationship, examine Figure 7.3. Here, I have plotted y = (1/5)x² for x = 1, 2, 3, 4, 5. By construction, the relationship can be linearized by taking x' = x², in which case y = (1/5)x'; or by taking y' = √(5y), in which case y' = x. Figure
7.3 reveals how each transformation serves to stretch one of the
axes differentially, pulling the curve into a
straight line.
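The construction behind Figure 7.3 is easy to reproduce numerically (a minimal sketch of my own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x**2 / 5                  # the curved relationship y = (1/5)x^2

x_prime = x**2                # stretch the x axis ...
y_prime = np.sqrt(5 * y)      # ... or stretch the y axis

# either transformation straightens the curve exactly
corr_x = np.corrcoef(x_prime, y)[0, 1]
corr_y = np.corrcoef(x, y_prime)[0, 1]
```

Both correlations equal 1 (up to rounding), confirming that either transformation pulls the curve into a straight line.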
As illustrated in Figure 7.4, there are four simple patterns of
monotone nonlinear relationships. Each can be
straightened by moving y, x, or both up or down the ladder of
powers and roots: The direction of curvature
determines the direction of movement on the ladder; Tukey
(1977) calls this the “bulging rule.” Specific
transformations to linearity can be located by trial and error
(but see Chapter 9 for an analytic approach).
In multiple regression, the bulging rule may be applied to the
partial-residual plots. Generally, we transform
xj in preference to y, because changing the scale of y disturbs
its relationship to other regressors and
because transforming y changes the error distribution. An
exception occurs when similar nonlinear patterns
are observed in all of the partial-residual plots. Furthermore,
the logit transformation often helps for dependent
variables that are proportions.
Figure 7.3. How a transformation of y (a to b) or x (a to c) can
make a simple monotone nonlinear
relationship linear.
As suggested in connection with Figure 7.1b, nonmonotone
nonlinearity (and some complex monotone
patterns) frequently can be accommodated by fitting polynomial
functions in an x; quadratic specifications are
often useful in applications. As long as the model remains linear
in its parameters, it may be fit by linear least-
squares regression.
Trial-and-error experimentation with the Canadian occupational
prestige data leads to the log transformation
transformation of this independent variable is not promising: Trial and error suggests that the best that we can do is to increase R² to 0.85 by squaring education.
In transforming data or respecifying the functional form of the
model, there should be an interplay between
substantive and modeling considerations. We must recognize,
however, that social theories are almost never
mathematically concrete: Theory may tell us that prestige
should increase with income, but it does not specify
the functional form of the relationship.
Still, in certain contexts, specific transformations may have
advantages of interpretability. For example,
log transformations often can be given meaningful substantive
interpretation: To increase log2x by 1, for
instance, represents a doubling of x. In the respecified Canadian
occupational prestige regression, therefore,
doubling income is associated on average with a 9-point
increment in prestige, holding education and gender
composition constant.
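The doubling interpretation amounts to one line of arithmetic: a unit change in log2(income) is exactly a doubling. A sketch using the text's 9-point figure (the specific income values 20,000 and 40,000 are arbitrary illustrations of mine):

```python
import math

# If prestige = a + b * log2(income) with b = 9 points per doubling,
# then doubling income raises log2(income) by exactly 1:
b = 9
change = b * (math.log2(40_000) - math.log2(20_000))
```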
Likewise, the square root of an area or cube root of a volume
can be interpreted as a linear measure of
distance or length, the inverse of the amount of time required to
traverse a particular distance is speed, and so
on. If both y and xj are log-transformed, then the regression
coefficient for x'j is interpretable as the “elasticity”
of y with respect to xj—that is, the approximate percentage of
change in y corresponding to a 1% change
in xj. In many contexts, a quadratic relationship will have a
clear substantive interpretation (in the example,
occupations with a gender mix appear to pay a small penalty in
prestige), but a fourth-degree polynomial may
not.
Finally, although it is desirable to maintain simplicity and
interpretability, it is not reasonable to distort the data
by insisting on a functional form that is clearly inadequate. It is
possible, in any event, to display the fitted
relationship between y and an x graphically or in a table, using
the original scales of the variables if they have
been transformed, or to describe the effect at a few strategic x
values (see, e.g., the brief description above
of the partial effect of percentage of women on occupational
prestige).
http://dx.doi.org/10.4135/9781412985604.n7
problem depending on the degree to which error
variances differ. I describe graphical methods for detecting
nonconstant error variance in this chapter. Tests
for heteroscedasticity are discussed in Chapter 8 on discrete
data and in Chapter 9 on maximum-likelihood
methods.
Because the regression surface is k dimensional, and imbedded
in a space of k + 1 dimensions, it is generally
impractical to assess the assumption of constant error variance
by direct graphical examination of the data
for k larger than 1 or 2. Nevertheless, it is common for error
variance to increase as the expectation of y
grows larger, or there may be a systematic relationship between
error variance and a particular x. The former
situation can be detected by plotting residuals against fitted
values, and the latter by plotting residuals against
each x. It is worth noting that plotting residuals against y (as
opposed to ŷ) is generally unsatisfactory. The plot
will be tilted: There is a built-in linear correlation between e
and y, because y = ŷ + e; in fact, the correlation
between y and e is r(y, e) = √(1 − R²). In contrast, the least-squares fit
ensures that r(ŷ, e) = 0, producing a plot
that is much easier to examine for evidence of nonconstant
spread.
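Both facts—the built-in tilt of the e-versus-y plot and the zero correlation of e with ŷ—are algebraic consequences of least squares, so they can be confirmed on any simulated data set that includes an intercept:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary simulated regression; the identities below are algebraic,
# so any data set with an intercept in the model will do.
n = 300
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
e = y - yhat

r2 = 1.0 - e.var() / y.var()           # R-squared (residual mean is 0)
r_y_e = np.corrcoef(y, e)[0, 1]        # equals sqrt(1 - R^2)
r_yhat_e = np.corrcoef(yhat, e)[0, 1]  # forced to 0 by least squares
```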
Because the least-squares residuals have unequal variances even
when the errors have constant variance,
I prefer plotting the studentized residuals against fitted values.
Finally, a pattern of changing spread is often
more easily discerned in a plot of |ti| or ti² versus ŷ, perhaps
augmented by a lowess scatterplot smooth (see
Appendix A6.1); smoothing this plot is particularly useful when
the sample size is very large or the distribution
of ŷ is very uneven. An example appears in Figure 6.2.
An illustrative plot of studentized residuals against fitted values
is shown in Figure 6.1a. In Figure 6.1b,
studentized residuals are plotted against log2(3 + ŷ); by
correcting the positive skew in these ŷ values,
the second plot makes it easier to discern the tendency of the
residual spread to increase with ŷ. The
data for this example are drawn from work by Ornstein (1976)
on interlocking directorates among the 248
largest Canadian firms. The number of interlocking directorate
and executive positions maintained by the
corporations is regressed on corporate assets (square-root
transformed to make the relationship linear; see
Chapter 7); 9 dummy variables representing 10 industrial
classes, with heavy manufacturing serving as the
baseline category; and 3 dummy variables representing 4
nations of control, with Canada as the baseline
category. The results of the regression are given in the left-hand
columns of Table 6.1; the results shown on
the right of the table are discussed below. Note that part of the
tendency for the residual scatter to increase
with ŷ is due to the lower bound of 0 for y: Because e = y - ŷ,
the smallest possible residual corresponding to
a particular ŷ value is e = 0 - ŷ = -ŷ.
Figure 6.1. Plots of studentized residuals versus fitted values
for Ornstein's interlocking-directorate
regression. (a) t versus ŷ. (b) t versus log2(3 + ŷ). The log
transformation serves to reduce the skew
of the fitted values, making the increasing residual spread easier
to discern.
Correcting Nonconstant Error Variance
Transformations frequently serve to correct a tendency of the
error variance to increase or, less commonly,
decrease with the magnitude of the dependent variable: Move y
down the ladder of powers and roots if
the residual scatter broadens with the fitted values; move y up
the ladder if the residual scatter narrows.
An effective transformation may be selected by trial and error
(but see Chapter 9 for an analytic method of
selecting a variance-stabilizing transformation).
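A minimal illustration of this ladder-of-powers advice, using Poisson counts whose variance grows with the mean (an assumption of the sketch, not the Ornstein data): moving y down the ladder with a square root roughly equalizes the residual spread.

```python
import numpy as np

rng = np.random.default_rng(2)

# Counts whose variance grows with the mean (Poisson), the classic
# case where moving y down the ladder of powers stabilizes spread.
mu = np.linspace(2.0, 50.0, 500)
y = rng.poisson(mu)

# Ratio of residual spread in the high-mean half to the low-mean half,
# before and after the square-root transformation.
lo, hi = mu < 25, mu >= 25
spread_raw = np.std(y[hi] - mu[hi]) / np.std(y[lo] - mu[lo])
spread_sqrt = np.std(np.sqrt(y[hi]) - np.sqrt(mu[hi])) / np.std(
    np.sqrt(y[lo]) - np.sqrt(mu[lo])
)
# spread_raw is clearly above 1; spread_sqrt sits near 1.
```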
TABLE 6.1 Regression of Number of Interlocking Directorate
and Executive Positions Maintained by
248 Major Canadian Corporations on Corporate Assets, Sector,
and Nation of Control
If the error variance is proportional to a particular x, or if the
pattern of V(εi) is otherwise known up to
a constant of proportionality, then an alternative to
transformation of y is weighted-least-squares (WLS)
estimation (see Appendix A6.2). It also is possible to correct
the estimated standard errors of the least-
squares coefficients for heteroscedasticity: A method proposed
by White (1980) is described in Appendix
A6.3. An advantage of this approach is that knowledge of the
pattern of nonconstant error variance (e.g.,
increased variance with the level of y or with an x) is not
required. If, however, the heteroscedasticity problem
is severe, and the corrected standard errors therefore are
substantially larger than those produced by the
usual formula, then discovering the pattern of nonconstant
variance and correcting for it—by a transformation
or WLS estimation—offers the possibility of more efficient
estimation. In any event, unequal error variances
are worth correcting only when the problem is extreme—where,
for example, the spread of the errors varies
by a factor of about three or more (i.e., the error variance varies
by a factor of about 10 or more; see Appendix
A6.4).
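Both remedies can be sketched in a few lines of numpy. The data below are simulated with V(εi) proportional to xi—an assumption of the sketch, not the Ornstein data—so that weights wi = 1/xi are exactly the right WLS choice; the sandwich formula is White's (1980) HC0 estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with error variance proportional to x (an assumption
# of this sketch, chosen so that weights 1/x are exactly right).
n = 400
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, np.sqrt(x))

X = np.column_stack([np.ones(n), x])

# Ordinary least squares, plus White's (1980) HC0 sandwich estimator:
# (X'X)^-1 X' diag(e^2) X (X'X)^-1.
beta = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
cov_white = XtX_inv @ (X.T * e**2) @ X @ XtX_inv
se_white = np.sqrt(np.diag(cov_white))

# WLS with weights inversely proportional to the error variance:
# solve (X'WX) b = X'Wy.
w = 1.0 / x
Xw = X * w[:, None]
beta_wls = np.linalg.solve(Xw.T @ X, Xw.T @ y)
```

Both estimators recover the slope of 3; the WLS fit is the more efficient of the two under this variance pattern.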
Figure 6.2. Plot of absolute studentized residuals versus fitted
values for the square-root transformed
interlocking-directorate data. The line on the graph is a lowess
smooth using f = 0.5 and 2 robustness
iterations.
For Ornstein's interlocking-directorate regression, for example,
a square-root transformation appears to
correct the dependence of the residual spread on the level of the
dependent variable. A plot of |ti| versus ŷi
for the transformed data is given in Figure 6.2, and the
regression results appear in the right-hand columns of
Table 6.1. The lowess smooth in Figure 6.2 (see Appendix
A6.1) shows little change in the average absolute
studentized residuals as the fitted values increase.
The coefficients for the original and transformed regressions in
Table 6.1 cannot be compared directly,
because the scale of the dependent variable has been altered. It
is clear, however, that assets retain their
positive effect and that the nations of control maintain their
ranks. The sectoral ranks are also similar
across the two analyses, although not identical. In comparing
the two sets of results, recall that the baseline
categories for the sets of dummy regressors—Canada and heavy
manufacturing—implicitly have coefficients
of zero.
Transforming y also changes the shape of the error distribution
and alters the shape of the regression of y
on the xs. It is frequently the case that producing constant
residual variation through a transformation also
makes the distribution of the residuals more symmetric. At
times, eliminating nonconstant spread also makes
the relationship of y to the xs more nearly linear (see the next
chapter). These by-products are not necessary
consequences of correcting nonconstant error variance,
however, and it is particularly important to check data
for non-linearity following a transformation of y. Of course,
because there generally is no reason to suppose
that the regression is linear prior to transforming y, we should
check for nonlinearity even when y is not
transformed.
Finally, nonconstant residual spread sometimes is evidence for
the omission of important effects from the
model. Suppose, for example, that there is an omitted
categorical independent variable, such as regional
location, that interacts with assets in affecting interlocks; in
particular, the assets slope, although positive in
every region, is steeper in some regions than in others. Then the
omission of region and its interactions with
assets could produce a fan-shaped residual plot even if the
errors from the correct model have constant
spread. The detection of this type of specification error
therefore requires substantive insight into the process
generating the data and cannot rely on diagnostics alone.
http://dx.doi.org/10.4135/9781412985604.n6
https://dx.doi.org/10.4135/9781412985604
Discrete Data
Discrete independent and dependent variables often lead to
plots that are difficult to interpret. A simple
example of this phenomenon appears in Figure 8.1, the data for
which are drawn from the 1989 General
Social Survey conducted by the National Opinion Research
Center. The independent variable, years of
education completed, is coded from 0 to 20. The dependent
variable is the number of correct answers to a
10-item vocabulary test; note that this variable is a disguised
proportion—literally, the proportion correct × 10.
Figure 8.1. Scatterplot (a) and residual plot (b) for vocabulary
score by year of education. The least-squares regression line is
shown on the scatterplot.
The scatterplot in Figure 8.1a conveys the general impression
that vocabulary increases with education. The
plot is difficult to read, however, because most of the 968 data
points fall on top of one another. The least-
squares regression line, also shown on the plot, has the equation
V̂ = 1.13 + 0.374E,
where V and E are, respectively, the vocabulary score and
education.
Figure 8.1b plots residuals from the fitted regression against
education. The diagonal lines running from upper
left to lower right in this plot are typical of residuals for a
discrete dependent variable: For any one of the 11
distinct y values, e.g., y = 5, the residual is e = 5 - b0 - b1x =
3.87 - 0.374x, which is a linear function of x.
I noted a similar phenomenon in Chapter 6 for the plot of
residuals against fitted values when y has a fixed
minimum score. The diagonals from lower left to upper right are
due to the discreteness of x.
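The stripe geometry follows directly from the arithmetic in this paragraph; a short check using the coefficients implied by the residual expression (b0 = 1.13, b1 = 0.374):

```python
# Coefficients implied by the text's residual expression
# (e = 5 - b0 - b1*x = 3.87 - 0.374*x, so b0 = 1.13, b1 = 0.374).
b0, b1 = 1.13, 0.374

def residual(y, x):
    """Residual of a fixed discrete response y at education x."""
    return y - (b0 + b1 * x)

# Along the stripe for y = 5, residuals fall linearly in x with
# slope exactly -b1, which is what draws the diagonal bands.
xs = [0, 8, 12, 16, 20]
es = [residual(5, x) for x in xs]
slopes = [(es[i + 1] - es[i]) / (xs[i + 1] - xs[i]) for i in range(4)]
```

Every distinct y value traces its own parallel line with slope −0.374, producing the family of diagonals in Figure 8.1b.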
It also appears that the variation of the residuals in Figure 8.1b
is lower for the largest and smallest values
of education than for intermediate values. This pattern is
consistent with the observation that the dependent
variable is a disguised proportion: As the average number of
correct answers approaches 0 or 10, the
potential variation in vocabulary scores decreases. It is
possible, however, that at least part of the apparent
decrease in residual variation is due to the relative sparseness of
data at the extremes of the education scale.
Our eye is drawn to the range of residual values, especially
because we cannot see most of the data points,
and even when variance is constant, the range tends to increase
with the amount of data.
These issues are addressed in Figure 8.2, where each data point
has been randomly “jittered” both vertically
and horizontally: Specifically, a uniform random variable on the
interval [-1/2, 1/2] was added to each
education and vocabulary score. This approach to plotting
discrete data was suggested by Chambers,
Cleveland, Kleiner, and Tukey (1983). The plot also shows the
fitted regression line for the original data, along
with lines tracing the median and first and third quartiles of the
distribution of jittered vocabulary scores for
each value of education; I excluded education values below six
from the median and quartile traces because
of the sparseness of data in this region.
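The jittering step described above is a one-liner; the sketch below applies it to synthetic scores on the same 0–20 and 0–10 grids (the values are invented, not the GSS data):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic discrete scores on the same grids as the GSS example
# (education 0-20, vocabulary 0-10); the values are invented.
education = rng.integers(0, 21, size=968).astype(float)
vocab = rng.integers(0, 11, size=968).astype(float)

def jitter(a):
    """Add uniform noise on [-1/2, 1/2] to break up overplotting."""
    return a + rng.uniform(-0.5, 0.5, size=a.shape)

educ_j, vocab_j = jitter(education), jitter(vocab)
# Each jittered point stays within half a unit of its true score,
# so the visual pattern is preserved while the pile-ups separate.
```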
Several features of Figure 8.2 are worth highlighting: (a) It is
clear from the jittered data that the observations
are particularly dense at 12 years of education, corresponding to
high-school graduation; (b) the median trace
is quite close to the linear least-squares regression line; and (c)
the quartile traces indicate that the spread of
y does not decrease appreciably at high values of education.
A discrete dependent variable violates the assumption that the
error in the regression model is normally
distributed with constant variance. This problem, like that of a
limited dependent variable, is only serious in
extreme cases—for example, when there are very few response
categories, or where a large proportion of
observations is in a small number of categories, conditional on
the values of the independent variables.
In contrast, discrete independent variables are perfectly
consistent with the regression model, which makes
no distributional assumptions about the xs other than that
they are uncorrelated with the error. Indeed, a discrete x makes
possible a straightforward hypothesis test of nonlinearity,
sometimes called a test for “lack of fit.” Likewise, it
is relatively simple to test for nonconstant error variance across
categories of a discrete independent variable
(see below).
Figure 8.2. “Jittered” scatterplot for vocabulary score by
education. A small random quantity is added to each horizontal
and vertical coordinate. The dashed line is the least-squares
regression line for the unjittered data. The solid lines are
median and quartile traces for the jittered vocabulary scores.
Testing for Nonlinearity
Suppose, for example, that we model education with a set of
dummy regressors rather than specify a linear
relationship between vocabulary score and education. Although
there are 21 conceivable education scores,
ranging from 0 through 20, none of the individuals in the
sample has 2 years of education, yielding 20
categories and 19 dummy regressors. The model becomes
y = β0 + β1d1 + β2d2 + ⋯ + β19d19 + ε    (8.1)
TABLE 8.1 Analysis of Variance for Vocabulary-Test Score,
Showing the Incremental F Test for Nonlinearity of the
Relationship Between Vocabulary and Education
Contrasting this model with the linear specification
y = β0 + β1x + ε    (8.2)
produces a test for nonlinearity, because Equation 8.2,
specifying a linear relationship, is a special case of
Equation 8.1, which captures any pattern of relationship
between E(y) and x. The resulting incremental F test
for nonlinearity appears in the analysis-of-variance of Table
8.1. There is, therefore, very strong evidence of
a linear relationship between vocabulary and education, but
little evidence of nonlinearity.
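The incremental F test amounts to comparing residual sums of squares from the two nested fits. The sketch below does this on simulated data built to mimic the vocabulary setup (same n, truly linear relationship by construction, invented scores):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated stand-in for the vocabulary data (n and coefficients are
# borrowed from the text, but the scores here are generated, not real).
n = 968
x = rng.integers(0, 21, size=n).astype(float)
y = 1.13 + 0.374 * x + rng.normal(0.0, 2.0, size=n)  # truly linear

def fit_rss(X, y):
    """Residual sum of squares and parameter count for an OLS fit."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r, X.shape[1]

# General model: one dummy per observed education level (cell means).
levels = np.unique(x)
D = (x[:, None] == levels[None, :]).astype(float)
rss_full, p_full = fit_rss(D, y)

# Restricted model: linear in education, nested inside the dummy model.
rss_lin, p_lin = fit_rss(np.column_stack([np.ones(n), x]), y)

df1, df2 = p_full - p_lin, n - p_full
F = ((rss_lin - rss_full) / df1) / (rss_full / df2)
# With a truly linear relationship, this incremental F hovers near 1.
```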
The F test for nonlinearity easily can be extended to a discrete
independent variable—say, x1—in a multiple-
regression model. Here, we contrast the more general model, in
which x1 is represented by the dummy regressors d1, …, dq−1
constructed from its q categories, with a model specifying a
linear effect of x1.
Testing for Nonconstant Error Variance
A discrete x (or combination of xs) partitions the data into q
groups. Let yij denote the jth of the ni dependent-variable
scores in the ith group. If the error variance is constant, then
the within-group variance estimates si² = Σj(yij − ȳi)²/(ni − 1),
where ȳi is the mean in the ith group, should be similar. Tests
that examine the si² directly, such as Bartlett's (1937) commonly
employed test, do not maintain their validity well when the
errors are non-normal.
Many alternative tests have been proposed. In a large-scale
simulation study, Conover, Johnson, and
Johnson (1981) demonstrate that the following simple F test is
both robust and powerful: Calculate the values zij = |yij − ỹi|,
where ỹi is the median y within the ith group. Then perform a
one-way analysis-of-variance of the variable z over the q groups.
If the error variance is not constant across the groups, then the
group means will tend to differ, producing a large value of the
F test statistic. For the vocabulary data, for example, where
education partitions the 968 observations into q = 20 groups,
this test gives F19,948 = 1.48, p = .08, providing
weak evidence of nonconstant spread.
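The test is easy to carry out by hand. The sketch below uses three invented groups, the third with triple the spread of the others, and computes the one-way ANOVA F on the absolute deviations from group medians:

```python
import numpy as np

rng = np.random.default_rng(6)

# Three illustrative groups; the third has triple the error spread.
groups = [rng.normal(0.0, sd, size=50) for sd in (1.0, 1.0, 3.0)]

# z_ij = |y_ij - median_i|, then a one-way ANOVA of z over the groups.
z = [np.abs(g - np.median(g)) for g in groups]
allz = np.concatenate(z)
n, q = allz.size, len(z)
grand = allz.mean()

ss_between = sum(zi.size * (zi.mean() - grand) ** 2 for zi in z)
ss_within = sum(((zi - zi.mean()) ** 2).sum() for zi in z)
F = (ss_between / (q - 1)) / (ss_within / (n - q))
# The tripled spread in group 3 drives F well above 1.
```

This is the same construction SciPy exposes as `scipy.stats.levene(..., center='median')`, the Brown-Forsythe variant of Levene's test.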
http://dx.doi.org/10.4135/9781412985604.n8
QRA Week 10
Dummy Variables, Regression Diagnostics, and Model
Evaluation
By now, you have gained quite a bit of experience estimating
regression models. Perhaps one thing you have noticed is that
you have not been able to include categorical predictor/control
variables. In social science, many of the predictor variables that
we might want to use are inherently qualitative and measured
categorically (i.e., race, gender, political party affiliation, etc.).
This week, you will learn how to use categorical variables in
our multiple regression models.
While we have discussed a great deal about the benefits of
multiple regression, we have said little about what can go
wrong in our models. For our models to provide accurate
estimates, we must adhere to a set of assumptions. Given the
dynamics of the social world, data gathered are often far from
perfect. This week, you will examine all of the assumptions of
multiple regression and how you can test for them.
Learning Objectives
Students will:
· Analyze multiple regression testing using dummy variables
· Analyze measures for multiple regression testing
· Construct research questions
· Evaluate assumptions of multiple regression testing
· Analyze assumptions of correlation and bivariate regression
· Analyze implications for social change
Learning Resources
Required Readings
Wagner, W. E. (2016). Using IBM® SPSS® statistics for
research methods and social science statistics (6th ed.).
Thousand Oaks, CA: Sage Publications.
· Chapter 2, “Transforming Variables” (pp. 14–32)
· Chapter 11, “Editing Output” (previously read in Weeks 2, 3, 4,
5, 6, 7, 8, and 9)
Allison, P. D. (1999). Multiple regression: A primer. Thousand
Oaks, CA: Pine Forge Press/Sage Publications.
Multiple Regression: A Primer, by Allison, P.
D. Copyright 1998 by Sage College. Reprinted by permission
of Sage College via the Copyright Clearance Center.
· Chapter 6, “What are the Assumptions of Multiple
Regression?” (pp. 119–136)
· Chapter 7, “What Can Be Done About Multicollinearity?” (pp.
137–152)
Warner, R. M. (2012). Applied statistics from bivariate through
multivariate techniques (2nd ed.). Thousand Oaks, CA: Sage
Publications.
Applied Statistics From Bivariate Through Multivariate
Techniques, 2nd Edition by Warner, R.M. Copyright 2012 by
Sage College. Reprinted by permission of Sage College via the
Copyright Clearance Center.
· Chapter 12, “Dummy Predictor Variables in Multiple
Regression”
Fox, J. (Ed.). (1991). Regression diagnostics. Thousand Oaks,
CA: SAGE Publications.
· Chapter 3, “Outlying and Influential Data” (pp. 22–41)
· Chapter 4, “Non-Normally Distributed Errors” (pp. 41–49)
· Chapter 5, “Nonconstant Error Variance” (pp. 49–54)
· Chapter 6, “Nonlinearity” (pp. 54–62)
· Chapter 7, “Discrete Data” (pp. 62–67)
Note: You will access these chapters through the Walden
Library databases.
Document: Walden University: Research Design Alignment
Table
Datasets
Document: Data Set 2014 General Social Survey (dataset file)
Use this dataset to complete this week’s Discussion.
Note: You will need the SPSS software to open this dataset.
Document: Data Set Afrobarometer (dataset file)
Use this dataset to complete this week’s Assignment.
Note: You will need the SPSS software to open this dataset.
Document: High School Longitudinal Study 2009 Dataset
(dataset file)
Use this dataset to complete this week’s Assignment.
Note: You will need the SPSS software to open this dataset.
Required Media
Laureate Education (Producer). (2016m). Regression
diagnostics and model evaluation [Video file]. Baltimore, MD:
Author.
Note: The approximate length of this media piece is 7 minutes.
In this media program, Dr. Matt Jones demonstrates regression
diagnostics and model evaluation using the SPSS software.
Accessible player
Laureate Education (Producer). (2016). Dummy variables
[Video file]. Baltimore, MD: Author.
Note: This media program is approximately 12 minutes.
In this media program, Dr. Matt Jones demonstrates dummy
variables using the SPSS software.
Accessible player
Discussion: Estimating Models Using Dummy Variables
You have had plenty of opportunity to interpret coefficients for
metric variables in regression models. Using and interpreting
categorical variables takes just a little bit of extra practice. In
this Discussion, you will have the opportunity to practice how
to recode categorical variables so they can be used in a
regression model and how to properly interpret the coefficients.
Additionally, you will gain some practice in running diagnostics
and identifying any potential problems with the model.
To prepare for this Discussion:
· Review Warner’s Chapter 12 and Chapter 2 of the Wagner
course text and the media program found in this week’s
Learning Resources and consider the use of dummy variables.
· Create a research question using the General Social Survey
dataset that can be answered by multiple regression. Using the
SPSS software, choose a categorical variable to dummy code as
one of your predictor variables.
By Day 3
Estimate a multiple regression model that answers your research
question. Post your response to the following:
1. What is your research question?
2. Interpret the coefficients for the model, specifically
commenting on the dummy variable.
3. Run diagnostics for the regression model. Does the model
meet all of the assumptions? Be sure to comment on what
assumptions were not met and the possible implications. Is there
any possible remedy for one of the assumption violations?
Be sure to support your Main Post and Response Post with
reference to the week’s Learning Resources and other scholarly
evidence in APA Style.
Assignment: Testing for Multiple Regression
You had the chance earlier in the course to practice with
multiple regression and obtain peer feedback. Now, it is time
once again to put all of that good practice to use and answer a
social research question with multiple regression. As you begin
the Assignment, be sure to pay close attention to the
assumptions of the test. Specifically, make sure the variables
are metric-level variables.
Part 1
To prepare for this Part 1 of your Assignment:
· Review the Week 9 and 10 Learning Resources and media
programs related to multiple regression.
· Using the SPSS software, open the Afrobarometer dataset or
the High School Longitudinal Study dataset (whichever you
choose) found in the Learning Resources for this week.
· Based on the dataset you chose, construct a research question
that can be answered with a multiple regression analysis.
· Once you perform your multiple regression analysis, review
Chapter 11 of the Wagner text to understand how to copy and
paste your output into your Word document.
For this Part 1 Assignment:
Write a 1- to 2-page analysis of your multiple regression results
for each research question. In your analysis, display the data for
the output. Based on your results, provide an explanation of
what the implications of social change might be.
Use proper APA format, citations, and referencing for your
analysis, research question, and display of output.
Part 2
To prepare for this Part 2 of your Assignment:
· Review Warner’s Chapter 12 and Chapter 2 of the Wagner
course text and the media program found in this week’s
Learning Resources and consider the use of dummy variables.
· Using the SPSS software, open the Afrobarometer dataset or
the High School Longitudinal Study dataset (whichever you
choose) found in this week’s Learning Resources.
· Consider the following:
· Create a research question with metric variables and one
variable that requires dummy coding. Estimate the model and
report results. Note: You are expected to perform regression
diagnostics and report that as well.
· Once you perform your analysis, review Chapter 11 of the
Wagner text to understand how to copy and paste your output
into your Word document.
For this Part 2 Assignment:
Write a 2- to 3-page analysis of your multiple regression using
dummy variables results for each research question. In your
analysis, display the data for the output. Based on your results,
provide an explanation of what the implications of social
change might be.
Use proper APA format, citations, and referencing for your
analysis, research question, and display of output.
By Day 7
Submit Parts 1 and 2 of your Assignment: Testing for Multiple
Regression.