In this presentation, we will discuss the mathematical basis of linear regression and examine p-values, hypothesis testing, and confidence intervals, together with their interpretation.
In this presentation, we will examine multicollinearity and inference with interaction terms in regression analysis, along with partial correlation and procedures for interpreting statistical data.
This document discusses regression analysis techniques. Regression analysis is used to model the relationship between a dependent variable (Y) and one or more independent variables (X1, X2, etc). Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables. The key assumptions of linear regression are outlined. Methods for estimating regression coefficients using least squares and testing the significance of regression coefficients and the overall regression model are also described. An example application involving modeling personal pollutant exposure (Y) based on hours outdoors (X1) and home pollutant levels (X2) is provided.
The document discusses statistical assumptions of multiple regression models, including normality and homoscedasticity of the error term. It explains that the error term in a regression model is assumed to be normally distributed with constant variance. Tests for normality look at skewness and kurtosis of the error distribution, while heteroscedasticity exists if the variance is not constant. If these assumptions are violated, it impacts test statistics and statistical inference from the model.
This document discusses linear correlation and linear regression. It defines linear correlation as describing the linear relationship between two continuous variables, while linear regression is a multivariate technique for continuous outcomes that estimates slopes. Linear regression assumes a linear relationship between the predictor and outcome variables, normality of the outcome at each value of the predictor, equal variances of the outcome, and independence of observations. It also discusses calculating the slope and intercept via least squares estimation to find the line that best fits the data by minimizing residuals.
This document discusses linear correlation and linear regression. It defines linear correlation as describing the linear relationship between two continuous variables, while linear regression is a multivariate technique for continuous outcomes that estimates slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed dependent variable values, equal variances, and independence of observations. Least squares estimation is used to calculate the intercept and slope that minimize the squared differences between observed and predicted dependent variable values. The slope's significance can be tested using a t-test.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be tested for statistical significance. The assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independent observations.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be used to predict the outcome variable based on the independent variables. Key assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independence of observations. The significance of regression coefficients can be tested using t-tests and the standard error of the coefficients is also discussed.
Slideset: Simple Linear Regression models.ppt
This document discusses linear correlation and linear regression. It defines linear correlation as describing the linear relationship between two continuous variables, while linear regression is a multivariate technique for continuous outcomes that estimates slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed dependent variable values, equal variances, and independence of observations. It estimates a slope and intercept through least squares estimation to minimize the squared distances between observed and predicted dependent variable values. The significance of the estimated slope can be tested using a t-test.
This document discusses linear correlation and linear regression. It defines linear correlation as describing the linear relationship between two continuous variables, while linear regression is a multivariate technique for continuous outcomes that estimates slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed errors, equal variances, and independence of observations. The slope is estimated using least squares to minimize the squared differences between observed and predicted values of the dependent variable. Significance of the slope is tested using a t-test.
This document discusses linear regression analysis. It defines simple and multiple linear regression, and explains that regression examines the relationship between independent and dependent variables. The document provides the equations for linear regression analysis, and discusses calculating the slope, intercept, standard error of the estimate, and coefficient of determination. It explains that regression analysis is widely used for prediction and forecasting in areas like advertising and product sales.
This document discusses regression analysis techniques. It begins with defining regression and its objectives, such as using independent variables to predict dependent variable values. It then covers understanding regression through layman terms and statistical terms. The rest of the document assesses goodness of fit both graphically and statistically. It discusses assumptions of regression like normality, equal variance, and independent errors. It also covers analyzing residuals, outliers, influential cases, and addressing issues like multicollinearity.
Regression analysis is a statistical technique used to investigate relationships between variables. It allows one to determine the strength of the relationship between a dependent variable (usually denoted by Y) and one or more independent variables (denoted by X). Multiple regression extends this to analyze the relationship between a dependent variable and multiple independent variables. The goals of regression analysis are to understand how the dependent variable changes with the independent variables and to use the independent variables to predict the value of the dependent variable. It requires the dependent variable to be continuous and the independent variables can be either continuous or categorical.
Regression.ppt: a basic introduction to regression with examples
Regression analysis attempts to explain variation in a dependent variable using independent variables. Simple linear regression fits a straight line to the data using an equation of y=b0+b1x+ε. The coefficient of determination R2 indicates how well the regression line represents the data, ranging from 0 to 1. Multiple linear regression generalizes this to use more than one independent variable to explain the dependent variable.
Linear Regression, Multiple Regression, and ANOVA
This document provides an overview of simple linear regression analysis. It defines key concepts such as the regression line, slope, intercept, and error term. The learning objectives are to predict dependent variable values from independent variables, interpret regression coefficients, evaluate assumptions, and make inferences. An example uses house price data to fit a linear regression model with square footage as the independent variable. The slope is interpreted as the change in house price associated with an additional square foot. A t-test is used to infer whether square footage significantly affects price.
The document discusses simple linear regression and correlation methods. It defines deterministic and probabilistic models for describing the relationship between two variables. A simple linear regression model assumes a population regression line with intercept a and slope b, where observations may deviate from the line by some random error e. Key assumptions of the model are that e has a normal distribution with mean 0 and constant variance across values of x, and errors are independent. The slope b estimates the average change in y per unit change in x.
This document discusses multiple regression analysis. It begins by introducing multiple regression as an extension of simple linear regression that allows for modeling relationships between a response variable and multiple explanatory variables. It then covers topics such as examining variable distributions, building regression models, estimating model parameters, and assessing overall model fit and significance of individual predictors. An example demonstrates using multiple regression to build a model for predicting cable television subscribers based on advertising rates, station power, number of local families, and number of competing stations.
This document provides an overview and schedule for an advanced econometrics training using Stata. The training covers topics such as hypothesis testing, multiple regression, time series models, panel data models, and difference-in-differences. It discusses assumptions of classical linear regression models and how to perform statistical inference using estimates of variance, standard error, and hypothesis testing. The document explains how to construct t-statistics and compare them to critical values from the t-distribution to test hypotheses about population parameters.
The document discusses hypothesis testing and statistical analysis techniques. It covers univariate, bivariate, and multivariate statistical analysis, which involve one, two, or three or more variables, respectively. The key steps of hypothesis testing are outlined, including deriving a null hypothesis from the research objectives, obtaining and measuring a sample, comparing the sample value to the hypothesis, and determining whether to support or not support the hypothesis based on consistency. Type I and Type II errors in hypothesis testing are defined. Common statistical tests like chi-square, t-tests, ANOVA, and correlation are introduced along with concepts like significance levels, p-values, and degrees of freedom.
The document provides an overview of analysis of variance (ANOVA) including its purpose, assumptions, computations, and applications. It explains that ANOVA tests whether population means are equal by comparing two estimators of variance - the variation between sample means and the variation within samples. If the null hypothesis that all population means are equal is true, the between-sample variation will be small relative to the within-sample variation. The document outlines the computations and formulas behind ANOVA including definitions of terms like treatment deviations, error deviations, and sums of squares. It also describes how to interpret and report ANOVA results including the F-statistic and ANOVA table.
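The between-sample versus within-sample variance comparison described above can be sketched numerically. This is a minimal illustration with made-up data (three small samples), not the document's own example:

```python
# One-way ANOVA F-statistic computed "by hand": compare the variation
# between sample means to the variation within samples.
groups = [
    [4.0, 5.0, 6.0],   # sample 1 (hypothetical data)
    [7.0, 8.0, 9.0],   # sample 2
    [4.0, 6.0, 8.0],   # sample 3
]

k = len(groups)                                    # number of treatments
n_total = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n_total

# Between-group (treatment) sum of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group (error) sum of squares
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between               # mean square between
ms_within = ss_within / df_within                  # mean square within
F = ms_between / ms_within
print(F)
```

A large F (between-sample variation big relative to within-sample variation) is evidence against the null hypothesis that all population means are equal; the F-statistic would be compared against an F-distribution with (df_between, df_within) degrees of freedom.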
Pampers Case
In an increasingly competitive diaper market, P&G’s marketing department wanted to formulate new approaches to the construction and marketing of Pampers to position them effectively against Huggies without cannibalizing Luvs. They surveyed 300 mothers of infants. Each was given a randomly selected brand of diaper (either Pampers, Luvs, or Huggies) and asked to rate that diaper on nine attributes and to give her overall preference for the brand. Preference was obtained on a 7-point Likert scale (1 = not at all preferred, 7 = greatly preferred). Diaper ratings on the nine attributes were also obtained on a 7-point Likert scale (1 = very unfavorable, 7 = very favorable). The study was designed so that each of the three brands appeared 100 times. The goal of the study was to learn which attributes of diapers were most important in influencing purchase preference (Y). The nine attributes used in the study were:
Variable   Attribute       Marketing option
X1         count per box   Desire large counts per box?
X2         price           Pay a premium price?
X3         value           Promote high value
X4         skin care       Offer high degree of skin care
X5         style           Prints/color vs. plain diapers
X6         absorbency      Regular vs. superabsorbency
X7         leakage         Narrow/tapered vs. regular crotch
X8         comfort/size    Extra padding and form-fitting gathers
X9         taping          Re-sealable tape vs. regular tape
Question (will be discussed in week 8):
If you don’t have SPSS software at home, you may be able to download a trial version (good for 21 days) from spss.com: Software > Statistics family > PASW Statistics 17.0 > click “free trial” and download.
1. Run a regression analysis for brand preference that includes all independent variables in the model, and describe how meaningful the model is. Interpret the results for management.
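The survey data themselves are not reproduced here, so as a hedged sketch of question 1, the ordinary-least-squares fit can be illustrated with a tiny made-up data set and two predictors standing in for the nine attributes X1..X9 (the mechanics are identical, just with a larger normal-equations system):

```python
# Multiple regression "by hand": model y = b0 + b1*x1 + b2*x2,
# estimated via the normal equations (X'X) beta = X'y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]            # hypothetical predictor 1
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]            # hypothetical predictor 2
y  = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # exact plane, no noise

# Design matrix rows with an intercept column of 1s.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix (expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Cramer's rule: replace column i of X'X with X'y and take determinant ratios.
d = det3(XtX)
beta = []
for i in range(3):
    mi = [row[:] for row in XtX]
    for r in range(3):
        mi[r][i] = Xty[r]
    beta.append(det3(mi) / d)

print(beta)   # recovers [1.0, 2.0, 3.0] up to rounding
```

In practice one would hand this to a statistics package (SPSS, as the assignment suggests) and judge how meaningful the model is from R², the overall F-test, and the t-tests on the individual coefficients.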
6. Correlation and Regression
Statistics Associated with Frequency Distribution: Measures of Location

The mean, or average value, is the most commonly used measure of central tendency. The mean, X̄, is given by

    X̄ = ( ∑ Xi ) / n,  summed over i = 1, …, n

where
    Xi = observed values of the variable X
    n  = number of observations (sample size)

The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
The median of a sample is the middle value when the data are arranged in ascending or descending order.
Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The excess kurtosis of a normal distribution is zero.
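These two shape measures can be computed directly from standardized moments. A minimal sketch with made-up data (the formulas, not the data, are the point):

```python
# Sample skewness (third standardized moment) and excess kurtosis
# (fourth standardized moment minus 3), using population-form moments.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical values
n = len(data)
mean = sum(data) / n
sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

# Skewness > 0: right tail heavier; < 0: left tail heavier.
skewness = sum(((x - mean) / sd) ** 3 for x in data) / n
# Excess kurtosis 0 for a normal distribution; > 0 more peaked, < 0 flatter.
excess_kurtosis = sum(((x - mean) / sd) ** 4 for x in data) / n - 3

print(round(skewness, 5), round(excess_kurtosis, 5))
```

Here the long right tail (the 9) produces positive skewness, matching the verbal definition above.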
This document discusses probability distributions for random variables. It introduces discrete distributions like the binomial and Poisson distributions which are used for counting experiments. It also introduces continuous distributions like the normal distribution which are defined over continuous ranges of values. Key concepts covered include probability density functions, cumulative distribution functions, and how to relate random variables with specific parameters to standard distributions. Examples are provided to illustrate concepts like modeling the number of plant stems in a sampling area with a Poisson distribution.
Descriptive Statistics Formula Sheet

Characteristic                        Sample statistic                           Population parameter
raw scores                            x, y, . . .                                X, Y, . . .
mean (central tendency)               M = ∑x / n                                 μ = ∑X / N
range (interval/ratio data)           highest minus lowest value                 highest minus lowest value
deviation (distance from mean)        deviation = (x − M)                        deviation = (X − μ)
average deviation (average
  distance from mean)                 ∑(x − M) / n = 0                           ∑(X − μ) / N
sum of the squares (SS)
  (computational formula)             SS = ∑x² − (∑x)²/n                         SS = ∑X² − (∑X)²/N
variance (average deviation² or
  standard deviation²)
  (computational formula)             s² = [∑x² − (∑x)²/n] / (n − 1) = SS/df     σ² = [∑X² − (∑X)²/N] / N
standard deviation (average
  deviation or distance from mean)
  (computational formula)             s = √( [∑x² − (∑x)²/n] / (n − 1) )         σ = √( [∑X² − (∑X)²/N] / N )
Z scores (standard scores;
  mean = 0, standard deviation = ±1)  Z = (x − M) / s = deviation / stand. dev.  Z = (X − μ) / σ
                                      x = M + Zs                                 X = μ + Zσ
Area Under the Normal Curve -1s to +1s = 68.3%
-2s to +2s = 95.4%
-3s to +3s = 99.7%
Using the Z Score Table for the Normal Distribution
(Note: see graph and table in A-23)

For percentiles (proportion or %) below X:
  - for positive Z scores, use the body column
  - for negative Z scores, use the tail column

For proportions or percentages above X:
  - for positive Z scores, use the tail column
  - for negative Z scores, use the body column

To find the percentage/proportion between two X values:
  1. Convert each X to a Z score.
  2. Find the appropriate area (body or tail) for each Z score.
  3. Subtract or add areas as appropriate.
  4. Convert the area to a percentage (area × 100 = %).
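The same steps can be sketched in code, with the closed-form normal CDF (via math.erf) standing in for the printed Z table. The mean, SD, and X values here are made up:

```python
# Proportion of a normal distribution between two X values,
# following the convert-to-Z, look-up-area, subtract procedure above.
import math

def normal_cdf(z):
    """Area below z under the standard normal curve (the 'body' up to z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 15.0          # hypothetical population mean and SD
x_low, x_high = 85.0, 115.0      # two X values of interest

# Step 1: convert each X to a Z score.
z_low = (x_low - mu) / sigma     # -1.0
z_high = (x_high - mu) / sigma   # +1.0

# Steps 2-4: find each area, subtract, and convert to a percentage.
proportion = normal_cdf(z_high) - normal_cdf(z_low)
print(round(100 * proportion, 1))   # ~68.3, matching the -1s to +1s rule
```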
Regression lines
(central-tendency line for all points; used for predictions only; formulas use raw scores)

    y = bx + a        (plug in x to predict y; b = slope, a = y-intercept)

    b = [ ∑xy − (∑x)(∑y)/n ] / [ ∑x² − (∑x)²/n ]

    a = My − b·Mx        (where My is the mean of y and Mx is the mean of x)
SEest (standard error of the estimate; measures the accuracy of predictions and has the same properties as a standard deviation)
Pearson Correlation Coefficient
(used to measure the relationship between x and y; the definitional formula uses Z scores, the computational formula below uses raw scores)

    r = [ ∑xy − (∑x)(∑y)/n ] / √( [ ∑x² − (∑x)²/n ] · [ ∑y² − (∑y)²/n ] )

Conceptually,

    r = (degree to which x and y vary together) / (degree to which x and y vary separately)

    r² = proportion of variance accounted for (an estimate of the accuracy of predictions)
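The computational formulas for the regression line and for r share the same building blocks (∑x, ∑y, ∑x², ∑y², ∑xy), so both can be computed in one pass. A small sketch with made-up data:

```python
# Slope b, intercept a, and Pearson r from the computational formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical raw scores
y = [2.1, 3.9, 6.2, 7.8, 10.0]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(a * b for a, b in zip(x, y))

b = (sxy - sx * sy / n) / (sxx - sx * sx / n)        # slope
a = sy / n - b * (sx / n)                            # a = My - b*Mx
r = (sxy - sx * sy / n) / (
    ((sxx - sx * sx / n) * (syy - sy * sy / n)) ** 0.5
)

y_hat = [a + b * xi for xi in x]                     # plug in x to predict y
print(round(b, 3), round(a, 3), round(r, 4))
```

Since these y values fall almost exactly on a straight line, r comes out very close to 1, and r² gives the proportion of variance in y accounted for by x.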
PSYC 2317 Mark W. Tengler, M.S.
Assignment #9
Hypothesis Testing
9.1 Briefly explain in your own words the advantage of using an alpha level (α) = .01
versus an α = .05. In general, what is the disadvantage of using a smaller alpha
level?
9.2 Discuss in your own words the errors that can be made in hypothesis testing.
a. What is a type I error? Why might it occur?
b. What is a type II error? How does it happen?
9.3 The term error is used in two different ways in the context of a hypothesis test.
First, there is the concept of sta
This document discusses estimating parameters and determining sample sizes from populations. It covers estimating population proportions, means, standard deviations, and variances. For each parameter, it describes how to construct confidence intervals and determine the necessary sample size. Formulas are provided for margin of error, t-scores, z-scores and the chi-square distribution, which is used for estimating variances and standard deviations. Examples show how to apply the concepts to find confidence intervals and critical values for specific population problems.
Chapter 10: Correlation and Regression
10.1: Correlation
This document provides an overview of regression analysis, including what regression is, how it works, assumptions of regression, and how to assess the model fit and check assumptions. Regression allows us to predict a dependent variable from one or more independent variables. Key steps discussed include checking the normality, homoscedasticity and independence of residuals, identifying influential observations, and addressing issues like multicollinearity. Graphical methods like normal probability plots and scatter plots of residuals are presented as ways to check assumptions.
Linear regression and correlation analysis ppt @ BEC DOMS
This document introduces linear regression and correlation analysis. It discusses calculating and interpreting the correlation coefficient and linear regression equation to determine the relationship between two variables. It covers scatter plots, the assumptions of regression analysis, and using regression to predict and describe relationships in data. Key terms introduced include the correlation coefficient, linear regression model, explained and unexplained variation, and the coefficient of determination.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression analyzes the relationship between a continuous outcome (dependent) variable and one or more independent (predictor) variables. Linear regression finds the line of best fit to model this relationship and estimates coefficients that can be tested for statistical significance. The assumptions of linear regression include a linear relationship between variables, normally distributed errors, homogeneity of variance, and independent observations.
This document discusses linear correlation and linear regression. It defines linear correlation as showing the linear relationship between two continuous variables, while linear regression is a multivariate technique used when the outcome is continuous that provides slopes. Linear regression assumes a linear relationship between an independent and dependent variable, normally distributed errors, equal variances, and independence of observations. The slope is estimated using least squares to minimize the squared differences between observed and predicted values of the dependent variable. Significance of the slope is tested using a t-test.
This document discusses linear regression analysis. It defines simple and multiple linear regression, and explains that regression examines the relationship between independent and dependent variables. The document provides the equations for linear regression analysis, and discusses calculating the slope, intercept, standard error of the estimate, and coefficient of determination. It explains that regression analysis is widely used for prediction and forecasting in areas like advertising and product sales.
This document discusses regression analysis techniques. It begins with defining regression and its objectives, such as using independent variables to predict dependent variable values. It then covers understanding regression through layman terms and statistical terms. The rest of the document assesses goodness of fit both graphically and statistically. It discusses assumptions of regression like normality, equal variance, and independent errors. It also covers analyzing residuals, outliers, influential cases, and addressing issues like multicollinearity.
Regression analysis is a statistical technique used to investigate relationships between variables. It allows one to determine the strength of the relationship between a dependent variable (usually denoted by Y) and one or more independent variables (denoted by X). Multiple regression extends this to analyze the relationship between a dependent variable and multiple independent variables. The goals of regression analysis are to understand how the dependent variable changes with the independent variables and to use the independent variables to predict the value of the dependent variable. It requires the dependent variable to be continuous and the independent variables can be either continuous or categorical.
Regression.ppt basic introduction of regression with exampleshivshankarshiva98
Regression analysis attempts to explain variation in a dependent variable using independent variables. Simple linear regression fits a straight line to the data using an equation of y=b0+b1x+ε. The coefficient of determination R2 indicates how well the regression line represents the data, ranging from 0 to 1. Multiple linear regression generalizes this to use more than one independent variable to explain the dependent variable.
linear Regression, multiple Regression and AnnovaMansi Rastogi
This document provides an overview of simple linear regression analysis. It defines key concepts such as the regression line, slope, intercept, and error term. The learning objectives are to predict dependent variable values from independent variables, interpret regression coefficients, evaluate assumptions, and make inferences. An example uses house price data to fit a linear regression model with square footage as the independent variable. The slope is interpreted as the change in house price associated with an additional square foot. A t-test is used to infer whether square footage significantly affects price.
The document discusses simple linear regression and correlation methods. It defines deterministic and probabilistic models for describing the relationship between two variables. A simple linear regression model assumes a population regression line with intercept a and slope b, where observations may deviate from the line by some random error e. Key assumptions of the model are that e has a normal distribution with mean 0 and constant variance across values of x, and errors are independent. The slope b estimates the average change in y per unit change in x.
This document discusses multiple regression analysis. It begins by introducing multiple regression as an extension of simple linear regression that allows for modeling relationships between a response variable and multiple explanatory variables. It then covers topics such as examining variable distributions, building regression models, estimating model parameters, and assessing overall model fit and significance of individual predictors. An example demonstrates using multiple regression to build a model for predicting cable television subscribers based on advertising rates, station power, number of local families, and number of competing stations.
This document provides an overview and schedule for an advanced econometrics training using Stata. The training covers topics such as hypothesis testing, multiple regression, time series models, panel data models, and difference-in-differences. It discusses assumptions of classical linear regression models and how to perform statistical inference using estimates of variance, standard error, and hypothesis testing. The document explains how to construct t-statistics and compare them to critical values from the t-distribution to test hypotheses about population parameters.
The document discusses hypothesis testing and statistical analysis techniques. It covers univariate, bivariate, and multivariate statistical analysis, which involve one, two, or three or more variables, respectively. The key steps of hypothesis testing are outlined, including deriving a null hypothesis from the research objectives, obtaining and measuring a sample, comparing the sample value to the hypothesis, and determining whether to support or not support the hypothesis based on consistency. Type I and Type II errors in hypothesis testing are defined. Common statistical tests like chi-square, t-tests, ANOVA, and correlation are introduced along with concepts like significance levels, p-values, and degrees of freedom.
The document provides an overview of analysis of variance (ANOVA) including its purpose, assumptions, computations, and applications. It explains that ANOVA tests whether population means are equal by comparing two estimators of variance - the variation between sample means and the variation within samples. If the null hypothesis that all population means are equal is true, the between-sample variation will be small relative to the within-sample variation. The document outlines the computations and formulas behind ANOVA including definitions of terms like treatment deviations, error deviations, and sums of squares. It also describes how to interpret and report ANOVA results including the F-statistic and ANOVA table.
Pampers Case
In an increasingly competitive diaper market, P&G’s marketing department wanted to formulate new approaches to the construction and marketing of Pampers to position them effectively against Huggies without cannibalizing Luvs. They surveyed 300 mothers of infants. Each was given a randomly selected brand of diaper (either Pampers, Luvs, or Huggies) and asked to rate that diaper on nine attributes and to give her overall preference for the brand. Preference was obtained on a 7-point Likert scale (1 = not at all preferred, 7 = greatly preferred). Diaper ratings on the nine attributes were also obtained on a 7-point Likert scale (1 = very unfavorable, 7 = very favorable). The study was designed so that each of the three brands appeared 100 times. The goal of the study was to learn which attributes of diapers were most important in influencing purchase preference (Y). The nine attributes used in the study were:
Variable | Attribute     | Marketing options
X1       | count per box | Desire large counts per box?
X2       | price         | Pay a premium price?
X3       | value         | Promote high value
X4       | skin care     | Offer high degree of skin care
X5       | style         | Prints/color vs. plain diapers
X6       | absorbency    | Regular vs. superabsorbency
X7       | leakage       | Narrow/tapered vs. regular crotch
X8       | comfort/size  | Extra padding and form-fitting gathers
X9       | taping        | Re-sealable tape vs. regular tape
Question (will be discussed in week 8):
If you don’t have SPSS software at home, you may be able to download a trial version (good for 21 days) from spss.com: go to Software → Statistics family → PASW Statistics 17.0, then click “free trial” and download.
1. Run a regression analysis for brand preference that includes all independent variables in the model, and describe how meaningful the model is. Interpret the results for management.
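A minimal sketch of the regression asked for in Question 1. The survey data file is not included here, so the ratings below are simulated stand-ins (only the X1–X9 / Y column structure follows the case description); the mechanics of fitting the model and reading R² carry over to the real data.

```python
import numpy as np

# Hypothetical stand-in for the survey: 300 mothers, nine 7-point Likert
# ratings (X1-X9), and a preference score (Y) driven here, by construction,
# mostly by value (X3), absorbency (X6), and leakage (X7).
rng = np.random.default_rng(0)
X = rng.integers(1, 8, size=(300, 9)).astype(float)
y = 1.0 + 0.5 * X[:, 2] + 0.4 * X[:, 5] + 0.3 * X[:, 6] + rng.normal(0, 0.5, 300)

# Least-squares fit of Y on X1..X9 plus an intercept column
A = np.column_stack([np.ones(len(y)), X])
coefs, sse, rank, _ = np.linalg.lstsq(A, y, rcond=None)

# R^2: proportion of variation in preference explained by the nine attributes
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse[0] / sst
print("coefficients:", np.round(coefs, 2))
print("R^2:", round(r2, 3))
```

For management, the coefficients with the largest (significant) magnitudes identify the attributes that most influence preference, and R² describes how meaningful the model is overall.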
6. Correlation and Regression
Statistics Associated with Frequency Distribution: Measures of Location

The mean, or average value, is the most commonly used measure of central tendency. The mean, x̄, is given by

x̄ = ( Σᵢ₌₁ⁿ Xᵢ ) / n

where
Xᵢ = observed values of the variable X
n = number of observations (sample size)

The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
The median of a sample is the middle value when the data are arranged in ascending or descending order.
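The three measures of location described above can be sketched in a few lines, using a small made-up sample:

```python
import statistics

# Illustrative sample of six observations
x = [2, 3, 3, 5, 7, 10]

mean = sum(x) / len(x)          # x̄ = Σxi / n
median = statistics.median(x)   # middle value of the ordered data
mode = statistics.mode(x)       # most frequently occurring value

print(mean, median, mode)       # 5.0 4.0 3
```

Note that with an even number of observations the median is the average of the two middle values.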
Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, the distribution is more peaked than the normal distribution; if it is negative, the distribution is flatter.
This document discusses probability distributions for random variables. It introduces discrete distributions like the binomial and Poisson distributions which are used for counting experiments. It also introduces continuous distributions like the normal distribution which are defined over continuous ranges of values. Key concepts covered include probability density functions, cumulative distribution functions, and how to relate random variables with specific parameters to standard distributions. Examples are provided to illustrate concepts like modeling the number of plant stems in a sampling area with a Poisson distribution.
Descriptive Statistics Formula Sheet
Characteristic | Sample statistic | Population parameter
raw scores | x, y, … | X, Y, …
mean (central tendency) | M = Σx / n | μ = ΣX / N
range (interval/ratio data) | highest minus lowest value | highest minus lowest value
deviation (distance from mean) | deviation = (x − M) | deviation = (X − μ)
average deviation (average distance from mean) | Σ(x − M) / n = 0 | Σ(X − μ) / N = 0
sum of the squares (SS) (computational formula) | SS = Σx² − (Σx)² / n | SS = ΣX² − (ΣX)² / N
variance (average squared deviation) (computational formula) | s² = [Σx² − (Σx)² / n] / (n − 1) = SS / df | σ² = [ΣX² − (ΣX)² / N] / N
standard deviation (average deviation or distance from mean) (computational formula) | s = √( [Σx² − (Σx)² / n] / (n − 1) ) | σ = √( [ΣX² − (ΣX)² / N] / N )
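The sample and population formulas above differ only in the divisor (n − 1 vs. N). A short sketch with a made-up sample:

```python
import math

# Sample vs. population variance, using the computational formula SS = Σx² − (Σx)²/n
x = [4, 8, 6, 5, 3, 4]
n = len(x)
ss = sum(v * v for v in x) - sum(x) ** 2 / n   # SS = 166 − 900/6 = 16

s2 = ss / (n - 1)       # sample variance: divide by df = n − 1
sigma2 = ss / n         # population variance: divide by N
s = math.sqrt(s2)       # sample standard deviation

print(round(s2, 3), round(sigma2, 3), round(s, 3))   # 3.2 2.667 1.789
```

The sample version divides by n − 1 so that s² is an unbiased estimate of the population variance.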
Z scores (standard scores; mean = 0, standard deviation = ±1.0):
Z = (x − M) / s = deviation / standard deviation;  x = M + Zs
Z = (X − μ) / σ;  X = μ + Zσ
Area Under the Normal Curve -1s to +1s = 68.3%
-2s to +2s = 95.4%
-3s to +3s = 99.7%
Using Z Score Table for Normal Distribution
(Note: see graph and table in A-23)
for percentiles (proportion or %) below X
for positive Z scores – use body column
for negative Z scores – use tail column
for proportions or percentage above X
for positive Z scores – use tail column
for negative Z scores – use body column
to discover percentage / proportion between two X values
1. Convert each X to Z score
2. Find appropriate area (body or tail) for each Z score
3. Subtract or add areas as appropriate
4. Change area to % (area × 100 = %)
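The four table-lookup steps above can be sketched in code; `math.erf` gives the normal-curve area directly, standing in for the Z table (the mean and standard deviation below are made-up values for illustration):

```python
import math

def normal_cdf(z):
    # Proportion of the standard normal curve below z (the "body" column)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

M, s = 100.0, 15.0          # illustrative sample mean and standard deviation

# Steps 1-2: convert each X to a Z score and find its area
z_lo = (85 - M) / s         # -1.0
z_hi = (115 - M) / s        # +1.0

# Steps 3-4: subtract areas and change the area to a percentage
pct_between = (normal_cdf(z_hi) - normal_cdf(z_lo)) * 100
print(round(pct_between, 1))   # ≈ 68.3, matching the -1s to +1s rule above
```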
Regression lines (central-tendency line for all points; used for predictions only; formula uses raw scores):

ŷ = bx + a   (plug in x to predict y)

slope: b = [Σxy − (Σx)(Σy) / n] / [Σx² − (Σx)² / n]
y-intercept: a = My − b·Mx, where My is the mean of y and Mx is the mean of x
SEest (measures accuracy of predictions; same properties as standard deviation)
Pearson Correlation Coefficient (used to measure relationship; uses Z scores):

r = [Σxy − (Σx)(Σy) / n] / √( [Σx² − (Σx)² / n] · [Σy² − (Σy)² / n] )

r = (degree to which x and y vary together) / (degree to which x and y vary separately)

r² = estimate of the proportion (%) of accuracy of predictions
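A sketch of the computational formula for r, again on small illustrative data:

```python
import math

# Pearson r from raw scores, following the computational formula above
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

num = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
den = math.sqrt((sum(a * a for a in x) - sum(x) ** 2 / n) *
                (sum(b * b for b in y) - sum(y) ** 2 / n))

r = num / den
print(round(r, 3), round(r * r, 3))   # r, and r² as the shared variation
```

Here r² = 0.6 would mean 60% of the variation in y is accounted for by its linear relationship with x.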
PSYC 2317 Mark W. Tengler, M.S.
Assignment #9
Hypothesis Testing
9.1 Briefly explain in your own words the advantage of using an alpha level (α) = .01 versus an α = .05. In general, what is the disadvantage of using a smaller alpha level?
9.2 Discuss in your own words the errors that can be made in hypothesis testing.
a. What is a type I error? Why might it occur?
b. What is a type II error? How does it happen?
9.3 The term error is used in two different ways in the context of a hypothesis test.
First, there is the concept of sta
This document discusses estimating parameters and determining sample sizes from populations. It covers estimating population proportions, means, standard deviations, and variances. For each parameter, it describes how to construct confidence intervals and determine the necessary sample size. Formulas are provided for margin of error, t-scores, z-scores and the chi-square distribution, which is used for estimating variances and standard deviations. Examples show how to apply the concepts to find confidence intervals and critical values for specific population problems.
Chapter 10: Correlation and Regression
10.1: Correlation
This document provides an overview of regression analysis, including what regression is, how it works, assumptions of regression, and how to assess the model fit and check assumptions. Regression allows us to predict a dependent variable from one or more independent variables. Key steps discussed include checking the normality, homoscedasticity and independence of residuals, identifying influential observations, and addressing issues like multicollinearity. Graphical methods like normal probability plots and scatter plots of residuals are presented as ways to check assumptions.
Linear regression and correlation analysis ppt @ bec doms
This document introduces linear regression and correlation analysis. It discusses calculating and interpreting the correlation coefficient and linear regression equation to determine the relationship between two variables. It covers scatter plots, the assumptions of regression analysis, and using regression to predict and describe relationships in data. Key terms introduced include the correlation coefficient, linear regression model, explained and unexplained variation, and the coefficient of determination.
Categorical Data and Statistical Analysis
In this presentation, we will introduce two tests and hypothesis testing based on it, and different non-parametric methods such as the Kolmogorov-Smirnov test, the Wilcoxon’s signed-rank test, the Mann-Whitney U test, and the Kruskal-Wallis test.
In this presentation, you will differentiate the ANOVA and ANCOVA statistical methods, and identify real-world situations where the ANOVA and ANCOVA methods for statistical inference are applied.
This document outlines an agenda for a university course on classification methods taught by Dr. S. Shivendu. The objectives of the course are to understand statistical concepts of principal component analysis and factor analysis, interpret results, and use SAS software. The agenda includes overviews of statistical concepts, principal component analysis, factor analysis, and textbooks. The document provides background information on these topics, including why they are used, basic concepts, properties, assumptions, and how the analyses work.
This document provides an overview of an introductory course on statistical concepts at the University of South Florida. It outlines the course objectives, which are to identify the course structure, recap foundational statistics concepts, and identify the programming structure in SAS. The agenda covers topics like data analytics, probability, statistical inference, distributions, and SAS basics. It also discusses key statistical thinking concepts like variation, inference from data, and the relationship between data, information, knowledge and wisdom. Hypothesis testing and its errors and power are explained. Issues with correlated data are also covered.
Linear Regression
1. U N I V E R S I T Y O F S O U T H F L O R I D A //
Linear Regression Concepts
Dr. S. Shivendu
Objectives
Linear Regression Concepts
01. Identify the mathematical basis of linear regression.
02. Differentiate statistical inferences about relationships based on regression output.
03. Analyze the concepts of p-value, hypothesis testing, and confidence intervals, and their interpretation.
Agenda
Linear Regression Concepts
Regression Analysis: Introduction
Linear Regression: Concepts
Assumptions: Concepts
Coefficient Confidence Intervals: Concepts
Prediction Confidence Intervals: Concepts
Models
A mathematical model is a mathematical expression of some phenomenon. Models describe relationships between variables. There are two types: deterministic models and probabilistic models.
Deterministic Models
Hypothesize exact relationships.
Suitable when the relationship is certain and known.
Example: Force is exactly mass times acceleration
F = m·a
Probabilistic Models
The relationship is not certain and not all factors that impact the outcome are known, so we hypothesize two components: a deterministic component and a random error.
Example: Sales volume (y) is 10 times advertising spending (x) plus random error:
y = 10x + ε
The random error ε may be due to factors other than advertising.
Regression Models
Answer the question: “What is the relationship between the variables?”
The equations use one numerical dependent (response) variable and one or more numerical or categorical independent (explanatory) variables.
Used mainly for estimating the strength of the relationship and for prediction.
Regression Modeling Steps
1. Hypothesize the deterministic relationship between the response (dependent) variable and one or more explanatory (independent) variables in the population.
2. Specify the probability distribution of the random error term; estimate the standard deviation of the error.
3. Estimate the unknown model parameters.
4. Interpret the estimated parameters. (What is a parameter?)
Model Specification is Based on Theory
Theory of field
(e.g., Sociology)
Mathematical
theory
Previous research
“Common sense”
Types of Regression Models
Regression models are either simple (1 explanatory variable) or multiple (2+ explanatory variables), and each type can be linear or non-linear.
Linear Regression Models
The relationship between the variables is a linear function:

y = β0 + β1x + ε

where y is the dependent (response) variable, x is the independent (explanatory) variable, β0 is the population y-intercept, β1 is the population slope, and ε is the random error.
Population Linear Regression Model

yi = β0 + β1xi + εi   (observed value)
E(y) = β0 + β1x       (population regression line)

εi = random error
Sample Linear Regression Model

yi = β̂0 + β̂1xi + ε̂i   (observed value)
ŷi = β̂0 + β̂1xi         (sample regression line, used to predict unsampled observations)

ε̂i = estimated random error (residual)
Estimating Parameters: Least Squares Method
Regression modeling steps: hypothesize deterministic component; specify probability distribution of random error term; estimate unknown model parameters (this step); evaluate model; use model for prediction and estimation.
Scattergram
A plot of all (xi, yi) pairs. It suggests how well the model will fit.
Thinking Challenge
How would you draw a line through the points? How would you determine which line fits best?
Least Squares
“Best fit” means the differences between the actual y values and the estimated or predicted y values are a minimum. Because positive differences offset negative ones, Least Squares minimizes the Sum of the Squared Errors (SSE):

Σᵢ₌₁ⁿ (yi − ŷi)² = Σᵢ₌₁ⁿ ε̂i²
Least Squares Graphically
Each residual is the vertical distance from an observed point to the fitted line ŷi = β̂0 + β̂1xi; for example, y2 = β̂0 + β̂1x2 + ε̂2.

LS minimizes Σᵢ₌₁ⁿ ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²
Coefficient Equations

Prediction equation: ŷ = β̂0 + β̂1x

Slope:
β̂1 = SSxy / SSxx, where SSxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ) / n and SSxx = Σxᵢ² − (Σxᵢ)² / n

y-intercept:
β̂0 = ȳ − β̂1x̄
Interpretation of Coefficients
Slope (β̂1): estimated y changes by β̂1 for each 1-unit increase in x. If β̂1 = 2, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x).
Y-intercept (β̂0): the average value of y when x = 0. If β̂0 = 4, then average Sales (y) is expected to be 4 when Advertising (x) is 0.
Parameter Estimation Computer Output

Parameter Estimates
                  Parameter   Standard   T for H0:
Variable    DF    Estimate    Error      Param=0    Prob>|T|
INTERCEP    1     -0.1000     0.6350     -0.157     0.8849
ADVERT      1      0.7000     0.1914      3.656     0.0354

Here β̂0 = −0.1 and β̂1 = 0.7, so the prediction equation is ŷ = −.1 + .7x.
Coefficient Interpretation Solution
Slope (β̂1): Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x).
Y-intercept (β̂0): the average value of Sales Volume (y) is −.10 units when Advertising (x) is 0. This is difficult to explain to a marketing manager, since we would expect some sales without advertising.
Probability Distribution of Random Error
Regression modeling steps: hypothesize deterministic component; specify probability distribution of random error term (this step); estimate unknown model parameters; evaluate model; use model for prediction and estimation.
Linear Regression Assumptions
1. The mean of the probability distribution of the error, ε, is 0.
2. The probability distribution of ε is approximately normal.
3. The probability distribution of ε has a constant variance.
4. Errors are independent.
Error Probability Distribution
(Figure: identical normal error distributions centered on the line E(y) = β0 + β1x at x1, x2, and x3, illustrating constant variance across values of x.)
Random Error Variation
Variation of the actual y from the predicted ŷ is measured by the standard error of the regression model, s: the sample standard deviation of ε̂. It affects several things, such as parameter significance and prediction accuracy.
Variation Measures

Total sum of squares: Σ(yi − ȳ)²
Explained sum of squares: Σ(ŷi − ȳ)²
Unexplained sum of squares (SSE): Σ(yi − ŷi)²
Estimation of Variance of Error σ²

s² = SSE / (n − 2), where SSE = Σ(yi − ŷi)²
s = √s² = √( SSE / (n − 2) )
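The variation measures and the error-variance estimate can be verified together on a small example; the five-point data set below is illustrative (its least-squares fit is the slope .7 and intercept −.1 used in the deck’s output):

```python
import math

# Sketch: SST = SSR + SSE decomposition and s² = SSE/(n−2)
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

b, a = 0.7, -0.1                       # least-squares fit for these data
y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / n

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
sst = sum((yi - y_bar) ** 2 for yi in y)                # total

s2 = sse / (n - 2)        # two df lost estimating β̂0 and β̂1
s = math.sqrt(s2)
print(round(sse, 2), round(ssr, 2), round(sst, 2), round(s, 3))
```

The explained and unexplained parts add up to the total sum of squares, which is what R² (SSR/SST) builds on.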
Residual Analysis
The residual for observation i, ei = yi − ŷi, is the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals:
Examine for the linearity assumption
Evaluate the independence assumption
Evaluate the normal distribution assumption
Examine for constant variance at all levels of X (homoscedasticity)
Residual Analysis for Linearity
(Plots: a curved, systematic pattern in the residuals vs. x indicates the relationship is not linear; a random scatter around zero indicates linearity.)
Residual Analysis for Independence
(Plots: a systematic pattern in the residuals vs. X indicates the observations are not independent; a random scatter indicates independence.)
Check for Normality
Examine the Stem-and-Leaf Display of the Residuals
Examine the Boxplot of the Residuals
Examine the Histogram of the Residuals
Construct a Normal Probability Plot of the Residuals
Residual Analysis for Normality
When using a normal probability plot (percent vs. residual), normal errors will display approximately as a straight line.
Residual Analysis for Equal Variance
(Plots: a fanning or funnel shape in the residuals vs. x indicates non-constant variance; an even band around zero indicates constant variance.)
Interpreting the Model: Testing for Significance
Regression modeling steps: hypothesize deterministic component; specify probability distribution of random error term; estimate unknown model parameters; interpret model (this step).
Test of Slope Coefficient
Shows whether there is a linear relationship between x and y. It involves the population slope β1, and its theoretical basis is the sampling distribution of the slope.
Hypotheses:
H0: β1 = 0 (no linear relationship)
Ha: β1 ≠ 0 (linear relationship)
Sampling Distribution of Sample Slopes
Each sample drawn from the population yields its own sample line and slope (e.g., Sample 1: 2.5, Sample 2: 1.6, Sample 3: 1.8, Sample 4: 2.1, …). The very large number of possible sample slopes forms the sampling distribution of β̂1, centered at the population slope β1 with standard error S_β̂1.
Slope Coefficient Test Statistic

t = β̂1 / S_β̂1,   df = n − 2

where S_β̂1 = s / √SSxx and SSxx = Σxᵢ² − (Σxᵢ)² / n
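The test statistic can be computed by hand; the small data set below is illustrative, and it reproduces the slope (.7), standard error (.1914), and t value (3.656) shown in the deck’s computer output:

```python
import math

# Sketch: t statistic for H0: β1 = 0, with df = n − 2
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

ss_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

# Standard error of the regression, then of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)

t = b1 / se_b1           # compare to the t critical value with df = n − 2 = 3
print(round(b1, 2), round(se_b1, 4), round(t, 3))
```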
Test of Slope Coefficient Computer Output

Parameter Estimates
                  Parameter   Standard   T for H0:
Variable    DF    Estimate    Error      Param=0    Prob>|T|
INTERCEP    1     -0.1000     0.6350     -0.157     0.8849
ADVERT      1      0.7000     0.1914      3.656     0.0354

For the slope: t = β̂1 / S_β̂1 = 0.7000 / 0.1914 = 3.656, with p-value 0.0354.
Prediction with Regression Models
Types of predictions: point estimates and interval estimates.
What is predicted? Either the population mean response E(y) for a given x (a point on the population regression line), or an individual response y for a given x.
Confidence Interval Estimate for the Mean Value of y at x = xp

ŷ ± t(α/2) · s · √( 1/n + (xp − x̄)² / SSxx ),   df = n − 2
Factors Affecting Interval Width
Level of confidence (1 − α): width increases as confidence increases.
Data dispersion (s): width increases as variation increases.
Sample size: width decreases as sample size increases.
Distance of xp from the mean x̄: width increases as distance increases.
Prediction Interval for an Individual Value of y at x = xp

ŷ ± t(α/2) · s · √( 1 + 1/n + (xp − x̄)² / SSxx ),   df = n − 2
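Both interval formulas can be sketched together; the data are the same illustrative five points, with the t critical value t(0.025, df = 3) ≈ 3.182 supplied as a constant since the standard library has no t table:

```python
import math

# Sketch: 95% CI for the mean E(y) and prediction interval for a single y at xp
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n, xp, t_crit = len(x), 4.0, 3.182    # t(α/2 = .025, df = n − 2 = 3)

x_bar = sum(x) / n
ss_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / ss_xx
b0 = sum(y) / n - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

y_hat = b0 + b1 * xp
ci_half = t_crit * s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)      # mean E(y)
pi_half = t_crit * s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / ss_xx)  # single y
print(round(y_hat, 2), round(ci_half, 3), round(pi_half, 3))
```

The extra "1 +" inside the prediction-interval radical accounts for the variability of an individual observation around the line, so the prediction interval is always wider than the confidence interval for the mean.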
Key Takeaway
The statistical interpretation is the value proposition of the linear regression model. That interpretation depends on the assumptions of the linear model being met. Understanding outliers is critical for drawing meaningful inferences from the linear regression model.
You have reached the end of the presentation.