The document discusses analysis of variance (ANOVA) and linear regression. It begins by explaining ANOVA, including one-way ANOVA, assumptions of ANOVA, and how to conduct a one-way ANOVA test. It then covers linear regression, including determining the linear regression equation, assessing model fitness, assumptions of regression, and using regression for prediction and estimation. Examples of both one-way ANOVA and simple linear regression are provided.
This document presents information on multivariate analysis of variance (MANOVA). It discusses when MANOVA is appropriate to use and its advantages over univariate ANOVA. Specifically, it notes that MANOVA considers multiple dependent variables simultaneously and is more powerful than conducting separate univariate tests. The document provides an example of a two-factor mixed MANOVA design investigating the effects of sex and chocolate type on ratings of chocolate taste, crunchiness, and flavor.
This chapter discusses analysis of variance (ANOVA) techniques. It outlines one-way ANOVA, which involves one categorical independent variable (factor) and a continuous dependent variable. The chapter describes how to conduct a one-way ANOVA by identifying variables, decomposing total variation, measuring effect sizes, testing significance, and interpreting results. An example is provided to illustrate these steps using data on store sales and in-store promotion levels.
This presentation discusses the procedure involved in a two-way mixed ANOVA design. The procedure is illustrated by solving a problem in SPSS.
Analysis of Variance and Repeated Measures Design (J P Verma)
This presentation discusses the basic concept behind analysis of variance and shows the difference between independent-measures ANOVA and repeated-measures ANOVA.
This presentation discusses the application of discriminant analysis in sports research. It covers the steps involved in the analysis and the testing of its assumptions.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 3: Describing, Exploring, and Comparing Data
3.3: Measures of Relative Standing and Boxplots
This chapter discusses building multiple regression models. It covers nonlinear variables in regression, qualitative variables and how to use them, and different model building techniques like stepwise regression, forward selection and backward elimination. The chapter aims to help students analyze and interpret nonlinear models, understand dummy variables, and learn how to build and evaluate multiple regression models and detect influential observations. It provides examples of solving regression problems and interpreting their results.
This document provides an overview and outline of Chapter 12 which covers the analysis of categorical data using two chi-square tests: the chi-square goodness-of-fit test and the chi-square test of independence. These tests are useful for analyzing nominal data, such as categories from market research, to determine if observed frequencies match expected distributions or if two variables are independent. The chapter also provides examples of solving problems using these tests and key terms related to categorical data analysis.
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research that contributes significantly to scientific knowledge in engineering and technology.
Chapter 3: Describing, Exploring, and Comparing Data
3.2: Measures of Variation
This document provides an overview of multiple linear regression analysis. It describes using multiple regression to model the relationship between a dependent variable and multiple independent variables. Key points covered include: setting up and interpreting a multiple regression equation; computing measures like the standard error, coefficient of determination, and adjusted coefficient of determination; conducting hypothesis tests on the regression coefficients and overall model; evaluating assumptions; and using residual analysis to validate the model. An example is presented using data on home heating costs to develop a multiple regression model relating costs to temperature, insulation, and furnace age.
This chapter discusses discriminant analysis, a statistical technique used when the dependent variable is categorical and the independent variables are continuous. Discriminant analysis develops functions to discriminate between categories and examines differences among groups in terms of predictors. It determines important predictors and classifies new cases while evaluating accuracy. The chapter covers conducting discriminant analysis, associated statistics, and similarities and differences between discriminant analysis, regression, and ANOVA.
Mr. Sanket Chordiya presented on optimization techniques like factorial design and fractional factorial design. He introduced key terminology used in design of experiments like factors, levels, responses, effects and interactions. Full factorial design involves studying all possible combinations of factor levels, while fractional factorial design is used when there are many factors to reduce the number of experiments. Software like Design-Expert can be used to design factorial experiments and analyze results. Factorial designs find applications in formulation, processing, and studying pharmacokinetic parameters. A case study on sustained release metformin tablets was presented to illustrate a 2^3 factorial design.
This chapter discusses analysis of variance (ANOVA) techniques. It covers one-way and two-way ANOVA for comparing the means of three or more groups or populations. The chapter explains how to partition total variation into between-group and within-group components using sum of squares calculations. It also describes how to conduct the F-test and make inferences about differences in population means using ANOVA tables and significance tests. Multiple comparison procedures for identifying specific mean differences are also introduced.
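The partition of total variation described here can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own worked example: the function name `one_way_anova` and the three small groups are hypothetical, and only the standard library is used.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variation: group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical data: three groups of four observations each.
groups = [[4, 5, 6, 5], [7, 8, 9, 8], [1, 2, 3, 2]]
ss_b, ss_w, f_stat = one_way_anova(groups)

# Total variation, computed directly, should equal ss_b + ss_w.
grand = mean([x for g in groups for x in g])
ss_total = sum((x - grand) ** 2 for g in groups for x in g)
print(ss_b, ss_w, ss_total, f_stat)
```

The F statistic would then be compared with a critical value from the F distribution with (k − 1, N − k) degrees of freedom, which this sketch does not look up.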
This chapter introduces students to the design of experiments and analysis of variance. It covers one-way and two-way ANOVA, randomized block designs, and interaction. Students learn to compute and interpret results from one-way ANOVA, randomized block designs, and two-way ANOVA. They also learn about multiple comparison tests and when to use them to analyze differences between specific treatment means.
The KNN (K Nearest Neighbors) algorithm stores all available labeled data points and classifies each new case according to the categories of the k stored points closest to it. It is useful for recognizing patterns and for estimating. The KNN Classification algorithm is useful in determining probable outcome and results, and in forecasting and predicting results, given the existence of multiple variables.
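A minimal sketch of KNN classification by majority vote, assuming Euclidean distance and k = 3. The function `knn_classify`, the two-dimensional points, and the labels "A" and "B" are hypothetical and chosen only for illustration; this is not the vendor's implementation.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (point, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: two well-separated clusters.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]
label_near_a = knn_classify(train, (1.2, 1.4))
label_near_b = knn_classify(train, (6.4, 6.6))
print(label_near_a, label_near_b)
```

In practice the choice of k and the scaling of the variables both affect the result, so this sketch should be read as the bare mechanism only.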
The document discusses fractional factorial designs, which use a fraction of the total number of combinations in a full factorial design to reduce the number of required runs. It describes how effects become confounded in fractional designs and how design resolution relates to confounding. It provides examples of 2-level and 3-level fractional factorial designs, and discusses other types of designs like Plackett-Burman, central composite, and Taguchi designs. The key benefits of fractional factorial designs are reducing the number of required runs when there are many factors to investigate.
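As a small illustration of how a fraction is taken and why effects become confounded, the sketch below builds a full 2^3 design and a half fraction of it. The factor names A, B, C and the generator C = AB are assumptions chosen for the example, not taken from the document.

```python
from itertools import product

# Full 2^3 design: every combination of three two-level factors, coded -1/+1.
full_design = list(product([-1, 1], repeat=3))

# Half fraction 2^(3-1): vary A and B freely and set C = A*B
# (assumed generator C = AB, i.e. defining relation I = ABC).
half_fraction = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

# Confounding: in every retained run the C column equals the A*B column,
# so the main effect of C cannot be separated from the AB interaction.
c_aliased_with_ab = all(c == a * b for a, b, c in half_fraction)

for run in half_fraction:
    print(run)
```

The half fraction needs four runs instead of eight, at the cost of aliasing each main effect with a two-factor interaction.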
This document discusses measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation). It provides formulas and examples to calculate each measure. It also presents two problems, asking to calculate and compare various descriptive statistics for different data sets, such as milk yields from two cow herds and weaning weights of lambs from two breeds. A third problem asks to analyze and compare price data for rice from two markets.
This chapter discusses two-sample hypothesis tests for comparing population means and proportions between two independent samples, and between two related samples. It introduces tests for comparing the means of two independent populations, two related populations, and the proportions of two independent populations. The key tests covered are the pooled variance t-test for independent samples with equal variances, separate variance t-test for independent samples with unequal variances, and the paired t-test for related samples. Examples are provided to demonstrate how to calculate the test statistic and conduct hypothesis tests to compare sample means and determine if they are statistically different. Confidence intervals for the difference between two means are also discussed.
This overview discusses the predictive analytical technique known as Random Forest Regression, a method of analysis that creates a set of Decision Trees from a randomly selected subset of the training set, and aggregates by averaging values from different decision trees to decide the final target value. This technique is useful to determine which predictors have a significant impact on the target values, e.g., the impact of average rainfall, city location, parking availability, distance from hospital, and distance from shopping on the price of a house, or the impact of years of experience, position and productive hours on employee salary. Random Forest Regression is limited to predicting numeric output so the dependent variable has to be numeric in nature. The minimum sample size is 20 cases per independent variable. Random Forest Regression is just one of the numerous predictive analytical techniques and algorithms included in the Assisted Predictive Modeling module of the Smarten augmented analytics solution. This solution is designed to serve business users with sophisticated tools that are easy to use and require no data science or technical skills. Smarten is a representative vendor in multiple Gartner reports including the Gartner Modern BI and Analytics Platform report and the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms Report.
This document provides an overview of experimental design and analysis of variance (ANOVA). It defines key terms like independent and dependent variables, experimental units, treatments, and blocks. It explains different types of experimental designs like completely randomized designs, randomized block designs, and factorial experiments. It also covers ANOVA computations and assumptions for one-way and randomized block ANOVA models. Multiple comparison procedures like Tukey's HSD are introduced to identify differences between specific treatment means. Examples are provided to demonstrate applications of one-way and randomized block ANOVA.
S3 - Process product optimization design experiments response surface methodo... (CAChemE)
Session 3/4 – Central composite designs, second order models, ANOVA, blocking, qualitative factors
An intensive practical course mainly for PhD students on the use of design of experiments (DOE) and response surface methodology (RSM) for optimization problems. The course covers relevant background, nomenclature and general theory of DOE and RSM modelling for factorial and optimisation designs in addition to practical exercises in Matlab. Due to time limitations, the course concentrates on linear and quadratic models on the k≤3 design dimension. This course is an ideal starting point for every experimental engineer wanting to work effectively, extract maximal information and predict the future behaviour of their system.
Mikko Mäkelä (DSc, Tech) is a postdoctoral fellow at the Swedish University of Agricultural Sciences in Umeå, Sweden and is currently visiting the Department of Chemical Engineering at the University of Alicante. He is working in close cooperation with Paul Geladi, Professor of Chemometrics, and using DOE and RSM for process optimization mainly for the valorization of industrial wastes in laboratory and pilot scales.
The course took place at the University of Alicante and would not have been possible without the support of the Instituto Universitario de Ingeniería de Procesos Químicos.
This chapter discusses chi-square tests and nonparametric tests. It covers chi-square tests for contingency tables to test differences between two or more proportions, including computing expected frequencies. The Marascuilo procedure is introduced for determining pairwise differences when proportions are found to be unequal. Chi-square tests of independence are discussed for contingency tables with more than two variables to test if the variables are independent. Nonparametric tests are also introduced. Examples are provided to demonstrate chi-square goodness of fit tests and tests of independence.
This document provides information on general factorial designs. It defines factorial designs as experiments that study the effects of two or more factors by investigating all possible combinations of the factors' levels. Factorial designs are more efficient than one-factor-at-a-time experiments and allow for the estimation of factor effects at different levels of other factors. However, factorial designs become prohibitively large as the number of factors increases and can be difficult to interpret when interactions are present. The document also provides examples of designing two-factor factorial experiments using completely randomized and randomized complete block designs.
The chapter discusses different scales of measurement used in marketing research including nominal, ordinal, interval, and ratio scales. It compares primary methods of scaling such as paired comparisons, rank ordering, and constant sum scaling. These scaling techniques can be used to measure preferences, attitudes, and perceptions in both comparative and noncomparative ways.
This chapter discusses noncomparative scaling techniques used in marketing research to measure attitudes, opinions, and characteristics without direct comparison between objects. It describes continuous rating scales where respondents indicate their rating along a line, and itemized rating scales including the Likert scale involving levels of agreement, semantic differential scales using bipolar adjective pairs, and Stapel scales using a numbered unipolar format. The chapter also covers decisions in designing these scales and evaluating their measurement properties.
This document provides information on performing a one-way analysis of variance (ANOVA). It discusses the F-distribution, key terms used in ANOVA like factors and treatments, and how to calculate and interpret an ANOVA test statistic. An example demonstrates how to conduct a one-way ANOVA to determine if three golf clubs produce different average driving distances.
This document provides an overview of data analysis techniques including analysis of variance (ANOVA), regression, correlation, and multivariate statistical analysis. It discusses understanding and interpreting ANOVA, regression, correlation matrices, and exploring factor analysis, multiple discriminant analysis, and cluster analysis. The document also provides examples of interpreting statistical output from ANOVA, regression, and correlation analysis.
The document discusses one-way analysis of variance (ANOVA), which compares the means of three or more populations. It provides an example where sales data from three marketing strategies are analyzed using ANOVA. The null hypothesis is that the population means are equal, and it is rejected since the F-statistic is greater than the critical value, indicating at least one mean is significantly different. Post-hoc comparisons using the Bonferroni method find that Strategy 2 (emphasizing quality) has significantly higher sales than Strategy 1 (emphasizing convenience).
This document provides information on chi-square tests and other statistical tests for qualitative data analysis. It discusses the chi-square test for goodness of fit and independence. It also covers Fisher's exact test and McNemar's test. Examples are provided to illustrate chi-square calculations and how to determine statistical significance based on degrees of freedom and critical values. Assumptions and criteria for applying different tests are outlined.
The document discusses various statistical techniques for data analysis, including chi-square tests and analysis of variance (ANOVA). It provides an example of using a chi-square test to determine if there is a relationship between undergraduate degree and MBA major using data from a contingency table. Key steps include calculating expected frequencies, comparing observed and expected values, and determining if the calculated chi-square test statistic falls in the rejection region. Requirements for chi-square tests, such as satisfying the rule of five, are also covered. The document then briefly discusses using ANOVA to analyze differences between three or more independent groups for interval or ratio data.
In the t test for independent groups, ____.we estimate µ1 µ2.docxbradburgess22840
In the t test for independent groups, ____.
we estimate µ1 − µ2
we estimate σ²
we estimate X1 − X2
df = N − 1
Exhibit 14-1
A professor of women's studies is interested in determining if stress affects the menstrual cycle. Ten women are randomly sampled for an experiment and randomly divided into two groups. One of the groups is subjected to high stress for two months while the other lives in a relatively stress-free environment. The professor measures the menstrual cycle (in days) of each woman during the second month. The following data are obtained.
High stress
20
23
18
19
22
Relatively stress free
26
31
25
26
30
Refer to Exhibit 14-1. The obtained value of the appropriate statistic is ____.
tobt = 4.73
tobt = 4.71
tobt = 3.05
tobt = 0.47
Refer to Exhibit 14-1. The df for determining tcrit are ____.
4
9
8
3
Refer to Exhibit 14-1. Using α = .05 (2-tailed), tcrit = ____.
+2.162
+2.506
±2.462
±2.306
Refer to Exhibit 14-1. Using α = .05 (2-tailed), your conclusion is ____.
accept H0; stress does not affect the menstrual cycle
retain H0; we cannot conclude that stress affects the menstrual cycle
retain H0; stress affects the menstrual cycle
reject H0; stress affects the menstrual cycle
Refer to Exhibit 14-1. Estimate the size of the effect. = ____
0.8102
0.6810
0.4322
0.5776
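The Exhibit 14-1 questions can be checked with a short calculation. The following is a minimal sketch of the pooled-variance t test for independent groups, assuming equal population variances; the function and variable names are illustrative and do not come from the original exhibit.

```python
from math import sqrt

# Menstrual cycle lengths (days) from Exhibit 14-1
high_stress = [20, 23, 18, 19, 22]
stress_free = [26, 31, 25, 26, 30]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Independent-groups t statistic with pooled variance (hypothetical helper)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, df


t_obt, df = pooled_t(high_stress, stress_free)
print(f"t = {t_obt:.2f}, df = {df}")
```

The sign of the statistic depends only on which group is listed first; its magnitude is what gets compared with the two-tailed critical value for the computed degrees of freedom.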
A major advantage to using a two condition experiment (e.g. control and experimental groups) is ____.
the test has more power
the data are easier to analyze
the experiment does not need to know population parameters
the test has less power
Which of the following tests analyzes the difference between the means of two independent samples?
correlated t test
t test for independent groups
sign test
test of variance
If n1 = n2 and n is relatively large, then the t test is relatively robust against ____.
violations of the assumptions of homogeneity of variance and normality
violations of random samples
traffic violations
violations by the forces of evil
Exhibit 14-3
Five students were tested before and after taking a class to improve their study habits. They were given articles to read which contained a known number of facts in each story. After the story each student listed as many facts as he/she could recall. The following data was recorded.
Before
10
12
14
16
12
After
15
14
17
17
20
Refer to Exhibit 14-3. The obtained value of the appropriate statistic is ____.
3.92
3.06
4.12
2.58
Refer to Exhibit 14-3. What do you conclude using α = 0.05 (2-tailed)?
reject H0; the class appeared to improve study habits
retain H0; the class had no effect on study habits
retain H0; we cannot conclude that the class improved study habits
accept H0; the class appeared to improve study habits
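Because the same five students are measured twice in Exhibit 14-3, the appropriate statistic is the correlated (paired) t test on the difference scores. The following is a minimal sketch under that assumption; `paired_t` is an illustrative name, not something defined in the exhibit.

```python
from math import sqrt

# Facts recalled by each student, from Exhibit 14-3
before = [10, 12, 14, 16, 12]
after = [15, 14, 17, 17, 20]


def paired_t(first, second):
    """Paired t statistic computed on difference scores (hypothetical helper)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator)
    var_d = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)
    std_err = sqrt(var_d / n)
    return d_bar / std_err, n - 1


t_obt, df = paired_t(before, after)
print(f"t = {t_obt:.2f}, df = {df}")
```

With five pairs there are 4 degrees of freedom, so the obtained value is compared with the two-tailed critical t for df = 4.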
Which of the following is (are) assumption(s) underlying the use of the F test?
the raw score populations are normally distributed
the variances of the raw score populations are the same
the mean of the populations differ
the raw score popul.
The document presents a case study where Lisa wants to open a beauty store and needs data to support her belief that women in her local area spend more than the national average of $59 every 3 months on fragrance products. Lisa takes a random sample of 25 women in her area and finds the sample mean is $68.10 with a standard deviation of $14.46. She conducts a one-sample t-test to test if the population mean is greater than $59. The test statistic is 3.1484 with a p-value of 0.0021, which is less than the significance level of 0.05. Therefore, there is sufficient evidence to conclude that the population mean is indeed greater than $
An overview of the significance of the SURE (seemingly unrelated regression) model in panel data econometrics and its applications.
The presentation consists of the theoretical background and mathematical derivation for the model. The stochastic frontier model and treatment effects are also discussed in brief.
This document discusses logistic regression, including:
- Logistic regression can be used when the dependent variable is binary and predicts the probability of an event occurring.
- The logistic regression equation calculates the log odds of an event occurring based on independent variables.
- Logistic regression is commonly used in medical research when variables are a mix of categorical and continuous.
This document provides an overview of analysis of variance (ANOVA), including:
- ANOVA is used to compare means of three or more populations using an F-test. It assumes normal distributions, independence, and equal variances.
- Between-group and within-group variances are calculated to determine the F-value. If F exceeds the critical value, the null hypothesis of equal means is rejected.
- Two-way ANOVA extends the technique to analyze two independent variables and their interaction effects on a dependent variable. Graphs can show interactions like disordinal, ordinal, or no interaction.
In a left-tailed test comparing two means with variances unknown b.docxbradburgess22840
In a left-tailed test comparing two means with variances unknown but assumed to be equal, the sample sizes were n1 = 8 and n2 = 12. At α = .05, the critical value would be:
-1.645
-2.101
-1.734
-1.960
This document provides an introduction and overview of analysis of variance (ANOVA). It discusses one-way and two-way ANOVA. For one-way ANOVA, it defines the technique, provides notation for hypotheses testing, and works through an example comparing sales data from three marketing strategy groups. It notes the limitations of one-way ANOVA for this example and introduces two-way ANOVA as a way to analyze the effects of two factors - marketing strategy and advertising media. Two-way ANOVA allows testing of differences in means for each factor and any interactions between factors.
ForecastingBUS255 GoalsBy the end of this chapter, y.docxbudbarber38650
Forecasting
BUS255
Goals
By the end of this chapter, you should know:
Importance of Forecasting
Various Forecasting Techniques
Choosing a Forecasting Method
Forecasting
Forecasts are done to predict future events for planning
Finance, human resources, marketing, operations, and supply chain managers need forecasts to plan
Forecasts are made on many different variables
Forecasts are important to managing both processes and managing supply chains
Key Decisions in Forecasting
Deciding what to forecast
Level of aggregation
Units of measurement
Choosing a forecasting system
Choosing a forecasting technique
Forecasting Techniques
Qualitative (Judgment) Methods: Salesforce Estimates, Executive Opinion, Market Research, Delphi Method
Time-series Methods: Naïve Method, Moving Averages, Exponential Smoothing
Causal Methods: Regression Analysis
Qualitative (Judgment) methods
Salesforce estimates
Executive opinion
Market Research
The Delphi Method
Salesforce estimates: Forecasts derived from estimates provided by salesforce.
Executive opinion: Method in which opinions, experience, and technical knowledge of one or more managers are summarized to arrive at a single forecast.
Market research: A scientific study and analysis of data gathered from consumer surveys intended to learn consumer interest in a product or service.
Delphi method: A process of gaining consensus from a group of experts while maintaining their anonymity.
Case Study
Reference: Krajewski, Ritzman, Malhotra. (2010). Operations Management: Processes and Supply Chains, Ninth Edition. Pearson Prentice Hall. P. 42-43.
Case study questions
What information system is used by UNILEVER to manage forecasts?
What does UNILEVER do when statistical information is not useful for forecasting?
What types of qualitative methods are used by UNILEVER?
What were some suggestions provided to improve forecasting?
Causal methods – Linear Regression
A dependent variable is related to one or more independent variables by a linear equation
The independent variables are assumed to “cause” the results observed in the past
Simple linear regression model assumes a straight line relationship
Causal methods – Linear Regression
Y = a + bX
where
Y = dependent variable
X = independent variable
a = Y-intercept of the line
b = slope of the line
Causal methods – Linear Regression
Fit of the regression model
Coefficient of determination
Standard error of the estimate
Please go to in-class exercise sheet
Coefficient of determination: Also called r-squared. Measures the amount of variation in the dependent variable about its mean that is explained by the regression line. Range between 0 and 1. In general, larger values are better.
Standard error of the estimate: Measures how closely the data on the dependent variable cluster around the regression line. Smaller values are better.
Time Series
A time seri.
This document discusses strategies for designing factorial experiments with multiple factors. It explains that factorial experiments involve studying the effect of varying levels of factors on a response variable. The optimal design strategy depends on whether the circumstances are unusual or normal. For normal circumstances where there is some noise and factors influence each other, a fractional factorial or full factorial design is typically best. The document provides details on analyzing the data from factorial experiments to determine if factor effects and interactions are significant. It includes examples of calculating main effects and interactions from 2-level factorial data.
1. A multiple regression analysis was conducted to predict problems related to drug use from self-efficacy, marijuana use, self-control, peer norms, and two dummy coded variables for race while controlling for other variables. The regression model accounted for 14% of the variance in drug problems.
2. Marijuana use and self-efficacy significantly predicted drug problems, with more marijuana use and lower self-efficacy associated with greater problems. Assumption checks found no significant violations.
3. A stepwise regression selected a two-predictor model with marijuana use and self-efficacy as significant predictors of drug problems.
This document provides an overview of analysis of variance (ANOVA) techniques. It discusses one-way ANOVA, which evaluates differences between three or more population means. Key aspects covered include partitioning total variation into between- and within-group components, assumptions of normality and equal variances, and using the F-test to test for differences. Randomized block ANOVA and two-factor ANOVA are also introduced as extensions to control for additional variables. Post-hoc tests like Tukey and Fisher's LSD are described for determining specific mean differences.
Calculating Analysis of Variance (ANOVA) and Post Hoc Analyses Follo.docxaman341480
Calculating Analysis of Variance (ANOVA) and Post Hoc Analyses Following ANOVA
Analysis of variance (ANOVA) is a statistical procedure that compares data between two or more groups or conditions to investigate the presence of differences between those groups on some continuous dependent variable (see Exercise 18). In this exercise, we will focus on the one-way ANOVA, which involves testing one independent variable and one dependent variable (as opposed to other types of ANOVAs, such as factorial ANOVAs that incorporate multiple independent variables).
Why ANOVA and not a t-test? Remember that a t-test is formulated to compare two sets of data or two groups at one time (see Exercise 23 for guidance on selecting appropriate statistics). Thus, data generated from a clinical trial that involves four experimental groups, Treatment 1, Treatment 2, Treatments 1 and 2 combined, and a Control, would require 6 t-tests. Consequently, the chance of making a Type I error (alpha error) increases substantially (or is inflated) because so many computations are being performed. Specifically, the chance of making at least one Type I error is at most, and approximately, the number of comparisons multiplied by the alpha level. Thus, ANOVA is the recommended statistical technique for examining differences between more than two groups (Zar, 2010).
ANOVA is a procedure that culminates in a statistic called the F statistic. It is this value that is compared against an F distribution (see Appendix C) in order to determine whether the groups significantly differ from one another on the dependent variable. The formulas for ANOVA actually compute two estimates of variance: One estimate represents differences between the groups/conditions, and the other estimate represents differences among (within) the data.
Research Designs Appropriate for the One-Way ANOVA
Research designs that may utilize the one-way ANOVA include the randomized experimental, quasi-experimental, and comparative designs (Gliner, Morgan, & Leech, 2009). The independent variable (the “grouping” variable for the ANOVA) may be active or attributional. An active independent variable refers to an intervention, treatment, or program. An attributional independent variable refers to a characteristic of the participant, such as gender, diagnosis, or ethnicity. The ANOVA can compare two groups or more. In the case of a two-group design, the researcher can either select an independent samples t-test or a one-way ANOVA to answer the research question. The results will always yield the same conclusion, regardless of which test is computed; however, when examining differences between more than two groups, the one-way ANOVA is the preferred statistical test.
Example 1: A researcher conducts a randomized experimental study wherein she randomizes participants to receive a high-dosage weight loss pill, a low-dosage weight loss pill, or a placebo. She assesses the number of pounds lost from baseline to post-treatment for the thre ...
This document discusses quantitative and qualitative data analysis techniques. It covers:
- Displays for numerical (frequency charts, histograms) and categorical data (bar charts, pie charts, contingency tables).
- Measures for numerical data including mean, median, mode, range, variance, standard deviation, and quartiles.
- Scatter plots to examine relationships between two quantitative variables and measures of association like covariance and correlation coefficient.
- Contingency tables to study relationships between two categorical variables and examine dependency/independency.
- An example analyzing Titanic passenger data using contingency tables to examine the "first-class passengers first" policy.
3. BUS B272 Unit 1 Analysis of Variance The Analysis of Variance (ANOVA) is a procedure that tests whether differences exist between two or more populations. The technique analyzes the variance of the data to determine whether we can infer that the populations differ.
4. One way (Single-factor) analysis of variance ANOVA assumptions F test for difference among k means BUS B272 Unit 1 Topics
5. BUS B272 Unit 1 General Experimental Setting Investigator controls one or more independent variables Called treatments or factors Each treatment contains two or more levels (or categories/classifications) Observe effects on dependent variable Response to different levels of independent variable Experimental design: the plan used to test hypothesis
6. BUS B272 Unit 1 Completely Randomized Design Experimental units (subjects) are assigned randomly to treatments Subjects are assumed homogeneous Only one factor or independent variable With two or more treatment levels Analyzed by One-way analysis of variance (one-way ANOVA)
7. BUS B272 Unit 1 Randomized Design Example
8. BUS B272 Unit 1 One-way Analysis of Variance F Test Evaluate the difference among the mean responses of 2 or more (k) populations e.g. : Several types of tires, oven temperature settings, different types of marketing strategies
14. BUS B272 Unit 1 Hypotheses of One-Way ANOVA $H_0: \mu_1 = \mu_2 = \dots = \mu_k$. All population means are equal; there is no treatment effect (no variation in means among groups). $H_1$: Not all $\mu_j$ are the same. At least one population mean is different (others may be the same!); there is a treatment effect. This does not mean that all population means are different.
15. BUS B272 Unit 1 One-way ANOVA (No Treatment Effect) The Null Hypothesis is True
16. BUS B272 Unit 1 One-way ANOVA (Treatment Effect Present) The Null Hypothesis is NOT True
17. BUS B272 Unit 1 One-way ANOVA (Partition of Total Variation) Total Variation SS(Total) = Variation Due to Treatment SST + Variation Due to Random Sampling SSE
19. BUS B272 Unit 1 Total Variation $SS(\text{Total}) = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$, where $X_{ij}$: the i-th observation in group j; $n_j$: the number of observations in group j; $n$: the total number of observations in all groups; $k$: the number of groups; $\bar{\bar{X}}$: the overall or grand mean
21. BUS B272 Unit 1 Among-Treatments Variation Variation Due to Differences Among Groups
22. BUS B272 Unit 1 Among-Treatments Variation (continued)
23. BUS B272 Unit 1 Summing the variation within each treatment and then adding over all treatments. Within-Treatment Variation
24. BUS B272 Unit 1 Within-Treatment Variation (continued)
25.
26. For 2 groups, use the t-test; the F test is more limited. For k = 2, this is the pooled variance in the t-test.
27. BUS B272 Unit 1 One-way ANOVA F Test Statistic Test statistic: $F = MST / MSE$, where MST is the mean square among (between) treatments and MSE is the mean square within treatments (error). Degrees of freedom: $df_1 = k - 1$, $df_2 = n - k$
29. BUS B272 Unit 1 Features of One-way ANOVA F Statistic The F statistic is the ratio of the among estimate of variance to the within estimate of variance. The ratio must always be positive. df1 = k - 1 will typically be small; df2 = n - k will typically be large. The ratio should be close to 1 if the null is true.
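The formula images from slides 19 to 27 did not survive in this transcript, so the quantities they describe are collected here as a sketch. The notation ($X_{ij}$, $\bar{X}_j$ for the mean of group $j$, $\bar{\bar{X}}$ for the grand mean) is an assumption chosen to match the slide text, not a copy of the original slides.

```latex
\begin{align*}
SS(\text{Total}) &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{\bar{X}} \right)^2
  && \text{total variation} \\
SST &= \sum_{j=1}^{k} n_j \left( \bar{X}_j - \bar{\bar{X}} \right)^2
  && \text{among-treatments variation} \\
SSE &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( X_{ij} - \bar{X}_j \right)^2
  && \text{within-treatment variation} \\
SS(\text{Total}) &= SST + SSE \\
MST &= \frac{SST}{k - 1}, \qquad MSE = \frac{SSE}{n - k}, \qquad F = \frac{MST}{MSE}
\end{align*}
```

Under the null hypothesis both mean squares estimate the same variance, which is why an F ratio near 1 is expected when all population means are equal.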
30. BUS B272 Unit 1 One-way ANOVA F Test Example As production manager, you want to see if three filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, five per machine, to the machines. At the 0.05 significance level, is there a difference in mean filling times?
Machine1   Machine2   Machine3
25.40      23.40      20.00
26.31      21.80      22.20
24.10      23.50      19.75
23.74      22.75      20.60
25.10      21.60      20.40
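34. BUS B272 Unit 1 Summary Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Squares   F
Among (Treatments)    3-1=2                47.1640          23.5820        MST/MSE = 25.602
Within (Error)        15-3=12              11.0532          0.9211
Total                 15-1=14              58.2172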
35. BUS B272 Unit 1 One-way ANOVA Example Solution $H_0: \mu_1 = \mu_2 = \mu_3$ $H_1$: Not all the means are equal. $\alpha = 0.05$, df1 = 2, df2 = 12. Critical value: 3.89. Test statistic: F = MST/MSE = 25.602. Decision: Reject $H_0$ at $\alpha = 0.05$. Conclusion: There is evidence to believe that at least one $\mu_i$ differs from the rest.
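The summary table for the filling-machine example can be reproduced by hand. The sketch below computes the among- and within-treatment sums of squares and the F ratio in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative assumptions, and the critical value 3.89 is taken from the slide rather than computed.

```python
# Filling times for the three machines (five workers each), from slide 30
machines = {
    "Machine1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Machine2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Machine3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_anova(groups):
    """Return SST, SSE, MST, MSE and F for a list of samples (hypothetical helper)."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Among-treatments variation: group means around the grand mean
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-treatment variation: observations around their own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df1": k - 1, "df2": n - k}


result = one_way_anova(list(machines.values()))
critical_value = 3.89  # F(0.05; 2, 12) as quoted on the slide
print({name: round(value, 4) for name, value in result.items()})
print("Reject H0" if result["F"] > critical_value else "Do not reject H0")
```

The rounded values should agree with the summary table, and the decision rule is the same comparison of F with the critical value used in the solution slide.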
36. BUS B272 Unit 1 Computer Application To obtain the Microsoft Excel computer output in the previous page, first enter the data into c columns in an Excel file, then follow the commands: Tools/ Data Analysis/ Anova: Single Factor
37. BUS B272 Unit 1 Computer Output using Data Analysis of Excel
38. BUS B272 Unit 1 Exercise 1 The manager of a large department store wants to test if the average size of customer transactions differs with four types of payment: Visa card, company card, cash or cheque. If there are differences in the average customer transaction size among the four types of payment, the manager will further investigate which types of payment will give rise to higher transaction volumes and hence he will design an appropriate promotional programme. A random sample of 54 customer transactions using various types of payment was drawn during the past two months. With reference to sampled data, the sample statistics are obtained as follows: Test if differences of average customer transaction size exist among the four types of payment at a 0.05 level of significance.
39. BUS B272 Unit 1 Exercise 1 One factor is involved, i.e. the type of payment. Under this factor, there are k = 4 treatments (or factor levels) which represent the four types of payment: Visa card, company card, cash and cheque. The experimental units are customer transactions.
40. BUS B272 Unit 1 Exercise 1 Since the test statistic of 39.16 is greater than the critical value of 2.80, reject H0. At 0.05 level of significance, there is evidence to reveal that the average customer transaction sizes are significantly different among the four types of payment.
41. BUS B272 Unit 1 Can ANOVA be replaced by t-Test? t-Test: tests for any difference between two population means μ1 and μ2. Multiple t-tests are required for more than two population means. Conducting multiple tests increases the probability of making Type I errors. E.g., to compare 6 population means: if we use ANOVA at a 5% significance level, there is a 5% chance that we reject the null hypothesis when it is true. If we use t-tests, we need to perform 15 tests, and at the same 5% significance level the chance of at least one Type I error is 1 – (1 – 0.05)^15 = 0.54, treating the tests as independent.
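The arithmetic on this slide can be sketched in a few lines. The function below counts the pairwise comparisons among a set of means and computes the chance of at least one Type I error, assuming the tests are independent; `familywise_error` is an illustrative name, not part of the course material.

```python
from math import comb


def familywise_error(num_means, alpha):
    """Number of pairwise t-tests and P(at least one Type I error),
    assuming the tests are independent (hypothetical helper)."""
    num_tests = comb(num_means, 2)
    return num_tests, 1 - (1 - alpha) ** num_tests


num_tests, fwer = familywise_error(6, 0.05)
print(f"{num_tests} tests, familywise error rate = {fwer:.2f}")
```

Trying other values of `num_means` shows how quickly the error rate grows, which is the motivation for using a single F test instead.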
43. BUS B272 Unit 1 Linear Regression Origin of regression Determining the simple linear regression equation Assessing the fitness of the model Correlation analysis Estimation and prediction Assumptions of regression and correlation
44. BUS B272 Unit 1 Origin of Regression "Regression," from a Latin root meaning "going back," is a series of statistical methods used in studying the relationship between two variables; it was first employed by Francis Galton in 1877. Galton was interested in studying the relationship between a father's height and the son's height. Using this method, he found that sons' heights regress toward the overall mean, and the method was therefore called "regression".
45. BUS B272 Unit 1 Linear Regression Analysis Linear Regression analysis is used primarily to model and describe linear relationship and provide prediction among variables Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable Express statistically the effect of the independent variables on the dependent variable
46. BUS B272 Unit 1 Types of Regression Models Positive Linear Relationship Relationship NOT Linear Negative Linear Relationship No Relationship
47. BUS B272 Unit 1 Simple Linear Regression Model The relationship between two variables, say X and Y, is described by a linear function. The change in the variable Y (called the dependent or response variable) is associated with the change in the other variable X (called the independent or explanatory variable). Explore the dependency of Y on X.
48. BUS B272 Unit 1 Why Regression? The larger the sum of squares, the poorer the estimate.
X   1   2     3     4
Y   2   2.5   2.5   5
49. BUS B272 Unit 1 Linear Relationship We wish to study whether there is any association between two quantitative variables, say X and Y. If 'Y tends to increase as X increases', there is a positive relationship. If 'Y tends to decrease as X increases', there is a negative relationship. If the corresponding magnitude of increase or decrease follows a specific proportion, the relationship identified is said to be a linear one.
50. BUS B272 Unit 1 Scatter Diagram A scatter diagram is a graph plotted for all X-Y pairs of the sample data. By viewing a scatter diagram, one can determine whether a relationship exists between the two variables. It can also suggest the likely mathematical form of that relationship, allowing one to judge initially and intuitively whether or not there exists a linear relationship between the two variables involved.
51. BUS B272 Unit 1 Example The level of air pollution at Kwun Tong and the total number of consultations relating to respiratory diseases in a public clinic in the area were recorded during a specific time period on 14 randomly selected days.
52. BUS B272 Unit 1 Population Linear Regression Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other Random Error Population SlopeCoefficient Population Y intercept Dependent (Response) Variable PopulationRegression Line (conditional mean) Independent (Explanatory) Variable
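53. BUS B272 Unit 1 Population Linear Regression (continued) [Figure: Y plotted against X. The observed value of Y equals the conditional mean plus the random error, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; the random error $\varepsilon_i$ is the vertical discrepancy (residual) for point i.]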
54. BUS B272 Unit 1 Least Squares Method The line fitted by least squares is the one that makes the sum of squares of all the vertical discrepancies (residuals) as small as possible, i.e., it minimizes $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$, the sum of squared residuals.
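The least squares idea can be checked numerically on the four points from the "Why Regression?" slide. The sketch below is illustrative only: the function name `sse` and the two comparison lines are assumptions chosen for the demonstration, and the closed-form estimates used are the standard least-squares formulas.

```python
# Four (X, Y) points from the "Why Regression?" slide.
xs = [1, 2, 3, 4]
ys = [2, 2.5, 2.5, 5]

def sse(b0, b1):
    """Sum of squared vertical discrepancies for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Standard closed-form least-squares estimates.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

best = sse(b0, b1)           # least-squares line
flat = sse(y_bar, 0.0)       # horizontal line through the mean of Y
nudged = sse(b0 + 0.1, b1)   # same slope, intercept shifted slightly

print(f"b0={b0:.2f}, b1={b1:.2f}")
print(f"SSE: least squares={best:.2f}, flat={flat:.2f}, nudged={nudged:.2f}")
```

Any line other than the least-squares one, such as the two alternatives tried here, gives a larger sum of squared residuals.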
55. BUS B272 Unit 1 Sample Linear Regression The sample regression line is formed by the point estimates of $\beta_0$ and $\beta_1$, i.e., $b_0$ and $b_1$. It provides an estimate of the population regression line as well as a predicted value of Y: $Y_i = b_0 + b_1 X_i + e_i$, where $b_0$ is the sample Y intercept, $b_1$ is the sample coefficient of slope, and $e_i$ is the residual. The sample regression line (fitted regression line or predicted value) is $\hat{Y}_i = b_0 + b_1 X_i$.
56. BUS B272 Unit 1 Sample Linear Regression (continued) $b_0$ and $b_1$ are obtained by finding the specific values of $b_0$ and $b_1$ that minimize the sum of the squared residuals, $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$.
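57. BUS B272 Unit 1 Coefficients of Sample Linear Regression For $\hat{Y}_i = b_0 + b_1 X_i$, the least-squares estimates are $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ and $b_0 = \bar{Y} - b_1 \bar{X}$.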
58. BUS B272 Unit 1 Interpretation of the Slope and the Intercept $\beta_0$ is the average value of Y when the value of X is zero. $\beta_1$ measures the change in the average value of Y as a result of a one-unit change in X.
59. BUS B272 Unit 1 Interpretation of the Slope and the Intercept (continued) $b_0$ is the estimated average value of Y when the value of X is zero. $b_1$ is the estimated change in the average value of Y as a result of a one-unit change in X.
60. BUS B272 Unit 1 Example 1 : Simple Linear Regression Suppose that you want to examine the linear dependency of annual sales on store size (in square feet). Sample data for seven stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
61. BUS B272 Unit 1 Example 1 : Scatter Diagram Excel Output
63. BUS B272 Unit 1 Computation of Regression Coefficient
64. BUS B272 Unit 1 Example 1 : Equation for the Sample Regression Line Ŷi = 1636.415 + 1.487Xi
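The slides obtain this equation from Excel. As a cross-check, the coefficients can be recomputed from the seven-store data with the standard least-squares formulas. This is a minimal sketch; the helper name `fit_line` is an assumption made for illustration.

```python
# Seven-store data from Example 1.
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($1000)

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(sqft, sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```

The printed coefficients should agree with the equation on the slide up to rounding.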
65. BUS B272 Unit 1 Example 1 : Interpretation of Results The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units. Because sales are recorded in units of $1000, the model estimates that each additional square foot of store size increases expected annual sales by $1,487.
66. BUS B272 Unit 1 Predicting Annual Sales Based on Square Footage Suppose that we would like to use the fitted model to predict the average annual sales for a store with 4,000 square feet.
67. BUS B272 Unit 1 Interpolation versus Extrapolation When using the regression line for prediction, it is not appropriate to make predictions beyond the relevant range of the independent variable (in the previous example, 1,292 to 5,555 square feet). That is, we may interpolate within the relevant range of X values, but we SHOULD NOT extrapolate beyond it. For example, it is not appropriate to predict the average annual sales for a store with 7,000 square feet, since 7,000 lies outside the range 1,292 to 5,555.
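The prediction step and the interpolation rule can be combined in a few lines. The sketch below uses the rounded coefficients from the slides, so its result may differ slightly from Excel output based on unrounded coefficients. The function name `predict_sales` and the choice to raise an error on extrapolation are illustrative assumptions.

```python
B0, B1 = 1636.415, 1.487      # rounded coefficients from the slides
X_MIN, X_MAX = 1292, 5555     # relevant range of X (square feet) in the sample

def predict_sales(square_feet):
    """Predicted average annual sales ($1000); refuses to extrapolate."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq ft is outside the relevant range "
            f"[{X_MIN}, {X_MAX}]; extrapolation is not appropriate"
        )
    return B0 + B1 * square_feet

print(predict_sales(4000))    # inside the range: interpolation

try:
    predict_sales(7000)       # outside the range: extrapolation
except ValueError as exc:
    print("Refused:", exc)
```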
68. BUS B272 Unit 1 Causal Relationship? In general, when there is a relationship identified between X and Y using regression analysis, we usually would say that ‘X is associated with Y’ instead of saying ‘X causes Y’. We cannot claim that two variables are related by cause and effect just because there is a statistical relationship between the two. In fact, you cannot infer a causal relationship from statistics alone.
69. BUS B272 Unit 1 For example, the price of dog food and the price of houses may well be positively correlated over time. When you collect data on both prices over time, you might infer that they have a positive relationship, but can you conclude that an increase in the price of dog food directly causes the price of houses to increase too? It might be that an inflationary force is influencing both, so that they move in the same general direction over time.
70. BUS B272 Unit 1 Computer Application Import the data into two adjacent columns in an Excel file and then click Tools / Data Analysis / Regression (see pages 624-625 for a detailed description).
72. BUS B272 Unit 1 Exercise 2 Consider the example about the level of air pollution at Kwun Tong and the total number of consultations that relate to respiratory diseases in a public clinic in the area. The corresponding data were given as follows:
73. BUS B272 Unit 1 Exercise 2 (a) Determine the sample regression line to predict the number of consultations from the level of pollution. (b) Interpret the coefficients. Solution:
74. BUS B272 Unit 1 Exercise 2 For $b_1$: for each additional unit of pollution level, the number of consultations increases on average by 0.456701074. No meaningful interpretation of $b_0$ can be made, as the range of x does not include zero.
75. BUS B272 Unit 1 Assessing the simple linear regression model After we have set up a linear regression model, we wish to assess the fitness of the model, that is, to find out how well the model fits the given data. For a good fit, the data as a whole should be quite close to the regression line, so that the independent variable can be used to predict the value of the dependent variable with high accuracy. To examine how well the independent variable predicts the dependent variable, we need to develop several measures of variation.
76. BUS B272 Unit 1 Measure of Variation: The Sum of Squares Total Sample Variability = Explained Variability + Unexplained Variability, i.e., SS(Total) = SSR + SSE
77. BUS B272 Unit 1 Measure of Variation: The Sum of Squares (continued) SS(Total) = total sum of squares: measures the variation of the $Y_i$ values around their mean $\bar{Y}$. SSR = regression sum of squares: explained variation attributable to the relationship between X and Y. SSE = error sum of squares: variation attributable to factors other than the relationship between X and Y (unexplained variation).
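78. BUS B272 Unit 1 Measure of Variation: The Sum of Squares (continued) SS(Total) = $\sum (Y_i - \bar{Y})^2$, SSR = $\sum (\hat{Y}_i - \bar{Y})^2$, SSE = $\sum (Y_i - \hat{Y}_i)^2$. [Figure: for a point at $X_i$, the vertical distances between $Y_i$, $\hat{Y}_i$ and $\bar{Y}$ that make up the three sums of squares]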
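The decomposition SS(Total) = SSR + SSE can be verified numerically on the seven-store data from Example 1. The following is a minimal sketch with illustrative variable names; it refits the line with the standard least-squares formulas rather than using the rounded slide coefficients, because the identity holds exactly only for the least-squares fit.

```python
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($1000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

ss_total = sum((y - y_bar) ** 2 for y in sales)             # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)                 # explained variation
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))      # unexplained variation

print(f"SS(Total) = {ss_total:.1f}")
print(f"SSR + SSE = {ssr + sse:.1f}")
```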
80. BUS B272 Unit 1 Standard Error of Estimate The standard deviation of the variation of observations around the regression line: $s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$.
81. BUS B272 Unit 1 Standard Error of Estimate The smallest value that $s_\varepsilon$ can assume is 0, which occurs when SSE = 0, that is, when all the points fall on the regression line. Thus, when $s_\varepsilon$ is small, the fit is excellent, and the linear regression model is likely to be an effective analytical and forecasting tool. When $s_\varepsilon$ is large, the regression model is a poor one and is of little use.
82. BUS B272 Unit 1 The Coefficient of Determination ($r^2$ or $R^2$) By themselves, SSR, SSE and SS(Total) provide little that can be directly interpreted. A simple ratio of SSR and SS(Total) provides a measure of the usefulness of the regression equation: $r^2 = \frac{SSR}{SS(Total)}$. It measures the proportion of variation in Y that is explained by the independent variable X in the regression model.
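Both measures of fit can be computed for the seven-store example. This sketch assumes the usual simple-regression definitions, namely the standard error of estimate as the square root of SSE / (n - 2) and the coefficient of determination as SSR / SS(Total). The variable names are illustrative.

```python
from math import sqrt

sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($1000)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

ss_total = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
ssr = ss_total - sse

std_error = sqrt(sse / (n - 2))   # standard error of estimate, in $1000
r_squared = ssr / ss_total        # proportion of variation in Y explained by X

print(f"standard error of estimate = {std_error:.2f}")
print(f"r^2 = {r_squared:.4f}")
```

A value of r² close to 1 indicates that store size explains most of the variation in annual sales in this sample.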
83. BUS B272 Unit 1 Coefficients of Determination ($r^2$) [Figure: four scatter-diagram panels, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$: two with $r^2 = 1$, one with $r^2 = 0.8$, and one with $r^2 = 0$]
84. BUS B272 Unit 1 Coefficient of Correlation The coefficient of correlation measures the strength of association (linear relationship) between two numerical variables. It is concerned only with the strength of the relationship; no causal effect is implied.
85. BUS B272 Unit 1 Coefficient of Correlation (continued) The population correlation coefficient is denoted by $\rho$ (rho). The sample correlation coefficient is denoted by r. It is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations.
87. BUS B272 Unit 1 Sample of Observations from Various r Values [Figure: five scatter-diagram panels of Y against X, for r = -1, r = -0.6, r = 0, r = 0.6 and r = 1]
88. BUS B272 Unit 1 Features of $\rho$ and r Unit free. Range between -1 and 1. The closer to -1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.
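These features can be illustrated on the seven-store data from Example 1. The sketch below computes the sample correlation coefficient with the standard Pearson formula, written out by hand so that it needs no third-party library. The function name `correlation` is an illustrative assumption.

```python
from math import sqrt

sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($1000)

def correlation(x, y):
    """Sample (Pearson) correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r = correlation(sqft, sales)
# Unit free: expressing sales in dollars instead of $1000 leaves r unchanged.
r_dollars = correlation(sqft, [1000 * y for y in sales])

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

In simple linear regression, the square of r equals the coefficient of determination, and r has the same sign as the slope.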
89. BUS B272 Unit 1 Inference about the Slope There is also a more systematic way to assess model fitness: perform a hypothesis test on the slope of the regression line. If the two variables involved are not at all linearly related, the scatter diagram shows no trend and the slope of the regression line will be zero.
90. BUS B272 Unit 1 Inference about the Slope Hence, we can determine whether a significant relationship between the variables X and Y exists by testing whether $\beta_1$ (the true slope) is equal to zero. $H_0: \beta_1 = 0$ (There is no linear relationship) $H_1: \beta_1 \neq 0$ (There is a linear relationship) If $H_0$ is rejected, there is evidence to believe that a linear relationship exists between X and Y.
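91. BUS B272 Unit 1 The standard error of the slope The estimated standard error of $b_1$: $s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (X_i - \bar{X})^2}}$, where $s_\varepsilon$ is the standard error of estimate.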
92. BUS B272 Unit 1 Inference about the Slope: t Test t test for a population slope: is there a linear dependency of Y on X? Null and alternative hypotheses: $H_0: \beta_1 = 0$ (no linear dependency), $H_1: \beta_1 \neq 0$ (linear dependency). Test statistic: $t = \frac{b_1 - \beta_1}{s_{b_1}}$, with d.f. = n - 2.
93. BUS B272 Unit 1 Example: Store Sales Data for seven stores (store: square feet, annual sales in $000): 1: 1,726, 3,681; 2: 1,542, 3,395; 3: 2,816, 6,653; 4: 5,555, 9,543; 5: 1,292, 3,318; 6: 2,208, 5,563; 7: 1,313, 3,760. Estimated regression equation: Ŷi = 1636.415 + 1.487Xi. The slope of this model is 1.487. Is the square footage of a store affecting its annual sales?
94. BUS B272 Unit 1 $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$, $\alpha = 0.05$, d.f. = 7 - 2 = 5. Test Statistic:
95. BUS B272 Unit 1 Inferences about the Slope: t Test Example Critical Value(s): -2.5706 and 2.5706 (rejection regions of area 0.025 in each tail). Decision: Reject H0. Conclusion: At the 5% level of significance, there is evidence that square footage is associated with annual sales.
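The decision above can be reproduced from the raw data. The sketch below assumes the usual simple-regression formulas: the standard error of the slope is the standard error of estimate divided by the square root of the sum of squared deviations of X, and the test statistic is t = b1 / s_b1 under H0. The critical value 2.5706 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt

sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)
T_CRIT = 2.5706   # two-tailed critical value, alpha = 0.05, d.f. = 5 (from the slide)

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
std_error = sqrt(sse / (n - 2))      # standard error of estimate
s_b1 = std_error / sqrt(sxx)         # estimated standard error of the slope
t_stat = (b1 - 0) / s_b1             # H0: beta1 = 0

reject_h0 = abs(t_stat) > T_CRIT
print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```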
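96. BUS B272 Unit 1 Inferences about the Slope Two-tailed test: $H_0: \beta_1 = 0$ (No linear relationship) vs. $H_1: \beta_1 \neq 0$ (A linear relationship). Upper-tailed test: $H_0: \beta_1 \le 0$ (No positive linear relationship) vs. $H_1: \beta_1 > 0$ (A positive linear relationship). Lower-tailed test: $H_0: \beta_1 \ge 0$ (No negative linear relationship) vs. $H_1: \beta_1 < 0$ (A negative linear relationship).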
97. BUS B272 Unit 1 Exercise 3 Consider the data of Exercise 2 about the level of air pollution at Kwun Tong and the total number of consultations that relate to respiratory diseases in a public clinic in the area. Test at the 5% level of significance to determine whether level of air pollution and the total number of consultations are positively linearly related.
100. BUS B272 Unit 1 Computer Output For two-tailed test
101. BUS B272 Unit 1 Exercise 3 Critical Value(s): 1.7823 (upper-tail rejection region of area 0.05). Decision: Reject H0. Conclusion: At the 5% level of significance, there is evidence to believe that the level of air pollution and the total number of consultations are positively linearly related.
102. BUS B272 Unit 1 Estimation of Mean Values You have seen how we can assess the model fitness. If the model fits satisfactorily, we can use it to forecast and estimate values of the dependent variable. We can obtain a point prediction of Y for a given value of X using the linear regression line. An interval estimate for a particular value of Y, or for the average of Y, at a given value of X can also be computed if desired.
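103. BUS B272 Unit 1 Estimation of Mean Values Confidence interval estimate for $\mu_{Y|X=X_i}$, the mean of Y given a particular $X_i$: $\hat{Y}_i \pm t_{n-2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$. Here $s_\varepsilon$ is the standard error of the estimate and $t_{n-2}$ is the t value from the table with d.f. = n - 2. The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.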
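104. BUS B272 Unit 1 Prediction of Individual Values Prediction interval for an individual response $Y_i$ at a particular $X_i$: $\hat{Y}_i \pm t_{n-2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$. The addition of one under the square root increases the width of the interval from that for the mean of Y.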
105. BUS B272 Unit 1 Interval Estimates for Different Values of X [Figure: the fitted line Ŷi = b0 + b1Xi plotted against X, with the confidence interval band for the mean of Y given X and the wider prediction interval band for an individual Yi]
106. BUS B272 Unit 1 Example: Stores Sales Data for seven stores (store: square feet, annual sales in $000): 1: 1,726, 3,681; 2: 1,542, 3,395; 3: 2,816, 6,653; 4: 5,555, 9,543; 5: 1,292, 3,318; 6: 2,208, 5,563; 7: 1,313, 3,760. Predict the annual sales for a store with 2,000 square feet. Regression model obtained: Ŷi = 1636.415 + 1.487Xi
107. BUS B272 Unit 1 Estimation of Mean Values: Example Confidence interval estimate for the mean of Y given X: find the 95% confidence interval for the average annual sales of a 2,000 square-foot store. Predicted sales: Ŷi = 1636.415 + 1.487Xi = 4609.68 ($000). t(n-2) = t(5) = 2.571, X̄ = 2350.29
108. BUS B272 Unit 1 Prediction Interval for Y : Example Prediction interval for an individual Y: find the 95% prediction interval for the annual sales of a 2,000 square-foot store. Predicted sales: Ŷi = 1636.415 + 1.487Xi = 4609.68 ($000). t(n-2) = t(5) = 2.571, X̄ = 2350.29
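Both intervals for the 2,000 square-foot store can be computed from the raw data. The sketch below assumes the standard simple-regression formulas for the confidence interval of the mean and the prediction interval of an individual value. The t value 2.5706 is the d.f. = 5 critical value quoted on the earlier t-test slide, and the variable names are illustrative.

```python
from math import sqrt

sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]    # X: square feet
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # Y: annual sales ($000)
T_CRIT = 2.5706   # t value for 95% confidence, d.f. = n - 2 = 5 (from the slides)
x_new = 2000      # store size to predict for

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, sales))
std_error = sqrt(sse / (n - 2))

y_hat = b0 + b1 * x_new
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx      # grows with distance from x_bar
ci_half = T_CRIT * std_error * sqrt(leverage)      # mean of Y given X
pi_half = T_CRIT * std_error * sqrt(1 + leverage)  # individual Y given X

print(f"predicted sales: {y_hat:.2f} ($000)")
print(f"95% CI for the mean: {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"95% PI for one store: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The prediction interval is always wider than the confidence interval at the same X, because it also accounts for the variation of an individual store around the mean.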
109. BUS B272 Unit 1 Computer Application Commands: Tools / Data Analysis Plus / Prediction Interval.
111. BUS B272 Unit 1 Linear Regression Assumptions 1. Normality: Y values are normally distributed for each X; the probability distribution of the error is normal. 2. Homoscedasticity (constant variance). 3. Independence of errors.
112.
113. Variation of Errors around the Regression Line For each X value, the "spread" or variance around the regression line is the same. [Figure: error distributions f(e) of equal spread centered on the sample regression line at X1 and X2]
115. BUS B272 Unit 1 Introduction Extension of the simple linear regression model to allow for any fixed number of independent variables. That is, the number of independent variables could be more than one.
116. BUS B272 Unit 1 Multiple Linear Regression We make use of the computer printout to: (1) Assess the model: How well does it fit the data? Is it useful? Are any required conditions violated? (2) Employ the model: interpreting the coefficients; making predictions using the prediction equation; estimating the expected value of the dependent variable.
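117. BUS B272 Unit 1 Model and Required Conditions We allow for k independent variables to potentially be related to the dependent variable: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, where y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the random error variable.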
118. BUS B272 Unit 1 Multiple Regression for k = 2, Graphical Demonstration The simple linear regression model allows for one independent variable, x: $y = \beta_0 + \beta_1 x + \varepsilon$. The multiple linear regression model allows for more than one independent variable: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$. [Figure: the regression plane $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ drawn over the $x_1$ and $x_2$ axes]
119. BUS B272 Unit 1 Required conditions for the error variable The error $\varepsilon$ is normally distributed. The mean is equal to zero and the standard deviation is constant ($\sigma_\varepsilon$) for all values of y. The errors are independent.
120.
121. Assess the model fitness using statistics obtained from the sample.
122.
123. BUS B272 Unit 1 Estimating the Coefficients and Assessing the Model, Example Dependent variable: Profitability, measured by operating Margin (%). Factors considered: Competition, Market awareness, Customers, Community, Physical. Independent variables: Number = number of hotel/motel rooms within 3 miles of the site; Nearest = number of miles to the closest competition; Office space = office space in the nearby community; Enrollment = enrollment in a nearby university or college (in thousands); Income = median household income of the nearby area (in $thousands); Distance = distance to the downtown core (in miles).
124. BUS B272 Unit 1 Estimating the Coefficients and Assessing the Model, Example Data were collected from 100 randomly selected inns that belong to La Quinta (file Xm18-01), and the following suggested model was fitted: Margin = β0 + β1 Rooms + β2 Nearest + β3 Office + β4 College + β5 Income + β6 Disttwn
125. BUS B272 Unit 1 Regression Analysis, Excel Output Margin = 38.14 - 0.0076 Number + 1.65 Nearest + 0.020 Office Space + 0.21 Enrollment + 0.41 Income - 0.23 Distance. This is the sample regression equation (sometimes called the prediction equation).
126. BUS B272 Unit 1 Model Assessment The model is assessed using two tools: (1) the coefficient of determination; (2) the F-test of the analysis of variance. The standard error of estimate is used in building both tools.
127. BUS B272 Unit 1 Standard Error of Estimate The standard deviation of the error is estimated by the Standard Error of Estimate: $s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$. The magnitude of $s_\varepsilon$ is judged by comparing it to $\bar{y}$.
128. BUS B272 Unit 1 Standard Error of Estimate From the printout, $s_\varepsilon$ = 5.51. Compared with the mean value of y, it seems $s_\varepsilon$ is not particularly small. Question: Can we conclude that the model does not fit the data well?
129. BUS B272 Unit 1 Coefficient of Determination The definition is: $R^2 = \frac{SSR}{SS(Total)} = 1 - \frac{SSE}{SS(Total)}$. From the printout, $R^2$ = 0.5251: 52.51% of the variation in operating margin is explained by the six independent variables, and 47.49% remains unexplained.
130. BUS B272 Unit 1 Testing the Validity of the Model To test the validity of the model, the following question is asked: Is there at least one independent variable linearly related to the dependent variable? To answer the question we test the hypotheses $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ and $H_1$: At least one $\beta_i$ is not equal to zero. If at least one $\beta_i$ is not equal to zero, the model has some validity or usefulness.
131. BUS B272 Unit 1 Testing the Validity of the La Quinta Inns Regression Model The hypotheses are tested by an ANOVA procedure (the Excel output):
Source       d.f.            SS          MS                        F
Regression   k = 6           SSR         MSR = SSR / k             MSR / MSE
Residual     n-k-1 = 93      SSE         MSE = SSE / (n-k-1)
Total        n-1 = 99        SS(Total)
132. BUS B272 Unit 1 Testing the Validity of the La Quinta Inns Regression Model [Total variation in y] SS(Total) = SSR + SSE. A large F results from a large SSR. That implies much of the variation in y can be explained by the regression model; the model is useful, and thus the null hypothesis should be rejected. Therefore, the rejection region is $F > F_{\alpha, k, n-k-1}$, while the test statistic is $F = \frac{MSR}{MSE}$.
133. BUS B272 Unit 1 Testing the Validity of the La Quinta Inns Regression Model $F_{\alpha, k, n-k-1} = F_{0.05,\, 6,\, 100-6-1} = 2.17$ and F = 17.14 > 2.17. Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid. Also, the p-value (Significance F) = 0.0000; reject the null hypothesis.
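The F statistic on the printout can be cross-checked from the coefficient of determination alone. Since SSR = R² × SS(Total) and SSE = (1 - R²) × SS(Total), the ratio MSR / MSE equals (R² / k) / ((1 - R²) / (n - k - 1)). The sketch below applies this identity to the values quoted on the slides; the variable names are illustrative and the critical value 2.17 is taken from the slide rather than computed.

```python
R_SQUARED = 0.5251   # coefficient of determination from the printout
N = 100              # number of inns in the sample
K = 6                # number of independent variables
F_CRIT = 2.17        # F(0.05; 6, 93) as quoted on the slide

df_regression = K
df_residual = N - K - 1

# F = MSR / MSE, rewritten in terms of R^2 (SS(Total) cancels out).
f_stat = (R_SQUARED / df_regression) / ((1 - R_SQUARED) / df_residual)
reject_h0 = f_stat > F_CRIT

print(f"d.f. = ({df_regression}, {df_residual}), F = {f_stat:.2f}, reject H0: {reject_h0}")
```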
134. BUS B272 Unit 1 Interpreting the Coefficients b0 = 38.14. This is the intercept, the value of y when all the variables take the value zero. Since the data range of the independent variables does not cover the value zero, do not interpret the intercept. b1 = -0.0076. In this model, for each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by 0.0076% (assuming the other variables are held constant).
135. BUS B272 Unit 1 Interpreting the Coefficients b2 = 1.65. In this model, for each additional mile between the nearest competitor and a La Quinta inn, the operating margin increases on average by 1.65% when the other variables are held constant. b3 = 0.020. For each additional 1000 sq-ft of office space, the operating margin increases on average by 0.02% when the other variables are held constant. b4 = 0.21. For each additional thousand students, the operating margin increases on average by 0.21% when the other variables are held constant.
136. BUS B272 Unit 1 Interpreting the Coefficients b5 = 0.41. For each additional $1000 increase in median household income, the operating margin increases on average by 0.41% when the other variables are held constant. b6 = -0.23. For each additional mile to the downtown center, the operating margin decreases on average by 0.23% when the other variables are held constant.
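137. BUS B272 Unit 1 Testing the Coefficients The hypotheses for each $\beta_i$ are $H_0: \beta_i = 0$ and $H_1: \beta_i \neq 0$. Test statistic: $t = \frac{b_i - \beta_i}{s_{b_i}}$, with d.f. = n - k - 1. The values are read from the Excel printout.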
138. BUS B272 Unit 1 Using the Linear Regression Equation The model can be used for making predictions by: (1) producing a prediction interval estimate for a particular value of y, for a given set of values of xi; (2) producing a confidence interval estimate for the expected value of y, for a given set of values of xi. The model can also be used to learn about relationships between the independent variables xi and the dependent variable y, by interpreting the coefficients bi.
139. BUS B272 Unit 1 La Quinta Inns, Predictions (Xm18-01) Predict the average operating margin of an inn at a site with the following characteristics: 3,815 rooms within 3 miles, closest competitor 0.9 miles away, 476,000 sq-ft of office space, 24,500 college students, $35,000 median household income, 11.2 miles away from the downtown center. MARGIN = 38.14 - 0.0076(3815) + 1.65(0.9) + 0.020(476) + 0.21(24.5) + 0.41(35) - 0.23(11.2) = 37.1%
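The arithmetic of this point prediction is easy to reproduce. The sketch below plugs the site characteristics into the sample regression equation from the Excel output, using the units in which each variable was measured (thousands for office space, enrollment and income). The dictionary keys and variable names are illustrative labels, not names from the data file.

```python
# Coefficients of the sample regression equation from the Excel output.
INTERCEPT = 38.14
COEFFICIENTS = {
    "number": -0.0076,      # hotel/motel rooms within 3 miles
    "nearest": 1.65,        # miles to the closest competitor
    "office_space": 0.020,  # office space, in 1000 sq-ft
    "enrollment": 0.21,     # college students, in thousands
    "income": 0.41,         # median household income, in $1000
    "distance": -0.23,      # miles to the downtown center
}

# Characteristics of the site being evaluated.
site = {
    "number": 3815,
    "nearest": 0.9,
    "office_space": 476,
    "enrollment": 24.5,
    "income": 35,
    "distance": 11.2,
}

margin = INTERCEPT + sum(COEFFICIENTS[name] * site[name] for name in COEFFICIENTS)
print(f"Predicted operating margin: {margin:.1f}%")
```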
140. BUS B272 Unit 1 La Quinta Inns, Predictions Interval estimates by Excel (Data Analysis Plus): It is predicted, with 95% confidence, that the operating margin will lie between 25.4% and 48.8%. It is estimated that the average operating margin of all sites that fit this category falls between 33% and 41.2%. Both intervals suggest that the given site would not be profitable (less than 50%).