# Abdm4064 week 11 data analysis

Data analysis

### Abdm4064 week 11 data analysis

1. Data Analysis. ABDM4064 Business Research, by Stephen Ong, Principal Lecturer (Specialist), Visiting Professor, Shenzhen
2. Learning Outcomes. After studying this chapter, you should be able to:
    1. Know when a response is really an error and should be edited
    2. Appreciate coding of pure qualitative research
    3. Understand the way data are represented in a data file
    4. Understand the coding of structured responses including a dummy variable approach
    5. Appreciate the ways that technological advances have simplified the coding process
3. Learning Outcomes (cont’d)
    6. Know what descriptive statistics are and why they are used
    7. Create and interpret simple tabulation tables
    8. Understand how cross-tabulations can reveal relationships
    9. Perform basic data transformations
    10. List different computer software products designed for descriptive statistical analysis
    11. Understand a researcher’s role in interpreting the data
    12. Implement the hypothesis-testing procedure
    13. Use p-values to assess statistical significance
4. Learning Outcomes (cont’d)
    14. Test a hypothesis about an observed mean compared to some standard
    15. Know the difference between Type I and Type II errors
    16. Know when a univariate χ² test is appropriate and how to conduct one
    17. Recognize when a bivariate statistical test is appropriate
    18. Calculate and interpret a χ² test for a contingency table
    19. Calculate and interpret an independent samples t-test comparing two means
    20. Understand the concept of analysis of variance (ANOVA)
    21. Interpret an ANOVA table
5. Learning Outcomes (cont’d)
    22. Apply and interpret simple bivariate correlations
    23. Interpret a correlation matrix
    24. Understand simple (bivariate) regression
    25. Understand the least-squares estimation technique
    26. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients
    27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis
    28. Interpret results from multiple regression analysis
    29. Interpret results from multivariate analysis of variance (MANOVA)
6. Learning Outcomes (cont’d)
    30. Interpret basic exploratory factor analysis results
    31. Know what multiple discriminant analysis can be used to do
    32. Understand how cluster analysis can identify market segments
7. Remember this: Garbage in, garbage out! If data are collected improperly, or coded incorrectly, then the research results are “garbage”.
8. Stages of Data Analysis
    - Raw Data: The unedited responses from a respondent exactly as indicated by that respondent.
    - Nonrespondent Error: Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.
    - Data Integrity: The notion that the data file actually contains the information that the researcher is trying to obtain to adequately address research questions.
9. Exhibit 19.1: Overview of the Stages of Data Analysis
10. Editing
    - Editing: The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.
        - E.g. the question “How long have you stayed at your current address?” answered with just “45”.
        - The researchers need to adjust or reconstruct such responses.
    - Field Editing (useful in personal interviews): Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.
11. In-House Editing: A rigorous editing job performed by a centralized office staff.
12. Editing: What to Do?
    - Checking for Consistency
        - Respondents match the defined population, e.g. SBS?
        - Check for consistency within the data collection framework, e.g. items listed by the respondents fall within the definition.
    - Taking Action When a Response Is Obviously in Error: Change or correct responses only when there are multiple pieces of evidence for doing so.
    - Editing Technology: Computer routines can check for consistency automatically.
13. Editing for Completeness
    - Item Nonresponse: The technical term for an unanswered question on an otherwise complete questionnaire, resulting in missing data.
        - Most of the time the researchers do nothing about it.
        - Sometimes the question is linked to another question, so the researchers have to fill in the blank.
    - Plug Value: An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.
        - The choice of value is based on a predetermined decision rule, e.g. take an average value or a neutral value.
        - Several choices: leave it blank; plug in alternate choices; randomly select an answer; impute a missing value.
14. Impute: To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information, i.e. based on the respondent’s answers to other questions.
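The plug-value idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are made up, `None` stands for an unanswered item, and the decision rule chosen here (plug in the mean of the observed answers) is just one of the options the slides list.

```python
from statistics import mean

# Hypothetical 5-point ratings; None marks item nonresponse (assumption).
responses = [4, None, 3, 5, None, 4]

def plug_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

completed = plug_mean(responses)
print(completed)
```

A neutral scale value (for example the midpoint) could be passed in instead of the mean if the decision rule calls for it.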
15. Editing for Completeness (cont’d)
    - What about missing data? Many statistical software programs require complete data for an analysis to take place.
    - List-wise deletion: The entire record for a respondent who has left a response missing is excluded from use in statistical analysis.
    - Pair-wise deletion: Only the actual variables for a respondent that do not contain information are eliminated from use in statistical analysis.
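The difference between the two deletion rules is easier to see on a small data file. The sketch below is hypothetical: the variable names and values are invented, and each respondent is stored as a dictionary with `None` for a missing answer.

```python
# Hypothetical records: one dict per respondent, None marks a missing answer.
records = [
    {"age": 25, "income": 40, "satisfaction": 4},
    {"age": 31, "income": None, "satisfaction": 5},
    {"age": None, "income": 55, "satisfaction": 3},
    {"age": 42, "income": 60, "satisfaction": 2},
]

def listwise(rows):
    """List-wise deletion: keep only respondents with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, variables):
    """Pair-wise deletion: keep respondents who answered the variables
    used in this particular analysis, ignoring gaps elsewhere."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

complete_cases = listwise(records)
income_cases = pairwise(records, ["income", "satisfaction"])
print(len(complete_cases), len(income_cases))
```

List-wise deletion leaves two usable records here, while an analysis of income and satisfaction alone can use three.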
16. Please take note: When a questionnaire has too many missing answers, it may not be suitable for the planned data analysis. In such a situation, that particular questionnaire has to be dropped from the sample.
17. Facilitating the Coding Process
    - Editing and Tabulating “Don’t Know” Answers
        - Legitimate don’t know (no opinion)
        - Reluctant don’t know (refusal to answer)
        - Confused don’t know (does not understand)
18. Editing (cont’d)
    - Pitfalls of Editing
        - Allowing subjectivity to enter into the editing process.
        - Data editors should be intelligent, experienced, and objective.
        - A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.
    - Pretesting Edit: Editing during the pretest stage can prove very valuable for improving questionnaire format and identifying poor instructions or inappropriate question wording.
19. Coding Qualitative Responses
    - Coding: The process of assigning a numerical score or other character symbol to previously edited data.
    - Codes
        - Rules for interpreting, classifying, and recording data in the coding process.
        - The actual numerical or other character symbols assigned to raw data.
    - Dummy Coding
        - Numeric “1” or “0” coding where each number represents an alternate response such as “female” or “male.”
        - If k is the number of categories for a qualitative variable, k-1 dummy variables are needed.
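The k-1 rule for dummy coding can be illustrated with a short sketch. The function name `dummy_code`, the `region` variable, and the choice of reference category are all assumptions made for this example.

```python
def dummy_code(values, reference):
    """Return k-1 dummy (0/1) columns, omitting the reference category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}

# Hypothetical three-category variable (k = 3), so two dummies are needed.
region = ["north", "south", "east", "south"]
dummies = dummy_code(region, reference="east")
print(dummies)
```

A respondent in the reference category ("east" here) is identified by zeros on every dummy variable, which is why the k-th column is not needed.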
20. Data File Terminology
    - Field: A collection of characters that represents a single type of data, usually a variable.
    - String Characters: Computer terminology to represent formatting a variable using a series of alphabetic characters (nonnumeric characters) that may form a word.
    - Record: A collection of related fields that represents the responses from one sampling unit.
21. Data File Terminology (cont’d)
    - Data File: The way a data set is stored electronically in spreadsheet-like form in which the rows represent sampling units and the columns represent variables.
    - Value Labels: Unique labels assigned to each possible numeric code for a response.
22. Code Construction
    - Two Basic Rules for Coding Categories:
        1. They should be exhaustive, meaning that a coding category should exist for all possible responses.
        2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
    - Test Tabulation (especially useful for open-ended questions)
        - Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.
        - The purpose is to preliminarily identify the stability and distribution of answers that will determine a coding scheme.
23. Test Tabulation: Example
    - 1st respondent: “I don’t like to use Facebook because it wastes time.”
    - 2nd respondent: “I don’t know what Facebook is.”
    - 3rd respondent: “Facebook takes up a lot of my time.”
    - Based on these three answers, you can form two groups of answers:
        - 1st group: Time factor
        - 2nd group: No knowledge of Facebook
24. Devising the Coding Scheme
    - A coding scheme should not be too elaborate.
    - The coder’s task is only to summarize the data.
    - Categories should be sufficiently unambiguous that coders will not classify items in different ways.
    - Code book: Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.
25. The Nature of Descriptive Analysis
    - Descriptive Analysis: The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.
    - Histogram: A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
26. Exhibit 20.1: Levels of Scale Measurement and Suggested Descriptive Statistics
27. Creating and Interpreting Tabulation
    - Tabulation: The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.
        - Tallying is the term used when the process is done by hand.
    - Frequency Table: A table showing the different ways respondents answered a question.
        - Sometimes called a marginal tabulation.
28. Frequency Table Example
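Since the table on this slide did not survive extraction, here is a small stand-in that builds a frequency table from raw answers. The answers are invented for illustration; `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical answers to a single closed-ended question.
answers = ["yes", "no", "yes", "yes", "don't know", "no"]

counts = Counter(answers)
total = sum(counts.values())

# One row per response category: (category, frequency, percentage).
frequency_table = [
    (category, n, round(100 * n / total, 1))
    for category, n in counts.most_common()
]

for category, n, pct in frequency_table:
    print(f"{category:<12}{n:>3}{pct:>7.1f}%")
```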
29. Cross-Tabulation
    - Cross-Tabulation
        - Addresses research questions involving relationships among multiple less-than-interval variables.
        - Results in a combined frequency table displaying one variable in rows and another variable in columns.
    - Contingency Table: A data matrix that displays the frequency of some combination of responses to multiple variables.
    - Marginals: Row and column totals in a contingency table, which are shown in its margins.
30. Exhibit 20.2: Cross-Tabulation Tables from a Survey Regarding AIG and Government Bailouts
31. Exhibit 20.3: Different Ways of Depicting the Cross-Tabulation of Biological Sex and Target Patronage
32. Cross-Tabulation (cont’d)
    - Percentage Cross-Tabulations
        - Statistical base: the number of respondents or observations (in a row or column) used as a basis for computing percentages.
    - Elaboration and Refinement
        - Elaboration analysis: an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.
        - Moderator variable: a third variable that changes the nature of a relationship between the original independent and dependent variables.
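A cross-tabulation, its marginals, and a percentage computed on a row base can be sketched as follows. The respondent data are hypothetical and the helper name `row_percent` is an assumption for this example; it is not taken from the exhibits.

```python
from collections import Counter

# Hypothetical (sex, shops_at_store) pairs, one per respondent.
pairs = [("F", "yes"), ("F", "yes"), ("F", "no"),
         ("M", "yes"), ("M", "no"), ("M", "no")]

cells = Counter(pairs)                          # joint (cell) frequencies
row_totals = Counter(sex for sex, _ in pairs)   # row marginals
col_totals = Counter(ans for _, ans in pairs)   # column marginals

def row_percent(sex, answer):
    """Cell percentage using the row total as the statistical base."""
    return 100 * cells[(sex, answer)] / row_totals[sex]

print(cells[("F", "yes")], row_totals["F"], round(row_percent("F", "yes"), 1))
```

Switching the statistical base to the column total only requires dividing by `col_totals[answer]` instead.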
33. Exhibit 20.4: Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”
34. Cross-Tabulation (cont’d)
    - How Many Cross-Tabulations?
        - Every possible response becomes a possible explanatory variable.
        - When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.
    - Quadrant Analysis
        - An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
        - Importance-performance analysis
35. Exhibit 20.5: An Importance-Performance or Quadrant Analysis of Hotels
36. Data Transformation
    - Data Transformation: Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.
37. Problems with Data Transformations
    - Median Split: Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another.
        - The approach is best applied only when the data do indeed exhibit bimodal characteristics.
        - Inappropriate collapsing of continuous variables into categorical variables ignores the information contained within the untransformed values.
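A median split is simple to compute, which is part of why it is overused. The sketch below uses invented scores; assigning values equal to the median to the "high" group is an assumption of this example, since the slide does not say how ties are handled.

```python
from statistics import median

# Hypothetical interval-scaled scores.
scores = [2, 3, 3, 4, 7, 8, 8, 9]

cut = median(scores)
# Assumption: a score exactly equal to the median goes to the "high" group.
groups = ["low" if s < cut else "high" for s in scores]

print(cut, groups)
```

After the split, a score of 2 and a score of 4 are both simply "low", which is the loss of information the slide warns about.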
38. Exhibit 20.6: Bimodal Distributions Are Consistent with Transformations into Categorical Values
39. Exhibit 20.7: The Problem with Median Splits with Unimodal Data
40. Index Numbers
    - Index Numbers: Scores or observations recalibrated to indicate how they relate to a base number.
    - Price indexes
        - Represent simple data transformations that allow researchers to track a variable’s value over time and compare a variable(s) with other variables.
        - Recalibration allows scores or observations to be related to a certain base period or base number.
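Recalibrating a series to a base period can be sketched as below. The figures are hypothetical, and setting the base period to 100 is a common convention assumed here rather than something stated on the slide.

```python
def to_index(values, base_position=0, base_value=100.0):
    """Recalibrate a series so that the base period equals base_value."""
    base = values[base_position]
    return [round(base_value * v / base, 1) for v in values]

# Hypothetical weekly viewing hours for three successive years.
hours = [50.0, 52.5, 47.5]
index = to_index(hours)
print(index)
```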
41. Exhibit 20.8: Hours of Television Usage per Week
42. Calculating Rank Order
    - Rank Order
        - Ranking data can be summarized by performing a data transformation.
        - The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
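The frequency-times-rank transformation can be sketched as follows. The destinations and counts are placeholders, not the figures from the exhibits; with rank 1 as the most preferred, a lower total means a better overall standing.

```python
# Hypothetical counts: how many respondents gave each rank (1 = most preferred).
rank_counts = {
    "A": {1: 5, 2: 3, 3: 2},
    "B": {1: 3, 2: 5, 3: 2},
    "C": {1: 2, 2: 2, 3: 6},
}

def rank_score(freq_by_rank):
    """Sum of frequency x rank over all ranks for one choice."""
    return sum(rank * freq for rank, freq in freq_by_rank.items())

scores = {choice: rank_score(f) for choice, f in rank_counts.items()}
ordered = sorted(scores, key=scores.get)   # best (lowest total) first
print(scores, ordered)
```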
43. Exhibit 20.9: Executive Rankings of Potential Conference Destinations
44. Exhibit 20.10: Frequencies of Conference Destination Rankings
45. Exhibit 20.11: Pie Charts Work Well with Tabulations and Cross-Tabulations
46. Computer Programs for Analysis
    - Statistical Packages
        - Spreadsheets: Excel
        - Statistical software: SAS, SPSS (Statistical Package for Social Sciences), MINITAB
47. Computer Graphics and Computer Mapping
    - Box and Whisker Plots: Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.
    - Interquartile Range: A measure of variability.
    - Outlier: A value that lies outside the normal range of the data.
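The quantities behind a box-and-whisker plot can be computed with Python's standard library. Two assumptions are made in this sketch: quartiles are taken with `statistics.quantiles(..., method="inclusive")`, and an outlier is flagged with the common 1.5 x IQR fence, a rule the slide itself does not specify. The data are made up.

```python
from statistics import quantiles

# Hypothetical observations, one of them unusually large.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # interquartile range

# Assumed convention: values beyond 1.5 x IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3, iqr, outliers)
```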
48. Exhibit 20.15: Computer Drawn Box and Whisker Plot
49. SPSS Windows
    - The main SPSS program for frequency distributions is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.
    - If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.
    - The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
50. SPSS Windows
    - To select these procedures, click:
        - Analyze > Descriptive Statistics > Frequencies
        - Analyze > Descriptive Statistics > Descriptives
        - Analyze > Descriptive Statistics > Explore
    - The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.
    - To select this procedure, click:
        - Analyze > Descriptive Statistics > Crosstabs
51. SPSS Windows
    - The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or on independent or paired samples. To select these procedures using SPSS for Windows, click:
        - Analyze > Compare Means > Means …
        - Analyze > Compare Means > One-Sample T Test …
        - Analyze > Compare Means > Independent-Samples T Test …
        - Analyze > Compare Means > Paired-Samples T Test …
52. SPSS Windows
    - The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS. To select these procedures using SPSS for Windows, click:
        - Analyze > Nonparametric Tests > Chi-Square …
        - Analyze > Nonparametric Tests > Binomial …
        - Analyze > Nonparametric Tests > Runs …
        - Analyze > Nonparametric Tests > 1-Sample K-S …
        - Analyze > Nonparametric Tests > 2 Independent Samples …
        - Analyze > Nonparametric Tests > 2 Related Samples …
54. SPSS Windows: Frequencies
    1. Select ANALYZE on the SPSS menu bar.
    2. Click DESCRIPTIVE STATISTICS and select FREQUENCIES.
    3. Move the variable “Familiarity [familiar]” to the VARIABLE(s) box.
    4. Click STATISTICS.
    5. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.
55. SPSS Windows: Frequencies
    6. Click CONTINUE.
    7. Click CHARTS.
    8. Click HISTOGRAMS, then click CONTINUE.
    9. Click OK.
56. Introduction of a Third Variable in Cross-Tabulation
58. SPSS Windows: Cross-tabulations
    1. Select ANALYZE on the SPSS menu bar.
    2. Click on DESCRIPTIVE STATISTICS and select CROSSTABS.
    3. Move the variable “Internet Usage Group [iusagegr]” to the ROW(S) box.
    4. Move the variable “Sex [sex]” to the COLUMN(S) box.
    5. Click on CELLS.
    6. Select OBSERVED under COUNTS and COLUMN under PERCENTAGES.
59. SPSS Windows: Cross-tabulations
    7. Click CONTINUE.
    8. Click STATISTICS.
    9. Click on CHI-SQUARE, PHI AND CRAMER’S V.
    10. Click CONTINUE.
    11. Click OK.
60. Interpretation
    - Interpretation: The process of drawing inferences from the analysis results.
        - Inferences drawn from interpretations lead to managerial implications and decisions.
        - From a management perspective, the qualitative meaning of the data and their managerial implications are an important aspect of the interpretation.
61. Hypothesis Testing
    - Types of Hypotheses
        - Relational hypotheses: Examine how changes in one variable vary with changes in another.
        - Hypotheses about differences between groups: Examine how some variable varies from one group to another.
        - Hypotheses about differences from some standard: Examine how some variable differs from some preconceived standard. These tests typify univariate statistical tests.
62. Types of Statistical Analysis
    - Univariate Statistical Analysis
        - Tests of hypotheses involving only one variable.
        - Testing of statistical significance
    - Bivariate Statistical Analysis: Tests of hypotheses involving two variables.
    - Multivariate Statistical Analysis: Statistical analysis involving three or more variables or sets of variables.
63. The Hypothesis-Testing Procedure
    - Process
        1. The specifically stated hypothesis is derived from the research objectives.
        2. A sample is obtained and the relevant variable is measured.
        3. The measured sample value is compared to the value either stated explicitly or implied in the hypothesis.
            - If the value is consistent with the hypothesis, the hypothesis is supported.
            - If the value is not consistent with the hypothesis, the hypothesis is not supported.
64. Statistical Analysis: Key Terms
    - Hypothesis
        - Unproven proposition: a supposition that tentatively explains certain facts or phenomena.
        - An assumption about the nature of the world.
    - Null Hypothesis
        - Statement about the status quo.
        - No difference in sample and population.
    - Alternative Hypothesis: Statement that indicates the opposite of the null hypothesis.
65. Significance Levels and p-values
    - Significance Level
        - A critical probability associated with a statistical hypothesis test that indicates how likely it is that an inference supporting a difference between an observed value and some statistical expectation is true.
        - The acceptable level of Type I error.
    - p-value
        - Probability value, or the observed or computed significance level.
        - p-values are compared to significance levels to test hypotheses.
        - Lower p-values equal more support for a hypothesized difference, that is, stronger evidence against the null hypothesis.
66. Exhibit 21.1: p-Values and Statistical Tests
67. Exhibit 21.2: As the observed mean gets further from the standard (proposed population mean), the p-value decreases. The lower the p-value, the more confidence you have that the sample mean is different.
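The relationship described in this exhibit can be checked numerically. The sketch below is a simplification under stated assumptions: it uses a normal (z) approximation through `statistics.NormalDist` rather than the t-distribution, and the standard of 3.0, the standard error of 0.5, and the three sample means are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(sample_mean, standard, std_error):
    """Two-tailed p-value for a sample mean under a normal approximation."""
    z = (sample_mean - standard) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample means moving away from a standard of 3.0.
p_values = [two_tailed_p(m, 3.0, 0.5) for m in (3.2, 3.6, 4.0)]
print([round(p, 3) for p in p_values])
```

Each step further from the standard yields a smaller p-value, and only the last mean falls below a 0.05 significance level.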
68. An Example of Hypothesis Testing
    - The null hypothesis: the mean is equal to 3.0: $H_0\colon \mu = 3.0$
    - The alternative hypothesis: the mean does not equal 3.0: $H_1\colon \mu \neq 3.0$
69. An Example of Hypothesis Testing
70. Exhibit 21.3: A Hypothesis Test Using the Sampling Distribution of $\bar{X}$ under the Hypothesis $\mu = 3.0$
    - Critical Values: Values that lie exactly on the boundary of the region of rejection.
71. Type I and Type II Errors
    - Type I Error
        - An error caused by rejecting the null hypothesis when it is true.
        - Has a probability of alpha (α).
        - Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
        - “There really are no monsters under the bed.”
72. Type I and Type II Errors (cont’d)
    - Type II Error
        - An error caused by failing to reject the null hypothesis when the alternative hypothesis is true.
        - Has a probability of beta (β).
        - Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.
        - “There really are monsters under the bed.”
73. Exhibit 21.4: Type I and Type II Errors in Hypothesis Testing
74. Choosing the Appropriate Statistical Technique
    - Choosing the correct statistical technique requires considering:
        - Type of question to be answered
            - E.g. ranking question: rank order test
        - Number of variables involved
            - One variable: univariate statistical analysis
            - Two variables: bivariate statistical analysis
            - More than two variables: multivariate analysis
        - Level of scale measurement
            - E.g. on a nominal scale, the mean and median are meaningless.
75. Parametric versus Nonparametric Tests
    - Parametric Statistics
        - Involve numbers with known, continuous distributions.
        - Appropriate when:
            - Data are interval or ratio scaled.
            - Sample size is large.
    - Nonparametric Statistics
        - Appropriate when the variables being analyzed do not conform to any known or continuous distribution.
76. Exhibit 21.5: Univariate Statistical Choice Made Easy
77. The t-Distribution
    - t-test
        - A hypothesis test that uses the t-distribution.
        - A univariate t-test is appropriate when the variable being analyzed is interval or ratio.
    - Degrees of freedom (d.f.)
        - The number of observations minus the number of constraints or assumptions needed to calculate a statistical term.
78. Exhibit 21.6: The t-Distribution for Various Degrees of Freedom
79. Calculating a Confidence Interval Estimate Using the t-Distribution
80. Calculating a Confidence Interval Estimate Using the t-Distribution (cont’d)
    - Upper limit $= 3.89 + 2.12\left(\dfrac{2.81}{\sqrt{18}}\right) = 5.28$
    - Lower limit $= 3.89 - 2.12\left(\dfrac{2.81}{\sqrt{18}}\right) = 2.49$
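The interval on this slide can be recomputed directly from its inputs (mean 3.89, standard deviation 2.81, n = 18, critical t of 2.12). The function name below is an assumption for illustration; the formula is the usual mean plus or minus t times the standard error.

```python
from math import sqrt

def t_confidence_interval(mean, std_dev, n, t_critical):
    """Return (lower, upper) limits: mean -/+ t * (S / sqrt(n))."""
    margin = t_critical * std_dev / sqrt(n)
    return mean - margin, mean + margin

# Values taken from the slide.
lower, upper = t_confidence_interval(3.89, 2.81, 18, 2.12)
print(round(lower, 2), round(upper, 2))
```

Computed without intermediate rounding, the limits come out at roughly 2.49 and 5.29; the slide's upper limit of 5.28 appears to reflect rounding of an intermediate value.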
81. One-Tailed Univariate t-Tests
    - One-tailed Test
        - Appropriate when a research hypothesis implies that an observed mean can only be greater than or less than a hypothesized value.
        - E.g. “Females score higher than males in the English test.”
        - Only one of the “tails” of the bell-shaped normal curve is relevant.
        - A one-tailed test can be determined from a two-tailed test result by taking half of the observed p-value.
        - When there is any doubt about whether a one- or two-tailed test is appropriate, opt for the more conservative two-tailed test.
82. Two-Tailed Univariate t-Tests
    - Two-tailed Test
        - Tests for differences from the population mean that are either greater or less, i.e. identifies whether there is any difference.
        - E.g. “The English test scores of females are different from the scores of males.”
        - Extreme values of the normal curve (or tails) on both the right and the left are considered.
        - When a research question does not specify whether a difference should be greater than or less than, a two-tailed test is most appropriate.
        - When the researcher has any doubt about whether a one- or two-tailed test is appropriate, he or she should opt for the more conservative two-tailed test.
83. Univariate Hypothesis Test Utilizing the t-Distribution
    - Example:
        - Suppose a Pizza Inn manager believes the average number of returned pizzas each day to be 20.
        - The store records the number of returned pizzas for each of the 25 days it was open in a given month.
        - The mean was calculated to be 22, and the standard deviation to be 5.
84. Univariate Hypothesis Test Utilizing the t-Distribution: An Example
    - $H_0\colon \mu = 20$ (the population mean is equal to 20)
    - $H_1\colon \mu \neq 20$ (the population mean is not equal to 20)
    - $S_{\bar{X}} = S/\sqrt{n} = 5/\sqrt{25} = 1$
85. Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
    - The researcher desired 95 percent confidence; the significance level becomes 0.05.
    - The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection.
    - Thus, the value of t is needed.
    - For 24 degrees of freedom (n - 1 = 25 - 1), the t-value is 2.064.
86. Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
    - Lower limit $= \mu - t_{c.l.} S_{\bar{X}} = 20 - 2.064\left(\dfrac{5}{\sqrt{25}}\right) = 17.936$
    - Upper limit $= \mu + t_{c.l.} S_{\bar{X}} = 20 + 2.064\left(\dfrac{5}{\sqrt{25}}\right) = 22.064$
87. Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
    - Univariate hypothesis t-test: $t_{obs} = \dfrac{\bar{X} - \mu}{S_{\bar{X}}} = \dfrac{22 - 20}{1} = \dfrac{2}{1} = 2$
    - This is less than the critical t-value of 2.064 at the 0.05 level with 24 degrees of freedom, so the null hypothesis is not rejected: a difference from 20 is not supported.
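The pizza example can be verified with a few lines of Python. The function name is an assumption for illustration; the inputs (sample mean 22, hypothesized mean 20, standard deviation 5, n = 25) and the critical value of 2.064 come from the slides.

```python
from math import sqrt

def one_sample_t(sample_mean, hypothesized_mean, std_dev, n):
    """Observed t: (sample mean - hypothesized mean) / standard error."""
    std_error = std_dev / sqrt(n)
    return (sample_mean - hypothesized_mean) / std_error

t_obs = one_sample_t(22, 20, 5, 25)
t_critical = 2.064   # from the slides: 24 d.f., 0.05 level, two-tailed

# Reject the null hypothesis only if |t_obs| exceeds the critical value.
reject_null = abs(t_obs) > t_critical
print(t_obs, reject_null)
```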
88. 88. 21–88 The Chi-Square Test forThe Chi-Square Test for Goodness of FitGoodness of Fit  Chi-square (χ2 ) test  Tests for statistical significance.  Is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.  Goodness-of-Fit (GOF)  A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.
89. 89. The Chi-Square Test for Goodness of Fit: An Example
90. 90. The Chi-Square Test for Goodness of Fit: An Example (cont’d)
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
χ² = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
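A minimal sketch of the goodness-of-fit statistic follows. The counts are hypothetical (the exhibit data are not reproduced in this transcript), the expected frequencies assume an equal split across categories, and the degrees of freedom use the standard rule of categories minus one.

```python
# Hypothetical observed counts for two categories (not from the slides)
observed = [60, 40]

# Assumption: H0 states that the categories are equally likely
expected = [sum(observed) / len(observed)] * len(observed)

# Chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1    # univariate goodness of fit: categories minus 1

critical_value = 3.84     # 0.05 level with 1 d.f.
print(chi_sq, df, chi_sq > critical_value)
```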
91. 91. Chi-Square Test: Estimation for Expected Number for Each Cell
$E_{ij} = \frac{R_i C_j}{n}$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
n = sample size
92. 92. Hypothesis Test of a Proportion
• Hypothesis Test of a Proportion
  • Is conceptually similar to the test used when the mean is the characteristic of interest, but differs in the mathematical formulation of the standard error of the proportion.
$Z_{obs} = \frac{p - \pi}{S_p}$
π is the population proportion
p is the sample proportion
π is estimated with p
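As an illustration of the Z statistic for a proportion, the sketch below uses hypothetical numbers. The slide does not spell out the standard error, so the usual estimate $S_p = \sqrt{pq/n}$ with $q = 1 - p$ is assumed here.

```python
import math

# Hypothetical inputs (not from the slides)
n = 100        # sample size
p = 0.20       # sample proportion
pi_0 = 0.15    # hypothesized population proportion

# Assumed standard error of the proportion: sqrt(p * q / n)
q = 1 - p
s_p = math.sqrt(p * q / n)

z_obs = (p - pi_0) / s_p
# 1.96 is the two-tailed critical Z value at the 0.05 level
print(round(s_p, 4), round(z_obs, 4), abs(z_obs) > 1.96)
```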
93. 93. What Is the Appropriate Test of Difference?
• Test of Differences
  • An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.
  • Behaviour, characteristics, beliefs, opinions, emotions, or attitudes
• Bivariate Tests of Differences
  • Involve only two variables: a variable that acts like a dependent variable and a variable that acts as a classification variable.
  • Differences in mean scores between groups or in comparing how two groups’ scores are distributed across possible response categories.
94. 94. EXHIBIT 22.1 Some Bivariate Hypotheses
95. 95. Cross-Tabulation Tables: The χ² Test for Goodness-of-Fit
• Cross-Tabulation (Contingency) Table
  • A joint frequency distribution of observations on two or more variables.
• χ² Distribution
  • Provides a means for testing the statistical significance of a contingency table.
  • Involves comparing observed frequencies ($O_i$) with expected frequencies ($E_i$) in each cell of the table.
  • Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.
96. 96. Chi-Square Test
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
χ² = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
$E_{ij} = \frac{R_i C_j}{n}$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
n = sample size
97. 97. Degrees of Freedom (d.f.)
d.f. = (R − 1)(C − 1), where R is the number of rows and C is the number of columns in the contingency table.
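The expected-frequency and degrees-of-freedom formulas can be combined into one short calculation. The 2 × 2 table below is hypothetical and is not the Papa John’s data; it only illustrates the mechanics.

```python
# Hypothetical 2 x 2 contingency table of observed counts (rows x columns)
table = [[30, 20],
         [10, 40]]

row_totals = [sum(row) for row in table]         # R_i
col_totals = [sum(col) for col in zip(*table)]   # C_j
n = sum(row_totals)                              # sample size

# Expected count for each cell: E_ij = R_i * C_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic summed over every cell of the table
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)      # (R - 1)(C - 1)
print(expected, round(chi_sq, 3), df)
```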
98. 98. Example: Papa John’s Restaurants
• Univariate Hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location or in a shopping center.
• Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
99. 99. Example: Papa John’s Restaurants (cont’d)
• In this example, χ² = 22.16 with 1 d.f.
• From Table A.4, the critical value at the 0.05 level with 1 d.f. is 3.84.
• Thus, we are 95 percent confident that the observed values do not equal the expected values.
• But are the deviations from the expected values in the hypothesized direction?
100. 100. χ² Test for Goodness-of-Fit Recap
Testing the hypothesis involves two key steps:
1. Examine the statistical significance of the observed contingency table.
2. Examine whether the differences between the observed and expected values are consistent with the hypothesized prediction.
101. 101. The t-Test for Comparing Two Means
• Independent Samples t-Test
  • A test for hypotheses stating that the mean scores for some interval- or ratio-scaled variable grouped based on some less-than-interval classificatory variable are not the same.
$t = \frac{\text{Sample Mean 1} - \text{Sample Mean 2}}{\text{Variability of random means}}$
$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$
102. 102. The t-Test for Comparing Two Means (cont’d)
• Pooled Estimate of the Standard Error
  • An estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal.
$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
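The pooled standard error and the resulting t statistic can be sketched as follows. The group sizes, means, and standard deviations are hypothetical summary statistics chosen only for illustration.

```python
import math

# Hypothetical summary statistics for two independent groups
n1, mean1, s1 = 10, 16.5, 2.0
n2, mean2, s2 = 12, 12.2, 3.0

# Pooled variance, assuming both groups share the same population variance
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

# Pooled estimate of the standard error of the difference between means
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_obs = (mean1 - mean2) / std_error
df = n1 + n2 - 2    # degrees of freedom for the pooled test
print(pooled_var, round(std_error, 4), round(t_obs, 3), df)
```

The observed t would then be compared with the table value for `df` degrees of freedom at the chosen significance level.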
103. 103. © 2010 South-Western/Cengage Learning. All rights reserved. May not be scanned, copied or duplicated, or posted to a publically accessible website, in whole or in part. EXHIBIT 22.2 Independent Samples t-Test Results
104. 104. Comparing Two Means (cont’d)
• Paired-Samples t-Test
  • Compares the scores of two interval variables drawn from related populations.
  • Used when means need to be compared that are not from independent samples.
105. 105. EXHIBIT 22.4 Example Results for a Paired Samples t-Test
106. 106. A Classification of Hypothesis Testing Procedures for Examining Differences
108. 108. SPSS Windows: One Sample t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE SAMPLE T TEST.
3. Move “Familiarity [familiar]” into the TEST VARIABLE(S) box.
4. Type “4” in the TEST VALUE box.
5. Click OK.
109. 109. SPSS Windows: Two Independent Samples t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then INDEPENDENT SAMPLES T TEST.
3. Move “Internet Usage Hrs/Week [iusage]” into the TEST VARIABLE(S) box.
4. Move “Sex [sex]” to the GROUPING VARIABLE box.
5. Click DEFINE GROUPS.
6. Type “1” in the GROUP 1 box and “2” in the GROUP 2 box.
7. Click CONTINUE.
8. Click OK.
110. 110. SPSS Windows: Paired Samples t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then PAIRED SAMPLES T TEST.
3. Select “Attitude toward Internet [iattitude]” and then select “Attitude toward technology [tattitude].” Move these variables into the PAIRED VARIABLE(S) box.
4. Click OK.
111. 111. Relationship Amongst t Test, Analysis of Variance, Analysis of Covariance, & Regression
Metric Dependent Variable
• One Independent Variable
  • Binary: t Test
• One or More Independent Variables
  • Categorical (Factorial): Analysis of Variance
    • One Factor: One-Way Analysis of Variance
    • More than One Factor: N-Way Analysis of Variance
  • Categorical and Interval: Analysis of Covariance
  • Interval: Regression
112. 112. The Z-Test for Comparing Two Proportions
• Z-Test for Differences of Proportions
  • Tests the hypothesis that proportions are significantly different for two independent samples or groups.
  • Requires a sample size greater than thirty.
  • The hypothesis is $H_0: \pi_1 = \pi_2$, which may be restated as $H_0: \pi_1 - \pi_2 = 0$.
113. 113. The Z-Test for Comparing Two Proportions
• Z-Test statistic for differences in large random samples:
$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$
$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the difference of proportions
114. 114. The Z-Test for Comparing Two Proportions
• To calculate the standard error of the differences in proportions:
$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
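A short numerical sketch of the two-proportion Z-test follows. The sample sizes and proportions are hypothetical, and the pooled proportion is assumed to be the sample-size-weighted average of the two sample proportions, which the slides do not define explicitly.

```python
import math

# Hypothetical samples (not from the slides)
n1, p1 = 100, 0.35
n2, p2 = 100, 0.40

# Assumed pooled proportion: weighted average of the two sample proportions
p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
q_bar = 1 - p_bar

# Pooled standard error of the difference in proportions
std_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Under H0 the hypothesized difference (pi_1 - pi_2) is zero
z_obs = ((p1 - p2) - 0) / std_error
# 1.96 is the two-tailed critical Z value at the 0.05 level
print(round(p_bar, 3), round(std_error, 4), round(z_obs, 3), abs(z_obs) > 1.96)
```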
115. 115. One-Way Analysis of Variance (ANOVA)
• Analysis of Variance (ANOVA)
  • An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.
  • A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
  • A method of comparing variances to make inferences about the means.
• The substantive hypothesis tested is:
  • At least one group mean is not equal to another group mean.
116. 116. Partitioning Variance in ANOVA
• Total Variability
  • Grand Mean
    • The mean of a variable over all observations.
  • SST = Total of (Observed Value − Grand Mean)²
117. 117. Partitioning Variance in ANOVA
• Between-Groups Variance
  • The sum of squared differences between the group mean and the grand mean, summed over all groups for a given set of observations.
  • SSB = Total of $n_{group}$(Group Mean − Grand Mean)²
• Within-Group Error or Variance
  • The sum of the squared differences between observed values and the group mean for a given set of observations.
  • Also known as total error variance.
  • SSE = Total of (Observed Value − Group Mean)²
118. 118. The F-Test
• F-Test
  • Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
  • Variance components are used to compute F-ratios: SSE, SSB, SST
$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$
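The partitioning of variance and the F-ratio can be checked on a small data set. The three groups below are hypothetical, and the mean squares use the standard degrees of freedom (groups minus one, and observations minus groups), which the slides do not state explicitly.

```python
# Hypothetical scores for three groups (not from the slides)
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SST: squared deviations of every observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)
# SSB: group size times squared deviation of each group mean from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SSE: squared deviations of each observation from its own group mean
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_ratio = (ssb / df_between) / (sse / df_within)
print(sst, ssb, sse, f_ratio)
```

In this example SSB and SSE add up to SST, which is the partition the preceding slides describe.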
119. 119. EXHIBIT 22.6 Interpreting ANOVA
121. 121. SPSS Windows
One-way ANOVA can be efficiently performed using the program COMPARE MEANS and then One-way ANOVA. To select this procedure using SPSS for Windows, click: Analyze>Compare Means>One-Way ANOVA …
N-way analysis of variance and analysis of covariance can be performed using GENERAL LINEAR MODEL. To select this procedure using SPSS for Windows, click: Analyze>General Linear Model>Univariate …
122. 122. SPSS Windows: One-Way ANOVA
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion [promotion]” to the FACTOR box.
5. Click OPTIONS.
6. Click Descriptive.
7. Click CONTINUE.
8. Click OK.
123. 123. SPSS Windows: Analysis of Covariance
1. Select ANALYZE from the SPSS menu bar.
2. Click GENERAL LINEAR MODEL and then UNIVARIATE.
3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.
4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” also to the FIXED FACTOR(S) box.
5. Move “Clientel [clientel]” to the COVARIATE(S) box.
6. Click OK.
124. 124. The Basics
• Measures of Association
  • Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.
  • The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.
  • Correlation analysis is most appropriate for interval or ratio variables.
  • Regression can accommodate either less-than-interval or interval independent variables, but the dependent variable must be continuous.
125. 125. EXHIBIT 23.1 Bivariate Analysis: Common Procedures for Testing Association
126. 126. Simple Correlation Coefficient (continued)
• Correlation coefficient
  • A statistical measure of the covariation, or association, between two at-least interval variables.
• Covariance
  • Extent to which two variables are associated systematically with each other.
$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$
127. 127. Simple Correlation Coefficient
• Correlation coefficient (r)
  • Ranges from +1 to −1
  • Perfect positive linear relationship = +1
  • Perfect negative (inverse) linear relationship = −1
  • No correlation = 0
• Correlation coefficient for two variables (X, Y)
128. 128. EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns
129. 129. Correlation, Covariance, and Causation
• When two variables covary (i.e., vary systematically), they display concomitant variation.
• This systematic covariation does not in and of itself establish causality.
• e.g., a rooster’s crow and the rising of the sun
  • The rooster does not cause the sun to rise.
130. 130. Coefficient of Determination
• Coefficient of Determination (R²)
  • A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
  • Measures that part of the total variance of Y that is accounted for by knowing the value of X.
$R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$
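The correlation coefficient and the coefficient of determination can be computed directly from their definitions. The five data pairs below are hypothetical and serve only to show the arithmetic.

```python
import math

# Hypothetical paired observations (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations about the means
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # simple correlation coefficient
r_squared = r ** 2               # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```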
131. 131. Correlation Matrix
• Correlation matrix
  • The standard form for reporting correlation coefficients for more than two variables.
• Statistical Significance
  • The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.
132. 132. EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example
133. 133. Regression Analysis
• Simple (Bivariate) Linear Regression
  • A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.
• The Regression Equation (Y = α + βX)
  • Y = the continuous dependent variable
  • X = the independent variable
  • α = the Y intercept (where the regression line intercepts the Y axis)
  • β = the slope coefficient (rise over run)
134. 134. Regression Line and Slope
[Figure: regression line $\hat{Y} = \hat{a} + \hat{\beta}X$ plotted with X on the horizontal axis and Y on the vertical axis; the slope is shown as $\Delta\hat{Y}$ over $\Delta X$.]
135. 135. The Regression Equation
• Parameter Estimate Choices
  • β is indicative of the strength and direction of the relationship between the independent and dependent variable.
  • α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
• Standardized Regression Coefficient (β)
  • Estimated coefficient of the strength of relationship between the independent and dependent variables.
  • Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from −1 to 1).
136. 136. The Regression Equation (cont’d)
• Parameter Estimate Choices
  • Raw regression estimates ($b_1$)
    • Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
    • If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
    • This is another way of saying when the researcher is interested only in prediction.
  • Standardized regression estimates (β)
    • Standardized regression estimates have the advantage of a constant scale.
    • Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.
137. 137. EXHIBIT 23.5 The Advantage of Standardized Regression Weights
138. 138. EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued
139. 139. EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins
140. 140. Ordinary Least-Squares (OLS) Method of Regression Analysis
• OLS
  • Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.
  • Generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.
  • No straight line can completely represent every dot in the scatter diagram.
  • There will be a discrepancy between most of the actual scores (each dot) and the predicted score ($\hat{Y}$).
  • Uses the criterion of attempting to make the least amount of total error in prediction of Y from X.
141. 141. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
142. 142. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
The equation means that the predicted value for any value of X ($X_i$) is determined as a function of the estimated slope coefficient, plus the estimated intercept coefficient + some error.
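The least-squares criterion leads to closed-form estimates for a single predictor. The sketch below assumes the standard OLS formulas (slope = sum of cross-products divided by the sum of squared X deviations; intercept = mean of Y minus slope times mean of X), since the equation slides are not reproduced in this transcript. The data are hypothetical.

```python
# Hypothetical paired observations (not the building-permit data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard OLS estimates for simple regression (assumed formulas)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Predicted scores and residuals (the discrepancy for each dot)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]
sse = sum(e ** 2 for e in residuals)   # the quantity OLS minimizes
print(round(slope, 3), round(intercept, 3), round(sse, 3))
```

The residuals sum to approximately zero, which is a property of the fitted least-squares line.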
143. 143. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
144. 144. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
• Statistical Significance of Regression Model
  • F-test (regression)
    • Determines whether more variability is explained by the regression or unexplained by the regression.
145. 145. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
• Statistical Significance of Regression Model
  • ANOVA Table:
146. 146. Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
• R²
  • The proportion of variance in Y that is explained by X (or vice versa).
  • A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.
$R^2 = \frac{3{,}398.49}{3{,}882.40} = 0.875$
147. 147. EXHIBIT 23.8 Simple Regression Results for Building Permit Example
148. 148. EXHIBIT 23.9 OLS Regression Line
149. 149. Simple Regression and Hypothesis Testing
• The explanatory power of regression lies in hypothesis testing. Regression is often used to test relational hypotheses.
• The outcome of the hypothesis test involves two conditions that must both be satisfied:
  • The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient and negative relationships require a negative coefficient.
  • The t-test associated with the regression weight must be significant.
150. 150. What is Multivariate Data Analysis?
• Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis.
  • Methods analyze multiple variables or even multiple sets of variables simultaneously.
• Business problems involve multivariate data analysis:
  • most employee motivation research
  • customer psychographic profiles
  • research that seeks to identify viable market segments
151. 151. The “Variate” in Multivariate
• Variate
  • A mathematical way in which a set of variables can be represented with one equation.
  • A linear combination of variables, each contributing to the overall meaning of the variate based upon an empirically derived weight.
  • A function of the measured variables involved in an analysis: $V_k = f(X_1, X_2, \ldots, X_m)$
152. 152. EXHIBIT 24.1 Which Multivariate Approach Is Appropriate?
153. 153. Classifying Multivariate Techniques
• Dependence Techniques
  • Explain or predict one or more dependent variables.
  • Needed when hypotheses involve distinction between independent and dependent variables.
  • Types:
    • Multiple regression analysis
    • Multiple discriminant analysis
    • Multivariate analysis of variance
    • Structural equations modeling
154. 154. Classifying Multivariate Techniques (cont’d)
• Interdependence Techniques
  • Give meaning to a set of variables or seek to group things together.
  • Used when researchers examine questions that do not distinguish between independent and dependent variables.
  • Types:
    • Factor analysis
    • Cluster analysis
    • Multidimensional scaling
155. 155. Classifying Multivariate Techniques (cont’d)
• Influence of Measurement Scales
  • The nature of the measurement scales will determine which multivariate technique is appropriate for the data.
  • Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
  • Nominal and ordinal scales are nonmetric.
  • Interval and ratio scales are metric.
156. 156. EXHIBIT 24.2 Which Multivariate Dependence Technique Should I Use?
157. 157. EXHIBIT 24.3 Which Multivariate Interdependence Technique Should I Use?
158. 158. Analysis of Dependence
• General Linear Model (GLM)
  • A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables.
μ = a constant (overall mean of the dependent variable)
ΔX and ΔF = changes due to main effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)
ΔXF = the change due to the combination (interaction effect) of those variables
159. 159. Interpreting Multiple Regression
• Multiple Regression Analysis
  • An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
$Y_i = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \cdots + b_nX_n + e_i$
• Dummy variable
  • The way a dichotomous (two group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.
160. 160. Multiple Regression Analysis
• A Simple Example
  • Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
  • Several hypotheses are offered:
    • H1: Competitor’s sales are related negatively to sales.
    • H2: Sales are higher in communities with a sales office than when no sales office is present.
    • H3: Grammar school enrollment in a community is related positively to sales.
161. 161. Multiple Regression Analysis (cont’d)
Statistical Results of the Multiple Regression
• Regression Equation: $\hat{Y} = 102.18 + 0.387X_1 + 115.2X_2 + 6.73X_3$
• Coefficient of multiple determination (R²) = 0.845
• F-value = 14.6, p < 0.05
162. 162. Multiple Regression Analysis (cont’d)
• Regression Coefficients in Multiple Regression
  • Partial correlation
    • The correlation between two variables after taking into account the fact that they are correlated with other variables too.
• R² in Multiple Regression
  • The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
163. 163. Multiple Regression Analysis (cont’d)
• Statistical Significance in Multiple Regression
  • F-test
    • Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
    • Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).
$F = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE}$
164. 164. Multiple Regression Analysis (cont’d)
• Degrees of Freedom (d.f.)
  • k = number of independent variables
  • n = number of observations or respondents
• Calculating Degrees of Freedom (d.f.)
  • d.f. for the numerator = k
  • d.f. for the denominator = n − k − 1
165. 165. F-test
$F = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE}$
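The F-ratio and its degrees of freedom can be traced with a few lines of arithmetic. The sums of squares, sample size, and number of predictors below are hypothetical values chosen for illustration; they are not taken from the toy-manufacturer example.

```python
# Hypothetical regression output (not from the slides)
ss_regression = 80.0   # SSR: variation explained by the regression
ss_error = 20.0        # SSE: residual (unexplained) variation
n = 24                 # number of observations
k = 3                  # number of independent variables

df_numerator = k
df_denominator = n - k - 1

msr = ss_regression / df_numerator   # mean square regression
mse = ss_error / df_denominator      # mean square error
f_value = msr / mse

# R-squared from the same sums of squares: explained / total variation
r_squared = ss_regression / (ss_regression + ss_error)
print(df_numerator, df_denominator, round(f_value, 3), r_squared)
```

The resulting F would be compared with the table value for 3 and 20 degrees of freedom at the chosen significance level.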
166. 166. EXHIBIT 24.4 Interpreting Multiple Regression Results
167. 167. ANOVA (n-way) and MANOVA
• Multivariate Analysis of Variance (MANOVA)
  • A multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.
168. 168. ANOVA (n-way) and MANOVA (cont’d)
Interpreting N-way (Univariate) ANOVA
1. Examine overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by examining the group means.
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination.
169. 169. Discriminant Analysis
• A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.
• To calculate discriminant scores, the linear function used is:
$Z_i = b_1X_{1i} + b_2X_{2i} + \cdots + b_nX_{ni}$
170. 170. Discriminant Analysis Example
$Z = b_1X_1 + b_2X_2 + b_3X_3 = 0.069X_1 + 0.013X_2 + 0.0007X_3$
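A discriminant score is simply a weighted sum of the predictors. The sketch below uses the three weights shown in the example equation; the predictor values are hypothetical, and no classification cutoff is applied because the transcript does not give one.

```python
# Discriminant weights from the example equation on the slide
weights = [0.069, 0.013, 0.0007]

# Hypothetical predictor values for one object (not from the slides)
x_values = [10, 20, 1000]

# Discriminant score: Z = b1*X1 + b2*X2 + b3*X3
z_score = sum(b * x for b, x in zip(weights, x_values))
print(round(z_score, 4))
```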
171. 171. EXHIBIT 24.5 Multivariate Dependence Techniques Summary
172. 172. Factor Analysis
• Statistically identifies a reduced number of factors from a larger number of measured variables.
• Types:
  • Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.
  • Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.
173. 173. EXHIBIT 24.6 A Simple Illustration of Factor Analysis
174. 174. Factor Analysis (cont’d)
• How Many Factors
  • Eigenvalues are a measure of how much variance is explained by each factor.
  • Common rule:
    • Base the number of factors on the number of eigenvalues greater than 1.0.
• Factor Loading
  • Indicates how strongly a measured variable is correlated with a factor.
175. 175. Factor Analysis (cont’d)
• Factor Rotation
  • A mathematical way of simplifying factor analysis results to better identify which variables “load on” which factors.
  • Most common procedure is varimax rotation.
• Data Reduction Technique
  • Approaches that summarize the information from many variables into a reduced set of variates formed as linear combinations of measured variables.
  • The rule of parsimony: an explanation involving fewer components is better than one involving many more.
176. 176. Factor Analysis (cont’d)
• Creating Composite Scales with Factor Results
  • When a clear pattern of loadings exists, the researcher may take a simpler approach by summing the variables with high loadings and creating a summated scale.
  • Very low loadings suggest a variable does not contribute much to the factor.
  • The reliability of each summated scale is tested by computing a coefficient alpha estimate.
177. 177. Factor Analysis (cont’d)
• Communality
  • A measure of the percentage of a variable’s variation that is explained by the factors.
  • A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
  • Communality for any variable is equal to the sum of the squared loadings for that variable.
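The communality rule (the sum of a variable’s squared loadings) is easy to verify by hand or in code. The loadings below are hypothetical values for three variables on two factors; the variable names are placeholders.

```python
# Hypothetical factor loadings: variable -> [loading on factor 1, loading on factor 2]
loadings = {
    "v1": [0.8, 0.1],
    "v2": [0.7, 0.3],
    "v3": [0.2, 0.9],
}

# Communality of each variable: sum of its squared loadings across factors
communalities = {
    name: sum(loading ** 2 for loading in row) for name, row in loadings.items()
}

for name, value in communalities.items():
    print(name, round(value, 2))
```

A variable such as the hypothetical `v3`, with a communality of about 0.85, would be well explained by the two factors taken together.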
178. 178. Factor Analysis (cont’d)
• Total Variance Explained
  • Squaring and totaling each factor loading, then dividing the total by the number of factors, provides an estimate of variance in a set of variables explained by a factor.
  • This explanation of variance is much the same as R² in multiple regression.
180. 180. SPSS Windows
To select this procedure using SPSS for Windows, click: Analyze>Data Reduction>Factor …
181. 181. SPSS Windows: Principal Components 1. Select ANALYZE from the SPSS menu bar. 2. Click DATA REDUCTION and then FACTOR. 3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],” “Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive Teeth [v6]” into the VARIABLES box. 4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box, check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED. Click CONTINUE. 5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGENVALUE OVER 1 (default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE. 6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE. 7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE. 8. Click OK.
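The extraction settings in step 5 (principal components on the correlation matrix, keeping eigenvalues over 1) can be sketched outside SPSS. The following is a minimal illustration only: the correlation matrix `R` is hypothetical and not taken from the toothpaste data, the eigenvalues are found by plain power iteration with deflation rather than whatever routine SPSS uses, and the varimax rotation of step 6 is omitted.

```python
import math

# Hypothetical correlation matrix for four rating variables (assumed for
# illustration; it is NOT the toothpaste data from the slides).
R = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]

def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def eigen_symmetric(matrix, iterations=2000):
    """Eigenpairs of a symmetric positive semi-definite matrix,
    largest eigenvalue first, by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(n):
        v = [float(i + 1) for i in range(n)]  # arbitrary start vector
        for _ in range(iterations):
            w = mat_vec(work, v)
            length = norm(w)
            if length < 1e-12:
                break
            v = [x / length for x in w]
        length = norm(v)
        v = [x / length for x in v]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        value = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((value, v))
        # Deflate: remove the component just found from the working matrix.
        work = [[work[i][j] - value * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return pairs

pairs = eigen_symmetric(R)

# "Eigenvalue over 1" rule: keep only components with eigenvalue > 1.
retained = [(value, vec) for value, vec in pairs if value > 1.0]

# Unrotated loadings: eigenvector scaled by the square root of its eigenvalue.
loadings = [[math.sqrt(value) * x for x in vec] for value, vec in retained]

for value, _ in retained:
    share = value / len(R)  # proportion of total variance explained
    print(f"eigenvalue {value:.3f}, variance explained {share:.1%}")
```

With a correlation matrix, the eigenvalues sum to the number of variables, which is why an eigenvalue of 1 marks a component that explains no more variance than a single original variable.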
182. 182. Cluster Analysis Cluster analysis: A multivariate approach for grouping observations based on similarity among measured variables. Cluster analysis is an important tool for identifying market segments. Cluster analysis classifies individuals or objects into a small number of mutually exclusive and exhaustive groups. Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups. Each cluster should have high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity.
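One common way to put numbers on within-cluster homogeneity and between-cluster heterogeneity is to split the total sum of squares into a within-cluster part and a between-cluster part. The sketch below assumes two hypothetical clusters of two-dimensional observations; the data and names are illustrative and do not come from the slides.

```python
# Hypothetical observations, already assigned to two clusters (assumed data).
clusters = {
    "A": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
}

def centroid(points):
    """Mean of a list of points, dimension by dimension."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

all_points = [p for pts in clusters.values() for p in pts]
grand = centroid(all_points)

# Within-cluster sum of squares: spread of cases around their own centroid.
within_ss = sum(sq_dist(p, centroid(pts))
                for pts in clusters.values() for p in pts)

# Between-cluster sum of squares: spread of centroids around the grand mean,
# weighted by cluster size.
between_ss = sum(len(pts) * sq_dist(centroid(pts), grand)
                 for pts in clusters.values())

total_ss = sum(sq_dist(p, grand) for p in all_points)

print(f"within = {within_ss:.3f}, between = {between_ss:.3f}, "
      f"total = {total_ss:.3f}")
```

A good cluster solution has a small within-cluster sum of squares relative to the between-cluster sum of squares; the two always add up to the total.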
183. 183. EXHIBIT 24.7 Clusters of Individuals on Two Dimensions
184. 184. EXHIBIT 24.8 Cluster Analysis of Test-Market Cities
186. 186. SPSS Windows To select this procedure using SPSS for Windows, click: Analyze>Classify>Hierarchical Cluster … Analyze>Classify>K-Means Cluster … Analyze>Classify>Two-Step Cluster …
187. 187. SPSS Windows: Hierarchical Clustering 1. Select ANALYZE from the SPSS menu bar. 2. Click CLASSIFY and then HIERARCHICAL CLUSTER. 3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box. 4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options). 5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE. 6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE. 7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE. 8. Click OK.
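To see what these settings ask for, here is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion on squared Euclidean distance. It records a simple agglomeration schedule and the cluster memberships for solutions with 2 to 4 clusters, mirroring step 5. The eight data points and all function names are hypothetical, and the schedule format is an assumption that does not reproduce SPSS output.

```python
from itertools import combinations

# Hypothetical cases measured on two variables (assumed data).
points = [(1, 1), (2, 1), (1, 5), (2, 5), (10, 1), (11, 1), (10, 5), (11, 5)]

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))

def ward_clustering(points, min_clusters=2, max_clusters=4):
    clusters = [[i] for i in range(len(points))]  # start: one case per cluster
    schedule = []   # (members of cluster a, members of cluster b, merge cost)
    solutions = {}  # number of clusters -> cluster label for every case
    while True:
        k = len(clusters)
        if min_clusters <= k <= max_clusters:
            labels = [None] * len(points)
            for label, members in enumerate(clusters):
                for i in members:
                    labels[i] = label
            solutions[k] = labels
        if k == 1:
            break

        def cost(pair):
            a, b = pair
            return ward_cost([points[i] for i in clusters[a]],
                             [points[i] for i in clusters[b]])

        # Merge the pair of clusters whose union adds the least
        # within-cluster variation.
        a, b = min(combinations(range(k), 2), key=cost)
        schedule.append((clusters[a], clusters[b], cost((a, b))))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [merged]
    return schedule, solutions

schedule, solutions = ward_clustering(points)
for step, (a, b, c) in enumerate(schedule, start=1):
    print(f"stage {step}: merge {a} + {b} (cost {c:.2f})")
for k in sorted(solutions):
    print(f"{k}-cluster solution: {solutions[k]}")
```

A sharp jump in merge cost between consecutive stages suggests that two dissimilar clusters were forced together, which is one informal guide to choosing the number of clusters.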
188. 188. SPSS Windows: K-Means Clustering 1. Select ANALYZE from the SPSS menu bar. 2. Click CLASSIFY and then K-MEANS CLUSTER. 3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box. 4. For NUMBER OF CLUSTERS, select 3. 5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE. 6. Click OK.
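The K-means procedure requested above alternates between assigning each case to its nearest cluster center and recomputing the centers. A minimal sketch with three clusters follows; the nine data points and the choice of initial centers are hypothetical, and SPSS selects its own initial centers differently.

```python
# Hypothetical cases on two variables (assumed data).
points = [(1, 1), (1, 2), (2, 1),
          (5, 5), (5, 6), (6, 5),
          (9, 1), (9, 2), (8, 1)]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(pts):
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

def k_means(points, initial_centers, max_iter=100):
    centers = list(initial_centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster with the nearest center.
        labels = [min(range(len(centers)),
                      key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean_point(members) if members else centers[k])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return labels, centers

# Illustrative initial centers: one case taken from each apparent group.
initial = [points[0], points[3], points[6]]
labels, centers = k_means(points, initial)

print("cluster membership:", labels)
print("final cluster centers:", centers)
```

Because the result depends on the initial centers, it is worth rerunning K-means from several starting points and comparing the solutions.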
189. 189. SPSS Windows: Two-Step Clustering 1. Select ANALYZE from the SPSS menu bar. 2. Click CLASSIFY and then TWO-STEP CLUSTER. 3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box. 4. For DISTANCE MEASURE, select EUCLIDEAN. 5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY. 6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC). 7. Click OK.
190. 190. Multidimensional Scaling Multidimensional scaling: Measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.