
# SPSS Intro

Published on Jan 09, 2014

## Document Transcript

An Introduction Course in SPSS, by Mr. Liberato Camilleri
## 1. Overview of Data Analysis in SPSS

### 1.1 Data Entry

In SPSS, variables are defined in the Variable View window. Variables are generated by specifying a name for each one. Factors are declared by specifying a label and a value for each level of the categorical variable. As an illustration we provide the following case study.

In a study, two groups of respondents were picked at random. The experimental group suffered from cardiac problems; the control group was not known to suffer from heart problems. All the members of the two groups were known to make daily use of a treadmill. These 22 respondents were asked to fill in a questionnaire specifying their age, gender, weight and mean duration of daily treadmill use, measured in minutes. They were also asked to indicate whether or not they had cardiac problems.

- What is your age? _____ (years)
- Gender: _____
- What is your weight? _____ (kg)
- What is the mean duration of your daily use of a treadmill? _____ (minutes)
- Do you have cardiac problems? _____

Gender and cardiac condition are two factors (qualitative variables), each having two levels (categories). These levels have to be labelled and enumerated. For cardiac condition, the value 1 represents an unhealthy respondent with heart problems and the value 2 represents a healthy respondent. For gender, the value 1 represents a male respondent and the value 2 represents a female. No levels have to be specified for the variables age, weight and duration of daily treadmill use because they are covariates.

In SPSS the data are entered in the Data View window. Data files are presented in a rectangular arrangement where the rows represent the respondents and the columns represent the variables. A row contains the information elicited from a particular respondent for all the variables, and a column contains the information for a particular variable elicited from all the respondents.

A further task was to generate another factor by classifying the respondents' ages into three age categories. This can be done in SPSS using the Recode option, which recodes any age value into the appropriate age category and saves the result in the generated factor 'Age groups'.

### 1.2 Graphical Presentations

A histogram is an important graphical presentation which shows the distribution of values of a covariate (quantitative variable). The values are first divided into groups of equally spaced intervals, and the frequency (count) of cases in each interval is then plotted as a bar. A histogram can be created by choosing Graphs and Histogram from the menus. A normal curve with the same mean and variance as the data can be superimposed onto the histogram; it can be used to assess the symmetry of the distribution.

The distribution (histogram) of the mean duration of daily treadmill use is generated by moving the covariate 'duration' into the variable list and selecting Display Normal Curve. It is evident that a larger proportion of respondents use the treadmill in the range of 12 to 14 minutes daily, and that the distribution of the mean duration of daily treadmill use is fairly normal.

It is possible, in SPSS, to modify the minimum, maximum and increment of the scale values. To conduct these modifications, activate the chart editor by double-clicking on the graph, highlight the values on the axis and then select Scale from the Properties tab. It is also possible to modify the number of bars in the histogram and to change the style and colour of the inside of the histogram. To conduct this alteration, activate the chart editor by double-clicking on the graph, highlight the histogram and then select Fill and Border from the Properties tab.
The number of intervals or the interval widths is modified by selecting the Histogram option from the Properties tab. It is possible to generate two separate histograms of the mean duration of daily treadmill use for the healthy and sick groups. This is conducted by moving the factor 'cardiac' into Panel by rows. It is evident that healthy respondents use the treadmill for a longer period of time than sick respondents.
Pie charts are used to analyse factors (qualitative variables). In a pie chart the different levels of a factor are represented by the sectors of a circle, and the size of each slice is proportional to the size of its respective category. For example, a pie chart showing the percentage of respondents in the experimental and control groups can be created by selecting pie charts from the Graphs menu. Slices can represent either frequencies or percentages of cases. To define the slices, drag the categorical variable 'Cardiac' to Slice by. Pie chart properties can be modified by clicking the right mouse button, and the counts or percentages can be displayed on the pie chart by selecting data labels.

[Pie chart: Do you have cardiac problems? Yes 45.45%, No 54.55%]

Using this property window it is possible to separate the slices by selecting Explode chart.
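The slice percentages shown on the pie chart are just category counts scaled to 100%. A minimal sketch in plain Python (the counts, 10 'Yes' and 12 'No' out of 22 respondents, come from the case study's crosstab later in the text):

```python
from collections import Counter

# Cardiac condition for the 22 respondents: 1 = Yes (heart problems), 2 = No
cardiac = [1] * 10 + [2] * 12

def slice_percentages(values):
    """Return {category: percentage}, as shown on a pie chart's data labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 2) for cat, n in counts.items()}

print(slice_percentages(cardiac))  # {1: 45.45, 2: 54.55}
```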
[Pie chart with exploded slices: Do you have cardiac problems? Yes 45.45%, No 54.55%]

From this property window it is possible to change a pie chart into a bar chart, line chart or area chart. In a bar chart the frequency or percentage of each factor level is represented by a vertical bar; larger values are represented by longer bars. In a line chart each frequency or percentage is represented by a point, and these points are connected by straight lines. An area chart is a line chart with the space below the line filled in. Bar, line and area charts can also be created by selecting Bar, Line and Area from the Graphs menu. All the graphs show a higher proportion of respondents in the control group.

[Bar chart: Do you have cardiac problems? Yes 45.45%, No 54.55%]
[Area chart: Do you have cardiac problems? Yes 45.45%, No 54.55%]

One can also create pie, bar and area charts to display the proportion of respondents in the control and experimental groups separately for males and females. These charts can be generated by moving 'Cardiac' into the category axis and 'Gender' into the column panel.

[Pie charts panelled by gender (percent of all respondents): male: Yes 36.36%, No 22.73%; female: Yes 9.09%, No 31.82%]
[Clustered bar chart by gender (percent of all respondents): male: Yes 36.4%, No 22.7%; female: Yes 9.1%, No 31.8%]

All the graphs demonstrate a higher proportion of males than females in the experimental group, the respondents with cardiac problems. The graphs also demonstrate a higher proportion of females than males in the control group, the respondents who do not reveal any problems.

[Bar chart panelled by gender: male: Yes 36.4%, No 22.7%; female: Yes 9.1%, No 31.8%]
Another way of representing categories is with clustered charts. Clustered area charts can be generated by selecting Area and Stacked from the chart menu. In these graphs the areas for all the factor levels share the same baseline. In the following clustered area chart the category axis is defined by 'Cardiac' and the areas are defined by 'Gender'.

[Stacked area chart (percent within gender): male: Yes 61.5%, No 38.5%; female: Yes 22.2%, No 77.8%]

Clustered bar charts can be generated by selecting Bar and Clustered from the chart menu. In the following clustered bar chart the category axis is defined by 'Cardiac' and the clusters are defined by 'Gender'. Error bars can be displayed on bar charts from the Options menu. The bars display the 95% confidence interval and help the analyst visualize distributions and dispersion by indicating the variability of the measure being displayed.

[Clustered bar chart with error bars: Do you have cardiac problems? clustered by gender]

A bivariate scatter plot is used to analyse two covariates simultaneously and is plotted along two axes. This graphical presentation of the data points reveals important relationships between the covariates. It can also reveal outliers and unusual combinations of data points: points that do not fit a relationship well stand out in the plot. The procedure is to click on Graphs and Scatter/Dot… and select Simple Scatter. The axes of the scatter plot are defined by moving 'duration' onto the y-axis and the respondent's 'age' onto the x-axis.

The line of best fit can be obtained by double-clicking on the graph to activate the chart editor and selecting Add Fit Line at Total to produce the regression line. It is evident from the plot that young respondents are more likely to use the treadmill for a longer daily duration than elderly ones.

The data points can be clustered either by cardiac condition or by gender. These two graphs are produced by moving, in turn, the factors 'cardiac' and 'gender' into Set Markers by. Separate line fits for the two clusters can be obtained by double-clicking on the graph to activate the chart editor and then selecting Add Fit Line at Subgroups to produce separate regression lines.
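The fit line that SPSS adds to a scatter plot is an ordinary least-squares regression line. A minimal sketch of the underlying computation in plain Python (the age/duration sample below is made up for illustration; it is not the case-study data):

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: treadmill duration falling with age
age = [20, 30, 40, 50, 60]
duration = [18, 16, 14, 12, 10]

slope, intercept = least_squares(age, duration)
print(slope, intercept)  # -0.2 22.0
```

A negative slope, as here, corresponds to the pattern described in the text: daily treadmill duration decreases as age increases.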
The daily treadmill use is, on average, longer for healthy respondents than for sick ones, and this difference becomes more conspicuous with an increase in the respondents' age. The reduction in mean daily treadmill use as the respondents get older applies to both the experimental and the control group. The second scatter plot does not demonstrate any gender bias with regard to daily treadmill use: duration of treadmill use decreases with age for both male and female respondents.
It is also possible to produce scatter plots for three covariates simultaneously; however, it is very difficult to visualize the relationships between all three covariates unless the scatter plot is rotated about an axis. This is carried out by clicking on 3D Scatter and then defining the axes by respectively moving 'duration', 'age' and 'weight' onto the y, x and z-axes. In a 3D space we get a plane of best fit rather than a line.

Simple box plots, sometimes called box-and-whisker plots, characterize the distribution and dispersion of a covariate, displaying its median and quartiles across the levels of a factor. The median is the 50th percentile, and the interquartile range runs from the 25th to the 75th percentile. Whiskers at the ends of the box show the distance from the end of the box to the largest and smallest observed values that are less than 1.5 box lengths from either end of the box. Data points that fall outside this range are labelled as outliers or extreme values and their positions are identified.

Box plots are created by choosing Graphs and Boxplot… from the menus. In a simple box plot the selected variable must be a covariate and the category axis must be defined by a factor. This simple box plot demonstrates the distribution of respondents' weights for the experimental and control groups. The median weights for the two groups are respectively 91.5 kg and 73 kg. This implies that half the respondents in the experimental group weigh more than 91.5 kg and half the respondents in the control group weigh less than 73 kg. An interesting observation is that the lower quartile (25th percentile) of the experimental group and the upper quartile (75th percentile) of the control group are almost equal. This implies that 75% of the respondents in the experimental group weigh more than 75% of the respondents in the control group.

It is possible to generate two separate box plots showing the distribution of respondents' weights for males and females. This is conducted by moving the factor 'gender' into Panel by rows. An interesting observation is that male respondents weigh more than females in both the experimental and control groups. It is also evident that sick male respondents weigh considerably more than healthy ones, but this is not so evident for females. Three data points are marked as outliers because they lie between 1.5 and 3 box lengths from the end of the box; any data point which lies beyond 3 box lengths is marked with an asterisk.
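The 1.5-box-length rule for flagging outliers can be sketched as follows (plain Python; the weights are a small made-up sample, not the case-study data):

```python
import statistics

def box_plot_summary(data):
    """Quartiles plus outliers under the 1.5 * IQR (box-length) rule."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
    iqr = q3 - q1                                     # the "box length"
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return median, (q1, q3), outliers

# Hypothetical weights in kg; 120 lies far beyond the upper fence
weights = [62, 68, 70, 71, 73, 74, 75, 78, 80, 120]
median, (q1, q3), outliers = box_plot_summary(weights)
print(median, q1, q3, outliers)  # 73.5 69.5 78.5 [120]
```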
It is possible to combine the two plots into a single clustered box plot. Clustered box plots display the distribution of a covariate across two factors. In the subsequent clustered box plot the selected variable is the respondent's weight, whereas the category axis and the clusters are respectively defined by cardiac condition and gender. The plot exhibits the same contrasts displayed in the preceding plot.

### 1.3 Analyzing Multiple Responses

In the case study presented, the 22 respondents were further asked to indicate the type of food that they prefer eating, given four possible food categories. These food options were pasta, fish, meat and vegetables, and the respondents were allowed to select more than one option. The four food options have to be defined explicitly by four categorical variables, because each cell allows only one data entry. The first categorical variable indicates whether or not the respondent prefers pasta, and similarly for the other three options. For instance, the second respondent prefers pasta and fish, whereas the fourth respondent prefers pasta, meat and vegetables.
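Storing a pick-any-that-apply question as four yes/no indicator variables can be sketched like this (plain Python; the coding 1 = selected, 2 = not selected is an assumption for illustration, and only respondents 2 and 4 are taken from the text):

```python
# Each respondent's food preferences stored as four indicator variables,
# mirroring the four categorical columns in the SPSS data file.
# Assumed coding: 1 = selected, 2 = not selected.
FOODS = ("pasta", "fish", "meat", "vegetables")

def to_indicators(chosen):
    """Encode a set of chosen foods as four indicator codes."""
    return {food: 1 if food in chosen else 2 for food in FOODS}

respondent_2 = to_indicators({"pasta", "fish"})
respondent_4 = to_indicators({"pasta", "meat", "vegetables"})

print(respondent_2)  # {'pasta': 1, 'fish': 1, 'meat': 2, 'vegetables': 2}
print(respondent_4)  # {'pasta': 1, 'fish': 2, 'meat': 1, 'vegetables': 1}
```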
Multiple responses are analysed through the multiple response frequency procedure, which produces frequency tables for multiple response sets. To generate a single combined set of these four food categories, choose Multiple Responses from the menus. The new set of food categories is defined by moving pasta, fish, meat and vegetables into the new set, which is labelled 'Preferred Food'. The levels of this factor are defined by entering the range of categories from 1 to 4.

Crosstabs are very useful when analysing associations between factors. It is also possible to get cross-tabulations of any number of factors by choosing Multiple Responses and Crosstabs from the menus. To examine the association between the respondent's health and preferred food, one needs to specify which of these two factors is defined by the crosstab rows and which by the columns. In this example we define the levels of preferred food by the crosstab rows and the health categories by the crosstab columns. It is evident from the crosstab that respondents with cardiac problems are more likely to eat vegetables and fish, whereas healthy respondents are more likely to eat pasta and meat.
| Preferred Food | Cardiac: Yes | Cardiac: No | Total |
|---|---|---|---|
| Pasta | 4 | 9 | 13 |
| Fish | 8 | 7 | 15 |
| Meat | 3 | 9 | 12 |
| Vegetables | 9 | 2 | 11 |
| Total (respondents) | 10 | 12 | 22 |

An alternative method is to stack the entries of the four categorical variables pasta, fish, meat and vegetables to explicitly generate the new factor 'Preferred Food'. Stack also the entries of the factor 'Cardiac' four times to generate a new expanded factor, so that both factors have 88 entries. To obtain a crosstab, select Descriptive Statistics and Crosstabs from the menus. Since the numbers of respondents in the two health categories are unequal, it is advisable to produce column percentages in order to make correct associations between the preferred food categories and the two health groups. A clustered bar graph can also be produced to display these associations graphically.

| Preferred Food | Cardiac: Yes | Cardiac: No | Total |
|---|---|---|---|
| Pasta | 4 (16.7%) | 9 (33.3%) | 13 (25.5%) |
| Fish | 8 (33.3%) | 7 (25.9%) | 15 (29.4%) |
| Meat | 3 (12.5%) | 9 (33.3%) | 12 (23.5%) |
| Vegetables | 9 (37.5%) | 2 (7.4%) | 11 (21.6%) |
| Total (responses) | 24 (100.0%) | 27 (100.0%) | 51 (100.0%) |

For each preferred food category the bar lengths vary considerably between the two health groups, demonstrating graphically the association described above.
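The column percentages in the second crosstab are each response count divided by its column's response total. A minimal cross-check in plain Python, using the counts from the table above:

```python
# Response counts per food category for each health group (from the crosstab)
counts = {
    "pasta":      {"yes": 4, "no": 9},
    "fish":       {"yes": 8, "no": 7},
    "meat":       {"yes": 3, "no": 9},
    "vegetables": {"yes": 9, "no": 2},
}

def column_percentages(table):
    """Percentage of each column's responses that fall in each row category."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {food: {col: round(100 * n / col_totals[col], 1)
                   for col, n in row.items()}
            for food, row in table.items()}

pct = column_percentages(counts)
print(pct["vegetables"])  # {'yes': 37.5, 'no': 7.4}
```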
### 1.4 Methods for Describing Data Sets

Numerical descriptive measures are very useful for making inferences about the corresponding population measures. A number of numerical methods are available to describe quantitative data sets. These methods measure one of four data characteristics.

**1. Measures of central tendency (location).** Central tendency is the tendency of the data to cluster about a certain numerical value. The most popular measure of central tendency is the sample mean, which is simply the average of the $n$ observations $x_i$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The median is another measure of central tendency; it is the middle observation when all the observations are arranged in ascending order. The third measure of central tendency is the mode, the observation in the sample which occurs most frequently.

**2. Measures of dispersion (variability).** Dispersion is the extent to which the given data differ from the mean. The sample standard deviation, $s$, is the most popular measure of dispersion. It is the square root of the sample variance, given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

The range is another measure of dispersion; it is the difference between the largest and smallest observations. This is a rather crude, insensitive measure of dispersion and is hardly ever used.

**3. Measures of relative standing.** Measures of relative standing describe the placement of an observation relative to the rest of the data. One such measure is the percentile ranking: the observations are ranked from smallest to largest, and the $p$th percentile is the number such that $p\%$ of the observations fall below it. The lower quartile, median and upper quartile are respectively the 25th, 50th and 75th percentiles, and the interquartile range is the distance between the lower and upper quartiles. Percentile rankings are of practical value only for large data sets.

**4. Measures of the distribution of the data set.** The skewness characterizes the degree of asymmetry of a distribution around its mean. Negative skewness indicates a distribution which is skewed to the left; positive skewness indicates a distribution which is skewed to the right.
Many naturally occurring continuous variables, such as people's heights and examination marks, have a normal distribution, which is symmetric. This is the most widely used distribution in statistics. The kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. The skewness and kurtosis of the normal distribution are both zero. Negative kurtosis indicates a relatively flat distribution compared to the normal distribution, whereas positive kurtosis indicates a relatively peaked distribution.

The Frequencies procedure of SPSS provides the most important summary statistics. Some of these statistics require that the data follow a normal distribution (or at least that the shape of the variable's histogram be symmetric). In particular, the mean, standard deviation, variance and skewness should be used with caution unless the distribution is fairly symmetric and has no extreme outliers. A descriptive statistic is called robust if its calculation is insensitive to violations of the assumption of normality; this category includes the median, mode, minimum and maximum values, range and quartiles. It is necessary to use graphics, such as histograms with a normal curve, to determine whether the variables summarized have an approximately normal distribution.

|   |   |   |   |   |
|---|---|---|---|---|
| 9 | 43 | 56 | 68 | 84 |
| 12 | 44 | 56 | 68 | 84 |
| 15 | 45 | 57 | 70 | 86 |
| 21 | 47 | 58 | 73 | 87 |
| 24 | 47 | 58 | 73 | 88 |
| 26 | 49 | 63 | 74 | 88 |
| 31 | 52 | 64 | 77 | 90 |
| 31 | 52 | 64 | 79 | 93 |
| 38 | 54 | 65 | 80 | 95 |
| 39 | 56 | 67 | 82 | 96 |

The above table shows the marks obtained by 50 students in a Mathematics examination. The sample was chosen randomly from a large school population. From the menus select Descriptive Statistics and Frequencies, and move the variable of raw marks into the variables list. From the dialogue box select the statistics mean, median, mode, quartiles, standard deviation, variance, range, skewness and kurtosis to measure the central tendency, variability, symmetry and peakedness of the distribution of marks.
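Skewness and kurtosis can be computed from the data's central moments. A minimal sketch using the simple population-moment versions (note that SPSS applies small-sample adjustments, so its coefficients differ slightly for small n):

```python
def skewness(data):
    """Population skewness: third central moment over the sd cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Population excess kurtosis: fourth central moment over m2 squared, minus 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric))         # 0.0 for perfectly symmetric data
print(excess_kurtosis(symmetric))  # about -1.3: flatter than the normal curve
```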
| Statistic (marks) | Value |
|---|---|
| Mean | 59.56 |
| Median | 60.50 |
| Mode | 56 |
| Std. Deviation | 23.037 |
| Variance | 530.700 |
| Skewness | -.404 |
| Kurtosis | -.574 |
| Range | 87 |
| 25th percentile | 44.75 |
| 50th percentile | 60.50 |
| 75th percentile | 79.25 |

The three measures of central tendency indicate that the average mark is approximately 60. In a perfectly symmetric distribution the mean, median and mode would be equal; the fact that these three measures differ from each other indicates that the distribution is skewed, and the bigger the difference between them, the less symmetric the distribution. The marks range from 9 to 96, which explains why the standard deviation is large; had the marks been clustered closer to the mean, one would expect a smaller standard deviation. Both the skewness and the kurtosis are negative, indicating that the distribution of marks is skewed to the left and flatter than the normal distribution. This can be verified by plotting a histogram with the normal curve displayed. The lower and upper quartiles are respectively 44.75 and 79.25, implying that 25% of the students got a mark below about 45 and another 25% of the sample got a mark above about 79.
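Most of the table above can be reproduced with Python's standard `statistics` module, shown here as a cross-check. For this data, `statistics.quantiles` with its default "exclusive" method happens to match SPSS's percentile values:

```python
import statistics

marks = [9, 43, 56, 68, 84, 12, 44, 56, 68, 84, 15, 45, 57, 70, 86,
         21, 47, 58, 73, 87, 24, 47, 58, 73, 88, 26, 49, 63, 74, 88,
         31, 52, 64, 77, 90, 31, 52, 64, 79, 93, 38, 54, 65, 80, 95,
         39, 56, 67, 82, 96]

mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)
q1, q2, q3 = statistics.quantiles(marks, n=4)   # quartiles
spread = max(marks) - min(marks)                # range
sd = statistics.stdev(marks)                    # SPSS reports 23.037

print(mean, median, mode, (q1, q2, q3), spread)
# 59.56 60.5 56 (44.75, 60.5, 79.25) 87
```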
## 2. Hypothesis Testing

### 2.1 Aspects of Hypothesis Tests

Very often in practice we have to make decisions about a population on the basis of sample information. For example, we may wish to decide whether a new vaccine is really more effective in curing a disease than another, or whether one teaching procedure is better than another. In attempting to reach a decision, it is useful to make an assertion about the population involved. Such an assumption, called a hypothesis, may be true or false. This type of inference is called hypothesis testing.

**Null and alternative hypotheses.** In hypothesis testing the sole purpose of the null hypothesis is to nullify any difference between two procedures. In a null hypothesis, denoted by $H_0$, we hypothesize that there is no difference between two vaccines or no difference between two teaching procedures. Any hypothesis that differs from the null hypothesis is called an alternative hypothesis and is denoted by $H_1$. The alternative hypothesis is accepted when the null hypothesis is rejected, so it must always be formulated together with the null hypothesis. Tests of hypotheses can be either one-tailed or two-tailed. For example, if an analyst wants to test the assertion that a die is not giving the correct proportion of sixes, a two-tailed test should be used and the hypotheses will be

$$H_0: p = \tfrac{1}{6} \qquad H_1: p \neq \tfrac{1}{6}$$

The null hypothesis is rejected if the die gives either a significantly higher or a significantly lower proportion of sixes. However, if the analyst wants to test solely the assertion that the die is giving a higher proportion of sixes, a one-tailed test should be used and the hypotheses will be

$$H_0: p = \tfrac{1}{6} \qquad H_1: p > \tfrac{1}{6}$$

The null hypothesis is rejected only if the die gives a significantly higher proportion of sixes.

**Type I and type II errors.** If the null hypothesis is rejected when it should be accepted, a type I error has been made.
If, on the other hand, the null hypothesis is accepted when it should be rejected, a type II error has been made. In either case, a wrong decision or error in judgment has been made.

|                  | Accept $H_0$     | Reject $H_0$     |
|------------------|------------------|------------------|
| $H_0$ is true    | Correct decision | Type I error     |
| $H_0$ is false   | Type II error    | Correct decision |

For hypothesis tests to be good, they must be designed so as to minimize errors of decision. This is not a simple matter because, for any given sample size, an attempt to decrease one type of error will increase the other. The only way to reduce both types of error is to increase the sample size; if this is not possible, a compromise should be reached in favour of limiting the more serious error.

**Level of significance.** In testing a given hypothesis, the maximum probability with which we are willing to risk a type I error is called the level of significance. In practice a level of significance of 0.05 or 0.01 is normally used and is specified by the analyst before any samples are drawn. If a 0.05 level of significance is used, there is a 5% chance of rejecting the null hypothesis when it should be accepted (a type I error).

**Basic procedure for conducting tests of hypotheses**

- Devise an appropriate test statistic from the data in some specified way.
- Based on general statistical properties, and assuming the null hypothesis to be true, derive the sampling distribution of the test statistic.
- On the basis of this sampling distribution (under the null hypothesis), find the probability (p-value) that a value of the test statistic exceeds the value computed from the data.
- If the p-value exceeds the level of significance, the null hypothesis is accepted. However, if the p-value is less than the level of significance, the null hypothesis is rejected and the alternative hypothesis is accepted.

In SPSS only two-tailed hypothesis tests are conducted. The p-value in the SPSS output should be divided by 2 if a one-tailed test is carried out.

**Complications**

- The general statistical assumptions may still leave the statistical properties of the data not completely specified. For example, the variance may be unknown. Thus we may need to estimate some ancillary parameters.
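The decision step of the procedure above can be sketched as a small helper (plain Python; the function name and the example p-values are illustrative only):

```python
def decide(p_value, alpha=0.05, one_tailed=False):
    """Apply the p-value decision rule; halve a two-tailed p for one-tailed tests."""
    if one_tailed:
        p_value /= 2  # SPSS reports two-tailed p-values only
    return "reject H0" if p_value < alpha else "accept H0"

print(decide(0.0028))                 # reject H0
print(decide(0.20))                   # accept H0
print(decide(0.08, one_tailed=True))  # 0.04 < 0.05, so: reject H0
```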
We then need to allow for this estimation in the test procedure.

- We may not be certain about some of the statistical properties. For example, the form of the statistical distribution may not be normal, or the observations may not be independent. Thus some checks on these assumptions have to be carried out.
### 2.2 Tests of Normality

There are mainly two types of tests: parametric and non-parametric. Parametric tests, such as the One-Sample t-test, the Paired-Samples t-test, the Two Independent-Samples t-test and One-Way ANOVA, require specific assumptions about the population sampled. In many cases we must assume that the population has roughly the shape of a normal distribution and that the variances are known. Since there are many situations where it is doubtful whether all the necessary assumptions can be met, statisticians have developed alternative procedures based on less stringent assumptions, known as non-parametric tests. These non-parametric tests are used when the data entries are ranks or when the data do not satisfy the normality condition.

There are basically four ways of checking whether the underlying distribution of a covariate is normal:

- Produce a histogram of the covariate with a normal curve displayed, and assess whether the normal curve fits the distribution well.
- Calculate the coefficients of skewness and kurtosis and check that they are both close to 0.
- Display a P-P plot or a Q-Q plot to determine whether the distribution of the covariate matches the normal distribution. If it does, the points should cluster around a straight line.
- Perform a Kolmogorov-Smirnov test.

The following motivating example illustrates these procedures. The table below shows the price, in dollars, of ordinary shares issued by a commercial body for 30 consecutive weeks.

|      |      |      |      |      |      |      |      |      |      |
|------|------|------|------|------|------|------|------|------|------|
| 16.8 | 13.5 | 18.8 | 15.7 | 14.5 | 17.9 | 13.3 | 19.8 | 14.6 | 13.4 |
| 17.5 | 15.0 | 13.7 | 16.4 | 15.1 | 15.8 | 13.0 | 14.3 | 14.5 | 16.5 |
| 16.8 | 15.9 | 14.3 | 17.9 | 16.8 | 12.4 | 19.2 | 11.3 | 11.8 | 15.7 |
| Statistic (Price) | Value |
|---|---|
| Coefficient of Skewness | .139 |
| Coefficient of Kurtosis | -.505 |

The histogram, the P-P and Q-Q plots, and the coefficients of skewness and kurtosis all complement each other, and all indicate that the price distribution of the ordinary shares is fairly normal.

The Kolmogorov-Smirnov test is used to check the underlying distribution of a covariate. It can be used to match this underlying distribution against many standard distributions, such as the Normal, Poisson, Exponential or Uniform distribution. The test is intended for continuous data; however, it can be used for discrete data if the observed values are positive integers. The Kolmogorov-Smirnov test in this illustration is used solely to check the normality of the price distribution of the shares, and the null and alternative hypotheses are:

- $H_0$: the data have a normal distribution
- $H_1$: the data have a non-normal distribution
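The statistic behind this test is the largest gap between the sample's empirical CDF and the CDF of a normal distribution fitted to the data. A sketch of that computation with Python's standard library, using the 30 share prices (`statistics.NormalDist` supplies the normal CDF; SPSS reports the gap as "Most Extreme Differences: Absolute"):

```python
from statistics import NormalDist, mean, stdev

prices = [16.8, 13.5, 18.8, 15.7, 14.5, 17.9, 13.3, 19.8, 14.6, 13.4,
          17.5, 15.0, 13.7, 16.4, 15.1, 15.8, 13.0, 14.3, 14.5, 16.5,
          16.8, 15.9, 14.3, 17.9, 16.8, 12.4, 19.2, 11.3, 11.8, 15.7]

def ks_statistic(data):
    """Largest absolute gap between the empirical CDF and a fitted normal CDF."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    xs = sorted(data)
    # Compare the normal CDF with the empirical CDF just after and just before
    # each order statistic
    d_plus = max(i / n - fitted.cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(fitted.cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)

print(round(ks_statistic(prices), 3))  # SPSS reports .079 for these data
```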
To conduct this test, select Nonparametric Tests and 1-Sample K-S from the menus. Move the covariate 'Price' into the variable list and specify that the test distribution is Normal.

| One-Sample Kolmogorov-Smirnov Test | Price |
|---|---|
| N | 30 |
| Normal Parameters: Mean | 15.407 |
| Normal Parameters: Std. Deviation | 2.1601 |
| Most Extreme Differences: Absolute | .079 |
| Most Extreme Differences: Positive | .079 |
| Most Extreme Differences: Negative | -.054 |
| Kolmogorov-Smirnov Z | .432 |
| Asymp. Sig. (2-tailed) | .992 |

A large p-value indicates that the sample is consistent with a normal distribution; the evidence for normality becomes weaker as the p-value gets closer to 0. In this example the p-value is 0.992, well above the level of significance (0.05), so the Kolmogorov-Smirnov test gives no reason to reject the null hypothesis that the price distribution of the shares is normal.

### 2.3 One-Sample t-test

The One-Sample t-test is a parametric test which compares the mean of a sample with a specified population mean. For this test the observations are assumed to be independent and normally distributed, and the test statistic $t$ is given by

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the specified population mean and $s_{\bar{x}} = s/\sqrt{n}$, with $n$ the sample size and $s$ the sample standard deviation. The test statistic $t$ has a t-distribution with $n-1$ degrees of freedom. The t-distribution resembles the normal distribution but has heavier tails. The Binomial test, a non-parametric analogue of the One-Sample t-test, can be used if the observations are not found to be normally distributed. The following example illustrates the procedure.
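The statistic just defined can be sketched in plain Python (the three-value sample and hypothesized mean are made up for illustration; computing the p-value itself needs the t-distribution CDF, which is not in the standard library):

```python
import math
import statistics

def one_sample_t(data, mu):
    """t = (sample mean - mu) / (s / sqrt(n))."""
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    return (statistics.mean(data) - mu) / se

# Tiny hypothetical sample tested against a hypothesized mean of 12
print(round(one_sample_t([12, 14, 16], 12), 3))  # 1.732
```

The resulting value would then be compared with the t-distribution on n - 1 degrees of freedom, which is exactly what SPSS does in the example that follows.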
A company producing cigars claims that the mean tar per cigar is 14.1 mg. A random sample of 36 cigars was selected to check whether the tar content was significantly larger than the specified value.

|      |      |      |      |      |      |
|------|------|------|------|------|------|
| 14.5 | 14.4 | 14.3 | 16.2 | 15.6 | 15.8 |
| 14.4 | 13.9 | 16.4 | 15.8 | 15.6 | 16.6 |
| 13.1 | 14.4 | 17.1 | 12.9 | 16.4 | 13.5 |
| 17.3 | 17.9 | 15.8 | 15.5 | 15.0 | 14.7 |
| 16.2 | 14.8 | 16.0 | 14.9 | 13.6 | 13.4 |
| 13.9 | 16.1 | 15.8 | 15.0 | 15.2 | 16.7 |

The null and alternative hypotheses for this one-tailed test are:

- $H_0$: the actual mean tar content is 14.1 mg
- $H_1$: the actual mean tar content is greater than 14.1 mg

The One-Sample t-test is used on the merit that the observations were found to be normally distributed. To carry out this test, select Compare Means and One-Sample T Test from the menus. Move the covariate 'Tar', containing the tar content of the cigars, into the variable list and specify the test value (the population mean) to be 14.1.

| One-Sample Statistics (Tar) | Value |
|---|---|
| N | 36 |
| Mean | 15.2417 |
| Std. Deviation | 1.23297 |
| Std. Error Mean | .20549 |

This output presents the sample size $n$, the sample mean, the sample standard deviation $s$ and the standard error $s_{\bar{x}}$.

| One-Sample Test (Test Value = 14.1) | Value |
|---|---|
| t | 5.556 |
| df | 35 |
| Sig. (2-tailed) | .000 |
| Mean Difference | 1.1417 |
| 95% CI of the Difference: Lower | .7245 |
| 95% CI of the Difference: Upper | 1.5588 |

In the second output, the test value is the specified population mean $\mu$ and the mean difference is $\bar{x} - \mu$. The null hypothesis specifies that $\bar{x}$ should not be significantly greater than $\mu$; if $\bar{x}$ is significantly bigger than $\mu$, $H_0$ is rejected and $H_1$ accepted. SPSS calculates the value of the test statistic $t$, which is simply the ratio of the mean difference to the standard error. The p-value is the area under the t-distribution beyond $t = 5.556$; the bigger the t-statistic, the smaller the p-value. Since a one-tailed test is conducted, the p-value should be halved. The corrected p-value, which is approximately 0, is smaller than the level of significance (0.05), so $H_0$ is rejected. This implies that the difference between the sample mean (15.24) and the specified population mean (14.1) is significant and cannot be attributed to chance. The confidence interval specifies, with 95% confidence, the range in which the actual mean difference lies; if this confidence interval excludes 0, the null hypothesis should be rejected.

### 2.4 The Binomial (Sign) Test

The Binomial test is used to test the hypothesis that the median of a population is equal to a specified value. We assume that the population sampled is continuous and that the median exists. The Binomial test is a non-parametric test and is used when the population does not have a normal distribution; it is a non-parametric analogue of the One-Sample t-test.

Suppose that we want to test the null hypothesis that the median is equal to a specified value $k$ against the alternative hypothesis that the median is less than $k$. To conduct the Binomial test, replace each sample value greater than $k$ by a '+' sign and each sample value less than $k$ by a '-' sign, and discard any sample value equal to $k$. Let $n$ be the sample size and let $X$ be the random variable indicating the number of '+' signs. If the null hypothesis is correct, the number $r$ of '+' signs has a binomial distribution with $p = 0.5$. The p-value is the probability that the number of '+' signs is less than or equal to $r$, and is given by:

$$P(X \le r) = \sum_{x=0}^{r} \binom{n}{x} (0.5)^x (0.5)^{n-x}$$

The following example illustrates the procedure.
An analyst collected a random sample of 42 measurements of the octane rating of a specific brand of gasoline to test whether the median octane rating is 100 against the alternative hypothesis that this median rating is less than 100, at the 0.05 level of significance.

95 96 95 99 101 93 98 95 96 96 95 94 103 101
95 105 95 95 102 95 97 98 112 96 97 96 95 95
103 95 95 107 98 96 95 97 108 99 96 101 116 97

The null and alternative hypotheses of this one-tailed test are:

H0: The median octane rating is equal to 100 units
H1: The median octane rating is less than 100 units

The histogram showing the distribution of octane ratings is skewed to the right. The Kolmogorov-Smirnov test indicates that this distribution is not normal since the p-value (0.025) is less than the level of significance. So the Binomial test is used, instead of the One-Sample t-test, to determine whether the median octane rating is equal to 100 units.
To conduct this test choose Analyze from the bar menu, select Nonparametric Tests and click on Binomial. Move the set of octane ratings into the test variable list and click on Cut point to specify the hypothesised median octane rating.

Binomial Test
OctaneRating   Category           N    Observed Prop.   Test Prop.   Asymp. Sig. (2-tailed)
  Group 1      At most 100        31   .74              .50          .0028
  Group 2      Greater than 100   11   .26
  Total                           42   1.00

There are 11 observations which exceed an octane rating of 100 units. This accounts for 26% of all the observations, which is much less than the expected 50%. The p-value is equal to

P(X ≤ 11) = Σ_{x=0}^{11} 42Cx (0.5)^x (0.5)^(42−x) = 0.0014

This is half the p-value given by the SPSS output, since a one-tailed test is being conducted. The null hypothesis is rejected since the p-value is less than the level of significance. So we conclude that there is sufficient evidence that the median octane rating is significantly less than 100 units.
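The sign-test p-value for the octane data can be verified with a short computation. The sketch below is purely illustrative (this document works in SPSS; the code only mirrors the binomial tail formula given above):

```python
from math import comb

# Octane ratings from the worked example (n = 42)
ratings = [95, 96, 95, 99, 101, 93, 98, 95, 96, 96, 95, 94, 103, 101,
           95, 105, 95, 95, 102, 95, 97, 98, 112, 96, 97, 96, 95, 95,
           103, 95, 95, 107, 98, 96, 95, 97, 108, 99, 96, 101, 116, 97]

k = 100                                   # hypothesised median
plus = sum(1 for x in ratings if x > k)   # number of '+' signs
n = sum(1 for x in ratings if x != k)     # values equal to k are discarded

# One-tailed p-value: P(X <= r) for X ~ Binomial(n, 0.5)
p_value = sum(comb(n, x) * 0.5 ** n for x in range(plus + 1))

print(plus, n, round(p_value, 4))
```

This recovers 11 observations above the cut point out of 42 and a one-tailed p-value agreeing with the 0.0014 quoted above.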
2.5 Paired samples t-test

The Paired samples t-test is a parametric test which compares the means of two paired samples to make an inference on their population means. Very often the design for this test involves measuring each subject twice: before and after some kind of treatment or intervention. Let x11, x12, ..., x1n be the n observations before the treatment and x21, x22, ..., x2n be the observations after the treatment. The test statistic used to test the null hypothesis that there is no difference between the two population means is

t = d̄ / s_d̄

where d̄ = (1/n) Σ_{i=1}^{n} (x1i − x2i) is the mean of the signed differences of the paired data and s_d̄ = s/√n is the standard error of the mean, s being the sample standard deviation of the differences. The t-statistic has a t-distribution with n − 1 degrees of freedom. The following example illustrates the procedure.

The following are the average weekly losses, in man-hours, due to accidents in 30 industrial plants before and after a certain safety program was put into operation. We want to check whether the program is significantly reducing the mean weekly losses due to accidents. A parametric test was used because both sets of readings had a normal distribution.

Before the safety program / After the safety program:
45 55 86 36 51 89 73 59 61 60 50 65 46 43 39 44 48 43 124 28
49 119 31 42 33 63 55 35 61 56 57 98 112 51 85 103 83 84 36 77
80 25 34 46 52 29 44 48 26 28 41 24 25 38 17 34 50 11 36 53

The null and alternative hypotheses for this one-tailed test are:

H0: The program is not effective (mean weekly losses are unaltered)
H1: The program has significantly reduced the mean weekly losses

Enter all the observations of one sample in the cells of the first column and the observations of the second sample in the cells of the second column. These columns define the 'before' and 'after' average weekly losses in man-hours due to accidents. Choose Analyze from the bar menu, select Compare Means and click on Paired-Samples T Test.
Move the variables ‘before’ and ‘after’ simultaneously to the paired variables box to run the procedure.
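The paired t computation itself is straightforward to reproduce by hand. A minimal sketch, using small hypothetical before/after figures (not the table above) purely for illustration:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data: weekly losses before and after an intervention
before = [45, 57, 86, 73, 61, 50, 46, 39, 48]
after = [36, 51, 59, 60, 43, 44, 43, 28, 31]

d = [b - a for b, a in zip(before, after)]   # signed paired differences
n = len(d)
d_bar = mean(d)                              # mean paired difference
se = stdev(d) / sqrt(n)                      # standard error of the mean
t = d_bar / se                               # t-statistic with n - 1 df
print(round(d_bar, 3), round(t, 3))
```

The same two lines of arithmetic (mean difference divided by its standard error) underlie the SPSS output that follows.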
Paired Samples Statistics
Safety Program   Mean    N    Std. Deviation   Std. Error Mean
  Before         55.23   30   25.788           4.708
  After          51.97   30   24.583           4.488

Paired Samples Test
Paired Differences (Before − After)
Mean    Std. Deviation   Std. Error Mean   t       df   Sig. (2-tailed)
3.267   5.132            .937              3.486   29   .002

The average of the paired differences is 3.267. Our null hypothesis says that there is no difference in the average weekly losses in man-hours due to accidents before and after the program; in other words, we are testing whether this mean paired difference, 3.267, is significantly different from zero. SPSS calculates the value of the t-statistic, which is simply the ratio of the mean paired difference and its standard error: t = 3.267/0.937 = 3.486. Since a one-tailed test is used, the p-value should be halved. H0 is rejected and H1 is accepted because the p-value (0.001) is less than the level of significance (0.05). This implies that the program is effective in reducing the weekly losses in man-hours due to accidents. The probability that this assertion is wrong is 0.001, which is very small.

2.6 Friedman test

The Friedman test is a non-parametric analogue of the Paired samples t-test. It is used to determine whether c related samples have equal medians. The Friedman test is a generalization of the Wilcoxon signed ranks test because the latter can only accommodate two related samples.

               Item 1   Item 2   ...   Item c
Respondent 1   R11      R12      ...   R1c
Respondent 2   R21      R22      ...   R2c
...            ...      ...      ...   ...
Respondent r   Rr1      Rr2      ...   Rrc
Total          T1       T2       ...   Tc

Suppose that the data comprise the responses to c related items elicited by r respondents and let xi1, xi2, ..., xic be the c responses elicited by the ith respondent. It is assumed that the responses of the r respondents are mutually independent and no comparison between them needs to be made. To conduct a Friedman test the c responses elicited by each respondent must be ranked from 1 to c. Let Rij be the rank assigned to xij. This test is also valid if there are many ties in the rankings. The Friedman test is based on the test statistic:

F = [12 / (rc(c + 1))] Σ_{j=1}^{c} Tj² − 3r(c + 1),   where Tj = Σ_{i=1}^{r} Rij

The sampling distribution of F has approximately a chi-square distribution with c − 1 degrees of freedom. The following example illustrates the methodology.

In a wine competition a panel of twenty tasters was asked to rank four wines A, B, C and D for quality, where 1 corresponds to worst and 4 corresponds to best.

Wine A   Wine B   Wine C   Wine D
  1        4        2        3
  2        3        4        1
  3        2        4        1
  1        4        3        2
  2        3        4        1
  1        3        4        2
  2        4        1        3
  1        3        4        2
  4        2        3        1
  1        4        2        3
  2        1        4        3
  1        3        4        2
  1        2        4        3
  1        4        3        2
  2        3        4        1
  1        2        3        4
  3        4        2        1
  2        3        4        1
  3        4        2        1
  1        3        4        2

The null and alternative hypotheses are:

H0: All wineries produce wine of the same quality
H1: Some wineries produce better quality wine than others

Non-parametric tests are actually tests for rank data, so there is no need to conduct the Kolmogorov-Smirnov test when the responses are ranks instead of scores. To conduct this test choose Analyze from the bar menu, select Nonparametric Tests and click on K Related Samples. Move the four variables containing the wine quality ranks into the variable list; click on Statistics and select Descriptive.

Descriptive Statistics
        N    Mean   Std. Deviation
WineA   20   1.75   .910
WineB   20   3.05   .887
WineC   20   3.25   .967
WineD   20   1.95   .945

Test Statistics (Friedman Test)
N             20
Chi-Square    20.760
df            3
Asymp. Sig.   .000

The descriptive table shows that wines B and C are rated better in quality than wines A and D. To generalize this result the Friedman test is conducted using a 0.01 level of significance.
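The Friedman statistic can be computed directly from the rank table. A minimal sketch (illustrative only; the analysis in this document is done in SPSS):

```python
# Ranks elicited by the 20 tasters for wines A, B, C, D (one row per taster)
ranks = [
    [1, 4, 2, 3], [2, 3, 4, 1], [3, 2, 4, 1], [1, 4, 3, 2], [2, 3, 4, 1],
    [1, 3, 4, 2], [2, 4, 1, 3], [1, 3, 4, 2], [4, 2, 3, 1], [1, 4, 2, 3],
    [2, 1, 4, 3], [1, 3, 4, 2], [1, 2, 4, 3], [1, 4, 3, 2], [2, 3, 4, 1],
    [1, 2, 3, 4], [3, 4, 2, 1], [2, 3, 4, 1], [3, 4, 2, 1], [1, 3, 4, 2],
]
r, c = len(ranks), len(ranks[0])                       # 20 tasters, 4 wines
T = [sum(row[j] for row in ranks) for j in range(c)]   # rank total per wine

# Friedman statistic: F = 12/(r c (c+1)) * sum(Tj^2) - 3 r (c+1)
F = 12 / (r * c * (c + 1)) * sum(t * t for t in T) - 3 * r * (c + 1)
print(T, round(F, 2))
```

This reproduces the rank totals 35, 61, 65, 39 and the chi-square value 20.76 reported by SPSS.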
The data comprise the responses of r = 20 tasters for c = 4 different wines and the responses Rij are ranks from 1 to 4. The ranks elicited for each wine have to be entered in separate columns in SPSS because the samples are related.

T1 = 35, T2 = 61, T3 = 65, T4 = 39 and F = 20.76

The p-value is the area under the chi-square distribution beyond F = 20.76. Since the p-value is less than the level of significance, the alternative hypothesis is accepted. This implies that wines B and C are significantly better quality wines than A and D. This assertion can be generalized because it is not attributed to chance: if a larger panel of tasters were employed for the task they would still come up with the same conclusions.

2.7 The One-Way ANOVA test

The One-Way ANOVA test is a parametric test which compares the means of several independent samples to make inferences on their population means. The Two independent samples t-test is a special case of the One-Way ANOVA test because it can only be used to compare the population means of two independent groups.

Consider the observations of a independent groups having m1, m2, ..., ma replications respectively.

Group 1:   x11   x12   ...   x1m1    (total T1•)
Group 2:   x21   x22   ...   x2m2    (total T2•)
...
Group a:   xa1   xa2   ...   xama    (total Ta•)

xjk is the kth replicate observation of the jth group, Tj• is the sum of the observations of the jth group, T is the sum of all observations and m is the total number of observations.

V = Σ_{j,k} xjk² − T²/m is the total variation; V_B = Σ_j Tj•²/mj − T²/m is the variation between the groups; and V_W = V − V_B is the variation within the groups.

S_B² = V_B/(a − 1) and S_W² = V_W/(m − a) are respectively the between-group and within-group mean squares, and the test statistic F is the ratio of these two mean squares.

Variation   Degrees of freedom   Mean square   Statistic
V_B         a − 1                S_B²          F = S_B² / S_W²
V_W         m − a                S_W²
V           m − 1

The test statistic F has an F-distribution with a − 1 and m − a degrees of freedom. The calculations required for the above test are summarized in the table above, which is called an analysis-of-variance table. The Mann-Whitney test is a non-parametric analogue of the Two independent samples t-test and the Kruskal-Wallis test is a non-parametric analogue of the One-Way ANOVA test. Both tests can be used when the data sets do not have a normal distribution. The following example illustrates the procedure.

The table shows the yields per plot of four different plant crops grown on lots treated with three different types of fertilizer. The mean crop yields are compared by fertilizer and are used to determine whether some fertilizers give better yields than others.

         Fertilizer A        Fertilizer B        Fertilizer C
Crop 1   4.5 4.8 4.6 4.9     8.8 8.9 8.1 9.2     5.9 5.8 5.5 7.4
Crop 2   6.4 6.3 6.6 5.9     7.8 7.7 7.4 7.9     7.8 7.7 7.5 6.6
Crop 3   7.2 7.8 7.6 7.4     9.6 9.8 9.9 9.3     5.7 5.4 5.5 5.7
Crop 4   6.7 6.8 6.4 6.9     7.0 7.4 7.5 7.2     5.2 4.9 4.8 5.3

The null and alternative hypotheses for this test are:

H0: All fertilizers give the same mean crop yield
H1: Some fertilizers give better crop yields than others

The One-Way ANOVA test was used on the merit that the three sets of data had a normal distribution. The covariate 'Yield' contains the yields obtained with the three fertilizers. The yields are categorized by the factor 'Fertilizer', which specifies the type of fertilizer applied to each plot. To carry out this test select Compare Means and One-Way ANOVA from the menus. Define 'Yield' as the dependent variable and 'Fertilizer' as the factor. Click on Options… and select Descriptive and Means plot. Finally click on Continue and OK to run the procedure.
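The F statistic that SPSS reports can be reproduced from the raw yields with the sums-of-squares formulas above. A minimal sketch (illustrative only, not the SPSS procedure):

```python
# Yields per plot grouped by fertilizer (16 plots each), from the table above
yields = {
    "A": [4.5, 4.8, 4.6, 4.9, 6.4, 6.3, 6.6, 5.9,
          7.2, 7.8, 7.6, 7.4, 6.7, 6.8, 6.4, 6.9],
    "B": [8.8, 8.9, 8.1, 9.2, 7.8, 7.7, 7.4, 7.9,
          9.6, 9.8, 9.9, 9.3, 7.0, 7.4, 7.5, 7.2],
    "C": [5.9, 5.8, 5.5, 7.4, 7.8, 7.7, 7.5, 6.6,
          5.7, 5.4, 5.5, 5.7, 5.2, 4.9, 4.8, 5.3],
}
groups = list(yields.values())
a = len(groups)                        # number of groups
m = sum(len(g) for g in groups)        # total number of observations
T = sum(sum(g) for g in groups)        # grand total

ss_total = sum(x * x for g in groups for x in g) - T * T / m       # V
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - T * T / m  # V_B
ss_within = ss_total - ss_between                                   # V_W

F = (ss_between / (a - 1)) / (ss_within / (m - a))
print(round(ss_between, 3), round(ss_within, 3), round(F, 3))
```

This recovers the between-group sum of squares 50.840, within-group sum of squares 47.739 and F = 23.962 shown in the ANOVA table that follows.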
Descriptives: Yield
    N    Mean    Std. Deviation   Std. Error   95% CI for Mean (Lower, Upper)
A   16   6.300   1.0752           .2688        (5.727, 6.873)
B   16   8.344   .9953            .2488        (7.813, 8.874)
C   16   6.044   1.0178           .2545        (5.501, 6.586)

The mean crop yields for fertilizers A, B and C are respectively 6.30, 8.34 and 6.04.

ANOVA: Yield
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   50.840           2    25.420        23.962   .000
Within Groups    47.739           45   1.061
Total            98.579           47

SPSS calculates the value of the test statistic F using the between-group and within-group mean squares. The p-value is the area under the F-distribution beyond F = 23.962; a large value of the F statistic implies a small p-value. Since the p-value, which is approximately 0, is smaller than the level of significance, H0 is rejected. This implies that some fertilizers give a better mean crop yield than others. The mean crop yield for fertilizer B is significantly bigger than those of the other fertilizers: the lower bound of the 95% confidence interval for the actual mean crop yield of fertilizer B exceeds the 95% confidence upper bounds of the other two fertilizers. This complements the result of the hypothesis test. An assumption for the analysis of variance is that the population variances for these three fertilizers are equal, so the Levene homogeneity of variance test is requested to check this assumption.
Test of Homogeneity of Variances: Yield
Levene Statistic   df1   df2   Sig.
.029               2     45    .972

The Levene statistic is 0.029 with a p-value equal to 0.972. Thus the null hypothesis of equal variances across the fertilizer categories is accepted at a 0.05 level of significance.

To identify which fertilizers are significantly different from others a post hoc test is essential. SPSS provides several multiple comparison procedures. Two frequently used pairwise methods are Bonferroni and Tukey, which both require the assumption that the group variances are equal. When the number of comparisons is large, the Tukey procedure may be more sensitive in detecting differences; when the number of comparisons is small, the Bonferroni method may be more sensitive.

2.8 Kruskal-Wallis test

The Kruskal-Wallis test is a non-parametric analogue of the One-Way ANOVA test. It is used to test whether k independent samples come from identical populations. In particular we test the null hypothesis that the population medians are equal against the alternative hypothesis that some populations have significantly different medians. The Kruskal-Wallis test is a generalization of the Mann-Whitney test because the latter can only be applied to two independent samples.

Suppose that the data consist of k random samples of possibly different sizes. Let xi1, xi2, ..., xini denote the ith random sample of size ni and let N denote the total number of observations. We assume that all samples are random and, in addition to independence within each sample, we assume that there is mutual independence among the various samples.

Sample 1   Sample 2   ...   Sample k
X11        X21        ...   Xk1
X12        X22        ...   Xk2
...        ...        ...   ...
X1n1       X2n2       ...   Xknk

To conduct a Kruskal-Wallis test the samples have to be ranked jointly from low to high as though they constitute a single sample. Let Ri be the sum of the ranks assigned to the ni observations of the ith sample.
If there are no ties, or the number of ties is moderate, we base this test on the statistic

H = [12 / (N(N + 1))] Σ_{i=1}^{k} (Ri² / ni) − 3(N + 1)
• The sampling distribution of H is approximately a chi-square distribution with k-1 degrees of freedom provided that n1 , n 2 ,..., n k are relatively large. In the case there are too many ties Ti among the observations in the sample data, the value of H is smaller than it should be. The corrected value H c is obtained by dividing the value of H by the correction factor. N 1 1− ∑ (Ti 3 − Ti ) 2 N ( N − 1) i =1 The following example illustrates the methodology. The ministry of roads and transport wanted to conduct a survey to determine whether the average number of daily traffic accidents in a particular town differed by season. The table shows the number of traffic accidents in randomly selected days in spring, summer, autumn and winter. Spring Summer Autumn Winter 2 2 2 2 1 1 1 4 1 5 2 2 2 2 2 1 1 1 3 5 4 2 3 1 3 1 4 6 3 1 3 5 8 1 8 1 5 1 2 6 9 1 9 2 7 1 4 1 7 4 2 1 10 12 11 13 6 8 4 9 3 2 2 2 5 6 5 1 2 1 1 1 2 2 6 2 1 2 2 1 3 1 7 14 2 3 3 2 2 4 3 3 1 4 7 1 1 2 2 4 4 1 1 2 2 3 2 2 2 2 2 1 4 2 15 2 3 1 4 1 2 1 2 1 5 1 5 3 1 2 2 3 3 2 2 2 1 3 The null and alternative hypotheses are: H 0 : The average number of daily traffic accidents is the same in all the four seasons. H1 : In some seasons the incidence of traffic accidents is higher than others. Histograms showing the distribution of daily traffic accidents for each season are skewed to the right. The Kolmogorov Smirnov test indicates that all four distributions are non-normal since the p-values (0.018, 0.032, 0.040 and 0.033) are all less than the level of significance. So the Kruskal Wallis test is used, instead of the One-Way ANOVA test, to determine whether the incidence of daily traffic accidents differs by season. To conduct this test choose Analyze from the bar menu, select Non-parametric tests and click on K Independent Samples. Move ‘Frequency’ into the test variable list and move ‘Season’ into the grouping variable. Click on Define groups and define the four seasons by values from 1 to 4. L.Camilleri 35
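The uncorrected H statistic can be computed from the rank sums reported in the worked solution. A minimal sketch (the rank sums are taken as given, since the joint ranking with ties is done by SPSS):

```python
# Rank sums and sample sizes from the worked solution:
# spring, summer, autumn, winter
R = [2196.0, 3020.5, 2348.5, 2026.0]
n = [34, 35, 36, 33]
N = sum(n)

# Kruskal-Wallis statistic: H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N+1)
H = 12 / (N * (N + 1)) * sum(r * r / m for r, m in zip(R, n)) - 3 * (N + 1)
print(round(H, 3))
```

This reproduces H = 8.459; the slightly larger chi-square value 8.926 reported by SPSS is the tie-corrected value Hc.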
Descriptives: Frequency
         N    Mean   Std. Deviation
Spring   34   2.79   2.129
Summer   35   4.83   4.105
Autumn   36   2.81   2.095
Winter   33   2.45   1.583

Test Statistics (Kruskal-Wallis): Frequency
Chi-Square    8.926
df            3
Asymp. Sig.   .030

The descriptive table shows a larger incidence of daily traffic accidents in summer. To generalize this result a Kruskal-Wallis test is conducted using a 0.05 level of significance.

n1 = 34, n2 = 35, n3 = 36, n4 = 33 and N = Σ_{i=1}^{4} ni = 138
R1 = 2196, R2 = 3020.5, R3 = 2348.5, R4 = 2026 and H = 8.459

The chi-square value provided by SPSS is the corrected value Hc = 8.926 and the p-value is the area under the chi-square distribution beyond Hc = 8.926. Since the p-value is less than the level of significance, the alternative hypothesis is accepted. This implies that the average number of daily traffic accidents in summer is significantly larger than in the other seasons and this is not attributed to chance.

Multiple Comparisons. Dependent Variable: Yield. Tukey HSD
(I) Fertilizer   (J) Fertilizer   Mean Difference (I-J)   Std. Error   Sig.   95% CI (Lower, Upper)
A                B                -2.0438*                .3642        .000   (-2.926, -1.161)
                 C                .2563                   .3642        .763   (-.626, 1.139)
B                A                2.0438*                 .3642        .000   (1.161, 2.926)
                 C                2.3000*                 .3642        .000   (1.417, 3.183)
C                A                -.2563                  .3642        .763   (-1.139, .626)
                 B                -2.3000*                .3642        .000   (-3.183, -1.417)
*. The mean difference is significant at the .05 level.
As expected, the mean crop yield of fertilizer B differs significantly from the mean yields of fertilizers A and C; however the mean yields of fertilizers A and C are not significantly different.

2.9 Two-Way ANOVA with interaction

In the previous sections we discussed a method for testing whether the mean responses differ significantly across the levels of a single factor. A step further is to compare the mean responses across the levels of two factors, assuming that the two factors are independent. This is essential if an analyst wants to assess whether the interaction between these two factors is significant. The following example illustrates the procedure.

The table shows the yields per plot of four different plant crops grown on lots treated with three different types of fertilizer. The mean yield per plot is compared by both crop and fertilizer. The purpose of this analysis is to determine whether some fertilizers or crops give better yields than others and to test whether the interaction between these two factors is significant.

         Fertilizer A        Fertilizer B        Fertilizer C
Crop 1   4.5 4.8 4.6 4.9     8.8 8.9 8.1 9.2     5.9 5.8 5.5 7.4
Crop 2   6.4 6.3 6.6 5.9     7.8 7.7 7.4 7.9     7.8 7.7 7.5 6.6
Crop 3   7.2 7.8 7.6 7.4     9.6 9.8 9.9 9.3     5.7 5.4 5.5 5.7
Crop 4   6.7 6.8 6.4 6.9     7.0 7.4 7.5 7.2     5.2 4.9 4.8 5.3

H0: All fertilizers give the same yield per plot.
H1: Some fertilizers give a better yield than others.

H0: All crops give the same yield per plot.
H1: Some crops give a better yield than others.

H0: There are no interaction effects between the two factors.
H1: There are interaction effects between the two factors.

To conduct a Two-Way analysis with interaction the yields must be entered in a single column. Two other variables are generated to identify the crop grown and the fertilizer used for each yield. The categorical variables 'Fertilizer' and 'Crop' define the levels of these two factors. From the bar menu choose Analyze, select General Linear Model and click on Univariate.
Move 'Yield' into the Dependent Variable list and move 'Fertilizer' and 'Crop' into the Fixed Factors list. They are considered fixed factors because they include all the categories of interest. Click on Model, click on Custom and include 'fertilizer' and 'crop' in the model as main effects and 'fertilizer*crop' as an interaction term. To produce a table containing the relevant descriptive statistics, click on Options and select Descriptive Statistics. To produce a multiple comparison across the several levels of the two fixed factors click on Post Hoc, move 'fertilizer' and 'crop' into Post Hoc Tests and select Bonferroni. To produce line graphs showing the mean yield for each crop and fertilizer click on Plots, move 'fertilizer' and 'crop' into the Horizontal Axis and Separate Lines boxes and click on Add.
Descriptive Statistics. Dependent Variable: Yield
Fertilizer   Crop   Mean    Std. Deviation
A            1      4.700   .1826
             2      6.300   .2944
             3      7.500   .2582
             4      6.700   .2160
B            1      8.750   .4655
             2      7.700   .2160
             3      9.650   .2646
             4      7.275   .2217
C            1      6.150   .8505
             2      7.400   .5477
             3      5.575   .1500
             4      5.050   .2380

This table gives the mean and standard deviation of the yield for each combination of crop and fertilizer. There is evidence of differences in the mean yields across the crop and fertilizer levels; however, these have to be checked for significance.

Tests of Between-Subjects Effects. Dependent Variable: Yield
Source              Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model     93.424a                   11   8.493         59.312      .000
Intercept           2282.521                  1    2282.521      15940.010   .000
Fertilizer          50.840                    2    25.420        177.522     .000
Crop                11.474                    3    3.825         26.710      .000
Fertilizer * Crop   31.110                    6    5.185         36.209      .000
Error               5.155                     36   .143
Total               2381.100                  48
Corrected Total     98.579                    47
a. R Squared = .948 (Adjusted R Squared = .932)

Both the main effects and the interaction term significantly affect the mean yield per plot because their respective p-values are extremely small and are less than the level of significance. The R-Squared value is a measure of how well the two main effects and the interaction term explain the data: these predictors, in fact, explain 94.8% of the variability in the data. Post hoc tests are essential to check which levels of 'Fertilizer' and 'Crop' differ significantly.
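The sum-of-squares decomposition in the table above can be reproduced from the raw yields. A minimal sketch (illustrative only; the equal-replication layout of this example keeps the arithmetic simple):

```python
# Yields per plot: rows = crops 1-4, columns = fertilizers A, B, C,
# four replicate plots per cell (from the table above)
cells = [
    [[4.5, 4.8, 4.6, 4.9], [8.8, 8.9, 8.1, 9.2], [5.9, 5.8, 5.5, 7.4]],
    [[6.4, 6.3, 6.6, 5.9], [7.8, 7.7, 7.4, 7.9], [7.8, 7.7, 7.5, 6.6]],
    [[7.2, 7.8, 7.6, 7.4], [9.6, 9.8, 9.9, 9.3], [5.7, 5.4, 5.5, 5.7]],
    [[6.7, 6.8, 6.4, 6.9], [7.0, 7.4, 7.5, 7.2], [5.2, 4.9, 4.8, 5.3]],
]
r, c, n = 4, 3, 4                       # crops, fertilizers, replicates
N = r * c * n
grand = sum(x for row in cells for cell in row for x in cell)
corr = grand * grand / N                # correction term T^2/N

ss_cells = sum(sum(cell) ** 2 / n for row in cells for cell in row) - corr
ss_crop = sum(sum(x for cell in row for x in cell) ** 2 / (c * n)
              for row in cells) - corr
ss_fert = sum(sum(x for row in cells for x in row[j]) ** 2 / (r * n)
              for j in range(c)) - corr
ss_inter = ss_cells - ss_crop - ss_fert   # interaction sum of squares
print(round(ss_fert, 3), round(ss_crop, 3), round(ss_inter, 3))
```

This recovers the Fertilizer (50.840), Crop (11.474) and Fertilizer * Crop (31.110) sums of squares shown in the Tests of Between-Subjects Effects table.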
Multiple Comparisons. Dependent Variable: Yield. Bonferroni
(I) Fertilizer   (J) Fertilizer   Mean Difference (I-J)   Std. Error   Sig.   95% CI (Lower, Upper)
A                B                -2.044                  .1338        .000   (-2.380, -1.708)
                 C                .256                    .1338        .190   (-.080, .592)
B                A                2.044                   .1338        .000   (1.708, 2.380)
                 C                2.300                   .1338        .000   (1.964, 2.636)
C                A                -.256                   .1338        .190   (-.592, .080)
                 B                -2.300                  .1338        .000   (-2.636, -1.964)

The mean yield of fertilizer B differs significantly from those of fertilizers A and C; however the mean yields of fertilizers A and C are not significantly different. This is evident in the profile plot. The fact that the line graphs presenting the mean yields are far from parallel explains why the interaction term was found to be significant.
Multiple Comparisons. Dependent Variable: Yield. Bonferroni
(I) Crop   (J) Crop   Mean Difference (I-J)   Std. Error   Sig.    95% CI (Lower, Upper)
1          2          -.600                   .1545        .003    (-1.031, -.169)
           3          -1.042                  .1545        .000    (-1.473, -.610)
           4          .192                    .1545        1.000   (-.240, .623)
2          1          .600                    .1545        .003    (.169, 1.031)
           3          -.442                   .1545        .042    (-.873, -.010)
           4          .792                    .1545        .000    (.360, 1.223)
3          1          1.042                   .1545        .000    (.610, 1.473)
           2          .442                    .1545        .042    (.010, .873)
           4          1.233                   .1545        .000    (.802, 1.665)
4          1          -.192                   .1545        1.000   (-.623, .240)
           2          -.792                   .1545        .000    (-1.223, -.360)
           3          -1.233                  .1545        .000    (-1.665, -.802)

Crop 3 provides the highest mean yield whereas crops 1 and 4 provide the lowest. The markedly non-parallel line graphs for the mean yields justify the significance of the interaction term.

2.10 The chi-squared test

The chi-squared test is a very useful test to check for associations between two factors in a two-way contingency table. Suppose an analyst wants to investigate the association between two factors with r and c levels respectively. The observed
frequencies Oij are the counts of each factor level combination; Ri and Cj are respectively the row and column totals and T is the grand total.

                  Factor 1
Factor 2          Level 1   Level 2   ...   Level c   Total
Level 1           O11       O12       ...   O1c       R1
Level 2           O21       O22       ...   O2c       R2
...               ...       ...       ...   ...       ...
Level r           Or1       Or2       ...   Orc       Rr
Total             C1        C2        ...   Cc        T

The expected frequencies Eij are computed using the row and column totals Ri, Cj and the grand total T:

Eij = Ri · Cj / T

                  Factor 1
Factor 2          Level 1   Level 2   ...   Level c
Level 1           E11       E12       ...   E1c
Level 2           E21       E22       ...   E2c
...               ...       ...       ...   ...
Level r           Er1       Er2       ...   Erc

The test statistic used to check for an association between the two factors is X². This statistic has a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

X² = Σ_{j=1}^{c} Σ_{i=1}^{r} (Oij − Eij)² / Eij

The following example illustrates the procedure.

A psychologist wanted to conduct a study to determine whether male and female adolescents had similar positive attitudes towards life, using a 0.05 level of significance. The 1000 respondents were asked to assess their attitude towards life as exciting, routine or dull. The table below shows the observed frequencies for each level combination of the factors 'Gender' and 'Attitude'.

          Attitude towards life
Gender    Exciting   Routine   Dull   Total
Males     210        190       30     430
Females   220        300       50     570
Total     430        490       80     1000

The null and alternative hypotheses are:
H0: There is no association between gender and attitude (males and females have similar positive attitudes towards life).
H1: There is a significant association between gender and attitude (one gender group has a more positive attitude towards life than the other).

The known observed frequencies are entered in the vector 'Count'. Two other variables, 'Gender' and 'Attitude', are generated to identify the gender and attitude levels of each observed frequency. To define 'Count' as the frequency variable click on Data and Weight Cases, select Weight cases by and move 'Count' into the frequency variable list. To conduct the chi-squared test choose Analyze from the bar menu and select Descriptive Statistics and Crosstabs. Move 'Gender' and 'Attitude' into the row and column lists. Click on Statistics and select Chi-square. Click on Cells and select Row Percentages and Expected Counts.

                             Attitude towards life
Gender                       Exciting   Routine   Dull   Total
Males     Count              210        190       30     430
          Expected Count     184.9      210.7     34.4   430.0
Females   Count              220        300       50     570
          Expected Count     245.1      279.3     45.6   570.0
Total     Count              430        490       80     1000
          Expected Count     430.0      490.0     80.0   1000.0

The chi-squared test is sensitive to small counts and, as a rule of thumb, each expected count should be at least 5. In this application all the expected counts exceed this value. The value of the statistic X² is computed using the observed and expected counts:

X² = (210 − 184.9)²/184.9 + ... + (50 − 45.6)²/45.6 = 10.533

Chi-Square Tests
                     Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square   10.533   2    .005

The p-value is the area under the chi-squared distribution beyond X² = 10.533. Since the p-value is less than the level of significance, the alternative hypothesis is accepted, indicating a significant association between 'Gender' and 'Attitude'. A crosstab showing the percentage of males and females in each category of the variable 'Attitude' is extremely useful to describe this association.
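The X² statistic can be recomputed directly from the observed counts using the expected-frequency formula Eij = Ri·Cj/T. A minimal sketch (illustrative only, not SPSS output):

```python
# Observed counts: rows = gender (males, females), columns = attitude
observed = [[210, 190, 30],
            [220, 300, 50]]
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
T = sum(row_tot)

# Expected count Eij = Ri * Cj / T, then X^2 = sum over cells of (O - E)^2 / E
x2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / T) ** 2
         / (row_tot[i] * col_tot[j] / T)
         for i in range(len(observed)) for j in range(len(col_tot)))
df = (len(row_tot) - 1) * (len(col_tot) - 1)
print(round(x2, 3), df)
```

This reproduces X² = 10.533 with 2 degrees of freedom, matching the Pearson Chi-Square row in the SPSS output.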
Alternatively one can use a clustered bar graph to demonstrate this association graphically.
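The row percentages that describe this association can be recomputed from the raw counts. A minimal sketch for illustration:

```python
# Row percentages for the Gender x Attitude crosstab
# (columns: exciting, routine, dull)
observed = {"Males": [210, 190, 30], "Females": [220, 300, 50]}
for gender, counts in observed.items():
    total = sum(counts)
    pct = [round(100 * c / total, 1) for c in counts]
    print(gender, pct)
```

Each row is divided by its own total, so the percentages compare the attitude profile of males with that of females regardless of the unequal group sizes.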
                       Attitude
Gender                 Exciting   Routine   Dull    Total
Males     Count        210        190       30      430
          Percentage   48.8%      44.2%     7.0%    100.0%
Females   Count        220        300       50      570
          Percentage   38.6%      52.6%     8.8%    100.0%
Total     Count        430        490       80      1000
          Percentage   43.0%      49.0%     8.0%    100.0%

Analyzing the row percentages in the crosstab, it is evident that males, rather than females, are more likely to assess their attitude towards life as exciting, whereas females, rather than males, are more likely to assess their attitude towards life as routine. This association can be generalized because it is not attributed to chance. The clustered bar graph shows the same association between gender and attitude.