Lesson 1
INTRODUCTION

SPSS (Statistical Product and Service Solutions) is one of the most widely used software packages for statistical measurement and analysis. It provides tools for computing descriptive statistics, presenting data in various graphs, carrying out statistical inference, and more.

Manual calculation has many limitations, especially when the sample is large. It can produce calculation errors, which in turn affect the accuracy of the interpretation and analysis. Statistical software therefore improves both accuracy and efficiency.

Besides SPSS, there are other statistical packages such as MINITAB, SAS, Stata, LISREL, Excel, and PSPP. Among them, PSPP is free software that can be downloaded easily from the internet.

Understanding the software (SPSS, PSPP)

The software has three main windows. The two used in this module are:

1. Data Editor window
   It opens automatically when you start the software. It has two views, Data View and Variable View, and it is used in the first step, entering the data.
   Variable View: used to define the variables and their settings.
   Data View: used to enter the data.
2. Output Viewer window
   This window opens automatically after a data processing instruction is executed in the software.

How To Define and Input Data

We use the Data Editor window to enter data into SPSS or PSPP. The following steps show how to create a data set.

1. Define the variables in Variable View. Variable View provides several settings for each variable, such as:
   a. Variable name: can be chosen freely. If you do not define one, the software automatically generates names such as var00001, var00002, etc.
   b. Data type: Numeric, String, Date, etc. The default data type is numeric.
   c. Variable label and value labels: value labels are used when the data are categorical.
2. When the settings are done, go to Data View to enter the values of the variables that have been created.
Lesson 2. Descriptive Statistics

How do we obtain descriptive statistical measures using SPSS or PSPP?

Study Case 1:
A random sample of 12 joggers was asked to keep track of and report the number of miles they ran last week. The responses are:

5.5  7.2  1.6  22.0  8.7  2.8  5.3  3.4  12.5  18.6  8.3  6.6

a. Compute all three statistics that measure central tendency.
   Analyze > Descriptive Statistics > Descriptives / Frequencies
b. Briefly describe what each statistic tells you.
c. Compute all the measures of variability.
   Analyze > Descriptive Statistics > Descriptives
d. What is the interpretation?

Study Case 2:
Has the educational level of adults changed over 15 years? To help answer this question, the Bureau of Labor Statistics compiled the following table, which lists the number (in thousands) of adults 25 years of age and older who are employed. Use a graphical technique to present these figures.

                         1992    1995    2000    2004
Less than high school   13418   11972   12486   12513
High school             37910   36692   37699   37790
Some college            27048   30927   33257   34412
College graduate        28113   31149   36619   40418
Answer:
Steps:
1. Create the variables and enter the data.
2. Create a chart to see the differences.

Study Case 3
Given the raw data below:

Id Num   Name         Gender   Marital St   Height   Weight   DoB
785      Aminah       Female   Married      147.6    55.5     15-Feb-1953
756      Imas         Female   Married      151      42       30-Jun-1986
757      Tn. Rafius   Male     Married      162.4    61.4     30-Jun-1960
788      Ismet        Male     Married      165      64.5     15-Jan-1967
793      Esih         Female   Widowed      158      60       7-May-1950
803      Sumiati      Female   Married      156.5    60.1     19-Aug-1950
811      Romlah       Female   Widowed      152.7    57.7     12-May-1987
856      Dudung       Male     Single       167      56       16-Sep-1988
876      Fernando     Male     Single       170      60       17-Oct-1992
888      Marimar      Female   Married      168      55       17-Dec-1979

a. Enter the data into PSPP / SPSS.
b. Give some descriptive measures (central tendency and variability) of the Height variable.
c. Interpret the standard deviation of the Height variable.
d. Show the proportions of Marital Status using a pie chart.
Study Case 4
(Xr 04-36). Everyone is familiar with waiting lines or queues. For example, people wait in line at a supermarket to go through the checkout counter. There are two factors that determine how long the queue becomes. One is the speed of service. The other is the number of arrivals at the checkout counter. The mean number of arrivals is an important number, but so is the standard deviation. Suppose that a consultant for the supermarket counts the number of arrivals per hour during a sample of 150 hours.

a. Compute the minimum, maximum, mean, and standard deviation of the arrival variable.
b. Create the histogram and comment on the skewness of the distribution.
c. If the distribution is assumed to be bell shaped, interpret the standard deviation.
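The descriptive measures requested in this lesson can be cross-checked outside SPSS or PSPP. The block below is a minimal sketch using only the Python standard library and the joggers data from Study Case 1; the variable names are illustrative and not part of the module. It uses the sample formulas (dividing by n - 1), which is what the Descriptives output reports.

```python
import statistics

# Miles run last week by the 12 joggers in Study Case 1.
miles = [5.5, 7.2, 1.6, 22.0, 8.7, 2.8, 5.3, 3.4, 12.5, 18.6, 8.3, 6.6]

# Measures of central tendency.
mean = statistics.mean(miles)
median = statistics.median(miles)
# multimode returns every most-frequent value; here each value occurs
# once, so the data have no single meaningful mode.
modes = statistics.multimode(miles)

# Measures of variability (sample versions, dividing by n - 1).
data_range = max(miles) - min(miles)
variance = statistics.variance(miles)
std_dev = statistics.stdev(miles)

print(f"mean     = {mean:.3f}")
print(f"median   = {median:.3f}")
print(f"modes    = {len(modes)} values tie for most frequent")
print(f"range    = {data_range:.3f}")
print(f"variance = {variance:.3f}")
print(f"std dev  = {std_dev:.3f}")
```

The mean (about 8.54 miles) is noticeably larger than the median (6.9 miles), which hints at a right-skewed sample: a few joggers ran much farther than the rest.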
Lesson 3. Correlation and Regression

This lesson shows how to obtain the correlation measures (covariance, coefficient of correlation, and coefficient of determination) and how to obtain the regression line for a scatter of data.

Some equivalent terms:

Textbook term                   Label in the software output
Covariance                      Covariance
Coefficient of correlation      Pearson Correlation
Coefficient of determination    R square

Study Case 1
A retailer wanted to estimate the monthly fixed and variable selling expenses. As a first step she collected data from the past 8 months. The total selling expenses (in $ thousands) and the total sales (in $ thousands) were recorded and are listed below.

Total Sales   Selling Expenses
20            14
40            16
60            18
50            17
50            18
55            18
60            18
70            20

a. Compute the covariance, the coefficient of correlation, and the coefficient of determination, and describe what these statistics tell you.

Answer using PSPP:
To obtain the covariance and the coefficient of correlation:
Analyze > Bivariate Correlation
The output table gives the value of the Pearson Correlation (coefficient of correlation), which is 0.97.

Interpretation: the Pearson correlation is 0.97, which is very close to +1. It means that Selling Expenses and Total Sales have a very strong positive linear relationship.

Note: in SPSS this correlation table can also report the covariance, but unfortunately not in PSPP.

To obtain the coefficient of determination (R square):
Analyze > Linear Regression
The output table gives R square = 0.95. It means that around 95% of the variation in selling expenses can be explained by the variation in total sales. The remaining 5% is unexplained.

b. Determine the least squares line and use it to produce the estimates the retailer wants.

Answer using PSPP:
To obtain the least squares line:
Analyze > Linear Regression

The coefficients table gives the coefficients of the least squares line. Based on the table, the least squares line is y = 0.11x + 11.66, where y is the selling expenses and x is the total sales.

The retailer wants to estimate the fixed and variable selling expenses using the least squares line:

The fixed selling expenses are given by the intercept, $11.66 thousand. This is the amount of selling expenses that has to be covered even when there are no sales.

The variable selling expenses are given by the slope, 0.11. For every additional $1 thousand of total sales, selling expenses increase by about $0.11 thousand.
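The figures quoted in this lesson can be reproduced by hand from the eight data pairs of Study Case 1. The block below is a minimal sketch in plain Python, assuming the usual sample formulas (covariance divided by n - 1, least squares slope = Sxy / Sxx); the variable names are illustrative only.

```python
import math

# Study Case 1 data, both in $ thousands.
sales = [20, 40, 60, 50, 50, 55, 60, 70]      # x: total sales
expenses = [14, 16, 18, 17, 18, 18, 18, 20]   # y: selling expenses

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(expenses) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, expenses))
sxx = sum((x - mean_x) ** 2 for x in sales)
syy = sum((y - mean_y) ** 2 for y in expenses)

covariance = sxy / (n - 1)             # sample covariance
r = sxy / math.sqrt(sxx * syy)         # Pearson correlation
r_squared = r ** 2                     # coefficient of determination

slope = sxy / sxx                      # variable selling expenses
intercept = mean_y - slope * mean_x    # fixed selling expenses

print(f"covariance = {covariance:.2f}")
print(f"r          = {r:.2f}")
print(f"R square   = {r_squared:.2f}")
print(f"line: y = {slope:.2f}x + {intercept:.2f}")
```

Rounded to two decimals, the results agree with the software output discussed in this lesson: r = 0.97, R square = 0.95 and the line y = 0.11x + 11.66. The covariance, which PSPP does not show in its correlation table, comes out at about 26.16.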
Lesson 4. Statistical Inference for the Mean

A. One population

The basic idea of inference for the mean of one population is to describe the population mean using information from a sample. The one-sample t-test is provided in SPSS and PSPP. The p-value is the quantity to consider when deciding whether to reject the null hypothesis: if the p-value is less than the significance level, the null hypothesis is rejected.

Study Case:
(Xr 12-23) [Mean analysis for one population]
A diet doctor claims that the average North American is more than 20 pounds overweight. To test his claim, a random sample of 20 North Americans was weighed, and the difference between their actual weight and their ideal weight was calculated.

a. Do the data allow us to infer at the 5% significance level that the doctor's claim is true?
b. What is the interval estimate for the average overweight with 95% confidence?

Steps:
1. Enter the data in one column (one variable, because there is only one sample).
2. Analyze > Compare Means > One Sample t-test
3. Put the Overweight variable into Test Variable and put the hypothesized population mean into Test Value. Click Options and set your confidence level.
4. Click OK. You will get the result below.

One-Sample Statistics
             N    Mean    Std. Deviation   Std. Error Mean
Overweight   20   20.85   6.761            1.512

One-Sample Test (Test Value = 20)
                                                             95% Confidence Interval
                                                             of the Difference
             t      df   Sig. (2-tailed)   Mean Difference   Lower    Upper
Overweight   .562   19   .581              .850              -2.31    4.01

Interpretation:
a. The appropriate hypotheses for this case are:
   H0: μ = 20
   H1: μ > 20
   α = 0.05
   This is a one-tailed test, so the p-value is 0.581/2 = 0.2905.
   The p-value = 0.2905 is greater than α = 0.05, so the null hypothesis is not rejected. There is not sufficient evidence to support the doctor's claim.
b. The interval [-2.31, 4.01] in the output is the 95% confidence interval for the difference between the mean and the test value 20. Adding 20 gives the 95% confidence interval for the mean overweight: [17.69, 24.01] pounds.

B. Inference for two independent samples

The basic idea of inference for two independent populations is to describe the difference between the means of two independent populations using information from the samples. The independent-samples t-test is provided in SPSS and PSPP. The p-value is the quantity to consider when deciding whether to reject the null hypothesis: if the p-value is less than the significance level, the null hypothesis is rejected.
Hypothesis testing for two populations is used when a business analyst or researcher wants to observe or compare the conditions of two populations. For example:
1. Compare the expenditures on shoes made in 2000 with those from 2010 in an effort to determine whether any change occurred over time.
2. Estimate or test the difference in the market proportions of two companies, or the proportions of the market share of one company in two different regions.

Study Case:
(Xr 13-08) [Mean analysis for two populations]
A men's softball league is experimenting with a yellow baseball that is easier to see during night games. One way to judge the effectiveness is to count the number of errors. In a preliminary experiment, the yellow baseball was used in 10 games and the traditional white baseball was used in another 10 games. The number of errors in each game was recorded.

a. Can we infer that there are fewer errors on average when the yellow ball is used? (Use α = 5%.)
b. What is the interval estimate for the mean difference with 95% confidence?

Steps:
1. Enter the data into the software.
2. Create two additional variables: one that combines the data from both samples, and another that identifies the sample each value belongs to. This layout is needed because the two independent samples do not have to be the same size.
3. Analyze > Compare Means > Independent Samples t-test
4. Put the combined data into Test Variable and the group variable into Define Groups, then click Define Groups to set the group values.
5. Click OK. You will get the result below.

Group Statistics
              group    N    Mean   Std. Deviation   Std. Error Mean
observation   Yellow   10   5.10   2.424            .767
              White    10   7.30   2.406            .761
Independent Samples Test (variable: observation)

Levene's Test for Equality of Variances
                              F      Sig.
Equal variances assumed       .001   .974

t-test for Equality of Means
                                                                                      95% Confidence Interval
                                                 Sig.         Mean         Std. Error  of the Difference
                              t        df       (2-tailed)   Difference   Difference   Lower    Upper
Equal variances assumed       -2.037   18       .057         -2.200       1.080        -4.469   .069
Equal variances not assumed   -2.037   17.999   .057         -2.200       1.080        -4.469   .069

Interpretation

Hypothesis setup
The appropriate hypotheses for this case are (1 = yellow ball, 2 = white ball):
H0: μDIFF = μ1 - μ2 = 0
H1: μDIFF = μ1 - μ2 < 0
α = 0.05

Levene's test: this test is used to decide whether equal variances can be assumed. If the p-value (Sig.) of Levene's test is greater than the significance level α, equal variances are assumed.

Based on the result above, Levene's test gives Sig. = 0.974, which is greater than α = 0.05. Equal variances are therefore assumed, and we use the results in the "Equal variances assumed" row.

The p-value is 0.057/2 = 0.0285 (one-tailed test).

Conclusion: the p-value 0.0285 is less than α = 0.05, so the null hypothesis is rejected. There is sufficient evidence that there are fewer errors on average when the yellow ball is used.

The 95% confidence interval for the difference between the mean number of errors with the yellow ball and with the white ball is [-4.469, 0.069].
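The "Equal variances assumed" results can be reproduced from the Group Statistics figures alone (N = 10, mean 5.10, SD 2.424 for yellow; N = 10, mean 7.30, SD 2.406 for white). The block below is a minimal sketch of the pooled-variance t-test in plain Python. Two things are assumptions of this sketch rather than part of the module: the variable names, and the critical value 2.101, which is the two-sided 95% value for 18 degrees of freedom copied from a t table because the standard library has no t distribution.

```python
import math

# Summary statistics from the Group Statistics table.
n1, mean1, sd1 = 10, 5.10, 2.424   # yellow ball
n2, mean2, sd2 = 10, 7.30, 2.406   # white ball

# Pooled variance, used when equal variances are assumed.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df

# Standard error of the difference between the two sample means.
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff = mean1 - mean2
t_stat = mean_diff / std_error

# Assumption: critical value t(0.025, 18) taken from a t table.
T_CRIT_18 = 2.101
lower = mean_diff - T_CRIT_18 * std_error
upper = mean_diff + T_CRIT_18 * std_error

print(f"df = {df}, t = {t_stat:.3f}")
print(f"95% CI for the difference: [{lower:.3f}, {upper:.3f}]")
```

The t statistic and the interval limits agree with the software output to three decimals. The p-value itself requires the t distribution and is best read from the output table.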
C. Paired Sample t-test

Besides the one-sample t-test and the independent-samples t-test, SPSS and PSPP also provide the paired t-test. The paired t-test is used when we have paired sample data. Paired sample data are gathered from one population that has received a treatment. We want to see the effect of the treatment, so we measure the condition before and after the treatment. Paired samples are also called two dependent samples.

Study Case:
(Xr 13-44) [Mean analysis for two dependent samples (paired samples)]
The president of a large company is in the process of deciding whether to adopt a lunchtime exercise program. The purpose of such a program is to improve the health of workers and, in so doing, reduce medical expenses. To get more information, he instituted an exercise program for the employees in one office. The president knows that during the winter months medical expenses are relatively high because of the incidence of colds and flu. Consequently, he decided to use a matched pairs design by recording medical expenses for the 12 months before the program and for the 12 months after the program. The "before" and "after" expenses (in thousands of dollars) are compared on a month-to-month basis and shown in the data.

a. Do the data indicate that exercise programs reduce medical expenses? (Use α = 5%.)
b. Estimate with 95% confidence the mean savings produced by exercise programs.

Steps:
1. Enter the data into the software.
2. Analyze > Compare Means > Paired Samples t-test
3. Put variable After under Var 1 and variable Before under Var 2.
4. Click OK. You will get the result below.

Paired Samples Statistics
                  Mean    N    Std. Deviation   Std. Error Mean
Pair 1   After    43.50   12   18.618           5.375
         Before   46.58   12   16.670           4.812

Paired Samples Correlations
                          N    Correlation   Sig.
Pair 1   After & Before   12   .950          .000

Paired Samples Test
                          Paired Differences
                                                                95% Confidence Interval
                                   Std.        Std. Error       of the Difference                       Sig.
                          Mean     Deviation   Mean             Lower    Upper     t        df   (2-tailed)
Pair 1   After - Before   -3.083   5.885       1.699            -6.822   .656      -1.815   11   .097
Interpretation:

Hypothesis setup
The appropriate hypotheses for this case are:
H0: μDIFF = μafter - μbefore = 0
H1: μDIFF = μafter - μbefore < 0
α = 0.05

The p-value is 0.097/2 = 0.0485 (one-tailed test).

Conclusion: the p-value 0.0485 is less than α = 0.05, so the null hypothesis is rejected. There is sufficient evidence that medical expenses are lower when the lunchtime exercise program is applied.

The 95% confidence interval for the difference in medical expenses (After - Before) is [-6.822, 0.656]. Expressed as savings (Before - After), the mean monthly savings are estimated to lie between -0.656 and 6.822 thousand dollars.
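A paired t-test is a one-sample t-test applied to the differences, so the one-sample result of part A and the paired result of part C can be checked with the same formula, t = (mean difference) / (SD / √n). The block below is a minimal sketch using the summary figures from the output tables. The helper name `t_from_summary` is hypothetical, and the critical values 2.093 (19 df) and 2.201 (11 df) are two-sided 95% values copied from a t table, an assumption of this sketch.

```python
import math


def t_from_summary(mean_diff, std_dev, n, t_crit):
    """Return (t statistic, CI lower, CI upper) from summary statistics.

    mean_diff: sample mean minus the hypothesized value
               (or the mean of the paired differences)
    std_dev:   sample standard deviation
    t_crit:    two-sided critical value for n - 1 degrees of freedom
    """
    std_error = std_dev / math.sqrt(n)
    t_stat = mean_diff / std_error
    margin = t_crit * std_error
    return t_stat, mean_diff - margin, mean_diff + margin


# Part A: one-sample test, sample mean 20.85 against test value 20.
one_t, one_lo, one_hi = t_from_summary(20.85 - 20, 6.761, 20, 2.093)

# Part C: paired test on the differences After - Before.
pair_t, pair_lo, pair_hi = t_from_summary(-3.083, 5.885, 12, 2.201)

print(f"one-sample: t = {one_t:.3f}, CI = [{one_lo:.2f}, {one_hi:.2f}]")
print(f"paired:     t = {pair_t:.3f}, CI = [{pair_lo:.3f}, {pair_hi:.3f}]")
```

Both results agree with the software output: t = 0.562 with interval [-2.31, 4.01] for the one-sample case, and t = -1.815 with interval [-6.822, 0.656] for the paired case.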
Lesson 5. Chi-Square Goodness-of-Fit Test

The chi-square goodness-of-fit test is used to describe a population of nominal data. In a binomial experiment, the nominal variable can assume only one of two possible values, such as failure or success, and this leads to inference about proportions. A binomial experiment becomes a multinomial experiment when there are more than two possible outcomes. The chi-square goodness-of-fit test is the statistical technique used for inference about the proportions in a multinomial experiment.

Study case:
A machine has a record of producing 80% excellent, 17% good, and 3% unacceptable parts. After extensive repairs, a sample of 200 produced 157 excellent, 42 good, and 1 unacceptable part. Have the repairs changed the nature of the output of the machine? Use PSPP with α = 0.05.

Steps:
1. Enter the category data into one variable and the observed frequency into another variable.
   Category data: Quality: 1 = Excellent, 2 = Good, 3 = Unacceptable
   (Figure 5.1)
2. Weight the data by frequency: Data > Weight Cases > Weight cases by: Observed_freq
3. Run the chi-square test:
   Analyze > Nonparametric Tests > Chi-Square
4. The output is:
5. Output analysis

Step 1: Hypotheses
H0: The repairs did not change the nature of the output of the machine.
    [i.e., the proportions remained the same (π1 = 0.80, π2 = 0.17, π3 = 0.03)]
Ha: The repairs did change the nature of the output of the machine.
    [i.e., the proportions changed after the repairs (at least one πi ≠ πi,0)]

Step 2: Significance Level
α = 0.05

Step 3: Rejection Region
Reject the null hypothesis if p-value ≤ 0.05 = α.

Step 4.1: Calculate Expected Frequencies
Each expected frequency is the sample size times the hypothesized proportion: 200 × 0.80 = 160, 200 × 0.17 = 34, and 200 × 0.03 = 6.

Step 4.2: Check Assumptions
According to footnote a of the output, all expected frequencies are ≥ 5 (the smallest value is 6).

Step 4.3: Test Statistic and P-value
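The test statistic and p-value for this study case can also be computed by hand with χ² = Σ (observed - expected)² / expected. The block below is a minimal sketch in plain Python with illustrative variable names. It relies on one shortcut that holds only for 2 degrees of freedom: the upper-tail probability of the chi-square distribution is then exactly exp(-χ²/2), so no statistics library is needed. For other degrees of freedom this shortcut does not apply.

```python
import math

# Observed counts after the repairs: excellent, good, unacceptable.
observed = [157, 42, 1]
# Proportions under H0 (the machine's record before the repairs).
hypothesized = [0.80, 0.17, 0.03]

n = sum(observed)
expected = [n * p for p in hypothesized]

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail probability; this closed form is valid for df = 2 only.
p_value = math.exp(-chi_square / 2)

print("expected frequencies:", [round(e, 1) for e in expected])
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.4f}")
```

The p-value comes out at about 0.0472, the value used in the decision step of this lesson.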
Step 5: Decision
Since p-value = 0.0472 ≤ 0.05, we reject the null hypothesis.

Step 6: State conclusion in words
At the α = 0.05 level of significance, there is enough evidence to conclude that the repairs changed the nature of the output of the machine (the proportions are not what they used to be).

Lesson 6. Chi-Square Test of a Contingency Table

The chi-square test of a contingency table is used to determine whether there is enough evidence to infer that two nominal variables are related, and to infer that differences exist between two or more populations of nominal variables.

Example:
Suppose we conducted a prospective cohort study to investigate the effect of aspirin on heart disease. A group of patients who are at risk for a heart attack are randomly assigned to either a placebo or aspirin. At the end of one year, the number of patients suffering a heart attack is recorded.

H0: the two variables are independent (the medicine taken has no effect on having heart disease)
Ha: the two variables are dependent (the medicine taken has an effect on having heart disease)

           Heart Disease
Group      Yes (+)   No (-)   Total
Placebo    20        80       100
Aspirin    15        135      150
Total      35        215      250

Steps
1. Enter the data. Create the variables: Heart_Disease, freq, Factor.
2. Weight the data by frequency.
3. Analyze > Descriptive Statistics > Crosstabs
   Put Factor in the Rows box and Heart_Disease in the Columns box, following the contingency table.
4. Results:
5. Analysis
   Chi-square = 4.98
   p-value = 0.03
   Since p-value = 0.03 < α = 0.05, the null hypothesis is rejected. There is sufficient evidence that the medicine taken has an effect on having heart disease.
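The chi-square statistic reported in the analysis can be reproduced from the 2 × 2 table of the aspirin example. The block below is a minimal sketch in plain Python with illustrative variable names. It computes the Pearson chi-square without continuity correction, and it uses a shortcut that holds only for 1 degree of freedom: the upper-tail probability equals erfc(√(χ²/2)). Larger tables need a general chi-square distribution function instead.

```python
import math

# Observed counts: rows = Placebo, Aspirin; columns = Yes (+), No (-).
observed = [[20, 80],
            [15, 135]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi_square / 2))

print("expected counts:", expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The expected counts are 14 and 86 for the placebo group and 21 and 129 for the aspirin group. The statistic rounds to 4.98 and the p-value to 0.03, matching the analysis in this lesson.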
"Anyone who has never made a mistake has never tried anything new."
Albert Einstein

"Do not worry about your difficulties in Mathematics. I can assure you mine are still greater."
Albert Einstein

"Three sentences for getting success: know more than others, work more than others, and expect less than others."
William Shakespeare

REFERENCE
Keller. Managerial Statistics Abbreviated. South-Western Cengage Learning, 2009.
FMIPA Gadjah Mada University. Modul Praktikum Metode Statistika. 2003.