Os3 manual

Published in: Education, Technology

  1. 1. USERS MANUAL FOR OPENSTAT (OS2) AND LINUXOSTAT by William G. Miller, PhD April, 2003
  2. 2. Table of Contents
        I. INTRODUCTION ... 9
        II. INSTALLATION ... 9
          WINDOWS VERSIONS ... 9
          LINUX VERSION ... 10
        III. STARTING ... 11
          WINDOWS VERSION ... 11
          LINUX VERSION ... 11
        IV. FILES ... 12
          CREATING A FILE ... 12
          SAVING A FILE ... 14
          LOADING A FILE ... 15
          PRINTING A FILE ... 16
        V. OUTPUT DISPLAY ... 17
          WINDOWS VERSION ... 17
          LINUX VERSION ... 17
        VI. THE MAIN MENU ... 18
          FILE ... 18
          VARIABLES AND DATA-TOOLS ... 18
          EDIT ... 22
          TRANSFORM ... 22
          ANALYSES ... 23
          SIMULATION ... 23
          SUBSYSTEMS ... 23
          HELP ... 23
        VII. BASIC STATISTICS ... 25
          INTRODUCTION ... 25
          SYMBOLS USED IN STATISTICS ... 25
          THE ARITHMETIC MEAN ... 27
          VARIANCE AND STANDARD DEVIATION ... 29
          ESTIMATING POPULATION PARAMETERS: MEAN AND STANDARD DEVIATION ... 30
          THE STANDARD ERROR OF THE MEAN ... 32
          USING THE DISTRIBUTION PARAMETER ESTIMATES PROCEDURE ... 33
          USING THE BREAKDOWN PROCEDURE ... 33
          FREQUENCY DISTRIBUTIONS ... 34
          THE NORMAL DISTRIBUTION MODEL ... 35
          THE BINOMIAL DISTRIBUTION ... 36
          THE POISSON DISTRIBUTION ... 37
          THE CHI-SQUARED DISTRIBUTION ... 38
          THE F RATIO DISTRIBUTION ... 39
          USING THE DISTRIBUTION PLOTS AND CRITICAL VALUES PROCEDURE ... 40
        VIII. DESCRIPTIVE ANALYSES ... 41
          FREQUENCIES ... 41
          CROSS-TABULATION ... 43
          BREAKDOWN ... 44
          DISTRIBUTION PARAMETERS ... 47
  3. 3.   BOX PLOTS ... 47
          THREE VARIABLE ROTATION ... 48
          X VERSUS Y PLOTS ... 49
        IX. CORRELATION ... 51
          THE PRODUCT MOMENT CORRELATION ... 51
          TESTING HYPOTHESES FOR RELATIONSHIPS AMONG VARIABLES: CORRELATION ... 52
          TRANSFORMATION TO Z SCORES ... 54
          SIMPLE LINEAR REGRESSION ... 60
          THE LEAST-SQUARES FIT CRITERION ... 60
          THE VARIANCE OF PREDICTED SCORES ... 63
          THE VARIANCE OF ERRORS OF PREDICTION ... 63
          TESTING HYPOTHESES CONCERNING THE PEARSON PRODUCT-MOMENT CORRELATION ... 65
            Testing Equality of Correlations in Two Populations ... 67
            Differences Between Correlations in Dependent Samples ... 69
          PARTIAL AND SEMI-PARTIAL CORRELATIONS ... 71
            Partial Correlation ... 71
            Semi-Partial Correlation ... 71
          AUTOCORRELATION ... 72
          CANONICAL CORRELATION ... 80
            Introduction ... 80
            Eigenvalues and Eigenvectors ... 81
            The Canonical Analysis ... 83
            Structure Coefficients ... 86
            Redundancy Analysis ... 87
            Using OpenStat2 to Obtain Canonical Correlations ... 88
        X. COMPARISONS ... 93
          ONE SAMPLE TESTS ... 93
          PROPORTION DIFFERENCES ... 96
          CORRELATION DIFFERENCES ... 98
          T-TESTS ... 100
          ONE, TWO OR THREE WAY ANALYSIS OF VARIANCE ... 102
          THEORY OF ANALYSIS OF VARIANCE ... 104
            The Completely Randomized Design ... 105
              Introduction ... 105
              A Graphic Representation ... 105
              Null Hypothesis of the Design ... 105
              Summary of Data Analysis ... 106
              Model and Assumptions ... 106
              Fixed and Random Effects ... 107
            Analysis of Variance - The Two-way, Fixed-Effects Design ... 107
              Stating the Hypotheses ... 111
              Interpreting Interactions ... 111
            Random Effects Models ... 112
            Analysis of Variance - Treatments by Subjects Design ... 115
              Introduction ... 115
              The Research Design ... 115
              Theoretical Model ... 116
              Summary Table ... 116
              Assumptions ... 117
              Population Parameters Estimated ... 117
              Computational Formulas ... 118
              An Example ... 118
            One Between, One Repeated Design ... 120
              Introduction ... 120
              The Research Design ... 121
  4. 4.       Theoretical Model ... 122
              Assumptions ... 123
              Summary Table ... 123
              Population Parameters Estimated ... 125
              An Example Mixed Design ... 125
            Nested Factors Analysis Of Variance Design ... 127
              The Research Design ... 127
              The Variance Model ... 128
              The ANOVA Summary Table ... 128
            Latin and Greco-Latin Square Designs ... 130
              Some Theory ... 130
              The Latin Square ... 130
              Example in Education Using a Latin Square ... 131
            Plan 1 by B.J. Winer ... 132
            Plan 2 ... 136
            Plan 3 Latin Squares Design ... 139
            Analysis of Greco-Latin Squares ... 142
            Plan 5 Latin Square Design ... 147
            Plan 6 Latin Squares Design ... 150
            Plan 7 for Latin Squares ... 153
            Plan 9 Latin Squares ... 156
          ANALYSIS OF VARIANCE USING MULTIPLE REGRESSION METHODS ... 163
            A Comparison of ANOVA and Regression ... 163
            Effect Coding ... 164
            Orthogonal Coding ... 165
            Dummy Coding ... 166
          TWO FACTOR ANOVA BY MULTIPLE REGRESSION ... 167
          ANALYSIS OF COVARIANCE BY MULTIPLE REGRESSION ANALYSIS ... 171
            An Example of an Analysis of Covariance ... 172
          THE GENERAL LINEAR MODEL ... 176
        XI. MULTIPLE REGRESSION ... 177
          THE LINEAR REGRESSION EQUATION ... 177
          LEAST SQUARES CALCULUS ... 179
          FINDING A CHANGE IN Y GIVEN A CHANGE IN X FOR Y=F(X) ... 181
          RELATIVE CHANGE IN Y FOR A CHANGE IN X ... 182
          THE CONCEPT OF A DERIVATIVE ... 183
          SOME RULES FOR DIFFERENTIATING POLYNOMIALS ... 184
          GEOMETRIC INTERPRETATION OF A DERIVATIVE ... 186
            A Generalization of the Last Example ... 189
          PARTIAL DERIVATIVES ... 190
          LEAST SQUARES REGRESSION FOR TWO OR MORE INDEPENDENT VARIABLES ... 191
          MATRIX FORM FOR NORMAL EQUATIONS USING RAW SCORES ... 192
          MATRIX FORM FOR NORMAL EQUATIONS USING DEVIATION SCORES ... 193
          MATRIX FORM FOR NORMAL EQUATIONS USING STANDARDIZED SCORES ... 194
          HYPOTHESIS TESTING IN MULTIPLE REGRESSION ... 195
            Testing the Significance of the Multiple Regression Coefficient ... 195
          THE STANDARD ERROR OF ESTIMATE ... 196
          TESTING THE REGRESSION COEFFICIENTS ... 196
          TESTING THE DIFFERENCE BETWEEN REGRESSION COEFFICIENTS ... 198
          STEPWISE MULTIPLE REGRESSION ... 199
          CROSS AND DOUBLE CROSS VALIDATION OF REGRESSION MODELS ... 199
          POLYNOMIAL (NON-LINEAR) REGRESSION ... 200
          RIDGE REGRESSION ANALYSIS ... 201
          BINARY LOGISTIC REGRESSION ... 201
            Background Info (just what is logistic regression, anyway?) ... 201
  5. 5.   COX PROPORTIONAL HAZARDS SURVIVAL REGRESSION ... 202
            Background Information (just what is Proportional Hazards Survival Regression, anyway?) ... 202
        XII. MULTIVARIATE ... 204
          DISCRIMINANT FUNCTION / MANOVA ... 204
            Theory ... 204
            An Example ... 204
          HIERARCHICAL ANALYSIS ... 214
            Theory ... 214
          PATH ANALYSIS ... 219
            Theory ... 219
            Example of a Path Analysis ... 220
          FACTOR ANALYSIS ... 228
            The Linear Model ... 228
          GENERAL LINEAR MODEL ... 236
            Introduction ... 236
            Example 1 ... 236
            Example Two ... 245
        XIII. NON-PARAMETRIC ... 250
          CONTINGENCY CHI-SQUARE ... 250
            Example Contingency Chi Square ... 250
          SPEARMAN RANK CORRELATION ... 253
            Example Spearman Rank Correlation ... 253
          MANN-WHITNEY U TEST ... 254
          FISHER’S EXACT TEST ... 255
          KENDALL’S COEFFICIENT OF CONCORDANCE ... 257
          KRUSKAL-WALLIS ONE-WAY ANOVA ... 258
          WILCOXON MATCHED-PAIRS SIGNED RANKS TEST ... 260
          COCHRAN Q TEST ... 261
          SIGN TEST ... 262
          FRIEDMAN TWO WAY ANOVA ... 264
          PROBABILITY OF A BINOMIAL EVENT ... 265
        XIV. MEASUREMENT ... 268
          TEST THEORY ... 268
            Scales of Measurement ... 268
              Nominal Scales ... 268
              Ordinal Scales of Measurement ... 268
              Interval Scales of Measurement ... 269
              Ratio Scales of Measurement ... 270
          RELIABILITY, VALIDITY AND PRECISION OF MEASUREMENT ... 270
            Reliability ... 270
            The Kuder-Richardson Formula 20 Reliability ... 272
            Validity ... 275
              Concurrent Validity ... 276
              Predictive Validity ... 276
              Discriminant Validity ... 276
              Construct Validity ... 277
              Content Validity ... 277
              Effects of Test Length ... 278
            Composite Test Reliability ... 279
            Reliability by ANOVA ... 280
              Sources of Error - An Example ... 280
              A Hypothetical Situation ... 280
            Item and Test Analysis Procedures ... 289
  6. 6. CLASSICAL ITEM ANALYSIS METHODS ..................................................................................................................290 Item Discrimination ...........................................................................................................................................290 Item difficulty.....................................................................................................................................................290 The Item Analysis Program ...............................................................................................................................291 ITEM RESPONSE THEORY........................................................................................................................................291 The One Parameter Logistic Model...................................................................................................................293 Estimating Parameters in the Rasch Model: Prox. 
Method ..............................................................................294 Item Banking and Individualized Testing ..........................................................................................................297 Measuring Attitudes, Values, Beliefs .................................................................................................................298 Methods for Measuring Attitudes ......................................................................................................................298 Affective Measurement Theory ..........................................................................................................................301 Thurstone Paired Comparison Scaling..............................................................................................................302 Successive Interval Scaling Procedures ............................................................................................................305 Guttman Scalogram Analysis ............................................................................................................................308 Likert Scaling.....................................................................................................................................................312 Semantic Differential Scales..............................................................................................................................313 Behavior Checklists ...........................................................................................................................................315 Codifying Personal Interactions........................................................................................................................316 XV. 
SERIES ...........................................................................................................................................................317 INTRODUCTION .......................................................................................................................................................317 AUTOCORRELATION ...............................................................................................................................................317 An Example........................................................................................................................................................318 XVI. STATISTICAL PROCESS CONTROL.....................................................................................................322 INTRODUCTION .......................................................................................................................................................322 XBAR CHART ........................................................................................................................................................322 An Example........................................................................................................................................................322 RANGE CHART .......................................................................................................................................................325 S CONTROL CHART ................................................................................................................................................327 CUSUM CHART.....................................................................................................................................................330 P CHART .................................................................................................................................................................333 DEFECT (NON-CONFORMITY) C 
CHART ..................................................................................................................336 DEFECTS PER UNIT U CHART .................................................................................................................................338 XVII LINEAR PROGRAMMING ..................................................................................................................340 INTRODUCTION .......................................................................................................................................................340 Calculation ........................................................................................................................................................341 Implementation in Simplex ................................................................................................................................341 THE LINEAR PROGRAMMING PROCEDURE ..............................................................................................................341 XVIII USING MATMAN .......................................................................................................................................345 PURPOSE OF MATMAN ...........................................................................................................................................345 USING MATMAN ....................................................................................................................................................345 USING THE COMBINATION BOXES ..........................................................................................................................346 FILES LOADED AT THE START OF MATMAN ...........................................................................................................346 CLICKING THE MATRIX LIST ITEMS 
........................................................................................................................346 CLICKING THE VECTOR LIST ITEMS ........................................................................................................................346 CLICKING THE SCALAR LIST ITEMS ........................................................................................................................346 THE GRIDS .............................................................................................................................................................346 OPERATIONS AND OPERANDS .................................................................................................................................347 MENUS ...................................................................................................................................................................347 COMBO BOXES .......................................................................................................................................................347 THE OPERATIONS SCRIPT .......................................................................................................................................347 GETTING HELP ON A TOPIC ....................................................................................................................................348 6
  7. 7. SCRIPTS ..................................................................................................................................................................348 Print...................................................................................................................................................................348 Clear Script List.................................................................................................................................................349 Edit the Script ....................................................................................................................................................349 Load a Script .....................................................................................................................................................349 Save a Script ......................................................................................................................................................349 Executing a Script..............................................................................................................................................349 Script Options ....................................................................................................................................................350 FILES ......................................................................................................................................................................350 Keyboard Input ..................................................................................................................................................351 File Open ...........................................................................................................................................................351 File Save 
............................................................................................................................................................352 Import a File......................................................................................................................................................352 Export a File......................................................................................................................................................352 Open a Script File..............................................................................................................................................352 Save the Script ...................................................................................................................................................353 Reset All.............................................................................................................................................................353 ENTERING GRID DATA ...........................................................................................................................................353 Clearing a Grid .................................................................................................................................................353 Inserting a Column ............................................................................................................................................354 Inserting a Row..................................................................................................................................................354 Deleting a Column.............................................................................................................................................354 Deleting a Row ..................................................................................................................................................354 Using the Tab Key 
.............................................................................................................................................354 Using the Enter Key...........................................................................................................................................354 Editing a Cell Value ..........................................................................................................................................355 Loading a File ...................................................................................................................................................355 MATRIX OPERATIONS .............................................................................................................................................355 Printing..............................................................................................................................................................355 Row Augment.....................................................................................................................................................356 Column Augmentation .......................................................................................................................................356 Extract Col. 
Vector from Matrix........................................................................................................................356 SVDInverse ........................................................................................................................................................356 Tridiagonalize....................................................................................................................................................357 Upper-Lower Decomposition ............................................................................................................................358 Diagonal to Vector ............................................................................................................................................358 Determinant .......................................................................................................................................................358 Normalize Rows or Columns .............................................................................................................................359 Pre-Multiply by: ................................................................................................................................................359 Post-Multiply by: ...............................................................................................................................................359 Eigenvalues and Vectors....................................................................................................................................360 Transpose ..........................................................................................................................................................360 Trace..................................................................................................................................................................360 Matrix A + Matrix 
B..........................................................................................................................................361 Matrix A - Matrix B ...........................................................................................................................................361 Print...................................................................................................................................................................361 VECTOR OPERATIONS.............................................................................................................................................361 Vector Transpose...............................................................................................................................................361 Multiply a Vector by a Scalar ............................................................................................................................362 Square Root of Vector Elements ........................................................................................................................362 Reciprocal of Vector Elements ..........................................................................................................................362 Print a Vector ....................................................................................................................................................362 Row Vector Times a Column Vector..................................................................................................................362 Column Vector Times Row Vector.....................................................................................................................362 SCALAR OPERATIONS .............................................................................................................................................363 7
  8. 8. Square Root of a Scalar.....................................................................................................................................363 Reciprocal of a Scalar .......................................................................................................................................363 Scalar Times a Scalar........................................................................................................................................363 Print a Scalar.....................................................................................................................................................363 XIX THE GRADEBOOK PROGRAM .................................................................................................................364 INTRODUCTION .......................................................................................................................................................364 PHILOSOPY .............................................................................................................................................................365 BASIC MEASUREMENT CONCEPTS ..........................................................................................................................365 REPORTING TEST RESULTS .....................................................................................................................................365 COMBINING SCORES ...............................................................................................................................................366 ASSIGNING GRADES ...............................................................................................................................................367 THE GRADEBOOK MAIN FORM ..............................................................................................................................368 THE STUDENT PAGE TAB 
.......................................................................................................................................369 TEST RESULT PAGE TABS .......................................................................................................................................370 THE SUMMARY PAGE TAB......................................................................................................................................373 PRINTING REPORTS ................................................................................................................................................374 GRADE DISTRIBUTION GRAPHS ..............................................................................................................................377 THE BEHAVIOR PORTFOLIO ....................................................................................................................................378 The Eight Behavior Scales.................................................................................................................................378 Specifying the Initial Merit Points .....................................................................................................................379 OTHER OPERATIONS ...............................................................................................................................................380 Using the Help Menu .........................................................................................................................................380 Making Backup Copies of Files .........................................................................................................................380 PROGRAM SPECIFICATIONS ....................................................................................................................................382 Language Used 
..................................................................................................................................................382 Operating System Platform................................................................................................................................382 Copyright...........................................................................................................................................................382 Disclaimers........................................................................................................................................................382 XX. THE ITEM BANKING PROGRAM .........................................................................................................383 INTRODUCTION .......................................................................................................................................................383 ITEM CODING .........................................................................................................................................................383 USING THE ITEM BANK PROGRAM ..........................................................................................................................384 LOGGING ON ..........................................................................................................................................................384 CREATING CODES ...................................................................................................................................................385 ENTERING ITEMS INTO THE BANK ..........................................................................................................................387 Multiple Choice Item Entry ...............................................................................................................................388 True or False Item Entry 
...................................................................................................................................389 Entry of Matching Item Sets ..............................................................................................................................389 Entering Completion Items ................................................................................................................................390 Entry of Essay Questions ...................................................................................................................................391 CREATING A TEST ..................................................................................................................................................392 Specifying the Test .............................................................................................................................................392 LISTING ITEMS IN THE ITEM BANK .........................................................................................................................394 BIBLIOGRAPHY....................................................................................................................................................396 8
I. Introduction

OpenStat (OS2) and LinuxOStat, among others, are ongoing projects that I have created for use by students, teachers, researchers, practitioners and others. There is no charge for use of these programs if downloaded directly from a World Wide Web site. The software is the result of an “over-active” hobby of a retired professor (Iowa State University). I make no claim or warranty as to the accuracy, completeness, reliability or other characteristics desirable in commercial packages (as if they could meet these requirements either). The programs are designed to provide a means of analysis for individuals with very limited financial resources. The typical user is a student in a required social science or education course in beginning or intermediate statistics, measurement, psychology, etc. Some users may be individuals in developing nations who have very limited resources for the purchase of commercial products.

The series of statistics packages began years ago while I was teaching educational psychology and industrial technology at Iowa State University. Packages have been written in Basic for the Altair computer, Amiga computer, Radio Shack, Commodore 64, and the PC. The first package was called “FreeStat”. With the advent of Windows on the PC, a new version of the package was created using Borland’s Turbo Pascal and Microsoft’s Visual Basic. These packages were named “Statistics and Measurement Program Learning Environment” (SAMPLE). Upon my retirement I began the “OpenStat” series. The first OpenStat was written with Borland’s C++ Builder. OpenStat2 and OS2 were written with Borland’s Delphi (Pascal) compiler. The most recent version, LinuxOStat, was written with Borland’s Kylix compiler for the Linux operating system(s). These versions differ slightly from one another.
Because there are many students out there who are learning computer programming as well as statistics, the source code for each of these versions is also available for download over the Internet. It should be clear that programming is my hobby and I enjoy trying various languages and compilers. Other languages such as Lisp and Forth could be and are used to write statistical routines. Years ago I wrote programs in Fortran and Cobol (ugh!). But C++ seems to have become the standard for many commercial and industrial uses, and Pascal is such a nice language for the teaching of programming, that I have devoted more time to these languages. The advantage of the Pascal compilers from the Borland Corporation is their very fast compile times and excellent execution speeds.

Several interpreted languages developed over the past decade seem to be popular for statistical programming. R and S are such “languages”, popular among “serious” statisticians. Unfortunately, the learning curve is rather steep for the occasional user and for the student who would rather spend time on the statistics than on the programming challenges.

While I reserve the copyright protection of these packages, I make no restriction on their distribution or use. It is common courtesy, of course, to give me credit if you use these resources. Because I do not warrant them in any manner, you should assure yourself that the routines you use are adequate for your purposes. I strongly suggest analyzing textbook examples and comparing the results with those of other statistical packages where available.

II. Installation

Windows Versions

OpenStat, OpenStat2 and OS2 have all been successfully installed on Windows 95, 98, ME, and NT systems. InstallShield Express is a package “bundled” with the Borland C++ Builder and Delphi packages and was used to create the setup program for installing the packages on the Windows operating system platform.
Once I have created the setup files, I package them in .zip files for uploading to and downloading from the Internet. The setup files include the executable files, a Windows Help file and sample data files that can be used to test the analysis programs. At this time, only the OS2 version is receiving my attention for updates and revisions. When I feel it is stable enough and complete enough, I will translate it to the Linux platform and replace the current LinuxOStat package.

To install OS2 for Windows, follow these steps:
1. Connect to the internet address: http://statpages.com/miller/openstat/openstatmain.html
2. Click on the link to the OpenStat package of your choice, preferably OS2. Your IE or Netscape browser should automatically begin the download process to a directory on your computer.
3. Locate the .zip file that was downloaded and use a zip program to extract the files into another directory of your choice.
4. In the directory to which you extracted the zipped files you should see a “Setup.exe” file. Double click this file name to start the execution of the setup program. By default, Windows will normally install the program in the Program Files directory of the C: drive. You may, of course, select an alternative drive and directory if you wish. Simply follow the directions provided by the setup program and complete the installation. When completed, there should be an entry in the Programs menu.

Linux Version

There are a relatively large number of GNU/Linux operating system versions available, either free through the internet or commercially through computer stores and bookstores. The RedHat versions have been very popular. I am using SUSE version 7.0 myself. Like other operating systems, Linux goes through constant revision and improvement. It is important to have a version that contains recent updates to certain software used by Linux. Linux is a stable operating system and desirable for running server operations. It is also making gains for desktop use because of the many software packages distributed free with most versions. While the window features of Linux are similar to those of other window systems such as Microsoft, Macintosh, Amiga, etc., the commands issued are often quite different from those familiar to users accustomed to DOS on a PC. In addition, Linux has not received the kind of support from the Hewlett-Packard Corporation and other vendors of printers, sound boards, disk drives, etc. that has made the “Plug-and-Play” features of the Microsoft Windows system so popular.
Linux developers have had to create their own “drivers” to use these devices. The result is that some printers may not perform as well as desired with the Linux system. If your computer uses a keyboard and mouse connected to a USB (Universal Serial Bus) connection, you may not even be able to successfully install Linux on your system. In any event, there are a growing number of Linux users “out there”. To meet their needs, the OpenStat2 package was “ported” to the Linux system using the Borland Kylix, Professional Version 2 compiler. Note: a free version of the Kylix compiler may be downloaded from the Borland site.

To install LinuxOStat on your Linux system, follow these steps:

1. With your internet browser, connect to http://statpages.com/miller/openstat/openstatmain.html
2. Click on the download link to the LOpenStat1.tar.gz file. I recommend downloading the file to the /Home directory if you plan on having users other than “root” access it.
3. To unzip and untar the files in the .tar.gz file, issue the command:
       tar -xvzf /Home/LOpenStat1.tar.gz
   The result will be the creation of a directory labeled LOpenStat1 which contains two directories labeled “bin” and “lib”. The bin directory contains two files: “LinuxOStat” and “start_a”. The lib directory contains a number of files which support the run-time execution of the program.
4. You may need to change the permissions of each file in the bin and lib directories so that users other than the person who installed the files can access them. I recommend changing the name of the “start_a” file in the bin directory to OpenStat. This is the file which actually loads and starts the execution of LinuxOStat.
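For readers who would rather script steps 3 and 4 than type the commands by hand, the following is a minimal Python sketch of the same operations. It is an illustration only and not part of the LinuxOStat distribution: the function name install_linuxostat and its arguments are hypothetical, and it assumes the archive unpacks to LOpenStat1/bin and LOpenStat1/lib as described above.

```python
import stat
import tarfile
from pathlib import Path


def install_linuxostat(archive: Path, dest: Path) -> Path:
    """Unpack the archive, open up file permissions, and rename the launcher.

    Hypothetical helper: mirrors steps 3 and 4 of the manual, assuming the
    archive contains LOpenStat1/bin and LOpenStat1/lib.
    """
    # Step 3: equivalent of "tar -xvzf <archive>" run inside dest.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

    root = dest / "LOpenStat1"

    # Step 4: let other users read every file, and execute the ones
    # that the owner can already execute.
    for sub in ("bin", "lib"):
        for path in (root / sub).iterdir():
            mode = path.stat().st_mode
            extra = stat.S_IRGRP | stat.S_IROTH
            if mode & stat.S_IXUSR:
                extra |= stat.S_IXGRP | stat.S_IXOTH
            path.chmod(mode | extra)

    # Step 4 (continued): rename the start_a launcher to OpenStat.
    launcher = root / "bin" / "start_a"
    renamed = root / "bin" / "OpenStat"
    if launcher.exists():
        launcher.rename(renamed)
    return renamed


# Example call (paths follow the manual; adjust them to your system):
# install_linuxostat(Path("/Home/LOpenStat1.tar.gz"), Path("/Home"))
```

The sketch only adds read and execute permission for other users and never grants write access, which is one cautious reading of "change the permissions" in step 4; your own site may call for something different.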
  11. 11. III. Starting
Windows Version
To begin use of OS2 simply click the Windows “Start” button in the lower left portion of your screen, move the cursor to the “Programs” menu and click on the OS2 entry. The following form should appear:
Linux Version
Open a file manager window and change directory to /Home/LOpenStat1/bin. Click the “start_a” (or OpenStat if you renamed it) file. Wait until a screen similar to that above appears (depending on the speed of your system, this may take a moment - Linux does not load the program as fast as the Windows version.) Note: the “start_a” file is a file of “shell” commands that you may want to examine with an editor.
  12. 12. IV. Files
The “heart” of OS2 or any other statistics package is the data file to be created, saved, retrieved and analyzed. Unfortunately, there is no one “best” way to store data and each data analysis package has its own method for storing data. Many packages do, however, provide options for importing and exporting files in a variety of formats. For example, with Microsoft’s Excel package, you can save a file as a file of “tab” separated fields. Other program packages such as SPSS and OS2 can import “tab” files. Here are the types of file formats supported by OpenStat2:
1. OS2 Text files (with the file extension of .OS2.)
2. Tab separated field files (with the file extension of .TAB.)
3. Comma separated field files (with the file extension of .CSV.)
4. Space separated field files (with the file extension of .SSV.)
My preference is to save files as tab separated field files. This gives me the opportunity to analyze the same data using a variety of packages. For relatively small files (say, for example, a file with 20 variables and 1000 cases), the speed of loading the different formats is similar and quite adequate. The default for OS2 is to save as a text file with the extension .OS2 to differentiate it from other types of text files. Note: the original OpenStat program written in the C++ language saves as a default in binary format.
Creating a File
When OS2 begins, you will see a “grid” of two rows and two columns. The left-most column will automatically contain the word “Case” followed by a number (1 for the first case.) The top row will contain the names of the variables that you assign when you start entering data for the first variable. If you click on the first data position (second row, second column of the grid) a “form” will appear that looks like the figure below:
  13. 13. Figure IV-1 In the above figure you will notice that a variable name has automatically been generated for the first variable. To change the default name, click the box with the default name and enter the variable name that you desire. It is suggested that you keep the length of the name to eight characters or less. You may also enter a long label for the variable. If you save your file as an OS2 file, this long name (as well as other descriptive information) will be saved in the file (the use of the long label has not yet been implemented for printing output but will be in future versions.) To proceed, simply click the Return button in the lower right of this form. The default type of variable is a “floating point” value, that is, a number which may contain a decimal fraction. If a data field is left blank, the program will assume a missing value for the data. The default format of a data value is eight positions with two positions allocated to fractional decimal values (format 8.2.) By clicking on any of the specification fields you can modify these defaults to your own preferences. You can change the width of your field, the number of decimal places (0 for integers) and the justification in the grid (Left, Center or Right.) There is also a "memo box" in which you may record a description of the file you are creating or have created. When you enter data in the grid of the main form there are several ways to navigate from cell to cell. You can, of course, simply click on the cell where you wish to enter data and type the data values. If you press the “enter” key following the typing of a value, the program will automatically move you to the next cell to the right of the current one. If you have not defined a variable for that column, you will again see the pop-up form for the specifications of the new variable. You may also press the keyboard “down” arrow to move to the cell below the current one. 
If it is a new row for the grid, a new row will automatically be added and the “Case” label added to the first column. You may use the arrow keys to navigate left, right, up and down. You may also press the “Page Up” button to move up a screen at a time, the “Home” button to move to the beginning of a row, etc. Try the various keys to learn how they behave. If you accidentally move into a column where you do NOT want to create a
  14. 14. variable, simply click the delete row button on the pop-up specifications form. You may also click on the main form’s Edit menu and use the delete column or delete row options. Be sure the cursor is sitting in a cell of the row or column you wish to delete when you use this method. Saving a File Once you have entered a number of values in the grid, it is a good idea to save your work (power outages do occur!) Go to the main form’s File menu and click it. You will see there are several ways to save your data. The first time you save your data you should click the “Save As” option. A “dialog box” will then appear as shown below: Figure IV-2 Simply type the name of the file you wish to create in the File name box and click the Save button. After this initial save as operation, you may continue to enter data and save with the Save button on the file menu. Before you exit the program, be sure to save your file if you have made additions to it. When all of the variables you will be entering in your file have been specified, you will want to click the Save and Return button on the specifications form. You may bring up the specifications form at any time during a session by clicking on the Define option under the Variables menu item on the main form. Doing this creates a file with the same name as your file but with the extension of .DIC which is the dictionary file containing your specifications for the file. In order for the dictionary file to have the same name as your OS2 file, you must first save the file under the name you select. If you do not need to save specifications other than the short name of each variable, you may prefer to “export” the file in a format compatible to other programs. The Save Tab File AS option under the File menu will save your data in a text file in which the cell values in each row are separated by a tab key character. A file with the extension .TAB will be created. 
The list of variable names from the first row of the grid is saved first, then the first row of the data, and so on until all grid rows have been saved. Alternatively, you may export your data with a comma or a space separating the cell values. Basic language programs frequently read files in which values are separated by commas.
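The tab separated layout described above (one row of variable names, then one row per case) can be written and read by most languages. Here is a minimal sketch in Python, assuming every cell is kept as a string of characters and that tabs never occur inside a cell. The function names and the two sample cases are hypothetical and are not taken from any OS2 file.

```python
import csv
import io


def write_tab_file(stream, variable_names, rows):
    """Write the variable names first, then one tab separated row per case."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(variable_names)
    writer.writerows(rows)


def read_tab_file(stream):
    """Return (variable_names, rows); every cell stays a string."""
    reader = csv.reader(stream, delimiter="\t")
    variable_names = next(reader)
    rows = [row for row in reader]
    return variable_names, rows


# Round trip through an in-memory "file" with made-up values.
buffer = io.StringIO()
write_tab_file(buffer, ["weight", "pulse"], [["191", "50"], ["189", "52"]])
buffer.seek(0)
names, cases = read_tab_file(buffer)
print(names)
print(cases)
```

Changing the delimiter argument to a comma or a space gives the other two export layouts mentioned above.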
  15. 15. Loading a File
When you begin a session with OS2 after having saved a file in a previous session, you simply click the File menu and select the Open option. If you have exported the file, then open the file in the same format in which you saved it. If you attempt to open a file while another one is still in the grid, you will receive a message to first close the file in the grid. Files with thousands of cases and a large number of variables (e.g. 100 or more) may take a little time to load - be patient. It is possible that the file you wish to access is too large for the data grid (I cannot tell you the exact maximum rows or columns.) I have encountered files with over 500,000 records (rows) and hundreds of variables that were too large. As is often the case with such files, however, analyses are performed on sub-sets of the data. In the Data-Tools menu there is an option to “Extract a sub-file”. If you select this option, the following figure appears: Figure IV-3
Using the above form, one first selects the file by clicking the “Press to Select a File” button and enters the name in the open dialog box. You also need to specify the number of fields in each record (cells in each row.) Because you are selecting a sub-set of records from the file, you must also indicate the sequence number of one of the fields to use for selecting the records you wish to load into the grid. You then specify the value of that variable (field) that is to be matched with each record in the file that is to be retrieved. It is important to note that the value of, say, 36 is NOT the same as the value 36.00! You must be VERY specific about your matching key! Tab, comma or space separated fields may be read by this procedure.
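The warning that 36 is not the same as 36.00 suggests that the key is compared as text rather than as a number. The sketch below shows that kind of matching in Python. It is an illustration under that assumption, not the code OS2 uses; the function name extract_records and the three sample records are hypothetical.

```python
def extract_records(lines, key_field, key_value, delimiter="\t"):
    """Keep the records whose key field matches key_value exactly.

    key_field is the sequence number of the field, counting from 1.
    The comparison is on the characters of the field, so "36" and
    "36.00" are treated as different keys.
    """
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= key_field and fields[key_field - 1] == key_value:
            selected.append(fields)
    return selected


# Made-up records: only the first one has the exact text "36" in field 2.
records = ["1\t36\tA", "2\t36.00\tB", "3\t40\tC"]
matched = extract_records(records, 2, "36")
print(matched)
```

Under this rule a key of "36" retrieves only the first record, which is why the key must be typed exactly as it appears in the file.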
Alternatively, you can specify your own format. If this option is chosen, a grid is displayed as shown below:
  16. 16. Figure IV-4 In the grid that appears you can specify the beginning and ending position of each record field, the type of data contained in that field, the line number of multiple lines that are used for each record, and the label or name of the field (variable.) Once specifications have been entered, the user clicks the “Extract” button. The number of records read from the original file are displayed and the number of records extracted are displayed. When you click the “Return” button, the extracted records will be shown in the grid of the main form. You can, of course, then save the sub-file as a separate file. Printing a File For moderate sized files that you have created you may want to obtain a “hard-copy” of the file. Select the “Print” option under the File menu of the main form to print your file. 16
  17. 17. V. Output Display
Windows Version
Output created by the procedures of OS2 is displayed in what is called a “Rich Text Box” on an output form. This rich text box has many of the features of a word processing program. You can type new text directly in the window by clicking an area where you want to insert new text. You can change the font for a selected portion of text by dragging the mouse with the left mouse button held down over the text you wish to change and then clicking the font icon which appears at the top of the form. You can save (and load previously saved) output in a .RTF file which then can be edited with another word processing program that handles rich text files such as Word or WordPerfect. These features let you “customize” the output for inclusion into reports you may be preparing.
Linux Version
The output window in Linux does not support the rich text file features of the Microsoft Windows version. The lines of output are instead displayed in a “Memo Box”. If you save the output file as a text file, you can then edit it with your word processing program to enhance the output for any reports you may need to prepare.
  18. 18. VI. The Main Menu
File
See chapter IV above for a description of files and the use of the File menu.
Variables and Data-Tools
The Variables menu contains three options: (1) Define, (2) Print Dictionary, and (3) Re-code. If you select the Define option, you are presented the form shown in Figure IV-1 above. If you select the Data-Tools menu item there are five options: (1) Format, (2) Sort Cases, (3) Print Data File, (4) Transform Variable and (5) Select Cases. The first option of the Data-Tools menu allows you to automatically format all of the cells using the current formatting specifications of the variables. If you have imported a file, the formatting will be the default values such as two decimal places for numeric values. It should be noted that the values in the grid are all “strings” of characters which, for numeric values, are converted into numbers when they are read by an analysis program. It is important to note the format specification of variables particularly when using the “Missing Values” specifications. A value of 0 is NOT the same as 0.00 ! The Select Cases option is used when you want to select a sub-set of records (rows) from your file for analysis. Several methods are available for selecting records. When you select this option, the following dialogue window appears: Figure VI-1
In the above figure we see a list of variables for the previously loaded file (the cansas.tab file located in your directory was imported.) If you click one of the optional “Select” buttons on this form (other than the default All Cases or the Use the Filter Variable option), another form will appear on which you make your specifications for selection. Probably the most frequently chosen button is the “if condition is satisfied” button. If you were to click that button on the form above you would then see the following form:
  19. 19. Figure VI-2
In this form you can develop an expression for the selection of cases that meet one or more criteria. You will notice that an expression begins with a left parenthesis. Each expression entered must begin with a left parenthesis and end with a right parenthesis. If, for example, we wish to select cases from the cansas.tab file for which weight is greater than 160 pounds and pulse rate is greater than 50, we would need an expression like (weight > 160) and (pulse > 50) . On the form you can enter the name of the variable (like weight) by clicking the variable name then clicking the right arrow button. You can enter comparisons like less than, greater than, etc. by clicking the corresponding buttons in the block of buttons shown below the expression window. In the figure below we have entered the desired expression: Figure VI-3
You can see that the symbol entered for the “and” button is the ampersand (&) symbol. If we now click the OK button, we will return to the previous form (Figure VI-2). Click the Apply button and the form is expanded and appears as below:
  20. 20. Figure VI-4
Notice that the expanded form now shows the “logic” of our selection broken down into a line for each expression and the logic between expressions. At this point we should decide whether we simply want to filter out those cases not selected or delete them from the grid file. Note: selecting the “Deleted from the file” option will NOT delete them from the original file - only from the grid! If you select the option “Filtered Out” then a “Filter Variable” named IfFilter is automatically created and added as a variable in the grid. In the column of the filter variable will be a “YES” or “NO” indicating whether or not the case is to be included in any selected analysis. Once you have made your decision, click the “Apply” button to continue. Shown below is the result of selecting the Filtered Out option for the cansas.tab file as we specified it:
  21. 21. Figure VI-5 You can see that a “YES” appears for each case that met the selection criteria. Subsequent analyses on the data in this grid will ignore those cases that have a “NO” coded in the filter variable. You can turn filtering off by again selecting the “Select Cases” option on the menu and choosing to “Select All” from the selection menu. You can also select a random number of cases from a file. Again, let us load the cansas.tab file and select a random sample of 50 percent of the cases. Using the “Random Sample” button shown in Figure VI-1 above, you would be presented with the following form for specifying your random selection: Figure VI-6 We have entered 50 in the box for our approximately 50% sample. When we click the OK button, we are returned to the previous form where we again choose to either delete unselected records or use a filter variable. Selecting the filter variable in our case resulted in 10 of the 20 cases having an IfFilter variable value of “NO”. 21
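The two selection methods described in this section, a condition such as (weight > 160) & (pulse > 50) and an approximate percentage random sample, both end by writing “YES” or “NO” into a filter variable. The sketch below imitates that idea in Python. The function names, the four sample cases, and the rule of selecting each case independently with the stated probability are all assumptions made for illustration; they are not taken from the OS2 source or from the cansas.tab file.

```python
import random


def add_if_filter(cases, condition):
    """Return copies of the cases with an IfFilter value of YES or NO."""
    return [dict(case, IfFilter="YES" if condition(case) else "NO")
            for case in cases]


def add_random_filter(cases, percent, seed=None):
    """Mark roughly `percent` percent of the cases YES, chosen at random.

    Assumption: each case is kept independently with that probability,
    so the number of YES cases is only approximately the requested share.
    """
    rng = random.Random(seed)
    return [dict(case, IfFilter="YES" if rng.random() * 100 < percent else "NO")
            for case in cases]


# Made-up cases with the two variables used in the expression above.
cases = [
    {"weight": 191, "pulse": 50},
    {"weight": 189, "pulse": 52},
    {"weight": 154, "pulse": 64},
    {"weight": 176, "pulse": 48},
]

filtered = add_if_filter(cases, lambda c: c["weight"] > 160 and c["pulse"] > 50)
included = [c for c in filtered if c["IfFilter"] == "YES"]
half_sample = add_random_filter(cases, 50, seed=1)
print(filtered)
print(half_sample)
```

An analysis that honors the filter would then work only with the cases in included, leaving the other rows in place but unused.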
  22. 22. Edit
The Edit menu provides the ability to cut, copy and paste rows or columns in your grid. When you elect either the cut or copy option, the cell values of the column (or row) on which your cursor is located are copied into a temporary (hidden) location. The pasting operation inserts a new column (or row) following the position where your cursor is located. You may also insert a new column or row in the grid. If you select one of these options, the new column or row is placed after the position of the cursor.
Transform
It is often the case that a variable needs to be transformed prior to an analysis or to enhance the interpretation of a variable. When you click the Transform menu, a form appears on which you specify the kind of transformation you wish to make. We have used the cansas.tab file data as an example. We wish to transform the original weight values to standardized z scores. We have specified that the new variable will be named zWeight. We selected the Z(V1) transformation which does the z score transformation of the variable 1 we specified in the form: Figure VI-7
When we click the “Compute” button, the procedure completes the transformation and stores the results in a new variable with the title we chose. We note that the z transformation is a linear transformation and that the z scores produced have the same distribution shape as the original scores. We might have elected the NormDistZ(V1) option instead which would have transformed our original scores to z scores from a normal distribution having the same percentile ranks as the original scores. A variety of transformations are possible. Some use two variables (V1 and V2) while others use just one.
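The difference between the linear z transformation and a normalizing transformation can be seen in a short sketch. The Python below is an illustration only: it assumes the z score uses the sample standard deviation (dividing by n - 1) and that a percentile rank counts half of the tied scores, neither of which is confirmed by this manual. The function names and the weight values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Linear z transformation: (score - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in values]


def normalized_z_scores(values):
    """z scores from a normal distribution with the same percentile ranks.

    Assumption: the percentile rank of a score is the proportion of
    scores below it plus half the proportion equal to it.
    """
    n = len(values)
    standard_normal = NormalDist()
    result = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        percentile = (below + 0.5 * equal) / n
        result.append(standard_normal.inv_cdf(percentile))
    return result


# Made-up weights, not the values in cansas.tab.
weights = [154, 162, 176, 189, 191, 247]
print(z_scores(weights))
print(normalized_z_scores(weights))
```

The first list keeps the shape of the original scores, including the distance of the heaviest case from the rest, while the second list depends only on the rank order of the scores.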
  23. 23. Analyses
The Analyses menu of the main form provides the bulk of the procedures available to the user. The procedures are grouped into the following categories:
1. Descriptive
2. One Sample Test for Means, Proportion, Correlation or Variance
3. Comparisons
4. Correlation
5. Multiple Regression
6. Multivariate
7. Cross Classification
8. Measurement
9. Nonparametric
10. Statistical Process Control
Within each of these sub-menus are listed the procedures available to the user. Chapters VIII through XVI provide a description of the procedures of these menus.
Simulation
The Simulation menu of the main form presents seven procedures which are further described in Chapter XVII. These procedures are:
1. Bivariate Scatterplot
2. Multivariate Distribution
3. Type I and Type II Error Curves for z-test
4. Power Curves for z Test
5. Distribution Plots and Critical Values
6. Generate Sequential Values
7. Generate Random Theoretical Distribution Values
These procedures are particularly helpful to the student learning basic statistical concepts about distributions, hypothesis testing, correlation, etc.
Subsystems
A Matrix Manipulation program provides a user the ability to enter one or more matrices and perform matrix operations on and between them. Scalar, vector and matrix operations are available. By careful entry of a series of operations, a complete “program” for certain types of analyses can be created. Each operation is encoded in a script file which can be saved, loaded and re-executed. Statistics teachers might appreciate this for teaching multiple regression, canonical correlation, etc. The Matrix Manipulation program is further described in Chapter XVIII.
Help
Users of Microsoft Windows are used to having a “help” system available to them for instant assistance when using a program.
Most of these systems let the user press the “F1” key for assistance on a particular topic, or place the cursor on a particular program item and press the right mouse button to get
  24. 24. help. OS2 for Microsoft Windows does have a help file (currently the one also used for the original OpenStat package.) While it is still in a development stage, you can find help on a variety of topics. Experiment. Press the F1 key and see what happens! Linux users are more often provided “man” pages on topics or “info” topics. The Kylix compiler does have the potential for accessing such files but LinuxOStat has not yet implemented any of these methods. Instead, this document is available in Portable Document Format (.PDF) for both the Microsoft Windows and the Linux platforms. Both systems provide the capability of keeping multiple windows open to the user concurrent with operation of one or more programs. You can keep this file open in either system and simply bring it to the front when needed.
  25. 25. VII. Basic Statistics
Introduction
This chapter introduces the basic statistics concepts you will need throughout your use of the OpenStat2 package. You will be introduced to the symbols and formulas used to represent a number of concepts utilized in statistical inference, research design, measurement theory, multivariate analyses, etc. Like many people first starting to learn statistics, you may be easily overwhelmed by the symbols and formulas - don't worry, that is pretty natural and does NOT mean you are incapable of learning statistics! You may need to re-read sections several times, however, before a concept is grasped. You will not be able to read statistics like a novel (don't we wish we could) but rather must "study" a few lines at a time and be sure of your understanding before you proceed.
Symbols Used in Statistics
Greek symbols are used rather often in statistical literature. (Is that why statistics is Greek to so many people?) They are used to represent both arithmetic operations and numbers, called parameters, that characterize a population or larger set of numbers. The letters you usually use, called Roman (Latin) letters, are used for numbers that represent a sample of numbers obtained from the population of numbers. Two operations that are particularly useful in the field of statistics and that are represented by Greek symbols are the summation operator and the product operator. These two operations are represented by the capital Greek letters Sigma Σ and Pi Π. Whenever you see these symbols you must think: Σ = "The sum of the values:", or Π = "The product of the values:". For example, if you see Y = Σ (1,3,5,9) you would read this as "the sum of 1, 3, 5 and 9". Similarly, if you see Y = Π(1,3,5,9) you would think "the product of 1 times 3 times 5 times 9". Other conventions are sometimes adopted by statisticians. For example, as in beginning algebra classes, we often use X to represent any one of a number of possible numbers.
Sometimes we use Y to represent a number that depends on one or more other numbers X1, X2, etc. Notice that we used subscripts of 1, 2, etc. to represent different (unknown) numbers. Lower case letters like y, x, etc. are also sometimes used to represent a deviation of a score from the mean of a set of scores. Where it adds to the understanding, X and x may be italicized or written in a script style. Now let's see how these symbols might be used to express some values. For example, we might represent the set of numbers (1,3,7,9,14,20) as X1, X2, X3, X4, X5, and X6. To represent the sum of the six numbers in the set we could write

$$Y = \sum_{i=1}^{6} X_i = 1 + 3 + 7 + 9 + 14 + 20 = 54$$

If we want to represent the sum of any arbitrary set of N numbers, we could write the above equation more generally, thus

$$Y = \sum_{i=1}^{N} X_i$$

represents the sum of a set of N values. Note that we read the above formula as "Y equals the sum of X subscript i values for the value of i ranging from 1 through N, the number of values".
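Readers who know a programming language may find it helpful to see the two operators as ordinary function calls. The short Python sketch below reproduces the examples used in this section; the variable names are only illustrative.

```python
import math

# Sigma: "the sum of the values" 1, 3, 5 and 9.
total = sum([1, 3, 5, 9])

# Pi: "the product of the values" 1 times 3 times 5 times 9.
product = math.prod([1, 3, 5, 9])

# The indexed form: X1 through X6 are the six numbers, and the sum runs
# over the subscript i (Python counts positions from 0 rather than 1).
scores = [1, 3, 7, 9, 14, 20]
y = sum(scores[i] for i in range(len(scores)))

print(total, product, y)
```

The subscript i in the formula plays the same role as the loop position in the last sum.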
  26. 26. What would be the result of the formula below if we used the same set of numbers (1,3,7,9,14,20) but each were multiplied by five?

$$Y = \sum_{i=1}^{N} 5X_i$$

To answer the question we can expand the formula to

Y = 5X1 + 5X2 + 5X3 + 5X4 + 5X5 + 5X6 = 5(X1 + X2 + X3 + X4 + X5 + X6) = 5(1 + 3 + 7 + 9 + 14 + 20) = 5(54) = 270

In other words,

$$Y = \sum_{i=1}^{N} 5X_i = 5\sum_{i=1}^{N} X_i = 270$$

We may generalize multiplying any sum by a constant (C) to

$$Y = \sum_{i=1}^{N} CX_i = C\sum_{i=1}^{N} X_i$$

What happens when we sum a term which is a compound expression instead of a simple value? For example, how would we interpret

$$Y = \sum_{i=1}^{N} (X_i - C)$$

where C is a constant value? We can expand the above formula as

Y = (X1 - C) + (X2 - C) + ... + (XN - C)

(Note the use of ... to denote continuation to the Nth term). The above expansion could also be written as

Y = (X1 + X2 + ... + XN) - NC

or

$$Y = \sum_{i=1}^{N} X_i - NC$$

We note that the sum of an expression which is itself a sum or difference of multiple terms is the sum of the individual terms of that expression. We may say that the summation operator distributes over the terms of the expression!
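Both rules on this page can be checked numerically. The Python sketch below uses the same six numbers and, as an assumed value for illustration, the constant C = 5 for both rules.

```python
scores = [1, 3, 7, 9, 14, 20]
n = len(scores)
c = 5  # the constant C; the value 5 is chosen only for illustration

# Rule 1: a constant multiplier can be moved outside the summation.
sum_of_multiples = sum(c * x for x in scores)
multiple_of_sum = c * sum(scores)

# Rule 2: summing (X - C) over N values equals the sum of X minus N times C.
sum_of_differences = sum(x - c for x in scores)
sum_minus_nc = sum(scores) - n * c

print(sum_of_multiples, multiple_of_sum)
print(sum_of_differences, sum_minus_nc)
```

Each pair of printed values agrees, which is all the two algebraic rules claim.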
  27. 27. Now let's look at the sum of an expression which is squared. For example,

$$Y = \sum_{i=1}^{N} (X_i - C)^2$$

When the expression summed is not in its most simple form, we must first evaluate the expression. Thus

$$Y = \sum_{i=1}^{N} (X_i - C)^2 = \sum_{i=1}^{N} (X_i - C)(X_i - C) = \sum_{i=1}^{N} \left[ X_i^2 - 2CX_i + C^2 \right] = \sum_{i=1}^{N} X_i^2 - \sum_{i=1}^{N} 2CX_i + \sum_{i=1}^{N} C^2$$

or

$$Y = \sum_{i=1}^{N} X_i^2 - 2C\sum_{i=1}^{N} X_i + NC^2$$

The Arithmetic Mean
The mean is probably the most often used parameter or statistic for describing the central tendency of a population or sample. When we are discussing a population of scores, the mean of the population is denoted with the Greek letter µ. When we are discussing the mean of a sample, we utilize the letter X with a bar above it. The sample mean is obtained as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

The population mean for a finite population of values may be written in a similar form as

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

When the population contains an infinite number of values which are continuous, that is, can be any real value, then the population mean is the sum of the X values times the proportion of those values. The sum of values which can differ from one another by arbitrarily small amounts is written using the integral symbol instead of the Greek sigma symbol. We would write the mean of a set of scores that range in size from minus infinity to plus infinity as

$$\mu = \int_{-\infty}^{+\infty} X\,p(X)\,dX$$

where p(X) is the proportion of any given X value in the population. The tall curve which resembles a script S is a symbol used in calculus to mean the "sum of" just like the symbol Σ that we saw previously. We use Σ to represent "countable" values, that is values which are discrete. The "integral" symbol on the other hand is used to represent the sum of values which can range continuously, that is, take on infinitely small differences from one another.
  28. 28. A similar formula can be written for the sample mean, that is,

$$\bar{X} = \sum_{i=1}^{n} X_i\,p(X_i)$$

where p(Xi) is the proportion of any given Xi value in the sample. If a sample of n values is randomly selected from a population of values, the sample mean is said to be an unbiased estimate of the population mean. This simply means that if you were to repeatedly draw random samples of size n from the population, the average of all sample means would be equal to the population mean. Of course we rarely draw more than one or two samples from a population. The sample mean we obtain therefore will typically not equal the population mean but will in fact differ from the population mean by some specific amount. Since we usually don't know what the population mean is, we therefore don't know how far our sample mean is from the population mean. If we have, in fact, used random sampling though, we do know something about the shape of the distribution of sample means; they tend to be normally distributed. (See the discussion of the Normal Distribution in the section on Distributions). In fact, we can estimate how far the sample mean will be from the population mean some (P) percent of the time. The estimate of sampling errors of the mean will be further discussed in the section on testing hypotheses about the difference between sample means. Now let us examine the calculation of a sample mean. Assume you have randomly selected a set of 5 scores from a very large population of scores and obtained the following:

X1 = 3, X2 = 7, X3 = 2, X4 = 8, X5 = 5

The sample mean is simply the sum (Σ) of the X scores divided by the number of the scores, that is

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} = \frac{3 + 7 + 2 + 8 + 5}{5} = 5.0$$

We might also note that the proportion of each value of X is the same, that is, one out of five.
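The calculation above, and the idea of an unbiased estimate, can both be tried out in a few lines of Python. In this sketch the second half treats the five scores as if they were an entire (very small) population and lists every possible sample of size 2 drawn with replacement. That setup is an assumption made purely for illustration and is not part of the example in the text.

```python
from fractions import Fraction
from itertools import product

scores = [3, 7, 2, 8, 5]
n = len(scores)

# The sample mean as the sum of the scores divided by their number.
sample_mean = sum(scores) / n

# The same mean with each score weighted by its proportion, one out of five.
proportion = Fraction(1, n)
weighted_mean = sum(x * proportion for x in scores)

# Unbiasedness, illustrated on a toy population (an assumption): average
# the means of ALL ordered samples of size 2 drawn with replacement.
population = scores
population_mean = Fraction(sum(population), len(population))
all_samples = list(product(population, repeat=2))
sample_means = [Fraction(a + b, 2) for a, b in all_samples]
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(sample_mean, weighted_mean)
print(population_mean, mean_of_sample_means)
```

Individual sample means in the list range from 2 to 8, yet their average matches the population mean exactly, which is what "unbiased" means here.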
The mean could also be obtained by

$$\bar{X} = \sum_{i=1}^{n} X_i\,p(X_i) = 3(1/5) + 7(1/5) + 2(1/5) + 8(1/5) + 5(1/5) = 5.0$$

The sample mean is used to indicate that value which is "most typical" of a set of scores, or which describes the center of the scores. In fact, in physics, the mean is the center-of-gravity (found from the first moment of the mass distribution) of a solid object and corresponds to the fulcrum, the point at which the object is balanced. Unfortunately, when the population of scores from which we are sampling is not symmetrically distributed about the population mean, the arithmetic average is often not very descriptive of the "central" score or most
  29. 29. representative score. For example, the population of working adults earn an annual salary. These salaries however are not symmetrically distributed. Most people earn a rather modest income while there are a few who earn millions. The mean of such salaries would therefore not be very descriptive of the typical wage earner. The mean value would be much higher than most people earn. A better index of the "typical" wage earner would probably be the median, the salary below which 50 percent of the wage earners fall. Examine the two sets of scores below. Notice that the first 9 values are the same in both sets but that the tenth scores are quite different. Obtain the mean of each set and compare them. Also examine the score below which 50 percent of the remaining scores fall. Notice that it is the same in both sets and better represents the "typical" score.

SET A: ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ) Mean = Median =
SET B: ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000 ) Mean = Median =

Variance and Standard Deviation
The scores in a set are seldom all exactly the same if they represent measures of some attribute that varies from person to person or object to object. Some sets of scores are much more variable than others. If the attribute measures are very similar for the group of subjects, then they are less variable than for another group in which the subjects vary a great deal. For example, suppose we measured the reading ability of a sample of 20 students in the third grade. Their scores would probably be much less variable than if we drew a sample of 20 subjects from across the grades 1 through 12! There are several ways to describe the variability of a set of scores. A very simple method is to subtract the smallest score from the largest score. This is called the exclusive range.
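If you would like to check your answers for Set A and Set B above, the following Python sketch computes both indices. It assumes the common convention that the median of an even number of scores is the average of the two middle scores, which is what the standard library function does.

```python
from statistics import mean, median

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

# The single extreme score in Set B pulls the mean far upward ...
mean_a = mean(set_a)
mean_b = mean(set_b)

# ... but leaves the median, the middle of the ordered scores, unchanged.
median_a = median(set_a)
median_b = median(set_b)

print("Set A: mean =", mean_a, "median =", median_a)
print("Set B: mean =", mean_b, "median =", median_b)
```

Comparing the two printed lines shows why the median is the better index of the "typical" score when one or a few values are extreme.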
If we think the values obtained from our measurement process are really point estimates of a continuous variable, we may add 1 to the exclusive range and obtain the inclusive range. This range includes the range of possible values. Consider the set of scores below:

5, 6, 6, 7, 7, 7, 8, 8, 9

If the values represent discrete scores (not simply the closest value that the precision of our instrument gives) then we would use the exclusive range and report that the range is (9 - 5) = 4. If, on the other hand, we felt that the scores are really point estimates in the middle of intervals of width 1.0 (for example the score 7 is actually an observation someplace between 6.5 and 7.5) then we would report the range as (9 - 5) + 1 = 5 or (9.5 - 4.5) = 5. While the range is useful in describing roughly how the scores vary, it does not tell us much about how MOST of the scores vary around, say, the mean. If we are interested in how much the scores in our set of data tend to differ from the mean score, we could simply average the distance that each score is from the mean. The mean deviation, unfortunately, is always 0.0! To see why, consider the above set of scores again:

Mean = (5+6+6+7+7+7+8+8+9) / 9 = 63 / 9 = 7.0

Now the deviation of each score from the mean is obtained by subtracting the mean from each score:

5 - 7 = -2
6 - 7 = -1
6 - 7 = -1
7 - 7 = 0
7 - 7 = 0
  30. 30. 7 - 7 = 0
8 - 7 = +1
8 - 7 = +1
9 - 7 = +2
Total = 0.0
Since the sum of deviations around the mean always totals zero, the obvious thing to do is either take the average of the absolute values of the deviations OR take the average of the squared deviations. We usually average the squared deviations from the mean because this index has some very important applications in other areas of statistics. The average of squared deviations about the mean is called the variance of the scores. For example, the variance, which we will denote as $S^2$, of the above set of scores would be:
$$S^2 = \frac{(-2)^2 + (-1)^2 + (-1)^2 + 0^2 + 0^2 + 0^2 + 1^2 + 1^2 + 2^2}{9} = \frac{12}{9} \approx 1.3333$$
Thus we can describe the score variability of the above scores by saying that the average squared deviation from the mean is about 1.3 squared score points. We may also convert the average squared value to the scale of our original measurements by simply taking the square root of the variance, e.g. $S = \sqrt{1.3333} \approx 1.1547$. This index of variability is called the standard deviation of the scores. It is probably the most commonly used index to describe score variability!
Estimating Population Parameters: Mean and Standard Deviation
We have already seen that the mean of a sample of scores randomly drawn from a population of scores is an estimate of the population's mean. What we have to do is to imagine that we repeatedly draw samples of size n from our population (always placing the previous sample back into the population) and calculate a sample mean each time. The average of all (an infinite number) of these sample means is the population mean. In algebraic symbols we would write:
$$\mu = \frac{\sum_{i=1}^{k} \bar{X}_i}{k} \quad \text{as } k \to \infty$$
Notice that we have let $\bar{X}$ represent the sample mean and $\mu$ represent the population mean.
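The calculations on this page can be checked with a short Python sketch. It is only an illustration under stated assumptions: the nine scores 5, 6, 6, 7, 7, 7, 8, 8, 9 are treated as a small finite population, and the sample size (5), the number of samples (20000), the random seed, and the function names are arbitrary choices made for this example, not anything taken from OpenStat.

```python
import random
from statistics import mean, pvariance, pstdev

# The nine scores used in the worked example, treated here as a
# (hypothetical) finite population.
scores = [5, 6, 6, 7, 7, 7, 8, 8, 9]


def deviations(values):
    """Deviation of each score from the mean of the set."""
    m = mean(values)
    return [v - m for v in values]


def average_of_sample_means(population, n, k, seed=0):
    """Draw k samples of size n and average their sample means.

    Each sample is drawn without replacement and then "placed back",
    so every sample comes from the full population.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        sample = rng.sample(population, n)
        total += sum(sample) / n  # this sample's mean
    return total / k


if __name__ == "__main__":
    print("Sum of deviations:", sum(deviations(scores)))
    # pvariance and pstdev divide by n, matching the "average of
    # squared deviations" definition used in the text.
    print("Variance:", round(pvariance(scores), 4))
    print("Standard deviation:", round(pstdev(scores), 4))
    print("Average of sample means:",
          round(average_of_sample_means(scores, n=5, k=20000), 3))
```

With a finite number of samples the average of the sample means will only be close to the population mean of 7.0, not exactly equal to it; the agreement improves as k grows.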
We say that the sample mean is an unbiased estimate of the population mean because the average of the sample means, each calculated in the same way that we would calculate the population mean, equals the population mean. We calculate the sample mean by dividing the sum of the scores by the number of scores. If we have a finite population, we could calculate the population mean in exactly the same way. The sample variance calculated as the average of squared deviations about the sample mean is, however, a biased estimator of the population variance (and therefore the standard deviation is also a biased estimate of the population standard deviation). In other words, if we calculate the average of a very large (infinite) number of sample variances, this average will NOT equal the population variance. If, however, we multiply each sample variance by the constant n / (n-1) then the average of these "corrected" sample variances will, in fact, equal the population variance! Notice that if n, our sample size, is large, then the correction factor n / (n-1) is close to 1 and the bias is quite small. For example, a sample size of 100 gives a correction factor of about 1.010101. The bias is therefore approximately 1 hundredth of
