  1. 2-DAY WORKSHOP ON SPSS SYNTAX (28th and 29th October, 2010)
     Organized by: Indian Institute of Psychometry, Kolkata
     Dr. Debdulal Dutta Roy, Ph.D., Psychology Research Unit, Indian Statistical Institute, Kolkata
     Dr. D. Dutta Roy, ISI., Kolkata
  2. What is SPSS?
     - SPSS originally stood for Statistical Package for the Social Sciences. Because it is also used by many researchers outside the social sciences, it is now regarded as general software for statistical data analysis. SPSS is now owned by IBM.
     [Figure: icons of SPSS]
  3. SPSS facilities
     The software includes several facilities:
     - File management: creating a new file, opening an SPSS-formatted file, importing a non-SPSS file, merging files, splitting a file, transposing data
     - Variable management: creating new variables, recoding variables
     - Case management: adding cases, selecting cases, sorting cases
     - Text data analysis or text analytics: text categorization, text clustering, concept/entity extraction, document summarization, and entity relation modeling (i.e., learning relations between named entities)
     - Numeric data analysis: describing the data, checking data quality, fitting the data to statistical models, data association, data clustering, and data reliability and validity, using different statistical tools
  4. SPSS WORKSHEET
     - Variable view
     - Data view
     Create variables:
     - Name:
     - Type: String, Numeric, Comma and others
     - Width: number of characters or digits
     - Decimals:
     - Label: meaning of the variable code name
     - Values: m=male, f=female or 1=male and 2=female
     - Missing: np / 9 / 99 / extreme values
     - Columns:
     - Align: left, right, center
     - Measure: nominal, ordinal, scale
  5. Assignment
     - In the SPSS worksheet
       - Prepare a worksheet with five variables: gender, first name, middle name, surname and age.
       - Prepare a list of names.
       - Examine their distribution using graphs and tables.
     - Retrieving data from Excel
     - Retrieving data from Notepad
       - Write a line in this form in Notepad: <Ms., Ratna, kumari, Roy, 25>. Retrieve the list using an SPSS command.
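One possible way to read such a comma-separated line is sketched below. It uses DATA LIST with inline data so that it can be run without any external file. The variable names and widths (title, firstname, middlename, surname, age) are assumptions made for illustration, and the sample line is typed without blanks after the commas.

```spss
* Hypothetical variable names and widths; adjust them to your own list.
DATA LIST LIST (",")
  / title (A5) firstname (A20) middlename (A20) surname (A20) age (F3.0).
BEGIN DATA
Ms.,Ratna,kumari,Roy,25
END DATA.
* Show the case that was read.
LIST.
```

To read the same line from a Notepad file instead, a FILE='...' specification can be added to DATA LIST and the BEGIN DATA ... END DATA block omitted.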
  6. Assignment
     Cross tabulation is useful for determining the association between two categorical variables.
     - Prepare an SPSS worksheet to compute a cross tabulation between gender and anxiety.
     - Use both text and numeric data.
     - Compute chi-square.
  7. Solution
  8. Summary - 1
     - SPSS is useful software for the analysis of both text and numeric data.
     - The SPSS worksheet has two views: Data View and Variable View. The latter is used to define and customize the variables.
     - Data saved in an SPSS file can be exported to Excel or text format.
     - Likewise, data saved in Excel or text format can be retrieved into the SPSS worksheet.
  9. SPSS - SYNTAX
  10. What is SPSS-Syntax?
      - Syntax is the set of rules associated with a language or command. SPSS syntax is useful for data management and for archiving the procedure of data analysis. In a dissertation, the presence of syntax helps the examiner to understand the procedure followed by the researcher.
      - Syntax can be written in Notepad or in a Word document. SPSS syntax is the alternative to the point-and-click mode.
      - It is more user friendly, because the user can repeat tasks by rerunning the syntax and can see which procedures were followed in the data analysis.
  11. Problems of point and click
      - The point-and-click procedure produces a great deal of output, some of which is not relevant to the researcher. With syntax, the researcher can restrict the output to what is needed.
      - The point-and-click procedure varies across interfaces and versions of SPSS, whereas syntax works in almost all versions.
      - A statistical tool that is not available in the SPSS menus can be built with syntax if the author knows how to write it, for example moderated regression analysis.
  12. Syntax error
      - A syntax error occurs when the person who wrote the code has not followed the rules of the language or the flow chart, causing the program to fail.
      - The most common errors are a missing command terminator (the period) and a command line that starts in the wrong column. As a general rule, the first line of a command starts in the first column, and each continuation line is indented so that it starts in the second column or later.
  13. Syntax window
      [Figure: syntax window, with the command and the command terminator labelled]
  14. ASSIGNMENT
      - Write the commands below in the syntax window and run the program.
          DESCRIPTIVES VARIABLES = ABANY ABDEFECT ABHLTH ABNOMORE ABPOOR ABRAPE ABSINGLE ADULTS AGE
            /STATISTICS=MEAN STDDEV.
      Observation: Do you get your results? If not, what is missing? Put terminators in both lines and run the program. What is your observation? Can you find the continuation line?
  15. Summary - 2
      - Syntax rules guide the program in analysing the data according to the user's needs.
      - Statements are written systematically in the syntax window, following the syntax rules.
      - Unnecessary output can be avoided by using syntax.
  16. FLOW CHART
  17. What is a flow chart?
      - A flowchart is a means of visually presenting the flow of data through an information processing system, the operations performed within the system, and the sequence in which they are performed.
  18. Standard symbols
      - Start or end of the program
      - Computational steps or processing function of a program
      - Input or output operation
      - Decision making and branching
      - Connector or joining of two parts of a program
  19. Guidelines for flow charting
      In drawing a proper flowchart, all necessary requirements should be listed in logical order. The flowchart should be clear, neat and easy to follow, leaving no room for ambiguity.
      - The usual direction of flow of a procedure or system is from left to right or from top to bottom.
      - Only one flow line should come out of a process symbol.
      - Only one flow line should enter a decision symbol, but two or three flow lines, one for each possible answer, should leave it.
      - Only one flow line is used with a terminal symbol.
      - Write briefly within the standard symbols. Where necessary, use the annotation symbol to describe data or computational steps more clearly.
      - If the flowchart becomes complex, use connector symbols to reduce the number of flow lines. Avoid intersecting flow lines so that the chart communicates more effectively.
      - Ensure that the flowchart has a logical start and finish.
      - It is useful to test the validity of the flowchart by passing simple test data through it.
      Reference: http://www.nos.org/htm/basic2.htm
  21. Summary - 3
      - Every statistical tool rests on a set of specific assumptions. A flow chart helps us to incorporate all the assumptions systematically, which reduces errors in data analysis.
      - Therefore, the syntax writer should study all the assumptions and their systematic use thoroughly before selecting a statistical tool for the analysis.
  22. SYNTAX RULES
  23. Command
      The following rules apply when command files are run in batch mode (for example through INCLUDE or production jobs):
      - Each command must begin in the first column of a new line.
      - Continuation lines must be indented at least one space.
      - The period at the end of the command is optional. (In interactive mode, end each command with a period.)
      If you generate command syntax by pasting dialog box choices into a syntax window, the format of the commands is suitable for any mode of operation.
  24. Variable names
      Variable names ending in a period can cause errors in commands created by the dialog boxes. You cannot create such variable names in the dialog boxes, and you should generally avoid them.
      SPSS command syntax is case insensitive, and three-letter abbreviations can be used for many command specifications. You can use as many lines as you want to specify a single command. You can add space or break lines at almost any point where a single blank is allowed, such as around slashes, parentheses, arithmetic operators, or between variable names. For example,
          FREQUENCIES
            VARIABLES=JOBCAT GENDER
            /PERCENTILES=25 50 75
            /BARCHART.
      and
          freq var=jobcat gender /percent=25 50 75 /bar.
  25. Creating a new variable
      - In some situations a new variable has to be created, for example when you want to add a constant to a variable, multiply it by a weight, or multiply two variables.
      - Use the COMPUTE command.
      - EXERCISE
          * age2 is new variable.
          COMPUTE age2=Age - 5.
          EXECUTE.
          DESCRIPTIVES VARIABLES=age, age2
            /STATISTICS=MEAN STDDEV MIN MAX.

      Descriptive Statistics
                            N     Minimum   Maximum   Mean     Std. Deviation
      Age                   542   7         15        9.54     1.117
      age2                  542   2         10        4.5406   1.11667
      Valid N (listwise)    542
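The other two cases mentioned on this slide, multiplying a variable by a weight and multiplying two variables, could be sketched as follows. The data and the names score, weight, score_x2 and weighted are hypothetical and serve only to make the example runnable on its own.

```spss
* Hypothetical data: score and weight are illustrative names.
DATA LIST FREE / score weight.
BEGIN DATA
10 2
12 3
8 1.5
END DATA.
* Multiply one variable by a constant weight.
COMPUTE score_x2 = score * 2.
* Multiply two variables.
COMPUTE weighted = score * weight.
EXECUTE.
LIST.
```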
  26. Finding a lost file
      A researcher who opens files through the click menu sometimes forgets where a file is located. If the full path is written in a GET FILE command, the file can be reopened from the syntax window.
      - Get the file
        File > New > Syntax
        Write the syntax below:
          GET FILE='c:\windows\desktop\ddr.sav'.
  27. Check your file
      - You can check that the retrieved file is the right one with the DISPLAY command, which lists the variable names.
          GET FILE='E:\ses_data_final.sav'.
          * Display all variables.
          DISPLAY.
          LIST. /* Display data of all variables */
          LIST VARIABLES = <var1>. /* Display data of single variable */
      - Here * begins a comment line, which must end with a period, and /* ... */ places a comment at the end of a command line.
  28. Data checking by total score
      - Data are checked with the IF command. Box 8.5 shows the syntax for checking the data. The assumption here is that the total score should not be more than 10, so the command 'if(total>10) t2=9' is used. After the IF command, an EXECUTE command ending with a period (.) is necessary. Finally, the output file is saved in a specific location.
      - Exercise
          GET FILE='E:\ses_data_final.sav'.
          if(total>10) t2=9.
          Execute.
          LIST variables=name, total, t2.
          save outfile='e:\sesout.sav'.
      Output
          NAME             total   t2
          TANIA PARVIN     8       .00
          BACCHU MONDAL    9       .00
          HABIBUL ISLAM    9       .00
          KARIM RAHAMAN    10      .00
          AKTAR HUSSAIN    10      .00
          LALTU MONDAL     10      .00
          RAHIM RAHAMAN    10      .00
          NOOR ALAM        10      .00
          *****            11      9.00
          SADIK JAMAL      12      9.00
          TAJMIR KHATUN    8       .00
          FIROJ MONDAL     .       .
  29. Is your data good for analysis?
      Data entry error is a serious concern in data analysis. An extreme value, or outlier, is often a sign of such an error. The presence of outliers can change the mean and the standard deviation, and the SD may even become higher than the mean. It is not necessary to delete an outlier straight away, because outliers sometimes carry valid information, for example about inequality in the distribution of the data. Finding the outliers is nevertheless important. A box-and-whisker plot is useful for this. Write this in the syntax window:
          EXAMINE VARIABLES=abany abdefect
            /COMPARE VARIABLE
            /PLOT=BOXPLOT
            /STATISTICS=NONE
            /NOTOTAL
            /MISSING=LISTWISE.
      - Another way is to study the frequencies of the variables.
          Frequencies variables=abany.
  30. How can you find case errors?
      - A box-and-whisker plot cannot always detect cases that have made a systematic error. Suppose you have collected job satisfaction data using a five-point rating scale of 20 items, of which 10 items are reverse scored, and one case assigns 3 to every item. The box plot cannot locate this case.
      - In such a situation, you can transpose the data and compute the mean and SD for each case. A case error can be suspected if the SD is 0.00 or is higher than the mean. The FLIP command transposes the data.
      EXERCISE
          FLIP VARIABLES=
          DESCRIPTIVES VARIABLES=
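As an alternative to FLIP (a different technique from the one in the exercise), the per-case mean and SD can also be computed directly with the MEAN and SD functions of COMPUTE. The sketch below assumes a hypothetical five-item scale with variables id and item1 to item5; the second case answers 3 on every item, so its SD is zero.

```spss
* Hypothetical five-item scale; the second case answers 3 throughout.
DATA LIST FREE / id item1 item2 item3 item4 item5.
BEGIN DATA
1 4 2 5 1 4
2 3 3 3 3 3
3 5 1 4 2 5
END DATA.
* Mean and SD across the items, computed separately for every case.
COMPUTE case_mean = MEAN(item1 TO item5).
COMPUTE case_sd = SD(item1 TO item5).
EXECUTE.
LIST VARIABLES=id case_mean case_sd.
```

The keyword TO relies on item1 to item5 being adjacent in the file.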
  31. Relational operator
      - A relational operator is used to compare values. It is used with the IF command.
      - A relation is a logical expression that compares two values using a relational operator. In the command
          IF (X EQ 0) Y=1.
        the variable X and 0 are expressions that yield the values to be compared by the EQ relational operator. The following are the relational operators:

          Symbol                   Definition
          EQ or =                  Equal to
          NE or ~= or ¬= or <>     Not equal to
          LT or <                  Less than
          LE or <=                 Less than or equal to
          GT or >                  Greater than
          GE or >=                 Greater than or equal to
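A minimal sketch of relational operators inside IF commands, using made-up values of a hypothetical variable x. The keyword form (EQ) and the symbol form (>=) can be mixed freely.

```spss
* Hypothetical data: three cases with x = 0, 3 and 7.
DATA LIST FREE / x.
BEGIN DATA
0 3 7
END DATA.
* Start every case at 0, then recode with relational operators.
COMPUTE y = 0.
IF (x EQ 0) y = 1.
IF (x >= 5) y = 2.
EXECUTE.
LIST.
```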
  32. Select case
      When the researcher wants to compute statistics for specific cases only, the SELECT IF command is useful.
          SELECT IF (AGE=8).
          DESCRIPTIVES VARIABLES=ACH.
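SELECT IF on its own removes the unselected cases from the active dataset for the rest of the session. If the selection is meant for one analysis only, it can be preceded by TEMPORARY, as in the sketch below. The data values are invented for illustration; age and ach follow the variable names on the slide.

```spss
* Hypothetical data: age and achievement score for four cases.
DATA LIST FREE / age ach.
BEGIN DATA
8 40 8 45 9 50 10 55
END DATA.
* The selection applies only to the next procedure.
TEMPORARY.
SELECT IF (age = 8).
DESCRIPTIVES VARIABLES=ach.
* All four cases are available again here.
DESCRIPTIVES VARIABLES=ach.
```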
  33. Command to filter variable
      The researcher can analyse the data of a specific group. Box 8.2 shows the syntax for descriptive statistics of age for the cases living in a specific block of a district (code=1). Unlike SELECT IF, FILTER keeps the unselected cases in the file and only excludes them from the analysis.
          USE ALL.
          COMPUTE filter_$=(Block_code=1).
          VARIABLE LABEL filter_$ 'Block_code=1 (FILTER)'.
          VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
          FORMATS filter_$ (f1.0).
          FILTER BY filter_$.
          EXECUTE.
          DATASET ACTIVATE DataSet1.
          DESCRIPTIVES variables=age.
  34. Summary - 4
      - Syntax rules are important for writing programs in the syntax window.
      - By writing programs, one can import and export files, check a file, list variables, evaluate data entry errors, create new variables, select cases and filter cases.
  35. STATISTICAL ANALYSIS
  36. Item-item correlation of five point rating scale
          GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
          CORRELATIONS
            /VARIABLES=AW1 AW2 AW6 AW10 AW18 AW19
            /PRINT=TWOTAIL NOSIG
            /MISSING=PAIRWISE.
      - There are 6 items measuring awareness of the environment. It is assumed that the 6 items are related to each other. One can also write AW1 TO AW19, which takes every variable located between AW1 and AW19 in the file.
      - This program assesses the intercorrelations among the 6 items.
      - Cases with missing data are deleted pairwise, and the level of significance is shown.
      - The two-tailed test is applicable when the direction of the relationship is not assumed in advance.
      - NOSIG is used to flag significant values.
  37. Item total correlations
          GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
          compute total=AW1 + AW2 + AW6 + AW10 + AW18 + AW19.
          CORRELATIONS
            /VARIABLES=AW1 to AW19, total
            /PRINT=TWOTAIL NOSIG
            /MISSING=PAIRWISE.
      - The COMPUTE command is used to obtain the total score, which is then used for the item-total correlations.
  38. Multiple regression
          GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
          compute total=AW1 + AW2 + AW6 + AW10 + AW18 + AW19.
          REGRESSION
            /MISSING LISTWISE
            /STATISTICS COEFF OUTS R ANOVA CHANGE
            /CRITERIA=PIN(.05) POUT(.10)
            /NOORIGIN
            /DEPENDENT total
            /METHOD=ENTER AW1 AW2 AW6 AW10 AW18.
      Select all the commands before running them; otherwise the COMPUTE command is not executed and the total score is not available. In this model the total score is predicted from the individual items.
  39. Mean differences
      When data are collected from two different groups, the command for the independent-samples t-test is
          T-TEST GROUPS=IC3(3)
            /MISSING=LISTWISE
            /VARIABLES=total
            /CRITERIA=CI(.9500).
      - Here IC3 is the independent variable and total is the dependent variable.
      - IC3(3) uses 3 as the cut-off point for forming the two groups.
      - IC3(1 2) forms the groups from the values 1 and 2.
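The second form of the GROUPS specification, with two explicit group codes, might look like the sketch below. The six cases are invented so that the example runs on its own; ic3 and total follow the variable names on the slide.

```spss
* Hypothetical data: group code (1 or 2) and total score.
DATA LIST FREE / ic3 total.
BEGIN DATA
1 20 1 22 1 19 2 25 2 27 2 24
END DATA.
* Compare the cases coded 1 with the cases coded 2.
T-TEST GROUPS=ic3(1 2)
  /VARIABLES=total
  /CRITERIA=CI(.95).
```

With a single cut-off value such as ic3(3), cases at or above the cut-off form one group and the remaining cases form the other.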
  40. Chi-square statistics
          CROSSTABS
            /TABLES=AW1 BY AW2
            /FORMAT=AVALUE TABLES
            /STATISTICS=CHISQ PHI
            /CELLS=COUNT
            /COUNT ROUND CELL.
      - This examines the association between two items. For multiple items the subcommand is
          /TABLES=AW1 BY AW2 AW10 AW18 AW19 AW6
      - Here AW1 is the row variable and the others are column variables.
  41. ONE-WAY ANOVA
          ONEWAY total BY EXP
            /MISSING ANALYSIS.
      - Here total is the dependent variable.
      - EXP is the independent variable.
  42. COMPUTE SIZE OF SAMPLE
          /*----------------------------- GETTING INPUT FILE ------------------------------------------ .
          GET FILE='C:\Users\ddroy\Desktop\IIP_SPSS syntax_workshop\innovation data.sav'.
          /*----------------------------- SIZE OF SAMPLE ------------------------------------------ .
          compute n=0.
          compute n=n+1.
          descriptives n, AW1.
      - n=0 is the initialization. n=n+1 adds 1 as SPSS passes through each case, so n equals 1 in every case, and the N column of the DESCRIPTIVES output gives the number of cases, that is, the size of the sample.
      - DESCRIPTIVES <n, AW1> allows the N of the computed variable n to be compared with the N of AW1.
      - Here AW1 (numeric type, scale measure) is used to verify the computed N, or size of the sample.
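If a running case counter is wanted, one option (not the method on the slide) is the system variable $CASENUM, which holds the sequence number of each case. In the sketch below the five values of aw1 are invented for illustration; the maximum of n then equals the number of cases in the file.

```spss
* Hypothetical data: five cases of one item.
DATA LIST FREE / aw1.
BEGIN DATA
4 5 3 2 5
END DATA.
* n takes the sequence number of each case: 1, 2, 3 and so on.
COMPUTE n = $CASENUM.
DESCRIPTIVES VARIABLES=n aw1
  /STATISTICS=MAX.
```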
  43. Summary - 5
      - SPSS syntax makes the researcher more systematic in the analysis of data. By writing programs, the researcher can check all the assumptions of a statistical tool systematically.
      - The COMPUTE command is very powerful, because it helps the researcher to write their own programs for data analysis.
