PSPP overview and Introduction to R & R Commander

MSc. (Medical Administration), 2014 batch

  1. Application Software: Statistical Software. MSc. (Medical Administration) Program 2014, Post Graduate Institute of Medicine, University of Colombo, Sri Lanka. Dr. B.D.W. Jayamanne, M.B.B.S., MSc. (Biostatistics), MSc. (Biomedical Informatics). 17-02-2014
  2. Outline 1 • Statistics: overview • Data processing • Data types in computing • Data representation in computers • Data analysis with computers: o variable types o choice of test • Statistical software/packages: overview o stand-alone (FOSS / proprietary) o online resources • Data entry options: o spreadsheet o database o statistical software © bdwjayamanne@gmail.com/djayamanne@yahoo.com
  3. Outline 2 • PSPP software • How to construct a PSPP data file: o text variables o numeric variables • How to import files in other formats • How to recode variables • Processing data • How to analyse (parametric / non-parametric): o frequency o bivariate analysis: cross tab, correlation, t test • Subgroup selection • Introduction to R software
  4. Session 01: Overview
  5. The research process: 8-step model
  6. Statistics • What is statistics? The science of collecting and analysing data and making inferences / drawing conclusions from them: • Collection • Analysis • Making inference (* the word "statistic" has a different meaning: a single value calculated from sample data, such as the sample mean)
  7. Variable • Variable: a quantity that varies from one unit to another is referred to as a variable (e.g., height, weight, blood pressure, crop yield); one value is not sufficient to describe it. • Discrete: a fixed number of possibilities (e.g., blood group). • Continuous: an infinite number of possibilities, even within a finite interval (e.g., blood pressure).
  8. Constant • Constant: the opposite of a variable. If a quantity does not vary from one unit to another, it is referred to as a constant (e.g., the density of an element); one value is sufficient.
  9. Data Processing: Steps • Raw data (interviews, questionnaires, observations, interview guides, secondary sources) -> Editing -> Coding (codebook; coding the data; verifying the coded data) -> Analysis (develop a frame of analysis; analysis)
  10. Data Editing • Scrutinizing the completed research instruments to identify and minimize: o errors o incompleteness o misclassification o information gaps • Two ways: o one variable at a time o one questionnaire at a time
  11. Data Types in Computers • Boolean • Text / Character / String: o single o multiple • Numeric: o integer o decimal • Date / time
  12. Levels of Measurement in Statistics 1. Nominal scale: a. only indicates the category; b. e.g., religion: Buddhism, Christianity, Hinduism. 2. Ordinal scale: a. in addition to the category, allows cases to be ordered by degree according to the measurement; b. e.g., very poor, poor, OK, good, excellent. 3. Interval scale: a. has units measuring intervals of equal distance between values on a linear scale; b. no true zero; c. e.g., temperature in Celsius, date, latitude. 4. Ratio scale: a. has a true zero, so ratios between values are meaningful; b. like the interval scale, has equal intervals on a linear scale (e.g., weight, height, age).
  13. Data Type & Scale of Measurement • Data types: Boolean, Text, Numeric, Date/Time • Scales of measurement: Nominal, Ordinal, Interval & Ratio
  14. Data Type & Scale of Measurement • Identifying the correct data type for the scale of measurement is very important before data entry. • If wrongly applied: o appropriate analysis can't be done o wrong conclusions may result o the issues can be corrected by recoding o or the data can be re-entered
  15. Coding of Questions • Open ended: o text (e.g., name) o number (e.g., age) • Structured (closed ended): o single answer: yes/no, true/false; Likert scale (Agree -> Strongly disagree); multiple options/list with one answer o more than one answer: multiple options/list o combined
  16. Coding of Questions (example) 1. Age: ___ 2. Have you obtained any postgraduate qualifications? 1. Yes 2. No 3. We do not have to worry because Sri Lanka is not much affected by climate change. 1. Strongly agree 2. Agree 3. No opinion 4. Disagree 5. Strongly disagree 4. You obtain information from: 1. TV 2. Radio 3. Newspapers 4. Internet 5. Journals 6. Books
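
The coding step can be sketched in Python (a minimal illustration, not a PSPP feature): a codebook maps each numeric code to its label, which is the same idea PSPP stores as value labels. The variable names `postgrad` and `climate_worry` and the helper functions are hypothetical.

```python
# Hypothetical codebook for questions 2 and 3 of the example questionnaire.
# Keys are the numeric codes entered in the data file; values are the labels.
CODEBOOK = {
    "postgrad": {1: "Yes", 2: "No"},
    "climate_worry": {
        1: "Strongly agree",
        2: "Agree",
        3: "No opinion",
        4: "Disagree",
        5: "Strongly disagree",
    },
}


def encode(variable, label):
    """Return the numeric code for an answer; KeyError if it is not in the codebook."""
    for code, text in CODEBOOK[variable].items():
        if text == label:
            return code
    raise KeyError(f"{label!r} is not a valid answer for {variable!r}")


def decode(variable, code):
    """Return the label for a numeric code (what a value label displays)."""
    return CODEBOOK[variable][code]


# One respondent's answers, coded before data entry.
row = {"age": 34,
       "postgrad": encode("postgrad", "Yes"),
       "climate_worry": encode("climate_worry", "Disagree")}
print(row)  # {'age': 34, 'postgrad': 1, 'climate_worry': 4}
```

Entering codes rather than free text keeps each answer consistent and lets the software count and cross-tabulate the categories directly.
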
  17. A Good Data File Should Have... • correct coding of questions • correct data type • correct scale of measurement. Good data file -> good analysis.
  18. If not… • error correction • cleaning of data • recoding • import/export • re-entering • hand calculations?? Wasted time and resources?? Distress??
  19. Statistical Software • Proprietary • Free • Online
  20. Proprietary Software List (familiar), 2014 • SPSS • MiniTAB • SAS • STATA • LISREL • MedCalc • STATISTICA • etc. Prices shown: US $5,500; US $1,400; US $1,440; US $620
  21. Free Software List (?? unfamiliar) • PSPP: analog for SPSS • R and supportive packages: o R Commander o Red R • Epi Info (7) • Epi Data • Win PEPI • OpenEpi: online, free
  22. Data Entry Options • Not necessarily a statistical software package • Spreadsheet: o OpenOffice Calc o MS Excel o Google Spreadsheet • Database: o MS Access, etc. • Statistical package: o Epi Data o Epi Info o PSPP / SPSS, etc.
  23. Session 02: Using Statistical Software PSPP
  24. Working with PSPP • Version 0.8.x • Perfect Statistics Professionally Presented! • Probabilities Sometimes Prevent Problems! • People Should Prefer PSPP!!
  25. Introduction to PSPP • How to download and install: o free download o easy to install o lightweight (small in size) o http://pspp.awardspace.com/ or a simple Google search for "download PSPP" • Similar features to SPSS: o layout o menus o commands o scripts • Data file and script compatibility with SPSS
  26. Advantages of PSPP • Free download / no subscription fees • Compatible with SPSS data files (similar format) • Compatible with SPSS scripts • Multiplatform: has Linux versions (cross-platform portability) • Faster than SPSS • More than 1 billion variables (SPSS: 2.15 billion; Excel: about 16,000 columns) • More than 1 billion cases (SPSS: 2.15 billion; Excel: about 1 million rows)
  27. Windows in PSPP 1. Data Editor (default): a. Data View b. Variable View 2. Output window 3. Syntax editor
  28. Data Editor • Provides a convenient, spreadsheet-like method for creating and editing data files. • This window opens automatically when you start a session. • Switch windows
  29. Toolbar (Data View) • Open file (data, syntax (script), etc.) • Save file • Jump to case • Jump to variable
  30. Data View • This view displays the actual data values or defined value labels.
  31. Data View • Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case. • Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each item on a questionnaire is a variable. • Cells contain values. Each cell contains a single value of a variable for a case; the cell is where the case and the variable intersect. Cells contain only data values. Unlike spreadsheet programs, cells in the Data Editor cannot contain formulas.
  32. View Data Labels • Menu • Toolbar
  33. Variable View • This view displays variable definition information, including defined variable and value labels, data type (for example, string, date, or numeric), measurement level (nominal, ordinal, or scale), and user-defined missing values.
  34. Variable View • Variable View contains descriptions of the attributes of each variable in the data file. In Variable View: • Rows are variables. • Columns are variable attributes. • You can add or delete variables and modify attributes of variables, including the following: o variable name o data type o number of digits or characters o number of decimal places o descriptive variable and value labels o user-defined missing values o column width o measurement level • All of these attributes are saved when you save the data file.
  35. Variable View
  36. Variable Name • Each variable name must be unique; duplication is not allowed. • Variable names can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $. Subsequent characters can be any combination of letters, numbers and certain other characters such as the underscore (_). • Variable names cannot contain spaces; use underscores instead (e.g., blood_pressure). • Reserved keywords cannot be used as variable names. The reserved keywords are ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH.
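
The naming rules on this slide can be expressed as a small Python check. This is a simplified sketch, not PSPP's own validator: it accepts only ASCII letters, digits and underscores after the first character, and it compares names case-insensitively when checking for duplicates (an assumption).

```python
import re

# Reserved keywords listed on the slide (compared case-insensitively)
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# Simplified rule: first character a letter or @, # or $;
# later characters letters, digits or underscore (so no spaces)
NAME_PATTERN = re.compile(r"[A-Za-z@#$][A-Za-z0-9_]*")


def name_problems(name, existing):
    """Return a list of rule violations for a proposed variable name."""
    problems = []
    if len(name.encode("utf-8")) > 64:
        problems.append("longer than 64 bytes")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("invalid first character, space or other character")
    if name.upper() in RESERVED:
        problems.append("reserved keyword")
    if name.upper() in {n.upper() for n in existing}:
        problems.append("duplicate name")
    return problems


print(name_problems("blood_pressure", {"age"}))   # []
print(name_problems("blood pressure", {"age"}))   # space is not allowed
print(name_problems("AND", set()))                # ['reserved keyword']
```
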
  37. Variable Type • Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use Variable Type to change the data type.
  38. Variable Labels • You can assign descriptive variable labels up to 256 characters (128 characters in double-byte languages). Variable labels can contain spaces and reserved characters that are not allowed in variable names. Missing Values • Missing Values defines specified data values as user-missing. For example, you might want to distinguish between data that are missing because a respondent refused to answer and data that are missing because the question didn't apply to that respondent. Data values specified as user-missing are flagged for special treatment and are excluded from most calculations.
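
To see why user-missing codes matter, here is a minimal Python sketch with hypothetical codes 98 (not applicable) and 99 (refused) on an age variable; leaving them in the calculation would inflate the mean.

```python
from statistics import mean

# Hypothetical user-missing codes for an age variable:
# 98 = question not applicable, 99 = respondent refused to answer
USER_MISSING = {98, 99}

ages = [34, 41, 99, 29, 98, 52]

valid = [a for a in ages if a not in USER_MISSING]   # exclude user-missing
print(len(valid), mean(valid))   # 4 39
print(mean(ages))                # wrongly treats 98 and 99 as real ages
```
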
  39. Value Labels • You can assign descriptive value labels for each value of a variable. This is particularly useful if your data file uses numeric codes to represent non-numeric categories (for example, codes of 1 and 2 for male and female). • Value labels are saved with the data file; you do not need to redefine them each time you open the data file.
  40. Variable Measurement Level • Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, zip code, and religious affiliation. • Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores. • Scale. A variable can be treated as scale when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.
  41. Variable Measurement Level

| Question | Nominal | Ordinal | Interval |
|---|---|---|---|
| Are there different categories? | Yes | Yes | Yes |
| Can I rank the categories? | No | Yes | Yes |
| Can I specify the difference between categories numerically? | No | No | Yes |
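
The three questions in the table work as a decision rule, sketched here as a hypothetical Python function. The "no categories" case is treated as a constant, following the definition of a constant given earlier.

```python
def measurement_level(has_categories, can_rank, numeric_difference):
    """Answer the three questions in the table, in order."""
    if not has_categories:
        raise ValueError("no distinct categories: a constant, not a variable")
    if not can_rank:
        return "Nominal"
    if not numeric_difference:
        return "Ordinal"
    return "Interval"


print(measurement_level(True, False, False))   # religion -> Nominal
print(measurement_level(True, True, False))    # satisfaction rating -> Ordinal
print(measurement_level(True, True, True))     # temperature in Celsius -> Interval
```
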
  42. Importing Data Files: Spreadsheets • The spreadsheet must be compatible with the data file structure: one row per case and one column per variable.
  43. Importing Data Files: Spreadsheets • From the menus choose: File » Open » Import Data » select All spreadsheets as the file type you want to view » open the *.xls file • Sources: Access database, Excel, other
  44. Data entry with value labels
  45. Compute Variables • From simple to complex expressions (adding, subtracting, multiplying, ...) • Type conversions
  46. Recode Variables • Recode into Same Variables • To recode values of a variable
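
A minimal Python sketch of computing a new variable, assuming hypothetical variables `weight_txt` (weight in kg, stored as text) and `height_cm`. It shows both a type conversion and an arithmetic expression (body mass index).

```python
# Hypothetical data: weight entered as text (kg), height as a number (cm).
records = [
    {"weight_txt": "72.5", "height_cm": 170},
    {"weight_txt": "60", "height_cm": 158},
]

for r in records:
    weight_kg = float(r["weight_txt"])               # type conversion: string -> numeric
    height_m = r["height_cm"] / 100                  # simple arithmetic
    r["bmi"] = round(weight_kg / height_m ** 2, 1)   # new computed variable

print([r["bmi"] for r in records])
```
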
  47. Recode into Same Variables • The Recode into Same Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values. For example, you could collapse salaries into salary range categories. • You can recode numeric and string variables. If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together.
  48. Recode into Different Variables • The Recode into Different Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values for a new variable. For example, you could collapse salaries into a new variable containing salary-range categories. • You can recode numeric and string variables. • You can recode numeric variables into string variables and vice versa. • If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together.
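
Recoding into a different variable can be sketched in Python as a function that collapses ages into groups. The cut-points and codes are hypothetical; the original list is left unchanged, as with Recode into Different Variables.

```python
def recode_age(age):
    """Collapse age in years into group codes (hypothetical cut-points).

    1 = under 30, 2 = 30-44, 3 = 45-59, 4 = 60 and over
    """
    if age < 30:
        return 1
    if age < 45:
        return 2
    if age < 60:
        return 3
    return 4


ages = [25, 30, 44, 45, 61]
age_group = [recode_age(a) for a in ages]   # new variable; ages is unchanged
print(age_group)  # [1, 2, 2, 3, 4]
```
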
  49. Analysis

| Univariate | Bivariate | Multivariate |
|---|---|---|
| Frequency distribution | Crosstabulation | Conditional tables |
| | Scattergrams | Partial rank order correlation |
| | Regression | Multiple and partial correlation |
| | Rank order correlation | Multiple and partial regression |
| | Comparison of means | Path analysis |
  50. Univariate: Frequency • The first thing to do when all the data are collected is to count how many people gave each particular answer to each question. • We look at how the sample is spread, or distributed, across the categories of each variable.
  51. Univariate: Frequency...
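
A frequency distribution is a count per category plus a percentage. A minimal Python sketch using `collections.Counter` on hypothetical answers to the climate-change question:

```python
from collections import Counter

# Hypothetical answers to the climate-change question
answers = ["Agree", "Disagree", "Agree", "No opinion", "Agree", "Strongly agree"]

counts = Counter(answers)
n = len(answers)
for category, count in counts.most_common():   # most frequent first
    print(f"{category:<16}{count:>3}{100 * count / n:>7.1f}%")
```
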
  52. Measuring Central Tendency • One of the most important ways of summarizing the distribution of values of a variable is to establish its central tendency. • Central tendency: the typical value in a distribution, measured by: o the arithmetic mean o the median o the mode
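
Python's standard `statistics` module computes all three measures; the blood pressure readings below are hypothetical. Here the mean (about 131) is pulled above the median (122) by the two high readings.

```python
from statistics import mean, median, mode

systolic_bp = [118, 122, 130, 122, 145, 160, 122]   # hypothetical readings, mmHg

print(mean(systolic_bp))    # arithmetic mean
print(median(systolic_bp))  # middle value after sorting
print(mode(systolic_bp))    # most frequent value
```
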
  53. Measuring Central Tendency...
  54. Measuring Central Tendency...
  55. Measuring Dispersion • The amount of variation shown by a distribution is called dispersion. Measures: o range o variance o standard deviation • Range: the difference between the highest and lowest values in a distribution. • Variance: the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n - 1). • Standard deviation: the square root of the variance.
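
The same `statistics` module gives the dispersion measures. Note that `variance` and `stdev` use the sample formula (divide by n - 1), while `pvariance` divides by n; the data are hypothetical.

```python
from statistics import mean, pvariance, stdev, variance

values = [4, 8, 6, 5, 3, 7]   # hypothetical data

value_range = max(values) - min(values)
m = mean(values)
manual_var = sum((x - m) ** 2 for x in values) / (len(values) - 1)

print(value_range)          # 5
print(manual_var)           # sample variance computed by hand
print(variance(values))     # sample variance, divides by n - 1
print(pvariance(values))    # population variance, divides by n
print(stdev(values))        # sample standard deviation = sqrt(variance)
```
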
  56. Measuring Dispersion...
  57. Measuring Dispersion...
  58. Measuring Dispersion...
  59. Group Selection (Select Cases) • Run a specified analysis for one category / a selected group
  60. Bivariate Analysis • The aim of bivariate analysis is to see whether two variables are related. o Cross tabulation o Bivariate correlation o t test
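
Select Cases restricts an analysis to the cases that meet a condition. The Python equivalent is a filter applied before the calculation; the cases are hypothetical, with sex coded 1 = male and 2 = female as in the value-label example.

```python
from statistics import mean

# Hypothetical cases: sex coded 1 = male, 2 = female; age in years.
cases = [
    {"sex": 1, "age": 34}, {"sex": 2, "age": 29},
    {"sex": 2, "age": 41}, {"sex": 1, "age": 50},
]

females = [c for c in cases if c["sex"] == 2]   # the selection condition
print(len(females), mean(c["age"] for c in females))
```
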
  61. Crosstabulation • Crosstabulations are a way of displaying data so that we can fairly readily detect an association between two variables. Steps of crosstabulation: • Determine which variable is to be treated as independent. • The independent variable is usually placed across the top of the table, with a column for each category of that variable. • The dependent variable is usually placed down the side of the table, with a row for each category of that variable. • Compare the percentages for each subgroup of the independent variable within one category of the dependent variable at a time.
  62. Crosstabulation
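
The crosstabulation steps can be sketched in Python with hypothetical data: sex as the independent variable (columns) and holding a postgraduate qualification as the dependent variable (rows). Percentages are computed within each column so they can be compared across a row, as the last step describes.

```python
from collections import Counter

# Hypothetical observations: (sex, has postgraduate qualification)
pairs = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "No"), ("Male", "No"),
    ("Female", "Yes"), ("Female", "Yes"), ("Female", "No"),
]

cells = Counter(pairs)                          # count per (column, row) cell
col_totals = Counter(sex for sex, _ in pairs)   # total per independent category
columns = ["Male", "Female"]   # independent variable across the top
rows = ["Yes", "No"]           # dependent variable down the side


def column_percent(col, row):
    """Percentage of the column's cases that fall in this row."""
    return 100 * cells[(col, row)] / col_totals[col]


print(" " * 6 + "".join(f"{c:>10}" for c in columns))
for r in rows:
    print(f"{r:<6}" + "".join(f"{column_percent(c, r):>9.1f}%" for c in columns))
```
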
  63. Bivariate Correlation (Pearson correlation only)
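
Pearson's r is the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch from that definition (the height and weight values are hypothetical):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


height_cm = [150, 160, 165, 170, 180]   # hypothetical
weight_kg = [52, 58, 63, 66, 75]
print(round(pearson_r(height_cm, weight_kg), 3))
```
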
  64. t Test • A t-test is used to determine whether a set or sets of scores come from the same population. • There are three main types of t-test: o One-sample o Independent groups o Repeated measures / related samples • Assumptions
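
The t statistics for the three types can be sketched in Python using only the standard library. These are minimal illustrations: the independent-groups version assumes equal variances (pooled), and the p-value, which needs the t distribution with the returned degrees of freedom (for example `scipy.stats.t.sf`), is not computed. The usual assumptions are approximately normal data and, for the pooled version, similar group variances.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(xs, mu0):
    """One-sample t: compare a sample mean with a hypothesised mean mu0."""
    n = len(xs)
    t = (mean(xs) - mu0) / (stdev(xs) / sqrt(n))
    return t, n - 1                      # (t statistic, degrees of freedom)


def paired_t(before, after):
    """Related samples: a one-sample t test on the differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)


def independent_t(x, y):
    """Independent groups, pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical data
print(one_sample_t([4, 8, 6, 5, 3, 7], 5))
print(paired_t([10, 12, 14], [11, 14, 17]))
print(independent_t([1, 2, 3], [4, 5, 6]))
```
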
  65. One-Sample
  66. Independent Groups
  67. Related Groups
  68. R software: http://cran.r-project.org/bin/windows/base/
  69. R Commander
  70. R Commander: Import data files
  71. R Commander: Menu commands
  72. Thank you. Have a statistically significant day!
