Analyzing Quantitative Data



In writing this book, I tried not to assume that readers have grasped the intricacies of quantitative data analysis; as such, I have provided the apparatus and the solutions needed to analyze data from stated hypotheses. The purpose of this approach is for junior researchers to understand the material thoroughly while recognizing the importance of hypothesis testing in scientific inquiry.

Published in: Technology, Education


  1. 1. A Simple Guide to the Analysis of Quantitative Data: An Introduction with hypotheses, illustrations and references By Paul Andrew Bourne
  2. 2. A Simple Guide to the Analysis of Quantitative Data: An Introduction with hypotheses, illustrations and references By Paul Andrew Bourne, Health Research Scientist, Department of Community Health and Psychiatry, Faculty of Medical Sciences, The University of the West Indies, Mona Campus, Kingston, Jamaica
  3. 3. © Paul Andrew Bourne 2009 A Simple Guide to the Analysis of Quantitative Data: An Introduction with hypotheses, illustrations and references. The copyright of this text is vested in Paul Andrew Bourne, and the Department of Community Health and Psychiatry is the publisher; no chapter may be reproduced wholly or in part without the express written permission of both author and publisher. All rights reserved. Published April 2009, Department of Community Health and Psychiatry, Faculty of Medical Sciences, The University of the West Indies, Mona Campus, Kingston, Jamaica. National Library of Jamaica Cataloguing in Publication Data: a catalogue record for this book is available from the National Library of Jamaica. ISBN 978-976-41-0231-1 (pbk). Covers were designed and the photograph taken by Paul Andrew Bourne.
  4. 4. Table of Contents (Page)
     Preface 8
     Menu bar – Contents of the Menu bar in SPSS 11
     Function – Purposes of the different things on the menu bar 12
     Mathematical symbols (numeric operations), in SPSS 13
     Listing of Other Symbols 14
     The whereabouts of some SPSS functions, or commands 16
     Disclaimer 19
     Coding Missing Data 20
     Computing Date of Birth 21
     List of Figures 26
     List of Tables 29
     How do I obtain access to the SPSS PROGRAM? 35
     1. INTRODUCTION 43
     1.1.0a Steps in the analysis of hypothesis 45
     1.1.1a Operational definitions of a variable 47
     1.1.1b Typologies of variable 49
     1.1.1 Levels of measurement 50
     1.1.3 Conceptualizing descriptive and inferential statistics 59
     2. DESCRIPTIVE STATISTICS ANALYZED 62
     2.1.1 Interpreting data based on their levels of measurement 64
     2.1.2 Treating missing (i.e. non-response) cases 84
     3. HYPOTHESES: INTRODUCTION 87
     3.1.1 Definitions of Hypotheses 88
     3.1.2 Typologies of Hypothesis 89
     3.1.3 Directional and non-Directional Hypotheses 90
     3.1.4 Outliers (i.e. skewness) 91
     3.1.5 Statistical approaches for treating skewness 93
     4. Hypothesis 1 [using Cross tabulations and Spearman ranked ordered correlation] 96
     A1. Physical and social factors and instructional resources will directly influence the academic performance of students who will write the Advanced Level Accounting Examination;
     A2. Physical and social factors and instructional resources positively influence the academic performance of students who write the Advanced Level Accounting examination and that the relationship varies according to gender;
  5. 5. B1. Pass successes in Mathematics, Principles of Accounts and English Language at the Ordinary/CXC General level will positively influence success on the Advanced Level Accounting examination;
     B2. Pass successes in Mathematics, Principles of Accounts and English Language at the Ordinary.
     5. Hypothesis 2 [using Crosstabulations] 152
     There is a relationship between religiosity, academic performance, age and marijuana smoking of post-primary school students, and this relationship varies based on gender.
     6. Hypothesis 3 [Paired Sample t-test] 164
     There is a statistical difference between the pre-Test and the post-Test scores.
     7. Hypothesis 4 [using Pearson Product Moment Correlation] 184
     Ho: There is no statistical relationship between expenditure on social programmes (public expenditure on education and health) and levels of development in a country; and H1: There is a statistical association between expenditure on social programmes (i.e. public expenditure on education and health) and levels of development in a country
     8. Hypothesis 5 [using Logistic Regression] 199
     The health care seeking behaviour of Jamaicans is a function of educational level, poverty, union status, illnesses, duration of illnesses, gender, per capita consumption, ownership of health insurance policy, and injuries. [Health Care Seeking Behaviour = f(educational levels, poverty, union status, illnesses, duration of illnesses, gender, per capita consumption, ownership of health insurance policy, injuries)]
     9. Hypothesis 6 [using Linear Regression] 207
     There is a negative correlation between access to tertiary level education and poverty, controlled for sex, age, area of residence, household size, and educational level of parents
     10. Hypothesis 7 [using Pearson Product Moment Correlation Coefficient and Crosstabulations] 223
     There is an association between the introduction of the Inventory Readiness Test and the Performance of Students in Grade 1
  6. 6. 11. Hypothesis 8 [using Spearman rho] 232
     People who perceive themselves to be in the upper and middle classes, more so than those in the lower (or working) class, strongly believe that acts of incivility are caused only by persons in garrison communities
     12. Hypothesis 9 235
     Various cross tabulations
     13. Hypothesis 10 [using Pearson and Crosstabulations] 249
     There is no statistical difference between the typology of workers in the construction industry and how they view the top 10 productivity outcomes
     14. Hypothesis 11 [using Crosstabulations and Linear Regression] 265
     Determinants of the academic performance of students
     15. Hypothesis 12 [using Spearman ranked ordered correlation] 278
     People who perceive themselves to be within the lower social status (i.e. class) are more likely to be uncivil than those of the upper classes.
     16. Data Transformation 281
     Recoding 291
     Dummying variables 309
     Summing similar variables 331
     Data reduction 340
     Glossary 350
     Reference 352
     Appendices 356
     Appendix 1 – Labeling non-responses 356
  7. 7. Appendix 2 – Statistical errors in data 357
     Appendix 3 – Research Design 359
     Appendix 4 – Example of Analysis Plan 366
     Appendix 5 – Assumptions in regression 367
     Appendix 6 – Steps in running a bivariate cross tabulation 368
     Appendix 7 – Steps in running a trivariate cross tabulation 380
     Appendix 8 – What is placed in a cross tabulations table, using the above SPSS output 394
     Appendix 9 – How to run a Regression in SPSS 395
     Appendix 10 – Running Regression in SPSS 396
     Appendix 11a – Interpreting strength of associations 407
     Appendix 11b – Interpreting strength of association 408
     Appendix 12 – Selecting cases 409
     Appendix 13 – 'UNDO' selecting cases 417
     Appendix 14 – Weighting cases 420
     Appendix 15 – 'Undo' weighting cases 429
     Appendix 15 – Statistical symbolisms 440
     Appendix 16 – Converting from 'string' to 'numeric' data
     Apparatus One – Converting from string data to numeric data 443
     Apparatus Two – Converting from alphabetic and numeric data to all 'numeric' data 447
     Appendix 17 – Steps in running Spearman rho 454
     Appendix 18 – Steps in running Pearson's Product Moment Correlation 459
     Appendix 19 – Sample sizes and their appropriate sampling error 464
     Appendix 20 – Calculating sample size from sampling error(s) 465
     Appendix 21 – Sample sizes and their sampling errors 467
     Appendix 22 – Sample sizes and their sampling errors 468
     Appendix 23 – If conditions 469
     Appendix 24 – The meaning of ρ value 477
     Appendix 25 – Explaining Kurtosis and Skewness 478
     Appendix 26 – Sampled Research Papers 479-560
  8. 8. PREFACE One of the complexities for many undergraduate students and first-time researchers is how to blend their socialization with the systematic rigours of scientific inquiry. For some, the socialization process will have embedded in them hunches, faith, family authority and even 'hearsay' as acceptable modes of establishing the existence of certain phenomena. These are not principles or approaches rooted in academic theorizing or critical thinking. Despite the overwhelming scientific evidence that has been gathered through empiricism, and the falsification of some perspectives that students hold, these perspectives are difficult to change, as students still want to hold 'true' to their previous ways of gaining knowledge. Even though time may clearly show those views to be obsolete or even 'mythological', students often adhere to information that they garnered in their early socialization. The difficulty in objectivism is not the 'truths' that it claims to provide or how we must relate to these realities; it is how young researchers abandon their preferred socialization in favour of research findings. Furthermore, the difficulty for humans, and even more so for upcoming scholars, is how to reconcile their socialization with research findings in the presence of empiricism. Against this background, social researchers must understand that ethics must govern the reporting of their findings, irrespective of the results and their value systems. Ethical principles, in social or natural research, are not 'good' because of their inherent construction, but because they protect the subjects (participants) from researchers who may think that the study's contribution outweighs any harm that the interviewees may suffer from taking part in the study. Then there is the issue of confidentiality, which may sometimes conflict with the personal situations faced by the researcher.
Put simply, which of these takes precedence depends on the code of conduct that guides the profession. Hence, undergraduate students should be made aware that findings must be reported without any form of alteration. This then gives rise to the question, 'How do we systematically investigate social phenomena?' The age-old discourse on the correctness of quantitative versus qualitative research will not be explored in this work, as such a debate is obsolete and rehashing it here would be pointless. Instead, this textbook will put forward illustrations of how to analyze quantitative data without including any qualitative interpretation techniques. I believe that the problems faced by students in interpreting statistical (i.e. quantitative) data must be addressed, as the complexities are many but can be overcome in a short time with assistance. My rationale for using 'hypotheses' as the premise upon which to build an analysis is embedded in the logic of how to explore social or natural happenings. I know that hypothesis testing is not the only approach to examining current germane realities, but it is one that uses more 'pure' science techniques than other approaches. Hypothesis testing is not simply about the null hypothesis, Ho (no statistical relationship), or the alternative hypothesis, Ha; it is a systematic approach to the investigation of observable phenomena. In attempting to make undergraduate students recognize the rich annals of hypothesis testing and how paramount they are to the discovery of social facts, I will
  9. 9. recommend that we begin by reading Thomas S. Kuhn (The Structure of Scientific Revolutions), Emile Durkheim (the study on suicide), W.E.B. Du Bois (the study on the Philadelphia Negro) and the works of Garth Lipps, which clearly depict the knowledge base garnered from their usage. In writing this book, I tried not to assume that readers have grasped the intricacies of quantitative data analysis; as such, I have provided the apparatus and the solutions needed to analyze data from stated hypotheses. The purpose of this approach is for junior researchers to understand the material thoroughly while recognizing the importance of hypothesis testing in scientific inquiry. Paul Andrew Bourne, Dip Ed, BSc, MSc, PhD, Health Research Scientist, Department of Community Health and Psychiatry, Faculty of Medical Sciences, The University of the West Indies, Mona, Jamaica.
  10. 10. ACKNOWLEDGEMENT This textbook would not have materialized without the assistance of a number of people (scholars, associates, and students) who took time from their busy schedules to guide, proofread and make invaluable suggestions on the initial manuscript. Among them are Drs. Ikhalfani Solan, Samuel McDaniel and Lawrence Nicholson, who proofread the manuscript and made suggestions as to its appropriateness, simplicity and reach to those it intends to serve. Furthermore, Mr. Maxwell S. Williams is largely responsible for planting in my mind the idea for a book of this nature. Special thanks must be extended to Mr. Douglas Clarke, an associate, who directed my thoughts in times of frustration and bewilderment, and on occasion gave me insight into the material and how it could be made better for the students. In addition, I would like to extend my heartiest appreciation to Professor Anthony Harriott and Dr. Lawrence Powell, both of the Department of Government, UWI, Mona, Jamaica, who are my mentors and who provided me with guidance and scope for the material and offered their expert advice on the initial manuscript. I would also like to take this opportunity to acknowledge all the students of Introduction to Political Science (GT24M) of the class of 2006/07 who used the introductory manuscript and made suggestions for its improvement, in particular Ms. Nina Mighty.
  11. 11. Menu Bar Content: A social researcher should not only be cognizant of the statistical techniques and modalities of his/her discipline, but also needs a comprehensive grasp of the various functions within the 'menu' of the SPSS program. What does the 'menu bar' contain, and what are the functions of its contents? The 'menu bar' contains the following: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Window, Help. The functions of the various contents of the 'menu bar' are explored overleaf. Box 1: Menu Function
  12. 12. Menu Bar Functions: Purposes of the different things on the menu bar
      File – This menu deals with the different functions associated with files, such as (i) opening, (ii) reading, (iii) saving, and (iv) exiting.
      Edit – This menu stores functions such as (i) copying, (ii) pasting, (iii) finding, and (iv) replacing.
      View – Within this menu lie functions that are screen related.
      Data – This menu operates several functions, such as (i) defining, (ii) configuring, (iii) entering data, (iv) sorting, (v) merging files, (vi) selecting and weighting cases, and (vii) aggregating files.
      Transform – Transformation is concerned with previously entered data, including (i) recoding, (ii) computing, (iii) reordering, and (iv) addressing missing cases.
      Analyze – This menu houses all forms of data analysis apparatus, available with a simple click of the Analyze command.
      Graphs – Creation of graphs or charts can begin with a click on the Graphs command.
      Utilities – This menu deals with sophisticated ways of making complex data operations easier, as well as simply viewing the description of the entered data.
  13. 13. MATHEMATICAL SYMBOLS (NUMERIC OPERATIONS), in SPSS
      NUMERIC OPERATIONS: FUNCTIONS
      +: Add
      -: Subtract
      *: Multiply
      /: Divide
      **: Raise to a power
      (): Order of operations
      <: Less than
      >: Greater than
      <=: Less than or equal to
      >=: Greater than or equal to
      =: Equal to
      ~=: Not equal to
      &: And; both relations must be true
      |: Or; either relation may be true
      ~: Negation; true becomes false, false becomes true
      Box 2: Mathematical symbols and their Meanings
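For readers who also work outside SPSS, the operators in Box 2 can be compared with their counterparts in a general-purpose language. The following is a minimal Python sketch, not SPSS syntax; the variable names `age` and `income` and their values are hypothetical and chosen only for illustration.

```python
# Python counterparts of the SPSS operators listed in Box 2.
# The variables below are hypothetical illustration values.
age = 25
income = 40000

total = age + 5             # +   add
difference = age - 5        # -   subtract
product = age * 2           # *   multiply
quotient = age / 5          # /   divide
squared = age ** 2          # **  raise to a power
grouped = (age + 5) * 2     # ()  order of operations

not_equal = age != 30                          # SPSS ~=  (not equal to)
both_true = (age >= 18) and (income < 50000)   # SPSS &   (both relations true)
either_true = (age < 18) or (income < 50000)   # SPSS |   (either relation true)
negated = not (age > 30)                       # SPSS ~   (negation)
```

The arithmetic and comparison symbols carry over largely unchanged; the main differences are in "not equal to" and in the logical operators, which Python spells as words.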
  14. 14. LISTING OF OTHER SYMBOLS
      SYMBOLS: MEANINGS
      YRMODA (i.e. year, month, day): Date of birth (e.g. 1968, 12, 05)
      a: Y intercept
      b: Coefficient of slope (or regression)
      f: Frequency
      n: Sample size
      N: Population
      R: Coefficient of correlation, Spearman's
      r: Coefficient of correlation, Pearson
      Sy: Standard error of estimate
      W or Wt: Weight
      µ: Mu or population mean
      β: Beta coefficient
      3 or χ: Measure of skewness
      ∑: Summation
      σ: Standard deviation
      χ²: Chi-square; this is the value used to test for goodness of fit
      CC: Coefficient of Contingency
      fa: Frequency of class interval above modal group
      fb: Frequency of class interval below modal group
      X: A single value or variable
      R̄: Adjusted r, which is the coefficient of correlation corrected for the number of cases
      X̄ or Ȳ: Arithmetic mean of X or Y
      RND: Round off to the nearest integer
      SYSMIS: This denotes system-missing values
      MISSING: All missing values
      Type I Error: Claiming that events are related (or means are different) when they are not
      Type II Error: Assuming that events are not related (or means are not different) when they are
      Φ: Phi coefficient
      r²: The proportion of variation in the dependent variable explained by the independent variable(s)
  15. 15. LISTING OF OTHER SYMBOLS
      SYMBOLS: MEANINGS
      P(A): Probability of event A
      P(A/B): Probability of event A given that event B has happened
      CV: Coefficient of variation
      SE: Standard error
      O: Observed frequency
      X: Independent (explanatory, predictor) variable in regression
      Y: Dependent (outcome, response, criterion) variable in regression
      df: Degrees of freedom
      t: Symbol for the t ratio (the critical ratio that follows a t distribution)
      R²: Squared multiple correlation in multiple regression
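The Type I and Type II error entries in the listing of symbols can be made concrete with a small sketch. The Python function below is only an illustration of the definitions: the function name and its two arguments are assumptions introduced here, and in practice the true state of the null hypothesis is unknown to the researcher.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label the outcome of a hypothesis test, given the (normally unknown)
    real state of the null hypothesis and the researcher's decision."""
    if null_is_true and null_rejected:
        # Claiming a relationship (or difference) that does not exist.
        return "Type I error"
    if not null_is_true and not null_rejected:
        # Missing a relationship (or difference) that does exist.
        return "Type II error"
    return "No problem"


# Enumerate all four combinations of reality and decision.
outcomes = {
    (null_is_true, null_rejected): classify_decision(null_is_true, null_rejected)
    for null_is_true in (True, False)
    for null_rejected in (True, False)
}
```

Only two of the four combinations are errors: rejecting a null hypothesis that is really true, and retaining one that is really false.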
  16. 16. FURTHER INFORMATION ON TYPE I and TYPE II Error
      Finding from your survey compared with the real world:
      You found that the null hypothesis is true, and it is really true: No problem
      You found that the null hypothesis is true, but it is really false: Type 2 Error
      You found that the null hypothesis is false, but it is really true: Type 1 Error
      You found that the null hypothesis is false, and it is really false: No problem
      THE WHEREABOUTS OF SOME SPSS FUNCTIONS
      Functions or Commands: Whereabouts in SPSS (the process in arriving at various commands)
      Mean, Mode, Median, Standard deviation, Skewness or kurtosis, Range, Minimum or maximum: Analyze > Descriptive statistics > Frequencies > Statistics
      Chi-square: Analyze > Descriptive statistics > Crosstabs
  17. 17. Pearson's Moment Correlation: Analyze > Correlate > Bivariate
      Spearman's rho: Analyze > Correlate > Bivariate (ensure that you deselect Pearson's, and select Spearman's rho)
      Linear Regression: Analyze > Regression > Linear
      Logistic Regression: Analyze > Regression > Binary
      Discriminant Analysis: Analyze > Classify > Discriminant
      Mann-Whitney U Test: Analyze > Nonparametric Test > 2 Independent Samples
      Independent-Sample t-test: Analyze > Compare means > Independent Samples T-Test
      Wilcoxon matched-pairs test or Wilcoxon signed-rank test: Analyze > Nonparametric Test > 2 Related Samples
      t-test: Analyze > Compare means
      Paired-samples t-test: Analyze > Compare means > Paired-samples T-test
      One-sample t-test: Analyze > Compare means > One-sample T-test
      One-way analysis of variance: Analyze > Compare means > One-way ANOVA
  18. 18. Factor Analysis: Analyze > Data reduction > Factor
      Descriptive (for a single metric variable): Analyze > Descriptive statistics > Descriptive
      Graphs (select the appropriate type: Pie chart, Bar charts, Histogram): Graphs
      Scatter plots: Graphs > Scatter…
      Weighting cases: Data > Weight cases… > Select weight cases by
      Selecting cases: Data > Select cases… > If all conditions are satisfied > Select If
      Replacing missing values: Transform > Replace missing values…
      Box 3: The whereabouts of some SPSS Functions
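The descriptive measures that Box 3 locates in the SPSS menus (mean, median, mode, standard deviation, range, minimum and maximum) can also be checked by hand or in code. The sketch below uses Python's standard `statistics` module rather than SPSS; the list `ages` is hypothetical data invented for the example.

```python
import statistics

# Hypothetical ages of seven respondents (illustration only).
ages = [18, 19, 19, 20, 22, 25, 31]

summary = {
    "mean": statistics.mean(ages),       # arithmetic mean
    "median": statistics.median(ages),   # middle value of the sorted data
    "mode": statistics.mode(ages),       # most frequent value
    "std_dev": statistics.stdev(ages),   # sample standard deviation (n - 1 denominator)
    "minimum": min(ages),
    "maximum": max(ages),
    "range": max(ages) - min(ages),      # maximum minus minimum
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these seven values the mean is 22, the median 20, the mode 19 and the range 13, which makes the example easy to verify against a manual calculation.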
  19. 19. Disclaimer I am a trained demographer and, as such, I have undertaken an extensive review of various aspects of the SPSS program. However, I would like to make it unequivocally clear that this text does not represent the SPSS (Statistical Product and Service Solutions, formerly Statistical Package for the Social Sciences) brand. This text is not sponsored or approved by SPSS, and so any errors that appear in it are not the responsibility of the brand. SPSS is a registered trademark of SPSS Inc. In the event that you need more information on the SPSS program or other related products, enquiries may be forwarded to: SPSS UK Ltd., First Floor, St. Andrews House, West Street, Woking GU21 1EB, United Kingdom.
  20. 20. Coding Missing Data The coding of data for survey research is not limited to responses, as we also need to code missing data. Several codes indicate missing values, and the researcher should know them and the context in which each is applicable in the coding process. A 'no answer' in a survey indicates something other than the respondent's refusal to answer or failure to remember the answer. The fundamental issue here is that there is no information for the respondent, as the information is missing.
      Table: Missing Data Codes for Survey Research
      Question: Refused answer; Didn't know answer; No answer recorded
      Less than 6 categories: 7; 8; 9
      More than 7 and less than 3 digits: 97; 98; 99
      More than 3 digits: 997; 998; 999
      Note: Less than 6 categories – when a question is asked of a respondent, the options (or responses) may be many. In this case, if the question has 6 options or fewer, refusal can be coded 7, didn't know 8, and no answer 9. Some researchers do not make a distinction between the missing categories, and 999 (or 99) is used in all cases of missing values.
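The coding scheme in the table above can be sketched in code. The Python example below assumes one reading of the table, namely that the codes depend on how many digits the variable's valid values occupy (one digit gives 7, 8, 9; two digits give 97, 98, 99; three digits give 997, 998, 999). The function names and the sample data are hypothetical.

```python
MISSING_LABELS = ("refused answer", "didn't know answer", "no answer recorded")


def missing_codes(code_width: int) -> dict:
    """Return the missing-value codes for a variable whose valid codes
    occupy `code_width` digits (assumed reading of the table)."""
    if code_width < 1:
        raise ValueError("code_width must be at least 1")
    prefix = "9" * (code_width - 1)  # "", "9", "99", ...
    return {int(prefix + last): label
            for last, label in zip("789", MISSING_LABELS)}


def drop_missing(values, code_width):
    """Keep only the valid responses, excluding the missing-value codes."""
    codes = missing_codes(code_width)
    return [v for v in values if v not in codes]


# Hypothetical responses to a question with five categories (codes 1 to 5).
responses = [1, 3, 7, 5, 9, 2, 8]
valid = drop_missing(responses, code_width=1)  # [1, 3, 5, 2]
```

Excluding these codes before computing any statistic matters, because a value such as 9 or 99 would otherwise be treated as a genuine response and distort means and frequencies.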
  21. 21. Computing Date of Birth – If you are only given year of birth. Step 1: First, select 'Transform', and then 'Compute'.
  22. 22. Step 2: On selecting 'Compute Variable', the following dialogue box appears.
  23. 23. Step 3: In the 'Target Variable' box, type the name that the researcher wants to use to represent the idea.
  24. 24. Step 4: If the SPSS version is later than 12.0 (i.e. 13 to 17), the next step is to select 'All' in the 'Function group' box. In order to convert year of birth to actual 'age', select 'Xdate.Year'.
  25. 25. Step 5: Having selected 'Xdate.Year', use the arrow to move it into the 'Numeric Expression' box, then replace the '?' mark with the variable in the dataset.
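The arithmetic behind converting a year of birth into an age is a simple subtraction, which the following Python sketch illustrates outside SPSS. The function name and the reference year are assumptions made for the example; it is not a reproduction of the SPSS dialogue steps.

```python
def age_from_birth_year(birth_year: int, reference_year: int) -> int:
    """Approximate age in whole years when only the year of birth is known.

    Without the month and day of birth, the result can overstate the
    respondent's age by one year at the time of the survey.
    """
    if birth_year > reference_year:
        raise ValueError("birth_year cannot be later than reference_year")
    return reference_year - birth_year


# Hypothetical respondent born in 1968, surveyed in 2009.
age = age_from_birth_year(1968, 2009)  # 41
```

The reference year would normally be the year in which the survey was conducted, so that every respondent's age is computed against the same point in time.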
  26. 26. LISTING OF FIGURES AND TABLES Listing of Figures Figure 1.1.1: Flow Chart: How to Analyze Quantitative Data? Figure 1.1.2: Properties of a Variable. Figure 1.1.3: Illustration of Dichotomous Variables Figure 1.1.4: Ranking of the Levels of Measurement Figure 1.1.5: Levels of Measurement Figure 2.1.0: Steps in Analyzing Non-Metric Data Figure 2.1.1: Respondents’ Gender Figure 2.1.2: Respondents’ Gender Figure 2.1.3: Social Class of Respondents Figure 2.1.4: Social Class of Respondents Figure 2.1.5: Steps in Analyzing Metric Data Figure 2.1.6: ‘Running’ SPSS for a Metric Variable Figure 2.1.7: ‘Running’ SPSS for a Metric Variable Figure 2.1.8: ‘Running’ SPSS for a Metric Variable Figure 2.1.9: ‘Running’ SPSS for a Metric Variable Figure 2.1.10: ‘Running’ SPSS for a Metric Variable Figure 2.1.11: ‘Running’ SPSS for a Metric Variable Figure 2.1.12: ‘Running’ SPSS for a Metric Variable Figure 2.1.13: ‘Running’ SPSS for a Metric Variable Figure 2.1.14: ‘Running’ SPSS for a Metric Variable Figure 2.1.15: ‘Running’ SPSS for a Metric Variable 26
  27. 27. Figure 2.1.16: ‘Running’ SPSS for a Metric Variable Figure 4.1.1: Age - Descriptive Statistics Figure 4.1.2: Gender of Respondents Figure 4.1.3: Respondent’s parent educational level Figure 4.1.4: Parental/Guardian Composition for Respondents Figure 4.1.5: Home Ownership of Respondent’s Parent/Guardian Figure 4.1.6: Respondents’ Affected by Mental and/or Physical Illnesses Figure 4.1.7: Suffering from mental illnesses Figure 4.1.8: Affected by at least one Physical Illnesses Figure 4.1.9: Dietary Consumption for Respondents Figure 6.1.2: Typology of Previous School Figure 6.1.3: Skewness of Examination i (i.e. Test i) Figure 6.1.4: Skewness of Examination ii (i.e. Test ii) Figure 6.1.5: Perception of Ability Figure 6.1.6: Self-perception Figure 6.1.7: Perception of task Figure 6.1.8: Perception of utility Figure 6.1.9: Class environment influence on performance Figure 6.1.10: Perception of Ability Figure 6.1.11: Self-perception Figure 6.1.12: Self-perception Figure 6.1.13: Perception of task Figure 6.1.14: Perception of Utility 27
  28. 28. Figure 6.1.15: Class Environment influence on Performance Figure 7.1.1: Frequency distribution of total expenditure on health as % of GDP Figure 7.1.2: Frequency distribution of total expenditure on education as % of GNP Figure 7.1.3: Frequency distribution of the Human Development Index Figure 7.1.4: Running SPSS for social expenditure on social programme Figure 7.1.5: Running bivariate correlation for social expenditure on social programme Figure 7.1.6: Running bivariate correlation for social expenditure on social programme Figure 13.1.1: Categories that describe Respondents' Position Figure 13.1.2: Company's Annual Work Volume Figure 13.1.3: Company's Labour Force – 'on an average per year' Figure 13.1.4: Respondents' main Area of Construction Work Figure 13.1.5: Percentage of work 'self-performed' in contrast to 'sub-contracted' Figure 13.1.6: Percentage of work 'self-performed' in contrast to 'sub-contracted' Figure 13.1.7: Years of Experience in Construction Industry Figure 13.1.8: Geographical Area of Employment Figure 13.1.9: Duration of service with current employer Figure 13.1.10: Productivity changes over the past five years Figure 14.1.1: Characteristic of Sampled Population Figure 14.1.2: Employment Status of Respondents
  29. 29. Listing of Tables Table 1.1.1: Synonyms for the different Levels of measurement Table 1.1.2: Appropriateness of Graphs, from different Levels of measurement Table 1.1.3: Levels of measurement with examples and other characteristics Table 1.1.4: Levels of measurement, and measures of central tendency and measures of variability Table 1.1.5: Combinations of Levels of measurement, and types of statistical Test which are applicable Table 1.1.6a: Statistical Tests and their Levels of Measurement Table 1.1.6b: Table 2.1.1a: Gender of Respondents Table 2.1.1b: General happiness Table 2.1.2: Social Status Table 2.1.3: Descriptive Statistics on the Age of the Respondents Table 2.1.4: "From the following list, please choose what the most important characteristic of democracy …are for you" Table 4.1.1: Respondents' Age Table 4.1.2 (a): Univariate Analysis of the explanatory Variables Table 4.1.2 (b): Univariate Analysis of explanatory Table 4.1.2 (c): Univariate Analysis of explanatory Table 4.1.3: Bivariate Relationships between academic performance and subjective Social Class (n=99)
  30. 30. Table 4.1.4: Bivariate Relationships between comparative academic performance and subjective Social Class (n=108) Table 4.1.5: Bivariate Relationships between academic performance and physical exercise (n= 111) Table 4.1.6 (i): Bivariate Relationships between academic performance and instructional materials (n=113) Table 4.1.6 (ii) Relationship between academic performance and materials among students who will be writing the A’ Level Accounting Examination, 2004 Table 4.1.7: Bivariate Relationships between academic performance and Class attendance (n= 106) Table 4.1.8: Bivariate Relationship between academic performance and attendance Table 4.1.9: Bivariate Relationships between academic performance and breakfast consumption, (n=114) Table 4.1.10: Relationship between academic performances and breakfasts consumption among A’ Level Accounting students, controlling for Gender Table 4.1.11: Bivariate Relationships between academic performance and migraine (n=116) Table 4.1.12: Bivariate Relationships between academic performance and mental illnesses, (n=116) Table 4.1.13: Bivariate Relationships between academic performance and physical illnesses, (n=116) Table 4.1.14: Bivariate Relationships between academic performance and illnesses (n=116) Table 4.1.15. Bivariate Relationships between current academic performance and past performance in CXC/GCE English language Examination, (n= 112) Table 4.1.16: Bivariate Relationships between academic performance and past performance in CXC/GCE English language Examination, controlling for Gender Table 4.1.17: Bivariate Relationships between academic performance and past performance in CXC/GCE Mathematics Examination n= Table 4.1.18 (i): Bivariate Relationships between academic performance and past performance in CXC/GCE principles of accounts Examination (n= 114) 30
  31. 31. Table 4.1.19 (ii): Bivariate Relationships between academic performance and past performance in CXC/GCEPOA Examination, controlling for Gender Table 4.1.20: Bivariate Relationships between academic performance and Self-Concept (n= 112) Table 4.1.21: Bivariate Relationships between academic performance and Dietary Requirements (n=116) Table 4.1.22: Summary of Tables Table 5.1.1: Frequency and percent Distributions of explanatory model Variables Table 5.1.2: Relationship between Religiosity and Marijuana Smoking (n=7,869) Table 5.1.3: Relationship between Religiosity and Marijuana Smoking controlled for Gender Table 5.1.4: Relationship between Age and marijuana smoking (n=7,948) Table 5.1.5: Relationship between marijuana smoking and Age of Respondents, controlled for sex Table 5.1.6: Relationship between academic performances and marijuana smoking, (n=7,808) Table 5.1.7: Relationship between academic performances and marijuana smoking, controlled for Gender Table 5.1.8: Summary of Tables Table 6.1.1: Age Profile of respondent Table 6.1.2: Examination Scores Table 6.1.3(a): Class Distribution by Gender Table 6.1.3(b): Class Distribution by Age Cohorts Table 6.1.3(c): Pre-Test Score by Typology of Group Table 6.1.3(c): Pre-Test Score by Typology of Group Table 6.1.4: Comparison of Examination I and Examination II Table 6.1.5: Comparison a Cross the Group by Tests 31
  32. 32. Table 6.1.6: Analysis of Factors influence on Test ii Scores Table 6.1.7: Cross-Tabulation of Test ii Scores and Factors Table 6.1.8: Bivariate Relationship between student’s Factors and Test ii Scores Table 7.1.1: Descriptive Statistics - total expenditure on public health (as Percentage of GNP HRD, 1994) Table 7.1.2: Descriptive Statistics of expenditure on public education (as Percentage of GNP, Hrd, 1994) Table 7.1.3: Descriptive Statistics of Human Development (proxy for development) Table 7.1.4: Bivariate Relationships between dependent and independent Variables Table 7.1.5: Summary of Hypotheses Analysis Table8.1.1: Age Profile of Respondents (n = 16,619) Table 8.1.2: Logged Age Profile of Respondents (n = 16,619) Table 8.1.3: Household Size (all individuals) of Respondents Table 8.1.4: Union Status of the sampled Population (n=16,619) Table 8.1.5: Other Univariate Variables of the Explanatory Model Table 8.1.6: Variables in the Logistic Equation Table 8.1.7: Classification Table Table 8.1.1: Univariate Analyses Table 8.1.2: Frequency Distribution of Educational Level by Quintile Table 8.1.3: Frequency Distribution of Jamaica’s Population by Quintile and Gender Table 8.1.4: Frequency Distribution of Educational Level by Quintile Table 8.1.5: Frequency Distribution of Pop. Quintile by Household Size Table 8.1.6: Bivariate Analysis of access to Tertiary Edu. and Poverty Status Table 8.1.7: Bivariate Analysis of access to Tertiary Edu. and Geographic Locality of Residents 32
  33. 33. Table 8.1.8: Bivariate Analysis of geographic locality of residents and poverty Status Table 8.1.9: Bivariate Relationship between access to tertiary level education by Gender Table 8.1.10: Bivariate Relationship between Access to Tertiary Level Education by Gender controlled for Poverty Status Table 8.1.11: Regression Model Summary Table 10.1.1: Univariate Analysis of Parental Information Table 10.1.2: Descriptive on Parental Involvement Table 10.1.3: Univariate Analysis of Teacher’s Information Table 10.1.4: Univariate Analysis of ECERS-R Profile Table 10.1.5: Bivariate Analysis of Self-reported Learning Environment and Mastery on Inventory Test Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment involvement and Inventory Test Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment Involvement and Inventory Test Table 10.1.8: School Type by Inventory Readiness Score Table 11.1.1: Incivility and Subjective Social Status Table 12.1.2: Have you or someone in your family known of an act of Corruption in the last 12 months? Table 12.1.3: Gender of Respondent Table 12.1.4: In what Parish do you live? Table 12.1.5: Suppose that you, or someone close to you, have been a victim of a crime. What would you do...? Table 12.1.6: What is your highest level of Education? Table 12.1.7: In terms of Work, which of these best describes your Present situation? Table 12.1.8: Which best represents your Present position in Jamaica Society? Table 12.1.9: Age on your last Birthday? Table 12.1.10: Age categorization of Respondents 33
  34. 34. Table 12.1.11: Suppose that you, or someone close to you, have been a victim of a crime. What would you do... by Gender of respondent Cross Tabulation
Table 12.1.12: If involved in a dispute with neighbour and repeated discussions have not made a difference, would you...? by Gender of respondent Cross Tabulation
Table 12.1.13: Do you believe that corruption is a serious problem in Jamaica? by Gender of respondent Cross Tabulation
Table 12.1.14: Have you or someone in your family known of an act of corruption in the last 12 months? by Gender of respondent Cross Tabulation
Table 14.1.1: Marital Status of Respondents
Table 14.1.2: Marital Status of Respondents by Gender
Table 14.1.3: Marital Status by Gender by Age Cohort
Table 14.1.4: Marital Status by Gender by Age Cohort
Table 14.1.5: Educational Level by Gender by Age Cohorts
Table 14.1.6: Income Distribution of Respondents
Table 14.1.7: Parental Attitude Toward School
Table 14.1.8: Parent Involving Self
Table 14.1.9: School Involving Parent
Table 14.1.8: Regression Model Summary
Table 15.1.1: Correlations
Table 15.1.2: Cross Tabulation between incivility and social status
  35. 35. How do I obtain access to the SPSS PROGRAM? Step One: In order to access the SPSS program, the student should select ‘START’ at the bottom left-hand corner of the computer monitor, and then select ‘All Programs’ (see below). Select ‘START’ and then ‘All Programs’
  36. 36. Step Two: The next step is to select ‘SPSS for Windows’. Once ‘SPSS for Windows’ has been chosen, a dialogue box appears to its right with the following options: SPSS for Windows; SPSS 12.0 (or 13.0 … or 15.0); SPSS Map Geo-dictionary Manager Ink; and, last, SPSS Manager. Select ‘SPSS for Windows’
  37. 37. Step Three: Having completed step two, the student will select SPSS 12.0 (or 13.0, 14.0 or 15.0) for Windows, as this is the program with which he/she will be working. Select SPSS 12.0 (or 13.0, 14.0 or 15.0) for Windows
  38. 38. Step Four: On selecting ‘SPSS for Windows’ in step three, the dialogue box below appears. The next step is to select ‘OK’, which results in what appears in step five. Select ‘OK’
  39. 39. Step Five: 39
  40. 40. What should I now do? The student should then select the ‘inner red box’ with the ‘X’. Select the ‘inner red box’ with the ‘X’.
  41. 41. Step Six: This is what the SPSS spreadsheet looks like (see Figure below). 41
  43. 43. Step Seven: What is the difference here? Look at the bottom left-hand corner of the spreadsheet and you will see two terms: (1) ‘Data View’ and (2) ‘Variable View’. The ‘Variable View’ is where the template is established (i.e. where each variable is defined), and the ‘Data View’ is where the data (i.e. the responses from the questionnaires) are entered once that template exists. Ergo, the student must ensure that he/she has established the template in the ‘Variable View’ before any typing is done in the ‘Data View’. ‘Data View’: Observe what the ‘Data View’ window looks like
  44. 44. ‘Variable View’: Observe what the ‘Variable View’ window looks like
  45. 45. CHAPTER 1 1.1.0a: INTRODUCTION
This book is a response to an associate’s request for material that would provide simple illustrations of ‘How to analyze quantitative data in the Social Sciences from actual hypotheses’. He contended that the currently available textbooks, despite providing some degree of analysis of quantitative data, fail to provide actual illustrations of cases in which hypotheses are given and a comprehensive assessment is made of the appropriate univariate, bivariate and/or multivariate processes of analysis. Hence, I began to peruse the existing textbooks on ‘Research Methods in Social Sciences’, ‘Research Methods in Political Sciences’, ‘Introductory Statistics’, ‘Statistical Methods’ and ‘Multivariate Statistics’, as well as ‘Course materials on Research Methods’, and this revealed that a void existed in this regard. I have therefore consulted a plethora of academic sources in order to formulate this text.
In wanting to fulfil my friend’s request comprehensively, I have used a number of datasets that I have analyzed over the past 6 years, along with key terminologies which are applicable to understanding the various hypotheses. I am cognizant that a need exists to provide some information on ‘Simple Quantitative Data Analysis’; this text is in keeping with the demand for materials that aid the interpretation of ‘quantitative data’, and is not intended to unveil any new materials in the discipline.
The rationale behind this textbook is embedded in the simple reality that many undergraduate students are faced with the complex task of ‘how to choose the most appropriate statistical test’, and this becomes problematic for them as the issue of wanting to complete an
  46. 46. assignment, and knowing that it is properly done, will plague the pupil. The answer to this question lies in three fundamental issues: (1) the nature of the variables (continuous or discrete); (2) the purpose of the analysis, that is, whether it is mere description or statistical inference; and/or (3) whether any of the independent variables are covariates². The materials provided here are drawn from a range of research projects, each of which gives information on a particular topic, from the hypothesis through the univariate analysis to the bivariate or multivariate analyses.
2 “If the effects of some independent variables are assessed after the effects of other independent variables are statistically removed…” (Tabachnick and Fidell 2001, 17)
  47. 47. 1.1.0b: STEPS IN ANALYZING A HYPOTHESIS One of the challenges faced by a social researcher is how to succinctly conceptualize (i.e. define) his/her variables, which must also be operationalized (i.e. measured) for the purpose of the study. Having written a hypothesis, the researcher should identify the variables which are present in it, and then distinguish the dependent from the independent variables. Following this, he/she should recognize the level of measurement to which each variable belongs, and then decide which statistical test is appropriate for that combination of levels of measurement. The figure below is a flow chart depicting the steps in analyzing data when given a hypothesis.
This text was produced in response to the need for a simple book which addresses the concerns of undergraduate students who must analyze a hypothesis. Among the issues raised in this book are (1) the systematic steps involved in analyzing a hypothesis, (2) definitions of a hypothesis, (3) typologies of hypothesis, (4) conceptualization of a variable, (5) types of variables, (6) levels of measurement, (7) illustrations of how to perform SPSS operations for the description of different levels of measurement and for inferential statistics, (8) Type I and II errors, (9) arguments on the treatment of missing cases as well as outliers, (10) how to transform selected quantitative data, and (11) other pertinent matters. The primary reason for the many illustrations, conceptualizations and peripheral issues is that the reader should gain a thorough understanding of how the entire process is done, and of the rationale for the method used.
  48. 48. FIGURE 1.1.1: FLOW CHART: HOW TO ANALYZE QUANTITATIVE DATA?
STEP ONE: Write your hypothesis.
STEP TWO: Identify the variables from the hypothesis.
STEP THREE: Define and operationalize each variable selected from the hypothesis.
STEP FOUR: Decide on the level of measurement for each variable.
STEP FIVE: Decide which variable is the DV, and which the IV.
STEP SIX: Check for skewness and/or outliers in metric variables.
STEP SEVEN: Do descriptive statistics for the chosen variables.
STEP EIGHT: If statistical association, causality or predictability is needed, continue; if not, stop!
STEP NINE: If statistical inference is needed, look at the combination of DV and IV(s).
STEP TEN: Choose the appropriate statistical test based on the combination of DV and IVs and, having used the test, analyze the data carefully, based on the statistical test.
This entire text is about ‘how to analyze quantitative data from hypothesis’. Based on Figure 1.1.1, it may appear that the research process begins with a hypothesis, but this is not the case. Nonetheless, I am emphasizing the interpretation of hypotheses, and so this monograph starts from an actual hypothesis. Thus, before I provide you with operational definitions of
  49. 49. variables, I will provide some contextualization of ‘what is a variable?’ then the steps will be worked out. 49
  50. 50. 1.1.1a: DEFINITIONS OF A VARIABLE Undergraduates and first-time researchers should be aware that quantitative data analysis is primarily based on (1) empirical literature, (2) typologies of variables within the hypothesis, (3) conceptualization and operationalization of the variables, and (4) the level of measurement of each variable. Defining a variable is not simply the stringing together of a group of words as we see fit; each variable requires two critical properties in order to be properly defined (see Figure 1.1.2).
FIGURE 1.1.2: PROPERTIES OF A VARIABLE: (1) MUTUAL EXCLUSIVITY; (2) EXHAUSTIVENESS.
In order to provide a comprehensive outlook of a variable, I will use the definitions of various scholars so as to give a clear understanding of what it is. “Variables are empirical indicators of the concepts we are researching. Variables, as their name implies, have the ability to take on two or more values...The categories of each variable must have two requirements. They should be both exhaustive and mutually exclusive. By exhaustive, we mean that the categories of each variable must be comprehensive enough that it is possible to categorize every observation” (Babbie, Halley, and Zaino 2003, 11). “.. Exclusive refers to the fact that every observation should fit into only one category” (Babbie, Halley and Zaino 2003, 12) “A variable is therefore something which can change and can be measured.” (Boxill, Chambers and Wint 1997, 22)
  51. 51. “The definition of a variable, then, is any attribute or characteristic of people, places, or events that takes on different values.” (Furlong, Lovelace and Lovelace 2000, 42) “A variable is a characteristic or property of an individual population unit” (McClave, Benson and Sincich 2001, 5) “Variable. A concept or its empirical measure that can take on multiple values” (Neuman 2003, 547). “Variables are, therefore, the quantification of events, people, and places in order to measure observations which are categorical (i.e. nominal and ordinal data) and non-categorical (i.e. metric) in an attempt to be informed about the observation in reality. Each variable must fulfil two basic conditions: (i) exhaustiveness – the variable must be so defined that all tenets are captured, as it is comprehensive enough to include all the observations; and (ii) mutual exclusivity – the variable should be so defined that it applies to one event and one event only (i.e. every observation should fit into only one category)” (Bourne 2007). One of the difficulties of social research is not the identification of a variable or variables in the study but the conceptualization and, oftentimes, the operationalization of the chosen construct. Thus, whereas the conceptualization (i.e. the definition) of the variable may (or may not) be complex, it is the ‘how do you measure such a concept’ (i.e. variable) which oftentimes poses the problem for researchers. This must be done properly, bearing in mind the attributes of a variable, because it is this operational definition which you will be testing in the study (see Typologies of Variables, below). Thus, the testing of hypotheses is embedded within variables and within the empirical literature which is used to guide present studies.
Hypothesis testing is a technique that is frequently employed by demographers, statisticians, economists and psychologists, to name a few practitioners, who are concerned with the testing of theories, the verification of truths about reality, and the modification of social realities within particular times, spaces and settings. With this being said, researchers must ensure that a variable is properly defined, so that the stated phenomenon is what is actually defined and measured.
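The two conditions quoted above (exhaustiveness and mutual exclusivity) can be checked mechanically once the categories of a variable are written down. What follows is a minimal sketch in Python rather than SPSS; the function name `check_categories`, the age bands and the sample ages are hypothetical illustrations and are not taken from any study in this book.

```python
def check_categories(observations, categories):
    """Return (unclassified, multiply_classified) observations.

    `categories` maps a category name to a rule (a function returning True
    when an observation belongs to that category).
    An empty first list means the categories are exhaustive;
    an empty second list means they are mutually exclusive.
    """
    unclassified = []
    multiply_classified = []
    for value in observations:
        matches = [name for name, rule in categories.items() if rule(value)]
        if not matches:
            unclassified.append(value)          # fails exhaustiveness
        elif len(matches) > 1:
            multiply_classified.append(value)   # fails mutual exclusivity
    return unclassified, multiply_classified


# Hypothetical age bands that satisfy both conditions.
good_bands = {
    "0-14": lambda age: 0 <= age <= 14,
    "15-64": lambda age: 15 <= age <= 64,
    "65+": lambda age: age >= 65,
}

# Hypothetical age bands that violate both conditions:
# age 15 fits two bands, and nobody aged 65 or over fits any band.
bad_bands = {
    "0-15": lambda age: 0 <= age <= 15,
    "15-64": lambda age: 15 <= age <= 64,
}

ages = [3, 15, 40, 70]
good_result = check_categories(ages, good_bands)
bad_result = check_categories(ages, bad_bands)
```

With the first set of bands both lists come back empty; with the second, age 70 is reported as unclassified and age 15 as falling into more than one category, which is exactly the kind of definition the quoted authors warn against.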
  52. 52. 1.1.1b TYPOLOGIES of VARIABLE (examples, using Figure 1.1.2, above)
Health care seeking behaviour: This is defined as people visiting a health practitioner or health consultant, such as a doctor, nurse, pharmacist or healer, for care and/or advice.
Levels of education: This is denominated in the number of years of formal schooling that one has completed.
Union status: A social arrangement between or among individuals. This arrangement may include ‘conjugal’ or a social state for an individual.
Gender: A sociological state of being male or female.
Per capita income: This is used as a proxy for the income of the individual, obtained by analyzing the consumption pattern.
Ownership of health insurance: Individuals who possess one or more insurance policies.
Injuries: A state of being physically hurt. The examples here are incidences of disability, impairments, and chronic or acute cuts and bruises.
Illness: A state of unwellness.
Age: The number of years lived up to the last birthday.
Household size: The number of individuals who share at least one common meal, use common sanitary conveniences and live within the same dwelling.
Now that the premise has been established with regard to the definition of a variable, the next step in the process is to identify the category to which each variable belongs. Thus, the researcher needs to know the level of measurement for each variable: nominal, ordinal, interval or ratio (see 1.1.2a).
  53. 53. 1.1.2a: LEVELS OF MEASUREMENT³: Examples and definitions
Nominal – The naming of events, peoples, institutions and places, which are coded numerically by the researcher because the variable has no inherent numerical attributes. This variable may be either (i) dichotomous, or (ii) non-dichotomous.
Dichotomous variable – The categorization of a variable which has only two sub-groupings (for example: gender – male and female; capital punishment – permissive and restrictive; religious involvement – involved and not involved).
Non-dichotomous variable – The naming of events which span more than two sub-categories (for example: Counties in Jamaica – Cornwall, Middlesex and Surrey; Party Identification – Democrat, Independent, Republican; Ethnicity – Caucasian, Blacks, Chinese, Indians; Departments in the Faculty of Social Sciences – Management Studies, Economics, Sociology, Psychology and Social Work, Government; Political Parties in Jamaica – People’s National Party (PNP), Jamaica Labour Party (JLP), and the National Democratic Movement (NDM); Universities in Jamaica – University of the West Indies; University of Technology, Jamaica; Northern Caribbean University; University College of the Caribbean; et cetera).
Ordinal – Rank-categorical variables: variables which name categories that, by their very nature, indicate a position or arrange the attributes in some rank order (for example: i) Level of Educational Institutions – Primary/Preparatory, All-Age, Secondary/High, Tertiary; ii) Attitude toward gun control – strongly oppose, oppose, favour, strongly favour; iii) Social status – upper-upper, upper-middle, middle-middle, lower-middle, lower class; iv) Academic achievement – A, B, C, D, F).
Interval or ratio – These variables share all the characteristics of nominal and ordinal variables, along with an equal distance between adjacent values; a ratio variable has, in addition, a ‘true’ zero value (for example: age; weight; height; temperature; fertility; votes in an election; mortality; population; population growth; migration rates). Now that the definitions and illustrations have been provided for the levels of measurement, the student should understand the ranking of these measures (see 1.1.2b).
3 Stanley S. Stevens is credited with the development of the typologies of scales (i.e. levels of measurement): (i) nominal, (ii) ordinal, (iii) interval and (iv) ratio (see Stevens 1946, 1948, 1968; Downie and Heath 1970)
  54. 54. Figure 1.1.3: Illustration of dichotomous variables (each dichotomy has exactly two categories)
Gender: Male; Female
Typologies of Science: Pure; Applied
Book: Fictional; Non-fictional
Alive; Dead
Induction; Deduction
Parametric statistics; Non-parametric statistics
Burial; Non-burial
Religious service; Non-religious service
Use primary data; Use secondary data
Decomposed; Non-decomposed
  55. 55. 1.1.2b: RANKING LEVELS OF MEASUREMENT
Figure 1.1.4: Ranking of the levels of measurement, from highest to lowest: RATIO; INTERVAL; ORDINAL; NOMINAL.
The level of measurement determines how much scope there is for data manipulation. If the level of measurement is nominal (for example, fiction and non-fiction books), then the researcher cannot reconstruct the variable at a lower level, because there is no level below it. If the level of measurement is ordinal (for example, no formal education, primary, secondary and tertiary), then one may decide to use a lower level of measurement (for example, the dichotomy ‘below secondary’ versus ‘secondary and above’). The same is possible with an interval variable: the social scientist may want to go one level down, to ordinal, or two levels down, to nominal. The same is true of a ratio variable. Thus, the further one goes up the pyramid, the more scope exists for data transformation.
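The point about moving down the pyramid can be illustrated with a short recoding sketch. It is written in Python rather than SPSS, and it rests on assumptions that are not in the text: the function names are hypothetical, the age cut-points (15 and 65) are arbitrary illustrations, and the two-way education split (below secondary versus secondary and above) is one reading of the example given above.

```python
# Ordered categories from the education example (lowest to highest).
EDUCATION_ORDER = ["no formal education", "primary", "secondary", "tertiary"]


def collapse_education(level):
    """Ordinal (four ranked categories) -> dichotomy (two categories)."""
    rank = EDUCATION_ORDER.index(level)  # raises ValueError for unknown levels
    if rank < EDUCATION_ORDER.index("secondary"):
        return "below secondary"
    return "secondary and above"


def age_to_group(age):
    """Ratio (age in years) -> ordinal age group; cut-points are hypothetical."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 15:
        return "child"
    if age < 65:
        return "working age"
    return "elderly"


def group_to_dichotomy(group):
    """Ordinal age group -> dichotomy, i.e. two levels below the original age."""
    return "elderly" if group == "elderly" else "non-elderly"


ages = [8, 23, 47, 71]
groups = [age_to_group(a) for a in ages]
dichotomy = [group_to_dichotomy(g) for g in groups]
```

The recoding only ever runs in one direction: once age has been stored as ‘working age’, the original number of years cannot be recovered, which is why it is prudent to keep the variable at its highest level in the dataset and derive the lower levels from it.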
  56. 56. Table 1.1.1: Synonyms for the different Levels of measurement
Levels of Measurement | Other terms
Nominal | Categorical; qualitative; discrete⁴
Ordinal | Qualitative; discrete; rank-ordered; categorical
Interval/Ratio | Numerical; continuous⁵; quantitative; scale; metric; cardinal
Table 1.1.2: Appropriateness of Graphs for different levels of measurement
Levels of Measurement | Bar chart | Pie chart | Histogram | Line graph
Nominal | √ | √ | __ | __
Ordinal | √ | √ | __ | __
Interval/Ratio (or metric) | __ | __ | √ | √
4 Discrete variables take on a finite and usually small number of values, and there is no smooth transition from one value or category to the next (for example: gender, social class, types of community, undergraduate courses)
5 Continuous variables are measured on a scale that changes values smoothly rather than in steps
  57. 57. Table 1.1.3: Levels of measurement⁶ with Examples and Other Characteristics
Characteristic | Nominal | Ordinal | Interval | Ratio
Examples | Gender; Religion; Political parties; Race/Ethnicity; Political ideologies | Social class; Preference; Level of education; Gender equity; Levels of fatigue; Noise level; Job satisfaction | Temperature; Shoe size; Life span | Age; Height; Weight; Reaction time; Income; Score on an exam; Fertility; Population of a country; Population growth; Crime rates
Mathematical properties | Identity | Identity; Magnitude | Identity; Magnitude; Equal interval | Identity; Magnitude; Equal interval; True zero
Mathematical operation(s) | None | Ranking | Addition; Subtraction | Addition; Subtraction; Division; Multiplication
Compiled: Paul A. Bourne, 2007; a modification of Furlong, Lovelace and Lovelace 2000, 74
6 “Levels of measurement concern the essential nature of a variable, and it is important to know this because it determines what one can do with a variable” (Burnham, Gilland, Grant and Layton-Henry 2004, 114)
  58. 58. Table 1.1.4: Levels of measurement, Measure of Central Tendency and Measure of Variability
(Mean, Mode and Median are measures of central tendency; Mean deviation and Standard deviation are measures of variability.)
Levels of Measurement | Mean | Mode | Median | Mean deviation | Standard deviation
Nominal | NA | √ | NA | NA | NA
Ordinal | NA | √ | √ | NA | NA
Interval/Ratio⁷ | √ | √ | √ | √ | √
NA denotes Not Applicable
7 Ratio is the highest level of measurement; nominal is the lowest (i.e. first), ordinal is second and interval is third.
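Table 1.1.4 can be read as a rule: compute only the summaries that the level of measurement permits. The sketch below expresses that rule in Python using the standard `statistics` module. The function name `summarize`, the `order` argument and the sample data are hypothetical; for ordinal data the sketch reports the lower of the two middle categories when the number of cases is even, which is an assumption of this illustration and not a rule stated in the table.

```python
import statistics


def summarize(values, level, order=None):
    """Return the summaries that Table 1.1.4 allows for a level of measurement.

    level: "nominal", "ordinal", "interval" or "ratio".
    order: for ordinal data, the categories listed from lowest to highest.
    """
    level = level.lower()
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError("unknown level of measurement: " + level)

    # The mode is available at every level.
    result = {"mode": statistics.mode(values)}

    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data need the category order")
        ranks = [order.index(v) for v in values]
        # Median of the ranks, mapped back to a category label.
        result["median"] = order[statistics.median_low(ranks)]
    elif level in ("interval", "ratio"):
        mean = statistics.mean(values)
        result["median"] = statistics.median(values)
        result["mean"] = mean
        result["mean deviation"] = sum(abs(v - mean) for v in values) / len(values)
        result["standard deviation"] = statistics.stdev(values)  # sample SD
    return result


# Hypothetical data for each level of measurement.
gender_summary = summarize(["male", "male", "female"], "nominal")
class_summary = summarize(
    ["lower", "middle", "middle", "upper", "lower", "middle"],
    "ordinal",
    order=["lower", "middle", "upper"],
)
age_summary = summarize([20, 30, 40], "ratio")
```

For the nominal example only the mode is returned; the ordinal example adds the median category; the ratio example adds the mean, mean deviation and standard deviation, mirroring the ticks in the three rows of the table.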
  59. 59. Table 1.1.5: Combinations of Levels of measurement, and types of Statistical test which are applicable⁸
Dependent variable | Independent variable | Statistical test
Nominal | Nominal | Chi-square
Nominal | Ordinal | Chi-square; Mann-Whitney
Nominal | Interval/ratio | Binomial distribution; ANOVA; Logistic Regression; Kruskal-Wallis; Discriminant Analysis
Ordinal | Nominal | Chi-square
Ordinal | Ordinal | Chi-square; Spearman rho
Ordinal | Interval/ratio | Kruskal-Wallis H; ANOVA
Interval/ratio | Nominal | ANOVA; Independent-sample t test
Interval/ratio | Ordinal | (none listed)
Interval/ratio | Interval/ratio | Pearson r; Multiple Regression
Table 1.1.5 shows which statistical test applies to each combination of levels of measurement; for example, a nominal dependent variable combined with a nominal independent variable calls for the chi-square test.
8 One of the fundamental issues in analyzing quantitative data is not merely to combine and then interpret data, but to use each variable appropriately. This is further explained below.
  60. 60. STATISTICAL TESTS AND THEIR LEVELS OF MEASUREMENT
Test | Independent variable | Dependent variable
Chi-Square (χ2) | Nominal, Ordinal | Nominal, Ordinal
Mann-Whitney U test | Dichotomous | Ordinal, or skewed⁹ Metric
Kruskal-Wallis H test | Non-dichotomous, Ordinal | Ordinal, or skewed Metric
Pearson’s r | Normally distributed¹⁰ Metric | Normally distributed Metric
Linear Regression | Normally distributed Metric, dummy | Normally distributed Metric
Independent Samples T-test | Dichotomous | Normally distributed Metric
ANOVA | Nominal, Ordinal (non-dichotomous¹¹) | Normally distributed Metric
Logistic regression | Metric, dummy | Dichotomous (skewed values or otherwise)
Discriminant analysis | Metric, dummy | Dichotomous (normally distributed value)
Notes to Table 1.1.6b
Chi-Square (χ2): Used to test for associations between two variables
Mann-Whitney U test: Used to determine differences between two groups
Kruskal-Wallis H test: Used to determine differences between three or more groups
Pearson’s r: Used to determine the strength and direction of a relationship between two variables
Linear Regression: Used to determine the strength and direction of a relationship between two or more variables
Independent Samples T-test: Used to determine the difference between two groups
ANOVA: Used to determine the difference between three or more groups
Logistic regression: Used to predict a relationship between many variables
Discriminant analysis: Used to predict a relationship between many variables
9 Skewness indicates that there is a ‘pileup’ of cases to the left or right tail of the distribution
10 Normality is observed whenever the values of skewness and kurtosis are zero
11 Non-dichotomous (i.e. polytomous) denotes having many (i.e. several) categories
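The table of tests above is, in effect, a lookup rule, and it may help to see it written as one. The following Python sketch is a simplified reading of that table and not a substitute for it. The `Variable` class and the `suggest_tests` function are hypothetical names; the sketch handles one independent and one dependent variable only, it treats the Mann-Whitney U and Kruskal-Wallis H tests as requiring an ordinal or skewed metric dependent variable, and it omits discriminant analysis and dummy-coded predictors.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one variable in a hypothesis."""
    level: str            # "nominal", "ordinal" or "metric"
    categories: int = 0   # number of categories (non-metric variables only)
    normal: bool = False  # metric variables only: roughly normally distributed?


def suggest_tests(iv: Variable, dv: Variable) -> List[str]:
    """Return candidate tests for one independent and one dependent variable."""
    for var in (iv, dv):
        if var.level not in ("nominal", "ordinal", "metric"):
            raise ValueError("unknown level: " + var.level)
        if var.level != "metric" and var.categories < 2:
            raise ValueError("a non-metric variable needs at least two categories")

    tests = []
    iv_categorical = iv.level in ("nominal", "ordinal")
    dv_categorical = dv.level in ("nominal", "ordinal")
    dv_normal_metric = dv.level == "metric" and dv.normal
    # Ordinal or skewed metric outcomes call for rank-based tests.
    dv_rankable = dv.level == "ordinal" or (dv.level == "metric" and not dv.normal)

    if iv_categorical and dv_categorical:
        tests.append("Chi-square")
    if iv_categorical and dv_rankable:
        tests.append("Mann-Whitney U" if iv.categories == 2 else "Kruskal-Wallis H")
    if iv_categorical and dv_normal_metric:
        tests.append("Independent-samples t-test" if iv.categories == 2 else "ANOVA")
    if iv.level == "metric" and iv.normal and dv_normal_metric:
        tests.extend(["Pearson's r", "Linear regression"])
    if iv.level == "metric" and dv.level == "nominal" and dv.categories == 2:
        tests.append("Logistic regression")
    return tests


# Hypothetical variables for illustration.
gender = Variable("nominal", categories=2)
social_class = Variable("ordinal", categories=3)
age = Variable("metric", normal=True)
income = Variable("metric", normal=False)  # assumed to be skewed
```

For instance, `suggest_tests(gender, age)` points to the independent-samples t-test, whereas `suggest_tests(social_class, income)` points to the Kruskal-Wallis H test, because the outcome is assumed to be skewed and the grouping variable has three categories.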
  61. 61. LEVELS OF MEASUREMENT AND THEIR MEASURES OF ASSOCIATION
Figure 1.1.5: Levels of measurement
NOMINAL: Lambda; Cramer’s V; Contingency coefficient; Phi
ORDINAL: Gamma; Somers’ D; Kendall’s tau-b; Kendall’s tau-c
INTERVAL/RATIO: Pearson’s r
Lambda ($\lambda$) – This is a measure of the statistical relationship between two nominal variables.
Phi ($\phi$) – This is a measure of association between two dichotomous variables (i.e. a dichotomous dependent and a dichotomous independent variable): $\phi = \sqrt{\chi^2 / N}$
Cramer’s V ($V$) – This is a measure of association between two nominal variables: $V = \sqrt{\chi^2 / [N(k - 1)]}$, where $k$ is the smaller of the number of rows and the number of columns in the table. In the event that both the dependent and the independent variable are dichotomous, V is identical to phi.
Gamma ($\gamma$) – This is used to measure the statistical association between two ordinal variables.
Contingency coefficient (cc) – This is used for association in which the matrix is more than 2 × 2 (i.e. more than 2 categories for the dependent or the independent variable; for example 2 × 3; 3 × 2; 3 × 3 …): $cc = \sqrt{\chi^2 / (\chi^2 + N)}$
Pearson’s r – This is used for non-skewed metric variables: $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$
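The chi-square based measures and Pearson’s r listed above can be computed directly from their formulas. The following Python sketch does so with the standard library only; the function names and the small cross-tabulation are hypothetical, and in Cramer’s V the value k is taken to be the smaller of the number of rows and columns, which is the usual convention.

```python
import math


def chi_square(table):
    """Pearson chi-square and N for a cross-tabulation given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi(table):
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / n)


def cramers_v(table):
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller of rows and columns
    return math.sqrt(chi2 / (n * (k - 1)))


def contingency_coefficient(table):
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def pearson_r(x, y):
    """Pearson's r from the computational formula given in the text."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical 2 x 2 cross-tabulation (rows: independent, columns: dependent).
crosstab = [[10, 20],
            [20, 10]]
```

For a 2 × 2 table such as `crosstab`, `phi(crosstab)` and `cramers_v(crosstab)` return the same value, which is the identity noted in the definition of Cramer’s V.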
  62. 62. 1.1.3: CONCEPTUALIZING DESCRIPTIVE AND INFERENTIAL STATISTICS Research is not done in isolation from the reality of the wider society. Thus, the social researcher needs to understand whether his/her study is descriptive and/or inferential, as this guides the selection of statistical tools. Furthermore, an understanding of the two constructs dictates which tools the analyst will employ, as there is a clear demarcation between descriptive and inferential statistics. In order to grasp this distinction, I will provide a number of authors’ perspectives on each term. “Descriptive statistics describe samples of subjects in terms of variables or combination of variables” (Tabachnick and Fidell 2001, 7) “Numerical descriptive measures are commonly used to convey a mental image of pictures, objects, tables and other phenomenon. The two most common numerical descriptive measures are: measures of central tendencies and measures of variability” (McDaniel 1999, 29; see also Watson, Billingsley, Croft and Huntsberger 1993, 71) “Techniques such as graphs, charts, frequency distributions, and averages may be used for description and these have much practical use” (Yamane 1973, 2; see also Blaikie 2003, 29; Crawshaw and Chambers 1994, Chapter 1) “Descriptive statistics – statistics which help in organizing and describing data, including showing relationships between variables” (Boxill, Chambers and Wint 1997, 149)
  63. 63. “We’ll see that there are two areas of statistics: descriptive statistics, which focuses on developing graphical and numeral summaries that describes some…phenomenon, and inferential statistics, which uses these numeral summaries to assist in making… decisions” (McClave, Benson and Sincich 2001, 1) “Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present the information in a convenient form” (McClave, Benson and Sincich 2001, 2) “Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data” (McClave, Benson and Sincich 2001, 2) “The phrase statistical inference will appear often in this book. By this we mean, we want to “infer” or learn something about the real world by analyzing a sample of data. The ways in which statistical inference are carried out include: estimating…parameters; predicting…outcomes, and testing…hypothesis …” (Hill, Griffiths and Judge 2001, 9). Inferential statistics is not only about ‘causal’ relationships; King, Keohane and Verba argue that inference is categorized into two broad areas: (1) descriptive, and (2) causal inference. Descriptive inference speaks to the description of a population on the basis of what is available, namely the sample. Burnham, Gilland, Grant and Layton-Henry (2004) state that: “Causal inferences differ from descriptive ones in one very significant way: they take a ‘leap’ not only in terms of description, but in terms of some specific causal
  64. 64. process [i.e. predictability of the variables]” (Burnham, Gilland, Grant and Layton-Henry 2004, 148). In order that this textbook can be helpful and simple, I will provide operational definitions of concepts as well as illustrations of particular terminologies, along with the appropriateness of statistical techniques based on the typologies of variable and the level of measurement (see Tables 1.1.1 – 1.1.6).
  65. 65. CHAPTER 2 2.1.0: DESCRIPTIVE STATISTICS The interpretation of quantitative data commences with an overview of the general dataset (i.e. background information on the survey or study, which is normally demographic information) in an attempt to provide a contextual setting for the research (descriptive statistics, see above), upon which any association may then be established (inferential statistics, see above). Hence, this chapter provides the reader with the analysis of univariate data (descriptive statistics), with appropriate illustrations of how variables at the various levels of measurement may be interpreted, and of the diagrams that suit them. A variable may be non-metric (i.e. nominal or ordinal) or metric (i.e. scale, interval/ratio), and it is on this premise that particular descriptive statistics are provided. In keeping with this background, the first part of this chapter provides a thorough outline of how nominal and/or ordinal variables are analyzed; the second part analyzes metric variables.
  66. 66. Figure 2.1.0: Steps in Analyzing Non-metric data (HOW TO DO DESCRIPTIVE STATISTICS FOR A NON-METRIC VARIABLE?)
STEP ONE: Ensure that the variable is non-metric (e.g. Gender, general happiness).
STEP TWO: Select Analyze.
STEP THREE: Select descriptive statistics.
STEP FOUR: Select frequency.
STEP FIVE: Select the non-metric variable.
STEP SIX: Select statistics at the end.
STEP SEVEN: Select mode, or mode and median (based on whether the variable is nominal or ordinal, respectively).
STEP EIGHT: Select Chart.
STEP NINE: Choose bar or pie graphs.
STEP TEN: Select paste or ok, and then analyze the output (use Table 2.1.1a).
  67. 67. 2.1.1a: INTERPRETING NON-METRIC (or Categorical) DATA
NOMINAL VARIABLE (when there are no missing cases)
Table 2.1.1a: Gender of respondents
Gender | Frequency | Percent | Valid Percent
Male | 150 | 69.4 | 69.4
Female | 66 | 30.6 | 30.6
Total | 216 | 100.0 | 100.0
Identifying Non-missing Cases: When there are no differences between the percent column and the valid percent column, there are no missing cases.
How is the table analyzed? Of the sampled population (n=216¹²), 69.4% were males compared to 30.6% females.
12 The total number of persons interviewed for the study. It is advisable that valid percents are used in descriptive statistics, as there may be instances when missing cases are present within the dataset, which makes the percent figures different from the valid percent figures (Table 2.1.1b).
  68. 68. NOMINAL VARIABLE (when there are missing cases)
Table 2.1.1b: General Happiness
General Happiness | Frequency | Percent | Valid Percent
Very happy | 467 | 30.8 | 31.1
Pretty happy | 872 | 57.5 | 58.0
Not too happy | 165 | 10.9 | 11.0
Missing Cases | 13 | 0.9 | -
Total | 1,517 | 100.0 | 100.0
Identifying Missing Cases: Missing data (which indicate that some of the respondents did not answer the specified question) are revealed by a disparity between the values in the percent column and those in the valid percent column. In this case, 13 of the 1,517 respondents did not answer the question on ‘general happiness’. In cases where there is a difference between the two aforementioned columns (i.e. percent and valid percent), the student should remember to use the valid percent. The rationale behind the use of the valid percent is simple: the research is about those persons who have answered, and they are captured in the valid percent column. Hence, it is recommended that the student use the valid percent column at all times in analyzing quantitative data.
Interpretation: Of the sampled population (n=1,517), the response rate is 99.1% (n=1,504)¹³. Of the valid responses (n=1,504), 31.1% (n=467) indicated that they were ‘very happy’, 58.0% (n=872) reported being ‘pretty happy’, and 11.0% (n=165) said ‘not too happy’.
13 Because missing cases are within the dataset (13 or 0.9%), there is a difference between percent and valid percent. Thus, care should be taken when analyzing data. This is overcome when the valid percents are used.
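The difference between ‘percent’ and ‘valid percent’ comes down to the denominator: all cases for the former, and only the cases that answered for the latter. The Python sketch below reproduces the figures in the general happiness table from its frequencies. It is an illustration written for this purpose and not SPSS output; the function name `frequency_table` and the use of `None` to mark a missing answer are assumptions of the sketch.

```python
from collections import Counter


def frequency_table(responses, missing=None):
    """Frequency, percent and valid percent for a categorical variable.

    Percent uses all cases as the denominator; valid percent uses only
    the cases that are not missing.
    """
    counts = Counter(responses)
    total = len(responses)
    n_missing = counts.pop(missing, 0)
    n_valid = total - n_missing
    rows = {}
    for category, frequency in counts.items():
        rows[category] = {
            "frequency": frequency,
            "percent": round(100 * frequency / total, 1),
            "valid percent": round(100 * frequency / n_valid, 1),
        }
    return rows, n_missing, n_valid


# Frequencies taken from the general happiness table; None marks a missing answer.
responses = (
    ["Very happy"] * 467
    + ["Pretty happy"] * 872
    + ["Not too happy"] * 165
    + [None] * 13
)

table, n_missing, n_valid = frequency_table(responses)
response_rate = round(100 * n_valid / len(responses), 1)
modal_category = max(table, key=lambda c: table[c]["frequency"])
```

Here `table["Very happy"]` holds 30.8 as the percent and 31.1 as the valid percent, `response_rate` is 99.1, and the modal category is ‘Pretty happy’, in agreement with the interpretation given above.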
  69. 69. Owing to the typology of the variable (i.e. nominal), this may be presented graphically by either a pie graph or a bar graph.
Figure 2.1.1: Respondents’ gender (pie graph): Male, 69.4%; Female, 30.6%
OR
Figure 2.1.2: Respondents’ gender (bar graph of per cent, on an axis from 0 to 70, for Male and Female)
  70. 70. ORDINAL VARIABLE
Table 2.1.2: Subjective (or self-reported) Social Class
Social class | Frequency | Percent | Valid Percent
Lower | 100 | 46.3 | 46.3
Middle | 104 | 48.1 | 48.1
Upper | 12 | 5.6 | 5.6
Total | 216 | 100.0 | 100.0
Interpreting the Data in Table 2.1.2: When the respondents were asked to select what best describes their social standing, of the sampled population (n=216), 46.3% reported lower (working) class and 48.1% middle class, compared to 5.6% who said upper middle class. Based on the typology of the variable (i.e. ordinal), the graphical options are (i) a pie graph and/or (ii) a bar graph.
Note: In cases where there is no difference between the percent column and the valid percent column, researchers infrequently use both columns. The column which is normally used is valid percent, as this provides the information on those persons who have actually responded to the specified question; it is then simply labelled ‘percent’ instead of ‘valid percent’.
  71. 71. Figure 2.1.3: Social class of respondents (bar graph, per cent): Lower class, 46.3; Middle class, 48.1; Upper middle class, 5.6
Or
Figure 2.1.4: Social class of respondents (pie graph, per cent): Lower class, 46.3; Middle class, 48.1; Upper middle class, 5.6
  72. 72. 2.1.1b: STEPS IN INTERPRETING METRIC VARIABLE: METRIC (i.e. scale or interval/ratio)
Figure 2.1.5: Steps in Analyzing Metric data (HOW TO DO DESCRIPTIVE STATISTICS FOR A METRIC VARIABLE?)
STEP ONE: Know the metric variable (Age).
STEP TWO: Select Analyze.
STEP THREE: Select descriptive statistics.
STEP FOUR: Select frequency.
STEP FIVE: Select the metric variable.
STEP SIX: Select statistics at the end.
STEP SEVEN: Select mean, standard deviation, skewness.
STEP EIGHT: Select Chart.
STEP NINE: Choose histogram with normal curve.
STEP TEN: Select paste or ok, and then analyze the output (use Table 2.1.3).
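The statistics selected in step seven (mean, standard deviation and skewness) can also be computed by hand, which helps in reading the output. The following Python sketch is an illustration under stated assumptions: the function name `describe_metric` and the two sets of ages are hypothetical, the standard deviation is the sample standard deviation (n – 1 in the denominator), and skewness is the adjusted sample skewness that statistical packages commonly report.

```python
import statistics


def describe_metric(values):
    """Mean, sample standard deviation and adjusted sample skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three cases")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    # Adjusted sample skewness:
    # n / ((n - 1)(n - 2)) * sum(((x - mean) / sd) ** 3)
    skewness = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {"n": n, "mean": mean, "standard deviation": sd, "skewness": skewness}


# Hypothetical ages: a symmetric set and a set with one very old respondent.
symmetric_ages = [20, 30, 40]
skewed_ages = [18, 19, 20, 21, 60]

symmetric_summary = describe_metric(symmetric_ages)
skewed_summary = describe_metric(skewed_ages)
```

The symmetric set has a skewness of zero, while the second set has a positive skewness because the single respondent aged 60 stretches the right tail; a value well away from zero is the signal to consider the treatments for skewness discussed later in the text.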