Quantitative research methods



Published by www.marketing-utopia.tk

Published in: Education

DBA6000: Quantitative Business Research Methods
Rob J Hyndman
© Rob J Hyndman, 2008.

Professor Rob Hyndman
Department of Econometrics and Business Statistics
Monash University (Clayton campus)
VIC 3800
Email: Rob.Hyndman@buseco.monash.edu.au
Telephone: (03) 9905 2358
www.robhyndman.info
Contents

Preface
1 Research design
   1.1 Statistics in research
   1.2 Organizing a quantitative research study
   1.3 Some quantitative research designs
   1.4 Data structure
   1.5 The survey process
   Appendix A: Case studies
2 Data collection
   2.1 Introduction
   2.2 Data collecting instruments
   2.3 Errors in statistical data
   2.4 Questionnaire design
   2.5 Data processing
   2.6 Sampling schemes
   2.7 Scale development
   Appendix B: Case studies
3 Data summary
   3.1 Summarising categorical data
   3.2 Summarizing numerical data
   3.3 Summarising two numerical variables
   3.4 Measures of reliability
   3.5 Normal distribution
4 Computing and quantitative research
   4.1 Data preparation
   4.2 Using a statistics package
   4.3 Further reading
   4.4 SPSS exercise
5 Significance
   5.1 Proportions
   5.2 Numerical differences
6 Statistical models and regression
   6.1 One numerical explanatory variable
   6.2 One categorical explanatory variable
   6.3 Several explanatory variables
   6.4 Comparing regression models
   6.5 Choosing regression variables
   6.6 Multicollinearity
   6.7 SPSS exercises
7 Significance in regression
   7.1 Statistical model
   7.2 ANOVA tables and F-tests
   7.3 t-tests and confidence intervals for coefficients
   7.4 Post-hoc tests
   7.5 SPSS exercises
8 Dimension reduction
   8.1 Factor analysis
   8.2 Further reading
9 Data analysis with a categorical response variable
   9.1 Chi-squared test
   9.2 Logistic and multinomial regression
   9.3 SPSS exercises
10 A survey of statistical methodology
11 Further methods
   11.1 Classification and regression trees
   11.2 Structural equation modelling
   11.3 Time series models
   11.4 Rank-based methods
12 Presenting quantitative research
   12.1 Numerical tables
   12.2 Graphics
   Appendix: Good graphs for better business
13 Readings
Preface

Subject convenor
Professor Rob J Hyndman, B.Sc.(Hons), Ph.D., A.Stat.
Department of Econometrics and Business Statistics
Location: Room 671, Menzies Building, Clayton.
Phone: (03) 9905 2358
Email: Rob.Hyndman@buseco.monash.edu.au
WWW: http://www.robhyndman.info

Objectives
On completion of this subject, students should have:
• the necessary quantitative skills to conduct high-quality independent research related to business administration;
• a comprehensive grounding in a number of quantitative methods of data production and analysis;
• been introduced to quantitative data analysis through a practical research activity.

Synopsis
This unit considers the quantitative research methods used in studying business, management and organizational analysis. Topics to be covered:
1. research design, including experimental designs, observational studies, case studies, longitudinal analysis and cross-sectional analysis;
2. data collection, including designing data collection instruments, sampling strategies and assessing the appropriateness of archival data for a research purpose;
3. data analysis, including graphical and numerical techniques for the exploration of large data sets and a survey of advanced statistical methods for modelling the relationships between variables;
4. communication of quantitative research; and
5. the use of statistical software packages such as SPSS in research.
The effective use of several quantitative research methods will be illustrated through reading research papers drawn from several disciplines.

References
None of these are required texts; they provide useful background material if you want to read further. Huck (2007) is excellent on interpreting statistical results in academic papers. Pallant (2007) is very helpful when using SPSS and in giving advice on how to write up research results. Use Wild and Seber (2000) if you need to brush up on your basic statistics; it contains lots of helpful advice and interesting examples.
1. Huck, S.W. (2007) Reading statistics and research, 5th ed. Allyn & Bacon: Boston, MA.
2. Pallant, J. (2007) SPSS survival manual, 3rd ed. Allen & Unwin.
3. de Vaus, D. (2002) Analyzing social science data. SAGE Publications: London.
4. Wild, C.J., & Seber, G.A.F. (2000) Chance encounters: a first course in data analysis and inference. John Wiley & Sons: New York.

Timetable
17 July        Introduction / Chapter 1
24 July        Chapter 2
31 July        Chapter 3
7 August       Chapter 4; SPSS tutorial
14 August      Chapter 5
21 August      Chapter 6
28 August      Chapter 7; SPSS tutorial
4 September    Chapters 8–9; SPSS tutorial
11 September   Chapter 10
18 September   Chapters 11–12; first assignment due
25 September   No class
2 October      No class
9 October      SPSS tutorial
16 October     Oral presentations; second assignment due
Assessment
1. A written report presenting and critiquing a research paper which uses quantitative research methods. 45%
   • It can be a published research paper from a scholarly journal, or a company report. It must contain substantial quantitative research. It must be approved in advance.
   • Your report should include comments on the research questions addressed, the appropriateness of the data used, how the data were collected, the method of analysis chosen, and the conclusions drawn.
   • Length: 4000–5000 words excluding tables and graphs.
   • Due: 17 September
2. A written report presenting some original quantitative analysis of a suitable multivariate data set. 45%
   • You may use your own data, or use data that I will provide. The data set must include at least four variables. It can be data from your workplace.
   • Your report should include comments on the research questions addressed, the appropriateness of the data used, how the data were collected, the method of analysis chosen, and the conclusions drawn.
   • You may use any statistical computing package or Excel for analysis.
   • Length: 4000–5000 words excluding tables and graphs.
   • Due: 15 October
3. A 20-minute oral presentation of one of the above reports. 10%
   • On either 8 or 15 October.

Assignment marking scheme
• Research questions addressed: 6%
• Appropriateness of data: 6%
• Data collection: 6%
• Description of statistical methods used: 6%
• Suitability of statistical methods: 6%
• Discussion of statistical results: 8%
• Conclusions (are they supported/valid?): 7%

Choosing a paper for Assignment 1
Choose something you are interested in. For example, it can be an article you are reading as part of your other DBA studies or something you have read as part of your professional life. The following journals contain some articles that would be suitable; there are also many others.
• Australian Journal of Management
• International Journal of Human Resource Management
• Journal of Advertising
• Journal of Applied Management Studies
• Journal of Management
• Journal of Management Accounting Research
• Journal of Management Development
• Journal of Managerial Issues
• Journal of Marketing
• Management Decision
You can obtain online copies of some of these via the Monash Voyager Catalogue. Hard copies should be in the Monash library.
Things to look for:
• it should involve some substantial data analysis;
• it should involve more than summary statistics (e.g., a regression model, or some chi-squared tests);
• it should not use sophisticated statistical methods that are beyond this subject (e.g., avoid factor analysis and structural equation models).
All papers should be approved by Rob Hyndman before you begin work on the assignment.

Choosing a data set for Assignment 2
• Choose something you know about. The best data analyses combine good knowledge of the data context with good use of statistical methodology.
• Don't try to do too much. One response variable with 3–5 explanatory variables is usually sufficient. Resist the temptation to write a long treatise!
• You will find it easier if the response variable is numeric. Analysing categorical response variables with several explanatory variables can be tricky.
• Be clear about the purpose of your analysis. State some explicit objectives or hypotheses, and address them via your statistical analysis.
• Think about what you include. A few well-chosen graphics that tell a story are better than pages of computer output that mean very little.
• Start early. Even before we cover much methodology, you can do some basic data summaries and think about the key questions you want to address.
• All data sets should be approved by Rob Hyndman before you begin work on the assignment.

Readings
Most weeks we will read a case study from a research journal and discuss the analysis. Please read these in advance. We will discuss them in the third hour. You cannot use a paper we have discussed for your first assessment task.
If you have a suggestion of a paper that may be suitable for class discussion, please let me know.
CHAPTER 1 Research design

1.1 Statistics in research

“Statistics is the study of making sense of data.” (Ott and Mendenhall)

“The key principle of statistics is that the analysis of observations doesn’t depend only on the observations but also on how they were obtained.” (Anonymous)

• Data beat anecdotes: “For example” proves nothing. (Hebrew proverb)
• Data beat intuition: “Belief is no substitute for arithmetic.” (Henry Spencer)
• Data beat “expert” opinion: “When information becomes unavailable, the expert comes into his own.” (A.J. Liebling)

1.1.1 Statistics answers questions using data
• Do pollutants cause asthma?
• Do transaction volumes on the stock market react to price changes?
• Does deregulation reduce unemployment?
• Does fluoride reduce tooth decay?

A definition
Statistical Analysis: Mysterious, sometimes bizarre, manipulations performed upon the collected data of an experiment in order to obscure the fact that the results have no generalizable meaning for humanity. Commonly, computers are used, lending an additional aura of unreality to the proceedings. (Source unknown)

97.3% of all statistics are made up.
1.1.2 Some statistics stories

The Challenger disaster
[Figure: number of O-rings damaged plotted against ambient temperature at launch]

Charlie’s chooks
[Figure: Y, percentage mortality, plotted against X, percentage Tegel birds]
Risk factors for heart disease
A doctor wants to investigate who is most at risk for coronary-related deaths. He selects 12 patients at random from his clinic and records their age, blood pressure and drug used. He also records whether they eventually died from heart disease or not.

Age   BP   Drug   L/D
18    68   1      D
20    64   2      L
22    72   1      D
25    67   2      L
29    80   –      D
33    70   –      D
34    86   1      D
36    85   –      D
37    73   2      L
39    82   –      L
41    90   1      D
45    87   2      L

Drug    Lived   Died   % lived
1       0       4      0%
2       4       0      100%
–       1       3      25%
Total   5       7

Drug 1 looks bad, 2 looks good.
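The summary table is a simple cross-tabulation of drug against outcome, and it can be recomputed directly from the 12 raw records as a check. The following is a minimal Python sketch using only the standard library; the function name `cross_tab` is illustrative and not from the source. Note that each percentage rests on only four patients.

```python
from collections import Counter

# The 12 patient records from the table above: (age, BP, drug, outcome).
# Outcome codes follow the table: "L" = lived, "D" = died.
patients = [
    (18, 68, "1", "D"), (20, 64, "2", "L"), (22, 72, "1", "D"),
    (25, 67, "2", "L"), (29, 80, "–", "D"), (33, 70, "–", "D"),
    (34, 86, "1", "D"), (36, 85, "–", "D"), (37, 73, "2", "L"),
    (39, 82, "–", "L"), (41, 90, "1", "D"), (45, 87, "2", "L"),
]


def cross_tab(records):
    """Return {drug: (lived, died, percent lived)} for a list of patient records."""
    counts = Counter((drug, outcome) for _age, _bp, drug, outcome in records)
    table = {}
    for drug in sorted({drug for _age, _bp, drug, _outcome in records}):
        lived, died = counts[(drug, "L")], counts[(drug, "D")]
        table[drug] = (lived, died, round(100 * lived / (lived + died)))
    return table


for drug, (lived, died, pct) in cross_tab(patients).items():
    print(f"Drug {drug}: lived {lived}, died {died}, {pct}% lived")
```

The printed rows match the summary table: 0%, 100% and 25% lived for drugs 1, 2 and –.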
1.1.3 Causation and association

Smoking and lung cancer
There is a strong positive correlation between smoking and lung cancer. There are several possible explanations.
• Causal hypothesis: Smoking causes lung cancer.
• Genetic hypothesis: There is a hereditary trait which predisposes people to both nicotine addiction and lung cancer.
• Sloppy lifestyle hypothesis: Smoking is most prevalent amongst people who also drink too much, don’t exercise, eat unhealthy food, etc.

Postnatal care
Mothers who return home from hospital soon after giving birth do better than those who stay in hospital longer.
• Causation hypothesis: Hospital is harmful and/or home is helpful.
• Common response hypothesis: Mothers return home early because they are coping well.
• Confounding hypothesis: Mothers return home early if there is someone at home to help.

University applicants
          Male   Female   Total
Accept      70       40     110
Reject     100      100     200
Total      170      140     310

Is there evidence of discrimination?

Course: Introduction to bean counting
          Male   Female   Total
Accept      60       20      80
Reject      60       20      80
Total      120       40     160
Course: Advanced welding
          Male   Female   Total
Accept      10       20      30
Reject      40       80     120
Total       50      100     150

This is an example of Simpson’s Paradox. Simpson’s Paradox occurs when the association between variables is reversed, or appears or disappears, when data from several groups are combined.

Other examples of Simpson’s paradox
• The average tax rate has increased over time even though the rate in every income category has decreased. Why?
• The average salary of female B.Sc. graduates is lower than the average salary of male B.Sc. graduates. Why?

Causality or association?
1. A positive correlation between blood pressure and income is observed. Does this indicate a causal connection?
2. In a survey in 1960, it was found that for males aged 25–34 there was a positive correlation between years of school completed and height. Does going to school longer make a man taller?
3. The same survey showed a negative correlation between age and educational level for persons aged over 25. Why?
4. Students at fee-paying private schools perform better on average in the VCE than students at government-funded schools. Why?

Some subtle differences
• Distinguish between causation and association, prediction and causation, and prediction and explanation.
• Note the difference between deterministic and probabilistic causation.
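The arithmetic behind the paradox in the university applicants example can be checked in a few lines. Within each course, men and women are accepted at the same rate. However, 100 of the 140 women applied for Advanced welding, where only 20% of applicants are accepted. By contrast, 120 of the 170 men applied for Introduction to bean counting, where 50% are accepted. The following is a minimal Python sketch; the function names `acceptance_rate` and `pool` are illustrative.

```python
# (accepted, rejected) counts transcribed from the two course tables above.
courses = {
    "Introduction to bean counting": {"Male": (60, 60), "Female": (20, 20)},
    "Advanced welding": {"Male": (10, 40), "Female": (20, 80)},
}


def acceptance_rate(accepted, rejected):
    """Proportion of applicants who were accepted."""
    return accepted / (accepted + rejected)


def pool(courses):
    """Add the counts across courses, giving {sex: (accepted, rejected)}."""
    combined = {}
    for by_sex in courses.values():
        for sex, (acc, rej) in by_sex.items():
            prev_acc, prev_rej = combined.get(sex, (0, 0))
            combined[sex] = (prev_acc + acc, prev_rej + rej)
    return combined


# Within each course, men and women are accepted at the same rate.
for course, by_sex in courses.items():
    rates = {sex: f"{acceptance_rate(*c):.0%}" for sex, c in by_sex.items()}
    print(course, rates)

# Pooling the courses reproduces the "University applicants" table,
# where the male acceptance rate is clearly higher.
combined = pool(courses)
print("Combined", {sex: f"{acceptance_rate(*c):.0%}" for sex, c in combined.items()})
```

Running it shows 50% for both sexes in bean counting and 20% for both in welding. In the pooled table, however, the rates are 41% for men and 29% for women.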
1.2 Organizing a quantitative research study
As a quick check, ask the following questions:
1. What is your hypothesis (your research question)?
2. What is already known about the problem? (literature review)
3. What sort of design is best suited to studying your hypothesis? (method)
4. What data will you collect to test your hypothesis? (sample)
5. How will you analyse these data? (data analysis)
6. What will you do with the results of the study? (communication)
These questions are broken down in more detail below. (These are mostly taken from Rubin et al. (1990), and have also appeared in Balnaves and Caputi (2001).)

1.2.1 Hypothesis
• What is the goal of the research?
• What is the problem, issue, or critical focus to be researched?
• What are the important terms? What do they mean?
• What is the significance of the problem?
• Do you want to test a theory?
• Do you want to extend a theory?
• Do you want to test competing theories?
• Do you want to test a method?
• Do you want to replicate a previous study?
• Do you want to correct previous research that was conducted in an inadequate manner?
• Do you want to resolve inconsistent results from earlier studies?
• Do you want to solve a practical problem?
• Do you want to add to the body of knowledge in another manner?

1.2.2 Review of literature
• What does previous research reveal about the problem?
• What is the theoretical framework for the investigation?
• Are there complementary or competing theoretical frameworks?
• What are the hypotheses and research questions that have emerged from the literature review?
1.2.3 Method
• What methods or techniques will be used to collect the data? (This holds for applied and non-applied research.)
• What procedures will be used to apply the methods or techniques?
• What are the limitations of these methods?
• What factors will affect the study’s internal and external validity?
• Will any ethical principles be jeopardized?

1.2.4 Sample
• Who (what) will provide (constitute) the data for the research?
• What is the population being studied?
• Who will be the participants for the research?
• What sampling technique will be used?
• What materials and information are necessary to conduct the research?
• How will they be obtained?
• What special problems can be anticipated in acquiring needed materials and information?
• What are the limitations in the availability and reporting of materials and information?

1.2.5 Data analysis
• How will data be analysed?
• What statistics will be used?
• What criteria will be used to determine whether hypotheses are supported?
• What was discovered (about the goal, data, method, and data analysis) as a result of doing preliminary work (if conducted)?

1.2.6 Communication
• How will the final research report be organised? (Outline)
• What sources have you examined thus far that pertain to your study? (Reference list)
• What additional information does the reader need?
• What time frame (deadlines) have you established for collecting, analysing and presenting data? (Timetable)

1.3 Some quantitative research designs
• Case study: questionnaire, interview, observation. Best for exploratory work and hypothesis generation. Limited quantitative analysis possible.
• Survey: questionnaire, interview, observation. Best if the sample is random.
• Experiment: questionnaire, interview, observation. Best for demonstrating causality.
1.3.1 Cross-sectional vs longitudinal analysis
All designs can be either cross-sectional or longitudinal.
• Cross-sectional designs involve data collection at one time only.
• Longitudinal designs involve successive data collection over a period of time. They are necessary if you want to study changes over time.

1.3.2 Case study designs
• involve intensive study of a few cases rather than limited involvement with many cases
• results can’t be generalized easily
• useful in exploring ideas and generating hypotheses

1.3.3 Survey designs
• most popular in business/management research
• useful when you cannot control the things you want to study
• difficult to get random and representative samples

1.3.4 Experimental designs
• require a control group, to allow for the placebo effect
• require the experimenter to control all variables other than the variable of interest
• require randomization to groups
• allow causation to be tested

Which research design would you use?
Hypotheses:
1. Women believe they are better at managing than men.
2. Children who listen to poetry in early childhood make better progress in learning to read than those who do not.
3. A business will run more efficiently if no person is directly responsible for more than five other people.
4. There are inherent advantages in businesses staying small.
5. Employees with postgraduate qualifications have shorter job expectancy than employees without postgraduate qualifications.
What data would you collect in each case?
1.4 Data structure

1.4.1 Populations and samples
A population is the entire collection of ‘things’ in which we are interested. A sample is a subset of a population. We wish to make an inference about a population of interest based on information obtained from a sample from that population.
Examples:
• You measure the profit/loss of 50 randomly selected public hospitals in Victoria.
   Population:
   Sample:
   Points of interest:
• Sales of 500 products from one company for the last 5 years are analysed.
   Population:
   Sample:
   Points of interest:

1.4.2 Cases and variables
Think about your data in terms of cases and variables.
• A case is the unit about which you are taking measurements, e.g., a person or a business.
• A variable is a measurement taken on each case, e.g., age, test score, grade level, income.

1.4.3 Types of data
The ways of organizing, displaying and analysing data depend on the type of data we are investigating.
• Categorical data (also called nominal or qualitative), e.g., sex, race, type of business, postcode. Averages don’t make sense. Ordered categories are called ordinal data.
• Numerical data (also called scale, interval and ratio), e.g., income, test score, age, weight, temperature, time. Averages make sense.
Note that we sometimes treat numerical data as categories (e.g., three age groups).
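The case-by-variable view translates directly into a table with one row per case and one column per variable. The type of each variable then determines how it should be summarised. The following is a minimal Python sketch with hypothetical data: the businesses, postcodes and employee counts are invented for illustration, as are the helper names. Postcodes are stored as text to make clear that they are category labels, not quantities.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: each dictionary is one case (a business) and each key is a variable.
cases = [
    {"business_type": "retail", "postcode": "3800", "employees": 12},
    {"business_type": "services", "postcode": "3168", "employees": 5},
    {"business_type": "retail", "postcode": "3800", "employees": 30},
    {"business_type": "manufacturing", "postcode": "3150", "employees": 48},
]


def summarise_categorical(cases, variable):
    """Frequency count of each category; an average would be meaningless here."""
    return Counter(case[variable] for case in cases)


def summarise_numerical(cases, variable):
    """Arithmetic mean, which makes sense for numerical data."""
    return mean(case[variable] for case in cases)


print(summarise_categorical(cases, "business_type"))
print(summarise_categorical(cases, "postcode"))  # postcodes are labels, not quantities
print(summarise_numerical(cases, "employees"))
```

Averaging the postcodes would produce a number, but it would have no meaning. That is the practical force of the rule that averages don't make sense for categorical data.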
1.4.4 Response and explanatory variables
Response variable: measures the outcome of a study. Also called the dependent variable.
Explanatory variable: attempts to explain the variation in the observed outcomes. Also called an independent variable.
Many statistical problems can be thought of in terms of a response variable and one or more explanatory variables.
• Study of profit/loss in Victorian hospitals.
   Response variable:
   Explanatory variables:
• Monthly sales of 500 products.
   Response variable:
   Explanatory variables: competitor advertising.

1.5 The survey process
1. Plan the survey
   State the objectives. In order to state the objectives we often need to ask questions such as:
   • What is the survey’s exact purpose?
   • What do we not know and want to know?
   • What inferences do we need to draw?
   Begin by developing a specific list of information needs. Then write focused survey questions.
2. Design the sampling procedure
   Identify the target population: whom are we drawing conclusions about?
   Select a sampling scheme, e.g., simple random sampling, stratified random sampling, systematic sampling or cluster sampling.
3. Select a survey method
   Decide how to collect the data: personal interviews, telephone interviews, mailed questionnaires, diaries, . . .
4. Develop the questionnaire
   Write the questionnaire. Decide on the wording, types of questions, and other issues.
5. Pretest the questionnaire
   Select a very small sample from the sampling frame. Conduct the survey and see what goes wrong. Correct any problems before carrying out the full-scale study.
6. Conduct the survey
   Run the survey in an efficient and timely manner.
7. Analyze the data
   Gather the results and determine outcomes.
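Step 2 names four sampling schemes, and their differences are easiest to see side by side. The following is a minimal Python sketch of each, under these assumptions:
• The sampling frame is a hypothetical list of 100 numbered units.
• The strata ("small", "large") and clusters ("region 0" to "region 9") are invented for illustration.
• The systematic version uses a random start and an interval k = N // n.
• The stratified version draws the same number from each stratum rather than allocating proportionally.
These are common variants, not necessarily the ones used later in these notes.

```python
import random


def simple_random_sample(frame, n, rng):
    """Simple random sampling: every subset of n units is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic sampling: a random start, then every k-th unit, with k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    """Stratified random sampling: a separate simple random sample within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: choose whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical sampling frame of 100 numbered units.
rng = random.Random(2008)
frame = list(range(1, 101))
strata = {"small": frame[:60], "large": frame[60:]}
clusters = {f"region {i}": frame[10 * i:10 * (i + 1)] for i in range(10)}

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
print(stratified_sample(strata, 2, rng))
print(cluster_sample(clusters, 2, rng))
```

A seeded `random.Random` object is passed in so that a draw can be reproduced, which helps when documenting how a sample was selected.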
Appendix A: Case studies

Injury management in NSW
Four injury management pilots (IMP) ran during 2001:
• private hospitals and nursing homes within NSW;
• all industry groups within the Central West NSW region;
• two insurance companies (QBE and EML).
We wish to do a statistical comparison of the injury management pilots with the current standard injury management arrangements.

Performance measures
• incidence of specific payment types
• duration of claims
• number of claims
• proportion of claimants in receipt of weekly benefits at 4, 8, 13 and 26 weeks
• costs for claimants at 4, 8, 13 and 26 weeks
   – medical, rehabilitation, physiotherapy, chiropractic
   – weekly benefits
• timeliness
   – number of days from injury to agent notification
   – number of days from injury to first payment

Some potential driving variables
• age
• gender
• injury type
• agency (e.g., powered tools)
• severity of injury
• medical interventions
• employer size
• insuring agency
• weekly pay at time of injury
• industry (ANZSIC code)
• occupation (ASCO code)

• Driving variables affect the performance measures.
• Variations between groups in key driving variables can induce apparent differences between groups. These are then confused with any real differences due to the programs being evaluated.
• Therefore any comparisons of groups of employees should either eliminate the effect of the drivers or try to measure it.
The ideal design!
Ideally, we would use a randomized controlled trial. This eliminates the effect of driving variables.
• The control group would be employees on the old IM system.
• The treatment group would be employees in the new IMP.
• Employees would be randomly allocated to the two groups.
• Statistical comparisons between the two groups would show differences between the old IM system and the new IMP.
• This random allocation would prevent any systematic differences between those in the IMP and those not in the IMP.
• Such a scheme is impracticable.

The actual design
We have to use pseudo-control groups and eliminate differences between the control and IMP groups using statistical models.
• All injuries within the specified industry group, geographical region or insurer will be subject to the new IMP during 2001.
• The pseudo-controls will be the equivalent groups of employees in 2000, who were not subject to the new IMP.

Problem of confounding
• If there are differences between the IMP and the control, are they due to the different IM program or to the different group?
Solution:
• adjust for as many driving variables as possible;
• compare similar groups not subject to the IMP.

Comparisons undertaken
Group (2001)                                   Pseudo-control (2000)
IMP: private hospitals/nursing homes in NSW    Private hospitals/nursing homes
IMP: Central West NSW region                   Central West NSW region
IMP: insurance company                         Insurance company
Non-IMP: comparable industry group             Comparable industry group
Non-IMP: comparable NSW region                 Comparable NSW region
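The ideal design described above depends on randomly allocating employees to the control and treatment groups. The following is a minimal Python sketch of that allocation step, assuming a simple shuffle-and-split into two groups of (nearly) equal size. The employee identifiers and the `randomly_allocate` helper are hypothetical. As the notes point out, this scheme was impracticable for the pilots; the sketch only shows what the allocation itself involves.

```python
import random


def randomly_allocate(employees, seed=None):
    """Randomly split employees into a control group and a treatment group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(employees)  # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "control": shuffled[:half],    # would stay on the old IM system
        "treatment": shuffled[half:],  # would move to the new IMP
    }


# Hypothetical employee identifiers.
employees = [f"E{i:03d}" for i in range(1, 11)]
groups = randomly_allocate(employees, seed=1)
print(groups["control"])
print(groups["treatment"])
```

Because group membership depends only on the shuffle, driving variables such as age or injury type cannot influence which group an employee joins. This is why randomization prevents systematic differences between the groups.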
We do not directly compare:
• private hospitals/nursing homes with other industry groups;
• the Central West NSW region with other geographical regions.
Instead, we compare the change between 2000 and 2001 in each industry group and each geographical region.

How to interpret the results
• If all 2001 groups are different from the 2000 groups after taking into account all drivers, then it is likely there are changes between years not reflected in the drivers. We won’t be able to attribute any changes to the IMP.
• If all IMP 2001 groups are different from the 2000 groups after taking into account all drivers, but the non-IMP 2001 groups are not different from the 2000 groups, then it is likely the changes between years are due to the IMP.
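Comparing the 2000 to 2001 change in an IMP group with the change in a comparable non-IMP group resembles what is often called a difference-in-differences comparison. The following is a minimal Python sketch of the arithmetic, under two assumptions:
• The mean claim durations are invented purely for illustration; they are not results from the study.
• They are taken to be already adjusted for the driving variables.
A real analysis would obtain adjusted values from the statistical models described earlier and would test whether the differences are statistically significant.

```python
# Hypothetical mean claim durations in days, assumed to be already adjusted for the driving variables.
mean_duration = {
    ("Private hospitals/nursing homes", 2000): 42.0,  # becomes an IMP group in 2001
    ("Private hospitals/nursing homes", 2001): 35.0,
    ("Comparable industry group", 2000): 40.0,        # not subject to the IMP
    ("Comparable industry group", 2001): 39.0,
}


def change(group):
    """Change in the performance measure from 2000 to 2001 for one group."""
    return mean_duration[(group, 2001)] - mean_duration[(group, 2000)]


imp_change = change("Private hospitals/nursing homes")
non_imp_change = change("Comparable industry group")

print(f"Change in IMP group:     {imp_change:+.1f} days")
print(f"Change in non-IMP group: {non_imp_change:+.1f} days")
# If the non-IMP group had changed by about as much as the IMP group,
# the change could not be attributed to the IMP.
print(f"Difference in changes:   {imp_change - non_imp_change:+.1f} days")
```

With these made-up numbers, the IMP group changes by -7.0 days and the non-IMP group by -1.0 day, so the difference in changes is -6.0 days. That pattern is the second case in the interpretation rules above.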
Needlestick injuries
You are interested in the number and severity of needlestick injuries amongst health workers involved in blood donation and transfusion. Work in groups of three to carefully define the objectives of your survey. You will need to specify:
• the objective of the survey
• what data are to be collected
• the target population
• the survey population
• the sample
• the data collection method
• potential errors which could occur in your survey.

Palliative care referrals
A few years ago, I helped the Health Department with a survey on palliative care. As part of the study, it was necessary to study the ‘referral’ pattern for palliative care providers: how many patients they send to hospital (for inpatient or outpatient treatment); how many they refer to consultants for specialist comment; how many to community health programs; and so on.
Possible sampling schemes:
1. sample a group of palliative care practitioners and study their referral patterns;
2. sample a group of palliative care patients and study their referral patterns.
Discuss the possible advantages and disadvantages of the two schemes.
CHAPTER 2 Data collection

2.1 Introduction
“You don’t have to eat the whole ox to know that the meat is tough.” (Samuel Johnson)

Sampling is very familiar to all of us, because we often reach conclusions about phenomena on the basis of a sample of such phenomena. You may test a swimming pool’s temperature by dipping your toe in the water, or the performance of a new vehicle by a short test drive. These are among the countless small samples that we rely on when making personal decisions. We tend to use haphazard methods in picking our sample and risk substantial sampling error.
Research also usually reaches its conclusions on the basis of sampling, but the methods used must adhere to certain rules, which will be discussed in this chapter. The goal in obtaining data through survey sampling is to use a sample to make precise inferences about the target population. We want to be highly confident about our inferences. It is important to have a substantial grasp of sampling theory to appraise the reliability and validity of the conclusions drawn from the sample taken.

2.2 Data collecting instruments
The choice of data collection instrument is crucial to the success of the survey. When determining an appropriate data collection method, many factors need to be taken into account, including the complexity or sensitivity of the topic, the response rate required, the time or money available for the survey, and the population that is to be targeted. Some of the most common data collection methods are described in the following sections.
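The remark in 2.1 about sampling error can be made concrete by simulation. The sketch below draws many simple random samples of a given size, records each sample mean, and reports how much those means vary. That spread (the standard deviation of the sample means) is a common measure of random sampling error. This is a minimal Python sketch under these assumptions:
• The population is simulated (hypothetical readings with mean 25 and standard deviation 3).
• The helper `spread_of_sample_means` is an illustrative name.
• It captures random sampling error only, not bias from haphazard selection.

```python
import random
import statistics


def spread_of_sample_means(population, n, reps, rng):
    """Standard deviation of the sample mean over `reps` simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)


rng = random.Random(1)
# Hypothetical population of 10,000 readings (e.g., water temperatures in degrees C).
population = [rng.gauss(25, 3) for _ in range(10_000)]

for n in (5, 50, 500):
    spread = spread_of_sample_means(population, n, reps=200, rng=rng)
    print(f"n = {n:3d}: typical sampling error of the mean is about {spread:.2f}")
```

The spread shrinks roughly in proportion to 1/√n, so larger random samples support more precise inferences about the population.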
2.2.1 Interviewer enumerated surveys

Interviewer enumerated surveys involve a trained interviewer going to the potential respondent, asking the questions and recording the responses.

The advantages of using this methodology are:
  • provides better data quality
  • special questioning techniques can be used
  • greater rapport established with the respondent
  • allows more complex issues to be included
  • produces higher response rates
  • more flexibility in explaining things to respondents
  • greater success in dealing with language problems

The disadvantages of using this methodology are:
  • expensive to conduct
  • training for interviewers is required
  • more intrusive for the respondent
  • interviewer bias may become a source of error

2.2.2 Web surveys

Web surveys are increasingly popular, although care must be taken to avoid sample selection bias and multiple responses from an individual.

The advantages of this methodology are:
  • cheap to administer
  • private and confidential
  • easy to use conditional questions, and to prompt if there is no response or an inappropriate response
  • can build in live checking
  • can provide multiple language versions

The disadvantages of this methodology are:
  • respondent bias may become a source of error
  • not everyone has access to the internet
  • language and interface must be very simple
  • cannot build up a rapport with respondents
  • resolution of queries is difficult
  • only appropriate when straightforward data can be collected

2.2.3 Mail surveys

Self-enumeration mail surveys are surveys in which the questionnaire is left with the respondent to complete.

The advantages of this methodology are:
  • cheaper to administer
  • more private and confidential
  • in some cases does not require interviewers

The disadvantages of this methodology are:
  • difficult to follow up non-response
  • respondent bias may become a source of error
  • response rates are much lower
  • language must be very simple
  • problems with poor English and literacy skills
  • cannot build up a rapport with respondents
  • resolution of queries is difficult
  • only appropriate when straightforward data can be collected

2.2.4 Telephone surveys

A telephone survey is one in which a potential respondent is phoned and asked the survey questions over the phone.

The advantages of this methodology are:
  • cheap to administer
  • convenient for interviewers and respondents

The disadvantages of this methodology are:
  • interviews easily terminated by the respondent
  • cannot use prompt cards to provide alternatives for answers
  • burden placed on interviewers and respondents
  • biased sample, since only households with phones can be reached

2.2.5 Diaries

Diaries can be used as a format for a survey. In these surveys respondents are directed to record the required information over a predetermined period in the diary, book or booklet supplied.

The advantages of this methodology are:
  • high quality and detailed data from the completed diaries
  • more private and confidential circumstances for the respondent
  • does not require interviewers

The disadvantages of this methodology are:
  • response rates are lower and the diaries are rarely completed well
  • language must be simple
  • can only include relatively simple concepts
  • cannot build up a rapport
  • cannot explain the purpose of survey items to respondents
                                                Face-to-face  Telephone     Mail
Response rates                                  Good          Good          Good
Representative samples
  Avoidance or refusal bias                     Good          Good          Poor
  Control over who completes the questionnaire  Good          Good          Satisfactory
  Gaining access to the selected person         Satisfactory  Good          Good
  Locating the selected person                  Satisfactory  Good          Good
Effects on questionnaire design
  Ability to handle:
    Long questionnaires                         Good          Satisfactory  Satisfactory
    Complex questions                           Good          Poor          Satisfactory
    Boring questions                            Good          Satisfactory  Poor
    Item non-response                           Good          Good          Satisfactory
    Filter questions                            Good          Good          Satisfactory
    Question sequence control                   Good          Good          Poor
    Open ended questions                        Good          Good          Poor
Quality of answers
  Minimize socially desirable responses         Poor          Satisfactory  Good
  Ability to avoid distortion due to:
    Interviewer characteristics                 Poor          Satisfactory  Good
    Interviewer opinions                        Satisfactory  Satisfactory  Good
    Influence of other people                   Satisfactory  Good          Poor
  Allows opportunities to consult               Satisfactory  Poor          Good
  Avoids subversion                             Poor          Satisfactory  Good
Implementing the survey
  Ease of finding suitable staff                Poor          Good          Good
  Speed                                         Poor          Good          Satisfactory
  Cost                                          Poor          Satisfactory  Good

Table 2.1: Advantages and disadvantages of three methods of data collection. Table taken from de Vaus (2001), who adapted it from Dillman (1978).

2.2.6 Ideas for increasing response rates
  1. Provide a reward.
  2. Systematic follow-up.
  3. Keep it short.
  4. Interesting topic.
2.2.7 Archival data

Rather than collecting your own data, you may use some existing data. If you do, keep the following points in mind.

Available information
    Is there sufficient documentation of the original research proposal for which the data were collected? If not, there may be hidden problems in re-using the data.
Geographical area
    Are the data relevant to the geographical area you are studying? For example, what country, city, state or other area do the archival data cover?
Time period
    Are the data relevant to the time period you are studying? Does your research cover recent events, is it historical, or does it look at changes over a specified range of time? Most data are at least a year old before they are released to the public.
Population
    What population do you wish to study? This can refer to a group or groups of people, particular events, official records, etc. In addition, you should consider whether you will look at a specific sample or subset of people, events, records, etc.
Context
    Do the archival data contain the information relevant to your research area?

2.3 Errors in statistical data

In sample surveys there are two types of error that can occur:
  • sampling error, which arises because only a part of the population is used to represent the whole population; and
  • non-sampling error, which can occur at any stage of a sample survey.

It is important to be aware of these errors so that they can be minimized.

2.3.1 Sampling error

Sampling error is the error we make in selecting samples that are not representative of the population. Since it is practically impossible for a smaller segment of a population to be exactly representative of the population, some degree of sampling error will be present whenever we select a sample.
It is important to consider sampling error when publishing survey results, as it gives an indication of the accuracy of the estimate and therefore reflects the importance that can be placed on interpretations.

If sampling principles are carefully applied within the constraints of available resources, sampling error can be accurately measured and kept to a minimum. Sampling error is affected by:
  • sample size
  • variability within the population
  • sampling scheme
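The first two factors can be made concrete with the usual formula for the standard error of a sample mean under simple random sampling, SE = σ/√n, where σ is the population standard deviation and n the sample size. The sketch below is a minimal illustration under those assumptions (σ known, simple random sampling). The optional finite population correction, using a population size N, is not discussed in this section; it is included only to show why a complete enumeration has no sampling error. The function name is hypothetical.

```python
import math


def standard_error(sigma, n, N=None):
    """Standard error of the mean of a simple random sample of size n.

    sigma: population standard deviation (assumed known)
    n:     sample size
    N:     optional population size; if given, the finite population
           correction is applied
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if N is not None and n >= N:
        return 0.0  # the whole population was enumerated: no sampling error
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction
    return se


# Quadrupling the sample size halves the standard error (sigma = 10).
for n in (100, 400, 1600):
    print(n, standard_error(10, n))       # 100 1.0, then 400 0.5, then 1600 0.25

# A more variable population gives a larger error at the same n.
print(standard_error(20, 100))            # 2.0

# A census of a population of 5,000 has no sampling error.
print(standard_error(10, 5000, N=5000))   # 0.0
```

Because the error shrinks with √n rather than with n, each halving of the sampling error costs four times as many observations.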
Generally, larger sample sizes decrease sampling error. To halve the sampling error, the sample size has to be increased fourfold. In fact, sampling error can be completely eliminated by increasing the sample size to include every element in the population.

The population variability also affects the error: more variable populations give rise to larger errors, because estimates calculated from different samples will vary more. The effect of the variability within the population can be reduced by increasing the sample size to make it more representative of the target population.

2.3.2 Non-sampling error

Non-sampling error can be defined as those errors in a survey that are not sampling errors. Non-sampling error is any error not caused by the fact that we have only selected part of the population in the survey. Even if we were to undertake a complete enumeration of the population, non-sampling errors might remain. In fact, as the size of the sample increases, the non-sampling errors may get larger, because of such factors as a possible increase in non-response, interviewer errors and data processing errors.

For the most part we cannot measure the effect that non-sampling errors will have on the results. Because of their nature, these errors cannot be totally eliminated. Perhaps the biggest source of non-sampling error is a poorly designed questionnaire. The questionnaire can influence the response rate achieved in the survey, the quality of responses obtained and consequently the conclusions drawn from survey results.

Some common sources of non-sampling error are discussed in the following paragraphs.

Target population
    Failure to identify clearly who is to be surveyed. This can result in an inadequate sampling frame, imprecise definitions of concepts and poor coverage rules.
Non-response
    A non-response error occurs when the respondents do not reflect the sampling frame.
    This could occur when the people who do not respond to the survey differ from the people who did respond to the survey. This often occurs in voluntary response polls. For example, suppose that in an air bag study we asked respondents to call a 0018 number to be interviewed. Because a 0018 call costs $2 per minute, many drivers may not respond. Furthermore, those who do respond may be the people who have had bad experiences with air bags. Thus the final sample of respondents may not even represent the sampling frame. Other examples:
      • telephone polls miss those people without phones
      • household surveys miss homeless people, prisoners, students in colleges, etc.
      • train surveys only target public transport users and tend to include regular public transport users.
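The air bag example can be illustrated with a small simulation. All the numbers are hypothetical: 10% of drivers are assumed to have had a bad experience, and those drivers are assumed to make the paid call with probability 0.5, against 0.05 for everyone else. The point is only to show how differential response distorts an estimate even when the sampling frame itself is sound; the function name is likewise illustrative.

```python
import random


def simulate_voluntary_response(n_drivers=10_000, p_bad=0.10,
                                respond_if_bad=0.50, respond_if_ok=0.05,
                                seed=42):
    """Return (share of all drivers with a bad experience,
               share of respondents with a bad experience).

    All parameters are hypothetical. By default, drivers with a bad air bag
    experience are ten times as likely to respond as other drivers.
    """
    rng = random.Random(seed)
    # True means the driver had a bad experience with an air bag.
    drivers = [rng.random() < p_bad for _ in range(n_drivers)]
    # Each driver decides independently whether to respond.
    respondents = [bad for bad in drivers
                   if rng.random() < (respond_if_bad if bad else respond_if_ok)]
    true_share = sum(drivers) / len(drivers)
    sample_share = sum(respondents) / len(respondents)
    return true_share, sample_share


true_share, sample_share = simulate_voluntary_response()
print(f"All drivers: {true_share:.1%} had a bad experience")
print(f"Respondents: {sample_share:.1%} had a bad experience")

# Equal response probabilities: the respondents' share stays close to the true share.
print(simulate_voluntary_response(respond_if_bad=0.3, respond_if_ok=0.3))
```

Under these assumptions roughly half of the respondents report a bad experience although only about one driver in ten had one. The bias comes from the difference between the two response probabilities, not from the response rate as such.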
        Manufacturers and advertising agencies often use interviews at shopping malls to gather information about the habits of consumers and the effectiveness of ads. A sample of mall shoppers is fast and cheap. “Mall interviewing is being propelled primarily as a budget issue”, one expert told the New York Times. But people contacted at shopping malls are not representative of the entire population. They are richer, for example, and more likely to be teenagers or retired. Moreover, mall interviewers tend to select neat, safe-looking individuals from the stream of customers. Decisions based on mall interviews may not reflect the preferences of all consumers.

        In 1991 it was claimed that data showed that right-handed persons live on average almost a decade longer than left-handed or ambidextrous persons. The investigators had compared mean ages at death of people who appeared to be survivors as left, right or mixed handed.
          • What is the problem?
The questionnaire
    Poorly designed questionnaires with mistakes in wording, content or layout may make it difficult to record accurate answers. The most effective methods of designing a questionnaire are discussed in Section 2.4. Following these principles will help reduce the non-sampling error associated with the questionnaire.
Interviewers
    If an interviewer is used to administer the survey, their work has the potential to produce non-sampling error. This can be due to the personal characteristics of the interviewer. For example, an elderly person will often be more comfortable giving information to a female interviewer. Other factors which could cause error are the interviewer’s opinions and characteristics, which may influence the respondent’s answers.

        In 1968, one year after a major racial disturbance in Detroit, a sample of black residents was asked:
            Do you personally feel that you can trust most white people, some white people, or none at all?
        Of those interviewed by whites, 35% answered “Most”, while only 7% of those interviewed by blacks gave this answer. Many questions were asked in this study. Only on some topics, particularly black-white trust or hostility, did the race of the interviewer have a strong effect on the answers given. The interviewer was a large source of non-sampling error in this study.
Respondents
    Respondents can also be a source of non-sampling error. They may refuse to answer questions, or provide inaccurate information to protect themselves. They may have memory lapses and/or lack the motivation to answer the questionnaire, particularly if the questionnaire is lengthy, overly complicated or of a sensitive nature. Respondent fatigue is a very important factor.

    Social desirability bias refers to the effect where respondents provide answers which they think are more acceptable, or which they think the interviewer wants to hear. For example, respondents may state that they have a higher income than is actually the case if they feel this will increase their status.
    Respondents may refuse to answer a question which they find embarrassing, or choose a response which prevents them from continuing with the questions. For example, if asked the question “Are you taking oral contraceptive pills for any reason?”, and knowing that if they respond “Yes” they will be asked for more details, respondents who are embarrassed by the question are likely to answer “No”, even if this is incorrect.

    Fatigue can be a problem in surveys which require a high level of commitment from respondents. The level of accuracy and detail supplied may decrease as respondents become tired of recording all the information. Sometimes interviewer fatigue can also be a problem, particularly when the interviewers have a large number of interviews to conduct.
Processing and collection
    Processing and collection errors can be a source of non-sampling error. For example, the results from the survey may be entered incorrectly. The time of year in which the survey is enumerated can also produce non-sampling error. For example, if the survey is conducted in the school holidays, potential respondents with school children could be away or hard to contact.

The Shere Hite surveys

In 1987, Shere Hite published a best-selling book called Women and Love. The author distributed 100,000 questionnaires through various women’s groups, asking questions about love, sex, and relations between women and men. She based her book on the 4.5% of questionnaires that were returned.
  • 95% said they were unhappily married
  • 91% of those who were divorced said that they had initiated the divorce

What are the problems with this research?

    Exercise 1: In Case 2, it was necessary to study the ‘referral’ pattern for palliative care providers: how many patients they send to hospital (for inpatient or outpatient treatment); how many they refer to consultants for specialist comment; how many to community health programs; and so on.
    Two alternative sampling schemes are available: sample a group of palliative care practitioners and study their referral patterns; or sample a group of palliative care patients and study their referral patterns. Discuss the possible advantages and disadvantages of the two schemes.

2.4 Questionnaire design

2.4.1 Introduction

The purpose of a questionnaire is to obtain specific information with tolerable accuracy and completeness. Before the questionnaire is designed, the collection objectives should be defined. These include:
  • clarifying the objectives of the survey
  • determining who is to be interviewed
  • defining the content
  • justifying the content
  • prioritizing the data that are to be collected. This is important as it makes it easier to discard items if the survey, once developed, is too lengthy.

Careful consideration should be given to the content, wording and format of the questionnaire, as one of the largest sources of non-sampling error is poor questionnaire design. This error can be minimized by considering the objectives of the survey and the required output, and then devising a list of questions that will accurately obtain the information required.

2.4.2 Content of the questionnaire

Relevant questions

It is important to ask only questions that are directly related to the objectives of a survey, as a means of minimizing the burden placed on respondents. The concept of a fatigue point, which occurs when respondents can no longer be bothered answering questions, should be recognized, and questions designed so that the respondent is through the form before this point is reached. Towards the end of long questionnaires, respondents may give less thought to their answers and concentrate less on the instructions and questions, thereby decreasing the accuracy of the information they provide. Very long questionnaires can also lead the respondent to refuse to complete the questionnaire. Hence it is necessary to ensure only relevant questions are asked.

Reliable questions

It is important to include questions in a questionnaire that can be easily answered. This objective can be achieved by adhering to the following techniques.

Appropriate recall
    If information is requested by recall, the events should be sufficiently recent or familiar to respondents. People tend to remember what they should have done, have selective memories, and move activities which surround the event into the reference period.
    Minimizing the need for recall improves the accuracy of response.
Common reference periods
    To make it easier for the respondent to answer, use reference periods which match those of the respondent’s records.
Results justify efforts
    The amount of effort to which a respondent goes to obtain the data must be worth it. It is reasonable to accept a respondent’s estimate when calculating the exact figures would make little difference to the outcome.
Filtering
    Respondents should not be asked questions they cannot answer. Filter questions should be asked to exclude respondents from irrelevant questions.
2.4.3 Types of questions

Factual questions
    Information is required from these questions rather than an opinion. For example, respondents could be asked about behaviour patterns (e.g., When did you last visit a General Practitioner?).
Classification or demographic questions
    These are used to gain a profile of the population that has been surveyed and provide important data for analysis.
Opinion questions
    Rather than facts, these questions seek opinions. There are many problems associated with opinion questions:
      • a respondent may not have an opinion/attitude towards the subject, so the response may be provided without much thought;
      • opinion questions are very sensitive to changes in wording;
      • it is impossible to check the validity of responses to opinion questions.
Hypothetical questions
    The “What would you do if . . . ?” type of question. The problems with these questions are similar to those of opinion questions. You can never be certain how valid any answer to a hypothetical question is likely to be.

2.4.4 Answer formats

Questions can generally be classified as one of two types, open or closed, depending on the amount of freedom allowed in answering the question. When deciding which type of question to use, consideration should be given to the kind of information sought, the ease of processing the response, and the availability of the resources of time, money and personnel.

Open questions

Open questions allow respondents to answer the question in their own words. These questions allow any number of possible answers, and they can collect exact values from a wide range of possible values. Hence, open questions are used when the list of possible responses is very long and not obvious.

The major disadvantage of open questions is that they are far more demanding than closed questions, both to answer and to process. These questions are most commonly used where a wide range of responses is expected.
Also, the answers to these questions depend on the respondents’ ability to write or speak as much as on their knowledge. Two respondents might have the same knowledge and opinions, but their answers may seem different because of their varying abilities.
Question                                            Format
Which country makes the best cars?                  Open ended
  ...............................................
Which country makes the best cars?                  Multiple choice questions
  1. USA   2. Germany   3. Japan
Which country makes the best cars?                  Partially closed questions
  1. USA   2. Germany   3. Japan   4. Other (please specify)
For the list provided, indicate which brand/s of    Checklist questions
cars you have owned.
  1. Ford   2. Toyota   3. BMW
I believe Japanese cars are less reliable than      Likert scale (opinion) questions
European cars.
  Strongly agree   Agree   No opinion   Disagree   Strongly disagree
        1            2         3           4               5

Closed questions

Closed questions ask respondents to choose an answer from the alternatives provided. These questions should be used when the full range of responses is known. Closed questions are far easier to process than open questions. The main disadvantage of closed questions is that the reasons behind a particular selection cannot be determined.

There are a number of types of closed questions.
  • Limited choice questions require the respondent to choose one of two mutually exclusive answers, for example yes/no.
  • Multiple choice questions require the respondent to choose from a number of responses provided.
  • Checklist questions allow a respondent to choose more than one of the responses provided.
  • Partially closed questions provide a list of alternatives where the last alternative is “Other, please specify”. These questions are useful when it is difficult to list all possible choices.
  • Opinion (Likert) scale: An opinion scale question seeks to locate a respondent’s opinion on a rating scale with a limited number of points. For example, a five point scale measure of strong and weak attitudes would ask the respondent whether they strongly agree/agree/are neutral/disagree/strongly disagree with a particular statement of opinion, whereas a three point scale would only measure whether they agree, disagree or are neutral. Opinion scales of this sort are called Likert scales. Five point scales are best because:
      –
      –
      –

Response categories

When questions have categories provided, it is important that every response is catered for.

Number of categories
    The quality of the data can be influenced if there are too few categories, as the respondent may have difficulty finding one which accurately describes their situation. If there are too many categories, the respondent may also have difficulty finding one which accurately describes their situation.
Don’t Know
    A ‘Don’t Know’ category can be included so respondents are not forced to make decisions or express attitudes that they would not normally hold. Excluding the option is not usually good; however, it is hard to predict the effect of including it. The decision of whether or not to include a ‘Don’t Know’ option depends, to a large extent, on the subject matter.

        I was gratified to be able to answer promptly, and I did. I said I didn’t know.
        Mark Twain, Life on the Mississippi

2.4.5 Wording of questions

Language

Questions which employ complex or technical language or jargon can confuse or irritate respondents. Respondents who do not understand the question may be unwilling to appear ignorant by asking the interviewer to explain the question or, if an interviewer is not present, may not answer or may answer incorrectly.

Ambiguity

If ambiguous words or phrases are included in a question, the meaning may be interpreted differently by different people. This will introduce errors in the data, since different respondents will in effect be answering different questions.

For example, “Why did you fly to New Zealand on Qantas airlines?”. Most might interpret this question as was intended, but it contains three possible questions, so the response might concern any of these:
  • I flew (rather than another mode of travel) because . . .
  • I went to New Zealand because . . .
  • I selected Qantas because . . .
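Before analysis, closed responses such as the Likert item in the question-format table are usually converted to numeric codes. The sketch below assumes the 1 to 5 coding printed under that example (1 = Strongly agree, 5 = Strongly disagree) and assumes that ‘Don’t Know’ or blank answers are stored as missing values rather than as a scale point. The names `LIKERT_CODES`, `code_response` and `mean_score` are hypothetical.

```python
from statistics import fmean

# Assumed coding, following the five-point example: 1 = Strongly agree ... 5 = Strongly disagree.
LIKERT_CODES = {
    "strongly agree": 1,
    "agree": 2,
    "no opinion": 3,
    "disagree": 4,
    "strongly disagree": 5,
}
MISSING = {"", "don't know"}


def code_response(answer):
    """Return the numeric code for a Likert answer, or None for 'Don't know'/blank."""
    key = answer.strip().lower().replace("\u2019", "'")  # accept curly apostrophes
    if key in MISSING:
        return None
    if key not in LIKERT_CODES:
        raise ValueError(f"unexpected response: {answer!r}")
    return LIKERT_CODES[key]


def mean_score(answers):
    """Mean code over substantive answers; missing answers are left out, not scored."""
    codes = [c for c in map(code_response, answers) if c is not None]
    return fmean(codes) if codes else None


answers = ["Agree", "Strongly disagree", "Don't know", "No opinion", "agree"]
print([code_response(a) for a in answers])  # [2, 5, None, 3, 2]
print(mean_score(answers))                  # 3.0
```

Treating ‘Don’t Know’ as missing, rather than folding it into the middle code, keeps it from being confused with a genuine neutral opinion; whichever choice is made should be recorded with the data.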
Double-barreled questions

When one question contains two concepts, it is known as a double-barreled question. For example, “How often do you go grocery shopping and do you enjoy it?”.

Each concept in the question may have a different answer, or one concept may not be relevant, so respondents may be unsure how to respond. The interpretation of the answers to these questions is almost impossible. Double-barreled questions should be split into two or more separate questions.

Leading questions

Questions which lead respondents to answers can introduce error. For example, the question “How many days did you work last week?”, if asked without first determining whether respondents did in fact work in the previous week, is a leading question. It implies that the person would have been at work. Respondents may answer incorrectly to avoid telling the interviewer that they were not working.

Unbalanced questions

“Are you in favour of euthanasia?” is an unbalanced question because it provides only one alternative. It can be reworded to “Do you favour or not favour euthanasia?”, to give respondents more than one alternative.

Similarly, the use of a persuasive tone can affect the respondent’s answers. Wording should be chosen carefully to avoid a tone that may produce bias in responses.

Recall/memory error

Respondents tend to remember what should have been done rather than what was done. The quality of data collected from recall questions is influenced by the importance of the event to the respondent and the length of time since the event took place. Subjects of greater interest or importance to the respondent, or events which happen infrequently, will be remembered over longer periods and more accurately. Minimizing the recall period also helps to reduce memory bias.

Telescoping is a specific type of memory error. This occurs if the respondent reports events as occurring either earlier or later than they actually occurred.
Error occurs when respondents include details of an event which actually occurred outside the specified reference period.

Sensitive questions

Questions on topics which respondents may see as embarrassing or highly sensitive can produce inaccurate answers. If respondents are required to answer questions with information that might seem socially undesirable, they may provide the interviewer with responses they believe are more ‘acceptable’. If such questions are placed at the beginning of the questionnaire, they could lead to non-response if respondents are unwilling to continue with the remaining questions.

For example, “Approximately how many cans of beer do you consume each week, on average?”
  1. None
  2. 1–3 cans
  3. 4–6 cans
  4. More than 6

A respondent might answer response 2 or 3 rather than admit to consuming the greatest quantity on the scale. Consider extending the range of choices far beyond what is expected, so that the respondent can select an answer closer to the middle and feel more in the normal range.

    In 1980, the New York Times/CBS News Poll asked a random sample of Americans about abortion. When asked “Do you think there should be an amendment to the Constitution prohibiting abortions, or should not there be such an amendment?”, 29% were in favour and 62% were opposed. The rest of the sample were uncertain. The same people were later asked a different question: “Do you believe there should be an amendment to the Constitution protecting the life of the unborn child, or should not there be such an amendment?” Now 50% were in favour and only 39% were opposed.

Acquiescence

This situation arises when there is a long series of questions which respondents answer with the same response category. Respondents get used to providing the same answer and may answer inaccurately.

2.4.6 Questionnaire format

Including an introduction

It can be advantageous to include an introductory statement or explanation at the beginning of a survey. The introduction may include such information as the purpose of the survey or the scope of collection. It will aid the respondent when answering the questions if they know why the information is being sought. The respondent should be given a context in which to frame his or her answers. An assurance of confidentiality will give respondents confidence that the results will not be accessed by unwanted parties.

Question and page numbers

To ensure that the questionnaire can be easily administered by interviewers or respondents, the pages of the questionnaire and the questions should be numbered consecutively with a simple numbering system. Question numbering is a way of providing sign-posts along the way.
They help if remedial action is required later and you want to refer the interviewer or respondent back to a particular place.

Sequencing

The questions in a questionnaire should follow an order which is logical and flows smoothly from one question to the next. The questionnaire layout should have the following characteristics.
Related questions grouped
    Questions which are related should be grouped together and, where necessary, placed into sections. Sections should contain an introductory heading or statement. If possible, question ordering should try to anticipate the order in which respondents will supply information. It is a sign of good survey design if answering one question naturally prompts the answer to the question that follows.
Question ordering
    It is important to be aware that earlier questions can influence the responses to later questions, so the order of questions should be carefully decided. In attitudinal questions, it is important to avoid conditioning respondents in an early question which could then bias their responses to later questions. For example, you should ask about awareness of a concept before any other mention of the concept.

Respondent motivation

Whenever possible, start the questionnaire with easy and pleasant questions to promote interest in the survey and give the respondent confidence in their ability to complete the survey. The opening questions should ensure that the particular respondent is a member of the survey population.

Questions that are perceived as irritating or obtrusive tend to get a low response rate and may effectively trigger a refusal from the respondent. These questions need to be carefully positioned in the questionnaire, where they are least likely to provoke a refusal.

It is also important that respondents are only asked relevant questions. Respondents may become annoyed and uninterested if this does not occur. Include filter questions to direct respondents to skip questions which do not apply to them. Filter questions often identify sub-populations. For example,
    “Do you usually speak English at home?”   Yes (Go to Q34)   No (Go to Q10)

Questionnaire layout

The questionnaire layout should be aesthetically pleasing, so that the layout does not contribute to respondent fatigue.
Things that can interfere with the answering of a questionnaire are: unclear instructions and questions, insufficient space to provide answers, hard-to-read text, difficulty in understanding language, and back-tracking through the form. Many of these things are bad form design and are avoidable.

Only include essentials on the questionnaire form. Keep the amount of ink on the form to the minimum necessary for the form to work properly. Anything that is not necessary contributes to the fatigue point of the respondent and to the subsequent detriment of the data quality.
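On a web or computer-assisted questionnaire, filter questions such as the ‘Do you usually speak English at home?’ example are implemented as routing rules. A minimal sketch follows. The question identifiers are hypothetical: the filter is assumed to be Q9, so that ‘No’ continues to Q10 while ‘Yes’ skips ahead to Q34, and every question without a rule simply moves to the next number.

```python
# Hypothetical routing table: filter question id -> {answer: next question id}.
ROUTING = {
    "Q9": {"Yes": "Q34", "No": "Q10"},  # "Do you usually speak English at home?"
}


def next_question(current, answer, routing=ROUTING):
    """Return the id of the next question to ask after `current`.

    Filter questions jump according to the routing table; all other
    questions move on to the next consecutively numbered question.
    """
    if current in routing:
        rules = routing[current]
        if answer not in rules:
            raise ValueError(f"{answer!r} is not a valid answer to {current}")
        return rules[answer]
    return f"Q{int(current[1:]) + 1}"


print(next_question("Q9", "Yes"))     # Q34
print(next_question("Q9", "No"))      # Q10
print(next_question("Q10", "Greek"))  # Q11
```

Keeping the rules in a table, separate from the question text, makes it easy to check that every answer to a filter question leads somewhere and that respondents are never routed to questions that do not apply to them.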
  38. 38. General layout
Consistency of layout: If the form design is consistent and follows logical patterns, the form filler’s task is easier. Useful patterns include:
 • white spaces for responses
 • using the same question type throughout the form
 • using the same layout throughout the form
 • using a different style, consistently, for instructions or directions.
Type size: A font size between 10 and 12 points is best in most circumstances. Small fonts can cause problems for respondents who do not have perfect vision or ideal working conditions.
Use of all upper-case text: It is best to avoid upper-case text, which has been shown to be hard to read, especially in large amounts. In upper case, words lose their distinctive shapes and become uniform rectangles. Reserve upper case for titles or emphasis, although emphasis can often be achieved just as well by other means, such as bold, italics or a slightly larger type size.
Line length: Lines should be kept short, because the eye has a clear focus range of only a few degrees and needs several movements to scan a line of text. If more than 2 or 3 such movements occur, the eye can become fatigued and tends to lose track of which line it is reading, which leads to backtracking or misinterpretation.
Character and line spacing: It is very important to leave enough space on a form for answers. Research has shown that forms requiring handwritten responses need 7–8 mm between lines and a width of 4–5 mm for each character.
Response layout
Obtaining responses: A popular way of obtaining responses is to use tick boxes. However, it is usually preferable to use a labelled list (e.g., a, b, c, . . . ) and ask respondents to circle their response. This makes coding and data entry easier.
If a written response is required, it is best to provide empty answer spaces, with lines made up of dots.
Positioning of responses: Vertical alignment of responses is preferred to horizontal alignment. It is easier to read up and down a list and select the correct box than to read across the page and locate an item in a horizontal string. Forms with captions to the left of the answer boxes are easier for respondents to complete.
Order of response options: The order of response options needs careful consideration, as it can be a source of bias. The first options presented may be selected because they make an impact on respondents, or because respondents lose concentration and do not hear or read the remaining options. The last options may be chosen because they are easily recalled, particularly when respondents face a long list of options. Long or complex response options may also make recall more difficult and increase the effects due to the order of response options.
  39. 39. Prompt card: If the questionnaire is interviewer based and a number of response options are given for some questions, a prompt card may be appropriate. A prompt card is a list of possible responses to a question, displayed on a separate card that the interviewer shows to respondents. This helps to reduce errors caused by respondents being unable to remember all the options read out. However, respondents with poor eyesight, migrants with limited English, and adults with literacy problems will have difficulty answering accurately.
Exercise 2: (Case 2) The questionnaire on pages 47–48 was an early draft of the questionnaire prepared by the client. The questionnaire on pages 49–51 is a later draft of the questionnaire after I had provided the client with some advice. See if you can determine why each of the changes has been made. How could you further improve the questionnaire?
2.4.7 Pretesting the questionnaire
A pretest of a questionnaire should be considered mandatory. Even if the designer has reviewed the draft questionnaire meticulously on all points of good design, it is still likely to contain faults. Normally, a number of these emerge when the form is used in the field, because the researcher did not fully anticipate what would take place. The only way to detect these faults fully is to administer the survey to the types of respondents who would be sampled in the study.
Each of the types of testing below is used at a different stage of survey development and aims to test different aspects of the survey.
Skirmishing
Skirmishing is the process of informally testing the questionnaire design with groups of respondents.
The questionnaire is basically unstructured and is tested with a group of people who can provide feedback on issues such as each question’s frame of reference, the level of knowledge needed to answer the questions, the range of likely answers, and how respondents formulate their answers. Skirmishing is also used to detect flaws or awkward wording in questionnaires and to test alternative designs. At this stage we may use open-ended response categories to work out likely responses. The questionnaire should be redrafted after skirmishing.
Focus groups
A skirmish tests the questionnaire design on general respondents, whereas a focus group concentrates on a specific audience. For example, a survey studying the effects of living on unemployment benefits could use a group of unemployed people as a focus group. A focus group can also be used to test questions directed at small sub-populations. For example, if we were looking at community services, we might have a filter question to target disabled people. Since few disabled people may be selected in the sample, we need to test those questions on a focus group of disabled people, even though such a group is a biased sample.
  40. 40. Observational studies
In an observational study, respondents complete a draft questionnaire in the presence of an observer. While completing the form, the respondents explain their understanding of the questions and of the method required to provide the information. These studies can identify problem questions through observations, questions asked by the respondents, or the time taken to complete a particular question. Observational studies can also gauge data availability and the most appropriate person to supply the information. It is the form, not the respondent, that is being tested, and this should be stressed to the respondent.
Pilot testing
Pilot testing involves formally testing a questionnaire or survey with a small representative sample of respondents. Semi-closed questions are usually used in pilot testing to gather a range of likely responses, which are then used to develop a more highly structured questionnaire with closed questions. Pilot testing is used to identify any problems associated with the form, such as questionnaire format, length and question wording, and it allows alternative versions of a questionnaire to be compared.
2.5 Data processing
Data processing involves translating the answers on a questionnaire into a form that can be manipulated to produce statistics. In general, this involves coding, editing, data entry, and monitoring the whole data processing procedure. The main aim of checking the various stages of data processing is to produce a data file that is as error-free as possible.
2.5.1 Data coding
Up to this point, the questionnaire has been considered mainly as a means of communication with the respondent. Just as important, the questionnaire is a working document for the transfer of data onto a computer file.
Consequently, it is important to design the questionnaire to facilitate data entry.
Unless all the questions on a questionnaire are “closed” questions, some degree of coding is required before the survey data can be sent for data entry (punching). The appropriate codes should be devised before the questionnaires are processed, and are usually based on the results of pretesting.
Coding consists of labelling the responses to questions (using numerical or alphabetic codes) in order to facilitate data entry and manipulation. Codes should be simple. For example, if Question 1 has four responses, those responses could be given the codes a, b, c and d. The advantage of coding is that data can be stored compactly as short codes rather than as lengthy text descriptions, which would almost certainly be hard to categorize.
Coding is relatively expensive in terms of resources, although automated techniques are continually being developed to reduce this cost. Other options include self-coding, where respondents record the appropriate code themselves, or having the interviewer perform the coding task.
  41. 41. For most questions, the coding frame can be devised before interviewing begins. That is, the likely responses are obvious from previous similar surveys or thorough pilot testing, so those responses and their codes can be printed on the questionnaire. An “Other (Please Specify)” answer code is often added at the end of a question, with space for interviewers to write the answer. The standard instruction to interviewers in doubt about any precodes is to write the answer on the questionnaire in full so that a coder can deal with it later.
2.5.2 Data entry
Ensure that the questionnaire is designed so that data entry personnel have minimal handling of pages. For example, all codes should be on the same side (left or right) of the page. It is advisable to use trained data entry staff to enter the data; they are quicker and more reliable, and therefore more cost-effective.
2.6 Sampling schemes
When you have a clear idea of the aims of the survey, the data requirements and the degree of accuracy required, and have considered the resources and time available, you are in a position to decide on the sample size and on how the sampling units will be selected.
The two qualities most desired in a sample (besides that of providing the appropriate findings) are representativeness and stability. Sample units may be selected in a variety of ways. Sampling schemes fall into two general types: probability and non-probability methods.
2.6.1 Non-probability samples
If the probability of selection for each unit is unknown, or cannot be calculated, the sample is called a non-probability sample. For non-probability samples, since there is no control over the representativeness of the sample, it is not possible to accurately evaluate the precision of estimates (i.e., the closeness of estimates under repeated sampling of the same size).
However, where time and financial constraints make probability sampling infeasible, or where knowing the level of accuracy of the results is not important, non-probability samples do have a role to play. Non-probability samples are inexpensive and easy to run, and no frame is required. This form of sampling is popular amongst market researchers and political pollsters, as many of their surveys are based on a predetermined sample of respondents from certain categories.
One common method of non-probability sampling is voluntary response polling. A general appeal is made (often via television) for people to contact the researcher with their opinion. Voluntary response samples are rarely useful because they over-represent people with strong opinions, most often negative ones.
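The precoding approach of Section 2.5.1 (codes such as a, b, c and d printed beside each option, an “Other (Please Specify)” code, and verbatim answers left in full for a coder) can be sketched in Python. The response labels, the code `x` for “Other”, and the function name `code_responses` are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical precodes for one question, as they might be printed on the form.
CODE_FRAME: Dict[str, str] = {
    "Very satisfied": "a",
    "Satisfied": "b",
    "Dissatisfied": "c",
    "Very dissatisfied": "d",
}
OTHER_CODE = "x"  # hypothetical code for "Other (Please Specify)"


def code_responses(responses: List[str], frame: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Assign a code to each response. Answers that match no precode get
    OTHER_CODE, and their verbatim text is returned separately so that a
    coder can deal with it later."""
    codes: List[str] = []
    for_coder: List[str] = []
    for answer in responses:
        text = answer.strip()  # ignore stray leading or trailing spaces
        if text in frame:
            codes.append(frame[text])
        else:
            codes.append(OTHER_CODE)
            for_coder.append(text)
    return codes, for_coder


if __name__ == "__main__":
    print(code_responses(["Satisfied", "Depends on the day"], CODE_FRAME))
```

Keeping the unmatched answers as a separate list mirrors the instruction that interviewers write such answers in full: the data file stores only short codes, while the free text is kept for later coding.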
  42. 42. 2.6.2 Probability sampling schemes
Probability sampling schemes are those in which the population elements have a known chance of being selected for inclusion in the sample. Probability sampling rigorously adheres to a precisely specified system that permits no arbitrary or biased selection. There are four main types of probability sampling schemes.
Simple Random Sample: If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sampling procedure is called simple random sampling. The sample thus obtained is called a simple random sample. This is the simplest form of probability sample to analyse.
Stratified Random Sample: A stratified random sample is obtained by separating the population elements into non-overlapping groups, called strata, and then selecting a simple random sample from each stratum. This can be useful when a population is naturally divided into several groups. If the strata differ greatly from one another while elements within each stratum are relatively similar, it is possible to obtain more efficient estimators (and therefore more precise results) than would be possible without stratification.
Systematic Sample: A sample obtained by randomly selecting one element from the first k elements in the frame and every kth element thereafter is called a 1-in-k systematic sample with a random start. This is a simple method to apply when the frame is a list of elements. Systematic sampling will give more precise results than simple random sampling when the variance within the systematic sample is larger than the variance of the population. This can occur when the frame is ordered, because each systematic sample then spreads across the full range of the ordering.
Cluster Sample: A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements. The population is divided into clusters, and one or more of the clusters are chosen at random and sampled.
Sometimes every element of a chosen cluster is sampled; on other occasions a simple random sample of elements is taken within each chosen cluster. Cluster sampling is usually done for administrative convenience, and is especially useful if the population has a hierarchical structure.
A comparison of these four sampling schemes appears in the table on the following page.
Example (Case 2): A few years ago, I advised the Department of Health and Community Services on a survey of palliative care patients in Victoria.
Objective: To estimate the proportion of palliative care patients in Victorian hospitals.
Difficulties: What is a “palliative care patient”? Proportion of what?
Target population: Patients in acute beds at the time of the survey?
Survey population: All patients in acute beds in Victorian hospitals except for very small (< 10 bed) country hospitals.
Sampling scheme: Stratified (hospital types) and clustered (hospitals). Random selection of hospitals within each stratum. Total coverage of patients in the selected hospitals.
Sample: All patients in the 18 hospitals selected out of 115 hospitals in Victoria.
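The four probability sampling schemes of Section 2.6.2 can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-sampling library: the function names are hypothetical, the frame is assumed to fit in memory as a list, and a seeded `random.Random` is passed in so that a selection can be reproduced and documented.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(frame: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Draw n elements without replacement; every subset of size n is equally likely."""
    return rng.sample(list(frame), n)


def stratified_sample(
    strata: Dict[str, Sequence[T]], sizes: Dict[str, int], rng: random.Random
) -> Dict[str, List[T]]:
    """Draw an independent simple random sample of sizes[name] from each stratum."""
    return {name: simple_random_sample(units, sizes[name], rng) for name, units in strata.items()}


def systematic_sample(frame: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """1-in-k systematic sample: a random start among the first k elements,
    then every kth element thereafter."""
    start = rng.randrange(k)  # index 0, 1, ..., k - 1
    return list(frame[start::k])


def cluster_sample(
    clusters: Dict[str, Sequence[T]], m: int, rng: random.Random, per_cluster: int = 0
) -> Dict[str, List[T]]:
    """Choose m clusters at random. If per_cluster is 0, every element of each
    chosen cluster is included; otherwise a simple random sample of
    per_cluster elements is drawn within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return {
        name: list(clusters[name]) if per_cluster == 0
        else simple_random_sample(clusters[name], per_cluster, rng)
        for name in chosen
    }


if __name__ == "__main__":
    rng = random.Random(2008)
    frame = list(range(1, 101))  # a hypothetical frame of 100 numbered units
    print(simple_random_sample(frame, 5, rng))
    print(systematic_sample(frame, 10, rng))
```

The palliative care design above combines two of these schemes: within each hospital-type stratum, hospitals are chosen at random and every patient in a chosen hospital is included, which corresponds to calling `cluster_sample` separately for each stratum with `per_cluster=0`.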