Sample size in qualitative research Margarete Sandelowski



Research in Nursing & Health, 1995, 18, 179-183
Focus on Qualitative Methods
Sample Size in Qualitative Research
Margarete Sandelowski
A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. Yet, sample sizes may be too small to support claims of having achieved either informational redundancy or theoretical saturation, or too large to permit the deep, case-oriented analysis that is the raison d’être of qualitative inquiry. Determining adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and purposeful sampling strategy employed, and the research product intended. © 1995 John Wiley & Sons, Inc.
A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. The “logic and power” (Patton, 1990, p. 169) of the various kinds of purposeful sampling used in qualitative research lie primarily in the quality of information obtained per sampling unit, as opposed to their number per se. Moreover, an aesthetic thrust of sampling in qualitative research is that small is beautiful.¹ Yet, inadequate sample sizes can undermine the credibility of research findings. There are no computations or power analyses that can be done in qualitative research to determine a priori the minimum number and kinds of sampling units required, but there are factors, including the aim of sampling and the type of purposeful sampling and research method employed, which researchers can consider to help them decide whether they have collected enough data. These factors are the subject of this article.
¹ I am indebted to one of the anonymous reviewers of this article for the phrasing “small is beautiful.”
Margarete Sandelowski, PhD, RN, is a professor, Department of Women’s and Children’s Health, School of Nursing, University of North Carolina at Chapel Hill. This article is part of the ongoing series, Focus on Qualitative Methods, edited or contributed by Dr. Sandelowski. This article was received on September 7, 1994, revised, and accepted for publication November 28, 1994. Requests for reprints should be addressed to Dr. Sandelowski, University of North Carolina at Chapel Hill, #7460 Carrington Hall, Chapel Hill, NC 27599-7460. © 1995 John Wiley & Sons, Inc. CCC 0160-6891/95/020179-05
NEITHER SMALL NOR LARGE, BUT TOO SMALL OR TOO LARGE
Adequacy of sample size in qualitative research is relative, a matter of judging a sample neither small nor large per se, but rather too small or too large for the intended purposes of sampling and for the intended qualitative product. A sample size of 10 may be judged adequate for certain kinds of homogeneous or critical case sampling, too small to achieve maximum variation of a complex phenomenon or to develop theory, or too large for certain kinds of narrative analyses. Reported sample sizes are often too small to support claims of having achieved either informational redundancy (Lincoln & Guba, 1985) or theoretical saturation (Strauss & Corbin, 1990).
Impatience, an a priori commitment to what will be seen, or a disinclination to see any more may incline researchers to stop sampling prematurely. Seeing nothing new in newly sampled units or feeling comfortable that a theoretical category has been saturated are functions involving the recognition of what is there and what can be made out of the data already collected, and then deciding whether it is sufficient to create an intended product. These functions are acquired through experience. For example, I have noticed in my own development and that of students with whom I have worked that beginning qualitative researchers often require more sampling units than more experienced researchers to “see” and to “make.” One expert qualitative researcher (P. Stern, personal communication, 1989) intimated that we often have all the data we will need in the very first pieces of data we collect, but that we do not (or cannot) know that until we collect more. Ultimately, information can be deemed redundant or theoretical lines deemed saturated only for now (Morse, 1989).
Conversely, sample sizes may be too large to support claims to having completed detailed analyses of data, especially the microanalysis demanded by certain kinds of narrative and observational studies. Even in qualitative projects aimed at explicating regularities across pieces of data, a high premium is still placed on discerning the particularities or idiosyncrasies presented by each piece of data. While qualitative studies may involve what are considered large sample sizes (over 50), qualitative analysis is generically about maximizing understanding of the one in all of its diversity; it is case-oriented, not variable-oriented (Ragin & Becker, 1989). Any sample size interfering with the case-oriented thrust of qualitative work can, accordingly, be judged too large.
ISSUES IN PURPOSEFUL SAMPLING
One of the major differences between qualitative and quantitative research approaches is that qualitative approaches typically involve purposeful sampling, while quantitative approaches usually involve probability sampling (Kuzel, 1992; Morse, 1986, 1989; Patton, 1990). Patton (1990) described 14 different types of purposeful sampling, involving the selection for in-depth study of typical, atypical, or, in some way, exemplary “information-rich cases” (p. 169). Researchers in both domains of inquiry often have to resort to sampling they know is less than ideal for their purposes, but qualitative researchers value the deep understanding permitted by information-rich cases and quantitative researchers value the generalizations to larger populations permitted by random and statistically representative samples. Although a sample of one will never be sufficient to permit generalization of findings to populations, it may be sufficient to permit the valuable kind of generalizations that can be made from and about cases, variously referred to as idiographic, holographic, naturalistic, or analytic generalizations (Firestone, 1993; Lincoln & Guba, 1985; Ragin & Becker, 1992; Simons, 1980; Stake & Trumbull, 1982).
In qualitative research, events, incidents, and experiences, not people per se, are typically the objects of purposeful sampling (Miles & Huberman, 1994; Strauss & Corbin, 1990). People, in addition to sites, artifacts, documents, and even data that have already been collected are sampled for the information they are likely to yield about a particular phenomenon. Sample size in qualitative research may refer to numbers of persons, but also to numbers of interviews and observations conducted or numbers of events sampled.
People are certainly central in all kinds of inquiry approaches in the health sciences, but they enter qualitative studies primarily by virtue of having direct and personal knowledge of some event (e.g., illness, pregnancy, life transition) that they are able and willing to communicate to others and only secondarily by virtue of demographic characteristics (e.g., age, race, sex).
People Versus Purpose
When qualitative researchers decide to seek people out because of their age or sex or race, it is because they consider them good sources of information that will advance them toward an analytic goal and not because they wish to generalize to other persons of similar age, sex, or race. That is, a demographic variable, such as sex, becomes an analytic variable; persons of one or the other sex are selected for a study because, by virtue of their sex, they can provide certain kinds of information. Accordingly, only as many persons of a particular sex are included in a study as is necessary to obtain that information. There is no mandate to have equivalent numbers of women or men or numbers of persons of each sex in the proportions in which they appear in a certain population.
Sampling on the basis of demographic characteristics presents something of a problem in achieving both informational and size adequacy in qualitative studies. There is currently a strong impulse (and federal mandate) to eliminate gender, race/ethnicity, and class bias in research by including members of minority or traditionally disempowered groups typically underrepresented in research, and by including women and men typically underrepresented in certain domains of research, such as men in family studies and women in studies of heart disease. Trost (1986) described a “statistically nonrepresentative stratified” sampling strategy whereby researchers can select persons varying in demographic characteristics to achieve representative coverage and inclusion. That is, while the sample is statistically nonrepresentative, it is informationally representative in that data will be obtained from persons who can stand for other persons with similar characteristics. In her illustration involving a study of families with teenagers, five sets of naturally and artificially dichotomized variables (one- or two-parent family, one or two or more children, housed in an apartment or home, with a high or low income, and with a male or female teenager) were combined to yield 32 kinds of families to be sampled. A similar kind of sampling plan can be used to ensure inclusion of females and males, and persons varying in social class, race, cultural affiliation, religion, or other dimension. Although this kind of sampling accommodates a new, laudable, and necessary moral consciousness concerning underrepresented and, therefore, often misrepresented groups by partially accommodating the logic of probability sampling, it may wholly contravene the logic of purposeful sampling.
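The count of 32 kinds of families in the illustration above follows from crossing five two-level variables (2 × 2 × 2 × 2 × 2 = 32). The short Python sketch below enumerates those cells. It is only an illustration of the arithmetic: the dictionary keys, the level labels, and the function name `sampling_cells` are hypothetical names chosen here, not the coding used in the original study.

```python
from itertools import product

# Five dichotomized variables from the family-study illustration.
# Keys and level labels are illustrative assumptions, not the study's own coding.
variables = {
    "parents": ("one-parent", "two-parent"),
    "children": ("one child", "two or more children"),
    "housing": ("apartment", "home"),
    "income": ("high income", "low income"),
    "teenager": ("male teenager", "female teenager"),
}


def sampling_cells(variables):
    """Return every combination of levels as a list of dicts (one dict per cell)."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


cells = sampling_cells(variables)
print(len(cells))  # 2 ** 5 = 32 kinds of families
```

Each additional dichotomized variable doubles the number of cells, which is one way to see how quickly a plan of this kind can push a sample past the size at which deep, case-oriented analysis remains feasible.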
Strictly speaking, sampling for variation in race, class, gender, or other such background or person-related characteristics ought to be done in qualitative studies when they are deemed analytically important and where the failure to sample for such variation would impede understanding or invalidate findings (Cannon, Higginbotham, & Leung, 1988). Deciding a priori that a sample will include a certain number or percentage of individuals in various demographic groups may meet federal and other mandates for inclusion of traditionally excluded persons, but it may also result in a sample with a kind of variation that has little analytic significance or detracts from analysis goals (Morse, 1989). More importantly, such a sample may be too small to address adequately the analytic importance of such factors as gender or race, or, alternatively, too large to favor the deep analysis that qualitative projects mandate.
One way to resolve this dilemma is to design studies in which a phenomenon is investigated in one group at a time (either simultaneously or sequentially). The design for such studies will include more than one purposeful sampling strategy: for example, homogeneous and maximum variation sampling, where person-related homogeneity is maintained while variation in the target phenomenon is sought. After a series of such studies has been completed, a larger synthesis of findings can be undertaken in which the researcher can more adequately address the question of whether and how a variable such as gender is important in understanding a phenomenon.
SAMPLE SIZE IN DIFFERENT KINDS OF PURPOSEFUL SAMPLING
Different kinds of purposeful sampling require different minimum sample sizes. For example, in deviant case sampling, where the intention is to understand a very unusual or atypical manifestation of some phenomenon, one case may be sufficient. Yet, even a sample of one requires within-case sampling (Miles & Huberman, 1994).
The researcher must decide which of the varieties of data concerning the case to sample to explicate its atypicality. This is especially evident in cases involving aggregates of one, such as a family, community, or organization. Even when an individual is the focal one, the researcher must sample from the wealth of data obtainable from and about that individual. In short, any one case offers a variety of data that must be sampled in sufficient quantity to make the case.
Maximum variation is one of the most frequently employed kinds of purposeful sampling in qualitative nursing research and typically requires the largest minimum sample size of any of the purposeful sampling strategies. As in any kind of sampling, the more variability there is within the confines of a qualitative project, the more sampling units the researcher will require to reach informational redundancy or theoretical saturation. Researchers wanting maximum variation in their sample must decide what kind(s) of variation they want to maximize and when to maximize each kind. One kind of variation already described is demographic variation, where variation is sought on generally people-related characteristics. A second kind of variation is phenomenal variation, or variation on the target phenomenon under study. For example, the target phenomenon in a study of couples who have obtained positive fetal diagnoses is diagnosis, which varies on such dimensions as type and time of diagnosis, and the instrumentation used to make it. Like the decision to seek demographic variation, the decision to seek phenomenal variation is often made a priori in order to have representative coverage of variables likely to be important in understanding how diverse factors configure a whole. This kind of sampling is also referred to as selective or criterion sampling, where sampling decisions are made going into a study on “reasonable” grounds, rather than on analytic grounds after some data have already been collected (Glaser, 1978, p. 37; Schatzman & Strauss, 1973).
A third kind of variation is theoretical variation, or variation on a theoretical construct that is associated with theoretical sampling, or the sampling on analytic grounds characteristic of grounded theory studies. A theoretical sampling strategy is employed to fully elaborate and validate theoretically derived variations discerned in the data. Initial sampling for phenomenal variation permits these theoretical variations to be identified. A program of research employing grounded theory typically begins with a selective or criterion sampling strategy aimed at phenomenal variation and then proceeds to theoretical sampling (Sandelowski, Holditch-Davis, & Harris, 1992).
Researchers control the number of sampling units required to achieve informational redundancy or theoretical saturation by deciding which category of variation to maximize and minimize. This decision is a matter of fitting the sampling strategy to the purpose of and method chosen for a particular study and appraising the resources (including number of investigators and financial support) available to conduct the study.
For example, purposeful sampling for demographic homogeneity and selected phenomenal variation is a way a researcher working alone with limited resources can reduce the minimum number of sampling units required within the confines of a single research project, but still produce credible and analytically and/or clinically significant findings.
SAMPLE SIZES FOR DIFFERENT QUALITATIVE METHODS
Just as different purposeful sampling strategies require different minimum sample sizes, different qualitative methods require different minimum sample sizes. Morse (1994) has recommended that phenomenologies directed toward discerning the essence of experiences include about six participants, ethnographies and grounded theory studies, about 30 to 50 interviews and/or observations, and qualitative ethological studies, about 100 to 200 units of observation. Additional considerations in matching sample size to method are within-method diversity and the multiple uses of a method.
Phenomenology offers a good illustration of how within-method diversity and the particular use to which a method is put can alter the requirements for sample size. In a phenomenological case study, one case can be sufficient to show something about an experience that a researcher deems significant for special display (e.g., Wertz, 1983). One case will not be sufficient, however, if the researcher’s intention is to describe invariant or essential features of an experience. For example, a phenomenological study, as interpreted by Van Kaam (1959), will likely require 10 to 50 descriptions of a target experience in order to discern its necessary and sufficient constituents. When phenomenological techniques are used in the service of a goal other than to produce a phenomenology, such as generating items for an instrument, at least 25 descriptions of an experience will likely be required.
SAMPLE SIZES IN COMBINED QUALITATIVE AND QUANTITATIVE STUDIES
Studies combining qualitative and quantitative approaches involve additional considerations in determining sufficient sample size. Indeed, so-called methodologically triangulated studies present researchers with many dilemmas (beyond the scope of this article), the resolution of which depends on the researcher’s stance concerning the compatibility of the philosophies and practices of qualitative and quantitative inquiry. With respect to sampling, the logics of probability and purposeful sampling are arguably sufficiently irreconcilable in most cases to preclude using the same subjects for both quantitative and qualitative purposes (Morse, 1991). Subjects selected for the purposes of statistical representativeness may not fulfill the informational needs of the study, while participants selected for information purposes do not meet the requirement of statistical representativeness. Accordingly, whether primarily quantitative or qualitative, or whether designed for purposes of completeness or confirmation (Breitmayer, Ayres, & Knafl, 1993), such combination studies would require two samples drawn simultaneously or sequentially according to the two logics of sampling.
Yet, it can also be argued that among persons chosen according to the logic of probability sampling, there will likely be articulate informants whose selection for the qualitative portion of a combined study can be justified as purposeful. The purposeful sample would have to be expanded only if the data obtainable from the participants already sampled were deemed informationally insufficient. Similarly, no additional sampling may be necessary in studies where further information obtainable from standardized instruments is desired about a purposefully drawn sample. The caveat here is that the researcher use the data from these instruments for purposes of fuller description, rather than to draw statistical inferences.
CONCLUSION
Determining an adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and sampling strategy employed, and the research product intended. Numbers have a place in ensuring that a sample is fully adequate to support particular qualitative enterprises. A good principle to follow is: An adequate sample size in qualitative research is one that permits, by virtue of not being too large, the deep, case-oriented analysis that is a hallmark of all qualitative inquiry, and that results in, by virtue of not being too small, a new and richly textured understanding of experience.
REFERENCES
Breitmayer, B. J., Ayres, L., & Knafl, K. A. (1993). Triangulation in qualitative research: Evaluation of completeness and confirmation purposes. Image: Journal of Nursing Scholarship, 25, 237-243.
Cannon, L. W., Higginbotham, E., & Leung, M. L. (1988). Race and class bias in qualitative research on women. Gender & Society, 2, 449-462.
Firestone, W. A. (1993). Alternative arguments for generalizing from data as applied to qualitative research. Educational Researcher, 22, 16-23.
Glaser, B. G. (1978). Theoretical sensitivity: Advances in the methodology of grounded theory. Mill Valley, CA: Sociology Press.
Kuzel, A. J. (1992). Sampling in qualitative inquiry. In B. F. Crabtree & W. L. Miller (Eds.), Doing qualitative research (pp. 31-44). Newbury Park, CA: Sage.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Morse, J. M. (1986). Quantitative and qualitative research: Issues in sampling. In P. L. Chinn (Ed.), Nursing research methodology: Issues and implementation (pp. 181-193). Rockville, MD: Aspen.
Morse, J. M. (1989). Strategies for sampling. In J. M. Morse (Ed.), Qualitative nursing research: A contemporary dialogue (pp. 117-131). Rockville, MD: Aspen.
Morse, J. (1991). Approaches to qualitative-quantitative methodological triangulation. Nursing Research, 40, 120-123.
Morse, J. M. (1994). Designing funded qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 220-235). Thousand Oaks, CA: Sage.
Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage.
Ragin, C. C., & Becker, H. S. (1989). How the microcomputer is changing our analytic habits. In G. Blank, J. L. McCartney, & E. Brent (Eds.), New technology in society: Practical applications in research and work (pp. 47-55). New Brunswick, NJ: Transaction.
Ragin, C. C., & Becker, H. S. (1992). What is a case? Exploring the foundations of social inquiry. Cambridge: Cambridge University Press.
Sandelowski, M., Holditch-Davis, D., & Harris, B. G. (1992). Using qualitative and quantitative methods: The transition to parenthood of infertile couples. In J. F. Gilgun, K. Daly, & G. Handel (Eds.), Qualitative methods in family research (pp. 301-322). Newbury Park, CA: Sage.
Schatzman, L., & Strauss, A. (1973). Field research: Strategies for a natural sociology. Englewood Cliffs, NJ: Prentice-Hall.
Simons, H. (Ed.). (1980). Towards a science of the singular: Essays about case study in educational research and evaluation. Norwich: University of East Anglia, Center for Applied Research in Education.
Stake, R. E., & Trumbull, D. J. (1982). Naturalistic generalizations. Review Journal of Philosophy and Social Science, 7, 1-12.
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Trost, J. E. (1986). Statistically nonrepresentative stratified sampling: A sampling technique for qualitative studies. Qualitative Sociology, 9, 54-57.
Van Kaam, A. L. (1959). Phenomenal analysis: Exemplified by a study of the experience of “really feeling understood.” Journal of Individual Psychology, 15, 66-72.
Wertz, F. J. (1983). From everyday to psychological description: Analyzing the moments of a qualitative data analysis. Journal of Phenomenological Psychology, 14, 197-241.