UCLA Previously Published Works
Title: On the Concept of Snowball Sampling
Permalink: https://escholarship.org/uc/item/90p8j560
Authors: Handcock, MS; Gile, KJ
Publication Date: 2016-10-25
Peer reviewed
13

COMMENT: ON THE CONCEPT OF SNOWBALL SAMPLING

Mark S. Handcock, University of California, Los Angeles
Krista J. Gile, University of Massachusetts, Amherst

The need for notes by Goodman (2011) and Heckathorn (2011) reflects a phenomenon in the sociology of science: multidisciplinary fields tend to produce a plethora of inconsistent terminology. Often the meaning of a term evolves over time, or different terms are used for the same concept. More confusing is the use of the same term for different concepts. As the two notes point out, the term "snowball sampling" suffers from this treatment.

The term "snowball sampling" has likely been in informal use for a long time, and it certainly predates Coleman (1958) and Trow (1957). The earliest systematic work dates to the 1940s from the Columbia Bureau of Applied Social Research, led by Paul Lazarsfeld. The bureau became interested in the empirical study of personal influence via media (Barton 2001). This led to the consideration of interpersonal environments and to the identification of opinion leaders and followers. However, standard sampling of individuals was regarded as ineffective in studying the relations between opinion leaders and followers, because pairs related in this way were seldom both selected in the sample (Lazarsfeld et al. 1944:49-50). To address this, Robert Merton asked individuals in an initial diverse sample to name the people who influenced them. From these, a second wave of influential people was interviewed as a
snowball sample (Merton 1949). This approach was expanded in a panel survey of women in a Midwestern town in 1945 (Katz and Lazarsfeld 1955). Barton (2001) provides a history of the work of the bureau that is still relevant to today's study of social media.

Trow's objective was to understand the support for antidemocratic popular movements. To do this he conducted an empirical study of the political orientations and behaviors of men in Bennington, Vermont, in 1954 with particular focus on their support for Senator McCarthy. Trow conducted a snowball sample over the friendship networks of the men starting from arbitrarily chosen lists of employees and occupational groups (Trow 1957:297). He is very clear that this does not produce a representative sample, and goes on to provide a discussion of the issues with network sampling that is still relevant today (Trow 1957:290-95). He surmises: "The resulting sample, while not meant to be representative of any specific population, nevertheless includes representatives of all the important occupational groups,...."

Following on from these foundations, Coleman, Katz, and Menzel (1957) used the approach to collect information on influence patterns among physicians. Coleman (1958) is now the primary reference for the meaning of snowball sampling. He defines it in this way: "Snowball sampling: One method of interviewing a man's immediate social environment is to use the sociometric questions in the interview for sampling purposes" and describes Trow's work as the example.

Acknowledging Coleman (1958), Goodman (1961) introduced "s stage k name snowball sampling," a specific form of snowball sampling. Goodman's formulation requires an initial sample drawn using a probability method on a known sampling frame. It also fixes parameters of the sampling process: the number of links followed from each participant (k) and the number of waves of the sample (s).
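The roles of the frame, s, and k in such a design can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not Goodman's own procedure or notation: the function name `snowball_sample`, the `nominations` dictionary, and the toy ring network are hypothetical, wave 0 is assumed to be a simple random sample from the frame, and the first k names on each person's list are assumed to be the ones followed.

```python
import random


def snowball_sample(nominations, frame, n_seeds, s, k, rng):
    """Return a list of waves [wave_0, ..., wave_s], each a set of people.

    nominations: dict mapping each person to an ordered list of people they
                 name (hypothetical stand-in for sociometric answers).
    frame:       list of everyone eligible for the initial sample.
    n_seeds:     size of the initial probability sample (wave 0).
    s:           number of waves traced after the initial sample.
    k:           number of names followed from each participant.
    """
    # Wave 0: simple random sample without replacement from the known frame.
    waves = [set(rng.sample(frame, n_seeds))]
    seen = set(waves[0])
    for _ in range(s):
        new_wave = set()
        for person in waves[-1]:
            # Follow at most k nominations from this participant.
            for named in nominations.get(person, [])[:k]:
                if named not in seen:
                    new_wave.add(named)
        seen |= new_wave
        waves.append(new_wave)
    return waves


# Toy ring of 10 people; each person names the next two around the ring.
ring = {i: [(i + 1) % 10, (i + 2) % 10] for i in range(10)}
waves = snowball_sample(ring, frame=list(ring), n_seeds=1, s=2, k=1,
                        rng=random.Random(0))
seed = next(iter(waves[0]))
print([sorted(w) for w in waves])
```

In this sketch only the wave-0 line depends on the sampling frame; the link-tracing loop itself does not.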
In this work, Goodman develops a rigorous statistical approach to estimating certain relational features (number of mutual ties, triangles, etc.) based on the resulting sample. Just as Lazarsfeld and colleagues followed links because they were interested in studying, and therefore sampling, relationships rather than individuals, Goodman's use of link-tracing is motivated by improvements in efficiency allowed by oversampling relations most likely involved in the structures he is studying.

More recently, the term "snowball sampling" has been taken to refer to a convenience sampling mechanism with motivation more like
that of Trow: collecting a sample from a population in which a standard sampling approach is either impossible or prohibitively expensive, for the purpose of studying characteristics of individuals in the population (e.g., Biernacki and Waldorf 1981). Such settings are often hard-to-reach populations, characterized by the lack of a serviceable sampling frame. In such cases, an initial probability sample is either impossible or impractical, such that the initial sample is drawn by a convenience mechanism, dooming the full sample to nonprobability sample status. In many such hard-to-reach populations, link-tracing sampling is an effective means of collecting data on population members. For this reason, this latter nonprobabilistic usage of snowball sampling is most common in practice, although less common in the statistical literature, which favors the probabilistic formulations.

The tension between these two uses of snowball sampling is highlighted in Thompson (2002), a definitive textbook:

"The term 'snowball sampling' has been applied to two types of procedures related to network sampling. In one type..., a few identified members of a rare population are asked to identify other members of the population, those so identified are asked to identify others, and so on, for the purpose of obtaining a nonprobability sample or for constructing a frame from which to sample. In the other type (Goodman 1961), individuals in the sample are asked to identify other individuals, for a fixed number of stages, for the purpose of estimating the number of mutual relationships or social circles in the population" (p. 183).

Other definitions of snowball sampling are consistent with this duality in usage (Snijders 1992:59).

Respondent-driven sampling (RDS, introduced by Heckathorn and colleagues, e.g., Heckathorn 1997) is a newer variant of link-tracing network sampling, which brings to a head the tension between these two usages.
This is because RDS is a practical sampling method in hard-to-reach populations, beginning with a convenience sample, but it aims to approximate a probability sample over time. Note that it is possible
for the seeds in RDS to be chosen randomly even in applications to hard-to-reach populations. For example, they could be selected based on a spatial sampling frame. RDS is not a variant of either usage of snowball sampling, nor is the reverse true. Because of the confusion surrounding this term, in Gile and Handcock (2010) we prefer, and use throughout that paper, the more precise broad category "link-tracing sampling" while paying homage to the intellectual descent of the methods from snowball sampling.

It is precisely the tension between the two usages of snowball sampling that makes RDS a fruitful area for ongoing research. RDS pairs the practical implementation of a convenience sample with the hope of recovering something like a probability sample. Gile (2008) and Gile and Handcock (2010) are the first works to systematically evaluate the statistical properties of current estimators based on RDS data. Gile (2011) proposes a new estimator that adjusts for the bias introduced by the with-replacement assumption of these estimators. It is also sometimes possible to adjust for a convenience sample of seeds. For example, Gile and Handcock (2011) extend the estimator of Gile (2011) to correct for the bias introduced by seed selection in the presence of homophily.

The issue here, then, is to recognize the different uses of the term "snowball sampling." A good solution is for scientists to be as clear as possible in defining the meaning of terms upon first use in each manuscript. There is enough confusion in the various literatures to make this good practice.

REFERENCES

Barton, Allen. 2001. "Paul Lazarsfeld as Institutional Inventor." International Journal of Public Opinion Research 13:245-69.
Biernacki, Patrick, and Dan Waldorf. 1981. "Snowball Sampling: Problems and Techniques of Chain Referral Sampling." Sociological Methods and Research 10:141-63.
Coleman, James S. 1958. "Relational Analysis: The Study of Social Organizations with Survey Methods." Human Organization 17:28-36.
Coleman, James S., Elihu Katz, and Herbert Menzel. 1957. "The Diffusion of an Innovation Among Physicians." Sociometry 20:253-70.
Gile, Krista J. 2008. "Inference from Partially-Observed Network Data." PhD dissertation, Department of Statistics, University of Washington, Seattle.
Gile, Krista J. 2011. "Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation." Journal of the American Statistical Association 106(493):135-46.
Gile, Krista J., and Mark S. Handcock. 2010. "Respondent-Driven Sampling: An Assessment of Current Methodology." Pp. 285-327 in Sociological Methodology, vol. 40, edited by Tim Futing Liao. Hoboken, NJ: Wiley-Blackwell.
Gile, Krista J., and Mark S. Handcock. 2011. "Network Model-Assisted Inference from Respondent-Driven Sampling Data." Available at http://arxiv.org/abs/1108.0298.
Goodman, Leo A. 1961. "Snowball Sampling." Annals of Mathematical Statistics 32:148-70.
Heckathorn, Douglas D. 1997. "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations." Social Problems 44:174-99.
Katz, Elihu, and Paul F. Lazarsfeld. 1955. Personal Influence. New York: Free Press.
Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet. 1944. The People's Choice: How the Voter Makes Up His Mind in a Presidential Campaign. New York: Duell, Sloan and Pearce.
Merton, Robert K. 1949. "Patterns of Influence: A Study of Interpersonal Influence and Communications Behavior in a Local Community." Pp. 180-219 in Communications Research, 1948-49, edited by Paul F. Lazarsfeld and Frank Stanton. New York: Harper.
Snijders, Thomas A. B. 1992. "Estimation on the Basis of Snowball Samples: How to Weight?" Bulletin Methodologie Sociologique 36:59-70.
Thompson, Steven K. 2002. Sampling. 2nd ed. New York: Wiley.
Trow, Martin. [1957] 1980. Right-Wing Radicalism and Political Intolerance. New York: Arno Press.