Cluster sampling is a type of sampling method in which we split a population into clusters, then randomly select some of the clusters and include all members from those clusters in the sample.
For example, suppose a company that gives whale-watching tours wants to survey its customers. Out of ten tours they give one day, they randomly select four tours and ask every customer about their experience.
This is an example of cluster sampling.
An extension of this is known as two-stage cluster sampling, which uses the following steps:
Step 1: Split a population into clusters, then randomly select some of the clusters.
Step 2: Within each chosen cluster, randomly select some of the members to be included in the survey.
For example, the whale-watching company may randomly select four tours and then within each of those tours they may randomly select a subset of customers to be included in the survey.
In this particular example, we randomly select four clusters and then within each cluster we randomly select four out of the seven customers to be included in the survey.
Why Use Two-Stage Cluster Sampling?
The benefit of cluster sampling is that it offers a far more convenient way to collect a sample compared to other probability sampling methods, especially when members of a population are spread out across a wide geographical area.
Two-stage cluster sampling takes this a step further by only including some members from each randomly selected cluster to be in the final sample.
For example, suppose we would like to conduct a survey on the opinions of teachers in California about a certain topic.
Since California is so massive, it could be helpful to first break up the state into clusters (perhaps “counties”) and then randomly select teachers from only certain schools within each county to be included in the survey.
This approach allows us to gather a sample much more quickly compared to taking a simple random sample of all teachers in California.
And because cluster sampling is a probability sampling method (i.e. each member of the target population has an equally likely chance of being included in the sample), it has a high probability of producing a sample that is representative of the overall population.
Bonus: For a real-life example of two-stage cluster sampling, check out this study that used two-stage cluster sampling to obtain a sample to estimate mortality rates in Iraq.
Additional Resources
Introduction to Sampling Methods
What is Multistage Sampling?
What is Sampling Variability?