In statistics, we’re often interested in using samples to draw inferences about populations through hypothesis tests or confidence intervals.
Most of the formulas that we use in hypothesis tests and confidence intervals assume that the sampling distribution of the statistic we compute from a sample (for example, the sample mean) is approximately normal.
However, in order to safely make this assumption we need to make sure our sample size is large enough. Specifically, we need to make sure that the Large Sample Condition is met.
The Large Sample Condition: The sample size is at least 30.
Note: In some textbooks, a “large enough” sample size is defined as at least 40 but the number 30 is more commonly used.
When this condition is met, it can be assumed that the sampling distribution of the sample mean is approximately normal. This assumption allows us to use samples to draw inferences about the populations from which they came.
The number 30 is a rule of thumb that comes from the Central Limit Theorem, which says that the sampling distribution of the sample mean gets closer to a normal distribution as the sample size increases. You can read more about that in this blog post.
Example: Verifying the Large Sample Condition
Suppose a certain machine creates cookies. The distribution of the weight of these cookies is skewed to the right with a mean of 10 ounces and a standard deviation of 2 ounces. If we take a simple random sample of 100 cookies produced by this machine, what is the probability that the mean weight of the cookies in this sample is less than 9.8 ounces?
In order to answer this question, we can use the Normal CDF Calculator, but we first need to verify that the sample size is large enough to assume that the sampling distribution of the sample mean is approximately normal.
In this example, our sample size is n = 100, which is much larger than 30. Even though the true distribution of the weight of the cookies is skewed to the right, our sample size is “large enough” that we can assume the sampling distribution of the sample mean is approximately normal. Thus, we would be safe to use the Normal CDF Calculator to solve this problem.
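As a rough sketch of the calculation itself, the probability can also be computed with Python's standard library instead of the calculator. This assumes the normal approximation holds, so the sample mean is treated as normal with mean 10 and standard error 2 / sqrt(100) = 0.2. The function name prob_sample_mean_below is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(threshold, pop_mean, pop_sd, n):
    """Approximate P(sample mean < threshold) under the normal approximation.

    Hypothetical helper for illustration; it assumes the Large Sample
    Condition is met so the sample mean is roughly normal.
    """
    # The standard error is the standard deviation of the sample mean.
    standard_error = pop_sd / sqrt(n)
    return NormalDist(mu=pop_mean, sigma=standard_error).cdf(threshold)

# Cookie example: mean 10 oz, standard deviation 2 oz, n = 100, threshold 9.8 oz.
p = prob_sample_mean_below(9.8, pop_mean=10, pop_sd=2, n=100)
print(round(p, 4))  # 0.1587
```

Here 9.8 ounces sits one standard error below the mean (z = -1), so the probability is about 0.1587.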
Modifications to the Large Sample Condition
Often a sample size is considered “large enough” if it’s greater than or equal to 30, but this number can vary a bit based on the underlying shape of the population distribution.
In particular:
- If the population distribution is symmetric, sometimes a sample size as small as 15 is sufficient.
- If the population distribution is skewed, generally a sample size of at least 30 is needed.
- If the population distribution is extremely skewed, then a sample size of 40 or higher may be necessary.
Depending on the shape of the population distribution, you may need a sample size larger or smaller than 30 for the normal approximation from the Central Limit Theorem to be reasonable.
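One way to see why skewed populations need larger samples is a small simulation. The sketch below is only an illustration under assumed settings: it uses an exponential population (which is skewed to the right), sample sizes of 5, 30, and 100, and 10,000 repetitions, none of which come from the guidelines above. It measures how skewed the simulated sample means still are at each sample size.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population (mean 1)
    and return the mean of each sample."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_n = {
    n: skewness(simulate_sample_means(n, 10_000, rng))
    for n in (5, 30, 100)  # assumed sample sizes, chosen for illustration
}

for n, sk in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {sk:.2f}")
```

For an exponential population, the skewness of the sample mean is 2 / sqrt(n) in theory, so the printed values should shrink toward 0 as n grows. The more skewed the population is to begin with, the larger n has to be before the sample means look roughly normal.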
Additional Resources
Introduction to the Central Limit Theorem
Introduction to Sampling Distributions