Sampling Distribution
A sampling distribution describes the probability distribution of a statistic (for example, a sample mean or sample proportion) calculated from repeated random samples drawn from the same population. It shows how that statistic varies from sample to sample and is the basis for making statistical inferences about the population.
Key takeaways
- Researchers use samples because measuring an entire population is usually impractical.
- A sampling distribution summarizes the variability of a statistic across many samples.
- The mean of the sampling distribution of the sample mean equals the population mean.
- The standard deviation of a sampling distribution is called the standard error and shrinks as sample size increases.
How sampling distributions work
To build a sampling distribution:
1. Draw a random sample from the population.
2. Compute a statistic of interest (mean, proportion, standard deviation, etc.) for that sample.
3. Repeat the process many times and record the statistic from each sample.
4. Analyze the distribution of those sample statistics.
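These steps can be simulated directly. The sketch below is a minimal Python illustration, assuming NumPy and a made-up right-skewed population (an exponential distribution with mean 10) and an arbitrary sample size of 50.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

population_mean = 10.0   # hypothetical population mean
n = 50                   # sample size (assumed for illustration)
n_samples = 5_000        # number of repeated samples

# Steps 1-3: draw repeated random samples and record each sample mean
sample_means = np.array([
    rng.exponential(scale=population_mean, size=n).mean()
    for _ in range(n_samples)
])

# Step 4: examine the distribution of the recorded sample means
print("mean of sample means:", round(sample_means.mean(), 2))                      # near 10
print("spread of sample means (standard error):", round(sample_means.std(ddof=1), 2))
```

The recorded means cluster around the population mean, and their spread previews the standard error discussed below.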
This distribution quantifies the range and likelihood of possible sample results and underpins hypothesis tests and confidence intervals used to draw population-level conclusions.
Special considerations
- Standard error: the standard deviation of the sampling distribution. For the sample mean, the standard error equals the population standard deviation divided by the square root of the sample size (σ/√n); when the population standard deviation is unknown, it is estimated using the sample standard deviation (s/√n). A quick numerical check appears after this list.
- Bias and variability depend on sampling method, sample size, and population variability. Larger, well-randomized samples reduce standard error and improve the reliability of sample-based estimates.
- The sampling distribution’s center often equals the true population parameter (unbiased estimator), but sampling method or measurement error can introduce bias.
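As a quick numerical check on the standard-error formula mentioned above, the following sketch (assuming NumPy and made-up population parameters) compares σ/√n with the empirical standard deviation of simulated sample means.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

mu = 100.0     # assumed population mean
sigma = 15.0   # assumed population standard deviation
n = 64         # sample size

# Theoretical standard error of the sample mean
theoretical_se = sigma / np.sqrt(n)

# Empirical check: standard deviation of many simulated sample means
sample_means = rng.normal(loc=mu, scale=sigma, size=(10_000, n)).mean(axis=1)
empirical_se = sample_means.std(ddof=1)

print(f"theoretical SE: {theoretical_se:.3f}")   # 15 / 8 = 1.875
print(f"empirical SE:   {empirical_se:.3f}")     # close to 1.875
```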
Example: comparing average birth weights
Suppose researchers want to compare average newborn weights between two regions but cannot measure every birth. They repeatedly draw random samples (e.g., many samples of 100 births) from each region and compute the sample mean for each draw. The collection of those sample means forms the sampling distribution of the mean for each region. Comparing those sampling distributions lets researchers assess whether observed differences are likely to reflect true population differences or sampling variability.
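A simulation along these lines might look like the sketch below. The regional means and standard deviations (in grams) are hypothetical, and the sample size of 100 follows the example; the code simply contrasts the two sampling distributions of the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n = 100            # births per sample, as in the example
n_samples = 2_000  # repeated samples per region

# Hypothetical population parameters (mean, SD) in grams for each region
regions = {"Region A": (3300, 500), "Region B": (3400, 500)}

for name, (mu, sigma) in regions.items():
    # Repeatedly sample 100 birth weights and record each sample mean
    means = rng.normal(loc=mu, scale=sigma, size=(n_samples, n)).mean(axis=1)
    print(f"{name}: mean of sample means = {means.mean():.1f} g, "
          f"standard error = {means.std(ddof=1):.1f} g")
```

If the gap between the two centers is large relative to the standard errors, sampling variability alone is unlikely to explain the observed difference.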
Common types of sampling distributions
- Sampling distribution of the mean: distribution of sample means from repeated samples. Central to many inference methods.
- Sampling distribution of a proportion: distribution of sample proportions from repeated samples; used for categorical outcomes.
- t-distribution (sampling distribution of the t-statistic): used when sample sizes are small and the population standard deviation is unknown; it accounts for additional uncertainty in the estimate of variability.
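The proportion case can be simulated in the same way. The sketch below assumes a hypothetical true proportion of 0.30 and a sample size of 200, and compares the simulated spread of sample proportions with the textbook formula sqrt(p(1 − p)/n).

```python
import numpy as np

rng = np.random.default_rng(seed=3)

p = 0.30            # assumed true population proportion
n = 200             # sample size
n_samples = 10_000  # number of repeated samples

# Each sample proportion is the share of "successes" among n draws
sample_props = rng.binomial(n, p, size=n_samples) / n

print("mean of sample proportions:", round(sample_props.mean(), 3))       # near 0.30
print("empirical SE:", round(sample_props.std(ddof=1), 4))
print("formula SE sqrt(p(1-p)/n):", round(np.sqrt(p * (1 - p) / n), 4))   # about 0.0324
```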
Shape and the Central Limit Theorem
The shape of a sampling distribution depends on the statistic, sample size, and population distribution. The Central Limit Theorem states that, for sufficiently large sample sizes, the sampling distribution of the sample mean will be approximately normal (bell-shaped), regardless of the population’s shape. For small samples or certain statistics, the sampling distribution may differ and alternative distributions (like the t-distribution) may be more appropriate.
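Stated compactly (a standard textbook formulation, not specific to this article): if X₁, …, Xₙ are independent draws from a population with mean μ and finite standard deviation σ, then for large n the sample mean is approximately normal.

```latex
\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)
```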
Why sampling distributions matter
Sampling distributions allow researchers to:
- Quantify uncertainty in sample-based estimates.
- Construct confidence intervals for population parameters.
- Perform hypothesis tests to evaluate claims about populations.
- Make data-driven decisions when measuring entire populations is infeasible.
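As a concrete illustration of the first two points, the sketch below builds an approximate 95% confidence interval for a mean from a single simulated sample, using the estimated standard error and a t critical value from SciPy. The data, sample size, and true mean of 50 are all made up for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)

# One hypothetical sample of 40 measurements from a population with mean 50
sample = rng.normal(loc=50, scale=8, size=40)

n = sample.size
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)        # estimated standard error

# 95% confidence interval using the t critical value (n - 1 degrees of freedom)
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * se, mean + t_crit * se

print(f"sample mean: {mean:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")   # should usually cover the true mean of 50
```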
Brief note on mean and other statistics
The arithmetic mean is the sum of values divided by their count and is the most common statistic studied via sampling distributions. Other statistics—median, variance, proportion, range—also have sampling distributions that describe their variability across repeated samples.
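In symbols, the standard definition is:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```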
Conclusion
Sampling distributions translate repeated-sample behavior into probabilistic statements about statistics derived from samples. Understanding them — and concepts like standard error and the Central Limit Theorem — is essential for reliable statistical inference and sound decision-making based on sample data.