Central Limit Theorem (CLT)
What the CLT says
The Central Limit Theorem states that, for a large enough sample size, the distribution of the sample mean (or sum) of independent, identically distributed observations will be approximately normal (bell-shaped), regardless of the shape of the population’s original distribution, provided that distribution has a finite variance. The sample mean clusters around the true population mean, and its spread shrinks in proportion to 1/√n as the sample size n increases.
In symbols (for independent, identically distributed samples with finite mean μ and variance σ²):
– Mean of sample mean: E[X̄] = μ
– Variance of sample mean: Var(X̄) = σ² / n
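– Approximation: X̄ ≈ Normal(μ, σ² / n) for large n; more precisely, the standardized mean (X̄ − μ) / (σ / √n) converges in distribution to the standard normal N(0, 1).
The standard error of the mean is σ / √n. When σ is unknown, it is estimated by s / √n, where s is the sample standard deviation.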
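A quick simulation can make these formulas concrete. The sketch below assumes an Exponential population with rate 1 (a skewed distribution with μ = 1 and σ² = 1); the function name `simulate_sample_means`, the sample size n = 50, and the 20,000 repetitions are illustrative choices, not part of any standard. It checks that the sample means average to about μ, have a variance close to σ² / n, and land within ±1.96 standard errors of μ roughly 95% of the time, as a normal distribution would.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of an
    Exponential(rate=1) population (mean 1, variance 1). Hypothetical helper."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

MU, SIGMA2 = 1.0, 1.0   # mean and variance of Exponential(rate=1)
n = 50
means = simulate_sample_means(n, reps=20_000)

se = (SIGMA2 / n) ** 0.5   # standard error sigma / sqrt(n)
# Fraction of sample means within 1.96 standard errors of the true mean.
inside = sum(abs(m - MU) <= 1.96 * se for m in means) / len(means)

print(f"mean of sample means:     {statistics.fmean(means):.3f} (theory {MU})")
print(f"variance of sample means: {statistics.variance(means):.4f} (theory {SIGMA2 / n})")
print(f"share within mu +/- 1.96*SE: {inside:.3f} (normal approximation 0.95)")
```

Even though each individual observation is strongly skewed, the sample means already behave close to Normal(μ, σ² / n) at n = 50.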
Key assumptions and components
- Random sampling: Every unit has an equal chance of selection.
- Independence: Observations do not influence each other.
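- Identical distributions: All observations come from the same distribution, that is, they are drawn under the same conditions.
- Finite variance: The population variance must be finite. Heavy-tailed distributions without a finite variance, such as the Cauchy distribution, fall outside the classical CLT.
- Large sample size: The normal approximation improves as the sample size n grows; increasing the size of each sample, not the number of samples, is what brings the distribution of the sample mean closer to a normal shape.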
Note: More advanced versions of the CLT relax strict identical-distribution requirements (e.g., Lindeberg–Feller conditions), but some form of independence and finite variance is still required.
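The finite-variance assumption is not a technicality. As a rough illustration (the helper names are hypothetical), this sketch compares the interquartile range (IQR) of simulated sample means for a standard normal population and for a standard Cauchy population, which has no finite mean or variance. The spread of the normal sample means shrinks like 1/√n; the spread of the Cauchy sample means does not shrink, because the average of n independent standard Cauchy draws is itself standard Cauchy.

```python
import math
import random
import statistics

def cauchy_draw(rng):
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)), U ~ Uniform(0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))

def normal_draw(rng):
    # Standard normal draw, for comparison.
    return rng.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, reps=1000, seed=1):
    """Interquartile range of `reps` sample means, each computed from n draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {}
for n in (10, 100, 1000):
    results[n] = (iqr_of_sample_means(normal_draw, n),
                  iqr_of_sample_means(cauchy_draw, n))
    print(f"n={n:5d}  normal IQR={results[n][0]:.3f}  Cauchy IQR={results[n][1]:.3f}")
```

For the normal population the IQR falls roughly tenfold from n = 10 to n = 1000, while the Cauchy IQR stays near 2 at every n.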
Practical rule of thumb
A common guideline is n ≥ 30 for the sample-mean distribution to be reasonably close to normal. This is only a rule of thumb:
– For mildly skewed populations, smaller n may suffice.
– For heavy-tailed or strongly skewed distributions, a much larger n may be needed.
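One way to see why skewness matters: for independent, identically distributed observations, the skewness of the sample mean equals the population skewness divided by √n, so a strongly skewed population needs a larger n before the asymmetry becomes negligible. This sketch assumes an Exponential(1) population (skewness 2) and compares the simulated skewness of the sample mean with 2/√n; the sample sizes and the helper names are arbitrary illustrations.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed standardized deviation (population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def skew_of_sample_mean(n, reps=5000, seed=2):
    """Simulated skewness of the sample mean for samples of size n
    from an Exponential(1) population (population skewness = 2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)

skews = {n: skew_of_sample_mean(n) for n in (5, 30, 200)}
for n, sk in skews.items():
    print(f"n={n:4d}  simulated skew={sk:.3f}  theory 2/sqrt(n)={2 / n ** 0.5:.3f}")
```

At n = 30 the sample mean of this population is still noticeably skewed (about 0.37), which is why n ≥ 30 is only a guideline.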
Why it matters
The CLT underpins much of statistical inference because it justifies:
– Using normal-based confidence intervals and hypothesis tests for means.
– Approximating sampling distributions with the normal distribution even when the population is non-normal.
This simplifies analysis and decision-making in many fields, including science, engineering, economics, and finance.
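As an illustration of a normal-based hypothesis test for a mean, here is a minimal large-sample z-test sketch. The function `z_test_mean` is hypothetical; it standardizes the sample mean with the estimated standard error s / √n and, relying on the CLT, reads a two-sided p-value from the standard normal distribution. It is intended for reasonably large samples, and the usage data are simulated, not real measurements.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def z_test_mean(sample, mu0):
    """Large-sample two-sided test of H0: population mean == mu0.
    Under H0 the CLT makes (xbar - mu0) / (s / sqrt(n)) approximately N(0, 1)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)           # estimated standard error s / sqrt(n)
    z = (fmean(sample) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical usage: 100 simulated draws from a skewed (exponential)
# population whose true mean is 2, tested against mu0 = 2.
rng = random.Random(3)
data = [rng.expovariate(0.5) for _ in range(100)]
z, p = z_test_mean(data, mu0=2.0)
print(f"z = {z:.2f}, p = {p:.3f}")
```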
Application in finance
Investors and analysts often apply the CLT to estimate average returns and risks from samples of securities rather than from the entire population. For example, to estimate an index’s average return, one might randomly select a subset of its stocks (often a few dozen, spread across sectors) and average their returns. The CLT supports treating the distribution of that sample mean as approximately normal, which allows constructing confidence intervals and assessing the uncertainty of the estimate.
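The confidence-interval step can be sketched in a few lines. The returns below are simulated from an assumed normal distribution purely for illustration (they are not market data), and `mean_confidence_interval` is a hypothetical helper that computes x̄ ± z · s / √n, where z is the standard normal quantile for the chosen confidence level.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(values, level=0.95):
    """Normal-approximation (CLT-based) confidence interval for a population mean:
    xbar +/- z * s / sqrt(n)."""
    n = len(values)
    xbar = fmean(values)
    se = stdev(values) / math.sqrt(n)           # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    return xbar - z * se, xbar + z * se

# Hypothetical usage: annual returns (in %) for 40 randomly chosen stocks.
# The numbers are simulated for illustration only.
rng = random.Random(4)
sample_returns = [rng.gauss(8.0, 20.0) for _ in range(40)]
low, high = mean_confidence_interval(sample_returns)
print(f"95% CI for the average return: {low:.1f}% to {high:.1f}%")
```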
Simple analogy (Explain like I’m 5)
Imagine scooping handfuls of candy from a large jar. Each handful has a different average size, but if you keep taking many random handfuls and plot those averages, the plot will form a bell-shaped curve centered near the jar’s true average candy size. That’s the CLT in action.
Relation to the Law of Large Numbers
- Law of Large Numbers (LLN): As sample size increases, the sample mean converges to the population mean (consistency).
- CLT: Describes the shape (distribution) of the sample mean around that population mean and quantifies its variability (standard error).
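The difference between the two results shows up in a simulation that tracks two quantities as n grows. This sketch assumes a Uniform(0, 1) population (μ = 0.5, σ = √(1/12) ≈ 0.289); the helper name and repetition count are illustrative. The standard deviation of X̄ shrinks toward zero, which is the LLN at work, while the standard deviation of √n (X̄ − μ) stays near σ, which is the scale the CLT describes.

```python
import math
import random
import statistics

# Uniform(0, 1) population: mu = 0.5, sigma = sqrt(1/12), about 0.2887.
MU, SIGMA = 0.5, math.sqrt(1 / 12)

def sample_means(n, reps=2000, seed=5):
    """Return `reps` sample means, each from n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

summary = {}
for n in (10, 100, 1000):
    means = sample_means(n)
    spread_of_mean = statistics.stdev(means)                 # LLN: shrinks toward 0
    scaled = [math.sqrt(n) * (m - MU) for m in means]
    spread_of_scaled = statistics.stdev(scaled)              # CLT: stays near SIGMA
    summary[n] = (spread_of_mean, spread_of_scaled)
    print(f"n={n:5d}  sd(xbar)={spread_of_mean:.4f}  "
          f"sd(sqrt(n)*(xbar-mu))={spread_of_scaled:.4f}")
```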
Common questions
- Does the CLT require the population to be normal? No. The population can be non-normal; the CLT concerns the distribution of the sample mean as n becomes large.
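- What if observations aren’t independent? Independence (or weak dependence) is important. Positively correlated observations make the variance of the sample mean larger than σ² / n, so σ / √n understates the uncertainty, and strong dependence can invalidate the CLT altogether.
- What if the population variance is unknown? Use the sample standard deviation s to estimate the standard error (s / √n). For small n, use the t-distribution with n − 1 degrees of freedom, which assumes the population is approximately normal.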
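The effect of dependence on the standard error can be illustrated with a first-order autoregressive (AR(1)) series, chosen here only as an assumed example of positively correlated data; the helper names are hypothetical. A CLT still holds for such a series when |φ| < 1, but for large n the variance of X̄ is roughly (1 + φ) / (1 − φ) times the independence formula σ² / n, so s / √n would understate the uncertainty.

```python
import math
import random
import statistics

def ar1_series(n, phi, rng):
    """Stationary AR(1): x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1).
    The first value is drawn from the stationary distribution N(0, 1 / (1 - phi**2))."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1 - phi ** 2))
    out = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def variance_ratio(phi, n=200, reps=2000, seed=6):
    """Empirical Var(xbar) divided by the independence formula sigma_x^2 / n."""
    rng = random.Random(seed)
    means = [statistics.fmean(ar1_series(n, phi, rng)) for _ in range(reps)]
    sigma_x2 = 1.0 / (1 - phi ** 2)   # stationary variance of each x_t
    return statistics.variance(means) / (sigma_x2 / n)

ratios = {phi: variance_ratio(phi) for phi in (0.0, 0.5, 0.8)}
for phi, r in ratios.items():
    print(f"phi={phi}: Var(xbar) is {r:.1f}x sigma^2/n "
          f"(large-n theory {(1 + phi) / (1 - phi):.1f}x)")
```

With φ = 0.8 the variance of the mean is close to nine times what the independence formula predicts.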
Bottom line
The Central Limit Theorem explains why averages of large samples tend to be normally distributed and why sample means reliably estimate population means. It is a foundational result that enables the widespread use of normal-approximation methods in statistics and applied fields.