Confidence Interval
Overview
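A confidence interval (CI) is a range of values, computed from sample data, that is designed to contain an unknown population parameter (commonly the mean) with a stated level of confidence. A CI expresses both an estimate and its uncertainty: for example, a 95% CI of 9.50 to 10.50 is usually reported as being “95% confident” that the true population mean lies within that range, where the 95% describes how reliable the interval-building procedure is rather than the odds for any single interval.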
Key ideas
- Confidence level: the long-run proportion of CIs that will contain the true parameter if you repeat the sampling procedure many times (common levels: 90%, 95%, 99%).
- Interval width: higher confidence levels produce wider intervals; larger samples produce narrower intervals.
- Interpretation: a 95% confidence level does not mean there’s a 95% probability the true value lies in a single computed interval. It means 95% of such intervals constructed from repeated samples would contain the true value.
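The long-run reading of the confidence level can be checked with a small simulation. The sketch below is only illustrative: it assumes a normally distributed population with a known standard deviation, and the function name and all parameter values (true mean 10.0, σ = 0.5, n = 25, 10,000 trials) are hypothetical choices, not taken from any dataset.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(true_mean=10.0, sigma=0.5, n=25, level=0.95,
                  trials=10_000, seed=0):
    """Fraction of simulated z-intervals that contain true_mean."""
    rng = random.Random(seed)
    alpha = 1 - level
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / sqrt(n)  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build its interval around the sample mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land near 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the procedure across many repetitions.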
How it works
- Take a random sample and compute a point estimate (e.g., sample mean).
- Compute the margin of error, which depends on variability and sample size.
- The CI = point estimate ± margin of error.
If a CI for a difference or a regression coefficient includes a null value (for example, 0), you typically cannot conclude there is a statistically significant effect at the corresponding significance level.
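The null-value check described above amounts to a simple containment test. A minimal sketch, with a hypothetical helper name and made-up interval endpoints:

```python
def includes_null(lower, upper, null_value=0.0):
    """Return True when null_value lies inside the interval [lower, upper]."""
    return lower <= null_value <= upper

# Hypothetical 95% CIs for a difference between two group means.
print(includes_null(-0.3, 1.2))  # True: 0 is inside, so no significant effect is shown
print(includes_null(0.4, 1.9))   # False: 0 is outside the interval
```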
Formulas
Basic form:
Confidence Interval = Sample Mean ± Margin of Error
Margin of Error (when the population standard deviation σ is known, or the sample is large enough for the normal approximation):
Margin of Error = z* × (σ / sqrt(n))
When σ is unknown (common), use the sample standard deviation s and the t-distribution:
Margin of Error = t* × (s / sqrt(n))
Where:
* z* = critical z-value for the chosen confidence level (e.g., 1.96 for 95%).
* t* = critical t-value with n−1 degrees of freedom for the chosen confidence level.
* n = sample size
* σ = population standard deviation
* s = sample standard deviation
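As a rough sketch of the t-based formula applied to raw data, the function below computes the sample mean and sample standard deviation and takes the critical value t* as an argument supplied by the caller. The function name and the measurements are hypothetical; 2.262 is the standard t-table value for 95% confidence with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_star):
    """CI for the mean: x_bar ± t* × s / sqrt(n).

    t_star must match the chosen confidence level and df = n − 1.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n − 1 in the denominator)
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements (n = 10, so df = 9).
data = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
lower, upper = t_interval(data, t_star=2.262)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```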
How to get critical values:
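* For 95% confidence, α = 0.05 and the critical value is z(α/2) = 1.96, the point that leaves an area of α/2 = 0.025 in the upper tail of the standard normal distribution.
* For small samples or unknown σ, look up the t critical value in a t-table using upper-tail area α/2 and df = n−1.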
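Critical values can also be computed rather than looked up. The sketch below uses only the Python standard library: `NormalDist().inv_cdf` gives the z-value directly, while the t-value is found by numerically integrating the t density (Simpson's rule) and bisecting. The helper names are hypothetical, and the numerical approach is one simple option among several; a statistics package would normally supply a t quantile function.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's rule over [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(level, df):
    """Two-sided critical t-value, found by bisection on the CDF."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"z* (95%)         = {z_critical(0.95):.3f}")      # 1.960
print(f"t* (95%, df=24)  = {t_critical(0.95, 24):.3f}")  # 2.064
print(f"t* (95%, df=9)   = {t_critical(0.95, 9):.3f}")   # 2.262
```

The t-values are larger than 1.96 and shrink toward it as the degrees of freedom grow, which is why the two approaches nearly agree for large samples.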
Example (95% CI):
1. Sample mean = 10.00, σ = 0.50 (treated as known, so the z-value applies), n = 25
2. z* (95%) = 1.96
3. Margin = 1.96 × (0.50 / sqrt(25)) = 1.96 × 0.10 = 0.196
4. 95% CI = 10.00 ± 0.196 → (9.804, 10.196)
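The arithmetic in this example can be reproduced in a few lines. The second half of the sketch is an added comparison, not part of the example itself: it assumes 0.50 is a sample standard deviation instead, and uses the standard t-table value of about 2.064 for 95% confidence with 24 degrees of freedom.

```python
from math import sqrt

x_bar, sd, n = 10.00, 0.50, 25
standard_error = sd / sqrt(n)  # 0.50 / 5 = 0.10

# z-based interval, as in the worked example (sd treated as known sigma).
z_star = 1.96
z_margin = z_star * standard_error
print(f"z margin = {z_margin:.3f}")                                  # 0.196
print(f"z 95% CI = ({x_bar - z_margin:.3f}, {x_bar + z_margin:.3f})")  # (9.804, 10.196)

# t-based interval, assuming sd is a sample estimate (df = n - 1 = 24).
t_star = 2.064  # from a standard t-table
t_margin = t_star * standard_error
print(f"t margin = {t_margin:.3f}")                                  # 0.206
print(f"t 95% CI = ({x_bar - t_margin:.3f}, {x_bar + t_margin:.3f})")  # (9.794, 10.206)
```

The t-based interval is slightly wider, reflecting the extra uncertainty from estimating the standard deviation with only 25 observations.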
Calculating in Excel
- Compute the sample mean with =AVERAGE(range).
- Compute sample standard deviation with =STDEV.S(range).
- For the t-based margin of error, use the t critical value: =T.INV.2T(alpha, n-1).
- Margin of error = T.INV.2T(alpha, n-1) * STDEV.S(range) / SQRT(n)
- Excel also offers =CONFIDENCE.T(alpha, standard_dev, size), which returns the t-based margin of error directly. Supply alpha (the significance level, 1 − confidence level, so 0.05 for a 95% CI), the sample standard deviation, and the sample size.
Uses
- Estimating population means or proportions from a sample.
- Communicating the uncertainty of estimates in polling, scientific studies, quality control, and finance.
- Informing hypothesis tests: if a CI for a parameter excludes the null value, the corresponding two-sided hypothesis test would typically reject the null at significance level α = 1 − confidence level (for example, α = 0.05 for a 95% CI).
Simple explanation
A confidence interval tells you how close the average from your sample is likely to be to the actual average for the whole group. It gives a range that probably contains the true value, along with a measure of how confident you are in that range.
Common questions
Q: What does the “0.05” mean in a 95% CI?
A: 0.05 is the significance level α, where α = 1 − confidence_level. For a 95% CI, α = 0.05. It is the long-run proportion of intervals, built by the same procedure from repeated samples, that will fail to contain the true parameter.
Q: What is a “good” confidence interval?
A: “Good” depends on context. A 95% CI is commonly used because it balances confidence with precision. Higher confidence (e.g., 99%) reduces the chance of missing the true value but widens the interval. Larger sample sizes produce narrower, more precise intervals.
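The trade-off between confidence and precision can be illustrated numerically. This is a minimal sketch using the z-based formula with a hypothetical known standard deviation of 0.5; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_width(sigma, n, level):
    """Full width of a z-based CI: 2 × z* × sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_star * sigma / sqrt(n)

sigma = 0.5  # hypothetical population standard deviation

# Higher confidence level -> wider interval (n fixed at 25).
for level in (0.90, 0.95, 0.99):
    print(f"level={level:.2f}  width={z_interval_width(sigma, 25, level):.3f}")

# Larger sample -> narrower interval (level fixed at 95%).
for n in (25, 100, 400):
    print(f"n={n:<4} width={z_interval_width(sigma, n, 0.95):.3f}")
```

Because the width scales with 1/sqrt(n), quadrupling the sample size only halves the interval width.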
Q: When should I use z vs. t?
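A: Use z (normal distribution) when the population standard deviation is known, or as an approximation when the sample size is large. Use t when the population standard deviation is unknown and is estimated by the sample standard deviation; this matters most for small samples, because the t and z critical values become nearly identical as the sample size grows.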
Takeaway
Confidence intervals quantify uncertainty around sample estimates. They combine a point estimate with a margin of error determined by variability, sample size, and the chosen confidence level. Interpreting CIs correctly helps assess the reliability of statistical conclusions.