Descriptive Statistics
Descriptive statistics summarize and communicate the main features of a dataset without making predictions or generalizing beyond the data. They reduce large amounts of raw data into concise, interpretable measures and visualizations that describe the center, spread, and frequency of values.
Key takeaways
- Summarize characteristics of a dataset (either a full population or a sample).
- Main categories: measures of central tendency, measures of variability (dispersion), and frequency distributions.
- Common tools: mean, median, mode, variance, standard deviation, range, quartiles, histograms, boxplots.
- Descriptive statistics describe what the data show; inferential statistics use data to draw conclusions or make predictions about a larger population.
Measures of central tendency
These describe the “center” or typical value in a dataset.
* Mean: arithmetic average (sum of values ÷ number of values).
  Example: For (2, 2, 3, 5, 8), mean = 20/5 = 4.
* Median: middle value when data are ordered (or the average of the two middle values when the count is even).
  For the same data: median = 3.
* Mode: most frequently occurring value; a dataset can have more than one mode, or none.
  For the same data: mode = 2.
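The three measures can be checked with Python's standard-library `statistics` module. This is a minimal sketch: the variable name `data` is illustrative, and the extra even-length list in the last line is an assumption added only to show how the two middle values are averaged.

```python
from statistics import mean, median, mode

data = [2, 2, 3, 5, 8]

print("mean:", mean(data))      # sum 20 divided by 5 values
print("median:", median(data))  # middle value of the sorted data
print("mode:", mode(data))      # most frequently occurring value

# Even number of values: the median is the average of the two middle ones, (2 + 3) / 2
print("median of [2, 2, 3, 5]:", median([2, 2, 3, 5]))
```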
Use the median when the data are skewed or contain outliers; the mean is a good summary for roughly symmetric distributions without extreme values.
Measures of variability (dispersion)
These describe how spread out the values are.
* Range: max − min.
  Example: for (5, 19, 24, 62, 91, 100), range = 95.
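* Quartiles and interquartile range (IQR): the quartiles Q1, Q2 (the median), and Q3 split ordered data into four parts of roughly equal size; IQR = Q3 − Q1 spans the middle 50% of the data.
* Variance: average squared deviation from the mean (for a sample, the sum of squared deviations is usually divided by n − 1 instead of n).
* Standard deviation: square root of variance; gives dispersion in original units.
* Other measures: absolute deviation (e.g., mean absolute deviation, the average absolute distance from the mean), skewness (asymmetry), and kurtosis (tailedness).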
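A minimal sketch of these variability measures for the range example above, using Python's `statistics` module. Quartile definitions differ between tools: `statistics.quantiles` defaults to its "exclusive" interpolation method, so Q1 and Q3 here may not match a spreadsheet or textbook that uses another convention. The split between `variance` and `pvariance` corresponds to dividing by n − 1 (sample) or n (population).

```python
from statistics import mean, pvariance, quantiles, stdev, variance

data = [5, 19, 24, 62, 91, 100]

data_range = max(data) - min(data)   # 100 - 5 = 95

# Quartiles (default method="exclusive"); other tools may interpolate differently
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

sample_var = variance(data)   # sum of squared deviations / (n - 1)
pop_var = pvariance(data)     # sum of squared deviations / n
sample_sd = stdev(data)       # square root of the sample variance, in original units

# Mean absolute deviation: average absolute distance from the mean
m = mean(data)
mad = sum(abs(x - m) for x in data) / len(data)

print(f"range={data_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"sample variance={sample_var:.2f}, population variance={pop_var:.2f}")
print(f"sample SD={sample_sd:.2f}, mean absolute deviation={mad:.2f}")
```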
Variability is important because two datasets with the same mean can have very different distributions.
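To illustrate, the two hypothetical lists below (chosen for this sketch, not taken from the source) share a mean of 4, yet the population standard deviation from `statistics.pstdev` shows that one has no spread at all while the other is widely spread.

```python
from statistics import mean, pstdev

tight = [4, 4, 4, 4]    # every value equals the mean
spread = [0, 2, 6, 8]   # same mean, values far from it

for name, values in [("tight", tight), ("spread", spread)]:
    print(name, "mean =", mean(values), "SD =", round(pstdev(values), 3))
```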
Frequency distribution
A frequency distribution lists the counts or proportions of each value, category, or range of values (bin). Useful formats:
* Frequency tables
* Histograms (numeric bins)
* Bar charts (categorical counts)
Example: In the list [male, male, female, female, female, other], counts are male = 2, female = 3, other = 1.
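The category counts can be tallied with `collections.Counter`. This sketch also prints each category's proportion, which is often reported alongside the raw count in a frequency table; the variable names are illustrative.

```python
from collections import Counter

responses = ["male", "male", "female", "female", "female", "other"]

counts = Counter(responses)
total = sum(counts.values())

# Frequency table: category, count, and proportion, most common first
for category, count in counts.most_common():
    print(f"{category:<8}{count:>3}{count / total:>8.1%}")
```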
Univariate vs. bivariate (and multivariate)
- Univariate: analysis of a single variable (e.g., average age of people in a room).
- Bivariate: analysis of two variables and their relationship (e.g., age vs. test score).
- Multivariate: three or more variables analyzed together.
Univariate analysis describes one trait; bivariate and multivariate analyses examine relationships or associations, which on their own do not establish causation.
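A common numeric summary of a bivariate relationship between two numeric variables is the Pearson correlation coefficient r, which ranges from −1 to 1. The sketch below computes it from its definition (covariance divided by the product of the standard deviations); the `pearson_r` helper is hypothetical, and the ages and scores are made-up values for illustration only. Python 3.10 and later also provide `statistics.correlation`. A strong r describes association only; it does not show that one variable causes the other.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mx, my = mean(xs), mean(ys)
    # The 1/n (or 1/(n - 1)) factors in covariance and SD cancel, so plain sums suffice
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either variable is constant


# Hypothetical illustrative data, not from any study
ages = [20, 25, 30, 35, 40]
scores = [70, 74, 79, 83, 90]
print(round(pearson_r(ages, scores), 3))
```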
Visualizations
Visual tools make descriptive statistics easier to interpret:
* Histogram: shows distribution and shape (clusters, spread, skew).
* Boxplot (box-and-whisker): displays median, quartiles, and outliers; good for comparing groups.
* Scatter plot: displays relationship between two numeric variables.
* Line chart: useful for time series.
* Stem-and-leaf: compact display preserving original values.
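A stem-and-leaf display is simple enough to build in plain Python. This sketch assumes non-negative integers and uses the tens as the stem and the units digit as the leaf (a common but not universal choice); the function name `stem_and_leaf` is hypothetical. It is applied here to the data from the range example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves[stem])}")
    return lines


data = [5, 19, 24, 62, 91, 100]
for line in stem_and_leaf(data):
    print(line)
```

Each original value can be read back from its row (for example, stem 6 with leaf 2 is 62), which is what distinguishes this display from a histogram.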
Outliers
Outliers are values that differ markedly from others and can distort summary measures.
* Detection methods: boxplots, scatter plots, Z-scores, and the IQR rule (which commonly flags values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR).
* Impact: outliers can pull the mean away from the median (e.g., (1, 1, 1, 997) has mean 250 but median 1, so the mean misrepresents the typical value).
* Treatment: investigate cause (error vs. real phenomenon). Remove if erroneous; keep and report if meaningful. Consider robust statistics (median, IQR) or transformations.
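The IQR rule and Z-scores can be sketched with Python's `statistics` module, applied to the (1, 1, 1, 997) example. This sketch assumes the "inclusive" quartile method and the population standard deviation; other conventions shift the fences and scores somewhat. The helper names are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def iqr_fences(data, k=1.5):
    """Lower and upper fences of the IQR rule: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(data, k=1.5):
    """Values lying outside the IQR-rule fences."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]


def z_scores(data):
    """Distance of each value from the mean, in population-SD units."""
    mu, sigma = mean(data), pstdev(data)
    return [(x - mu) / sigma for x in data]


data = [1, 1, 1, 997]
print("mean:", mean(data), "median:", median(data))
print("IQR-rule outliers:", iqr_outliers(data))
print("z-scores:", [round(z, 2) for z in z_scores(data)])
```

Here the IQR rule flags 997, while its z-score is only about 1.73. With the population standard deviation, no value in a set of n points can have |z| greater than √(n − 1), so a fixed cutoff such as |z| > 3 cannot flag anything in a four-point sample; Z-score rules are more meaningful for larger datasets.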
Descriptive vs. inferential statistics
- Descriptive: summarize what the sample or population data show (e.g., average sales, counts, variances).
- Inferential: use sample data to make estimates or test hypotheses about a larger population (e.g., predict sales for a new product, test for differences between groups).
Example: reporting last year’s sales by day is descriptive; using those sales to predict demand for a new product is inferential.
When to use descriptive statistics
- Exploratory data analysis to understand structure and quality of data.
- Reporting and dashboards to communicate historical performance.
- Preparing inputs for inferential models.
Simple examples
- GPA: a student’s GPA is the mean of course grade points, usually weighted by credit hours, and is a descriptive summary of academic performance.
- Baseball season recap: team batting averages, runs allowed, and average wins are descriptive statistics summarizing past performance.
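GPA is commonly computed as a credit-weighted mean of grade points. The sketch below assumes a 4.0 grade-point scale with credit hours as weights; the transcript values and the `weighted_mean` helper are hypothetical, and actual institutional rules vary.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical transcript: grade points (4.0 scale) and credit hours per course
grade_points = [4.0, 3.0, 3.7]
credit_hours = [3, 4, 2]

print("weighted GPA:", round(weighted_mean(grade_points, credit_hours), 2))
print("unweighted mean:", round(sum(grade_points) / len(grade_points), 2))
```

For these values the weighted GPA is about 3.49, compared with about 3.57 for the unweighted mean of the three grades, because the 4-credit course with the lowest grade counts more.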
Quick FAQ
Q: Can descriptive statistics be used to make predictions?
A: Not by themselves. They describe historical data. Predictions require inferential methods or modeling.
Q: What is the best measure of center when data have outliers?
A: The median is more robust to outliers than the mean.
Q: How do I choose a visualization?
A: Use histograms/boxplots for distributions, scatter plots for relationships between two numeric variables, and bar charts for categorical counts.
Summary
Descriptive statistics provide concise summaries and visualizations of data’s central tendency, dispersion, and frequency. They are essential for understanding and communicating what the data show and form the foundation for further statistical analysis.