Sum of Squares: Calculation, Types, and Examples
The sum of squares (SS) is a basic statistical measure of variability: it quantifies how far data points are spread around their mean or around fitted values in a regression. It’s central to variance, standard deviation, and least-squares regression.
Key takeaways
- Sum of squares equals the sum of squared deviations from a reference value (usually the mean or fitted value).
- Larger SS indicates greater variability; smaller SS indicates data points cluster more tightly.
- Variance equals SS divided by n for a population, or by n − 1 for a sample; the standard deviation is the square root of the variance.
- In regression, SS is decomposed into explained and unexplained parts (SSR and SSE).
How it works
For a dataset, the difference between each observation and a reference value (commonly the mean) is squared so negative and positive deviations do not cancel out. Summing those squared deviations yields a positive measure of total variation. Regression methods choose parameters that minimize the sum of squared residuals (least squares), producing the best-fit line or curve under that criterion.
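To see why the squaring step matters, here is a minimal sketch (with made-up values) showing that signed deviations cancel to zero while squared deviations accumulate into a usable measure of spread:

```python
# Made-up data purely for illustration.
data = [2.0, 4.0, 6.0, 8.0]
mean = sum(data) / len(data)                 # 5.0

deviations = [x - mean for x in data]        # [-3.0, -1.0, 1.0, 3.0]
print(sum(deviations))                       # 0.0: positive and negative deviations cancel
print(sum(d ** 2 for d in deviations))       # 20.0: the sum of squares
```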
Formula
For a dataset X = {X1, X2, …, Xn} with mean X̄:
SS (total) = Σ (Xi − X̄)², summed over i = 1 to n
Relationship to variance and standard deviation:
* Population variance σ² = SS / n
* Sample variance s² = SS / (n − 1) (when estimating population variance)
* Standard deviation = √variance
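As a quick check of these relationships, Python's standard statistics module exposes the population and sample variants directly; this sketch uses hypothetical data and verifies that they match SS / n and SS / (n − 1):

```python
import math
import statistics

data = [2.0, 4.0, 6.0, 8.0]                        # hypothetical sample
n = len(data)
mean = statistics.mean(data)
ss = sum((x - mean) ** 2 for x in data)            # sum of squares = 20.0

assert math.isclose(statistics.pvariance(data), ss / n)        # σ² = SS / n
assert math.isclose(statistics.variance(data), ss / (n - 1))   # s² = SS / (n − 1)
print(statistics.stdev(data))                      # √(SS / (n − 1)) ≈ 2.582
```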
In regression the total sum of squares (SST) is decomposed:
SST = SSR + SSE
where:
* SSR (regression sum of squares) = Σ (ŷi − ȳ)² — variation explained by the model
* SSE (residual or error sum of squares) = Σ (yi − ŷi)² — unexplained variation (residuals)
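A minimal sketch of this decomposition, fitting a simple least-squares line with statistics.linear_regression (Python 3.10+) to made-up points; for a line fitted with an intercept, SSR + SSE recovers SST up to rounding:

```python
import statistics

# Hypothetical (x, y) points; any roughly linear data works for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# Ordinary least squares chooses the line that minimizes SSE = Σ (yi − ŷi)².
fit = statistics.linear_regression(xs, ys)
y_hat = [fit.slope * x + fit.intercept for x in xs]

y_bar = statistics.mean(ys)
sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual (unexplained)

print(sst, ssr + sse)        # equal up to rounding
print(ssr / sst)             # R², the share of variation explained
```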
How to calculate (step-by-step)
- Gather the data points.
- Compute the reference value (usually the mean).
- For each observation, subtract the reference value to get the deviation.
- Square each deviation.
- Sum the squared deviations — this is the sum of squares (see the sketch below).
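Translated directly into code, the steps map one-to-one onto a short function (sum_of_squares is an illustrative name; step 1, gathering the data, is the argument itself):

```python
def sum_of_squares(data):
    """Sum of squared deviations of the data points from their mean."""
    mean = sum(data) / len(data)               # step 2: compute the reference value
    deviations = [x - mean for x in data]      # step 3: deviation of each observation
    squared = [d ** 2 for d in deviations]     # step 4: square each deviation
    return sum(squared)                        # step 5: sum the squared deviations

print(sum_of_squares([2.0, 4.0, 6.0, 8.0]))    # 20.0
```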
Types of sum of squares
- Total Sum of Squares (SST): total variation of observed values around their mean.
- Regression Sum of Squares (SSR): portion of SST explained by the regression model (variation of fitted values around the mean).
- Residual Sum of Squares (SSE): portion of SST not explained by the model (variation of actual values around fitted values).
Interpretation:
* Small SSE → model fits the data well.
* Large SSR relative to SST → model explains a large share of total variation.
* R² = SSR / SST (equivalently, 1 − SSE / SST) measures the proportion of variance explained by the model.
Example
Data (closing prices): 374.01, 374.77, 373.94, 373.61, 373.40
Sum = 1,869.73 → mean = 1,869.73 / 5 = 373.946
Compute deviations and squares:
* (374.01 − 373.946)² ≈ 0.0041
* (374.77 − 373.946)² ≈ 0.6790
* (373.94 − 373.946)² ≈ 0.0000
* (373.61 − 373.946)² ≈ 0.1129
* (373.40 − 373.946)² ≈ 0.2981
Sum of squares ≈ 1.094, a small value indicating that these five closing prices stayed close to their mean.
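For reference, a few lines of Python reproduce the figures above:

```python
prices = [374.01, 374.77, 373.94, 373.61, 373.40]
mean = sum(prices) / len(prices)
ss = sum((p - mean) ** 2 for p in prices)
print(round(mean, 3), round(ss, 3))            # 373.946 1.094
```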
Practical uses
- Measuring variability (input to variance and standard deviation).
- Assessing volatility in finance (e.g., comparing stability of asset prices).
- Quantifying model fit in regression and computing R².
- Underpinning least-squares estimation for linear and nonlinear models.
Limitations and cautions
- SS grows with the number of observations and with scale (not directly comparable across datasets with different units or sizes unless normalized).
- Squaring amplifies the influence of outliers.
- SS and derived measures are based on historical data and do not guarantee future performance.
- Interpretation in regression relies on appropriate model specification and assumptions (e.g., independent residuals, correct functional form).
Bottom line
The sum of squares is a foundational measure of variation used across statistics and regression analysis. It quantifies how much data deviate from a reference value or model, supports calculation of variance and standard deviation, and is central to least-squares model fitting. Use SS alongside robust diagnostics and appropriate context when making inferences or investment decisions.