Coefficient of Determination (R-squared)
What it is
The coefficient of determination, commonly called R-squared (R²), quantifies how much of the variability in a dependent variable (e.g., a stock’s price) can be explained by an independent variable (e.g., a market index). R² ranges from 0 to 1 (or 0%–100%):
- R² = 1.0: 100% of variability explained (perfect fit).
- R² = 0.0: the independent variable explains none of the variability.
- Intermediate values show the proportion explained (for example, R² = 0.20 → 20% explained).
Note: In simple linear regression (one independent variable plus an intercept), R² equals the square of the Pearson correlation coefficient r. Because r is squared, R² is nonnegative even when r is negative.
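As a rough illustration of "proportion of variability explained", the sketch below fits a least-squares line and computes R² as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. The function name and the five data points are made up for illustration and are not taken from any real price series.

```python
def r_squared_from_fit(x, y):
    """R² = 1 - SS_res / SS_tot for a least-squares line y ≈ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept for a single predictor
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) versus total variation around the mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical paired observations (illustrative only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r2_fit = r_squared_from_fit(x, y)
print(round(r2_fit, 3))  # 0.6, i.e. 60% of the variation in y is explained
```

For this toy data the correlation is r ≈ 0.775, and r² gives the same 0.6.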
Why it matters
Investors and analysts use R² to assess how closely an asset’s price movements follow an index or benchmark. It helps evaluate the strength of a relationship, guide model selection, and understand how much of observed movement may be attributed to the chosen explanatory variable(s). R² does not imply causation.
Interpreting R-squared
Interpretation depends on context (field, data, model). Rough informal guidance:
- Below 0.3: weak explanatory power
- 0.3 to 0.6: moderate explanatory power
- Above 0.6: strong explanatory power
Caveats:
- A high R² does not prove causation, nor does it show that the model is appropriate.
- R² never decreases when predictors are added, even irrelevant ones, so it can be inflated by overfitting; use adjusted R² in multiple regression.
- Outliers and nonlinearity can distort R².
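The adjusted R² mentioned above can be sketched with the standard textbook formula, adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p the number of predictors. The function name and the input values below are hypothetical and chosen only to show the size of the penalty.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² for a model with p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model: R² = 0.60 from 20 observations and 3 predictors
adj = adjusted_r_squared(0.60, n=20, p=3)
print(round(adj, 3))  # 0.525
```

With the same R², adding more predictors lowers the adjusted value, which is why it is the safer figure for comparing models of different sizes.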
How to calculate
Spreadsheet (quick):
- Use the built-in function =RSQ(A1:A10, B1:B10), where column A holds the dependent (y) values and column B holds the paired independent (x) values.
Correlation-based formula (manual):
1. Compute sums: ∑x, ∑y, ∑xy, ∑x², ∑y², and sample size n.
2. Compute correlation r:
r = [ n(∑xy) − (∑x)(∑y) ] / sqrt{ [ n(∑x²) − (∑x)² ] [ n(∑y²) − (∑y)² ] }
3. R² = r²
Alternate direct formula for R²:
R² = { [ n(∑xy) − (∑x)(∑y) ]² } / { [ n(∑x²) − (∑x)² ] [ n(∑y²) − (∑y)² ] }
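The manual steps above translate fairly directly into code. The following is a minimal sketch in plain Python; the function name and the sample values are illustrative assumptions, not part of the original calculation.

```python
import math

def r_and_r_squared(x, y):
    """Return (r, R²) using the raw-sums formulas for paired data x, y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    n = len(x)
    # Step 1: the required sums
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # Step 2: correlation r
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("x or y has zero variance; r is undefined")
    r = numerator / math.sqrt(denominator)
    # Step 3: direct formula for R² (equivalent to r * r)
    r2 = numerator ** 2 / denominator
    return r, r2

# Hypothetical paired data (illustrative only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r, r2 = r_and_r_squared(x, y)

# Flipping the sign of y makes r negative but leaves R² unchanged
r_neg, r2_neg = r_and_r_squared(x, [-v for v in y])
```

For the sample values, the sums are ∑x = 15, ∑y = 20, ∑xy = 66, ∑x² = 55, ∑y² = 86, which gives r ≈ 0.775 and R² = 0.6.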
Manual calculation is feasible for small datasets but becomes tedious and error-prone for larger samples, so spreadsheets or statistical software are recommended.
Example (summary)
Using daily closing prices for Apple (AAPL) and the S&P 500 over a 20-day sample, the computed R² was approximately 0.347. That indicates a moderate rather than strong relationship (|r| ≈ 0.59): about 34.7% of the variation in Apple’s price over that period could be explained by movements in the S&P 500.
Practical tips
- Use R² to compare models or judge how much variance a variable explains, but complement it with other diagnostics (residual analysis, p-values, adjusted R²).
- For multiple regressors, prefer adjusted R² to penalize unnecessary predictors.
- Check for nonlinearity, heteroskedasticity, and outliers, since they affect interpretation.
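To see why the nonlinearity check matters, consider a small sketch in which y depends perfectly on x, but through a curve rather than a line. The helper name and the data are hypothetical; the point is only that a straight-line R² can be zero even when the relationship is exact.

```python
def linear_r_squared(x, y):
    """R² of a straight-line fit, computed as the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# y is an exact function of x (y = x squared), yet not a linear one
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_curve = [v ** 2 for v in x]
r2_curve = linear_r_squared(x, y_curve)
print(r2_curve)  # 0.0
```

A low R² here says only that a straight line fits poorly, not that x carries no information about y; a residual plot would reveal the curved pattern.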
Bottom line
R-squared measures the proportion of variance in a dependent variable explained by an independent variable or model. It’s a useful summary metric for goodness of fit, but it must be interpreted in context and supplemented with other statistical checks to draw reliable conclusions.