Multiple Linear Regression (MLR)
Multiple linear regression (MLR) models the relationship between one dependent (response) variable and two or more independent (explanatory) variables. It estimates how each predictor contributes to the outcome while holding the other predictors constant.
MLR formula
For observation i (i = 1,…,n):
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₚxᵢₚ + εᵢ
Where:
* yᵢ = dependent variable for observation i
* xᵢⱼ = jth explanatory variable for observation i
* β₀ = intercept
* βⱼ = slope (regression) coefficient for predictor j
* εᵢ = error term (its sample counterpart after fitting is the residual)
In matrix form, the ordinary least squares (OLS) estimator is β̂ = (XᵀX)⁻¹Xᵀy, which software typically computes.
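As a concrete illustration, here is a minimal NumPy sketch of the normal-equations computation on synthetic data (all values are made up; production software uses more numerically stable routines such as QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic design matrix: intercept column plus three predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([2.0, 0.5, -1.0, 3.0])  # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true

# Numerically preferable alternative for larger or ill-conditioned problems
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```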
Key assumptions
MLR relies on several assumptions for valid estimation and inference:
* Linearity: the relationship between the dependent variable and each predictor is linear in the coefficients.
* Independence: observations (and residuals) are independent.
* No perfect multicollinearity: predictors are not exact linear combinations of one another.
* Homoscedasticity: residuals have constant variance (no heteroscedasticity).
* Normality of residuals: for hypothesis testing and confidence intervals, residuals should be approximately normally distributed.
* Correct specification: relevant predictors are included and functional form is appropriate.
Violations (e.g., multicollinearity, heteroscedasticity, omitted variables) can bias estimates or distort inference. Tools such as variance inflation factors (VIF), residual plots, and robust standard errors help diagnose and address problems.
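For instance, VIFs can be computed with statsmodels; below is a minimal sketch on synthetic data where x3 is deliberately constructed to be nearly collinear with x1 (all variable names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["x3"] = 0.9 * df["x1"] + rng.normal(scale=0.1, size=100)  # near-collinear with x1

X = sm.add_constant(df)  # VIFs should be computed with the intercept included
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
# Rule of thumb: VIF above roughly 5-10 flags problematic multicollinearity
# (the intercept's VIF is not meaningful and can be ignored)
```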
How results are interpreted
- Each β̂ⱼ estimates the expected change in y associated with a one-unit change in xⱼ, holding all other predictors constant (“all else equal”).
- R² (coefficient of determination) measures the proportion of outcome variance explained by the predictors. R² never decreases when more variables are added, even if they add little predictive value—so prefer adjusted R² or information criteria (AIC/BIC) when comparing models (see the sketch after this list).
- Standard errors, t-statistics, and p-values assess whether coefficients differ significantly from zero.
- Residuals capture the difference between observed and predicted values; their patterns reveal model deficiencies.
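A short statsmodels sketch on synthetic data shows where each of these quantities lives (variable names and values are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.params)                         # beta-hats (intercept first)
print(res.bse, res.tvalues, res.pvalues)  # standard errors, t-stats, p-values
print(res.rsquared, res.rsquared_adj)     # R^2 vs. adjusted R^2
print(res.aic, res.bic)                   # information criteria for model comparison
print(res.resid[:5])                      # residuals for diagnostic plots
```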
Example (illustrative)
Suppose we model a company’s stock price (dependent variable) using:
* x₁ = interest rates
* x₂ = oil price
* x₃ = market index (S&P 500)
* x₄ = oil futures price
A hypothetical regression output might report:
* β̂₂ (oil price) = +7.8: a 1% increase in the oil price is associated with a 7.8% increase in the stock price, holding other factors constant.
* β̂₁ (interest rates) = −1.5: a 1% rise in interest rates is associated with a 1.5% decrease in the stock price, holding other factors constant.
* R² = 0.865: 86.5% of the variation in the stock price is explained by the included predictors.
These numbers are illustrative; actual estimation requires data and statistical software. Residual analysis should follow to validate assumptions.
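To make the workflow concrete, here is a sketch using the statsmodels formula API on simulated stand-ins for the variables above (the column names, coefficients, and distributions are all assumptions for illustration, not real market data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 250
# Simulated stand-ins for the example's hypothetical series
df = pd.DataFrame({
    "interest_rate": rng.normal(3.0, 0.5, n),
    "oil_price": rng.normal(70.0, 10.0, n),
    "sp500": rng.normal(4000.0, 200.0, n),
    "oil_futures": rng.normal(72.0, 10.0, n),
})
df["stock_price"] = (50 - 1.5 * df["interest_rate"] + 0.8 * df["oil_price"]
                     + 0.01 * df["sp500"] + rng.normal(0, 2.0, n))

res = smf.ols("stock_price ~ interest_rate + oil_price + sp500 + oil_futures",
              data=df).fit()
print(res.summary())  # coefficients, standard errors, t-stats, R^2, diagnostics
```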
Linear vs. multiple; linear vs. nonlinear
- Simple linear regression has one predictor; multiple regression has two or more.
- “Linear” in MLR means the model is linear in the parameters (coefficients). Predictors can be transformed (log, square, interaction terms) while preserving linearity in parameters, as shown in the sketch after this list.
- Nonlinear regression involves models that are nonlinear in parameters (e.g., logistic regression, exponential models).
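As an example of linearity in parameters with transformed predictors, a brief formula-API sketch (names and data are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x1": rng.uniform(1, 10, 200), "x2": rng.normal(size=200)})
df["y"] = 1 + 2 * np.log(df["x1"]) + 0.5 * df["x2"] ** 2 + rng.normal(size=200)

# log, squared, and interaction terms: the model is still linear in the betas
model = smf.ols("y ~ np.log(x1) + I(x2 ** 2) + x1:x2", data=df).fit()
print(model.params)
```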
Common uses in finance and econometrics
- Forecasting asset prices and returns
- Testing economic and financial theories (e.g., factor models)
- Performance attribution and risk modeling (e.g., Fama–French models extend CAPM by adding factors)
- Valuation drivers and scenario analysis
Common pitfalls and best practices
- Overfitting: too many predictors relative to sample size reduce out-of-sample performance. Use cross-validation and parsimony.
- Multicollinearity: inflates standard errors and makes coefficients unstable—diagnose with VIF and consider dropping or combining variables.
- Omitted variable bias: leaving out relevant predictors can bias coefficients.
- Relying solely on R²: compare models using adjusted R², AIC/BIC, cross-validation, and economic interpretability.
- Validate assumptions with residual plots, formal tests (e.g., Breusch–Pagan for heteroscedasticity, sketched below), and robustness checks.
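For example, the Breusch–Pagan test is available in statsmodels; a sketch on synthetic data where the error variance deliberately grows with the predictor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(scale=0.5 * x, size=n)  # variance grows with x

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, res.model.exog)
print(lm_pvalue)  # small p-value -> evidence of heteroscedasticity
```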
Quick FAQs
Q: Why use MLR instead of simple linear regression?
A: When an outcome is influenced by multiple factors, MLR isolates each factor’s effect conditional on the others.
Q: Can you compute MLR by hand?
A: In principle yes (using the normal equations), but in practice you should use statistical software; manual computation is impractical for large models.
Q: What does “linear” mean in MLR?
A: The model is linear in the parameters (coefficients), not necessarily in the raw predictors.
Conclusion
Multiple linear regression is a foundational tool for quantifying how several predictors jointly affect an outcome. Proper application requires attention to assumptions, diagnostic checks, and careful model selection to ensure reliable estimation and meaningful interpretation.