Error Term
What is an error term?
An error term is the component in a statistical model that captures the difference between predicted values and actual outcomes. It represents factors affecting the dependent variable that are not included in the model’s independent variables. Common symbols for the error term are e, ε, or u.
Formal expression
In a linear regression the model is often written as:
Y = αX + βρ + ε
Explore More Resources
where:
* Y = dependent variable (observed outcome)
* X, ρ = independent variables (predictors)
* α, β = parameters (coefficients)
* ε = error term (unobserved deviations)
When the observed Y differs from the model’s predicted Y, ε ≠ 0, indicating omitted influences, measurement error, or inherent randomness.
Explore More Resources
Interpretation and use
- The error term measures the model’s uncertainty and lack of perfect fit.
- It aggregates all influences on the dependent variable that the model does not explain (omitted variables, random shocks, measurement error).
- In applied work, the observable analogue is the residual: for each observation i, the residual ri = Yi − Ŷi (observed minus predicted). Residuals are calculable from sample data; the true error terms are unobservable.
Example: stock-price trend
If you fit a linear regression of a stock’s price on time, the trend line gives the predicted price at each moment. The error term for a given time equals the difference between the actual price and the predicted price. Points far from the trend line have large residuals/error contributions, indicating other influences (e.g., market sentiment, news).
Heteroskedasticity
Heteroskedasticity occurs when the variance of the error term varies across observations (i.e., the spread of residuals changes with the level of an independent variable). This violates the constant-variance assumption of ordinary least squares and can affect inference (standard errors, hypothesis tests).
Explore More Resources
Error terms vs. residuals
- Error term (ε): the true, unobservable deviation of an individual observation from the population regression line.
- Residual (r): the observable deviation of an individual observation from the estimated (sample) regression line. Residuals approximate error terms but include sampling estimation error.
Key takeaways
- The error term accounts for unexplained variation in a model and is essential for interpreting model accuracy.
- Residuals are the sample-based estimates of error terms and are used to diagnose fit and assumptions (e.g., heteroskedasticity, outliers).
- Proper treatment of the error term and its properties is critical for valid inference and prediction from regression models.