VIF: Detect and Fix Multicollinearity

Variance Inflation Factor (VIF)

A Variance Inflation Factor (VIF) quantifies how much the variance of an estimated regression coefficient is increased by multicollinearity, that is, correlation among the independent variables. Researchers use VIF to detect when predictor variables are too closely correlated, which makes coefficient estimates unstable and difficult to interpret.

How VIF works

In multiple regression, multicollinearity occurs when one or more independent variables can be closely approximated by a linear combination of the others.
Multicollinearity does not necessarily reduce a model’s predictive power but makes it hard to isolate the effect of each predictor: coefficients can become statistically insignificant or highly sensitive to small changes in the data.
VIF measures how much the variance (and therefore the standard error) of a coefficient is inflated due to this correlation with other predictors.
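
The inflation can be made explicit with the standard expression for the sampling variance of an ordinary least squares slope estimate. This is a sketch under stated assumptions: a linear model with an intercept and errors of constant variance. The symbols \hat{\beta}_i (estimated coefficient), x_{ki} (the kth observation of predictor i), \bar{x}_i (its sample mean), n (number of observations) and \sigma^2 (error variance) are notation introduced here for illustration; R_i^2 is the R-squared from regressing predictor i on the other predictors.

```latex
\operatorname{Var}(\hat{\beta}_i)
  = \frac{\sigma^2}{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2}
    \cdot \frac{1}{1 - R_i^2}
```

The first factor is the variance the coefficient would have if predictor i were uncorrelated with the others; the second factor is the VIF. The standard error is therefore inflated by the square root of the VIF.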

Formula and calculation

For the ith predictor:
VIF_i = 1 / (1 − R_i^2)

Here, R_i^2 is the R-squared (coefficient of determination) from regressing the ith independent variable on all the other independent variables.

Steps to compute VIF:
1. For each predictor X_i, run a regression with X_i as the dependent variable and the remaining predictors as independent variables.
2. Record R_i^2 from that regression.
3. Compute VIF_i = 1 / (1 − R_i^2).
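
The three steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `vif`, the variable names, and the simulated data are all assumptions made for the example, and each auxiliary regression is assumed to include an intercept.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X (rows: observations, columns: predictors)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_pred = X.shape
    out = np.empty(n_pred)
    for i in range(n_pred):
        # Step 1: regress X_i on the remaining predictors (plus an intercept).
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Step 2: R_i^2 of that auxiliary regression.
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # Step 3: VIF_i = 1 / (1 - R_i^2).
        out[i] = 1.0 / (1.0 - r2)
    return out


# Simulated (made-up) data: c is almost a linear combination of a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([a, b, c]))
print(vifs)
```

If a predictor is an exact linear combination of the others, R_i^2 equals 1 and the division yields infinity (NumPy emits a warning), which is consistent with the formula: the coefficient cannot be estimated at all.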

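Numeric illustration: if R_i^2 = 0.80 for a predictor, then VIF_i = 1 / (1 − 0.80) = 5. The coefficient's variance is five times what it would be if the predictor were uncorrelated with the others, and its standard error is about 2.24 times larger (the square root of 5).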

Interpretation and thresholds

The cutoffs below are common rules of thumb rather than formal tests:

VIF = 1: No correlation between the ith predictor and the others (no multicollinearity).
VIF between 1 and 5: Moderate correlation; usually acceptable.
VIF between 5 and 10: High correlation; investigate further.
VIF > 10: Strong multicollinearity that typically requires correction.

As VIF increases, standard errors grow and coefficient estimates become less reliable.
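
The rule-of-thumb bands above can be expressed as a small helper. This is only a sketch: the function name `classify_vif` and the short labels are hypothetical, and the cutoffs of 5 and 10 are the conventions listed in this section, not universal constants.

```python
def classify_vif(value):
    """Map a VIF value to the rule-of-thumb band described in the text."""
    if value < 1:
        # By the formula 1 / (1 - R^2) with 0 <= R^2 < 1, a VIF cannot be below 1.
        raise ValueError("VIF cannot be smaller than 1")
    if value > 10:
        return "strong"    # typically requires correction
    if value > 5:
        return "high"      # investigate further
    if value > 1:
        return "moderate"  # usually acceptable
    return "none"          # exactly 1: no correlation with the other predictors


for v in (1.0, 2.3, 7.5, 18.0):
    print(v, classify_vif(v))
```

In practice a computed VIF is rarely exactly 1 because of sampling noise, so values just above 1 fall in the "moderate" band here even though they indicate essentially no multicollinearity.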

Example

Suppose an economist models inflation (dependent variable) using unemployment rate and initial jobless claims (both independent variables). Because jobless claims are related to unemployment, the model may show strong overall fit but fail to identify which predictor drives the effect. VIF will flag high multicollinearity for the involved variables, suggesting the researcher should remove or consolidate predictors depending on the analysis goal.
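
A scenario like this can be mimicked with simulated numbers. The sketch below uses made-up data, not real economic series: the variable names, the coefficients, and the noise levels are all assumptions chosen so that the two predictors are strongly related. With exactly two predictors, the R-squared of regressing one on the other equals their squared Pearson correlation, so the VIF is the same for both and can be computed from the correlation alone.

```python
import math
import random

random.seed(0)
n = 500

# Hypothetical, simulated predictors: claims are driven largely by unemployment.
unemployment = [random.gauss(5.0, 1.0) for _ in range(n)]
jobless_claims = [200 + 40 * u + random.gauss(0, 10) for u in unemployment]


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(unemployment, jobless_claims)
# Two-predictor case: R_i^2 = r^2, so both predictors share the same VIF.
vif_two = 1.0 / (1.0 - r ** 2)
print(f"correlation = {r:.3f}, VIF = {vif_two:.1f}")
```

With more than two predictors this shortcut no longer applies, and each VIF requires its own auxiliary regression on all the other predictors.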

Remedies for high multicollinearity

Remove one or more highly correlated predictors when they provide redundant information.
Combine correlated variables into a single index or composite measure.
Use dimensionality-reduction methods such as principal components analysis (PCA) or partial least squares (PLS) to create uncorrelated components and then regress on those components.
Consider alternative regression approaches (e.g., regularization techniques such as ridge regression or the lasso) if the goal is prediction rather than interpretation.
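
As one illustration of the dimensionality-reduction remedy, the sketch below applies PCA (via a singular value decomposition in NumPy) to two nearly collinear simulated predictors and checks that the resulting component scores are uncorrelated. The data and all variable names are assumptions made for the example; it shows only the decorrelation step, not a full principal components regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Simulated (made-up) predictors: x2 is nearly a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Standardize each column, then obtain principal directions from the SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T  # principal component scores, one column per component

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
explained = s ** 2 / np.sum(s ** 2)  # share of variance per component

print("correlation between original predictors:", corr_before)
print("correlation between component scores:", corr_after)
print("share of variance explained:", explained)
```

Because the component scores are uncorrelated, each has a VIF of 1 when used as a regressor. The trade-off is that a component is a blend of the original variables, so its coefficient no longer describes the effect of a single predictor.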

Quick FAQs

What does a VIF of 1 mean?
It means the predictor is uncorrelated with the others; multicollinearity is not present for that variable.
Does multicollinearity always invalidate a model?
No—moderate multicollinearity is often acceptable. The main issue is interpretability of individual coefficient estimates, not necessarily predictive accuracy.

Conclusion

VIF is a simple diagnostic to detect multicollinearity and assess its severity. Use VIF to identify problematic predictors and then decide whether to remove, combine, or transform variables—or to use alternative modeling techniques—depending on whether interpretation or prediction is the primary objective.