
**Multicollinearity** in regression analysis occurs when two or more predictor variables are highly correlated with each other, such that they do not provide unique or independent information in the regression model. If the degree of correlation between variables is high enough, it can cause problems when fitting and interpreting the regression model.

One way to detect multicollinearity is by using a metric known as the **variance inflation factor (VIF)**, which measures how strongly each predictor variable is correlated with the other predictor variables in a regression model.
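For reference, the standard definition of the VIF for a single predictor can be written as follows. The symbol $R_j^2$ is notation introduced here, not something taken from the SPSS output: it stands for the R-squared obtained when predictor $j$ is regressed on all of the other predictors.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

Under this definition, a predictor that is unrelated to the others ($R_j^2 = 0$) has a VIF of 1, and a predictor for which the others explain 80% of its variance ($R_j^2 = 0.8$) has a VIF of 5.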

This tutorial explains how to use VIF to detect multicollinearity in a regression analysis in SPSS.

**Example: Multicollinearity in SPSS**

Suppose we have the following dataset that shows the exam score of 10 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

We would like to perform a linear regression using **score** as the response variable and **hours**, **prep_exams**, and **current_grade** as the predictor variables, but we want to make sure that the three predictor variables aren't highly correlated.

To determine if multicollinearity is a problem, we can produce VIF values for each of the predictor variables.

To do so, click the **Analyze** tab, then **Regression**, then **Linear**:

In the new window that pops up, drag **score** into the box labelled Dependent and drag the three predictor variables into the box labelled Independent(s). Then click **Statistics** and make sure the box next to **Collinearity diagnostics** is checked. Then click **Continue**, then **OK**.

Once you click **OK**, the following table will be displayed that shows the VIF value for each predictor variable:

The VIF values for each of the predictor variables are as follows:

- hours: **1.169**
- prep_exams: **1.403**
- current_grade: **1.522**
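As a cross-check outside SPSS, the same kind of calculation can be sketched in Python with NumPy by regressing each predictor on the others and converting the resulting R-squared into a VIF. The `vif` helper and the synthetic data below are assumptions made for illustration. They are not the dataset from this tutorial, so they will not reproduce the values above.

```python
import numpy as np

def vif(X):
    """Return the VIF for each column of X (rows = observations, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_pred = X.shape
    values = []
    for j in range(n_pred):
        target = X[:, j]
        # Regress predictor j on an intercept plus all of the other predictors.
        others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        values.append(1.0 / (1.0 - r_squared))
    return np.array(values)

# Synthetic, made-up data for illustration only (NOT the tutorial's dataset).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)              # generated independently of a
c = a + 0.1 * rng.normal(size=200)    # nearly a copy of a
print(vif(np.column_stack([a, b, c])))
```

In this synthetic setup, the first and third columns should show large VIFs because one is almost a copy of the other, while the independent second column should stay close to 1.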

The value for VIF starts at 1 and has no upper limit. A general rule of thumb for interpreting VIFs is as follows:

- A value of 1 indicates there is no correlation between a given predictor variable and any other predictor variables in the model.
- A value between 1 and 5 indicates moderate correlation between a given predictor variable and other predictor variables in the model, but this is often not severe enough to require attention.
- A value greater than 5 indicates potentially severe correlation between a given predictor variable and other predictor variables in the model. In this case, the coefficient estimates and p-values in the regression output are likely unreliable.

We can see that none of the VIF values for the predictor variables in this example is greater than 5, which suggests that multicollinearity is unlikely to be a problem in the regression model.
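The rule of thumb above can also be applied programmatically. The following is a minimal sketch, assuming a hypothetical helper named `interpret_vif` (not part of SPSS or any library) and using the three VIF values reported in this example.

```python
import math

def interpret_vif(vif):
    """Map a VIF value to the rule-of-thumb label used in this tutorial."""
    if vif < 1 and not math.isclose(vif, 1.0):
        raise ValueError("VIF cannot be less than 1")
    if math.isclose(vif, 1.0):
        return "no correlation"
    if vif <= 5:
        return "moderate correlation"
    return "potentially severe correlation"

# VIF values reported in the SPSS output for this example.
vifs = {"hours": 1.169, "prep_exams": 1.403, "current_grade": 1.522}
for name, value in vifs.items():
    print(f"{name}: VIF = {value} -> {interpret_vif(value)}")
```

All three predictors fall into the "moderate correlation" band. The cutoff of 5 is a convention rather than a strict rule, so the threshold in the helper can be adjusted if a different cutoff is preferred.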