White’s test is used to determine if heteroscedasticity is present in a regression model.
Heteroscedasticity refers to the unequal scatter of residuals across the range of fitted values (or predictor values) in a regression model, which violates the assumption that the residuals have constant variance.
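To make the definition concrete, the following minimal sketch simulates two sets of error terms: one with constant spread and one whose spread grows with a predictor. The data are synthetic, and the names x, e_homo, and e_hetero are illustrative assumptions that are not part of the example that follows.

```python
import numpy as np

# synthetic predictor (illustrative only)
rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(1, 10, n)

# homoscedastic errors: same standard deviation at every value of x
e_homo = rng.normal(0, 2.0, n)

# heteroscedastic errors: standard deviation grows in proportion to x
e_hetero = rng.normal(0, 0.5 * x)

# compare the spread of each error series for small vs. large x
low, high = x < 5.5, x >= 5.5
for name, e in [("homoscedastic", e_homo), ("heteroscedastic", e_hetero)]:
    print(name, "sd (low x):", round(e[low].std(), 2),
          "sd (high x):", round(e[high].std(), 2))
```

In the homoscedastic series the two standard deviations should be close to each other, while in the heteroscedastic series the spread for large x should be clearly larger.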
The following step-by-step example shows how to perform White’s test in Python to determine whether or not heteroscedasticity is a problem in a given regression model.
Step 1: Load Data
In this example we will fit a multiple linear regression model using the mtcars dataset.
The following code shows how to load this dataset into a pandas DataFrame:
    from statsmodels.stats.diagnostic import het_white
    import statsmodels.api as sm
    import pandas as pd

    #define URL where dataset is located
    url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

    #read in data
    data = pd.read_csv(url)

    #view summary of data
    data.info()

    RangeIndex: 32 entries, 0 to 31
    Data columns (total 12 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   model   32 non-null     object
     1   mpg     32 non-null     float64
     2   cyl     32 non-null     int64
     3   disp    32 non-null     float64
     4   hp      32 non-null     int64
     5   drat    32 non-null     float64
     6   wt      32 non-null     float64
     7   qsec    32 non-null     float64
     8   vs      32 non-null     int64
     9   am      32 non-null     int64
     10  gear    32 non-null     int64
     11  carb    32 non-null     int64
    dtypes: float64(5), int64(6), object(1)
Step 2: Fit Regression Model
Next, we will fit a regression model using mpg as the response variable and disp and hp as the two predictor variables:
    #define response variable
    y = data['mpg']

    #define predictor variables
    x = data[['disp', 'hp']]

    #add constant to predictor variables
    x = sm.add_constant(x)

    #fit regression model
    model = sm.OLS(y, x).fit()
Step 3: Perform White’s Test
Next, we will use the het_white() function from the statsmodels package to perform White’s test to determine if heteroscedasticity is present in the regression model:
    #perform White's test
    white_test = het_white(model.resid, model.model.exog)

    #define labels to use for output of White's test
    labels = ['Test Statistic', 'Test Statistic p-value', 'F-Statistic', 'F-Test p-value']

    #print results of White's test
    print(dict(zip(labels, white_test)))

    {'Test Statistic': 7.076620330416624,
     'Test Statistic p-value': 0.21500404394263936,
     'F-Statistic': 1.4764621093131864,
     'F-Test p-value': 0.23147065943879694}
Here is how to interpret the output:
- The test statistic is χ² = 7.0766.
- The corresponding p-value is 0.215.
White’s test uses the following null and alternative hypotheses:
- Null (H0): Homoscedasticity is present (residuals are equally scattered)
- Alternative (HA): Heteroscedasticity is present (residuals are not equally scattered)
Since the p-value is not less than 0.05, we fail to reject the null hypothesis.
This means we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
What To Do Next
If you fail to reject the null hypothesis of White’s test, you do not have sufficient evidence of heteroscedasticity and can proceed to interpret the output of the original regression. Keep in mind that failing to reject the null hypothesis does not prove that the residuals are homoscedastic.
However, if you reject the null hypothesis, there is evidence that heteroscedasticity is present. In this case, the standard errors that are shown in the output table of the regression may be unreliable.
There are two common ways to fix this issue:
1. Transform the response variable.
You can try performing a transformation on the response variable, such as taking the log, square root, or cube root of the response variable. This often causes heteroscedasticity to go away.
2. Use weighted regression.
Weighted regression assigns a weight to each data point based on the variance of its fitted value. Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals. When the proper weights are used, this can eliminate the problem of heteroscedasticity.