
In regression analysis, heteroscedasticity refers to the unequal scatter of residuals. Specifically, it refers to the case where there is a systematic change in the spread of the residuals over the range of measured values.
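To make the definition concrete, here is a minimal simulated sketch. The data, the seed, and the choice to make the error standard deviation proportional to `x` are all assumptions made purely for illustration; they are not part of the basketball example used later.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 200
x = rng.uniform(1, 10, size=n)
# Error standard deviation grows with x, so the data are
# heteroscedastic by construction (an illustrative assumption).
y = 2 + 3 * x + rng.normal(0, 1, size=n) * x

# Fit a simple least-squares line and compute its residuals
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

# Compare the residual spread for small versus large values of x
low = resid[x < np.median(x)]
high = resid[x >= np.median(x)]
print(f"residual SD, lower half of x: {low.std():.2f}")
print(f"residual SD, upper half of x: {high.std():.2f}")
```

The residuals in the upper half of `x` should show a visibly larger standard deviation than those in the lower half, which is the "systematic change in spread" described above.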

Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that the residuals come from a population that has *homoscedasticity*, which means constant variance.

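When heteroscedasticity is present in a regression analysis, the OLS coefficient estimates remain unbiased, but their standard errors are no longer reliable, so the p-values and confidence intervals reported by the analysis become hard to trust.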

One way to determine if heteroscedasticity is present in a regression analysis is to use a Breusch-Pagan test.

This tutorial explains how to perform a Breusch-Pagan Test in Python.

**Example: Breusch-Pagan Test in Python**

For this example we'll use the following dataset that describes the attributes of 10 basketball players:

    import numpy as np
    import pandas as pd

    #create dataset
    df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                       'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                       'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                       'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

    #view dataset
    df

       rating  points  assists  rebounds
    0      90      25        5        11
    1      85      20        7         8
    2      82      14        7        10
    3      88      16        8         6
    4      94      27        5         6
    5      90      20        7         9
    6      76      12        6         6
    7      75      15        9        10
    8      87      14        9        10
    9      86      19        5         7

We will fit a multiple linear regression model using rating as the response variable and points, assists, and rebounds as the explanatory variables. Then we will perform a Breusch-Pagan test to determine if heteroscedasticity is present in the regression.

**Step 1: Fit a multiple linear regression model.**

First, we'll fit a multiple linear regression model:

    import statsmodels.formula.api as smf

    #fit regression model
    fit = smf.ols('rating ~ points+assists+rebounds', data=df).fit()

    #view model summary
    print(fit.summary())

**Step 2: Perform a Breusch-Pagan test.**

Next, we'll perform a Breusch-Pagan test to determine if heteroscedasticity is present.

    from statsmodels.compat import lzip
    import statsmodels.stats.api as sms

    #perform Breusch-Pagan test
    names = ['Lagrange multiplier statistic', 'p-value',
             'f-value', 'f p-value']
    test = sms.het_breuschpagan(fit.resid, fit.model.exog)

    lzip(names, test)

    [('Lagrange multiplier statistic', 6.003951995818433),
     ('p-value', 0.11141811013399583),
     ('f-value', 3.004944880309618),
     ('f p-value', 0.11663863538255281)]

A Breusch-Pagan test uses the following null and alternative hypotheses:

**The null hypothesis (H0):** Homoscedasticity is present.

**The alternative hypothesis (HA):** Homoscedasticity is *not* present (i.e. heteroscedasticity exists).

In this example, the Lagrange multiplier statistic for the test is **6.004** and the corresponding p-value is **0.1114**. Because this p-value is not less than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that heteroscedasticity is present in the regression model.

**How to Fix Heteroscedasticity**

In the previous example we did not find sufficient evidence that heteroscedasticity was present in the regression model.

However, when heteroscedasticity actually is present there are three common ways to remedy the situation:

**1. Transform the dependent variable.** One way to fix heteroscedasticity is to transform the dependent variable in some way. One common transformation is to simply take the log of the dependent variable.

**2. Redefine the dependent variable.** Another way to fix heteroscedasticity is to redefine the dependent variable. One common way to do so is to use a *rate* for the dependent variable, rather than the raw value.

**3. Use weighted regression.** Another way to fix heteroscedasticity is to use weighted regression. This type of regression assigns a weight to each data point based on the variance of its fitted value. When the proper weights are used, this can eliminate the problem of heteroscedasticity.
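As a rough sketch of the first and third remedies, the code below refits the basketball model with a log-transformed response and then with weighted least squares. The weighting scheme is an assumption chosen for illustration: the variance of each observation is estimated by regressing the log of the squared OLS residuals on the fitted values, and each weight is the reciprocal of that estimated variance. Other ways of choosing weights are equally valid, and with only 10 observations the estimates here are very noisy.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

formula = 'rating ~ points + assists + rebounds'
ols_fit = smf.ols(formula, data=df).fit()

# Remedy 1: log-transform the dependent variable
log_fit = smf.ols('np.log(rating) ~ points + assists + rebounds',
                  data=df).fit()

# Remedy 3: weighted least squares
# Step a: model the log of the squared residuals as a function of
# the fitted values (illustrative variance model, an assumption)
aux = pd.DataFrame({'log_sq_resid': np.log(ols_fit.resid ** 2),
                    'fitted': ols_fit.fittedvalues})
var_fit = smf.ols('log_sq_resid ~ fitted', data=aux).fit()

# Step b: weight = 1 / estimated variance (always positive here)
weights = 1.0 / np.exp(var_fit.fittedvalues)

# Step c: refit the original model with those weights
wls_fit = smf.wls(formula, data=df, weights=weights).fit()

print(log_fit.params)
print(wls_fit.params)
```

The second remedy (using a rate) is not shown because this dataset has no natural denominator, such as minutes or games played, to divide by.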

Read more details about each of these three methods in this post.