
A **studentized residual** is simply a residual divided by its estimated standard deviation.
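One common way to write this out is the externally studentized form, in which the error variance is re-estimated with observation $i$ left out. The notation below is introduced here for illustration and is not part of the original text: $e_i$ is the raw residual, $h_{ii}$ is the leverage of observation $i$, $\mathrm{SSE}$ is the residual sum of squares, $n$ is the number of observations, and $p$ is the number of estimated coefficients (including the intercept).

```latex
t_i = \frac{e_i}{\hat{\sigma}_{(i)} \sqrt{1 - h_{ii}}},
\qquad
\hat{\sigma}_{(i)}^2 = \frac{\mathrm{SSE} - e_i^2 / (1 - h_{ii})}{n - p - 1}
```

The denominator $\hat{\sigma}_{(i)} \sqrt{1 - h_{ii}}$ is the estimated standard deviation of the residual, so points with high leverage have their residuals scaled up accordingly.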

In practice, an observation whose studentized residual has an absolute value greater than 3 is typically considered an outlier.

We can quickly obtain the studentized residuals of a regression model in Python by using the OLSResults.outlier_test() function from statsmodels, which uses the following syntax:

**OLSResults.outlier_test()**

where *OLSResults* is the name of a linear model fit using the **ols()** function from statsmodels.

**Example: Calculating Studentized Residuals in Python**

Suppose we build the following simple linear regression model in Python:

    #import necessary packages and functions
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    #create dataset
    df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                       'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19]})

    #fit simple linear regression model
    model = ols('rating ~ points', data=df).fit()

We can use the **outlier_test()** function to produce a DataFrame that contains the studentized residuals for each observation in the dataset:

    #calculate studentized residuals
    stud_res = model.outlier_test()

    #display studentized residuals
    print(stud_res)

       student_resid   unadj_p   bonf(p)
    0      -0.486471  0.641494  1.000000
    1      -0.491937  0.637814  1.000000
    2       0.172006  0.868300  1.000000
    3       1.287711  0.238781  1.000000
    4       0.106923  0.917850  1.000000
    5       0.748842  0.478355  1.000000
    6      -0.968124  0.365234  1.000000
    7      -2.409911  0.046780  0.467801
    8       1.688046  0.135258  1.000000
    9      -0.014163  0.989095  1.000000

This DataFrame displays the following values for each observation in the dataset:

- The studentized residual
- The unadjusted p-value of the studentized residual
- The Bonferroni-corrected p-value of the studentized residual
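To make the relationship between the last two columns concrete, here is a minimal sketch of a Bonferroni correction applied to the unadjusted p-values printed above. It assumes the usual definition (multiply each p-value by the number of tests and cap the result at 1); the variable names are illustrative and are not part of statsmodels.

```python
# Unadjusted p-values copied from the outlier_test() output above
unadj_p = [0.641494, 0.637814, 0.868300, 0.238781, 0.917850,
           0.478355, 0.365234, 0.046780, 0.135258, 0.989095]

# One test per observation
n_tests = len(unadj_p)

# Bonferroni correction: multiply by the number of tests, cap at 1
bonf_p = [min(1.0, p * n_tests) for p in unadj_p]

# Print both columns side by side for comparison
for i, (raw, adj) in enumerate(zip(unadj_p, bonf_p)):
    print(f"{i}: unadj_p={raw:.6f}  bonf(p)={adj:.6f}")
```

Under this assumption, only the eighth observation (index 7) ends up with a corrected p-value below 1, which agrees with the **bonf(p)** column up to rounding of the printed inputs.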

We can see that the studentized residual for the first observation in the dataset is **-0.486471**, the studentized residual for the second observation is **-0.491937**, and so on.
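To see where these numbers come from, the following sketch recomputes the studentized residuals by hand with NumPy. It assumes the externally studentized (leave-one-out) form, which reproduces the values printed above for this dataset; the variable names are illustrative and this is not a description of how statsmodels is implemented internally.

```python
import numpy as np

# Data copied from the example above
points = np.array([25, 20, 14, 16, 27, 20, 12, 15, 14, 19], dtype=float)
rating = np.array([90, 85, 82, 88, 94, 90, 76, 75, 87, 86], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(points), points])
n, p = X.shape

# Ordinary least squares fit and raw residuals
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
resid = rating - X @ beta

# Leverage values: diagonal of the hat matrix X (X'X)^-1 X'
leverage = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

# Internally studentized residuals (variance estimated from all observations)
s2 = resid @ resid / (n - p)
internal = resid / np.sqrt(s2 * (1 - leverage))

# Externally studentized residuals (variance estimated with observation i left out)
external = internal * np.sqrt((n - p - 1) / (n - p - internal**2))

print(np.round(external, 6))
```

The entries of `external` should line up with the **student_resid** column, for example about -0.486 for the first observation and about -2.410 for the eighth.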

We can also create a quick plot of the predictor variable values vs. the corresponding studentized residuals:

    import matplotlib.pyplot as plt

    #define predictor variable values and studentized residuals
    x = df['points']
    y = stud_res['student_resid']

    #create scatterplot of predictor variable vs. studentized residuals
    plt.scatter(x, y)
    plt.axhline(y=0, color='black', linestyle='--')
    plt.xlabel('Points')
    plt.ylabel('Studentized Residuals')

From the plot we can see that none of the observations has a studentized residual with an absolute value greater than 3, so there are no clear outliers in the dataset. The most extreme value is **-2.409911**, for the eighth observation.
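The same check can be done without a plot. The sketch below applies the rule-of-thumb cutoff of 3 to the studentized residuals printed earlier; the values are copied from that output and the variable names are illustrative.

```python
# Studentized residuals copied from the outlier_test() output above
student_resid = [-0.486471, -0.491937, 0.172006, 1.287711, 0.106923,
                 0.748842, -0.968124, -2.409911, 1.688046, -0.014163]

# Rule-of-thumb cutoff used in this tutorial
threshold = 3

# Indices of observations whose absolute studentized residual exceeds the cutoff
outliers = [i for i, r in enumerate(student_resid) if abs(r) > threshold]

# Index of the observation with the largest absolute studentized residual
most_extreme = max(range(len(student_resid)), key=lambda i: abs(student_resid[i]))

print("Flagged observations:", outliers)
print("Most extreme observation:", most_extreme, student_resid[most_extreme])
```

With a fitted model, the same filter could be applied directly to the `student_resid` column of the DataFrame returned by **outlier_test()**.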
