
Linear regression is a method we can use to understand the relationship between one or more explanatory variables and a response variable.

When we perform linear regression on a dataset, we end up with a regression equation which can be used to predict the values of a response variable, given the values for the explanatory variables.

We can then measure the difference between the predicted values and the actual values to come up with the **residuals** for each prediction. This helps us get an idea of how well our regression model is able to predict the response values.
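In symbols, using notation that is our own shorthand rather than anything Stata prints, the residual for observation *i* is the actual value minus the predicted value:

$$e_i = y_i - \hat{y}_i$$

Here $y_i$ is the actual response value and $\hat{y}_i$ is the value predicted by the regression equation. A positive residual means the model under-predicted that observation, and a negative residual means it over-predicted.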

This tutorial explains how to obtain both the **predicted values** and the **residuals** for a regression model in Stata.

**Example: How to Obtain Predicted Values and Residuals**

For this example we will use the built-in Stata dataset called *auto*. We'll use *mpg* and *displacement* as the explanatory variables and *price* as the response variable.

Use the following steps to perform linear regression and subsequently obtain the predicted values and residuals for the regression model.

**Step 1: Load and view the data.**

First, we'll load the data using the following command:

sysuse auto

Next, we'll get a quick summary of the data using the following command:

summarize

**Step 2: Fit the regression model.**

Next, we'll use the following command to fit the regression model:

regress price mpg displacement

The estimated regression equation is as follows:

estimated price = 6672.766 - 121.1833 × (mpg) + 10.50885 × (displacement)
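To see how the equation produces a prediction, here is a small sketch that plugs in a hypothetical car with 20 mpg and a displacement of 200 cubic inches. The input values are made up for illustration, and the coefficients are copied from the estimated equation:

```stata
* Hypothetical inputs: mpg = 20, displacement = 200 (illustrative values only)
* Coefficients are taken from the estimated regression equation
display 6672.766 - 121.1833*20 + 10.50885*200
```

The arithmetic works out to roughly 6,350.87, which is the price the model would predict for such a car.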

**Step 3: Obtain the predicted values.**

We can obtain the predicted values by using the **predict** command and storing these values in a variable named whatever we'd like. In this case, we'll use the name **pred_price**:

predict pred_price

We can view the actual prices and the predicted prices side-by-side using the **list** command. There are 74 total predicted values, but we'll view just the first 10 by adding the **in 1/10** qualifier:

list price pred_price in 1/10
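As a sanity check, the predicted values can also be rebuilt by hand from the stored coefficients. The following is a minimal, self-contained sketch that refits the model so it can be run on its own. The variable name manual_pred is a hypothetical name chosen only for this illustration:

```stata
* Reload the data and refit the model quietly so this sketch runs on its own
sysuse auto, clear
quietly regress price mpg displacement

* Predicted values from predict (the default statistic is the linear prediction)
predict pred_price

* Rebuild the same values by hand from the stored coefficients
* (manual_pred is a hypothetical variable name used only for this check)
generate double manual_pred = _b[_cons] + _b[mpg]*mpg + _b[displacement]*displacement

* The two prediction columns should agree up to floating-point rounding
list price pred_price manual_pred in 1/5
```

After **regress**, Stata makes each coefficient available as _b[varname], with _b[_cons] holding the intercept, so the hand-built column is simply the regression equation applied to every row.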

**Step 4: Obtain the residuals.**

We can obtain the residual for each prediction by using the **predict** command with the **residuals** option and storing these values in a variable named whatever we'd like. In this case, we'll use the name **resid_price**:

predict resid_price, residuals

We can view the actual price, the predicted price, and the residuals all side-by-side using the **list** command again:

list price pred_price resid_price in 1/10
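Each residual is just the actual price minus the predicted price, which we can confirm with a short, self-contained sketch. The variable name manual_resid is a hypothetical name used only for this check:

```stata
* Reload the data and refit the model quietly so this sketch runs on its own
sysuse auto, clear
quietly regress price mpg displacement

* Predicted values and residuals from predict
predict pred_price
predict resid_price, residuals

* Residual computed by hand: actual minus predicted
* (manual_resid is a hypothetical variable name used only for this check)
generate double manual_resid = price - pred_price

* Both columns should show the same summary statistics up to rounding
summarize resid_price manual_resid
```

Because the model includes an intercept, the least-squares residuals average to zero, so the mean reported by **summarize** should be zero apart from rounding error.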

**Step 5: Create a predicted values vs. residuals plot.**

Lastly, we can create a scatterplot to visualize the relationship between the predicted values and the residuals:

scatter resid_price pred_price
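As an alternative sketch, Stata also provides the **rvfplot** postestimation command, which draws a residual-versus-fitted plot directly from the most recent regression without first storing the predictions. Adding a reference line at zero is our own choice here, made so the spread above and below zero is easier to judge:

```stata
* Reload the data and refit the model quietly so this sketch runs on its own
sysuse auto, clear
quietly regress price mpg displacement

* Residual-versus-fitted plot with a horizontal reference line at zero
rvfplot, yline(0)
```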

We can see that, on average, the residuals tend to grow larger as the fitted values grow larger. This could be a sign of heteroscedasticity, which occurs when the spread of the residuals is not constant across the range of fitted values.

We could formally test for heteroscedasticity using the Breusch-Pagan Test and we could address this problem using robust standard errors.
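A minimal sketch of both follow-up steps is shown below, assuming the same model as in this example. The **estat hettest** command runs the Breusch-Pagan / Cook-Weisberg test after **regress**, and the **vce(robust)** option refits the model with robust standard errors:

```stata
* Reload the data and refit the model so this sketch runs on its own
sysuse auto, clear
regress price mpg displacement

* Breusch-Pagan / Cook-Weisberg test; the null hypothesis is constant variance
estat hettest

* Refit with heteroscedasticity-robust standard errors
regress price mpg displacement, vce(robust)
```

A small p-value from the test would be evidence against constant variance. The robust refit leaves the coefficient estimates unchanged and only changes the standard errors and the statistics derived from them.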