
A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value
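
As a quick illustration, the sketch below applies this formula to a handful of made-up numbers. The vectors `observed` and `predicted` are hypothetical and are not taken from any dataset used in this article.

```r
#hypothetical observed and predicted values (illustrative only)
observed <- c(10, 12, 15, 19)
predicted <- c(11, 12, 14, 20)

#residual = observed value - predicted value
res <- observed - predicted

#view the residuals
res
```

A negative residual means the prediction was too high, a positive residual means it was too low, and a residual of zero means the prediction matched the observed value exactly.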

One way to understand how well a regression model fits a dataset is to calculate the **residual sum of squares**, which is calculated as:

Residual sum of squares = $\sum e_i^2$

where:

- **Σ**: A Greek symbol that means “sum”
- **$e_i$**: The $i$-th residual

For a given dataset and response variable, the lower the residual sum of squares, the better the model fits the data.
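
To make the formula concrete, here is a minimal sketch that applies it to two sets of made-up predictions for the same observed values. All three vectors are hypothetical and chosen only for illustration.

```r
#hypothetical observed values and two competing sets of predictions
observed <- c(10, 12, 15, 19)
pred_a <- c(11, 12, 14, 20)
pred_b <- c(13, 10, 18, 15)

#residual sum of squares: square each residual, then add them up
rss_a <- sum((observed - pred_a)^2)
rss_b <- sum((observed - pred_b)^2)

#compare the two values
rss_a
rss_b
```

Here the first set of predictions has the lower value (3 versus 38), so it sits closer to the observed values.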

We can easily calculate the residual sum of squares for a regression model in R by using one of the following two methods:

    #build regression model
    model <- lm(...)

    #calculate residual sum of squares (method 1)
    deviance(model)

    #calculate residual sum of squares (method 2)
    sum(resid(model)^2)

Both methods produce the same result, because for an unweighted linear regression model the deviance is simply the sum of the squared residuals.
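
This agreement can be checked directly. The sketch below assumes an ordinary (unweighted) linear model fit with `lm()` on R's built-in `cars` dataset, which is not used elsewhere in this article; the names `fit`, `rss1`, `rss2` and `rss3` are arbitrary.

```r
#fit a simple linear regression on the built-in cars dataset (assumed example)
fit <- lm(dist ~ speed, data = cars)

#method 1: deviance of the fitted model
rss1 <- deviance(fit)

#method 2: sum of squared residuals
rss2 <- sum(resid(fit)^2)

#by hand: observed minus predicted, squared and summed
rss3 <- sum((cars$dist - fitted(fit))^2)

#all three should agree up to floating-point tolerance
all.equal(rss1, rss2)
all.equal(rss1, rss3)
```

Note that this holds for an unweighted fit. If the model is fit with the `weights` argument, `deviance()` returns the weighted sum of squares, so the two methods can give different values.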

The following example shows how to use these functions in practice.

**Example: Calculating Residual Sum of Squares in R**

For this example, we’ll use the built-in **mtcars** dataset in R:

    #view first six rows of mtcars dataset
    head(mtcars)

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The following code shows how to fit a multiple linear regression model for this dataset and calculate the residual sum of squares of the model:

    #build multiple linear regression model
    model <- lm(...)

    #calculate residual sum of squares (method 1)
    deviance(model)

    [1] 195.0478

    #calculate residual sum of squares (method 2)
    sum(resid(model)^2)

    [1] 195.0478

We can see that the residual sum of squares turns out to be **195.0478**.
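
The model formula itself is not shown in this example. As a runnable sketch only, the code below assumes the model predicts `mpg` from `wt` and `hp`; this choice of predictors is a guess rather than something the text confirms, so the printed value is not guaranteed to match the one reported here.

```r
#ASSUMPTION: the predictors wt and hp are a guess; the formula is not given in the text
model <- lm(mpg ~ wt + hp, data = mtcars)

#calculate residual sum of squares (method 1)
deviance(model)

#calculate residual sum of squares (method 2)
sum(resid(model)^2)
```

Whatever predictors are used, the two lines should print the same number for an unweighted `lm()` fit.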

If we have two competing models, we can calculate the residual sum of squares for both to determine which one fits the data better:

    #build two different models
    model1 <- lm(...)
    model2 <- lm(...)

    #calculate residual sum of squares for both models
    deviance(model1)

    [1] 195.0478

    deviance(model2)

    [1] 246.6825

We can see that the residual sum of squares for model 1 is lower, which indicates that it fits the data better than model 2.

We can confirm this by calculating the R-squared of each model:

    #build two different models
    model1 <- lm(...)
    model2 <- lm(...)

    #calculate R-squared for both models
    summary(model1)$r.squared

    [1] 0.8267855

    summary(model2)$r.squared

    [1] 0.7809306

The R-squared for model 1 turns out to be higher, which indicates that it’s able to explain more of the variance in the response values compared to model 2.
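
The two measures agree because, for a linear model with an intercept, R-squared can be written as $R^2 = 1 - \text{RSS}/\text{TSS}$, where TSS is the total sum of squares of the response. When two models predict the same response on the same data, TSS is identical for both, so the model with the lower residual sum of squares must have the higher R-squared. The sketch below checks this identity; the formula `mpg ~ wt` is an assumed example chosen for illustration, and the names `fit`, `rss`, `tss` and `r2_manual` are arbitrary.

```r
#fit an example model (assumed formula, for illustration only)
fit <- lm(mpg ~ wt, data = mtcars)

#residual sum of squares
rss <- deviance(fit)

#total sum of squares of the response
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

#R-squared rebuilt from RSS and TSS
r2_manual <- 1 - rss / tss

#should match the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)
```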
