Two types of intervals that are often used in regression analysis are confidence intervals and prediction intervals.
Here’s the difference between the two intervals:
Confidence intervals represent a range of values that are likely to contain the true mean value of some response variable based on specific values of one or more predictor variables.
Prediction intervals represent a range of values that are likely to contain the true value of some response variable for a single new observation based on specific values of one or more predictor variables.
For example, suppose we fit a simple linear regression model that uses the number of bedrooms to predict the selling price of a house:
Price = β0 + β1(number of bedrooms)
If we’d like to estimate the mean selling price of houses with three bedrooms, we would use a confidence interval.
However, if we’d like to estimate the selling price of a specific new home that just came on the market with three bedrooms, we would use a prediction interval.
Note: A prediction interval has to account for the variability of an individual observation in addition to the uncertainty in the estimated mean. For the same confidence level and the same predictor values, a prediction interval is therefore always wider than the corresponding confidence interval.
Confidence Interval vs. Prediction Interval: Difference in Formulas
We use the following formula to calculate a confidence interval:
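$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx}\sqrt{\frac{(x_0 - \bar{x})^2}{SS_x} + \frac{1}{n}}$$
We use the following formula to calculate a prediction interval:
$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx}\sqrt{\frac{(x_0 - \bar{x})^2}{SS_x} + \frac{1}{n} + 1}$$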
where:
- ŷ0: Estimated mean value of the response variable at x0
- tα/2,n-2: t-critical value with n-2 degrees of freedom for a confidence level of 1-α
- Syx: Standard error of the estimate (the residual standard error of the regression)
- x0: specific value of predictor variable
- x̄: mean value of predictor variable
- SSx: Sum of squares for predictor variable
- n: Total sample size
Notice that the formula for a prediction interval contains an extra 1 under the square root, which means its standard error is always larger than the standard error used for the confidence interval.
Thus, a prediction interval will always be wider than a confidence interval.
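To get a feel for how much wider, consider the special case x0 = x̄. This is an assumption made here only to simplify the algebra: the first term under the square root vanishes, and the ratio of the two half-widths reduces to a function of the sample size alone:

$$\frac{t_{\alpha/2,\,n-2} \cdot S_{yx}\sqrt{\frac{1}{n} + 1}}{t_{\alpha/2,\,n-2} \cdot S_{yx}\sqrt{\frac{1}{n}}} = \sqrt{n + 1}$$

With n = 20, for instance, the prediction interval at the mean of the predictor would be about √21 ≈ 4.58 times as wide as the confidence interval. The confidence interval shrinks as n grows, while the extra 1 keeps the prediction interval from shrinking toward zero.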
Example: Interpreting Confidence Intervals vs. Prediction Intervals
Suppose we have a dataset that shows the number of bedrooms and the selling price (in thousands of dollars) for 20 houses in a particular neighborhood. The values are defined in the code below.
Now suppose we fit a simple linear regression model to this dataset in R:
#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213, 236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit simple linear regression model
model <- lm(price ~ beds, data=df)

#view model fit
summary(model)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   39.450     13.248   2.978  0.00807 **
beds          70.667      4.031  17.529 9.26e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.19 on 18 degrees of freedom
Multiple R-squared:  0.9447, Adjusted R-squared:  0.9416
F-statistic: 307.3 on 1 and 18 DF,  p-value: 9.257e-13
The fitted regression model turns out to be:
Selling price (thousands) = 39.450 + 70.667(number of bedrooms)
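As a quick check, the fitted equation can be evaluated by hand for a three-bedroom house. The sketch below simply plugs the rounded coefficients from the summary output into the equation. The variable names are illustrative, not part of the original code.

```r
# Rounded coefficients taken from the summary() output above
b0 <- 39.450   # intercept
b1 <- 70.667   # slope for number of bedrooms

# Point estimate (in thousands of dollars) for a house with 3 bedrooms
point_estimate <- b0 + b1 * 3
point_estimate
```

The result is about 251.45, which is the value at the center of both the confidence interval and the prediction interval for three bedrooms.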
We can use the following code to calculate a confidence interval for the mean selling price of houses that have three bedrooms:
#define new house
new <- data.frame(beds=c(3))

#confidence interval for mean selling price of house with 3 bedrooms
predict(model, newdata = new, interval = "confidence")

     fit     lwr     upr
1 251.45 240.087 262.813

The 95% confidence interval for the mean selling price of houses with three bedrooms is [$240k, $263k].
We can then use the following code to calculate a prediction interval for the selling price of a new house that just came on the market that has three bedrooms:
#define new house
new <- data.frame(beds=c(3))

#prediction interval for selling price of a new house with 3 bedrooms
predict(model, newdata = new, interval = "prediction")

     fit      lwr      upr
1 251.45 199.3783 303.5217

The 95% prediction interval for the selling price of a new house with three bedrooms is [$199k, $304k].
Notice that the prediction interval is much wider than the confidence interval because there is more uncertainty around the selling price of a single new house as opposed to the mean selling price of all houses with three bedrooms.
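To connect these results back to the formulas, the following sketch recomputes both intervals by hand for x0 = 3. It assumes a 95% confidence level (the default used by predict()) and refits the same model so that the block runs on its own. Names such as se_mean, se_pred, conf_int and pred_int are illustrative.

```r
# Data from the example (price in thousands of dollars)
beds  <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6)
price <- c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213, 236, 280, 275,
           273, 312, 311, 304, 415, 396, 488)

model <- lm(price ~ beds)

x0     <- 3                                   # predictor value of interest
n      <- length(beds)                        # total sample size
ssx    <- sum((beds - mean(beds))^2)          # SSx: sum of squares for predictor
s_yx   <- summary(model)$sigma                # Syx: residual standard error
t_crit <- qt(0.975, df = n - 2)               # t critical value for a 95% level
y_hat  <- unname(coef(model)[1] + coef(model)[2] * x0)  # fitted value at x0

# Standard errors from the two formulas; the only difference is the extra "+ 1"
se_mean <- s_yx * sqrt((x0 - mean(beds))^2 / ssx + 1 / n)
se_pred <- s_yx * sqrt((x0 - mean(beds))^2 / ssx + 1 / n + 1)

conf_int <- c(lwr = y_hat - t_crit * se_mean, upr = y_hat + t_crit * se_mean)
pred_int <- c(lwr = y_hat - t_crit * se_pred, upr = y_hat + t_crit * se_pred)

conf_int
pred_int
```

Both hand-computed intervals should agree with the lwr and upr columns returned by predict() above, which shows that the extra 1 under the square root is the only source of the additional width.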
Additional Resources
The following tutorials offer additional information about confidence intervals:
- An Introduction to Confidence Intervals
- 4 Examples of Confidence Intervals in Real Life
- How to Calculate Confidence Intervals in Excel
The following tutorials offer additional information about prediction intervals: