Bootstrapping is a method that can be used to estimate the standard error of any statistic and produce a confidence interval for the statistic.
The basic process for bootstrapping is as follows:
- Take k repeated samples with replacement from a given dataset.
- For each sample, calculate the statistic you’re interested in.
- This results in k different estimates for a given statistic, which you can then use to calculate the standard error of the statistic and create a confidence interval for the statistic.
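The three steps above can be sketched in base R without any package. This sketch is not part of the original tutorial. The statistic (the median of `mpg` from the built-in `mtcars` dataset), the number of resamples `k = 2000`, and the simple percentile interval are all assumptions chosen for illustration.

```r
# Illustrative sketch of the bootstrap steps in base R (no boot package).
# Assumptions: statistic = median of mtcars$mpg, k = 2000 resamples,
# and a simple percentile interval.
set.seed(0)

x <- mtcars$mpg   # original dataset (built into R)
k <- 2000         # number of bootstrap samples

# Steps 1 and 2: draw k samples with replacement and compute the statistic on each
estimates <- replicate(k, median(sample(x, size = length(x), replace = TRUE)))

# Step 3: the standard deviation of the k estimates is the bootstrap standard error
boot_se <- sd(estimates)

# The 2.5th and 97.5th percentiles of the estimates give a 95% percentile interval
boot_ci <- quantile(estimates, probs = c(0.025, 0.975))

print(boot_se)
print(boot_ci)
```

The boot package described below automates this same loop and adds more refined interval types such as BCa.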
We can perform bootstrapping in R by using the following functions from the boot package:
1. Generate bootstrap samples.
boot(data, statistic, R, …)
where:
- data: A vector, matrix, or data frame
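- statistic: A function that returns the statistic(s) to be bootstrapped. It must accept the original data as its first argument and a vector of indices (the rows chosen for a bootstrap sample) as its second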
- R: Number of bootstrap replicates
2. Generate a bootstrapped confidence interval.
boot.ci(boot.out, conf, type)
where:
- boot.out: An object returned by the boot() function
- conf: The confidence level of the interval(s) to calculate. Default is 0.95
- type: The type of confidence interval to calculate. Options include "norm", "basic", "stud", "perc", "bca", and "all". Default is "all"
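To see how the two functions fit together before the regression examples, here is a minimal sketch that bootstraps the mean of `mpg` in the built-in `mtcars` dataset. The statistic function name `mean_mpg` is hypothetical, and the 90% level and the "norm"/"perc" types are arbitrary choices made only to show the `conf` and `type` arguments.

```r
library(boot)  # ships with R as a recommended package

set.seed(0)

# Hypothetical statistic function: boot() passes the data first and a
# vector of resampled row indices second.
mean_mpg <- function(data, indices) {
  mean(data$mpg[indices])
}

# Generate 2000 bootstrap replicates of the mean of mpg
b <- boot(data = mtcars, statistic = mean_mpg, R = 2000)

# Request a 90% interval of two types instead of the defaults
ci <- boot.ci(b, conf = 0.90, type = c("norm", "perc"))
print(ci)
```

Because the statistic function receives indices rather than a copy of the data, the same pattern extends to any statistic that can be computed from a subset of rows.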
The following examples show how to use these functions in practice.
Example 1: Bootstrap a Single Statistic
The following code shows how to calculate the standard error for the R-squared of a simple linear regression model:
```r
set.seed(0)
library(boot)

#define function to calculate R-squared
rsq_function <- function(formula, data, indices) {
  d <- data[indices,] #allows boot to select sample
  fit <- lm(formula, data=d) #fit regression model
  return(summary(fit)$r.squared) #return R-squared of model
}

#perform bootstrapping with 2000 replications
reps <- boot(data=mtcars, statistic=rsq_function, R=2000, formula=mpg~disp)

#view results of bootstrapping
reps
```

```
ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = mtcars, statistic = rsq_function, R = 2000, formula = mpg ~ disp)


Bootstrap Statistics :
     original      bias    std. error
t1* 0.7183433 0.002164339  0.06513426
```
From the results we can see:
- The estimated R-squared for this regression model is 0.7183433.
- The standard error for this estimate is 0.06513426.
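The numbers in this printout can be reproduced directly from the boot object, which clarifies what each column means: `reps$t0` holds the statistic computed on the original data, and `reps$t` holds one row per bootstrap replicate. The sketch below reruns Example 1 so that it is self-contained; the variable names `original`, `bias`, and `std_error` are illustrative.

```r
library(boot)
set.seed(0)

# Same R-squared statistic as in Example 1
rsq_function <- function(formula, data, indices) {
  d <- data[indices, ]            # rows selected for this bootstrap sample
  fit <- lm(formula, data = d)    # fit regression model on the resample
  summary(fit)$r.squared          # return R-squared
}

reps <- boot(data = mtcars, statistic = rsq_function, R = 2000, formula = mpg ~ disp)

# "original": the statistic on the full dataset
original  <- reps$t0
# "bias": mean of the bootstrap replicates minus the original statistic
bias      <- mean(reps$t[, 1]) - reps$t0
# "std. error": standard deviation of the bootstrap replicates
std_error <- sd(reps$t[, 1])

print(c(original = original, bias = bias, std.error = std_error))
```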
We can also view the distribution of the bootstrapped R-squared estimates. Calling plot() on a boot object draws a histogram of the replicates alongside a normal Q-Q plot:
```r
plot(reps)
```
We can also use the following code to calculate the 95% confidence interval for the estimated R-squared of the model:
```r
#calculate adjusted bootstrap percentile (BCa) interval
boot.ci(reps, type="bca")
```

```
CALL : 
boot.ci(boot.out = reps, type = "bca")

Intervals : 
Level       BCa          
95%   ( 0.5350,  0.8188 )  
Calculations and Intervals on Original Scale
```
From the output we can see that the 95% bootstrapped confidence interval for the true R-squared value is (0.5350, 0.8188).
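BCa is only one of the interval types boot.ci() supports. As a sketch that goes beyond the original example, the code below reruns Example 1, requests the normal, basic, percentile, and BCa intervals in a single call, and collects their endpoints into a small table. The column positions used to pull out the endpoints reflect how boot.ci() stores each interval type; the name `intervals` is illustrative.

```r
library(boot)
set.seed(0)

rsq_function <- function(formula, data, indices) {
  d <- data[indices, ]                  # rows selected for this bootstrap sample
  summary(lm(formula, data = d))$r.squared
}
reps <- boot(data = mtcars, statistic = rsq_function, R = 2000, formula = mpg ~ disp)

# Request several interval types in one call
ci <- boot.ci(reps, type = c("norm", "basic", "perc", "bca"))

# Endpoint columns differ by type: ci$normal stores (conf, lower, upper);
# ci$basic, ci$percent and ci$bca store (conf, two order-statistic
# positions, lower, upper).
intervals <- rbind(
  norm  = ci$normal[2:3],
  basic = ci$basic[4:5],
  perc  = ci$percent[4:5],
  bca   = ci$bca[4:5]
)
colnames(intervals) <- c("lower", "upper")
print(intervals)
```

Seeing the types side by side shows how much the choice of method shifts the endpoints for a bounded statistic like R-squared.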
Example 2: Bootstrap Multiple Statistics
The following code shows how to calculate the standard error for each coefficient in a linear regression model. Because coef() returns a vector, boot() bootstraps every coefficient at once:
```r
set.seed(0)
library(boot)

#define function to calculate fitted regression coefficients
coef_function <- function(formula, data, indices) {
  d <- data[indices,] #allows boot to select sample
  fit <- lm(formula, data=d) #fit regression model
  return(coef(fit)) #return coefficient estimates of model
}

#perform bootstrapping with 2000 replications
reps <- boot(data=mtcars, statistic=coef_function, R=2000, formula=mpg~disp)

#view results of bootstrapping
reps
```

```
ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = mtcars, statistic = coef_function, R = 2000, formula = mpg ~ disp)


Bootstrap Statistics :
       original        bias    std. error
t1* 29.59985476 -5.058601e-02 1.49354577
t2* -0.04121512  6.549384e-05 0.00527082
```
From the results we can see:
- The estimated coefficient for the intercept of the model is 29.59985476 and the standard error of this estimate is 1.49354577.
- The estimated coefficient for the predictor variable disp in the model is -0.04121512 and the standard error of this estimate is 0.00527082.
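Because each column of `reps$t` holds the replicates for one coefficient, the bootstrap standard errors are just the column-wise standard deviations. The sketch below, which is an illustrative addition rather than part of the original example, computes them all at once and places them beside the usual formula-based standard errors reported by summary(lm()). The names `boot_se` and `ols_se` are hypothetical.

```r
library(boot)
set.seed(0)

coef_function <- function(formula, data, indices) {
  d <- data[indices, ]            # rows selected for this bootstrap sample
  coef(lm(formula, data = d))     # vector of coefficient estimates
}
reps <- boot(data = mtcars, statistic = coef_function, R = 2000, formula = mpg ~ disp)

# Column-wise standard deviations of the replicates = bootstrap standard errors
boot_se <- apply(reps$t, 2, sd)
names(boot_se) <- names(reps$t0)

# Model-based (OLS formula) standard errors, for comparison
ols_se <- summary(lm(mpg ~ disp, data = mtcars))$coefficients[, "Std. Error"]

print(rbind(bootstrap = boot_se, ols = ols_se))
```

A noticeable gap between the two rows can hint that the usual OLS assumptions (for example, constant error variance) do not hold well for this model.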
We can also view the distribution of the bootstrapped estimates for each coefficient, using the index argument to choose which one to plot:
```r
plot(reps, index=1) #intercept of model
plot(reps, index=2) #disp predictor variable
```
Because reps now contains two statistics, we use the index argument of boot.ci() to calculate the 95% confidence interval for each coefficient separately:
```r
#calculate adjusted bootstrap percentile (BCa) intervals
boot.ci(reps, type="bca", index=1) #intercept of model
boot.ci(reps, type="bca", index=2) #disp predictor variable
```

```
CALL : 
boot.ci(boot.out = reps, type = "bca", index = 1)

Intervals : 
Level       BCa          
95%   (26.78, 32.66 )  
Calculations and Intervals on Original Scale

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = reps, type = "bca", index = 2)

Intervals : 
Level       BCa          
95%   (-0.0520, -0.0312 )  
Calculations and Intervals on Original Scale
```
From the output we can see that the 95% bootstrapped confidence intervals for the model coefficients are as follows:
- C.I. for intercept: (26.78, 32.66)
- C.I. for disp: (-0.0520, -0.0312)
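Calling boot.ci() once per index becomes tedious as the number of coefficients grows. One way to automate it, sketched here as an illustration rather than as part of the original tutorial, is to loop over every index with sapply() and collect the BCa endpoints into a table with one row per coefficient. The name `bca_table` is hypothetical.

```r
library(boot)
set.seed(0)

coef_function <- function(formula, data, indices) {
  d <- data[indices, ]            # rows selected for this bootstrap sample
  coef(lm(formula, data = d))     # vector of coefficient estimates
}
reps <- boot(data = mtcars, statistic = coef_function, R = 2000, formula = mpg ~ disp)

# Compute a BCa interval for every coefficient by looping over index;
# sapply() returns one column per coefficient, so transpose to get rows.
bca_table <- t(sapply(seq_along(reps$t0), function(i) {
  ci <- boot.ci(reps, type = "bca", index = i)
  ci$bca[4:5]  # lower and upper endpoints
}))
dimnames(bca_table) <- list(names(reps$t0), c("lower", "upper"))
print(bca_table)
```

The same loop works unchanged for a model with more predictors, since it iterates over however many coefficients the statistic function returns.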