*39*

**Bootstrapping** is a method that can be used to estimate the standard error of any statistic and produce a confidence interval for the statistic.

The basic process for bootstrapping is as follows:

- Take
*k*repeated samples with replacement from a given dataset. - For each sample, calculate the statistic you’re interested in.
- This results in
*k*different estimates for a given statistic, which you can then use to calculate the standard error of the statistic and create a confidence interval for the statistic.

We can perform bootstrapping in R by using the following functions from the boot library:

1. Generate bootstrap samples.

**boot(data, statistic, R, …)**

where:

**data:**A vector, matrix, or data frame**statistic:**A function that produces the statistic(s) to be bootstrapped**R:**Number of bootstrap replicates

2. Generate a bootstrapped confidence interval.

**boot.ci(bootobject, conf, type)**

where:

**bootobject:**An object returned by the boot() function**conf:**The confidence interval to calculate. Default is 0.95**type:**Type of confidence interval to calculate. Options include “norm”, “basic”, “stud”, “perc”, “bca” and “all” – Default is “all”

The following examples show how to use these functions in practice.

**Example 1: Bootstrap a Single Statistic**

The following code shows how to calculate the standard error for the R-squared of a simple linear regression model:

set.seed(0) library(boot) #define function to calculate R-squared rsq_function function(formula, data, indices) { d #allows boot to select sample fit #fit regression model return(summary(fit)$r.square) #return R-squared of model } #perform bootstrapping with 2000 replications reps #view results of boostrapping reps ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = mtcars, statistic = rsq_function, R = 2000, formula = mpg ~ disp) Bootstrap Statistics : original bias std. error t1* 0.7183433 0.002164339 0.06513426

From the results we can see:

- The estimated R-squared for this regression model is
**0.7183433**. - The standard error for this estimate is
**0.06513426**.

We can quickly view the distribution of the bootstrapped samples as well:

plot(reps)

We can also use the following code to calculate the 95% confidence interval for the estimated R-squared of the model:

#calculate adjusted bootstrap percentile (BCa) interval boot.ci(reps, type="bca") CALL : boot.ci(boot.out = reps, type = "bca") Intervals : Level BCa 95% ( 0.5350, 0.8188 ) Calculations and Intervals on Original Scale

From the output we can see that the 95% bootstrapped confidence interval for the true R-squared values is (.5350, .8188).

**Example 2: Bootstrap Multiple Statistics**

The following code shows how to calculate the standard error for each coefficient in a multiple linear regression model:

set.seed(0) library(boot) #define function to calculate fitted regression coefficients coef_function function(formula, data, indices) { d #allows boot to select sample fit #fit regression model return(coef(fit)) #return coefficient estimates of model } #perform bootstrapping with 2000 replications reps #view results of boostrapping reps ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = mtcars, statistic = coef_function, R = 2000, formula = mpg ~ disp) Bootstrap Statistics : original bias std. error t1* 29.59985476 -5.058601e-02 1.49354577 t2* -0.04121512 6.549384e-05 0.00527082

From the results we can see:

- The estimated coefficient for the intercept of the model is
**29.59985476**and the standard error of this estimate is**1.49354577**. - The estimated coefficient for the predictor variable
*disp*in the model is**-0.04121512**and the standard error of this estimate is**0.00527082**.

We can quickly view the distribution of the bootstrapped samples as well:

plot(reps, index=1) #intercept of model plot(reps, index=2) #disp predictor variable

We can also use the following code to calculate the 95% confidence intervals for each coefficient:

#calculate adjusted bootstrap percentile (BCa) intervals boot.ci(reps, type="bca", index=1) #intercept of model boot.ci(reps, type="bca", index=2) #disp predictor variable CALL : boot.ci(boot.out = reps, type = "bca", index = 1) Intervals : Level BCa 95% (26.78, 32.66 ) Calculations and Intervals on Original Scale BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 2000 bootstrap replicates CALL : boot.ci(boot.out = reps, type = "bca", index = 2) Intervals : Level BCa 95% (-0.0520, -0.0312 ) Calculations and Intervals on Original Scale

From the output we can see that the 95% bootstrapped confidence intervals for the model coefficients are as follows:

- C.I. for intercept: (26.78, 32.66)
- C.I. for
*disp*: (-.0520, -.0312)

**Additional Resources**

How to Perform Simple Linear Regression in R

How to Perform Multiple Linear Regression in R

Introduction to Confidence Intervals