You can use the argument na.rm = TRUE to exclude missing values when calculating descriptive statistics in R.
#calculate mean and exclude missing values
mean(x, na.rm = TRUE)

#calculate sum and exclude missing values
sum(x, na.rm = TRUE)

#calculate maximum and exclude missing values
max(x, na.rm = TRUE)

#calculate standard deviation and exclude missing values
sd(x, na.rm = TRUE)
The following examples show how to use this argument in practice with both vectors and data frames.
Example 1: Use na.rm with Vectors
Suppose we attempt to calculate the mean, sum, max, and standard deviation for the following vector in R that contains some missing values:
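#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

#attempt to calculate mean, sum, max, and standard deviation
mean(x)
[1] NA
sum(x)
[1] NA
max(x)
[1] NA
sd(x)
[1] NA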
Each of these functions returns a value of NA.
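These results are NA because any calculation that involves a missing value is itself treated as missing. As a minimal sketch, using a small hypothetical vector y that is not part of the example above, the base R functions anyNA() and is.na() can confirm whether missing values are present before any statistics are calculated:

```r
#hypothetical vector used only for illustration
y <- c(2, 4, NA, 6)

#check whether any values are missing
has_missing <- anyNA(y)

#count the missing values
n_missing <- sum(is.na(y))

#without na.rm = TRUE, the missing value propagates to the result
result <- mean(y)

print(has_missing)
print(n_missing)
print(result)
```

If has_missing is TRUE, functions such as mean() and sum() will return NA unless na.rm = TRUE is supplied.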
To exclude missing values when performing these calculations, we can simply include the argument na.rm = TRUE as follows:
#define vector with some missing values
x <- c(3, 4, 5, 5, 7, NA, 12, NA, 16)

mean(x, na.rm = TRUE)
[1] 7.428571
sum(x, na.rm = TRUE)
[1] 52
max(x, na.rm = TRUE)
[1] 16
sd(x, na.rm = TRUE)
[1] 4.790864
Notice that we were able to complete each calculation successfully while excluding the missing values.
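Setting na.rm = TRUE behaves as if the missing values were dropped from the vector before the calculation, so the mean is divided by the number of non-missing values rather than the full length of the vector. The following sketch, which uses a hypothetical vector y chosen only for illustration, compares na.rm = TRUE with removing the missing values manually:

```r
#hypothetical vector used only for illustration
y <- c(2, 4, NA, 6)

#calculate mean with na.rm = TRUE
mean_narm <- mean(y, na.rm = TRUE)

#calculate mean after manually removing missing values
mean_manual <- mean(y[!is.na(y)])

#number of values actually used in the calculation
n_used <- sum(!is.na(y))

print(mean_narm)
print(mean_manual)
print(n_used)
```

Both approaches use only the three non-missing values, so they should return the same mean.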
Example 2: Use na.rm with Data Frames
Suppose we have the following data frame in R that contains some missing values:
#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, NA, 3, 2),
                 var3=c(3, 3, NA, 6, 8),
                 var4=c(1, 1, 2, 8, NA))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3    1
2    3    7    3    1
3    3   NA   NA    2
4    4    3    6    8
5    5    2    8   NA
We can use the apply() function to calculate descriptive statistics for each column in the data frame and use the na.rm = TRUE argument to exclude missing values when performing these calculations:
#calculate mean of each column
apply(df, 2, mean, na.rm = TRUE)
var1 var2 var3 var4
3.20 4.75 5.00 3.00
#calculate sum of each column
apply(df, 2, sum, na.rm = TRUE)
var1 var2 var3 var4
  16   19   20   12
#calculate max of each column
apply(df, 2, max, na.rm = TRUE)
var1 var2 var3 var4
   5    7    8    8
#calculate standard deviation of each column
apply(df, 2, sd, na.rm = TRUE)
    var1     var2     var3     var4
1.483240 2.629956 2.449490 3.366502
Once again, we were able to complete each calculation successfully while excluding the missing values.
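Note that apply() converts the data frame to a matrix before working on it, which is fine here because every column is numeric. As an alternative sketch (not the approach used in this example), the base R functions colMeans() and sapply() also accept na.rm = TRUE and work on the columns of the data frame directly:

```r
#re-create the data frame from this example
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, NA, 3, 2),
                 var3=c(3, 3, NA, 6, 8),
                 var4=c(1, 1, 2, 8, NA))

#calculate mean of each column
col_means <- colMeans(df, na.rm = TRUE)

#calculate standard deviation of each column
col_sds <- sapply(df, sd, na.rm = TRUE)

print(col_means)
print(col_sds)
```

These should match the means and standard deviations returned by apply() above. The sapply() approach may be preferable when a data frame mixes numeric and non-numeric columns, since it can be applied to a subset of columns without converting everything to a single matrix type.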
Additional Resources
The following tutorials explain how to perform other common tasks with missing values in R:
How to Use is.null in R
How to Use na.omit in R
How to Use is.na in R