The droplevels() function in R can be used to drop unused factor levels.
This function is particularly useful for removing factor levels that are no longer used after subsetting a vector or a data frame.
This function uses the following syntax:
droplevels(x)
where x is the object (typically a factor or a data frame) from which to drop unused factor levels.
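The sketch below shows the basic call on a small, made-up factor (`sizes` and its values are illustrative only). The levels are declared in a deliberate, non-alphabetical order. After one level loses all of its observations, droplevels() removes it, and the remaining levels keep their original order rather than being re-sorted.

```r
# Hypothetical example data: levels declared in a non-alphabetical order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

# Keep only the "small" and "large" observations
no_medium <- sizes[sizes != "medium"]
levels(no_medium)   # "small" "medium" "large" (unused level still present)

# Drop the now-unused "medium" level
trimmed <- droplevels(no_medium)
levels(trimmed)     # "small" "large" (original level order is kept)
```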
This tutorial provides a couple of examples of how to use this function in practice.
Example 1: Drop Unused Factor Levels in a Vector
Suppose we create a vector of data with five factor levels. Then suppose we define a new vector that contains only the values belonging to three of those five levels.
#define data with 5 factor levels
data <- factor(c(1, 2, 3, 4, 5))

#define new data as original data minus 4th and 5th factor levels
new_data <- data[data != 4 & data != 5]

#view new data
new_data

[1] 1 2 3
Levels: 1 2 3 4 5
Although the new data contains only three values, we can see that it still retains all five original factor levels.
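Before dropping anything, it can help to list exactly which levels are unused. The helper below is a hypothetical convenience function (`unused_levels()` is not part of base R). It compares the declared levels with the values that actually occur. The snippet rebuilds `data` and `new_data` so that it runs on its own.

```r
# Hypothetical helper (not part of base R): returns the levels of a factor
# that do not appear in any of its elements
unused_levels <- function(f) {
  stopifnot(is.factor(f))
  setdiff(levels(f), as.character(unique(f)))
}

data <- factor(c(1, 2, 3, 4, 5))
new_data <- data[data != 4 & data != 5]

unused_levels(new_data)   # "4" "5"
```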
To remove these unused factor levels, we can use the droplevels() function:
#drop unused factor levels
new_data <- droplevels(new_data)

#view data
new_data

[1] 1 2 3
Levels: 1 2 3
The new data now contains just three factor levels.
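For a single factor vector, calling factor() again on the factor also rebuilds its level set from the values actually present. The sketch below checks that the two approaches agree for `new_data`. It assumes the factor has no `NA` level, because droplevels() and factor() use different `exclude` defaults in that case.

```r
data <- factor(c(1, 2, 3, 4, 5))
new_data <- data[data != 4 & data != 5]

# Re-applying factor() keeps only the levels that occur in the data
via_factor <- factor(new_data)
via_droplevels <- droplevels(new_data)

levels(via_factor)                      # "1" "2" "3"
identical(via_factor, via_droplevels)   # TRUE
```

droplevels() states the intent more clearly, and it also has a data frame method, which factor() lacks.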
Example 2: Drop Unused Factor Levels in a Data Frame
Suppose we create a data frame in which one of the variables is a factor with five levels. Then suppose we define a new data frame that happens to exclude every row belonging to two of these levels:
#create data frame
df <- data.frame(region=factor(c('A', 'B', 'C', 'D', 'E')),
                 sales = c(13, 16, 22, 27, 34))

#view data frame
df

  region sales
1      A    13
2      B    16
3      C    22
4      D    27
5      E    34

#define new data frame
new_df <- subset(df, sales < 25)

#view new data frame
new_df

  region sales
1      A    13
2      B    16
3      C    22

#check levels of region variable
levels(new_df$region)

[1] "A" "B" "C" "D" "E"
Although the region column of the new data frame contains only three distinct values, it still retains all five original factor levels. This can cause problems downstream. For example, tabulating or plotting the column would still show empty categories for D and E.
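The sketch below makes the problem concrete with table(), which reports a count for every level, including the empty ones. plot() on a factor column draws a bar chart built from the same table. The subsetting cutoff `sales < 25` is an assumption: any value between 22 and 27 keeps the same three rows.

```r
df <- data.frame(region = factor(c('A', 'B', 'C', 'D', 'E')),
                 sales = c(13, 16, 22, 27, 34))
new_df <- subset(df, sales < 25)   # assumed cutoff

# Unused levels still appear when the column is tabulated
counts <- table(new_df$region)
counts[c("D", "E")]   # both 0: empty categories are still reported

# After dropping them, only the observed regions remain
trimmed_counts <- table(droplevels(new_df$region))
names(trimmed_counts)   # "A" "B" "C"
```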
To remove the unused factor levels from the region variable, we can use the droplevels() function:
#drop unused factor levels
new_df$region <- droplevels(new_df$region)

#check levels of region variable
levels(new_df$region)

[1] "A" "B" "C"
Now the region variable only contains three factor levels.
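If a data frame has several factor columns, droplevels() can also be applied to the whole data frame, which trims every factor column at once. The sketch below adds a hypothetical second factor column, `channel`, for illustration. It uses the `except` argument of the data frame method to leave one column (given by its index) untouched. The cutoff `sales < 25` is again an assumed value.

```r
df <- data.frame(region = factor(c('A', 'B', 'C', 'D', 'E')),
                 # hypothetical second factor column, added for illustration
                 channel = factor(c('web', 'store', 'web', 'store', 'phone')),
                 sales = c(13, 16, 22, 27, 34))
new_df <- subset(df, sales < 25)   # assumed cutoff

# Drop unused levels from every factor column at once
all_dropped <- droplevels(new_df)
levels(all_dropped$region)    # "A" "B" "C"
levels(all_dropped$channel)   # "store" "web"

# Use `except` (column indices) to skip column 2, i.e. channel
keep_channel <- droplevels(new_df, except = 2)
levels(keep_channel$region)   # "A" "B" "C"
levels(keep_channel$channel)  # "phone" "store" "web"
```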