Two functions that people often get mixed up in R are grep() and grepl(). Both functions allow you to see whether a certain pattern exists in a character string, but they return different results:
- grepl() returns a logical vector with one value per element: TRUE where the pattern is found and FALSE where it is not.
- grep() returns an integer vector with the indices of the elements that contain the pattern.
The following example illustrates this difference:
#create a vector of data
data

grep('Guard', data)

[1] 1 2

grepl('Guard', data)

[1]  TRUE  TRUE FALSE FALSE FALSE
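As a self-contained sketch of the same comparison, the snippet below uses a hypothetical vector named `positions`. Its values are an assumption, chosen only to be consistent with the output shown above, and may not match the original data.

```r
# hypothetical vector (values assumed for illustration)
positions <- c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center')

# grep() returns the integer positions of the elements that match
idx <- grep('Guard', positions)

# grepl() returns one TRUE/FALSE value per element
flags <- grepl('Guard', positions)

# both results carry the same information: which() turns the flags into indices
identical(idx, which(flags))
```

In other words, grep() tells you where the matches are, while grepl() tells you, element by element, whether there is a match.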
The following examples show when you might want to use one of these functions over the other.
When to Use grepl()
1. Filter Rows that Contain a Certain String
One of the most common uses of grepl() is for filtering rows in a data frame that contain a certain string:
library(dplyr)

#create data frame
df

#filter rows that contain the string 'Guard' in the player column
df %>% filter(grepl('Guard', player))

   player points rebounds
1 P Guard     12        5
2 S Guard     15        7
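The same filter can be sketched in base R, which shows why grepl() suits this task: its logical vector has one entry per row, so it can index the rows directly. The data frame below is hypothetical. The player and points values follow the printed outputs in this tutorial, while the rebounds values for the last three rows are assumed.

```r
# hypothetical data frame (rebounds for rows 3-5 are assumed values)
df <- data.frame(player = c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

# logical vector with one entry per row
keep <- grepl('Guard', df$player)

# keep only the rows where the pattern was found
guards <- df[keep, ]
guards
```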
When to Use grep()
1. Select Columns that Contain a Certain String
You can use grep() to select the columns in a data frame whose names contain a certain string:
library(dplyr)

#create data frame
df

#select columns that contain the string 'p' in their name
df %>% select(grep('p', colnames(df)))

     player points
1   P Guard     12
2   S Guard     15
3 S Forward     19
4 P Forward     22
5    Center     32
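For comparison, here is a minimal base R sketch of the same selection. It relies on the fact that grep() returns column positions, which can be used directly as a column index. The data frame is hypothetical: the rebounds values for the last three rows are assumed.

```r
# hypothetical data frame (rebounds for rows 3-5 are assumed values)
df <- data.frame(player = c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center'),
                 points = c(12, 15, 19, 22, 32),
                 rebounds = c(5, 7, 7, 12, 11))

# integer positions of the column names that contain 'p'
cols <- grep('p', colnames(df))

# select those columns by position
subset_df <- df[, cols]

# value = TRUE returns the matching names instead of their positions
matching_names <- grep('p', colnames(df), value = TRUE)
```

Here 'rebounds' is dropped because its name contains no 'p'.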
2. Count the Number of Rows that Contain a Certain String
You can use grep() to count the number of rows in a data frame that contain a certain string in a given column:
#create data frame
df

#count how many rows contain the string 'Guard' in the player column
length(grep('Guard', df$player))

[1] 2
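The same count can also be obtained with grepl(), because TRUE counts as 1 when a logical vector is summed. A minimal sketch, using a hypothetical vector named `players` whose values are assumed for illustration:

```r
# hypothetical vector of player positions (values assumed for illustration)
players <- c('P Guard', 'S Guard', 'S Forward', 'P Forward', 'Center')

# count matches from the indices returned by grep()
n_grep <- length(grep('Guard', players))

# count matches by summing the TRUE values returned by grepl()
n_grepl <- sum(grepl('Guard', players))

# both approaches give the same count
n_grep == n_grepl
```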