You can use the following basic syntax in dplyr to mutate a variable if its column name contains a particular string:
library(dplyr)

df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))
This particular syntax applies the scale() function to each variable in the data frame that contains the string ‘starter’ in the column name.
The following example shows how to use this syntax in practice.
Example: Mutate Variable if Column Contains String
Suppose we have the following data frame in R:
#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

#view data frame
df

  team starter_points starter_assists bench_points bench_assists
1    A             22               4            7             2
2    B             26               5            7             5
3    C             25              10            9             5
4    D             13              14           14             4
5    E             15              12           13             9
6    F             22              10           10            14
We can use the following syntax to apply the scale() function to each variable in the data frame that contains the string ‘starter’ in the column name:

library(dplyr)

#apply scale() function to each variable that contains 'starter' in the name
df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

  team starter_points starter_assists bench_points bench_assists
1    A      0.2819668      -1.3180158            7             2
2    B      1.0338784      -1.0629159            7             5
3    C      0.8459005       0.2125832            9             5
4    D     -1.4098342       1.2329825           14             4
5    E     -1.0338784       0.7227828           13             9
6    F      0.2819668       0.2125832           10            14
Using this syntax, we standardized each column whose name contains ‘starter’, so that its values now have a mean of 0 and a standard deviation of 1.
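One way to confirm the mean and standard deviation claim is to compute both for the scaled columns. The following is a minimal sketch that rebuilds the example data frame; the object names scaled, starter_means, starter_sds and manual are chosen here for illustration only. It also compares the result with the manual formula (x - mean(x)) / sd(x), which is what scale() computes with its default arguments.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

# Scale every column whose name contains 'starter'
scaled <- df %>%
  mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

# Means of the scaled columns (expected to be 0 up to floating-point error)
starter_means <- sapply(select(scaled, contains('starter')), mean)

# Standard deviations of the scaled columns (expected to be 1)
starter_sds <- sapply(select(scaled, contains('starter')), sd)

# Manual standardization of one column for comparison
manual <- (df$starter_points - mean(df$starter_points)) / sd(df$starter_points)

print(round(starter_means, 10))
print(starter_sds)
print(all.equal(manual, scaled$starter_points))
```

Because of floating-point arithmetic, the means may print as extremely small numbers rather than exactly 0, which is why the sketch rounds them before printing.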
Notice that the following columns were modified:
- starter_points
- starter_assists
All other columns remained unchanged.
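Before mutating, it can help to check which columns a contains() selection will actually match. The sketch below does this by passing the same helper to select() and listing the resulting names; the variable name matched is only illustrative. Note that contains() ignores case by default, so a column named 'Starter_points' would also be matched.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

# List the names of the columns that contain 'starter'
matched <- df %>%
  select(contains('starter')) %>%
  names()

print(matched)
```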
Also note we can apply any function we’d like using this syntax.
In the previous example, we chose to scale each column with the string ‘starter’ in the name.
However, we could do something simpler such as multiply the values by two for each column with ‘starter’ in the name:
library(dplyr)

#multiply values by two for each variable that contains 'starter' in the name
df %>%
  mutate_at(vars(contains('starter')), ~ (. * 2))

  team starter_points starter_assists bench_points bench_assists
1    A             44               8            7             2
2    B             52              10            7             5
3    C             50              20            9             5
4    D             26              28           14             4
5    E             30              24           13             9
6    F             44              20           10            14
Notice that the values in the starter_points and starter_assists columns have been multiplied by two, while all other columns have remained unchanged.
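In dplyr 1.0.0 and later, mutate_at() is superseded by across(), although it continues to work. As a hedged sketch, assuming a dplyr version that provides across(), both examples could be written as follows; the object names scaled_across and doubled_across are illustrative only.

```r
library(dplyr)

# Same data frame as in the example above
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

# across() form of the scaling example
scaled_across <- df %>%
  mutate(across(contains('starter'), ~ as.vector(scale(.x))))

# across() form of the multiply-by-two example
doubled_across <- df %>%
  mutate(across(contains('starter'), ~ .x * 2))

print(scaled_across)
print(doubled_across)
```

The results should match those of the mutate_at() calls shown earlier; only the way the columns are selected inside mutate() differs.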
Additional Resources
The following tutorials explain how to perform other common tasks in dplyr:
How to Remove Rows Using dplyr
How to Select Columns by Index Using dplyr
How to Filter Rows that Contain a Certain String Using dplyr