You can use the following basic syntax to remove rows from a data frame in R using dplyr:
1. Remove any row with NA’s
df %>%
na.omit()
2. Remove any row with NA’s in specific column
df %>% filter(!is.na(column_name))
3. Remove duplicates
df %>%
distinct()
4. Remove rows by index position
df %>% filter(!row_number() %in% c(1, 2, 4))
5. Remove rows based on condition (keep only the rows that meet it)
df %>%
filter(column1=='A' | column2 > 8)
The following examples show how to use each of these methods in practice with the following data frame:
library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#view data frame
df

  team points assists
1    A      4       1
2    A     NA       3
3    B      7       5
4    B      5      NA
5    C      9       2
6    C      9       2
Example 1: Remove Any Row with NA’s
The following code shows how to remove any row with NA values from the data frame:
#remove any row with NA
df %>%
  na.omit()

  team points assists
1    A      4       1
3    B      7       5
5    C      9       2
6    C      9       2
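Note that na.omit() comes from base R's stats package rather than dplyr, and it keeps the original row names (1, 3, 5, 6) in the output. As a minimal sketch of a dplyr-only alternative, assuming dplyr 1.0.4 or later (where if_all() is available), the same rows can be kept with filter(). The name complete_rows is only illustrative:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

# keep only rows where every column is non-missing
# (assumes dplyr >= 1.0.4 for if_all())
complete_rows <- df %>%
  filter(if_all(everything(), ~ !is.na(.x)))

complete_rows
```

Unlike na.omit(), filter() renumbers the rows of the result from 1.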
Example 2: Remove Any Row with NA’s in Specific Columns
The following code shows how to remove any row with NA values in a specific column:
#remove any row with NA in 'points' column
df %>%
  filter(!is.na(points))

  team points assists
1    A      4       1
2    B      7       5
3    B      5      NA
4    C      9       2
5    C      9       2
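The same pattern extends to more than one column. The sketch below assumes you want to drop rows with NA in either 'points' or 'assists'; conditions separated by commas in filter() are combined with AND. The name no_na_stats is only illustrative:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

# keep rows where both 'points' and 'assists' are non-missing
no_na_stats <- df %>%
  filter(!is.na(points), !is.na(assists))

no_na_stats
```

For this particular data frame the result matches Example 1, because the 'team' column has no missing values.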
Example 3: Remove Duplicate Rows
The following code shows how to remove duplicate rows:
#remove duplicate rows
df %>%
distinct()
team points assists
1 A 4 1
2 A NA 3
3 B 7 5
4 B 5 NA
5 C 9 2
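By default distinct() compares entire rows. If duplicates should instead be judged on selected columns only, one option is to name those columns and set .keep_all = TRUE so the remaining columns are retained. The sketch below assumes you want one row per team; the name one_per_team is only illustrative:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

# keep the first row found for each value of 'team',
# retaining all other columns
one_per_team <- df %>%
  distinct(team, .keep_all = TRUE)

one_per_team
```

Here the first row for each team is the one that survives, so the order of the data frame determines which values are kept.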
Example 4: Remove Rows by Index Position
The following code shows how to remove rows based on index position:
#remove rows 1, 2, and 4
df %>%
  filter(!row_number() %in% c(1, 2, 4))

  team points assists
1    B      7       5
2    C      9       2
3    C      9       2
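As an alternative sketch, dplyr's slice() accepts negative positions, which drops the rows at those positions. The name remaining is only illustrative:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

# drop rows 1, 2, and 4 by position
remaining <- df %>%
  slice(-c(1, 2, 4))

remaining
```

This should return the same three rows as the filter() version above.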
Example 5: Remove Rows Based on Condition
The following code shows how to remove rows based on specific conditions. Note that filter() keeps the rows that meet the condition and removes all others:

#only keep rows where team is equal to 'A' or points is greater than 8
df %>%
  filter(team=='A' | points > 8)

  team points assists
1    A      4       1
2    A     NA       3
3    C      9       2
4    C      9       2
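Because filter() keeps the rows for which the condition is TRUE, dropping the rows that match a condition means negating it. The sketch below assumes you want to remove every row where team is 'A' or points is greater than 8; the name not_matching is only illustrative:

```r
library(dplyr)

# same example data frame as above
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

# drop rows where team is 'A' or points is greater than 8
not_matching <- df %>%
  filter(!(team == 'A' | points > 8))

not_matching
```

Only the two team 'B' rows remain. Keep in mind that filter() also drops rows where the condition evaluates to NA, so with missing values a condition and its negation may not split the data cleanly in two.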
Additional Resources
The following tutorials explain how to perform other common functions in dplyr:
How to Select Columns by Index Using dplyr
How to Rank Variables by Group Using dplyr
How to Replace NA with Zero in dplyr