You can use the following methods to count duplicates in a data frame in R:
Method 1: Count Duplicate Values in One Column
sum(duplicated(df$my_column))
Method 2: Count Duplicate Rows
nrow(df[duplicated(df), ])
Method 3: Count Duplicates for Each Unique Row
library(dplyr)
df %>% group_by_all() %>% count()
The examples below show how to use each method in practice with this data frame in R:
#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'),
                 points=c(5, 5, 8, 10, 5, 7, 10, 10))
#view data frame
df
  team position points
1    A        G      5
2    A        G      5
3    A        G      8
4    A        F     10
5    B        G      5
6    B        G      7
7    B        F     10
8    B        F     10
Example 1: Count Duplicate Values in One Column
The following code shows how to count the number of duplicate values in the points column:
#count number of duplicate values in points column
sum(duplicated(df$points))
[1] 4
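We can see that there are 4 duplicate values in the points column. Note that duplicated() only flags a value when it repeats one that appeared earlier, so the first occurrence of each value is not counted: the values 5 and 10 each appear three times, which gives 2 + 2 = 4 duplicates.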
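To see where the count of 4 comes from, it can help to look at the logical vector that duplicated() returns before it is summed. The sketch below uses the same points values as the example data frame; the variable names (dup_flags, n_repeats, value_counts, n_repeated_values) are illustrative and not part of the original tutorial.

```r
# Same points values as the example data frame
points <- c(5, 5, 8, 10, 5, 7, 10, 10)

# duplicated() returns TRUE for each value that already appeared earlier
dup_flags <- duplicated(points)
dup_flags
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Summing a logical vector counts its TRUE entries
n_repeats <- sum(dup_flags)

# A different question: how many distinct values occur more than once?
value_counts <- table(points)
n_repeated_values <- sum(value_counts > 1)
```

Here n_repeats is 4 (the repeat occurrences), while n_repeated_values is 2 (the distinct values 5 and 10). Which number you want depends on whether you are counting extra copies or repeated values.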
Example 2: Count Duplicate Rows
The following code shows how to count the number of duplicate rows in the data frame:
#count number of duplicate rows
nrow(df[duplicated(df), ])
[1] 2
We can see that there are 2 duplicate rows in the data frame.
We can use the following syntax to view these 2 duplicate rows:
#display duplicated rows
df[duplicated(df), ]
  team position points
2    A        G      5
8    B        F     10
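The output above lists only the later copies (rows 2 and 8), not the rows they duplicate. If you also want the first occurrences, one possible approach is to combine duplicated() with its fromLast argument, as sketched below. The name all_copies is illustrative only.

```r
# Recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'),
                 points=c(5, 5, 8, 10, 5, 7, 10, 10))

# sum(duplicated(df)) gives the same count as nrow(df[duplicated(df), ])
n_dup_rows <- sum(duplicated(df))

# Flag rows that repeat an earlier row OR are repeated by a later row
is_copy <- duplicated(df) | duplicated(df, fromLast = TRUE)

# Keep every row that belongs to a group of identical rows
all_copies <- df[is_copy, ]
all_copies
```

With this data, all_copies contains rows 1, 2, 7 and 8, i.e. both members of each duplicated pair.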
Example 3: Count Duplicates for Each Unique Row
The following code shows how to count how many times each unique row occurs in the data frame:
library(dplyr)
#count number of occurrences of each unique row in data frame
df %>% group_by_all() %>% count()
# A tibble: 6 x 4
# Groups: team, position, points [6]
  team  position points     n
1 A     F            10     1
2 A     G             5     2
3 A     G             8     1
4 B     F            10     2
5 B     G             5     1
6 B     G             7     1
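The n column displays the number of times each unique row occurs in the data frame. A value of 1 means the row has no duplicates, while a value of 2 means the row appears twice (one original plus one duplicate).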
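If you only care about the rows that actually repeat, the counts can be filtered further. The following is a minimal sketch, assuming dplyr is installed; it passes the column names to count() directly rather than using group_by_all(), and the names row_counts, repeated_rows and extra_copies are illustrative only.

```r
library(dplyr)

# Recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'),
                 points=c(5, 5, 8, 10, 5, 7, 10, 10))

# Count occurrences of each unique combination of the three columns
row_counts <- df %>% count(team, position, points)

# Keep only rows that occur more than once and
# record how many extra copies each one has
repeated_rows <- row_counts %>%
  filter(n > 1) %>%
  mutate(extra_copies = n - 1)

repeated_rows
```

The extra_copies column sums to the same total as sum(duplicated(df)) from Example 2, since both count every occurrence of a row after its first.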
Additional Resources
The following tutorials explain how to perform other common tasks in R:
How to Find Duplicate Elements Using dplyr
How to Remove Duplicate Rows in R
How to Remove Duplicate Rows in R so None are Left