You can use the n_distinct() function from the dplyr package to count the number of distinct values in an R data frame in any of the following ways:
Method 1: Count Distinct Values in One Column
n_distinct(df$column_name)
Method 2: Count Distinct Values in All Columns
sapply(df, function(x) n_distinct(x))
Method 3: Count Distinct Values by Group
df %>% group_by(grouping_column) %>% summarize(count_distinct = n_distinct(values_column))
The following examples show how to use each of these methods in practice with the following data frame:
library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

#view data frame
df

  team points assists
1    A      6       3
2    A      6       6
3    A      8       4
4    A     10       2
5    B      9       4
6    B      9       5
7    B     12       5
8    B     12       9
Method 1: Count Distinct Values in One Column
The following code shows how to use n_distinct() to count the number of distinct values in the ‘team’ column:
#count distinct values in 'team' column
n_distinct(df$team)
[1] 2
There are 2 distinct values in the ‘team’ column.
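One detail worth knowing with this method is how missing values are treated. The sketch below uses a hypothetical vector x (not part of the data frame above) to illustrate that n_distinct() counts NA as its own distinct value by default, and that its na.rm argument excludes it. The base R line is included only for comparison.

```r
library(dplyr)

# Hypothetical vector with a missing value (for illustration only)
x <- c('A', 'A', 'B', NA)

# By default, NA is counted as a distinct value
n_distinct(x)                 # 3

# na.rm = TRUE excludes NA from the count
n_distinct(x, na.rm = TRUE)   # 2

# Base R equivalent of the default behavior
length(unique(x))             # 3
```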
Method 2: Count Distinct Values in All Columns
The following code shows how to use the sapply() and n_distinct() functions to count the number of distinct values in each column of the data frame:
#count distinct values in every column
sapply(df, function(x) n_distinct(x))
   team  points assists 
      2       5       6 
From the output we can see:
- There are 2 distinct values in the ‘team’ column.
- There are 5 distinct values in the ‘points’ column.
- There are 6 distinct values in the ‘assists’ column.
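As an alternative sketch, assuming dplyr 1.0.0 or later (where across() is available), the same per-column counts can be produced inside a dplyr pipeline instead of with sapply(). The name distinct_counts is chosen here only for illustration.

```r
library(dplyr)

# Same data frame as in the examples above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

# Apply n_distinct() to every column; the result is a one-row data frame
distinct_counts <- df %>%
  summarize(across(everything(), n_distinct))

distinct_counts
```

This version returns a data frame rather than a named vector, which can be convenient when the counts feed into further dplyr steps.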
Method 3: Count Distinct Values by Group
The following code shows how to use the n_distinct() function to count the number of distinct values by group:
#count distinct 'points' values by 'team'
df %>%
  group_by(team) %>%
  summarize(distinct_points = n_distinct(points))
# A tibble: 2 x 2
  team  distinct_points
1 A                   3
2 B                   2
From the output we can see:
- There are 3 distinct points values for team A.
- There are 2 distinct points values for team B.
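The grouped approach can also cover several columns at once. The following is a minimal sketch, assuming dplyr 1.0.0 or later for across(); the name by_team is used only for illustration.

```r
library(dplyr)

# Same data frame as in the examples above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(6, 6, 8, 10, 9, 9, 12, 12),
                 assists=c(3, 6, 4, 2, 4, 5, 5, 9))

# Count distinct 'points' and 'assists' values within each team
by_team <- df %>%
  group_by(team) %>%
  summarize(across(c(points, assists), n_distinct))

by_team
```

For this data, team A has 3 distinct points values and 4 distinct assists values (3, 6, 4, 2), while team B has 2 distinct points values and 3 distinct assists values (4, 5, 9).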
Additional Resources
The following tutorials explain how to perform other common operations using dplyr:
How to Recode Values Using dplyr
How to Replace NA with Zero in dplyr
How to Rank Variables by Group Using dplyr
How to Select the First Row by Group Using dplyr