The setdiff() function in R returns the elements of one set that do not occur in a second set. This function uses the following syntax:
setdiff(x, y)
where:
- x, y: Vectors or data frames containing a sequence of items
This tutorial provides several examples of how to use this function in practice.
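Before the examples, two properties of base R's setdiff() are worth seeing in a minimal sketch: the argument order matters, and duplicated values are dropped from the result. The vectors x and y below are hypothetical values chosen only for illustration.

```r
# Hypothetical vectors, chosen only for illustration
x <- c(1, 1, 2, 3)
y <- c(3, 4)

# Elements of x that are not in y; the repeated 1 appears only once
x_not_y <- setdiff(x, y)
print(x_not_y)   # [1] 1 2

# Swapping the arguments answers a different question
y_not_x <- setdiff(y, x)
print(y_not_x)   # [1] 4
```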
Example 1: Setdiff with Numeric Vectors
The following code shows how to use setdiff() to identify all of the values in vector a that do not occur in vector b:
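#define vectors (illustrative values, assumed here to match the output below)
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

#find all values in a that do not occur in b
setdiff(a, b)

[1] 9 10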
There are two values that occur in vector a that do not occur in vector b: 9 and 10.
If we reverse the order of the vectors in the setdiff() function, we can instead identify all of the values in vector b that do not occur in vector a:
#find all values in b that do not occur in a
setdiff(b, a)

[1] 2 6
There are two values that occur in vector b that do not occur in vector a: 2 and 6.
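Since each call only reports one direction, the two results can be combined to get every value that appears in exactly one of the vectors (the symmetric difference). The following is a minimal sketch using base R's union(); the vector values are assumed for illustration and the name sym_diff is hypothetical.

```r
# Assumed example vectors (illustrative values only)
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

# Values only in a, followed by values only in b
sym_diff <- union(setdiff(a, b), setdiff(b, a))
print(sym_diff)   # [1] 9 10 2 6
```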
Example 2: Setdiff with Character Vectors
The following code shows how to use setdiff() to identify all of the values in vector char1 that do not occur in vector char2:
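#define character vectors (illustrative values, assumed here to match the output below)
char1 <- c('A', 'B', 'C', 'D')
char2 <- c('A', 'B', 'E', 'F')

#find all values in char1 that do not occur in char2
setdiff(char1, char2)

[1] "C" "D"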
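Character comparisons in setdiff() are exact, so values that differ only in case count as different. If that is not what you want, one option is to normalize the case first. The sketch below uses hypothetical vectors raw and ref, and toupper() is just one possible normalization.

```r
# Hypothetical vectors with inconsistent capitalization
raw <- c('a', 'B', 'c')
ref <- c('A', 'B')

# Exact matching: 'a' is not the same string as 'A'
exact_diff <- setdiff(raw, ref)
print(exact_diff)    # [1] "a" "c"

# Convert both vectors to upper case before comparing
folded_diff <- setdiff(toupper(raw), toupper(ref))
print(folded_diff)   # [1] "C"
```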
Example 3: Setdiff with Data Frames
The following code shows how to use setdiff() to identify all of the values in one data frame column that do not appear in the same column of a second data frame:
#define data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 94, 104))

df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 98, 99))

#find differences between the points columns in the two data frames
setdiff(df1$points, df2$points)

[1] 94 104
We can see that the values 94 and 104 occur in the points column of the first data frame, but not in the points column of the second data frame.
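setdiff() returns only the differing values, not the rows they came from. If the full rows are needed, one possible approach is to filter the first data frame with base R's %in% operator. This is a sketch rather than the only way to do it, and the name only_in_df1 is hypothetical.

```r
# Same data frames as in the example above
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 94, 104))

df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 98, 99))

# Keep the rows of df1 whose points value does not occur in df2$points
only_in_df1 <- df1[!(df1$points %in% df2$points), ]
print(only_in_df1)
```

This keeps the rows for teams C and D, whose points values (94 and 104) match the output of setdiff(df1$points, df2$points).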