Often you may be interested in finding all of the unique values across multiple columns in a pandas DataFrame. Fortunately this is easy to do using the pandas unique() function combined with the ravel() function:
- unique(): Returns unique values in order of appearance.
- ravel():Â Returns a flattened data series.
For example, suppose we have the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd', 'e'], 'col2': ['a', 'c', 'e', 'f', 'g'], 'col3': [11, 8, 10, 6, 6]}) #view DataFrame print(df) col1 col2 col3 0 a a 11 1 b c 8 2 c e 10 3 d f 6 4 e g 6
Return Array of Unique Values
The following code shows how to find the unique values in col1 and col2:
pd.unique(df[['col1', 'col2']].values.ravel()) array(['a', 'b', 'c', 'e', 'd', 'f', 'g'], dtype=object)
From the output we can see that there are 7 unique values across these two columns: a, b, c, d, e, f, g.
Return DataFrame of Unique Values
If you’d like to return these values as a DataFrame instead of an array, you can use the following code:
uniques = pd.unique(df[['col1', 'col2']].values.ravel()) pd.DataFrame(uniques) 0 0 a 1 b 2 c 3 e 4 d 5 f 6 g
Return Number of Unique Values
If you simply want to know the number of unique values across multiple columns, you can use the following code:
uniques = pd.unique(df[['col1', 'col2']].values.ravel()) len(uniques) 7
This tell us that there are 7 unique values across these two columns.
Additional Resources
How to Merge Pandas DataFrames on Multiple Columns
How to Filter a Pandas DataFrame on Multiple Conditions