You can use the following methods to sort the rows of a pandas DataFrame based on the values in a particular string column:
Method 1: Sort by String Column (when column only contains characters)
df = df.sort_values('my_string_column')
Method 2: Sort by String Column (when column contains characters and digits)
#create 'sort' column that contains digits from 'my_string_column' df['sort'] = df['my_string_column'].str.extract('(d+)', expand=False).astype(int) #sort rows based on digits in 'sort' column df = df.sort_values('sort')
The following examples show how to use each method in practice.
Example 1: Sort by String Column (when column only contains characters)
Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:
import pandas as pd #create DataFrame df = pd.DataFrame({'product': ['Apples', 'Oranges', 'Bananas', 'Lettuce', 'Beans'], 'sales': [18, 22, 19, 14, 29]}) #view DataFrame print(df) product sales 0 Apples 18 1 Oranges 22 2 Bananas 19 3 Lettuce 14 4 Beans 29
We can use the following syntax to sort the rows of the DataFrame based on the strings in the product column:
#sort rows from A to Z based on string in 'product' column
df = df.sort_values('product')
#view updated DataFrame
print(df)
product sales
0 Apples 18
2 Bananas 19
4 Beans 29
3 Lettuce 14
1 Oranges 22
Notice that the rows are now sorted from A to Z based on the strings in the product column.
If you’d like to instead sort from Z to A, simply add the argument ascending=False:
#sort rows from Z to A based on string in 'product' column
df = df.sort_values('product', ascending=False)
#view updated DataFrame
print(df)
product sales
1 Oranges 22
3 Lettuce 14
4 Beans 29
2 Bananas 19
0 Apples 18
Notice that the rows are now sorted from Z to A based on the strings in the product column.
Example 2: Sort by String Column (when column contains characters and digits)
Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:
import pandas as pd #create DataFrame df = pd.DataFrame({'product': ['A3', 'A5', 'A22', 'A50', 'A2', 'A7', 'A9', 'A13'], 'sales': [18, 22, 19, 14, 14, 11, 20, 28]}) #view DataFrame print(df) product sales 0 A3 18 1 A5 22 2 A22 19 3 A50 14 4 A2 14 5 A7 11 6 A9 20 7 A13 28
Notice that the strings in the product column contain both characters and digits.
If we attempt to sort the rows of the DataFrame using the values in the product column, the strings will not be sorted in the correct order based on the digits:
import pandas as pd #sort rows based on strings in 'product' column df = df.sort_values('product') #view updated DataFrame print(df) product sales 7 A13 28 4 A2 14 2 A22 19 0 A3 18 1 A5 22 3 A50 14 5 A7 11 6 A9 20
Instead, we must create a new temporary column called sort that contains only the digits from the product column, then sort by the values in the sort column, then drop the column entirely:
import pandas as pd #create new 'sort' column that contains digits from 'product' column df['sort'] = df['product'].str.extract('(d+)', expand=False).astype(int) #sort rows based on digits in 'sort' column df = df.sort_values('sort') #drop 'sort' column df = df.drop('sort', axis=1) #view updated DataFrame print(df) product sales 4 A2 14 0 A3 18 1 A5 22 5 A7 11 6 A9 20 7 A13 28 2 A22 19 3 A50 14
Notice that the rows are now sorted by the strings in the product column and the digits are sorted in the correct order.
Note: You can find the complete documentation for the pandas sort_values() function here.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: How to Sort by Date
Pandas: How to Sort Columns by Name
Pandas: How to Sort by Both Index and Column