You can use the following basic syntax to extract numbers from a string in pandas:
df['my_column'].str.extract('(d+)')
This particular syntax will extract the numbers from each string in a column called my_column in a pandas DataFrame.
Note: When using a regular expression, d represents “any digit” and + stands for “one or more.”
The following example shows how to use this function in practice.
Example: Extract Number from String in Pandas
Suppose we have the following pandas DataFrame that contains information about the sales of various products:
import pandas as pd #create DataFrame df = pd.DataFrame({'product': ['A33', 'B34', 'A22', 'A50', 'C200', 'D7', 'A9', 'A13'], 'sales': [18, 22, 19, 14, 14, 11, 20, 28]}) #view DataFrame print(df) product sales 0 A33 18 1 B34 22 2 A22 19 3 A50 14 4 C200 14 5 D7 11 6 A9 20 7 A13 28
Suppose we would like to extract the number from each string in the product column.
We can use the following syntax to do so:
#extract numbers from strings in 'product' column
df['product'].str.extract('(d+)')
0
0 33
1 34
2 22
3 50
4 200
5 7
6 9
7 13
The result is a DataFrame that contains only the numbers from each row in the product column.
For example:
- The formula extracts 33 from the string A33 in the first row.
- The formula extracts 34 from the string B34 in the first row.
- The formula extracts 22 from the string A22 in the first row.
And so on.
If you’d like, you can also store these numerical values in a new column in the DataFrame:
#extract numbers from strings in 'product' column and store them in new column
df['product_numbers'] = df['product'].str.extract('(d+)')
#view updated DataFrame
print(df)
product sales product_numbers
0 A33 18 33
1 B34 22 34
2 A22 19 22
3 A50 14 50
4 C200 14 200
5 D7 11 7
6 A9 20 9
7 A13 28 13
The new column called product_numbers contains only the numbers from each string in the product column.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas: How to Sort DataFrame Based on String Column
Pandas: How to Remove Specific Characters from Strings
Pandas: Search for String in All Columns of DataFrame