One error you may encounter when using pandas is:
ValueError: cannot convert float NaN to integer
This error occurs when you attempt to convert a column in a pandas DataFrame from float to integer while the column still contains NaN values.
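The wording of the message comes from Python itself. NaN ("not a number") is a special floating-point value with no integer equivalent, so even the built-in int() rejects a single NaN in the same way. A minimal sketch using only the standard library:

```python
import math

# NaN ("not a number") is a special floating-point value.
value = float("nan")
print(math.isnan(value))  # True

# Converting NaN with the built-in int() fails with the same message.
try:
    int(value)
except ValueError as err:
    message = str(err)
    print(message)  # cannot convert float NaN to integer
```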
The following example shows how to fix this error in practice.
How to Reproduce the Error
Suppose we create the following pandas DataFrame:
```python
import pandas as pd
import numpy as np

# create DataFrame
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# view DataFrame
print(df)
#    points  assists  rebounds
# 0      25        5      11.0
# 1      12        7       NaN
# 2      15        7      10.0
# 3      14        9       6.0
# 4      19       12       5.0
# 5      23        9       NaN
# 6      25        9       9.0
# 7      29        4      12.0
```
Currently, the ‘rebounds’ column has the data type ‘float64’:
```python
# print data type of 'rebounds' column
print(df['rebounds'].dtype)
# float64
```
Suppose we attempt to convert the ‘rebounds’ column from a float to an integer:
```python
# attempt to convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

# ValueError: cannot convert float NaN to integer
```
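We receive a ValueError because the NaN values in the ‘rebounds’ column cannot be represented as integers. In recent pandas versions, the same failure may be reported as IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer (IntCastingNaNError is a subclass of ValueError).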
How to Fix the Error
The way to fix this error is to deal with the NaN values before attempting to convert the column from a float to an integer.
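One way to deal with the NaN values up front is to check for them before casting and stop with a message that points at the problem rows. The helper below, to_int_or_report, is hypothetical (it is not part of pandas) and is only a sketch; it relies on the standard Series.isna(), Series.any(), and Series.astype() methods.

```python
import numpy as np
import pandas as pd


def to_int_or_report(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast a float Series to int, or explain why not."""
    missing = series.isna()
    if missing.any():
        # Collect the index labels that still hold NaN so they can be handled first.
        bad_rows = series.index[missing.to_numpy()].tolist()
        raise ValueError(f"{series.name!r} has NaN at rows {bad_rows}; drop or fill them first")
    return series.astype(int)


df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

try:
    to_int_or_report(df['rebounds'])
except ValueError as err:
    error_message = str(err)
    print(error_message)  # 'rebounds' has NaN at rows [1, 5]; drop or fill them first
```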
We can use the following code to first identify the rows that contain NaN values:
```python
# print rows in DataFrame that contain NaN in 'rebounds' column
print(df[df['rebounds'].isnull()])
#    points  assists  rebounds
# 1      12        7       NaN
# 5      23        9       NaN
```
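Before deciding how to handle the NaN values, it can help to see how many each column contains. A short sketch using DataFrame.isna() followed by sum(), on the same data as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# isna() marks each missing cell as True; sum() then counts the True values per column.
nan_counts = df.isna().sum()
print(nan_counts)

# Keep only the columns that actually contain NaN values.
print(nan_counts[nan_counts > 0])
print(int(nan_counts['rebounds']))  # 2
```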
We can then either drop the rows with NaN values or replace the NaN values with some other value before converting the column from a float to an integer:
Method 1: Drop Rows with NaN Values
```python
# drop all rows with NaN values
df = df.dropna()

# convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

# view updated DataFrame
print(df)
#    points  assists  rebounds
# 0      25        5        11
# 2      15        7        10
# 3      14        9         6
# 4      19       12         5
# 6      25        9         9
# 7      29        4        12

# view data type of 'rebounds' column
print(df['rebounds'].dtype)
# int64
```
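Calling dropna() without arguments removes every row that has a NaN in any column, not just in ‘rebounds’. A sketch of a narrower variant uses the subset parameter so that only rows with a missing ‘rebounds’ value are dropped; the reset_index(drop=True) call is optional and simply renumbers the remaining rows from 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# Drop only the rows whose 'rebounds' value is NaN, then renumber the index.
df = df.dropna(subset=['rebounds']).reset_index(drop=True)

# The column no longer holds NaN, so the cast succeeds.
df['rebounds'] = df['rebounds'].astype(int)

print(df['rebounds'].tolist())  # [11, 10, 6, 5, 9, 12]
```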
Method 2: Replace NaN Values
Starting again from the original DataFrame, which still contains the NaN values:

```python
# replace all NaN values with zeros
df['rebounds'] = df['rebounds'].fillna(0)

# convert 'rebounds' column from float to integer
df['rebounds'] = df['rebounds'].astype(int)

# view updated DataFrame
print(df)
#    points  assists  rebounds
# 0      25        5        11
# 1      12        7         0
# 2      15        7        10
# 3      14        9         6
# 4      19       12         5
# 5      23        9         0
# 6      25        9         9
# 7      29        4        12

# view data type of 'rebounds' column
print(df['rebounds'].dtype)
# int64
```
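Filling with 0 treats a missing value as zero rebounds, which may not suit every dataset. As a variation on Method 2 (not part of the original approach), the sketch below fills the NaN values with the mean of the observed values instead. Because astype(int) truncates toward zero rather than rounding, the column is rounded first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# mean() skips NaN by default, so this averages the six observed values (about 8.83).
fill_value = df['rebounds'].mean()

# Fill, round to the nearest whole number, then cast to integer.
df['rebounds'] = df['rebounds'].fillna(fill_value).round().astype(int)

print(df['rebounds'].tolist())  # [11, 9, 10, 6, 5, 9, 9, 12]
```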
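Both methods avoid the ValueError and successfully convert the float column to an integer column. Method 1 discards the affected rows entirely, while Method 2 keeps every row but treats each missing value as 0, so choose the one that matches what the missing values mean in your data.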
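This goes beyond the two methods above: if you need an integer column but want to keep the missing values instead of dropping or replacing them, pandas also provides a nullable integer dtype, ‘Int64’ (note the capital I), which stores missing entries as pd.NA. A minimal sketch, assuming pandas 1.0 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, np.nan, 10, 6, 5, np.nan, 9, 12]})

# 'Int64' is pandas' nullable integer dtype; the NaN values become pd.NA.
df['rebounds'] = df['rebounds'].astype('Int64')

print(df['rebounds'].dtype)         # Int64
print(df['rebounds'].isna().sum())  # 2
print(df)
```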