Drop Columns in pandas
When working with data in pandas, we may need to remove one or more columns or rows from a DataFrame. Columns and rows are usually deleted when they are no longer needed for further analysis. There are a few ways to do this, but the most common one in pandas is the .drop() method. A DataFrame often contains columns that are irrelevant to the analysis; removing them lets us concentrate on the remaining columns.
Columns may be removed by specifying label names together with the corresponding axis, or by specifying the index or column names directly. With a multi-index, labels on different levels may be removed by specifying the level. In this article, we discuss how to drop columns in pandas, with some examples.
drop() function
The drop() function removes a set of labels from the rows or columns of a DataFrame. We may remove rows or columns by specifying label names with the matching axis, or by specifying index or column names directly. With a multi-index, labels on different levels may be removed by specifying the level. One or more columns of a pandas DataFrame can be removed with the .drop() method.
Syntax:
The syntax of the drop() function is:
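As a hedged reference, the comment in the sketch below gives the signature as documented for recent pandas releases (an assumption about the installed version: in pandas 1.x the keyword-only marker `*` is absent). The snippet also prints the parameter names reported by whatever version is installed.

```python
import inspect

import pandas as pd

# Documented signature in recent pandas releases (assumed version: 2.x):
# DataFrame.drop(labels=None, *, axis=0, index=None, columns=None,
#                level=None, inplace=False, errors='raise')

# List the parameter names of the installed version for comparison
params = list(inspect.signature(pd.DataFrame.drop).parameters)
print(params)
```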
Parameters:
labels: A single label or a list of labels (column names or row index values) to drop.
index: An alternative to specifying axis=0. It accepts a single row label or a list of row labels.
level: For a MultiIndex DataFrame, the level from which the labels should be removed. It accepts either a level position or a level name.
axis: Whether to drop labels from the rows (0 or 'index') or from the columns (1 or 'columns'). The default is 0, so rows are dropped unless we set axis=1 or axis='columns'.
columns: An alternative to specifying axis='columns'. It accepts a single column label or a list of column labels.
inplace: Whether to modify the existing DataFrame instead of returning a new one. It is a Boolean flag with a default value of False.
errors: Either 'raise' (the default) or 'ignore'. If set to 'ignore', the error is suppressed and only the labels that exist are dropped.
Returns
- It returns a new DataFrame with the specified labels dropped, or None if inplace=True.
- If any of the labels is not found, it raises a KeyError (unless errors='ignore').
Drop Single Column
We may need to delete a single column from a DataFrame.
Example: The example below uses df.drop(columns='col name') to remove the 'age' column from the DataFrame.
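A minimal sketch of this example, assuming the data shown in the output below (the variable names are illustrative):

```python
import pandas as pd

# Sample data reconstructed from the output shown in this section
df = pd.DataFrame({"name": ["Joe", "Nat"], "age": [20, 21], "marks": [85.1, 77.8]})
print(df)

# drop() returns a new DataFrame; df itself is left unchanged
result = df.drop(columns="age")
print(result)
```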
Output: After executing this code, we will get the output as shown below:
  name  age  marks
0  Joe   20   85.1
1  Nat   21   77.8

  name  marks
0  Joe   85.1
1  Nat   77.8
Using drop function with axis = 'columns' or axis = 1
To delete columns by label, pass the labels together with the axis parameter of the DataFrame.drop() method. The axis can refer to rows or columns; the column axis is denoted by 1 or 'columns'. Set axis=1 or axis='columns' and pass a list of the column names to be removed.
Example: Let’s reuse the example above to see how to use the drop function with axis = 'columns' or axis = 1.
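A minimal sketch, assuming the same sample data as in the output below; both spellings of the axis are shown to illustrate that they are equivalent:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Joe", "Nat"], "age": [20, 21], "marks": [85.1, 77.8]})
print(df)

# axis='columns' and axis=1 both refer to the column axis
by_name = df.drop(["age", "marks"], axis="columns")
by_number = df.drop(["age", "marks"], axis=1)
print(by_name)
```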
Output: After executing this code, we will get the output as shown below:
  name  age  marks
0  Joe   20   85.1
1  Nat   21   77.8

  name
0  Joe
1  Nat
Drop multiple columns
DataFrame.drop() offers two ways to delete multiple columns of a DataFrame at once.
- Use the columns parameter to specify a list of column names to remove.
- Set axis to 1 and pass the list of column names.
Example: Let’s take an example to understand how we may drop the multiple columns in the DataFrame.
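A minimal sketch covering both options, assuming the data shown in the output below:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [77.29, 69.15]})
print(df)

# Option 1: list the column names in the columns parameter
via_columns = df.drop(columns=["age", "marks"])

# Option 2: pass the same list as labels and set axis=1
via_axis = df.drop(["age", "marks"], axis=1)
print(via_columns)
```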
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  77.29
1  Alex   18  69.15

   name
0  John
1  Alex
Drop the column in place
In the previous examples, each drop operation produced a new DataFrame because the modification was not made in place. The inplace parameter specifies whether to drop a column from the existing DataFrame or to return a modified copy.
- If inplace=True, it updates the current DataFrame without returning anything.
- If the inplace parameter is set to False, it generates a new DataFrame with the updated changes and returns it.
Example: Let’s see how we may use the drop function to drop columns in place.
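A minimal sketch, assuming the data shown in the output below. The return value is captured only to illustrate that an in-place drop returns None:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [79.18, 68.79]})
print(df)

# With inplace=True the existing DataFrame is modified and None is returned
returned = df.drop(columns=["age", "marks"], inplace=True)
print(df)
```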
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  79.18
1  Alex   18  68.79

   name
0  John
1  Alex
Drop the columns by suppressing errors
If the column we are attempting to delete does not exist in the DataFrame, the DataFrame.drop() method raises a KeyError. If we want to drop the column only if it exists, we can use the errors parameter to suppress the error.
- Set errors='ignore' to prevent the error from being raised.
- Set errors='raise' (the default) to raise a KeyError for unknown columns.
Example: Let’s take an example to understand how we may drop the columns by suppressing errors.
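A minimal sketch, assuming the data shown in the output below and a non-existent 'salary' column. The second call is wrapped in try/except here so the sketch runs to completion; without that wrapper the KeyError propagates as a traceback.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [79.49, 82.54]})
print(df)

# 'salary' does not exist: with errors='ignore' nothing is dropped and no error is raised
ignored = df.drop(columns="salary", errors="ignore")

# With the default errors='raise', the same call raises a KeyError
message = None
try:
    df.drop(columns="salary")
except KeyError as exc:
    message = str(exc)
    print("KeyError:", message)
```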
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  79.49
1  Alex   18  82.54

raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['salary'] not found in axis"
Drop the column by index position
If we want to remove columns from a DataFrame but do not know their names, we can do so by deleting the column using its index position. Column indexing begins with 0 (zero) and continues until the last column, whose index value is len(df.columns)-1.
Drop First n columns
If we need to remove the first n columns of a DataFrame, we can select their positions with DataFrame.iloc or the built-in range() function and pass the corresponding column labels to the columns parameter of DataFrame.drop().
Example: Let’s take an example to understand how we may drop the first n columns in the DataFrame.
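A minimal sketch with n = 2, assuming the data shown in the output below. It looks up the labels through DataFrame.columns; this is one possible way to translate positions into labels, not necessarily the listing the article originally used.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Alex"],
    "age": [24, 18],
    "marks": [84.45, 76.11],
    "class": ["A", "B"],
    "city": ["US", "UK"],
})
print(df)

n = 2
# Labels of the first n columns, selected by position (same as df.columns[:n])
first_n = df.columns[list(range(n))]
result = df.drop(columns=first_n)
print(result)
```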
Output: After executing this code, we will get the output as shown below:
   name  age  marks class city
0  John   24  84.45     A   US
1  Alex   18  76.11     B   UK

   marks class city
0  84.45     A   US
1  76.11     B   UK
Drop the last column
Assume that we want to remove the DataFrame’s first or last column without using the column name. In such situations, use the DataFrame.columns attribute to look up the column label by its index position, and pass df.columns[index] to the columns parameter of DataFrame.drop(). Index 0 refers to the first column and index -1 to the last.
Example: Let’s take an example to understand how we may drop the last column from the DataFrame.
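A minimal sketch, assuming the data shown in the output below:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [68.44, 85.67]})
print(df)

# df.columns[-1] is the label of the last column ('marks' here)
result = df.drop(columns=df.columns[-1])
print(result)
```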
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  68.44
1  Alex   18  85.67

   name  age
0  John   24
1  Alex   18
Drop range of columns using iloc
We may need to remove, say, the fourth column of the dataset, or a contiguous group of columns. DataFrame.iloc selects one or several columns of a DataFrame by position. To drop columns by position, we can select them with DataFrame.iloc and pass the column labels of that selection to the columns parameter of DataFrame.drop().
Example: Let’s take an example to understand how we may drop a range of columns by using iloc.
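A minimal sketch, assuming the data shown in the output below. The labels are taken explicitly from the iloc selection with .columns, which is a choice made for clarity in this sketch.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [79.64, 86.84]})
print(df)

# iloc[:, 1:3] selects the columns at positions 1 and 2; .columns gives their labels
to_drop = df.iloc[:, 1:3].columns
result = df.drop(columns=to_drop)
print(result)
```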
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  79.64
1  Alex   18  86.84

   name
0  John
1  Alex
Drop the Columns from multi-index DataFrames
A DataFrame whose columns have several header rows is referred to as a multi-index DataFrame. The headers are divided into levels, with level 0 being the first, level 1 being the second, and so on. A column may be dropped by a label at any level of a multi-index DataFrame. By default, labels are matched starting from the outermost level (level 0); the level parameter lets us drop columns by their label at one specific level instead. The level is passed either by position, such as level=1, or by name.
Example: Let’s take an example to understand how we may drop the columns from multi-index DataFrames.
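A minimal sketch, assuming the two-level column header that the output below suggests (level 0 holds the class labels, level 1 holds 'Name' and 'Marks'):

```python
import pandas as pd

# Two-level column header reconstructed from the output shown in this section
cols = pd.MultiIndex.from_arrays([
    ["Class X", "Class Y", "Class Z", "Class Y"],
    ["Name", "Marks", "Name", "Marks"],
])
df = pd.DataFrame(
    [["John", 87.22, "Nat", 68.79], ["Peter", 73.45, "Alex", 82.76]],
    columns=cols,
)
print(df)

# Drop every column whose level-1 label is 'Marks'
result = df.drop(columns=["Marks"], level=1)
print(result)
```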
Output: After executing this code, we will get the output as shown below:
  Class X Class Y Class Z Class Y
     Name   Marks    Name   Marks
0    John   87.22     Nat   68.79
1   Peter   73.45    Alex   82.76

  Class X Class Z
     Name    Name
0    John     Nat
1   Peter    Alex
Drop column using a function
We can also delete columns with other functions and statements, or based on some logic or condition. Both built-in and user-defined functions can be used for this.
Drop the column using the pandas DataFrame.pop() function
If we just want to delete one column, we can use the DataFrame.pop(col_label) function. We pass the label of the column to be deleted. It removes the column in place, updating the existing DataFrame, and returns the removed column as a Series. If the column is not found, it raises a KeyError.
Example: Let’s take an example to understand how we may drop the column using the pandas DataFrame.pop() function.
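A minimal sketch, assuming the data shown in the output below. The popped column is kept in a variable only to show what pop() returns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [62.46, 54.21]})
print(df)

# pop() removes the column from df in place and returns it as a Series
removed = df.pop("age")
print(df)
```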
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  62.46
1  Alex   18  54.21

   name  marks
0  John  62.46
1  Alex  54.21
Drop the columns using the loc function
We can also select the columns to drop by label with DataFrame.loc and pass the selection to the columns parameter of DataFrame.drop(). This makes it quick to drop a range of columns, or all of them: if the selection is not restricted to particular column labels, as in df.loc[:], all columns are dropped from the DataFrame.
Example: Let’s take an example to understand how we may drop the columns using the loc function.
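A minimal sketch, assuming the data shown in the output below. The label range 'age':'marks' is an illustrative choice, and the labels are taken explicitly from each loc selection with .columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [24, 18], "marks": [79.68, 84.45]})
print(df)

# Drop a label range: loc slices by label and includes both endpoints
partial = df.drop(columns=df.loc[:, "age":"marks"].columns)

# Drop every column: df.loc[:] selects the whole DataFrame
empty = df.drop(columns=df.loc[:].columns)
print(empty)
```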
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   24  79.68
1  Alex   18  84.45
Drop the column using the del statement
To drop a single column from a DataFrame, we can use Python’s built-in del statement. It is a very simple way of dropping a column. We select the column to be removed and delete it with del df[col_label]. The DataFrame is modified in place.
Example: Let’s take an example to understand how we may drop a column using the del statement.
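A minimal sketch, assuming the data shown in the output below:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex"], "age": [23, 22], "marks": [57.88, 78.84]})
print(df)

# del removes the column from df in place; nothing is returned
del df["age"]
print(df)
```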
Output: After executing this code, we will get the output as shown below:
   name  age  marks
0  John   23  57.88
1  Alex   22  78.84

   name  marks
0  John  57.88
1  Alex  78.84