How to Rename Columns in Pandas: Practice with DataFrames

In this tutorial you will learn how to rename column labels in Pandas, a very common task when you work with DataFrames.

How can you rename columns in a Pandas DataFrame?

The Pandas DataFrame rename function lets you rename the column labels of a DataFrame using a dictionary that maps the current labels to the new ones. Besides dictionaries, the rename function also accepts regular functions and lambdas.

We will go through a few examples that show how to rename the columns of a Pandas DataFrame. By the end of this tutorial this will be very clear to you.

Let’s get started!

Rename a Column in a Pandas DataFrame

We will start by creating an example dataframe that contains countries and their capitals. To do that we can pass a Python dictionary to the DataFrame constructor after importing the pandas module:

import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "United Kingdom", "Germany", "Greece"], "Capitals": ["Rome", "London", "Berlin", "Athens"]})
print(df)

Here is the dataframe we have created:

        Countries Capitals
0           Italy     Rome
1  United Kingdom   London
2         Germany   Berlin
3          Greece   Athens

A dataframe can also be created from a CSV file using the read_csv function.
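As a minimal sketch of the read_csv approach, the snippet below reads the same data from CSV text held in memory. The CSV content is made up for illustration; in practice you would usually pass a file path instead of a StringIO object.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk.
csv_data = io.StringIO(
    "Countries,Capitals\n"
    "Italy,Rome\n"
    "United Kingdom,London\n"
    "Germany,Berlin\n"
    "Greece,Athens\n"
)

# By default read_csv uses the first line of the CSV as the column labels.
csv_df = pd.read_csv(csv_data)
print(csv_df.columns.tolist())
```

The column labels come from the header line of the CSV, so they can be renamed in exactly the same way as the labels of a dataframe built from a dictionary.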

To rename the columns of a Pandas dataframe we can use the rename function and pass a dictionary to it. The dictionary contains the current column names as keys and the new column names as values.

df.rename(columns={"Countries":"Country", "Capitals":"Capital"})

After running this command we get the following:

          Country Capital
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens 

But then, if we print the value of the variable df we see the original columns…

Why?

By default the rename function returns a new dataframe and leaves the original one untouched. To persist our change we have to assign the result of the rename function to a new dataframe:

new_df = df.rename(columns={"Countries":"Country", "Capitals":"Capital"})
print(new_df)

[output]
          Country Capital
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens 
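Here is a small sketch that checks this behaviour by comparing the labels of the original dataframe with the labels of the renamed one. It uses a shortened two-row version of our dataframe to keep the example compact.

```python
import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})

# rename returns a new DataFrame; df itself keeps its original labels.
new_df = df.rename(columns={"Countries": "Country", "Capitals": "Capital"})

print(list(df.columns))      # labels of the original dataframe
print(list(new_df.columns))  # labels of the renamed dataframe
```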

We have seen how to update columns by name. Now let’s see how to take a quick look at the column labels without printing the full dataframe. We can use the head function that returns the first n rows of the dataframe:

print(new_df.head(1))

[output]
  Country Capital
0   Italy    Rome

As you can see the head function prints the column header (that contains the column labels) and the first row of the dataframe.

Rename a DataFrame Column in Place

In the previous section we have seen how to rename all the columns in a dataframe by assigning the output of the rename function to a new dataframe.

With Pandas we also have the option to update dataframe columns in place, in other words we can update the original dataframe instead of creating a new one.

To update DataFrame columns in place using the Pandas rename function we have to set the inplace argument to True.

df.rename(columns={"Countries":"Country", "Capitals":"Capital"}, inplace=True)
print(df)

[output]
          Country Capital
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens

The inplace parameter is a boolean whose default value is False.

Also, if inplace is True the rename function returns None:

>>> print(df.rename(columns={"Countries":"Country", "Capitals":"Capital"}, inplace=True))
None

So, now you know two ways to update the labels of dataframe columns.
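To compare the two approaches side by side, here is a short sketch that applies both to copies of the same dataframe. The variable names (mapping, reassigned, in_place) are chosen for this example only.

```python
import pandas as pd

original = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})
mapping = {"Countries": "Country", "Capitals": "Capital"}

# Approach 1: keep the original and store the renamed dataframe separately.
reassigned = original.rename(columns=mapping)

# Approach 2: modify a dataframe in place (done on a copy here so that
# the original stays available for comparison).
in_place = original.copy()
result = in_place.rename(columns=mapping, inplace=True)

print(result is None)               # rename returns None when inplace=True
print(reassigned.equals(in_place))  # both approaches give the same dataframe
```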

Rename One Column in a Pandas DataFrame

Pandas also allows us to rename just one column in a dataframe.

Let’s see how…

df.rename(columns={"Country":"COUNTRY"}, inplace=True)
print(df)

[output]
          COUNTRY Capital
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens

We have updated the name of the first column simply by including only the name of the first column in the dictionary passed to the rename function.

In a similar way we can update just the second column of our dataframe.

And now…

…let’s see what happens if we pass to the rename function a dictionary that contains a column name that doesn’t exist.

df.rename(columns={"Population":"POPULATION"}, inplace=True)
print(df)

The dataframe is unchanged. The rename function only renames the columns listed in the dictionary that actually exist in the dataframe; keys that don’t match any column are silently ignored (unless the errors parameter is set to “raise”).

In this scenario, let’s see what happens if we pass an additional parameter called errors and we set its value to “raise”:

df.rename(columns={"Population":"POPULATION"}, inplace=True, errors="raise")

Pandas raises the following KeyError exception to tell us that there is no column called “Population”:

KeyError: "['Population'] not found in axis"

The default value for the errors parameter is “ignore”.

That is why we didn’t see any errors when the errors parameter was missing from our expression.
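If you want a typo in a column name to be reported without stopping your program, one option is to catch the exception. The following is a minimal sketch of that idea, using a shortened version of our dataframe.

```python
import pandas as pd

df = pd.DataFrame({"COUNTRY": ["Italy", "Greece"], "Capital": ["Rome", "Athens"]})

try:
    # "Population" is not a column of df, so errors="raise" triggers a KeyError.
    df.rename(columns={"Population": "POPULATION"}, inplace=True, errors="raise")
except KeyError as error:
    print(f"Rename failed: {error}")

# The failed rename leaves the labels as they were.
print(list(df.columns))
```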

Rename a Column in Pandas By Position

Is it possible to rename a column in a dataframe based on its position?

Yes, here’s how…

Firstly we introduce the columns attribute that returns the column names of a DataFrame.

print(df.columns)

[output]
Index(['COUNTRY', 'Capital'], dtype='object')

The Index object returned by the columns attribute can be indexed like a list, and we can use it to rename a specific column.

For example, to rename the last column we can use:

df.rename(columns={df.columns[-1]: "CAPITAL"}, inplace=True)
print(df)

[output]
          COUNTRY CAPITAL
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens

Remember that you can access the last element of a list using the index -1.
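If you rename columns by position often, you could wrap the logic in a small helper. The function below, rename_by_position, is a hypothetical helper written for this tutorial and is not part of Pandas. Unlike the dictionary approach, which matches columns by label, it touches only the column at the given position, even if another column happens to have the same name.

```python
import pandas as pd


def rename_by_position(frame, position, new_name):
    """Return a copy of frame with the column at the given position renamed.

    Hypothetical helper for illustration only; it is not part of Pandas.
    """
    labels = list(frame.columns)
    labels[position] = new_name  # negative positions work as with any list
    renamed = frame.copy()
    renamed.columns = labels
    return renamed


df = pd.DataFrame({"COUNTRY": ["Italy", "Greece"], "Capital": ["Rome", "Athens"]})
renamed_df = rename_by_position(df, -1, "CAPITAL")
print(list(renamed_df.columns))
```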

Rename DataFrame Columns with a List

Similarly, it’s also possible to assign a list of new column names to the .columns attribute of the DataFrame:

df.columns = ['CoUnTrIeS','CaPiTaLs']
print(df)

[output]
        CoUnTrIeS CaPiTaLs
0           Italy     Rome
1  United Kingdom   London
2         Germany   Berlin
3          Greece   Athens

Keep in mind that the column names are replaced in the order of the elements in the list, and that the list must contain one name for every column in the dataframe.
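One thing to watch out for with this approach is the length of the list. The sketch below shows what happens when the list has fewer names than the dataframe has columns: Pandas raises a ValueError and the existing labels stay as they are.

```python
import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})

try:
    # The dataframe has two columns but the list has only one name.
    df.columns = ["Country"]
except ValueError as error:
    print(f"Assignment failed: {error}")

print(list(df.columns))
```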

Generally I prefer to always use the same way of renaming columns for consistency. My preferred way is passing a dictionary to the rename function.

Rename a Column in Pandas Using a Function

A common scenario is wanting to rename columns in a DataFrame to lowercase or uppercase.

To do that we can pass one of Python’s built-in string methods to the dataframe rename function.

df.rename(columns=str.lower, inplace=True)
print(df)

[output]
        countries capitals
0           Italy     Rome
1  United Kingdom   London
2         Germany   Berlin
3          Greece   Athens

For example, here we have used the string lower method to transform column labels into lowercase strings.

What other string methods could you use?
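As a possible answer, here is a short sketch that uses two other built-in string methods, str.upper and str.title, in the same way. Any function that takes a label and returns a new label can be used like this.

```python
import pandas as pd

df = pd.DataFrame({"countries": ["Italy", "Greece"], "capitals": ["Rome", "Athens"]})

# Each string method is called once per column label.
upper_df = df.rename(columns=str.upper)
title_df = df.rename(columns=str.title)

print(list(upper_df.columns))
print(list(title_df.columns))
```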

How to Apply a Lambda to the DataFrame Rename Function

In the previous section we have seen how to apply a function to the columns of a dataframe.

Considering that lambdas are functions (to be precise, anonymous functions) we can also use them to change the column names.

Here’s how…

df.rename(columns=lambda x: x[:2], inplace=True)
print(df)

[output]
               co      ca
0           Italy    Rome
1  United Kingdom  London
2         Germany  Berlin
3          Greece  Athens

As you can see we are using the following lambda function

lambda x: x[:2]

…to set the value of the column names to their first two characters.
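A lambda is handy when the transformation needs more than a single string method. As an illustration, the sketch below cleans up labels by trimming whitespace, converting to lowercase and replacing spaces with underscores. The column names used here are made up for the example.

```python
import pandas as pd

# Hypothetical labels containing spaces and mixed case.
df = pd.DataFrame({"Country Name": ["Italy", "Greece"], " Capital City": ["Rome", "Athens"]})

# Chain several string methods inside a single lambda.
clean_df = df.rename(columns=lambda label: label.strip().lower().replace(" ", "_"))

print(list(clean_df.columns))
```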

Renaming Index For a Pandas DataFrame

We have used the rename function to rename columns in a DataFrame. The same can be done for the index.

For instance, let’s start from the following dataframe:

        Countries Capitals
0           Italy     Rome
1  United Kingdom   London
2         Germany   Berlin
3          Greece   Athens

I want to replace 0,1,2,3 with Nation 0, Nation 1, etc…

With the following call to the rename function I can rename the index:

df.rename(index={0:"Nation 0", 1: "Nation 1", 2: "Nation 2", 3: "Nation 3"}, inplace=True)
print(df)

[output]
               Countries Capitals
Nation 0           Italy     Rome
Nation 1  United Kingdom   London
Nation 2         Germany   Berlin
Nation 3          Greece   Athens

To update the index of a DataFrame pass a dictionary to the index parameter of the rename function. The keys of the dictionary represent the current index and the values of the dictionary the new index.

I could also use a lambda to avoid passing that long dictionary:

df.rename(index=lambda x: "Nation " + str(x), inplace=True)
print(df)

Can you see how we reduce duplication using a lambda?

Before continuing, try the expression above and confirm that the result is correct.
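One way to run this check is sketched below: it builds the index with the dictionary and with the lambda, then compares the two results. The variable names by_dict and by_lambda are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Countries": ["Italy", "United Kingdom", "Germany", "Greece"],
    "Capitals": ["Rome", "London", "Berlin", "Athens"],
})

by_dict = df.rename(index={0: "Nation 0", 1: "Nation 1", 2: "Nation 2", 3: "Nation 3"})
by_lambda = df.rename(index=lambda x: "Nation " + str(x))

print(list(by_lambda.index))
print(by_dict.equals(by_lambda))
```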

Axis Used When Renaming Columns or Index

The rename function can also be called using a different convention.

This convention uses the axis parameter to tell if the rename function targets index or columns. Here are the possible values for axis:

  • The index is targeted by passing either ‘index’ or 0 as the value of axis (this is the default value).
  • The columns are targeted by passing either ‘columns’ or 1 as the value of axis.

Below you can see the generic syntax:

DataFrame.rename(mapper, axis={'index', 'columns'})

The mapper can be either a dictionary or a function that transforms the values of a specific axis.

For instance, let’s see how we would rewrite calls to the rename function used before in this tutorial…

  1. Rename Columns

All expressions update the columns in the same way:

df.rename(columns={"Countries":"Country", "Capitals":"Capital"}, inplace=True)

df.rename({"Countries":"Country", "Capitals":"Capital"}, axis='columns', inplace=True)

df.rename({"Countries":"Country", "Capitals":"Capital"}, axis=1, inplace=True)

  2. Rename Index

All expressions update the index in the same way:

df.rename(index=lambda x: "Nation " + str(x), inplace=True)

df.rename(lambda x: "Nation " + str(x), axis='index', inplace=True)

df.rename(lambda x: "Nation " + str(x), axis=0, inplace=True)

Makes sense?

Verify that the output of the two conventions is the same.
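A possible way to do this verification for the columns is sketched below. It avoids inplace=True so that each call starts from the same original dataframe and the results can be compared directly.

```python
import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})
mapping = {"Countries": "Country", "Capitals": "Capital"}

# The same rename expressed with the three equivalent conventions.
with_keyword = df.rename(columns=mapping)
with_axis_name = df.rename(mapping, axis="columns")
with_axis_number = df.rename(mapping, axis=1)

print(with_keyword.equals(with_axis_name))
print(with_keyword.equals(with_axis_number))
```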

Change Columns and Index At The Same Time

So far we have seen how to rename either columns or index, but we can also rename both with a single expression.

Here is an example that updates both columns and index:

df.rename(columns={"Countries":"Country", "Capitals":"Capital"}, index=lambda x: "Nation " + str(x), inplace=True)

You can see that we have passed both parameters columns and index to the rename function.
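The following sketch runs the same kind of call on a shortened dataframe and prints the resulting labels. It returns a new dataframe instead of using inplace=True, purely to keep the original available for comparison.

```python
import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})

# Columns are renamed with a dictionary, the index with a lambda.
renamed = df.rename(
    columns={"Countries": "Country", "Capitals": "Capital"},
    index=lambda x: "Nation " + str(x),
)

print(list(renamed.columns))
print(list(renamed.index))
```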

Renaming Columns with add_prefix And add_suffix

Pandas provides two other functions to rename columns in a DataFrame:

  • add_prefix: adds a prefix to all column names.
  • add_suffix: adds a suffix to all column names.

They both return a dataframe with the updated columns.

Let’s see how they work in practice…

We will start from the following dataframe:

        Countries Capitals
0           Italy     Rome
1  United Kingdom   London
2         Germany   Berlin
3          Greece   Athens

Apply add_prefix to the dataframe to add ‘col_’ before each column label:

print(df.add_prefix('col_'))

[output]
    col_Countries col_Capitals
0           Italy         Rome
1  United Kingdom       London
2         Germany       Berlin
3          Greece       Athens

And in a similar way for add_suffix:

print(df.add_suffix('_col'))

[output]
    Countries_col Capitals_col
0           Italy         Rome
1  United Kingdom       London
2         Germany       Berlin
3          Greece       Athens
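Since both functions return a new dataframe, the original one keeps its labels unless you assign the result back. The sketch below illustrates this and also shows that add_prefix gives the same labels as the rename function with a lambda.

```python
import pandas as pd

df = pd.DataFrame({"Countries": ["Italy", "Greece"], "Capitals": ["Rome", "Athens"]})

prefixed = df.add_prefix("col_")
via_rename = df.rename(columns=lambda label: "col_" + label)

print(list(df.columns))        # the original labels are unchanged
print(list(prefixed.columns))  # the new dataframe has the prefixed labels
print(list(prefixed.columns) == list(via_rename.columns))
```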

Conclusion

Well done, you have completed this tutorial!

You now know how to rename columns in a DataFrame using Pandas. There are multiple ways of doing it, so you can pick the one you prefer.

And you know how to rename the index of a DataFrame too.

We have also seen how to combine the DataFrame rename function with other Python functions including lambdas.

I have put together the source code for this tutorial so you can download it and test it on your machine.

Once again, well done!
