In this Python tutorial, you will see how to write a CSV file using the Pandas library.
The Pandas library provides the to_csv() function, a method of the DataFrame object, to write a DataFrame to a CSV file. By default, the index of the DataFrame is added to the CSV file and the field separator is the comma. You can change the index, the separator, and many other CSV properties by passing additional arguments to the to_csv() function.
Let’s go through a few code examples to make sure you know all the available options to generate a CSV file from a Pandas DataFrame.
Are you ready?
How to Create a CSV File in Python Using Pandas
The simplest and most common scenario you are likely to experience is that you have a Pandas DataFrame and you want to convert it into a CSV file.
Let’s generate the Pandas DataFrame we will use for all our examples in this tutorial:
import pandas as pd
data = {
'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
'Points': [345, 23, 333, 210],
'Level': [4, 1, 3, 2]
}
df = pd.DataFrame(data)
print(df)
After importing the Pandas module we define a dictionary that contains points and level information for players in a game, and we create a DataFrame from it.
Here is what the DataFrame looks like.
Player Points Level
0 Tom 345 4
1 Jack 23 1
2 Kate 333 3
3 Carol 210 2
As you can see, this DataFrame has 4 rows and 3 columns.
The Pandas library provides the function to_csv() to write a DataFrame object to a CSV file with comma-separated values.
Let’s test this function with our sample DataFrame:
df.to_csv('players.csv')
This code generates a file called players.csv with the following content:
,Player,Points,Level
0,Tom,345,4
1,Jack,23,1
2,Kate,333,3
3,Carol,210,2
Here are a few things we can see in the CSV file:
- It includes the index of the DataFrame.
- It contains every row and column of the DataFrame, plus a header row with the column names.
- It uses the comma as a separator for the fields in each row (including the header).
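If you want to check these points yourself, here is a minimal sketch that writes the same DataFrame and reads the raw lines back. Writing into a temporary directory is an assumption made for this example only, so that no file is left behind; the variable names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

# Write the CSV into a temporary directory (an assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'players.csv'
    df.to_csv(csv_path)
    lines = csv_path.read_text().splitlines()

header, rows = lines[0], lines[1:]

print(header)     # the header starts with an empty field for the index
print(len(rows))  # one line for each row of the DataFrame
print(rows[0])    # the first data row starts with the index value
```

The header line starts with a comma because the index has no name, so its label is written as an empty field.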
In the next sections, we will analyze the optional arguments you can pass to the function to_csv() to customize the CSV file we create from the DataFrame.
Also, if you want to learn how to perform the opposite operation, read this article that shows how to import a CSV file into a Pandas DataFrame.
How to Create a CSV File Without Index
As we have seen in the previous section, Pandas DataFrames include an index by default. When creating a CSV file from a DataFrame it’s likely that you won’t need the index.
What can you do to remove the index?
To create a CSV file without an index using Pandas you have to pass the “index” argument to the to_csv() function and set it to the boolean value False (the default value is True).
df.to_csv('players.csv', index=False)
Let’s test it with the previous DataFrame and confirm that the index has been dropped from the CSV file.
Player,Points,Level
Tom,345,4
Jack,23,1
Kate,333,3
Carol,210,2
Nice!
The index is not in the CSV file anymore.
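One practical reason to drop the index shows up when the CSV data is read back. The sketch below is only an illustration: it keeps the CSV data in memory with io.StringIO (an assumption made to avoid creating files) and compares the columns you get back with and without the index.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

# to_csv() returns a string when no file name is passed.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# The unnamed index comes back as an extra column.
print(list(with_index.columns))
# Without the index only the original columns are present.
print(list(without_index.columns))
```

When the index is written, read_csv() brings it back as an extra column called "Unnamed: 0", which you would then have to drop or turn into the index again.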
Using a Specific Separator When Writing CSV Files with Pandas
The Pandas to_csv() function can write a CSV file that uses a separator other than the default comma. To do so, pass the “sep” argument to the to_csv() function and set it to the single character you want to use as a separator.
For example, to use the tab as a separator you can use the following syntax:
df.to_csv('players.csv', sep='\t')
Let’s create a CSV file using this separator and see what it looks like.
Player Points Level
0 Tom 345 4
1 Jack 23 1
2 Kate 333 3
3 Carol 210 2
Many systems use the comma as the default separator for CSV files so before changing the separator in the CSV you generate, make sure the system that will read it is aware of this.
If the system reading your CSV file assumes the separator is the comma, it won’t be able to split each row into fields and it will fail to import the data correctly.
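To see what this kind of mismatch looks like, here is a small sketch that uses Pandas itself as the "system" reading the data. This is an assumption made for illustration; another tool may fail in a different way. The data is kept in memory and the index is left out to keep the result easy to read.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

# Generate tab-separated data in memory.
tsv_data = df.to_csv(sep='\t', index=False)

# Reader that assumes the default comma separator.
wrong = pd.read_csv(io.StringIO(tsv_data))
# Reader that has been told about the tab separator.
right = pd.read_csv(io.StringIO(tsv_data), sep='\t')

print(wrong.shape)  # every line ends up in a single column
print(right.shape)  # the three original columns are recovered
```

The first read does not raise an error. It silently puts each line into one column, which is why it is worth agreeing on the separator beforehand.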
How to Write a CSV File Without Header in Python
To create a CSV file from a Pandas DataFrame without including the header you can pass the argument “header” to the to_csv() function and set it to False.
This is something you might want to do or not depending on the requirements of the CSV file you want to generate.
It’s good practice to include the header because it explains the meaning of each field in the CSV file.
df.to_csv('players.csv', header=False)
Here is the CSV file we have generated:
0,Tom,345,4
1,Jack,23,1
2,Kate,333,3
3,Carol,210,2
You can see that the header is missing in the CSV file.
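Whoever reads a CSV file without a header has to know that the first line is data. The following sketch shows this with Pandas as the reader; the in-memory buffer and the names passed to read_csv() are assumptions chosen for this example.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

csv_data = df.to_csv(header=False, index=False)

# Default read: the first data row is mistaken for the header.
misread = pd.read_csv(io.StringIO(csv_data))

# Tell the reader there is no header and provide the column names.
restored = pd.read_csv(
    io.StringIO(csv_data),
    header=None,
    names=['Player', 'Points', 'Level']
)

print(list(misread.columns))  # values of the first row used as names
print(len(misread))           # one row is lost
print(len(restored))          # all rows are kept
```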
How to Add a Subset of Columns of a DataFrame to a CSV File
Sometimes you might have columns in your DataFrame that you don’t want to include in your final CSV file.
Using the Pandas to_csv() function you can write a subset of columns from the DataFrame to the CSV file.
To write a subset of columns from a Pandas DataFrame to a CSV file, you have to first create a list that contains the names of the columns to include. Then set the value of the “columns” argument passed to the function to_csv() to the list you have created.
The default value of “columns” is None and it exports all the columns to the CSV file.
Let’s test and include only the columns Player and Points in the CSV file.
columns_to_include = ['Player', 'Points']
df.to_csv('players.csv', columns=columns_to_include)
Here is the CSV file generated:
,Player,Points
0,Tom,345
1,Jack,23
2,Kate,333
3,Carol,210
You can also set the list of fields directly in the main statement without having to define a separate list first.
df.to_csv('players.csv', columns=['Player', 'Points'])
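The “columns” list can also be used to change the order of the columns. The sketch below assumes a recent version of Pandas, where the columns are written in the order in which they appear in the list. The index is left out and the data is returned as a string to keep the output short.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

# Points is listed before Player, the opposite of the DataFrame order.
csv_data = df.to_csv(columns=['Points', 'Player'], index=False)
lines = csv_data.splitlines()

for line in lines:
    print(line)
```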
Makes sense?
How to Set Custom Column Names When Converting the DataFrame into CSV
Are the names of the columns in the source DataFrame different from the column names you want in the CSV file?
In that case, you can pass the “header” argument to the to_csv function and assign the list of column names to this argument.
The list assigned to “header” is a list of strings with one name for each column written to the CSV file.
Let’s generate CSV data with column names that differ from the ones in the original DataFrame. This time, instead of generating a CSV file, we don’t pass a file name: in that case the to_csv function returns the CSV data as a string that we can simply print.
csv_data = df.to_csv(header=['players', 'points', 'levels'])
print(csv_data)
The output is:
,players,points,levels
0,Tom,345,4
1,Jack,23,1
2,Kate,333,3
3,Carol,210,2
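The “header” argument can be combined with “columns” and “index”. Here is a short sketch; the new names "name" and "score" are made up for this example. It also shows what happens, in the Pandas versions I am aware of, when the number of names does not match the number of columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, 333, 210],
    'Level': [4, 1, 3, 2]
})

# Export two columns and rename them in the CSV data only.
csv_data = df.to_csv(
    columns=['Player', 'Points'],
    header=['name', 'score'],
    index=False
)
lines = csv_data.splitlines()
print(lines[0])
print(lines[1])

# Two names for three columns: to_csv() refuses to write the data.
try:
    df.to_csv(header=['players', 'points'])
    mismatch_error = None
except ValueError as error:
    mismatch_error = error

print(mismatch_error)
```

The column names of the DataFrame itself are not modified: only the header line written to the CSV data changes.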
Replacing Missing Data in a DataFrame with a Default Value in the CSV Data
Occasionally you might find yourself working with DataFrames that have missing values.
What happens when you convert those into CSV format?
Let’s find out!
Update our initial code to generate a DataFrame with a missing value (NA) in the Points column:
import pandas as pd
data = {
'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
'Points': [345, 23, pd.NA, 210],
'Level': [4, 1, 3, 2]
}
df = pd.DataFrame(data)
print(df)
Here is the DataFrame we generated:
Player Points Level
0 Tom 345 4
1 Jack 23 1
2 Kate <NA> 3
3 Carol 210 2
Now, let’s convert this DataFrame into CSV.
csv_data = df.to_csv()
print(csv_data)
Here is the CSV data:
,Player,Points,Level
0,Tom,345,4
1,Jack,23,1
2,Kate,,3
3,Carol,210,2
You can see that the missing value is written as an empty field in the CSV data.
If you are not happy with this, you can tell the to_csv() function to use a specific value to replace missing values in the DataFrame. You can do this by passing the “na_rep” argument to the function.
csv_data = df.to_csv(na_rep='NA')
print(csv_data)
From the output, you can see that we have replaced missing data with NA.
,Player,Points,Level
0,Tom,345,4
1,Jack,23,1
2,Kate,NA,3
3,Carol,210,2
This value can be anything you decide and it’s often agreed with the developers who write the system that reads the CSV file you generate.
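The reader needs to know about the value you choose. As an illustration, the sketch below uses the placeholder "missing" (an arbitrary value picked for this example) and reads the data back with Pandas, first without and then with the “na_values” argument of read_csv().

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'],
    'Points': [345, 23, pd.NA, 210],
    'Level': [4, 1, 3, 2]
})

csv_data = df.to_csv(index=False, na_rep='missing')

# The reader does not know the placeholder: Points is read as text.
plain = pd.read_csv(io.StringIO(csv_data))

# The reader is told that "missing" stands for a missing value.
parsed = pd.read_csv(io.StringIO(csv_data), na_values=['missing'])

print(plain['Points'].tolist())
print(parsed['Points'].isna().tolist())
```

Without “na_values” the placeholder turns the whole Points column into strings. With it, the column is numeric again and the third value is recognized as missing.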
How to Reduce the Size of the CSV File Created with Pandas
If you have a large dataset in your DataFrame, you can reduce the size of the CSV file you generate from the DataFrame. You can do it by passing the “compression” argument to the function to_csv().
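Here is the statement you could use. As you can see we are using gzip compression, and the file name ends with .gz to make it clear that the file is compressed and not plain text.
df.to_csv('players.csv.gz', compression='gzip')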
If you enable compression, the write time and read time will usually increase because the file has to be compressed when it is written and decompressed when it is read.
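To get a feeling for the size difference, here is a rough sketch that writes the same data with and without gzip compression and compares the file sizes. The DataFrame is the sample data repeated many times, which is an assumption made for this example: data this repetitive compresses very well, so real datasets will likely show a smaller gain. The files are written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Sample data repeated 5000 times (illustrative only).
big_df = pd.DataFrame({
    'Player': ['Tom', 'Jack', 'Kate', 'Carol'] * 5000,
    'Points': [345, 23, 333, 210] * 5000,
    'Level': [4, 1, 3, 2] * 5000
})

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'players.csv')
    gzip_path = os.path.join(tmp_dir, 'players.csv.gz')

    big_df.to_csv(plain_path, index=False)
    big_df.to_csv(gzip_path, index=False, compression='gzip')

    plain_size = os.path.getsize(plain_path)
    gzip_size = os.path.getsize(gzip_path)

    # read_csv() infers gzip compression from the .gz extension.
    restored = pd.read_csv(gzip_path)

print(plain_size, gzip_size)
print(restored.shape)
```

On a DataFrame as small as the four-row example used in this tutorial, the compressed file may not be smaller at all, because the gzip format adds its own overhead.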
Conclusion
We went through lots of different options when it comes to writing a CSV file using Pandas.
You should now have an understanding of how to do that in your Python programs.
Related article: here is an article you can read if you want to learn how to create a Pandas DataFrame from a CSV file (the opposite compared to what we have done in this tutorial).
Claudio Sabato is an IT expert with over 15 years of professional experience in Python programming, Linux Systems Administration, Bash programming, and IT Systems Design. He is a professional certified by the Linux Professional Institute.
With a Master’s degree in Computer Science, he has a strong foundation in Software Engineering and a passion for robotics with Raspberry Pi.