Do you want to find out how to list files in a directory using Python? In this article, you will see how to do it in 4 different ways so you can choose the one you prefer.
In all the examples we will list files in a directory with the following structure. We will call the directory test_dir:
├── data
│ └── tech.txt
└── report.txt
1 directory, 2 files
Let’s get started!
How to List Files in a Directory Using Python os.listdir()
The Python os module provides functions to interact with the operating system. You can use it to create, delete and fetch files and directories. The os module has a function called listdir() that lists the files and subdirectories in a directory.
Here is an example:
import os
directory = '/opt/codefathertech/test_dir/'
file_paths = os.listdir(directory)
print(file_paths)
After importing the os module, we set the path to our directory and pass it to the listdir() function, which returns the names of the entries in that directory.
Note that listdir() returns the names of the files and subdirectories in the directory we pass to it, in arbitrary order, but it does not list files in any subdirectories.
In fact, as you can see, the output below doesn’t include the tech.txt file inside the data directory:
['report.txt', 'data']
Note: if you are using Windows you can set the value of the directory variable based on the location of the test_dir directory on your computer.
Let’s add the following Python statement before the last print() function to show the type of the file_paths variable.
print(type(file_paths))
When you execute the program you will see the following in the output that shows that the file_paths variable is a Python list.
<class 'list'>
The list returned by os.listdir() contains only names, so without additional checks we cannot tell whether a given element is a file or a directory.
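One possible way to do those additional checks is sketched below, using os.path.isfile() and os.path.isdir(). The helper names split_entries() and make_test_dir() are made up for this illustration, and the temporary directory only stands in for test_dir so the sketch runs on any machine.

```python
import os
import tempfile


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = os.path.join(root, "test_dir")
    os.makedirs(os.path.join(test_dir, "data"))
    for relative in ("report.txt", os.path.join("data", "tech.txt")):
        with open(os.path.join(test_dir, relative), "w"):
            pass
    return test_dir


def split_entries(directory):
    """Return (files, subdirectories) found directly inside directory."""
    files, subdirs = [], []
    for name in os.listdir(directory):
        # listdir() returns bare names, so build the full path before checking.
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            files.append(name)
        elif os.path.isdir(full_path):
            subdirs.append(name)
    return sorted(files), sorted(subdirs)


with tempfile.TemporaryDirectory() as tmp:
    files, subdirs = split_entries(make_test_dir(tmp))

print(files)    # ['report.txt']
print(subdirs)  # ['data']
```

The names are joined with the directory before the check because os.path.isfile() would otherwise resolve them against the current working directory.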
How Does the Python os.walk() Function Work?
To see the list of files in all subdirectories, we can use a different approach based on the walk() function of the OS module. This function lists files and subdirectories recursively.
Before using os.walk() to get the list of files in our test directory, let’s open the Python shell to understand how os.walk() works.
>>> import os
>>> dir_content = os.walk('.')
>>> dir_content
<generator object walk at 0x7fd09008c430>
When we pass the current directory (identified by a dot) to os.walk, we get back a generator object.
Let’s find out more about the generator object by using the next() function.
>>> next(dir_content)
('.', ['data'], ['report.txt'])
Now we can see that os.walk() yields a Python tuple with three elements: the first element is the directory currently being visited, the second element is the list of subdirectories in that directory, and the third element is the list of files in that directory.
Let’s call the next() function again…
>>> next(dir_content)
('./data', [], ['tech.txt'])
The os.walk() function traverses the directory top-down, so when we call the next() function the second time, the generator moves on to the data subdirectory.
If you call next() again, you get a StopIteration exception: the data subdirectory doesn’t contain any subdirectories, so the generator has no more values to yield.
>>> next(dir_content)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
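In practice you rarely call next() by hand. As a rough sketch, a for loop consumes the generator and stops cleanly when it is exhausted, and next() accepts a default value that is returned instead of raising StopIteration. The make_test_dir() helper is hypothetical and only recreates the test_dir structure in a temporary location; paths are shown relative to test_dir.

```python
import os
import tempfile


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = os.path.join(root, "test_dir")
    os.makedirs(os.path.join(test_dir, "data"))
    for relative in ("report.txt", os.path.join("data", "tech.txt")):
        with open(os.path.join(test_dir, relative), "w"):
            pass
    return test_dir


with tempfile.TemporaryDirectory() as tmp:
    test_dir = make_test_dir(tmp)

    walker = os.walk(test_dir)
    steps = []
    # The for loop handles StopIteration internally and simply ends.
    for dir_path, dirs, files in walker:
        steps.append((os.path.relpath(dir_path, test_dir), dirs, files))

    # The generator is now exhausted; the default is returned instead of an exception.
    leftover = next(walker, None)

print(steps)     # [('.', ['data'], ['report.txt']), ('data', [], ['tech.txt'])]
print(leftover)  # None
```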
How to List Files in a Directory Recursively Using Python os.walk()
Now that we have an understanding of how os.walk() works, let’s write some code to get the list of all files in our test directory.
import os
directory = '/opt/codefathertech/test_dir/'
file_paths = []
for dir_path, dirs, files in os.walk(directory):
    file_paths.extend([os.path.join(dir_path, file) for file in files])
print(file_paths)
Output:
['/opt/codefathertech/test_dir/report.txt', '/opt/codefathertech/test_dir/data/tech.txt']
In the example above, we have used three new variables:
- dir_path – used to store the directory returned by the generator object (remember what we have seen in the previous section about os.walk()).
- dirs – used to store the subdirectories returned by the generator object.
- files – used to store the files returned by the generator object.
The function os.path.join() returns the full path of a given file by joining the path of a directory with the filename.
We are also using a list comprehension inside the for loop.
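If you prefer, the for loop and the inner list comprehension can be folded into a single comprehension with two for clauses. This is only an alternative sketch of the same os.walk() approach; make_test_dir() is a hypothetical helper that recreates test_dir in a temporary directory, and the paths are printed relative to it.

```python
import os
import tempfile


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = os.path.join(root, "test_dir")
    os.makedirs(os.path.join(test_dir, "data"))
    for relative in ("report.txt", os.path.join("data", "tech.txt")):
        with open(os.path.join(test_dir, relative), "w"):
            pass
    return test_dir


with tempfile.TemporaryDirectory() as tmp:
    test_dir = make_test_dir(tmp)

    # Outer clause: one tuple per visited directory. Inner clause: each file in it.
    file_paths = [
        os.path.join(dir_path, name)
        for dir_path, _dirs, files in os.walk(test_dir)
        for name in files
    ]
    relative_paths = sorted(os.path.relpath(path, test_dir) for path in file_paths)

print(relative_paths)
```

On Linux or macOS this prints ['data/tech.txt', 'report.txt']; on Windows the separator is a backslash.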
How to Use the Glob Python Module to List Files in a Directory
The glob module returns the paths of files and directories that match a specific pattern. Patterns can contain wildcards, which makes this module a convenient way to list files in a directory.
For example, if we want to list only text files then we use a wildcard pattern (*.txt).
Let’s see an example.
import glob
directory = '/opt/codefathertech/test_dir/*'
file_paths = glob.glob(directory)
print(file_paths)
After importing the glob module, we specify the path to the directory followed by the wildcard (*), which matches all the files and directories in it (except those whose name starts with a dot).
Then we pass this pattern to the glob.glob() function, which returns the following list:
['/opt/codefathertech/test_dir/report.txt', '/opt/codefathertech/test_dir/data']
If you only want to match .txt files you can update the line below:
directory = '/opt/codefathertech/test_dir/*.txt'
A benefit of using the Python glob module to list files in a directory is that the returned paths include the directory part of the pattern, so here we get the full path to each file without any extra work.
With os.listdir() we only get back names, and with os.walk() we had to build the full paths ourselves using os.path.join().
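Like os.listdir(), glob.glob() returns files and directories together. A minimal sketch of how you might keep only the files, or get back bare names instead of full paths, is shown here. It combines glob with os.path.isfile() and os.path.basename(); make_test_dir() is a hypothetical helper that recreates test_dir in a temporary directory.

```python
import glob
import os
import tempfile


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = os.path.join(root, "test_dir")
    os.makedirs(os.path.join(test_dir, "data"))
    for relative in ("report.txt", os.path.join("data", "tech.txt")):
        with open(os.path.join(test_dir, relative), "w"):
            pass
    return test_dir


with tempfile.TemporaryDirectory() as tmp:
    test_dir = make_test_dir(tmp)

    matches = glob.glob(os.path.join(test_dir, "*"))

    # Keep only regular files; the data directory is dropped.
    only_files = [path for path in matches if os.path.isfile(path)]

    # Strip the directory part to get names similar to what os.listdir() returns.
    all_names = sorted(os.path.basename(path) for path in matches)
    file_names = sorted(os.path.basename(path) for path in only_files)

print(all_names)   # ['data', 'report.txt']
print(file_names)  # ['report.txt']
```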
Using the Glob Python Module to Show Files in a Directory Recursively
In the output of the program we created at the end of the previous section, you cannot see the file tech.txt inside the data directory. To do that you have to list files recursively.
To list files in a directory recursively using the Python glob module you have to pass the recursive argument to the glob.glob() function and set it to True. The recursive argument is False by default. You also have to use a double asterisk in the pattern you are using.
Update the previous code…
We will make two changes:
- Replace the asterisk at the end of the directory variable with a double asterisk (**).
- Pass an additional argument to the glob function (recursive = True).
import glob
directory = '/opt/codefathertech/test_dir/**'
file_paths = glob.glob(directory, recursive=True)
print(file_paths)
And here is how the output changes:
['/opt/codefathertech/test_dir/', '/opt/codefathertech/test_dir/report.txt', '/opt/codefathertech/test_dir/data', '/opt/codefathertech/test_dir/data/tech.txt']
A lot better!
The double asterisk we are using in the pattern passed to the glob() function is only treated as a recursive wildcard when recursive is True. Otherwise it behaves like a single asterisk.
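The double asterisk can also be combined with a file pattern. The sketch below assumes you want only the .txt files at any depth, and it also shows what the same pattern matches when recursive is left at its default. make_test_dir() is a hypothetical helper that recreates test_dir in a temporary directory, and results are shown relative to it.

```python
import glob
import os
import tempfile


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = os.path.join(root, "test_dir")
    os.makedirs(os.path.join(test_dir, "data"))
    for relative in ("report.txt", os.path.join("data", "tech.txt")):
        with open(os.path.join(test_dir, relative), "w"):
            pass
    return test_dir


def relative_sorted(paths, base):
    """Return the paths relative to base, sorted for stable output."""
    return sorted(os.path.relpath(path, base) for path in paths)


with tempfile.TemporaryDirectory() as tmp:
    test_dir = make_test_dir(tmp)
    pattern = os.path.join(test_dir, "**", "*.txt")

    # With recursive=True, ** matches zero or more directory levels.
    recursive_matches = relative_sorted(glob.glob(pattern, recursive=True), test_dir)

    # Without it, ** acts like *, so exactly one directory level is matched.
    flat_matches = relative_sorted(glob.glob(pattern), test_dir)

print(recursive_matches)
print(flat_matches)
```

With recursive=True both report.txt and data/tech.txt are returned, and no directories appear because the pattern ends with *.txt. Without it, only data/tech.txt matches.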
Using the Python pathlib Module to List Files in a Directory
pathlib is another module in the Python standard library. It provides an object-oriented way to work with files and filesystem paths.
We will use the Path() class to define a path to the directory, then we will use iterdir() to iterate through the directory.
Then, we use the is_file() method to check if we are dealing with a file or not.
Let’s implement this…
import pathlib
directory = '/opt/codefathertech/test_dir/'
file_paths = []
for file in pathlib.Path(directory).iterdir():
    if file.is_file():
        file_paths.append(file)
print(file_paths)
And the output is:
[PosixPath('/opt/codefathertech/test_dir/report.txt')]
Note that the code above only lists files in the current directory. It doesn’t go through subdirectories.
To list files in the current directory and sub-directory, modify the code above to make it recursive.
Let’s see an example:
from pathlib import Path
directory = Path('/opt/codefathertech/test_dir/')
file_paths = []
for file in directory.rglob('*'):
    if file.is_file():
        file_paths.append(str(file))
print(file_paths)
Notice that in this code we have used the construct "from pathlib import Path". It imports the Path class directly, so we can write Path(...) instead of pathlib.Path(...).
This time in the output you also see the file tech.txt in the data subdirectory.
['/opt/codefathertech/test_dir/report.txt', '/opt/codefathertech/test_dir/data/tech.txt']
In Python, glob.glob() is a function that returns a list of file paths that match a given pattern. The glob() method of a Path object works in a similar way, and the rglob() method does the same but searches for matches recursively in all directories under the specified path.
That is why, with rglob(), we also get back the tech.txt file under the data directory.
Try to update the previous code by replacing rglob('*') with glob('*') and confirm that in the output you only see the file report.txt. In other words, the behavior is not recursive anymore.
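The comparison between glob() and rglob() can be sketched as follows. The make_test_dir() helper is hypothetical and recreates test_dir in a temporary directory; paths are converted with relative_to() and as_posix() only to keep the output short and identical across platforms.

```python
import tempfile
from pathlib import Path


def make_test_dir(root):
    """Create a throwaway copy of the article's test_dir under root (hypothetical helper)."""
    test_dir = Path(root) / "test_dir"
    (test_dir / "data").mkdir(parents=True)
    (test_dir / "report.txt").touch()
    (test_dir / "data" / "tech.txt").touch()
    return test_dir


def files_matching(paths, base):
    """Keep regular files only and return them relative to base, sorted."""
    return sorted(p.relative_to(base).as_posix() for p in paths if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    test_dir = make_test_dir(tmp)

    # glob('*') only looks at the entries directly inside test_dir.
    top_level = files_matching(test_dir.glob("*"), test_dir)

    # rglob('*') also descends into subdirectories.
    recursive = files_matching(test_dir.rglob("*"), test_dir)

    # A more specific pattern filters by extension at any depth.
    text_only = files_matching(test_dir.rglob("*.txt"), test_dir)

print(top_level)  # ['report.txt']
print(recursive)  # ['data/tech.txt', 'report.txt']
print(text_only)  # ['data/tech.txt', 'report.txt']
```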
Conclusion
In this article, you have learned how to list files in a directory using Python.
We have seen different ways to list files, with examples using the os, glob, and pathlib modules.
Claudio Sabato is an IT expert with over 15 years of professional experience in Python programming, Linux Systems Administration, Bash programming, and IT Systems Design. He is a professional certified by the Linux Professional Institute.
With a Master’s degree in Computer Science, he has a strong foundation in Software Engineering and a passion for robotics with Raspberry Pi.