File System Operations
This lesson will go over how to use Path objects to perform common operations on files and directories.
This part of the lesson is about the ways to get a list of files on your local computer.
Part 1: Listing files
Part 1.2: Directory contents
Path objects provide an iterdir() method for iterating through the contents of a directory. Here’s a simple example that prints the contents of the working directory.
from pathlib import Path

cwd = Path.cwd()
for f in cwd.iterdir():
    print(f)
Each of the elements yielded by iterdir() is a Path object, so we have access to all of its methods and properties. For example, the following code iterates through the contents of the working directory and uses path.name and path.is_dir() to print just the directory names.
from pathlib import Path

cwd = Path.cwd()

for f in cwd.iterdir():
    if f.is_dir():
        print(f.name)
Here’s another example that iterates through the contents of the working directory and prints any files with "recipe" in the name that are either text or markdown.
from pathlib import Path

cwd = Path.cwd()

for f in cwd.iterdir():
    if f.suffix in (".md", ".txt") and "recipe" in f.name:
        print(f.name)
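One detail worth knowing is that iterdir() yields entries in arbitrary order, so the output of the loops above can differ between runs or machines. The following is a minimal sketch of sorting the entries before filtering them. It builds a throwaway directory with the standard tempfile module, and the file and directory names in it are made up for illustration.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory so the example does not depend on your files.
# The file and directory names here are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    (root / "recipe_soup.md").touch()
    (root / "recipe_cake.txt").touch()
    (root / "todo.txt").touch()

    # iterdir() yields entries in arbitrary order, so sort them by name.
    entries = sorted(root.iterdir(), key=lambda p: p.name)

    # Names of directories only.
    dir_names = [p.name for p in entries if p.is_dir()]

    # Names of text or markdown files with "recipe" in the name.
    recipe_names = [
        p.name
        for p in entries
        if p.is_file() and p.suffix in (".md", ".txt") and "recipe" in p.name
    ]

print(dir_names)
print(recipe_names)
```

Sorting on p.name keeps the order predictable regardless of how the operating system happens to return the entries.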
Exercise 70 (Print the contents of the working directory.)
Iterate over the contents of the working directory and:
- skip over files or directories with names that begin with . or __
- print the name of any directories followed by a /
- print the name of everything else
Solution to Exercise 70 (Print the contents of the working directory.)
from pathlib import Path

cwd = Path.cwd()
for f in cwd.iterdir():
    if f.name.startswith(".") or f.name.startswith("__"):
        continue
    if f.is_dir():
        print(f"{f.name}/")
    else:
        print(f.name)
Part 1.3: Searching files
Path objects provide a glob() method for simple, albeit limited, file searching. The glob() method uses a set of wildcard characters adopted from the command line called glob patterns, the most common of which is * meaning “any or no text”.
The following example uses the glob pattern *.py to search the data directory for any files ending in .py.
from pathlib import Path

path = Path("data")
for f in path.glob("*.py"):
    print(f.name)
The rglob() method works the same way, but recursively includes the contents of all subdirectories in the search.
from pathlib import Path

path = Path.cwd()
for f in path.rglob("*.py"):
    print(f)
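To make the difference between glob() and rglob() concrete, here is a small sketch that runs both with the same pattern on the same directory. It assumes a made-up directory layout created in a temporary folder, so none of the names below come from the lesson's own files.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical layout: one .py file at the top and two in nested folders.
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "main.py").touch()
    (root / "pkg" / "util.py").touch()
    (root / "pkg" / "sub" / "deep.py").touch()
    (root / "README.md").touch()

    # glob() only looks in the directory itself.
    top_level = sorted(p.name for p in root.glob("*.py"))

    # rglob() also descends into every subdirectory.
    everywhere = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(top_level)
print(everywhere)
```

Here relative_to() and as_posix() are only used to print short, platform-independent paths.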
Exercise 71 (Search for files)
Print the names of all files ending in .txt that are in your working directory or, recursively, in any of its subdirectories.
Solution to Exercise 71 (Search for files)
from pathlib import Path

cwd = Path.cwd()
for f in cwd.rglob("*.txt"):
    print(f.name)
See also
You can find details on globbing in Python in the reference section of this lesson.
Part 2: Directory operations
Part 2.1: Creating directories
You can use the mkdir() method on a Path object to create a new directory.
First you’ll need to create a Path object to the directory, then you can call mkdir() on that object.
The following example creates a tmp directory.
from pathlib import Path

path = Path("tmp")
print(f"Creating directory: {path}")
path.mkdir()
The mkdir() method will raise a FileExistsError if the directory already exists, so you could check the results of exists() and is_dir() before calling it. But there’s an easier way: pass the optional exist_ok keyword argument, as in the following example.
from pathlib import Path

path = Path("tmp")
print(f"Creating directory: {path}")
path.mkdir(exist_ok=True)
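Two edge cases are easy to trip over with mkdir(): a missing parent directory, and a file that already occupies the path. The sketch below shows both, using the parents keyword argument of mkdir() alongside exist_ok. It works inside a temporary folder, and the names data, tmp and blocker are only placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "data" / "tmp"

    # Without parents=True this fails, because "data" does not exist yet.
    try:
        nested.mkdir()
        missing_parent_error = False
    except FileNotFoundError:
        missing_parent_error = True

    # parents=True creates "data" as well; exist_ok=True makes a repeat call harmless.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

    # exist_ok=True does not help if the path exists but is a file.
    blocker = root / "blocker"
    blocker.touch()
    try:
        blocker.mkdir(exist_ok=True)
        file_in_the_way_error = False
    except FileExistsError:
        file_in_the_way_error = True

print(missing_parent_error, created, file_in_the_way_error)
```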
Exercise 72 (Create a directory)
Create a tmp directory in your data directory. If you don’t have a data directory, create it first.
Solution to Exercise 72 (Create a directory)
from pathlib import Path

tmp = Path("data") / "tmp"
print(f"Creating directory: '{tmp}'")
# parents=True also creates the data directory if it is missing
tmp.mkdir(parents=True, exist_ok=True)
Part 2.2: Deleting directories
To delete an empty directory, use the rmdir() method on a Path object.
from pathlib import Path

path = Path("tmp")
print(f"Removing directory: '{path}'")
path.rmdir()
Note
The rmdir() method only works on empty directories, so you’ll need to delete any directory contents first. We’ll learn how to delete files later in this lesson.
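As a rough illustration of that restriction, the sketch below calls rmdir() on a directory that still contains a file, catches the resulting OSError, then empties the directory and tries again. It borrows the unlink() method covered later in the lesson, and it runs in a temporary folder with placeholder names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "tmp"
    folder.mkdir()
    (folder / "a.txt").touch()

    # rmdir() refuses to delete a directory that still has contents.
    try:
        folder.rmdir()
        refused = False
    except OSError:
        refused = True

    # Delete the files first (list() takes a snapshot before we modify the folder),
    # then remove the now-empty directory.
    for child in list(folder.iterdir()):
        if child.is_file():
            child.unlink()
    folder.rmdir()
    still_there = folder.exists()

print(refused, still_there)
```

This sketch only handles a flat directory of files; a directory containing subdirectories would need each of those emptied and removed as well.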
Exercise 73 (Delete the tmp directory)
Delete the data/tmp directory that you created earlier.
Solution to Exercise 73 (Delete the tmp directory)
from pathlib import Path
path = Path("data/tmp")
print(f"Removing directory: {path}")
path.rmdir()
Part 3: File operations
Part 3.1: Creating empty files
The command line command touch either updates a file’s last-accessed and last-modified timestamps or, if the file does not exist, creates it as an empty file. This makes it an easy way to generate files without worrying about whether they already exist.
Path objects provide a touch() method that does the same thing, as shown below.
from pathlib import Path

path = Path("tmp.py")
print(f"Touching file: {path}")
path.touch()
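The two behaviours of touch() can be checked directly. The sketch below, run in a temporary folder, creates an empty file, writes to it, and touches it again to confirm the contents survive. It also uses the exist_ok keyword argument of touch(), which defaults to True; passing False makes the call fail when the file is already there.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tmp.py"

    existed_before = path.exists()
    path.touch()  # the file does not exist yet, so this creates it empty
    size_after_create = path.stat().st_size

    path.write_text("print('hello')\n")
    path.touch()  # the file exists now, so only its timestamps are updated
    contents_after_touch = path.read_text()

    # exist_ok=False turns touch() into "create, but fail if it already exists".
    try:
        path.touch(exist_ok=False)
        raised = False
    except FileExistsError:
        raised = True

print(existed_before, size_after_create, raised)
```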
Exercise 74 (Generate files)
- Create the data/tmp directory if it does not already exist.
- Use the touch() method to create files file_1.txt through file_9.txt.
Solution to Exercise 74 (Generate files)
from pathlib import Path

tmpdir = Path("data") / "tmp"
# parents=True also creates data/ if it is missing
tmpdir.mkdir(parents=True, exist_ok=True)
for i in range(1, 10):
    path = tmpdir / f"file_{i}.txt"
    print(f"Touching file: '{path}'")
    path.touch()
Part 3.2: Removing files
To remove files, use the unlink() method on a Path object. Since this deletes a file, it’s a good idea to print the path and confirm with the user that they really want to delete it.
from pathlib import Path

path = Path("data/a.txt")
reply = input(f"Remove file: '{path}'? [y/N] ")
if reply not in ("y", "yes"):
    print("Ok, nevermind then.")
else:
    print("Ok, removing.")
    path.unlink()
The unlink() method will raise a FileNotFoundError if the file does not exist. You can avoid this by passing the optional missing_ok argument (available in Python 3.8 and later) like so.
from pathlib import Path

path = Path("file-that-doesnt-exist.txt")
path.unlink(missing_ok=True)
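If you would rather tell the user that there was nothing to delete, catching the error is an alternative to missing_ok. The following sketch shows both approaches side by side in a temporary folder. The message text is just an example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file-that-doesnt-exist.txt"

    # Default behaviour: a missing file raises FileNotFoundError.
    try:
        path.unlink()
        message = "Removed."
    except FileNotFoundError:
        message = f"Nothing to remove: '{path.name}'"

    # missing_ok=True (Python 3.8+) silently ignores the missing file instead.
    path.unlink(missing_ok=True)

print(message)
```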
Exercise 75 (Remove a file)
- Choose one of your generated file_num.txt files to delete.
- Ask the user to confirm they want to delete the file.
- Use unlink() to delete the file.
Solution to Exercise 75 (Remove a file)
from pathlib import Path

filepath = Path("data") / "tmp" / "file_9.txt"
print(f"Delete the file '{filepath}'?")
reply = input("[yN] > ")
if reply in ("y", "yes"):
    filepath.unlink(missing_ok=True)
    print("All done.")
else:
    print("Well, nevermind then.")
Part 3.3: Moving files
Path objects provide a replace() method which can be used to move files. It takes a destination path argument, which can be either a string or a Path object, and returns a destination Path object.
In the example below the file a.txt is moved to the data directory.
from pathlib import Path

old_path = Path("a.txt")
new_path = old_path.replace("data/a.txt")

print(f"File moved from '{old_path}' to '{new_path}'")
If a file already exists at the destination, the replace() method will silently overwrite it. So it’s a good idea to check that there isn’t already a file at the destination.
In the example below a destination Path object is created first to check if it .exists(). Then, if all is well, it is passed to .replace().
from pathlib import Path

from_path = Path("a.txt")
to_path = Path("data") / "a.txt"

if to_path.exists():
    print(f"Error: '{to_path}' already exists.")
else:
    from_path.replace(to_path)
    print(f"File moved from '{from_path}' to '{to_path}'")
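The check-then-move pattern above can be wrapped in a small function so it is not repeated for every file. The sketch below is one possible way to do that. The move_file() helper is hypothetical (it is not part of pathlib), and the choice to raise FileExistsError rather than print a message is an assumption made for this example.

```python
import tempfile
from pathlib import Path


def move_file(src, dest_dir):
    """Move src into dest_dir without overwriting; return the new Path.

    Hypothetical helper for illustration, not part of pathlib.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        raise FileExistsError(f"'{dest}' already exists")
    src.replace(dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "a.txt"
    source.write_text("hello")

    # First move succeeds and keeps the file's contents.
    moved = move_file(source, root / "data")
    moved_ok = moved.name == "a.txt" and moved.read_text() == "hello"
    source_gone = not source.exists()

    # A second file with the same name is refused rather than overwritten.
    source.write_text("different")
    try:
        move_file(source, root / "data")
        refused = False
    except FileExistsError:
        refused = True
    kept_original = moved.read_text() == "hello"

print(moved_ok, source_gone, refused, kept_original)
```

Note that checking exists() and then calling replace() is not atomic: another program could create the destination in between. For interactive scripts like the ones in this lesson that is usually an acceptable risk.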
Exercise 76 (Move file)
- Use .touch() to make an empty file xxx.txt.
- Make a new Path object to data/xxx.txt.
- Check to make sure data/xxx.txt does not exist, and print an error if it does.
- Print a message "Moving 'from' to 'to'."
- Use .replace() to move the file.
Solution to Exercise 76 (Move file)
from pathlib import Path

from_path = Path("xxx.txt")
to_path = Path("data") / "xxx.txt"
from_path.touch()
if to_path.exists():
    print(f"Cannot move to {to_path} as it already exists.")
else:
    print(f"Moving '{from_path}' to '{to_path}'")
    from_path.replace(to_path)
Exercise 77 (Move text files to data directory)
- If you have no files ending in .txt in your working directory, use .touch() to generate some first.
- Iterate over text files in your working directory ending in .txt
  [ ] Use a for-loop and iterdir() or glob() to iterate over the files
  [ ] If using iterdir(), use an if statement to continue if the file does not end in .txt
- Check if a file with the same name already exists in data/
  [ ] Create a destination path object
  [ ] If a file already .exists() print an error message and continue
- Move the file
  [ ] Confirm that the user wants to move the file.
  [ ] Use .replace() to move the file
  [ ] Print a confirmation message with the old and new path.
- Bonus: Instead of skipping files that already exist in data/, move the file to data/file-num.txt
Solution to Exercise 77 (Move text files to data directory)
from pathlib import Path

for from_path in Path.cwd().iterdir():
    to_path = Path("data") / from_path.name
    # Skip anything that is not a .txt file
    if from_path.suffix.lower() != ".txt":
        continue
    # Never overwrite a file that is already in data/
    if to_path.exists():
        print(f"Skipping: {from_path.name}")
        continue
    print(f"Move '{from_path.name}' to 'data/{to_path.name}'?")
    reply = input("[yN] > ")
    if reply.lower() not in ("y", "yes"):
        print(f"Declined: {from_path.name}")
        continue
    from_path.replace(to_path)
    print(f"Moved '{from_path.name}' to '{to_path}'")

# Bonus: add a version number to the name instead of skipping existing files
from pathlib import Path

for from_path in Path.cwd().iterdir():
    to_path = Path("data") / from_path.name
    if from_path.suffix.lower() != ".txt":
        continue
    # Try name-1.txt, name-2.txt, ... until an unused name is found
    version = 1
    while to_path.exists():
        to_path = Path("data") / f"{from_path.stem}-{version}{from_path.suffix}"
        version += 1
    print(f"Move '{from_path.name}' to 'data/{to_path.name}'?")
    reply = input("[yN] > ")
    if reply.lower() not in ("y", "yes"):
        print(f"Declined: {from_path.name}")
        continue
    from_path.replace(to_path)
    print(f"Moved '{from_path.name}' to '{to_path}'")
Part 3.4: Renaming files
Renaming files is the same as moving files except instead of moving to a new directory, you are moving to the same directory but with a new name. Just like moving files, use the .replace() method, passing the string or path object for the destination location.
from pathlib import Path

from_path = Path("data") / "a.txt"
to_path = Path("data") / "file_a.txt"

if to_path.exists():
    print(f"Cannot rename '{from_path}' to '{to_path}' as it already exists.")
else:
    from_path.replace(to_path)
    print(f"Renamed to: '{to_path.name}'")
The destination passed to path.replace() is interpreted relative to your working directory, just like when you create a new Path object. If you pass .replace() only the new filename, the file will end up being moved to your working directory. It’s an easy mistake to make.
To rename a file, the string or path object sent to .replace() must have the same directory information as the original path object.
It’s a good idea to create a Path object to the shared directory that can then be used for both path objects.
Bad
from pathlib import Path
from_path = Path("data") / "a.txt"
to_path = from_path.replace("file_a.txt")
Better
from pathlib import Path
from_path = Path("data") / "a.txt"
to_path = from_path.replace("data/file_a.txt")
Best
from pathlib import Path
folder = Path("data")
from_path = folder / "a.txt"
to_path = folder / "file_a.txt"
from_path.replace(to_path)
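Another way to keep the directory information intact is the with_name() method of Path objects, which returns a new path with the same parent directory and a different filename. The sketch below is one possible variation on the "Best" version above, run in a temporary folder so it does not touch your own data directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "data"
    folder.mkdir()
    from_path = folder / "a.txt"
    from_path.touch()

    # with_name() keeps the parent directory and swaps only the filename.
    to_path = from_path.with_name("file_a.txt")
    if not to_path.exists():
        from_path.replace(to_path)

    same_folder = to_path.parent == from_path.parent
    renamed = to_path.exists() and not from_path.exists()

print(same_folder, renamed)
```

Because the destination is derived from from_path itself, there is no second place where the directory could be mistyped or left out.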
Exercise 78 (Rename file)
- Rename the file_1.txt that you created earlier to file_01.txt.
- Be sure to check first that the destination does not already exist.
- Print the file’s new name.
Solution to Exercise 78 (Rename file)
from pathlib import Path

tmpdir = Path("data/tmp")
from_path = tmpdir / "file_1.txt"
to_path = tmpdir / "file_01.txt"
if to_path.exists():
    print(f"Cannot move to {to_path} as it already exists.")
else:
    from_path.replace(to_path)
    print(f"Renamed file: {to_path.name}")
Exercise 79 (Zero pad filenames)
Sometimes when files are sorted, numbers get grouped together so that file_1.txt and file_11.txt come before file_2.txt. So this exercise is to add zeros in front of all of the single digit file numbers.
- Use the glob() method to find all of the file_*.txt files that you created earlier.
- Check the length of the filename to skip files that are named file_xx.txt.
- For any files named file_x.txt, use a slice to get the number part of the filename.
- Check that the zero-padded version of the file, file_xx.txt, does not already exist; otherwise print an error message and skip that file.
- Rename the file to the zero-padded version.
- Print the new filename.
Solution to Exercise 79 (Zero pad filenames)
from pathlib import Path

TMPDIR = Path("data/tmp")

for from_path in TMPDIR.glob("file_*.txt"):
    # "file_x.txt" is 10 characters long; anything longer is skipped
    if len(from_path.name) > 10:
        print(f"{from_path.name} ... SKIPPING: too long")
        continue
    # The single character between "file_" and ".txt"
    num = from_path.name[5:6]
    if not num.isnumeric():
        print(f"{from_path.name} ... SKIPPING: {num} is not numeric")
        continue
    to_path = TMPDIR / f"file_0{num}.txt"
    if to_path.exists():
        print(f"{from_path.name} -> {to_path.name} ... ERROR: {to_path.name} exists")
        continue
    from_path.replace(to_path)
    print(f"{from_path.name} -> {to_path.name} ... DONE")
Reference
Glossary
File System Operations
- glob
- globbing
- glob patterns
- filename expansion
  On the command line: A feature available in many command line shells where certain characters, when not in quotes, are interpreted as filename-matching wildcards. For example, the command ls *.txt will first find all files ending with .txt in the working directory and send them as arguments to ls.
  In programming: Some variant of the syntax and behavior of glob patterns from the command line has been adopted in many other programs and languages for listing files. Some examples include Python’s pathlib.Path.glob and glob.glob and the syntax used in .gitignore files.
- recursive
- recursion
  When something uses itself to define itself. Some examples include:
  In programming this happens when a function calls itself, for example to traverse a nested data structure or in algorithms to generate the Fibonacci sequence.
  When dealing with a file system the term recursive is generally used as shorthand to mean, “include the contents of every subdirectory recursively”, so that the entire file tree is traversed.
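Both senses of "recursive" can be seen in one small function. The sketch below is an illustrative, hand-written traversal (in practice rglob() does this for you): list_tree() is a hypothetical helper that calls itself for every subdirectory it meets, so the whole file tree ends up being visited. The directory layout is made up and created in a temporary folder.

```python
import tempfile
from pathlib import Path


def list_tree(folder):
    """Return every file under folder, visiting subdirectories recursively."""
    found = []
    for entry in sorted(folder.iterdir()):
        if entry.is_dir():
            # The function calls itself for each subdirectory.
            found.extend(list_tree(entry))
        else:
            found.append(entry)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical nested layout: root/a/b with one file at each level.
    (root / "a" / "b").mkdir(parents=True)
    (root / "top.txt").touch()
    (root / "a" / "mid.txt").touch()
    (root / "a" / "b" / "deep.txt").touch()

    names = [p.relative_to(root).as_posix() for p in list_tree(root)]

print(names)
```

This sketch does not guard against symbolic links that point back up the tree, which could make a real traversal loop forever.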
Globbing
Wildcards
| Symbol | Matches | Times |
|---|---|---|
| ** | any directory, recursively | zero or more |
| * | any character | zero or more |
| ? | any character | exactly once |
| [seq] | any character in seq | exactly once |
| [!seq] | any character not in seq | exactly once |
Character ranges
| Range | Meaning |
|---|---|
| [a-z] | lowercase letter |
| [A-Z] | uppercase letter |
| [0-9] | number |
Examples
| Pattern | Matches where the name | Located in |
|---|---|---|
| | is anything | this directory |
| | ends in | this directory |
| | includes | this directory |
| | includes a number | this directory |
| | starts with an uppercase letter | this directory |
| | is | this directory |
| | is | this directory |
| | ends with two numbers followed by | this directory |
| | ends with a non-number, a number, then | this directory |
| | starts with | this directory |
| | ends in | this directory |
| | is anything | the |
| | ends in | this directory or any child, recursively |
| | ends in | any child directory |
| | is | this directory or any child, recursively |
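A quick way to experiment with wildcards and character ranges without creating any files is the standard library's fnmatch module, which uses the same *, ?, [seq] and [!seq] syntax. The sketch below uses it as a stand-in for Path.glob(); the filenames and patterns are made up for illustration. Keep in mind that fnmatch only compares plain strings, so it does not treat ** or directory separators specially.

```python
from fnmatch import fnmatchcase

# Made-up filenames to test the patterns against.
names = ["file_1.txt", "file_11.txt", "Notes.md", "a.txt"]


def matching(pattern):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


star = matching("*.txt")                  # ends in .txt
question = matching("file_?.txt")         # exactly one character after "file_"
digits = matching("file_[0-9][0-9].txt")  # exactly two digits after "file_"
upper = matching("[A-Z]*")                # starts with an uppercase letter
not_f = matching("[!f]*")                 # starts with anything except "f"

for label, result in [
    ("*.txt", star),
    ("file_?.txt", question),
    ("file_[0-9][0-9].txt", digits),
    ("[A-Z]*", upper),
    ("[!f]*", not_f),
]:
    print(f"{label:22} {result}")
```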