File System Operations

This lesson will go over how to use Path objects to perform common operations on files and directories.

The first part of the lesson covers ways to get a list of the files on your local computer.


Part 1: Listing files

Part 1.2: Directory contents

Path objects provide an iterdir() method for iterating through the contents of a directory. Here’s a simple example that prints the contents of the working directory.

from pathlib import Path

cwd = Path.cwd()
for f in cwd.iterdir():
    print(f)

Each element yielded by iterdir() is itself a Path object, so all of its methods and properties are available. For example, the following code iterates through the contents of the working directory and uses .is_dir() and .name to print just the names of the subdirectories.

from pathlib import Path

cwd = Path.cwd()

for f in cwd.iterdir():
    if f.is_dir():
        print(f.name)

Here’s another example that iterates through the contents of the working directory and prints any files with "recipe" in the name that are either text or markdown.

from pathlib import Path

cwd = Path.cwd()

for f in cwd.iterdir():
    if f.suffix in (".md", ".txt") and "recipe" in f.name:
        print(f.name)
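One detail worth knowing: according to the Python documentation, iterdir() yields entries in arbitrary order, so the output may differ between computers. Wrapping the call in sorted() gives a predictable order. This sketch creates a throwaway folder with tempfile (the file names are made up) so it doesn't touch your working directory.

```python
import tempfile
from pathlib import Path

# A throwaway folder with a few made-up files and one subdirectory
folder = Path(tempfile.mkdtemp())
for name in ("b.txt", "a.md", "c.txt"):
    (folder / name).touch()
(folder / "notes").mkdir()

# sorted() orders the Path objects, so the output no longer depends on the OS
names = [f.name for f in sorted(folder.iterdir()) if f.is_file()]
print(names)  # ['a.md', 'b.txt', 'c.txt']
```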

Part 1.3: Searching files

Path objects provide a glob() method for simple, albeit limited, file searching. The glob() method uses a set of wildcard characters adopted from the command line, called glob patterns, the most common of which is *, meaning “any text, including none”.

The following example uses the glob pattern *.py to find any files in the data directory ending in .py.

from pathlib import Path

path = Path("data")
for f in path.glob("*.py"):
    print(f.name)

The rglob() method works the same way, but recursively includes the contents of all subdirectories in the search.

from pathlib import Path

path = Path.cwd()
for f in path.rglob("*.py"):
    print(f)
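As the Python documentation describes it, rglob(pattern) is like calling glob() with "**/" added in front of the pattern. The sketch below checks this on a small, made-up tree inside a temporary folder; relative_to() and as_posix() are used only to keep the printed paths short and identical on every operating system.

```python
import tempfile
from pathlib import Path

# A made-up tree: one .py file at the top, two more in nested subdirectories
root = Path(tempfile.mkdtemp())
(root / "pkg" / "sub").mkdir(parents=True)
for rel in ("main.py", "pkg/util.py", "pkg/sub/deep.py", "pkg/readme.txt"):
    (root / rel).touch()

def rel_names(paths):
    """Sorted paths relative to root, always using / as the separator."""
    return sorted(p.relative_to(root).as_posix() for p in paths)

top_level = rel_names(root.glob("*.py"))       # this directory only
recursive = rel_names(root.rglob("*.py"))      # every subdirectory too
double_star = rel_names(root.glob("**/*.py"))  # the equivalent glob pattern

print(top_level)               # ['main.py']
print(recursive)               # ['main.py', 'pkg/sub/deep.py', 'pkg/util.py']
print(recursive == double_star)  # True
```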

Exercise 71 (Search for files)

Print the names of all files in your working directory, or in any of its subdirectories, that end with .txt.


Part 2: Directory operations

Part 2.1: Creating directories

You can use the mkdir() method on a Path object to create a new directory.

First, create a Path object for the directory; then call mkdir() on that object.

The following example creates a tmp directory.

from pathlib import Path

path = Path("tmp")
print(f"Creating directory: {path}")
path.mkdir()

The mkdir() method raises a FileExistsError if the directory already exists, so you could check the results of exists() and is_dir() before calling it. But there’s an easier way: just pass the optional exist_ok=True keyword argument, as in the following example.

from pathlib import Path

path = Path("tmp")
print(f"Creating directory: {path}")
path.mkdir(exist_ok=True)
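mkdir() can also fail for a different reason: if a parent directory is missing (for example, creating data/tmp when there is no data directory), it raises FileNotFoundError. Passing parents=True creates any missing parents as well. This sketch assumes nothing about your working directory; it builds everything inside a temporary folder.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
path = base / "data" / "tmp"

# Without parents=True this fails, because base/data does not exist yet
try:
    path.mkdir()
except FileNotFoundError:
    print("Parent directory is missing")

# parents=True creates base/data first; exist_ok=True makes reruns safe
path.mkdir(parents=True, exist_ok=True)
print(path.is_dir())  # True
```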

Exercise 72 (Create a directory)

Create a tmp directory in your data directory. If you don’t have a data directory, create it first.

Part 2.2: Deleting directories

To delete an empty directory, use the rmdir() method on a Path object.

from pathlib import Path

path = Path("tmp")
print(f"Removing directory: '{path}'")
path.rmdir()

Note

The rmdir() method only works on empty directories, so you’ll need to delete any directory contents first. We’ll learn how to delete files later in this lesson.
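Here is roughly what happens if the directory isn’t empty. The exact exception subclass and message depend on your operating system, so this sketch catches the general OSError; the leftover.txt file is made up for illustration.

```python
import tempfile
from pathlib import Path

# A tmp directory that still contains a file
path = Path(tempfile.mkdtemp()) / "tmp"
path.mkdir()
(path / "leftover.txt").touch()

try:
    path.rmdir()
except OSError as err:
    # The error type and wording vary between Linux, macOS, and Windows
    print(f"Could not remove '{path}': {err}")

print(path.exists())  # True: nothing was deleted
```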

Exercise 73 (Delete the tmp directory)

Delete the data/tmp directory that you created earlier.

Part 3: File operations

Part 3.1: Creating empty files

The command-line tool touch either updates a file’s last-accessed and last-modified timestamps or, if the file does not exist, creates an empty file. This makes it an easy way to create files without worrying about whether they already exist.

Path objects provide a touch() method that does the same thing, as shown below.

from pathlib import Path

path = Path("tmp.py")
print(f"Touching file: {path}")
path.touch()
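Like the command-line tool, touch() leaves an existing file’s contents alone and only updates its timestamps. If you would rather be told that the file already exists, pass exist_ok=False and touch() raises FileExistsError instead. The notes.txt file below is a made-up example created in a temporary folder.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "notes.txt"
path.write_text("keep me")

path.touch()             # file exists: only the timestamps change
print(path.read_text())  # keep me

try:
    path.touch(exist_ok=False)  # refuse to touch an existing file
except FileExistsError:
    print(f"'{path.name}' already exists")
```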

Exercise 74 (Generate files)

  1. Create the data/tmp directory if it does not already exist.

  2. Use the touch() method to create files file_1.txt through file_9.txt.

Part 3.2: Removing files

To remove a file, use the unlink() method on a Path object. Since this permanently deletes the file, it’s a good idea to print the path and confirm with the user that they really want to delete it.

Listing 252 note: assumes data/a.txt exists
from pathlib import Path

path = Path("data/a.txt")
reply = input(f"Remove file: '{path}'? [y/N] ")
# Only an explicit y or yes (in any case) goes ahead with the removal
if reply.strip().lower() not in ("y", "yes"):
    print("Ok, never mind then.")
else:
    print("Ok, removing.")
    path.unlink()

The unlink() method raises a FileNotFoundError if the file does not exist. You can avoid this by passing the optional missing_ok=True argument (available in Python 3.8 and later), like so.

from pathlib import Path

path = Path("file-that-doesnt-exist.txt")
path.unlink(missing_ok=True)
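For comparison, here is the same call without missing_ok, followed by the call with it. Remember that missing_ok requires Python 3.8 or later. The sketch uses a fresh temporary folder so the file is guaranteed not to exist.

```python
import tempfile
from pathlib import Path

# A path inside a brand-new temporary folder, so the file cannot exist
path = Path(tempfile.mkdtemp()) / "file-that-doesnt-exist.txt"

raised = False
try:
    path.unlink()  # no missing_ok: raises FileNotFoundError
except FileNotFoundError:
    raised = True
    print(f"Nothing to remove at '{path}'")

path.unlink(missing_ok=True)  # same situation, but no exception
```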

Exercise 75 (Remove a file)

  1. Choose one of your generated file_num.txt files to delete.

  2. Ask the user to confirm they want to delete the file.

  3. Use unlink() to delete the file.

Part 3.3: Moving files

Path objects provide a replace() method that can be used to move files. It takes a destination path, either a string or a Path object, and returns a new Path object for the destination.

In the example below the file a.txt is moved to the data directory.

Listing 253 note: assumes a.txt exists
from pathlib import Path

old_path = Path("a.txt")
new_path = old_path.replace("data/a.txt")

print(f"File moved from '{old_path}' to '{new_path}'")

If a file already exists at the destination, the replace() method will silently overwrite it. So it’s a good idea to check that there isn’t already a file at the destination first.

In the example below, a destination Path object is created first so that we can check whether it .exists(). Then, if all is well, it is passed to .replace().

from pathlib import Path

from_path = Path("a.txt")
to_path = Path("data") / "a.txt"

if to_path.exists():
    print(f"Error: '{to_path}' already exists.")
else:
    from_path.replace(to_path)
    print(f"File moved from '{from_path}' to '{to_path}'")
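To see why the check matters, this sketch sets up both files (with made-up contents) in a temporary folder and calls replace() without checking first. No error is raised; the old destination file is simply gone. It also uses the Path object that replace() returns, which requires Python 3.8 or later.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
(base / "data").mkdir()
from_path = base / "a.txt"
to_path = base / "data" / "a.txt"
from_path.write_text("new contents")
to_path.write_text("old contents")

# No exists() check: replace() overwrites the destination without warning
new_path = from_path.replace(to_path)

print(from_path.exists())    # False
print(new_path.read_text())  # new contents
```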

Exercise 76 (Move file)

  1. Use .touch() to make an empty file xxx.txt.

  2. Make a new Path object for data/xxx.txt.

  3. Check to make sure data/xxx.txt does not exist, and print an error if it does.

  4. Print a message "Moving 'from' to 'to'."

  5. Use .replace() to move the file.

Exercise 77 (Move text files to data directory)

  1. If you have no files ending in .txt in your working directory, use .touch() to generate some first.

  2. Iterate over the files in your working directory that end in .txt.

    • [ ] Use a for-loop and iterdir() or glob() to iterate over the files.

    • [ ] If using iterdir(), use an if statement to continue past any file that does not end in .txt.

  3. Check whether a file with the same name already exists in data/.

    • [ ] Create a destination path object.

    • [ ] If the destination already .exists(), print an error message and continue.

  4. Move the file

    • [ ] Confirm that the user wants to move the file.

    • [ ] Use .replace() to move the file

    • [ ] Print a confirmation message with the old and new path.

  5. Bonus: Instead of skipping files that already exist in data/, move the file to data/file-num.txt

Part 3.4: Renaming files

Renaming a file is the same as moving it, except that instead of moving it to a new directory, you move it within the same directory under a new name. Just like moving files, use the .replace() method, passing a string or Path object for the destination.

Listing 256 note: assumes data/a.txt exists
from pathlib import Path

from_path = Path("data") / "a.txt"
to_path = Path("data") / "file_a.txt"

if to_path.exists():
    print(f"Cannot rename '{from_path}' to '{to_path}' as it already exists.")
else:
    from_path.replace(to_path)
    print(f"Renamed to: '{to_path.name}'")

A relative destination passed to path.replace() is interpreted relative to your working directory, just like when you create a new Path object, and not relative to the file’s current directory. If you pass .replace() only the new filename, the file will end up being moved to your working directory. It’s an easy mistake to make.

To rename a file, the string or path object sent to .replace() must have the same directory information as the original path object.

It’s a good idea to create a Path object for the shared directory and then use it to build both path objects.

Bad

Listing 257 Bad: moves file to ./file_a.txt
from pathlib import Path

from_path = Path("data") / "a.txt"
to_path = from_path.replace("file_a.txt")

Better

Listing 258 Better: renames file to ./data/file_a.txt
from pathlib import Path

from_path = Path("data") / "a.txt"
to_path = from_path.replace("data/file_a.txt")

Best

Listing 259 Best: Avoid mistakes with a folder variable
from pathlib import Path

folder = Path("data")
from_path = folder / "a.txt"
to_path = folder / "file_a.txt"

from_path.replace(to_path)
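An alternative the lesson doesn’t otherwise cover, offered here as an optional sketch: Path.with_name() returns a new path in the same directory as the original with only the final name changed, which sidesteps the working-directory mistake. The temporary folder and file names are made up so the example can run anywhere.

```python
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp()) / "data"
folder.mkdir()
from_path = folder / "a.txt"
from_path.touch()

# with_name() swaps the filename but keeps the parent directory
to_path = from_path.with_name("file_a.txt")

if to_path.exists():
    print(f"Cannot rename '{from_path}' to '{to_path}' as it already exists.")
else:
    from_path.replace(to_path)
    print(f"Renamed to: '{to_path.name}'")
```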

Exercise 78 (Rename file)

  1. Rename the file_1.txt that you created earlier to file_01.txt.

  2. Be sure to check first that the destination does not already exist.

  3. Print the file’s new name.

Exercise 79 (Zero pad filenames)

When filenames are sorted alphabetically, they are compared one character at a time, so file_1.txt and file_11.txt both come before file_2.txt. This exercise adds a leading zero to each single-digit file number so that the files sort in numeric order.

  1. Use the glob() method to find all of the file_*.txt files that you created earlier.

  2. Check the length of the filename to skip files that are named file_xx.txt.

  3. For any files named file_x.txt, use a slice to get the number part of the filename.

  4. Check that the zero-padded version of the filename (file_0x.txt) does not already exist; if it does, print an error message and skip that file.

  5. Rename the file to the zero-padded version.

  6. Print the new filename.
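If you want to see the sorting problem before starting, this sketch sorts some made-up names as plain strings, then pads the number to two digits with an f-string format spec (:02) and sorts again. It works on strings only, and the padding approach is one option, not the only way to do the exercise.

```python
names = ["file_1.txt", "file_11.txt", "file_2.txt"]
print(sorted(names))  # ['file_1.txt', 'file_11.txt', 'file_2.txt']

# name[5:-4] assumes every name looks like file_<number>.txt
padded = [f"file_{int(name[5:-4]):02}.txt" for name in names]
print(sorted(padded))  # ['file_01.txt', 'file_02.txt', 'file_11.txt']
```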

Reference

Glossary

File System Operations

glob
globbing
glob patterns
filename expansion

On the command line: A feature of many command-line shells in which certain characters, when not in quotes, are interpreted as filename-matching wildcards. For example, in the command ls *.txt the shell first expands *.txt into the names of all files in the working directory that end with .txt, then passes those names to ls as arguments.

In programming: Variants of the command-line glob syntax and behavior have been adopted by many other programs and languages for listing files. Examples include Python’s pathlib.Path.glob and glob.glob, and the pattern syntax used in .gitignore files.

recursive
recursion

When something is defined in terms of itself. Some examples include:

In programming, this happens when a function calls itself, for example to traverse a nested data structure or in algorithms that generate the Fibonacci sequence.

When dealing with a file system, the term recursive is generally used as shorthand for “include the contents of every subdirectory recursively”, so that the entire file tree is traversed.
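To connect the two meanings, here is a small recursive function (list_tree is a hypothetical name, not part of pathlib) that collects every file in a directory tree by calling itself on each subdirectory. On a simple tree without symlinks it should find the same files as rglob("*"), which is what you would normally use in practice.

```python
import tempfile
from pathlib import Path

def list_tree(folder):
    """Return every file under folder, descending into subdirectories."""
    files = []
    for entry in folder.iterdir():
        if entry.is_dir():
            files.extend(list_tree(entry))  # the function calls itself here
        else:
            files.append(entry)
    return files

# A made-up tree three levels deep
root = Path(tempfile.mkdtemp())
(root / "a" / "b").mkdir(parents=True)
for rel in ("top.txt", "a/mid.txt", "a/b/deep.txt"):
    (root / rel).touch()

found = sorted(p.relative_to(root).as_posix() for p in list_tree(root))
print(found)  # ['a/b/deep.txt', 'a/mid.txt', 'top.txt']

expected = sorted(
    p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
)
print(found == expected)  # True
```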

Globbing

Wildcards

Symbol   Matches                      Times
**       any directory, recursively   zero or more
*        any character                zero or more
?        any character                exactly once
[seq]    any character in seq         exactly once
[!seq]   any character not in seq     exactly once

Character ranges

Range   Meaning
a-z     lowercase letter
A-Z     uppercase letter
0-9     digit

Examples

Pattern            Matches where the name                         Located in
*                  is anything                                    this directory
*.md               ends in .md                                    this directory
*circle*.svg       includes circle and ends in .svg               this directory
*[0-9]*            includes a digit                               this directory
[A-Z]*             starts with an uppercase letter                this directory
file_??.txt        is file_, then any two characters, then .txt   this directory
20[0-2][0-9]       is 20 followed by 00 through 29                this directory
*[0-9][0-9].txt    ends with two digits followed by .txt          this directory
*[!0-9][0-9].txt   ends with a non-digit, a digit, then .txt      this directory
[_.0-9]*           starts with _, . or a digit                    this directory
*-[hv].svg         ends in -h.svg or -v.svg                       this directory
docs/*             is anything                                    the docs child directory
**/*.doc           ends in .doc                                   this directory or any child, recursively
*/*.jpg            ends in .jpg                                   any direct child directory
**/.gitignore      is .gitignore                                  this directory or any child, recursively
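To try some of these patterns in Python, the sketch below builds a small, made-up directory tree in a temporary folder and runs several of the example patterns through Path.glob(). The file names are invented for illustration, and the helper name matches() is hypothetical.

```python
import tempfile
from pathlib import Path

# A made-up directory tree to test the patterns against
root = Path(tempfile.mkdtemp())
(root / "docs").mkdir()
for rel in ("notes.md", "file_1.txt", "file_01.txt", "arrow-h.svg",
            "report.doc", "docs/guide.doc"):
    (root / rel).touch()

def matches(pattern):
    """Return matching paths relative to root, sorted, with / separators."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

for pattern in ("*.md", "file_??.txt", "*[!0-9][0-9].txt", "*-[hv].svg", "**/*.doc"):
    print(f"{pattern:20} {matches(pattern)}")
```

Note that file_??.txt skips file_1.txt (only one character after the underscore), while *[!0-9][0-9].txt skips file_01.txt (the character before the final digit is itself a digit).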