For file, directory, and path operations in Python, we have traditionally used the os.path module.

pathlib (added in Python 3.4) is its modern replacement. Instead of passing strings to functions, it represents paths as objects, which makes the API more consistent, easier to use, and closer to everyday object-oriented programming habits.

The pathlib module provides classes that represent file system paths with semantics appropriate for different operating systems. The path classes are divided into pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths and add I/O operations.

Let’s first look at how the pathlib module is organized. Its core consists of 6 classes: the base class is PurePath, and the other 5 classes derive from it.

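Figure: pathlib class inheritance diagram (PurePath is the base class; PurePosixPath, PureWindowsPath and Path derive from it; PosixPath and WindowsPath derive from Path and from the matching pure class).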

In the diagram, each arrow connects two classes that have an inheritance relationship. Taking PurePosixPath and PurePath as an example, PurePosixPath inherits from PurePath, that is, the former is a subclass of the latter.

  • PurePath class: treats a path as an ordinary string. It can join several strings into a path in the format of the current operating system, and it can test whether two paths are equal. As the word "pure" suggests, PurePath is concerned only with manipulating paths, not with whether they are valid in the real file system or whether the files and directories they name actually exist.
  • PurePosixPath and PureWindowsPath are subclasses of PurePath. The former handles paths for UNIX-style operating systems (including Mac OS X), and the latter handles paths for Windows. The two styles differ in how paths are written, most visibly in the path separator.
  • The Path class differs from the three classes above in that it works with actual files and directories and interacts with the real file system, for example to check whether a path exists.
  • PosixPath and WindowsPath are subclasses of Path and are used to manipulate Unix (Mac OS X) style paths and Windows style paths respectively.
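The inheritance relationships described above can be checked directly with issubclass(). This is a small sketch that only inspects the classes; it creates no path objects and touches no files.

```python
from pathlib import (
    Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath, WindowsPath,
)

# Each pair is (subclass, base class), following the class list above.
relationships = [
    (PurePosixPath, PurePath),
    (PureWindowsPath, PurePath),
    (Path, PurePath),
    (PosixPath, Path),
    (WindowsPath, Path),
    # The concrete classes also inherit from the matching pure class.
    (PosixPath, PurePosixPath),
    (WindowsPath, PureWindowsPath),
]

for sub, base in relationships:
    print(f'{sub.__name__} -> {base.__name__}: {issubclass(sub, base)}')
```

Importing WindowsPath on Linux (or PosixPath on Windows) is fine; only instantiating it fails.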

The three pure path classes PurePath, PurePosixPath and PureWindowsPath are mainly useful in special cases, such as:

  • When you need to manipulate Windows paths on a Unix machine, or Unix paths on a Windows machine. You cannot instantiate a real Windows path on Unix, but you can instantiate a pure Windows path and work with it as if you were on Windows.
  • When you want to guarantee that your code only manipulates paths and never actually touches the operating system's file system.
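The first case can be sketched as follows: a PureWindowsPath works on any operating system, while the concrete class for the other OS refuses to be created. The user name alice and the file report.txt are made-up example values; catching NotImplementedError assumes CPython's behavior (Python 3.13 raises pathlib.UnsupportedOperation, which subclasses it).

```python
import os
from pathlib import PosixPath, PureWindowsPath, WindowsPath

# A Windows path can be manipulated on any OS through the pure class.
# 'alice' and 'report.txt' are hypothetical example names.
win = PureWindowsPath('C:/Users/alice/report.txt')
print(win)          # C:\Users\alice\report.txt
print(win.drive)    # C:
print(win.name)     # report.txt

# The concrete class for the *other* operating system cannot be instantiated.
foreign_cls = PosixPath if os.name == 'nt' else WindowsPath
try:
    foreign_cls('report.txt')
    foreign_failed = False
except NotImplementedError as exc:
    foreign_failed = True
    print(f'{foreign_cls.__name__}: {exc}')
```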

Path formats differ between UNIX-like operating systems and Windows, mainly in the root and the path separator. On UNIX the root is a slash (/), while on Windows a path starts with a drive letter such as C:. UNIX paths use the forward slash (/) as the separator, while Windows uses the backslash (\).
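A minimal sketch of these differences, using the pure classes so it behaves the same on any OS. In pathlib's terms, the "C:" part of a Windows path is the drive and the backslash after it is the root; the two example paths are arbitrary.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath('/usr/bin/python3')
windows = PureWindowsPath('C:/Windows/System32')

# Same kind of information, different conventions.
print(repr(posix.drive), repr(posix.root), str(posix))
# '' '/' /usr/bin/python3
print(repr(windows.drive), repr(windows.root), str(windows))
# 'C:' '\\' C:\Windows\System32
```

Note that PureWindowsPath accepted forward slashes as input and rendered them as backslashes.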

1. PurePath Class

The PurePath class (as well as the PurePosixPath and PureWindowsPath classes) provides a number of constructors, instance methods, and instance properties for us to use.

When instantiated, PurePath automatically adapts to the operating system. On a UNIX or Mac OS X system, the constructor actually returns a PurePosixPath object; on Windows, it returns a PureWindowsPath object.

For example, the following statement is executed on a Windows system.

from pathlib import PurePath

path = PurePath('file.txt')
print(type(path))

# <class 'pathlib.PureWindowsPath'>
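To see this automatic selection without assuming a particular OS, we can compare the result with os.name, which is 'nt' on Windows. A small sketch:

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

path = PurePath('file.txt')
expected = PureWindowsPath if os.name == 'nt' else PurePosixPath

print(type(path).__name__)       # PureWindowsPath on Windows, PurePosixPath elsewhere
print(type(path) is expected)    # True
```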

PurePath also accepts several path strings when creating an object and joins them into a single path. For example:

from pathlib import PurePath

path = PurePath('https:','www.liujiangblog.com','django')
print(path)

# https:\www.liujiangblog.com\django

As you can see, the output is a Windows-style path because the code is running on Windows.

To create UNIX-style paths on Windows, use the PurePosixPath class explicitly (and PureWindowsPath for the reverse). For example:

from pathlib import PurePosixPath
path = PurePosixPath('https:','www.liujiangblog.com','django')
print(path)

# https:/www.liujiangblog.com/django

Note: pure path operations are really operations on strings. They are not tied to the local file system and perform no disk I/O. A path built with PurePath is essentially a wrapped string and can be converted to a plain string with str().
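A quick sketch of this: the directory no_such_dir below is a deliberately nonexistent, hypothetical name, and nothing fails because no I/O happens. The example also uses os.fspath(), the standard way (since Python 3.6) to get the string form of any path-like object.

```python
import os
from pathlib import PurePosixPath

# Neither the directory nor the file needs to exist: no disk I/O happens.
p = PurePosixPath('no_such_dir', 'readme.md')

text = str(p)        # 'no_such_dir/readme.md'
fs = os.fspath(p)    # the same string, via the os.PathLike protocol
print(text, fs == text)
```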

In addition, calling the PurePath constructor with no arguments is equivalent to passing a single dot '.', which means the current directory.

from pathlib import PurePath

path1 = PurePath()

path2 = PurePath('.')

print(path1 == path2)

# True

If several of the arguments passed to the PurePath constructor are rooted paths, only the last one and the segments after it take effect. For example:

from pathlib import PurePath

path = PurePath('C:/', 'D:/', 'file.txt')
print(path)

# D:\file.txt
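The same rule can be observed on any OS with the pure classes. This sketch reuses the examples from the official pathlib documentation; it also shows a Windows subtlety: a new drive resets the path, but a rooted segment without a drive keeps the previous drive.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A later absolute segment discards everything before it.
print(PurePosixPath('/etc', '/usr', 'lib64'))           # /usr/lib64

# On Windows, a new drive resets the path ...
print(PureWindowsPath('c:/Windows', 'd:bar'))           # d:bar
# ... but a rooted segment without a drive keeps the previous drive.
print(PureWindowsPath('c:/Windows', '/Program Files'))  # c:\Program Files
```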

As a reminder, when writing path strings in Python, pay attention to how backslashes are escaped, and to the difference between raw strings (r'...') and ordinary strings. This is an easy place to make mistakes.
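Here is a small sketch of the classic pitfall, with C:\test as an arbitrary example path: in an ordinary string, \t is a TAB escape, so the string silently contains no backslash at all. A raw string or forward slashes avoid the problem.

```python
from pathlib import PureWindowsPath

plain = 'C:\test'   # '\t' is a TAB escape: this string contains no backslash
raw = r'C:\test'    # raw string: the backslash is kept literally
fwd = 'C:/test'     # forward slashes also work for Windows paths

print(len(plain), len(raw))                            # 6 7
print(PureWindowsPath(raw) == PureWindowsPath(fwd))    # True
print(PureWindowsPath(plain) == PureWindowsPath(raw))  # False
```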

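Redundant slashes and single dots (.) in the arguments passed to the PurePath constructor are simply dropped, but double dots (..) are kept, because collapsing them could change the meaning of the path when symbolic links are involved.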

from pathlib import PurePath
path = PurePath('C:/foo//./../file.txt')
print(path)

# C:\foo\..\file.txt
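The same normalization rules, shown with PurePosixPath so the result does not depend on the OS running the code (these inputs mirror the official documentation's examples):

```python
from pathlib import PurePosixPath

print(PurePosixPath('foo//bar'))    # foo/bar     (extra slash dropped)
print(PurePosixPath('foo/./bar'))   # foo/bar     (single dot dropped)
print(PurePosixPath('foo/../bar'))  # foo/../bar  (double dot kept)
```

Collapsing .. safely requires looking at the real file system; Path.resolve() does that.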

PurePath instances support comparison operators. Paths of the same style can be tested for equality and ordered (in effect a string comparison, case-insensitive for Windows paths). Paths of different styles can only be tested for equality, which is always False, and cannot be ordered.

from pathlib import PurePosixPath, PureWindowsPath

# Unix-style paths are case-sensitive
print(PurePosixPath('/D/file.txt') == PurePosixPath('/d/file.txt'))

# Windows-style paths are case-insensitive
print(PureWindowsPath('D:/file.txt') == PureWindowsPath('d:/file.txt'))

# False
# True
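Ordering and cross-style comparison can be sketched like this; the paths are arbitrary examples. Mixing the two styles in an ordering comparison raises TypeError.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same style: ordering works.
print(PurePosixPath('/a/b') < PurePosixPath('/a/c'))       # True
print(PureWindowsPath('C:/A') == PureWindowsPath('c:/a'))  # True (case-insensitive)

# Different styles: never equal, and ordering raises TypeError.
print(PurePosixPath('a') == PureWindowsPath('a'))          # False
try:
    PurePosixPath('a') < PureWindowsPath('a')
    ordering_failed = False
except TypeError:
    ordering_failed = True
print(ordering_failed)                                     # True
```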

The following is a list of methods and properties commonly used by PurePath instances.

| Property / method | Description |
|---|---|
| PurePath.parts | Returns a tuple of the components of the path. |
| PurePath.drive | Returns the drive letter or UNC share, if any (an empty string otherwise). |
| PurePath.root | Returns the root, if any (for example / on POSIX or \ on Windows). |
| PurePath.anchor | Returns the drive and root concatenated. |
| PurePath.parents | Returns an immutable sequence of all logical ancestors of the path. |
| PurePath.parent | Returns the logical parent of the path; usually the same as parents[0]. |
| PurePath.name | Returns the final path component (typically the file name), excluding drive and root. |
| PurePath.suffixes | Returns a list of all suffixes of the final component. |
| PurePath.suffix | Returns the last suffix of the final component, i.e. the last element of suffixes (an empty string if there is none). |
| PurePath.stem | Returns the final component without its last suffix. |
| PurePath.as_posix() | Returns the path as a string with forward slashes (/). |
| PurePath.as_uri() | Returns the path as a file URI. Only absolute paths can be converted; otherwise a ValueError is raised. |
| PurePath.is_absolute() | Returns whether the path is absolute. |
| PurePath.joinpath(*other) | Joins each argument onto the path; equivalent to using the slash (/) operator. |
| PurePath.match(pattern) | Returns whether the path matches the given glob-style pattern. |
| PurePath.relative_to(other) | Returns the path relative to the base path other; raises ValueError if the path is not under it. |
| PurePath.with_name(name) | Returns a new path with the name replaced. Raises ValueError if the path has no name. |
| PurePath.with_suffix(suffix) | Returns a new path with the suffix replaced; if the path has no suffix, the new suffix is appended. |
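Most entries in the table can be tried on a single path. This sketch uses PureWindowsPath so the output is the same on every OS; the path C:/projects/site/archive.tar.gz is hypothetical and nothing is read from disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical example path; nothing is read from disk.
p = PureWindowsPath('C:/projects/site/archive.tar.gz')

print(p.parts)                       # ('C:\\', 'projects', 'site', 'archive.tar.gz')
print(p.drive, p.root, p.anchor)     # C: \ C:\
print(p.name)                        # archive.tar.gz
print(p.suffixes)                    # ['.tar', '.gz']
print(p.suffix)                      # .gz
print(p.stem)                        # archive.tar
print(p.parent)                      # C:\projects\site
print(p.as_posix())                  # C:/projects/site/archive.tar.gz
print(p.is_absolute())               # True
print(p.match('*.gz'))               # True
print(p.relative_to('C:/projects'))  # site\archive.tar.gz
print(p.with_name('data.csv'))       # C:\projects\site\data.csv
print(p.with_suffix('.zip'))         # C:\projects\site\archive.tar.zip

joined = PurePosixPath('/srv/app').joinpath('static', 'css')
print(joined)                        # /srv/app/static/css
```

Note that stem and with_suffix() only look at the last suffix, so archive.tar.gz keeps its .tar part.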

2. The Path class

More often than not, we use the Path class directly instead of PurePath.

Path is a subclass of PurePath. Besides the constructors, properties, and methods inherited from PurePath, it provides methods that query the real file system: whether a path exists, whether it refers to a file or a directory, and, for files, reading and writing their contents.

Path has two subclasses, PosixPath and WindowsPath. As with PurePath, instantiating Path returns whichever of the two matches the current operating system.

Basic usage

from pathlib import Path

# Create instances
p = Path('a', 'b', 'c/d')       # segments are joined: WindowsPath('a/b/c/d') on Windows
p = Path('/etc')

p = Path()
# WindowsPath('.')
p.resolve()                     # make the path absolute and resolve any symlinks
# WindowsPath('D:/work/2020/django3')

# cwd() and home() are class methods: call them on an instance or on Path itself
p.cwd()                         # the current working directory, always absolute
# WindowsPath('D:/work/2020/django3')
Path.cwd()
# WindowsPath('D:/work/2020/django3')
p.home()                        # the current user's home directory
# WindowsPath('C:/Users/liujiangblog')
Path.home()
# WindowsPath('C:/Users/liujiangblog')
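The relationship between resolve() and cwd() can be checked on any machine. A minimal sketch, assuming only that the current directory is readable; the exact absolute path printed depends on where you run it.

```python
from pathlib import Path

here = Path()                       # the relative path '.'
print(here.is_absolute())           # False

absolute_here = here.resolve()      # absolute, with symlinks resolved
print(absolute_here.is_absolute())  # True

# For '.', resolve() gives the (symlink-resolved) current working directory.
print(absolute_here == Path.cwd().resolve())  # True
```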

Directory operations

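from pathlib import Path

p = Path(r'd:\test\11\22')
p.mkdir(exist_ok=True)          # create the directory; its parent (d:\test\11) must already exist, otherwise an error is raised
# I usually use this form instead:
p.mkdir(exist_ok=True, parents=True)   # also create any missing parent directories
p.rmdir()                       # remove the directory; it must be empty

p
# WindowsPath('d:/test/11/22')  the Path object p still exists; only the directory on disk was removed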
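The same steps can be run safely inside a temporary directory instead of d:\test. This sketch assumes the standard tempfile module for a scratch location; '11' and '22' simply mirror the example above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / '11' / '22'   # '11' and '22' mirror the example above

    # Without parents=True, a missing parent ('11') raises FileNotFoundError.
    missing_parent_error = False
    try:
        nested.mkdir()
    except FileNotFoundError:
        missing_parent_error = True

    nested.mkdir(parents=True, exist_ok=True)  # creates '11' and then '22'
    nested.mkdir(parents=True, exist_ok=True)  # no error: it already exists
    created = nested.is_dir()

    nested.rmdir()                        # works only because '22' is empty
    removed = not nested.exists()
    parent_kept = nested.parent.is_dir()  # rmdir() leaves '11' in place

print(missing_parent_error, created, removed, parent_kept)  # True True True True
```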

Traversing the directory

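from pathlib import Path

p = Path(r'd:\test')
# WindowsPath('d:/test')
p.iterdir()                     # like os.listdir, but returns a generator of Path objects
p.glob('*')                     # like os.listdir, but filtered by a glob pattern
p.rglob('*')                    # like os.walk: searches recursively, also filtered by a pattern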
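To see how the three methods differ, this sketch builds a small hypothetical layout (a.txt, b.py and sub/c.txt) in a temporary directory and collects the results. The results are sorted because the iteration order is not guaranteed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: two files at the top level, one in a subdirectory.
    (root / 'sub').mkdir()
    for rel in ('a.txt', 'b.py', 'sub/c.txt'):
        (root / rel).write_text('demo', encoding='utf-8')

    # iterdir(): direct children only (files and directories)
    children = sorted(p.name for p in root.iterdir())
    # glob(): direct children matching the pattern
    top_txt = sorted(p.name for p in root.glob('*.txt'))
    # rglob(): matches at any depth below root
    all_txt = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.txt'))

print(children)  # ['a.txt', 'b.py', 'sub']
print(top_txt)   # ['a.txt']
print(all_txt)   # ['a.txt', 'sub/c.txt']
```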

Create file

file = Path(r'd:\test\11\22\test.py')
file.touch()                # touch() creates an empty file; the parent directory must already exist
#Traceback (most recent call last):
#  File "<input>", line 1, in <module>
# .....
#FileNotFoundError: [Errno 2] No such file or directory: 'd:\\test\\11\\22\\test.py'

p = Path(r'd:\test\11\22')
p.mkdir(exist_ok=True,parents=True)
file.touch()

file.exists()
# True

file.rename('33.py')            # rename or move the file; a relative target is resolved
                                # against the current working directory, not the file's folder
#Traceback (most recent call last):
#  File "<pyshell#4>", line 1, in <module>
#    file.rename('33.py')
#  File "C:\Program Files\Python38\lib\pathlib.py", line 1353, in rename
#    self._accessor.rename(self, target)
#PermissionError: [WinError 5] Access is denied: 'd:\\test\\11\\22\\test.py' -> '33.py'

file.rename(r'd:\test\11\22\33.py')   # give the full target path instead
# WindowsPath('d:/test/11/22/33.py')
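A portable sketch of the same sequence in a temporary directory. To keep the renamed file next to the original, it builds the target with with_name() rather than passing a bare file name; returning the new Path from rename() assumes Python 3.8 or later.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / '11' / '22'
    file = folder / 'test.py'

    try:
        file.touch()                  # fails: the parent folder does not exist yet
        touch_failed = False
    except FileNotFoundError:
        touch_failed = True

    folder.mkdir(parents=True, exist_ok=True)
    file.touch()
    existed = file.exists()

    # Build the target from the file's own folder, not from the cwd.
    target = file.with_name('33.py')
    renamed = file.rename(target)     # returns the new Path (Python 3.8+)
    moved = (not file.exists()) and target.exists() and renamed == target

print(touch_failed, existed, moved)   # True True True
```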

File Operations

p = Path(r'd:\test\tt.txt.bk')
p.name                          # file name
# tt.txt.bk
p.stem                          # file name without the last suffix
# tt.txt
p.suffix                        # last suffix
# .bk
p.suffixes                      # all suffixes
# ['.txt', '.bk']
p.parent                        # like os.path.dirname
# WindowsPath('d:/test')
p.parents                       # an immutable sequence of all ancestor directories
# <WindowsPath.parents>
for i in p.parents:
    print(i)
# d:\test
# d:\
p.parts                         # split the path at the separators into a tuple
# ('d:\\', 'test', 'tt.txt.bk')

p = Path('C:/Users/Administrator/Desktop/')
p.parent
# WindowsPath('C:/Users/Administrator')

p.parent.parent
# WindowsPath('C:/Users')


# Index 0 is the immediate parent; higher indices move toward the root
for x in p.parents: print(x)
# C:\Users\Administrator
# C:\Users
# C:\
# For more technical articles, visit https://www.liujiangblog.com

# with_name(name) replaces the last component and returns a new path
Path("/home/liujiangblog/test.py").with_name('python.txt')
# WindowsPath('/home/liujiangblog/python.txt')

# with_suffix(suffix) replaces the suffix (or appends one if there is none) and returns a new path
Path("/home/liujiangblog/test.py").with_suffix('.txt')
# WindowsPath('/home/liujiangblog/test.txt')

File information

p = Path(r'd:\test\tt.txt')
p.stat()                        # detailed information, as an os.stat_result
# os.stat_result(st_mode=33206, st_ino=562949953579011, st_dev=3870140380, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1525254557, st_mtime=1525254557, st_ctime=1525254557)
p.stat().st_size                # file size in bytes
# 0
p.stat().st_ctime               # creation time on Windows (on Unix: time of the last metadata change)
# 1525254557.2090347
# Other fields are accessed the same way
p.stat().st_mtime               # last modification time
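The timestamps in stat_result are floats (seconds since the epoch), so converting them with datetime is a common next step. A minimal sketch on a temporary file; the name tt.txt just mirrors the example above.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'tt.txt'
    p.write_bytes(b'hello')                 # 5 bytes

    info = p.stat()                         # an os.stat_result
    size = info.st_size                     # size in bytes
    modified = datetime.fromtimestamp(info.st_mtime)  # float seconds -> datetime

print(size)                                 # 5
print(modified.isoformat(timespec='seconds'))
```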

File read/write

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)

Used in a similar way to Python’s built-in open function, returning a file object.

p = Path('C:/Users/Administrator/Desktop/text.txt')
with p.open(encoding='utf-8') as f: 
    print(f.readline())  

read_bytes() : reads the file in 'rb' mode and returns data of type bytes

write_bytes(data) : writes data to the file in 'wb' mode

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.write_bytes(b'Binary file contents')
# 20
p.read_bytes()
# b'Binary file contents'

read_text(encoding=None, errors=None) : opens the file in 'r' mode and returns its contents as a string.

write_text(data, encoding=None, errors=None) : opens the file in 'w' mode and writes the string data to it.

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.write_text('Text file contents')
# 18
p.read_text()
# 'Text file contents'
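When text is not pure ASCII, passing encoding explicitly avoids depending on the platform's default encoding. This sketch writes, appends, and reads back a hypothetical notes.txt in a temporary directory; the non-ASCII word café is just test data.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'notes.txt'          # hypothetical file name

    # write_text() returns the number of characters written.
    written = p.write_text('café\n', encoding='utf-8')
    with p.open('a', encoding='utf-8') as f:   # append, just like the built-in open()
        f.write('second line\n')

    text = p.read_text(encoding='utf-8')
    raw = p.read_bytes()                 # the encoded bytes as stored on disk

print(written)      # 5
print(repr(text))   # 'café\nsecond line\n'
```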

Checking paths

The following methods all return a boolean:

  • is_dir() : whether the path is a directory

  • is_file() : whether it is a regular file

  • is_symlink() : whether it is a symbolic (soft) link

  • is_socket() : whether it is a socket file

  • is_block_device() : whether it is a block device

  • is_char_device() : whether it is a character device

  • is_absolute() : whether it is an absolute path (a pure check that does not touch the file system)

p = Path(r'd:\test')
p = Path(p, 'test.txt')           # join segments into d:\test\test.txt
p.exists()                      # does the path exist?
p.is_file()                     # is it a regular file?
p.is_dir()                      # is it a directory?
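A sketch of these checks on a directory, a file, and a path that does not exist (all inside a temporary directory; test.txt and missing.txt are example names). For a nonexistent path the checks simply return False instead of raising.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    f = d / 'test.txt'
    f.write_text('x', encoding='utf-8')
    missing = d / 'missing.txt'          # never created

    # Each tuple is (exists(), is_file(), is_dir()).
    checks = {
        'dir': (d.exists(), d.is_file(), d.is_dir()),
        'file': (f.exists(), f.is_file(), f.is_dir()),
        'missing': (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for name, result in checks.items():
    print(name, result)
# dir (True, False, True)
# file (True, True, False)
# missing (False, False, False)
```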

Joining and splitting paths

In pathlib, paths are joined with the / operator, which works in three combinations:

  • Path object / Path object
  • Path object / string
  • String / Path object

Splitting a path into its components is mainly done with the parts attribute.

p=Path()
p
# WindowsPath('.')
p = p / 'a'
p
# WindowsPath('a')
p = 'b' / p
p
# WindowsPath('b/a')
p2 = Path('c')
p = p2 / p
p
# WindowsPath('c/b/a')
p.parts
# ('c', 'b', 'a')
p.joinpath("c:","liujiangblog.com","jack")    # "c:" starts a new drive, so the earlier parts are discarded
# WindowsPath('c:liujiangblog.com/jack')
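The three combinations of the / operator, shown with PurePosixPath so the output is identical on every OS (all names are arbitrary examples). As with the constructor, an absolute right-hand side replaces everything on the left.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/srv')
p1 = base / 'app' / 'static'              # Path / string
p2 = 'logs' / PurePosixPath('today.txt')  # string / Path
p3 = base / PurePosixPath('data')         # Path / Path
p4 = base / '/etc/hosts'                  # an absolute right-hand side replaces the left

print(p1, p2, p3, p4)   # /srv/app/static logs/today.txt /srv/data /etc/hosts
print(p1.parts)         # ('/', 'srv', 'app', 'static')
```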


Wildcards

  • glob(pattern) : yields the paths under this directory that match the given glob pattern
  • rglob(pattern) : like glob(), but searches all subdirectories recursively

Return value: a generator

p=Path(r'd:\vue_learn')
p.glob('*.html')   # match all HTML files; returns a generator
# <generator object Path.glob at 0x000002ECA2199F90>
list(p.glob('*.html'))
# [WindowsPath('d:/vue_learn/base.html'), WindowsPath('d:/vue_learn/components.html'), WindowsPath('d:/vue_learn/demo.html').........................
g = p.rglob('*.html')   # recursive match
next(g)  
# WindowsPath('d:/vue_learn/base.html')
next(g)
# WindowsPath('d:/vue_learn/components.html')

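Pattern matching

The match() method tests the path against a glob-style pattern (not a regular expression) and returns True if it matches. A relative pattern is matched from the right-hand end of the path.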

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.match('*.txt')
# True
Path('C:/Users/Administrator/Desktop/text.txt').match('**/*.txt')
# True
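A few more cases show how relative and absolute patterns behave; the inputs follow the examples in the official pathlib documentation. Note that in match() the ** wildcard is not recursive and acts like a single *, which is why '**/*.txt' above still matched.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Relative pattern: matched from the right-hand end of the path.
print(PurePosixPath('/a/b/c.py').match('b/*.py'))   # True
print(PurePosixPath('/a/b/c.py').match('a/*.py'))   # False
# Absolute pattern: the whole path must match.
print(PurePosixPath('/a.py').match('/*.py'))        # True
print(PurePosixPath('a/b.py').match('/*.py'))       # False
# Windows-style paths match case-insensitively.
print(PureWindowsPath('b.py').match('*.PY'))        # True
```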