In day-to-day work, besides processing text files with Python, you will sometimes need to handle compressed files as well.

The compression formats you are most likely to encounter are:

  • rar: widely used on Windows; the best-known GUI tool is WinRAR.
  • tar: an archiving tool on Linux systems; it only packages files and does not compress them.
  • gz: i.e. gzip, which usually compresses a single file. Combined with tar, files can be packaged first and then compressed.
  • tgz: i.e. tar.gz, the file obtained by packaging with tar first and then compressing with gzip.
  • zip: different from gzip, although it uses a similar algorithm. It can package and compress multiple files, but it compresses each file separately, so its compression ratio is usually lower than that of tar.gz.
  • 7z: the format supported by the 7-Zip compression software, with a higher compression ratio.
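
To make the difference between packaging and compressing concrete, here is a minimal sketch that archives the same directory as tar, tar.gz and zip with the standard library's shutil.make_archive and reports the resulting sizes. The helper name archive_sizes and the sample file names are made up for illustration, and the exact sizes depend on your data.

```python
import os
import shutil
import tempfile


def archive_sizes(source_dir, work_dir):
    """Archive source_dir in several formats and return {format: size in bytes}."""
    sizes = {}
    for fmt in ("tar", "gztar", "zip"):
        base_name = os.path.join(work_dir, "demo_" + fmt)
        # make_archive appends the extension and returns the final archive path
        archive_path = shutil.make_archive(base_name, fmt, root_dir=source_dir)
        sizes[fmt] = os.path.getsize(archive_path)
    return sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        # Three highly repetitive sample files (hypothetical demo data)
        for i in range(3):
            with open(os.path.join(src, f"sample_{i}.txt"), "w") as f:
                f.write("hello compression\n" * 2000)
        for fmt, size in archive_sizes(src, out).items():
            print(fmt, size)
```

On repetitive input like this, the plain tar archive should come out noticeably larger than the gztar and zip archives, because tar alone does not compress.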

Of course, besides Python, you can also use dedicated compression software or command-line tools.

zip file

zipfile is the Python module for compressing and decompressing zip files. It has two very important classes: ZipFile and ZipInfo. ZipFile is the main class, used to create and read zip files, while ZipInfo stores the information about each file inside the zip file.
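
As a small illustration of how the two classes relate, the sketch below builds a tiny archive and then reads the per-file metadata back through ZipInfo objects. The function name describe_zip and the file names are hypothetical, chosen only for this example.

```python
import os
import tempfile
import zipfile


def describe_zip(zip_path):
    """Return (name, original size, compressed size) for every archive entry."""
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() returns one ZipInfo object per archive member
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "demo.zip")  # hypothetical file name
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            # writestr() adds a member straight from in-memory data
            zf.writestr("notes/readme.txt", "zip " * 1000)
        for name, size, packed in describe_zip(zip_path):
            print(name, size, packed)
```

Note that ZipFile stores files uncompressed (ZIP_STORED) unless a compression method such as zipfile.ZIP_DEFLATED is passed, which is why the sketch sets it explicitly.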

Sample code.

import os
import zipfile


# Compress
def make_zip(source_dir, output_filename):
    # ZIP_DEFLATED enables compression; the default (ZIP_STORED) only packages
    pre_len = len(os.path.dirname(source_dir))
    with zipfile.ZipFile(output_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for parent, dirnames, filenames in os.walk(source_dir):
            for filename in filenames:
                print(filename)
                pathfile = os.path.join(parent, filename)
                # Path relative to the parent of source_dir
                arcname = pathfile[pre_len:].strip(os.path.sep)
                zipf.write(pathfile, arcname)
            print()


# Decompress
def un_zip(file_name):
    """Unzip a zip file into the directory <file_name>_files."""
    target_dir = file_name + "_files"
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(file_name) as zip_file:
        for name in zip_file.namelist():
            zip_file.extract(name, target_dir)


if __name__ == '__main__':
    make_zip(r"E:\python_sample\libs\test_tar_files\libs", "test.zip")
    un_zip("test.zip")

tar.gz file

The tarfile module can read and write tar archives, including archives compressed with gzip, bz2 and lzma. To use it, you need to understand the mode argument of tarfile.open().

mode must be a string of the form 'filemode[:compression]' and defaults to 'r'. The complete list of mode combinations is:

Mode            Action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'r:xz'          Open for reading with lzma compression.
'x' or 'x:'     Create a tarfile without compression. Raises FileExistsError if the file already exists.
'x:gz'          Create a tarfile with gzip compression. Raises FileExistsError if the file already exists.
'x:bz2'         Create a tarfile with bzip2 compression. Raises FileExistsError if the file already exists.
'x:xz'          Create a tarfile with lzma compression. Raises FileExistsError if the file already exists.
'a' or 'a:'     Open for appending without compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip-compressed writing.
'w:bz2'         Open for bzip2-compressed writing.
'w:xz'          Open for lzma-compressed writing.
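
The following sketch illustrates two of these modes: 'x:gz' for exclusive creation and 'r:*' for reading with automatic detection of the compression. The helper names create_once and list_members and the file names are hypothetical, used only for this illustration.

```python
import os
import tarfile
import tempfile


def create_once(archive_path, file_path):
    """Create a gzip-compressed tar archive; refuse to overwrite an existing one."""
    try:
        # "x:gz" is exclusive creation: it fails if archive_path already exists
        with tarfile.open(archive_path, "x:gz") as tar:
            tar.add(file_path, arcname=os.path.basename(file_path))
        return True
    except FileExistsError:
        return False


def list_members(archive_path):
    """List member names; "r:*" lets tarfile detect the compression itself."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data.txt")  # hypothetical sample file
        with open(data, "w") as f:
            f.write("example\n")
        archive = os.path.join(tmp, "demo.tar.gz")
        print(create_once(archive, data))  # first call creates the archive
        print(create_once(archive, data))  # second call hits FileExistsError
        print(list_members(archive))
```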

For special purposes there is a second mode format: 'filemode|[compression]'. With it, tarfile.open() returns a TarFile object that processes its data as a stream of blocks.

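Mode       Action
'r|*'      Open a stream of tar blocks for reading with transparent compression.
'r|'       Open a stream of uncompressed tar blocks for reading.
'r|gz'     Open a gzip-compressed stream for reading.
'r|bz2'    Open a bzip2-compressed stream for reading.
'r|xz'     Open an lzma-compressed stream for reading.
'w|'       Open an uncompressed stream for writing.
'w|gz'     Open a gzip-compressed stream for writing.
'w|bz2'    Open a bzip2-compressed stream for writing.
'w|xz'     Open an lzma-compressed stream for writing.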
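
Stream modes are useful when the data cannot be seeked, for example a pipe or a socket. As a rough sketch, the code below uses an in-memory io.BytesIO object as a stand-in for such a stream; the function names pack_to_bytes and unpack_from_bytes are invented for this example.

```python
import io
import tarfile


def pack_to_bytes(files):
    """Write {name: bytes} into a gzip-compressed tar stream held in memory."""
    buffer = io.BytesIO()
    # "w|gz" writes sequentially and never seeks in fileobj
    with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack_from_bytes(blob):
    """Read the members back from a stream, in the order they were written."""
    result = {}
    # "r|*" reads a stream and detects the compression transparently
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|*") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:  # None for entries that are not regular files
                result[member.name] = extracted.read()
    return result


if __name__ == "__main__":
    blob = pack_to_bytes({"a.txt": b"alpha", "dir/b.txt": b"beta"})
    print(unpack_from_bytes(blob))
```

In stream mode each member has to be read before moving on to the next one, which is why the loop reads the data immediately instead of collecting the members first.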

Code example.

import os
import tarfile
import gzip


# Package the whole root directory in one call. Empty subdirectories are included.
# To package without compressing, change "w:gz" to "w:" or "w".
def make_targz(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


# Add files one by one. Empty subdirectories are not included. Files can be filtered here.
# To package without compressing, change "w:gz" to "w:" or "w".
def make_targz_one_by_one(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for root, dirs, files in os.walk(source_dir):
            for file in files:
                pathfile = os.path.join(root, file)
                tar.add(pathfile)


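def un_gz(file_name):
    """Decompress a .gz file."""
    # Output name: the input name without its trailing ".gz"
    # (".out" is an arbitrary fallback suffix for names that do not end in ".gz")
    f_name = file_name[:-3] if file_name.endswith(".gz") else file_name + ".out"
    # Open the gzip file and write the decompressed bytes to the new file
    with gzip.open(file_name, "rb") as g_file, open(f_name, "wb") as out:
        out.write(g_file.read())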


def un_tar(file_name):
    """Extract a tar file into the directory <file_name>_files."""
    target_dir = file_name + "_files"
    # The archive usually holds many files, so create a folder for them first
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(file_name) as tar:
        for name in tar.getnames():
            tar.extract(name, target_dir)


if __name__ == '__main__':
    # Raw strings keep the backslashes in Windows paths from being read as escapes
    make_targz('test.tar.gz', r"E:\python_sample\libs")
    make_targz_one_by_one('test01.tgz', r"E:\python_sample\libs")
    un_gz("test.tar.gz")
    un_tar("test.tar")
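
The example above decompresses with gzip first and then unpacks with tarfile. Since tarfile can read gzip-compressed archives directly, the two steps can also be combined. The following is a minimal sketch under that assumption; the function name extract_targz is hypothetical, and the path check is a simplified safeguard against members that point outside the target directory, not a complete defence (for instance, it does not inspect symbolic links).

```python
import os
import tarfile
import tempfile


def extract_targz(archive_path, target_dir):
    """Extract a .tar.gz archive in one step, skipping unsafe member paths."""
    os.makedirs(target_dir, exist_ok=True)
    root = os.path.realpath(target_dir)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            dest = os.path.realpath(os.path.join(root, member.name))
            # Skip members that would land outside target_dir (e.g. "../x")
            if os.path.commonpath([root, dest]) != root:
                continue
            tar.extract(member, root)
            extracted.append(member.name)
    return extracted
```

Calling extract_targz("test.tar.gz", "test_files") would then replace the un_gz and un_tar pair for archives that are trusted to be well formed.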

rar file

The rarfile module can decompress .rar files, but it does not support creating them. rarfile relies on the unrar component, and after installing that with pip install unrar, running the code may report the following error:

Couldn’t find path to unrar library…

This is because the Python unrar package in turn depends on the official RAR library, which has to be installed separately.

Installation on Windows

  • Download the official library installer from RARLab, https://www.rarlab.com/rar/UnRARDLL.exe, and install it.
  • It is best to install to the default path, usually C:\Program Files (x86)\UnrarDLL\.
  • Add an environment variable: create a new system variable named UNRAR_LIB_PATH. On a 64-bit system set it to C:\Program Files (x86)\UnrarDLL\x64\UnRAR64.dll; on a 32-bit system set it to C:\Program Files (x86)\UnrarDLL\UnRAR.dll.
  • After saving the environment variable, run pip install unrar. The code should then run without the error.

Installation on Linux

  • Download the UnRAR source package (the commands below use https://www.rarlab.com/rar/unrarsrc-6.0.3.tar.gz).
  • Unpack it, enter the extracted directory, then compile and install to generate the .so file.
  • Configure the environment variable, and once that is done, run pip install unrar.
# cd /usr/local/src/
# wget https://www.rarlab.com/rar/unrarsrc-6.0.3.tar.gz
# tar zxvf unrarsrc-6.0.3.tar.gz
# cd unrar
# make lib
# make install-lib    (generates the libunrar.so file)
# vim /etc/profile    (add the following line to the file)
export UNRAR_LIB_PATH=/usr/lib/libunrar.so
# source /etc/profile

Code example.

import rarfile


def unrar(rar_file, dir_name):
    # rarfile needs unrar support: on Linux, pip install unrar; on Windows,
    # find unrar in the WinRAR folder and add it to PATH
    # In Python 3 the paths are already str, so no .decode() call is needed
    with rarfile.RarFile(rar_file) as rarobj:
        rarobj.extractall(dir_name)

7z file

To compress and decompress .7z files, use the py7zr package. Code example.

import py7zr

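# Decompress: open the archive in 'r' mode and extract everything to /tmp
with py7zr.SevenZipFile("Archive.7z", 'r') as archive:
    archive.extractall(path="/tmp")

# Compress: open the archive in 'w' mode and add the whole target/ directory
with py7zr.SevenZipFile("Archive.7z", 'w') as archive:
    archive.writeall("target/")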