The three main options for protecting Python source code are code obfuscation, modifying the interpreter, and compiling to binary; other methods are basically ineffective. The safest of the three is compiling py files with Cython (though you need to watch out for compatibility issues).

The method is simple: write a setup.py file first.

from distutils.core import setup
import glob

from Cython.Build import cythonize


py_files = glob.glob('*/**/*.py', recursive=True)  # skip top-level py files: so files cannot be run directly with python -m
setup(
    ext_modules=cythonize(
        py_files,
        nthreads=4  # compile py to c with 4 processes in parallel; set according to your CPU core count
    )
)

If needed, refer to the Cython documentation and add compiler_directives.
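As a quick sanity check on the pattern above (directory and file names here are made up), the recursive glob '*/**/*.py' only matches files at least one directory deep, which is exactly what leaves the top-level py files uncompiled:

```python
import glob
import os
import tempfile

# Build a throwaway layout: one top-level py file, two nested ones.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, 'pkg', 'sub'))
for rel in ['app.py',
            os.path.join('pkg', 'mod.py'),
            os.path.join('pkg', 'sub', 'deep.py')]:
    open(os.path.join(tmp, rel), 'w').close()

cwd = os.getcwd()
os.chdir(tmp)
try:
    # Same pattern as setup.py: '*' forces at least one directory level,
    # '**' (with recursive=True) then matches zero or more further levels.
    matched = sorted(glob.glob('*/**/*.py', recursive=True))
finally:
    os.chdir(cwd)

print(matched)  # app.py is absent; both nested files are matched
```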

Then execute python setup.py build_ext --inplace -j 4 to start compiling, where --inplace puts the compiled c and so files alongside the py files, and -j 4 compiles c to so with 4 processes in parallel. When compilation finishes, delete the py and c files and you are ready to release. Execution works just like a normal Python project, e.g. python -m app. If the top-level app.py is itself sensitive, move the logic into a lower-level package and have app.py contain only import xxx.app.

These steps are cumbersome to do by hand, but can be automated with a few lines of GitLab CI configuration. As the project grew, however, a new problem emerged: the builds were taking longer and longer, eventually up to half an hour.

So I looked into Cython's build cache and found that it checks whether each py file to be built already has a corresponding so file; if it does, and the so file is newer than the py file, it skips compiling that py file. When I compiled inside docker, I only COPYed the source code, not the so files, so naturally everything had to be recompiled. But if I COPY the so files directly, their modification times will be later than the py files', and even if I change the code nothing gets recompiled, which obviously doesn't work either.
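The skip rule can be mimicked with a plain mtime comparison. This is a simplification of my own (needs_recompile is not Cython's API), but it captures why the COPY order matters:

```python
import os
import tempfile
import time

def needs_recompile(py_path, so_path):
    # Simplified cache check: rebuild when there is no .so yet,
    # or when the .py was modified after the .so was produced.
    if not os.path.exists(so_path):
        return True
    return os.path.getmtime(py_path) > os.path.getmtime(so_path)

tmp = tempfile.mkdtemp()
py = os.path.join(tmp, 'mod.py')
so = os.path.join(tmp, 'mod.cpython-39-x86_64-linux-gnu.so')

open(py, 'w').close()
print(needs_recompile(py, so))  # True: no .so exists yet

open(so, 'w').close()           # .so created after .py, i.e. up to date
print(needs_recompile(py, so))  # False: compilation is skipped

time.sleep(0.1)
os.utime(py)                    # simulate editing the source
print(needs_recompile(py, so))  # True: the .py is newer again
```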

To resolve this dilemma, I thought of mounting a volume and keeping both the source and so files on the host, so the files' modification times would be preserved. Before compiling, some synchronization work is also needed: compare the fresh source folder to be compiled with the previously compiled folder in the volume, delete the py, c and so files that no longer exist, and copy over the new or modified py files. The new py files then end up newer than their so files and get recompiled, while everything else is skipped. This solution looks perfect, but docker build doesn't support mounting volumes, which is a real bummer.
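The synchronization step could be sketched like this (sync and py_source_for are my own names, and I assume directory names contain no dots so the .so ABI tag can be stripped naively):

```python
import filecmp
import os
import pathlib
import shutil
import tempfile

def py_source_for(rel):
    # Map a cached .py/.c/.so path back to its source .py; for .so files
    # strip the ABI tag (pkg/mod.cpython-39-x86_64-linux-gnu.so -> pkg/mod).
    base, ext = os.path.splitext(rel)
    if ext == '.so':
        base = base.split('.', 1)[0]
    return base + '.py'

def sync(src, cache):
    # 1. Drop cached py/c/so files whose source .py no longer exists.
    for root, _dirs, files in os.walk(cache, topdown=False):
        for name in files:
            if not name.endswith(('.py', '.c', '.so')):
                continue
            rel = os.path.relpath(os.path.join(root, name), cache)
            if not os.path.exists(os.path.join(src, py_source_for(rel))):
                os.remove(os.path.join(root, name))
    # 2. Copy new or modified .py files. Unchanged files are not touched,
    #    so their old mtime survives, the .so stays newer, and Cython
    #    skips them; copied files get a fresh mtime and are recompiled.
    for root, _dirs, files in os.walk(src):
        for name in files:
            if not name.endswith('.py'):
                continue
            rel = os.path.relpath(os.path.join(root, name), src)
            dst = os.path.join(cache, rel)
            if not os.path.exists(dst) or not filecmp.cmp(
                    os.path.join(src, rel), dst, shallow=False):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy(os.path.join(src, rel), dst)

# Demo: 'a' is unchanged, 'b' was modified, 'gone' was deleted from src.
src, cache = tempfile.mkdtemp(), tempfile.mkdtemp()
os.makedirs(os.path.join(src, 'pkg'))
os.makedirs(os.path.join(cache, 'pkg'))
pathlib.Path(src, 'pkg', 'a.py').write_text('A')
pathlib.Path(src, 'pkg', 'b.py').write_text('B2')
for name in ('a.py', 'b.py', 'gone.py',
             'gone.cpython-39-x86_64-linux-gnu.so'):
    pathlib.Path(cache, 'pkg', name).write_text('A' if name == 'a.py' else '')
sync(src, cache)
left = sorted(os.listdir(os.path.join(cache, 'pkg')))
print(left)  # ['a.py', 'b.py'] — the stale gone.* files are removed
```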

I had no choice but to use docker run to mount the volume, compile, copy the required files from the volume (i.e. the top-level py files and the so files below them) into the container, and then use docker commit to save it as a new image. Note: if you're building locally with docker, you can just use COPY or ADD in the Dockerfile; GitLab CI, however, uses docker-in-docker, so you can't access paths on the host directly.

To do this, you need to write two Dockerfiles, one for compiling and one for packaging.

build:
  script:
    - docker build -f Dockerfile_compile -t xxx-compile .
    - docker rm xxx_compile || true
    - docker run --name xxx_compile -v $CACHE_DIR:/root/xxx xxx-compile python /root/setup.py build_ext --inplace -j 4
    - docker commit xxx_compile xxx-compile
    - docker rm xxx_compile || true
    - docker build -f Dockerfile_package -t $IMAGE_NAME .

What Dockerfile_package does is copy /usr/local/lib and the compiled files from xxx-compile into a fresh image, which shrinks the image and avoids keeping intermediate layers. In my test, the compile time dropped from half an hour to half a minute. Done!
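For illustration, Dockerfile_package might look roughly like this. The base image, the WORKDIR and the entry module are assumptions of mine; only COPY --from, the xxx-compile image name and the /root/xxx path come from the setup above:

```dockerfile
FROM python:3.9-slim
# Pull the installed dependencies and the compiled project out of the
# committed xxx-compile image instead of rebuilding them.
COPY --from=xxx-compile /usr/local/lib /usr/local/lib
COPY --from=xxx-compile /root/xxx /root/xxx
WORKDIR /root/xxx
CMD ["python", "-m", "app"]
```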

Later I came up with a new solution: I don't actually need to preserve the py and so modification times, I just need to make sure every py file I don't want to compile has a newer so file. So the first build step compiles everything from scratch, as in the original plan, and saves the result as an intermediate image. In the second build step, COPY the files from the intermediate image into another folder, sync them against the new source, and cp the so files that don't need recompiling to the corresponding paths; when compilation completes, save this as an intermediate image as well. In the third build step, copy the files from the intermediate image into the final image. This solution neither exposes a host folder nor uses docker commit, a command people with a taste for clean code would rather avoid. Perfect!
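The three build steps could chain in CI along these lines (the Dockerfile names and the xxx-source/xxx-intermediate tags are placeholders of mine, mirroring the earlier script):

```yaml
build:
  script:
    # 1st build: bake the fresh source into a throwaway image
    - docker build -f Dockerfile_source -t xxx-source .
    # 2nd build: COPY from xxx-source and from the previous run's
    #            xxx-intermediate, sync, compile only what changed, retag
    - docker build -f Dockerfile_compile -t xxx-intermediate .
    # 3rd build: COPY the finished top-level py and so files into the
    #            final image
    - docker build -f Dockerfile_package -t $IMAGE_NAME .
```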