The three main options for Python source protection are code obfuscation, modifying the interpreter, and compiling to binary; the other methods are basically ineffective. The safest of these three options is to compile py files with Cython (but you need to be careful about compatibility).
The method is simple: first write a `setup.py` file (refer to the Cython documentation if needed), then run `python setup.py build_ext --inplace -j 4` to start compiling. Here `--inplace` puts the generated c and so files alongside the py files, and `-j 4` compiles the c files to so files with 4 processes in parallel.
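A minimal `setup.py` for this might look as follows; the project name and module globs are placeholders for your own layout, not anything the post prescribes:

```python
# setup.py -- minimal sketch; adjust the module globs to your project.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="app",                     # placeholder project name
    ext_modules=cythonize(
        ["app.py", "xxx/**/*.py"],  # hypothetical source layout
        language_level=3,           # treat the sources as Python 3
    ),
)
```

`cythonize` accepts glob patterns, so one entry can pull in a whole package tree.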
When compilation is done, delete the py and c files and you're ready to release. The result runs just like a normal Python project, e.g. with `python -m app`.
If the top-level `app.py` is itself sensitive, you can move it into a lower-level package and leave only `import xxx.app` in the top-level file.
These steps are cumbersome to do by hand, but a few lines of GitLab CI configuration can automate the build. As the project grew, however, a new problem emerged: the builds took longer and longer, eventually up to half an hour.
So I looked into Cython's build cache and found that it checks whether the py file to be built already has a corresponding so file; if it does, and the so file is newer than the py file, it skips compiling that py file. When I compiled with Docker I had only `COPY`ed the source code, not the so files, so naturally everything was recompiled.
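The skip check boils down to a timestamp comparison, roughly like this sketch (not Cython's actual code; real so files also carry an ABI tag such as `.cpython-39-x86_64-linux-gnu.so` in their names):

```python
import os

def needs_compile(py_path, so_path):
    """Recompile when the so file is missing or not newer than the py file."""
    if not os.path.exists(so_path):
        return True
    return os.path.getmtime(py_path) >= os.path.getmtime(so_path)
```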
But if I `COPY` the so files directly, they end up with a later modification time than the py files, so even when I change the code nothing gets recompiled, which obviously doesn't work either. To resolve this dilemma, I thought of mounting a volume and keeping both the source and the so files on the host, so that the files' modification times wouldn't be lost.
Before compiling, I also need to do some synchronization work, i.e., compare the new source folder that needs to be compiled with the last compiled folder in the volume, delete the py, c and so files that no longer exist, and copy the new or modified py files. This way the new py files will be modified later than so and will be recompiled, while the other py files will be skipped.
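A sketch of that sync step, assuming a plain two-folder layout (`src` fresh from the repository, `build` persisted in the volume); the ABI-tagged so names are matched by glob:

```python
import filecmp
import shutil
from pathlib import Path

def sync_sources(src: Path, build: Path) -> None:
    """Sync the fresh source tree into the previous build tree so that
    only new or changed py files end up newer than their so files."""
    # 1. Remove artifacts whose py source no longer exists in src.
    for py in list(build.rglob("*.py")):
        rel = py.relative_to(build)
        if not (src / rel).exists():
            py.with_suffix(".c").unlink(missing_ok=True)
            # Cython names its output with an ABI tag,
            # e.g. foo.cpython-39-x86_64-linux-gnu.so
            for so in py.parent.glob(py.stem + ".*.so"):
                so.unlink()
            py.unlink()
    # 2. Copy new or modified py files; shutil.copy() stamps the current
    #    time, so exactly these files become newer than their so files.
    for py in src.rglob("*.py"):
        rel = py.relative_to(src)
        dest = build / rel
        if not dest.exists() or not filecmp.cmp(py, dest, shallow=False):
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(py, dest)
```

After this runs, a normal `build_ext` invocation recompiles only the files touched in step 2.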
This solution looks perfect, but `docker build` doesn't support mounting volumes, which is a real bummer.
I had no choice but to use `docker run` to mount the volume, compile inside the container, copy the required files from the volume (i.e. the top-level py files and the so files in the levels below) into the image, and then save the new image with `docker commit`. Note that if you're building locally with Docker, a Dockerfile is enough; GitLab CI, however, uses Docker-in-Docker, so you can't access host paths directly. For that, you need to write two Dockerfiles, one for compiling and one for packaging.
What `Dockerfile_package` does is copy `/usr/local/lib` and the compiled files from the `xxx-compile` image into a new image, which reduces the image size and avoids saving the intermediate layers.
I tested it and found that the compile time was reduced from half an hour to half a minute, done!
At this point I came up with a new solution: I don't actually need to preserve the py and so modification times; I only need to ensure that every py file I don't want to recompile has an so file newer than it.
So the first build step compiles everything from scratch, as in the original plan, and saves the result as an intermediate image. The second build step `COPY`s the files from that intermediate image into another folder, runs the sync, and `cp`s the so files that don't need recompiling to their corresponding paths; when compilation completes, the result is again saved as an intermediate image. The third build step then copies the files from the intermediate image into the new image.
This solution neither exposes a host folder nor relies on `docker commit`, a command that people with code cleanliness would rather not use. Perfect!