This article describes the proper way to vendor third-party libraries inside a Python library. The audience for this article is very narrow, and most Python developers neither know nor need this technique, but in the spirit of sharing, I’ll summarize it here. As the author of a piece of software, you should respect the work of all other library authors.
WHAT - What is vendoring?
Vendoring is a way of embedding third-party library code directly into your software (it is common in languages like C and Go). It differs from declaring the library in a dependency file in that the third-party code is shipped inside the software itself and may or may not be kept as is. You therefore need to be aware of the various license restrictions: if the upstream library is under a GPL-family license, software that vendors it becomes subject to that license's copyleft ("contagion") terms.
WHY - When should I vendor in Python?
As I said at the beginning, the scope is very narrow. There are three scenarios.
1. The nature of the software requires it to be self-contained with zero dependencies. In the Python world, the library that uses vendoring most heavily is pip, which we use every day. It vendors 25 dependencies. pip is the current standard Python installer, so it can’t have any dependencies: they would have to be installed before pip itself could be installed, yet they can only be installed through pip, which is circular. The same applies to some basic build tools.
2. The software depends on a specific version of an upstream library. This also includes cases where the upstream library makes breaking changes frequently, leaving its API unstable. If you simply pin third-party-lib==1.0.0 as a dependency, it will cause a dependency conflict with any other software that relies on the same library and cannot resolve to that version. Switching to vendoring removes this very strict version constraint.
3. The software needs to make some changes to the upstream library, and because of the upstream library's maintenance status, these changes cannot be merged and released upstream through a PR or other means. Where the open source license permits, you can embed the source code in your software through vendoring and modify it yourself.
In fact, for scenarios 2 and 3 above, you don’t have to vendor. You can instead fork the library into your own git repository and depend on it as a git dependency, or publish it as a new PyPI package. Vendoring is just one of the easiest ways to do this.
- There is one more constraint: in Python, only pure Python libraries can be vendored.
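One rough way to check this constraint before vendoring is to look for compiled extension modules among a library's files. The sketch below is only a heuristic under that assumption: the function name is_pure_python and the suffix list are made up for illustration and are not part of any tool mentioned in this article.

```python
from pathlib import PurePosixPath

# File suffixes that usually indicate compiled extension modules.
# This list is an assumption for illustration and is not exhaustive.
COMPILED_SUFFIXES = {".so", ".pyd", ".dylib", ".dll"}


def is_pure_python(file_names):
    """Return True if none of the given files looks like a compiled extension."""
    return not any(
        PurePosixPath(name).suffix.lower() in COMPILED_SUFFIXES
        for name in file_names
    )


# Only .py files: looks pure Python.
print(is_pure_python(["requests/__init__.py", "requests/api.py"]))
# Contains a compiled extension module: not pure Python.
print(is_pure_python(["pkg/__init__.py", "pkg/_speedups.cpython-311-x86_64-linux-gnu.so"]))
```

For an installed distribution, the file list could be taken from importlib.metadata.files() in the standard library, converting each entry to a string first.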
HOW - How should I vendor?
Vendoring is not a simple copy-and-paste job. In my opinion, you have to pay attention to the following two points.
- Vendoring must comply with the open source license, and the license files must be placed in the vendor directory as well.
- When you change the source code, record the changes as patch files so that, when the time is right, they can be contributed back upstream.
So vendoring is not copy-and-paste but a compromise with the status quo within the open source framework, and our ultimate goal is to eliminate the vendored code.
In Python, in addition to putting the vendored libraries in a directory inside the code base (e.g. mypackage/vendor), you need to rewrite all import statements to point to this directory. For example, change import requests to from mypackage.vendor import requests. PDM also contains such a directory, and I use the same tool as pip to manage the vendored libraries. This tool is vendoring, and it is very poorly documented (because nobody wants to use it). It provides the following functions.
- read a requirements.txt and download the dependencies to the specified directory
- download the LICENSE files of all libraries into this directory
- read the patch file from a specified path and apply it to the source code
- rewrite all import statements to point to the vendor directory
- update the vendor version
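To make the import-rewriting step concrete, here is a minimal sketch of the idea. It is not how the vendoring tool is implemented; the function rewrite_imports and its parameters are hypothetical, and it only handles the simplest forms of import statements.

```python
import re


def rewrite_imports(source, namespace, vendored):
    """Point simple imports of vendored libraries at `namespace`.

    Only handles `import X` and `from X[.sub] import ...`; forms such as
    `import X as Y` or `import X, Z` are left untouched in this sketch.
    """
    names = "|".join(re.escape(name) for name in vendored)
    # "import requests" -> "from mypackage.vendor import requests"
    source = re.sub(
        rf"^([ \t]*)import ({names})[ \t]*$",
        rf"\g<1>from {namespace} import \g<2>",
        source,
        flags=re.MULTILINE,
    )
    # "from requests.adapters import X"
    #   -> "from mypackage.vendor.requests.adapters import X"
    source = re.sub(
        rf"^([ \t]*)from ({names})(\.|[ \t])",
        rf"\g<1>from {namespace}.\g<2>\g<3>",
        source,
        flags=re.MULTILINE,
    )
    return source


example = (
    "import requests\n"
    "from requests.adapters import HTTPAdapter\n"
    "import os\n"
)
print(rewrite_imports(example, "mypackage.vendor", ["requests"]))
```

Imports of libraries that are not in the vendored list, such as import os here, are left alone.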
The procedure is roughly the same as above. First create a mypackage/vendor directory, then create a vendors.txt in it and fill in the dependencies. Next, in pyproject.toml under the project root path, add the configuration for the tool. Finally, run vendoring sync and you’ll have the vendored libraries all ready to go automatically.
A patch file is simply the output of git diff, with which git can recreate the modified vendor directory from the original source code. To generate the patch:
1. Run vendoring sync once after configuration and commit the files to the local repository (commit only, do not push).
2. Modify the source code.
3. Run git diff --patch <file_path> > <patches_dir>/<file_name>.patch to save the patch file.
4. Review the patch file and revert any rewritten import statements to the original import statements, e.g. from mypackage.vendor import requests to import requests. The reason is that patches are applied before imports are rewritten, so the patch file must contain the unrewritten import statements. Be careful not to change any whitespace characters while editing, because patch files are sensitive to whitespace.
5. Run git add . && git commit --amend to commit the changes.
6. Run vendoring sync again to verify. If everything works, there should be no changes, which means the vendoring process is reproducible.
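The review step above (reverting rewritten imports inside the patch file) is easy to get wrong by hand, so it can help to script it. The following is a rough sketch under the assumption that the patch is a plain unified diff; the function restore_imports_in_patch is hypothetical and not part of the vendoring tool.

```python
import re


def restore_imports_in_patch(patch_text, namespace):
    """Undo vendored-import rewriting on the content lines of a unified diff.

    Whitespace is preserved exactly. Limitation of this sketch: a content
    line that itself begins with "-- " or "++ " would be mistaken for a
    file header and skipped.
    """
    ns = re.escape(namespace)
    restored = []
    for line in patch_text.splitlines(keepends=True):
        # Leave file headers and hunk headers untouched.
        if line.startswith(("diff ", "index ", "--- ", "+++ ", "@@")):
            restored.append(line)
            continue
        # The first character is the diff marker: "+", "-" or " " (context).
        marker, body = line[:1], line[1:]
        # "from mypackage.vendor import requests" -> "import requests"
        body = re.sub(
            rf"^([ \t]*)from {ns} import (\w+)(?=[ \t]*$)",
            r"\g<1>import \g<2>",
            body,
        )
        # "from mypackage.vendor.requests.x import Y" -> "from requests.x import Y"
        body = re.sub(rf"^([ \t]*)from {ns}\.(\w)", r"\g<1>from \g<2>", body)
        restored.append(marker + body)
    return "".join(restored)


patch = (
    "--- a/mypackage/vendor/foo.py\n"
    "+++ b/mypackage/vendor/foo.py\n"
    "@@ -1,2 +1,3 @@\n"
    " from mypackage.vendor import requests\n"
    "+from mypackage.vendor.requests.adapters import HTTPAdapter\n"
    " x = 1\n"
)
print(restore_imports_in_patch(patch, "mypackage.vendor"))
```

Context lines are converted as well as added and removed lines, since the patch has to match the source as it looks before the imports are rewritten.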