How does Python find packages?

Preface

I’m writing this article because I’ve recently seen a few questions in the Python community that are very frequently asked.

I installed pip, but why does running it report that the executable is not found?
Why does import module throw a ModuleNotFound exception?
Why can I run it with Pycharm but not with cmd?

To solve these kinds of problems, you need to know how Python finds packages. I hope reading this article will help.

How Python Finds Packages

Nowadays, it’s likely that everyone has more than just one Python on their computer, and more virtual environments, leading to package installations where you accidentally forget to pay attention to the path to the installation package. First let’s tackle the problem of finding packages, which is a simple question to answer, but one that many people don’t know how it works. If the path to your Python interpreter is $path_prefix/bin/python, then when you start your Python interactive environment or run a script with this interpreter, it will look for the following locations by default.

$path_prefix/lib (the standard library path)
$path_prefix/lib/pythonX.Y/site-packages (Third-party library paths, X.Y is the major and minor version number of Python, e.g. 3.7, 2.6)
the current working directory (the result of the pwd command)

Here $path_prefix is /usr if you’re using the default Python on Linux, or $path_prefix is /usr/local if you’re compiling with the default options yourself. If you upgrade Python from 3.6 to 3.7, you won’t be able to use any of the previously installed triple libraries. Of course, you can copy the entire folder, and you’ll be fine in most cases.

A few useful functions

sys.executable : the path to the Python interpreter currently in use
sys.path : a list of search paths for the current package
sys.prefix : the currently used $path_prefix

Example:

>>> import sys
>>> sys.executable
'/home/frostming/.pyenv/versions/3.7.2/bin/python'
>>> sys.path
['', '/home/frostming/.pyenv/versions/3.7.2/lib/python37.zip', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload', '/home/frostming/.local/lib/python3.7/site-packages', '/mnt/d/Workspace/pipenv', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7/site-packages']
>>> sys.prefix
'/home/frostming/.pyenv/versions/3.7.2'

Adding search paths using environment variables

If your package’s path does not exist in the search path list above, you can add it to the PYTHONPATH environment variable, separating multiple paths with : (Windows uses ;).

Note that you should avoid adding paths to different Python packages to PYTHONPATH, such as PYTHONPATH=/home/frostming/.local/lib/python2.7/site-packages, because the paths in PYTHONPATH take precedence over the default search path, and there are compatibility issues if you use Python 3. In fact, it’s best not to have any paths with site-packages in PYTHONPATH.

By the way, PATH is the search path used to find executable programs. If you run the command my_cmd in the terminal, the system will scan the paths in PATH in turn to see if my_cmd exists in that path, so if it says that the program is not found or the command is not recognized, then you need to see if the path is added to PATH.

How Python installs packages

Nowadays, you basically use pip to install Python packages, even if you use pipenv, poetry, the underlying pip still applies. If you don’t have pip installed please refer to here, if you have it installed and still can’t use the pip command please refer to the previous section.

There are two ways to run pip.

pip ...
python -m pip ...

The first way is similar to the second way, except that the Python interpreter used in the first way is written in the shebang of the pip file, and in general, if your pip path is $path_prefix/bin/pip, then the Python path is $path_prefix/bin /python. If you’re using a Unix system, the first line of cat $(which pip) contains the path to the Python interpreter. The second way explicitly specifies the location of Python. This rule holds true for all Python executables. The flow is shown in the following figure.

flow

Then, without any customization, the package will be automatically installed under $path_prefix/lib/pythonX.Y/site-packages using pip ($path_prefix is from the previous paragraph), and the executable will be installed under $path_prefix/bin, if If you need to run it directly from the command line using my_cmd, remember to add it to PATH.

Options for changing the installation location in pip

-prefix PATH, replace $path_prefix with the given value
--root ROOT_PATH, add ROOT_PATH before $path_prefix, for example --root /home/frostming, $path_prefix will change from /usr to /home/frostming/usr
-target TARGET to specify the installation location directly to TARGET

Virtual Environments

Virtual environments are designed to isolate packages that depend on different projects and install them under different paths to prevent dependency conflicts. Once you understand the mechanics of how Python installs packages, it’s easy to understand how virtual environments (the virtualenv , venv modules) work. In fact, running virtualenv myenv copies a new Python interpreter to myenv/bin and creates the myenv/lib and myenv/lib/pythonX.Y/site-packages directories (the venv module is not copied, but the result is basically the same). . Executing source myenv/bin/activate will then stuff myenv/bin in front of PATH, giving the copied Python interpreter the highest priority for searching. This way, when installing packages later, $path_prefix will be myenv, thus isolating the installation path.

The effect of script running style on search paths

As you know from the above introduction, the most direct reason for Python to find a package or not is sys.path, and the further reason is the path of sys.executable. Once a program is written, we have to run it, and different methods of running it may affect sys.path and cause different behavior, so let’s discuss this issue.

Suppose you have the following package structure.

.
├── main.py
└── my_package
    ├── __init__.py
    ├── a.py
    └── b.py

The contents of main.py.

`1`	`import my_package.b`

The contents of the b.py file are simple.

1
2
3

import sys
print("I'm b")
print(sys.path)

Now execute the following command in a directory at the same level as main.py.

$ python main.py
I'm b
['/home/frostming/test_path', ...]  # 省略的路径是共同的，与讨论的问题无关
$ python my_package/b.py
I'm b
['/home/frostming/test_path/my_package', ...]

The way python xxx.py is run is called run directly, where the __name__ value in the file is specified as __main__, which is the way the Run File Run Script in the IDE is used. You can see that the first value of sys.path is the directory where the script file is located, which varies with the script path, remembering that we always run our tests in the /home/frostming/test_path directory.

Okay, so if we need to import a.py into b.py, which contains the simple line print("I'm a"), what should we write in the b.py script?

Easy!, import a, well, execute the above test again
1 2 3 4 5 6

$ python main.py ModuleNotFoundError: No module named 'a' $ python my_package/b.py I'm a I'm b ['/home/frostming/test_path/my_package', ...]
The first test is wrong. If you’ve read the previous section, this error is to be expected - sys.path doesn’t have a.py in the directory /home/frostming/test_path/my_package, so of course a can’t be found.
Change to from my_package import a. The test is not done, because based on the same analysis, we can predict that the first one will run fine while the second one will report an error that my_package cannot be found. Note that since b is in the my_package package, you can use relative import and write from . import a and from my_package import a will have the same effect.

So is there a way for me to make both runs without reporting an error? There is. We need to realize that there are a limited number of entry points in a project, and there won’t actually be executable code that is both at the top level and in a subdirectory. We should put the main runtime logic in main.py (not necessarily that name, e.g. Django project is manage.py), and if we do need to run the code of a script in a subdirectory at this point, we should use python -m <module_name>, and the statement to import a in b.py should be from my_package import a, and let’s see how it works.

$ python main.py  # 和 python -m main 效果一样
I'm a
I'm b
['/home/frostming/test_path', ...]
$ python -m my_package.b
I'm a
I'm b
['/home/frostming/test_path', ...]

You can see that the contents of sys.path are the same for both runs, and its first value is the directory where it is currently running. This is called running as a module, and the argument after python -m is (separated by .) module name instead of pathname. Because of this uniformity, all your imports in your project can be defined in the same way, regardless of which script they are in. This is why the official Django documentation recommends import names of the form myapp.models.users for all imports.

In addition, when running as a module, each level of parent module (or package) passed in the module name will run as a module, which means you can use relative imports in modules (not when running directly) and the value of __name__ in the passed-in module will be set to __main__ and you can still apply if __name__ == " __main__":. If the module passed in python -m <module_name> is a package, then the __main__.py script in the package directory (if it exists) will be executed, and the __name__ value of that script will be __main__.

Summary

As you can see here, the most important thing about package path search is the $path_prefix path prefix, which is derived from the path of the Python interpreter used. So to find the package path, you just need to know the path to the interpreter, and if you encounter a change in the package path, you just need to specify the Python interpreter you want with the correct PATH setting.

Now back to the three questions at the beginning, can everyone solve this kind of problem?

Table of Contents