Python argparse explained in detail

If you’re searching for an explanation of argparse, you are probably puzzled by code like the example below. What exactly does it do? What problem does it solve? Why would you need it?

import argparse
 
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')
 
args = parser.parse_args()
print(args.accumulate(args.integers))

To explain what this code does, we first need to answer what argc and argv are.

If you’ve ever worked with C, then you’ve probably seen a piece of code like the following.

#include <stdio.h>
int main(int argc, char * argv[]) {
    for (int i = 0; i < argc; i++) {
        printf("argv[%d]: %s\n", i, argv[i]);
    }
    return 0;
}

It runs as follows.

$ gcc a.c
$ ./a.out hello world
argv[0]: ./a.out
argv[1]: hello
argv[2]: world

What is this again?

The problem is really quite simple. A program often needs data or conditions that cannot be determined when the code is written. Let’s say you are writing an email client. You don’t know who the user will send email to, so you have to get the recipient’s address somehow while the program is running. Common ways to do that are:

Create a graphical interface that elegantly asks the user who the recipient is.
Read a file on the hard disk, let’s say C:\receiver.txt (Windows) or /home/user/receiver.txt (Linux/Unix), which requires the user to save the recipient’s address to a text file in advance.
Read environment variables (yes, the same mechanism Windows users know from setting PATH for Python).
Append the recipient’s address to the command when running the program.

I’m sure you’ve run a Python script using the command line, like this.

$ python3 -v some_script.py

In this command line, python3 is the program we want to execute, and everything after it is information appended to it. That is, -v some_script.py is information passed to the Python interpreter at runtime. Here -v tells the interpreter to run in verbose mode (it prints a message each time a module is initialized), and some_script.py specifies which Python script file to run.

Passing this information to the program is handled by the operating system, and each language provides an interface for reading it. The C program above prints the information it receives through the interface provided by C. By the time main runs, some preprocessing has already been done for us: the command line has been turned from a single string into a list of strings split on spaces (on Unix-like systems the shell does the splitting; on Windows the C runtime does). argc is the number of elements in the list, and argv[i] is the i-th string, counting from 0, with argv[0] being the program name as it was invoked. These pieces of additional information are called arguments, and arg is shorthand for argument.
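To make the splitting step concrete, here is a minimal sketch using Python’s standard shlex module, which splits a string using shell-like rules. It only approximates what a real shell does (no variable or wildcard expansion), and the --subject flag is made up for illustration.

```python
import shlex

# Split a command line roughly the way a POSIX shell would.
command = 'python3 -v some_script.py'
print(shlex.split(command))  # ['python3', '-v', 'some_script.py']

# Quotes keep a value containing spaces together as one argument.
# The --subject flag is hypothetical, used only for this illustration.
quoted = 'sendmail --subject "hello world"'
print(shlex.split(quoted))   # ['sendmail', '--subject', 'hello world']
```

The second call shows why "split on spaces" is a simplification: quoting lets a single argument contain spaces.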

Historically, C was one of the first massively popular cross-platform languages, so many languages since then have followed its design, and Python is one of them.

For example, the following Python code prints all the additional information.

import sys
for i in range(len(sys.argv)):
    print('argv[' + str(i) + ']:', sys.argv[i])

It runs as follows.

$ python3 a.py hello world
argv[0]: a.py
argv[1]: hello
argv[2]: world

This way, we get all the information that the user attaches to the program at runtime through the interface provided by Python. Now we can answer the original question: What’s the point of argparse?

Imagine a program with many pieces of data or conditions that can only be determined at runtime. For example, a program that sends an email needs to know a lot of information.

sendmail --receiver someone@example.com \
         --sender anotherone@anotherexample.com \
         --force-point-to-point-security \
         --max-retry 10 \
         --retry-interval 120

Here receiver is the recipient, sender is the sender, force-point-to-point-security means that the secure transmission method must be used, max-retry is the maximum number of attempts to send if the transmission fails, and retry-interval is the interval between two transmission attempts (in seconds).

Without the argparse package, the beginning of your code would be filled with logic that checks that the user has provided all the necessary information, that each address is a legitimate email address, and that the number of attempts and the interval between them are numbers rather than illegal input containing letters. These checks are so tedious that the argparse package was created to free you from this drudgery. All you have to do is tell it which arguments are required, such as receiver and sender in this case, and whether each one is a string, a number, or just a switch, such as force-point-to-point-security. It then gives the user friendly feedback about what to correct or add when the input is not valid.
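To give a feel for the tedium, here is a minimal sketch of checking a single option by hand. The function name parse_max_retry and the default of 3 are assumptions made for this illustration; they do not come from any real sendmail program.

```python
def parse_max_retry(argv):
    """Hand-rolled lookup of --max-retry in a list of arguments."""
    if '--max-retry' not in argv:
        return 3  # assumed default, chosen only for this sketch
    position = argv.index('--max-retry')
    if position + 1 >= len(argv):
        raise ValueError('--max-retry needs a value')
    value = argv[position + 1]
    if not value.isdigit():
        raise ValueError('--max-retry must be a number, got ' + repr(value))
    return int(value)


print(parse_max_retry(['--receiver', 'someone@example.com', '--max-retry', '10']))  # 10
```

That is a dozen lines for one option, with no help text and no handling of the other four. Multiply it by every argument and the appeal of argparse becomes clear.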

Let’s go back to the example given in the official Python documentation, which has four key elements.

parser = argparse.ArgumentParser(): creates a new parser that will handle the arguments for us.
parser.add_argument(): defines one argument the program accepts, including its name, its type, and whether it is optional.
args = parser.parse_args(): tells the parser that all the arguments have been defined and asks it to process the command line.
args: holds the parsed result, with one attribute for each argument.
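The four elements can be tried out without a terminal, because parse_args() also accepts an explicit list of strings in place of the real command line. The sketch below reuses the documentation example; the variable names with_sum and without_sum are introduced here for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

# The list stands in for typing: python3 prog.py 1 2 3 4 --sum
with_sum = parser.parse_args(['1', '2', '3', '4', '--sum'])
print(with_sum.accumulate(with_sum.integers))        # 10

# Without --sum, accumulate falls back to its default, max.
without_sum = parser.parse_args(['1', '2', '3', '4'])
print(without_sum.accumulate(without_sum.integers))  # 4
```

Note that the strings '1' to '4' arrive in args.integers as real integers, because of type=int.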

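For our sendmail program, the contents of args, viewed as a dictionary with vars(args), might look like this. Note that argparse replaces the hyphens in option names with underscores.

{'force_point_to_point_security': True,
 'max_retry': 10,
 'receiver': 'someone@example.com',
 'retry_interval': 120,
 'sender': 'anotherone@anotherexample.com'}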
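A parser that produces such a result could be sketched as follows. This is only one possible definition: the default values (3 and 60), the help texts, and the choice to make receiver and sender required are assumptions made for this illustration. argparse stores each option under its name with hyphens turned into underscores, so --max-retry becomes args.max_retry.

```python
import argparse

parser = argparse.ArgumentParser(prog='sendmail', description='Send an email.')
parser.add_argument('--receiver', required=True, help='recipient address')
parser.add_argument('--sender', required=True, help='sender address')
parser.add_argument('--force-point-to-point-security', action='store_true',
                    help='require the secure transmission method')
parser.add_argument('--max-retry', type=int, default=3,
                    help='maximum number of send attempts')
parser.add_argument('--retry-interval', type=int, default=60,
                    help='seconds between two attempts')

args = parser.parse_args(['--receiver', 'someone@example.com',
                          '--sender', 'anotherone@anotherexample.com',
                          '--force-point-to-point-security',
                          '--max-retry', '10',
                          '--retry-interval', '120'])
print(vars(args))      # the parsed values as a plain dictionary
print(args.max_retry)  # 10

# Invalid input: argparse prints a usage message to stderr and exits.
status = None
try:
    parser.parse_args(['--receiver', 'someone@example.com',
                       '--sender', 'anotherone@anotherexample.com',
                       '--max-retry', 'ten'])
except SystemExit as exit_info:
    status = exit_info.code
print('exit status:', status)  # exit status: 2
```

The last part shows the friendly feedback in action: the program never sees the bad value, because argparse rejects it first.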

This way, we can easily and safely use the data or conditions entered by the user.