1. Introduction

In Python 3.9, when multiple computational tasks need to run concurrently, they are usually handled with multithreading or coroutines.

This article is based on the official Python 3.9 documentation and summarizes some common threading patterns used in development.

2. Description

A thread is the smallest unit of CPU scheduling in the operating system. A process can contain multiple threads, all of which share the system resources of the current process.

Threads are sometimes called “lightweight processes”.

Although threads share the process's system resources, each thread still has some independent resources of its own:

  • call stack
  • register context
  • thread-local storage

Since all system resources of the current process are shared, a misbehaving thread can easily corrupt the independent resources of other threads.

Advantages of threads

  • Lower system-resource overhead than the multi-process approach
  • Easy data sharing among threads
  • Can make use of multiple CPU cores and hardware threads (although in CPython the GIL limits this; see section 5)

Disadvantages of threads

  • Each thread carries a fixed memory footprint (its stack)
  • Switching between thread contexts takes time

The default stack size allocated to each thread can be viewed with ulimit -s under Linux.

3. Usage

Starting a thread in Python 3 is very simple.

Each Python thread runs in its own system-level thread (a POSIX thread or a Windows thread), scheduled by the operating system.

The code is as follows.

import datetime
import time
from threading import Thread

letters = ['a', 'b', 'c', 'd', 'e', 'f']


def PrintLetter(sleep: int = 5):
    # Pop letters from the end of the list until it is empty
    while len(letters) > 0:
        print('%s Pop letter: %s' % (datetime.datetime.now().strftime('%H:%M:%S'), letters.pop()))
        time.sleep(sleep)
    print('Print success')


# Pass sleep=1 so a letter is printed every second
t = Thread(target=PrintLetter, args=(1, ))
t.start()

The output is as follows.

17:30:10 Pop letter: f
17:30:11 Pop letter: e
17:30:12 Pop letter: d
17:30:13 Pop letter: c
17:30:14 Pop letter: b
17:30:15 Pop letter: a
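Besides start(), the Thread object offers join() to wait for a thread to finish and is_alive() to poll its state. A minimal sketch (the work function and its 0.2-second sleep are placeholder assumptions):

```python
import time
from threading import Thread


def work():
    time.sleep(0.2)


t = Thread(target=work, daemon=True)  # daemon threads exit with the main thread
t.start()
print(t.is_alive())  # True: work() is still sleeping
t.join(timeout=1)    # block until work() finishes, or at most 1 second
print(t.is_alive())  # False: the thread has finished
```
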

4. Message Subscriptions

In practical development, threads often need to communicate with each other, or wait for certain messages before deciding whether to execute the next step.

Messaging between threads

  • Queue objects (queue.Queue) are thread-safe and can be used to exchange data between threads.
  • threading.Event can be used to signal state: one thread waits on the event until another thread has finished processing a piece of data and sets it.

Suppose there are 2 threads: a producer thread pushes the string “hello” and waits for the consumer thread to append “world” to it; the producer thread then prints the result immediately.

Inter-thread communication is actually passing object references between threads.

The code implementation is as follows.

import time
from queue import Queue
from threading import Thread, Event


def Producer(q: Queue):
    while True:
        e = Event()
        hello = ['hello']
        q.put((hello, e))
        print('%s Producer push data "hello"' % time.time())
        e.wait()  # Block until the consumer signals that processing is done
        print('%s Producer say %s\n' % (time.time(), ' '.join(hello)))


def consumer(q: Queue):
    while True:
        # q.get() blocks until the queue has a value
        data, e = q.get()
        time.sleep(5)
        # list.append mutates the list in place (and returns None),
        # so the producer sees the change through its own reference
        data.append('world')
        e.set()
        print('%s Consumer add "world" to data \n' % time.time())


qe = Queue()
t1 = Thread(target=Producer, args=(qe, ))
t1.start()
t2 = Thread(target=consumer, args=(qe, ))
t2.start()

The output is as follows; you can see that the producer thread received the “hello world” string less than a millisecond after the consumer finished processing.

1652840455.1704156 Producer push data "hello"
1652840460.1712673 Consumer add "world" to data 
1652840460.1713297 Producer say hello world
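Both loops above run forever; real code usually needs a way to shut the consumer down. A common pattern, sketched here with None as the sentinel value, is to push a special item that tells the consumer to exit:

```python
from queue import Queue
from threading import Thread

results = []


def consumer(q: Queue):
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work, exit the loop
            break
        results.append(item)


q = Queue()
t = Thread(target=consumer, args=(q, ))
t.start()
for item in ['a', 'b', 'c']:
    q.put(item)
q.put(None)  # ask the consumer to stop
t.join()
print(results)
```
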

5. GIL

The CPython interpreter uses a mutual exclusion lock called the GIL (Global Interpreter Lock) to prevent multiple threads from executing Python bytecode at the same time.

Without going into the details of the GIL, two conclusions follow:

  • For computationally intensive (CPU-bound) tasks, multithreading brings no real speedup in CPython; use multiple processes instead.
  • For IO-intensive tasks, multithreading (or coroutines) works well, because the GIL is released while a thread waits on IO.

For example, the following program computes the sum of the integers below 100000000, twice in a row.

import time

def Add(n: int):
    count = 1
    for i in range(2, n):
        count = count + i
    print('Count: %s' % count)

start = time.time()

Add(100000000)
Add(100000000)

print('Run time: %0.3f' % (time.time() - start))

The output is as follows.

Count: 4999999950000000
Count: 4999999950000000
Run time: 15.024

We now use a multi-threaded approach to compute the two 100000000 sums in parallel.

import time
from threading import Thread


def Add(n: int):
    count = 1
    for i in range(2, n):
        count = count + i
    print('Count: %s ' % count)


start = time.time()

t1 = Thread(target=Add, args=(100000000, ))
t1.start()
t2 = Thread(target=Add, args=(100000000, ))
t2.start()

# Wait t1 and t2 completed
t1.join()
t2.join()

print('Run time: %0.3f' % (time.time() - start))

The output is as follows.

Count: 4999999950000000 
Count: 4999999950000000 
Run time: 15.406

You can see that it is no faster than sequential execution: for computationally intensive tasks, multithreading does not speed up processing.

If you must process computationally intensive tasks in Python, consider process pools.
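A sketch using concurrent.futures.ProcessPoolExecutor (with a smaller n of 10000000 to keep the run short); each task gets its own process, its own interpreter, and therefore its own GIL, so on a multi-core machine the two sums really do run in parallel:

```python
import time
from concurrent.futures import ProcessPoolExecutor


def add(n: int) -> int:
    count = 1
    for i in range(2, n):
        count = count + i
    return count


if __name__ == '__main__':
    start = time.time()
    # Each submitted task runs in a separate worker process
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(add, [10000000, 10000000]))
    print('Counts: %s' % results)
    print('Run time: %0.3f' % (time.time() - start))
```
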

6. Thread locking

In practice, multi-threaded code almost always has to consider atomicity: a fixed sequence of steps must not be interrupted part-way by the thread scheduler, or the result can be wrong.

In Python 3, individual operations on the built-in dict/tuple/list objects are thread-safe (the GIL makes single bytecode-level operations atomic), but compound read-modify-write operations on them are not.

For example, each call to the following Add method increases the value of count by 500000; with 2 threads each calling it once, the output should theoretically be 1000000.

from threading import Thread

count = 0

def Add():
    global count
    i = 0
    while i < 500000:
        count += 1  # read-modify-write: not atomic
        i += 1


t1 = Thread(target=Add)
t1.start()
t2 = Thread(target=Add)
t2.start()

# Wait t1 and t2 completed
t1.join()
t2.join()

print('Count value : %d' % count)

The output is as follows. The value of Count is not 1000000, and running the program multiple times gives a different number each time, which shows that the atomic operation was broken.

Count value : 804263

In practical development, locks are very good at avoiding such problems, but they usually come with some performance cost.

from threading import Thread, Lock

count = 0

lock = Lock()


def Add():
    global count
    i = 0
    while i < 500000:
        with lock:
            count += 1
        i += 1


t1 = Thread(target=Add)
t1.start()
t2 = Thread(target=Add)
t2.start()

# Wait t1 and t2 completed
t1.join()
t2.join()

print('Count value : %d' % count)

The output is as follows. The value of Count is 1000000, no matter how many times it is executed.

Count value : 1000000

7. Thread pooling

Sometimes the number of tasks is not known in advance, and this is where thread pools come in handy.

For example, suppose generating each character of a random string takes 1 second.

We use 2 threads to generate 4 random strings of length 5, 6, 7 and 8.

The code is as follows.

from concurrent.futures import ThreadPoolExecutor
import random
import threading
import time


def RandomString(length: int):
    random_str = ''
    while length > 0:
        num = random.randint(0, 9)
        # Pick either a digit or an uppercase letter for each position
        s = str(random.choice([num, chr(random.randint(65, 90))]))
        random_str += s
        length -= 1
        time.sleep(1)
    t = threading.current_thread()
    print('Thread(%s) random string: %s' % (t.native_id, random_str))
    return random_str


start = time.time()

# A pool of 2 worker threads; submit returns a Future immediately
pool = ThreadPoolExecutor(2)
t1 = pool.submit(RandomString, 5)
t2 = pool.submit(RandomString, 6)
t3 = pool.submit(RandomString, 7)
t4 = pool.submit(RandomString, 8)

print('All thread start')
# result() blocks until the corresponding task has finished
print('T1 string: %s\nT2 string: %s\nT3 string: %s\nT4 string: %s' % (t1.result(), t2.result(), t3.result(), t4.result()))
print('Run time: %0.3f' % (time.time() - start))

The output is as follows.

All thread start
Thread(13538) random string: VZFSV
Thread(13541) random string: E4PM0S
Thread(13538) random string: M2QN7XX
Thread(13541) random string: 2MZ33XH3
T1 string: VZFSV
T2 string: E4PM0S
T3 string: M2QN7XX
T4 string: 2MZ33XH3
Run time: 14.018

You can see that only 2 threads are generating strings at any given time, and they pick up the remaining tasks as they finish. Thread pools are ideal for scenarios with many IO-blocking tasks.

Normally, you should only use thread pools in IO-related code.
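For IO-bound work, concurrent.futures also provides as_completed, which yields futures in the order they finish rather than the order they were submitted. A sketch with a hypothetical fetch function that simulates IO with sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(delay: float) -> float:
    # Simulate an IO-bound task (e.g. a network request) with sleep
    time.sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]
finished = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch, d) for d in delays]
    for f in as_completed(futures):  # yields each future as it completes
        finished.append(f.result())

print(finished)  # typically [0.1, 0.2, 0.3]: fastest tasks finish first
```
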