How to get the thread ID correctly?

Getting the correct thread ID looks like a simple question, but it has a hidden catch: threads exist at two levels, user space and kernel, so there are two different thread IDs and two ways to obtain them.

First of all, it helps to understand what POSIX is. Long before the Linux kernel existed, the operating system world was dominated by Unix. Many vendors and developers built their own customized variants on top of it, which produced a large number of Unix-like systems that were not compatible with one another. To improve compatibility between these systems and to support cross-platform development, the IEEE published the POSIX standard. POSIX stands for Portable Operating System Interface; it defines a family of standards for portable operating systems, including the standard for threads: pthreads. Today, systems including Linux, Windows, macOS, and iOS are compatible or partially compatible with the POSIX standard.

Early versions of the Linux kernel had no concept of threads; all tasks were scheduled as processes. Linux kernel versions 2.0 to 2.4 supported threads through the LinuxThreads model, whose behavior depends on the flags parameter of the clone system call.

#include <sched.h>
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
                 /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

If CLONE_VM is set in flags, the child runs in the same virtual memory space as its parent. In other words, when user space asks for a thread, the kernel actually creates a process that happens to share the parent's virtual memory. As a result, under the LinuxThreads model, calling getpid in different threads of the same process returns different values, which violates the POSIX standard.
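
The effect of CLONE_VM described above can be reproduced on a current Linux system. The following is a minimal sketch, not the LinuxThreads implementation itself: the names shared_state, child_fn, and clone_with_shared_vm are hypothetical, and the code assumes glibc's clone wrapper and a stack that grows downward. The child is created with CLONE_VM but without CLONE_THREAD, so it shares memory with the parent while keeping its own process ID.

```c
#define _GNU_SOURCE          /* required for clone() and CLONE_VM in glibc */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Hypothetical record filled in by the child. The parent can read it
 * afterwards only because CLONE_VM makes both share one address space. */
struct shared_state {
    int value;
    pid_t child_pid;
};

static int child_fn(void *arg)
{
    struct shared_state *s = arg;
    s->value = 42;            /* written by the child, read by the parent */
    s->child_pid = getpid();  /* differs from the parent's getpid() */
    return 0;
}

/* Returns 0 on success (and fills *out), -1 on failure. */
int clone_with_shared_vm(struct shared_state *out)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    out->value = 0;
    out->child_pid = 0;

    /* Assumption: the stack grows downward, so pass the top of the buffer.
     * SIGCHLD lets the parent wait for the child with waitpid(). */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, out);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int ok = (waitpid(pid, NULL, 0) == pid) ? 0 : -1;
    free(stack);
    return ok;
}
```

After a successful call, out->value holds the number written by the child, and out->child_pid is a process ID different from the caller's own getpid(), which is the behavior that conflicted with POSIX.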

To conform to the POSIX standard, glibc introduced the NPTL (Native POSIX Thread Library) thread model, which was contributed by Red Hat. Linux kernel 2.6 was adapted to support NPTL. One important change was the addition of the tgid field to task_struct, the structure that describes a process.

 struct task_struct {
    ...
    pid_t pid;   // thread ID
    pid_t tgid;  // thread group ID, i.e. the process ID
    struct task_struct *group_leader; // pointer to the main thread
    ...
 };

When a thread’s pid and tgid are equal, that thread is the group leader, often referred to as the “main thread”, and its pid is the process ID of the whole thread group. Every other thread in the group has its tgid set to the pid of the main thread. getpid() returns the process ID, which means it actually reads the tgid field. To support NPTL, the clone system call also gained the CLONE_THREAD flag, which tells the kernel to create a thread: the kernel gives the new thread its own pid (its thread ID) and copies the tgid of the calling thread. As a result, getpid() returns the same value in every thread of a process, as POSIX requires.
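
The relationship between pid and tgid can be observed from user space. The following is a minimal sketch for Linux: the names thread_ids, read_ids, and collect_ids are hypothetical, and the code assumes that collect_ids is called from the main thread of the process. getpid() reports the tgid, while syscall(SYS_gettid) reports the pid field of the calling thread.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of IDs as seen by one thread. */
struct thread_ids {
    pid_t pid;  /* getpid(): the tgid field, shared by the whole group */
    pid_t tid;  /* gettid:   the pid field, unique per thread */
};

static void read_ids(struct thread_ids *out)
{
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
}

static void *worker(void *arg)
{
    read_ids(arg);
    return NULL;
}

/* Records the IDs seen by the calling thread and by a newly created thread.
 * Returns 0 on success, -1 if the thread could not be created or joined. */
int collect_ids(struct thread_ids *caller_ids, struct thread_ids *worker_ids)
{
    pthread_t t;

    read_ids(caller_ids);
    if (pthread_create(&t, NULL, worker, worker_ids) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return 0;
}
```

When the caller is the main thread, caller_ids->pid equals caller_ids->tid (the group leader), worker_ids->pid equals caller_ids->pid (same tgid), and worker_ids->tid differs from caller_ids->tid.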

As you can see, whether under the early LinuxThreads model or the NPTL model used today, the Linux kernel draws little distinction between threads and processes. Threads are the so-called lightweight processes, and both are represented by the task_struct structure in the kernel.

The pthread_create and pthread_self functions that we usually use are user-space interfaces provided by glibc, the standard C library on most Linux systems. When an application creates a thread with pthread_create, it gets an NPTL thread that is backed by a corresponding task in the kernel.

Since there are two levels of thread models, user state and kernel state, there are naturally two thread IDs.

gettid

gettid has been available as a system call since Linux kernel 2.4.11, and glibc has provided a wrapper function for it since version 2.30; with older glibc versions it has to be invoked through syscall. gettid returns a thread ID of type pid_t (an int), which is the pid field of the kernel task_struct structure mentioned above. Each thread (task) ID is unique across the system and does not collide with any other.

pid_t tid = syscall(SYS_gettid);

On iOS/macOS, syscall(SYS_gettid) does not provide a thread ID in this sense; the call returns -1.

pthread_self

pthread_self is an interface of the POSIX threads library that returns a thread handle of type pthread_t. The handle is allocated and maintained by the pthread library and is only guaranteed to be unique within a single process. The POSIX standard does not specify the representation of pthread_t, so its definition can differ from one system to another.
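
Because the representation of pthread_t is unspecified, portable code should compare handles with pthread_equal rather than with ==. The following is a minimal sketch; the names probe, is_current_thread, and worker_matches_caller are hypothetical helpers introduced only for illustration.

```c
#include <pthread.h>

/* Hypothetical record passed to the worker thread. */
struct probe {
    pthread_t caller_handle;  /* handle of the thread that started the worker */
    int worker_is_caller;     /* result of the comparison made in the worker */
};

/* Returns 1 if t refers to the calling thread, 0 otherwise. */
int is_current_thread(pthread_t t)
{
    return pthread_equal(t, pthread_self()) != 0;
}

static void *worker(void *arg)
{
    struct probe *p = arg;
    /* Both handles refer to live threads here: the caller is blocked in
     * pthread_join, and pthread_self() is the worker itself. */
    p->worker_is_caller = is_current_thread(p->caller_handle);
    return NULL;
}

/* Returns 0 if the worker's handle differs from the caller's (expected),
 * 1 if they compare equal, and -1 on error. */
int worker_matches_caller(void)
{
    struct probe p;
    pthread_t t;

    p.caller_handle = pthread_self();
    p.worker_is_caller = -1;
    if (pthread_create(&t, NULL, worker, &p) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return p.worker_is_caller;
}
```

The comparison is done while both threads are still alive, since a pthread_t is only meaningful during the lifetime of the thread it names.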

Take iOS/macOS as an example:

// <sys/_pthread/_pthread_types.h>
typedef struct _opaque_pthread_t *__darwin_pthread_t;

// <sys/_pthread/_pthread_t.h>
struct __darwin_pthread_handler_rec {
    void (*__routine)(void *);  // Routine to call
    void *__arg;            // Argument to pass
    struct __darwin_pthread_handler_rec *__next;
};

// https://easeapi.com/blog/blog/158-thread-id.html
struct _opaque_pthread_t {
    long __sig;
    struct __darwin_pthread_handler_rec  *__cleanup_stack;
    char __opaque[__PTHREAD_SIZE__];
};

typedef __darwin_pthread_t pthread_t;

As you can see, pthread_t is a pointer to the _opaque_pthread_t structure. Its __cleanup_stack field points to a linked list of __darwin_pthread_handler_rec records, each of which stores a routine to call and the argument to pass to it.

iOS/macOS provides the non-portable extension function pthread_threadid_np, which converts a pthread_t thread handle into an integer thread ID.

int pthread_threadid_np(pthread_t _Nullable, __uint64_t * _Nullable);

In iOS/macOS, you can also use the SYS_thread_selfid system call to get the integer thread ID, which has the same effect as pthread_threadid_np.

pid_t tid = syscall(SYS_thread_selfid);
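
The platform-specific calls discussed so far can be wrapped in one helper. The following is a minimal sketch, not a complete portability layer: current_thread_id is a hypothetical name, only Linux and Apple platforms are handled, and on Apple platforms the code assumes that passing NULL as the thread argument of pthread_threadid_np selects the calling thread.

```c
#include <stdint.h>

#if defined(__APPLE__)
#include <pthread.h>
#elif defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Stores the integer ID of the calling thread in *out.
 * Returns 0 on success, -1 on failure or on an unsupported platform. */
int current_thread_id(uint64_t *out)
{
#if defined(__APPLE__)
    /* Assumption: a NULL thread argument means "the calling thread". */
    return pthread_threadid_np(NULL, out) == 0 ? 0 : -1;
#elif defined(__linux__)
    long tid = syscall(SYS_gettid);  /* pid field of the kernel task_struct */
    if (tid < 0)
        return -1;
    *out = (uint64_t)tid;
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

Calling the helper twice from the same thread yields the same value, which makes it suitable for tagging log lines with a thread identifier.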

On Linux systems, pthread_t is defined in pthreadtypes.h.

typedef unsigned long int pthread_t;

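You can see that pthread_t is an integer type on Linux. Note, however, that its value is still a handle managed by the threads library (in glibc's NPTL it holds the address of the thread's internal descriptor); it is not the kernel thread ID that gettid returns.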
