During a debugging session a few days ago, I noticed that the stack trace printed by the program was not what I expected. After some digging, I tracked it down to a problematic piece of code. Can you spot the problem?

#include <chrono>
#include <iostream>
#include <thread>

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

Problem

When result2 is less than 0, the code above returns -2 immediately, without calling t.join(). If a std::thread object is destroyed while it is still joinable, the std::thread destructor calls std::terminate, which ends the entire program. A thread is still joinable when neither join() nor detach() has been called on it.
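
To see when the destructor would call std::terminate, it helps to watch std::thread::joinable() change over a thread object's life. This is a minimal sketch. JoinableStates and observe_joinable are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical record of std::thread::joinable() at each stage.
// ~thread calls std::terminate only if joinable() is still true.
struct JoinableStates {
  bool after_default_construction;
  bool after_start;
  bool after_join;
  bool after_detach;
};

JoinableStates observe_joinable() {
  JoinableStates s{};

  std::thread empty;  // no thread of execution attached
  s.after_default_construction = empty.joinable();  // false

  std::thread t1([] {});
  s.after_start = t1.joinable();  // true: destroying t1 now would terminate
  t1.join();
  s.after_join = t1.joinable();  // false: safe to destroy

  std::thread t2([] {});
  t2.detach();
  s.after_detach = t2.joinable();  // false: safe to destroy

  return s;
}
```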

Calling std::terminate outright is disconcerting, but it is not entirely unreasonable. Suppose the std::thread destructor automatically called detach() instead. The other thread could then keep running after the objects it references have been destroyed, which leads to undefined behaviour. In the example above, the other thread refers to result1 and a. If the destructor called t.detach() at return -2, result1 and a would become dangling references, and accessing them would be undefined behaviour.

Conversely, if the std::thread destructor automatically called t.join(), it could cause a deadlock. Suppose subtask1 in the example above pauses midway to wait for subtask2, but subtask2 returns an error before letting subtask1 continue. The destructor would then call t.join() and wait for subtask1, while subtask1 keeps waiting for a signal that never comes. The two threads wait for each other forever.
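
The deadlock can be reproduced in miniature. In this sketch the waiting thread uses cv.wait_for with a 200 ms timeout. That timeout is an assumption added so the example terminates; with a plain cv.wait, the join() would never return when the producer fails. worker_got_signal is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration: the worker waits for a "ready" signal that the
// producer never sends if it bails out on an error.
bool worker_got_signal(bool producer_fails) {
  std::mutex m;
  std::condition_variable cv;
  bool ready = false;
  bool got_signal = false;

  std::thread worker([&] {
    std::unique_lock<std::mutex> lock(m);
    // A real cv.wait(lock, pred) would block forever here on failure.
    got_signal = cv.wait_for(lock, std::chrono::milliseconds(200),
                             [&] { return ready; });
  });

  if (!producer_fails) {
    std::lock_guard<std::mutex> lock(m);
    ready = true;
    cv.notify_all();
  }
  // On failure we skip the notification. This join is what an
  // auto-joining destructor would do; only the timeout lets it return.
  worker.join();
  return got_signal;
}
```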

An automatic call to join() could also needlessly lengthen the program's execution time. In the example above, once subtask2 fails we no longer care about subtask1's result. Yet t.join() still makes the main thread wait for the other thread to finish, which in some cases is simply wasted time.
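
Here is a small sketch of that cost. It assumes a worker that sleeps for 300 ms as a stand-in for subtask1, and early_return_cost_ms is a hypothetical helper. The caller wants to bail out immediately, but join() keeps it waiting until the worker finishes.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: how long an "early return" takes when we must join.
long long early_return_cost_ms() {
  auto start = std::chrono::steady_clock::now();

  std::thread t([] {
    // Stand-in for subtask1: work we no longer care about.
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
  });

  // Suppose subtask2 has just failed and we want to return right away.
  // We still have to join t before it is destroyed, which waits ~300 ms.
  t.join();

  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```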

Solution

First, we must examine the synchronization relationship between the two threads. The two threads may synchronize in other ways besides “creating the new thread” and “joining it with join()”, for example by communicating through a mutex or a condition variable. In that case we must re-examine the synchronization protocol between them and make sure a waiting thread is always guaranteed to get a response from the other thread. For example:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

class scoped_thread_join {
private:
  std::thread* thread_;

public:
  explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
  ~scoped_thread_join() {
    if (thread_->joinable()) {
      thread_->join();
    }
  }
};

bool is_valid(int x) {
  return x % 2 == 0;
}

bool is_ready = false;
std::mutex m;
std::condition_variable cv;

int subtask1(int x) {
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, []() { return is_ready; });

  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  if (!is_valid(x)) {
    return -1;  // Problematic: returns without notifying subtask1
  }

  {
    std::lock_guard<std::mutex> lock(m);
    is_ready = true;
    cv.notify_all();
  }

  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });
  scoped_thread_join thread_guard(t);

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " -1 -3" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

When subtask2 handles an error, the program above forgets to notify the other thread. subtask1 therefore waits on cv forever, and the join() in ~scoped_thread_join never returns. If your program has this problem, simply calling join() or detach() will not solve it. We must define an “error state” in the synchronization protocol so that the other thread can handle the failure. For example:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

class scoped_thread_join {
private:
  std::thread* thread_;

public:
  explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
  ~scoped_thread_join() {
    if (thread_->joinable()) {
      thread_->join();
    }
  }
};

bool is_valid(int x) {
  return x % 2 == 0;
}

bool is_error = false;  // Added
bool is_ready = false;
std::mutex m;
std::condition_variable cv;

int subtask1(int x) {
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, []() { return is_ready || is_error; });

  if (is_error) {  // Added
    // Return error early
    return -1;
  }

  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  if (!is_valid(x)) {
    std::lock_guard<std::mutex> lock(m);  // Added
    is_error = true;                      // Added
    cv.notify_all();                      // Added
    return -1;
  }

  {
    std::lock_guard<std::mutex> lock(m);
    is_ready = true;
    cv.notify_all();
  }

  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });
  scoped_thread_join thread_guard(t);

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " -1 -3" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

We could also consider redesigning the whole synchronization process. For example, we could move the is_valid(x) check out of subtask2 and perform it before creating thread t, eliminating the problem at its source. However, this is beyond the scope of this article and will be covered in a future article.
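
The full redesign is left for later, but the basic idea can be sketched as follows. This assumes the globals and subtasks from the example above, with the sleeps omitted; run_validated is a hypothetical name. The input is validated before t is created, so once the thread exists, subtask2 always sends the signal that subtask1 waits for.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

bool is_valid(int x) { return x % 2 == 0; }

// As in the article's example, is_ready is never reset between runs.
bool is_ready = false;
std::mutex m;
std::condition_variable cv;

int subtask1(int x) {
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, []() { return is_ready; });
  return x;
}

// subtask2 no longer validates its input, so it always sends the signal.
int subtask2(int x) {
  {
    std::lock_guard<std::mutex> lock(m);
    is_ready = true;
  }
  cv.notify_all();
  return x;
}

// Hypothetical restructuring: reject invalid input before t exists.
int run_validated(int a, int b) {
  if (!is_valid(b)) {
    return -2;  // no thread created yet: nothing to join, nothing to notify
  }

  int result1 = 0;
  std::thread t([&]() { result1 = subtask1(a); });
  int result2 = subtask2(b);
  t.join();  // cannot deadlock: subtask2 always sets is_ready

  if (result2 < 0) return -2;
  if (result1 < 0) return -1;
  return 0;
}
```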

After checking the synchronization relationship, we must decide how to keep std::thread::~thread from calling std::terminate: with join() or with detach(). Using join() is simpler, but as mentioned before, join() makes the main thread wait for the other thread whether or not you care about its result. With detach(), on the other hand, we must make sure the objects used by the other thread are not destroyed while it is still running. A simple sufficient condition is to let the other thread own (or share ownership of) the objects it needs. If the situation is too complex to reason about easily, join() is the safer choice.
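
As a sketch of the “let the other thread own what it needs” condition, the detached thread below receives its input by move and reports its result through a std::promise. It therefore holds no references into the caller's stack frame. sum_in_background is a hypothetical helper, not part of the article's program.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: the detached thread owns everything it touches.
std::future<long long> sum_in_background(std::vector<int> data) {
  std::promise<long long> promise;
  std::future<long long> result = promise.get_future();

  // data and promise are moved into the lambda, so nothing can dangle
  // no matter how soon the caller's frame goes away.
  std::thread t([data = std::move(data), promise = std::move(promise)]() mutable {
    long long sum = std::accumulate(data.begin(), data.end(), 0LL);
    promise.set_value(sum);
  });
  t.detach();  // safe: the lambda holds only objects it owns

  return result;
}
```

The caller keeps only the std::future. Calling get() on it waits for the value, not for the thread itself.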

The four solutions below are described in turn:

  1. Call the join function
  2. Use std::jthread instead (a variant that calls join)
  3. Call the detach function
  4. Use std::async instead

Solution 1: Calling the join function

The most straightforward approach is to call the join member function before std::thread::~thread is invoked. The code at the beginning of this article could be rewritten as follows:

#include <chrono>
#include <iostream>
#include <thread>

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });

  int result2;
  try {
    result2 = subtask2(b);
  } catch (...) {
    t.join();
    throw;
  }
  if (result2 < 0) {
    t.join();
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

Because exceptions must be handled too, calling join() on every exit path quickly makes the program cumbersome. Instead, we can write a scoped_thread_join class whose destructor joins the thread:

class scoped_thread_join {
private:
  std::thread* thread_;

public:
  explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
  ~scoped_thread_join() {
    if (thread_->joinable()) {
      thread_->join();
    }
  }
};

Then we can rewrite the program as follows:

#include <chrono>
#include <iostream>
#include <thread>

class scoped_thread_join {
private:
  std::thread* thread_;

public:
  explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
  ~scoped_thread_join() {
    if (thread_->joinable()) {
      thread_->join();
    }
  }
};

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });
  scoped_thread_join thread_guard(t);

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

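Or, going one step further, we can drop the explicit t.join() by placing scoped_thread_join in its own block, so that the thread is joined when the block ends: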

#include <chrono>
#include <iostream>
#include <thread>

class scoped_thread_join {
private:
  std::thread* thread_;

public:
  explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
  ~scoped_thread_join() {
    if (thread_->joinable()) {
      thread_->join();
    }
  }
};

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::thread t([&]() { result1 = subtask1(a); });

  {
    scoped_thread_join thread_guard(t);

    int result2 = subtask2(b);
    if (result2 < 0) {
      return -2;
    }
  }

  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

Solution 2: Use std::jthread instead

C++20 adds a new class, std::jthread (note the extra j at the start of its name). Unlike std::thread, if a std::jthread is still joinable when it is destroyed, its destructor calls request_stop() and then join(). So we can also rewrite the original program as follows:

#include <chrono>
#include <iostream>
#include <thread>

#ifndef __cpp_lib_jthread
// jthread library: https://github.com/josuttis/jthread
#include "jthread.hpp"
#endif

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  int result1;
  std::jthread t([&]() { result1 = subtask1(a); });

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  t.join();
  if (result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

Alternative for Pre-C++20 Implementations

However, C++20 is relatively new, and at the time of writing some C++ implementations do not yet provide the std::jthread class. As an alternative, we can use the jthread library written by Nicolai Josuttis:

git clone https://github.com/josuttis/jthread

Then add the following lines to our program:

#ifndef __cpp_lib_jthread
// jthread library: https://github.com/josuttis/jthread
#include "jthread.hpp"
#endif

Finally, compile with the following command:

g++ -pthread -std=c++17 -Ijthread/source solution_jthread.cpp
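
If adding a third-party library is not an option, a small owning wrapper can reproduce the join-on-destruction part of std::jthread, though not its stop_token support. This is a minimal sketch; joining_thread and demo_joining_thread are hypothetical names. Unlike scoped_thread_join, the wrapper owns its std::thread rather than pointing to one, so the thread object cannot be destroyed before the wrapper.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical stand-in for std::jthread's join-on-destruction behaviour.
// It does not provide request_stop() or stop tokens.
class joining_thread {
 public:
  template <typename F, typename... Args>
  explicit joining_thread(F&& f, Args&&... args)
      : t_(std::forward<F>(f), std::forward<Args>(args)...) {}

  joining_thread(const joining_thread&) = delete;
  joining_thread& operator=(const joining_thread&) = delete;

  ~joining_thread() {
    if (t_.joinable()) {
      t_.join();  // join instead of terminating
    }
  }

  void join() { t_.join(); }
  bool joinable() const noexcept { return t_.joinable(); }

 private:
  std::thread t_;
};

int demo_joining_thread() {
  int result = 0;
  {
    joining_thread t([&result] { result = 42; });
  }  // ~joining_thread joins here, so reading result below is safe
  return result;
}
```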

Solution 3: Call the detach function

In this solution, we call detach() right after creating the thread. To keep the captured objects alive long enough, I have changed the by-reference lambda capture ([&]) to a by-value capture ([a, sync]). In addition, I have defined the data required for synchronization as a struct and shared it between both threads through a std::shared_ptr.

The t.join() on the normal path must also be replaced with a mutex and a condition variable. The main thread locks the std::mutex object sync->m with std::unique_lock and then calls sync->cv.wait(lock, ...) to wait for the return value. The other thread runs subtask1 first. Once it has the return value, it locks sync->m with std::lock_guard, stores the return value, and notifies the main thread with sync->cv.notify_all().

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

struct Sync {
  std::mutex m;
  std::condition_variable cv;
  bool result1_ready = false;
  int result1;
};

int run(int a, int b) {
  auto sync = std::make_shared<Sync>();

  std::thread t([a, sync]() {
    int tmp = subtask1(a);

    std::lock_guard<std::mutex> lock(sync->m);
    sync->result1 = tmp;
    sync->result1_ready = true;
    sync->cv.notify_all();
  });
  t.detach();

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  std::unique_lock<std::mutex> lock(sync->m);
  sync->cv.wait(lock, [&]() { return sync->result1_ready; });
  if (sync->result1 < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

Solution 4: Use std::async instead

If writing the std::mutex and std::condition_variable code yourself feels too cumbersome, you can instead rewrite the program with std::async, declared in the <future> header:

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int subtask1(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  return x;
}

int subtask2(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
  return x;
}

int run(int a, int b) {
  std::future<int> result1 = std::async(std::launch::async, subtask1, a);

  int result2 = subtask2(b);
  if (result2 < 0) {
    return -2;
  }

  if (result1.get() < 0) {
    return -1;
  }

  return 0;
}

int main(int argc, char **argv) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
    return 1;
  }

  std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
  return 0;
}

In the code above, std::async(std::launch::async, subtask1, a) runs subtask1(a) on a new thread. When subtask1 finishes, its return value is stored in the shared state associated with the std::future<int>, and result1.get() retrieves it. If subtask1 has not finished yet, result1.get() blocks until the result is available.

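Note that std::async does not behave like Solution 3 here. The std::future returned by std::async(std::launch::async, ...) blocks in its destructor until the task has finished. So when run returns -2 early, it still waits for subtask1, much as join() would. Because std::async copies its arguments (here a) into the task, subtask1 holds no references into run's stack frame. If you pass references instead, for example via std::ref or a lambda with [&], you still have to make sure the referenced objects outlive the task.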
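
The blocking behaviour of the future's destructor can be checked with a sketch like the one below. It assumes a task that sleeps for 300 ms; slow_subtask, run_early_return and early_return_ms are hypothetical names. run_early_return never calls get(), yet it does not return until the task has finished. That is because the std::future returned by std::async(std::launch::async, ...) waits for the task in its destructor.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

int slow_subtask(int x) {
  std::this_thread::sleep_for(std::chrono::milliseconds(300));
  return x;
}

// Hypothetical early-return path that never calls result1.get().
int run_early_return(int a) {
  std::future<int> result1 = std::async(std::launch::async, slow_subtask, a);
  return -2;  // ~future blocks here until slow_subtask(a) has finished
}

long long early_return_ms() {
  auto start = std::chrono::steady_clock::now();
  run_early_return(1);
  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```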