Anyone who has used Objective-C knows that declaring a property atomic does not solve the multi-threading problem for mutable objects. If that is so, what is the point of atomic at all? In this article we compare several programming languages that support reference counting and dig into the underlying logic of this age-old topic.
As we know, atomic and nonatomic mainly matter for properties of object types; for plain primitive types they make little practical difference. For an object-type property declared nonatomic, you must make sure that multiple threads never read and write the property at the same time, otherwise the program may crash.
What is the difference between object types and primitive types when it comes to reading and writing properties? The answer is reference counting: assigning to a strong object-type property must retain the new value and release the old one, whereas a primitive assignment is a single store.
Let’s look at the setter the compiler generates for such a property (since it is a synthesized method, it can only be inspected as assembly). For a nonatomic property, the compiler emits exactly the same code as for assigning to a strong stack variable: a call to the objc_storeStrong runtime function, whose implementation can be found in the objc4 source.
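Paraphrased from the objc4 runtime source, objc_storeStrong reads the old value, retains the new one, stores it, and releases the old one. The sketch below re-creates that logic in self-contained C++ with a mock refcounted object — MockObject and the stubbed objc_retain/objc_release are illustrative stand-ins, not the real runtime types:

```cpp
#include <cassert>

// Mock stand-ins for the Objective-C runtime, for illustration only.
struct MockObject { int refcount = 1; };
inline void objc_retain(MockObject *obj)  { if (obj) ++obj->refcount; }
inline void objc_release(MockObject *obj) { if (obj) --obj->refcount; }

// The logic of objc_storeStrong, paraphrased from the objc4 source.
inline void objc_storeStrong(MockObject **location, MockObject *obj) {
    MockObject *prev = *location;  // (1) unsynchronized read of the old value
    if (obj == prev) return;
    objc_retain(obj);              // (2) retain the new value
    *location = obj;               // (3) store the new value
    objc_release(prev);            // (4) release the old value
    // If two threads both execute step (1) before either reaches (4),
    // both see the same `prev`, and the old object is released twice.
}
```

With a single thread the bookkeeping is correct — storing a new object bumps its count and drops the old one; the danger lies entirely in the interleaving of steps (1) and (4) across threads.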
This code performs several distinct operations, including memory reads/writes and reference-count updates, so there are many interleaving points under multi-threaded execution. The most typical failure: two threads both read *location into prev and then proceed independently, so the same old object is released twice, leaving a dangling pointer.
Next, let’s see what the generated code looks like in the same scenario when the property is changed to atomic.
As you can see, the key call becomes objc_setProperty_atomic, whose implementation can also be found in the objc4 source.
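Below is a simplified C++ paraphrase of what the runtime does for an atomic setter. The real code uses a global StripedMap of spinlocks indexed by the slot address; the mutex table, the hash, and the mock types here are illustrative stand-ins rather than the actual runtime implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct MockObject { int refcount = 1; };
inline void objc_retain(MockObject *obj)  { if (obj) ++obj->refcount; }
inline void objc_release(MockObject *obj) { if (obj) --obj->refcount; }

// Stand-in for the runtime's global StripedMap<spinlock_t>: a fixed
// table of locks chosen by hashing the slot address, so different
// properties usually hit different locks and contention stays low.
inline std::mutex &lockForSlot(MockObject **slot) {
    static std::mutex propertyLocks[64];
    auto h = reinterpret_cast<std::uintptr_t>(slot);
    return propertyLocks[(h >> 4) % 64];
}

// The atomic setter, paraphrased: retain the new value, swap it into
// the slot under a per-slot lock, release the old value *outside*
// the critical section.
inline void setPropertyAtomic(MockObject **slot, MockObject *newValue) {
    objc_retain(newValue);
    MockObject *oldValue;
    {
        std::lock_guard<std::mutex> guard(lockForSlot(slot));
        oldValue = *slot;   // reading the old value and ...
        *slot = newValue;   // ... writing the new one form one critical section
    }
    objc_release(oldValue); // safe unlocked: no other thread saw this old value
}
```

Because the read of the old value and the write of the new one sit in the same critical section, no two threads can ever come away holding the same old pointer.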
The runtime's fix is very simple: make reading the old value and writing the new pointer a single atomic step. The atomicity here comes from a spin lock, and to avoid serious lock contention under high concurrency a global StripedMap is used to stripe the locks — a very common optimization. A CAS operation could be used instead of locking, but whether that actually improves performance would have to be measured.
Why doesn’t the final objc_release need to be inside the lock's critical section? The problem with nonatomic is that multiple threads can read the same old value of the property and each release it. With atomic, reading the old value and writing the new one happen together, so no two threads can ever observe the same old value. Reference-count updates themselves are atomic, so once ownership is unambiguous, no extra locking is needed.
Reference counting support in other languages
The case in C++ (clang STL)
Since Objective-C solves this problem neatly with the atomic property, does a similar problem exist in C++? Let’s verify with code that writes a std::shared_ptr member, someProperty, from several threads at once.
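A minimal reproduction might look like the following — Widget and raceOnProperty are hypothetical names, and note that the concurrent writes are a genuine data race, so running this yields a crash or memory corruption rather than any defined result:

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical class with a shared_ptr member, analogous to a strong
// Objective-C property.
struct Widget {
    std::shared_ptr<int> someProperty;
};

// Two threads overwrite the same member with no synchronization.
// shared_ptr updates its reference counts atomically, but the member
// itself is unprotected: this is a data race, and in practice it
// crashes with a double free once both threads read the same old
// control-block pointer.
inline void raceOnProperty(Widget &w) {
    std::thread t1([&w] {
        for (int i = 0; i < 100000; ++i)
            w.someProperty = std::make_shared<int>(1);  // racy write
    });
    std::thread t2([&w] {
        for (int i = 0; i < 100000; ++i)
            w.someProperty = std::make_shared<int>(2);  // racy write
    });
    t1.join();
    t2.join();
}
```

Single-threaded assignment to someProperty is of course fine; raceOnProperty exists only to trigger the crash and should never be expected to produce a defined outcome.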
Reading and writing the someProperty field simultaneously from multiple threads also crashes, which shows that nonatomic in Objective-C is not a de-optimization of some safe default: just like @synchronized, atomic is an additional capability Objective-C provides for this multi-threaded scenario.
The cause of the crash in C++ is very similar to nonatomic in Objective-C, so let’s look at what happens when we assign to someProperty. Writing a small assignment function and inspecting the generated assembly shows that the assignment compiles down to a call into std::shared_ptr's assignment operator.
Since C++ objects support operator=, a simple assignment expression is really a function call, which appears inlined in the generated code. And std::move is merely a cast that does not touch the value, so we can analyze the key method directly: after template expansion, the symbol corresponds to std::shared_ptr's move assignment operator.
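The pattern, paraphrased from libc++, is "construct a temporary from the right-hand side, then swap it with *this"; the old value of *this ends up in the temporary and dies at the end of the full expression. To keep the sketch runnable, MyShared below is an illustrative stand-in that wraps std::shared_ptr while reproducing the same structure and naming style:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// A minimal wrapper reproducing the shape of libc++'s shared_ptr
// move assignment. MyShared is an illustrative stand-in, not the
// real class.
template <class _Tp>
struct MyShared {
    std::shared_ptr<_Tp> __inner_;  // stands in for __ptr_/__cntrl_

    MyShared() = default;
    MyShared(MyShared &&) = default;
    explicit MyShared(std::shared_ptr<_Tp> p) : __inner_(std::move(p)) {}

    void swap(MyShared &__r) noexcept { __inner_.swap(__r.__inner_); }

    // The key method: build a temporary from __r, then swap it with
    // *this. `this` points at the variable holding the OLD value --
    // the one thing two racing writers can share.
    MyShared &operator=(MyShared &&__r) noexcept {
        MyShared(std::move(__r)).swap(*this);
        return *this;
    }
};
```

When the temporary goes out of scope it destroys whatever *this used to hold, which is exactly where an unsynchronized double release can originate.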
This method appears to do a lot, but only one thing matters for us: the this pointer. As mentioned at the beginning of the article, when two threads perform this operation at the same time, the only thing they can share is the this pointer to the variable holding the old value. Let’s continue down the call chain.
Below that sits shared_ptr::swap, which performs two swap operations — one for the stored object pointer and one for the control-block pointer — and both are ordinary, non-atomic pointer swaps.
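Paraphrased from libc++ (SimpleShared, ControlBlock, and the double-underscore member names are illustrative stand-ins for the real layout), the two swaps look like this, with std::swap itself reduced to three plain moves:

```cpp
#include <cassert>
#include <utility>

struct ControlBlock;  // stands in for libc++'s reference-count block

// Paraphrase of shared_ptr's layout and swap: two raw pointers, each
// exchanged with an ordinary, unsynchronized std::swap.
struct SimpleShared {
    int *__ptr_ = nullptr;             // the managed object
    ControlBlock *__cntrl_ = nullptr;  // the reference-count block
    void swap(SimpleShared &__r) noexcept {
        std::swap(__ptr_,   __r.__ptr_);
        std::swap(__cntrl_, __r.__cntrl_);
    }
};

// std::swap itself is just three moves; for a raw pointer these are
// three plain memory operations with no atomicity:
//
//   template <class _Tp>
//   void swap(_Tp &__x, _Tp &__y) {
//       _Tp __t(std::move(__x));
//       __x = std::move(__y);  // <-- two racing threads can both read
//       __y = std::move(__t);  //     the same old value of __y here
//   }
```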
Now consider two threads calling the method above at the same time, with __x holding the new value and __y the old one: in the step that assigns __y into __x, both threads may read the same old value. When the call stack then unwinds, that old value is released twice through RAII.
Because of language differences, C++ does not perform exactly the same sequence of operations as Objective-C during this variable swap. The fundamental problem, however, is identical: the same object is freed multiple times, because reading the old value and writing the new one are not a single atomic operation.
How to fix
Attempt 1
The most obvious approach is to protect the property assignment with a std::mutex.
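A sketch of this attempt, using a hypothetical Widget class with a someProperty field and a setProperty method:

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Attempt 1 (hypothetical names): guard every assignment to
// someProperty with a mutex, so no two threads can read the same
// old value.
struct Widget {
    std::mutex lock;
    std::shared_ptr<int> someProperty;

    void setProperty(std::shared_ptr<int> val) {
        std::lock_guard<std::mutex> guard(lock);
        someProperty = std::move(val);
        // NOTE: if this write drops the last reference to the old
        // value, the old value is destroyed right here -- *inside*
        // the critical section.
    }
};
```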
This introduces a minor performance problem, though: if the old value of someProperty is uniquely referenced, the assignment destroys it while the lock is still held, stretching the critical section by the cost of an arbitrary destructor.
Attempt 2
We can optimize this away by first moving the old value into a temporary variable and destroying the temporary outside the lock; swap gives us exactly that.
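A sketch of this second attempt, again with hypothetical Widget/someProperty names:

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Attempt 2 (hypothetical names): atomically swap the new value in,
// then let the old value die outside the lock.
struct Widget {
    std::mutex lock;
    std::shared_ptr<int> someProperty;

    void setProperty(std::shared_ptr<int> val) {
        std::shared_ptr<int> temp(std::move(val)); // temp takes over the new value
        {
            std::lock_guard<std::mutex> guard(lock);
            temp.swap(someProperty);               // temp now holds the OLD value
        }
        // temp (the old value) is destroyed here, outside the lock,
        // mirroring what the atomic property setter does.
    }
};
```

The critical section now contains only two pointer swaps; the potentially expensive destruction of the old value happens after the lock is released.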
This achieves an effect similar to Objective-C's atomic: first atomically swap the new value in, then release the old value outside the lock. Note the role of C++ move semantics: the temporary in the first line takes over the contents of val, so after the swap temp holds what val held before and val is left empty. When the function scope exits, both temp and val are destructed, but destructing the emptied val is a no-op. With compiler optimizations on, most of these shared_ptr operations are inlined and the cost shrinks further.
The situation in Rust
To better answer the question in the article title, let's bring in Rust for comparison and see how the same scenario is handled there.
First we construct the same logic: using scoped threads, we spawn two threads whose closures both capture a mutable reference to the same object and assign to its field.
Compilation fails with an error: obj is mutably borrowed more than once, which Rust does not allow.
How can the compiler tell that the closure is still holding its captured borrows after the spawn call returns? Look at the implementation of Scope::spawn in the standard library.
As you can see, the closure F is required to live as long as the Scope itself, meaning the variables it captures stay borrowed until the Scope is torn down. Allowing only a single mutable reference is another core principle of Rust; this restriction rules out racing accesses, among other problems.
Since multiple mutable references are forbidden, what about constructing multiple immutable references instead? Can we use interior mutability (say, a Cell) to get what we need?
The answer is no: Cell does not implement Sync, so a reference to any type containing a Cell is not Send, and such values simply cannot cross a thread boundary. Interestingly, it is worth looking at how Cell::set is implemented.
Its implementation has the same shape as the shared_ptr swap in C++: read the old value, store the new one, drop the old one. Without lock protection, two threads could read the same old value and drop it twice.
How to fix
The fix is again very simple: for the multi-threaded scenario we reach straight for Mutex and change the field's type accordingly.
The update operation should likewise be a swap inside the lock followed by a drop outside the lock.
Rust's Mutex has an excellent design: each Mutex explicitly owns a value, and a value that is read or written from multiple threads must be reached through one. Any type that implements Send becomes Sync once wrapped in a Mutex. Objects with interior mutability (e.g. Arc's reference count) can appear unprotected when used across threads, but there thread safety is the responsibility of the object itself.
Why doesn’t Mutex make all objects Sync?
For !Send types (e.g. Rc), which generally represent some shared resource, the type simply was not designed with multi-threaded use in mind. If an Rc were moved to another thread, two threads could easily drop handles to it at the same time and corrupt the non-atomic reference count.
In addition, mem::swap together with the single-mutable-borrow principle guarantees that any context in which a swap can be performed is already thread-safe; without unsafe, we cannot write a racy swap at all.
So with Rust in hand, we can better understand the issue raised in the article's title. Broken down, Mutex<Arc<T>> involves two thread-safety guarantees:

- the guarantee by Arc itself that reference-count modifications are atomic, implemented with atomic operations;
- the protection by Mutex of modifications to the Arc pointer, preventing the Arc from being released multiple times due to stale values in multi-threaded operations.
In other words, whether the reference-counting mechanism itself is thread-safe is a separate question from whether it is safe to manipulate the same property of the same object from multiple threads.
Summary
On the surface, this article analyzes how the reference-counting mechanisms of several systems programming languages (strictly speaking, Objective-C may not qualify as one) behave under multiple threads. What it really illustrates, though, is the essence of thread safety: in the object model, an object being thread-safe does not mean that every scenario using the object is thread-safe. Code that is not itself thread-safe can still produce logic errors even when it only operates on thread-safe objects. The reference counting discussed here is just one example, and it happens to be one whose memory operations readily produce an obvious segfault.
There are other multi-threaded scenarios in daily development where missing thread safety is far less conspicuous, and those deserve our attention all the more.