Anyone who has used Objective-C will know that declaring a property as atomic does not solve the multi-threading problem for mutable objects. If that is the case, then what is the point of this attribute? In this article, we’ll compare several programming languages that support reference counting and talk about the “underlying logic” of this age-old topic.

As we know, atomic and nonatomic mainly affect properties of object types and have no effect on primitive types. For an object-type property declared nonatomic, you must ensure that no two threads read and write the property at the same time; otherwise the program may crash.

What is the difference between object types and primitive types when reading and writing properties? The answer is reference counting. Consider the following code.

@interface SomeObject : NSObject

@property (nonatomic, strong) NSObject *someProperty;

@end

Let’s look at the setter method the compiler generates for it (since it’s a generated method, only assembly code is shown here).

-[SomeObject setSomeProperty:]:
    pushq  %rbp
    movq   %rsp, %rbp
    subq   $0x20, %rsp
    movq   %rdi, -0x8(%rbp)
    movq   %rsi, -0x10(%rbp)
    movq   %rdx, -0x18(%rbp)
    movq   -0x18(%rbp), %rsi
    movq   -0x8(%rbp), %rdi
    addq   $0x8, %rdi
    callq  objc_storeStrong  ; Key Function
    addq   $0x20, %rsp
    popq   %rbp
    retq

We notice that for the nonatomic property, the compiler generates the same code as for a stack variable assignment: a call to the objc_storeStrong runtime function. Its implementation can be found in the objc source code, as follows.

void objc_storeStrong(id *location, id obj)
{
    id prev = *location;
    if (obj == prev) {
        return;
    }
    objc_retain(obj);
    *location = obj;
    objc_release(prev);
}

This code contains multiple operations, including memory reads/writes and reference count updates, so there are many points where multi-threaded execution can interleave. The most typical failure: both threads read location into prev and then perform the subsequent operations independently, so the same object is released multiple times, creating a dangling pointer.
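To make that interleaving concrete, here is a small C++ sketch (a toy model with names of my own, not the real runtime) that replays the dangerous schedule of objc_storeStrong sequentially: both “threads” read the old value before either releases it, so the count is decremented twice.

```cpp
#include <cassert>

// Toy model of a reference-counted object; not the real Objective-C runtime.
struct Obj { int refcount = 1; };

// Each "thread" performs the steps of objc_storeStrong on a shared slot.
// We replay one bad interleaving sequentially so the outcome is deterministic.
int simulate_nonatomic_race() {
    Obj old_value;                 // refcount == 1, owned by the slot
    Obj new1, new2;
    Obj* slot = &old_value;

    // Step 1 on both threads: read the current value of the slot.
    Obj* prev_t1 = slot;
    Obj* prev_t2 = slot;           // the same old value is observed twice

    // Step 2: each thread retains its new value and stores it.
    new1.refcount++; slot = &new1;
    new2.refcount++; slot = &new2; // thread 1's store is silently overwritten

    // Step 3: each thread releases the old value it saw.
    prev_t1->refcount--;
    prev_t2->refcount--;           // second release of the same object!

    return old_value.refcount;     // 1 - 2 == -1: over-released
}
```

With a real runtime the second release would free an already-freed object, which is exactly the dangling-pointer crash described above.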

Next, let’s see what code is generated in the same scenario when the property is changed to atomic.

-[SomeObject setSomeProperty:]:
    pushq  %rbp
    movq   %rsp, %rbp
    subq   $0x20, %rsp
    movq   %rdi, -0x8(%rbp)
    movq   %rsi, -0x10(%rbp)
    movq   %rdx, -0x18(%rbp)
    movq   -0x10(%rbp), %rsi
    movq   -0x8(%rbp), %rdi
    movq   -0x18(%rbp), %rdx
    movl   $0x8, %ecx
    callq  objc_setProperty_atomic  ; Key Function
    addq   $0x20, %rsp
    popq   %rbp
    retq

As you can see, the key function becomes objc_setProperty_atomic; its implementation can also be found in the source code.

void objc_setProperty_atomic(id self, SEL _cmd, id newValue, ptrdiff_t offset)
{
    reallySetProperty(self, _cmd, newValue, offset, true, false, false);
}

static inline void reallySetProperty(id self, SEL _cmd, id newValue, ptrdiff_t offset, bool atomic, bool copy, bool mutableCopy)
{
    if (offset == 0) {
        object_setClass(self, newValue);
        return;
    }

    id oldValue;
    id *slot = (id*) ((char*)self + offset);

    if (copy) {
        newValue = [newValue copyWithZone:nil];
    } else if (mutableCopy) {
        newValue = [newValue mutableCopyWithZone:nil];
    } else {
        if (*slot == newValue) return;
        newValue = objc_retain(newValue);
    }

    if (!atomic) {
        oldValue = *slot;
        *slot = newValue;
    } else {
        spinlock_t& slotlock = PropertyLocks[slot];
        slotlock.lock();
        oldValue = *slot;
        *slot = newValue;
        slotlock.unlock();
    }

    objc_release(oldValue);
}

The runtime’s solution is also very simple: it only needs to ensure that writing the new pointer and reading the old value happen as one atomic operation. The atomicity here is implemented with a spin lock, and to avoid severe lock contention under high concurrency, a global StripedMap is used as an optimization, a very common technique. A CAS operation could be used here instead of a lock, but whether that actually improves performance would need to be measured.
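The striping idea itself is easy to sketch. Below is a simplified C++ stand-in for the runtime’s PropertyLocks (a StripedMap of spinlocks), with invented names: the slot address is hashed into a small fixed pool of locks, so unrelated properties rarely contend for the same lock while the lock table stays bounded in size.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Simplified stand-in for StripedMap<spinlock_t>: a fixed pool of locks
// indexed by a hash of the slot address. Not the runtime's actual code.
class StripedLocks {
    static constexpr std::size_t kStripes = 64;
    std::mutex locks_[kStripes];
public:
    std::mutex& lock_for(const void* slot) {
        // Shift off low bits that are identical for all aligned pointers,
        // then pick a stripe. The same slot always maps to the same lock.
        auto h = reinterpret_cast<std::uintptr_t>(slot) >> 4;
        return locks_[h % kStripes];
    }
};

// An atomic property setter would lock only the stripe for its own slot:
//   std::lock_guard<std::mutex> guard(globalLocks.lock_for(slotAddress));
```

Because the mapping is per-address, two threads writing different properties usually take different locks, while two threads writing the same property always serialize on the same one.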

Why doesn’t the final objc_release need to be inside the lock’s critical section? The problem with nonatomic is that multiple threads can obtain the same old value of the property and release it concurrently; with atomic, setting the new value and taking out the old value happen together, so no two threads can obtain the same old value. Reference counting itself is an atomic operation, so with ownership this clear, no additional locking is needed.

Reference counting support in other languages

The case in C++ (clang STL)

Since Objective-C solves this problem perfectly with the atomic property, is there a similar problem in C++? Let’s also verify this using the following code.

struct SomeObject {
    std::shared_ptr<std::string> someProperty;
};

Reading and writing the someProperty field simultaneously from multiple threads also crashes, which means that nonatomic in Objective-C is not a performance optimization but the baseline behavior found elsewhere. Like @synchronized, atomic is an additional capability Objective-C provides to handle this multi-threaded scenario.

The cause of the crash in C++ is very similar to that of nonatomic in Objective-C, so let’s also look at what happens when we assign a value to someProperty. Here is an assignment function I wrote.

void writeProperty(SomeObject *obj, std::shared_ptr<std::string> &&val) {
    obj->someProperty = std::move(val);
}

The assembly code is as follows.

writeProperty:
    pushq  %rbp
    movq   %rsp, %rbp
    subq   $0x10, %rsp
    movq   %rdi, -0x8(%rbp)
    movq   %rsi, -0x10(%rbp)
    movq   -0x10(%rbp), %rdi
    callq  std::__1::move<std::__1::shared_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >&> at move.h:27
    movq   %rax, %rsi
    movq   -0x8(%rbp), %rdi
    ; Key Method:
    callq  std::__1::shared_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::operator= at shared_ptr.h:989
    addq   $0x10, %rsp
    popq   %rbp
    retq

Since C++ supports overloading operator= for object assignment, a simple assignment expression is actually a function call; the result shown here is after inlining. And std::move is just a cast that has no effect on the value itself, so we can analyze the key method directly. The symbol of this method is expanded from a template at compile time and actually corresponds to the following method of std::shared_ptr.

template<class _Tp>
inline
shared_ptr<_Tp>&
shared_ptr<_Tp>::operator=(shared_ptr&& __r) _NOEXCEPT
{
    shared_ptr(_VSTD::move(__r)).swap(*this);
    return *this;
}

This code seems to perform a lot of operations, but there is only one place we need to focus on: the this pointer. As mentioned at the beginning of the article, when two threads perform this operation at the same time, the only thing they can share is the this pointer, which points to the variable holding the old value. Let’s continue down the call chain.

template<class _Tp>
inline
void
shared_ptr<_Tp>::swap(shared_ptr& __r) _NOEXCEPT
{
    _VSTD::swap(__ptr_, __r.__ptr_);
    _VSTD::swap(__cntrl_, __r.__cntrl_);
}

There are two swap operations here; both are in fact ordinary pointer swaps, and neither is atomic.

template <class _Tp>
inline _LIBCPP_INLINE_VISIBILITY __swap_result_t<_Tp> _LIBCPP_CONSTEXPR_AFTER_CXX17 swap(_Tp& __x, _Tp& __y)
    _NOEXCEPT_(is_nothrow_move_constructible<_Tp>::value&& is_nothrow_move_assignable<_Tp>::value) {
  _Tp __t(_VSTD::move(__x));
  __x = _VSTD::move(__y);
  __y = _VSTD::move(__t);
}

Let’s consider two threads calling the above method at the same time, with __x being the new value and __y being the old value: the step __x = _VSTD::move(__y) can yield the same old value in both threads. Then, as the call stack unwinds, that old value will be released twice due to RAII, in the following code.

template<class _Tp>
inline
shared_ptr<_Tp>&
shared_ptr<_Tp>::operator=(shared_ptr&& __r) _NOEXCEPT
{
    shared_ptr(_VSTD::move(__r)).swap(*this);
    return *this;
    // Temporary variables exit the scope and the old values they represent are released.
}

From this we can see that, owing to language differences, C++ does not perform exactly the same operations as Objective-C when swapping the variable. But the fundamental problem is identical: the same object is freed multiple times, because getting the old value and writing the new value are not one atomic operation.
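The bad interleaving of std::swap’s three pointer moves can be replayed deterministically (again a toy model with raw pointers of my own, not actual shared_ptr internals): both threads copy the shared slot into their local temporary before either writes it back, so both temporaries end up holding the same old pointer.

```cpp
#include <cassert>

// Model: `slot` is the member pointer (__y in the text); each thread brings
// a new pointer (__x). std::swap is: t = x; x = y; y = t. We interleave two
// threads' steps by hand so the outcome is deterministic.
bool simulate_swap_race() {
    int oldv = 0, newv1 = 1, newv2 = 2;
    int* slot = &oldv;            // shared old value (__y)
    int* x1 = &newv1;             // thread 1's new value (__x)
    int* x2 = &newv2;             // thread 2's new value (__x)

    int* t1 = x1;                 // thread 1: __t = __x
    int* t2 = x2;                 // thread 2: __t = __x
    x1 = slot;                    // thread 1: __x = __y  (reads old)
    x2 = slot;                    // thread 2: __x = __y  (reads the SAME old)
    slot = t1;                    // thread 1: __y = __t
    slot = t2;                    // thread 2: __y = __t  (thread 1's value is lost)

    // Both threads' temporaries now hold the old pointer; when each temporary
    // shared_ptr is destroyed by RAII, the same object is released twice.
    return x1 == x2 && x1 == &oldv;
}
```

Note the second defect visible in this schedule: thread 1’s new value is overwritten in the slot without ever being released, so besides the double free there is also a leak.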

How to fix

Attempt 1

The most obvious approach is to protect the property assignment with std::mutex.

struct SomeObject {
    std::mutex fieldLock;
    std::shared_ptr<std::string> someProperty;
};

void writeProperty(SomeObject *obj, std::shared_ptr<std::string> &&val) {
    std::unique_lock<std::mutex> lock(obj->fieldLock);
    obj->someProperty = std::move(val);
}

This introduces a minor performance problem, though: if the old value of someProperty is uniquely referenced, it will be released inside the lock’s scope after the assignment, lengthening the critical section.

Attempt 2

We can avoid this potential performance problem by first constructing a temporary variable to take over the old value and destroying the temporary outside the lock. Here, too, the operation can be implemented with swap.

void writeProperty(SomeObject *obj, std::shared_ptr<std::string> &&val) {
    std::shared_ptr<std::string> temp(std::move(val));

    std::unique_lock<std::mutex> lock(obj->fieldLock);
    temp.swap(obj->someProperty);
    lock.unlock();
}

In this way we achieve an effect similar to Objective-C’s atomic: first atomically swap the new value with the old one, then release the old value outside the lock. It is worth noting that thanks to C++ move semantics, the temporary in the first line effectively takes over val’s contents, so after construction temp holds what val held before and val is left empty. When the function scope exits, both temp and val are destructed, but destructing the now-empty val is a no-op. With compiler optimization enabled, many shared_ptr operations are inlined and performance improves further.
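Assuming the definitions from Attempt 2 above, a quick stress harness (the Tracked alive-counter is my own instrumentation, not part of the original code) shows that with the lock in place, concurrent writers no longer double-release: every value constructed is destroyed exactly once.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Counts live objects so we can detect leaks or double frees indirectly.
static std::atomic<int> alive{0};
struct Tracked : std::string {
    Tracked(const char* s) : std::string(s) { ++alive; }
    ~Tracked() { --alive; }
};

struct SomeObject {
    std::mutex fieldLock;
    std::shared_ptr<Tracked> someProperty;
};

// Same shape as "Attempt 2": swap under the lock, release outside it.
void writeProperty(SomeObject* obj, std::shared_ptr<Tracked>&& val) {
    std::shared_ptr<Tracked> temp(std::move(val));
    std::unique_lock<std::mutex> lock(obj->fieldLock);
    temp.swap(obj->someProperty);
    lock.unlock();
    // `temp` (now holding the old value) is destroyed here, outside the lock.
}

int run_stress() {
    {
        SomeObject obj;
        std::vector<std::thread> workers;
        for (int t = 0; t < 8; ++t)
            workers.emplace_back([&obj] {
                for (int i = 0; i < 1000; ++i)
                    writeProperty(&obj, std::make_shared<Tracked>("value"));
            });
        for (auto& w : workers) w.join();
    }   // obj destroyed: the last stored value is released here
    return alive.load();  // 0 means every object was released exactly once
}
```

Run the same harness with the mutex removed and, on most platforms, it will sooner or later crash or report a non-zero count, which is the nonatomic behavior analyzed earlier.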

The situation in Rust

To better answer the question in the article title, let us bring in Rust for comparison and see how the same scenario is handled there.

First we construct the code for the same logic.

use std::sync::Arc;
use std::thread;

struct SomeObject {
    some_field: Arc<String>,
}

fn make_shared_string() -> Arc<String> {
    Arc::new("this is a string".to_owned())
}

#[test]
fn test() {
    let mut obj = SomeObject {some_field: make_shared_string()};
    thread::scope(|s| {
        for _ in 0..12 {
            s.spawn(|| {
                obj.some_field = make_shared_string();
            });
        }
    });
}

Compilation fails with an error: obj is mutably borrowed more than once, which is not allowed in Rust.

How does the compiler determine that the closure still holds its captured variables after spawn returns? Let’s look at the implementation of Scope::spawn in the standard library.

#[stable(feature = "scoped_threads", since = "1.63.0")]
pub fn spawn<F, T>(&'scope self, f: F) -> ScopedJoinHandle<'scope, T>
where
    F: FnOnce() -> T + Send + 'scope,
    T: Send + 'scope,
{
    Builder::new().spawn_scoped(self, f).expect("failed to spawn thread")
}

As you can see, the closure F has the same lifetime as the Scope itself, meaning the variables it captures live until the Scope is destroyed. Allowing only a single mutable reference is another important principle of Rust; this restriction prevents racing accesses and several other problems.
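C++ has no compile-time counterpart to this lifetime check, but the discipline it encodes can be illustrated (a minimal sketch of my own): threads that capture locals by reference must be joined before those locals go out of scope, and in C++ nothing but convention enforces that.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// C++ analogue (by convention only) of Rust's scoped threads: the spawned
// threads borrow `data` and `partial` by reference, so they must be joined
// before this function returns. Forgetting to join would be a dangling
// reference that no compiler error catches.
int sum_in_parallel(const std::vector<int>& data) {
    int partial[2] = {0, 0};
    auto mid = data.begin() + data.size() / 2;
    std::thread t0([&] { partial[0] = std::accumulate(data.begin(), mid, 0); });
    std::thread t1([&] { partial[1] = std::accumulate(mid, data.end(), 0); });
    t0.join();  // join before the captured locals go out of scope
    t1.join();
    return partial[0] + partial[1];
}
```

Rust’s thread::scope turns this join-before-scope-exit convention into a guarantee checked by the borrow checker.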

Since we can’t have multiple mutable references, can we construct multiple immutable references instead and use “interior mutability” to achieve what we need?

struct SomeObject {
    some_field: Cell<Arc<String>>,
}

The answer is no: Cell does not implement Sync, so references to types containing a Cell do not implement Send, and such values naturally cannot cross thread boundaries. Interestingly, when we look at the implementation of Cell::set, we see that

impl<T> Cell<T> {
    // ...

    #[inline]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn set(&self, val: T) {
        let old = self.replace(val);
        drop(old);
    }

    #[stable(feature = "move_cell", since = "1.17.0")]
    pub fn replace(&self, val: T) -> T {
        // SAFETY: This can cause data races if called from a separate thread,
        // but `Cell` is `!Sync` so this won't happen.
        mem::replace(unsafe { &mut *self.value.get() }, val)
    }

    // ...
}

This implementation is the same as that of shared_ptr’s swap in C++: get the old value, set the new value, destroy the old value. Without lock protection, the old value could likewise be released twice.

How to fix

The fix is also very simple: for the multi-threaded scenario we simply use a Mutex, modifying the field type.

struct SomeObject {
    some_field: Mutex<Arc<String>>,
}

The operation to update the field should likewise be a swap inside the lock plus a drop outside the lock.

// ...
s.spawn(|| {
    for _ in 0..1000 {
        let mut new_value = make_shared_string();
        {
            let mut some_field_guard = obj.some_field.lock().unwrap();
            std::mem::swap(&mut new_value, &mut *some_field_guard);
        } // the guard (and the lock) is released here
        // `new_value` now holds the old value and is dropped outside the lock
    }
});

Rust’s Mutex has a very good design: each Mutex is explicitly bound to the value it protects. For a value to be read and written from multiple threads, it must be protected by a Mutex, and any type that implements Send becomes Sync once wrapped in one. Objects that are internally thread-safe (such as Arc, whose counts are atomic) may be used across threads without extra protection, because there thread safety is the responsibility of the object itself.

Why doesn’t Mutex make all objects Sync?

!Send types (e.g. Rc) generally represent some shared resource whose implementation does not account for multi-threaded use. If an Rc were moved to a different thread, two threads could easily drop their Rc clones at the same time, leaving the non-atomic reference count inconsistent.

In addition, mem::swap together with the single-mutable-borrow principle ensures that swap can only be performed in contexts where thread safety is already guaranteed; we simply cannot write an unsafe swap.

So with Rust, we can better understand the issue raised in the article’s title. Broken down, Mutex<Arc<T>> involves two thread-safety guarantees.

  1. Arc’s own guarantee that modifications to the reference count are atomic, implemented here with atomic operations.
  2. The Mutex’s protection of modifications to the Arc pointer itself, preventing multiple releases of an Arc caused by stale values in multi-threaded operations.

That is, whether the reference counting mechanism itself is thread-safe has nothing to do with whether manipulating the same property of the same object from multiple threads is safe.

Summary

On the surface, this article analyzes how the reference counting mechanisms of several systems programming languages (strictly speaking, Objective-C is not one) behave under multiple threads. What it really illustrates is the essence of thread safety: in the object model, an object being thread-safe does not mean that every scenario in which it is used is thread-safe. Code that is not itself thread-safe can still produce logic errors even when it only operates on thread-safe objects. Reference counting is just one example, and it happens to involve memory operations that readily lead to obvious segfaults.

There are other multi-threaded scenarios in daily development where the lack of thread safety is even less noticeable, and these are all the more worthy of our attention.