Anyone familiar with Lua knows that Lua allows you to modify and delete elements in a
for ... pairs loop. pairs` loops to modify and delete elements in a table. There is nothing wrong with the following code:
However, if we delete elements and add new elements while traversing, there will be a problem. For example:
Running the above code will give you this error:
Anyone familiar with Lua will be familiar with this error. The solution is to avoid adding new elements to the table during the traversal. The official Lua documentation also makes it clear: if any value is assigned to a field in the table that does not exist during the traversal, the behavior of
next is undefined. But the error itself is interesting, as it involves many aspects of Lua’s mechanics, and I think it’s a good opportunity to understand Lua’s internal implementation.
Why is it interesting? First of all this error doesn’t always occur (the documentation also says “behavior undefined”), for example the following code doesn’t have this problem:
On the other hand, we know that the
for ... pairs loop essentially calls
next at each iteration, passing in the table and the key of the previous iteration to get the key-value pair for the current loop.pairs loop is essentially calling
next at each iteration, passing in the table and the key of the previous iteration, and getting the key-value pair of the current loop. The meaning of the
next(t, k) function is that, for a given table
t, it returns the next key-value pair adjacent to the given key
k is the last element of
nil is returned; if
nil, then the first element of
t is returned. What if
k is not in
t? Naturally, an error should be reported:
The familiar error. In that case, the code at the beginning of this article should also report an error, because when
v equals 1, the corresponding key is deleted, causing the next iteration to call
next to try to find the next element for a key that does not exist in the table. But not only does it work, the following code also works:
next call returns normally, as if
a were still in
t. The error occurs when we delete and then insert a new element:
Of course there’s more fun, to get this error, you don’t have to insert a new element, sometimes it comes out with a GC:
It’s written in a strange way to accommodate certain Lua mechanisms. If you change the above code a bit, for example by changing
next(t, 'a'...' b') to
next(t, 'ab'), the error will not appear.
Implementation of next function
k should exist in
t; but it seems to work if
k has just been deleted. How can this be done? Let’s look at the implementation of the
You can see that
luaH_next first calls
findindex to find the position of a given
key in table
t, and then looks backwards from that position to the next element. In
findindex, several things are done:
- Line 5 determines that if
nil, it returns 0;
- Lines 6 to 8, if
keyis a positive integer and is in the array field, return the array index;
- Otherwise, look in the hash field. First line 11 gets the primary location of
- In the loop from line 12 to 25, lines 14 to 16 check if the value of the current position is equal to
key, if so, the lookup succeeds, line 19 returns directly; otherwise, line 21 checks the next position;
- If the node equal to
keyis not found until the end of the chain, the search fails and an error
invalid key to 'next'is thrown on line 23.
As we saw earlier, calling
next immediately after an element is set to
nil, returns it normally. This means that
findindex has successfully found the position of the element. Notice the comment on line 13: “key may be dead already, but it is ok to use it in ’next’”, which means that a deleted key is fine for
next? With this in mind, let’s first see what happens when we execute
t[k] = nil.
We know that
t[k] = nil executes the
OP_SETTABLE instruction, so we find the
lvm.c where the
luaV_execute function executes
OP_SETTABLE, which is lines 27 to 32 above, and see that it does several things:
- Get the operands,
rcfor table, key, and value, respectively;
- Line 30 calls the macro
- line 15
settableProtectedfirst tries to call
tis not a table (line 4), or
kis not in
t(line 7), and returns 0. Otherwise, the value is assigned directly in line 9;
luaV_finishsetis called instead at line 16.
When we delete an element using
t[k] = nil,
k is obviously in
t, so it will execute to line 9, assigning a value directly to
slot. Notice that
slot is actually the return value of
luaH_get, which will be equal to the value of
k. So when
t[k] = nil is executed, it will only assign the value of
nil, and will not change the key. This also gives the
next function an opportunity to do so.
Let’s go back to
luaH_next . It does this in
to determine if the current position is equal to
key. If our key is not a GCObject, such as a number or a boolean, it can be accessed even if it is deleted, and
luaV_rawequalobj always gives the correct result; even if the key is a GCObject, it can be compared properly as long as it is not GC’d. However, if the position of the key is occupied by an element inserted later, the lookup will fail. This is why it is possible to get an error like “invalid key to ’next’” when traversing both deleted and new elements.
When using the
for ... pairs loop, you don’t have to worry about the key being GC’d, because the iterator always holds a reference to the current key. So in a
for ... pairs loop, you can safely perform the delete operation.
Lua’s GC mechanism
(ttisdeadkey(gkey(n)) && iscollectable(key) && deadvalue(gkey(n)) == gcvalue(key)) is used to handle the case when the key is GC’d. If the current location is a deadkey, and
key is a GCObject, compare
deadvalue(gkey(n)) == gcvalue(key) , i.e. compare the address of the key at the current location (i.e.
gkey(n) ) and the incoming
key to see if they are equal. So what is deadkey? This brings us to Lua’s GC mechanism.
Lua GC works by stringing all the GCObjects into a chain; then periodically scanning and marking all the objects in the VM, and the ones that are not marked are the ones that need to be recycled. Let’s look at the function that scans the table:
To be precise, this is a function that scans non-weak tables. This function is relatively straightforward, and it does the following:
- Lines 6 to 7 scan the array fields, all marked;
- Lines 8 to 17 scan the hash field, where:
- lines 10 to 11 check if the value of the current position is
nil, and if so, call
- Otherwise lines 13 to 15 mark the key and value of the current location.
- lines 10 to 11 check if the value of the current position is
Objects that are not marked are reclaimed, so if the value of an element is
nil, it will be reclaimed in this GC. But before it is reclaimed,
removeentry is also called, and what does it do?
It will mark the key of the element as dead. Note that this marker actually sets the object’s type
tt_ field to
LUA_TDEADKEY . This way when calling
luaV_rawequalobj to compare a deadkey with a regular object it always returns false. This way when
deadvalue(gkey(n)) == gcvalue(key) is executed, the key of element
n must have been GC’d.
For table, function, thread, userdata, two objects are equal if and only if their addresses are equal. When an object is GC’d, it doesn’t have any references, so it can’t be passed into
This is not the case with strings, where the equality of two strings depends on the content of the string, not on its address. So when a string is GC’d, we can still construct a string that is equal to it, but with a different address. Passing the newly constructed string into
next, will cause an error. In the previous example, the reason for using
'a'...' b'' is because we want him to construct a different object. Just change any of the ''a'...' b' anywhere in it, then the string
ab' will be in the constant table, so it won’t be GC’d; Lua has a short string reuse mechanism, so
'a'...' b' and
'ab' are actually the same object, so this error will not occur.
I was surprised when I first discovered that
next could pass in a key that had just been deleted, because a deleted key should be something like a wild pointer and should not be used anymore. But after a closer look, everything is fine. This made me feel the subtlety of Lua design once again. This little problem involves the Lua table implementation, the GC mechanism, the constant table mechanism, and short string reuse. It is rather boring to gnaw on Lua source code, so we can explore Lua implementation from some interesting phenomena.