Anyone familiar with Lua knows that Lua allows you to modify and delete elements in a `for ... pairs` loop. pairs` loops to modify and delete elements in a table. There is nothing wrong with the following code:

 ``````1 2 3 4 5 6 `````` ``````local t = {a = 1, b = 2, c = 3} for k, v in pairs(t) do if v == 1 then t[k] = nil end end ``````

However, if we delete elements and add new elements while traversing, there will be a problem. For example:

 ``````1 2 3 4 5 6 7 `````` ``````local t = {a = 1, b = 2, c = 3} for k, v in pairs(t) do if v == 1 then t[k] = nil t[k .. 1] = v + 1 end end ``````

Running the above code will give you this error:

 ``````1 2 3 4 5 `````` ``````invalid key to 'next' stack traceback: [C]: in function 'next' stdin:1: in main chunk [C]: in ? ``````

Anyone familiar with Lua will be familiar with this error. The solution is to avoid adding new elements to the table during the traversal. The official Lua documentation also makes it clear: if any value is assigned to a field in the table that does not exist during the traversal, the behavior of `next` is undefined. But the error itself is interesting, as it involves many aspects of Lua’s mechanics, and I think it’s a good opportunity to understand Lua’s internal implementation.

## Puzzling behavior

Why is it interesting? First of all this error doesn’t always occur (the documentation also says “behavior undefined”), for example the following code doesn’t have this problem:

 ``````1 2 3 4 5 6 7 `````` ``````local t = {a = 1, b = 2, c = 3, d = 4, e = 5, f = 6} for k, v in pairs(t) do if v == 1 then t[k] = nil t[k .. 1] = v + 1 end end ``````

On the other hand, we know that the `for ... pairs` loop essentially calls `next` at each iteration, passing in the table and the key of the previous iteration to get the key-value pair for the current loop.pairs loop is essentially calling `next` at each iteration, passing in the table and the key of the previous iteration, and getting the key-value pair of the current loop. The meaning of the `next(t, k)` function is that, for a given table `t`, it returns the next key-value pair adjacent to the given key `k`. If `k` is the last element of `t`, then `nil` is returned; if `k` is `nil`, then the first element of `t` is returned. What if `k` is not in `t`? Naturally, an error should be reported:

 ``````1 2 3 4 5 6 `````` ``````> next({a = 1, b = 2}, 'c') invalid key to 'next' stack traceback: [C]: in function 'next' stdin:1: in main chunk [C]: in ? ``````

The familiar error. In that case, the code at the beginning of this article should also report an error, because when `v` equals 1, the corresponding key is deleted, causing the next iteration to call `next` to try to find the next element for a key that does not exist in the table. But not only does it work, the following code also works:

 ``````1 2 3 4 `````` ``````local t = {a = 1, b = 2} print(next(t, 'a')) -- b 2 t.a = nil print(next(t, 'a')) -- b 2 ``````

The second `next` call returns normally, as if `a` were still in `t`. The error occurs when we delete and then insert a new element:

 ``````1 2 3 4 5 `````` ``````local t = {a = 1, b = 2} print(next(t, 'a')) -- b 2 t.a = nil t.a1 = 2 print(next(t, 'a')) -- error: invalid key to 'next' ``````

Of course there’s more fun, to get this error, you don’t have to insert a new element, sometimes it comes out with a GC:

 ``````1 2 3 4 5 6 7 8 9 `````` ``````local k = 'a'..'b' local t = { a = 1, [k] = 2, } t[k] = nil k = nil collectgarbage("collect") next(t, 'a'..'b') -- error: invalid key to 'next' ``````

It’s written in a strange way to accommodate certain Lua mechanisms. If you change the above code a bit, for example by changing `next(t, 'a'...' b')` to `next(t, 'ab')`, the error will not appear.

## Implementation of next function

When calling `next(t, k)`, `k` should exist in `t`; but it seems to work if `k` has just been deleted. How can this be done? Let’s look at the implementation of the `next` function:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 `````` ``````/* ltable.c */ static unsigned int findindex (lua_State *L, Table *t, StkId key) { unsigned int i; if (ttisnil(key)) return 0; /* first iteration */ i = arrayindex(key); if (i != 0 && i <= t->sizearray) /* is 'key' inside array part? */ return i; /* yes; that's the index */ else { int nx; Node *n = mainposition(t, key); for (;;) { /* check whether 'key' is somewhere in the chain */ /* key may be dead already, but it is ok to use it in 'next' */ if (luaV_rawequalobj(gkey(n), key) || (ttisdeadkey(gkey(n)) && iscollectable(key) && deadvalue(gkey(n)) == gcvalue(key))) { i = cast_int(n - gnode(t, 0)); /* key index in hash table */ /* hash elements are numbered after array ones */ return (i + 1) + t->sizearray; } nx = gnext(n); if (nx == 0) luaG_runerror(L, "invalid key to 'next'"); /* key not found */ else n += nx; } } } int luaH_next (lua_State *L, Table *t, StkId key) { unsigned int i = findindex(L, t, key); /* find original element */ for (; i < t->sizearray; i++) { /* try first array part */ if (!ttisnil(&t->array[i])) { /* a non-nil value? */ setivalue(key, i + 1); setobj2s(L, key+1, &t->array[i]); return 1; } } for (i -= t->sizearray; cast_int(i) < sizenode(t); i++) { /* hash part */ if (!ttisnil(gval(gnode(t, i)))) { /* a non-nil value? */ setobj2s(L, key, gkey(gnode(t, i))); setobj2s(L, key+1, gval(gnode(t, i))); return 1; } } return 0; /* no more elements */ } ``````

You can see that `luaH_next` first calls `findindex` to find the position of a given `key` in table `t`, and then looks backwards from that position to the next element. In `findindex`, several things are done:

• Line 5 determines that if `key` is `nil`, it returns 0;
• Lines 6 to 8, if `key` is a positive integer and is in the array field, return the array index;
• Otherwise, look in the hash field. First line 11 gets the primary location of `key`;
• In the loop from line 12 to 25, lines 14 to 16 check if the value of the current position is equal to `key`, if so, the lookup succeeds, line 19 returns directly; otherwise, line 21 checks the next position;
• If the node equal to `key` is not found until the end of the chain, the search fails and an error `invalid key to 'next'` is thrown on line 23.

As we saw earlier, calling `next` immediately after an element is set to `nil`, returns it normally. This means that `findindex` has successfully found the position of the element. Notice the comment on line 13: “key may be dead already, but it is ok to use it in ’next’”, which means that a deleted key is fine for `next`? With this in mind, let’s first see what happens when we execute `t[k] = nil`.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 `````` ``````/* lvm.h */ #define luaV_fastset(L,t,k,slot,f,v) \ (!ttistable(t) \ ? (slot = NULL, 0) \ : (slot = f(hvalue(t), k), \ ttisnil(slot) ? 0 \ : (luaC_barrierback(L, hvalue(t), v), \ setobj2t(L, cast(TValue *,slot), v), \ 1))) /* lvm.c */ #define settableProtected(L,t,k,v) { const TValue *slot; \ if (!luaV_fastset(L,t,k,slot,luaH_get,v)) \ Protect(luaV_finishset(L,t,k,v,slot)); } void luaV_execute (lua_State *L) { ... /* main loop of interpreter */ for (;;) { Instruction i; StkId ra; vmfetch(); vmdispatch (GET_OPCODE(i)) { ... vmcase(OP_SETTABLE) { TValue *rb = RKB(i); TValue *rc = RKC(i); settableProtected(L, ra, rb, rc); vmbreak; } ... } } } ``````

We know that `t[k] = nil` executes the `OP_SETTABLE` instruction, so we find the `lvm.c` where the `luaV_execute` function executes `OP_SETTABLE`, which is lines 27 to 32 above, and see that it does several things:

• Get the operands, `ra` , `rb` , and `rc` for table, key, and value, respectively;
• Line 30 calls the macro `settableProtected` ;
• line 15 `settableProtected` first tries to call `luaV_fastset` ;
• `luaV_fastset` determines that `luaV_fastset` fails when `t` is not a table (line 4), or `k` is not in `t` (line 7), and returns 0. Otherwise, the value is assigned directly in line 9;
• If `luaV_fastset` fails, then `luaV_finishset` is called instead at line 16.

When we delete an element using `t[k] = nil`, `k` is obviously in `t`, so it will execute to line 9, assigning a value directly to `slot`. Notice that `slot` is actually the return value of `luaH_get`, which will be equal to the value of `k`. So when `t[k] = nil` is executed, it will only assign the value of `k` to `nil`, and will not change the key. This also gives the `next` function an opportunity to do so.

Let’s go back to `luaH_next` . It does this in `findindex`

 ``````1 2 3 4 `````` ``````if (luaV_rawequalobj(gkey(n), key) || (ttisdeadkey(gkey(n)) && iscollectable(key) && deadvalue(gkey(n)) == gcvalue(key))) { ... ``````

to determine if the current position is equal to `key`. If our key is not a GCObject, such as a number or a boolean, it can be accessed even if it is deleted, and `luaV_rawequalobj` always gives the correct result; even if the key is a GCObject, it can be compared properly as long as it is not GC’d. However, if the position of the key is occupied by an element inserted later, the lookup will fail. This is why it is possible to get an error like “invalid key to ’next’” when traversing both deleted and new elements.

When using the `for ... pairs` loop, you don’t have to worry about the key being GC’d, because the iterator always holds a reference to the current key. So in a `for ... pairs` loop, you can safely perform the delete operation.

## Lua’s GC mechanism

`(ttisdeadkey(gkey(n)) && iscollectable(key) && deadvalue(gkey(n)) == gcvalue(key))` is used to handle the case when the key is GC’d. If the current location is a deadkey, and `key` is a GCObject, compare `deadvalue(gkey(n)) == gcvalue(key)` , i.e. compare the address of the key at the current location (i.e. `gkey(n)` ) and the incoming `key` to see if they are equal. So what is deadkey? This brings us to Lua’s GC mechanism.

Lua GC works by stringing all the GCObjects into a chain; then periodically scanning and marking all the objects in the VM, and the ones that are not marked are the ones that need to be recycled. Let’s look at the function that scans the table:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 `````` ``````/* lgc.c */ static void traversestrongtable (global_State *g, Table *h) { Node *n, *limit = gnodelast(h); unsigned int i; for (i = 0; i < h->sizearray; i++) /* traverse array part */ markvalue(g, &h->array[i]); for (n = gnode(h, 0); n < limit; n++) { /* traverse hash part */ checkdeadkey(n); if (ttisnil(gval(n))) /* entry is empty? */ removeentry(n); /* remove it */ else { lua_assert(!ttisnil(gkey(n))); markvalue(g, gkey(n)); /* mark key */ markvalue(g, gval(n)); /* mark value */ } } } ``````

To be precise, this is a function that scans non-weak tables. This function is relatively straightforward, and it does the following:

• Lines 6 to 7 scan the array fields, all marked;
• Lines 8 to 17 scan the hash field, where:
• lines 10 to 11 check if the value of the current position is `nil`, and if so, call `removeentry` ;
• Otherwise lines 13 to 15 mark the key and value of the current location.

Objects that are not marked are reclaimed, so if the value of an element is `nil`, it will be reclaimed in this GC. But before it is reclaimed, `removeentry` is also called, and what does it do?

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 `````` ``````/* lobject.h */ #define settt_(o,t) ((o)->tt_=(t)) #define setdeadvalue(obj) settt_(obj, LUA_TDEADKEY) /* lgc.c */ static void removeentry (Node *n) { lua_assert(ttisnil(gval(n))); if (valiswhite(gkey(n))) setdeadvalue(wgkey(n)); /* unused and unmarked key; remove it */ } ``````

It will mark the key of the element as dead. Note that this marker actually sets the object’s type `tt_` field to `LUA_TDEADKEY` . This way when calling `luaV_rawequalobj` to compare a deadkey with a regular object it always returns false. This way when `deadvalue(gkey(n)) == gcvalue(key)` is executed, the key of element `n` must have been GC’d.

For table, function, thread, userdata, two objects are equal if and only if their addresses are equal. When an object is GC’d, it doesn’t have any references, so it can’t be passed into `next`.

This is not the case with strings, where the equality of two strings depends on the content of the string, not on its address. So when a string is GC’d, we can still construct a string that is equal to it, but with a different address. Passing the newly constructed string into `next`, will cause an error. In the previous example, the reason for using `'a'...' b'' is because we want him to construct a different object. Just change any of the ''a'...' b'` anywhere in it, then the string `ab'` will be in the constant table, so it won’t be GC’d; Lua has a short string reuse mechanism, so `'a'...' b'` and `'ab'` are actually the same object, so this error will not occur.

## Conclusion

I was surprised when I first discovered that `next` could pass in a key that had just been deleted, because a deleted key should be something like a wild pointer and should not be used anymore. But after a closer look, everything is fine. This made me feel the subtlety of Lua design once again. This little problem involves the Lua table implementation, the GC mechanism, the constant table mechanism, and short string reuse. It is rather boring to gnaw on Lua source code, so we can explore Lua implementation from some interesting phenomena.