The use of
defer statements in Go can be considered as a language style or habit. But while convenience is convenient, being able to use the
defer statement to delay closing/cleaning files without errors is not so easy to guarantee.
Writing files and closing them
For example, in the code related to Go language file operations, you can see code like the following everywhere.
This code may not seem like a problem at first glance. One of the problems is that the return value of the
Close() method is not checked.
Close() function actually returns an
error. In the normal course of writing files, we are very used to checking the return value of
Write() and usually ignore the return value of
Close(). And we usually think that most of the problems will occur during
Write() rather than during
Close(). This contributes to our disregard for the return value of
So the question is, does the success of
Write() necessarily mean that the contents of the file have been successfully persisted to the storage device? It doesn’t .
Since it’s not, it means we’re probably risking file data loss.
Write is a sure success?
Anyone who knows a little bit about computer architecture knows that the farther away you are from the CPU, the slower the CPU will be able to manipulate the data it needs. The fastest is the CPU’s internal registers, the slower is RAM, and the slowest is network I/O and disk I/O. Therefore, if every
Write() operation commits the data to be written to disk synchronously, the operating system’s response time will probably be extremely slow.
Another perverse behavior is to write files byte by byte. Anyone familiar with mechanical hard drives knows that the hard drive moves the mechanical arm and head to the specified sector when writing data. You can imagine how horrible it is to write a byte and possibly move it once.
But fortunately, these things don’t really happen. At the operating system and hard drive level itself, they both do their own caching (Cache) and buffering (Buffer). Cache puts hot data in memory so that you don’t have to actually access the drive every time you read the same data. Buffering is used to accumulate data, so that small amounts of data that are written multiple times can be combined together and then made available to the hard drive for writing all at once. Both caching and buffering can greatly improve the performance of reading and writing data.
So, when we call
Write() and it returns “successfully”, it only means that the data was successfully cached. It is not known whether the data was actually dropped (meaning that the data was successfully persisted to the storage device). The best time to drop the disk is determined by the operating system and the disk itself. If you experience a sudden power failure before the disk is dropped (but
Write succeeded), then the data is most likely lost (the file system itself has the relevant recovery capabilities, so this article will not discuss this category).
Better error timing
The operating system, unsurprisingly, assumes that after we
Close() a file, there will be no more operations. This is when it reports errors to us. So, the aforementioned “luck” brings us the “misfortune” that we may encounter very few errors with
Write(). But in the last step
Close(), all the previous errors are thrown together.
This can also be verified by looking at the instructions for
man 2 close #darwin).
EIO tells it like it is: you finally close the file, but I have to tell you that the writes before closing the file failed!
How unfortunate is that!
Focus on the Close() error
After the above discussion, I think we can agree that the return value of
Close() is very important!
So we may have to change the code:
Don’t accidentally change the logic of defer
This is written with the usage of
defer in Go in mind: the arguments of the function being deferred (including the receiver) are evaluated during the execution of the defer statement.
So, assuming that
f is reused later, note the difference.
But the focus of this article is not on the usage of defer itself, so I won’t discuss it anymore, just a reminder: if you read this article and want to refactor the code, don’t accidentally change the original defer logic in the code.
Or, don’t put
However, if you write it this way, you won’t experience the benefits of defer, so it’s not recommended.
Close() also does not report an error, so everything is fine?
If you read the operating system’s
close() help again, you’ll find something new.
Note: A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes. Typically, filesystems do not flush buffers when a file is closed. If you need to be sure that the data is physically stored on the underlying disk, use fsync(2). (It will depend on the disk hardware at this point.)
Simply put, a successful
close() does not guarantee that the data is also successfully stored to disk, because the OS kernel uses buffering to delay write operations. And generally speaking, the file system does not flush when the file is closed. If you want to make sure that the data is actually stored to disk successfully, you need to call
fsync() (which corresponds to the
f.Sync() method in Go, not to be distinguished later).
One thing not mentioned above is that
fsync() will cause the data to be stored to disk immediately. Of course, it is conceivable that this must have a performance impact: the
- Originally it was: without calling
fsync(), the data might have failed to store to disk, but the program didn’t know about it and the whole function would return quickly.
- Now it is: with
fsync()called, the data still has the store-to-disk failure problem, but the program can know about it, which increases the function wait time.
- This article does not conclude with best practices; users should determine for themselves whether to
Sync()and how to compromise.
- The Go language documentation for
Sync()is not very detailed and requires reference to the operating system documentation.