Using defer statements in Go is largely a matter of language style and habit. But convenient as it is, using defer to close or clean up files without missing any errors is not so easy to get right.

Writing files and closing them

For example, code like the following appears everywhere in Go file-handling code.

f, err := os.Create(`path`)
if err != nil {
	panic(err)
}
defer f.Close()

if _, err := f.Write([]byte(`content`)); err != nil {
	panic(err)
}

This code looks fine at first glance. One problem, though, is that the return value of the Close() method is never checked.

Close() actually returns an error. When writing files, we are used to checking the return value of Write() and usually ignore the return value of Close(). We also tend to assume that most problems occur during Write() rather than during Close(), which reinforces the habit of ignoring what Close() returns.

So the question is: does a successful Write() necessarily mean that the contents of the file have been persisted to the storage device? It does not.

Since it does not, we are probably risking the loss of file data.

Does a successful Write() guarantee anything?

Anyone who knows a little about computer architecture knows that the farther data is from the CPU, the slower the CPU can access it. The CPU's internal registers are the fastest, RAM is slower, and network I/O and disk I/O are the slowest. Therefore, if every Write() committed its data to disk synchronously, the operating system would probably respond extremely slowly.

An even worse pattern is writing a file one byte at a time. Anyone familiar with mechanical hard drives knows that the drive moves its arm and head to the target sector before writing data. You can imagine how horrible it would be if every single-byte write could trigger such a move.

Fortunately, these things don't really happen. The operating system and the hard drive each do their own caching and buffering. A cache keeps hot data in memory so that you don't have to access the drive every time you read the same data. A buffer accumulates data, so that many small writes can be combined and handed to the hard drive in one go. Both caching and buffering can greatly improve read and write performance.

So, when we call Write() and it returns "successfully", it only means that the data was successfully buffered. Whether the data was actually flushed to disk (that is, persisted to the storage device) is unknown. When to flush is decided by the operating system and the disk itself. If the power fails after Write() succeeded but before the flush, the data is most likely lost (file systems have their own recovery mechanisms, which this article does not discuss).

Better error timing

The operating system, unsurprisingly, assumes that after we Close() a file there will be no further operations on it, so this is when it reports errors to us. The "luck" described above therefore brings a "misfortune": Write() itself may rarely fail, but errors from earlier writes can all surface together at the last step, Close().

This can be verified in the manual page for close (man 2 close on Darwin).

The close() system call will fail if:

[EBADF]            fildes is not a valid, active file descriptor.

[EINTR]            Its execution was interrupted by a signal.

[EIO]              A previously-uncommitted write(2) encountered an input/output error.

EIO says it plainly: you have finally closed the file, but I have to tell you that writes made before the close failed!

How unfortunate is that!

Focus on the Close() error

After the above discussion, I think we can agree that the return value of Close() is very important!

So we may have to change the code:

f, err := os.Create(`path`)
if err != nil {
	panic(err)
}
defer func() {
	if err := f.Close(); err != nil {
		panic(err) // or assign it to the function's return value
	}
}()

Don’t accidentally change the logic of defer

Keep in mind how defer works in Go: the arguments of the deferred function call (including the method receiver) are evaluated when the defer statement executes, not when the deferred call finally runs. A closure, by contrast, reads the variables it captures only when it runs.

So, assuming that f is reused later, note the difference.

// Variant 1: the same variable is reused and deferred twice; this is correct
f, err := os.Create(`path1`)
if err != nil {
	panic(err)
}
defer f.Close()

f, err = os.Create(`path2`)
if err != nil {
	panic(err)
}
defer f.Close()

// Variant 2: the first f is never closed, and the second f is closed twice

f, err := os.Create(`path1`)
if err != nil {
	panic(err)
}
defer func() {
	if err := f.Close(); err != nil {
		panic(err)
	}
}()

f, err = os.Create(`path2`)
if err != nil {
	panic(err)
}
defer func() {
	if err := f.Close(); err != nil {
		panic(err)
	}
}()

The focus of this article is not defer itself, so I won't go further. Just a reminder: if this article prompts you to refactor your code, don't accidentally change the original defer logic.

Or, don’t put Close() in defer.

f, err := os.Create(`path`)
if err != nil {
	return err
}

// write operations...

return f.Close()

However, writing it this way gives up the benefits of defer (for example, an early return after a failed write would leave the file unclosed), so it is not recommended.

Close() reports no error, so is everything fine?

Not really.

If you read the operating system’s close() help again, you’ll find something new.

Note: A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes. Typically, filesystems do not flush buffers when a file is closed. If you need to be sure that the data is physically stored on the underlying disk, use fsync(2). (It will depend on the disk hardware at this point.)

Simply put, a successful close() does not guarantee that the data has also been stored to disk, because the OS kernel uses buffering to delay write operations. Generally speaking, the file system does not flush when a file is closed. If you want to be sure that the data has actually been stored to disk, you need to call fsync() (which corresponds to the f.Sync() method in Go; the two are used interchangeably below).

One thing not mentioned above is that fsync() causes the data to be stored to disk immediately. As you would expect, this has a performance impact:

  • Before: without calling fsync(), the data might fail to reach the disk without the program knowing, but the function returns quickly.
  • Now: with fsync() called, storing to disk can still fail, but the program learns about it, at the cost of a longer wait in the function.

Conclusion

  • This article does not conclude with best practices; users should determine for themselves whether to Sync() and how to compromise.
  • The Go language documentation for Close() and Sync() is not very detailed and requires reference to the operating system documentation.