Notes on using HeapByteBuffer for in-heap memory

Today I want to share a pitfall that many people fall into: the use of HeapByteBuffer.

ByteBuffer has two main implementation classes:

HeapByteBuffer: in-heap memory
DirectByteBuffer: off-heap memory
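
As a quick illustration of the two kinds, the sketch below creates one buffer of each type through the public ByteBuffer factory methods and reports which kind it got. The class name BufferKinds and the helper describe are made up for this example.

```java
import java.nio.ByteBuffer;

class BufferKinds {
    /** Returns "direct" for off-heap buffers and "heap" for buffers backed by a Java byte[]. */
    static String describe(ByteBuffer buffer) {
        return buffer.isDirect() ? "direct" : "heap";
    }

    public static void main(String[] args) {
        // allocate() returns a HeapByteBuffer backed by a byte[] inside the Java heap
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // allocateDirect() returns a DirectByteBuffer backed by native (off-heap) memory
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(describe(heap) + ", hasArray=" + heap.hasArray()); // heap, hasArray=true
        System.out.println(describe(direct));                                 // direct
    }
}
```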

In my personal experience, I use DirectByteBuffer in most cases, for both reads and writes, mainly because HeapByteBuffer performs some unexpected internal operations when it interacts with FileChannel. That is the caveat mentioned in the title of this article, and the details follow below.

Copy problem of HeapByteBuffer

Without further ado, let’s look directly at where the pitfalls of HeapByteBuffer lie.

If you are asked to write file IO code with a HeapByteBuffer, you will most likely write something like the following.

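// Requires java.io.File, java.io.RandomAccessFile, java.nio.ByteBuffer, java.nio.channels.FileChannel
public void readInOneThread() throws Exception {
    int bufferSize = 50 * 1024 * 1024;
    File file = new File("/essd");
    // try-with-resources closes the file and its channel
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
         FileChannel fileChannel = raf.getChannel()) {
        ByteBuffer byteBuffer = ByteBuffer.allocate(bufferSize); // a HeapByteBuffer
        fileChannel.read(byteBuffer);
    }
}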

The code above caches file data in memory, and in both competition and production scenarios this is usually done by multiple threads. For example, the Cloud Native Programming Challenge evaluation uses 40 threads for reading and writing. If each thread keeps its own cache, allocating 50 MB of heap memory per thread looks perfectly reasonable.

If you use the above code directly, you may hit an out-of-memory error during the evaluation. I mentioned this in my previous article on off-heap memory leaks, but from a different perspective. The reason is simple and is easiest to see in the source code.

FileChannel uses IOUtil for read and write operations:

static int read(FileDescriptor var0, ByteBuffer var1, long var2, NativeDispatcher var4) throws IOException {
    if (var1.isReadOnly()) {
        throw new IllegalArgumentException("Read-only buffer");
    } else if (var1 instanceof DirectBuffer) {
        return readIntoNativeBuffer(var0, var1, var2, var4);
    } else {
        ByteBuffer var5 = Util.getTemporaryDirectBuffer(var1.remaining());
        int var7;
        try {
            int var6 = readIntoNativeBuffer(var0, var5, var2, var4);
            var5.flip();
            if (var6 > 0) {
                var1.put(var5);
            }
            var7 = var6;
        } finally {
            Util.offerFirstTemporaryDirectBuffer(var5);
        }
        return var7;
    }
}

You can see that when a HeapByteBuffer is used, execution takes the following branch:

Util.getTemporaryDirectBuffer(var1.remaining());

The Util class encapsulates some of the lower-level IO logic:

package sun.nio.ch;
public class Util {
    private static ThreadLocal<Util.BufferCache> bufferCache;
    
    public static ByteBuffer getTemporaryDirectBuffer(int var0) {
        if (isBufferTooLarge(var0)) {
            return ByteBuffer.allocateDirect(var0);
        } else {
            // FOCUS ON THIS LINE
            Util.BufferCache var1 = (Util.BufferCache)bufferCache.get();
            ByteBuffer var2 = var1.get(var0);
            if (var2 != null) {
                return var2;
            } else {
                if (!var1.isEmpty()) {
                    var2 = var1.removeFirst();
                    free(var2);
                }

                return ByteBuffer.allocateDirect(var0);
            }
        }
    }
}

The isBufferTooLarge method decides how the off-heap memory is allocated based on the requested size. If the request is too large, a new direct buffer is allocated without going through the cache; otherwise the bufferCache ThreadLocal variable is used to cache and reuse buffers. In practice the threshold is so large that the direct-allocation branch is almost never taken. Two conclusions follow from this.

When HeapByteBuffer is used, both reads and writes go through a DirectByteBuffer. The data flow for writes is actually HeapByteBuffer -> DirectByteBuffer -> PageCache -> Disk, and the data flow for reads is exactly the reverse.
Because the temporary DirectByteBuffer is cached per thread, the more threads perform IO, the more off-heap memory these temporary buffers take up.

Based on these two conclusions, let’s go back to the problem. If we read and write directly as described above, each of the 40 threads holds 50 MB of heap memory, and at the same time, because of the internal behavior of IOUtil, an additional 40 * 50 MB (about 2 GB) of off-heap memory is allocated. The off-heap memory is used up without our noticing, so an off-heap out-of-memory error is not surprising.
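
One way to observe the hidden allocation is to watch the JVM's "direct" buffer pool through the standard BufferPoolMXBean before and after a read into a heap buffer. The sketch below is only an illustration: the class name, the temp file, and the 8 MB size are assumptions made for the example, and the growth you see depends on the JDK version and, on JDKs that support it, on the jdk.nio.maxCachedBufferSize system property.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class HiddenDirectBufferDemo {
    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if the pool is not exposed. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Worst-case temporary direct memory if every thread reads into a heap buffer of this size. */
    static long estimateTempDirectBytes(int threads, long heapBufferBytes) {
        return threads * heapBufferBytes;
    }

    /** Reads a small temp file into a heap buffer and returns how much the direct pool grew. */
    static long directGrowthAfterHeapRead(int heapBufferSize) {
        try {
            Path tmp = Files.createTempFile("heap-read-demo", ".bin");
            try {
                Files.write(tmp, new byte[4096]);
                long before = directMemoryUsed();
                try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer heap = ByteBuffer.allocate(heapBufferSize);
                    channel.read(heap); // our own code never allocates a direct buffer
                }
                return directMemoryUsed() - before;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("direct pool grew by " + directGrowthAfterHeapRead(8 * 1024 * 1024) + " bytes");
        System.out.println("40 threads x 50 MB = "
                + estimateTempDirectBytes(40, 50L * 1024 * 1024) + " bytes of temporary direct buffers");
    }
}
```

On a JDK whose IOUtil matches the source shown above, the reported growth should be close to the size of the heap buffer, even though the file itself is only 4 KB, because the temporary buffer is sized by the heap buffer's remaining capacity.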

Why HeapByteBuffer needs to be copied to DirectByteBuffer during IO

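The reasons can be summarized as follows.

The garbage collector may move heap objects, including the byte array behind a HeapByteBuffer, while an IO call is in progress. The native memory pointed to by a DirectByteBuffer is not managed by GC, so its address stays stable, which also keeps the GC implementation simpler.
The HeapByteBuffer uses a byte array, which is not guaranteed to occupy contiguous memory, making it less convenient to pass to JNI methods.
Array implementations may vary from JVM to JVM.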

Solutions

In essence, all we need is for each thread to maintain a HeapByteBuffer as its cache; there is no need to issue IO in units of the whole ByteBuffer. You can optimize the process by borrowing IOUtil’s idea of copying through a DirectByteBuffer. A code example follows.

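// Requires java.io.File, java.io.RandomAccessFile, java.nio.ByteBuffer, java.nio.channels.FileChannel
public void directBufferCopy() throws Exception {
    File file = new File("/essd");
    int chunkSize = 4 * 1024;
    // Large heap cache, filled through one small reusable direct buffer
    ByteBuffer byteBuffer = ByteBuffer.allocate(50 * 1024 * 1024);
    ByteBuffer directByteBuffer = ByteBuffer.allocateDirect(chunkSize);
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
         FileChannel fileChannel = raf.getChannel()) {
        long position = 0;
        while (byteBuffer.hasRemaining()) {
            directByteBuffer.clear();
            directByteBuffer.limit(Math.min(chunkSize, byteBuffer.remaining()));
            int n = fileChannel.read(directByteBuffer, position);
            if (n <= 0) break; // end of file
            position += n; // advance by the bytes actually read, since a read may be short
            directByteBuffer.flip();
            byteBuffer.put(directByteBuffer);
        }
    }
}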

In Java, the copy from disk through off-heap memory into heap memory cannot be avoided, but we can do the copy ourselves, so that the process is under our own control rather than hidden in the internal logic of FileChannel.
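
The same idea can be applied in the write direction. The following is a minimal sketch, not code from the original evaluation: the class ChunkedHeapWrite and its methods are hypothetical names, and the temp file only exists to make the example self-contained. It stages the contents of a heap buffer through a small direct buffer before each write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedHeapWrite {
    /**
     * Writes the remaining bytes of heapSource to the channel, starting at startPosition,
     * by staging them through the small direct carrier buffer. Returns the bytes written.
     */
    static long writeInChunks(FileChannel channel, ByteBuffer heapSource,
                              ByteBuffer carrier, long startPosition) throws IOException {
        long position = startPosition;
        while (heapSource.hasRemaining()) {
            carrier.clear();
            // Copy at most one carrier's worth of bytes out of the heap buffer
            int chunk = Math.min(carrier.capacity(), heapSource.remaining());
            ByteBuffer slice = heapSource.duplicate();
            slice.limit(slice.position() + chunk);
            carrier.put(slice);
            heapSource.position(heapSource.position() + chunk);
            carrier.flip();
            // A single write() may be partial, so drain the carrier completely
            while (carrier.hasRemaining()) {
                position += channel.write(carrier, position);
            }
        }
        return position - startPosition;
    }

    /** Demo helper: writes data to a temp file in chunks and reads the file back. */
    static byte[] roundTrip(byte[] data, int carrierSize) {
        try {
            Path tmp = Files.createTempFile("chunked-write", ".bin");
            try {
                try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                    ByteBuffer carrier = ByteBuffer.allocateDirect(carrierSize);
                    writeInChunks(channel, ByteBuffer.wrap(data), carrier, 0);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        System.out.println("bytes written and read back: " + roundTrip(data, 4 * 1024).length);
    }
}
```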

Note also that:

The DirectByteBuffer used for a single IO should not be too large. It only acts as a carrier for the data in transit, so in multi-threaded scenarios it does not take up too much off-heap memory.
The DirectByteBuffer used for a single IO should not be too small either, otherwise reads and writes are amplified. An integer multiple of 4 KB is generally recommended, with the exact value chosen from actual test results.
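
For the multi-threaded case, one possible arrangement is to give each thread a single reusable direct carrier through a ThreadLocal, so the total off-heap cost is bounded by threads * carrier size. The sketch below is an assumption-laden example: PerThreadCarrier, fill, and the 64 KB carrier size are all hypothetical choices, and the right size should come from your own benchmarks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadCarrier {
    // Assumed value: a multiple of 4 KB, to be tuned by measurement
    static final int CARRIER_SIZE = 64 * 1024;

    // One small direct buffer per thread, allocated lazily and reused for every IO
    private static final ThreadLocal<ByteBuffer> CARRIER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CARRIER_SIZE));

    /** Fills heapCache from the channel starting at startPosition; returns the bytes read. */
    static long fill(FileChannel channel, ByteBuffer heapCache, long startPosition) throws IOException {
        ByteBuffer carrier = CARRIER.get();
        long position = startPosition;
        while (heapCache.hasRemaining()) {
            carrier.clear();
            carrier.limit(Math.min(carrier.capacity(), heapCache.remaining()));
            int n = channel.read(carrier, position);
            if (n <= 0) {
                break; // end of file
            }
            position += n;
            carrier.flip();
            heapCache.put(carrier);
        }
        return position - startPosition;
    }

    /** Demo helper: every worker thread reads the whole file into its own heap buffer. */
    static boolean allThreadsReadCorrectly(byte[] data, int threads) {
        try {
            Path tmp = Files.createTempFile("carrier-demo", ".bin");
            try {
                Files.write(tmp, data);
                AtomicInteger ok = new AtomicInteger();
                List<Thread> workers = new ArrayList<>();
                for (int i = 0; i < threads; i++) {
                    Thread worker = new Thread(() -> {
                        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                            ByteBuffer heap = ByteBuffer.allocate(data.length);
                            fill(channel, heap, 0);
                            if (Arrays.equals(heap.array(), data)) {
                                ok.incrementAndGet();
                            }
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                    workers.add(worker);
                    worker.start();
                }
                for (Thread worker : workers) {
                    worker.join();
                }
                return ok.get() == threads;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000];
        Arrays.fill(data, (byte) 42);
        System.out.println("all threads read correctly: " + allThreadsReadCorrectly(data, 4));
    }
}
```

With 40 threads and a 64 KB carrier, the direct memory used by these carriers would be 40 * 64 KB = 2560 KB, compared with the 40 * 50 MB taken by the temporary buffers in the original code.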

Other Notes

The HeapByteBuffer copy problem on reads and writes is the main focus of this article, but there are other issues to be aware of when using HeapByteBuffer as a cache. For example, in a competition scenario with 6 GB of heap memory, you may want to open up one large HeapByteBuffer and allocate 4 GB for caching. If you are interested, you can test whether that is feasible. You also need to consider GC behavior, including the ratio of the old generation to the young generation.
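
One constraint worth keeping in mind for that experiment: ByteBuffer.allocate takes an int capacity, so a single HeapByteBuffer cannot hold 4 GB, and a cache of that size has to be split across several buffers. The sketch below shows one hypothetical way to do that. SegmentedHeapCache is an invented class, not an existing API, and it says nothing about whether the GC will cope well with such a heap layout.

```java
import java.nio.ByteBuffer;

class SegmentedHeapCache {
    private final ByteBuffer[] segments;
    private final int segmentSize;

    SegmentedHeapCache(long totalBytes, int segmentSize) {
        this.segmentSize = segmentSize;
        int count = segmentsNeeded(totalBytes, segmentSize);
        this.segments = new ByteBuffer[count];
        long remaining = totalBytes;
        for (int i = 0; i < count; i++) {
            // The last segment may be smaller than the others
            segments[i] = ByteBuffer.allocate((int) Math.min(segmentSize, remaining));
            remaining -= segments[i].capacity();
        }
    }

    /** Number of segments required to hold totalBytes, rounding up. */
    static int segmentsNeeded(long totalBytes, int segmentSize) {
        return (int) ((totalBytes + segmentSize - 1) / segmentSize);
    }

    int segmentCount() {
        return segments.length;
    }

    /** Reads the byte at a 64-bit offset by locating its segment first. */
    byte get(long offset) {
        return segments[(int) (offset / segmentSize)].get((int) (offset % segmentSize));
    }

    /** Writes the byte at a 64-bit offset by locating its segment first. */
    void put(long offset, byte value) {
        segments[(int) (offset / segmentSize)].put((int) (offset % segmentSize), value);
    }

    public static void main(String[] args) {
        // Tiny sizes so the example runs anywhere; a real cache would use much larger segments
        SegmentedHeapCache cache = new SegmentedHeapCache(10, 4);
        cache.put(9, (byte) 7);
        System.out.println(cache.segmentCount() + " segments, byte at offset 9 = " + cache.get(9));
        System.out.println("segments for 4 GB at 1 GB each: " + segmentsNeeded(4L << 30, 1 << 30));
    }
}
```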

Also, if the HeapByteBuffer takes up too much memory, very little is left for the OS PageCache, since both come from the same physical memory. If your application relies on the PageCache, it may run short of PageCache space, resulting in slower IO.

Summary

This article describes what to watch out for when using HeapByteBuffer in file IO: FileChannel copies the data internally through a temporary DirectByteBuffer, which costs both an extra copy and off-heap memory. In real-world scenarios, I recommend using DirectByteBuffer directly for IO operations. If for some reason you need to use HeapByteBuffer as a cache, you can follow the approach in this article: use a DirectByteBuffer for IO and copy the data in batches.
