In a distributed system, if Service A has to invoke Service B and multiple instances of both services are deployed, the problem of load balancing has to be solved. That is, we want the QPS reaching B to be balanced across all instances of B.

In earlier HTTP/1.x-style implementations (without Keep-Alive), Service A has to establish a TCP connection to B for each request, so load balancing is generally done per connection, which is effectively per request. But setting up a new connection every time performs poorly, so Keep-Alive connections were introduced: establish one TCP connection, send many requests over it, and reuse it. gRPC is built on HTTP/2 and uses this Keep-Alive style of connection.
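To make this concrete, here is a minimal Go sketch of the gRPC client side. The backend address is made up, and the built-in health-check service is used only so there is a real RPC to call; the point is that a single `ClientConn` is created once and then reused for every request over one long-lived HTTP/2 connection.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// One ClientConn is created up front; by default all RPCs made through it
	// are multiplexed over a single long-lived HTTP/2 (Keep-Alive) connection.
	// "service-b.example.com:50051" is a placeholder address.
	conn, err := grpc.Dial("service-b.example.com:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)

	// Many requests reuse the same underlying TCP connection,
	// so no new connection is established per request.
	for i := 0; i < 100; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
		cancel()
		if err != nil {
			log.Println("rpc failed:", err)
		}
	}
}
```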

Using a Keep-Alive connection improves performance because the client doesn't have to re-establish a TCP connection for every request. But it introduces some problems.

The first issue is load balancing. This Kubernetes blog post explains why gRPC needs special load balancing. The HTTP/1.1-style approach, where an instance is chosen (say, at random) for each request, is obviously balanced. But in the HTTP/2 style, the client keeps using the same connection once it is established, so which instance serves the traffic depends entirely on which instance was chosen when the connection was made.

Even if the connections are balanced at the beginning, some situations can break this balance. For example, restarting service instances one by one (a rolling restart).

After an instance restarts, every client that was connected to it is disconnected and reconnects to the other available instances. So the first restarted instance has no connections once its restart completes, and each of the other instances gains roughly (1/n)/(n-1) of the total connections, where n is the total number of instances.

Because each instance, after it restarts, pushes its connections onto the other instances, two problems follow (the short simulation after this list makes them concrete):

  1. the first restarted instance ends up with the most connections, and the last restarted instance ends up with none, which is very unbalanced
  2. the last restarted instance forces a large number of clients to reconnect when it restarts
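A small Go simulation makes the imbalance concrete. It assumes the same simplified model as above: connections start evenly distributed, the clients of a restarting instance reconnect evenly to the remaining instances, and the restarted instance receives no new connections afterwards. The instance count and connection total are arbitrary.

```go
package main

import "fmt"

func main() {
	const (
		n     = 4      // number of server instances (assumed)
		total = 1000.0 // total client connections (assumed)
	)

	// Start perfectly balanced: total/n connections per instance.
	conns := make([]float64, n)
	for i := range conns {
		conns[i] = total / n
	}

	// Rolling restart: instance i drops all of its connections, the clients
	// reconnect evenly to the other n-1 instances, and the restarted instance
	// comes back with zero connections because existing clients keep the
	// connections they already have.
	for i := 0; i < n; i++ {
		moved := conns[i] / float64(n-1)
		conns[i] = 0
		for j := 0; j < n; j++ {
			if j != i {
				conns[j] += moved
			}
		}
		fmt.Printf("after restarting instance %d: %v\n", i, conns)
	}
}
```

With 4 instances and 1000 connections, the final distribution comes out roughly 457 / 346 / 198 / 0: the first restarted instance ends up with the most connections and the last one with none.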

The second problem is that when a new server-side instance is added, no clients connect to it; this is the server migration / going online and offline problem. Because all clients keep using their already-established connections, they never learn that a new instance is available. The effect is similar to the first problem.

There are 3 solutions that come to mind.

The first one is to add a load balancer between the client and the server to maintain the connections to the backend, as mentioned in the blog post above. This solves both problems perfectly. The disadvantages are the extra resource cost and the added architectural complexity.


The second solution works from the server side: the server can send a GOAWAY frame to the client from time to time, telling the client to reconnect to another server instance. kube-apiserver has an option that specifies the probability of sending this to a client: `--goaway-chance float`.

> To prevent HTTP/2 clients from getting stuck on a single apiserver, randomly close a connection (GOAWAY). The client’s other in-flight requests won’t be affected, and the client will reconnect, likely landing on a different apiserver after going through the load balancer again. This argument sets the fraction of requests that will be sent a GOAWAY. Clusters with single apiservers, or which don’t use a load balancer, should NOT enable this. Min is 0 (off), Max is .02 (1/50 requests); .001 (1/1000) is a recommended starting point.

This has an added benefit: when an instance goes offline, instead of exiting abruptly, it can send GOAWAY on all of its current connections first and then exit without losing any requests.
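grpc-go exposes both pieces of this behavior directly, so a gRPC server can get a similar effect without anything apiserver-specific. The sketch below is illustrative (the address and durations are made up): the keepalive option `MaxConnectionAge` makes the server send GOAWAY on connections that have lived long enough, and `GracefulStop` sends GOAWAY on every live connection during shutdown.

```go
package main

import (
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// MaxConnectionAge tells the server to send GOAWAY on any connection older
	// than 30 minutes (with some jitter), so clients periodically reconnect and
	// get re-balanced. The values here are illustrative, not recommendations.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      30 * time.Minute,
		MaxConnectionAgeGrace: time.Minute, // let in-flight RPCs finish first
	}))
	// No services are registered here; this only shows the GOAWAY-related setup.

	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}

	// On shutdown, GracefulStop sends GOAWAY on all live connections and waits
	// for pending RPCs instead of dropping them, so the instance can go offline
	// without losing requests.
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGTERM, os.Interrupt)
		<-sig
		srv.GracefulStop()
	}()

	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```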

The third way is to solve it from the client side: instead of using a single connection to the server, the client uses a connection pool:

  1. each time the client wants to send a request, it first asks its own connection pool for an available connection:
    1. if one is available, it is returned
    2. if not, a new connection is established
  2. after the request is done, the client puts the connection back into the pool
  3. the connection pool supports some parameters, such as:
    1. close a connection that has been idle for a certain time
    2. close a connection after it has served a certain number of requests (i.e. been used a certain number of times), and do not use it again

This solves the problem of a connection being used indefinitely, and closing a connection is lossless because connections inside the pool are not handed out to anyone; they are managed by the pool itself. In fact, database clients (such as JDBC) and Redis clients are implemented this way.
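As a rough illustration of this third approach, here is a minimal connection pool sketch in Go. It assumes plain `net.Conn` connections rather than gRPC channels, and the names and policies are made up for the example:

```go
// Package pool contains a minimal sketch of a client-side connection pool.
package pool

import (
	"net"
	"sync"
	"time"
)

// pooledConn wraps a connection with the bookkeeping the pool needs.
type pooledConn struct {
	net.Conn
	lastUsed time.Time
	uses     int
}

// Pool hands out connections for single requests and takes them back
// afterwards, so no connection is held (or used) indefinitely.
type Pool struct {
	dial        func() (net.Conn, error) // how to open a new connection
	maxIdle     int                      // idle connections kept around
	maxUses     int                      // requests served before a connection is retired
	idleTimeout time.Duration            // idle connections older than this are closed

	mu   sync.Mutex
	idle []*pooledConn
}

func New(dial func() (net.Conn, error), maxIdle, maxUses int, idleTimeout time.Duration) *Pool {
	return &Pool{dial: dial, maxIdle: maxIdle, maxUses: maxUses, idleTimeout: idleTimeout}
}

// Get returns an idle connection if a fresh one is available,
// otherwise it dials a new connection.
func (p *Pool) Get() (*pooledConn, error) {
	p.mu.Lock()
	for len(p.idle) > 0 {
		n := len(p.idle)
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		if time.Since(c.lastUsed) > p.idleTimeout {
			c.Close() // expired while idle; discard it and try the next one
			continue
		}
		p.mu.Unlock()
		return c, nil
	}
	p.mu.Unlock()

	raw, err := p.dial()
	if err != nil {
		return nil, err
	}
	return &pooledConn{Conn: raw, lastUsed: time.Now()}, nil
}

// Put returns a connection after a request has been served. Connections that
// have served too many requests, or that don't fit in the idle list, are
// closed; this is lossless because no caller is holding them anymore.
func (p *Pool) Put(c *pooledConn) {
	c.uses++
	c.lastUsed = time.Now()

	p.mu.Lock()
	defer p.mu.Unlock()
	if c.uses >= p.maxUses || len(p.idle) >= p.maxIdle {
		c.Close()
		return
	}
	p.idle = append(p.idle, c)
}
```

A caller would `Get` a connection, send one request over it, and `Put` it back; retiring connections in `Put` is what prevents any single backend instance from being used forever.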