The most common way to handle errors in gRPC is to return the error directly, e.g. return nil, err. In practice, though, we also need to return business status codes, and the usual approach is to define an error code field in the response structure. This is cumbersome to write; for example, you may end up with code like this:

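user, err := dao.GetUserByEmail(ctx, email)
if err != nil {
    // Translate the "not found" case into a business code in the response body.
    if errors.Is(err, gorm.ErrRecordNotFound) {
        return &GetUserResp{Code: USER_NOT_FOUND, Msg: "user not found"}, nil
    }
    // Every other error is handed to gRPC unchanged.
    return nil, err
}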

There are several problems here.

  1. Returning errors is tedious: every call site has to inspect the error and convert it into the corresponding Code and Msg fields.
  2. If you return err directly instead of one of gRPC's own status errors such as codes.NotFound, the client cannot recognize the error.
  3. If you use the gRPC Gateway, any error that is not a gRPC status error is reported as HTTP 500.

These problems pull against each other. To avoid problem 1 we can return err directly, but that leads to problem 2. To avoid problem 2 we can fill in Code and Msg by hand as in the snippet above, but that is problem 1 again. To avoid problem 3 we can use gRPC's built-in status errors, but they are not very expressive and cannot convey business error codes.

Therefore, to solve this series of problems, we compared several error handling libraries and put together an error handling system that combines their advantages while adapting to our business requirements.

Error Handling Library

Python’s exception system is a design worth borrowing from. First, we divide abnormal program execution into two kinds: errors, which we want to be able to check for and handle, and exceptions, which we can only try to survive by recovering (in Go, with recover).
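
To make the distinction concrete, here is a minimal stdlib-only sketch. The names validate, safely, and ErrEmptyEmail are made up for illustration and are not part of the library described in this article.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmptyEmail is an ordinary error: callers are expected to check for it.
var ErrEmptyEmail = errors.New("email is empty")

// validate reports a problem through its return value.
func validate(email string) error {
	if email == "" {
		return ErrEmptyEmail
	}
	return nil
}

// safely runs fn and converts a panic into an error, so that one
// unexpected failure does not take the whole process down.
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// An error: checked and handled at the call site.
	if err := validate(""); errors.Is(err, ErrEmptyEmail) {
		fmt.Println("handled:", err)
	}

	// An exception: nobody checks for it, we can only recover.
	err := safely(func() {
		var m map[string]int
		m["x"] = 1 // writing to a nil map panics
	})
	fmt.Println(err)
}
```

The rest of this article is about the first kind only: errors that are returned, typed, and inspected.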

We then distinguish between error types and error instances. Defining an error means defining its type, which carries the HTTP status code and the business error code it should surface. Throwing an error means instantiating it, and the instance carries the stack trace, execution information, and so on.

For example, defining error types:

var (
    ErrBadRequest       = RegisterErrorType(BaseErr, http.StatusBadRequest, ErrCodeBadRequest)             // 400
    ErrUnauthorized     = RegisterErrorType(BaseErr, http.StatusUnauthorized, ErrCodeUnauthorized)         // 401
    ErrPaymentRequired  = RegisterErrorType(BaseErr, http.StatusPaymentRequired, ErrCodePaymentRequired)   // 402
    ErrForbidden        = RegisterErrorType(BaseErr, http.StatusForbidden, ErrCodeForbidden)               // 403
    ErrNotFound         = RegisterErrorType(BaseErr, http.StatusNotFound, ErrCodeNotFound)                 // 404
    ErrMethodNotAllowed = RegisterErrorType(BaseErr, http.StatusMethodNotAllowed, ErrCodeMethodNotAllowed) // 405
)

Instantiating an error:

err = validateReq(req)
if err != nil {
    return nil, errs.NewBadRequest(err.Error(), err)
}

Checking the error type:

if errs.IsError(err, ErrBadRequest) {
    //
}

Extracting the error:

if baseErr, ok := errs.AsBaseErr(err); ok {
    //
}
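
The implementation of the errs package is not shown in this article. As a rough illustration only, here is a minimal stdlib-only sketch of how such a library could be structured. The struct layouts, the New constructor, and the numeric business codes are assumptions made for the example; only the names RegisterErrorType, IsError, AsBaseErr, BaseErr, and HTTPCode come from the snippets above.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"runtime"
)

// ErrorType is a node in the error tree. It is defined once and carries
// the HTTP status code and business code to surface (assumed layout).
type ErrorType struct {
	parent   *ErrorType
	httpCode int
	bizCode  int
}

// RegisterErrorType creates a new error type as a child of parent.
func RegisterErrorType(parent *ErrorType, httpCode, bizCode int) *ErrorType {
	return &ErrorType{parent: parent, httpCode: httpCode, bizCode: bizCode}
}

// Error is an instance of an ErrorType, created where the failure happens.
type Error struct {
	typ   *ErrorType
	msg   string
	cause error     // the underlying error, if any
	stack []uintptr // program counters captured at creation time
}

func (e *Error) Error() string { return e.msg }
func (e *Error) Unwrap() error { return e.cause }
func (e *Error) HTTPCode() int { return e.typ.httpCode }
func (e *Error) BizCode() int  { return e.typ.bizCode }

// New instantiates an error of type t and records the caller's stack.
func New(t *ErrorType, msg string, cause error) *Error {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and New itself
	return &Error{typ: t, msg: msg, cause: cause, stack: pcs[:n]}
}

// AsBaseErr finds the first *Error in err's chain.
func AsBaseErr(err error) (*Error, bool) {
	var e *Error
	ok := errors.As(err, &e)
	return e, ok
}

// IsError reports whether err is an instance of t or of one of t's descendants.
func IsError(err error, t *ErrorType) bool {
	e, ok := AsBaseErr(err)
	if !ok {
		return false
	}
	for cur := e.typ; cur != nil; cur = cur.parent {
		if cur == t {
			return true
		}
	}
	return false
}

// The business codes below are placeholders, not the real ones.
var (
	BaseErr       = &ErrorType{httpCode: http.StatusInternalServerError, bizCode: 10000}
	ErrBadRequest = RegisterErrorType(BaseErr, http.StatusBadRequest, 10400)
	ErrNotFound   = RegisterErrorType(BaseErr, http.StatusNotFound, 10404)
)

func main() {
	cause := errors.New("email is empty")
	err := fmt.Errorf("create user: %w", New(ErrBadRequest, "invalid request", cause))

	fmt.Println(IsError(err, ErrBadRequest)) // true
	fmt.Println(IsError(err, BaseErr))       // true: BaseErr is the parent type
	fmt.Println(IsError(err, ErrNotFound))   // false
	if be, ok := AsBaseErr(err); ok {
		fmt.Println(be.HTTPCode(), be.BizCode()) // 400 10400
	}
}
```

Walking the parent chain in IsError is what lets a check against BaseErr match every registered type, which is one way to get the error tree mentioned later in the summary.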

With this error library, an error can carry its stack trace, type, business code, HTTP status code, message, and the underlying error that caused it, and we can check its type and extract its information. So how does it work with gRPC?

gRPC error handling

As we said above, if we use return nil, err directly, the client cannot recognize the error accurately. If we use return Resp{Code, Msg}, nil, the code is cumbersome to write and the gRPC gateway cannot translate the result into the corresponding HTTP status code.

Our solution is to return errors from the system described in the previous section directly, e.g.

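func (s *service) CreateUser(ctx context.Context, req *pb.CreateUserReq) (*pb.CreateUserResp, error) {
    // Wrap the validation failure in a typed error and return it as-is.
    if err := validateReq(req); err != nil {
        return nil, errs.NewBadRequest(err.Error(), err)
    }

    // ... create the user ...
    return &pb.CreateUserResp{}, nil
}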

Then, in a middleware (a unary server interceptor), the response is extracted and its code and msg fields are filled in from the error.

func UnaryServerInterceptor() grpc.UnaryServerInterceptor {
    return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
        resp, err := handler(ctx, req)
        if err == nil {
            return resp, nil
        }

        // Errors from outside our error system are passed through untouched.
        if !errs.IsError(err, errs.BaseErr) {
            return resp, err
        }

        // Handlers usually return a nil response together with an error,
        // so allocate an empty response of the right type to hold code and msg.
        if val := reflect.ValueOf(resp); !val.IsValid() || val.IsNil() {
            tp := getRespType(ctx, info)
            if tp == nil {
                return resp, err
            }
            resp = reflect.New(tp).Interface()
        }

        if be, ok := errs.AsBaseErr(err); ok {
            // Expose the HTTP status code as metadata for the gRPC gateway.
            grpc.SetHeader(ctx, metadata.Pairs("x-http-code", fmt.Sprintf("%d", be.HTTPCode())))
            return baseErrSetter(resp, be)
        }
        return resp, err
    }
}

This allows the returned error to be written automatically into the corresponding fields of the response.
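
The helpers getRespType and baseErrSetter are not shown above. As an illustration of the reflection step only, here is a stdlib-only sketch that assumes the response is a pointer to a struct with an int32 Code field and a string Msg field. GetUserResp, setCodeMsg, and newResp are hypothetical stand-ins, not the actual helpers.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// GetUserResp stands in for a generated protobuf response type (hypothetical).
type GetUserResp struct {
	Code int32
	Msg  string
	Name string
}

// setCodeMsg writes code and msg into the Code and Msg fields of resp.
// resp must be a non-nil pointer to a struct that has both fields.
func setCodeMsg(resp interface{}, code int32, msg string) error {
	v := reflect.ValueOf(resp)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return errors.New("resp must be a non-nil pointer")
	}
	v = v.Elem()
	if v.Kind() != reflect.Struct {
		return errors.New("resp must point to a struct")
	}
	codeField, msgField := v.FieldByName("Code"), v.FieldByName("Msg")
	if !codeField.IsValid() || !codeField.CanSet() || codeField.Kind() != reflect.Int32 {
		return errors.New("resp has no settable int32 Code field")
	}
	if !msgField.IsValid() || !msgField.CanSet() || msgField.Kind() != reflect.String {
		return errors.New("resp has no settable string Msg field")
	}
	codeField.SetInt(int64(code))
	msgField.SetString(msg)
	return nil
}

// newResp allocates an empty response of the given type, mirroring
// reflect.New(tp).Interface() in the interceptor.
func newResp(tp reflect.Type) interface{} {
	return reflect.New(tp).Interface()
}

func main() {
	resp := newResp(reflect.TypeOf(GetUserResp{}))
	if err := setCodeMsg(resp, 10404, "user not found"); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", resp) // &{Code:10404 Msg:user not found Name:}
}
```

Checking the field kind rather than the exact type means a Code field declared as a protobuf enum (whose underlying kind is int32) would also be accepted.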

gRPC gateway status code

If we return the error directly after processing it in the previous step, the gRPC gateway responds with 500, because the error is not a status error within the gRPC system. If we return nil instead, the gRPC gateway responds with 200. Neither is what we want. Since our error system already contains HTTP status codes, can we use them directly? The answer is yes. At the end of the interceptor above we set a metadata entry, x-http-code, so we can register a response modifier in the gRPC gateway that uses the status code passed there.

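// Values assumed from the interceptor above and from the gateway's
// default "Grpc-Metadata-" prefix for forwarded headers.
const (
    httpStatusCodeKey     = "x-http-code"
    grpcHTTPStatusCodeKey = "Grpc-Metadata-X-Http-Code"
)

// Register the modifier when building the gateway mux.
mux := runtime.NewServeMux(
    runtime.WithForwardResponseOption(GRPCGatewayHTTPResponseModifier),
)

// GRPCGatewayHTTPResponseModifier overrides the HTTP status code with the
// one the server interceptor placed in the response metadata, if any.
func GRPCGatewayHTTPResponseModifier(ctx context.Context, w http.ResponseWriter, p proto.Message) error {
    md, ok := runtime.ServerMetadataFromContext(ctx)
    if !ok {
        return nil
    }

    // set http status code
    if vals := md.HeaderMD.Get(httpStatusCodeKey); len(vals) > 0 {
        code, err := strconv.Atoi(vals[0])
        if err != nil {
            return err
        }
        // delete the headers to not expose any grpc-metadata in http response
        delete(md.HeaderMD, httpStatusCodeKey)
        delete(w.Header(), grpcHTTPStatusCodeKey)
        w.WriteHeader(code)
    }

    return nil
}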

This way, when a gRPC handler returns an instance of ErrBadRequest, the gRPC gateway responds with 400, and an instance of ErrForbidden yields 403, which is exactly what we set out to achieve.
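
The status override can be exercised without a running gateway. The sketch below is a stdlib-only simulation, not gateway code: the server metadata is replaced by a plain map, and applyStatus and respond are hypothetical names. It assumes, as the deletes in the modifier above suggest, that the metadata has already been copied into the response headers by the time the modifier runs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

const (
	httpStatusCodeKey     = "x-http-code"
	grpcHTTPStatusCodeKey = "Grpc-Metadata-X-Http-Code"
)

// applyStatus mirrors the modifier's logic, with the server metadata
// represented as a plain map instead of runtime.ServerMetadata.
func applyStatus(w http.ResponseWriter, md map[string][]string) error {
	vals := md[httpStatusCodeKey]
	if len(vals) == 0 {
		return nil
	}
	code, err := strconv.Atoi(vals[0])
	if err != nil {
		return err
	}
	// Remove the internal header before the status line is written.
	delete(md, httpStatusCodeKey)
	delete(w.Header(), grpcHTTPStatusCodeKey)
	w.WriteHeader(code)
	return nil
}

// respond simulates one forwarded response: metadata is copied into the
// headers first, then the status is applied, then the body is written.
func respond(md map[string][]string, body string) *http.Response {
	rec := httptest.NewRecorder()
	for _, v := range md[httpStatusCodeKey] {
		rec.Header().Add(grpcHTTPStatusCodeKey, v)
	}
	if err := applyStatus(rec, md); err != nil {
		http.Error(rec, err.Error(), http.StatusInternalServerError)
		return rec.Result()
	}
	fmt.Fprint(rec, body)
	return rec.Result()
}

func main() {
	res := respond(
		map[string][]string{httpStatusCodeKey: {"400"}},
		`{"code":10400,"msg":"invalid request"}`,
	)
	// The status is overridden and the internal header is not exposed.
	fmt.Println(res.StatusCode, res.Header.Get(grpcHTTPStatusCodeKey) == "") // 400 true
}
```

The order matters: WriteHeader has to be called before any body bytes are written, otherwise the response has already been committed with status 200.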

Monitoring

We also provide a set of middleware that can be combined with Sentry to collect error stacks.

Summary

The final result of this whole system is:

  • gRPC and HTTP can be combined, each conforming to its own specification while fully supporting business requirements.
  • Errors are graded and classified, and can form an error tree.
  • Errors can be identified by type and carry enough information, and both errors and error types can be customized.
  • It can be combined with Sentry and a monitoring system for error collection and monitoring.
  • It is simple to use and easy to understand.
  • The gRPC gateway's status codes and error codes stay consistent with those in gRPC.
