serde is pretty much the most commonly used serialization and deserialization library in the Rust ecosystem today.

Golang Implementation

As a Go programmer, I find it useful to first look at how Go handles the same problem.

Go’s standard library implements JSON serialization and deserialization directly, in the encoding/json package.

For custom serialization and deserialization, Go uses two simple interfaces, Marshaler and Unmarshaler.

// https://pkg.go.dev/encoding/json#Marshaler
type Marshaler interface {
	MarshalJSON() ([]byte, error)
}

// https://pkg.go.dev/encoding/json#Unmarshaler
type Unmarshaler interface {
	UnmarshalJSON([]byte) error
}

Note that these interfaces are for JSON only, and the JSON suffix in the method names is meaningful: the same data type may need to be serialized in several formats, such as YAML or TOML. Following the same naming scheme, those methods could be MarshalYAML, MarshalTOML, and so on.

It looks easy, doesn’t it? Well, it is simple. But in fact there is a catch, or rather, there are some details that must be paid attention to when implementing it.

Marshaler should be implemented on a value receiver, not only on a pointer receiver. With a pointer-receiver-only implementation, encoding/json does not call our method when the value being serialized is not addressable, for example when it is passed to json.Marshal by value; the default encoding is used instead.
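The receiver rule for Marshaler can be checked with a small experiment. This is only a sketch: ValueLevel, PointerLevel and encode are hypothetical names made up for the illustration, not part of the example that follows.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ValueLevel implements MarshalJSON on a value receiver.
type ValueLevel int

func (l ValueLevel) MarshalJSON() ([]byte, error) {
	return []byte(`"custom"`), nil
}

// PointerLevel implements MarshalJSON on a pointer receiver only.
type PointerLevel int

func (l *PointerLevel) MarshalJSON() ([]byte, error) {
	return []byte(`"custom"`), nil
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	v := ValueLevel(1)
	p := PointerLevel(1)

	fmt.Println(encode(v))  // value receiver, passed by value: custom method runs
	fmt.Println(encode(&v)) // value receiver, passed by pointer: custom method runs
	fmt.Println(encode(p))  // pointer receiver, passed by value: plain int encoding
	fmt.Println(encode(&p)) // pointer receiver, passed by pointer: custom method runs
}
```

Only the third call skips the custom method: a PointerLevel passed by value is not addressable, so encoding/json cannot take its address to reach the pointer-receiver method.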

Unmarshaler must be implemented on a pointer receiver: the method parses the data into the receiver itself, so it has to be able to modify it. A value receiver would only modify a copy, which is pointless.
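To see why a value receiver is pointless for Unmarshaler, here is a minimal sketch. ValueFlag and PointerFlag are hypothetical types invented for this comparison; both ignore their input and simply try to set themselves to 1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ValueFlag has UnmarshalJSON on a value receiver: the method works on a copy.
type ValueFlag int

func (f ValueFlag) UnmarshalJSON(raw []byte) error {
	f = 1 // changes only the local copy; the caller's variable is untouched
	return nil
}

// PointerFlag has UnmarshalJSON on a pointer receiver: the method can write
// through the pointer into the caller's variable.
type PointerFlag int

func (f *PointerFlag) UnmarshalJSON(raw []byte) error {
	*f = 1
	return nil
}

// decodeValueFlag unmarshals data into a fresh ValueFlag.
func decodeValueFlag(data string) (ValueFlag, error) {
	var f ValueFlag
	err := json.Unmarshal([]byte(data), &f)
	return f, err
}

// decodePointerFlag unmarshals data into a fresh PointerFlag.
func decodePointerFlag(data string) (PointerFlag, error) {
	var f PointerFlag
	err := json.Unmarshal([]byte(data), &f)
	return f, err
}

func main() {
	v, errV := decodeValueFlag(`"on"`)
	p, errP := decodePointerFlag(`"on"`)
	fmt.Println(v, errV) // v keeps its zero value, and no error is reported
	fmt.Println(p, errP) // p was set by the method
}
```

The value-receiver version is the more dangerous of the two mistakes, because it compiles, runs, and reports no error while silently discarding the parsed data.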

For example, suppose we want to serialize a custom int type (used as an enum) to a specific JSON string, mainly to make structured JSON logs easier to read.

import (
	"bytes"
	"fmt"
)

type HideType int

const (
	HideTypeNone     HideType = 0
	HideTypeLocation HideType = 1 << 0
	HideTypeAge      HideType = 1 << 1
	HideTypeSex      HideType = 1 << 2
	HideTypeNation   HideType = 1 << 3
)

var _hideTypeValuesMap = map[HideType]string{
	HideTypeNone:     "none",
	HideTypeLocation: "location",
	HideTypeAge:      "age",
	HideTypeSex:      "sex",
	HideTypeNation:   "nation",
}

var _hideTypeValueToType = map[string]HideType{
	"none":     HideTypeNone,
	"location": HideTypeLocation,
	"age":      HideTypeAge,
	"sex":      HideTypeSex,
	"nation":   HideTypeNation,
}

func (ht HideType) String() string {
	if val, ok := _hideTypeValuesMap[ht]; ok {
		return val
	}
	return "unknown"
}

// MarshalJSON impl Marshaler interface https://pkg.go.dev/encoding/json#Marshaler
func (ht HideType) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf(`"%v"`, ht.String())), nil
}

// UnmarshalJSON impl Unmarshaler interface
// https://pkg.go.dev/encoding/json#Unmarshaler
func (ht *HideType) UnmarshalJSON(rawJSON []byte) error {
	htStr := bytes.Trim(rawJSON, `"`)
	if htInt, ok := _hideTypeValueToType[string(htStr)]; ok {
		*ht = htInt
		return nil
	}
	// %s prints the []byte as text; %v would print it as a list of byte values.
	return fmt.Errorf("parse into HideType failed, unknown HideType: %s", htStr)
}

Yes, the whole implementation is very simple. MarshalJSON() just needs to return the JSON text we want, and that text must be a legal JSON string, quotes included. Here we use %v wrapped in literal quotes ("%v") instead of %q, because in our scenario (enum-like names) we are 100% sure that nothing needs escaping. If you are not sure, it is better to use %q, or json.Marshal on the string, so that the escaping is handled for us.
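The difference between these quoting strategies can be checked with json.Valid. The helper names below (quoteWithV, quoteWithQ, quoteWithJSON, isValidJSON) are made up for this sketch. One caveat worth knowing: %q follows Go's string-literal escaping rules, which overlap with JSON's for common cases but not for all of them, so json.Marshal is the safest choice when the content is not under our control.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quoteWithV wraps s in double quotes without escaping anything.
func quoteWithV(s string) []byte {
	return []byte(fmt.Sprintf(`"%v"`, s))
}

// quoteWithQ uses Go's %q verb, which applies Go string-literal escaping.
func quoteWithQ(s string) []byte {
	return []byte(fmt.Sprintf("%q", s))
}

// quoteWithJSON lets encoding/json do the quoting and escaping.
func quoteWithJSON(s string) []byte {
	b, err := json.Marshal(s)
	if err != nil {
		return nil
	}
	return b
}

// isValidJSON reports whether b is a syntactically valid JSON value.
func isValidJSON(b []byte) bool {
	return json.Valid(b)
}

func main() {
	for _, s := range []string{"age", `say "hi"`, "bell\a"} {
		fmt.Printf("%q: %%v=%v %%q=%v json.Marshal=%v\n", s,
			isValidJSON(quoteWithV(s)),
			isValidJSON(quoteWithQ(s)),
			isValidJSON(quoteWithJSON(s)))
	}
}
```

For a plain name such as age all three produce valid JSON. An embedded double quote breaks the "%v" variant, and a control character such as the bell breaks both "%v" and %q, since \a is a Go escape but not a JSON one.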

UnmarshalJSON is just as simple, because we assume the input is legal and always has this specific format: we strip the surrounding quotes and look the name up.
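If that assumption about the input ever needs loosening, one possible variant is to let encoding/json decode the raw bytes into a Go string before the lookup, so that JSON escapes are resolved as well. The sketch below uses a trimmed-down HideType and a hypothetical nameToHideType table and parseHideType helper; it is an alternative to the version above, not a correction of it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type HideType int

const (
	HideTypeNone HideType = 0
	HideTypeAge  HideType = 1 << 1
)

// nameToHideType is a shortened lookup table, only for this sketch.
var nameToHideType = map[string]HideType{
	"none": HideTypeNone,
	"age":  HideTypeAge,
}

// UnmarshalJSON decodes the raw bytes into a Go string first. Numbers and
// objects fail at that step; unknown names fail at the lookup.
func (ht *HideType) UnmarshalJSON(rawJSON []byte) error {
	var name string
	if err := json.Unmarshal(rawJSON, &name); err != nil {
		return fmt.Errorf("HideType must be a JSON string: %w", err)
	}
	value, ok := nameToHideType[name]
	if !ok {
		return fmt.Errorf("unknown HideType: %q", name)
	}
	*ht = value
	return nil
}

// parseHideType unmarshals a JSON document into a HideType.
func parseHideType(data string) (HideType, error) {
	var ht HideType
	err := json.Unmarshal([]byte(data), &ht)
	return ht, err
}

func main() {
	for _, in := range []string{`"age"`, `"\u0061ge"`, `123`, `"bogus"`} {
		ht, err := parseHideType(in)
		fmt.Println(in, int(ht), err)
	}
}
```

The escaped form "\u0061ge" decodes to age here, whereas trimming the quotes by hand would look up the literal backslash sequence and fail.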

Rust implementation

Of course, serde can handle this kind of serialization by itself: we just need to add #[derive(Serialize, Deserialize, Debug)]. The custom implementation here is mainly for learning purposes.

Serialization is relatively simple: we just need to implement the Serialize trait.

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer;
}

Deserialization is a bit more complicated. Unlike Go, where the implementation details are left to the user, serde requires the data to be exchanged according to the serde data model.

Looking at the Deserialize trait alone, deserialization seems just as simple. But it also requires implementing the Visitor trait.

pub trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}

Syntactically, expecting is the only method of the Visitor trait that must be implemented, but in practice we also implement whichever visit_* methods match the data we expect. In our case the JSON value is always a string, so we only need visit_str. Note that, compared with Go, where the user has to parse the raw JSON manually, serde has already parsed the value by the time visit_str is called: for the input "foo", visit_str receives foo, whereas in Go UnmarshalJSON receives "foo" and we have to handle the outer quotes ourselves.
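The Go side of that comparison is easy to observe. In this sketch, rawCapture and capture are hypothetical helpers that simply record the bytes handed to UnmarshalJSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawCapture records exactly what encoding/json passes to UnmarshalJSON.
type rawCapture struct {
	seen string
}

func (c *rawCapture) UnmarshalJSON(raw []byte) error {
	c.seen = string(raw)
	return nil
}

// capture unmarshals data and returns the raw bytes the method received.
func capture(data string) string {
	var c rawCapture
	if err := json.Unmarshal([]byte(data), &c); err != nil {
		return "error: " + err.Error()
	}
	return c.seen
}

func main() {
	got := capture(`"foo"`)
	fmt.Println(got, len(got)) // the quotes are still part of the data
}
```

The method receives five bytes, quotes included, which is why the Go implementation above has to trim them before the lookup.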

Finally, we connect the type that implements the Visitor trait to the Deserialize implementation of the type we want to deserialize: deserializer.deserialize_str(HideTypeVisitor).

pub trait Visitor<'de>: Sized {
    /// The value produced by this visitor.
    type Value;

    /// Format a message stating what data this Visitor expects to receive.
    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result;

    // ... (the visit_* methods, all of which have default implementations)
}

The complete implementation:
use std::collections::HashMap;
use std::fmt;
use serde::{Serialize, Deserialize, Serializer, Deserializer};

use serde::de::{Error, Unexpected, Visitor};

use once_cell::sync::Lazy;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
enum HideType {
    HideTypeNone = 0,
    HideTypeLocation = 1 << 0,
    HideTypeAge = 1 << 1,
    HideTypeSex = 1 << 2,
    HideTypeNation = 1 << 3,
}

static FILTER_TYPE_TO_NAME: [(HideType, &str); 5] = [
    (HideType::HideTypeNone, "none"),
    (HideType::HideTypeLocation, "location"),
    (HideType::HideTypeAge, "age"),
    (HideType::HideTypeSex, "sex"),
    (HideType::HideTypeNation, "nation"),
];

static TYPE_STR_MAP: Lazy<HashMap<HideType, &str>> = Lazy::new(|| {
    let mut m = HashMap::new();
    for (k, v) in FILTER_TYPE_TO_NAME.iter() {
        m.insert(*k, *v);
    }
    m
});

static STR_TYPE_MAP: Lazy<HashMap<&str, HideType>> = Lazy::new(|| {
    let mut m = HashMap::new();
    for (k, v) in FILTER_TYPE_TO_NAME.iter() {
        m.insert(*v, *k);
    }
    m
});

impl Serialize for HideType {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: Serializer,
    {
        // Every variant is present in TYPE_STR_MAP, so the lookup cannot fail.
        TYPE_STR_MAP.get(self).unwrap().serialize(serializer)
    }
}

struct HideTypeVisitor;

impl<'de> Visitor<'de> for HideTypeVisitor {
    type Value = HideType;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str(r#"a string in one of: none, location, age, sex, nation"#)
    }

    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
        where
            E: Error,
    {
        match STR_TYPE_MAP.get(v) {
            Some(t) => Ok(*t),
            None => Err(E::invalid_value(Unexpected::Str(v), &self)),
        }
    }
}

impl<'de> Deserialize<'de> for HideType {
    fn deserialize<D>(deserializer: D) -> Result<HideType, D::Error>
        where
            D: Deserializer<'de>,
    {
        deserializer.deserialize_str(HideTypeVisitor)
    }
}

Of course, it is not necessary to use a HashMap here; a match expression can handle the conversion between type and string directly. But the HashMap approach feels more natural, and because both maps are generated from a single table, we no longer need to maintain a forward and a reverse map by hand.

In general, implementing custom serialization and deserialization in Go is straightforward, even brute-force, and the interface definitions are very simple. serde in Rust has far more features, and the first thing you must understand is the serde data model. Go’s encoding/json works on Go types directly, and its default behavior is tied to those types: a []byte, for example, is serialized as a base64 string. In serde, data types are instead mapped through the serde data model. serde itself is a framework and does not implement any format; the formats are provided by separate crates such as serde_json, serde_yaml and so on. So the advantage of serde is that the traits are unified, whereas in Go each format has its own implementation and its own interfaces, such as the official JSON package and third-party YAML libraries.
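The []byte behavior mentioned above is easy to confirm. In this small sketch, encodeBytes and encodeInts are hypothetical helpers that show how the same two numbers are serialized depending on the Go type that holds them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeBytes marshals a byte slice and returns the JSON text.
func encodeBytes(b []byte) string {
	out, err := json.Marshal(b)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

// encodeInts marshals an int slice and returns the JSON text.
func encodeInts(v []int) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(encodeBytes([]byte("hi")))   // "aGk=" (a base64 string)
	fmt.Println(encodeInts([]int{104, 105})) // [104,105] (a JSON array)
}
```

The mapping from Go type to JSON shape is decided inside encoding/json itself, which is the tight binding the paragraph describes.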

A third-party YAML library:

UnmarshalYAML(value *Node) error

The official json library:

UnmarshalJSON([]byte) error

As you can see, although the names and parameters look similar, they are actually different, and there is no official specification that says they should be the same. Suppose instead that both the JSON and the YAML library defined their Unmarshaler interface as Unmarshal([]byte) error. Because Go interfaces are satisfied structurally (duck typing), a type written for one format would also satisfy the other library’s interface, and a type assertion against it would wrongly succeed.
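That hypothetical situation can be sketched as follows. JSONUnmarshaler and YAMLUnmarshaler are invented interfaces for the thought experiment and do not exist in any real library; Config and satisfiesYAML are likewise made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONUnmarshaler and YAMLUnmarshaler are hypothetical interfaces that
// happen to share the same method signature.
type JSONUnmarshaler interface {
	Unmarshal([]byte) error
}

type YAMLUnmarshaler interface {
	Unmarshal([]byte) error
}

// Config implements Unmarshal with JSON input in mind only.
type Config struct {
	Name string
}

func (c *Config) Unmarshal(data []byte) error {
	return json.Unmarshal(data, c)
}

// satisfiesYAML reports whether v passes a type assertion to YAMLUnmarshaler.
func satisfiesYAML(v interface{}) bool {
	_, ok := v.(YAMLUnmarshaler)
	return ok
}

func main() {
	c := &Config{}
	// The assertion succeeds even though Config only understands JSON.
	fmt.Println(satisfiesYAML(c))
}
```

A YAML library using such an assertion would hand YAML bytes to a method that expects JSON. Format-specific method names such as UnmarshalJSON and UnmarshalYAML avoid this collision.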

Another point is that, since serde is a framework, our custom serialization is mapped to the framework’s data model, so it is format-independent (JSON, YAML, TOML, etc.): it can be defined once and used with every format. Go’s custom serialization, by contrast, is tied to one specific format (e.g. JSON). The example above would have to be implemented again for YAML, whereas serde does not have this problem.
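Go does offer a partial way around this for string-like values: the standard encoding.TextMarshaler and encoding.TextUnmarshaler interfaces, which encoding/json and encoding/xml both honor. The sketch below applies them to a shortened HideType; the hideTypeNames table and the toJSON and fromJSON helpers are hypothetical. Whether a particular third-party YAML or TOML library also honors these interfaces is an assumption to verify against that library's documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type HideType int

const (
	HideTypeNone HideType = 0
	HideTypeAge  HideType = 1 << 1
)

// hideTypeNames is a shortened table, only for this sketch.
var hideTypeNames = map[HideType]string{
	HideTypeNone: "none",
	HideTypeAge:  "age",
}

// MarshalText implements encoding.TextMarshaler.
func (ht HideType) MarshalText() ([]byte, error) {
	name, ok := hideTypeNames[ht]
	if !ok {
		return nil, fmt.Errorf("unknown HideType: %d", int(ht))
	}
	return []byte(name), nil
}

// UnmarshalText implements encoding.TextUnmarshaler. The text arrives
// already unquoted, so no trimming is needed.
func (ht *HideType) UnmarshalText(text []byte) error {
	for value, name := range hideTypeNames {
		if name == string(text) {
			*ht = value
			return nil
		}
	}
	return fmt.Errorf("unknown HideType: %q", text)
}

// toJSON marshals ht with encoding/json and returns the JSON text.
func toJSON(ht HideType) string {
	b, err := json.Marshal(ht)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// fromJSON unmarshals a JSON document into a HideType.
func fromJSON(data string) (HideType, error) {
	var ht HideType
	err := json.Unmarshal([]byte(data), &ht)
	return ht, err
}

func main() {
	fmt.Println(toJSON(HideTypeAge)) // "age"
	ht, err := fromJSON(`"none"`)
	fmt.Println(int(ht), err)
}
```

This only covers values that map to a single string, so it is far narrower than the serde data model, but for enum-like types it removes the need for one method pair per format in the packages that support it.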
