I had no intention of looking at the performance of JSON libraries, but when I recently profiled my project with pprof, the flame graph below showed that more than half of the CPU time spent in business-logic processing went into parsing JSON. That is how this article came about.

(Figure: pprof flame graph of the project)

This article digs into the source code to see how Go's standard library parses JSON, then looks at some of the more popular JSON parsing libraries, what features they offer, and in which scenarios they can help us.

The following libraries are analyzed:

Library                             GitHub stars
standard library (json.Unmarshal)   n/a
valyala/fastjson                    1.2k
tidwall/gjson                       8.3k
buger/jsonparser                    4k

json-iterator is another very well-known library, but in my measurements its performance was very close to the standard library's, so you might as well stick with the standard library.

Jeffail/gabs and bitly/go-simplejson parse directly with the standard library's Unmarshal, so their performance matches the standard library's; they are not covered in this article.

easyjson, like protobuf, requires generating serialization code for every struct. That is quite invasive and not something I personally like, so I left it out.

The libraries above are the better-known, still actively developed JSON parsing libraries with more than 1k stars that I could find. If I have missed any, let me know and I will add them.

Standard Library JSON Unmarshal

Analysis

func Unmarshal(data []byte, v interface{}) error

Unmarshal takes two parameters: data, the JSON bytes to decode, and v, a pointer to the value the result should be stored in. It returns an error if the input is invalid or cannot be stored in v.

Before decoding, Unmarshal first checks that data is well-formed JSON. It then calls reflect.ValueOf to get the reflected value of v (which must be a non-nil pointer), skips leading whitespace, and uses the first non-whitespace character to decide how to parse the value.

func (d *decodeState) value(v reflect.Value) error {
    switch d.opcode {
    default:
        panic(phasePanicMsg)
    // array
    case scanBeginArray:
        ...
    // struct or map
    case scanBeginObject:
        ...
    // literal: string, number, bool or null
    case scanBeginLiteral:
        ...
    }
    return nil
}

If the value starts with [, it is an array and goes to the scanBeginArray branch; if it starts with {, it is an object to be decoded into a struct or a map, so it goes to the scanBeginObject branch; and so on.

Take parsing an object as an example.

func (d *decodeState) object(v reflect.Value) error {
    ...
    var fields structFields
    // check whether the target is a map or a struct
    switch v.Kind() {
    case reflect.Map:
        ...
    case reflect.Struct:
        // look up the struct's field metadata (cached per type)
        fields = cachedTypeFields(t)
        // ok
    default:
        d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
        d.skip()
        return nil
    }

    var mapElem reflect.Value
    origErrorContext := d.errorContext
    // parse the key/value pairs of the JSON object one by one
    for {
        start := d.readIndex()
        d.rescanLiteral()
        item := d.data[start:d.readIndex()]
        // read the key
        key, ok := unquoteBytes(item)
        if !ok {
            panic(phasePanicMsg)
        }
        var subv reflect.Value
        destring := false
        ...
        // store the value into subv via reflection
        if destring {
            // destring is set for struct fields tagged with the ",string" option:
            // the JSON value is a quoted string that wraps a literal
            switch qv := d.valueQuoted().(type) {
            case nil:
                if err := d.literalStore(nullLiteral, subv, false); err != nil {
                    return err
                }
            case string:
                if err := d.literalStore([]byte(qv), subv, true); err != nil {
                    return err
                }
            default:
                d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
            }
        } else {
            // regular values; nested arrays and objects recurse through value
            if err := d.value(subv); err != nil {
                return err
            }
        }
        ...
        // leave the loop once the closing } is reached
        if d.opcode == scanEndObject {
            break
        }
        if d.opcode != scanObjectValue {
            panic(phasePanicMsg)
        }
    }
    return nil
}
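
  1. Check whether the target is a map or a struct; for a struct, fetch its field metadata from the cache (cachedTypeFields).
  2. Loop over the key/value pairs of the JSON object.
  3. For each key, find the struct field whose name or json tag matches it (an exact match is preferred, but a case-insensitive match is accepted) and get its type.
  4. Set the field's value via reflection, calling value recursively for nested arrays and objects.
  5. Stop when the closing } of the JSON object is reached.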
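As a minimal illustration using only encoding/json (the Name and Person types are made up for this example), the sketch below decodes the same input into a struct and into a generic map. The struct path goes through the cached field metadata described above, while the map path has to box every value in an interface{} and represents numbers as float64, which is plausibly part of why the map rows allocate more in the benchmarks later on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Name and Person are hypothetical example types; Unmarshal matches JSON keys
// against the json tags via reflection-derived, cached field metadata.
type Name struct {
	First string `json:"first"`
	Last  string `json:"last"`
}

type Person struct {
	Name Name `json:"name"` // nested object: decoded by a recursive value() call
	Age  int  `json:"age"`
}

// decodeBoth decodes the same input into a struct and into a generic map.
func decodeBoth(data []byte) (Person, map[string]interface{}, error) {
	var p Person
	if err := json.Unmarshal(data, &p); err != nil { // v must be a non-nil pointer
		return Person{}, nil, err
	}
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return Person{}, nil, err
	}
	return p, m, nil
}

func main() {
	data := []byte(`{"name":{"first":"li","last":"dj"},"age":18}`)
	p, m, err := decodeBoth(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name.Last, p.Age)           // dj 18
	fmt.Printf("%T %v\n", m["age"], m["age"]) // float64 18
}
```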

Summary

The Unmarshal source shows that it relies heavily on reflection to set field values, and for multi-level nested JSON it has to recurse and reflect at every level, so it is no surprise that its performance is poor.

But if performance is not critical, using it directly is actually a very good choice: it is feature-complete and the Go team keeps optimizing it, so future versions may well bring a significant performance improvement.

fastjson

Github: https://github.com/valyala/fastjson

As its name implies, this library is fast; its README says:

Fast. As usual, up to 15x faster than the standard encoding/json.

It is also very simple to use, as follows.

package main

import (
    "fmt"
    "log"

    "github.com/valyala/fastjson"
)

func main() {
    var p fastjson.Parser
    v, err := p.Parse(`{
                "str": "bar",
                "int": 123,
                "float": 1.23,
                "bool": true,
                "arr": [1, "foo", {}]
        }`)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("foo=%s\n", v.GetStringBytes("str"))
    fmt.Printf("int=%d\n", v.GetInt("int"))
    fmt.Printf("float=%f\n", v.GetFloat64("float"))
    fmt.Printf("bool=%v\n", v.GetBool("bool"))
    // array indexes are passed as strings
    fmt.Printf("arr.1=%s\n", v.GetStringBytes("arr", "1"))
}
// Output:
// foo=bar
// int=123
// float=1.230000
// bool=true
// arr.1=foo

To use fastjson, you first hand the JSON string to a Parser, then read values from the object returned by its Parse method. For nested values, pass the chain of parent and child keys directly to the Get method, as in v.GetStringBytes("arr", "1") above.

Analysis

Unlike the standard library's Unmarshal, fastjson splits JSON handling into two phases: Parse and Get.

Parse turns the JSON string into a tree of structures and returns its root; Get then fetches data from that tree. A Parser is not safe for concurrent use, so to parse from several goroutines at once, use a separate Parser per goroutine or a ParserPool. Note also that the value returned by Parse is only valid until the next call to Parse on the same Parser.

fastjson walks through the JSON from start to end and stores the parsed data in Value structures:

type Value struct {
    o Object
    a []*Value
    s string
    t Type
}

The structure is very simple:

  • o Object: used when the parsed value is an object.
  • a []*Value: used when the parsed value is an array.
  • s string: for values that are neither objects nor arrays, the value is stored in this field as a string.
  • t Type: the type of the value, such as TypeObject, TypeArray, TypeString or TypeNumber.
type Object struct {
    kvs           []kv
    keysUnescaped bool
}

type kv struct {
    k string
    v *Value
}

Object holds the key/value pairs of an object, and since each value is itself a *Value, the structure is recursive. Parsing the JSON string from the example above produces a tree like this:

(Figure: the Value tree produced by parsing the example JSON above)
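To make the layout concrete, here is a stdlib-only sketch that mirrors the Value/Object/kv shape shown above and builds part of the example tree by hand. It is not fastjson's code: the Type constants are trimmed down, and Get is a simplified take on the idea behind fastjson's Get(keys ...string), walking object members with a linear scan over kvs and treating a key as an index when the current node is an array.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type is a trimmed-down, hypothetical version of fastjson's value types.
type Type int

const (
	TypeObject Type = iota
	TypeArray
	TypeString
	TypeNumber
)

type kv struct {
	k string
	v *Value
}

// Object keeps its members in a slice, in document order.
type Object struct{ kvs []kv }

// Value is one node of the tree: o for objects, a for arrays,
// s for everything else (kept as text).
type Value struct {
	o Object
	a []*Value
	s string
	t Type
}

// Get follows keys from v; numeric keys index into arrays.
// It returns nil if any step of the path is missing.
func (v *Value) Get(keys ...string) *Value {
	for _, key := range keys {
		if v == nil {
			return nil
		}
		switch v.t {
		case TypeObject:
			var next *Value
			for _, e := range v.o.kvs { // linear scan over the members
				if e.k == key {
					next = e.v
					break
				}
			}
			v = next
		case TypeArray:
			n, err := strconv.Atoi(key)
			if err != nil || n < 0 || n >= len(v.a) {
				return nil
			}
			v = v.a[n]
		default:
			return nil // cannot descend into a scalar
		}
	}
	return v
}

func main() {
	// Hand-built equivalent of {"str":"bar","arr":[1,"foo"]}.
	root := &Value{t: TypeObject, o: Object{kvs: []kv{
		{"str", &Value{t: TypeString, s: "bar"}},
		{"arr", &Value{t: TypeArray, a: []*Value{
			{t: TypeNumber, s: "1"},
			{t: TypeString, s: "foo"},
		}}},
	}}}
	fmt.Println(root.Get("str").s)          // bar
	fmt.Println(root.Get("arr", "1").s)     // foo
	fmt.Println(root.Get("missing") == nil) // true
}
```

A slice of pairs rather than a map keeps document order and is cheap to reset and reuse for the small objects typical of API payloads; the trade-off is a linear-time lookup in very large objects.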

Code

Because no reflection is involved, the parsing code is refreshingly straightforward. Let's go straight to its core.

func parseValue(s string, c *cache, depth int) (*Value, string, error) {
    if len(s) == 0 {
        return nil, s, fmt.Errorf("cannot parse empty string")
    }
    depth++
    // the nesting depth must not exceed MaxDepth
    if depth > MaxDepth {
        return nil, s, fmt.Errorf("too big depth for the nested JSON; it exceeds %d", MaxDepth)
    }
    // object
    if s[0] == '{' {
        v, tail, err := parseObject(s[1:], c, depth)
        if err != nil {
            return nil, tail, fmt.Errorf("cannot parse object: %s", err)
        }
        return v, tail, nil
    }
    // array
    if s[0] == '[' {
        ...
    }
    // string
    if s[0] == '"' {
        ...
    }
    ...
    return v, tail, nil
}

parseValue decides what to parse from the first character of the string, and rejects input nested deeper than MaxDepth. Let's follow the object case.

func parseObject(s string, c *cache, depth int) (*Value, string, error) {
    ...
    o := c.getValue()
    o.t = TypeObject
    o.o.reset()
    for {
        var err error
        // get a kv slot from the Object
        kv := o.o.getKV()
        ...
        // parse the key
        kv.k, s, err = parseRawKey(s[1:])
        ...
        // parse the value recursively
        kv.v, s, err = parseValue(s, c, depth)
        ...
        // on ',' continue with the next pair
        if s[0] == ',' {
            s = s[1:]
            continue
        }
        // on '}' the object is complete
        if s[0] == '}' {
            return o, s[1:], nil
        }
        return nil, s, fmt.Errorf("missing ',' after object value")
    }
}

parseObject is just as simple: on each pass through the loop it reads a key, then calls parseValue to parse the value recursively, working through the JSON object from start to end until it reaches } and returns.

Summary

The analysis above shows that fastjson is much simpler than the standard library and much faster. Once Parse has built the JSON tree, it can be queried repeatedly without parsing again, which further improves performance.

However, its functionality is basic: it offers none of the common operations such as decoding JSON into a struct or a map. If you only need to read values out of JSON, this library is very convenient, but if you want to fill a struct you have to set each field by hand.

GJSON

Github: https://github.com/tidwall/gjson

In my tests GJSON was not as extremely fast as fastjson, but its feature set is very complete and its performance is still quite good. Here is a brief tour of its features.

Using GJSON is as simple as fastjson: just pass in the JSON string and the path of the value you want.

json := `{"name":{"first":"li","last":"dj"},"age":18}`
lastName := gjson.Get(json, "name.last")
// lastName.String() returns "dj"
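For simple paths like name.last, the effect of gjson.Get can be approximated with the standard library by decoding first and then walking the path. This is only a behavioural sketch built around a hypothetical getPath helper: GJSON itself scans the raw string without building any intermediate structure, which is exactly how it avoids the allocations this version incurs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// getPath walks a dot-separated path; numeric segments index arrays.
// Escaped dots, wildcards and modifiers are deliberately not supported.
func getPath(doc string, path string) (interface{}, bool) {
	var cur interface{}
	if err := json.Unmarshal([]byte(doc), &cur); err != nil {
		return nil, false
	}
	for _, seg := range strings.Split(path, ".") {
		switch node := cur.(type) {
		case map[string]interface{}:
			v, ok := node[seg]
			if !ok {
				return nil, false
			}
			cur = v
		case []interface{}:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			return nil, false // the path goes deeper than the document
		}
	}
	return cur, true
}

func main() {
	doc := `{"name":{"first":"li","last":"dj"},"age":18}`
	last, _ := getPath(doc, "name.last")
	fmt.Println(last) // dj
}
```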

Beyond plain paths, GJSON also supports simple wildcard matching in keys: * matches any number of characters and ? matches exactly one character:

json := `{
    "name":{"first":"Tom", "last": "Anderson"},
    "age": 37,
    "children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("third child*:", gjson.Get(json, "child*.2"))
fmt.Println("first c?ild:", gjson.Get(json, "c?ildren.0"))
  • child*.2: child* matches children, and .2 reads the third element (Jack).
  • c?ildren.0: c?ildren matches children, and .0 reads the first element (Sara).
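GJSON hands this key matching to a small glob matcher (the match.Match call in the parseObject code later in this article). A minimal recursive version of the same idea, where * matches any run of bytes and ? exactly one, can be sketched as follows; wildMatch is a hypothetical name and this is not that package's implementation.

```go
package main

import "fmt"

// wildMatch reports whether str matches pattern, where '*' matches any
// sequence of bytes (including none) and '?' matches exactly one byte.
func wildMatch(str, pattern string) bool {
	for len(pattern) > 0 {
		switch pattern[0] {
		case '*':
			// Collapse consecutive stars, then try every possible split point.
			for len(pattern) > 0 && pattern[0] == '*' {
				pattern = pattern[1:]
			}
			if pattern == "" {
				return true
			}
			for i := 0; i <= len(str); i++ {
				if wildMatch(str[i:], pattern) {
					return true
				}
			}
			return false
		case '?':
			if str == "" {
				return false
			}
		default:
			if str == "" || str[0] != pattern[0] {
				return false
			}
		}
		// '?' or a literal byte matched: advance both strings by one byte.
		str, pattern = str[1:], pattern[1:]
	}
	return str == ""
}

func main() {
	fmt.Println(wildMatch("children", "child*"), wildMatch("children", "c?ildren"), wildMatch("name", "child*"))
	// true true false
}
```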

Besides wildcard matching, GJSON also supports modifiers:

json := `{
    "name":{"first":"Tom", "last": "Anderson"},
    "age": 37,
    "children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("reversed children:", gjson.Get(json, "children|@reverse"))

children|@reverse first reads the children array, then reverses it with the @reverse modifier, so the array comes back as Jack, Alex, Sara.

nestedJSON := `{"nested": ["one", "two", ["three", "four"]]}`
fmt.Println(gjson.Get(nestedJSON, "nested|@flatten"))

@flatten lifts the elements of the nested inner array into the outer array, producing:

["one","two","three", "four"]

There are other interesting features as well; see the official documentation.

Analysis

GJSON's Get takes two arguments: the JSON string, and a path describing which JSON value to match.

Because GJSON has to support many kinds of path expressions, a lookup happens in two steps: the path is parsed first, and then the JSON string is traversed.

During traversal, as soon as a matching value is found it is returned without scanning further. If the path can match multiple values, or matches nothing in the JSON string, the entire JSON string is traversed.
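The early-return behaviour can be illustrated with the standard library's streaming json.Decoder: read the top-level keys one by one and stop as soon as the wanted key's value has been decoded. This is a swapped-in technique for illustration (findTopLevel is a hypothetical helper), not how GJSON scans; values passed over on the way are still decoded into json.RawMessage, and because the Decoder reads its input in buffered chunks, "stop early" refers to tokenizing and decoding rather than I/O.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// findTopLevel returns the raw value of key in a top-level JSON object,
// stopping as soon as the key has been found.
func findTopLevel(doc, key string) (json.RawMessage, bool, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	tok, err := dec.Token()
	if err != nil {
		return nil, false, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, false, fmt.Errorf("top-level value is not an object")
	}
	for dec.More() {
		tok, err := dec.Token() // the object key
		if err != nil {
			return nil, false, err
		}
		k, _ := tok.(string)
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil { // the value is consumed either way
			return nil, false, err
		}
		if k == key {
			return raw, true, nil // early exit: later values are never decoded
		}
	}
	return nil, false, nil
}

func main() {
	raw, ok, err := findTopLevel(`{"name":{"first":"Tom"},"age":37}`, "name")
	fmt.Println(string(raw), ok, err) // {"first":"Tom"} true <nil>
}
```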

Unlike fastjson, GJSON does not store the parsed content in a structure that can be reused. So when you call GetMany to retrieve multiple values, it actually has to traverse the JSON string several times, which is less efficient.


In addition, GJSON does not validate its input: even a string that is not JSON will still be "parsed", so it is up to the caller to make sure the input is valid JSON.

Code

func Get(json, path string) Result {
    // parse the path
    if len(path) > 1 {
        ...
    }
    var i int
    var c = &parseContext{json: json}
    if len(path) >= 2 && path[0] == '.' && path[1] == '.' {
        c.lines = true
        parseArray(c, 0, path[2:])
    } else {
        // scan forward until '{' or '[' is found, then parse that object or array
        for ; i < len(c.json); i++ {
            if c.json[i] == '{' {
                i++
                parseObject(c, i, path)
                break
            }
            if c.json[i] == '[' {
                i++
                parseArray(c, i, path)
                break
            }
        }
    }
    if c.piped {
        res := c.value.Get(c.pipe)
        res.Index = 0
        return res
    }
    fillIndex(json, c)
    return c.value
}

Get starts with a long block of code (elided above) that parses the various path forms, then loops over the JSON until it finds '{' or '[' and hands off to the corresponding parsing logic.

func parseObject(c *parseContext, i int, path string) (int, bool) {
    var pmatch, kesc, vesc, ok, hit bool
    var key, val string
    rp := parseObjectPath(path)
    if !rp.more && rp.piped {
        c.pipe = rp.pipe
        c.piped = true
    }
    // two nested for loops to find the next key
    for i < len(c.json) {
        for ; i < len(c.json); i++ {
            if c.json[i] == '"' {
                i++
                var s = i
                for ; i < len(c.json); i++ {
                    if c.json[i] > '\\' {
                        continue
                    }
                    // end of key found: jump to parse_key_string_done
                    if c.json[i] == '"' {
                        i, key, kesc, ok = i+1, c.json[s:i], false, true
                        goto parse_key_string_done
                    }
                    ...
                }
                key, kesc, ok = c.json[s:], false, false
            // break out of the key-scanning loop
            parse_key_string_done:
                break
            }
            if c.json[i] == '}' {
                return i + 1, false
            }
        }
        if !ok {
            return i, false
        }
        // compare the key with the path component, with or without wildcards
        if rp.wild {
            if kesc {
                pmatch = match.Match(unescape(key), rp.part)
            } else {
                pmatch = match.Match(key, rp.part)
            }
        } else {
            if kesc {
                pmatch = rp.part == unescape(key)
            } else {
                pmatch = rp.part == key
            }
        }
        // parse the value
        hit = pmatch && !rp.more
        for ; i < len(c.json); i++ {
            switch c.json[i] {
            default:
                continue
            case '"':
                i++
                i, val, vesc, ok = parseString(c.json, i)
                if !ok {
                    return i, false
                }
                if hit {
                    if vesc {
                        c.value.Str = unescape(val[1 : len(val)-1])
                    } else {
                        c.value.Str = val[1 : len(val)-1]
                    }
                    c.value.Raw = val
                    c.value.Type = String
                    return i, true
                }
            case '{':
                if pmatch && !hit {
                    i, hit = parseObject(c, i+1, rp.path)
                    if hit {
                        return i, true
                    }
                } else {
                    i, val = parseSquash(c.json, i)
                    if hit {
                        c.value.Raw = val
                        c.value.Type = JSON
                        return i, true
                    }
                }
            ...
            }
            break
        }
    }
    return i, false
}

I am not showing parseObject so that you can learn how to traverse a JSON string; I am showing it as an example of what not to do. Loop nested inside loop, if piled on if: reading it drained my sanity fast. Doesn't this code look familiar? Isn't it a bit like code written by a colleague you once worked with?

Summary

Advantages:

  1. Performance is good compared with the standard library.
  2. Very flexible: it supports many kinds of lookups and customizable return values, which is very convenient.

Disadvantages:

  1. It does not validate the JSON.
  2. The code smell is strong.

Note that when you need several values from the same JSON, GetMany traverses the JSON string again for each requested key; parsing the document into a map once can reduce the number of traversals.
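The parse-once idea can be sketched with the standard library: decode the top level into map[string]json.RawMessage in a single pass, then decode only the fields you actually need. The pickFields helper is hypothetical; nested values stay as raw bytes until requested, so reading several keys does not rescan the document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pickFields scans doc once and returns the raw values of the requested top-level keys.
func pickFields(doc []byte, keys ...string) (map[string]json.RawMessage, error) {
	var all map[string]json.RawMessage
	if err := json.Unmarshal(doc, &all); err != nil { // single pass over doc
		return nil, err
	}
	out := make(map[string]json.RawMessage, len(keys))
	for _, k := range keys {
		if v, ok := all[k]; ok {
			out[k] = v // a map lookup, not another scan of doc
		}
	}
	return out, nil
}

func main() {
	doc := []byte(`{"name":{"first":"Tom"},"age":37,"children":["Sara","Alex"]}`)
	f, err := pickFields(doc, "age", "children")
	if err != nil {
		panic(err)
	}
	var age int
	_ = json.Unmarshal(f["age"], &age)
	fmt.Println(age, string(f["children"])) // 37 ["Sara","Alex"]
}
```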

jsonparser

Github: https://github.com/buger/jsonparser

This is another popular library that claims high performance, saying it can parse up to ten times faster than the standard library.

Analysis

jsonparser likewise takes a JSON byte slice plus a sequence of keys, quickly locates the corresponding value, and returns it.

Like GJSON, it has no structure for caching parsed JSON the way fastjson does. However, when you need several values you can use the EachKey function, which retrieves all of them in a single traversal of the JSON string.

As with GJSON, once a matching value is found it is returned immediately without further traversal; if multiple values are requested, or a path does not match anything in the JSON string, the entire JSON string is traversed.

It also traverses the JSON string with loops rather than recursion wherever possible, which keeps the call stack shallow and can improve performance to some extent.
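Here is a sketch of how a value can be skipped with a single loop and a depth counter instead of recursive calls, returning an end offset so the caller can take a sub-slice of the input without copying. skipValue is a hypothetical helper, not jsonparser's code; it assumes the caller has already skipped leading whitespace, and it does not validate literals.

```go
package main

import (
	"errors"
	"fmt"
)

// skipValue returns the end offset of the JSON value starting at data[start].
// Nesting is tracked with a counter, so deep input does not grow the call stack.
func skipValue(data []byte, start int) (int, error) {
	depth := 0
	inString := false
	for i := start; i < len(data); i++ {
		c := data[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped byte
			case '"':
				inString = false
				if depth == 0 {
					return i + 1, nil // a top-level string just ended
				}
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{', '[':
			depth++
		case '}', ']':
			depth--
			if depth == 0 {
				return i + 1, nil // the container we started in just closed
			}
			if depth < 0 {
				return i, nil // closing bracket of the enclosing container ends a scalar
			}
		case ',', ' ', '\t', '\r', '\n':
			if depth == 0 {
				return i, nil // end of a scalar such as 123 or true
			}
		}
	}
	if depth != 0 || inString {
		return 0, errors.New("unexpected end of input")
	}
	return len(data), nil
}

func main() {
	doc := []byte(`{"skip":{"x":[1,2]},"want":42}`)
	end, err := skipValue(doc, 8) // the value of "skip" starts at offset 8
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc[8:end])) // {"x":[1,2]}
}
```

Because the result is just an offset into the original slice, doc[8:end] shares memory with doc, which is the same reason returning sub-slices keeps allocations low.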

Functionally, ArrayEach, ObjectEach and EachKey all accept a custom callback, so you can implement your own handling logic inside it, which makes the library very practical.

There is not much to analyze in jsonparser's code; it is very clear, and interested readers can browse it themselves.

Summary

The reasons jsonparser outperforms the standard library can be summarized as follows:

  1. It uses for loops instead of recursion wherever possible.
  2. Unlike the standard library, it does not use reflection.
  3. A lookup exits as soon as the matching key is found, so no further traversal or recursion is needed.
  4. It operates directly on the JSON byte slice that was passed in and returns slices of it, so it does not need to allocate new buffers, which reduces memory allocation.

The API design is also very practical: the three callback-based functions ArrayEach, ObjectEach and EachKey can solve many problems in real business development.

The downside is equally obvious: it does not validate JSON, and will process input even if it is not JSON at all.

Performance Comparison

Parsing small JSON strings

Parsing a simple string of about 190 bytes:

Library            Operation          Time (ns/op)   Memory (B/op)   Allocs/op   Rating
standard library   decode into map    724            976             51          slow
standard library   decode into struct 297            256             5           average
fastjson           get                68.2           0               0           fast
fastjson           parse              35.1           0               0           fast
GJSON              to map             255            1009            11          average
GJSON              get                232            448             1           average
jsonparser         get                106            232             3           fast

Parsing a medium-sized JSON string

Parsing a moderately complex string of about 2.3 KB:

Library            Operation          Time (ns/op)   Memory (B/op)   Allocs/op   Rating
standard library   decode into map    4263           10212           208         slow
standard library   decode into struct 4789           9206            259         slow
fastjson           get                285            0               0           fast
fastjson           parse              302            0               0           fast
GJSON              to map             2571           8539            83          average
GJSON              get                1489           448             1           average
jsonparser         get                878            2728            5           fast

Parsing large JSON strings

Parsing a highly complex string of about 2.2 MB:

Library            Operation          Time (ns/op)   Memory (B/op)   Allocs/op   Rating
standard library   decode into map    2292959        5214009         95402       slow
standard library   decode into struct 1165490        2023            76          average
fastjson           get                368056         0               0           fast
fastjson           parse              371397         0               0           fast
GJSON              to map             1901727        4788894         54372       average
GJSON              get                1322167        448             1           average
jsonparser         get                233090         1788865         376         fastest

Summary

While preparing this article I compared quite a few JSON parsing libraries, and found that the high-performance ones share a few common traits:

  • They do not use reflection.
  • They parse the JSON string by iterating over its bytes one by one.
  • They work directly on the incoming JSON data wherever possible, to reduce memory allocation.
  • They sacrifice some compatibility.

Functionally, though, each has its own strengths: fastjson has the simplest API; GJSON offers wildcard lookups and the most customization; jsonparser combines high-performance parsing with the convenience of plugging in callback functions.

Back to the problem that started this article: my business code only needs to extract a few fields from the JSON returned by HTTP requests. The fields are fixed and no search is needed, but sometimes I need custom processing, so jsonparser is the best fit for me.

So if performance matters to you, pick a JSON parser that fits your own business needs.