The security of JSON serialization frameworks has long been a talking point among programmers. Over the past two years in particular, fastjson has been heavily scrutinized, and vulnerabilities have been reported more and more often. A single vulnerability is not a big deal, but the security team keeps emailing you to upgrade the dependency in your online applications, which gets painful. I suspect many people have had enough of this and are considering replacing fastjson with another serialization framework. We recently did exactly that in a project, swapping fastjson for gson, and it caused a production incident. I am sharing the experience so that you do not run into the same problem.

Problem Description

The production logic was very simple: serialize an object to JSON with fastjson and send the resulting string in an HTTP request.

It had been working fine, but after replacing fastjson with gson it triggered an OOM in production.

Analysis of the memory dump showed that a 400 MB+ message was being sent. Because the HTTP tool did not check the payload size before sending, the transmission went ahead anyway, which made the whole online service unavailable.
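With hindsight, a simple guard on the payload size before sending would have turned this into a rejected request instead of an outage. Below is a minimal sketch; the 10 MB threshold and the httpClient.send call are hypothetical placeholders, not our actual HTTP tool.

// Hypothetical guard: refuse to send oversized payloads instead of pushing them onto the wire.
private static final int MAX_PAYLOAD_BYTES = 10 * 1024 * 1024; // assumed 10 MB limit

void sendJson(String json) {
    int size = json.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    if (size > MAX_PAYLOAD_BYTES) {
        // fail fast (or log and alert) rather than transmitting a 400 MB body
        throw new IllegalArgumentException("payload too large: " + size + " bytes");
    }
    httpClient.send(json); // placeholder for whatever HTTP tool is in use
}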

Problem Analysis

Why did the same JSON serialization work fine with fastjson but blow up immediately after switching to gson? Analyzing the memory dump, we found that the values of many fields were duplicated. Combined with the characteristics of our business data, we quickly located the problem: gson has a serious weakness when serializing duplicate references to the same object.

Let us illustrate the problem with a simple example. To simulate the shape of our production data, the same object reference is added to a List<Foo> several times.
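For reference, the snippets below assume Foo and Bar look roughly like the following sketch. The field names are inferred from the printed JSON; implementing Serializable is only needed for the Java and Hessian2 tests further down, and the Foo used in the size test had many more fields than this minimal version.

import java.io.Serializable;
import java.util.List;

class Foo implements Serializable {
    private String a = "aaaaa";          // matches the {"a":"aaaaa"} seen in the output
    public String getA() { return a; }
    public void setA(String a) { this.a = a; }
}

class Bar implements Serializable {
    private List<Foo> foos;
    public List<Foo> getFoos() { return foos; }
    public void setFoos(List<Foo> foos) { this.foos = foos; }
}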

Foo foo = new Foo();
Bar bar = new Bar();
List<Foo> foos = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    foos.add(foo);
}
bar.setFoos(foos);

Gson gson = new Gson();
String gsonStr = gson.toJson(bar);
System.out.println(gsonStr);

String fastjsonStr = JSON.toJSONString(bar);
System.out.println(fastjsonStr);

Observe the printed results:

gson:

{"foos":[{"a":"aaaaa"},{"a":"aaaaa"},{"a":"aaaaa"}]}

fastjson:

{"foos":[{"a":"aaaaa"},{"$ref":"$.foos[0]"},{"$ref":"$.foos[0]"}]}

As you can see, gson serializes every occurrence of the duplicated object in full, while fastjson serializes only the first occurrence and replaces the rest with a $ref reference marker.

These two strategies diverge dramatically when the duplicated object appears many times and each instance is large once serialized, so let us compare them in exactly that scenario.

Compression Ratio Test

  • Serialized object: contains a large number of attributes, to simulate our production business data.
  • Number of repetitions: 200, i.e. the List holds 200 references to the same object, to simulate the complex object structure in production and amplify the difference between strategies.
  • Serialization methods: gson, fastjson, Java, Hessian2. Java and Hessian2 are included as a control group to help us understand how each serialization framework behaves in this particular scenario.
  • The primary metric is the size of the serialized output (characters for the JSON strings, bytes for the binary formats), since that determines the size of the network transmission; the secondary observation is whether the list entries still point to the same object after deserialization.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

import com.alibaba.fastjson.JSON;
import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import com.google.gson.Gson;

public class Main {

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Foo foo = new Foo();
        Bar bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);

        // gson
        Gson gson = new Gson();
        String gsonStr = gson.toJson(bar);
        System.out.println(gsonStr.length());
        Bar gsonBar = gson.fromJson(gsonStr, Bar.class);
        System.out.println(gsonBar.getFoos().get(0) == gsonBar.getFoos().get(1));

        // fastjson
        String fastjsonStr = JSON.toJSONString(bar);
        System.out.println(fastjsonStr.length());
        Bar fastjsonBar = JSON.parseObject(fastjsonStr, Bar.class);
        System.out.println(fastjsonBar.getFoos().get(0) == fastjsonBar.getFoos().get(1));

        // java
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        System.out.println(byteArrayOutputStream.toByteArray().length);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
        System.out.println(javaBar.getFoos().get(0) == javaBar.getFoos().get(1));

        // hessian2
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        System.out.println(hessian2Baos.toByteArray().length);
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
        System.out.println(hessian2Bar.getFoos().get(0) == hessian2Bar.getFoos().get(1));
    }

}

Output results:

gson:
62810
false

fastjson:
4503
true

Java:
1540
true

Hessian2:
686
true

Conclusion Analysis: because a single object is large once serialized, representing the repeats as references reduces the output size dramatically. gson does not apply this optimization, so its output balloons. Even Java serialization, which is not exactly anyone's favorite, does far better, and Hessian2 is more striking still, beating gson by two orders of magnitude. Furthermore, after deserialization gson does not restore the duplicates to the same reference, while the other serialization frameworks do.
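A rough back-of-envelope check, using the numbers above and assuming each {"$ref":"$.foos[0]"} entry costs about 21 characters including the separating comma, shows where the gap comes from:

gson:     200 full copies of Foo          ≈ 62,810 / 200 ≈ 314 characters per copy
fastjson: 1 full copy + 199 $ref entries  ≈ 314 + 199 × 21 ≈ 4,500 characters (measured: 4,503)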

Throughput Testing

In addition to the size of the serialized data, the throughput of each serializer is also worth looking at. It can be measured fairly accurately with a JMH microbenchmark.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

import com.alibaba.fastjson.JSON;
import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import com.google.gson.Gson;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode({Mode.Throughput})
@State(Scope.Benchmark)
public class MicroBenchmark {

    private Bar bar;

    private final Gson gson = new Gson();

    @Setup
    public void prepare() {
        Foo foo = new Foo();
        bar = new Bar();   // assign the field; do not shadow it with a local variable
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);
    }

    @Benchmark
    public void gson() {
        String gsonStr = gson.toJson(bar);
        gson.fromJson(gsonStr, Bar.class);
    }

    @Benchmark
    public void fastjson() {
        String fastjsonStr = JSON.toJSONString(bar);
        JSON.parseObject(fastjsonStr, Bar.class);
    }

    @Benchmark
    public void java() throws Exception {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
    }

    @Benchmark
    public void hessian2() throws Exception {
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();

        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(MicroBenchmark.class.getSimpleName())
            .build();

        new Runner(opt).run();
    }

}

Throughput Report:

Benchmark                 Mode  Cnt        Score         Error  Units
MicroBenchmark.fastjson  thrpt   25  6724809.416 ± 1542197.448  ops/s
MicroBenchmark.gson      thrpt   25  1508825.440 ±  194148.657  ops/s
MicroBenchmark.hessian2  thrpt   25   758643.567 ±  239754.709  ops/s
MicroBenchmark.java      thrpt   25   734624.615 ±   66892.728  ops/s

Isn't it a bit surprising that fastjson leads the pack, with throughput roughly an order of magnitude higher than the binary serializers: millions of operations per second versus hundreds of thousands?

Overall Test Conclusion

  • fastjson output containing $ref markers can also be deserialized correctly by gson, but I did not find any configuration that makes gson serialize duplicates as references
  • fastjson, Hessian2, and Java serialization support circular references; gson does not
  • fastjson can turn off duplicate and circular reference detection with DisableCircularReferenceDetect (see the sketch after this list)
  • With gson, objects that were the same reference before serialization are no longer the same object after a serialize/deserialize round trip, which can inflate the number of objects in memory; fastjson, Java, and Hessian2 do not have this problem because they record reference markers
  • Taking my test case as an example, Hessian2 produces by far the most compact output, which makes it a good fit for serializing large messages for network transmission
  • Also in my test case, fastjson has very high throughput, living up to the "fast" in its name, which makes it a good fit for throughput-sensitive scenarios
  • Choosing a serializer also means considering whether it supports circular references, duplicate-object optimization, enums, collections, arrays, subclasses, polymorphism, inner classes, generics and other complex cases, as well as readability, compatibility when fields are added or removed, and other features. Overall, I recommend Hessian2 and fastjson
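Regarding the DisableCircularReferenceDetect point above, here is a minimal sketch using fastjson's SerializerFeature (from com.alibaba.fastjson.serializer) to turn reference detection off for a single call:

// Serialize without $ref markers: every duplicate is written out in full,
// which matches gson's default output but gives up the size savings.
String plainJson = JSON.toJSONString(bar, SerializerFeature.DisableCircularReferenceDetect);
System.out.println(plainJson);
// Caution: with genuinely circular references (not just duplicates), disabling
// detection makes fastjson recurse until the stack overflows.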

Summary

We all know that fastjson, in order to be fast, uses some fairly hacky logic, which is also why it has so many vulnerabilities. But I think coding is always a trade-off: if a perfect framework existed, its competitors would have disappeared long ago. I have not studied every serialization framework in depth; you may say Jackson is better, and all I can say is that whatever solves the problems in your scenario is the right framework for you.

Finally, when you want to replace a serialization framework, be careful to understand the characteristics of the replacement: problems the original framework solved may not be covered well by the new one.


Reference https://www.cnkirito.moe/serialize-practice/