In February 2020, Wang Yin poked fun at Java's type system, saying:

One of the more advanced interview questions about a programmer’s understanding of the Java type system is this:

public static void f() {
   String[] a = new String[2];
   Object[] b = a;
   a[0] = "hi";
   b[1] = Integer.valueOf(42);
}

Which line in this code is wrong? Why? If some version of Java can run this code without a problem, how can the error be exposed in a more fatal way? Note that the “wrong” line here is wrong in principle.

So what does “wrong” mean here?

TL;DR

If I could only answer this question in one sentence, it would be:

Java arrays do not support generics, which breaks type safety in Java.

Some prerequisites for a type system

A good type system detects errors as early as possible. For example, when you assign a String to an int variable, the compiler reports the error immediately instead of waiting until the program runs.

What’s wrong with Java’s array design

For the sake of simplicity, let’s assume that Java supports generic arrays, written with a notation such as <?>[].

public static void f() {
    String[] a = new String[2]; // 1
    Object[] b = a; // 2
    a[0] = "hi"; // 3
    b[1] = Integer.valueOf(42); // 4
}

In the above code, there is actually a hint of something wrong at the second step. Converting a String[] to an Object[] causes the type details of the array to “escape” from the type system.

Or, in plainer terms: step 4 stuffs an Integer object into a String[], and the compiler should report an error there.
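
In real Java the four steps compile without complaint, and step 4 fails only at run time with an ArrayStoreException. The following is a minimal sketch of that behaviour; the class and method names are made up for illustration and do not come from Wang Yin's post.

```java
// Illustrative sketch: class and method names are assumptions, not from the original post.
class ArrayCovarianceDemo {

    // Repeats the four steps and reports whether step 4 failed at run time.
    static boolean storeFailsAtRuntime() {
        String[] a = new String[2];      // 1
        Object[] b = a;                  // 2: accepted, because Java arrays are covariant
        a[0] = "hi";                     // 3
        try {
            b[1] = Integer.valueOf(42);  // 4: compiles, but the JVM checks every array store
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // The array object still remembers its real element type behind the Object[] reference.
    static Class<?> runtimeElementType() {
        Object[] b = new String[2];
        return b.getClass().getComponentType();
    }

    public static void main(String[] args) {
        System.out.println("step 4 failed at run time: " + storeFailsAtRuntime());
        System.out.println("runtime element type: " + runtimeElementType().getName());
    }
}
```

The static type Object[] has lost the information that the elements must be Strings, so the check that the compiler could not do is deferred to every store at run time.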

If you could start over, how would you design it?

Under a sound type system, Wang Yin’s code would look like this:

// We still assume that Java supports generic arrays
public static void f() {
    // 1. a is an array that stores String or its subclasses
    <String>[] a = new String[2];
    // 2. b is an array whose element type is some supertype of String, for example Object
    <? super String>[] b = a;
    // 3. Write a String into a
    a[0] = "hi";
    // 4. Write an Integer into b
    b[1] = Integer.valueOf(42);
}

The program looks a lot more reasonable this time, and by the rules of Java generics it now fails to compile. Assigning a <String>[] to a <? super String>[] in step 2 is allowed, but the only values that can then be written through b are String or its subtypes. Writing an Integer in step 4 is therefore rejected by the compiler, instead of slipping through unnoticed as it does with Object[].

The problem doesn’t end there

Treating a String array as an array of some supertype is, in essence, about reading: a String can always be read as an Object.

The example above, on the other hand, performs a write. Does that mean generic arrays cannot support reading?

Not at all. The upper and lower bounds of generics exist to express exactly these restrictions, as the following sample code shows.

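public static void f() {
    // a is an array that stores String or its subclasses
    <String>[] a = new String[100];

    // Writing
    // writeonlyA stores some supertype of String (possibly String itself; same below), so String is the lower bound
    <? super String>[] writeonlyA = a;
    // Elements can now be written (the lower bound is satisfied)
    writeonlyA[0] = "Hi";
    // Elements cannot be read as String (no upper bound is known)
    // Compiler error: the type of elem cannot be inferred
    // elem = writeonlyA[0];

    // Reading
    // readonlyA stores some subtype of String; the compiler treats its elements as String
    <? extends String>[] readonlyA = a;
    // Read from readonlyA
    String elem = readonlyA[0];
    // Write to readonlyA
    // Compiler error: whatever type is on the right-hand side, the type contract cannot be guaranteed, because readonlyA has no definite lower bound
    // readonlyA[0] = "Hi";
}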

  • When the lower bound of the type parameter is restricted with <? super T>, writing is not a problem: a value of type T can always be written. Reading, however, becomes restricted, because the element type is only known to be some supertype of T.
  • When the upper bound of the type parameter is restricted with <? extends T>, reading is not a problem: every element can be read as a T. Writing, however, becomes restricted, because the exact element type is unknown.

Is there a simpler formulation?

As far as the type system is concerned, lists and arrays are similar, and it just so happens that Java’s lists support generics, so let’s rewrite the above example with lists.

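import java.util.ArrayList;
import java.util.List;

public static void f() {
    // a is a list that stores String or its subclasses
    List<String> a = new ArrayList<String>();

    // **Type-safe writing**
    // writeonlyA stores some supertype of String
    List<? super String> writeonlyA = a;
    // Elements can now be written (the lower bound is satisfied)
    writeonlyA.add("Hi");
    // Anything read from writeonlyA can only be typed as Object,
    // because we converted a to a "wider" type
    Object x = writeonlyA.get(0);
    // If you want to write an Integer (Wang Yin's example),
    // the following line is a compile error
    // List<? super Integer> c = a;

    // **Type-safe reading**
    // readonlyA stores some subtype of String
    List<? extends String> readonlyA = a;
    // Write a String into a
    a.add("hi");
    // Read from readonlyA; the type system constrains this nicely
    String xx = readonlyA.get(0);
    // Writing is rejected by the compiler, because there is no definite lower bound
    // readonlyA.add("d");
}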

As you can see, the List version above, using the type system together with the upper and lower bounds of generics, rules out the type-unsafe operations at compile time.
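
The two bounds are often combined in one method signature, with the source restricted to reading and the destination restricted to writing. The sketch below is one way this might look; the class name BoundedCopyDemo and the helper copyInto are hypothetical names chosen for this illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: BoundedCopyDemo and copyInto are hypothetical names.
class BoundedCopyDemo {

    // src has an upper bound, so its elements can be read as T.
    // dst has a lower bound, so values of type T can be written into it.
    static <T> void copyInto(List<? extends T> src, List<? super T> dst) {
        for (T item : src) {
            dst.add(item);
        }
    }

    // Copies a List<String> into a List<Object> that already holds an Integer.
    static List<Object> example() {
        List<String> strings = new ArrayList<>();
        strings.add("hi");

        List<Object> objects = new ArrayList<>();
        objects.add(Integer.valueOf(42));

        copyInto(strings, objects);   // T is inferred as String
        return objects;
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Swapping the arguments, as in copyInto(objects, strings), does not compile, because no T can be both a supertype of Object and a subtype of String. This is the compile-time rejection that the array version lacks.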

However, because arrays do not support generics, Java can only treat arrays as covariant. This allows type-unsafe conversions and leaves a “loophole” in the Java type system.