Saturday, June 15, 2013

Subtyping in Java generics

Generic types introduce a new spectrum of type safety to Java programs. At the same time, generic types can be quite expressive, especially when using wildcards. In this article, I want to explain how subtyping works with Java generics.

General thoughts on generic type subtyping


Different parameterizations of the same generic class or interface do not form a subtype hierarchy that mirrors the subtype hierarchy of their type arguments. This means, for example, that List<Number> is not a supertype of List<Integer>. The following prominent example gives a good intuition for why this kind of subtyping is prohibited:

// assuming that such subtyping was possible (this does not compile)
ArrayList<Number> list = new ArrayList<Integer>();
// the next line would put a Double into a list of Integers, which would
// sooner or later cause a ClassCastException because Double is no subtype of Integer
list.add(Double.valueOf(0.1));
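Because the compiler rejects the assignment above, the failure can only be provoked by forcing the assignment through an unchecked cast. The following is a minimal sketch of that; the class and method names are made up for illustration. Note that the add call itself succeeds at run time, since the element type is not checked there, and that the ClassCastException only surfaces once the element is read back as an Integer.

```java
import java.util.ArrayList;
import java.util.List;

final class InvarianceDemo {

    // Hypothetical helper: smuggles a List<Integer> into a List<Number> variable.
    @SuppressWarnings("unchecked")
    static List<Number> pretendListOfNumber(List<Integer> ints) {
        return (List<Number>) (List<?>) ints;
    }

    // Returns true if reading the smuggled Double as an Integer fails.
    static boolean readFailsAfterSmugglingDouble() {
        List<Integer> ints = new ArrayList<>();
        List<Number> numbers = pretendListOfNumber(ints);
        numbers.add(Double.valueOf(0.1)); // no element type check happens here at run time
        try {
            Integer first = ints.get(0);  // the compiler-inserted cast to Integer fails
            return first == null;         // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsAfterSmugglingDouble());
    }
}
```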

Before discussing this in further detail, let us first think a little bit about types in general: types introduce redundancy to your program. When you define a variable to be of type Number, you make sure that this variable only references objects that know how to handle any method defined by Number, such as Number.doubleValue. By doing so, you make sure that you can safely call doubleValue on any object that is currently represented by your variable, and you no longer need to keep track of the actual type of the variable's referenced object. (As long as the reference is not null. The null reference is actually one of the few exceptions to Java's strict type safety. Of course, the null "object" does not know how to handle any method call.) If, however, you tried to assign an object of type String to this Number-typed variable, the Java compiler would recognize that this object does in fact not understand the methods required by Number and would report an error because it could otherwise not guarantee that a possible future call to, for example, doubleValue would be understood. However, if we lacked types in Java, the program would not change its functionality just by that. As long as we never made an erroneous method call, a Java program without types would be equivalent. Viewed in this light, types merely prevent us developers from doing something stupid while taking away a little bit of our freedom. Additionally, types are a nice way of implicitly documenting your program. (Other programming languages such as Smalltalk do not know static types, and besides being annoying most of the time, this can also have its benefits.)

With this, let's return to generics. By defining generic types, you allow users of your generic class or interface to add some type safety to their code because they can restrict themselves to only using your class or interface in a certain way. When you, for example, restrict a List to only contain Numbers by declaring it as List<Number>, you advise the Java compiler to report an error whenever you, for example, try to add a String-typed object to this list. Before Java generics, you simply had to trust that the list only contained Numbers. This could be especially painful when you handed references to your collections to methods defined in third-party code or received collections from such code. With generics, you can ensure at compile time that all elements in your List are of a certain supertype.

At the same time, by using generics you lose some type safety within your generic class or interface. When you, for example, implement a generic List

class MyList<T> extends ArrayList<T> { }

you do not know the type of T within MyList, and you have to expect that the type could be as unsophisticated as Object. This is why you can constrain your generic type to require some minimum type:

class MyList<T extends Number> extends ArrayList<T> {
  double sum() {
    double sum = .0d;
    for (Number val : this) {
      sum += val.doubleValue();
    }
    return sum;
  }
}

This allows you to assume that any object in MyList is a subtype of Number. That way, you gain some type safety within your generic class.
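As a small usage sketch, the bounded MyList can be filled with Integers and summed, while a MyList<String> is rejected by the compiler. The wrapper class BoundedListDemo and the helper sumOfOneTwoThree are assumptions made only for this illustration.

```java
import java.util.ArrayList;

final class BoundedListDemo {

    // Same shape as the bounded MyList: T is guaranteed to be a Number.
    static class MyList<T extends Number> extends ArrayList<T> {
        double sum() {
            double sum = 0.0;
            for (Number val : this) {
                sum += val.doubleValue();
            }
            return sum;
        }
    }

    // Hypothetical helper that fills a MyList<Integer> and sums it up.
    static double sumOfOneTwoThree() {
        MyList<Integer> ints = new MyList<>();
        ints.add(1);
        ints.add(2);
        ints.add(3);
        return ints.sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfOneTwoThree()); // prints 6.0
        // MyList<String> strings; // does not compile: String is no subtype of Number
    }
}
```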

Wildcards


Wildcards are the Java equivalent to saying whatever type. Consequently, you are not allowed to use wildcards when instantiating a type, i.e. when defining what concrete type some instance of a generic class should represent. A type instantiation occurs, for example, when creating an object via new ArrayList<Number>(), where you, among other things, implicitly call the type constructor of ArrayList which is contained in its class definition

class ArrayList<T> implements List<T> { ... }

with ArrayList<T> being a trivial type constructor with a single argument. Thus, you are allowed to use a wildcard neither within the definition of ArrayList's type constructor (ArrayList<T>) nor in a call of this constructor (new ArrayList<Number>()). When you are only referring to a type without instantiating a new object, however, you can use wildcards, for example in local variables. Therefore, the following definition is allowed:

ArrayList<?> list;

By defining this variable, you are creating a placeholder for an ArrayList of any generic type. Because so little is known about the generic type, however, you cannot add objects to the list through this variable. The assumption about the generic type represented by the variable list is so general that it would not be safe to add an object of, for example, type String, because the list behind list could require objects of some entirely different type. In general, this required type is unknown, and there exists no object which is a subtype of every type and could thus be added safely. (The exception is the null reference, which bypasses type checking. However, you should never add null to collections.) At the same time, all objects you get out of the list will be of type Object because this is the only safe assumption about a common supertype of all possible lists represented by this variable. For this reason, you can form more elaborate wildcards using the extends and super keywords:

ArrayList<? extends Number> list1 = new ArrayList<Integer>();
ArrayList<? super Number> list2 = new ArrayList<Object>();

When a wildcard defines an upper bound via extends, as for list1, the compiler guarantees that any object you get out of this list is some subtype of Number, such as Integer. Similarly, when a wildcard defines a lower bound via super, as for list2, the list's element type is guaranteed to be Number or a supertype of Number, such as Object. Thus you can safely add instances of Number or any of its subtypes to this list.
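The two kinds of bounded wildcards can be tried out with a short sketch. The names WildcardDemo, sumAll and addSamples are invented for this illustration; the commented-out line marks a call that the compiler would reject.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WildcardDemo {

    // Reading: every element is at least a Number, whatever the actual type argument is.
    static double sumAll(List<? extends Number> source) {
        double sum = 0.0;
        for (Number n : source) {
            sum += n.doubleValue();
        }
        // source.add(1); // does not compile: the actual element type is unknown
        return sum;
    }

    // Writing: the list accepts Number or one of its supertypes, so any Number may be added.
    static void addSamples(List<? super Number> sink) {
        sink.add(1);   // an Integer
        sink.add(2.5); // a Double
        Object first = sink.get(0); // reading only yields Object
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        System.out.println(sumAll(ints)); // prints 6.0

        List<Object> objects = new ArrayList<>();
        addSamples(objects);
        System.out.println(objects.size()); // prints 2
    }
}
```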

Finally, you should note that you can actually use wildcards within type constructors if the used type arguments are themselves generic. The following use of a type constructor is, for example, perfectly legal:

ArrayList<?> list = new ArrayList<List<?>>();

In this example, the requirement that the ArrayList must not be constructed by using a wildcard type is fulfilled because the wildcard is applied to the type argument and not to the constructed type itself.

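As for subtyping of generic classes, we can summarize that a generic type is a subtype of another generic type if its raw type is a subtype of the other raw type and if each of its type arguments is either identical to the corresponding type argument of the other type or falls within the bounds of that argument's wildcard. Because of this we can define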

List<? extends Number> list = new ArrayList<Integer>();

because the raw type ArrayList is a subtype of List and because the type argument Integer falls within the bounds of the wildcard ? extends Number.

Finally, be aware that the wildcard type List<?> is a shortcut for List<? extends Object> since this is a commonly used type definition. If the generic type constructor enforces a stricter upper bound, however, as for example in

class GenericClass<T extends Number> { }

a variable of type GenericClass<?> would instead be a shortcut for GenericClass<? extends Number>.
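This implicit bound can be observed in a minimal sketch. The field, the constructor and the get method are assumptions added to GenericClass only so that there is something to read; the point is that the value read through GenericClass<?> can be assigned to a Number without a cast.

```java
final class ImplicitBoundDemo {

    // GenericClass with a declared bound, extended by a hypothetical value holder.
    static class GenericClass<T extends Number> {
        private final T value;

        GenericClass(T value) {
            this.value = value;
        }

        T get() {
            return value;
        }
    }

    // The unbounded wildcard still inherits the declared bound of T.
    static double read(GenericClass<?> holder) {
        Number value = holder.get(); // compiles without a cast
        return value.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(read(new GenericClass<Integer>(42))); // prints 42.0
    }
}
```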

The get-and-put principle


This observation leads us to the get-and-put principle. This principle is best explained by another famous example:

class CopyClass {
  <T> void copy(List<T> from, List<T> to) {
    for(T item : from) to.add(item);
  }
}

This method definition is not very flexible. If you had some List<Integer>, you could not copy its contents to a List<Number> or even a List<Object>. Therefore, the get-and-put principle states that you should always use upper-bounded wildcards (? extends) when you only read objects from a generic instance (via return values) and always use lower-bounded wildcards (? super) when you only provide arguments to a generic instance's methods. Therefore, a better implementation of CopyClass would look like this:

class CopyClass {
  <T> void copy(List<? extends T> from, List<? super T> to) {
    for(T item : from) to.add(item);
  }
}

Since you are only reading from one list and only writing to the other list, both wildcards are safe. Unfortunately, this is something that is easily forgotten, and you can even find classes in the Java core API that do not apply the get-and-put principle. (Note that the above method also describes a generic type constructor.)
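To see the gain in flexibility, the wildcard version of copy can be called with lists of different element types. This is only a sketch: CopyDemo and copyIntegersToObjects are names invented here, and copy is made static for brevity. For comparison, java.util.Collections.copy declares its parameters in the same spirit as List<? super T> dest and List<? extends T> src.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CopyDemo {

    // Reads from 'from' (? extends T) and writes to 'to' (? super T).
    static <T> void copy(List<? extends T> from, List<? super T> to) {
        for (T item : from) {
            to.add(item);
        }
    }

    // Hypothetical helper: copies Integers into a list of Objects.
    static List<Object> copyIntegersToObjects() {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Object> objects = new ArrayList<>();
        copy(ints, objects); // would not compile with copy(List<T>, List<T>)
        return objects;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<>();
        copy(ints, numbers);
        System.out.println(numbers);                 // prints [1, 2, 3]
        System.out.println(copyIntegersToObjects()); // prints [1, 2, 3]
    }
}
```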

Note that the types List<? extends T> and List<? super T> are both less specific than the requirement of List<T>. Also note that this kind of subtyping is already implicit for non-generic types. If you define a method with a parameter of type Number, you can automatically receive instances of any subtype, for example Integer. It is always type safe to read this Integer object even though you only expected the supertype Number. And since it is impossible to write back through this reference, i.e. you cannot replace the caller's Integer object with, for example, an instance of Double, the Java language does not require you to waive your writing rights by declaring a method signature like void someMethod(<? extends Number> number). Similarly, when the caller of a method only requires a Number-typed result, you can still return (write) an instance of any subtype, such as Integer, from your method. And because you cannot read a value from a hypothetical return variable, you do not have to waive these hypothetical reading rights by a wildcard when declaring a return type in your method signature.

Friday, June 7, 2013

Advanced Java generics: retrieving generic type arguments

After their introduction in JDK 5, Java generics quickly became an integral element of many Java programs. However, as easy as Java generics seem at first glance, a programmer can quickly get lost with this feature.

Most Java programmers are aware of the Java compiler's type erasure. Generally speaking, type erasure means that all generic type information about a Java class is lost during the compilation of its source code. This is a concession to Java's backwards compatibility: all generic variations of a Java class share a single representation within a running Java application. If an instance of ArrayList<String> had to remember that its generic type was String, it would have to store this information somewhere within its functional description in order to indicate that, for example, List.get actually returns a String. (By functional description I refer to properties which are shared among all instances of a class. This includes, for example, method or field definitions. In contrast to its functional description, an instance's state, which is individual to each instance, is stored in its object representation.) The functional description of the ArrayList<String> instance is thus represented by its class ArrayList.class. Since the ArrayList.class instance is shared with other instances which could also be of type ArrayList<Integer>, however, this would already require two different versions of ArrayList.class. Such modifications of the class representation would be incomprehensible to older JREs and would thus break the backwards compatibility of Java applications. As a consequence, the following comparison will always succeed:

assert new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();

Since such a comparison is conducted at run time, where the generic type of a class was already erased, this comparison translates to ArrayList.class == ArrayList.class, which is trivially true. Or more specifically, the running application will determine that ArrayList.class is equal to itself and return true, despite String.class != Integer.class. This is a major difference between Java and other programming languages such as C++ and also the reason for a common complaint about Java. (Academically speaking, C++ does not actually know generic types. Instead, C++ offers templates, which are however similar to generics.)

So far, this is nothing new to many developers. However, contrary to popular belief, it is sometimes possible to retrieve generic type information even at run time. Before explaining when this is possible, let us look at an example. For this we define the following two classes:

class MyGenericClass<T> { }
class MyStringSubClass extends MyGenericClass<String> { }

MyGenericClass has a single generic type parameter T. MyStringSubClass extends this generic class and assigns String as the type argument for T. As a result, the Java compiler is able to store the information about the generic argument's type String of the superclass MyGenericClass in the byte code of its subclass MyStringSubClass. This modification can be achieved without breaking backwards compatibility because this information is simply stored in a region of the compiled class's byte code which is ignored by old JRE versions. At the same time, all instances of MyStringSubClass can still share a single class representation, since T = String is set for all instances of MyStringSubClass.

But how can we get hold of this information stored in the byte code? The Java API provides the Class.getGenericSuperclass method, which returns an instance of type Type. If the direct superclass is in fact generic, the returned instance is additionally of type ParameterizedType and can be cast to it. (Type is nothing but a marker interface. The actual instance will be an instance of the internal ParameterizedTypeImpl class; you should however always cast to the interface.) Thanks to the cast to the ParameterizedType interface, you can now call the method ParameterizedType.getActualTypeArguments to retrieve an array whose elements are again of type Type. Any generic type argument of the generic superclass will be contained in this array at the same index as in the type definition. Any Type instance which represents a non-generic class is simply an instance of java.lang.Class. (Assuming you are not handling an array, where the returned type is a GenericArrayType. I will skip this scenario in this article for the sake of simplicity.)
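Before wrapping these calls into a reusable function, it may help to see them in isolation. The sketch below only looks at the direct superclass; the wrapper class GenericSuperclassDemo and the method firstTypeArgument are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

final class GenericSuperclassDemo {

    static class MyGenericClass<T> { }
    static class MyStringSubClass extends MyGenericClass<String> { }

    // Returns the first type argument that subClass passes to its direct superclass.
    static Type firstTypeArgument(Class<?> subClass) {
        Type superType = subClass.getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalArgumentException(subClass + " has no parameterized superclass");
        }
        return ((ParameterizedType) superType).getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(firstTypeArgument(MyStringSubClass.class)); // prints class java.lang.String
    }
}
```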

Now we can make use of this knowledge to write a utility function:

public static Class<?> findSuperClassParameterType(Object instance, Class<?> classOfInterest, int parameterIndex) {
  Class<?> subClass = instance.getClass();
  // walk up the hierarchy until classOfInterest is the direct super class of subClass
  while (classOfInterest != subClass.getSuperclass()) {
    subClass = subClass.getSuperclass();
    // instance.getClass() is no subclass of classOfInterest or instance is a direct instance of classOfInterest
    if (subClass == null) throw new IllegalArgumentException();
  }
  // ParameterizedType is defined in the java.lang.reflect package
  ParameterizedType parameterizedType = (ParameterizedType) subClass.getGenericSuperclass();
  return (Class<?>) parameterizedType.getActualTypeArguments()[parameterIndex];
}

This function will browse through the class hierarchy of instance until it recognizes classOfInterest to be the direct super class of the currently examined class. When this is the case, this super class will be retrieved by using the Class.getGenericSuperclass method. As described above, this method returns a class's super class in a wrapped representation (ParameterizedType) which includes the generic type arguments that are declared by the subclass. This allows us to successfully run the following application:

Class<?> genericType = findSuperClassParameterType(new MyStringSubClass(), MyGenericClass.class, 0);
assert genericType == String.class;

Be however aware that

findSuperClassParameterType(new MyGenericClass<String>(), MyGenericClass.class, 0)

will throw an exception in this implementation. As stated before, the generic information can only be retrieved with the help of a subclass. MyGenericClass<String> is however not a subclass of MyGenericClass but a direct instance with a generic argument. Without an explicit subclass, there is no <something>.class representation to store the String argument. Therefore, this time, the generic type was irretrievably erased during compilation. For this reason, it is good practice to define MyGenericClass to be abstract if you are planning on performing such queries on a class.
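One way to see why an abstract class helps: an abstract MyGenericClass cannot be instantiated directly, so callers have to create at least an anonymous subclass, and that subclass records the type argument. The sketch below assumes a hypothetical typeArgument method that is not part of the article's classes, and it only handles direct subclasses whose type argument is a plain class.

```java
import java.lang.reflect.ParameterizedType;

final class AnonymousSubclassDemo {

    abstract static class MyGenericClass<T> {
        // Hypothetical helper: reads the argument recorded by the direct subclass.
        Class<?> typeArgument() {
            ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
            return (Class<?>) superType.getActualTypeArguments()[0];
        }
    }

    static Class<?> stringArgument() {
        // The trailing braces create an anonymous subclass of MyGenericClass<String>.
        return new MyGenericClass<String>() { }.typeArgument();
    }

    public static void main(String[] args) {
        System.out.println(stringArgument()); // prints class java.lang.String
    }
}
```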

However, we have not yet solved the problem, since there are several pitfalls we ignored so far. To show why, think of the following class hierarchy:

class MyGenericClass<T> { }
class MyGenericSubClass<U> extends MyGenericClass<U> { }
class MyStringSubSubClass extends MyGenericSubClass<String> { }

If we now call

findSuperClassParameterType(new MyStringSubSubClass(), MyGenericClass.class, 0);

an exception will be thrown. But why is this so? So far, we assumed that the type parameter T for MyGenericClass was stored in a direct subclass. In our first example, this was MyStringSubClass which mapped the generic parameter T = String. In contrast, MyStringSubSubClass now stores the mapping U = String, while MyGenericSubClass only knows that T = U. U is however not an actual class but a type variable, represented by the Java type TypeVariable. If we want to resolve this hierarchy, we have to resolve all of these dependencies. This can be achieved by adjusting our example code:

public static Class<?> findSubClassParameterType(Object instance, Class<?> classOfInterest, int parameterIndex) {
  Map<Type, Type> typeMap = new HashMap<Type, Type>();
  Class<?> instanceClass = instance.getClass();
  while (classOfInterest != instanceClass.getSuperclass()) {
    extractTypeArguments(typeMap, instanceClass);
    instanceClass = instanceClass.getSuperclass();
    if (instanceClass == null) throw new IllegalArgumentException();
  }

  ParameterizedType parameterizedType = (ParameterizedType) instanceClass.getGenericSuperclass();
  Type actualType = parameterizedType.getActualTypeArguments()[parameterIndex];
  if (typeMap.containsKey(actualType)) {
    actualType = typeMap.get(actualType);
  }
  if (actualType instanceof Class) {
    return (Class<?>) actualType;
  } else {
    throw new IllegalArgumentException();
  }
}

private static void extractTypeArguments(Map<Type, Type> typeMap, Class<?> clazz) {
  Type genericSuperclass = clazz.getGenericSuperclass();
  if (!(genericSuperclass instanceof ParameterizedType)) {
    return;
  }

  ParameterizedType parameterizedType = (ParameterizedType) genericSuperclass;
  Type[] typeParameter = ((Class<?>) parameterizedType.getRawType()).getTypeParameters();
  Type[] actualTypeArgument = parameterizedType.getActualTypeArguments();
  for (int i = 0; i < typeParameter.length; i++) {
    if(typeMap.containsKey(actualTypeArgument[i])) {
      actualTypeArgument[i] = typeMap.get(actualTypeArgument[i]);
    }
    typeMap.put(typeParameter[i], actualTypeArgument[i]);
  }
}

The above code will resolve any chained generic type definitions by tracking them in a map. Please note that it is not enough to follow a type argument through the hierarchy by its index alone, since class MyClass<A,B> extends MyOtherClass<B,A> defines a perfectly legal subtype.
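The swapped-parameter case can be checked with a compact, self-contained variant of the map-based idea. This is a sketch rather than a replacement for the functions above: the resolve method and the Concrete class are made up for the illustration, and only class hierarchies are considered, not interfaces or nested classes.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;

final class SwappedParametersDemo {

    static class MyOtherClass<X, Y> { }
    static class MyClass<A, B> extends MyOtherClass<B, A> { }
    static class Concrete extends MyClass<String, Integer> { }

    // Walks up from 'start' and tracks which type each type variable is bound to.
    static Class<?> resolve(Class<?> start, Class<?> target, int index) {
        Map<TypeVariable<?>, Type> bindings = new HashMap<>();
        for (Class<?> current = start; current != null; current = current.getSuperclass()) {
            Type superType = current.getGenericSuperclass();
            if (!(superType instanceof ParameterizedType)) {
                continue; // non-generic or raw super class: nothing to record
            }
            ParameterizedType parameterized = (ParameterizedType) superType;
            Class<?> raw = (Class<?>) parameterized.getRawType();
            TypeVariable<?>[] parameters = raw.getTypeParameters();
            Type[] arguments = parameterized.getActualTypeArguments();
            for (int i = 0; i < parameters.length; i++) {
                Type argument = arguments[i];
                if (bindings.containsKey(argument)) {
                    argument = bindings.get(argument); // replace a variable by its known binding
                }
                bindings.put(parameters[i], argument);
            }
            if (raw == target) {
                Type resolved = bindings.get(parameters[index]);
                if (resolved instanceof Class) {
                    return (Class<?>) resolved;
                }
                break; // the argument is still an unresolved variable
            }
        }
        throw new IllegalArgumentException("cannot resolve type argument " + index + " of " + target);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Concrete.class, MyOtherClass.class, 0)); // prints class java.lang.Integer
        System.out.println(resolve(Concrete.class, MyOtherClass.class, 1)); // prints class java.lang.String
    }
}
```

Resolving by index alone would wrongly report String for the first parameter of MyOtherClass, because MyClass passes its own parameters on in reverse order.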

However, we are still not done. Again, we will look at an example first:

class MyGenericOuterClass<U> {
  public class MyGenericInnerClass<U> { }
}
class MyStringOuterSubClass extends MyGenericOuterClass<String> { }

MyStringOuterSubClass.MyGenericInnerClass inner = new MyStringOuterSubClass().new MyGenericInnerClass();

This time a reflection on the inner class by calling

findSuperClassParameterType(inner, MyGenericInnerClass.class, 0);

will fail. At first glance, this might seem logical. We are looking for the generic argument type of MyGenericInnerClass on an instance of the same class. As we described above, this is usually not possible since no generic type information can be stored in MyGenericInnerClass.class. Here however, we examine an instance of a (non-static) inner class of a generic class's subtype. MyStringOuterSubClass knows that U = String. We have to take this into account when reflecting on the parameter type of MyGenericInnerClass.

Now here is where things get really tricky. In order to find generic declarations in outer classes, we first have to get hold of this outer class. This can be achieved by reflection and the fact that the Java compiler adds a synthetic (this means without source code representation) field this$0 to any inner class. This field can be retrieved by calling Class.getDeclaredField("this$0"). By obtaining the instance of the outer class in which the current inner class is contained, we automatically gain access to its Java class. Now we could just proceed as above, scan the enclosing class for generic definitions and add them to our map. However, the type variable representation of U in MyGenericOuterClass will not equal the representation of U in MyGenericInnerClass. For all we know, MyGenericInnerClass could be static and define its own generic variable namespace. Therefore, any TypeVariable instance, which represents a generic variable in the Java API, is equipped with a genericDeclaration property. If two generic variables were defined in different classes, their TypeVariable representations are not equal by definition, even if they share a name in the same namespace because one class is a non-static inner class of the other.
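That two type variables with the same name are distinct as soon as they are declared by different classes can be verified with a few reflective calls. The classes Outer and Inner and the accessor methods below are stand-ins invented for this sketch.

```java
import java.lang.reflect.TypeVariable;

final class TypeVariableIdentityDemo {

    static class Outer<U> {
        class Inner<U> { } // this U is a separate declaration that shares the name
    }

    static TypeVariable<?> outerU() {
        return Outer.class.getTypeParameters()[0];
    }

    static TypeVariable<?> innerU() {
        return Outer.Inner.class.getTypeParameters()[0];
    }

    public static void main(String[] args) {
        System.out.println(outerU().getName().equals(innerU().getName())); // prints true
        System.out.println(outerU().equals(innerU()));                     // prints false
        System.out.println(outerU().getGenericDeclaration() == Outer.class);       // prints true
        System.out.println(innerU().getGenericDeclaration() == Outer.Inner.class); // prints true
    }
}
```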

Therefore we have to do the following:
  1. First, try to find the generic type in the inner class's super class hierarchy, just as you would do with a non-nested class.
  2. If you cannot resolve the type: for the (non-static) inner class and all of its outer classes, resolve the type variables as completely as possible. This can be achieved by the same extractTypeArguments algorithm and is basically step 1 for each nested class. We can get hold of the outer classes by checking if the this$0 field is defined for an inner class.
  3. Check if one of the outer classes contains a definition for a generic variable with an identical variable name. If this is the case, you have found the actual type of the generic variable you were looking for.
In code, this looks like this:

public static Class<?> findSubClassParameterType(Object instance, Class<?> classOfInterest, int parameterIndex) {
  Map<Type, Type> typeMap = new HashMap<Type, Type>();
  Class<?> instanceClass = instance.getClass();
  while (classOfInterest != instanceClass.getSuperclass()) {
    extractTypeArguments(typeMap, instanceClass);
    instanceClass = instanceClass.getSuperclass();
    if (instanceClass == null) throw new IllegalArgumentException();
  }

  ParameterizedType parameterizedType = (ParameterizedType) instanceClass.getGenericSuperclass();
  Type actualType = parameterizedType.getActualTypeArguments()[parameterIndex];
  if (typeMap.containsKey(actualType)) {
    actualType = typeMap.get(actualType);
  }

  if (actualType instanceof Class) {
    return (Class<?>) actualType;
  } else if (actualType instanceof TypeVariable) {
    return browseNestedTypes(instance, (TypeVariable<?>) actualType);
  } else {
    throw new IllegalArgumentException();
  }
}

private static Class<?> browseNestedTypes(Object instance, TypeVariable<?> actualType) {
  Class<?> instanceClass = instance.getClass();
  List<Class<?>> nestedOuterTypes = new LinkedList<Class<?>>();
  for (
    Class<?> enclosingClass = instanceClass.getEnclosingClass();
    enclosingClass != null;
    enclosingClass = enclosingClass.getEnclosingClass()) {
    try {
      Field this$0 = instanceClass.getDeclaredField("this$0");
      this$0.setAccessible(true); // the synthetic field is not public
      Object outerInstance = this$0.get(instance);
      Class<?> outerClass = outerInstance.getClass();
      nestedOuterTypes.add(outerClass);
      Map<Type, Type> outerTypeMap = new HashMap<Type, Type>();
      extractTypeArguments(outerTypeMap, outerClass);
      for (Map.Entry<Type, Type> entry : outerTypeMap.entrySet()) {
        if (!(entry.getKey() instanceof TypeVariable)) {
          continue;
        }
        TypeVariable<?> foundType = (TypeVariable<?>) entry.getKey();
        if (foundType.getName().equals(actualType.getName())
            && isInnerClass(foundType.getGenericDeclaration(), actualType.getGenericDeclaration())) {
          if (entry.getValue() instanceof Class) {
            return (Class<?>) entry.getValue();
          }
          actualType = (TypeVariable<?>) entry.getValue();
        }
      }
    } catch (NoSuchFieldException e) { /* this should never happen */ } catch (IllegalAccessException e) { /* this might happen */}

  }
  throw new IllegalArgumentException();
}

private static boolean isInnerClass(GenericDeclaration outerDeclaration, GenericDeclaration innerDeclaration) {
  if (!(outerDeclaration instanceof Class) || !(innerDeclaration instanceof Class)) {
    throw new IllegalArgumentException();
  }
  Class<?> outerClass = (Class<?>) outerDeclaration;
  Class<?> innerClass = (Class<?>) innerDeclaration;
  while ((innerClass = innerClass.getEnclosingClass()) != null) {
    if (innerClass == outerClass) {
      return true;
    }
  }
  return false;
}

Wow, this is ugly! But the above code makes findSubClassParameterType work even with nested classes. We could go into even greater detail, since we can also find types of generic interfaces, generic methods, fields or arrays. The idea of all such extractions however remains the same. If a subclass knows the generic arguments of its super class, they can be retrieved via reflection. Otherwise, due to type erasure, the generic arguments will be irretrievably lost at run time.

But in the end, what is this good for? To many developers, this gives the impression of black magic, such that they would rather avoid writing such code. Admittedly, there are in general easier ways to perform such a query. We could have defined MyGenericClass like this instead:

class MyGenericClass<T> {
  private final Class<T> clazz;
  public MyGenericClass(Class<T> clazz) {
    this.clazz = clazz;
  }
  public Class<T> getGenericClass() {
    return clazz;
  }
}

Of course, this works as well and is even less code. However, when you are writing APIs that are to be used by other developers, you often want them to be as slim and easy as possible. (This can range from writing a big framework to writing software in a team of two.) With the above implementation, you force the users of your class to provide redundant information that you could have retrieved differently. Also, this approach does not work equally well for interfaces, where you implicitly require the implementing classes to add corresponding constructors. This matter will become even more relevant when looking towards Java 8 and its functional interfaces, which can be implemented by lambda expressions (also known as closures). If you require your generic interfaces to supply a getGenericClass method besides their functional method, you can no longer implement them with a lambda expression.

PS: I hacked this code together while I was writing this blog article and never really tested it other than by dupa debugging. If you need such functionality, there is an excellent library called gentyref which provides the above analysis and much more.