Explaining invokedynamic. Dynamical hashCode implementation. Part V

alex_ber
15 min readSep 7, 2020

--

This is fifth part of mini-series of Explaining invokedynamic. This is the full list of all articles:

Introduction. Part I

Toy example. Part II

Bootstrap method . Part III

Number multiplication (almost) complete example. Part IV

Dynamical hashCode implementation. Part V

Java 9 String concatenation. Part VI

Lambda. Part VII

Records. Part VII

Final notes. Part IX

On the previous part I’ve completed explanation about bootstrap method with concrete and (almost) complete example.

In this article I will shift my focus to the implementation logic and how you can have generic implementation using MethodHandle API.

Note: All code here is tested on Java 8, but the code should be able to be run in classpath on later version also (with some minor tweaks, that I’m describing).

Dynamical hashCode implementation

Note:

  1. Code provided in this chapter based on backport of ObjectMethods. This is new class in JDK 14 that is used as part of records implementation. I will discuss records below. For this chapter, it is not important. The only important fact for this chapter that records doesn't support inheritance.
  2. As all code provided in this article, this code was tested in JDK 8. Nevertheless, this code should work, but 1 line may be needed to be changed in JDK 9 and later (see comments).

Let’s start of problem definition. Suppose that we have JavaBean. We want to write some generic function that will calculate it’s hashCode based on it’s fields. We also want to ignore completely all transient fields.

Note: I’m usingtransient for marking fields for exclusion from calculating hashCode for demonstration purpose only.

Traditionally, in Java compoundhashCode is calculated as P(31). We’re calculating polynomial with x=31, where all coefficients are hashCode of the fields that participate in compound hashCode. It’s done using Horner’s scheme.

Let’s look on create concrete example:

Student here has hand-crafted hashCode method.

Note:

  1. hashCode method ignores transient fields sumGrades and numGrades.
  2. This is mutable JavaBean. As such it shouldn’t be replaced with records even in JDK 14.
  3. updateAverageGrade() is somewhat non-standard. Because in double variable lost of precision can occur, I’m actually storing the value with 2 addition transient fields. 200.0/3 can be stored exactly in double, so it will be stored as int 200 and int 3.

Our goal is to write general-purpose hashCode function that will behave exactly as this hand-crafted one.

Code in the hashCode function above calculates the following expression:

(((0*31+h_firstName)*31+h_last_name)*31+h_age)*31+h_averageGrade

This is equivalent to

0*31^4+h_firstName*31^3+h_last_name*31^2+h_age*31+h_averageGrade

Effectively, it is polynomial of degree 3 of P(31) where first coefficient is hashCode of firstName, second coefficient is hashCode of lastName, third coefficient is hashCode of age, and finally is hashCode of avegageGrade.

Note:

  1. As you can see the first coefficient is zero, that can be dropped. I didn’t drop it in order that my code will be inline with automatically generated code.
  2. Integer.hashCode and Double.hashCode was new function in JDK 8 that where added specifically to simplify hand-crafted implementation of hashCodefunction.
  3. java.lang.Object defines hashCodefunction. So, every instantiated object has it.

It can be shown, that given the following method:

private static int hashCombiner(int x, int y) {
return x*31 + y;
}

Our original formula can be rewritten as:

hashCombiner(hashCombiner(hashCombiner(hashCombiner(0, firstName), lastName), h_age), h_averageGrade)

So, our algorithm will look like this:

1. Collect all non-transient field of the given object.2. Calculate hashCode for every such field. Let’s denote, their value as h_n, h_{n-1},…,h_0.3. accumaltor = 04. for h_i in {h_n, h_{n-1},…,h_0}:
accumaltor = hashCombiner(accumaltor, h_i)
5. return accumaltor

This is the basic ingredient:

Backporting java.lang.runtime.ObjectMethods.makeHashCode from JDK 14

Note:

I will focus on makeHashCodefunction. I will make only 3 remark for makeEquals and makeToString.

Let’s go over the code of makeHashCodefunction and convince ourself that it implements compond hashCode algorithm described above.

On line 17 we have MethodHandle with signature ()i, that means, it receives no parameter and return int, actually 0.

On lines 30–33 we’re fetching ClassLoader. We’re bypassing SecurityManager. In JDK 8 it is sufficient to call Thread.currentThread().getContextClassLoader(); In the original code ClassLoader.getPlatformClassLoader(); is used. This is new function that is not available in JDK 8, it was added in JDK 9 with Java Platform Module System. So, if you’re running on JDK 9 you can switch back to use ClassLoader.getPlatformClassLoader(). For more details, see my article about Java Platform Module System for more details.

On lines 35–36 we’re obtaining MethodHandle to java.util.Objects#hashCode method. We should use MethodHandles.publicLookup() because on JDK9 and laterjava.util.Objects is part of java.base module that is not exported and ObjectMethods is in unnamed module, so MethodHandles.lookup() will fail.

Note: java.util.Objects was new in JDK 7. Quote from javadoc: “These utilities include null-safe or null-tolerant methods for computing the hash code of an object…”

On lines 38–39 we’re obtaining MethodHandle to hashCombiner private method. It can be looked directly only with MethodHandles.lookup(). This method combines accumulated so far hashCode with new one. See above.

On lines 41–56 we’re initializing primitiveHashers is mapping between primitive class to hashCode function for it’s corresponded wrapper class (int.class->Integer::hashCode for example).

On lines 63–64 there is hashCombiner method described above.

On lines 67–71 there is factory method hasher for MethodHandle that receives java.lang.Class and return int. If java.lang.Class is primitive than hashCode function of corresponding wrapped is returned. Otherwise, MethodHandle to java.util.Objects#hashCode with adopted recieving type (from java.lang.Object to concrete type that represents received java.lang.Class).

Now, let’s go over makeHashCode itself.

On line 83MethodHandles.dropArguments(ZERO, 0, receiverClass) drops first dummy argument (it is actually absent, because ZERO is primitive int, so nothing is drops) and then inserts receiverClass as first argument. Effectively, we’re changing signature of ZERO constant from ()int to (R)I, because eventually we’re going to invoke it on some concrete object (such as Student). This as if we have handle to the following function:

private static int accumulator(T object) {
return 0;
}

On lines 85–93 there is loop on MethodHandle getter that represent relevant fields upon which will be calculating compound hashCode function. This corresponds to h_n, h_{n-1},…,h_0 in algorithm above.

On line 87 I’ve added the following:

getter = getter.asType(MethodType.methodType(getter.type().returnType(), receiverClass));    //added

I’m repeating myself, this line is absent in the original code, it was added by me. I want to harmonize receiver class for every getter. If I have inheritance, than the receiver class will class whether the filed is defined in the source code. I want to make it equal. This is needed, because otherwise, MethodHandles.permuteArguments() call will fail. It will fail in MethodHandle construction time, not when one of invoke* methods will be called.

In the original code this line can be skipped, because hashCode is calculated for records, and they don’t support inheritance, that is all fields found will be from the same recevierClass.

This will be clear with explicit example below.

On line 88 getter.type().returnType() returns java.lang.Class that represent field’s type. It is used as parameter for hasher factory method that returns MethodHandle with correct implementation of hashCode function (for the details,, see description of definition of hasher above). The returned MethodHandle has type (T)I where T is type of the field. The name of the handle is hasher. This as if we have handle to the following function:

private static int hasher(T object) {return Objects.hashCode(object);
}

On line 89, we’re defining hashThisFieldas if it was hasher(getter). MethodHandles.filterArguments(hasher, 0, getter) adds preprocessor to the first argument. This is as if, instead of x, we’re passing getter(x). Such preprocessor should be unary function. Effectively, this as if we have handle to the following function:

private static int hashThisField(T object) {return Objects.hashCode(getter(object));
}

Note: objects.getClass()should be receiverClass. This is what line 87 enforces. Otherwise, hashThisField and accumulator will have different receiver class, that will lead to IllegalArgumentException in line 91.

On line 90, we’re defining combineHashes as if it was HASH_COMBINER(accumulator, hashThisField). MethodHandles.filterArguments(HASH_COMBINER, 0, accumulator, hashThisField); adds 2 preprocessor for first and for second arguments. Each of preprocessor should be unary function. Effectively, this as if we have handle to the following function:HASH_COMBINER(accumulator, hashThisField). This as if we have handle to the following function:

private static int combineHashes(T object, T object) {
return accumulator(object)*31 + hashThisField(object);
}

Note: We have the same object twice now in signature of combineHashes MethodHandle.

On line 91 we’re redefining accumulator as MethodHandles.permuteArguments(combineHashes, accumulator.type(), 0, 0); Quote from the JavaDoc:

No argument or return value conversions are applied. The type of each incoming argument, as determined by newType, must be identical to the type of the corresponding outgoing parameter or parameters in the target method handle. The return type of newType must be identical to the return type of the original target.

The reordering array need not specify an actual permutation. An incoming argument will be duplicated if its index appears more than once in the array, and an incoming argument will be dropped if its index does not appear in the array. As in the case of dropArguments, incoming arguments which are not mentioned in the reordering array are may be any type, as determined only by newType.

https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandles.html#permuteArguments-java.lang.invoke.MethodHandle-java.lang.invoke.MethodType-int...-

So, because only 0 is used in the reorder array, the last parameter will be dropped.

Note: combineHashes and accumulator should have exactly the same type. (again, this is why I added line 87).

Effectively, we have combineHasheswith last argument dropped, so we have:

private static int accumulator(T object) {
return accumulator_old_(object)*31 + hashThisField(object);
}

This accumulator_new is used as accumulator in the next iteration.

It will basically reproduce rewritten formula hashCode calculation.

private static int makeHashCode(T object) {
for getter:getters:
accumulator = accumulator(object)*31 +
Objects.hashCode(getter(object));
return accumulator;
}

Note: We’re building calculation. When we’re combining accumalator no actual calculation is performed. When some invoke* method will be called, our calculation will be triggered with passed object.You can look on this as we’re constructing lazy evalution for object.

Note:

  1. I want to explain following line from makeEquals:
MethodHandle isInstance = MethodHandles.dropArguments(CLASS_IS_INSTANCE.bindTo(receiverClass), 0,
receiverClass); // (RO)Z

CLASS_IS_INSTANCE is defined as if we have following method inside java.lang.Class:

public boolean CLASS_IS_INSTANCE(Object obj){
return this.isInstance(obj);
}

From JVM perspective this signature is equivalent to:

public static boolean CLASS_IS_INSTANCE(Class this, Object obj){
return this.isInstance(obj);
}

CLASS_IS_INSTANCE.bindTo(receiverClass) change it as if it where in java.lang.Class:

public boolean CLASS_IS_INSTANCE(Object obj){
return receiverClass.isInstance(obj);
}

The whole statement above change it as if it where in receiverClass:

public boolean isInstance(Object obj){
return receiverClass.isInstance(obj);
}

From JVM perspective this signature is equivalent to:

public static boolean isInstance(Class<?> receiverClass, Object obj){
return receiverClass.isInstance(obj);
}

2. I want to explain another line from makeEquals:

MethodHandles.guardWithTest(isInstance, accumulator.asType(ro), instanceFalse)

This is construct lazy conditional expression (lazy if). When it will be invoked on receiverClass and obj, if the evaluation result will be true, than accumulator addapted to ro type will be evaluated, otherwise, instanceFalse will be evaluated (that will ignore both arguments and just will return false).

3. I want I want to explain following line from makeToString:

MethodHandle formatter = MethodHandles.insertArguments(STRING_FORMAT, 0, formatString)
.asCollector(String[].class, getters.size()); // (R*)String

STRING_FORMAT is defined as :

public static boolean STRING_FORMAT(String format, Object[] args){
return this.format(format, args);
}

MethodHandles.insertArguments(STRING_FORMAT, 0, formatString) change it to:

public static boolean CLASS_IS_INSTANCE(String formatString, Object[] args){
return formatString.format(format, args);
}

and finally asCollector(String[].class, getters.size()) part change it to:

public static boolean formatter(String formatString, R args0, R args1,...R argsn){
return formatString.format(format, new Object[]{args0, args1,...,argsn});
}

So, far we have Student object and in ObjectMethods we have functions with following signatures:

public static MethodHandle makeHashCode(Class<?> receiverClass, List<MethodHandle> getters) ;public static MethodHandle makeEquals(Class<?> receiverClass, List<MethodHandle> getters);public static MethodHandle makeToString(Class<?> receiverClass, List<MethodHandle> getters, List<String> names);

So, first, we have to know how to extract Class<?> receiverClass, List<MethodHandle> getters, List<String> names from arbitrary object.

Note: Extracting receiverClassfrom non-null object is easy:

Class<?> receiverClass = object.getClass();

Well, I don’t know how to construct List<MethodHandle> getters, but I know how to construct List<Field> allFields. Just to remind this should be all non-transient fields. But, Field can be easily converted to MethodHandle by publicLookup.unreflectGetter(field).Moreover, Field also contain name field, so it can be also easy converted to List<String>.

So, we need intermediate layer that knows how to construct List<Field> from object and also knows how to filter all transient fields. Here we go:

adopted from org.springframework.util.ReflectionUtils

I want to give only 2 notes:

  1. Function
private static Field[] getDeclaredFields(Class<?> clazz)

was simplified, I’ve removed local cache. You’re welcome to add it back.

2. Loop

Class<?> targetClass = clazz;
do {
Field[] fields = getDeclaredFields(targetClass);
for (Field field : fields) {
//skiped for bravity
}
targetClass = targetClass.getSuperclass();
}
while (targetClass != null && targetClass != Object.class);

If clazz represents primitivetype or interface (Lambda is also (functional) interface) we will not hit java.lang.Object on traversing inheritance hierarchy up. Then, when the hierarchy will end targetClass.getSuperclass() will return null and we will break the loop on first condition.

Otherwise, all other constructs in Java, including enum,(from JDK 1.5), including record (starting from JDK 14, preview feature) has java.lang.Object as it’s base class. So, when we’re traversing inheritance hierarchy from clazz to Class<java.lang.Object> not included Class<java.lang.Object> (because Object.class.getDeclaredFields() returns empty array. So, we can make some optimization here.

If some class in inheritance hierarchy implements some interface, it can be ignored, because interfaces in Java don’t have fields (even, after changing interfaces in JDK 8).

Note: traversing inheritance hierarchy to discover all declared methods looks far more complicated.

  • java.lang.Object should be also traversed.
  • If clazz is interface than all it’s super interfaces should be traversed (this is done with clazz.getInterfaces() and not with clazz.getSuperclass()).
  • Also, starting from JDK 8 interfaces also has default method that expected also to be found as methods in inheritance hierarchy.
  • You should take into account also bridge-method and synthetic method.

We have Student object and in ObjectMethods we have functions with following signatures:

public static MethodHandle makeHashCode(Class<?> receiverClass, List<MethodHandle> getters) ;

We know how to extract receiverClass

Class<?> receiverClass = object.getClass();

In the ReflectionUtils we have getAllFields function with the following signature:

public static List<Field> getAllFields(Class<?> clazz, FieldFilter ff) ;

where FieldFilter is

@FunctionalInterface
public interface FieldFilter{
public boolean matches(Field field);
}

All these is enough to define:

public static MethodHandle createHashCodeMethodHandle(Object obj);

public static List<String> getGetterNames(Class<?> clazz) is pretty straightforward. I’m using Lamdba to define FieldFilter ff as

f->!Modifier.isTransient(f.getModifiers()

it will be used inside ReflectionUtils.getAllFields to include only non-transient fields.

ReflectionUtils.getGetters looks similar to ReflectionUtils.getGetterNames. It use the same Lamdba to define FieldFilter that is used inside ReflectionUtils.getAllFields to include only non-transient fields. The interesting part is:

field.setAccessible(true);

In JDK 8 this calls works fine. In JDK 9 and later this code works when it runs from the classpath. See my article about Java Platform Module System for more details.

Why do we need to make this call anyway?

In few lines below we’re doing effectively the following call:

MethodHandles.publicLookup().unreflectGetter(field);

This converts Field to MethodHandle.Field make access check every time we’re accessing the associated value (when we’re invoking it’s get method). So we can have Field reference and even we can retrieve metadata (name, type) of say, private field, but when we will invoke get method check whether we can have access to the value is done, and if access check fails IllegalAccessException is thrown.

When we creates MethodHandle this access check is done in creation time. When MethodHandle is constructed we have this access check. We can’t have MethodHandle with unauthorized access. If we do we MethodHandle we can freely use it’s invoke* methods. No additional access check is done when one of invoke* methods are invoked. Access check is done only once in it’s construction.

So, if we have Field with accessibility flag false access to it’s content should be denied. So, in this case unreflectGetter(field) will fail (it constructs and returns MethodHandle).

Using MethodHandles.lookup().unreflectGetter(field); will not make any difference. The MethodHandles.Lookup object will have privileges of ReflectionUtils class. But this class can’t access private fields of the Student class. So, call to unreflectGetter(field) will fail for the same reason.

When we make field.setAccessible(true); it doesn’t meter what MethodHandles.Lookup object is used (again, I’m talking about classpath case); this call make such field effectively public. So, we will succeed to create MethodHandle. Internally, IMPL_LOOKUPinternal object, that has unrestricted access to everything, will be used inside Lookup.unreflect instead you obtained MethodHandles.Lookup to get MethodHandle.

ReflectionUtils.createHashCodeMethodHandle implementation is simple now.

Now, let’s make some shortcut. Let’s test ReflectionUtils.createHashCodeMethodHandle directly.

This code is straightforward, the only line that I want to explain is line 21:

System.out.println((int)hashCodeMh.invokeExact(student));

Why we have casting to int?

Quote from JavaDoc:

Signature polymorphism

The unusual compilation and linkage behavior of invokeExact and plain invoke is referenced by the term signature polymorphism. As defined in the Java Language Specification, a signature polymorphic method is one which can operate with any of a wide range of call signatures and return types.

In source code, a call to a signature polymorphic method will compile, regardless of the requested symbolic type descriptor. As usual, the Java compiler emits an invokevirtual instruction with the given symbolic type descriptor against the named method. The unusual part is that the symbolic type descriptor is derived from the actual argument and return types, not from the method declaration.

When the JVM processes bytecode containing signature polymorphic calls, it will successfully link any such call, regardless of its symbolic type descriptor. (In order to retain type safety, the JVM will guard such calls with suitable dynamic type checks, as described elsewhere.)

Bytecode generators, including the compiler back end, are required to emit untransformed symbolic type descriptors for these methods. Tools which determine symbolic linkage are required to accept such untransformed descriptors, without reporting linkage errors.

https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html

So, without casting to int we will have java.lang.invoke.WrongMethodTypeException: expected (Student)int but found (Student)Object

We can also re-arrange the code a bit:

Here, we’re constructing MethodHandle before we even created Student. Than, we’re populating Student with it’s state, and than we’re invoking our MethodHandle. This shows that our getters are “attached” to the field of the Student-any change to Stutdent's fields is ucorrectly reflected in hashCode calculation.

Now, let’s add some inheritance:

Note:

  • This class is constructed for demonstration purposes only.
  • active=1 has no meaning, other to indicate that we have additional field that participates in hashCode calculation.
  • hashCode function is again hand-crafted. It is designed to reflect our automatic calculation. I’m using getter-method to access field of the base-class, because I’ve defined them private and otherwise, I can’t access them. The correct way is to define them protected. I want to keep them private for demonstration of MethodHandlemechanism.

Let’s use this new class in our demo:

Note, that our hashCode is indeed have changed.

Now, let’s comment out line 87 in ObjectMethods.makeHashCode. This line is not present in the original code, it was added to make receiver class the same to all getters. I want you to demonstrate why this line was added.

We will get the following exception:

First of all, note that it was thrown from line of the main Class. This is line when we’re constructing our MethodHandle, not invoking. As stack trace indicates IllegalArgumentException was thrown from MethodHandles.permuteArguments because parameter types do not match (exactly!) after reorder: (Graduated,Student)int, (Graduated)int.

What is happening here, that MethodHandles.permuteArguments function fails on argument check, Student is not equal (exactly!) to Graduated.

Why do we have Student in the method signature in the first place? Our receiver class is Graduated.(Graduated,Student)int is signature combineHashes handle, it’s second parameter is hashThisField, that have signature MethodHandle(Student)int, that has filter applied with getter that have signature (on the second iteration of the loop) MethodHandle(Student)String.

If we will look on singatures of the getters we will see the following:

[MethodHandle(Graduated)int, MethodHandle(Student)String, MethodHandle(Student)String, MethodHandle(Student)int, MethodHandle(Student)double]

First one represent active field that is defined in the Graduated class. This is fine. But other fileds are defined in the Student class. These MethodHandle will work even on Student class. But this cause problem to us, so line 87 converts all of them to have Graduated as receiver class.

Now, let’s uncomment line 87 and make another experiment.

What if in line 81 we will change the static type of student to be Student? That is to Student student = new Graduated();

Here is the full code snippet for your convenience:

This code compiles just fine. If it was regular code, it should work in runtime exactly as previous one, because usually only runtime type of the objects are matter. Let’s run the code and see what will happen. We’re receiving java.lang.invoke.WrongMethodTypeException.

You should recall, that invokeExact is not regular method, it is polymorphic one. If you will read the javadoc quoted about signature polymorphism again, you will found that effectively it is invoked as it has Student.class as it’s parameter. Why Student? Because this is compile-time type of student, and compile-time type of object is taken for polymorphic methods. But MethodHandle we have construct in line 9 is constucted for Graduated class and not for Student. invokeExact expect that types of the actual parameters and of MethodHandle matches exactly, so it reports java.lang.invoke.WrongMethodTypeException: expected (Graduated)int but found (Student)int

One possible fix, will be use invoke instead of invokeExact. Code snippet for your convenience:

Quote from the JavaDoc of invoke method:

Invokes the method handle, allowing any caller type descriptor, and optionally performing conversions on arguments and return values.

If the call site’s symbolic type descriptor exactly matches this method handle’s type, the call proceeds as if by invokeExact.

Otherwise, the call proceeds as if this method handle were first adjusted by calling asType to adjust this method handle to the required type, and then the call proceeds as if by invokeExact on the adjusted method handle.

There is no guarantee that the asType call is actually made. If the JVM can predict the results of making the call, it may perform adaptations directly on the caller's arguments, and call the target method handle according to its own exact type.

https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html#invoke-java.lang.Object...-

So, this code is roughly equivalent to this one:

Note: Why roughly equivalent and not exactly? See the quoted JavaDoc. JVM is permitted to make some optimizations, but the observed result will be the same.

This code is not equivalent to this one:

Here, you actually get’s wrong result.

In order to understand why, you should recall, that there is MethodHandle invocation and MethodHandle construction (there is also MethodHandle compilation that I’m ignoring). Invocation part lines 23–24 are indeed equivalent, but the construction part are not. In the last code-snippet, look on line 10. We’re inspecting only Student.class. It doesn’t contain addition active field, so when MethodHandle is constructed it will miss this field for the hashCode calculation. This is exactly what we’re observing.

Let’s finish our example. The client code doesn’t emit invokedynamic bytecode. We have ReflectionUtils.createHashCodeMethodHandle that returns MethodHandle. We should define CallSite to be connected to this MethodHandle. In this example, we don’t have any alternative implementation for our hashCode calculating, we’re using the generic one with possibly adapted types. This why, ConstantCallSite is sufficient for us. Nevertheless, it is easy to switch to another CallSite. Let’s look on the code:

In order to use student as polymorphic object (see line 11 Student student = new Graduated();) I’m using invoke variant and not invokeExact.

Note, that using asType in MethodHandler construction time (at line 24) will not prevent failure with calling invoke function in line 23 of ClientMH, because this will fail before we will reach our MethodHandler implementation (in invocation time).

--

--

alex_ber
alex_ber

Written by alex_ber

Senior Software Engineer at Pursway

No responses yet