Explaining invokedynamic. Dynamical hashCode implementation. Part V
This is fifth part of mini-series of Explaining invokedynamic. This is the full list of all articles:
Number multiplication (almost) complete example. Part IV
Dynamical hashCode implementation. Part V
Java 9 String concatenation. Part VI
On the previous part I’ve completed explanation about bootstrap method with concrete and (almost) complete example.
In this article I will shift my focus to the implementation logic and how you can have generic implementation using MethodHandle
API.
Note: All code here is tested on Java 8, but the code should be able to be run in classpath on later version also (with some minor tweaks, that I’m describing).
Dynamical hashCode implementation
Note:
- Code provided in this chapter based on backport of ObjectMethods. This is new class in JDK 14 that is used as part of
records
implementation. I will discussrecords
below. For this chapter, it is not important. The only important fact for this chapter thatrecords
doesn't support inheritance. - As all code provided in this article, this code was tested in JDK 8. Nevertheless, this code should work, but 1 line may be needed to be changed in JDK 9 and later (see comments).
Let’s start of problem definition. Suppose that we have JavaBean
. We want to write some generic function that will calculate it’s hashCode based on it’s fields. We also want to ignore completely all transient
fields.
Note: I’m usingtransient
for marking fields for exclusion from calculating hashCode
for demonstration purpose only.
Traditionally, in Java compoundhashCode
is calculated as P(31)
. We’re calculating polynomial with x=31
, where all coefficients are hashCode
of the fields that participate in compound hashCode.
It’s done using Horner’s scheme.
Let’s look on create concrete example:
Student here has hand-crafted hashCode
method.
Note:
hashCode
method ignorestransient
fieldssumGrades
andnumGrades.
- This is mutable
JavaBean
. As such it shouldn’t be replaced with records even in JDK 14. updateAverageGrade()
is somewhat non-standard. Because indouble
variable lost of precision can occur, I’m actually storing the value with 2 addition transient fields. 200.0/3 can be stored exactly indouble
, so it will be stored asint 200
andint 3.
Our goal is to write general-purpose hashCode
function that will behave exactly as this hand-crafted one.
Code in the hashCode
function above calculates the following expression:
(((0*31+h_firstName)*31+h_last_name)*31+h_age)*31+h_averageGrade
This is equivalent to
0*31^4+h_firstName*31^3+h_last_name*31^2+h_age*31+h_averageGrade
Effectively, it is polynomial of degree 3 of P(31)
where first coefficient is hashCode
of firstName, second coefficient is hashCode
of lastName, third coefficient is hashCode
of age, and finally is hashCode
of avegageGrade.
Note:
- As you can see the first coefficient is zero, that can be dropped. I didn’t drop it in order that my code will be inline with automatically generated code.
Integer.hashCode
andDouble.hashCode
was new function in JDK 8 that where added specifically to simplify hand-crafted implementation ofhashCode
function.- java.lang.Object defines
hashCode
function. So, every instantiated object has it.
It can be shown, that given the following method:
private static int hashCombiner(int x, int y) {
return x*31 + y;
}
Our original formula can be rewritten as:
hashCombiner(hashCombiner(hashCombiner(hashCombiner(0, firstName), lastName), h_age), h_averageGrade)
So, our algorithm will look like this:
1. Collect all non-transient field of the given object.2. Calculate hashCode for every such field. Let’s denote, their value as h_n, h_{n-1},…,h_0.3. accumaltor = 04. for h_i in {h_n, h_{n-1},…,h_0}:
accumaltor = hashCombiner(accumaltor, h_i)5. return accumaltor
This is the basic ingredient:
Note:
I will focus on makeHashCode
function. I will make only 3 remark for makeEquals
and makeToString.
Let’s go over the code of makeHashCode
function and convince ourself that it implements compond hashCode
algorithm described above.
On line 17 we have MethodHandle
with signature ()i,
that means, it receives no parameter and return int, actually 0.
On lines 30–33 we’re fetching ClassLoader
. We’re bypassing SecurityManager
. In JDK 8 it is sufficient to call Thread.currentThread().getContextClassLoader();
In the original code ClassLoader.getPlatformClassLoader();
is used. This is new function that is not available in JDK 8, it was added in JDK 9 with Java Platform Module System. So, if you’re running on JDK 9 you can switch back to use ClassLoader.getPlatformClassLoader()
. For more details, see my article about Java Platform Module System for more details.
On lines 35–36 we’re obtaining MethodHandle
to java.util.Objects#hashCode
method. We should use MethodHandles.publicLookup()
because on JDK9 and laterjava.util.Objects
is part of java.base
module that is not exported and ObjectMethods is in unnamed module, so MethodHandles.lookup()
will fail.
Note: java.util.Objects
was new in JDK 7. Quote from javadoc: “These utilities include null
-safe or null
-tolerant methods for computing the hash code of an object…”
On lines 38–39 we’re obtaining MethodHandle
to hashCombiner
private method. It can be looked directly only with MethodHandles.lookup()
. This method combines accumulated so far hashCode
with new one. See above.
On lines 41–56 we’re initializing primitiveHashers
is mapping between primitive class to hashCode
function for it’s corresponded wrapper class (int.class
->Integer::hashCode
for example).
On lines 63–64 there is hashCombiner
method described above.
On lines 67–71 there is factory method hasher
for MethodHandle
that receives java.lang.Class
and return int.
If java.lang.Class
is primitive than hashCode
function of corresponding wrapped is returned. Otherwise, MethodHandle
to java.util.Objects#hashCode
with adopted recieving type (from java.lang.Object to concrete type that represents received java.lang.Class
).
Now, let’s go over makeHashCode
itself.
On line 83MethodHandles.dropArguments(ZERO, 0, receiverClass)
drops first dummy argument (it is actually absent, because ZERO is primitive int,
so nothing is drops) and then inserts
receiverClass as first argument. Effectively, we’re changing signature of ZERO constant from ()int
to (R)I
, because eventually we’re going to invoke it on some concrete object (such as Student
). This as if we have handle to the following function:
private static int accumulator(T object) {
return 0;
}
On lines 85–93 there is loop on MethodHandle
getter that represent relevant fields upon which will be calculating compound hashCode
function. This corresponds to h_n, h_{n-1},…,h_0 in algorithm above.
On line 87 I’ve added the following:
getter = getter.asType(MethodType.methodType(getter.type().returnType(), receiverClass)); //added
I’m repeating myself, this line is absent in the original code, it was added by me. I want to harmonize receiver class for every getter. If I have inheritance, than the receiver class will class whether the filed is defined in the source code. I want to make it equal. This is needed, because otherwise, MethodHandles.permuteArguments()
call will fail. It will fail in MethodHandle
construction time, not when one of invoke*
methods will be called.
In the original code this line can be skipped, because hashCode
is calculated for records
, and they don’t support inheritance, that is all fields found will be from the same recevierClass
.
This will be clear with explicit example below.
On line 88 getter.type().returnType()
returns java.lang.Class
that represent field’s type. It is used as parameter for hasher
factory method that returns MethodHandle
with correct implementation of hashCode
function (for the details,, see description of definition of hasher
above). The returned MethodHandle
has type (T)I
where T is type of the field. The name of the handle is hasher
. This as if we have handle to the following function:
private static int hasher(T object) {return Objects.hashCode(object);
}
On line 89, we’re defining hashThisField
as if it was hasher(getter). MethodHandles.filterArguments(hasher, 0, getter)
adds preprocessor to the first argument. This is as if, instead of x
, we’re passing getter(x)
. Such preprocessor should be unary function. Effectively, this as if we have handle to the following function:
private static int hashThisField(T object) {return Objects.hashCode(getter(object));
}
Note: objects.getClass()
should be receiverClass
. This is what line 87 enforces. Otherwise, hashThisField
and accumulator
will have different receiver class, that will lead to IllegalArgumentException
in line 91.
On line 90, we’re defining combineHashes
as if it was HASH_COMBINER(accumulator, hashThisField). MethodHandles.filterArguments(HASH_COMBINER, 0, accumulator, hashThisField);
adds 2 preprocessor for first and for second arguments. Each of preprocessor should be unary function. Effectively, this as if we have handle to the following function:HASH_COMBINER(accumulator, hashThisField)
. This as if we have handle to the following function:
private static int combineHashes(T object, T object) {
return accumulator(object)*31 + hashThisField(object);
}
Note: We have the same object twice now in signature of combineHashes MethodHandle.
On line 91 we’re redefining accumulator
as MethodHandles.permuteArguments(combineHashes, accumulator.type(), 0, 0);
Quote from the JavaDoc:
No argument or return value conversions are applied. The type of each incoming argument, as determined by
newType
, must be identical to the type of the corresponding outgoing parameter or parameters in the target method handle. The return type ofnewType
must be identical to the return type of the original target.The reordering array need not specify an actual permutation. An incoming argument will be duplicated if its index appears more than once in the array, and an incoming argument will be dropped if its index does not appear in the array. As in the case of
dropArguments
, incoming arguments which are not mentioned in the reordering array are may be any type, as determined only bynewType
.
So, because only 0 is used in the reorder
array, the last parameter will be dropped.
Note: combineHashes
and accumulator
should have exactly the same type. (again, this is why I added line 87).
Effectively, we have combineHashes
with last argument dropped, so we have:
private static int accumulator(T object) {
return accumulator_old_(object)*31 + hashThisField(object);
}
This accumulator_new
is used as accumulator
in the next iteration.
It will basically reproduce rewritten formula hashCode
calculation.
private static int makeHashCode(T object) {
for getter:getters:
accumulator = accumulator(object)*31 +
Objects.hashCode(getter(object));return accumulator;
}
Note: We’re building calculation. When we’re combining accumalator
no actual calculation is performed. When some invoke*
method will be called, our calculation will be triggered with passed object.
You can look on this as we’re constructing lazy evalution for object.
Note:
- I want to explain following line from
makeEquals
:
MethodHandle isInstance = MethodHandles.dropArguments(CLASS_IS_INSTANCE.bindTo(receiverClass), 0,
receiverClass); // (RO)Z
CLASS_IS_INSTANCE
is defined as if we have following method inside java.lang.Class
:
public boolean CLASS_IS_INSTANCE(Object obj){
return this.isInstance(obj);
}
From JVM perspective this signature is equivalent to:
public static boolean CLASS_IS_INSTANCE(Class this, Object obj){
return this.isInstance(obj);
}
CLASS_IS_INSTANCE.bindTo(receiverClass)
change it as if it where in java.lang.Class
:
public boolean CLASS_IS_INSTANCE(Object obj){
return receiverClass.isInstance(obj);
}
The whole statement above change it as if it where in receiverClass
:
public boolean isInstance(Object obj){
return receiverClass.isInstance(obj);
}
From JVM perspective this signature is equivalent to:
public static boolean isInstance(Class<?> receiverClass, Object obj){
return receiverClass.isInstance(obj);
}
2. I want to explain another line from makeEquals
:
MethodHandles.guardWithTest(isInstance, accumulator.asType(ro), instanceFalse)
This is construct lazy conditional expression (lazy if). When it will be invoked on receiverClass
and obj,
if the evaluation result will be true, than accumulator
addapted to ro type will be evaluated, otherwise, instanceFalse
will be evaluated (that will ignore both arguments and just will return false
).
3. I want I want to explain following line from makeToString:
MethodHandle formatter = MethodHandles.insertArguments(STRING_FORMAT, 0, formatString)
.asCollector(String[].class, getters.size()); // (R*)String
STRING_FORMAT
is defined as :
public static boolean STRING_FORMAT(String format, Object[] args){
return this.format(format, args);
}
MethodHandles.insertArguments(STRING_FORMAT, 0, formatString)
change it to:
public static boolean CLASS_IS_INSTANCE(String formatString, Object[] args){
return formatString.format(format, args);
}
and finally asCollector(String[].class, getters.size())
part change it to:
public static boolean formatter(String formatString, R args0, R args1,...R argsn){
return formatString.format(format, new Object[]{args0, args1,...,argsn});
}
So, far we have Student
object and in ObjectMethods
we have functions with following signatures:
public static MethodHandle makeHashCode(Class<?> receiverClass, List<MethodHandle> getters) ;public static MethodHandle makeEquals(Class<?> receiverClass, List<MethodHandle> getters);public static MethodHandle makeToString(Class<?> receiverClass, List<MethodHandle> getters, List<String> names);
So, first, we have to know how to extract Class<?> receiverClass, List<MethodHandle> getters, List<String> names
from arbitrary object.
Note: Extracting receiverClass
from non-null object is easy:
Class<?> receiverClass = object.getClass();
Well, I don’t know how to construct List<MethodHandle> getters
, but I know how to construct List<Field> allFields
. Just to remind this should be all non-transient fields. But, Field
can be easily converted to MethodHandle
by publicLookup.unreflectGetter(field).
Moreover, Field
also contain name
field, so it can be also easy converted to List<String>
.
So, we need intermediate layer that knows how to construct List<Field>
from object and also knows how to filter all transient fields. Here we go:
I want to give only 2 notes:
- Function
private static Field[] getDeclaredFields(Class<?> clazz)
was simplified, I’ve removed local cache. You’re welcome to add it back.
2. Loop
Class<?> targetClass = clazz;
do {
Field[] fields = getDeclaredFields(targetClass);
for (Field field : fields) {
//skiped for bravity
}
targetClass = targetClass.getSuperclass();
}
while (targetClass != null && targetClass != Object.class);
If clazz represents primitive
type or interface
(Lambda is also (functional) interface) we will not hit java.lang.Object
on traversing inheritance hierarchy up. Then, when the hierarchy will end targetClass.getSuperclass()
will return null
and we will break the loop on first condition.
Otherwise, all other constructs in Java, including enum,
(from JDK 1.5), including record
(starting from JDK 14, preview feature) has java.lang.Object
as it’s base class. So, when we’re traversing inheritance hierarchy from clazz
to Class<java.lang.Object>
not included Class<java.lang.Object>
(because Object.class.getDeclaredFields()
returns empty array. So, we can make some optimization here.
If some class in inheritance hierarchy implements some interface
, it can be ignored, because interfaces in Java don’t have fields (even, after changing interfaces in JDK 8).
Note: traversing inheritance hierarchy to discover all declared methods looks far more complicated.
java.lang.Object
should be also traversed.- If clazz is interface than all it’s super interfaces should be traversed (this is done with
clazz.getInterfaces()
and not withclazz.getSuperclass()
). - Also, starting from JDK 8 interfaces also has default method that expected also to be found as methods in inheritance hierarchy.
- You should take into account also bridge-method and synthetic method.
We have Student
object and in ObjectMethods
we have functions with following signatures:
public static MethodHandle makeHashCode(Class<?> receiverClass, List<MethodHandle> getters) ;
We know how to extract receiverClass
Class<?> receiverClass = object.getClass();
In the ReflectionUtils
we have getAllFields
function with the following signature:
public static List<Field> getAllFields(Class<?> clazz, FieldFilter ff) ;
where FieldFilter
is
@FunctionalInterface
public interface FieldFilter{ public boolean matches(Field field);
}
All these is enough to define:
public static MethodHandle createHashCodeMethodHandle(Object obj);
public static List<String> getGetterNames(Class<?> clazz)
is pretty straightforward. I’m using Lamdba to define FieldFilter ff
as
f->!Modifier.isTransient(f.getModifiers()
it will be used inside ReflectionUtils.getAllFields
to include only non-transient fields.
ReflectionUtils.getGetters
looks similar to ReflectionUtils.getGetterNames
. It use the same Lamdba to define FieldFilter
that is used inside ReflectionUtils.getAllFields
to include only non-transient fields. The interesting part is:
field.setAccessible(true);
In JDK 8 this calls works fine. In JDK 9 and later this code works when it runs from the classpath. See my article about Java Platform Module System for more details.
Why do we need to make this call anyway?
In few lines below we’re doing effectively the following call:
MethodHandles.publicLookup().unreflectGetter(field);
This converts Field
to MethodHandle.
Field make access check every time we’re accessing the associated value (when we’re invoking it’s get
method). So we can have Field
reference and even we can retrieve metadata (name, type) of say, private
field, but when we will invoke get
method check whether we can have access to the value is done, and if access check fails IllegalAccessException
is thrown.
When we creates MethodHandle
this access check is done in creation time. When MethodHandle
is constructed we have this access check. We can’t have MethodHandle
with unauthorized access. If we do we MethodHandle
we can freely use it’s invoke*
methods. No additional access check is done when one of invoke*
methods are invoked. Access check is done only once in it’s construction.
So, if we have Field
with accessibility
flag false
access to it’s content should be denied. So, in this case unreflectGetter(field)
will fail (it constructs and returns MethodHandle
).
Using MethodHandles.lookup().unreflectGetter(field);
will not make any difference. The MethodHandles.Lookup
object will have privileges of ReflectionUtils
class. But this class can’t access private fields of the Student
class. So, call to unreflectGetter(field)
will fail for the same reason.
When we make field.setAccessible(true);
it doesn’t meter what MethodHandles.Lookup
object is used (again, I’m talking about classpath case); this call make such field effectively public
. So, we will succeed to create MethodHandle
. Internally, IMPL_LOOKUP
internal object, that has unrestricted access to everything, will be used inside Lookup.unreflect
instead you obtained MethodHandles.Lookup
to get MethodHandle.
ReflectionUtils.createHashCodeMethodHandle
implementation is simple now.
Now, let’s make some shortcut. Let’s test ReflectionUtils.createHashCodeMethodHandle
directly.
This code is straightforward, the only line that I want to explain is line 21:
System.out.println((int)hashCodeMh.invokeExact(student));
Why we have casting to int
?
Quote from JavaDoc:
Signature polymorphism
The unusual compilation and linkage behavior of
invokeExact
and plaininvoke
is referenced by the term signature polymorphism. As defined in the Java Language Specification, a signature polymorphic method is one which can operate with any of a wide range of call signatures and return types.In source code, a call to a signature polymorphic method will compile, regardless of the requested symbolic type descriptor. As usual, the Java compiler emits an
invokevirtual
instruction with the given symbolic type descriptor against the named method. The unusual part is that the symbolic type descriptor is derived from the actual argument and return types, not from the method declaration.When the JVM processes bytecode containing signature polymorphic calls, it will successfully link any such call, regardless of its symbolic type descriptor. (In order to retain type safety, the JVM will guard such calls with suitable dynamic type checks, as described elsewhere.)
Bytecode generators, including the compiler back end, are required to emit untransformed symbolic type descriptors for these methods. Tools which determine symbolic linkage are required to accept such untransformed descriptors, without reporting linkage errors.
https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html
So, without casting to int
we will have java.lang.invoke.WrongMethodTypeException: expected (Student)int but found (Student)Object
We can also re-arrange the code a bit:
Here, we’re constructing MethodHandle
before we even created Student
. Than, we’re populating Student
with it’s state, and than we’re invoking our MethodHandle.
This shows that our getter
s are “attached” to the field of the Student
-any change to Stutdent
's fields is ucorrectly reflected in hashCode
calculation.
Now, let’s add some inheritance:
Note:
- This class is constructed for demonstration purposes only.
active=1
has no meaning, other to indicate that we have additional field that participates inhashCode
calculation.hashCode
function is again hand-crafted. It is designed to reflect our automatic calculation. I’m using getter-method to access field of the base-class, because I’ve defined themprivate
and otherwise, I can’t access them. The correct way is to define themprotected
. I want to keep themprivate
for demonstration ofMethodHandle
mechanism.
Let’s use this new class in our demo:
Note, that our hashCode
is indeed have changed.
Now, let’s comment out line 87 in ObjectMethods.makeHashCode
. This line is not present in the original code, it was added to make receiver class the same to all getters. I want you to demonstrate why this line was added.
We will get the following exception:
First of all, note that it was thrown from line of the main Class. This is line when we’re constructing our MethodHandle
, not invoking. As stack trace indicates IllegalArgumentException
was thrown from MethodHandles.permuteArguments
because parameter types do not match (exactly!) after reorder: (Graduated,Student)int
, (Graduated)int
.
What is happening here, that MethodHandles.permuteArguments
function fails on argument check, Student
is not equal (exactly!) to Graduated
.
Why do we have Student
in the method signature in the first place? Our receiver class is Graduated
.(Graduated,Student)int
is signature combineHashes
handle, it’s second parameter is hashThisField
, that have signature MethodHandle(Student)int
, that has filter applied with getter
that have signature (on the second iteration of the loop) MethodHandle(Student)String
.
If we will look on singatures of the getters
we will see the following:
[MethodHandle(Graduated)int, MethodHandle(Student)String, MethodHandle(Student)String, MethodHandle(Student)int, MethodHandle(Student)double]
First one represent active
field that is defined in the Graduated
class. This is fine. But other fileds are defined in the Student
class. These MethodHandle
will work even on Student
class. But this cause problem to us, so line 87 converts all of them to have Graduated
as receiver class.
Now, let’s uncomment line 87 and make another experiment.
What if in line 81 we will change the static type of student to be Student? That is to Student student = new Graduated();
Here is the full code snippet for your convenience:
This code compiles just fine. If it was regular code, it should work in runtime exactly as previous one, because usually only runtime type of the objects are matter. Let’s run the code and see what will happen. We’re receiving java.lang.invoke.WrongMethodTypeException
.
You should recall, that invokeExact
is not regular method, it is polymorphic one. If you will read the javadoc quoted about signature polymorphism again, you will found that effectively it is invoked as it has Student.class
as it’s parameter. Why Student? Because this is compile-time type of student, and compile-time type of object is taken for polymorphic methods. But MethodHandle
we have construct in line 9 is constucted for Graduated
class and not for Student
. invokeExact
expect that types of the actual parameters and of MethodHandle
matches exactly, so it reports java.lang.invoke.WrongMethodTypeException: expected (Graduated)int but found (Student)int
One possible fix, will be use invoke
instead of invokeExact.
Code snippet for your convenience:
Quote from the JavaDoc of invoke
method:
Invokes the method handle, allowing any caller type descriptor, and optionally performing conversions on arguments and return values.
If the call site’s symbolic type descriptor exactly matches this method handle’s
type
, the call proceeds as if byinvokeExact
.Otherwise, the call proceeds as if this method handle were first adjusted by calling
asType
to adjust this method handle to the required type, and then the call proceeds as if byinvokeExact
on the adjusted method handle.There is no guarantee that the
asType
call is actually made. If the JVM can predict the results of making the call, it may perform adaptations directly on the caller's arguments, and call the target method handle according to its own exact type.
So, this code is roughly equivalent to this one:
Note: Why roughly equivalent and not exactly? See the quoted JavaDoc. JVM is permitted to make some optimizations, but the observed result will be the same.
This code is not equivalent to this one:
Here, you actually get’s wrong result.
In order to understand why, you should recall, that there is MethodHandle
invocation and MethodHandle
construction (there is also MethodHandle
compilation that I’m ignoring). Invocation part lines 23–24 are indeed equivalent, but the construction part are not. In the last code-snippet, look on line 10. We’re inspecting only Student.class
. It doesn’t contain addition active
field, so when MethodHandle
is constructed it will miss this field for the hashCode calculation. This is exactly what we’re observing.
Let’s finish our example. The client code doesn’t emit invokedynamic
bytecode. We have ReflectionUtils.createHashCodeMethodHandle
that returns MethodHandle
. We should define CallSite
to be connected to this MethodHandle.
In this example, we don’t have any alternative implementation for our hashCode
calculating, we’re using the generic one with possibly adapted types. This why, ConstantCallSite
is sufficient for us. Nevertheless, it is easy to switch to another CallSite
. Let’s look on the code:
In order to use student
as polymorphic object (see line 11 Student student = new Graduated();
) I’m using invoke
variant and not invokeExact
.
Note, that using asType
in MethodHandler
construction time (at line 24) will not prevent failure with calling invoke
function in line 23 of ClientMH
, because this will fail before we will reach our MethodHandler
implementation (in invocation time).
This is fifth part of mini-series of Explaining invokedynamic. This is the full list of all articles:
Number multiplication (almost) complete example. Part IV
Dynamical hashCode implementation. Part V