Explaining invokedynamic. Introduction. Part I

alex_ber
10 min read · Sep 6, 2020

This is the first part of a mini-series explaining invokedynamic. Here is the full list of articles:

Introduction. Part I

Toy example. Part II

Bootstrap method. Part III

Number multiplication (almost) complete example. Part IV

Dynamical hashCode implementation. Part V

Java 9 String concatenation. Part VI

Lambda. Part VII

Records. Part VIII

Final notes. Part IX

Introduction

Java 7 added a new bytecode instruction, invokedynamic (or indy).

It was introduced by JSR 292 around 2011. It was originally designed to support dynamically typed languages, as the JSR name states. As a result, it was ignored by the vast majority of Java developers.

Java 8 introduced lambdas. The concept was itself foreign to Java (more on this in a separate article later), so most of the attention was focused on what a lambda is, why the existing nested and inner classes are not enough, and why SAM (single abstract method) interfaces are a good way to go. Whenever somebody did talk about invokedynamic, the talk was aimed at advanced users, the ones who write compilers and read bytecode instructions almost as easily as Java code. For most developers, including myself back then, this information was inaccessible. Things were made even worse by the fact that the MethodHandle API was also new to almost everybody.

In this article I will try to explain what invokedynamic (or indy) is and how it works, using concrete examples, without getting too far inside the JVM. Let’s start with a short informal explanation of the invokestatic and invokevirtual bytecodes.

Note: All code here was tested on Java 8, but it should also run from the classpath on later versions (with some minor tweaks, which I describe).

invokestatic

How does Java call a static method? This sounds pretty simple: the compiler converts your source code to bytecode, in which it puts an instruction naming exactly which static method should be called. This is almost true, but recall that Java supports method overloading. Let’s consider the following concrete code example:
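A minimal sketch of such an example is given here. The class name OverloadDemo, the variable values and the printed strings are assumptions made for illustration; the comments mark the three call sites that the discussion refers to as lines 16, 17 and 18.

```java
public class OverloadDemo {

    static void print(int x) {
        System.out.println("print(int): " + x);
    }

    static void print(long x) {
        System.out.println("print(long): " + x);
    }

    public static void main(String[] args) {
        int a = 1;
        long b = 2L;
        long c = 3; // static type long, although the value would fit in an int

        print(a); // "line 16": static type int,  the compiler links to print(int)
        print(b); // "line 17": static type long, the compiler links to print(long)
        print(c); // "line 18": static type long, the compiler links to print(long)
    }
}
```

Disassembling the compiled class with javap -c should show three invokestatic instructions, each naming one specific overload.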

We have 2 overloaded versions of the print function here. How does the compiler know which variant to choose?

Note: I’m simplifying the whole process, focusing only on the relevant part.

Recall that Java is a statically typed language. This means that variable types are determined at compile time. So for variable a, the compiler sees that its static type is int.

When it generates bytecode for line 16, it sees 2 candidate functions: print that receives int and print that receives long. Because the argument is variable a, which has (static) type int, the compiler chooses the more specific method, namely the one that receives int. There are exact rules for how such ambiguity has to be resolved. The crucial point is that this is done at compile time. Once the compiler has chosen the method to link to, it emits bytecode that calls this method directly.

When the compiler gets to line 17, it sees the 2 candidate print functions again. This time the argument is variable b, which has (static) type long, so the only applicable print method is the one that receives long, and the compiler emits bytecode that calls this method directly.

When the compiler gets to line 18, it sees the 2 candidate print functions again. This time the argument is variable c, which has (static) type long. It is true that at runtime it will hold a value that fits in an int, but in general the compiler has no way of knowing this in advance (there is a theoretical justification that I will skip here). It can only be resolved at runtime, which is what all dynamically typed languages do. As I said above, Java is a statically typed language, so the compiler sees the (static) type long, chooses the print method that receives long, and emits bytecode that calls this method directly.

To summarize: the compiler emits bytecode that links the call sites (lines 16, 17, 18 above) with the correct implementation of the print method. Choosing the correct method is hard-wired into the compiler.

invokevirtual

How does Java call an instance method? Again, the compiler converts your source code to bytecode that links to the correct method. Which implementation runs is decided by the dispatch mechanism of a single object, the one the method is called on. This is why it is called single dispatch.

Let’s look at a concrete example:
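A minimal sketch of such an example is given here. It assumes that show and time are ordinary instance methods and that Child overrides show; the class name DispatchDemo is hypothetical. The comments mark the call sites that the discussion refers to as lines 23, 24, 26, 27, 29 and 30.

```java
class Parent {
    void show() {
        System.out.println("Parent's show()");
    }

    void time() {
        System.out.println("now");
    }
}

class Child extends Parent {
    @Override
    void show() {
        System.out.println("Child's show()");
    }
}

public class DispatchDemo {
    public static void main(String[] args) {
        Parent obj1 = new Parent(); // static type Parent, runtime class Parent
        obj1.show();                // "line 23"
        obj1.time();                // "line 24"

        Child obj2 = new Child();   // static type Child, runtime class Child
        obj2.show();                // "line 26"
        obj2.time();                // "line 27"

        Parent obj3 = new Child();  // static type Parent, runtime class Child
        obj3.show();                // "line 29"
        obj3.time();                // "line 30"
    }
}
```

Disassembling DispatchDemo with javap -c should show one invokevirtual instruction per call, each naming a method by class, name and descriptor.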

What does the compiler do in lines 23, 24, 26, 27, 29, 30?

Note: I’m simplifying the whole process, focusing only on the relevant part.

  • Line 23:

As far as the compiler is concerned, the type of obj1 is Parent, so first of all it checks that Parent.class has a show method. It does. Now, from the compiler’s perspective, the “real” signature of the show method is:

public static void show(Parent this)

so the compiler emits bytecode that calls this method and passes obj1 to it (as the implicit first argument).

When the program runs, “Parent’s show()” is printed.

  • Line 24:

As with line 23: as far as the compiler is concerned, the type of obj1 is Parent, so first of all it checks that Parent.class has a time method. It does. Now, from the compiler’s perspective, the “real” signature of the time method is:

public static void time(Parent this)

so the compiler emits bytecode that calls this method and passes obj1 to it (as the implicit first argument).

When the program runs, “now” is printed.

  • Line 26:

As far as the compiler is concerned, the type of obj2 is Child, so first of all it checks that Child.class has a show method. It does. Now, from the compiler’s perspective, the “real” signature of the show method is:

public static void show(Child this)

so the compiler emits bytecode that calls this method and passes obj2 to it (as the implicit first argument).

When the program runs, “Child’s show()” is printed.

  • Line 27:

It starts as in line 26: as far as the compiler is concerned, the type of obj2 is Child, so first of all it checks that Child.class has a time method. This time, it doesn’t. So the compiler looks for this method in its superclass (a class can have at most one parent class). It looks at Parent.class. This time it finds the time method. Now, from the compiler’s perspective, the “real” signature of the time method is:

public static void time(Parent this)

so the compiler emits bytecode that calls this method and passes obj2 to it (as the implicit first argument).

Note that obj2 is really a Child, but the time function above has access to it only as a Parent (for example, fields that are defined in Child are inaccessible to it).

When the program runs, “now” is printed.

  • Line 29:

As far as the compiler is concerned, the type of obj3 is Parent, so first of all it checks that Parent.class has a show method. It does. Now, from the compiler’s perspective, the “real” signature of the show method is:

public static void show(Parent this)

so the compiler emits bytecode that refers to this method and passes obj3 to it (as the implicit first argument).

Note that obj3 is really a Child. The compiler does not rely on this, but invokevirtual selects the implementation by the runtime class of the object the method is called on, and Child overrides show.

When the program runs, “Child’s show()” is printed.

  • Line 30:

It starts as in line 29: as far as the compiler is concerned, the type of obj3 is Parent, so first of all it checks that Parent.class has a time method. It does. Now, from the compiler’s perspective, the “real” signature of the time method is:

public static void time(Parent this)

so the compiler emits bytecode that calls this method and passes obj3 to it (as the implicit first argument).

Note that obj3 is really a Child, but the time function above has access to it only as a Parent (for example, fields that are defined in Child are inaccessible to it).

When the program runs, “now” is printed.

To summarize: the compiler emits bytecode that links the call sites (lines 23, 24, 26, 27, 29, 30) with the show/time method it resolved. The algorithm is different from the previous one: the method name and signature are fixed at compile time, and the implementation is then selected by the class of the object the method is called on. Either way, the algorithm is fixed and hard-wired; the program cannot change it. Note also that the object on which the method is called is passed to the method as an implicit first parameter (this).

MethodHandle, CallSite and MethodHandles.Lookup

A CallSite is a holder for a variable MethodHandle. One way to look at it is as a handle to a method (constructor, field, or similar low-level operation). This handle can change over time. An invokedynamic instruction linked to a CallSite delegates all calls to the CallSite’s current MethodHandle.

A MethodHandle is an object that stores metadata about a method (constructor, field, or similar low-level operation), such as the name and the signature of the method. One way to look at it is as the target of a pointer to a method: the de-referenced method (constructor, field, or similar low-level operation) itself.
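The relationship between the two can be sketched with MutableCallSite, a CallSite subclass from java.lang.invoke that this article does not otherwise use. The class name CallSiteDemo and the two constant targets are assumptions made for illustration, and the call site is driven by hand through dynamicInvoker rather than by an invokedynamic instruction.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class CallSiteDemo {

    // Invokes the same call site twice, changing its target in between.
    static String[] retarget() throws Throwable {
        // A call site whose type is () -> String; it has no usable target yet.
        MutableCallSite site = new MutableCallSite(MethodType.methodType(String.class));
        // A method handle that always delegates to the site's current target.
        MethodHandle invoker = site.dynamicInvoker();

        site.setTarget(MethodHandles.constant(String.class, "first"));
        String before = (String) invoker.invokeExact();

        site.setTarget(MethodHandles.constant(String.class, "second"));
        String after = (String) invoker.invokeExact();

        return new String[] {before, after};
    }

    public static void main(String[] args) throws Throwable {
        String[] results = retarget();
        System.out.println(results[0] + ", " + results[1]);
    }
}
```

The invoker itself never changes; only the MethodHandle held by the CallSite does.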

Java code can create a method handle that directly accesses any method, constructor, or field that is accessible to that code. This is done via a reflective, capability-based API called MethodHandles.Lookup. For example, a static method handle can be obtained from Lookup.findStatic. There are also conversion methods from Core Reflection API objects, such as Lookup.unreflect.
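As a minimal sketch, the two routes mentioned above might look as follows. The class LookupDemo and its twice method are hypothetical names introduced only for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class LookupDemo {

    static int twice(int x) {
        return 2 * x;
    }

    // Route 1: look the method up directly by name and type.
    static int viaFindStatic(int x) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle mh = lookup.findStatic(
                LookupDemo.class, "twice", MethodType.methodType(int.class, int.class));
        return (int) mh.invokeExact(x);
    }

    // Route 2: start from a Core Reflection Method object and convert it.
    static int viaUnreflect(int x) throws Throwable {
        Method method = LookupDemo.class.getDeclaredMethod("twice", int.class);
        MethodHandle mh = MethodHandles.lookup().unreflect(method);
        return (int) mh.invokeExact(x);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaFindStatic(21));
        System.out.println(viaUnreflect(21));
    }
}
```

Both routes end with the same kind of object, a MethodHandle of type (int)int.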

It is important to understand 2 key differences between the Core Reflection API and MethodHandle.

  • With MethodHandle, the access check is done only once, at construction time; with the Core Reflection API, it is done on every call to the invoke method (and the Security Manager is invoked each time, which hurts performance).
  • In the Core Reflection API, invoke is a regular method. In MethodHandle, all invoke* variants are signature polymorphic methods.

Basically, an access check determines whether you can access a method (constructor, field, or similar low-level operation). For example, if the method (constructor, field, or similar low-level operation) is private, you normally can’t invoke it (or read the value of the field). More on this in a second.

What is a signature polymorphic method? Quote from the JavaDoc:

Signature polymorphism

The unusual compilation and linkage behavior of invokeExact and plain invoke is referenced by the term signature polymorphism. As defined in the Java Language Specification, a signature polymorphic method is one which can operate with any of a wide range of call signatures and return types.

In source code, a call to a signature polymorphic method will compile, regardless of the requested symbolic type descriptor. As usual, the Java compiler emits an invokevirtual instruction with the given symbolic type descriptor against the named method. The unusual part is that the symbolic type descriptor is derived from the actual argument and return types, not from the method declaration.

When the JVM processes bytecode containing signature polymorphic calls, it will successfully link any such call, regardless of its symbolic type descriptor. (In order to retain type safety, the JVM will guard such calls with suitable dynamic type checks, as described elsewhere.)

Bytecode generators, including the compiler back end, are required to emit untransformed symbolic type descriptors for these methods. Tools which determine symbolic linkage are required to accept such untransformed descriptors, without reporting linkage errors.

https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandle.html
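A minimal sketch of what this means in practice, using a hypothetical class SignaturePolymorphismDemo with a hypothetical add method: the same handle is called with different argument and return types, and the compiler derives a different symbolic type descriptor at each call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

public class SignaturePolymorphismDemo {

    static long add(long a, long b) {
        return a + b;
    }

    static MethodHandle addHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                SignaturePolymorphismDemo.class, "add",
                MethodType.methodType(long.class, long.class, long.class));
    }

    // Call site descriptor (long, long)long matches the handle's type exactly.
    static long exact() throws Throwable {
        return (long) addHandle().invokeExact(1L, 2L);
    }

    // Call site descriptor (int, int)long; plain invoke adapts the arguments.
    static long adapted() throws Throwable {
        return (long) addHandle().invoke(1, 2);
    }

    // Same (int, int)long descriptor, but invokeExact refuses to adapt it.
    static boolean exactRejectsInts() throws Throwable {
        try {
            long ignored = (long) addHandle().invokeExact(1, 2);
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(exact());
        System.out.println(adapted());
        System.out.println(exactRejectsInts());
    }
}
```

All three calls compile, even though MethodHandle declares no method with these parameter types; the mismatch in the last one is only detected when the call runs.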

I will get back to signature polymorphic methods later.

Now, let’s see how we can convert Core Reflection API objects to MethodHandle. Quote:

A lookup object is a factory for creating method handles, when the creation requires access checking. Method handles do not perform access checks when they are called, but rather when they are created. Therefore, method handle access restrictions must be enforced when a method handle is created. The caller class against which those restrictions are enforced is known as the lookup class.

A lookup class which needs to create method handles will call MethodHandles.lookup to create a factory for itself. When the Lookup factory object is created, the identity of the lookup class is determined, and securely stored in the Lookup object. The lookup class (or its delegates) may then use factory methods on the Lookup object to create method handles for access-checked members. This includes all methods, constructors, and fields which are allowed to the lookup class, even private ones…

Access checks are applied in the factory methods of Lookup, when a method handle is created. This is a key difference from the Core Reflection API, since java.lang.reflect.Method.invoke performs access checking against every caller, on every call.

All access checks start from a Lookup object, which compares its recorded lookup class against all requests to create method handles. A single Lookup object can be used to create any number of access-checked method handles, all checked against a single lookup class.

A Lookup object can be shared with other trusted code, such as a metaobject protocol. A shared Lookup object delegates the capability to create method handles on private members of the lookup class. Even if privileged code uses the Lookup object, the access checking is confined to the privileges of the original lookup class.

A lookup can fail, because the containing class is not accessible to the lookup class, or because the desired class member is missing, or because the desired class member is not accessible to the lookup class, or because the lookup object is not trusted enough to access the member. In any of these cases, a ReflectiveOperationException will be thrown from the attempted lookup.

https://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandles.Lookup.html
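The lookup-class rule from the quote can be sketched as follows. The classes Secret and AccessCheckDemo and the private method whisper are hypothetical names used only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Secret {
    private static String whisper() {
        return "secret";
    }

    // MethodHandles.lookup() is called inside Secret, so Secret is the lookup
    // class and its private members pass the access check.
    static MethodHandle handleToWhisper() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                Secret.class, "whisper", MethodType.methodType(String.class));
    }
}

public class AccessCheckDemo {

    // The handle was access-checked when Secret created it; calling it from
    // another class triggers no further access check.
    static String callThroughHandle() throws Throwable {
        MethodHandle mh = Secret.handleToWhisper();
        return (String) mh.invokeExact();
    }

    // Here the lookup class is AccessCheckDemo, which may not see the private
    // member, so creating the handle fails.
    static boolean ownLookupIsRejected() {
        try {
            MethodHandles.lookup().findStatic(
                    Secret.class, "whisper", MethodType.methodType(String.class));
            return false;
        } catch (IllegalAccessException expected) {
            return true;
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(callThroughHandle());
        System.out.println(ownLookupIsRejected());
    }
}
```

Handing out the MethodHandle (or the Lookup object itself) is how Secret delegates its capability to other code.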

Note:

  1. MethodHandles.Lookup has a package-private IMPL_LOOKUP field which can access anything (you can get it with the Core Reflection API, at least with JDK 8; see my Java Platform Module System article on how you can access it in later versions from the classpath).
/** Package-private version of lookup which is trusted. */
static final Lookup IMPL_LOOKUP = new Lookup(Object.class, TRUSTED);

  2. You can call java.lang.reflect.Method.setAccessible(true) before passing the method to Lookup.unreflect. Once setAccessible(true) has been called on a method or field, it doesn’t matter which MethodHandles.Lookup object is used (again, I’m talking about the classpath case); the call makes the member effectively public, so creating the MethodHandle succeeds. Internally, Lookup.unreflect then uses the IMPL_LOOKUP object, which has unrestricted access to everything, instead of the MethodHandles.Lookup you obtained.
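The second note can be sketched as follows, assuming the code runs from the classpath. The classes Hidden and UnreflectDemo and the private method answer are hypothetical names used only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

class Hidden {
    private static int answer() {
        return 42;
    }
}

public class UnreflectDemo {

    static int callPrivate() throws Throwable {
        Method method = Hidden.class.getDeclaredMethod("answer");
        // Suppress the reflective access check for this Method object.
        method.setAccessible(true);
        // The lookup class here is UnreflectDemo, which could not normally
        // see Hidden.answer; because the accessible flag is set, unreflect
        // performs no access check on behalf of the lookup class.
        MethodHandle mh = MethodHandles.lookup().unreflect(method);
        return (int) mh.invokeExact();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(callPrivate());
    }
}
```

Without the setAccessible(true) call, unreflect would be expected to fail with an IllegalAccessException, because the access check would then be made against UnreflectDemo.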

So, you can think of MethodHandle as a more powerful alternative to the Core Reflection API. Framework developers often use the Core Reflection API to load a class at runtime, to get a method or field, and to call a method at runtime. But the Core Reflection API has a performance cost, as it performs the security check on every call. MethodHandle can be seen as an alternative approach (but again, it is much more powerful than that; we will see some of that power below).

Written by alex_ber

Senior Software Engineer at Pursway