Replace the machine code bindings for invoke bytecodes so that they no longer rely on type-speculating inline caches or racy machine code patching.
The primary goal of this JEP is to improve long-term maintainability of the code. The secondary goal is to make megamorphic calls fast.
It is not a goal to improve the overall performance of Java applications with this new dispatch mechanism. For some applications with hot megamorphic calls, that might be a bonus, but it is not an explicit goal.
Today's dynamically monomorphic calls should not regress noticeably, and the new mechanism has to remove more code complexity than it introduces.
Our baseline megamorphic calls are inefficient. To hide their overhead, type speculation machinery called inline caches bets on call sites being dynamically monomorphic. When the speculation succeeds, a more efficient monomorphic call can be made. But dynamically megamorphic calls remain slow, and because the performance gap between monomorphic and megamorphic calls is large, performance characteristics become unpredictable. The type speculation logic is also very complicated and has caused many bugs over the years. The complicated life cycle of Just-In-Time (JIT) compiled machine code is a consequence of inline caches, and the various races that inline caches introduce significantly complicate class unloading.
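To make the speculation concrete, here is a minimal Java sketch of a call site with a single-entry inline cache. It is a toy model and not HotSpot code: the names `InlineCacheSketch`, `CallSite`, `Code`, and `slowLookups` are hypothetical, a `Map` lookup stands in for the slow megamorphic path, and field writes stand in for machine code patching. The state transitions (unresolved, then monomorphic, then megamorphic) are a simplifying assumption.

```java
final class InlineCacheSketch {
    /** Stand-in for a pointer to compiled code. */
    interface Code { String run(); }

    /** One call site with a single-entry (monomorphic) cache. */
    static final class CallSite {
        private Class<?> cachedClass;  // speculated receiver class, null while unresolved
        private Code cachedTarget;     // target bound to the speculated class
        boolean megamorphic;           // set once the speculation has failed
        int slowLookups;               // counts trips through the slow path

        String invoke(Object receiver, java.util.Map<Class<?>, Code> methodTable) {
            Class<?> actual = receiver.getClass();
            if (!megamorphic && actual == cachedClass) {
                return cachedTarget.run();       // speculation held: cheap call
            }
            slowLookups++;
            Code target = methodTable.get(actual);  // slow path: full lookup
            if (target == null) {
                throw new IllegalArgumentException("no method for " + actual);
            }
            if (cachedClass == null) {
                cachedClass = actual;            // first call: start speculating
                cachedTarget = target;
            } else {
                megamorphic = true;              // second receiver type: give up
            }
            return target.run();
        }
    }

    public static void main(String[] args) {
        java.util.Map<Class<?>, Code> table = new java.util.HashMap<>();
        table.put(String.class, () -> "String impl");
        table.put(Integer.class, () -> "Integer impl");

        CallSite site = new CallSite();
        site.invoke("a", table);   // resolves and caches String
        site.invoke("b", table);   // cache hit
        site.invoke(42, table);    // miss: the site becomes megamorphic
        System.out.println("slow lookups: " + site.slowLookups);
    }
}
```

In a real VM the two field writes correspond to patching live machine code while other threads may be executing it, which is where the races described above come from.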
To replace type-speculating inline caches, the baseline megamorphic calls are to be optimized until they are nearly as fast as speculatively monomorphic calls. That way, the speculation machinery (roughly 10,000 lines of racy low-level code, including racy machine code patching) can be removed.
The proposal for invokevirtual calls is to flatten the vtable to have direct code pointers at a negative offset from the class pointer. This allows dispatching with a single indirect branch (compared to two direct branches for dynamically monomorphic calls today).
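The layout can be illustrated with a small Java simulation. This is a sketch under stated assumptions, not the proposed implementation: an `Object[]` named `MEMORY` stands in for native memory, a "class pointer" is an array index, and `Code`, `layoutClass`, and `invokeVirtual` are hypothetical names. The exact offset formula (vtable index `i` stored at `classPtr - 1 - i`) is also an assumption chosen for illustration.

```java
final class FlatVtableSketch {
    /** Stand-in for a pointer to compiled code. */
    interface Code { String run(); }

    // Simulated native memory; a "class pointer" is an index into this array.
    static final Object[] MEMORY = new Object[32];

    /**
     * Lays out one class starting at base: code pointers first, class metadata after.
     * Returns the class pointer, which points just past the flattened vtable.
     */
    static int layoutClass(int base, String name, Code... vtable) {
        int classPtr = base + vtable.length;
        for (int i = 0; i < vtable.length; i++) {
            MEMORY[classPtr - 1 - i] = vtable[i];  // vtable index i at offset -(i + 1)
        }
        MEMORY[classPtr] = name;                   // metadata sits at non-negative offsets
        return classPtr;
    }

    /** invokevirtual: one load at a negative offset, then one indirect call. */
    static String invokeVirtual(int classPtr, int vtableIndex) {
        Code target = (Code) MEMORY[classPtr - 1 - vtableIndex];
        return target.run();
    }

    public static void main(String[] args) {
        Code speak = () -> "Animal.speak";
        Code name = () -> "Animal.name";
        int animal = layoutClass(0, "Animal", speak, name);
        // Dog overrides vtable index 0 and inherits index 1.
        int dog = layoutClass(animal + 1, "Dog", () -> "Dog.speak", name);

        System.out.println(invokeVirtual(animal, 0));
        System.out.println(invokeVirtual(dog, 0));
        System.out.println(invokeVirtual(dog, 1));
    }
}
```

Because the table holds code pointers directly, the call needs no receiver-type check and no intermediate method object; the receiver's class pointer and a compile-time constant index are enough to find the target.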
The proposal for invokeinterface calls is to give each interface method a unique number called a "selector", and to create a cuckoo-style hash table that maps selectors to code pointers for the concrete implementations of the interface methods. The table is embedded straight into the native class. This allows dispatching the majority of the time with one indirect branch and one direct branch (compared to two direct branches for dynamically monomorphic calls today).
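A minimal Java sketch of such a per-class selector table follows. It assumes a textbook two-hash cuckoo scheme; the class name `SelectorTableSketch`, the two hash functions, the eviction limit, and the use of `0` as the empty-slot marker are all illustrative assumptions, not details taken from the proposal.

```java
final class SelectorTableSketch {
    /** Stand-in for a pointer to compiled code. */
    interface Code { String run(); }

    private static final int MAX_KICKS = 16;  // eviction limit before giving up

    private final int[] selectors;  // 0 marks an empty slot, so selectors start at 1
    private final Code[] targets;
    private final int mask;

    /** capacity must be a power of two. */
    SelectorTableSketch(int capacity) {
        selectors = new int[capacity];
        targets = new Code[capacity];
        mask = capacity - 1;
    }

    private int hash1(int selector) { return selector & mask; }
    private int hash2(int selector) { return ((selector * 0x9E3779B1) >>> 16) & mask; }

    /** Inserts a mapping, evicting and relocating occupants cuckoo-style. */
    void put(int selector, Code target) {
        if (selector == 0) throw new IllegalArgumentException("selector 0 is reserved");
        int a = hash1(selector), b = hash2(selector);
        if (selectors[a] == selector) { targets[a] = target; return; }
        if (selectors[b] == selector) { targets[b] = target; return; }
        int slot = a;
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            if (selectors[slot] == 0) {
                selectors[slot] = selector;
                targets[slot] = target;
                return;
            }
            // Swap the new entry in and carry the evicted one to its other slot.
            int evictedSelector = selectors[slot];
            Code evictedTarget = targets[slot];
            selectors[slot] = selector;
            targets[slot] = target;
            selector = evictedSelector;
            target = evictedTarget;
            slot = (hash1(selector) == slot) ? hash2(selector) : hash1(selector);
        }
        // A real table would be rebuilt with a larger size or different hashes here.
        throw new IllegalStateException("eviction limit reached; table must be rebuilt");
    }

    /** invokeinterface: compare the selector in the primary slot, then call indirectly. */
    String invokeInterface(int selector) {
        int a = hash1(selector);
        if (selectors[a] == selector) return targets[a].run();  // common case
        int b = hash2(selector);
        if (selectors[b] == selector) return targets[b].run();  // fallback slot
        throw new IllegalArgumentException("no implementation for selector " + selector);
    }

    public static void main(String[] args) {
        SelectorTableSketch table = new SelectorTableSketch(8);
        table.put(1, () -> "Impl.foo");
        table.put(9, () -> "Impl.bar");   // collides with selector 1 in the primary slot
        System.out.println(table.invokeInterface(1));
        System.out.println(table.invokeInterface(9));
    }
}
```

The lookup never probes more than two slots, which is what bounds the dispatch cost: a hit in the primary slot costs one compare-and-branch plus the indirect call, and only collisions pay for the second probe.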
As for direct calls, today they emit a call straight to the target compiled code when the target method is compiled, but go through a stub that fills in a method register when going into interpreted mode. This elides the cost of setting a register when calling compiled code, but causes many races and introduces a lot of code complexity, involving racy machine code patching. The cost of setting this register unconditionally and removing the stub code is considered minimal compared to the spilling and stack banging that we usually perform when calling a method.
An alternative invokeinterface dispatch mechanism performs graph colouring of interface methods to optimize the call further. That alternative is currently not being considered, because the primary goal of this JEP is to simplify the code and make it more maintainable.
The usual jtreg tests must pass, especially the ones concerning invoke bytecodes. Fortunately, there is already a lot of testing for invoke bytecodes that can be reused in this effort to change their machine code bindings.
Risks and Assumptions
An assumption of this work is that dynamically monomorphic call sites using a well-predicted indirect branch should be as performant as a direct branch, due to branch target buffering in hardware. In other words, the software type speculation trick is nowadays already performed by the hardware as well. Performance might still regress on machines without well-performing branch prediction. The assumption is that such hardware is to a large extent out of fashion today.