Fixing bugs in Spring Boot and Mockito by instrumenting them
Have you ever wondered how libraries like Spring and Mockito modify your code at run-time to implement all their advanced features? Wouldn’t it be cool to get a peek behind the curtains? This is the premise of my meta-agent, a Java agent to instrument instrumenters, to get these insights and what this blog post is about. This post is a collaboration with Mikaël Francoeur, who had the idea for the meta-agent and wrote most of this post. So it’s my first ever post-collaboration. But I start with a short introduction to the agent itself before Mikaël takes over with real-world examples.
Meta-Agent
The meta-agent (GitHub) is a Java agent that instruments the Instrumentation.addTransformer methods agents use to add bytecode transformers and wrap the added transformers to capture bytecode before and after each transformation. This allows the agent to capture what every instrumenting agent does at run-time. I covered the basics of writing your own instrumenting agent before in my blog post Instrumenting Java Code to Find and Handle Unused Classes and my related talk. So, I’ll skip all the implementation details here.
But how can you use it? You first have to download the agent (or build it from scratch via mvn package -DskipTests), then you can just attach it to your JVM at the start:
Understanding class loader hierarchies is essential when developing Java agents, especially if these agents are instrumenting code. In my Instrumenting Java Code to Find and Handle Unused Classes post, I instrumented all classes with an agent and used a Store class in this newly added code:
A challenge here is that all instrumented classes will use the Store. We, therefore, have to put the store onto the bootstrap classpath, making it visible to all classes.
Class loaders are responsible for (possibly dynamically) loading classes, and they form a hierarchy:
A class loader is an object that is responsible for loading classes. The class ClassLoader is an abstract class. Given the binary name of a class, a class loader should attempt to locate or generate data that constitutes a definition for the class. A typical strategy is to transform the name into a file name and then read a “class file” of that name from a file system.
[…]
The ClassLoader class uses a delegation model to search for classes and resources. Each instance of ClassLoader has an associated parent class loader. When requested to find a class or resource, a ClassLoader instance will usually delegate the search for the class or resource to its parent class loader before attempting to find the class or resource itself.
A typical Java application has a bootstrap class loader (internal JDK classes and the ClassLoader class itself, implemented in C++ code), a platform classloader (all other JDK classes), and an application/system class loader (application classes):
Bootstrap class loader. It is the virtual machine’s built-in class loader, typically represented as null, and does not have a parent.
Platform class loader. The platform class loader is responsible for loading the platform classes. Platform classes include Java SE platform APIs, their implementation classes and JDK-specific run-time classes that are defined by the platform class loader or its ancestors. The platform class loader can be used as the parent of a ClassLoader instance. […]
System class loader. It is also known as application class loader and is distinct from the platform class loader. The system class loader is typically used to define classes on the application class path, module path, and JDK-specific tools. The platform class loader is the parent or an ancestor of the system class loader, so the system class loader can load platform classes by delegating to its parent.
An application might create more class loaders to load classes, e.g., from JARs or do some access control; these classes typically have the application class loader as their parent.
Classes loaded by the application class loader (or children of it) can reference JDK classes but not vice versa. This leads to the problem mentioned before. We can mitigate this by putting all classes that our instrumentation-generated code uses into a runtime JAR which we then “put” on the bootstrap class path.
But we don’t put it there but instead tell the bootstrap class loader to also look into our runtime JAR when looking for a class. We do this by using the method void appendToBootstrapClassLoaderSearch(JarFile jarfile) of the Instrumentation class:
Specifies a JAR file with instrumentation classes to be defined by the bootstrap class loader.
When the virtual machine’s built-in class loader, known as the “bootstrap class loader”, unsuccessfully searches for a class, the entries in the JAR file will be searched as well.
This method may be used multiple times to add multiple JAR files to be searched in the order that this method was invoked.
But the documentation also tells us that you can create a giant mess when you aren’t careful, including only the minimal number of required classes in the added JAR:
The agent should take care to ensure that the JAR does not contain any classes or resources other than those to be defined by the bootstrap class loader for the purpose of instrumentation. Failure to observe this warning could result in unexpected behavior that is difficult to diagnose. For example, suppose there is a loader L, and L’s parent for delegation is the bootstrap class loader. Furthermore, a method in class C, a class defined by L, makes reference to a non-public accessor class C$1. If the JAR file contains a class C$1 then the delegation to the bootstrap class loader will cause C$1 to be defined by the bootstrap class loader. In this example an IllegalAccessError will be thrown that may cause the application to fail. One approach to avoiding these types of issues, is to use a unique package name for the instrumentation classes.
You have to append the classes to the search path before (!) the first reference of the classes, as a class that cannot be resolved when first referenced will never be adequately resolved.
More information on class loaders can be found in the Baeldung article Class Loaders in Java.
How to get the class loader hierarchy of your project
I wanted to know the class loader hierarchy for my own projects, so of course, I wrote an agent for it: The ClassLoader Hierarchy Agent prints the class loader hierarchy at agent load time, the JVM shutdown, and in regular intervals.
Its usage is quite simple. Just attach it to a JVM or add it at startup:
Usage: java -javaagent:classloader-hierarchy-agent.jar[=maxPackages=10,everyNSeconds=0] <main class>
maxPackages: maximum number of packages to print per classloader
every: print the hierarchy every N seconds (0 to disable)
For the finagle-httprenaissance benchmark, the agent, for example, prints the following when the benchmark is in full swing:
The root node is the bootstrap class loader. For every class loader, it gives us a thread that uses it as its primary class loader, a short list of packages associated with the class loader, and its child class loaders.
Class loaders can have names, but sadly not many class loader creators use this feature, which turns understanding the individual class loader hierarchies into a guessing game. This is especially the case for Spring based applications like the Spring PetClinic:
Feel free to try this agent on your applications; maybe you gain some new insights.
Conclusion
Understanding class loader hierarchies helps to understand subtle problems in writing instrumenting agents. Knowing how to write small agents can empower you to write simple tools to understand the properties of your application.
I hope this blog post helped you to understand class loader hierarchies and agents a little bit better. I’m writing it in a lovely park in Milan: