Fixing bugs in Spring Boot and Mockito by instrumenting them
Have you ever wondered how libraries like Spring and Mockito modify your code at run-time to implement all their advanced features? Wouldn’t it be cool to get a peek behind the curtains? This is the premise of my meta-agent, a Java agent to instrument instrumenters to get these insights, and the topic of this blog post. This post is a collaboration with Mikaël Francoeur, who had the idea for the meta-agent and wrote most of this post, making it my first-ever collaboration post. But I’ll start with a short introduction to the agent itself before Mikaël takes over with real-world examples.
Meta-Agent
The meta-agent (GitHub) is a Java agent that instruments the Instrumentation.addTransformer methods that agents use to register bytecode transformers, and wraps the registered transformers to capture the bytecode before and after each transformation. This allows the agent to capture what every instrumenting agent does at run-time. I covered the basics of writing your own instrumenting agent before in my blog post Instrumenting Java Code to Find and Handle Unused Classes and my related talk. So, I’ll skip all the implementation details here.
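The wrapping itself can be sketched like this (a minimal sketch, not the meta-agent’s actual code; intercepting the addTransformer call itself is omitted):

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

// wraps a transformer registered by another agent, capturing the bytecode
// before and after the wrapped transformer runs
class CapturingTransformer implements ClassFileTransformer {

    private final ClassFileTransformer wrapped;

    CapturingTransformer(ClassFileTransformer wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) throws IllegalClassFormatException {
        byte[] transformed = wrapped.transform(loader, className,
                classBeingRedefined, protectionDomain, classfileBuffer);
        if (transformed != null) { // null means "no transformation applied"
            // store classfileBuffer (before) and transformed (after) for later diffing
            System.out.printf("%s transformed %s%n",
                    wrapped.getClass().getName(), className);
        }
        return transformed;
    }
}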
But how can you use it? You first have to download the agent (or build it from scratch via mvn package -DskipTests), then you can just attach it to your JVM at the start:
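Something like this should work (the JAR and application names are placeholders for your paths):

# adapt the JAR path to where you downloaded or built the agent
java -javaagent:meta-agent.jar -jar your-application.jar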
A few months ago, I told you about the onjcmd feature in my blog post Level-up your Java Debugging Skills with on-demand Debugging (which is coming to JavaLand 2024). The short version is that adding onjcmd=y to the list of JDWP options allows you to delay accepting the incoming connection request in the JDWP agent until jcmd <JVM pid> VM.start_java_debugging is called.
The main idea is that the JDWP agent
only listens on the debugging port after it is triggered, which could have some security benefits
and that the JDWP agent causes less overhead while waiting, compared to just accepting connections from the beginning.
The first point is debatable; one can find arguments for and against it. But for the second point, we can run some benchmarks. After renewed discussions, I started benchmarking to conclude whether the onjcmd feature improves on-demand debugging performance. Spoiler alert: It doesn’t.
Renaissance is a modern, open, and diversified benchmark suite for the JVM, aimed at testing JIT compilers, garbage collectors, profilers, analyzers and other tools.
Renaissance is a benchmarking suite that contains a range of modern workloads, comprising various popular systems, frameworks and applications made for the JVM.
Renaissance benchmarks exercise a range of programming paradigms, including concurrent, parallel, functional and object-oriented programming.
Renaissance typically runs the sub-benchmarks in multiple iterations. Still, I decided to run the sub-benchmarks just once per Renaissance run (via -r 1) and instead run Renaissance itself ten times using hyperfine to get a proper run-time distribution. I compared three different executions of Renaissance for this blog post:
without JDWP: Running Renaissance without any debugging enabled, to have an appropriate baseline, via java -jar renaissance.jar all -r 1
with JDWP: Running Renaissance in debugging mode, with the JDWP agent accepting debugging connections the whole time without suspending the JVM, via java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -jar renaissance.jar all -r 1
with onjcmd: Running Renaissance in debugging mode, with the JDWP agent accepting debugging connections only after the jcmd call without suspending the JVM, via java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,onjcmd=y,address=*:5005 -jar renaissance.jar all -r 1
Remember that we never start a debugging session or use jcmd, as we’re only interested in the performance of the JVM while waiting for a debugging connection in the JDWP agent.
Yes, I know that Renaissance uses different iteration numbers for the sub-benchmarks, but this should not affect the overall conclusions from the benchmark.
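Concretely, the hyperfine invocation could look roughly like this (a sketch, not my exact benchmarking script):

# 10 runs per configuration, with readable names in the summary
hyperfine --runs 10 \
  --command-name "without JDWP" \
  'java -jar renaissance.jar all -r 1' \
  --command-name "with JDWP" \
  'java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -jar renaissance.jar all -r 1' \
  --command-name "with onjcmd" \
  'java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,onjcmd=y,address=*:5005 -jar renaissance.jar all -r 1'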
Results
Now to the results. For a current JDK 21 on my Ubuntu 23.10 machine with a ThreadRipper 3995WX CPU, hyperfine reports the following results:
Benchmark 1: without JDWP
Time (mean ± σ): 211.075 s ± 1.307 s [User: 4413.810 s, System: 1438.235 s]
Range (min … max): 209.667 s … 213.361 s 10 runs
Benchmark 2: with JDWP
Time (mean ± σ): 218.985 s ± 1.924 s [User: 4533.024 s, System: 1133.126 s]
Range (min … max): 216.673 s … 222.249 s 10 runs
Benchmark 3: with onjcmd
Time (mean ± σ): 219.469 s ± 1.185 s [User: 4537.213 s, System: 1181.856 s]
Range (min … max): 217.824 s … 221.316 s 10 runs
Summary
"without JDWP" ran
1.04 ± 0.01 times faster than "with JDWP"
1.04 ± 0.01 times faster than "with onjcmd"
You can see that the run-time difference between “with JDWP” and “with onjcmd” is 0.5s, way below the standard deviations of both benchmarks. Plotting the benchmark results using box plots visualizes this fact:
Or, more analytically, Welch’s t-test doesn’t rule out the possibility of both benchmarks producing the same run-time distribution with p=0.5. There is, therefore, no measurable effect on the performance if we use the onjcmd feature. But what we do notice is that enabling the JDWP agent results in an increase in the run-time by 4%.
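For the curious, here is the back-of-the-envelope version of that test, computed by me from the reported means and standard deviations (n = 10 runs each):

t = \frac{\bar{x}_{\text{onjcmd}} - \bar{x}_{\text{JDWP}}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{219.469 - 218.985}{\sqrt{1.185^2/10 + 1.924^2/10}} \approx 0.68

With roughly 15 degrees of freedom (via the Welch–Satterthwaite equation), this corresponds to a two-sided p-value of about 0.5, far away from any common significance threshold.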
The question is then: Why has it been implemented in the JDK at all? Let’s run Renaissance on JDK 11.0.3, the first release supporting onjcmd.
Results on JDK 11.0.3
Here, using onjcmd results in a significant performance improvement of a factor of 1.5 (from 354 to 248 seconds) compared to running the JDWP agent without it:
Benchmark 1: without JDWP
Time (mean ± σ): 234.011 s ± 2.182 s [User: 5336.885 s, System: 706.926 s]
Range (min … max): 229.605 s … 237.845 s 10 runs
Benchmark 2: with JDWP
Time (mean ± σ): 353.572 s ± 20.300 s [User: 4680.987 s, System: 643.978 s]
Range (min … max): 329.610 s … 402.410 s 10 runs
Benchmark 3: with onjcmd
Time (mean ± σ): 247.766 s ± 1.907 s [User: 4690.555 s, System: 609.904 s]
Range (min … max): 245.575 s … 251.026 s 10 runs
Summary
"without JDWP" ran
1.06 ± 0.01 times faster than "with onjcmd"
1.51 ± 0.09 times faster than "with JDWP"
We excluded the finagle-chirper sub-benchmark here, as it causes the run-time to increase drastically. The sub-benchmark alone does not cause any problems, so the performance hit is possibly caused by the GC run before the sub-benchmark, which cleans up after the dotty sub-benchmark (dotty runs directly before finagle-chirper).
Please be aware that the sub-benchmarks run on JDK 11 differ from those run on JDK 21, so don’t compare these results directly with the JDK 21 ones.
But what explains this difference?
Fixes since JDK 11.0.3
Between JDK 11.0.3 and JDK 21, there have been improvements to the OpenJDK, some of which drastically improved the performance of the JVM in debugging mode. Most notable is the fix for JDK-8227269 by Roman Kennke. The issue, reported by Egor Ushakov, reads as follows:
Slow class loading when running with JDWP
When debug mode is active (-agentlib:jdwp), an application spends a lot of time in JVM internals like Unsafe.defineAnonymousClass or Class.getDeclaredConstructors. Sometimes this happens on EDT and UI freezes occur.
If we look into the code, we’ll see that whenever a new class is loaded and an event about it is delivered, when a garbage collection has occurred, classTrack_processUnloads iterates over all loaded classes to see if any of them have been unloaded. This leads to O(classCount * gcCount) performance, which in case of frequent GCs (and they are frequent, especially the minor ones) is close to O(classCount^2). In IDEA, we have quite a lot of classes, especially counting all lambdas, so this results in quite significant overhead.
This change came into the JDK with 11.0.9: with 11.0.8, we still see the 11.0.3 results, but with 11.0.9, we see the results of the current JDK 11:
Benchmark 1: without JDWP
Time (mean ± σ): 234.647 s ± 2.731 s [User: 5331.145 s, System: 701.760 s]
Range (min … max): 228.510 s … 238.323 s 10 runs
Benchmark 2: with JDWP
Time (mean ± σ): 250.043 s ± 3.587 s [User: 4628.578 s, System: 716.737 s]
Range (min … max): 242.515 s … 254.456 s 10 runs
Benchmark 3: with onjcmd
Time (mean ± σ): 249.689 s ± 1.765 s [User: 4788.539 s, System: 729.207 s]
Range (min … max): 246.324 s … 251.559 s 10 runs
Summary
"without JDWP" ran
1.06 ± 0.01 times faster than "with onjcmd"
1.07 ± 0.02 times faster than "with JDWP"
This clearly shows the significant impact of the change. 11.0.3 came out on Apr 18, 2019, and 11.0.9 on Jul 15, 2020, so the onjcmd feature improved on-demand debugging for over a year.
Want to try this out yourself? Get the binaries from SapMachine and run the benchmarks yourself. This kind of performance archaeology is quite rewarding, giving you insights into critical performance issues.
Conclusion
A few years ago, it was definitely a good idea to add the onjcmd feature to have usable on-demand debugging performance-wise. But nowadays, we can just start the JDWP agent to wait for a connection and connect to it whenever we want to, without any measurable performance penalty (in the Renaissance benchmark).
This shows us that it is always valuable to reevaluate if specific features are worth the maintenance cost. I hope this blog post gave you some insights into the performance of on-demand debugging. See you next week for the next installment in my hello-ebpf series.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.
JFR (JDK Flight Recorder) is the default profiler for OpenJDK (see my other blog posts for more information). What makes JFR stand out from the other profilers is the ability to log many, many different events that contain lots of information, like information on class loading, JIT compilation, and garbage collection. You can see a list of all available events on my JFR Event Collection website:
This website gives an overview of the events, with descriptions from the OpenJDK, their properties, examples, configurations, and the JDK versions in which every event is present. However, few descriptions are available, and the available texts are mostly single sentences.
TL;DR: I used GPT-3.5 to create a description for every event by giving it the part of the OpenJDK source code that creates the event.
For most events, I state the lack of a description, coupled with a request that the knowledgeable reader might contribute one:
But as you can see, there is not really any progress in creating documentation. So, I have three options left:
1. Ask knowledgeable JDK developers to add descriptions: It’s time-consuming, and the descriptions would only be added in the next release.
2. Write the descriptions myself directly for the website: This is pretty time-consuming, with over 150 events per JDK version.
3. Task AI to interpret the code that creates each event in the JDK source code.
With 1. and 2. infeasible, I started working on the AI approach, implementing it in my JFR event collector tool that collects the information displayed on the website.
I tried to use local AI models for this project but failed, so I started using GPT-3.5-turbo, testing it on the OpenAI ChatGPT website. The main structure of my endeavor is as follows:
For every event, I first collect all usages and creations in the OpenJDK source, and then I construct a prompt of the following form that includes the surrounding context of the source locations:
Explain the JFR event <event> concisely so that the reader, proficient in JFR, knows the meaning and relevance of the event to profiling and its fields, without giving code snippets or referencing the code directly, take the following code as the context of its usage and keep it short and structured (and in markdown format, so format it properly to make it readable, using bullet points for field lists, but nothing else): <context>
Now some information about the event: Fields: <fields> It is <flags: experimental, internal>
Don’t mention implementation details, like methods, but explain the meaning of the event and how to use it for profiling. Keep field names lowercase and in backticks. Don’t use headings. Don’t repeat yourself. Keep it short.
For example, I asked ChatGPT the following for the ReservedStackActivation event:
Explain the JFR event ReservedStackActivation […]:
I then pass this prompt to GPT-3.5 and obtain the result:
The ReservedStackActivation event in JFR signifies a potentially dangerous stack overflow in a Java method that has been annotated with ReservedStackAccess. It is used for profiling purposes to identify methods that may be causing stack overflows.
The relevant field for this event is:
method: Represents the Java method where the potential stack overflow occurred.
My prompt, and the code used to find the usage contexts (simple regular expressions without C++ parsing), are quite simplistic, but the results are somewhat usable.
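To give you an idea of how simplistic this is, here is a sketch of such a context extraction (class and method names are made up; my actual implementation differs in the details):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// collects a few lines of context around every mention of a JFR event type
// in the OpenJDK sources, using plain string matching instead of C++ parsing
class ContextFinder {

    static List<String> findUsageContexts(Path jdkSource, String eventName)
            throws IOException {
        try (Stream<Path> files = Files.walk(jdkSource)) {
            return files
                    .filter(p -> p.toString().matches(".*\\.(cpp|hpp|java)"))
                    .flatMap(p -> contextsInFile(p, eventName).stream())
                    .toList();
        }
    }

    static List<String> contextsInFile(Path file, String eventName) {
        List<String> contexts = new ArrayList<>();
        try {
            List<String> lines = Files.readAllLines(file);
            for (int i = 0; i < lines.size(); i++) {
                // HotSpot's C++ event classes are called Event<Name>
                if (lines.get(i).contains("Event" + eventName)) {
                    int from = Math.max(0, i - 25);
                    int to = Math.min(lines.size(), i + 25);
                    contexts.add(String.join("\n", lines.subList(from, to)));
                }
            }
        } catch (IOException e) {
            // skip files with unreadable encodings
        }
        return contexts;
    }
}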
But this event also has a description:
Activation of Reserved Stack Area caused by stack overflow with ReservedStackAccess annotated method in call stack
Why did I choose this event, then? Because it allows you to compare the LLM-generated description with the one written by an OpenJDK developer. Keep in mind that the LLM did not get passed the event description. The generated version is similar, yet wordier.
You can find my implementation on GitHub (GPLv2.0 licensed) and the generated documentation on the JFR Event Collection:
Conclusion
I’m unsure whether I like or dislike the results of this experiment: It’s, on the one hand, great to generate descriptions for events that didn’t have any, using the code as the source of truth. But does it really give new insights, or is it just bloated text? I honestly don’t know whether the website needs it. Therefore, I am currently just generating it for JDK 21 and might remove the feature in the future. The AI can’t replace the insights you get by reading articles on specific events, like Gunnar Morling’s recent post on the NativeMemory events.
Do you have any opinions on this? Feel free to use the usual channels to voice your opinion, and consider improving the JFR documentation if you can.
See you next week with a blog post on something completely different yet slightly related to Panama and the reason for my work behind last week’s From C to Java Code using Panama article. Consider this as my Christmas present to my readers.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to Vedran Lerenc for helping me with the LLM part of this project.
Or: I just released version 0.0.11 with a cool new feature that I can’t wait to tell you about…
According to the recent JetBrains survey, most people use Maven as their build system and build Spring Boot applications with Java. Yet my profiling plugin for IntelliJ only supported profiling pure Java run configurations: configurations where the JVM gets passed the main class to run. This is great for tiny examples where you directly right-click on the main method and profile the whole application using the context menu:
But this is not great when you’re using the Maven build system and usually run your application using the exec goal, or, god forbid, use Spring Boot or Quarkus-related goals. Support for these goals has been requested multiple times, and last week, I got around to implementing it (while also fixing two other bugs). So now you can profile your Spring Boot application, like the Spring pet-clinic, running with spring-boot:run:
Giving you a profile like:
Or your Quarkus application running with quarkus:dev:
Giving you a profile like:
This works by using the JVM-argument options of these goals, which allow the profiler plugin to pass in profiling-specific JVM options. If the plugin doesn’t detect a directly supported plugin, it passes the JVM options via the MAVEN_OPTS environment variable instead. This should work with the exec goals and others.
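Done by hand, this corresponds to something like the following (the profiling flags here are placeholders, not necessarily the exact options the plugin passes):

# goal with direct support: JVM options go through the goal's own property
mvn spring-boot:run -Dspring-boot.run.jvmArguments="-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"

# fallback for other goals: the MAVEN_OPTS environment variable
MAVEN_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints" mvn exec:java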
Gradle script support has also been requested, but despite searching the whole internet late into the night, I didn’t find any way to add JVM options to the JVM that Gradle runs for the Spring Boot or run tasks without modifying the build.gradle file itself (see Baeldung).
Only Quarkus’s quarkusDev task has the proper options so that I can pass the JVM options. So, for now, I only have basic Quarkus support but nothing else. Maybe one of my readers knows how I could still provide profiling support for non-Quarkus projects.
You can configure the options that the plugin uses for specific task prefixes yourself in the .profileconfig.json file:
{
    "additionalGradleTargets": [
        {
            // example for Quarkus
            "targetPrefix": "quarkus",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Gradle task run"
        }
    ],
    "additionalMavenTargets": [
        {
            // example for Quarkus
            "targetPrefix": "quarkus:",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Maven goal run"
        }
    ]
}
This update has been the first one with new features since April. The new features should make life easier for profiling both real-world and toy applications. If you have any other feature requests, feel free to create an issue on GitHub and, ideally, try to create a pull request. I’m happy to help you get started.
See you next week on some topics I have not yet decided on. I have far more ideas than time…
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to the issue reporters and all the other people who tried my plugin.
Java Flight Recorder (JFR) is one of the main open-source profilers for Java, and the only one built directly into the OpenJDK. You can find an introduction to Java profiling in my InfoQ Unleash the Power of Open-Source Profilers article, and additional information and presentations on my Profiling Talks page. Furthermore, I wrote an introduction to custom JFR events: Custom JFR Events: A Short Introduction. JFR and custom events are pretty helpful when profiling applications; this blog post gives you an example from the real world.
I was searching for some JFR-related settings on the internet when I stumbled upon the /jfr command that exists in Minecraft:
This, of course, intrigued me, especially as Minecraft apparently adds some custom JFR events:
So I had to check it out. I downloaded and started the Java server, got a demo account, and connected to my local instance. This works with a demo account when you launch the demo world, enable the cheat mode in the settings, kick yourself via “/kick @p,” and then select your own server. I found this via this bug report.
You then must ensure that you have OP privileges, adding them via the Minecraft server shell if you don’t. Then, you can type /jfr start in the chat (open it by pressing T) to start the recording and /jfr stop to stop it.
You see that it’s my first time “playing” Minecraft, and I’m great at getting attacked. It’s probably also my last time.
Minecraft stores the JFR file in the debug folder in the working directory of your server, both as a JFR file and as a JSON file. You can view the JFR file in a JFR viewer of your choice, like JMC or my IntelliJ JFR plugin (web view of the file, JFR file itself), and explore the custom JFR events:
This lets you get insights into the chunk generation and specific traffic patterns of the Minecraft server.
But what does the event specification look like? We could disassemble the Minecraft JAR and potentially get into legal trouble, or we could just use the jfr utility with its metadata command and get an approximation of the event definition from the JFR metadata:
@Name("minecraft.ChunkGeneration")
@Label("Chunk Generation")
@Category({"Minecraft", "World Generation"})
class ChunkGeneration extends jdk.jfr.Event {
  @Label("Start Time")
  @Timestamp("TICKS")
  long startTime;

  @Label("Duration")
  @Timespan("TICKS")
  long duration;

  @Label("Event Thread")
  @Description("Thread in which event was committed in")
  Thread eventThread;

  @Label("Stack Trace")
  @Description("Stack Trace starting from the method the event was committed in")
  StackTrace stackTrace;

  @Label("First Block X World Position")
  int worldPosX;

  @Label("First Block Z World Position")
  int worldPosZ;

  @Label("Chunk X Position")
  int chunkPosX;

  @Label("Chunk Z Position")
  int chunkPosZ;

  @Label("Status")
  String status;

  @Label("Level")
  String level;
}
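If you want to reproduce this, the jfr tool that ships with current JDKs prints exactly this kind of metadata; the invocation could look like the following (the recording file name is just an example):

jfr metadata debug/server.jfr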
You can find all defined events here. The actual implementation of these events is only slightly larger because some events accumulate data over a period of time.
I’m, of course, not the first OpenJDK developer who stumbled upon these custom events. Erik Gahlin even found them shortly after their addition in 2021 and promptly created an issue to recommend improvements (see MC-236873):
Conclusion
In my previous blog post, I showed you how to create custom JFR events for a small sample application. Seeing custom events in Minecraft shows you that custom events are used in the wild by applications used by millions of users, helping developers improve the performance of their applications.
JDK Flight Recorder (JFR) is one of the two prominent open-source profilers for the OpenJDK (besides async-profiler). It offers many features (see Profiling Talks) and the ability to observe lots of information by recording over one hundred different events. If you want to know more about the existing events, visit my JFR Event Collection website (related blog post):
Besides these built-in events, JFR allows you to implement your own events to record custom information directly in your profiling file.
Let’s start with a small example to motivate this. Consider for a moment that we want to run the next big thing after Software-as-a-Service: Math-as-a-Service, a service that provides customers with the freshest Fibonacci numbers and more.
We develop this service using Javalin:
public static void main(String[] args) throws Exception {
    // create a server with 4 threads in the thread pool
    Javalin.create(conf -> {
        conf.jetty.server(() ->
            new Server(new QueuedThreadPool(4))
        );
    })
    .get("/fib/{fib}", ctx -> {
        handleRequest(ctx, newSessionId());
    })
    .start(7070);
    System.in.read();
}

static void handleRequest(Context ctx, int sessionId) {
    int n = Integer.parseInt(ctx.pathParam("fib"));
    // log the current session and n
    System.out.printf("Handle session %d n = %d\n", sessionId, n);
    // compute and return the n-th fibonacci number
    ctx.result("fibonacci: " + fib(n));
}

public static int fib(int n) {
    if (n <= 1) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
This is a pretty standard tiny web endpoint, minus all the user and session handling. It lets the customer query the n-th Fibonacci number by querying /fib/{n}. Our built-in logging prints n and the session ID on standard out, but what if we want to store it directly in our JFR profile while continuously profiling our application?
This is where custom JFR events come in handy:
public class SessionEvent extends jdk.jfr.Event {
    int sessionId;
    int n;

    public SessionEvent(int sessionId, int n) {
        this.sessionId = sessionId;
        this.n = n;
    }
}
The custom event class extends the jdk.jfr.Event class and simply defines a few fields for the custom data. These fields can be annotated with @Label("Human readable label") and @Description("Longer description") to document them.
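For example, an annotated variant of the event could look like this (the label and description texts are my own):

import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;

@Label("Session")
@Description("Handling of a request for the n-th Fibonacci number")
public class SessionEvent extends Event {

    @Label("Session ID")
    int sessionId;

    @Label("n")
    @Description("Index of the requested Fibonacci number")
    int n;

    public SessionEvent(int sessionId, int n) {
        this.sessionId = sessionId;
        this.n = n;
    }
}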
We can now use this event class to record the relevant data in the handleRequest method:
static void handleRequest(Context ctx, int sessionId) {
    int n = Integer.parseInt(ctx.pathParam("fib"));
    System.out.printf("Handle session %d n = %d\n", sessionId, n);
    // create event
    var event = new SessionEvent(sessionId, n);
    // add start and stacktrace
    event.begin();
    ctx.result("fibonacci: " + fib(n));
    // add end and store
    event.commit();
}
This small addition records the timing and duration of each request, as well as n and the session ID, in the JFR profile. The sample code, including a request generator, can be found on GitHub. After running the server, we can view the recorded events in a JFR viewer, like JDK Mission Control or my JFR viewer (online view):
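Recording itself just requires starting the server with JFR enabled, for example (the JAR name is assumed):

java -XX:StartFlightRecording=filename=recording.jfr -jar target/fibonacci-server.jar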
This was my short introduction to custom JFR events; if you want to learn more, I highly recommend Gunnar Morling’s Monitoring REST APIs with Custom JDK Flight Recorder Events article. Come back next week for a real-world example of custom JFR events.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.
The JDWP agent can delay its initialization until one of the following triggers fires:
you gave orders via jcmd (onjcmd=y option), a feature contributed by the SapMachine team
the program threw a specific exception (onthrow=<exception>)
the program threw an uncaught exception (onuncaught=y)
This is quite useful because the JDWP agent has to do substantial initialization before it can start listening for the attaching debugger:
The triggering event invokes the bulk of the initialization, including creation of threads and monitors, transport setup, and installation of a new event callback which handles the complete set of events.
Other things, like class loading, were slower with an attached debugger in older JDK versions (see JDK-8227269).
But what happens after you end the debugging session? Is your debugged program aborted, and if not, can you reattach your debugger at a later point in time? The answer is as always: It depends. Or, more precisely: It depends on the remote debugger you’re using and how you terminate the debugging session.
But why should you disconnect and then reattach a debugger? It saves you from running the debugger during long, uninteresting stretches of your application’s execution. The overhead of the JDWP agent merely waiting for a connection is minimal compared to the plethora of events sent from the agent to the debugger during a debugging session (like class loading events; see A short primer on Java debugging internals).
Before we cover how to (re)attach a debugger in IDEs, we’ll see how this works on the JDWP/JDI level:
On JVM Level
The JDWP agent does not prevent the debugger from reattaching. There are two ways that debugging sessions can be closed by the debugger: dispose and exit. Disposing of a connection via the JDWP Dispose command is the least intrusive way. This command is exposed to the debugger in JDI via the VirtualMachine#dispose() method:
Invalidates this virtual machine mirror. The communication channel to the target VM is closed, and the target VM prepares to accept another subsequent connection from this debugger or another debugger, including the following tasks:
Any current method invocations executing in the target VM are continued after the disconnection. Upon completion of any such method invocation, the invoking thread continues from the location where it was originally stopped.
Exiting the session via the VirtualMachine#exit(int) method is far more intrusive:
Causes the mirrored VM to terminate with the given error code. All resources associated with this VirtualMachine are freed. If the mirrored VM is remote, the communication channel to it will be closed.
This, of course, prevents the debugger from reattaching.
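In JDI terms, the difference between the two boils down to the following sketch (assuming a JDWP agent listening on localhost:5005; requires the jdk.jdi module):

import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import java.util.Map;

public class DetachExample {
    public static void main(String[] args) throws Exception {
        // find the socket-attaching connector
        AttachingConnector connector = Bootstrap.virtualMachineManager()
                .attachingConnectors().stream()
                .filter(c -> c.name().equals("com.sun.jdi.SocketAttach"))
                .findFirst().orElseThrow();
        // attach to the JDWP agent listening on localhost:5005
        Map<String, Connector.Argument> arguments = connector.defaultArguments();
        arguments.get("hostname").setValue("localhost");
        arguments.get("port").setValue("5005");
        VirtualMachine vm = connector.attach(arguments);
        System.out.println("Attached to " + vm.name());

        vm.dispose();  // detach; the target VM keeps running and accepts new connections
        // vm.exit(1); // would instead terminate the target VM with exit code 1
    }
}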
Reattaching with IDEs
NetBeans, IntelliJ IDEA, and Eclipse all support reattaching after ending a debugging session by just creating a new remote debugging session. Be aware that this only works straightforwardly when using remote debugging, as the local debugging UI is usually directly intertwined with the UI for running the application. I would recommend trying remote debugging once in a while, even when debugging on your local machine, to be able to use all the advanced features.
Terminating an Application with IDEs
NetBeans is the only IDE of the three that does not support this (as far as I can ascertain). IntelliJ IDEA and Eclipse both support it, with Eclipse having the more straightforward UI:
If the terminate button is not active, then you might have to tick the Allow termination of remote VM check-box in the remote configuration settings:
IntelliJ IDEA’s UI is, in this instance, arguably less discoverable: To terminate the application, you have to close the specific debugging session tab explicitly.
This then results in a popup that offers you the ability to terminate:
Conclusion
The ability to disconnect and then reattach debuggers is helpful for many complex debugging scenarios and can help you debug faster. Being able to terminate the application directly from the debugger is an additional time saver when working with remote debugging sessions. Both are often overlooked gems of Java debugging, showing once more how versatile the JDWP agent and UI debuggers are.
I hope you enjoyed this addendum to my Level-up your Java Debugging Skills with on-demand Debugging blog post. If you want even more debugging from me, come to my talk on debugging at JUG Karlsruhe on the 7th of November, to the ConFoo conference in Montreal on the 23rd of February, and hopefully, next year a conference or user group near you.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. It was supported by rainy weather and the subsequent afternoon in a cafe in Bratislava:
Both Hyper-Threading and Loom’s virtual threads try to utilize a computation resource fully, be it a hardware core or a platform thread, by multiplexing multiple tasks onto it, even though many tasks regularly wait for IO operations to complete:
When one task waits, another can be scheduled, improving overall throughput. This works especially well when longer IO operations follow short bursts of computation.
There are, of course, differences between the two, most notably: Hyper-Threading doesn’t need the tasks to cooperate the way Loom does, so a virtual core can’t starve the other virtual cores. Also noteworthy is that the scheduler for Hyper-Threading is implemented in silicon and cannot be configured or even changed, while the virtual thread scheduler can be tailored to one’s needs.
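To make the multiplexing tangible, the following snippet (standard JDK 21 API) runs ten thousand IO-bound tasks, all multiplexed onto a handful of carrier threads:

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // one virtual thread per task; the JVM multiplexes them
        // onto a small pool of platform (carrier) threads
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i -> executor.submit(() -> {
                Thread.sleep(Duration.ofMillis(100)); // simulated IO wait
                return i; // Callable, so the checked InterruptedException is fine
            }));
        } // close() waits for all submitted tasks to finish
    }
}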
I hope you found this small insight helpful in understanding virtual threads and putting them into context. You can find more about these topics in resources like JEP 444 (Virtual Threads) and the “Hyper-Threading Technology Architecture and Microarchitecture” paper.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.
Have you ever wanted to bring your JFR events into context? Adding information on sessions, user IDs, and more can improve your ability to make sense of all the events in your profile. Currently, we can only add context by creating custom JFR events, as I presented in my Profiling Talks:
We can use these custom events (see Custom JFR Events: A Short Introduction and Custom Events in the Blocky World: Using JFR in Minecraft) to store away the information and later relate them to all the other events by using the event’s time, duration, and thread. This works out-of-the-box but has one major problem: Relating events is quite fuzzy, as time stamps are not as accurate (see JFR Timestamps and System.nanoTime), and we do all of this in post-processing.
But couldn’t we just attach some context to every JFR event we’re interested in? Not yet, but Jaroslav Bachorik from DataDog is working on it. Recently, he wrote three blog posts (1, 2, 3). The following is a different take on his idea, showing how to use it in a small file server example.
The main idea of Jaroslav’s approach is to store a context in thread-local memory and attach it to every JFR event as configured. But before I dive into the custom context, I want to show you the example program, which you can find, as always, MIT-licensed on GitHub.
Example
We create a simple file server via Javalin, which allows a user to
Register (URL schema register/{user})
Store data in a file (store/{user}/{file}/{content})
Retrieve file content (load/{user}/{file})
Delete files (delete/{user}/{file})
The URLs are simple to use, and we don’t bother about error handling, user authentication, or large files, as this would complicate our example. I leave it as an exercise for the inclined reader. The following is the most essential part of the application: the server declaration:
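A sketch of this declaration, reconstructed from the URL schemas above (the real example stores actual files and attaches the JFR context; here, a map stands in for the file system to keep the sketch short):

import io.javalin.Javalin;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FileServer {
    public static void main(String[] args) {
        int port = Integer.parseInt(args[0]);
        Map<String, String> files = new ConcurrentHashMap<>(); // "user/file" -> content

        Javalin.create()
                .get("/register/{user}", ctx ->
                        ctx.result("registered " + ctx.pathParam("user")))
                .get("/store/{user}/{file}/{content}", ctx -> {
                    files.put(ctx.pathParam("user") + "/" + ctx.pathParam("file"),
                            ctx.pathParam("content"));
                    ctx.result("stored");
                })
                .get("/load/{user}/{file}", ctx ->
                        ctx.result(files.getOrDefault(
                                ctx.pathParam("user") + "/" + ctx.pathParam("file"), "")))
                .get("/delete/{user}/{file}", ctx -> {
                    files.remove(ctx.pathParam("user") + "/" + ctx.pathParam("file"));
                    ctx.result("deleted");
                })
                .start(port);
    }
}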
This example runs on Jaroslav’s OpenJDK fork (commit 6ea2b4f), so if you want to run it in its complete form, please build the fork and make sure that your PATH and JAVA_HOME environment variables are set accordingly.
You can build the server using mvn package and start it, listening on port 1000, via:
java -jar target/jfr-context-example.jar 1000
You can then use it via your browser or curl:
# start the server
java -XX:StartFlightRecording=filename=flight.jfr,settings=config.jfc \
-jar target/jfr-context-example.jar 1000 &
pid=$!
# register a user
curl http://localhost:1000/register/moe
# store a file
curl http://localhost:1000/store/moe/hello_file/Hello
# load the file
curl http://localhost:1000/load/moe/hello_file
-> Hello
# delete the file
curl http://localhost:1000/delete/moe/hello_file
kill $pid
# this results in the flight.jfr file
To make testing easier, I created the test.sh script, which starts the server, registers a few users and stores, loads, and deletes a few files, creating a JFR file along the way. We're using a custom JFR configuration to enable the IO events without any threshold. This is not recommended for production but is required in our toy example to get any such event:
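The relevant part of such a config.jfc could look roughly like this (standard JFR settings format; the selection of file IO events is my assumption):

<!-- enable file IO events without any threshold -->
<event name="jdk.FileRead">
  <setting name="enabled">true</setting>
  <setting name="stackTrace">true</setting>
  <setting name="threshold">0 ns</setting>
</event>
<event name="jdk.FileWrite">
  <setting name="enabled">true</setting>
  <setting name="stackTrace">true</setting>
  <setting name="threshold">0 ns</setting>
</event>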