Redacting Sensitive Data from Java Flight Recorder Files

I few weeks ago, I showed you how to read and write JFR files programmatically. This week we’re using the covered basic-jfr-professor to create a fully fledged (yet still experimental) JFR and hserr file redaction tool called jfr-redact.

Duke the Java mascot redacting information on a sheet of paper using a black marker

TL;DR: Download jfr-redact from GitHub and redact sensitive information like user names, tokens, and keys from files via:

# Using the JAR directly
java -jar jfr-redact.jar redact recording.jfr

# Redact text files
java -jar jfr-redact.jar redact-text hs_err.log
Continue reading

Implement a new JStall Feature with Me

Or: How I use GitHub Copilot to go from feature to idea

A few weeks back, I introduced you to jstall (Quickly Inspect your Java Application with JStall), a tool that analyses what your JVM is currently doing. This week is the first time I’m bringing you in on the development and letting you peek behind the curtains to see how I go from idea to implemented feature. The feature I’ll implement the jvm-support analysis that checks that the JVM running your application is not outdated.

This is the first time I recorded my development process, so I hope you still liked it.

See you on another day for something command-line parser- or redaction-related.

This blog post is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Quickly Inspect your Java Application with JStall

Welcome to the last blog post of the year. Last week, I discussed the limitations of custom JFR events. This week, I’ll also be covering a profiling-related topic and showcasing a tiny tool called JStall.

I hope I’m not the only one who sometimes wonders: “What is my Java application doing right now?” When you don’t see any output. Yes, you could perform a simple thread dump via jstack, but it is hard to understand which threads are actually consuming CPU and making any sort of progress. This is where my tiny tool called JStall comes in:

JStall is a small command-line tool for one-shot inspection of running JVMs using thread dumps and short, on-demand profiling. The tool essentially takes multiple thread dumps of your application and uses the per-thread cpu-time information to find the most CPU-time-consuming Java threads.

First, download the JStall executable from the GitHub releases page. Let us then start by finding the currently running JVMs:

> ./jstall
Usage: jstall <command> <pid|files> [options]

Available commands:
  status     - Show overall status (deadlocks + most active threads)
  deadlock   - Check for deadlocks
  most-work  - Show threads doing the most work
  flame      - Generate flame graph
  threads    - List all threads

Available JVMs:
  7153 ./jstall
  1223 <unknown>
  8136 ./renaissance-gpl-0.16.0.jar
  6138 org.jetbrains.idea.maven.server.RemoteMavenServer36
  5597 DeadlockDemo
  49294 com.intellij.idea.Main

This provides us with a list of options for the main status command, as well as a list of JVM processes and their corresponding main classes. Let’s start checking for deadlocking:

Continue reading

Don’t use Arrays or other Complex Types in Custom JFR Events

JDK Flight Recorder (JFR) provides support for custom events as a profiler. Around two years ago, I wrote a blog post on this very topic: Custom JFR Events: A Short Introduction. These custom events are beneficial because they enable us to record additional project-specific information alongside the standard JFR events, all in the same file. We can then view and process this information with the JFR tools. You can freely specify these events in Java.

There is only one tiny problem nobody talks about: Array support (and, in more general, the support of complex types).

Take the following event:

class ArrayEvent extends Event {
    
    @Label("String Array")
    String[] stringArray;
    
    @Label("Int Array")
    int[] intArray;
    
    @Label("Long Array")
    long[] longArray;
    
    @Label("Non array field")
    String nonArrayField = "default";
}

What would you expect to happen when we create the event using the following code?

ArrayEvent event = new ArrayEvent();
event.stringArray = new String[]{"one", "two", "three"};
event.intArray = new int[]{1, 2, 3, 4, 5};
event.longArray = new long[]{100L, 200L, 300L};
event.commit();
Continue reading

Profiling with the Cloud Foundry CLI Java plugin

Welcome back to my blog, this time for a blog post on profiling your Java applications in Cloud Foundry and the tool I helped to develop to make it easier.

Cloud Foundry “is an open source, multi-cloud application platform as a service (PaaS) governed by the Cloud Foundry Foundation, a 501(c)(6) organization” (Wikipedia). It allows you to run your workloads easily in the cloud, including your Java applications. You just need to define a manifest.yml, like for example:

---
applications:
- name: sapmachine21
  random-route: true
  path: test.jar
  memory: 512M
  buildpacks: 
  - sap_java_buildpack
  env:
    TARGET_RUNTIME: tomcat
    JBP_CONFIG_COMPONENTS: "jres: ['com.sap.xs.java.buildpack.jdk.SAPMachineJDK']"
    JBP_CONFIG_SAP_MACHINE_JDK : "{ version: 21.+ }"
    JBP_CONFIG_JAVA_OPTS: "[java_opts: '-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints']"

But how would you profile this application? This and more is the topic of this blog post.

I will not discuss why you might want to use Cloud Foundry or how you can deploy your own applications. I assume you came this far in the blog post because you already have basic Cloud Foundry knowledge and want to learn how to profile your applications easily.

The Java Plugin

Cloud Foundry has a cf CLI with a proper plugin system with lots of plugins. A team at SAP, which included Tim Gerrlach, started to develop the Java plugin many years ago at SAP. It’s a plugin offering utilities to gain insights into JVMs running in your Cloud Foundry app.

Continue reading

Java 25’s new CPU-Time Profiler: Removing Redundant Synchronization (4)

The changes I described in this blog post led to segfaults in tests, so I backtracked on them for now. Maybe I made a mistake implementing the changes, or my reasoning in the blog post is incorrect. I don’t know yet.

In the last blog post, I wrote about how to size the request queue properly and proposed the sampler queue’s dynamic sizing. But what I didn’t talk about in this or the previous blog post are two topics; one rather funny and one rather serious:

  1. Is the sampler queue really a queue?
  2. Should the queue implementation use Atomics and acquire-release semantics?

This is what we cover in this short blog post. First, to the rather fun topic:

Is it a Queue?

I always called the primary data structure a queue, but recently, I wondered whether this term is correct. But what is a queue?

Definition: A collection of items in which only the earliest added item may be accessed. Basic operations are add (to the tail) or enqueue and delete (from the head) or dequeue. Delete returns the item removed. Also known as “first-in, first-out” or FIFO.

Dictionary of Algorithms and Data Structures by Paul E. Black

But how does my sampler use the sampler queue?

Continue reading

Java 25’s new CPU-Time Profiler: Queue Sizing (3)

Welcome back to my series on the new CPU-time profiler in Java 25. In the previous blog post, I covered the implementation of the new profiler. In this week’s blog post, I’ll dive deep into the central request queue, focusing on deciding its proper size.

The JfrCPUTimeTraceQueue allows the signal handler to record sample requests that the out-of-thread sampler and the safepoint handler process. So it’s the central data structure of the profiler:

This queue is thread-local and pre-allocated, as it’s used in the signal handler, so the correct sizing is critical:

  • If the size is too small, you’ll lose many samples because the signal handler can’t record sample requests.
  • If you size it too large, you waste lots of memory. A sampling request is 48 bytes, so a queue with 500 elements (currently the default) requires 24kB. This adds up fast if you have more than a few threads.

So, in this blog post, we’re mainly concerned about setting the correct default size and discussing a potential solution to the whole problem.

Continue reading

Java 25’s new CPU-Time Profiler: The Implementation (2)

I developed, together with others, the new CPU-time profiler for Java, which is now included in JDK 25. A few weeks ago, I covered the profiler’s user-facing aspects, including the event types, configuration, and rationale, alongside the foundations of safepoint-based stack walking in JFR (see Taming the Bias: Unbiased Safepoint-Based Stack Walking). If you haven’t read those yet, I recommend starting there. In this week’s blog post, I’ll dive into the implementation of the new CPU-time profiler.

It was a remarkable coincidence that safepoint-based stack walking made it into JDK 25. Thanks to that, I could build on top of it without needing to re-implement:

  • The actual stack walking given a sampling request
  • Integration with the safepoint handler

Of course, I worked on this before, as described in Taming the Bias: Unbiased Safepoint-Based Stack Walking. But Erik’s solution for JDK 25 was much more complete and profited from his decades of experience with JFR. In March 2025, whether the new stack walker would get into JDK 25 was still unclear. So I came up with other ideas (which I’m glad I didn’t need). You can find that early brain-dump in Profiling idea (unsorted from March 2025).

In this post, I’ll focus on the core components of the new profiler, excluding the stack walking and safepoint handler. Hopefully, this won’t be the last article in the series; I’m already researching the next one.

Main Components

There are a few main components of the implementation that come together to form the profiler:

Continue reading

An Experimental Front-End for JFR Queries

Ever wondered how the views of the jfr tool are implemented? There are views like hot-methods which gives the most used methods, or cpu-load-samples that gives you the system load over time that you can directly use on the command line:

> jfr view cpu-load-samples recording.jfr

                                     CPU Load

Time                         JVM User           JVM System           Machine Total
------------------ ------------------ -------------------- -----------------------
14:33:29                        8,25%                0,08%                  29,65%
14:33:30                        8,25%                0,00%                  29,69%
14:33:31                        8,33%                0,08%                  25,42%
14:33:32                        8,25%                0,08%                  27,71%
14:33:33                        8,25%                0,08%                  24,64%
14:33:34                        8,33%                0,00%                  30,67%
...

This is helpful when glancing at JFR files and trying to roughly understand their contents, without loading the files directly into more powerful, but also more resource-hungry, JFR viewers.

In this short blog post, I’ll show you how the views work under the hood using JFR queries and how to use the queries with my new experimental JFR query tool.

I didn’t forget the promised blog post on implementing the new CPU-time profiler in JDK 25; it’ll come soon.

Under the hood, JFR views use a built-in query language to define all views in the view.ini file. The above is, for example, defined as:

[environment.cpu-load-samples]
label = "CPU Load"
table = "SELECT startTime, jvmUser, jvmSystem, machineTotal FROM CPULoad"

With my new query tool (GitHub), we can plot this as:

Continue reading

Java 25’s new CPU-Time Profiler (1)

This is the first part of my series; the other parts are

Back to the blog post:

More than three years in the making, with a concerted effort starting last year, my CPU-time profiler landed in Java with OpenJDK 25. It’s an experimental new profiler/method sampler that helps you find performance issues in your code, having distinct advantages over the current sampler. This is what this week’s and next week’s blog posts are all about. This week, I will cover why we need a new profiler and what information it provides; next week, I’ll cover the technical internals that go beyond what’s written in the JEP. I will quote the JEP 509 quite a lot, thanks to Ron Pressler; it reads like a well-written blog post in and of itself.

Before I show you its details, I want to focus on what the current default method profiler in JFR does:

Continue reading

Taming the Bias: Unbiased* Safepoint-Based Stack Walking in JFR

Two years ago, I still planned to implement a new version of AsyncGetCallTrace in Java. This plan didn’t materialize, but Erik Österlund had the idea to fully walk the stack at safepoints during the discussions. Walking stacks only at safepoints normally would incur a safepoint-bias (see The Inner Workings of Safepoints), but when you record some program state in signal handlers, you can prevent this. I wrote about this idea and its basic implementation in Taming the Bias: Unbiased Safepoint-Based Stack Walking. I’ll revisit this topic in this week’s short blog post because Markus Grönlund took Erik’s idea and started implementing it for the standard JFR method sampler:

Continue reading

Using AI to Create JFR Event Descriptions

JFR (JDK Flight Recorder) is the default profiler for OpenJDK (see my other blog posts for more information). What makes JFR stand out from the other profilers is the ability to log many, many different events that contain lots of information, like information on class loading, JIT compilation, and garbage collection. You can see a list of all available events on my JFR Event Collection website:

This website gives an overview of the events, with descriptions from the OpenJDK, their properties, examples, configurations, and the JDK versions in which every event is present. However, few descriptions are available, and the available texts are mostly single sentences.

TL:DR: I used GPT3.5 to create a description for every event by giving it the part of the OpenJDK source code that creates the event.

For most events, I state the lack of a description, coupled with a request that the knowledgeable reader might contribute one:

But as you can see, there is not really any progress in creating documentation. So, I have two options left:

  1. Ask knowledgeable JDK developers to add descriptions: It’s time-consuming, and it would only be added in the next release
  2. Write the descriptions myself directly for the website: This is pretty time-consuming, with over 150 events per JDK version.
  3. Task AI to interpret the code that creates each event in the JDK source code.

With 1. and 2. infeasible, I started working on the AI approach, implementing it in my JFR event collector tool that collects the information displayed on the website.

I tried to use local AI models for this project but failed, so I started using GPT3.5-turbo and testing it on the OpenAI ChatGPT website. The main structure of my endeavor is as follows:

For every event, I first collect all usages and creations in the OpenJDK source, and then I construct a prompt of the following form that includes the surrounding context of the source locations:

Explain the JFR event <event> concisely so that the reader, proficient in JFR, knows the meaning and relevance of the event to profiling and its fields, without giving code snippets or referencing the code directly, take the following code as the context of its usage and keep it short and structured
(and in markdown format, so format it properly to make it readable, using bullet points for field lists, but nothing else):
<context>

Now some information about the event:
Fields:
<fields>
It is <flags: experimental, internal>

Don’t mention implementation details, like methods, but explain the meaning of the event and how to use it for profiling.
Keep field names lowercase and in backticks.
Don’t use headings.
Don’t repeat yourself.
Keep it short.

For example, I asked ChatGPT the following for the ReservedStackActivation event:

Explain the JFR event ReservedStackActivation […]:

file: src/hotspot/share/runtime/sharedRuntime.cpp

// ... for brevity

frame SharedRuntime::look_for_reserved_stack_annotated_method(JavaThread* current, frame fr) {
  ResourceMark rm(current);
  // ...
   if (found) {
      activation = fr;
      warning("Potentially dangerous stack overflow in "
              "ReservedStackAccess annotated method %s [%d]",
              method->name_and_sig_as_C_string(), count++);
      EventReservedStackActivation event;
      if (event.should_commit()) {
        event.set_method(method);
        event.commit();
      }
    }
  }
  return activation;
}

// ...
The code context is from the JDK repository on GitHub.

I then pass this prompt to GPT-3.5 and obtain the result:

The ReservedStackActivation event in JFR signifies a potentially dangerous stack overflow in a Java method that has been annotated with ReservedStackAccess. It is used for profiling purposes to identify methods that may be causing stack overflows.

The relevant field for this event is:

  • method: Represents the Java method where the potential stack overflow occurred.

My prompt and the code used to find the usage contexts, simple regular expressions without C++ parsing, is quite simplistic, but the results are somewhat usable.

But this event has also a description:

Activation of Reserved Stack Area caused by stack overflow with ReservedStackAccess annotated method in call stack

Why did I choose this event, then? Because it allows you to compare the LLM generated and the OpenJDK developer’s written description. Keep in mind that the LLM did not get passed the event description. The generated version is similar, yet more text.

You can find my implementation on GitHub (GPLv2.0 licensed) and the generated documentation on the JFR Event Collection:

Conclusion

I’m unsure whether I like or dislike the results of this experiment: It’s, on the one hand, great to generate descriptions for events that didn’t have any, using the code as the source of truth. But does it really give new insights, or is it just bloated text? I honestly don’t know whether the website needs it. Therefore, I am currently just generating it for JDK 21 and might remove the feature in the future. The AI can’t replace the insights you get by reading articles on specific events, like Gunnar Morling’s recent post on the NativeMemory events.

Do you have any opinions on this? Feel free to use the usual channels to voice your opinion, and consider improving the JFR documentation if you can.

See you next week with a blog post on something completely different yet slightly related to Panama and the reason for my work behind last week’s From C to Java Code using Panama article. Consider this as my Christmas present to my readers.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to Vedran Lerenc for helping me with the LLM part of this project.

Profiling Maven Projects with my IntelliJ Profiler Plugin

Or: I just released version 0.0.11 with a cool new feature that I can’t wait to tell you about…

According to the recent JetBrains survey, most people use Maven as their build system and build Spring Boot applications with Java. Yet my profiling plugin for IntelliJ only supports profiling pure Java run configuration. Configurations where the JVM gets passed the main class to run. This is great for tiny examples where you directly right-click on the main method and profile the whole application using the context menu:

But this is not great when you’re using the Maven build system and usually run your application using the exec goal, or, god forbid, use Spring Boot or Quarkus-related goals. Support for these goals has been requested multiple times, and last week, I came around to implementing it (while also two other bugs). So now you can profile your Spring Boot, like the Spring pet-clinic, application running with spring-boot:run:

Giving you a profile like:

Or your Quarkus application running with quarkus:dev:

Giving you a profile like:

This works specifically by using the options of these goals, which allows the profiler plugin to pass profiling-specific JVM options. If the plugin doesn’t detect a directly supported plugin, it passes the JVM options via the MAVEN_OPTS environment variable. This should work with the exec goals and others.

Gradle script support has also been requested, but despite searching the whole internet till the night, I didn’t find any way to add JVM options to the JVM that Gradle runs for the Spring Boot or run tasks without modifying the build.gradle file itself (see Baeldung).

I left when it was dark and rode out into the night with my bike. Visiting other lost souls in the pursuit of sweet potato curry.

Only Quarku’s quarkusDev task has the proper options so that I can pass the JVM options. So, for now, I only have basic Quarkus support but nothing else. Maybe one of my readers knows how I could still provide profiling support for non-Quarkus projects.

You can configure the options that the plugin uses for specific task prefixes yourself in the .profileconfig.json file:

{
    "additionalGradleTargets": [
        {
            // example for Quarkus
            "targetPrefix": "quarkus",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Gradle task run"
        }
    ],
    "additionalMavenTargets": [
        {   // example for Quarkus
            "targetPrefix": "quarkus:",
            "optionForVmArgs": "-Djvm.args",
            "description": "Example quarkus config, adding profiling arguments via -Djvm.args option to the Maven goal run"
        }
    ]
}

This update has been the first one with new features since April. The new features should make life easier for profiling both real-world and toy applications. If you have any other feature requests, feel free to create an issue on GitHub and, ideally, try to create a pull request. I’m happy to help you get started.

See you next week on some topics I have not yet decided on. I have far more ideas than time…

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to the issue reporters and all the other people who tried my plugin.

Custom Events in the Blocky World: Using JFR in Minecraft

Java Flight Recorder (JFR) is one of the main open-source profilers for Java, and the only one built directly into the OpenJDK. You can find an introduction to Java profiling in my InfoQ Unleash the Power of Open-Source Profilers article and additional information and presentation on my Profiling Talks page. Furthermore, I wrote an introduction to custom JFR events: Custom JFR Events: A Short Introduction. JFR and custom events are pretty helpful when profiling applications, this blog post gives you an example from the real world.

I was searching for some JFR-related settings on the internet when I stumbled upon the /jfr command that exists in Minecraft:

This, of course, intrigued me, especially as Minecraft apparently adds some custom JFR events:

So I had to check it out. I downloaded and started the Java server, got a demo account, and connected to my local instance. This works with a demo account when you launch the demo world, enable the cheat mode in the settings, kick yourself via “/kick @p,” and then select your own server. I found this via this bug report.

You then must ensure that you have OP privileges and add them, if not via the Minecraft server shell. Then, you can type /jfr start in the chat (launch it by typing T) to start the recording and /jfr stop to stop it.

You see that it’s my first time “playing” Minecraft, and I’m great at getting attacked. It’s probably also my last time.

Minecraft stores the JFR file in the debug folder in the working directory of your server, both as a JFR file and as a JSON file. You can view the JFR file in a JFR viewer of your choice, like JMC or my IntelliJ JFR plugin (web view of the file, JFR file itself), and explore the custom JFR events:

This lets you get insights into the chunk generation and specific traffic patterns of the Minecraft server.

But what does the event specification look like? We could disassemble the Minecraft JAR and potentially get into legal trouble, or we could just use the jfr utility with its metadata command and get an approximation of the event definition from the JFR metadata:

jfr metadata debug/server-2023-11-17-155349.jfr | \
    grep minecraft --after-context=40

The ChunkGeneration event looks as follows:

@Name("minecraft.ChunkGeneration")
@Label("Chunk Generation")
@Category({"Minecraft", "World Generation"})
class ChunkGeneration extends jdk.jfr.Event {
  @Label("Start Time")
  @Timestamp("TICKS")
  long startTime;

  @Label("Duration")
  @Timespan("TICKS")
  long duration;

  @Label("Event Thread")
  @Description("Thread in which event was committed in")
  Thread eventThread;

  @Label("Stack Trace")
  @Description("Stack Trace starting from the method the event was committed in")
  StackTrace stackTrace;

  @Label("First Block X World Position")
  int worldPosX;

  @Label("First Block Z World Position")
  int worldPosZ;

  @Label("Chunk X Position")
  int chunkPosX;

  @Label("Chunk Z Position")
  int chunkPosZ;

  @Label("Status")
  String status;

  @Label("Level")
  String level;
}

You can find all defined events here. The actual implementation of these events is only slightly larger because some events accumulate data over a period of time.

I’m, of course, not the first OpenJDK developer who stumbled upon these custom events. Erik Gahlin even found them shortly after their addition in 2021 and promptly created an issue to recommend improvements (see MC-236873):

Conclusion

In my previous blog post, I showed you how to create custom JFR events for a small sample application. Seeing custom events in Minecraft shows you that custom events are used in the wild by applications used by millions of users, helping developers improve the performance of their applications.

If you’re looking for more information on custom JFR events, I would recommend reading my previous blog post Custom JFR Events: A Short Introduction and Gunnar Morlings Monitoring REST APIs with Custom JDK Flight Recorder Events article. See you in a couple of days with a new blog post on JFR event configuration.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Custom JFR Events: A Short Introduction

JDK Flight Recorder (JFR) is one of the two prominent open-source profilers for the OpenJDK (besides async-profiler). It offers many features (see Profiling Talks) and the ability to observe lots of information by recording over one hundred different events. If you want to know more about the existing events, visit my JFR Event Collection website (related blog post):

Besides these built-in events, JFR allows you to implement your events to record custom information directly in your profiling file.

Let’s start with a small example to motivate this. Consider for a moment that we want to run the next big thing after Software-as-a-Service: Math-as-a-Service, a service that provides customers with the freshest Fibonacci numbers and more.

We develop this service using Javalin:

public static void main(String[] args) throws Exception {
    // create a server with 4 threads in the thread pool                                                                               
    Javalin.create(conf -> {                                                  
            conf.jetty.server(() ->                                           
                new Server(new QueuedThreadPool(4))                           
            );                                                                
            })                                                                
            .get("/fib/{fib}", ctx -> {                                       
                handleRequest(ctx, newSessionId());                           
            })                                                                
            .start(7070);                                                     
    System.in.read();                                                         
}                                                                             
                                                                              
static void handleRequest(Context ctx, int sessionId) {                       
    int n = Integer.parseInt(ctx.pathParam("fib"));
    // log the current session and n                           
    System.out.printf("Handle session %d n = %d\n", sessionId, n);            
    // compute and return the n-th fibonacci number                                                        
    ctx.result("fibonacci: " + fib(n));                                                                                                 
}                                                                             
                                                                              
public static int fib(int n) {                                                
    if (n <= 1) {                                                             
        return n;                                                             
    }                                                                         
    return fib(n - 1) + fib(n - 2);                                           
}                                                                                                                                                     

This is a pretty standard tiny web endpoint, minus all the user and session handling. It lets the customer query the n-th Fibonacci number by querying /fib/{n}. Our built-in logging prints n and the session ID on standard out, but what if we want to store it directly in our JFR profile while continuously profiling our application?

This is where custom JFR events come in handy:

public class SessionEvent extends jdk.jfr.Event {
    int sessionId;
    int n;

    public SessionEvent(int sessionId, int n) {
        this.sessionId = sessionId;
        this.n = n;
    }
}

The custom event class extends the jdk.jfr.Event class and simply define a few fields for the custom data. These fields can be annotated with @Label("Human readable label") and @Description("Longer description") to document them.

We can now use this event class to record the relevant data in the handleRequest method:

static void handleRequest(Context ctx, int sessionId) {            
    int n = Integer.parseInt(ctx.pathParam("fib"));                
    System.out.printf("Handle session %d n = %d\n", sessionId, n);
    // create event 
    var event = new SessionEvent(sessionId, n);
    // add start and stacktrace                   
    event.begin();                                                 
    ctx.result("fibonacci: " + fib(n));
    // add end and store                          
    event.commit();                                                
}                                                                  

This small addition records the timing and duration of each request, as well as n and the session ID in the JFR profile. The sample code, including a request generator, can be found on GitHub. After we ran the server, we can view the recorded events in a JFR viewer, like JDK Mission Control or my JFR viewer (online view):

This was my short introduction to custom JFR events; if you want to learn more, I highly recommend Gunnar Morlings Monitoring REST APIs with Custom JDK Flight Recorder Events article. Come back next week for a real-world example of custom JFR events.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Putting JFR into Context

Have you ever wanted to bring your JFR events into context? Adding information on sessions, user IDs, and more can improve your ability to make sense of all the events in your profile. Currently, we can only add context by creating custom JFR events, as I presented in my Profiling Talks:

We can use these custom events (see Custom JFR Events: A Short Introduction and Custom Events in the Blocky World: Using JFR in Minecraft) to store away the information and later relate them to all the other events by using the event’s time, duration, and thread. This works out-of-the-box but has one major problem: Relating events is quite fuzzy, as time stamps are not as accurate (see JFR Timestamps and System.nanoTime), and we do all of this in post-processing.

But couldn’t we just attach some context to every JFR event we’re interested in? Not yet, but Jaroslav Bachorik from DataDog is working on it. Recently, he wrote three blog posts (1, 2, 3). The following is a different take on his idea, showing how to use it in a small file server example.

The main idea of Jaroslav’s approach is to store a context in thread-local memory and attach it to every JFR event as configured. But before I dive into the custom context, I want to show you the example program, which you can find, as always, MIT-licensed on GitHub.

Example

We create a simple file server via Javalin, which allows a user to

  • Register (URL schema register/{user})
  • Store data in a file (store/{user}/{file}/{content})
  • Retrieve file content (load/{user}/{file})
  • Delete files (delete/{user}/{file})

The URLs are simple to use, and we don’t bother about error handling, user authentication, or large files, as this would complicate our example. I leave it as an exercise for the inclined reader. The following is the most essential part of the application: the server declaration:

FileStorage storage = new FileStorage();                                                               
try (Javalin lin = Javalin.create(conf -> {                                                            
            conf.jetty.server(() ->                                                                    
                    new Server(new QueuedThreadPool(4))                                                
            );                                                                                         
        })                                                                                             
        .exception(Exception.class, (e, ctx) -> {                                                      
            ctx.status(500);                                                                           
            ctx.result("Error: " + e.getMessage());                                                    
            e.printStackTrace();                                                                       
        })                                                                                             
        .get("/register/{user}", ctx -> {                                                              
            String user = ctx.pathParam("user");                                                       
            storage.register(user);                                                                    
            ctx.result("registered");                                                                  
        })                                                                                             
        .get("/store/{user}/{file}/{content}", ctx -> {                                                
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            storage.store(user, file, ctx.pathParam("content"));                                       
            ctx.result("stored");                                                                      
        })                                                                                             
        .get("/load/{user}/{file}", ctx -> {                                                           
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            ctx.result(storage.load(user, file));                                                      
        })                                                                                             
        .get("/delete/{user}/{file}", ctx -> {                                                         
            String user = ctx.pathParam("user");                                                       
            String file = ctx.pathParam("file");                                                       
            storage.delete(user, file);                                                                
            ctx.result("deleted");                                                                     
        })) {                                                                                          
    lin.start(port);                                                                                   
    Thread.sleep(100000000);                                                                           
} catch (InterruptedException ignored) {                                                               
}                                                                                                      

This example runs on Jaroslav’s OpenJDK fork (commit 6ea2b4f), so if you want to run it in its complete form, please build the fork and make sure that you’re PATH and JAVA_HOME environment variables are set accordingly.

You can build the server using mvn package and
start it, listening on the port 1000, via:

java -jar target/jfr-context-example.jar 1000

You can then use it via your browser or curl:

# start the server
java -XX:StartFlightRecording=filename=flight.jfr,settings=config.jfc \
-jar target/jfr-context-example.jar 1000 &
pid=$!

# register a user
curl http://localhost:1000/register/moe

# store a file
curl http://localhost:1000/store/moe/hello_file/Hello

# load the file
curl http://localhost:1000/load/moe/hello_file
-> Hello

# delete the file
curl http://localhost:1000/delete/moe/hello_file

kill $pid

# this results in the flight.jfr file

To make testing easier, I created the test.sh script, which starts the server, registers a few users and stores, loads, and deletes a few files, creating a JFR file along the way. We're using a custom JFR configuration to enable the IO events without any threshold. This is not recommended for production but is required in our toy example to get any such event:

<?xml version="1.0" encoding="UTF-8"?>

<configuration version="2.0" label="Custom" description="Custom config for the example"
  provider="Johannes Bechberger">
    <event name="jdk.FileRead" withContext="true">
        <setting name="enabled">true</setting>
        <setting name="stackTrace">true</setting>
        <setting name="threshold" control="file-threshold">0 ms</setting>
    </event>

    <event name="jdk.FileWrite" withContext="true">
        <setting name="enabled">true</setting>
        <setting name="stackTrace">true</setting>
        <setting name="threshold" control="file-threshold">0 ms</setting>
    </event>
</configuration>

We can use the jfr tool to easily print all the jdk.FileRead events from the created flight.jfr file in JSON format:

jfr print --events jdk.FileRead --json flight.jfr

This prints a list of events like:

{
  "type": "jdk.FileRead", 
  "values": {
    "startTime": "2023-10-18T14:31:56.369071625+02:00", 
    "duration": "PT0.000013042S", 
    "eventThread": {
      "osName": "qtp2119992687-32", 
      ...
    }, 
    "stackTrace": {
      "truncated": false, 
      "frames": [...]
    }, 
    "path": "\/var\/folders\/nd\/b8fyk_lx25b1ndyj4kmb2hk403cmxz\/T\/tmp13266469351066000997\/moe\/test_1", 
    "bytesRead": 8, 
    "endOfFile": false
  }
}

You can find more information on this and other events in my JFR Event Collection:

There are, of course, other events, but in our file server example, we’re only interested in file events for now (this might change as Jaroslav adds more features to his fork).

Now, we can start bringing the events into context.

Adding Custom Context

Before we can add the context, we have to define it, as described in Jaroslav’s blog post. We create a context that stores the current user, action, trace ID, and optional file:

@Name("tracer-context")
@Description("Tracer context type tuple")
public class TracerContextType extends ContextType implements AutoCloseable {

    private static final AtomicLong traceIdCounter = new AtomicLong(0);

    // attributes are defined as plain public fields annotated by at least @Name annotation
    @Name("user")
    @Description("Registered user")
    public String user;

    @Name("action")
    @Description("Action: register, store, load, delete")
    public String action;

    @Name("file")
    @Description("File if passed")
    public String file;

    // currently no primitives allowed here
    @Name("trace")
    public String traceId;

    public TracerContextType(String user, String action, String file) {
        this.user = user;
        this.action = action;
        this.file = file;
        this.traceId = "" + traceIdCounter.incrementAndGet();
        this.set();
    }

    public TracerContextType(String user, String action) {
        this(user, action,"");
    }

    @Override
    public void close() throws Exception {
        unset();
    }
}

A context has to be set and then later unset, which can be cumbersome in the face of exceptions. Implementing the AutoClosable interface solves this by allowing us to wrap code in a try-with-resources statement:

try (var t = new TracerContextType(/* ... */)) {
    // ...
}

All JFR events with enabled context that happen in the body of the statement are associated with the TracerContextType instance. We can use the code of all request handlers in our server with such a construct, e.g.:

.get("/store/{user}/{file}/{content}", ctx -> {                 
    String user = ctx.pathParam("user");                        
    String file = ctx.pathParam("file");                        
    try (var t = new TracerContextType(user, "store", file)) {  
        storage.store(user, file, ctx.pathParam("content"));    
        ctx.result("stored");                                   
    }                                                           
})                                                              

One last thing before we can analyze the annotated events: JFR has to know about your context before the recording starts. We do this by creating a registration class registered as a service.

@AutoService(ContextType.Registration.class)
public class TraceContextTypeRegistration implements ContextType.Registration {

    @Override
    public Stream<Class<? extends ContextType>> types() {
        return Stream.of(TracerContextType.class);
    }
}

We use the auto-service project by Google to automatically create the required build files (read more in this blog post by Pedro Rijo.

Using the Custom Context

After adding the context, we can see it in the jdk.FileRead events:

{
  "type": "jdk.FileRead", 
  "values": {
    "startTime": "2023-10-18T14:31:56.369071625+02:00", 
    "duration": "PT0.000013042S", 
    "eventThread": {
      "osName": "qtp2119992687-32", 
      ...
    }, 
    "stackTrace": {
      "truncated": false, 
      "frames": [...]
    }, 
    "tracer-context_user": "moe", 
    "tracer-context_action": "load", 
    "tracer-context_file": "test_1", 
    "tracer-context_trace": "114", 
    "path": "\/var\/folders\/nd\/b8fyk_lx25b1ndyj4kmb2hk403cmxz\/T\/tmp13266469351066000997\/moe\/test_1", 
    "bytesRead": 8, 
    "endOfFile": false
  }
}

We clearly see the stored context information (tracer-context_*).

Using the jq tool, we can analyze the events, like calculating how many bytes the server has read for each user:

➜ jfr print --events jdk.FileRead --json flight.jfr |
  jq -r '
    .recording.events
    | group_by(.values."tracer-context_user")
    | map({
      user: .[0].values."tracer-context_user",
      bytesRead: (map(.values.bytesRead) | add)
    })
   | map([.user, .bytesRead])
   | ["User", "Bytes Read"]
   , .[]
   | @tsv
 '
User    Bytes Read
        3390101
bob     80
curly   100
frank   100
joe     80
john    90
larry   100
mary    90
moe     80
sally   100
sue     80

The empty user is for all the bytes read unrelated to any specific user (like class files), which is quite helpful.

Conclusion

This small example is just a glimpse of what is possible with JFR contexts. Jaroslav’s prototypical implementation is still limited; it, e.g., doesn’t support contexts at method sampling events, but it is already a significant improvement over the status quo. I’ll be creating follow-up blog posts as the prototype evolves and matures.

Thanks for coming so far, and see you next week for another blog post and maybe at a meet-up or conference (see Talks).

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Taming the Bias: Unbiased Safepoint-Based Stack Walking

Walking only at safepoints has advantages: The main one is that you aren’t walking the stack in a signal handler but synchronously to the executed program. Therefore you can allocate memory, acquire locks and rematerialize virtual thread / Loom frames. The latter is significant because virtual threads are the new Java feature that cannot support using signal-handler-based APIs like AsyncGetCallTrace.

This blog post is based on the ideas of Erik Österlund, and the second one is related to these new ideas. The first one is AsyncGetCallTrace Reworked: Frame by Frame with an Iterative Touch!, which you should read before continuing with this post. For a refresher on safepoints, please read The Inner Workings of Safepoints.

Erik summed up the problems with my previous JEP proposal, and in a way with AsyncGetCallTrace, quite nicely:

Well the current proposal doesn’t have a clear story for
1) Making it safe
2) Working with virtual threads
3) Supporting incremental stack scanning
4) Supporting concurrent stack scanning

He proposed that walking Java threads only at safepoints while obtaining some information in the signal handler might do the trick. So I got to work, implementing an API that does just this.

Idea

The current interaction between a sampler of the profiler and the Java Threads looks like the following:

The sampler thread signals every Java thread using POSIX signals and then obtains the full trace directly in the signal handler while the thread is paused at an arbitrary location. I explored variations of this approach in my post Couldn’t we just Use AsyncGetCallTrace in a Separate Thread?

My new approach, on the contrary, walks the Java thread in a signal handler till we find the first bytecode-backed Java frame, stores this in the thread-local queue, triggers a safepoint, and then walks the full Java stack at these safepoints for all enqueued top-frames. We, therefore, have a two-step process:

Instead of just walking the stack in the signal handler:

The new API exploits a few implementation details of the OpenJDK:

  1. There is a safepoint check at least at the end of every non-inlined method (and sometimes there is not, but this is a bug, see The Inner Workings of Safepoints). OpenJ9 doesn’t have checks at returns, so the whole approach I am proposing doesn’t work for them.
  2. When we are at the return of a non-inlined method, we have enough information to obtain all relevant information of the top inlined and the first non-inlined frame using only the program counter, stack pointer, frame pointer, and bytecode pointer obtained in the signal handler. We focus on the first non-inlined method/frame, as inlined methods don’t have physical frames, and walking them would result in walking using Java internal information, which we explicitly want to avoid.

Proposed API

This API builds upon the API defined in jmethodIDs in Profiling: A Tale of Nightmares and the iterator API defined in AsyncGetCallTrace Reworked: Frame by Frame with an Iterative Touch!

But, in contrast to the other parts of the API, this new safepoint-based part only works when the previously defined conditions hold. This is not the case in OpenJ9, so I propose making the new feature optional. But how do profilers know whether an implementation supports an optional part of the API? By using the ASGST_Capabilities:

// Implementations don't have to implement all methods,
// only the iterator related and those that match 
// their capabilities
enum ASGST_Capabilities {
  ASGST_REGISTER_QUEUE = 1, // everything safepoint queue related
  ASGST_MARK_FRAME     = 2  // frame marking related
};

Profilers can query the capability bit map by calling the int ASGST_Capabilities() and should use the signal handler-based approach whenever the capability bit ASGST_REGISTER_QUEUE is absent. ASGST_MARK_FRAME foreshadows a new feature based on stack watermarks, see JEP 376, which I cover in a follow-up blog post. Calling an unsupported API method is undefined.

Now back to the actual API itself. The main two methods of the proposed API are ASGST_RegisterQueue and ASGST_Enqueue. You typically first register a queue for the current thread using ASGST_RegisterQueue, typically in a ThreadStart JVMTI event handler:

typedef void (*ASGST_Handler)(ASGST_Iterator*,
                              void* queue_arg,
                              void* arg);

// Register a queue to the current thread 
// (or the one passed via env)
// @param fun handler called at a safe point with iterators,
//   the argument for RegisterQueue and the argument 
//   passed via Enqueue
//
// The handler can only call safe point safe methods, 
// which excludes all JVMTI methods, but the handler 
// is not called inside a signal handler, so allocating 
// or obtaining locks is possible
//
// Not signal safe, requires ASGST_REGISTER_QUEUE capability
ASGST_Queue* ASGST_RegisterQueue(JNIEnv* env, int size, 
  int options, ASGST_Handler fun, void* argument);

A queue has a fixed size and has a registered handler, which is called for every queue item in insertion order at every safepoint, after which the queue elements are removed. Be aware that you cannot obtain the top frames using the queue handler and cannot call any JVMTI methods, but also that you aren’t bound to signal safe methods in the handler.

The ASGST_Enqueue method obtains and enqueues the top frame into the passed queue, as well as triggering a thread-local handshake/safepoint:

// Enqueue the processing of the current stack 
// at the end of the queue and return the kind 
// (or error if <= 0)
// you have to deal with the top C and native frames 
// yourself (but there is an option for this)
//
// @param argument argument passed through 
//   to the ASGST_Handler for the queue as the third argument
// @return kind or error, 
//   returns ASGST_ENQUEUE_FULL_QUEUE if queue is full
//   or ASGST_ENQUEUE_NO_QUEUE if queue is null
//
// Signal safe, but has to be called with a queue 
// that belongs to the current thread, or the thread
// has to be stopped during the duration of this call
// Requires ASGST_REGISTER_QUEUE capability
int ASGST_Enqueue(ASGST_Queue* queue, void* ucontext, 
  void* argument);

The passed argument is passed directly to the last parameter of the queue handler. Be aware of handling the case that the queue is full. Typically one falls back onto walking the stack in the signal handler or compressing the queue. The elements of a queue, including the arguments, can be obtained using the ASGST_GetQueueElement method:

// Returns the nth element in the queue (from the front),
// 0 gives you the first/oldest element.
// -1 gives you the youngest element, ..., -size the oldest.
//
// Modification of the returned element are allowed, 
// as long as the queue's size has not been modified 
// between the call to ASGST_GetQueueElement and the 
// modification (e.g. by calling ASGST_ResizeQueue).
//
// Modifiying anything besides the arg field
// is highly discouraged.
//
// @returns null if n is out of bounds
//
// Signal safe
ASGST_QueueElement* ASGST_GetQueueElement(ASGST_Queue* queue, 
  int n);

The critical detail is that modifying the arg field is supported; this allows us to do queue compression: In the signal handler, we obtain the last element in the queue using the ASGST_GetQueueElement method and then get the currently enqueuable element using ASGST_GetEnqueuableElement. We can then check whether both elements are equal and then update the argument, omitting to enqueue the current ucontext.

Another helper method is ASGST_ResizeQueue which can be used to set the queue size:

// Trigger the resizing of the queue at end of the next safepoint
// (or the current if currently processing one)
//
// Signal safe, but has to be called with a queue 
// that belongs to the current thread
// Requires ASGST_REGISTER_QUEUE capability
void ASGST_ResizeQueue(ASGST_Queue* queue, int size);

The current queue size and more can be obtained using ASGST_QueueSizeInfo:

typedef struct {
  jint size; // size of the queue
  jint capacity; // capacity of the queue
  jint attempts; // attempts to enqueue since last safepoint end
} ASGST_QueueSizeInfo;

// Returns the number of elements in the queue, its capacity,
// and the number of attempts since finishing the previous 
// safepoint
//
// Signal safe, but only proper values in queues thread
ASGST_QueueSizeInfo ASGST_GetQueueSizeInfo(ASGST_Queue* queue);

This returns the defined size/capacity, the current number of elements, and the number of enqueue attempts, including unsuccessful ones. This can be used in combination with ASGST_ResizeQueue to dynamically adjust the size of these queues.

One might want to remove a queue from a thread; this can be done using the non-signal safe method ASGST_DeregisterQueue.

Lastly, one might want to be triggered before and after a non-empty queue is processed:

// Handler that is called at a safe point with enqueued samples
// before and after processing
//
// called with the queue, a frame iterator, and the OnQueue 
// argument frame iterator is null if offerIterator at handler 
// registration was false
typedef void (*ASGST_OnQueueSafepointHandler)(ASGST_Queue*, 
                                              ASGST_Iterator*, 
                                              void*);

// Set the handler that is called at a safe point before 
// the elements in the (non-empty) queue are processed.
//
// @param before handler or null to remove the handler
//
// Not signal safe, requires ASGST_REGISTER_QUEUE capability
void ASGST_SetOnQueueProcessingStart(ASGST_Queue* queue, 
  int options, bool offerIterator, 
  ASGST_OnQueueSafepointHandler before, void* arg);

// Set the handler that is called at a safe point after 
// the elements in the (non-empty) queue are processed.
//
// @param after handler or null to remove the handler
//
// Not signal safe, requires ASGST_REGISTER_QUEUE capability
void ASGST_SetOnQueueProcessingEnd(ASGST_Queue* queue,
  int options, bool offerIterator, 
  ASGST_OnQueueSafepointHandler end, void* arg);

This should enable performance optimizations, enabling the profiler to walk the whole stack, e.g., only once per queue processing safepoint.

This is the whole API that can be found in my OpenJDK fork with the profile2.h header. The current implementation is, of course, a prototype; there are, e.g., known inaccuracies with native (C to Java) frames on which I’m currently working.

But how can we use this API? I use the same profiler from the AsyncGetCallTrace Reworked: Frame by Frame with an Iterative Touch! blog post to demonstrate using the new API.

Implementing a Small Profiler

The best thing: The code gets more straightforward and uses locks to handle concurrency. Writing code that runs at safepoints is far easier than code in signal handlers; the new API moves complexity from the profiler into the JVM.

But first, you have to build and use my modified OpenJDK as before. This JDK has been tested on x86 and aarch64. The profiler API implementation is still a prototype and contains known errors, but it works well enough to build a small profiler. Feel free to review the code; I’m open to help, suggestions, or sample programs and tests.

To use this new API, you have to include the profile2.h header file, there might be some linker issues on Mac OS, so add -L$JAVA_HOME/lib/server -ljvm to your compiler options.

Now to the significant changes to the version that walks the stack in the signal handler written for the previous blog post. First, we have to register a queue into every thread; we do this in the ThreadStart JVMTI event handler and store the result in a thread-local queue variable:

thread_local ASGST_Queue* queue;
// ...
void JNICALL
OnThreadStart(jvmtiEnv *jvmti_env,
            JNIEnv* jni_env,
            jthread thread) {
  // the queue is large, but aren't doing any  compression, 
  // so we need it
  queue = ASGST_RegisterQueue(jni_env, 10'000, 0, &asgstHandler, 
    (void*)nullptr);
  // ...
}

We then have to enqueue the last Java frames into the queue in the signal handler:

static void signalHandler(int signo, siginfo_t* siginfo, 
 void* ucontext) {
  totalTraces++;
  // queue has not been initialized
  if (queue == nullptr) {
    failedTraces++;
    return;
  }
  int res = ASGST_Enqueue(queue, ucontext, (void*)nullptr);
  if (res != 1) { // not Java trace
    failedTraces++;
    if (res == ASGST_ENQUEUE_FULL_QUEUE) {
      // we could do some compression here
      // but not in this example
      queueFullTraces++;
    }
  }
}

We record the total traces, the failed traces, and the number of times the queue had been full. The enqueued frames are processed using the asgstHandler method at every safepoint. This method obtains the current trace and stores it directly in the flame graph, acquiring the lock to prevent data races:

// we can acquire locks during safepoints
std::mutex nodeLock;
Node node{"main"};

void asgstHandler(ASGST_Iterator* iterator, void* queueArg, 
 void* arg) {
  std::vector<std::string> names;
  ASGST_Frame frame;
  int count;
  for (count = 0; ASGST_NextFrame(iterator, &frame) == 1 &&
         count < MAX_DEPTH; count++) {
    names.push_back(methodToString(frame.method));
  }
  // lets use locks to deal with the concurrency
  std::lock_guard<std::mutex> lock{nodeLock};
  node.addTrace(names);
}

That’s all. I might write a blog post on compression in the future, as the queues tend to fill up in wall-clock mode for threads that wait in native.

You can find the complete code on GitHub; feel free to ask any yet unanswered questions. To use the profiler, just run it from the command line as before:

java -agentpath:libSmallProfiler.so=output=flames.html \
  -cp samples math.MathParser

This assumes that you use the modified OpenJDK. MathParser is a demo program that generates and evaluates simple mathematical expressions. The resulting flame graph should look something like this:

Conclusion

The new API can be used to write profilers easier and walk stacks in a safe yet flexible manner. A prototypical implementation of the API showed accuracy comparable to AsyncGetCallTrace when we ignore the native frames. Using the queues offers ample opportunities for profile compression and incremental stack walking, only walking the new stacks for every queue element.

I want to come back to the quote from Erik that I wrote in the beginning, answering his concerns one by one:

Well the current proposal doesn’t have a clear story for
1) Making it safe
2) Working with virtual threads
3) Supporting incremental stack scanning
4) Supporting concurrent stack scanning

  1. Walking at Java frames at safepoints out of signal handlers makes the stack walking safer, and using improved method ids helps with the post-processing.
  2. Walking only at safepoints should make walking virtual threads possible; it is yet to be decided how to expose virtual threads in the API. But the current API is flexible enough to accommodate it.
  3. and 4. Stack watermarks allow profilers to implement incremental and concurrent stack walking, which should improve performance and offer the ability to compress stack traces—more on this in a future blog post.

Thank you for joining me on my API journey; I’m open to any suggestions; please reach me using the typical channels.

Just keep in mind:

This project is part of my work in the SapMachine team at SAP, making profiling easier for everyone. Thanks to Erik Österlund for the basic idea, and to Jaroslav Bachorik for all the feedback and help on the JEP.

jmethodIDs in Profiling: A Tale of Nightmares

jmethodIDs identify methods in many low-level C++ JVM API methods (JVMTI). These ids are used in debugging related methods like SetBreakpoint(jvmtiEnv*,jmethodID,jlocation) and, of course, in the two main profiling APIs in the OpenJDK, GetStackTrace, and AsyncGetCallTrace (ASGCT):

JVMTI has multiple helper methods to get the methods name, signature, declaring class, modifiers, and more for a given jmethodID. Using these IDs is, therefore, an essential part of developing profilers but also a source of sorrow:

Honestly, I don’t see a way to use jmethodID safely.

Jaroslav Bachorik, profiler developer

In this blog post, I will tell you about the problems of jmethodID that keep profiler writers awake at night and how I intend to remedy the situation for profiler writers in JEP 435.

Background

But first: What are jmethodIDs, and how are they implemented?

[A jmethodID] identifies a Java programming language method, initializer, or constructor. jmethodIDs returned by JVMTI functions and events may be safely stored. However, if the class is unloaded, they become invalid and must not be used.

JVMTI SPECIFICATION

In OpenJDK, they are defined as pointers to an anonymous struct (source). Every Java method is backed by an object of the Method class in the JDK. jmethodIDs are actually just pointing to a pointer that points to the related method object (source):

This indirection creates versatility: The jmethodID stays the same when methods are redefined (see Instrumenting Java Code to Find and Handle Unused Classes for an example of a Java agent which redefines classes).

This is not true for jclass, the jmethodID pendant for classes that points directly to a class object:

The jclass becomes invalid if the class is redefined.

jmethodIDs are allocated on demand because they can stay with the JVM till the defining class is unloaded. The indirections for all ids are stored in the jmethodID cache of the related class (source). This cache has a lock to guard its parallel access from different threads, and the cache is dynamically sized (similar to the ArrayList implementation) to conserve memory.

OpenJ9 also uses an indirection (source), but my understanding of the code base is too limited to make any further claims, so the rest of the blog post is focused on OpenJDK. Now over to the problems for profiler writers:

Problems

The fact that jmethodIDs are dynamically allocated in resizable caches causes major issues: Common profilers, like async-profiler, use AsyncGetCallTrace, as stated in the beginning. ASGCT is used inside signal handlers where obtaining a lock is unsupported. So the profiler has to ensure that every method that might appear in a trace (essentially every method) has an allocated jmethodID before the profiling starts. This leads to significant performance issues when attaching profilers to a running JVM. This is especially problematic in OpenJDK 8:

[…] the quadratic complexity of creating new jmethodIDs during class loading: for every added jmethodID, HotSpot runs a linear scan through the whole list of previously added jmethodIDs trying to find an empty slot, when there are usually none. In extreme cases, it took hours (!) to attach async-profiler to a running JVM that had hundreds thousands classes: https://github.com/async-profiler/async-profiler/issues/221

Andrei Pangin, developer of Async-Profiler

A jmethodID becomes invalid when its defining class is unloaded. Still, there is no way for a profiler to know when a jmethodID becomes invalid or even get notified when a class is unloaded. So processing a newly observed jmethodID and obtaining the name, signature, modifiers, and related class, should be done directly after obtaining the id. But this is impossible as all accessor methods allocate memory and thereby cannot be used in signal handlers directly after AsyncGetCallTrace invocations.

As far as I know, methods can be unloaded concurrently to
the native code executing JVMTI functions. This introduces a potential race
condition where the JVM unloads the methods during the check->use flow,
making it only a partial solution. To complicate matters further, no method
exists to confirm whether a jmethodID is valid.

Theoretically, we could monitor the CompiledMethodUnload event to track
the validity state, creating a constantly expanding set of unloaded
jmethodID values or a bloom filter, if one does not care about few
potential false positives. This strategy, however, doesn’t address the
potential race condition, and it could even exacerbate it due to possible
event delays. This delay might mistakenly validate a jmethodID value that
has already been unloaded, but for which the event hasn’t been delivered
yet.

Honestly, I don’t see a way to use jmethodID safely unless the code using
it suspends the entire JVM and doesn’t resume until it’s finished with that
jmethodID. Any other approach might lead to JVM crashes, as we’ve
observed with J9.

Jaroslav Bachorik ON ThE OpenJDK MailingList

(Concurrent) class unloading, therefore, makes using all profiling APIs inherently unsafe.

jclass ids suffer from the same problems, but ses, we could just process all jmethodIDs and jclass ids, whenever a class is loaded and store all information on all classes, but this would result in a severe performance penalty, as only a subset of all methods actually appears in the observed traces. This approach feels more like a hack.

While jmethodIDs are pretty helpful for other applications like writing debuggers, they are unsuitable for profilers. As I’m currently in the process of developing a new profiling API, I started looking into replacements for jmethodIDs that solve all the problems mentioned before:

Solution

My solution to all these problems is ASGST_Method and ASGST_Class, replacements for jmethodID and jclass, with signal-safe helper methods and a proper notification mechanism for class, unloads, and redefinitions.

The level of indirection that jmethodID offers is excellent, but directly mapping ASGST_Method to method objects removes the problematic dynamic jmethodID allocations. The main disadvantage is that class redefinitions cause a method to have a new ASGST_Method id and a new ASGST_Class id. We solve this the same way JFR solves it:

We use a class local id (idnum) for every method and a JVM internal class idnum, which are both redefinition invariant. The combination of class and method idnum (cmId) is then a unique id for a method. The problem with this approach is that mapping a cmId to an ASGST_Method or a method object is prohibitively expensive as it requires the JVM to check all methods of all classes. Yet this is not a problem in the narrow space of profiling, as a self-maintained mapping from a cmId to collected method information is enough.

The primary method for getting the method information, like name and signature, is ASGST_GetMethodInfo in my proposal:

// Method info
// You have to preallocate the strings yourself 
// and store the lengths in the appropriate fields, 
// the lengths are set to the respective
// string lengths by the VM, 
// be aware that strings are null-terminated
typedef struct {
  ASGST_Class klass;
  char* method_name;
  jint method_name_length;
  char* signature;
  jint signature_length;
  char* generic_signature;
  jint generic_signature_length;
  jint modifiers;
  jint idnum; // class local id, doesn't change with redefinitions
  jlong class_idnum; // class id that doesn't change
} ASGST_MethodInfo;

// Obtain the method information for a given ASGST_Method and 
// store it in the pre-allocated info struct.
// It stores the actual length in the *_len fields and 
// a null-terminated string in the string fields.
// A field is set to null if the information is not available.
//
// Signal safe
void ASGST_GetMethodInfo(ASGST_Method method,
                         ASGST_MethodInfo* info);

jint ASGST_GetMethodIdNum(ASGST_Method method);

The similar ASGST_Class related is ASGST_GetClassInfo:

// Class info, like the method info
typedef struct {
  char* class_name;
  jint class_name_length;
  char* generic_class_name;
  jint generic_class_name_length;
  jint modifiers;
  jlong idnum; // id, doesn't change with redefinitions
} ASGST_ClassInfo;

// Similar to GetMethodInfo
//
// Signal safe
void ASGST_GetClassInfo(ASGST_Class klass,
                        ASGST_ClassInfo* info);

jlong ASGST_GetClassIdNum(ASGST_Class klass);

Both methods return a subset of the information available through JVMTI methods. The only information missing that is required for profilers is the mapping from method byte-code index to line number:

typedef struct {
  jint start_bci;
  jint line_number;
} ASGST_MethodLineNumberEntry;

// Populates the method line number table, 
// mapping BCI to line number.
// Returns the number of written elements
//
// Signal safe
int ASGST_GetMethodLineNumberTable(ASGST_Method method, 
  ASGST_MethodLineNumberEntry* entries, int length); 

All the above methods are signal safe so the profiler can process the methods directly. Nonetheless, I propose conversion methods so that the profiler writer can use jmethodIDs and jclass ids whenever needed, albeit with the safety problems mentioned above:

jmethodID ASGST_MethodToJMethodID(ASGST_Method method);

ASGST_Method ASGST_JMethodIDToMethod(jmethodID methodID);

jclass ASGST_ClassToJClass(ASGST_Class klass);

ASGST_Class ASGST_JClassToClass(jclass klass);

The last part of my proposal deals with invalid class and method ids: I propose a call-back for class unloads, and redefinitions, which is called shortly before the class and the method ids become invalid. In this handler, the profiler can execute its own code, but no JVMTI methods and only the ASGST_* methods that are signal-safe.

Remember that the handler can be executed concurrently, as classes can be unloaded concurrently. Class unload handlers must have the following signature:

void ASGST_ClassUnloadHandler(ASGST_Class klass, 
  ASGST_Method *methods, int count, bool redefined, void* arg);

These handlers can be registered and deregistered:

// Register a handler to be called when class is unloaded
//
// not signal and safe point safe
void ASGST_RegisterClassUnloadHandler(
  ASGST_ClassUnloadHandler handler, void* arg);

// Deregister a handler to be called when a class is unloaded
// @returns true if handler was present
//
// not signal and safe point safe
bool ASGST_DeregisterClassUnloadHandler(
  ASGST_ClassUnloadHandler handler, void* arg);

The arg parameter is passed directly to the handler as context information. This is due to the non-existence of proper closures or lambdas in C.

You might wonder we my API would allow multiple handlers. This is because a JVM should support multiple profilers at once.

Conclusion

jmethodIDs are unusable for profiling and cause countless errors, as every profiler will tell you. In this blog post, I offered a solution I want to integrate into the new OpenJDK profiling API (JEP 435). My proposal provides the safety that profiler writers crave. If you have any opinions on this proposal, please let me know. You can find a draft implementation can be found on GitHub.

See you next week with a blog post on safe points and profiling.

This project is part of my work in the SapMachine team at SAP, making profiling easier for everyone. Thanks to Martin Dörr, Andrei Pangin, and especially Jaroslav Bachorik for their invaluable input on my proposal and jmethodIDs.

Mastering the Art of Controlling the JIT: Unlocking Reproducible Profiler Tests

In my last blog post, I hinted Using Async-Profiler and Jattach Programmatically with AP-Loader, that I’m currently working on a test library for writing better profiling API tests. The library is still work-in-progress, but it already allows you to write profiling API tests in plain Java:

private int innerASGCT2() {
    new Tracer().runASGCT().assertTrue(
        Frame.hasMethod(0, "innerASGCT2", "()I"), 
        Frame.hasMethod(1, "testRunASGCT2"));
    return 0;
}

@Test
public void testRunASGCT2() {
    innerASGCT2();
}

This test case checks that calling AsyncGetCallTrace gives the correct result in this specific example. The test library allows you to write tests comparing the returns of multiple GetStackTrace, AsyncGetCallTrace, and AsyncGetStackTrace invocations in different modes and settings. The library can be found as trace-tester on GitHub; I aim to bring it into the OpenJDK later with my JEP.

Writing small test cases this way is great, but it would be even better if we could force specific methods to be compiled, interpreted, or inlined so that we can test different scenarios. The proposed AsyncGetStackTrace will return the compilation level directly for every frame, so it is necessary to check the correctness of the level too.

Consider reading my Validating Java Profiling APIs post to get a different angle on profiling API testing.

Introduction

Before I start with discussing the ways you can force methods to be compiled, interpreted, or inlined, I’ll have to clarify that:

  1. The following only works with the HotSpot tired JIT compiler and not other JVM’s like OpenJ9 (see issue #11272)
  2. It should only be used for testing. I would refrain from using it anywhere near production, even if you know that specific methods should be compiled. Use a tool like JITWatch by Chris Newland to check whether the JVM doesn’t make the correct decisions automatically: Ask your fellow JVM expert how to deal with this.
  3. I’m not an expert in the APIs I’m showing you, nor in tiered compilation, so be aware that I might be missing something, but I’m happy for any suggestions and corrections.
  4. There are four different compilation levels, but I’m subsuming all C1 variants under the C1 label because some of my used techniques only work on the C1/C2/inlined level. You can read more on tiered compilation in articles like Tiered Compilation in JVM on Baeldung.

Now that I finished the obligatory disclaimer: What are the stages in the life of a method with a tiered JIT?

The first time the JVM executes a method, the method’s byte code is interpreted without compilation. This allows the JVM to gather information on the method, as C1 and C2 are profile guided.

The method is then compiled when the JVM deems this to be beneficial, usually after the method has been executed a few times. The next call of the method will then use the compiled version. The method is initially compiled with different levels of the C1 compiler before finally being C2 compiled, which takes the longest but produces the best native instructions.

The JVM might decide at any point to use the interpreted version of a method by deoptimizing it. The compiled versions are kept, depending on the compiler and the reasons for the deoptimization.

Every compiler can decide to inline called methods of a currently compiled method. A compiler uses the initial byte code for this purpose.

What we want and what we get

The ideal would be to tell the JVM to just use a method in its compiled version, e.g.:

But this is not possible, as the JVM does not have any information it needs for compilation before the first execution of a method. We, therefore, have first to execute the method (or the benchmark) and then set the compilation level:

How do we get it?

We can split the task of forcing a method to be compiled (or inlined, for that matter) into two parts:

  1. Force all methods into their respective state (→ WhiteBox API) after the initial execution.
  2. Force the JIT to never compile a method with a different compiler (→ Compiler Control)

The following is the modified state diagram when forcing a method to be C1 compiled:

In the following, I’ll discuss how to use both the WhiteBox API and Compiler Control to facilitate the wanted behavior.

WhiteBox API

Many JVM tests are written in the JTreg framework, allowing developers to write these tests in Java. But these tests often require specific functionality not regularly available to Java developers. This functionality is exported in the WhiteBox API:

One of the not so well-known tools of the HotSpot VM is its WhiteBox testing API. Introduced in Java 7 it has been significantly improved and extended in Java 8 and 9. It can be used to query or change HotSpot internals which are not otherwise exposed to Java-land. While its features make it an indispensable tool for writing good HotSpot regression tests, it can also be used for experiments or for the mere fun of peeking into the VM. This entry will focus on the usage of the WhiteBox API in Java 8 and 9.

The WhiteBox API is implemented as a Java class (called sun.hotspot.WhiteBox) which defines various entry points into the HotSpot VM. Most of the functionality is implemented natively, directly in the HotSpot VM. The API is implemented as a singleton which can be easily retrieved by calling the static method WhiteBox.getWhiteBox().

Unfortunately, currently even a simple JavaDoc documentation of the API doesn’t exist, so in order to make full use of its functionality, you’ll have to peek right into WhiteBox.java.

The WhiteBox testing API

This API can be used outside of JTreg tests after enabling it by passing -Xbootclasspath/a:wb.jar -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI as JVM arguments. To use it, you have to build the WhiteBox JAR from scratch for your specific JVM by calling make build-test-lib (after you set up the build via the configure script).

But please be aware that using this API outside of JVM tests is relatively rare, and the documentation is still non-existent, so using it entails reading a lot of JDK sources and experimentation.

The build target did not work in JDK 21, and when I fixed it, the first question in the PR was by Daniel Jelinski, who asked:

That’s interesting. How did you find this? Is the result of this target used anywhere?
As far as I could tell, the build-test-lib target itself is not used anywhere. The classes that fail to compile here are used by tests without any problems – each test specifies the necessary imports individually. Should we remove this make target instead?

8307732: build-test-lib is broken #13885

So it would be best if you certainly did not depend on it.

The WhiteBox API consists of the singleton class jdk.test.whitebox.WhiteBox which offers many methods: From GC related methods like boolean isObjectInOldGen(Object o) and void fullGC() to NMT-related methods like long NMTMalloc(long size) and JIT-related methods like void deoptimizeAll().

You can even use it to force the compilation of a method and to set JVM flags, as shown in this example by Jean-Philippe Bempel:

public class WhiteBoxTest {
    static WhiteBox wb = WhiteBox.getWhiteBox();

    private void m() {
        System.out.println("foo");
    }

    public static void main(String[] args) throws Exception {
        wb.setBooleanVMFlag("PrintCompilation", true);
        wb.setBooleanVMFlag("BackgroundCompilation", false);
        wb.enqueueMethodForCompilation(
          WhiteBoxTest.class.getDeclaredMethod("m", null), 4);
    }
}

This is from his blog post WhiteBox API, the only blog post I could find on this topic.

Back to our goal of forcing the compilation of a method. It is a good idea to reset the state of a method and deoptimize it to start from a blank slate:

// obtain a method reference
Executable m = X.class.getDeclaredMethod("m", null);
// obtain a WhiteBox instance
WhiteBox wb = WhiteBox.getWhiteBox();
// deooptimize the method
wb.deoptimizeMethod(m);
// clear its state, found by experimentation to be neccessary
wb.clearMethodState(m);

We can then either leave the method uncompiled (for compilation level 0) or enqueue for compilation:

// level 1 - 3: C1, level 4: C2
wb.enqueueMethodForCompilation(m, level);

But be aware that it takes some time to actually compile the method, so it’s best to wait till it is compiled:

while (wb.getMethodCompilationLevel(m) != level) {
    Thread.sleep(1);
}

We can then also force a method to be never inlined:

wb.testSetDontInlineMethod(m, true);
wb.testSetForceInlineMethod(m, false);

Or inversely to be always inlined:

wb.testSetDontInlineMethod(m, false);
wb.testSetForceInlineMethod(m, true);

I implemented this in the WhiteBoxUtil class in my trace-tester library. This allows us to force all methods in their respective states. But the JVM can still decide to optimize further or inline a method, even when specifying the contrary. So we have to force the JVM using the second the Compiler Control specifications.

Compiler Control

This control mechanism has been introduced in Java 9 with JEP 165 by Nils Eliasson:

Summary

This JEP proposes an improved way to control the JVM compilers. It enables runtime manageable, method dependent compiler flags. (Immutable for the duration of a compilation.)

Goals

  • Fine-grained and method-context dependent control of the JVM compilers (C1 and C2)
  • The ability to change the JVM compiler control options in run time
  • No performance degradation

Motivation

Method-context dependent control of the compilation process is a powerful tool for writing small contained JVM compiler tests that can be run without restarting the entire JVM. It is also very useful for creating workarounds for bugs in the JVM compilers. A good encapsulation of the compiler options is also good hygiene.

JEP 165

This mechanism is properly standardized for the OpenJDK, unlike the WhiteBox APi. The compiler control allows to specify compilation settings by defining them in a JSON file and applying them:

  • Using jcmd (see JEP): jcmd <pid> Compiler.add_directives <file>
  • Passing it via JVM arguments: -XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=<file>
  • Using the WhiteBox API: int addCompilerDirective(String compDirect)

The following directives specify as an example that the method m should not be C2 compiled and not be inlined:

[
  {
    // can also contain patterns
    "match": ["X::m()"],
    // "-" prefixes not inlined, "+" inlined methods
    "inline": ["-X::m()"],
    "C1": {},
    "C2": {
      "Exclude": true
    }
  }
  // multiple directives supported
  // first directives have priority
]

This, in theory, allows the method to be deoptimized, but this did not happen during my testing. With forced compilation, one can assume that this method will almost be used in its compiled form.

I recommend this Compiler Control guide for a more in-depth guide with all options. An implementation of the control file generation with a fluent API can be found in the trace-tester project in the CompilerDirectives class. Feel free to adapt this for your own projects.

Conclusion

I’ve shown you in this article how to control the JIT to specify the inlining and compilation of methods using two lesser-known JVM APIs. This allows us to write reproducible profiling APIs and makes it easier to check how a profiling API reacts to different scenarios.

If you have any suggestions, feel free to reach out. I look forward to preparing slides for my upcoming talks in Milan, Munich, Arnhem, and Karlsruhe. Feel free to come to my talks; more information soon on Twitter.

This project is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Using Async-Profiler and Jattach Programmatically with AP-Loader

Using async-profiler and jattach can be quite a hassle. First, you have to download the proper archive from GitHub for your OS and architecture; then, you have to unpack it and place it somewhere. It gets worse if you want to embed it into your library, agent, or application: Library developers cannot just use maven dependency but have to create wrapper code and build scripts that deal with packaging the binaries themselves, or worse, they depend on a preinstalled version which they do not control.

In November 2022, I started the ap-loader project to remedy this situation: I wrapped async-profiler and jattach in a platform-independent JAR which can be pulled from maven central. I already wrote a blog post on its essential features: AP-Loader: A new way to use and embed async-profiler.

Flamegraph for a recording of profiling data for the dacapo benchmark suite

In this blog post, I’m focusing on its programmatic usage: Async-profiler can be used in a library to gather profiling data of the current or a different process, but the profiler distribution contains more: It contains converters to convert from JFR to flamegraphs, and jattach to attach a native agent dynamically to (potentially the current) JVM and send commands to it.

This blog post does assume that you’re familiar with the basic usage of async-profiler. If you are not, consider reading the async-profiler README or the Async-profiler – manual by use cases by Krzysztof Ślusarski.

The ap-loader library allows you to depend on a specific version of async-profiler using gradle or maven:

<dependency>
    <groupId>me.bechberger</groupId>
    <artifactId>ap-loader-all</artifactId>
    <version>2.9-5</version>
</dependency>

There are multiple maven artifacts: ap-loader-all which contains the native libraries for all platforms for which async-profiler has pre-built libraries and artifacts that only support a single platform like ap-loader-macos. I recommend using the ap-loader-all if you don’t know what you’re doing, the current release is still tiny, with 825KB.

The version number consists of the async-profiler version and the version (here 2.9) of the ap-loader support libraries (here 5). I’m typically only publishing the newest ap-loader version for the latest async-profiler. The changes in ap-loader are relatively minimal, and I keep the API stable between versions.

The ap-loader library consists of multiple parts:

  • AsyncProfilerLoader class: Wraps async-profiler and jattach, adding a few helper methods
  • converter package: Contains all classes from the async-profiler converter JAR and helps to convert between multiple formats
  • AsyncProfiler class: API for async-profiler itself, wrapping the native library.

All but the AsyncProfilerLoader class is just copied from the underlying async-profiler release. ap-loader contains all Java classes from async-profiler, but I omit the helper classes here for brevity.

AsyncProfilerLoader

This is the main entry point to ap-loader; it lives in the one.profiler package like the AsyncProfiler class. Probably the most essential method is load:

Load

The load method loads the included async-profiler library for the current platform:

AsyncProfiler profiler = AsyncProfilerLoader.load();

It returns the instantiated API wrapper class. The method throws an IllegalStateException if the present ap-loader dependencies do not support the platform and an IOException if loading the library resulted in other problems.

Newer versions of the AsyncProfiler API contain the AsyncProfiler#getInstance() method, which can also load an included library. The main difference is that you have to include the native library for all the different platforms, replicating all the work of the ap-loader build system every time you update async-profiler.

Dealing with multiple platforms is hard, and throwing an exception when not supporting a platform might be inconvenient for your use case. AsyncProfilerLoader has the loadOrNull method which returns null instead and also the isSupported to check whether the current combination of OS and CPU is supported. A typical use case could be:

if (AsyncProfilerLoader.isSupported()) {
  AsyncProfilerLoader.load().start(...);
} else {
  // use JFR or other fall-backs
}

This might still throw IOExceptions, but they should never happen in normal circumstances and are probably by problems that should be investigated, being either an error in ap-loader or in your application.

If you want to merely get the path to the extracted libAsyncProfiler, then use the getAsyncProfilerPath method which throws the same exceptions as the load method. A similar method exists for jattach (getJattachPath).

Execute Profiler

The async-profiler project contains the profiler.sh script (will be replaced by asprof starting with async-profiler 2.10):

To run the agent and pass commands to it, the helper script profiler.sh is provided. A typical workflow would be to launch your Java application, attach the agent and start profiling, exercise your performance scenario, and then stop profiling. The agent’s output, including the profiling results, will be displayed in the Java application’s standard output.

Async-Profiler documentation

This helper script is also included in ap-loader and allows you to use the script on the command-line via java -jar ap-loader profiler ..., the API exposes this functionality via ExecutionResult executeProfiler(String... args).

AsyncProfilerLoader.executeProfiler("-e", "wall", "8983")
// is equivalent to
./profiler.sh -e wall -t -i 5ms -f result.html 8983

The executeProfiler method throws an IllegalStateException if the current platform is not supported. The returned instance of ExecutionResult contains the standard and error output:

public static class ExecutionResult {
  private final String stdout;
  private final String stderr;
    // getter and constructor
    ...
}

executeProfiler throws an IOException if the profiler execution failed.

Execute Converter

You cannot only use the converter by using the classes from the one.profiler.converter, but you can also execute the converter by calling ExecutionResult executeProfiler(String... args), e.g., the following:

AsyncProfilerLoader.executeConverter(
  "jfr2flame", "<input.jfr>", "<output.html>")
// is equivalent to
java -cp converter.jar \
  jfr2flame <input.jfr> <output.html>

The executeConverter returns the output of the conversion tool on success and throws an IOException on error, as before.

JAttach

There are multiple ways to use the embedded jattach besides using the binary returned by getJattachPath: ExecutionResult executeJattach(String... args) and boolean jattach(Path agentPath[, String arguments]).

executeJattach works similar to executeProfiler, e.g.:

AsyncProfilerLoader.executeJattach(
  "<pid>", "load", "instrument", "false", "javaagent.jar=arguments")
// is equivalent to
jattach <pid> load instrument false "javaagent.jar=arguments"

This runs the same as jattach with the only exception that every string that ends with
libasyncProfiler.so is mapped to the extracted async-profiler library for the load command.
One can, therefore, for example, start the async-profiler on a different JVM via the following:

AsyncProfilerLoader.executeJattach(
  PID, "load", "libasyncProfiler.so", true, "start")

But this use case can, of course, be accomplished by using the executeProfiler method, which internally uses jattach.

A great use case for jattach is to attach a custom native agent to the currently running JVM. Starting with JVM 9 doing this via VirtualMachine#attach throws an IOException if you try this without setting -Djdk.attach.allowAttachSelf=true. The boolean jattach(Path agentPath[, String arguments]) methods simplify this, constructing the command line arguments for you and returning true if jattach succeeded, e.g.:

AsyncProfilerLoader.jattach("libjni.so")

This attaches the libjni.so agent to the current JVM. The process id of this JVM can be obtained by using the getProcessId method.

Extracting a Native Library

I happen to write many small projects for testing profilers that often require loading a native library from the resources folder; an example can be found in the trace_validation (blog post) project:

/**
 * extract the native library and return its temporary path
 */
public static synchronized Path getNativeLibPath(
 ClassLoader loader) {
  if (nativeLibPath == null) {
    try {
      String filename = System.mapLibraryName(NATIVE_LIB);
      InputStream in = loader.getResourceAsStream(filename);
      // ...
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
  return nativeLibPath;
}

I, therefore, added the extractCustomLibraryFromResources method:

/**                                                                                                                                        
 * Extracts a custom native library from the resources and 
 * returns the alternative source if the file is not 
 * in the resources.                                                                                                    
 *                                                                                                                                         
 * If the file is extracted, then it is copied to 
 * a new temporary folder which is deleted upon JVM exit.                            
 *                                                                                                                                         
 * This method is mainly seen as a helper method 
 * to obtain custom native agents for #jattach(Path) and                          
 * #jattach(Path, String). It is included in ap-loader 
 * to make it easier to write applications that need                           
 * custom native libraries.                                                                                                           
 *                                                                                                                                         
 * This method works on all architectures.                                                                                          
 *                                                                                                                                         
 * @param classLoader the class loader to load 
 *                 the resources from                                                                          
 * @param fileName the name of the file to copy, 
 *                 maps the library name if the fileName 
 *                 does not start with "lib", e.g. "jni" 
 *                 will be treated as "libjni.so" on Linux 
 *                 and as "libjni.dylib" on macOS                                       
 * @param alternativeSource the optional resource directory 
 *                 to use if the resource is not found in 
 *                 the resources, this is typically the case 
 *                 when running the application from an IDE, 
 *                 an example would be "src/main/resources" 
 *                 or "target/classes" for maven projects                                                    
 * @return the path of the library                                                                                                         
 * @throws IOException if the extraction fails and 
 *                  the alternative source is not present 
 *                  for the current architecture                      
 */                                                                                                                                        
public static Path extractCustomLibraryFromResources(
  ClassLoader classLoader, String fileName, 
  Path alternativeSource) throws IOException

This can be used effectively together with jattach to attach a native agent from the resources to the current JVM:

// extract the agent first from the resources
Path p = one.profiler.AsyncProfilerLoader.
  extractCustomLibraryFromResources(
    ....getClassLoader(), "library name");
// attach the agent to the current JVM
one.profiler.AsyncProfilerLoader.jattach(p, "optional arguments")
// -> returns true if jattach succeeded

This use-case comes from a profiler test helper library on which I hope to write a blog post in the near future.

Conclusion

ap-loader makes it easy to use async-profiler and its included tools programmatically without creating complex build systems. The project is regularly updated to keep pace with the newest stable async-profiler version; updating a version just requires changing a single dependency in your dependencies list.

The ap-loader is mature, so try it and tell me about it. I’m happy to help with any issues you have with this library, so feel free to write to me or create an issue on GitHub.

This project is part of my work in the SapMachine team at SAP, making profiling easier for everyone.