Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. He started at SAP in 2022 after two years of research studies at the KIT in the field of Java security analyses. His work today comprises many open-source contributions, his blog, where he writes regularly on in-depth profiling and debugging topics, and work on his JEP Candidate 435 to add a new profiling API to the OpenJDK.
In my last blog post, I showed you how to work with JFR files using DuckDB, which started a blog series that I surely will continue. Just not this week. Instead, I want to showcase a tiny app to run AI models using the MediaPipe API directly on your phone. I created the app for another purpose (perhaps described in a future blog post) earlier this year, but never wrote anything about it. So here we are.
TL;DR: I built an Android app that offers AI models via a server
The app is open-source and available on GitHub; it’s experimental, but maybe it can help you build your own apps. You can download it from the releases page of the repo and install it.
The LLM API endpoint, writing a poem on a backyard scene
In my previous post, I showed you how tricky it is to compare objects from the JFR Java API. You probably wondered why I wrote about this topic. Here is the reason: In this blog post, I’ll cover how to load JFR files into a DuckDB database to allow querying profiling data with simple SQL queries, all JFR views included.
This blog post will start a small series on making JFR quack.
TL;DR
You can now use a query tool (available on GitHub) to transform JFR files into similarly sized DuckDB files:
CREATE VIEW "hot-methods" AS
SELECT
  (c.javaName || '.' || m.name || m.descriptor) AS "Method",
  COUNT(*) AS "Samples",
  format_percentage(COUNT(*) / (SELECT COUNT(*) FROM ExecutionSample)) AS "Percent"
FROM ExecutionSample es
JOIN Method m ON es.stackTrace$topMethod = m._id
JOIN Class c ON m.type = c._id
GROUP BY es.stackTrace$topApplicationMethod, c.javaName, m.name, m.descriptor
ORDER BY COUNT(*) DESC
LIMIT 25
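If you want to query such a DuckDB file from Java instead of the duckdb CLI, a minimal sketch using the DuckDB JDBC driver could look like the following. The file name recording.duckdb and the class name are placeholders, and it assumes the hot-methods view (and the format_percentage helper it uses) is stored inside the file:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HotMethods {
    public static void main(String[] args) throws Exception {
        // needs the org.duckdb:duckdb_jdbc driver on the class path
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:recording.duckdb");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM \"hot-methods\"")) {
            while (rs.next()) {
                System.out.printf("%-80s %8d %8s%n",
                    rs.getString("Method"), rs.getLong("Samples"), rs.getString("Percent"));
            }
        }
    }
}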
In the last blog post, I showed you how to silence JFR’s startup messages. This week’s blog post is also related to JFR, and no, it’s not about the JFR Events website, which got a simple search bar. It’s a short blog post on comparing objects from JFR recordings in Java and why this is slightly trickier than you might have expected.
Example
Getting a JFR recording is simple; just use the RecordingStream API. We do this in the following to record an execution trace of a tight loop using JFR and store it in a list:
List<RecordedEvent> events = new ArrayList<>();
// Know when to stop the loop
AtomicBoolean running = new AtomicBoolean(true);
// We obtain one hundred execution samples
// that all have the same stack trace
final long currentThreadId = Thread.currentThread().threadId();
try (RecordingStream rs = new RecordingStream()) {
    rs.enable("jdk.ExecutionSample").with("period", "1ms");
    rs.onEvent("jdk.ExecutionSample", event -> {
        if (event.getThread("sampledThread")
                 .getJavaThreadId() != currentThreadId) {
            return; // don't record other threads
        }
        events.add(event);
        if (events.size() >= 100) {
            // we can signal to stop
            running.set(false);
        }
    });
    rs.startAsync();
    int i = 0;
    while (running.get()) { // some busy loop to produce samples
        for (int j = 0; j < 100000; j++) {
            i += j;
        }
    }
    rs.stop();
}
[0.172s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
[0.172s][info][jfr,startup]
[0.172s][info][jfr,startup] Use jcmd 29448 JFR.dump name=1 to copy recording data to file.
Why do you see these messages when starting the Flight Recorder with -XX:StartFlightRecorder, even though the default logging level is warning, not info?
This is what this week’s blog post is all about. After showing you last week how to waste CPU like a Professional, I’ll now show you how to silence JFR. Back to the problem:
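The messages above are emitted under the jfr+startup logging tag set, so an educated first guess (not necessarily the solution the post arrives at) would be to lower that tag set’s level explicitly via unified logging; MyApp is just a placeholder main class:
java -Xlog:jfr+startup=warning -XX:StartFlightRecorder MyApp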
As a short backstory, my profiler needed a test to check that the queue size of the sampler really increased dynamically (see Java 25’s new CPU-Time Profiler: Queue Sizing (3)), so I needed a way to let a thread spend a pre-defined number of seconds running natively on the CPU. You can find the test case in its hopefully final form here, but be aware that writing such cases is more complicated than it looks.
So here we are: I need to properly waste CPU time, preferably in user-land, for a fixed amount of time. The problem: there are only scant resources on this online, so I decided to create my own. I’ll show you seven different ways, working on both macOS and Linux, to implement a simple
void my_wait(int seconds);
method, and you’ll learn far more about this topic than you ever wanted to. All the code is MIT-licensed; you can find it on GitHub in my waste-cpu-experiments repository, alongside some profiling results.
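As a rough baseline for what wasting CPU even means, here is a naive sketch in Java (not one of the post’s seven approaches): a busy loop that spins until a wall-clock deadline and returns a side effect so the JIT can’t remove it:
import java.util.concurrent.ThreadLocalRandom;

public class WasteCpu {
    // naive variant: spin on the CPU until `seconds` of wall-clock time have passed
    static long myWait(int seconds) {
        long deadline = System.nanoTime() + seconds * 1_000_000_000L;
        long sink = 0;
        while (System.nanoTime() < deadline) {
            sink += ThreadLocalRandom.current().nextLong(); // keep the core busy
        }
        return sink; // returning the result keeps the loop from being optimized away
    }

    public static void main(String[] args) {
        System.out.println(myWait(2));
    }
}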
Welcome back to my blog, this time for a blog post on profiling your Java applications in Cloud Foundry and the tool I helped to develop to make it easier.
Cloud Foundry “is an open source, multi-cloud application platform as a service (PaaS) governed by the Cloud Foundry Foundation, a 501(c)(6) organization” (Wikipedia). It allows you to run your workloads easily in the cloud, including your Java applications. You just need to define a manifest.yml for your application.
But how would you profile this application? This and more is the topic of this blog post.
I will not discuss why you might want to use Cloud Foundry or how you can deploy your own applications. I assume you came this far in the blog post because you already have basic Cloud Foundry knowledge and want to learn how to profile your applications easily.
The Java Plugin
Cloud Foundry has a cf CLI with a proper plugin system and lots of plugins. A team at SAP, which included Tim Gerrlach, started to develop the Java plugin many years ago. It’s a plugin offering utilities to gain insights into JVMs running in your Cloud Foundry app.
The changes I described in this blog post led to segfaults in tests, so I backtracked on them for now. Maybe I made a mistake implementing the changes, or my reasoning in the blog post is incorrect. I don’t know yet.
Should the queue implementation use Atomics and acquire-release semantics?
This is what we cover in this short blog post. First, on to the rather fun topic:
Is it a Queue?
I always called the primary data structure a queue, but recently, I wondered whether this term is correct. But what is a queue?
Definition: A collection of items in which only the earliest added item may be accessed. Basic operations are add (to the tail) or enqueue and delete (from the head) or dequeue. Delete returns the item removed. Also known as “first-in, first-out” or FIFO.
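To make the acquire-release question from above more concrete, here is a rough Java sketch of a fixed-size single-producer/single-consumer queue built on VarHandles. It only illustrates the memory-ordering idea; the actual JfrCPUTimeTraceQueue is implemented in C++ inside HotSpot and looks different:
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class SpscLongQueue {
    private static final VarHandle HEAD;
    private static final VarHandle TAIL;
    static {
        try {
            var lookup = MethodHandles.lookup();
            HEAD = lookup.findVarHandle(SpscLongQueue.class, "head", long.class);
            TAIL = lookup.findVarHandle(SpscLongQueue.class, "tail", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long[] buffer;
    private long head; // next slot to read, only written by the consumer
    private long tail; // next slot to write, only written by the producer

    SpscLongQueue(int capacity) { buffer = new long[capacity]; }

    // called by the single producer (think: the signal handler)
    boolean offer(long value) {
        long t = tail;                            // plain read: the producer owns tail
        long h = (long) HEAD.getAcquire(this);    // acquire: see the consumer's progress
        if (t - h == buffer.length) return false; // queue full, the sample is dropped
        buffer[(int) (t % buffer.length)] = value;
        TAIL.setRelease(this, t + 1);             // release: publish the element
        return true;
    }

    // called by the single consumer (think: the out-of-thread sampler)
    long poll() {
        long h = head;                            // plain read: the consumer owns head
        long t = (long) TAIL.getAcquire(this);    // acquire: see the producer's writes
        if (h == t) return -1;                    // empty (assuming -1 is never enqueued)
        long value = buffer[(int) (h % buffer.length)];
        HEAD.setRelease(this, h + 1);             // release: hand the slot back
        return value;
    }
}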
Welcome back to my series on the new CPU-time profiler in Java 25. In the previous blog post, I covered the implementation of the new profiler. In this week’s blog post, I’ll dive deep into the central request queue, focusing on deciding its proper size.
The JfrCPUTimeTraceQueue allows the signal handler to record sample requests that the out-of-thread sampler and the safepoint handler process. So it’s the central data structure of the profiler:
This queue is thread-local and pre-allocated, as it’s used in the signal handler, so the correct sizing is critical:
If the size is too small, you’ll lose many samples because the signal handler can’t record sample requests.
If you size it too large, you waste lots of memory. A sampling request is 48 bytes, so a queue with 500 elements (currently the default) requires 24 kB per thread; with 1,000 threads, that is already around 24 MB. This adds up fast if you have more than a few threads.
So, in this blog post, we’re mainly concerned with setting the correct default size and discussing a potential solution to the whole problem.
I developed, together with others, the new CPU-time profiler for Java, which is now included in JDK 25. A few weeks ago, I covered the profiler’s user-facing aspects, including the event types, configuration, and rationale, alongside the foundations of safepoint-based stack walking in JFR (see Taming the Bias: Unbiased Safepoint-Based Stack Walking). If you haven’t read those yet, I recommend starting there. In this week’s blog post, I’ll dive into the implementation of the new CPU-time profiler.
It was a remarkable coincidence that safepoint-based stack walking made it into JDK 25. Thanks to that, I could build on top of it without needing to re-implement:
The actual stack walking given a sampling request
Integration with the safepoint handler
Of course, I worked on this before, as described in Taming the Bias: Unbiased Safepoint-Based Stack Walking. But Erik’s solution for JDK 25 was much more complete and profited from his decades of experience with JFR. In March 2025, whether the new stack walker would get into JDK 25 was still unclear. So I came up with other ideas (which I’m glad I didn’t need). You can find that early brain-dump in Profiling idea (unsorted from March 2025).
In this post, I’ll focus on the core components of the new profiler, excluding the stack walking and safepoint handler. Hopefully, this won’t be the last article in the series; I’m already researching the next one.
Main Components
There are a few main components of the implementation that come together to form the profiler:
This is my actual collection of ideas from March 2025, when it was unclear whether the updated JFR sampling at safepoints would make it into JDK 25. It eventually did, so I scrapped the ideas. But it offers the reader an interesting, unfiltered look into my ideas and thoughts at the time, probably only useful for people who are really into profiling and the OpenJDK. Just be aware that it is therefore a document of its time (March 2025) and doesn’t reflect the actual current implementation. Also, don’t expect any deeper explanations.
Ever wondered how the views of the jfr tool are implemented? There are views like hot-methods, which gives you the most-used methods, or cpu-load-samples, which gives you the system load over time; you can use them directly on the command line:
> jfr view cpu-load-samples recording.jfr
CPU Load
Time      JVM User  JVM System  Machine Total
--------  --------  ----------  -------------
14:33:29     8,25%       0,08%         29,65%
14:33:30     8,25%       0,00%         29,69%
14:33:31     8,33%       0,08%         25,42%
14:33:32     8,25%       0,08%         27,71%
14:33:33     8,25%       0,08%         24,64%
14:33:34     8,33%       0,00%         30,67%
...
This is helpful when glancing at JFR files and trying to roughly understand their contents, without loading the files directly into more powerful, but also more resource-hungry, JFR viewers.
In this short blog post, I’ll show you how the views work under the hood using JFR queries and how to use the queries with my new experimental JFR query tool.
I didn’t forget the promised blog post on implementing the new CPU-time profiler in JDK 25; it’ll come soon.
Under the hood, JFR views use a built-in query language to define all views in the view.ini file. The above is, for example, defined as:
More than three years in the making, with a concerted effort starting last year, my CPU-time profiler landed in Java with OpenJDK 25. It’s an experimental new profiler/method sampler that helps you find performance issues in your code, with distinct advantages over the current sampler. This is what this week’s and next week’s blog posts are all about. This week, I will cover why we need a new profiler and what information it provides; next week, I’ll cover the technical internals that go beyond what’s written in the JEP. I will quote JEP 509 quite a lot, thanks to Ron Pressler; it reads like a well-written blog post in and of itself.
Before I show you its details, I want to focus on what the current default method profiler in JFR does:
If you’re waiting for my CPU time profiler blog post, it’ll come soon (hopefully on Monday).
Garden Linux is a Debian GNU/Linux derivative that aims to provide small, auditable Linux images for most cloud providers (e.g., AWS, Azure, GCP) and bare-metal machines. Garden Linux is the best Linux for Gardener nodes. Its highly customizable feature set allows it to be tailored to your needs.
Two years ago, I still planned to implement a new version of AsyncGetCallTrace in Java. This plan didn’t materialize, but during the discussions, Erik Österlund had the idea of fully walking the stack at safepoints. Walking stacks only at safepoints would normally incur a safepoint bias (see The Inner Workings of Safepoints), but when you record some program state in signal handlers, you can prevent this. I wrote about this idea and its basic implementation in Taming the Bias: Unbiased Safepoint-Based Stack Walking. I’ll revisit this topic in this week’s short blog post because Markus Grönlund took Erik’s idea and started implementing it for the standard JFR method sampler:
Smartphones are more powerful than ever, with processors rivaling old laptops. So let’s try to use them like a laptop to develop web applications on the go. In this week’s blog post, I’ll show you how to run and develop a CAP Java Spring Boot application on your smartphone and how to run VSCode locally to develop and modify it. This, of course, works only on Android phones, as they are Linux at their core.
Welcome back to my hello-ebpf series. Last week, I attended a scheduling and power management summit (partly to bring someone a sched-ext-themed birthday cake) and the Chemnitz Linux Days, so this blog post will be a bit shorter, covering a small scheduler that I wrote for Chemnitz.
In all of my eBPF presentations, I quote Brendan Gregg’s statement, “eBPF is a crazy technology, it’s like putting JavaScript into the Linux kernel”, alongside a picture of him shouting at hard drives. I took this picture from the following video, where he demonstrates that his disk-read-latency measurement tool can detect if someone shouts at the hard drives:
Andrea Righi came up with the idea of writing a tiny scheduler that reacts to sound. The great thing about sched-ext and eBPF is that we can write experimental schedulers without much effort, especially if you use my hello-ebpf library and access the vast Java ecosystem.
For this scheduler, we scale the number of cores any task can use by the loudness level. So shouting at your computer makes your application run faster:
In this scenario, I ran the slow roads game while my system was exhausted by stress-ng, so more cores mean a less overcommitted system and thereby a smoother gaming experience.
Around ten months ago I wrote a blog post together with Mikaël Francoeur on how to instrument instrumenters:
Have you ever wondered how libraries like Spring and Mockito modify your code at run-time to implement all their advanced features? Wouldn’t it be cool to get a peek behind the curtains? This is the premise of my meta-agent, a Java agent that instruments instrumenters to get these insights, and it’s what this blog post is about.
This launched a website at localhost:7071 where you could view the actions of every instrumenter and transformer. The only problem? It’s cumbersome to use, especially programmatically. Join me in this short blog post to learn about the newest edition of meta-agent and what it can offer.
Instrumentation Handler
An idea that came up at the recent ConFoo conference in a discussion with Mikaël and Jonatan Ivanov was to add a new handler mechanism that calls code every time a new transformer is added or a class is instrumented. So I got to work.
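To make this concrete, such a handler API could look roughly like the sketch below; the interface and method names are invented for illustration and are not the actual meta-agent API:
// hypothetical sketch, not the real meta-agent API
public interface InstrumentationHandler {

    // called every time an instrumenter registers a new transformer
    void onTransformerAdded(String transformerClassName);

    // called every time a class is (re)transformed, with the bytecode before and after
    void onClassInstrumented(String className, byte[] before, byte[] after);
}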
Welcome back to my hello-ebpf series. I was at FOSDEM earlier this month and gave three(ish) talks. This blog post covers one of them, which is related to concurrency testing.
Backstory
The whole story started in the summer of last year when I thought about the potential use cases for custom Linux schedulers. The only two prominent use cases at the time were:
Improve performance on servers and heterogeneous systems
Allow people to write their own schedulers for educational purposes
Of course, the latter is also important for the hello-ebpf project, which allows Java developers to create their own schedulers. But I had another idea back then: Why can’t we use custom Linux schedulers for concurrency testing? I presented this idea in my keynote at the eBPF summit in September, which you can watch here.
I ended up giving this talk because of a certain Bill Mulligan, who liked the idea of showing my eBPF SDK in Java and sched-ext in a single talk. For whatever reason, I foolishly agreed to give the talk and spent a whole weekend in Oslo before JavaZone frantically implementing sched-ext support in hello-ebpf.
My idea then was that one could use a custom scheduler to test specific scheduling orders to find (and possibly reproduce) erroneous behavior of concurrent code. One of the examples in my talk was deadlocks:
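For illustration, here is the textbook shape of such a deadlock (not necessarily the exact example from the talk): two threads take the same two locks in opposite order, and whether the program hangs depends entirely on how the scheduler interleaves them.
public class DeadlockExample {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                sleep(10); // widen the race window
                synchronized (lockB) { System.out.println("t1 got both locks"); }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                sleep(10);
                synchronized (lockA) { System.out.println("t2 got both locks"); }
            }
        });
        t1.start();
        t2.start();
    }

    private static void sleep(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}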
If you’re here for eBPF content, this blog post is not for you. I recommend reading an article on a concurrency fuzzing scheduler at LWN.
Ever wonder how the JDK Flight Recorder (JFR) keeps track of the classes and methods it has collected for stack traces and more? In this short blog post, I’ll explore JFR tagging and how it works in the OpenJDK.
Tags
JFR files consist of self-contained chunks. Every chunk contains:
The maximum chunk size is usually 12MB, but you can configure it:
java -XX:FlightRecorderOptions:maxchunksize=1M
Whenever JFR collects methods or classes, it has to somehow tell the JFR writer which entities have been used so that their mapping can be written out. Each entity also has to have a tracing ID that can be used in the events that reference it.
This is where JFR tags come in. Every class, module, and package entity has a 64-bit value called _trace_id (e.g., for classes), which consists of both the ID and the tag. Every method has an _orig_method_idnum, essentially its ID, and a trace flag, which acts as the tag.
In a world without any concurrency, the tag could just be a single bit, telling us whether an entity is used. But in reality, an entity can be used in the new chunk while we’re writing out the old chunk. So, we need two distinct periods (0 and 1) and toggle between them whenever we write a chunk.
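A simplified sketch of this two-period idea in Java might look like the following; the real mechanism lives in HotSpot’s C++ code and uses atomics rather than locks, so treat this purely as an illustration:
final class TaggedEntity {
    // toggled at a safepoint whenever a chunk is written; a single writer, so no CAS needed here
    private static volatile int currentPeriod = 0;

    // bit 0: tagged in period 0, bit 1: tagged in period 1
    private int tagBits;

    // returns true if the entity was newly tagged and still needs to be
    // enqueued for the writer of the current chunk
    synchronized boolean tag() {
        int bit = 1 << currentPeriod;
        if ((tagBits & bit) != 0) {
            return false; // already known for this chunk, don't enqueue twice
        }
        tagBits |= bit;
        return true;
    }

    // called by the writer after serializing the entity for the previous chunk
    synchronized void clearPreviousPeriod() {
        tagBits &= ~(1 << (currentPeriod ^ 1));
    }

    static void rotateChunk() {
        currentPeriod ^= 1; // uses of entities now belong to the new chunk
    }
}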
Tagging
We can visualize the whole life cycle of a tag for a given entity:
In this example, the entity, a class, is brought into JFR by the method sampler (link) while walking another thread’s stack. This causes the class to be tagged and enqueued in the internal entity queue (and is therefore known to the JFR writer) if it hasn’t been tagged before (source):
This shows that tagging also prevents entities from being duplicated in a chunk.
Then, when a chunk is written out, a safepoint is first requested to initialize the next period (the next chunk) and to toggle the period, so that subsequent uses of an entity belong to the new period and chunk. Then, the entity is written out, and its tag for the previous period is reset (code). This allows the aforementioned concurrency.
But how does it ensure that the tagged classes aren’t unloaded before they are emitted? By writing out the classes when any class is unloaded. This is simple yet effective and doesn’t need any change in the GC.
Conclusion
Tagging is used in JFR to properly record classes, methods, and other entities while also preventing them from accidentally being garbage collected before they are written out. This is a simple but memory-efficient solution. It works well in the context of concurrency, but it assumes entities are used directly in event creation when they are tagged. Tagging entities and then pushing them into a queue to create events asynchronously later is not supported; that would probably require something akin to reference counting.
Thanks for coming this far in a blog post on a profiling-related topic. I chose this topic because I wanted to know more about tagging and plan to do more of these short OpenJDK-specific posts.
Writing custom Linux schedulers is pretty easy using sched-ext. You can write your own tiny scheduler in a few lines of code using C, Rust, or even Java. We’re so confident in sched-ext that we’re starting a Scheduler Contest for FOSDEM’25. Think you can craft the ultimate scheduler? A scheduler that does something interesting, helpful, or fun? Join our sched-ext contest and show us what you’ve got, with the chance of winning hand-crafted sched-ext swag!
How to participate: Submit your scheduler using sched-ext as a pull request to the repository fosdem25, ensuring:
It runs with a 6.12 Linux kernel.
It’s GPLv2-licensed.
It compiles and is understandable. We’re programming language agnostic; just make sure to include a script so we can build and run it.
The implementations.md document provides details on the submission format.
You can also submit a unique scheduling policy idea to ideas.md.
Try to surprise us…
Deadlines & Announcement:
The submission deadline is Sunday, 2 February, at 10:00 AM (CET).
To claim your prize, you must be present during Andrea’s talk. By participating, you agree to share your submission under GPL licensing and allow us to showcase it.
Legal Note: This contest is for fun! We, Andrea and Johannes, select the winners based on our personal preferences and our own definition of “best”. All decisions are final, and we’re not liable beyond delivering the prizes to the winners. If you have any questions, feel free to create a GitHub issue.
There will be multiple talks at FOSDEM’25 on sched-ext: