This is my actual collection of ideas from March 2025, when it was unclear whether the updated JFR sampling at safepoints would make it into JDK 25. It eventually did, so I scrapped the ideas. Still, it offers an interesting, unfiltered look into my thoughts at the time, probably only useful for people who are really into profiling and the OpenJDK. Just be aware that it is a document of its time (March 2025) and doesn’t reflect the actual current implementation. Also, don’t expect any deeper explanations.
Current approach (see PR):
- in signal handler: place traced stack in queue
- in GC: frantically try to record that classes in the queue are still used
- in separate thread: auto-adapt signal timers, get items from the queue, and push them into JFR
Problems:
- GC unloads stuff
- JFR rotates chunks at safepoints, which is bad because we now have to synchronize this with our sampler thread
- but we also have to synchronize the GC with our sampler thread (so that the queue stays still)
- which is not really good
- and threads can be deleted before we have the chance to record them
I tried to fix them, but it’s whack-a-mole, and the code gets more and more complicated
New idea:
- take ideas from Taming the Bias: Unbiased Safepoint-Based Stack Walking (originally based on an idea by Erik)
- if we just record the traces at safepoints, we have far fewer problems
- signal handler still pushes elements into queue
- but threads aren’t deleted and classes aren’t unloaded while they’re in the queue
- still have to synchronize between safepoint handler and signal handler, but that should be fine
Problems:
- one short queue per thread, which is bad
- especially as times between safepoints are potentially large
- but the queue items are far larger than the simple (pc, bci, sp) combo of the approach from my Taming the Bias article
So it’s a cool approach, just not really suitable
But the main issue is the individual queues per thread; everything else is OK (?). So…
Assumptions
- there aren’t that many samples taken overall between a single thread’s stack sampling and its safepoint hit (assuming normal scheduling behaviour, not Hello eBPF: Concurrency Testing using Custom Linux Schedulers (19))
- iterating over the queue and skipping elements doesn’t cost much
Idea
- use one queue for all threads
- the signal handler doesn’t change
- queue doesn’t change much either
- use a separate thread only for timer adaption
At safepoint
- if overflow is detected by the queue: create overflow event
- overflow means: a push hits a queue item that is not empty. The push should fail. Or should it? Failing keeps the code simple: just check whether the current element is full, then fail. This is problematic if the thread that belongs to the top element takes a long time to reach a safepoint. Alternative 1: skip the element; just as simple, but the queue might fill up over time. Alternative 2: add a third state to the queue items called “in work” and skip elements only if they are in work, so currently worked on by the safepoint handler. Start by implementing the initial idea, then proceed to Alternative 1 if problems arise
- iterate over the queue, from the last recorded head, and skip elements that don’t belong to the current thread:
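// Scan everything enqueued since the last safepoint visit.
int i;
for (i = lastHead; i < currentHead; i++) {
    // Skip slots that hold no sample.
    if (!isFull(i)) {
        continue;
    }
    var element = element(i);
    // Only handle samples that were taken in the current thread.
    if (hasCurrentThread(element)) {
        process(element);
    }
}
// Remember where to resume at the next safepoint.
lastHead = i;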
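The scan can be modelled in a few lines of C++. This is only a single-threaded sketch of the indexing and overflow logic, not the lock-free code from the PR: `SharedSampleQueue`, `Sample`, `push`, and `drain` are made-up names, and it assumes that every thread keeps its own `lastHead` and that a processed slot is marked empty again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a recorded stack trace.
struct Sample {
    int thread_id;
    std::uint64_t pc;
};

// Single-threaded model of one ring buffer shared by all threads.
// Assumption: no real synchronization is modelled here.
class SharedSampleQueue {
public:
    explicit SharedSampleQueue(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // What the signal handler would do: fail (overflow) if the slot
    // at the head still holds an unprocessed sample.
    bool push(const Sample& sample) {
        Slot& slot = slots_[head_ % slots_.size()];
        if (slot.full) {
            ++overflows_;
            return false;
        }
        slot.sample = sample;
        slot.full = true;
        ++head_;
        return true;
    }

    // What the safepoint handler of thread_id would do: scan from the
    // thread's last recorded head and skip foreign or empty slots.
    std::vector<Sample> drain(int thread_id, std::size_t& last_head) {
        std::vector<Sample> processed;
        std::size_t start = last_head;
        // Never scan more than one lap; older positions were reused.
        if (head_ - start > slots_.size()) {
            start = head_ - slots_.size();
        }
        for (std::size_t i = start; i < head_; ++i) {
            Slot& slot = slots_[i % slots_.size()];
            if (!slot.full || slot.sample.thread_id != thread_id) {
                continue;
            }
            processed.push_back(slot.sample);
            slot.full = false;  // the slot can be reused by push()
        }
        last_head = head_;
        return processed;
    }

    std::size_t overflows() const { return overflows_; }

private:
    struct Slot {
        Sample sample{};
        bool full = false;
    };
    std::vector<Slot> slots_;
    std::size_t head_ = 0;       // next position to write
    std::size_t overflows_ = 0;  // number of failed pushes
};
```

With a small capacity, a single sample of a thread that never reaches a safepoint blocks `push` as soon as the head wraps around to its slot. That is the "long time to safepoint" problem from the overflow bullet, and the reason why Alternative 1 and 2 exist.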
The only problem:
I’ve been wrong. And that’s a good thing. I totally forgot that “one short queue per thread, which is bad” is not that bad, as the queues only consist of entries of type Element:
struct Element {
    // Encodes full/empty flag along with generation of the element.
    // Also, establishes happens-before relationship between producer and consumer.
    // Update of this field "commits" enqueue/dequeue transaction.
    u4 _state;
    JfrCPUTimeTrace* _trace;
};
These objects are small (48 bytes), so we can easily have a queue of 1000 elements per thread.
The trace objects are shared between the different queues and there is one empty element queue for all.
So we don’t need all the complexity from before and only use the original queue implementation.
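To make the final design a bit more concrete, here is a rough C++ sketch of per-thread queues whose elements only point to traces from a shared pool. It is not the code from the PR: `Trace`, `TracePool`, `ThreadQueue`, `record_sample`, and `drain_at_safepoint` are made-up names, the pool is not thread-safe, and the state encoding (lowest bit as full flag, remaining bits as generation) is just one plausible reading of the comment in `Element`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for JfrCPUTimeTrace.
struct Trace {
    int thread_id = 0;
    std::uint64_t top_pc = 0;
};

// Shared pool of empty traces (assumption: not thread-safe, for brevity).
class TracePool {
public:
    explicit TracePool(std::size_t count) : storage_(count) {
        for (Trace& t : storage_) free_.push_back(&t);
    }
    Trace* acquire() {
        if (free_.empty()) return nullptr;
        Trace* t = free_.back();
        free_.pop_back();
        return t;
    }
    void release(Trace* t) { free_.push_back(t); }
    std::size_t available() const { return free_.size(); }
private:
    std::vector<Trace> storage_;
    std::vector<Trace*> free_;
};

// Small per-thread ring buffer with one producer and one consumer.
class ThreadQueue {
public:
    explicit ThreadQueue(std::size_t capacity) : elements_(capacity) {
        assert(capacity > 0);
    }
    bool enqueue(Trace* trace) {
        Element& e = elements_[tail_ % elements_.size()];
        auto gen = static_cast<std::uint32_t>(tail_ / elements_.size());
        // The slot must be empty for the current generation.
        if (e.state.load(std::memory_order_acquire) != (gen << 1)) return false;
        e.trace = trace;
        // Setting the full bit "commits" the enqueue.
        e.state.store((gen << 1) | 1u, std::memory_order_release);
        ++tail_;
        return true;
    }
    Trace* dequeue() {
        Element& e = elements_[head_ % elements_.size()];
        auto gen = static_cast<std::uint32_t>(head_ / elements_.size());
        if (e.state.load(std::memory_order_acquire) != ((gen << 1) | 1u)) return nullptr;
        Trace* trace = e.trace;
        // Mark the slot as empty for the next generation.
        e.state.store((gen + 1) << 1, std::memory_order_release);
        ++head_;
        return trace;
    }
private:
    struct Element {
        std::atomic<std::uint32_t> state{0};
        Trace* trace = nullptr;
    };
    std::vector<Element> elements_;
    std::size_t tail_ = 0;  // only touched by the producer
    std::size_t head_ = 0;  // only touched by the consumer
};

// What the signal handler would do: take an empty trace, fill it, enqueue it.
bool record_sample(TracePool& pool, ThreadQueue& queue, int thread_id, std::uint64_t pc) {
    Trace* trace = pool.acquire();
    if (trace == nullptr) return false;
    trace->thread_id = thread_id;
    trace->top_pc = pc;
    if (!queue.enqueue(trace)) {  // queue overflow: hand the trace back
        pool.release(trace);
        return false;
    }
    return true;
}

// What the safepoint handler would do: process all traces of this thread.
std::size_t drain_at_safepoint(TracePool& pool, ThreadQueue& queue) {
    std::size_t count = 0;
    while (Trace* trace = queue.dequeue()) {
        pool.release(trace);  // a real implementation would emit a JFR event first
        ++count;
    }
    return count;
}
```

Because each queue slot holds only a state word and a pointer, the per-thread queues can be generous in length, while the number of (large) trace objects is bounded by the shared pool.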





