This week, a short blog post on a question that has been bothering me: How can I get the operating system's thread ID for a given Java thread? This is useful when you want to deal with Java threads using native code (foreshadowing another blog post). The question has been asked countless times on the internet, but I couldn’t find a comprehensive collection of answers, so here’s my take. But first, some background:
Background
In Java, normal threads are mapped 1:1 to operating system threads. This is not the case for virtual threads, because many virtual threads are multiplexed onto a smaller number of carrier threads, but we ignore virtual threads here for simplicity.
But what is an operating system thread? An operating system thread is an operating system task that shares the address space (and more) with other thread tasks of the same process/thread group. The main thread is the thread group leader; its operating system ID is the same as the process ID.
Be aware that the Java thread ID is not related to the operating system ID but rather to the Java thread creation order. Now, what different options do we have to translate between the two?
Different Options
During my research, I found three different mechanisms:
- Using the gettid() method
- Using JFR
- Parsing thread dumps
In the end, I found that option 3 is best; you’ll see why in the following.
Using gettid
The option that is mentioned most often on the internet is to use the gettid C function:
    #include <unistd.h>
    pid_t gettid(void);

"gettid() returns the caller's thread ID (TID). In a single-threaded process, the thread ID is equal to the process ID (PID, as returned by getpid(2)). In a multithreaded process, all threads have the same PID, but each one has a unique TID." (Man Page for gettid)
This allows us to obtain the operating system ID for the current thread. Using Java’s new FFI API (see From C to Java Code using Panama), we can call this method directly in Java:
    import java.lang.foreign.*;
    import java.lang.invoke.MethodHandle;

    class GetTidExample {

        // Look the symbol up in the default (libc) lookup first, then in loaded libraries
        public static MemorySegment lookup(String symbol) {
            return Linker.nativeLinker().defaultLookup().find(symbol)
                    .or(() -> SymbolLookup.loaderLookup().find(symbol))
                    .orElseThrow();
        }

        // Downcall handle for "pid_t gettid(void)"; pid_t is treated as a Java int here
        private static final MethodHandle GETTID = Linker.nativeLinker().downcallHandle(
                lookup("gettid"),
                FunctionDescriptor.of(ValueLayout.JAVA_INT));

        // Returns the operating system thread ID of the calling thread (Linux only)
        public static int gettid() throws Throwable {
            return (int) GETTID.invokeExact();
        }

        public static void main(String[] args) throws Throwable {
            System.out.println("TID: " + gettid());
        }
    }
This solution has only two problems:
- It only works on Linux.
- It only returns the ID for the current thread, so we can’t use it to map all Java threads to operating system tasks.
Using JFR
In JFR every recorded thread has the following properties:
- long getId(): a unique ID for the thread within the JVM (this is not the Java thread ID)
- String getJavaName(): the Java thread name
- long getJavaThreadId(): the Java thread ID
- String getOSName(): the name of the operating system task
- long getOSThreadId(): the operating system task ID
This is great, and I wonder why these properties aren’t available in the standard Thread class. With the RecordingStream class, it’s possible to start a JFR recording on the spot and thereby obtain RecordedThread objects for every Java thread that emits events. This works on all operating systems but has the major disadvantage of having to start a flight recording session, with all the issues this brings when the application is already being profiled or when threads don’t emit any JFR events (because they are idle).
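To make the JFR route a bit more concrete, here is a minimal sketch. It assumes a custom marker event (the names JfrThreadIds, ThreadMarker and example.ThreadMarker are made up for this illustration) that a thread commits so that it shows up in the stream; it is not necessarily how you would do it in production. It also needs Java 19 or newer because of Thread.threadId().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingStream;

public class JfrThreadIds {

    // Hypothetical marker event: committing it makes the committing thread
    // appear in the recording, even if it emits no other JFR events.
    @Name("example.ThreadMarker")
    static class ThreadMarker extends Event {}

    /**
     * Maps the Java thread ID of the calling thread to its OS thread ID.
     * Waits at most timeoutSeconds for the streamed event to arrive.
     */
    static Map<Long, Long> currentThreadMapping(long timeoutSeconds)
            throws InterruptedException {
        Map<Long, Long> javaToOs = new ConcurrentHashMap<>();
        CountDownLatch seen = new CountDownLatch(1);
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable(ThreadMarker.class);
            stream.onEvent("example.ThreadMarker", event -> {
                RecordedThread thread = event.getThread();
                if (thread != null) {
                    javaToOs.put(thread.getJavaThreadId(), thread.getOSThreadId());
                }
                seen.countDown();
            });
            stream.startAsync();           // starts the recording, consumes events in the background
            new ThreadMarker().commit();   // emitted by the calling thread
            seen.await(timeoutSeconds, TimeUnit.SECONDS);
        }
        return javaToOs;
    }

    public static void main(String[] args) throws Exception {
        long javaId = Thread.currentThread().threadId();
        Map<Long, Long> mapping = currentThreadMapping(10);
        System.out.println("Java thread " + javaId + " -> OS thread " + mapping.get(javaId));
    }
}
```

The sketch only sees threads that commit the marker event, which mirrors the limitation above: idle threads that emit nothing never appear in the stream. Expect a delay of roughly a second, as the stream flushes events periodically.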
So, onto the last option:
Parsing Thread Dumps
A typical Java thread dump, which you can obtain via jstack, contains something like the following for every thread:

    "main" #1 [4611] prio=5 os_prio=31 cpu=82.32ms elapsed=0.41s allocated=2016K defined_classes=174 tid=0x000000012500a400 nid=4611 runnable [0x000000016d59a000]
       java.lang.Thread.State: RUNNABLE
            at java.io.FileInputStream.readBytes(java.base@22/Native Method)
            at java.io.FileInputStream.read(java.base@22/FileInputStream.java:287)
This includes a host of relevant information, for example:
- #number: the Java thread ID
- os_prio: the operating system priority
- cpu: the CPU time consumed by the thread (user and system time)
- elapsed: the wall-clock time since the thread was created
- nid: the operating system ID
You can find the implementation in thread.cpp, osThread.cpp, and javaThread.cpp in the OpenJDK.
So we just have to execute jstack from our code and parse the output to get a mapping of Java thread ids to operating system ids. You can find my implementation on GitHub.
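The parsing step can be sketched as follows. This is not the implementation from my repository, just a minimal illustration: the class name ThreadDumpParser, the regular expression, and the assumption that jstack is on the PATH are all mine. The pattern accepts a decimal nid (as in the sample above) as well as a hexadecimal one with a 0x prefix, as printed by older JDKs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadDumpParser {

    // Header lines of Java threads look like:
    //   "main" #1 [4611] prio=5 ... nid=4611 runnable [...]
    // JVM-internal threads have no #number and are skipped on purpose.
    private static final Pattern HEADER =
            Pattern.compile("^\"(.*)\" #(\\d+) .*\\bnid=(0x[0-9a-fA-F]+|\\d+)\\b");

    /** Returns a mapping from Java thread ID to operating system thread ID. */
    static Map<Long, Long> parse(String dump) {
        Map<Long, Long> javaToOs = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                long javaId = Long.parseLong(m.group(2));
                String nid = m.group(3);
                long osId = nid.startsWith("0x")
                        ? Long.parseLong(nid.substring(2), 16)
                        : Long.parseLong(nid);
                javaToOs.put(javaId, osId);
            }
        }
        return javaToOs;
    }

    /** Runs jstack for the given process; assumes jstack is on the PATH. */
    static String jstack(long pid) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("jstack", Long.toString(pid))
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) {
        String sample = "\"main\" #1 [4611] prio=5 os_prio=31 cpu=82.32ms elapsed=0.41s "
                + "tid=0x000000012500a400 nid=4611 runnable [0x000000016d59a000]\n"
                + "   java.lang.Thread.State: RUNNABLE\n";
        System.out.println(parse(sample)); // prints {1=4611}
    }
}
```

To map the threads of the running JVM, you would call parse(jstack(ProcessHandle.current().pid())) instead of parsing a fixed string.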
But you might wonder how costly this all is. I ran the code using JMH and observed a runtime of around 70 ms (±10 ms) for obtaining and parsing the thread dump of a single-threaded application. This is certainly costlier for applications with many threads or deep stack traces (because the stack traces are also included in the thread dumps).
Conclusion
It’s possible to obtain operating system thread IDs, but it’s costly as soon as you want more than just the current thread’s ID or need to run on something other than Linux.
See you in two weeks with another blog post on something possibly eBPF-related.
P.S: ThreadMXBean offers lots of information on threads, but sadly, not operating system ids.