
Welcome to the last blog post of the year. Last week, I discussed the limitations of custom JFR events. This week, I’ll also be covering a profiling-related topic and showcasing a tiny tool called JStall.
I hope I’m not the only one who sometimes wonders, “What is my Java application doing right now?”, when it doesn’t produce any output. Yes, you could take a simple thread dump via jstack, but from a single dump it is hard to tell which threads are actually consuming CPU and making any sort of progress. This is where my tiny tool called JStall comes in:
JStall is a small command-line tool for one-shot inspection of running JVMs using thread dumps and short, on-demand profiling. The tool essentially takes multiple thread dumps of your application and uses their per-thread CPU-time information to find the Java threads that consume the most CPU time.
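To get an intuition for the approach: modern HotSpot thread dumps include per-thread cpu= and elapsed= fields in each thread header, so diffing two dumps reveals which threads actually burned CPU in between. Here is a minimal sketch of that idea; this is not JStall’s actual code, just the general technique:

import java.nio.file.*;
import java.util.*;
import java.util.regex.*;

// Sketch: diff per-thread CPU times between two thread-dump files.
// Thread headers look roughly like:
//   "main" #1 prio=5 os_prio=31 cpu=123.45ms elapsed=4.32s tid=... nid=...
public class CpuDiffSketch {
    private static final Pattern HEADER =
        Pattern.compile("^\"([^\"]+)\".*\\bcpu=([0-9.]+)ms");

    static Map<String, Double> cpuTimes(Path dump) throws Exception {
        Map<String, Double> result = new HashMap<>();
        for (String line : Files.readAllLines(dump)) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                result.put(m.group(1), Double.parseDouble(m.group(2)));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> first = cpuTimes(Path.of(args[0]));
        Map<String, Double> second = cpuTimes(Path.of(args[1]));
        // threads with the largest CPU-time delta between the two dumps
        second.entrySet().stream()
            .map(e -> Map.entry(e.getKey(),
                 e.getValue() - first.getOrDefault(e.getKey(), 0.0)))
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(3)
            .forEach(e -> System.out.printf("%-40s %8.2fms%n",
                                            e.getKey(), e.getValue()));
    }
}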
First, download the JStall executable from the GitHub releases page. Let us then start by finding the currently running JVMs:
> ./jstall
Usage: jstall <command> <pid|files> [options]

Available commands:
  status     - Show overall status (deadlocks + most active threads)
  deadlock   - Check for deadlocks
  most-work  - Show threads doing the most work
  flame      - Generate flame graph
  threads    - List all threads

Available JVMs:
  7153  ./jstall
  1223  <unknown>
  8136  ./renaissance-gpl-0.16.0.jar
  6138  org.jetbrains.idea.maven.server.RemoteMavenServer36
  5597  DeadlockDemo
  49294 com.intellij.idea.Main
This provides us with the list of available commands, including the main status command, as well as a list of running JVM processes with their corresponding main classes or JARs. Let’s start by checking for deadlocks:
Checking for Deadlocks
Process 5597 is an instance of the DeadlockDemo program that we saw running above. At its core, the demo is just two pairs of threads that each acquire two locks in opposite order. A minimal sketch of that pattern follows; the real demo repeats it for DeadlockThread-3 and -4:
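// Sketch of a DeadlockDemo: two threads locking two monitors in
// opposite order, so each ends up waiting for the other forever.
public class DeadlockDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                sleep(100); // give the other thread time to grab lockB
                synchronized (lockB) { }
            }
        }, "DeadlockThread-1");
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                sleep(100);
                synchronized (lockA) { }
            }
        }, "DeadlockThread-2");
        t1.start();
        t2.start();
        t1.join(); // never returns: both threads are deadlocked
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Now let’s obtain the status (a combination of the deadlock and most-work analyses):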
> ./jstall 5597
=== deadlock ===
Deadlock detected:
Found one Java-level deadlock:
=============================
"DeadlockThread-1":
waiting to lock monitor 0x00006000029215f0 (object 0x000000052f817880, a java.lang.Object),
which is held by "DeadlockThread-2"
"DeadlockThread-2":
waiting to lock monitor 0x00006000029280d0 (object 0x000000052f817878, a java.lang.Object),
which is held by "DeadlockThread-1"
Java stack information for the threads listed above:
===================================================
"DeadlockThread-1":
at DeadlockDemo.lambda$main$0(scratch_2.java:15)
- waiting to lock <0x000000052f817880> (a java.lang.Object)
- locked <0x000000052f817878> (a java.lang.Object)
at DeadlockDemo$$Lambda/0x000007fc01040400.run(Unknown Source)
at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
at java.lang.Thread.run(java.base@25/Thread.java:1474)
"DeadlockThread-2":
at DeadlockDemo.lambda$main$1(scratch_2.java:26)
- waiting to lock <0x000000052f817878> (a java.lang.Object)
- locked <0x000000052f817880> (a java.lang.Object)
at DeadlockDemo$$Lambda/0x000007fc01040800.run(Unknown Source)
at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
at java.lang.Thread.run(java.base@25/Thread.java:1474)
Found one Java-level deadlock:
=============================
"DeadlockThread-3":
waiting to lock monitor 0x0000600002938000 (object 0x000000052f817890, a java.lang.Object),
which is held by "DeadlockThread-4"
"DeadlockThread-4":
waiting to lock monitor 0x0000600002934000 (object 0x000000052f817888, a java.lang.Object),
which is held by "DeadlockThread-3"
Java stack information for the threads listed above:
===================================================
"DeadlockThread-3":
at DeadlockDemo.lambda$main$2(scratch_2.java:38)
- waiting to lock <0x000000052f817890> (a java.lang.Object)
- locked <0x000000052f817888> (a java.lang.Object)
at DeadlockDemo$$Lambda/0x000007fc01040c00.run(Unknown Source)
at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
at java.lang.Thread.run(java.base@25/Thread.java:1474)
"DeadlockThread-4":
at DeadlockDemo.lambda$main$3(scratch_2.java:49)
- waiting to lock <0x000000052f817888> (a java.lang.Object)
- locked <0x000000052f817890> (a java.lang.Object)
at DeadlockDemo$$Lambda/0x000007fc01041000.run(Unknown Source)
at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
at java.lang.Thread.run(java.base@25/Thread.java:1474)
Found 2 deadlocks.
=== most-work ===
Top threads by activity (2 dumps):
Combined CPU time: 0.00s, Elapsed time: 5.10s (0.0% overall utilization)
1. Monitor Deflation Thread
CPU time: 0.00s (64.8% of total)
Core utilization: 0.0%
States: RUNNABLE: 100.0%
2. Service Thread
CPU time: 0.00s (14.8% of total)
Core utilization: 0.0%
States: RUNNABLE: 100.0%
3. Attach Listener
CPU time: 0.00s (13.6% of total)
Core utilization: 0.0%
States: RUNNABLE: 100.0%
For this, JStall obtained two thread dumps and analyzed them: it found the two deadlocks and identified that the top CPU-time-consuming threads are all JVM-internal ones.
Listing All Threads
Using the threads command, we can take a look at all threads:
> ./jstall threads 5597
Threads (2 dumps):
Combined CPU time: 0.00s, Elapsed time: 5.10s (0.0% overall utilization)

THREAD                    CPU TIME  CPU %  STATES         TOP STACK FRAME
------------------------  --------  -----  -------------  --------------------------------------------------
Monitor Deflation Thread  0.00s     65.2%  RUNNABLE
Service Thread            0.00s     18.5%  RUNNABLE
Attach Listener           0.00s      7.6%  RUNNABLE
C1 CompilerThread0        0.00s      4.3%  RUNNABLE
C2 CompilerThread0        0.00s      4.3%  RUNNABLE
Common-Cleaner            0.00s      0.0%  TIMED_WAITING  java.lang.Object.wait0
DeadlockThread-1          0.00s      0.0%  BLOCKED        DeadlockDemo.lambda$main$0
DeadlockThread-2          0.00s      0.0%  BLOCKED        DeadlockDemo.lambda$main$1
DeadlockThread-3          0.00s      0.0%  BLOCKED        DeadlockDemo.lambda$main$2
DeadlockThread-4          0.00s      0.0%  BLOCKED        DeadlockDemo.lambda$main$3
Finalizer                 0.00s      0.0%  WAITING        java.lang.Object.wait0
Notification Thread       0.00s      0.0%  RUNNABLE
Reference Handler         0.00s      0.0%  RUNNABLE       java.lang.ref.Reference.waitForReferencePending...
Signal Dispatcher         0.00s      0.0%  RUNNABLE
main                      0.00s      0.0%  WAITING        java.lang.Object.wait0
This provides a concise overview of the app’s progress.
Status of a Renaissance Instance
Let’s look at process 8136, which is an instance of the Renaissance benchmark suite.
> ./jstall 8136
=== most-work ===
Top threads by activity (2 dumps):
Combined CPU time: 13.34s, Elapsed time: 5.11s (261.4% overall utilization)
1. Executor task launch worker for task 0.0 in stage 53.0 (TID 75)
CPU time: 4.14s (31.1% of total)
Core utilization: 81.2%
States: RUNNABLE: 100.0%
Common stack prefix:
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
org.apache.spark.scheduler.Task.run(Task.scala:141)
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
... (8 more lines)
2. Executor task launch worker for task 1.0 in stage 53.0 (TID 76)
CPU time: 4.12s (30.9% of total)
Core utilization: 80.8%
States: RUNNABLE: 100.0%
Common stack prefix:
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
org.apache.spark.scheduler.Task.run(Task.scala:141)
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
... (8 more lines)
3. Executor task launch worker for task 2.0 in stage 53.0 (TID 77)
CPU time: 4.06s (30.4% of total)
Core utilization: 79.5%
States: RUNNABLE: 100.0%
Common stack prefix:
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
org.apache.spark.scheduler.Task.run(Task.scala:141)
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
... (8 more lines)
Here, you see the top three (configurable) CPU-time-consuming threads over the sampled interval (around five seconds here), along with the stack-trace parts common to both thread dumps.
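Computing such a common stack prefix is straightforward: it is the longest run of frames, counted from the bottom of the stack, that is identical in all sampled traces of a thread; printing it top-down gives stable context even when the topmost frames differ between samples. A small sketch of that comparison (not JStall’s actual implementation; traces are assumed to be stored top frame first, as with Throwable#getStackTrace):

import java.util.List;

class StackPrefix {
    /**
     * Frames common to all sampled traces of one thread, compared from
     * the bottom of the stack (traces are stored top frame first).
     */
    static List<String> commonStackPrefix(List<List<String>> traces) {
        List<String> first = traces.get(0);
        int common = first.size();
        for (List<String> trace : traces.subList(1, traces.size())) {
            int max = Math.min(common, trace.size());
            int matching = 0;
            // count matching frames from the end of the lists (= stack bottom)
            while (matching < max
                    && first.get(first.size() - 1 - matching)
                            .equals(trace.get(trace.size() - 1 - matching))) {
                matching++;
            }
            common = matching;
        }
        return first.subList(first.size() - common, first.size());
    }
}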
Using the threads command, you can again see all threads:
> ./jstall threads 8136
Threads (2 dumps):
Combined CPU time: 13.12s, Elapsed time: 5.11s (256.8% overall utilization)

THREAD                                              CPU TIME  CPU %  STATES                       TOP STACK FRAME
--------------------------------------------------  --------  -----  ---------------------------  --------------------------------------------------
Executor task launch worker for task 1.0 in sta...  4.16s     31.7%  RUNNABLE: 50%, BLOCKED: 50%  java.lang.ClassValue$ClassValueMap.readAccess
Executor task launch worker for task 2.0 in sta...  4.16s     31.7%  RUNNABLE                     scala.runtime.ClassValueCompat.get
Executor task launch worker for task 0.0 in sta...  4.05s     30.9%  RUNNABLE                     org.apache.spark.util.collection.SizeTrackingAp...
main                                                0.20s      1.5%  WAITING                      jdk.internal.misc.Unsafe.park
C2 CompilerThread0                                  0.16s      1.2%  RUNNABLE
task-result-getter-1                                0.07s      0.6%  WAITING                      jdk.internal.misc.Unsafe.park
task-result-getter-0                                0.07s      0.5%  WAITING                      jdk.internal.misc.Unsafe.park
...
But thread dumps aren’t the only way to peek at a running Java application; you can also take a short profile and generate a flamegraph:
Generating Flamegraphs
For this purpose, I integrated async-profiler using my ap-loader library:
> ./jstall flame 8136
Starting flamegraph generation for PID 8136...
  Event:    cpu
  Duration: 10s
  Interval: 10ms
  Output:   flame.html
Executing: asprof -d 10 -e cpu -i 10000000 -f flame.html 8136
Profiling for 10 seconds
Done
✓ Flamegraph successfully generated!
  Output file: .../flame.html
  File size:   349576 bytes

This results in a rather unwieldy flamegraph, courtesy of the Apache Spark benchmark:

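As an aside, ap-loader bundles the async-profiler binaries for the common platforms in a single JAR and extracts the matching one at runtime; JStall uses the extracted asprof binary against the target PID, as shown in the Executing: line above. If you just want to profile your own process, an in-process sketch with ap-loader looks roughly like this (the command strings follow async-profiler’s documented start/stop syntax and may need adjusting for your version):

import one.profiler.AsyncProfiler;
import one.profiler.AsyncProfilerLoader;

// Sketch: in-process profiling via ap-loader. JStall itself invokes the
// extracted asprof binary against another PID instead.
public class FlameSketch {
    public static void main(String[] args) throws Exception {
        AsyncProfiler profiler = AsyncProfilerLoader.load();
        profiler.execute("start,event=cpu,interval=10ms");
        runWorkload(); // whatever you want to profile
        profiler.execute("stop,file=flame.html");
    }

    static void runWorkload() {
        // placeholder workload
        long sum = 0;
        for (int i = 0; i < 100_000_000; i++) sum += i;
        System.out.println(sum);
    }
}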
Adding Analyses
Do you have an idea for another thread-dump analysis? Just implement the Analyzer interface, create a CLI command class, and submit a pull request on GitHub. The Analyzer interface is pretty simple:
public interface Analyzer {

    /**
     * Returns the name of this analyzer.
     */
    String name();

    /**
     * Returns the set of options this analyzer supports.
     *
     * Common options: "dumps", "interval", "keep", "top"
     */
    Set<String> supportedOptions();

    /**
     * Returns the dump requirement for this analyzer.
     */
    DumpRequirement dumpRequirement();

    default AnalyzerResult analyze(List<ThreadDumpWithRaw> dumpsWithRaw,
                                   Map<String, Object> options) {
        List<ThreadDump> dumps = dumpsWithRaw.stream()
            .map(ThreadDumpWithRaw::parsed).toList();
        return analyzeThreadDumps(dumps, options);
    }

    default AnalyzerResult analyzeThreadDumps(List<ThreadDump> dumps,
                                              Map<String, Object> options) {
        throw new UnsupportedOperationException(
            "Analyzer must implement either analyze(List<ThreadDumpWithRaw>, Map) " +
            "or analyzeThreadDumps(List<ThreadDump>, Map)");
    }
}
Additionally, there are helper classes that make writing new analyses even easier, so feel free to submit your own. Ideas for new analyses could be:
- A lock view that emits a lock graph
- A thread-state analysis (sketched after this list)
- A thread blockage analysis that takes the top stack frames and determines whether the threads are waiting for IO, user input, or something else
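To make the second idea concrete, here is a hypothetical sketch of a thread-state analysis. The Analyzer interface is the real one from above, but the ThreadDump and AnalyzerResult methods used here are assumptions for illustration only; check the actual classes before copying this:

import java.util.*;

// Hypothetical thread-state analyzer. ThreadDump.threads(),
// thread.state(), AnalyzerResult.text(...) and DumpRequirement.MULTIPLE
// are assumed APIs for illustration, not JStall's actual ones.
public class ThreadStateAnalyzer implements Analyzer {
    @Override
    public String name() {
        return "thread-states";
    }

    @Override
    public Set<String> supportedOptions() {
        return Set.of("dumps", "interval");
    }

    @Override
    public DumpRequirement dumpRequirement() {
        return DumpRequirement.MULTIPLE; // assumed enum constant
    }

    @Override
    public AnalyzerResult analyzeThreadDumps(List<ThreadDump> dumps,
                                             Map<String, Object> options) {
        Map<String, Long> stateCounts = new TreeMap<>();
        for (ThreadDump dump : dumps) {
            for (var thread : dump.threads()) {          // assumed accessor
                stateCounts.merge(thread.state(), 1L, Long::sum);
            }
        }
        StringBuilder sb = new StringBuilder("Thread states across dumps:\n");
        stateCounts.forEach((state, count) ->
            sb.append("  %-15s %d%n".formatted(state, count)));
        return AnalyzerResult.text(sb.toString());       // assumed factory
    }
}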
The thread-dump analyses use the jthreaddump parsing library, which you can also use in your own projects.
Conclusion
I sometimes find the time to write tools that solve minor annoyances, in this case finding out what a Java app is currently doing. I hope you find JStall as helpful as I do and make this open-source project part of your daily toolbox. Feel free to contribute via GitHub issues or pull requests.
Thank you for reading some (at least one) of my blog posts this year. I hope you will join me again next year.
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.
P.S.: Never stop exploring…

P.P.S.: You can, of course, run JStall via jbang too: jbang jstall@parttimenerd/jstall
P.P.P.S.: Time to switch off…