Quickly Inspect your Java Application with JStall

Welcome to the last blog post of the year. Last week, I discussed the limitations of custom JFR events. This week, I’m covering another profiling-related topic and showcasing a tiny tool called JStall.

I hope I’m not the only one who sometimes wonders, “What is my Java application doing right now?”, when it doesn’t produce any output. Yes, you could take a simple thread dump via jstack, but it’s hard to tell which threads are actually consuming CPU and making any sort of progress. This is where my tiny tool called JStall comes in:

JStall is a small command-line tool for one-shot inspection of running JVMs using thread dumps and short, on-demand profiling. The tool essentially takes multiple thread dumps of your application and uses the per-thread CPU-time information to find the Java threads that consume the most CPU time.
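JStall collects this information from another process, but the underlying per-thread CPU-time facility is the same one the JDK exposes in-process via ThreadMXBean. The following self-contained sketch (my own illustration, not JStall’s code) prints the CPU time of every thread in the current JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class PerThreadCpuTime {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (long id : bean.getAllThreadIds()) {
            long cpuNanos = bean.getThreadCpuTime(id); // -1 if unsupported or thread gone
            ThreadInfo info = bean.getThreadInfo(id);
            if (cpuNanos >= 0 && info != null) {
                System.out.printf("%-30s %10.3f ms%n",
                        info.getThreadName(), cpuNanos / 1_000_000.0);
            }
        }
    }
}
```

Sorting these numbers across two measurements, a few seconds apart, gives exactly the kind of “most active threads” ranking shown below.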

First, download the JStall executable from the GitHub releases page. Then let’s find the currently running JVMs by invoking jstall without arguments:

> ./jstall
Usage: jstall <command> <pid|files> [options]

Available commands:
  status     - Show overall status (deadlocks + most active threads)
  deadlock   - Check for deadlocks
  most-work  - Show threads doing the most work
  flame      - Generate flame graph
  threads    - List all threads

Available JVMs:
  7153 ./jstall
  1223 <unknown>
  8136 ./renaissance-gpl-0.16.0.jar
  6138 org.jetbrains.idea.maven.server.RemoteMavenServer36
  5597 DeadlockDemo
  49294 com.intellij.idea.Main

This provides us with a list of available commands, as well as a list of JVM processes and their corresponding main classes or JARs. Let’s start by checking for deadlocks:
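As an aside, such a JVM list can be produced with the JDK’s attach API (module jdk.attach); I assume tools like jps and JStall discover running JVMs in a similar way. A minimal sketch:

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

public class ListJvms {
    public static void main(String[] args) {
        // Lists all attachable JVMs of the current user (requires the jdk.attach module,
        // which is part of every standard JDK).
        for (VirtualMachineDescriptor vm : VirtualMachine.list()) {
            System.out.println(vm.id() + " " + vm.displayName());
        }
    }
}
```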

Checking for Deadlocks

Process 5597 is an instance of the DeadlockDemo program that we saw running above. Let’s obtain the status (a combination of deadlock and most-work analysis):

> ./jstall 5597
=== deadlock ===
Deadlock detected:

Found one Java-level deadlock:
=============================
"DeadlockThread-1":
  waiting to lock monitor 0x00006000029215f0 (object 0x000000052f817880, a java.lang.Object),
  which is held by "DeadlockThread-2"

"DeadlockThread-2":
  waiting to lock monitor 0x00006000029280d0 (object 0x000000052f817878, a java.lang.Object),
  which is held by "DeadlockThread-1"

Java stack information for the threads listed above:
===================================================
"DeadlockThread-1":
        at DeadlockDemo.lambda$main$0(scratch_2.java:15)
        - waiting to lock <0x000000052f817880> (a java.lang.Object)
        - locked <0x000000052f817878> (a java.lang.Object)
        at DeadlockDemo$$Lambda/0x000007fc01040400.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
        at java.lang.Thread.run(java.base@25/Thread.java:1474)
"DeadlockThread-2":
        at DeadlockDemo.lambda$main$1(scratch_2.java:26)
        - waiting to lock <0x000000052f817878> (a java.lang.Object)
        - locked <0x000000052f817880> (a java.lang.Object)
        at DeadlockDemo$$Lambda/0x000007fc01040800.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
        at java.lang.Thread.run(java.base@25/Thread.java:1474)

Found one Java-level deadlock:
=============================
"DeadlockThread-3":
  waiting to lock monitor 0x0000600002938000 (object 0x000000052f817890, a java.lang.Object),
  which is held by "DeadlockThread-4"

"DeadlockThread-4":
  waiting to lock monitor 0x0000600002934000 (object 0x000000052f817888, a java.lang.Object),
  which is held by "DeadlockThread-3"

Java stack information for the threads listed above:
===================================================
"DeadlockThread-3":
        at DeadlockDemo.lambda$main$2(scratch_2.java:38)
        - waiting to lock <0x000000052f817890> (a java.lang.Object)
        - locked <0x000000052f817888> (a java.lang.Object)
        at DeadlockDemo$$Lambda/0x000007fc01040c00.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
        at java.lang.Thread.run(java.base@25/Thread.java:1474)
"DeadlockThread-4":
        at DeadlockDemo.lambda$main$3(scratch_2.java:49)
        - waiting to lock <0x000000052f817888> (a java.lang.Object)
        - locked <0x000000052f817890> (a java.lang.Object)
        at DeadlockDemo$$Lambda/0x000007fc01041000.run(Unknown Source)
        at java.lang.Thread.runWith(java.base@25/Thread.java:1487)
        at java.lang.Thread.run(java.base@25/Thread.java:1474)

Found 2 deadlocks.

=== most-work ===
Top threads by activity (2 dumps):
Combined CPU time: 0.00s, Elapsed time: 5.10s (0.0% overall utilization)

1. Monitor Deflation Thread
   CPU time: 0.00s (64.8% of total)
   Core utilization: 0.0%
   States: RUNNABLE: 100.0%
2. Service Thread
   CPU time: 0.00s (14.8% of total)
   Core utilization: 0.0%
   States: RUNNABLE: 100.0%
3. Attach Listener
   CPU time: 0.00s (13.6% of total)
   Core utilization: 0.0%
   States: RUNNABLE: 100.0%

For this, JStall obtained two thread dumps a few seconds apart and analyzed them, finding that the top CPU-time-consuming threads are all JVM-internal: the application’s own threads are deadlocked and consume no CPU time at all.
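The source of DeadlockDemo isn’t shown in this post, but the classic pattern behind such a report is easy to sketch: two threads acquire two monitors in opposite order. The sketch below (my own reconstruction; thread names merely mimic the demo) also detects the resulting deadlock in-process via ThreadMXBean, the same JVM facility that powers the deadlock section of a thread dump:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock),
                "DeadlockThread-1");
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock),
                "DeadlockThread-2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();

        // Poll until the JVM reports the deadlock, then print and exit.
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] deadlocked;
        while ((deadlocked = bean.findDeadlockedThreads()) == null) {
            Thread.sleep(50);
        }
        System.out.println("Deadlocked threads: " + deadlocked.length);
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until both threads hold their first lock
            } catch (InterruptedException ignored) {
            }
            synchronized (second) {
                // never reached: the other thread holds `second`
            }
        }
    }
}
```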

Listing All Threads

Using the threads command, we can take a look at all threads:

> ./jstall threads 5597
Threads (2 dumps):
Combined CPU time: 0.00s, Elapsed time: 5.10s (0.0% overall utilization)

THREAD                    CPU TIME  CPU %  STATES         TOP STACK FRAME                                   
------------------------  --------  -----  -------------  --------------------------------------------------
Monitor Deflation Thread     0.00s  65.2%  RUNNABLE                                                         
Service Thread               0.00s  18.5%  RUNNABLE                                                         
Attach Listener              0.00s   7.6%  RUNNABLE                                                         
C1 CompilerThread0           0.00s   4.3%  RUNNABLE                                                         
C2 CompilerThread0           0.00s   4.3%  RUNNABLE                                                         
Common-Cleaner               0.00s   0.0%  TIMED_WAITING  java.lang.Object.wait0                            
DeadlockThread-1             0.00s   0.0%  BLOCKED        DeadlockDemo.lambda$main$0                        
DeadlockThread-2             0.00s   0.0%  BLOCKED        DeadlockDemo.lambda$main$1                        
DeadlockThread-3             0.00s   0.0%  BLOCKED        DeadlockDemo.lambda$main$2                        
DeadlockThread-4             0.00s   0.0%  BLOCKED        DeadlockDemo.lambda$main$3                        
Finalizer                    0.00s   0.0%  WAITING        java.lang.Object.wait0                            
Notification Thread          0.00s   0.0%  RUNNABLE                                                         
Reference Handler            0.00s   0.0%  RUNNABLE       java.lang.ref.Reference.waitForReferencePending...
Signal Dispatcher            0.00s   0.0%  RUNNABLE                                                         
main                         0.00s   0.0%  WAITING        java.lang.Object.wait0

This provides a concise overview of what each thread is doing; here, all four DeadlockThreads are BLOCKED and make no progress.

Status of a Renaissance Instance

Let’s look at process 8136, which is an instance of the Renaissance benchmark suite.

> ./jstall 8136
=== most-work ===
Top threads by activity (2 dumps):
Combined CPU time: 13.34s, Elapsed time: 5.11s (261.4% overall utilization)

1. Executor task launch worker for task 0.0 in stage 53.0 (TID 75)
   CPU time: 4.14s (31.1% of total)
   Core utilization: 81.2%
   States: RUNNABLE: 100.0%
   Common stack prefix:
     org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
     org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
     org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
     org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
     org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
     org.apache.spark.scheduler.Task.run(Task.scala:141)
     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
     org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
   ... (8 more lines)

2. Executor task launch worker for task 1.0 in stage 53.0 (TID 76)
   CPU time: 4.12s (30.9% of total)
   Core utilization: 80.8%
   States: RUNNABLE: 100.0%
   Common stack prefix:
     org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
     org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
     org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
     org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
     org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
     org.apache.spark.scheduler.Task.run(Task.scala:141)
     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
     org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
   ... (8 more lines)

3. Executor task launch worker for task 2.0 in stage 53.0 (TID 77)
   CPU time: 4.06s (30.4% of total)
   Core utilization: 79.5%
   States: RUNNABLE: 100.0%
   Common stack prefix:
     org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
     org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:200)
     org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
     org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
     org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
     org.apache.spark.scheduler.Task.run(Task.scala:141)
     org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
     org.apache.spark.executor.Executor$TaskRunner$$Lambda/0x0000030001959c00.apply
   ... (8 more lines)

Here, you see the top three (configurable) CPU-time-consuming threads over the measurement interval (about five seconds in this run), along with the stack-trace parts common to both thread dumps.
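A plausible way to compute such a common stack prefix is a longest-common-suffix scan over the frame lists: stacks are printed innermost frame first, so the frames shared across dumps line up from the outermost end. The sketch below uses made-up frame names and is my own illustration, not JStall’s actual implementation:

```java
import java.util.List;

public class CommonStack {
    // Returns the longest run of frames that both dumps share, counted from
    // the outermost frame (the end of an innermost-first frame list).
    static List<String> commonSuffix(List<String> a, List<String> b) {
        int n = 0;
        while (n < a.size() && n < b.size()
                && a.get(a.size() - 1 - n).equals(b.get(b.size() - 1 - n))) {
            n++;
        }
        return a.subList(a.size() - n, a.size());
    }

    public static void main(String[] args) {
        List<String> dump1 = List.of("Map.changeValue", "Sorter.insertAll",
                "Task.run", "Thread.run");
        List<String> dump2 = List.of("Writer.write", "Sorter.insertAll",
                "Task.run", "Thread.run");
        System.out.println(commonSuffix(dump1, dump2));
        // prints [Sorter.insertAll, Task.run, Thread.run]
    }
}
```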

Using the threads command, you can again see all threads:

> ./jstall threads 8136
Threads (2 dumps):
Combined CPU time: 13.12s, Elapsed time: 5.11s (256.8% overall utilization)

THREAD                                              CPU TIME  CPU %  STATES                       TOP STACK FRAME                                   
--------------------------------------------------  --------  -----  ---------------------------  --------------------------------------------------
Executor task launch worker for task 1.0 in sta...     4.16s  31.7%  RUNNABLE: 50%, BLOCKED: 50%  java.lang.ClassValue$ClassValueMap.readAccess     
Executor task launch worker for task 2.0 in sta...     4.16s  31.7%  RUNNABLE                     scala.runtime.ClassValueCompat.get                
Executor task launch worker for task 0.0 in sta...     4.05s  30.9%  RUNNABLE                     org.apache.spark.util.collection.SizeTrackingAp...
main                                                   0.20s   1.5%  WAITING                      jdk.internal.misc.Unsafe.park                     
C2 CompilerThread0                                     0.16s   1.2%  RUNNABLE                                                                       
task-result-getter-1                                   0.07s   0.6%  WAITING                      jdk.internal.misc.Unsafe.park                     
task-result-getter-0                                   0.07s   0.5%  WAITING                      jdk.internal.misc.Unsafe.park                     
...

But thread dumps aren’t the only way to peek at a running Java application; you can also take a short profile and generate a flamegraph:

Generating Flamegraphs

For this purpose, I integrated async-profiler using my ap-loader library:

> ./jstall flame 8136  
Starting flamegraph generation for PID 8136...
Event: cpu
Duration: 10s
Interval: 10ms
Output: flame.html

Executing: asprof -d 10 -e cpu -i 10000000 -f flame.html 8136

Profiling for 10 seconds
Done

✓ Flamegraph successfully generated!
Output file: .../flame.html
File size: 349576 bytes

This results in a rather unwieldy flamegraph, courtesy of the Apache Spark benchmark:

Adding Analyses

Do you have an idea for another thread-dump analysis? Just implement the Analyzer interface, create a CLI command class, and submit a pull request on GitHub. The Analyzer interface is pretty simple:

public interface Analyzer {

    /**
     * Returns the name of this analyzer.
     */
    String name();

    /**
     * Returns the set of options this analyzer supports.
     *
     * Common options: "dumps", "interval", "keep", "top"
     */
    Set<String> supportedOptions();

    /**
     * Returns the dump requirement for this analyzer.
     */
    DumpRequirement dumpRequirement();

    /**
     * Analyzes the dumps including their raw text; by default, this
     * delegates to {@link #analyzeThreadDumps(List, Map)}.
     */
    default AnalyzerResult analyze(List<ThreadDumpWithRaw> dumpsWithRaw,
            Map<String, Object> options) {
        List<ThreadDump> dumps = dumpsWithRaw.stream().map(ThreadDumpWithRaw::parsed).toList();
        return analyzeThreadDumps(dumps, options);
    }

    /**
     * Analyzes the parsed dumps; implementations must override either this
     * method or {@link #analyze(List, Map)}.
     */
    default AnalyzerResult analyzeThreadDumps(List<ThreadDump> dumps,
            Map<String, Object> options) {
        throw new UnsupportedOperationException(
            "Analyzer must implement either analyze(List<ThreadDumpWithRaw>, Map) " +
            "or analyzeThreadDumps(List<ThreadDump>, Map)");
    }
}

Additionally, there are helper classes that make writing new analyses even easier. So feel free to submit your own. Ideas for new analyses could be:

  • A lock view that emits a lock graph
  • A thread-state analysis
  • A thread blockage analysis that takes the top stack frames and determines whether the threads are waiting for IO, user input, or something else

The thread-dump analyses use the jthreaddump parsing library, which you can also use in your own projects.
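To make the extension idea concrete, here is a hypothetical sketch of the thread-state analysis from the list above. The real ThreadDump and per-thread types come from JStall and jthreaddump; the minimal stand-in records below are my own assumptions so that the sketch compiles on its own:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Stand-in types; the real ones live in JStall/jthreaddump.
record ThreadEntry(String name, String state) { }
record ThreadDump(List<ThreadEntry> threads) { }

public class ThreadStateAnalysis {
    // Count how often each thread state appears across all dumps.
    static Map<String, Long> stateHistogram(List<ThreadDump> dumps) {
        return dumps.stream()
                .flatMap(d -> d.threads().stream())
                .collect(Collectors.groupingBy(ThreadEntry::state,
                        TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        ThreadDump dump = new ThreadDump(List.of(
                new ThreadEntry("main", "WAITING"),
                new ThreadEntry("DeadlockThread-1", "BLOCKED"),
                new ThreadEntry("DeadlockThread-2", "BLOCKED")));
        System.out.println(stateHistogram(List.of(dump)));
        // prints {BLOCKED=2, WAITING=1}
    }
}
```

A real implementation would wrap this logic in an Analyzer and render the histogram as an AnalyzerResult.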

Conclusion

I sometimes get the time to write tools that solve minor annoyances; in this case, checking what a Java app is currently doing. I hope you find JStall as helpful as I do, and make this open-source project part of your daily toolbox. Feel free to contribute via GitHub issues or pull requests.

Thank you for reading some (at least one) of my blog posts this year. I hope you will join me again next year.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

P.S.: Never stop exploring…

P.P.S.: You can, of course, use the JStall tool via jbang too: jbang jstall@parttimenerd/jstall

P.P.P.S.: Switching off…

Author

  • Johannes Bechberger

    Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. His work today comprises many open-source contributions and his blog, where he regularly writes on in-depth profiling and debugging topics. He also works on hello-ebpf, the first eBPF library for Java. His most recent contribution is the new CPU Time Profiler in JDK 25.


New posts like these come out at least every two weeks. To get notified about new posts, follow me on BlueSky, Twitter, Mastodon, or LinkedIn, or join the newsletter.
