On HotSpot Error Files and Useful Tools

Ever crashed a JVM? Then you typically get a HotSpot error file, or hs err for short. These files contain information about which crash occurred and what the JVM’s state was at the time of the crash. In this short blog post, we’ll be diving into the components of a typical hs err file and show case a few custom-built tools to work with them.

But before we start: A crashing JVM is normal when you develop the JVM, e.g., adding a new CPU-time profiler, but normally it should not happen (except maybe when your JVM runs out of memory), so the hs err files are not a common sight. Still, they are important to me and my colleagues at SapMachine, which is why I’m writing this blog post.

TL;DR: I wrote an online tool for redacting hs err files and also a syntax highlighter extension for VSCode, all as part of the jhserr project on GitHub.

Header and Summary

The initial part of such an error file is most important, as it provides the error, the error location, and information on the JVM and underlying OS versions. An example is the following:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000013000f588, pid=16540, tid=8451
#
# JRE version: OpenJDK Runtime Environment (22.0) (slowdebug build 22-internal-adhoc.jbechberger.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 22-internal-adhoc.jbechberger.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [libSmallProfiler.so+0xf588]  signalHandler(int, __siginfo*, void*)+0x464
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -agentpath:libSmallProfiler.so=interval=0.1s math.MathParser

Host: HKJHJKLHJ, "MacBookPro17,1" arm64, 8 cores, 16G, Darwin 22.6.0, macOS 13.5.1 (22G90)
Time: Thu Aug 31 17:31:03 2023 CEST elapsed time: 1.376730 seconds (0d 0h 0m 1s)

Here you can clearly see that a segmentation fault (accessing memory it should not have) caused the JVM version 22 built by me to crash after it ran for 1.38s on an ARM MacBook. Furthermore, it shows you how the JVM was invoked, so the JVM probably crashed during some of my profiler experiments.

When a crash is caused by a failing assertion in a debug build, then you also see it:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (../../src/hotspot/os/posix/os_posix.cpp:803), pid=10064, tid=25091
#  assert(ms < MILLIUNITS) failed: Un-interruptable sleep, short time use only
# ...

Which is pretty neat, as it even includes the code location where the assertion failed. But what if you want to know the whole stack trace of the crashing thread? This is where the next section of the hs err file comes to our help:

Thread Section

This section gives you more information on the crashing thread, the native as well as the Java frame on the stack:

---------------  T H R E A D  ---------------

Current thread (0x0000000127f09120):  JfrThreadSampler "JFR Thread Sampler" [stack: 0x00000001720f8000,0x00000001722fb000] [id=25091]

Stack: [0x00000001720f8000,0x00000001722fb000],  sp=0x00000001722fae10,  free space=2059k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x121c6a0]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x608
V  [libjvm.dylib+0x121cde0]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, char*)+0x40
V  [libjvm.dylib+0x5de794]  report_vm_error(char const*, int, char const*, char const*, ...)+0x94
V  [libjvm.dylib+0xe439fc]  os::naked_short_sleep(long)+0x3c
V  [libjvm.dylib+0x979bb4]  JfrThreadSampler::run()+0x10c
V  [libjvm.dylib+0x116c118]  Thread::call_run()+0x220
V  [libjvm.dylib+0xe3c0c8]  thread_native_entry(Thread*)+0x160
C  [libsystem_pthread.dylib+0x726c]  _pthread_start+0x94

This might also include registers and their values, instructions, and the stack-to-memory mapping, helping you investigate issues with the current frame.

This section is followed by information on the whole JVM process:

Process Section

The process section starts with the user and user group id, followed by some internal state, every Java and every native thread with name, thread id and status, allowing you a glimpse at all currently running threads:

---------------  P R O C E S S  ---------------

uid  : 3670 euid : 3670 gid  : 15 egid : 15

umask: 0022 (----w--w-)

Threads class SMR info:
_java_thread_list=0x0000eb2e804febf0, length=14, elements={ ...
}
_java_thread_list_alloc_cnt=17, _java_thread_list_free_cnt=15, _java_thread_list_max=15, _nested_thread_list_max=0
_tlh_cnt=64, _tlh_times=17, avg_tlh_time=0.27, _tlh_time_max=8
_deleted_thread_cnt=1, _deleted_thread_times=0, avg_deleted_thread_time=0.00, _deleted_thread_time_max=0
_delete_lock_wait_cnt=0, _delete_lock_wait_max=0
_to_delete_list_cnt=0, _to_delete_list_max=1

Java Threads: ( => current thread )
  0x0000eb2e800beb80 JavaThread "main"                              [_thread_blocked, id=643836, stack(0x0000eb2e845e2000,0x0000eb2e847e0000) (2040K)]
  0x0000eb2e803711e0 JavaThread "Reference Handler"          daemon [_thread_blocked, id=643920, stack(0x0000eb2e600dc000,0x0000eb2e602da000) (2040K)]
...

Other Threads:
  0x0000eb2e803380e0 VMThread "VM Thread"                           [id=643918, stack(0x0000eb2e60706000,0x0000eb2e60904000) (2040K)]
  0x0000eb2e8016a7e0 WatcherThread "VM Periodic Task Thread"        [id=643917, stack(0x0000eb2e604f8000,0x0000eb2e606f6000) (2040K)]
  0x0000eb2e80143df0 ConcurrentGCThread "ShenControl"               [id=643911, stack(0x0000eb2e60b62000,0x0000eb2e60d60000) (2040K)]
...
Threads with active compile tasks:
Total: 0
VM state: not at safepoint (normal execution)

This is then followed by internal heap and class space information, as well as information on every heap region in a large table (lots of internal information):

Heap Regions:
Region state: EU=empty-uncommitted, EC=empty-committed, R=regular, H=humongous start, HP=pinned humongous start
              HC=humongous continuation, CS=collection set, TR=trash, P=pinned, CSP=pinned collection set
BTE=bottom/top/end, TAMS=top-at-mark-start
UWM=update watermark, U=used
T=TLAB allocs, G=GCLAB allocs
S=shared allocs, L=live data
CP=critical pins
|    0|R  |Y|BTE     c0000000,     c0080000,     c0080000|TAMS     c0080000|UWM     c0080000|U   512K|T     0B|G     0B|S   512K|L   511K|CP   0
|    1|R  |Y|BTE     c0080000,     c0100000,     c0100000|TAMS     c0100000|UWM     c0100000|U   512K|T     0B|G     0B|S   512K|L   502K|CP   0
|    2|H  |Y|BTE     c0100000,     c0180000,     c0180000|TAMS     c0100000|UWM     c0100000|U   512K|T     0B|G     0B|S   51
...

Events SubSection

After the heap information, we see many internal events. Like compilation events:

Compilation events (250 events):
Event: 1.821 Thread 0x0000eb2e8037d7e0 nmethod 95 0x0000eb2e67612ec8 code [0x0000eb2e67612fc0, 0x0000eb2e67613160]
Event: 1.822 Thread 0x0000eb2e8049fb40   96       1       java.lang.invoke.MethodType::returnType (5 bytes)
Event: 1.822 Thread 0x0000bd9f33b3d7a0 nmethod 94 0x0000eb2e67613188 code [0x0000eb2e67613280, 0x0000eb2e67613b28]
Event: 1.822 Thread 0x0000eb2e8049fb40 nmethod 96 0x0000eb2e6f1a5848 code [0x0000eb2e6f1a5940, 0x0000eb2e6f1a5ac8]

Or some GC and Metaspace history events:

GC Heap Usage History (1 events):
Event: 2.262 {heap Before GC invocations=0 (full 0):
 Shenandoah Heap
   1024M max, 1024M soft max, 1024M committed, 105M used
  2048 x 512K regions
 Status: not cancelled
 Reserved region:
  - [0x00000000c0000000, 0x0000000100000000) 
 Collection set:
  - map (vanilla): 0x0000000000009800
  - map (biased):  0x0000000000008000

}

Metaspace Usage History (1 events):
Event: 2.262 {metaspace Before GC invocations=0 (full 0):
 Metaspace       used 794K, committed 1024K, reserved 1114112K
  class space    used 82K, committed 192K, reserved 1048576K
}

Or dynamic library-related events:

Dll operation events (8 events):
Event: 0.011 Attempting to load shared library /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/lib/libjava.so
Event: 0.011 Loaded shared library /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/lib/libjava.so
Event: 1.667 Attempting to load shared library /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/lib/libnio.so
Event: 1.668 Loaded shared library /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/lib/libnio.so

Or deoptimization and class loading events:

Deoptimization events (4 events):
Event: 2.229 Thread 0x0000eb2e8054ca90 Uncommon trap: trap_request=0xffffff45 fr.pc=0x0000eb2e6f1ab46c relative=0x000000000000016c
Event: 2.229 Thread 0x0000eb2e8054ca90 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000eb2e6f1ab46c method=java.lang.String.isLatin1()Z @ 10 c2
Event: 2.229 Thread 0x0000eb2e8054ca90 DEOPT PACKING pc=0x0000eb2e6f1ab46c sp=0x0000eb2e5a6075b0
Event: 2.229 Thread 0x0000eb2e8054ca90 DEOPT UNPACKING pc=0x0000eb2e6eaa151c sp=0x0000eb2e5a6074f0 mode 2

Classes loaded (92 events):
Event: 1.505 Loading class java/lang/invoke/MethodHandleNatives$Constants
Event: 1.505 Loading class java/lang/invoke/MethodHandleNatives$Constants done
Event: 1.557 Loading class java/lang/invoke/DirectMethodHandle$StaticAccessor
Event: 1.557 Loading class java/lang/invoke/DirectMethodHandle$StaticAccessor done
Event

And many more. You want to know what the JVM did before the crash? This is the section for you.

Dynamic Libraries Subsection

This is followed by information on the loaded dynamic libraries (and where they are loaded):

Dynamic libraries:
00008000-0000a000 rw-p 00000000 00:00 0 
c0000000-100000000 rw-p 00000000 00:00 0 
400000000-400010000 ---p 00000000 00:00 0 
400010000-400ea0000 rw-p 00010000 fc:00 3622183                          /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/lib/server/classes_coh.jsa
400ea0000-401000000 ---p 00000000 00:00 0 
401000000-401020000 rw-p 00000000 00:00 0 
401020000-401040000 ---p 00000000 00:00 0 
401040000-401050000 rw-p 00000000 00:00 0 
401050000-441000000 ---p 00000000 00:00 0 
bd9eff9a0000-bd9eff9a2000 r-xp 00000000 fc:00 3622338                    /testfolder/output_openjdk27_dev_dbgU_linuxaarch64/testee-vm/bin/java

JVM SubSection

At the end of the file, you get information on the exact JVM arguments, all JVM options, signal handlers, and in-depth CPU information. I’m skipping these here for brevity.

Syntax Highlighting

One problem when working with the hs err files is that there is (was) no syntax highlighting in the editor that many people use for JVM C++ development: VSCode (IntelliJ apparently already has a plugin).

So I created one:

This also includes proper section handling, an overview structure, and highlighting of the important information.

Redaction

Sometimes you want to redact sensitive information such as user, folder, and host names, environment variables, or dynamic libraries. For this reason I created the jhserr tool which also supports converting hs err files to JSON and generating them back. This way jhserr can be used by others to develop their own small hs err related tools.

But command-line tools are cumbersome when you just want to process a single hs err file, so I used GraalVM Web Image to compile the jhserr library to WebAssembly and run it in the browser (find it here):

Conclusion

I hoped you liked learning about the hs err files and the new tools I wrote. These tools have been requested by my colleagues developing the JVM and have already proved useful. Looking into GraalVM’s Web Image technology was also interesting, as it allows me to run Java applications on the web, without additional backend servers, which is great for data privacy in the case of a redaction tool.

See you next week (hopefully) for something JCmd-related.

This blog post is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Author

  • Johannes Bechberger

    Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. His work today comprises many open-source contributions and his blog, where he regularly writes on in-depth profiling and debugging topics. He also works on hello-ebpf, the first eBPF library for Java. His most recent contribution is the new CPU Time Profiler in JDK 25.

    View all posts

New posts like these come out at least every two weeks, to get notified about new posts, follow me on BlueSky, Twitter, Mastodon, or LinkedIn, or join the newsletter:

Leave a Reply

Your email address will not be published. Required fields are marked *