Redacting Data from Heap Dumps via hprof-redact

Two weeks ago, I showed you how to redact sensitive data from Java Flight Recorder files using my new jfr-redact tool (Redacting Sensitive Data from Java Flight Recorder Files). The tool also redacts hs-error files, but it doesn’t handle heap dumps. Sadly, OpenJDK currently offers no way to redact heap dumps directly. To quote Volker Simonis’ comment on my last blog post:

There’s also “JDK-8337517: Redacted Heap Dumps” (https://bugs.openjdk.org/browse/JDK-8337517) which unfortunately didn’t receive enough support from upstream 🙁

Well, there is now hprof-redact, a tool that lets you easily zero out all primitive values and strings in a heap dump, and even implement your own basic redactions when you use it as a library. It’s a small MIT-licensed tool (written with femtocli, of course), which we’ll cover in this blog post. Please be aware that it is still an early prototype, but it might already be useful:

./hprof-redact source.hprof output.hprof

But first, what are heap dumps?

Heap Dumps

Heap dumps are snapshots of a Java application’s heap. They essentially contain all live objects, along with optional thread stacks. You can obtain a heap dump via jmap:

jmap -dump:file=file.hprof PID

This results in a large binary file. jmap can also compress the file while writing it via -dump:gz=<level>,file=file.hprof.gz, where the level ranges from 1 (fastest) to 9 (strongest compression).
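If you’d rather trigger a heap dump from within the application itself, for example in a test, the JDK’s com.sun.management.HotSpotDiagnosticMXBean offers dumpHeap(outputFile, live). Here is a minimal sketch, assuming a HotSpot-based JVM such as OpenJDK; the class HeapDumps and its dump method are hypothetical names for this example. Note that dumpHeap refuses to overwrite an existing file and requires the file name to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of jmap or hprof-redact.
class HeapDumps {

    /**
     * Writes a heap dump of the running JVM in HPROF format.
     * The file must not exist yet, and its name must end with ".hprof".
     *
     * @param liveOnly if true, only objects reachable from GC roots are dumped
     */
    static void dump(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toAbsolutePath().toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Fresh temporary directory, so the target file cannot exist yet
        Path file = Files.createTempDirectory("heap").resolve("self.hprof");
        dump(file, true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Passing true for live corresponds to jmap’s live option and only dumps reachable objects.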

You can work with the heap dump using a couple of open-source tools:

hprof-slurp: a heap dump analyzer written in Rust that can, for example, show a summary of the instances per class
Eclipse Memory Analyzer Tool (MAT): a powerful UI tool to analyze heap dumps. If you like JMC, you’ll like this tool.

Apparently, there’s also support for viewing heap dumps in the Ultimate version of IntelliJ, but I haven’t used it in a while.

The great thing about the OpenJDK heap dump format, compared to the JFR format, is that it is somewhat formally specified, in a comment in the heapDumper.cpp implementation.
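To give an idea of what that comment specifies: an HPROF file starts with a NUL-terminated version string such as JAVA PROFILE 1.0.2, a u4 identifier size and a u8 millisecond timestamp, followed by a flat sequence of records, each consisting of a u1 tag, a u4 time offset, a u4 payload length and the payload. The following sketch relies only on that top-level layout to count the records per tag while streaming the file. The class HprofRecordStats is hypothetical and not part of hprof-redact, and it doesn’t look inside the heap dump segments.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of hprof-redact): counts top-level HPROF records per tag.
class HprofRecordStats {

    static final Map<Integer, String> TAG_NAMES = Map.of(
            0x01, "UTF8", 0x02, "LOAD CLASS", 0x04, "STACK FRAME", 0x05, "STACK TRACE",
            0x0C, "HEAP DUMP", 0x1C, "HEAP DUMP SEGMENT", 0x2C, "HEAP DUMP END");

    /** Parses the header, then skips through all records and counts them per tag. */
    static Map<Integer, Integer> countTags(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        // Header: NUL-terminated version string, e.g. "JAVA PROFILE 1.0.2"
        StringBuilder version = new StringBuilder();
        for (int b = in.read(); b != 0; b = in.read()) {
            if (b < 0) throw new EOFException("truncated header");
            version.append((char) b);
        }
        if (!version.toString().startsWith("JAVA PROFILE")) {
            throw new IOException("not an HPROF file: " + version);
        }
        int idSize = in.readInt();              // size of object and string IDs
        if (idSize != 4 && idSize != 8) throw new IOException("unexpected ID size " + idSize);
        in.skipNBytes(8);                       // u8 timestamp in milliseconds

        Map<Integer, Integer> counts = new TreeMap<>();
        for (int tag = in.read(); tag >= 0; tag = in.read()) {
            in.readInt();                                        // microseconds since header timestamp
            long length = Integer.toUnsignedLong(in.readInt());  // payload length (unsigned u4)
            in.skipNBytes(length);                               // skip payload without buffering it
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            countTags(in).forEach((tag, count) -> System.out.printf("0x%02X %-18s %d%n",
                    tag, TAG_NAMES.getOrDefault(tag, "?"), count));
        }
    }
}
```

Skipping payloads instead of reading them keeps memory usage constant, which matters for multi-gigabyte dumps.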

Why do we need to redact?

Heap dumps contain everything you store on the heap. Consider the following example, in which your application holds a configuration object with a user name and a secret password:

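public class SecretFieldTest {

    public static void main(String[] args)
      throws InterruptedException {

        // Holds the secret that will end up in the heap dump
        Configuration config =
            new Configuration("admin-user", "very-secret-password");

        // Keep the JVM running so that we can take a heap dump
        System.out.println("Press Ctrl+C to exit...");
        Thread.sleep(Long.MAX_VALUE);
    }
}

// Declared after the main class, so the single-file source launcher picks SecretFieldTest
record Configuration(String user, String password) {
}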

A heap dump of this program clearly contains the single Configuration instance with the very secret password, which you probably don’t want to leak. And the password itself doesn’t help you analyze your heap anyway.

Sadly, I was unable to get MAT to give me the actual string values, but hprof-slurp and its --listStrings option had me covered.

> java test_programs/SecretFieldTest.java &
> jmap -dump:file=file.hprof 28287
> hprof-slurp --inputFile file.hprof --listStrings | grep "very-secret"
very-secret-password

Maybe the fact that MAT doesn’t show the primitive and string values is a sign of how unimportant the actual values are.

Using hprof-redact

hprof-redact is built to be as simple as possible. It does just what Henry Lin wanted to do in JDK-8337517: it transforms primitive values and strings. You can download hprof-redact from the GitHub releases or use it via jbang (jbang hprof-redact@parttimenerd/hprof-redact):

Usage: hprof-redact [-hV] [--transformer=<transformer>] [--verbose] <input>
                    <output>
Stream and redact HPROF heap dumps.
      <input>                        Input HPROF path.
      <output>                       Output HPROF path or '-' for stdout.
  -h, --help                         Show this help message and exit.
  -t, --transformer=<transformer>    Transformer to apply (default: zero).
                                     Options: zero (zero primitives + string
                                     contents), zero-strings (zero string
                                     contents only), drop-strings (empty string
                                     contents).
  -v, --verbose                      Log changed field values (primitive fields
                                     only) to stderr.
  -V, --version                      Print version information and exit.

The zero and zero-strings transformers replace string contents with null bytes of the same length. This keeps the instances in your heap the same size, so you can still detect when large strings are a problem. Of course, this leaks a tiny bit of information (the string lengths). Replacing all strings with empty strings (drop-strings) avoids this and also drastically reduces the size of the heap dump.
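The difference between the three options can be modeled on the raw bytes of a string’s contents and on a primitive value. The following is only a sketch of the behavior described above and in the help text, not hprof-redact’s implementation; the enum Mode and the method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the three transformer options, not hprof-redact's code.
class RedactionModes {

    enum Mode { ZERO, ZERO_STRINGS, DROP_STRINGS }

    /** String contents: same-length NUL bytes for ZERO and ZERO_STRINGS, empty for DROP_STRINGS. */
    static byte[] redactStringContent(byte[] content, Mode mode) {
        return switch (mode) {
            case ZERO, ZERO_STRINGS -> new byte[content.length]; // new arrays are all zeros
            case DROP_STRINGS -> new byte[0];
        };
    }

    /** Primitive values: only ZERO touches them. */
    static long redactLong(long value, Mode mode) {
        return mode == Mode.ZERO ? 0L : value;
    }

    public static void main(String[] args) {
        byte[] secret = "very-secret-password".getBytes(StandardCharsets.ISO_8859_1);
        for (Mode mode : Mode.values()) {
            byte[] redacted = redactStringContent(secret, mode);
            System.out.println(mode + ": " + redacted.length + " bytes, first bytes "
                    + Arrays.toString(Arrays.copyOf(redacted, Math.min(4, redacted.length)))
                    + ", long 1234 -> " + redactLong(1234L, mode));
        }
    }
}
```

Keeping the length means array and instance sizes in the dump stay unchanged, which is exactly what makes size-related analysis still possible after zeroing.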

For the previous example, we can zero all strings in file.hprof, including the secret, via:

target/hprof-redact file.hprof redacted.hprof

Running hprof-slurp on the redacted file shows no secret value.
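If you don’t have hprof-slurp at hand, a crude double check is to search the redacted file for the raw bytes of the secret. This works for our example because, with compact strings (the default since JDK 9), a string that only contains Latin-1 characters, like very-secret-password, is stored as a byte[] with one byte per character, so its bytes appear verbatim in the dump. The sketch below (hypothetical class HeapGrep) streams the file; it assumes an uncompressed dump and a Latin-1 secret, and finding nothing is a good sign rather than a proof.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: naive streaming search for a byte sequence in a (possibly huge) file.
class HeapGrep {

    /** Returns the offset of the first occurrence of needle in the stream, or -1. */
    static long indexOf(InputStream raw, byte[] needle) throws IOException {
        if (needle.length == 0) return 0;
        InputStream in = new BufferedInputStream(raw, 1 << 16);
        byte[] window = new byte[needle.length]; // ring buffer holding the last needle.length bytes
        long pos = 0;
        for (int b = in.read(); b >= 0; b = in.read()) {
            window[(int) (pos % needle.length)] = (byte) b;
            pos++;
            if (pos >= needle.length && matches(window, pos, needle)) {
                return pos - needle.length;
            }
        }
        return -1;
    }

    private static boolean matches(byte[] window, long pos, byte[] needle) {
        int n = needle.length;
        int start = (int) (pos % n); // index of the oldest byte in the ring buffer
        for (int i = 0; i < n; i++) {
            if (window[(start + i) % n] != needle[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // usage: java HeapGrep.java redacted.hprof very-secret-password
        byte[] needle = args[1].getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            long offset = indexOf(in, needle);
            System.out.println(offset < 0 ? "not found" : "found at offset " + offset);
        }
    }
}
```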

hprof-slurp --listStrings still lists many other strings, such as class and method names, which are not redacted by default. After all, hprof-redact mainly focuses on redacting primitives and strings.

hprof-redact tries to be fast and memory-efficient by parsing the file in two passes: the first pass scans the metadata records to build a mapping from IDs to name kinds, and the second pass applies the transformations to strings and primitive values based on their kind, and to heap dump records based on class and field information. This matters because heap dumps can get large.
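The two-pass idea can be sketched for the simplest case, the UTF8 string records. In dumps written by HotSpot, these typically come before the LOAD CLASS and STACK FRAME records that reference them, so a single streaming pass can’t know what a string is used for when it reaches it. This is a minimal sketch under my own assumptions (the class TwoPassRedactSketch, the NameKind values and the transform hook are all hypothetical), not how hprof-redact is actually structured.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of a two-pass HPROF rewriter, not hprof-redact's actual code.
class TwoPassRedactSketch {

    enum NameKind { CLASS_NAME, METHOD_NAME, METHOD_SIGNATURE, SOURCE_FILE }

    static final int UTF8 = 0x01, LOAD_CLASS = 0x02, STACK_FRAME = 0x04;

    /** Copies the header (version string, ID size, timestamp) and returns the ID size. */
    static int copyHeader(DataInputStream in, DataOutputStream out) throws IOException {
        for (int b = in.readUnsignedByte(); b != 0; b = in.readUnsignedByte()) out.writeByte(b);
        out.writeByte(0);                                    // NUL terminator of the version
        int idSize = in.readInt();
        out.writeInt(idSize);
        out.writeLong(in.readLong());                        // u8 timestamp
        return idSize;
    }

    static long readId(DataInputStream in, int idSize) throws IOException {
        return idSize == 4 ? Integer.toUnsignedLong(in.readInt()) : in.readLong();
    }

    /** Pass 1: record which UTF8 string IDs are referenced as which kind of name. */
    static Map<Long, NameKind> collectKinds(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int idSize = copyHeader(in, new DataOutputStream(OutputStream.nullOutputStream()));
        Map<Long, NameKind> kinds = new HashMap<>();
        for (int tag = in.read(); tag >= 0; tag = in.read()) {
            in.readInt();                                    // time offset
            long length = Integer.toUnsignedLong(in.readInt());
            long consumed = 0;
            if (tag == LOAD_CLASS) {                         // u4 serial, ID object, u4 trace, ID name
                in.readInt();
                readId(in, idSize);
                in.readInt();
                kinds.put(readId(in, idSize), NameKind.CLASS_NAME);
                consumed = 8 + 2L * idSize;
            } else if (tag == STACK_FRAME) {                 // ID frame, ID method, ID signature, ID file, ...
                readId(in, idSize);
                kinds.put(readId(in, idSize), NameKind.METHOD_NAME);
                kinds.put(readId(in, idSize), NameKind.METHOD_SIGNATURE);
                kinds.put(readId(in, idSize), NameKind.SOURCE_FILE);
                consumed = 4L * idSize;
            }
            in.skipNBytes(length - consumed);                // skip the rest of the record
        }
        return kinds;
    }

    /** Pass 2: copy all records; UTF8 payloads go through transform (kind is null if unknown). */
    static void rewrite(InputStream raw, OutputStream rawOut, Map<Long, NameKind> kinds,
                        BiFunction<NameKind, byte[], byte[]> transform) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(rawOut));
        int idSize = copyHeader(in, out);
        for (int tag = in.read(); tag >= 0; tag = in.read()) {
            out.writeByte(tag);
            out.writeInt(in.readInt());                      // keep the time offset
            long length = Integer.toUnsignedLong(in.readInt());
            if (tag == UTF8) {
                long id = readId(in, idSize);
                byte[] text = transform.apply(kinds.get(id), in.readNBytes((int) (length - idSize)));
                out.writeInt(idSize + text.length);          // the record length may change
                if (idSize == 4) out.writeInt((int) id); else out.writeLong(id);
                out.write(text);
            } else {
                out.writeInt((int) length);                  // same bits as the unsigned u4
                copy(in, out, length);                       // stream large heap dump segments
            }
        }
        out.flush();
    }

    static void copy(InputStream in, OutputStream out, long n) throws IOException {
        byte[] buffer = new byte[8192];
        while (n > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, n));
            if (read < 0) throw new EOFException("truncated record");
            out.write(buffer, 0, read);
            n -= read;
        }
    }
}
```

Running collectKinds on one stream over the input file and rewrite on a second one, with a transform like (kind, bytes) -> kind == null ? new byte[bytes.length] : bytes, would zero every UTF8 string that isn’t referenced as a class name, method name, signature or source file. A real tool would also have to classify the field names referenced from CLASS DUMP sub-records inside the heap dump segments and rewrite the primitive values there, which this sketch leaves out.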

Implementing your own redaction

Having only three simple redaction transformers (at the time of writing) might seem limiting. But first, you’re always welcome to contribute your own transformers to the project: just open a pull request in the GitHub repository.

Second, you can also use hprof-redact as a library:

<dependency>
    <groupId>me.bechberger</groupId>
    <artifactId>hprof-redact</artifactId>
    <version>0.1.1</version>
</dependency>

Make sure to check for the latest version of the library.

Just implement the HprofTransformer interface. As an example, let’s write a transformer that replaces every string value with "REDACTED" and every integer value with 42:

import me.bechberger.hprof.transformer.HprofTransformer;

public class MyTransformer implements HprofTransformer {
    @Override
    public String transformUtf8String(String value) {
        return "REDACTED";
    }
    
    @Override
    public int transformInt(int value) {
        return 42;
    }
}

You can use it on a heap dump file as follows:

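import me.bechberger.hprof.HprofRedact;

import java.io.IOException;
import java.nio.file.Path;

// Compact source file with an instance main method (Java 25)
void main() throws IOException {
    // Redact input.hprof into output.hprof using the custom transformer
    HprofRedact.process(
        Path.of("input.hprof"),
        Path.of("output.hprof"),
        new MyTransformer());
}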

Conclusion

hprof-redact is a simple tool and library that solves a minor pain point in working with heap dumps, featuring a custom, fast heap dump parser and a simple command-line interface.

Thanks for reading this far. I hope you find hprof-redact useful too. See you in another week, probably with something on JSON and parsers.

This blog post is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Author

Johannes Bechberger

Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. His work today comprises many open-source contributions and his blog, where he regularly writes on in-depth profiling and debugging topics. He also works on hello-ebpf, the first eBPF library for Java. His most recent contribution is the new CPU Time Profiler in JDK 25.

