Redacting Sensitive Data from Java Flight Recorder Files

A few weeks ago, I showed you how to read and write JFR files programmatically. This week, we’re using the basic-jfr-processor library covered there to create a fully fledged (yet still experimental) JFR and hserr file redaction tool called jfr-redact.


TL;DR: Download jfr-redact from GitHub and redact sensitive information like user names, tokens, and keys from files via:

# Using the JAR directly
java -jar jfr-redact.jar redact recording.jfr

# Redact text files
java -jar jfr-redact.jar redact-text hs_err.log

Foundations

JFR events like jdk.InitialEnvironmentVariable make it easy to leak information, as keys and tokens might be passed via environment variables. Additionally, socket I/O events might leak internal hostnames, ports, and more, to name just a few examples.
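To get a feeling for which environment variables look sensitive, here is a minimal sketch that checks variable names against a keyword list. The keywords mirror the patterns listed in the jfr-redact README further below; the class name SensitiveKeys and the case-insensitive substring matching are my assumptions for illustration, not jfr-redact's actual implementation.

```java
import java.util.List;
import java.util.Locale;

// Minimal sketch: flag environment variable names that look sensitive.
// Keywords mirror jfr-redact's README; the matching logic is an assumption.
final class SensitiveKeys {

    private static final List<String> KEYWORDS =
        List.of("password", "passwort", "pwd", "secret", "token", "key");

    // True if the variable name contains one of the keywords, ignoring case
    static boolean isSensitive(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return KEYWORDS.stream().anyMatch(lower::contains);
    }

    // Replace the value with *** if the name looks sensitive, otherwise keep it
    static String redactValue(String name, String value) {
        return isSensitive(name) ? "***" : value;
    }

    public static void main(String[] args) {
        // Print only the names of suspicious variables in this process's environment
        System.getenv().keySet().stream()
            .filter(SensitiveKeys::isSensitive)
            .sorted()
            .forEach(System.out::println);
    }
}
```

Such a simple substring check also flags harmless variables like PWD (the current working directory), which shows why pattern-based redaction needs some tuning to avoid false positives.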

So we want to remove specific events and redact certain event properties. OpenJDK already provides a basic tool for the former: jfr scrub.

jfr scrub subcommand

Use jfr scrub to remove sensitive contents from a file or to reduce its size.

The syntax is:

jfr scrub [--include-events <filter>] [--exclude-events <filter>] [--include-categories <filter>] [--exclude-categories <filter>] [--include-threads <filter>] [--exclude-threads <filter>] <input-file> [<output-file>]

--include-events <filter>
Select events matching an event name.
--exclude-events <filter>
Exclude events matching an event name.
--include-categories <filter>
Select events matching a category name.
--exclude-categories <filter>
Exclude events matching a category name.
--include-threads <filter>
Select events matching a thread name.
--exclude-threads <filter>
Exclude events matching a thread name.
<input-file>
The input file to read events from.
<output-file>
The output file to write filter events to.

Documentation for jfr scrub
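The filters above match event, category, or thread names; the jfr-redact README mentions glob patterns (*, ?) and comma-separated lists for the same kind of filtering. The following minimal sketch shows how such an event-name filter could work. The class EventNameFilter and the combination rule (keep an event if it matches an include filter, or none is given, and matches no exclude filter) are my assumptions; the real jfr scrub may combine multiple filters differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Simplified sketch of jfr scrub-style event-name filtering with globs.
final class EventNameFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    EventNameFilter(String includeFilter, String excludeFilter) {
        this.includes = parse(includeFilter);
        this.excludes = parse(excludeFilter);
    }

    // Split a comma-separated filter into compiled glob patterns
    private static List<Pattern> parse(String filter) {
        if (filter == null || filter.isBlank()) {
            return List.of();
        }
        return Arrays.stream(filter.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .map(EventNameFilter::globToRegex)
            .collect(Collectors.toList());
    }

    // Translate a glob: '*' matches any sequence, '?' exactly one character
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*' -> regex.append(".*");
                case '?' -> regex.append('.');
                default -> regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep if included (or no includes given) and not excluded
    boolean keep(String eventName) {
        boolean included = includes.isEmpty()
            || includes.stream().anyMatch(p -> p.matcher(eventName).matches());
        boolean excluded = excludes.stream()
            .anyMatch(p -> p.matcher(eventName).matches());
        return included && !excluded;
    }

    public static void main(String[] args) {
        EventNameFilter filter = new EventNameFilter(null,
            "jdk.InitialEnvironmentVariable,jdk.Socket*");
        for (String name : List.of("jdk.SocketRead", "jdk.GarbageCollection",
                "jdk.InitialEnvironmentVariable")) {
            System.out.println(name + " -> " + (filter.keep(name) ? "keep" : "drop"));
        }
    }
}
```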

But jfr scrub cannot redact individual event properties. There are already ideas for simple property filtering directly during JFR recording (see JBS issue), but it remains to be seen when and how this will be implemented and integrated.

So we need to implement our own.

Using Basic-JFR-Processor to Build a Simple Redactor

basic-jfr-processor makes it easy to create, for example, a redaction tool that removes all jdk.InitialEnvironmentVariable events and redacts all properties named “token” and “port” (source):

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

import jdk.jfr.consumer.RecordedEvent;
// JFREventModifier and JFRProcessor come from basic-jfr-processor,
// RecordingImpl from the JMC flight recorder writer API;
// import them from the packages of the library versions you use

public class SimpleRedactorExample {

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println(
              "Usage: SimpleRedactorExample <input.jfr> <output.jfr>");
            System.exit(2);
        }

        Path input = Path.of(args[0]);
        Path output = Path.of(args[1]);

        // Create a modifier that drops jdk.InitialEnvironmentVariable
        // events and redacts "token" and "port" fields
        JFREventModifier modifier = new JFREventModifier() {
            @Override
            public boolean shouldRemoveEvent(RecordedEvent event) {
                return event.getEventType().getName()
                  .equals("jdk.InitialEnvironmentVariable");
            }

            @Override
            public String process(String fieldName, String value) {
                if (fieldName.equals("token")) {
                    return "<redacted>";
                }
                return value;
            }

            @Override
            public int process(String fieldName, int value) {
                if (fieldName.equals("port")) {
                    return 0;
                }
                return value;
            }
        };

        JFRProcessor processor = new JFRProcessor(modifier, input);

        try (FileOutputStream out =
               new FileOutputStream(output.toFile())) {
            // process(...) returns a RecordingImpl
            // that must be closed to finalize the file
            RecordingImpl result = processor.process(out);
            // Close the recording to flush any remaining data
            result.close();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

We can extend this approach to make it more configurable, and to discover sensitive strings in a first pass and then replace them throughout the file. This is what my jfr-redact project does.

JFR-Redact

It has the following features (from the README):

  • Property Redaction: Redact sensitive properties in events with key and value fields
    • Patterns: password, passwort, pwd, secret, token, key, … (case-insensitive)
  • Event Removal: Remove entire event types that could leak information
    • Examples: jdk.OSInformation, SystemProcess, InitialEnvironmentVariable, ProcessStart
  • Event Filtering: Advanced filtering similar to jfr scrub command (docs)
    • Filter by event name, category, or thread name
    • Supports glob patterns (*, ?) and comma-separated lists
    • Include/exclude filters with flexible combinations
  • String Pattern Redaction: Redact sensitive patterns in string fields
    • Home folders: /Users/[^/]+, C:\Users\[a-zA-Z0-9_\-]+, /home/[^/]+
    • e-mail addresses, UUIDs, IP addresses
    • Configurable to exclude method names, class names, or thread names
  • Two-Pass Discovery: Automatically discover sensitive values and redact them everywhere
    • First pass: Extract usernames, hostnames, and other values from patterns (e.g., extract johndoe from /Users/johndoe)
    • Second pass: Redact discovered values wherever they appear in the file
    • Configurable minimum occurrences and allowed lists to reduce false positives
    • Use --discovery-mode=fast for single-pass (faster), --discovery-mode=default for two-pass (more thorough)
  • Words Mode: Discover and redact specific words/identifiers
    • Discover all distinct words in a file: jfr-redact words discover recording.jfr
    • Create rules to keep or redact specific words
    • Apply rules: jfr-redact words redact app.log redacted.log -r rules.txt
  • Network Redaction: Redact ports and addresses from SocketRead/SocketWrite events
  • Path Redaction: Redact directory paths while keeping filenames (configurable)
  • Pseudonymization: Preserve relationships between values while protecting data
    • Hash mode: Consistent mapping to pseudonyms (e.g., <redacted:a1b2c3>)
    • Counter mode: Sequential numbering (value1→1, value2→2)
    • Realistic mode: Generate plausible alternatives (e.g., john.doe@company.com → alice.smith@test.com)
    • Custom replacements: Define specific mappings in config (e.g., johndoe → alice, /home/johndoe → /home/testuser)
    • Optional, enabled via --pseudonymize flag
  • Text File Redaction: Apply the same redaction patterns to arbitrary text files
    • Perfect for redacting Java error logs (hs_err_pid*.log), which contain system properties, environment variables, and file paths
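The two-pass discovery from the list above can be sketched roughly as follows, assuming all values are plain strings. The first pass extracts user names from home-folder paths and counts them; the second pass replaces every discovered name wherever it occurs. The class TwoPassDiscovery is hypothetical, the Windows path pattern is omitted, and the capture group additionally stops at whitespace, which deviates from the README's /Users/[^/]+.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Minimal sketch of two-pass discovery (not jfr-redact's implementation).
final class TwoPassDiscovery {
    // Home-folder prefixes with the user name captured in group 1
    private static final Pattern HOME =
        Pattern.compile("(?:/Users/|/home/)([^/\\s]+)");

    // Pass 1: collect user names that occur at least minOccurrences times
    static List<String> discover(List<String> values, int minOccurrences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            Matcher m = HOME.matcher(value);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
            .filter(e -> e.getValue() >= minOccurrences)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // Pass 2: replace every occurrence of a discovered value, wherever it appears
    static List<String> redact(List<String> values, List<String> discovered) {
        return values.stream().map(value -> {
            String result = value;
            for (String secret : discovered) {
                result = result.replace(secret, "***");
            }
            return result;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of(
            "/Users/johndoe/app.jar", "user johndoe logged in", "/tmp/x");
        List<String> found = discover(values, 1);
        System.out.println(found);
        System.out.println(redact(values, found));
    }
}
```

The plain substring replacement in the second pass is what makes the approach thorough, as the name is also caught outside of paths, but it can also hit longer words that merely contain the name; this is where minimum occurrences and allowed lists help.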

As I mentioned earlier, it’s highly experimental, so we offer no guarantees that it works correctly; however, it should capture most sensitive information.

After downloading it, you just call it with either a custom config or a predefined one:

jfr-redact redact recording.jfr redacted.jfr --config strict

It supports a superset of the features of jfr scrub. Three additional modes are worth showcasing: text redaction, the words mode, and the concatenation of multiple JFR files.

Text Redaction

You can configure jfr-redact with custom string redaction rules, which jfr-redact applies to strings within JFR events. One example is the redaction of IP addresses:

    # IP addresses
    ip_addresses:
      enabled: true
      patterns:
        - '\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'
        - '\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'
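To illustrate what such a rule does to a string field, here is a minimal sketch that applies the two ip_addresses patterns from the config above with java.util.regex. The class IpRedaction and the *** replacement are my assumptions; jfr-redact's own rule engine may apply patterns differently.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: apply the ip_addresses patterns from the config to a string value.
final class IpRedaction {
    private static final List<Pattern> PATTERNS = List.of(
        // IPv4, e.g. 10.0.0.1
        Pattern.compile("\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"),
        // Full, uncompressed IPv6 with eight groups
        Pattern.compile("\\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\\b"));

    // Replace every match of every pattern with ***
    static String redact(String value) {
        String result = value;
        for (Pattern pattern : PATTERNS) {
            result = pattern.matcher(result).replaceAll("***");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(redact(
            "connect to 10.0.0.1:8080 from 2001:0db8:85a3:0000:0000:8a2e:0370:7334"));
    }
}
```

Note that the IPv6 pattern only matches the full eight-group notation, so compressed addresses such as ::1 pass through unchanged, while the IPv4 pattern also accepts invalid octets like 999.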

Such rules can also be used to create custom config files for plain text redaction. A common kind of text that I often encounter in JDK development is the error file that the JVM writes when it crashes, known as an hserr file. This is why I created a dedicated hserr.yaml config that removes all potentially sensitive information from hserr files.

You can use the text-redaction feature as follows:

# Redact a Java error log file (hs_err_pid*.log)
# Uses the preset hserr by default
java -jar jfr-redact.jar redact-text hs_err_pid12345.log hs_err_pid12345.redacted.log

# Redact any text file with pseudonymization
java -jar jfr-redact.jar redact-text debug-output.txt debug-output.redacted.txt --pseudonymize
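The hash mode of pseudonymization maps equal values to equal pseudonyms such as <redacted:a1b2c3>, so relationships between values survive. A minimal sketch of that idea, assuming the pseudonym is built from the first three bytes of a SHA-256 hash (the class HashPseudonymizer and the choice of hash are hypothetical, not taken from jfr-redact):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of hash-based pseudonymization: same input, same pseudonym.
final class HashPseudonymizer {

    // Map a value to <redacted:xxxxxx> using the first 3 bytes of its SHA-256 hash
    static String pseudonymize(String value) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(value.getBytes(StandardCharsets.UTF_8));
            return "<redacted:" + HexFormat.of().formatHex(hash, 0, 3) + ">";
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"johndoe", "johndoe", "alice"}) {
            System.out.println(value + " -> " + pseudonymize(value));
        }
    }
}
```

Two caveats apply to this simple variant: a 24-bit prefix can collide once many distinct values are involved, and an unkeyed hash of a short value like a user name can be reversed by hashing candidate names, so a keyed hash (e.g., HMAC with a secret) is the more robust choice.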

You can also provide custom configuration files via the --config option. As an example, the following hserr excerpt

OS:
uname: Darwin FVF 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:52 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T8103 arm64
OS uptime: 3 days 22:33 hours
rlimit (soft/hard): STACK 8176k/65520k , CORE 0k/infinity , NPROC 2666/4000 , NOFILE 10240/infinity , AS infinity/infinity , CPU infinity/infinity , DATA infinity/infinity , FSIZE infinity/infinity , MEMLOCK infinity/infinity , RSS infinity/infinity
load average: 9.25 11.63 11.44

CPU: total 8 (initial active 8) 0x61:0x0:0x1b588bb3:0, fp, asimd, aes, pmull, sha1, sha256, crc32, lse, sha3, sha512

is replaced with:

OS:
uname: Darwin *** 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:52 PDT 2023; root:***~6/RELEASE_ARM64_T8103 arm64
OS uptime: ***
rlimit (soft/hard): STACK 8176k/65520k , CORE 0k/infinity , NPROC 2666/4000 , NOFILE 10240/infinity , AS infinity/infinity , CPU infinity/infinity , DATA infinity/infinity , FSIZE infinity/infinity , MEMLOCK infinity/infinity , RSS infinity/infinity
load average: 9.25 11.63 11.44

CPU: total 8 (initial active 8) 0x61:0x0:0x1b588bb3:0, fp, asimd, aes, pmull, sha1, sha256, crc32, lse, sha3, sha512

Words Mode

Perhaps you don’t trust the complex redaction engine and would rather review the individual words in the file and redact specific ones. jfr-redact has you covered with a mode that Götz, a colleague of mine, requested, which models his own workflow:

  1. Get all words in a file, sorted alphabetically: words must match [a-zA-Z0-9_\\-+/]+, contain at least one letter, and must not be hexadecimal numbers
  2. Read through all the words to check whether you spot something sensitive
  3. Redact all sensitive words
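Step 1 can be sketched in a few lines, using the character class quoted above written as a Java string literal. The class WordDiscovery and the hexadecimal check (an optional 0x prefix followed only by hex digits) are my assumptions; judging by the discovery output below, where xnu-8796.141.3 appears as a single word, the real tool also treats some further characters such as dots as part of a word.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of word discovery: distinct, alphabetically sorted words.
final class WordDiscovery {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z0-9_\\-+/]+");
    private static final Pattern LETTER = Pattern.compile("[a-zA-Z]");
    // Assumed notion of "hexadecimal number": optional 0x, then hex digits only
    private static final Pattern HEX = Pattern.compile("(?:0x)?[0-9a-fA-F]+");

    static SortedSet<String> discover(String text) {
        SortedSet<String> words = new TreeSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group();
            // Keep words with at least one letter that are not hex numbers
            if (LETTER.matcher(word).find() && !HEX.matcher(word).matches()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(discover("uname: Darwin FVF 0x61 1b588bb3 22"));
    }
}
```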

This is more work than automatic redaction, but less than reading through a whole document or JFR printout. jfr-redact can’t help you with step 2, but it offers the words discover command for step 1 and the words redact command for step 3. As always, you can read more about these commands in the README; here is a brief overview.

Consider the hserr file excerpt from the previous section. Let’s store it in a file called hserr.log and start the discovery:

> jfr-redact words discover hserr.log words.txt

Successfully wrote 47 words to:
  .../words.txt

The generated file looks as follows:

0k/infinity
10240/infinity
6/RELEASE_ARM64_T8103
8176k/65520k
AS
CORE
CPU
DATA
Darwin
FSIZE
FVF
...
xnu-8796.141.3

Now suppose we consider xnu-8796.141.3 sensitive. We simply prefix its line with - (all lines without a prefix are ignored):

0k/infinity
...
- xnu-8796.141.3

Now we can run the redaction of the hserr file using the rules in the words file:

> jfr-redact words redact hserr.log hserr.redacted.log -r words.txt
Loaded 1 redaction rules
Redacting text file: hserr.log
Processed 7 lines total
Redacted 1 lines
Processed 50 unique values: 1 redacted, 0 kept
Wrote redacted output to: hserr.redacted.log

The resulting file has, as expected, the string xnu-8796.141.3 replaced with ***:

OS:
uname: Darwin FVF 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:52 PDT 2023; root:***~6/RELEASE_ARM64_T8103 arm64
OS uptime: 3 days 22:33 hours
rlimit (soft/hard): STACK 8176k/65520k , CORE 0k/infinity , NPROC 2666/4000 , NOFILE 10240/infinity , AS infinity/infinity , CPU infinity/infinity , DATA infinity/infinity , FSIZE infinity/infinity , MEMLOCK infinity/infinity , RSS infinity/infinity
load average: 9.25 11.63 11.44

CPU: total 8 (initial active 8) 0x61:0x0:0x1b588bb3:0, fp, asimd, aes, pmull, sha1, sha256, crc32, lse, sha3, sha512

Note that rules only match single words. The redaction rules also support glob patterns, as well as keeping and replacing words; read more about this in the help documentation of words redact.
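The basic rule format shown above, where only lines prefixed with - mark a word for redaction, could be applied roughly like this. This minimal sketch ignores glob, keep, and replace rules; the class WordRules and the set of characters that count as part of a word at match boundaries are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: parse "- word" rules and redact whole-word occurrences.
final class WordRules {
    // Characters treated as part of a word at match boundaries (assumption)
    private static final String WORD_CHAR = "[A-Za-z0-9_\\-+/.]";

    // Only lines starting with "- " produce a redaction rule
    static List<Pattern> parse(List<String> ruleLines) {
        List<Pattern> rules = new ArrayList<>();
        for (String line : ruleLines) {
            if (line.startsWith("- ")) {
                String word = line.substring(2).trim();
                rules.add(Pattern.compile("(?<!" + WORD_CHAR + ")"
                    + Pattern.quote(word) + "(?!" + WORD_CHAR + ")"));
            }
        }
        return rules;
    }

    // Replace every whole-word match of every rule with ***
    static String redact(String line, List<Pattern> rules) {
        String result = line;
        for (Pattern rule : rules) {
            result = rule.matcher(result).replaceAll("***");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pattern> rules = parse(List.of("0k/infinity", "- xnu-8796.141.3"));
        System.out.println(redact(
            "root:xnu-8796.141.3~6/RELEASE_ARM64_T8103 arm64", rules));
    }
}
```

The lookarounds ensure that a rule for xnu-8796.141.3 does not also redact part of a longer word such as xnu-8796.141.34.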

Concat Mode

A feature that is not directly related to redaction but to JFR processing in general is the ability to concatenate multiple JFR files without any processing:

# Concatenate two JFR files
java -jar jfr-redact.jar concat one.jfr two.jfr -o combined.jfr

This process takes a long time, especially for larger files, possibly because the JMC writer API does not prioritize performance. For example, concatenating 250 MB of JFR data takes around 15 minutes on my MacBook Pro M4.

An interesting observation is that passing a single large file on its own reduces its size substantially: in my case, from 242 MB to 182 MB. So you can also use concatenation as a form of compression. The reason is that a JFR file typically consists of multiple parts (chunks), each with its own constant pool, whereas the JMC writer API creates only one part. There is therefore only one instance of each constant, which saves space.
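The effect can be illustrated with a deliberately simplified toy model, in which each chunk's constant pool is just a set of strings and its size is the sum of their UTF-8 lengths. Real JFR constant pools also hold structured entries such as threads, classes, and stack traces, and encode them differently; the class ConstantPoolModel is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: per-chunk constant pools vs. one merged pool.
final class ConstantPoolModel {

    // Size of one pool: sum of the UTF-8 lengths of its constants
    static int bytes(Set<String> pool) {
        return pool.stream()
            .mapToInt(s -> s.getBytes(StandardCharsets.UTF_8).length)
            .sum();
    }

    // Every chunk stores its own copy of each constant it uses
    static int perChunkBytes(List<Set<String>> chunks) {
        return chunks.stream().mapToInt(ConstantPoolModel::bytes).sum();
    }

    // A single chunk stores each distinct constant only once
    static int mergedBytes(List<Set<String>> chunks) {
        Set<String> merged = new HashSet<>();
        chunks.forEach(merged::addAll);
        return bytes(merged);
    }

    public static void main(String[] args) {
        List<Set<String>> chunks = List.of(
            Set.of("java.lang.Thread", "main"),
            Set.of("java.lang.Thread", "worker"));
        System.out.println("per chunk: " + perChunkBytes(chunks)
            + " bytes, merged: " + mergedBytes(chunks) + " bytes");
    }
}
```

Here, the constant java.lang.Thread (16 bytes) is stored twice with two chunks but only once when merged, so the pools shrink from 42 to 26 bytes.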

Usage as a Library

You can, of course, also use jfr-redact as a library:

<dependency>
  <groupId>me.bechberger</groupId>
  <artifactId>jfr-redact</artifactId>
  <version>0.1.2</version>
</dependency>

Refer to the tests and individual command implementations to see how to use the library.

Conclusion

I had always wanted a small tool to remove sensitive information from JFR files, and recently found the time to implement it, building a few smaller libraries along the way. jfr-redact makes it easy to redact sensitive information from JFR and text files, and even supports pseudonymization.

I hope this tool is as helpful for you as it is for me. See you in another week with something different.

This blog post is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Author

  • Johannes Bechberger

    Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. His work today comprises many open-source contributions and his blog, where he regularly writes on in-depth profiling and debugging topics. He also works on hello-ebpf, the first eBPF library for Java. His most recent contribution is the new CPU Time Profiler in JDK 25.


New posts like these come out at least every two weeks. To get notified about new posts, follow me on BlueSky, Twitter, Mastodon, or LinkedIn, or join the newsletter.
